Exploring Data With RapidMiner

I recently received and promptly devoured a reviewer’s copy of Exploring Data With RapidMiner, written by Andrew Chisholm and published by Packt Publishing.

Here is the table of contents as I see it with the reader I use:

Exploring Data With RapidMiner TOC

I had started looking at RapidMiner again a few weeks ago, seeking a sturdier, general-purpose natural language processing environment than the stylometry-focused JGAAP. There are good video tutorials on this task, but that’s just not the same as having a book that lays out the details for you, written by someone who really knows the product.

I was both surprised and pleased by what I found in this book. I was expecting a lot of material on process design and on applying the myriad widgets within the RapidMiner system, with the focus being analysis. Instead, what I found is a book on getting datasets into RapidMiner in the first place. Having this book handy is going to do three things for me:

1.) I am an old Unix guy, so a lot of the book covers things that I would do with awk, grep, sed, and maybe Python. The system permits those sorts of transformations to be done using graphical widgets which you daisy chain together, much like you pipe Unix command line results from one process to the next. So the book is a guide to taking the skills I already have and turning them into RapidMiner processes.

2.) I work with others who want analytical work done on certain datasets. A couple of them are technical enough that they will like getting a copy of this book. Then I can ship them finished processes from within RapidMiner, and they no longer have to put up with me being a choke point.

3.) There is advice about capacity planning and process execution time. This is normally hard-won wisdom that comes from overrunning processor, memory, or disk. The particulars on what may be problematic and how to tune around it can be the win/lose factor when first evaluating a product.
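
For anyone who has not lived at a Unix prompt, the daisy chaining in the first point can be sketched in plain Python. This is only an analogy and not RapidMiner's API: the function names `grep`, `sed`, and `cut` and the sample rows are made up for illustration. Each stage takes the output of the previous one, the same way a pipe or a chain of widgets would.

```python
import re
from typing import Iterable, Iterator


def grep(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Keep only the lines that match the regular expression."""
    rx = re.compile(pattern)
    return (line for line in lines if rx.search(line))


def sed(lines: Iterable[str], pattern: str, repl: str) -> Iterator[str]:
    """Replace every match of pattern with repl in each line."""
    rx = re.compile(pattern)
    return (rx.sub(repl, line) for line in lines)


def cut(lines: Iterable[str], field: int, sep: str = ",") -> Iterator[str]:
    """Pick one zero-based field from each delimited line."""
    return (line.split(sep)[field] for line in lines)


# Hypothetical sample rows, invented for this illustration.
rows = [
    "alice,ERROR,disk full",
    "bob,INFO,login ok",
    "carol,ERROR,out of memory",
]

# Roughly the shape of: grep ERROR | sed 's/ERROR/error/' | awk -F, '{print $1}'
result = list(cut(sed(grep(rows, r"ERROR"), r"ERROR", "error"), 0))
print(result)
```

Each function is a stand-in for one widget: a filter, a replace, a column select. Swapping or reordering stages changes the process without touching the stages themselves, which is the appeal of both the pipe and the widget chain.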

Who needs a copy of this book?

If you’re a network administrator and a business unit you support wants to use RapidMiner, but they’ve only got the business analytics portion of the skill set, this book is your guide to getting them up and running.

If you’re a business analytics person intimidated by the process of getting stuff into RapidMiner so you can work on it, this book has a lot of structure and good examples. You can progress quickly from “How do I do this?” to “How do I do this to solve my specific business problem?”

If this was of interest to you, following my blog for at least the next several weeks might be a good move. I typically write at least one follow-on piece beyond an initial review, and expanding my RapidMiner skills is one of my bigger objectives for 2014.
