Data Mining Book Review: Stats with Cats

A few months ago, I discovered the blog Stats with Cats, written by Charles Kufs. He recently (at the beginning of 2011) published a book about statistics with the same name as his blog. The book is a journey through the world of statistics and data analysis. Its focus is not on the technical aspects of statistics but rather on how they are used in everyday projects. The book reads easily from beginning to end and is very didactic. It’s like a story that teaches you how to deal with data analysis.

The first part of the book introduces statistics by explaining data, samples, variables, file formats and many other topics. The second part is about the planning and management of data analysis projects. It’s particularly interesting for consultants in data analysis and data mining. In the third part, the author explains variable selection, sampling and variance, among other topics. The next part deals with data quality and descriptive statistics. In the fifth part, you will learn everything you need to know about models. The last part will help you apply your knowledge in your job.

The book contains plenty of advice for consultants as well as data analysis project managers. It also contains a lot of useful tables: a checklist for variables, variance control techniques, the top ten flaws in data analyses, etc. Screenshots of Excel, SPSS and Statistica are often given. Other tools such as SAS are also mentioned. I’m just surprised that MATLAB is mentioned so rarely (once, I think), since it is a rather important and powerful tool for data analysis.

To conclude, the book is rather… excellent! If you’re a beginner, this book will provide you with the tips and tricks you really need to know about data analysis. If you’re an experienced data analyst, it will still give you advice on consulting and highlight issues in communicating statistics. With his book, Charles Kufs has succeeded in several ways. First, he shows that stats are not boring! More than that, the book takes you on a pleasant journey through the world of data analysis. Second, he includes an amazing number of relevant quotes. And third, he mixes statistics with cats!