Let me say it up front: “An Introduction to Statistical Learning” (ISL) by James, Witten, Hastie and Tibshirani is a key book in the Data Science literature. I would summarize it as a book written by statisticians for non-statisticians. Indeed, while “The Elements of Statistical Learning” is heavy on theory and equations, ISL is its practical counterpart. The book is very clear and contains only the theory you need to understand the data mining algorithms it covers. It is thus an invaluable resource for Data Scientists who don’t need all the theorems and proofs related to a given algorithm, but still need to understand how it works.
Several examples are provided to illustrate each algorithm. Each chapter contains a section with R labs, showing the code needed to move from reading the book to doing data science. The book puts a strong emphasis on linear regression and related non-linear approaches (more than half of the book). This leaves little room for other approaches such as decision trees and SVM, which are nevertheless covered. The final chapter quickly covers PCA and clustering.
Although the book targets a wider audience than statisticians, you shouldn’t be afraid of equations (by the way, if you are looking for an excellent book covering data science algorithms with almost no equations, have a look at “Data Science for Business” by Provost and Fawcett). With such an excellent book, one naturally becomes more demanding: I was hoping for more coverage of validity indices for clustering, Support Vector Regression, and a final chapter about trends and challenges. In conclusion, ISL is the definitive resource for Data Scientists who want the right level of statistical background for their work.
This is a guest post from William Blears, Founder of Perceptive Digital.
There’s been a lot of talk lately about how data is driving online marketing forward. What you don’t often hear is how this same data is empowering customers to make wiser purchasing decisions. I thought it’d be interesting to highlight five ways in which empowered consumers can benefit from an abundance of available data.
Automate This is a journey into the world of anything that can be automated, from stock picking to medical diagnosis. The author, Christopher Steiner, excels at telling stories and sharing interesting anecdotes with the reader. Although focused on the trading world, the book also explores topics such as automated music creation, geopolitical analysis and poker playing.
Automate This is about the…
IoT and Analytics, April 13th, Lausanne. Event organized by the Swiss Association for Analytics. More information and free subscription at http://meetu.ps/e/Bl61H/mgXJj/d…
This is a guest post by Jeremy Sutter.
Sometimes, technically minded people feel they are not good candidates for leadership positions. Sometimes, they feel like leadership requires more people skills than they have. However, in the era of big companies and big data, that may be changing. It may be the person with the best information and the best…
I recently received a copy of The Master Algorithm, written by Pedro Domingos, to review. While I was reading the back cover, my first impression was skepticism. Indeed, Domingos’s main idea in this book is the (future) existence of a so-called “Master Algorithm” that will outperform any other algorithm on any kind of task.
This is a guest post by Khushbu Shah from DeZyre.com.
The Internet today, as a collective agency, is creating 2.5 quintillion bytes of data on a daily basis, and nearly 90% of all of our global data has emerged in the past 2 years.