Let me skip the suspense: “An Introduction to Statistical Learning” (ISL) by James, Witten, Hastie and Tibshirani is a key book in the Data Science literature. I would summarize it as a book written by statisticians for non-statisticians. Indeed, while “The Elements of Statistical Learning” was heavy on theory and equations, ISL is its practical counterpart. The book is very clear and contains only the theory you need to understand the data mining algorithms it covers. It is thus an invaluable resource for Data Scientists who don’t need all the theorems and proofs related to a given algorithm, but still need to understand how it works.
Several examples are provided to illustrate each algorithm. Each chapter contains a section with R labs, showing the code needed to move from reading the book to doing data science. The book places a strong emphasis on linear regression and related non-linear approaches (more than half of the book). This leaves little room for other approaches such as decision trees and SVM, which are nevertheless covered. The final chapter briefly covers PCA and clustering.
Although the book is aimed at a broader audience than statisticians, you shouldn’t be afraid of equations (by the way, if you are looking for an excellent book covering data science algorithms with almost no equations, have a look at “Data Science for Business” by Provost and Fawcett). With such an excellent book, expectations are naturally higher, and I would have liked more coverage of validity indices for clustering and of Support Vector Regression, as well as a final chapter on trends and challenges. In conclusion, ISL is the definitive resource for Data Scientists who want the right level of statistical background for their work.