Data Science Book Review: The Master Algorithm
I recently received a copy of The Master Algorithm, written by Pedro Domingos, to review. As I read the back cover, my first reaction was skepticism. Indeed, Domingos’ main idea in this book is the (future) existence of a so-called “Master Algorithm” that will outperform all other algorithms on any kind of task.
I can see you smiling, and so was I when I started reading the book. Having reached the end, I have a different, let’s say more enlightened, view of the concept of a Master Algorithm. Although Domingos offers several arguments for why such an algorithm should exist, I’m still not convinced, and I will tell you why at the end of this post.
Nevertheless, The Master Algorithm is a must-read for several reasons. First, the book is a journey through the field of data science algorithms, grouping data scientists into five “tribes”:
- Symbolists (e.g. decision trees)
- Connectionists (e.g. neural networks)
- Evolutionaries (e.g. genetic algorithms)
- Bayesians (e.g. naive Bayes)
- Analogizers (e.g. SVM)
Each chapter describes the principles and history behind one of these tribes. This gives the reader a broad and comprehensive view of data mining approaches and the differences between them. The chapter about combining all the methods is full of metaphors and very well written. The last chapter offers a vision of the future.
The second reason Domingos’ book is a must-read: its idea of a single algorithm to tune (one that beats all others) is interesting and well described. For Domingos, we will need to combine different learning paradigms to reach the Master Algorithm. According to him, it is up to the reader to discover it, the book being only a starting point. This is a very nice way of motivating non-experts to read the book. Finally, the book is provocative (at least if you don’t believe in a single algorithm that solves all problems).
Let me now explain why I don’t believe that a Master Algorithm will soon replace the variety of existing ones. First, the variety of algorithms exists because there are many ways to solve a given problem. As long as there is more than one person using Predictive Analytics, there will be more than one algorithm in use. People use different algorithms because they think differently and solve problems more effectively with the algorithms they know best.
Second, the many different applications of Predictive Analytics are far too heterogeneous to be solved by a single algorithm. Want to understand your model? Use linear regression. Need stability? Just try SVM. Limited by a low-memory device? Rules obtained from decision trees may do the trick. In conclusion, the variety of algorithms in use is needed because of the different skills of the people using them, the various applications to solve, and deployment particularities, for example.
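To make this point a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not something taken from the book: the data is made up, and the helper names `fit_line` and `fit_stump` are hypothetical. It fits the same toy data in two ways. A least-squares line gives a slope you can read and interpret, while a depth-one decision tree (a "stump") gives a single if/else rule that is trivial to deploy on a constrained device.

```python
# Toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the threshold with the lowest squared error.

    Assumes the x values are distinct, so both sides of every split are non-empty.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        # Candidate threshold halfway between two neighbouring x values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


slope, intercept = fit_line(xs, ys)
threshold, low_value, high_value = fit_stump(xs, ys)

# The linear model is easy to interpret: one coefficient per input.
print(f"linear model: y = {slope:.3f} * x + {intercept:.3f}")
# The stump is easy to deploy: a single comparison and two constants.
print(f"rule: if x <= {threshold} then {low_value:.3f} else {high_value:.3f}")
```

Neither model is "better" here. The line tells you how much y changes per unit of x, while the rule needs almost no memory or arithmetic to evaluate. Which one you pick depends on what you need from it.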
Whether you agree with Domingos or not, this book is a must-have for learning machine learning without equations. It will help you get the big picture of the various learning paradigms. Finally, the provocative idea is not only intriguing, but also very well argued.