Data mining is a field closely related to information extraction and search engines. Web Dragons: Inside the Myths of Search Engine Technology explains everything you want to know about search engines (the so-called “web dragons”) and how they work. Before reading the book, you may wonder why Witten and his co-authors call search engines “web dragons”. After reading it, I’m sure you will understand why: search engines are the guardians of the world’s information, and their power is formidable.
The approach is descriptive and historical rather than technical. The book is thus intended for a wide audience: people working with data, librarians, webmasters, but also search engine users who want to know more about the tool they use every day. The first author, Ian Witten, is involved in the data mining field (see for example the famous book Data Mining (Witten and Frank, 2005)), so the book makes many allusions to data mining applications. It is divided as follows:
- Setting the scene
- Literature and the web
- Meet the web
- How to search
- The web wars
- Who controls information?
- The dragons evolve
The first two chapters cover the history of search engines (starting from the very beginning: writing, etc.). You can easily skip these chapters (which may be interesting to librarians, for example) and start with the third one. There, you learn everything about the web, protocols, programming languages, etc. The strength of the book is that it covers all these topics in a readable manner. You never face code or pseudo-code, only clear and interesting descriptions. The next chapter covers the basics of search engine ranking (e.g. PageRank) in detail, and much more. The main search engines are also introduced and explained. The following chapter (The web wars) explains the different ways of abusing such search engines (link boosting, term boosting, link farms, spam, etc.). The chapter is very interesting and instructive.
The next chapter (Who controls information?) points out the power of web dragons. They control the world’s information, and this raises privacy and copyright issues. Finally, the last chapter covers the evolution of search engines. According to the authors, we are at the very beginning of information search. They focus on web communities, which may be the next step for search engines. In conclusion, I recommend this book to anyone who is interested in how search engines work and especially in how important they are for our society.