What Google can’t mine

January 18, 2007 by
Filed under: deep web, hidden web, information, search engine 

While reading a book about search and information, I found the chapter on the hidden web particularly interesting. Basically, the hidden web is the part of the Internet that is accessible to people but not to bots (such as Googlebot). In other words, these pages exist, but search engines do not reference them because they are too difficult (or sometimes impossible?) to index. Examples include dynamically generated web pages, most databases behind websites, and pages requiring password access. More details can be found on Wikipedia under the term “deep web”.
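To make the idea concrete, here is a minimal sketch of why a bot that only follows links misses database-backed pages. Everything in it is an assumption for illustration: the three-page `SITE` dictionary, its URLs, and the `crawl` function are hypothetical, and real crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser

# Hypothetical three-page site held in memory; none of these URLs are real.
# The search result page only exists once a person types a query into the form.
SITE = {
    "/": '<a href="/about">About</a><form action="/search"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
    "/search?q=iceberg": "<p>Result generated from a database query</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, as a simple link-following bot would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start):
    """Follow plain links only; forms are never filled in or submitted."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop()
        if url in seen or url not in SITE:
            continue
        seen.add(url)
        parser = LinkExtractor()
        parser.feed(SITE[url])
        queue.extend(parser.links)
    return seen


indexed = crawl("/")          # pages the toy bot reaches
hidden = set(SITE) - indexed  # pages that exist but are never reached
```

In this toy setup the bot reaches the home and about pages, while the search result stays in `hidden` because no link points to it: only a form submission does.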

Michael Bergman estimates the size of the hidden web at 400-550 times that of the visible web. I think the iceberg metaphor fits this situation well. The question now is: how will Google and other search engines access this information, or at least part of it?

Comments

One Comment on What Google can’t mine

  1. Will Dwinnell on Thu, 8th Feb 2007 2:13 pm

    Another issue is the near-total reliance of some people on Google to search the World Wide Web. While I have found Google to be an effective search engine, I have found it useful to utilize a number of other search engines. Using more than one engine provides diversity of response and helps avoid search dead-ends (“Well, I can’t find it using Google… It must not be on the Web.”)

    I suggest these alternatives, but there are others:

    AllTheWeb
    AltaVista
    Clusty
    Devil Finder
    hakia
    Ixquick
