What Google can’t mine
While reading a book about search and information, I found a chapter about the hidden web particularly interesting. Basically, the hidden web is the part of the Internet that is accessible to people but not to bots (such as Google's crawlers). In other words, these pages exist, but they do not show up in search engines simply because they are too difficult (or sometimes impossible?) to index. Examples include dynamically generated webpages, most databases behind websites, and pages that require a password. More details can be found on Wikipedia under the term "deep web".
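To make the idea a bit more concrete, here is a minimal sketch of a crawler that only follows hyperlinks, run against a toy site. Everything in it is an assumption made up for illustration: the `LINKS` table, the page paths, and the `crawl` function are hypothetical, and real search engine crawlers are far more sophisticated. It only shows why a page that is reachable through a search form or a login, but not through any link, never gets discovered.

```python
from collections import deque

# Hypothetical toy site: each page maps to the hyperlinks it contains.
LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],                 # only a search form here, no links to results
    "/results?q=iceberg": ["/"],   # generated on demand from a database
    "/account": [],                # behind a password, nothing links to it
}

def crawl(start):
    """Breadth-first crawl that discovers pages only by following hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

visible = crawl("/")
hidden = set(LINKS) - visible

print(sorted(visible))  # ['/', '/about', '/search']
print(sorted(hidden))   # ['/account', '/results?q=iceberg']
```

A person can type a query into the form on `/search` and land on the results page, but the link-following bot in this sketch has no way to get there.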
Michael Bergman estimates that the hidden web is 400 to 550 times the size of the visible web. I think the iceberg metaphor fits this situation well: what search engines show us is only the tip. The question now is how Google and other search engines will manage to access this information (or at least part of it).