Search Engine Index Sizes: Google vs Yahoo vs MSN

When we do a search on Google or any other web search engine, the total number of results is generally listed in the top right corner. This count can help us estimate the size of the search engine's index.

Just perform a simple search for common words (words that are probably found in every text document like "the", "a", "is", "of", "or") and you can roughly compute the size of the entire search index.
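As a rough sketch of that computation: every page counted for a common word is in the index, so the largest hit count among the common words gives a lower bound on the index size. The function name and the sample counts below are hypothetical placeholders, not figures measured from any search engine.

```python
def estimate_index_size(hit_counts):
    """Return the largest reported hit count as a rough lower bound on index size.

    hit_counts maps a common word to the result count the engine reported for it.
    """
    if not hit_counts:
        raise ValueError("need at least one hit count")
    return max(hit_counts.values())


# Hypothetical result counts typed in by hand after searching each word.
sample_counts = {"the": 1000, "a": 1200, "is": 900, "of": 950, "or": 700}

lower_bound = estimate_index_size(sample_counts)
print(lower_bound)
```

The reported counts are themselves estimates made by the search engine, so the result is only as reliable as those numbers.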

We did the above experiment with the three most popular search engines - Google, Yahoo and Microsoft-owned MSN. Here are some interesting stats about their index sizes:

» MSN looks like a newborn baby. It indexes just 10% of the content that Google does.

» Google indexes the largest number of web pages for any of the common words. Yahoo comes second, but it is not a close second.



» For overlapping queries ["the" OR "is" OR "of" OR "in" OR "are" OR "a"], Google finds 25 billion documents while Yahoo shows just 25 million results. See screenshots below.
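To put the overlapping-query numbers in perspective, the gap can be worked out directly. This is a minimal sketch using only the two counts quoted above; the helper name is made up for illustration.

```python
def index_ratio(larger, smaller):
    """Return how many times bigger one result count is than another."""
    return larger / smaller


google_hits = 25_000_000_000  # 25 billion, as reported for the OR query
yahoo_hits = 25_000_000       # 25 million, as reported for the same query

ratio = index_ratio(google_hits, yahoo_hits)
print(f"Google reports {ratio:,.0f}x as many results as Yahoo")
```

A difference of three orders of magnitude on a single query may also reflect how each engine handles OR queries and very common words, not just raw index size.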





Related: Which is the most Honest Search Engine?

Limitation: The above results are only for text documents like PDF, Word, XLS or HTML files. No images or audio-video content is included.

Find this article at: http://labnol.blogspot.com/2006/07/search-engine-index-sizes-google-vs.html

web: http://www.labnol.org/ email: amit@labnol.org

Reader Comments

for an overlapping query, the screenshots reveal the fact that,
Yahoo returned - 25 million
google returned - 25 billion

However, your conclusion states that both have roughly the same index size. Am I missing something here?

Thanks - that was a mistake at my end. Corrected.

Yahoo is better than Google; I am talking about the index. Yahoo's index size is 8 TG whereas Google's is only 4.5. One more thing I would like to share is that my site contains only 400 pages but Google has indexed 1247!!

Hi Amit,
By searching for common words like 'the', 'and', etc., you are getting a rough index size only for documents containing those words, which will most probably be English documents. Hence you are excluding other languages.
A better way to get the index size is explained by me here: http://blog.semanticvoid.com/2005/09/12/index-count-of-google-search/

Of course this simple experiment takes into consideration only pages in the English language, which in all probability make up most of the Net.

Is it possible that each search engine has a different strategy for these common words? For example, search engine X might choose to index all the common words, search engine Y might consider common words only in conjunction with other non-noise words, whereas search engine Z might treat these common words as normal words in their own right.

Amit,
Have you considered counting the number of supplemental results from Google?
Like the above posters, I don't consider the number of indexed pages to be some sort of plus point or any indication of efficiency.
Just a few months ago there was a press release saying that Yahoo had surpassed Google in number of indexed pages.
This makes me wonder whether searching for very common words like 'a' and 'the' will reveal the actual size of a search engine's index.
Yahoo and MSN may be smart enough not to index these words.

Also, whenever Google tries to play with its indexing algorithms, it always screws up the search results and some companies just vanish (on Google). Recently Google has earned a bad reputation for mismanaging its index servers.
Yahoo is probably going to occupy a respectable second position in search (right now it is in a distant second position).

thanks
