Why is it that Google can return near-instantaneous search results from the entirety of the internet, but takes forever to find that recipe your Aunt Mildred emailed you two months ago?
The key difference is that the storage backend for the web index can be built in batch and is small enough to serve out of RAM, while the index for email must be updated in real time and is too large to fit in memory. This means that an email search requires loading data off disk and applying incremental index updates from a log. Also, an email search query is handled by a single machine, while a web search is spread across a cluster.
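To make the contrast concrete, here is a minimal sketch of the two indexing styles. It is a toy illustration, not how Google or Gmail actually implement search: `build_batch_index` and `LoggedIndex` are hypothetical names, the "on-disk" base index is just a dictionary, and the documents are made up.

```python
from collections import defaultdict


def build_batch_index(docs):
    """Build a complete inverted index in one pass (the web-search case)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


class LoggedIndex:
    """A base index plus a log of updates not yet merged in (the email case)."""

    def __init__(self, base_docs):
        # Stands in for the large index that would live on disk.
        self.base = build_batch_index(base_docs)
        # (doc_id, text) pairs that arrived after the base index was built.
        self.log = []

    def add(self, doc_id, text):
        # New mail is appended to the log; the base index is not rebuilt.
        self.log.append((doc_id, text))

    def search(self, word):
        word = word.lower()
        hits = set(self.base.get(word, ()))
        # Every query pays the cost of replaying the pending log entries.
        for doc_id, text in self.log:
            if word in text.lower().split():
                hits.add(doc_id)
        return hits


web = build_batch_index({1: "banana bread recipe", 2: "search engine design"})
mail = LoggedIndex({10: "meeting notes"})
mail.add(11, "Aunt Mildred banana bread recipe")
```

A lookup in `web` is a single dictionary access, while `mail.search("recipe")` has to consult the base index and then scan the log, which is the extra per-query work the paragraph above describes.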
- The total content of the web is actually smaller than the combined contents of everyone's Gmail. This means it could take more servers to hold all the indexes for mail search than for web search.
- When you search the web, for the most part you get the same results for your query as anyone else would. This means caching works well for web search. Most search engines have a small "hot index" of the most popular content that can handle the majority of queries. It is replicated out to lots of local datacenters, giving low average response time even if the worst case is slow.
- Gmail search results are sorted by time and must be exact matches, whereas web search results are sorted by relevance, so approximations can be made to cut corners.
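The caching point can be sketched as a small fast tier sitting in front of a slower full index. This is a simplification under stated assumptions: the hot index is modeled as a plain query-to-results map, and `TieredSearch`, `slow_full_search`, and the sample data are all hypothetical.

```python
class TieredSearch:
    """Serve popular queries from a small hot tier, the rest from the full index."""

    def __init__(self, hot_results, full_lookup):
        self.hot_results = hot_results  # small, cheap to replicate everywhere
        self.full_lookup = full_lookup  # slow path: function from query to results
        self.hot_hits = 0
        self.full_hits = 0

    def search(self, query):
        if query in self.hot_results:
            self.hot_hits += 1
            return self.hot_results[query]
        self.full_hits += 1
        return self.full_lookup(query)


def slow_full_search(query):
    # Placeholder for the expensive lookup against the complete index.
    return [f"result for {query}"]


engine = TieredSearch({"weather": ["weather.example"]}, slow_full_search)
for q in ["weather", "weather", "obscure query", "weather"]:
    engine.search(q)
```

Because everyone asking for `"weather"` gets the same answer, three of the four queries here never touch the slow path. A mailbox has no equivalent shortcut, since each user's results are private to them.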
Google publicly states that their indexing system uses "nearly 100 million gigabytes of storage".