Google Caffeine: Large-Scale Real-Time Incremental Indexing
Our new search index: Caffeine
6/08/2010 05:00:00 PM
(Cross-posted on the Webmaster Central Blog)
Some background for those of you who don't build search engines for a living like us: when you search Google, you're not searching the live web. Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here's a good explanation of how it all works.)
So why did we build a new search indexing system? Content on the web is blossoming. It's growing not just in size and numbers: with the advent of video, images, news and real-time updates, the average webpage is also richer and more complex. In addition, people's expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.
To keep up with the evolution of the web and to meet rising user expectations, we've built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine:
Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and when we made it available to you.
With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
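The difference between refreshing a whole layer and updating continuously can be illustrated with a toy inverted index. This is only a minimal sketch of the general idea of incremental indexing, not a description of how Caffeine is implemented; the names `IncrementalIndex`, `upsert`, `search` and `rebuild` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that can be updated one page at a time (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it
        self.page_terms = {}              # URL -> terms currently indexed for it

    def upsert(self, url, text):
        """Add a new page or refresh an existing one without touching other pages."""
        new_terms = set(text.lower().split())
        old_terms = self.page_terms.get(url, set())
        for term in old_terms - new_terms:  # terms that disappeared from the page
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]
        for term in new_terms - old_terms:  # terms that are new on the page
            self.postings[term].add(url)
        self.page_terms[url] = new_terms

    def search(self, term):
        """Return the URLs whose latest indexed text contains the term."""
        return sorted(self.postings.get(term.lower(), set()))


def rebuild(pages):
    """Batch alternative: re-analyze every known page to produce a fresh index."""
    index = IncrementalIndex()
    for url, text in pages.items():
        index.upsert(url, text)
    return index


if __name__ == "__main__":
    index = IncrementalIndex()
    index.upsert("a.example/1", "caffeine search index")
    index.upsert("a.example/2", "old layered index")
    # The first page changes; only its own postings are touched.
    index.upsert("a.example/1", "caffeine continuous updates")
    print(index.search("index"))
    print(index.search("continuous"))
```

In this sketch the cost of `upsert` depends only on the page being added or changed, while `rebuild` has to walk every page again before any change becomes searchable, which is the delay the old layered approach incurred.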
Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
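As a quick sanity check on the iPod comparison, the figures in the paragraph above can be divided out. The sketch below uses only the numbers quoted in the post (treating "nearly 100 million gigabytes" as exactly 100 million and "more than 40 miles" as exactly 40, both of which are simplifying assumptions); the function names are hypothetical.

```python
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes", rounded up
IPOD_COUNT = 625_000            # number of iPods quoted in the post
STACK_MILES = 40                # "more than 40 miles", taken as a lower bound
METERS_PER_MILE = 1609.344      # international mile


def gb_per_ipod():
    """Storage each iPod would need to hold for the comparison to work out."""
    return TOTAL_STORAGE_GB / IPOD_COUNT


def cm_per_ipod():
    """Length of each iPod implied by the end-to-end stack, in centimeters."""
    return STACK_MILES * METERS_PER_MILE * 100 / IPOD_COUNT


if __name__ == "__main__":
    print(gb_per_ipod())           # 160.0
    print(round(cm_per_ipod(), 1)) # 10.3
```

The result, 160 GB per device at roughly 10.3 cm each, is consistent with a 160 GB iPod classic, so the two comparisons in the post agree with each other.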
We've built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for us to build an even faster and more comprehensive search engine that scales with the growth of information online and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.
http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html