Integrating Nutch 1.3 with Solr 3.4 in Eclipse: Run Output Log


Nutch 1.3 and Solr 3.4 were successfully integrated and deployed in Eclipse.


The program arguments used for the run in Eclipse were:

crawl urls -solr http://localhost:8080/l-nutch-solr -depth 3 -topN 10
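As a rough sketch of what these arguments imply, the log further down shows one inject step, then one generate/fetch/parse/update round per unit of -depth (each round capped at -topN URLs), followed by link inversion and Solr indexing. The function below only lists that order; `crawl_plan` and the stage labels are illustrative names chosen here, not part of the Nutch API.

```python
from typing import List


def crawl_plan(depth: int, top_n: int) -> List[str]:
    """Return the ordered stage labels for a crawl of `depth` rounds.

    The labels are hypothetical; they mirror the log prefixes
    (Injector, Generator, Fetcher, ParseSegment, CrawlDb update,
    LinkDb, SolrIndexer, SolrDeleteDuplicates).
    """
    plan = ["inject"]
    for round_no in range(1, depth + 1):
        plan += [
            f"generate (round {round_no}, at most {top_n} urls)",
            f"fetch (round {round_no})",
            f"parse (round {round_no})",
            f"updatedb (round {round_no})",
        ]
    plan += ["invertlinks", "solrindex", "solrdedup"]
    return plan


if __name__ == "__main__":
    # Values taken from the arguments above: -depth 3 -topN 10
    for step in crawl_plan(depth=3, top_n=10):
        print(step)
```

With -depth 3 and -topN 10 this gives at most 30 fetched URLs; the run logged here fetched 1, 10, and 10 URLs in its three rounds.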


Output log from the run:

crawl started in: crawl-20111107123624
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr/
topN = 10
Injector: starting at 2011-11-07 12:36:25
Injector: crawlDb: crawl-20111107123624/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-11-07 12:36:30, elapsed: 00:00:05
Generator: starting at 2011-11-07 12:36:30
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123633
Generator: finished at 2011-11-07 12:36:35, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:35
Fetcher: segment: crawl-20111107123624/segments/20111107123633
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://www.amazon.cn/
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04
ParseSegment: starting at 2011-11-07 12:36:39
ParseSegment: segment: crawl-20111107123624/segments/20111107123633
ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02
CrawlDb update: starting at 2011-11-07 12:36:42
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123633]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:36:44, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:36:44
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123646
Generator: finished at 2011-11-07 12:36:48, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:48
Fetcher: segment: crawl-20111107123624/segments/20111107123646
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E4%B8%89%E6%98%9FS5838-3G%E6%89%8B%E6%9C%BA/dp/B005KP4AFG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005OPL41A?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/b?ie=UTF8&node=79553071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/%E5%B0%8F%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=814224051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-IdeaPad-Y470N-%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91/dp/B005LT2VIE?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
fetching http://www.amazon.cn/ThinkPad-E40-0579-A22-14-0%E8%8B%B1%E5%AF%B8%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91-%E9%80%81%E5%8E%9F%E8%A3%85%E5%8C%85/dp/B005LFRMVY?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640639907
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640640909
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640641910
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640642911
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640643912
  0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640644496
  now           = 1320640644913
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640645914
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640646915
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640647916
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640648918
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640650546
  now           = 1320640649919
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640650919
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640651921
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640652923
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640653924
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640655698
  now           = 1320640654925
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640655926
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640656927
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640657928
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640658929
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640660855
  now           = 1320640659930
  0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
fetching http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55
ParseSegment: starting at 2011-11-07 12:37:43
ParseSegment: segment: crawl-20111107123624/segments/20111107123646
ParseSegment: finished at 2011-11-07 12:37:45, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:37:45
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123646]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:37:47, elapsed: 00:00:01
Generator: starting at 2011-11-07 12:37:47
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123749
Generator: finished at 2011-11-07 12:37:51, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:37:51
Fetcher: segment: crawl-20111107123624/segments/20111107123749
Fetcher: threads: 10
QueueFeeder finished: total 10 records + hit by time limit :0
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-THINK%E9%BB%91/dp/B005GZ0I5G?_encoding=UTF8&s=electronics
fetching http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-%E7%84%89%E7%B2%89/dp/B005GZ0IC4?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.amazon.cn/gp/yourstore/home
fetching http://www.amazon.cn/gp/css/homepage.html
fetching http://www.amazon.cn/%E6%89%8B%E8%A1%A8-%E6%97%B6%E9%92%9F/b?ie=UTF8&node=1953164051
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640683363
  now           = 1320640684037
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640685037
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640686039
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640687043
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640688044
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640689045
  0. http://www.amazon.cn/gp/registry/wishlist
  1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  3. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/registry/wishlist
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640689186
  now           = 1320640690047
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640691048
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640692049
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640693049
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640694051
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640695079
  now           = 1320640695053
  0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  2. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640696053
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640697054
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640698056
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640699057
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640700231
  now           = 1320640700058
  0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
  1. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640701058
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640702060
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640703060
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640704061
  0. http://www.amazon.cn/gp/help/customer/display.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.amazon.cn
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1320640705384
  now           = 1320640705063
  0. http://www.amazon.cn/gp/help/customer/display.html
fetching http://www.amazon.cn/gp/help/customer/display.html
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35
ParseSegment: starting at 2011-11-07 12:38:26
ParseSegment: segment: crawl-20111107123624/segments/20111107123749
Error parsing: http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif: failed(2,0): Can't retrieve Tika parser for mime-type image/gif
ParseSegment: finished at 2011-11-07 12:38:28, elapsed: 00:00:01
CrawlDb update: starting at 2011-11-07 12:38:28
CrawlDb update: db: crawl-20111107123624/crawldb
CrawlDb update: segments: [crawl-20111107123624/segments/20111107123749]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-11-07 12:38:30, elapsed: 00:00:01
LinkDb: starting at 2011-11-07 12:38:30
LinkDb: linkdb: crawl-20111107123624/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123633
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123646
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123749
LinkDb: finished at 2011-11-07 12:38:32, elapsed: 00:00:01
SolrIndexer: starting at 2011-11-07 12:38:32
SolrIndexer: finished at 2011-11-07 12:38:37, elapsed: 00:00:05
SolrDeleteDuplicates: starting at 2011-11-07 12:38:37
SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
SolrDeleteDuplicates: finished at 2011-11-07 12:38:39, elapsed: 00:00:01
crawl finished: crawl-20111107123624
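To see where the time in a run like this goes, the "finished at ..., elapsed: ..." entries can be summed per stage. The following is a minimal sketch, assuming the log has one entry per line; `elapsed_by_stage` and `SAMPLE` are names made up for this example, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55
FINISHED = re.compile(
    r"^(?P<stage>[A-Za-z][A-Za-z ]*): finished at .*, elapsed: "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})$"
)


def elapsed_by_stage(lines):
    """Sum the elapsed seconds reported by each stage's 'finished' entries."""
    totals = defaultdict(int)
    for line in lines:
        match = FINISHED.match(line.strip())
        if match:
            seconds = (
                int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
            )
            totals[match["stage"]] += seconds
    return dict(totals)


# A few entries copied from the log above.
SAMPLE = """\
Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04
ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02
Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55
Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35
CrawlDb update: finished at 2011-11-07 12:38:30, elapsed: 00:00:01
""".splitlines()

if __name__ == "__main__":
    for stage, seconds in elapsed_by_stage(SAMPLE).items():
        print(f"{stage}: {seconds}s")
```

In this run the fetch stage dominates: every www.amazon.cn URL waits in a single queue with maxThreads = 1 and crawlDelay = 5000 ms, so the ten threads spend most of their time spin-waiting.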

Crawl data model

1. CrawlDB: stores information about every URL, including its fetch schedule, fetch status, page fingerprint (signature), and metadata.

2. LinkDB: stores the inbound anchor links and anchor text for each URL.

3. Segment: holds the raw page content, the parsed pages, metadata, outlinks, and the text used for indexing.
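The three structures above can be pictured with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they are not Nutch's own Java classes or its on-disk format. The `build_link_db` helper shows the idea behind the LinkDb step in the log: outlinks recorded in segments are inverted into per-URL inlinks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CrawlDbEntry:
    """One URL as tracked by the CrawlDB (illustrative fields only)."""
    url: str
    status: str = "unfetched"
    fetch_interval_s: int = 0
    signature: str = ""  # page fingerprint
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Inlink:
    """One inbound link as stored in the LinkDB."""
    from_url: str
    anchor_text: str


@dataclass
class SegmentRecord:
    """What a segment holds for one fetched page."""
    url: str
    raw_content: bytes = b""
    parsed_text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    # Each outlink is a (target URL, anchor text) pair.
    outlinks: List[Tuple[str, str]] = field(default_factory=list)


def build_link_db(records: List[SegmentRecord]) -> Dict[str, List[Inlink]]:
    """Invert segment outlinks into a target URL -> inlinks mapping."""
    link_db: Dict[str, List[Inlink]] = {}
    for record in records:
        for target, anchor in record.outlinks:
            link_db.setdefault(target, []).append(Inlink(record.url, anchor))
    return link_db


if __name__ == "__main__":
    page = SegmentRecord(
        url="http://a.example/",
        outlinks=[("http://b.example/", "B page")],
    )
    print(build_link_db([page]))
```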




Reference: http://blog.csdn.net/amuseme_lu/article/details/5993916