Sequential crawler (sequential)

from link_crawler import link_crawler
from mongo_cache import MongoCache
from alexa_cb import AlexaCallback


def main():
    # Wire the Alexa seed-list callback and the MongoDB page cache into the crawler.
    scrape_callback = AlexaCallback()
    cache = MongoCache()
    link_crawler(scrape_callback.seed_url, scrape_callback=scrape_callback, cache=cache)


if __name__ == '__main__':
    main()
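The alexa_cb module itself is not shown in the post. A minimal sketch of what AlexaCallback might look like, assuming it downloads the zipped Alexa top-1m CSV from its seed_url and returns the first max_urls sites as links for the crawler to visit (the seed URL, parameter names, and CSV layout below are assumptions, not taken from the post):

import csv
import io
from zipfile import ZipFile


class AlexaCallback:
    """Hypothetical sketch of the alexa_cb callback; adjust to your own module."""

    def __init__(self, max_urls=1000):
        # The zipped Alexa top-1m CSV serves as the crawl's seed download.
        self.seed_url = 'http://s3.amazonaws.com/alexa-static/top-1m.csv.zip'
        self.max_urls = max_urls

    def __call__(self, url, html):
        # Only the seed zip is parsed; other pages produce no extra links.
        if url != self.seed_url:
            return []
        urls = []
        with ZipFile(io.BytesIO(html)) as zf:  # html is the raw zip bytes
            csv_filename = zf.namelist()[0]
            with zf.open(csv_filename) as f:
                # Each CSV row is "rank,domain"; keep only the domain.
                for _, website in csv.reader(io.TextIOWrapper(f)):
                    urls.append('http://' + website)
                    if len(urls) == self.max_urls:
                        break
        return urls

Here link_crawler is expected to pass each downloaded page to scrape_callback and append whatever URLs it returns to the crawl queue, which is how the Alexa list seeds the rest of the sequential crawl.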

Note: if you see pymongo.errors.OperationFailure: exception: Index with name: timestamp_1 already exists with different options, you need to comment out the following line in mongo_cache:

self.db.webpage.create_index('timestamp', expireAfterSeconds=expires.total_seconds())
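Alternatively, instead of commenting the line out (which disables automatic expiry of cached pages), you can drop the conflicting TTL index once so that create_index can rebuild it with the new options on the next run. A minimal one-off sketch, assuming MongoCache talks to a local MongoDB instance and uses a cache database with a webpage collection (change these names if your mongo_cache is configured differently):

from pymongo import MongoClient

# One-off maintenance script: run it once, then start the crawler again.
client = MongoClient('localhost', 27017)

# 'cache' / 'webpage' are the database and collection names assumed to be
# used by MongoCache; adjust them to match your mongo_cache module.
db = client.cache

# Drop the old TTL index so create_index('timestamp', ...) can recreate it
# with the new expireAfterSeconds value.
db.webpage.drop_index('timestamp_1')

Dropping and recreating the index keeps cache expiry working, so it is usually the cleaner fix than removing the create_index call.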
