Scrapy Signals
Source: 程序博客网 (from the internet) — 2024/06/16 17:55
Signals provide a mechanism for invoking callback functions when events occur — for example, when a spider opens, or when an Item is scraped. You can connect them to callbacks with the crawler.signals.connect() method. Scrapy has 11 signals in total, and perhaps the easiest way to understand them is to observe them in action. For that purpose we create a spider project whose sole aim is to log every method call. The spider itself is quite simple: it yields two Items and then raises an exception, and its Item Pipeline raises a DropItem exception while processing the second Item:
def parse(self, response):
    for i in range(2):
        item = HooksasyncItem()
        item['name'] = "Hello %d" % i
        yield item
    raise Exception("dead")
The complete spider project can be found here.
Using this project, we can better understand exactly when each signal is sent. Take a look at the following run, and pay attention to the comments between the log lines:
$ scrapy crawl test
... many lines ...
# First we get those two signals...
INFO: Extension, signals.spider_opened fired
INFO: Extension, signals.engine_started fired
# Then for each URL we get a request_scheduled signal
INFO: Extension, signals.request_scheduled fired
...
# when download completes we get response_downloaded
INFO: Extension, signals.response_downloaded fired
INFO: DownloaderMiddleware process_response called for example.com
# Work between response_downloaded and response_received
INFO: Extension, signals.response_received fired
INFO: SpiderMiddleware process_spider_input called for example.com
# here our parse() method gets called... and then SpiderMiddleware used
INFO: SpiderMiddleware process_spider_output called for example.com
# For every Item that goes through pipelines successfully...
INFO: Extension, signals.item_scraped fired
# For every Item that gets dropped using the DropItem exception...
INFO: Extension, signals.item_dropped fired
# If your spider throws something else...
INFO: Extension, signals.spider_error fired
# ... the above process repeats for each URL
# ... till we run out of them. then...
INFO: Extension, signals.spider_idle fired
# by hooking spider_idle you can schedule further Requests. If you don't
# the spider closes.
INFO: Closing spider (finished)
INFO: Extension, signals.spider_closed fired
# ... stats get printed
# and finally engine gets stopped.
INFO: Extension, signals.engine_stopped fired
Having only 11 signals may feel limiting, but every one of Scrapy's default middlewares is implemented with them, so 11 is quite enough. One detail to note: with the exception of spider_idle, spider_error, request_scheduled, response_received, and response_downloaded, your handlers for the other signals may return a Deferred object instead of an actual value.