Scrapy - Architecture overview [Data flow]
Source: Internet  Editor: 程序博客网  Date: 2024/05/17 22:43
# Data flow
The data flow in Scrapy is controlled by the execution engine, and goes like this:
- The Engine gets the initial Requests to crawl from the Spider.
- The Engine schedules the Requests in the Scheduler and asks for the next Requests to crawl.
- The Scheduler returns the next Requests to the Engine.
- The Engine sends the Requests to the Downloader, passing through the Downloader Middleware (request direction).
- Once the page finishes downloading, the Downloader generates a Response (with that page) and sends it to the Engine, passing through the Downloader Middleware (response direction).
- The Engine receives the Response from the Downloader and sends it to the Spider for processing, passing through the Spider Middleware (input direction).
- The Spider processes the Response and returns scraped items and new Requests (to follow) to the Engine, passing through the Spider Middleware (output direction).
- The Engine sends processed items to Item Pipelines, then sends processed Requests to the Scheduler and asks for possible next Requests to crawl.
- The process repeats (from the first step) until there are no more requests from the Scheduler.
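
The loop described above can be sketched as a toy model in plain Python. This is not Scrapy's implementation and uses none of its API: `FAKE_WEB`, `ToySpider`, `download`, the two pass-through middleware functions, and `run_engine` are all hypothetical names made up for illustration, and the duplicate check in `schedule` is an assumption added so the toy crawl terminates. It only shows the order in which the Engine hands data between the components.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> title and outgoing links.
FAKE_WEB = {
    "page1": {"title": "First", "links": ["page2", "page3"]},
    "page2": {"title": "Second", "links": ["page3"]},
    "page3": {"title": "Third", "links": []},
}


class ToySpider:
    """Stand-in for a Spider: supplies start requests and parses responses."""

    start_urls = ["page1"]

    def start_requests(self):
        # The initial Requests the Engine asks the Spider for.
        return list(self.start_urls)

    def parse(self, response):
        # Return one scraped item, then new Requests to follow.
        yield {"item": response["body"]["title"]}
        for link in response["body"]["links"]:
            yield {"request": link}


def downloader_middleware(obj):
    # Pass-through hook marking where Downloader Middleware would sit.
    return obj


def spider_middleware(obj):
    # Pass-through hook marking where Spider Middleware would sit.
    return obj


def download(url):
    # Stand-in for the Downloader: builds a Response for the page.
    return {"url": url, "body": FAKE_WEB[url]}


def run_engine(spider):
    scheduler = deque()   # stand-in for the Scheduler (FIFO queue)
    seen = set()          # assumption: skip URLs that were already scheduled
    pipeline_items = []   # stand-in for the Item Pipelines

    def schedule(url):
        if url not in seen:
            seen.add(url)
            scheduler.append(url)

    # Engine gets the initial Requests and schedules them.
    for url in spider.start_requests():
        schedule(url)

    # Repeat until the Scheduler has no more requests.
    while scheduler:
        request = downloader_middleware(scheduler.popleft())    # request direction
        response = downloader_middleware(download(request))     # response direction
        results = spider.parse(spider_middleware(response))     # input direction
        for result in results:
            result = spider_middleware(result)                  # output direction
            if "item" in result:
                pipeline_items.append(result["item"])           # to Item Pipelines
            else:
                schedule(result["request"])                     # back to Scheduler
    return pipeline_items
```

Calling `run_engine(ToySpider())` returns `["First", "Second", "Third"]`: each page is downloaded once, its title goes to the pipeline list, and its links go back into the queue until the queue is empty.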
A first look at Scrapy's architecture (Scrapy的架构初探)
https://zhuanlan.zhihu.com/p/21320942