a-web-crawler-with-asyncio-coroutines
Source: Internet  Published: 施乐2011网络设置  Editor: 程序博客网  Time: 2024/06/14 20:00
The version of the crawler written with the callback pattern:
import socket
from selectors import DefaultSelector, EVENT_WRITE, EVENT_READ


def loop():
    # Run until read_response() sets the global stopped flag.
    while not stopped:
        events = selector.select()
        for event_key, event_mask in events:
            # The callback was stored as the data argument of register().
            callback = event_key.data
            callback(event_key, event_mask)


class Fetcher:
    def __init__(self, url):
        self.response = b''
        self.url = url
        self.sock = None

    # Connect a socket.
    def fetch(self):
        self.sock = socket.socket()
        self.sock.setblocking(False)
        try:
            self.sock.connect(('xkcd.com', 80))
        except BlockingIOError:
            # Expected: a non-blocking connect is still in progress.
            pass
        # Register the next callback: once the socket is writable,
        # the connection is established and the request can be sent.
        selector.register(self.sock.fileno(), EVENT_WRITE, self.connected)

    def connected(self, key, mask):
        print('connected!')
        selector.unregister(key.fd)
        request = 'GET {} HTTP/1.0\r\nHost: xkcd.com\r\n\r\n'.format(self.url)
        self.sock.send(request.encode('ascii'))
        # Register the next callback: once the socket is readable,
        # the response has started to arrive.
        selector.register(key.fd, EVENT_READ, self.read_response)

    def read_response(self, key, mask):
        global stopped
        chunk = self.sock.recv(4096)
        if chunk:
            self.response += chunk
        else:
            # An empty chunk means the server closed the connection.
            selector.unregister(key.fd)
            # Parse the response data into a set of URLs.
            # parse_links() is not defined in this listing.
            links = self.parse_links()
            # Python set logic: fetch only the links not seen before.
            for link in links.difference(seen_urls):
                urls_todo.add(link)
                Fetcher(link).fetch()
            seen_urls.update(links)
            urls_todo.remove(self.url)
            if not urls_todo:
                stopped = True


selector = DefaultSelector()
stopped = False

# Global sets of the URLs we have yet to fetch and the URLs we have seen.
urls_todo = set(['/'])
seen_urls = set(['/'])

# Start the crawl at the root URL and run the event loop.
Fetcher('/').fetch()
loop()
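But callbacks are hard to debug because of stack ripping: each callback is invoked by the event loop, so when one fails, the traceback shows only the loop and that callback, not the earlier step that registered it.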
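Stack ripping can be reproduced without any network I/O. The following is a minimal sketch, not part of the crawler above: the `pending` list is a hypothetical stand-in for the selector, and `connect`, `read_response` and `run_loop` are illustrative names. It assumes only the standard `traceback` module.

```python
import traceback

# Hypothetical stand-in for the selector: callbacks waiting to be run.
pending = []


def connect():
    # Step 1 schedules step 2 and returns immediately, so its
    # stack frame is gone by the time step 2 runs.
    pending.append(read_response)


def read_response():
    # Step 2 fails, as a callback might while parsing a response.
    raise RuntimeError('parse failed')


def run_loop():
    # Run every pending callback and record the names of the
    # functions that appear in the traceback of a failure.
    frame_names = []
    while pending:
        callback = pending.pop(0)
        try:
            callback()
        except RuntimeError as exc:
            tb = traceback.extract_tb(exc.__traceback__)
            frame_names = [frame.name for frame in tb]
    return frame_names


connect()
frame_names = run_loop()
print(frame_names)
```

The recorded names contain `run_loop` and `read_response` but not `connect`: the step that scheduled the failing callback has left the stack, which is the context a developer usually wants when debugging.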