On asyncio's "ValueError: too many file descriptors in select()" error
Source: Internet  Editor: 程序博客网  Time: 2024/05/17 01:28
I recently wrote a crawler with asyncio + aiohttp. The code looked like this:
import asyncio

import aiohttp
from bs4 import BeautifulSoup  # used by cc(); missing from the original listing

headers = {
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch, br",
    "Accept-Language": "zh-CN,zh;q=0.8",
}


async def ss(url):
    # Fetch one page and hand the decoded HTML to the parser.
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            print(resp.status)
            d = await resp.text("utf-8", "ignore")
            cc(d)


def cc(v):
    # Parse the page and print the author link of each article.
    print(v)
    soup = BeautifulSoup(v, "lxml")
    contents = soup.select("div.content")
    for conten in contents:
        articleAuthor = conten.select("div.blog_info > a")
        if articleAuthor:
            # print(articleAuthor)
            articleAuthor = articleAuthor[0]
        else:
            articleAuthor = ""
        print(articleAuthor)


# Create the loop explicitly so the script also runs on newer Python versions.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# Every URL becomes a coroutine, and all of them are started at once.
tasks = [ss(url) for url in ["http://www.iteye.com/blogs/tag/java?page=" + str(x) for x in range(1, 2)]]
loop.run_until_complete(asyncio.gather(*tasks))
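At first glance the code looks fine, and it runs fine too. But once the URL list grows to more than a thousand entries, it fails with ValueError: too many file descriptors in select().
Why is that?
The asyncio event loop can be backed by select(), and select() can only watch a limited number of file descriptors at a time. The exact limit depends on the platform; CPython on Windows, where select() is the default selector, is typically built with a limit of 512. The code above puts a coroutine for every URL into one huge list and starts them all at once, so every request holds its own socket open at the same time and the limit is exceeded. Written this way, the crawler cannot scale to a large number of URLs.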
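To check whether a given interpreter is affected, it can help to look at which selector and event loop class it picks by default. The following is a small diagnostic sketch, not part of the original crawler; the helper name `describe_event_loop` is made up for illustration. It assumes CPython and only uses the standard `selectors` and `asyncio` modules.

```python
import asyncio
import selectors


def describe_event_loop():
    """Report the default selector and event loop class of this interpreter."""
    loop = asyncio.new_event_loop()
    try:
        return {
            # SelectSelector means select() with its fixed descriptor limit;
            # EpollSelector / KqueueSelector do not have that limit.
            "default_selector": selectors.DefaultSelector.__name__,
            # A proactor-based loop (Windows) does not go through select() at all.
            "loop_class": type(loop).__name__,
        }
    finally:
        loop.close()


info = describe_event_loop()
print(info)
```

If `default_selector` is `SelectSelector` and the loop class is a selector-based loop, opening many sockets at once can run into the error described above.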
So what can be done?
Use callbacks.
The code looks like this:
import asyncio
import time

import aiohttp
from bs4 import BeautifulSoup

urlss = []  # article URLs collected from the listing pages
headers = {
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch, br",
    "Accept-Language": "zh-CN,zh;q=0.8",
}


async def ss(url):
    # Fetch one page and return its decoded HTML.
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            print(resp.status)
            return await resp.text("utf-8", "ignore")


def cc(v):
    # Callback for listing pages: v is the finished task.
    print("ssssssss")
    # print(v.result())
    # result() returns the value returned by the coroutine
    soup = BeautifulSoup(v.result(), "lxml")
    contents = soup.select("div.content")
    for conten in contents:
        # articleAuthor = conten.select("div.blog_info > a")
        # if articleAuthor:
        #     # print(articleAuthor)
        #     articleAuthor = articleAuthor[0]
        # else:
        #     articleAuthor = ""
        articleUrl = conten.select("h3 > a")
        if articleUrl:
            articleUrl = articleUrl[0].get("href")
            urlss.append(articleUrl)


# async def ss2(url):
#     async with aiohttp.ClientSession() as session:
#         async with session.get(url, headers=headers) as resp:
#             print(resp.status)
#             return await resp.text("utf-8", "ignore")


def cc2(v):
    # Callback for article pages: print the src of the first image, if any.
    print("ssssssss222222222222")
    # print(v.result())
    # result() returns the value returned by the coroutine
    soup = BeautifulSoup(v.result(), "lxml")
    articleImages_list = soup.select("img")
    if articleImages_list:
        articleImages_list = articleImages_list[0].get("src")
    else:
        articleImages_list = []
    print(articleImages_list)


now = time.time
start = now()
# Create the loop explicitly so the script also runs on newer Python versions.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# url = "http://www.iteye.com/blogs/tag/java?page=1"
for url in ["http://www.iteye.com/blogs/tag/java?page=" + str(x) for x in range(1, 2)]:
    coroutine = ss(url)
    # Wrap the coroutine in a task
    task = asyncio.ensure_future(coroutine, loop=loop)
    # Register the callback
    task.add_done_callback(cc)
    # Run the event loop until this task has finished
    loop.run_until_complete(task)

for url in urlss:
    coroutine = ss(url)
    task = asyncio.ensure_future(coroutine, loop=loop)
    task.add_done_callback(cc2)
    loop.run_until_complete(task)

print('TIME: ', now() - start)
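Note that the callback version above calls run_until_complete() once per task, so only one request is in flight at a time. That is why it stays under the select() limit, but it also gives up concurrency. A different technique, not the one used in this post, is to keep the tasks concurrent and cap how many run at once with asyncio.Semaphore. The sketch below is a minimal illustration under stated assumptions: `fetch` is a stand-in that only sleeps instead of calling aiohttp, and `MAX_IN_FLIGHT`, `bounded_fetch` and `crawl` are hypothetical names chosen for this example. It needs Python 3.7+ for asyncio.run().

```python
import asyncio

MAX_IN_FLIGHT = 100  # assumed cap; choose a value well below the select() limit


async def fetch(url):
    # Stand-in for the aiohttp request in ss(); it only simulates I/O latency.
    await asyncio.sleep(0.01)
    return "<html>%s</html>" % url


async def bounded_fetch(semaphore, url, stats):
    # At most `limit` coroutines are inside this block at the same time.
    async with semaphore:
        stats["current"] += 1
        stats["peak"] = max(stats["peak"], stats["current"])
        try:
            return await fetch(url)
        finally:
            stats["current"] -= 1


async def crawl(urls, limit=MAX_IN_FLIGHT):
    semaphore = asyncio.Semaphore(limit)
    stats = {"current": 0, "peak": 0}  # tracks how many fetches overlap
    pages = await asyncio.gather(
        *(bounded_fetch(semaphore, url, stats) for url in urls)
    )
    return pages, stats["peak"]


urls = ["http://www.iteye.com/blogs/tag/java?page=" + str(x) for x in range(1, 1001)]
pages, peak = asyncio.run(crawl(urls))
print(len(pages), peak)
```

In a real crawler, `fetch` would contain the aiohttp call. Sharing one ClientSession across all requests may also help, since aiohttp's TCPConnector has a `limit` parameter that caps the number of simultaneous connections per session.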