Scrapy: a User-Agent pool
Source: Internet  Editor: 程序博客网  Time: 2024/05/17 23:30
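Websites use many anti-scraping strategies. Today, follow Xiaosheng on a tour of getting past User-Agent checks. Ahem, eyes on the blackboard, everyone!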
Straight to the code:
First, create the middleware:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Created by shengjk1 on 2017/11/8
import random

# Older Scrapy exposed this class under scrapy.contrib.downloadermiddleware.useragent;
# current Scrapy uses the path below.
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware as ScrapyUserAgentMiddleware


class UserAgentMiddleware(ScrapyUserAgentMiddleware):
    # Pool of User-Agent strings; each appears once, so random.choice picks them uniformly
    user_agent_list = [
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6',
        'Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)',
        'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)',
        'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20',
        'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1',
        'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)',
        'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.3 Mobile/14E277 Safari/603.1.30',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    ]

    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Randomly pick a User-Agent for this request
        ua = random.choice(self.user_agent_list)
        if ua:
            # setdefault only sets the header when the request has no User-Agent yet
            request.headers.setdefault('User-Agent', ua)
```
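If you would rather keep the pool out of the middleware code, one possible variant (a sketch, not the author's version) reads it from settings. Scrapy does not require a downloader middleware to subclass anything, so a plain class with from_crawler and process_request is enough. The class name RandomUserAgentMiddleware and the setting name USER_AGENT_LIST are hypothetical. Keep USER_AGENT_LIST a Python list: Settings.getlist splits a plain string on commas, and User-Agent strings themselves contain commas.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical variant: the User-Agent pool comes from a USER_AGENT_LIST setting."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a made-up setting name; define it as a list in settings.py
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        if self.user_agents:
            # Same behavior as the original: keep any User-Agent already on the request
            request.headers.setdefault("User-Agent", random.choice(self.user_agents))
        return None  # let Scrapy continue processing the request normally
```

In settings.py you would then define USER_AGENT_LIST = [...] and register the class under DOWNLOADER_MIDDLEWARES with your own module path, in the same way as the setting in the next step.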
Then register it in settings.py:
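```python
# Key: <project>.<module>.<class>; 400 runs before Scrapy's built-in UserAgentMiddleware (500)
DOWNLOADER_MIDDLEWARES = {
    'screptile.useragent_middleware.UserAgentMiddleware': 400,
}
```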
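A possible variant of this setting, assuming you want Scrapy's own UserAgentMiddleware switched off entirely: mapping a built-in middleware's path to None is how Scrapy disables it. The screptile.useragent_middleware path is the author's project module; substitute your own.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    'screptile.useragent_middleware.UserAgentMiddleware': 400,
    # None disables Scrapy's built-in UserAgentMiddleware (normally at 500)
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```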
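Note:
The default headers must not already contain a User-Agent; otherwise the custom User-Agent middleware has no effect. The middleware calls request.headers.setdefault, which only fills in User-Agent when the request does not have one yet, so a value that is already present (for example one set through DEFAULT_REQUEST_HEADERS in settings.py) is kept.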
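To see why the note matters, here is a small framework-free sketch. A plain dict stands in for Scrapy's request.headers (the real object is case-insensitive; a dict is not), and the preset value "MyBot/1.0" is hypothetical. The second function shows an alternative we are suggesting, not the author's: assigning the header directly, so an existing User-Agent can no longer block rotation.

```python
import random

POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
]


def with_setdefault(headers):
    # Same rule as the middleware: only fill User-Agent when it is missing
    headers.setdefault("User-Agent", random.choice(POOL))
    return headers


def with_assignment(headers):
    # Alternative: always overwrite, so a default User-Agent cannot win
    headers["User-Agent"] = random.choice(POOL)
    return headers


preset = {"User-Agent": "MyBot/1.0"}
print(with_setdefault(dict(preset))["User-Agent"])          # MyBot/1.0
print(with_assignment(dict(preset))["User-Agent"] in POOL)  # True
```

The trade-off: setdefault lets a spider pin a specific User-Agent on individual requests, while direct assignment guarantees rotation but overrides any per-request choice.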