Python CAPTCHA recognition, simulating AJAX requests, and scraping Youku VIP accounts (smirk)
Source: Internet | Editor: 程序博客网 | Time: 2024/05/19 13:19
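I wanted to write a scraper for a website that shares Youku VIP accounts, but the site requires a CAPTCHA before it hands one out.
First, I used Chrome to analyze the request that loads the CAPTCHA image.
Then I assembled the URL and requested it directly, and found that access was restricted.
So the server is probably checking the request headers.
I copied the request headers from the CAPTCHA-refresh request, built my own request with them, and wrote the image to disk: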
import random
import requests

def getyzm():
    headers = {
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        # Cookie:PHPSESSID=d763fd34e25925880c490955de8e0f2c
        'Host': 'vip.cengfan6.com',
        'Referer': 'http://vip.cengfan6.com/y/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    i = random.randint(1, 999999)
    print(i)
    url = 'http://vip.cengfan6.com/y/../code.php?s=%i' % i
    html = requests.get(url, headers=headers)
    # Write the image to disk
    with open('yzm.png', 'wb') as f:
        f.write(html.content)
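The commented-out Cookie header above hints that the site tracks a PHP session. A minimal sketch, assuming the CAPTCHA is tied to that session, is to send every request through one `requests.Session` so that any cookie set by the CAPTCHA response is returned automatically on later calls. The names `BASE`, `BROWSER_HEADERS`, `captcha_url`, `fetch_captcha` and `main` are made up for this illustration, the header set is trimmed, and `/y/../code.php` is written in its resolved form `/code.php`.

```python
import random

BASE = 'http://vip.cengfan6.com'

# A trimmed copy of the browser headers used in the post.
BROWSER_HEADERS = {
    'Referer': BASE + '/y/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}


def captcha_url(seed=None):
    """Build the CAPTCHA URL; `s` is the random number the post appends."""
    if seed is None:
        seed = random.randint(1, 999999)
    return '%s/code.php?s=%d' % (BASE, seed)


def fetch_captcha(session, path='yzm.png'):
    """Download the CAPTCHA through `session` so any cookie it sets is kept."""
    resp = session.get(captcha_url(), headers=BROWSER_HEADERS, timeout=10)
    resp.raise_for_status()
    with open(path, 'wb') as f:
        f.write(resp.content)
    return path


def main():
    # Third-party import kept local so the helpers above load without it.
    import requests
    with requests.Session() as session:
        fetch_captcha(session)
        # Further session.get(...) calls here would send the same cookies back.
```

Whether the site really checks the session this way is a guess; the sketch only shows how cookies could be carried between the two requests without hardcoding a PHPSESSID.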
Next comes CAPTCHA recognition. I started with pytesser, which is really not easy to install (wry smile).
References:
http://www.th7.cn/Program/Python/201602/768304.shtml
http://m.blog.csdn.net/article/details?id=53537010
https://my.oschina.net/jhao104/blog/647326?fromerr=xJxwPW5X
That was too much hassle, so I switched to pytesseract.
Test:
import pytesseract
from PIL import Image

image = Image.open('c:/yzm.png')
code = pytesseract.image_to_string(image)
print(code)
Ah, it recognized English letters. My CAPTCHA is all digits, orz.
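One thing that might help with a digits-only CAPTCHA is to restrict Tesseract to the characters 0-9 and to binarize the image first. The sketch below is untested against this site. The helper names and the threshold of 140 are assumptions, the `--psm` and `tessedit_char_whitelist` options are passed through pytesseract's `config` string, and how well they are honored depends on the installed Tesseract version.

```python
def digits_config(psm=7):
    """Tesseract options: treat the image as one text line, allow only 0-9."""
    return '--psm %d -c tessedit_char_whitelist=0123456789' % psm


def keep_digits(text):
    """Drop every character that is not an ASCII digit from the OCR output."""
    return ''.join(ch for ch in text if ch in '0123456789')


def read_digit_captcha(path='yzm.png', threshold=140):
    """OCR a digit CAPTCHA; `threshold` is an arbitrary starting value."""
    # Third-party imports are local so the helpers above work without them.
    from PIL import Image
    import pytesseract

    image = Image.open(path).convert('L')  # grayscale
    # Binarize: pixels brighter than the threshold become white, the rest black.
    image = image.point(lambda p: 255 if p > threshold else 0)
    raw = pytesseract.image_to_string(image, config=digits_config())
    return keep_digits(raw)
```

Even with these settings, noisy or distorted CAPTCHAs may still be misread, so falling back to manual entry remains a reasonable option.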
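I thought about looking into machine learning and training a model for it. Ah, but I don't know how yet; something to learn!
Reference for further study: http://www.cnblogs.com/beer/p/5672678.html
For now I'll read the CAPTCHA by hand (sad).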
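import time
from PIL import Image

# Recognize the CAPTCHA (by hand)
def viewyzm():
    print('please input yanzhengma')
    time.sleep(2)
    image = Image.open('yzm.png')
    image.show()
    # raw_input is Python 2; on Python 3 use input() instead
    yzm = raw_input(u'Close the image window, then type the code: ')
    print(yzm)

getyzm()
viewyzm()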
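After that I ran into an AJAX request.
I could see the request in Chrome.
Interestingly, refreshing the page requests the history record, which returns the accounts and passwords that were obtained earlier.
I wrote two functions: one requests a new account and password, the other requests the accounts and passwords in the history record.
The site imposes a limit: you can only get 5 accounts. I set a proxy and still only got 5. What? Isn't the limit by IP?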
import re
import requests

def get_vip():
    # Request a new account. The response is not decrypted, but accounts
    # already obtained can be read from the history record.
    headers = {
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        # 'Cookie':PHPSESSID=d3a9d9a7a9ad9fee71a9588773388ead
        'Host': 'vip.cengfan6.com',
        'Referer': 'http://vip.cengfan6.com/y/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    proxies = {
        '117.90.6.65': 9000
    }
    vip_url = 'http://vip.cengfan6.com/ajax.php?code=%s &typename=2' % viewyzm()
    viphtml = requests.get(vip_url, headers=headers, proxies=proxies)
    print(viphtml.content)

def get_host_vip():
    proxies = {
        '117.90.6.65': 9000
    }
    vip_url = 'http://vip.cengfan6.com/ajax_jilu.php?viptype=2'
    viphtml = requests.get(vip_url, proxies=proxies)
    vips = re.findall('<p>优酷(土豆)帐号:(.+?)密码:(.+?)</p>', viphtml.content)
    for vip in vips:
        print(vip[0] + ":" + vip[1])
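My proxy configuration is probably wrong: requests expects the proxies dict to map a URL scheme to a proxy URL (for example 'http' to 'http://117.90.6.65:9000'), while I mapped the IP to the port, so the proxy was most likely never used.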
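For comparison, here is a small sketch of the shape that `requests` documents for its `proxies` argument: a dict keyed by URL scheme whose values are proxy URLs. It assumes the address from the post is a plain HTTP proxy. `build_proxies` and `get_history` are hypothetical helper names, and whether that proxy still works, or whether the site limits by IP at all, is unknown.

```python
def build_proxies(host, port):
    """Map URL schemes to one proxy URL, the shape requests' `proxies` expects."""
    proxy_url = 'http://%s:%d' % (host, port)
    return {'http': proxy_url, 'https': proxy_url}


def get_history(proxies):
    """Fetch the history record through the given proxies."""
    import requests  # third-party; imported lazily so build_proxies has no dependency
    url = 'http://vip.cengfan6.com/ajax_jilu.php?viptype=2'
    return requests.get(url, proxies=proxies, timeout=10)


# Example: build_proxies('117.90.6.65', 9000)
# gives {'http': 'http://117.90.6.65:9000', 'https': 'http://117.90.6.65:9000'}
```

If the limit of 5 persists even with a working proxy, the site may be counting by session cookie or something else rather than by IP.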
Still, 5 is enough. I use this site's VIP accounts a lot. (manual smirk)
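The history-parsing step in get_host_vip can also be tried offline. The sketch below targets Python 3 and runs against made-up sample markup, since the real response format is not shown in the post. One detail worth noting: unescaped parentheses in a regular expression form a capture group, so to match literal parentheses in the page text they need to be escaped as `\(` and `\)`. `SAMPLE`, `PATTERN` and `parse_accounts` are illustrative names, and the accounts are fake.

```python
import re

# Assumed markup, for illustration only; the real page may differ
# (for example, it may use full-width parentheses or colons).
SAMPLE = (
    '<p>优酷(土豆)帐号:user1@example.com密码:abc123</p>'
    '<p>优酷(土豆)帐号:user2@example.com密码:xyz789</p>'
)

# \( and \) match literal parentheses; the two (.+?) groups capture
# the account and the password non-greedily.
PATTERN = re.compile(r'<p>优酷\(土豆\)帐号:(.+?)密码:(.+?)</p>')


def parse_accounts(html):
    """Return a list of (account, password) tuples found in `html`."""
    return PATTERN.findall(html)


for account, password in parse_accounts(SAMPLE):
    print(account + ':' + password)
```

On Python 3 the pattern is a `str`, so it would be applied to `response.text` rather than the bytes in `response.content`.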
All the code, for the record:
# -*- coding: UTF-8 -*-
# ../code.php?s=992671249
# url='http://vip.cengfan6.com/y/'
import requests
from bs4 import BeautifulSoup
import random
from PIL import Image
import time
import re

# Fetch the CAPTCHA
def getyzm():
    headers = {
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        # Cookie:PHPSESSID=d763fd34e25925880c490955de8e0f2c
        'Host': 'vip.cengfan6.com',
        'Referer': 'http://vip.cengfan6.com/y/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    i = random.randint(1, 999999)
    print(i)
    url = 'http://vip.cengfan6.com/y/../code.php?s=%i' % i
    html = requests.get(url, headers=headers)
    # Write the image to disk
    with open('yzm.png', 'wb') as f:
        f.write(html.content)

# Recognize the CAPTCHA (by hand)
def viewyzm():
    print('please input yanzhengma')
    time.sleep(2)
    image = Image.open('yzm.png')
    image.show()
    # raw_input is Python 2; on Python 3 use input() instead
    yzm = raw_input(u'Close the image window, then type the code: ')
    return yzm

xhrhd = '''Accept-Encoding:gzip, deflate, sdch
Accept-Language:zh-CN,zh;q=0.8
Connection:keep-alive
Cookie:PHPSESSID=d3a9d9a7a9ad9fee71a9588773388ead
Host:vip.cengfan6.com
Referer:http://vip.cengfan6.com/y/
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36
X-Requested-With:XMLHttpRequest'''

def get_vip():
    # Request a new account. The response is not decrypted, but accounts
    # already obtained can be read from the history record.
    headers = {
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'Cookie': 'PHPSESSID=d3a9d9a7a9ad9fee71a9588773388ewd',
        'Host': 'vip.cengfan6.com',
        'Referer': 'http://vip.cengfan6.com/y/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    proxies = {
        '117.90.6.65': 9000
    }
    vip_url = 'http://vip.cengfan6.com/ajax.php?code=%s &typename=2' % viewyzm()
    viphtml = requests.get(vip_url, headers=headers, proxies=proxies)
    print(viphtml.content)

def get_host_vip():
    proxies = {
        '117.90.6.65': 9000
    }
    vip_url = 'http://vip.cengfan6.com/ajax_jilu.php?viptype=2'
    viphtml = requests.get(vip_url, proxies=proxies)
    vips = re.findall('<p>优酷(土豆)帐号:(.+?)密码:(.+?)</p>', viphtml.content)
    for vip in vips:
        print(vip[0] + ":" + vip[1])

getyzm()
get_vip()
get_host_vip()
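I really didn't do this just to get VIP accounts. Mostly I want to write more things down, because if I don't write them down I forget them easily.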