Recognizing CAPTCHAs with Python
Source: Internet  Editor: 程序博客网  Date: 2024/05/17 06:22
Windows
Sample code
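import os
from urllib.parse import urljoin

import pytesser3
import requests
from lxml import etree

PAGE_URL = 'https://sojump.com/jq/16276361.aspx?from=timeline'
config = {'gid': 1}  # running counter used to name the saved captcha images


def parse(s, html, idx):
    """Find the captcha image in the page, download it and run OCR on it."""
    result = {}
    tree = etree.HTML(html)
    valimg = None
    valimgs = tree.xpath('//img[@id="imgCode"]/@src')
    if len(valimgs) > 0:
        valimg = valimgs[0]
    validateCode = None
    if valimg:
        os.makedirs('img', exist_ok=True)  # the output directory must exist
        fname = 'img/' + str(0) + '_' + str(config['gid']) + '.jpg'
        config['gid'] = config['gid'] + 1
        # Resolve the src attribute against the page URL; appending it to the
        # full page URL (query string included) would give an invalid address.
        ri = s.get(urljoin(PAGE_URL, valimg))
        with open(fname, 'wb') as f:
            for chk in ri:
                f.write(chk)
        validateCode = pytesser3.image_file_to_string(fname)
        # Remove surrounding whitespace, inner spaces and line breaks.
        validateCode = validateCode.strip()
        validateCode = validateCode.replace(' ', '')
        validateCode = validateCode.replace('\n', '')
    result['validateCode'] = validateCode
    return result


s = requests.Session()
r = s.get(PAGE_URL)
# Runs until interrupted (Ctrl+C); each pass saves and recognizes one more captcha image.
while True:
    res = parse(s, r.text, 0)
    print(res)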
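Two small steps in the sample are easy to get wrong: turning the `src` attribute of the captcha `<img>` into a full URL, and cleaning the text that OCR returns. The sketch below shows one way to do both with the standard library only. The helper names `captcha_url` and `clean_ocr_text` and the example `src` values are made up for illustration; they do not come from the original script.

```python
from urllib.parse import urljoin

PAGE_URL = "https://sojump.com/jq/16276361.aspx?from=timeline"


def captcha_url(page_url, src):
    """Resolve an <img> src attribute against the page it was found on."""
    return urljoin(page_url, src)


def clean_ocr_text(text):
    """Remove every whitespace character (spaces, tabs, line breaks)."""
    return "".join(text.split())


# Hypothetical src values, only to show how each kind is resolved.
print(captcha_url(PAGE_URL, "/AntiSpamImageGen.aspx?q=1"))   # root-relative
print(captcha_url(PAGE_URL, "code.aspx?q=1"))                # page-relative
print(captcha_url(PAGE_URL, "https://example.com/c.gif"))    # already absolute
print(clean_ocr_text(" a B 1\n2 \n"))
```

`urljoin` handles root-relative, page-relative and absolute values uniformly, so the download step does not need to know which form the page uses.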
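I. Install pytesser3
1. Install pytesser3
pip3 install pytesser3
2. Install PIL (a dependency of pytesser3; it is provided by the pillow package)
pip3 install pillow
3. Install the tesseract-ocr engine (pytesser3 is only a wrapper around this engine, so recognition depends on it)
Windows: http://101.96.10.43/internode.dl.sourceforge.net/project/tesseract-ocr-alt/tesseract-ocr-setup-3.02.02.exe
Linux (CentOS): http://blog.csdn.net/strugglerookie/article/details/71606540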
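After installing the engine, it can help to confirm that the `tesseract` executable is reachable before running any OCR code. The following is a minimal sketch using only the standard library; `find_tesseract` is a hypothetical helper name, and it only checks `PATH`. How pytesser3 itself locates the executable is not covered by this check, so treat a miss here as a hint rather than proof.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install it or add it to PATH")
else:
    print("tesseract found at", path)
```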
II. Install lxml
pip3 install lxml
Linux
Sample code:
import urllib.request
from time import time

from PIL import Image
import pytesseract


def download(qid, header, i):
    """Download one captcha image for questionnaire qid and save it as <i>.gif."""
    url = 'https://www.wjx.cn/AntiSpamImageGen.aspx?q=' + qid + '&t=' + str(int(time() * 1000))
    req = urllib.request.Request(url, headers=header)
    data = urllib.request.urlopen(req).read()
    with open('%d.gif' % i, 'wb') as pic:
        pic.write(data)


def binarizing(img):
    # input: gray image
    # Pixels with a low gray value become black, all others white, which removes noise.
    threshold = 30
    pixdata = img.load()
    w, h = img.size
    for y in range(h):
        for x in range(w):
            if pixdata[x, y] > threshold:
                pixdata[x, y] = 255
            else:
                pixdata[x, y] = 0
    return img


def depoint(img):
    # input: gray image
    # A pixel is turned white when more than two of its four neighbours are near white.
    pixdata = img.load()
    w, h = img.size
    for y in range(1, h - 1):  # skip the border so neighbour lookups stay inside the image
        for x in range(1, w - 1):
            count = 0
            if pixdata[x, y - 1] > 245:
                count = count + 1
            if pixdata[x, y + 1] > 245:
                count = count + 1
            if pixdata[x - 1, y] > 245:
                count = count + 1
            if pixdata[x + 1, y] > 245:
                count = count + 1
            if count > 2:
                pixdata[x, y] = 255
    return img


def shibie(img):
    # "shibie" is pinyin for "recognize"
    imgry = img.convert('L')  # mode 'L' converts the image to grayscale
    threshold = 140
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
    out = imgry.point(table, '1')
    text = str(pytesseract.image_to_string(out)).strip()  # run OCR
    print(text)
    return text
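To see what the two clean-up passes do, the sketch below applies the same rules as `binarizing` and `depoint` to a tiny hand-made image. It uses nested lists instead of a Pillow image so that it runs without any third-party package; that substitution, the names `binarize` and `depoint_grid`, and the 6x6 test pattern are assumptions made for illustration only.

```python
def binarize(pixels, threshold=30):
    """Return a copy where values above threshold become 255, others 0."""
    return [[255 if v > threshold else 0 for v in row] for row in pixels]


def depoint_grid(pixels, white=245):
    """Whiten interior pixels with more than two near-white 4-neighbours.

    Like the Pillow version, pixels are updated while scanning, so a pixel
    whitened earlier counts as white for the pixels visited after it.
    """
    out = [row[:] for row in pixels]
    h, w = len(out), len(out[0])
    for y in range(1, h - 1):          # border rows and columns are skipped
        for x in range(1, w - 1):
            neighbours = (out[y - 1][x], out[y + 1][x],
                          out[y][x - 1], out[y][x + 1])
            if sum(1 for n in neighbours if n > white) > 2:
                out[y][x] = 255
    return out


# 6x6 light background with one isolated dark dot and one 2x2 dark block.
gray = [[220] * 6 for _ in range(6)]
gray[1][1] = 10                                  # isolated noise dot
block = [(3, 3), (3, 4), (4, 3), (4, 4)]         # (row, column) pairs
for y, x in block:
    gray[y][x] = 20                              # part of a "stroke"

binary = binarize(gray)
cleaned = depoint_grid(binary)

for row in cleaned:
    print("".join("#" if v == 0 else "." for v in row))
```

The isolated dot has four white neighbours and is removed, while every pixel of the 2x2 block has only two, so the block survives. Thin strokes (one pixel wide) can lose their end points under the same rule, which is worth keeping in mind when tuning the thresholds.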
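1. Install pytesseract
pip3 install pytesseract
2. Install PIL (provided by the pillow package)
pip3 install pillow
3. For anything else that is missing, follow the error messages -_-||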