Fixing WindowsError: [Error 2] when using pytesseract to recognize CAPTCHAs on Windows

Source: Internet | Editor: 程序博客网 | Date: 2024/04/30 23:48

Installing PIL + pytesseract
Installation is straightforward; see http://www.waitalone.cn/python-php-ocr.html

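Download Pillow from http://www.lfd.uci.edu/~gohlke/pythonlibs/, choosing the build that matches your Python version (mine is 2.7). One snag: my machine is 64-bit, so I downloaded the 64-bit .whl, but pip refused to install it, saying the format wasn't supported. So I tried the 32-bit wheel instead and, damn, it installed. Whatever... The likely explanation is that the Python 2.7 interpreter itself was a 32-bit build: a wheel has to match the interpreter's architecture, not the operating system's.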
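To see which wheel your interpreter needs, you can ask Python for its own pointer size. A minimal sketch (the helper names are hypothetical; win32 and win_amd64 are the platform tags that appear at the end of Windows wheel filenames):

```python
import platform
import struct


def interpreter_bits():
    """Return 32 or 64: the pointer width of the running Python interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes (4 or 8).
    return struct.calcsize("P") * 8


def wheel_platform_tag():
    """Hypothetical helper: the Windows wheel platform tag for this interpreter."""
    return "win_amd64" if interpreter_bits() == 64 else "win32"


if __name__ == "__main__":
    print("Python {0} is {1}-bit".format(platform.python_version(), interpreter_bits()))
    print("Choose wheels whose filename ends in {0}.whl".format(wheel_platform_tag()))
```

If this reports 32-bit on a 64-bit machine, the win32 wheel is the one pip will accept (the cp27 part of the filename must also match your Python version).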


Then:

pip install pytesseract

Once that installed successfully, I ran this script:


from PIL import Image
from pytesseract import image_to_string
image = Image.open(r'7364.png')  # Open image object using PIL
print image_to_string(image)     # Run tesseract.exe on image

It failed with the following error:

Traceback (most recent call last):
  File "F:/spider/test.py", line 4, in <module>
    print image_to_string(image)     # Run tesseract.exe on image
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\Lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Python27\Lib\subprocess.py", line 959, in _execute_child
    startupinfo)
WindowsError: [Error 2] 

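Searching online, the suggested fix was to edit pytesseract.py and change

tesseract_cmd = 'tesseract'

to

tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

That makes sense: pytesseract does no OCR itself, it just starts the Tesseract-OCR program through subprocess (hence the subprocess frames in the traceback), and [Error 2] is Windows for "the system cannot find the file specified". With the bare name 'tesseract' there is nothing to run unless the Tesseract-OCR folder is on PATH, so the idea is to give the full path to tesseract.exe instead.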
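To check this on your own machine, here is a simplified sketch of how a bare program name such as 'tesseract' gets resolved: each PATH directory is tried in turn, with '.exe' appended when the name has no extension. It is an approximation (the real Windows CreateProcess search also looks in the application, current and system directories first), and find_executable is a hypothetical helper, not part of pytesseract:

```python
import os


def find_executable(name, search_path=None, exists=os.path.isfile):
    """Simplified sketch: return the full path a bare program name resolves
    to via PATH on Windows, or None if no PATH directory contains it.

    '.exe' is appended when the name has no extension, then each PATH entry
    is tried in order. The real CreateProcess search also checks the
    application, current and system directories first; this sketch skips them.
    `exists` is injectable so the function can be tested without real files.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    if not os.path.splitext(name)[1]:
        name += ".exe"
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing separator
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # None here suggests that tesseract_cmd = 'tesseract' will end in
    # WindowsError [Error 2] unless tesseract.exe sits in a system directory.
    print(find_executable("tesseract"))
```

Adding C:\Program Files (x86)\Tesseract-OCR to PATH and opening a new console is the other common way to make the bare name work.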

OK, so I changed it.

Ran it again. Hmm, another error:

Traceback (most recent call last):
  File "F:/spider/test.py", line 4, in <module>
    print image_to_string(image)     # Run tesseract.exe on image
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\tandazhao\spider_venv\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Python27\Lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Python27\Lib\subprocess.py", line 959, in _execute_child
    startupinfo)
WindowsError: [Error 2] 

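Heh. Looking at the path more closely, I spotted the problem: in an ordinary Python string literal, the \t in \tesseract.exe is an escape sequence and becomes a TAB character, so the path still pointed at a file that doesn't exist. (This is Python's string escaping, not something Windows-specific; Windows paths just make it easy to hit because they use backslashes.) The fix is to put an r in front of 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' so it becomes a raw string: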

tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
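The difference is easy to see by printing the strings with repr(). In this sketch, broken is spelled with explicit double backslashes so that it equals exactly what the unprefixed literal from pytesseract.py evaluates to (typing it verbatim would also make newer Python 3 versions warn about the invalid escapes \P and \T); doubled and joined are two other spellings that avoid the TAB:

```python
import ntpath  # Windows path rules, usable on any OS

# What 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe' (no r prefix)
# evaluates to: '\P' and '\T' are not escape sequences, so their backslash
# survives, but '\t' is one and becomes a TAB character.
broken = 'C:\\Program Files (x86)\\Tesseract-OCR\tesseract.exe'

# Raw string: every backslash is kept literally.
fixed = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Two equivalent spellings that also avoid the problem.
doubled = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
joined = ntpath.join('C:\\', 'Program Files (x86)', 'Tesseract-OCR', 'tesseract.exe')

print(repr(broken))  # 'C:\\Program Files (x86)\\Tesseract-OCR\tesseract.exe'
print(repr(fixed))   # 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
print(fixed == doubled == joined)  # True
```

Forward slashes ('C:/Program Files (x86)/Tesseract-OCR/tesseract.exe') are generally accepted by Windows APIs too, if you prefer to avoid backslashes altogether.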

Ran it: OK, the image was recognized:

C:\Users\tandazhao\spider_venv\Scripts\python.exe F:/spider/test.py
7364

Process finished with exit code 0
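Editing pytesseract.py inside site-packages works, but the change is lost whenever pytesseract is reinstalled or upgraded. A possible alternative, sketched below, is to set the module-level tesseract_cmd variable from your own script before calling image_to_string; pytesseract reads it when it launches tesseract. The install path is the one from this post, and ocr_image is a hypothetical helper name:

```python
# Assumed install location (the one used in this post); adjust as needed.
# A raw string, so the \t in \tesseract.exe stays a backslash plus 't'.
TESSERACT_EXE = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'


def ocr_image(image_path, tesseract_cmd=TESSERACT_EXE):
    """Hypothetical helper: OCR an image without editing pytesseract.py."""
    # Imported here so this module can still be loaded on machines where
    # Pillow/pytesseract are not installed.
    from PIL import Image
    import pytesseract

    # Point pytesseract at the full path of tesseract.exe for this run.
    pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    return pytesseract.image_to_string(Image.open(image_path))


# Usage (needs Pillow, pytesseract and Tesseract-OCR installed):
#     print(ocr_image('7364.png'))
```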


Hahaha!


