Python Basics: Cracking CAPTCHAs with PIL

Source: Internet  Published by: 淘宝客qq群拉人  Editor: 程序博客网  Date: 2024/05/23 15:06

Environment

1. Path of the script: /Users/frankslg/PycharmProjects/cjb/ver/ver_code1.py
2. Path of the images: /Users/frankslg/PycharmProjects/cjb/img/*.jpeg
3. Working directory in the IPython console:
os.getcwd()
Out[3]: '/Users/frankslg/PycharmProjects/cjb'
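The folder layout above can also be resolved with pathlib instead of slicing a string. Below is a minimal sketch that assumes the cjb/ver plus cjb/img layout shown. image_dir is a hypothetical helper name and is not part of the original script.

```python
from pathlib import Path


def image_dir(script_file: str) -> Path:
    """Return <project>/img for a script stored in <project>/ver/ (assumed layout)."""
    # resolve() makes the path absolute, so the result does not depend on os.getcwd()
    script = Path(script_file).resolve()
    # script.parent is <project>/ver; its parent is <project>
    return script.parent.parent / "img"
```

Inside ver_code1.py it would be called as image_dir(__file__). Because the result is derived from the script's own location, it is the same whether the script is launched from cjb, from cjb/ver, or from elsewhere.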

Implementation

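# ver_code1.py
import os

from PIL import Image
import pytesseract


def convert(pic_path, pic):
    # Convert the image to greyscale (mode "L", a single channel) so that
    # each pixel is one brightness value from 0 to 255
    imgrey = pic.convert('L')

    # Remove noise by thresholding. 170 is the value that removed the noise
    # best for these images after several rounds of tuning.
    # The 256-entry lookup table maps every grey value below the threshold
    # (the dark pixels likely to form the CAPTCHA characters) to 0 (black)
    # and every other value to 1 (white); im.point() then applies the table
    # to build the new, noise-free image.
    threshold = 170
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)

    # Apply the table generated above; '1' makes the output a bilevel image
    out = imgrey.point(table, '1')
    out_file = os.path.join(pic_path, 'cjb' + str(threshold) + '.jpeg')
    out.save(out_file, 'jpeg')

    # Read the processed image back from disk
    img3 = Image.open(out_file, 'r')
    # Recognise the pixels as a string. If the noise was not removed well,
    # recognition can go wrong: extra or missing characters, or misread ones.
    vcode = pytesseract.image_to_string(img3)
    print(vcode)  # only for checking the result while testing
    return vcode  # hands the cracked CAPTCHA string back to the caller


if __name__ == '__main__':
    # The script lives in cjb/ver and the images in cjb/img. Resolving the
    # folder from this file's location works no matter which directory the
    # script is launched from; os.getcwd() depends on the launch directory.
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    pic_path = os.path.join(base_dir, 'img')
    # Take the first image in the folder; change the index as needed.
    # os.listdir() order is arbitrary, so sort it. Note that convert() also
    # saves its output (cjb170.jpeg) into this folder.
    pic = os.path.join(pic_path, sorted(os.listdir(pic_path))[0])
    pic_open = Image.open(pic, 'r')
    convert(pic_path, pic_open)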
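The script writes the cleaned image to disk as a JPEG and then reads it back before running OCR. Since pytesseract.image_to_string is already given a PIL Image object in the script, that round trip can be skipped.

Below is a minimal sketch of the preprocessing step on its own, assuming Pillow is installed. The name binarize is hypothetical. The table uses 255 rather than 1 for white, because 0 and 255 are the values Pillow uses for mode "1" pixels.

```python
from PIL import Image

# Threshold the article settled on after tuning for its CAPTCHA images
THRESHOLD = 170


def binarize(img: Image.Image, threshold: int = THRESHOLD) -> Image.Image:
    """Greyscale + threshold in memory, without writing a temporary JPEG."""
    # Collapse colour to one brightness channel (mode "L", values 0-255)
    grey = img.convert("L")
    # Dark pixels (likely character strokes) -> 0 (black), the rest -> 255 (white)
    table = [0 if v < threshold else 255 for v in range(256)]
    # Map every grey value through the table into a bilevel (mode "1") image
    return grey.point(table, "1")
```

Inside convert(), vcode = pytesseract.image_to_string(binarize(pic)) could then replace the save and open steps. Keeping the image in memory also avoids JPEG compression artefacts on a pure black-and-white image. It also means no extra file lands in the img folder that the main block lists.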

Results

Original image:

Greyscale image:

Image after noise removal:

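Note: whether the cleaned image shows black characters on a white background or white characters on a black background depends on one choice in the noise-removal code. It comes down to whether the lookup table maps values at or above the threshold to 1 or to 0.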
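The polarity rule in the note can be checked without any image library. The plain-Python sketch below models the lookup table and the per-pixel mapping that point() performs. It assumes the characters are the darker pixels, and the helper names threshold_table and apply_table are hypothetical.

```python
def threshold_table(threshold, dark_value, light_value):
    """256-entry lookup table: values below threshold -> dark_value, others -> light_value."""
    return [dark_value if i < threshold else light_value for i in range(256)]


def apply_table(pixels, table):
    """Map every grey value in a 2-D list through the table, one pixel at a time."""
    return [[table[v] for v in row] for row in pixels]


# Dark strokes -> 0 (black), light background -> 1 (white): the table in convert()
black_on_white = threshold_table(170, dark_value=0, light_value=1)
# Swapping the two values gives white characters on a black background
white_on_black = threshold_table(170, dark_value=1, light_value=0)

row = [[30, 200, 169, 170]]  # 30 and 169 are below the threshold, 200 and 170 are not
print(apply_table(row, black_on_white))  # [[0, 1, 0, 1]]
print(apply_table(row, white_on_black))  # [[1, 0, 1, 0]]
```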
Output of ver_code1.py:
WDHA

Reference material

In[18]: help(Image.open(pic,'r').convert)

Help on method convert in module PIL.Image:

convert(mode=None, matrix=None, dither=None, palette=0, colors=256) method of PIL.JpegImagePlugin.JpegImageFile instance
Returns a converted copy of this image. For the "P" mode, this
method translates pixels through the palette. If mode is
omitted, a mode is chosen so that all information in the image
and the palette can be represented without a palette.

The current version supports all possible conversions between
"L", "RGB" and "CMYK." The **matrix** argument only supports "L"
and "RGB".

When translating a color image to black and white (mode "L"),
the library uses the ITU-R 601-2 luma transform::

    L = R * 299/1000 + G * 587/1000 + B * 114/1000

The default method of converting a greyscale ("L") or "RGB"
image into a bilevel (mode "1") image uses Floyd-Steinberg
dither to approximate the original image luminosity levels. If
dither is NONE, all non-zero values are set to 255 (white). To
use other thresholds, use the :py:meth:`~PIL.Image.Image.point`
method.

:param mode: The requested mode. See: :ref:`concept-modes`.
:param matrix: An optional conversion matrix.  If given, this
   should be 4- or 12-tuple containing floating point values.
:param dither: Dithering method, used when converting from
   mode "RGB" to "P" or from "RGB" or "L" to "1".
   Available methods are NONE or FLOYDSTEINBERG (default).
:param palette: Palette to use when converting from mode "RGB"
   to "P".  Available palettes are WEB or ADAPTIVE.
:param colors: Number of colors to use for the ADAPTIVE palette.
   Defaults to 256.
:rtype: :py:class:`~PIL.Image.Image`
:returns: An :py:class:`~PIL.Image.Image` object.
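The luma transform quoted in the convert() docstring is plain arithmetic on the three channels. Below is a sketch in pure Python, assuming 8-bit channel values. The name luma is hypothetical, and the rounding (adding 500 before the integer division) is this sketch's own choice. Pillow uses its own internal integer arithmetic, so an individual value may differ from this by one.

```python
def luma(r: int, g: int, b: int) -> int:
    """ITU-R 601-2 luma: L = R*299/1000 + G*587/1000 + B*114/1000, rounded."""
    # The three weights sum to 1000, so pure white (255, 255, 255) stays 255
    return (r * 299 + g * 587 + b * 114 + 500) // 1000


print(luma(255, 255, 255))  # 255
print(luma(255, 0, 0))      # 76
print(luma(0, 255, 0))      # 150
print(luma(0, 0, 255))      # 29
```

Green carries about five times the weight of blue. As a result, a saturated blue stroke becomes far darker in the greyscale image than a saturated green one, and the fixed threshold of 170 acts on these greyscale values.

The same docstring explains why the script thresholds with point() instead of calling convert('1'). By default, convert('1') applies Floyd-Steinberg dithering rather than a cut-off at a chosen value.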

In[10]: help(im.point)

Help on method point in module PIL.Image:

point(lut, mode=None) method of PIL.JpegImagePlugin.JpegImageFile instance
Maps this image through a lookup table or function.

:param lut: A lookup table, containing 256 (or 65336 if
   self.mode=="I" and mode == "L") values per band in the
   image.  A function can be used instead, it should take a
   single argument. The function is called once for each
   possible pixel value, and the resulting table is applied to
   all bands of the image.
:param mode: Output mode (default is same as input).  In the
   current version, this can only be used if the source image
   has mode "L" or "P", and the output has mode "1" or the
   source image mode is "I" and the output mode is "L".
:returns: An :py:class:`~PIL.Image.Image` object.
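The point() docstring says lut may be a one-argument function instead of a table. That function is called once for each possible pixel value. The plain-Python sketch below models this for an 8-bit image and checks that a threshold function yields exactly the table built by the loop in convert(). lut_from_function is a hypothetical name.

```python
def lut_from_function(fn):
    """Evaluate fn once for each possible 8-bit pixel value, building a table."""
    return [fn(v) for v in range(256)]


# Table built with the explicit loop from convert()
threshold = 170
loop_table = []
for i in range(256):
    if i < threshold:
        loop_table.append(0)
    else:
        loop_table.append(1)

# The same table produced from a one-argument function
fn_table = lut_from_function(lambda v: 0 if v < threshold else 1)
print(fn_table == loop_table)  # True
```

By the docstring's description, imgrey.point(lambda v: 0 if v < 170 else 1, '1') should therefore produce the same image as passing the explicit table. The function form is simply more compact.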

In[16]: help(pytesseract.image_to_string)

Help on function image_to_string in module pytesseract.pytesseract:

image_to_string(image, lang=None, boxes=False, config=None)
Runs tesseract on the specified image. First, the image is written to disk,
and then the tesseract command is run on the image. Resseract's result is
read, and the temporary files are erased.

also supports boxes and config.

if boxes=True
    "batch.nochop makebox" gets added to the tesseract call

if config is set, the config gets appended to the command.
    ex: config="-psm 6"
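The docstring's config example, -psm 6, uses Tesseract 3.x syntax. Tesseract 4 and later spell the option --psm. Below is a sketch of building such a config string, assuming a Tesseract 4+ install.

tesseract_config is a hypothetical helper. The optional character whitelist is an assumption: tessedit_char_whitelist is a Tesseract parameter, but whether it is honoured depends on the Tesseract version and engine mode.

```python
from typing import Optional


def tesseract_config(psm: int = 7, whitelist: Optional[str] = None) -> str:
    """Build a config string for pytesseract.image_to_string(..., config=...)."""
    # Page segmentation mode 7 treats the image as a single line of text,
    # which matches a one-line CAPTCHA; mode 6 (the docstring's example)
    # treats it as a single uniform block of text
    parts = [f"--psm {psm}"]
    if whitelist:
        # Assumption: the installed Tesseract honours this parameter
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


print(tesseract_config())  # --psm 7
print(tesseract_config(whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
```

It would be passed as pytesseract.image_to_string(img3, config=tesseract_config()). An uppercase-only whitelist is only a guess based on the single sample result WDHA.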

0 0