Tesseract and TIFF files: errors caused by the alpha channel

Source: Internet | Category: STM32 programming environment | Editor: 程序博客网 | Time: 2024/05/18 01:46

Problem with Tesseract and tiff format

After installing Tesseract, I ran an OCR test. First I used ImageMagick to convert a PNG image to TIFF:

(Figure: the original PNG image)

The converted TIFF image is not shown here, because CSDN does not allow uploading TIFF files.

convert a.png a.tif

Then I ran the following command:

tesseract a.tif a

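Tesseract reported an error (screenshot not preserved). The key message was:

spp not in set {1,3,4}

Here spp means "samples per pixel", the number of channels stored for each pixel.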

Reference (from a Stack Overflow answer):

It probably means your TIFF image has an alpha channel and therefore the underlying Leptonica library used by Tesseract doesn't support it. If you're using Imagemagick then be aware that operations such as -draw can cause alpha channels to be added. If you're using convert in your workflow and want to remove the channel again immediately, flatten the image before writing by adding -background white -flatten +matte before the output filename, e.g.:

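In other words, the TIFF image probably contains an alpha channel, which Leptonica, the image library underneath Tesseract, does not support.

ImageMagick adds an alpha channel when options such as -draw are used. In my own test, the alpha channel was added even without -draw.

The fix is to add -background white -flatten +matte before the output filename, which removes the alpha channel. (In newer ImageMagick versions +matte is deprecated; -alpha off is its replacement.) For example: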
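To confirm that a particular TIFF will trip this check before running tesseract, you can read its SamplesPerPixel field (TIFF tag 277) directly. A grayscale image with alpha, for example, stores 2 samples per pixel. The following is a minimal stdlib-only Python sketch, assuming a classic (non-BigTIFF) file and looking only at the first image directory (IFD). The names read_spp, has_supported_spp and make_test_tiff are hypothetical helpers, and SUPPORTED_SPP just copies the set quoted in the error message.

```python
import struct
import sys

SPP_TAG = 277              # TIFF SamplesPerPixel tag
SUPPORTED_SPP = {1, 3, 4}  # set quoted in the Leptonica error message


def read_spp(data: bytes) -> int:
    """Return SamplesPerPixel from the first IFD of a classic TIFF byte string."""
    if data[:2] == b"II":
        end = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        end = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(end + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(end + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _n = struct.unpack(end + "HHI", data[start:start + 8])
        if tag == SPP_TAG:
            # SamplesPerPixel is one SHORT, left-justified in the 4-byte value field
            (spp,) = struct.unpack(end + "H", data[start + 8:start + 10])
            return spp
    return 1  # TIFF default when the tag is absent


def has_supported_spp(data: bytes) -> bool:
    """True if the sample count is in the set the error message accepts."""
    return read_spp(data) in SUPPORTED_SPP


def make_test_tiff(spp: int, big_endian: bool = False) -> bytes:
    """Build a header plus a one-entry IFD (test data only, not a viewable image)."""
    end, order = (">", b"MM") if big_endian else ("<", b"II")
    header = order + struct.pack(end + "HI", 42, 8)  # IFD starts at offset 8
    entry = struct.pack(end + "HHIHH", SPP_TAG, 3, 1, spp, 0)  # type 3 = SHORT
    return header + struct.pack(end + "H", 1) + entry + struct.pack(end + "I", 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_spp.py a.tif   (script name is only an example)
    with open(sys.argv[1], "rb") as f:
        spp = read_spp(f.read())
    print(f"{sys.argv[1]}: spp={spp}", "ok" if spp in SUPPORTED_SPP else "not in {1,3,4}")
```

Reading the raw header avoids depending on any imaging library, at the cost of ignoring multi-page files beyond the first directory.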

convert input.tiff -fill white -draw 'rectangle 10,10 20,20' -background white -flatten +matte output.tiff
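Running OCR on a TIFF written this way no longer produces the error above. Applied to the conversion at the start of this post, the command becomes:

convert a.png -background white -flatten +matte a.tif
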
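As a rough model of what -background white -flatten does to the pixel data, the minimal Python sketch below composites 8-bit pixels (last sample = alpha) over opaque white using standard "over" compositing, then drops the alpha sample. This is an illustrative assumption, not ImageMagick's actual implementation, and flatten_over_white is a hypothetical name.

```python
def flatten_over_white(pixels):
    """Composite 8-bit pixels whose last sample is alpha onto opaque white.

    Works for gray+alpha (2 samples -> 1) and RGBA (4 samples -> 3).
    Each colour sample becomes round((a*c + (255 - a)*255) / 255),
    computed with integer arithmetic (the +127 rounds to nearest).
    """
    flattened = []
    for px in pixels:
        *colour, a = px  # split colour samples from the trailing alpha
        flattened.append(
            tuple((a * c + (255 - a) * 255 + 127) // 255 for c in colour)
        )
    return flattened


# A fully transparent gray pixel turns white; an opaque one is unchanged.
print(flatten_over_white([(0, 0), (200, 255)]))
```

After flattening, a gray+alpha image has 1 sample per pixel and an RGBA image has 3, and neither carries an alpha channel any more.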
Source of the reference and translation:
http://stackoverflow.com/questions/5083492/problem-with-tesseract-and-tiff-format



