Tesseract-OCR Character Recognition: Sample Training

Source: Internet   Editor: 程序博客网   Date: 2024/06/03 08:38

Detailed training instructions: http://blog.csdn.net/firehood_/article/details/8433077

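1. Prepare the cropped sample images (JPG format), run the jTessBoxEditor tool, and choose Tools > Merge TIFF from the menu bar. In the dialog that opens, select the sample images (hold Shift to select several) and merge them into the file num.font.exp0.tif.
2. Open a command prompt (cmd) in that folder and run: tesseract.exe num.font.exp0.tif num.font.exp0 batch.nochop makebox  The generated BOX file is num.font.exp0.box; it lists the characters Tesseract recognized and their coordinates.
3. Run jTessBoxEditor again, open the .tif file, and correct the recognized characters and their box positions on the image.
4. Run the batch (.bat) file to produce the traineddata.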
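For reference, each line of the BOX file from step 2 describes one character: the symbol, then the left, bottom, right and top pixel coordinates of its bounding box (origin at the bottom-left corner of the image), then the zero-based page number inside the multi-page TIFF. The Python sketch below reads such lines so you can spot-check the boxes before correcting them in jTessBoxEditor. `BoxEntry` and `parse_box_lines` are hypothetical helper names, not part of Tesseract, and the 5-field branch assumes an older box file without a page column.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BoxEntry:
    """One line of a Tesseract BOX file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_lines(lines: Iterable[str]) -> List[BoxEntry]:
    """Parse 'symbol left bottom right top [page]' lines, skipping blank ones."""
    entries: List[BoxEntry] = []
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # ignore empty lines
        if len(parts) == 6:
            symbol, *coords, page = parts
        elif len(parts) == 5:
            symbol, *coords = parts
            page = "0"  # assumed: older box files without a page column
        else:
            raise ValueError(f"line {lineno}: expected 5 or 6 fields, got {len(parts)}")
        left, bottom, right, top = (int(v) for v in coords)
        entries.append(BoxEntry(symbol, left, bottom, right, top, int(page)))
    return entries


if __name__ == "__main__":
    sample = ["1 12 5 30 40 0", "2 35 5 52 41 0"]
    for box in parse_box_lines(sample):
        print(box.symbol, box.width, box.height)  # prints "1 18 35", then "2 17 36"
```

To check a real file, pass it an open file object, e.g. `parse_box_lines(open("num.font.exp0.box", encoding="utf-8"))`; boxes with zero or negative width or height are the first ones worth fixing.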

rem Before running this batch file, create a font_properties file in this directory.
rem Each line: fontname italic bold fixed serif fraktur (0/1 flags), e.g. font 0 0 0 0 0
echo Run Tesseract for Training..
tesseract.exe num.font.exp0.tif num.font.exp0 nobatch box.train
echo Compute the Character Set..
unicharset_extractor.exe num.font.exp0.box
mftraining.exe -F font_properties -U unicharset -O num.unicharset num.font.exp0.tr
echo Clustering..
cntraining.exe num.font.exp0.tr
echo Rename Files..
rename normproto num.normproto
rename inttemp num.inttemp
rename pffmtable num.pffmtable
rename shapetable num.shapetable
echo Create Tessdata..
combine_tessdata.exe num.
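The batch file expects font_properties to exist already. Each line names a font followed by five 0/1 flags (italic, bold, fixed, serif, fraktur), and the font name has to match the [fontname] part of the training file names, which is "font" in num.font.exp0.tif. A minimal Python sketch that derives the name and writes the file, assuming every training file follows the [lang].[fontname].exp[num] pattern and the font has none of the five styles; all function names here are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumed naming pattern: [lang].[fontname].exp[num].<ext>, e.g. num.font.exp0.tif
_TRAINING_NAME = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.\w+$")


def font_name_from_training_file(filename: str) -> str:
    """Return the [fontname] part that font_properties must list."""
    match = _TRAINING_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a [lang].[fontname].exp[num] file name: {filename}")
    return match.group("font")


def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """Format '<font> <italic> <bold> <fixed> <serif> <fraktur>' with 0/1 flags."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font, *(str(int(flag)) for flag in flags)])


def write_font_properties(training_files: Iterable[str], path: str = "font_properties") -> None:
    """Write one line (all style flags 0) per distinct font found in the file names."""
    fonts = sorted({font_name_from_training_file(name) for name in training_files})
    text = "\n".join(font_properties_line(font) for font in fonts) + "\n"
    Path(path).write_text(text, encoding="utf-8")
```

Running `write_font_properties(["num.font.exp0.tif"])` in the training folder would create a font_properties file containing the single line `font 0 0 0 0 0`.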

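5. Copy the resulting num.traineddata into the tesseract-ocr\tessdata folder.
6. In a command prompt, run: tesseract.exe number.jpg result -l num  (to check the recognition rate; num is the prefix of the trained traineddata file). The recognized text is written to result.txt.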
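Tesseract itself does not print a recognition rate; step 6 only writes the recognized text to result.txt, which you then compare with the text you expect. One way to put a number on it, sketched in Python under the assumption that you type the expected text yourself: character accuracy = 1 - (edit distance / length of the expected text), ignoring whitespace and clamped at 0. This metric and the helper names are assumptions for illustration, not something the original post specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def char_accuracy(expected: str, recognized: str, ignore_whitespace: bool = True) -> float:
    """Return 1 - edits/len(expected), clamped to [0, 1]."""
    if ignore_whitespace:
        expected = "".join(expected.split())
        recognized = "".join(recognized.split())
    if not expected:
        return 1.0 if not recognized else 0.0
    errors = levenshtein(expected, recognized)
    return max(0.0, 1.0 - errors / len(expected))
```

After step 6 you could call, for example, `char_accuracy("0123456789", Path("result.txt").read_text(encoding="utf-8"))` (with `from pathlib import Path`), where "0123456789" is a hypothetical stand-in for whatever number.jpg actually shows.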

0 0