Text Recognition with Tesseract-OCR in VS2010

Source: Internet | Editor: 程序博客网 | Time: 2024/04/29 07:41

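Recently I have been working on a project to recognize text on castings, which requires reading the letters and digits on cast parts. I found the open-source recognition library Tesseract, and here is a brief note on how to use it.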

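First, download the library and the matching language data from the project home page, http://code.google.com/p/tesseract-ocr/. I use VS2010, but the lib, include, and other files provided there were built with VS2008, so I kept getting errors. If you use VS2010, you can download libraries prebuilt for VS2010 here.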

Then configure the project the same way as for any other library: set up the include directory, the lib files, and the DLLs.

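#include "allheaders.h"
#include "baseapi.h"
#include "strngs.h"
#include <cv.h>
#include <highgui.h>
#include <tchar.h>
#include <cstdio>
#include <cstdlib>
#include <iostream>

using namespace cv;
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    const char *image_path = "zj.jpg";

    // Initialize the engine with the English language data.
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng", tesseract::OEM_DEFAULT) != 0) {
        printf("Could not initialize Tesseract.\n");
        exit(1);
    }
    api.SetPageSegMode(tesseract::PSM_AUTO);

    // Check that the input file exists and can be opened.
    FILE* fin = fopen(image_path, "rb");
    if (fin == NULL) {
        printf("Cannot open input file: %s\n", image_path);
        exit(2);
    }
    fclose(fin);

    // Check that Leptonica can decode the image format.
    PIX *pixs;
    if ((pixs = pixRead(image_path)) == NULL) {
        printf("Unsupported image type.\n");
        exit(3);
    }
    pixDestroy(&pixs);

    // Run recognition on the whole file and collect the text.
    STRING text_out;
    if (!api.ProcessPages(image_path, NULL, 0, &text_out)) {
        printf("Error during processing.\n");
    }
    cout << "Recognition result: " << text_out.string();

    return 0;
}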
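Since the project only cares about the letters and digits on a casting, the raw OCR text may need some cleanup before use. Below is a minimal sketch of one way to do that. KeepAlnum is a hypothetical helper of my own, not part of Tesseract, and it assumes that only ASCII letters and digits matter and that everything else (line breaks, punctuation, stray noise) can be treated as a separator.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of Tesseract): keeps only ASCII letters and
// digits from raw OCR output, and turns each run of other characters
// (whitespace, punctuation, non-ASCII bytes) into a single space.
// Leading and trailing separators are dropped.
std::string KeepAlnum(const std::string& raw) {
    std::string out;
    bool pending_space = false;  // true if a separator was seen since the last kept character
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(raw[i]);
        if (c < 128 && std::isalnum(c)) {
            if (pending_space && !out.empty()) {
                out += ' ';
            }
            out += static_cast<char>(c);
            pending_space = false;
        } else {
            pending_space = true;
        }
    }
    return out;
}
```

With the program above, this could be called as KeepAlnum(text_out.string()) before printing. Note that a hyphen also counts as a separator here, so "A-12" comes out as "A 12"; adjust the test inside the loop if your markings use such characters.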

0 0