Compiling Tesseract OCR 1.03

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 08:48


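Tesseract was originally written at HP and is now open source. It supports English letters and digits, and its recognition accuracy is reportedly ranked third in the world. http://sourceforge.net/projects/tesseract-ocr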

Building on Linux:
./configure
make
make install

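Errors encountered:
The build fails with two kinds of errors.
The first is a type-conversion error: an invalid conversion from const char* to char*, which mostly shows up around calls to the str* family of functions. The fix is to cast the first argument to (char*).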
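To illustrate the first kind of error: the snippet below is a minimal sketch, not taken from the Tesseract sources. It assumes a C++ standard library that provides const-correct overloads of strrchr, so that passing a const char* yields a const char* result, which cannot be assigned to a char*. The function name find_extension is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not from Tesseract: returns a pointer to the last
// '.' in name, or NULL if there is none.
char *find_extension(const char *name) {
    assert(name != NULL);
    // Without the cast, a const-correct <cstring> picks the overload that
    // returns const char*, and the assignment below fails to compile:
    //   char *dot = strrchr(name, '.');   // invalid conversion
    // Casting the first argument selects the char* overload instead.
    char *dot = strrchr((char *)name, '.');
    return dot;
}
```

The cast discards const, so the caller must not write through the returned pointer if the original string is read-only. Where the surrounding code allows it, declaring the result as const char* is the cleaner alternative.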
The second kind occurs where C++ code refers to symbols declared in C code. The fix is shown in the following patches:
Patch 1: cutil/globals.h

# diff -C 3 ./cutil/globals.h~ ./cutil/globals.h
*** ./cutil/globals.h~  2007-05-15 20:13:26.000000000 -0500
--- ./cutil/globals.h   2007-06-16 04:27:42.000000000 -0500
***************
*** 45,53 ****
  extern int debugs[MAXPROC];      /*debug flags */
  extern int plots[MAXPROC];       /*plot flags */
  extern int corners[4];           /*corners of scan window */
  extern int optind;               /*option index */
  extern char *optarg;             /*option argument */
!                                  /*image file name */
  extern char imagefile[FILENAMESIZE];
                                   /* main directory */
  extern char directory[FILENAMESIZE];
--- 45,58 ----
  extern int debugs[MAXPROC];      /*debug flags */
  extern int plots[MAXPROC];       /*plot flags */
  extern int corners[4];           /*corners of scan window */
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
  extern int optind;               /*option index */
  extern char *optarg;             /*option argument */
! #ifdef __cplusplus
! }
! #endif                               /*image file name */
  extern char imagefile[FILENAMESIZE];
                                   /* main directory */
  extern char directory[FILENAMESIZE];

Patch 2: cutil/tordvars.h

# diff -C 3 ./cutil/tordvars.h~ ./cutil/tordvars.h
*** ./cutil/tordvars.h~ 2007-05-16 16:33:53.000000000 -0500
--- ./cutil/tordvars.h  2007-06-16 04:25:43.000000000 -0500
***************
*** 39,44 ****
--- 39,46 ----
  extern FILE *correct_fp;                    //correct text
  extern FILE *matcher_fp;
 
+ extern "C"
+ {
  extern int blob_skip;                       /* Skip to next selection */
  extern int num_word_choices;                /* How many words to keep */
  extern int similarity_enable;               /* Switch for Similarity */
***************
*** 50,55 ****
--- 52,58 ----
  extern int show_bold;                       /* Use bold text */
  extern int display_text;                    /* Show word text */
  extern int display_blocks;                  /* Show word as boxes */
+ }
 
  extern float overlap_threshold;             /* Overlap Threshold */
  extern float certainty_threshold;           /* When to quit looking */

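Testing:
Run it on the sample image file: tesseract.exe phototest.tif abc batch
The output is written to abc.txt. Surprisingly, the recognition rate was 100%. Images you produce yourself will not necessarily score that high.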
