4 Tesseract-ocr Series: A Comparison of Open-Source OCR
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 18:06
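This is a brief survey of open-source OCR. Below are some OCR resources found online.
Chinese-language references:
What is the best open-source or open-API OCR engine?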
https://www.zhihu.com/question/22417946
It lists 4 OCR options.
Contents:
1. Tesseract-OCR (Google)
2. Azure (Microsoft)
3. ABBYY Real-Time Recognition SDK
4. ocr space
OCR: the major open-source libraries
http://blog.csdn.net/qianliheshan/article/details/48974927
Which software has the highest recognition rate for Chinese OCR?
https://www.zhihu.com/question/19593313
An analysis of open-source OCR
http://blog.csdn.net/luojun2007/article/details/51614133
International references:
What are the best open source OCR libraries?
https://www.quora.com/What-are-the-best-open-source-OCR-libraries
Are you looking for programming libraries, or would OCR software also work for you?
OCR libraries
1) Python: pyocr and Tesseract OCR called from Python
2) R: extracting text from PDFs and doing OCR, all within R
Free OCR Software
1. Google's and HP's Tesseract
2. Google Keep
3. Microsoft Office Document Imaging (MODI) (assuming most of us run Windows)
4. Microsoft OneNote
5. Microsoft Project Oxford API (free for the time being)
6. FreeOCR (again based on the Tesseract engine)
There are many more, but these are the best. Of these, if accuracy is what you are after, MODI does a better job; for converting handwritten text, Google Keep does a better job.
Commercial Products
1. Adobe Acrobat Pro (exporting to RTF gives the best results)
2. Captiva
3. ABBYY
4. Informatica (not sure which module within Informatica)
5. IBM Datacapture (Datacap) (IBM Watson)
If accuracy is your only constraint, there is Captricity ("Unprecedented Data Access at Your Service"), which boasts 99% accuracy because it crowdsources the work, having people transcribe handwritten text without compromising security.
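The answers above compare engines by "accuracy" without saying how it is measured. One common way to put a number on it is character-level accuracy: one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The following is a minimal sketch under that assumption; the function names `levenshtein` and `character_accuracy` are made up for illustration and are not taken from any of the tools listed here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (edit distance / reference length), clamped at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    error_rate = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - error_rate)


if __name__ == "__main__":
    # Hypothetical example: the engine misread one of four characters.
    print(character_accuracy("abcd", "abxd"))
```

Running the same reference images through each candidate engine and comparing the scores gives a like-for-like comparison, which is more useful than a vendor's headline figure.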
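About Tesseract-OCR (Google)
It supports more than 100 languages. The recognition rate with the bundled language data is not high, but the key point is that you can train it yourself to improve accuracy. It provides C and C++ APIs, and it is currently the most widely used.
For details on installation, usage, and training, see GitHub: https://github.com/tesseract-ocr/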
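As a quick way to try Tesseract without writing C or C++, the command-line program can be called from Python. The sketch below assumes the `tesseract` executable is installed and on the PATH, and that the language data you ask for (for example `eng` or `chi_sim`) is installed as well. The helper names `build_tesseract_command` and `run_ocr` are made up for this example. The `--psm` spelling is the one used by Tesseract 4 and later; older releases used `-psm`.

```python
import shutil
import subprocess
import sys
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: str = "eng",
                            psm: Optional[int] = None) -> List[str]:
    """Build the argument list for a tesseract CLI call.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    command = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # Page segmentation mode; flag spelling assumes Tesseract 4+.
        command += ["--psm", str(psm)]
    return command


def run_ocr(image_path: str, lang: str = "eng",
            psm: Optional[int] = None) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ocr_sketch.py path/to/image.png [lang]
    language = sys.argv[2] if len(sys.argv) > 2 else "eng"
    print(run_ocr(sys.argv[1], lang=language))
```

Shelling out to the CLI keeps the example free of third-party Python packages. Wrappers such as pyocr, mentioned in the list above, are an alternative if you prefer a library interface.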