OCR Project Roundup

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 08:13

Basic introduction

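1 What algorithms does OCR text recognition use? | Zhihu
2 A survey of deep learning text recognition papers | CSDN; the papers covered in the survey are all fairly old
3 Text detection and recognition resources | CSDN; the papers covered are all recent, five stars
4 Awesome Scene Text Recognition, awesome, five stars
5 OCR; everything from this blogger is high quality, five stars
6 YunOS scene text recognition | Alibaba Cloud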

Papers

Reading text in the wild, VGG group
1 Reading Text in the Wild with Convolutional Neural Networks, VGG group, IJCV 2016
Reading notes | CSDN
2 Synthetic Data for Text Localisation in Natural Images, VGG group, CVPR 2016
Reading notes | CSDN, code
3 Deep Features for Text Spotting, VGG group, ECCV 2014
4 Detecting Text in Natural Image with Connectionist Text Proposal Network, code, ECCV 2016

CVPR 2017 related papers

  • Awesome Typography: Statistics-Based Text Effects Transfer, text generation with very striking results
  • EAST: An Efficient and Accurate Scene Text Detector, fast and accurate scene text detection
  • Detecting Oriented Text in Natural Images by Linking Segments
  • Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection
  • Unambiguous Text Localization and Retrieval for Cluttered Scenes, text localization and retrieval

Datasets

  • MSRA Text Detection 500 Database (MSRA-TD500)
  • The Street View Text Dataset
  • The Street View House Numbers (SVHN) Dataset
  • NEOCR: Natural Environment OCR Dataset
  • KAIST Scene Text Database
  • ICDAR 2003 Robust Reading Competitions
  • ICDAR 2005 Robust Reading Competitions
  • ICDAR 2011
  • ICDAR 2013 Robust Reading Competition
  • COCO-Text: Dataset for Text Detection and Recognition

GitHub code

1 tesseract, 12k stars, C/C++ interface
2 tesseract.js, 12k stars, pure JS, OCR supporting 62 languages
3 paperless, 3.6k stars, focused on document OCR
4 pyocr, 606 stars, a Python wrapper for Tesseract and Cuneiform
5 doc2text, 1k stars, depends on OpenCV and Tesseract
6 pdftabextract, 668 stars, extracts tables from PDFs and converts them to Excel
7 tesserocr, a Python interface to the tesseract-ocr API
8 SSD_scene_text_detection, applies SSD to scene text detection

Reproduction targets:
1 paper: Reading Text in the Wild with Convolutional Neural Networks
Reading notes: "Paper reading: Reading Text in the Wild with Convolutional Neural Networks",
part of the code is available at code | MATLAB

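The main idea of the paper is to first use region proposals to generate a sufficiently large set of candidate regions, then resize those candidate boxes to a fixed size, and finally use a single CNN to classify each box as a word, over a dictionary of more than 90k words. Training on synthetically generated images containing text guarantees enough samples for every word.
The approach is clear, and its limitations are just as obvious: words outside the training dictionary, such as some compound words, cannot be recognized; in addition, each candidate box has to contain the whole word.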
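
The propose, resize, classify flow described above can be sketched in a few lines of pure Python. This is only a toy illustration under loud assumptions: the three-word `LEXICON` stands in for the 90k-word dictionary, sliding windows stand in for a real region proposal method, and a nearest-prototype matcher stands in for the CNN. All function names and sizes here are hypothetical and are not taken from the paper or its code.

```python
import random

LEXICON = ["exit", "open", "stop"]  # stand-in for the ~90k-word dictionary
FIXED_W, FIXED_H = 8, 4             # fixed classifier input size (hypothetical)


def sliding_window_proposals(img_w, img_h, box_w, box_h, stride):
    """Candidate boxes (x, y, w, h); a stand-in for a real proposal method."""
    return [(x, y, box_w, box_h)
            for y in range(0, img_h - box_h + 1, stride)
            for x in range(0, img_w - box_w + 1, stride)]


def crop_and_resize(image, box, out_w=FIXED_W, out_h=FIXED_H):
    """Crop a box from a 2D list image and resize it by nearest-neighbour sampling."""
    x, y, w, h = box
    return [[image[y + (r * h) // out_h][x + (c * w) // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


def make_prototypes(lexicon, seed=0):
    """One random 'appearance' per word; stands in for learned CNN weights."""
    rng = random.Random(seed)
    return {word: [[rng.random() for _ in range(FIXED_W)] for _ in range(FIXED_H)]
            for word in lexicon}


def classify(patch, prototypes):
    """Return (word, score), where score is the negative squared distance."""
    best_word, best_score = None, float("-inf")
    for word, proto in prototypes.items():
        dist = sum((p - q) ** 2
                   for prow, qrow in zip(patch, proto)
                   for p, q in zip(prow, qrow))
        if -dist > best_score:
            best_word, best_score = word, -dist
    return best_word, best_score


def spot_word(image, prototypes, box_w, box_h, stride):
    """Score every proposal and keep the best (box, word, score)."""
    best = None
    for box in sliding_window_proposals(len(image[0]), len(image), box_w, box_h, stride):
        word, score = classify(crop_and_resize(image, box), prototypes)
        if best is None or score > best[2]:
            best = (box, word, score)
    return best


def paste_upscaled(image, patch, x, y, scale):
    """Draw a patch into the image at (x, y), enlarged by an integer scale."""
    for i in range(len(patch) * scale):
        for j in range(len(patch[0]) * scale):
            image[y + i][x + j] = patch[i // scale][j // scale]


# Synthetic test image: the "stop" prototype drawn at 2x size on a blank canvas.
prototypes = make_prototypes(LEXICON)
image = [[0.0] * 20 for _ in range(12)]
paste_upscaled(image, prototypes["stop"], x=4, y=2, scale=2)

box, word, score = spot_word(image, prototypes, box_w=16, box_h=8, stride=2)
print(box, word)
```

Note that `classify` always returns some entry of `LEXICON`, whatever the patch contains, which mirrors the out-of-dictionary limitation mentioned above. Likewise, the exact match only occurs for the proposal that covers the whole pasted word.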

2 paper: EAST: An Efficient and Accurate Scene Text Detector
The latest work from Megvii (旷视).
