Comparison of optical character recognition software

Source: Internet; Editor: 程序博客网; Time: 2024/05/06 13:55

Disclaimer: This article is not the author's original work; it is reproduced from: http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software


This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)

| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? | Text, hOCR,[2] others with different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google.[4] It was one of the top 3 engines in the 1995 UNLV Accuracy test. |
| ExperVision[5] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 |  | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[6][citation needed] PC Magazine wrote that "the speed of ExperVision’s OpenRTK is four to eight times faster than competition"[7] but also "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats."[8] |
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[9] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[10] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[11] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? |  | Works with structured, semi-structured, and unstructured documents. |
| Aquaforest OCR SDK | 2001 | 1.41 | 2013 | Proprietary | Yes[12] | Yes | No | No | No | C#, VB.NET, ASP.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[13] | PDF, PDF/A, RTF, TXT | Aquaforest's[14] OCR SDK for .NET[15] enables developers to directly make use of the Aquaforest OCR engine in their own applications and create searchable PDFs, RTF or text files from TIFFs, Bitmaps and Image-Only PDFs. |
| LEADTOOLS[16] | 1990[17] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[18] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[19] | Supports Latin, Asian, Arabic, and MICR character sets.[16] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[20] ICR (handwritten text recognition) is supported.[21] |
| CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[22] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
| Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? |  |  |
| Image to OCR Converter | 2010[23] | 1.2[24] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command Line | 40 | ? | Searchable PDF, Text-Only PDF, Word, HTML, Text[25] | It can read most image formats and PDF files, and can scan images from scanner or camera.[26][27] |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |  |
| Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[28] | ? | PDF, TXT | Dynamsoft is the leading provider of image capture SDKs and version control tools. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | Yes | No | C/C++, C#[29] | Yes | ? | ? |  | Product of Nuance Communications |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  |  |
| FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | [30] |
| GOCR | ? | 0.49 | 2010 | GPL | Yes[31] | Yes | Yes | Yes | Yes | C | ? | ? | ? |  |  |
| Ocrad | ? | 0.21[32] | 2011 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? |  | Command line |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? |  | For musical scores |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Uses OmniPage[citation needed] |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font |  | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
| Scantron | ? | Cognition | ?? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? |  | For working with localized interfaces, corresponding language support is required. |
| OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? |  | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
| OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | hOCR, HTML, TXT[33] | Pluggable framework under active development, used for Google Books |
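
Several entries above, such as OCRFeeder, work by handing images to a system-wide OCR engine like Tesseract, and several engines list hOCR among their output formats. The following is a minimal Python sketch of that pattern, not the implementation used by any listed product. It assumes the `tesseract` command-line program is installed and on the PATH, and that the engine marks words with the hOCR class `ocrx_word` and a `bbox` entry in the `title` attribute. The names `build_tesseract_command`, `run_ocr`, `parse_bbox` and `HocrWordParser` are hypothetical helpers introduced only for illustration.

```python
import shutil
import subprocess
from html.parser import HTMLParser


def build_tesseract_command(image_path, output_base, lang="eng", hocr=False):
    """Return the argv list for: tesseract IMAGE OUTPUTBASE -l LANG [hocr]."""
    command = ["tesseract", image_path, output_base, "-l", lang]
    if hocr:
        # "hocr" is a Tesseract config file name that requests hOCR output.
        command.append("hocr")
    return command


def run_ocr(image_path, output_base, lang="eng", hocr=False):
    """Run Tesseract if it is installed; return True when it exits with 0."""
    if shutil.which("tesseract") is None:
        return False  # no system-wide engine available on this machine
    command = build_tesseract_command(image_path, output_base, lang, hocr)
    return subprocess.run(command).returncode == 0


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title such as 'bbox 1 2 3 4; ...'."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) >= 5 and parts[0] == "bbox":
            return tuple(int(value) for value in parts[1:5])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from hOCR spans of class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attributes.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested <span> elements.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False
```

A front end could call `run_ocr("page.png", "page", hocr=True)` and then feed the resulting hOCR file to `HocrWordParser` to obtain word positions; the file names here are placeholders. Keeping command construction separate from execution lets the same front end swap in another command-line engine, such as Ocrad, by replacing only the builder function.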