Tesseract OCR Tutorial

from: https://www.raywenderlich.com/93276/implementing-tesseract-ocr-ios

 Lyndsey Scott 

Code your way into his/her heart this Valentine’s Day!

Update 01/26/2016: Updated for Xcode 7.2 and Swift 2.1.

Though I originally wrote this tutorial for Valentine’s Day, OCR can bring you love year-round. ;]

You’ve undoubtedly seen it before… It’s widely used to process everything from scanned documents, to the handwritten scribbles on your tablet PC, to the Word Lens technology Google recently added to their Translate app. And today you’ll learn to use it in your very own iPhone app! Pretty neat, huh?

So… what is it?

What is OCR?

Optical Character Recognition, or OCR, is the process of electronically extracting text from images and reusing it in a variety of ways such as document editing, free-text searches, or compression.

In this tutorial, you’ll learn how to use Tesseract, an open source OCR engine maintained by Google.

Introducing Tesseract

Tesseract OCR is quite powerful, but does have the following limitations:

  • Unlike some OCR engines (like those used by the U.S. Postal Service to sort mail), Tesseract is unable to recognize handwriting and is limited to about 64 fonts in total.
  • Tesseract requires a bit of preprocessing to improve the OCR results; images need to be scaled appropriately, have as much image contrast as possible, and have horizontally-aligned text.
  • Finally, Tesseract OCR only works on Linux, Windows, and Mac OS X.

Wait, WHAT?

Uh oh…how are you going to use this in iOS? Luckily, there’s an Objective-C wrapper for Tesseract OCR, which can also be used in Swift and iOS. Don’t worry, this Swift-compatible version is the one included in the starter package!

Phew! :]

The App: Love In A Snap

You didn’t think the team here at Ray Wenderlich would let you down this upcoming Valentine’s Day, did you? Of course not! We’ve got your back. We’ve managed to figure out the sure-fire way to impress your true heart’s desire. And you’re about to build the app to make it happen.

U + OCR = LUV

You’ll work on the Love In A Snap app, which lets you take a picture of a love poem and “make it your own” by replacing the name of the original poet’s muse with the name of the object of your own affection. Brilliant! Get ready to impress.

Getting Started

Download the starter project package here and extract it to a convenient location.

The archive contains the following folders:

  • LoveInASnap: The Xcode starter project for this tutorial.
  • Tesseract Resources: The Tesseract framework and language data.
  • Image Resources: Sample images containing text that you’ll use later.

Looking at the starter LoveInASnap.xcodeproj, you’ll notice that ViewController.swift has been pre-populated with a few @IBOutlets and empty @IBAction methods which link the view controller to its pre-made Main.storyboard interface.

Following those empty methods, you’ll see two pre-coded functions which handle showing and removing the view’s activity indicator:

func addActivityIndicator() {
  activityIndicator = UIActivityIndicatorView(frame: view.bounds)
  activityIndicator.activityIndicatorViewStyle = .WhiteLarge
  activityIndicator.backgroundColor = UIColor(white: 0, alpha: 0.25)
  activityIndicator.startAnimating()
  view.addSubview(activityIndicator)
}

func removeActivityIndicator() {
  activityIndicator.removeFromSuperview()
  activityIndicator = nil
}

Next there are several more methods which move the elements of the view in order to prevent the keyboard from blocking active text fields:

func moveViewUp() {
  if topMarginConstraint.constant != originalTopMargin {
    return
  }

  topMarginConstraint.constant -= 135
  UIView.animateWithDuration(0.3, animations: { () -> Void in
    self.view.layoutIfNeeded()
  })
}

func moveViewDown() {
  if topMarginConstraint.constant == originalTopMargin {
    return
  }

  topMarginConstraint.constant = originalTopMargin
  UIView.animateWithDuration(0.3, animations: { () -> Void in
    self.view.layoutIfNeeded()
  })
}

Finally, the remaining methods appropriately trigger keyboard resignation and calls to moveViewUp() and moveViewDown() depending on user action:

@IBAction func backgroundTapped(sender: AnyObject) {
  view.endEditing(true)
  moveViewDown()
}

func textFieldDidBeginEditing(textField: UITextField) {
  moveViewUp()
}

@IBAction func textFieldEndEditing(sender: AnyObject) {
  view.endEditing(true)
  moveViewDown()
}

func textViewDidBeginEditing(textView: UITextView) {
  moveViewDown()
}

Although important to the app’s UX, these methods are the least relevant to this tutorial, and as such they’ve been pre-populated for you, letting you get to the fun coding nitty-gritty right away.

But before writing your first line of code, build and run the starter code; click around a bit in the app to get a feel for the UI. The text view isn’t editable at present, and tapping the text fields simply summons and dismisses the keyboard. Your job is to bring this app to life!

Adding the Tesseract Framework

Inside the starter ZIP file you unpacked should be a Tesseract Resources folder, which contains the Tesseract framework as well as the tessdata folder that holds English and French language recognition data.

Open that folder in the Finder and add TesseractOCR.framework to your project by dragging it to Xcode’s Project navigator. Make sure Copy items if needed is checked.

Adding the Tesseract framework

Finally, click Finish to add the framework.

Now you’ll need to add the tessdata folder as a referenced folder so the internal folder structure is maintained. Drag the tessdata folder from the Finder to the Supporting Files group in the Project navigator.

Again, make sure Copy items if needed is checked and also make sure that the Added Folders option is set to Create folder references.

Adding tessdata as a referenced folder

Finally, click Finish to add the data to your project. You’ll see a blue tessdata folder appear in the Project navigator; the blue color tells you that the folder is a folder reference rather than an Xcode group.
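
If you ever want to confirm at runtime that the referenced folder actually made it into your app bundle, here’s a quick diagnostic you could drop anywhere in ViewController.swift. This check is just an illustration, not part of the tutorial’s code:

// Quick diagnostic (not part of the tutorial's code): check that the
// referenced tessdata folder was copied into the app bundle.
if let tessdataPath = NSBundle.mainBundle().pathForResource("tessdata", ofType: nil) {
  print("tessdata found at: \(tessdataPath)")
} else {
  print("tessdata folder is missing from the bundle")
}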

Since Tesseract requires libstdc++.6.0.9.dylib (or libstdc++.6.0.9.tbd if libstdc++.6.0.9.dylib is unavailable in your version of Xcode) and CoreImage.framework, you’ll need to link both of these libraries in.

Select the LoveInASnap project file and the LoveInASnap target. In the General tab, scroll down to Linked Frameworks and Libraries.

There should be only one file here: TesseractOCR.framework, which you just added. Click the + button underneath the list. Find both libstdc++.6.0.9.dylib (or libstdc++.6.0.9.tbd) and CoreImage.framework and add them to your project.

Then, in the tab bar at the top of the project editor, click Build Settings (next to Build Phases). Find Other Linker Flags using the search bar at the top of the table and append -lstdc++ to any and all existing Other Linker Flags values. In that same Build Settings table, find C++ Standard Library and make sure it’s set to “Compiler Default”; then (as of Swift 2.0) find Enable Bitcode and set it to No.

Wipe away those happy tears, Champ! Almost there! One step to go…

Finally, since Tesseract is an Objective-C framework, you’ll need to create an Objective-C bridging header to use the framework in your Swift app.

The easiest way to create an Objective-C bridging header and all the project settings to support it is to add any Objective-C file to your project.

Go to File\New\File…, select iOS\Source\Cocoa Touch Class and then click Next. Enter FakeObjectiveCClass as the Class name and choose NSObject as the subclass. Also, make sure the Language is set to Objective-C! Click Next, then Create.

When prompted Would you like to configure an Objective-C bridging header? select Yes.

You can chuck out those Objective-C classes! (For this tutorial at least…)

You’ve successfully created an Objective-C bridging header. You can delete FakeObjectiveCClass.m and FakeObjectiveCClass.h from the project now, since you really just needed the bridging header. :]

To import the Tesseract framework into your new bridging header, find LoveInASnap-Bridging-Header.h in the Project navigator, open it, then add the following line:

#import <TesseractOCR/TesseractOCR.h>

Now you will have access to the Tesseract framework throughout your project. Build and run your project to make sure everything still compiles properly.
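
As an optional sanity check (not part of the tutorial’s code), the gali8 wrapper exposes the underlying Tesseract engine’s version as a class method; you could log it from somewhere like viewDidLoad() to confirm the framework and bridging header are wired up:

// Optional sanity check (not part of the tutorial's code): logs the
// underlying Tesseract version once the bridging header is in place.
print("Tesseract version: \(G8Tesseract.version())")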

All good? Now you can get started with the fun stuff!

Loading up the Image

Of course, the first thing you’ll need for your OCR app is a mechanism to load an image to process. The easiest way to do this is to use an instance of UIImagePickerController to select an image from the camera or Photo Library.

Open ViewController.swift and replace the existing stub of takePhoto() with the following implementation:

@IBAction func takePhoto(sender: AnyObject) {
  // 1
  view.endEditing(true)
  moveViewDown()
  // 2
  let imagePickerActionSheet = UIAlertController(title: "Snap/Upload Photo",
    message: nil, preferredStyle: .ActionSheet)
  // 3
  if UIImagePickerController.isSourceTypeAvailable(.Camera) {
    let cameraButton = UIAlertAction(title: "Take Photo",
      style: .Default) { (alert) -> Void in
        let imagePicker = UIImagePickerController()
        imagePicker.delegate = self
        imagePicker.sourceType = .Camera
        self.presentViewController(imagePicker,
          animated: true,
          completion: nil)
    }
    imagePickerActionSheet.addAction(cameraButton)
  }
  // 4
  let libraryButton = UIAlertAction(title: "Choose Existing",
    style: .Default) { (alert) -> Void in
      let imagePicker = UIImagePickerController()
      imagePicker.delegate = self
      imagePicker.sourceType = .PhotoLibrary
      self.presentViewController(imagePicker,
        animated: true,
        completion: nil)
  }
  imagePickerActionSheet.addAction(libraryButton)
  // 5
  let cancelButton = UIAlertAction(title: "Cancel",
    style: .Cancel) { (alert) -> Void in
  }
  imagePickerActionSheet.addAction(cancelButton)
  // 6
  presentViewController(imagePickerActionSheet, animated: true,
    completion: nil)
}

This code presents two or three options to the user depending on the capabilities of their device. Here’s what’s going on in more detail:

  1. If you’re currently editing either the text view or a text field, close the keyboard and move the view back to its original position.
  2. Create a UIAlertController with the action sheet style to present a set of capture options to the user.
  3. If the device has a camera, add the Take Photo button to imagePickerActionSheet. Selecting this button creates and presents an instance of UIImagePickerController with sourceType .Camera.
  4. Add a Choose Existing button to imagePickerActionSheet. Selecting this button creates and presents an instance of UIImagePickerController with sourceType .PhotoLibrary.
  5. Add a Cancel button to imagePickerActionSheet. Selecting this button cancels your UIImagePickerController, even though you don’t specify an action beyond setting the style as .Cancel.
  6. Finally, present your instance of UIAlertController.

Build and run your project; tap the Snap/Upload a picture of your Poem button and you should see your new UIAlertController presenting the capture options.

If you’re running on the simulator, there’s no physical camera available so you won’t see the “Take Photo” option.

As mentioned earlier in the list of Tesseract’s limitations, images must be within certain size constraints for optimal OCR results. If an image is too big or too small, Tesseract may return bad results or even, strangely enough, crash the entire program with an EXC_BAD_ACCESS error.

To that end, you’ll need a method that resizes the image while preserving its aspect ratio, so the image is distorted as little as possible.

Scaling Images to Preserve Aspect Ratio

The aspect ratio of an image is the proportional relationship between its width and height. Mathematically speaking, to reduce the size of the original image without affecting the aspect ratio, you must keep the width to height ratio constant.

Width1 / Height1 = Width2 / Height2

When you know both the height and the width of the original image, and you know either the desired height or width of the final image, you can rearrange the aspect ratio equation as follows:

Height2 = (Height1 / Width1) × Width2
Width2 = (Width1 / Height1) × Height2

You’ll use these two formulas to maintain the image’s aspect ratio in your scaling method. For example, scaling a 1000×800 image so that its larger dimension becomes 640 gives a height of (800 / 1000) × 640 = 512, producing a 640×512 result.

Still in ViewController.swift, add the following helper method to the class:

func scaleImage(image: UIImage, maxDimension: CGFloat) -> UIImage {

  var scaledSize = CGSize(width: maxDimension, height: maxDimension)
  var scaleFactor: CGFloat

  if image.size.width > image.size.height {
    scaleFactor = image.size.height / image.size.width
    scaledSize.width = maxDimension
    scaledSize.height = scaledSize.width * scaleFactor
  } else {
    scaleFactor = image.size.width / image.size.height
    scaledSize.height = maxDimension
    scaledSize.width = scaledSize.height * scaleFactor
  }

  UIGraphicsBeginImageContext(scaledSize)
  image.drawInRect(CGRectMake(0, 0, scaledSize.width, scaledSize.height))
  let scaledImage = UIGraphicsGetImageFromCurrentImageContext()
  UIGraphicsEndImageContext()

  return scaledImage
}

Given maxDimension, this method takes the height or width of the image — whichever is greater — and sets that dimension equal to the maxDimension argument. It then scales the other side of the image appropriately based on the aspect ratio, redraws the original image to fit into the newly calculated frame, then finally returns the newly scaled image back to the calling method.

Whew!

Now that you’ve gotten all of that out of the way (drumroll please…), you can get started with your Tesseract implementation!

Implementing Tesseract OCR

Find the UIImagePickerControllerDelegate class extension at the bottom of ViewController.swift and add the following method inside the extension:

func imagePickerController(picker: UIImagePickerController,
  didFinishPickingMediaWithInfo info: [String : AnyObject]) {
    let selectedPhoto = info[UIImagePickerControllerOriginalImage] as! UIImage
    let scaledImage = scaleImage(selectedPhoto, maxDimension: 640)

    addActivityIndicator()

    dismissViewControllerAnimated(true, completion: {
      self.performImageRecognition(scaledImage)
    })
}

imagePickerController(_:didFinishPickingMediaWithInfo:) is a UIImagePickerControllerDelegate method that returns the selected image information in an info dictionary object. You get the selected photo from info using the UIImagePickerControllerOriginalImage key and then scale it using scaleImage(_:maxDimension:).

You call addActivityIndicator() to disable user interaction and display an activity indicator while Tesseract does its work. You then dismiss your UIImagePickerController and pass the image to performImageRecognition() (which you’ll implement next!) for processing.

Next, add the following method to the main class declaration:

func performImageRecognition(image: UIImage) {
  // 1
  let tesseract = G8Tesseract()
  // 2
  tesseract.language = "eng+fra"
  // 3
  tesseract.engineMode = .TesseractCubeCombined
  // 4
  tesseract.pageSegmentationMode = .Auto
  // 5
  tesseract.maximumRecognitionTime = 60.0
  // 6
  tesseract.image = image.g8_blackAndWhite()
  tesseract.recognize()
  // 7
  textView.text = tesseract.recognizedText
  textView.editable = true
  // 8
  removeActivityIndicator()
}

This is where the OCR magic happens! Since this is the meat of this tutorial, here’s a detailed look at each part of the code in turn:

  1. Initialize tesseract to contain a new G8Tesseract object.
    Your poem vil impress vith French! Ze language ov looove! *Haugh* *Haugh* *Haugh*

  2. Tesseract will search for the .traineddata files of the languages you specify in this parameter; specifying eng and fra will search for “eng.traineddata” and “fra.traineddata”, which contain the data to detect English and French text respectively. The French trained data has been included in this project since the sample poem you’ll be using for this tutorial contains a bit of French (Très romantique!). The poem’s French accented characters aren’t in the English character set, so you need to link to the French .traineddata in order for those accents to appear; it’s also good to include the French data since there’s a component of .traineddata which takes language vocabulary into account.
  3. You can specify three different OCR engine modes: .TesseractOnly, which is the fastest, but least accurate method; .CubeOnly, which is slower but more accurate since it employs more artificial intelligence; and .TesseractCubeCombined, which runs both .TesseractOnly and .CubeOnly to produce the most accurate results — but as a result is the slowest mode of the three.
  4. Tesseract assumes by default that it’s processing a uniform block of text, but your sample image has multiple paragraphs. Tesseract’s pageSegmentationMode lets the Tesseract engine know how the text is divided, so in this case, set pageSegmentationMode to .Auto to allow for fully automatic page segmentation and thus the ability to recognize paragraph breaks.
  5. Here you set maximumRecognitionTime to limit the amount of time your Tesseract engine devotes to image recognition. However, only the Tesseract engine is limited by this setting; if you’re using the .CubeOnly or .TesseractCubeCombined engine mode, the Cube engine will continue processing even once your Tesseract engine has hit its maximumRecognitionTime.
  6. You’ll get the best results out of Tesseract when the text contrasts highly with the background. Tesseract has a built-in filter, g8_blackAndWhite(), that desaturates the image, increases the contrast, and reduces the exposure. Here, you’re assigning the filtered image to the image property of your Tesseract object before kicking off the Tesseract image recognition process.
  7. Note that the image recognition is synchronous, so at this point the text is available. You then put the recognized text into your textView and make the view editable so your user can edit it as she likes. (Because recognize() blocks the thread it runs on, you’ll find a background-queue variation sketched just after this list.)
  8. Finally, remove the activity indicator to signal that the OCR is complete and to let the user edit their poem.
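
As promised in step 7, here’s what a background-queue variation might look like. This sketch isn’t part of the tutorial’s code, and the method name performImageRecognitionInBackground is made up for illustration; it uses GCD to keep long recognitions from freezing the UI, hopping back to the main queue for the UIKit updates:

// A sketch (not part of the tutorial's code): run Tesseract on a
// background queue so long recognitions don't freeze the UI.
func performImageRecognitionInBackground(image: UIImage) {
  dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
    let tesseract = G8Tesseract()
    tesseract.language = "eng+fra"
    tesseract.engineMode = .TesseractCubeCombined
    tesseract.pageSegmentationMode = .Auto
    tesseract.maximumRecognitionTime = 60.0
    tesseract.image = image.g8_blackAndWhite()
    tesseract.recognize()
    // UIKit calls must happen on the main queue.
    dispatch_async(dispatch_get_main_queue()) {
      self.textView.text = tesseract.recognizedText
      self.textView.editable = true
      self.removeActivityIndicator()
    }
  }
}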

Now it’s time to test this first batch of code you’ve written and see what happens!

Processing Your First Image

The sample image for this tutorial can be found in Image Resources\Lenore.png.

Lenore.png contains an image of a love poem addressed to a “Lenore” — but with a few edits you can turn it into a poem that is sure to get the attention of the one you desire! :]

Although you could print a copy of the image, then snap a picture with the app to perform the OCR, make it easy on yourself and add the image to your device’s Camera Roll to eliminate the potential for human error, lighting inconsistencies, skewed text, and flawed printing among other things. If you’re using the Simulator, simply drag and drop the image file onto the Simulator.

Build and run your app; select Snap/Upload a picture of your Poem then select Choose Existing and choose the sample image from the Photo Library to begin Tesseract processing. You’ll have to allow your app to access the Photo Library the first time you run it, and you’ll see the activity indicator spinning away after you select an image.

And… Voila! Eventually, the deciphered text appears in the text view — and it looks like Tesseract did a great job with the OCR.

But if the apple of your eye isn’t named “Lenore”, he or she probably won’t appreciate this poem coming from you as it stands…and they’ll likely want to know who this “Lenore” character is! ;]

And considering “Lenore” appears quite often in the scanned text, customizing the poem to your tootsie’s liking is going to take a bit of work…

What’s that, you say? Yes, you COULD implement a great time-saving function to find and replace these words! Brilliant idea! The next section shows you how to do just that.

Finding and Replacing Text

Now that the OCR engine has turned the image into text in the text view, you can treat it as you would any other string!

Open ViewController.swift and you’ll see that there’s already a swapText() method ready for you, which is hooked up to the Swap button in your app. How convenient. :]

Replace the implementation of swapText() with the following:

@IBAction func swapText(sender: AnyObject) {
  // 1
  if let text = textView.text, let findText = findTextField.text,
    let replaceText = replaceTextField.text {
      // 2
      textView.text = text.stringByReplacingOccurrencesOfString(findText,
        withString: replaceText, options: [], range: nil)
      // 3
      findTextField.text = nil
      replaceTextField.text = nil
      // 4
      view.endEditing(true)
      moveViewDown()
  }
}

The above code is pretty straightforward, but take a moment to walk through it step-by-step.

  1. Only execute the swap code if textView, findTextField, and replaceTextField aren’t nil.
  2. If so, find all occurrences of the string you’ve typed into findTextField in the textView and replace them with the string you’ve entered in replaceTextField.
  3. Next, clear out the values in findTextField and replaceTextField once the replacements are complete.
  4. Finally, resign the keyboard and move the view back into the correct position. As before in takePhoto(), you’re ensuring the view stays positioned correctly when the keyboard goes away.

Note: Tapping the background also ends “editing” mode and moves the view into its original position. This is facilitated through a UIButton that lives behind the other elements of the interface, which triggers backgroundTapped() in ViewController.swift.

Build and run your app; select the sample image again and let Tesseract do its thing. Once the text appears, enter Lenore in the Find this… field (note that the searched text is case-sensitive), then enter your true love’s name in the Replace with… field, and tap Swap to complete the switch-a-roo.
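
If you’d rather not worry about matching letter case, a small variation on step 2 of swapText() (not part of the tutorial’s code) passes the .CaseInsensitiveSearch option:

// Variation (not part of the tutorial's code): case-insensitive replacement.
textView.text = text.stringByReplacingOccurrencesOfString(findText,
  withString: replaceText, options: [.CaseInsensitiveSearch], range: nil)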

Presto chango — you’ve created a love poem that is tailored to your sweetheart and your sweetheart alone.

Play around with the find and replace to replace other words and names as necessary; once you’re done — uh, what should you do with it once you’re done? Such artistic creativity and bravery shouldn’t live on your device alone; you’ll need some way to share your masterpiece with the world.

Sharing The Final Result

In this final section, you’ll create a UIActivityViewController to let your users share their new creations.

Replace the current implementation of sharePoem() in ViewController.swift with the following:

@IBAction func sharePoem(sender: AnyObject) {
  // 1
  if textView.text.isEmpty {
    return
  }
  // 2
  let activityViewController = UIActivityViewController(activityItems:
    [textView.text], applicationActivities: nil)
  // 3
  let excludeActivities = [
    UIActivityTypeAssignToContact,
    UIActivityTypeSaveToCameraRoll,
    UIActivityTypeAddToReadingList,
    UIActivityTypePostToFlickr,
    UIActivityTypePostToVimeo]
  activityViewController.excludedActivityTypes = excludeActivities
  // 4
  presentViewController(activityViewController, animated: true,
    completion: nil)
}

Taking each numbered comment in turn:

  1. If the textView is empty, don’t share anything.
  2. Otherwise, create a new instance of UIActivityViewController, put the text from the text view inside an array, and pass it in as the activity item to be shared.
  3. UIActivityViewController has a long list of built-in activity types. You can exclude UIActivityTypeAssignToContact, UIActivityTypeSaveToCameraRoll, UIActivityTypeAddToReadingList, UIActivityTypePostToFlickr, and UIActivityTypePostToVimeo since they don’t make much sense in this context.
  4. Finally, present your UIActivityViewController and let the user share their creation where they wish.

Build and run the app again, and run the image through Tesseract. You can do the find and replace steps again if you like and when you’re happy with the text, tap the share button.

That’s it! Your Love In A Snap app is complete — and sure to win over the heart of the one you adore.

Or if you’re anything like me, you’ll replace Lenore’s name with your own, send that poem to your inbox through a burner account, stay in alone on Valentine’s night, order in some Bibimbap, have a glass of wine, get a bit bleary-eyed, then pretend that email you received is from the Queen of England for an especially classy and sophisticated St. Valentine’s evening full of romance, comfort, mystery, and intrigue. But maybe that’s just me…

Where to Go From Here?

You can download the final version of the project here.

You can find the iOS wrapper for Tesseract on GitHub at https://github.com/gali8/Tesseract-OCR-iOS. You can also download more language data from Google’s Tesseract OCR site; use data versions 3.02 or higher to guarantee compatibility with the current framework.
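
For example, if you added a hypothetical spa.traineddata (Spanish) file to the referenced tessdata folder, you could request it alongside the existing languages in performImageRecognition():

// Hypothetical example: assumes spa.traineddata has been added to the
// referenced tessdata folder alongside eng.traineddata and fra.traineddata.
tesseract.language = "eng+fra+spa"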

Try out the app with other poems, songs, and snippets of text; try snapping some images with your camera as well as using images from your Photo Library. You’ll see how the OCR results vary between sources.

Examples of potentially problematic image inputs that can be corrected for improved results. Source: Google’s Tesseract OCR site

Remember: “Garbage In, Garbage Out”. The easiest way to improve the quality of the output is to improve the quality of the input. As Google notes on their Tesseract OCR site, dark or uneven lighting, image noise, skewed text orientation, and thick dark image borders can all contribute to less-than-perfect results.

You can look into image pre-processing, implement your own artificial intelligence logic such as neural networks, or use Tesseract’s own training tools to help your program learn from its errors and improve its success rate over time. And since even small variations in image brightness, color, contrast, or exposure can result in variations in output, you can run the image through multiple filters and then compare the results to determine the most accurate output. Chances are you’ll get the best results by using some or all of these strategies in combination, so play around with these approaches and see what works best for your application.
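
To make the multiple-filters idea concrete, here’s one way you might generate filter variants with Core Image, which you linked into the project earlier. This is a sketch under assumptions rather than part of the tutorial: the function name preprocessedImage and the contrast values to try are hypothetical, and you’d feed each variant to Tesseract and compare the recognized text:

// Sketch (not part of the tutorial's code): produce a desaturated,
// contrast-adjusted variant of an image to feed to Tesseract.
// If the file doesn't already see Core Image, add `import CoreImage` at the top.
func preprocessedImage(image: UIImage, contrast: Float) -> UIImage? {
  guard let ciImage = CIImage(image: image),
    filter = CIFilter(name: "CIColorControls") else { return nil }
  filter.setValue(ciImage, forKey: kCIInputImageKey)
  filter.setValue(0.0, forKey: kCIInputSaturationKey)    // desaturate to grayscale
  filter.setValue(contrast, forKey: kCIInputContrastKey) // e.g. try 1.1, 1.5, 2.0
  guard let output = filter.outputImage else { return nil }
  let context = CIContext(options: nil)
  let cgImage = context.createCGImage(output, fromRect: output.extent)
  return UIImage(CGImage: cgImage)
}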

Tesseract is pretty powerful as is, but the potential for OCR is unlimited. As you use and improve the capabilities of Tesseract OCR in your software, keep in mind that as a sensing, thinking being, if you’re capable of deciphering characters with your eyes or ears or even fingertips, you’re already a certifiable expert at character recognition, and you’re fully capable of teaching your computer so much more than it already knows.

As always, if you have comments or questions on this tutorial, Tesseract, or OCR strategies, feel free to join the discussion below!

Lyndsey Scott

Actress, Model, App Developer -- www.LyndseyScott.com
