Optical Character Recognizer Reference

Accessing the OCR System
Introduction
Main Screen
Lookup Words
- Basic Operation
- Camera Controls
Capture Flashcards
Capture Text
Still Image Recognition

Accessing the OCR System

The Pleco Optical Character Recognizer system is a paid add-on module; if purchased, you can access it by tapping on the menu button at the bottom right corner of your iPhone's screen, then tapping on "Read+OCR" (second tab from the left) and on the "Optical Character Recognizer" command at the top of the screen.

If you haven't purchased the OCR module yet, you can do so through the Add-ons tab; there's also a demo version accessible through there (just tap on the "Optical Character Recognizer" item). We strongly recommend that you try out that demo version before purchasing this module; this is a totally new feature, not just for us but for Chinese dictionary software in general, and while we're working hard to improve it, there are a number of limitations and you may not find that it works well enough to be usable for you yet.

Introduction

The new Optical Character Recognizer system in Pleco 2.2 is our first attempt to introduce a totally new way to look up unknown characters since the debut of our original Palm OS software way back in 2001. Rather than handwriting in unknown characters, or tediously looking them up in a radical index, using OCR you can simply point your iPhone's camera at a word to look it up instantly; you don't even need to tap a shutter button, you just line up the camera with the word and the definition appears.

Principles

Much like our handwriting recognizer, our OCR system works by matching characters to templates in a database; it turns the image of the character into a simple mathematical structure, identifies its key features (lengths / positions / curvatures of strokes, etc), then searches through its database of 10,000+ Chinese characters to find the one that most closely matches that pattern.
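The matching step described above can be sketched in a few lines. This is only an illustration of nearest-template matching in general, not Pleco's actual recognizer: the three-number feature vectors, the `TEMPLATES` table and the `closest_template` function are all made up for the example, and a real system would use far richer features.

```python
import math

# Hypothetical feature vectors: (stroke count, mean stroke length, aspect ratio).
# The numbers are invented for illustration only.
TEMPLATES = {
    "一": (1.0, 0.90, 3.0),
    "十": (2.0, 0.80, 1.0),
    "木": (4.0, 0.60, 1.0),
}


def closest_template(features, templates=TEMPLATES):
    """Return (character, distance) for the template nearest to `features`."""
    best_char, best_dist = None, math.inf
    for char, template in templates.items():
        dist = math.dist(features, template)  # Euclidean distance
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, best_dist
```

Calling `closest_template((2.1, 0.75, 1.1))` picks 十 here, because that template lies closest to the measured features. The returned distance gives a rough confidence value: the smaller it is, the better the match.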

However, while the handwriting recognizer always has a very clear picture of the character you drew - it knows exactly where every stroke is located, where it starts / ends, what order strokes were drawn in, where it overlaps other strokes - the OCR system has to contend with a much murkier one; characters on a camera image can be small, grainy, and out-of-focus, and the same calligraphic flourishes that make printed Chinese text so pretty to look at also make it harder to see the underlying structure of each character.

OCR is also up against some psychological hurdles compared to handwriting input; while a mis-recognized handwritten character can be chalked up to one's poor handwriting / incorrect stroke order, with a printed character there's nobody to blame but the recognition software. On top of that, because OCR must recognize multiple characters at a time, there's no opportunity for it to show you its other, less likely matches like the handwriting recognizer does. Handling lots of characters at once also means that even if it gets a higher percentage of them accurate on the first try, if just a few of those are incorrect it'll still feel as if it got the entire block of text wrong. So while handwriting only has to contend with one character at a time, and can even be forgiven for getting that character wrong as long as the correct character is among its top 5 matches, OCR has to deal with multiple characters and get every one of them exactly correct in order to seem like it's doing its job.

(this is all a convoluted way of asking you to be patient if things don't work perfectly every time; we're new at this, we're working on all sorts of cool new image processing / analytical tools to bring this even closer to character recognizer perfection, but in the meantime we hope you'll find it accurate enough to be useful in its current form)

Limitations

Here are some specific limitations to keep in mind when using our OCR system:

Printed text only - the templates which our system matches characters to are based on common printed Chinese fonts rather than on handwritten characters. Very neat handwriting might occasionally work, but officially only printed text is supported.
Limited character set - our system recognizes a total of 6,763 simplified Chinese characters and 5,401 traditional ones (for Chinese computing geeks, it's all of the characters in the GB-2312 standard and the characters from the commonly-used half of the Big5 standard), so some rare characters may not be recognized simply because they're not in the database. Every new character we support is another character that might potentially result in a false positive match, so we have to keep the numbers limited for the sake of accuracy.
Background interference - our system has a very hard time distinguishing Chinese characters from other things it sees - background images / patterns, intense shadows / bright spots, even simple rectangular borders around signs and such can all create problems. Turning on the "Sauvola binarization" option in Settings can help with background / lighting issues, and resizing the recognition area to include only characters and leave out any extraneous lines / patterns / etc can help with borders, but there may be some types of text (white characters with black outlines against a light-colored background, e.g.) that you simply can't get it to recognize reliably.
Jitter - at the moment, the system can feel very "jittery," frequently changing which characters it thinks it sees even when you're pointing the camera at the same text. Turning on the "Hide unused chars" option in Settings can make things feel a bit smoother (though it doesn't actually change the algorithm), and increasing the "Word detect samples" value in Settings can make the dictionary definition change less often at least; it can also help to turn on the built-in flash (on an iPhone 4) or simply turn on a nearby lamp, as this tends to make the camera see images more clearly and with less background noise. The history function also helps if you find that the definition changes before you have a chance to finish reading it.

We are working to improve the jitter situation in a future update through some fancy statistical modeling (averaging the results from multiple frames of video), but for right now, if you have unsteady hands or want to use our OCR system on a train / bus / etc, you might want to consider a setup where your iPhone remains stationary and only the text you're recognizing moves; for example, clipping your iPhone to a table and sliding a book around underneath it. In our testing at least, we've found that this produces much better results in moving vehicles and other shake-intensive situations (as any shaking that does occur affects the iPhone and the book equally), and if you employ the built-in zoom function, you can easily position the iPhone far enough away from the text to allow you to see every corner of the page without having to move the phone.
Focus - this one's actually more of a cell phone camera problem than an OCR problem. At close distances, the iPhone camera's lens has a very shallow depth-of-field - in other words, the range of distances at which objects will be in focus is quite small - so even if you move the iPhone just a little bit farther from / closer to the page you may find that it quickly gets out-of-focus. Most of the time it'll refocus automatically after a few seconds, but if it doesn't, tap on the "focus" button at the bottom left corner of the screen to manually re-focus, or just give the iPhone a shake (this seems to reset its camera).
Line spanning - in English and other alphabetic languages, each word is generally entirely on one line of text; only very rarely do words wrap around (with a hyphen) to the next line. In Chinese, however, every line of text generally has the exact same number of characters on it, and so you routinely encounter words that start on one line and end on the next; e.g. the first character of the word is the last character in a particular line of text, and the second character of the word is the first character on the next line. This means that in order to look up that word with our OCR system, you need to point it at both halves of the word separately and combine them, which you can do through the span lines command; slightly annoying, but there's nothing we can do to really "fix" it since it's inherent in the nature of Chinese text.
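To give a feel for how combining several frames of video can calm down jitter, here is a minimal sketch of one simple approach: a majority vote over the last few readings. It is an assumption for illustration, not the statistical model we are actually building, and the `FrameVoter` class and its `samples` parameter are hypothetical names (loosely analogous to the "Word detect samples" setting).

```python
from collections import Counter, deque


class FrameVoter:
    """Report the reading seen most often over the last `samples` frames."""

    def __init__(self, samples=5):
        # A bounded deque silently drops the oldest reading once it is full.
        self.recent = deque(maxlen=samples)

    def add(self, reading):
        """Record one frame's recognized text and return the current winner."""
        self.recent.append(reading)
        winner, _count = Counter(self.recent).most_common(1)[0]
        return winner
```

Feeding it the readings 已经, 己经, 已经, 已经, 巳经 leaves 已经 as the winner, since the two one-frame misreadings are outvoted. A larger `samples` value gives steadier output but reacts more slowly when you move to a new word.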

Hardware Requirements

At present, the OCR module is only compatible with devices that have built-in, autofocus cameras; right now this means it only works on the iPhone 3GS and iPhone 4. We've tried very hard to get it working on non-autofocus devices like the iPhone 3G and 4th-generation iPod Touch, but after extensive testing we've found that they simply can't see characters well enough for our system to recognize them clearly; we're exploring a number of workarounds for this (sharpening images by stitching multiple frames of video together, for example, or creating some sort of clip-on plastic lens that improves focus at short distances), but none of those have come together so far, so for right now all we can do is hope that the next-generation iPod comes with a higher-quality camera.

While the still image capture mode can theoretically work without any built-in camera at all (copying over photos from a computer or, on iPad, from a memory card via the Camera Connection Kit), that mode is still highly experimental and we don't yet feel it works well enough to sell it on its own; hopefully after a few bug-fix updates it'll be ready for prime time, though.

Main Screen

Tap on "Optical Character Recognizer" at the top of the "Read+OCR" tab to get to the main OCR screen shown here:

The "Live Video Capture" options will be the only ones visible unless you enable the experimental still image capture system in Settings.

Lookup Words: the primary way of using Pleco OCR (and the one shown in our demo video): point your phone's camera at a word and its definition instantly pops up below. If you turn on the "Take over wild button" option in Settings, you'll also be able to access this screen by tapping on an "OCR" button in the main Dict tab when the search input bar is open.

Capture Flashcards: point the camera at a word and hold it there for a second or so to add it to your flashcard database; only visible if you've purchased the flashcard add-on.

Capture Text: point the camera at a longer block of text and tap on a button to bring up that block of text in a document reader screen; you can then tap on characters in the text to look them up, or save the output to a text file. This is best for short blocks of text of no more than a paragraph or so.

Still Image Capture Modes (Experimental)

Take Photo: take a picture with your iPhone's camera and bring up the still image capture system screen to crop / recognize it.

Photo Library: crop / recognize a photo from your iPhone's photo library / camera roll.

Image File: select an image file (.png, .jpg, etc) that's been copied into Pleco's document storage area and crop / recognize that.

Lookup Words

Tap on "Lookup Words" in the main screen above to get to the actual recognizer interface:

Lots of options here, but most of them are fairly straightforward:

Top Bar:

Exit - return to the main OCR screen to use a different OCR method or go back to the document reader.
Camera - display an additional toolbar of camera controls - zoom, flash (iPhone 4 only), and a black-on-white/white-on-black toggle.
History - bring up a screen with a list of recently looked up words - you can view their definitions, add them to flashcards, play audio, etc through there.
Menu - tap on this button to jump to other parts of Pleco, i.e., the Dict, Add-ons, and Settings tabs.

Middle:

Recognition Area - the OCR system will look up characters inside of this box, beginning from the side that the arrow (>) is on. Drag any corner to resize it. Characters that are part of the recognized word will be shown in blue, other characters in green. See below for more information.
Switch Dictionary - tap on this button to view a definition for the same word in a different dictionary. This selection is "sticky," so the OCR system will default to the newly-selected dictionary for later word lookups as well.
Scroll Entry - tap on the up / down arrows to scroll through a list of entries (if more than one are found that match the current word).
Definition - the dictionary definition of the recognized word.

Bottom Bar:

Focus - tap on this button to re-focus the iPhone's camera if the image gets blurry / out-of-focus; you can also just shake the iPhone to do the same thing.
Pause - tap on this button to pause recognition and freeze on the current word; this brings up a few additional options in place of the focus/span buttons.
Span Lines - "lock in" the first half of a word that wraps around to the next line of text; point the camera at that first half, tap on this button, then point it at the second half to see a definition for the combined word.

Basic Operation

To look up a word, point your iPhone's camera at the word you want to look up and square up that word within the recognition area. It's OK if there are additional characters in the recognition area too; just make sure that the left edge of the recognition area (where the > is located) is lined up with the first character in the word, and (if there's more than one line of text visible) that the top edge of the word is lined up with the top of the recognition area.

The OCR system will show you every character it recognizes within the recognition area in green, and once it's confident enough in a particular couple of characters, it will "lock on" to those characters, show them in blue instead of green, and display their definition. If you point at a different set of characters it'll quickly lock on to those instead, so you can scan along a whole line of text and read definitions as you go. (both the blue and green colors used can be changed in Settings)

Both horizontal and vertical text are supported; if the recognition area is resized to be vertical (significantly taller than it is wide), the > indicator will move from the left side of it to the top, indicating that Pleco is now recognizing text vertically. To pause the system and temporarily stop recognizing characters, tap on the pause button, or to combine characters from two different lines of text, tap on span lines. Tap on the history button (second from the right at the top of the screen) to scroll through the last few words recognized.

Recognition Area

The recognition area is the bright green box in the center of the screen; it can be resized by dragging any of the four corners (which resize it symmetrically but don't move it around - it always remains centered in the same spot). Pleco's recognizer will only attempt to recognize characters within that area; it doesn't look outside of it at all, so it won't pick up a character that's half-in, half-out (or at least won't do so accurately).

It's perfectly OK if the recognition area is longer than necessary for a particular word, as long as the word is aligned with the left side (or the top if you're recognizing vertical text). In fact, it can even help with recognition accuracy - seeing more characters helps the system get a better picture of their size / darkness / etc - so it's quite reasonable to resize it as large as it will go and just leave it that way all the time. Since it won't look outside of the box, though, resizing it to just one character wide is an easy way to look up the meanings of individual characters by themselves, and can also help to avoid "cheating" if you're looking at a word that you're supposed to know; looking up one character of a forgotten word may give you a hint without revealing the whole word's meaning / pronunciation.

If you find that characters are too small for the recognition area, try zooming in (though this can reduce accuracy), or just hold the iPhone closer to the text you want to recognize. Conversely, since the speed of the OCR system is directly proportional to the size of the recognition area, if you find that the characters you're recognizing are very large, you can consider turning on the "Shrink big images" option in Settings to reduce their size before recognizing them (greatly improving speed).

If you find that the recognizer sometimes thinks a compound character like 林 is actually two characters (木木), it may be that it's having a tough time detecting the size of the font; making the recognition area wider may help with this, or if you turn off the "Allow multiple lines" option in Settings (and make sure that the recognition area never stretches down to part of the next line of text) that should help also.

Pause Commands

Tap on the "pause" button at the bottom of the screen to stop recognizing characters and bring up this alternate toolbar at the bottom of the screen:

Tap on flashcard to add the current word to your flashcard database, or tap on details to bring up a full-screen dictionary definition (from which you can play audio, view example sentences, tap on individual characters in the headword to look up their meanings / stroke order / components, etc). Tap on resume to start recognizing characters again.

Span Lines

Often when reading Chinese you'll encounter a word that starts on one line and ends on the next, much like a hyphenated word in English (though much more common). For example:

我要给阿Ｑ做正传，已
经不止一两年了。

已经 is a single word, but since it starts on one line and ends on another, there's no way to simply point the recognizer at it and recognize the whole word.

Our solution to this is the conveniently-located "span lines" button. To use it, point the camera at the first part of the word (已 in the above example), then tap on "span lines" - you'll see that character / characters appear just above the recognition area, like this:

After that, point the camera at the second part of the word (经) to see the result for the entire word. Tap on the "span lines" button again (renamed to "cancel span") to return to normal recognition.
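In code terms, span lines amounts to remembering a prefix and gluing it onto whatever the camera sees next. The sketch below shows that idea only; `SpanState` and its method names are hypothetical and not taken from Pleco itself.

```python
class SpanState:
    """Holds the locked first half of a word that wraps across two lines."""

    def __init__(self):
        self.locked_prefix = ""

    def toggle(self, current_reading):
        """Lock the current reading, or cancel the span if one is active."""
        self.locked_prefix = "" if self.locked_prefix else current_reading

    def lookup_text(self, current_reading):
        """Text to look up: the locked prefix followed by what is in view now."""
        return self.locked_prefix + current_reading
```

With 已 locked in, pointing at 经不止 produces the lookup text 已经不止, from which a dictionary search can pick out 已经. Calling `toggle` a second time clears the prefix, matching the "cancel span" behavior.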

Camera Controls

Tap on the camera icon (second from the left at the top of the screen) to bring up an additional toolbar just below the top one:

The first two buttons control the OCR's zoom factor. This is strictly a digital zoom - there's no magnifying lens in an iPhone camera, all we can do is blow up the image - so it only goes up to a maximum zoom factor of 4x; tap on the zoom in button once to go from 1x to 2x, then again to go to 4x.

The next button toggles the built-in LED flash - only available on the iPhone 4. It works quite well to illuminate objects at close range (like most of the text you're likely to be looking up with OCR), but it can also confuse the recognizer by making some parts of an image much brighter than others (so it's difficult to see where the text is) - turning on the Sauvola Binarization option in Settings can help considerably with this (at the cost of making the recognizer a bit slower).
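Sauvola binarization is a published local thresholding technique: instead of one brightness cutoff for the whole image, each pixel gets its own threshold T = mean * (1 + k * (stddev / r - 1)), computed from the pixels around it. The sketch below is a plain, unoptimized rendering of that formula; the function name, the window size and the values of `k` and `r` are assumptions chosen for illustration, not Pleco's actual settings.

```python
import math


def sauvola_binarize(image, window=3, k=0.2, r=128.0):
    """Return a same-sized grid of booleans, True where a pixel counts as ink.

    `image` is a list of rows of 0-255 grayscale values. Each pixel is compared
    against a threshold computed from its own neighborhood:
        T = mean * (1 + k * (stddev / r - 1))
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image borders.
            values = [
                image[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            threshold = mean * (1 + k * (stddev / r - 1))
            row.append(image[y][x] < threshold)
        result.append(row)
    return result
```

Because the threshold follows the local average, a dark stroke is still picked out whether it sits in a bright spot from the flash or in a shadow. The extra per-pixel statistics are also where the slowdown comes from.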

The final option toggles between black-on-white and white-on-black text. Most of the time the recognizer can figure this out automatically, but if you find that it gets it wrong (you'll know because the recognized characters will have absolutely nothing to do with the text you're looking at), you can tap on this button to manually force it to one mode or the other. (the icon in the bar above is for black-on-white)
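One simple way such automatic detection could work is sketched below. It rests on a stated assumption, namely that strokes cover fewer pixels than the background, and it is not a description of what Pleco actually does; `guess_polarity` is a hypothetical helper.

```python
def guess_polarity(image):
    """Guess "black-on-white" or "white-on-black" for a grayscale image."""
    pixels = [value for row in image for value in row]
    midpoint = (min(pixels) + max(pixels)) / 2
    dark = sum(1 for value in pixels if value < midpoint)
    # Assumption: strokes occupy fewer pixels than the background does.
    return "black-on-white" if dark <= len(pixels) - dark else "white-on-black"
```

A heuristic like this fails on very bold text or on busy backgrounds, which is one plausible reason a manual override is still worth having.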

Capture Flashcards

The second of the three "live" OCR modes is Capture Flashcards. This mode, only available if you've purchased the flashcard add-on module (though we're planning to include a free "lite" version of that in our next major update), lets you point your camera at a word to instantly add it to your flashcard database; it's almost like using a barcode scanner.

The interface for this is almost exactly the same as that for Lookup Words above. Tap on "Capture Flashcards" in the main screen and you'll see a prompt asking you to select a category for your new flashcards; choose that category and you'll be presented with our standard live OCR interface. However, in Capture Flashcards mode, after pointing at the same word for a second or so (this interval can be changed in Settings), you'll hear a beep and the screen will flash a message telling you you've created a new flashcard. This is especially useful for digitizing a long list of words at the end of a textbook chapter - you can enter each word in a fraction of the time it would take to enter it manually.
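The "hold it there for a second" behavior is a dwell timer. Here is a minimal sketch of the idea, assuming one recognized word per video frame plus a timestamp in seconds; the `DwellCapture` class and its `hold_seconds` parameter are hypothetical names for illustration only.

```python
class DwellCapture:
    """Fire once when the same word has been in view for `hold_seconds`."""

    def __init__(self, hold_seconds=1.0):
        self.hold_seconds = hold_seconds
        self.word = None
        self.since = 0.0
        self.captured = False

    def update(self, word, now):
        """Feed one frame's word and timestamp; return the word when a card is due."""
        if word != self.word:
            # A different word came into view: restart the timer.
            self.word, self.since, self.captured = word, now, False
            return None
        if word and not self.captured and now - self.since >= self.hold_seconds:
            self.captured = True  # do not add the same word again while it stays in view
            return word
        return None
```

The `captured` flag is what keeps a single word from being added over and over while the camera lingers on it. Moving to a new word resets both the timer and the flag.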

Your new flashcard will be based on the currently-displayed dictionary definition; tapping on the Switch Dictionary button will change the dictionary used for the current and subsequent cards, though you can also go back and change their definitions later through Organize Flashcards.

One important Settings option specific to Capture Flashcards is "Unknown word handling." With the default behavior, "Truncate," the system will create a card based on the longest match it can find for the word in the recognition area; if it only matches the first character then it'll only create a card for that character. However, if you change this option to "Create Custom," you'll be prompted to create a brand new custom flashcard instead, with the headword prepopulated with the recognized characters; this is especially useful for items like character names that aren't likely to appear in a dictionary.
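The difference between the two settings can be sketched as follows. This is a simplified illustration with a toy dictionary; it assumes that "unknown" means the full recognized string has no dictionary entry, and `longest_match`, `card_headword` and the mode strings are hypothetical names rather than Pleco internals.

```python
# Toy dictionary; the entries are for illustration only.
DICTIONARY = {"阿", "已", "已经", "不止"}


def longest_match(text, dictionary=DICTIONARY):
    """Return the longest prefix of `text` found in `dictionary`, or ""."""
    for end in range(len(text), 0, -1):
        if text[:end] in dictionary:
            return text[:end]
    return ""


def card_headword(text, mode="truncate", dictionary=DICTIONARY):
    """Pick a flashcard headword; returns (headword, is_custom)."""
    match = longest_match(text, dictionary)
    if match == text or mode == "truncate":
        return match, False
    # "create_custom": keep every recognized character on a user-defined card.
    return text, True
```

For the name 阿Q, "truncate" yields a card for 阿 alone, while "create_custom" yields a custom card with the headword 阿Q. A word that is already in the dictionary, such as 已经, gets an ordinary card in either mode.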

Capture Text

The final "live" OCR mode is the most confusing but possibly also the most powerful; with "Capture Text," instead of recognizing a single word at a time, you can recognize a larger block of text - several words, a sentence or even an entire paragraph - and bring it up in a document reader screen, save it to a text file, or save it to the system pasteboard to insert into another app.

This is similar in a lot of ways to still image capture mode, but unlike in that system, in Capture Text you see the results "live" on the screen so that you can make sure your text is framed up / recognized correctly (rather than having to go back and take another picture if the system gets it wrong the first time). However, since it's accessing the iPhone's camera in video mode instead of photo mode, the images it gets are much lower-resolution, so it can't see clearly enough to recognize an entire page of text (and even if it could, it would be so slow as to barely constitute an improvement over still image capture).

The interface to this is still very similar to Lookup Words, but with no Switch Dictionary / entry scroll buttons and two new buttons on the bottom of the screen:

Capture: bring up the recognized text in a standard Pleco document reader screen; you can tap on words in that to view their meaning, or save the results to a text file.

Copy: copies the recognized text to the system pasteboard, so that you can easily paste it into another application.

Also, the history button is missing in the top bar; there's no history function for Capture Text at the moment, though if you turn on the "Combine captured text" option in Settings, each time you tap on "capture," the recognized text will be appended to the end of the previously-captured text instead of replacing it; you'll end up with a copy of all of the text captured during a particular recognition session. (though for scanning full pages we'd recommend trying your luck with still image mode instead)

Still Image Recognition

Pleco's OCR system also supports a still image recognition mode, much like a more conventional OCR system; however, this mode is highly experimental, is missing a number of important features and probably contains at least a few bugs. For this reason, it must be activated through the Settings screen before you can use it; once you do that, options to open the still image recognizer will appear in the main OCR screen.

At the moment, the still image capture system is only accessible if you've purchased the document reader add-on module, though we plan to change this in a future release (once it loses its "experimental" status).

Each option (Take Photo / Photo Library / Image File) lets you take or select a picture, but regardless of which one you use you'll end up on this screen:

There are just a few controls here; the arrow buttons at the top of the screen rotate the image left or right, the button at the top right corner alternates between white-on-black and black-on-white text (hidden if you've told Pleco to detect this automatically in Settings), and the big green box around the image lets you choose which area will be recognized.

Drag the four corners around until the box wraps around the part of the image that you want to recognize (the system works best if you limit it to a block of text of the same font size / lighting level / etc), then tap anywhere in the middle of the box to run the recognizer. After a few seconds, you'll be taken to a standard Pleco document reader screen from which you can tap on words to look them up, or edit or save the recognized text.
