Logitech document 2204: "Optical Character Recognition Technology Information"
In the Beginning . . .
Although optical character recognition (OCR) has only recently been
popularized, OCR, or at least the concept of OCR, has existed since the
beginning of the nineteenth century. In 1809, the first patents for reading
devices to aid the blind were awarded. These inventions were the first real
"seeds" of OCR's development.
The next 100 years saw numerous advances in optical scanning. One important
invention was the "retina scanner" that used a mosaic of photocells in an
image transmission system. Another important milestone in the evolution of
OCR was the invention of the "Nipkow Disk," a sequential scanning disk which
made possible the technique of line-by-line analysis of images, as well as
other future innovations. For example, the principle of Nipkow's sequential
scanning process is used in modern television cameras and is incorporated in
many current OCR systems.
Shortly before World War I, the first true "readers," or machines that were
able to convert printed characters into another form, were made commercially
available. In 1912, Emmanuel Goldberg patented a machine which directly read
characters and converted those symbols into standard telegraph code.
Goldberg's machine read typed messages, converted them to paper tape, and
then used the tape to transmit telegraphic messages over wires without human
intervention. His invention demonstrated a practical application of OCR.
Around the same time, but independently, Fournier D'Albe invented an OCR
device called the "Optophone." The Optophone was a hand-held scanner that
optically scanned printed material and produced a series of audible tones while
being moved along a page. Each tone corresponded to a specific letter or
character which allowed a visually impaired person to interpret written
material. In the late 1920's, AT&T patented systems which scanned messages
and encoded them into "Morse Code" for telegraphic transmission.
Emmanuel Goldberg was responsible for yet another significant development in
1931, when he patented a device that searched photographic transparencies of
data records and attempted to match them against a template of the desired
search pattern. The hypothesis behind this system was that once a match was
located, the coincidence of pattern would cause a light source to be
completely blocked from a detection device, more specifically, a photographic
cell. This concept was the origin of "template matching," a technique that
was applied in the first working character readers, which appeared in the
1950's.
In the mid 1940's, the birth of the electronic data processing industry
created the need for a productive method of data entry. Although IBM entered
the optical scanning field in 1938, and was awarded various OCR-related
patents, including one for a "Light Sensitive Device," the computer pioneer
made no attempts to market commercial OCR devices until after 1960.
A nonscientific article in the mid 1950's introduced the public to the first
potential commercial application of OCR technology and equipment -- an
invention named "Gismo." Developed by a Department of Defense engineer, Gismo
was capable of reading 23 letters of the alphabet, which had been
produced by a standard typewriter. Gismo could also understand Morse Code,
read musical notations, and even read aloud from printed pages. The inventor
was quoted as saying that once "Gismo" got into production, the machine would
have about 99.9-percent reading accuracy and would sell for approximately
$1000.00. Of course, this was all theory at the time, but it generated
substantial enthusiasm and pointed to a bright future for OCR.
Shortly after "Gismo" captured the public's attention, the same engineer
founded Intelligent Machines Research Corporation (IMR). IMR developed and
applied OCR technology to the problems and needs of commercial data
processing. The company went on to achieve a major first in OCR with the
installation of a commercial OCR reader at Reader's Digest in New York in
1954. The initial reader was used to convert typewritten documents (sales
reports) into punched cards for input into the subscription department
computer. This equipment enabled the magazine to reduce order processing time
from one month to a little more than a day. The Reader's Digest
scanner, often cited as "paying for itself" twice each year, had already read
its billionth character by September of 1959.
Numerous other companies were early adopters of OCR: First National City
Bank, New York (processing travelers' checks); National Biscuit Company
(converting sales records to cards); AT&T (dividend checks and stockholder
records); Ohio Bell Telephone; Arizona Public Service Company; Atlantic City
Electric (cash accounting) and numerous government agencies including the U.S.
Post Office. At this time, most OCR systems were combined hardware and
software devices that cost hundreds of thousands of dollars and were restricted
to reading two specialized fonts: OCR A and OCR B.
"Matrix matching" dominated OCR technology during the 1970's. In matrix
matching systems, the software compares small parts of each bit-image scanned
to bit-patterns stored in a library, finding which stored character is the
most similar to the bit-pattern scanned. However, the large variety of fonts,
type sizes, and styles created a major problem for matrix matching. For
example, an Italic "A" has a different pattern from a Roman "A," even within
the same size and type family. Because of this, a matrix-matching OCR system
must either have an enormous library of bit-patterns (which requires a
time-consuming search for each match) or be limited to matching a
few type styles.
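
The comparison step described above can be illustrated with a minimal sketch.
It is not taken from any product named in this document. The 5x5 bitmaps, the
names LIBRARY, mismatches, and match_glyph, and the use of a simple
differing-pixel count as the similarity measure are all assumptions made for
illustration.

```python
# Minimal matrix-matching sketch. All bitmaps and names are hypothetical.
# Each glyph is a 5x5 bitmap: '#' is ink, '.' is background.
LIBRARY = {
    "A": (
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ),
    "L": (
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ),
    "T": (
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ),
}


def mismatches(a, b):
    """Count pixel positions where two equally sized bitmaps differ."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def match_glyph(scanned, library=LIBRARY):
    """Return (character, mismatch_count) for the closest stored pattern."""
    best = min(library, key=lambda ch: mismatches(scanned, library[ch]))
    return best, mismatches(scanned, library[best])


# A scanned "A" with one noisy pixel in the top-left corner.
noisy_a = (
    "####.",
    "#...#",
    "#####",
    "#...#",
    "#...#",
)

print(match_glyph(noisy_a))
```

Every stored pattern is compared on every lookup, so the cost of a match grows
with the size of the library. That is the search problem the paragraph above
describes once many fonts, sizes, and styles have to be stored.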
Matrix matching systems are commonly referred to as "trainable," since they
allow the user to "train" the program to recognize different fonts.
Generally, after a document has been scanned, the program separates out what
it believes to be character images and asks the user to identify each image.
It then stores each bitmap as the assigned character in its library and
matches later images against that collection of bitmaps in order to identify
characters. This process is very time-consuming, given the number of fonts
In 1974, a company named Kurzweil was formed to extend the capabilities of OCR
to fonts other than the set fonts. The company's initial goal was to enable
blind people to hear written documents through OCR software and voice
synthesis. A new technology was sorely needed, since matrix matching was
becoming increasingly difficult, as word processors and laser printers gave
rise to a rapid proliferation of fonts and heavily kerned, touching text. The
technology pioneered by Kurzweil for the blind was called "OmniFont."
OmniFont, also known as "feature extraction," looks at the features of a
character to recognize it, instead of looking at the entire letter and
matching it to a letter in its library. The features of each character are
matched to the features of a known character. For example, a figure
characterized by two slanted lines with a horizontal line across the center is an
"A." A vertical line with a circle attached on the lower right hand side is
a "b." If the circle is on the other side, it is a "d." OmniFont works on
most normal fonts because most fonts, as different as they are, share the same
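
The feature lookup described in this paragraph can be sketched as follows.
This is only an illustration and does not represent Kurzweil's or Caere's
actual method. The feature labels, FEATURE_TABLE, and classify are hypothetical
names, and the sketch assumes the features have already been extracted from
the scanned image, which is the difficult part in practice.

```python
# Minimal feature-extraction lookup sketch. All names are hypothetical.
# The table maps a set of stroke features to a character, independent of font.
FEATURE_TABLE = {
    frozenset({"left_slant", "right_slant", "center_bar"}): "A",
    frozenset({"vertical_stem", "bowl_lower_right"}): "b",
    frozenset({"vertical_stem", "bowl_lower_left"}): "d",
}


def classify(features):
    """Return the character whose feature set matches exactly, else None."""
    return FEATURE_TABLE.get(frozenset(features))


# The same features identify the letter whatever typeface produced them.
print(classify(["vertical_stem", "bowl_lower_right"]))
print(classify(["vertical_stem", "bowl_lower_left"]))
```

The table holds one entry per character rather than one per character per
font, so it does not grow as new typefaces appear.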
The major benefits of OmniFont over matrix matching are speed and the ability
to read most normal fonts. The increase in speed is the result of minimizing
the samples table in relation to the volume of fonts supported. A matrix
matching table can include multiple samples of each character and can be
updated by the user training it. OmniFont only uses a table of generic
features which does not increase in size and makes the search process much
In 1976, DEST pioneered an OCR solution to the business and office market, and
in 1980, introduced a product called the Workless Station. The company claimed
that the Workless Station garnered 65% of the flatbed scanner market.
However, the product was specialized and not for the mass commercial market.
In 1988, Caere brought OCR to the mass commercial market with the OmniPage
product -- an OmniFont OCR package aimed at the rapidly expanding flatbed
scanner market. What had cost many thousands of dollars and ran only on
expensive hardware, was now offered to owners of personal computers with
OCR on Every Desktop
Scanners -- the electronic "partner" of OCR -- give "eyes" to the computer by
providing a bridge between the analog world of everyday reality and the
digital world of the computer. But before Caere revolutionized the scanner
market with the introduction of OmniFont technology, flatbed scanners were
seen as devices for capturing images, not text. Today, scanners are seen as
both graphics and text solutions.
Until recently, quality images could only be captured and digitized with
extremely expensive flatbed and sheetfed scanners. However, the same
functionality and sophistication are now available in the smaller, more
affordable hand-held scanner. As a result, hand-held scanners have evolved
from a tech toy of the computer hobbyist into integral, productive desktop tools for
business people as well as the home user.
As this evolution takes place, users are demanding capabilities beyond image
capture, as they purchase hand-held scanners to create complex documents that
incorporate both text and graphics. Flatbed scanners are already able to
perform optical character recognition (OCR) at a high level of speed and
accuracy; the challenge lies in bringing this capability to the hand-held
In 1988, Logitech introduced the first ScanMan hand-held scanner and brought
scanning to the individual desktop. The unit was intended for graphics
scanning and limited to 200 dpi hardware resolution. In addition, it was
difficult to scan straight with this early model, which contained only one
set of rollers. Thus, OCR was not a recommended use for the scanner. What's
more, initial OCR packages for hand-held scanners were expensive and, in many
cases, too slow and inaccurate to truly enhance individual productivity.
ScanMan Plus for DOS, introduced by Logitech in late 1989, paved the way for
OCR in the hand-held environment. With its 400 dpi hardware resolution,
extra set of rollers, straightedge head design, and scanning speed
indicator, ScanMan Plus enabled users to control their scans
and achieve a level of resolution necessary for OCR.
The first version of CatchWord, a DOS-based OCR software package by Logitech, followed
the introduction of ScanMan Plus. CatchWord marked the second stage in the
evolution of Logitech hand-held scanners into highly functional, multipurpose
input devices. CatchWord used OmniFont technology, giving hand-held scanners
the flexibility to capture a wide range of fonts and styles. CatchWord was
also able to scan full pages of text by stitching together two scans of a
In 1992, Logitech introduced CatchWord Pro for Windows. CatchWord Pro for
Windows represented a new generation of OCR software that kept the special
requirements of hand-held scanners in mind.
Logitech is now directly partnering with the acknowledged market leader in OCR
software for the personal computer and the founder of OmniFont technology --
Los Gatos, Calif.-based Caere Corporation. Caere is tailoring its popular
OmniPage Direct product for use with Logitech's Windows-based ScanMan
hand-held scanners. The application -- OmniPage Direct for Logitech -- is
positioned as an affordable basic utility designed to meet the needs of
ScanMan users who wish to capture a few pages of text to incorporate into
Much of the history of OCR was obtained with permission from the book The
History of OCR by Herbert Schantz. Herbert Schantz is the Director and Vice
President of the Recognition Technology Users Group and a member of the
OCR/Scanner/Fax Association. He has written many papers and given numerous
presentations on the theory, economics, and application of OCR dating back to
1969. Logitech would like to thank him for writing such an exciting and
informative book on a subject about which relatively little has been written.