The invention described herein provides a method and apparatus for document processing that efficiently separates and interrelates single modalities, such as text, handwriting, and images. In particular, the present invention starts with the recognition of text characters and words for the efficient separation of text paragraphs from images, while maintaining their relationships for a possible reconstruction of the original page. The text separation and extraction are based on a hierarchical framing process. The process starts with the framing of a single character, after its recognition, continues with the recognition and framing of a word, and ends with the framing of all text lines. The method and apparatus described herein can process different types of documents, such as typed, handwritten, skewed, or mixed documents, but not half-tone ones.
1. Method for separating text from images, comprising the steps of:
binarizing an entire page of text;
a first step of scanning said binarized page of text so as to detect a character, wherein said first step of scanning further comprises:
generating the pyramidal form of said page;
scanning said page for text or images;
determining whether either text or images are detected;
IF either text or images are detected, THEN:
defining region as the first pyramidal region;
focusing on the upper left corner of said region;
attempting to detect a text character; and
returning to said step of determining whether either text or images are detected;
OTHERWISE,
determining whether either text or images are detected;
a first step of creating a temporal window on said binarized page and a second step of scanning said temporal window so as to extract a character shape;
graphing line segments;
recognizing and framing a character;
a first step of connecting adjacent character frames in the same word;
a second step of creating multi-frame word blocks;
recognizing hand-written words;
a second step of connecting word frames, further comprising
connecting the last frame of each word with the first frame of the next word;
saving the coordinates of lines of text and paragraphs on a given page; and
extracting images from a page.
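Claim 1 recites "generating the pyramidal form of said page" without fixing a construction. A common reading is a resolution pyramid built by repeated 2×2 reduction of the binarized page; the Python sketch below illustrates that assumption and is not part of the claimed invention.

```python
def pyramid(page, levels=3):
    """Build a pyramidal form of a binarized page by repeated 2x2 reduction.

    `page` is a list of rows of 0/1 pixels.  Each level halves both
    dimensions, marking a coarse pixel 1 if any pixel in its 2x2 block is 1,
    so coarse levels still show where text or image regions lie.
    """
    out = [page]
    for _ in range(levels):
        prev = out[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        if h == 0 or w == 0:
            break  # page too small to reduce further
        out.append([[1 if (prev[2*r][2*c] or prev[2*r][2*c+1] or
                           prev[2*r+1][2*c] or prev[2*r+1][2*c+1]) else 0
                     for c in range(w)] for r in range(h)])
    return out
```

Scanning the coarsest level first, then focusing on the upper-left corner of a detected region, matches the coarse-to-fine order the claim describes.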
12. Apparatus for separating text from images, comprising:
means for binarizing an entire page of text;
a first means for scanning a binarized page of text so as to detect a character,
wherein said first means for scanning further comprises:
means for generating the pyramidal form of said page;
means for scanning said page for text or images;
means for determining whether either text or images are detected;
IF either text or images are detected, THEN said first means for scanning further comprises:
means for defining region as the first pyramidal region;
means for focusing on the upper left corner of said region;
means for attempting to detect a text character; and
means for returning to said means for determining whether either text or images are detected;
OTHERWISE, said first means for scanning further comprises:
means for determining whether either text or images are detected;
a first means for creating a temporal window on said binarized page and a second means for scanning said temporal window so as to extract a character shape;
means for graphing line segments;
means for recognizing and framing a character;
a first means for connecting adjacent character frames in the same word;
a second means for creating multi-frame word blocks;
means for recognizing hand-written words;
a second means for connecting word frames, further comprising
means for connecting the last frame of each word with the first frame of the next word;
means for saving the coordinates of lines of text and paragraphs on a given page; and
means for extracting images from a page.
2. Method of
creating a temporal window;
scanning within said window so as to detect the edges of a character or shape of an object;
determining whether an edge of a character or shape of an object is detected;
IF an edge of a character or shape of an object is detected, THEN:
extracting the shape of said character or said object;
representing said shape as a string “S”; and
returning to said step of determining whether an edge of a character or shape of an object is detected;
OTHERWISE,
returning to said step of determining whether an edge of a character or shape of an object is detected.
3. Method of
applying line generation and recognition process to string “S”;
recognizing segments of said string “S” as straight lines or curves; and
converting said string “S” into a graph “G”.
4. Method of
determining whether a character has been extracted and graphed;
IF it is determined that a character has been extracted and graphed,
THEN:
performing graph matching;
classifying said character in a database;
OTHERWISE,
returning to said step of determining whether a character has been extracted and graphed;
determining whether said extracted character is recognizable;
IF said extracted character is recognizable, THEN:
determining whether said character overlaps with adjacent characters;
IF it is determined that said character overlaps with adjacent characters, THEN:
applying a voting recognition process;
generating a frame of said character; and
returning to said step of determining whether a character has been extracted and graphed;
OTHERWISE,
advancing to said step of generating a frame of said character;
OTHERWISE,
considering pattern as part of an image or drawing;
saving said pattern's coordinates; and
returning to said step of determining whether a character has been extracted and graphed.
5. Method of
determining whether the first character has been framed;
IF it is determined that the first character has been framed, THEN:
performing character extraction and recognition on adjacent character;
determining whether adjacent characters belong to the same word;
IF it is determined that adjacent characters belong to the same word, THEN:
matching possible connection patterns; and
connecting adjacent characters into one frame,
OTHERWISE,
repeating said step of determining whether the first character has been framed.
6. Method of
determining whether the distance between the last two characters is greater than “dc”;
IF said distance is greater than “dc”, THEN:
creating a multi-frame block of the extracted word,
OTHERWISE,
repeating character framing; and
returning to said step of determining whether the distance between the last two characters is greater than “dc”.
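The test recited in claim 6 — close a multi-frame word block whenever the distance between the last two character frames exceeds "dc" — can be illustrated as follows. The frame representation and the function name are assumptions for illustration; the patent does not fix a value for dc.

```python
def group_into_words(frames, dc):
    """Group character frames into multi-frame word blocks.

    `frames` is a list of (x_start, x_end) character frames in reading order.
    A gap wider than the threshold `dc` between consecutive frames closes the
    current word block and starts a new one, mirroring the test of claim 6.
    """
    words, current = [], []
    for frame in frames:
        if current and frame[0] - current[-1][1] > dc:
            words.append(current)  # distance > dc: close the word block
            current = []
        current.append(frame)
    if current:
        words.append(current)
    return words
```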
7. Method of
segmenting said hand-written word;
saving said word in ASCII format;
composing a text word from said ASCII format;
comparing said text word with lexicon database;
determining whether a character is NOT isolated from adjacent characters;
IF said character is NOT isolated, THEN:
centering window Wnxm around said character;
determining whether three character recognitions have been made;
IF three said recognitions have been made, THEN:
comparing said three character recognitions by voting;
selecting said character with more than two appearances;
saving selected character in memory; and
determining whether the last said character of a said word has been recognized and saved;
IF said last character has been recognized and saved, THEN:
extracting length of said word;
defining the starting character of said word; and
matching said word with lexicon database;
OTHERWISE;
returning to said step of comparing said three character recognitions by voting;
OTHERWISE;
returning to said step of determining whether a character is NOT isolated from adjacent characters;
OTHERWISE;
proceeding to said step of saving said selected character in memory.
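The voting recognition of claim 7 — comparing three character recognitions and keeping the character that appears most often — can be sketched as below. Reading "more than two appearances" as a simple majority (at least two of three agreeing) is an assumption of this sketch.

```python
from collections import Counter

def vote(recognitions):
    """Select a character by voting over three independent recognitions.

    Returns the character that a majority (at least two of three) of the
    recognitions agree on, or None when all three disagree.
    """
    assert len(recognitions) == 3
    char, count = Counter(recognitions).most_common(1)[0]
    return char if count >= 2 else None
```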
8. Method of
saving the coordinates (x,y) and the relative orientation (rv) of the first character frame of each text line relative to the borders of the document pages; and
interrelating the blocks of said extracted lines of text with numbers according to their relative positions on said document page.
9. Method of
sequentially scanning said image region;
saving said coordinates (x,y) of said image; and
saving said relative orientation (rv) of said image.
10. Method of
S=cnk1(djk1)nk2(djk2) . . . cnl1(djl1) . . . nlm(djlm)cc wherein nkm ε Z, djkm ε {1,2,3,4,5,6,7,8}, c=0, cc=9, and i,j,k,l,m ε Z.
11. Method of
f:S→G=N1ar12N2ar23 N3 . . . arnkNk; wherein said line segment (SL or CL) corresponds to a graph node:
f:SLi→Ni or CLj→Ni; wherein each said graph node Ni represents the properties of the corresponding segment:
Ni={Relative Starting Point (SP), Length (L), Direction (D), Curvature (K)};
wherein each arc arij represents the relationships between segments:
arij={connectivity (co), parallelism (p), symmetry (sy), relative size (rs), relative distance (rd), relative orientation (ro), similarity (si), . . . }, r ε{co, p, sy, rs, rd, si}; and
wherein, in the actual matching process, each said node Ni has only one property, the curvature (K).
13. Apparatus as in
means for creating a temporal window;
means for scanning within said window so as to detect the edges of a character or shape of an object;
means for determining whether an edge of a character or shape of an object is detected;
IF an edge of a character or shape of an object is detected, THEN, said second means for scanning said temporal window further comprises:
means for extracting the shape of said character or said object;
means for representing said shape as a string “S”; and
means for returning to said step of determining whether an edge of a character or shape of an object is detected;
OTHERWISE, said second means for scanning said temporal window further comprises:
means for re-implementing said means for determining whether an edge of a character or shape of an object is detected.
14. Apparatus as in
means for applying line generation and recognition process to string “S”;
means for recognizing segments of said string “S” as straight lines or curves; and
means for converting said string “S” into a graph “G”.
15. Apparatus as in
means for determining whether a character has been extracted and graphed;
IF it is determined that a character has been extracted and graphed,
THEN, said means for recognizing and framing a character further comprises:
means for performing graph matching;
means for classifying said character in a database;
OTHERWISE, said means for recognizing and framing a character further comprises:
means for returning to said step of determining whether a character has been extracted and graphed;
means for determining whether said extracted character is recognizable;
IF said extracted character is recognizable, THEN, said means for recognizing and framing a character further comprises:
means for determining whether said character overlaps with adjacent characters;
IF it is determined that said character overlaps with adjacent characters, THEN, said means for recognizing and framing a character further comprises:
means for applying a voting recognition process;
means for generating a frame of said character; and
means for re-implementing said means for determining whether a character has been extracted and graphed;
OTHERWISE, said means for recognizing and framing a character further comprises:
means for advancing to said step of generating a frame of said character;
OTHERWISE, said means for recognizing and framing a character further comprises:
means for considering pattern as part of an image or drawing;
means for saving said pattern's coordinates; and
means for re-implementing said means for determining whether a character has been extracted and graphed.
16. Apparatus as in
means for determining whether the first character has been framed;
IF it is determined that the first character has been framed, THEN, said first means for connecting adjacent character frames in the same word further comprises:
means for performing character extraction and recognition on adjacent character;
means for determining whether adjacent characters belong to same word;
IF it is determined that adjacent characters belong to the same word, THEN, said first means for connecting adjacent character frames in the same word further comprises:
means for matching possible connection patterns; and
connecting adjacent characters into one frame,
OTHERWISE, said first means for connecting adjacent character frames in the same word further comprises:
means for returning to said step of performing character extraction and recognition on adjacent character;
OTHERWISE, said first means for connecting adjacent character frames in the same word further comprises:
means for repeating said step of determining whether the first character has been framed.
17. Apparatus as in
means for determining whether the distance between the last two characters is greater than “dc”;
IF said distance is greater than “dc”, THEN, said second means for creating multi-frame word blocks further comprises:
means for creating a multi-frame block of the extracted word,
OTHERWISE, said second means for creating multi-frame word blocks further comprises:
means for repeating character framing; and
means for re-implementing said means for determining whether the distance between the last two characters is greater than “dc”.
18. Apparatus as in
means for segmenting said hand-written word;
means for saving said word in ASCII format;
means for composing a text word from said ASCII format;
means for determining whether a character is NOT isolated from adjacent characters;
IF said character is NOT isolated, THEN, said means for recognizing hand-written words further comprises:
means for centering window Wnxm around said character;
means for determining whether three character recognitions have been made;
IF three said recognitions have been made, THEN, said means for recognizing hand-written words further comprises:
means for comparing said three character recognitions by voting;
means for selecting said character with more than two appearances;
means for saving selected character in memory; and
means for determining whether the last said character of a said word has been recognized and saved;
IF said last character has been recognized and saved, THEN, said means for recognizing hand-written words further comprises:
means for extracting length of said word;
means for defining the starting character of said word; and
means for matching said word with lexicon database;
OTHERWISE; said means for recognizing hand-written words further comprises:
means for re-implementing said means for comparing said three character recognitions by voting;
OTHERWISE; said means for recognizing hand-written words further comprises:
means for re-implementing said means for determining whether a character is NOT isolated from adjacent characters;
OTHERWISE; said means for recognizing hand-written words further comprises:
means for implementing out-of-sequence said means for saving said selected character in memory.
19. Apparatus as in
means for saving the coordinates (x,y) and the relative orientation (rv) of the first character frame of each text line relative to the borders of the document page; and
means for interrelating the blocks of said extracted lines of text with numbers according to their relative positions on said document page.
20. Apparatus as in
means for sequentially scanning said image region;
means for saving said coordinates (x,y) of said image; and
means for saving said relative orientation (rv) of said image.
21. Apparatus as in
S=cnk1(djk1)nk2(djk2) . . . cnl1(djl1) . . . nlm(djlm)cc wherein nkm ε Z, djkm ε {1,2,3,4,5,6,7,8}, c=0, cc=9, and i,j,k,l,m ε Z.
22. Apparatus as in
f:S→G=N1ar12N2ar23 N3 . . . arnkNk; wherein said line segment (SL or CL) corresponds to a graph node:
f:SLi→Ni or CLj→Ni; wherein each said graph node Ni represents the properties of the corresponding segment:
Ni={Relative Starting Point (SP), Length (L), Direction (D), Curvature (K)};
wherein each arc arij represents the relationships between segments:
arij={connectivity (co), parallelism (p), symmetry (sy), relative size (rs), relative distance (rd), relative orientation (ro), similarity (si), . . . }, r ε{co, p, sy, rs, rd, si}; and
wherein, in the actual matching process, each said node Ni has only one property, the curvature (K).
This patent application claims the priority benefit of the filing date of a provisional application, Ser. No. 60/354,149, filed in the United States Patent and Trademark Office on Feb. 4, 2002.
The invention described herein may be manufactured and used by or for the Government for governmental purposes without the payment of any royalty thereon.
The recognition of printed and handwritten characters and words is an important research field with many applications: in post offices, for identifying the postal code in the address on an envelope and sorting the mail; in banks, for check processing; in libraries, for computerizing the storage of books and texts; and in reading devices for blind people. Although many methodologies and systems have been developed for optical character recognition (OCR), OCR remains a challenging area. In particular, a good OCR system spends on average about 2–3 seconds on the recognition of a handwritten character from a handwritten word. An extreme case is the OCR system by Loral, which is based on a very expensive parallel multiprocessor system of 1024 Intel-386 microprocessors, where each 386 CPU processes only one character at a time. There are also many OCR methods based on neural networks, such as the AT&T Bell Labs OCR chip, the multiple neural networks OCR approach, etc. There are some other OCR methods based on human-like recognition. One of them uses a fuzzy graph-based OCR approach with adaptive learning capabilities, which reduces the character dimensions to speed up the recognition process. It scans the text page, detects a character, extracts and recognizes it, produces the appropriate ASCII code, and sends it to the host computer in a simulated average test time of a few milliseconds. Image Processing and Pattern Recognition (IPPR) are two older research fields with many significant contributions. The recognition and extraction of objects from images is a small sub-field of IPPR. There are many successful methods, based on neural nets or graphs, that recognize different kinds of objects (faces, cars, chairs, tables, buildings, etc.) under very noisy conditions.
Recently, attention has been focused on the document processing field due to multimedia applications. Although document processing is an interesting research field, it introduces many difficult problems associated with the recognition of text characters from images. For instance, there are cases where a document can be considered either as text or as image, such as images generated from text characters, or the artistic letters in very old and valuable books, where the starting letter of each paragraph looks like a complex image. In some cases the text is handwritten, and the problem becomes more difficult. Several methods have been developed for document processing. Most of these methods deal with the segmentation of a page and the separation of text from images. One prior art method is a "top-down" approach and produces good results under the condition that the examined page can be separated into blocks. Another prior art method is an algorithmic "bottom-up" process with good performance on several categories of pages with good spacing features and "non-overlapping" blocks. Yet another prior art method is also a "bottom-up" process, with very good performance especially on long uniform text strings. Still another prior art method separates images from text (typed or handwritten) while maintaining their relationships.
One object of the present invention is to provide a method and apparatus for processing documents by separating text from images yet maintaining their relationship for reconstruction.
Another object of the present invention is to provide a method and apparatus for recognizing single characters, words, and lines of text.
Yet another object of the present invention is to provide a method and apparatus for recognizing typed as well as handwritten words and letters.
The invention described herein provides a method and apparatus for document processing that efficiently separates and interrelates single modalities, such as text, handwriting, and images. In particular, the present invention starts with the recognition of text characters and words for the efficient separation of text paragraphs from images, while maintaining their relationships for a possible reconstruction of the original page. The text separation and extraction are based on a hierarchical framing process. The method starts with the framing of a single character, after its recognition, continues with the recognition and framing of a word, and ends with the framing of all text lines. The method and apparatus described herein can process different types of documents, such as typed, handwritten, skewed, or mixed documents, but not half-tone ones.
According to an embodiment of the present invention, a method for separating text from images comprises the steps of: a first step of scanning a binarized page of text so as to detect a character; a first step of creating a temporal window on the binarized page and a second step of scanning the temporal window so as to extract a character shape; graphing line segments; recognizing and framing a character; a first step of connecting adjacent character frames in the same word; a second step of creating multi-frame word blocks; recognizing hand-written words; a second step of connecting word frames; saving the coordinates of lines of text and paragraphs on a given page; and extracting images from a page.
Binarization & Character Detection
Referring to
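The patent recites "binarizing an entire page of text" without naming a method. A minimal sketch, assuming a fixed global threshold on grayscale values (adaptive thresholds such as Otsu's method would be drop-in replacements):

```python
def binarize(gray, threshold=128):
    """Binarize a grayscale page: dark pixels (ink) become 1, background 0.

    `gray` is a list of rows of 0-255 intensity values; the fixed global
    threshold is an illustrative assumption, not the patent's method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]
```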
Character Recognition
Referring to
S=cnk1(djk1)nk2(djk2) . . . cnl1(djl1) . . . nlm(djlm)cc
where nkm ε Z, djkm ε {1,2,3,4,5,6,7,8}, c=0, cc=9, and i,j,k,l,m ε Z.
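The string S above is a run-length chain code: each pair n(d) records n consecutive boundary steps in one of the eight directions d ε {1,...,8}, with c=0 opening and cc=9 closing the string. Below is a sketch of such an encoder; the direction numbering and the input representation are assumptions, since the patent fixes neither.

```python
def encode_shape(directions):
    """Encode a sequence of boundary directions (each in 1..8) as the
    run-length string S = 0 n1(d1) n2(d2) ... 9 described in the text:
    runs of equal directions collapse to a count followed by the
    direction in parentheses, between the start (0) and end (9) markers."""
    runs = []
    for d in directions:
        if runs and runs[-1][1] == d:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, d])       # start a new run
    return "0" + "".join(f"{n}({d})" for n, d in runs) + "9"
```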
Referring to
f:S→G=N1ar12N2ar23 N3 . . . arnkNk
where a line segment (SL or CL) corresponds to a graph node:
f:SLi→Ni or CLj →Ni
where each graph node Ni represents the properties of the corresponding segment:
Ni={Relative Starting Point (SP), Length (L), Direction (D), Curvature (K)}
and each arc arij represents the relationships between segments:
arij={connectivity (co), parallelism (p), symmetry (sy), relative size (rs), relative distance (rd), relative orientation (ro), similarity (si), . . . }, r ε{co, p, sy, rs, rd, si}
For the actual matching process, each node Ni has only one property, the curvature (K).
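The graph G above assigns each line segment a node Ni carrying {SP, L, D, K}, and the text states that the actual matching uses only the curvature (K). A minimal sketch of that node structure and a curvature-only comparison; the class layout and the pairwise comparison of nodes in sequence are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """One graph node Ni with the properties the text assigns to a segment."""
    sp: tuple       # relative starting point (SP)
    length: int     # length (L)
    direction: int  # direction (D)
    curvature: int  # curvature (K): 0 for a straight line (SL), nonzero for a curve (CL)

def match(g1, g2):
    """Match two segment graphs as the text describes the actual matching
    process: only the curvature property (K) of each node is compared, so
    nodes may differ in starting point, length, and direction."""
    return (len(g1) == len(g2) and
            all(a.curvature == b.curvature for a, b in zip(g1, g2)))
```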
Referring to
Character Framing
Referring to
Connecting Character Frames
Referring to
Word Framing
Referring to
Word Recognition
Referring to
Text Line Framing
Referring to
Connecting and Extracting Text Line Frames
Referring to
Extracting Images
Referring to
While the preferred embodiments have been described and illustrated, it should be understood that various substitutions, equivalents, adaptations and modifications of the invention may be made thereto by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, it is to be understood that the present invention has been described by way of illustration and not limitation.
Bourbakis, Nicholas G., Borek, Stanley E.
Patent | Priority | Assignee | Title |
10440305, | Oct 20 2010 | Comcast Cable Communications, LLC | Detection of transitions between text and non-text frames in a video stream |
10817741, | Feb 29 2016 | Alibaba Group Holding Limited | Word segmentation system, method and device |
11134214, | Oct 20 2010 | Comcast Cable Communications, LLC | Detection of transitions between text and non-text frames in a video stream |
8189917, | Sep 25 2008 | Sharp Kabushiki Kaisha | Methods and systems for locating text in a digital image |
8718367, | Jul 10 2009 | INTUIT INC. | Displaying automatically recognized text in proximity to a source image to assist comparibility |
8989499, | Oct 20 2010 | Comcast Cable Communications, LLC | Detection of transitions between text and non-text frames in a video stream |
9843759, | Oct 20 2010 | Comcast Cable Communications, LLC | Detection of transitions between text and non-text frames in a video stream |
Patent | Priority | Assignee | Title |
4513442, | Feb 27 1981 | Siemens Aktiengesellschaft | Method for locating and circumscribing text areas on a master which may contain text, graphics and/or image areas |
5335290, | Apr 06 1992 | Ricoh Company | Segmentation of text, picture and lines of a document image |
5724445, | Jul 23 1991 | Canon Kabushiki Kaisha | Image processing method and apparatus |
5852676, | Apr 11 1995 | DOCUSTREAM, INC | Method and apparatus for locating and identifying fields within a document |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Nov 18 2002 | BOURBAKIS, NICHOLAS G | United States Air Force | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017397 | /0316 | |
Nov 22 2002 | BOREK, STANLEY E | United States Air Force | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017397 | /0316 | |
Dec 05 2002 | The United States of America as represented by the Secretary of the Air Force | (assignment on the face of the patent) | / |