The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify a subset of the total number of symbols that occurs frequently in the scanned-document image or images. One or more lists of graphemes for the language of the text are then ordered from most likely to occur to least likely to occur, to facilitate a second optical-character-recognition step in which symbol images extracted from the one or more scanned-document images are associated with the one or more graphemes most likely to correspond to each scanned symbol image.
1. A system comprising:
a memory to store a pixel-based representation of a document image, a votes data structure and a set of symbol pattern data structures, the votes data structure to store a vote value for each grapheme of a set of graphemes, wherein a symbol pattern data structure of the set of symbol pattern data structures corresponds to a symbol pattern of a set of symbol patterns and comprises references to a subset of graphemes associated with the symbol pattern; and
a processor, operatively coupled to the memory, to:
identify, for each of a plurality of sub-images of the document image, a corresponding symbol image;
compute, for each symbol pattern of the set of symbol patterns, a respective level of similarity to each symbol image;
identify, for each symbol pattern, graphemes of a respective symbol pattern that correspond to computed levels of similarity to the symbol image;
increment the vote values in the votes data structure for each identified grapheme of the respective symbol pattern;
for each symbol image, aggregate the symbol patterns and associated graphemes into a cluster data structure, wherein the symbol patterns are ordered within the cluster data structure based on the vote values of the graphemes in the votes data structure;
determine, for each symbol image, an encoding of the symbol image, wherein to determine the encoding of the symbol image, the processor is to traverse the ordered symbol patterns within the cluster data structure; and
generate a symbol-based representation of the document image comprising the encoding of each symbol image.
15. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to:
store a votes data structure and a set of symbol pattern data structures, the votes data structure to store a vote value for each grapheme of a set of graphemes, and wherein a symbol pattern data structure of the set of symbol pattern data structures corresponds to a symbol pattern of a set of symbol patterns and comprises references to a subset of graphemes associated with the symbol pattern;
identify a pixel-based representation of a document image;
identify, for each of a plurality of sub-images of the document image, a corresponding symbol image;
compute, for each symbol pattern of the set of symbol patterns, a respective level of similarity to each symbol image;
identify, for each symbol pattern, graphemes of a respective symbol pattern that correspond to computed levels of similarity to the symbol image;
increment the vote values in the votes data structure for each identified grapheme of the respective symbol pattern;
for each symbol image, aggregate the symbol patterns and associated graphemes into a cluster data structure, wherein the symbol patterns are ordered within the cluster data structure based on the vote values of the graphemes in the votes data structure, and wherein the vote values correspond to a frequency at which the symbol image occurs in the document image;
determine, for each symbol image, an encoding of the symbol image, wherein to determine the encoding of the symbol image, the processor is to traverse the ordered symbol patterns within the cluster data structure; and
generate a symbol-based representation of the document image comprising the encoding of the symbol image.
9. A method comprising:
storing, by a processor, a votes data structure and a set of symbol pattern data structures, the votes data structure to store a vote value for each grapheme of a set of graphemes, wherein a symbol pattern data structure of the set of symbol pattern data structures corresponds to a symbol pattern of a set of symbol patterns and comprises references to a subset of graphemes associated with the symbol pattern;
identifying, by the processor, a pixel-based representation of a document image;
identifying, by the processor, for each of a plurality of sub-images of the document image, a corresponding symbol image;
computing, by the processor, for each symbol pattern of the set of symbol patterns, a respective level of similarity to each symbol image;
identifying, by the processor, for each symbol pattern, graphemes of a respective symbol pattern that correspond to computed levels of similarity to the symbol image;
incrementing, by the processor, the vote values in the votes data structure for each identified grapheme of the respective symbol pattern;
for each symbol image, aggregating, by the processor, the symbol patterns and associated graphemes into a cluster data structure, wherein the symbol patterns are ordered within the cluster data structure based on the vote values of the graphemes in the votes data structure, and wherein the vote values correspond to a frequency at which the symbol image occurs in the document image;
determining, by the processor, for each symbol image, an encoding of the symbol image, wherein determining the encoding of the symbol image comprises traversing the ordered symbol patterns within the cluster data structure; and
generating, by the processor, a symbol-based representation of the document image comprising the encoding of each symbol image.
2. The system of claim 1, wherein the cluster data structure comprises one of:
a reference to the symbol pattern data structure in the set of symbol pattern data structures, or
the symbol pattern data structure in the set of symbol pattern data structures.
3. The system of
4. The system of claim 1, wherein the symbol pattern data structure comprises:
one or more parameter values for the symbol pattern represented by the symbol pattern data structure; and
indices, associated with weight values produced by comparing the symbol pattern represented by the symbol pattern data structure to a symbol image, that each indexes a grapheme indication within the symbol pattern data structure.
5. The system of claim 1, wherein incrementing the vote values comprises:
for each symbol pattern represented by the cluster data structure,
computing a weight that represents a comparison between the symbol image and the symbol pattern; and
when the computed weight indicates more than a threshold similarity between the symbol pattern and the symbol image,
selecting indications of one or more graphemes that together comprise indications of a set of graphemes associated with the symbol pattern and computed weight, and
adding a value to the vote value in the votes data structure corresponding to each of the graphemes in the set of graphemes.
6. The system of claim 5, wherein computing the weight comprises:
setting an accumulating weight to 0; and
for each parameter associated with the cluster data structure,
computing a parameter value with respect to the symbol image,
computing an absolute value of a difference between the computed parameter value and a parameter value computed with respect to the symbol pattern, and
adding the computed absolute value to the accumulating weight.
7. The system of claim 5, wherein selecting the indications of the one or more graphemes comprises:
selecting an index of a grapheme indication stored within the symbol pattern data structure that is associated with the computed weight; and
selecting the grapheme indications stored within the symbol pattern data structure from a first grapheme indication up to, and including, the grapheme indication that is indexed by the selected index.
8. The system of claim 1, wherein
the processor is further to add a computed vote value to a vote value in the votes data structure.
10. The method of claim 9, wherein the symbol pattern data structure comprises:
one or more parameter values for the symbol pattern represented by the symbol pattern data structure; and
indices, associated with weight values produced by comparing the symbol pattern represented by the symbol pattern data structure to a symbol image, that each indexes a grapheme indication within the symbol pattern data structure.
11. The method of claim 9, wherein incrementing the vote values comprises:
for each symbol image,
for each symbol pattern,
computing a weight that represents a comparison between the symbol image and the symbol pattern; and
when the computed weight indicates more than a threshold similarity between the symbol pattern and the symbol image,
selecting indications of one or more graphemes that together comprise indications of a set of graphemes associated with the symbol pattern and computed weight, and
adding a value to the vote value in the votes data structure corresponding to each of the graphemes in the set of graphemes.
12. The method of claim 11, wherein computing the weight comprises:
setting an accumulating weight to 0; and
for each parameter associated with the cluster data structure,
computing a parameter value with respect to the symbol image,
computing an absolute value of a difference between the computed parameter value and a parameter value computed with respect to the symbol pattern, and
adding the computed absolute value to the accumulating weight.
13. The method of claim 11, wherein selecting the indications of the one or more graphemes comprises:
selecting an index of a grapheme indication stored within the symbol pattern data structure that is associated with the computed weight; and
selecting the grapheme indications stored within the symbol pattern data structure from a first grapheme indication up to, and including, the grapheme indication that is indexed by the selected index.
14. The method of claim 9, wherein the computed level of similarity of a grapheme to the symbol image comprises one of:
the number of votes stored in the votes data structure for the grapheme following preprocessing of the symbol image; or
a value computed from the number of votes stored in the votes data structure for the grapheme following preprocessing of the symbol image.
This application claims the benefit of priority under 35 U.S.C. 119 to Russian Patent Application No. 2014103152, filed Jan. 1, 2014, the disclosure of which is incorporated by reference.
The current application is directed to automated processing of scanned-document images and other text-containing images and, in particular, to methods and systems that efficiently convert symbol images extracted from scanned documents to digital encodings of the corresponding symbols.
Printed, typewritten, and handwritten documents have long been used for recording and storing information. Despite current trends toward paperless offices, printed documents continue to be widely used in commercial, institutional, and home environments. With the development of modern computer systems, the creation, storage, retrieval, and transmission of electronic documents have evolved, in parallel with continued use of printed documents, into an extremely efficient and cost-effective alternative information-recording and information-storage medium. Because of the overwhelming advantages in efficiency and cost effectiveness enjoyed by modern electronic-document-based information storage and information transactions, printed documents are routinely converted into electronic documents. Typically, a printed document is first converted into a digital scanned-document image by an electro-optico-mechanical scanning device, digital camera, or other device or system, after which the scanned-document image is automatically processed to produce an electronic document encoded according to one or more of various different electronic-document-encoding standards. As one example, it is now possible to employ a desktop scanner and sophisticated optical-character-recognition ("OCR") programs running on a personal computer to convert a printed-paper document into a corresponding electronic document that can be displayed and edited using a word-processing program.
While modern OCR systems have advanced to the point that complex printed documents that include pictures, frames, line boundaries, and other non-text elements as well as text symbols of any of many common alphabet-based languages can be automatically converted to electronic documents, challenges remain with respect to conversion of printed documents containing Chinese and Japanese characters or Korean morpho-syllabic blocks.
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify a subset of the total number of symbols that occurs frequently in the scanned-document image or images. One or more lists of graphemes for the language of the text are then ordered from most likely to occur to least likely to occur, to facilitate a second optical-character-recognition step in which symbol images extracted from the one or more scanned-document images are associated with the one or more graphemes most likely to correspond to each scanned symbol image.
The current document is directed to methods and systems that efficiently match symbols of a language to symbol images extracted from one or more scanned-document images or other text-containing images. The methods and systems employ a first pass over the symbol images to identify a subset of the graphemes of the language that most likely occur within the text contained in the one or more scanned-document images or other text-containing images. The symbols of the language are organized into one or more clusters of related symbols and graphemes, and the graphemes within each cluster are sorted by likelihood of occurrence within the one or more text-containing images. In a second step, symbol images extracted from the one or more text-containing images are then matched to one or more symbols of the language that they most likely represent. In the following discussion, scanned-document images and electronic documents are first introduced. A second subsection discusses certain currently available OCR methods and systems. A third subsection includes a detailed description of the methods and systems to which the current document is directed.
Scanned Document Images and Electronic Documents
Printed documents can be converted into digitally encoded, scanned-document images by various means, including electro-optico-mechanical scanning devices and digital cameras.
By contrast, a typical electronic document produced by a word-processing program contains various types of line-drawing commands, references to image representations, such as digitally encoded photographs, and digitally encoded text characters. One commonly used encoding standard for text characters is the Unicode standard. The Unicode standard commonly uses 8-bit bytes for encoding American Standard Code for Information Interchange ("ASCII") characters and 16-bit words for encoding symbols and characters of many languages, including Japanese, Mandarin, and other non-alphabetic-character-based languages. A large part of the computational work carried out by an OCR program is to recognize images of text characters in a digitally encoded scanned-document image and convert the images of characters into corresponding Unicode encodings. Clearly, encoding text characters in Unicode takes far less storage space than storing pixelated images of text characters. Furthermore, Unicode-encoded text characters can be edited, reformatted into different fonts, and processed in many additional ways by word-processing programs, while digitally encoded scanned-document images can only be modified through specialized image-editing programs.
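As a rough illustration of the storage difference, consider a hypothetical 32 x 32-pixel binary glyph image versus a single 16-bit Unicode code unit; the dimensions below are illustrative assumptions, not taken from any particular system or from the standard:

#include <cstdio>

int main()
{
    // Hypothetical glyph image: 32 x 32 pixels at 1 bit per pixel.
    const unsigned imagePixels  = 32 * 32;          // 1024 pixels
    const unsigned imageBytes   = imagePixels / 8;  // 128 bytes as a bitmap
    // The same character stored as one 16-bit Unicode code unit.
    const unsigned encodedBytes = 2;
    std::printf("bitmap: %u bytes; encoded: %u bytes (%ux smaller)\n",
                imageBytes, encodedBytes, imageBytes / encodedBytes);
    return 0;
}

Even under these modest assumptions, the encoded character is 64 times smaller than the bitmap, and the gap widens for higher-resolution or gray-scale symbol images.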
In an initial phase of scanned-document-image-to-electronic-document conversion, a printed document, such as the example document 100, is scanned to produce a digital scanned-document image.
Once an initial phase of analysis has determined the various different regions of a scanned-document image, those regions likely to contain text are further processed by OCR routines in order to identify text characters and convert the text characters into Unicode or some other character-encoding standard. In order for the OCR routines to process text-containing regions, an initial orientation of the text-containing region is determined so that various pattern-matching methods can be efficiently employed by the OCR routines to identify text characters. It should be noted that the images of documents may not be properly aligned within scanned-document images due to positioning of the document on a scanner or other image-generating device, due to non-standard orientations of text-containing regions within a document, and for other reasons. The text-containing regions are then partitioned into sub-images that contain individual characters or symbols, and these sub-images are then generally scaled and oriented, and the symbol images are centered within the sub-image to facilitate subsequent automated recognition of the symbols that correspond to the symbol images.
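A minimal sketch of the final centering step follows, assuming a simple binary-image representation; the BinaryImage type and the routine centerInWindow are hypothetical illustrations, not part of any particular OCR system, and scaling and orientation correction are omitted:

#include <vector>

// Hypothetical binary image: row-major storage, true = black pixel.
struct BinaryImage
{
    int w, h;
    std::vector<bool> px;
    bool get(int x, int y) const { return px[y * w + x]; }
};

// Center the black pixels of src within a fixed-size window; assumes the
// symbol has already been scaled to fit within the window.
BinaryImage centerInWindow(const BinaryImage& src, int winW, int winH)
{
    // Find the bounding box of the black pixels.
    int minX = src.w, minY = src.h, maxX = -1, maxY = -1;
    for (int y = 0; y < src.h; y++)
        for (int x = 0; x < src.w; x++)
            if (src.get(x, y)) {
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
            }
    BinaryImage out{winW, winH, std::vector<bool>(winW * winH, false)};
    if (maxX < 0) return out;                              // blank sub-image
    int bw = maxX - minX + 1, bh = maxY - minY + 1;
    int offX = (winW - bw) / 2, offY = (winH - bh) / 2;    // centering offsets
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++)
            if (src.get(minX + x, minY + y))
                out.px[(offY + y) * winW + (offX + x)] = true;
    return out;
}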
Currently Available OCR Methods and Systems
In order to provide a concrete discussion of various optical-character-recognition techniques, an example symbol set for a hypothetical language is used.
In fact, although the relationships between symbols, graphemes, and patterns are shown as straightforwardly hierarchical, the actual relationships can be considerably more complex.
A number of parameters can be computed for each symbol image, including: (1) the longest horizontal continuous line segment relative to the horizontal symbol-window dimension 904; (2) the longest vertical continuous line segment relative to the vertical symbol-window dimension 906; (3) the percent of the total area corresponding to the symbol image, or black space, b, 908; (4) the number of internal vertical stripes, vs, 910; (5) the number of internal horizontal stripes, hs, 912; (6) the sum of the number of internal vertical stripes and horizontal stripes, vs+hs, 914; and (7) the ratio of the longest vertical line segment to the longest horizontal line segment 916. Thus, considering the first row 920 of table 902, the relative length of the longest vertical line segment, 0.6, is significantly greater than the relative length of the longest horizontal line segment, 0.2. Symbol 606 represents only 12 percent of the entire symbol window 602. There are no internal horizontal or vertical white spaces within symbol 606, and thus vs, hs, and vs+hs are all 0. The ratio of the longest vertical line segment to the longest horizontal line segment is 3. Because the example symbols are all relatively simple and block-like, there are relatively few different values for each of the parameters in table 902.
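Parameters of this kind are straightforward to compute from a binary symbol window. The following sketch, reusing the hypothetical BinaryImage type from the earlier example, computes the black-space fraction b and the relative length of the longest horizontal line segment; the remaining parameters can be computed along the same lines:

// Fraction of the symbol window covered by black pixels (parameter b).
double blackFraction(const BinaryImage& img)
{
    int black = 0;
    for (int y = 0; y < img.h; y++)
        for (int x = 0; x < img.w; x++)
            if (img.get(x, y)) black++;
    return static_cast<double>(black) / (img.w * img.h);
}

// Longest horizontal run of black pixels, relative to the window width.
double longestHorizontalSegment(const BinaryImage& img)
{
    int best = 0;
    for (int y = 0; y < img.h; y++) {
        int run = 0;
        for (int x = 0; x < img.w; x++) {
            run = img.get(x, y) ? run + 1 : 0;
            if (run > best) best = run;
        }
    }
    return static_cast<double>(best) / img.w;
}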
Despite the fact that each of the parameters discussed above fails, by itself, to uniquely distinguish all of the example symbols, a parameter such as the ratio of the longest vertical line segment to the longest horizontal line segment (916) can be used to partition the example symbols into clusters.
Additional parameters can be used in order to uniquely distinguish each symbol within each cluster or partition. Consider, for example, cluster 8 (1102).
Finally, when the parameter-by-parameter comparison is terminated as soon as a match becomes impossible, only a fraction of the R parameter comparisons may be needed, rather than R comparisons. Additionally, rather than comparing each symbol image with each pattern, the patterns may be traversed only until a pattern that produces a comparison score above some relatively high threshold is found. In this case, the number of patterns that are compared with each symbol image may be a fraction of P, rather than P. But, even with these improvements, the computational complexity is nonetheless proportional to some generally large fraction of NPR.
N(CR1 + P′R2),
where
N = number of symbols on the page;
C = number of clusters;
P′ = number of patterns in the selected cluster;
R1 = number of initial parameters;
R2 = number of additional parameters.
Because P′ is generally far smaller than P, and because C is even smaller still, the computational complexity for the second implementation of the routine “process” is quite favorable compared to the computational complexity for the first implementation of the routine “process.”
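A worked example with purely hypothetical values illustrates the difference; these numbers are illustrative assumptions, not measurements:

N = 1,000 symbol images on a page; P = 10,000 patterns; R = 8 initial parameters
first implementation:  NPR = 1,000 × 10,000 × 8 = 80,000,000 comparisons
C = 50 clusters; P′ = 200 patterns in the selected cluster; R1 = 8; R2 = 16
second implementation: N(CR1 + P′R2) = 1,000 × (50 × 8 + 200 × 16) = 3,600,000 comparisons

Under these assumptions, the cluster-based implementation carries out roughly 22 times fewer comparisons.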
Another approach to speeding up the first implementation of the routine "process," discussed above, is described in the following subsection.
Methods and Systems to which the Current Document is Directed
After the preprocessing step carried out in the nested for-loops 1704, each symbol image is processed by a third implementation of the routine "process." Pseudocode for the third implementation of the routine "process" 1710 is also provided.
There is, of course, an initial preprocessing penalty, represented by the term "e" 1744. However, as discussed above, the number of symbol images that are processed, N, is generally quite small in comparison to P or P′ for languages such as Chinese, Japanese, and Korean, and therefore the third implementation of the routine "process" provides significantly decreased computational complexity in comparison to either the first or second implementations of the routine "process," discussed above. More importantly, the third implementation of the routine "process" continues to look through the clusters until some maximum number of potentially matching symbols is found. When the threshold for similarity for clusters is set to a relatively low value and the threshold for similarity for patterns is set relatively high, there is a very high probability that the list of potentially matching symbols returned by the routine "process" will include the actual symbol that best matches the input symbol image.
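The overall two-pass control flow can be summarized with the following sketch; the driver routine processPage and the second-pass routine recognize are hypothetical placeholders, while vote refers to the preprocessing routine shown in the listings below:

void processPage(symbolImage* images, int n, cluster* clusters, int numClusters)
{
    // First pass: accumulate grapheme votes for every symbol image on the page.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < numClusters; j++)
            vote(&images[i], &clusters[j]);
    // The graphemes referenced by each pattern are then ordered by descending
    // vote count, after which a second, recognition pass matches each symbol
    // image against the reordered patterns:
    // for (int i = 0; i < n; i++)
    //     recognize(&images[i], clusters, numClusters);
}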
The current document is directed to the control logic and data structures within an OCR system that allow both for clustering of patterns and for the above-described preprocessing step, in which graphemes within patterns can be sorted by their frequency of occurrence within a text-containing scanned image or set of scanned images. The control logic and data structures are used in a preprocessing/clustering OCR implementation in which a fixed set of parameters is associated with each cluster and used in symbol-image/pattern comparisons with respect to the patterns contained in the cluster. The clusters may be used in different local operations or phases of a complex OCR processing task, and the particular parameters used, and the number of parameters used, for symbol-image/pattern comparisons may differ in different local operations and phases and may often differ among different clusters.
When the computed weight for the comparison of the symbol image to the pattern is less than the cutoff value, the symbol image is sufficiently similar to the pattern that at least some of the graphemes corresponding to the pattern deserve a vote in the preprocessing step. The graphemes sufficiently similar to the symbol image are selected based on the computed weight, using an index selected from one of the indices 1908 corresponding to the computed weight. Then, the elements of the data structure "votes" corresponding to these graphemes are incremented to reflect votes for these graphemes based on preprocessing of the symbol image.
Next, C++-like pseudocode is provided to illustrate the preprocessing of a symbol image with respect to the patterns within a cluster.
First, a number of data structures and class declarations are provided:
 1  int votes[NUM_GRAPHEMES];            // one vote counter per grapheme
 2  class parameter
 3  {
 4      virtual double parameterize(symbolImage* s);
 5  };
 6  parameter Parameters[NUM_PARAMETERS];
 7  class pattern
 8  {
 9      private:
10          int patternNo;
11          double parameters[MAX_PARAMETERS];
12          int numIndices;
13          int indices[MAX_INDICES];
14          int numGraphemes;
15          int graphemes[MAX_GRAPHEMES];
16      public:
17          double getParameter(int i);
18          int getIndex(double w);
19          int getGrapheme(int i);
20          pattern();
21  };
22  class cluster
23  {
24      private:
25          int num;
26          int clusterParameters[MAX_CLUSTER_PARAMETERS];
27          double cutoff;
28          int numPatterns;
29          pattern* patterns;
30      public:
31          double getCutoff();
32          int getNum();
33          int getParameter(int i);
34          pattern* getPattern(int i);
35          int getNumPatterns();
36          cluster();
37  };
The data structure "votes" 1802 is declared on line 1, above. A small portion of a declaration for a class "parameter" is provided on lines 2-5. In the current discussion, the only relevant aspect of the parameter class is that the base class includes a virtual function member "parameterize" that takes, as input, a symbol image and returns, as output, a floating-point parameter value. Of course, in certain cases, a particular parameter may have only integer values, rather than floating-point values. The data structure "parameters" 1804 is declared on line 6. A portion of a class "pattern" is declared on lines 7-21. The class "pattern" includes private data members "patternNo" 1904, "parameters," "indices," and "graphemes," declared on lines 10-15, as well as public member functions, declared on lines 17-19, that return a stored parameter value, the index associated with a computed weight, and a grapheme code. A portion of a class "cluster" is declared on lines 22-37; the class "cluster" includes, as private data members, the number of parameters used by the cluster, the indices of those parameters, a cutoff weight, and the patterns belonging to the cluster, along with public member functions that provide access to these data members.
The following pseudocode routine “vote” illustrates implementation of the preprocessing method with respect to a single symbol image and a single cluster:
36  void vote(symbolImage* s, cluster* c)
37  {
38      double params[MAX_PARAMETERS];
39      int i, j, k, l;
40      double weight, t;
41      pattern* p;
42
43      for (i = 0; i < c->getNum(); i++)         // compute the cluster's parameters
44          params[i] = Parameters[c->getParameter(i)].parameterize(s);
45      for (j = 0; j < c->getNumPatterns(); j++)
46      {
47          p = c->getPattern(j);
48          weight = 0;
49          for (i = 0; i < c->getNum(); i++)     // sum of absolute differences
50          {
51              t = p->getParameter(i) - params[i];
52              weight += (t < 0) ? -t : t;
53          }
54          if (weight > c->getCutoff()) continue;   // not similar enough: skip pattern
55          k = p->getIndex(weight);
56          for (l = 0; l < k; l++)               // vote for the first k graphemes
57              votes[p->getGrapheme(l)]++;
58      }
59  }
The routine "vote" receives, as arguments, a pointer to a symbol image and a pointer to a cluster. Local variables include the array "params," declared on line 38, which stores computed parameter values for the symbol image; iteration integers i, j, k, and l, declared on line 39; floating-point variables "weight" and "t," declared on line 40, used to compute a weight that results from a comparison between the input symbol image and a pattern within the cluster; and a pointer p, declared on line 41, that points to a pattern within the input cluster. In the for-loop of lines 43-44, parameter values for all the parameters used by the cluster are computed for the input symbol image and stored in the array "params" 1924. Then, in the outer for-loop of lines 45-58, each pattern within the cluster is compared with the symbol image. The pattern is selected on line 47, and a weight for the comparison is computed, in the inner for-loop of lines 49-53, as the sum of the absolute values of the differences between the parameter values computed for the symbol image and the corresponding parameter values stored in the pattern. When the computed weight exceeds the cutoff value for the cluster, the pattern is skipped, on line 54. Otherwise, an index k corresponding to the computed weight is obtained from the pattern, on line 55, and, in the for-loop of lines 56-57, the vote value for each of the first k graphemes referenced by the pattern is incremented.
There are many alternative approaches to the preprocessing step and the above-described data structures. For example, rather than a cutoff weight for an entire cluster, cutoff weights for particular patterns may be used, with the cutoff weights included in the pattern data structures. As another example, the indices stored within a pattern may be instances of classes that contain lists of grapheme codes, rather than indices pointing into an ordered list of grapheme codes, as in the currently described implementation. Many other alternative implementations are possible. For example, the routine "vote" may receive, as a second argument, a pointer to an array "params" and, in the for-loop of lines 43-44, may compute parameter values only when they have not already been computed while processing the symbol image with respect to other clusters. Different types of weight computations and symbol-image-to-pattern comparisons may be used in alternative implementations. In certain cases, larger-valued weights may indicate greater similarity between a symbol image and a pattern, unlike the above-described weights, which increase in value as the similarity between a symbol image and a pattern decreases. In certain OCR systems, real coefficients may be associated with graphemes to allow for fractional votes and votes greater than 1. In certain OCR systems, graphemes, patterns, and/or clusters may be sorted, based on the votes accumulated during preprocessing, to facilitate efficient subsequent symbol recognition. In certain implementations, a cluster data structure may include only a number of pattern data structures or references to pattern data structures, with the cutoff and patterns associated with the cluster specified in control logic rather than stored in the cluster data structure.
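As one sketch of the fractional-vote alternative mentioned above, the vote accumulation of lines 55-57 of the routine "vote" might be replaced by real-valued accumulation; the array "realVotes" and the routine "fractionalVote" below are hypothetical illustrations, assuming the pattern class from the earlier listing:

double realVotes[NUM_GRAPHEMES];   // real-valued accumulator replacing int votes[]

// Graphemes of patterns that are more similar to the symbol image (smaller
// weight) receive a larger fractional vote; assumes weight <= cutoff.
void fractionalVote(pattern* p, double weight, double cutoff)
{
    double coefficient = 1.0 - weight / cutoff;   // 1 for a perfect match, 0 at the cutoff
    int k = p->getIndex(weight);
    for (int l = 0; l < k; l++)
        realVotes[p->getGrapheme(l)] += coefficient;
}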
Once votes have been collected in the array “votes” for a particular symbol image after preprocessing the symbol image with respect to the patterns contained in a cluster, the votes array can be subsequently processed to return a list of grapheme codes for which votes were received, ordered in descending order by the number of votes received by each grapheme code. The number of votes accumulated for a particular grapheme may be considered to be a computed level of similarity of the grapheme to the symbol image. Alternatively, a different level-of-similarity metric may be computed based on the number of votes accumulated for the grapheme. This ordered list represents the graphemes that are most similar to the symbol image, in descending similarity order. Alternatively, the votes array may be used to accumulate votes generated by preprocessing a symbol image with respect to multiple clusters, after which an ordered list of graphemes most similar to the symbol image may be produced. In other methods, votes may be accumulated for two or more symbol images prior to using the votes to generate a list of grapheme codes. In other words, there are many possible ways for preprocessing methods to accumulate votes in the array “votes” and there are many ways for the accumulated votes to be used to generate various types of results, such as an ordered list of graphemes most similar to a particular symbol image.
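A minimal sketch of producing that ordered list from the accumulated votes follows; the routine name "orderedGraphemes" is a hypothetical illustration:

#include <algorithm>
#include <vector>

// Return the grapheme codes that received votes, ordered by descending vote count.
std::vector<int> orderedGraphemes(const int votes[], int numGraphemes)
{
    std::vector<int> graphemes;
    for (int g = 0; g < numGraphemes; g++)
        if (votes[g] > 0) graphemes.push_back(g);
    std::sort(graphemes.begin(), graphemes.end(),
              [votes](int a, int b) { return votes[a] > votes[b]; });
    return graphemes;
}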
Although the present invention has been described in terms of particular embodiments, it is not intended that the invention be limited to these embodiments. Modifications within the spirit of the invention will be apparent to those skilled in the art. For example, any of many different possible implementations of the data structures and methods used for preprocessing according to the generalized third implementation, described above, within an OCR system may be obtained by varying any of many different design and implementation parameters, including data structures, control structures, modular organization, programming language, underlying operating system and hardware, and many other such design and implementation parameters.
It is appreciated that the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.