A system and method for script and orientation detection of images are disclosed. In one example, textual content in the image is extracted. Further, a vertical component run (VCR) and a horizontal component run (HCR) are obtained by vectorizing each connected component in the extracted textual content. Furthermore, a concatenated vertical document vector (VDV) and a horizontal document vector (HDV) are computed. In addition, a substantially matching script and orientation are obtained by comparing the computed concatenated VDV and HDV of the image with reference VDVs and HDVs associated with each script and orientation, respectively. Also, the substantially matching script and orientation are declared as the script and orientation of the image, if the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV of the matching script and orientation, respectively.
1. A method for script and orientation detection of an image, comprising:
extracting textual content in the image;
obtaining a vertical component run (VCR) and a horizontal component run (HCR) by vectorizing each connected component in the extracted textual content in the image;
computing a concatenated vertical document vector (VDV) and horizontal document vector (HDV) by averaging the obtained VCR and HCR for each connected component in the image;
obtaining a substantially matching script and orientation by comparing the computed concatenated VDV and HDV of the image with each of a plurality of reference VDVs and HDVs, wherein each reference VDV and HDV is associated with a script and an orientation of a plurality of scripts and orientations;
determining whether the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation; and
if the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation, then declaring the matching script and orientation as the script and orientation of the image.
11. A non-transitory computer-readable storage medium for script and orientation detection of images, having instructions that, when executed by a computing device, cause the computing device to:
extract textual content in the image;
obtain a vertical component run (VCR) and a horizontal component run (HCR) by vectorizing each connected component in the extracted textual content in the image;
compute a concatenated vertical document vector (VDV) and horizontal document vector (HDV) by averaging the obtained VCR and HCR for each connected component in the image;
obtain a substantially matching script and orientation by comparing the computed concatenated VDV and HDV of the image with each of a set of reference VDVs and HDVs, wherein each reference VDV and HDV is associated with a script and an orientation of a plurality of scripts and orientations;
determine whether the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation; and
if the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation, declare the matching script and orientation as the script and orientation of the image.
14. A system for script and orientation detection of images, comprising:
a processor;
a memory coupled to the processor; and
a script and orientation detection module residing in the memory,
wherein the script and orientation detection module extracts textual content in the image,
wherein the script and orientation detection module obtains a vertical component run (VCR) and a horizontal component run (HCR) by vectorizing each connected component in the extracted textual content in the image,
wherein the script and orientation detection module computes a concatenated vertical document vector (VDV) and horizontal document vector (HDV) by averaging the obtained VCR and HCR for each connected component in the image,
wherein the script and orientation detection module obtains a substantially matching script and orientation by comparing the computed concatenated VDV and HDV of the image with each of a plurality of reference VDVs and HDVs, wherein each reference VDV and HDV is associated with a script and an orientation of a plurality of scripts and orientations,
wherein the script and orientation detection module determines whether the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation, and
wherein the script and orientation detection module declares the matching script and orientation as the script and orientation of the image, when the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV associated with the matching script and orientation.
2. The method of
3. The method of
obtaining a gray level/colored image by capturing the image using a camera or scanner;
obtaining a binarized image from the gray level/colored image;
dilating the binarized image to join disjoint parts of characters in the binarized image; and
identifying and extracting the textual content by performing a connected component analysis and a resolution based thresholding on the dilated image.
4. The method of
generating the reference VDV and HDV for each script and orientation by averaging the VDVs and HDVs obtained from a plurality of images, each of the plurality of images being associated with a script and an orientation.
5. The method of
computing sums of squared differences (SSDs) between the computed VDV and HDV and each of the reference VDVs and HDVs.
6. The method of
obtaining a minimum SSD from the computed SSDs; and
obtaining the substantially matching script and orientation associated with the obtained minimum SSD, wherein the obtained minimum SSD is less than or equal to a first threshold value.
7. The method of
computing orientation SSDs between the computed VDV and HDV and each of a set of reference VDVs and HDVs, each reference VDV and HDV being associated with an orientation of a plurality of orientations;
determining whether any one of the computed orientation SSDs is equal to or below a second threshold value; and
if any one of the computed orientation SSDs is equal to or below the second threshold value, declaring the orientation associated with the computed SSD that is equal to or below the second threshold value as the orientation of the image.
8. The method of
if none of the computed orientation SSDs is equal to or below the second threshold value, then performing a statistical orientation identification to identify the orientation of the image.
9. The method of
10. The method of
if the computed concatenated VDV and HDV of the image do not substantially match with the reference VDV and HDV of the matching script and orientation, then performing a statistical script identification to identify the script of the image.
12. The non-transitory computer-readable storage medium of
13. The non-transitory computer-readable storage medium of
obtaining a gray level/colored image by capturing the image using a camera or scanner;
obtaining a binarized image from the gray level/colored image;
dilating the binarized image to join disjoint parts of characters in the binarized image; and
identifying and extracting the textual content by performing a connected component analysis and a resolution based thresholding on the dilated image.
15. The system of
wherein the script and orientation detection module obtains a binarized image from the gray level/colored image,
wherein the script and orientation detection module dilates the binarized image to join disjoint parts of characters in the binarized image, and
wherein the script and orientation detection module identifies and extracts the textual content by performing a connected component analysis and a resolution based thresholding on the dilated image.
With the increase in the use of soft (digital) versions of images, there has been a need for identifying scripts and their orientations. Currently, manual checks are performed to categorize the images based on scripts and to correct the orientation of the images. However, the manual process can be very time consuming and tedious, and may not be cost effective during bulk scanning.
Further, the rapid growth of digital libraries has necessitated automated systems for identifying scripts and their orientations in images. Furthermore, such automated processing may be required before performing optical character recognition (OCR) analysis.
Existing automated techniques for script and orientation detection of the images are not robust enough to accurately detect the script and orientation and/or are highly computationally intensive.
Examples of the invention will now be described in detail with reference to the accompanying drawings, in which:
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
A system and method for script and orientation detection of images are disclosed. In the following detailed description of the examples of the present subject matter, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific examples in which the present subject matter may be practiced. These examples are described in sufficient detail to enable those skilled in the art to practice the present subject matter, and it is to be understood that other examples may be utilized and that changes may be made without departing from the scope of the present subject matter. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present subject matter is defined by the appended claims.
At block 104, a vertical component run (VCR) and horizontal component run (HCR) are obtained by vectorizing each connected component of a plurality of connected components in the extracted textual content in the image. This is explained below in more detail with reference to
At block 108, a substantially matching script and orientation are obtained by comparing the computed concatenated VDV and HDV of the image with reference VDV and HDV associated with each script and orientation, respectively. The reference VDV and HDV for each script and orientation are generated by averaging the VDVs and HDVs obtained from a plurality of images associated with each script and orientation. This is explained below in more detail with reference to
At block 110, a check is made to determine whether the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV of the matching script and orientation. If the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV of the matching script and orientation, the method goes to block 112 and declares the matching script and orientation as the script and orientation of the image. If the computed concatenated VDV and HDV of the image do not substantially match with the reference VDV and HDV of the matching script and orientation, the method goes to block 114 and performs statistical script identification to identify the script of the image.
At block 116, SSDs between the computed and reference VDVs and HDVs associated with each orientation of a plurality of orientations are computed. For example, the plurality of orientations includes image orientation angles selected from the group consisting of 0, 90, 180, and 270 degrees. At block 118, a check is made to determine whether any one of the computed SSDs associated with the plurality of orientations is equal to or below a second threshold value. If any one of the computed SSDs associated with the plurality of orientations is equal to or below the second threshold value, the method goes to block 120 and declares the orientation associated with the computed SSD that is equal to or below the second threshold value as the orientation of the image. If none of the computed SSDs associated with the plurality of orientations is equal to or below the second threshold value, the method goes to block 122 and performs statistical orientation identification to identify the orientation of the image.
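By way of a non-limiting illustration, the SSD comparison and thresholding of blocks 116 through 122 may be sketched as follows. The function names and the threshold value are assumptions made for illustration and are not part of the specification; returning None stands in for falling through to the statistical orientation identification.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two feature vectors."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(d, d))

def match_orientation(doc_vec, reference_vecs, threshold):
    """Return the orientation whose reference vector gives the smallest SSD,
    or None if no SSD falls at or below the threshold (in which case a
    statistical fallback would be used)."""
    ssds = {angle: ssd(doc_vec, ref) for angle, ref in reference_vecs.items()}
    best_angle = min(ssds, key=ssds.get)
    if ssds[best_angle] <= threshold:
        return best_angle
    return None
```

For example, reference_vecs would map each of the angles 0, 90, 180, and 270 to the reference vector generated for that orientation.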
Referring now to
Referring now to
Referring now to
Further, the CCA and resolution based thresholding are performed on the dilated image to identify and extract the textual content, shown in
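A minimal, non-limiting sketch of the extraction pipeline described above — binarization, dilation to join disjoint character parts, and connected component analysis with a simple size threshold standing in for the resolution-based thresholding — is shown below. The threshold values and function names are illustrative assumptions, not details fixed by the specification.

```python
import numpy as np
from collections import deque

def binarize(gray, thresh=128):
    """Ink pixels (darker than the threshold) become 1, background 0."""
    return (np.asarray(gray) < thresh).astype(np.uint8)

def dilate(binary):
    """One pass of 3x3 dilation to join disjoint parts of characters."""
    padded = np.pad(binary, 1)
    out = np.zeros_like(binary)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy : 1 + dy + binary.shape[0],
                          1 + dx : 1 + dx + binary.shape[1]]
    return out

def connected_components(binary, min_size=2):
    """Label 8-connected components by breadth-first search; drop
    components smaller than min_size (a stand-in for resolution-based
    thresholding). Returns a list of pixel lists."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    comps, next_label = [], 1
    for y in range(h):
        for x in range(w):
            if binary[y, x] and not labels[y, x]:
                queue, pixels = deque([(y, x)]), []
                labels[y, x] = next_label
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                        for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                            if binary[ny, nx] and not labels[ny, nx]:
                                labels[ny, nx] = next_label
                                queue.append((ny, nx))
                if len(pixels) >= min_size:
                    comps.append(pixels)
                next_label += 1
    return comps
```

In practice an image-processing library would supply these primitives; the sketch only makes the sequence of operations concrete.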
Referring now to
For example, the position of the 1 in values 1-8 of the 32-value VCR vector of the connected component represents the number of vertical cuts in the connected component. Further, the positions of 1's in values 9-16 represent the locations of vertical cuts lying in the top zone of the connected component, the positions of 1's in values 17-24 represent the locations of vertical cuts lying in the middle zone, and the positions of 1's in values 25-32 represent the locations of vertical cuts lying in the bottom zone.
Referring now to
For example, the position of the 1 in values 1-8 of the 32-value HCR vector of the connected component represents the number of horizontal cuts in the connected component. Further, the positions of 1's in values 9-16 represent the locations of horizontal cuts lying in the left zone of the connected component, the positions of 1's in values 17-24 represent the locations of horizontal cuts lying in the middle zone, and the positions of 1's in values 25-32 represent the locations of horizontal cuts lying in the right zone.
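The vectorization described above may be sketched as follows. The choice of the centre scan line as the cut-detection path and the uniform quantization into 8 bins per zone are assumptions made for illustration; the specification does not fix these details.

```python
import numpy as np

def component_runs(column):
    """Maximal runs of ink (1s) along a 1-D scan line; each run is one 'cut'."""
    runs, start = [], None
    for i, v in enumerate(column):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(column) - 1))
    return runs

def vcr(component):
    """32-value VCR sketch: values 1-8 one-hot encode the cut count;
    values 9-16, 17-24, and 25-32 mark cut locations falling in the
    top, middle, and bottom thirds of the component (8 bins per zone)."""
    h = component.shape[0]
    column = component[:, component.shape[1] // 2]  # centre scan line (assumption)
    cuts = component_runs(column)
    vec = np.zeros(32, dtype=np.uint8)
    if cuts:
        vec[min(len(cuts), 8) - 1] = 1              # number of cuts, one-hot
    for start, end in cuts:
        centre = (start + end) / 2.0
        zone = min(int(3 * centre / h), 2)          # 0=top, 1=middle, 2=bottom
        zone_pos = 3 * centre / h - zone            # position within the zone
        vec[8 + 8 * zone + min(int(8 * zone_pos), 7)] = 1
    return vec
```

The HCR may be obtained analogously by applying the same routine to the transposed component, with the three zones read as left, middle, and right.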
Referring now to
In one example implementation, the reference VDV and HDV for each script are generated by averaging the VDVs and HDVs obtained from a plurality of images associated with each script. Further, the obtained reference VDV and HDV are used in obtaining the substantially matching script of the image. This is explained in more detail with reference to
Referring now to
In one example implementation, the reference VDV and HDV for each script and orientation are generated by averaging the VDVs and HDVs obtained from the plurality of images associated with each script and orientation. Further, the obtained reference VDV and HDV are used in obtaining the substantially matching script and orientation of the image. This is explained in more detail with reference to
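The averaging steps above — building a document's concatenated 64-value vector from its per-component VCRs and HCRs, and building a reference vector for a script-orientation pair from training documents — may be sketched as follows (function names are illustrative assumptions):

```python
import numpy as np

def document_vector(vcrs, hcrs):
    """Concatenated 64-value document vector: the per-component 32-value
    VCRs and HCRs are averaged separately, then joined (VDV then HDV)."""
    vdv = np.mean(np.asarray(vcrs, dtype=float), axis=0)
    hdv = np.mean(np.asarray(hcrs, dtype=float), axis=0)
    return np.concatenate([vdv, hdv])

def reference_vector(document_vectors):
    """Reference VDV/HDV for one script-orientation pair: the average of
    the document vectors of its training images."""
    return np.mean(np.asarray(document_vectors, dtype=float), axis=0)
```

Each script-orientation pair would contribute one reference vector, and classification compares a new document's vector against all of them.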
Referring now to
In one example implementation, a statistical model is constructed to identify the reliable index value out of the 64 index values as the feature of the associated script using the statistics of all the 64 index values. The statistics include mean and standard deviation generated using about 100 documents of each script and orientation. Further, the Gaussian distribution of the values at reliable index of reference 64-value vector associated with various scripts is used in the statistical script identification. The statistical script identification is used to determine deviation of the computed VDV and HDV with the reference VDV and HDV and to correctly detect the script of the image.
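A non-limiting sketch of the statistical identification described above is given below: given per-script Gaussian statistics (mean and standard deviation) at a reliable index of the 64-value vector, the script whose Gaussian assigns the highest likelihood to the document's value at that index is selected. The function names and any statistics used with them are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def statistical_script_id(doc_vec, script_stats, reliable_index):
    """Pick the script whose Gaussian (mean, std) at the reliable index
    gives the highest likelihood for the document's value there."""
    x = doc_vec[reliable_index]
    return max(script_stats,
               key=lambda script: gaussian_pdf(x, *script_stats[script]))
```

The same comparison applied to the reliable index of two orientation classes of one script (e.g., 0 versus 180 degrees) yields the statistical orientation identification.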
Referring now to
In one example implementation, the statistical model is constructed to identify the reliable index out of the 64 values as the feature of the particular orientation of a script. Further, the Gaussian distribution of values at the reliable index (i.e., 10th index) of the reference 64-value vector associated with the 0 and 180 degree orientations of Chinese script is used in the statistical orientation identification. The statistical orientation identification is used to determine a deviation of the computed VDV and HDV with the reference VDV and HDV and to correctly detect the orientation of the image.
Referring now to
In one example implementation, the statistical model is constructed to identify the reliable index out of the 64 values as the feature of the particular orientation of a script. Further, the Gaussian distribution of values at the reliable index (i.e., 18th index) of the reference 64-value vector associated with the 0 and 180 degree orientations of Korean script is used in the statistical orientation identification. The statistical orientation identification is used to determine a deviation of the computed VDV and HDV from the reference VDV and HDV and to correctly detect the orientation of the image.
Referring now to
As shown in the exemplary table 800, the first row shows the various scripts, such as Chinese, Korean, Japanese, Hindi, and English. Further, the second row shows the number of images of various scripts used for the detection of script and orientations. Furthermore, the third row shows the accuracy rate of the detection of scripts. Also, the fourth row shows the accuracy rate of the detection of orientations with given script information.
Referring now to
The system 902 includes a processor 904, memory 906, a removable storage 920, and a non-removable storage 922. The system 902 additionally includes a bus 916 and a network interface 918. As shown in
Exemplary user input devices 924 include a digitizer screen, a stylus, a trackball, a keyboard, a keypad, a mouse and the like. Exemplary output devices 926 include a display unit of the personal computer, a mobile device, and the like. Exemplary communication connections 928 include a local area network, a wide area network, and/or other network.
The memory 906 further includes volatile memory 908 and non-volatile memory 910. A variety of computer-readable storage media are stored in and accessed from the memory elements of the system 902, such as the volatile memory 908 and the non-volatile memory 910, the removable storage 920 and the non-removable storage 922. The memory elements include any suitable memory device(s) for storing data and machine-readable instructions, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling compact disks, digital video disks, diskettes, magnetic tape cartridges, memory cards, Memory Sticks™, and the like.
The processor 904, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a graphics processor, a digital signal processor, or any other type of processing circuit. The processor 904 also includes embedded controllers, such as generic or programmable logic devices or arrays, application specific integrated circuits, single-chip computers, smart cards, and the like.
Examples of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Machine-readable instructions stored on any of the above-mentioned storage media may be executable by the processor 904 of the system 902. For example, a computer program 912 includes machine-readable instructions capable of detecting script and orientation of images in the system 902, according to the teachings of the herein-described examples of the present subject matter. In one example, the computer program 912 is included on a compact disk-read only memory (CD-ROM) and loaded from the CD-ROM to a hard drive in the non-volatile memory 910. The machine-readable instructions cause the system 902 to operate according to the various examples of the present subject matter.
As shown in
The script and orientation detection module 914 extracts textual content in the image. In one example implementation, the script and orientation detection module 914 obtains the gray level/colored image by capturing the image using the camera or scanner. Further, the script and orientation detection module 914 obtains the binarized image from the gray level/colored image. Furthermore, the script and orientation detection module 914 dilates the binarized image to join disjoint parts of characters in the binarized image. In addition, the script and orientation detection module 914 identifies and extracts the textual content by performing a connected component analysis and a resolution based thresholding on the dilated image.
Further, the script and orientation detection module 914 obtains the VCR and the HCR by vectorizing each connected component in the extracted textual content in the image. Furthermore, the script and orientation detection module 914 computes a concatenated VDV and HDV by averaging the obtained VCR and HCR for each connected component in the image. In addition, the script and orientation detection module 914 obtains a substantially matching script and orientation by comparing the computed concatenated VDV and HDV of the image with reference VDVs and HDVs associated with each script and orientation, respectively. Also, the script and orientation detection module 914 determines whether the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV of the matching script and orientation, respectively. Moreover, the script and orientation detection module 914 declares the matching script and orientation as the script and orientation of the image, when the computed concatenated VDV and HDV of the image substantially match with the reference VDV and HDV of the matching script and orientation, respectively.
In various examples, the system and method described in
Although certain methods, apparatus, and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. To the contrary, this patent covers all methods, apparatus, and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.
Wu, Yifeng, Jain, Chirag, Kadagattur, Srinidhi
Patent | Priority | Assignee | Title |
7020338 | Apr 08 2002 | The United States of America as represented by The National Security Agency | Method of identifying script of line of text |
7392473 | May 26 2005 | Xerox Corporation | Method and apparatus for determining logical document structure |
8509537 | Aug 05 2010 | Xerox Corporation | Learning weights of fonts for typed samples in handwritten keyword spotting |
20090028435 | | | |
20110249897 | | | |
20130194448 | | | |
20130266176 | | | |
20130195376 | | | |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 29 2011 | Hewlett-Packard Development Company, L.P. | (assignment on the face of the patent) |