Disclosed are systems, computer-readable mediums, and methods for detecting glare in a frame of image data. A frame of image data is preprocessed. A set of connected components in the preprocessed frame is determined. A set of statistics is calculated for one or more connected components in the set of connected components. A decision for the one or more connected components is made, using the calculated set of statistics, if the connected component is a light spot over text. Whether glare is present in the frame is determined.
1. A method for detecting glare in a frame of image data, the method comprising:
preprocessing the frame of the image data;
determining a set of connected components in the preprocessed frame;
calculating a set of statistics for one or more connected components in the set of connected components, wherein calculating the set of statistics comprises gathering statistics of directions and values of gradients of intensity along a perimeter of a connected component of the one or more connected components;
making a decision for the one or more connected components, using the calculated set of statistics, if the connected component is a light spot; and
determining, using a processor, whether the frame comprises the glare in view of the decision for the one or more connected components.
30. A system to detect glare in a frame of image data, the system comprising:
one or more processors configured to:
preprocess the frame of the image data;
determine a set of connected components in the preprocessed frame;
calculate a set of statistics for one or more connected components in the set of connected components, wherein to calculate the set of statistics, the one or more processors are to gather statistics of directions and values of gradients of intensity along a perimeter of a connected component of the one or more connected components;
make a decision for the one or more connected components, using the calculated set of statistics, if the connected component is a light spot; and
determine whether the frame comprises the glare in view of the decision for the one or more connected components.
34. A non-transitory computer-readable medium having instructions stored thereon to detect glare in a frame of image data, the instructions comprising:
instructions to preprocess the frame of the image data;
instructions to determine a set of connected components in the preprocessed frame;
instructions to calculate a set of statistics for one or more connected components in the set of connected components, wherein the instructions to calculate the set of statistics comprise instructions to gather statistics of directions and values of gradients of intensity along a perimeter of a connected component of the one or more connected components;
instructions to make a decision for the one or more connected components, using the calculated set of statistics, if the connected component is a light spot; and
instructions to determine whether the frame comprises the glare in view of the decision for the one or more connected components.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
sorting the set of connected components based upon a size of the components; and
excluding connected components less than a predetermined size from the set of connected components.
9. The method of
determining a second set of statistics for one or more of the connected components that remain in the set of connected components after removing from further consideration all light spot connected components based upon the decision making; and
determining for each of the one or more of the connected components, using a trained classifier and the calculated second set of statistics, if the one or more of the connected components is a light spot.
11. The method of
12. The method of
13. The method of
15. The method of
16. The method of
17. The method of
18. The method of
19. The method of
segmenting the frame into regions of image data;
determining for a connected component a first set of regions of image data that contain this connected component;
determining a third set of statistics based upon regions in the first set of regions for the connected component;
determining a related statistic based upon one or more regions of the image data related to the first set of regions for the connected component; and
comparing the third set of statistics to the related statistic determined therein.
20. The method of
21. The method of
22. The method of
23. The method of
determining an area of detected glare in the frame;
altering the frame to visually represent the area of detected glare;
displaying the altered frame;
receiving input from a graphical user interface that changes the area of the glare; and
updating parameters of a classifier to detect glare based upon the received input.
24. The method of
25. The method of
26. The method of
27. The method of
determining a message corresponding to detected glare to reduce or correct the detected glare; and
displaying the message.
28. The method of
determining a level of defects within the frame based upon analyzing regions of image data to determine types of defects contained within the regions of image data;
determining that the level of defects within the frame is below a predetermined threshold; and
saving the frame in long term memory based upon determining that the level of defects within the frame is below the predetermined threshold.
29. The method of
31. The system of
32. The system of
sort the set of connected components based upon a size of the components; and
exclude connected components less than a predetermined size from the set of connected components.
35. The non-transitory computer-readable medium of
36. The non-transitory computer-readable medium of
instructions to sort the set of connected components based upon a size of the components; and
instructions to exclude connected components less than a predetermined size from the set of connected components.
37. The non-transitory computer-readable medium of
This application is a continuation of U.S. patent application Ser. No. 14/564,424, filed Dec. 9, 2014, which will issue as U.S. Pat. No. 9,418,407 on Aug. 16, 2016, and which is a continuation-in-part of U.S. patent application Ser. No. 13/305,768, filed Nov. 29, 2011, now U.S. Pat. No. 8,928,763, issued Jan. 6, 2015, which is a continuation-in-part of U.S. patent application Ser. No. 12/330,771, filed Dec. 9, 2008, now U.S. Pat. No. 8,098,303, issued Jan. 17, 2012. This application also claims the benefit of priority to Russian patent application No. 2014101665, filed Jan. 21, 2014; the disclosures of the priority applications are herein incorporated by reference.
Various mobile devices include a built-in camera. Pictures taken with mobile devices can include various defects, such as blurred areas, fuzzy or unfocused areas, areas with glare, etc. Before a picture of a scene is taken, the scene can be shown in a viewfinder. Determining when to capture an image so as to minimize defects can be a problem. Defects in captured images may have one or more causes, such as the limited resolution of the camera matrix, issues in the optical system of the electronic device, data encryption algorithms, and insensitive or crude compression algorithms. Imperfections in the use of a mobile camera, such as hand shake or non-ideal lighting conditions, may also cause defects in captured images. Such defects can include optical distortions, blur caused by limited shutter speed, noise, smoothing effects, defocusing, aliasing effects, glare, etc. These defects can negatively impact the processing of the captured image.
Disclosed are systems, computer-readable mediums, and methods for detecting glare in a frame of image data. A frame of image data is preprocessed. A set of connected components in the preprocessed frame is determined. A set of statistics is calculated for one or more connected components in the set of connected components. A decision for the one or more connected components is made, using the calculated set of statistics, if the connected component is a light spot over text. Whether glare is present in the frame is determined.
Described herein are systems, computer-readable mediums, and methods for performing preliminary analysis of image data, such as a frame (current view or scene) on the display screen of a viewfinder, before an image is captured, and for controlling the generation of images with a camera, such as a camera built into a portable electronic device. In various embodiments, results of the preliminary analysis of a frame are displayed as visual signals on a screen in real time. These visual signals can substantially simplify the capturing process and improve the quality of captured images.
The foregoing and other features of the present disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several implementations in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.
Reference is made to the accompanying drawings throughout the following detailed description. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative implementations described in the detailed description, drawings, and claims are not meant to be limiting. Other implementations may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and made part of this disclosure.
Implementations of various disclosed embodiments relate to analyzing a frame of image data to find defects. The frame can be image data shown in a viewfinder. The detected defects can be analyzed before the image is obtained by a user. Evaluation of the frame for defects can be used to avoid shooting images unsuitable or poorly suitable for recognition. The preliminary results of defects detection can be displayed directly on the screen of a mobile device in order to improve the conditions of shooting for higher picture quality. For example, defects can be visually highlighted or user prompts that include information regarding detected defects can be presented on the mobile device.
In another embodiment, analysis of a frame of a depicted scene on a screen of a viewfinder of an electronic device is performed. The display of a frame on a screen of a mobile device can be performed in real time without recording a video or capturing the image and storing it in the memory of the electronic device. The analysis can be performed on multiple frames and the frame containing the least amount of noise and defects can be determined. This frame can be displayed along with the results of defects detection in the form of visual signals. Accordingly, blur, defocus and glare can be detected and the areas with defects shown on the electronic device.
In another embodiment, an image shown in the frame can be captured automatically if the level of defects within the frame does not exceed a threshold value. Accordingly, preliminary analysis of the frame, based upon data received through the lens by the camera application, can be used to determine when to capture the image of the scene shown in the frame.
In the following description, the term “frame” refers to the content (e.g., an image) produced using the camera viewfinder and displayed on the screen of an electronic device. The frame can be transmitted in real time without recording video or capturing/storing the image in long-term memory.
There is a plethora of portable electronic devices with display screens capable of displaying text and/or images. These portable devices typically include a camera that allows them to capture images. Devices with a display screen and/or touch screen and a camera include many mobile devices, such as laptops, tablet computers, smartphones, mobile phones, personal digital assistants (PDAs), etc. Various embodiments facilitate automatically capturing (shooting) a document containing text if the level of defects within the frame does not exceed a threshold value.
Detection of object boundaries is a versatile tool in image processing and can be applied not only to recognition systems but also to other image processing applications, such as scanning systems, cameras, control systems, the print quality of a robot, etc.
Characteristics of algorithms used to detect blur and defocusing can include:
The above characteristics can make the task of blur detection in every scenario very difficult. One way to simplify the task is to narrow down the class of processed images to those that include certain types of objects, for example, images of cells or bacteria, textured objects, text, barcodes, or light and shadow. If the type of object expected in an image is known in advance, embodiments can identify not only the typical features of the objects themselves, but also the typical features of blur on these objects. But even with the class of images narrowed down, blur detection can remain very difficult because of the required high performance and noise resistance.
One or more of the following features can be used to determine the presence of blur and defocusing within a frame of image data:
The ability to detect boundaries in distorted images depends on the degree of distortion and on the relative size and contrast of the information items within the image. Thus, small or low-contrast letters can be almost unreadable even with slight blur, while large or high-contrast letters can remain legible at the same (or even greater) degree of blur. As described in greater detail below, the relative size and contrast of information items can be taken into account when determining defects and the areas of those defects.
In addition to detecting defects or distortions in an image, determining the degree of distortion is useful, as is determining the type of distortion, e.g., blur, defocusing, or glare. The type of distortion in one or more areas can be determined by extrapolating information from neighboring regions.
In one embodiment, a button on the mobile device can be actuated to trigger capture of an electronic image. Rather than capturing the image as soon as the button is actuated, the frame can be analyzed to determine whether the amount of distortion exceeds a predetermined threshold. Analysis of the frame can begin after the button is actuated. The button 106 shown in
Viewfinders of devices provide real-time image data that is displayed on the screen. Defect analysis can be performed on each frame or on a subset of the frames received from the viewfinder. For example, one frame out of every ten frames received from the viewfinder can be selected. Other sampling intervals are possible; for example, one frame out of every 2, 4, 6, 100, or 200 frames can be used. Each frame or each selected frame can be analyzed to identify one or more of blur, defocusing, glare, or noise defects. Selecting and analyzing a frame can be performed without interrupting display of the image data in the viewfinder.
Detecting Blur and Defocusing
According to various embodiments, a Laplacian filter can be used for blur and defocusing detection. Conclusions about blur or defocusing within the frame may be drawn from an analysis of the second derivatives of the frame's brightness in the neighborhood of zero-crossing points. If an area contains objects with contrasting edges, the zero-crossing points are well detected. Conversely, the absence of objects with contrasting edges indicates a high noise level or a strong degree of blur.
In one embodiment, a second-order method is used to detect object edges. Common examples of such methods are the LoG filter (Laplacian of Gaussian) and the DoG filter (Difference of Gaussians).
The filter can be applied efficiently, in time linear in the number of pixels within a frame. When a Laplacian approximation is used, the second derivative of the brightness function in a given direction provides a cut (cross-section) of the Laplacian-like image passing through the pixel along this direction. This approximation in the neighborhood of the analyzed point is called a point profile.
Parameter selection for DoG or LoG filtering can take into account features of a class of images and features of the camera structure, so that distortions can be detected for known noise parameters and known characteristics of objects within the frame.
A LoG filter first applies Gaussian smoothing and then applies the Laplacian filter. A DoG filter subtracts two Gaussian-smoothed images obtained by convolving the original image with Gaussian functions having different degrees of smoothing (standard deviations). To set up the filters, the standard deviations σ1 and σ2 of the Gaussian smoothing are used:
Zero-crossing checking can then be performed, i.e., detecting where the resulting value changes sign from negative to positive or vice versa. The result of applying these filters is an image of the edges of the original image.
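The formula itself is not reproduced in this text. As an assumption about what it expresses, the standard textbook definitions are given below: the 2-D Gaussian kernel with standard deviation σ, the DoG response of an image I for σ1 < σ2, and the LoG response for comparison. The last line follows from ∂G_σ/∂σ = σ∇²G_σ and shows why the DoG approximates a scaled, sign-inverted LoG.

```latex
G_{\sigma}(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)

\mathrm{DoG}(x, y) = (G_{\sigma_1} * I)(x, y) - (G_{\sigma_2} * I)(x, y), \qquad \sigma_1 < \sigma_2

\mathrm{LoG}(x, y) = \nabla^{2} (G_{\sigma} * I)(x, y)

G_{\sigma_1} - G_{\sigma_2} \approx -(\sigma_2 - \sigma_1)\, \sigma\, \nabla^{2} G_{\sigma}
```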
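A minimal NumPy sketch of the DoG and zero-crossing steps, assuming a grayscale frame stored as a 2-D float array. The function names, the 3σ kernel truncation, and the `eps` tolerance (which treats near-zero responses in flat areas as zero) are illustrative choices, not taken from the source.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 standard deviations."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication; keeps the input shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def dog(img, sigma1, sigma2):
    """Difference of Gaussians: (G_sigma1 * I) - (G_sigma2 * I), with sigma1 < sigma2."""
    return gaussian_blur(img, sigma1) - gaussian_blur(img, sigma2)


def zero_crossings(response, eps=1e-6):
    """Mark a pixel when the response changes sign between it and its right or
    lower neighbour. Values with |v| < eps count as zero, so flat areas give no edges."""
    s = np.sign(np.where(np.abs(response) < eps, 0.0, response))
    zc = np.zeros(response.shape, dtype=bool)
    zc[:, :-1] |= s[:, :-1] * s[:, 1:] < 0
    zc[:-1, :] |= s[:-1, :] * s[1:, :] < 0
    return zc


# Usage: a vertical step edge between columns 9 and 10.
img = np.zeros((20, 20))
img[:, 10:] = 255.0
edges = zero_crossings(dog(img, 1.0, 2.0))
print(np.nonzero(edges.any(axis=0))[0])  # -> [9]
```

The sign test only compares right and lower neighbours, so each crossing is marked once, on the pixel to the left of (or above) the sign change.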
In order to detect blur and defocusing, only the edges of the objects can be analyzed instead of the entire image. Broad classes of images will have objects with well-defined edges in all directions. Such defects as blur and defocusing typically result in degradation of these well-defined edges into less sharp edges.
Many frames contain step changes in brightness at object edges; in undistorted images, these changes are sharp. A cut through an ideal document image is close to a “step”: cuts taken anywhere through the image approximate a piecewise-constant (step) function. However, due to imperfections in recording systems, the consequences of sampling, and other processes, the step function is eroded and superimposed with noise. For images with negligible distortion of text edges, however, abrupt changes in brightness can still be detected and distinguished.
After light passes through an optical system and images are registered on a photographic sensor, various distortions of the original image can occur, such as uneven lighting, blur, defocusing, and noise. Additional distortions can be introduced when the image is post-processed and compressed (e.g., by using JPEG or another lossy compression that yields JPEG artifacts). The edges of the objects are also affected by all of the above types of distortion. Therefore, instead of a sharp change in brightness at the edges of objects, there is a gradual change in brightness compounded by noise. By analyzing the brightness profile at the edges, the presence of a particular distortion can be inferred and the strength of image degradation can be estimated.
With reference to
At operation 202, in order to obtain the second-order derivatives for the original image (201), any Laplacian filter, or any approximation of a Laplacian filter, can be applied. To reduce the effect of noise, the image can first be smoothed, for which purpose Gaussian smoothing or any other smoothing method may be used, e.g. median filter, box-filter, adaptive median filter, etc. Also, transformations may be used which include both filters, such as LoG (the Laplacian of the Gaussian) filtering and DoG (the Difference of Gaussians) filtering.
A smoothed original image to which a Laplacian filter or its approximation has been applied is hereafter termed a Laplacian-like image. An example of a Laplacian-like image of the fragment is represented in
At operation 203, a zero-crossing filter is applied to the Laplacian-like image and is shown in
At operation 204, for each point on the edges thus obtained (as shown in
Next, local extrema (e.g., points 502 and 503 shown in
In one embodiment, to make the gathering of the statistics (e.g., operation 205 in
Additionally, the reliability of each extremum may be recorded in order to exclude unreliable extrema from the statistics. A local extremum may be considered unreliable if its position cannot be reliably detected, for example, when it is considerably stretched or there are two or more local spikes nearby. A large share of unreliable extrema may be a sign of strong image distortion. For example in
The whole image may be segmented into non-overlapping areas of any shape, and statistics are gathered within each area. Segmentation into areas makes it possible to account for variations in the direction of blur and other defects across areas. The direction of blur may vary, for example, due to rotation of the camera at the moment of shooting, or part of an image may be out of focus when photographs are taken at close distances. Additionally, connected components of the edges may be selected as areas, each connected component being a separate object or object part. This approach allows blur and defocusing to be detected separately for each object within a frame, which makes it possible to use the detector, for example, to detect moving objects against a static background.
Segmenting a frame into non-overlapping regions can also serve as a way of signaling to a user that defects are present within the frame, at least within one of the areas on the viewfinder screen. For example, areas of a frame on the viewfinder screen that contain a defect may be filled with a certain color. The intensity of the coloring may correspond to the degree of distortion within the area.
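A minimal sketch of per-area aggregation over a grid of square, non-overlapping cells, assuming each edge point already carries one scalar measurement (for example, an extremum offset). The 32-pixel cell size and the function name are hypothetical.

```python
import numpy as np


def grid_statistics(shape, points, values, cell=32):
    """Aggregate per-point measurements into non-overlapping square cells.

    shape:  (height, width) of the frame
    points: (N, 2) array of (row, col) coordinates of edge points
    values: (N,) measurement at each point, e.g. an extremum offset
    Returns (mean, count) arrays with one entry per cell; cells that
    contain no points get mean = NaN.
    """
    h, w = shape
    gh, gw = -(-h // cell), -(-w // cell)        # ceiling division
    pts = np.asarray(points, dtype=int).reshape(-1, 2)
    vals = np.asarray(values, dtype=float)
    rows, cols = pts[:, 0] // cell, pts[:, 1] // cell
    sums = np.zeros((gh, gw))
    count = np.zeros((gh, gw), dtype=int)
    np.add.at(sums, (rows, cols), vals)          # unbuffered: repeated cells accumulate
    np.add.at(count, (rows, cols), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, sums / count, np.nan)
    return mean, count


mean, count = grid_statistics((64, 64), [(0, 0), (10, 5), (40, 40)], [1.0, 3.0, 5.0])
# cell (0, 0) averages two points (2.0), cell (1, 1) holds one (5.0), the others are NaN
```

The resulting per-cell means are the kind of values that could drive the colored overlay described next.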
The grid of non-overlapping regions may be represented in various designs and shapes. For example, the grid of non-overlapping regions may be represented in form of square regions (as shown in
As mentioned above, the grid is very convenient for visually representing detected defects to a user. For example, the grid can be used to indicate the part of a screen that contains one or more types of defects. In addition, the type of detected defect (blur, defocusing, noise, etc.) can also be communicated via the grid. To distinguish the type of detected defect, each defect may be designated by one or more kinds of signals. One possible signal indicates the type of defect by color: the type of defect in a particular region determines the color used to fill that region. For example, filling the regions may be performed in one color (for example black, as shown in
In other embodiments, filling the regions may be performed by using different colors for visualizing the kind of the defect as illustrated in
The statistics for profiles of the second derivative are gathered separately for each area (205) and are then analyzed (206). The following features may be used: mean value, dispersion, asymmetry coefficient, and other features computed for each parameter of the second derivative profile (e.g., the absolute values of the local maxima and minima, maximum and minimum offset, etc.). Other features may include the number of identified extrema, the ratio of unreliable extrema (for which the position of the maximum or minimum could not be reliably detected), and correlations between the offsets and absolute values of the extrema.
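A minimal sketch of such features (mean value, dispersion, asymmetry coefficient) computed separately for each gradient direction, as in the eight-direction scheme of the following paragraph. It assumes the gradient components (gx, gy) and one profile parameter, such as an extremum offset, are known for each edge point; directions are quantized to the nearest multiple of 45 degrees, skewness stands in for the asymmetry coefficient, and the function names are hypothetical.

```python
import numpy as np


def direction_bins(gx, gy, n_dirs=8):
    """Quantize gradient directions to the nearest multiple of 360/n_dirs degrees."""
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    return np.rint(angle / (360.0 / n_dirs)).astype(int) % n_dirs


def per_direction_features(gx, gy, values, n_dirs=8):
    """Return an (n_dirs, 3) array of mean, variance (dispersion) and skewness
    (asymmetry coefficient) of `values` per direction bin; empty bins stay NaN."""
    bins = direction_bins(np.asarray(gx, float), np.asarray(gy, float), n_dirs)
    values = np.asarray(values, dtype=float)
    feats = np.full((n_dirs, 3), np.nan)
    for d in range(n_dirs):
        v = values[bins == d]
        if v.size == 0:
            continue
        mean, var = v.mean(), v.var()
        skew = ((v - mean) ** 3).mean() / var ** 1.5 if var > 0 else 0.0
        feats[d] = (mean, var, skew)
    return feats


feats = per_direction_features(gx=[1, 1, 0, -1], gy=[0, 0, 1, 0], values=[1.0, 3.0, 5.0, 7.0])
# bin 0 (0 deg): mean 2, variance 1, skew 0; bin 2 (90 deg): mean 5; bin 4 (180 deg): mean 7
```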
As statistics can be gathered separately for each specified direction of the gradient, the features (mean value, dispersion, etc.) are also computed separately for each direction. Thus, if eight directions are specified (at multiples of 45 degrees), each feature may be represented as eight vectors (e.g., as shown in
In one embodiment, within operation 206, the features thus obtained may be passed to any trained classifier, which uses them to identify the type of distortion and its parameters (207). For example, the mean distance from the zero-crossing point to the extrema of the second derivative (i.e., the mean extremum offset) gives an estimate of the strength of the blur: the greater the distance, the greater the blur. In addition, a classifier may be trained based on the results of subsequent recognition (OCR/ICR) of an analyzed frame. For example, a classifier may change some parameters of blur or defocusing detection based on a large number of falsely recognized characters.
In another embodiment, within operation 206, the set of vectors may be described by a set of features that are invariant to shifts and rotations, for example, the mean length of the vectors or the elongation of the vector set in a particular direction. These features may include various image moments (e.g., central moments, the Hu set of invariant moments, their combinations, etc.). The computed features are subsequently passed to any trained classifier for further analysis. This approach greatly reduces the dimension of the feature space handled by the classifier and makes the features more robust to a lack of statistics in one or more of the specified directions (for example, if there are not enough edges in a particular gradient direction).
As an example, the set of vectors for each feature may be regarded as a construct made up of material points with certain weights at the ends of the vectors. An inertia tensor, which is a combination of second-order central moments, may be computed for this construct (e.g., 510). The weights may be selected, for example, in proportion to the number of reliable extrema in the direction of the vector, or based on any other criteria. In the general case, the weights may be assumed to be equal to 1. Next, features of the inertia tensor are computed, such as the eccentricity and the trace of the tensor's matrix, which are subsequently analyzed by a classifier.
The features of the inertia tensor may be used to identify, for example, the following distortion features. The eccentricity distinguishes blur from defocusing: eccentricity values close to 0 are typical of defocusing, while larger values indicate directional blur. The trace of the inertia tensor provides information about the strength of the blur or defocusing. The direction of the blur coincides with the eigenvector of the inertia tensor that has the maximum eigenvalue.
Sometimes additional properties of the image can be revealed using data beyond the eigenvalues of the inertia tensor. For example, the Hu set of invariant moments can supplement the eigenvalues of the inertia tensor. As a non-limiting example, Hu invariant moments may be computed in the following manner:
where μij are central moments. Optionally, in the formulas above, the central moments μij may be replaced by their respective normalized values. Normalization can be calculated by the following formula:
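A one-dimensional sketch of the point-profile idea, assuming the second-derivative profile has already been sampled along the gradient direction through an edge point. It locates the first zero crossing, climbs to the nearest local extremum on each side, and returns the mean offset; a synthetic step edge smoothed more strongly yields a larger offset, consistent with the statement above. Smoothing a 1-D step stands in for real camera blur, and all names are hypothetical.

```python
import numpy as np


def smooth_1d(signal, sigma):
    """Gaussian smoothing of a 1-D signal with edge replication (3-sigma kernel)."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    return np.convolve(padded, k, mode="valid")


def second_derivative(signal):
    """Discrete second derivative; the two border samples are set to 0."""
    s = np.asarray(signal, dtype=float)
    d2 = np.zeros(len(s))
    d2[1:-1] = s[:-2] - 2.0 * s[1:-1] + s[2:]
    return d2


def mean_extremum_offset(profile, eps=1e-9):
    """Mean distance from the first zero crossing of `profile` to the nearest
    local extremum (largest |value|) on each side; None if there is no crossing."""
    p = np.asarray(profile, dtype=float)
    for i in range(len(p) - 1):
        if (p[i] > eps and p[i + 1] < -eps) or (p[i] < -eps and p[i + 1] > eps):
            break
    else:
        return None
    x0 = i + p[i] / (p[i] - p[i + 1])           # sub-pixel crossing by linear interpolation
    left = i
    while left > 0 and abs(p[left - 1]) >= abs(p[left]):
        left -= 1                                # climb to the extremum on the left
    right = i + 1
    while right < len(p) - 1 and abs(p[right + 1]) >= abs(p[right]):
        right += 1                               # climb to the extremum on the right
    return ((x0 - left) + (right - x0)) / 2.0


# Usage: the same step edge under weak and strong simulated blur.
step = np.r_[np.zeros(32), np.ones(32)]
for blur in (1.0, 4.0):
    profile = second_derivative(smooth_1d(smooth_1d(step, blur), 1.0))
    print(blur, mean_extremum_offset(profile))   # the offset grows with the blur
```

The inner `smooth_1d(..., 1.0)` plays the role of the detector's own Gaussian smoothing before the Laplacian, so the measured offset reflects both the simulated blur and the filter width.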
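A minimal sketch of these tensor features, assuming one feature value per direction (for example, the mean extremum offset for each of eight directions at multiples of 45 degrees) and unit weights by default. It builds the second-moment matrix [[μ20, μ11], [μ11, μ02]] of the weighted vector endpoints; this covariance form, whose principal eigenvector points along the elongation, and the eccentricity definition sqrt(1 - λmin/λmax) are assumptions, since the exact formulas are not given here.

```python
import numpy as np


def inertia_features(lengths, weights=None):
    """Eccentricity, trace and principal direction (degrees, 0-180) of the weighted
    point set formed by the ends of vectors with the given lengths, the k-th vector
    pointing at angle k * 360 / n degrees."""
    lengths = np.asarray(lengths, dtype=float)
    n = len(lengths)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    theta = 2.0 * np.pi * np.arange(n) / n
    pts = np.stack([lengths * np.cos(theta), lengths * np.sin(theta)], axis=1)
    d = pts - (w[:, None] * pts).sum(axis=0) / w.sum()   # offsets from the weighted centroid
    mu20 = (w * d[:, 0] ** 2).sum() / w.sum()
    mu02 = (w * d[:, 1] ** 2).sum() / w.sum()
    mu11 = (w * d[:, 0] * d[:, 1]).sum() / w.sum()
    tensor = np.array([[mu20, mu11], [mu11, mu02]])
    evals, evecs = np.linalg.eigh(tensor)                 # eigenvalues in ascending order
    l_min, l_max = evals
    ecc = float(np.sqrt(1.0 - l_min / l_max)) if l_max > 0 else 0.0
    v = evecs[:, 1]                                       # eigenvector of the largest eigenvalue
    angle = float(np.degrees(np.arctan2(v[1], v[0])) % 180.0)
    return ecc, float(np.trace(tensor)), angle


print(inertia_features([2.0] * 8))                  # isotropic: eccentricity near 0
print(inertia_features([3, 1, 1, 1, 3, 1, 1, 1]))   # elongated along 0/180 degrees
```

Equal lengths in all directions give eccentricity near 0 (the defocusing-like case), while lengths stretched along one axis give a large eccentricity and a principal direction along that axis.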
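The formulas are not reproduced in this text. For reference, Hu's seven invariants in their standard textbook form, written in terms of the central moments μpq, are shown below; whether the original uses all seven is not stated.

```latex
\begin{aligned}
\phi_1 &= \mu_{20} + \mu_{02} \\
\phi_2 &= (\mu_{20} - \mu_{02})^2 + 4\mu_{11}^2 \\
\phi_3 &= (\mu_{30} - 3\mu_{12})^2 + (3\mu_{21} - \mu_{03})^2 \\
\phi_4 &= (\mu_{30} + \mu_{12})^2 + (\mu_{21} + \mu_{03})^2 \\
\phi_5 &= (\mu_{30} - 3\mu_{12})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] \\
       &\quad + (3\mu_{21} - \mu_{03})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right] \\
\phi_6 &= (\mu_{20} - \mu_{02})\left[(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right] + 4\mu_{11}(\mu_{30} + \mu_{12})(\mu_{21} + \mu_{03}) \\
\phi_7 &= (3\mu_{21} - \mu_{03})(\mu_{30} + \mu_{12})\left[(\mu_{30} + \mu_{12})^2 - 3(\mu_{21} + \mu_{03})^2\right] \\
       &\quad - (\mu_{30} - 3\mu_{12})(\mu_{21} + \mu_{03})\left[3(\mu_{30} + \mu_{12})^2 - (\mu_{21} + \mu_{03})^2\right]
\end{aligned}
```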
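The normalization formula is not reproduced here either. The standard scale normalization of central moments is shown below for reference, together with the definition of the central moments, where f(x, y) is the weight (or intensity) at (x, y) and (x̄, ȳ) is the weighted centroid.

```latex
\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p + q)/2}}, \qquad p + q \ge 2
```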
Detecting objects of small widths (e.g., thin lines) is a special problem, especially if such objects have small areas or are obscured by noise. The disclosed detector allows adjusting or changing of the parameters of the filter used to obtain the Laplacian-like image. The parameters of the filter may be selected so as to make the edges of the small-width objects more immune to noise and easier to detect.
The degree of distortion within the image is established in operation 208. One output of the distortion detectors may be, for example, a visual representation of the distorted areas within a frame displayed on the screen.
In addition, depending on the type of defect, the system may provide recommendations to the user for improving the shooting conditions. For example, if glare is detected within the frame, the user can receive a message advising a change of lighting conditions to avoid glare. If the text is blurred, the user can receive a message advising stabilization of the device before the moment of shooting. If no defects are detected, or the defects are negligible (e.g., below a predetermined threshold), the application can automatically take a picture. The user can also take the picture manually.
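A minimal sketch of this decision, assuming each detector reports a defect level in [0, 1]. The advice strings paraphrase the examples above; the 0.2 threshold, the defect keys, and the function name are hypothetical.

```python
ADVICE = {
    "glare": "Change the lighting conditions to avoid glare.",
    "blur": "Hold the device steady before shooting.",
    "noise": "Change the shooting conditions to reduce noise.",
}


def decide(defect_levels, threshold=0.2):
    """Return ("capture", []) if every defect level is below `threshold`,
    otherwise ("advise", messages) for each defect at or above it."""
    found = sorted(d for d, level in defect_levels.items() if level >= threshold)
    if not found:
        return "capture", []
    return "advise", [ADVICE.get(d, f"Defect detected: {d}.") for d in found]


print(decide({"glare": 0.05, "blur": 0.1}))   # ('capture', [])
print(decide({"glare": 0.6, "blur": 0.1}))    # ('advise', ['Change the lighting conditions to avoid glare.'])
```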
The disclosed method is noise resistant, making the detector highly reliable. In the case of high levels of noise or low-contrast edges, additional analysis of area statistics may be performed. Additional analysis may also discover complicated defects, for example, those caused by complicated movements of the photographic camera.
The analysis may be performed on the original frame or on reduced copies of it. In the latter case, reducing the image can include smoothing, which may be performed as a preliminary step or as part of downsampling. Smoothing can reduce noise levels, and downsampling can improve algorithm performance.
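A minimal sketch of one way to build a reduced copy, assuming 2×2 block averaging, which applies box smoothing and decimation in a single step; the reduction method itself is not specified in the source.

```python
import numpy as np


def downsample2(img):
    """Halve each dimension by averaging 2x2 blocks; an odd last row/column is dropped."""
    a = np.asarray(img, dtype=float)
    h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
    a = a[:h, :w]
    return (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0


small = downsample2(np.arange(16).reshape(4, 4))
print(small.tolist())   # [[2.5, 4.5], [10.5, 12.5]]
```

Averaging four independent noise samples halves the noise standard deviation, which illustrates the noise reduction mentioned above, while the frame to be analyzed shrinks to a quarter of its pixels.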
Noise Detecting
Various embodiments can detect noise in a frame. Noise detection can be performed over the entire image or for one or more regions of the image. Various statistical indicators, such as the mean and the variance of luminance in a particular area, may be used as a noise measure. The more pixels an area contains, the more accurate and reliable the noise indicator becomes. Based on noise measurements within each area of the frame, regions that contain more noise can be determined. The amount of noise may depend on specific features of the optical system, so the noise level may be adjusted or indicated in the device settings.
Noise determination within a frame is related to the visual elements of the image. For example, large letters with a high noise content may be readable and recognizable, while small letters with the same amount of noise may be unreadable and unrecognizable. When an image is to be used for OCR processing, noise detection within a frame can be based on a signal-to-noise ratio calculation. This ratio compares the level of the useful signal with the noise level within the frame (or an area of the frame): the higher the ratio, the less noise is present. If the signal-to-noise ratio is below a predetermined threshold, a warning message informing the user about the amount of noise within the currently analyzed frame can be displayed. A message with advice on changing the shooting conditions to reduce the noise level within the image, or within portions of it, can also be displayed.
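A minimal sketch of a per-area check, assuming the mean and standard deviation of luminance mentioned above are combined into the ratio mean/std (one common signal-to-noise definition, used here as a stand-in for the unspecified one). The cell size, the threshold of 5, and the function names are hypothetical.

```python
import numpy as np


def region_snr(img, cell=32):
    """Mean/std luminance ratio for each full cell x cell block (partial border blocks are skipped)."""
    a = np.asarray(img, dtype=float)
    gh, gw = a.shape[0] // cell, a.shape[1] // cell
    snr = np.empty((gh, gw))
    for r in range(gh):
        for c in range(gw):
            block = a[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            sd = block.std()
            snr[r, c] = np.inf if sd == 0 else block.mean() / sd
    return snr


def noise_warning(img, min_snr=5.0, cell=32):
    """Return a warning string if any region's ratio is below min_snr, else None."""
    snr = region_snr(img, cell)
    noisy = int((snr < min_snr).sum())
    if noisy:
        return f"High noise in {noisy} of {snr.size} regions; consider changing the shooting conditions."
    return None


rng = np.random.default_rng(1)
frame = np.full((64, 64), 128.0)
frame[:, 32:] += rng.normal(0, 60, size=(64, 32))   # heavy noise in the right half only
print(noise_warning(frame))
```

On text, the contrast between letters and background also lowers this ratio, so a practical detector would likely estimate the noise separately from the content.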
Glare Detection
An example of a frame in a viewfinder with glare (601) is illustrated in
along the vertical and along the horizontal, or by subsequent smoothing and interpolation. For example masks may be represented as
Smoothing can reduce the amount of noise within the frame.
Then the frame can be binarized at some adjusted suboptimal threshold value in operation 301. For converting the analyzed frame into a binary version, thresholding may be used. For this, some sub-optimal threshold value th of inverted binarization can be specified. For example, a pixel value that is greater than or equal to th is set to value 1 and a pixel value less than th is set to 0. The resulting image is called a binary image because each one of its pixels can only be in one of two states. Other techniques for binarization also may be used. Examples of frames after threshold inverted binarization are illustrated in
It is assumed that the scene in the viewfinder and shooting conditions from frame-to-frame changes slightly. Accordingly, in some embodiments, the selection of a sub-optimal binarization threshold value for the current series of frames can be based on previously analyzed frames.
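A minimal sketch of the binarization rule together with one hypothetical way of carrying the threshold across frames: a per-frame estimate (here, a high brightness percentile) blended with the previous threshold by an exponential moving average. The percentile, the blending factor, and the function names are assumptions; the source only says the threshold can be based on previously analyzed frames.

```python
import numpy as np


def binarize(frame, th):
    """Set pixels >= th to 1 (bright, candidate glare) and all others to 0."""
    return (np.asarray(frame) >= th).astype(np.uint8)


def next_threshold(prev_th, frame, percentile=98.0, alpha=0.3):
    """Blend a per-frame threshold estimate with the previous threshold.
    prev_th=None means no earlier frame has been analyzed yet."""
    estimate = float(np.percentile(frame, percentile))
    if prev_th is None:
        return estimate
    return (1.0 - alpha) * prev_th + alpha * estimate


print(binarize([[10, 250], [240, 200]], 240).tolist())   # [[0, 1], [1, 0]]
```

Blending keeps the threshold stable when consecutive frames differ only slightly, matching the assumption stated above.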
When smoothing in an area is insufficient, the area can be either a “light spot over text” or glare. The term “light spot over text” refers to an overexposed area that contains an edge of brightness drop but in which the text is still readable and recognizable. Glare, in contrast, is an area that contains a distorted signal unsuitable for further processing. An example of glare is shown in
Heterogeneity may also appear as small light spots over text caused by the roughness of the paper. To avoid this drawback, the degree of smoothing can be increased. Alternatively, a dilatation (dilation) operation can be applied to the binarized image; other morphological filters may also be applied. The morphological “dilatation” filter expands the foreground regions of a binary frame by extending each foreground pixel into its neighborhood. As a result, regions that were separated only by noise remaining after smoothing are merged. The dilatation parameters (size and shape) may depend on the preceding steps. Dilatation may be performed, for example, in a 3×3 window.
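A minimal sketch of the 3×3 dilatation step in Python with NumPy: each output pixel is the logical OR of the binary pixels in its window, so foreground regions separated by a one-pixel gap merge. The function name `dilate` is illustrative; libraries such as SciPy (`scipy.ndimage.binary_dilation`) and OpenCV (`cv2.dilate`) provide the same operation directly.

```python
import numpy as np

def dilate(binary: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary dilation with a size x size square window (3x3 by default, size odd)."""
    mask = np.asarray(binary, dtype=bool)
    r = size // 2
    h, w = mask.shape
    p = np.pad(mask, r, constant_values=False)   # outside the frame counts as background
    out = np.zeros((h, w), dtype=bool)
    for dy in range(size):
        for dx in range(size):
            out |= p[dy:dy + h, dx:dx + w]       # OR of all shifted copies
    return out
```

For example, two foreground pixels at (2, 2) and (2, 4) become one horizontal run covering columns 1 through 5 of row 2.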
In operation 302, simply connected regions (components) are searched for within the binarized image. The identified connected components may be sorted by size. Sorting them may increase the chance of detecting a “light spot over text” without examining all of the found components. In addition, small connected components may be excluded to accelerate the glare detection procedure.
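The component search, sorting, and small-component exclusion can be sketched as a breadth-first labeling pass. This is an illustrative Python version, not the patented implementation; it assumes 8-connectivity and a hypothetical `min_size` cut-off. `scipy.ndimage.label` offers a ready-made labeling routine.

```python
from collections import deque
from typing import List, Tuple

import numpy as np

def connected_components(binary, min_size: int = 1) -> List[List[Tuple[int, int]]]:
    """Find 8-connected foreground regions; return their pixel lists, largest first."""
    mask = np.asarray(binary, dtype=bool)
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    components = []
    for r0, c0 in zip(*np.nonzero(mask)):
        r0, c0 = int(r0), int(c0)
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        pixels = []
        while queue:                                   # breadth-first flood fill
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
        if len(pixels) >= min_size:                    # drop small components early
            components.append(pixels)
    components.sort(key=len, reverse=True)             # examine the largest first
    return components
```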
In operation 303, each detected simply connected component is examined to determine whether the area it covers is a “light spot over text.” At this stage, a cascade that eliminates connected components may be applied to identify components that are definitely not glare. This step avoids resource-intensive phases of data analysis. For example, connected components that are not “sufficiently tight,” i.e., those that occupy a small part of the area in the image, may be skipped. In this way, white interline intervals and column spacing may be excluded from further analysis in advance.
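The text does not define “sufficiently tight.” One possible measure is the isoperimetric quotient 4πA/P², where A is the pixel count and P is the number of border pixels. This is an assumption, not the document's definition. The quotient is close to 1 for compact blobs and small for long thin regions such as interline gaps. The names `compactness` and `worth_analyzing` and the cut-off 0.3 are hypothetical.

```python
import math
from typing import Iterable, Tuple

def compactness(pixels: Iterable[Tuple[int, int]]) -> float:
    """Isoperimetric quotient 4*pi*A / P^2, with P counted as border pixels (assumed measure)."""
    pix = set(pixels)
    border = sum(
        1 for r, c in pix
        if any((r + dr, c + dc) not in pix for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    )
    return 4 * math.pi * len(pix) / border ** 2

def worth_analyzing(pixels, min_compactness: float = 0.3) -> bool:
    """Cheap first cascade stage: skip thin, sprawling components."""
    return compactness(pixels) >= min_compactness
```

A filled 10×10 square scores about 0.97 and passes. A 2×100 strip, shaped like a gap between text lines, scores about 0.06 and is skipped before any expensive analysis.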
In one embodiment of the disclosed invention, to determine whether the area covered by a connected component is a “light spot over text,” the computed statistics may be passed to a trained classifier. Based on these statistics, the trained classifier decides whether the detected connected component (area) is a “light spot over text.” For example, cascading classifiers may be used: several classifiers can be arranged in a cascade, e.g., a degenerate decision tree. At each stage of the cascade, a decision is made whether the image contains the object. A cascading classifier can apply simple features at the first stage, because such features are inexpensive to compute. Examples of possible features include the ratio of black pixels to the perimeter of a connected component, the average intensity, etc. At the next stage, if the analyzed area may still be glare, additional features (additional statistics) may be computed. For instance, the additional statistics can include the gradient of intensity along the perimeter. These additional statistics can be used by one or more stages of the cascading classifiers. One or more of the classifiers may also be trained based on the results of recognition of a frame (image). For example, a classifier may change the parameters of glare detection based on the number of uncertainly recognized characters.
Further, more resource-consuming operations may be performed and more complex statistics may be computed. For example, the following operations check whether the detected area is a “light spot over text.” Each detected connected area is approximated with an ellipse of inertia, and the quadratic form of the ellipse can then be applied.
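A sketch of the ellipse-of-inertia approximation, assuming the component is given as a list of (row, column) pixel coordinates. The covariance of the coordinates gives the second central moments. For a uniformly filled ellipse, the variance along a semi-axis of length a equals a²/4, so the semi-axes are 2√λ for the eigenvalues λ. A pixel p lies inside the ellipse when the quadratic form (p − c)ᵀΣ⁻¹(p − c) ≤ 4. The function names are hypothetical, and the sketch assumes a non-degenerate (not one-pixel-thin) component so that Σ is invertible.

```python
import numpy as np

def inertia_ellipse(pixels):
    """Fit the ellipse of inertia to a component's (row, col) pixel coordinates."""
    pts = np.asarray(pixels, dtype=float)            # (N, 2) array
    center = pts.mean(axis=0)
    cov = np.cov(pts.T, bias=True)                   # second central moments
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    # Uniform ellipse: variance along a semi-axis of length a is a^2 / 4.
    semi_minor, semi_major = 2.0 * np.sqrt(np.maximum(eigvals, 0.0))
    return center, cov, semi_major, semi_minor, eigvecs

def ellipse_fit(pixels):
    """Return (fraction of pixels inside the ellipse, component area / ellipse area)."""
    center, cov, a, b, _ = inertia_ellipse(pixels)
    d = np.asarray(pixels, dtype=float) - center
    q = np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d)   # (p-c)^T S^-1 (p-c)
    inside = float(np.mean(q <= 4.0))                # ellipse boundary is q = 4
    area_ratio = len(pixels) / (np.pi * a * b)
    return inside, area_ratio
```

A filled pixel disk of radius 10 yields semi-axes close to 10 and an area ratio close to 1, so both values can serve as features indicating how spot-like a component is.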
Along the edge of the binarized simply connected area (or of another approximating figure), statistics of the directions and values of the intensity gradients, computed on the initial frame, are gathered. If the direction of the brightness gradient is directed almost everywhere to the center of a spot (as shown in
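The gradient-direction statistic can be sketched as follows. For every border pixel of the binarized spot, the intensity gradient of the original frame is compared with the vector pointing to the spot's centroid. The sketch reports the fraction of border pixels whose gradient lies within a tolerance angle of that vector, together with the mean gradient magnitude. The 45° tolerance, the minimum magnitude, and the function names are assumptions.

```python
import numpy as np

def boundary(mask: np.ndarray) -> np.ndarray:
    """Mask pixels that have at least one 4-neighbor outside the mask."""
    p = np.pad(mask, 1, constant_values=False)
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return mask & ~interior

def gradient_toward_center(gray, spot, min_magnitude=1.0, max_angle_deg=45.0):
    """Fraction of border pixels whose gradient points at the spot centroid,
    plus the mean gradient magnitude on those pixels."""
    spot = np.asarray(spot, dtype=bool)
    gy, gx = np.gradient(np.asarray(gray, dtype=float))   # d/drow, d/dcol of the frame
    rows, cols = np.nonzero(spot)
    cy, cx = rows.mean(), cols.mean()
    br, bc = np.nonzero(boundary(spot))
    vy, vx = cy - br, cx - bc                              # border pixel -> centroid
    g_y, g_x = gy[br, bc], gx[br, bc]
    gmag, vmag = np.hypot(g_y, g_x), np.hypot(vy, vx)
    valid = (gmag >= min_magnitude) & (vmag > 0)
    if not valid.any():
        return 0.0, 0.0
    cos = (g_y * vy + g_x * vx)[valid] / (gmag[valid] * vmag[valid])
    frac = float(np.mean(cos >= np.cos(np.radians(max_angle_deg))))
    return frac, float(gmag[valid].mean())
```

On a bright Gaussian spot, nearly all border gradients point inward (fraction near 1). On an inverted (dark) spot with the same mask, they point outward (fraction near 0).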
If, at operation 303, the area is not considered to be a light spot, the next sub-optimal binarization threshold value is selected, and the image is analyzed again for “light spot over text” areas. Selecting the binarization threshold and detecting “light spot over text” areas at that threshold is an iterative process. It is repeated until either “light spot over text” areas are found within the frame or a decision is made that none are present. The threshold values may be calculated with the following formula:
$$th_i = th_{i-1} \cdot k$$
where $th_i$ is the threshold value at binarization step $i$, $th_{i-1}$ is the threshold value at step $i-1$, and $k$ is a coefficient.
It is assumed that the scene in the viewfinder and the shooting conditions change only slightly from frame to frame. Accordingly, the optimal binarization parameter for the current series of frames can be selected based on previously analyzed frames. Thus, the binarization threshold value at step i at which a “light spot over text” was found in an image can be stored and applied to the next frame in the series. This noticeably shortens the time needed to analyze frames in a video sequence. Results of different binarization threshold values for detecting a “light spot over text” in the original image are illustrated in
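The iterative threshold schedule, with the threshold carried over to the next frame, can be sketched in Python. The starting value 250, the coefficient k = 0.9 (a value below 1, so the threshold decreases), the stopping bound `th_min`, and the detector callback are all hypothetical.

```python
from typing import Callable, Optional

import numpy as np

# Hypothetical detector: True if the binary image contains a "light spot over text".
SpotDetector = Callable[[np.ndarray], bool]

def search_threshold(gray: np.ndarray, has_light_spot: SpotDetector,
                     start: float, k: float = 0.9, th_min: float = 64.0) -> Optional[float]:
    """Lower the threshold geometrically (th_i = th_{i-1} * k) until a spot is found."""
    th = start
    while th >= th_min:
        binary = np.asarray(gray) >= th    # pixels >= th become 1 (inverted binarization)
        if has_light_spot(binary):
            return th
        th *= k
    return None                            # no light spot at any tried threshold

class ThresholdTracker:
    """Reuses the threshold that worked on the previous frame as the next starting point."""

    def __init__(self, initial: float = 250.0, k: float = 0.9, th_min: float = 64.0):
        self.initial, self.k, self.th_min = initial, k, th_min
        self.last: Optional[float] = None

    def process(self, gray: np.ndarray, has_light_spot: SpotDetector) -> Optional[float]:
        start = self.last if self.last is not None else self.initial
        th = search_threshold(gray, has_light_spot, start, self.k, self.th_min)
        self.last = th                     # None means a cold start on the next frame
        return th
```

On the first frame the search may need several binarizations. On the next, similar frame it typically succeeds at the stored threshold with a single detector call, which is the time saving described above.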
In operation 304, statistics about the signal in the overlighted area can be collected separately. For example, a signal-to-noise ratio can be calculated for the regions in the overlighted area. The signal-to-noise ratio can also be calculated in related regions, e.g., the regions surrounding the overlighted area. In operation 305, the signal-to-noise ratio of the regions in the overlighted area can then be compared with that of one or more of the related regions. If the “signal/noise ratio” is significantly lower in the regions covered by the simply connected component, the area is considered glare rather than just an overlighted area. This indicates that the distorted area contains a signal that is unsuitable for further recognition. The calculated “signal/noise ratio” can also be used for blur and defocusing detection.
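A sketch of operations 304 and 305 in Python. The signal-to-noise ratio inside the spot is compared with the ratio in a band of surrounding pixels. As an assumption (the text does not define the measure), the SNR of a region is the spread of a 3×3-smoothed image divided by the spread of the smoothing residual. Only interior spot pixels and band pixels at least two pixels away are used, so that the smoothing window never straddles the spot border. The band width and the ratio 0.5 that counts as “significantly lower” are hypothetical.

```python
import numpy as np

def _dilate1(mask: np.ndarray) -> np.ndarray:
    """One step of binary dilation with a 3x3 square (outside the image counts as off)."""
    p = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def _box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def region_snr(gray: np.ndarray, region: np.ndarray) -> float:
    """Hypothetical SNR of a region: spread of the smoothed signal / spread of the residual."""
    img = np.asarray(gray, dtype=float)
    smooth = _box_blur3(img)
    signal = smooth[region].std()
    noise = (img - smooth)[region].std()
    return float("inf") if noise == 0 else float(signal / noise)

def is_glare(gray, spot, ring_width=5, ratio=0.5):
    """Flag glare when the SNR inside the spot is much lower than in a band around it."""
    spot = np.asarray(spot, dtype=bool)
    inner = ~_dilate1(~spot)          # spot pixels whose whole 3x3 window lies in the spot
    grown = _dilate1(spot)
    outer = grown
    for _ in range(ring_width):
        outer = _dilate1(outer)
    ring = outer & ~grown             # band 2..ring_width+1 pixels away from the spot
    snr_in = region_snr(gray, inner)
    snr_out = region_snr(gray, ring)
    return snr_in < ratio * snr_out, snr_in, snr_out
```

A washed-out spot in which the text contrast is lost but sensor noise remains has a much lower SNR than the surrounding text. An overlighted spot in which the text survives keeps an SNR comparable to its surroundings.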
In addition, at this step the obtained features may be passed to a trained classifier, which uses them to identify the presence of glare and its parameters. The classifier may be trained based on user interaction with the electronic device. For example, training-related GUI functions may be enabled, giving the user an opportunity to train the classifier while the detectors are running. The user may then be able to manually mark, correct, or change the type or the areas of defects within the frame.
In
In addition, a classifier may be trained based on the results of subsequent recognition (OCR/ICR) of an analyzed frame. For example, a classifier may change some parameters of glare detection if OCR produces a large number of falsely recognized characters.
Specifically, statistics about incorrectly or uncertainly recognized characters can be computed for each area within the frame (image). For example, the image data can be sent to a remote computing device that uses OCR to determine the text contained within the image data. In an alternative embodiment, OCR can be performed locally on the electronic device. Based on the OCR process, data indicating the recognized characters and the number of incorrectly recognized or unrecognized characters can be received. If the level of incorrectly or uncertainly recognized characters for an area within the frame exceeds a predetermined threshold value, the area can be determined to have a defect. This information can also be used to train the classifier used to detect defects. For example, a hypothesis H0 about the presence of a defect can be formulated when the level of incorrectly or uncertainly recognized characters exceeds a predetermined threshold value h. The threshold value h may be set by the user or can be a default value associated with the electronic device. If the hypothesis H0 is confirmed by the statistics computed in the area, the classifier can be trained. For example, if the detector has not found a defect in an area but the recognition results contain a large number of false or uncertain characters, this indicates that the classifier needs further training. Any known training algorithm may be used for this purpose.
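A sketch of the per-area bookkeeping described above. For each area, the share of incorrectly or uncertainly recognized characters is compared with the threshold h (hypothesis H0). Areas where H0 holds but the detector reported nothing are collected as candidate training examples. The `AreaOcrStats` structure, the function name, and h = 0.2 are hypothetical; the actual training algorithm is left open, as in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AreaOcrStats:
    """Per-area OCR statistics (hypothetical structure)."""
    area_id: str
    n_chars: int              # characters recognized in the area
    n_bad: int                # incorrectly or uncertainly recognized characters
    detector_flagged: bool    # did the defect detector report a defect here?

def review_areas(areas: List[AreaOcrStats], h: float = 0.2) -> Tuple[List[str], List[str]]:
    """Return (areas judged defective from OCR, areas where the detector missed the defect)."""
    defective, retrain = [], []
    for a in areas:
        if a.n_chars == 0:
            continue                          # no text, so no OCR evidence either way
        if a.n_bad / a.n_chars > h:           # hypothesis H0: the area contains a defect
            defective.append(a.area_id)
            if not a.detector_flagged:
                retrain.append(a.area_id)     # candidate training example for the classifier
    return defective, retrain
```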
If the degree of distortion (306) of a frame does not exceed a predetermined threshold value (a default value, one specified in the device, or one set by the user), the image may be captured. The capture may be performed automatically by the electronic device or, alternatively, manually by the user.
If one of the areas contains a defect, a set of possible actions may be suggested to the user. These suggestions can include advice delivered via the GUI of the electronic device in the form of written or audio messages, such as instructions for changing one or more shooting conditions or settings to obtain an image of better quality.
If the detected defect is noise, possible actions include:
If the detected defect is blur or defocusing, possible actions include:
If the detected defect is glare, possible actions include:
The possible actions are not limited to the advice mentioned above; other messages or actions may be suggested to the user. Instructions regarding a suggested action or advice may be presented as messages on the screen of the electronic device. When an analyzed frame contains several defects, multiple instructions or pieces of advice may be combined according to the defects detected within the frame. In addition, image-processing algorithms can be run to alter the image data that contains the defect, and the altered image data can be displayed to the user. For example, the user can be prompted to accept the changes made by the selected algorithm.
In some embodiments, overshadowed areas within the image can be detected, and detecting overshadowed areas can be combined with detecting overlighted areas. The sequence of operations for detecting overshadowed areas is the same as for detecting overlighted areas. The main difference is that the algorithm is applied to an inverted frame. This inversion is included in the binarization stage, so the binarization parameters should be set accordingly.
Possible defects within the frame (current view) may be detected by applying all of the described detectors simultaneously or in sequence, which allows various defects within the frame to be detected. A frame may contain several types of defects; for example, it may be blurred and also contain a high level of noise and glare. For this reason, the results of multiple detectors can be integrated. For example, a weighted defect level can be determined from the levels of the individual defects. In this way, the analyzed frame is checked for the presence of two or more defects. Various visual representations of the defects may be displayed to the user on the screen, as described above. In another embodiment, the results of individual detectors can be shown. In yet another embodiment, each detector is applied separately. This may be useful when the user is interested in analyzing the frame for only one kind of defect, for example glare, noise, or blur.
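One way to integrate the detectors' outputs is a weighted average of per-detector defect levels, plus a list of the individual defects that exceed a threshold. This is a minimal sketch; the normalization of each level to [0, 1], the weights, and the 0.5 threshold are assumptions.

```python
from typing import Dict, List

def combined_defect_level(levels: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-detector defect levels (each assumed to lie in [0, 1])."""
    total = sum(weights[name] for name in levels)
    return sum(weights[name] * level for name, level in levels.items()) / total

def detected_defects(levels: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Names of detectors whose level reaches the (hypothetical) threshold."""
    return [name for name, level in levels.items() if level >= threshold]
```

For example, with levels noise 0.2, blur 0.6, and glare 1.0 and weights 1, 1, and 2, the combined level is 0.7, and blur and glare are reported individually.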
As described above, an image can be captured when the degree of one or more distortions in a frame is below a predetermined threshold. In another embodiment, the frame with the minimum level of distortion, or at least with a level of distortion that does not exceed a certain threshold value, may be selected from the current video sequence. For example, each frame of a video sequence can be analyzed for one or more distortions as described above. The frame with the smallest distortion, or at least with distortions below a threshold, can be selected and stored in a memory of the electronic device. Subsequent frames in the video sequence can then be analyzed for defects using the methods described above and compared with the stored frame. Thus, if the frame analyzed at a later time (t+1) has fewer defects than the frame from time (t) already stored in memory, frame (t+1) can replace frame (t) in memory.
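The best-frame selection can be sketched as a small keeper object that stores a frame only if its distortion does not exceed the threshold and is lower than that of the frame already stored. The class name and the scalar distortion score are illustrative assumptions.

```python
from typing import Any, Optional

class BestFrameKeeper:
    """Keeps the least-distorted acceptable frame seen so far in a video sequence."""

    def __init__(self, max_distortion: float):
        self.max_distortion = max_distortion
        self.frame: Optional[Any] = None
        self.distortion = float("inf")

    def offer(self, frame: Any, distortion: float) -> bool:
        """Store `frame` if it is acceptable and better than the stored one."""
        if distortion <= self.max_distortion and distortion < self.distortion:
            self.frame, self.distortion = frame, distortion
            return True
        return False
```

Feeding frames with distortions 0.9, 0.4, 0.6, and 0.2 against a threshold of 0.5 stores the second frame, keeps it when the third arrives, and replaces it with the fourth.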
In an alternative embodiment, selection and analysis of a frame may be performed in non-real time. For example, video of documents may be recorded and analyzed later: one or more frames that include documents may be extracted from the video and analyzed. The video can be stored in memory storage (604) of the electronic device or obtained from external sources, for instance from the Internet or memory cards, or downloaded from other electronic devices via Bluetooth, etc.
The hardware 1300 also typically receives a number of inputs and outputs for communicating information externally. For interface with a user or operator, the hardware 1300 usually includes one or more user input devices 1306 (e.g., a keyboard, a mouse, an imaging device, a scanner, etc.) and one or more output devices 1308 (e.g., a Liquid Crystal Display (LCD) panel or a sound playback device (speaker)). To embody various embodiments, the hardware 1300 must include at least one touch screen device (for example, a touch screen), an interactive whiteboard, or any other device that allows the user to interact with a computer by touching areas on the screen. A keyboard is not required in various disclosed embodiments.
For additional storage, the hardware 1300 may also include one or more mass storage devices 1310, e.g., floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive, etc.) and/or a tape drive, among others. Furthermore, the hardware 1300 may include an interface with one or more networks 1312 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware 1300 typically includes suitable analog and/or digital interfaces between the processor 1302 and each of the components 1304, 1306, 1308, and 1312 as is well known in the art.
The hardware 1300 operates under the control of an operating system 1314 and executes various computer software applications 1316, components, programs, objects, modules, etc. to implement the techniques described above. In particular, in the case of the user device 102, the computer software applications will include the client dictionary application and other installed applications for displaying text and/or text image content, such as a word processor, a dedicated e-book reader, etc. Moreover, various applications, components, programs, objects, etc., collectively indicated by reference 1316 in
In general, the routines executed to implement the embodiments may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements of disclosed embodiments. Moreover, while various embodiments have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that this applies equally regardless of the particular type of computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), flash memory, etc.), among others. Another type of distribution may be implemented as Internet downloads.
In the above description numerous specific details are set forth for purposes of explanation. It will be apparent, however, to one skilled in the art that these specific details are merely examples. In other instances, structures and devices are shown only in block diagram form in order to avoid obscuring the teachings.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the disclosed embodiments and that these embodiments are not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modifiable in arrangement and detail as facilitated by enabling technological advancements without departing from the principles of the present disclosure.
Bocharov, Konstantin, Kostyukov, Mikhail
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 15 2015 | BOCHAROV, KONSTANTIN | ABBYY DEVELOPMENT LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 043583 | /0072 | |
Jan 15 2015 | KOSTYUKOV, MIKHAIL | ABBYY DEVELOPMENT LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 043583 | /0072 | |
Aug 04 2016 | ABBYY DEVELOPMENT LLC | (assignment on the face of the patent) | / | |||
Dec 08 2017 | ABBYY DEVELOPMENT LLC | ABBYY PRODUCTION LLC | MERGER SEE DOCUMENT FOR DETAILS | 048129 | /0558 |
Date | Maintenance Fee Events |
Jun 28 2021 | REM: Maintenance Fee Reminder Mailed. |
Dec 13 2021 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Nov 07 2020 | 4 years fee payment window open |
May 07 2021 | 6 months grace period start (w surcharge) |
Nov 07 2021 | patent expiry (for year 4) |
Nov 07 2023 | 2 years to revive unintentionally abandoned end. (for year 4) |
Nov 07 2024 | 8 years fee payment window open |
May 07 2025 | 6 months grace period start (w surcharge) |
Nov 07 2025 | patent expiry (for year 8) |
Nov 07 2027 | 2 years to revive unintentionally abandoned end. (for year 8) |
Nov 07 2028 | 12 years fee payment window open |
May 07 2029 | 6 months grace period start (w surcharge) |
Nov 07 2029 | patent expiry (for year 12) |
Nov 07 2031 | 2 years to revive unintentionally abandoned end. (for year 12) |