Descriptions are provided of various implementations of an automated tuning process configured to optimize a procedure for post-processing images captured by a camera sensor.
|
1. A method of processing a representation of a reference pattern, said method comprising:
capturing a representation of the reference pattern, which includes a plurality of features, each of the plurality of features having a different location within the reference pattern;
for each of said plurality of features of the reference pattern, determining a location of the feature within the captured representation;
based on the determined locations of the plurality of features in the captured representation, calculating a characterization of location transfer error; and
applying the characterization of location transfer error to the reference pattern to obtain a spatially modified reference pattern.
10. A computer-readable medium having tangible features that store machine-readable instructions which when executed by at least one processor cause the at least one processor to:
obtain a captured representation of the reference pattern, which includes a plurality of features, each of the plurality of features having a different location within the reference pattern;
for each of said plurality of features of the reference pattern, determine a location of the feature within the captured representation;
based on the determined locations of the plurality of features in the captured representation, calculate a characterization of location transfer error; and
apply the characterization of location transfer error to the reference pattern to obtain a spatially modified reference pattern.
11. An apparatus for processing a representation of a reference pattern, said apparatus comprising:
means for capturing a representation of the reference pattern, which includes a plurality of features, each of the plurality of features having a different location within the reference pattern;
means for determining, for each of said plurality of features of the reference pattern, a location of the feature within the captured representation;
means for calculating a characterization of location transfer error based on the determined locations of the plurality of features in the captured representation; and
means for applying the characterization of location transfer error to the reference pattern to obtain a spatially modified reference pattern.
20. An apparatus for processing a representation of a reference pattern, said apparatus comprising:
a sensor arranged to capture a representation of the reference pattern, which includes a plurality of features, each of the plurality of features having a different location within the reference pattern;
a feature detector configured to determine, for each of said plurality of features of the reference pattern, a location of the feature within the captured representation;
a transform calculator configured to calculate a characterization of location transfer error based on the determined locations of the plurality of features in the captured representation; and
a remapper configured to apply the characterization of location transfer error to the reference pattern to obtain a spatially modified reference pattern.
2. The method according to
3. The method according to
wherein the value of each of a plurality of pixels within the pattern for matching is based on (A) values of a plurality of pixels in a neighborhood of a corresponding pixel of the captured representation and (B) a value of a corresponding pixel of the spatially modified reference pattern.
4. The method according to
obtaining intensity transfer error information from the captured representation;
from the spatially modified reference pattern, obtaining information that indicates the locations of edges between regions of different intensities; and
producing a pattern for matching that is based on the intensity transfer error information and on the edge location information.
5. The method according to
performing a filtering operation on the captured representation to obtain a filtered representation; and
evaluating at least one difference between the filtered representation and the pattern for matching.
6. The method according to
based on a result of said evaluating, modifying a value of at least one parameter of the filtering operation; and
subsequent to said modifying, performing the filtering operation on the captured representation to obtain a second filtered representation.
7. The method according to
wherein each among a plurality of pixels of the pattern for matching corresponds to one among (A) a region of the first intensity and (B) a region of the second intensity, and
wherein, for each of said plurality of pixels of the pattern for matching, said edge location information indicates whether the pixel corresponds to a region of the first intensity or to a region of the second intensity.
8. The method according to
9. The method according to
12. The apparatus according to
13. The apparatus according to
wherein the value of each of a plurality of pixels within the pattern for matching is based on (A) values of a plurality of pixels in a neighborhood of a corresponding pixel of the captured representation and (B) a value of a corresponding pixel of the spatially modified reference pattern.
14. The apparatus according to
means for obtaining intensity transfer error information from the captured representation;
means for obtaining, from the spatially modified reference pattern, information that indicates the locations of edges between regions of different intensities; and
means for producing a pattern for matching that is based on the intensity transfer error information and on the edge location information.
15. The apparatus according to
means for performing a filtering operation on the captured representation to obtain a filtered representation; and
means for evaluating at least one difference between the filtered representation and the pattern for matching.
16. The apparatus according to
means for modifying a value of at least one parameter of the filtering operation, based on a result of said evaluating; and
means for performing the filtering operation on the captured representation, subsequent to said modifying, to obtain a second filtered representation.
17. The apparatus according to
wherein each among a plurality of pixels of the pattern for matching corresponds to one among (A) a region of the first intensity and (B) a region of the second intensity, and
wherein, for each of said plurality of pixels of the pattern for matching, said edge location information indicates whether the pixel corresponds to a region of the first intensity or to a region of the second intensity.
18. The apparatus according to
19. The apparatus according to
21. The apparatus according to
22. The apparatus according to
wherein the value of each of a plurality of pixels within the pattern for matching is based on (A) values of a plurality of pixels in a neighborhood of a corresponding pixel of the captured representation and (B) a value of a corresponding pixel of the spatially modified reference pattern.
23. The apparatus according to
a calculator configured to obtain intensity transfer error information from the captured representation;
a comparator configured to obtain, from the spatially modified reference pattern, information that indicates the locations of edges between regions of different intensities; and
a pattern generator configured to produce a pattern for matching that is based on the intensity transfer error information and on the edge location information.
24. The apparatus according to
a filter configured to perform a filtering operation on the captured representation to obtain a filtered representation; and
a second calculator configured to evaluate at least one difference between the filtered representation and the pattern for matching.
25. The apparatus according to
value of at least one parameter of the filtering operation, based on a result of said evaluating, and
wherein said filter is configured to perform the filtering operation on the captured representation, subsequent to said modifying, to obtain a second filtered representation.
26. The apparatus according to
wherein each among a plurality of pixels of the pattern for matching corresponds to one among (A) a region of the first intensity and (B) a region of the second intensity, and
wherein, for each of said plurality of pixels of the pattern for matching, said edge location information indicates whether the pixel corresponds to a region of the first intensity or to a region of the second intensity.
27. The apparatus according to
28. The apparatus according to
|
The present Application for Patent claims priority to U.S. Provisional Pat. Appl. No. 61/176,731, entitled “SYSTEMS, METHODS, AND APPARATUS FOR GENERATION OF REINFORCEMENT SIGNAL, ARTIFACT EVALUATION, CAMERA TUNING, ECHO CANCELLATION, AND/OR REFERENCE IMAGE GENERATION,” filed May 8, 2009 and assigned to the assignee hereof.
1. Field
This disclosure relates to signal processing.
2. Background
Image information captured by image capture devices, such as digital still photo cameras, may be susceptible to noise as a result of physical limitations of the image sensors, interference from illumination sources, and the like. With the increased demand for smaller image capture devices, e.g., in multi-purpose mobile devices such as mobile wireless communication devices, comes the need for more compact image sensor modules. The decrease in the size of image sensor modules typically results in a significant increase in the amount of noise captured within the image information.
Image quality may vary greatly among different model lines of cellular telephone (“cellphone”) cameras and other inexpensive digital cameras. Unlike the models of a line of conventional digital cameras, different models of cellular telephones may have different optics and different sensors. Variations between high-quality optical systems of conventional cameras are typically much less than variations between the optical systems of cellphone cameras, which may be made of glass or plastic, may or may not include one or more optical coatings, and may have very different resolving powers. Similarly, the various models of a line of conventional digital cameras typically have the same sensor, possibly in different sizes, while cellphone cameras of the same line or even the same model may have very different sensors. As a result, images from different models of cellphone cameras will typically be much more different from one another than images from different models of conventional cameras.
In order to produce commercially acceptable image quality from a cellphone camera or other inexpensive digital camera, it may be desirable to perform a filtering operation (also called a “post-processing procedure”) on the captured image. Such an operation may be designed to enhance image quality by, for example, reducing artifacts and/or increasing edge sharpness. The process of determining an appropriate post-processing procedure for a particular camera is also called “camera tuning.” Typically, camera tuning is repeated whenever a new lens and/or sensor is used.
A method according to a general configuration includes capturing a representation of the reference pattern, which includes a plurality of features, each of the plurality of features having a different location within the reference pattern. This method also includes, for each of said plurality of features of the reference pattern, determining a location of the feature within the captured representation, and based on the determined locations of the plurality of features in the captured representation, calculating a characterization of location transfer error. This method also includes applying the characterization of location transfer error to the reference pattern to obtain a spatially modified reference pattern. Apparatus and other means for performing such a method, and computer-readable media having executable instructions for such a method, are also disclosed herein.
A method according to another general configuration includes capturing an image of the reference pattern, which includes regions of different intensities that meet to form edges, and performing a filtering operation on the captured image to obtain a sharpened image. This method also includes determining that a plurality of pixel values of the sharpened image are overshoot pixel values, and distinguishing at least two categories among the plurality of overshoot pixel values. Apparatus and other means for performing such a method, and computer-readable media having executable instructions for such a method, are also disclosed herein.
Camera tuning is typically time-consuming, expensive, and subjective. Traditionally, camera tuning is performed by engineers having advanced training in color science. Because a large number of parameters may be involved in the camera tuning process, the dimensionality of the parametric space of the problem may be extremely high. Finding a near-optimal solution (e.g., a set of filter parameters that increases a desired quality metric without also creating artifacts) is a nontrivial problem and may take days or weeks. It may also be necessary to repeat the camera tuning process if the result produced by the tuning engineer does not match the subjective preference of her customer (e.g., the camera manufacturer). It may be desirable to automate a camera tuning process, to reduce the learning curve of such a process, and/or to render the result of such a process more quantifiable or less subjective.
Unless expressly limited by its context, the term “signal” is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term “generating” is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term “calculating” is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values. Unless expressly limited by its context, the term “obtaining” is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from an external device), and/or retrieving (e.g., from an array of storage elements). Where the term “comprising” is used in the present description and claims, it does not exclude other elements or operations. The term “based on” (as in “A is based on B”) is used to indicate any of its ordinary meanings, including the cases (i) “derived from” (e.g., “B is a precursor of A”), (ii) “based on at least” (e.g., “A is based on at least B”) and, if appropriate in the particular context, (iii) “equal to” (e.g., “A is equal to B” or “A is the same as B”). Similarly, the term “in response to” is used to indicate any of its ordinary meanings, including “in response to at least.”
Unless indicated otherwise, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). The term “configuration” may be used in reference to a method, apparatus, and/or system as indicated by its particular context. The terms “method,” “process,” “procedure,” and “technique” are used generically and interchangeably unless otherwise indicated by the particular context. The terms “apparatus” and “device” are also used generically and interchangeably unless otherwise indicated by the particular context. The terms “element” and “module” are typically used to indicate a portion of a greater configuration. Unless expressly limited by its context, the term “system” is used herein to indicate any of its ordinary meanings, including “a group of elements that interact to serve a common purpose.” Any incorporation by reference of a portion of a document shall also be understood to incorporate definitions of terms or variables that are referenced within the portion, where such definitions appear elsewhere in the document, as well as any figures referenced in the incorporated portion.
In one example, method M100 is configured to optimize a filtering operation (or “post-processing procedure”) to be applied to a luminance channel of images captured by a camera sensor. Such a tuning process may be based on criteria such as noise, edge sharpness, and oversharpening artifacts. The tuning process may include selecting values of parameters that may oppose each other and whose combined effects on image quality are not known a priori. In other examples of imaging applications, method M100 is configured to optimize and/or automate tuning of one or more color channels. More generally, method M100 may be applied to optimize and/or automate processing of non-imaging signals, such as echo cancellation processing of an audio signal, or to optimize and/or automate other high-parametric-space problems.
Method M100 may be implemented as a reinforcement learning process. In an imaging application of such a process, the camera to be tuned is used to capture an image of a reference pattern, the post-processing procedure is used to process the captured image to obtain a processed (or “filtered”) image, and the values of one or more parameters of the post-processing procedure are adjusted until the processed image is sufficiently close to a reinforcement pattern (or until a maximum number of iterations is reached). The nature and quality of the reinforcement pattern may have a significant effect on the performance of the automated camera tuning process. If a flawed reinforcement pattern is used, the automated camera tuning process may produce a suboptimal result.
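The iterative structure described above (filter, compare to the reinforcement pattern, adjust parameters, repeat until close enough or until an iteration limit is reached) may be illustrated by the following minimal sketch. It is not the disclosed reinforcement learning process: a simple coordinate-wise search is substituted for illustration, images are flattened to lists of values, and the names `tune`, `apply_filter`, and `mean_squared_error`, as well as the step and tolerance values, are assumptions introduced here.

```python
def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tune(captured, target, apply_filter, params, step=0.1, tol=1e-4, max_iters=100):
    """Adjust filter parameters until the filtered image is close to the target.

    Assumption: a coordinate-wise search stands in for the learning process.
    `apply_filter(captured, params)` is a hypothetical post-processing procedure.
    """
    params = list(params)
    best = mean_squared_error(apply_filter(captured, params), target)
    for _ in range(max_iters):
        if best <= tol:
            break  # filtered image is sufficiently close to the target
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                err = mean_squared_error(apply_filter(captured, trial), target)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            step /= 2  # refine the search when no neighbor is better
    return params, best


def gain_filter(image, params):
    """Hypothetical one-parameter filter: a plain gain applied to every pixel."""
    return [params[0] * value for value in image]
```

In this sketch the comparison is always made between the filtered output and the target, so the captured data itself is never altered by the search.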
One example of such a tuning process uses a set of reference patterns that are designed to quantify different aspects of the post-processing procedure that is to be tuned. In an imaging application, for example, it may be desirable to use a set of reference patterns that abstractly represents a range of image features that are common, ubiquitous, and/or perceptually important in real-life images. It may be desirable to use a sufficient number of different reference patterns to cover the particular parametric space. It may also be desirable to have complete a priori knowledge of the contents of each reference pattern. For example, complete a priori knowledge of the reference pattern may be used to construct a pattern for matching as described herein.
It may be desirable for a set of reference patterns to include at least one pixel noise quantification pattern, at least one edge sharpness quantification pattern, and at least one oversharpening artifact quantification pattern.
A pixel noise quantification pattern provides information that may be used to quantify noise of the camera sensor (e.g., pattern noise). Such a pattern may be configured to include one or more regions that have a low pixel-to-pixel variance (e.g., to reduce sources of pixel-to-pixel response differences other than sensor noise). For example, such a pattern may include one or more flat-field regions. One such example is a pattern as shown in
It may be desirable for an edge sharpness quantification pattern, or a set of such patterns, to include a range of edges of different orientations and/or shapes. For example, a set of edge sharpness quantification patterns may be selected to provide a desired sampling of the range of spatial frequency features expected to be encountered during use of the camera. One example of such a set includes a pattern that contains horizontal and vertical edges (e.g., as shown in
It may be desirable for the edges of an edge sharpness quantification pattern to form concentric figures (e.g., as shown in
It may be desirable for each of one or more (possibly all) of the edge sharpness quantification patterns to be bimodal. For example, such a feature may simplify a global process of distinguishing the sides of each edge that is to be analyzed. Such a pattern may include regions of substantially uniform low intensity that meet regions of substantially uniform high intensity to form edges. The term “substantially uniform” indicates uniformity within five, two or one percent.
Alternatively or additionally, it may be desirable for such a set to include high-contrast and low-contrast versions of one or more patterns (e.g., as shown in
It may be desirable for one or more of the edge sharpness quantification patterns to include edges that have the same fundamental spatial frequency (e.g., such that the minimum number of bright pixels between each respective pair of edges is the same as the minimum number of dark pixels between each respective pair of edges). Alternatively or additionally, it may be desirable for one or more of the edge sharpness quantification patterns to include two or more different fundamental spatial frequencies (e.g., such that a spatial distance between one pair of adjacent edge transitions is different than a spatial distance between another pair of adjacent edge transitions). It may be desirable to design the patterns in consideration of the particular capturing arrangement to be used, such that the distance between edges in the captured representations will be at least a minimum value (e.g., eight, ten, sixteen, or twenty pixels).
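As an illustration of the minimum edge spacing criterion, the following sketch measures the run lengths of bright and dark pixels along one row of a captured bimodal pattern and reports the shortest run that is bounded by edges on both sides. The function names and the mid-scale threshold of 0.5 are assumptions made for this example only.

```python
from itertools import groupby


def run_lengths(row, threshold=0.5):
    """Return (is_bright, length) for each run of consecutive bright or dark pixels."""
    return [
        (is_bright, sum(1 for _ in group))
        for is_bright, group in groupby(value >= threshold for value in row)
    ]


def min_edge_spacing(row, threshold=0.5):
    """Shortest distance in pixels between adjacent edge transitions in the row.

    The first and last runs are skipped because they touch the row boundary
    rather than an edge on each side. Returns None if fewer than two edges exist.
    """
    interior = run_lengths(row, threshold)[1:-1]
    return min((length for _, length in interior), default=None)
```

A result below the chosen minimum (e.g., eight, ten, sixteen, or twenty pixels) would suggest redesigning the pattern or changing the capturing arrangement.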
An oversharpening artifact quantification pattern (e.g., as shown in
One example of an oversharpening artifact quantification pattern (e.g., as shown in
Oversharpening artifacts may occur on curved edges as well as straight edges, and the set of reference patterns may include an oversharpening artifact quantification pattern with curved edges (e.g., as shown in
It may be desirable for the edges of an oversharpening artifact quantification pattern to have sufficient contrast to support the generation of oversharpening artifacts. If the edges in the pattern do not have sufficient contrast, they may fail to adequately represent edges in an image captured during a typical use of the camera, and the sharpening operation may fail to generate artifacts. On a normalized intensity scale of zero to one, for example, where zero indicates minimum intensity (e.g., zero luminance or reflectivity) and one indicates maximum intensity (e.g., full saturation or reflectivity), it may be desirable for the difference between the bright and dark sides of an edge to have a value in the range of from 0.2 to 0.6 (e.g., 0.4).
However, it may also be desirable for the dynamic range (e.g., the contrast) of the pattern to be sufficiently low that the generated artifacts may be characterized. Some degree of oversharpening is typically desirable (e.g., to enhance the perception of edge sharpness), and it may be desirable to allow for sufficient headroom to distinguish acceptable artifacts from unacceptable ones. If the portions of the captured image that correspond to bright sides of edges are too bright, for example, even acceptable overshoots caused by oversharpening may reach or exceed the saturation level, making it impossible to distinguish acceptable overshoots from unacceptable ones. Likewise, if the portions of the captured image that correspond to dark sides of edges are too dark, even acceptable undershoots caused by oversharpening may reach or fall below the zero-intensity level, making it impossible to distinguish acceptable undershoots from unacceptable ones. On a normalized intensity scale of zero to one, for example, it may be desirable for the dark side of an edge to have a value in the range of from 0.2 to 0.4 (e.g., 0.3) and for the bright side of an edge to have a value in the range of from 0.6 to 0.8 (e.g., 0.7).
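The contrast and headroom considerations above reduce to simple arithmetic on the normalized intensity scale. The sketch below checks a candidate pair of dark and bright levels against the example ranges given in the text; the function name `edge_level_report` and the small comparison tolerance are assumptions of this example.

```python
EPS = 1e-9  # tolerance for floating-point comparisons at range boundaries


def in_range(value, low, high):
    """True if value lies within [low, high], allowing for rounding error."""
    return low - EPS <= value <= high + EPS


def edge_level_report(dark, bright):
    """Summarize contrast and headroom for one edge on a 0-to-1 intensity scale."""
    contrast = bright - dark
    return {
        "contrast": contrast,
        "overshoot_headroom": 1.0 - bright,  # room above the bright side before saturation
        "undershoot_headroom": dark,         # room below the dark side before zero intensity
        "contrast_ok": in_range(contrast, 0.2, 0.6),
        "dark_ok": in_range(dark, 0.2, 0.4),
        "bright_ok": in_range(bright, 0.6, 0.8),
    }
```

For the example values of 0.3 and 0.7, the contrast is 0.4 and each side retains a headroom of 0.3 for overshoot or undershoot.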
Task T100 uses a sensor to be tuned (e.g., an instance of a camera model to be tuned) to capture a representation of a reference pattern. For example, multiple instances of task T110 may be performed to use the camera to be tuned to capture an image of each of the reference patterns. Typical captured image sizes for current cellphone cameras and other inexpensive digital cameras include VGA or 0.3-megapixel resolution (640×480 pixels), 1.3 megapixels (e.g., 1280×960 pixels), 2.0 megapixels (e.g., 1600×1200 pixels or UXGA resolution), 3.2 megapixels (e.g., 2048×1536 pixels), 4 megapixels (e.g., 2240×1680 pixels), 5 megapixels (e.g., 2560×1920 pixels) and 6 megapixels (e.g., 3032×2008 pixels). Task T100 may be performed by a human operator or may be partially or fully automated. For example, task T110 may include appropriately placing (whether by hand or automatically) a corresponding one of a set of reference patterns in front of the camera, which may be mounted in a fixed position, and initiating the corresponding capture manually or automatically.
Typically each reference pattern is prepared for capture by printing the pattern as a poster (e.g., of size 8.5×11, 11×17, 16×20, 18×24, or 24×36 inches). It may be desirable to reduce illumination nonuniformity (e.g., specular reflection) by, for example, mounting the printed pattern to a flat, rigid surface (e.g., foamcore). Likewise, it may be desirable for the surface of the pattern to be matte or nonreflective. It may be desirable to perform the image capture under illumination that is uniform across the field of view (at least, over the pattern area that is to be analyzed), for example within a diffuse illumination field in a light tent or light booth. Uniform illumination during capture may help to reduce variance among the values of pixels of the same mode. It may be desirable to capture separate images of each of one or more (possibly all) of the reference patterns under at least two different levels of illumination (e.g., bright illumination and dim illumination; or bright, normal, and dim illumination). In another example, each reference pattern is presented (under manual or automatic control) for capture on a screen, such as a plasma display panel or a liquid crystal display panel for transmissive display, or an electronic paper display (e.g., a Gyricon, electrophoretic, or electrowetting display) for reflective display. Again, it may be desirable in such case to capture separate images of each of one or more (possibly all) of the reference patterns under at least two different levels of panel illumination.
During the capture operation, it may be desirable for the reference pattern to be positioned with its center near the camera's optical axis. For a case in which the camera has a fixed-focus lens, it may be desirable to position the reference pattern within the depth of field or within the hyperfocal distance. As the hyperfocal distance may be several feet, it may be desirable for the reference pattern to be large enough (e.g., 24×36 inches) to cover at least a majority of the field of view at that distance. For a case in which the camera has an autofocus lens, it may be desirable to ensure that the lens is properly focused on the reference pattern, and to use a macro mode if the distance between the pattern and the lens is short. It may also be desirable to orient the reference pattern relative to the camera's optical axis to minimize roll or tilt (i.e., rotation of the pattern about the camera optical axis), pitch or skew (i.e., rotation of the pattern about an axis horizontally orthogonal to the camera optical axis), and yaw (i.e., rotation of the pattern about an axis vertically orthogonal to the camera optical axis). Examples of tilt, skew, and yaw are illustrated in FIG. A5A. Such positional error minimization may help to produce a better camera tuning result (e.g., by limiting the maximum amount of blur at any point in the captured image), especially for a case in which the post-processing procedure being tuned operates globally on the image being processed (e.g., as in the adaptive spatial filter described below).
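The relation between hyperfocal distance and pattern size may be illustrated with the standard thin-lens approximations H = f²/(N·c) + f for hyperfocal distance and w = d·s/f for the width of the field of view at distance d. These formulas and all numeric values in the test below (focal length, f-number, circle of confusion, sensor width) are assumptions chosen to resemble a small fixed-focus camera module; they are not taken from this disclosure.

```python
MM_PER_INCH = 25.4


def hyperfocal_mm(focal_length_mm, f_number, circle_of_confusion_mm):
    """Hyperfocal distance H = f^2 / (N * c) + f, in millimeters."""
    return focal_length_mm ** 2 / (f_number * circle_of_confusion_mm) + focal_length_mm


def field_width_mm(distance_mm, sensor_width_mm, focal_length_mm):
    """Approximate width of the field of view at a distance much larger than f."""
    return distance_mm * sensor_width_mm / focal_length_mm


def coverage_fraction(pattern_width_in, distance_mm, sensor_width_mm, focal_length_mm):
    """Fraction of the horizontal field of view spanned by the pattern (capped at 1)."""
    width = field_width_mm(distance_mm, sensor_width_mm, focal_length_mm)
    return min(1.0, pattern_width_in * MM_PER_INCH / width)
```

With a hypothetical 4 mm lens at f/2.8 and a 0.005 mm circle of confusion, the hyperfocal distance is a little over one meter, and a 36-inch-wide pattern spans most of the horizontal field of view of a 3.6 mm-wide sensor at that distance.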
The captured images may be obtained in an uncompressed format (e.g., RGB or YCbCr) or in a compressed format (e.g., JPEG). The capturing operation (i.e., task T110) may also include one or more pre-processing operations, such as flare reduction, data linearization, color balancing, color filter array (e.g., Bayer filter) demosaicing, etc.
Ideally, each captured image would be identical to the corresponding reference pattern. In reality, the captured image will differ from the reference pattern because of nonidealities in the response of the camera, which may be compensated at least partially by the post-processing procedure. However, the captured image will also differ from the reference pattern because of other nonidealities that are introduced by the capturing operation, such as nonidealities in the positioning and/or illumination of the reference pattern. It may be desirable to perform a different operation to compensate for the effects of such operation-specific nonidealities, rather than allowing the camera tuning process to attempt to compensate for them in the post-processing procedure.
One general approach would be to compensate for the effects of operation-specific nonidealities in the captured image by altering the captured image to be more similar to the reference pattern. However, it may be desirable to avoid adding such a layer of processing on the captured image. Altering the captured image may cause a loss of information, which may cause the camera tuning process to produce a suboptimal post-processing procedure (e.g., by obscuring image characteristics that are to be quantified).
Another general approach as described below is to quantify the effects of nonidealities other than those which are to be compensated by the post-processing procedure (e.g., operation-specific nonidealities), and to apply this information to the reference pattern to create a pattern for matching (also called a target or reinforcement pattern).
It may be desirable for the capture operation to be robust to some degree of variation in the relative positions and/or orientations of the camera and the reference pattern being captured. For example, it may be desirable for the capture operation to allow some variation in the distance between the camera and the reference pattern, some translation of the reference pattern relative to the camera optical axis, and some variation in the orientation of the reference pattern relative to the camera's optical axis. Such variation may include roll, pitch, and/or yaw of the reference pattern (e.g., as shown in
At least because of such variation in distance, translation, and orientation, the captured image would differ from the reference pattern even if the camera's lens and sensor were perfect. It may be desirable to quantify such variation. Unless the tuning method is configured to compensate for such variation, for example, the variation may prevent a subsequent error analysis operation (e.g., task T400) from producing useful information.
The reference pattern may be implemented to include a number of well-defined features (also called “markers”) whose positions in the captured image may be used to define a correspondence between positions in the reference pattern and pixels in the captured image. Such markers may be implemented according to any desired design that supports sufficiently accurate location of the marker's position (e.g., to at least pixel resolution) in the captured image. The locations of the markers in the reference pattern, and the corresponding locations of the markers in the captured image, may then be used to characterize the position transfer error introduced by the capturing operation (e.g., as a spatial warping).
A marker may be implemented as a filled patch within (e.g., at the center of) a larger unfilled patch. In one example, each of two or more of the markers is implemented as a filled circular patch within a square patch, with a high contrast between the circular and square patches (e.g., a black circle within a white patch, or vice versa). In another example, each of two or more of the markers is implemented as a filled circular or square patch of a particular color (a “color dot”). In a further example, each of two or more of the markers is implemented as a color dot (e.g., a filled red circle) within a square white patch. In such cases, task T2010 may be configured to determine the position of the marker in the captured image as the center of gravity of the image of the filled patch. Task T2010 may be configured to detect the pixels of each color dot using a connected component labeling operation. It may be desirable to identify the location of each marker as a particular pixel of the captured image. Alternatively, it may be desirable to identify the location of each marker with subpixel precision. Typically the positions of at least four markers (arranged as points of a quadrilateral) are used to define a unique projective transformation between the captured image and the reference pattern. A projective transformation maps quadrilaterals to quadrilaterals.
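Locating a marker as the center of gravity of its filled patch may be sketched as follows. The example assumes that the captured image has already been thresholded (e.g., by color) into a binary mask of marker pixels, uses 4-connectivity for the connected component labeling, and weights every pixel equally. The function names are introduced here for illustration and do not denote task T2010 itself.

```python
from collections import deque


def label_components(mask):
    """Group the True pixels of a 2-D mask into 4-connected components.

    Returns a list of components, each a list of (row, col) pixel coordinates,
    in the order in which they are first met in a row-major scan.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def centroid(pixels):
    """Center of gravity (row, col) of a component, with subpixel precision."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)
```

Rounding the centroid gives a marker location at pixel resolution, while keeping the fractional value gives subpixel precision.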
For a case in which a reference pattern includes more than four markers, it may be desirable for the reference pattern to include two or more sets of markers (with each set having up to four markers) that are configured such that task T2010 may reliably distinguish the markers of each set in the captured image from markers of all other sets. Such markers may be implemented to include filled circular or square patches of a particular color (also called “color dots”). In one such example, the markers of a first set (e.g., of four markers) are color dots of a first color (e.g., red, green, or blue) within white squares, and the markers of a second set (e.g., of four markers) are color dots of a second color (e.g., a different one of red, green, and blue) within white squares. In a further example, the markers of a third set (e.g., of four markers) are color dots of a third color (e.g., the remaining one of red, green, and blue) within white squares.
The positions of the markers within the reference pattern, together with the corresponding positions of the markers within the captured image, represent information about position and orientation of the reference pattern relative to the camera that existed when the captured image was taken (also called “location transfer error information”), such as the particular variation in the distance between the camera and the reference pattern, translation of the reference pattern relative to the camera optical axis, and variation in the orientation of the reference pattern relative to the camera's optical axis. Task T2020 may be configured to extrapolate this information to map pixels of the captured image to corresponding positions within the reference pattern.
It may be desirable to place the markers of each set in the reference pattern at the corners of a quadrilateral. It may be desirable for such a quadrilateral to have the same aspect ratio as the camera sensor. When more than one set of markers is used, it may be desirable to scale and/or rotate the respective quadrilaterals relative to one another (e.g., such that the markers are distributed evenly in the reference pattern across the camera's field of view).
Task T2020 is configured to apply the characterization of location transfer error to the reference pattern to obtain a reinforcement pattern. For example, task T2010 may be configured to use the marker locations to calculate the characterization of location transfer error as a transformation, such as a projective transformation, and task T2020 may be configured to apply the calculated transformation to the reference pattern (e.g., using bilinear or bicubic interpolation) to generate the reinforcement pattern. In one example, tasks T2010 and T2020 are configured to use the functions cp2tform and imtransform of the software package MATLAB (The MathWorks, Inc., Natick, Mass.) to calculate and apply the transformation, respectively. Other software packages that may be used by other implementations of tasks T2010 and/or T2020 to perform such operations include GNU Octave (Free Software Foundation, Boston, Mass., www-dot-gnu-dot-org) and Mathematica (Wolfram Research, Champaign, Ill.).
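As one illustration of how a projective transformation may be calculated from four marker correspondences, the following sketch solves the eight-unknown linear system directly. It is not the MATLAB cp2tform implementation; the function names (`solve_linear`, `fit_projective`, `apply_projective`) and the example coordinates are hypothetical, and the sketch assumes that the ninth coefficient of the transformation can be normalized to one.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_projective(src, dst):
    """Return coefficients h[0..7] mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_projective(h, point):
    """Map one (x, y) position through the transformation."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical marker positions: reference pattern corners and captured locations.
reference_markers = [(0, 0), (1, 0), (1, 1), (0, 1)]
captured_markers = [(10, 20), (110, 25), (105, 130), (5, 120)]
h = fit_projective(reference_markers, captured_markers)
```

Applying `apply_projective` to every position of the reference pattern, with interpolation of the resulting pixel values, would then yield the transformed reference pattern.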
In another example, task T2020 is configured to apply the characterization of location transfer error to the reference pattern to obtain a reinforcement pattern by combining the characterization of location transfer error with an algorithmic description of the reference pattern. An algorithmic description of the reference pattern of
It may be desirable to provide a sufficient number of markers to support a transformation that will account for geometric distortion caused by imperfections of the camera's optical system (e.g., the camera lens), such as barrel distortion or pincushion distortion. Geometric distortion may cause straight lines to appear as curves in the captured image. Such a transformation may include a curve-fitting operation that is configured to fit a curve to the locations of markers in the captured representation.
Correction of location transfer error may help to align edges within the captured representation with corresponding edges within the transformed reference pattern. Some of the remaining error will be due to sensor imperfections. However, some (possibly more) of the remaining error will be due to nonuniformities in the illumination of the reference pattern during capture (also called “intensity transfer error”). Such error may arise due to, for example, a nonuniform illumination field, nonplanarities of the reference pattern, specular reflections from the reference pattern, and/or differences in illumination and/or orientation from one reference pattern to another.
It may be desirable to account for nonidealities in the transfer of light from the illumination source to the camera sensor via the reference pattern. For example, it may be desirable for the capture operation to be robust to some degree of variation in the field of illumination across the reference pattern being captured (or over an area to be analyzed, such as a region within the markers). Another source of intensity transfer error is camera lens rolloff (also called lens falloff or vignetting), which may cause the center of the captured image to be brighter than the periphery even under perfectly uniform illumination. Although cellphone cameras and other inexpensive cameras typically have fixed-aperture lenses, it is noted that lens rolloff may vary with lens aperture. Intensity transfer error may cause a captured image of a bimodal pattern to be non-bimodal and/or may cause a captured image of a region having uniform reflectance or brightness to have non-uniform luminance, even if the sensor used to capture the image has a perfectly uniform response across its surface.
In one example, task T2030 is configured to create a map of intensity transfer error information by performing a block-based mode filtering operation on the captured image. This operation includes dividing the captured image into a set of nonoverlapping blocks, calculating the mode value (i.e., the most frequently occurring value) for each block, and replacing the values of the pixels of each block with the corresponding mode value. Typical block sizes include 8×8, 16×16, 32×32, 8×16, and 16×8 pixels.
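The block-based mode filtering operation may be sketched as follows. The function name `block_mode_filter` and the small test image are hypothetical; the sketch assumes a single-channel image stored as a list of rows, and it resolves ties between equally frequent values in favor of the value encountered first, which is an assumption not stated in the description.

```python
from collections import Counter


def block_mode_filter(image, block_h, block_w):
    """Replace every pixel of each nonoverlapping block with the block's mode value."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r0 in range(0, rows, block_h):
        for c0 in range(0, cols, block_w):
            r1 = min(r0 + block_h, rows)  # clip partial blocks at the image border
            c1 = min(c0 + block_w, cols)
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mode_value = Counter(block).most_common(1)[0][0]
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mode_value
    return out


# Hypothetical 4x4 image filtered with 2x2 blocks.
image = [
    [10, 10, 200, 200],
    [10, 12, 200, 198],
    [50, 50,  90,  90],
    [50, 51,  90,  90],
]
filtered = block_mode_filter(image, 2, 2)
```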
In another example, task T2030 is configured to create a map of intensity transfer error information by performing a window-based mode filtering operation on the captured image. This operation includes passing a window across the image, calculating the mode value at each position of the window, and replacing the value of the pixel at the center of the window with the calculated mode value. An image produced using a window-based mode filtering operation may more accurately reflect gradual changes in the illumination field across the image and/or may contain edges that are less noisy as compared to an image produced using a block-based mode filtering operation. Typical window sizes include 9×9, 51×51, 101×101, and 201×201 pixels.
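A window-based variant may be sketched in the same way. The function name `window_mode_filter` is hypothetical, and the sketch assumes that the window is simply clipped where it extends past the image border, a border policy that the description leaves open.

```python
from collections import Counter


def window_mode_filter(image, window):
    """Replace each pixel with the mode of the window centered on it."""
    half = window // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out[r][c] = Counter(values).most_common(1)[0][0]
    return out


# Hypothetical 5x5 uniform image with one outlier pixel at the center.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
smoothed = window_mode_filter(noisy, 3)
```

This direct form recomputes the histogram at every window position and is intended only to show the operation; a practical implementation for large windows (e.g., 101×101 pixels) would likely update a running histogram instead.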
In a further example, task T2030 is configured to create a map of intensity transfer error information by performing a combined block- and window-based mode filtering operation on the captured image such that all of the pixels within a block (e.g., of 3×3, 5×5, 7×7, or 9×9 pixels) at the center of each window are given the same value (e.g., the mode value as calculated over the window). Alternatively, task T2030 may also be configured to use a different nonlinear operation that smooths local noise but also enhances edges.
Task T2040 is configured to combine the intensity transfer error information and the transformed reference pattern to produce a reinforcement pattern. For example, although a mode-filtered image may provide important information about the illumination over the reference pattern during capture, it may also contain inaccuracies at edge transitions. Task T2040 may be configured to combine the intensity transfer error information and the transformed reference pattern by imposing the transitions of the transformed reference pattern on the mode-filtered image. For example, task T2040 may be configured to produce the reinforcement pattern by moving the transitions of the mode-filtered image according to the positions of the corresponding transitions in the transformed reference pattern (e.g., by reconstructing the transition point in the mode-filtered image). Alternatively, task T2040 may be configured to combine the intensity transfer error information and the transformed reference pattern by replacing each pixel of the transformed reference pattern with a corresponding pixel of the mode-filtered image according to an intensity similarity criterion. Typically the reinforcement pattern is the same size as the captured image.
It may be desirable for task T2040 to use the transformed reference pattern to indicate the mode with which each pixel is associated (e.g., the low-intensity mode or the high-intensity mode for a bimodal pattern) and to use the mode-filtered image to indicate the particular pixel values that are locally associated with each mode. For example, task T2040 may be configured to use pixel values of the transformed reference pattern to detect mode errors in the mode-filtered image. In one such example, task T2040 detects a mode error when either a pixel value of the mode-filtered image is above a local intermodal threshold value and the corresponding pixel value of the transformed reference pattern indicates a low-intensity mode, or the pixel value of the mode-filtered image is below the local intermodal threshold value and the corresponding pixel value of the transformed reference pattern indicates a high-intensity mode. In one example, the local intermodal threshold value is calculated over a 51×51-pixel window centered at the corresponding pixel of the captured image. Task T2040 may be configured to calculate the local intermodal threshold value according to any suitable technique, such as by applying Otsu's method to a window that includes the current pixel, by taking an average (e.g., the mean or median) of a window that includes the current pixel, etc.
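The mode error test described above may be sketched as follows, using the mean of a local window as the intermodal threshold (one of the options named above). The function names, the boolean representation of the transformed reference pattern, and the one-row test data are hypothetical; the default half-width of 25 corresponds to the 51×51-pixel window of the example.

```python
def local_mean_threshold(image, r, c, half):
    """Mean of the window of half-width `half` centered at (r, c), clipped at borders."""
    rows, cols = len(image), len(image[0])
    values = [image[rr][cc]
              for rr in range(max(0, r - half), min(rows, r + half + 1))
              for cc in range(max(0, c - half), min(cols, c + half + 1))]
    return sum(values) / len(values)


def find_mode_errors(captured, mode_filtered, reference_is_high, half=25):
    """Return the (row, col) positions at which the mode-filtered image
    disagrees with the mode indicated by the transformed reference pattern."""
    errors = []
    for r, row in enumerate(mode_filtered):
        for c, value in enumerate(row):
            threshold = local_mean_threshold(captured, r, c, half)
            above_but_low = value > threshold and not reference_is_high[r][c]
            below_but_high = value < threshold and reference_is_high[r][c]
            if above_but_low or below_but_high:
                errors.append((r, c))
    return errors


# Hypothetical one-row example in which the mode-filtered edge is one pixel early.
captured = [[20, 20, 20, 200, 200, 200]]
mode_filtered = [[20, 20, 200, 200, 200, 200]]
reference_is_high = [[False, False, False, True, True, True]]
errors = find_mode_errors(captured, mode_filtered, reference_is_high, half=2)
```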
It may be desirable for task T2045 to include calculating a histogram of the block or window. It is also possible for task T2045 to use a window of one size to calculate the local intermodal threshold value and a window of a different size to calculate the pixel value for the selected mode or, alternatively, to use the same window for each operation. It is also noted that mode errors are likely to occur only near edges within the mode-filtered image, such that it may be unnecessary to check for mode errors at pixels of the mode-filtered image that are far from edges (e.g., at least two, three, four, or five pixels away).
Task T300 performs a post-processing procedure on one or more of the captured representations to obtain corresponding processed representations. For example, task T310 may be configured to perform an adaptive spatial filtering operation such as any among the range of those described in U.S. Publ. Pat. Appl. No. 2008/0031538, entitled “ADAPTIVE SPATIAL FILTER FOR FILTERING IMAGE INFORMATION” (“Jiang”), which is incorporated by reference for purposes limited to disclosure of implementations of task T310 (including disclosure of apparatus configured to perform such a task). Such an adaptive spatial filter, which may have a large number of parameters, may be configured to reduce noise while finding an optimal balance between edge sharpness and artifact generation. It may be desirable for task T300 to perform the same post-processing procedure (i.e., using the same parameter values) on each among the set of captured representations. For a case in which multiple sets of captured images are taken under different respective capture conditions (e.g., different respective illumination levels), it may be desirable for task T300 to use a different corresponding set of parameter values for each set of captured representations.
The adaptive spatial filter may include a smoothing filter KS10. This smoothing filter may be implemented as an averaging filter, with the degree of averaging being controlled according to the value of a smoothing parameter. Expression (1) of Jiang describes the following example of such a filter: vs= 1/9*[1 1 1; 1 1 1; 1 1 1]+(1−p/100)*( 1/9)*[−1 −1 −1; −1 8 −1; −1 −1 −1], where vs indicates the smoothed pixel value. In this example, the smoothing parameter p is implemented to have a range of values from zero (zero percent smoothing) to one hundred (100 percent smoothing). In another example, the smoothing parameter is implemented to have a range of values from zero (no smoothing) to one (100 percent smoothing). In a further example, the smoothing parameter is implemented to have fifteen possible values over the range of from zero (no smoothing) to fourteen (100 percent smoothing). It is also possible to use a 5×5 smoothing filter rather than the 3×3 filter described above.
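The dependence of the 3×3 kernel on the smoothing parameter may be seen by evaluating the expression above at its endpoints. The following sketch (with the hypothetical function name `smoothing_kernel`) builds the kernel for a value of p in the range of zero to one hundred; at p=0 the kernel reduces to the identity, and at p=100 it reduces to a uniform 3×3 average.

```python
def smoothing_kernel(p):
    """Return the 3x3 kernel 1/9*ones + (1 - p/100)*(1/9)*detail for p in [0, 100]."""
    strength = 1.0 - p / 100.0
    box = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    detail = [[-1.0, -1.0, -1.0], [-1.0, 8.0, -1.0], [-1.0, -1.0, -1.0]]
    return [[(box[r][c] + strength * detail[r][c]) / 9.0 for c in range(3)]
            for r in range(3)]


identity_like = smoothing_kernel(0)    # zero percent smoothing
box_average = smoothing_kernel(100)    # 100 percent smoothing
```

Because the entries of the second matrix sum to zero, the kernel coefficients sum to one for every value of p, so the filter preserves the mean level of a flat region.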
The adaptive spatial filter may include one or more filters for edge sharpening. For example, the adaptive spatial filter may include a filter KH10 configured to sharpen horizontal edges and a filter KV10 configured to sharpen vertical edges. Each such filter may be implemented as a square kernel (i.e., a square convolution mask). Expressions (2) and (3) of Jiang describe the following examples of 5×5 horizontal and vertical edge sharpening kernels, respectively: vH=⅙*[−1 −1 −1 −1 −1; −2 −2 −2 −2 −2; 6 6 6 6 6; −2 −2 −2 −2 −2; −1 −1 −1 −1 −1] and vV=⅙*[−1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1], where vH indicates the horizontally sharpened pixel value and vV indicates the vertically sharpened pixel value. A respective gain factor for each sharpening filter (e.g., a horizontal sharpening gain factor kH and a vertical sharpening gain factor kV) may be implemented as a multiplier applied to the result of convolving the image with the corresponding kernel (e.g., to values vH and vV, respectively). In one particular example, horizontal sharpening gain factor kH and vertical sharpening gain factor kV are each implemented to have four hundred one possible values uniformly distributed over the range of from zero to 4.00 (i.e., 0.00, 0.01, 0.02, and so on). Other examples of sharpening filters that may be used in task T310 include unsharp masking.
The adaptive spatial filter may be configured to perform edge sharpening in a processing path that is parallel to a smoothing operation. Alternatively, the adaptive spatial filter may be configured to perform edge sharpening in series with a smoothing operation. For example, task T310 may be configured to perform a filtering operation as shown in
It may be desirable to configure task T310 such that at least the edge sharpening operations are performed only on the luminance channel (e.g., the Y channel) of the captured image. Applying different sharpening operations to the color channels of an image may cause color artifacts in the resulting sharpened image. Task T310 may be configured to operate only on the luminance channel of the image being processed.
As described above, task T310 may be configured to perform an edge sharpening operation using a fixed kernel and a variable gain factor value. In some cases (e.g., in gradient areas of an image), however, the use of a fixed sharpening kernel such as [−1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1; −1 −2 6 −2 −1] may create cross-hatch artifacts in the processed image. In another example, task T300 may be configured to perform an edge sharpening operation using variable kernel values. In such case, it may be desirable to impose one or more constraints, such as symmetry around the middle row and column (i.e., top/bottom and left/right symmetry). This symmetry constraint reduces a 5×5 kernel to nine parameters. In one such example, task T310 is implemented to perform a vertical edge sharpening operation and a horizontal edge sharpening operation, with each operation having nine parameters that define a variable kernel. In a further example, a gain factor value as described above is also included, for a total of ten parameters for each edge sharpening operation.
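The reduction of a 5×5 kernel to nine parameters under top/bottom and left/right symmetry may be illustrated as follows. The function name `expand_symmetric_kernel` and the layout of the nine parameters (a 3×3 array holding the upper-left quadrant, with the kernel center as its last element) are assumptions made for this sketch.

```python
def expand_symmetric_kernel(quadrant):
    """Expand a 3x3 upper-left quadrant into a 5x5 kernel that is symmetric
    about its middle row and middle column."""
    return [[quadrant[min(r, 4 - r)][min(c, 4 - c)] for c in range(5)]
            for r in range(5)]


# Nine parameters that reproduce the fixed vertical sharpening kernel shown above.
vertical_params = [[-1, -2, 6],
                   [-1, -2, 6],
                   [-1, -2, 6]]
vertical_kernel = expand_symmetric_kernel(vertical_params)
```

An optimizer may then vary the nine quadrant values (and, optionally, a tenth gain factor value) rather than all twenty-five kernel coefficients.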
The adaptive spatial filter may include a clipping or clamping operation CL10 on the output of an edge sharpening operation. For example, clamp CL10 may be configured to include a maximum positive sharpening limit e2 (e.g., such that sharpened values higher than e2 are clamped at e2). Alternatively or additionally, clamp CL10 may be configured to include a maximum negative sharpening limit e3 (e.g., such that sharpened values less than minus e3 are clamped at minus e3). Limits e2 and e3 may have the same value or different values, and such operations may be implemented on the outputs of both of the vertical and horizontal edge sharpening operations (e.g., as shown in
The adaptive spatial filter may include a thresholding operation CT10 that applies a sharpening subtraction threshold e1. Such an operation may be configured, for example, such that pixels of a sharpened image that have a magnitude of less than e1 are reduced to zero, and such that e1 is subtracted from pixels of a sharpened image that have a magnitude of at least e1. In one particular example, e1 is implemented to have 101 possible values over the range of from zero to 100 (e.g., 0, 1, 2, and so on). It may be desirable to configure task T310 to apply threshold e1 downstream of limits e2 and e3 (e.g., as shown in
Each instance of task T310 may be configured to process a corresponding one of a set of captured images (possibly with multiple instances of task T310 executing in parallel) and to produce a corresponding processed image from the captured image. Task T310 may be configured to produce each processed image as the sum of a corresponding smoothed image, vertical sharpened image, and horizontal sharpened image (e.g., as shown in
Task T400 evaluates one or more filtered representations, as produced by task T300, with respect to corresponding patterns for matching. It may be desirable to configure task T400 to use a consistent metric to describe the quality of a filtered representation. For example, task T400 may be configured to quantify the error between a filtered representation and the corresponding pattern for matching in terms of a cost function, whose terms may be weighted according to a decision regarding their relative importance or priority in the particular implementation. A term of the cost function may be weighted more heavily, for example, to increase the degree of negative reinforcement of the behavior represented by the term.
Respective instances of task T400 may be configured to evaluate a cost function for each among a set of filtered representations. In such case, task T400 may be configured to produce an overall cost function value as an average of the various evaluations. Alternatively or additionally, task T400 may be configured to evaluate a cost function based on information from two or more (possibly all) among the set of filtered representations. It may be desirable to implement multiple instances of method M110, each including multiple instances of task T410, to produce a separate evaluation measure for each set among multiple sets of processed images (e.g., with each set corresponding to a different illumination level).
In one example, task T410 uses a four-term cost function ƒ=w1p+w2n+w3o+w4s, where p denotes the pixel-by-pixel error of the processed image with respect to the reinforcement pattern, n denotes a noise measure, o denotes an oversharpening measure, s denotes a sharpness measure, and w1 to w4 denote weight factors. In one particular example, each of weights w1, w2, and w4 has a value of one. It may also be desirable to configure method M102 to combine evaluation results from the multiple instances of task T400. For example, it may be desirable to configure method M112 to combine evaluation results from the multiple instances of task T410 into a single cost function as described herein. In such case, one or more (possibly all) of the instances of task T410 may be configured to produce evaluation results for fewer than all of the terms of the cost function.
Task T410 may be configured to calculate the pixel-by-pixel error p as the RMS error between the processed image and the corresponding reinforcement pattern. In this case, task T410 may be configured to subtract the processed image from the reinforcement pattern, calculate the sum of the squares of the values of the resulting difference image, normalize the sum by the image size, and produce the value of p as the square root of the normalized sum.
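The RMS calculation described above may be sketched as follows, assuming single-channel images of equal size stored as lists of rows. The function name `rms_error` and the test values are hypothetical.

```python
import math


def rms_error(processed, reinforcement):
    """Root-mean-square difference between two equally sized images."""
    total = 0.0
    count = 0
    for row_p, row_r in zip(processed, reinforcement):
        for value_p, value_r in zip(row_p, row_r):
            diff = value_r - value_p      # subtract processed from reinforcement
            total += diff * diff          # sum of squares of the difference image
            count += 1
    return math.sqrt(total / count)       # normalize by image size, then take the root


p = rms_error([[1, 2], [3, 4]], [[1, 2], [3, 8]])
```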
Method M112 may be configured to calculate the value of p as an average error among the set of processed images. Alternatively, method M112 may be configured to calculate the value of p from fewer than all of the processed images. For example, method M112 may include an instance of task T410 that is configured to calculate the value of p as the RMS error of a noise quantification pattern (e.g., as shown in
The noise measure n quantifies the presence of noise in a processed image. As compared to the image sensors used in high-quality cameras, such as digital single-lens-reflex (SLR) cameras, the image sensors used in inexpensive cameras, such as cellphone cameras, tend to be much smaller and have smaller pixel sites. Consequently, images captured by the smaller sensors tend to have lower signal-to-noise ratios (SNRs). Task T410 may be configured to calculate the noise measure based on a variance in each of one or more patches of a noise quantification pattern. In one example as applied to
The oversharpening or “overshoot/undershoot” measure o quantifies the presence of oversharpening artifacts in the processed image. Typically, it is desirable to have some degree of oversharpening, as it tends to enhance the perception of sharpness. Too much oversharpening, however, may create excessive, visible, and objectionable overshoots and/or undershoots, and it may be desirable for the cost function to include a term that penalizes the presence of oversharpening artifacts.
Task T4010 takes a histogram of the selected region of the filtered image. Although it may be possible to evaluate sharpening performance over any one or more regions of the processed image, it may be desirable to select regions near the center of the reference pattern. For example, it may be desirable to configure the mask image to select a region that is close to the center of the reference pattern (e.g., near the center of the lens). The resolving power of the camera's lens is likely to be stronger near the optical axis than near the periphery, and optimizing a sharpening operation over a region near the periphery may cause oversharpening near the center.
In a typical case, the oversharpening artifact quantification reference pattern and the region selection are designed so that the histogram has two peaks, one corresponding to the brighter side of the edge or edges and the other corresponding to the darker side. For the brighter side, task T4020a determines a high-intensity baseline value that is an average of the corresponding histogram peak. In one example, the baseline value is the mode (i.e., the most frequently occurring value) of the corresponding histogram peak. In another example, the baseline value is the mean of the pixel values that are greater than a midpoint of the histogram.
Based on this baseline value, task T4030a determines corresponding first and second overshoot threshold values. Task T4030a may be configured to determine the first threshold value from a range of from 110% or 115% to 120%, 125%, or 130% of the baseline value. In one example, task T4030a calculates the first threshold value as 115% of the baseline value. Task T4030a may be configured to determine the second threshold value from a range of from 115% or 120% to 130%, 140%, or 150% of the baseline value. In one example, task T4030a calculates the second threshold value as 122.5% of the baseline value.
It may be desirable to calculate an oversharpening measure based on a number of oversharpening artifact pixels (e.g., a number of overshoot and/or undershoot pixels) in the region. In such case, it may be desirable to weight each artifact pixel according to the magnitude of its error. Based on the first and second threshold values, task T4040a calculates an overshoot measure. In one general example, task T4040a is configured to calculate the number of unacceptable pixels in the region as the number of pixels having values greater than (alternatively, not less than) the first threshold value and not greater than (alternatively, less than) the second threshold value. In this example, task T4040a may also be configured to calculate the number of very unacceptable pixels in the region as the number of pixels having values greater than (alternatively, not less than) the second threshold value.
Such an implementation of task T4040a may be configured to calculate the overshoot measure based on the numbers of unacceptable and very unacceptable pixels. For example, task T4040a may be configured to calculate the overshoot measure according to an expression such as Mo=wuoPuo+wvoPvo, where Mo denotes the overshoot measure, Puo denotes the number of unacceptable overshoot pixels, Pvo denotes the number of very unacceptable overshoot pixels, and wuo and wvo denote weight factors. Typically, the value of wvo is much larger (e.g., five to twenty times larger) than the value of wuo. In one example, the ratio wuo/wvo is equal to 0.075. It may be desirable to select the ratio wuo/wvo (possibly dynamically) according to an observed ratio of Puo and Pvo, such that the terms wuoPuo and wvoPvo may be expected to be approximately equal. In another particular example, the value of wuo is one and the value of wvo is twenty.
It may be desirable to configure task T4040a to normalize the pixel count numbers. For example, task T4040a may be configured to normalize the values Puo and Pvo by dividing each value by the total number of pixels in the region, or by dividing each value by the number of pixels having values greater than (alternatively, not less than) the baseline value. Task T4040a may be configured to use different values of wuo and/or wvo (e.g., according to a different value of wuo/wvo) for bright and dim illumination conditions.
Task T4040a may be configured to calculate the number of pixels whose values are greater than the baseline value but less than (alternatively, not greater than) the first threshold value, and to calculate the overshoot measure based on this number. Alternatively, such pixels may be considered to be acceptable. As noted above, for example, some degree of oversharpening by the post-processing procedure may be desirable (e.g., to enhance the perception of edge sharpness in the image), and task T4040a may be configured to ignore these pixels.
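The overshoot counting and weighting described above may be sketched as follows. The function name `overshoot_measure` and the test pixel values are hypothetical. The defaults use the example threshold values (115% and 122.5% of the baseline) and the example weights (one and twenty); pixels between the baseline and the first threshold are treated as acceptable and ignored, and normalization by the number of pixels in the region is optional.

```python
def overshoot_measure(pixels, baseline, w_uo=1.0, w_vo=20.0,
                      first_ratio=1.15, second_ratio=1.225, normalize=False):
    """Return Mo = w_uo * P_uo + w_vo * P_vo for the pixels of a selected region."""
    first_threshold = baseline * first_ratio
    second_threshold = baseline * second_ratio
    # Unacceptable: above the first threshold but not above the second.
    p_uo = sum(1 for v in pixels if first_threshold < v <= second_threshold)
    # Very unacceptable: above the second threshold.
    p_vo = sum(1 for v in pixels if v > second_threshold)
    if normalize:
        p_uo /= len(pixels)
        p_vo /= len(pixels)
    return w_uo * p_uo + w_vo * p_vo


# Hypothetical region values with a high-intensity baseline of 100.
region = [100, 110, 118, 120, 130]
m_o = overshoot_measure(region, baseline=100)
```

An undershoot measure may be calculated in the same manner on the darker side by reversing the comparisons and using threshold values below the low-intensity baseline.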
For the darker side, task T4020b determines a baseline value that is an average (e.g., the mode) of the corresponding histogram peak. Based on this baseline value, task T4030b determines corresponding first and second undershoot threshold values. Task T4030b may be configured to determine the first threshold value from a range of from 70%, 75%, or 80% to 85% or 90% of the baseline value. In one example, task T4030b calculates the first threshold value as 80% of the baseline value. Task T4030b may be configured to determine the second threshold value from a range of from 50%, 60%, or 70% to 80% or 85% of the baseline value. In one example, task T4030b calculates the second threshold value as 70% of the baseline value. Task T4030b may be configured to calculate the first and/or second threshold values differently for bright and dim illumination conditions.
Based on the first and second threshold values, task T4040b calculates an undershoot measure. In one general example, task T4040b is configured to calculate the number of unacceptable pixels in the region as the number of pixels having values less than (alternatively, not greater than) the first threshold value and not less than (alternatively, greater than) the second threshold value. In this example, task T4040b may also be configured to calculate the number of very unacceptable pixels in the region as the number of pixels having values less than (alternatively, not greater than) the second threshold value.
Such an implementation of task T4040b may be configured to calculate the undershoot measure based on the numbers of unacceptable and very unacceptable pixels. For example, task T4040b may be configured to calculate the undershoot measure according to an expression such as Mu=wuuPuu+wvuPvu, where Mu denotes the undershoot measure, Puu denotes the number of unacceptable undershoot pixels, Pvu denotes the number of very unacceptable undershoot pixels, and wuu and wvu denote weight factors. Typically, the value of wvu is much larger (e.g., five to twenty times larger) than the value of wuu. In one example, the ratio wuu/wvu is equal to 0.075. It may be desirable to select the ratio wuu/wvu (possibly dynamically) according to an observed ratio of Puu and Pvu, such that the terms wuuPuu and wvuPvu may be expected to be approximately equal. In another particular example, the value of wuu is one and the value of wvu is twenty.
It may be desirable to configure task T4040b to normalize the pixel count numbers. For example, task T4040b may be configured to normalize the values Puu and Pvu by dividing each value by the total number of pixels in the region, or by dividing each value by the number of pixels having values less than (alternatively, not greater than) the baseline value. Tasks T4040a and T4040b may be configured such that wuo is equal to wuu and/or that wvo is equal to wvu. Task T4040b may be configured to use different values of wuu and/or wvu (e.g., according to a different value of wuu/wvu) for bright and dim illumination conditions.
Task T4040b may be configured to calculate the number of pixels whose values are less than the baseline value but greater than (alternatively, not less than) the first threshold value, and to calculate the undershoot measure based on this number. Alternatively, such pixels may be considered to be acceptable. As noted above, for example, some degree of oversharpening by the post-processing procedure may be desirable (e.g., to enhance the perception of edge sharpness in the image), and task T4040b may be configured to ignore these pixels.
Task T412 may be configured to evaluate a cost function that has separate overshoot and undershoot terms. Alternatively, task T412 may be configured to produce a value for oversharpening measure o as the sum of overshoot measure Mo and undershoot measure Mu. Alternatively, task T412 may be configured to produce a value for oversharpening measure o as an average of overshoot measure Mo and undershoot measure Mu (possibly with each term being weighted by a corresponding number of pixels).
The edge slope or sharpness measure s quantifies the degree of sharpness (e.g., the slope) of the edges in a processed image. Although it may be desirable to avoid excessive overshoot and undershoot in the processed image, it may also be desirable to obtain a sufficiently steep slope at the edges. Ideally, the slope of each edge in the processed image would be equal to the slope of that edge in the corresponding reinforcement pattern. It may be desirable to calculate the sharpness measure as a measure of a difference between edge slope in the processed image and edge slope in the reinforcement pattern.
Each of the tasks T4050a and T4050b may be configured to calculate a derivative vector b from a cross-section vector a as a difference vector according to an expression such as b[i]=a[i+1]−a[i].
Task T416 also includes a task T4070 that calculates an absolute difference vector between the sorted derivative vectors and a task T4080 that calculates a value for measure s as an edge error based on information from the absolute difference vector.
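The sequence of derivative, sorting, and absolute difference operations may be sketched for a single cross-section as follows. The function names and test vectors are hypothetical, and the use of the mean of the absolute difference vector as the edge error is an assumption, since the description states only that the edge error is based on information from that vector.

```python
def derivative_vector(cross_section):
    """Difference vector b[i] = a[i+1] - a[i]."""
    return [cross_section[i + 1] - cross_section[i]
            for i in range(len(cross_section) - 1)]


def edge_error(processed_section, reinforcement_section):
    """Compare the sorted derivative vectors of two equal-length cross-sections."""
    d_processed = sorted(derivative_vector(processed_section))
    d_reinforcement = sorted(derivative_vector(reinforcement_section))
    abs_diff = [abs(a - b) for a, b in zip(d_processed, d_reinforcement)]
    return sum(abs_diff) / len(abs_diff)  # assumed: mean absolute difference


# Hypothetical cross-sections: an ideal step edge and a softened version of it.
ideal = [0, 0, 100, 100]
soft = [0, 50, 100, 100]
s = edge_error(soft, ideal)
```

Sorting the derivative vectors before comparing them makes the comparison depend on the distribution of slopes along the cross-section rather than on the exact position of each edge.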
It may be desirable to configure task T4080 to calculate the sharpness measure as an overall average of edge errors from more than one absolute difference vector (e.g., of four absolute difference vectors) that correspond to different respective cross-sections of the filtered image. In one particular implementation of task T416, instances of tasks T4050a, T4050b, T4060a, T4060b, and T4070 are performed for each of four vertical cross-sections and four horizontal cross-sections, and task T4080 is configured to calculate the sharpness measure as an average of edge errors from each of the eight cross-sections. Method M112 may be configured to calculate sharpness measure s by combining (e.g., averaging) results from multiple instances of task T416, each configured to evaluate a filtered image corresponding to a different respective reference pattern.
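One possible sketch of the sharpness calculation over several cross-sections is shown below. It assumes that each pair of cross-sections has equal length, that the derivative vectors are sorted in ascending order, and that the edge error for one cross-section is the mean of its absolute difference vector; the disclosure states only that the edge error is based on information from that vector. All function names are hypothetical.

```python
def derivative(a):
    """Difference vector b[i] = a[i+1] - a[i]."""
    return [a[i + 1] - a[i] for i in range(len(a) - 1)]

def edge_error(processed_cs, pattern_cs):
    """Edge error for one pair of equal-length cross-sections."""
    d_proc = sorted(derivative(processed_cs))  # derivative, then sort
    d_ref = sorted(derivative(pattern_cs))     # derivative, then sort
    abs_diff = [abs(x - y) for x, y in zip(d_proc, d_ref)]  # as in task T4070
    # Assumption: reduce the absolute difference vector by its mean.
    return sum(abs_diff) / len(abs_diff)

def sharpness_measure(pairs):
    """Average the edge errors over several cross-section pairs."""
    errors = [edge_error(proc, ref) for proc, ref in pairs]
    return sum(errors) / len(errors)
```

A processed cross-section whose sorted derivatives match those of the pattern yields an edge error of zero, and softer edges yield larger values.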
Task T500 optimizes parameter values of the post-processing procedure. For example, task T500 may be configured to optimize the parameter values such that the cost function of task T400 is minimized. As shown in
In an example as described above that provides for 15 possible values for p, 401 possible values for each of kH and kV, and 101 possible values for each of e1 and e2, the solution space includes over twenty-four billion different possibilities. This solution space becomes many times larger if the sharpening kernels are also opened for optimization. Performing an exhaustive search of such a large parametric space is not practical. Therefore, it may be desirable to configure task T500 as an implementation of a technique that is suitable for optimizing a large parametric space. One such technique is simulated annealing. Other such techniques include neural networks and fuzzy logic. Another such technique is least-squares optimization, which maps the parameters to a least-squares problem. Some of these techniques may be sensitive to the initial condition (e.g., the set of parameter values used in the first iteration of task T300).
Another technique for optimizing a large parametric space that may be used in task T500 is a genetic algorithm. A genetic algorithm evaluates populations of genomes, where each genome is a genetic representation of the solution domain (i.e., of the parametric space) that encodes a particular set of parameter values. The genetic algorithm iteratively generates a new population of genomes from the previous population until a genome that satisfies a fitness criterion is found and/or until a maximum number of iterations has been performed. The population of genomes (also called “chromosomes”) at each iteration of the genetic algorithm is also called a “generation.”
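The size of the solution space in this example follows from multiplying the number of possible values for each parameter, as the short calculation below illustrates. The dictionary keys are labels chosen for this sketch.

```python
from math import prod

# Number of candidate values per parameter, as given in the example above.
values_per_parameter = {"p": 15, "kH": 401, "kV": 401, "e1": 101, "e2": 101}

solution_space = prod(values_per_parameter.values())
print(f"{solution_space:,}")  # 24,604,965,015
```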
In the case of a genetic algorithm, task T500 may be configured to encode the value of each parameter to be optimized as a string of bits that is long enough to cover the range of possible values of the parameter, and to concatenate the strings for the individual parameters to obtain a genome that is a string of bits having some length L. In this system, each string of length L defines a particular solution to the optimization problem (i.e., a particular data point in the parametric space). Of course, task T500 may also be configured to use a data structure other than a string.
In one example, task T510 is configured to encode the value of smoothing degree p (e.g., in a range of from zero to fourteen) in four bits, values for each of the sharpening kernel gains kH and kV (e.g., in a range of from zero to 4.00) in seven bits, and values for each of sharpening threshold e1 and sharpening limit e2 (e.g., in a range of from zero to 100) in seven bits to obtain a five-parameter, 32-bit genome. Such a genome has over four billion possible values. For a case in which the parameter set also includes nine parameters from each sharpening kernel, it may be desirable to use a genome that is up to about sixty-two bits long.
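A minimal sketch of such a 32-bit encoding is shown below, assuming that p is stored directly and that each of the other parameters is linearly quantized over its range. The quantization rule, the field order, and the names `FIELDS`, `encode`, and `decode` are assumptions of this sketch; the disclosure specifies only the bit widths and value ranges.

```python
# (name, bit width, maximum value); widths follow the example above.
FIELDS = [("p", 4, 14), ("kH", 7, 4.0), ("kV", 7, 4.0),
          ("e1", 7, 100), ("e2", 7, 100)]
GENOME_BITS = sum(width for _, width, _ in FIELDS)  # 4 + 4 * 7 = 32

def encode(params):
    """Pack parameter values into one integer, first field in the high bits."""
    genome = 0
    for name, width, vmax in FIELDS:
        levels = (1 << width) - 1
        if name == "p":
            code = int(params[name])  # 15 values fit directly in 4 bits
        else:
            code = round(params[name] / vmax * levels)  # assumed linear quantization
        genome = (genome << width) | code
    return genome

def decode(genome):
    """Unpack an integer genome into a dict of parameter values."""
    params = {}
    for name, width, vmax in reversed(FIELDS):
        levels = (1 << width) - 1
        code = genome & levels
        genome >>= width
        if name == "p":
            params[name] = min(code, vmax)  # code 15 is unused; clamp to 14
        else:
            params[name] = code / levels * vmax
    return params
```

Because seven bits give only 128 levels, a decoded gain or threshold may differ slightly from the value that was encoded.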
Method M100 may be configured to begin with an initial population that is randomly generated (e.g., random strings of length L). In this case, for each genome in the population, task T300 performs the post-processing operation according to the parameter values encoded in the genome to generate a corresponding filtered representation, and task T400 evaluates the cost function for that filtered representation to determine the fitness of the genome.
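As an illustration, a random initial population and its evaluation may be sketched as follows. Genomes are represented here as integers rather than strings, and `cost_of` is a hypothetical callable that stands in for the combination of post-processing task T300 and evaluation task T400.

```python
import random

def initial_population(size, genome_bits, rng=random):
    """Return `size` random genomes, each an integer of `genome_bits` bits."""
    return [rng.getrandbits(genome_bits) for _ in range(size)]

def evaluate(population, cost_of):
    """Return the cost of each genome, in population order."""
    return [cost_of(genome) for genome in population]
```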
Subsequent iterations of task T500 perform a reproduction stage to generate the population of the next generation. It may be desirable to configure task T500 to provide every genome with the chance to reproduce (e.g., to preserve diversity and/or to avoid local minima). It may be desirable to configure task T500 such that the probability that a genome will reproduce is based on the corresponding cost function value for a processed image generated according to that genome from a captured image. In one example, task T500 is configured to select parents for the next generation according to a roulette selection process, in which the probability of selecting a genome to be a parent is inversely proportional to the corresponding cost function value. In another example, task T500 is configured to select parents for the next generation according to a tournament selection process, such that from each set of two or more genomes or “contestants” that are randomly selected from the population, the genome having the lowest corresponding cost function value is selected to be a parent.
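The two selection processes described above may be sketched as follows. The function names are illustrative, costs are assumed to be non-negative, and the small constant `eps` is an assumption added to avoid division by zero.

```python
import random

def roulette_select(population, costs, rng=random):
    """Pick one parent with probability inversely proportional to its cost."""
    eps = 1e-12  # assumption: guards against division by a zero cost
    weights = [1.0 / (c + eps) for c in costs]
    return rng.choices(population, weights=weights, k=1)[0]

def tournament_select(population, costs, k=2, rng=random):
    """Pick k random contestants and return the one with the lowest cost."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: costs[i])
    return population[winner]
```

Roulette selection gives every genome a nonzero chance to reproduce, whereas a larger tournament size `k` favors the best genomes more strongly.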
Task T500 may be configured to generate a genome of the new generation by performing a crossover operation (also called recombination), which generates a new genome from parts of two or more parents. The new solution may represent a large jump in the parametric space. A crossover operation may be performed by switching from one parent genome to the other at a single point, which may be randomly selected. Alternatively, a crossover operation may be performed by switching from one parent genome to another at multiple points, which may also be randomly selected.
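By way of example, single-point and multi-point crossover on genomes represented as bit strings may be sketched as follows. The function names and the default of two crossover points are assumptions of this sketch, and both parents are assumed to have the same length.

```python
import random

def single_point_crossover(a, b, rng=random):
    """Copy bits from parent a up to a random cut, then from parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def multi_point_crossover(a, b, n_points=2, rng=random):
    """Switch between the parents at n_points distinct random positions."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    pieces, src, other, start = [], a, b, 0
    for cut in cuts + [len(a)]:
        pieces.append(src[start:cut])
        src, other, start = other, src, cut
    return "".join(pieces)
```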
Alternatively or additionally, task T500 may be configured to generate a genome of the new generation by performing a mutation operation, which generates a new genome by flipping a random bit in the parent genome. Such mutation represents a small random walk in the parametric space. In another example of mutation, the substring that encodes a value of a parameter (also called an “allele”) is replaced by the value of a random variable (e.g., a uniformly distributed variable, a normally distributed (Gaussian) variable, etc.). A further example of mutation generates a new genome by flipping more than one random bit in the parent genome, or by inverting a randomly selected substring of the genome (also called “inversion”).
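Two of the mutation operations described above, flipping a single random bit and replacing an allele with a uniformly distributed random value, may be sketched as follows for genomes represented as integers. The function names and the `offset` and `width` arguments are assumptions of this sketch.

```python
import random

def flip_random_bit(genome, genome_bits, rng=random):
    """Flip one randomly chosen bit of an integer genome."""
    return genome ^ (1 << rng.randrange(genome_bits))

def replace_allele(genome, offset, width, rng=random):
    """Replace the width-bit field that starts at bit offset with random bits."""
    mask = ((1 << width) - 1) << offset
    return (genome & ~mask) | (rng.getrandbits(width) << offset)
```

A single bit flip changes one parameter by a power-of-two step in its code, while allele replacement may move that parameter anywhere in its range.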
Task T500 may be configured to perform a random mix of crossover and mutation operations. Task T500 may also be configured to identify the one or more best genomes of a generation (e.g., the genomes having the lowest cost function values) and pass them unaltered to the next generation. Such a process is also known as “elitism”. In one example, task T500 is configured to generate the next generation with an eighty-percent probability of crossover and a probability of mutation that is less than ten percent. In such case, elitism may be used to select the remaining population.
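One possible reading of this example is sketched below: each slot of the next generation is filled by crossover with probability 0.8, by mutation with probability 0.05, and otherwise by copying the next-best genome unaltered. This interpretation, the tournament of two used to choose parents, and all names are assumptions of the sketch; genomes are integers and the population is assumed to hold at least two genomes.

```python
import random

def next_generation(population, costs, p_cross=0.8, p_mut=0.05,
                    genome_bits=32, rng=random):
    """Build a new population of the same size from the current one."""
    order = sorted(range(len(population)), key=lambda i: costs[i])
    ranked = [population[i] for i in order]  # lowest cost first

    def pick_parent():
        # Tournament of two on rank: the lower index has the lower cost.
        i, j = rng.sample(range(len(ranked)), 2)
        return ranked[min(i, j)]

    children, elite_idx = [], 0
    while len(children) < len(population):
        r = rng.random()
        if r < p_cross:
            # Single-point crossover: high bits from one parent, low bits from another.
            low = (1 << rng.randrange(1, genome_bits)) - 1
            a, b = pick_parent(), pick_parent()
            children.append((a & ~low) | (b & low))
        elif r < p_cross + p_mut:
            # Mutation: flip one random bit of a selected parent.
            children.append(pick_parent() ^ (1 << rng.randrange(genome_bits)))
        else:
            # Elitism: pass the next-best genome through unaltered.
            children.append(ranked[elite_idx % len(ranked)])
            elite_idx += 1
    return children
```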
In one particular example, method M100 is configured to generate and evaluate a population of one hundred genomes for each of one hundred generations, and to return the set of parameter values that are encoded in the genome associated with the lowest corresponding cost function value. Such an example of method M100 may also be configured to terminate early if a genome associated with a cost function value below a given threshold value is encountered. It will be understood that a genetic algorithm implementation of method M100 may be used to obtain a commercially acceptable or even superior result by sampling a very small fraction (much less than one percent, one-tenth of one percent, or even one-hundredth of one percent) of the parametric space.
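The fraction of the parametric space that is sampled in this example can be estimated with a short calculation. The sketch assumes the 32-bit genome described earlier and treats every evaluated genome as distinct, so the result is an upper bound.

```python
evaluations = 100 * 100   # genomes per generation times number of generations
space = 2 ** 32           # possible values of a 32-bit genome
fraction = evaluations / space
print(f"{fraction:.2e} of the space")
```

The resulting fraction is on the order of two parts per million, which is well below one-hundredth of one percent.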
Tasks T200, T300, T400, and T500 may be performed by an apparatus that includes at least one processor and storage, such as a desktop computer or a network of computers. Instances of post-processing task T300 and evaluation task T400 may be performed for each genome of a generation in parallel, without requiring communication with a processing path for any other genome of that generation. Consequently, multiple instances of each task may be implemented on a multi-core processor, such as a graphics processing unit (GPU). One example of such a processor is the GTX 200 GPU (Nvidia Corporation, Santa Clara, Calif.), which includes 240 stream processors. One example of an assembly that includes such a processor is the Nvidia GeForce GTX 280 graphics expansion card. A card having such a processor may be used to evaluate an entire generation at once. Multiple instances of such tasks may also be executed on a card having more than one multi-core GPU and/or on multiple multi-core processor cards (e.g., connected via a Scalable Link Interface (SLI) bridge). Additionally, multi-core processors and/or cards having such a processor may be stacked to process consecutive generations.
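Because each genome is evaluated independently, the evaluations for a generation may be dispatched to a pool of workers, as in the following sketch. A thread pool from the Python standard library is used here only as a stand-in for the stream processors of a GPU; `cost_of` is a hypothetical callable that performs post-processing and evaluation for one genome.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_generation(population, cost_of, max_workers=8):
    """Evaluate every genome independently; results keep population order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cost_of, population))
```

No worker needs data from any other worker, so the same structure carries over to a process pool or to a GPU kernel launched once per generation.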
The sharpness of an image that is captured by an end-user of a camera may be related to the illumination conditions under which the image is taken. Outdoor scenes, for example, tend to be more brightly lit as well as more directly lit, and images taken under such conditions tend to have higher sharpness. Indoor scenes, on the other hand, tend to be more dimly lit as well as more diffusely lit, and images taken under such conditions tend to have edges that are less sharp. It may be desirable to perform multiple instances of method M110 or M112 (possibly in parallel) to generate different sets of parameter values for post-processing of images captured under different illumination conditions. For example, a set of captured images that were captured under dim illumination may be used to generate a set of parameter values optimized for post-processing of images captured under dim illumination, and a set of captured images that were captured under bright illumination may be used to generate a set of parameter values for post-processing of images captured under bright illumination. In a further example, sets of images taken under three different illumination levels are used to generate three different corresponding sets of parameter values for post-processing.
The principles described above may be used to implement an automated camera tuning process that has a reduced learning curve and can be used to obtain consistent and commercial-quality results without the need for a trained expert's subjective evaluation. Such a process may enable a relatively unskilled human operator to tune a camera by obtaining the captured images and initiating the automated tuning process (e.g., by indicating a file directory or other location in which the captured images are stored). In a further example, the task of obtaining the captured images may also be automated. It is noted that method M110 may include performing one or more other processing operations on the captured images between tasks T110 and T310 and/or between tasks T310 and T410 (e.g., image compression and decompression, color correction and/or conversion, and/or one or more other spatial and/or frequency-domain processing operations, such as automatic white balance).
The foregoing presentation of the described configurations is provided to enable any person skilled in the art to make or use the methods and other structures disclosed herein. The flowcharts, block diagrams, state diagrams, and other structures shown and described herein are examples only, and other variants of these structures are also within the scope of the disclosure. Various modifications to these configurations are possible, and the generic principles presented herein may be applied to other configurations as well. Thus, the present disclosure is not intended to be limited to the configurations shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein, including in the attached claims as filed, which form a part of the original disclosure.
Those of skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The various elements of an implementation of an apparatus as disclosed herein (e.g., apparatus MF100, as well as the numerous implementations of such apparatus and additional apparatus that are expressly disclosed herein by virtue of the descriptions of the various implementations of methods as disclosed herein) may be embodied in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application. For example, such elements may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Any two or more, or even all, of these elements may be implemented within the same array or arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips).
One or more elements of the various implementations of the apparatus disclosed herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). Any of the various elements of an implementation of an apparatus as disclosed herein may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions, also called “processors”), and any two or more, or even all, of these elements may be implemented within the same such computer or computers.
A processor or other means for processing as disclosed herein may be fabricated as one or more electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips). Examples of such arrays include fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, DSPs, FPGAs, ASSPs, and ASICs. A processor or other means for processing as disclosed herein may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions) or other processors.
Those of skill will appreciate that the various illustrative modules, logical blocks, circuits, and operations described in connection with the configurations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Such modules, logical blocks, circuits, and operations may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC or ASSP, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to produce the configuration as disclosed herein. For example, such a configuration may be implemented at least in part as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a general purpose processor or other digital signal processing unit. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A software module may reside in RAM (random-access memory), ROM (read-only memory), nonvolatile RAM (NVRAM) such as flash RAM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
It is noted that the various methods disclosed herein (e.g., methods M100, M102, M110, and M112) may be performed by an array of logic elements such as a processor, and that the various elements of an apparatus as described herein may be implemented as modules designed to execute on such an array. As used herein, the term “module” or “sub-module” can refer to any method, apparatus, device, unit or computer-readable data storage medium that includes computer instructions (e.g., logical expressions) in software, hardware or firmware form. It is to be understood that multiple modules or systems can be combined into one module or system and one module or system can be separated into multiple modules or systems to perform the same functions. When implemented in software or other computer-executable instructions, the elements of a process are essentially the code segments to perform the related tasks, such as with routines, programs, objects, components, data structures, and the like. The term “software” should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples. The program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link.
The implementations of methods, schemes, and techniques disclosed herein may also be tangibly embodied (for example, in one or more computer-readable media as listed herein) as one or more sets of instructions readable and/or executable by a machine including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The term “computer-readable medium” may include any medium that can store or transfer information, including volatile, nonvolatile, removable and non-removable media. Examples of a computer-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette or other magnetic storage, a CD-ROM/DVD or other optical storage, a hard disk, or any other medium which can be used to store the desired information and which can be accessed. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet or an intranet. In any case, the scope of the present disclosure should not be construed as limited by such embodiments.
Each of the tasks of the methods described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. In a typical application of an implementation of a method as disclosed herein, an array of logic elements (e.g., logic gates) is configured to perform one, more than one, or even all of the various tasks of the method. One or more (possibly all) of the tasks may also be implemented as code (e.g., one or more sets of instructions), embodied in a computer program product (e.g., one or more data storage media such as disks, flash or other nonvolatile memory cards, semiconductor memory chips, etc.), that is readable and/or executable by a machine (e.g., a computer) including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The tasks of an implementation of a method as disclosed herein may also be performed by more than one such array or machine.
In one or more exemplary embodiments, the operations described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, such operations may be stored on or transmitted over a computer-readable medium as one or more instructions or code. The term “computer-readable media” includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise an array of storage elements, such as semiconductor memory (which may include without limitation dynamic or static RAM, ROM, EEPROM, and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray Disc™ (Blu-Ray Disc Association, Universal City, CA), where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The elements of the various implementations of the modules, elements, and devices described herein may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or gates. One or more elements of the various implementations of the apparatus described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs, ASSPs, and ASICs.
It is possible for one or more elements of an implementation of an apparatus as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. It is also possible for one or more elements of an implementation of such an apparatus to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times).
Chan, Victor H., Ravirala, Narayana S.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
May 06 2010 | Qualcomm Incorporated | (assignment on the face of the patent) | / | |||
Jul 20 2010 | CHAN, VICTOR H | Qualcomm Incorporated | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 024721 | /0287 | |
Jul 20 2010 | RAVIRALA, NARAYANA S | Qualcomm Incorporated | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 024721 | /0287 |
Date | Maintenance Fee Events |
Jan 13 2017 | REM: Maintenance Fee Reminder Mailed. |
Jun 04 2017 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |