A method, an apparatus, and a computer program product for three-dimensional shape estimation using constrained disparity propagation are presented. An act of receiving a stereoscopic pair of images of an area occupied by at least one object is performed. Next, pattern regions and non-pattern regions are detected in the images. An initial estimate of spatial disparities between the pattern regions in the images is generated. The initial estimate is used to generate a subsequent estimate of the spatial disparities between the non-pattern regions. The subsequent estimate is used to generate further estimates of the spatial disparities using disparity constraints until there is no change between the results of successive iterations, thereby generating a final estimate of the spatial disparities. A disparity map of the area occupied by at least one object is then generated from the final estimate of the three-dimensional shape.
16. A computer program product for object detection encoded on a computer-readable medium, having encoded therein means for:
receiving a stereoscopic pair of images of an area occupied by at least one object;
detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter;
generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images;
using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints;
iteratively using the subsequent estimate as the initial estimate in the means for using the initial estimate to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities;
processing the final estimate to provide signals, wherein the signals comprise enable and disable signals of a vehicle;
generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape; and
processing the disparity map with at least one classification algorithm to produce object class confidence data, wherein said classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained nonlinear discriminant analysis network, and a trained fuzzy aggregation network.
1. A method for three-dimensional shape estimation using constrained disparity propagation, the method comprising an act of causing a processor to perform operations of extracting image features, wherein the operations of extracting image features comprise the acts of:
receiving a stereoscopic pair of images of an area occupied by at least one object;
detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter;
generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images;
using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints;
iteratively using the subsequent estimate as the initial estimate in the act of using the initial estimate to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities;
processing the final estimate to provide signals, wherein the signals comprise enable and disable signals of a vehicle;
generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape; and
processing the disparity map with at least one classification algorithm to produce object class confidence data, wherein said classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained nonlinear discriminant analysis network, and a trained fuzzy aggregation network.
6. An apparatus for object detection comprising a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving images, and an output coupled with the processor for outputting information based on an object estimation, wherein the computer system further comprises means for:
receiving a stereoscopic pair of images of an area occupied by at least one object;
detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter;
generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images;
using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints;
iteratively using the subsequent estimate as the initial estimate in the means for using the initial estimate to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities;
processing the final estimate to provide signals, wherein the signals comprise enable and disable signals of a vehicle;
generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape; and
processing the disparity map with at least one classification algorithm to produce object class confidence data, wherein said classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained nonlinear discriminant analysis network, and a trained fuzzy aggregation network.
11. An apparatus for object detection comprising a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving images, and an output coupled with the processor for outputting information based on an object estimation, wherein the computer system further comprises:
a receiving module for receiving a stereoscopic pair of images of an area occupied by at least one object;
a pattern region detector for detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter;
an estimator for generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images;
a subsequent estimator using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints;
an iterator for iteratively using the subsequent estimate as the initial estimate in the subsequent estimator to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities;
an estimate processor for processing the final estimate to provide signals, wherein the signals comprise enable and disable signals of a vehicle;
a map generator for generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape; and
an object class confidence data generator for processing the disparity map with at least one classification algorithm to produce object class confidence data, wherein said classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained nonlinear discriminant analysis network, and a trained fuzzy aggregation network.
2. The method of
3. The method of
4. The method of
5. The method of
7. The apparatus of
8. The apparatus of
9. The apparatus of
10. The apparatus of
12. The apparatus of
13. The apparatus of
14. The apparatus of
15. The apparatus of
17. The computer program product of
18. The computer program product of
19. The computer program product of
20. The computer program product of
This application claims the benefit of priority to utility application Ser. No. 10/132,875, filed in the United States on Apr. 24, 2002, and titled “High-Performance Sensor Fusion Architecture.”
(1) Technical Field
The present invention relates to general techniques for computer vision and object classification. More specifically, the present invention relates to use of constrained disparity with stereoscopic images to generate three-dimensional shape estimates.
(2) Discussion
Matching features in images of a scene taken from two different viewpoints (stereoscopic images) is a major problem in the art of machine vision systems. A variety of constraint-based solutions have been proposed, and have met with varying degrees of success because there is no general solution to the problem and a set of constraints applied to one scene may not be appropriate for other scenes.
In particular, a three-dimensional scene is reduced to two-dimensional images when captured by an imaging device. Thus, the pictures each contain less information than the original scene. In the art of stereoscopic imaging, a pair of two-dimensional images, each taken from a different location, is used to approximate the information from the original three-dimensional scene. However, in attempting to reconstruct the original three-dimensional scene, another problem arises—that of identifying corresponding points in the pair of images. In other words, for any individual pixel in one image, there are many potential corresponding pixels in the other image. Thus, a major difficulty in the art is determining corresponding pixels in the images so that disparities between the images may be determined in order to approximate the original three-dimensional scene.
Accordingly, there exists a need in the art for a fast and reliable system for approximating a three-dimensional scene from stereoscopic images. The present invention provides such a system, using a texture filter to generate a disparity estimate and refining the disparity estimate iteratively using disparity constraints until a final estimate is achieved.
The features of the present invention may be combined in many ways to produce a great variety of specific embodiments, as will be appreciated by those skilled in the art. Furthermore, the means which comprise the apparatus are analogous to the means present in computer program product embodiments and to the acts in the method embodiment.
The present invention teaches a method, an apparatus, and a computer program product for three-dimensional shape estimation using constrained disparity propagation. The invention performs an operation of receiving a stereoscopic pair of images of an area occupied by at least one object. Next, an operation of detecting pattern regions and non-pattern regions within each of the pair of images using a texture filter is performed. Then, an operation of generating an initial estimate of spatial disparities between the pattern regions within each of the pair of images is executed. Next, an operation of using the initial estimate to generate a subsequent estimate of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using disparity constraints is performed. Subsequently, an operation of iteratively using the subsequent estimate as the initial estimate in the act of using the initial estimate to generate a subsequent estimate in order to generate further subsequent estimates of the spatial disparities between the non-pattern regions based on the spatial disparities between the pattern regions using the disparity constraints until there is no change between the results of subsequent iterations, thereby generating a final estimate of the spatial disparities is performed. Finally, an operation of generating a disparity map of the area occupied by at least one object from the final estimate of the three-dimensional shape is executed.
In a further aspect, the at least one object comprises a vehicle occupant and the area comprises a vehicle occupancy area, the method further comprising an act of processing the final estimate to provide signals to vehicle systems.
In a still further aspect, the invention performs an operation of capturing images from a sensor selected from a group consisting of CMOS vision sensors and CCD vision sensors.
In a yet further aspect, the signals comprise airbag enable and disable signals.
In another aspect, the act of extracting image features further comprises an operation of processing the disparity map with at least one classification algorithm to produce object class confidence data.
In yet another aspect, the classification algorithm is selected from the group consisting of a trained C5 decision tree, a trained Nonlinear Discriminant Analysis network, and a trained Fuzzy Aggregation Network.
In a further aspect, an operation of data fusion is performed on the object class confidence data to produce a detected object estimate.
It will be appreciated by one of skill in the art that the “operations” of the present invention just discussed have parallels in acts of a method, and in modules or means of an apparatus or computer program product and that various combinations of these features can be made without departing from the spirit and scope of the present invention.
The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the preferred embodiment of the invention in conjunction with reference to the following drawings, where:
The present invention relates to general techniques for computer vision and object classification. More specifically, the present invention relates to the use of constrained disparity with stereoscopic images to generate three-dimensional shape estimates. The following description, taken in conjunction with the referenced drawings, is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. Furthermore, it should be noted that unless explicitly stated otherwise, the figures included herein are illustrated diagrammatically and without any specific scale, as they are provided as qualitative illustrations of the concept of the present invention.
In order to provide a working frame of reference, first a glossary of terms used in the description and claims is given as a central resource for the reader. Next, a discussion of various principal embodiments of the present invention is provided. Finally, a discussion is provided to give an understanding of the specific details.
Before describing the specific details of the present invention, a centralized location is provided in which various terms used herein and in the claims are defined. The glossary provided is intended to provide the reader with a general understanding of the intended meaning of the terms, but is not intended to convey the entire scope of each term. Rather, the glossary is intended to supplement the rest of the specification in more clearly explaining the terms used.
Means: The term "means" as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable software modules. Non-limiting examples of "means" include computer program code (source or object code) and "hard-coded" electronics (i.e., computer operations coded into a computer chip). The "means" may be stored in the memory of a computer or on a computer-readable medium.
Object: The term "object" as used herein is generally intended to indicate a physical object within a scene for which a three-dimensional estimate is desired.
Sensor: The term "sensor" as used herein generally includes any imaging sensor such as, but not limited to, optical sensors such as CCD and CMOS cameras.
The present invention has three principal embodiments. The first is a system for three-dimensional shape estimation, typically in the form of a computer system operating software or in the form of a "hard-coded" instruction set. This system may be incorporated into various devices, non-limiting examples of which include vehicular warning systems, three-dimensional modeling systems, and robotic vision systems incorporated in manufacturing plants. Information from the system may also be incorporated/fused with data from other sensors or systems to provide more robust information regarding the object observed. The second principal embodiment is a method, typically in the form of software, operated using a data processing system (computer). The third principal embodiment is a computer program product. The computer program product generally represents computer-readable code stored on a computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape. Other non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories. These embodiments will be described in more detail below.
A block diagram depicting the components of a computer system used in the present invention is provided in
An illustrative diagram of a computer program product embodying the present invention is depicted in
As shown in
Several choices are available for the selection of a texture filter 304 for recognizing regions of the image characterized by salient features, and the present invention may use any of them as suited to a particular embodiment. In an embodiment, a simple texture filter 304 was used to estimate the mean variance of the rows of a selected region of interest. This choice reflects the necessity of identifying those image blocks that present a large enough contrast along the direction of the disparity search. For a particular N×M region of the image I, where (x,y) are image coordinates and σ² is the variance, the following quantity:
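The display itself is not reproduced here; a reconstruction of the mean row variance implied by the description (the normalization and the equation number are assumptions) is:

$$\sigma^{2}=\frac{1}{N}\sum_{y=1}^{N}\left[\frac{1}{M}\sum_{x=1}^{M}I(x,y)^{2}-\left(\frac{1}{M}\sum_{x=1}^{M}I(x,y)\right)^{2}\right]\qquad(1)$$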
is compared against a threshold defining the minimum variance considered sufficient to identify a salient image feature. Once the whole image has been filtered and the regions rich in texture have been identified, the disparity values of the selected regions are estimated by minimizing the following cost function in order to perform the matching between the left and right images (where d(opt) is the optimal disparity value and d is an offset distance):
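A plausible form of this cost, assuming a sum-of-absolute-differences mismatch accumulated over the selected region R (both the exact expression and the equation number are assumptions chosen to be consistent with equations (3) through (8) below):

$$d_{\mathrm{opt}}=\arg\min_{d}\sum_{(x,y)\in R}\bigl|I_{L}(x,y)-I_{R}(x+d,y)\bigr|\qquad(2)$$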
During the disparity estimation act, a neighborhood density map is created. This structure consists of a matrix of the same size as the disparity map, whose entries specify the number of points in an 8-connected neighborhood where a disparity estimate is available. An example of such a structure is depicted in
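As a minimal sketch of how such a density map can be computed (assuming the disparity map is held as a NumPy array in which NaN marks points that do not yet have an estimate; the function name and representation are illustrative, not taken from the patent):

```python
import numpy as np

def neighborhood_density_map(disparity):
    """For every grid point, count how many of its 8-connected
    neighbors already hold a disparity estimate (non-NaN)."""
    has_estimate = ~np.isnan(disparity)
    h, w = disparity.shape
    density = np.zeros((h, w), dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the point itself
            shifted = np.zeros((h, w), dtype=bool)
            src_y = slice(max(dy, 0), h + min(dy, 0))
            src_x = slice(max(dx, 0), w + min(dx, 0))
            dst_y = slice(max(-dy, 0), h + min(-dy, 0))
            dst_x = slice(max(-dx, 0), w + min(-dx, 0))
            shifted[dst_y, dst_x] = has_estimate[src_y, src_x]
            density += shifted
    return density
```

Propagation then proceeds from the points with the highest density values, so that new estimates are always anchored by as many existing ones as possible.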
Once the initialization stage is completed, the available disparity information is propagated starting from the denser neighborhoods. Two types of constraints are enforced during the disparity propagation. The first type of constraint ensures that the order of appearance of a set of image features along the x direction is preserved. Although not always satisfied, this condition generally holds when the cameras' baseline distance is sufficiently small. Under this constraint, the disparity search interval for the i-th feature along a scanline is bounded by equations (3) through (5) below. An example of allowed and prohibited orders of appearance of image elements is depicted in
$$d_{\min}(i)=d(i-1)-\varepsilon\qquad(3)$$

$$d_{\max}(i)=d(i+1)+\varepsilon\qquad(4)$$

where

$$\varepsilon=\lvert x_{i}-x_{i-1}\rvert\qquad(5)$$
This type of constraint is very useful for avoiding false matches of regions with similar features.
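A minimal sketch of this bound, assuming the features detected along one scanline are stored as parallel lists of x-positions and current disparity estimates (illustrative names, not the patent's):

```python
def ordering_bounds(x, d, i):
    """Admissible disparity range for feature i (0 < i < len(x) - 1)
    such that the left-to-right order of features along the
    scanline is preserved, per equations (3)-(5)."""
    eps = abs(x[i] - x[i - 1])  # tolerance, eq. (5)
    d_min = d[i - 1] - eps      # eq. (3)
    d_max = d[i + 1] + eps      # eq. (4)
    return d_min, d_max
```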
The local smoothness of the disparity map is enforced by the second type of propagation constraint, which bounds the disparity search interval for the central element of a 3×3 neighborhood Nij by equations (6) through (8) below. An example of a 3×3 neighborhood where the disparity of the central element has to be estimated is shown in
$$d_{\min}=\min\{d\in N_{ij}\}-\eta\qquad(6)$$

$$d_{\max}=\max\{d\in N_{ij}\}+\eta\qquad(7)$$

where

$$N_{ij}=\{p_{m,n}\},\quad m=i-1,\ldots,i+1,\quad n=j-1,\ldots,j+1\qquad(8)$$
The concept is that very large local fluctuations of the disparity estimates are more often due to matching errors than to true sharp variations. As a consequence, enforcing a certain degree of smoothness in the disparity map greatly improves the signal-to-noise ratio of the estimates. In an embodiment, the parameter η is set equal to zero, thus bounding the search interval of possible disparities between the minimum and maximum disparity currently measured in the neighborhood.
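A sketch of the corresponding bound, under the same assumption as before that the disparity map is a NumPy array with NaN marking points that lack an estimate:

```python
import numpy as np

def smoothness_bounds(disparity, i, j, eta=0):
    """Disparity search range for grid point (i, j), limited to the
    span of its 3x3 neighborhood per equations (6)-(8); assumes at
    least one neighbor already holds an estimate. With eta = 0, as
    in the described embodiment, the range is exactly the
    neighborhood's current minimum and maximum."""
    window = disparity[i - 1:i + 2, j - 1:j + 2]
    known = window[~np.isnan(window)]  # neighbors with estimates
    return known.min() - eta, known.max() + eta
```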
Additional constraints on the disparity value propagation, based on the local statistics of the grayscale image, are also enforced. This feature attempts to reduce the number of artifacts due to poor illumination conditions and poorly textured areas of the image, and addresses the issue of propagation of disparity values across object boundaries. In an effort to reduce the artifacts across the boundaries between highly textured objects and poorly textured objects, some local statistics of the regions of interest used to perform the disparity estimation are computed. This is done for the entire frame during the initialization stage of the algorithm. The iterative propagation technique takes advantage of the computed statistics to enforce an additional constraint on the estimation process. Applying the algorithm to several sample images has produced a net improvement in the disparity map quality in the proximity of object boundaries and a sharp reduction in the number of artifacts present in the disparity map.
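The patent does not spell out which local statistics are used, so the gate below is only an assumption for illustration: it blocks propagation between two grayscale blocks whose variances differ by more than a fixed ratio, which is one plausible way to keep disparity values from leaking across the boundary between a textured and a poorly textured object.

```python
import numpy as np

def statistics_allow_propagation(image, src, dst, size=8, max_ratio=4.0):
    """Hypothetical gate: allow a disparity value to propagate from
    the block at `src` to the block at `dst` (top-left corners) only
    if the two grayscale blocks have comparable variance."""
    def block_variance(top_left):
        y, x = top_left
        return image[y:y + size, x:x + size].astype(float).var()
    lo, hi = sorted((block_variance(src) + 1e-6,
                     block_variance(dst) + 1e-6))
    return hi / lo <= max_ratio
```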
Because the disparity estimation is carried out in an iterative fashion, the mismatch value for a particular image block and a particular disparity value usually needs to be evaluated several times. Recomputing this cost function by brute force every time its evaluation is required is computationally inefficient. For this reason, an ad-hoc caching technique is preferred, greatly reducing the system response time and providing a considerable increase in the speed of the estimation process. The quantity stored in the cache is the mismatch measure for a given disparity value at a particular point of the disparity grid. In a series of simulations, the number of hits in the cache averaged over 80%, demonstrating the usefulness of the technique.
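A minimal sketch of such a cache, assuming a sum-of-absolute-differences mismatch over fixed-size blocks; the patent does not specify the mismatch measure or the cache organization, so memoization keyed on (grid point, disparity) via lru_cache stands in for the ad-hoc cache:

```python
from functools import lru_cache

import numpy as np

BLOCK = 8
left = np.random.rand(240, 320)   # placeholder rectified images
right = np.random.rand(240, 320)

@lru_cache(maxsize=None)
def mismatch(i, j, d):
    """Cached mismatch between the block at grid point (i, j) in the
    left image and the block offset horizontally by disparity d in
    the right image; repeated evaluations during the iterative
    propagation hit the cache instead of recomputing the sum."""
    y, x = i * BLOCK, j * BLOCK
    if x + d < 0 or x + d + BLOCK > right.shape[1]:
        return float("inf")  # disparity falls outside the image
    a = left[y:y + BLOCK, x:x + BLOCK]
    b = right[y:y + BLOCK, x + d:x + d + BLOCK]
    return float(np.abs(a - b).sum())
```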
The last component of the Disparity Map module 300 is an automatic vertical calibration subroutine (not shown in the figure). This functionality is particularly useful for compensating for hardware calibration tolerances. While an undetected horizontal offset between the two cameras usually causes only limited errors in the disparity evaluation, the presence of even a small vertical offset can be catastrophic. The rapid performance degradation of the matching algorithm when such an offset is present is a very well-known problem that affects all stereo camera-based ranging systems.
A fully automated vertical calibration subroutine is based on the principle that the number of correctly matched image features during the initialization stage is maximized when there is no vertical offset between the left and right images. The algorithm is run during system initialization and periodically thereafter to check the consistency of the estimate.
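A sketch of the calibration search under this principle, assuming a caller-supplied match_count(v) that reruns the initialization-stage feature matching with the right image shifted vertically by v rows and returns the number of accepted matches (both the name and the candidate offset range are illustrative):

```python
def estimate_vertical_offset(match_count, offsets=range(-5, 6)):
    """Return the vertical offset that maximizes the number of
    correctly matched features: matching performance peaks when the
    residual vertical offset is zero, so the argmax estimates the
    miscalibration to be compensated."""
    return max(offsets, key=match_count)
```

The selected offset would then be applied to all subsequent row lookups between the two images.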
(b) System Performance
An example of a stereo image pair is shown in
Inventors: Srinivasa, Narayan; Owechko, Yuri; Medasani, Swarup; Boscolo, Riccardo