A method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, identifies a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made, and identifies an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image. At least one blending factor value in the uncertainty region is determined based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region, and an augmented reality image is generated by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
1. A method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, and wherein at least one of the first and the second image comprises an image of a virtual object, the method comprising:
determining at least one blending factor value in a first region based upon a similarity between a first value in the first region and at least one second value in a second region, wherein in the second region a determination as to which of the first and second image to render in that region of the augmented reality image can be made; and
generating an augmented reality image by combining, in the first region of the augmented reality image, the first and second images using the at least one blending factor value.
15. An augmented reality processing system for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, and wherein at least one of the first and the second image comprises an image of a virtual object, the augmented reality processing system comprising:
a blend module arranged to determine at least one blending factor value in a first region based upon a similarity between a first value in the first region and at least one second value in a second region, wherein in the second region a determination as to which of the first and second image to render in that region of the augmented reality image can be made; and
an image generation module arranged to generate an augmented reality image by combining, in the first region of the augmented reality image, the first and second images using the at least one blending factor value.
20. A non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to generate an augmented reality image from first and second images by:
determining at least one blending factor value in a first region based upon a similarity between a first value in the first region and at least one second value in a second region, wherein in the second region a determination as to which of the first and second image to render in that region of the augmented reality image can be made; and
generating an augmented reality image by combining, in the first region of the augmented reality image, the first and second images using the at least one blending factor value, wherein at least a portion of at least one of the first and the second image is captured from a real scene, and wherein at least one of the first and the second image comprises an image of a virtual object.
2. The method of
3. The method of
5. The method of
identifying the first region; and
identifying the second region.
6. The method according to
7. The method according to
8. The method according to
9. The method according to
first confidence regions are confidence regions in which a colour value of the first image is to be rendered in the corresponding region of the augmented reality image; and
second confidence regions are confidence regions in which a colour value of the second image is to be rendered in the corresponding region of the augmented reality image.
10. The method according to
11. The method according to
12. The method according to
13. The method according to
14. The method according to
16. The augmented reality processing system of
17. The augmented reality processing system of
18. An augmented reality processing system according to
19. The augmented reality processing system according to
This application is a continuation under 35 U.S.C. 120 of copending application Ser. No. 15/623,690 filed Jun. 15, 2017, which claims foreign priority under 35 U.S.C. 119 from United Kingdom Application No. 1610657.7 filed Jun. 17, 2016.
In augmented reality (AR) systems, a pair of images may be combined so as to create an augmented reality image in which the content from one image appears to be included in the other image. In some arrangements, an image of a virtual object and an image of a real scene are combined so as to generate an augmented reality image in which it appears to the viewer that the virtual object has been included in the real scene. The augmented reality image may be generated by rendering the virtual object within a portion of the captured real scene. When rendering the virtual object in the scene, the relative depth of the virtual object with respect to the depth of the scene is considered to ensure that portions of the virtual object and/or the scene are correctly occluded with respect to one another. By occluding the images in this way, a realistic portrayal of the virtual object within the scene can be achieved.
Techniques for generating an augmented reality image of a scene typically require the generation of an accurate model of the real scene by accurately determining depth values for the objects within the real scene from a specified viewpoint. By generating an accurate model, it is possible to compare depth values and determine portions of the two images to be occluded. Determining the correct occlusion in an augmented reality image may be performed by comparing corresponding depth values for the image of the virtual object and the image of the real scene and rendering, for each pixel of the scene, a pixel using a colour selected from the colour at that pixel in the image of the virtual object or the real scene based upon which image has the smaller depth value with respect to the specified viewpoint, i.e. is closer to the specified viewpoint.
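As a rough illustration only (not the claimed method), the per-pixel depth comparison described above may be sketched as follows, assuming colour and depth arrays with a one-to-one pixel correspondence; all names are illustrative:

```python
import numpy as np

def composite_by_depth(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel occlusion test: render whichever image is closer to
    the specified viewpoint (i.e. has the smaller depth value) at each
    pixel. Shapes: rgb (H, W, 3), depth (H, W)."""
    virtual_closer = virt_depth < real_depth            # (H, W) boolean mask
    # Broadcast the mask over the colour channels and select per pixel.
    return np.where(virtual_closer[..., None], virt_rgb, real_rgb)
```

This hard per-pixel selection is exactly what breaks down when depth values are missing or noisy, which motivates the blended approach described below.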
To avoid potential errors with depth measurements, a scene can be scanned from a number of positions to generate an accurate map of the scene. For example, camera tracking may be performed whilst moving a camera around a scene and capturing a number of different scans or images of the scene. However, such processing is time consuming and processor intensive and is not suited to real-time applications, where the position of objects in the scene may vary or where it may be necessary to update the model of the real scene regularly. For example, in video applications where a constant frame rate is required there may be insufficient time between frames to update a scene model.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
One approach for capturing depth information regarding a scene is to make use of a capture device that is configured to capture information relating to both colour and depth, such as an RGBD camera. An RGBD camera is configured to capture Red, Green, and Blue (RGB) colour information as well as depth information, D.
The inventors have recognised that depth information obtained from a single point, for example using such a capture device, may not be complete or may be imprecise for portions of the captured scene. For example, there may be portions of an image captured by an RGBD camera where a corresponding depth measurement could not be obtained. This may occur where a surface of an object in the scene is absorptive of the signals used for depth measurement or is positioned at an angle relative to a capture device such that a depth signal is not directed back to a sensor of the capture device with sufficient signal strength for a precise depth measurement to be captured. Similarly, it may be that the depth information is detected but is inaccurate, for example due to signal reflections or interference, which can result in noise in the captured depth measurement.
For time-critical applications, the inventors have recognised that it is sometimes useful to make use of depth data captured at a single point rather than generate a complex model of a scene when generating an augmented reality image. However, the result of errors in the depth information or an absence of depth information for a particular portion of the scene is that, when generating an augmented reality image, erroneous depth comparison results may occur. These erroneous depth comparison results may result in portions of one image being incorrectly rendered or occluded leading to visual artefacts in a resultant rendered augmented reality image.
The present application seeks to address the above problems and to provide an improved approach to generating an augmented reality image.
There is provided a method for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, the method comprising: identifying a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made; identifying an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image; determining at least one blending factor value in the uncertainty region based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region; and generating an augmented reality image by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
There is provided an augmented reality processing system for generating an augmented reality image from first and second images, wherein at least a portion of at least one of the first and the second image is captured from a real scene, the augmented reality processing system comprising: a confidence identification module arranged to identify a confidence region in which a confident determination as to which of the first and second image to render in that region of the augmented reality image can be made; an uncertainty identification module arranged to identify an uncertainty region in which it is uncertain as to which of the first and second image to render in that region of the augmented reality image; a blend module arranged to determine at least one blending factor value in the uncertainty region based upon a similarity between a first colour value in the uncertainty region and a second colour value in the confidence region; and an image generation module arranged to generate an augmented reality image by combining, in the uncertainty region, the first and second images using the at least one blending factor value.
The first image and the second image may each have associated therewith a plurality of colour values and a corresponding plurality of depth values. As part of the method or processing system, the confident determination as to which of the first image and the second image to render may be made based upon a depth value of the first image and the corresponding depth value of the second image in the confidence region. The uncertainty region may be identified based upon at least one depth value associated with at least one of the first and the second image, the at least one depth value being derived from a depth value captured from a real scene. The at least one depth value may be derived from an unreliable or incomplete depth value captured from the real scene. Identifying the uncertainty region may be based on the absolute depth value of the unreliable or incomplete depth value, where the absolute depth value is indicative of an erroneously captured depth value. Identifying the uncertainty region may comprise comparing at least one depth value in the region in the first image with a depth value in a corresponding region of the second image and determining that the difference in compared depth values is below a predetermined threshold.
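By way of illustration only, the identification of an uncertainty region described above might be sketched as follows, assuming the sensor encodes a missing depth measurement as a sentinel value and that a small depth difference makes the occlusion test unreliable (the sentinel and threshold values are assumptions, not from the source):

```python
import numpy as np

def classify_regions(real_depth, virt_depth, invalid=0.0, threshold=0.05):
    """Label each sample point as confident or uncertain.
    A point is uncertain if its captured depth is missing (encoded as
    `invalid` here, an assumed sensor convention) or if the real and
    virtual depths are too close to be compared reliably."""
    missing = real_depth == invalid
    ambiguous = np.abs(real_depth - virt_depth) < threshold
    uncertain = missing | ambiguous
    return ~uncertain, uncertain   # (confidence mask, uncertainty mask)
```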
At least one initial blending factor value in a confidence region may be generated based upon the confident determination and generating the augmented reality image may further comprise combining a corresponding colour value of the first image and a corresponding colour value of the second image in the confidence region using the at least one initial blending factor value. The at least one blending factor value and the at least one initial blending factor value may form part of an alpha matte for combining colour values of the first image and the second image to generate the augmented reality image.
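The alpha-matte combination mentioned above is standard alpha compositing; a minimal sketch (illustrative names, per-pixel alpha in [0, 1]) might be:

```python
import numpy as np

def blend_with_matte(first_rgb, second_rgb, alpha):
    """Combine two images with a per-pixel alpha matte.
    alpha (H, W): 1.0 renders the first image, 0.0 the second;
    intermediate values blend the two in uncertainty regions."""
    a = alpha[..., None]                       # broadcast over channels
    return a * first_rgb + (1.0 - a) * second_rgb
```

In confidence regions the initial blending factor values would simply be 0 or 1, so this expression reduces to selecting one image or the other.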
Making the confident determination may be based upon at least one depth value associated with the first image and at least one corresponding depth value associated with the second image. Making the confident determination may be based upon a comparison of at least one depth value associated with a region of the first image with at least one depth value associated with a corresponding region of the second image and wherein the result of the comparison exceeds a predetermined threshold.
Identifying a confidence region may further comprise categorising portions of the confidence region as first confidence regions or second confidence regions, wherein: first confidence regions are confidence regions in which a colour value of the first image is to be rendered in the corresponding region of the augmented reality image; and second confidence regions are confidence regions in which a colour value of the second image is to be rendered in the corresponding region of the augmented reality image. Re-categorising an uncertainty region as either a first confidence region or a second confidence region may be performed prior to determining at least one blending factor value. Re-categorising an uncertainty region as a first confidence region may be based on the uncertainty region being surrounded by a first confidence region. Re-categorising an uncertainty region as a first confidence region may be based upon a determination that confidence regions within a predetermined distance of the uncertainty region are first confidence regions. Re-categorising an uncertainty region as a second confidence region may be based on the uncertainty region being surrounded by a second confidence region. Re-categorising an uncertainty region as a second confidence region may be based upon a determination that confidence regions within a predetermined distance of the uncertainty region are second confidence regions.
Colour and depth values of at least one of the first and second images from the real scene may be captured using a capture device. Determining the at least one blending factor value may be further based upon the distance between the position of the first colour value and the position of the at least one second colour value. The first colour value and the at least one second colour value may be colour values associated with a single one of the first image and the second image. The first colour value and the second colour values may be colour values captured from a real scene.
The uncertainty region may comprise a plurality of sample points and determining the at least one blending factor value may further comprise processing, for each of a plurality of sample points in the uncertainty region, that sample point based upon colour values at a plurality of sample points located in a confidence region within a predetermined distance of that sample point. When processing a sample point in the uncertainty region, a zero weight may be assigned to other sampling points within the predetermined distance of the sampling point that are in an uncertainty region.
Determining the at least one blending factor value for the uncertainty region may comprise applying a cross bilateral filter to each of a plurality of sample points in the uncertainty region based upon: the distance between the position of the first colour value and the position of the at least one second colour value; and the similarity in colour value between the first colour value and the at least one second colour value. The plurality of sample points used in the cross bilateral filter may be identified using a filter kernel and sample points within the filter kernel may be used to determine the at least one blending factor value for the uncertainty region. Comparing the similarity in colour values may comprise comparing the difference in colour for each of a red, a green, and a blue colour component at a sample point with the corresponding colour component at each sample point within the filter kernel that is in the confidence region. The distance between the position of the first colour value and the position of the at least one second colour value may be determined based upon the number of sample points between the first colour value and the at least one second colour value.
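A rough sketch of the cross bilateral filtering described above is given below, assuming per-pixel arrays and Gaussian spatial and colour weights. The kernel radius and sigma values are illustrative assumptions, not values from the source; note how uncertain neighbours receive zero weight, as described above:

```python
import numpy as np

def fill_matte_cross_bilateral(rgb, alpha, uncertain, radius=5,
                               sigma_s=3.0, sigma_c=0.1):
    """Estimate blending factors in the uncertainty region: each
    uncertain pixel averages the alpha values of nearby *confident*
    pixels, weighted by spatial distance and colour similarity.
    Uncertain neighbours within the kernel get zero weight."""
    h, w = alpha.shape
    out = alpha.copy()
    ys, xs = np.nonzero(uncertain)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch_ok = ~uncertain[y0:y1, x0:x1]             # confident samples only
        dy, dx = np.mgrid[y0 - y:y1 - y, x0 - x:x1 - x]
        w_s = np.exp(-(dy**2 + dx**2) / (2 * sigma_s**2))   # spatial term
        dc = rgb[y0:y1, x0:x1] - rgb[y, x]              # per-channel colour diff
        w_c = np.exp(-np.sum(dc**2, axis=-1) / (2 * sigma_c**2))  # colour term
        weight = w_s * w_c * patch_ok
        if weight.sum() > 0:
            out[y, x] = np.sum(weight * alpha[y0:y1, x0:x1]) / weight.sum()
    return out
```

The effect is that an uncertain pixel whose colour matches nearby confidently-foreground pixels inherits a blending factor close to theirs, which is the behaviour the similarity-based determination aims for.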
Determining at least one blending factor value in the uncertainty region may be based upon a similarity between a colour value in the uncertainty region and at least one corresponding colour value of each of the first image and the second image. Determining at least one blending factor value may be based upon generating at least two error metrics for the uncertainty region, and minimising the error metrics to determine the at least one blending factor value in the uncertainty region. A first error metric may be a gradient metric indicative of gradient changes in blending factor values and a second error metric may be a colour metric indicative of colour similarities between colour values in the uncertainty region and colour values in the confidence region. A plurality of initial blending factor values may be determined and the gradient metric may be determined based upon variations in the plurality of initial blending factor values across an alpha matte.
The colour metric may estimate the probability that a colour value in the uncertainty region forms part of an image of the real scene in front of a virtual object or forms part of the image of the real scene behind a virtual object based on neighbouring colour values. Colour values used in determining the colour metric may be selected by performing a dilation operation on the uncertainty region. The at least two error metrics may be minimised using an iterative method. The colour metric may be formed from fitted Mixture of Gaussian models for each of the part of the real scene in front of a virtual object and the part of the real scene behind a virtual object. The error metrics may be minimised using the Levenberg-Marquardt algorithm to determine the at least one blending factor in the uncertainty region.
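To make the two-metric idea concrete, the following is a deliberately simplified, one-dimensional sketch: a single mean colour per class stands in for the fitted Mixture-of-Gaussians models, and scipy's `least_squares` stands in for the Levenberg-Marquardt solver mentioned above. Every name and weighting here is an illustrative assumption, not the source's formulation:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_alpha_1d(colours, fg_mean, bg_mean, lam=1.0):
    """Solve for blending factors along a scanline by jointly
    minimising (a) a gradient metric penalising abrupt changes in
    alpha and (b) a colour metric pulling alpha towards a
    foreground/background likelihood derived from colour distances."""
    d_fg = np.linalg.norm(colours - fg_mean, axis=-1)
    d_bg = np.linalg.norm(colours - bg_mean, axis=-1)
    p_fg = d_bg / (d_fg + d_bg + 1e-9)      # nearer the fg colour -> nearer 1

    def residuals(alpha):
        grad = np.diff(alpha)                # gradient metric
        colour = alpha - p_fg                # colour metric
        return np.concatenate([lam * grad, colour])

    res = least_squares(residuals, x0=p_fg, bounds=(0.0, 1.0))
    return res.x
```

Minimising both residual vectors together yields a matte that follows the colour evidence while varying smoothly across the uncertainty region.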
An erosion operation may be performed on the confidence region, wherein the erosion operation is configured to re-categorise at least one portion of the confidence region as forming a part of an uncertainty region.
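Such an erosion is a standard morphological operation; a minimal sketch (using scipy's erosion as an assumed stand-in for whatever implementation is actually used) might be:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def erode_confidence(confident, iterations=1):
    """Shrink the confidence region so that samples on its border,
    where classification is most likely to be wrong, are re-categorised
    as uncertain. Returns updated (confident, uncertain) masks."""
    eroded = binary_erosion(confident, iterations=iterations)
    return eroded, ~eroded
```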
The first image may be a captured image of a real scene and the second image may be an image of a virtual object.
An augmented reality video sequence may be generated from a first video sequence and a further image, the method comprising performing, for a plurality of frames of the first video sequence, the above-discussed methods, wherein the first image corresponds to the frame of the first video sequence and the second image corresponds to the further image.
The augmented reality processing system may be embodied in hardware on an integrated circuit. There may be provided a method of manufacturing, at an integrated circuit manufacturing system, an augmented reality processing system. There may be provided an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the system to manufacture an augmented reality processing system. There may be provided a non-transitory computer readable storage medium having stored thereon a computer readable description of an integrated circuit that, when processed, causes a layout processing system to generate a circuit layout description used in an integrated circuit manufacturing system to manufacture an augmented reality processing system.
There may be provided an integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable integrated circuit description that describes the augmented reality processing system; a layout processing system configured to process the integrated circuit description so as to generate a circuit layout description of an integrated circuit embodying the augmented reality processing system; and an integrated circuit generation system configured to manufacture the augmented reality processing system according to the circuit layout description.
There may be provided computer program code for performing a method as claimed in any preceding claim. There may be provided a non-transitory computer readable storage medium having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform the method as claimed in any preceding claim.
The above features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the examples described herein.
Examples will now be described in detail with reference to the accompanying drawings in which:
The accompanying drawings illustrate various examples. The skilled person will appreciate that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the drawings represent one example of the boundaries. It may be that in some examples, one element may be designed as multiple elements or that multiple elements may be designed as one element. Common reference numerals are used throughout the figures, where appropriate, to indicate similar features.
The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The present invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art.
Embodiments will now be described by way of example only.
In
The second image may be an image of one or more virtual objects taken from the same viewpoint as the first image. As such, the virtual object or the real objects within the scene may be correctly occluded by the other depending on their relative depths with respect to the selected viewpoint.
Alternatively, a “virtual” viewpoint may be generated for the first image by interpolating between depth measurements taken from multiple real viewpoints. For example, the capture device 200 may obtain two different depth measurements from two different viewpoints and the augmented reality processing system may interpolate between the two depth measurements to obtain depth measurements for the first image that correspond with the depth measurements for the second image. However, for the purposes of describing the following examples, it will be assumed that the viewpoint from which the augmented reality image 600 is rendered is the same as the position of the capture device 200 from which the colour values and depth values of the scene are captured.
When capturing the depth values of the scene, the capture device 200 determines the distance of the scene from the capture device 200 at a plurality of different sampling points across the scene to create an array of depth values. For example, the capture device 200 may comprise a first sensor 210 and a second sensor 220. The first sensor 210 is configured to capture a first image 500 of the scene 100 comprising a plurality of first colour values. The captured colour values in the first image 500 may be in the form of RGB colour values for a plurality of pixels which combine to represent the scene from the viewpoint of the capture device 200, for example in an array of pixels each having a red, green, and blue colour component value.
The second sensor 220 is configured to capture depth values from the scene 100. For example, the second sensor 220 may be an Infra-Red (IR) sensor configured to detect the presence of IR signals. The capture device 200 may also include an IR transmitter (not shown) configured to transmit IR signals which are then captured by the second sensor 220. By measuring the received IR signals, it is possible to make a determination regarding depth information at each of a plurality of sampling points across the scene 100.
The sampling points at which a depth value is captured may correspond with the points at which colour information is captured. Put another way, portions of the scene at which depth measurements are captured may have a one-to-one correspondence with pixels of an image of the scene captured by the capture device 200. The depth information may be captured such that it directly corresponds in position to the colour information.
For example, depth information may be obtained for an area of the scene with the same resolution as colour information by the capture device. In some arrangements, depth information may be obtained at a lower resolution than the colour values and thus some degree of interpolation may be required to ensure a correspondence in values. Similarly, the depth information may be at a higher resolution than the colour information. It will be assumed for the purposes of describing the following examples that the resolution of the captured depth values and the captured colour values are the same.
The IR signals transmitted by the capture device 200 may be transmitted in a grid and time-of-flight information may be used to determine the depth value at each sampling point captured by second sensor 220. For example, the second sensor 220 may be configured to detect the phase of the IR signal. In this way, it is the surface of the scene which is closest to the capture device at a particular sampling point which is used to determine the depth value at that sampling point. For example, the face of object 102 that is closest to the capture 200 defines the depth value for sampling points that fall upon that face.
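The conversion from a detected IR phase to a depth value is not spelled out above, but for a continuous-wave time-of-flight sensor the standard relation is depth = c·φ / (4π·f), where the 4π (rather than 2π) accounts for the round trip to the surface and back. A small sketch, with an assumed example modulation frequency:

```python
import math

def phase_to_depth(phase_rad, mod_freq_hz=20e6, c=299_792_458.0):
    """Standard continuous-wave time-of-flight relation:
    depth = c * phase / (4 * pi * f). The modulation frequency here
    (20 MHz) is an illustrative assumption, not a value from the source."""
    return c * phase_rad / (4 * math.pi * mod_freq_hz)
```

Note that the phase wraps every 2π, so such sensors have a finite unambiguous range (c / 2f, about 7.5 m at 20 MHz), which is one reason captured depth values can be erroneous.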
As can be seen from the plan view of scene 100 in
It can be seen that the largest of the three depth values captured by the capture device 200 along line Y-Y1 are captured where neither the first object 102 nor the second object 103 is located, for example in the area between the two objects at depth dmax. Accordingly, the captured depth measurement is based upon the measured depth of the background of the scene 100. Another measured depth is dobj1 which corresponds with the depth values determined at sampling points which fall on the surface of first object 102, i.e. the portion of line Y-Y1 that intersects first object 102. Similarly, depth dobj2 corresponds with sampling points of the depth value that fall on second object 103. As illustrated in
It will be noted that, in the example of
Another example of a different scene 110 is provided in relation to
In
For example, third object 112 and fourth object 113 overlap in the y dimension at a portion of the respective objects across an area indicated by reference number 150. Accordingly, depth values obtained by the capture device 200 at sampling points in region 150 are determined based upon the distance of third object 112 from the capture device rather than the distance of fourth object 113, since the third object 112 is closer to the viewpoint at the capture device 200 than the fourth object 113, with respect to dimension z. Similarly, the colour values captured by capture device 200 over region 150 will be the captured colour of the third object 112 rather than the fourth object 113.
In this way, a portion 223 of fourth object 113 that is located within region 150 is occluded from the viewpoint at the capture device 200 by the portion of third object 112 that also falls within region 150.
In more detail,
Accordingly, in traditional image capture systems, only colour information relating to real objects in a scene that are not occluded by other real objects is captured by the image sensor. In augmented reality processing systems, it is desirable to re-create this behaviour for arrangements in which virtual objects are to be rendered in a manner that allows the virtual objects to appear to behave in the same manner as a real object to provide added realism to the augmented reality image.
Accordingly, it is desirable for virtual objects to be accurately rendered to generate an augmented reality image of a scene. To generate an augmented reality image, it is determined whether or not portions of a virtual object in an image should be occluded based upon where in an image of a real scene a virtual object is to be rendered. In this way, the virtual object is effectively processed in a similar manner as described above by determining which of the real elements and the virtual elements (e.g. the real and virtual objects) have the least depth values. However, as discussed above, errors in determining the depth values may affect the perceived realism of the augmented reality image.
Returning to the scene 100 illustrated in
To generate the augmented reality image 600, the position and depth values of a virtual object 104 with respect to the scene are determined and the virtual object 104 is rendered with respect to a selected viewpoint of the scene 100.
A plurality of depth values are determined for the virtual object 104 at a plurality of sampling points, where each depth value represents a depth of a portion of the virtual object 104 with respect to the viewpoint. A correspondence between the position of a sampling point of the depth of the virtual object 104 and the position of a sampling point of the depth of the real scene 100 may be formed to allow a comparison of real and virtual depth values. If there is no direct correspondence, it may be necessary to interpolate between depth sampling points in order to compare the virtual and real depths.
For the sake of simplicity in describing the following examples, it is assumed that there is a direct correspondence between the sampling point of each real colour value, each real depth value, each virtual colour value, and each virtual depth value. For example, each virtual depth value of the virtual object 104 is directly associated with a pixel of an image of the virtual object 104 from the defined viewpoint. In turn, each captured colour value of the real scene 100 from the viewpoint is also associated with a depth value for the real scene 100. Similarly, colour values (e.g. RGB colour values) of an image of the virtual object may be associated in position with colour values of an image of the real scene. Accordingly, there may be a direct correspondence in position between pixels of an image of the scene and pixels of the rendered virtual object.
A depth map comprising a plurality of depth values for different portions of the image 550 of the virtual object 104 is determined. By comparing the captured depth values in the depth map for the virtual object 104 with depth values at corresponding positions of the real scene 100 it is possible to determine which captured colour value is to be rendered. For example, where the depth value of the image of the virtual object is less (the virtual object is closer) than the depth value of the image of the real scene, the colour value at that position of the virtual object is rendered. Similarly, where the depth value of the image of the real scene is less (the real scene is closer), the colour value at that position of the image of the real scene is rendered.
In
Accordingly, the portion of object 103, indicated by area 423, which overlaps the virtual object 104 along dimension y is occluded from view in the augmented reality image 600 and is thus not rendered in the augmented reality image 600. Since no other object or element in scene 400 is located between the capture device 200 and the virtual object 104, the corresponding portion of the virtual object 104 that falls within that area would be rendered in the resultant augmented reality scene 600 instead of the real object 103. Put another way, the colour value at a corresponding position of the second image 550 of the virtual object would be used in the augmented reality image 600.
Similarly, real object 102 within scene 400 overlaps in the y dimension with virtual object 104. Since real object 102 is closer (i.e. has a smaller depth value) to the capture device 200 in direction z than the determined distance values of the rendered virtual object, a portion of object 104 is occluded from view by the capture device 200. Specifically, area 424 indicated in
As such, the finally rendered augmented reality image 600 would be formed of portions of a first image 500 of the real scene 400 and portions of the second image 550 of the virtual object 104. For example, for a row of pixels of the augmented reality image 600 that falls along line Y-Y3, pixels that have a correspondence with depth values dobj2 are rendered using the corresponding colour values of the second image 550 of the virtual object 104 since the virtual object 104 has a lower depth value (i.e. is closer) than the corresponding depth value of the image 500 of the real scene 400. Similarly, for pixels that correspond with depth values dobj1, dobj3, and dmax, the colour values associated with pixels of the first image 500 of the real scene 400 are used since the corresponding depth values of the real scene 400 are less (i.e. they are closer) than the depth values of the second image 550 of the virtual object 104. Alternatively, it may be that the virtual object 104 is not present at the location of some pixels (e.g. the pixels located at dmax locations) and thus the corresponding colour values of the real scene are used.
A representation of the first 500 and second 550 images is illustrated in
As can be seen from
In this way, by comparing the depth values of the virtual object from a viewpoint with corresponding depth values of an image of the real scene, the occlusion of the virtual object within an augmented reality image 600 is performed and an accurate augmented reality image 600 may be generated.
In practice, erroneous determinations as to which image should be selected for rendering may occur. These errors may occur because the determination of the depth values for a first image 500 of the real scene may not be accurately obtained by the capture device 200.
Scene 700 also illustrates a real object 103 and virtual object 104b which overlap in dimension y. A first image may be captured of the real scene 700 to include the real objects 102 and 103 and a second image may be rendered that includes the virtual objects 104a and 104b.
If the amount of variation in the captured depth values of the real scene 700 exceeds the difference in depth values, at a particular position, between real and virtual objects, then erroneous rendering of the resultant augmented reality image 800 may occur. For example, where the virtual object 104a and the real object 102 have similar depth values, the augmented reality processing system may erroneously determine that the colour values of the real object 102 should, at particular pixels, be rendered instead of the colour values of the virtual object 104a. This is illustrated with respect to area 724, in which objects 102 and 104a overlap in dimension y and erroneous rendering may result. The result of such an erroneous determination is that the overlapping areas may appear disjointed or noisy, with visual artefacts of the real scene being incorrectly rendered within the rendered virtual object in the resultant rendered augmented reality image 800.
For example,
However, as shown in rendered image 800, portions of the image have been incorrectly rendered, such as regions 823 and 824, or have not been rendered at all, such as the shaded region 825. For example, region 824 has been incorrectly rendered using the colour values of virtual object 104a rather than the correct colour values of real object 102 due to errors in the depth measurements of the first image of the real scene 700. Similarly, region 823 of the rendered scene has been incorrectly rendered using the colour values of virtual object 104b rather than the corresponding colour values of the real object 103. As such, regions 823 and 824 appear as spurious artefacts in the resultant image.
Similarly, due to the orientation or the specular properties of the real object 103 in the real scene 700, it may not be possible for depth values to be obtained for portions of the scene and thus an error occurs such that neither colour is rendered, such as region 825. As such, the depth values captured of the real scene may be incomplete. In the example of
Since depth values captured from a real scene may include errors, any subsequent comparison of depth values in that region may result in erroneous rendering. This may occur across the entire surface of the region 825 of rendered object 103 or may instead occur on a pixel-by-pixel basis, such that the resultant erroneous rendering is either large-scale or sporadic, as set out above in respect of issues caused by the degree of noise in the depth measurements for the real scene.
To overcome these issues, there is a need for the augmented reality processing system to reduce the impact of an erroneous determination as to which of a plurality of images to render in a region of an image. Where real objects and virtual objects overlap in depth in a scene, and a portion of one object is occluded by the other, the boundaries between the two objects can appear visually disturbing owing to errors in the determination of depth values. There is therefore also a need to smooth the transition from a real object to a virtual object (or vice versa) in a scene to avoid disturbing transitions in colour from one object to the other. There is also a need to handle partial occlusions, in which an alpha matte for blending images is to be determined.
An improved approach for generating an augmented reality image will now be described with reference to the following figures.
An example method will now be described in relation to scene 400, as illustrated in
The method 900 begins at step 910 at which first 500 and second 550 images are captured. In general, either image or both images may be virtual images or partially virtual images provided that at least a portion of one image is an image of a real scene and another portion of either image contains virtual information. Put another way, portions of either or both image may comprise virtually generated content. The method 900 comprises capturing depth and colour values of the scene which form at least part of at least one of the first and second images and then determining colour and depth values for the remaining virtual portions of the first and second images.
For example, an RGB colour map and a depth map may be determined for the first image 500 and the second image 550 based on a combination of virtual depth and colour information and real colour and depth information. For the purposes of the following example, it is assumed that an RGBD camera has been used and that the resolution of the depth map matches the resolution of the RGB colour map for the scene such that there is a direct correspondence between a pixel in the depth map and a corresponding pixel in the RGB colour map. In this way, it is possible to perform direct assessment of each pixel in the two images. Furthermore, for the following example, the first image 500 is an image of the real scene 400 and the second image 550 is an image of the virtual object 104, both taken from an identical viewpoint positioned at the capture device 200.
Having completed step 910, the method proceeds to a step of categorisation in which the confidence and uncertainty regions are identified.
At step 920, a confidence region is identified, wherein the confidence region is a region of the scene in which a confident determination as to which of the first 500 and second 550 image to render in that region of the augmented reality image 600 can be made. For example, the first and second images may be compared at corresponding regions and, where the difference in depth values between images exceeds a threshold, the region may be marked as a confidence region since there can be a degree of confidence that the result of the comparison is correct.
The identification of a confidence region may include identifying one or more regions of the scene in which the first and second images do not comprise captured depth values of a real scene. In such regions there is certainty as to which image should be rendered (aside from exactly equal depth values) as it can be assumed that there is no capture error in the depth of virtual images. One approach to identifying such regions as confidence regions would be to track which of the depth and colour values have been obtained from a real scene and to identify regions of the scene in which only virtual depth values are present. These regions may automatically be identified as confidence regions. In some arrangements, it may be that regions in which only virtual depth values are present are deemed uncertainty regions, as will be described later. Alternatively, all regions of the first and second images may be individually processed to identify confidence regions.
As well as identifying confidence regions by identifying regions of the first and second images in which only real data is present, it is also possible to identify confidence regions in which at least one of the first and second image has a depth value captured from a real scene. For example, it could be determined that a region is a confidence region based upon a difference in the depth values of the first and second images being sufficiently large that any noise in the captured depth values would not affect the result of a comparison of the depth values of the first 500 and second 550 images.
Specifically, for a depth value at position x, y in the first image, D1(x,y), and a corresponding depth value at position x, y in the second image, D2(x,y), it is possible to determine whether or not the difference in value exceeds a threshold. A confidence region may be identified if the magnitude of the difference in depth values exceeds a predetermined threshold. In practice, this predetermined threshold may be manually selected when configuring the system. For example, setting the predetermined threshold to be greater than a maximum noise value may reduce the amount of noise in the final image but would do so at the cost of reducing the confidence region (and therefore increasing the size of the uncertainty region, as will be described later). As such, the amount of processing required by the system may be increased since the amount of an image that needs processing as described herein may be increased. Accordingly, there may be a trade-off between an acceptable level of noise that is accounted for in the predetermined threshold and the amount of processing that is required on the regions that are not identified as confidence regions.
Therefore, the predetermined threshold may be configured to be greater than a background noise level of the depth values captured from the real scene and lower than a maximum noise value. In this way, regions in which an erroneous depth value may result in an erroneous determination as to which image of the first image and second image to render in that region are reduced. Alternatively, if both images comprise real depth values of a scene, those regions in which the real depth values fall at the same point may have a different threshold, which may be twice as large to allow for cumulative addition of the error in each captured depth value.
Where the difference in depth values exceeds a predetermined threshold, i.e. the virtual object is not close in dimension y of
θ<|D1(x,y)−D2(x,y)|
However, at a particular pixel position x, y, if the difference between the two depth values is less than the predetermined threshold, then it may be determined that the pixel is a candidate for an erroneously rendered pixel, since the real scene and virtual object have similar depth values. This is illustrated by the following inequality:
θ≥|D1(x,y)−D2(x,y)|
In the event that this inequality is met, the position x, y may be regarded as an uncertainty region, which will be described in more detail in relation to step 930. It will be appreciated that the situation where θ=|D1(x, y)−D2(x, y)| can be handled in different manners. For example, in this situation the position x, y can be regarded as a confidence region or an uncertainty region, depending upon the specific implementation.
Having identified, for each region of the augmented reality image, whether that region is a confidence region it is possible to further categorise the regions so that each region of the augmented reality image falls within one of more than two different categories. In particular, portions of an identified confidence region may be sub-categorised into one of three sub-categories, namely first, second, and third confidence regions, as will be described in more detail below.
In this example, a categorisation map is generated which indicates into which category each region of the scene is categorised. The example categorisation map includes, for a corresponding pair of depth values, a value indicating the category at that pair of depth values based upon a comparison of the corresponding depth values of the first 500 and second 550 images. An example categorisation map generated based upon the scene of
In the current example, four different categories are defined and will be illustrated in relation to
Generally, a confidence region can be categorised as a first confidence region if the depth value in the confidence region of the first image (e.g. of the real scene) is less than a corresponding depth value in a second image (e.g. of a virtual object) such that the first image is closer than a second image. In the present example, where the first image is an image of a scene and the second image is an image of a virtual object, the first confidence region is a region in which the real scene is to be rendered, for example region 602 of
A confidence region may also be sub-categorised as a second confidence region if the depth value in the confidence region of the first image 500 is greater than a corresponding depth value in the second image 550. In the present example, where the first image 500 is an image of the real scene 400 and the second image is an image of a virtual object 104, the second confidence region is a region in which the colour value of the virtual object is used for rendering, for example region 604 in
To make a determination as to whether a pixel of the scene should be categorised in the first confidence region or the second confidence region, the depth value of the first image and the depth value of the second image at that pixel are compared.
In one example, C(x, y) is set to 2 if D1(x, y)<D2(x,y), where D1(x, y) is the depth value at pixel x, y of the first image; D2(x, y) is the depth value at pixel x, y of the second image; and C(x, y) is the resultant categorisation value at pixel x, y. Where D1(x, y)≥D2(x, y), C(x, y) is set to 1.
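The categorisation above, combined with the threshold test of the earlier inequalities, can be sketched as follows. The function and variable names are illustrative assumptions; the numeric labels follow the example in the text (2 where the first image is closer, 1 where the second image is closer or equal, 3 where the difference does not exceed the threshold θ).

```python
import numpy as np

def categorise(d1, d2, theta):
    """Build a categorisation map C from depth maps D1 and D2.

    Labels (per the example in the text):
      2 - confidence region, first image (e.g. real scene) is rendered
      1 - confidence region, second image (e.g. virtual object) is rendered
      3 - uncertainty region, |D1 - D2| does not exceed theta
    """
    c = np.where(d1 < d2, 2, 1)        # confident comparison of depth values
    c[np.abs(d1 - d2) <= theta] = 3    # depths too close to call: uncertainty
    return c

d1 = np.array([[1.0, 5.0, 3.0]])
d2 = np.array([[4.0, 2.0, 3.1]])
c = categorise(d1, d2, theta=0.5)  # -> [[2, 1, 3]]
```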
The above-described process can, at the same time, identify (at step 930) regions that are confidence regions (in one of the three sub-categories) and regions that are uncertainty regions. Alternatively, the confidence regions may first be identified and the uncertainty regions may be separately identified. Once the confidence regions have been identified and sub-categorised and the uncertainty regions have been identified, the entire area of the augmented reality image has been placed into one of four categories. The uncertainty regions are regions in which there is some doubt as to which of the first image 500 and the second image 550 is to be rendered. Where the comparison of the depth values is such that the magnitude of the difference in depth values at a location is less than a predetermined threshold θ, the location may be regarded as part of an uncertainty region. This is because the depth values are considered to be so close to one another that it is possible that errors in the capture of the depth value from the real scene in that region may lead to an erroneous result. These regions are then processed further, as will be described below. In the categorisation map of
Another approach for identifying uncertainty regions, which can be used in place of or in addition to the above-described approach, is to consider the absolute values of depth values captured from the real scene. In the present example, this may involve performing a test on each of the captured depth values. For example, an RGBD camera may produce a particular value which is indicative of an erroneously captured depth value. For example, it may be expected that a depth value should fall within a predetermined range and that a value outside of this range indicates an erroneous depth measurement. The RGBD camera may optionally be configured to provide a specific depth value to indicate that an error occurred in the captured value. Accordingly, by using different methods it is possible to identify incomplete or erroneously captured depth values.
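Flagging incomplete or erroneously captured depth values might be sketched as follows. The sentinel value and the valid range limits are hypothetical assumptions, since they depend on the particular capture device used.

```python
import numpy as np

# Hypothetical camera characteristics (assumptions for illustration):
SENTINEL = 0.0            # value the camera reports for a failed measurement
D_MIN, D_MAX = 0.2, 10.0  # expected valid measurement range, in metres

def invalid_depth_mask(depth):
    """Return True where a captured depth value is absent or out of range,
    so that the corresponding location can be treated as an uncertainty region."""
    return (depth == SENTINEL) | (depth < D_MIN) | (depth > D_MAX)

depth = np.array([[0.0, 1.5, 25.0]])
mask = invalid_depth_mask(depth)  # -> [[True, False, True]]
```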
It is also possible to perform an “in-fill” function in order to transform an uncertainty region into a confidence region on the basis that the uncertainty region is wholly surrounded by a confidence region of a particular subcategory. This process can be performed during the categorisation process in which confidence and uncertainty regions are identified. Specifically, where a region is wholly surrounded by “in-front category” sample points, it can be inferred that the sample points in that region should be completed based upon the surrounding categorisation. Accordingly, the categorisation value of the uncertainty region (“3”) can be changed to match the surrounding categorisation. As such, the area of uncertainty region to be processed is reduced before processing is performed. In this way, fewer pixels in the uncertainty region need to be processed in the subsequent processing steps to determine which colour should be used in the augmented reality image. The amount of processing needed to generate the augmented reality image is therefore reduced.
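One possible implementation of such an in-fill function is a connected-component flood fill over the categorisation map, sketched below. This sketch is an assumption as to implementation and omits any checks on the size of the in-filled area; the function and variable names are illustrative.

```python
import numpy as np
from collections import deque

def infill(cat, uncertain=3):
    """Re-label each uncertainty component that is wholly surrounded by a
    single confidence sub-category with that surrounding category."""
    cat = cat.copy()
    h, w = cat.shape
    seen = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if cat[y, x] != uncertain or seen[y, x]:
                continue
            # Flood-fill one connected uncertainty component.
            comp, border, touches_edge = [], set(), False
            q = deque([(y, x)])
            seen[y, x] = True
            while q:
                cy, cx = q.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if not (0 <= ny < h and 0 <= nx < w):
                        touches_edge = True          # component reaches image edge
                    elif cat[ny, nx] == uncertain:
                        if not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                    else:
                        border.add(int(cat[ny, nx]))  # surrounding category
            if not touches_edge and len(border) == 1:
                fill = border.pop()                   # single surrounding category
                for cy, cx in comp:
                    cat[cy, cx] = fill
    return cat

cat = np.array([[1, 1, 1],
                [1, 3, 1],
                [1, 1, 1]])
filled = infill(cat)  # the central "3" is wholly surrounded by "1"
```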
The “in-fill” function may also consider the size of the area to be in-filled before performing the in-filling. Specifically, a large area to be in-filled may indicate that the area is not erroneously uncertain but instead is actually part of another object. It may also be possible to consider the size of the confidence region during in-filling to ensure that the confidence region is sufficiently large to have confidence that the “in-filling” will not create errors in the categorisation. An example of a region of the categorisation map that can be in-filled is illustrated with reference to
The categorisation map 1000 indicates, for regions of the augmented reality image 600, which regions of the image are considered to be confidence regions in which the determination as to which of the first and second images to be rendered is made with a degree of confidence. Regions in which some uncertainty as to which of the first and second images to be rendered are indicated as uncertainty regions and are labelled by numeral 3, which are also shaded. Numeral 2 indicates confidence regions in which the real scene of the first image is to be rendered in place of the virtual object 104 of the second image 550. Numeral 1 indicates the confidence regions in which the virtual object 104 of the second image 550 is to be rendered in place of the colour values of the real scene 400.
It will be appreciated that for regions of the augmented reality image 600 in which there is certainty as to which of the first image or the second image is to be used for rendering, it is possible to determine a blending factor indicating the degree to which the first 500 and second 550 images are blended. The blending factor in these regions may be a binary number which indicates which of the two images to wholly render at a pixel. A blending factor value may be regarded as an initial alpha matte value, as will be explained in more detail later.
As can be seen from
Off object regions may be identified as a sub-category of the confidence region in which the two images do not overlap one another. Put another way, there may be regions in which the first image 500 and/or the second image 550 are not aligned with one another. For example, where the first image 500 is an image of a real scene and the second image 550 is an image of a virtual object 104, it may be that the second image 550 is smaller than the first image 500 and is only as large as the size of the virtual object 104.
Accordingly, when the first 500 and second 550 images are aligned with one another or a correspondence between colour values in the two images is generated, there may be regions of the first image 500 for which there is no corresponding region of the second image 550. Such regions may be deemed to be “off object” regions since, for these regions, no comparison of depths is required (or possible). As such, it is possible to mark these regions such that they are not processed further. In this way, it is possible for the amount of processing required to generate the augmented reality image 600 to be reduced.
The off object regions form part of the confidence regions since the determination as to which of the first image and the second image to render can be made with confidence. Put another way, since one of the first and second images is not present in an off object region, it is the colour values of the image that is present in the off object region that will be used to render the corresponding colour values of the augmented reality image 600.
In some implementations, the depth values and the colour values may not be directly aligned in position. Therefore, when aligning a depth map of the depth values with the colour images, it may be that boundaries of objects in the depth map extend beyond those in the colour image. As such, some depth value points may be erroneously included in the “in-front” region. In order to overcome this problem a morphological operator (e.g. an erosion operator) may be used to re-categorise confidence regions near a boundary between regions from either “in-front” or “behind” confidence sub-categories to an uncertainty region. This will be explained below.
For elements in a confidence region located near to an uncertainty region, the centre 1110 of the erosion kernel 1100 is placed at that element and, where there is another point within the erosion kernel 1100 that is in an uncertainty region, the element in question is re-categorised as being part of an uncertainty region. In this way, the uncertainty regions are widened to ensure that issues in alignment do not result in spurious results in the rendered image. It will be appreciated that the size of the erosion kernel 1100 may be varied depending upon the particular application of the described methods. Categorisation map 1150 illustrates the result of applying the 3×3 size erosion kernel 1100 to the categorisation map 1000 of
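Eroding the confidence regions in this way is equivalent to dilating the uncertainty mask with the same 3×3 structuring element, which can be sketched as follows. The implementation and names are illustrative assumptions.

```python
import numpy as np

def widen_uncertainty(cat, uncertain=3):
    """Re-categorise any element whose 3x3 neighbourhood contains an
    uncertainty element as itself uncertain (erosion of confidence regions)."""
    h, w = cat.shape
    mask = (cat == uncertain)
    # Dilate the uncertainty mask with a 3x3 structuring element.
    padded = np.pad(mask, 1, mode='constant')
    dilated = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            dilated |= padded[1+dy:1+dy+h, 1+dx:1+dx+w]
    out = cat.copy()
    out[dilated] = uncertain
    return out

cat = np.array([[1, 1, 1, 1],
                [1, 1, 3, 1],
                [1, 1, 1, 1]])
out = widen_uncertainty(cat)  # uncertainty spreads to the 3x3 neighbourhood
```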
At the end of step 930 of the method of
The uncertainty region may be further processed to determine a value for a degree to which the first 500 and second 550 images are to be combined within these regions. Two possible approaches for processing the uncertainty region are set out below in relation to step 940.
Alpha Matte
In order to combine the first image 500 and the second image 550 to generate the augmented reality image 600, blending factor values may be determined which combine to form an alpha matte. The blending factor values of the alpha matte indicate the degree to which the corresponding colour values of each of the first image and the second image contribute to the colour at a corresponding location of the augmented reality image 600. Blending factor values of the alpha matte may take the value ‘0’, ‘1’, or any value in between ‘0’ and ‘1’. Where the blending factor value at a particular location of the alpha matte is ‘0’ or ‘1’, a single colour from either the first or second image is selected and rendered in the augmented reality image. Where the blending factor value is a value in between ‘0’ and ‘1’, a blend of the corresponding colours of the first and second images is generated and used when rendering that corresponding position in the final augmented reality image. By blending, for use at a particular location in the final augmented reality image, two colours each from the first and second image, it is possible to smooth a transition in colour between a rendered first image and a rendered second image in the augmented reality image, thereby reducing visual artefacts in the augmented reality image.
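The per-pixel blend described above can be sketched as out = α × first + (1 − α) × second. Which image the matte weights toward is a convention; in this sketch a blending factor of 1 selects the first image, which is an assumption for illustration, as are the names used.

```python
import numpy as np

def blend(first_rgb, second_rgb, alpha):
    """Combine two colour images using an alpha matte of values in [0, 1].
    alpha = 1 selects the first image wholly; alpha = 0 the second;
    intermediate values produce a smooth colour transition."""
    a = alpha[..., None]  # broadcast the matte over the colour channels
    return (a * first_rgb + (1.0 - a) * second_rgb).astype(np.uint8)

first = np.full((1, 2, 3), 200, dtype=np.uint8)
second = np.full((1, 2, 3), 100, dtype=np.uint8)
alpha = np.array([[1.0, 0.5]])
out = blend(first, second, alpha)  # pixel 0 wholly first; pixel 1 a 50/50 blend
```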
In the present example, the blending factor values of the alpha matte are determined in different ways for the confidence region and the uncertainty region. In the confidence region, the blending factor values are based upon the sub-categories of the confidence region. For example, a point in the categorisation map being assigned the “behind” sub-category may translate to a blending factor value of 1 at the corresponding position in the alpha matte. Similarly, an “in-front” sub-category may translate to a blending factor value of 0 as illustrated in
Regions of the categorisation map 1300 that are designated as uncertainty regions are not initially assigned an initial alpha matte value since there is doubt as to which of the first 500 and second 550 images is to be used in the corresponding region of the augmented reality image 600.
Blending factor values for the uncertainty regions can be generated by one of a number of different methods. In general, determining at least one blending factor value in the uncertainty region is based upon a similarity between a colour value in the uncertainty region and at least one colour value in the confidence region. In this way, it is possible to use colour values in known regions of the images to infer in which region a particular portion of the image should be categorised based upon the degree of colour similarity.
Two specific approaches for determining the blending factor values in uncertainty regions are set out below. Both methods make use of colour information outside of the uncertainty region (i.e. in a confidence region) in order to determine the degree to which portions of the uncertainty regions are similar to portions of the confidence regions.
One approach to performing step 940 is to use a cross bilateral filter (CBF) to determine blending factor values (i.e. alpha matte values) for uncertainty regions.
A cross bilateral filter is similar to a bilateral filter, but differs in that the source of the weights in the filter (known as the joint data) differ from those to which the filter is applied. In the approach described herein, the colour values of one of the two images (i.e. the first or the second image) are used to determine blending factor values in the uncertainty region. More specifically, in the present example, the colour values of the first image of the real scene are used when applying the CBF to the uncertainty region, as will be described in more detail below. In other examples, the CBF may be applied in the uncertainty region based on colours of a second (or third) image, for example the second image of the virtual object as described herein.
A cross bilateral filter is defined generally as follows:

αp = (1/Wp) Σq∈S Gσs(‖p−q‖) Gσr(d(Ip, Iq)) αq

Where Wp is a normalisation factor that normalises the resultant value for pixel p between 0 and 1, I is the input image from which the filter weights are derived (which in this case is the colour values from the first image), αq is the value being filtered at pixel q, and subscript p is the coordinate of the current pixel to be filtered. For each pixel p to be filtered, the cross bilateral filter determines a weighted average of pixels in a set S of pixels based upon two Gaussian functions, Gσs and Gσr.
The cross bilateral filter is configured in the present example such that the Gaussian function Gσr operates on a colour distance between pixels of the first image, whilst Gσs operates on the spatial distance between pixels, as set out below.
The set S is determined based upon a filter kernel 1200, which forms a region around the pixel in question, p; the filter sums the weighted pixel values within the pixel kernel 1200. The pixel kernel may include all pixels within a predetermined distance of the pixel in question, or may be formed as a box of fixed size. For example, the pixel kernel may be a 3×3 pixel kernel with the pixel in question, p, positioned at the centre.
The cross bilateral filter used in the present arrangement makes a determination as to which pixels in the set S are located within uncertainty regions and which pixels in the set S are located within confidence regions. This may be determined based upon the values in the categorisation map. In the present approach, pixels in the set S that are located within uncertainty regions are provided with a zero weight and are thus disregarded. As such, uncertainty regions do not contribute to the blending factor value produced for a pixel in question, p. In this way, the determination of a blending factor value at a pixel does not take into consideration other pixels at which there is doubt as to the reliability of the depth values.
According to an example, a cross bilateral filter can be implemented with the use of a 3×3 pixel kernel 1200. The pixel kernel 1200 may be configured to use a colour value of each pixel that neighbours the pixel in question, p, within the kernel. As such, a 3×3 pixel kernel 1200 may typically involve the calculation of 8 different values for a particular pixel p, which may then be normalised between a value of 0 and 1. This process is repeated for each pixel until all of the pixels of the augmented reality image have been processed. However, in the present approach it may be that, for each processed pixel, fewer pixels are considered since some of those pixels may fall within an uncertainty region and are thus ignored.
An example filter kernel 1200 is illustrated in relation to
The normalisation factor Wp will be adjusted to a value based on the fact that only six pixels are taken into consideration. In general, the normalisation factor Wp will be adjusted to account for the number of pixels that are taken into consideration.
Set out below are the two Gaussian functions, Gσr and Gσs, used by the cross bilateral filter in the present example. The Gaussian function Gσr relates to colour similarity and may take the form:

Gσr(d(p,q)) = exp(−d(p,q)²/(2σr²))
where d is a colour distance metric. d provides a metric of the similarity in colour between the pixel in question, p, and one of the pixels in the kernel. In this example, the similarity in colour is determined based upon the Manhattan distance in RGB space. Specifically, distance d is defined by the following equation, where (pr, pg, pb), (qr, qg, qb) are the red (r), green (g), and blue (b) components of the colour pixels p and q:
d(p,q):=|pr−qr|+|pg−qg|+|pb−qb|
Advantageously, the Manhattan distance is particularly useful for determining the degree of colour similarity in the present approach since it has produced low mean square error (MSE) relative to ground truth mattes in testing and is efficient to evaluate.
Another Gaussian function, Gσs, relates to the spatial distance between the pixel in question, p, and a pixel q within the kernel, and may take the form:

Gσs(p,q) = exp(−((px−qx)² + (py−qy)²)/(2σs²))

Where px, qx, py, and qy are the x and y coordinates of pixels p and q within the image. The distance may be a count of the number of pixels between the pixels based on a pixel coordinate system.
Therefore, for each pixel p in the uncertainty region, a blending factor value is provided by the cross bilateral filter based upon corresponding colour values in confidence regions within the filter kernel. The normalisation factor ensures that the generated value lies between 0 and 1.
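The cross bilateral filtering of the uncertainty region described above might be sketched as follows, using a 3×3 kernel, the Manhattan colour distance for Gσr, and the spatial Gaussian Gσs. The parameter values (sigma_s, sigma_r) and all names are illustrative assumptions; neighbours in uncertainty regions receive zero weight, as described.

```python
import numpy as np

def cbf_alpha(rgb, alpha0, cat, uncertain=3, sigma_s=1.0, sigma_r=30.0):
    """Fill blending factor values in uncertainty regions by cross bilateral
    filtering of the initial alpha values, with weights derived from the
    colour values of one image (the joint data). Sketch only."""
    h, w = cat.shape
    alpha = alpha0.astype(float)
    for py in range(h):
        for px in range(w):
            if cat[py, px] != uncertain:
                continue  # confidence regions keep their initial alpha value
            num, wp = 0.0, 0.0
            for qy in range(max(0, py - 1), min(h, py + 2)):
                for qx in range(max(0, px - 1), min(w, px + 2)):
                    if (qy, qx) == (py, px) or cat[qy, qx] == uncertain:
                        continue  # disregard uncertain neighbours (zero weight)
                    # Spatial Gaussian Gσs on the pixel distance.
                    ds2 = (px - qx) ** 2 + (py - qy) ** 2
                    gs = np.exp(-ds2 / (2 * sigma_s ** 2))
                    # Range Gaussian Gσr on the Manhattan colour distance d.
                    d = np.abs(rgb[py, px].astype(int) - rgb[qy, qx].astype(int)).sum()
                    gr = np.exp(-float(d) ** 2 / (2 * sigma_r ** 2))
                    num += gs * gr * alpha0[qy, qx]
                    wp += gs * gr
            if wp > 0:
                alpha[py, px] = num / wp  # normalisation keeps the value in [0, 1]
    return alpha

# 1x3 example: the middle pixel is uncertain and matches the left pixel's colour.
rgb = np.array([[[255, 0, 0], [255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
cat = np.array([[2, 3, 1]])
alpha0 = np.array([[1.0, 0.0, 0.0]])
alpha = cbf_alpha(rgb, alpha0, cat)
```

Because the uncertain middle pixel is identical in colour to the confident left pixel and very different from the right, its filtered blending factor lies close to the left pixel's value of 1.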
In other arrangements, additional or alternative colour values could be used to generate the blending factor values. Different colour values in the confidence region in the first image may be utilised to perform filtering. For example, a larger filter kernel or a sparse sampling scheme that selects pixels that are not adjacent to the pixel in question may be used to perform filtering based upon a larger area of colour values in the first image of the real scene. As such, the filtering is performed in a less localised manner which would reduce the impact of any local colour defects in the first image on the generated augmented reality image. Additionally or alternatively, colour values from a third image of the same real scene may be used.
The blending factor values generated by the cross bilateral filter in the uncertainty region may then be combined with the initial blending factor values generated for the confidence region that are illustrated in
An example of an alpha matte 1200 formed solely of values generated within confidence regions is illustrated in
As can be seen from
Blending factor values (i.e. alpha matte values) may be determined for uncertainty regions. An updated complete alpha matte 1400 is illustrated in
The generated complete alpha matte 1400 can be used to combine the first image 500 and the second image 550 to generate an augmented reality image 600. This will be described in more detail later.
An alternative approach to determining blending factor (i.e. alpha matte) values for the uncertainty region is set out below and will be referred to as the “iterative method”. The iterative method differs from the cross bilateral filter in that the cross bilateral filter can be considered to be a localised approach to generating blending factor values in the uncertainty regions whilst the iterative method can be considered to be a large-scale approach.
In this alternative approach, steps 910, 920, and 930 of
Specifically, both the iterative method and the cross bilateral filter receive a partially completed initial alpha matte in which alpha matte values are determined for confidence regions. The iterative method described herein provides an alternative approach for determining the blending factor values for uncertainty regions.
In the iterative method described herein, blending factor values for an uncertainty region are determined by minimising the sum of squares of two error metrics for each element in the uncertainty region. The two error metrics used in the following example are designed to encourage the formation of a visually pleasing alpha matte with a low error.
For a partially completed alpha matte M, such as the alpha matte illustrated in
These estimated values may simply be set to 0.5 which is a balanced initial value that is to be refined during execution of the iterative method. An example of such an initial alpha matte used in the execution of the iterative method is illustrated in region 1510 of
In other arrangements, initial values for the alpha matte values in an uncertainty region may be determined using more sophisticated approaches, for example based on an initial desired blend across an uncertainty region, for example where the uncertainty region forms a boundary between confidence sub-category regions.
Since the method described herein is iterative, a better initial value may reduce the number of iterations of the method required to reach a predefined acceptable error level. For the purposes of describing the operation of this method, the alpha matte values for regions of the alpha matte that fall within uncertainty regions are initially assigned a value of 0.5. The iterative method is performed only on the alpha matte values which have an initially assigned value (e.g. alpha matte values in the uncertainty region).
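The initialisation described above can be sketched as follows, assuming the partially completed matte marks uncertainty pixels with None (a representational choice made for illustration):

```python
# Seed uncertainty entries with the balanced initial value 0.5; the
# iterative method subsequently updates only these entries.
def initialise(partial_matte, init=0.5):
    unknown = [(x, y) for y, row in enumerate(partial_matte)
                      for x, v in enumerate(row) if v is None]
    matte = [[init if v is None else v for v in row] for row in partial_matte]
    return matte, unknown
```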
In the following example, a blending factor value is generated for each point in the categorisation map categorised as being in an uncertainty region based upon the minimisation of a gradient metric and a colour metric.
The gradient metric is designed to encourage an alpha matte which contains large flat regions with low image gradients, whilst allowing a small proportion of pixels to have high gradients, so as to define boundaries between 0 and 1 alpha matte values within the alpha matte. The gradient metric is selected in this way to reflect the properties of mattes in the typical situation where an image of a virtual object is considered with respect to an image of an opaque real object. For example, there may be large flat regions of the matte with zero gradient, and a smaller number of pixels along edges with a very high image gradient. Other shapes for the gradient metric may be selected based upon the content of the images to be used to generate the augmented reality image 600.
The gradient metric εgradient at a pixel p in matte M is illustrated in the equation below:
εgradient(M,p) := 1 + ln(e^−1 + G(M,p))
where G(M,p) is a gradient value defined by the below equation. The gradient value is an estimate of the sum of squared partial derivatives, where N4(p) is the 4-neighbourhood of position p in each of the four cardinal directions.
An example of the 4-neighbourhood at p is illustrated in
As set out above, the gradient metric εgradient(M,p) is based on the function y = 1 + ln(e^−1 + x). A plot of the gradient metric as a function of the gradient value is illustrated in
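A hedged sketch of the gradient metric is given below, assuming G(M,p) is computed as the sum of squared differences between the matte value at p and its 4-neighbourhood (consistent with the estimate of the sum of squared partial derivatives described above):

```python
# Gradient metric: zero for flat regions (G = 0 gives 1 + ln(e^-1) = 0),
# growing only logarithmically for high-gradient edge pixels.
import math

def gradient_value(M, x, y):
    # estimate of the sum of squared partial derivatives over N4(p)
    g = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        qx, qy = x + dx, y + dy
        if 0 <= qy < len(M) and 0 <= qx < len(M[0]):
            g += (M[y][x] - M[qy][qx]) ** 2
    return g

def gradient_metric(M, x, y):
    # epsilon_gradient(M, p) := 1 + ln(e^-1 + G(M, p))
    return 1.0 + math.log(math.exp(-1.0) + gradient_value(M, x, y))
```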
A second metric used in the iterative method is a colour metric designed to make use of colour information, by comparing the colour similarity of pixels in the uncertainty region with pixels that have been categorised in the “in-front” category (i.e. pixels in the foreground) and pixels that have been categorised in the “behind” category (i.e. pixels in the background of an image).
An example approach to defining the colour metric is to define two Mixture of Gaussians (MoG) models that are each fitted to colour samples taken from one of the foreground “in-front” and background “behind” colour values in the confidence region, based on the sub-categorisation of the confidence regions into “in-front” and “behind” regions. MoG models are particularly useful in the present implementation due to their multimodal nature, which allows them to handle cases where objects in a scene are surrounded by multiple objects of different colours, or objects with multiple different colours (e.g. due to varying object albedo or non-uniform lighting). Additionally, MoG models provide additional robustness to noise in the colour samples, as compared to finding nearest neighbours in the sample set.
For an image, the colour samples for the MoG models are selected from the sub-categorisations of the confidence regions near the uncertainty region. In order to select the colour samples, a dilation process is applied to the uncertainty region and the result of the dilation is intersected with the sub-categorised confidence pixels. This process obtains regions from the respective “in-front” and “behind” categorised pixels within a small band of the uncertainty region.
The in-front and behind regions may be represented as one or more binary images, in which sample points inside the region are represented as a ‘1’, and sample points outside the region are represented by a ‘0’. The uncertainty region is then dilated, to increase the size of the uncertainty region by a few pixels. Then, in an example implementation, a pixel-wise binary AND is applied to the dilated uncertainty region and the “in-front” and “behind” regions (e.g. the “in-front” and “behind” images) to find the area of overlap. In practice the area of overlap will be the separate “in-front” and “behind” regions within a predetermined distance of the uncertainty region, as defined by the dilation kernel which is used to define the degree to which the uncertainty region is dilated. By following this approach, two additional regions are defined in which the dilated uncertainty region overlaps respective “in-front” and “behind” regions. Since the determination of the two new regions takes into consideration only “in-front” and “behind” regions, “off object” regions and uncertainty regions are not taken into consideration.
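The sample-selection step above can be sketched as follows, assuming binary masks represented as sets of (x, y) coordinates and a square one-pixel dilation kernel (both hypothetical representational choices):

```python
# Dilate the uncertainty region, then intersect with the "in-front" and
# "behind" confidence masks; set intersection plays the role of the
# pixel-wise binary AND on binary images.
def dilate(region, radius=1):
    out = set()
    for (x, y) in region:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                out.add((x + dx, y + dy))
    return out

def select_samples(uncertainty, in_front, behind, radius=1):
    band = dilate(uncertainty, radius)
    return band & in_front, band & behind
```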
Having performed the above step, two MoG models are generated, each of which consists of scalar weights and parameters (mean, variance) for N 3-dimensional Gaussian functions (where N is the number of components in the mixture). These models provide a concise summary of the distribution of the foreground and background colour samples in the confidence region based upon the sub-categorisation of the alpha matte. For example, the number of Gaussians per model N may be set to 5. However, the number of Gaussians used in the model may vary and will be selected based upon a trade-off between performance and quality.
Once the MoG models have been fitted to the foreground and the background, the colour metric εcolour is defined using the following equation:
Wherein Pbehind and Pinfront are the respective probabilities that the colour sample at pixel p under the MoG models belongs to the “behind” and “in-front” pixel categories. These probabilities are defined as the probability of the sample under the most likely Gaussian in each mixture. The colour error metric therefore encourages an appropriate local value for each pixel, whereas the gradient metric encourages an appropriate global structure for the matte. The MoG models are respectively fitted to the colours from the first image (e.g. the colours of the real scene) in the “in-front” region and the “behind” region. The MoG models are fitted to maximise the probability of the observed foreground/background colour samples using the Expectation-Maximisation algorithm.
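The probability of a colour sample under the most likely Gaussian in a mixture can be sketched as below, assuming pre-fitted isotropic 3-D components (EM fitting itself is omitted for brevity; the component parameters are hypothetical):

```python
# Each mixture component is (weight, mean, variance); the probability used
# by the colour metric is that of the most likely weighted Gaussian.
import math

def gauss3(c, mean, var):
    # isotropic 3-D Gaussian density (a simplifying assumption)
    norm = (2.0 * math.pi * var) ** 1.5
    d2 = sum((a - b) ** 2 for a, b in zip(c, mean))
    return math.exp(-d2 / (2.0 * var)) / norm

def mog_probability(c, components):
    return max(w * gauss3(c, mean, var) for w, mean, var in components)
```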
As will be appreciated, it is possible to use ‘0’ and ‘1’ values to represent different categorisations (e.g. a ‘1’ can represent an “in-front” or a “behind” region, provided a different value represents the other region). For example, if different values were used in the category map to represent the in front and behind regions, it may be necessary to swap the Pbehind(C(p)) and Pinfront(C(p)) probabilities in the above equation.
Having generated the colour error metric and the gradient error metric, the two metrics are minimised for each point in the uncertainty region of the alpha matte. One approach is to use the Levenberg-Marquardt algorithm to minimise the two error metrics for each point in the uncertainty region and thereby produce alpha matte values for the uncertainty region.
The Levenberg-Marquardt algorithm (LMA) operates upon a parameter space Ω ⊆ ℝ^n. In the present example, the parameter space is the space of possible alpha mattes. That is, each element of Ω is a vector x = (p1, . . . , pn), where each pi is a pixel value from the uncertainty region of the alpha matte, such that Ω = [0,1]^n, wherein n is the number of pixels in the uncertainty region. In the LMA, the aim is to minimise the sum of squares of errors. As defined above, the iterative approach defined herein makes use of error functions rj: Ω → ℝ, for j ∈ 1, . . . , m. The error functions are defined as the gradient error metric and the colour error metric (as described above), each applied at each pixel in the uncertainty region.
The LMA is therefore configured to minimise the sum of squares of each of the error functions, using the following equation:
As described above, the values of the alpha matte in the uncertainty region are initialised to a value defined as the initial estimate of x, termed herein as x0. At each step of the iteration of the LMA, a small step δi is taken, i.e. xi+1 := xi − δi so that f(xi+1) < f(xi), using gradient information.
Let r: Ω → ℝ^m be a residual vector, defined by r(x) := (r1(x), . . . , rm(x)), that can be differentiated with respect to x to obtain a Jacobian matrix J := ∂r/∂x. Since the two error metrics used in the present example are differentiable, J can be found analytically. The updates can be computed as follows:
δi := (JᵀJ + λ diag(JᵀJ))⁻¹ Jᵀ r(xi)
The above equation combines a first order and a second order approximation to f, and the value λ ∈ ℝ controls the weighting of the two approximations. In order to perform the above computation, a matrix inverse needs to be performed as shown above. Whilst this matrix can be large, it is also sparse and symmetric, which means that δ can be efficiently found using a sparse Cholesky solver.
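A toy illustration of the update step is given below for a single-parameter problem, in which the matrix inverse reduces to a scalar division rather than the sparse Cholesky solve that a full matte would require:

```python
# Toy Levenberg-Marquardt iteration for one scalar parameter; the
# residuals and Jacobian callables are supplied by the caller.
def lm_minimise(residuals, jacobian, x0, lam=1e-3, iters=50):
    x = x0
    for _ in range(iters):
        r = residuals(x)             # residual vector r(x)
        J = jacobian(x)              # one Jacobian entry per residual
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        if JtJ == 0.0:
            break
        # delta := (JtJ + lam * diag(JtJ))^-1 * Jt r, scalar case
        x = x - Jtr / (JtJ + lam * JtJ)
    return x
```

For example, minimising the residuals (x − 3, x − 5) converges to the least-squares solution x = 4.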
To perform the LMA, the following steps are performed to minimise the two error metrics:
The iterative method is particularly suited to applications in which the generation of an augmented reality image is to be performed in real time, for example where a plurality of augmented reality images are to be generated sequentially to form a video sequence. The iterative method may be performed a number of times to reduce the mean squared error (MSE) in the resultant alpha matte. In time-critical applications such as the generation of a video sequence, it is possible to allocate a defined period of time to the generation of the blending factor values in the uncertainty region using the iterative method. Accordingly, the iterative method will be performed as many times as possible within the allocated time period. In this way, it is certain that the iterative method will generate blending factor values in the required time, and the error may be minimised within that time. For example, it is possible to maintain a constant frame rate in an augmented reality video sequence of augmented reality images.
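The time-budgeted execution described above can be sketched as follows, where `step` is assumed to be a callable performing one refinement iteration on the matte:

```python
# Run as many refinement iterations as fit within the allocated budget,
# so that a constant frame rate can be maintained.
import time

def refine_within_budget(matte, step, budget_s):
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        matte = step(matte)
    return matte
```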
Once the iterative method or the cross bilateral filter approach has been applied, a complete alpha matte is generated for the entire image space, as illustrated in
A further example implementation is illustrated with respect to
As can be seen from
Following the uncertainty region 1750 in row 1720 is a series of values “1”, “2”, and then “-” in the categorisation map. These remaining categorisation values and their corresponding alpha matte values are determined in a similar manner as described above. As can be seen from
By generating the blending factor values (i.e. alpha matte values) for the uncertainty regions, for example by using the cross bilateral filter or the iterative method as described above, a complete alpha matte 1400 is generated as illustrated in
An approach for generating the augmented reality image 600 is to apply the following equation based upon the colour values of the first image 500 and the second image 550.
cα := α·c1 + (1−α)·c2
For a particular point in the alpha matte, a corresponding pixel of each of the first image 500 and the second image 550 is considered. The alpha matte value α at that point determines the colour value cα in the corresponding pixel of the augmented reality image 600. As shown in the above equation, the colour value cα at a particular pixel in the augmented reality image 600 is a combination of the colour value c1 of the first image 500 at that pixel and the colour value c2 of the second image 550 at that pixel. In some arrangements, the alpha matte values of 0 and 1 may be switched, for example where the alpha matte values assigned to “in-front” and “behind” pixels are switched. In this arrangement, the values used for c1 and c2 may therefore also be switched.
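The per-pixel combination can be sketched directly, applying the blend to each colour channel:

```python
# Per-pixel alpha blend: alpha = 1 keeps the first image's colour,
# alpha = 0 keeps the second image's colour.
def composite(alpha, c1, c2):
    return tuple(alpha * a + (1.0 - alpha) * b for a, b in zip(c1, c2))
```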
In the present example, and as shown in
Specifically, where the alpha matte value in a confidence region is ‘1’, the above equation provides that the colour at a corresponding pixel of the augmented reality image will be based solely on the colour of the first image of the real scene. Conversely, where the alpha matte value in a confidence region is ‘0’, the above equation provides that the colour at a corresponding pixel of the augmented reality image 600 will be based solely on the colour of the second image of the virtual object.
In the confidence regions a confident determination can be made and thus the alpha matte value is ‘1’ or ‘0’. Where possible, it is preferable to determine a value of ‘1’ or ‘0’ for the alpha matte in uncertainty regions too. As such, the alpha matte value determined by applying, for example, the cross bilateral filter or the iterative method as described above, may also be 0 or 1. If such values are determined in uncertainty regions, the colour of the augmented reality image at a corresponding pixel will also be based solely on either the colour value of the first image or the colour value of the second image. In the event that all uncertainty regions are given 0 or 1 values, the boundary in the augmented reality image between the sub-categories of the confidence region will be well-defined and thus the occlusion in an augmented reality image will be clearly defined. In practice, as illustrated in
Accordingly, where it is not possible to form a confident boundary between objects in an augmented reality image, it is possible to control the transition in colour at the boundary between the first and second images so that fewer artefacts from the occlusion are visible. By performing a blend of the colour values of the first image and the second image in this way, it is possible to lessen the impact of artefacts in a manner that is visually pleasing. Moreover, the approaches described herein allow occlusion on a per-pixel basis and also the control of the transition in colour between first and second images when performing occlusion to be performed on a per-pixel basis.
A performance comparison of the iterative method and the cross bilateral filter is illustrated with respect to
The performance of the cross bilateral filter and the iterative method is compared to a simple approach in which it is assumed that determined real depths are accurate and the depth values of the first and second images are simply compared to determine the alpha matte used in combining the images. Put another way, in the simple approach, it is assumed that the entire image is a confidence region and is thus processed accordingly. In the simple approach, any pixels without valid depth values are assumed to lie behind the virtual object. As can be seen from
The present approaches determine blending factor values which indicate the degree to which the colour values at corresponding points in two images are blended. As discussed previously, blending factor values may each indicate the degree of colour blending at a sampling point or within a region. As such, the colour values of each image should correspond with a blending factor value. A plurality of blending factor values may therefore be combined to cover an entire image area, with each blending factor value corresponding to a portion of the image area. In this way, it is possible for a plurality of blending factor values to combine to form an alpha matte comprising a plurality of alpha matte values. The alpha matte values individually indicate the degree of transparency of a particular image. However, when applied in the present arrangement the alpha matte value can be used to indicate the degree to which each of the first image and the second image are to be combined.
The augmented reality processing system described above can be considered to be a standard graphics processing system configured for augmented reality applications. Alternatively, the augmented reality processing system can be considered to be a separate system arranged for the purposes of augmented reality image generation.
In the examples described herein, the comparison of depth values has been such that a first object having lower depth value at a sample point than a second object means the first object is closer to the viewpoint from which the augmented reality image is to be generated. However, in other arrangements, a first object having lower depth value at a sample point than a second object means the first object is further away from the viewpoint from which the augmented reality image is to be generated. For such arrangements, the calculations used to perform categorisation would be reversed as would be understood by the person skilled in the art.
The examples defined herein generate an augmented reality image, which combines first and second images. At least a portion of either or both of the first and second image includes an image of a real scene. Other portions may include imagery of a virtual scene and/or a virtual object. In the example illustrated herein, the first image is an image of a real scene with no virtual object and the second image is a wholly virtual image of a virtual object. In other implementations, the first and/or the second image may comprise wholly or partially virtual components. It will be appreciated that errors arise where at least a portion of either image comprises real captured depth values, which give rise to potential errors in the depth measurements.
In an example, an augmented reality video sequence may be generated using the above-described approach of generating an augmented reality image. Specifically, each frame of the augmented reality video may be generated using the method of
The confidence identification module 2110 and the uncertainty identification module 2120 need not be implemented in a parallel manner as is set out in
The augmented reality processing system 2100 of
The augmented reality processing systems described herein may be embodied in hardware on an integrated circuit. Generally, any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The terms “module,” “functionality,” “component”, “element”, “unit”, “block” and “logic” may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block or logic represents program code that performs the specified tasks when executed on a processor. The algorithms and methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the algorithms/methods. Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.
The terms computer program code and computer readable instructions as used herein refer to any kind of executable code for processors, including code expressed in a machine language, an interpreted language or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language code such as C, Java or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module or library which, when suitably executed, processed, interpreted, compiled, executed at a virtual machine or other software environment, cause a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.
A processor, computer, or computer system may be any kind of device, machine or dedicated circuit, or collection or portion thereof, with processing capability such that it can execute instructions. A processor may be any kind of general purpose or dedicated processor, such as a CPU, GPU, System-on-chip, state machine, media processor, an application-specific integrated circuit (ASIC), a programmable logic array, a field-programmable gate array (FPGA), or the like. A computer or computer system may comprise one or more processors.
It is also intended to encompass software which defines a configuration of hardware as described herein, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code in the form of an integrated circuit definition dataset that when processed in an integrated circuit manufacturing system configures the system to manufacture an augmented reality processing system configured to perform any of the methods described herein, or to manufacture an augmented reality processing system comprising any apparatus described herein. An integrated circuit definition dataset may be, for example, an integrated circuit description.
An integrated circuit definition dataset may be in the form of computer code, for example as a netlist, code for configuring a programmable chip, as a hardware description language defining an integrated circuit at any level, including as register transfer level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS (RTM) and GDSII. Higher level representations which logically define an integrated circuit (such as RTL) may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g. providing commands, variables etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.
An example of processing an integrated circuit definition dataset at an integrated circuit manufacturing system so as to configure the system to manufacture an augmented reality processing system will now be described with respect to
The layout processing system 2304 is configured to receive and process the IC definition dataset to determine a circuit layout. Methods of determining a circuit layout from an IC definition dataset are known in the art, and for example may involve synthesising RTL code to determine a gate level representation of a circuit to be generated, e.g. in terms of logical components (e.g. NAND, NOR, AND, OR, MUX and FLIP-FLOP components). A circuit layout can be determined from the gate level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimise the circuit layout. When the layout processing system 2304 has determined the circuit layout it may output a circuit layout definition to the IC generation system 2306. A circuit layout definition may be, for example, a circuit layout description.
The IC generation system 2306 generates an IC according to the circuit layout definition, as is known in the art. For example, the IC generation system 2306 may implement a semiconductor device fabrication process to generate the IC, which may involve a multiple-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 2306 may be in the form of computer-readable code which the IC generation system 2306 can use to form a suitable mask for use in generating an IC.
The different processes performed by the IC manufacturing system 2302 may be implemented all in one location, e.g. by one party. Alternatively, the IC manufacturing system 2302 may be a distributed system such that some of the processes may be performed at different locations, and may be performed by different parties. For example, some of the stages of: (i) synthesising RTL code representing the IC definition dataset to form a gate level representation of a circuit to be generated, (ii) generating a circuit layout based on the gate level representation, (iii) forming a mask in accordance with the circuit layout, and (iv) fabricating an integrated circuit using the mask, may be performed in different locations and/or by different parties.
In other examples, processing of the integrated circuit definition dataset at an integrated circuit manufacturing system may configure the system to manufacture an augmented reality processing system without the IC definition dataset being processed so as to determine a circuit layout. For instance, an integrated circuit definition dataset may define the configuration of a reconfigurable processor, such as an FPGA, and the processing of that dataset may configure an IC manufacturing system to generate a reconfigurable processor having that defined configuration (e.g. by loading configuration data to the FPGA).
In some embodiments, an integrated circuit manufacturing definition dataset, when processed in an integrated circuit manufacturing system, may cause an integrated circuit manufacturing system to generate a device as described herein. For example, the configuration of an integrated circuit manufacturing system in the manner described above with respect to
In some examples, an integrated circuit definition dataset could include software which runs on hardware defined at the dataset or in combination with hardware defined at the dataset. In the example shown in
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
Assigned to Imagination Technologies Limited (Feb 18, 2020); security interest granted to Fortress Investment Group UK Ltd (Jul 30, 2024).