A method for providing an output image, the method includes: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.
5. A method for providing an output image, the method comprising:
determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels;
applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image;
wherein the input image differs from the output image;
wherein a distance between output image representations of a pair of adjacent input pixels is responsive to an importance of at least one of the pair of adjacent input pixels; and
outputting the output image.
12. A device for providing an output image, the device comprising:
a memory unit adapted to store an input image; and a processor, adapted to:
determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels;
wherein the input image differs from the output image;
wherein the processor is adapted to determine an importance of each of the multiple input pixels in response to motion associated with each of the multiple input pixels, and to output the output image.
8. A method for providing an output image, the method comprising:
determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels;
applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image;
wherein the input image differs from the output image;
wherein the determining is responsive to a saliency score of the input pixels;
wherein the saliency score is computed by locating the coarsest diagonal high frequency frame in which the percentage of wavelet coefficients having values below a first threshold is below a second threshold; and
outputting the output image.
9. A method for providing an output image, the method comprising:
determining an importance value for each input pixel out of multiple input pixels of an input image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels;
applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image;
wherein the input image differs from the output image;
wherein the determining is responsive to a saliency score of the input pixels;
wherein the saliency score is computed by applying a wavelet decomposition process;
wherein the wavelet decomposition process is followed by thresholding a diagonal high frequency image to generate a binary frame;
re-scaling the binary frame; and
smoothing the re-scaled binary frame, and
outputting the output image.
15. A device for providing an output image, the device comprising:
a memory unit adapted to store an input image; and a processor, adapted to:
determine an importance value for each input pixel out of multiple input pixels of an input image and apply on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image, wherein the importance value is at least based on a saliency score, the saliency score based on at least one of: spatial constraints, object detection and/or motion detection for each of the multiple input pixels;
wherein the input image differs from the output image;
wherein the processor is adapted to apply a conversion process in response to at least one of the following constraints:
each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors;
each input pixel is mapped to an output pixel located at substantially a similar location to which upper and lower input pixels are mapped;
an input pixel is mapped to an output pixel located substantially at a same location as an output pixel to which the same input pixel at a previous image was mapped; and
size and shape of the output image; and
output the output image.
1. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to motion associated with each of the multiple input pixels.
2. The method according to
3. The method according to
4. The method according to
6. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the applying is responsive to at least one of the following constraints:
each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors;
each input pixel is mapped to an output pixel located at substantially a similar location to which upper and lower input pixels are mapped;
an input pixel is mapped to an output pixel located substantially at a same location as an output pixel to which the same input pixel at a previous image was mapped; and
size and shape of the output image.
7. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the determining is responsive to a saliency score of the input pixels; wherein the saliency score is computed by applying a wavelet decomposition process.
10. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image; wherein the applying is preceded by applying a data reduction stage; and applying on each of the results of the data reduction stage a conversion process that is responsive to the importance value of the results to provide converted results.
11. A method for providing an output image, the method comprising: determining an importance value for each input pixel out of multiple input pixels of an input image; applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image;
determining an importance value for each input pixel out of multiple input pixels of a group of input images; and
applying on each of the multiple input pixels a conversion process that is responsive to the importance value of the input pixel to provide multiple output pixels that form a group of output images; wherein the input images differ from the output images.
13. The device according to
14. The device according to
16. The method of claim 5 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
17. The method of claim 8 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
18. The method of claim 9 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
19. The device of claim 12 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
20. The device of claim 15 wherein the saliency score is based on the spatial constraints, the object detection and the motion detection for each of the multiple input pixels.
This application is a
S_{i,j,t}(x_{i+1,j,t}−x_{i,j,t})=S_{i,j,t},
More precisely, since the solution is obtained in a least-squares manner, an equation arising from a pixel of importance S_{i,j,t} is as influential as (S_{i,j,t}/S_{i′,j′,t′})² equations arising from a pixel of importance S_{i′,j′,t′}.
It is noted that S is the saliency matrix of Eq. (1), except that the time index appears explicitly. Note that the equation looking right from pixel (i−1, j) can be combined with the equation looking left from pixel (i, j) into one equation:
(S_{i−1,j,t}+S_{i,j,t})(x_{i,j,t}−x_{i−1,j,t})=(S_{i−1,j,t}+S_{i,j,t}) (6)
Boundary Substitutions.
In order to make the retargeted image fit the new dimensions, a constraint is added defining the first pixel in each row of the frame, (1, j, t), to be mapped to the first column of the retargeted video, i.e., ∀j, ∀t: x_{1,j,t}=1. Similarly, the last pixel of each row is mapped to the boundary of the remapped frame: ∀j, ∀t: x_{C,j,t}=C′, where C is the width of the input frame and C′ is the width of the retargeted frame.
Since the mappings of the first and last pixels in each row are known, there is no need to have unknowns for them. Instead, the actual values are substituted whenever x_{1,j,t} or x_{C,j,t} appear in the equations.
Spatial and Time Smoothness
It is important to have each column of pixels in the input image mapped within the boundaries of a narrow strip in the retargeted image; otherwise, the image looks jagged and distorted. This type of constraint is weighted uniformly, and takes the form:
W_s(x_{i,j,t}−x_{i,j+1,t})=0 (7)
In the system W_s=1. In order to prevent drifting, a similar constraint is added, stating that the first and the last pixels of each column have a similar displacement:
W_s(x_{i,1,t}−x_{i,R,t})=0, (8)
where R denotes the number of rows in the frame.
The mapping also has to be continuous between adjacent frames, as stated below:
W^t_{i,j,t}(x_{i,j,t}−x_{i,j,t−1})=0, (9)
where, in order to prevent distortion of faces, the weighting depends on the face detector saliency map: W^t=0.2(1+S^F). Note that, according to an embodiment of the invention, in an on-line mode (real-time mode) the resources do not necessarily allow building a system of equations for the whole shot. Instead, the mapping is computed for each frame given the previous frame's computed mapping. This is a limited-horizon online time-smoothing method, as illustrated in
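By way of a non-limiting illustration only, the following Python sketch assembles such a sparse system for a single frame and solves it in the least-squares sense with scipy's LSQR solver. The function name, the treatment of the boundary constraints as ordinary least-squares equations (rather than by substitution) and the omission of the time-smoothness term (9) are simplifying assumptions of this sketch, not part of the description above.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import lsqr

def retarget_row_mapping(saliency, target_width, w_s=1.0):
    """Compute a horizontal per-pixel mapping x[j, i] for a single frame.

    saliency     : (H, W) array of per-pixel importance values S.
    target_width : width C' of the retargeted frame.
    Interior pixels follow saliency equations of type (6) and the
    column-smoothness equations (7); boundary pixels of each row are tied
    to 1 and C' by plain least-squares equations.
    """
    H, W = saliency.shape
    idx = lambda i, j: j * W + i                     # variable index of pixel (column i, row j)
    rows, cols, vals, rhs = [], [], [], []

    def add_eq(coeffs, b):
        r = len(rhs)
        for c, v in coeffs:
            rows.append(r); cols.append(c); vals.append(v)
        rhs.append(b)

    for j in range(H):
        for i in range(W - 1):
            s = saliency[j, i] + saliency[j, i + 1]  # combined weight, as in Eq. (6)
            add_eq([(idx(i + 1, j), s), (idx(i, j), -s)], s)
        if j < H - 1:
            for i in range(W):                       # spatial smoothness, Eq. (7)
                add_eq([(idx(i, j), w_s), (idx(i, j + 1), -w_s)], 0.0)
        # boundary conditions for this row
        add_eq([(idx(0, j), 1.0)], 1.0)
        add_eq([(idx(W - 1, j), 1.0)], float(target_width))

    A = coo_matrix((vals, (rows, cols)), shape=(len(rhs), H * W)).tocsr()
    x = lsqr(A, np.asarray(rhs))[0]
    return x.reshape(H, W)
```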
Altering the Aspect Ratio of the Input Image
Examples of aspect ratio altering are exhibited in
The format of the retargeted videos is as follows: each frame is divided into three sub frames. The bottom one is the original video frame. The top right sub-frame is the result of applying bi-cubic interpolation to obtain a new frame of half the input width. The top-left sub-frame is the retargeted result.
While the method does not explicitly crop frames, whenever the unimportant regions in the frame lie away from the frame's center, an implicit cropping is created. See, for example, the retargeting result of the sequence Akiyo (
Down-Sizing Results
The down-sampling results (preserving the aspect ratio) are exhibited in
The x-axis and the y-axis warps were computed independently on the original frame and then applied together to produce the output frames. As can be seen, there is a strong zooming-in effect in our results, as necessitated by the need to display large enough objects on a small screen.
It is noted that by using a global error measuring function (such as least squares) the solution tends to uniformly distribute the error across the whole image, rather than concentrate it locally.
Video Expanding
The method can also be used for video expanding. In such a case, however, the desired output depends on the application. In one application, for stills, the task is to keep the original size of the salient objects, while enlarging the video by filling-in the less salient locations with unnoticeable pixels. For such a task, the method can work without any modifications.
In another application, one would like the salient objects to become larger without creating noticeable distortions to the video. A related task is foreground emphasis through non-homogenous warping in-place, where the dimensions of the video remain the same (salient objects are increased in size at the expense of less salient regions). To apply the method in these cases, Equation (6) needs to be altered to have the preferred inflating ratio on the right-hand side. If given by the user or by some heuristic, this is a simple modification. For example, consider an inflation by a fixed factor of two, where the width is increased and the height remains the same.
According to another embodiment of the invention the device and method are adapted to compensate for camera motion, camera zoom-out and camera zoom-in. Accordingly, the method can compensate for (or substantially ignore) motion introduced by camera manipulation and not by an actual movement of the object. This stage can involve incorporating global affine motion registration, into the solution of the optimization problem. In such a case the global motion is compensated before the optimization stage, and added/subtracted from the optimization solution.
According to another embodiment of the invention the method and device can be used to convert an input video stream to a shorter output video stream. This can be applied by computing an optimal per-pixel time warping via a linear system of equations. Each pixel will be mapped to a time-location in the output video that is similar to that of its spatial neighbors. Important pixels are to be mapped to locations in time distinct from their time-line neighbors. Each frame in the output video is assembled using several input frames, such that moving objects do not overlap.
Data Reduction
According to an embodiment of the invention the conversion process can be simplified, and additionally or alternatively applied on multiple frames at once, by reducing the amount of information that is processed during the conversion process. For example, after the importance of each pixel of a frame is calculated, a smaller information set can be used when calculating the conversion process. The smaller information set can include multiple variables, each representative of an importance of multiple pixels; it can include only importance information of a subset of the pixels; and the like. The data reduction can be implemented in various mathematical manners such as, but not limited to, averaging, quantizing, subset selection and the like.
For example, assuming that an input saliency matrix (which includes information about all pixels of the frame) has G elements, then a reduced matrix can include fewer elements (for example G/R). After the smaller matrix is used during the conversion process, the results are up-scaled, for example by using a bilinear filter, a bi-cubic filter, and the like.
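As a non-limiting sketch of such a data reduction, assuming OpenCV's resize function as the down-scaling and up-scaling (bilinear) filter, and a reduction along the height only as in the group-of-frames example that follows:

```python
import cv2

def reduce_saliency(saliency, reduction_factor):
    """Resize a (Height x Width) saliency matrix to (Height/ReductionFactor x Width)."""
    h, w = saliency.shape
    return cv2.resize(saliency, (w, h // reduction_factor),
                      interpolation=cv2.INTER_LINEAR)

def upscale_result(result_small, full_height, full_width):
    """Up-scale a result computed on the reduced matrix back to the frame size,
    e.g. with a bilinear filter as mentioned above."""
    return cv2.resize(result_small, (full_width, full_height),
                      interpolation=cv2.INTER_LINEAR)
```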
Group of Frame Processing
According to an embodiment of the invention the conversion process can be applied on a group of frames. This group can form a shot or a portion of the shot. When the conversion process is applied on frames of a sequence of consecutive frames then the time smoothness can be further improved as a frame is processed not just in relation to a previous frame but also in relation to one or more following frames.
Conveniently, a single conversion process can be applied on a group of frames after performing data reduction on each frame, but this is not necessarily so and depends upon the number of frames, the computational resources and memory resources that can be used during the conversion process and timing requirements as it can be harder to provide real time processing or even almost real time processing on a group of frames.
For example, assuming that the group of images forms a shot, the processing can include: (i) calculating the saliency of every frame in the shot and resizing each saliency matrix (Width×Height) to a reduced matrix (Width×{Height/ReductionFactor}) using bilinear/bicubic interpolation; (ii) generating a combined saliency matrix that includes the different reduced matrices, for example by concatenating the different reduced saliency matrices one after the other, wherein the combined saliency matrix has the following dimensions: Width×(Height×NumberOfMatrices/ReductionFactor); (iii) calculating the optimization matrix with various constraints, such as: (iii.a) X(i,j,t)−X(i,j+1,t)=1; (iii.b) X(i,j,t)−X(i+1,j,t)=0; (iii.c) X(i,j,t)−X(i,j,t+1)=0; (iii.d) X(i,1,1)=1; (iii.e) X(i, Width, NumberOfFrames)=TargetWidth; (iv) adding weights; (v) solving the linear system; and (vi) mapping each frame using the upscaled solution.
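A minimal sketch of steps (i) and (ii), reusing the reduce_saliency helper assumed in the earlier data-reduction sketch, could look as follows:

```python
import numpy as np

def combined_saliency(frame_saliencies, reduction_factor):
    """Concatenate the reduced saliency matrices of a shot one after the other,
    yielding a combined array of shape (Height*NumberOfMatrices/ReductionFactor, Width)."""
    reduced = [reduce_saliency(s, reduction_factor) for s in frame_saliencies]
    return np.concatenate(reduced, axis=0)
```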
Panning
Panning includes emulating a movement of a camera, such as a horizontal movement or rotation. Panning can be introduced when the conversion process is applied on a group of frames. In this case the panning can be represented by selecting portions of a larger frame, wherein the selection of portions provides a panning effect. In this case the conversion process can include mapping pixels of an input frame (whose location within the larger frame changes over time) to an output frame.
Conveniently, the above-mentioned varying boundaries are included in the set of constraints that are solved by the conversion process.
For example, assume that the variable Pan_t is the horizontal panning of frame t. It differs over time to provide the panning effect. The conversion process should then take into account the following constraints: (ia) X(i,1,t)−Pan_t=1; (ib) X(i,n,t)−Pan_t=new_width; (ic) X(i,j,t)+Pan_t−X(i,j,t+1)−Pan_{t+1}=0, multiplied by a weighting factor (for example 0.11); (ii) Pan_t−Pan_{t+1}=0, multiplied by a weighting factor (for example 0.000001); and (iii) Pan_1=0.
Under these constraints the linear system is solved. For each frame a solution matrix is provided, and Pan_t can be subtracted from it. (iv) The solution matrix is upscaled to the original frame size. (v) The frame is remapped.
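The remapping of step (v) can be illustrated, for example, by inverting the monotonic per-row solution and resampling the frame with OpenCV's remap function; this sketch assumes the panning offset has already been subtracted, the solution has been upscaled to the full frame size, and the per-row mapping is monotonically increasing:

```python
import cv2
import numpy as np

def remap_frame(frame, x_map, target_width):
    """Warp a frame horizontally given a forward mapping x_map[j, i] telling
    where input column i of row j lands (1..target_width) in the output frame.
    The monotonic forward map is inverted per row and the frame is resampled
    bilinearly."""
    h, w = x_map.shape
    out_cols = np.arange(1, target_width + 1, dtype=np.float32)
    src_cols = np.arange(w, dtype=np.float32)
    map_x = np.empty((h, target_width), dtype=np.float32)
    for j in range(h):
        # invert the (assumed increasing) mapping: output column -> source column
        map_x[j] = np.interp(out_cols, x_map[j], src_cols)
    map_y = np.repeat(np.arange(h, dtype=np.float32)[:, None], target_width, axis=1)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```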
A device is provided. The device can include hardware, software and/or firmware.
Conveniently, processor 210 includes a local saliency detection module 212, face detection module 214, motion detection module 216 and mapping optimizing module 218. These modules cooperate in order to provide an output image. It is noted that processor 210 can work in an in-line manner, in a partially off-line manner or entirely in an off-line manner. It is further noted that various objects of interest can be detected by processor 210, in addition to or instead of faces. Each module can include software, hardware, firmware or a combination thereof.
Local saliency module 212 calculates local saliency values of pixels. Face detection module 214 detects faces. Motion detection module 216 detects motion. Mapping optimizing module 218 applies the conversion process.
Conveniently, processor 210 is adapted to perform at least one of the following or a combination thereof: (i) determine an importance of an input pixel in response to an importance input pixel mask. The mask can be defined by a user; (ii) determine an importance of an input pixel in response to motion associated with each of the multiple input pixels; (iii) determine an importance of an input pixel in response to a saliency score of the input pixels; (iv) determine an importance of an input pixel in response to an inclusion of an input pixel within an input image that represents a face of a person and/or within an object of interest. The object of interest is predefined and can depend upon the expected content of the image. For example, when viewing sport events the ball can be defined as an object of interest; (v) generate an output image such that a distance between output image representations of a pair of adjacent input pixels is responsive to an importance of at least one of the pair of adjacent input pixels. Thus, for example, a pair of important input image pixels can be mapped to a pair of output image pixels, while a less important pair of input pixels can be mapped to the same pixel or be mapped to output pixels where the distance between their output image representations is less than a pixel; (vi) at least partially compensate for camera induced motion. This motion can result from zoom-in, zoom-out, camera rotation, and the like; (vii) apply an optimal mapping between the input image (original frame or source frame) and the output image (retargeted image); (viii) solve a set of sparse linear equations; (ix) apply a conversion process in response to at least one of the following constraints: each input pixel is mapped to an output pixel that is located at substantially a fixed distance from its left and right neighbors; each input pixel is mapped to an output pixel located at substantially a similar location to which upper and lower input pixels are mapped; an input pixel is mapped to an output pixel located substantially at a same location as an output pixel to which the same input pixel at a previous image was mapped; and size and shape of the output image; (x) perform re-sizing (down-sizing, up-sizing, warping, and the like); (xi) alter an aspect ratio.
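Purely as an illustrative sketch (the combination rule and the weights below are assumptions, not values taught above), a per-pixel importance map combining a local saliency map, a face-detection map and a motion map, in the spirit of items (i)-(iv), might be formed as a weighted sum:

```python
import numpy as np

def importance_map(local_saliency, face_map, motion_map,
                   w_local=1.0, w_face=1.0, w_motion=1.0):
    """Combine saliency sources into one per-pixel importance value.
    All inputs are (H, W) arrays normalised to [0, 1]; weights are illustrative."""
    s = w_local * local_saliency + w_face * face_map + w_motion * motion_map
    peak = s.max()
    return s / peak if peak > 0 else s
```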
The processor is adapted to perform at least one of the mentioned above operations by executing code. It is noted that the adaptation can involve providing hardware circuitry that can assist in executing one or more of the mentioned above stages. The hardware can include memory circuitry, logic circuitry, filters, and the like.
Conveniently, the input image belongs to an input image sequence and processor 210 is adapted to apply a conversion process in response to a relationship between the input image and at least one other input image of the input image sequence.
Processor 210 can execute at least one stage of methods 100 or 300 or a combination thereof. It can, for example, perform data reduction, wavelet decomposition, group of frames processing, and panning.
Stage 110 can include at least one of the following: (i) stage 112 of determining an importance input pixel mask, (ii) stage 113 of determining an importance of an input pixel in response to motion associated with each of the multiple input pixels; the determination can include assigning a binary motion-based saliency score to an input pixel or assigning a non-binary motion-based saliency score to a pixel in response to the amount of motion; (iii) stage 114 of determining an importance of an input pixel in response to a saliency score of the input pixels; (iv) stage 115 of determining an importance of an input pixel in response to an inclusion of an input pixel within an input image that represents a face of a person.
Stage 110 is followed by stage 120 of applying on each of the multiple input pixels a conversion process that is responsive to the interest value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image.
Stage 120 can be preceded by stage 119 of performing a data reduction stage. The performing can include representing each set of pixels by a single variable, ignoring pixel importance information, and the like.
Conveniently, the input image belongs to an input image sequence, and the applying is responsive to a relationship between the input image and at least one other input image of the input image sequence.
Method 300 differs from method 100 by the processing of a group of input images.
Method 300 starts by stage 310 of determining an interest value for each input pixel out of multiple input pixels of the group of input images. These multiple input images can form a shot or a portion of a shot.
Stage 310 can resemble stage 110 but it is applied on pixels of a group of images. It can be applied on these pixels simultaneously.
Stage 310 can include stages that are analogous to stages 111-115 and 119. For example, stage 310 can include applying a data reduction stage to provide results of a data reduction stage. In this case stage 320 will include applying on each of the results of the data reduction stage a conversion process that is responsive to the importance value of the results to provide converted results.
Stage 310 is followed by stage 320 of applying on each of the multiple input pixels of the group of images a conversion process that is responsive to the interest value of the input pixel to provide multiple output pixels that form the output image; wherein the input image differs from the output image. The conversion process of a certain image can be responsive to one or more frames that precede this certain image and to one or more images that follow this certain image.
Stage 320 can include stages that are analogous to stages 121-127. It can also include stage 328 of applying the conversion process on elements of a combined saliency matrix, wherein the combined saliency matrix includes multiple saliency matrices, each representative of a saliency of multiple pixels of a single input image out of the group of input images.
Stages 310 and 320 can be executed in a manner that generates a panning effect. Assume that a sequence of K images (IA(1) . . . IA(K)) forms a shot, that the panning effect includes a movement of the camera from left to right, and that portions of size P×Q pixels should be regarded as the input image. In this case the first input image will include the leftmost P×Q pixels of IA(1), the second input image will include a slightly shifted portion of P×Q pixels of IA(2), and the K'th input image should include the rightmost P×Q pixels of IA(K). The pixels that belong to these input images can be processed by applying stages 310 and 320 to provide a panning effect.
Sample Images
The top row of
Each of
Each of
APPENDIX A illustrates a method and system for providing an output image. In particular, a method and a device for inhomogeneous 2D texture mapping guided by a feature mask are provided. The mapping can apply one or more conversion processes that are responsive to the feature mask. The mapping preserves some regions of the image, such as foreground objects or other prominent parts. The method is also referred to as the method illustrated in appendix A. This method includes receiving an input frame and a feature mask defined by a rough selection of the features of interest, and mapping them (by solving a sparse equation set) to an output frame. If a rigid transformation (rigid mapping) is applied, then the features indicated in the feature mask can undergo (during the mapping) a similarity transformation, possibly at the expense of the background regions in the texture that are allowed to deform more. If a similarity transformation (similarity mapping) is applied, then the size of a feature can be slightly changed.
Appendix A illustrates a method for providing an output image, the method includes: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and between other pixels; wherein the applying comprises solving a sparse equation set.
Conveniently, the mapping process applies a similarity transformation on pixels of features included in the feature mask.
Conveniently, the mapping process allows pixels of features included in the feature mask to slightly change.
Appendix A illustrates a device for providing an output image, the device includes: a memory unit adapted to store an input image and a feature mask defined by a rough selection of the features; and a processor, adapted to: apply a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and between other pixels; wherein the applying comprises solving a sparse equation set.
Conveniently, the processor applies a similarity transformation on pixels of features included in the feature mask.
Conveniently, the processor allows pixels of features included in the feature mask to slightly change.
Appendix A illustrates a computer readable medium that stores instructions for: receiving an input frame and a feature mask defined by a rough selection of the features; applying a mapping process to provide the output image; wherein the mapping process differentiates between pixels of features included in the feature mask and between other pixels; wherein the applying comprises solving a sparse equation set.
Conveniently, the computer readable medium stores instructions for applying a similarity transformation on pixels of features included in the feature mask.
Conveniently, the computer readable medium stores instructions for allowing pixels of features included in the feature mask to slightly change.
Instead of cropping the frames, the device and method shrink them while respecting the salient regions and maintaining the user experience. The proposed device and method are efficient, and the optimization stage includes solving a sparse N×N system, where N is the number of pixels in each frame. The method and device are well adapted to batch applications, but are designed for streaming video since they compute the warp of a given frame based on a small time-neighborhood only, and are fast enough to avoid delays. It is noted that the method and system can also perform up-scaling.
The method and device can be applied to solve several retargeting tasks:
video down/up-sampling, aspect ratio alterations, non-homogenous video expansion, video abstraction, object removal from a video, and object insertion into a video while respecting the saliency. It is noted that object removal is done by zeroing the saliency measure of the object, while object insertion is implemented by placing a new blob of pixels in between existing image pixels and setting the importance of the new pixels to a large value.
The method of appendix A does not distort the regions of interest.
The method of appendix A and the system are able to arbitrarily warp a given image while preserving the shape of its features by constraining their deformation to be a similarity transformation.
In particular, the method and system allow global or local changes to the aspect ratio of the texture without causing undesirable shearing to the features. The algorithmic core of the method and system is a particular formulation of the Laplacian editing technique, suited to accommodate similarity constraints on parts of the domain.
The method illustrated in appendix A is useful in digital imaging, texture design and any other applications involving image warping, where parts of the image have high familiarity and should retain their shape after modification.
In 2D texture mapping applications, images are mapped onto arbitrary 2D shapes to create various special effects; the texture mapping is essentially a warp of the texture image, with constraints on the shape of the boundary or possibly the interior of the image as well. Such texture mapping is common in graphical design and publishing tools, as well as 2D and 3D modeling and animation applications. Commercial design tools usually provide a library of predefined warps, where the user only needs to select the desired mapping type and possibly tune a few parameters. Another option is to interactively design the texture map by selecting and transforming points or curves on the original image; the mapping is computed so as to accommodate such user constraints. It is also possible to apply free-form deformations with grid-based controls.
Texture manipulation in 2D is commonly applied by modelers when texturing 3D models: the texture map often needs to be adjusted and aligned to match particular features of the 3D surface. Constrained texture mapping methods have been developed for this purpose where the user supplies point correspondences between the texture and the 3D model, and a suitable mapping is computed automatically.
Most image mapping and manipulation techniques treat the entire texture image homogeneously. When the deformation applied to an image introduces shearing, e.g. in the simplest situation where the aspect ratio of an image is altered by non-uniform scaling, all the image features are distorted. This may be disturbing when the image contains features with highly familiar shape, such as humans, animals, prominent geometric objects, etc. A typical example of a simple image transformation is shown in
The method illustrated in appendix A is capable of preserving the shape of masked regions of the texture while warping the image according to the user specifications. This feature-aware texture mapping is guided by a feature mask defined by a rough selection of the features; in the mapping result, these features will undergo solely a similarity transformation, possibly at the expense of the background regions in the texture that are allowed to deform more. This method can relate to the texture optimization techniques of Balmelli et al., where the texture map is warped to allow higher pixel budget for the high-frequency details of the texture image.
At a first glance, it seems that a feature-preserving mapping could be achieved by cutting out the features, warping the rest of the image as desired and then pasting the features back and adjusting their orientation and scale. However, this poses several difficulties: (i) precise segmentation of the features with correct alpha-mattes for subsequent seamless compositing is required; (ii) it is not clear how to prescribe the similarity transformation of the features; (iii) texture synthesis needs to be applied for the holes that are likely to form around the features; alternatively, the pasted features could overlap with parts of the warped texture, causing information loss. The above tasks are quite complex; moreover, the tuning of such an algorithm would require a significant amount of user interaction. In contrast, our method does not require a highly accurate matte but rather a loose selection of the features, which can be done using standard selection tools. The method illustrated in appendix A produces coherent, smooth image warps by drawing upon the recent machinery of differential representations and deformation techniques.
Feature-Aware Mapping
The suggested feature-preserving texture mapping technique is first described assuming that an input warping function W: R²→R² is given. Assume that the input image is represented by a regular pixel grid of dimensions m×n. The grid of the input image is denoted by G=(V,E,K), where V={v_1, v_2, . . . , v_N} is the set of node positions (N=mn), E={(i, j)} is the set of directed edges between the nodes and K is the set of quad faces of the grid. Throughout the discussion it is assumed that G is a 4-connected quad grid, although the algorithm can be easily extended to any general meshing of the image. It is assumed that the values of the input mapping W on all the grid nodes v_i are known.
The user provides a feature mask that marks the parts of the image whose shape should be preserved. The mask is denoted by M={m_1, . . . , m_N}, such that m_i=1 if pixel i belongs to a feature and m_i=0 otherwise. The feature node indices are thus F={i s.t. m_i=1}. The method 100 partitions F into its connected components: F=F_1∪F_2∪ . . . ∪F_d (see
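A minimal sketch of this partitioning, assuming the mask is supplied as an m×n binary array and using scipy's connected-component labelling, could be:

```python
import numpy as np
from scipy import ndimage

def feature_components(mask):
    """Partition the binary feature mask M into connected components F1..Fd.
    Returns a list of flat node-index arrays, one per feature component."""
    labels, d = ndimage.label(mask)          # 4-connected labelling by default
    return [np.flatnonzero(labels.ravel() == k) for k in range(1, d + 1)]
```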
A proper shape-preserving transformation is first found for each quad Q=(v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4}) of the grid under the input warp W.
Specifically, denote W(Q)=(W(v_{i_1}), W(v_{i_2}), W(v_{i_3}), W(v_{i_4})), and denote by c(Q) and c(W(Q)) the centroid of Q and the centroid of W(Q), respectively; the centered vertices are then u_{i_k}=v_{i_k}−c(Q) and u′_{i_k}=W(v_{i_k})−c(W(Q)). The free 2×2 transformation that best maps the centered source quad onto the centered warped quad in the least-squares sense is
T_{W,Q}=[u′_{i_1} u′_{i_2} u′_{i_3} u′_{i_4}]·[u_{i_1} u_{i_2} u_{i_3} u_{i_4}]*, (10)
where A* denotes the pseudoinverse of matrix A. In fact, T_{W,Q} is an approximation of the Jacobian of W on Q; if the analytical expression of W is given, T_{W,Q} can be replaced by the Jacobian of W at, say, v_{i_1}.
To extract the rigid component of T_{W,Q} the method performs its singular value decomposition: T_{W,Q}=UΣV^T; the rigid component of T_{W,Q} is then
R_{W,Q}=VU^T. (11)
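As an illustration only, assuming the four centered source vertices and the four centered warped vertices are supplied as 2×4 arrays, T_{W,Q} and its rigid component according to Eq. (11) can be computed as follows:

```python
import numpy as np

def local_transform(U, U_warped):
    """Least-squares 2x2 transform mapping the centered quad vertices U (2x4)
    onto the centered warped vertices U_warped (2x4): T = U_warped * U^* (Eq. (10))."""
    return U_warped @ np.linalg.pinv(U)

def rigid_component(T):
    """Extract the rigid component of T from its SVD, T = U S V^T,
    following Eq. (11): R = V U^T."""
    Uw, _, Vt = np.linalg.svd(T)
    return Vt.T @ Uw.T
```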
To devise the feature-preserving mapping, the method formulates the following optimization problem: all the elements outside of F should undergo a transformation as close as possible to W, and all the elements in F should undergo solely the rigid (or similarity) component of W. It is convenient to formulate the requirements of this optimization per quad. If quad Q=(v_{i_1}, v_{i_2}, v_{i_3}, v_{i_4}) belongs to a feature component F_j, its edges are required to undergo solely the rigid component of W:
ṽ_{i_{k+1}}−ṽ_{i_k}=R_{W,Q}(v_{i_{k+1}}−v_{i_k}), k=1, . . . , 4, (12)
where ṽ_i denotes the unknown position of grid node v_i in the output mapping and the index k+1 is taken cyclically. If Q does not belong to F, its edges are required to follow the input warp:
ṽ_{i_{k+1}}−ṽ_{i_k}=W(v_{i_{k+1}})−W(v_{i_k}), k=1, . . . , 4. (13)
Overall, the method of appendix A obtains an over-determined system of 4|K| equations in 2N unknowns, which can be solved in the least squares sense. Note that the system is separable in the two coordinates, thus we can solve for x and y separately, with the system matrix containing N columns. The method can constrain the boundary nodes to their positions under W to make the optimization problem well-posed:
ṽ_i=W(v_i), ∀i∈∂G. (14)
Solving for ṽ_1, . . . , ṽ_N will provide a mapping that rigidly preserves the features, including their size. To obtain a shape-preserving mapping that allows appropriate scaling of the features, the method can modify the local transformations R_{W,Q} as follows.
The method estimates the average scaling of each connected feature component F_i under W by observing the singular values of the transformations T_{W,Q}. For each element Q∈F_i, the method takes the smaller singular value of T_{W,Q}, and averages those values over all Q∈F_i, obtaining the average scale factor λ_i. Conveniently, the smaller singular values are averaged because, intuitively, if the image is stretched in one direction, the feature size should remain constant. The target local transformations of the quads in each F_i are thus updated to be λ_iR_{W,Q}, and Eq. (12) is modified accordingly.
Smoothing the Mapping
When the input warp W largely deforms the geometry of G, feature shape preservation may be compromised. To compensate for such situations, it is useful to apply weights to Eq. (12), which is responsible for feature preservation: each side of those equations is multiplied by a weight w_F (a sample value being w_F=10). Since a least-squares system of equations is solved, this multiplication results in a w_F²-magnification of the corresponding error terms in the minimization functional, forcing the optimization to respect the features more, at the expense of larger deformation of other areas.
However, since the weights are abruptly discontinuous at the feature boundaries (weighting of 1 outside the feature and w_F≫1 inside), such a solution damages the smoothness of the mapping near the feature boundary. This can easily be corrected by assigning a smoother weighting function: computing a local distance field to the feature and assigning smoothly decreasing weights for the quads in the vicinity of the feature as functions of the distance field. The equations associated with those "transition-quads" are of type (12).
The following polynomial can be used as the decay function:
(15)
where the constant ρ>0 controls the extent of the decay; the weights in the intermediate region around the feature boundaries are thus defined as:
w(Q)=w_F·f(D(Q))+1·(1−f(D(Q))), (16)
where D(Q) is the value of the distance to the feature at the center of Q. The decay radius ρ is set to be the width of two grid cells; outside of this radius the weights are set to 1.
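For illustration only: since the decay polynomial of Eq. (15) is not reproduced above, the sketch below substitutes a generic smoothstep-style polynomial; only the blending structure of Eq. (16) and the sample values w_F=10 and ρ = two grid cells are taken from the description.

```python
import numpy as np

def quad_weight(dist, w_f=10.0, rho=2.0):
    """Blend between the feature weight w_f and 1 as a function of the distance
    `dist` from the quad to the feature, per Eq. (16). The decay f below is an
    assumed smoothstep (f(0)=1, f(rho)=0), not the actual polynomial of Eq. (15)."""
    t = np.clip(dist / rho, 0.0, 1.0)
    f = 1.0 - (3.0 * t ** 2 - 2.0 * t ** 3)
    return w_f * f + 1.0 * (1.0 - f)
```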
Interactive Texture Mapping
Two possible modes of texturing application are differentiated from each other: input-warp mode (described in the previous section) and interactive mode. In both modes, the feature regions of the input image are first specified by a feature mask. In the interactive mode, the user designs the mapping using the standard controls of image boundary editing and/or prescription of inner curve transformations. The mapping is computed taking into account these user-defined constraints and the feature mask, using a deformation technique based on differential coordinates.
These user manipulations are interpreted by the system as positional constraints on the grid nodes, i.e. simply
ṽ_i=c_i, i∈U, (17)
where U is the set of the nodes constrained by the user and ci are the new positions for those nodes.
The mapping of the free grid nodes is decided by applying the Laplacian editing optimization. The goal of this optimization is to create a smooth and as-rigid-as-possible mapping of the grid shape that respects the user constraints (17).
“As-rigid-as-possible” means that if the user-constraints imply solely a rigid (or similarity) transformation of the grid shape, the optimization technique indeed delivers such transformation; otherwise, the optimization finds a mapping that is locally as close as possible to being rigid, which is perceived as an intuitive result. The optimization involves solving a sparse linear system of size 2N×2N.
Once the mapping function W is established in the above manner, its feature-preserving approximation is created according to the feature mask, as described in Section “Feature-aware mapping” above.
Sample Implementation Details
Size | Sys. setup | Factor | Rhs setup | Solve
50 × 100 | 0.156 | 0.110 | 0.015 | 0
100 × 100 | 0.375 | 0.250 | 0.031 | 0.015
100 × 200 | 1.141 | 0.562 | 0.047 | 0.031
200 × 200 | 2.171 | 1.407 | 0.109 | 0.063
Table 1 illustrates timing statistics (in seconds) for the different parts of the mapping algorithm. Sys. setup stands for the setup of the normal equations matrix; Rhs setup denotes building the right-hand side of the normal equations; and Solve stands for the back-substitution. Note that the system setup and matrix factorization are done in a pre-process, once per given image grid.
The algorithmic core of the feature-sensitive texture mapping is the solution of the least-squares optimization expressed by Eqs. (12)-(13) and (14).
When put together, these equations form an over-determined linear system of the form:
A(x y)=(b_x b_y), (18)
where x=(x̃_1, . . . , x̃_N)^T are the x coordinates of the deformed grid and y=(ỹ_1, . . . , ỹ_N)^T are the y coordinates.
The system is separable in the two coordinates, so the system matrix A has N columns. The matrix is very sparse since there are only two non-zero coefficients in each row. The system is solved by factoring the normal equations:
A^TA(x y)=A^T(b_x b_y). (19)
The Taucs library is used for efficient sparse matrix solvers. Cholesky factorization provides a sparse lower-triangular matrix L such that
A^TA=LL^T. (20)
Then, the equations can be solved by double back-substitution:
Lx_temp=A^Tb_x
L^Tx=x_temp, (21)
and in the same fashion for the y component. Thus, a single factorization serves to solve for multiple right-hand sides.
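As an illustrative sketch only (scipy exposes a sparse LU factorization rather than the sparse Cholesky factorization of the Taucs library mentioned above, but the factor-once, back-substitute-per-right-hand-side pattern of Eqs. (19)-(21) is the same):

```python
from scipy.sparse.linalg import splu

def solve_mapping(A, bx, by):
    """Solve the separable system (18) via the normal equations (19):
    factor A^T A once and reuse the factorization for both right-hand sides,
    mirroring the double back-substitution of Eq. (21)."""
    normal = (A.T @ A).tocsc()
    factor = splu(normal)          # sparse LU here; the text uses a sparse Cholesky
    x = factor.solve(A.T @ bx)
    y = factor.solve(A.T @ by)
    return x, y
```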
The construction of the A matrix, the normal equations matrix and the factorization can be attributed to the pre-process, since they only depend on the grid and the feature map of the input image; the matrix factorization is the most computationally-intensive part, taking a few seconds for grids with several tens of thousands of quads. Once the factorization is computed, back substitution is extremely fast (see Table 1).
When varying the input warp function W, only the right-hand side of the system (the b_x, b_y vectors) needs to be updated, followed by back-substitution, so the user can experiment with various mappings in real time. Of course, manipulation of very large images may slow down due to the large dimensions of the system matrix; to maintain interactive response in this case the grid is defined to be slightly coarser than the pixel grid of the input image, so that the size of the system remains in the order of 20000-50000 variables. For example, an image of 1000×1000 pixels can be handled efficiently by defining the size of the grid cells to be 5×5 pixels.
Computing the initial mapping by interactively-placed user constraints (Section "Interactive texture mapping") also requires solving a sparse linear system of size 2N×2N. It is done in the same manner, pre-factoring the system matrix and solely varying the right-hand side of the system when the user manipulates the boundary constraints. Since the back-substitution is fast, the manipulation is interactive, as demonstrated in the accompanying video.
The above-mentioned feature-sensitive texturing system was run on a Pentium 4 3.2 GHz computer with 2 GB RAM. It was assumed that the feature mask comes together with the input image, defined in some external image editing software. During the experiments the feature maps were created in Photoshop using the standard selection tools (Magic Wand, Lasso and Magnetic Lasso). The process of feature selection is quite easy since the feature-aware texturing needs only a rough binary matte.
The inventor experimented with various input warping functions that are commonly available in most image editing packages. The results of unconstrained mapping were compared with the above-mentioned feature-preserving mapping in various figures. It can be clearly seen in all the examples that the above-mentioned mapping preserves the shape of the features while gracefully mimicking the input mapping function. The similarity-preserving mapping allows uniform scaling of the features, and thus has more freedom to approximate the input mapping. For instance, when the input mapping implies enlargement of the image, the similarity-preserving mapping will allow uniform scaling of the features, whereas the rigid mapping will constrain the features to remain at their original size, thus introducing more stretch to the background areas.
In extreme deformation cases, the feature-aware mapping may introduce fold-overs, which may result in texture discontinuity. Preventing self-intersections within the least-squares optimization is quite difficult; it is noted that the method can be adapted to perform post-processing relaxations to fix the fold-overs.
Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and the scope of the invention as claimed. Accordingly, the invention is to be defined not by the preceding illustrative description but instead by the spirit and scope of the following claims.
Wolf, Lior, Cohen-Or, Daniel, Guttman, Moshe