A content-aware image retargeting technique uses “importance filtering” to preserve important information when resizing an image. The image saliency is first filtered, guided by the image itself, to achieve a structure-consistent importance map. The pixel importance is then used as the key constraint in computing the gradient map of pixel shifts from the original resolution to the target resolution. Finally, the shift gradient is integrated across the image by a weighted filtering process to construct a smooth pixel shift-map and render the target image. The weight is again controlled by the pixel importance. The two filtering processes maintain structural consistency while preserving the important contents in the target image. The simple nature of the present filter operations allows for real-time applications and easy extension to video retargeting, as the structural constraints from the original image naturally convey the temporal coherence between frames.
|
1. A method of retargeting a digital original image to a digital target image, said method comprising:
(a) accessing said original image, wherein the size of at least one dimension of said original image is different than the size of the corresponding one dimension in said target image, and a scaling factor relates the size of said one dimension of said original image to the size of said corresponding one dimension in said target image;
(b) constructing an image saliency of said original image;
(c) filtering the image saliency under the guidance of said original image to produce an importance map, SImp, that matches pixels to salient structures in said original image;
(d) estimating a shift-map gradient G according to said scaling factor and under the constraint of said importance map SImp, wherein pixels that are matched to salient structures are assigned a consistently distributed gradient;
(e) defining a shift-map M from said shift-map gradient G, wherein said shift-map M estimates pixel shifts from said original image to said target image; and
(f) applying said shift-map M to said original image to construct said target image;
wherein in step (c):
said importance map SImp defines salient structures, said salient structures being important structures preserved in said target image; and
for each defined salient structure, the pixels matched to the defined structure constitute a group of matched pixels, and the matched pixels within each group are assigned smoothly distributed salient measures.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
10. The method of
11. The method of
said importance map SImp is normalized so that SImp(x, y) ∈ [0,1];
said shift-map gradient G renders bigger gradients for pixels whose importance value as defined by said normalized importance map SImp approaches 0, and renders smaller gradients for pixels whose importance value as defined by said normalized importance map SImp approaches 1.
12. The method of
13. The method of
where α is a scaling factor, σ is the variance and C is a normalization term.
where α is a scaling factor, σ is the variance, and C1 is a normalization term.
where α is a scaling factor, σ is the variance, and C2 is a normalization term.
where α is a scaling factor, σ is the variance, and C3 is a normalization term.
18. The method of
19. The method of
M=Mopt=minM′|∇xM′−G| s.t.M′(0,y)=0,M′(X,y)=|X′−X| where M′ is a pixel shift that when optimized renders the final shift-map M, X is dimensional coordinate of a pixel in said original image along said one dimension of the original image, X′ is dimensional coordinate of the corresponding pixel in said target image along the corresponding one dimension in said target image, and x and y are coordinate in a two dimensional system including said one dimension of the original image.
where x and y are coordinates in a two dimensional system including said one dimension of the original image, r is a fraction of a dimension of the target image, and w is a weight for assuring that pixels with higher importance are affected less than pixels with lower importance.
|
This application claims the benefit of U.S. Provisional Application No. 61/412,645, filed Nov. 11, 2010, under 35 U.S.C. §119(e), which is hereby incorporated by reference in its entirety.
1. Field of Invention
The following relates to image resizing using content-aware image retargeting.
2. Description of Related Art
Due to the fast growing diversity of display devices, an image often needs to be displayed across various imaging devices having different image settings, including different aspect ratios. To assure a good viewing experience in a given display device, it may be necessary to resize, or otherwise modify, an original image for a given display device.
Another option is to compress original image 11 to create a compressed representation 11A that matches the aspect ratio of the display screen 13. This results in a distorted image 13a being displayed. As seen, displayed image 13a experiences an elongated, stretching quality, as compared with original image 11.
A further option is to crop original image 11 to fit display screen 13 by using a cropping frame 15 whose aspect ratio matches that of display screen 13. But cropping results in a displayed image 13B that may omit important image content from original image 11. For example in the present case, in order to include the person standing off from the castle in original image 11, it is necessary to cut off half of the castle in displayed image 13B.
A more recently proposed option is termed content-aware image retargeting, and it aims to remove from view, or otherwise compress, less important image content information while preserving, or otherwise minimizing compression/distortion of, more important image content information. A simplified illustration of this idea is to divide original image 11 into multiple image segments S1 through Sn.
In the present example, image segments S1 through Sn are shown large, wide, and vertical, but this is purely for illustration purposes. More typically, each image segment, or seam, is one pixel wide and follows a path that goes straight or shifts by one-pixel position as it crosses from one pixel row/column to the next in traversing from one side of the image to its opposite side. Preferably, each pixel in the seam's path is selected according to image content importance so that each seam defines a path of least importance as it progresses from one row/column to the next. Thus, the shifting of the seam is the result of the pixel paths identifying the pixels of least importance, i.e., pixels that are part of less important image content.
Content-aware image retargeting may automatically remove seams to reduce an image size, or insert seams to extend it, as necessary, for a given display screen. In the present simplified illustration, the resultant, displayed image 13C preserves the entire castle image content intact as well as the person standing off from the castle, but removes (or optionally compresses) the portion of original image 11 that contains less important image content information, such as the sky and field space between the person and the castle.
An objective of content-aware image retargeting is to preserve the look-and-feel (i.e., visual quality) of the original image. One method of achieving this is to maintain consistency of important structures from the original image to the resized output image (i.e., the target image). However, compression of less important image content often introduces distortion into the structure of more important content. For example, if less important content that surrounds (or is adjacent to) more important content is removed or compressed, this may introduce curvature into the straight lines of the more important content.
Nonetheless because of its ability to preserve important imaging content, content-aware image retargeting has attracted much interest recently. A challenging issue, however, is how to balance a tradeoff between preserving important content, minimizing visual distortion, and maintaining consistency of an image structure.
It is an object of the present invention to provide an improved method of identifying important content for use in content-aware image retargeting.
Another object of the present invention is to preserve important content of an image while minimizing visual distortion and maintaining consistency of image structure with the original image.
These objects are met in a filtering-based technique, hereinafter called “importance filtering”. The present method first filters the image saliency of an input image, guided by the entire input image (i.e. the original image) itself, to achieve a structure-consistent importance map. The pixel importance is then used as the key constraint in computing a gradient map of pixel shifts relative to neighboring pixels. Finally the shift gradient is integrated across the image by a weighted filtering process to construct a smooth pixel shift-map and render the target image. The weight is controlled by the pixel importance. The two filtering processes enforce the maintaining of structural consistency while preserving the important contents in the target image. Furthermore, the simple nature of the present filter operations allows highly efficient implementation for real-time applications and easy extension to video retargeting, as the structural constraints from the original image naturally convey the temporal coherence between frames. The effectiveness and efficiency of the present filtering algorithm are confirmed in extensive experiments.
The above objects are thus met in a method of retargeting a digital original image to a digital target image, said method comprising: (a) accessing said original image, wherein the size of at least one dimension of said original image is different than the size of the corresponding one dimension in said target image, and a scaling factor relates the size of said one dimension of said original image to the size of said corresponding one dimension in said target image; (b) constructing an image saliency of said original image; (c) filtering the image saliency under the guidance of said original image to produce an importance map, SImp, that matches pixels to salient structures in said original image; (d) estimating a shift-map gradient G according to said scaling factor and under the constraint of said importance map SImp, wherein pixels that are matched to salient structures are assigned a consistently distributed gradient; (e) defining a shift-map M from said shift-map gradient G, wherein said shift-map M estimates pixel shifts from said original image to said target image; and (f) applying said shift-map M to said original image to construct said target image.
Preferably in step (b), said image saliency defines a saliency measure for each pixel of said original image, said saliency measure being computed from local image features within said original image according to the saliency of said features.
Further preferably in step (c), said image saliency is filtered under the guidance of the whole of said original image by using a guided filter.
Also in step (c), a face detector is incorporated into the filtering of the image saliency to identify the pixels matched to salient structures, and salient structures are determined from saliency measures in said image saliency.
Moreover in step (c): said importance map SImp defines salient structures, said salient structures being important structures to be preserved in said target image; and for each defined salient structure, the pixels matched to the defined salient structure constitute a group of matched pixels, and the matched pixels within each group are assigned smoothly distributed salient measures. In this case, said matched pixels within each group may be assigned substantially similar salient measures.
Also preferably in step (d), said shift-map gradient G defines each pixel's shift relative to its neighboring pixels.
Additionally in step (d), said consistently distributed gradient is substantially the same gradient value indicative of a minimal shift relative to neighboring pixels.
Preferably in step (d), said consistently distributed gradient is nonlinearly related to said scaling factor. In this case, said consistently distributed gradient is lower than a gradient linearly related to said scaling factor.
Additionally in step (d), said consistently distributed gradient is effective for maintaining distortion of said salient structure lower than distortion of unimportant regions of said original image, as defined by said importance map SImp.
Furthermore in step (d), said importance map SImp is normalized so that SImp(x, y) ∈ [0,1]; said shift-map gradient G renders bigger gradients for pixels whose importance value as defined by said normalized importance map SImp approaches 0, and renders smaller gradients for pixels whose importance value as defined by said normalized importance map SImp approaches 1. In this approach, said shift-map gradient G is preferably non-linear and gradient values drop faster as importance values approach 1. Further preferably, gradient values drop faster when importance values are above 0.75 than when importance values are not above 0.75.
In an embodiment of the present invention, the shift-map gradient G is defined as:
where α is a scaling factor, σ is the variance and C is a normalization term.
The shift-map gradient G may also be defined as:
where α is a scaling factor, σ is the variance, and C1 is a normalization term.
The shift-map gradient G may further be defined as:
where α is a scaling factor, σ is the variance, and C2 is a normalization term.
If preferred, the shift-map gradient G may also be defined as:
where α is a scaling factor, σ is the variance, and C3 is a normalization term.
Additionally in step (e), the defining of said shift-map M from said shift-map gradient G, may include integrating the shift-map gradient G by a weighted filtering process, wherein pixels matched to salient structures are weighted higher than pixels not matched to salient structures.
Further preferably, shift-map M may be defined by the following optimization:
M=Mopt=minM′|∇xM′−G|
s.t.M′(0,y)=0,M′(X,y)=|X′−X|
where M′ is a pixel shift that when optimized renders the final shift-map M, X is the dimensional coordinate of a pixel in said original image along said one dimension of the original image, X′ is the dimensional coordinate of the corresponding pixel in said target image along the corresponding one dimension in said target image, and x and y are coordinates in a two dimensional system including said one dimension of the original image.
Additionally, shift-map M may be defined by the following relationship:
where x and y are coordinates in a two dimensional system including said one dimension of the original image, r is a fraction of a dimension of the target image, and w is a weight for assuring that pixels with higher importance are affected less than pixels with lower importance.
In this case, weight w may be defined as w(x, y)=eS
Other objects and attainments together with a fuller understanding of the invention will become apparent and appreciated by referring to the following description and claims taken in conjunction with the accompanying drawings.
In the drawings wherein like reference symbols refer to like parts.
An image records visual information of a covered scene viewed from a certain angle given predefined imaging parameters. In applying content-aware image retargeting, an original image (i.e. input image) is retargeted to different image parameters, such as a different aspect ratio, which inevitably causes the original visual content to be altered. To preserve the original image's visual quality in the retargeted image (hereinafter, the target image), the important content (hereinafter also, the important contents or important pixels) is preferably maintained close to that of the original image. Pixels that are part of less important content (hereinafter also, unimportant contents or unimportant pixels) have to endure more sacrifice. This generally changes the overall image structure and often leads to visual distortion in the target image. For example, a straight line in the original image may become badly curved in the target image if different parts of the line happen to have different importance designations. Such structural distortion often causes the target image to look much less natural than the original image. Therefore careful and proper treatment is required to minimize such distortion while preserving the important content of the original image. The balance of this tradeoff is the key and most challenging issue for content-aware retargeting.
Several approaches have been developed to address this problem. A comprehensive introduction of recent developments in this area is presented in “A Comparative Study of Image Retargeting”, by Rubinstein et al., ACM SIGGRAPH ASIA, 2010, herein incorporated in its entirety by reference. Rubinstein et al. classify existing methods into two categories: discrete and continuous. Discrete methods (including seam carving and shift-maps) try to remove or copy unimportant pixels while keeping important pixels rigid. Continuous methods (including feature-aware texture mapping, scale-and-stretch, and energy-based deformation) try to compute a non-uniform warping function (from the original image to the target image) that is designed to retain the important contents and warp regions containing unimportant content. To reduce distortion of the overall image structure, both categories of methods use constraints from part of the image features to optimize the retargeting manipulation with local smoothness. However, since the image features are usually computed at individual pixels or in a local patch of pixels, the global image structure can still be distorted in many cases.
In an effort to capitalize on the advantages of the different methods, Rubinstein et al. propose a multi-operator approach that optimizes a combination of several methods in “Multi-Operator Media Retargeting”, by Rubinstein et al. TOG., 2009. This multi-operator method greatly reduces the visual distortion on image structure, but the essential problems of its individual components (i.e. the methods that it combines) still remain.
The result of retargeting input image 21 using a method in accord with the present invention is target image 25.
Target images 23A and 23C show distorted flags and chairs. Target images 23B and 23C exhibit an unnatural stretching quality. Target image 23D cuts off human subjects (i.e. important contents) of input image 21 and moves the roof and chairs. Target image 23E, the result of the multi-operator method, achieves better results than the other illustrated prior art methods.
But as illustrated in target image 25, the method in accord with the present invention improves over that of target image 23E by better retaining prominent areas while minimizing distortion.
Target image 35 shows the results of retargeting input image 31 using the presently preferred method in accord with the present invention. Like before, target image 35 obtained in accord with the present invention retains more of the important image content regions while minimizing visual distortion.
In the presently preferred method, to further minimize the visual distortion on image structure while preserving important contents, the retargeting process is constrained directly using the whole of the original image itself (not merely a patch) such that all the information, global and local, can be used together to enable the optimal overall quality. Based on this concept, an importance filtering algorithm for content-aware image retargeting is developed. The algorithm consists of three major steps.
In the first major step, the image saliency is computed and an importance map based on it is constructed. Preferably, the image saliency used in the present invention builds on the method illustrated in “Human Detection Using a Mobile Platform and Novel Features Derived from a Visual Saliency Mechanism” by Montabone et al., in Image Vision Comput., 2010, herein incorporated in its entirety by reference. Montabone et al.'s image saliency technique was developed to measure visual attractiveness for use in human detection, but was not designed to be consistent with an image structure, e.g. pixels within a single pictured object can have very different saliency.
Therefore the presently preferred method defines an importance map that builds on the saliency measurements by working with the structure of salient objects in the original image. In so doing, the importance map defines image structures consistent with pictured structures in the original image, i.e., pixels that are part of same pictured object (or structure) are preferably assigned the same (or similar) importance value. Such a map is preferably achieved using guided filters, i.e., filters that filter the image saliency under the guidance of the original image. Information on guided filters can be found in “Guided Image Filtering”, by He et al., ECCV, pages 1-8, 2010, herein incorporated in its entirety by reference.
The resulting structure-consistent importance map provides the key constraint to determine how much a pixel is allowed to shift from the original image to the target image. Ideally, neighboring pixels with similar importance should shift together so that the image structure they define will not be distorted. Additionally, important pixels (i.e., pixels having a high importance rating, i.e. they are part of important content as determined by the importance map) should not shift much with respect to neighboring pixels with similar importance ratings, such that their shape in the target image remains close to their shape in the original image. By contrast, unimportant pixels (i.e., pixels having a low importance rating or pixels that are not part of important content) should be allowed greater movement relative to neighboring pixels. Note that these constraints are mainly on the relative shift of neighboring pixels, i.e., the gradient of the pixel shift.
These criteria are addressed in the second major step of the present three-step algorithm. In the second major step, a mapping function to compute the gradient map of pixel shifts based on the importance map is developed.
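By way of illustration only, the second major step may be sketched as follows. The mapping used here, 1 − s^power, is a hypothetical stand-in chosen solely because it yields large gradients for importance near 0, small gradients for importance near 1, and a drop that steepens as importance approaches 1; it is not one of the gradient definitions of the present embodiments. The function name, the `power` parameter, and the per-row normalization to a required total shift are likewise assumptions.

```python
import numpy as np

def shift_gradient_from_importance(s_imp, total_shift, power=3.0):
    """Map a normalized importance map to a shift-map gradient (sketch).

    s_imp       : 2-D array of importance values in [0, 1]
    total_shift : shift magnitude each row must accumulate, e.g. |X' - X|
    power       : hypothetical shape parameter (> 1 steepens the drop near 1)
    """
    s = np.clip(np.asarray(s_imp, dtype=float), 0.0, 1.0)
    # Hypothetical mapping: close to 1 for unimportant pixels, close to 0
    # for important pixels, falling faster as importance approaches 1.
    g = 1.0 - s ** power
    # Small floor so a row made only of important pixels can still be scaled.
    g = g + 1e-6
    # Scale every row so its gradients sum to the required total shift.
    return g * (total_shift / g.sum(axis=1, keepdims=True))
```

In this sketch the unimportant pixels of each row absorb nearly all of the resizing, while pixels with importance near 1 receive an almost zero relative shift.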
The third major step of the present three-step algorithm integrates the shift-map gradients across the image to construct a smooth pixel shift-map and render the target image. Since the shift for retargeting is usually one-dimensional (horizontal or vertical), the direct integration can still be inconsistent along the other dimension. This can lead to visual distortion.
Preferably, an importance-weighted filtering method is used to address this issue. This method forces the integrated pixel shifts along both dimensions to be smooth and consistent across the image. The important pixels are weighted more heavily so that the filtering process favors preserving their associated contents. The combination of the guided filters and importance-weighted filters in the presently preferred algorithm ensures the consistency of overall image structure while preserving the important contents in the target image.
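The integration and importance-weighted filtering of the third major step may be sketched as below. This is a minimal illustration under stated assumptions rather than the preferred implementation: the function names, the window `radius` expressed in pixels, and the weight exp(importance) are all hypothetical, and every row is assumed to have a nonzero total gradient.

```python
import numpy as np

def integrate_shift_map(G, total_shift):
    """Integrate gradients along x; pin M[:, 0] = 0 and M[:, -1] = total_shift."""
    M = np.cumsum(np.asarray(G, dtype=float), axis=1)
    M = M - M[:, :1]                       # left boundary condition
    return M * (total_shift / M[:, -1:])   # right boundary condition

def importance_weighted_smooth(M, s_imp, radius=2):
    """Smooth the shift-map along y with importance-dependent weights."""
    M = np.asarray(M, dtype=float)
    w = np.exp(np.asarray(s_imp, dtype=float))  # assumed weight; grows with importance
    h = M.shape[0]
    num = np.zeros_like(M)
    den = np.zeros_like(M)
    for dy in range(-radius, radius + 1):
        lo, hi = max(0, -dy), min(h, h - dy)
        # Rows [lo, hi) receive contributions from rows [lo + dy, hi + dy).
        num[lo:hi] += w[lo + dy:hi + dy] * M[lo + dy:hi + dy]
        den[lo:hi] += w[lo + dy:hi + dy]
    return num / den
```

Because each output value is a weighted average taken within a single column, the boundary values (zero shift at the left edge, the full shift at the right edge) are unchanged by the smoothing, and rows carrying important pixels pull their vertical neighbors toward their own shift values.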
The method in accord with the present invention differs significantly from prior art content-aware retargeting approaches, such as those described above in reference to
To gauge the effectiveness of the presently preferred embodiment, the present invention was subjected to extensive experiments and comparisons based on the RetargetMe benchmark provided in “A Comparative Study of Image Retargeting”, by M. Rubinstein, D. Gutierrez, O. Sorkine, and A. Shamir, ACM SIGGRAPH ASIA, 2010, herein incorporated in its entirety by reference. The results of these experiments and comparisons confirm the effectiveness and efficiency of the present importance filtering algorithm.
Before discussing the presently preferred embodiment, it may be helpful to first provide an overview of prior art methods of retargeting an input image to a target image. The following is a brief summary of currently known methods of retargeting an input image.
Many algorithms have been proposed for media retargeting across various settings such as aspect ratios. Traditionally this has been achieved by uniformly warping the contents to the target setting or cropping a single important region and discarding the rest. Though such methods maintain overall structural consistency, they often either distort or partially discard the prominent image contents. To better present the important contents, content-aware methods have become the mainstream for media retargeting. Content-aware retargeting was pioneered by the seam carving method disclosed in “Seam Carving for Content-Aware Image Resizing” by Avidan et al. in Transactions on Graphics, 2007, herein incorporated in its entirety by reference.
The seam carving method, in general, resizes an input image by reducing or adding one seam at each iteration. Each seam consists of a continuous chain of the least important pixel from each row or column so that the carving operation should not alter the important contents. This method has been extended to video retargeting, and extended to allowing discontinuous seams to improve the quality of a target video.
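For illustration, one iteration of a seam-carving style reduction can be sketched with dynamic programming, as below. The sketch assumes a precomputed per-pixel importance (energy) map and vertical seams whose column moves by at most one pixel per row; the function names are hypothetical and the code is not taken from the cited reference.

```python
import numpy as np

def find_vertical_seam(importance):
    """Return, for each row, the column of a minimum-importance connected seam."""
    h, w = importance.shape
    cost = importance.astype(float).copy()
    for y in range(1, h):
        prev = cost[y - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # neighbor at x - 1
        right = np.concatenate((prev[1:], [np.inf]))   # neighbor at x + 1
        cost[y] += np.minimum(np.minimum(left, prev), right)
    # Backtrack from the cheapest pixel in the last row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def remove_vertical_seam(image, seam):
    """Delete one pixel per row, reducing the image width by one."""
    h, w = image.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return image[keep].reshape(h, w - 1, *image.shape[2:])
```

Repeating these two calls, recomputing the importance map as needed, narrows the image one seam at a time.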
A multi-operator approach has been proposed to optimize a combination of seam carving with cropping and uniform scaling methods. It reduces the visual distortion on image structure from individual operators and improves the target image quality.
In another approach, instead of manipulating one seam at a time, the shift-map method optimizes the cropping and blending of the important image regions to construct the target image. It thus better preserves the important image contents, though at the risk of significant change on the image structure. This method has also been extended to video retargeting.
Another category of methods tries to compute a continuous warping function from the original image to the target image. The warping is non-uniform in such a way that the important contents receive little change while the un-important areas, e.g. homogeneous regions, suffer the most distortion. One method of achieving this is to apply a similarity constraint when warping user-specified important regions. Another method uses a saliency-weighted linear system to compute the non-uniform mapping for individual pixels. However, this mapping does not enforce constraints to maintain the image structure and thus can lead to visual distortion. To reduce the distortion, another method applies joint bilateral filters on pixel shifts to rectify the image structure. Still another approach proposes an energy optimization scheme to constrain the distortions. Another method divides an image into uniform grids and computes non-uniform warping that is small on important grids and big on un-important ones. The local warping functions are iteratively optimized by enforcing smoothness constraints on neighboring grids. This can reduce the distortion on the overall image structure.
Despite their differences, the existing methods generally use image saliency directly to determine important pixels to retain, and use local smoothness to restrain undesired visual distortion. Although these methods have their benefits, their resultant global image structure is generally still distorted on many occasions due to their image features being generally computed from individual pixels, or local pixel areas (i.e. patches), as indicated in
To achieve improved results over the prior art, the present invention proposes to use the original image directly as a global constraint to guide the retargeting manipulation, which preserves prominent contents as well as minimizes the visual distortion. This method is herein termed “importance filtering.”
Returning now to the method of the present invention, the preferred method resizes an original (i.e. input) image I of width X and height Y to a target image I′ of resolution [X′, Y′]. Without loss of generality, herein an example is illustrated that focuses on a fixed height and a resizing of the width, i.e., X′≠X and Y′=Y. Extension of the present invention to other resizing combinations is considered straightforward and within the scope of one versed in the art.
The basic pipeline of the present importance filtering method in accord with the present invention is illustrated in
Initial image saliency S0 may be a standard saliency measure, which is usually computed from local image features within input image I to obtain a measure of the significance (or saliency) of pixels. Various known saliency measurement methods may be used to obtain initial image saliency S0. Examples of such known saliency measurement methods are the gradient magnitudes method, neighborhood discontinuity method, and patch based visual attention method; all of which have achieved some level of success in image retargeting.
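As a simple illustration of one such standard measure, a gradient-magnitude saliency may be sketched as follows. The function name and the normalization to [0, 1] are assumptions made for this example; the presently preferred embodiment uses a visual attention measure instead.

```python
import numpy as np

def gradient_magnitude_saliency(gray):
    """Initial saliency S0 as the normalized gradient magnitude of a gray image."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)      # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    peak = mag.max()
    # Normalize to [0, 1]; a perfectly flat image stays all zeros.
    return mag / peak if peak > 0 else mag
```

Such a measure responds only to local intensity changes, so pixels inside a uniformly colored object receive little saliency even when the object as a whole is important.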
However, use of a standard image saliency measure directly in a retargeting operation is not considered sufficient for achieving the objects of the present invention. Instead, the present invention combines techniques from face detection with the initial saliency measurement S0 (preferably generated by means of a visual attention saliency measurement method) to achieve the pixel importance map SImp. That is, the presently preferred embodiment combines a visual attention saliency measure method (such as disclosed in “Human detection using a mobile platform and novel features derived from a visual saliency mechanism” by Montabone et al., Image Vision Comput., 2010, herein incorporated in its entirety by reference) with a face detector (such as the face detector disclosed in “Robust real-time face detection” by Viola et al., Int. J. Comput. Vision, 2004, herein incorporated in its entirety by reference) to compute an importance map SImp (i.e. an image saliency that is more robust and consistent across image structures) of the input image I.
Thus, importance map SImp builds on an initial image saliency measure S0. For example in
A goal of the presently preferred content-aware retargeting method is not only to preserve important contents, but also to minimize distortions on pictured objects/structures. To achieve this, the pixels on the same object (i.e. pixels that are part of a common pictured object or structure) should shift in nearly the same way. Since the amount of permissible pixel shift is determined by its importance, or saliency, the importance of all pixels that define the same pictured object should be close to each other (or substantially the same).
The above described prior art saliency measurement methods, including that of Montabone et al., are insufficient for achieving this objective. The presently preferred embodiment therefore defines an importance map that matches pixels to specific (salient) objects, or image structures (i.e. collects pixels into groups that define common pictured objects).
To construct an importance map SImp that matches image structures (i.e. recognizes salient image structures and identifies the pixels that comprise them), the presently preferred embodiment uses a guided filtering method, such as described in “Guided image filtering” by He et al., ECCV, pages 1-8, 2010, herein incorporated in its entirety by reference. Use of the guided filtering method enhances the saliency under the guidance of the original image. It is to be understood that selection of an appropriate guided filtering method is considered a design choice, and other guided filtering methods may be used, such as that described in “Bi-affinity filter: A bilateral type filter for color images” by Gupta et al., ECCV 2010 Workshop on Color and Reflectance in Computer Vision, 2010.
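A gray-scale guided filter of the kind referenced above may be sketched as follows, with the original image as the guide and the initial saliency as the filtering input. The window radius `r`, the regularization `eps`, the edge-replicating box filter, and the restriction to a single-channel guide are assumptions of this sketch and are not parameters stated for the preferred embodiment.

```python
import numpy as np

def box_mean(a, r):
    """Mean over a (2r+1) x (2r+1) window, with edge replication."""
    k = 2 * r + 1
    p = np.pad(a, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    c = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def guided_filter(guide, src, r=8, eps=1e-3):
    """Filter `src` (e.g. saliency S0) under the guidance of `guide` (image I)."""
    I = np.asarray(guide, dtype=float)
    p = np.asarray(src, dtype=float)
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    var_I = box_mean(I * I, r) - mean_I ** 2
    # Local linear model: output = a * I + b inside each window.
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * I + box_mean(b, r)
```

Because the output is locally a linear function of the guide, edges in the filtered saliency follow edges in the original image, which is what allows pixels of one pictured object to receive similar importance values.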
Preferably, the guided filtering method considers the target image as a linear transform of the guidance to constrain the smoothing process. The target image thus nicely resembles the structure of the guidance after filtering. The result of this approach is an importance map, or importance image, such as S2Imp shown in
The last two stages in the process flow of
In general, the retargeting of an input image is achieved basically by shifting the pixel coordinates and/or warping their colors from the input image to the target image, such as indicated by a shift-map. For example, a pixel (x, y) from an input image may be retargeted to a target pixel (x′, y) in a target image with a shift along the x-dimension defined as M(x, y)=x′−x. Such pixel shifts across an input image form a shift-map. The positive or negative sign of the shift value indicates the shift direction and/or indicates shrinking or enlarging. The magnitude (i.e. absolute value) of the shift value indicates the amount of shifting. As is shown in
In the case of uniform scaling, as illustrated by path P1 in
By contrast as illustrated by path P2, the shift-map gradient G2 resulting from using importance map S2Imp of
As shown, the absolute shift values generally increase monotonically with respect to the x-dimension. However, it is also desirable that the shift-map M be smooth along the y-dimension to avoid distortion. A discussion of how the y-dimension may be made smooth is provided below.
An importance map (such as S2Imp of
Clearly these constraints are mainly on the relative shift of neighboring pixels, which is the gradient of the shift-map, or shift-map gradient G. Constant shift gradients refer to a uniform scaling of the local neighborhood. A zero gradient means a rigid translation of the associated area, and a big gradient corresponds to a large deformation. For ease of illustration, the shift gradient is defined along the width dimension (i.e. the x-dimension), as
G(x,y)=∇xM(x,y) (1)
As explained above, shift-map gradient field G2″ of path P1 is defined using uniform scaling, and shift-map gradient field G2 of path P2 is defined using importance filtering in accord with the present invention. As expected, uniform scaling leads to constant gradients while importance filtering results in greatly varying gradients.
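The claim that uniform scaling yields a constant shift gradient can be checked with a small sketch. It assumes the uniform mapping x′ = α·x, so that M(x, y) = (α−1)·x, and approximates ∇x in Eqn. (1) by a forward difference; the helper names are hypothetical.

```python
def uniform_shift_row(width, alpha):
    """Shift M(x) = x' - x for one row under uniform scaling x' = alpha * x."""
    return [(alpha - 1.0) * x for x in range(width)]


def forward_difference(shift_row):
    """Discrete stand-in for the x-gradient of Eqn. (1): M(x+1) - M(x)."""
    return [shift_row[x + 1] - shift_row[x] for x in range(len(shift_row) - 1)]


row = uniform_shift_row(8, 0.5)      # halve the width
gradient = forward_difference(row)   # every entry equals alpha - 1 = -0.5
```

Every entry of `gradient` equals α−1, so all pixels are squeezed equally regardless of content, which is what importance filtering is meant to avoid.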
Comparing the original input image I2 with shift-map gradient G2, it can be seen that the region of the pictured woman is dark (i.e. has low gradient values) and demonstrates a nearly constant gradient so as to maintain that region of input image I2 nearly rigid in the target image (i.e. very little change from the original image to the target image). By contrast, the background area surrounding the region of the pictured woman has a largely varying gradient to allow more severe deformation. This is a desired result.
As is discussed above, a desired shift-map gradient field G is constructed using an importance filtering algorithm based on the importance map. A discussion of this shift-map, gradient mapping function G follows.
The shift-map gradient G is estimated based on the corresponding pixel importance using a non-linear mapping function. Preferably, the importance is normalized such that S2Imp(x, y) ∈ [0,1]. It is desirable that the gradient mapping function render bigger gradients (i.e. bigger pixel shifts relative to neighboring pixels) when pixel importance is closer to 0, and render smaller gradients (i.e. smaller pixel shifts relative to neighboring pixels) when pixel importance is closer to 1. Thus, the shift-map gradient G should be non-linear so that gradient values drop faster as importance values get closer to 1. Preferably, gradient values drop faster when normalized importance values are above 0.75.
The next step is to define a gradient mapping function (i.e. a shift-map gradient G) that has these characteristics. An intuitive choice for such a function is the zero-mean Gaussian function, such that the shift-map gradient G may be defined as:
where σ is the variance and l is the normalization term. A typical value for σ is 0.5.
From Eqn. (1), the integral of shift gradients in a row equals the total pixel shift from the original width X to the target width X′, i.e., |X′−X| = Σ_{x=0}^{X−1} G(x, y). Thus, normalizing term l can be computed as,
and α is the scaling factor.
Incorporating Eqn. (3) into Eqn. (2) simplifies the shift-map gradient function G to:
This indicates that the shift gradient G may be uniformly scaled by |α−1|.
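The normalization constraint can be illustrated with a short sketch. The exact exponent of the Gaussian mapping is an assumption here: the sketch uses exp(−S²/(2σ²)) with σ=0.5, and the function name `gaussian_shift_gradient_row` is hypothetical. What it is meant to show is only the stated behavior: gradients shrink as importance approaches 1, each row sums to |X′−X|, and the whole row scales with |α−1|.

```python
import math


def gaussian_shift_gradient_row(importance_row, alpha, sigma=0.5):
    """Map one row of normalized importance values in [0, 1] to shift gradients.

    Assumed form: zero-mean Gaussian exp(-S^2 / (2 * sigma^2)), normalized so
    the row sums to |X' - X| with X' = alpha * X.
    """
    width = len(importance_row)
    total_shift = abs(alpha - 1.0) * width
    raw = [math.exp(-(s * s) / (2.0 * sigma * sigma)) for s in importance_row]
    norm = total_shift / sum(raw)      # plays the role of the normalization term l
    return [norm * g for g in raw]


row = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0]
g_small = gaussian_shift_gradient_row(row, alpha=1.5)
g_large = gaussian_shift_gradient_row(row, alpha=2.0)
```

Here `g_large` is exactly twice `g_small` element by element, since |α−1| doubles from 0.5 to 1.0 while the Gaussian shape is unchanged.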
With reference to
As an illustration, the results of applying the shift-map gradient G of Eqn. 2 to input image I2 using importance map S2Imp of
On the other hand, when α<1 (indicating a reduction of the input image to the target image, i.e. decreasing the x-dimension of the input image to create the target image) the shift-map gradient for the less-important pixels (shift-map gradient 75) increases linearly with decreasing α. A problem arises because unimportant pixels may be interspersed within important contents or may adjoin important contents. Since the shrinking operation basically squeezes or even removes less-important pixels, the up-scaled shift-map gradient leads to more severe cutting (i.e. removing) of less-important pixels. Unfortunately, some of the less-important pixels that are cut may be within important contents, as shown in target image 72 where α=0.7 and target image 71 where α=0.4. As is particularly evident from target image 71, because of the severe cutting within unimportant areas adjoining the important content comprising the pictured woman, right and left portions of the pictured woman are cut off.
It would therefore be desirable for a shift-map gradient to respond differently to changes in α for important content when α>1 (i.e. when the input image is being enlarged) than the linear response shown in shift-map gradient 70. It would also be desirable for a shift-map gradient to respond differently to changes in α for unimportant content when α<1 (i.e. when the input image is being reduced) than the linear response shown in shift-map gradient 75.
Specifically for important content, when α is greater than 1, increases in the shift-map gradient with increasing α should be limited since the important content from the input image would fully fit with minimal modification within the target image's enlarged aspect ratio. Similarly for unimportant content, when α is less than 1, increases in the shift-map gradient with decreasing α should be reduced to avoid severe distortion along areas of important content that adjoin unimportant content.
That is, to avoid the above-described distortion, the shift map gradient function should be designed in such a way that, when α>1, the shift-map gradient for an important pixel starts to drop quickly with increasing α. Additionally when α<1, the growth with decreasing α, of the shift-map gradient for a less-important pixel should be slower than the linear growth provided by Eqn. (2). In this way, both the undesired deformation that comes with image enlargement and the undesired cutting of prominent areas that comes with image reductions can be reduced. Accordingly alternate designs for the shift-map gradient function are now presented, as follows:
where the normalization terms are obtained analogously to the computation of the normalization terms in Eqns. (3) and (4), above.
As shown, the shift-map gradient functions of Eqns. (5), (6), and (7) satisfy the above-specified desired effect, but each may serve better for different cases. For important content, when α>1, as shown in
For unimportant content, during reduction of an input image (i.e. when α<1) the gradient values of less-important pixels for all of Eqns. (5), (6) and (7) are always lower than those provided by Eqn. (2). As a result, all three Eqns. (5), (6) and (7) achieve less cutting off of prominent areas than Eqn. (2).
Among Eqns. (5), (6) and (7), Eqn. (5) provides the largest gradient values and thus permits the most cutting-off of prominent areas. Eqn. (6) provides higher values than Eqn. (7) for α>0.6, but lower gradient values than Eqn. (7) for α<0.6. Thus for image reduction, Eqn. (7) leads to the least cutting off of prominent areas when α>0.6 and Eqn. (6) leads to the least cutting off of prominent areas when α<0.6.
For illustration purposes, a comparison of three sets of target images with α set to 0.4, 0.7 and 1.5, and created using the shift-map gradient fields of Eqns. (2), (5), (6) and (7), are shown in
These shift-map gradient functions can be easily combined into one function with more stable or better overall performance. However, they are herein discussed individually for ease of explanation, and it is left as a design choice to determine the best function (or best combination of functions) for a given specific need. For example, selection of a specific shift-map gradient function may depend on whether an input image is being enlarged (i.e. α>1) and whether one wishes to maintain important content with least change. Selection may also depend on whether an input image is being reduced (i.e., α<1), on the amount of reduction (i.e. the scaling value of α), and on whether one wishes to reduce the potential cutting off of prominent areas.
For ease of explanation, unless otherwise stated, the provided examples herein, assume the use of Eqn. (5) for construction of shift-map gradient function G. For example, shift-map gradient field G2 in
This leads to the topic of gradient integration of importance-weighted filtering.
Once the shift-map gradients G are constructed, one can integrate them to estimate the pixel shift-map M and render the target image I′, as illustrated in
M = Mopt = min_{M′} |∇xM′ − G|
s.t. M′(0, y) = 0, M′(X, y) = |X′ − X| (8)
where M′ is a pixel shift that when optimized renders the final shift-map M. Such an optimization process is often computationally expensive. Furthermore, since the pixel shift is only one-dimensional, integration along individual rows may still be inconsistent with each other. This inconsistency will cause undesired visual distortion in the target image.
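The row-to-row inconsistency mentioned above can be seen in a minimal sketch of direct integration, in which each row of the gradient field is summed independently. The helper name `integrate_rows` and the toy values are illustrative only.

```python
from itertools import accumulate


def integrate_rows(gradient):
    """Integrate each row of the shift gradient independently (running sum)."""
    return [list(accumulate(row)) for row in gradient]


# Two rows with the same total shift (4) but differently distributed gradients.
gradient = [[1, 1, 2],
            [2, 1, 1]]
shift = integrate_rows(gradient)   # [[1, 2, 4], [2, 3, 4]]
```

Both rows reach the same final shift, yet vertically adjacent pixels in the first two columns receive different shifts, so a vertical edge crossing those rows would be bent in the target image.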
One straight-forward solution to this problem is direct gradient integration followed by shift-map smoothing using box filters. However smoothing the shift-map directly has disadvantages. First, it often leads to artifacts such as blurriness, holes, and pixel swaps, especially across object borders. Second and more importantly, smoothing by box filters may not be able to rectify shift inconsistency across the image. Since this inconsistency is accumulated over the columns during integration, it can be big everywhere in later parts of an image. Additionally, since box filters smooth pixel shifts in only small local neighborhoods, it is unlikely that they can restore global consistency to the target image. An example of an enlarged target image created using this straight-forward approach of direct gradient integration followed by shift-map smoothing using box filters is target image 81 in
A better solution to the optimization problem of Eqn. (8) is an efficient algorithm that incorporates importance-weighted filtering into the integration process. It has been found that this preferred solution better retains consistency across the target image. Specifically, at each step one pixel in each row is integrated at the present column. The shift integral at this pixel equals its gradient plus an importance-weighted average of the pixel shifts in a large neighborhood within the previous column. The formulation for this preferred solution is,
It defines a one-dimensional column filter of size (2r+1). The typical choice for r is a quarter of the image height. The high efficiency of a one-dimensional filter allows such a large kernel size, and averaging over a large neighborhood enables the integrated shift-map to be smooth and consistent in both dimensions (i.e. in both the x and y directions). The weight w is designed in such a way that the averaging filter does not affect the important pixels as much as the unimportant ones. In other words, in the filtering process the shift from the important pixels should contribute more (i.e. have a higher weight) so that their shape will not be distorted by the nearby unimportant pixels. Hence, the weight w is defined based on the pixel importance as follows,
w(x,y)=eS
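The column-by-column integration described in prose above can be sketched as follows. This is a sketch under stated assumptions, not the exact formulation: the weight is taken to be exp(importance), the window is clipped at the top and bottom of the image, and the shift preceding the first column is taken as zero. The function name `integrate_weighted` is hypothetical.

```python
import math


def integrate_weighted(gradient, importance, r):
    """Integrate shift gradients column by column with importance weighting.

    gradient, importance: lists of rows (height x width), importance in [0, 1].
    Each shift equals the pixel's gradient plus an importance-weighted average
    of the previous column's shifts over rows y-r .. y+r.
    Assumptions: weight = exp(importance); previous shift is 0 for column 0.
    """
    height, width = len(gradient), len(gradient[0])
    shift = [[0.0] * width for _ in range(height)]
    for x in range(width):
        for y in range(height):
            if x == 0:
                previous = 0.0
            else:
                rows = range(max(0, y - r), min(height, y + r + 1))
                weights = [math.exp(importance[j][x - 1]) for j in rows]
                weighted = sum(w * shift[j][x - 1] for w, j in zip(weights, rows))
                previous = weighted / sum(weights)
            shift[y][x] = gradient[y][x] + previous
    return shift
```

With r=0 the filter degenerates to independent row integration. With a larger r, rows share their accumulated shifts, and rows of higher importance pull the average toward their own shift, which is the behavior the weighting is designed to produce.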
As an example, shift-map M2 in
The target image can then be easily warped based on the pixel shifts defined by the thus constructed pixel shift-map M. For comparison purposes, an enlarged target image created using the present method of gradient integration of importance-weighted filtering is illustrated in target image 82 of
The above-described system and method were tested on various input images on a PC with a 2.53 GHz Duo CPU. Without code optimization, it takes less than 80 ms to retarget an input image of 1024×768, without the need for down-sampling. The present system is flexible enough to provide both a fully automatic solution and interactive ways for users to select areas of an input image to preserve. However, all results shown herein were achieved by the automatic solution.
As shown, the important contents of input image I3 are well preserved without distortion in target image I3′. Additionally, the overall image structure of input image I3 is also preserved in target image I3′.
The presently preferred method was compared with state-of-art methods in the reduction of an input image by half along the x-dimension (α=0.5). Target images created using each method are shown in
Thus,
As shown, the presently preferred method achieves the best overall balance between retaining the prominent contents and minimizing the distortion on pictured structures. The other methods generally work well but lead to occasional noticeable artifacts.
When using the improved seam carving method, abrupt distortions occur on the human bodies in the first row and the house shapes in the third and fifth rows of
The Shift map method produces smooth and natural images after resizing, but it results in severe cutting off of important contents in almost all the provided examples. The Shift map method can also alter the pictured structures significantly, such as exhibited by the shoulder of the girl in the second row of
The scale-and-stretch method avoids abrupt distortions and achieves smooth image structure, but important contents, such as the humans in the first two rows, the house shadow in the third row, and the street and sidewalk in the fifth row are stretched or squeezed unnaturally.
In many cases, images resized by these methods, especially those resized by the scale-and-stretch method and the shift map method, may by themselves look quite realistic. But when placed together with the original image, significant changes on image structure or prominent contents can be observed. The presently preferred method tends to minimize such changes.
Among the existing works, the multi-operator method (examples of which are shown in
As shown in
The present method directly warps the image pixels based on the integrated shift-map to render the final target image. In case the unimportant areas are squeezed very much, the direct mapping may result in artifacts like discontinuity in the resized image. For example, in the second row of
The present method can be extended to content-aware video retargeting in a straight-forward manner, basically by frame-by-frame application of the present image resizing method. A small difference is that a motion feature, the motion energy image (MEI), is added to the saliency cues, as would be understood by those versed in the art. For each frame, the MEI is computed using a neighborhood of 20 frames and directly added to its image saliency to construct the combined saliency. The rest of the process is identical to that of image retargeting described above. Since the original image frame is used as guidance in the filtering processes, the presently preferred method is able to naturally maintain the temporal coherence in the retargeted video without the need for special care. For a video of 640×256, the present method achieves 15 fps with pre-computed saliency or 6 fps including saliency calculation using the above-mentioned PC. The present method does not consider the global camera motion and thus works on videos captured by nearly fixed cameras. However, camera motion can be compensated by frame registration, as is known in the art.
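A minimal sketch of the motion cue follows, assuming the common definition of a motion energy image as the union of thresholded frame differences over a temporal window. The centered window, the difference threshold, and the function names are assumptions made for illustration; the source states only that a neighborhood of 20 frames is used and that the MEI is added directly to the image saliency.

```python
def motion_energy_image(frames, t, window=20, threshold=0.1):
    """Binary MEI for frame t: 1.0 where any nearby frame difference exceeds threshold.

    frames: list of grayscale frames, each a list of rows of floats.
    Assumptions: window centered on t and clipped to the sequence; fixed threshold.
    """
    half = window // 2
    first = max(1, t - half)
    last = min(len(frames) - 1, t + half)
    height, width = len(frames[0]), len(frames[0][0])
    mei = [[0.0] * width for _ in range(height)]
    for k in range(first, last + 1):
        for y in range(height):
            for x in range(width):
                if abs(frames[k][y][x] - frames[k - 1][y][x]) > threshold:
                    mei[y][x] = 1.0
    return mei


def combined_saliency(saliency, mei):
    """Add the motion cue directly to the per-frame image saliency."""
    return [[s + m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(saliency, mei)]
```

The combined map would then go through the same guided filtering, gradient mapping, and weighted integration steps as a still image.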
In summary, the above-described importance filtering algorithm for content-aware image retargeting directly uses the original image as the constraint to filter and estimate pixel importance so that it is consistent with the original image's pictured structure. This is key to minimizing visual distortion while preserving prominent image contents. The constraint is applied on the gradient of pixel shift, instead of directly on pixel shift. This further avoids undesired distortion such as pixel swap that occurs in many prior art methods. The importance filtering operations are highly efficient and ready for real-time applications. A simple extension to video retargeting is also shown to be promising.
One potential improvement to the importance filtering algorithm is to extend the one-dimensional shift gradients to two dimensions. Even though the pixels all shift along the same dimension, the shift-map on the two-dimensional image has a two-dimensional gradient field.
While the invention has been described in conjunction with several specific embodiments, it is evident to those skilled in the art that many further alternatives, modifications and variations will be apparent in light of the foregoing description. Thus, the invention described herein is intended to embrace all such alternatives, modifications, applications and variations as may fall within the spirit and scope of the appended claims.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
May 02 2011 | Seiko Epson Corporation | (assignment on the face of the patent) | / | |||
May 02 2011 | DING, YUANYUAN | EPSON RESEARCH AND DEVELOPMENT, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026212 | /0327 | |
May 02 2011 | XIAO, JING | EPSON RESEARCH AND DEVELOPMENT, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026212 | /0327 | |
May 10 2011 | EPSON RESEARCH AND DEVELOPMENT, INC | Seiko Epson Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026681 | /0400 |
Date | Maintenance Fee Events |
Jan 12 2017 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Mar 15 2021 | REM: Maintenance Fee Reminder Mailed. |
Aug 30 2021 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |