A technique for improving image compression by pre-processing the image frames. In particular, methods for de-interlacing and noise reduction using combinations of median filters, applied both spatially and temporally, with and without motion analysis, are described.

Patent: RE44235
Priority: Jan 30, 1996
Filed: Feb 06, 2012
Issued: May 21, 2013
Expiry: Jan 30, 2016
Assignee: Dolby Laboratories Licensing Corporation

10. A method for enhancing image quality in an image encoding system, including:
applying a normal down filter to an image to create a first intermediate image;
applying a Gaussian up filter to the first intermediate image to create a second intermediate image; and
adding a weighted fraction of the second intermediate image to a selected image to create an image having reduced high frequency noise.
12. A method for enhancing image quality in an image system, the method comprising:
applying a filter to adjacent pixel values of a video image to generate a filtered value for motion compensation with sub-pixel displacement, the filter including a first negative lobe, a second negative lobe, and a positive lobe disposed between the first and second negative lobes;
wherein an absolute amplitude of each of the negative lobes is less than an absolute amplitude of the positive lobe.
3. A method for enhancing image quality in an image encoding system, including creating a noise-reduced digital video image comprising a linear weighted sum of five terms:
a current digital video image;
an average of horizontal and vertical medians of the current digital video image;
a thresholded temporal median;
an average of horizontal and vertical medians of the thresholded temporal median; and
a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image,
wherein the weights of the five terms are approximately 50%, 15%, 10%, 10%, and 15%, respectively.
4. A method for enhancing image quality in an image encoding system, including creating a noise-reduced digital video image comprising a linear weighted sum of five terms:
a current digital video image;
an average of horizontal and vertical medians of the current digital video image;
a thresholded temporal median;
an average of horizontal and vertical medians of the thresholded temporal median; and
a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image,
wherein the weights of the five terms are approximately 35%, 20%, 22.5%, 10%, and 12.5%, respectively.
6. A method for enhancing image quality in an image encoding system, including:
determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and
applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image,
wherein each digital video image is a three-field-frame de-interlaced image.
7. A method for enhancing image quality in an image encoding system, including:
determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and
applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image,
wherein each digital video image is a thresholded three-field-frame de-interlaced image.
8. A method for enhancing image quality in an image encoding system, including:
determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and
applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image,
wherein the center weighted temporal filter is a three-image temporal filter having weights for each of such images of approximately 25%, 50%, and 25%, respectively.
9. A method for enhancing image quality in an image encoding system, including:
determining a motion vector for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image; and
applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image,
wherein the center weighted temporal filter is a five-image temporal filter having weights for each of such images of approximately 10%, 20%, 40%, 20%, and 10%, respectively.
1. A method for enhancing image quality in an image encoding system, including:
applying a temporal median filter to corresponding pixel values of a previous digital video image, a current digital video image, and a next digital video image to create a noise-reduced digital video image;
comparing the difference between each corresponding pixel value of each noise-reduced digital video image and each corresponding current digital video image to a threshold value to generate a difference value; and
selecting, for each final pixel value for the noise-reduced digital video image, a corresponding pixel value from the current digital video image if the difference value is within a first threshold comparison range, and a corresponding pixel value from the noise-reduced digital video image if the difference value is within a second threshold comparison range.
2. A method for enhancing image quality in an image encoding system, including:
applying a temporal median filter to corresponding pixel values of a previous digital video image, a current digital video image, and a next digital video image to create a noise-reduced digital video image;
comparing the difference between each corresponding pixel value of each noise-reduced digital video image and each corresponding current digital video image to a threshold value to generate a difference value; and
selecting, for each final pixel value for the noise-reduced digital video image, a corresponding pixel value from the current digital video image if the difference value is within a first threshold comparison range, and a corresponding pixel value from the noise-reduced digital video image if the difference value is within a second threshold comparison range,
wherein the threshold value is selected from the range of approximately 0.1 to approximately 0.3.
5. A method for enhancing image quality in an image encoding system, including:
creating a noise-reduced digital video image comprising a linear weighted sum of five terms:
a current digital video image;
an average of horizontal and vertical medians of the current digital video image;
a thresholded temporal median;
an average of horizontal and vertical medians of the thresholded temporal median; and
a median of the thresholded temporal median and horizontal and vertical medians of the current digital video image;
determining a motion vector for each n×n pixel region of the current digital video image with respect to at least one previous digital video image and at least one subsequent digital video image;
applying a center weighted temporal filter to each n×n pixel region of the current digital video image and corresponding motion-vector offset n×n pixel regions of the at least one previous digital video image and at least one subsequent digital video image to create a motion-compensated image; and
adding the motion-compensated image to the noise-reduced digital video image.
11. The method of claim 10, wherein the weighted fraction is between approximately 5% and 10% of the second intermediate image.
13. The method of claim 12, wherein the filter comprises only four values.
14. The method of claim 13, wherein the positive lobe comprises only first and second values of the four values, the first and second values differing from each other.
15. The method of claim 14, wherein the first value is greater than the second value.
16. The method of claim 14, wherein the first value is less than the second value.
17. The method of claim 12, wherein the filter comprises at least four values.
18. The method of claim 17, wherein the positive lobe comprises a first value and a second value, the first and second values differing from each other.
19. The method of claim 18, wherein the first value is greater than the second value.
20. The method of claim 18, wherein the first value is less than the second value.
21. The method of claim 12, wherein the adjacent pixel values correspond to adjacent pixels horizontally aligned.
22. The method of claim 21, further comprising applying the filter to adjacent pixel values vertically aligned.
23. The method of claim 12, wherein the adjacent pixel values correspond to adjacent pixels vertically aligned.
24. The method of claim 12, wherein all positive lobes are disposed between the first and second negative lobes.
Rdiff=R_single_field_de-interlaced minus R_three_field_de-interlaced
Gdiff=G_single_field_de-interlaced minus G_three_field_de-interlaced
Bdiff=B_single_field_de-interlaced minus B_three_field_de-interlaced
ThresholdingValue=abs(Rdiff+Gdiff+Bdiff)+abs(Rdiff)+abs(Gdiff)+abs(Bdiff)

The ThresholdingValue is then compared to a threshold setting. Typical threshold settings are in the range of 0.1 to 0.3, with 0.2 being most common. FIG. 3 shows a block diagram of this threshold test. The PROCESSING block 30 multiplies the inputs by [0.25, 0.5, 0.25] and sums the results. The SELECTION CONTROL block 32 compares the output 36 of the PROCESSING block 30 with Input B 34 using the above equations for Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects the PROCESSING output 36 if the ThresholdingValue is less than the threshold, otherwise the switch selects Input B 34, the middle value, for the output 38.
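For illustration, the following is a minimal NumPy sketch of this threshold test, assuming float RGB frames normalized to [0, 1]; the function and variable names are illustrative rather than from the patent:

import numpy as np

def threshold_select(prev_sf, cur_sf, next_sf, threshold=0.2):
    # PROCESSING block 30: [0.25, 0.5, 0.25] combination of three successive
    # single-field-frame de-interlaced pictures (the three-field-frame result).
    three_field = 0.25 * prev_sf + 0.5 * cur_sf + 0.25 * next_sf
    # SELECTION CONTROL block 32: per-pixel difference between the smooth
    # combination (output 36) and the middle input (Input B 34).
    diff = cur_sf - three_field
    r, g, b = diff[..., 0], diff[..., 1], diff[..., 2]
    thresholding_value = np.abs(r + g + b) + np.abs(r) + np.abs(g) + np.abs(b)
    # Select the smooth result below the threshold; keep the single-field
    # pixels above it.
    below = (thresholding_value < threshold)[..., None]
    return np.where(below, three_field, cur_sf)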

In order to remove noise from this threshold, smooth-filtering the three-field-frame and single-field-frame de-interlaced pictures can be used before comparing and thresholding them. This smooth filtering can be accomplished simply by down filtering (e.g., down filtering by two), and then up filtering (e.g., using a gaussian up-filter by two). This “down-up” smoothed filter can be applied to both the single-field-frame de-interlaced picture and the three-field-frame de-interlaced picture. The smoothed single-field-frame and three-field-frame pictures can then be compared to compute a ThresholdingValue and then thresholded to determine which picture will source each final output pixel.

In particular, the threshold test is used as a switch to select between the single-field-frame de-interlaced picture and the three-field-frame temporal filter combination of single-field-frame de-interlaced pictures. The result is an image whose pixels come from the three-field-frame de-interlacer in areas where that image differs only slightly (i.e., below the threshold) from the single-field-frame image, and from the single-field-frame image in areas where the three-field-frame image differed by more than the threshold amount from the single-field-frame de-interlaced pixels (after smoothing).

This technique has proven effective in preserving single-field fast motion details (by switching to the single-field-frame de-interlaced pixels), while smoothing large portions of the image (by switching to the three-field-frame de-interlaced temporal filter combination).

In addition to selecting between the single-field-frame and three-field-frame de-interlaced images, it is also often beneficial to add a bit of the single-field-frame image to the three-field-frame de-interlaced picture, to preserve some of the immediacy of the single-field pictures over the entire image. This immediacy is balanced against the temporal smoothness of the three-field-frame filter. A typical blending is to create a new frame by adding 33.33% (⅓) of a single middle field-frame to 66.67% (⅔) of the corresponding three-field-frame smoothed image. This can be done before or after threshold switching, since the result is the same either way, only affecting the smoothed three-field-frame picture. Note that this is effectively equivalent to using a different proportion of the three field-frames, rather than the original three-field-frame weights of [0.25, 0.5, 0.25]: computing ⅔ of [0.25, 0.5, 0.25] plus ⅓ of (0, 1, 0) yields [0.1667, 0.6666, 0.1667] as the temporal filter for the three field-frames. The more heavily weighted center (current) field-frame brings additional immediacy to the result, even in the smoothed areas which fell below the threshold value. This combination has proven effective in balancing temporal smoothness with immediacy in the de-interlacing process for moving parts of a scene.

Use of Linear Filters

Sums, filters, or matrices involving video pictures should take into account the fact that pixel values in video are non-linear signals. For example, the video curve for HDTV admits several variations of coefficients and factors, but a typical formula is the international CCIR XA-11 curve (now called Rec. 709):
V = 1.0993 * L^0.45 − 0.0993   for L > 0.018051
V = 4.5 * L                    for L <= 0.018051

where V is the video value and L is linear light luminance.

The variations adjust the threshold (0.018051) a little, the factor (4.5) a little (e.g. 4.0), and the exponent (0.45) a little (e.g., 0.4). The fundamental formula, however, remains the same.
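As a sketch, assuming V and L are normalized to [0, 1], the curve and its inverse can be written as:

import numpy as np

def linear_to_video(L):
    L = np.asarray(L, dtype=np.float64)
    return np.where(L > 0.018051, 1.0993 * L ** 0.45 - 0.0993, 4.5 * L)

def video_to_linear(V):
    V = np.asarray(V, dtype=np.float64)
    # Invert each branch; the breakpoint in V is 4.5 * 0.018051, about 0.0812.
    return np.where(V > 4.5 * 0.018051,
                    ((V + 0.0993) / 1.0993) ** (1.0 / 0.45),
                    V / 4.5)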

A matrix operation, such as a RGB to/from YUV conversion, implies linear values. The fact that MPEG in general uses the non-linear video values as if they were linear results in leakage between the luminance (Y) and the color values (U and V). This leakage interferes with compression efficiency. The use of a logarithmic representation, such as is used with film density units, corrects much of this problem. The various types of MPEG encoding are neutral to the non-linear aspects of the signal, although their efficiency is affected by the use of the RGB to/from YUV matrix conversion. YUV (U=R−Y, V=B−Y) should have Y computed as a linearized sum of 0.59 G plus 0.29 R plus 0.12 B (or slight variations on these coefficients). However, U (=R−Y) becomes equivalent to R/Y in logarithmic space, which is orthogonal to luminance. Thus, a shaded orange ball will not vary the U (=R−Y) parameter in a logarithmic representation; the brightness variation will be represented completely in the luminance parameter, where full detail is provided.

The linear vs. logarithmic vs. video issue impacts filtering. A key point to note is that small signal excursions (e.g., 10% or less) are approximately correct when a non-linear video signal is processed as if it were a linear signal. This is because a piece-wise linear approximation to the smooth video-to/from-linear conversion curve is reasonable. However, for large excursions, a linear filter is much more effective, and produces much better image quality. Accordingly, if large excursions are to be optimally coded, transformed, or otherwise processed, it is desirable to first convert the non-linear signal to a linear one in order to be able to apply a linear filter.

De-interlacing is therefore much better when each filter and summation step utilizes conversions to linear values prior to filtering or summing. This is due to the large signal excursions inherent in interlaced signals at small details of the image. After filtering, the image signals are converted back to the non-linear video digital representation. Thus, the three-field-frame weighting (e.g., [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667]) should be performed on a linearized video signal. Other filtering and weighted sums of partial terms in noise and de-interlace filtering should also be converted to linear form for computation. Which operations warrant linear processing is determined by signal excursion, and the type of filtering. Image sharpening can be appropriately computed in video or logarithmic non-linear representations, since it is self-proportional. However, matrix processing, spatial filtering, weighted sums, and de-interlace processing should be computed using linearized digital values.

As a simple example, the single field-frame de-interlacer described above computes missing alternate lines by averaging the line above and below each actual line. This average is much more correct numerically and visually if this average is done linearly. Thus, instead of summing 0.5 times the line above plus 0.5 times the line below, the digital values are linearized first, then averaged, and then reconverted back into the non-linear video representation.
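A sketch of this linearized line average, reusing video_to_linear and linear_to_video from the sketch above:

def average_missing_line(line_above, line_below):
    # Average the lines above and below in linear light, then return to the
    # non-linear video representation.
    la = video_to_linear(line_above)
    lb = video_to_linear(line_below)
    return linear_to_video(0.5 * (la + lb))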

Median Filters

In noise processing, the most useful filter is the median filter. A three element median filter just ranks the three entries, via a simple sort, and picks the middle one. For example, an X (horizontal) median filter looks at the red value (or green or blue) of three adjacent horizontal pixels, and picks the one with the middle-most value. If two are the same, that value is selected. Similarly, a Y (vertical) filter looks in the scanlines above and below the current pixel, and again picks the middle value.

It has been experimentally determined that it is useful to average the results from applying both an X and a Y median filter to create a new noise-reducing component picture (i.e., each new pixel is the 50% equal average of the X and Y medians for the corresponding pixel from a source image).
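A minimal NumPy sketch of the X and Y medians and their equal average follows, operating per color channel, with edge replication at the borders (border handling is not specified in the text):

import numpy as np

def median3(a, b, c):
    # Element-wise middle value of three arrays.
    return np.maximum(np.minimum(np.maximum(a, b), c), np.minimum(a, b))

def xy_medians(img):
    # img: float array of shape (H, W) or (H, W, 3).
    extra = ((0, 0),) * (img.ndim - 2)
    left  = np.pad(img, ((0, 0), (1, 0)) + extra, mode='edge')[:, :-1]
    right = np.pad(img, ((0, 0), (0, 1)) + extra, mode='edge')[:, 1:]
    up    = np.pad(img, ((1, 0), (0, 0)) + extra, mode='edge')[:-1, :]
    down  = np.pad(img, ((0, 1), (0, 0)) + extra, mode='edge')[1:, :]
    x_med = median3(left, img, right)   # middle of three horizontal pixels
    y_med = median3(up, img, down)      # middle of three vertical pixels
    return x_med, y_med

def xy_median_average(img):
    x_med, y_med = xy_medians(img)
    return 0.5 * (x_med + y_med)        # 50% equal average of X and Y medians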

In addition to X and Y (horizontal and vertical) medians, it is also possible to take diagonal and other medians. However, the vertical and horizontal pixels are physically closest to any particular pixel, and therefore produce less potential error or distortion than the diagonals. Such other medians nevertheless remain available in cases where noise reduction is difficult using only the vertical and horizontal medians.

Another beneficial source of noise reduction is information from the previous and subsequent frame (i.e., a temporal median). As mentioned below, motion analysis provides the best match for moving regions; however, it is computationally intensive. If a region of the image is not moving, or is moving slowly, the red value (and green and blue) of a current pixel can be median filtered with the red values at that same pixel location in the previous and subsequent frames. However, odd artifacts may occur if significant motion is present and such a temporal filter is used. Thus, it is preferred that a threshold be taken first, to determine whether such a median would differ by more than a selected amount from the value of a current pixel. The threshold can be computed essentially the same way as for the de-interlacing threshold above:
Rdiff=R_current_pixel minus R_temporal_median
Gdiff=G_current_pixel minus G_temporal_median
Bdiff=B_current_pixel minus B_temporal_median
ThresholdingValue=abs(Rdiff+Gdiff+Bdiff)+abs(Rdiff)+abs(Gdiff)+abs(Bdiff)

The ThresholdingValue is then compared to a threshold setting. Typical threshold settings are in the range 0.1 to 0.3, with 0.2 being typical. Above the threshold, the current value is kept. Below the threshold, the temporal median is used. The block diagram of FIG. 3 also applies to this threshold test. In this case the PROCESSING block 30 is a temporal median filter and the inputs are three successive frames. The SELECTION CONTROL block 32 compares the output 36 of the PROCESSING block 30 with Input B 34 using the above equations for Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects the PROCESSING output 36 if the ThresholdingValue is less than the threshold, otherwise the switch selects Input B 34, the middle value, for the output 38.
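A sketch of this thresholded temporal median, reusing median3 from the median filter sketch above and assuming float RGB frames in [0, 1]:

import numpy as np

def thresholded_temporal_median(prev, cur, nxt, threshold=0.2):
    t_med = median3(prev, cur, nxt)     # per-pixel temporal median
    diff = cur - t_med
    r, g, b = diff[..., 0], diff[..., 1], diff[..., 2]
    thresholding_value = np.abs(r + g + b) + np.abs(r) + np.abs(g) + np.abs(b)
    # Below the threshold the temporal median is used; above it the current
    # value is kept, avoiding smearing of real motion.
    below = (thresholding_value < threshold)[..., None]
    return np.where(below, t_med, cur)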

An additional median type is a median taken between the X, Y, and temporal medians. Another median type can take the temporal median, and then take the equal average of the X and Y medians from it.

Each type of median can cause problems. X and Y medians smear and blur an image, so that it looks “greasy”. Temporal medians cause smearing of motion over time. Since each median can result in problems, yet each median's properties are different (and, in some sense, “orthogonal”), it has been determined experimentally that the best results come by combining a variety of medians.

In particular, FIG. 4 shows a preferred combination of medians: a linear weighted sum (see the discussion above on linear video processing) of five terms to determine the value for each pixel of a current image:

50% of the original image (Frame N 40) (thus, the maximum noise reduction is 3 dB, or half);

15% of the average of X and Y medians 42, 44, respectively;

10% of the thresholded temporal median 46;

10% of the average of X and Y medians of the thresholded temporal median (48); and

15% of a three-way X, Y, and temporal median (50).

This set of time medians does a reasonable job of reducing the noise in the image without making it appear “greasy” or blurred, causing temporal smearing of moving objects, or losing detail. Another useful weighting of these five terms is 35%, 20%, 22.5%, 10%, and 12.5%, respectively.
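A sketch of this five-term combination, reusing the helpers from the sketches above; per the discussion of linear processing, the weighted sum should properly be formed on linearized values, a step omitted here for brevity:

def five_term_noise_reduction(prev, cur, nxt,
                              weights=(0.50, 0.15, 0.10, 0.10, 0.15)):
    x_cur, y_cur = xy_medians(cur)
    t_med = thresholded_temporal_median(prev, cur, nxt)
    terms = (cur,                           # original image (Frame N 40)
             0.5 * (x_cur + y_cur),         # average of X and Y medians
             t_med,                         # thresholded temporal median
             xy_median_average(t_med),      # X/Y median average of t_med
             median3(t_med, x_cur, y_cur))  # three-way X, Y, temporal median
    return sum(w * t for w, t in zip(weights, terms))

The alternative weighting from the text would be passed as weights=(0.35, 0.20, 0.225, 0.10, 0.125).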

In addition, it is useful to add motion compensation by applying center weighted temporal filters to motion-compensated n×n regions, as described below. This can be added to the median-filtered image result (of five terms, just described) to further smooth the image, providing better smoothing and detail on moving image regions.

Motion Analysis

In addition to “in-place” temporal filtering, which does a good job at smoothing slow-moving details, de-interlacing and noise reduction can also be improved by use of motion analysis. Adding the pixels at the same location in three fields or three frames is valid for stationary objects. However, for moving objects, if temporal averaging/smoothing is desired, it is often more optimal to attempt to analyze prevailing motion over a small group of pixels. For example, an n×n block of pixels (e.g., 2×2, 3×3, 4×4, 6×6, or 8×8) can be used to search in previous and subsequent fields or frames to attempt to find a match (in the same way MPEG-2 motion vectors are found by matching 16×16 macroblocks). Once a best match is found in one or more previous and subsequent frames, a “trajectory” and “moving mini-picture” can be determined. For interlaced fields, it is best to analyze comparisons as well as compute inferred moving mini-pictures utilizing the results of the thresholded de-interlaced process above. Since this process has already separated the fast-moving from the slow-moving details, and has already smoothed the slow moving details, the picture comparisons and reconstructions are more applicable than individual de-interlaced fields.

The motion analysis preferably is performed by comparison of an n×n block in the current thresholded de-interlaced image with all nearby blocks in the previous and subsequent one or more frames. The comparison may be the absolute value of differences in luminance or RGB over the n×n block. One frame is sufficient forward and backward if the motion vectors are nearly equal and opposite. However, if the motion vectors are not nearly equal and opposite, then an additional one or two frames forward and backward can help determine the actual trajectory. Further, different de-interlacing treatments may be useful in helping determine the “best guess” motion vectors going forward and back. One de-interlacing treatment can be to use only individual de-interlaced fields, although this is heavily prone to aliasing and artifacts on small moving details. Another de-interlacing technique is to use only the three-field-frame smooth de-interlacing, without thresholding, having weightings [0.25, 0.5, 0.25], as described above. Although details are smoothed and sometimes lost, the trajectory may often be more correct.

Once a trajectory is found, a “smoothed n×n block” can be created by temporally filtering using the motion-vector-offset pixels from the one (or more) previous and subsequent frames. A typical filter might again be [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667] for three frames, and possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames back and forward. Other filters, with less central weight, are also useful, especially with smaller block sizes (such as 2×2, 3×3, and 4×4). Reliability of the match between frames is indicated by the absolute difference value. Large minimum absolute differences can be used to select more center weight in the filter. Lower values of absolute differences can suggest a good match, and can be used to select less center weight to more evenly distribute the average over a span of several frames of motion-compensated blocks.
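A rough single-channel sketch of this process: an exhaustive minimum-absolute-difference search over a small window, followed by the center-weighted blend of the matched blocks (the block size and search range are illustrative):

import numpy as np

def best_match(block, frame, y, x, search=4):
    # Return the top-left corner of the minimum-SAD match near (y, x).
    n = block.shape[0]
    H, W = frame.shape
    best = (np.inf, y, x)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= H - n and 0 <= xx <= W - n:
                sad = np.abs(frame[yy:yy+n, xx:xx+n] - block).sum()
                if sad < best[0]:
                    best = (sad, yy, xx)
    return best[1], best[2]

def mc_temporal_filter(prev, cur, nxt, n=4, weights=(0.25, 0.5, 0.25)):
    out = cur.copy()
    H, W = cur.shape
    for y in range(0, H - n + 1, n):
        for x in range(0, W - n + 1, n):
            block = cur[y:y+n, x:x+n]
            py, px = best_match(block, prev, y, x)   # backward motion vector
            fy, fx = best_match(block, nxt, y, x)    # forward motion vector
            out[y:y+n, x:x+n] = (weights[0] * prev[py:py+n, px:px+n] +
                                 weights[1] * block +
                                 weights[2] * nxt[fy:fy+n, fx:fx+n])
    return out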

These filter weights can be applied to: individual de-interlaced motion-compensated field-frames; thresholded three-field-frame de-interlaced pictures, described above; and non-thresholded three-field-frame de-interlaced images, with a [0.25, 0.5, 0.25] weighting, also as described above. However, the best filter weights usually come from applying the motion-compensated block linear filtering to the thresholded three-field-frame result described above. This is because the thresholded three-field-frame image is both the smoothest (in terms of removing aliasing in smooth areas), as well as the most motion-responsive (in terms of defaulting to a single de-interlaced field-frame above the threshold). Thus, the motion vectors from motion analysis can be used as the inputs to multi-frame or multi-de-interlaced-field-frame or single-de-interlaced field-frame filters, or combinations thereof. The thresholded multi-field-frame de-interlaced images, however, form the best filter input in most cases.

The use of motion analysis is computationally expensive for a large search region, when fast motion might be found (such as ±32 pixels). Accordingly, it may be best to augment the speed by using special-purpose hardware or a digital signal processor assisted computer.

Once motion vectors are found, together with their absolute difference measure of accuracy, they can be utilized for the complex process of attempting frame rate conversion. However, occlusion issues (objects obscuring or revealing others) will confound matches, and cannot be accurately inferred automatically. Occlusion can also involve temporal aliasing, as can normal image temporal undersampling and its beat with natural image frequencies (such as the “backward wagon wheel” effect in movies). These problems often cannot be unraveled by any known computation technique, and to date require human assistance. Thus, human scrutiny and adjustment, when real-time automatic processing is not required, can be used for off-line and non-real-time frame-rate conversion and other similar temporal processes.

De-interlacing is a simple form of the same problem. Just as with frame-rate-conversion, the task of de-interlacing is theoretically impossible to perform perfectly. This is especially due to the temporal undersampling (closed shutter), and an inappropriate temporal sample filter (i.e., a box filter). However, even with correct samples, issues such as occlusion and interlace aliasing further ensure the theoretical impossibility of correct results. The cases where this is visible are mitigated by the depth of the tools, as described here, which are applied to the problem. Pathological cases will always exist in real image sequences. The goal can only be to reduce the frequency and level of impairment when these sequences are encountered. However, in many cases, the de-interlacing process can be acceptably fully automated, and can run unassisted in real-time. Even so, there are many parameters which can often benefit from manual adjustment.

Filter Smoothing of High Frequencies

In addition to median filtering, reducing high frequency detail will also reduce high frequency noise. However, this smoothing comes at the price of loss of sharpness and detail. Thus, only a small amount of such smoothing is generally useful. A filter which creates smoothing can be easily made, as with the threshold for de-interlacing, by down-filtering with a normal filter (e.g., truncated sinc filter) and then up-filtering with a gaussian filter. The result will be smoothed because it is devoid of high frequency picture detail. When such a term is added, it typically must be in very small amounts, such as 5% to 10%, in order to provide a small amount of noise reduction. In larger amounts, the blurring effect generally becomes quite visible.
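A sketch of this down-up smoothing and its small weighted addition; the 5-tap "truncated-sinc-like" down kernel and 3-tap "gaussian-like" up kernel are illustrative stand-ins, and even image dimensions are assumed:

import numpy as np

def filt_1d(x, taps, axis):
    # Symmetric FIR filtering along one axis with edge replication.
    pad = len(taps) // 2
    xp = np.pad(x, [(pad, pad) if a == axis else (0, 0) for a in range(x.ndim)],
                mode='edge')
    return sum(t * np.take(xp, range(i, i + x.shape[axis]), axis=axis)
               for i, t in enumerate(taps))

def down_up_smooth(img):
    sinc = np.array([-0.05, 0.25, 0.6, 0.25, -0.05])  # truncated-sinc-like
    gauss = np.array([0.25, 0.5, 0.25])               # gaussian-like
    small = filt_1d(filt_1d(img, sinc, 0), sinc, 1)[::2, ::2]   # down by two
    big = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)     # back up by two
    return filt_1d(filt_1d(big, gauss, 0), gauss, 1)

def add_smoothing(img, amount=0.075):
    # Add roughly 5% to 10% of the smoothed image for mild noise reduction.
    return (1.0 - amount) * img + amount * down_up_smooth(img)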

Base Layer Noise Filtering

The filter parameters for the median filtering described above for an original image should be matched to the noise characteristics of the film grain or image sensor that captured the image. After this median-filtered image is down-filtered to generate an input to the base layer compression process, it still contains a small amount of noise. This noise may be further reduced by a combination of another X-Y median filter (equally averaging the X and Y medians), plus a very small amount of the high frequency smoothing filter. A preferred filter weighting of these three terms, applied to each pixel of the base layer, is:

75% of the original base layer (down filtered from median-filtered original above);

22.5% of the average of X and Y medians; and

7.5% of the down-up smoothing filter.

This small amount of additional filtering in the base layer provides a small additional amount of noise reduction and improved stability, resulting in better MPEG encoding and limiting the amount of noise added by such encoding.
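A sketch of this three-term base layer filter, reusing xy_median_average and down_up_smooth from the sketches above:

def base_layer_filter(base):
    # base: the down-filtered base layer image (float, even dimensions).
    return (0.75  * base +
            0.225 * xy_median_average(base) +
            0.075 * down_up_smooth(base))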

Image Filtering

Downsizing and Upsizing Filters

Experimentation has shown that the downsizing filter used in creating a base layer from a high resolution original picture is most optimal if it includes modest negative lobes and an extent which stops after the first very small positive lobes after the negative lobes. FIG. 5 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter. The down filter essentially is a center-weighted function which has been truncated to a center positive lobe 500, a symmetric pair of adjacent (bracketing) small negative lobes 502, and a symmetric pair of adjacent (bracketing) very small outer positive lobes 504. The absolute amplitude of the lobes 500, 502, 504 may be adjusted as desired, so long as the relative polarity and amplitude inequality relationships shown in FIG. 5 are maintained. However, a good first approximation for the relative amplitudes is defined by a truncated sinc function (sinc(x)=sin(x)/x). Such filters can be used separably, meaning that the horizontal data dimension is independently filtered and resized, and then the vertical data dimension, or vice versa; the result is the same.

When creating a base layer original (as input to the base layer compression) from a low-noise high resolution original input, the preferred downsizing filter has first negative lobes which are of a normal sinc function amplitude. For clean and high resolution input images, this normal truncated sinc function works well. For lower resolutions (e.g., 1280×720, 1024×768, or 1536×768), and for noisier input pictures, a reduced first negative lobe amplitude in the filters is more optimal. A suitable amplitude in such cases is about half the truncated sinc function negative lobe amplitude. The small first positive lobes outside of the first negative lobes are also reduced in amplitude, typically to ½ to ⅔ of the normal sinc function amplitude. The effect of reducing the first negative lobes is the main issue, since the small outside positive lobes do not contribute to picture noise. Further samples outside the first positive lobes preferably are truncated to minimize ringing and other potential artifacts.

The choice of whether to use milder negative lobes or full sinc function amplitude negative lobes in the downfilter is determined by the resolution and noise level of the original image. It is also somewhat a function of image content, since some types of scenes are easier to code than others (mainly related to the amount of motion and change in a particular shot). By using a “milder” downfilter having reduced negative lobes, noise in the base layer is reduced, and a cleaner and quieter compression of the base layer is achieved, thus also resulting in fewer artifacts.
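A sketch of such a 2:1 downsizing kernel: a truncated half-band sinc whose first negative lobes and small outer positive lobes can be scaled down for noisier or lower-resolution sources; the tap count and scale factors are illustrative:

import numpy as np

def down_taps(negative_scale=1.0, positive_scale=0.6):
    n = np.arange(-5, 6)
    taps = 0.5 * np.sinc(n / 2.0)            # ideal 2:1 filter, truncated
    taps[np.abs(n) == 3] *= negative_scale   # first (bracketing) negative lobes
    taps[np.abs(n) == 5] *= positive_scale   # very small outer positive lobes
    return taps / taps.sum()                 # normalize to unity DC gain

def downsize2(img, taps):
    # Separable 2:1 downsizing, reusing filt_1d from the smoothing sketch above.
    return filt_1d(filt_1d(img, taps, 0), taps, 1)[::2, ::2]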

Experimentation has also shown that the optimal upsizing filter has a center positive lobe with small adjacent negative lobes, but no further positive lobes. FIGS. 6A and 6B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2. A central positive lobe 600, 600′ is bracketed by a pair of small negative lobes 602, 602′. An asymmetrically placed positive lobe 604, 604′ is also required. These paired upfilters could also be considered to be truncated sinc filters centered on the newly created samples. For example, for a factor of two upfilter, two new samples will be created for each original sample. The small adjacent negative lobes 602, 602′ have less negative amplitude than is used in the corresponding downsizing filter (FIG. 5), or than would be used in an optimal (sinc-based) upsizing filter for normal images. This is because the images being upsized are decompressed, and the compression process changes the spectral distribution. Thus, more modest negative lobes, and no additional positive lobes beyond the middle ones 600, 600′, work better for upsizing a decompressed base layer.

Experimentation has shown that slight negative lobes 602, 602′ provide a better layered result than positive-only gaussian or spline upfilters (note that splines can have negative lobes, but are most often used in the positive-only form). Thus, this upsizing filter preferably is used for the base layer in both the encoder and the decoder.

Weighting of High Octave of Picture Detail

In the preferred embodiment, the signal path which expands the original uncompressed base layer input image uses a gaussian upfilter rather than the upfilter described above. In particular, a gaussian upfilter is used for the “high octave” of picture detail, which is determined by subtracting the expanded original base-resolution input image (without using compression) from the original picture. Thus, no negative lobes are used for this particular upfiltered expansion.

As noted above, for MPEG-2 this high octave difference signal path is typically weighted with 0.25 (or 25%) and added to the expanded decompressed base layer (using the other upfilter described above) as input to the enhancement layer compression process. However, experimentation has shown that weights of 10%, 15%, 20%, 30%, and 35% are useful for particular images when using MPEG-2. Other weights may also prove useful. For MPEG-4, it has been found that filter weights of 4-8% may be optimal when used in conjunction with other improvements described below. Accordingly, this weighting should be regarded as an adjustable parameter, depending upon the encoding system, the scenes being encoded/compressed, the particular camera (or film) being used, and the image resolution.
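A sketch of assembling the enhancement layer input under these weightings, reusing filt_1d from the smoothing sketch above; the upfilter kernels here are illustrative:

import numpy as np

def gaussian_up2(img):
    # Gaussian-only expansion (no negative lobes), used for the high octave.
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    g = np.array([0.25, 0.5, 0.25])
    return filt_1d(filt_1d(big, g, 0), g, 1)

def neg_lobe_up2(img):
    # Expansion with small negative lobes, used for the decompressed base layer.
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    u = np.array([-0.05, 0.2, 0.7, 0.2, -0.05])
    return filt_1d(filt_1d(big, u, 0), u, 1)

def enhancement_input(original, base_original, base_decoded, weight=0.25):
    # High octave: original minus the gaussian expansion of the uncompressed
    # base-resolution input; weight is 0.25 for MPEG-2, ~0.04-0.08 for MPEG-4.
    high_octave = original - gaussian_up2(base_original)
    return neg_lobe_up2(base_decoded) + weight * high_octave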

Filters with Negative Lobes for Motion Compensation in MPEG-2 and MPEG-4

In MPEG-4, reference filters have been implemented for shifting macroblocks when finding the best motion vector match, and then using the matched region for motion compensation. MPEG-4 video coding, like MPEG-2, supports ½ pixel resolution of motion vectors for macroblocks. Unlike MPEG-2, MPEG-4 also supports ¼ pixel accuracy. However, in the reference implementation of MPEG-4, the filters used are sub-optimal. In MPEG-2, the half-way point between pixels is just the average of the two neighbors, which is a sub-optimal box filter. In MPEG-4, this filter is used for ½ pixel resolution. If ¼ pixel resolution is invoked in MPEG-4 Part 2, a filter with negative lobes is used for the half-way point, but a sub-optimal box filter with this result and the neighboring pixels is used for the ¼ and ¾ points.

Further, the chrominance channels (U=R−Y and V=B−Y) do not use any sub-pixel resolution in the motion compensation step under MPEG-4. Since the luminance channel (Y) has resolution to the ½ or ¼ pixel, the half-resolution chrominance U and V channels should be sampled using filters to ¼ pixel resolution, corresponding to ½ pixel in luminance. When ¼ pixel resolution is selected for luminance, then ⅛ pixel resolution should be used for U and V chrominance.

Experiments have shown that the effects of filtering are significantly improved by using a negative lobe truncated sinc function (as described above) for filtering the ¼, ½, and ¾ pixel points when doing ¼ pixel resolution in luminance, and by using similar negative lobes when doing ½ pixel resolution for the filter which creates the ½ pixel position.

Similarly, effects of filtering are significantly improved by using a negative lobe truncated sinc function for filtering the ⅛-pixel points for U and V chrominance when using ¼ pixel luminance resolution, and by using ¼ pixel resolution filters with similar negative lobe filters when using ½ pixel luminance resolution.

It has been discovered that the combination of quarter-pixel motion vectors with truncated sinc motion compensated displacement filtering results in a major improvement in picture quality. In particular, clarity is improved, noise and artifacts are reduced, and chroma detail is increased.
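A sketch of sub-pixel displacement filtering with negative lobes, in the spirit of the claims: a 4-tap kernel whose positive lobe consists of two differing center values for the ¼-pixel phase, bracketed by smaller negative values; the tap values are illustrative and not taken from any standard:

import numpy as np

QUARTER = np.array([-0.06, 0.90, 0.22, -0.06])  # 1/4-pixel phase (asymmetric)
HALF    = np.array([-0.09, 0.59, 0.59, -0.09])  # 1/2-pixel phase (symmetric)
THREE_QUARTER = QUARTER[::-1]                   # 3/4-pixel phase by mirroring

def subpixel_row(row, taps):
    # Interpolate between row[i] and row[i+1] using two neighbors on each
    # side, with edge replication at the borders.
    pad = np.pad(row, (1, 2), mode='edge')
    return sum(t * pad[i:i + len(row)] for i, t in enumerate(taps))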

These filters may be applied to video images under MPEG-1, MPEG-2, MPEG-4 or any other appropriate motion-compensated block-based image coding system.

The invention may be implemented in hardware or software, or a combination of both. However, preferably, the invention is implemented in computer programs executing on one or more programmable computers each comprising at least a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), an input device, and an output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

Each such computer program is preferably stored on a storage media or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special purpose programmable computer system, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment uses MPEG-2 or MPEG-4 coding and decoding, the invention will work with any comparable standard that provides equivalents of I, P, and/or B frames and layers. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiment, but only by the scope of the appended claims.

Inventor: Demos, Gary A.
