A method and apparatus is disclosed herein for using an in-the-loop denoising filter for quantization noise removal for video compression. In one embodiment, the video encoder comprises a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder; a transform decoder to generate a reconstructed residual frame in response to the coded differential frame; a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction; a non-linear denoising filter to filter the reconstructed frame by deriving expectations and performing denoising operations based on the expectations; and a prediction module to generate predictions, including the first prediction, based on previously decoded frames.
47. A decoder comprising:
a transform decoder to generate a decoded differential frame;
an adder to create a reconstructed frame in response to the decoded differential frame and a first prediction;
a denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of an original frame and performing denoising operations based on the conditional expectations, wherein the denoising filter performs weighted averaging in the transform domain to generate an estimate of the original frame; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames.
26. A decoder comprising:
a transform decoder to generate a decoded differential frame;
an adder to create a reconstructed frame in response to the decoded differential frame and a first prediction;
a denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of an original frame and performing denoising operations based on the conditional expectations, wherein the denoising filter performs denoising on a selected subset of pixels, where the subset of pixels is determined by defining a mask using compression mode parameters of the reconstructed frame; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames.
48. A video decoding process comprising:
generating a decoded differential frame;
creating a reconstructed frame in response to the decoded differential frame and a first prediction;
computing a conditional expectation for each of a plurality of coefficients of an original frame;
performing denoising operations based on the conditional expectation and a coefficient resulting from application of a linear transform to the reconstructed frame, wherein performing denoising occurs on a selected subset of pixels, where the subset of pixels is determined by defining a mask using compression mode parameters of the reconstructed frame; and
generating predictions, including the first prediction, based on previously decoded frames.
43. A decoder comprising:
a transform decoder to generate a decoded differential frame;
an adder to create a reconstructed frame in response to the decoded differential frame and a first prediction;
a denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of an original frame and performing denoising operations based on the conditional expectations; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames,
wherein the denoising filter applies a linear transform to the reconstructed frame to obtain coefficients of the original frame and determines, based on a compression mode, the filtering to perform on each coefficient of the reconstructed frame, wherein the compression mode of a macroblock is based on motion vectors and a mode of the macroblock.
25. A video encoder comprising:
a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder;
a transform decoder to generate a reconstructed residual frame in response to the coded differential frame;
a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction;
a non-linear denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of the current frame and performing denoising operations based on the conditional expectations; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames,
wherein the denoising filter performs weighted averaging in the transform domain to generate an estimate of the current frame.
1. A video encoder comprising a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder;
a transform decoder to generate a reconstructed residual frame in response to the coded differential frame;
a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction;
a non-linear denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of the current frame and performing denoising operations based on the conditional expectations, wherein the denoising filter performs denoising on a selected subset of pixels, where the selected subset of pixels is determined by defining a mask using compression mode parameters of the reconstructed frame; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames.
57. A video decoding process comprising:
computing a conditional expectation for each of a plurality of coefficients of an original frame; and
performing denoising operations based on the conditional expectation and a coefficient resulting from application of a linear transform to a corresponding reconstructed frame, and further comprising:
obtaining the reconstructed frame and other available information;
obtaining a set of coefficients of the reconstructed frame by applying a transform to the decoded frame;
setting a set of image elements equal to elements of the reconstructed frame;
determining coefficient parameters and a mask function based on compression parameters;
computing the conditional expectation for each coefficient in the set of coefficients obtained by applying the transform to the reconstructed frame, and obtaining a set of filtered coefficients by applying a denoising rule using the value of the coefficient in the set of coefficients and the conditional expectation; and
obtaining a filtered frame by applying the mask function to a result of an inverse of the transform applied to the set of filtered coefficients.
21. A video encoder comprising:
a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder;
a transform decoder to generate a reconstructed residual frame in response to the coded differential frame;
a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction;
a non-linear denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of the current frame and performing denoising operations based on the conditional expectations; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames,
wherein the denoising filter applies a linear transform to the reconstructed frame to obtain coefficients of the reconstructed frame and determines, based on a compression mode, the filtering to perform on each coefficient of the reconstructed frame, wherein the compression mode of a macroblock is based on motion vectors and a mode of the macroblock.
44. A decoder comprising:
a transform decoder to generate a decoded differential frame;
an adder to create a reconstructed frame in response to the decoded differential frame and a first prediction;
a denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of an original frame and performing denoising operations based on the conditional expectations; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames,
wherein the denoising filter performs a process comprising:
obtaining the reconstructed frame and other available information;
obtaining a set of coefficients of the reconstructed frame by applying a transform to the decoded frame;
setting a set of image elements equal to elements of the reconstructed frame;
determining coefficient parameters and a mask function based on compression parameters;
computing a conditional expectation for each coefficient in the set of coefficients obtained by applying the transform to the reconstructed frame, and obtaining a set of filtered coefficients by applying a denoising rule using the value of each coefficient in the set of coefficients and the conditional expectation; and
obtaining a filtered frame by applying the mask function to a result of an inverse of the transform applied to the set of filtered coefficients.
22. A video encoder comprising:
a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder;
a transform decoder to generate a reconstructed residual frame in response to the coded differential frame;
a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction;
a non-linear denoising filter to filter the reconstructed frame by deriving conditional expectations of coefficients of the current frame and performing denoising operations based on the conditional expectations; and
a prediction module to generate predictions, including the first prediction, based on previously decoded frames,
wherein the denoising filter performs a process comprising:
obtaining the reconstructed frame and other available information;
obtaining a set of coefficients of the reconstructed frame by applying a transform to the reconstructed frame;
setting a set of image elements equal to elements of the reconstructed frame;
determining coefficient parameters and a mask function based on compression parameters;
computing the conditional expectation for each coefficient in the set of coefficients obtained by applying the transform to the reconstructed frame based on the set of image elements, and obtaining a set of filtered coefficients by applying a denoising rule using the value of the coefficient in the set of coefficients and the conditional expectation; and
obtaining a filtered frame by applying the mask function to a result of an inverse of the transform applied to the set of filtered coefficients.
2. The video encoder defined in
3. The video encoder defined in
5. The video encoder defined in
6. The video encoder defined in
7. The video encoder defined in
8. The video encoder defined in
9. The video encoder defined in
10. The video encoder defined in
11. The video encoder defined in
12. The video encoder defined in
13. The video encoder defined in
15. The video encoder defined in
16. The video encoder defined in
17. The video encoder defined in
18. The video encoder defined in
19. The video encoder defined in
20. The video encoder defined in
23. The video encoder defined in
determining the mask function using compression parameters and using the mask to determine to which coefficients in the set of coefficients to apply denoising.
24. The video encoder defined in
27. The video decoder defined in
28. The video decoder defined in
29. The video decoder defined in
30. The video decoder defined in
31. The video decoder defined in
32. The video decoder defined in
33. The video decoder defined in
34. The video decoder defined in
35. The video decoder defined in
36. The video decoder defined in
37. The video decoder defined in
38. The video decoder defined in
39. The video decoder defined in
40. The video decoder defined in
41. The video decoder defined in
42. The video decoder defined in
45. The video decoder defined in
determining the mask function using compression parameters and using the mask to determine to which coefficients in the set of coefficients to apply denoising.
46. The video decoder defined in
49. The video decoding process defined in
51. The video decoding process defined in
52. The video decoding process defined in
applying a plurality of denoising transforms to quantized video data; and
combining results produced from the plurality of denoising transforms to obtain an estimate of the original frame.
53. The video decoding process defined in
54. The video decoding process defined in
55. The video decoding process defined in
56. The video decoding process defined in
altering denoising parameters for each coefficient of the reconstructed frame based on compression mode parameters.
58. The video decoding process defined in
determining the mask function using compression parameters and using the mask to determine to which coefficients in the set of coefficients to apply denoising.
59. The video decoding process defined in
The present patent application claims priority to the corresponding provisional patent application Ser. No. 60/644,230, titled, “A Nonlinear, In-the-Loop, Denoising Filter for Quantization Noise Removal for Hybrid Video Compression”, filed on Jan. 13, 2005.
The present invention relates to the field of processing video frames; more particularly, the present invention relates to filtering quantization noise from video frames, thereby improving video compression.
Hybrid video compression consists of encoding an anchor video frame and then predictively encoding a set of predicted frames. Predictive encoding uses motion compensated prediction with respect to previously coded frames in order to obtain a prediction error frame followed by the encoding of this prediction error frame. Anchor frames and prediction errors are encoded using transform coders in a manner well-known in the art.
Transform coded frames incur quantization noise. Due to the predictive coding of frames, quantization noise has two adverse consequences: (i) the quantization noise in frame n causes reduced quality in the display of frame n, (ii) the quantization noise in frame n causes reduced quality in all frames that use frame n as part of their prediction.
Quantization artifact removal from images is a well-known problem. For a review and many references, see Shen & Kuo, “Review of Postprocessing Techniques for Compression Artifact Removal,” Journal of Visual Communication and Image Representation, vol. 9, pp. 2-14, March 1998.
For video, various types of in-the-loop filters are well-known and have become part of earlier standards. See, for example, MPEG4 Verification Model, VM 14.2, pp. 260-264, 1999, as well as ITU-T Recommendation H.261.
Prior solutions are typically limited to quantization noise produced by block transform coders. Robust solutions that handle block as well as non-block transforms (such as wavelets, lapped orthogonal transforms, and lapped biorthogonal transforms) are beyond the reach of the related art. This is because the related art assumes that transform coding is done using block transforms and that quantization noise mostly occurs at the boundaries of transform blocks. These prior solutions thus apply filtering around block boundaries. For non-block transforms, there are no transform block boundaries (in fact, for transforms such as wavelets and lapped transforms, there are no well-defined spatial transform boundaries, since the transform basis functions overlap). Hence, these previously used solutions cannot determine where quantization noise occurs.
Prior solutions have been applied to video frames that have smoothly varying pixel values. This is because prior solutions are derived using smooth image models. The filters derived are typically restricted to low-pass filters. These are not applicable on many types of image regions, such as on edges, textures, etc.
Related art is typically restricted to a single type of quantization artifacts. Techniques typically specialize on blocking artifacts, or ringing artifacts, or other types of artifacts without providing a general means for addressing all types of artifacts.
The performance of prior techniques (in a rate-distortion sense and in a visual quality sense) is typically not high enough and their applicability not broad enough to justify the complexity of their incorporation in a video codec.
A method and apparatus is disclosed herein for using an in-the-loop denoising filter for quantization noise removal for video compression. In one embodiment, the video encoder comprises a transform coder to apply a transform to a residual frame representing a difference between a current frame and a first prediction, the transform coder outputting a coded differential frame as an output of the video encoder; a transform decoder to generate a reconstructed residual frame in response to the coded differential frame; a first adder to create a reconstructed frame by adding the reconstructed residual frame to the first prediction; a non-linear denoising filter to filter the reconstructed frame by deriving expectations and performing denoising operations based on the expectations; and a prediction module to generate predictions, including the first prediction, based on previously decoded frames.
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
In-the-loop, spatial, nonlinear filtering of video frames is described. The filtering is designed to remove noise incurred during the compression of the frames. In one embodiment, the filtering technique is based on deriving an expectation and implementing denoising operations based on the derived expectation. Mode-based decisions are derived for the effective denoising of differentially compressed video frames. In one embodiment, a weighted denoising technique is applied, which further improves performance.
The described techniques are robust and general, being able to effectively handle a multitude of image region types and a multitude of compression techniques. In one embodiment, the derived nonlinear, in-the-loop denoising filters adaptively and autonomously develop the proper frequency selectivity for the multitude of image region types by developing low-pass selectivity in smooth image regions, high-pass selectivity in high-frequency regions, etc. The described technique is instrumental in concurrently removing general quantization artifacts such as, for example, blocking artifacts, ringing artifacts, etc.
In the following description, numerous details are set forth to provide a more thorough explanation of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Overview
With respect to the denoising filters of
The denoising filter derives mode-based decisions that refine the expectation calculation and the denoising operation. Specifically, depending on the compression mode of the predictively encoded frame, the denoising filter determines the spatial locations that should be denoised. In one embodiment, the parameters that are used in the expectation calculation are also determined based on the compression mode.
In one embodiment, the expectation processing described above is performed for multiple linear transforms applied to the quantized video frame. Each linear transform determines an estimate for the video frame. Multiple linear transforms thereby result in multiple estimates. The denoising filter combines these estimates to form an overall estimate. In one embodiment, the overall estimate is better than each of the estimates individually.
In one embodiment, the linear transform applied to the quantized video frame described above is an orthonormal transform such as, for example, a block n×n DCT. Other transforms, including non-orthogonal transforms and non-block transforms, can also be applied. It is desirable, but not necessary, for this transform to have a fast implementation so that denoising computations can be performed in an efficient manner.
Let y denote a quantized video frame (arranged into an N×1 vector), and let H denote the linear transform specified in the above paragraph (an N×N matrix). The transform coefficients are given by d=Hy. Similarly, let x denote the original unquantized version of y, and let c=Hx be its transform coefficients. Consider the ith transform coefficient c(i). Observe that x, and hence c, are not available since only their quantized versions are available. To estimate c (and thereby x) given y, let ĉ denote the estimate of c and x̂ denote the estimate of x.
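As a concrete illustration of this notation (a sketch only, not the claimed implementation), the following uses a 1-D 4-point orthonormal DCT as H; the transform size and the toy frame values are arbitrary assumptions:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; row i is the basis function for c(i)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # sample index (columns)
    H = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    H[0, :] = np.sqrt(1.0 / n)  # DC row rescaled for orthonormality
    return H

H = dct_matrix(4)
assert np.allclose(H @ H.T, np.eye(4))   # orthonormal: inverse is the transpose

y = np.array([10.0, 12.0, 11.0, 13.0])   # toy quantized frame y (N = 4)
d = H @ y                                # transform coefficients d = H y
assert np.allclose(H.T @ d, y)           # inverse transform recovers y exactly
```

Because H is orthonormal, its inverse is simply its transpose, which is why the inverse transform in the procedures below can be applied cheaply.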
In estimating c(i), there are two possibilities: the estimate is either ĉ(i)=d(i) or ĉ(i)=0. Using conditional expectations, let Ψ be a canonical variable that encapsulates all of the available information (y, any previously decoded video frames, side information about x, etc.). In one embodiment, the estimate ĉ(i)=0 is selected over ĉ(i)=d(i) if
E[‖c(i)‖²|Ψ] ≤ E[‖c(i)−d(i)‖²|Ψ]
i.e., if the mean squared error of using ĉ(i)=0 given all the available information is less than the mean squared error of using ĉ(i)=d(i) given all the available information. This expression can be used to generate the following rule:
use ĉ(i)=0 if |E[c(i)|Ψ]| ≤ |E[c(i)|Ψ]−d(i)|
use ĉ(i)=d(i) otherwise
The quantity E[c(i)|Ψ] is referred to herein as the conditional expectation of c(i), conditioned on all the available information. The above rule is referred to herein as the denoising rule to obtain ĉ(i).
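The denoising rule above can be sketched in a few lines; the conditional expectations are taken as given here (their computation is described later), and the numeric values are illustrative assumptions:

```python
import numpy as np

def denoise_coefficients(d, e_cond):
    """Apply the denoising rule coefficient by coefficient: keep d(i)
    only when the conditional expectation E[c(i)|Psi] is closer to d(i)
    than to zero; otherwise set the coefficient estimate to zero."""
    d = np.asarray(d, dtype=float)
    e_cond = np.asarray(e_cond, dtype=float)
    keep = np.abs(e_cond) > np.abs(e_cond - d)  # complement of the zeroing test
    return np.where(keep, d, 0.0)

# Toy example: coefficients whose conditional expectation sits near zero
# are discarded; coefficients supported by the expectation are kept.
d      = np.array([5.0, 0.3, -4.0, 2.0])   # observed quantized coefficients
e_cond = np.array([4.8, 0.0, -3.5, 0.2])   # assumed E[c(i)|Psi] values
print(denoise_coefficients(d, e_cond))     # -> [ 5.  0. -4.  0.]
```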
Referring to
Then, processing logic obtains a set of coefficients d by applying a transform H to the decoded frame y (processing block 302). For example, the transform H may represent a block-wise two-dimensional DCT. Processing logic also sets a set of image elements e equal to the elements of y.
Afterwards, processing logic computes a conditional expectation of c(i) for each coefficient in d based on the set of image elements e and obtains a filtered coefficient ĉ(i) by applying a denoising rule using the value of the coefficient in d and the conditional expectation of c(i) (processing block 303). Thereafter, processing logic obtains a filtered frame x̂ by applying the inverse of transform H to the set of coefficients ĉ (processing block 304).
After obtaining the filtered frame, processing logic determines whether more iterations are needed (processing block 305). For example, a fixed number of iterations, such as two, may be preset. If more iterations are needed, processing logic sets the set of image elements e to x̂ (processing block 307) and processing transitions to processing block 303. Otherwise, the processing flow proceeds to processing block 306, where the processing logic outputs the filtered frame x̂.
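The iterative flow above can be sketched as follows. The expectation step is replaced by a deliberately simplified stand-in (the coefficients of the current estimate), since the full expectation calculation is described later; the 2-point transform and frame values are toy assumptions:

```python
import numpy as np

def iterative_denoise(y, H, expectation_fn, iterations=2):
    """Sketch of the loop: transform, estimate the conditional expectation
    per coefficient, apply the denoising rule, inverse-transform, and feed
    the new estimate back as the image elements e for the next pass."""
    e = y.copy()                        # image elements start as the decoded frame
    d = H @ y                           # coefficients of the decoded frame
    for _ in range(iterations):
        e_cond = expectation_fn(e, H)   # stand-in for E[c(i)|Psi]
        keep = np.abs(e_cond) > np.abs(e_cond - d)
        c_hat = np.where(keep, d, 0.0)  # denoising rule
        e = H.T @ c_hat                 # filtered frame (H orthonormal)
    return e

# Simplified stand-in expectation: coefficients of the current estimate.
simple_expectation = lambda e, H: H @ e

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # toy 2-point transform
y = np.array([10.0, 10.5])                               # toy decoded frame
x_hat = iterative_denoise(y, H, simple_expectation)
```

With this exact stand-in the rule keeps every nonzero coefficient, so x̂ reproduces y; a real expectation calculation would zero out coefficients judged unreliable.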
Due to differential encoding, some portions of the quantized frame might be the same as in previously coded frames. Since these portions have already been filtered, re-filtering them may cause unintended results. In the transform domain, this points to a requirement that denoising parameters must be altered for each of the coefficients. In the pixel domain, this points to a requirement that not all pixel values should be changed. Furthermore, compression parameters may change spatially (or in frequency), which requires different coefficients to be denoised using different denoising parameters.
The process of
Referring to
After obtaining the decoded frame and collecting other information, processing logic obtains a set of coefficients d by applying a transform H to the decoded frame y (processing block 402). For example, the transform H may represent a block-wise two-dimensional DCT. Processing logic also sets a set of image elements e equal to the elements of y.
Processing logic then determines coefficient denoising parameters for each coefficient based on compression parameters (processing block 403) and determines a mask based on compression parameters (processing block 404).
Afterwards, processing logic computes a conditional expectation of c(i) for each coefficient in d based on e and the coefficient parameters, and obtains a filtered coefficient ĉ(i) by applying a denoising rule using the value of the coefficient in d and the conditional expectation of c(i) (processing block 405).
Next, processing logic obtains a filtered frame x̂ by applying the mask function to the result of the inverse of transform H applied to the set of coefficients ĉ (processing block 406).
Processing logic then determines whether more iterations are needed (processing block 407). For example, a fixed number of iterations, such as two, may be preset. If more iterations are needed, processing logic sets the set of image elements e to x̂ (processing block 408) and the process transitions to processing block 405; otherwise, processing transitions to processing block 409, where processing logic outputs the filtered frame x̂.
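The mask-application step can be sketched as a per-pixel selection between the filtered result and the untouched decoded frame; the particular mask shown is a hypothetical one, standing in for a mask derived from compression mode parameters:

```python
import numpy as np

def apply_mask(filtered, decoded, mask):
    """Keep the filtered value only where the mask marks a pixel as
    eligible for denoising; elsewhere the decoded pixel passes through
    unchanged (e.g. pixels copied from an already-filtered frame)."""
    return np.where(mask, filtered, decoded)

decoded  = np.array([10.0, 11.0, 12.0, 13.0])   # toy decoded pixels
filtered = np.array([10.2, 11.1, 11.9, 12.8])   # toy denoised pixels
mask     = np.array([True, True, False, False]) # assumed mode-derived mask
print(apply_mask(filtered, decoded, mask))      # -> [10.2 11.1 12.  13. ]
```

The same selection idea applies in the transform domain by altering per-coefficient denoising parameters instead of per-pixel values.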
While the above-mentioned basic procedures that use a single linear transform H provide acceptable denoising performance, better performance can be obtained by using several different linear transforms, H1, H2, . . . , HM. Each of these transforms is used in a basic procedure of its own to produce an estimate of the original unquantized video frame x, the estimates being given by x̂1, x̂2, . . . , x̂M. These individual estimates are combined to form an overall estimate x̂ that is better than each of the estimates. One embodiment of such a process using multiple transforms is illustrated in
The process of
Referring to
After obtaining the decoded frame and collecting other information, processing logic obtains a set of coefficients d1:M by applying M transforms Hj to the decoded frame y (processing block 502). For example, each transform Hj may represent a block-wise two-dimensional DCT, where the block alignment is dependent on j. Processing logic also sets a set of image elements e equal to the elements of y.
Processing logic then determines coefficient denoising parameters for each coefficient based on compression parameters (processing block 503) and determines a mask based on compression parameters (processing block 504).
With this information, processing logic computes a conditional expectation of c1:M(i) for each coefficient in d1:M based on e and coefficient parameters and obtains a filtered coefficient ĉ1:M(i) by applying a denoising rule using the value of the coefficient in d1:M and the conditional expectation of c1:M(i) (processing block 505).
Next, processing logic obtains filtered frames x̂1:M by applying the mask function to the results of the inverses of transforms H1:M applied to the sets of coefficients ĉ1:M (processing block 506).
Processing logic then determines an overall estimate x̂ (processing block 507). This may be performed by averaging all the estimates together. The averaging may be a weighted average. In one embodiment, the overall estimate block in
After obtaining the overall estimate, processing logic determines whether more iterations are needed (processing block 508). For example, a fixed number of iterations, such as two, may be preset. If more iterations are needed, processing logic sets the set of image elements e to x̂ (processing block 509) and the process transitions to processing block 505; otherwise, processing transitions to processing block 510, where processing logic outputs the filtered frame x̂.
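The combination step can be sketched as plain or weighted averaging of the per-transform estimates; the estimate values and weights shown are illustrative assumptions:

```python
import numpy as np

def combine_estimates(estimates, weights=None):
    """Combine per-transform estimates x_1..x_M into an overall estimate
    by averaging across the M estimates; if weights are given, a weighted
    average is used instead of a plain one."""
    X = np.stack([np.asarray(x, float) for x in estimates])  # shape (M, N)
    if weights is None:
        return X.mean(axis=0)
    w = np.asarray(weights, float)[:, None]                  # column of weights
    return (w * X).sum(axis=0) / w.sum()

x1 = np.array([10.0, 12.0])   # estimate from transform H1 (toy values)
x2 = np.array([11.0, 13.0])   # estimate from transform H2 (toy values)
print(combine_estimates([x1, x2]))                   # -> [10.5 12.5]
print(combine_estimates([x1, x2], weights=[3, 1]))   # -> [10.25 12.25]
```

Using shifted versions of the same block transform for H1:M makes the overall average behave like an overcomplete (translation-invariant) denoiser, which is one reason the combined estimate can beat each individual one.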
A denoising rule module 620 generates denoised coefficients 621 based on coefficients 607 and expectations 609. The operation of module 620 is described above. An inverse linear transform 630 is applied to denoised coefficients 621 to generate initial denoised estimates 631.
An overall estimate construction module 640 combines initial denoised estimates 631 to generate an overall denoised estimate 641. A mask application module 650 selects between elements of the overall denoised estimate 641 and decompressed video 606, based on the mask function 603, to generate a denoised estimate 651. An iteration decision module 660 generates a final denoised estimate 661 equal to the denoised estimate if it determines that a predetermined number of iterations has been reached. Otherwise, the thresholds are modified into altered thresholds 610; the altering of thresholds is described in greater detail below. Altered thresholds 610 and the denoised estimate 611 are fed back to expectation calculation module 608 for further iterations.
With respect to the expectation calculation, in one embodiment, E[c(i)|Ψ] is constructed with the aid of a threshold T. In one embodiment, T is taken as √3 times the standard deviation of the quantization noise that c(i) incurs. Let h be the ith row of H, i.e., the basis function responsible for the generation of c(i). Consider the set of coefficients obtained by shifting h spatially in a pixel neighborhood around its default spatial position and taking the scalar product of each shifted basis function with e, where e=y. This neighborhood can be taken as +/−j pixels around the default spatial position, where j can be set to 0, 1, 2, or higher. Suppose k coefficients in the set have magnitudes greater than T (subset 1) and the remaining l coefficients have magnitudes less than T (subset 2). Construct the average value of the coefficients in subset 1 and multiply the average by k/(k+l).
The resulting value is assigned to E[c(i)|Ψ]. This calculation can be extended to incorporate coefficients from previously decoded frames that match the coefficients in the subset using motion trajectories to determine matches.
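The threshold-based expectation just described can be sketched as follows. Reading the "multiply the average" step as scaling by k/(k+l) is an interpretation of the text, and returning zero when no coefficient exceeds T is an added assumption:

```python
import numpy as np

def conditional_expectation(shifted_coeffs, T):
    """E[c(i)|Psi] from the coefficients obtained by spatially shifting the
    basis function h around its default position: average the k coefficients
    whose magnitude exceeds T (subset 1) and scale by k/(k+l), where l is the
    size of subset 2."""
    c = np.asarray(shifted_coeffs, dtype=float)
    big = c[np.abs(c) > T]          # subset 1: the k significant coefficients
    k, l = big.size, c.size - big.size
    if k == 0:
        return 0.0                  # nothing significant: expect a zero coefficient
    return big.mean() * k / (k + l)
```

For example, with shifted coefficients `[4.0, 5.0, 0.1, -0.2]` and `T = 1.0`, subset 1 is `[4.0, 5.0]` (k = 2, l = 2), so the expectation is `4.5 * 2/4 = 2.25`.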
In an alternative embodiment, a threshold is not used. In such a case, only averaging is performed and no thresholding (equivalently, this means that l=0). Other forms of averaging may also be used, including linear averaging and nonlinear averaging (e.g., taking the median value of the elements of the set, taking the 2, 3, 4, 5, . . . values in the set closest to c(i), etc.). The type and detail parameters involved in the averaging can be set using compression modes.
In another embodiment, a general conditional expectation calculation is used in conjunction with a general denoising rule, with coefficients set to 0 or d(i) based on the expectation.
In one embodiment, for the expectation calculation, after the initial denoising operation has been completed and the filtered frame x̂ has been obtained, a second round of denoising is performed by determining E[c(i)|Ψ] again using the filtered frame x̂ rather than y, as follows. In the determination of subsets 1 and 2 above, the scalar products of the shifts of h are obtained with x̂ rather than with y, i.e., e=x̂. Once the coefficients are determined, the average calculation and the E[c(i)|Ψ] calculation are applied as above, using a threshold T′ rather than T. In one embodiment, T′ can be taken as T/2. This calculation can also be extended to incorporate matching coefficients from previously decoded/denoised frames using motion trajectories.
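The iterative refinement (reuse the previous estimate as e, halve the threshold each round) can be sketched generically. The `denoise_pass` used below is a toy stand-in for the full expectation/denoising pass, not the actual filter:

```python
import numpy as np

def iterate_denoise(y, denoise_pass, T, rounds=2):
    """Run `denoise_pass(e, T)` repeatedly; each round feeds the previous
    estimate back in as e and halves the threshold (T' = T/2)."""
    e, t = y, float(T)
    for _ in range(rounds):
        e = denoise_pass(e, t)
        t /= 2.0
    return e

# Toy pass for illustration only: hard-threshold the frame values themselves.
frame = np.array([3.0, 0.4, -2.0, 0.1])
out = iterate_denoise(frame, lambda e, t: np.where(np.abs(e) > t, e, 0.0), T=1.0)
```

With two rounds, the first pass uses T = 1.0 and the second uses T′ = 0.5, matching the T′ = T/2 schedule described above.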
In one embodiment, the encoder utilized in hybrid video compression is an MPEG-like encoder, and the compression mode alters the threshold used in the expectation calculation for each coefficient, as well as the mask function that determines the spatial locations to which denoising is applied. An example of mode-based decisions that accomplish these two objectives is set forth below.
In one embodiment, a block is a B×B rectangle of pixels, where B can be 4, 8, 16, or higher. For each coded block, its mode is determined as one of:
According to its mode, each block has a spatial denoising influence that determines a spatial mask around its boundaries. The influence I is specified in pixel units, and it identifies a rectangular shell of thickness I pixels around the boundaries of the block. In one embodiment, the masks of all blocks are combined to form a frame-wide mask. Once the initial denoised estimate z is obtained, the mask is applied such that, in the denoised estimate x̂, if a pixel lies in the mask its value is copied from z; otherwise, its value is copied from y.
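A sketch of the mask construction and application. Treating the shell as straddling the block boundary by I pixels on each side, and representing per-block influences as a dictionary keyed by block corner, are illustrative assumptions:

```python
import numpy as np

def block_shell_mask(shape, B, influences):
    """Frame-wide mask from per-block influences. `influences[(r, c)]` is the
    influence I (in pixels) of the B x B block whose top-left corner is (r, c);
    each block contributes a rectangular shell around its boundary, and the
    shells are OR-ed together."""
    mask = np.zeros(shape, dtype=bool)
    H, W = shape
    for (r, c), I in influences.items():
        if I <= 0:
            continue                              # no denoising influence
        shell = np.zeros(shape, dtype=bool)
        shell[max(r - I, 0):min(r + B + I, H),
              max(c - I, 0):min(c + B + I, W)] = True
        # carve out the block interior so only a boundary shell remains
        if r + I < r + B - I and c + I < c + B - I:
            shell[r + I:r + B - I, c + I:c + B - I] = False
        mask |= shell
    return mask

def apply_mask(mask, z, y):
    """Masked pixels take the denoised value z; the rest keep the decoded y."""
    return np.where(mask, z, y)
```

For an 8×8 frame with a single 4×4 block at the origin and I = 1, the mask covers a one-pixel band on both sides of the block boundary while leaving the block's center and far-away pixels untouched.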
In one embodiment, the influence factors I used in mask construction are determined as follows:
In one embodiment, mode-based denoising decisions are accomplished as follows:
In one embodiment, the denoising linear transforms are given by an n×n block DCT and all of its n² spatial translations, where n is determined by the size of the video frame. For QCIF-resolution frames n=4; for CIF-resolution frames n=4 or n=8; etc. In general, n is expected to be larger for higher-resolution video frames.
There are a number of advantages associated with embodiments of the present invention, including the following. Embodiments of the invention can accommodate video coders that use block as well as non-block transforms in transform coding. Embodiments of the invention are applicable to video frames whose pixel values exhibit a large range of statistics, such as low-pass, band-pass, high-pass, texture, edge, etc.; the invention is not limited to video frames that have smoothly varying pixel values. Embodiments of the invention are effective in denoising a wide range of quantization artifacts, such as blocking artifacts and ringing artifacts; the invention is not limited to blocking artifacts only or ringing artifacts only. The rate-distortion performance of embodiments of the invention on typical video frames, and the visual quality they achieve, are significantly above those of the related art. Also, embodiments of the invention can be deployed in a way that achieves low computational complexity.
An Exemplary Computer System
Referring to
System 700 further comprises a random access memory (RAM), or other dynamic storage device 704 (referred to as main memory) coupled to bus 711 for storing information and instructions to be executed by processor 712. Main memory 704 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 712.
Computer system 700 also comprises a read only memory (ROM) and/or other static storage device 706 coupled to bus 711 for storing static information and instructions for processor 712, and a data storage device 707, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 707 is coupled to bus 711 for storing information and instructions.
Computer system 700 may further be coupled to a display device 721, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 711 for displaying information to a computer user. An alphanumeric input device 722, including alphanumeric and other keys, may also be coupled to bus 711 for communicating information and command selections to processor 712. An additional user input device is cursor control 723, such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 711 for communicating direction information and command selections to processor 712, and for controlling cursor movement on display 721.
Another device that may be coupled to bus 711 is hard copy device 724, which may be used for marking information on a medium such as paper, film, or similar types of media. Another device that may be coupled to bus 711 is a wired/wireless communication capability 725 for communicating with a phone or handheld palm device.
Note that any or all of the components of system 700 and associated hardware may be used in the present invention. However, it can be appreciated that other configurations of the computer system may include some or all of the devices.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 12 2006 | NTT DOCOMO, INC. | (assignment on the face of the patent) | / | |||
Jan 12 2006 | GULERYUZ, ONUR G | DOCOMO COMMUNICATIONS LABORATORIES USA, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017471 | /0551 | |
Feb 09 2006 | DOCOMO COMMUNICATIONS LABORATORIES USA, INC | NTT DoCoMo, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017551 | /0682 |