Examples are provided for apparatus and methods for video encoding and decoding. An encoder apparatus for encoding a picture into a data stream is configured to: subject a transform of a block of the picture to quantization to obtain a quantized transform; determine a zero-quantized portion of the quantized transform; determine at least one noise parameter providing noise information on the transform within the zero-quantized portion; signal the quantized transform and the at least one parameter.
14. An encoding method, comprising:
quantizing a transform of a block of a picture to acquire a quantized transform;
defining a second block having coefficients of the transform associated to a zero-quantized portion of the quantized transform;
acquiring at least one noise parameter providing noise information on the transform within the zero-quantized portion;
signal the quantized transform and the at least one noise parameter, wherein the method comprises:
dividing the second block into a higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion; and
determining the noise colorness parameter or granularity parameter on the basis of a comparison and/or a relationship and/or a quotient based on:
a first sum or aggregate value or statistical value or average value associated to the transform within the higher-frequency zero-quantized portion; and
a second sum or aggregate value or statistical value or average value associated to the transform within the second block or within a super block comprising the block.
15. A decoding method, comprising:
acquiring a quantized transform from a data stream, a zero-quantized portion of the quantized transform, and at least one noise parameter providing noise information for the zero-quantized portion;
performing noise synthesis associated with the quantized transform on the basis of the at least one noise parameter; and
reconstructing a picture using the quantized transform and the synthesized noise, wherein the method further comprises:
defining at least one frequency boundary on the basis of a noise colorness parameter or granularity parameter describing colorness or granularity of noise of a transform in the positions of the zero-quantized portion, the noise colorness parameter or granularity parameter being comprised in the at least one noise parameter;
determining, from the zero-quantized portion, a higher-frequency zero-value subportion and a lower-frequency zero-value subportion; and
setting at zero the noise for the higher-frequency zero-value subportion and/or avoid noise insertion for the higher-frequency zero-value subportion and/or prevent from synthesizing noise for the higher-frequency zero-value subportion.
1. An encoder apparatus for encoding a picture into a data stream, configured to:
subject a transform of a block of the picture to quantization to acquire a quantized transform;
determine a zero-quantized portion of the quantized transform;
define a second block having coefficients of the transform associated to the zero-quantized portion of the quantized transform;
determine at least one noise parameter providing noise information on the transform within second block;
signal the quantized transform and the at least one noise parameter, wherein the encoder apparatus is configured to:
divide the second block into a higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion; and
determine a noise colorness parameter or granularity parameter on the basis of a comparison and/or a relationship and/or a quotient based on:
a first sum or aggregate value or statistical value or average value associated to the transform within the higher-frequency zero-quantized portion; and
a second sum or aggregate value or statistical value or average value associated to the transform within the second block or within a super block comprising the block.
8. A decoder apparatus for decoding a picture from a data stream, configured to:
derive a quantized transform from the data stream, a zero-quantized portion of the quantized transform, and at least one noise parameter providing noise information for a transform in the positions of the zero-quantized portion;
provide noise synthesis to synthesize noise associated with the quantized transform on the basis of the at least one noise parameter; and
reconstruct the picture using the quantized transform and the synthesized noise, wherein the decoder apparatus is configured to:
define at least one frequency boundary on the basis of a noise colorness parameter or granularity parameter describing colorness or granularity of noise of the transform in the positions of the zero-quantized portion, the noise colorness parameter or granularity parameter being comprised in the at least one noise parameter;
determine, from the zero-quantized portion, a higher-frequency zero-value subportion and a lower-frequency zero-value subportion; and
set at zero the noise for the higher-frequency zero-value subportion and/or avoid noise insertion for the higher-frequency zero-value subportion and/or prevent from synthesizing noise for the higher-frequency zero-value subportion.
2. The encoder apparatus of
determine a noise energy parameter or noise level parameter, comprised in the at least one noise parameter, associated to the transform within the second block.
3. The encoder apparatus of
determine the noise energy parameter or noise level parameter by performing averaging, aggregating, and/or acquiring measured values and/or estimated values and/or statistical values associated to the transform within the second block.
4. The encoder apparatus of
determine the noise colorness parameter or granularity parameter on the basis of a spectral tilt value or low-pass characteristic associated to the noise of the transform within the second block.
5. The encoder apparatus of
determine the noise colorness parameter or granularity parameter between at least:
a first value associated to Brownian colorness or granularity; and
a second value associated to white colorness or granularity.
6. The encoder apparatus of
exclude at least the lowest frequency from the second block.
7. The encoder apparatus of
perform a first determination to determine a noise colorness parameter or granularity parameter describing colorness or granularity of noise of the transform within the second block;
perform second determinations to determine a noise energy parameter or noise level parameter associated to the transform within the second block,
wherein each of the second determinations is performed for one block of a group of blocks and the first determination is performed for the entire group of blocks,
so as to associate an energy parameter or noise level parameter to each block of the plurality of blocks, and
to associate one noise colorness parameter or granularity parameter to the entire group of blocks.
9. The decoder apparatus of
define a subportion of the zero-quantized portion on the basis of the at least one noise parameter, the at least one noise parameter comprising a noise colorness parameter or granularity parameter describing colorness or granularity of noise of the transform in the positions of the zero-quantized portion.
10. The decoder apparatus of
provide the noise synthesis by generating noise in the positions of the subportion of the zero-quantized portion.
11. The decoder apparatus of
provide the noise synthesis by inserting pseudo-random values conditioned by a noise energy parameter or noise level parameter comprised in the at least one noise parameter.
12. The decoder apparatus of
synthesize noise on the basis of the at least one noise energy parameter or noise level parameter in such a way that the intensity of the noise inserted into the positions of the zero-quantized portion is comparatively increased for a comparatively increased noise energy or noise level of the transform.
13. The decoder apparatus of
cut frequencies, in the positions of the zero-quantized portion, higher than a frequency threshold defined on the basis of a noise colorness parameter or granularity parameter describing colorness or granularity of noise of the transform in the positions of the zero-quantized portion, the noise colorness parameter or granularity parameter being comprised in the at least one noise parameter, so that the frequency threshold is comparatively higher for noise which is whiter than for noise which is less white noise.
This application is a continuation of copending International Application No. PCT/EP2019/054312, filed Feb. 21, 2019, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 18 158 985.4, filed Feb. 27, 2018, which is incorporated herein by reference in its entirety.
Examples herewith refer to encoders for encoding pictures in data streams, decoders for decoding pictures from data streams, and related methods.
At least some of the examples refer to a spectrally adaptive noise filling tool (SANFT) for perceptual transform coding of still and/or moving images.
Video information such as pictures may be digitized into matrix-like data, which may map pixels into entries of the matrices. A value of the entry may be, for example, in one of the spaces such as RGB, YUV, and so on. Each entry may be associated, for example, to a particular spatial position in the image (e.g., a pixel position). Data are therefore provided in the spatial domain. Spatial domains are known for which predicted and/or residual values (e.g., errors in the prediction) are associated to a particular spatial position in the image.
Spectral domains and/or frequency domains are known for which the values in the spatial domains are transformed into spectral values. Several types of transforms are known, such as the discrete cosine transform, DCT, the discrete sine transform, DST, and so on. Transformed data (also known as coefficients) may be obtained, and they may be grouped into transforms. A transform may be represented in vectorial or in matrix form or in other schematizations (e.g., other 2-dimensional schematizations). A transform may comprise a plurality of coefficients, each being associated to one or more particular frequencies (e.g., a continuous or DC value, and so on) and/or a linear combination of frequencies.
It is known that encoders may encode video data (e.g., transforms) into bitstreams and signal the video data together with parameters. Decoders may perform an inverse process, e.g., by transforming the bitstreams into video information (e.g., electronically displayed). With respect to the encoder, the decoder may perform an inverse transformation, e.g., an inverse DCT (IDCT), an inverse DST (IDST), and so on.
However, the encoder and the decoder do not operate on exactly the same data. In fact, video data at the encoder side are quantized, and cannot be deterministically reconstructed at the decoder side. For example, noise (e.g., texture data) may be lost, hence reducing the quality of an image when represented.
In order to cope with this issue, noise generation techniques have been developed in the conventional technology. Some conventional technology techniques are known as parametric techniques, in the sense that they are based on providing parameters associated to the noise lost because of the quantization.
Modern lossy image and video coders achieve very high reconstruction quality even at low bit-rates but tend to denoise or detexturize the input images. Traditionally, two independent parametric approaches, referred to as texture and film grain synthesis, have been applied in the spatial domain as pre- and post-processors around the codec to counteract such effects.
Perceptual transform coding of digital still images has evolved considerably over the last two decades. Almost a quarter of a century after the completion of the JPEG image specification T.81 [1], formal objective and subjective evaluation of state-of-the-art still image coding standards was conducted for presentation at the 2016 International Conference on Image Processing (ICIP) [2]. In the visual assessments performed in the course of this evaluation, a main still picture profile extension of the H.265/High Efficiency Video Coding (HEVC) specification [3] was found to outperform, with statistical significance, most of the other coding schemes in visual quality across all tested images and bit-rates, without being exceeded in reconstruction quality by any other scheme [4]. Although the HEVC main still picture profile represents one of the most efficient image (and video) coding approaches currently available, like other solutions it tends to introduce visible artifacts such as blurriness and loss of textural detail into the reconstructed images. The authors noticed that, in particular, picture noise and quasi-random textures are softened by the inherent quantization of the “lossy” transform coder at very low bit-rates.
To alleviate this effect, caused by the coarse quantization (and, thus, highly sparse transform-domain representation resulting therefrom) at such low rates, two independent procedures, carried out at the decoder side during image reconstruction, have been developed recently.
The first method, called texture synthesis, intends to regenerate specific textural structures of foreground or background objects in images of natural scenes. This approach is generally applied in a parametric fashion by transmitting auxiliary information about the textural details and/or location to the decoder for guided synthesis [5]-[13]. It is worth noting that, on the one hand, a very compact representation of the parametric side-information is crucial particularly in low-bit-rate coding, in order not to cancel the subjective benefit of the texture synthesis by notably reducing the bit budget of the underlying transform coder architecture. On the other hand, texture-specific artifacts, such as the abovementioned ones, due to an insufficient parameter rate (see [5], [9]) shall be avoided.
The second method, traditionally called film grain synthesis, is employed to recreate the film (i.e., media) or camera (i.e., sensor) noise introduced into natural images during the picture acquisition process particularly at low light levels, which, as noted earlier, is prone to visual removal by low-rate coding algorithms [14], [15]. The basic principle behind this approach is an additive image-plus-noise model: the image to be compressed is split into a noise-like and a denoised component, the latter is encoded by a conventional image coding solution, and the former is reconstructed, again in a typically guided parametric manner, by a dedicated synthesis procedure and added to the decoded denoised image [14]-[20]. The noise parameters may comprise information such as higher-order statistics (variance, skewness, or kurtosis) [14], [20] or autoregressive (AR) model data [16]-[19], which may be extracted in either the spatial domain [14]-[19] or a spectral domain [20]. Note that, in the abovementioned work, a single global noise model is used for the entire image (or video frame). Thus, a single image source has to be assumed, and suboptimal visual performance may need to be accepted when coding images compiled from different origins.
At the decoder side 130, a decoding may be based on a conventional HEVC technique at stage 132 (which also inversely transforms the video data from the spectral domain to the spatial domain), to obtain a noiseless image 134 in the spatial domain. The stage 132 may also output texture templates 136, which are optional noise or texture patterns generated (i.e. derived during stage 132) from the conventionally decoded noiseless image 134 and which may be input to a noise or texture synthesis stage 138 to generate noise or texture 140 on the basis of the coded model parameters 126 obtained from the stage 124. The noise or texture may be added at stage 142 (spatial domain) to obtain the video output 144.
Therefore, in
The image of the input 112 is explicitly segmented between textural and non-textural spatial areas, which are assessed by stage 124, which processes the noise component 122. Therefore, besides operations at stage 118, operations at stage 124 are to be performed. Further to the transformations needed at stage 118, similar processes are needed for obtaining the model parameters 126, which may involve, for example, analysis of average values, noise color, and so on.
The operations of stage 124 are inevitably prone to classification errors: as the noise is to be classified in a small number of classes, errors may statistically occur, which cause faulty parameters and a faulty noise or texture synthesis.
In general terms, for each image a single, global noise model is used, which is to be accepted for the entire image.
According to an embodiment, an encoder apparatus for encoding a picture into a data stream is configured to: subject a transform of a block of the picture to quantization to acquire a quantized transform; determine a zero-quantized portion of the quantized transform; define a second block having coefficients of the transform associated to the zero-quantized portion of the quantized transform; determine at least one noise parameter providing noise information on the transform within second block; signal the quantized transform and the at least one noise parameter, wherein the encoder apparatus is configured to: divide the second block into a higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion; and determine a noise colorness parameter or granularity parameter on the basis of a comparison and/or a relationship and/or a quotient based on: a first sum or aggregate value or statistical value or average value associated to the transform within the higher-frequency zero-quantized portion; and a second sum or aggregate value or statistical value or average value associated to the transform within the second block or within a super block comprising the block.
According to another embodiment, a decoder apparatus for decoding a picture from a data stream is configured to: derive a quantized transform from the data stream, a zero-quantized portion of the quantized transform, and at least one noise parameter providing noise information for a transform in the positions of the zero-quantized portion; provide noise synthesis to synthesize noise associated with the quantized transform on the basis of the at least one noise parameter; and reconstruct the picture using the quantized transform and the synthesized noise, wherein the decoder apparatus is configured to: define at least one frequency boundary on the basis of a noise colorness parameter or granularity parameter describing colorness or granularity of noise of the transform in the positions of the zero-quantized portion, the noise colorness parameter or granularity parameter being comprised in the at least one noise parameter; determine, from the zero-quantized portion, a higher-frequency zero-value subportion and a lower-frequency zero-value subportion; and set at zero the noise for the higher-frequency zero-value subportion and/or avoid noise insertion for the higher-frequency zero-value subportion and/or prevent from synthesizing noise for the higher-frequency zero-value subportion.
According to another embodiment, an encoding method may have the steps of: quantizing a transform of a block of a picture to acquire a quantized transform; defining a second block having coefficients of the transform associated to a zero-quantized portion of the quantized transform; acquiring at least one noise parameter providing noise information on the transform within the zero-quantized portion; signal the quantized transform and the at least one noise parameter, wherein the method may have the steps of: dividing the second block into a higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion; and determining the noise colorness parameter or granularity parameter on the basis of a comparison and/or a relationship and/or a quotient based on: a first sum or aggregate value or statistical value or average value associated to the transform within the higher-frequency zero-quantized portion; and a second sum or aggregate value or statistical value or average value associated to the transform within the second block or within a super block comprising the block.
According to another embodiment, a decoding method may have the steps of: acquiring a quantized transform from a data stream, a zero-quantized portion of the quantized transform, and at least one noise parameter providing noise information for the zero-quantized portion; performing noise synthesis associated with the quantized transform on the basis of the at least one noise parameter; and reconstructing a picture using the quantized transform and the synthesized noise, wherein the method further may have the steps of: defining at least one frequency boundary on the basis of a noise colorness parameter or granularity parameter describing colorness or granularity of noise of a transform in the positions of the zero-quantized portion, the noise colorness parameter or granularity parameter being comprised in the at least one noise parameter; determining, from the zero-quantized portion a higher-frequency zero-value subportion and a lower-frequency zero-value subportion; and setting at zero the noise for the higher-frequency zero-value subportion and/or avoid noise insertion for the higher-frequency zero-value subportion and/or prevent from synthesizing noise for the higher-frequency zero-value subportion.
In accordance to examples, an encoder apparatus for encoding a picture into a data stream is configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, the encoder apparatus may be configured to:
In accordance to examples, a decoder apparatus for decoding a picture from a data stream is configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
In accordance to examples, the decoder apparatus may be configured to:
so that the frequency threshold is comparatively higher for noise which is whiter than for noise which is less white noise.
In accordance to examples, an encoder apparatus may be configured to:
The apparatus may be configured to:
The apparatus may be configured to:
The apparatus may be configured to:
The apparatus may be configured to:
The apparatus may be configured to synthesize noise on the basis of the combination of one of the pseudo-random values of the first sequence with one of the pseudo-random values of the second sequence.
The apparatus may be configured to operate according to an inter loop to:
In accordance to examples, there is provided a method, comprising:
In accordance to examples, there is provided a method, comprising:
In accordance to examples, there is provided a method for noise generation, comprising:
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
According to some techniques and/or standards, such as High Efficiency Video Coding (HEVC), a picture may be subdivided into a plurality of stages. For example, a picture may be divided, in the spatial domain, into a plurality of CTUs (Coding Tree Units). The width and height of a CTU may be signalled in parameters. All the CTUs in a video sequence may have the same size, such as: 64×64, 32×32, or 16×16, etc. “64×64 Unit”, “32×32 Unit”, and similar annotations indicate a coding logical unit which is encoded into a bit stream. “16×16 Block” and similar annotations indicate a portion of video frame buffer.
CTU indicates a logical unit. A CTU may comprise three blocks, luma (Y) and two chroma samples (Cb and Cr), and associated parameters. Each block may be called a CTB (Coding Tree Block). A CTB may be split into CBs (Coding Blocks). Each CB may be associated to a particular type of prediction. It is with reference to the CB that it is decided whether inter-picture or intra-picture prediction is to be performed. Intra-picture strategies are based on use of residuals from the same picture. Inter-picture prediction strategies are based on use of residuals from another picture (e.g., the preceding one, in the case of video data). The chosen prediction may be coded, as a parameter, in the CU (Coding Unit). A CB may also be subdivided into a plurality of Prediction Blocks (PBs). Each CB can be split into a plurality of PBs differently according to the temporal and/or spatial predictability. After a prediction, the residual (difference between predicted image and actual image) may be coded using a spectral transform, for example. A CB can be differently split into TUs (Transform Units). Strategies based on HEVC may be used in the examples above and below. Strategies different from HEVC may be used in the examples above and below.
At the encoder side 210 (which may be exemplified, in some implementations, by an encoder apparatus), video data 212 (input) may be processed. The video data 212 may be in the spatial domain. The video data may comprise different channels (e.g., luma or chroma channels).
At the encoder side, a stage 214 may be provided. Stage 214 may comprise a segmentation function. The segmentation function may define, for an image, a subdivision into CTUs. The segmentation function may define, for a CTU, a subdivision into CBs. The segmentation function may define, for a CB, a subdivision into PBs. Procedures are known for choosing the most appropriate subdivision, so as to obtain a subdivision according to which each stage has coding noise that is as homogeneous as possible. Conventional rate-distortion procedures, for example, may be used.
Stage 214 may comprise a prediction function, which may make use of a prediction residual value obtained for an adjacent block (intra-prediction) and/or a previous picture (inter-prediction).
Stage 214 may comprise a quantization parameter (QP) scaling function. The QP may determine the step size for associating the transformed coefficients with a finite set of steps. Large values of QP may represent big steps that approximate the spatial transform, so that most of the signal can be captured by only a few coefficients. Small values of QP refer to more accurate approximations of the block's spatial frequency spectrum, at the cost of more bits.
Stage 214 may comprise an analysis transform function (e.g., DCT, DST, and so on). A transform T may be obtained. The transform T may relate, for example, to spectral values (coefficients) which describe a block (e.g., a prediction block) as a multitude of frequencies. The transform T may be associated to a TU (e.g., in HEVC).
The transform T may be quantized at stage 216 to obtain a quantized transform TQ. The quantized transform TQ may be understood as a version of T for which coefficient values, intermediate between a higher value and 0, are forced to be zero. The quantized transform TQ may be understood to only comprise ones or zeroes in the entries of the matrix.
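The relationship between QP, step size, and the zeroed coefficients of TQ can be illustrated with a minimal sketch. It assumes the commonly cited HEVC approximation in which the step size doubles every 6 QP units, and it uses plain rounding instead of the rounding offsets of a real encoder. The function names `qstep_from_qp` and `quantize` and the 4×4 block `T` are hypothetical and chosen for illustration only.

```python
def qstep_from_qp(qp: int) -> float:
    """Approximate HEVC-style step size: doubles every 6 QP units."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(transform, qp):
    """Map each coefficient of T to an integer level (plain rounding,
    a simplification of what a real encoder does)."""
    step = qstep_from_qp(qp)
    return [[int(round(c / step)) for c in row] for row in transform]


# Hypothetical 4x4 transform block T (DC coefficient at top-left).
T = [
    [52.0, 9.5, -3.2, 1.1],
    [8.7, -4.0, 1.9, -0.6],
    [-2.5, 1.4, 0.8, 0.3],
    [0.9, -0.7, 0.2, -0.1],
]

TQ = quantize(T, qp=28)  # step size is 2 ** 4 = 16 for QP 28
zeros = sum(level == 0 for row in TQ for level in row)
print(TQ)     # only the three lowest-frequency levels stay nonzero
print(zeros)  # number of coefficients quantized to zero
```

With this coarse step, most small coefficients collapse to zero, which is the situation the noise parameters described below are meant to address.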
At stage 218, entropy coding or another operation may be performed. A bitstream 219 (carrying information regarding the coded denoised image) may therefore be signalled to the decoder side 230. For example, the bitstream 219 may be transmitted (e.g., using an antenna and radio-frequency communications) and/or stored into a storage means (e.g., a random access memory, RAM, a hard disk, a flash memory, or another mass memory).
At stage 220, a parametric noise model estimation is performed in the spectral domain. Stage 220 may use both the transform T and the quantized transform TQ. Stage 220 may signal at least one noise parameter 222 (coded model parameters) to the decoder side 230. For example, the at least one noise parameter 222 may be transmitted (e.g., using an antenna and radio-frequency communications) and/or stored into a storage means (e.g., a random access memory, RAM, a hard disk, a flash memory, or another mass memory).
In examples, the at least one noise parameter may comprise, inter alia, a noise energy parameter, a noise level parameter, a noise colorness parameter, a noise granularity parameter, a temporal correlation, etc. In inter-loop related examples, a correlation parameter between a previous signal and a current signal may also be obtained.
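As a rough illustration of how a noise colorness or granularity indication might be derived as a quotient, the sketch below compares the aggregate magnitude of the zero-quantized coefficients in a higher-frequency region with the aggregate magnitude over all zero-quantized coefficients. The function name `colorness_quotient`, the diagonal split rule `row + col >= split`, and the sample data are assumptions made for illustration and are not taken from the description.

```python
def colorness_quotient(transform, quantized, split):
    """Ratio of high-frequency to total magnitude over the positions
    where the quantized transform is zero.

    The high-frequency region is assumed (hypothetically) to be the
    positions with row + col >= split.
    """
    total = 0.0
    high = 0.0
    for r, q_row in enumerate(quantized):
        for c, level in enumerate(q_row):
            if level != 0:
                continue  # only zero-quantized positions contribute
            mag = abs(transform[r][c])
            total += mag
            if r + c >= split:
                high += mag
    return high / total if total > 0 else 0.0


# Hypothetical transform T and its quantized version TQ.
T = [
    [52.0, 9.5, -3.2, 1.1],
    [8.7, -4.0, 1.9, -0.6],
    [-2.5, 1.4, 0.8, 0.3],
    [0.9, -0.7, 0.2, -0.1],
]
TQ = [
    [3, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

quotient = colorness_quotient(T, TQ, split=3)
print(round(quotient, 3))
```

Under these assumptions, a quotient closer to 1 suggests a flatter (whiter) noise spectrum, while a smaller quotient suggests noise concentrated at lower frequencies.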
At the decoder side 230, a stage 232 may perform entropy decoding and/or QP descaling to obtain an entropy-decoded and/or QP-descaled value 234. Notably, the value 234 is in the spectral domain.
At stage 236, an inverse transformation is performed. For example, an IDCT or an IDST may be performed, to obtain noiseless image data 238. The noiseless image data 238 may be combined, at the noise composer stage 240, with noise 250 to provide a video output 252.
In order to generate the noise 250, the at least one parameter 222 may be used. At stage 244, spectrally adaptive noise filling may be performed, e.g., by using, where applicable, values 242 (e.g., a QP-descaled version of TQ). The obtained noise 246 is in the spectral domain.
At stage 248, the noise 246 in the spectral domain may be inversely transformed to obtain the noise 250 in the spatial domain to be used for the video output 252.
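The spectral-domain noise filling of stage 244 can be pictured with the following minimal sketch. It assumes that a noise level and a frequency boundary have already been derived from the signalled noise parameters, and it inserts pseudo-random values only at zero-quantized positions below that boundary. The function name `fill_noise`, the diagonal boundary rule `row + col < boundary`, the uniform distribution, and the fixed seed are hypothetical choices, not the method of the description.

```python
import random


def fill_noise(quantized, level, boundary, seed=0):
    """Return a spectral-domain noise block for a quantized transform.

    Noise is inserted only where the quantized coefficient is zero and
    the position lies below the (assumed) diagonal frequency boundary;
    all other entries stay at zero.
    """
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    noise = [[0.0] * len(row) for row in quantized]
    for r, q_row in enumerate(quantized):
        for c, q in enumerate(q_row):
            if q == 0 and r + c < boundary:
                noise[r][c] = level * rng.uniform(-1.0, 1.0)
    return noise


# Hypothetical quantized transform TQ (nonzero only at low frequencies).
TQ = [
    [3, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

noise = fill_noise(TQ, level=1.5, boundary=4)
for row in noise:
    print([round(v, 2) for v in row])
```

The resulting block corresponds to the noise 246 in the spectral domain, which would then be inversely transformed to obtain the spatial-domain noise 250.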
Hence, the parametric noise model estimation is performed, at 220, on the basis of data in the spectral domain, such as the transform T and the quantized transform TQ. This is different from the parametric model estimation 124 of the conventional technology, which is in the spatial domain. It has been noted that it is much easier to estimate noise on the basis of spectral data than to classify noise from spatial data. No operation needs to be repeated, as is the case for stages 118 and 124 of the conventional technology.
Therefore,
Therefore,
Stage 220 may receive as input the transform T and/or the quantized transform TQ (see also above).
Stage 220 may comprise a stage 302 of determining (defining) a zero-quantized portion NQ of the quantized transform TQ. The zero-quantized portion NQ may be formed by coefficients of TQ which are zero. The zero-quantized portion NQ may give information on the positions of coefficients equal to zero within TQ. The position may be represented with rows and columns in the matrix and/or may be understood as being associated to frequencies at which the coefficients are zero. In examples, the zero-quantized portion NQ does not distinguish between the coefficients which were already zero in T and those which have been truncated by effect of the quantization. The coefficients which were originally zero are also part of the zero-quantized portion NQ, as they have nonetheless been quantized to zero. A representation of NQ is shown in
NQ may be understood as indicating the frequencies of TQ which would be discarded by the conventional stage 118 of
Stage 220 may comprise a stage 304 of retrieving coefficients of the transform T which are associated to the frequencies of NQ as determined at stage 302. We may, for example, define a block C (e.g., representable as a matrix) which is only defined in the positions of NQ but, instead of having zeroes as values, has the values of T at the corresponding entries. By complementing TQ with C, the original transform T may be obtained. Block C may be understood as being associated to the noise that has been polished from the quantized transform TQ by effect of the quantization. Other techniques may be obtained.
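Stages 302 and 304 can be sketched as follows, under the assumption that T and TQ are available as equally sized matrices. The helper names `zero_quantized_positions` and `second_block`, the dictionary representation of C, and the sample values are hypothetical and only serve to illustrate the idea.

```python
def zero_quantized_positions(quantized):
    """Stage 302 (sketch): positions (row, col) where TQ is zero."""
    return {
        (r, c)
        for r, q_row in enumerate(quantized)
        for c, q in enumerate(q_row)
        if q == 0
    }


def second_block(transform, positions):
    """Stage 304 (sketch): block C, holding the values of T at the
    zero-quantized positions only."""
    return {(r, c): transform[r][c] for (r, c) in positions}


# Hypothetical transform T and its quantized version TQ.
T = [
    [52.0, 9.5, -3.2, 1.1],
    [8.7, -4.0, 1.9, -0.6],
    [-2.5, 1.4, 0.8, 0.3],
    [0.9, -0.7, 0.2, -0.1],
]
TQ = [
    [3, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

NQ = zero_quantized_positions(TQ)
C = second_block(T, NQ)
print(len(NQ))    # number of zero-quantized positions
print(C[(1, 1)])  # value of T that quantization discarded at (1, 1)
```

A dictionary keyed by position is used here because C is defined only on the positions of NQ; a masked matrix would serve equally well.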
Stage 220 may comprise a stage 306 at which at least part of noise information 222N is derived (e.g., computed). For example, a noise energy parameter or a noise level parameter may be obtained. It has been noted, in fact, that block C is particularly suited to obtaining noise information (here indicated as I) associated to the transform T. In examples, an aggregate value or statistical value or average value associated to the coefficients of C may be computed. In examples, the absolute value of the coefficients may be calculated. We may calculate, for example:
I=mean(abs(all coeffs. c∈T for which cQ∈NQ)),
which is the same as
I=mean(abs(all coeffs. c∈C)).
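Stages 302 to 306 may be illustrated by the following minimal sketch. It assumes that T has already been scaled so that the quantizer step size is 1 and that a plain rounding quantizer is used; the function names and the 2×2 example block are hypothetical and serve only for illustration.

```python
from typing import List

Block = List[List[float]]


def zero_quantized_mask(tq: List[List[int]]) -> List[List[bool]]:
    """Stage 302 (sketch): True at the positions of TQ that are zero (portion NQ)."""
    return [[q == 0 for q in row] for row in tq]


def noise_block(t: Block, mask: List[List[bool]]) -> List[float]:
    """Stage 304 (sketch): values of T at the positions of NQ (block C), flattened."""
    return [t[r][c] for r in range(len(t)) for c in range(len(t[0])) if mask[r][c]]


def noise_level(t: Block, tq: List[List[int]]) -> float:
    """Stage 306 (sketch): I = mean(abs(c)) over all c in C; 0.0 if C is empty."""
    c = noise_block(t, zero_quantized_mask(tq))
    return sum(abs(v) for v in c) / len(c) if c else 0.0


# Hypothetical 2x2 block; rounding is an assumed stand-in for the quantizer.
T = [[3.2, 0.4], [-0.3, 0.1]]
TQ = [[round(v) for v in row] for row in T]
I = noise_level(T, TQ)  # mean of |0.4|, |-0.3|, |0.1|
```

In this example only the first coefficient survives the quantization, so C collects the three remaining coefficients of T.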
In examples, at stage 308 averages and/or aggregations and/or other statistical operations between different transforms (e.g., associated to different and/or adjacent blocks) may be performed. For example, the noise energy parameter or noise level parameter may be averaged. The noise information obtained at stage 308 for a block may therefore take into account the noise information of adjacent blocks. For example, in HEVC the noise energy parameter or noise level parameter may be averaged over the 4×4 TUs of an 8×8 block. In that case, an averaged noise information 310 may be obtained.
At stage 312, the noise information I (where applicable, after having been averaged at 308) may be quantized, to obtain at least a part of the at least one noise parameter 222 to be signalled to the decoder. The quantized noise information (quantized noise energy parameter or quantized noise level parameter) IQ may, in particular, be encoded in a field of a bitstream (e.g., with fixed length, e.g., of some bits). Notably, the quantized noise information is also noise information.
In examples, the quantized noise information may be represented as
IQ=0 if I<1/16,
IQ=⌊9+2·log2(I)⌋ otherwise,
where “⌊ . . . ⌋” is the floor function, which gives the integer part of the argument. Note that, instead of the floor function, a rounding function may be used.
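The mapping above may be sketched as follows. The function name is hypothetical, and the floor variant of the formula is used.

```python
import math


def quantize_noise_level(i: float) -> int:
    """Map the noise level I to the index IQ = floor(9 + 2*log2(I)), or 0 if I < 1/16."""
    if i < 1.0 / 16.0:
        return 0
    return math.floor(9 + 2 * math.log2(i))


# For I between 1/16 and 0.5 the index runs from 1 to 7.
indices = [quantize_noise_level(v) for v in (0.05, 1 / 16, 0.25, 0.5)]
```

Each increment of the index corresponds to a factor of √2 in the noise level, so the index is logarithmic in I.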
In examples, the higher the noise of T, the higher the noise energy parameter or noise level parameter, and the higher the quantized noise energy parameter or quantized noise level parameter.
The decoder side 230 will make use of the noise energy parameter or noise level parameter for noise synthesis.
Notably, the noise energy parameter or noise level parameter may be obtained in the spectral domain, without necessity of deriving the noise component 122 in the spatial domain as in
In addition or as an alternative, a noise colorness parameter or granularity parameter may be obtained. It has been noted, in fact, that block C is particularly suited to obtaining noise information (here indicated as g) associated to the transform T.
At stage 314, for example, it is possible to divide (split) the zero-quantized portion NQ into a higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion. This operation is the same as dividing C into a higher-frequency portion H and a lower-frequency portion L. This is because C is defined at the same frequencies as NQ (but the coefficients of C are not necessarily zero, whereas all coefficients of NQ are zero). In examples, one or more thresholds may be defined for separating the higher-frequency portion H from the lower-frequency portion L. In examples, the threshold(s) is (are) predefined.
At stage 320, noise colorness parameter or granularity parameter g may be obtained on the basis of the higher-frequency zero-quantized portion and a lower-frequency zero-quantized portion of NQ (e.g., on the basis of H and L). The noise colorness parameter or granularity parameter g may be obtained on the basis of a spectral tilt value or low-pass characteristic associated to the noise of the transform T within the zero-quantized portion NQ.
The division of C into H and L may be appropriate. It is possible to determine the noise colorness parameter or granularity parameter g on the basis of a comparison and/or a relationship and/or a quotient based on:
In examples, a quotient or another relationship may be based, for example, on a numerator formed by values of H which are summed or averaged or aggregated with each other (e.g., in absolute values), divided by a denominator formed by values of L which are summed or averaged or aggregated with each other (e.g., in absolute values).
For example, it is possible to perform:
Each of the second determinations is performed for one block of a group of blocks and the first determination is performed for the entire group of blocks, so as to associate an energy parameter or noise level parameter (I) to each block of the plurality of blocks, and to associate one noise colorness parameter or granularity parameter (g) to the entire group of blocks. An example may be provided, for example, by:
Note that a different factor other than 1, like 2 or 0.5, can also be used to perceptually optimize the mapping of the values L and H to g.
In examples, the coefficient associated to the lowest frequency (e.g., the DC frequency) may be excluded.
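A minimal sketch of such a quotient is given below. It rests on several assumptions that are not taken from the description: g is computed as factor · mean(|H|) / mean(|L|), the higher-frequency portion H consists of the coefficients in the upper half of the rows or columns, and the DC coefficient is excluded. The function name and the example blocks are hypothetical.

```python
from typing import List, Optional


def granularity(t: List[List[float]], tq: List[List[int]],
                factor: float = 1.0) -> Optional[float]:
    """Sketch: g = factor * mean(|H|) / mean(|L|) over the zero-quantized positions."""
    h, w = len(t), len(t[0])
    high: List[float] = []
    low: List[float] = []
    for r in range(h):
        for c in range(w):
            if tq[r][c] != 0 or (r == 0 and c == 0):
                continue  # not in NQ, or DC coefficient (excluded)
            # Assumed threshold: the upper half of the rows or columns is "high".
            if r >= h // 2 or c >= w // 2:
                high.append(abs(t[r][c]))
            else:
                low.append(abs(t[r][c]))
    if not high or not low or sum(low) == 0:
        return None  # quotient undefined for this block
    return factor * (sum(high) / len(high)) / (sum(low) / len(low))


ZEROS = [[0] * 4 for _ in range(4)]
T_FLAT = [[0.25] * 4 for _ in range(4)]  # flat ("white") spectrum
T_COARSE = [[0.5 if (r < 2 and c < 2) else 0.125 for c in range(4)]
            for r in range(4)]           # energy concentrated at low frequencies
g_flat = granularity(T_FLAT, ZEROS)
g_coarse = granularity(T_COARSE, ZEROS)
```

With these inputs the flat block yields a quotient of 1 and the low-pass block a smaller quotient, which matches the intended meaning of g as a spectral tilt indicator.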
In examples, it is possible to determine the noise colorness parameter or granularity parameter g on the basis of a spectral tilt value or low-pass characteristic associated to the noise of the transform T within the zero-quantized portion NQ.
It has been noted, unexpectedly, that it is not necessary to strictly limit the low-frequency values to the coefficients of the specific block. It is also possible to use the low-frequency values averaged or aggregated or statistically processed from an entire super block that contains the block to which the transform T refers. For example, it is possible to use one single mean or average or aggregate value or summation or statistical value associated to an entire CTU. It is possible, for example, to use:
Accordingly, computational efforts are reduced and the operations are sped up.
It has been noted, for example, that the blocks of one single CTU have in general similar noise characteristics, also owing to the segmentation procedure (e.g., performed at stage 214), which has already defined a single CTU as being formed by elements with similar noise characteristics. Therefore, there is no need, for each block, to recalculate values such as mean(abs(all low-frequency coeffs. c∈T with cQ∈NQ)); it is possible to simply compute such a value once for one single CTU and use it for determining each quotient.
The noise colorness parameter or granularity parameter g may be comprised between:
The value g obtained at 320 may be one noise parameter 222N to be signalled to the decoder. At 322, the noise colorness parameter or granularity parameter g may be averaged over all the blocks (e.g., TUs) in the CTU to obtain an averaged noise colorness parameter or granularity parameter.
At 326, the value g (where applicable, after having been averaged) may be quantized, to obtain a quantized noise colorness parameter or quantized granularity parameter gQ.
In examples, the quantized noise colorness parameter or quantized granularity parameter gQ may be defined so as to be:
gQ=4 if g≤0, gQ=⌊4−4·√(min(1,g))⌋ otherwise.
The quantized noise information (quantized noise colorness parameter or quantized granularity parameter) gQ may, in particular, be encoded in a field of a bitstream (e.g., with fixed length, e.g., of some bits). Notably, also the quantized noise colorness parameter or quantized granularity parameter is noise information.
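The quantization of g may be sketched as follows, assuming that the square root in the definition above applies to min(1, g). The function name is hypothetical.

```python
import math


def quantize_granularity(g: float) -> int:
    """Map g to gQ: 4 if g <= 0, otherwise floor(4 - 4*sqrt(min(1, g)))."""
    if g <= 0:
        return 4
    return math.floor(4 - 4 * math.sqrt(min(1.0, g)))


# White noise (g = 1) maps to index 0; g approaching 0 maps towards index 4.
g_indices = [quantize_granularity(v) for v in (1.0, 0.5, 0.25, 0.01, 0.0)]
```

Under this reading, the index range 0 to 4 fits into a short fixed-length field of the bitstream.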
The decoder will therefore be capable of distinguishing the particular type of the noise, e.g., on the basis of gQ and/or IQ.
It is noted that blocks 320-326 operate in the transform domain, hence avoiding the need for stages 121 and 124 in the spatial domain.
Further, the noise information (g, I, gQ, IQ etc.) is at least partially obtained with reference to a single portion of a picture (e.g., a TU, and/or a CTU in some cases), so as to provide particularly localized information regarding the noise (texture). Therefore, a particularly adapted noise information may be provided to the decoder.
It is also possible to obtain a fine scalability between an entirely parametric estimation and a transparent estimation: a user simply has to regulate the quantization at stage 216 according to known techniques. A user may, for example, modify the quantization stage so as to increase the number of zero-quantized coefficients (i.e., increasing the entries of NQ) or to reduce the number of zero-quantized coefficients (so that, for example, TQ in
Hence, the encoder may be configured to:
In examples, an optional stage 402 for QP descaling may be provided. In some examples, stage 402 implements, at least partially, stage 232 of
In examples, the decoder side 400 may comprise a dequantization stage 404. The dequantization stage 404 may use the noise colorness parameter or granularity parameter (e.g., its quantized version gQ) from the bitstream. The noise colorness parameter or granularity parameter may be, for example, read from a particular data field of the bitstream, e.g., one having a particular (e.g., predefined) position and/or a particular (e.g., predefined) length. In examples, the noise colorness parameter or granularity parameter may therefore be derived so as to distinguish, for example, between Brownian noise, white noise, and a plurality of intermediate colors or granularities.
In examples, it is possible to perform:
g′=2−g
In examples, a white noise may have g′=1, and Brownian noise may have g′=0.5.
The decoder side 400 may comprise a stage 406. This stage 406 may make use of the zero-quantized portion NQ of the quantized transform TQ. NQ is in general easily determined from TQ, as NQ is simply formed from the portions of TQ with coefficients equal to zero.
From NQ, a subportion NQ* is defined. The subportion NQ* may be defined so as to be a low-pass version of NQ. For example, coefficients at frequencies greater than a threshold may be cut. When representing NQ as a matrix, the subportion NQ* may be understood as a matrix defined in the same entries as NQ, apart from the last rows and/or columns (corresponding to the higher frequencies), which are cancelled. Of course, a major piece of information carried by NQ* is the frequency at which NQ is cut.
The subportion NQ* is defined on the basis of the noise colorness parameter or granularity parameter g. In examples, the whiter the noise (texture) of the transform T, the bigger NQ* with respect to NQ. In examples, the more Brownian the noise (texture) of the transform T, the smaller NQ* with respect to NQ. The more evenly the noise is distributed over the frequencies, the more NQ* corresponds to NQ. NQ* is therefore a low-pass version of NQ. The cutoff frequency (boundary) may therefore be understood as being dynamically defined. For example, the higher the colorness or granularity, the higher the cutoff frequency and the bigger NQ*.
Given the vertical and horizontal dimensions of the block (e.g., TU) as a height h and width w, respectively, the spectral region of T to be noise filled (i. e., reconstructed parametrically) may be determined, according to examples, as:
NQ*=all coeffs. cQ∈NQ at indices h′<⌈h·g′⌋, w′<⌈w·g′⌋,
where “⌈ . . . ⌋” is the rounding function (a ceiling function may be used instead). Note that, for g′<1, NQ* represents a low-pass subset of all zero-valued coefficients in TQ.
In examples, the decoder side 400 may comprise a stage 408 for filling random values 410 on TQ. In examples, the random values 410 may be obtained considering the value I′. In examples, however, random noise is not filled on the whole TQ, but only on the subportion NQ*. Stage 408, therefore, may produce a block F in which noise is filled only in correspondence to NQ*. A no-noise portion Z may be formed by the subportion of NQ not covered by NQ*. The no-noise portion Z may be formed, for example, by coefficients which are zero. In examples, when the noise is white noise, the no-noise portion Z may be void. In examples, the no-noise portion Z is largest where the noise is Brownian noise.
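Stages 406 and 408 may be sketched as follows. The sketch takes g′ and I′ as given inputs and assumes round-half-up for the rounding function as well as uniformly distributed random values; neither choice is prescribed by the description. All function names are hypothetical.

```python
import random
from typing import List, Tuple


def low_pass_subset(tq: List[List[int]], g_prime: float) -> List[Tuple[int, int]]:
    """Stage 406 (sketch): positions of NQ* = zero coefficients of TQ below the cutoff."""
    h, w = len(tq), len(tq[0])
    h_cut = int(h * g_prime + 0.5)  # round-half-up (assumption)
    w_cut = int(w * g_prime + 0.5)
    return [(r, c) for r in range(min(h, h_cut)) for c in range(min(w, w_cut))
            if tq[r][c] == 0]


def noise_fill(tq: List[List[int]], g_prime: float, i_prime: float,
               seed: int = 0) -> List[List[float]]:
    """Stage 408 (sketch): block F with noise of level I' at NQ* only, zero elsewhere."""
    rng = random.Random(seed)
    f = [[0.0] * len(tq[0]) for _ in range(len(tq))]
    for r, c in low_pass_subset(tq, g_prime):
        f[r][c] = i_prime * rng.uniform(-1.0, 1.0)  # assumed noise distribution
    return f


# Hypothetical 4x4 TQ in which only the DC coefficient survived quantization.
TQ4 = [[3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
F = noise_fill(TQ4, g_prime=0.5, i_prime=0.25)
```

For g′=0.5 only the three zero-valued coefficients in the upper-left 2×2 corner are filled, and the remaining twelve form the no-noise portion Z. For g′=1, Z is void.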
At stage 412, padding, QP descaling and/or inverse transformations may be performed to obtain noise 414 in the spatial domain.
At stage 416, an optional deblocking procedure may be performed. In examples, a spatial two-tap smoothing filter may be applied, e.g., where the noise is not white. The deblocking effect may be understood by comparing the non-deblocked element 480 (before stage 416) with the deblocked element 482 (after stage 416).
As may be noted, the noise synthesis may be performed in the spectral domain. There is no need for the stage 138 in the spatial domain as in
Notably, the dimension of the subportion NQ* may be directly derived from the noise colorness parameter or granularity parameter, which in turn is conditioned by the quantization as performed at stage 216. Therefore, by modifying the parameter for the quantization at the encoder side, we will obtain a fine scalability for the dimensions of NQ* and for the noise synthesis.
It is also to be noted that the noise colorness parameter or granularity parameter is in general different for each block, which makes it possible to increase the quality of the generated noise.
In examples, the random values 410 may be obtained on the basis of the at least one noise parameter 222. In examples, the random values 410 may be obtained on the basis of the noise energy or noise level I (or its quantized version IQ).
At stage 420, a noise energy parameter or noise level parameter may be obtained. The noise energy parameter or noise level parameter may be, for example, read from a particular data field of the bitstream, e.g., one having a particular (e.g., predefined) position and/or a particular (e.g., predefined) length. In examples, the quantized noise energy parameter or noise level parameter IQ may be processed to obtain a reconstructed value I′.
In examples, I may be reconstructed to 0≤I′≤0.5 using the decoded level index,
I′=0 if IQ=0,
I′=2^((IQ−9)/2) otherwise.
In examples, I′ may be used to scale random values at a random scaler stage 422. The random scaler stage 422 may obtain random values 424 from a pseudo-random value generator stage 426. Accordingly, it is possible to use, for the noise synthesis, random values 410 which take into account the noise energy or level. The pseudo-random value generator stage 426 may base its random generation on a seed 428.
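The reconstruction of I′ and the scaling at stage 422 may be sketched as follows. The uniform distribution of the pseudo-random values and the function names are assumptions made for illustration.

```python
import random
from typing import List


def dequantize_noise_level(iq: int) -> float:
    """Reconstruct I' from the level index: 0 if IQ = 0, else 2^((IQ - 9) / 2)."""
    return 0.0 if iq == 0 else 2.0 ** ((iq - 9) / 2.0)


def scaled_random_values(iq: int, n: int, seed: int) -> List[float]:
    """Stage 422 (sketch): scale n seeded pseudo-random values by I'."""
    rng = random.Random(seed)  # stands in for generator stage 426 with seed 428
    level = dequantize_noise_level(iq)
    return [level * rng.uniform(-1.0, 1.0) for _ in range(n)]


values = scaled_random_values(iq=7, n=16, seed=42)
```

The index 7 reconstructs to I′=0.5 and the index 1 to I′=1/16, i.e., to the two ends of the range covered by the encoder-side mapping.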
In general terms, the higher the noise energy parameter or the noise level parameter, the higher the random values 410, and the higher the noise generated into TQ. Notably, however, where the dimensions of NQ* are reduced with respect to NQ (non-white noise of T), the random coefficients, irrespective of their intensity, are not inserted into all the frequencies: by virtue of having a non-void no-noise portion Z, we obtain a noise generation only at the frequencies which are in NQ*. Accordingly, we have obtained a low-pass filter for the noise.
Notably, for each block (e.g., TU) of the picture we obtain a different intensity of the noise, which will be distributed, however, only at some particular frequencies: this makes it possible to increase the quality of the texture of the images, which may appear different in different regions of the image (e.g., blocks, TUs) and with different colorness, energy, etc.
According to examples, the resulting picture may, therefore, appear with;
Accordingly, noise is synthesized on the basis of the at least one noise energy parameter or noise level parameter (I′, IQ) in such a way that the intensity of the noise inserted into the zero-quantized portion NQ is comparatively increased for a comparatively increased noise energy or noise level of the transform T.
Hence, the decoder 400 may be configured to:
In addition or alternative, the decoder 400 may be configured to:
so that the frequency threshold is comparatively higher for noise which is whiter than for noise which is less white.
In examples, the encoder side and the decoder side may be the same device. In examples, an encoder may comprise a decoder for understanding the signals that the decoder will generate.
It has been noted that (in particular when operating in inter loop) noise may be appropriately synthesized when the random generation stages are somehow associated (e.g., in respect of the reset frequencies) to the prediction or correlation or expectation of the values for a current signal (e.g., associated with a frame, a block . . . ) on the basis of the previous signal.
A parameter such as a temporal correlation (TCorr) may therefore be obtained (e.g., from the encoder side 210 as part of the at least one noise parameter 222). An example of technique for obtaining TCorr may be, for example, to obtain a Pearson correlation coefficient (see, for example, https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample). Examples are provided below.
The correlation function gives an idea of a relationship between random functions. A high temporal correlation value indicates a high probability that a signal or a parameter for a previous frame is repeated in the subsequent frame. While in quasi-still video content temporal correlation tends to be comparatively high, in moving video content temporal correlation tends to be low.
It has been understood that it is possible to:
The first sequence of pseudo-random values 503 is reset (refreshed) less frequently than the second sequence of pseudo-random values 505. For example, the second sequence of pseudo-random values 505 may be reset for every frame, while the first sequence of pseudo-random values 503 may be refreshed every 16 or fewer than 16 frames, for example.
Accordingly, the first sequence of pseudo-random values 503 will reproduce random values which are tendentially less repetitive than those produced by the second sequence of pseudo-random values 505.
It has been noted, unexpectedly, that by combining (e.g., linearly combining) random values generated by the first generator 502 with random values generated by the second generator 504, and by weighting the combination on the basis of a statistical value associated to the probability of repetition of a signal or a parameter, it is possible to obtain values which are particularly appropriate for noise synthesis. For example, fully identical pseudo-random texture patterns in successive pictures may be obtained in case of highest temporal correlation.
The following cases may be identified:
Therefore, it has been understood that, by simply providing to the decoder side a parameter such as the temporal correlation, the noise generation may be intelligently obtained by combining (506) one of the pseudo-random values (503) of the first sequence with one of the pseudo-random values (505) of the second sequence so as to weight the value (505) of the second sequence more than the value (503) of the first sequence in case of comparatively higher correlation and/or lower predictability.
In examples, the at least one noise parameter 222 comprises the temporal correlation TCorr (e.g., for inter loop), e.g., as provided by the encoder side. TCorr may be quantized as a natural number (e.g., between 0 and 5) which may be encoded in a particular (e.g., predefined) field of the bitstream (and, in case, with a particular, e.g., predefined, length).
At stage 520, a value β may be obtained on the basis of the temporal correlation TCorr (other techniques may be used). In examples,
β=TCorr/5
Therefore, in examples,
β∈{0, 0.2, 0.4, 0.6, 0.8, 1}.
In examples, the higher β, the higher the temporal correlation. At stage 506, the weighted combination of the random values 503 (RNG1) and 505 (RNG2) may be obtained by:
out=(1−β)·f(RNG1)+β·f(RNG2)
Therefore, β operates as a weight which determines whether the static pseudo-random values 505 are to be weighted more or less than the dynamic pseudo-random values 503, according to the necessity of reproducing the same texture (or a similar texture) of the previous frame for the current frame. Note that f(RNG1) is a function outputting values 503 while f(RNG2) is a function outputting values 505.
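The weighted combination at stage 506 may be sketched as follows. The sketch assumes that f is the identity, that the static sequence 505 is re-seeded identically for every frame, and that the dynamic sequence 503 is reset every 16 frames; the seeds, the uniform distribution and the function name are hypothetical.

```python
import random
from typing import List


def synth_values(frame_idx: int, n: int, tcorr: int,
                 seed: int = 1234, dynamic_period: int = 16) -> List[float]:
    """Sketch of stage 506: out = (1 - beta) * RNG1 + beta * RNG2, beta = TCorr / 5."""
    beta = tcorr / 5.0
    rng2 = random.Random(seed)      # static values 505: same seed for every frame
    rng1 = random.Random(seed + 1)  # dynamic values 503: reset every dynamic_period frames
    for _ in range((frame_idx % dynamic_period) * n):
        rng1.random()               # skip the values consumed by earlier frames
    out = []
    for _ in range(n):
        a = rng1.uniform(-1.0, 1.0)
        b = rng2.uniform(-1.0, 1.0)
        out.append((1.0 - beta) * a + beta * b)
    return out


static_a = synth_values(frame_idx=0, n=8, tcorr=5)
static_b = synth_values(frame_idx=1, n=8, tcorr=5)
dynamic_a = synth_values(frame_idx=0, n=8, tcorr=0)
dynamic_b = synth_values(frame_idx=1, n=8, tcorr=0)
```

With TCorr=5 (β=1) successive frames receive identical patterns, whereas with TCorr=0 (β=0) the pattern changes from frame to frame and only repeats after the dynamic sequence has been reset.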
In order to obtain the pseudo-random value 424 to be actually used for noise synthesis, information on the current region (e.g., block, TU, CU, CTU), may be used.
In case of large TU regions, it is possible to divide a current spatial region into subregions at stage 530 and apply J forward transforms on the pseudo-random values 505 of the second sequence, where J is the number of forward transforms to be applied. Note that J equals 1 if said division of the TU into spatial subregions is not applied.
An inter loop stage 610 (which may comprise the stages 244 and 248 of
As can be seen from
It has been noted that it is possible to feed back (630), to the inter loop 610, the signal s1′ (608) only with the first component 612′ of noise 612 (250).
A discussion on examples is here provided. Examples above and below may be referred to as examples of a Spectrally Adaptive Noise Filling Tool (SANFT).
A unified low-complexity alternative, operating directly within the frequency domain of conventional transform codecs, with tight coupling to the transform coefficient quantizer, is proposed. By virtue of its design, this spectrally adaptive noise filling tool (SANFT) allows for structurally and computationally efficient as well as highly input adaptive realizations by reusing the coder's existing optimized spatial and spectral partitioning algorithms.
Objective and subjective evaluation in the context of main still picture High Efficiency Video Coding (HEVC) confirms the benefits of the proposal.
In this work, a low-complexity unification of the film grain and texture synthesis paradigms is presented. The proposal, referred to as spectrally adaptive noise filling tool (SANFT), bears three key advantages in comparison with the state of the art. First, it avoids explicit binary segmentations of the coded image into textural and nontextural spatial areas [7], [10] or noisy and denoised portions by way of filtering [14], [16]-[19]. This minimizes the chance of limiting the visual coding quality gains due to false classifications and enables seamless scaling of the codec to visual transparency, two aspects which will be examined below. Second, as described below and above, both the analysis and synthesis algorithms are designed such that they can be implemented with very low algorithmic and architectural complexity. At the same time, the proposal allows for highly localized control over the synthesis process and, as such, precise adaptation to the instantaneous spectral and spatial properties of the input image to be coded. Third, a straightforward integration into existing image or video transform coding standards is permitted. In fact, the description below details an implementation of SANFT in the main still picture profile of HEVC and reports on the preparation and outcome of a formal subjective evaluation carried out to assess its visual benefit in an actual use case.
The discussion is here directed to flexible noise or texture synthesis in image coding.
A fundamental concept behind both texture and film grain reconstruction, as outlined above, is a segmentation into two components, of which one—the coarse structural portion—is coded conventionally and the other—the fine textural detail—is parameterized by the transmitter and synthesized by the receiver. A block diagram of this processing chain is depicted in
It is evident from
To avoid these drawbacks, the following structural modification of the scheme of
Furthermore, by adapting the relative set sizes spatially based on the local image characteristics, the somewhat limiting binary segmentation of the image in said prior work becomes obsolete. In other words, the denoising and/or segmentation pre-processors can be replaced by the partitioning method before or during spectral quantization.
The discussion is here directed to joint parametric partitioning and quantization.
There has been noted above the removal of fine quasi-random texture and/or noise components of an image due to coarse spectral quantization in low-bit-rate transform coding. Assuming, analogous to the film grain model, an additive structure plus texture paradigm in texture coding, a common explanation may be given as follows. Consider two disjoint quantized transform coefficient sets, SQ and NQ, with
SQ⊂TQ, NQ⊂TQ, SQ∩NQ=∅, (1)
obtained from some local rectangular block of the input image, or a prediction residual thereof, by way of a DCT, DST, or Wavelet analysis (i. e., forward) transform. The set T of all coefficients of this transform is quantized to a reduced range of output values, or indices, yielding TQ having lower entropy than T. Hence, after the quantization, certain coefficients of TQ will have been mapped to index zero even though the respective coefficient magnitudes in T are nonzero. Let this subset of TQ, whose information content is, effectively, lost through the quantization, be denoted by NQ. The remaining coefficients of TQ, which are not elements of NQ, will all exhibit a nonzero index and, accordingly, shall form the second subset of TQ, called SQ. It can be observed that, at low coding bit-rates, NQ≈TQ, SQ≈∅, while at high rates, SQ≈TQ, NQ≈∅. Moreover, NQ primarily comprises HF coefficients.
The additive noise or texture models imply that the coefficients of SQ, which “survive” the quantization, comprise both signal (or structure) and noise (or texture) contributions. Thus, in SQ, waveform accurate coding is achieved for both of these model components. Similarly, it can be argued that, through the coefficients of NQ zeroed by the quantizer, both signal (structure) and noise (texture) information is lost. However, the variance of the image signal(s) generally exceeds the noise variance, and the image usually exhibits a low-pass spectral shape while the noise/texture is flat or high-pass shaped. As a result, at mid to high bit-rates, NQ is likely to comprise much less structural image detail than texture or film grain, which explains the denoising effect of a lossy image transform coder [15]. It can, therefore, be concluded that, at sufficient bit-rates, NQ is primarily dominated by texture or noise data and, as such, can be represented very efficiently by a parametric model that does not convey individual coefficient indices but only compactly coded descriptors of their overall statistical properties.
It is worth mentioning that an alternative countermeasure to the denoising effect would be to utilize a dithered quantizer, in which pseudo-random noise with a specified variance dependent on the quantizer's step-size is added to the coefficients of T prior to their mapping to TQ (and possibly subtracted again afterwards) [21]. In the case of coarse quantization, however, this leads to more of the coefficients of TQ being nonzero, so SQ and, thereby, the entropy of TQ and the coding bit-rate increase. This solution is, therefore, impractical for low-bit-rate image or video compression.
In HEVC and other modern image and video coders, the sizes of the localized transforms, applied in a non-overlapped way to an image component (i. e., luma or chroma channel), are chosen in an input-adaptive fashion by the encoder and signalled to the decoder such that corresponding synthesis (i. e., inverse) transforms can be applied. The encoder-side search for an optimal transform block segmentation is typically performed jointly with the search for an optimal prediction by way of a rate-distortion (RD) loop, and for the sake of maximizing the coding performance, this architectural property shall be maintained. Note that, in most cases, the chosen transform sizes reflect the stationarity of the local image contents: in predictable, structurally flat regions, relatively large transform blocks are selected, whereas unpredictable, highly structured parts like nontrivial edges are coded using relatively small transforms. In the present context, where the noise/texture model parameters may be acquired separately for each transform-individual NQ, one benefit of this characteristic is that the block segmentation implicitly allows for more model parameters in a nonstationary image region, where the local statistics may change from one pixel to the next, than in a quasistationary region, where the statistics may not change at all. Hence, an efficient (in a RD sense) and perceptually pleasing spatial partitioning of an image component into local NQ parameter units can be reached by simply reusing the conventional RD optimal transform block segmentation and signaling.
Analogously to the spatial partitioning, which can be achieved without any additional algorithmic operations, a low-complexity spectral partitioning of each TQ into its subsets SQ and NQ can be realized merely by letting the transform-domain quantizer “do its job”. Note that the relative sizes of SQ and NQ—and, thereby, the amount of denoising—can be influenced somewhat by traditional means like static or adaptive quantizer deadzone variation, whose operation does not need to be signaled to the decoder [22], [23]. A closer examination of this aspect will be provided in the following
A discussion is here directed to integration and evaluation of noise filling, e.g., in HEVC. Having explained that spatial as well as spectral partitioning for input-adaptive parametric noise or texture coding extensions can be inherited from the underlying image or video codec, a low-complexity variant of the SANFT proposal is integrated into the main still picture HEVC software as follows.
An implementation of SANFT (e.g., into HEVC Reference Software) is here discussed.
In HEVC, implemented herein by a 64-bit processor-intrinsics optimized variant of the ITU/ISO/IEC test model (HM) 13.0 reference software [24], the maximum spatial partition size is given by the coding tree unit (CTU) size of 64×64 picture elements (pels), in which the largest transform unit (TU) size is 32×32 pels. Each CTU is, therefore, split into at least four TUs, and each TU can be further divided into smaller TUs using a recursive binary segmentation tree, which is signaled to the decoder. With this binary tree, a set of parametric model parameters can be explicitly assigned to (and transmitted for) each NQ associated with a TU after the D×D pels belonging to that TU have been analysis transformed and the resulting D×D transform coefficients have been quantized. Note that, in HEVC and its predecessors, the transform input represents a temporal or, in main still picture and “all-intra” coding, a spatial closed-loop prediction residual [3], [25] where the predictor block is derived from the decoded left and upper neighboring pels of the given D×D block.
HEVC supports a minimum TU size of 4×4 pels, allowing for high spatial parameter resolution, and for this TU size, an optional transform skip mode can be employed to bypass the analysis and synthesis transforms and use weighted identity transforms instead. Since, on average, the 4×4 TUs are the most commonly selected ones during encoding, they also contribute most dominantly to the codec's side information rate. In fact, the authors noticed that, by allowing dedicated NQ parameters for each 4×4 TU, the overhead due to SANFT coding and, thereby, the total size of the coded bit-stream grows to an unacceptable level. It is, therefore, advantageous to sacrifice some spatial resolution for reduced overall side information overhead by acquiring and transmitting only one NQ data set for each 8×8 area split into four neighboring 4×4 TUs.
Regarding the spectral partitioning into SQ and NQ, it was found that, at medium and high coding rates, the overall bit-rate increase due to the SANFT parameters can be partially compensated for by adaptively increasing the quantizer's deadzone in a way similar to the optimized quantization approach in EVS audio coding [23]. In particular, by successively zeroing out higher-frequency transform coefficients initially quantized to an index magnitude of one, until a coefficient with a greater magnitude is reached (beginning at the highest frequency and ending, at the latest, at half that frequency), the number of elements in NQ can be increased. Thus, in doing so, SQ—and, thereby, the bit-rate needed for entropy coding of TQ—is reduced without causing visible blur (i. e., loss of structural detail) in the decoded image. Moreover, further gentle detexturizing and denoising of the coefficients in T is achieved, rendering the legacy codec-external pre-processing methods of
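The deadzone widening described above may be sketched as follows. The sketch assumes that the quantization indices of one transform block are given as a one-dimensional list ordered from the lowest to the highest frequency, and that "half that frequency" corresponds to the middle of this list; the function name is hypothetical.

```python
from typing import List


def widen_deadzone(indices: List[int]) -> List[int]:
    """Zero out trailing indices of magnitude <= 1, from the highest frequency downwards.

    The scan stops at the first index with magnitude greater than one or,
    at the latest, at the middle of the list.
    """
    out = list(indices)
    n = len(out)
    for k in range(n - 1, n // 2 - 1, -1):
        if abs(out[k]) > 1:
            break
        out[k] = 0
    return out


# The scan reaches the midpoint; the index 1 at position 3 is preserved.
widened_a = widen_deadzone([5, 3, 2, 1, 0, 1, -1, 0])
# The scan stops early at the index 2 at position 4.
widened_b = widen_deadzone([4, 1, 0, 1, 2, 1, 0, -1])
```

Every index that is zeroed in this way moves from SQ to NQ, so its contribution is then represented by the parametric noise model rather than by entropy-coded indices.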
Having illuminated the spatiospectral integration of the SANFT proposal into the fundamental algorithmic architecture of HEVC, a detailed description of the SANFT analysis/encoding and decoding/synthesis procedures is provided hereafter.
Noise parameter extraction and coding
The proposal follows the notion that natural texture, like film grain, frequently exhibits a noise-like character [8], and it extends the parametric model utilized for frequency-domain noise filling in recent MPEG audio transform coders [26], [27]. Thus, after the QP-dependent scaling (i. e., normalization) and quantization of T resulting in TQ, and the definition of NQ ∈ TQ, a mean noise level I is derived for each TU via the coefficients of T (after said QP-dependent scaling) located at the same spatial frequencies as the respective coefficients in NQ.
I=mean(abs(all coeffs. c∈T for which cQ∈NQ)). (2)
Note that, for better stability of the analysis, it was found beneficial to exclude the coefficient with the lowest possible frequency (first coefficient in T/TQ, i. e., the DC offset in a DCT-II) if that is included in NQ, unless the employed transform is a weighted identity transform. Assuming only moderate deadzone adaptations, I will, due to the normalization of T, lie in the range between 0 and 0.5 (inclusively) for a scalar quantizer and, hence, can be directly quantized. Herein, a mapping to one of eight level indices is used:
IQ=0 if I<1/16, IQ=⌊9+2·log2(I)⌋ otherwise. (3)
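As a non-limiting illustration, the level analysis of (2) and the index mapping of (3) may be sketched in Python as follows. The function names, the flat scan-ordered coefficient lists (with the DC coefficient assumed at position 0), and the clamping of the index to 7 for levels above 0.5 are assumptions made for this sketch only and are not taken from the reference software.

```python
import math

def noise_level(coeffs, quantized, skip_dc=True):
    """Mean absolute value of the normalized coefficients of T whose
    quantized counterpart in TQ is zero, i.e. belongs to NQ (eq. 2).
    Position 0 is assumed to hold the DC coefficient."""
    values = [abs(c) for i, (c, q) in enumerate(zip(coeffs, quantized))
              if q == 0 and not (skip_dc and i == 0)]
    return sum(values) / len(values) if values else 0.0

def quantize_level(level):
    """Map a level in [0, 0.5] to one of eight indices 0..7 (eq. 3)."""
    if level < 1.0 / 16.0:
        return 0
    # Clamping to 7 is an assumption covering strong deadzone adaptation.
    return min(7, math.floor(9 + 2 * math.log2(level)))
```

Under these assumptions, a level of 0.25 maps to index 5 and a level of 0.5 maps to index 7, the largest of the eight indices.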
Furthermore, a single noise granularity value g is jointly obtained from all TUs in the given CTU. This second parameter represents the average spectral tilt (i. e., low-pass characteristic or coarse frequency envelope) of the coefficients in N. Herein, it is defined as
i. e., a value of g=1 implies fine “white” granularity of the noise, while a value approaching g=0 indicates coarse “Brownian” granularity. Note that a factor other than 1, like 2 or 0.5, can also be used. Moreover, for very-high-resolution content where the physical limitations of the cameras become more noticeable, a constant offset of ¼ or ½ is subtracted from (4) to make the noise output appear more natural. Like I, g can be quantized straightforwardly:
gQ=4 if g≤0, gQ=⌊4−4·√(min(1,g))⌋ otherwise. (5)
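A minimal Python sketch of the granularity quantization of (5) is given below for illustration. The function name and the `offset` argument, which models the optional constant offset mentioned above for very-high-resolution content, are hypothetical.

```python
import math

def quantize_granularity(g, offset=0.0):
    """Map the granularity value g to an index 0..4 following eq. (5)."""
    g = g - offset  # optional constant offset, e.g. 0.25 or 0.5 (assumed argument)
    if g <= 0.0:
        return 4
    return math.floor(4 - 4 * math.sqrt(min(1.0, g)))
```

In this sketch, white noise (g=1) yields index 0, while g=0.25 yields index 2.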
The spatially localized noise parametrization via IQ, gQ suffices to transmit to the decoder the key statistical properties of the film grain or textures to be reconstructed. Hence, further measures like the aforementioned skewness, kurtosis [14], [20] or AR data [16]-[19] are not required, which minimizes the bit-stream overhead.
IQ and gQ, along with the QP indices, are coded differentially in horizontal and vertical direction across all CTUs and TUs. To this end, an approach very similar to that used in HEVC for delta-QP coding [3], [28], involving a mixture of mapped unary coding and context adaptive binary arithmetic coding (CABAC), is employed.
Noise parameter decoding and parametric synthesis
At the receiver, the transmitted TQ, IQ, gQ, and QP indices are subjected to entropy decoding, and in case of the latter three bit-stream elements, the two-dimensional differential (delta) coding is undone. I is then reconstructed to 0≤I′≤0.5 using the decoded level index,
I′=0 if IQ=0, I′=2^((IQ−9)/2) otherwise (6)
with the italic prime ′ denoting the parameter quantization process.
From gQ, a relative transform low-pass factor 0<g′≤1 is derived:
g′=2^(−gQ/4), (7)
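For illustration, the decoder-side reconstruction of the two parameters according to (6) and (7) may be sketched as follows; the function names are hypothetical.

```python
def dequantize_level(iq):
    """Reconstruct the noise level I' from the level index IQ (eq. 6)."""
    return 0.0 if iq == 0 else 2.0 ** ((iq - 9) / 2)

def lowpass_factor(gq):
    """Derive the relative low-pass factor g' from the index gQ (eq. 7)."""
    return 2.0 ** (-gq / 4)
```

With these mappings, the largest level index 7 is reconstructed to I′=0.5, and the granularity indices 0 to 4 yield low-pass factors between 1 and 0.5.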
The actual SANFT application, i. e., noise component synthesis, in each TU is performed separately from the inverse transform of the associated TQ (whose operation is left unchanged) in four steps.
First, given the vertical and horizontal dimensions of the TU as a height h and width w, respectively, the spectral region of T to be noise filled (i. e., reconstructed parametrically) is determined:
NQ*=all coeffs. cQ∈NQ at indices h′<⌈h·g′⌉, w′<⌈w·g′⌉. (8)
Note that, for g′<1, NQ* represents a low-pass subset of all zero-valued coefficients in TQ, i. e., NQ of (1), as indicated by the star. As in the encoder-side analysis, the lowest-frequency “DC” transform coefficient of TQ is excluded from NQ if that coefficient was quantized to zero and the employed analysis transform was a DCT or DST. The coefficients in NQ* are now substituted with pseudo-random coefficients, e. g., randomly signed unity values as in [27] or signed results of a variance-normalized linear congruential random number generator [29], which are scaled (multiplied) by I′.
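The first synthesis step may be sketched in Python as follows, purely by way of example. The sketch assumes randomly signed unity values as the pseudo-random source, a TU given as a list of rows of quantized indices, and a caller-supplied `random.Random` instance; all names are hypothetical. Positions outside NQ* are left at zero in the returned block.

```python
import math
import random

def noise_fill(tq, g_prime, level, rng, skip_dc=True):
    """Return a block of the same size as tq that holds scaled random
    signs at the positions of NQ* (eq. 8) and zeros elsewhere."""
    h, w = len(tq), len(tq[0])
    h_max = math.ceil(h * g_prime)   # low-pass limit in vertical direction
    w_max = math.ceil(w * g_prime)   # low-pass limit in horizontal direction
    noise = [[0.0] * w for _ in range(h)]
    for y in range(h_max):
        for x in range(w_max):
            if tq[y][x] != 0:
                continue             # waveform-coded coefficient, not in NQ
            if skip_dc and y == 0 and x == 0:
                continue             # DC coefficient excluded (DCT/DST case)
            noise[y][x] = level * rng.choice((-1.0, 1.0))
    return noise
```

A call such as `noise_fill(tq, 0.5, 0.25, random.Random(0))` on a 4×4 TU restricts the noise to the 2×2 lowest-frequency positions that were quantized to zero.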
As the second step, NQ* is padded with zeros at the locations of the other coefficients in TQ and then subjected to the same inverse transform as that applied to TQ, with the same QP value and fixed-point representation for scaling. This ensures that the synthesized spatial contents of the waveform accurate and the parametric parts are spectrally disjoint and compatibly scaled, as desired. The low-pass effect of limiting the spectral support of NQ* via g′<1 can be more pronounced and effective than the spatial-domain AR model used in, e. g., [17] or [18] because the latter only captures the relationship among neighboring pixels within a small range [20].
The inverse transforms are applied in all TUs and CTUs covering the picture/frame and, thus, result in a temporary noise image which is eventually added to the legacy decoded, denoised image for the final reconstruction. Generating the temporary noise image instead of applying a single joint inverse transform to the TU-wise mix of TQ and NQ* bears two major advantages. On the one hand, it allows for HEVC's “intra” predictive coding to be undone using only the waveform-coded but not the noise filled parametric parts. Since the synthesized noise components are, by definition, uncorrelated with the input image and, thus, would lower the prediction gain quite severely, this architectural approach is highly beneficial in terms of maintaining the core transform coding efficiency. On the other hand, it enables the separate application of dedicated deblocking and deringing post-filters like [30], [31] to the temporary noise image as well as the decoded, denoised image.
In HEVC, deblocking is carried out after “intra” decoding using vertical and horizontal (in that order) conditional spatial filtering along the block boundaries at a multiple-of-8 pel location [28]. In case of strong low-passing with gQ>2 in the SANFT, notable discontinuities tend to appear also in the temporary noise image, and they do so even at intermediate non-8×8 locations in places where 4×4 TUs are primarily used. Since the HEVC deblocking filter is designed with a certain complexity-performance trade-off in mind and a much simpler unconditional deblocking filter would suffice in case of the noise image (where every spatial discontinuity may be considered undesirable), it is reasonable to maintain the HEVC algorithm on the decoded, denoised image and to apply faster and more aggressive deblocking to the temporary noise image.
Therefore, as an optional third step applied in case of non-white noise granularity with g<1, it is proposed to apply, in vertical and then horizontal direction, a spatial two-tap smoothing filter to the noise image along the pels on each side of a TU block boundary:
ṕA=(5·gQ·pB+(64−5·gQ)·pA+32)»6,
ṕB=(5·gQ·pA+(64−5·gQ)·pB+32)»6, (9)
with pA and pB being the adjacent pels (i. e., horizontal or vertical neighbors) at the edges of TU blocks A and B, respectively, and » indicating the binary right-shift operation (video coding algorithms are typically specified using only integer arithmetic). It is easy to note in (9) that the closer gQ is to 4, the more cross-TU blending is performed in the derivation of the deblocked border values ṕA and ṕB. Moreover, since g is fixed for the entire CTU and (9) is unconditional, the filter can be realized with low algorithmic complexity.
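The two-tap smoothing of (9) may be illustrated by the following Python sketch, which processes one pair of boundary pels of the (signed, integer-valued) noise image. The function name is hypothetical, and the sketch relies on Python's right shift being an arithmetic (flooring) shift for negative integers.

```python
def smooth_boundary(p_a, p_b, gq):
    """Blend the two pels adjacent to a TU boundary according to eq. (9).
    p_a, p_b: integer noise-image pels on sides A and B; gq: index 0..4."""
    k = 5 * gq                                    # cross-boundary weight (out of 64)
    new_a = (k * p_b + (64 - k) * p_a + 32) >> 6  # +32 rounds before the shift
    new_b = (k * p_a + (64 - k) * p_b + 32) >> 6
    return new_a, new_b
```

For gQ=0 the pels are returned unchanged, whereas for gQ=4 the pair (100, 0) is blended to (69, 31), i. e., a weighting of 20/64 is given to the pel across the boundary.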
Having synthesized and optionally deblocked the noise component, the fourth and final step in the decoder-side SANFT method constitutes the combination of the temporary noise image with the traditionally decoded, denoised image component to complete the output picture or frame. This can simply be achieved using spatial addition or, equivalently (due to its pseudo-random nature), subtraction of the noise image. It is worth noting that a coded image is split into its luminance (luma, Y) and chrominance (chroma, U and V) components. Intuitively, SANFT processing should, thus, be applied to Y as well as U and V. However, the authors observed that the restriction of the tool to the luma channel limited both the algorithm complexity and the side information overhead without reducing the visual benefit too much. Hence, the luma-only configuration was chosen for the formal subjective evaluation, which is discussed subsequently.
Three more algorithmic aspects are worth mentioning. First, to save dynamic memory during SANFT decoding, the generation of a complete temporary noise image can be circumvented by storing, for each decoded CTU, only the temporary 64×64 “noise patches” of its previously decoded left, top, and top-left CTU neighbors. After deblocking has been executed on this square-of-four CTU quadruple, the SANFT contribution of said top-left CTU neighbor is fully synthesized and can be added to the reconstructed output image (or subtracted therefrom), making space for a new patch generated when decoding the next CTU. Second, the authors discovered that, in the luma channel, the SAO post-processor [31] can be deactivated without causing significant perceptual quality reduction when gQ≤2 and IQ>0 in at least one TU of the given CTU, since quantization-induced ringing artifacts tend to be visually masked by the noise filling. This saves a part of the bit-stream overhead utilized for the SAO parameters. Third, for a more consistent texture or film grain look across differently sized TUs, it makes sense to, as an alternative example, create the temporary noise image from only 4×4 inverse transforms even in case of larger TUs. Specifically, for any D×D-sized TU having D=8, 16, 32 or, possibly, 64, a set of D²/16 square-arranged 4×4 synthesis transforms of separate NQ* with independently generated pseudo-random numbers (or signs) can be employed. Note that all NQ* in said TU share the same 4×4 NQ or a low-pass subset thereof if g>0, so NQ has to be derived from a D/4-times downsampled (in both directions) magnitude-only version of TQ, i. e., ↓(abs(TQ)).
The assessment of objective and subjective performance is here discussed.
The SANFT integration into the main still picture HEVC codec was evaluated both objectively and subjectively. To this end, the first frames of 11 full high definition (FHD) and ultra-high definition (UHD) 8-bit YUV video sequences utilized during the HEVC standardization [25], [28], [32] (classes A and B) were coded and decoded with and without the SANFT and a proprietary, visually motivated per-CTU local adaptation of the quantization parameter (QP). The global (i. e., mean) QP index was set to 37, for medium quality, and 32, for higher quality; RD optimized quantization was used, and all other coding parameters were either set to the default values of the reference software HM 13.0 [32] or, in case of activated SANFT.
For objective evaluation, complexity estimates were conducted using runtime measurements integrated directly into the HM software, bypassing any disk read and write overhead. Regardless of whether the local QP adaptation (QPA) was enabled, the decoder-side complexity overhead due to the SANFT synthesis was found to lie between one quarter, at QP 32, and one third, at QP 37, with a mean around 29%. This additional complexity requirement can be attributed to the execution of further inverse transforms for the noise filling texture. At higher QPs, inverse transforms can traditionally be skipped in numerous blocks due to the absence of nonzero spectral coefficients therein, but even for these blocks, inverse transforms have to be applied for the noise synthesis. Hence, the decoder-side complexity overhead is slightly greater at high QPs than at low QPs. Note that the mean encoder-side overhead, measured at only about one seventh of the total workload, is much lower because only noise level and shape/granularity analysis, but no SANFT decoding and extra synthesis transforms, are involved. Moreover, further algorithmic optimizations should make it possible to reduce this overhead to less than one tenth.
As a second objective aspect, the mean bit-stream size increase or, in other words, side-information overhead caused by the coded SANFT parameters was quantified. It was observed that this overhead is roughly constant across the two QP operating points and, as such, leads to a greater relative bit-stream growth at lower rates (5-6% at QP 37) than at higher rates (2.5-3% at QP 32).
The subjective evaluation was carried out using a formal visual comparison test following the methodology of ITU-R rec. BT.500 [33]. Specifically, a degradation category rating (DCR) experiment with a sequence-wise simultaneous ten-second presentation of the uncoded reference, on the left, and the coded version, on the right, assessed on a nine-grade numerical degradation scale according to Annex C and Appendix V, respectively, of ITU-T rec. P.910 [34] was performed. Each uncoded/coded image pair was, if needed, cropped to FHD resolution and depicted side-by-side horizontally on top of a mid-gray background on a color-calibrated 65-inch LG E6 OLED television. An uncoded/uncoded hidden reference pair was shown for each sequence at random points during the test for verification purposes. Fourteen observers experienced in detecting traditional image and video coding artifacts, aged between 24 and 35 (incl. two females) participated in this study. The voting period after each ten-second stimulus pair presentation was not limited in duration in order to allow the participants to take voluntary short breaks. The viewing distance for all 14 subjects was limited to the range 3H-3.5H, with H denoting the stimulus height (0.4 m in this case, i. e., half the height of the television's UHD panel).
The mean opinion score (MOS) and associated 95% confidence interval (computed with the assumption of a Student-t distribution) for each codec configuration and QP operating point is illustrated in
Regarding statistical differences between codec configurations, it can be observed that, when activating the SANFT, higher MOS values are attained at the cost of increased bit-stream sizes. When linearly connecting identical configurations at the two QPs, as in FIG. 5, and interpolating the MOS and corresponding confidence interval values along these lines, it can be concluded that
with statistical significance defined by t-test probability p<0.05. Focusing on QP 32, it is also worth noting that the two sequences scored worst when applying conventional HEVC—ParkScene and HomelessSleeping, both rated at 4.5 and roughly 5.0 without and with QPA, respectively—benefit most from the SANFT processing (their MOS values improve by 1.0 and 1.9, respectively). The proposed noise filling approach, thus, contributes to a more balanced and input independent overall perceptual coding quality. Unfortunately, for the best-scored HEVC coded image, BasketballDrive, the reverse is true (its MOS decreases from 7.0 to 5.92, regardless of whether QPA is active). Although this is the only one of the 11 images for which a score degradation is observed, further investigations into this issue are planned as a subject for future research.
Integration for the non-intra case is here discussed. In particular, generating temporally correlated “static” textures is here discussed.
SANFT, as described earlier, can insert pseudo-random noise into decoded images/videos in spectral regions which were quantized to zero by the video codec's quantizer, wherein said noise can represent both camera noise or film grain as well as textural image/video content. In video codecs using temporal prediction, e. g., by way of motion compensation, temporally correlated “static” textural video components like the fine details of an asphalt road, which move predictably within the video images over time and which do not appear “noisy” (i.e., which do not appear temporally variant), can be represented well by temporally predicting the SANFT content of a current frame from the SANFT synthesis of a previous frame. Put differently, existing motion compensation techniques can be used to synthesize the noise to be inserted in said current frame from the noise already inserted in said previous frame (a reference frame acting as a source for the temporal prediction). This approach, however, does not always yield satisfactory results, the reason being that the motion compensation vectors (trajectories) may “jump” over time between different source regions. In particular, this may lead to unstable—and, thus, partially uncorrelated and dynamic—noise textures in spatial regions where said noise textures are supposed to appear (almost) static. Moreover, the approach does not work for video coding schemes which do not employ any temporal inter-frame prediction, such as “all-Intra” coding where each frame is coded independently.
In order to circumvent the abovementioned two issues, we propose the introduction of a so-called temporal correlation (TCorr) parameter in combination with the utilization of two fully independent (and advantageously identically distributed) pseudo-random value generators instead of only one such value generator. The first random number generator, called RNG 1, is reset to a predefined value only once every few frames (in so-called random-access pictures or tune-in points) and otherwise operates continuously (i.e., the last pseudo-random value is used to compute the next pseudo-random value). The second number generator, called RNG 2, is reset to a predefined value (advantageously, a different predefined value than that used for RNG 1) once in every single frame and, thus, outputs the same random signal in every frame. In other words, RNG 1 is a dynamic noise value generator for synthesizing temporally uncorrelated SANFT noise (similar to, e. g., traditional film grain synthesis), while RNG 2 is a static noise value generator which can be used to generate noise patterns which are temporally fully correlated. An example shall now operate as follows:
2. The decoder applies entropy decoding (including, advantageously, delta-decoding) so as to recover the integer TCorr index for each picture block (CTU). Since the SANFT noise levels are coded and decoded prior to the shape/granularity and TCorr indices in each frame (or slice), the decoder can identify, like the encoder, when to decode the TCorr index (i.e., when at least one non-zero noise level was decoded for the given picture block) and when not to decode a TCorr index (i.e., when no or only zero-valued noise levels were decoded for said picture block). Note that, in the latter case of no TCorr decoding, it is useful to assign a default (e. g., predicted) TCorr value in order to simplify algorithmic operations like the advantageously applied delta-coding techniques. The decoded (or default) integer 0≤TCorr≤5 controls the SANFT noise synthesis by defining the influence of the two random number generators as follows.
This means that, the smaller the value of TCorr, the more RNG 1 dominates (i.e., is weighted strongly in) the synthesized SANFT signal, while the larger the value of TCorr, the more RNG 2 dominates (i.e., is weighted strongly in) the SANFT synthesis. An appropriate, reasonable relationship between the TCorr parameter and the weighted average between the RNG 1 and RNG 2 outputs is given by a linear relationship such as the following one, governed by β:
out=(1−β)·f(RNG1)+β·f(RNG2) with β=TCorr/5 (10)
Note, however, that other relationships such as quadratic or similar nonlinear functions may give visually better results. Moreover, the best weighting function may depend on the codec.
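The interplay of the two generators and the linear weighting of (10) may be sketched as follows. This is an illustrative Python sketch only: the class name, the seed values, the use of Python's `random.Random` as generator, and the choice of randomly signed unity values for f(·) are assumptions and not part of the described codec.

```python
import random

class TwoGeneratorNoise:
    """Mixes a dynamic generator (RNG 1) and a static generator (RNG 2)."""

    def __init__(self, seed_dynamic=0, seed_static=1):
        self.seed_dynamic = seed_dynamic
        self.seed_static = seed_static
        self.rng1 = random.Random(seed_dynamic)
        self.rng2 = random.Random(seed_static)

    def start_frame(self, random_access=False):
        # RNG 1 is reset only at random-access pictures (tune-in points).
        if random_access:
            self.rng1.seed(self.seed_dynamic)
        # RNG 2 is reset in every frame and thus repeats its output.
        self.rng2.seed(self.seed_static)

    def sample(self, tcorr):
        """One mixed noise value for an integer 0 <= tcorr <= 5 (eq. 10)."""
        beta = tcorr / 5
        dynamic = self.rng1.choice((-1.0, 1.0))
        static = self.rng2.choice((-1.0, 1.0))
        return (1 - beta) * dynamic + beta * static
```

With TCorr=5, successive frames receive identical values in this sketch, because only the per-frame reset generator contributes; with TCorr=0, only the continuously running generator contributes.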
Owing to rate-distortion optimization, completely different coding block segmentations are generally determined in the encoding of different video pictures by a contemporary video coder. In codecs with bi-directional temporal prediction, such as HEVC, this is even the case for jointly coded identical images (or regions thereof), as can be observed in screen content coding applications. In order for the static RNG 2 to produce fully identical pseudo-random texture patterns in successive pictures, the pseudo-random values have to be generated in the spatial (i.e., picture) domain. However, as in case of the dynamic RNG 1, said random values can still be applied in the transform (i.e., frequency) domain in order to uphold the SANFT synthesis principle described earlier. This can be achieved by, in each coding unit and, associated therewith, transform unit, applying a forward (i.e., analysis) transform on the RNG 2 output associated with the spatial location and region of said coding unit and transform unit. The resulting transformed static random values can be used in the SANFT synthesis algorithm.
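The picture-domain generation of the static noise and its forward transform per transform unit may be illustrated as follows. The sketch assumes a floating-point orthonormal DCT-II as a stand-in for the codec's actual (integer) analysis transform, randomly signed unity values as the static noise, and hypothetical function names.

```python
import math
import random

def static_noise_picture(height, width, seed=1):
    """Picture-sized static noise; the same seed gives the same picture."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(width)]
            for _ in range(height)]

def dct2_1d(v):
    """Orthonormal one-dimensional DCT-II."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def forward_transform(block):
    """Separable two-dimensional DCT-II: rows first, then columns."""
    h, w = len(block), len(block[0])
    rows = [dct2_1d(r) for r in block]
    cols = [dct2_1d([rows[y][x] for y in range(h)]) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]

def static_coefficients(picture, y0, x0, h, w):
    """Transform the static noise co-located with a TU at (y0, x0)."""
    block = [row[x0:x0 + w] for row in picture[y0:y0 + h]]
    return forward_transform(block)
```

Because the noise is indexed by picture position rather than by block order, the spatial pattern in a given region does not depend on how the picture happens to be segmented into coding units.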
An implementation (e.g. into HEVC reference software) is here discussed.
In contrast to the intra case, the implementation of the algorithm for inter-frame-coding involves some special modification and adjustments, since the prediction is now based on reference frames.
In the following we briefly present an overview about encoder-specific details:
It can be observed that noisy reference pictures basically cause a lower prediction gain and as a consequence thereof one would obtain a loss of coding efficiency. In order to avoid such effects and achieve nearly the same coding efficiency, the implementation rests on two reconstruction buffers for storing both the original (noise-free) reconstruction signal and the noisy reconstruction. The latter has to be available for so-called backward-adaptive tools (i.e. tools, which use neighboring content to refine the current block without signaling their parameters). But, apart from backward-adaptive tools, the entire estimation of tool and coding parameters is based on the noise-free buffer: First, the encoder determines the best mode or best tool for a specific coding unit (CU) in the sense of R-D-costs. Then the found decision is applied to the second reconstruction buffer, which can be seen as a CU-wise decoding process within the encoder search. Simultaneously, new noise parameters (level, shape/granularity, TCorr) will be calculated.
After the current frame is completely partitioned and compressed, a new noise part is added to the second reconstruction buffer according to the pre-calculated parameters. Considering that the motion compensation in the following frame will use the current reconstruction buffers, there is the effect of noise propagation. To suppress an unintended enhancement of the noise signal during the whole inter-coding process, the noise-adding is realized by a weighted summation. For this purpose, SANFT adds only one half of the calculated noise signal to the noisy reconstruction buffer. The second part will be added in the end of the encoding process when the reconstructed frames are written to the hard disk or displayed. In other words, within the inter loop the algorithm takes only 50% of each noise signal into consideration. Note that this ratio can be changed, e.g. the contribution within the inter loop could be varied between 0 and 100 percent.
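The weighted noise summation may be sketched as follows, assuming flat lists of samples and a configurable in-loop share that defaults to the 50% mentioned above. The function names and the `loop_share` argument are hypothetical.

```python
def add_noise_in_loop(reconstruction, noise, loop_share=0.5):
    """Noisy reference used inside the inter loop: only a share of the
    synthesized noise is added to the reconstruction buffer."""
    return [r + loop_share * n for r, n in zip(reconstruction, noise)]

def finalize_output(reference, noise, loop_share=0.5):
    """Output picture: the remaining share of the noise is added when
    the frame is written out or displayed."""
    return [r + (1.0 - loop_share) * n for r, n in zip(reference, noise)]
```

Applying both functions in sequence yields the reconstruction plus the full noise signal, while the reference picture seen by motion compensation carries only the in-loop share.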
A further aspect of the implementation concerns the periodic reset of internal parameters, which is important for a flexible and stable decoding functionality. More precisely, in the case of transmission errors, the algorithm should guarantee periodic points, whose decoding does not depend on previous frames (cf. intra period). Therefore, SANFT resets the dynamic seed periodically to the initial value 0.
Note that all mentioned techniques regarding weighted noise summation and dynamic seed reset are analogously implemented on the decoder side. The implementation differs only in the existence of a second reconstruction buffer. Since the analysis was done by the encoder, a noise-free buffer becomes redundant for the decoder.
Coding and prediction of tool parameters for inter case are here discussed. In particular, a temporal prediction of SANFT parameters is here discussed.
Similar to other video coding tools, SANFT provides a spatial and a temporal prediction method for its parameters (level, shape/granularity, TCorr). The spatial prediction depends on the respective SANFT parameter values in neighboring CUs and, in some cases, on CTUs and follows conventional techniques used e.g. in the coding of delta-QP values [3], [28].
For usual video sequences, the shape/granularity parameter and the temporal correlation TCorr do not vary significantly from frame to frame. Therefore, in the majority of cases, the spatial prediction is not as powerful as temporal prediction. Since shape/granularity and TCorr are CTU-based parameters, the temporal prediction simply takes the respective values from the same CTU, but from a previously encoded frame. More precisely, for each temporal layer there is a separate predictor to use different values in different layers. This leads to a more efficient prediction than just taking the last encoded frame. The temporal prediction for the noise levels is structurally identical, but levels depend on the partitioning depth and, in general, there is more than one level per CTU. SANFT circumvents this issue by calculating the level average on each CTU, which yields a convenient and partition-independent value for prediction purposes. However, calculating the average is only one approach among further possibilities: calculating the maximum over all levels, the minimum over all levels, or (min+max)/2. Having a prior averaged level, each SANFT level in a CTU is predicted by this.
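The temporal prediction of the noise levels via a per-CTU average, kept separately for each temporal layer, may be illustrated by the following sketch. The class name, the default predictor value, and the rounding of the average to the nearest integer are assumptions.

```python
class TemporalLevelPredictor:
    """Stores one averaged level per (temporal layer, CTU index)."""

    def __init__(self, default=0):
        self.default = default
        self.store = {}

    def predict(self, layer, ctu):
        """Predictor for all levels of the given CTU in the given layer."""
        return self.store.get((layer, ctu), self.default)

    def residuals(self, layer, ctu, levels):
        """Differences to be entropy coded instead of the raw levels."""
        pred = self.predict(layer, ctu)
        return [lv - pred for lv in levels]

    def update(self, layer, ctu, levels):
        """Store the rounded level average as a partition-independent value."""
        if levels:
            n = len(levels)
            self.store[(layer, ctu)] = (2 * sum(levels) + n) // (2 * n)
```

In this sketch, a CTU whose previous levels were 4, 5, 5, and 6 yields the predictor 5 for the co-located CTU of the next frame in the same temporal layer, regardless of how that CTU is partitioned.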
The decision on the prediction mode is made frame-wise (or slice-wise), i.e. we iterate over all CTUs and calculate the exact rates or approximate them. Then the entire frame/slice is either coded using spatial prediction or temporal prediction, where the selected mode is signaled with one bit per frame/slice.
As already mentioned, SANFT has to reset internal states periodically for decoder stability and robustness. Thus, temporal prediction is disabled for these special frames and only spatial prediction is applied. Another solution would be to allow temporal prediction, but instead of drawing on prior values the prediction uses pre-defined initial values for all three parameter types.
Advantageous methods for the transmission of SANFT Parameters are here discussed.
The SANFT parameters (levels, shape/granularity, and TCorr) are transmitted with a specific priority. Specifically, in each frame/slice, the CU-wise levels are written to the bitstream and read from the bitstream first. For very small CUs/TUs, the levels of adjacent CUs/TUs within a certain rectangular spatial region are combined (e.g., averaged), and only the combined, single level is written and read. Then, in the decoder, said combined value will be assigned to all CUs/TUs within said rectangular spatial region so that the remaining SANFT algorithms can operate per TU, as usual. This reduces the SANFT signaling overhead. During the writing/reading of the SANFT level parameters in each CTU, the number of non-zero-valued levels is counted. If this number is larger than zero, this means that SANFT synthesis will be performed at least somewhere in said CTU, so the shape/granularity and TCorr parameters will also be written/read at the end of the bitstream part associated with said CTU. If the number of non-zero levels is zero, no SANFT will be applied in the CTU, and so no shape and TCorr parameters are written/read (instead, “default” predicted values are to be assumed in the further coding/decoding and processing). The above method adds the SANFT bitstream parameters inside the existing image or video codec bitstream syntax, i.e., as an integral part of said image or video codec. An alternative approach is to write and read the SANFT parameters as a bitstream extension element, e.g., as a so-called supplemental enhancement information (SEI) message.
Using this approach, all SANFT parameters for a picture/frame are written and read after all other existing syntax elements, i.e., they are appended to the legacy bitstream (which, thus, remains backward-compatible) and their entropy coding remains fully independent of the entropy coding of all legacy bitstream elements (e.g., independent CABAC context states and predictors are used for the entropy coding of the legacy bitstream elements and of the SANFT parameters).
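The conditional signaling order per CTU (levels first, then shape/granularity and TCorr only if at least one level is non-zero) may be sketched as follows. The symbol list stands in for the actual entropy-coded bitstream, and all names are hypothetical.

```python
def write_ctu_parameters(levels, shape, tcorr):
    """Emit the levels first; shape and TCorr follow only when at least
    one level is non-zero."""
    symbols = [("level", lv) for lv in levels]
    if any(lv != 0 for lv in levels):
        symbols.append(("shape", shape))
        symbols.append(("tcorr", tcorr))
    return symbols

def read_ctu_parameters(symbols, num_levels, default_shape, default_tcorr):
    """Mirror of the writer: fall back to default (predicted) values when
    all levels of the CTU are zero."""
    values = [value for _, value in symbols]
    levels = values[:num_levels]
    if any(lv != 0 for lv in levels):
        shape, tcorr = values[num_levels], values[num_levels + 1]
    else:
        shape, tcorr = default_shape, default_tcorr
    return levels, shape, tcorr
```

Since the reader counts the non-zero levels in the same way as the writer, no additional flag is needed to indicate whether the shape and TCorr parameters are present.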
Some conclusions are here provided. This paper examined the issue of reduced film grain or textural detail in images and videos after lossy compression using modern predictive transform coders like HEVC. As an alternative to, and unification of, conventional parametric film grain and texture synthesis methods, a frequency-domain codec extension referred to as spectrally adaptive noise filling tool (SANFT), operating as closely around the codec's transform coefficient quantizer as possible, was derived. Its key benefit over prior work is its low complexity, achieved by an efficient reuse of existing coder algorithms, and its ability to partition the coded transform coefficients into two fully disjoint sets: a traditionally coded “waveform accurate” one and a SANFT-coded parametric one. A formal DCR test using the main still picture HEVC profile indicates, for some input, visual quality gains of almost two points on the 9-grade scale due to the SANFT
Some additional examples are here provided. Depending on certain implementation requirements, examples may be implemented in hardware. The implementation may be performed using a digital storage medium, for example a floppy disk, a Digital Versatile Disc (DVD), a Blu-Ray Disc, a Compact Disc (CD), a Read-only Memory (ROM), a Programmable Read-only Memory (PROM), an Erasable and Programmable Read-only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM) or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Generally, examples may be implemented as a computer program product with program instructions, the program instructions being operative for performing one of the methods when the computer program product runs on a computer. The program instructions may for example be stored on a machine readable medium.
Other examples comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier. In other words, an example of the method is, therefore, a computer program having program instructions for performing one of the methods described herein, when the computer program runs on a computer. A further example of the methods is, therefore, a data carrier medium (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier medium, the digital storage medium or the recorded medium are tangible and/or non-transitory, rather than signals which are intangible and transitory.
A further example comprises a processing unit, for example a computer, or a programmable logic device performing one of the methods described herein.
A further example comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further example comprises an apparatus or a system transferring (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some examples, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some examples, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any appropriate hardware apparatus.
The above described examples are illustrative for the principles discussed above. It is understood that modifications and variations of the arrangements and the details described herein will be apparent. It is the intent, therefore, to be limited by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the examples herein.
Aspects of the examples above are here reported:
Notably, “one or more inverse transforms” may be understood, in some examples, as a generalization to cover both a multi-transform processing and an alternative example described above, in which large inverse transformations are divided into several small ones.
Further, “weighted identity transforms” may be understood, in some examples, as a generalization of the transform skip operation in HEVC and other codecs, wherein the synthesis transformation is skipped in at least one spatial direction, but a certain coefficient weighting is maintained.
Further, “spatially or temporally predictive decoding” may be understood, in some examples, as a generic term for the intra- or inter-prediction decoding [3], [25], [28] in HEVC and other codecs as well as their possible modifications (e.g. inpainted or affine prediction) or combinations.
Further, “spatial or temporal filtering” may be understood, in some examples, as a generic term for deblocking filtering or sample adaptive offset filtering [30], [31] in HEVC and other codecs as well as their possible modifications (e.g. with application in a temporal direction) or combinations.
In general: generalizable to non-square D1×D2 transformations, special case NQ*=NQ.
Notably, “spatially or temporally predictive encoding” may be understood, in some examples, as a generic term for intra- or inter-prediction coding [3], [25], [28] in HEVC and other codecs as well as their possible modifications (e.g. inpainted or affine prediction) or combinations.
Further, “spatial or temporal filtering” may be understood, in some examples, as a generic term for denoising pre-processing or sample adaptive offset pre-processing [14], [18], [31] in HEVC and other codecs as well as their possible modifications (e.g. with application in a temporal direction) or combinations.
Further, “one or more analysis transforms” may be understood, in some examples, as a generalization to cover both multi-transform processing and splitting the two-dimensional analysis transformation into two separate “horizontal” and “vertical” 1D transformations.
Further, “weighted identity transforms” may be understood, in some examples, as a generalization of the transform skip operation in HEVC and other codecs, wherein the analysis transformation is skipped in at least one spatial direction, but a certain coefficient weighting is maintained.
Further, “direct or indirect entropy coding” may be understood, in some examples, as a short description for the direct arithmetic coding, range coding or Huffman coding of the quantized transform coefficients in TQ or the quantized SANFT parameters as well as the indirect arithmetic coding, range coding or Huffman (delta) coding of, e.g., differences between, or prediction errors of, coefficients in TQ or SANFT parameters.
In general: generalizable to non-square D1×D2 transformations, use of coefficient subsets in (2), cf.
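As an illustration of the remark on splitting a two-dimensional analysis transformation into separate “horizontal” and “vertical” 1D transformations, including the non-square D1×D2 case, the following is a minimal sketch only. It assumes an orthonormal DCT-II as the analysis transform, which the passage above does not specify, and the function names dct_1d, dct_2d_separable and dct_2d_direct are hypothetical names chosen for this illustration. The sketch applies the 1D transform to every row and then to every column, and compares the result against a direct evaluation of the 2D formula.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence (assumed transform type)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def dct_2d_separable(block):
    """2D transform as two 1D passes: rows ("horizontal"), then columns ("vertical")."""
    rows = [dct_1d(list(r)) for r in block]          # horizontal pass
    cols = [dct_1d(list(c)) for c in zip(*rows)]     # vertical pass on columns
    return [list(r) for r in zip(*cols)]             # transpose back to D1 x D2


def dct_2d_direct(block):
    """Direct evaluation of the 2D DCT-II formula, used only as a reference."""
    d1, d2 = len(block), len(block[0])

    def scale(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * d2 for _ in range(d1)]
    for u in range(d1):
        for v in range(d2):
            acc = 0.0
            for i in range(d1):
                for j in range(d2):
                    acc += (
                        block[i][j]
                        * math.cos(math.pi * (2 * i + 1) * u / (2 * d1))
                        * math.cos(math.pi * (2 * j + 1) * v / (2 * d2))
                    )
            out[u][v] = scale(u, d1) * scale(v, d2) * acc
    return out


# Usage on a non-square 2 x 4 block (D1 = 2, D2 = 4).
example_block = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
separable = dct_2d_separable(example_block)
reference = dct_2d_direct(example_block)
max_error = max(
    abs(separable[u][v] - reference[u][v])
    for u in range(2)
    for v in range(4)
)
```

In this sketch the two 1D passes cost on the order of D1·D2·(D1+D2) multiplications, against D1²·D2² for the direct form, which is one common motivation for the row-column split; whether the examples above use this exact ordering is not stated.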
The figures may refer to:
Even where the drawings refer to stages of the invention (e.g., elements of hardware and programming to obtain predetermined functions), they may also be understood as describing corresponding methods (e.g., each block corresponding to a related method step).
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Helmrich, Christian, Bosse, Sebastian, Keydel, Paul
Patent | Priority | Assignee | Title |
5479211 | Apr 30 1992 | Olympus Optical Co., Ltd. | Image-signal decoding apparatus |
20040114817 | | | |
20060050157 | | | |
20060133686 | | | |
20120127350 | | | |
20120230395 | | | |
20130006622 | | | |
20130114685 | | | |
20150332686 | | | |
WO2013040336 | | | |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 24 2020 | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E.V. | (assignment on the face of the patent) | / | |||
Oct 04 2020 | BOSSE, SEBASTIAN | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 054428 | /0604 | |
Oct 06 2020 | HELMRICH, CHRISTIAN | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 054428 | /0604 | |
Oct 06 2020 | KEYDEL, PAUL | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 054428 | /0604 | |
May 20 2024 | WIEGAND, THOMAS | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 | |
May 23 2024 | HINZ, TOBIAS | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 | |
May 23 2024 | WIECKOWSKI, ADAM | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 | |
May 23 2024 | MARPE, DETLEV | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 | |
May 23 2024 | SCHWARZ, HEIKO | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 | |
May 23 2024 | GEORGE, VALERI | FRAUNHOFER-GESELLSCHAFT ZUR FÖRDERUNG DER ANGEWANDTEN FORSCHUNG E V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 067968 | /0062 |
Date | Maintenance Fee Events |
Aug 24 2020 | BIG: Entity status set to Undiscounted (note the period is included in the code). |
Date | Maintenance Schedule |
Sep 14 2024 | 4 years fee payment window open |
Mar 14 2025 | 6 months grace period start (w surcharge) |
Sep 14 2025 | patent expiry (for year 4) |
Sep 14 2027 | 2 years to revive unintentionally abandoned end. (for year 4) |
Sep 14 2028 | 8 years fee payment window open |
Mar 14 2029 | 6 months grace period start (w surcharge) |
Sep 14 2029 | patent expiry (for year 8) |
Sep 14 2031 | 2 years to revive unintentionally abandoned end. (for year 8) |
Sep 14 2032 | 12 years fee payment window open |
Mar 14 2033 | 6 months grace period start (w surcharge) |
Sep 14 2033 | patent expiry (for year 12) |
Sep 14 2035 | 2 years to revive unintentionally abandoned end. (for year 12) |