Reconstructing a second component signal relating to a second component of a multi-component picture from a spatially corresponding portion of a reconstructed first component signal and a correction signal derived from a data stream for the second component promises increased coding efficiency over a broader range of multi-component picture content. By including the spatially corresponding portion of the reconstructed first component signal in the reconstruction of the second component signal, any remaining inter-component redundancies/correlations, such as those still present despite a component space transformation possibly performed a priori, or those introduced by such a transformation, may readily be removed by way of the inter-component redundancy/correlation reduction of the second component signal.
13. A method for encoding a multi-component picture, comprising:
obtaining a first residual signal relating to a first component of the multi-component picture, wherein the first component represents a first color plane and the first residual signal is a prediction residual obtained by subtracting a first prediction signal of the first component from the first component;
encoding, into a data stream, the first residual signal; and
encoding, into the data stream, a signaling syntax element that indicates whether or not a second residual signal of a second component of the multi-component picture is obtained based on the first residual signal such that either (a) a portion of the first residual signal is scaled using a scaling factor modified to derive a portion of the second residual signal, or (b) a correction signal is encoded into the data stream to derive a portion of the second residual signal, wherein the second component represents a second color plane and the second residual signal is a prediction residual obtained by subtracting a second prediction signal of the second component from the second component.
10. A method for decoding a multi-component picture, comprising:
determining, based on information extracted from an encoded data stream, a first residual signal relating to a first component of the multi-component picture, wherein the first component represents a first color plane and the first residual signal is a prediction residual obtained by subtracting a first prediction signal of the first component from the first component;
extracting, from the encoded data stream, a signaling syntax element that indicates whether or not a second residual signal of a second component of the multi-component picture is derived based on the first residual signal such that either (a) a portion of the first residual signal is scaled using a scaling factor modified to derive a portion of the second residual signal, or (b) a correction signal is extracted from the encoded data stream independent of the first residual signal to derive a portion of the second residual signal, wherein the second component represents a second color plane and the second residual signal is a prediction residual obtained by subtracting a second prediction signal of the second component from the second component;
responsive to the signaling syntax element, deriving the portion of the second residual signal;
sub-dividing the multi-component picture into prediction blocks and residual blocks, and further subdividing the residual blocks into transform blocks;
selecting prediction modes based on first information from the encoded data stream;
determining prediction parameters for the prediction blocks based on second information from the encoded data stream;
deriving a prediction signal using the prediction modes and the prediction parameters;
deriving a residual signal within each residual block by performing inverse transformations within the transform blocks; and
reconstructing the multi-component picture by combining the prediction signal and the residual signal.
1. A decoder configured to decode a multi-component picture, the decoder comprising a processor configured for:
determining, based on information extracted from an encoded data stream, a first residual signal relating to a first component of the multi-component picture, wherein the first component represents a first color plane and the first residual signal is a prediction residual obtained by subtracting a first prediction signal of the first component from the first component;
extracting, from the encoded data stream, a signaling syntax element that indicates whether or not a second residual signal of a second component of the multi-component picture is derived based on the first residual signal such that either (a) a portion of the first residual signal is scaled using a scaling factor to derive a portion of the second residual signal, or (b) a correction signal is extracted from the encoded data stream independent of the first residual signal to derive a portion of the second residual signal, wherein the second component represents a second color plane and the second residual signal is a prediction residual obtained by subtracting a second prediction signal of the second component from the second component;
responsive to the signaling syntax element, deriving the portion of the second residual signal;
sub-dividing the multi-component picture into prediction blocks and residual blocks, and further subdividing the residual blocks into transform blocks;
selecting prediction modes based on first information from the encoded data stream;
determining prediction parameters for the prediction blocks based on second information from the encoded data stream;
deriving a prediction signal using the prediction modes and the prediction parameters;
deriving a residual signal within each residual block by performing inverse transformations within the transform blocks; and
reconstructing the multi-component picture by combining the prediction signal and the residual signal.
2. The decoder according to
3. The decoder according to
4. The decoder according to
performing an inverse spectral transformation onto spectral coefficients relating to the second component derived from the data stream to acquire the correction signal in a spatial domain and reconstructing the second residual signal using the correction signal in the spatial domain, and
acquiring the correction signal in a spectral domain from the data stream, reconstructing, in the spectral domain, the second residual signal using the correction signal as acquired in the spectral domain, and subjecting, in the spectral domain, the second residual signal to an inverse spectral transformation.
5. The decoder according to
6. The decoder according to
7. The decoder according to
8. The decoder according to
9. The decoder according to
11. The method according to
12. The method according to
14. The method according to
15. The method according to
16. The method according to
17. The decoder according to
18. The decoder according to
19. The decoder according to
The present application is a continuation of U.S. patent application Ser. No. 16/256,064 filed Jan. 24, 2019, which is a continuation of U.S. patent application Ser. No. 14/875,743, filed Oct. 6, 2015, which is a continuation of International Application PCT/EP2014/057090, filed Apr. 8, 2014, and additionally claims priority from U.S. Provisional Application 61/809,608, filed Apr. 8, 2013 and U.S. Provisional Application 61/846,450, filed Jul. 15, 2013, and European Application EP 14150373.0, filed Jan. 7, 2014, all of which are incorporated herein by reference in their entireties.
The present application is concerned with inter-component prediction in multi-component picture coding such as between luma and chroma.
In image and video signal processing, color information is mainly represented in a color space typically consisting of three components like R′G′B′ or Y′CbCr. The first component, Y′ in the case of Y′CbCr, is often referred to as the luma and the remaining two components, the Cb and the Cr components or planes in the case of Y′CbCr, are referred to as the chroma. The advantage of the Y′CbCr color space over the R′G′B′ color space is mainly the residual characteristic of the chroma components, i.e., the chroma components contain less energy or amplitude compared to the chroma signals of absolute color spaces like R′G′B′. In particular for Y′CbCr, the luma component carries the grey scale information of the image or video, the chroma component Cb denotes the difference relative to the blue primary, and the chroma component Cr denotes the difference relative to the red primary.
In the application space of image and video compression and processing, Y′CbCr signals are advantageous as compared to R′G′B′ due to the fact that the color space transformation from R′G′B′ to Y′CbCr reduces or removes the correlation between the different color components or planes. In addition to the correlation removal, less information has to be transmitted, and hence, the color transformation acts as a compression approach too. Such a pre-processing for correlation removal or reduction enables higher compression efficiency while, for example, maintaining the complexity or increasing it only by a reasonable amount. A hybrid video compression scheme is often designed for Y′CbCr input because the correlation between the different color components is removed or reduced and the designs of hybrid compression schemes only have to consider the separate processing of the different components. However, the transformation from R′G′B′ to Y′CbCr and vice versa is not lossless, and hence, information, i.e., sample values available in the original color space, might be lost after such a color transformation. This issue can be avoided by using color spaces involving a lossless transformation from the original color space and back to the original color space, e.g., the Y′CoCg color space when having R′G′B′ input. Nevertheless, fixed color space transformations might lead to sub-optimal results depending on the application. For image and video compression, fixed color transformations are often sub-optimal for higher bit rates and for non-natural signals with high or without correlation between the color planes. In the second case, a fixed transformation would introduce correlation between the different signals, and in the first case, the fixed transformation might not remove all the correlation between the different signals.
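As an illustration of a losslessly invertible color transformation of the kind mentioned above, the following minimal sketch uses the lifting-based YCoCg-R variant. This particular variant and the function names are assumptions chosen for illustration; the text above does not prescribe a specific transformation.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward lifting steps of the reversible YCoCg-R transform (integer inputs)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse lifting steps: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every lifting step is undone exactly, the round trip returns the original integer samples, at the cost of one extra bit of dynamic range for the two chroma-like components.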
Furthermore, due to the global application of the transformation, correlation might not be completely removed from the different components or planes locally or even globally. Another issue introduced by a color space transformation lies in the architecture of an image or video encoder. Usually, the optimization process tries to reduce a cost function, which is often a distance metric defined over the input color space. In the case of transformed input signals, it can be difficult to achieve an optimal result for the original input signal due to additional processing steps. Consequently, the optimization process might result in a minimum cost for the transformed signal but not for the original input signal. Although the transformations are often linear, the cost calculation in the optimization process often involves a signaling overhead and the cost for the final decision is then calculated by a Lagrangian formula. The latter might lead to different cost values and different optimization decisions. The color transformation aspect is especially crucial in the domain of color representation as modern image and video displays usually use the R′G′B′ color composition for content representation. Generally speaking, transformations are applied when correlation within the signal or between the signals should be removed or reduced. As a consequence, the color space transformation is a special case of the more generic transformation approach.
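The Lagrangian formula referred to above is commonly written in the following form. The symbols are assumptions introduced here for illustration and do not appear in the text: J is the cost of a coding decision, D the distortion it causes, R the number of bits it consumes (including signaling overhead), and λ ≥ 0 the Lagrange multiplier.

```latex
J = D + \lambda \cdot R
```

The encoder would pick the candidate with the smallest J. If D is measured in the transformed color space, the candidate minimizing J there need not be the one minimizing the corresponding cost in the original color space, which is the mismatch described above.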
Accordingly, it would be favorable to have a multi-component picture coding concept at hand which is even more efficient, i.e. achieves lower bitrates at a given quality over a broader range of multi-component picture content.
One embodiment has a decoder configured to decode a multi-component picture spatially sampling a scene with respect to different components, by reconstructing a first component signal relating to a first component of the multi-component picture from a data stream; and reconstructing a portion of a second component signal relating to a second component of the multi-component picture from a spatially corresponding portion of the reconstructed first component signal and a correction signal derived from the data stream, wherein the first and second components are color components and the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component of the multi-component picture and the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component of the multi-component picture.
Another embodiment has an encoder configured to encode a multi-component picture spatially sampling a scene with respect to different components, by encoding a portion of a second component signal relating to a second component of the multi-component picture by prediction from a spatially corresponding portion of a reconstructed first component signal and inserting a correction signal for correcting the prediction into the data stream.
According to another embodiment, a method for decoding a multi-component picture spatially sampling a scene with respect to different components may have the steps of: reconstructing a first component signal relating to a first component of the multi-component picture from a data stream; and reconstructing a portion of a second component signal relating to a second component of the multi-component picture from a spatially corresponding portion of the reconstructed first component signal and a correction signal derived from the data stream, wherein the first and second components are color components and the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component of the multi-component picture and the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component of the multi-component picture.
According to another embodiment, a method for encoding a multi-component picture spatially sampling a scene with respect to different components may have the steps of: encoding a portion of a second component signal relating to a second component of the multi-component picture by inter-component prediction on the basis of a spatially corresponding portion of a reconstructed first component signal and inserting a correction signal for correcting the inter-component prediction into the data stream, wherein the first and second components are color components and the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component of the multi-component picture and the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component of the multi-component picture.
The present invention is based on the finding that reconstructing a second component signal relating to a second component of a multi-component picture from a spatially corresponding portion of a reconstructed first component signal and a correction signal derived from a data stream for the second component promises increased coding efficiency over a broader range of multi-component picture content. By including the spatially corresponding portion of the reconstructed first component signal in the reconstruction of the second component signal, any remaining inter-component redundancies/correlations, such as those still present despite a component space transformation possibly performed a priori, or those introduced by such a transformation, may readily be removed by way of the inter-component redundancy/correlation reduction of the second component signal.
In accordance with an embodiment of the present application, the multi-component picture codec is constructed as a block-based hybrid video codec operating in units of code blocks, prediction blocks, residual blocks and transform blocks, and the inter-component dependency is switched on and off at a granularity of the residual blocks and/or transform blocks by a respective signaling in the data stream. The additional overhead spent on the signaling is over-compensated by the coding efficiency gain, as the amount of inter-component redundancy may vary within a picture. In accordance with an embodiment of the present application, the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component of the multi-component picture and the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component of the multi-component picture. By this measure, the inter-component dependency exploited focuses on remaining inter-component redundancies so that the inter-component prediction may tend to show a smoother spatial behavior.
In accordance with an embodiment, a first weight at which the spatially corresponding portion of the reconstructed first component signal influences the reconstruction of the second component signal, denoted α in the following, is adaptively set at a sub-picture granularity. By this measure, the intra-picture variation in inter-component redundancy may be more closely followed. In accordance with an embodiment, a mixture of a high-level syntax element structure and sub-picture granularity first-weight syntax elements is used in order to signal the first weight at the sub-picture granularity, wherein the high-level syntax element structure defines a mapping from a domain set of possible bin strings of a predetermined binarization of the first-weight syntax elements onto a co-domain of possible values of the first weight. By this measure, the overhead for side information for controlling the first weight is kept low. The adaptation may be done forward adaptively. A syntax element may be used per block, such as a residual or transform block, which has a limited number of signalable states which symmetrically index one of a number of weight values for α symmetrically distributed around zero. In one embodiment, the number of signalable states is odd, with the weight values including zero, wherein the signaling of zero is used to signal the non-use of inter-component prediction so that an extra flag becomes unnecessary. Further, the magnitude is signaled before the conditionally signaled sign, with the magnitude being mapped onto the number of weight values; if the magnitude is zero, the sign is not signaled, so that signalization costs are further reduced.
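The magnitude-then-sign signaling of the first weight described above can be sketched as follows. This is a minimal illustration under stated assumptions: the table values, the name `MAGNITUDE_TO_WEIGHT` and the function `parse_first_weight` are hypothetical, and the symbols are assumed to be already debinarized; in the text, the mapping itself would be defined by the high-level syntax element structure.

```python
# Hypothetical mapping from signaled magnitude index to weight value.
# Index 0 maps to weight 0, which doubles as "inter-component prediction off".
MAGNITUDE_TO_WEIGHT = [0.0, 0.125, 0.25, 0.5, 1.0]


def parse_first_weight(symbols, table=MAGNITUDE_TO_WEIGHT):
    """Read one weight from an iterator of already-debinarized symbols.

    The magnitude index comes first; a sign flag follows only if the
    magnitude is non-zero, so a zero weight costs no sign symbol.
    """
    magnitude = next(symbols)
    if magnitude == 0:
        return 0.0  # prediction not used; no sign is read
    sign_flag = next(symbols)
    weight = table[magnitude]
    return -weight if sign_flag else weight
```

With the five-entry table assumed here there are nine signalable states in total (zero plus four magnitudes with either sign), i.e. an odd number arranged symmetrically around zero.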
In accordance with an embodiment, a second weight at which the correction signal influences the reconstruction of the second component signal, is set at a sub-picture granularity, either in addition to, or alternatively to, the adaptive setting of the first weight. By this measure, the adaptivity of inter-component redundancy reduction may further be increased. In other words, in accordance with an embodiment, in reconstructing the second component signal, weights of a weighted sum of the correction signal and the spatially corresponding portion of the reconstructed first component signal, may be set at a sub-picture granularity. The weighted sum may be used as a scalar argument of a scalar function which is, at least per picture, constant. The weights may be set in a backward-driven manner based on a local neighbourhood. The weights may be corrected in a forward-driven manner.
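One way to write the weighted sum described above is given below. The notation is an assumption made for illustration only: \hat{s}_1 denotes the spatially corresponding portion of the reconstructed first component signal, c the correction signal, w_1 and w_2 the first and second weights set at sub-picture granularity, and f the scalar function that is constant at least per picture.

```latex
\hat{s}_2 = f\left( w_1 \cdot \hat{s}_1 + w_2 \cdot c \right)
```

Choosing f as the identity and w_2 = 1 would reduce this to adding a weighted copy of the first component signal to the correction signal.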
In accordance with an embodiment, the domain where the reconstruction of the second component signal from a spatially corresponding portion of the reconstructed first component signal using the correction signal is performed, is the spatial domain. Alternatively, the spectral domain is used. As a further alternative, the domain used is changed between spatial and spectral domain, with the switching being performed at sub-picture granularity. It turned out that the ability to switch, at sub-picture granularity, the domain where the combination of the reconstructed first component signal and the correction signal takes place, increases the coding efficiency. The switching may be performed in a backward-adaptive manner or in a forward-adaptive manner.
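The two domains can be contrasted with a small sketch. It assumes, purely for illustration, a one-dimensional orthonormal DCT as the spectral transformation, floating-point arithmetic, and a single scalar weight; all function names are hypothetical.

```python
import math


def _scale(k, n):
    # Normalization factors of the orthonormal DCT-II / DCT-III pair.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal 1-D DCT-II (forward spectral transformation)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]


def idct(c):
    """Orthonormal 1-D DCT-III (inverse of dct above)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def reconstruct_in_spatial_domain(first_residual, correction_coeffs, weight):
    # Inverse-transform the correction signal first, then combine samples.
    correction = idct(correction_coeffs)
    return [c + weight * r for c, r in zip(correction, first_residual)]


def reconstruct_in_spectral_domain(first_coeffs, correction_coeffs, weight):
    # Combine coefficients first, then inverse-transform the result.
    combined = [c + weight * r for c, r in zip(correction_coeffs, first_coeffs)]
    return idct(combined)
```

Under these idealized assumptions both routes give the same samples because the transformation is linear. In a practical codec, integer rounding and differing block structures may make the two routes behave differently, which is one plausible reason why switching between them can pay off.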
In accordance with an embodiment, a syntax element in the data stream is used to enable changing the role of the first and second component signals within the components of the multi-component picture. The additional overhead for signaling the syntax element is low compared to the possible gain in coding efficiency.
In accordance with an embodiment, the reconstruction of the second component signal is allowed to switch, at sub-picture granularity, between reconstruction based on the reconstructed first component signal only, and reconstruction based on the reconstructed first component signal and a further reconstructed component signal of a further component of the multi-component picture. At relatively low additional effort, this possibility increases the flexibility in removing residual redundancies between components of the multi-component picture.
Likewise, in accordance with an embodiment, a first syntax element in the data stream is used in order to, globally or at an increased scope level, enable or disable the reconstruction of the second component signal based on the reconstructed first component signal. If enabled, sub-picture level syntax elements in the data stream are used to adapt the reconstruction of the second component signal based on a reconstructed first component signal at a sub-picture granularity. By this measure, spending side information for the sub-picture level syntax elements may merely be employed in application cases or multi-component picture contents for which the enablement results in a coding efficiency gain.
Alternatively, the switching between enablement and disablement is performed locally in a backward-driven manner. In this case, the first syntax element does not even need to be present in the data stream. In accordance with an embodiment, for example, the switching is performed locally depending on a check whether the first and second component signals are prediction residuals of a spatial prediction with the intra-prediction modes of the spatial prediction coinciding, or not deviating by more than a predetermined amount. By this measure, the local switching between enablement and disablement does not consume bitrate.
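The backward-driven check described above might look as follows. This is a sketch under assumptions: intra-prediction modes are represented as integer indices whose numerical distance reflects how much the modes deviate, and the function and parameter names are hypothetical.

```python
def inter_component_prediction_enabled(first_is_spatial, second_is_spatial,
                                       first_mode, second_mode,
                                       max_deviation=0):
    """Decide locally, without any signaled flag, whether to enable
    inter-component prediction for a block.

    Enabled only if both component signals are residuals of a spatial
    (intra) prediction and their mode indices coincide or differ by at
    most max_deviation.
    """
    if not (first_is_spatial and second_is_spatial):
        return False
    return abs(first_mode - second_mode) <= max_deviation
```

Since encoder and decoder can both evaluate this check from information they already share, no bits need to be spent on the decision.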
In accordance with an embodiment, a second syntax element in the data stream is used so as to switch between adaptive reconstruction of the second component signal based on the reconstructed first component signal at sub-picture granularity forward-adaptively using sub-picture level syntax elements in the data stream, and non-adaptively performing the reconstruction of the second component signal based on the reconstructed first component signal. The signaling overhead for the second syntax element is low compared to the possibility of avoiding the overhead for transmitting the sub-picture level syntax elements for multi-component picture content for which non-adaptively performing the reconstruction is already efficient enough.
In accordance with an embodiment, the concept of inter-component redundancy reduction is transferred onto a three chroma component picture. In accordance with an embodiment, luma and two chroma components are used. The luma component may be chosen as the first component.
In accordance with an embodiment, the sub-picture level syntax element for adapting the reconstruction of the second component signal from the reconstructed first component signal is coded within the data stream using a Golomb-Rice code. The bins of the Golomb-Rice code may be subject to binary arithmetic coding. Different contexts may be used for different bin positions of the Golomb-Rice code.
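A Golomb-Rice binarization of a non-negative value can be sketched as below. The unary convention (ones terminated by a zero), the Rice parameter `k` and the function names are assumptions for illustration; the subsequent binary arithmetic coding of the bins with position-dependent contexts is not shown.

```python
def golomb_rice_encode(value, k):
    """Return the bin string (list of 0/1) for a non-negative integer.

    Prefix: unary code of value >> k (ones terminated by a zero).
    Suffix: the k least significant bits of value, MSB first.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    bins = [1] * (value >> k) + [0]
    for bit in range(k - 1, -1, -1):
        bins.append((value >> bit) & 1)
    return bins


def golomb_rice_decode(bins, k):
    """Inverse of golomb_rice_encode for a single codeword."""
    it = iter(bins)
    quotient = 0
    while next(it) == 1:  # count the unary prefix
        quotient += 1
    remainder = 0
    for _ in range(k):    # read the fixed-length suffix
        remainder = (remainder << 1) | next(it)
    return (quotient << k) | remainder
```

For example, with k = 1 the value 5 has quotient 2 and remainder 1, giving the bins 1, 1, 0, 1. Each bin position could then be assigned its own context in the arithmetic coder.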
In accordance with an embodiment, the reconstruction of the second component signal from the reconstructed first component signal involves a spatial re-scaling and/or a bit depth precision mapping on the spatially corresponding portion of the reconstructed first component signal. The adaptation of the spatial re-scaling and/or of the performance of the bit depth precision mapping may be done in a backward and/or forward adaptive manner. The adaptation of the spatial re-scaling may involve the selection of a spatial filter. The adaptation of the performance of the bit depth precision mapping may involve the selection of a mapping function.
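Both operations can be illustrated with deliberately simple choices. The sketch below assumes a plain bit shift as the mapping function and a 2x2 averaging filter as the spatial filter for a 2:1 reduction in both directions; these choices and the function names are hypothetical, and the text leaves the actual filter and mapping function open to adaptation.

```python
def map_bit_depth(samples, src_bits, dst_bits):
    """Map integer samples from src_bits to dst_bits precision by shifting.

    Right shifts of negative residual samples round towards minus infinity.
    """
    shift = dst_bits - src_bits
    if shift >= 0:
        return [s << shift for s in samples]
    return [s >> -shift for s in samples]


def downsample_2x2(block):
    """Average each 2x2 group of a 2-D block (even width and height),
    with rounding, to match a component of half the resolution."""
    height, width = len(block), len(block[0])
    return [[(block[y][x] + block[y][x + 1]
              + block[y + 1][x] + block[y + 1][x + 1] + 2) >> 2
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]
```

A usage example would be to downsample a first-component residual block and map it to the bit depth of the second component before weighting it.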
In accordance with an embodiment, the reconstruction of the second component signal from the reconstructed first component signal is done indirectly via a spatially low-pass filtered version of the reconstructed first component signal.
Advantageous implementations of embodiments of the present application are the subject of the dependent claims.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
The description brought forward below starts with the description of a detailed embodiment of an encoder and a description of a detailed embodiment of a decoder fitting to the encoder, after which generalized embodiments are presented.
In principle, the encoder of
The encoder of
Accordingly, the encoder 100 comprises, per component 106, 108 and 110, a sequence of a prediction residual former 112, here exemplarily embodied as a subtractor, a transformer 114 and a quantizer 116 serially connected, in the order of their mentioning, between an input at which a respective component 106, 108 and 110, respectively, arrives, and a respective input of a data stream former 118 configured to multiplex the quantized coefficients and other coding parameters mentioned in more detail below, into the data stream 104. While a non-inverting input of prediction residual former 112 is arranged so as to receive the respective component of picture 102, the inverting (subtrahend) input thereof receives a prediction signal 120 from a predictor 122, the input of which is connected to the output of quantizer 116 via a reconstruction path comprising a sequence of a dequantizer 124, a re-transformer 126 and a prediction/residual recombiner 128, here exemplarily embodied as an adder. While the output of prediction/residual recombiner 128 is connected to an input of predictor 122, and a first input thereof is connected to an output of re-transformer 126, a further input thereof receives the prediction signal 120 output by predictor 122.
Elements 112, 114, 116, 124, 126, 128, 122 and 120 are present within encoder 100 in a parallel manner for each of components 106 to 110 so as to form three separate block-based hybrid video coding paths 130₁, 130₂ and 130₃. The indices 1, 2 and 3 are used to distinguish between the different components of picture 102 with index 1 associated with component 106, index 2 associated with component 108 and index 3 associated with component 110.
In the example shown in
As became clear from the note presented above, in the example of
Before describing the mode of operation of the encoder of
As already denoted above, the decoding paths 230₁ to 230₃ of decoder 200 substantially correspond to the coding paths 130₁ to 130₃ of encoder 100, encompassing elements 124, 138, 126, 134, 128 and 122 thereof. That is, the decoding paths 230₂ and 230₃ of the “dependent components” 208 and 210 comprise between a respective output of data stream extractor 218 on the one hand and the respective output for outputting the respective component 208 and 210, respectively, a serial connection of a dequantizer 224, a spectral domain inter-component prediction/residual recombiner 238, an inverse transformer 226, a spatial domain inter-component prediction/residual recombiner 234 and prediction/residual recombiner 228 connected between data stream extractor 218 on the one hand and the output of decoder 200 for outputting multi-component picture 202, on the other hand, in the order of their mentioning, wherein a predictor 222 is connected into a feedback loop leading from the prediction/residual recombiner 228 back to another input thereof. The further inputs of 238 and 234 are fed by the respective spectral domain and spatial domain inter-component predictors 244 and 240. The decoding path 230₁ of the “base component” 206 differs in that elements 238, 234, 244 and 240 are not present. Predictor 244₃ has an input thereof connected to an output of dequantizer 224₂, and another input connected to the output of dequantizer 224₁; predictor 240₃ has a first input connected to an output of inverse transformer 226₂ and another input connected to an output of inverse transformer 226₁. Predictor 244₂ has an input connected to the output of dequantizer 224₁, and predictor 240₂ has an input thereof connected to inverse transformer 226₁.
It should be noted that predictors 1403 and 1402 of
After having described the structure of both encoder 100 and decoder 200, their mode of operation is described hereinafter. In particular, as already described above, encoder 100 and decoder 200 are configured to use hybrid coding so as to encode/decode each component of the multi-component picture. Actually, each component of the multi-component picture 102 represents a sample array or picture, each spatially sampling the same scene with respect to a different color component. The spatial resolution of components 106, 108 and 110, i.e. the resolution at which the scene is spatially sampled with respect to the respective component, may differ from component to component.
As described above, the components of the multi-component picture 102 are separately subject to hybrid encoding/decoding. “Separately” does not necessarily mean that the encoding/decoding of the components is performed completely independent of each other. First of all, inter-component prediction removes redundancies between the components and additionally some coding parameters may be chosen commonly for the components. In
In units of code blocks 304, for example, predictors 122 and 222 vary between a plurality of prediction modes supported by encoder 100 and decoder 200, respectively. For example, predictors 1221 to 1223 select the prediction modes for the code blocks individually and indicate the selection via prediction parameters 1541 to 1543 to predictors 2223. The available prediction modes may comprise temporal and spatial prediction modes. Other prediction modes may be supported as well, such as inter-view prediction modes or the like. Using recursive multi-tree subdivisioning, such as dual-tree subdivisioning, code blocks may be further subdivided into prediction blocks 308, the outlines 310 of which are indicated in
Depending on the prediction mode associated with the respective code block or prediction block, each prediction block 308 has respective prediction parameters associated therewith which are selected by predictors 1221 to 1223 appropriately, inserted into parameter information 1541 to 1543 and used by predictors 2221 to 2223 so as to control the prediction within the respective prediction blocks 308 accordingly. For example, prediction blocks 308 having a temporal prediction mode associated therewith may have a motion vector for motion-compensated prediction associated therewith, and optionally a reference picture index indicating a reference picture from which, with the displacement indicated by the motion vector, the prediction of the respective prediction block 308 is derived/copied. Prediction blocks 308 of a spatial prediction mode may have a spatial prediction direction associated therewith, contained within prediction information 1511 to 1543, the latter indicating the direction along which the already reconstructed surrounding of the respective prediction block is spatially extrapolated into the respective prediction block.
Thus, using the prediction modes and prediction parameters, predictors 122 and 222 derive a prediction signal 1201 to 1203 for each of the components, and for each of the components, this prediction signal is corrected using a residual signal 1561 to 1563. These residual signals are coded by use of transform coding. That is, transformers 1141 to 1143 perform a transformation, i.e. a spectral decomposition, onto each transform block 116 individually, such as a DCT, DST or the like, and the inverse transformers 226 reverse the same individually for the transform blocks 308, i.e. perform, for example, an IDCT or IDST. That is, as far as the encoder 100 is concerned, the transformers 114 perform the transformation onto the not yet quantized residual signal as formed by prediction residual formers 112. Inverse transformers 126 and 226 reverse the spectral decomposition on the basis of the quantized residual signal 150 of the respective component, which is inserted into the data stream in a lossless manner, such as by Huffman or arithmetic coding, by data stream former 118 and extracted therefrom using, for example, Huffman or arithmetic decoding, by data stream extractor 218.
However, in order to lower the data rate for coding the residual signal 2561 to 2563 using which the prediction signal 2201 to 2203 is corrected at prediction/residual recombiners 2281 to 2283, encoder 100 and decoder 200 support inter-component prediction with respect to the coding of the components' residual signals. As will be described in more detail below, in accordance with embodiments of the present application, the inter-component prediction for coding the residual signal may be switched on and off and/or forward and/or backward adaptively adjusted at the granularity of residual blocks and/or transform blocks. If switched off, the inter-component prediction signals output by predictors 140, 144 and 240 and 244 are zero and the residual signals 2561 to 2563 of all components are derived solely from the quantized transform coefficients contained in their respective residual signals 1501 to 1503. If switched on, however, inter-component redundancies/correlations are removed as far as the dependent components are concerned, i.e. residual signals 2562 and 2563 are coded/decoded using inter-component prediction. The base (first) component serving as inter-component prediction source is, as far as the inter-component prediction is concerned, left unchanged. How this is done is outlined in the following.
For the time being, the description of the inter-component redundancy reduction realized by predictors 140, 144, 240 and 244 focuses on the inter-component redundancy reduction between components 106 and 108 and 206 and 208, respectively. After that, in order to ease understanding, the description is extended to the three component case illustrated in
As will become clear from the description brought forward below, the embodiments outlined below take advantage of, for example, the residual characterization of the chroma components, especially in cases where absolute chroma components, or planes, serve as the input. In case of the three components of
With respect to
Please note that the term block shall be understood as describing a generic rectangular shape in the following, i.e. a block can have a rectangular shape.
In the embodiments described next, the encoder decides on the application of the inter-plane/component prediction for each prediction block 308 or transform block 304.
As an intermediate note, however, it is submitted herewith that embodiments of the present invention are not restricted to the case outlined with respect to
For each residual block 312 (or rectangular shape), in accordance with an embodiment, a syntax element is transmitted in the data stream 104, and the syntax element denotes whether or not the inter-plane/component prediction by way of predictors 140, 144, 240, 244 should be used. In the case of a video compression scheme like H.265/HEVC, also illustrated in
Differently speaking, reference is made to
At all of these occasions at
As is shown in
The other input signal 404 of reconstruction module 400 is denoted “y” and represents the transmitted residual signal of the portion of the second component, currently being the subject of inter-plane/component redundancy reduction by module 400, in the same domain as signal 402, i.e. spectral or spatial domain. The reconstruction module 400 reconstructs a second component signal 406, denoted “z” in
Just as the other modules of encoder and decoder, the reconstruction module 400 operates on a block-basis. The operation on a block-basis may, for example, manifest itself in a block-wise adaptation of the inter-component redundancy reduction reversal performed by reconstruction module 400. The “block wise adaptation” may involve, optionally, the explicit signaling of prediction parameters 146/148 within the data stream. A backward adaptive setting of the parameters for controlling the inter-component redundancy reduction is feasible, however, as well. That is, referring to
As outlined in more detail below, a reconstruction module 400 may, for example, operate in such a manner that z is representable as φ(αx+β+γy). α, β and γ are possible inter-component prediction parameters. For each of α, β and γ, it may hold true that same is a constant, thus is neither backward nor forward adaptively varied, is backward adaptively varied and accordingly does not form part of the data stream, or is forward adaptively varied and is signaled in the data stream.
In particular, for a given block such as block 440, it may be signaled within the data stream, by way of a syntax element, whether or not inter-component prediction is to be performed. The parameters α, β and γ, in case of inter-component prediction being switched on, merely represent possible examples. For a block 440 for which inter-component prediction is applied, a prediction mode may be signaled in the data stream, a prediction source may be signaled in the data stream, a prediction domain may be signaled in the data stream and parameters related to the aforementioned parameters may be signaled in the data stream. The meaning of “prediction mode”, “prediction source”, “prediction domain” and “related parameters” will become clear from the description brought forward below. In the examples described so far, the inter-component prediction operates on residual signals. That is, x and y were prediction residuals as transmitted within the data stream and both represent prediction residuals of a hybrid prediction. As also described above, x and y may be prediction residuals in the spatial domain or in the frequency domain in the exemplary case of using transform coding as outlined above. Applying the prediction in the stage of an encoder or a decoder has several benefits. First of all, additional memory is usually unnecessary, and second, the inter-component prediction can be performed locally, i.e. without the introduction of additional intermediate steps after the parsing process from the decoder point of view. In order to distinguish the prediction domain, a further syntax element might be transmitted in the bit stream. That is, the latter further syntax element may indicate whether the inter-component prediction domain is the spatial domain or the spectral domain. In the first case, x, y and z are in the spatial domain, and in the latter case x, y and z are in the spectral domain.
Please note that from the decoder perspective, the residual is reconstructed from the bit stream and can be different from those that have been generated in the encoder before the quantization step. However, it is advantageous to use the already quantized and reconstructed residual as a prediction source in an encoder implementing the embodiment of the present application. Furthermore, in the case of skipping the transform stage, the inter-component prediction in the spatial and frequency domains is exactly the same. For such a configuration, the signaling of the prediction domain, i.e. spatial or frequency domain, can be skipped.
As far as the aforementioned “prediction modes” are concerned, same can be affine, linear, non-linear or more complex. In the first case, the predictor might be written as already described above, namely as z=φ(αx+β+γy) where z is the reconstructed residual signal or sample, x is the vector containing samples from the prediction source signal, α, β and γ are model parameters, and y samples from the current signal, i.e. the past residual signal from the decoder side, and φ can be some linear or non-linear function.
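The affine model z=φ(αx+β+γy) can be illustrated with a minimal Python sketch. The function name `reconstruct`, the default parameter values and the choice of a floor operation for φ are assumptions made for illustration only; the description leaves φ open as some linear or non-linear function.

```python
import math

def reconstruct(x, y, alpha=1.0, beta=0.0, gamma=1.0, phi=math.floor):
    """Return z = phi(alpha*x + beta + gamma*y) for each co-located sample pair.

    x: reconstructed residual samples of the prediction source (first component)
    y: transmitted correction samples of the second component
    phi: assumed here to be a floor operation (illustrative choice)
    """
    return [phi(alpha * xi + beta + gamma * yi) for xi, yi in zip(x, y)]

x = [4, -3, 0, 7]   # e.g. reconstructed luma residual samples
y = [1, 2, -1, 0]   # e.g. transmitted chroma correction samples
z = reconstruct(x, y, alpha=0.5)
```

With α=0, the sketch degenerates to z=φ(β+γy), which corresponds to the case where no information from the first component enters the reconstruction.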
To keep the processing chain as simple as possible, an example configuration might keep the processing of the first component, such as the luma component, unchanged and use the luma reconstructed residual signal as a predictor for the component residual signals. This is a possible prediction source configuration and note that such a simple prediction source simplifies the generic transformation approach, where all three components or planes of the input signal may be used in order to generate the transform samples.
Another possible configuration is to make the prediction source adaptive, i.e. signaling which residual signal of all already available, or respectively reconstructed residual components, are used for prediction. Alternatively, the processing order can be changed locally, e.g. the second component is reconstructed first and is used as prediction source for the remaining components. Such a configuration takes the advantage from the fact that the delta between two components using a bijective (or nearly invertible) predictor is the same, but with inverted sign, however, the absolute cost for coding the prediction source is different. Furthermore, the combination of several prediction sources is possible. In this case, the combination weights might be transmitted in the bit stream or estimated backward-driven using the statistics of the available or respective coded neighbors.
The specification of the model parameters can be performed backward-driven, composed of backward-driven estimation and forward signalization, or completely forward signalized in the data stream. An example configuration is to use a fixed set of model parameters, known both at the encoder and decoder, and to signal the set index to the decoder for each block or shape. Another configuration is to use a dynamic set or a list where the order of predictors is changed after a specific number of blocks or shapes. Such an approach enables higher local adaptation to the source signal. A more detailed example on prediction mode, prediction source, prediction domain and parameter signaling is brought forward below.
As to the prediction mode, the following may be noted.
The prediction mode can be affine, non-linear, or a more complex function realized by approaches like splines or support vector regression.
Note that color space transformations are mostly linear, employing all available input components. That is, color space transformations tend to map a three-component vector of three color components onto another vector of another three components of another color space. Encoder and decoder may, in accordance with an embodiment of the present application, operate independently from the input color space, and hence the luma component might be kept unchanged in order to form a prediction source. However, in accordance with an alternative embodiment, the definition of the “luma component” or “first component” can be different for each block, such as prediction or transformation block (or rectangular shape), i.e. for each block or shape, the component which serves as the “first component” might be selected adaptively. The adaptation may be indicated to the decoder by a signaling in the data stream. For example, while
If prediction is enabled for a certain block 440, additionally or alternatively the prediction mode may be signaled within the data stream for that block 440. Please note that the prediction might be skipped for the case of a zero-valued reconstructed first component signal, i.e. a zero-valued residual luma signal in case of using a prediction residual as a basis for the inter-component prediction. In that case, the aforementioned syntax element signaling whether or not the inter-component prediction is applied, could be omitted, i.e. not present, in the data stream for the respective block 440.
In the case of a combined backward-driven and forward signalization approach, parameters derived from already coded, respectively reconstructed, data can serve as starting parameters. In such a case, a delta relative to the selected prediction mode can be transmitted in the data stream. This could be achieved by calculating the optimal parameters for a fixed or adapted prediction mode and transmitting the calculated parameters in the bit stream. Another possibility is to transmit some delta relative to a starting parameter derived by using a backward-driven selection approach, or to use the parameters calculated and selected by the backward-driven approach only.
An example configuration of prediction modes is described in the following. In this example, the first prediction mode implies α=1, β=0, γ=1 and φ(x)=⌊x⌋ and the second prediction mode implies the same parameters except for α=0.5 with x=(x0), y=(y0) and the single elements of x and y are the residual value for the block at the same spatial position in the first and second components, such as in the luma component and the respective position at the chroma component. Differently speaking, in accordance with an embodiment of the present application, for each block 440, for which the inter-component prediction is applied, it is signaled in the data stream whether α equals a first value, namely 1, or a second value, here 0.5. Instead of a signaling in the data stream, a mixed backward and forward driven approach may be used as already described above, or the selection among the available values of α may be performed backward adaptively. β and γ would merely represent constants.
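The two-mode configuration can be sketched in integer arithmetic as follows. This is a minimal illustration, assuming β=0, γ=1, that φ acts as a floor operation, and that the per-block choice of α is available as a boolean; the function names and the flag `half_weight` are hypothetical. An arithmetic right shift by one is used for α=0.5, which floors the halved value also for negative samples.

```python
def reconstruct_sample(x0, y0, half_weight):
    """Decoder side: z0 = x0 + y0 (alpha = 1) or (x0 >> 1) + y0 (alpha = 0.5)."""
    scaled = x0 >> 1 if half_weight else x0   # right shift realises the halving
    return scaled + y0

def correction_sample(c0, x0, half_weight):
    """Encoder side: correction y0 that remains after subtracting the prediction
    from the second-component residual sample c0 (before any quantization)."""
    scaled = x0 >> 1 if half_weight else x0
    return c0 - scaled

# Round trip without quantization: the second-component residual is recovered.
c0, x0 = 5, 7
y0 = correction_sample(c0, x0, half_weight=True)
z0 = reconstruct_sample(x0, y0, half_weight=True)
```

In this sketch, the encoder and decoder apply the same shift, so the reconstruction is exact as long as the correction sample itself is transmitted without loss.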
Note that the single element in y is replaced after the prediction by z. In other words, the reconstruction module 400 receives the correction signal 404 and replaces same by z 406. Also note that when using such a configuration, the inter-component prediction is simplified to an addition operation: either the fully reconstructed residual sample value of the first component (α=1) or half its value (α=0.5) is added to a correction sample value 404. The halving may be generated by a simple right shift operation. In that case, for example, the configuration could be implemented by realizing the multiplication between x and α in the predictor within reconstruction module 400, and realizing the addition in the adder shown in
The reconstructed residual sample at the same (spatial or frequency) location in the luma plane is subtracted from the residual signal of the chroma component for a transform block (or rectangular shape). After transform and quantization, the reconstructed residual is added to the reconstructed residual sample at the same spatial location in the luma plane. From the decoder perspective, only the latter operation may be used. An example configuration of an approach involving a backward-driven scheme can be done as follows. For each block or shape 440, an optimum parameter α is calculated. A starting α0 is derived backward-driven from the local neighborhood, e.g., from the above and the left block (or rectangular shape) that have been coded before. If a syntax element in the bit stream signals that no further parameters are transmitted in the bit stream, the derived α0 is used. Otherwise, i.e., if the prediction should be used but corrected, a syntax element denotes a delta αΔ in the bit stream. It is also possible to transmit two flags, where the first flag indicates whether α0 should be used and the second flag indicates whether an αΔ exists in the bit stream.
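The two-flag variant of the backward-driven scheme might be sketched as below. The averaging of the above and left neighbours to obtain α0, the default value used when no neighbour is available, and all function and parameter names are assumptions for illustration; the description does not fix how α0 is derived from the neighbourhood.

```python
def derive_alpha0(alpha_above, alpha_left, default=0.0):
    """Backward-driven starting value from already coded neighbours.
    Averaging the available neighbours is an illustrative assumption."""
    available = [a for a in (alpha_above, alpha_left) if a is not None]
    return sum(available) / len(available) if available else default

def decode_alpha(use_alpha0_flag, delta_present_flag, alpha_delta,
                 alpha_above=None, alpha_left=None):
    """Combine the backward-derived alpha0 with an optional transmitted delta."""
    alpha = derive_alpha0(alpha_above, alpha_left) if use_alpha0_flag else 0.0
    if delta_present_flag:
        alpha += alpha_delta   # forward-signalled correction of the starting value
    return alpha

alpha = decode_alpha(True, True, 0.25, alpha_above=0.5, alpha_left=None)
```

With both flags cleared, the sketch yields α=0; with only the second flag set, the transmitted value acts as a fully forward-signalled parameter.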
Another possible configuration is to increase the granularity for the parameter derivation process. This approach also implies higher signalization granularity when forward transmission is applied. Otherwise, a higher granularity for the backward-driven scheme is implied. In this possible configuration, the prediction parameters are derived for each sample or for a group of samples within a transform block (or rectangular shape) or even for a prediction block itself. Please note that a special case is given by coupling the signaling level to transform blocks (or rectangular shapes). For the current sample or group of samples, a specific amount of samples within the same transform block or prediction block, defined by a predefined window, is taken for the parameter derivation. Using the above example where an affine predictor is employed, the parameter αn can be derived from the previously reconstructed sample or group of samples as follows, where n is the group index.
$$z_n = \varphi(\alpha_{n-1} x + \beta_{n-1} + \gamma_{n-1} y)$$
The parameters for the first sample or group of samples can be initialized by some default values or calculated from neighboring blocks or shapes. Another possibility is to transmit the optimum parameters for the first sample or group of samples. In order to use as many previous samples as possible, a predefined scan pattern can be used to map the 2-dimensional residual block (or rectangular shape) to a 1-dimensional vector. For example, the samples or the groups of samples can be scanned vertically, horizontally, or directionally, similar to the scan directions of the transform coefficients. Again, the specific scan can be derived by a backward-driven scheme or it is signaled in the bit stream.
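The mapping of a 2-dimensional residual block onto a 1-dimensional vector and its split into groups of samples can be sketched as follows. Only horizontal and vertical scans are shown; the function names, the string-valued `direction` argument and the fixed group size are hypothetical choices for illustration.

```python
def scan(block, direction="horizontal"):
    """Map a 2-D block (list of rows) to a 1-D list of samples."""
    if direction == "horizontal":
        return [s for row in block for s in row]            # row by row
    if direction == "vertical":
        width = len(block[0])
        return [row[c] for c in range(width) for row in block]  # column by column
    raise ValueError("unsupported scan direction: " + direction)

def groups(samples, size):
    """Split the scanned samples into consecutive groups of at most `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]

block = [[1, 2],
         [3, 4]]
g = groups(scan(block, "vertical"), 2)
```

Group n would then be reconstructed with the parameters derived from group n-1, with the first group using default, neighbour-derived or transmitted parameters.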
Another extension is the combination of the already presented example and a transformation using all three available components as input. Here, in this example, the residual signal is transformed using a transformation matrix that removes the correlation for the given transform block. This configuration is useful when the correlation between the planes is very large or extremely small. An example configuration would use Y′CoCg transformation in the case of input color spaces like R′G′B′ or a principal component analysis approach. For the latter case, the transform matrix has to be signaled to the decoder in a forward manner or using a predefined set and rule known at encoder as well as decoder to derive the matrix values. Please note that this configuration may use the residual signal of all available components or planes.
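As an illustration of a transformation that takes all three components as input, the sketch below uses the reversible lifting variant commonly called YCoCg-R. This particular integer variant is swapped in here for illustration and is not stated in the description, which only names a Y′CoCg transformation; the function names are hypothetical.

```python
def forward_ycocg_r(r, g, b):
    """Reversible lifting-based RGB -> (Y, Co, Cg) on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def inverse_ycocg_r(y, co, cg):
    """Exact inverse of forward_ycocg_r."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

triple = forward_ycocg_r(10, 20, 30)
restored = inverse_ycocg_r(*triple)
```

Because each lifting step is undone exactly by its counterpart, the round trip is lossless for integer inputs, including negative residual values.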
As to the prediction domain, the following is noted.
The prediction domain can be, as described above, the spatial domain, i.e., operating on the residual, or frequency domain, i.e., operating on the residual after applying a transformation like DCT or DST. Furthermore, the domain can be a composition of both by transmitting the information to the decoder. In addition to the domain for the prediction, parameters related to the domain can be transmitted or derived by a backward-driven scheme.
Related to the domain parameters is an additional sub sampling of the chroma component, i.e., a chroma block (or rectangular shape) is scaled down horizontally, vertically, or both. In such a case, the prediction source might be down sampled as well, or a set of prediction modes has to be selected that considers the different resolution in the spatial domain, frequency domain, or both. Another possibility is to upscale the prediction target so that the dimensions of the prediction source and the prediction target match each other. The downscaling further improves the compression efficiency, especially for very flat areas in the image or video. For example, the prediction source contains both low and high frequencies but the chroma block (or rectangular shape) contains low frequencies only. In this example, the sub sampling of the prediction source would remove the high frequencies, a less complex prediction mode can be employed and, connected to that, less information has to be transmitted to the decoder. Please note that the signaling of a downscaling usage might be done for each transform block, or for each prediction block, or even for a group of prediction blocks or for the whole image.
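Down sampling the prediction source so that it matches a chroma block of half the width and height can be sketched as follows. The 2x2 averaging filter with rounding, the restriction to even block dimensions and the function name are assumptions for illustration; the description does not prescribe a particular filter.

```python
def downsample_2x2(block):
    """Average each 2x2 neighbourhood (with rounding) of a block given as a
    list of rows. Assumes even width and height."""
    height, width = len(block), len(block[0])
    return [
        [(block[r][c] + block[r][c + 1]
          + block[r + 1][c] + block[r + 1][c + 1] + 2) >> 2
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]

luma_residual = [[1, 3, 5, 7],
                 [1, 3, 5, 7]]
source = downsample_2x2(luma_residual)
```

The averaging acts as a simple low-pass filter, which fits the observation that the sub sampled source loses its high frequencies.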
In addition to the down sampling approach, a bit depth adjustment might be transmitted to the decoder too. This case occurs when the precision of the samples differs among the different components. One possible way is to reduce or increase the number of bits for the source. Another possible configuration might increase or decrease the bit depth of the target and correct the final result back to the correct bit depth. A further option is the usage of a set of predictors suitable for the different bit depths. Such a prediction mode would consider the different bit depths by way of the prediction parameters. The signaling level for the bit depth correction can be done for each block or shape, or for the whole picture or sequence, depending on the variation of the content.
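The first option, adjusting the number of bits of the source, can be sketched with shifts. The function name, the rounding offset used when reducing precision and the example bit depths are assumptions for illustration.

```python
def align_bit_depth(sample, src_bits, dst_bits):
    """Rescale an integer sample from src_bits to dst_bits of precision."""
    if dst_bits >= src_bits:
        return sample << (dst_bits - src_bits)      # increase precision
    shift = src_bits - dst_bits
    return (sample + (1 << (shift - 1))) >> shift   # decrease precision, rounded

# 10-bit luma residual used as source for an 8-bit chroma residual
source_8bit = align_bit_depth(20, 10, 8)
```

The same helper could be applied in the opposite direction to raise the target to the source precision before prediction and to bring the final result back afterwards.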
As to the prediction source, the following is noted.
The prediction source can be the first component or all available components. For each block (or rectangular shape), the prediction source might be signaled by the encoder. Alternatively, the prediction source can be a dictionary containing all possible blocks, whether prediction or transform blocks, from all available components, and the prediction source that should be used for the prediction is signaled to the decoder by a syntax element in the bit stream.
As to the parameter derivation, the following is noted.
The parameters may, for example, be derived from the causal neighbourhood of the block by solving a least square error (LSE) optimization problem, either in the decoder, so as to perform the elsewhere mentioned backward adaptivity, or in the encoder, so as to signal the result in the data stream and thereby drive the decoder in a forward-adaptive manner. The LSE can be formulated as:
This equation is minimized with respect to α. The parameter has a closed-form solution calculated as:
An integer implementation of this method replaces the division operation by a lookup table and a multiplication operation. A2 is descaled in order to reduce the table size while A1 is descaled in order to avoid multiplication overflow. Only the most significant bits nA
$$A' = [A \gg r_A] \ll r_A$$
where
$$r_A = \max(\mathrm{bd}(A) - n_A, 0)$$
and bd(A) is the bit depth of the value of A, calculated by log2 A.
Now Aα′ can be recalculated as
Now the division can be represented by a lookup table
whose elements are represented by ntable bits. The index for this table is calculated as [A2»rA] and the table size is nA
Please note that in the above minimization problem, y represents the residual signal which, for the respective inter-component predicted block 440, is losslessly transmitted within the data stream. In other words, y is the correction signal for the inter-component predicted block 440. Same may be determined iteratively with, in each iteration, performing the solving of the above outlined LSE optimization problem. In this manner, the encoder may optimally decide on performing or not performing inter-component prediction, in choosing the inter-component prediction parameters such as prediction mode, prediction source and so forth.
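A least-squares fit of this kind can be sketched for a simplified affine model z ≈ αx+β. The model, the use of ordinary least squares over a set of neighbouring sample pairs, and the names `fit_affine`, `x` and `z` are assumptions for illustration and are not taken from the formulas of the description.

```python
def fit_affine(x, z):
    """Ordinary least squares for z ~ alpha*x + beta over paired samples.
    Returns (alpha, beta); falls back to alpha = 0 if x has no variation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    s_xz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    alpha = s_xz / s_xx if s_xx else 0.0
    beta = mean_z - alpha * mean_x
    return alpha, beta

# Samples from an already reconstructed neighbourhood (illustrative values)
alpha, beta = fit_affine([1, 2, 3], [3, 5, 7])
```

Because the fit only uses already reconstructed samples, the decoder could repeat it without any side information, while the encoder could use the same fit to decide which parameters to signal.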
As to the parameter signaling, the following is noted.
The prediction itself might be switchable and a header flag specifying the usage of the residual prediction should be transmitted at the beginning of the bit stream. When prediction is allowed, a syntax element specifying its local usage is embedded in the bit stream for a block 440, which as described before may be a residual block (or rectangular shape), a transform block (or rectangular shape), or even a group of transform blocks (or rectangular shapes). The first bit might denote whether the prediction is enabled or not and the following bits might denote the prediction mode, the prediction source, or the prediction domain and the related parameters. Please note that one syntax element can be used to enable or disable the prediction for both chroma components. It is also possible to signal the usage of the prediction as well as prediction mode and source for each second (e.g. chroma) component separately. Again, in order to achieve a high adaptation to the residual signal, a sorting criterion might be used to signal the prediction mode. For example, if the most used prediction mode is mode one, then a sorted list would contain mode one at index zero. Only one bit may then be used for signaling the most probable mode one. Furthermore, the usage of the prediction might be restricted. In the case of applying the inter-component prediction on residuals, the correlation might be high if the same prediction mode for generating the residual is used among the different color components. Such a restriction is useful in intra predicted blocks (or rectangular shapes). For example, this inter-component prediction can be applied only if the same intra prediction mode as used for block (or rectangular shape) 442, which may be luma, is used for the block (or rectangular shape) 440, which may be chroma.
That is, in the latter case, the block-wise adaptation of the inter-component prediction process would involve checking whether block 440 is associated with a spatial prediction mode as far as the prediction by predictor 2222 is concerned, and whether the spatial prediction mode coincides with, or does not deviate by more than a predetermined amount from, the spatial mode using which the co-located block 442 is predicted by predictor 2221. The spatial prediction mode may, for example, comprise a spatial prediction direction along which already reconstructed samples neighbouring block 440 and 442, respectively, are extrapolated into block 440 and 442, respectively, in order to result in the respective prediction signal 2202 and 2201, respectively, which is then combined with z and x, respectively.
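The check described above might look as follows in a decoder. Representing spatial prediction modes as integer direction indices, using `None` for blocks that are not spatially predicted, and the names used are assumptions for illustration.

```python
def inter_component_allowed(luma_mode, chroma_mode, max_deviation=0):
    """Allow inter-component prediction only if both co-located blocks are
    spatially predicted and their direction indices differ by at most
    max_deviation (0 means the modes have to coincide)."""
    if luma_mode is None or chroma_mode is None:
        return False   # at least one block uses no spatial prediction mode
    return abs(luma_mode - chroma_mode) <= max_deviation

allowed = inter_component_allowed(luma_mode=26, chroma_mode=26)
```

When the check fails, the syntax elements related to inter-component prediction would simply not be parsed for that block in this sketch.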
In an embodiment, the prediction source is the luma component. In this advantageous embodiment, which is depicted in
In another embodiment, the prediction source is transmitted for a block or shape 440, such as a residual block, a group of residual blocks or shapes, e.g., for the size where the intra or inter prediction is applied.
For example, the prediction source is luma for the first chroma component and luma or the first chroma component for the second chroma component. This advantageous embodiment is similar to a configuration allowing all available planes as prediction source and corresponds to
To illustrate the aforementioned embodiments, see
It is noted that with respect to
A further embodiment, which was just outlined above, is the possibility that all components of picture 110/210 may, alternately, serve as a prediction source. See
With respect to
In a further embodiment, the prediction source can be all available components or a subset of the available components. In this advantageous embodiment, a weighting of the source may be signaled to the decoder.
In an advantageous embodiment, the prediction domain lies in the spatial domain. In this embodiment, the residual for a whole residual block or shape might be used or only a specific part of the residual block or shape, depending on the signaling configuration. The latter case is given when the prediction is signaled individually for each transform block or shape and a further subdivision of residual blocks or shapes into smaller transform blocks or shapes is allowed.
In a further embodiment, the prediction domain lies in the frequency domain. In this advantageous embodiment, the prediction is coupled to the transform block or shape size.
In a further embodiment, the prediction domain can be in the spatial or frequency domains. In this embodiment, the prediction domain is specified separately, either by forward signalization or backward-driven estimation depending on the local statistics.
The latter circumstance is illustrated in
Again, the syntax element 490 may signal the domain to be used for inter-component prediction for block 440 individually, for groups of blocks 440 or for the whole picture or for an even larger scope, such as a group of pictures.
In a further embodiment, both prediction domains are involved in the prediction process. In this advantageous embodiment of the invention, a prediction is made first in the spatial domain, and a further prediction is applied in the frequency domain with both predictions using different prediction modes and sources.
In an embodiment, the chroma block or shape can be subsampled horizontally, vertically, or both by some factors. In this embodiment, the down scale factor can be equal to a power of two. The usage of the down sampler is transmitted as a syntax element in the bit stream and the down sampler is fixed.
In a further embodiment, the chroma block or shape can be subsampled horizontally, vertically, or both by some factors. The factor can be transmitted in the bit stream and the down sampler is selected from a set of filters, where the exact filter can be addressed by an index transmitted in the bit stream.
In a further embodiment, the selected up sampling filter is transmitted in the bit stream. In this embodiment, the chroma blocks might be originally sub sampled, and hence, in order to use the prediction with a matching block or rectangle size, the up sampling has to be done before prediction.
In a further embodiment, the selected down sampling filter is transmitted in the bit stream. In this embodiment, the luma is down sampled in order to achieve the same block or rectangle size.
In an embodiment, a syntax element is signaled denoting the bit depth correction when the bit depths of the source and the target are different. In this embodiment, the luma precision can be decreased or the chroma precision can be increased in order to have the same bit depth for the prediction. In the latter case, the chroma precision is decreased back to the original bit depth.
In an embodiment, the number of prediction modes is two and the set of predictors is defined exactly as in the given example.
In a further embodiment, the number of prediction modes is one and the configuration is the same as described in the previous embodiment.
In a further embodiment, the number of predictors is freely adjustable, with the set of predictors being defined exactly as in the given example. This embodiment is the more generic description of the example with α=1/m where m>0 denotes the prediction number or mode. Hence, m=0 denotes that the prediction should be skipped.
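The generic predictor set with α=1/m can be sketched as a simple mapping from the mode number to the weight. The sketch assumes that m=0 is the value that signals skipping the prediction, which is realised here as α=0; the function name is hypothetical.

```python
def alpha_for_mode(m):
    """Map prediction number m to the weight alpha = 1/m.
    m = 0 is assumed to mean that the prediction is skipped (alpha = 0)."""
    if m < 0:
        raise ValueError("prediction number must be non-negative")
    return 0.0 if m == 0 else 1.0 / m

weights = [alpha_for_mode(m) for m in range(0, 3)]
```

The two-mode example with α=1 and α=0.5 corresponds to m=1 and m=2 in this mapping.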
In a further embodiment, the prediction mode is fixed, i.e., the prediction is enabled. For this embodiment, one might enable the adaptive inter-plane prediction and set the number of predictors equal to zero.
In a further embodiment, the prediction is applied and the prediction parameters like α are derived from the neighbouring blocks or shapes. In this embodiment, the optimum α for a block or shape is calculated after the full reconstruction. The calculated α acts as parameter for the next block or shape in the local neighbourhood.
In a further embodiment, a syntax element is transmitted in the bit stream indicating the usage of the parameters derived from the local neighbourhood.
In a further embodiment, the parameters derived from the neighbourhood are used. In addition to that, a delta relative to the optimum parameters calculated in the encoder might be transmitted in the bit stream.
In a further embodiment, the backward-driven selection scheme for the parameters is disabled and the optimum parameters are transmitted in the bit stream.
In a further embodiment, the usage of the starting α as well as the existence of a delta α in the bit stream is signaled separately.
In an embodiment, the signaling of prediction mode, prediction source, and prediction parameters are restricted to the same regular prediction mode. In this embodiment, the information related to the inter-plane prediction is transmitted only when the intra prediction mode for the chroma component is the same as used for the luma component.
In a further embodiment, the block is partitioned into windows of different sizes and the parameters for the current window are derived from the reconstructed previous window within the block. In a further embodiment, parameters for the first window are derived from reconstructed neighbouring blocks.
In a further embodiment, a syntax element is transmitted in the bit stream indicating the usage of the parameters derived from the local neighbourhood to be used for the first window.
In a further embodiment, the windows can be scanned in a vertical or horizontal direction.
In a further embodiment, the parameters for the current window are derived from the previous window where the previous window is decided according to the scan position of the transform coefficients sub blocks.
In a further embodiment, the window scanning is limited to one scanning direction.
In a further embodiment, the parameters are derived using an integer implementation by using a lookup table and a multiplication operation instead of a division.
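The division-free derivation mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the "optimum" α is the least-squares slope between co-located reference samples x and dependent samples y, and that α is returned in units of 1/8. The table size, the fixed-point shift and all function names are hypothetical choices, not taken from the description.

```python
# Hypothetical sketch: least-squares slope without a run-time division.
SHIFT = 16        # fixed-point precision of the reciprocal table (assumed)
TABLE_BITS = 6    # denominators are normalised to fewer than 64 (assumed)

# RECIP[d] approximates 2**SHIFT / d; built once, so no division at run time.
RECIP = [0] + [((1 << SHIFT) + d // 2) // d for d in range(1, 1 << TABLE_BITS)]

def derive_alpha_q3(x, y):
    """Return an estimate of alpha * 8 for y ~ alpha * x using table + multiply."""
    num = sum(a * b for a, b in zip(x, y))   # cross-correlation term
    den = sum(a * a for a in x)              # energy of the reference signal
    if den == 0:
        return 0                             # nothing to predict from
    # Scale numerator and denominator down together so den fits the table.
    norm = max(0, den.bit_length() - TABLE_BITS)
    num_n = num >> norm
    den_n = den >> norm
    # Multiply by the tabulated reciprocal and round to 3 fractional bits.
    return (num_n * RECIP[den_n] + (1 << (SHIFT - 4))) >> (SHIFT - 3)
```

The result would still have to be mapped to the nearest representable α value; that step is omitted here.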
In an embodiment, a global flag transmitted in the header of the bit stream denotes the usage of the adaptive inter-plane prediction. In this embodiment, the flag is embedded in the sequence level.
In a further embodiment, a global flag is transmitted in the header of the bit stream, with the embedment in the picture parameter level.
In a further embodiment, the number of predictors is transmitted in the header of the bit stream. In this embodiment, the number zero denotes that prediction is enabled, whereas a number unequal to zero denotes that the prediction mode is selected adaptively.
In an embodiment, the set of prediction modes is derived from the number of prediction modes.
In a further embodiment, a set of prediction modes is known at the decoder and the decoder specifies all model parameters of the prediction.
In a further embodiment, the prediction modes are all linear or affine.
In an embodiment, the set of predictors is hybrid, i.e., it contains simple prediction modes using other planes as prediction source as well as more complex prediction modes using all available planes and transforming the input residual signals into another component or plane space.
In an embodiment, the usage of prediction is specified for each transform block or shape for each chroma component. In this embodiment, this information may be skipped when the luma component consists of zero-valued residual at the same spatial location.
In an embodiment, the modes are transmitted using truncated unary decomposition. In this embodiment, different context models are assigned for each bin index, however, limited to a specific number, e.g., to three. Furthermore, the same context models are used for both chroma components.
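As a minimal sketch of this signaling, the following shows a truncated unary binarization together with a context selection per bin index that is capped at three context models. Reading "limited to a specific number" as a cap on the number of context models is an assumption, and the function names are hypothetical.

```python
def tu_binarize(value, c_max):
    """Truncated unary: 'value' ones, then a terminating zero unless value == c_max."""
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins

def context_index(bin_idx, max_contexts=3):
    """One context per bin index, with all later bins sharing the last context (assumed)."""
    return min(bin_idx, max_contexts - 1)
```

With c_max equal to 4, the mode value 2 would be written as the bin string 1, 1, 0, and bins at index 2 and beyond would share one context model.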
In a further embodiment, different chroma planes use different context model sets.
In a further embodiment, different transform block or shape sizes use different context model sets.
In a further embodiment, the mapping of bins to prediction mode is dynamic or adaptive. In this embodiment, from decoder perspective, a decoded mode equal to zero denotes the most used mode up to the decoded time.
In a further embodiment, the prediction mode, and the prediction source if using a configuration allowing different prediction sources, is transmitted for a residual block or shape. In this embodiment, different block or shape sizes may use different context models.
The embodiment described next especially concerns an example of how to code the prediction parameters for the “Cross Component Decorrelation” described so far.
Although not restricted thereto, the following description may be thought of as referring to the alternative where the dependent (second) component is reconstructed based on the reference (first) component signal x and the residual (correction) signal y via computing z=αx+y, with z being used as the prediction of the dependent component signal. The prediction may be applied in the spatial domain, for example. As in the examples above, the inter-component prediction may be applied to a hybrid-coding residual signal, i.e. first and second component signal may represent a hybrid coding's residual signal. The following embodiment concentrates, however, on the signaling of α: this parameter is coded on a sub-picture basis such as, for example, in units of residual blocks into which the multi-component picture is sub-divided. The following is about the fact that the signal-able states of α should advantageously be variable, too, in order to account for the fact that the range of optimal values for α depends on the sort of picture content which, in turn, varies in scope/units larger than residual blocks. Thus, in principle, the details set out below with respect to the transmission of α may be transferred to the other embodiments outlined above as well.
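The relation z=αx+y can be written out sample-wise as follows. This is only an illustrative sketch in floating point for one block; the function name is hypothetical and an actual codec would use the integer arithmetic described further below.

```python
def reconstruct_second_component(x, y, alpha):
    """Sample-wise z = alpha * x + y for co-located samples of one block."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]
```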
The cross-component decorrelation (CCD) approach utilizes the remaining dependency between different color components enabling higher compression efficiency. An affine model can be employed for such an approach and the model parameters are transmitted in the bit stream as side information.
In order to minimize the side information cost, only a limited set of possible parameters is transmitted. For example, a possible CCD implementation in High Efficiency Video Coding (HEVC) could use a linear prediction model instead of an affine prediction model and the only model parameter, i.e., the slope or gradient parameter α, could be limited to the range from 0 to 1 and be quantized non-uniformly. Particularly, the limited set of values for α could be α ∈ {0, ±0.125, ±0.25, ±0.5, ±1}.
The selection of such a quantization for the linear prediction model parameter could be based on the fact that the distribution of α is symmetrically concentrated around the value 0 for natural video content stored in the Y′CbCr color space. In Y′CbCr, the color components are decorrelated by using a fixed transformation matrix to convert from R′G′B′ before entering the compression stage. Due to the fact that a global transformation is often sub-optimal, the CCD approach can achieve higher compression efficiency by removing the remaining dependency between the different color components.
However, such an assumption does not hold true for different kinds of content, especially for natural video content stored in the R′G′B′ color space domain. In this case, the gradient parameter α is often concentrated around the value 1.
Similar to the case given above, the distribution becomes completely different when CCD is extended to the first chroma component as the prediction source. Hence, it may be beneficial to adjust the CCD parameters according to the given content.
For example, for Y′CbCr, the quantization of α can be set to (0, ±0.125, ±0.25, ±0.5, ±1) while for R′G′B′ the quantization of α could be inverted, i.e. (0, ±1, ±0.5, ±0.25, ±0.125). However, allowing different entropy coding paths introduces additional issues. One problem is that the implementation becomes more expensive in terms of area and speed for both hardware and software. In order to avoid this drawback, the parameter range can be specified in the picture parameter set (PPS) level where the usage of CCD is also indicated.
That is, a syntax element signaling α is transmitted at a sub-picture level/granularity, e.g. individually for a residual block. It could be called res_scale_value, for example. It could be, for example, coded using a (truncated) unary binarization combined with binary arithmetic coding of the bin string. The mapping of the (non-binarized) values of res_scale_value onto α could be implemented such that the mapping is varied using the PPS, i.e. for the complete picture or even at larger scope such as on a per picture sequence basis. The variation could vary the number of representable α values, the order of representable α values and the selection of representable α values, i.e. their actual values. Merely allowing for switching the order among the representable α values or restricting the representable α values to positive or negative values only is one way to provide for content adaptivity. The embodiments outlined further below, however, allow, at merely slightly increased overhead, for even greater flexibility in varying the mapping from the sub-picture granularly signalled res_scale_value to α values, such as a variation of the size, members and member order of the mapping's co-domain (representable set of α values). It turned out that the bit savings this provision yields for transmitting res_scale_value overcompensate, seen over a typical blend of video contents coded in YCC or RGB, the necessity of signaling the variation of the mapping.
The specification of the range for α can be done, for example, as follows. In the case of Y′CbCr, the advantageous sub-set can be for example (0, ±0.125, ±0.25, ±0.5) while it can be (0, ±0.5, ±1) or (0, 0.5, 1) or even (0, 0.5, 1, 2) for R′G′B′. In order to achieve the mentioned behavior, the range can be specified in the PPS using two syntax elements representing two values. Given the above example and the fact that the prediction is performed with 3 point accuracy, i.e. the prediction sample values are multiplied by α and then right-shifted by 3, the range configuration for Y′CbCr can be transmitted as [−3, 3].
However, with such a signaling, only the second and the third case for R′G′B′ can be achieved using [2, 3] and [2, 4]. In order to achieve the first example for R′G′B′, the sign has to be separated using an additional syntax element. Furthermore, it is sufficient to transmit the delta for the second value, with the first value serving as the starting point. For this example, the second R′G′B′ configuration is [2, 1] instead of [2, 3].
In the case of prediction from the first chroma component, the range values can be specified separately for each chroma component. Note that this can be done even without the support for the prediction from the first chroma component.
Given the limit specified in the PPS, the parsing and reconstruction of the prediction parameter α is modified as follows. For the case without limit, i.e., α ∈ {0, ±0.125, ±0.25, ±0.5, ±1} and a 3 point accuracy precision, the final αF is reconstructed as follows, where αP denotes the parsed value from the bit stream: αF = 1 << αP. This is modified according to αF = 1 << (αP + αL), where αL denotes the offset for the smallest absolute value when the range lies completely in the positive or negative range.
Both values are used to derive the limit on the number of bins parsed from the bit stream when αP is binarized using a truncated unary code. Please note that the parsing of the sign might depend on the given range. A better way to utilize this aspect is to encode the sign before the absolute value of α. After coding the sign, the maximum number of bins to be parsed from the bit stream can be derived. This case is useful when the range is asymmetric, e.g., [−1, 3].
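The offset-based reconstruction just described can be sketched as follows. The sketch assumes that the case α=0 is handled separately, that αL equals the smaller of the two transmitted range values, and that the resulting integer is a weight in units of 1/8 (3 point accuracy). The function names are hypothetical.

```python
def alpha_f(alpha_p, alpha_l=0):
    """Integer weight in units of 1/8 from the parsed value and the range offset."""
    return 1 << (alpha_p + alpha_l)

def apply_weight(sample, weight_q3):
    """Multiply by the weight and right-shift by 3 (arithmetic shift in Python)."""
    return (weight_q3 * sample) >> 3
```

For the range [2, 3] with alpha_l equal to 2, the parsed values 0 and 1 would give the weights 4 and 8, i.e. α equal to 0.5 and 1.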
Often, a different order is desired like (0, 1, 0.5), e.g., for some R′G′B′ content. Such an inversion can be simply achieved by setting the range values according to [3, 2]. In this case, the number of bins parsed from the bit stream is still 2 (the absolute difference between the two range values is n=1 and the number of bins in case of truncated unary code is n+1). Then, the inversion can be achieved in two ways. The first option introduces a fixed offset, which is equal to 2 times the current value if no inversion is desired and the maximum representable value in the case of inversion. The second and more elegant way to do this is to expand the range transmitted in the PPS to a memory and access the corresponding value through a look-up operation. This approach leads to a unified logic and a single path for both cases. For example, the case (0, 0.25, 0.5, 1) is signaled in the PPS by transmitting [1, 3] and a memory is created with the following entries (0, 1, 2, 3). Conversely, i.e., in the inverted case with the transmitted values [3, 1] in the bit stream, a memory is created with the following entries (0, 3, 2, 1). Using such an approach, the final αF can be formulated as αF = 1 << (LUT[αP]), where LUT[αP] denotes the look-up operation.
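A minimal sketch of the look-up variant is given below. It expands the two transmitted range values into the memory entries quoted above and reconstructs the weight in units of 1/8. Treating the parsed value 0 as α=0 rather than applying the shift is an assumption made so that the result matches the sets (0, 0.25, 0.5, 1) and (0, 1, 0.5, 0.25). The function names are hypothetical.

```python
def build_alpha_lut(first, second):
    """Expand the PPS range [first, second] into a table; entry 0 is reserved for alpha = 0."""
    step = 1 if second >= first else -1
    return [0] + list(range(first, second + step, step))

def alpha_weight(alpha_p, lut):
    """Weight in units of 1/8: zero for index 0 (assumed), otherwise 1 << LUT[alpha_p]."""
    if alpha_p == 0:
        return 0
    return 1 << lut[alpha_p]
```

Both the regular and the inverted order go through the same code path; only the table contents differ.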
In order to explain the most recently mentioned aspect in more detail, reference is made to
Picture 102/202 may be part of a video 500.
The data stream 104, into which picture 102/202 is coded, comprises a high-level syntax element structure 510 which relates at least to the whole picture 102/202 or even beyond that to a picture sequence out of video 500 which comprises picture 102/202. This is illustrated in
The first-weight syntax element is coded using a truncated unary binarization. An example for such TU binarization is shown in
As described above, it is feasible that the syntax element structure 510 allows the decoder to derive first and second interval bounds 524 therefrom. The corresponding syntax elements in structure 510 may be coded independently/separately from each other, or relative to each other, i.e. differentially. The interval bound values 524 identify elements out of sequence 520, namely using the above-presented exponential function which may be implemented using respective bit shifts. By this measure, the interval bounds 524 indicate to the decoder the co-domain of mapping 522.
As also described above, the case of α being zero may be signaled to the decoder separately using a respective zero flag 526. If the zero flag has a first state, the decoder sets α to be zero and skips reading any first-weight syntax element 514 for the respective block 440. If the zero flag has the other state, then the decoder reads the first-weight syntax element 514 from data stream 104 and determines the actual value of the weight using mapping 522.
Further, as outlined above, merely an absolute part 528 of the first-weight syntax element 514 may be coded using the truncated unary binarization, and a sign part 530 of the first-weight syntax element 514 may be coded previously. In this way, it is feasible for encoder and decoder to appropriately set the length, i.e. the number of bin strings, of binarization 518 for the absolute part 528, as the sign part 530 determines whether the value α of block 440 belongs to those members of the co-domain 532 of mapping 522 out of set 520 which have the positive α values, or to the other part thereof consisting of the negative α values. Naturally, the sign part 530 of the first-weight syntax element 514 may not be present and may not be read by a decoder in case the decoder derives from the high-level syntax element structure 510 that the co-domain 532 merely comprises positive α values or merely negative α values, exclusively.
As also became clear from the above discussion, the interval bounds 524 and the order in which same are coded in the high-level syntax element structure 510 may determine the order in which the absolute part 528 “traverses” the members of co-domain 532.
With respect to
Per prediction block 308, prediction parameters may be signaled within the data stream 104. These prediction parameters are used for hybrid encoding/decoding each component of the picture 102/202. The prediction parameter 602 may be signaled within the data stream 104 for each component individually, for all components commonly or partially component specifically and component-globally. For example, the prediction parameter 602 may distinguish between, inter alia, spatially and/or temporally predicted blocks 308 and, while this indication may be common among the components, the temporal prediction-related parameters among prediction parameter 602 may be signaled within the data stream 104 component-specifically. Using the implementation of
Further, the data stream 104 signals, in accordance with the embodiment of
Per residual/transform block, the data stream 104 may comprise residual data 6081, 6082, 6083 in the form of, for example, quantized transform coefficients. Dequantizing and inverse transforming the residual data 6081 to 6083 reveals a residual signal in the spatial domain for each component, namely 6101, 6102 and 6103.
As illustrated in
Thus, if for example the flag 6142 indicates for the second component that inter-component prediction is to be used, the inter-component prediction parameter 6162 indicates the weight at which the residual signal 6101 is to be added to the residual signal 6102 so as to replace residual signal 6102 by the new residual signal 6102′. The latter is then used instead of residual signal 6102 so as to correct the respective prediction signal 1202/2202.
Likewise, if flag 6143 indicates the usage of inter-component prediction for a respective residual/transform block, then the inter-component prediction parameter 6163 indicates the weight α3 at which the residual signal 6101 is added to the residual signal 6103 so as to replace the latter and result in the new residual signal 6103′, which is then used in order to correct the prediction signal 1203/2203 of the third component.
Instead of separately transmitting a first flag 6141/2 conditionally followed by an inter-component prediction parameter 6162/3, another signaling may also be feasible. For example, as the weight α2/3 may be a signed value, the domain of possible values of which is symmetrically arranged around zero, the absolute value of α2/3 may be used so as to distinguish between the case of not using inter-component prediction as far as component 2/3 is concerned, and using inter-component prediction for the respective component 2/3. In particular, if the absolute value is zero, this corresponds to not using inter-component prediction. The signaling of any sign flag for the respective parameter α2/3 may then be suppressed in the data stream 104. As a reminder, in accordance with the embodiment of
According to a specific syntax example, the inter-component prediction flag 612 may be signaled within a picture parameter set of the data stream 104. The syntax element may be denoted as
It is noted, however, that the scope of flag 612 may be chosen differently. For example, flag 612 may relate to smaller units, such as slices of picture 102/202 or greater units such as groups of pictures or a sequence of pictures.
Per residual/transform block, syntax elements 6142, 6162, 6143 and 6163 may be signaled conditionally as just described using the following syntax with parameter c indicating the component and, accordingly, assuming one value for component 2 and the other value for component 3, and parameters x0 and y0 denoting, for example, the respective residual/transform block by way of, for example the position of its upper left corner sample.
cross_comp_pred( x0, y0, c ) {
    log2_res_scale_abs_plus1[ c ]
    if( log2_res_scale_abs_plus1[ c ] != 0 )
        res_scale_sign_flag[ c ]
}
That is, the above syntax would occur in the data stream 104, for example, for each residual or transform block of the picture twice, one for each of the second and third components, such as the chroma components whereas the luma component would form the base (first) component.
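The conditional structure of this syntax can be sketched as a small parser. The sketch operates on already decoded bins supplied as a list, so the arithmetic decoding stage is left out, and it assumes a truncated unary binarization with a maximum value of 4 for log2_res_scale_abs_plus1, matching the example set {0, ±0.125, ±0.25, ±0.5, ±1}. The function name and the c_max parameter are hypothetical.

```python
def parse_cross_comp_pred(bins, c_max=4):
    """Return (log2_res_scale_abs_plus1, res_scale_sign_flag) for one component."""
    it = iter(bins)
    log2_abs_plus1 = 0
    # Truncated unary: count leading ones, stop at a zero or at c_max.
    while log2_abs_plus1 < c_max and next(it) == 1:
        log2_abs_plus1 += 1
    # The sign flag is only present when the magnitude is non-zero.
    sign_flag = next(it) if log2_abs_plus1 != 0 else 0
    return log2_abs_plus1, sign_flag
```

The function would be called once per chroma component for each residual or transform block in which the syntax is present.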
As previously indicated with respect to
The semantics of the syntax elements presented so far could be provided as follows:
cross_component_prediction_enabled_flag equal to 1 specifies that log2_res_scale_abs_plus1 and res_scale_sign_flag may be present in the transform unit syntax for pictures referring to the PPS. cross_component_prediction_enabled_flag equal to 0 specifies that log2_res_scale_abs_plus1 and res_scale_sign_flag are not present for pictures referring to the PPS. When not present, the value of cross_component_prediction_enabled_flag is inferred to be equal to 0. When ChromaArrayType is not equal to 3, it is a requirement of bitstream conformance that the value of cross_component_prediction_enabled_flag shall be equal to 0.
log2_res_scale_abs_plus1[c] minus 1 specifies the base 2 logarithm of the magnitude of the scaling factor ResScaleVal used in cross-component residual prediction. When not present, log2_res_scale_abs_plus1 is inferred equal to 0.
res_scale_sign_flag[c] specifies the sign of the scaling factor used in cross-component residual prediction as follows:
The variable ResScaleVal[cIdx][x0][y0] specifies the scaling factor used in cross-component residual prediction. The array indices x0, y0 specify the location (x0, y0) of the top-left luma sample of the considered transform block relative to the top-left luma sample of the picture. The array index cIdx specifies an indicator for the color component; it is equal to 1 for Cb, and equal to 2 for Cr.
The variable ResScaleVal[cIdx][x0][y0] is derived as follows:
In the above, ResScaleVal corresponds to the aforementioned α.
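One way the scaling factor could be derived from the two syntax elements, consistent with the semantics stated above, is sketched below. It assumes that a sign flag equal to 1 denotes a negative scaling factor, which the text here does not state explicitly. The function name is hypothetical.

```python
def res_scale_val(log2_res_scale_abs_plus1, res_scale_sign_flag):
    """Scaling factor in units of 1/8; zero means no cross-component prediction."""
    if log2_res_scale_abs_plus1 == 0:
        return 0
    magnitude = 1 << (log2_res_scale_abs_plus1 - 1)
    # Assumed convention: flag 0 -> positive, flag 1 -> negative.
    return magnitude * (1 - 2 * res_scale_sign_flag)
```

Under these assumptions the values 1, 2, 4 and 8 correspond to α equal to 0.125, 0.25, 0.5 and 1.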
That is, the sample values within a residual/transform block 604, for which inter-component prediction is used, i.e. for which log2_res_scale_abs_plus1≠0, at positions x, y, i.e. r[x][y], are computed on the basis of the co-located residual sample values ry[x][y] of the first component according to, for example,
r[x][y] += (ResScaleVal[cIdx][xTbY][yTbY] * ((ry[x][y] << BitDepthC) >> BitDepthY)) >> 3
wherein BitDepthC is the bit depth of the dependent components 2/3 and BitDepthY is the bit depth of the first component.
The right shift “>>3” corresponds to a division by eight. According to the present example, the signalizable α values are, as already exemplified above, {0, ±0.125, ±0.25, ±0.5, ±1}.
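The update formula can be sketched for a whole block as follows. The block layout as a list of rows, the default bit depths and the function name are illustrative assumptions. Python's right shift on negative integers is an arithmetic shift that rounds toward minus infinity, which is the behaviour assumed here.

```python
def cross_component_residual(r, ry, res_scale_val, bit_depth_c=8, bit_depth_y=8):
    """Return the block r after adding the scaled co-located first-component residual ry."""
    out = []
    for r_row, ry_row in zip(r, ry):
        out.append([
            # Align bit depths, scale by ResScaleVal, then divide by eight via >> 3.
            rc + ((res_scale_val * ((rl << bit_depth_c) >> bit_depth_y)) >> 3)
            for rc, rl in zip(r_row, ry_row)
        ])
    return out
```

With res_scale_val equal to 4, i.e. α=0.5, a first-component residual of 8 would add 4 to the co-located dependent-component residual.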
log2_res_scale_abs_plus1 may be signaled in the data stream using a truncated unary binarization and binary arithmetic coding and binary arithmetic decoding and truncated unary debinarization, respectively. The binary arithmetic de/coding may be context-adaptive. The context may be selected based on the local neighbourhood. For example, per bin of the binarization of log2_res_scale_abs_plus1 a different context may be chosen. Different sets of contexts may be used for both chroma components. Likewise, res_scale_sign_flag may be signaled in the data stream using binary arithmetic coding and binary arithmetic decoding, respectively. The binary arithmetic de/coding may be context-adaptive, and different contexts may be used for both chroma components. Alternatively, the same contexts would be used for both chroma components.
As described, the mapping from log2_res_scale_abs_plus1 to the absolute value of α, i.e. ResScaleVal>>3, may be done arithmetically using a bit shift operation, i.e. by an exponential function.
The signaling of log2_res_scale_abs_plus1 and res_scale_sign_flag for the two chroma components may be skipped for a certain residual/transform block if the luma component within the latter is zero. As log2_res_scale_abs_plus1 and res_scale_sign_flag are examples for signaling 614 and 616 in
Thus,
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
Thus, the above description, inter alia, described the following embodiments:
According to a first embodiment, a decoder is configured to decode a multi-component picture 202 spatially sampling a scene with respect to different components 206, 208, 210, by reconstructing a first component signal 2561; 2701 relating to a first component 206 of the multi-component picture 202 from a data stream 104; reconstructing 400 a portion 440 of a second component signal 256′2; 270′2 relating to a second component 208 of the multi-component picture 202 from a spatially corresponding portion of the reconstructed first component signal 2561; 2701 and a correction signal 2562; 2702 derived from the data stream.
According to a second embodiment, the decoder according to the first embodiment is configured as a block-based hybrid video decoder configured to sub-divide the multi-component picture 202 regularly into tree blocks 302, subdivide the tree blocks using recursive multi-tree subdivisioning into code blocks 304 individually and subdivide each code block using recursive multi-tree subdivisioning into prediction blocks 308 and using recursive multi-tree subdivisioning into residual blocks 312 individually, and subdivide the residual blocks into transform blocks 316; select prediction modes depending on the data stream at a granularity depending on the code blocks or depending on the prediction blocks; set prediction parameters depending on the data stream at a granularity of the prediction blocks; derive a prediction signal 2201, 2202, 2203 using the prediction modes and prediction parameters; derive a residual signal 2561, 256′2, 256′3 within each residual block by performing inverse transformations within the transform blocks individually, and reconstruct the multi-component picture 202 by correcting the prediction signal using the residual signal, the decoder being responsive to a signaling 6142; 6143 in the data stream so as to, at a granularity of the residual blocks and/or transform blocks, switch between performing the reconstructing of the second component signal from the spatially corresponding portion of the reconstructed first component signal and the correction signal and reconstructing the second component signal from the correction signal irrespective of the spatially corresponding portion of the reconstructed first component signal.
According to a third embodiment, the decoder according to the first embodiment is configured such that the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component 206 of the multi-component picture 202, and to perform the temporal, spatial or inter-view prediction of the first component 206 of the multi-component picture 202 and reconstruct the first component 206 of the multi-component picture by correcting the temporal, spatial or inter-view prediction of the first component using the reconstructed first component signal.
According to a fourth embodiment, the decoder according to the first embodiment is configured such that the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component 208 of the multi-component picture 202, and to perform the temporal, spatial or inter-view prediction of the second component 208 of the multi-component picture and reconstruct the second component 208 of the multi-component picture 202 by correcting the temporal, spatial or inter-view prediction of the multi-component picture 202 using the reconstructed second component signal.
According to a fifth embodiment, the decoder according to the first embodiment is configured to obtain the correction signal 2562 by performing an inverse spectral transformation 2262 onto spectral coefficients relating to the second component 208 derived from the data stream 104 so as to obtain the correction signal in the spatial domain.
According to a sixth embodiment, the decoder according to the first embodiment is configured to, in reconstructing the second component signal, adaptively set a first weight α2 at which the spatially corresponding portion 442 of the reconstructed first component signal influences the reconstruction of the second component signal at a sub-picture granularity.
According to a seventh embodiment, the decoder according to the sixth embodiment is configured to, at the sub-picture granularity, adaptively set, depending on signaling in the data stream, the first weight α2.
According to an eighth embodiment, the decoder according to the sixth embodiment is configured to, at the sub-picture granularity, read a first weight's absolute value from the data stream and, in a manner conditionally depending on whether same is zero, a first weight's sign.
According to a ninth embodiment, the decoder according to the eighth embodiment is configured to, at the sub-picture granularity, skip reading the first weight's absolute value from the data stream and, in a manner conditionally depending on whether same is zero, the first weight's sign at portions where the first component signal is zero.
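The parsing order of the eighth and ninth embodiments can be illustrated with a minimal sketch. It assumes, purely for illustration, that the absolute value is carried as a truncated unary code and that a sign bin of 1 means negative; the function name `read_first_weight`, the parameter `max_abs` and the bit iterator are hypothetical and are not taken from the data stream syntax described here.

```python
def read_first_weight(bits, first_component_is_zero, max_abs=4):
    """Read a signed weight from an iterator of 0/1 bins.

    Assumptions (not from the source): truncated unary absolute value
    with maximum max_abs, and a sign bin where 1 means negative.
    """
    if first_component_is_zero:
        # Ninth embodiment: nothing is read where the first component signal is zero.
        return 0
    abs_val = 0
    while abs_val < max_abs and next(bits) == 1:
        abs_val += 1
    if abs_val == 0:
        # Eighth embodiment: the sign is only read for a non-zero absolute value.
        return 0
    negative = next(bits) == 1
    return -abs_val if negative else abs_val


print(read_first_weight(iter([1, 1, 0, 1]), first_component_is_zero=False))
```

With the bins `1, 1, 0, 1` the sketch reads an absolute value of 2 followed by a negative sign.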
According to a tenth embodiment, the decoder according to the sixth embodiment is configured to, in reconstructing the second component signal, add the spatially corresponding portion 442 of the reconstructed first component signal, weighted by the first weight α2, to the correction signal.
According to an eleventh embodiment, the decoder according to the tenth embodiment is configured to, in reconstructing the second component signal, perform the addition in the spatial domain in a sample-wise manner.
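The sample-wise addition of the tenth and eleventh embodiments may be sketched as follows. The function name and the use of nested lists and a floating-point weight are assumptions made for illustration; an actual decoder would typically use integer arithmetic.

```python
def reconstruct_second_component(first_recon, correction, alpha):
    """Return alpha * first_recon + correction, sample by sample.

    first_recon and correction are equally sized 2-D blocks (lists of rows);
    alpha is the first weight (a float here, which is an assumption).
    """
    if len(first_recon) != len(correction):
        raise ValueError("blocks must be co-located and equally sized")
    return [
        [alpha * x + c for x, c in zip(row_x, row_c)]
        for row_x, row_c in zip(first_recon, correction)
    ]


first = [[4, -2], [0, 6]]
corr = [[1, 1], [-1, 0]]
print(reconstruct_second_component(first, corr, 0.5))
```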
According to a twelfth embodiment, the decoder according to the sixth embodiment is configured to set the first weight by deriving a high-level syntax element structure 510 from the data stream, having at least picture scope; constructing, at the at least picture scope, a mapping 522 from a domain set of possible bin strings 518 of a predetermined binarization onto a co-domain 520 of possible values of the first weight; and deriving the first weight by reading, at sub-picture granularity, a first-weight syntax element 514 from the data stream using the predetermined binarization and subjecting the first-weight syntax element's bin string to the mapping.
According to a thirteenth embodiment, the decoder according to the twelfth embodiment is configured to derive from the high-level syntax element structure lower and upper bounds 524 of an interval of co-domain values out of a predetermined set of possible non-zero values of the first weight, and, in deriving the first weight, additionally read, at sub-picture granularity, a zero flag 526 from the data stream indicating whether the first weight shall be zero or not with performing the reading of the first-weight syntax element and the subjecting conditionally depending on the zero flag.
According to a fourteenth embodiment, the decoder according to the twelfth embodiment is configured to, in constructing the mapping, derive from the high-level syntax element structure the sign and absolute value of a lower bound integer value and the sign and absolute value of an upper bound integer value, apply an integer-domain exponential function to the absolute values of the lower bound integer value and the upper bound integer value, and select, out of a co-domain of the integer-domain exponential function excluding zero, the co-domain of possible values of the first weight.
According to a fifteenth embodiment, the decoder according to the twelfth embodiment is configured to use a truncated unary binarization as the predetermined binarization for an absolute value part of the first-weight syntax element and, in deriving the first weight, read a sign part 530 of the first-weight syntax element from the data stream before the absolute part 530 of the first-weight syntax element and set a length of the truncated unary binarization of the absolute part of the first-weight syntax element depending on the sign part and the co-domain of possible values of the first weight.
According to a sixteenth embodiment, the decoder according to the twelfth embodiment is configured to derive from the high-level syntax element structure first and second interval bounds 524 and the decoder is configured to use a truncated unary binarization of TU bin strings as the predetermined binarization for an absolute value part of the first-weight syntax element and, in constructing the mapping, reverse an order in which the possible values onto which the TU bin strings are mapped traverse the co-domain of possible values, depending on a comparison of the first and second interval bounds.
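One way to picture the mapping of the twelfth to fifteenth embodiments is the sketch below. It assumes that the integer-domain exponential function is a power of two, that the bounds are signalled as (sign, exponent) pairs, and that a sign bin of 1 selects the negative values; the names `build_codomain` and `read_weight` are hypothetical. The zero flag of the thirteenth embodiment and the order reversal of the sixteenth embodiment are left out.

```python
def build_codomain(lower, upper):
    """Build the non-zero weight values between two signalled bounds.

    lower and upper are (sign, exponent) pairs with sign in {-1, +1};
    the bound value is taken as sign * 2**exponent (an assumption).
    """
    lo = lower[0] * (1 << lower[1])
    hi = upper[0] * (1 << upper[1])
    max_exp = max(lower[1], upper[1])
    candidates = [s * (1 << k) for s in (-1, 1) for k in range(max_exp + 1)]
    return sorted(v for v in candidates if lo <= v <= hi)


def read_weight(bits, codomain):
    """Read the sign bin first, then a truncated unary index.

    The truncated unary length depends on how many values of the
    signalled sign exist in the co-domain.
    """
    negative = next(bits) == 1
    values = sorted((v for v in codomain if (v < 0) == negative), key=abs)
    if not values:
        raise ValueError("no value of the signalled sign in the co-domain")
    c_max = len(values) - 1
    idx = 0
    while idx < c_max and next(bits) == 1:
        idx += 1
    return values[idx]


codomain = build_codomain((-1, 2), (1, 3))
print(codomain)
print(read_weight(iter([0, 1, 0]), codomain))
```

Because only three negative values exist in this example co-domain, a negative weight needs at most two truncated unary bins, whereas a positive weight may need three.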
According to a seventeenth embodiment, the decoder according to the first embodiment is configured to, in reconstructing the second component signal 208, adaptively set a second weight at which the correction signal influences the reconstruction of the second component signal at a sub-picture granularity.
According to an eighteenth embodiment, the decoder according to the first embodiment is configured to, in reconstructing the second component signal, adaptively set weights of a weighted sum of the correction signal and the spatially corresponding portion 442 of the reconstructed first component signal at a sub-picture granularity and use the weighted sum as a scalar argument of a scalar function which is, at least per picture, constant so as to obtain the reconstructed second component signal.
According to a nineteenth embodiment, the decoder according to the first embodiment is configured to, in reconstructing the second component signal, adaptively set weights of a weighted sum of the correction signal, the spatially corresponding portion 442 of the reconstructed first component signal and a constant at a sub-picture granularity and use the weighted sum as a scalar argument of a scalar function which is, at least per picture, constant so as to obtain the reconstructed second component signal.
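A minimal sketch of the eighteenth and nineteenth embodiments follows. The scalar function is assumed here to be a clipping to a fixed sample range, which the text does not specify; the function and parameter names are likewise hypothetical.

```python
def clip(value, lo=-128, hi=127):
    """Example scalar function (an assumption): clip to a fixed range."""
    return max(lo, min(hi, value))


def reconstruct_sample(correction, first_recon, w_corr, w_first, w_const, f=clip):
    """Apply f to the weighted sum of correction, first-component sample and a constant."""
    return f(w_corr * correction + w_first * first_recon + w_const)


print(reconstruct_sample(10, 40, w_corr=1, w_first=2, w_const=3))
```

The weights may change per block, while `f` stays the same for at least a whole picture.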
According to a twentieth embodiment, the decoder according to the eighteenth embodiment is configured to set the weights in a backward-driven manner based on a local neighbourhood.
According to a twenty-first embodiment, the decoder according to the eighteenth embodiment is configured to set the weights in a backward-driven manner based on a local neighbourhood, with correcting the weights in a forward-driven manner.
According to a twenty-second embodiment, the decoder according to the twentieth embodiment is configured to set the weights in a backward-driven manner based on attributes of an already decoded portion of the multi-component picture.
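Backward-driven weight setting can be illustrated by estimating the weight from already decoded neighbouring samples of both components, so that no signaling is needed. The least-squares estimator below is only one possible choice and is an assumption, as is the name `estimate_weight`.

```python
def estimate_weight(neigh_first, neigh_second):
    """Least-squares fit of neigh_second ~ alpha * neigh_first.

    Both arguments are lists of already reconstructed neighbouring
    samples; returns 0.0 when the first-component neighbourhood is all zero.
    """
    num = sum(a * b for a, b in zip(neigh_first, neigh_second))
    den = sum(a * a for a in neigh_first)
    return num / den if den else 0.0


print(estimate_weight([2, 4, -6], [1, 2, -3]))
```

Since encoder and decoder both have access to the same reconstructed neighbourhood, they arrive at the same estimate.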
According to a twenty-third embodiment, the decoder according to the eighteenth embodiment is configured to set the weights to default values at a first spatial granularity, in a combined backward- and forward-adaptive manner, a forward-adaptive manner or a backward-adaptive manner, and refine the weights in a backward-driven manner based on a local neighbourhood at a second spatial granularity being finer than the first spatial granularity.
According to a twenty-fourth embodiment, the decoder according to the sixteenth embodiment is configured to set the weights to one of m different states depending on m-ary sub-picture level syntax elements, wherein the decoder is configured to derive m from a higher-level syntax element 510.
According to a twenty-fifth embodiment, the decoder according to the first embodiment is configured to adaptively switch, at a sub-picture granularity, in reconstructing the second component signal, between performing an inverse spectral transformation onto spectral coefficients relating to the second component 208 derived from the data stream so as to obtain the correction signal x in a spatial domain and reconstructing 400 the second component signal z using the correction signal x in the spatial domain, and obtaining the correction signal x in a spectral domain from the data stream, reconstructing 400, in the spectral domain, the second component signal z using the correction signal x as obtained in the spectral domain, and subjecting the, in the spectral domain, reconstructed second component signal z to an inverse spectral transformation.
According to a twenty-sixth embodiment, the decoder according to the twenty-fifth embodiment is configured to perform the adaptive switching in a backward-adaptive manner and/or forward-adaptive manner 490.
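The two paths of the twenty-fifth embodiment can be compared with a small sketch. It assumes a weighted addition as the reconstruction, a 4-point Walsh-Hadamard transform as a stand-in for the spectral transformation, and that the spectral-domain path uses the coefficients of the first component signal; none of these choices is prescribed by the text.

```python
def wht4(v):
    """Unnormalised 4-point Walsh-Hadamard transform (stand-in transform)."""
    a, b, c, d = v
    return [a + b + c + d, a - b + c - d, a + b - c - d, a - b - c + d]


def iwht4(v):
    """Inverse of wht4: the matrix is symmetric and H * H = 4 * I."""
    return [x / 4 for x in wht4(v)]


def reconstruct_spatial(x1, corr_coeffs, alpha):
    """Inverse transform the correction first, then combine in the spatial domain."""
    corr = iwht4(corr_coeffs)
    return [alpha * a + c for a, c in zip(x1, corr)]


def reconstruct_spectral(x1, corr_coeffs, alpha):
    """Combine in the spectral domain, then inverse transform the result."""
    x1_coeffs = wht4(x1)
    z_coeffs = [alpha * a + c for a, c in zip(x1_coeffs, corr_coeffs)]
    return iwht4(z_coeffs)


x1 = [4, -2, 0, 6]
corr_coeffs = [8, 0, -4, 4]
print(reconstruct_spatial(x1, corr_coeffs, 0.5))
print(reconstruct_spectral(x1, corr_coeffs, 0.5))
```

For a linear transform and without intermediate rounding, both paths give the same samples, so the choice between them concerns where the prediction is applied rather than what is reconstructed.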
According to a twenty-seventh embodiment, the decoder according to the first embodiment is configured to adaptively switch, at a sub-picture granularity, a direction of reconstruction of the second component signal between performing the reconstruction of the second component signal from the spatially corresponding portion of the reconstructed first component signal and reversing the reconstruction so as to reconstruct the first component signal from a spatially corresponding portion of the reconstructed second component signal.
According to a twenty-eighth embodiment, the decoder according to the first embodiment is configured to adaptively switch, responsive to a syntax element 472 signaling an order among the first and second component signals, a direction of reconstruction of the second component signal between performing the reconstruction of the second component signal from the spatially corresponding portion of the reconstructed first component signal and reversing the reconstruction so as to reconstruct the first component signal from a spatially corresponding portion of the reconstructed second component signal.
According to a twenty-ninth embodiment, the decoder according to the first embodiment is configured to adaptively switch, at a sub-picture granularity, the reconstruction of the second component signal between reconstructing same merely based on the reconstructed first component signal and reconstructing same based on the reconstructed first component signal and a third reconstructed component signal.
According to a thirtieth embodiment, in the decoder according to the first embodiment, the first and second components are color components.
According to a thirty-first embodiment, in the decoder according to the first embodiment, the first component is luma and the second component is a chroma component.
According to a thirty-second embodiment, in the decoder according to the first embodiment, the decoder is responsive to a first syntax element 612 in the data stream so as to, depending on the first syntax element 612, either enable the reconstruction of the second component signal based on the reconstructed first component signal, read sub-picture level syntax elements 6142, 6162, 6143, 6163 from the data stream in parsing the data stream and adapt the reconstruction of the second component signal based on the reconstructed first component signal at a sub-picture granularity based on the sub-picture level syntax elements, or disable the reconstruction of the second component signal based on the reconstructed first component signal and change the parsing of the data stream to account for the data stream not comprising the sub-picture level syntax elements.
According to a thirty-third embodiment, in the decoder according to the first embodiment, the decoder is configured to, in a backward-driven manner, switch locally between enabling the reconstruction of the second component signal based on the reconstructed first component signal, and disabling the reconstruction of the second component signal based on the reconstructed first component signal.
According to a thirty-fourth embodiment, in the decoder according to the thirty-third embodiment, the decoder is configured to perform the local switching in a backward-driven manner.
According to a thirty-fifth embodiment, in the decoder according to the thirty-third embodiment, the decoder is configured such that the first component signal is a prediction residual of a temporal, spatial or inter-view prediction of the first component of the multi-component picture, and to perform the temporal, spatial or inter-view prediction of the first component of the multi-component picture and reconstruct the first component of the multi-component picture by correcting the temporal, spatial or inter-view prediction of the first component using the reconstructed first component signal, the decoder is configured such that the second component signal is a prediction residual of a temporal, spatial or inter-view prediction of the second component of the multi-component picture, and to perform the temporal, spatial or inter-view prediction of the second component of the multi-component picture and reconstruct the second component of the multi-component picture by correcting the temporal, spatial or inter-view prediction of the multi-component picture using the reconstructed second component signal, and the decoder is configured to perform the local switching by locally checking whether the first and the second component signals are prediction residuals of a spatial prediction and whether an intra-prediction mode of the spatial prediction coincides, or by locally checking whether the first and the second component signals are prediction residuals of a spatial prediction and whether an intra-prediction mode of the spatial prediction does not deviate by more than a predetermined amount.
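The local check of the thirty-fifth embodiment can be sketched as a simple predicate. Representing intra-prediction modes as integer indices and measuring deviation as their absolute difference are assumptions made for illustration, as is the function name.

```python
def inter_component_prediction_enabled(first_is_intra, second_is_intra,
                                       first_mode, second_mode, max_deviation=0):
    """Decide locally, without signaling, whether to use inter-component prediction.

    With max_deviation=0 the intra-prediction modes must coincide; a larger
    value allows them to deviate by at most that amount.
    """
    if not (first_is_intra and second_is_intra):
        return False
    return abs(first_mode - second_mode) <= max_deviation


print(inter_component_prediction_enabled(True, True, 26, 26))
print(inter_component_prediction_enabled(True, True, 26, 24, max_deviation=1))
```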
According to a thirty-sixth embodiment, in the decoder according to the thirty-third embodiment, the decoder is configured to decide on the local switching firstly in a backward-driven manner with modifying the decision in a forward-adaptive manner responsive to a signaling in the data stream.
According to a thirty-seventh embodiment, in the decoder according to the first embodiment, the decoder is responsive to a second syntax element in the data stream so as to, depending on the second syntax element, either read sub-picture level syntax elements from the data stream in parsing the data stream and adapt the reconstruction of the second component signal based on the reconstructed first component signal at a sub-picture granularity based on the sub-picture level syntax elements, or perform the reconstruction of the second component signal based on the reconstructed first component signal non-adaptively.
According to a thirty-eighth embodiment, in the decoder according to the first embodiment, the first and second components 206, 208 are two of three color components, and the decoder is configured to also reconstruct a third component signal relating to the third color component 210 of the multi-component picture 202 from a spatially corresponding portion of the reconstructed first or second component signal and a correction signal derived from the data stream for the third component, wherein the decoder is configured to perform the reconstruction of the second and third component signals on a sub-picture level adaptively individually.
According to a thirty-ninth embodiment, in the decoder according to the first embodiment, the first component 206 is luma, the second component 208 is a first chroma component and the third component 210 is a second chroma component and the decoder is configured to entropy decode first sub-picture level syntax elements 6142, 6162 for adapting the reconstruction of the second component signal relating to the first color component of the multi-component picture and second sub-picture level syntax elements 6143, 6163 for adapting the reconstruction of the third component signal relating to the first color component of the multi-component picture from the spatially corresponding portion of the reconstructed first or second component signal context-adaptively using the same contexts.
According to a fortieth embodiment, in the decoder according to the first embodiment, the first component is luma, the second component is a first chroma component and the third component is a second chroma component and the decoder is configured to entropy decode first sub-picture level syntax elements for adapting the reconstruction of the second component signal relating to the first color component of the multi-component picture and second sub-picture level syntax elements for adapting the reconstruction of the third component signal relating to the first color component of the multi-component picture from the spatially corresponding portion of the reconstructed first or second component signal context-adaptively using separate contexts.
According to a forty-first embodiment, in the decoder according to the first embodiment, the decoder is configured to read sub-picture level syntax elements from the data stream in parsing the data stream and adapt the reconstruction of the second component signal based on the reconstructed first component signal at a sub-picture granularity based on the sub-picture level syntax elements, and, in parsing the data stream, check as to whether, for a currently decoded portion of the second component, the spatially corresponding portion 442 of the reconstructed first component signal is zero and, depending on the check, explicitly read the sub-picture level syntax elements from the data stream and perform the reconstruction of the second component signal from the spatially corresponding portion of the reconstructed first component signal, or skip the explicit reading.
According to a forty-second embodiment, in the decoder according to the first embodiment, the decoder is configured to entropy decode the sub-picture level syntax elements from the data stream using a Golomb Rice Code.
According to a forty-third embodiment, in the decoder according to the forty-second embodiment, the decoder is configured to, in entropy decoding the sub-picture level syntax elements from the data stream, binary arithmetic decode bins of the Golomb Rice Code.
According to a forty-fourth embodiment, in the decoder according to the forty-third embodiment, the decoder is configured to, in entropy decoding the sub-picture level syntax elements from the data stream, binary arithmetic decode bins of the Golomb Rice Code at different bin positions using different contexts.
According to a forty-fifth embodiment, in the decoder according to the forty-third embodiment, the decoder is configured to, in entropy decoding the sub-picture level syntax elements from the data stream, binary arithmetic decode bins of the Golomb Rice Code at bin positions exceeding a predetermined value, context-less.
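The structure of a Golomb Rice code word and the per-bin context choice described above can be sketched as follows. The sketch reads plain bits rather than performing binary arithmetic decoding, and the prefix convention (ones terminated by a zero), the Rice parameter `k` and the helper `context_for_bin` are assumptions for illustration only.

```python
def decode_golomb_rice(bits, k):
    """Decode one Golomb Rice code word from an iterator of 0/1 bins.

    Assumed layout: unary prefix of ones terminated by a zero,
    followed by a k-bit fixed-length suffix.
    """
    q = 0
    while next(bits) == 1:
        q += 1
    r = 0
    for _ in range(k):
        r = (r << 1) | next(bits)
    return (q << k) | r


def context_for_bin(bin_index, max_context_bins=2):
    """Pick a context per bin position; None stands for context-less (bypass) decoding."""
    return bin_index if bin_index < max_context_bins else None


print(decode_golomb_rice(iter([1, 1, 0, 1, 0]), k=2))
print([context_for_bin(i) for i in range(4)])
```

In this sketch the first two bin positions each get their own context, and all later positions are decoded without a context.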
According to a forty-sixth embodiment, in the decoder according to the first embodiment, the decoder is configured to, in reconstructing the second component signal, spatially re-scale, and/or perform a bit-depth precision mapping on, the spatially corresponding portion of the reconstructed first component signal.
According to a forty-seventh embodiment, in the decoder according to the forty-sixth embodiment, the decoder is configured to adapt the spatial re-scaling, and/or performance of the bit-depth precision mapping, in a backward- and/or forward-adaptive manner.
According to a forty-eighth embodiment, in the decoder according to the forty-sixth embodiment, the decoder is configured to adapt the spatial re-scaling by selecting a spatial filter, in a backward- and/or forward-adaptive manner.
According to a forty-ninth embodiment, in the decoder according to the forty-eighth embodiment, the decoder is configured to adapt the performance of the bit-depth precision mapping, by selecting a mapping function, in a backward- and/or forward-adaptive manner.
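A minimal sketch of a bit-depth precision mapping and a spatial re-scaling is given below. The shift-with-rounding mapping and the decimation that keeps every second sample are just two possible choices (a mapping function and a spatial filter in the terms used above); both, and the function names, are assumptions.

```python
def map_bit_depth(sample, src_bits, dst_bits):
    """Map an integer sample from src_bits to dst_bits precision.

    Up-mapping shifts left; down-mapping shifts right with a rounding offset.
    """
    if dst_bits >= src_bits:
        return sample << (dst_bits - src_bits)
    shift = src_bits - dst_bits
    return (sample + (1 << (shift - 1))) >> shift


def decimate_2x(block):
    """Re-scale a block to half resolution by keeping every second sample."""
    return [row[::2] for row in block[::2]]


first = [[6, 1, -5, 2], [0, 0, 0, 0]]
scaled = [[map_bit_depth(s, 10, 8) for s in row] for row in decimate_2x(first)]
print(scaled)
```

Such a step would bring, for example, a higher-resolution, higher-precision first component block onto the sampling grid and precision of a sub-sampled second component before the prediction is formed.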
According to a fiftieth embodiment, in the decoder according to the first embodiment, the decoder is configured to, in reconstructing the second component signal, reconstruct the second component signal from a spatially low-pass filtered version of the reconstructed first component signal.
According to a fifty-first embodiment, in the decoder according to the fiftieth embodiment, the decoder is configured to perform the reconstruction of the second component signal from the spatially low-pass filtered version of the reconstructed first component signal in a forward adaptive manner or in a backward-adaptive manner.
According to a fifty-second embodiment, in the decoder according to the fiftieth embodiment, the decoder is configured to adapt the reconstruction of the second component signal from the spatially low-pass filtered version of the reconstructed first component signal by setting a low-pass filter used for the low-pass filtering in a forward adaptive manner or in a backward-adaptive manner.
According to a fifty-third embodiment, in the decoder according to the forty-fifth embodiment, the decoder is configured to perform the spatial low-pass filtering leading to the low-pass filtered version of the reconstructed first component signal using binning.
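Low-pass filtering by binning can be pictured as averaging non-overlapping groups of samples. The 2x2 bin size, the floating-point average and the name `bin_2x2` are assumptions chosen for illustration.

```python
def bin_2x2(block):
    """Average each non-overlapping 2x2 group of samples into one output sample.

    block is a list of equally long rows; a trailing odd row or column is dropped.
    """
    h, w = len(block), len(block[0])
    return [
        [
            (block[y][x] + block[y][x + 1] + block[y + 1][x] + block[y + 1][x + 1]) / 4
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


print(bin_2x2([[1, 3, 5, 7], [1, 3, 5, 7]]))
```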
According to a fifty-fourth embodiment, an encoder is configured to encode a multi-component picture 202 spatially sampling a scene with respect to different components 206, 208, 210, by encoding 400 a portion 440 of a second component signal 256′2; 270′2 relating to a second component 208 of the multi-component picture 202 by inter-component prediction on the basis of a spatially corresponding portion of a reconstructed first component signal 2561; 2701 and inserting a correction signal 2562; 2702 for correcting the inter-component prediction into the data stream.
According to a fifty-fifth embodiment, a method for decoding a multi-component picture 202 spatially sampling a scene with respect to different components 206, 208, 210 comprises reconstructing a first component signal 2561; 2701 relating to a first component 206 of the multi-component picture 202 from a data stream 104; and reconstructing 400 a portion 440 of a second component signal 256′2; 270′2 relating to a second component 208 of the multi-component picture 202 from a spatially corresponding portion of the reconstructed first component signal 2561; 2701 and a correction signal 2562; 2702 derived from the data stream.
According to a fifty-sixth embodiment, a method for encoding a multi-component picture 202 spatially sampling a scene with respect to different components 206, 208, 210 comprises encoding 400 a portion 440 of a second component signal 256′2; 270′2 relating to a second component 208 of the multi-component picture 202 by inter-component prediction on the basis of a spatially corresponding portion of a reconstructed first component signal 2561; 2701 and inserting a correction signal 2562; 2702 for correcting the inter-component prediction into the data stream.
According to a fifty-seventh embodiment, a computer program has a program code for performing, when running on a computer, a method according to the fifty-fifth or fifty-sixth embodiment.
Marpe, Detlev, Nguyen, Tung, Khairat Abdelhamid, Ali Atef Ibrahim
Patent | Priority | Assignee | Title |
12155850, | Apr 08 2013 | DOLBY VIDEO COMPRESSION, LLC | Inter-component prediction |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 15 2015 | NGUYEN, TUNG | GE VIDEO COMPRESSION, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 053436 | /0626 | |
Dec 15 2015 | MARPE, DETLEV | GE VIDEO COMPRESSION, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 053436 | /0626 | |
Dec 21 2015 | KHAIRAT ABDELHAMID, ALI ATEF IBRAHIM | GE VIDEO COMPRESSION, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 053436 | /0626 | |
Aug 07 2020 | GE VIDEO COMPRESSION, LLC | (assignment on the face of the patent) | / | |||
Aug 19 2024 | GE VIDEO COMPRESSION, LLC | DOLBY VIDEO COMPRESSION, LLC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 069450 | /0446 |