Provided are a method for image prediction of a multi-view video codec capable of improving coding efficiency, and a computer-readable recording medium therefor. The method for image prediction of a multi-view video codec includes partitioning an image into a plurality of base blocks, acquiring information of reference images which are temporally different, acquiring information of reference images which have different views, and predicting a target block based on the acquired information. Accordingly, an image that is most similar to an image of a view to be currently compressed is generated using multiple images of different views, so that coding efficiency can be improved.

Patent: RE47897
Priority: Jan 11 2007
Filed: Sep 06 2018
Issued: Mar 03 2020
Expiry: Jan 11 2028
Assg.orig Entity: Small
1
30
currently ok
18. A method for image decoding of a multi-view video codec, which decodes multi-view images comprising a target-view image and at least one different-view image which is captured from a different viewpoint from a viewpoint capturing the target-view image, the target-view image and the different-view image being captured at a same time, the method comprising:
partitioning the target-view image into a plurality of base blocks to be currently decoded;
predicting a base block among the base blocks based on the different-view image and a different-time image which is captured at a time different from a time when the target-view image is captured, the target-view image and the different-time image being captured from a same viewpoint;
decoding a difference between a prediction result of the base block and the base block; and
reconstructing the base block based on the prediction result and the difference,
wherein the base block is predicted by using the different-time image and the different-view image as reference images, and the predicting the base block comprises performing weighted prediction by applying at least one weighting value to the reference images.
12. A method for image prediction of a multi-view video codec, which encodes multi-view images comprising a target-view image and at least one different-view image which is captured from a different viewpoint from a viewpoint capturing the target-view image, the target-view image and the different-view image being captured at a same time, the method comprising:
partitioning the target-view image into a plurality of base blocks to be currently encoded;
predicting a base block among the base blocks based on the different-view image and a different-time image which is captured at a time different from a time when the target-view image is captured, the target-view image and the different-time image being captured from a same viewpoint;
acquiring a difference between a prediction result of the base block and the base block, and encoding the difference; and
generating a bitstream comprising the encoded difference,
wherein the base block is predicted by using the different-time image and the different-view image as reference images, and the predicting the base block comprises performing weighted prediction by applying at least one weighting value to the reference images.
1. A method for image prediction of a multi-view video codec, which encodes multi-view images comprising a target-view image and at least one different-view image which is captured from a different viewpoint from a viewpoint capturing the target-view image, the target-view image and the different-view image being captured at a same time, the method comprising:
partitioning the target-view image into a plurality of base blocks to be currently encoded;
predicting a base block among the base blocks based on the different-view image and a different-time image which is captured at a time different from a time when the target-view image is captured, the target-view image and the different-time image being captured from a same viewpoint;
acquiring a difference between a prediction result of the base block and the base block, and encoding the difference; and
generating a bitstream comprising the encoded difference,
wherein the predicting the base block comprises using at least one of a first residual adjusted using a second residual and the second residual adjusted using the first residual, and
wherein the first residual is a difference between the target-view image and the different-time image, and the second residual is a difference between the target-view image and the different-view image.
11. A non-transitory computer-readable recording medium storing a program for executing a method for image prediction of a multi-view video codec, which encodes multi-view images comprising a target-view image and at least one different-view image which is captured from a different viewpoint from a viewpoint capturing the target-view image, the target-view image and the different-view image being captured at a same time, the method comprising:
partitioning the target-view image into a plurality of base blocks to be currently encoded;
predicting a base block among the base blocks based on the different-view image and a different-time image which is captured at a time different from a time when the target-view image is captured, the target-view image and the different-time image being captured from a same viewpoint;
acquiring a difference between a prediction result of the base block and the base block, and encoding the difference; and
generating a bitstream comprising the encoded difference,
wherein the predicting the base block comprises using at least one of a first residual adjusted using a second residual and a second residual adjusted using the first residual, and
wherein the first residual is a difference between the target-view image and the different-time image, and the second residual is a difference between the target-view image and the different-view image.
2. The method of claim 1, wherein information about the first residual and the second residual is contained in a macroblock layer or a higher layer than the macroblock layer.
3. The method of claim 2, wherein the higher layer than the macroblock layer is a slice header extension (SHE), a picture parameter set extension (PPSE), or a sequence parameter set extension (SPSE).
4. The method of claim 1, wherein the adjusted first residual is a result of adding the second residual to the first residual or subtracting the second residual from the first residual, and the adjusted second residual is a result of adding the first residual to the second residual or subtracting the first residual from the second residual.
5. The method of claim 1, wherein the predicting the base block based on the different-view image comprises performing weighted prediction by applying a weighting value to at least one of the target-view image and the different-view image.
6. The method of claim 5, wherein the generating the bitstream comprises including the weighting value in the bitstream.
7. The method of claim 5, wherein in the predicting the base block, a same weighting value is applied to both the target-view image and the different-view image.
8. The method of claim 5, wherein in the predicting the base block, different weighting values are applied to the target-view image and the different-view image, respectively.
9. The method of claim 5, wherein the generating the bitstream comprises including information about the weighted prediction in the bitstream.
10. The method of claim 1, wherein the different-time image is an image which is the most similar to the target-view image among a plurality of different-time images with respect to the target-view image, and the different-view image is an image which is the most similar to the target-view image among a plurality of different-view images with respect to the target-view image.
13. The method of claim 12, wherein the generating the bitstream comprises including the weighting value in the bitstream.
14. The method of claim 12, wherein in a case where the base block is predicted by using the different-view image, a same weighting value is applied to both the target-view image and the different-view image.
15. The method of claim 12, wherein in a case where the base block is predicted by using the different-view image, different weighting values are applied to the target-view image and the different-view image, respectively.
16. The method of claim 12, wherein in a case where the base block is predicted by using the different-view image, the generating the bitstream comprises including information about the weighted prediction in the bitstream.
17. The method of claim 12, wherein the different-time image is an image which is the most similar to the target-view image among a plurality of different-time images with respect to the target-view image, and the different-view image is an image which is the most similar to the target-view image among a plurality of different-view images with respect to the target-view image.
19. The method of claim 18, wherein the predicting the base block comprises decoding the weighting value from a bitstream.
20. The method of claim 18, wherein the predicting the base block comprises decoding information about the weighted prediction from a bitstream.


Recon=Pred+Res   (1)
where Pred denotes a reference image of a specific size that is most similar to a target block 210 of FIG. 2 in a temporal/spatial domain and can be represented by motion information, and Res denotes residual information indicating a difference between the reference image and the target block 210.

According to the embodiment of the present invention, to minimize this residual information, a method of using residual information present in an image having a different view is proposed, thereby reducing the residual information being currently encoded/decoded. A video codec can be implemented such that Pred is properly selected to minimize Res. As Pred in the multi-view codec, an image that is proper in terms of view or time may be used. Pred may be defined by the following Equation (2):
Pred=F(Pred′+Res′)   (2)

That is, Pred is obtained by applying a proper filter, e.g., an LPF such as a deblocking filter in H.264, to a value obtained by adding a residual to a certain reference image.

When Equation (2) is applied to Equation (1), the following Equation (3) can be obtained:
Recon=F(Pred′+Res′)+Res″  (3)
where Pred′ and Res′ are a reference image and a residual of an image that the target block 210 references, respectively. A combination of Pred′ and Res′ that are properly induced is used as a reference image of a current image, i.e., a target image, and residual information therebetween is minimized.

If Equation (3) is rearranged with respect to the terms Pred′ and Res′ by distributing F, F(Pred′) is represented by Pred, and Res is represented by F(Res′)+Res″. Thus, a gain is obtained by transmitting Res″ instead of Res as in the related art.
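
The rearrangement can be written out step by step. The sketch below assumes that F is linear, which holds for a fixed-tap FIR filter but not for an adaptive filter such as the H.264 deblocking filter; linearity is what allows F to be distributed over the sum:

```latex
\begin{aligned}
\mathrm{Recon} &= F(\mathrm{Pred}' + \mathrm{Res}') + \mathrm{Res}'' \\
               &= F(\mathrm{Pred}') + F(\mathrm{Res}') + \mathrm{Res}'' \\
               &= \mathrm{Pred} + \mathrm{Res},
\qquad \mathrm{Pred} = F(\mathrm{Pred}'), \quad \mathrm{Res} = F(\mathrm{Res}') + \mathrm{Res}''
\end{aligned}
```

Under that assumption, the part F(Res′) of the conventional residual is supplied by the borrowed residual, so only Res″ remains to be transmitted.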

If Pred acquires a reference image in a temporal domain, the term Res′ is obtained from a view domain, whereas if Pred acquires a reference image in a view domain, the term Res′ is obtained from a temporal domain. F( ), which is a filter suitable for the obtained term Res′, may be additionally used. For example, the simplest filter having filter coefficients {½, ½} may be used, or a filter such as 1/20{1, −4, 20, 20, −4, 1} may be used.
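
As a minimal sketch of Equations (2) and (3) with the {½, ½} filter, the following Python works on a single row of samples. The function names, the one-dimensional layout, and the edge-replication rule at the row boundary are assumptions made for illustration and are not taken from the specification.

```python
def fir_filter(samples, taps):
    """Apply an FIR filter F to a row of samples, replicating edge samples."""
    half = (len(taps) - 1) // 2          # taps before the current sample
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), last)   # clamp index at the edges
            acc += tap * samples[j]
        out.append(acc)
    return out


def build_predictor(ref_block, borrowed_residual, taps):
    """Equation (2): Pred = F(Pred' + Res')."""
    combined = [p + r for p, r in zip(ref_block, borrowed_residual)]
    return fir_filter(combined, taps)


def reconstruct(ref_block, borrowed_residual, own_residual, taps):
    """Equation (3): Recon = F(Pred' + Res') + Res''."""
    pred = build_predictor(ref_block, borrowed_residual, taps)
    return [p + r for p, r in zip(pred, own_residual)]


if __name__ == "__main__":
    half_half = [0.5, 0.5]
    print(reconstruct([100, 104, 108, 112], [2, -2, 4, 0], [1, 0, -1, 2], half_half))
    # prints [103.0, 107.0, 111.0, 114.0]
```

Here `borrowed_residual` stands for Res′ taken from the other domain (view or time) and `own_residual` for the Res″ that is actually coded for the current block.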

To report the application of the above techniques, information of the following exemplary format may be used:

{“NewPred is equal to 1” specifies that the current slice uses the new view prediction method according to the present invention. “NewPred is equal to 0” specifies that the current slice does not use the new view prediction method according to the present invention. When NewPred is not present, NewPred is inferred as 0. It can be located in the slice layer or a higher layer (SPS, PPS, slice header, slice header extension, SEI).

“ResPredFlag is equal to 1” specifies that the predictor of the current macroblock is derived as follows:

If the current macroblock is coded in inter mode (temporal direction), the residual signal of the neighbor view(s) and the reference block of the current macroblock(s) are used as the predictor (a deblocking filter can be applied to the predictor).

Otherwise (the current macroblock is coded in inter-view mode (view direction)), the residual signal of the collocated block(s) and the reference block of the current macroblock(s) are used as the predictor (a deblocking filter can be applied to the predictor),

where possible, taking the global disparity into consideration. “ResPredFlag is equal to 0” specifies that no residual signal is predicted. When ResPredFlag is not present, ResPredFlag is inferred as 0.}.
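
The macroblock-level rule above can be sketched as a small selection function. This is only an illustration: the names `CodingMode` and `derive_predictor`, the list-of-samples block layout, and the optional `post_filter` hook are assumptions, and global disparity compensation is left out.

```python
from enum import Enum


class CodingMode(Enum):
    INTER = "temporal direction"
    INTER_VIEW = "view direction"


def derive_predictor(mode, res_pred_flag, reference_block,
                     neighbor_view_residual, collocated_residual,
                     post_filter=None):
    """Build the predictor of the current macroblock.

    With the flag equal to 0 the reference block is used unchanged.
    With the flag equal to 1 a residual from the other domain is added:
    the neighbor-view residual for inter mode, the collocated-block
    residual for inter-view mode.
    """
    if not res_pred_flag:
        return list(reference_block)
    if mode is CodingMode.INTER:
        borrowed = neighbor_view_residual
    else:
        borrowed = collocated_residual
    predictor = [p + r for p, r in zip(reference_block, borrowed)]
    if post_filter is not None:          # e.g. a smoothing filter (optional)
        predictor = post_filter(predictor)
    return predictor


if __name__ == "__main__":
    print(derive_predictor(CodingMode.INTER, 1, [10, 20], [1, 2], [5, 5]))
    # prints [11, 22]
```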

According to another embodiment of the present invention, a weighting value or a weighted prediction value may be established. For example, if an image has a certain temporal characteristic such as image fading in/out, a weighting value of an image of a different view may be used. The weighting value means information indicating an extent to which image brightness or chrominance signals change over time as illustrated in FIG. 4. FIG. 4 illustrates a weighting-value reference model according to the embodiment of the present invention.

In general, even if images have different views, they may have similar temporal characteristics. That is, in the case where images are gradually brightened, an encoder may send a proper weighting value to a decoder so that the decoder can collectively apply the weighting value to the images of the different views.
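
A minimal sketch of applying one signalled weighting value to reference blocks of several views is shown below. It uses a simplified floating-point scale-and-clip model rather than the integer weighted-prediction arithmetic of H.264, and the function names, the `offset` parameter, and the 8-bit sample range are assumptions for illustration.

```python
def weighted_prediction(reference, weight, offset=0, max_value=255):
    """Scale a reference block by a weighting value and clip to the sample range."""
    return [min(max(int(round(weight * s + offset)), 0), max_value)
            for s in reference]


def apply_shared_weight(references_by_view, weight, offset=0):
    """Apply the same weighting value to the reference block of every view."""
    return {view: weighted_prediction(block, weight, offset)
            for view, block in references_by_view.items()}


if __name__ == "__main__":
    refs = {"View0": [100, 220], "View1": [80, 200]}
    print(apply_shared_weight(refs, 1.25))
    # prints {'View0': [125, 255], 'View1': [100, 250]}
```

A weight above 1 models a sequence that is brightening (fade-in); a weight below 1 models a fade-out.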

If a different light source is used for each view, it may be difficult to collectively apply the weighting value. In this case, a new weighting value must be used.

As the simplest implementation method, weighting information may be defined for each view. However, this method may be inefficient because multiple redundant information pieces may be transmitted.

According to the embodiment of the present invention, in order to reduce redundant information and overcome a limitation caused by using different light sources, a weighting value of a specific view such as the BaseView (View0) of FIG. 4 is shared, and information reporting whether the weighting value of that view is used as it is by a different view (hereinafter, referred to as weighting information) is used.

For example, as illustrated in FIG. 4, images of View1 may contain weighting information reporting the use of the weighting value of BaseView (View0), and images of View2 may contain weighting information reporting the use of their own weighting values without using the weighting value of BaseView.
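
The FIG. 4 example can be sketched as a per-view lookup. The encoding of the weighting information as 1 (reuse the base-view value) or 0 (use the view's own value), the dictionary layout, and the numeric weights are hypothetical and chosen only for illustration.

```python
def select_weight(view_id, weighting_info, base_view_weight, own_weights):
    """Return the weighting value a view should use.

    weighting_info[view_id] == 1: reuse the base-view weighting value.
    weighting_info[view_id] == 0: use the weighting value sent for this view.
    """
    if weighting_info[view_id] == 1:
        return base_view_weight
    return own_weights[view_id]


if __name__ == "__main__":
    weighting_info = {"View1": 1, "View2": 0}   # View1 shares, View2 does not
    own_weights = {"View2": 0.9}                # only sent where needed
    for view in ("View1", "View2"):
        print(view, select_weight(view, weighting_info, 1.25, own_weights))
    # prints "View1 1.25" then "View2 0.9"
```

Only views that opt out of sharing need their own weighting values in the bitstream, which is where the reduction in redundant information comes from.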

The weighting information is inserted in the bitstream to prevent mismatched operation between an encoder and a decoder. The weighting information may be contained in a slice header, a slice header extension, or a higher layer such as PPS, PPSE, SPS, SPSE or SEI.

To report the application of the above techniques, information of the following exemplary format may be used:

{“baseview_pred_weight_table_flag is equal to 1” specifies that the variables for weighted prediction are inferred. When baseview_pred_weight_table_flag is not present, it shall be inferred as follows:

If baseViewFlag (which indicates whether the current view is the base view) is equal to 1, baseview_pred_weight_table_flag shall be inferred to be equal to 0.

Otherwise, baseview_pred_weight_table_flag shall be inferred to be equal to 1.}.
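
The inference rule for an absent flag can be sketched as follows. Representing "not present" as `None` and the function name are assumptions for illustration; the rule itself follows the semantics quoted above.

```python
def infer_baseview_pred_weight_table_flag(parsed_flag, base_view_flag):
    """Return the effective flag value.

    parsed_flag is the value read from the bitstream, or None when the
    flag is not present. When absent, the base view gets 0 (it carries
    its own weights) and every other view gets 1 (weights are inferred).
    """
    if parsed_flag is not None:
        return parsed_flag
    return 0 if base_view_flag == 1 else 1


if __name__ == "__main__":
    print(infer_baseview_pred_weight_table_flag(None, 1))  # prints 0
    print(infer_baseview_pred_weight_table_flag(None, 0))  # prints 1
```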

According to the method for image prediction of a multi-view video codec and the computer-readable recording medium therefor according to the embodiments of the present invention, an image that is most similar to an image of a view to be currently compressed is generated by using inter-view prediction, i.e., using images of multiple different views, thereby improving coding efficiency.

The methods for image prediction of a multi-view video codec according to the exemplary embodiments can be realized as programs and stored in a computer-readable recording medium that can execute the programs. Examples of the computer-readable recording medium include CD-ROM, RAM, ROM, floppy disks, hard disks, magneto-optical disks and the like.

As the present invention may be embodied in several forms without departing from the spirit or essential characteristics thereof, it should also be understood that the above-described embodiments are not limited by any of the details of the foregoing description, unless otherwise specified, but rather should be construed broadly within its spirit and scope as defined in the appended claims, and therefore all changes and modifications that fall within the metes and bounds of the claims, or equivalents of such metes and bounds are therefore intended to be embraced by the appended claims.

Kim, Je-Woo, Kim, Yong-Hwan, Park, Ji-Ho, Choi, Byeong-Ho, Shin, Hwa-Seon

Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc
Sep 06 2018 | | Korea Electronics Technology Institute | (assignment on the face of the patent) | | |
Date Maintenance Fee Events
Sep 06 2018: BIG. Entity status set to Undiscounted.
Sep 07 2018: SMAL. Entity status set to Small.
May 26 2020: M2551. Payment of Maintenance Fee, 4th Yr, Small Entity.
May 26 2020: M2554. Surcharge for late Payment, Small Entity.
Dec 27 2023: M2552. Payment of Maintenance Fee, 8th Yr, Small Entity.


Date Maintenance Schedule
Mar 03 2023: 4 years fee payment window open
Sep 03 2023: 6 months grace period start (w surcharge)
Mar 03 2024: patent expiry (for year 4)
Mar 03 2026: 2 years to revive unintentionally abandoned end. (for year 4)
Mar 03 2027: 8 years fee payment window open
Sep 03 2027: 6 months grace period start (w surcharge)
Mar 03 2028: patent expiry (for year 8)
Mar 03 2030: 2 years to revive unintentionally abandoned end. (for year 8)
Mar 03 2031: 12 years fee payment window open
Sep 03 2031: 6 months grace period start (w surcharge)
Mar 03 2032: patent expiry (for year 12)
Mar 03 2034: 2 years to revive unintentionally abandoned end. (for year 12)