Video coding method and decoding method and devices thereof

RE39455

A new predictive coding is used to increase the temporal frame rate and coding efficiency without introducing excessive delay. Currently the motion vector for the blocks in the bi-directionally predicted frame is derived from the motion vector of the corresponding block in the forward predicted frame using a linear motion model. This however is not effective when the motion in the image sequence is not linear. The efficiency of this method can be further improved if a non-linear motion model is used. In this model a delta motion vector is added to or subtracted from the derived forward and backward motion vector, respectively. The encoder performs an additional search to determine if there is a need for the delta motion vector. The presence of this delta motion vector in the transmitted bitstream is signalled to the decoder which then takes the appropriate action to make use of the delta motion vector to derive the effective forward and backward motion vectors for the bi-directionally predicted block.

Patent RE39455
Priority Dec 27 1995
Filed Oct 18 2000
Issued Jan 02 2007
Expiry Dec 27 2016
Inventors Tan, Thiow…
Assg.orig Matsushita…
Assg.curr Matsushita…
Entity Large
Referenced by 3
References 18
Maint.: all paid

11. A method for encoding a sequence of video image frames comprising the steps of:

dividing a source sequence into a plurality of groups of pictures, each group of pictures comprising a first frame (i-frame) followed by a plurality of pairs of predictively encoded frames (pb-frame pairs);

dividing each i-frame or pb-frame pair into a plurality of blocks;

encoding the blocks from the i-frame;

predictively encoding the blocks from the second frame of the pb-frame pair;

bi-directionally predictively encoding the blocks from the first frame of a pb-frame pair (b-blocks);

deriving a scaled forward motion vector and a scaled backward motion vector for the b-block;

obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector; and

obtaining a final backward motion vector for the b-block by subtracting the delta motion vector from the scaled backward motion vector.

12. An apparatus for encoding a sequence of video image frames comprising:

means for dividing a source sequence into a plurality of groups of pictures, each group of pictures comprising a first frame (i-frame) followed by a plurality of pairs of predictively encoded frames (pb-frame pairs);

means for dividing each i-frame or pb-frame pair into a plurality of blocks;

means for encoding the blocks from the i-frame;

means for predictively encoding the blocks from the second frame of the pb-frame pair;

means for bi-directionally predictively encoding the blocks from the first frame of a pb-frame pair (b-blocks);

means for deriving a scaled forward motion vector and a scaled backward motion vector for the b-block;

means for obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector; and

means for obtaining a final backward motion vector for the b-block by subtracting the delta motion vector from the scaled backward motion vector.

13. A method for encoding a sequence of video image frames comprising the steps of:

dividing a source sequence into a group of pictures, each group of pictures comprising an i-frame followed by a plurality of p-frames and b-frames,

dividing each i-frame, p-frame and b-frame into a plurality of spatially non-overlapping blocks of pixel data;

encoding a block in the i-frame independently from any other frames in the group of pictures;

predictively encoding a block in a p-frame, based on the i-frame positioned before the p-frame or a previous p-frame positioned before the p-frame;

bi-directionally predictively encoding a block in a b-frame, based on the i-frame positioned before the b-frame or the previous p-frame and the p-frame positioned after the b-frame;

deriving a scaled forward motion vector and a scaled backward motion vector for the block in the b-frame by scaling a motion vector of the block predictively encoded in the p-frame positioned after the b-frame;

obtaining a final forward motion vector for the block in the b-frame by adding a delta motion vector to the scaled forward motion vector; and

obtaining a final backward motion vector for the block in the b-frame by adding the delta motion vector to the scaled backward motion vector.

5. A method for decoding a sequence of video image frames comprising the steps of:

decoding the compressed video image sequence as a set of group of pictures, each group of pictures comprising an i-frame followed by a plurality of pb-frame pairs, each pb-frame pair having a corresponding p-block;

decoding each i-frame or pb-frame pair into a plurality of spatially non-overlapping blocks of pixel data;

decoding the i-blocks from the i-frame independently from any other frames in the group of pictures;

predictively decoding the p-blocks from the second frame of the pb-frame pair based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair;

bi-directionally predictively decoding the b-blocks from the first frame of the pb-frame pair based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair and the corresponding p-block in the current pb-frame pair;

deriving a scaled forward motion vector and a scaled backward motion vector for the b-block by scaling the motion vector of the corresponding p-block in the current pb-frame pair;

obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector; and

obtaining a final backward motion vector for the b-block by subtracting the delta motion vector from the scaled backward motion vector.

1. A method for encoding a sequence of video image frames comprising the steps of:

dividing a source sequence into a set of group of pictures, each group of pictures comprising a first frame (i-frame) followed by a plurality of pairs of predictively encoded frames (pb-frame pairs), each pb-frame pair having a corresponding p-block;

dividing each i-frame or pb-frame pair into a plurality of spatially non-overlapping blocks of pixel data;

encoding the blocks from the i-frame (i-blocks) independently from any other frames in the group of pictures;

predictively encoding the blocks from the second frame of the pb-frame pair (p-blocks), based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair;

bi-directionally predictively encoding the blocks from the first frame of the pb-frame pair (b-blocks), based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair and the corresponding p-block in the current pb-frame pair;

deriving a scaled forward motion vector and a scaled backward motion vector for the b-block by scaling the motion vector of the corresponding p-block in the current pb-frame pair;

obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector; and

obtaining a final backward motion vector for the b-block by subtracting the delta motion vector from the scaled backward motion vector.

10. An apparatus for decoding a sequence of video image frames comprising:

means for decoding each frame in a sequence of video image frames into a set of group of pictures, each group of pictures comprising an i-frame followed by a plurality of pb-frame pairs;

means for decoding the i-blocks of the i-frame independently of any other frames in the group of pictures;

means for storing the decoded i-blocks to predictively decode subsequent frames;

means for decoding the p-blocks of the second frame of the pb-frame pair based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair;

means for storing the decoded p-blocks to predictively decode subsequent frames;

means for deriving a scaled forward motion vector and a scaled backward motion vector for a b-block by scaling the motion vector of the corresponding p-block in the current pb-frame pair, the b-block being the first frame of the pb-frame pair;

means for obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector;

means for obtaining a final backward motion vector for the b-block by subtracting the delta motion vector from the scaled backward motion vector; and

means for decoding the b-blocks of the first frame of the pb-frame pairs based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair and the corresponding p-block in the current pb-frame pair using the final forward motion vector and the final backward motion vector.

9. An apparatus for encoding a sequence of video image frames comprising:

means for encoding each frame in a sequence of video image frames into a set of group of pictures, each group of pictures comprising an i-frame followed by a plurality of pb-frame pairs;

means for dividing the i-frame and the pb-frame pair into a plurality of spatially non-overlapping blocks of pixel data;

means for encoding and decoding the i-blocks of the i-frame independently from any other frames in the group of pictures;

means for storing the decoded i-blocks to predictively encode subsequent frames;

means for predictively encoding and decoding the p-blocks of the second frame of the pb-frame pair based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair;

means for storing the decoded p-blocks to predictively encode subsequent frames;

means for obtaining a final forward motion vector for the b-block by adding a delta motion vector to the scaled forward motion vector;

means for obtaining a final backward motion vector for the b-block by subtracting the same delta motion vector from the scaled backward motion vector; and

means for encoding the b-blocks of the first frame of the pb-frame pairs based on the i-blocks in the previous i-frame or the p-blocks in the previous pb-frame pair and the corresponding p-block in the current pb-frame pair using the final forward motion vector and the final backward motion vector.

2. A method for encoding a sequence of video image frames according to claim 1, wherein

the scaling of the motion vector is based on a temporal reference of the first and second frames of the pb-frame pair.

3. A method for encoding a sequence of video image frames according to claim 1, further comprising the step of forming an encoded output, wherein the encoded output is a bitstream comprising:

temporal reference information for the first and second frames of the pb-frame pairs;

motion vector information for the p-blocks;

quantized residual error information for the p-blocks;

delta motion vector information for the b-blocks; and

quantized residual error information for the b-blocks.

4. A method for encoding a sequence of video image frames according to claim 3, wherein

the output bitstream contains additional information to indicate the presence of at least one of:

the delta motion vector information for the b-blocks; and

the quantized residual error information for the b-blocks.

6. A method for decoding a sequence of video image frames according to claim 5, further comprising the step of forming a decoded output, wherein the decoded output is responsive to a bitstream comprising:

temporal reference information for the first and second frames of the pb-frame pairs;

motion vector information for the p-blocks;

quantized residual error information for the p-blocks;

the delta motion vector information for the b-blocks; and

quantized residual error information for the b-blocks.

7. A method for decoding a sequence of video image frames according to claim 6, wherein

the bitstream contains additional information to indicate the presence of at least one of:

the delta motion vector information for the b-blocks; and

the quantized residual error information for the b-blocks.

8. A method of decoding a sequence of video image frames according to claim 5, wherein

the scaling is based on a temporal reference of the first and second frames of the pb-frame pair.

14. A method for encoding a sequence of video image frames according to claim 13, wherein the deriving step includes

scaling of the forward and backward motion vectors is based on a temporal reference of the p-frame and b-frame.

15. A method for encoding a sequence of video image frames according to claim 13, further comprising the step of forming an encoded output, wherein the encoded output is a bitstream comprising:

temporal reference information for the b-frame and the p-frame;

motion vector information for the block in the p-frame;

quantized residual error information for the block in the p-frame;

delta motion vector information for the block in the b-frame; and

quantized residual error information for the block in the b-frame.

16. A method for encoding a sequence of video image frames according to claim 15, wherein

the output bitstream contains additional information indicating a presence of at least one of the delta motion vector information for the block in the b-frame; and the quantized residual error information for the block in the b-frame.

MV_F=(TR_B×MV)/TR_P (1)
MV_B=((TR_B−TR_P)×MV)/TR_P (2)
where

- MV is the motion vector of the P-block,
- MV_F and MV_B are the forward and backward motion vectors for the B-block,
- TR_B is the increment in the temporal reference from the last P-frame to the current B-frame, and
- TR_P is the increment in the temporal reference from the last P-frame to the current P-frame.

Currently the method used in the prior art assumes a linear motion model. However this assumption is not valid in a normal scene where the motion is typically not linear. This is especially true when the camera shakes and when objects are not moving at constant velocities.

A second problem involves the quantization and transmission of the residual of the prediction error in the B-block. Currently the coefficients from the P-block and the B-block are interleaved in some scanning order, which requires the B-block coefficients to be transmitted even when they are all zero. This is inefficient because quite often there are no residual coefficients to transmit (all coefficients are zero).

SUMMARY OF THE INVENTION

In order to solve the first problem, the current invention employs a delta motion vector to compensate for the non-linear motion. The encoder must therefore perform an additional motion search to obtain the optimum delta motion vector, which, when added to the derived motion vectors, results in the best match in the prediction. These delta motion vectors are transmitted to the decoder at the block level only when necessary. A flag is used to indicate to the decoder whether delta motion vectors are present for the B-block.

For the second problem, this invention also uses a flag to indicate if there are coefficients for the B-block to be decoded.

The operation of the invention is described as follows.

FIG. 3a shows the linear motion model used for the derivation of the forward and backward motion vectors from the P-block motion vector and the temporal reference information. As illustrated in FIG. 3b, this model breaks down when the motion is not linear: the derived forward and backward motion vectors then differ from the actual motion vectors. This is especially true when objects in the scene are moving at changing velocities.

In the current invention the problem is solved by adding a small delta motion vector to the derived motion vector to compensate for the difference between the derived and true motion vector. Therefore the equations in (1) and (2) are now replaced by equations (3) and (4), respectively.
MV_F′=(TR_B×MV)/TR_P+MV_Delta (3)
MV_B′=((TR_B−TR_P)×MV)/TR_P−MV_Delta (4)
where

- MV is the motion vector of the P-block,
- MV_Delta is the delta motion vector,
- MV_F′ and MV_B′ are the new forward and backward motion vectors for the B-block according to the current invention,
- TR_B is the increment in the temporal reference from the last P-frame to the current B-frame, and
- TR_P is the increment in the temporal reference from the last P-frame to the current P-frame.

Note: Equations (3) and (4) are used for the motion vector in the horizontal as well as the vertical directions. Thus the motion vectors are in pairs and there are actually two independent delta motion vectors, one each for the horizontal and vertical directions.
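
As an illustration only, equations (3) and (4) can be sketched in Python as below. The function name `derive_b_vectors`, the tuple representation of a vector as (horizontal, vertical), and the use of exact real-valued division are assumptions made for this sketch; the text does not state how the division is rounded. Setting the delta to zero gives the linear model that the delta is meant to correct.

```python
from typing import Tuple

Vector = Tuple[float, float]  # (horizontal, vertical) components


def derive_b_vectors(mv: Vector, tr_b: int, tr_p: int,
                     mv_delta: Vector = (0.0, 0.0)) -> Tuple[Vector, Vector]:
    """Return (MV_F', MV_B') following equations (3) and (4).

    Hypothetical helper: division is left unrounded, which is an
    assumption of this sketch rather than something the text specifies.
    """
    if tr_p == 0:
        raise ValueError("TR_P must be non-zero")
    # Equation (3): scaled forward vector plus the delta, per component.
    forward = tuple(tr_b * c / tr_p + d for c, d in zip(mv, mv_delta))
    # Equation (4): scaled backward vector minus the same delta.
    backward = tuple((tr_b - tr_p) * c / tr_p - d for c, d in zip(mv, mv_delta))
    return forward, backward


# B-frame midway between the previous reference frame and the P-frame.
linear = derive_b_vectors((4, -2), tr_b=1, tr_p=2)
corrected = derive_b_vectors((4, -2), tr_b=1, tr_p=2, mv_delta=(1, 0))
print(linear)     # ((2.0, -1.0), (-2.0, 1.0))
print(corrected)  # ((3.0, -1.0), (-3.0, 1.0))
```

The horizontal and vertical components are handled independently, which matches the note that there is one delta value for each direction.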

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is prior art illustrating the prediction mode used in the ITU-T Recommendation H.261 Standard.

FIG. 1b is prior art illustrating the prediction mode used in the ISO-IEC/JTC MPEG Standard.

FIG. 2a illustrates the PB-frame prediction mode.

FIG. 2b illustrates the B-block bi-directional prediction mode.

FIG. 3a illustrates the linear motion model.

FIG. 3b illustrates the non-linear motion model of the current invention.

FIG. 4 illustrates the encoder functionality block diagram.

FIG. 5 illustrates the B-block bi-directional prediction functionality block diagram.

FIG. 6 illustrates the decoder functionality block diagram.

PREFERRED EMBODIMENTS

The preferred embodiment of the current invention is described here. FIG. 4 illustrates the encoding functionality diagram. The present invention deals with the method for deriving the motion vectors for the B-block. The encoding functionality is presented here for completeness of the embodiment.

The encoding functionality block diagram depicts an encoder using motion estimation and compensation to reduce the temporal redundancy in the sequence to be coded. The input sequence is organized into a first frame and pairs of subsequent frames. The first frame, hereafter referred to as the I-frame, is coded independently of all other frames. The pairs of subsequent frames, hereafter referred to as PB-frames, each consist of a B-frame followed by a P-frame. The P-frame is forward predicted based on the previously reconstructed I-frame or P-frame, and the B-frame is bi-directionally predicted based on the previously reconstructed I-frame or P-frame and the information in the current P-frame.
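
The frame organization described above can be sketched as follows. This is a minimal Python illustration; the function name `group_into_coding_units`, the use of display-order frame indices, and the decision to leave a trailing unpaired frame ungrouped are assumptions of the sketch, not details taken from the embodiment.

```python
from typing import List, Tuple


def group_into_coding_units(num_frames: int) -> List[Tuple[str, Tuple[int, ...]]]:
    """Group display-order frame indices into one I-frame and PB-frame pairs.

    Each PB-frame unit is (B-frame index, P-frame index): the B-frame comes
    first in display order and the P-frame second.
    """
    units: List[Tuple[str, Tuple[int, ...]]] = []
    if num_frames > 0:
        units.append(("I", (0,)))
    # Assumption: a final frame with no partner is simply left out here.
    for b_index in range(1, num_frames - 1, 2):
        units.append(("PB", (b_index, b_index + 1)))
    return units


print(group_into_coding_units(5))
# [('I', (0,)), ('PB', (1, 2)), ('PB', (3, 4))]
```

Within each PB unit the P-frame would be predicted first, since the B-frame prediction relies on information from the current P-frame.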

The input frame image sequence, 1, is placed in the Frame Memory 2. If the frame is classified as an I-frame or a P-frame it is passed through line 14 to the Reference Memory 3, for use as the reference frame in the motion estimation of the next PB-frame to be predictively encoded. The signal is then passed through line 13 to the Block Sampling module 4, where it is partitioned into spatially non-overlapping blocks of pixel data for further processing.

If the frame is classified as an I-frame, the sampled blocks are passed through line 16 to the DCT module 7. If the frame is classified as a PB-frame, the sampled blocks are passed through line 17 to the Motion Estimation module 5. The Motion Estimation module 5 uses information from the Reference Frame Memory 3 and the current block 17 to obtain the motion vector that provides the best match for the P-block. The motion vector and the local reconstructed frame, 12, are passed through lines 19 and 20, respectively, to the Motion Compensation module 6. The difference image is formed by subtracting the motion compensated decoded frame, 21, from the current P-block, 15. This signal is then passed through line 22 to the DCT module 7.

In the DCT module 7, each block is transformed into the DCT domain coefficients. The transform coefficients are passed through line 23 to the Quantization module 8, where they are quantized. The quantized coefficients are then passed through line 24 to the Run-length & Variable Length Coding module 9. Here the coefficients are entropy coded to form the Output Bit Stream, 25.
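
As a rough illustration of the quantization and run-length stages, the following Python sketch may help. The truncating uniform quantizer, the step size of 8, the one-dimensional coefficient order, and the (run, level) pairing are all assumptions chosen for the example; the text does not specify them, and the variable length coding step is omitted.

```python
from typing import List, Sequence, Tuple


def quantize(coeffs: Sequence[float], step: float) -> List[int]:
    """Uniform quantizer (assumed); int() truncates toward zero."""
    return [int(c / step) for c in coeffs]


def run_length_pairs(levels: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert quantized levels to (zero_run, level) pairs.

    Trailing zeros produce no pair, so an all-zero block yields an empty list.
    """
    pairs: List[Tuple[int, int]] = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


levels = quantize([52.0, -7.0, 3.0, 0.0, 9.0, 0.0, 0.0, 0.0], step=8)
print(levels)                    # [6, 0, 0, 0, 1, 0, 0, 0]
print(run_length_pairs(levels))  # [(0, 6), (3, 1)]
```

An all-zero block yields no pairs at all, which is the situation in which no residual coefficients need to be transmitted for a B-block.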

If the current block is an I-block or a P-block, the quantized coefficients are also passed through line 26 to the Inverse Quantization module 10. The output of the Inverse Quantization 10 is then passed through line 27 to the Inverse DCT module 11. If the current block is an I-block then the reconstructed block is placed, via line 28, in the Local Decoded Frame Memory 12. If the current block is a P-block then the output of the Inverse DCT 29 is added to the motion compensated output 21, to form the reconstructed block 30. The reconstructed block 30 is then placed in the Local Decoded Frame Memory 12, for the motion compensation of the subsequent frames.

After the P-blocks have been locally reconstructed, the information is passed again to the Motion Compensation module 6, where the prediction of the B-block is formed. FIG. 5 shows a more detailed functional diagram for the B-block prediction process. The P-motion vector derived in the Motion Estimation module 51 is passed through line 57 to the Motion Vector Scaling module 53. Here the forward and backward motion vectors of the B-block are derived using formulas (1) and (2), respectively. In the present embodiment, an additional motion search around these vectors is performed in the Delta Motion Search module 54 to obtain the delta motion vector. In this embodiment the delta motion vector is obtained by performing the search over all delta motion vector values between −3 and 3. The delta motion vector value that gives the best prediction, in terms of the smallest mean absolute difference between the pixel values of the B-block and the prediction block, is chosen. The prediction is formed in the Bi-directional Motion Compensation module 55, according to FIG. 2b, using the information from the Local Decoded Frame Memory 52 and the Current Reconstructed P-block 50. In the bi-directional prediction, only information available in the corresponding P-block is used to predict the B-block. The average of the P-block information and the information from the Local Decoded Frame is used to predict the B-block. The rest of the B-block is predicted using information from the Local Decoded Frame only.
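
The delta motion search can be sketched in Python as below. This is a simplified illustration under stated assumptions: vectors are integer pixel offsets, a vector (x, y) means the block at (top, left) is predicted from (top + y, left + x) in the reference, frames are plain nested lists, coordinates outside the frame are clamped, and the whole block is predicted as the average of the forward and backward blocks (the embodiment averages only where P-block information is available). The names `fetch_block`, `mean_abs_diff` and `delta_motion_search` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]


def fetch_block(frame: Frame, top: int, left: int, size: int) -> List[List[float]]:
    """Copy a size x size block; coordinates are clamped to the frame edges."""
    height, width = len(frame), len(frame[0])
    return [[frame[min(max(top + r, 0), height - 1)][min(max(left + c, 0), width - 1)]
             for c in range(size)] for r in range(size)]


def mean_abs_diff(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def delta_motion_search(b_frame: Frame, prev_ref: Frame, p_frame: Frame,
                        top: int, left: int, size: int,
                        scaled_fwd: Tuple[int, int], scaled_bwd: Tuple[int, int],
                        search_range: int = 3) -> Tuple[Tuple[int, int], float]:
    """Try every delta in [-search_range, search_range] and keep the lowest MAD."""
    target = fetch_block(b_frame, top, left, size)
    best_delta, best_mad = (0, 0), float("inf")
    for dx, dy in product(range(-search_range, search_range + 1), repeat=2):
        # Equations (3) and (4): add the delta forward, subtract it backward.
        fx, fy = scaled_fwd[0] + dx, scaled_fwd[1] + dy
        bx, by = scaled_bwd[0] - dx, scaled_bwd[1] - dy
        fwd = fetch_block(prev_ref, top + fy, left + fx, size)
        bwd = fetch_block(p_frame, top + by, left + bx, size)
        prediction = [[(f + b) / 2 for f, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]
        mad = mean_abs_diff(target, prediction)
        if mad < best_mad:
            best_delta, best_mad = (dx, dy), mad
    return best_delta, best_mad
```

With the default range the search evaluates 49 candidate deltas per block. A delta of (0, 0) is among the candidates, so the result is never worse than the purely scaled vectors under this cost measure.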

The prediction difference block is then passed through line 22 to the DCT module 7. The DCT coefficients are then passed through line 23 to the Quantization module 8. The result of the Quantization module 8 is passed through line 24 to the Run-length & Variable Length Coding module 9. In this module the presence of the delta motion vector and the quantized residual error in the Output Bitstream 25 is indicated by a variable length code, NOB, which is the acronym for No B-block. This flag is generated in the Run-length & Variable Length Coding module 9 based on whether there is residual error from the Quantization module 8 and whether the delta motion vectors found in the Delta Motion Search module 54 are non-zero. Table 1 provides the preferred embodiment of the variable length code for the NOB flag. The variable length code of the NOB flag is inserted in the Output Bitstream, 25, prior to the delta motion vector and quantized residual error codes.

TABLE 1

(Variable length code for the NOB flag)
NOB	Quantized Residual Error Coded	Delta Motion Vectors Coded
0	No	No
10	No	Yes
110	Yes	No
111	Yes	Yes
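
The code in Table 1 is prefix-free, so a decoder can read it bit by bit without a length field. A minimal Python sketch follows; representing the bitstream as a string of '0' and '1' characters and the names `encode_nob` and `decode_nob` are assumptions made for illustration.

```python
from typing import Dict, Tuple

# (residual_coded, delta_coded) -> NOB code bits, as listed in Table 1.
NOB_CODES: Dict[Tuple[bool, bool], str] = {
    (False, False): "0",
    (False, True): "10",
    (True, False): "110",
    (True, True): "111",
}
NOB_DECODE = {bits: flags for flags, bits in NOB_CODES.items()}


def encode_nob(residual_coded: bool, delta_coded: bool) -> str:
    return NOB_CODES[(residual_coded, delta_coded)]


def decode_nob(bitstream: str, pos: int = 0) -> Tuple[bool, bool, int]:
    """Read one NOB code at pos; return (residual_coded, delta_coded, next_pos)."""
    code = ""
    while pos < len(bitstream):
        code += bitstream[pos]
        pos += 1
        if code in NOB_DECODE:
            residual_coded, delta_coded = NOB_DECODE[code]
            return residual_coded, delta_coded, pos
    raise ValueError("truncated NOB code")


print(encode_nob(True, False))  # 110
print(decode_nob("100"))        # (False, True, 2)
```

The shortest code is assigned to the case in which neither the residual nor the delta motion vectors are coded, so a B-block with nothing to send costs a single bit for this flag.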

FIG. 6 shows the functional block diagram for the decoder. The Input Bit Stream 31 is passed to the Variable Length & Run Length Decoding module 32. The block and side information are extracted in this module. If the frame is a PB-frame, the bitstream is checked for the presence of any delta motion vector and/or quantized residual error coefficients. The output of the module 32 is passed through line 37 to the Inverse Quantization module 33. The output of the Inverse Quantization 33 is then passed through line 38 to the Inverse DCT module 34. Here the coefficients are transformed back into the pixel values.

If the current frame is an I-frame then the output of Inverse DCT 34, is passed through line 39 and stored in the Frame Memory 42.

If the current frame is a PB-frame, the side information containing the motion vectors is passed through line 45 to the Motion Compensation module 36. The Motion Compensation module 36 uses this information and the information in the Local Decoded Memory, 35, to form the motion compensated signal, 44. This signal is then added to the output of the Inverse DCT module 34 to form the reconstruction of the P-block.

The Motion Compensation module 36, then uses the additional information obtained in the reconstructed P-block to obtain the bi-directional prediction for the B-block. The B-block is then reconstructed and placed in the Frame Memory, 42, together with the P-block.

By implementing this invention, the temporal frame rate of the decoded sequences can be effectively doubled at a fraction of the expected cost in bit rate. The delay is similar to that of the same sequence decoded at half the frame rate.

As described above in the present invention a new predictive coding is used to increase the temporal frame rate and coding efficiency without introducing excessive delay. Currently the motion vector for the blocks in the bi-directionally predicted frame is derived from the motion vector of the corresponding block in the forward predicted frame using a linear motion model. This however is not effective when the motion in the image sequence is not linear. According to this invention, the efficiency of this method can be further improved if a non-linear motion model is used. In this model a delta motion vector is added to or subtracted from the derived forward and backward motion vector, respectively. The encoder performs an additional search to determine if there is a need for the delta motion vector. The presence of this delta motion vector in the transmitted bitstream is signalled to the decoder which then takes the appropriate action to make use of the delta motion vector to derive the effective forward and backward motion vectors for the bi-directionally predicted block.

INVENTORS:

Tan, Thiow Keng

ASSIGNMENT RECORDS

Executed on	Assignor	Assignee	Conveyance	Frame	Reel	Doc
Oct 18 2000		Matsushita Electric Industrial Co., Ltd.	(assignment on the face of the patent)

MAINTENANCE FEES AND DATES:

Date	Maintenance Fee Events
Sep 18 2007	ASPN: Payor Number Assigned.
Apr 14 2010	M1553: Payment of Maintenance Fee, 12th Year, Large Entity.

Date	Maintenance Schedule
Jan 02 2010	4 years fee payment window open
Jul 02 2010	6 months grace period start (w surcharge)
Jan 02 2011	patent expiry (for year 4)
Jan 02 2013	2 years to revive unintentionally abandoned end. (for year 4)
Jan 02 2014	8 years fee payment window open
Jul 02 2014	6 months grace period start (w surcharge)
Jan 02 2015	patent expiry (for year 8)
Jan 02 2017	2 years to revive unintentionally abandoned end. (for year 8)
Jan 02 2018	12 years fee payment window open
Jul 02 2018	6 months grace period start (w surcharge)
Jan 02 2019	patent expiry (for year 12)
Jan 02 2021	2 years to revive unintentionally abandoned end. (for year 12)