A motion estimation and compensation technique is provided for interlaced digital video such as video object planes (VOPs). Predictor motion vectors for use in differentially encoding a current field coded macroblock are obtained using the median of motion vectors of surrounding blocks or macroblocks. When a surrounding macroblock is itself interlaced coded, an average motion vector for that macroblock is used, with fractional pixel values being mapped to the half-pixel. When the current block is not interlaced coded but a surrounding block is, the field motion vectors may be used individually or averaged. In a repetitive padding technique for an interlaced coded VOP, the even and odd lines of the VOP and surrounding block are grouped. Within each field, exterior pixels are padded by setting them to the value of the nearest boundary pixel, or to an average of two boundary pixels. The lines are then reordered to provide a single padded reference VOP image.
|
15. A signal carrying a padded digital video image which includes a field coded video object plane (VOP) having top and bottom field pixel lines carried in an interleaved order to provide a reference padded VOP, said VOP being carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP, said signal including:
a top field block comprising top field pixel lines reordered from said interleaved order; a bottom field block comprising bottom field pixel lines reordered from said interleaved order; and separately padded exterior pixels within said respective top and bottom field blocks.
20. A communications channel carrying a padded digital video image signal which includes a field coded video object plane (VOP) having top and bottom field pixel lines carried in an interleaved order to provide a reference padded VOP, said VOP being carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP, said signal including:
a top field block comprising top field pixel lines reordered from said interleaved order; a bottom field block comprising bottom field pixel lines reordered from said interleaved order; and separately padded exterior pixels within said respective top and bottom field blocks.
1. A method for padding a digital video image which includes a field coded video object plane (VOP) comprising top and bottom field pixel lines carried in an interleaved order to provide a reference padded VOP, said VOP being carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP, said method comprising the steps of:
reordering said top and bottom field pixel lines from said interleaved order to provide a top field block comprising said top field pixel lines, and a bottom field block comprising said bottom field pixel lines; and padding said exterior pixels separately within said respective top and bottom field blocks.
6. An apparatus for padding a digital video image which includes a field coded video object plane (VOP) comprising top and bottom field pixel lines carried in an interleaved order to provide a reference padded VOP, said VOP being carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP, said apparatus comprising:
means for reordering said top and bottom field pixel lines from said interleaved order to provide a top field block comprising said top field pixel lines, and a bottom field block comprising said bottom field pixel lines; and means for padding said exterior pixels separately within said respective top and bottom field blocks.
13. A decoder for recovering a padded digital video image which includes a field coded video object plane (VOP) comprising top and bottom field pixel lines carried in an interleaved order to provide a reference padded VOP, said VOP being carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP, said decoder comprising:
a detector for detecting padding in the exterior pixels separately within respective top and bottom field blocks; said top and bottom field blocks being representative of said top and bottom field pixel lines reordered from said interleaved order, said top field block comprising reordered data from said top field pixel lines, and said bottom field block comprising reordered data from said bottom field pixel lines.
16. A communications signal for use in a system in which horizontal and vertical motion vector components are used to differentially encode respective horizontal and vertical motion vector components of a current block of a digital video image, wherein:
candidate first, second and third blocks have associated horizontal and vertical motion vector components; said first block being at least a portion of a first macroblock which immediately precedes said current block in a current row; said second block being at least a portion of a second macroblock which is immediately above said current block in a preceding row; said third block being at least a portion of a third macroblock which immediately follows said second macroblock in said preceding row; and at least one of said first, second and third candidate blocks and said current block is field-coded; said communications signal including at least one of: a selected horizontal motion vector component used to differentially encode the horizontal motion vector component of said current block according to a value derived from the horizontal motion vector components of said first, second and third candidate blocks, a selected vertical motion vector component used to differentially encode the vertical motion vector component of said current block according to a value derived from the vertical motion vector components of said first, second and third candidate blocks.
21. A communications channel carrying a signal for use in a system in which horizontal and vertical motion vector components are used to differentially encode respective horizontal and vertical motion vector components of a current block of a digital video image, wherein:
candidate first, second and third blocks have associated horizontal and vertical motion vector components; said first block being at least a portion of a first macroblock which immediately precedes said current block in a current row; said second block being at least a portion of a second macroblock which is immediately above said current block in a preceding row; said third block being at least a portion of a third macroblock which immediately follows said second macroblock in said preceding row; and at least one of said first, second and third candidate blocks and said current block is field-coded; said communications signal including at least one of: a selected horizontal motion vector component used to differentially encode the horizontal motion vector component of said current block according to a value derived from the horizontal motion vector components of said first, second and third candidate blocks, a selected vertical motion vector component used to differentially encode the vertical motion vector component of said current block according to a value derived from the vertical motion vector components of said first, second and third candidate blocks.
2. The method of
reordering said top and bottom field pixel lines comprising said padded exterior pixels back to said interleaved order to provide said reference padded VOP.
3. The method of
when a particular one of said exterior pixels is located between two of said boundary pixels of said VOP in the corresponding field block, said padding step comprises the further step of: assigning said particular one of said exterior pixels a value according to an average of said two boundary pixels.
4. The method of
when a particular one of said exterior pixels is located between one of said boundary pixels of said VOP and an edge of said region in the corresponding field block, but not between two of said boundary pixels of said VOP in the corresponding field block, said padding step comprises the further step of: assigning said particular one of said exterior pixels a value according to said one of said boundary pixels.
5. The method of
when a particular one of said exterior pixels is located between two edges of said region in the corresponding field block, but not between one of said boundary pixels of said VOP and an edge of said region in the corresponding field block, and not between two of said boundary pixels of said VOP in the corresponding field block, said padding step comprises the further step of: assigning said particular one of said exterior pixels a value according to at least one of: (a) a padded exterior pixel which is closest to said particular one of said exterior pixels moving horizontally in said region in the corresponding field block; and (b) a padded exterior pixel which is closest to said particular one of said exterior pixels moving vertically in said region in the corresponding field block.
7. The apparatus of
means for reordering said top and bottom field pixel lines comprising said padded exterior pixels back to said interleaved order to provide said reference padded VOP.
8. The apparatus of
said means for padding comprises means for assigning; and when a particular one of said exterior pixels is located between two of said boundary pixels of said VOP in the corresponding field block, said means for assigning assigns said particular one of said exterior pixels a value according to an average of said two boundary pixels.
9. The apparatus of
said means for padding comprises means for assigning; and when a particular one of said exterior pixels is located between one of said boundary pixels of said VOP and an edge of said region in the corresponding field block, but not between two of said boundary pixels of said VOP in the corresponding field block, said means for assigning assigns said particular one of said exterior pixels a value according to said one of said boundary pixels.
10. The apparatus of
said means for padding comprises means for assigning; and when a particular one of said exterior pixels is located between two edges of said region in the corresponding field block, but not between one of said boundary pixels of said VOP and an edge of said region in the corresponding field block, and not between two of said boundary pixels of said VOP in the corresponding field block, said means for assigning assigns said particular one of said exterior pixels a value according to at least one of: (a) a padded exterior pixel which is closest to said particular one of said exterior pixels moving horizontally in said region in the corresponding field block; and (b) a padded exterior pixel which is closest to said particular one of said exterior pixels moving vertically in said region in the corresponding field block.
11. The apparatus of
means for using said reference padded VOP for motion prediction of another VOP.
12. The method of
using said reference padded VOP for motion prediction of another VOP.
14. A decoder in accordance with
17. A signal in accordance with
18. A signal in accordance with
19. A signal in accordance with
|
This application is a divisional of co-pending U.S. patent application Ser. No. 08/897,847, filed on Jul. 21, 1997 now pending, and claims the benefit of U.S. Provisional Application No. 60/040,120, filed Mar. 7, 1997, and U.S. Provisional Application No. 60/042,245, filed Mar. 31, 1997.
The present invention relates to a method and apparatus for coding of digital video images such as video object planes (VOPs), and, in particular, to motion estimation and compensation techniques for interlaced digital video. A padding technique for extending the area of an interlaced coded reference VOP is also disclosed.
The invention is particularly suitable for use with various multimedia applications, and is compatible with the MPEG-4 Verification Model (VM) standard described in document ISO/IEC/JTC1/SC29/WG11 N1642, entitled "MPEG-4 Video Verification Model Version 7.0", April 1997, incorporated herein by reference. The MPEG-2 standard is a precursor to the MPEG-4 standard, and is described in document ISO/IEC 13818-2, entitled "Information Technology--Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," Mar. 25, 1994, incorporated herein by reference.
MPEG-4 is a new coding standard which provides a flexible framework and an open set of coding tools for communication, access, and manipulation of digital audio-visual data. These tools support a wide range of features. The flexible framework of MPEG-4 supports various combinations of coding tools and their corresponding functionalities for applications required by the computer, telecommunication, and entertainment (i.e., TV and film) industries, such as database browsing, information retrieval, and interactive communications.
MPEG-4 provides standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. MPEG-4 achieves efficient compression, object scalability, spatial and temporal scalability, and error resilience.
The MPEG-4 video VM coder/decoder (codec) is a block- and object-based hybrid coder with motion compensation. Texture is encoded with an 8×8 Discrete Cosine Transformation (DCT) utilizing overlapped block-motion compensation. Object shapes are represented as alpha maps and encoded using a Content-based Arithmetic Encoding (CAE) algorithm or a modified DCT coder, both using temporal prediction. The coder can handle sprites as they are known from computer graphics. Other coding methods, such as wavelet and sprite coding, may also be used for special applications.
Motion compensated texture coding is a well known approach for video coding, and can be modeled as a three-stage process. The first stage is signal processing which includes motion estimation and compensation (ME/MC) and a two-dimensional (2-D) spatial transformation. The objective of ME/MC and the spatial transformation is to take advantage of temporal and spatial correlations in a video sequence to optimize the rate-distortion performance of quantization and entropy coding under a complexity constraint. The most common technique for ME/MC has been block matching, and the most common spatial transformation has been the DCT.
However, special concerns arise for ME/MC of VOPs, particularly when the VOP is itself interlaced coded, and/or uses reference images which are interlaced coded. Moreover, for arbitrarily shaped VOPs which are interlaced coded, special attention must be paid to the area of the reference image used for motion prediction.
Accordingly, it would be desirable to have an efficient technique for ME/MC coding of a VOP which is itself interlaced coded, and/or uses reference images which are interlaced coded. The technique should provide differential encoding of the motion vectors of a block or macroblock of the VOP using motion vectors of neighboring blocks or macroblocks. A corresponding decoder should be provided. It would further be desirable to have an efficient technique for padding the area of a reference image for coding of interlaced VOPs. The present invention provides a system having the above and other advantages.
In accordance with the present invention, a method and apparatus are presented for motion estimation and motion compensation coding of a video object plane (VOP) or similar video image which is itself interlaced coded, and/or uses reference images which are interlaced coded.
A first method provides horizontal and vertical motion vector components for use in differentially encoding respective horizontal and vertical motion vector components of first and second fields of a current field coded macroblock of a digital video image. Candidate first, second and third blocks near the current macroblock have associated horizontal and vertical motion vector components which can be used for predicting the motion vectors of the current macroblock. The first block immediately precedes the current macroblock of a current row, the second block is immediately above the current macroblock in a preceding row, and the third block immediately follows the second block in the preceding row. Thus, the candidate blocks are in a spatial neighborhood of the current macroblock.
A horizontal motion vector component is selected for use in differentially encoding the horizontal motion vector components of the first and second fields of the current field coded macroblock according to the median of the horizontal motion vector components of the first, second and third candidate blocks. Alternatively, an average or some other weighted function may be used. A vertical motion vector component is determined similarly.
When one of the candidate blocks is a subset of a macroblock, the block which is closest to the upper left-hand portion of the current macroblock is used as the candidate block of that particular macroblock. For example, the candidate block may be an 8×8 block in a 16×16 macroblock.
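The median-based selection of the first method can be illustrated with a short sketch. It assumes that motion vectors are held as (horizontal, vertical) integer tuples in half-pixel units; the function names `median3` and `predict_and_difference` are hypothetical and are not taken from the Verification Model.

```python
def median3(a, b, c):
    """Return the median of three values."""
    return sorted((a, b, c))[1]


def predict_and_difference(current_mv, candidates):
    """Select a predictor component-wise and form the differential MV.

    current_mv: (horizontal, vertical) motion vector of the current block.
    candidates: three (horizontal, vertical) motion vectors of the
                preceding, above, and above-right candidate blocks.
    """
    pred_x = median3(*(mv[0] for mv in candidates))
    pred_y = median3(*(mv[1] for mv in candidates))
    # The value that is actually encoded is the difference from the predictor.
    mvd = (current_mv[0] - pred_x, current_mv[1] - pred_y)
    return (pred_x, pred_y), mvd


predictor, mvd = predict_and_difference((5, -2), [(4, -2), (6, 0), (-1, -3)])
```

For a field coded current macroblock, the same predictor would be subtracted from the motion vector of each of the two fields.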
A second method provides horizontal and vertical motion vector components for use in differentially encoding horizontal and vertical motion vector components, respectively, of a current progressive-predicted or advanced-predicted block of a digital video image. A progressive predicted block may be a 16×16 macroblock. An advanced prediction block uses a combination of 8×8 motion compensation and overlapped block motion compensation. In either case, the current block is not interlaced coded.
Candidate first, second and third blocks have associated horizontal and vertical motion vector components. If at least one of the candidate blocks is a field coded candidate macroblock having first and second fields, then the first and second fields each have corresponding horizontal and vertical motion vector components. A horizontal motion vector component is selected for use in differentially encoding the horizontal motion vector component of the current block according to a value derived from the horizontal motion vector components of the first, second and third candidate blocks.
In particular, the selected horizontal motion vector component may be determined according to a median of the horizontal motion vector components of the candidate blocks, including the corresponding horizontal motion vector components of the first and second fields of the at least one field coded candidate macroblock.
Alternatively, the respective first and second field horizontal motion vector components of the at least one field coded candidate macroblock may be averaged to obtain at least one corresponding averaged horizontal motion vector component. The selected horizontal motion vector component is then determined according to a median of the horizontal motion vector components of the candidate blocks other than the at least one field coded candidate macroblock, if any, and the at least one corresponding averaged horizontal motion vector component.
For example, if all three candidate macroblocks are field (i.e., interlaced) predicted, the horizontal motion vector components of the first and second fields of each candidate macroblock are averaged to obtain three averaged horizontal motion vector components. The selected horizontal motion vector component for differentially encoding the horizontal motion vector component of the current block is then the median of the three averaged motion vector components. A vertical motion vector component is similarly selected.
When first and second field motion vectors of the at least one field coded candidate macroblock are averaged, all fractional pixel offsets are mapped to a half-pixel displacement to provide a better prediction.
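The averaging alternative of the second method can be sketched as follows. The sketch assumes that motion vector components are integers in half-pixel units, and that the mapping of fractional offsets to a half-pixel is applied to the magnitude with the sign restored afterwards; the treatment of negative values and the function names are assumptions of this illustration.

```python
def average_field_mvs(first, second):
    """Average two field MV components given in half-pixel units.

    The sum of two half-pixel values equals their average expressed in
    quarter-pixel units. Any fractional pixel offset (1/4, 1/2 or 3/4)
    is mapped to a half-pixel displacement.
    """
    total = first + second
    sign = -1 if total < 0 else 1
    whole, quarter = divmod(abs(total), 4)   # whole pixels, quarter-pixel remainder
    return sign * (2 * whole + (1 if quarter else 0))   # result in half-pixel units


def predictor_from_field_candidates(candidates):
    """candidates: three ((x1, y1), (x2, y2)) pairs, one MV per field."""
    averaged = [
        tuple(average_field_mvs(f1[k], f2[k]) for k in (0, 1))
        for f1, f2 in candidates
    ]
    # Component-wise median of the three averaged vectors.
    return tuple(sorted(mv[k] for mv in averaged)[1] for k in (0, 1))


candidates = [((2, 0), (4, 2)), ((1, 1), (2, 1)), ((-3, 0), (-2, -1))]
predictor = predictor_from_field_candidates(candidates)
```

Here the three averaged vectors are (3, 1), (1, 1) and (-3, -1) in half-pixel units, so the selected predictor is (1, 1).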
In a third method, the current macroblock is field predicted and at least one of the candidate blocks is a field coded macroblock. The selected horizontal motion vector component for use in differentially encoding the horizontal motion vector component of the first field of the current macroblock is determined according to a value derived from (i) the horizontal motion vector components of the candidate blocks other than the at least one field coded candidate macroblock, if any, and (ii) the horizontal motion vector components of the first field of the at least one field coded candidate macroblock. For example, the median may be used. Thus, only the first field of the field predicted candidate macroblock(s) is used. Alternatively, only the second field of the field predicted candidate macroblock(s) can be used to predict the second field of the current macroblock.
In another alternative, the respective first and second field horizontal motion vector components of the at least one field coded candidate macroblock are averaged to obtain at least one corresponding averaged horizontal motion vector component. The selected horizontal motion vector component for use in differentially encoding the horizontal motion vector component(s) of at least one of the first and second fields of the current macroblock is determined according to a median of (i) the horizontal motion vectors of the candidate blocks other than the at least one field coded candidate macroblock, if any, and (ii) the at least one corresponding averaged horizontal motion vector component. A vertical motion vector component is similarly selected.
When the first and second field horizontal motion vector components of the at least one field coded candidate macroblock are averaged, all fractional pixel offsets are mapped to a half-pixel displacement.
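The same-field selection of the third method can be sketched in a similar way. The representation below is an assumption made for illustration only: a frame coded candidate is a list holding one (horizontal, vertical) motion vector, and a field coded candidate is a list holding two, one per field.

```python
def select_for_field(candidate, field_index):
    """Pick the MV of a candidate to use for the given field (0 or 1)."""
    if len(candidate) == 2:          # field coded: use the matching field only
        return candidate[field_index]
    return candidate[0]              # frame coded: single MV serves both fields


def field_predictor(candidates, field_index):
    """Component-wise median over the three selected candidate MVs."""
    chosen = [select_for_field(c, field_index) for c in candidates]
    return tuple(sorted(mv[k] for mv in chosen)[1] for k in (0, 1))


candidates = [[(2, 0)], [(6, -2), (4, 0)], [(3, 1), (5, 1)]]
first_field_pred = field_predictor(candidates, 0)
second_field_pred = field_predictor(candidates, 1)
```

With these sample values the first field of the current macroblock is predicted from (2, 0), (6, -2) and (3, 1), and the second field from (2, 0), (4, 0) and (5, 1).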
A corresponding decoder method and apparatus are also presented.
A method and apparatus are also presented for padding a digital video image which includes a field coded VOP comprising interleaved top and bottom field pixel lines to provide a padded reference VOP. By padding the VOP, the image area is extended. The VOP is carried, at least in part, in a region which includes pixels which are exterior to boundary pixels of said VOP. The top and bottom field pixel lines are reordered from the interleaved order to provide a top field block comprising the top field pixel lines, and a bottom field block comprising the bottom field pixel lines. The exterior pixels are padded separately within the respective top and bottom field blocks.
After the exterior pixels have been padded, the top and bottom field pixel lines comprising the padded exterior pixels are reordered back to the interleaved order to provide the padded reference image.
During padding, when a particular one of the exterior pixels is located between two of the boundary pixels of the VOP in the corresponding top or bottom field block, the exterior pixel is assigned a value according to an average of the two boundary pixels. When a particular one of the exterior pixels is located between one of the boundary pixels of said VOP and an edge of the region in the corresponding field block, but not between two VOP boundary pixels in the corresponding field block, the exterior pixel is assigned a value according to one of the boundary pixels. The term "between" means bounded by interior pixels along a horizontal or vertical pixel grid line. For example, the region may be a 16×16 macroblock.
When a particular exterior pixel is located between two edges of the region in the corresponding field block, but not between a VOP boundary pixel and an edge of the region, and not between two of the VOP boundary pixels, the particular exterior pixel is assigned a value according to at least one of: (a) a padded exterior pixel which is closest to the particular exterior pixel moving horizontally in the region; and (b) a padded exterior pixel which is closest to the particular exterior pixel moving vertically in the region. For example, when padded exterior pixels are available moving both horizontally and vertically from the particular exterior pixel in the region, the average may be used.
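The field-separated padding described above can be sketched as follows. This is a simplified illustration under stated assumptions: the region is a list of pixel rows with a same-sized mask marking VOP pixels, padding runs as a horizontal pass followed by a vertical pass, and averages are rounded half up. The pass ordering, the rounding and all function names are assumptions of the sketch. A field with no VOP pixels at all is left unchanged.

```python
def pad_line(values, filled):
    """Pad one row or column; return (new_values, new_filled_flags)."""
    if not any(filled):
        return list(values), list(filled)      # nothing to copy from yet
    out = list(values)
    n = len(values)
    for i in range(n):
        if filled[i]:
            continue
        left = next((k for k in range(i - 1, -1, -1) if filled[k]), None)
        right = next((k for k in range(i + 1, n) if filled[k]), None)
        if left is not None and right is not None:
            out[i] = (values[left] + values[right] + 1) // 2   # between two pixels
        else:
            out[i] = values[left if left is not None else right]  # pixel and edge
    return out, [True] * n


def pad_field(pixels, mask):
    """Pad a single field: rows first, then columns for rows still empty."""
    rows = [pad_line(r, [bool(m) for m in m_row]) for r, m_row in zip(pixels, mask)]
    vals = [r[0] for r in rows]
    flags = [r[1] for r in rows]
    for x in range(len(vals[0])):
        column, _ = pad_line([row[x] for row in vals], [f[x] for f in flags])
        for y, v in enumerate(column):
            vals[y][x] = v
    return vals


def pad_interlaced(pixels, mask):
    """Split into top/bottom fields, pad each separately, re-interleave."""
    out = [None] * len(pixels)
    out[0::2] = pad_field(pixels[0::2], mask[0::2])   # top field (even lines)
    out[1::2] = pad_field(pixels[1::2], mask[1::2])   # bottom field (odd lines)
    return out


pixels = [[0, 10, 20, 0], [0, 50, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
padded = pad_interlaced(pixels, mask)
```

In this example the third line is filled from the first line (the same field) rather than from the adjacent second line, which belongs to the other field.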
A method and apparatus are presented for ME/MC coding of a VOP which is itself interlaced coded, and/or uses reference images which are interlaced coded, and for padding of the area of a reference image for coding of interlaced VOPs.
The frame 105 and VOP data from frame 115 are supplied to separate encoding functions. In particular, VOPs 117, 118 and 119 undergo shape, motion and texture encoding at encoders 137, 138 and 139, respectively. With shape coding, binary and gray scale shape information is encoded. With motion coding, the shape information is coded using motion estimation within a frame. With texture coding, a spatial transformation such as the DCT is performed to obtain transform coefficients which can be variable-length coded for compression.
The coded VOP data is then combined at a multiplexer (MUX) 140 for transmission over a channel 145. Alternatively, the data may be stored on a recording medium. The received coded VOP data is separated by a demultiplexer (DEMUX) 150 so that the separate VOPs 117-119 are decoded and recovered. Frames 155, 165 and 175 show that VOPs 117, 118 and 119, respectively, have been decoded and recovered and can therefore be individually manipulated using a compositor 160 which interfaces with a video library 170, for example.
The compositor may be a device such as a personal computer which is located at a user's home to allow the user to edit the received data to provide a customized image. For example, the user's personal video library 170 may include a previously stored VOP 178 (e.g., a circle) which is different than the received VOPs. The user may compose a frame 185 where the circular VOP 178 replaces the square VOP 117. The frame 185 thus includes the received VOPs 118 and 119 and the locally stored VOP 178.
In another example, the background VOP 109 may be replaced by a background of the user's choosing. For example, when viewing a television news broadcast, the announcer may be coded as a VOP which is separate from the background, such as a news studio. The user may select a background from the library 170 or from another television program, such as a channel with stock price or weather information. The user can therefore act as a video editor.
The video library 170 may also store VOPs which are received via the channel 145, and may access VOPs and other image elements via a network such as the Internet. Generally, a video session comprises a single VOP, or a sequence of VOPs.
The video object coding and decoding process of
P-VOPs may include a number of macroblocks which may be coded individually using an intra-frame mode or an inter-frame mode. With intra-frame (INTRA) coding, the macroblock is coded without reference to another macroblock. With inter-frame (INTER) coding, the macroblock is differentially coded with respect to a temporally subsequent frame in a mode known as forward prediction. The temporally subsequent frame is known as an anchor frame or reference frame. The anchor frame (e.g., VOP) must be a P-VOP, not a B-VOP.
With forward prediction, the current macroblock is compared to a search area of macroblocks in the anchor frame to determine the best match. A corresponding motion vector describes the relative displacement of the current macroblock relative to the best match macroblock. Additionally, an advanced prediction mode for P-VOPs may be used, where motion compensation is performed on 8×8 blocks rather than 16×16 macroblocks. Moreover, both intra-frame and inter-frame coded P-VOP macroblocks can be coded in a frame mode or a field mode.
B-VOPs can use the forward prediction mode as described above in connection with P-VOPs as well as backward prediction, bi-directional prediction, and direct mode, which are all inter-frame techniques. B-VOPs do not currently use intra-frame coded macroblocks under the MPEG-4 Video Verification Model Version 7.0 referred to previously, although this is subject to change. The anchor frame (e.g., VOP) must be a P-VOP, not a B-VOP.
With backward prediction of B-VOPs, the current macroblock is compared to a search area of macroblocks in a temporally previous anchor frame to determine the best match. A corresponding motion vector describes the relative displacement of the current macroblock relative to the best match macroblock. With bidirectional prediction of B-VOPs, the current macroblock is compared to a search area of macroblocks in both a temporally previous anchor frame and a temporally subsequent anchor frame to determine the best match. Forward and backward motion vectors describe the relative displacement of the current macroblock relative to the best match macroblocks.
With direct mode prediction of B-VOPs, a motion vector is derived for an 8×8 block when the collocated macroblock in the following P-VOP uses the 8×8 advanced prediction mode. The motion vector of the 8×8 block in the P-VOP is linearly scaled to derive a motion vector for the block in the B-VOP without the need for searching to find a best match block.
The encoder, shown generally at 200, includes a shape coder 210, a motion estimation function 220, a motion compensation function 230, and a texture coder 240, which each receive video pixel data input at terminal 205. The motion estimation function 220, motion compensation function 230, texture coder 240, and shape coder 210 also receive VOP shape information input at terminal 207, such as the MPEG-4 parameter VOP_of_arbitrary_shape. When this parameter is zero, the VOP has a rectangular shape, and the shape coder 210 therefore is not used.
A reconstructed anchor VOP function 250 provides a reconstructed anchor VOP for use by the motion estimation function 220 and motion compensation function 230. For P-VOPs, the anchor VOP occurs after the current VOP in presentation order, and may be separated from the current VOP by one or more intermediate images. The current VOP is subtracted from a motion compensated anchor VOP at subtractor 260 to provide a residue which is encoded at the texture coder 240. The texture coder 240 performs the DCT to provide texture information (e.g., transform coefficients) to a multiplexer (MUX) 280. The texture coder 240 also provides information which is summed with the output from the motion compensator 230 at a summer 270 for input to the reconstructed anchor VOP function 250.
Motion information (e.g., motion vectors) is provided from the motion estimation function 220 to the MUX 280, while shape information which indicates the shape of the VOP is provided from the shape coding function 210 to the MUX 280. The MUX 280 provides a corresponding multiplexed data stream to a buffer 290 for subsequent communication over a data channel.
The pixel data which is input to the encoder may have a YUV 4:2:0 format. The VOP is represented by means of a bounding rectangle. The top left coordinate of the bounding rectangle is rounded to the nearest even number not greater than the top left coordinates of the tightest rectangle. Accordingly, the top left coordinate of the bounding rectangle in the chrominance component is one-half that of the luminance component.
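The alignment of the bounding rectangle can be shown with a small sketch, assuming non-negative integer pixel coordinates; the function name is hypothetical.

```python
def align_bounding_origin(x, y):
    """Round the tightest rectangle's top-left corner down to even values.

    Returns the luminance origin and the corresponding chrominance origin,
    which is one-half of the luminance origin for the 4:2:0 format.
    """
    luma = (x - x % 2, y - y % 2)
    chroma = (luma[0] // 2, luma[1] // 2)
    return luma, chroma


luma_origin, chroma_origin = align_bounding_origin(7, 12)
```

Because the luminance origin is always even, halving it for the chrominance component never produces a fractional coordinate.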
A motion compensated difference block is formed by subtracting the pixel values of the predicted block from those of the current block point by point. Texture coding is then performed on the difference block. The coded MV and the coded texture information of the difference block are transmitted to the decoder. The decoder can then reconstruct an approximated current block by adding the quantized difference block to the predicted block according to the MV. The block for ME/MC can be a 16×16 frame block (macroblock), an 8×8 frame block or a 16×8 field block.
In ME/MC, it is generally desirable to have small residue values for the difference block, use few bits for the motion vectors, and have a low computational complexity. Due to its lower computational complexity relative to other difference measures, the Sum of Absolute Difference (SAD) is commonly used to select the motion vector which meets these criteria, as follows. Let {c(i,j),i,j=0,1, . . . , N-1} be the pixels of the current block and {p(m,n),m,n=-R,-R+1, . . . ,-1,0,1, . . . ,R+N-1} be the pixels in the search range of the reference frame. Then,

$$
SAD_N(x,y)=
\begin{cases}
\displaystyle\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|c(i,j)-p(i,j)\right|-C, & (x,y)=(0,0)\\[2ex]
\displaystyle\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|c(i,j)-p(i+x,j+y)\right|, & \text{otherwise}
\end{cases}
$$

where {x,y=-R,-R+1, . . . ,-1,0,1, . . . ,R-1}, and R and C are positive constants.
The (x, y) pair resulting in the minimum SAD value is the optimum full-pixel motion vector (MV) having horizontal and vertical components, e.g., (MVx,MVy). Note that a MV of (0,0) is favored because a positive constant C is subtracted from the SAD when (x, y)=(0,0). For example, for a 16×16 block, C=128. Accordingly, the distribution of MVs can be concentrated near (0, 0) so that entropy coding of the MVs is more efficient.
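The full-pixel search described above can be illustrated with a minimal Python sketch. The function names, the row-major list-of-lists layout (indexed as [j][i], i.e., row then column), the bounds check, and the tie-breaking rule (the first minimum in scan order is kept) are assumptions made for illustration only and are not taken from the description.

```python
def block_sad(cur, ref, top, left, x, y, c_bias=0):
    """SAD between the N x N block `cur` and the reference block displaced by (x, y).

    `top`/`left` locate the co-sited block in `ref` (hypothetical parameters).
    The constant `c_bias` (C in the text) is subtracted only for (x, y) == (0, 0).
    """
    n = len(cur)
    total = 0
    for j in range(n):          # j: vertical index
        for i in range(n):      # i: horizontal index
            total += abs(cur[j][i] - ref[top + j + y][left + i + x])
    if (x, y) == (0, 0):
        total -= c_bias
    return total


def full_pixel_search(cur, ref, top, left, search_r, c_bias=0):
    """Return ((MVx, MVy), SAD) minimizing the SAD over x, y in [-R, R-1]."""
    n = len(cur)
    best_mv, best_sad = None, None
    for y in range(-search_r, search_r):
        for x in range(-search_r, search_r):
            # Assumption: candidates falling outside `ref` are simply skipped.
            if not (0 <= top + y and top + y + n <= len(ref)
                    and 0 <= left + x and left + x + n <= len(ref[0])):
                continue
            s = block_sad(cur, ref, top, left, x, y, c_bias)
            if best_sad is None or s < best_sad:
                best_mv, best_sad = (x, y), s
    return best_mv, best_sad
```

With c_bias set to a large value such as 128, the zero vector wins unless another displacement reduces the SAD by more than that amount, which is the bias toward (0,0) described above.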
Accuracy of (MVx,MVy) is set at half-pixel. Interpolation must be used on the anchor frame so that p(i+x,j+y) is defined for x or y being half of an integer. Interpolation is performed as shown in FIG. 3. Integer pixel positions are represented by the symbol "+", as shown at A, B, C and D. Half-pixel positions are indicated by circles, as shown at a, b, c and d. As seen, a=A, b=(A+B)//2, c=(A+C)//2, and d=(A+B+C+D)//4, where "//" denotes rounded division, as discussed below in connection with Tables 2 and 3.
A motion compensated difference block is defined as d(i,j)=c(i,j)-p(i+MVx,j+MVy), i,j=0,1, . . . ,N-1. The difference block d(i,j) is transformed, quantized, and entropy coded. At a decoder, the motion vector (MVx,MVy) and the quantized difference block $\{\hat{d}(i,j)\}$ are available to reconstruct the current frame as follows:
$$\hat{c}(i,j)=\hat{d}(i,j)+p(i+MV_x,\,j+MV_y),\quad i,j=0,1,\ldots,N-1.$$
For a color format of Y:U:V=4:2:0, the macroblock size is 16×16 pixels for the Y (luminance) component, and 8×8 pixels for the U/V (chrominance) components. The search range R (in half-pixel units) can be selected by the user and is signified by a parameter called f_code, where R={(xy):-2f
TABLE 1
f_code | R
1 | 32
2 | 64
3 | 128
4 | 256
Motion vectors for the chrominance blocks are derived from those of the luminance blocks. Since one pixel in a chrominance block corresponds to two pixels in each direction in the corresponding luminance block, i.e., the chrominance component is at half the resolution of the luminance, the MV for the chrominance block is one-half the MV of the luminance block. Moreover, since the MV of the luminance block may have half-pixel values, the MV for the chrominance block may consequently have quarter pixel values. But, since only half-pixel interpolation is used for MC of the chrominance block, the quarter pixel values have to be rounded into half-pixel values. Table 2, below, shows how to perform the required rounding operation. For example, ¼ is rounded to ½, 2/4 is the same as ½, and ¾ is rounded to ½.
TABLE 2
¼ pixel value | 0 | 1 | 2 | 3
½ pixel value | 0 | 1 | 1 | 1
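The rounding of Table 2 may be sketched in Python as follows. The sketch assumes that a luminance MV component is held as an integer count of half-pixels, so that the same integer read in quarter-pixel units is the halved (chrominance) value. The function name and the treatment of negative components (the table is applied to the magnitude and the sign is restored) are assumptions, since the description does not state how negative values are handled.

```python
# Table 2: fractional part in quarter-pixels -> fractional part in half-pixels.
QUARTER_TO_HALF = (0, 1, 1, 1)


def chroma_mv_from_luma(mv_luma_half):
    """Map a luminance MV component (in half-pixel units) to a chrominance
    MV component (also in half-pixel units).

    Halving a value held in half-pixel units yields the same integer
    interpreted in quarter-pixel units, so no explicit division is needed.
    Assumption: rounding is applied to the magnitude, then the sign is restored.
    """
    mag = abs(mv_luma_half)                      # quarter-pixel magnitude of the chroma MV
    half = 2 * (mag // 4) + QUARTER_TO_HALF[mag % 4]
    return -half if mv_luma_half < 0 else half
```

For example, a luminance displacement of 1.5 pixels (3 half-pixels) halves to 0.75 pixel, which Table 2 maps to one half-pixel.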
Although only the reconstructed previous frame is available at the decoder for MC, there is a choice at the encoder for ME to use either the reconstructed previous frame or the original previous frame. It is advantageous to use the original previous frame in ME, but not MC, since the complexity is lower, and the MV may represent true motion more closely so that the chrominance components may be more accurately predicted.
For some video sequences, such as where there is fast motion or a scene change, coding the difference block may require more bits than direct DCT coding of the actual intensity values of the current block. Accordingly, it is desirable to have a decision criterion for adaptively choosing to code the current block directly (e.g., INTRA mode) or differentially (e.g., INTER mode). The following parameters are calculated to make the INTRA/INTER decision:
$$\mathrm{mean}=\frac{1}{N^{2}}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}c(i,j),\qquad A=\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|c(i,j)-\mathrm{mean}\right|$$
where N is the size of the block (e.g., N=16). The INTRA mode is chosen if A<(SADinter(MVx,MVy)-2*NC); otherwise, the INTER mode is used. Note that the MV in SADinter(MVx,MVy) for this decision is at integer pixel resolution. If the INTRA mode is chosen, no further operations are necessary for the motion search. If the INTER mode is chosen, the motion search continues for the half-pixel MV.
With this technique, the previous frame is extended in all four directions (e.g., left, top, right, and bottom) by repeating (e.g., padding) the border pixels a number of times according to a code word (e.g., f_code, described above in Table 1) which indicates the relative range of motion. With a larger range of motion, a correspondingly larger search area is required. The difference block is generated by applying ME/MC against the extended previous frame and taking the difference of the current block and the predicted block that may be partially out of the frame boundary. This technique improves the coding efficiency of the boundary blocks and can result in an improved image.
For example, with the basic ME/MC technique, a previous frame 400 includes a predicted block 410 which is in a search range 420. The relative macroblock (MB) position of the current frame is shown with a dashed line 430. The corresponding motion vector may be, for example, (MVx, MVy)=(8,0) if the predicted block is displaced eight pixels horizontally to the right and zero pixels vertically.
A field mode image, e.g., a 16×16 macroblock, is shown generally at 600. The macroblock includes even-numbered lines 602, 604, 606, 608, 610, 612, 614 and 616, and odd-numbered lines 603, 605, 607, 609, 611, 613, 615 and 617. The even and odd lines are thus interleaved, and form top and bottom (or first and second) fields, respectively.
When the pixel lines in image 600 are permuted to form same-field luminance blocks, the macroblock shown generally at 650 is formed. Arrows, shown generally at 645, indicate the reordering of the lines 602-617. For example, the even line 602, which is the first line of macroblock 600, is also the first line of macroblock 650. The even line 604 is reordered as the second line in macroblock 650. Similarly, the even lines 606, 608, 610, 612, 614 and 616 are reordered as the third through eighth lines, respectively, of macroblock 650. Thus, a 16×8 luminance region 680 with even-numbered lines is formed. Similarly, the odd-numbered lines 603, 605, 607, 609, 611, 613, 615 and 617 form a 16×8 region 685.
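The line permutation described above can be sketched in Python as follows. The macroblock is assumed to be a list of 16 pixel rows, and the function names are hypothetical.

```python
def split_fields(mb):
    """Split an interleaved macroblock into its top (even-line) and
    bottom (odd-line) field blocks."""
    top = mb[0::2]      # lines 0, 2, 4, ... (e.g., lines 602, 604, ...)
    bottom = mb[1::2]   # lines 1, 3, 5, ... (e.g., lines 603, 605, ...)
    return top, bottom


def permute_to_field_order(mb):
    """Reorder lines so that the top-field lines come first, followed by
    the bottom-field lines (the arrangement of macroblock 650)."""
    top, bottom = split_fields(mb)
    return top + bottom


def interleave_fields(top, bottom):
    """Inverse operation: restore the interleaved order of macroblock 600."""
    out = []
    for t, b in zip(top, bottom):
        out.append(t)
        out.append(b)
    return out
```

For a 16×16 macroblock, split_fields returns two blocks of eight 16-pixel lines each, corresponding to the 16×8 regions 680 and 685.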
The decision process for choosing the MC mode for P-VOPs is as follows. For frame mode video, first obtain the Sum of Absolute Differences (SAD) for a single 16×16 block, e.g., SAD16(MVx,MVy); and for four 8×8 blocks, e.g., SAD8(MVx1,MVy1), SAD8(MVx2,MVy2), SAD8 (MVx3,MVy3), and SAD8(MVx4,MVy4). If
choose 8×8 prediction; otherwise, choose 16×16 prediction.
For interlaced video, obtain SADtop(MVx
The overall prediction mode decision is based on choosing the minimum of:
(a) SAD16(MVx, MVy).
and (c) SADtop(MVx,MVy)(MVx
If 8×8 prediction is chosen, there are four MVs for the four 8×8 luminance blocks, i.e., one MV for each 8×8 block. The MV for the two chrominance blocks is then obtained by taking an average of these four MVs and dividing the average value by two. Since each MV for the 8×8 luminance block has a half-pixel accuracy, the MV for the chrominance blocks may have a sixteenth pixel value. Table 3, below, specifies the conversion of a sixteenth pixel value to a half-pixel value for chrominance MVs. For example, 0 through 2/16 are rounded to 0, 3/16 through 13/16 are rounded to 1/2, and 14/16 and 15/16 are rounded to 2/2=1.
TABLE 3
1/16 pixel value | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
½ pixel value | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2
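The conversion of Table 3 may be sketched in Python as follows. With each 8×8 luminance MV component held as an integer count of half-pixels, averaging four components and dividing by two gives a value equal to their sum expressed in sixteenth-pixel units. The function name and the handling of negative values (rounding applied to the magnitude, with the sign restored afterwards) are assumptions for illustration.

```python
# Table 3: fractional part in sixteenth-pixels -> fractional part in half-pixels.
SIXTEENTH_TO_HALF = (0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2)


def chroma_mv_from_four_luma(mv_components_half):
    """Derive one chrominance MV component (half-pixel units) from the four
    8x8 luminance MV components (each in half-pixel units).

    (sum / 4) / 2 in half-pixel units equals `sum` in sixteenth-pixel units.
    Assumption: the table is applied to the magnitude and the sign restored.
    """
    total = sum(mv_components_half)          # value in sixteenth-pixel units
    mag = abs(total)
    half = 2 * (mag // 16) + SIXTEENTH_TO_HALF[mag % 16]
    return -half if total < 0 else half
```

For example, four luminance components of one half-pixel each sum to 4/16 pixel, which Table 3 maps to one half-pixel.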
With field prediction, there are two MVs for the two 16×8 blocks. The luminance prediction is generated as follows. The even lines of the macroblock (e.g., lines 602, 604, 606, 608, 610, 612, 614 and 616) are defined by the top field motion vector using the reference field specified. The motion vector is specified in frame coordinates such that full pixel vertical displacements correspond to even integral values of the vertical motion vector coordinate, and a half-pixel vertical displacement is denoted by odd integral values. When a half-pixel vertical offset is specified, only pixels from lines within the same reference field are combined.
The MV for the two chrominance blocks is derived from the (luminance) motion vector by dividing each component by 2, then rounding as follows. The horizontal component is rounded by mapping all fractional values into a half-pixel offset. This is the same procedure as described in Table 2. The vertical motion vector component is an integer and the resulting chrominance motion vector vertical component is rounded to an integer. If the result of dividing by two yields a non-integral value, it is rounded to the adjacent odd integer. Note that the odd integral values denote vertical interpolation between lines of the same field.
The second aspect of the advanced prediction technique is overlapped MC for luminance blocks. In the following discussion, four MVs are always assumed for a 16×16 luminance block. The case of one 16×16 MV can be considered as having four identical 8×8 MVs. Each pixel in an 8×8 luminance predicted block is a weighted sum of three prediction values specified in the following equation:
$$p'(i,j)=\bigl(H_0(i,j)\,q(i,j)+H_1(i,j)\,r(i,j)+H_2(i,j)\,s(i,j)\bigr)/8$$
where division by eight is with rounding off to the nearest half-pixel, with rounding away from zero. The weighting matrices for H0(i,j), H1(i,j) and H2(i,j) are specified in Tables 4-6, respectively, below. (i,j)=(0,0) is the upper left hand value in each table, and (i,j)=(7,7) is the lower right hand corner value.
In Table 5, the top four rows indicate the top neighbor motion vector weights, while the bottom four rows indicate the bottom neighbor motion vector weights. In Table 6, the four left-hand columns indicate the left-hand neighbor motion vector weights, while the four right-hand columns indicate the right-hand neighbor motion vector weights.
TABLE 4
4 | 5 | 5 | 5 | 5 | 5 | 5 | 4
5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
5 | 5 | 6 | 6 | 6 | 6 | 5 | 5
5 | 5 | 6 | 6 | 6 | 6 | 5 | 5
5 | 5 | 6 | 6 | 6 | 6 | 5 | 5
5 | 5 | 6 | 6 | 6 | 6 | 5 | 5
5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
4 | 5 | 5 | 5 | 5 | 5 | 5 | 4
TABLE 5 | ||||||||
2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | |
2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
TABLE 6 | ||||||||
2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | |
2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | |
The values of q(i,j), r(i,j), and s(i,j) are the pixels of the previous frame, defined as follows.
$$q(i,j)=p(i+MV_x^0,\,j+MV_y^0),\qquad r(i,j)=p(i+MV_x^1,\,j+MV_y^1),\qquad s(i,j)=p(i+MV_x^2,\,j+MV_y^2)$$
where (MVx0,MVy0) is the MV of the current 8×8 luminance block, (MVx1,MVy1) is the MV of the block either above (for j=0,1,2,3) or below (for j=4,5,6,7) the current block, and (MVx2,MVy2) is the MV of the block either to the left (for i=0,1,2,3) or right (for i=4,5,6,7) of the current block.
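The overlapped prediction can be illustrated with the following Python sketch. The weights are written out as full 8×8 matrices in which the three weights at every position sum to eight, consistent with the division by eight. Several assumptions are made for illustration: only full-pixel MVs are handled, the reference frame is a list of rows indexed [row][column], the rounding is implemented as (sum+4)//8 for non-negative pixel values, and the function and parameter names are hypothetical.

```python
H0 = [[4, 5, 5, 5, 5, 5, 5, 4],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [4, 5, 5, 5, 5, 5, 5, 4]]
H1 = ([[2] * 8, [1, 1, 2, 2, 2, 2, 1, 1]]
      + [[1] * 8 for _ in range(4)]
      + [[1, 1, 2, 2, 2, 2, 1, 1], [2] * 8])
H2 = ([[2, 1, 1, 1, 1, 1, 1, 2]]
      + [[2, 2, 1, 1, 1, 1, 2, 2] for _ in range(6)]
      + [[2, 1, 1, 1, 1, 1, 1, 2]])


def obmc_block(ref, top, left, mv_cur, mv_above, mv_below, mv_left, mv_right):
    """Overlapped prediction of one 8x8 luminance block.

    Each MV is an (x, y) pair of full-pixel displacements (assumption).
    `top`/`left` give the block position in `ref`.
    """
    out = [[0] * 8 for _ in range(8)]
    for j in range(8):                      # j: row within the block
        for i in range(8):                  # i: column within the block
            mvx1, mvy1 = mv_above if j < 4 else mv_below
            mvx2, mvy2 = mv_left if i < 4 else mv_right
            q = ref[top + j + mv_cur[1]][left + i + mv_cur[0]]
            r = ref[top + j + mvy1][left + i + mvx1]
            s = ref[top + j + mvy2][left + i + mvx2]
            # Assumed rounding: add half the divisor before integer division.
            out[j][i] = (H0[j][i] * q + H1[j][i] * r + H2[j][i] * s + 4) // 8
    return out
```

When all five motion vectors are identical, the weighted sum reduces to the ordinary motion compensated prediction, which provides a simple consistency check on the weights.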
Motion vector coding is performed separately on the horizontal and vertical components of the current block. For each MV component in a P-VOP, for example, the median value of the candidate predictors for the same component may be computed, and the difference value between component and median values may be coded using variable length codes.
When interlaced coding tools are used, candidate predictors for field-based motion vectors in a current P-VOP can be obtained as follows. Let macroblock 700 be a current field mode macroblock (e.g., 16×16 pixels). Surrounding macroblocks include a macroblock 710 which immediately precedes the current macroblock 700 in the current row 715, a macroblock 720 which is immediately above the current macroblock in a preceding row 725, and a macroblock 730 which immediately follows the macroblock 720 in the preceding row.
The macroblock 700 has associated first field horizontal and vertical motion vectors MVxf1 and MVyf1, respectively, and second field horizontal and vertical motion vectors MVxf2 and MVyf2, respectively. Vertical and horizontal motion vector components are not shown separately in
When a particular 8×8 sub-block of a macroblock is used as a candidate, the macroblock will have three other sub-blocks with associated horizontal and vertical motion vector components which are suitable for use in differentially encoding the motion vector components of the current field coded macroblock 700. Generally, it is desirable to select the sub-block in the particular macroblock which is closest to the upper left-hand portion of the current macroblock as the candidate block as shown.
Predictor horizontal and vertical motion vector components, Px and Py, respectively, can be determined in accordance with the present invention from Px=median (MV1x, MV2x, MV3x) and Py=median (MV1y, MV2y, MV3y). It has been found that the use of the median provides efficient coding. The median is the middle number in an ordered sequence having an odd number of elements, or the average of the two middle numbers of a sequence having an even number of elements. For example, the median of the sequence (1, 2, 4) is 2, and the median of the sequence (1, 2, 4, 10) is 3. The median is the same as the average when there are two numbers in a sequence.
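The component-wise median predictor can be sketched in Python as follows. The function names and the representation of each candidate MV as an (x, y) tuple are assumptions for illustration.

```python
def median(values):
    """Middle element of an odd-length sequence, or the average of the two
    middle elements of an even-length sequence."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def predict_mv(candidates):
    """Return (Px, Py), the median taken separately on each component."""
    px = median([mv[0] for mv in candidates])
    py = median([mv[1] for mv in candidates])
    return px, py
```

Because the median is taken on each component separately, the predictor (Px, Py) need not coincide with the motion vector of any single candidate block.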
Other functions besides the median may be used. For example, the average may be used, i.e., Px=1/3 MV1x+1/3 MV2x+1/3 MV3x. Alternatively, some other weighting scheme may be used, e.g., Px=0.4 MV1x+0.4 MV2x+0.2 MV3x. Moreover, while three candidate blocks are used in the present example, two or more may be used. Furthermore, the location of the candidate blocks may vary. For example, a candidate block (not shown) which immediately precedes macroblock 720 may be used. Moreover, for coding schemes which employ sufficient buffering capability, candidate blocks which follow the current macroblock 700 in the current row 715, or a subsequent row (not shown) may be used.
For differential coding of the current macroblock to obtain a motion vector difference value, MVD, in accordance with the present invention, both fields use the same predictor. That is, MVDxf1=MVxf1-Px, MVDyf1=MVyf1-Py, MVDxf2=MVxf2-Px, and MVDyf2=MVyf2-Py.
Generally, when the current macroblock is field predicted, and at least one of the spatial neighborhood macroblocks is field predicted, then the candidate motion vector predictors can be generated by using the same field of the candidate blocks. That is, for the first field of the current macroblock, the first field motion vector(s) of the surrounding field predicted macroblock(s) are used. Similarly, for the second field of the current macroblock, the second field motion vector(s) of the surrounding field predicted macroblock(s) are used.
Specifically, the first field horizontal predictor is Pxf1=median(MV1x, MV2xf1, MV3x), the first field vertical predictor is Pyf1=median(MV1y, MV2yf1, MV3y), the second field horizontal predictor is Pxf2=median(MV1x, MV2xf2, MV3x), and the second field vertical predictor is Pyf2=median(MV1y, MV2yf2, MV3y). The motion vector difference values are MVDxf1=MVxf1-Pxf1, MVDyf1=MVyf1-Pyf1, MVDxf2=MVxf2-Pxf2, and MVDyf2=MVyf2-Pyf2.
Alternatively, the first and second field motion vectors of macroblock 820 (and any other field mode candidate macroblock) may be averaged to obtain averaged horizontal and vertical motion vector components. The processing then proceeds as discussed in connection with FIG. 7. Specifically, an averaged horizontal motion vector component for macroblock 820 is MV2x=(MV2xf1+MV2xf2)/2, while an averaged vertical motion vector component for macroblock 820 is MV2y=(MV2yf1+MV2yf2)/2. Predictor horizontal and vertical motion vector components, respectively, are Px=median(MV1x, MV2x, MV3x) and Py=median(MV1y, MV2y, MV3y). The motion vector difference values for the current macroblock 700 are MVDxf1=MVxf1-Px, MVDyf1=MVyf1-Py, MVDxf2=MVxf2-Px, and MVDyf2=MVyf2-Py.
When two or more of the candidate macroblocks are field predicted, processing may proceed as above for each field predicted macroblock.
For coding efficiency, to ensure that the vertical component of a field motion vector is an integer, the vertical difference motion vector component is encoded in the bitstream as MVDyf1=(MVyf1-int(Py))/2, where int(Py) means truncate Py in the direction of zero to the nearest integer. This assures that all fractional pixel offsets are mapped to a half-pixel displacement. For example, if MVyf1=4 and Py=3.5, then MVDyf1=(MVyf1-int(Py))/2=(4-int(3.5))/2=(4-3)/2=0.5. Otherwise, without the "int" function, MVDyf1=(MVyf1-Py)/2=(4-3.5)/2=(0.5)/2=0.25, which cannot be coded as efficiently. The factor of 1/2 is used to reduce the magnitude of MVDyf1, thus making it more efficiently coded by the motion vector VLC.
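The vertical difference computation above can be sketched in Python as follows. The function names are hypothetical, and the decoder-side inverse is an inference from the encoding formula rather than a statement in the description.

```python
import math


def encode_field_vertical_mvd(mvy_field, py):
    """MVDy = (MVy - int(Py)) / 2, where int() truncates toward zero."""
    return (mvy_field - math.trunc(py)) / 2


def decode_field_vertical_mv(mvd, py):
    """Assumed inverse: recover the field vertical MV component from the
    transmitted difference and the locally computed predictor Py."""
    return 2 * mvd + math.trunc(py)
```

With MVyf1=4 and Py=3.5, the encoder produces 0.5, and the assumed inverse recovers 4 from the same predictor.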
Generally, when the current block is coded as a progressive macroblock or an advanced prediction (8×8) block, and at least one of the coded spatial neighborhood macroblocks is field predicted, the candidate motion vector components can be generated by averaging the first and second field motion vector components, by using both of the first and second field motion vector components as candidates, or by using one of the first and second field motion vector components, but not both. Specifically, in a first option, the motion vector components of the interlaced candidate macroblock(s) are averaged within the block. For example, for macroblock 820, averaged motion vector components are MV2x=(MV2xf1+MV2xf2)/2, and MV2y=(MV2yf1+MV2yf2)/2. Predictor horizontal and vertical motion vector components, respectively, are Px=median(MV1x, MV2x, MV3x) and Py=median(MV1y, MV2y, MV3y), and the motion vector difference values for the current block 912 are MVDx=MVx-Px and MVDy=MVy-Py, where MVx and MVy are the horizontal and vertical motion vector components, respectively, of the current block 912.
In a second option, both of the field motion vectors of the macroblock 820 are candidate predictors. For example, for macroblock 820, the predictor horizontal and vertical motion vector components, Px and Py respectively, can be determined from Px=median(MV1x, MV2xf1, MV2xf2, MV3x) and Py=median(MV1y, MV2yf1, MV2yf2, MV3y), and the motion vector difference values for the current block 912 are MVDx=MVx-Px and MVDy=MVy-Py.
In a third option, the first field motion vectors of the macroblock 820 are candidate predictors. For example, for macroblock 820, the predictor horizontal and vertical motion vector components, respectively, are Px=median(MV1x, MV2xf1, MV3x) and Py=median(MV1y, MV2yf1, MV3y). The motion vector difference values for the current block 912 are MVDx=MVx-Px and MVDy=MVy-Py.
In a fourth option, the second field motion vectors of the macroblock 820 are candidate predictors. For example, for macroblock 820, the predictor horizontal and vertical motion vector components, respectively, are Px=median(MV1x, MV2xf2, MV3x) and Py=median(MV1y, MV2yf2, MV3y). The motion vector difference values for the current block 912 are MVDx=MVx-Px and MVDy=MVy-Py.
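The four options for a non-interlaced current block with one field predicted neighbor can be compared with the following Python sketch. The option labels, the function name, and the tuple representation are hypothetical. The averaged vector is left unrounded here; the mapping of fractional values to a half-pixel displacement is not implemented in this sketch.

```python
from statistics import median


def predictor_for_option(mv1, mv2_f1, mv2_f2, mv3, option):
    """Return (Px, Py) for a progressive current block whose second
    candidate is field predicted with field vectors mv2_f1 and mv2_f2.

    `option` is one of "average", "both", "first", "second" (labels assumed).
    """
    if option == "average":
        mid = [((mv2_f1[0] + mv2_f2[0]) / 2, (mv2_f1[1] + mv2_f2[1]) / 2)]
    elif option == "both":
        mid = [mv2_f1, mv2_f2]          # four candidates in total
    elif option == "first":
        mid = [mv2_f1]
    elif option == "second":
        mid = [mv2_f2]
    else:
        raise ValueError("unknown option: %r" % (option,))
    cands = [mv1] + mid + [mv3]
    px = median(mv[0] for mv in cands)
    py = median(mv[1] for mv in cands)
    return px, py
```

In the second option there are four candidates, so the median is the average of the two middle values and may itself be fractional.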
As discussed previously, when averaging pixel data from first and second fields, all fractional pixel offsets are mapped to a half-pixel displacement for coding efficiency.
The blocks designated by light shading (e.g., such as block 1020) are all boundary blocks, and are processed with normal padding. Blocks designated with darker shading (e.g., such as block 1030) are adjacent to the boundary blocks and are processed with extended padding. Further extended blocks (not shown) may also be padded. The amount of padding required is related to the parameter f_code discussed above in connection with Table 1. The blocks may be 16×16 luminance blocks or 8×8 chrominance blocks.
Padding fills the areas outside the VOP by repeating the boundary pixels of the VOP. If a pixel outside the VOP can be padded by the repetition of more than one boundary pixel, the average of those particular boundary pixels is used. Normal padding refers to padding within a boundary block. First, each horizontal line of the blocks in frame 1000 is scanned to provide continuous line segments which are either interior to the VOP (e.g., including boundary pixels of the VOP), or exterior to the VOP. If the entire line in a block is interior to the VOP, no padding is performed.
If there are both interior and exterior segments in a block, and the exterior segment is positioned between an interior segment and the edge of the block, the pixels in the exterior segment are set to the interior segment pixel value at the boundary of the VOP which is closest to that particular exterior segment. For example, for the left to right sequence E1-E5 and I6-I16 in a 16 pixel scan line, where "E" denotes an exterior pixel and "I" denotes an interior pixel, E1-E5 are set to I6. For the sequence E1-E5, I6-I10 and E11-E16, E1-E5 are set to I6, and E11-E16 are set to I10.
If an exterior segment is between two interior segments, the exterior segment is filled with the average of the two boundary pixels of the interior segments. For example, for the sequence I1-I5, E6-E10 and I11-I16, E6-E10 are set to (I5+I11)/2.
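The padding of a single scan line can be sketched in Python as follows. The function name and the use of a per-pixel boolean mask are assumptions, as is the use of integer (floor) division for the average of two boundary pixels, since the description gives the average only as (I5+I11)/2.

```python
def pad_scan_line(pixels, inside):
    """Pad the exterior segments of one scan line.

    `pixels` holds the pixel values and `inside[k]` is True where pixel k
    is interior to the VOP. A line with no interior pixel is returned
    unchanged, to be handled by a later step.
    """
    out = list(pixels)
    n = len(pixels)
    i = 0
    while i < n:
        if inside[i]:
            i += 1
            continue
        start = i
        while i < n and not inside[i]:
            i += 1                      # i now indexes the first pixel after the segment
        has_left = start > 0
        has_right = i < n
        if has_left and has_right:
            # Exterior segment between two interior segments: average
            # (assumed to truncate) of the two boundary pixels.
            fill = (pixels[start - 1] + pixels[i]) // 2
        elif has_left:
            fill = pixels[start - 1]
        elif has_right:
            fill = pixels[i]
        else:
            break                       # whole line is exterior
        out[start:i] = [fill] * (i - start)
    return out
```

The same routine can be applied to columns by passing a vertical scan line, with the results of the two directions averaged where both apply.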
The above process is repeated for each horizontal and vertical scan line in each block. If a pixel can be padded by both horizontal and vertical boundary pixels, the average is used.
It is possible for an exterior line segment to extend horizontally across a block without encountering an interior line segment. In this case, for each pixel in the line segment, scan horizontally in both directions to find the closest padded exterior pixel. If there is a tie (i.e., padded pixels to the right and left are equidistant from a current pixel), use the pixel to the left of the current pixel.
Similarly, it is possible for an exterior line segment to extend vertically across a block without encountering an interior line segment. In this case, for each pixel in the line segment, scan vertically in both directions to find the closest padded exterior pixel. If there is a tie (i.e., padded pixels above and below are equidistant from a current pixel), use the pixel above the current pixel. The exterior pixel is then replaced by the average of the pixels found in the horizontal and vertical bi-directional scans.
However, when the VOP is interlaced coded, a modified padding technique as described below may be used.
Pixel locations which are shaded are part of a VOP. For example, pixels (0,6-15), (1,7-15), (2,8-15), (3,9-15), (4,10-15), (5,11-15), (6,12-15), (7,13-15), (8, 14 and 15), (9,15), (12,15), (13, 14 and 15), (14,13-15), and (15,12-15) are part of a VOP. Unshaded (i.e., exterior) pixels are not part of the VOP. Using the padding technique discussed in connection with
Top field block 1210 includes rows 0, 2, 4, 6, 8, 10, 12 and 14, while bottom field block 1220 includes rows 1, 3, 5, 7, 9, 11, 13 and 15.
Next, each line in the respective blocks is scanned horizontally to provide exterior and interior line segments as discussed previously. For example, (0-8,14) and (13-15,14) are interior line segments, while (9-12,14) is an exterior line segment.
Repetitive padding is then applied separately for each field. For example, in the top field block 1210, exterior pixels (0,0), (0,2) and (0,4) are set to the value of interior pixel (0,6), exterior pixel (9,14) is set to the average of the values of boundary pixels (8,14) and (13,14), exterior pixel (1,6) is set to the average of the values of boundary pixels (0,6) and (1,8), and so forth. In the bottom field block 1220, exterior pixels (0,1), (0,3) and (0,5) are set to the value of interior pixel (0,7).
Lastly, after padding, the two field blocks 1210 and 1220 are combined to form a single luminance padded reference VOP. That is, the lines are reordered in the interleaved order shown in FIG. 11.
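The field-separated padding can be sketched in Python as follows. The sketch covers only the horizontal and vertical scan-line passes within each field and the final re-interleaving; exterior pixels that are reached by neither pass are left unchanged, and the bi-directional search for the closest padded pixel is omitted. The function names, the boolean shape mask, and the truncating integer average are assumptions for illustration.

```python
def pad_line(values, inside):
    """Return a fill value for each exterior pixel of one scan line,
    or None where the line offers no boundary pixel."""
    n = len(values)
    fill = [None] * n
    i = 0
    while i < n:
        if inside[i]:
            i += 1
            continue
        start = i
        while i < n and not inside[i]:
            i += 1
        left = values[start - 1] if start > 0 else None
        right = values[i] if i < n else None
        if left is not None and right is not None:
            v = (left + right) // 2     # assumed truncating average
        else:
            v = left if left is not None else right
        for k in range(start, i):
            fill[k] = v
    return fill


def pad_block(block, mask):
    """Pad one field block using horizontal and vertical scan lines,
    averaging where both directions supply a value."""
    h, w = len(block), len(block[0])
    horiz = [pad_line(block[r], mask[r]) for r in range(h)]
    vert = [pad_line([block[r][c] for r in range(h)],
                     [mask[r][c] for r in range(h)]) for c in range(w)]
    out = [row[:] for row in block]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                continue
            cands = [v for v in (horiz[r][c], vert[c][r]) if v is not None]
            if cands:
                out[r][c] = sum(cands) // len(cands)
    return out


def pad_interlaced(block, mask):
    """Pad the top and bottom fields separately, then re-interleave."""
    top = pad_block(block[0::2], mask[0::2])
    bottom = pad_block(block[1::2], mask[1::2])
    out = []
    for t, b in zip(top, bottom):
        out += [t, b]
    return out
```

Because each field is padded from boundary pixels of the same field, adjacent exterior lines may receive different values, unlike frame-based padding, in which both would be filled from the same nearest boundary pixel.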
A shape decoding function 1344 processes the data when the VOP has an arbitrary shape to recover shape information, which is, in turn, provided to a motion compensation function 1350 and a VOP reconstruction function 1352. A texture decoding function 1346 performs an inverse DCT on transform coefficients to recover residue information. For INTRA coded macroblocks, pixel information is recovered directly and provided to the VOP reconstruction function 1352. For INTER coded blocks and macroblocks, the pixel information provided from the texture decoding function 1346 to the VOP reconstruction function 1352 represents a residue between the current macroblock and a reference macroblock.
For INTER coded blocks and macroblocks, a motion decoding function 1348 processes the encoded motion vector data to recover the differential motion vectors and provide them to the motion compensation function 1350 and to a motion vector memory 1349, such as a RAM. The motion compensation function 1350 receives the differential motion vector data and determines a reference motion vector (e.g., motion vector predictor) in accordance with the present invention. The reference motion vector is obtained from one or more of the macroblocks which are in a spatial neighborhood of the current macroblock.
For example, when the encoder provides a reference motion vector which is the median of three neighboring macroblocks, the motion compensation function 1350 must re-calculate the median motion vector components (e.g., horizontal and vertical), and sum the median components with the differential motion vector components of the current macroblock to obtain the full motion vector for the current macroblock. The motion compensation function may also need to have circuitry for averaging motion vector components of top and bottom fields of a field coded neighboring macroblock.
Thus, the motion vector memory 1349 is required to store the full motion vectors of the neighboring macroblocks once these full motion vectors are determined. For example, using the scheme disclosed in
Once the motion compensation function 1350 determines a full reference motion vector and sums it with the differential motion vector of the current macroblock, the full motion vector of the current macroblock is available. Accordingly, the motion compensation function 1350 can now retrieve anchor frame best match data from a VOP memory 1354, such as a RAM, and provide the anchor frame pixel data to the VOP reconstruction function to reconstruct the current macroblock. In accordance with the present invention, the image quality of the reconstructed macroblock is improved by using the full motion vectors of neighboring macroblocks to determine the reference motion vector.
Padding may also be performed by the motion compensation function 1350. The retrieved best match data is added back to the pixel residue at the VOP reconstruction function 1352 to obtain the decoded current macroblock or block. The reconstructed block is output as a video output signal and also provided to the VOP memory 1354 to provide new anchor frame data. Note that an appropriate video data buffering capability may be required depending on the frame transmission and presentation orders since the anchor frame for P-VOPs is the temporally future frame in presentation order.
Generally, the decoder will perform the same steps as the encoder to determine the median or other value which was used as the motion vector predictor for the current VOP or block. Alternatively, when the motion vector predictor is the same as the motion vector of one of the candidate macroblocks, it is possible to transmit a code word to the decoder which designates the particular macroblock. For example, a code 00 may mean the motion vector of the previous block in the same row was used, a code 01 may mean the motion vector of the block above the current block in the previous row was used, and a code 10 may mean the motion vector of the next block in the previous row was used. In this case, the decoder can directly use the motion vector predictor of the designated macroblock and need not access the motion vectors of each of the candidate macroblocks to recalculate the motion vector predictor. A code of 11 may mean that the motion vector predictor is different than the motion vectors of all of the candidate macroblocks, so the decoder must recalculate the motion vector predictor. Those skilled in the art will appreciate that the necessary operations can be implemented in software, firmware or hardware. This processing can be implemented with a relatively low cost and low complexity.
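The code word alternative can be sketched in Python as follows. The two-bit codes follow the example given above; the function names, the tuple representation, and the use of a three-candidate median when the code is 11 are assumptions for illustration.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def median_predictor(left, above, above_right):
    """Component-wise median of the three candidate motion vectors."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))


def choose_code(predictor, left, above, above_right):
    """Encoder side: signal which candidate equals the predictor, if any."""
    for code, mv in (("00", left), ("01", above), ("10", above_right)):
        if predictor == mv:
            return code
    return "11"     # predictor differs from every candidate


def decode_predictor(code, left, above, above_right):
    """Decoder side: use the designated candidate directly, or recalculate
    the median when the code is 11."""
    table = {"00": left, "01": above, "10": above_right}
    if code in table:
        return table[code]
    return median_predictor(left, above, above_right)
```

Since the median is taken per component, the predictor can combine the horizontal component of one candidate with the vertical component of another, in which case the code 11 is sent and the decoder recalculates.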
The field Interlaced_information in the third layer 1450 indicates whether a macroblock is interlaced coded, and provides field motion vector reference data which informs the decoder of the coding mode of the current macroblock or block. The decoder uses this information in calculating the motion vector for a current macroblock. For example, if the current macroblock is not interlaced coded but at least one of the reference macroblocks is, then the decoder will average the motion vector components of each of the interlaced coded reference macroblocks for use in determining a reference motion vector for the current macroblock. Alternatively, the motion vector from the top or bottom field in the reference macroblock is used, but not both. If the current macroblock is interlaced coded, then the decoder will know to calculate a reference motion vector separately for each field. The coding mode may also designate which of the candidate macroblocks, if any, has a motion vector which is the same as the reference motion vector used in differentially encoding the motion vector of the current macroblock.
The Interlaced_information field may be stored for subsequent use as required in the motion vector memory 1349 or other memory in the decoder.
The Interlaced_information field may also include a flag dct_type which indicates whether top and bottom field pixel lines in a field coded macroblock are reordered from the interleaved order, e.g., for padding.
It will be appreciated that the arrangement shown in
A bitstream syntax for use in accordance with the present invention is now described. MPEG-4 provides a video syntax with a class hierarchy, including a Video Session (VS), Video Object (VO), Video Object Layer (VOL) or Texture Object Layer (SOL), Group of Video Object Plane, and at the bottom, a Video Object Plane. The Video Object Plane Layer syntax, set forth below, can indicate whether a current macroblock is interlaced coded in accordance with the present invention. The syntax which is not shaded is part of the present invention (e.g., interlaced; if (interlaced) and top_field_first). Here, the term "interlaced"=1 if the current macroblock is interlaced coded. The term "top_field_first" indicates that the top field of the current macroblock is to be processed first. Other terms are defined in the aforementioned "MPEG-4 Video Verification Model Version 7.0". Only a portion of the conventional syntax is shown for compactness, with the omitted portions being designated by three vertically arranged dots.
Syntax | No. of bits | |
VideoObjectPlane( ) { | ||
VOP_start_code | sc + 8 = 32 | |
do { | ||
modulo_time_base | 1 | |
} while ( modulo_time_base != "0") | ||
VOP_time_increment | 10 | |
VOP_prediction_type | 2 | |
. | ||
. | ||
. | ||
background_composition | 1 | |
VOP_CR | 1-2 | |
change_CR_disable | 1 | |
} | ||
disable_sadct | 1 | |
interlaced | 1 | |
if (interlaced) | ||
top_field_first | 1 | |
if (VOP_prediction_type == "10") | ||
VOP_dbquant | 2 | |
. | ||
. | 6 | |
. | ||
forward_temporal_ref | 10 | |
if (VOP_prediction_type == "10") { | ||
marker_bit | 1 | |
backward_temporal_ref | 10 | |
} | ||
} | ||
combined_motion_shape_texture_coding ( ) | ||
} | ||
next_start_code ( ) | ||
} | ||
A more detailed macroblock layer syntax in accordance with the present invention when the Interlaced_information field=1 is shown below. The term field_prediction=1 if the current macroblock is interlaced predicted. The reference field flags have a value of zero for the top field and a value of one for the bottom field. For P-VOPs, when field_prediction=1, two motion vector differences follow the syntax, with the top field motion vector followed by the bottom field motion vector.
The syntax also accounts for B-VOPs. In particular, for B-VOPs, when field_prediction=1, two or four motion vector differences are encoded. The order of motion vector differences for an interpolated macroblock is top field forward, bottom field forward, top field backward, and bottom field backward. For unidirectional interlaced prediction (e.g., forward or backward only), the top field motion vector difference is followed by the bottom field motion vector difference.
Syntax | No. of bits | Format |
if (interlaced) { | | |
if ((mbtype == INTRA) ∥ (mbtype == INTRA_Q) ∥ (cbp != 0)) | | |
dct_type | 1 | uimbsf |
if ((P_VOP && ((mbtype == INTER) ∥ (mbtype == INTER_Q))) ∥ (B_VOP && (mbtype != Direct_mode))) { | | |
field_prediction | 1 | uimbsf |
if (field_prediction) { | | |
if (P_VOP ∥ (B_VOP && (mbtype != Backward))) { | | |
forward_top_field_reference | 1 | uimbsf |
forward_bottom_field_reference | 1 | uimbsf |
} | | |
if (B_VOP && (mbtype != Forward)) { | | |
backward_top_field_reference | 1 | uimbsf |
backward_bottom_field_reference | 1 | uimbsf |
} | | |
} | | |
} | | |
} | | |
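The conditional structure of the macroblock layer syntax, together with the motion vector difference ordering described above, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of strings for the VOP type and mbtype, and the iterator of bits are all hypothetical and are not part of the syntax itself.

```python
def parse_interlaced_info(bits, vop_type, mbtype, cbp):
    """Parse the interlaced information of one macroblock.

    bits:     iterator yielding 0/1 values (hypothetical stand-in for a bit reader)
    vop_type: "P" or "B" (assumed encoding of the VOP type)
    mbtype:   e.g. "INTRA", "INTER", "Forward", "Backward", "Direct_mode"
    """
    info = {}
    if mbtype in ("INTRA", "INTRA_Q") or cbp != 0:
        info["dct_type"] = next(bits)
    inter_p = vop_type == "P" and mbtype in ("INTER", "INTER_Q")
    inter_b = vop_type == "B" and mbtype != "Direct_mode"
    if inter_p or inter_b:
        info["field_prediction"] = next(bits)
        if info["field_prediction"]:
            if vop_type == "P" or (vop_type == "B" and mbtype != "Backward"):
                info["forward_top_field_reference"] = next(bits)
                info["forward_bottom_field_reference"] = next(bits)
            if vop_type == "B" and mbtype != "Forward":
                info["backward_top_field_reference"] = next(bits)
                info["backward_bottom_field_reference"] = next(bits)
    return info


def mvd_order(info):
    """List the field motion vector differences in their transmitted order."""
    order = []
    if "forward_top_field_reference" in info:
        order += ["top field forward", "bottom field forward"]
    if "backward_top_field_reference" in info:
        order += ["top field backward", "bottom field backward"]
    return order
```

For an interpolated B-VOP macroblock with field_prediction=1, mvd_order yields four entries; for a P-VOP macroblock, or for forward-only or backward-only prediction, it yields two.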
Accordingly, it can be seen that the present invention provides a method and apparatus for coding of digital video images such as video object planes (VOPs), and, in particular, motion estimation and compensation techniques for interlaced digital video. A technique for providing predictor motion vectors for use in differentially encoding a current field predicted macroblock uses the median of motion vectors of surrounding blocks or macroblocks. When a surrounding macroblock is itself interlaced coded, an average motion vector for that macroblock can be used, with fractional pixel values being mapped to the half-pixel. When the current block is not interlaced coded but a surrounding block is, the field motion vectors of the surrounding block may be used individually or averaged.
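The median predictor with field averaging can be illustrated with a minimal sketch. It assumes motion vectors are stored as integer (x, y) pairs in half-pixel units and that three candidate vectors are used; the function names are hypothetical, and treating negative values symmetrically about zero is an assumption of this sketch.

```python
def average_field_mvs(top, bottom):
    """Average two field motion vectors given in half-pixel units.

    Any fractional part of the average (1/4, 1/2 or 3/4 pixel) is mapped
    to a half pixel. Assumption: the mapping is symmetric about zero.
    """
    def avg(a, b):
        s = a + b  # the true average is s / 4 pixels
        sign = -1 if s < 0 else 1
        whole, frac = divmod(abs(s), 4)
        return sign * (2 * whole + (1 if frac else 0))  # back to half-pixel units

    return (avg(top[0], bottom[0]), avg(top[1], bottom[1]))


def median_predictor(c1, c2, c3):
    """Component-wise median of three candidate motion vectors."""
    return tuple(sorted(component)[1] for component in zip(c1, c2, c3))
```

For example, a field coded neighbor with field vectors (3, 4) and (4, 4) contributes the single candidate average_field_mvs((3, 4), (4, 4)), which is then passed to median_predictor together with the other two candidates.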
A decoder in accordance with the present invention uses a bitstream syntax to determine the coding mode of a macroblock, e.g., whether a macroblock is field coded. The decoder may store the coding mode of neighboring macroblocks for later use.
In a repetitive padding technique for an interlaced coded VOP, the top and bottom field lines of the VOP and surrounding block are grouped. Within each field, exterior pixels are padded by setting them to the value of the nearest boundary pixel, or to an average of two boundary pixels. The lines are then reordered to provide a single reference VOP image.
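The field-separated padding can be illustrated with a simplified sketch. It assumes the image and its shape mask are lists of rows, that even-numbered rows form the top field, that a horizontal pass is followed by a vertical pass, and that averages use integer division; the function names are hypothetical, and details such as rounding are assumptions of this sketch.

```python
def pad_line(values, defined):
    """Pad one line: each undefined sample takes the nearest defined sample on
    the line, or the average of the two defined samples that bracket it."""
    n = len(values)
    out, filled = list(values), list(defined)
    if not any(defined):
        return out, filled  # nothing to propagate on this line
    left, right, last = [None] * n, [None] * n, None
    for i in range(n):
        if defined[i]:
            last = values[i]
        left[i] = last
    last = None
    for i in reversed(range(n)):
        if defined[i]:
            last = values[i]
        right[i] = last
    for i in range(n):
        if not defined[i]:
            l, r = left[i], right[i]
            if l is not None and r is not None:
                out[i] = (l + r) // 2  # assumed rounding: integer division
            else:
                out[i] = l if l is not None else r
            filled[i] = True
    return out, filled


def pad_field(pixels, mask):
    """Pad one field: horizontal pass on rows, then vertical pass on columns."""
    rows = [pad_line(r, m) for r, m in zip(pixels, mask)]
    vals = [r[0] for r in rows]
    flags = [r[1] for r in rows]
    for x in range(len(vals[0])):
        col, _ = pad_line([v[x] for v in vals], [f[x] for f in flags])
        for y, v in enumerate(col):
            vals[y][x] = v
    return vals


def pad_interlaced(pixels, mask):
    """Group lines by field, pad each field separately, then re-interleave."""
    top = pad_field(pixels[0::2], mask[0::2])
    bottom = pad_field(pixels[1::2], mask[1::2])
    out = [None] * len(pixels)
    out[0::2] = top
    out[1::2] = bottom
    return out
```

Because each field is padded on its own, a boundary pixel in the top field never propagates into bottom field lines, and vice versa.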
Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous adaptations and modifications may be made thereto without departing from the spirit and scope of the invention as set forth in the claims.
Luthra, Ajay, Chen, Xuemin, Eifrig, Robert O.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 20 2001 | General Instrument Corporation | (assignment on the face of the patent) | / | |||
Apr 15 2013 | General Instrument Corporation | GENERAL INSTRUMENT HOLDINGS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030764 | /0575 | |
May 28 2013 | GENERAL INSTRUMENT HOLDINGS, INC | Motorola Mobility LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030866 | /0113 | |
Oct 28 2014 | Motorola Mobility LLC | Google Technology Holdings LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 034301 | /0001 |