A motion vector refinement apparatus includes a first storage device, a motion vector predictor (MVP) derivation circuit, and a decoder side motion vector refinement (DMVR) circuit. The MVP derivation circuit derives a first MVP for a current block, stores the first MVP into the first storage device, and performs a new task. The DMVR circuit performs a DMVR operation to derive a first motion vector difference (MVD) for the first MVP. The MVP derivation circuit starts performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP.
10. A motion vector refinement method comprising:
deriving a first motion vector predictor (mvp) for a current block;
storing the first mvp into a first storage device;
performing a decoder side motion vector refinement (dmvr) operation to derive a first motion vector difference (MVD) for the first mvp;
before deriving the first MVD for the first mvp is finished, starting to perform a new task;
reading the first mvp from the first storage device;
combining the first mvp and the first MVD to generate a first refined mv for the current block, wherein the first refined mv is in a first representation format;
converting the first refined mv in the first representation format into a second refined mv in a second representation format; and
storing the second refined mv into a second storage device, wherein the second representation format is different from the first representation format.
15. A motion vector refinement method comprising:
deriving a first motion vector predictor (mvp) for a current block;
storing the first mvp into a first storage device;
performing a decoder side motion vector refinement (dmvr) operation to derive a first motion vector difference (MVD) for the first mvp;
storing the first MVD into the first storage device;
before deriving the first MVD for the first mvp is finished, starting to perform a new task;
reading the first mvp and the first MVD from the first storage device;
combining the first mvp and the first MVD to generate a first refined mv for the current block, wherein the first refined mv is in a first representation format;
converting the first refined mv in the first representation format into a second refined mv in a second representation format, wherein the second representation format is different from the first representation format;
converting the second refined mv in the second representation format into a third refined mv in the first representation format; and
providing the third refined mv for mvp derivation.
18. A motion vector refinement method comprising:
deriving a first motion vector predictor (mvp) for a current block;
storing the first mvp into a first storage device;
performing a decoder side motion vector refinement (dmvr) operation to derive a first motion vector difference (MVD) for the first mvp;
storing the first MVD into the first storage device; and
before deriving the first MVD for the first mvp is finished, starting to perform a new task;
wherein storing the first mvp into the first storage device comprises:
storing the first mvp at a first address of the first storage device;
storing the first MVD into the first storage device comprises:
storing the first MVD at a second address of the first storage device; and
the motion vector refinement method further comprises:
generating a starting read address of a burst mode of the first storage device, wherein in response to the starting read address, the burst mode of the first storage device reads a plurality of consecutive addresses, where the first address and the second address are a part of the plurality of consecutive addresses.
9. A motion vector refinement apparatus comprising:
a first storage device;
a motion vector predictor (mvp) derivation circuit, arranged to derive a first mvp for a current block, store the first mvp into the first storage device, and perform a new task;
a decoder side motion vector refinement (dmvr) circuit, arranged to perform a dmvr operation to derive a first motion vector difference (MVD) for the first mvp, and store the first MVD into the first storage device, wherein the mvp derivation circuit is free to start performing the new task before the dmvr circuit finishes deriving the first MVD for the first mvp;
wherein the mvp derivation circuit is arranged to store the first mvp at a first address of the first storage device, the dmvr circuit is arranged to store the first MVD at a second address of the first storage device, and the motion vector refinement apparatus further comprises:
an address generation circuit, arranged to generate a starting read address of a burst mode of the first storage device, wherein the burst mode of the first storage device is arranged to read a plurality of consecutive addresses, and the first address and the second address are a part of the plurality of consecutive addresses.
1. A motion vector refinement apparatus comprising:
a first storage device;
a motion vector predictor (mvp) derivation circuit, arranged to derive a first mvp for a current block, store the first mvp into the first storage device, and perform a new task;
a decoder side motion vector refinement (dmvr) circuit, arranged to perform a dmvr operation to derive a first motion vector difference (MVD) for the first mvp;
a second storage device;
a combining circuit, arranged to read the first mvp from the first storage device, and combine the first mvp and the first MVD to generate a first refined mv for the current block, wherein the first refined mv is in a first representation format; and
a first format conversion circuit, arranged to receive the first refined mv output from the combining circuit, convert the first refined mv in the first representation format into a second refined mv in a second representation format, and store the second refined mv into the second storage device, wherein the second representation format is different from the first representation format;
wherein the mvp derivation circuit is free to start performing the new task before the dmvr circuit finishes deriving the first MVD for the first mvp.
6. A motion vector refinement apparatus comprising:
a first storage device;
a motion vector predictor (mvp) derivation circuit, arranged to derive a first mvp for a current block, store the first mvp into the first storage device, and perform a new task;
a decoder side motion vector refinement (dmvr) circuit, arranged to perform a dmvr operation to derive a first motion vector difference (MVD) for the first mvp, and store the first MVD into the first storage device, wherein the mvp derivation circuit is free to start performing the new task before the dmvr circuit finishes deriving the first MVD for the first mvp;
a combining circuit, arranged to read the first mvp and the first MVD from the first storage device, and combine the first mvp and the first MVD to generate a first refined mv for the current block, wherein the first refined mv is in a first representation format;
a first format conversion circuit, arranged to receive the first refined mv output from the combining circuit, and convert the first refined mv in the first representation format into a second refined mv in a second representation format, wherein the second representation format is different from the first representation format; and
a second format conversion circuit, arranged to receive the second refined mv output from the first format conversion circuit, convert the second refined mv in the second representation format into a third refined mv in the first representation format, and provide the third refined mv to the mvp derivation circuit.
2. The motion vector refinement apparatus of
3. The motion vector refinement apparatus of
4. The motion vector refinement apparatus of
5. The motion vector refinement apparatus of
a second format conversion circuit, arranged to read the second refined mv from the second storage device, convert the second refined mv in the second representation format into a third refined mv in the first representation format, and provide the third refined mv to the mvp derivation circuit.
7. The motion vector refinement apparatus of
8. The motion vector refinement apparatus of
11. The motion vector refinement method of
12. The motion vector refinement method of
13. The motion vector refinement method of
14. The motion vector refinement method of
reading the second refined mv from the second storage device;
converting the second refined mv in the second representation format into a third refined mv in the first representation format; and
providing the third refined mv for mvp derivation.
16. The motion vector refinement method of
17. The motion vector refinement method of
The present invention relates to motion vector refinement, and more particularly, to a motion vector refinement apparatus having a motion vector predictor (MVP) derivation circuit that is allowed to start a new task without waiting for motion vector difference (MVD) computation and an associated motion vector refinement method.
The conventional video coding standards generally adopt a block based coding technique to exploit spatial and temporal redundancy. For example, the basic approach is to divide the whole source picture into a plurality of blocks, perform intra/inter prediction on each block, transform residues of each block, and perform quantization and entropy encoding. Besides, a reconstructed picture is generated in a coding loop to provide reference pixel data used for coding following blocks. For certain video coding standards, in-loop filter(s) may be used for enhancing the image quality of the reconstructed picture.
The video decoder is used to perform an inverse operation of a video encoding operation performed by a video encoder. For example, the video decoder may have a plurality of processing circuits, such as an entropy decoding circuit, an intra prediction circuit, a motion compensation circuit, an inverse quantization circuit, an inverse transform circuit, a reconstruction circuit, and in-loop filter(s). When a merge mode is selected, motion information of a current block in a current picture may be set from motion information of a spatially or temporally neighboring block, and thus may suffer from reduced precision. To refine the merge-mode motion vector (MV) without signaling, a decoder side motion vector refinement (DMVR) algorithm may be employed. Specifically, to refine the merge-mode MV, the DMVR algorithm has to traverse neighboring points to find the minimum sum of absolute differences (SAD), and refers to the position with the minimum SAD to determine a motion vector difference (MVD). For example, when the search range is ±2, a total of 25 points (5×5) are in the search window, and the center point of the search window is pointed to by a motion vector predictor (MVP) derived from motion information of a previously decoded block. The final MV (i.e. refined MV) for the current block in the current picture can be obtained by combining the MVP and the MVD.
The MVD for refining the MVP determined for the current block in the current picture is not signaled from a video encoder to a video decoder, and is computed at the video decoder through the DMVR algorithm. Specifically, to compute the MVD needed for refining the MVP, reference pixels in a forward reference picture and a backward reference picture must be read from a dynamic random access memory (DRAM), where the forward reference picture (e.g. one picture included in a reference picture list L0) is in the past with respect to the current picture in a display order, the backward reference picture (e.g. one picture included in a reference picture list L1) is in the future with respect to the current picture in the display order, and the distance between the forward reference picture and the current picture is the same as the distance between the current picture and the backward reference picture. As mentioned above, the final MV (i.e. refined MV) for the current block can be obtained when computation of the MVD for the current block is done. In accordance with the conventional video decoder design, computation of the MVP of a next block is not started until the final MV (i.e. refined MV) for the current block is obtained. That is, computation of the MVP for the next block is not started until computation of the MVD for the current block is done. Since computation of the MVD for the current block requires reference pixels read from the DRAM, and reading the reference pixels from the DRAM requires several DRAM clock cycles, computation of the MVP of the next block has to wait for the end of the computation of the MVD for the current block due to the DRAM latency, which results in degraded decoder performance.
One of the objectives of the claimed invention is to provide a motion vector refinement apparatus having a motion vector predictor (MVP) derivation circuit that is allowed to start a new task without waiting for motion vector difference (MVD) computation and an associated motion vector refinement method.
According to a first aspect of the present invention, an exemplary motion vector refinement apparatus is disclosed. The exemplary motion vector refinement apparatus includes a first storage device, a motion vector predictor (MVP) derivation circuit, and a decoder side motion vector refinement (DMVR) circuit. The MVP derivation circuit is arranged to derive a first MVP for a current block, store the first MVP into the first storage device, and perform a new task. The DMVR circuit is arranged to perform a DMVR operation to derive a first motion vector difference (MVD) for the first MVP. The MVP derivation circuit is free to start performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP.
According to a second aspect of the present invention, an exemplary motion vector refinement method is disclosed. The exemplary motion vector refinement method includes: deriving a first motion vector predictor (MVP) for a current block; storing the first MVP into a first storage device; performing a decoder side motion vector refinement (DMVR) operation to derive a first motion vector difference (MVD) for the first MVP; and before deriving the first MVD for the first MVP is finished, starting to perform a new task.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Certain terms are used throughout the following description and claims, which refer to particular components. As one skilled in the art will appreciate, electronic equipment manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
The MVP derivation circuit 102 is arranged to derive a motion vector predictor MVPFX_1 for a current block in a current picture, and store the motion vector predictor MVPFX_1 into the MV buffer 114. In this embodiment, the motion vector predictor MVPFX_1 is in a first representation format such as a fixed-point format. After the motion vector predictor MVPFX_1 is derived from motion vectors of neighbors of the current block under a merge mode, the MVP derivation circuit 102 is further arranged to provide side information INFMVP to the DMVR circuit 104, and provide side information INFADDR to the address generation circuit 106. In response to the side information INFADDR, the address generation circuit 106 determines read addresses at which reference data D_REF (e.g. reference pixels in the forward reference picture and the backward reference picture) needed for computation of the motion vector difference MVDFX_1 are stored. In other words, the side information INFADDR is sent to the address generation circuit 106 to request the reference data D_REF that are stored in the DRAM 116.
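The role of the address generation circuit can be illustrated with a minimal sketch. The description does not define the contents of INFADDR or the DRAM layout, so everything below is an assumption made for illustration only: an integer-pel MVP, one byte per pixel, a row-major reference picture with a fixed stride, no interpolation margin, and no clamping at picture boundaries. The function names are hypothetical.

```python
def reference_window(block_x, block_y, block_w, block_h,
                     mvp_x, mvp_y, search_range=2):
    """Return (x0, y0, width, height) of the reference region that covers
    every candidate position within +/-search_range around the MVP.
    Assumes an integer-pel MVP (hypothetical simplification)."""
    x0 = block_x + mvp_x - search_range
    y0 = block_y + mvp_y - search_range
    return x0, y0, block_w + 2 * search_range, block_h + 2 * search_range


def row_read_addresses(base_addr, stride, window):
    """Return one (start_address, length) pair per row of the window,
    assuming a row-major picture with one byte per pixel."""
    x0, y0, width, height = window
    return [(base_addr + (y0 + row) * stride + x0, width)
            for row in range(height)]


# Example: an 8x8 block at (16, 8) with MVP (3, -1) in a picture whose
# rows are 64 bytes apart and which starts at address 0x1000.
window = reference_window(16, 8, 8, 8, 3, -1)
requests = row_read_addresses(0x1000, 64, window)
```

Under these assumptions, each side of the window grows by the search range, so an 8x8 block with a ±2 search range requires a 12x12 reference region per reference picture.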
The DMVR circuit 104 is arranged to perform a DMVR operation to derive a motion vector difference MVDFX_1 for the motion vector predictor MVPFX_1 after receiving the side information INFMVP from the MVP derivation circuit 102. For example, the DMVR circuit 104 reads the reference data D_REF from the DRAM 116, calculates 25 SAD values for 25 positions within a search window centered at a position of the current block, finds a minimum SAD value among the 25 SAD values, and determines the motion vector difference MVDFX_1 according to a position with the minimum SAD value. In this embodiment, the motion vector difference MVDFX_1 is also in the first representation format such as the fixed-point format.
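The 25-point search described above can be sketched as follows. This is a simplified illustration rather than the circuit's actual behavior. It assumes integer-pel candidates and mirrored bilateral matching, in which the forward block is displaced by +d and the backward block by -d; the description itself only states that 25 SAD values are computed and the minimum is selected. Pictures are plain lists of rows, and all function names are hypothetical.

```python
def crop(picture, x, y, w, h):
    """Extract a w x h block whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def dmvr_search(fwd, bwd, x, y, w, h, search_range=2):
    """Return (mvd_x, mvd_y, min_sad) over the (2R+1)^2 candidates.
    (x, y) is the block position already displaced by the MVP.
    The center is evaluated first so that it wins ties."""
    offsets = [(0, 0)] + [(dx, dy)
                          for dy in range(-search_range, search_range + 1)
                          for dx in range(-search_range, search_range + 1)
                          if (dx, dy) != (0, 0)]
    best = None
    for dx, dy in offsets:
        cost = sad(crop(fwd, x + dx, y + dy, w, h),
                   crop(bwd, x - dx, y - dy, w, h))
        if best is None or cost < best[2]:
            best = (dx, dy, cost)
    return best
```

With a search range of 2, the `offsets` list contains 25 entries, which matches the 25 SAD values mentioned above.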
Since the motion vector predictor MVPFX_1 obtained by the MVP derivation circuit 102 is stored into the MV buffer 114, the combining circuit 108 is arranged to obtain the motion vector predictor MVPFX_1 from the MV buffer 114 rather than from the MVP derivation circuit 102. With the help of the MV buffer 114 that offers MVP buffering between the MVP derivation circuit 102 and the combining circuit 108, the MVP derivation circuit 102 is allowed to start a new task before the DMVR circuit 104 finishes deriving the motion vector difference MVDFX_1 for the motion vector predictor MVPFX_1 of the current block. In some embodiments, the new task includes at least one of deriving a motion vector predictor MVPFX_2 for a next block, reading data from a storage device (e.g., DRAM 116 or an SRAM which is not shown herein) for a later computation, writing data to the storage device for a later computation, or any other task independent of deriving the motion vector difference, so that free computation resources are used efficiently. In the embodiment in which the new task is deriving the motion vector predictor MVPFX_2 for the next block, after determining the motion vector predictor MVPFX_2 for the next block, the MVP derivation circuit 102 stores the motion vector predictor MVPFX_2 into the MV buffer 114, and initiates an MVP computation process of a next block. The DRAM latency of reading the reference data D_REF for computation of the motion vector difference MVDFX_1 can be fully or partially hidden in a period during which computation of the motion vector predictor MVPFX_2 is performed at the MVP derivation circuit 102. Since computation of the next motion vector predictor MVPFX_2 does not need to wait for the end of computation of the current motion vector difference MVDFX_1, the decoder performance can be greatly improved.
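The benefit of decoupling the two circuits can be illustrated with a rough timing model. This is a sketch under stated assumptions, not a measurement of any actual decoder: each block needs a fixed number of cycles for MVP derivation and a fixed number for the DMVR operation (including DRAM latency), the DMVR circuit handles one block at a time, the buffer never fills up, and the MVP of a block does not depend on the refined MV of the immediately preceding block. The function names and cycle counts are hypothetical.

```python
def serial_finish_time(num_blocks, t_mvp, t_dmvr):
    """Conventional design: the MVP derivation of the next block waits
    until the MVD of the current block has been computed."""
    return num_blocks * (t_mvp + t_dmvr)


def decoupled_finish_time(num_blocks, t_mvp, t_dmvr):
    """Buffered design: the MVP derivation runs ahead, and the DMVR stage
    starts a block as soon as its MVP is ready and the stage is free."""
    mvp_done = 0
    dmvr_done = 0
    for _ in range(num_blocks):
        mvp_done += t_mvp
        dmvr_done = max(mvp_done, dmvr_done) + t_dmvr
    return dmvr_done


# Example with hypothetical cycle counts: 4 blocks, 6 cycles per MVP
# derivation, 10 cycles per DMVR operation.
print(serial_finish_time(4, 6, 10), decoupled_finish_time(4, 6, 10))
```

In this model the saving equals the MVP derivation time of every block after the first, because that work overlaps with the DMVR operation of the preceding block.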
After the motion vector difference MVDFX_1 is determined by the DMVR circuit 104, the combining circuit 108 is arranged to read the motion vector predictor MVPFX_1 from the MV buffer 114, receive the motion vector difference MVDFX_1 output from the DMVR circuit 104, and combine the motion vector predictor MVPFX_1 and the motion vector difference MVDFX_1 to generate a refined motion vector MVFX_1 (MVFX_1=MVPFX_1+MVDFX_1) for the current block, wherein the refined motion vector MVFX_1 is in the first representation format such as the fixed-point format. To reduce the memory usage, the format conversion circuit 112 is arranged to perform format conversion upon the refined motion vector MVFX_1. Specifically, the format conversion circuit 112 is arranged to receive the refined motion vector MVFX_1 output from the combining circuit 108, convert the refined motion vector MVFX_1 in the first representation format into a refined motion vector MVFP_1 in a second representation format such as a floating-point format, and store the refined motion vector MVFP_1 into the DRAM 116 for later use. For example, the refined motion vector MVFX_1 in the first representation format has a bit length of 18, and the refined motion vector MVFP_1 in the second representation format has a bit length of 10. It should be noted that there may be a conversion loss resulting from converting an 18-bit fixed-point representation to a 10-bit floating-point representation consisting of, for example, a 4-bit exponent and a 6-bit mantissa.
When the motion vector of the current block is to be selected as a candidate of a motion vector predictor for a later decoded block (e.g. a block in a picture that is in the future with respect to the current picture in the display order), the format conversion circuit 110 is arranged to read the refined motion vector MVFP_1 from the DRAM 116, and perform format conversion upon the refined motion vector MVFP_1. Specifically, the format conversion circuit 110 is arranged to convert the refined motion vector MVFP_1 in the second representation format (e.g. floating-point format) into a refined motion vector MVFX_1′ in the first representation format (e.g. fixed-point format), and provide the refined motion vector MVFX_1′ to the MVP derivation circuit 102. Since there may be a conversion loss resulting from converting a fixed-point representation to a floating-point representation, the refined motion vector MVFX_1′ is not necessarily the same as the refined motion vector MVFX_1.
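The two format conversions and the resulting conversion loss can be illustrated for one motion vector component. The description gives only the bit widths (18-bit fixed point, 10 bits with a 4-bit exponent and a 6-bit mantissa), so the exact encoding below is an assumption: the mantissa is treated as a 6-bit two's-complement value, the decoded value is the mantissa shifted left by the exponent, and the low bits are truncated rather than rounded. The function names are hypothetical.

```python
MANT_BITS = 6
MANT_MIN = -(1 << (MANT_BITS - 1))      # -32
MANT_MAX = (1 << (MANT_BITS - 1)) - 1   # 31
MANT_MASK = (1 << MANT_BITS) - 1        # 0x3F


def fixed_to_float10(mv_fx):
    """Pack a signed 18-bit component into 10 bits (4-bit exponent in the
    upper bits, 6-bit signed mantissa in the lower bits). Assumed layout."""
    if not -(1 << 17) <= mv_fx < (1 << 17):
        raise ValueError("component does not fit in 18 signed bits")
    exponent = 0
    # Drop low bits until the remaining value fits in the signed mantissa.
    while not MANT_MIN <= (mv_fx >> exponent) <= MANT_MAX:
        exponent += 1
    mantissa = mv_fx >> exponent
    return (exponent << MANT_BITS) | (mantissa & MANT_MASK)


def float10_to_fixed(code):
    """Unpack the 10-bit code back into a fixed-point component."""
    exponent = code >> MANT_BITS
    mantissa = code & MANT_MASK
    if mantissa > MANT_MAX:             # restore the sign of the mantissa
        mantissa -= 1 << MANT_BITS
    return mantissa << exponent


# Small values survive the round trip; larger ones lose their low bits.
for value in (5, 1000, -1000):
    print(value, float10_to_fixed(fixed_to_float10(value)))
```

In this sketch, any value in the 18-bit range needs an exponent of at most 12, so the exponent fits in 4 bits. The round trip is exact only when the value fits in the mantissa, which is why the reconstructed vector may differ from the original.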
The format conversion circuit 112 converts a refined motion vector of each block in the first representation format (e.g. fixed-point format) into a refined motion vector in the second representation format (e.g. floating-point format), and stores the refined motion vector in the second representation format (e.g. floating-point format) into the DRAM 116.
The MVP derivation circuit 602 is arranged to derive a motion vector predictor MVPFX_1 for a current block in a current picture, and store the motion vector predictor MVPFX_1 into the DRAM 614. In this embodiment, the motion vector predictor MVPFX_1 is in a first representation format such as a fixed-point format. After the motion vector predictor MVPFX_1 is derived from motion vectors of neighbors of the current block under a merge mode, the MVP derivation circuit 602 is further arranged to provide side information INFMVP to the DMVR circuit 604, and provide side information INFADDR to the address generation circuit 606. In response to the side information INFADDR, the address generation circuit 606 determines read addresses at which reference data D_REF (e.g. reference pixels in the forward reference picture and the backward reference picture) needed for computation of the motion vector difference MVDFX_1 are stored. In other words, the side information INFADDR is sent to the address generation circuit 606 to request the reference data D_REF that are stored in the DRAM 614.
The DMVR circuit 604 is arranged to perform a DMVR operation to derive a motion vector difference MVDFX_1 for the motion vector predictor MVPFX_1 after receiving the side information INFMVP from the MVP derivation circuit 602, and store the motion vector difference MVDFX_1 into the DRAM 614. For example, the DMVR circuit 604 reads the reference data D_REF from the DRAM 614, calculates 25 SAD values for 25 positions within a search window centered at a position of the current block, finds a minimum SAD value among the 25 SAD values, and determines the motion vector difference MVDFX_1 according to a position with the minimum SAD value. In this embodiment, the motion vector difference MVDFX_1 is also in the first representation format such as the fixed-point format.
In this embodiment, the motion vector predictor MVPFX_1 and the motion vector difference MVDFX_1 determined for the same block are stored into the DRAM 614 individually. Since the motion vector predictor MVPFX_1 obtained by the MVP derivation circuit 602 is stored into the DRAM 614, the combining circuit 608 is arranged to obtain the motion vector predictor MVPFX_1 from the DRAM 614 rather than from the MVP derivation circuit 602. With the help of the DRAM 614 that buffers an MVP output of the MVP derivation circuit 602, the MVP derivation circuit 602 is allowed to start a new task before the DMVR circuit 604 finishes deriving the motion vector difference MVDFX_1 for the motion vector predictor MVPFX_1 of the current block. In some embodiments, the new task includes at least one of deriving a motion vector predictor MVPFX_2 for a next block, reading data from a storage device (e.g., DRAM 614 or an SRAM which is not shown herein) for a later computation, writing data to the storage device for a later computation, or any other task independent of deriving the motion vector difference, so that free computation resources are used efficiently. In the embodiment in which the new task is deriving the motion vector predictor MVPFX_2 for the next block, after determining the motion vector predictor MVPFX_2 for the next block, the MVP derivation circuit 602 stores the motion vector predictor MVPFX_2 into the DRAM 614, and initiates an MVP computation process of a next block. Hence, the DRAM latency of reading the reference data D_REF for computation of the motion vector difference MVDFX_1 can be fully or partially hidden in a period during which computation of the motion vector predictor MVPFX_2 is performed at the MVP derivation circuit 602. Since computation of the motion vector predictor MVPFX_2 does not need to wait for the end of computation of the motion vector difference MVDFX_1, the decoder performance can be greatly improved.
When the motion vector of the current block is to be selected as a candidate of a motion vector predictor for a later decoded block (e.g. a block in a picture that is in the future with respect to the current picture in the display order), the combining circuit 608 is arranged to read both of the motion vector predictor MVPFX_1 and the motion vector difference MVDFX_1 from the DRAM 614, and combine the motion vector predictor MVPFX_1 and the motion vector difference MVDFX_1 to generate a refined motion vector MVFX_1 (MVFX_1=MVPFX_1+MVDFX_1) for the current block, wherein the refined motion vector MVFX_1 is in the first representation format such as the fixed-point format.
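The deferred combination in this embodiment can be sketched in software terms. This is only an illustration of the data flow, with a dictionary standing in for the DRAM 614; the class and method names are hypothetical, and motion vectors are modeled as (x, y) integer pairs in the fixed-point format.

```python
class MvStore:
    """Keeps the MVP and MVD of each block separately and combines them
    only when the refined MV is actually requested."""

    def __init__(self):
        self._mvp = {}
        self._mvd = {}

    def store_mvp(self, block_id, mvp):
        # Written by the MVP derivation stage, which may then move on.
        self._mvp[block_id] = mvp

    def store_mvd(self, block_id, mvd):
        # Written later by the DMVR stage once the search has finished.
        self._mvd[block_id] = mvd

    def refined_mv(self, block_id):
        """Combine on demand: refined MV = MVP + MVD, per component."""
        mvp = self._mvp[block_id]
        mvd = self._mvd.get(block_id)
        if mvd is None:
            raise LookupError("MVD of this block is not available yet")
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])


store = MvStore()
store.store_mvp(0, (40, -12))   # MVP derivation finishes first
store.store_mvd(0, (1, -1))     # DMVR result arrives later
print(store.refined_mv(0))
```

Because the two writes are independent, the stage that stores the MVP never has to wait for the stage that stores the MVD.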
Regarding the embodiment shown in
Regarding each block to be decoded, the MVP derivation circuit 602 determines a motion vector predictor and stores the motion vector predictor into the DRAM 614, and the DMVR circuit 604 determines a motion vector difference and stores the motion vector difference into the DRAM 614. Since the motion vector predictor and the motion vector difference of the same block may be read from the DRAM 614 for determining a refined motion vector, a DRAM footprint may be properly designed to ensure that the motion vector predictor and the motion vector difference of the same block can be retrieved by a single burst transfer under a burst mode of the DRAM 614. Please refer to
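One possible footprint of this kind is sketched here. The layout is an assumption chosen for illustration: each block owns two consecutive word addresses, the first holding the MVP and the second holding the MVD, so that a burst of length 2 starting at the MVP address returns both. The constant, the function names, and the list standing in for the DRAM are hypothetical.

```python
WORDS_PER_BLOCK = 2  # hypothetical layout: word 0 = MVP, word 1 = MVD


def mvp_address(base, block_index):
    """First address: where the MVP derivation stage writes the MVP."""
    return base + block_index * WORDS_PER_BLOCK


def mvd_address(base, block_index):
    """Second address: where the DMVR stage writes the MVD."""
    return mvp_address(base, block_index) + 1


def burst_start_address(base, block_index):
    """Starting read address produced by the address generation step."""
    return mvp_address(base, block_index)


def burst_read(memory, start, burst_length=WORDS_PER_BLOCK):
    """Model of a burst: read burst_length consecutive addresses."""
    return [memory[start + i] for i in range(burst_length)]


memory = [None] * 16                     # stand-in for the storage device
memory[mvp_address(4, 3)] = (40, -12)    # MVP of block 3
memory[mvd_address(4, 3)] = (1, -1)      # MVD of block 3
mvp, mvd = burst_read(memory, burst_start_address(4, 3))
```

Placing the two words next to each other means that one starting address is enough to fetch both operands of the combination, instead of issuing two separate reads.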
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
Chen, Chi-Hung, Li, Cheng-Han, Lin, Hong-Cheng
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 21 2022 | CHEN, CHI-HUNG | MEDIATEK INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 058766 | /0308 | |
Jan 21 2022 | LI, CHENG-HAN | MEDIATEK INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 058766 | /0308 | |
Jan 21 2022 | LIN, HONG-CHENG | MEDIATEK INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 058766 | /0308 | |
Jan 25 2022 | MEDIATEK INC. | (assignment on the face of the patent) | / |