A disparity vector estimation method and an apparatus are provided for encoding and decoding a multi-view picture using the disparity vector estimation method. The method of estimating a disparity vector of a multi-view picture includes determining a disparity vector between two frames having a different viewpoint from a current viewpoint, and calculating a disparity vector of a current viewpoint frame using the determined disparity vector and a certain translation parameter.
1. A method of estimating a disparity vector of a multi-view picture, the method comprising:
determining a disparity vector between two frames having a different viewpoint from a current viewpoint; and
calculating a disparity vector of a current viewpoint frame using the disparity vector between the two frames and a translation parameter,
wherein at least one of the determining and the calculating is performed by a processor of a computer system,
the two frames having the different viewpoint from the current viewpoint are first and second viewpoint frames, and
the disparity vector of the current viewpoint frame is dv2 which is calculated using dv2=dv1×cd2/cd1, where dv1 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv2 denotes a disparity vector between the first block of the second viewpoint frame and a second block of the current viewpoint frame corresponding to the first block, and cd2/cd1 is a value corresponding to the translation parameter, or
the disparity vector of the current viewpoint frame is dv4 which is calculated using dv4=dv3×cd4/cd3, where dv3 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv4 denotes a disparity vector, relative to the second viewpoint frame, of a second block of the current viewpoint frame that is co-located with the first block, and cd4/cd3 is a value corresponding to the translation parameter.
12. A non-transitory computer-readable recording medium having recorded thereon a program which when executed by a computer, causes the computer to execute a method of estimating a disparity vector of a multi-view picture, the method comprising:
determining a disparity vector between two frames having a different viewpoint from a current viewpoint; and
calculating a disparity vector of a current viewpoint frame using the disparity vector between the two frames and a translation parameter,
wherein the two frames having the different viewpoint from the current viewpoint are a first viewpoint frame and a second viewpoint frame, and
the disparity vector of the current viewpoint frame is dv2 which is calculated using dv2=dv1×cd2/cd1, where dv1 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv2 denotes a disparity vector between the first block of the second viewpoint frame and a second block of the current viewpoint frame corresponding to the first block, and cd2/cd1 is a value corresponding to the translation parameter, or
the disparity vector of the current viewpoint frame is dv4 which is calculated using dv4=dv3×cd4/cd3, where dv3 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv4 denotes a disparity vector, relative to the second viewpoint frame, of a second block of the current viewpoint frame that is co-located with the first block, and cd4/cd3 is a value corresponding to the translation parameter.
4. A multi-view picture encoding apparatus comprising:
a multi-view picture input unit which receives multi-view pictures and a translation parameter; and
an encoding unit which generates an encoded multi-view picture bit stream including the multi-view pictures and the translation parameter,
wherein the encoding unit comprises a spatial direct mode performing unit which estimates a disparity using spatial direct mode estimation that calculates a disparity vector of a current viewpoint frame using a disparity vector between two frames having a different viewpoint from a current viewpoint and the translation parameter,
the two frames having the different viewpoint from the current viewpoint are first and second viewpoint frames, and
the disparity vector of the current viewpoint frame is dv2 which is calculated by the spatial direct mode performing unit using dv2=dv1×cd2/cd1, where dv1 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv2 denotes a disparity vector between the first block of the second viewpoint frame and a second block of the current viewpoint frame corresponding to the first block, and cd2/cd1 is a value corresponding to the translation parameter, or
the disparity vector of the current viewpoint frame is dv4 which is calculated by the spatial direct mode performing unit using dv4=dv3×cd4/cd3, where dv3 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv4 denotes a disparity vector, relative to the second viewpoint frame, of a second block of the current viewpoint frame that is co-located with the first block, and cd4/cd3 is a value corresponding to the translation parameter.
9. A multi-view picture decoding apparatus comprising:
an information confirming unit which receives an encoded multi-view picture bit stream and confirms information indicating a disparity vector estimation method included in the received multi-view picture bit stream; and
a decoding unit which decodes the encoded multi-view picture based on the confirmed information,
wherein the decoding unit comprises a spatial direct mode performing unit which, if the confirmed information indicates spatial direct mode estimation, calculates a disparity vector of a current viewpoint frame using a disparity vector between two frames having a different viewpoint from a current viewpoint and a translation parameter, and estimates a disparity,
the two frames having the different viewpoint from the current viewpoint are first and second viewpoint frames, and
the disparity vector of the current viewpoint frame is dv2 which is calculated by the spatial direct mode performing unit using dv2=dv1×cd2/cd1, where dv1 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv2 denotes a disparity vector between the first block of the second viewpoint frame and a second block of the current viewpoint frame corresponding to the first block, and cd2/cd1 is a value corresponding to the translation parameter, or
the disparity vector of the current viewpoint frame is dv4 which is calculated by the spatial direct mode performing unit using dv4=dv3×cd4/cd3, where dv3 denotes the disparity vector between the two frames which is calculated using a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block, dv4 denotes a disparity vector, relative to the second viewpoint frame, of a second block of the current viewpoint frame that is co-located with the first block, and cd4/cd3 is a value corresponding to the translation parameter.
2. The method of
3. The method of
5. The apparatus of
6. The apparatus of
7. The apparatus of
8. The apparatus of
10. The apparatus of
11. The apparatus of
the decoding unit decodes the multi-view picture using the previously received translation parameter.
This application claims priority from U.S. Patent Application No. 60/721,578, filed on Sep. 29, 2005 in the U.S. Patent and Trademark Office, and Korean Patent Application No. 10-2006-0033209, filed on Apr. 12, 2006 in the Korean Intellectual Property Office, the disclosures of which are incorporated herein in their entirety by reference.
1. Field of the Invention
The present invention relates to a multi-view picture encoding and decoding apparatus, and more particularly, to a disparity vector estimation method of quickly encoding a multi-view picture and improving the compressibility of the multi-view picture, and an apparatus for encoding and decoding a multi-view picture using the disparity vector estimation method.
2. Description of the Related Art
Recently, the H.264 video coding standard has been developed to achieve high encoding efficiency compared to conventional standards. The H.264 standard relies on various characteristics, such as a variable block size between 16×16 and 4×4, a quadtree structure for motion compensation, a loop de-blocking filter, multiple reference frames, intra prediction, and context-adaptive entropy coding, as well as general bi-directional (B) estimation slices. Unlike in the MPEG-2 standard, the MPEG-4 Part 2 standard, etc., B slices can be used as reference slices while using multi-prediction obtained from the same direction (forward or backward). However, the above-described characteristics require many bits for encoding motion information, including an estimation mode, motion vectors, and/or reference images.
In order to overcome this problem, a skip mode and a direct mode have been introduced for predicted (P) slices and B slices, respectively. The skip and direct modes allow the motion of an arbitrary block of a picture to be currently encoded to be estimated using previously encoded motion vector information. Accordingly, no additional motion data is encoded for such blocks or macroblocks (MBs). Motions for these modes are obtained using spatial (skip) or temporal (direct) correlation of motions of adjacent MBs or pictures.
In the direct mode, a forward motion vector and a backward motion vector are obtained using a motion vector of a co-located block of a temporally following P image, when estimating a motion of an arbitrary block of a B picture to be currently encoded.
In order to calculate a forward motion vector MVL0 and a backward motion vector MVL1 of a direct mode block 102 whose motion is to be estimated in a B picture 110, a motion vector MV of a co-located block 104 of a temporally following picture, which refers to a reference list 0 picture 130, is detected. The co-located block 104 is at the same position as the direct mode block 102 in the current B picture 110. Thus, the forward motion vector MVL0 and the backward motion vector MVL1 of the direct mode block 102 of the B picture 110 are calculated using Equations 1 and 2 as follows.

MVL0=MV×TRB/TRD (1)

MVL1=MV×(TRB−TRD)/TRD (2)
where MV represents the motion vector of the co-located block 104 of the reference list 1 picture 120, TRD represents a distance between the reference list 0 picture 130 and the reference list 1 picture 120, and TRB represents a distance between the B picture 110 and the reference list 0 picture 130.
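The temporal direct mode computation described above can be sketched as follows; the function and variable names are ours, not taken from any reference implementation:

```python
def temporal_direct_mvs(mv, trb, trd):
    """Derive direct-mode motion vectors for a B-picture block.

    mv:  (x, y) motion vector of the co-located block of the
         reference list 1 picture, pointing to the list 0 picture.
    trb: temporal distance between the B picture and the list 0 picture.
    trd: temporal distance between the list 0 and list 1 pictures.
    """
    # Forward vector: scale MV by the fraction of the interval covered.
    mvl0 = (mv[0] * trb / trd, mv[1] * trb / trd)
    # Backward vector: scale MV by the remaining (negative) fraction.
    mvl1 = (mv[0] * (trb - trd) / trd, mv[1] * (trb - trd) / trd)
    return mvl0, mvl1
```

For example, with mv=(8, 4), trb=1, and trd=2, the forward vector is (4.0, 2.0) and the backward vector is (-4.0, -2.0).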
According to the H.264 standard used for encoding moving picture data, a frame is divided into blocks, each having a predetermined size, and motion searching for a block most similar to a block of an adjacent frame(s) subjected to encoding is performed. That is, the median of the motion vectors of a left macroblock 4, an upper middle macroblock 2, and an upper right macroblock 3 of a current macroblock c is determined as the estimation value of the corresponding motion vector. The motion vector estimation can be expressed by Equation 3 as follows.

MVc=median(MV4, MV2, MV3) (3)
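The median motion vector prediction of Equation 3 can be sketched as follows (names are ours; the median is taken component-wise over the three neighboring vectors):

```python
def median_mv_prediction(mv_left, mv_upper, mv_upper_right):
    """Component-wise median of the three neighboring motion vectors."""
    def median3(a, b, c):
        # Middle element of the three values.
        return sorted((a, b, c))[1]
    return tuple(median3(a, b, c)
                 for a, b, c in zip(mv_left, mv_upper, mv_upper_right))
```

For example, neighbors (1, 5), (3, 2), and (2, 9) yield the predictor (2, 5): the x-components 1, 3, 2 have median 2, and the y-components 5, 2, 9 have median 5.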
As such, a method of encoding a moving picture using spatial correlation as well as temporal correlation has been proposed. However, a method of enhancing the compressibility and processing speed of a multi-view picture, which has significantly more information than a general moving picture, is still required.
The present invention provides a method and apparatus for encoding a multi-view picture, in order to enhance the compressibility of a multi-view picture and quickly perform encoding of the multi-view picture by estimating disparity using camera parameters.
According to an aspect of the present invention, there is provided a method of estimating a disparity vector of a multi-view picture comprising: determining a disparity vector between two frames having a different viewpoint from a current viewpoint; and calculating a disparity vector of a current viewpoint frame using the determined disparity vector and a certain translation parameter.
According to another aspect of the present invention, there is provided a multi-view picture decoding apparatus comprising: an information confirming unit which receives an encoded multi-view picture bit stream and confirms information indicating a disparity vector estimation method included in the received multi-view picture bit stream; and a decoding unit which decodes the encoded multi-view picture based on the confirmed information, wherein the decoding unit comprises a spatial direct mode performing unit which, when the confirmed information indicates spatial direct mode estimation, calculates a disparity vector of a current viewpoint frame using a disparity vector between two frames having a different viewpoint from a current viewpoint and a certain translation parameter and estimates a disparity.
The above and other aspects of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the appended drawings.
Referring to
The multi-view moving picture encoding apparatus receives multi-view image sources obtained from a plurality of camera systems or using a different method. The received multi-view picture sources are stored in the multi-view image buffer 310. The multi-view image buffer 310 provides the stored multi-view picture source data to the estimating unit 320 and the residual image encoder 340.
The estimating unit 320 includes a disparity estimator 322 and a motion estimator 324, and performs disparity estimation and motion estimation on the stored multi-view image sources.
The disparity/motion compensator 330 performs disparity and motion compensation using disparity vectors and motion vectors estimated by the disparity estimator 322 and the motion estimator 324. The disparity/motion compensator 330 reconstructs an image obtained using the estimated motion and disparity vectors and provides the reconstructed image to the residual image encoder 340.
The residual image encoder 340 encodes a residual image obtained by subtracting the image compensated and reconstructed by the disparity/motion compensator 330 from the original image provided by the multi-view image buffer 310 and provides the encoded residual image to the entropy encoder 350.
The entropy encoder 350 receives the disparity vectors and the motion vectors estimated by the estimating unit 320 and the encoded residual image from the residual image encoder 340, and generates a bitstream for the multi-view video source data.
In the disparity estimator 322 illustrated in
The description of
X′=K′R12K−1X+K′t12/Z (4)
X″=K″R13K−1X+K″t13/Z (5)
wherein K, K′, and K″ denote camera intrinsic parameters of pictures 1, 2, and 3, respectively, R12 and R13 denote camera rotation parameters of pictures 2 and 3, respectively, with respect to picture 1, t12 and t13 denote camera translation parameters of pictures 2 and 3, respectively, with respect to picture 1, and Z denotes a value of a certain point with respect to the Z coordinates.
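Equations 4 and 5 both have the form X′ = K′RK⁻¹X + K′t/Z. A minimal numerical sketch of this mapping (names are ours) is:

```python
import numpy as np

def project_to_view(x_hom, K_src, K_dst, R, t, Z):
    """Map a homogeneous pixel x_hom, seen at depth Z in the source view,
    into the destination view: X' = K' R K^-1 X + K' t / Z."""
    # Homography part (rotation between views) plus depth-dependent shift.
    return K_dst @ R @ np.linalg.inv(K_src) @ x_hom + (K_dst @ t) / Z
```

With identity intrinsics and rotation (the pure-translation case discussed next), a point (3, 4, 1) at depth Z=2 under translation t=(1, 0, 0) maps to (3.5, 4, 1): only a horizontal disparity of t/Z appears, as Equations 6 to 8 predict.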
Pictures photographed by the three multi-view cameras in the pure-translation camera setup, i.e., a setup in which three multi-view cameras having the same intrinsic camera characteristics are arranged in parallel with respect to a certain object, are illustrated in
Referring to
X′=X+Kt12/Z (6)
X″=X+Kt13/Z (7)
X″=X′+Kt23/Z (8)
Equation 9 can be derived from Equations 6 and 7.

X′−X=(X″−X)×t12/t13 (9)
When values of t12, t13, and X″−X, i.e., dv3 are known, a value X′−X, i.e., dv1, can be obtained using Equation 9 instead of a conventional disparity vector search method.
Equation 10 can be derived from Equations 6 and 8.

X″−X′=(X′−X)×t23/t12 (10)
When values of t12, t23, and X′−X, i.e., dv1 are known, a value X″−X′, i.e., dv2, can be obtained using Equation 10 instead of a conventional disparity vector search method.
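Equations 9 and 10 replace the conventional disparity search with a simple scaling by a ratio of camera translations. A sketch under the pure-translation assumption (function name is ours):

```python
def scale_disparity(dv_known, t_target, t_known):
    """Scale a known disparity vector by a ratio of camera translations.

    Equation 9:  dv1 = dv3 * t12 / t13   (dv_known = dv3 = X'' - X)
    Equation 10: dv2 = dv1 * t23 / t12   (dv_known = dv1 = X' - X)
    """
    return tuple(c * t_target / t_known for c in dv_known)
```

For instance, if dv3 = (6, 0) with t12 = 1 and t13 = 3, Equation 9 gives dv1 = (2.0, 0.0); applying Equation 10 to that dv1 with t23 = 2 and t12 = 1 gives dv2 = (4.0, 0.0).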
Consequently, when the pure-translation exists and values of camera translation parameters are known in the multi-view camera setup, a disparity vector between the point X′ of a current encoding frame and the corresponding point X″ of a first reference frame can be estimated using a disparity vector between the point X′ of the current encoding frame and the corresponding point X of a second reference frame.
A multi-view camera system can be set up in a manner different from the pure-translation multi-view camera setup. However, even when a multi-view picture is photographed by such a setup, a rectification process performed as a pre-process or post-process of encoding places the pictures under the same conditions as the pure-translation multi-view camera setup. Therefore, since the disparity vector can be estimated as in the pure-translation multi-view camera setup, the multi-view picture can be encoded using camera parameters corresponding to translations.
Referring to
Referring to
When dv1 and dv2 are in a rectilinear line, Equation 11 is obtained.
dv2=dv1×cd2/cd1 (11)
Therefore, when dv1, cd1, and cd2 are previously known or estimated, dv2 can be calculated using dv1, cd1, and cd2 instead of the conventional estimation method. Furthermore,
As described with reference to
Referring to
In the current exemplary embodiment of the present invention, dv4 can be estimated using the disparity vector dv3 between a co-located block of the current block at the viewpoint n and a corresponding block at the viewpoint n−1 of the co-located block at the viewpoint n. Therefore, the disparity vector of the current block can be calculated by using Equation 12, which is similar to Equation 11.
dv4=dv3×cd4/cd3 (12)
A value of cd4/cd3 corresponds to a translation parameter. The correlations between disparity vectors of pictures at the viewpoints n−1, n, and n+1 have been described with reference to
More specifically, for example, when the two frames having different viewpoints from a current viewpoint are first and second viewpoint frames, and the disparity vector of the current viewpoint frame is dv2, dv2 is calculated using Equation 11, i.e., dv2=dv1×cd2/cd1. dv1 denotes the disparity vector between the two frames, i.e., a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block. dv2 denotes a disparity vector between the first block of the second viewpoint frame and a second block of the current viewpoint frame corresponding to the first block. Also, as described above, cd1 denotes a distance between two cameras photographing the first viewpoint frame and the second viewpoint frame, respectively, cd2 denotes a distance between two cameras photographing the second viewpoint frame and the current viewpoint frame, respectively, and cd2/cd1 is a value corresponding to the translation parameter.
Based on the fact that blocks adjacent to the current block to be estimated have a similar disparity vector to the current block, the disparity vector estimation method of the current exemplary embodiment can be used when two frames having different viewpoints from a current viewpoint are first and second viewpoint frames, and the disparity vector of a current viewpoint frame is dv4 which can be calculated using Equation 12, i.e., dv4=dv3×cd4/cd3.
dv3 denotes the disparity vector between the two frames, i.e., between a certain block of the first viewpoint frame and a first block of the second viewpoint frame corresponding to the certain block. dv4 denotes a disparity vector, relative to the second viewpoint frame, of a second block of the current viewpoint frame that is co-located with the first block. The second block is the block of the current viewpoint frame co-located with the first block and is the current block as depicted in
In the current exemplary embodiment, the first viewpoint frame, second viewpoint frame, and the current viewpoint frame are images photographed by a first camera, a second camera, and a third camera, respectively, which are sequentially arranged in parallel. The translation parameter is a value relating to the distances between the multi-view cameras such as cd2/cd1 or cd4/cd3 and can be transmitted from a multi-view camera system.
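The spatial direct mode scaling of Equations 11 and 12 can be sketched as follows; the camera-distance values in the example are illustrative only:

```python
def spatial_direct_dv(dv_known, cd_target, cd_known):
    """dv2 = dv1 * cd2/cd1 (Eq. 11) or dv4 = dv3 * cd4/cd3 (Eq. 12).

    dv_known:  known disparity vector (dv1 or dv3).
    cd_target: distance between the cameras spanned by the wanted vector.
    cd_known:  distance between the cameras spanned by the known vector.
    """
    return tuple(c * cd_target / cd_known for c in dv_known)
```

When the cameras are equally spaced (cd_target == cd_known), the ratio is 1 and the known disparity simply carries over to the current viewpoint frame; unequal spacing scales it proportionally.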
A multi-view picture encoding apparatus and a multi-view picture decoding apparatus separately perform the spatial direct mode estimation of the current exemplary embodiment. When the multi-view picture encoding apparatus uses the spatial direct mode estimation, not all disparity vectors need to be encoded by the multi-view picture encoding apparatus. Also, the multi-view picture decoding apparatus can determine a disparity vector using the disparity vector estimation method according to the spatial direct mode estimation, so that multi-view picture encoding and decoding can be performed effectively.
Referring to
The multi-view picture input unit 1010 receives the three multi-view pictures and a certain camera parameter from a multi-view camera system including a plurality of cameras. The encoding unit 1020 generates an encoded multi-view picture bit stream including the multi-view pictures and the certain camera parameter.
The encoding unit 1020 comprises a spatial direct mode performing unit 1021 for performing the spatial direct mode estimation when estimating a disparity vector. When the certain camera parameter is a translation parameter, the spatial direct mode performing unit 1021 estimates the disparity vector using the spatial direct mode estimation that calculates a disparity vector of a current viewpoint frame using a disparity vector of two frames having a different viewpoint from a current viewpoint and the translation parameter.
The operation of the spatial direct mode performing unit 1021 is described with reference to
The encoding unit 1020 sets information indicating the disparity vector estimation method used to encode the three multi-view pictures and transmits the multi-view picture bit stream. When the encoding unit 1020 encodes the multi-view pictures using the same value as the transmitted translation parameter, the encoding unit 1020 further sets information indicating that a translation matrix does not change and transmits the multi-view picture bit stream. Therefore, the multi-view picture encoding apparatus of the present invention does not need to transmit the previously transmitted translation parameter again, which increases multi-view picture encoding efficiency. The information indicating the disparity vector estimation method or the information indicating that the translation matrix does not change can be set as flag information included in the multi-view picture bit stream.
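The encoder-side flag signaling described above can be sketched as follows; the flag names and the dictionary layout are hypothetical illustrations, not actual bitstream syntax:

```python
def build_header(use_spatial_direct, translation_param, prev_param):
    """Assemble hypothetical per-stream header fields.

    If the translation parameter matches the previously transmitted one,
    signal only a flag and omit the parameter itself.
    """
    header = {"spatial_direct_flag": int(use_spatial_direct)}
    if prev_param is not None and translation_param == prev_param:
        header["translation_unchanged_flag"] = 1  # parameter omitted
    else:
        header["translation_unchanged_flag"] = 0
        header["translation_param"] = translation_param
    return header
```

On the first transmission the parameter is sent; on subsequent transmissions with an unchanged parameter, only the one-bit flag is sent, which is the efficiency gain the paragraph above describes.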
The multi-view picture encoding apparatus of the present invention can perform both a conventional multi-view picture encoding method and the multi-view picture encoding method performing the spatial direct mode estimation, select the one having the higher multi-view picture encoding efficiency, and encode the multi-view pictures accordingly.
Referring to
The decoding unit 1120 comprises a spatial direct mode performing unit 1121 to perform the spatial direct mode estimation and determine a disparity vector. When the confirmed information about the disparity vector estimation method is the spatial direct mode estimation, the spatial direct mode performing unit 1121 calculates a disparity vector of a current viewpoint frame using a disparity vector between two frames having a different viewpoint from a current viewpoint and a certain translation parameter and thereby determines a disparity.
The operation of the spatial direct mode performing unit 1121 is described with reference to
The translation parameter is a value relating to a distance between multi-view cameras and is transmitted from the multi-view picture encoding apparatus. When the multi-view picture encoding apparatus does not transmit the translation parameter but instead transmits information indicating that a previously received translation parameter does not change, the decoding unit 1120 can decode the multi-view picture using the previously received translation parameter.
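The decoder-side reuse of a previously received translation parameter can be sketched as follows; the header field names are hypothetical, mirroring no actual syntax:

```python
def resolve_translation_param(header, stored_param):
    """Return the translation parameter the decoder should use."""
    if header.get("translation_unchanged_flag"):
        # The encoder signaled that the parameter did not change,
        # so reuse the previously received and stored value.
        return stored_param
    return header["translation_param"]
```

This mirrors the behavior of the decoding unit 1120: when only the unchanged flag is present, decoding proceeds with the stored parameter instead of re-reading it from the bit stream.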
The present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
According to the present invention, a disparity vector estimation method can enhance the compressibility of a multi-view picture and quickly perform encoding of the multi-view picture using camera parameters, in particular, translation parameters.
The present invention provides multi-view picture encoding and decoding apparatuses using the disparity vector estimation method that uses camera parameters.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.
Assignment: executed Sep. 26, 2006 by HA, TAE-HYEUN to SAMSUNG ELECTRONICS CO., LTD. (assignment of assignors interest; Reel 018368, Frame 0047); Sep. 29, 2006, Samsung Electronics Co., Ltd. (assignment on the face of the patent).