In a communication system, parallel encoding and decoding of serially-coded data occurs in a manner that supports low latency communication. A plurality of data items may be coded as serially-coded data sequences and a transmission sequence may be built from them. An index table may be built having a plurality of entries representing respective start points of the serially-coded data sequences within the transmission sequence. The transmission sequence may be transmitted to a channel and, thereafter, the index table may be transmitted. Latencies otherwise involved in inserting an index table into the beginning of a transmission sequence may be avoided.
|
26. Computer readable storage device to store entropy-coded data having stored thereon a serial datastream comprising in order:
a header,
a payload with a plurality of entropy-coded strings,
an index table having entries identifying locations of the strings within the datastream and
a back pointer.
17. A method, comprising:
deriving a length of a transmission unit received as serial data from a channel,
reading a back pointer from an end of the transmission unit,
determining, from the back pointer, a location of an index table, and
parsing the transmission unit into a plurality of entropy-coded strings according to fields of the index table, and
entropy decoding the strings in a plurality of parallel processing systems.
8. A method, comprising:
entropy coding input data and generating a plurality of coded strings therefrom, wherein a context of at least one coded string may be derived from a prior coded string,
building a transmission unit that includes, in series:
a header region identifying the transmission unit,
a payload region including the coded strings,
an index table identifying locations of the coded strings within the payload region and
a back pointer, and
transmitting the transmission unit to a decoder.
16. A method, comprising:
coding a plurality of data items as serially-coded data sequences,
building a transmission sequence from the serially-coded data sequences,
writing coding selections associated with the serially-coded data sequences into the transmission sequence in a transmission position following the serially-coded data sequences, and
transmitting the transmission sequence and the coding selections in a channel, wherein the transmission sequence precedes the coding selections in transmission order.
1. A method, comprising:
coding a plurality of data items as serially-coded data sequences, the coded data sequences having lower bit rates than the data items,
building a transmission sequence from the serially-coded data sequences,
building an index table having a plurality of entries representing respective start points of the serially-coded data sequences within the transmission sequence, and
transmitting the transmission sequence and the index table in a channel, wherein the transmission sequence precedes the index table in transmission order.
33. A method, comprising:
receiving data from a channel including a transmission sequence having a plurality of serially-coded data sequences contained therein and an index table, wherein the transmission sequence precedes the index table in reception order,
parsing the index table to identify respective start points of the serially-coded data sequences within the transmission sequence, and
decoding at least two of the data sequences using parallel processing threads, the decoding generating decoded data sequences having higher bit rates than the coded data sequences.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
9. The method of
10. The method of
12. The method of
13. The method of
14. The method of
15. The method of
18. The method of
19. The method of
20. The method of
22. The method of
23. The method of
24. The method of
25. The method of
27. The storage device of
28. The storage device of
29. The storage device of
30. The storage device of
31. The storage device of
32. The storage device of
34. The method of
35. The method of
36. The method of
37. The method of
38. The method of
39. The method of
|
The present invention benefits from priority of U.S. Provisional Application Ser. No. 61/680,590, filed Aug. 7, 2012 and entitled “Entropy Coding Techniques and Protocol to Support Parallel Processing with Low Latency,” the disclosure of which is incorporated herein in its entirety.
Various video coding systems can be designed to support parallel entropy coding and entropy decoding of different segments of video, for example, slices, tiles or blocks. In one of the final stages of video coding, coded data from different spatial areas of a frame is formed into data sequences and entropy-coded as strings of bits. Early entropy coding techniques were serial: a coding context was carried from bit to bit along each sequence and then to the beginning of the next sequence. Until the entropy coding was undone serially, a decoder could not perform any parallel processing of constructs within the entropy-coded sequence.
Wavefront Parallel Processing (“WPP”) introduced the idea of having some or all of the sequences derive their entropy context from an initial portion of a previous sequence rather than from an end portion of the previous sequence. Because the context of a given sequence is developed from the start of the preceding sequence, parallel entropy decoding of that sequence can begin once decoding of the preceding sequence has developed the needed context, rather than after the preceding sequence has been decoded in full. Thus, WPP supports parallel processing of the sequences to some degree.
The WPP technique, however, has certain consequences. Parallel decoding of sequences cannot be performed until sequence start points have been identified and an appropriate context has been developed for each sequence. Because the context of a current sequence is developed by entropy decoding a relevant portion of a previously-coded sequence, WPP introduces dependencies among the sequences. Moreover, because the entropy-coded data is a serially-coded bitstream, positions of the various sequences must be identified by an index table that specifies the start points of the sequences.
In the current design of the forthcoming HEVC coding standard, it has been proposed to provide an index in front of the entropy-coded data that identifies the bit positions of these start points. This causes significant delay, however, because an encoder must buffer all coded video data to be represented by the table, build the table, and add it to the coded bitstream at a position that precedes the coded data itself. Essentially, an encoder may start transmitting coded video data of a segment to which the table applies only after the segment has been coded in its entirety.
The inventors perceive a need in the art for an entropy coding protocol that supports parallel-processing and yet avoids the latencies associated with prior solutions.
Embodiments of the present invention provide techniques that support parallel encoding and decoding of serially-coded data while maintaining low latency communication. The techniques involve coding a plurality of data items as serially-coded data sequences and building a transmission sequence from them. An index table may be built having a plurality of entries representing respective start points of the serially-coded data sequences within the transmission sequence. The transmission sequence may be transmitted to a channel and, thereafter, the index table may be transmitted. Thus, latencies involved in inserting an index table at the beginning of the transmission sequence may be avoided.
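For illustration only, the following C sketch shows one possible encoder-side ordering consistent with this summary: coded sequences are written to the channel as they are produced, their start points are recorded, and the index table is written afterward. The type and function names (CodedSequence, write_to_channel) are placeholders assumed for the sketch, not part of the disclosure.

#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *bits; size_t len; } CodedSequence;   /* hypothetical type    */
void write_to_channel(const uint8_t *data, size_t len);              /* hypothetical channel */

/* Emit coded sequences to the channel as they become available, recording their
   start points, and only then emit the index table built from those start points. */
void send_transmission_sequence(const CodedSequence *seq, size_t count)
{
    size_t start[256];          /* start points within the transmission sequence */
    size_t pos = 0;

    for (size_t i = 0; i < count && i < 256; i++) {
        start[i] = pos;                                 /* record the start point   */
        write_to_channel(seq[i].bits, seq[i].len);      /* transmit without waiting */
        pos += seq[i].len;
    }

    /* The index table follows the payload, so no buffering of the payload was needed. */
    for (size_t i = 0; i < count && i < 256; i++) {
        uint32_t entry = (uint32_t)start[i];
        write_to_channel((const uint8_t *)&entry, sizeof entry);
    }
}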
The following discussion presents embodiments of the present invention in the context of a video coding system, but the principles of the present invention are not so limited. The present invention may find application in a variety of coding environments, such as audio coding systems, encryption systems and the like, where entropy coding of strings may provide benefits.
In
The pre-processor 220 may perform various analytical and signal conditioning operations on the video data. For example, the pre-processor 220 may apply various filtering operations to the frame data to improve the efficiency of coding operations applied by a video coder 230. The pre-processor 220 also may perform analytical operations on the source video data to derive statistics of the video, which may be provided to the controller 260 to manage operations of the video coding system 200.
The coding engine 230 may perform coding operations on the video sequence to reduce the sequence's bit rate. The coding engine 230 may parse each frame into sub-units, such as slices and coding units (“CUs”), and may code the sub-units according to motion compensated predictive coding techniques that exploit spatial and/or temporal redundancies therein. For purposes of the present discussion, it is sufficient to note that, as part of its operation, the coding engine may include a CU-based coder that includes a transform unit 232, a quantizer 234 and an entropy coder 236. The coding engine 230 may select and apply a coding mode to the CU. Thereafter, pixels of the CUs (which may be expressed as pixel residuals, depending on the selected coding mode) may be subject to a transform, for example a discrete cosine transform or a wavelet transform. Transform coefficients obtained from the transform unit 232 may be quantized by a quantization parameter (Qp) in the quantizer 234. The coding mode and the quantized coefficients may be entropy coded by the entropy coder 236.
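Purely as an informal sketch of the stages just described, and not a representation of the coding engine 230 itself, the following assumes placeholder transform, quantizer and entropy-coder helpers:

#include <stdint.h>

typedef struct BitWriter BitWriter;                                          /* hypothetical */
void forward_transform_8x8(const int16_t *residual, int16_t *coeff);         /* hypothetical */
void entropy_code_coefficients(const int16_t *coeff, int n, BitWriter *bw);  /* hypothetical */

/* Simplified per-CU pass mirroring transform unit 232, quantizer 234 and entropy
   coder 236.  The quantizer below is a crude stand-in, not the scaling of any
   particular standard. */
void code_cu(const int16_t residual[64], int qp, BitWriter *bw)
{
    int16_t coeff[64];
    forward_transform_8x8(residual, coeff);                /* e.g., a DCT-like transform */
    for (int i = 0; i < 64; i++)
        coeff[i] = (int16_t)(coeff[i] / (1 << (qp / 6)));  /* coarse quantization by Qp  */
    entropy_code_coefficients(coeff, 64, bw);              /* e.g., a CABAC-style coder  */
}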
In an embodiment, shown in
In another embodiment, strings may be entropy coded independently of each other using, for example, entropy slices or tile representations. This can further streamline the decoding process by eliminating the entropy decoding dependencies between strings. In such an embodiment, the coding context of each string may be reset to a predetermined state at the start of each string and, therefore, threads need not pass coding contexts among one another.
Returning to
During operation, the coding system 200 may accept the input video sequence as a stream of video data, which may be coded and output from the system 200 on a running basis. Thus, at a time when the video source 210 provides a new frame to the system 200 for coding, the format buffer 240 and transmitter 250 may be outputting coded video data of earlier-received frames. Indeed, the format buffer 240 may output coded video data of early portions of a slice from a given frame while the coding engine 230 is generating coded video data of later portions of the same slice. To provide high throughput, operations of the components illustrated in
As indicated, the slice header 410 may include a data pattern that indicates the start of a slice within the serial data stream and a flag 412 that indicates whether the slice 400 includes a backpointer 430. In an embodiment, the slice header 410 may include fields to provide the index table within the slice header 410 itself (not shown). Thus, embodiments of the present invention permit an encoder to place signaling for the index table 440 either at the beginning of a slice 400, within the slice header 410, or at the end of the slice, at a location identified by the backpointer 430, based on local coding decisions made by the encoder.
As indicated, the embodiment of
The following tables illustrate a syntax of a slice in an embodiment consistent with
TABLE 1
slice_layer_rbsp( ) {                                            Descriptor
    slice_header( )
    slice_data( )
    slice_extension( )
    rbsp_slice_trailing_bits( )
}
where slice_header( ) represents content of the slice header 410, slice_data( ) represents content of the slice payload 420, and slice_extension( ) represents content of the index table 440 and the back pointer 430. The field rbsp_slice_trailing_bits( ) may represent a process for forming the transmission bitstream.
Table 2 illustrates an exemplary syntax that may be used within a slice header 410 according to these embodiments:
TABLE 2
slice_header( ) {                                                Descriptor
    • • •
    if( tiles_or_entropy_coding_sync_idc = = 1 | |
        tiles_or_entropy_coding_sync_idc = = 2 ) {
        num_entry_point_offsets                                  ue(v)
        if( num_entry_point_offsets > 0 ) {
            offset_len_minus1                                    ue(v)
            for( i = 0; i < num_entry_point_offsets; i++ )
                entry_point_offset[ i ]                          u(v)
        }
    }
In the foregoing, the field num_entry_point_offsets may represent the number of strings included within the payload field 420 and, by consequence, the number of entries within the table. In this embodiment, the num_entry_point_offsets field may double as a flag 412 to identify the presence of a back pointer 430. A value of zero may indicate there are no table entries within the slice header 410 and may implicitly indicate that the slice 400 includes a back pointer 430. A non-zero value may identify the number of entries provided within the slice header. The entry_point_offset[i] fields may represent respective locations within the payload field 420 of the start points of the strings 472-478. For i>0, the field entry_point_offset[i] may be calculated as entry_point_offset[i] = entry_point_offset[i−1] + entry_point_offset_delta[i], where the entry_point_offset_delta[i] field represents a change in length between successively coded strings.
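The reading side of this header layout might look like the following sketch, which assumes hypothetical bit-reader helpers read_ue() and read_u() with the usual Exp-Golomb and fixed-length semantics, and which captures the zero-value behavior of num_entry_point_offsets described above:

#include <stdint.h>

typedef struct BitReader BitReader;                 /* hypothetical */
uint32_t read_ue(BitReader *br);                    /* hypothetical: reads a ue(v) value */
uint32_t read_u(BitReader *br, uint32_t n_bits);    /* hypothetical: reads a u(v) value  */

/* Parse the Table 2 fields.  num_entry_point_offsets doubles as flag 412:
   a zero value means no header-resident table and an implied back pointer 430. */
int parse_header_entry_points(BitReader *br, uint32_t *offsets, int max_offsets,
                              int *table_at_end)
{
    uint32_t num = read_ue(br);                 /* num_entry_point_offsets         */
    *table_at_end = (num == 0);                 /* zero => look for a back pointer */
    if (num == 0)
        return 0;

    uint32_t len_bits = read_ue(br) + 1;        /* offset_len_minus1 + 1           */
    for (uint32_t i = 0; i < num && i < (uint32_t)max_offsets; i++)
        offsets[i] = read_u(br, len_bits);      /* entry_point_offset[ i ]         */
    return (int)num;
}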
As indicated, the backpointer 430 may include data that identifies the location of the index table 440. The backpointer 430 may include one or more variable length codes. As a series of variable length codes, data of the backpointer 430 may be provided in reverse order within the slice 400. That is, backpointer data may start at the last bit position of the slice and extend forward from that position toward the slice header 410.
Table 3 illustrates an exemplary syntax that may be used for slice extension data according to these embodiments:
TABLE 3
slice_extension( ) {                                             Descriptor
    encoded_length = 0;
    while( slice_data_remaining( ) >
           ue_length_of( encoded_length ) ) {
        slice_extension_tag                                      ue(v)
        slice_extension_length                                   ue(v)
        encoded_length += slice_extension_length +
            ue_length_of( slice_extension_tag ) +
            ue_length_of( slice_extension_length );
        slice_extension_data                                     u(v)
    }
    extension_back_pointer                                       rev-ue(v)
}
In this example, the slice_extension_data field occupies slice_extension_length bits and has a structure indicated by the slice_extension_tag value. The value of extension_back_pointer is equal to encoded_length and, as indicated, may be written to the bitstream with its bits in reverse order. The function ue_length_of(x) returns the number of bits needed to encode the value x as a ue(v). The slice_extension_tag for the entry-point array may be defined to be 0 (which compactly codes as the bit ‘1’), and all other values may be reserved.
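To illustrate how a decoder might use extension_back_pointer, the following sketch assumes a helper read_reversed_ue() that reads an Exp-Golomb value whose bits were stored in reverse order ending at the last bit of the slice and reports how many bits it consumed; the helper, the BitBuffer type and the bit-granular positions are assumptions of the sketch, not defined syntax:

#include <stddef.h>
#include <stdint.h>

typedef struct BitBuffer BitBuffer;                                   /* hypothetical */
uint32_t read_reversed_ue(const BitBuffer *b, size_t end_bit,
                          size_t *bits_consumed);                     /* hypothetical */

/* Locate the first bit of the slice-extension data in a fully received slice.
   end_bit is the position just past the last slice bit; the extension data is
   assumed to occupy encoded_length bits immediately before the back pointer. */
size_t locate_slice_extension(const BitBuffer *slice, size_t end_bit)
{
    size_t bp_bits = 0;
    uint32_t encoded_length = read_reversed_ue(slice, end_bit, &bp_bits);
    return end_bit - bp_bits - encoded_length;
}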
Table 4 illustrates an embodiment for slice_extension_data when slice_extension_tag==0:
TABLE 4
slice_extension_data( ) {                                        Descriptor
    switch (slice_extension_tag) {
        case 0: /* entry points */
            offset_len_minus1                                    ue(v)
            num_entry_point_offsets = 0;
            while (extension_data_remaining( )) {
                entry_point_offset[ i ]                          u(v)
                num_entry_point_offsets++;
            }
            break;
    }
}
This structure resembles the table structure in the slice header above (Table 2).
The foregoing discussion has presented the backpointer 430 and string index table 440 as the only metadata that is provided at the end of the slice 400. The principles of the present invention do not foreclose use of metadata 460 provided by other sources (not shown). In embodiments where no other data is permitted in the end-of-slice structures, a backpointer 430 need not include an express pointer to the index table 440.
String start points (shown as entry_point_offsets in Table 2 and Table 4) may be coded in a variety of ways. In a first embodiment, each string start point may be expressed as an offset from the end of the slice header. In a second embodiment, each string start point may be expressed as an offset from the start point of the preceding string (essentially, corresponding to the prior string's length). In this embodiment, the first string may be taken to begin immediately following the end of the slice header.
In another embodiment, each string start point may be expressed as a difference in offsets between the current string's start point and the preceding string's start point (corresponding to a difference in lengths between the prior two strings). This is shown below in Table 5, and an illustrative reconstruction sketch follows the table.
TABLE 5
    i = 0;
    while (extension_data_remaining( )) {
        if (i == 0)
            entry_point_offset[ 0 ]                              ue(v)
        else {
            entry_point_offset_delta[ i ]                        se(v)
            entry_point_offset[ i ] =
                entry_point_offset[ i − 1 ] +
                entry_point_offset_delta[ i ];
        }
        i++;
    }
    num_entry_point_offsets = i
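The following non-normative sketch mirrors the Table 5 reconstruction, rebuilding each entry_point_offset[ i ] from the first coded value and the signed deltas and mapping it to an absolute start point. It assumes each reconstructed offset is measured from the end of the slice header; whether offsets are interpreted instead as string lengths, or counted in bits rather than bytes, depends on which embodiment above is in use.

#include <stddef.h>
#include <stdint.h>

/* Rebuild string start points from the delta-coded representation of Table 5.
   payload_base is the position immediately after the slice header. */
void resolve_entry_points(int32_t first_offset, const int32_t *delta, int n,
                          size_t payload_base, size_t *start)
{
    int32_t offset = 0;
    for (int i = 0; i < n; i++) {
        offset = (i == 0) ? first_offset          /* entry_point_offset[ 0 ]         */
                          : offset + delta[i];    /* + entry_point_offset_delta[ i ] */
        start[i] = payload_base + (size_t)offset; /* absolute start of string i      */
    }
}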
At some point, the method 500 will reach the end of a slice. The method 500 may determine when coding has reached the end of the slice (box 535) and, when it does, may build an index table representing string start locations within the slice (box 540). The method 500 may transmit the index table (box 545) and any other metadata that may be required to serve other decoding needs associated with the slice (box 550). As a final transmission associated with the slice, the method 500 may transmit data of the backpointer, which identifies the location of the index table and is transmitted in reverse bit order (box 555).
As indicated, operation of the method 500 advantageously allows coded data to be transmitted as it is generated, without having to build the index table first. The method 500 may mark location(s) of the entropy-coded strings as the video data is generated and transmitted. The method 500 may transmit the index table (box 545) and, finally, the backpointer (box 555) without incurring delays that would be associated with transmitting the index table as part of the slice header. In this manner, the method 500 contributes to reduced latency of transmission.
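One possible shape of this end-of-slice flow is sketched below; every helper named here is a hypothetical placeholder rather than a disclosed interface:

#include <stddef.h>

#define MAX_STRINGS 256

typedef struct Encoder Encoder;                                 /* hypothetical */
typedef struct Channel Channel;                                 /* hypothetical */
int    end_of_slice(const Encoder *enc);                        /* hypothetical */
size_t channel_position(const Channel *ch);                     /* hypothetical */
void   transmit_next_coded_string(Encoder *enc, Channel *ch);   /* hypothetical */
void   transmit_index_table(Channel *ch, const size_t *starts, int n);
void   transmit_other_metadata(Encoder *enc, Channel *ch);
void   transmit_back_pointer_reversed(Channel *ch);

/* End-of-slice flow of method 500: strings go out as they are produced, the
   index table and other metadata follow (boxes 540-550), and a reversed back
   pointer closes the slice (box 555). */
void finish_slice(Encoder *enc, Channel *ch)
{
    size_t starts[MAX_STRINGS];
    int n = 0;

    while (!end_of_slice(enc) && n < MAX_STRINGS) {
        starts[n++] = channel_position(ch);      /* mark the string's start location */
        transmit_next_coded_string(enc, ch);     /* transmit data as it is coded     */
    }

    transmit_index_table(ch, starts, n);         /* box 545                          */
    transmit_other_metadata(enc, ch);            /* box 550                          */
    transmit_back_pointer_reversed(ch);          /* box 555: reverse bit order       */
}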
When the method 600 determines that the slice has been completed (box 630), the method 600 may build the index table representing start positions of the strings (box 630). The method 600 may transmit the index table and, finally, the backpointer to the decoder (box 635).
If at box 610 the method 600 determines that the slice will have the index table at the beginning of the slice, operation may advance to box 640. The method 600 may code video data of the slice and mark string locations within the slice (box 640). The method 600 may store the coded video data in a buffer for later transmission (box 645). The method 600 may repeat operations of boxes 640-645 until all video data associated with the slice has been coded (box 650).
When the method 600 determines that the slice has been completed (box 650), the method 600 may build the index table representing start positions of the strings (box 655). The method 600 may place the index table in the slice header (box 660) and, finally, transmit the entirety of the slice to a decoder (box 665).
The method 600 finds application with a multi-modal coding system that supports use of index tables both at the beginning and at the end of slices. As indicated, transmitting an index table at the end of the slice can reduce latency because coded video data may be transmitted as it is created (boxes 615-620). The method 600 may prove to be a natural extension of some coders that already support coding protocols that provide index tables at the beginning of slices. Thus, although the operations of boxes 640-665 involve greater transmission latency than the operations of boxes 615-635 (because transmission does not occur until box 665, when the entire slice has been coded), the embodiment of
As illustrated in
The principles of the present invention also find application with strings that are coded independently of each other. In such an embodiment, the coding context of each string may reset to a predetermined state at the onset of each string and, therefore, threads need not pass coding contexts among one another. Thus, the present invention may apply to entropy slices and tiles.
Operation of the method 900 of
For receivers that load from disk (e.g., the channel is a storage device) or otherwise receive the whole NAL unit as an atomic unit, a decoder will have instant access to the entirety of a slice upon receipt. The decoder may locate the back-pointer immediately, retrieve the index table and parse the slice payload to begin parallel threads as illustrated in
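A minimal sketch of this whole-slice decoding path, assuming hypothetical parsing and threading helpers that are not part of the disclosure:

#include <stddef.h>
#include <stdint.h>

typedef struct { int num_strings; size_t start[256]; } IndexTable;               /* hypothetical */
typedef struct ThreadPool ThreadPool;                                            /* hypothetical */
size_t locate_index_table_from_back_pointer(const uint8_t *slice, size_t len);   /* hypothetical */
void   parse_index_table(const uint8_t *slice, size_t table_pos, IndexTable *t); /* hypothetical */
void   entropy_decode_string(const void *string_start);                          /* hypothetical */
void   thread_pool_submit(ThreadPool *p, void (*fn)(const void *), const void *arg);
void   thread_pool_wait(ThreadPool *p);

/* With the whole NAL unit in hand, read the back pointer from the end of the
   slice, fetch the index table, and decode each string on its own worker thread. */
void decode_slice_parallel(const uint8_t *slice, size_t len, ThreadPool *pool)
{
    IndexTable tbl;
    size_t table_pos = locate_index_table_from_back_pointer(slice, len);
    parse_index_table(slice, table_pos, &tbl);

    for (int i = 0; i < tbl.num_strings; i++)
        thread_pool_submit(pool, entropy_decode_string, slice + tbl.start[i]);

    thread_pool_wait(pool);
}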
For receivers that receive coded slices incrementally, a decoder can perform single-threaded entropy decoding immediately upon reception. The decoder cannot perform parallel processing for entropy decoding, however, until the backpointer is received. Thus, if the decoder can perform single-thread entropy decoding at a rate faster than the data arrival rate, the decoder will never start a second thread, but this does not incur a performance loss because single-threaded entropy decoding likely is the most efficient decoding structure to employ in such cases.
If the data arrival rate is faster than the decoder's single-thread decode rate, the end-of-slice structure incurs a performance consequence. In this case, the decoder will perform single threaded entropy decoding until it receives and decodes the backpointer. Once the decoder decodes the backpointer, it may engage additional threads to decode whatever strings in the slice may remain for entropy decoding. Nevertheless, it is believed that the end-of-slice structure contributes to reduced latency overall because, as discussed in
The principles of the present invention also accommodate use of end-of-slice coding for other types of coded information. The structure in Table 1 permits any data that must be generated after encoding to be transmitted after encoding, not just WPP entry points. Such other data may include post-filtering instructions or hints, or other information that is coding-dependent. For example, many coding systems also provide deblocking information within slice headers representing post-filtering operations that can be performed at a decoder following video reconstruction operations. Again, providing such deblocking information at the beginning of slices can incur latency because a video coder must buffer all coded video data as it makes decisions as to the types of deblocking filters to be applied to the video, then code and insert its selections of the deblocking filters into the slice headers, before it can transmit the slice. Alternatively, the encoder may select a deblocking filter to be applied before coding occurs, which might prove to be sub-optimal. Embodiments of the present invention, therefore, as illustrated in
For ease of description, the preceding discussion has presented the entropy-coding and entropy-decoding processes in the context of a video coding/decoding system (
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Tourapis, Alexandros, Zhou, Xiaosong, Singer, David W., Leontaris, Athanasios