A communication system for sending a sequence of symbols on a communication link. The system includes a transmitter for placing information indicative of the sequence of symbols on the communication link and a receiver for receiving the information placed on the communication link by the transmitter. The transmitter includes a clock for defining successive frames, each of the frames including M time intervals, where M is an integer greater than 1. A modulator modulates each of M carrier signals with a signal related to the value of one of the symbols thereby generating a modulated carrier signal corresponding to each of the carrier signals. The modulated carriers are combined into a sum signal which is transmitted on the communication link. The carrier signals include first and second carriers, the first carrier having a different bandwidth than the second carrier. In one embodiment, the modulator includes a tree-structured array of filter banks having M leaf nodes, each of the values related to the symbols forming an input to a corresponding one of the leaf nodes. Each of the nodes includes one of the filter banks. Similarly, the receiver can be constructed of a tree-structured array of sub-band filter banks for converting M time-domain samples received on the communication link to M symbol values. Signal processing is performed by splitting a signal into subbands using a plurality of filter banks connected to form a tree-structured array. The filter banks are connected so that the signal is split into subbands of different size. The subbands can be designed to approximate the bands of the human auditory system for audio signal processing applications. Reconstruction of signals using a plurality of synthesis filter banks connected to form a tree-structured array is also performed.
|
0. 118. A method of regenerating a signal using a plurality of synthesis filter banks connected to form a tree-structured array having greater than two leaf nodes and a root node,
wherein each of the nodes comprises one synthesis filter bank having greater than two filters, with at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 5. A signal processing method comprising:
splitting a signal into subbands using a plurality of filter banks connected to form a tree-structured array having a root node and greater than two leaf nodes, each node comprising one filter bank having greater than two filters, and at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 66. A signal processing system comprising:
means for splitting a signal into subbands using a plurality of filter banks that can connect to form a tree-structured array having a root node and greater than two leaf nodes, each node comprising one filter bank having greater than two filters, and at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 34. A signal processing system comprising:
a plurality of filter banks that can connect to form a tree-structured array to split a signal into subbands, the tree-structured array having a root node and greater than two leaf nodes, each node comprising one filter bank having greater than two filters, and at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 114. An information storage media having stored thereon audio information having been split into subbands using a plurality of filter banks connected to form a tree-structured array having a root node and greater than two leaf nodes, each node comprising one filter bank having greater than two filters, and at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 18. A signal processing method comprising:
synthesizing a signal using a plurality of synthesis filter banks connected to form a tree-structured array having greater than two leaf nodes and a root node,
wherein each of the nodes comprises one synthesis filter bank having greater than two filters, with at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 90. An information storage media having stored thereon information that when executed splits a signal into subbands using a plurality of filter banks connected to form a tree-structured array having a root node and greater than two leaf nodes, each node comprising one filter bank having greater than two filters, and at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 76. A signal processing system comprising:
means for synthesizing a signal using a plurality of synthesis filter banks that can connect to form a tree-structured array having greater than two leaf nodes and a root node,
wherein each of the nodes comprises one synthesis filter bank having greater than two filters, with at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 48. A signal processing system comprising:
a plurality of synthesis filter banks that can connect to form a tree-structured array to synthesize a signal, the tree-structured array having greater than two leaf nodes and a root node,
wherein each of the nodes comprises one synthesis filter bank having greater than two filters, with at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 100. An information storage media having stored thereon information that when executed synthesizes a signal using a plurality of synthesis filter banks connected to form a tree-structured array having greater than two leaf nodes and a root node,
wherein each of the nodes comprises one synthesis filter bank having greater than two filters, with at least one of the leaf nodes having a number of filters that differs from the number of filters in a second leaf node.
0. 120. A method of reconstructing a signal using a plurality of synthesis filter banks connected in a tree-structured array having a first and a second level,
wherein the first level comprises more than two first level synthesis filter banks, and one first level synthesis filter bank has a different number of filters than another first level synthesis filter bank, and
the second level comprises one synthesis filter bank having more than two filters, the second level having as inputs the outputs of the first level synthesis filter banks.
0. 12. A signal processing method comprising:
splitting a signal into sub-bands using a plurality of filter banks connected in a tree-structured array having a first and a second level;
the first level comprising one first level filter bank having more than two filters; and
the second level comprising at least two second level filter banks, each second level filter bank having as input an output from a different filter in the first level, wherein one second level filter bank has a different number of filters than another second level filter bank.
0. 26. A signal processing method comprising:
synthesizing a signal using a plurality of synthesis filter banks connected in a tree-structured array having a first and a second level,
wherein the first level comprises more than two first level synthesis filter banks, and one first level synthesis filter bank has a different number of filters than another first level synthesis filter bank, and
the second level comprises one synthesis filter bank having more than two filters, the second level having as inputs the outputs of the first level synthesis filter banks.
0. 71. A signal processing system comprising:
means for splitting a signal into sub-bands using a plurality of filter banks that can connect to form a tree-structured array having a first and a second level;
the first level comprising one first level filter bank having more than two filters; and
the second level comprising at least two second level filter banks, each second level filter bank having as input an output from a different filter in the first level, wherein one second level filter bank has a different number of filters than another second level filter bank.
0. 116. An information storage media having stored thereon audio information having been split into sub-bands using a plurality of filter banks connected in a tree-structured array having a first and a second level;
the first level comprising one first level filter bank having more than two filters; and
the second level comprising at least two second level filter banks, each second level filter bank having as input an output from a different filter in the first level, wherein one second level filter bank has a different number of filters than another second level filter bank.
0. 95. An information storage media having stored thereon information that when executed splits a signal into sub-bands using a plurality of filter banks connected in a tree-structured array having a first and a second level;
the first level comprising one first level filter bank having more than two filters; and
the second level comprising at least two second level filter banks, each second level filter bank having as input an output from a different filter in the first level, wherein one second level filter bank has a different number of filters than another second level filter bank.
0. 83. A signal processing system comprising:
means for synthesizing a signal using a plurality of synthesis filter banks that can connect to form a tree-structured array having a first and a second level,
wherein the first level comprises more than two first level synthesis filter banks, and one first level synthesis filter bank has a different number of filters than another first level synthesis filter bank, and
the second level comprises one synthesis filter bank having more than two filters, the second level having as inputs the outputs of the first level synthesis filter banks.
0. 41. A signal processing system comprising:
a plurality of filter banks that can connect to form a tree-structured array to split a signal into subbands, the tree-structured array having a first and a second level;
the first level comprising one first level filter bank having more than two filters; and
the second level comprising at least two second level filter banks, each second level filter bank having as input an output from a different filter in the first level, wherein one second level filter bank has a different number of filters than another second level filter bank.
0. 107. An information storage media having stored thereon information that when executed synthesizes a signal using a plurality of synthesis filter banks connected in a tree-structured array having a first and a second level,
wherein the first level comprises more than two first level synthesis filter banks, and one first level synthesis filter bank has a different number of filters than another first level synthesis filter bank, and
the second level comprises one synthesis filter bank having more than two filters, the second level having as inputs the outputs of the first level synthesis filter banks.
0. 57. A signal processing system comprising:
a plurality of synthesis filter banks that can connect to form a tree-structured array to synthesize a signal, the tree-structured array having a first and a second level,
wherein the first level comprises more than two first level synthesis filter banks, and one first level synthesis filter bank has a different number of filters than another first level synthesis filter bank, and
the second level comprises one synthesis filter bank having more than two filters, the second level having as inputs the outputs of the first level synthesis filter banks.
1. A communication system for sending a sequence of symbols on a communication link a sequence of symbols having values representative of said symbols, said communication system comprising a transmitter for placing information indicative of said sequence of symbols on said communication link and a receiver for receiving said information placed on said communication link by said transmitter, said transmitter comprising
a clock for defining successive frames, each said frame comprising M time intervals, where M is an integer greater than 1;
a modulator modulating each of M carrier signals with a signal related to the value of one of said symbols thereby generating a modulated carrier signal corresponding to each of said carrier signals that is to be modulated and generating a sum signal comprising a sum of said modulated carrier signals, said modulator comprising a tree-structured array of filter banks having nodes, including a root node and M leaf nodes, each of said values related to said symbols forming an input to a corresponding one of said leaf nodes, each of said nodes, other than said leaf nodes, comprising one of said filter banks; and
an output circuit for transmitting said sum signal on said communication link, wherein said carrier signals comprise first and second carriers, said first carrier having a different bandwidth than said second carrier.
3. A communication system for sending a sequence of symbols on a communication link, said communication system comprising a transmitter for placing information indicative of said sequence of symbols on said communication link, said transmitter comprising:
a clock for defining successive frames, each said frame comprising M time intervals, where M is an integer greater than 1;
a modulator modulating each of M carrier signals with a signal related to the value of one of said symbols thereby generating a modulated carrier signal corresponding to each of said carrier signals that is to be modulated and generating a sum signal comprising a sum of said modulated carrier signals;
an output circuit transmitting said sum signal on said communication link, wherein said carrier signals comprise first and second carriers, said first carrier having a different bandwidth than said second carrier; and
a receiver comprising:
an input circuit for receiving and storing M time-domain samples transmitted on said communication link; and
a decoder for recovering said M symbol values, said decoder comprising a tree-structured array of sub-band filter banks, said received M time-domain samples forming the input of a root node of said tree-structured array of said decoder and said M symbol values being generated by the leaf nodes of said tree-structured array of said decoder, each said sub-band filter bank comprising a plurality of fir filters having a common input for receiving an input time-domain signal, each said filter generating an output signal representing a symbol value in a corresponding frequency band.
2. The communication system of
an input circuit for receiving and storing M time-domain samples transmitted on said communication link; and
a decoder for recovering said M symbol values, said decoder comprising a tree-structured array of sub-band filter banks, said received M time-domain samples forming the input of a root node of said tree-structured array of said decoder and said M symbol values being generated by the leaf nodes of said tree-structured array of said decoder, each said sub-band filter bank comprising a plurality of fir filters having a common input for receiving an input time-domain signal, each said filter generating an output signal representing a symbol value in a corresponding frequency band.
4. The communication system of
0. 6. The method of
0. 7. The method of
0. 8. The method of
0. 9. The method of
0. 10. The method of
0. 11. The method of
0. 13. The method of
0. 14. The method of
0. 15. The method of
0. 16. The method of
0. 17. The method of
0. 19. The method of
0. 20. The method of
0. 21. The method of
0. 22. The method of
0. 23. The method of
0. 24. The method of
0. 25. The method of
0. 27. The method of
0. 28. The method of
0. 29. The method of
0. 30. The method of
0. 31. The method of
0. 32. The method of
0. 33. The method of
0. 35. The system of
0. 36. The system of
0. 37. The system of
0. 38. The system of
0. 39. The system of
0. 40. The system of
0. 42. The system of
0. 43. The system of
0. 44. The system of
0. 45. The system of
0. 46. The system of
0. 47. The system of
0. 49. The system of
0. 50. The system of
0. 51. The system of
0. 52. The system of
0. 53. The system of
0. 54. The system of
0. 55. The system of
0. 56. The system of
0. 58. The system of
0. 59. The system of
0. 60. The system of
0. 61. The system of
0. 62. The system of
0. 63. The system of
0. 64. The system of
0. 65. The system of
0. 67. The system of
0. 68. The system of
0. 69. The system of
0. 70. The system of
0. 72. The system of
0. 73. The system of
0. 74. The system of
0. 75. The system of
0. 77. The system of
0. 78. The system of
0. 79. The system of
0. 80. The system of
0. 81. The system of
0. 82. The system of
0. 84. The system of
0. 85. The system of
0. 86. The system of
0. 87. The system of
0. 88. The system of
0. 89. The system of
0. 91. The media of
0. 92. The media of
0. 93. The media of
0. 94. The media of
0. 96. The media of
0. 97. The media of
0. 98. The media of
0. 99. The media of
0. 101. The media of
0. 102. The media of
0. 103. The media of
0. 104. The media of
0. 105. The media of
0. 106. The media of
0. 108. The media of
0. 109. The media of
0. 110. The media of
0. 111. The media of
0. 112. The media of
0. 113. The media of
0. 115. The media of
0. 117. The media of
0. 119. The media of
0. 121. The media of
|
While digital audio recordings provide many advantages over analog systems, the data storage requirements for high-fidelity recordings are substantial. A high fidelity recording typically requires more than one million bits per second of playback time. The total storage needed for even a short recording is too high for many computer applications. In addition, the digital bit rates inherent in non-compressed high fidelity audio recordings make the transmission of such audio tracks over limited bandwidth transmission systems difficult. Hence, systems for compressing audio sound tracks to reduce the storage and bandwidth requirements are in great demand.
One class of prior art audio compression systems divides the sound track into a series of segments. Over the time interval represented by each segment, the sound track is analyzed to determine the signal components in each of a plurality of frequency bands. The measured components are then replaced by approximations requiring fewer bits to represent, but which preserve features of the sound track that are important to a human listener. At the receiver, an approximation to the original sound track is generated by reversing the analysis process with the approximations in place of the original signal components.
The analysis and synthesis operations are normally carried out with the aid of perfect, or near perfect, reconstruction filter banks. The systems in question include an analysis filter bank which generates a set of decimated subband outputs from a segment of the sound track. Each decimated subband output represents the signal in a predetermined frequency range. The inverse operation is carried out by a synthesis filter bank which accepts a set of decimated subband outputs and generates therefrom a segment of audio sound track. In practice, the synthesis and analysis filter banks are implemented on digital computers which may be general purpose computers or special computers designed to more efficiently carry out the operations. If the analysis and synthesis operations are carried out with sufficient precision, the segment of audio sound track generated by the synthesis filter bank will match the original segment of audio sound track that was inputted to the analysis filter bank. The differences between the reconstructed audio sound track and the original sound track can be made arbitrarily small. In this case, the specific filter bank characteristics such as the length of the segment analyzed, the number of filters in the filter bank, and the location and shape of filter response characteristics would be of little interest, since any set of filter banks satisfying the perfect, or near-perfect, reconstruction condition would exactly regenerate the audio segment.
Unfortunately, the replacement of the frequency components generated by the analysis filter bank with a quantized approximation thereto results in artifacts that do depend on the detailed characteristics of the filter banks. There is no single segment length for which the artifacts in the reconstructed audio track can be minimized. Hence, the length of the segments analyzed in prior art systems is chosen to be a compromise. When the frequency components are replaced by approximations, an error is introduced in each component. An error in a given frequency component produces an acoustical effect which is equivalent to the introduction of a noise signal with frequency characteristics that depend on filter characteristics of the corresponding filter in the filter bank. The noise signal will be present over the entire segment of the reconstructed sound track. Hence, the length of the segments is reflected in the types of artifacts introduced by the approximations. If the segment is short, the artifacts are less noticeable. Hence, short segments are preferred. However, if the segment is too short, there is insufficient spectral resolution to acquire information needed to properly determine the minimum number of bits needed to represent each frequency component. On the other hand, if the segment is too long, the temporal resolution of the human auditory system is sufficient to detect the artifacts.
Prior art systems also utilize filter banks in which the frequency bands are uniform in size. Systems with a few (16-32) sub-bands in a 0-22 kHz frequency range are generally called “subband coders” while those with a large number of sub-bands (≥64) are called “transform coders”. It is known from psychophysical studies of the human auditory system that there are critical bandwidths which vary with frequency. The information in a critical band may be approximated by a component representing the time averaged signal amplitude in the critical band.
In addition, the ear's sensitivity to a noise source in the presence of a localized frequency component such as a sine tone depends on the relative levels of the signals and on the relation of the noise spectral components to the tone. The errors introduced by approximating the frequency components may be viewed as “noise”. The noise becomes significantly less audible if its spectral energy is within one critical bandwidth of the tone. Hence, it is advantageous to use frequency decompositions which approximate the critical band structure of the auditory system.
Systems which utilize uniform frequency bands are poorly suited for systems designed to take advantage of this type of approximation. In principle, each audio segment can be analyzed to generate a large number of uniform frequency bands, and then, several bands at the higher frequencies could be merged to provide a decomposition into critical bands. This approach imposes the same temporal constraints on all frequency bands. That is, the time window over which the low frequency data is generated for each band is the same as the time window over which each high-frequency band is generated. To provide accuracy in the low frequency ranges, the time window must be very long. This leads to temporal artifacts that become audible at higher frequencies. Hence, systems in which the audio segment is decomposed into uniform sub-bands with adequate low-frequency resolution cannot take full advantage of the critical band properties of the auditory system.
Prior art systems that recognize this limitation have attempted to solve the problem by utilizing analysis and synthesis filter banks based on QMF filter banks that analyze a segment of an audio sound track to generate frequency components in two frequency bands. To obtain a decomposition of the segment into frequency components representing the amplitudes of the signal in critical bands, these two frequency band QMF filters are arranged in a tree-structured configuration. That is, each of the outputs of the first level filter becomes the input to another filter bank at least one of whose two outputs is fed to yet another level, and so on. The leaf nodes of this tree provide an approximation to a critical band analysis of the input audio track. It can be shown that this type of filter bank uses different length audio segments to generate the different frequency components. That is, a low frequency component represents the signal amplitude in an audio segment that is much longer than a high-frequency component. Hence, the need to choose a single compromise audio segment length is eliminated.
While tree structured filter banks having many layers may be used to decompose the frequency spectrum into critical bands, such filter banks introduce significant aliasing artifacts that limit their utility. In a multilevel filter bank, the aliasing artifacts are expected to increase exponentially with the number of levels. Hence, filter banks with large numbers of levels are to be avoided. Unfortunately, filter banks based on QMF filters which divide the signal into two bandlimited signals require large numbers of levels.
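The relationship between tree depth and effective segment length can be illustrated with a minimal sketch. It assumes the Haar pair as a stand-in for the two-band QMF filters (the text does not name specific filter coefficients) and splits only the low band at each level; the function names are hypothetical and chosen for illustration.

```python
import math

def haar_split(x):
    """Two-band analysis with decimation by 2 (Haar pair, an assumed stand-in)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]
    high = [s * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]
    return low, high

def haar_merge(low, high):
    """Two-band synthesis: exact inverse of haar_split."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.extend([s * (l + h), s * (l - h)])
    return x

def tree_analysis(x, levels):
    """Split the low band repeatedly; returns leaf bands, lowest frequency first."""
    bands = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands[::-1]

def tree_synthesis(bands):
    """Rebuild the segment by merging from the deepest level upward."""
    low = bands[0]
    for high in bands[1:]:
        low = haar_merge(low, high)
    return low

segment = [float((7 * n) % 11) for n in range(16)]
bands = tree_analysis(segment, levels=3)
band_lengths = [len(b) for b in bands]
# Number of input samples represented by each component in a band.
samples_per_component = [len(segment) // n for n in band_lengths]
rebuilt = tree_synthesis(bands)
```

In this sketch each component of the two lowest bands summarizes 8 input samples, while each component of the highest band summarizes only 2, which is the sense in which low frequency components represent longer audio segments.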
Prior art audio compression systems are also poorly suited to applications in which the playback of the material is to be carried out on a digital computer. The use of audio for computer applications is increasingly in demand. Audio is being integrated into multimedia applications such as computer based entertainment, training, and demonstration systems. Over the course of the next few years, many new personal computers will be outfitted with audio playback and recording capability. In addition, existing computers will be upgraded for audio with the addition of plug-in peripherals.
Computer based audio and video systems have been limited to the use of costly outboard equipment such as an analog laser disc player for playback of audio and video. This has limited the usefulness and applicability of such systems. With such systems it is necessary to provide a user with a highly specialized playback configuration, and there is no possibility of distributing the media electronically. However, personal computer based systems using compressed audio and video data promise to provide inexpensive playback solutions and allow distribution of program material on digital disks or over a computer network.
Until recently, the use of high quality audio on computer platforms has been limited due to the enormous data rate required for storage and playback. Quality has been compromised in order to store the audio data conveniently on disk. Although some increase in performance and some reduction in bandwidth has been gained using conventional audio compression methods, these improvements have not been sufficient to allow playback of high fidelity recordings on the commonly used computer platforms without the addition of expensive special purpose hardware.
One solution to this problem would be to use lower quality playback on computer platforms that lack the computational resources to decode compressed audio material at high fidelity quality levels. Unfortunately, this solution requires that the audio material be coded at various quality levels. Hence, each audio program would need to be stored in a plurality of formats. Different types of users would then be sent the format suited to their application. The cost and complexity of maintaining such multi-format libraries makes this solution unattractive. In addition, the storage requirements of the multiple formats partially defeat the basic goal of reducing the amount of storage needed to store the audio material.
Furthermore, the above discussion assumes that the computational resources of a particular playback platform are fixed. This assumption is not always true in practice. The computational resources of a computing system are often shared among a plurality of applications that are running in a time-shared environment. Similarly, communication links between the playback platform and shared storage facilities also may be shared. As the playback resources change, the format of the audio material must change in systems utilizing a multi-format compression approach. This problem has not been adequately solved in prior art systems.
FIG. 8(a) is a block diagram of an audio filter based on a low-frequency filter and a modulator.
FIG. 8(b) is a block diagram of a sub-band analysis filter for generating a set of frequency components.
where the x_i, for i=0 . . . W−1, are the values stored in shift register 320, and the h_i are coefficients of a low pass prototype filter which are stored in controller 325. For those wishing a more detailed explanation of the process for generating sets of filter coefficients, see J. Rothweiler, “POLYPHASE QUADRATURE FILTERS—A NEW SUB-BAND CODING TECHNIQUE” IEEE Proceedings of the 1983 ICASSP Conference, pp. 1280-1283. The polyphase components are then generated from the Z_i by the following summing operations:
The frequency components, S_i, are obtained via the following matrix multiplication from the polyphase components
This operation is equivalent to passing the polyphase components through M finite impulse response filters of length 2M. The cosine modulation of the polyphase components shown in Eq. (3a) may be replaced by other such modulation terms. The form shown in Eq. (3a) leads to near-perfect reconstruction. An alternative modulation scheme which allows for perfect reconstruction is as follows:
It can be seen by comparison to FIG. 5(a) that the matrix multiplication provides an operation analogous to the modulation of the incoming audio signal. The windowing operation performs the analysis with the prototype low-frequency filter.
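The three steps described above (windowing, polyphase summation, and cosine modulation) can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the cosine phase term follows a common polyphase quadrature convention and is assumed here, the prototype coefficients are a placeholder rather than a designed low pass prototype, and the name `analysis_step` is hypothetical.

```python
import math

def analysis_step(x, h, M):
    """One block of a cosine-modulated analysis bank: W samples -> M components."""
    W = len(x)
    assert len(h) == W and W % (2 * M) == 0
    # Windowing: Z_i = h_i * x_i (W multiplies).
    Z = [h[i] * x[i] for i in range(W)]
    # Polyphase components: X_k = sum over j of Z_{k + 2M j}, for k = 0 .. 2M-1.
    X = [sum(Z[k + 2 * M * j] for j in range(W // (2 * M))) for k in range(2 * M)]
    # Modulation: an M x 2M matrix multiply (2M^2 multiplies).
    # ASSUMPTION: phase term (2i+1)(k - M/2)pi/(2M); other modulation terms are possible.
    S = [sum(math.cos((2 * i + 1) * (k - M / 2) * math.pi / (2 * M)) * X[k]
             for k in range(2 * M))
         for i in range(M)]
    return S

M, W = 8, 64
# PLACEHOLDER prototype: a sine-shaped taper, not a designed low pass prototype.
h = [math.sin(math.pi * (i + 0.5) / W) / W for i in range(W)]
x = [math.cos(0.3 * n) for n in range(W)]
S = analysis_step(x, h, M)
multiplies = W + 2 * M * M  # windowing plus the modulation matrix
```

Each call consumes a window of W samples and produces M components; the multiply count of the sketch is W for the windowing plus 2M² for the matrix, which is 192 for the values used here.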
As will be discussed in more detail below, the computational workload in analyzing and synthesizing audio tracks is of great importance in providing systems that can operate on general purpose computing platforms. It will be apparent from the above discussion that the computational workload inherent in generating M frequency components from a window of W audio sample values is approximately (W+2M²) multiplies and adds. In this regard, it should be noted that a two level filter bank of the type used in the present invention significantly reduces the overall computational workload even in situations in which the frequency spectrum is to be divided into uniform bands. For example, consider a system in which the frequency spectrum is to be divided into 64 bands utilizing a window of 512 samples. If a prior art one level filter bank is utilized, the workload will be approximately 8,704 multiplies and adds. If the filter bank is replaced by a two level filter bank according to the present invention, then the filter bank will consist of 9 filter banks, each dividing the frequency spectrum into 8 bands. The computational workload inherent in this arrangement is only 5,760 multiplies and adds. Hence, a filter bank according to the present invention typically requires less computational capability than a one level filter bank according to the prior art. In addition, a filter bank according to the present invention also provides a means for providing a non-uniform band structure.
The transformation of the audio signal into sets of frequency components as described above does not, in itself, result in a decrease in the number of bits needed to represent the audio signal. For each M audio samples received by a sub-band analysis filter, M frequency components are generated. The actual signal compression results from the quantization of the frequency components. As noted above, the number of bits that must be allocated to each frequency component is determined by a phenomenon known as “masking”. Consider a tone at a frequency f. The ability of the ear to detect a signal at frequency f′ depends on the energy in the tone and the difference in frequency between the signal and the tone, i.e., (f−f′). Research in human hearing has led to measurements of a threshold function T(E,f,f′) which measures the minimum energy at which the second frequency component can be detected in the presence of the first frequency component with energy E. In general, the threshold function will vary in shape with frequency.
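The workload figures quoted above can be checked with a short calculation. The sketch assumes that each of the nine filter banks in the two level arrangement is counted with the same 512-sample window, since that assumption reproduces the quoted total; the function name `workload` is hypothetical.

```python
def workload(window, bands):
    """Approximate multiplies and adds for one bank: W + 2*M^2."""
    return window + 2 * bands * bands

# One level: a single bank producing 64 bands from a 512-sample window.
one_level = workload(512, 64)

# Two levels: 1 root bank plus 8 second-level banks, each producing 8 bands.
# ASSUMPTION: every bank is counted with a 512-sample window.
two_level = 9 * workload(512, 8)

savings = 1.0 - two_level / one_level
```

Under these assumptions the two level arrangement needs roughly one third fewer multiplies and adds than the one level bank.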
The threshold function is used to construct a masking function as follows. Consider a segment of the incoming audio signal. Denote the energy as a function of frequency in this segment by E(f). Then a mask level, L(f), is constructed by convolving E(f) and T(f,f′), i.e.,
L(f)=∫T(E(f′),f,f′)E(f′)df′ (4)
Consider the filtered signal value in a band f0±Δf. Denote the minimum value of L in this frequency band by Lmin. It should be noted that Lmin may depend on frequency components outside the band in question, since a peak in an adjacent band may mask a signal in the band in question.
According to the masking model, any noise in this frequency band that has an energy less than Lmin will not be perceived by the listener. In particular, the noise introduced by replacing the measured signal amplitude in this band by a quantized approximation will not be perceived if the quantization error is less than Lmin. The noise in question will be less than Lmin if the signal amplitude is quantized to an accuracy equal to S/Lmin, where S is the energy of the signal in the band in question.
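The construction of the mask level and of Lmin can be illustrated numerically. The following is a minimal sketch, not the method of any particular embodiment: the threshold function `toy_threshold` is a hypothetical placeholder that ignores the energy argument and simply decays with frequency separation, the grid spacing is arbitrary, and the integral for L(f) is approximated by a sum over a uniform frequency grid.

```python
import numpy as np


def toy_threshold(energy, f, f_prime, spread_hz=200.0):
    """Hypothetical stand-in for T(E, f, f'): decays with |f - f'|.

    The energy argument is accepted but unused; a real system would use
    measured psychoacoustic data here.
    """
    del energy
    return 0.1 * np.exp(-np.abs(f - f_prime) / spread_hz)


def mask_level(freqs, energy):
    """Riemann-sum approximation of the mask level L(f) on a uniform grid."""
    df = freqs[1] - freqs[0]
    f = freqs[:, None]        # frequencies at which L is evaluated
    fp = freqs[None, :]       # masker frequencies f'
    weights = toy_threshold(energy[None, :], f, fp)
    return (weights * energy[None, :]).sum(axis=1) * df


def band_minimum(freqs, mask, f0, delta_f):
    """Lmin: the smallest mask level inside the band f0 +/- delta_f."""
    in_band = np.abs(freqs - f0) <= delta_f
    return mask[in_band].min()


freqs = np.linspace(0.0, 8000.0, 801)   # 10 Hz grid
energy = np.zeros_like(freqs)
energy[100] = 1.0                        # a single tone at 1 kHz
mask = mask_level(freqs, energy)
l_min = band_minimum(freqs, mask, f0=1500.0, delta_f=250.0)
```

In this example the only energy lies at 1 kHz, outside the band 1500±250 Hz, yet it alone determines Lmin for that band, which illustrates the dependence on adjacent bands noted above.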
The above-described quantization procedure requires knowledge of the frequency spectrum of the incoming audio signal at a resolution which is significantly greater than that of the sub-band analysis of the incoming signal. In general, the minimum value of the mask function L will depend on the precise location of any peaks in the frequency spectrum of the audio signal. The signal amplitude provided by the sub-band analysis filter measures the average energy in the frequency band; however, it does not provide any information about the specific location of any spectral peaks within the band.
Hence, a more detailed frequency analysis of the incoming audio signal is required. This can be accomplished by defining a time window about each filtered signal component and performing a frequency analysis of the audio samples in this window to generate an approximation to E(f). In prior art systems, the frequency analysis is typically performed by calculating an FFT of the audio samples in the time window.
In one embodiment of a quantization sub-component according to the present invention, this is accomplished by further subdividing each sub-band using another layer of filter banks. The output of each of the sub-band filters in the analysis filter bank is inputted to another sub-band analysis filter which splits the original sub-band into a plurality of finer sub-bands. These finer sub-bands provide a more detailed spectral measurement of the audio signal in the frequency band in question, and hence, can be used to compute the overall mask function L discussed above.
While a separate Lmin value may be calculated for each filtered signal value from each sub-band filter, the preferred embodiment of the present invention operates on blocks of filtered signal values. If a separate quantization step size is used for each filtered value, then the step size would need to be communicated with each filtered value. The bits needed to specify the step size reduce the degree of compression. To reduce this “overhead”, a block of samples is quantized using the same step size. This approach reduces the number of overhead bits/sample, since the step size need only be communicated once. The blocks of filtered samples utilized consist of a sequential set of filtered signal values from one of the sub-band filters. As noted above, these values can be inputted to a second sub-band analysis filter to obtain a fine spectral measurement of the energy in the sub-band.
One embodiment of such a system is shown in
The manner in which an audio decompression system according to the present invention operates will now be explained with the aid of
The filtered samples are inputted to an inverse sub-band filter 426 which generates an approximation to the original audio signal from the filtered signal values. Filter 402 shown in FIG. 9 and filter 426 form a perfect, or near perfect, reconstruction filter bank. Hence, if the filtered samples had not been replaced by approximations thereto by quantizer 404, the decompressed signal generated by filter bank 426 would exactly match the original audio signal input to filter 402 to a specified precision.
Inverse sub-band filter bank 426 also comprises a tree-structured filter bank. To distinguish the filters used in the inverse sub-band filters from those used in the sub-band filter banks which generated the filtered audio samples, the inverse filter banks will be referred to as synthesizers. The filtered signal values enter the tree at the leaf nodes thereof, and the reconstructed audio signal exits from the root node of the tree. The low and intermediate filtered samples pass through two levels of synthesizers. The first level of synthesizers are shown at 427 and 428. For each group of four filtered signal values accepted by synthesizers 427 and 428, four sequential values which represent filtered signal values in a frequency band which is four times wider are generated. Similarly, for each group of eight filtered signal values accepted by synthesizer 429, eight sequential values which represent filtered signal values in a frequency band which is eight times as wide are generated. Hence, the number of signal values entering synthesizer 430 on each input is now the same even though the number of signal values provided by de-quantizer 414 for each frequency band varied from band to band.
The synthesis of the audio signal from the sub-band components is carried out by analogous operations. Given M sub-band components that were obtained from 2M polyphase components Pi, the original polyphase components can be obtained from the following matrix multiplication:
As noted above, there are a number of different cosine modulations that may be used. Eq. (5a) corresponds to modulation using the relationship shown in Eq. 3(a). If the modulation shown in Eq. 3(b) is utilized, then the polyphase components are obtained from the following matrix multiplication:
The time domain samples xk are computed from the polyphase components by the inverse of the windowing transform described above. A block diagram of a synthesizer according to the present invention is shown in FIG. 11 at 500. The M frequency components are first transformed into the corresponding polyphase components by a matrix multiplication shown at 510. The resultant 2M polyphase components are then shifted into a 2W entry shift register 512 and the oldest 2M values in the shift register are shifted out and discarded. The contents of the shift register are inputted to array generator 513, which builds a W value array 514 by iterating the following loop 8 times: take the first M samples from shift register 512, ignore the next 2M samples, then take the next M samples. The contents of array 514 are then multiplied by W weight coefficients, h′i, which are related to the hi used in the corresponding sub-band analysis filter, to generate a set of weighted values wi=h′i*ui, which are stored in array 516. Here the ui are the contents of array 514. The M time domain samples, xj for j=0, . . . M−1, are then generated by summing circuit 518 which sums the appropriate wi values, i.e.,
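The gathering step performed by the array generator, followed by the weighting step, can be sketched as shown below. The sketch assumes that each of the 8 passes advances through 4M consecutive register entries (take M, skip 2M, take M), which is consistent with a W value array being drawn from a 2W entry register when W=16M. The function names, the choice M=32, and the all-ones weights are hypothetical placeholders; the final summation is not reproduced here.

```python
import numpy as np


def build_window_array(shift_register, m, passes=8):
    """Gather values from the shift register: per pass, take M, skip 2M, take M.

    Assumption: each pass advances by 4M register entries in total.
    """
    reg = np.asarray(shift_register)
    out = []
    for p in range(passes):
        base = 4 * m * p
        out.append(reg[base:base + m])                  # first M samples
        out.append(reg[base + 3 * m:base + 4 * m])      # skip 2M, take next M
    return np.concatenate(out)


def apply_window(u, h_prime):
    """Weighted values w_i = h'_i * u_i."""
    return np.asarray(h_prime) * u


m = 32
w_len = 16 * m                        # W = 512 when M = 32
register = np.arange(2 * w_len)       # stand-in contents of the 2W entry register
u = build_window_array(register, m)
w = apply_window(u, np.ones(w_len))   # placeholder weights, not a real prototype
```

Using the register index as the stored value makes the gather pattern visible: the first 32 outputs are entries 0 to 31, the next 32 are entries 96 to 127, and the second pass begins at entry 128.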
While the above-described embodiments of synthesizers and sub-band analysis filters are described in terms of special purpose hardware for carrying out the various operations, it will be apparent to those skilled in the art that the entire operation may be carried out on a general purpose digital computer.
As pointed out above, it would be advantageous to provide a single high-quality compressed audio signal that could be played back on a variety of playback platforms having varying computational capacities. Each such playback platform would reproduce the audio material at a quality consistent with the computational resources of the platform.
Furthermore, the quality of the playback should be capable of being varied in real time as the computational capability of the platform varies. This last requirement is particularly important in playback systems comprising multi-tasking computers. In such systems, the available computational capacity for the audio material varies in response to the computational needs of tasks having equal or higher priority. Prior art decompression systems do not provide this capability.
The present invention allows the quality of the playback to be varied in response to the computational capability of the playback platform without the use of multiple copies of the compressed material. Consider an audio signal that has been compressed using a sub-band analysis filter bank in which the window contains W audio samples. The computational workload required to decompress the audio signal is primarily determined by the computations carried out by the synthesizers. The computational workload inherent in a synthesizer is W multiplies and adds from the windowing operations and 2M² multiplies and adds from the matrix multiplication. The extent to which the filters approximate an ideal band pass filter, in general, depends on the number of samples in the window, i.e., W. As the number of samples increases, the discrepancy between the sub-band analysis filter performance and that of an ideal band pass filter decreases. For example, a filter utilizing 128 samples has a side lobe suppression in excess of 48 dB, while a filter utilizing 512 samples has a side lobe suppression in excess of 96 dB. Hence, synthesis quality can be traded for a reduction in computational workload if a smaller window is used for the synthesizers.
In the preferred embodiment of the present invention, the size of the window used to generate the sub-band analysis filters in the compression system is chosen to provide filters having 96 dB rejection of signal energy outside a filter band. This value is consistent with playback on a platform having 16 bit D/A converters. In the preferred embodiment of the present invention, this condition can be met by 512 samples. The prototype filter coefficients, hi, viewed as a function of i have a more or less sine-shaped appearance with tails extending from a maximum. The tails provide the corrections which result in the 96 dB rejection. If the tails are truncated, the filter bands would have substantially the same bandwidths and center frequencies as those obtained from the non-truncated coefficients. However, the rejection of signal energy outside a specific filter's band would be less than the 96 dB discussed above. As a result, a compression and decompression system based on the truncated filter would show significantly more aliasing than the non-truncated filter.
The present invention utilizes this observation to trade sound quality for a reduction in computational workload in the decompression apparatus. In the preferred embodiment of the present invention, the audio material is compressed using filters based on a non-truncated prototype filter. When the available computational capacity of the playback platform is insufficient to provide decompression using synthesis filters based on the non-truncated prototype filter, synthesizers based on the truncated filters are utilized. Truncating the prototype filter leads to synthesizers which have the same size window as those based on the non-truncated prototype. However, many of the filter coefficients used in the windowing operation are zero. Since the identity of the coefficients which are now zero is known, the multiplications and additions involving these coefficients can be eliminated. It is the elimination of these operations that provides the reduced computational workload.
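The saving obtained from a truncated prototype can be sketched as follows. This is an illustration under stated assumptions: the prototype used here is a Hann-windowed sinc chosen only for its general shape and is not the prototype filter of the preferred embodiment, and the choice of keeping 128 centre coefficients is hypothetical.

```python
import numpy as np


def truncate_prototype(h, keep):
    """Zero the tails of prototype h, keeping `keep` centre coefficients."""
    h = np.asarray(h, dtype=float)
    out = np.zeros_like(h)
    start = (len(h) - keep) // 2
    out[start:start + keep] = h[start:start + keep]
    return out


def windowing(u, h, active=None):
    """Multiply u by h, touching only the indices listed in `active`."""
    if active is None:
        active = np.arange(len(h))
    w = np.zeros(len(h))
    w[active] = h[active] * u[active]
    return w, len(active)            # weighted values, multiply count


# Placeholder prototype: a Hann-windowed sinc, not the patent's coefficients.
n = np.arange(512)
h_full = np.sinc((n - 255.5) / 64.0) * np.hanning(512)
h_short = truncate_prototype(h_full, keep=128)
active = np.flatnonzero(h_short)     # known in advance, so computed only once

u = np.random.default_rng(0).standard_normal(512)
w_full, cost_full = windowing(u, h_full)
w_short, cost_short = windowing(u, h_short, active)
```

The window length stays at 512 in both cases; only the number of multiplications actually performed changes, because the positions of the zero coefficients are fixed ahead of time.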
It should be noted that many playback platforms use D/A converters with less than 16 bits. In these cases, the full 96 dB rejection is beyond the capability of the platform; hence, the system performance will not be adversely affected by using the truncated filter. These platforms also tend to be the less expensive computing systems, and hence, have lower computational capacity. Thus, the trade-off between computational capacity and audio quality is made at the filter level, and the resultant system provides an audio quality which is limited by its D/A converters rather than its computational capacity.
Another method for trading sound quality for a reduction in computational workload is to eliminate the synthesis steps that involve specific high-frequency components. If the sampled values in one or more of the high-frequency bands are below some predetermined threshold value, then the values can be replaced by zero. Since the specific components for which the substitution is made are known, the multiplications and additions involving these components may be eliminated, thereby reducing the computational workload. The magnitude of the distortion generated in the reconstructed audio signal will, of course, depend on the extent of the error made in replacing the sampled values by zeros. If the original values were small, then the degradation will be small. This is more often the case for the high-frequency filtered samples than for the low frequency filtered samples. In addition, the human auditory system is less sensitive at high frequencies; hence, the distortion is less objectionable.
It should also be noted that the computational workload inherent in decompressing a particular piece of audio material varies during the material. For example, the high-frequency filtered samples may only have a significant amplitude during parts of the sound track. When the high-frequency components are not present or are sufficiently small to be replaced by zeros without introducing noticeable distortions, the computational workload can be reduced by not performing the corresponding multiplications and additions. When the high-frequency components are large, e.g., during attacks, the computational workload is much higher.
It should be noted that the computational work associated with generating the Pk values from the Si values can be organized by Si. That is, the contribution to each Pk from a given Si is calculated, then the contribution to each Pk from Si+1, and so on. Since there are 2M P values involved with each value of S, the overhead involved in testing each value of S before proceeding with the multiplications and additions is small compared to the computations saved if a particular S value is 0 or deemed to be negligible. In the preferred embodiment of the present invention, the computations associated with Si are skipped if the absolute value of Si is less than some predetermined value, ε.
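Organizing the matrix multiplication by Si, with a test against ε before each column is processed, might look like the following sketch. The cosine matrix used here is a generic placeholder and does not reproduce the modulation matrices of the description; the function name, M=8, and the sample values are likewise hypothetical.

```python
import numpy as np


def polyphase_from_subbands(s, matrix, eps=0.0):
    """Accumulate P = matrix @ s one S_i at a time, skipping |S_i| < eps.

    `matrix` has shape (2M, M); column i holds the contribution of S_i
    to each of the 2M values P_k.
    """
    p = np.zeros(matrix.shape[0])
    used = 0
    for i, s_i in enumerate(s):
        if abs(s_i) < eps:           # one cheap test saves 2M multiply-adds
            continue
        p += matrix[:, i] * s_i
        used += 1
    return p, used


m = 8
k = np.arange(2 * m)[:, None]
i = np.arange(m)[None, :]
# Placeholder cosine modulation; the matrices of the description are not reproduced.
matrix = np.cos(np.pi * (2 * i + 1) * (2 * k + 1) / (4 * m))

s = np.array([1.0, -0.5, 0.25, 0.0, 0.003, -0.002, 0.001, 0.0])
p_exact, n_exact = polyphase_from_subbands(s, matrix)
p_fast, n_fast = polyphase_from_subbands(s, matrix, eps=0.01)
```

With ε=0.01, five of the eight columns are skipped, and the error in each Pk is bounded by the sum of the magnitudes of the skipped Si because every matrix entry has magnitude at most 1.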
Because of the variation in workload, the preferred embodiment of the present invention utilizes a buffering system to reduce the required computational capacity from that needed to accommodate the peak workload to that needed to accommodate the average workload. In addition, this buffering facilitates the use of the above-described techniques for trading off the required computational capacity against sound quality. For example, when the computational workload is determined to be greater than that available, the value of ε can be increased, which, in turn, reduces the number of calculations needed to generate the Pk values.
A block diagram of an audio decompression system utilizing the above-described variable computational load techniques is shown in
If the number of stored values exceeds a second predetermined value, controller 614 adjusts the computational algorithm to regain audio quality if synthesizer 606 is not currently running in a manner that provides the highest audio quality. In this case, controller 614 reverses the approximations introduced into synthesizer 606 discussed above.
While audio decompression system 600 has been discussed in terms of individual computational elements, it will be apparent to those skilled in the art that the functions of decoder 602, de-quantizer 604, synthesizer 606, buffer 608 and controller 614 can be implemented on a general purpose digital computer. In this case, the functions provided by clock 609 may be provided by the computer's clock circuitry.
In stereophonic decompression systems having parallel computational capacity, two synthesizers may be utilized. A stereophonic decompression system according to the present invention is shown in
If a stereophonic decompression system does not have parallel computational capacity, then the regeneration of the left and right audio channels must be carried out by time-sharing a single synthesizer. When the computational workload exceeds the capacity of the decompression system, the trade-offs discussed above may be utilized to trade audio quality for a reduction in the computational workload. In addition, the computational workload may be reduced by switching to a monaural reproduction mode, thereby reducing the computational workload imposed by the synthesis operations by a factor of two.
A stereophonic decompression system using this type of serial computation system is shown in
The techniques described above for varying the computational complexity required to synthesize a signal may also be applied to vary the computational complexity required to analyze a signal. This is particularly important in situations in which the audio signal must be compressed in real time prior to being distributed through a communication link having a capacity which is less than that needed to transmit the uncompressed audio signal. If a computational platform having sufficient capacity to compress the audio signal at full audio quality is available, the methods discussed above can be utilized.
However, there are situations in which the computational capacity of the compression platform may be limited. This can occur when the computational platform has insufficient computing power, or in cases in which the platform performing the compression may also include a general purpose computer that is time-sharing its capacity among a plurality of tasks. In the latter case, the ability to trade off computational workload against audio quality is particularly important.
A block diagram of an audio compression apparatus 850 utilizing variable computational complexity is shown in
When M such signal values have been received, sub-band analysis filter bank 856 generates M signal components from these samples while the next M audio samples are being received. The signal components are then quantized and coded by quantizer 858 and stored in an output buffer 860. The compressed audio signal data is then transmitted to the communication link at a regular rate that is determined by clock 862 and controller 864.
Consider the case in which sub-band analysis filter 856 utilizes a computational platform that is shared with other applications running on the platform. When the computational capacity is restricted, sub-band analysis filter bank 856 will not be able to process incoming signal values at the same rate at which said signal values are received. As a result, the number of signal values stored in buffer 854 will increase. Controller 864 periodically senses the number of values stored in buffer 854. If the number of values exceeds a predetermined number, controller 864 alters the operations of sub-band analysis filter bank 856 in a manner that decreases the computational workload of the analysis process. The audio signal synthesized from the resulting compressed audio signal will be of lesser quality than the original audio signal; however, compression apparatus 850 will be able to keep up with the incoming data rate. When controller 864 senses that the number of samples in buffer 854 returns to a safe operating level, it alters the operation of sub-band analysis filter bank 856 in such a manner that the computational workload and audio quality increase.
Many of the techniques described above may be used to vary the computational workload of the sub-band analysis filter. First, the prototype filter may be replaced by a shorter filter or a truncated filter thereby reducing the computational workload of the windowing operation. Second, the higher frequency signal components can be replaced by zeros. This has the effect of reducing “M” and thereby reducing the computational workload.
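The behavior of a controller that senses the buffer level and adjusts the workload accordingly can be sketched as a simple two-threshold rule. The class name, the two threshold values, and the number of quality levels below are hypothetical; the description specifies only that the workload is decreased above a predetermined number and restored at a safe operating level.

```python
from dataclasses import dataclass


@dataclass
class WorkloadController:
    """Two-threshold controller: lower quality when the input buffer backs up,
    restore it once the backlog clears. All numeric values are hypothetical."""
    high_water: int = 4096     # backlog that triggers a cheaper analysis mode
    low_water: int = 1024      # "safe operating level" for restoring quality
    max_level: int = 3         # 0 = full quality, 3 = cheapest mode
    level: int = 0

    def update(self, buffered_samples: int) -> int:
        """Called periodically with the number of values held in the buffer."""
        if buffered_samples > self.high_water and self.level < self.max_level:
            self.level += 1
        elif buffered_samples <= self.low_water and self.level > 0:
            self.level -= 1
        return self.level


controller = WorkloadController()
history = [controller.update(n) for n in (500, 5000, 6000, 3000, 900, 800)]
print(history)  # [0, 1, 2, 2, 1, 0]
```

Keeping the two thresholds apart prevents the controller from toggling between modes on every sensing interval when the backlog hovers near a single limit.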
Third, in stereophonic systems, the audio signals from each of the microphones 851 and 852 can be combined by circuitry in buffer 854 to form a monaural signal which is analyzed. The compressed monaural signal is then used for both the left and right channel signals.
However, for the purposes of the present discussion, it is sufficient to note that the filters may be implemented as finite impulse response filters with real filter coefficients. If the synthesis filter generates M coefficients per frame representing the amplitude of the transmitted signal, the filter bank accepts M frequency-domain symbols and generates M time-domain coefficients. However, it should be noted that the M coefficients generated may also depend on symbols received prior to the M frequency-domain symbols of the current frame. Similarly, the analysis filter bank demodulates M frequency-domain symbols from M time-domain received signal values in a given frame, and the resulting M symbols may depend on previous frames of M time-domain signal values processed by the filter bank.
The communication bandwidth may alternatively be broken up into sub-bands of distinct (nonuniform) bandwidths by means of a single nonuniform filter bank transform. The synthesis filter bank, or frequency-domain-to-time-domain transform for converting symbols into signal values for transmission, is depicted in FIG. 16 at 300 for a system having K subchannels. If the subchannels are nonuniform in their bandwidth, distinct subchannels of the filter bank will operate at different upsampling rates; the upsampling rate of the kth subchannel will be denoted by Mk. The upsampling rates are subject to the critical sampling condition
Referring to FIG. 16, synthesis filter bank 300 generates Mtot time-domain samples in each time frame. Here, Mtot is the least common multiple of the upsampling rates Mk provided by the upsamplers of which 302 is typical. Define the integers nk by
In each frame of transform processing, nk symbols, denoted by sk,i, are mapped onto the kth subchannel using the sequence, fk, as the modulating waveform to generate a time domain sequence, xk, representing the symbols in the kth subchannel, i.e.,
Note that symbols from previous frames may contribute to the output of a given frame. The contributions xk from the K distinct subchannels are added together, as shown at 301, to produce a set of Mtot time-domain signal values x[n] from Mtot input symbols sk,i during the given frame. The kth subchannel will have a bandwidth that is 1/Mk as large as that occupied by the full transmitted signal.
At the receiver, the incoming discrete signal values x′[n] are passed through an analysis filter bank 400, depicted in FIG. 17. The received signal values are denoted by x′ to emphasize that the samples have been altered by the transmission link. Each filter in this bank has a characteristic downsampling ratio Mk imposed after filtering by a finite impulse response filter, producing a set of Mtot output symbols s per frame. A typical filter is shown at 401 with its corresponding downsampler at 402. The output symbol stream for the kth subchannel is given by
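The frame bookkeeping for a nonuniform filter bank can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not spelled out in the surviving text: that the critical sampling condition takes the usual form in which the reciprocals 1/Mk sum to one, and that nk equals Mtot/Mk. Both are consistent with Mtot input symbols producing Mtot time-domain values per frame. The function name and the example rates are hypothetical.

```python
from fractions import Fraction
from math import lcm


def frame_layout(rates):
    """Return (M_tot, [n_k]) for upsampling rates M_k.

    Assumes n_k = M_tot / M_k and critical sampling: sum(1 / M_k) == 1.
    """
    if sum(Fraction(1, m) for m in rates) != 1:
        raise ValueError("rates are not critically sampled")
    m_tot = lcm(*rates)
    return m_tot, [m_tot // m for m in rates]


m_tot, symbols_per_frame = frame_layout([2, 4, 8, 8])
print(m_tot, symbols_per_frame)  # 8 [4, 2, 1, 1]
```

For rates 2, 4, 8 and 8, the widest subchannel occupies half the bandwidth and carries four of the eight symbols in each frame, while each of the two narrowest subchannels carries one.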
Again, input signal values from preceding frames may contribute to the set of symbols output during a given frame.
We require that in an ideal channel, the subchannel waveforms, fk, together with the receive filters Hk satisfy perfect-reconstruction or near-perfect-reconstruction conditions, with an output symbol stream that is identical (except for a possible delay of an integer number of samples) to the input symbol stream. This is equivalent to the absence of inter-symbol and inter-channel interference upon reconstruction. Methods for the design of such finite-impulse-response filter bank waveforms are known to the art. The reader is referred to J. Li, T. Q. Nguyen, S. Tantaratana, “A simple design method for nonuniform multirate filter banks,” in Proc. Asilomar Conf. On Signals, Systems, and Computers, November 1994 for a detailed discussion of such filter banks.
Various modifications to the present invention will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Accordingly, the present invention is to be limited solely by the scope of the following claims.
Tzannes, Michael A., Jayasimha, Sriram, Stautner, John P., Heller, Peter N., Morrell, William R.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Nov 23 2004 | Aware, Inc. | (assignment on the face of the patent) | / | |||
Dec 10 2010 | AWARE, INC | Hybrid Audio LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 025534 | /0671 | |
Mar 28 2016 | HYBRID AUDIO, LLC | HYBRID AUDIO, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 038115 | /0877 |