Rate loop processor for perceptual encoder/decoder

Rate loop processor for perceptual encoder/decoder
RE39080

A method and apparatus for quantizing audio signals is disclosed which advantageously produces a quantized audio signal which can be encoded within an acceptable range. Advantageously, the quantizer uses a scale factor which is interpolated between a threshold based on the calculated threshold of hearing at a given frequency and the absolute threshold of hearing at the same frequency.

PTO Wrapper PDF
Dossier Espace Google

Patent RE39080
Priority Dec 30 1988
Filed Aug 13 2002
Issued Apr 25 2006
Expiry Dec 30 2008
Inventors Johnston, …
Assg.orig Lucent Tec…
Assg.curr Lucent Tec…
Entity Large
Referenced by 95
References 55
Maint.: EXPIRED

1. A method of coding an audio signal comprising:

(a) converting a time domain representation of the audio signal into a frequency domain representation of the audio signal, the frequency domain representation comprising a set of frequency coefficients;

(b) calculating a masking threshold based upon the set of frequency coefficients;

(c) using a rate loop processor in an iterative fashion to determine a set of quantization step size coefficients for use in encoding the set of frequency coefficients, said set of quantization step size coefficients determined by using the masking threshold and an absolute hearing threshold; and

(d) coding the set of frequency coefficients based upon the set of quantization step size coefficients.

4. A decoder for decoding a set of frequency coefficients representing an audio signal, the decoder comprising:

(a) means for receiving the set of coefficients, the set of frequency coefficients having been encoded by:

(1) converting a time domain representation of the audio signal into a frequency domain representation of the audio signal comprising the set of frequency coefficients;

(2) calculating a masking threshold based upon the set of frequency coefficients;

(3) using a rate loop processor in an iterative fashion to determine a set of quantization step size coefficients needed to encode the set of frequency coefficients, said set of quantization step size coefficients determined by using the masking threshold and an absolute hearing threshold; and

(4) coding the set of frequency coefficients based upon the set of quantization step size coefficients; and

(b) means for converting the set of coefficients to a time domain signal.

0. 2. The method of claim 1 wherein the set of frequency coefficients are MDCT coefficients.

3. The method of claim 1 wherein the using the rate loop processor in the iterative fashion is discontinued when a cost, measured by the number of bits necessary to code the set of frequency coefficients, is within a predetermined range.

This
MDCT processor 310 provides the seven MDCT vectors to concatenator 311 and delay memory 312.

As discussed above with reference to window multiplier 304, four the seven data windows have N/2 non-zero coefficients (see FIGS. 4c-f). This means that four of the windowed frame vectors contain only N/2 non-zero values. Therefore, the non-zero values of these four vectors may be concatenated into a single vector of length 2N by concatenator 311 upon output from MDCT processor 310. The resulting concatenation of these vectors is handled as a single vector for subsequent purposes. Thus, delay memory 312 is presented with four MDCT vectors, rather than seven.

Delay memory 312 receives the four MDCT vectors from MDCT processor 310 and concatenator 311 for the purpose of providing temporary storage. Delay memory 312 provides a delay of one audio signal frame (as defined by input signal buffer 302) on the flow of the four MDCT vectors through the filter bank 202. The delay is provided by (i) storing the two most recent consecutive sets of MDCT vectors representing consecutive audio signal frames and (ii) presenting as input to data selector 314 the older of the consecutive sets of vectors. Delay memory 312 may comprise random access memory (RAM) of size:
M×2×4×N
where 2 is the number of consecutive sets of vectors, 4 is the number of vectors in a set, N is the number of elements in an MDCT vector, and M is the number of bits used to represent an MDCT vector element.

Data selector 314 selects one of the four MDCT vectors provided by delay memory 312 to be output from the filter bank 202 to quantizer/rate-loop 206. As mentioned above, the perceptual model processor 204 directs the operation of data selector 314 based on the FFT vectors provided by the FFT processor 308. Due to the operation of delay memory 312, the seven FFT vectors provided to the perceptual model processor 204 and the four MDCT vectors concurrently provided to data selector 314 are not based on the same audio input frame, but rather on two consecutive input signal frames—the MDCT vectors based on the earlier of the frames, and the FFT vectors based on the later of the frames. Thus, the selection of a specific MDCT vector is based on information contained in the next successive audio signal frame. The criteria according to which the perceptual model processor 204 directs the selection of an MDCT vector is described in Section 2.2, below.

For purposes of an illustrative stereo embodiment, the above analysis filterbank 202 is provided for each of the left and right channels.

2.2. The Perceptual Model Processor

A perceptual coder achieves success in reducing the number of bits required to accurately represent high quality audio signals, in part, by introducing noise associated with quantization of information bearing signals, such as the MDCT information from the filter bank 202. The goal is, of course, to introduce this noise in an imperceptible or benign way. This noise shaping is primarily a frequency analysis instrument, so it is convenient to convert a signal into a spectral representation (e.g., the MDCT vectors provided by filter bank 202), compute the shape and amount of the noise that will be masked by these signals and injecting it by quantizing the spectral values. These and other basic operations are represented in the structure of the perceptual coder shown in FIG. 2.

The perceptual model processor 204 of the perceptual audio coder 104 illustratively receives its input from the analysis filter bank 202 which operates on successive frames. The perceptual model processor inputs then typically comprise seven Fast Fourier Transform (FFT) vectors from the analysis filter bank 202. These are the outputs of the FFT processor 308 in the form of seven vectors of 2N complex elements, each corresponding to one of the windowed frame vectors.

In order to mask the quantization noise by the signal, one must consider the spectral contents of the signal and the duration of a particular spectral pattern of the signal. These two aspects are related to masking in the frequency domain where signal and noise are approximately steady state—given the integration period of the hearing system- and also with masking in the time domain where signal and nose are subjected to different cochlear filters. The shape and length of these filters are frequency dependent.

Masking in the frequency domain is described by the concept of simultaneous masking. Masking in the time domain is characterized by the concept of premasking and postmasking. These concepts are extensively explained in the literature; see, for example, E. Zwicker and H. Fastl, “Psychoacoustics, Facts, and Models,” Springer-Verlag, 1990. To make these concepts useful to perceptual coding, they are embodied in different ways.

Simultaneous masking is evaluated by using perceptual noise shaping models. Given the spectral contents of the signal and its description in terms of noise-like or tone-like behavior, these models produce an hypothetical masking threshold that rules the quantization level of each spectral component. This noise shaping represents the maximum amount of noise that may be introduced in the original signal without causing any perceptible difference. A measure called the PERCEPTUAL ENTROPY (PE) uses this hypothetical masking threshold to estimate the theoretical lower bound of the bitrate for transparent encoding. J. D. Jonston, Estimation of Perceptual Entropy Using Noise Masking Criteria,” ICASSP, 1989.

Premasking characterizes the (in)audibility of a noise that starts some time before the masker signal which is louder than the noise. The noise amplitude must be more attenuated as the delay increases. This attenuation level is also frequency dependent. If the noise is the quantization noise attenuated by the first half of the synthesis window, experimental evidence indicates the maximum acceptable delay to be about 1 millisecond.

This problem is very sensitive and can conflict directly with achieving a good coding gain. Assuming stationary conditions—which is a false premiss—The coding gain is bigger for larger transforms, but, the quantization error spreads till the beginning of the reconstructed time segment. So, if a transform length of 1024 points is used, with a digital signal sampled at a rate of 48000 Hz, the noise will appear at most 21 milliseconds before the signal. This scenario is particularly critical when the signal takes the form of a sharp transient in the time domain commonly known as an “attack”. In this case the quantization noise is audible before the attack. The effect is known as pre-echo.

Thus, a fixed length filter bank is a not a good perceptual solution nor a signal processing solution for non-stationary regions of the signal. It will be shown later that a possible way to circumvent this problem is to improve the temporal resolution of the coder by reducing the analysis/synthesis window length. This is implemented as a window switching mechanism when conditions of attack are detected. In this way, the coding gain achieved by using a long analysis/synthesis window will be affected only when such detection occurs with a consequent need to switch to a shorter analysis/synthesis window.

Postmasking characterizes the (in)audibility of a noise when it remains after the cessation of a stronger masker signal. In this case the acceptable delays are in the order of 20 milliseconds. Given that the bigger transformed time segment lasts 21 milliseconds (1024 samples), no special care is needed to handle this situation.

WINDOW SWITCHING

The PERCEPTUAL ENTROPY (PE)_measure of a particular transform segment gives the theoretical lower bound of bits/sample to code that segment transparently. Due to its memory properties, which are related to premasking protection, this measure shows a significant increase of the PE value to its previous value—related with the previous segment—when some situations of strong non-stationarity of the signal (e.g. an attack) are presented. This important property is used to activate the window switching mechanism in order to reduce pre-echo. This window switching mechanism is not a new strategy, having been used, e.g., in the ASPEC coder, described in the ISO/MPEG Audio Coding Report, 1990, but the decision technique behind it is new using the PE information to accurately localize the non-stationarity and define the right moment to operate the switch.

Two basic window lengths: 1024 samples and 256 samples are used. The former corresponds to a segment duration of about 21 milliseconds and the latter to a segment duration of about 5 milliseconds. Short windows are associated in sets of 4 to represent as much spectral data as a large window (but they represent a “different” number of temporal samples). In order to make the transition from large to short windows and vice-versa it proves convenient to use two more types of windows. A START window makes the transition from large (regular) to short windows and a STOP window makes the opposite transition, as shown in FIG. 5b. See the above-cited Princen reference for useful information on this subject. Both windows are 1024 samples wide. They are useful to keep the system critically sampled and also to guarantee the time aliasing cancellation process in the transition region.

In order to exploit interchannel redundancy and irrelevancy, the same type of window is used for RIGHT and LEFT channels in each segment.

The stationarity behavior of the signal is monitored at two levels. First by large regular windows, then if necessary, by short windows. Accordingly, the PE of large (regular) window is calculated for every segment while the PE of short windows are calculated only when needed. However, the tonality information for both types is updated for every segment in order to follow the continuous variation of the signal.

Unless stated otherwise, a segment involves 1024 samples which is the length of a large regular window.

The diagram of FIG. 5a represents all the monitoring possibilities when the segment from the point N/2 till the point 3N/2 is being analysed. Related to the digram of FIG. 5 is the flowchart of FIG. 6 which describes the monitoring sequence and decision technique. We need to keep in buffer three halves of a segment in order to be able to insert a START window prior to a sequence of short windows when necessary. FIGS. 5a-e explicitly considers the 50% overlap between successive segments.

The process begins by analysing a “new” segment with 512 new temporal samples (the remaining 512 samples belong to the previous segment). As shown in FIG. 6, the PE of this new segment and the differential PE to the previous segment are calculated (601). If the latter value reaches a predefined threshold (602), then the existence of a non-stationarity inside the current segment is declared and details are obtained by processing four short windows with positions as represented in FIG. 5a. The PE value of each short window is calculated (603) resulting in the ordered sequence: PE1, PE2, PE3 and PE4. From these values, the exact beginning of the strong non-stationarity of the signal is deduced. Only five locations are possible, identified in FIG. 5a as L1, L2, L3, L4 and L5. As it will become evident, if the non-stationarity had occurred somewhere from the point N/2 till the point 15N/16, that situation would have been detected in the previous segment. It follows that the PE1 value does not contain relevant information about the stationarity of the current segment. The average PE of the short windows is compared with the PE of the large window of the same segment (605). A smaller PE reveals a more efficient coding situation. Thus if the former value is not smaller than the latter, then we assume that we are facing a degenerate situation and the window switching process is aborted.

It has been observed that for short windows the information about stationarity lies more on its PE value than on the differential to the PE value of the precedent window. Accordingly, the first window that has a PE value larger than a predefined threshold is detected. PE2 is identified with location L1, PE3 with L2 and PE4 with location L3. In either case, a START window (608) is placed before the current segment that will be coded with short windows. A STOP window is needed to complete the process (616). There are, however, two possibilities. If the identified location where the strong non-stationarity of the signal begins is L1 or L2 then, this is well inside the short window sequence, no coding artifacts result and the coding sequence is depicted in FIG. 5b. If the location if L4 (612), then, in the worst situation, the non-stationarity may begin very close to the right edge of the last short window. Previous results have consistently shown that placing a STOP window—in coding conditions—in these circumstances degrades significantly the reconstruction of the signal in this switching point. For this reason, another set of four short windows is placed before a STOP window (614). The resulting coding sequence is represented in FIG. 5e.

If none of the short PEs is above the threshold, the remaining possibilities are L4 or L5. In this case, the problem lies ahead of the scope of the short window sequence and the first segment in the buffer may be immediately coded using a regular large window.

To identify the correct location, another short window must be processed. It is represented in FIG. 5a by a dotted curve and its PE value, PE1_n+1, is also computed. As it is easily recognized, this short window already belongs to the next segment. If PE1_n+1is above the threshold (611), then, the location is L4 and, as depicted in FIG. 5c, a START window (613) may be followed by a STOP window (615). In this case the spread of the quantization noise will be limited to the length of a short window, and a better coding gain is achieved. In the rare situation of the location being L5, then the coding is done according to the sequence of FIG. 5d. The way to prove that in this case that is right solution is by confirming that PE2_n+1will be above the threshold. PE2_n+1is the PE of the short window (not represented in FIG. 5) immediately following the window identified with PE1_n+1.

As mentioned before for each segment, RIGHT and LEFT channels use the same type of analysis/synthesis window. This means that a switch is done for both channels when at least one channel requires it.

It has been observed that for low bitrate applications the solution of FIG. 5c, although representing a good local psychoacoustic solution, demands an unreasonably large number of bits that may adversely affect the coding quality of subsequent segments. For this reason, that coding solution may eventually be inhibited.

It is also evident that the details of the reconstructed signal when short windows are used are closer to the original signal than when only regular large window are used. This is so because the attack is basically a wide bandwidth signal and may only be considered stationary for very short periods of time. Since short windows have a greater temporal resolution than large windows, they are able to follow and reproduce with more fidelity the varying pattern of the spectrum. In other words, this is the difference between a more precise local (in time) quantization of the signal and a global (in frequency) quantization of the signal.

The final masking threshold of the stereophonic coder is calculated using a combination of monophonic and stereophonic thresholds. While the monophonic threshold is computed independently for each channel, the stereophonic one considers both channels.

The independent masking threshold for the RIGHT of the LEFT channel is computed using a psychoacoustic model that includes an expression for tone masking noise and noise masking tone. The latter is used as a conservative approximation for a noise masking noise expression. The monophonic threshold is calculated using the same procedure as previous work. In particular, a tonality measure considers the evolution of the power and the phase of each frequency coefficient across the last three segments to identify the signal as being more tone—like or noise—like. Accordingly, each psychoacoustic expression is more of less weighted than the other. These expressions found in the literature were updated for better performance. They are defined as: ${TMN}_{dB} = 19.5 + bark \frac{18.0}{26.0}$ ${NMT}_{dB} = 6.56 - bark \frac{3.06}{26.0}$
where bark is the frequency in Bark scale. The scale is related to what we may call the cochlear filters or critical bands which, in turn, are identified with constant length segments of the basilar membrane. The final threshold is adjusted to consider absolute thresholds of masking and also to consider a partial premasking protection.

A brief description of the complete monophonic threshold calculation follows. Some terminology must be introduced in order to simplify the description of the operations involved.

The spectrum of each segment is organized in three different ways, each one following a different purpose.

1. First, it may be organized in partitions. Each partition has associated one single Bark value. These partitions provide a resolution of approximately either one MDCT line or ⅓ of a critical band, whichever is wider. At low frequencies a single line of the MDCT will constitute a coder partition. At high frequencies, many lines will be combined into one coder partition. In this case the Bark value associated is the median Bark point of the partition. This partitioning of the spectrum is necessary to insure an acceptable resolution for the spreading function. As will be shown later, this function represents the masking influence among neighboring critical bands.

2. Secondly, the spectrum may be organized in bands. Bands are defined by a parameter file. Each band groups a number of spectral lines that are associated with a single scale factor that results from the final masking threshold vector.

3. Finally, the spectrum may also be organized in sections. It will be shown later that sections involve an integer number of bands and represent a region of the spectrum coded with the same Huffman code book.

Three indices for data values are used. These are:

- ω→indicates that the calculation is indexed by frequency in the MDCT line domain.
- b→indicates that the calculation is indexed in the threshold calculation partition domain. In the case where we do a convolution or sum in that domain, bb will be used as the summation variable.
- n→indicates that the calculation is indexed in the coder band domain.

Additionally some symbols are also used:

- 1. The index of the calculation partition, b.
- 2. The lowest frequency line in the partition, ωlow_b.
- 3. The highest frequency line in the partition, ωhigh_b.
- 4. The median bark value of the partition, bval_b.
- 5. The value for tone masking noise (in dB) for the partition, TMN_b.
- 6. The value for noise masking tone (in dB) for the partition, NMT_b.

Several points in the following description refer to the “spreading function”. It is calculated by the following method:
tmpx=1.05(j−i),
Where i is the bark value of the signal being spread, j the bark value of the band being spread into, and tmpx is a temporary variable.
x=8minimum((tmpx−0.5)²−2(tmpx−0.5),0)
Where x is a temporary variable, and minimum(a,b) is a function returning the more negative of a or b.
tmpy=15.811389+7.5(tmpx+0.474)−17.5(1.+(tmpx+0.474)²)^0.5
where tmpy is another temporary variable. $if (tmpy < - 100) then [sprdngf (i, j) = 0] else [sprdngf (i, j) = 10^{\frac{(x + tmpy)}{10.}}] .$
Steps in Threshold Calculation

The following steps are the necessary steps for calculation the SMR_nused in the coder.

- 1. Concatenate 512 new samples of the input signal to from another 1024 samples segment. Please refer to FIG. 5a.
- 2. Calculate the complex spectrum of the input signal using the O-FFT as described in 2.0 and using a sine window.
- 3. Calculate a predicted r and φ.

The polar representation of the transform is calculated r_ω and φ_ω represent the magnitude and phase components of a spectral line of the transformed segment.

A predicted magnitude, {circumflex over (r)}_ω, and phase {circumflex over (φ)}_ω, are calculated from the preceding two threshold calculation blocks' r and φ:
{circumflex over (r)}_ω=2r₁₀₇(t−1)−r_ω(t−2)
{circumflex over (φ)}_ω=2φ₁₀₇(t−1)−φ_ω(t−2)
where t represents the current block number, t−1 indexes the previous block's data, and t−2 indexes the data from the threshold calculation block before that,

- 4. Calculate the unpredictability measure c_{ω cω}, the unpredictability measure, is: $c_{ω} = \frac{{({(r_{ω} \cos ϕ_{ω} - {\hat{r}}_{ω} \cos {\hat{ϕ}}_{ω})}^{2} + {(r_{ω} \sin ϕ_{ω} - {\hat{r}}_{ω} \sin {\hat{ϕ}}_{ω})}^{2})}^{5}}{r_{ω} + abs ({\hat{r}}_{ω})}$
- 5. Calculate the energy and unpredictability in the threshold calculation partitions.

The energy in each partition, e_b, is: $e_{b} = \sum_{ω = {ωlow}_{b}}^{{ωhigh}_{b}} r_{ω}^{2}$
and the weighted unpredictability, c_b, is: $c_{b} = \sum_{ω = {ωlow}_{b}}^{{ωhigh}_{b}} r_{ω}^{2} c_{ω}$

- 6. Convolve the partitioned energy and unpredictability with the spreading function. ${ecb}_{b} = \sum_{bb = 1}^{b \max} e_{bb} sprdngf ({bval}_{bb}, {bval}_{b})$ ${ct}_{b} = \sum_{bb = 1}^{b \max} c_{bb} sprdngf ({bval}_{bb}, {bval}_{b})$

Because ct_bis weighted by the signal energy, it must be renormalized to cb_b. ${cb}_{b} = \frac{{ct}_{b}}{{ecb}_{b}}$
At the same time, due to the non-normalized nature of the spreading function, ecb_bshould be renormalized and the normalized energy en_b, calculated. ${en}_{b} = \frac{{ecb}_{b}}{{rnorm}_{b}}$

The normalization coefficient, rnorm_bis: ${rnorm}_{b} = \frac{1}{\sum_{bb = 0}^{b \max} sprdngf ({bval}_{bb}, {bval}_{b})}$

- 7. Convert cb_bto tb_b.
  tb_b=−0.299−0.43 log_a(cb_b)
- Each tb_bis limited to the range of 0≦tb_b≦1.
- 8. Calculate the required SNR in each partition. ${TMN}_{b} = 19.5 + {bval}_{b} \frac{18.0}{26.0}$ ${NMT}_{b} = 6.56 - {bval}_{b} \frac{3.06}{26.0}$

Where TMN_bis the tone masking noise in dB and NMT_bis the noise masking tone value in dB.

The required signal to noise ratio, SNR_b, is:
SNR_b=tb_bTMN_b+(1−tb_b)NMT_b

- 9. Calculate the power ratio.

The power ratio, bc_b, is: ${bc}_{b} = 10^{\frac{- {SNR}_{b}}{10}}$

- 10. Calculation of actual energy threshold, nb_b.
  nb_b=en_bbc_b
- 11. Spread the threshold energy over MDCT lines, yielding nb_ω ${nb}_{ω} = \frac{{nb}_{b}}{ω {high}_{b} - ω {low}_{b} + 1}$
- 12. Include absolute thresholds, yielding the final energy threshold of audibility, thr_ω
  thr_ωmax(nb_ωabsthr_ω).

The dB values of absthr shown in the “Absolute Threshold Tables” are relative to the level that a sine wave of ±½ lsb has in the MDCT used for threshold calculation. The dB values must be converted into the energy domain after considering the MDCT normalization actually used.

- 13. Pre-echo control
- 14. Calculate the signal to mask ratios, SMR_n.

The table of “Bands of the Coder” shows

- 1. The index, n, of the band.
- 2. The upper index, ωhigh_nof the band n. The lower index, ωlow_n, is computed from the previous band as ωhigh_n-1+1.

To further classify each band, another variable is created. The width index, width_n, will assume a value width_n=1 if n is a perceptually narrow band, and width_n=0 if n is a perceptually wide band. The former case occurs if
bval_ωhigh_b−bval_ωlow_b<bandlength
bandlength is a parameter set in the initialization routine. Otherwise the latter case is assumed.

Then, if (width_n=1), the noise level in the coder band, nband_nis calculated as: ${nband}_{n} = \frac{\sum_{ω = ω {low}_{n}}^{ω {high}_{n}} {thr}_{ω}}{{ωhigh}_{n} - {ωlow}_{n} + 1},$
else,
nband_n=minimum(thr_ωlow_n, . . . ,thr_ωhigh_n)

Where, in this case, minimum(a, . . . ,z) is a function returning the most negative or smallest positive argument of the arguments a . . . z.

The ratios to be sent to the decoder, SMR_n, are calculated as: ${SMR}_{n} = 10 \cdot \log_{10} (\frac{{[12.0 * {nband}_{n}]}^{0.5}}{minimum (absthr)})$

It is important to emphasize that since the tonality measure is the output of a spectrum analysis process, the analysis window has a sine form for all the cases of large or short segments. In particular, when a segment is chosen to be coded as a START or STOP window, its tonality information is obtained considering a sine window; the remaining operations, e.g. the threshold calculation and the quantization of the coefficients, consider the spectrum obtained with the appropriate window.

STEREOPHONIC THRESHOLD

The stereophonic threshold has several goals. It is known that most of the time the two channels sound “alike”. Thus, some correlation exists that may be converted in coding gain. Looking into the temporal representation of the two channels, this correlation is not obvious. However, the spectral representation has a number of interesting features that may advantageously be exploited. In fact, a very practical and useful possibility is to create a new basis to represent the two channels. This basis involves two orthogonal vectors, the vector SUM and the vector DIFFERENCE defined by the following linear combination: $[\begin{matrix} SUM \\ DIF \end{matrix}] = \frac{1}{2} [\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}] \cdot [\begin{matrix} RIGHT \\ LEFT \end{matrix}]$

These vectors, which have the length of the window being used, are generated in the frequency domain since the transform process is by definition a linear operation. This has the advantage of simplifying the computational load.

The first goal is to have a more decorrelated representation of the two signals. The concentration of most of the energy in one of these new channels is a consequence of the redundancy that exists between RIGHT and LEFT channels and on average, leads always to a coding gain.

A second goal is to correlate the quantization noise of the RIGHT and LEFT channels and control the localization of the noise or the unmasking effect. This problem arises if RIGHT and LEFT channels are quantized and coded independently. This concept is exemplified by the following context: supposing that the threshold by masking for a particular signal has been calculated, two situations may be created. First, we add to the signal an amount of noise that corresponds to the threshold. If we present this same signal with this same noise to the two ears then the noise is masked. However, if we add an amount of noise that corresponds to the threshold to the signal and present this combination to one ear; do the same operation for the other ear but with noise uncorrelated with the previous one, then the noise is not masked. In order to achieve masking again, the noise at both ears must be reduced by a level given by the masking level differences (MLD).

The unmasking problem may be generalized to the following form: the quantization noise is not masked if it does not follow the localization of the masking signal. Hence, in particular, we may have two limit cases: center localization of the signal with unmasking more noticeable on the sides of the listener and side localization of the signal with unmasking more noticeable on the center line.

The new vectors SUM and DIFFERENCE are very convenient because they express the signal localized on the center and also on both sides of the listener. Also, they enable to control the quantization noise with center and side image. Thus, the unmasking problem is solved by controlling the protection level for the MLD through these vectors. Based on some psychoacoustic information and other experiments and results, the MLD protection is particularly critical for very low frequencies to about 3 KHz. It appears to depend only on the signal power and not on its tonality properties. The following expression for the MLD proved to give good results: ${MLD}_{dB} (i) = {25.5 [\cos \frac{π b (i)}{32.0}]}^{2}$
where i is the partition index of the spectrum (see [7]), and b(i) is the bark frequency of the center of the partition i. This expression is only valid for b(i)≦16.0 i.e. for frequencies below 3 KHz. The expression for the MLD threshold is given by: ${THR}_{MLD} (i) = C (i) 10^{- \frac{{MLD}_{dB} (i)}{10}}$

C(i) is the spread signal energy on the basilar membrane, corresponding only to the partition i.

A third and last goal is to take advantage of a particular stereophonic signal image to extract irrelevance from directions of the signal that are masked by that image. In principle, this is done only when the stereo image is strongly defined in one direction, in order to not compromise the richness of the stereo signal. Based on the vectors SUM and DIFFERENCE, this goal is implemented by positioning the following two dual principles:

- 1. If there is a strong depression of the signal (and hence of the noise) on both sides of the listener, then an increase of the noise on the middle line (center image) is perceptually tolerated. The upper bound is the side noise.
- 2. If there is a strong localization of the signal (and hence of the noise) on the middle line, then an increase of the (correlated) noise on both sides is perceptually tolerated. The upper bound is the center noise.

However, any increase of the noise level must be corrected by the MLD threshold.

According to these goals, the final stereophonic threshold is computed as follows. First, the thresholds for channels SUM and DIFFERENCE are calculated using the monophonic models for noise-masking-tone and tone-masking-noise. The procedure is exactly the one presented in pages 25 and 26. At this point we have the actual energy threshold per band, nb_bfor both channels. By convenience, we call then THRn_SUMand THRn_DIF, respectively for the channel SUM and the channel DIFFERENCE.

Secondly, the MLD threshold for both channels i.e. THRn_MLD,SUMand THRn_MLD,DIF, are also calculated by: ${THRn}_{MLD, SUM} = {en}_{b, SUM} 10^{- \frac{{MLDn}_{dB}}{10}}$ ${THRn}_{MLD, DIF} = {en}_{b, DIF} 10^{- \frac{{MLDn}_{dB}}{10}}$
The MLD protection and the stereo irrelevance are considered by computing:
nthr_SUM=MAX[THRn_SUM, MIN(THRn_DIF, THRn_MLD,DIF)]
nthr_DIF=MAX[THRn_DIF, MIN(THRn_SUM, THRn_MLD,SUM)]

After these operations, the remaining steps after the 11th, as presented in 3.2 are also taken for both channels. In essence, these last thresholds are further adjusted to consider the absolute threshold and also a partial premasking protection. It must be noticed that this premasking protection was simply adopted from the monophonic case. It considers a monaural time resolution of about 2 milliseconds. However, the binaural time resolution is as accurate as 6 microseconds! To conveniently code stereo signals with relevant stereo image based on interchannel time differences, is a subject that needs further investigation.

STEREOPHONIC CODER

The simplified structure of the stereophonic coder allows for the encoding of the stereo signals which are subsequently decoded by the stereophonic decoder which, is presented in FIG. 12. For each segment of data being analysed, detailed information about the independent and relative behavior of both signal channels may be available through the information given by large and short transforms. This information is used according to the necessary number of steps needed to code a particular segment. These steps involve essentially the selection of the analysis window, the definition on a band basis of the coding mode (R/L or S/D), the quantization (704) and Huffman coding (705) of the coefficients (708) and scale factors (707) and finally, the bitstream composing (706) with a bit stream organization as depicted in FIG. 10.

Coding Mode Selection

When a new segment is read, the tonality updating for large and short analysis windows is done. Monophonic thresholds and the PE values are calculated according to the technique described previously. This gives the first decision about the type of window to be used for both channels.

Once the window sequence is chosen, an orthogonal coding decision is then considered. It involves the choice between independent coding of the channels, mode RIGHT/LEFT (R/L) or joint coding using the SUM and DIFFERENCE channels (S/D). This decision is taken on a band basis of the coder. This is based on the assumption that the binaural perception is a function of the output of the same critical bands at the two ears. If the threshold at the two channels is very different, then there is no need for MLD protection and the signals will not be more decorrelated if the channels SUM and DIFFERENCE are considered. If the signals are such that they generate a stereo image, then a MLD protection must be activated and additional gains may be exploited by choosing the S/D coding mode. A convenient way to detect this latter situation is by comparing the monophonic threshold between RIGHT and LEFT channels. If the thresholds in a particular band do not differ by more than a predefined value, e.g. 2 dB, then the S/D coding mode is chosen. Otherwise the independent mode R/L is assumed. Associated which each band is a one bit flag that specifies the coding mode of that band and that must be transmitted to the decoder as side information. From now on it is called a coding mode flag.

The coding mode decision is adaptive in time since for the same band it may differ for subsequent segments, and is also adaptive in frequency since for the same segment, the coding mode for subsequent bands may be different. An illustration of a coding decision is given in FIG. 13. This illustration is valid for long and also short segments.

At this point it is clear that since the window switching mechanism involves only monphonic measures, the maximum number of PE measures per segment is 10 (2 channels *[1 large window+4 short windows]). However, the maximum number of thresholds that we may need to compute per segment is 20 and therefore 20 tonality measures must be always updated per segment (4 channels *[1 large window+4 short windows]).

Bitrate Adjustment

It was previously said that the decisions for window switching and for coding mode selection are orthogonal in the sense that they do not depend on each other. Independent to these decisions is also the final step of the coding process that involves quantization, Huffman coding and bitstream composing: i.e. there is no feedback path. This fact has the advantage of reducing the whole coding delay to a minimum value (1024/48000=21.3 milliseconds) and also to avoid instabilities due to unorthodox coding situations.

The quantization process effects both spectral and coefficients and scale factors. Spectral coefficients are clustered in bands, each band having the same step size or scale factor. Each step size is directly computed from the masking threshold corresponding to its band. The quantized values, which are integer numbers, are then converted to variable word length of Huffman codes. The total number of bits to code the segment, considering additional fields of the bitstream, is computed. Since the bitrate must be kept constant, the quantization process must be iteratively done till that number of bits is within predefined limits. After the number of bits needed to code the whole segment, considering the basic masking threshold, the degree of adjustment is dictated by a buffer control unit. This control unit shares the deficit or credit of additional bits among several segments, according to the needs of each one.

The technique of the bitrate adjustment routine is represented by the flowchart of FIG. 9. It may be seen that after the total number of available bits to be used by the current segment is computed, an iterative procedure tries to find a factor α such that if all the initial thresholds are multiplied by this factor, the final total number of bits is smaller then and within an error δ of the available number of bits. Even if the approximation curve is so hostile that α is not found within the maximum number of iterations, one acceptable solution is always available.

The main steps of this routine are depicted in FIG. 7 and FIG. 9 as follows. First, an interval including the solution is found. Then, a loop seeks to rapidly converge to the best solution. At each iteration, the best solution is updated. Thus, the total number of bits to represent the present whole segment (710) using the basic masking threshold is evaluated. Next, the total number of bits available to be used by the current segment is computed based on the current buffer status from the buffer control (703). A comparison (903) is made between the total number of bits available in the buffer and the calculated total number of bits to represent the current whole segment. If the required number of bits is less than the available number of bits in the buffer, a further comparison is made to determine if the final total number of bits required is within an error factor of the available number of bits (904). If within the error factor, the total number of bits required to represent the current whole segment are transmitted (916) to the entropy encoder (208). If not within the error factor, an evaluation is done based upon the number of bits required to represent the whole segment at the absolute threshold values (905). If the required number of bits to represent the whole segment at the absolute threshold values are less than the total number of bits available (906) they are transmitted (916) to the entropy encoder (208).

If at this point, neither the basic masking threshold nor absolute thresholds have provided an acceptable bit representation of the whole segment, an iterative procedure (as shown in 907 through 915) is employed to establish the interpolation factor used as a multiplier and discussed previously. If successful, the iterative procedure will establish a bit representation of the whole segment which is within the buffer limit and associated error factor. Otherwise, after reaching a maximum number of iterations (908) the iterative process will return the last best approximation (915) of the whole segment as output (916).

In order to use the same procedure for segments coded with large and short windows, in this latter case, the coefficients of the 4 short windows are clustered by concatenating homologue bands. Scale factors are clustered in the same.

The bitrate adjustment routine (704) calls another routine that computes the total number of bits to represent all the Huffman coded words (705) (coefficients and scale factors). This latter routine does a spectrum partioning according to the amplitude distribution of the coefficients. The goal is to assign predefined Huffman code books to sections of the spectrum. Each section groups a variable number of bands and its coefficients are Huffman coded with a convenient book. The limits of the section and the reference of the code book must be sent to the decoder as side information.

The spectrum partioning is done using a minimum cost strategy. The main steps are as follows. First, all possible sections are defined -the limit is one section per band- each one having the code book that best matches the amplitude distribution of the coefficients within that section. As the beginning and the end of the whole spectrum is known, if K is the number of sections, there are K−1 separators between sections. The price to eliminate each separator is computed. The separator that has a lower price is eliminated (initial prices may be negative). Prices are compared again before the next iteration. This process is repeated till a maximum allowable number of sections is obtained and the smallest price to eliminate another separator is higher than a predefined value.

Aspects of the processing accomplished by quantizer/rate-loop 206 in FIG. 2 will now be presented. In the prior art, rate-loop mechanisms have contained assumptions related to the monophonic case. With the shift from monophonic to stereophonic perceptual coders, the demands placed upon the rate-loop are increased.

The inputs to quantizer/rate-loop 206 in FIG. 2 comprise spectral coefficients (i.e., the MDCT coefficients) derived by analysis filter bank 202, and outputs of perceptual model 204, including calculated thresholds corresponding to the spectral coefficients.

Quantizer/rate-loop 206 quantizes the spectral information based, in part, on the calculated thresholds and the absolute thresholds of hearing and in doing so provides a bitstream to entropy encoder 208. The bitstream includes signals divided into three part: (1) a first part containing the standardized side information; (2) a second part containing the scaling factors for the 35 or 56 bands and additional side information used for so-called adaptive-window switching, when used (the length of this part can vary depending on information in the first part) and (3) a third part comprising the quantized spectral coefficients.

A “utilized scale factor”, Δ, is iteratively derived by interpolating between a calculated scale factor and a scale factor derived from the absolute threshold of hearing at the frequency corresponding to the frequency of the respective spectral coefficient to be quantized until the quantized spectral coefficients can be encoded within permissible limits.

An illustrative embodiment of the present invention can be seen in FIG. 13. As shown at 1301 quantizer/rate-loop receives a spectral coefficient, C_y, and an energy threshold, E, corresponding to that spectral coefficient. A “threshold scale factor”, Δ_ois calculated by $Δ_{0} = \sqrt{12 E}$
An “absolute scale factor”, Δ_A, is also calculated based upon the absolute threshold of hearing (i.e., the quietest sound that can be heard at the frequency corresponding to the scale factor). Advantageously, an interpolation constant, α, and interpolation bounds α_highand α_loware initialized to aid in the adjustment of the utilized scale factor.

- α_high=1
- α_low=0
- α=α_high

Next, as shown in 1305, the utilized scale factor is determined from:
Δ=Δ_o^α×Δ_A^(1-alpha)

Next, as shown in 1307, the utilized scale factor is itself quantized because the utilized scale factor as computed above is not discrete but is advantageously discrete when transmitted and used.
Δ=Q⁻¹(Q(Δ))

Next, as shown in 1309, the spectral coefficient is quantized using the utilized scale factor to create a “quantized spectral coefficient” Q (C_y, Δ). $Q (C_{f}, Δ) = NINT (\frac{C_{f}}{Δ})$
where “NINT” is the nearest integer function. Because quantizer/rate loop 206 must transmit both the quantized spectral coefficient and the utilized scale factor, a cost, C, is calculated which is associated with how many bits it will take to transmit them both. As shown in FIG. 1311,
C=FOO(Q(C_y, Δ), Q(Δ))
where FOO is a function which, depending on the specific embodiment, can be easily determined by persons having ordinary skill in the art of data communications. As shown in 1313, the cost, C is tested to determine whether it is in a permissible range PR. When the cost is within the permissible range, Q (C_y, Δ) and Q(Δ) are transmitted to entropy coder 208.

Advantageously, and depending on the relationship of the cost C to the permissible range PR the interpolation constant and bounds are adjusted until the utilized scale factor yields a quantized spectral coefficient which has a cost within the permissible range. Illustratively, as shown in FIG. 13 at 1313, the interpolation bounds are manipulated to produce a binary search. Specifically,
when C>PR, α_high=α,
alternately,
when C<PR, α_low=α.
In either case, a new interpolation constant is calculated by: $α = \frac{α_{low} + α_{high}}{2}$
The process then continues at 1305 iteratively until the C comes within the permissible range PR.
STEREOPHONIC DECODER

The stereophonic decoder has a very simple structure as shown in FIG. 12. Its main functions are reading the incoming bitstream (1202), decoding all the data (1203), inverse quantization and reconstruction of RIGHT and LEFT channels (1204). The technique is represented in FIG. 12. Thus, the decoder is performing complementary operations to that of the encoder depicted in FIG. 7 such as operations that are complementary to quantization (704) and Huffman coding (705).

Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, and software performing the operations discussed below of the present invention. Very large scale integration (VLSI) hardware embodiments of the present invention, as well as hybrid DSP/VLSI embodiments, may also be provided. For example, an AT&T DSP16 may be employed to perform the operations of the rate loop processor depicted in FIG. 13. The DSP could receive the spectral coefficients and energy thresholds (1301) and perform the calculation of blocks 1303 and 1305 as described on page 31. Further, the DSP could calculate the utilized scale factor according to the equation given on page 32 and depicted in block 1305. The quantization blocks 1307 and 1308 can be carried out as described on page 32. Finally, the DSP may perform the cost calculation (1311) and comparison (1313) associated with quantization. The cost calculation is described on page 32 and illustrated further in FIG. 9. In this way, the interpolation factor may be adjusted (1315) according to the analysis carried out within the DSP or similar type hardware embodiments. It is to be understood that the above-described embodiments is merely illustrative of the principles of this invention. Other arrangements may be derived by those skilled in the art without departing from the spirit and scope of the invention.

INVENTORS:

Johnston, James David

THIS PATENT IS REFERENCED BY THESE PATENTS:

Patent	Priority	Assignee	Title
10059000,	Nov 25 2008	TELADOC HEALTH, INC	Server connectivity control for a tele-presence robot
10061896,	May 22 2012	TELADOC HEALTH, INC	Graphical user interfaces including touchpad driving interfaces for telemedicine devices
10218748,	Dec 03 2010	TELADOC HEALTH, INC	Systems and methods for dynamic bandwidth allocation
10241507,	Jul 13 2004	TELADOC HEALTH, INC	Mobile robot with a head-based movement mapping scheme
10259119,	Sep 30 2005	TELADOC HEALTH, INC	Multi-camera mobile teleconferencing platform
10315312,	Jul 25 2002	TELADOC HEALTH, INC	Medical tele-robotic system with a master remote station with an arbitrator
10328576,	May 22 2012	TELADOC HEALTH, INC	Social behavior rules for a medical telepresence robot
10331323,	Nov 08 2011	TELADOC HEALTH, INC	Tele-presence system with a user interface that displays different communication links
10334205,	Nov 26 2012	TELADOC HEALTH, INC	Enhanced video interaction for a user interface of a telepresence network
10343283,	May 24 2010	TELADOC HEALTH, INC	Telepresence robot system that can be accessed by a cellular phone
10399223,	Jan 28 2011	TELADOC HEALTH, INC	Interfacing with a mobile telepresence robot
10404939,	Aug 26 2009	TELADOC HEALTH, INC	Portable remote presence robot
10471588,	Apr 14 2008	TELADOC HEALTH, INC	Robotic based health care system
10493631,	Jul 10 2008	TELADOC HEALTH, INC	Docking system for a tele-presence robot
10591921,	Jan 28 2011	TELADOC HEALTH, INC	Time-dependent navigation of telepresence robots
10603792,	May 22 2012	TELADOC HEALTH, INC	Clinical workflows utilizing autonomous and semiautonomous telemedicine devices
10658083,	May 22 2012	TELADOC HEALTH, INC	Graphical user interfaces including touchpad driving interfaces for telemedicine devices
10682763,	May 09 2007	TELADOC HEALTH, INC	Robot system that operates through a network firewall
10762170,	Apr 11 2012	TELADOC HEALTH, INC	Systems and methods for visualizing patient and telepresence device statistics in a healthcare network
10769739,	Apr 25 2011	TELADOC HEALTH, INC	Systems and methods for management of information among medical providers and facilities
10780582,	May 22 2012	TELADOC HEALTH, INC	Social behavior rules for a medical telepresence robot
10808882,	May 26 2010	TELADOC HEALTH, INC	Tele-robotic system with a robot face placed on a chair
10875182,	Mar 20 2008	TELADOC HEALTH, INC	Remote presence system mounted to operating room hardware
10875183,	Nov 25 2008	TELADOC HEALTH, INC	Server connectivity control for tele-presence robot
10878960,	Jul 11 2008	TELADOC HEALTH, INC	Tele-presence robot system with multi-cast features
10882190,	Dec 09 2003	TELADOC HEALTH, INC	Protocol for a remotely controlled videoconferencing robot
10887545,	Mar 04 2010	TELADOC HEALTH, INC	Remote presence system including a cart that supports a robot face and an overhead camera
10892052,	May 22 2012	TELADOC HEALTH, INC	Graphical user interfaces including touchpad driving interfaces for telemedicine devices
10911715,	Aug 26 2009	TELADOC HEALTH, INC	Portable remote presence robot
10924708,	Nov 26 2012	TELADOC HEALTH, INC	Enhanced video interaction for a user interface of a telepresence network
10969766,	Apr 17 2009	TELADOC HEALTH, INC	Tele-presence robot system with software modularity, projector and laser pointer
11154981,	Feb 04 2010	TELADOC HEALTH, INC	Robot user interface for telepresence robot system
11205510,	Apr 11 2012	TELADOC HEALTH, INC	Systems and methods for visualizing and managing telepresence devices in healthcare networks
11289192,	Jan 28 2011	INTOUCH TECHNOLOGIES, INC.; iRobot Corporation	Interfacing with a mobile telepresence robot
11389064,	Apr 27 2018	TELADOC HEALTH, INC	Telehealth cart that supports a removable tablet with seamless audio/video switching
11389962,	May 24 2010	TELADOC HEALTH, INC	Telepresence robot system that can be accessed by a cellular phone
11399153,	Aug 26 2009	TELADOC HEALTH, INC	Portable telepresence apparatus
11453126,	May 22 2012	TELADOC HEALTH, INC	Clinical workflows utilizing autonomous and semi-autonomous telemedicine devices
11468983,	Jan 28 2011	TELADOC HEALTH, INC	Time-dependent navigation of telepresence robots
11472021,	Apr 14 2008	TELADOC HEALTH, INC.	Robotic based health care system
11515049,	May 22 2012	TELADOC HEALTH, INC.; iRobot Corporation	Graphical user interfaces including touchpad driving interfaces for telemedicine devices
11628571,	May 22 2012	TELADOC HEALTH, INC.; iRobot Corporation	Social behavior rules for a medical telepresence robot
11636944,	Aug 25 2017	TELADOC HEALTH, INC	Connectivity infrastructure for a telehealth platform
11742094,	Jul 25 2017	TELADOC HEALTH, INC.	Modular telehealth cart with thermal imaging and touch screen user interface
11787060,	Mar 20 2008	TELADOC HEALTH, INC.	Remote presence system mounted to operating room hardware
11798683,	Mar 04 2010	TELADOC HEALTH, INC.	Remote presence system including a cart that supports a robot face and an overhead camera
11862302,	Apr 24 2017	TELADOC HEALTH, INC	Automated transcription and documentation of tele-health encounters
11910128,	Nov 26 2012	TELADOC HEALTH, INC.	Enhanced video interaction for a user interface of a telepresence network
12093036,	Jan 21 2011	TELADOC HEALTH, INC	Telerobotic system with a dual application screen presentation
12138808,	Nov 25 2008	TELADOC HEALTH, INC.	Server connectivity control for tele-presence robots
7650277,	Jan 23 2003	Ittiam Systems (P) Ltd.	System, method, and apparatus for fast quantization in perceptual audio coders
7725323,	Sep 15 2003	STMICROELECTRONICS INTERNATIONAL N V	Device and process for encoding audio data
7769492,	Feb 22 2006	TELADOC HEALTH, INC	Graphical interface for a remote presence system
7813836,	Dec 09 2003	TELADOC HEALTH, INC	Protocol for a remotely controlled videoconferencing robot
8836751,	Nov 08 2011	TELADOC HEALTH, INC	Tele-presence system with a user interface that displays different communication links
8849679,	Jun 15 2006	TELADOC HEALTH, INC	Remote controlled robot system that provides medical images
8849680,	Jan 29 2009	TELADOC HEALTH, INC	Documentation through a remote presence robot
8897920,	Apr 17 2009	TELADOC HEALTH, INC	Tele-presence robot system with software modularity, projector and laser pointer
8902278,	Apr 11 2012	TELADOC HEALTH, INC	Systems and methods for visualizing and managing telepresence devices in healthcare networks
8965579,	Jan 28 2011	TELADOC HEALTH, INC	Interfacing with a mobile telepresence robot
8983174,	Mar 27 2009	TELADOC HEALTH, INC	Mobile robot with a head-based movement mapping scheme
8996165,	Oct 21 2008	TELADOC HEALTH, INC	Telepresence robot with a camera boom
9089972,	Mar 04 2010	TELADOC HEALTH, INC	Remote presence system including a cart that supports a robot face and an overhead camera
9098611,	Nov 26 2012	TELADOC HEALTH, INC	Enhanced video interaction for a user interface of a telepresence network
9138891,	Nov 25 2008	TELADOC HEALTH, INC	Server connectivity control for tele-presence robot
9160783,	May 09 2007	TELADOC HEALTH, INC	Robot system that operates through a network firewall
9174342,	May 22 2012	TELADOC HEALTH, INC	Social behavior rules for a medical telepresence robot
9185487,	Jun 30 2008	Knowles Electronics, LLC	System and method for providing noise suppression utilizing null processing noise subtraction
9193065,	Jul 10 2008	TELADOC HEALTH, INC	Docking system for a tele-presence robot
9198728,	Sep 30 2005	TELADOC HEALTH, INC	Multi-camera mobile teleconferencing platform
9251313,	Apr 11 2012	TELADOC HEALTH, INC	Systems and methods for visualizing and managing telepresence devices in healthcare networks
9264664,	Dec 03 2010	TELADOC HEALTH, INC	Systems and methods for dynamic bandwidth allocation
9323250,	Jan 28 2011	TELADOC HEALTH, INC	Time-dependent navigation of telepresence robots
9361021,	May 22 2012	TELADOC HEALTH, INC	Graphical user interfaces including touchpad driving interfaces for telemedicine devices
9375843,	Dec 09 2003	TELADOC HEALTH, INC	Protocol for a remotely controlled videoconferencing robot
9381654,	Nov 25 2008	TELADOC HEALTH, INC	Server connectivity control for tele-presence robot
9429934,	Sep 18 2008	TELADOC HEALTH, INC	Mobile videoconferencing robot system with network adaptive driving
9469030,	Jan 28 2011	TELADOC HEALTH, INC	Interfacing with a mobile telepresence robot
9558755,	May 20 2010	SAMSUNG ELECTRONICS CO , LTD	Noise suppression assisted automatic speech recognition
9602765,	Aug 26 2009	TELADOC HEALTH, INC	Portable remote presence robot
9616576,	Apr 17 2008	TELADOC HEALTH, INC	Mobile tele-presence system with a microphone system
9640194,	Oct 04 2012	SAMSUNG ELECTRONICS CO , LTD	Noise suppression for speech processing based on machine-learning mask estimation
9668048,	Jan 30 2015	SAMSUNG ELECTRONICS CO , LTD	Contextual switching of microphones
9699554,	Apr 21 2010	SAMSUNG ELECTRONICS CO , LTD	Adaptive signal equalization
9715337,	Nov 08 2011	TELADOC HEALTH, INC	Tele-presence system with a user interface that displays different communication links
9766624,	Jul 13 2004	TELADOC HEALTH, INC	Mobile robot with a head-based movement mapping scheme
9776327,	May 22 2012	TELADOC HEALTH, INC	Social behavior rules for a medical telepresence robot
9785149,	Jan 28 2011	TELADOC HEALTH, INC	Time-dependent navigation of telepresence robots
9799330,	Aug 28 2014	SAMSUNG ELECTRONICS CO , LTD	Multi-sourced noise suppression
9838784,	Dec 02 2009	SAMSUNG ELECTRONICS CO , LTD	Directional audio capture
9842192,	Jul 11 2008	TELADOC HEALTH, INC	Tele-presence robot system with multi-cast features
9849593,	Jul 25 2002	TELADOC HEALTH, INC	Medical tele-robotic system with a master remote station with an arbitrator
9956690,	Dec 09 2003	TELADOC HEALTH, INC	Protocol for a remotely controlled videoconferencing robot
9974612,	May 19 2011	TELADOC HEALTH, INC	Enhanced diagnostics for a telepresence robot
9978388,	Sep 12 2014	SAMSUNG ELECTRONICS CO , LTD	Systems and methods for restoration of speech components

THIS PATENT REFERENCES THESE PATENTS:

Patent	Priority	Assignee	Title
3989897,	Oct 25 1974		Method and apparatus for reducing noise content in audio signals
4216354,	Dec 23 1977	International Business Machines Corporation	Process for compressing data relative to voice signals and device applying said process
4349698,	Jun 19 1979	Victor Company of Japan, Limited	Audio signal translation with no delay elements
4356349,	Mar 12 1980	Trod Nossel Recording Studios, Inc.	Acoustic image enhancing method and apparatus
4516258,	Jun 30 1982	AT&T Bell Laboratories	Bit allocation generator for adaptive transform coder
4535472,	Nov 05 1982	AT&T Bell Laboratories	Adaptive bit allocator
4646061,	Mar 13 1985	RACAL-DATACOM, INC	Data communication with modified Huffman coding
4790016,	Nov 14 1985	Verizon Laboratories Inc	Adaptive method and apparatus for coding speech
4803727,	Nov 24 1986	British Telecommunications	Transmission system
4821260,	Dec 17 1986	Deutsche Thomson-Brandt GmbH	Transmission system
4860313,	Sep 21 1986	ECI Telecom Ltd	Adaptive differential pulse code modulation (ADPCM) systems
4860360,	Apr 06 1987	Verizon Laboratories Inc	Method of evaluating speech
4881267,	May 14 1987	NEC Corporation	Encoder of a multi-pulse type capable of optimizing the number of excitation pulses and quantization level
4896362,	Apr 27 1987	U S PHILIPS CORPORATION	System for subband coding of a digital audio signal
4912763,	Oct 30 1986	International Business Machines Corporation	Process for multirate encoding signals and device for implementing said process
4914701,	Dec 20 1984	Verizon Laboratories Inc	Method and apparatus for encoding speech
4941152,	Sep 03 1985	International Business Machines Corp.	Signal coding process and system for implementing said process
4945567,	Mar 06 1984	NEC Corporation	Method and apparatus for speech-band signal coding
4949383,	Aug 24 1984	Bristish Telecommunications public limited company	Frequency domain speech coding
4953214,	Jul 21 1987	Matushita Electric Industrial Co., Ltd.	Signal encoding and decoding method and device
4972484,	Nov 21 1986	Bayerische Rundfunkwerbung GmbH	Method of transmitting or storing masked sub-band coded audio signals
5014318,	Feb 25 1989	Fraunhofer Gesellschaft zur Forderung der angewandten Forschung e. V.	Apparatus for checking audio signal processing systems
5040217,	Oct 18 1989	AMERICAN TELEPHONE AND TELEGRAPH COMPANY, A CORP OF NY	Perceptual coding of audio signals
5079547,	Feb 28 1990	Victor Company of Japan, Ltd.	Method of orthogonal transform coding/decoding
5109417,	Jan 27 1989	Dolby Laboratories Licensing Corporation	Low bit rate transform coder, decoder, and encoder/decoder for high-quality audio
5151941,	Sep 30 1989	Sony Corporation	Digital signal encoding apparatus
5185800,	Oct 13 1989	Centre National d'Etudes des Telecommunications	Bit allocation device for transformed digital audio broadcasting signals with adaptive quantization based on psychoauditive criterion
5218435,	Feb 20 1991	Massachusetts Institute of Technology	Digital advanced television systems
5227788,	Mar 02 1992	AMERICAN TELEPHONE AND TELEGRAPH COMPANY, A CORPORATION OF NY	Method and apparatus for two-component signal compression
5230038,	Jan 27 1989		Low bit rate transform coder, decoder, and encoder/decoder for high-quality audio
5235671,	Oct 15 1990	Verizon Laboratories Inc	Dynamic bit allocation subband excited transform coding method and apparatus
5274740,	Jan 08 1991	DOLBY LABORATORIES LICENSING CORPORATION A CORP OF NY	Decoder for variable number of channel presentation of multidimensional sound fields
5285498,	Mar 02 1992	AT&T IPM Corp	Method and apparatus for coding audio signals based on perceptual model
5297236,	Jan 27 1989	DOLBY LABORATORIES LICENSING CORPORATION A CORP OF CA	Low computational-complexity digital filter bank for encoder, decoder, and encoder/decoder
5341457,	Dec 30 1988	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Perceptual coding of audio signals
5357594,	Jan 27 1989	Dolby Laboratories Licensing Corporation	Encoding and decoding using specially designed pairs of analysis and synthesis windows
5394473,	Apr 12 1990	Dolby Laboratories Licensing Corporation	Adaptive-block-length, adaptive-transforn, and adaptive-window transform coder, decoder, and encoder/decoder for high-quality audio
5400433,	Jan 08 1991	Dolby Laboratories Licensing Corporation	Decoder for variable-number of channel presentation of multidimensional sound fields
5479562,	Jan 27 1989	Dolby Laboratories Licensing Corporation	Method and apparatus for encoding and decoding audio information
5535300,	Dec 30 1988	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Perceptual coding of audio signals using entropy coding and/or multiple power spectra
5583962,	Jan 08 1992	Dolby Laboratories Licensing Corporation	Encoder/decoder for multidimensional sound fields
5592584,	Mar 02 1992	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Method and apparatus for two-component signal compression
5627938,	Mar 02 1992	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Rate loop processor for perceptual encoder/decoder
5633981,	Jan 08 1991	Dolby Laboratories Licensing Corporation	Method and apparatus for adjusting dynamic range and gain in an encoder/decoder for multidimensional sound fields
5752225,	Jan 27 1989	Dolby Laboratories Licensing Corporation	Method and apparatus for split-band encoding and split-band decoding of audio information using adaptive bit allocation to adjacent subbands
5924060,	Aug 29 1986		Digital coding process for transmission or storage of acoustical signals by transforming of scanning values into spectral coefficients
EP193143,
EP376553,
EP446037,
EP559383,
JP1500695,
JP2792853,
JP2796673,
JP637023,
RE36714,	Aug 13 1993	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Perceptual coding of audio signals

ASSIGNMENT RECORDS Assignment records on the USPTO

/////

Executed on	Assignor	Assignee	Conveyance	Frame	Reel	Doc
Aug 13 2002		Lucent Technologies Inc.	(assignment on the face of the patent)
May 28 2003	Lucent Technologies Inc	JPMorgan Chase Bank, as Collateral Agent	SECURITY AGREEMENT	014416	0873	pdf
Nov 30 2006	JPMORGAN CHASE BANK, N A FORMERLY KNOWN AS THE CHASE MANHATTAN BANK , AS ADMINISTRATIVE AGENT	Lucent Technologies Inc	TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS	018597	0051	pdf
Jan 30 2013	Alcatel-Lucent USA Inc	CREDIT SUISSE AG	SECURITY INTEREST SEE DOCUMENT FOR DETAILS	030510	0627	pdf
Aug 19 2014	CREDIT SUISSE AG	Alcatel-Lucent USA Inc	RELEASE BY SECURED PARTY SEE DOCUMENT FOR DETAILS	033950	0001	pdf

MAINTENANCE FEES AND DATES: Maintenance records on the USPTO

Date	Maintenance Fee Events
Jun 13 2007	ASPN: Payor Number Assigned.

Date	Maintenance Schedule
Apr 25 2009	4 years fee payment window open
Oct 25 2009	6 months grace period start (w surcharge)
Apr 25 2010	patent expiry (for year 4)
Apr 25 2012	2 years to revive unintentionally abandoned end. (for year 4)
Apr 25 2013	8 years fee payment window open
Oct 25 2013	6 months grace period start (w surcharge)
Apr 25 2014	patent expiry (for year 8)
Apr 25 2016	2 years to revive unintentionally abandoned end. (for year 8)
Apr 25 2017	12 years fee payment window open
Oct 25 2017	6 months grace period start (w surcharge)
Apr 25 2018	patent expiry (for year 12)
Apr 25 2020	2 years to revive unintentionally abandoned end. (for year 12)