A method of speech encoding comprises generating a first synthesized speech signal from a first excitation signal, weighting the first synthesized speech signal using a first error weighting filter to generate a first weighted speech signal, generating a second synthesized speech signal from a second excitation signal, weighting the second synthesized speech signal using a second error weighting filter to generate a second weighted speech signal, and generating an error signal using the first weighted speech signal and the second weighted speech signal, wherein the first error weighting filter is different from the second error weighting filter. The method may further generate the error signal by weighting the speech signal using a third error weighting filter to generate a third weighted speech signal, and subtracting the first weighted speech signal and the second weighted speech signal from the third weighted speech signal to generate the error signal.
29. A device comprising:
a speech encoder configured to:
perform a first error weighting of the speech signal to generate a weighted speech signal;
generate a first synthesized speech signal from a first excitation signal;
perform a second error weighting of the first synthesized speech signal to generate a first weighted synthesized speech signal;
generate a second synthesized speech signal from a second excitation signal;
perform a third error weighting on the second synthesized speech signal to generate a second weighted synthesized speech signal; and
wherein the third error weighting is different from the second error weighting.
1. A method of speech encoding comprising:
generating a first synthesized speech signal from a first excitation signal;
weighting said first synthesized speech signal using a first error weighting filter to generate a first weighted speech signal;
generating a second synthesized speech signal from a second excitation signal;
weighting said second synthesized speech signal using a second error weighting filter to generate a second weighted speech signal; and
generating an error signal using said first weighted speech signal and said second weighted speech signal;
wherein said first error weighting filter is different from said second error weighting filter.
12. A speech encoder comprising:
means for generating a first synthesized speech signal from a first excitation signal;
means for weighting said first synthesized speech signal to generate a first weighted speech signal;
means for generating a second synthesized speech signal from a second excitation signal;
means for weighting said second synthesized speech signal to generate a second weighted speech signal; and
means for generating an error signal using said first weighted speech signal and said second weighted speech signal;
wherein said means for weighting said first synthesized speech signal is different from said means for weighting said second synthesized speech signal.
18. A method of encoding a speech signal for use by a device, the method comprising:
performing, by the device, a first error weighting of the speech signal to generate a weighted speech signal;
generating, by the device, a first synthesized speech signal from a first excitation signal;
performing, by the device, a second error weighting of the first synthesized speech signal to generate a first weighted synthesized speech signal;
generating, by the device, a second synthesized speech signal from a second excitation signal;
performing, by the device, a third error weighting on the second synthesized speech signal to generate a second weighted synthesized speech signal; and
wherein the third error weighting is different from the second error weighting.
7. A speech encoder comprising:
a first codebook;
a second codebook;
a speech synthesizer configured to generate a first synthesized speech signal from a first excitation signal of said first codebook and to generate a second synthesized speech signal from a second excitation signal of said second codebook;
a first error weighting filter configured to generate a first weighted speech signal from said first synthesized speech signal;
a second error weighting filter configured to generate a second weighted speech signal from said second synthesized speech signal; and
an error signal generator configured to generate an error signal using said first weighted speech signal and said second weighted speech signal;
wherein said first error weighting filter is different from said second error weighting filter.
2. The method of claim 1 further comprising:
weighting said speech signal using a third error weighting filter to generate a third weighted speech signal; and
subtracting said first weighted speech signal and said second weighted speech signal from said third weighted speech signal to generate said error signal.
3. The method of
4. The method of claim 3 further comprising:
using said error signal to independently select a third excitation signal from said first codebook and a fourth excitation signal from said second codebook; and
using said error signal to independently select a third gain to apply to said third excitation signal and a fourth gain to apply to said fourth excitation signal.
5. The method of
8. The speech encoder of
9. The speech encoder of
10. The speech encoder of
11. The speech encoder of
13. The speech encoder of claim 12 further comprising:
means for weighting said speech signal to generate a third weighted speech signal; and
means for subtracting said first weighted speech signal and said second weighted speech signal from said third weighted speech signal to generate said error signal.
14. The speech encoder of
15. The speech encoder of
16. The speech encoder of
17. The speech encoder of
19. The method of claim 18 further comprising:
generating a first error signal as a difference between the weighted speech signal and the first weighted synthesized speech signal.
20. The method of claim 19, wherein the first excitation signal is from the first codebook and the second excitation signal is from a second codebook, the method further comprising:
using the first error signal to independently select a third excitation signal from the first codebook and a fourth excitation signal from the second codebook; and
using the first error signal to independently select a third gain to apply to the third excitation signal and a fourth gain to apply to the fourth excitation signal.
21. The method of claim 18, wherein the device is a communication device.
22. The method of claim 18, wherein the device is one of a telephone, a mobile phone, a cordless phone, a digital answering machine or a personal digital assistant.
23. The speech encoder of claim 7, wherein the speech encoder is included in a device.
24. The speech encoder of claim 23, wherein the device is a communication device.
25. The speech encoder of claim 23, wherein the device is one of a telephone, a mobile phone, a cordless phone, a digital answering machine or a personal digital assistant.
26. The method of claim 1, wherein the method of speech encoding is performed by a device.
27. The method of claim 26, wherein the device is a communication device.
28. The method of claim 26, wherein the device is one of a telephone, a mobile phone, a cordless phone, a digital answering machine or a personal digital assistant.
30. The device of claim 29, wherein the speech encoder is further configured to generate a first error signal as a difference between the weighted speech signal and the first weighted synthesized speech signal.
31. The device of claim 30, wherein the first excitation signal is from the first codebook and the second excitation signal is from a second codebook, and wherein the speech encoder is further configured to:
use the first error signal to independently select a third excitation signal from the first codebook and a fourth excitation signal from the second codebook; and
use the first error signal to independently select a third gain to apply to the third excitation signal and a fourth gain to apply to the fourth excitation signal.
32. The device of claim 29, wherein the device is a communication device.
33. The device of claim 29, wherein the device is one of a telephone, a mobile phone, a cordless phone, a digital answering machine or a personal digital assistant.
This application is a continuation of U.S. application Ser. No. 09/625,088, filed Jul. 25, 2000, now U.S. Pat. No. 7,013,268.
The present invention relates generally to digital voice encoding and, more particularly, to a method and apparatus for improved weighting filters in a CELP encoder.
A general diagram of a CELP encoder 100 is shown in
In CELP encoder 100, speech is broken into frames, usually 20 ms each, and parameters for synthesis filter 104 are determined for each frame. Once the parameters are determined, an excitation signal μ(n) is chosen for that frame. The excitation signal is then synthesized, producing a synthesized speech signal s′(n). The synthesized frame s′(n) is compared to the actual speech input frame s(n), and a difference, or error signal, e(n) is generated by subtractor 106. The subtraction function is typically accomplished via an adder or similar functional component, as those skilled in the art will be aware. Excitation signal μ(n) is generated from a predetermined set of possible signals by excitation generator 102. In CELP encoder 100, every signal in the predetermined set is tried in order to find the one that produces the smallest error signal e(n). Once this particular excitation signal μ(n) is found, the signal and the corresponding filter parameters are sent to decoder 112, which reproduces the synthesized speech signal s′(n). Decoder 112 reproduces signal s′(n) by generating excitation signal μ(n) with decoder excitation generator 114 and passing it through decoder synthesis filter 116.
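To make this search concrete, the following Python sketch mirrors the loop just described: every candidate excitation is synthesized and the candidate yielding the smallest squared error is kept. The frame length, codebook size, and toy A(z) coefficients are illustrative assumptions, not values from the patent.

```python
# Minimal sketch of the analysis-by-synthesis search described above.
# Filter memory across frames is ignored for simplicity.
import numpy as np
from scipy.signal import lfilter

def search_codebook(s, codebook, a):
    """Return the index of the excitation whose synthesized frame is
    closest (in squared error) to the input frame s(n)."""
    best_err, best_idx = np.inf, -1
    for idx, mu in enumerate(codebook):
        s_syn = lfilter([1.0], a, mu)    # synthesis filter H(z) = 1/A(z)
        err = np.sum((s - s_syn) ** 2)   # energy of error signal e(n)
        if err < best_err:
            best_err, best_idx = err, idx
    return best_idx, best_err

rng = np.random.default_rng(0)
frame = rng.standard_normal(160)            # one 20 ms frame at 8 kHz
codebook = rng.standard_normal((64, 160))   # 64 candidate excitations
a = np.array([1.0, -0.9])                   # toy A(z) = 1 - 0.9 z^-1
idx, err = search_codebook(frame, codebook, a)
```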
By choosing the excitation signal that produces the smallest error signal e(n), a very good approximation of speech input s(n) can be reproduced in decoder 112. The spectrum of error signal e(n), however, will be very flat, as illustrated by curve 204 in
The weighted error signal ew(n) is also used to minimize the error signal by controlling the generation of excitation signal μ(n). In fact, signal ew(n) controls both the selection of signal μ(n) and the gain associated with it. In general, it is desirable that the energy associated with s′(n) be as stable, or constant, as possible. Energy stability is controlled by the gain associated with μ(n) and requires a less aggressive weighting filter 108. At the same time, however, it is desirable that the excitation spectrum (curve 202) of signal s′(n) be as flat as possible. Maintaining this flatness requires an aggressive weighting filter 108. These two requirements are directly at odds, because the generation of excitation signal μ(n) is controlled by a single weighting filter 108. A trade-off must therefore be made, resulting in lower performance with regard to one aspect or the other.
There is provided a speech encoder comprising a first weighting means for performing an error weighting on a speech input. The first weighting means is configured to reduce an error signal resulting from a difference between a first synthesized speech signal and the speech input. In addition, the speech encoder includes a means for generating the first synthesized speech signal from a first excitation signal, and a second weighting means for performing an error weighting on the first synthesized speech signal. The second weighting means is also configured to reduce the error signal resulting from the difference between the speech input and the first synthesized speech signal. Also included is a first difference means for taking the difference between the first synthesized speech signal and the speech input, where the first difference means is configured to produce a first weighted error signal. The speech encoder also includes a means for generating a second synthesized speech signal from a second excitation signal, and a third weighting means for performing an error weighting on the second synthesized speech signal. The third weighting means is configured to reduce a second error signal resulting from the difference between the first weighted error signal and the second synthesized speech signal. Further included is a second difference means for taking the difference between the second synthesized speech signal and the first weighted error signal, where the second difference means is configured to produce a second weighted error signal. Finally, there is included a feedback means for using the second weighted error signal to control the selection of the first excitation signal and the selection of the second excitation signal.
There is also provided a transmitter that includes a speech encoder such as the one described above, as well as a method for speech encoding. These and other embodiments, as well as further features and advantages of the invention, are described in detail below.
In the figures of the accompanying drawings, like reference numbers correspond to like elements, in which:
A typical implementation of a CELP encoder is illustrated in
H(z) = 1/A(z)  (1)

where

A(z) = 1 − (α1z−1 + α2z−2 + . . . + αPz−P)  (2)
Equation (2) represents a prediction error filter whose coefficients are determined by minimizing the energy of the residual signal produced when the original speech is passed through A(z). Synthesis filter 312 is designed to model the vocal tract, applying to excitation signal μ(n) the correlation normally introduced into speech by the vocal tract. The result of passing excitation signal μ(n) through synthesis filter 312 is synthesized speech signal s′(n).
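The passage says only that A(z) is obtained by minimizing the residual energy; the patent does not prescribe an algorithm. One standard way to do this, assumed here purely for illustration, is the autocorrelation method, with numpy.linalg.solve standing in for the usual Levinson-Durbin recursion:

```python
# Hedged sketch: solve the normal equations that minimize the
# prediction-residual energy of the frame to get the alpha_i of eq. (2).
import numpy as np

def lpc(frame, order=10):
    """Return [1, -alpha_1, ..., -alpha_P], the coefficient vector of A(z)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # r[k], k >= 0
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    alpha = np.linalg.solve(R, r[1:order + 1])   # predictor coefficients
    return np.concatenate(([1.0], -alpha))       # denominator of H(z) = 1/A(z)

# The returned vector can be used directly as the "a" argument of
# scipy.signal.lfilter([1.0], a, excitation) to realize H(z) = 1/A(z).
```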
Synthesized speech signal s′(n) is passed through error weighting filter 314, producing weighted synthesized speech signal s′w(n). Speech input s(n) is likewise passed through error weighting filter 318, producing weighted speech signal sw(n). Weighted synthesized speech signal s′w(n) is subtracted from weighted speech signal sw(n), producing an error signal. The function of error weighting filters 314 and 318 is to shape the spectrum of the error signal so that its noise energy is concentrated in regions of high voice content. Therefore, the error signal generated by subtractor 316 is actually a weighted error signal ew(n).
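The patent does not give the transfer function of filters 314 and 318. A common CELP formulation, used here only as an illustrative assumption, is the bandwidth-expanded weighting W(z) = A(z/γ1)/A(z/γ2) with 0 < γ2 < γ1 ≤ 1; the γ values below are typical, not taken from the source.

```python
# Assumed weighting-filter sketch: W(z) = A(z/g1)/A(z/g2).
import numpy as np
from scipy.signal import lfilter

def weight(x, a, g1=0.94, g2=0.6):
    """Apply W(z) = A(z/g1)/A(z/g2) to x, where a = [1, -alpha_1, ...]."""
    p = np.arange(len(a))
    num = a * g1 ** p   # A(z/g1): alpha_i -> alpha_i * g1^i
    den = a * g2 ** p   # A(z/g2): alpha_i -> alpha_i * g2^i
    return lfilter(num, den, x)
```

Because the bandwidth expansion broadens the formant peaks of A(z), a filter of this form attenuates the error near those peaks, which is consistent with concentrating the noise where the voice content is strongest.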
Weighted error signal ew(n) is fed back to control the selection of the next excitation signal from codebook 302 and also to control the gain term (gc) applied thereto. Without the feedback, every entry in codebook 302 would need to be passed through synthesis filter 312 and subtractor 316 to find the entry producing the smallest error signal. By using error weighting filters 314 and 318 and feeding weighted error signal ew(n) back, however, the selection process can be streamlined and the correct entry found much more quickly.
Codebook 302 is used to track the short-term variations in speech signal s(n); speech, however, is also characterized by long-term periodicities that are very important to effective reproduction of speech signal s(n). To take advantage of these long-term periodicities, an adaptive codebook 304 may be included so that excitation signal μ(n) will include a component of the form Gμ(n−α), where α is the estimated pitch period, "pitch" being the term used to describe the long-term periodicity. The adaptive codebook selection is multiplied by gain factor (gp) in multiplier 306. The selection from adaptive codebook 304 and the selection from codebook 302 are then combined in adder 310 to create excitation signal μ(n). As an alternative to the adaptive codebook, synthesis filter 312 may include a pitch filter to model the long-term periodicity present in voiced speech.
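A minimal sketch of forming μ(n) from both codebook contributions follows; the subframe bookkeeping of a real coder is omitted, and all values are illustrative assumptions (α must be at least 1).

```python
# Illustrative sketch (not from the patent):
# mu(n) = gc*c(n) + gp*mu(n - alpha), drawing mu(n - alpha) from the
# previously synthesized excitation history.
import numpy as np

def build_excitation(c, gc, gp, alpha, past_exc):
    n = len(c)
    seg = past_exc[len(past_exc) - alpha:]   # excitation starting alpha back
    adaptive = np.resize(seg, n)             # repeat the cycle if alpha < n
    return gc * c + gp * adaptive            # combined excitation mu(n)

rng = np.random.default_rng(1)
past = rng.standard_normal(200)              # synthesized excitation history
c = rng.standard_normal(40)                  # fixed-codebook vector
mu = build_excitation(c, gc=0.8, gp=0.5, alpha=55, past_exc=past)
```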
In order to address the problem of balancing energy stability and excitation spectrum flatness, the invention uses the approach illustrated in
Weighted synthesized speech signal s′w1(n) is subtracted in subtractor 420 from weighted speech signal sw(n), which is generated from speech signal s(n) by error weighting filter 418. Weighted synthesized speech signal s′w2(n) is then subtracted from the output of subtractor 420 in subtractor 422, generating weighted error signal ew(n). Weighted error signal ew(n) is therefore formed in accordance with the following equation:
ew(n)=sw(n)−s′w1(n)−s′w2(n) (3)
which is the same as:
ew(n)=sw(n)−(s′w1(n)+s′w2(n)) (4)
Equation (4) is essentially the same as the equation for ew(n) in encoder 300 of
Additionally, a different error weighting can be used for each error weighting filter 414, 416, and 418. To determine the best parameters for each filter, different parameter sets are tested with different types of speech input sources. For example, the speech input source may be a microphone or a telephone line, such as a telephone line used for an Internet connection. The speech input can therefore vary from very noisy to relatively calm. A set of optimum error weighting parameters for each type of input is determined by the testing, and the type of input used in encoder 400 then determines which set of parameters is selected for error weighting filters 414, 416, and 418. The selection of optimum error weighting parameters, combined with independent control of the codebook selections and the gains applied thereto, allows effective balancing of energy stability and excitation spectrum flatness, improving the performance of encoder 400 with regard to both.
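The following sketch ties equations (3) and (4) to the per-filter tuning just described, reusing the bandwidth-expanded weighting form assumed earlier: each of the three weighting stages (414, 416, 418) gets its own parameter pair, so the two codebook contributions can be weighted independently. All γ values are illustrative assumptions, not values from the patent.

```python
# Sketch of equations (3)/(4) with an independent weighting per path.
import numpy as np
from scipy.signal import lfilter

def weight(x, a, g1, g2):
    p = np.arange(len(a))
    return lfilter(a * g1 ** p, a * g2 ** p, x)   # W(z) = A(z/g1)/A(z/g2)

def weighted_error(s, s1_syn, s2_syn, a):
    sw  = weight(s,      a, 0.96, 0.64)   # filter 418 on speech input s(n)
    sw1 = weight(s1_syn, a, 0.94, 0.60)   # filter 414 on s'1(n)
    sw2 = weight(s2_syn, a, 0.90, 0.55)   # filter 416 on s'2(n)
    return sw - sw1 - sw2                 # ew(n) = sw(n) - s'w1(n) - s'w2(n)
```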
Getting the pitch correct for speech input s(n) is also very important: if the pitch is incorrect, the long-term periodicity will not be reproduced correctly and the resulting speech will sound unnatural. Therefore, a pitch estimator 424 may be incorporated into encoder 400. In one implementation, pitch estimator 424 generates a speech pitch estimate sp(n), which is used to further control the selection from adaptive codebook 402. This further control is designed to ensure that the long-term periodicity of speech input s(n) is correctly replicated in the selections from adaptive codebook 402.
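The source does not specify the algorithm inside pitch estimator 424; an autocorrelation-peak search over a typical 8 kHz pitch range (roughly 50-400 Hz) stands in below as an assumed, minimal implementation.

```python
# Hedged pitch-estimation sketch: pick the lag with the strongest
# autocorrelation within a plausible pitch range.
import numpy as np

def estimate_pitch(s, min_lag=20, max_lag=160):
    """Return the lag in samples maximizing the autocorrelation of s."""
    r = np.correlate(s, s, mode="full")[len(s) - 1:]   # lags 0..len(s)-1
    return int(np.argmax(r[min_lag:max_lag + 1])) + min_lag

# 100 Hz sinusoid sampled at 8 kHz: expect a lag of 80 samples.
s = np.sin(2 * np.pi * 100 * np.arange(320) / 8000)
lag = estimate_pitch(s)   # -> 80
```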
The importance of the pitch is best illustrated by the graph in
In order to improve the speech pitch estimate sp(n), encoder 600 of
In an alternative implementation of encoder 600, filter 602 is an adaptive filter. Therefore, as illustrated in
As shown in
Alternatively, filter 602 may take its input from the output of error weighting filter 418. In this case, error weighting filter 418 provides the error weighting for s″w(n), and filter 602 does not incorporate a fourth error weighting filter. This implementation is illustrated by the dashed line in
There is also provided a transmitter 800 as illustrated in
Speech encoder 804 is coupled to a transceiver 806, which converts the encoded data from speech encoder 804 into a signal that can be transmitted. For example, many implementations of transmitter 800 will include an antenna 810. In this case, transceiver 806 will convert the data from speech encoder 804 into an RF signal for transmission via antenna 810. Other implementations, however, will have a fixed line interface such as a telephone interface 808. Telephone interface 808 may be an interface to a PSTN or ISDN line, for example, and may be accomplished via a coaxial cable connection, a regular telephone line, or the like. In a typical implementation, telephone interface 808 is used for connecting to the Internet.
Transceiver 806 will typically be interfaced to a decoder as well for bidirectional communication; however, such a decoder is not illustrated in
Transmitter 800 is capable of implementation in a variety of communication devices. For example, transmitter 800 may, depending on the implementation, be included in a telephone, a cellular/PCS mobile phone, a cordless phone, a digital answering machine, or a personal digital assistant.
There is also provided a method of speech encoding comprising the steps illustrated in
Next, in step 910, a second synthesized speech signal is generated from a second excitation signal multiplied by a second gain term. For example, s′2(n) as generated in
In certain implementations, pitch estimation is performed on the speech signal as illustrated in
While various embodiments of the invention have been presented, it should be understood that they have been presented by way of example only and not limitation. It will be apparent to those skilled in the art that many other embodiments are possible, which would not depart from the scope of the invention. For example, in addition to being applicable in an encoder of the type described, those skilled in the art will understand that there are several types of analysis-by-synthesis methods and that the invention would be equally applicable in encoders implementing these methods.