An apparatus and method for vocoding an input signal comprising a linear predictive filter for generating a filtered signal with a first signal pulse and a second signal pulse in response to receiving the input signal and a processor having a lookup table with a plurality of track positions. The first signal pulse is associated with a first track position and the second signal pulse is associated with a second track position relative to the first signal pulse resulting in a plurality of excitation parameters. Additionally, the apparatus has a transmitter which transmits the plurality of excitation parameters in a transmission signal in response to receiving the plurality of excitation parameters from the processor.
1. A method of vocoding an input signal comprising the steps of:
filtering the input signal resulting in a filtered signal having a first signal pulse and a second signal pulse; encoding the first signal pulse by association of the first signal pulse with a first pulse position within a first track of a data structure, the first pulse position being one of a predetermined set of pulse positions within the first track; and assigning the second signal pulse to a second pulse position as a function of the first pulse position within a second track of the data structure, the second pulse position in the second track being in a non-adjacent relationship to the first pulse position in the first track.
8. An apparatus for vocoding an input signal comprising:
a linear predictive filter for generating a filtered signal with at least a first signal pulse and a second signal pulse in response to receiving the input signal; a processor having a lookup table with a plurality of track positions in which the first signal pulse is assigned a first track position in the first plurality of track positions, the first pulse position being one of a predetermined set of pulse positions within the first track, and the second signal pulse is assigned a second track position in the second plurality of pulse positions as a function of the first track position of the first signal pulse resulting in a plurality of excitation parameters, the second pulse position in the second track being in a non-adjacent relationship to the first pulse position in the first track; and a transmitter which transmits the plurality of excitation parameters in a transmission signal in response to receiving the plurality of excitation parameters from the processor.
14. An article of manufacture comprising:
a computer-readable signal bearing medium having computer readable program code means embodied therein for vocoding of a signal, the computer readable program code means in said article of manufacture having: means having a first computer readable program code for filtering the input signal resulting in a filtered signal having a first signal pulse and a second signal pulse; means having a second computer readable program code for encoding the first signal pulse by association of the first signal pulse with a first pulse position within a first track of a data structure, the first pulse position being one of a predetermined set of pulse positions within the first track, and means having a third computer readable program code for assigning the second signal pulse to a second pulse position as a function of the first pulse position within a second track of the data structure, the second pulse position in the second track being in a non-adjacent relationship to the first pulse position in the first track.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
9. The apparatus of
10. The apparatus of
11. The apparatus of
15. The article of manufacture of
16. The article of manufacture of
17. The method of
18. The apparatus of
This invention relates to voice compression, and in particular, to code excited linear prediction (CELP) vocoding.
A voice encoder/decoder (vocoder) compresses speech signals in order to reduce the transmission bandwidth required in a communications channel. By reducing the bandwidth required per call, more calls can be carried over the same communication channel. Early speech coding techniques, such as linear predictive coding (LPC), use a filter to remove signal redundancy and thereby compress the speech signal. The LPC filter reproduces a spectral envelope that attempts to model the human voice. The LPC filter is excited by quasi-periodic inputs for nasal and vowel sounds and by noise-like inputs for unvoiced sounds.
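As a minimal sketch of how LPC filter coefficients can be obtained, the Python below computes the autocorrelation of a block of samples and solves for the predictor with the Levinson-Durbin recursion. The function names are hypothetical, and practical coders also apply analysis windows and lag windowing, which are omitted here.

```python
def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """r[k] = sum_{n=k}^{N-1} x[n] * x[n-k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve for predictor coefficients so that s_hat[n] = sum_{k=1}^{p} a_k * s[n-k].

    Returns ([a_1, ..., a_p], final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused so list indices match a_k
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("non-positive prediction error; the block may be silent")
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err
```

For example, `levinson_durbin([1.0, 0.9, 0.81], 2)` yields coefficients close to `[0.9, 0.0]`: an autocorrelation that decays geometrically is fully explained by a first-order predictor.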
There exists a class of vocoders known as code excited linear prediction (CELP) vocoders. CELP vocoding is primarily a speech data compression technique that, at 4-8 kbps, can achieve speech quality comparable to that of other speech coding techniques operating at 32 kbps. The CELP vocoder offers two improvements over the earlier LPC techniques. First, the CELP vocoder attempts to capture more voice detail by extracting pitch information using a pitch predictor. Second, the CELP vocoder excites the LPC filter with a noise-like signal derived from a residual signal created from the actual speech waveform.
CELP vocoders contain three main components: 1) a short term predictive filter, 2) a long term predictive filter, also known as a pitch predictor or adaptive codebook, and 3) a fixed codebook. Compression is achieved by assigning each component a number of bits such that the total is less than the number of bits used to represent the original speech signal. The first component uses linear prediction to remove short term redundancies in the speech signal. The error, or residual, signal that results from the short term predictor becomes the target signal for the long term predictor.
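The residual produced by the short term predictor is the output of the inverse filter A(z) = 1 - a_1 z^-1 - ... - a_p z^-p. A minimal sketch, assuming zero signal history before the first sample (a real coder carries filter memory from frame to frame); the function name is hypothetical.

```python
def lpc_residual(s: list[float], a: list[float]) -> list[float]:
    """e[n] = s[n] - sum_{k=1}^{p} a_k * s[n-k], treating s[n] as 0 for n < 0."""
    p = len(a)
    e = []
    for n in range(len(s)):
        prediction = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(s[n] - prediction)
    return e


# A perfectly first-order-predictable signal leaves only its first sample as residual.
signal = [1.0]
for _ in range(7):
    signal.append(0.9 * signal[-1])
print(lpc_residual(signal, [0.9]))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Whatever the short term predictor cannot explain, such as the pitch pulses of voiced speech, remains in `e` and is handed to the long term predictor.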
Voiced speech has a quasi-periodic nature, and the long term predictor extracts a pitch period from the residual and removes the information that can be predicted from the previous period. After the long term and short term predictive filters, the resulting residual signal is mostly noise-like. Using analysis-by-synthesis, a fixed codebook search finds the best match to replace the noise-like residual with an entry from its library of vectors. The code representing the best matching vector is transmitted in place of the noisy residual. In algebraic CELP (ACELP) vocoders, the fixed codebook vectors consist of a few non-zero pulses and are represented by the locations and signs (+1 or -1) of the pulses.
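Both steps in this paragraph can be sketched in Python: an open-loop search for the pitch lag that best predicts the residual from one period earlier, and the construction of an ACELP fixed codebook vector from pulse locations and signs. The function names and the normalized-correlation criterion are illustrative assumptions; standardized coders typically refine the lag with a closed-loop, analysis-by-synthesis search.

```python
import math


def open_loop_pitch(residual: list[float], min_lag: int, max_lag: int) -> int:
    """Return the lag T maximizing sum r[n]*r[n-T] / sqrt(sum r[n-T]^2)."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        corr = sum(residual[n] * residual[n - lag] for n in range(lag, len(residual)))
        energy = sum(residual[n - lag] ** 2 for n in range(lag, len(residual)))
        if energy <= 0.0:
            continue  # nothing to predict from at this lag
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def acelp_codevector(length: int, positions: list[int], signs: list[int]) -> list[float]:
    """Fixed codebook vector: all zeros except +/-1 pulses at the given positions."""
    c = [0.0] * length
    for pos, sign in zip(positions, signs):
        c[pos] += float(sign)
    return c
```

With a pulse train repeating every 20 samples, `open_loop_pitch(train, 15, 30)` returns 20, and `acelp_codevector(8, [1, 5], [1, -1])` gives `[0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]`, so only the positions and signs need to be transmitted.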
In a typical implementation, a CELP vocoder will block or divide the incoming speech signal into frames, updating the short term predictor's LPC coefficients once per frame. The LPC residual is then divided into subframes for the long term predictor and the fixed codebook search. For example, the input speech may be blocked into a 160 sample frame for the short term predictor. The resulting frame may then be broken up into subframes of 53 samples, 53 samples, and 54 samples. Each subframe is then processed by the long term predictor and the fixed codebook search.
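The frame and subframe sizes in this example can be expressed directly in code. This is a sketch with hypothetical function names; dropping a trailing partial frame is an assumption, not something the text specifies.

```python
def block_frames(samples: list[float], frame_len: int = 160) -> list[list[float]]:
    """Cut a sample stream into complete frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def split_subframes(frame: list[float], sizes: tuple[int, ...] = (53, 53, 54)) -> list[list[float]]:
    """Divide one frame into consecutive subframes of the given sizes."""
    if sum(sizes) != len(frame):
        raise ValueError(f"subframe sizes {sizes} do not cover a {len(frame)}-sample frame")
    subframes, start = [], 0
    for size in sizes:
        subframes.append(frame[start:start + size])
        start += size
    return subframes
```

The LPC coefficients would be computed once per 160-sample frame, while the long term predictor and fixed codebook search run on each of the three subframes returned by `split_subframes`.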
Referring to
The LPC filter is unable to remove all of the redundant information, and the quasi-periodic peaks and valleys remaining in the filtered speech signal 200 are referred to as pitch pulses. The short term predictive filter is then applied to speech signal 200 resulting in the short term filtered signal 300, FIG. 3. The long term predictor filter removes the quasi-periodic pitch pulses from the residual speech signal 300,
In
In the current example, the subframe 354,
The pulse position is constrained by the absolute pulse positions in the tracks. Disadvantageously, the CELP vocoder tends to place pulses in adjacent positions in the tracks. When pulses occupy adjacent track positions, the encoding captures the start of the speech sound rather than representing the utterance in a more balanced way. Additionally, as the bit rate of the vocoder decreases and fewer pulses are used, inefficient placement of pulses in the tracks degrades voice quality. What is needed is a method that reduces the occurrence of pulses placed in adjacent track positions.
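To see why absolute positioning permits adjacent pulses, consider a hypothetical layout (not taken from this patent) in which two tracks interleave the even and odd positions of a 40-sample subframe. Each pulse is indexed independently within its own track, so nothing in the indices stops a search from choosing neighboring samples:

```python
SUBFRAME_LEN = 40  # hypothetical subframe length
TRACK_A = list(range(0, SUBFRAME_LEN, 2))  # even sample positions
TRACK_B = list(range(1, SUBFRAME_LEN, 2))  # odd sample positions


def adjacent_pairs(track_a: list[int], track_b: list[int]) -> list[tuple[int, int]]:
    """Every (pa, pb) choice whose sample positions differ by exactly one."""
    return [(pa, pb) for pa in track_a for pb in track_b if abs(pa - pb) == 1]


pairs = adjacent_pairs(TRACK_A, TRACK_B)
print(len(pairs), "of", len(TRACK_A) * len(TRACK_B))  # 39 of 400
```

In this layout, 39 of the 400 encodable pulse pairs are adjacent, and the absolute index format spends bits on every one of them.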
The inefficiency of placing pulses at absolute track positions is addressed by placing a signal pulse in a second track relative to the position of a signal pulse in the first track. Relative positioning of the N+1 signal pulses in the N+1 tracks during encoding increases the signal quality of the decoded signal. The improvement comes from more precise placement of pulses in the tracks and from fewer occurrences of adjacent signal pulse positions within the tracks.
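A minimal sketch of relative positioning under assumed parameters: a 40-sample subframe, a first track of eight predetermined positions (multiples of 5), and a table of eight relative offsets, each at least 2 samples. Because the second position is stored as an offset from the first, every decodable pair is non-adjacent by construction. The layout and function names are hypothetical and are not the contents of lookup table 500.

```python
SUBFRAME_LEN = 40
FIRST_TRACK = list(range(0, SUBFRAME_LEN, 5))  # hypothetical: 8 positions -> 3 bits
REL_OFFSETS = [2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical: 8 offsets -> 3 bits


def decode_positions(first_idx: int, offset_idx: int) -> tuple[int, int]:
    """Map (first-track index, relative-offset index) to two sample positions."""
    p1 = FIRST_TRACK[first_idx]
    # The second pulse is placed relative to the first, wrapping inside the subframe.
    p2 = (p1 + REL_OFFSETS[offset_idx]) % SUBFRAME_LEN
    return p1, p2


def encode_positions(p1: int, p2: int) -> tuple[int, int]:
    """Inverse of decode_positions; raises ValueError if the pair is not representable."""
    first_idx = FIRST_TRACK.index(p1)
    offset_idx = REL_OFFSETS.index((p2 - p1) % SUBFRAME_LEN)
    return first_idx, offset_idx
```

For instance, `decode_positions(7, 7)` returns `(35, 4)`: the offset of 9 wraps past the end of the subframe. With these assumed tables, the pair costs 6 bits (3 + 3) before sign bits, and no index can describe two neighboring samples.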
The foregoing objects and advantageous features of the invention will be explained in greater detail and others will be made apparent from the detailed description of the present invention, which is given with reference to the several figures of the drawing, in which:
In
Rather than encoding signal pulses in adjacent track positions, the second signal pulse is positioned relative to the first. With fewer adjacent signal pulses encoded in the tracks, the signal pulses are better able to reproduce bursts of energy, which improves the voice quality of the signal decoded by the vocoder. In the present embodiment, a single signal pulse is encoded in each of the two tracks 502 and 504. Positioning the second signal pulse in the second track relative to the first signal pulse in the first track improves the quality of the decoded utterance. In an alternate embodiment, the codebook table contains more than two tracks, and the signal pulse in each additional track is positioned relative to the track position of an earlier signal pulse.
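The alternate embodiment with more than two tracks can be sketched by chaining the relative placement: each pulse is positioned relative to the one before it. The tables are the same hypothetical ones assumed for a two-pulse sketch (a 40-sample subframe, first-track positions at multiples of 5, offsets of 2 to 9 samples); they are illustrative, not the patent's.

```python
SUBFRAME_LEN = 40
FIRST_TRACK = list(range(0, SUBFRAME_LEN, 5))  # hypothetical predetermined positions
REL_OFFSETS = [2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical offsets, none equal to 1


def decode_chain(first_idx: int, offset_indices: list[int]) -> list[int]:
    """Pulse k+1 is placed relative to pulse k, wrapping inside the subframe.

    Only consecutive pulses are guaranteed non-adjacent; with long chains the
    offsets can wrap around so that non-consecutive pulses become neighbors.
    """
    positions = [FIRST_TRACK[first_idx]]
    for j in offset_indices:
        positions.append((positions[-1] + REL_OFFSETS[j]) % SUBFRAME_LEN)
    return positions


def encode_chain(positions: list[int]) -> tuple[int, list[int]]:
    """Inverse of decode_chain; raises ValueError if a step is not in the tables."""
    first_idx = FIRST_TRACK.index(positions[0])
    offset_indices = [REL_OFFSETS.index((cur - prev) % SUBFRAME_LEN)
                      for prev, cur in zip(positions, positions[1:])]
    return first_idx, offset_indices
```

For example, `decode_chain(1, [0, 3, 7])` places four pulses at `[5, 7, 12, 21]`, each one relative to its predecessor.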
In the present embodiment, the location of the second signal pulse in the second track is relative to the track position of the first signal pulse in the first track. In an alternate embodiment, the position of the second signal pulse in the second track is relative to the sample position of the first signal pulse. In yet another embodiment, the signal pulse positions in the second track may be grouped in a non-sequential order (e.g. 1, -1, 7, -7, 2, -2, 6, -6, 3, -3, 5, -5, 4, -4).
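The non-sequential ordering can be held in a small table that maps a transmitted index to a relative offset and back. The ordering itself is quoted from the text; reading each entry as a signed offset, in track slots, from the first pulse's slot is an assumption made for this sketch, as are the function names.

```python
import math

# Ordering quoted in the text; interpreting entries as signed slot offsets is assumed.
OFFSET_ORDER = [1, -1, 7, -7, 2, -2, 6, -6, 3, -3, 5, -5, 4, -4]
INDEX_OF_OFFSET = {offset: idx for idx, offset in enumerate(OFFSET_ORDER)}
INDEX_BITS = math.ceil(math.log2(len(OFFSET_ORDER)))  # 14 entries fit in 4 bits


def second_slot(first_slot: int, index: int, num_slots: int) -> int:
    """Slot of the second pulse, wrapping around a track of num_slots slots."""
    return (first_slot + OFFSET_ORDER[index]) % num_slots


def index_for(first_slot: int, target_slot: int, num_slots: int) -> int:
    """Inverse lookup; assumes num_slots >= 15 so every offset maps to a distinct slot."""
    diff = (target_slot - first_slot) % num_slots
    for candidate in (diff, diff - num_slots):  # positive and negative readings
        if candidate in INDEX_OF_OFFSET:
            return INDEX_OF_OFFSET[candidate]
    raise KeyError(f"offset {diff} is not in the table")
```

With a 16-slot track, index 1 (offset -1) from slot 0 wraps to slot 15, and `index_for(0, 15, 16)` recovers index 1.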
Turning to
Each device 602, 604 has a respective signal input/output unit 608, 610. Units 608, 610 are shown as telephonic devices that transfer analog voice signals to and from the transmitter device 602 and receiver device 604. The signal input/output unit 608 is coupled to the transmitter device 602 by a two-wire communication path 612. Similarly, the other signal input/output unit 610 is coupled to the receiver device 604 over another two-wire communication path 614. In an alternate embodiment, the signal input/output units are incorporated in the transmitting and receiving communication devices (e.g. speakers and microphones built into the transmitting and receiving devices) or communicate over a wireless communication path (e.g. a cordless telephone).
The transmitter device 602 contains an analog signal port 616 coupled to the two-wire communication path 612, a CELP vocoder 618, and a controller 620. The controller 620 is coupled to the analog signal port 616, the vocoder 618, and a network interface 622. Additionally, the network interface 622 is coupled to the vocoder 618, the controller 620, and the communication path 606.
Similarly, the receiver device 604 has another network interface 624 coupled to another controller 626, the communication path 606, and another vocoder 628. The other controller 626 is coupled to the other vocoder 628, the other network interface 624, and another analog signal port 630. Additionally, the other analog signal port 630 is coupled to the other two-wire communication path 614.
A voice signal is received at the analog port 616 from the signal input device 608. The controller 620 provides the control and timing signals for the transmitter device 602 and enables the analog port 616 to transfer the received signal to the vocoder 618 for signal compression. The vocoder 618 has a fixed codebook with a data structure shown in
Two signal pulses are kept from being assigned to adjacent track positions by assigning the second pulse position relative to the first pulse position. The first signal pulse is encoded and assigned a pulse position in the first track 502, and the pulse position of the second signal pulse in the second track 504 is encoded relative to that pulse position in the first track 502. The relative encoding of the second pulse position results in a compressed signal with a greater likelihood that the first pulse position is not adjacent to the second pulse position. The compressed signal is then sent from the vocoder 618,
The other network interface 624 located in the receiver device 604 receives the compressed signal. The receiver controller 626 enables the received compressed signal to be transferred to the receiver vocoder 628. The receiver vocoder 628 decodes the compressed signal by using a lookup table 500, FIG. 6. The vocoder 628,
Turning to
The analog signal is received at the preprocessor 710 from the analog device 608, FIG. 7. The preprocessor 710,
The output of the perceptual weighting processor 718 is sent to the fixed codebook search 734 and the pitch analyzer 722. The fixed codebook search 734 generates the code values that are sent to the parameter encoder 724 and the fixed codebook 730. The fixed codebook search 734 is shown separate from the fixed codebook 730, but it may alternatively be included in the fixed codebook 730 rather than implemented separately. Additionally, the fixed codebook search has access to the data structure of the lookup table 500,
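A fixed codebook search over a relative track structure can be sketched as an exhaustive loop over every (first position, relative offset) pair. The sketch scores candidates with the usual ACELP criterion (d^T c)^2 / (c^T Phi c), where d is the target backward-filtered through the impulse response h and Phi is the autocorrelation matrix of h. The 40-sample subframe, the tables `FIRST_TRACK` and `REL_OFFSETS`, and pre-selecting each sign from d (a common simplification) are assumptions; none of this is taken from lookup table 500.

```python
SUBFRAME_LEN = 40
FIRST_TRACK = list(range(0, SUBFRAME_LEN, 5))  # hypothetical predetermined positions
REL_OFFSETS = [2, 3, 4, 5, 6, 7, 8, 9]          # hypothetical non-adjacent offsets


def _padded(h: list[float], length: int) -> list[float]:
    return (list(h) + [0.0] * length)[:length]


def backward_filter(x: list[float], h: list[float]) -> list[float]:
    """d[n] = sum_{i=n}^{L-1} x[i] * h[i-n], i.e. d = H^T x."""
    L = len(x)
    hp = _padded(h, L)
    return [sum(x[i] * hp[i - n] for i in range(n, L)) for n in range(L)]


def correlation_matrix(h: list[float], L: int) -> list[list[float]]:
    """Phi[i][j] = sum_{n=max(i,j)}^{L-1} h[n-i] * h[n-j], i.e. Phi = H^T H."""
    hp = _padded(h, L)
    return [[sum(hp[n - i] * hp[n - j] for n in range(max(i, j), L))
             for j in range(L)] for i in range(L)]


def search_two_pulses(x: list[float], h: list[float]):
    """Return ((first_idx, offset_idx), (p1, p2), (s1, s2)) maximizing the criterion."""
    if len(x) != SUBFRAME_LEN:
        raise ValueError("target length must equal SUBFRAME_LEN")
    d = backward_filter(x, h)
    phi = correlation_matrix(h, SUBFRAME_LEN)
    best = None
    for i, p1 in enumerate(FIRST_TRACK):
        s1 = 1.0 if d[p1] >= 0 else -1.0  # sign pre-selected from d
        for j, offset in enumerate(REL_OFFSETS):
            p2 = (p1 + offset) % SUBFRAME_LEN  # only relative, non-adjacent candidates
            s2 = 1.0 if d[p2] >= 0 else -1.0
            num = (s1 * d[p1] + s2 * d[p2]) ** 2
            den = phi[p1][p1] + phi[p2][p2] + 2.0 * s1 * s2 * phi[p1][p2]
            if den <= 0.0:
                continue
            score = num / den
            if best is None or score > best[0]:
                best = (score, i, j, p1, p2, s1, s2)
    _, i, j, p1, p2, s1, s2 = best
    return (i, j), (p1, p2), (s1, s2)
```

Because adjacent pairs are never enumerated, the search both spends fewer candidates per subframe and cannot return the adjacent placements that an absolute-position search might choose.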
The pitch analyzer 722,
The fixed codebook 730 receives the code values generated by the fixed codebook search 734 and regenerates a signal. The generated signal is combined with the signal from the adaptive codebook 732 by signal combiner 720. The resulting combined signal is then used by the synthesis filter 716 to model the short term spectral shape of the speech signal and fed back to the adaptive codebook 732.
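The work of signal combiner 720 and synthesis filter 716 can be sketched as follows, assuming the gains are already known and using the convention s[n] = u[n] + sum_k a_k s[n-k], the inverse of A(z) = 1 - sum_k a_k z^-k. The function names are hypothetical, and gain computation, perceptual weighting, and filter memory carried between subframes are omitted.

```python
def build_excitation(gp: float, v: list[float], gc: float, c: list[float]) -> list[float]:
    """u[n] = gp * v[n] + gc * c[n]: adaptive plus fixed codebook contributions."""
    return [gp * vn + gc * cn for vn, cn in zip(v, c)]


def synthesize(u: list[float], a: list[float]) -> list[float]:
    """All-pole synthesis 1/A(z): s[n] = u[n] + sum_k a_k * s[n-k], zero initial state."""
    s: list[float] = []
    for n, un in enumerate(u):
        feedback = sum(a[k - 1] * s[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        s.append(un + feedback)
    return s
```

In a CELP loop, the same excitation `u` is also written back into the adaptive codebook so that the next subframe's pitch search can draw on it.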
The parameter encoder 724 receives parameters from the fixed codebook search 734, the pitch analyzer 722, and the LP filter 714. Using the received parameters, the parameter encoder generates the compressed signal, which is then transmitted by the transmitter 728 across the network.
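A parameter encoder ultimately concatenates fixed-width fields into a bitstream. The sketch below packs and unpacks (value, width) fields most significant bit first. The field widths in the example (a 3-bit first-track index, a 3-bit relative-offset index, and one bit per pulse sign) are hypothetical and are not the bit allocation of this patent.

```python
def pack_fields(fields: list[tuple[int, int]]) -> tuple[int, int]:
    """Concatenate (value, width) fields MSB-first; returns (packed word, total bits)."""
    word, total = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
        total += width
    return word, total


def unpack_fields(word: int, widths: list[int]) -> list[int]:
    """Split a packed word back into fields, reading MSB-first."""
    values, shift = [], sum(widths)
    for width in widths:
        shift -= width
        values.append((word >> shift) & ((1 << width) - 1))
    return values


# Hypothetical fixed codebook fields: first index 5, offset index 2, signs (+, -).
word, nbits = pack_fields([(5, 3), (2, 3), (1, 1), (0, 1)])
print(word, nbits)  # 170 8
```

On the receiving side, `unpack_fields(170, [3, 3, 1, 1])` returns `[5, 2, 1, 0]`, which the decoder would then map back to pulse positions through its copy of the lookup table.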
In an alternate embodiment of the above system, the encoder and decoder portions of the vocoder reside in the same device, such as a digital answering machine. The communication path in such an embodiment is a data bus that allows the compressed signal to be stored in and retrieved from a memory.
In
The compressed signal is received by the receiver device 604 at the network interface 624. The receiver 802 unpacks the data from the compressed signal received at the network interface 624. The data consists of a fixed codebook index, a fixed codebook gain, an adaptive codebook index, an adaptive codebook gain, and an index for the LP coefficients. The fixed codebook 804 contains a lookup table 500,
Turning to
The filtered residual signal is further filtered by a long term filter, in step 906, FIG. 10 and the adaptive codebook 732,
The lookup table 500 is used by the fixed codebook 730,
The current state of technology allows general purpose digital signal processors to be combined with other electronic elements to build a CELP vocoder that is configured by software. Therefore, a computer readable signal bearing medium may contain software code that implements a vocoder with additional constraints restricting pulse positions in a codebook.
While the invention has been particularly shown and described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention and it is intended that all such changes come within the scope of the following claims.
Executed on | Assignor | Assignee | Conveyance | Reel/Frame |
Aug 03 2000 | BENNO, STEVEN A | Lucent Technologies, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011135/0686 |
Aug 07 2000 | Lucent Technologies Inc. | (assignment on the face of the patent) | | |
Jan 30 2013 | Alcatel-Lucent USA Inc | CREDIT SUISSE AG | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 030510/0627 |
Aug 19 2014 | CREDIT SUISSE AG | Alcatel-Lucent USA Inc | RELEASE BY SECURED PARTY SEE DOCUMENT FOR DETAILS | 033949/0531 |
Date | Maintenance Fee Events |
Sep 26 2007 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Sep 22 2011 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Oct 20 2015 | M1553: Payment of Maintenance Fee, 12th Year, Large Entity. |
Date | Maintenance Schedule |
Apr 27 2007 | 4 years fee payment window open |
Oct 27 2007 | 6 months grace period start (w surcharge) |
Apr 27 2008 | patent expiry (for year 4) |
Apr 27 2010 | 2 years to revive unintentionally abandoned end. (for year 4) |
Apr 27 2011 | 8 years fee payment window open |
Oct 27 2011 | 6 months grace period start (w surcharge) |
Apr 27 2012 | patent expiry (for year 8) |
Apr 27 2014 | 2 years to revive unintentionally abandoned end. (for year 8) |
Apr 27 2015 | 12 years fee payment window open |
Oct 27 2015 | 6 months grace period start (w surcharge) |
Apr 27 2016 | patent expiry (for year 12) |
Apr 27 2018 | 2 years to revive unintentionally abandoned end. (for year 12) |