Disclosed are a system and a method for compression coding of audio signals, such as speech signals, using two long-term prediction (LTP) models. The method determines the parameters of a second LTP model on the basis of the parameters of at least one first LTP model. The present invention is aimed at switching from an LTP model with a single coefficient (monotap) to an LTP model with several coefficients (multitap) and vice versa, as well as at switching between two multitap LTP models. The complexity of the method may be adjusted, especially as a function of a desired compromise between a target complexity and a desired quality. A device for implementing the method according to the invention is, moreover, very useful for multiple codings in cascade (transcodings) or in parallel (multi-codings and multi-mode codings).
|
1. A method of coding according to a second coding format, on the basis of information obtained by implementing at least one step of coding according to a first coding format, the first and second coding formats implementing, in particular for the coding of a speech signal, a step of searching for long-term prediction (ltp) parameters by exploring at least one dictionary having candidate parameters, one at least of the first and second coding formats using a multitap filtering with several coefficients for a fine search for the ltp parameters, the method comprising:
a) conducting one of a statistical and analytical study, as a function of successive suites of ltp parameters according to the first coding format, so as to determine a number of orders and appropriate orders in a dictionary that the second coding format uses;
b) recovering a priori information including information corresponding to a partition of the first dictionary relating to a class of the partition to which an ltp parameter obtained in the course of the coding according to the first coding format belongs, so as to select at least one order from said dictionary that the second coding format uses;
c) applying the selected order to the candidate parameters of said dictionary that the second coding format uses so as to rank the candidate parameters;
d) choosing the m first ranked candidate parameters as a plurality of first candidate parameters, m depending on a desired quality or complexity; and
e) performing the second coding by conducting the fine search for the ltp parameters only among said plurality of candidate parameters, wherein the first coding format uses a first dictionary and the second coding format uses a second dictionary, and wherein a plurality of similar orders is grouped so as to dynamically modify an initial partition of the first dictionary and, thereby, a number of orders of the second dictionary.
22. A device for coding according to a second coding format, designed to use coding information obtained by implementing a coding according to a first coding format, the first and second coding formats implementing, in particular for the coding of a speech signal, a search for long-term prediction (ltp) parameters by exploring a dictionary comprising candidate parameters, one at least of the first and second coding formats using a multitap filtering with several coefficients for a fine search for the ltp parameters, the device comprising:
a memory storing a correspondence table defining, as a function of ltp parameters determined by the first coding format which uses a first dictionary, orders of a second dictionary that the second coding format uses, an order being defined by ranking elements of said second dictionary according to a certain criterion, said correspondence table being defined by conducting one of a statistical and analytical study, as a function of successive suites of ltp parameters according to the first coding format, so as to determine a number of orders and appropriate orders in a said second dictionary that the second coding format uses, wherein a plurality of similar orders is grouped so as to dynamically modify an initial partition of the first dictionary and, thereby, a number of orders of the second dictionary;
a component configured to recover a signal giving at least one a priori information including information corresponding to a partition of the first dictionary relating to a class of the partition to which an ltp parameter obtained in the course of the coding according to the first coding format belongs on ltp parameters in the course of a coding according to the first coding format;
a component configured to be active on reception of said signal for consulting said correspondence table and selecting at least one order of said second dictionary of the second coding format; and
a calculator which:
ranks the candidate parameters of said second dictionary of the second coding format according to the selected order, with a view to choosing the m first ranked candidate parameters as a plurality of first candidate parameters from the second dictionary, m depending on a desired quality or complexity; and
continues the coding according to the second coding format, by conducting the ltp search only among the plurality of first candidate parameters.
2. The method as claimed in
3. The method as claimed in
4. The method as claimed in
5. The method as claimed in
6. The method as claimed in
7. The method as claimed in
8. The method as claimed in
9. The method as claimed in
wherein, for first and second subframes of identical duration, to each current subframe of the second coding format there corresponds a single subframe of the first coding format; and:
the first coding format selects a first suite of ltp parameters for the current subframe;
on the basis of the partition by classes of the first dictionary associated with one of the ltp parameters of the first format, selecting an order of exploration of the second dictionary of the second format by choosing an order associated with the class of the element of said first suite; and
following the order thus selected, exploring a limited number of first candidate parameters of the second dictionary of the second format.
10. The method as claimed in
wherein, for first and second subframes of different durations:
the first coding format selects a plurality of ltp parameter suites, for first subframes corresponding substantially to a second current subframe;
on the basis of the partition by classes of the first dictionary associated with one of the ltp parameters of the first format, orders of exploration of the second dictionary of the second format are preselected by choosing the orders associated with the classes of the elements of said ltp parameter suites;
at least one preferred order is determined on the basis of the preselection of said orders; and
said second dictionary of the second format is explored following the preferred order, limiting the exploration to its first elements.
11. The method as claimed in
12. The method as claimed in
13. The method as claimed in
preselecting K orders;
examining the first element of each of the K orders while eliminating any redundancies, to obtain K1 elements, with K1≦K;
adding K2 elements chosen from a set including the second element of the K orders while eliminating any redundancies, and such that K2≦K and K2≦N-K1, and substantially these steps are repeated until said N elements are obtained.
14. The method as claimed in
constructing K subsets of rankings by preselecting the first Ni elements, with Ni≦N, of each ranking Ci, with i lying between 1 and K;
choosing the Ni elements so that ΣNi≧N;
selecting all the elements present in the K subsets; and
repeating the selecting step with a selection of the elements present in K-i subsets, where i increases by recurrence until N elements are retained.
15. The method as claimed in
determining for each first subframe, by implementing the first coding format, a pair (λe, βe) of first parameters of the ltp filter with one coefficient;
determining, for the coding of a second current subframe, a plurality of pairs (λs,(βi)s) of parameters of the ltp filter with several coefficients on the basis of the suite of parameters of the first format (λe,βe), with:
a determination of an ltp delay λs corresponding preferably to that determined by the first coding format on a first subframe which most overlaps the second current subframe; and
a determination of a vector of gains (βi)s for the second current subframe on the basis of one at least of the gains βe of the first subframes, by implementing steps (b), (c) and (d) where the orders of the second dictionary of the second format correspond to a set of gain vectors (βi)s of the second subframe.
16. The method as claimed in
preselecting, on the basis of first ltp gains of the first format βe that are chosen for one or more first subframes corresponding to a second current subframe, the orders of the second dictionary of the second format, that are associated with classes of the first ltp gains;
constructing a single one of these orders, on the basis of said orders preselected for said second current subframe; and
testing N first vectors of second gains, determined by the order constructed, so as to select, according to a chosen criterion, a better vector of gains to be associated with the second subframe.
17. The method as claimed in
determining for each first subframe, by implementing the first coding format, a first suite of ltp parameters λe,(βi)e corresponding to a pair comprising an ltp delay λe and a vector of associated gains (βi)e of the ltp filter with several coefficients;
performing a partition of a said first dictionary of the gain vectors (βi)e of the first format;
determining for the coding of a second current subframe by the second format, orders of a second dictionary of the second format for first subframes corresponding to the second current subframe, said second dictionary of the second format being constructed from a set of jitter values and said orders of this second dictionary being associated with the partition of the first dictionary of the first format; and
determining an order of the jitter values, the ltp delay values for the second format being explored successively on the basis of the jitter values thus ordered and of at least one anchoring delay determined as a function of the delays λe on the first subframes.
18. The method as claimed in
19. The method as claimed in
20. The method as claimed in
on the basis of at least one first suite of parameters selected by the first format and including at least one vector of gains (βi)e determined for at least one first subframe, a partition is conducted of the first dictionary of the first format corresponding to a dictionary of the gain vectors of the first format (βi)e;
orders of the second dictionary of the second format corresponding to a dictionary of the gain vectors (βi)s of the second format are deduced therefrom, said orders being associated with said partition;
on the basis of gain vectors (βi)e chosen by the first format for first subframes which substantially cover the second current subframe, orders of the second dictionary that are associated with classes of said partition are preselected;
one of the preselected orders is retained;
several gain vectors to be associated with the second current subframe are determined as a function of the order retained; and
by tests on said several gain vectors, the best gain vector is selected according to a chosen criterion.
21. The method as claimed in one of
23. A coding system implementing at least one first and one second coding format, comprising at least one device for coding according to the first format and a coding device as claimed in
24. The coding system as claimed in
25. The coding system as claimed in
26. A computer program product, stored in a memory of a processing unit or on a non-transitory removable medium intended to cooperate with a reader of said processing unit, comprising instructions for implementing the steps of the method as claimed in any one of
|
This application claims priority from PCT/FR2006/000038 filed Jan. 9, 2006, which claims priority from French Application FR 05 00272, filed Jan. 11, 2005, both of which are hereby incorporated by reference in their entirety.
The present invention relates to the compression coding/decoding of digital audio signals, in particular of speech signals and/or of multimedia signals, in particular for transmission or storage applications. It is more especially aimed at effective determination of the parameters of a second long-term prediction model (or “LTP” for “Long Term Prediction”), on the basis of the parameters of at least one first LTP prediction model.
Compression coders use properties of the digital audio signal such as its local stationarity, utilized by short-term prediction filters, as well as its harmonic structure, utilized by LTP long-term prediction filters. Typically, the voiced sounds of a speech signal (such as the vowels) exhibit a long-term correlation due to the vibration of the vocal cords. The long-term correlation is modeled by an LTP filter denoted P(z) which makes it possible to retrieve the harmonic structure by using a synthesis filter of the type:
The simplest form of the long-term prediction filter is the filter P(z) with a single coefficient β (also called the gain) and integer delay T, such that $P(z)=\beta z^{-T}$. The delay T is also called the “pitch” period, or more simply the “pitch”.
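As an illustration of the single-coefficient filter, the following minimal sketch computes the long-term prediction residual of a toy periodic signal. The function name `monotap_predict`, the zero padding before the start of the signal and the test signal are assumptions made for this example; they are not taken from any coding standard.

```python
def monotap_predict(signal, gain, delay):
    """Apply the monotap long-term predictor P(z) = gain * z^(-delay).

    Returns the residual e[n] = s[n] - gain * s[n - delay].
    Samples before the start of the signal are taken as zero (assumption).
    """
    residual = []
    for n, sample in enumerate(signal):
        past = signal[n - delay] if n >= delay else 0.0
        residual.append(sample - gain * past)
    return residual


# Toy signal: five repetitions of a pattern of period 4.
period = [1.0, -0.5, 0.25, 0.0]
signal = period * 5
residual = monotap_predict(signal, gain=1.0, delay=4)
```

When the delay equals the period of the signal and the gain is 1, the residual is zero after the first period, which is the long-term correlation that the filter is meant to capture.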
Currently, more elaborate modelings are aimed at:
where, for a delay $(T + l/D)$ of resolution $1/D$, the coefficients $p_l(i)$ are given by $p_l(i) = h_{inter}(iD - l)$, $0 \le l \le D-1$, $h_{inter}$ being an interpolation filter of length $2ID+1$.
The parameters of the filter (delay and gain(s)) vary according to the signals to be coded and, for one and the same signal, over time. For example, in speech coding, the span of the pitch periods seeks to cover the range of the fundamental frequencies of the human voice (from low voices to high voices). For one and the same talker, this frequency also varies over time. Likewise, the coefficient or coefficients of the filter evolve over time.
On coding, the parameters of P(z) are determined either by an open-loop analysis or by a closed-loop analysis or usually by a combination of both analyses. The open-loop analysis is performed by minimizing the prediction error in the signal to be modeled. The closed-loop analysis (termed “analysis by synthesis”) minimizes the quadratic error, usually weighted, between the voice signal to be modeled and the synthesis signal. Usually, an open-loop search is firstly envisaged so as to determine a first estimate of the pitch called the “open-loop pitch”. Then, a search based on analysis by synthesis over a restricted neighborhood around this anchoring value makes it possible to obtain a more accurate value of the pitch. These analyses are performed on blocks of samples. The lengths of the open-loop and closed-loop analysis blocks are not necessarily equal. Often, a single open-loop analysis is performed for several closed-loop analyses.
For any LTP model (monotap or multitap), the determination of the LTP parameters is very expensive in terms of computational complexity. It generally consists of an open-loop analysis over a large block of samples followed by closed-loop analyses over several sub-blocks of samples (also called subframes). In particular, the open-loop search for the harmonic lag is a very expensive coding operation. Usually, it requires the calculation of an autocorrelation function of the signal for numerous values (in fact over a span of variation of the delays). In the coder according to the ITU-T G.723.1 standard, this span comprises 125 integer delays (from 18 to 142) and the open-loop delay is estimated every 15 ms (that is, for blocks of 120 samples). In the coder according to the 8 kbit/s ITU-T G.729 standard, the open-loop analysis is performed every 10 ms (for each block of 80 samples) and explores a span of 124 integer delays (from 20 to 143). This operation constitutes nearly 70% of the complexity of the LTP analysis for this type of coding.
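The cost of the open-loop search can be seen in the following minimal sketch, which scans the span of 125 integer delays quoted above. The normalized correlation criterion, the function name `open_loop_pitch` and the sinusoidal test signal are assumptions chosen for illustration; each standard specifies its own criterion and weighting.

```python
import math


def open_loop_pitch(signal, min_delay=18, max_delay=142):
    """Return the integer delay that maximizes a normalized correlation.

    The criterion corr / sqrt(energy) is an illustrative assumption.
    """
    best_delay, best_score = min_delay, float("-inf")
    for delay in range(min_delay, max_delay + 1):
        corr = 0.0
        energy = 0.0
        # One correlation over the whole block is computed for every delay,
        # which is what makes the open-loop search expensive.
        for n in range(delay, len(signal)):
            corr += signal[n] * signal[n - delay]
            energy += signal[n - delay] ** 2
        if energy == 0.0:
            continue
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay


# Toy "voiced" signal: a sinusoid with a period of 40 samples.
signal = [math.sin(2 * math.pi * n / 40) for n in range(240)]
pitch = open_loop_pitch(signal)
```

Every delay in the span requires a full pass over the block, so any a priori information that narrows the span reduces the number of correlations in direct proportion.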
Even though it is focused around the delay obtained in open loop, the closed-loop analysis is also extremely expensive in terms of calculations and, consequently, resources. It requires the generation of adaptive excitations and their filtering. For example, in the G.723.1 coding, which uses a multitap LTP model, the closed-loop analysis jointly determines the vector of gains (βi) and a lag λ (as a candidate pitch) of each subframe by exploring a dictionary of gain vectors for several candidate pitch values. This analysis constitutes nearly half the total complexity of the 5.3 kbit/s G.723.1 coder.
The complexity of the LTP analysis is especially critical when several codings must be performed by one and the same processing unit, such as a gateway responsible for managing numerous communications in parallel or a server distributing numerous multimedia contents. The problem of complexity is further increased by the multiplicity of compression formats which circulate around the networks. Several codings are then envisaged, either in cascade (or “transcoding”), or in parallel (multi-format coding or multi-mode coding). Transcoding is typically used when, in a transmission chain, a compressed signal frame sent by a coder can no longer continue its path in this format. Transcoding makes it possible to convert this frame into another format compatible with the rest of the transmission chain. The most elementary solution (and the commonest at present) is to place a decoder and a coder back to back. The compressed frame, arriving in a first format, is decompressed. This decompressed signal is then re-compressed into a second format accepted by the rest of the communication chain. This cascading of a decoder and a coder is called “tandem”. Nevertheless, this solution is very expensive in terms of complexity (essentially because of the recoding) and degrades the quality, since the second coding is in fact performed on a decoded signal which is a degraded version of the original signal. Additionally, a frame may encounter several tandems before arriving at its destination, thereby further increasing the cost in terms of calculation and the loss of quality. Furthermore, the delays related to each tandem operation accumulate and may be detrimental to the interactivity of the communications.
As regards the multi-format compression systems where one and the same content is compressed in several formats (typically in the case of content servers which broadcast one and the same content in several formats suited to the conditions of access, networks and terminals of the various end users), the multi-coding operation becomes extremely complex as the number of desired formats increases, and this may rapidly saturate the resources of the systems. Another case of multiple coding in parallel is multi-mode compression with a posteriori decision according to which, at each signal segment to be coded, several compression modes are executed and the mode which optimizes a given criterion or obtains the best throughput/distortion compromise is selected. Here again, the complexity of each of the compression modes limits their number and/or leads to a very restricted number of modes being selected a priori.
Currently, most multiple coding operations do not yet take full account of the similarities between coding formats, and this could however reduce the complexity and the algorithmic delay while limiting the degradation introduced. For one and the same coding format parameter, the differences between coders reside in the modeling, the procedure and/or the frequency of calculation, or else the quantization.
Generally, the solutions proposed today endeavor to limit the number of values explored for the parameters of a second LTP model by using the parameters chosen by the first format, to reduce the complexity of the LTP search for the second format.
Transcoding between two monotap LTP models is the simplest case. Most of the currently proposed procedures relate to transcoding between delays, the transcoding of the LTP gain usually being performed at the actual signal level (one speaks of “partial” tandem). When the two models are identical (the same dictionary of delays and same subframe length), a simple copy of the binary fields of the delays from one bit stream to the other is sufficient. When the dictionaries differ by their resolution (integer or fractional ⅓, ⅙, etc.) and/or by their spans of values, a transcoding into the binary or parameter domain, with a possible transformation, is used. The transformation may be a quantization, a truncation, a doubling or a splitting. When the lengths of the subframes of the two formats are different, an interpolation of the delays may be provided. For example, the delays of a first format overlapping an output subframe are interpolated. It is then possible to use this interpolated delay only when it is close to the delay obtained at the previous subframe; otherwise a conventional search is conducted. Another more direct procedure, without interpolation, consists in selecting a delay from among these delays of the first format. This selection may be made according to several criteria: last subframe, subframe having the most samples in common with the subframe of the second format, or else the subframe which maximizes a criterion which depends on the LTP gain. The delay determined is an anchoring value for the search for the delay of the second format. It may be used as the open-loop delay of the second format around which a conventional or restricted closed-loop search is performed, or as a first estimate of it, or as the anchoring of a delay trajectory.
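One of the selection criteria mentioned above, namely retaining the delay of the first-format subframe that has the most samples in common with the subframe of the second format, can be sketched as follows. The function name `select_anchor_delay`, the tuple layout and the subframe lengths of 40 and 60 samples are hypothetical choices made for this example.

```python
def select_anchor_delay(in_subframes, out_start, out_length):
    """Pick the delay of the input subframe that most overlaps the output one.

    in_subframes: list of (start, length, delay) tuples for the first format.
    Returns None if no input subframe overlaps the output subframe.
    """
    out_end = out_start + out_length
    best_delay, best_overlap = None, 0
    for start, length, delay in in_subframes:
        # Number of samples shared by the two subframes (negative if disjoint).
        overlap = min(start + length, out_end) - max(start, out_start)
        if overlap > best_overlap:
            best_delay, best_overlap = delay, overlap
    return best_delay


# First format: 40-sample subframes with their delays (hypothetical values).
first_format = [(0, 40, 52), (40, 40, 53), (80, 40, 55)]
# Second format: a 60-sample subframe covering samples 60 to 119.
anchor = select_anchor_delay(first_format, out_start=60, out_length=60)
```

Here the output subframe shares 20 samples with the second input subframe and 40 with the third, so the delay of the third input subframe is retained as the anchoring value.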
In the case of a transcoding between a monotap LTP modeling and a multitap LTP modeling, the only implementation that is provided for at present is simply in the signal domain, owing to the dissimilarity of the modelings. Most of the existing transcoding techniques limit themselves to reducing the complexity of the open loop of the second format by selecting one of the delays of the first format or an interpolation of these delays as open-loop delay. However, a few techniques have been proposed for also reducing the complexity of the closed loop.
In document WO-03058407, the fractional delay λ′ of a monotap model is determined on the basis of the vector of coefficients (βi) of a multitap model by calculating the expression:
In document reference [1]:
“An Efficient Transcoding Algorithm for G.723.1 and G.729A Speech Coders”, Sung-Wan Yoon, Sung-Kyo Jung, Young-Cheol Park, and Dae-Hee Youn, Proc. Eurospeech 2001, pp. 2499-2502,
the closed-loop search for the vector of gains of a multitap model is restricted to a subset of the dictionary of multitap gains, which is determined by the gain of the monotap model of the first format. This determination, as well as the composition of the subsets are performed as follows: the global gain of each vector of the dictionary of gains is calculated; next, on the basis of 170 global gains corresponding to the 170 vectors of the dictionary, 8 subsets are constructed and a single one of these subsets is selected depending on the LTP gain of the first monotap model.
In a variant according to the document referenced [2]:
“Transcoding algorithm for G723.1 and AMR Speech Coders: for Interoperability between VoIP and Mobile Networks”, Sung-Wan Yoon et al., Proc. Eurospeech 2003, pp. 1101-1104,
the subsets are built up by learning as follows: the span of variation of the monotap gain of an NB-AMR coder is divided into 8 subsections, then, for each subsection, a statistical study on an NB-AMR tandem makes it possible to determine M vectors of gains of the dictionaries of a coder according to the G.723.1 standard. These gain vectors are statistically the most probable. The number M is taken equal to 40 for the dictionary comprising 85 vectors and to 85 for the dictionary comprising 170 vectors. During the search for the optimal vector of gains, the exploration of the dictionary is limited to the subset associated with the subsection to which the gain of the NB-AMR coder belongs.
To the knowledge of the inventors, there is at present no technique for transcoding between two multitap LTP modelings. As was seen above, most of the current solutions relate only to monotap LTP models. Certain techniques propose a transcoding between a multitap model and a monotap model but limit themselves to reducing the complexity of the search for the open-loop delay of the second format.
Among the few approaches proposed for reducing the complexity of the closed loop, some are based on approximating a multitap LTP filter by a monotap LTP filter (fractional or otherwise). For example, in the case of an approximation of a multitap filter:
by a nonfractional monotap filter $P_{mono}(z)=\beta z^{-(T-\delta)}$,
a gain β and a delay jitter δ are estimated such that $P_{mono}(z)\approx P_{multi}(z)$ for all the integer delays T considered.
The approximation of a multitap LTP model by a monotap LTP model is already used in the ITU-T G.723.1 standard, namely to estimate the adaptive prefilter and to control the instability of the LTP filter. The studies conducted during the design of the G.723.1 coder showed that it is not always possible to satisfactorily approximate a multitap LTP filter by a monotap LTP filter over a wide span of delays with the same gain β and the same delay jitter δ. For one and the same vector of gains (βi), the estimate of the optimal pair (β, δ) may vary greatly as a function of the delay T. In the G.723.1 coder, it has been possible to overcome this difficulty, since the stability control procedure picks out the maximum gain from among the estimated gains (which may then be very dissimilar), and the adaptive prefilter is disabled for any vector of gains of the multitap model when, over the relevant span of delays, the estimated gains are too different or the delay jitters are too dissimilar or too large. While the difficulty of estimation can thus be overcome without degrading performance for the adaptive prefiltering and instability control modules of the long-term prediction filter, the same is more difficult to achieve for the LTP analysis module itself, which plays a crucial role with regard to quality. Thus, according to the vector of gains and/or the delay considered, the 170 global gains calculated for the 170 vectors of the dictionary, as seen in the prior art [1] above, may be very far from the optimal gains. Likewise, according to the vector of gains (βi) and/or the delay λ, the calculation of the fractional delay λ′, as seen in the prior art WO-03058407 hereinabove, may lead to a poor determination of the fractional delay.
Whether the approach is analytical or statistical, approximating a multitap LTP filter by a single monotap LTP filter over a wide range of delays (or the inverse approximation) is too inaccurate. To take account of the variation of the gain β and/or of the jitter δ with the delay T, it would be possible to store a pair (β,δ) for each delay T. However, this solution would be too expensive in terms of storage, since it would require storing a pair for each gain vector and for each delay of the span. In the case of the approximation of the multitap LTP filters of the G.723.1 coder, which comprises two multitap dictionaries of 170 and 85 vectors, with a span of 125 delays, it would be necessary to store 31875 (=125*(170+85)) pairs. Moreover, this solution would not solve the cases where the approximation of a multitap filter by a monotap filter is really too inaccurate, or even erroneous. It will be noted that, conversely, several pairs (β,δ) may also constitute good approximations of a multitap LTP filter.
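The storage figure quoted above follows directly from the span of delays and the two dictionary sizes, as the short check below shows. The variable names are chosen for this example only.

```python
# Span of integer delays of the G.723.1 coder: 18 to 142 inclusive.
delays = range(18, 143)
# Sizes of the two multitap gain dictionaries of the G.723.1 coder.
dictionary_sizes = (170, 85)

# One (gain, jitter) pair per gain vector and per delay.
pairs_to_store = len(delays) * sum(dictionary_sizes)
```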
The present invention intends to improve the situation.
Firstly, the present invention is aimed at switching from an LTP model with a single coefficient (monotap) to an LTP model with several coefficients (multitap) and vice versa, as well as at switching between two multitap LTP models. In particular, it proposes a method whose complexity may be adjusted, especially as a function of a desired compromise between a target complexity and a desired quality. A device for implementing the method according to the invention is, moreover, very useful for multiple codings in cascade (transcodings) or in parallel (multi-codings and multi-mode codings).
Thus, the invention is firstly aimed at a method of coding according to a second format, on the basis of information obtained by implementing at least one step of coding according to a first format. The first and second formats implement, in particular for the coding of a speech signal, a step of searching for LTP long-term prediction parameters by exploring at least one dictionary comprising candidate parameters, at least one of the first and second coding formats using a filtering with several coefficients (termed “multitap” hereinabove) for a fine search for the LTP parameters.
According to a general definition of the invention, the method comprises the following steps:
a) defining orders of at least one dictionary that the second coding format uses,
b) recovering an a priori information, obtained following the determination of the LTP parameters in the course of the coding according to the first format, so as to select at least one order of said dictionary,
c) applying the selected order to the candidates of said dictionary so as to choose a limited number of first candidates, and
d) so as to perform the second coding, conducting the LTP search only among said limited number of candidates.
The invention therefore differs from the existing solutions through the definition of orders in the dictionary and the utilization of these orders in the dictionary exploration procedure.
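Steps a) to d) can be sketched as follows under simplifying assumptions. The dictionary contents, the two-class partition with its threshold, the orders and the squared-distance cost are all hypothetical; in an actual coder the cost would be the analysis-by-synthesis error of the closed-loop search.

```python
def restricted_search(dictionary, order, m, cost):
    """Search only the m first candidates of `dictionary` ranked by `order`.

    Returns the dictionary index of the best candidate among those explored.
    """
    explored = order[:m]
    return min(explored, key=lambda index: cost(dictionary[index]))


# Hypothetical second dictionary of three-coefficient gain vectors.
dic2 = [(0.0, 1.0, 0.0), (0.25, 0.5, 0.25), (0.0, 0.5, 0.0), (0.5, 0.5, 0.5)]

# Step a): one order of the second dictionary per class (hypothetical values).
orders = {0: [2, 1, 0, 3], 1: [0, 1, 2, 3]}


def classify(first_gain):
    """Step b): a priori information, here the class of the first-format gain."""
    return 0 if first_gain < 0.7 else 1


def cost(vector, target=(0.0, 0.875, 0.0)):
    """Stand-in for the closed-loop error (illustrative assumption)."""
    return sum((v - t) ** 2 for v, t in zip(vector, target))


# Steps c) and d): explore only the m first candidates of the selected order.
best_index = restricted_search(dic2, orders[classify(0.9)], m=2, cost=cost)
```

With m=2, only half of this toy dictionary is explored. Raising m trades complexity for quality: with a poorly matched order and a small m, the best vector of the full dictionary may not be among the explored candidates.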
Other features and advantages of the invention will become apparent on examining the detailed description hereinbelow, and the appended drawings in which:
The present invention therefore pertains to multiple coding in cascade or in parallel or to any other system using, to represent the long-term periodicity of a signal, a modeling of monotap or multitap type. The invention makes it possible on the basis of the knowledge of the parameters of a first model to determine the parameters of a second model in the case where at least one of the two models uses a multitap modeling. For the sake of conciseness, only the case of a switch from a first model to a second is described but it will be understood that the invention applies also in the case of switching from m (m≧1) first models to n (n≧2) second models (where m and n are absolutely arbitrary).
With reference to
The present invention pertains to the determination of a parameter of an LTP model, denoted LTP2, from at least one parameter LTP1 of another LTP model, when at least one of the two models is a multitap model. Instead of searching for the parameter of the second coding format in its definition set (or “dictionary”), the invention provides for the following steps, referring now to
Thus, it will be understood that it is possible to limit, by implementing the invention, the number of elements of the second dictionary DIC2 to which the LTP search will pertain during the second coding COD2, while ensuring good quality of the coding COD2. In
It will be noted that
Represented in
Of course, the processor 35 manages all or some of the modules of the device. For this purpose, it may be driven by a computer program product. The present invention is moreover aimed at such a computer program product, stored in a memory of a processing unit or on a removable medium intended to cooperate with a reader of said processing unit or downloadable from a remote site, and comprising instructions for implementing all or some of the steps of the method according to the invention.
It will be understood in particular that the device COD2, within the meaning of the invention, can directly recover the parameters LTP1 of the first coder COD1 so as to deduce therefrom the aforesaid a priori information and, thereby, the order of its dictionary DIC2, or, as a variant, can receive the a priori information regarding the order of its dictionary directly from the first coder COD1. In the latter case, the first coder COD1 already plays a particular role in the invention.
The present invention is also aimed at a system which includes the first coder and the device within the meaning of the invention. Specifically, the device of
In the implementation of the invention, it will be supposed that the second coder COD2 can recover from the first coder COD1 (when the latter has determined the parameters LTP1) information which will enable it to order its dictionary DIC2 (see
Advantageously, the utilization of the orders of the second dictionary DIC2 offers great flexibility regarding the number of ordered elements to be explored. It is then possible:
This adjustment may be performed at the start of the processing. It may also be performed at each block to be processed as a function of parameters of the first coding format and/or of the characteristics of the signal to be coded (for example, as a function of a voicing criterion). For one and the same block, the complexity may also vary as a function of the LTP subframes. The invention offers great flexibility which makes it possible to dynamically distribute the calculational power available between the modules of the second coder and/or the resources to process the LTP subframes.
Preferably, it is on the basis of an initial partition of the dictionary DIC1 associated with a parameter of the first LTP model that orders of the dictionary DIC2 associated with a parameter of the second LTP model are determined. It is indicated that the determination of an order consists in ranking the elements of the second dictionary DIC2 according to a certain criterion. A ranking (or “order”) is given by an indexation of the elements of the dictionary DIC2.
Several types of partition of the first dictionary DIC1 may be envisaged. A first example is the elementary partition of a dictionary DIC1 of N elements into N disjoint classes of size 1. N orders of the second dictionary are then determined. More elaborate partitions may be chosen, in particular by techniques known per se of (vector or scalar) quantization or of data classification.
Advantageously, it is possible to group similar orders together, this amounting to modifying the initial partition of the first dictionary and, consequently, the number of orders of the second dictionary. It is also possible to recalculate the orders once they have been grouped together. The procedures for determining the partition of the first dictionary into N classes and for calculating the N orders of the second dictionary may be iterated, it being possible moreover for the number N to vary in the course of the iterations. As a variant or as a supplement, to limit the memory required for storing the orders of the second dictionary, for each of these orders, a maximum number of elements to be retained is chosen, this number possibly differing according to the orders and/or the classes of the first dictionary.
In a further variant, the classes of the first dictionary are not necessarily disjoint. Typically, one and the same element may be associated with more than one order of the second dictionary. The choice of the order or the combination of orders may then take account of factors other than the current LTP parameter of the first dictionary.
Initially, the number of orders and the orders which are appropriate in the second dictionary are determined by a statistical and/or analytical study, as a function of successive suites of LTP parameters according to the first model. This study therefore defines, for each class of the partition of the dictionary associated with an LTP parameter of the first format, a ranking of the dictionary of a parameter of the second format. A statistical study has been carried out on an off-line bank by associating in one and the same coder the LTP model of the first format and the LTP model of the second format. The placing of the two LTP analyses in parallel has been the preferred learning configuration. Of course, other configurations may be used, in particular a conventional tandem which cascades the two codings. The statistical study ensures, for each element of the first dictionary (or each class of its partition), a ranking of the elements of the second dictionary according to a certain criterion. Preferably, this criterion evaluates the impact on the quality of the signal retrieved. Specifically, the quality criterion can be that used on coding to select the second LTP parameter. Of course, other criteria may be used, in particular the invoking of an element of the second dictionary for a class of the first dictionary. Furthermore, a combination of criteria may also be used.
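By way of illustration only, the statistical study may be sketched as follows for the criterion based on how often an element of the second dictionary is invoked for a class of the first dictionary. The function name build_orders, the format of the training pairs and the toy data are assumptions made for this sketch; a study based on the quality criterion would replace the frequency count with the corresponding quality measure.

```python
from collections import Counter, defaultdict

def build_orders(training_pairs, dic2_size):
    """Rank the second dictionary for each class of the first dictionary.

    training_pairs: iterable of (class_index, dic2_index) pairs observed when
    both LTP analyses are run in parallel on the same subframes (toy data).
    """
    counts = defaultdict(Counter)
    for cls, idx2 in training_pairs:
        counts[cls][idx2] += 1
    orders = {}
    for cls, counter in counts.items():
        # Most frequently selected elements first; ties are broken by index,
        # so elements never selected for this class end up last.
        orders[cls] = sorted(range(dic2_size), key=lambda i: (-counter[i], i))
    return orders

# Hypothetical observations: class 0 mostly selects element 2, class 1 element 3.
pairs = [(0, 2), (0, 2), (0, 1), (1, 3), (1, 3), (1, 0), (1, 3)]
orders = build_orders(pairs, dic2_size=4)
print(orders[0])
print(orders[1])
```

Each resulting list is one order of the second dictionary, given as an indexation of its elements.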
An analytical study may also be performed to determine orders of the second dictionary as a function of a partition of the first dictionary. Preferably, the analytical study completes the statistical study described above. It is preferably limited to the dictionary parts which lead to satisfactory analytical approximations.
The determination of an LTP parameter of the second coding format, on the basis of the LTP parameters according to the first coding format, will now be described.
Within the framework of the design of algorithms for restricted exploration of the second dictionary knowing the LTP parameters chosen by the first coding format, preferential utilization is made of the partition of a first dictionary and the orders of the second dictionary which are associated with this partition of the first dictionary.
For the sake of the clarity of the account, the principles of the algorithm used are first described when the two coding formats have LTP subframes of identical duration. To each current subframe of the second coding format there corresponds a single subframe of the first coding format. For this first subframe, the first coding format has selected a suite of LTP parameters (termed the “first suite LTP1”). By virtue of the partition of the dictionary associated with one of the LTP parameters of the first model, an order of exploration of the second dictionary is selected by choosing the order associated with the class of the element of the first suite LTP1. Next, the second dictionary is explored in accordance with the order thus determined. Moreover, as a function of a quality/complexity compromise and/or possibly of the maximum number of elements of the second dictionary retained for the class, the number of elements tested is restricted. In general, it will therefore be supposed that, among all the elements of the second dictionary, only the first elements determined by the order which has been chosen are tested.
When the two coding formats have LTP subframes of different durations, it transpires that a current subframe of the second format may correspond to more than 1 subframe of the first format. This situation is illustrated in
Depending on the type of LTP parameter of the partition of the first dictionary, other criteria may be adopted. Instead of retaining just a single order, another solution consists in combining at least some of the various preselected orders. Several combining procedures are possible. For example, if K orders have been retained, then the first element of each of the K orders is firstly examined, while eliminating any redundancies. K1 elements (K1≦K) are obtained. Next, K2 elements are added, such that K2≦K and K2≦N-K1, chosen from the set consisting of the second element of the K orders (while eliminating any redundancies), and so on and so forth until N elements are obtained, N being the maximum number of elements of the second dictionary to be tested. This selection of N elements ei, ej, . . . , ek, . . . in the guise of first elements of K orders ORD1, ORD2, . . . , ORDK, has been represented schematically in
As a variant, it is also possible to construct K subsets of the rankings by preselecting the Ni(≦N) first elements of each ranking Ci (1≦i≦K). The choice of Ni is such that
and makes it possible to process the rankings equitably or, conversely, to favor certain rankings. Next, all the elements present in the K subsets and then the elements present in K−1 subsets are selected, and so on and so forth until N elements are retained. If N elements have not been obtained, the number of elements is completed by taking for example successively the following elements in the K subsets.
It is of course possible to combine some of these ranking strategies. It is indicated in a general manner that the second dictionary is preferably explored according to a “dynamic” order thus determined. This procedure for constructing a dynamic order from predetermined, stored orders may also be applied when the classes of the partition are not disjoint and an element of the first dictionary belongs to more than one class.
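A minimal sketch of the first combining procedure (first element of each of the K orders, then the second element of each, and so on, with redundancies eliminated) is given below. The function name combine_orders and the toy orders are assumptions made for illustration and are not taken from any particular coder.

```python
def combine_orders(orders, n_max):
    """Merge K ranked lists into one dynamic order of at most n_max elements.

    Ranks are visited level by level: the first element of every order, then
    the second element of every order, and so on; duplicates are skipped.
    """
    if n_max <= 0:
        return []
    dynamic, seen = [], set()
    depth = max((len(order) for order in orders), default=0)
    for rank in range(depth):
        for order in orders:
            if rank < len(order) and order[rank] not in seen:
                seen.add(order[rank])
                dynamic.append(order[rank])
                if len(dynamic) == n_max:
                    return dynamic
    return dynamic

# Three hypothetical preselected orders (K = 3) and N = 5 elements to test.
ord1 = [4, 1, 7, 2]
ord2 = [4, 3, 1, 9]
ord3 = [5, 1, 3, 0]
dynamic = combine_orders([ord1, ord2, ord3], n_max=5)
print(dynamic)
```

In this toy case the first level contributes two distinct elements (K1 = 2), the second level two more, and the third level completes the set to N = 5.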
Described below are three cases of switching from a first LTP model to a second LTP model, illustrating the application of the invention to various models and types of LTP parameters. Of course, although the examples are given only for a first and a second dictionary, the invention is readily generalized to more than one first and/or second dictionary.
The parameters of the monotap model of a format COD1 are available and one seeks to determine, at the least computational and/or resource cost, those of the multitap model of a format COD2. For each subframe, the coder COD1 has determined the pair (λe,βe) of parameters of the monotap LTP filter. The coding of a subframe of COD2 requires the determination of pairs (λs, (βi)s) (where i is a gain index) of parameters of the multitap LTP filter. The suite of parameters of the first model is therefore (λe,βe). The suite of parameters of the second model is (λs, (βi)s).
The determination of the delay λs is done by one of the known prior art procedures. For example, it is possible to use the intelligent transcoding procedure which determines this delay λs directly by choosing as delay, that determined by COD1 on its subframe which shares the most samples with the current subframe of COD2 (if this delay λe is fractional, its integer part or the nearest integer is taken). This situation will be described later with reference to
The vector of gains (βi)s for each subframe of COD2 is then determined, with a low complexity within the meaning of the invention, on the basis of one at least of the gains βe of the subframes of COD1. Through a study which associates the two LTP models, a partition of the first dictionary (here the dictionary of the scalar gains βe) has been performed. Orders of the second dictionary which are associated with this partition are then determined. These orders correspond here to the whole set of vectors of gains (βi)s. On the basis of the scalar LTP gains βe chosen by the first format COD1 for its subframes corresponding to a current subframe of COD2, the orders of the second dictionary that are associated with the classes of the scalar gains are preselected. Next, a single one of these orders may be retained, or else an order is constructed dynamically. Finally, the first N vectors of gains determined by this order are tested to select the best vector (according to a criterion such as the usual CELP criterion). It is recalled that, by virtue of the orders, the number N may readily be adjusted as a function, for example, of the desired quality/complexity compromise. In general, N is much less than the size of the second dictionary.
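The restricted exploration described above may be sketched as follows. The function restricted_search, the toy dictionary of 3-coefficient gain vectors and the squared-error criterion are assumptions made for illustration; in a real coder the criterion would be the CELP criterion evaluated on the current subframe.

```python
def restricted_search(order, dictionary, criterion, n_max):
    """Test only the first n_max entries of `order` and return the best index."""
    best_idx, best_score = None, float("-inf")
    for idx in order[:n_max]:
        score = criterion(dictionary[idx])
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

# Toy second dictionary of gain vectors (hypothetical values).
dictionary = [
    (0.0, 1.0, 0.0),
    (0.2, 0.6, 0.2),
    (0.1, 0.8, 0.1),
    (0.3, 0.4, 0.3),
]
target = (0.1, 0.7, 0.1)

def criterion(vector):
    # Stand-in for the CELP criterion: negative squared error to a target.
    return -sum((a - b) ** 2 for a, b in zip(vector, target))

order = [3, 1, 2, 0]  # order associated with the class of the monotap gain
best_two = restricted_search(order, dictionary, criterion, n_max=2)
best_all = restricted_search(order, dictionary, criterion, n_max=4)
print(best_two, best_all)
```

Raising n_max trades complexity for quality: with this deliberately poor toy order, testing only two vectors misses the vector that a full exploration would select, whereas a well-trained order would place the likely candidates first.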
According to one of the advantages of the present invention, the optimal vector of gains of a multitap LTP filter of a second coding format is thus determined on the basis of at least one gain of a monotap LTP filter of a first format, while considerably reducing the complexity of exploration of the second dictionary of the vectors of gains and while limiting the number of vectors of gains to be tested. Contrary to reference [2] given hereinabove, where, for each monotap gain, a subset of vectors of gains of fixed size is associated, the solution within the meaning of the invention makes it possible to adjust the exploration of the dictionary as a function of the target quality and of the complexity constraints. It will be understood that the invention entails greater involvement of the various orders of the dictionary of vectors of gains than of the predetermined and fixed subsets as in the aforesaid reference.
In the case of an intelligent transcoding from the 8-kbit/s UIT-T G.729 coder to the 6.3-kbit/s UIT-T G.723.1 coder, which will be described later as an exemplary embodiment, the steps set forth hereinabove may be applied to the focusing of the closed-loop search in the two dictionaries of vectors of gains of the G.723.1 on the basis of the LTP gains of the G.729 coder.
This particular case is the inverse of the previous one. The parameters of the multitap LTP model of a first format COD1 are available and one seeks to determine, at the least cost, those of the monotap LTP model of a second format COD2. The suite of parameters of the first model is therefore written (λe, (βi)e) (where i is a gain index), while the suite of parameters of the second model is written (λs,βs). On the basis of at least one suite of parameters selected by the first coder COD1, one seeks to obtain a delay λs and a gain βs for the format COD2. Through a study which associates the two LTP models, a partition of the first dictionary which is, in this case, that of the vectors of gains (βi)e, has been performed. Orders of the second dictionary which are associated with the partition of the first dictionary are then determined, within the meaning of the invention. Here, the second dictionary consists of the whole set of jitter values (λe−λs). On the basis of the vectors of gains (βi)e chosen by the first format COD1, for its subframes which correspond to the current subframe of COD2, the orders of the second dictionary which are associated with the classes of these vectors of gains are preselected. Thereafter, a single one of these orders may be retained, or else an order may be constructed dynamically. Finally, the “neighborhood” values thus determined around one or more anchoring delays λ′s are explored. The determination of the anchoring delay(s) is done by a procedure known in the prior art.
The present invention therefore proposes an original solution which makes it possible to reduce the complexity of determining the delay λs, by reducing the number of delay values tested of a monotap LTP model of a second coding format on the basis of a knowledge of the parameters of a multitap LTP model of a first coding format. Most of the prior art procedures use only the delay without utilizing the gain vector. As in document WO-03058407, here both types of parameters are used. Nevertheless, in contradistinction to the teaching of this last reference, a gain vector points to a set of several jitter values and not to a single value as in this reference. According to one of the advantages afforded by the invention, the problems related to the approximating of a multitap LTP filter by a single monotap filter are thus circumvented.
In an advantageous variant, to limit storage, the ordered neighborhoods are intervals of increasing size. This measure is particularly advantageous for focusing the open-loop and/or closed-loop search. An exemplary embodiment will be described later, relating to the closed-loop search for the LTP delay of the 8-kbit/s UIT-T G.729 coder based on the LTP parameters of the 6.3-kbit/s UIT-T G.723.1 coder.
To the knowledge of the inventors, this case has never been studied in the prior art.
The parameters of the multitap model of a first format COD1 are available and one seeks to determine, at the least cost, those of the multitap model of a second format COD2. The suite of parameters of the first model may therefore be written (λe, (βi)e). The suite of parameters of the second model may likewise be written (λs, (βi)s). On the basis of at least one suite of parameters selected by the first format COD1, one seeks to obtain a delay λs and a vector of gains (βi)s for the second format COD2.
The determination of the delay λs on the basis of at least one delay λe is done by a procedure known in the prior art. It will be supposed that the implementation of the present invention makes it possible here to determine with low complexity the vectors of gains (βi)s for each subframe of the second format COD2 on the basis of at least one vector of gains (βi)e of the subframes of the first format COD1. By a study which associates the two multitap LTP models, a partition of the first dictionary which in this case is that of the vectors of gains (βi)e has been performed, within the meaning of the invention. The orders of the second dictionary (here that of the vectors of gains (βi)s) which are associated with this partition are then determined. On the basis of the vectors of gains (βi)e chosen by the first format COD1 for its subframes which correspond to the current subframe of the second format COD2, the orders of the second dictionary which are associated with the classes of these vectors of gains are preselected. Thereafter, a single one of these orders may be retained, or else an order can be dynamically and progressively constructed. Finally, the first vectors of gains determined by this order are tested to select the best one.
An exemplary embodiment between the bitrates 6.3 kbit/s and 5.3 kbit/s of the UIT-T G.723.1 coder illustrating the latter case is presented later.
Presented hereinbelow are three exemplary embodiments which are aimed at transcoding between two different coding formats UIT-T G.729 and UIT-T G.723.1 in the case of the first two, and a change of bit rate within a multirate coder (UIT-T G.723.1) in the case of the last one. A description of these two UIT-T coders is firstly given together with their LTP modelings.
These two coders belong to the family of CELP coders, coders based on analysis by synthesis.
Coders Based on Analysis by Synthesis
In these coders, the synthesis model is used to extract the parameters which model the signals to be coded. These signals may be sampled at the telephone frequency (Fe=8 kHz) or a higher frequency, for example at 16 kHz for broadband coding (bandwidth from 50 Hz to 7 kHz). According to the application and the desired quality, the compression factor varies from 1 to 16 so that these coders operate at bit rates of 2 to 16 kbit/s in the telephone band, and at bit rates of 6 to 32 kbit/s in broadband. The digital coding and decoding device of CELP type, the coder based on analysis by synthesis used most widely at present for coding speech signals, is presented in FIG. 4a. The speech signal s0 is sampled and converted into a string of blocks of (L′) samples called frames. In general, each frame is cut up into smaller blocks of (L) samples, called subframes. Each block is synthesized by filtering a waveform extracted from a catalogue (also called the fixed excitation dictionary), multiplied by a gain, through two time-varying filters. The excitation dictionary is a finite set of waveforms of L samples. The first filter is the long-term prediction filter. A “LTP” (Long Term Prediction) analysis makes it possible to evaluate the parameters of this long-term predictor which utilizes the periodicity of the voiced sounds. This predictor is equivalent to a dictionary that stores the past excitation for various delays. This dictionary is generally called the “adaptive excitation dictionary”. The second filter is the short-term prediction filter. The “LPC” (Linear Prediction Coding) analysis procedures make it possible to obtain these short-term prediction parameters that are representative of the transfer function of the vocal tract and are characteristic of the spectrum of the signal.
Thus, referring to
The method used to determine the innovation sequence is therefore the analysis by synthesis procedure. At the coder, a large number of innovation sequences of the excitation dictionary are filtered by the two LTP and LPC filters, and the waveform selected is that which produces the synthetic signal closest to the original signal according to a perceptual weighting criterion, generally known by the name of the CELP criterion.
LTP Model of the G.729 at 8 kbit/s (Monotap)
The UIT-T G.729 coder operates on a speech signal limited band-wise to 3.4 kHz, sampled at 8 kHz and cut up into frames of 10 ms (i.e. 80 samples per frame). Each frame is divided into two subframes (numbered 0 and 1 hereinbelow) of 40 samples (5 ms). The LTP model of the UIT-T G.729 coder is based on a monotap modeling with fractional resolution. At each frame, the LTP analysis determines a delay λi and a gain βi for each subframe.
and under integer resolution in the span [85; 143].
Next, the delay λ1 of the second subframe is determined with a fractional resolution of ⅓ by analysis by synthesis about λ0 over the span [int(λ0)−5⅔; int(λ0)+4⅔], int(λ0) being the integer part of the possibly fractional delay λ0 (step 404). For each subframe, the gain β is calculated once the closed-loop delay has been determined (steps 403 and 405). After the search for the fixed excitation, the gain β is quantized jointly with the gain of the fixed excitation by vector quantization on 7 bits. The definition set (or dictionary) of monotap LTP gain of the G.729 therefore has a size of 128.
LTP Model of the G.723.1 (Multitap)
The UIT-T G.723.1 coder operates on a speech signal limited band-wise to 3.4 kHz, sampled at 8 kHz and cut up into frames of 30 ms (i.e. 240 samples per frame). Each frame comprises 4 subframes of 7.5 ms (60 samples) grouped 2 by 2 into super subframes of 15 ms (120 samples). The UIT-T G.723.1 coder uses a multitap modeling of order 5. The coefficients of the long-term predictor are vector quantized by means of two previously stored dictionaries with 85 or 170 entries for the 6.3-kbit/s mode, while the 5.3-kbit/s mode uses only the dictionary with 170 entries. In the 6.3-kbit/s mode, the choice of the dictionary explored depends on the delay value of the even subframes.
As presented in
Determination of the Delay of the Multitap Filter
The determination of the delay is direct. Thus, for the even subframes of the G.723.1, that is to say subframes 0 and 2, the delay is taken equal to the integer part of that of the subframes 1 and 4 of the G.729. For the odd subframes, a closed loop is performed about the previous delay (even subframe). This closed loop may be identical to that of the G.723.1, but may also be restricted according to the desired complexity, or even eliminated so as to keep the same delay value on the two subframes, even and odd.
Determination of the Coefficients of the Multitap Filter
Here, only a single first dictionary is considered, namely the set of the 128 monotap LTP gains of the G.729, whereas two possible second dictionaries are considered (the two dictionaries of vectors of gains of the G.723.1, the choice of which depends on the delay of the subframes).
Once the delay has been determined, it still remains to determine a vector of 5 gains in the dictionary of vectors of 5 coefficients that the G.723.1 coder selects. The implementation of the present invention makes it possible to restrict the exploration thereof to a limited number of vectors of gains determined on the basis of the monotap LTP gains of the subframes of the G.729 coder.
A statistical study has been carried out beforehand by associating within one and the same coder the multitap model of the G.723.1 coder and the monotap model of the G.729 coder. This study has made it possible to rank the 170 and 85 vectors of multitap LTP gains of the two dictionaries of the G.723.1 according to their impact on the quality of the signal retrieved, for each of the 128 monotap LTP gains of the G.729. Here, it is the CELP criterion which is used for this purpose. For each of these two dictionaries of the G.723.1, 128 orders (or rankings) associated with the elementary partition of the set of 128 monotap LTP gains have thus been obtained.
Each subframe of the G.723.1 covers (at least partially) two subframes of the G.729. Firstly, the two monotap gains (denoted g1 and g2) of these two corresponding subframes of the G.729 are extracted. With each of these two gains is associated a ranking C(gi) of the vectors of the dictionary of vectors of multitap coefficients. This dictionary is selected by the value of the delay of the even subframe of the G.723.1.
Let N be the maximum permitted number of vectors of multitap gains for the current subframe of the G.723.1 coder. If the two gains of the G.729 are equal, there is therefore just one ranking and the first N elements ordered by this ranking of the dictionary of vectors of gains are retained. Otherwise, an order of N elements is constructed from two different orders. For example, two subsets of the rankings C(g1) and C(g2) are constructed by preselecting their first N1 and N2 (respectively) elements. N1 and N2 are less than or equal to N. The two rankings can be processed equitably (N1=N2) or one of the two rankings can be favored. For example, it is possible to favor the ranking associated with the largest monotap gain (typically if g1>g2 then 0≦N2≦N1≦N). It is also possible to favor the one whose G.729 subframe most overlaps the G.723.1 subframe considered. Next, all the elements belonging to both subsets are firstly selected. The set forming the dictionary is then supplemented to N elements, by taking alternately in the two subsets the element ranked best among the remaining ones. Here again, it is possible, when supplementing, to give preference to one of the two subsets. It is of course possible to combine some of these strategies. For example, choose N1=N2 but, after selecting the common elements, continue with the remaining elements of one of the two rankings before possibly supplementing with the remaining elements of the other ranking. The strategy may also vary depending on the G.723.1 subframe considered.
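The construction of an order of N elements from the two rankings C(g1) and C(g2) may be sketched as follows, for the strategy in which the common elements are selected first and the set is then supplemented alternately. The function name merge_two_rankings and the toy rankings are assumptions made for this sketch; each ranking is assumed to contain no repeated index.

```python
def merge_two_rankings(c1, c2, n, n1, n2):
    """Build a dynamic order of at most n elements from two rankings.

    c1, c2: rankings of the same dictionary (lists of indices, best first).
    n1, n2: numbers of leading elements preselected from c1 and c2.
    """
    sub1, sub2 = c1[:n1], c2[:n2]
    in_sub2 = set(sub2)
    # Elements present in both subsets come first (kept in the order of c1).
    selected = [e for e in sub1 if e in in_sub2][:n]
    chosen = set(selected)
    rest1 = [e for e in sub1 if e not in chosen]
    rest2 = [e for e in sub2 if e not in chosen]
    # Then alternate between the two subsets, best remaining element first.
    i = j = 0
    take_first = True
    while len(selected) < n and (i < len(rest1) or j < len(rest2)):
        if (take_first and i < len(rest1)) or j >= len(rest2):
            selected.append(rest1[i])
            i += 1
        else:
            selected.append(rest2[j])
            j += 1
        take_first = not take_first
    return selected

# Hypothetical rankings associated with two distinct G.729 gains g1 and g2.
c_g1 = [7, 2, 9, 4, 1, 5]
c_g2 = [2, 8, 7, 3, 6, 0]
dynamic = merge_two_rankings(c_g1, c_g2, n=5, n1=4, n2=4)
print(dynamic)
```

Favoring one ranking amounts to choosing N1 different from N2, or to starting the alternation with the preferred subset.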
Finally, the exploration of the dictionary of vectors of gains is limited to the N vectors determined by virtue of the “dynamic” order thus constructed. This focused exploration makes it possible to select the best gain vector. Preferably, the selection criterion is the CELP criterion used conventionally by the G.723.1 for exploring the dictionaries of vectors with 5 LTP coefficients. The solution set forth here allows a very great reduction in the complexity of the LTP analysis of the G.723.1 coding without, however, impairing the quality. By way of example of performance,
Additionally, complete storage of the 128 orders for the two dictionaries represents a total of 128*(170+85)=32640 index values to be stored. In reality, it is not necessary to retain all these values since, as indicated hereinabove, only a limited number is necessary. Thus, for a zero loss in the CELP criterion, trials show that it would be sufficient to store about 13582 indices. By choosing a weaker constraint on the CELP criterion, this number can be reduced again (down to 11251 values for 1% loss). It can be greatly reduced again by adopting a partition other than the elementary partition for the set of monotap gains.
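The storage figures quoted above can be checked with a few lines of arithmetic. The retained index counts are those reported in the text; the percentages are simply derived from them.

```python
n_classes = 128          # monotap LTP gains of the G.729 (elementary partition)
dict_sizes = (170, 85)   # the two G.723.1 dictionaries of gain vectors
full_storage = n_classes * sum(dict_sizes)

def retained_fraction(kept):
    """Fraction of the complete storage represented by `kept` indices."""
    return kept / full_storage

print(full_storage)
for label, kept in (("zero loss", 13582), ("1% loss", 11251)):
    print(f"{label}: {kept} indices, {retained_fraction(kept):.1%} of full storage")
```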
In contradistinction to the previous exemplary embodiment, the parameters of the multitap LTP model of a G.723.1 frame are available and one seeks to obtain the monotap LTP parameters of the G.729 for three frames, that is to say six subframes (see
Determination of the Open-Loop Delay
The open-loop search has been eliminated. To do this, each of the three G.729 frames firstly adopts the delay of one of the subframes of the G.723.1 coder as open-loop delay. The correspondence between G.729 frames and G.723.1 subframes is illustrated in
However, it should be noted that the delay chosen by the G.723.1 coder may be outside the span of values permitted by the G.729 coder. Specifically, the smallest value permitted by the G.729 coder is 19 whereas it is 18 for the G.723.1 coder. Several solutions are possible for getting round this problem. Typically, it is for example possible to double the delay arising from the G.723.1 coder, or more simply add 1 to it.
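The two workarounds mentioned above may be sketched as follows. The function name adapt_delay and the strategy labels are assumptions made for illustration; the lower bound of 19 is the value stated in the text for the G.729 coder.

```python
G729_MIN_DELAY = 19  # smallest delay permitted by the G.729 coder, per the text

def adapt_delay(delay_g7231, strategy="double"):
    """Map a G.723.1 delay into the span permitted by the G.729 coder."""
    if delay_g7231 >= G729_MIN_DELAY:
        return delay_g7231          # already admissible, keep as is
    if strategy == "double":
        return 2 * delay_g7231      # double the delay
    if strategy == "increment":
        return delay_g7231 + 1      # or, more simply, add 1 to it
    raise ValueError(f"unknown strategy: {strategy}")

print(adapt_delay(18), adapt_delay(18, "increment"), adapt_delay(40))
```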
Determination of the Closed-Loop Delay
Once the open-loop delays have been fixed for the three frames of the G.729 coder, it remains to perform, for each subframe, the closed-loop search. It is recalled that the spans of values are as follows:
The basic closed-loop search for the G.729 coder consists firstly in successively testing all the integer values of the span (7 values for λ0 and 10 for λ1). Once the best integer value has been selected, the various fractions (−⅔, −⅓, ⅓, ⅔) are tested to determine the best one according to the criterion chosen, in this instance the one which maximizes the CELP criterion. For the even subframe, it will be noted that the fractional part is searched for only if the integer part of λ0 is less than 85.
Here, the first dictionary (in the definition of the invention given hereinabove) is one of the two dictionaries of LTP gain vectors of the G.723.1 coder, the second dictionary being one of the two sets of neighborhood integer values (or jitter) around an anchoring delay. It will then be understood that the invention may be applied readily to more than one first dictionary, on the one hand, and to more than one second dictionary, on the other hand.
To reduce the complexity of the closed-loop search for the integer values within the neighborhood of the anchoring value λ′ (λOL or int(λ0)), it is proposed, within the meaning of the invention, that the number of integer delay values tested by the closed loops be limited. Depending on the choice of LTP gain vector made by the G.723.1, only a reduced number of values is tested. The integer delay is determined in this restricted set. Next, the fractional part is searched for in a conventional manner.
A statistical study has been carried out beforehand by associating within one and the same coder the multitap model of the G.723.1 and the monotap model of the G.729. This study has made it possible to establish for the two closed-loop search neighborhoods of the G.729 (even and odd subframes) an order of importance of the neighborhood values according to their impact on the quality of the signal retrieved, for each of the gain vectors of the two multitap LTP dictionaries of the G.723.1. This classification makes it possible to choose the number of values tested according to the quality and complexity constraints and to limit, for each of the six subframes of the G.729, the extent of the closed loop based on the choice of the gains βi made for the subframes of the G.723.1. By using the correspondence between subframes of the table of
The association between even (respectively odd) subframes of the G.729 coder and the suite of parameters (λj, (βi)j), arising from the G.723.1 coder is illustrated in
It will be noted that for certain subframes, the anchoring value λ′ may be different from the delay λj of the parameter suite (λj, (βi)j) determined for the associated G.723.1 subframe. This point is explained later where the parity of the subframes (even or odd) is taken into account. In a first variant, it is simply possible to ignore any difference. Advantageously, in another variant, the set of ordered neighborhoods is modified as a function of the difference (λj−λ′) and the size of this set may possibly be modified. Preferably, the difference (λj−λ′) is subtracted from each element of this neighborhood ordered according to the gains (βi)j and consideration is given to its intersection with the set defining the neighborhoods (here the interval [−3;3] for the even subframes and the interval [−5;4] for the odd subframes, as will be seen later).
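The preferred variant (subtracting the difference (λj−λ′) from each element of the ordered neighborhood, then intersecting with the set defining the neighborhoods) may be sketched as follows. The function name shift_neighborhood, the example order and the delay values are assumptions made for illustration.

```python
def shift_neighborhood(ordered_jitters, delay_j, anchor, bounds):
    """Subtract (delay_j - anchor) from each jitter and keep values in bounds.

    ordered_jitters: neighborhood values ranked for the gain vector (βi)j.
    bounds: (low, high) interval defining the permitted neighborhood,
            e.g. (-3, 3) for even subframes and (-5, 4) for odd subframes.
    The ranking of the surviving values is preserved.
    """
    low, high = bounds
    diff = delay_j - anchor
    return [j - diff for j in ordered_jitters if low <= j - diff <= high]

# Hypothetical order of the 7 jitter values for an even subframe.
ordered = [0, 1, -1, 2, -2, 3, -3]
shifted = shift_neighborhood(ordered, delay_j=52, anchor=50, bounds=(-3, 3))
print(shifted)
```

In this toy case two of the seven values fall outside [−3;3] after the shift, so the size of the ordered set is reduced from 7 to 5.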
It is also possible to condition the use of the restricted neighborhoods as a function of the deviation between the two delays. The strategy may therefore be adapted to the subframe or to the deviation between the delays, or to the two criteria combined.
Even Subframes
The search must be performed around the open-loop delay λOL over the span [λOL−3; λOL+3]. Depending on the vector(s) of gains chosen by the G.723.1 coder, orders of the set of 7 jitter values (−3, −2, −1, 0, 1, 2, 3) are determined. For subframe 0 (respectively 2) of the G.729 coder, there is only a single associated subframe of the G.723.1 and hence a single vector of gains and, thus, a single order. On the other hand, two subframes of the G.723.1 coder are associated with subframe 4 of the G.729 coder, as shown by
Odd Subframes
The search must be conducted around the integer part λ′2p of the previous (even) subframe over the span [λ′2p−5 2/3; λ′2p+4 2/3]. For these odd subframes, just as for the even subframe 4, the delay λj of the parameter suite (λj, (βi)j) of the associated G.723.1 subframe(s) may be different from this anchoring value λ′2p. Depending on the vector(s) (βi)j of gains chosen by the G.723.1 coder, orders of the set of 10 jitter values are preselected and modified as a function of the difference (λj−λ′2p). Let N (N≦10) be the maximum permitted number of tested values.
To determine the restricted search span, the following procedure is preferably carried out for each odd subframe.
Subframe 1:
The total search span is [λ′0−5 2/3; λ′0+4 2/3]. Two orders corresponding to the gain vectors (βi)0 and (βi)1 are preselected. Next, the ordered neighborhoods are modified as a function of the differences (λ0−λ′0) and (λ1−λ′0). These two deviations are limited since:
On the basis of the first N1 and N2 elements of the modified neighborhoods, a single ordered neighborhood of size N is constructed. The values that are common to both subsets are firstly selected, then the set is completed, if necessary, by alternately taking the best remaining value in the two subsets. The closed-loop search is then conducted in the subset thus constructed.
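By way of illustration only, the construction of the single ordered neighborhood may be sketched as follows. The function and variable names are hypothetical. The text does not specify the order in which the common values are placed, nor which subset is served first during the alternation; the sketch assumes the order of the first subset in both cases.

```python
def merge_neighborhoods(first, second, n):
    """Build one ordered neighborhood of at most n values from two subsets.

    first, second : jitter values, best first (the first N1 and N2 elements
                    of the two modified neighborhoods)
    n             : maximum permitted number of tested values
    """
    in_first, in_second = set(first), set(second)
    # Values common to both subsets are selected first
    # (assumption: they keep the order of the first subset).
    merged = [v for v in first if v in in_second][:n]
    # Remaining values of each subset, still best first.
    rest = [[v for v in first if v not in in_second],
            [v for v in second if v not in in_first]]
    turn = 0
    # Complete the set by taking alternately the best remaining value.
    while len(merged) < n and (rest[0] or rest[1]):
        queue = rest[turn % 2]
        if queue:
            merged.append(queue.pop(0))
        turn += 1
    return merged


# Made-up subsets, for illustration only.
print(merge_neighborhoods([0, 1, -1, 2], [1, 2, 3, -2], n=5))
```

With these made-up subsets, the common values 1 and 2 come first, followed by 0, 3 and −1 taken alternately from the two remainders.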
Subframe 3:
The total search span is [λ′2−5 2/3; λ′2+4 2/3]. An order corresponding to the gain vector (βi)2 is selected. Next, the ordered neighborhood is modified as a function of the difference (λ2−λ′2). In contradistinction to the previous case, the deviation between λ2 and λ′2 may be sizeable, and the intersection of the ordered neighborhood, modified by subtracting (λ2−λ′2), with the search interval may be empty. In this case, preferably, the search is done over the whole span [λ′2−5 2/3; λ′2+4 2/3]. The use of ordered neighborhoods may also be conditioned on a threshold on |λ2−λ′2|. For example, the neighborhoods are restricted only if |λ2−λ′2|<3; otherwise, the whole span [−5,4] is explored. The choice of this variant may also depend on the permitted complexity.
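By way of illustration only, the selection of the integer jitter values tested for subframe 3 may be sketched as follows. The names are hypothetical and the example ranking is made up. The sketch assumes a threshold of 3 applied as a strict comparison, and it falls back on the whole span [−5,4] either when the deviation reaches the threshold or when the modified neighborhood is empty.

```python
def subframe3_candidates(ordered, lam2, lam2_anchor, n, threshold=3,
                         lo=-5, hi=4):
    """Return the integer jitter values to test for G.729 subframe 3.

    ordered     : jitter values ranked for the gain vector, best first
    lam2        : delay of the associated G.723.1 subframe
    lam2_anchor : integer delay of the previous (even) G.729 subframe
    n           : maximum permitted number of tested values
    threshold   : assumed limit on the deviation (strict comparison)
    """
    full_span = list(range(lo, hi + 1))
    diff = lam2 - lam2_anchor
    if abs(diff) >= threshold:
        # Deviation too large: explore the whole span.
        return full_span
    shifted = [v - diff for v in ordered if lo <= v - diff <= hi]
    # An empty intersection also leads to the whole span.
    return shifted[:n] if shifted else full_span


# Made-up ranking of the 10 jitter values.
ranking = [0, -1, 1, -2, 2, -3, 3, -4, 4, -5]
print(subframe3_candidates(ranking, lam2=41, lam2_anchor=40, n=4))
print(subframe3_candidates(ranking, lam2=47, lam2_anchor=40, n=4))
```

The first call restricts the search to four values; the second call, where the deviation is 7, returns all ten values of the span.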
Subframe 5:
The total search span is [λ′4−5 2/3; λ′4+4 2/3]. An order corresponding to the gain vector (βi)3 is selected. Next, the ordered neighborhood is modified as a function of the difference (λ3−λ′4). As in the case of subframe 1, this deviation is limited. Specifically, the closed-loop delay of the G.729, λ′4, is in the neighborhood ([−3,3]) of the open-loop delay (here taken equal to the closed-loop delay λ3 of the G.723.1). The first N values of the modified ordered set are explored.
The solution presented here allows a very great reduction in the complexity of the LTP analysis of the G.729 coding. Relative to exploring the complete neighborhoods, the invention makes it possible to test only 60% (respectively 40%) of the neighborhood values if the gain vector of the G.723.1 coder is in the dictionary with 170 entries (respectively 85 entries).
The two models are much the same and differ practically only by the choice of the dictionary of multitap LTP gain vectors.
Determination of the Delay of the Multitap Filter
In a similar manner to the determination of the delay of a monotap filter described hereinabove on the basis of the multitap LTP parameters, it is possible to use the delay of the even subframes as the open-loop delay of the super subframe, then to restrict the span of variation of the closed-loop delay of the 5.3-kbit/s mode as a function of the vector of five coefficients of the filter chosen by the 6.3-kbit/s mode. Preferably, no processing other than a simple copying of the delay is necessary. Thus, each subframe of the 5.3-kbit/s mode adopts, as its delay, the delay that the 6.3-kbit/s mode has chosen for the same subframe.
Determination of the Coefficients of the Multitap Filter
Here, there is a single second dictionary which is the dictionary with 170 vectors of five coefficients of the 5.3-kbit/s mode whereas it is necessary to consider two “first dictionaries”, according to the terminology used in the general definition of the invention. These two first dictionaries are the two dictionaries of vectors of gains used by the 6.3-kbit/s mode of the G.723.1.
In this exemplary embodiment, one therefore seeks to determine for the 5.3-kbit/s mode a gain vector in the dictionary with 170 entries on the basis of a gain vector selected by the 6.3-kbit/s mode in one of the two dictionaries (with 170 or 85 vectors).
One of the two cases may seem trivial since if the 6.3-kbit/s mode uses the same dictionary (the dictionary with 170 vectors) for the current subframe, it would be tempting to choose the same vector as the 6.3-kbit/s mode for the 5.3-kbit/s mode. Nevertheless, this approach introduces a noticeable degradation of the signal. Specifically, although the LTP modeling is identical for both modes (same dictionaries of delays and of vectors of 5 gains), it should be borne in mind that the remainder of the coding process is not the same. The LTP filtering is therefore not applied to the same signal and it is thus necessary to widen the choice of vectors of coefficients of the filter for the 5.3-kbit/s mode.
For this purpose, a study has been carried out on the two dictionaries to associate with each of the vectors, a ranking of the vectors of the dictionary with 170 vectors.
Thus, to select a gain vector for the 5.3-kbit/s mode, it is preferred, on the basis of the choice of the gain vector made by the 6.3-kbit/s mode, to explore in the large dictionary (170 vectors) only a set restricted to the first N vectors of the ranking associated with the gain vector chosen by the 6.3-kbit/s mode. The size N depends on the complexity, the quality, or the quality/complexity compromise desired. Thus, as described hereinabove, the gain vector which maximizes a criterion, preferably the CELP criterion, is selected from this subset.
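By way of illustration only, the restricted exploration of the dictionary may be sketched as follows. The names are hypothetical, the rankings and scores are made up, and the criterion is represented by an arbitrary scoring function standing in for the CELP criterion, which is not reproduced here.

```python
def select_gain_vector(chosen_index, rankings, criterion, n):
    """Select a gain vector index for the 5.3-kbit/s mode.

    chosen_index : index of the gain vector chosen by the 6.3-kbit/s mode
    rankings     : for each such index, the ranked list of indices of the
                   170-vector dictionary (best first), established offline
    criterion    : function mapping a dictionary index to a score to maximize
                   (placeholder for the CELP criterion)
    n            : number of ranked vectors explored
    """
    candidates = rankings[chosen_index][:n]
    return max(candidates, key=criterion)


# Toy dictionary of four vectors with made-up rankings and scores.
toy_rankings = {0: [0, 2, 1, 3], 1: [1, 3, 0, 2]}
toy_scores = [0.2, 0.9, 0.5, 0.7]
print(select_gain_vector(0, toy_rankings, lambda i: toy_scores[i], n=2))
print(select_gain_vector(0, toy_rankings, lambda i: toy_scores[i], n=4))
```

In this toy case, exploring only the first two ranked vectors returns index 2, whereas exploring all four returns index 1, which illustrates the compromise between complexity and quality governed by N.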
Lamblin, Claude, Ghenania, Mohamed
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 09 2006 | France Telecom | (assignment on the face of the patent) | / | |||
Oct 22 2007 | LAMBLIN, CLAUDE | France Telecom | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030639 | /0344 | |
Oct 25 2007 | GHENANIA, MOHAMED | France Telecom | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030639 | /0344 | |
May 28 2013 | France Telecom | Orange | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 032698 | /0396 |
Date | Maintenance Fee Events |
Jan 24 2014 | ASPN: Payor Number Assigned. |
Oct 23 2017 | REM: Maintenance Fee Reminder Mailed. |
Apr 09 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |