Techniques utilising time scale modification (TSM) of signals are described. The signal is analysed and divided into frames of similar signal types. Techniques specific to the signal type are then applied to the frames thereby optimising the modification process. The method of the present invention enables TSM of different audio signal parts to be realized using different methods, and a system for effecting said method is also described.
12. A time scale modifying device adapted to modify a signal so as to effect the formation of a time scale modified signal comprising:
a) means for determining different signal types within frames of the signal,
b) means for applying a first time scale modification algorithm to frames having a first determined signal type and a second different time scale modification algorithm to frames having a second determined signal type,
wherein the first signal type is a voiced signal segment and wherein the second signal type is an un-voiced signal segment,
means for splitting the un-voiced speech signal into a first portion and a second portion, and
means for inserting noise in between the first portion and the second portion to obtain a time scale expanded signal,
wherein the noise is synthetic noise with a spectral shape equivalent to the spectral shape of the first and second portions of the signal and wherein the inserted noise is excised from the middle of a previously synthesized noise sequence.
1. A method of time scale modifying a signal, the method comprising the acts of:
defining individual frame segments within the signal,
analyzing the individual frame segments to determine a signal type in each frame segment,
applying a first algorithm to a determined first signal type and a second different algorithm to a determined second signal type, wherein the first and second algorithms are time scale modification algorithms and the method is used for time scale modification of the signal, and
wherein the first signal type is a voiced signal segment and wherein the second signal type is an un-voiced signal segment,
splitting the un-voiced speech signal into a first portion and a second portion, and
inserting noise in between the first portion and the second portion to obtain a time scale expanded signal,
wherein the noise is synthetic noise with a spectral shape equivalent to the spectral shape of the first and second portions of the signal and wherein the inserted noise is excised from the middle of a previously synthesized noise sequence.
2. The method as claimed in
4. The method as claimed in
dividing each frame of the determined second signal type into a lead-in and a lead-out portion,
generating a noise signal, and
inserting the noise signal between the lead-in and lead-out portions so as to effect an expanded segment.
5. The method as claimed in
6. The method as claimed in
8. A method as claimed in
9. The method of
10. The method of
11. A method of receiving an audio signal, the method comprising the acts of:
decoding the audio signal, and
time scale expanding the decoded audio signal according to a method as claimed in
13. The device as claimed in
a) means for splitting the signal frame into a first portion and a second portion, and
b) means for inserting noise in between the first portion and the second portion to obtain a time scale expanded signal.
14. A receiver for receiving an audio signal, the receiver comprising:
a) a decoder for decoding the audio signal, and
b) a device according to
15. The device of
16. The device of
The invention relates to the time-scale modification (TSM) of a signal, in particular a speech signal, and more particularly to a system and method that employs different techniques for the time-scale modification of voiced and un-voiced speech.
Time-scale modification (TSM) of a signal refers to compression or expansion of the time scale of that signal. For speech signals, TSM expands or compresses the time scale of the speech while preserving the identity of the speaker (pitch, formant structure). As such, it is typically explored for purposes where alteration of the pronunciation speed is desired. Applications of TSM include text-to-speech synthesis, foreign language learning and film/soundtrack post-synchronisation.
Many techniques for fulfilling the need for high quality TSM of speech signals are known, and examples of such techniques are described in E. Moulines, J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech", Speech Communication (Netherlands), Vol. 16, No. 2, pp. 175-205, 1995.
Another potential application of TSM techniques is speech coding which, however, is much less reported. Within this application, the basic intention is to compress the time scale of a speech signal prior to coding, reducing the number of speech samples that need to be encoded, and to expand it by a reciprocal factor after decoding, to reinstate the original timescale. This concept is illustrated in
The use of TSM in this context has been explored in the past, and fairly good results were claimed using several TSM methods and speech coders [1]-[3]. Recently, improvements have been made both to TSM and speech coding techniques, where these two have mostly been studied independently from each other.
As detailed in Moulines and Laroche, as referenced above, one widely used TSM algorithm is synchronised overlap-add (SOLA), which is an example of a waveform approach algorithm. Since its introduction [4], SOLA has evolved into a widely used algorithm for TSM of speech. Being a correlation method, it is also applicable to speech produced by multiple speakers or corrupted by background noise, and to some extent to music.
With SOLA, an input speech signal s is analysed as a sequence of N-samples-long overlapping frames xi (i=0, . . . , m), consecutively delayed by a fixed analysis period of Sa samples (Sa&lt;N). The starting idea is that s can be compressed or expanded by outputting these frames while successively shifting them by a synthesis period Ss, which is chosen such that Ss&lt;Sa for compression and Ss&gt;Sa for expansion (Ss&lt;N). The overlapping segments are first weighted by two amplitude-complementary functions and then added, which is a suitable way of waveform averaging.
The actual synchronisation mechanism of SOLA consists of additionally shifting each xi during the synthesis, to yield similarity of the overlapping waveforms. Explicitly, a frame xi will now start contributing to the output signal at position iSs+ki, where ki is found such that the normalised cross-correlation given by Equation 1 is maximal for k=ki.
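Equation 1 is not reproduced in this text; a standard formulation of SOLA's normalised cross-correlation, reconstructed here to be consistent with the surrounding description (s̃ the output written so far, xi the current frame, L the overlap length for lag k), is:

```latex
R_i(k) = \frac{\sum_{j=0}^{L-1} \tilde{s}[iS_s + k + j]\, x_i[j]}
              {\sqrt{\sum_{j=0}^{L-1} \tilde{s}^2[iS_s + k + j] \;\sum_{j=0}^{L-1} x_i^2[j]}}
\qquad \text{(Equation 1)}
```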
In this equation, s̃ denotes the output signal, while L denotes the length of the overlap corresponding to a particular lag k in the given range [1]. Having found the synchronisation parameters ki, the overlapping signals are averaged as before. With a large number of frames, the ratio of the output and input signal length will approach the value Ss/Sa, hence defining the scale factor α.
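The SOLA procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the lag search range kmax (assumed smaller than Ss) and all default values are our assumptions.

```python
import numpy as np

def sola(s, N, Sa, Ss, kmax=80):
    """SOLA sketch: N-sample analysis frames taken every Sa samples are
    output every Ss samples, each additionally shifted by the lag k_i that
    maximises the normalised cross-correlation with the output written so
    far.  Assumes kmax < Ss so the overlap never exceeds a frame."""
    s = np.asarray(s, dtype=float)
    m = (len(s) - N) // Sa                  # frames x_1 .. x_m after x_0
    out = np.zeros(m * Ss + N + kmax)
    out[:N] = s[:N]                         # frame x_0 at position 0 (k_0 = 0)
    end = N                                 # end of output written so far
    ks = [0]
    for i in range(1, m + 1):
        x = s[i * Sa : i * Sa + N]
        best_k, best_r = 0, -np.inf
        for k in range(kmax + 1):
            L = end - (i * Ss + k)          # overlap length for this lag
            if L <= 0:
                break
            a, b = out[i * Ss + k : i * Ss + k + L], x[:L]
            denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
            r = np.dot(a, b) / denom if denom > 0 else 0.0
            if r > best_r:
                best_r, best_k = r, k
        ks.append(best_k)
        start = i * Ss + best_k
        L = end - start
        fade = np.linspace(1.0, 0.0, L)     # amplitude-complementary weighting
        out[start : start + L] = fade * out[start : start + L] + (1 - fade) * x[:L]
        out[start + L : start + N] = x[L:]
        end = start + N
    return out[:end], ks
```

Choosing Ss&lt;Sa compresses the signal; Ss&gt;Sa (with a correspondingly larger output buffer) would expand it in the same way.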
When SOLA compression is cascaded with the reciprocal SOLA expansion, several artefacts are typically introduced into the output speech, such as reverberation, artificial tonality and occasional degradation of transients.
The reverberation is associated with voiced speech, and can be attributed to waveform averaging. Both compression and the succeeding expansion average similar segments. However, similarity is measured locally, implying that the expansion does not necessarily insert additional waveform in the region where it was "missing". This results in waveform smoothing, possibly even introducing new local periodicity. Furthermore, frame positioning during expansion is designed to re-use the same segments in order to create additional waveform. This introduces correlation in unvoiced speech, which is often perceived as an artificial "tonality".
Artefacts also occur in speech transients, i.e. regions of voicing transition, which usually exhibit an abrupt alteration of the signal energy level. As the scale factor increases, so does the distance between ‘iSa’ and ‘iSs’ which may impede alignment of similar parts of a transient for averaging. Hence, overlapping distinct parts of a transient causes its “smearing”, endangering proper perception of its strength and timing.
In [5], [6], it was reported that a companded speech signal of good quality can be achieved by employing the ki's that are obtained during SOLA compression. So, quite opposite to what is done by SOLA, the N-samples-long frames x̂i would now be excised from the compressed signal s̃ at time instants iSs+ki and re-positioned at the original time instants iSa (while averaging the overlapping samples similarly as before). The maximal cost of transmitting/storing all ki's is given by Equation 2, where Ts is the speech sampling period and ⌈ ⌉ represents the operation of rounding towards the nearest higher integer.
It has also been reported that exclusion of transients from high (i.e. &gt;30%) SOLA compression or expansion yields improved speech quality [7].
It will be appreciated therefore that presently several techniques and approaches exist that can successfully (e.g. giving good quality) be employed for compressing or expanding the time-scale of signals. Although described specifically with reference to speech signals, it will be appreciated that this description is of an exemplary embodiment of a signal type, and the problems associated with speech signals are also applicable to other signal types. When used for coding purposes, where the time-scale compression is followed by time-scale expansion (time-scale companding), the performance of prior art techniques degrades considerably. The best performance for speech signals is generally obtained from time-domain methods, among which SOLA is widely used, but problems still exist using these methods, some of which have been identified above. There is, therefore, a need to provide an improved method and system for time scale modifying a signal in a manner specific to the components making up that signal.
By providing a method that analyses individual frame segments within a signal and applies different algorithms to specific signal types it is possible to optimise the modification of the signal. Such application of specific modification algorithms to specific signal types enables a modification of the signal in a manner which is adapted to cater for different requirements of the individual component segments that make up the signal.
In a preferred embodiment of the present invention, the method is applied to speech signals and the signal is analysed for voiced and un-voiced components with different expansion or compression techniques being utilised for the different types of signal. The choice of technique is optimised for the specific type of signal.
The expansion of the signal is effected by the splitting of the signal into portions and the insertion of noise between the portions. Desirably, the noise is synthetically generated noise rather than generated from the existing samples, which allows for the insertion of a noise sequence having similar spectral and energy properties to that of the signal components.
A first aspect of the present invention provides a method for time-scale modification of signals, particularly suited for audio signals and specifically for the expansion of unvoiced speech, and is designed to overcome the problem of artificial tonality introduced by the "repetition" mechanism which is inherently present in all time-domain methods. The invention provides for the lengthening of the time-scale by inserting an appropriate amount of synthetic noise that reflects the spectral and energy properties of the input sequence. The estimation of these properties is based on LPC (Linear Predictive Coding) and variance matching. In a preferred embodiment the model parameters are derived from the input signal, which may be an already compressed signal, thereby avoiding the necessity for their transmission. Although it is not intended to limit the invention to any one theoretical analysis, it is thought that only a limited distortion of the above mentioned properties of an unvoiced sequence is caused by a compression of its time-scale.
Parametric Modelling of Unvoiced Speech
Linear predictive coding is a widely applied method for speech processing, employing the principle of predicting the current sample from a linear combination of previous samples. It is described by Equation 3.1 or, equivalently, by its z-transformed counterpart 3.2. In Equation 3.1, s and ŝ respectively denote an original signal and its LPC estimate, and e the prediction error. Further, M determines the order of prediction, and ai are the LPC coefficients. These coefficients are derived by well-known algorithms ([6], 5.3), which are usually based on least squares error (LSE) minimisation, i.e. minimisation of Σn e²[n].
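Equations 3.1 and 3.2 are not reproduced in this text; a standard statement of the LPC model, reconstructed to match the notation of the surrounding description, is:

```latex
s[n] = \hat{s}[n] + e[n] = \sum_{i=1}^{M} a_i\, s[n-i] + e[n]
\qquad \text{(Equation 3.1)}
```

```latex
S(z) = E(z)\,H(z), \qquad
H(z) = \frac{1}{A(z)} = \frac{1}{\,1 - \sum_{i=1}^{M} a_i z^{-i}\,}
\qquad \text{(Equation 3.2)}
```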
Using the LPC coefficients, a sequence s can be approximated by the synthesis procedure described by Equation 3.2. Explicitly, the filter H(z) (often denoted as 1/A(z)) is excited by a proper signal e, which, ideally, reflects the nature of the prediction error. In the case of unvoiced speech, a suitable excitation is normally distributed zero-mean noise.
Eventually, to ensure a proper amplitude level variation of the synthetic sequence, the excitation noise is multiplied by a suitable gain G. Such a gain is conveniently computed based on variance matching with the original sequence s, as described by Equations 3.3. Usually, the mean value
The described way of signal estimation is only accurate for stationary signals. Therefore, it should only be applied to speech frames, which are quasi-stationary. When LPC computation is concerned, speech segmentation also includes windowing, which has the purpose of minimising smearing in the frequency domain. This is illustrated in
Finally, it should be noted that the gain and LPC computation need not necessarily be performed at the same rate, as the time and frequency resolution that is needed for an accurate estimation of the model parameters does not have to be the same. Typically, the LPC parameters are updated every 10 ms, whereas the gain is updated much faster (e.g. every 2.5 ms). Time resolution (described by the gains) is perceptually more important for unvoiced speech than frequency resolution, since unvoiced speech typically contains more high-frequency content than voiced speech.
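The parametric model above can be sketched as follows: LPC coefficients are derived by the well-known Levinson-Durbin recursion from the windowed autocorrelation, a noise excitation is shaped by the synthesis filter 1/A(z), and the gain follows from variance matching. The order M = 10, the Hamming window and the fixed seed are our assumptions.

```python
import numpy as np

def levinson(r, M):
    """Levinson-Durbin recursion: error-filter coefficients a[0..M]
    (a[0] = 1) of A(z) = 1 + sum_j a[j] z^-j from autocorrelation r."""
    a = np.zeros(M + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, M + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def synth_unvoiced(frame, M=10, seed=0):
    """Re-synthesise an unvoiced frame: LPC spectral shaping of Gaussian
    noise plus a gain from variance matching (Equations 3.1-3.3 sketch)."""
    rng = np.random.default_rng(seed)
    x = frame - frame.mean()                # work on the zero-mean signal
    w = x * np.hamming(len(x))              # window before LPC computation
    r = np.correlate(w, w, mode='full')[len(w) - 1 : len(w) + M]
    a, _ = levinson(r, M)
    e = rng.standard_normal(len(frame))     # zero-mean noise excitation
    y = np.zeros(len(frame))
    for n in range(len(y)):                 # y[n] = e[n] - sum_j a[j] y[n-j]
        hist = y[max(0, n - M):n][::-1]
        y[n] = e[n] - np.dot(a[1:1 + len(hist)], hist)
    G = x.std() / y.std()                   # gain by variance matching
    return G * y + frame.mean()
```

Updating the gain faster than the LPC coefficients, as suggested in the text, would amount to computing G over short sub-frames instead of the whole analysis frame.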
A possible way to realise time-scale modification of unvoiced speech utilising the previously discussed parametric modelling is to perform the synthesis at a different rate than the analysis, and in
V = V(a(t), G(t)), where a = [a1, . . . , aM], t = nT, n = 1, 2, . . . (Equation 3.4)
To obtain time-scale expansion by a scale factor b (b>1), this vector space is simply ‘down-sampled’ by the same factor, prior to the synthesis. Explicitly, after each period of bT samples, an element of V is used for the synthesis of a new N samples-long frame. Hence, compared to the analysis frames, the synthesis frames will be overlapping in time by a smaller amount. To demonstrate this, the frames have been marked by using the Hamming windows again. In practice, it will be appreciated that the overlapping parts of the synthesis frames may be averaged by applying the power-complementary weighting instead, deploying the appropriate windows for that purpose. It will be appreciated that by performing the synthesis at a faster rate than the analysis that time-scale compression could be achieved in a similar way.
It will be appreciated by those skilled in the art that the output signal produced by applying this approach is an entirely synthetic signal. As a possible remedy to reduce the artefacts, which are usually perceived as an increased noisiness, a faster update of the gain could serve. A more effective approach, however, is to reduce the amount of synthetic noise in the output signal. In the case of time-scale expansion, this can be accomplished as detailed below.
Instead of synthesising whole frames at a certain rate, in one embodiment of the present invention a method is provided in which an appropriate, smaller amount of noise is added to lengthen the input frames. The additional noise for each frame is obtained similarly to before, namely from the models (LPC coefficients and the gain) derived for that frame. When expanding compressed sequences, in particular, the window length for LPC computation may generally extend beyond the frame length. This is principally meant to give the region of interest a sufficient weight. Subsequently, a compressed sequence which is being analysed is assumed to have sufficiently retained the spectral and energy properties of the original sequence from which it has been obtained.
Using the illustration from
The time-scale expanded version of one particular frame
In addition, the windows drawn by dashed lines suggest that averaging (cross-fade) can be performed around the joints of the region where the noise is being inserted. Still, due to the noise-like character of all involved signals, possible (perceptual) benefits of such ‘smoothing’ in the transition regions remain bounded.
In
It will be understood that the above described way of noise insertion is in accordance with the usual way of performing LPC analysis, employing the Hamming window, and since the central part of the frame is given the highest weight, inserting the noise in the middle seems logical. However, if the input frame marks a region close to an acoustical event, like a voicing transition, then inserting the noise in a different way may be more desirable. For example, if the frame consists of unvoiced speech gradually transforming into a more ‘voiced-like’ speech, then insertion of synthetic noise closer to the beginning of the frame (where the most noise-like speech is located) would be most appropriate. An asymmetrical window putting the most weight on the left part of the frame could then be suitably used for the purpose of LPC analysis. It will be appreciated therefore that the insertion of noise in different regions of the frame may be considered for different types of signal.
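The middle-insertion procedure described above can be sketched as follows: the frame is split into lead-in and lead-out halves, a noise sequence with the frame's LPC spectral shape and variance-matched gain is synthesised, its middle part is excised (skipping the synthesis filter's start-up transient), and the joints are cross-faded. The order M, the cross-fade length and all parameter names are our assumptions.

```python
import numpy as np

def expand_unvoiced(frame, extra, M=10, fade=32, seed=0):
    """Expand an unvoiced frame by `extra` samples via middle insertion of
    synthetic noise with matched spectral shape and gain (a sketch)."""
    rng = np.random.default_rng(seed)
    x = frame - frame.mean()
    w = x * np.hamming(len(x))                    # window before LPC analysis
    r = np.correlate(w, w, mode='full')[len(w) - 1 : len(w) + M]
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    a = np.linalg.solve(R, r[1:M + 1])            # predictor of Equation 3.1
    # Synthesise a longer noise sequence through H(z) = 1/A(z), then
    # excise its middle, leaving margin for the filter transient.
    n_total = extra + 2 * fade + 4 * M
    e = rng.standard_normal(n_total)
    y = np.zeros(n_total)
    for n in range(n_total):                      # y[n] = e[n] + sum_j a[j] y[n-j]
        hist = y[max(0, n - M):n][::-1]
        y[n] = e[n] + np.dot(a[:len(hist)], hist)
    mid = (n_total - extra - 2 * fade) // 2
    noise = y[mid : mid + extra + 2 * fade]
    noise = noise * (x.std() / noise.std()) + frame.mean()   # gain matching
    # Split into lead-in / lead-out halves and cross-fade around both joints.
    half = len(frame) // 2
    up = np.linspace(0.0, 1.0, fade)
    down = 1.0 - up
    lead_in, lead_out = frame[:half], frame[half:]
    return np.concatenate([
        lead_in[:-fade],
        lead_in[-fade:] * down + noise[:fade] * up,
        noise[fade : fade + extra],
        noise[fade + extra :] * down + lead_out[:fade] * up,
        lead_out[fade:],
    ])
```

Inserting the noise closer to the beginning of the frame, as discussed for voicing transitions, would only change where the frame is split.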
The signal flow can be described as follows. The incoming speech is submitted to buffering and segmentation into frames, to suit the succeeding processing stages. Namely, by performing a voicing analysis on the buffered speech (inside the block denoted by 'V/UV') and shifting the consecutive frames inside the buffer, a flow of the voicing information is created, which is exploited to classify speech parts and handle them accordingly. Specifically, voiced onsets are translated, while all other speech is compressed using SOLA. The outgoing frames are then passed to the codec (A), or bypass the codec (B) directly to the expander. Simultaneously, the synchronisation parameters are transmitted through a side channel. They are used to select and perform a certain expansion method. That is, voiced speech is expanded using SOLA frame shifts ki. During SOLA, the N-samples-long analysis frames xi are excised from an input signal at times iSa, and output at the corresponding times iSs+ki. Eventually, such a modified time scale can be restored by the opposite process, i.e. by excising N-samples-long frames x̂i from the time-scale modified signal at times iSs+ki, and outputting them at times iSa. This procedure can be expressed through Equation 4.0, where s̃ and ŝ respectively denote the TSM-ed and reconstructed version of an original signal s. It is assumed here that k0=0, in accordance with the indexing of k starting from i=1. x̂i[n] may be assigned multiple values, i.e. samples from different frames which will overlap in time, and these should be averaged by cross-fade.
x̂i[n] = ŝ[n + iSa] = s̃[n + iSs + ki],  i = 0, . . . , m,  n = 0, . . . , N − 1 (Equation 4.0)
By comparing the consecutive overlap-add stages of SOLA and the reconstruction procedure outlined above, it can easily be seen that x̂i and xi will generally not be identical. It will therefore be appreciated that these two processes do not exactly form a "1-1" transformation pair. However, the quality of such reconstruction is notably higher compared to merely applying SOLA with a reciprocal Ss/Sa ratio.
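The restoration step of Equation 4.0 can be sketched as follows. For brevity this sketch averages overlapping samples uniformly; the text suggests averaging by cross-fade instead.

```python
import numpy as np

def sola_restore(st, ks, N, Sa, Ss):
    """Reinstate the original time scale of a SOLA-compressed signal st,
    given the synchronisation parameters ks (Equation 4.0 sketch)."""
    m = len(ks) - 1
    out = np.zeros(m * Sa + N)
    weight = np.zeros_like(out)
    for i, k in enumerate(ks):
        # excise frame x̂_i at time iSs + k_i and output it at time iSa
        frame = st[i * Ss + k : i * Ss + k + N]
        out[i * Sa : i * Sa + len(frame)] += frame
        weight[i * Sa : i * Sa + len(frame)] += 1.0
    return out / np.maximum(weight, 1.0)   # average the overlapping samples
```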
The unvoiced speech is desirably expanded using the parametric method previously described. It should be noted that the translated speech segments are used to realise the expansion, instead of simply being copied to the output. Through suitable buffering and manipulation of all received data, a synchronised processing results, where each incoming frame of the original speech will produce a frame at the output (after an initial delay).
It will be appreciated that a voiced onset may be simply detected as any transition from unvoiced-like to voiced-like speech.
Finally, it should be noted that the voicing analysis could in principle be performed on the compressed speech, as well, and that process could therefore be used to eliminate the need for transmitting the voicing information. However, such speech would be rather inadequate for that purpose, because relatively long analysis frames must usually be analysed in order to obtain reliable voicing decisions.
The compression can easily be described using
Initially, the buffer contains a zero signal. Then, a first frame d(
It can be seen that, while maintaining speech continuity, much of frame
It can now be concluded that after each iteration the compressor will output an “information triplet”, consisting of a speech frame, SOLA k and a voicing decision corresponding to the front frame in the buffer. Since no cross-correlation is computed during the translation, ki=0 will be attributed to each translated frame. So, by denoting speech frames by their length, the triplets produced in this case are (Ss, k0, 0), (Ss, k1, 0), (Sa+k1, 0, 0) and (Ss, k3, 1). Note that the transmission of (most) k's acquired during the compression of unvoiced speech is superfluous, because (most) unvoiced frames will be expanded using the parametric method.
The expander is desirably adapted to keep track of the synchronisation parameters in order to identify the incoming frames and handle them appropriately.
The principal consequence of translation of voiced onsets is that it "disturbs" a continuous time-scale compression. It will be appreciated that all compressed frames have an equal length of Ss samples, while the length of translated frames is variable. This could introduce difficulties in maintaining a constant bit rate when the time-scale compression is followed by coding. At this stage, we choose to compromise the requirement of achieving a constant bit rate in favour of achieving better quality.
With respect to the quality, one could also argue that preserving a segment of the speech through translation could introduce discontinuities if the connecting segments on both of its sides are distorted. By detecting voiced onsets early, so that the translated segment starts with a part of the unvoiced speech preceding the onset, it is possible to minimise the effect of such discontinuities. It will be appreciated also that SOLA converges slowly for moderate compression rates, which ensures that the terminating part of the translated speech will include some of the voiced speech succeeding the onset.
It will be appreciated that during the compression each incoming Sa-samples-long frame will produce an Ss or Sa+ki−1 (ki ≤ Sa) samples-long frame at the output. Hence, in order to reinstate the original time scale, the speech coming from the expander should desirably consist of Sa-samples-long frames, or frames having different lengths but producing the same total length of m·Sa, with m being the number of iterations. The present discussion concerns a realisation which is capable of only approximating the desired length; this is the result of a pragmatic choice, allowing us to simplify the operations and avoid introducing further algorithmic delay. It will be appreciated that alternative methodology may be deemed necessary for differing applications.
In the following, we shall assume that several separate buffers are available, all of which will be updated by simple shifting of samples. For the sake of illustration, we shall show the complete "information triplets" as produced by the compressor, including the k's acquired during compression of unvoiced sounds, most of which are actually superfluous.
This is also illustrated in
During the expansion, some typical actions will be performed on the “present” frame, invoked by particular states of the buffers containing the synchronisation parameters. In the following, this is clarified through examples.
i. Unvoiced Expansion
The parametric expansion method previously described is exclusively deployed in the situation where all three frames of interest are unvoiced, as shown in
Hence, the present frame
ii. Voiced Expansion
A possible voicing state invoking this expansion method is illustrated in
iii. Translation
As detailed previously the term “translation” as used within the present specification is intended to refer to all situations where the present frame, or a part of it, is output as is or skipped, i.e. shifted but not output.
However, it is clear that the above outlined problem will now only be postponed and will re-appear with the future frame
In order to obviate this problem, the ki samples of each present frame that have been used in the past are first skipped. This implies a deviation from the principle exploited so far, where for each incoming Ss samples, Sa samples are output. In order to compensate for "the shortage" of samples, we shall use the "surplus" of samples contained in the translated Sa+kj-samples-long frames produced by the compressor. If such a frame does not directly follow a voiced offset (i.e. if a voiced onset does not appear shortly after a voiced offset), then none of its samples will have been used in the previous iterations, and it can be output as a whole. Hence, the "shortage" of ki samples following a voiced offset will be counterbalanced by a "surplus" of at most kj samples preceding the next voiced onset.
Since both kj and ki are obtained during compression of unvoiced speech, and therefore have a random-like character, their counterbalance will not be exact for a particular j and i. As a consequence, a slight mismatch between the duration of the original and the corresponding companded unvoiced sounds will generally result, which is expected to be imperceptible. At the same time, speech continuity is assured.
It should be noted that the mismatch problem could easily be tackled even without introducing additional delay and processing, by choosing the same k for all unvoiced frames during the compression. Possible quality degradation due to this action is expected to remain bounded, since waveform similarity, based on which k is computed, is not an essential similarity measure for unvoiced speech.
It should be noted that it is desirable for all the buffers to be consistently updated, in order to ensure speech continuity when switching between different actions. For the purpose of this switching and identification of incoming frames, a decision mechanism has been established, based on inspecting the states of the voicing and "k" buffers. It can be summarised through the table given below, where the previously described actions are abbreviated. To signal "re-usage" of samples, i.e. occurrence of a voiced offset in the past, an additional predicate named "offset" is introduced. It can be defined by looking one step further into the past of the voicing buffer, as true if v[0] = 1 ∨ v[−1] = 1 and false in all other cases (∨ denotes logical "or"). Note that through suitable manipulation, no explicit memory location for v[−1] is needed.
TABLE 1
Selecting actions of the expander

  v[0]  v[1]  v[2]  offset  k[0] > Ss  ACTION
   0     0     0      0         —        UV
   0     0     0      1         0        UV
   0     0     0      1         1        T
   0     0     1      —         —        T
   0     1     1      —         —        V
   1     0     0      —         —        V
   1     0     1      —         —        T
   1     1     0      —         —        V
   1     1     1      —         —        V
It will be appreciated that the present invention utilises a time-scale expansion method for unvoiced speech. Unvoiced speech is compressed with SOLA, but expanded by insertion of noise with the spectral shape and the gain of its adjacent segments. This avoids the artificial correlation which is introduced by “re-using” unvoiced segments.
If TSM is combined with speech coders that operate at lower bit rates (i.e. &lt;8 kbit/s), TSM-based coding performs worse than conventional coding (in this case AMR). If the speech coder operates at higher bit rates, comparable performance can be achieved. This can have several benefits. The bit rate of a speech coder with a fixed bit rate can now be lowered to any arbitrary bit rate by using higher compression ratios. For compression ratios up to 25%, the performance of the TSM system can be comparable to that of a dedicated speech coder. Since the compression ratio can be varied in time, the bit rate of the TSM system can also be varied in time. For example, in case of network congestion, the bit rate can be temporarily lowered. The bit-stream syntax of the speech coder is not changed by the TSM; therefore, standardised speech coders can be used in a bit-stream-compatible manner. Furthermore, TSM can be used for error concealment in case of erroneous transmission or storage. If a frame is received erroneously, the adjacent frames can be time-scale expanded further in order to fill the gap introduced by the erroneous frame.
It has been shown that most of the problems accompanying time-scale companding occur during the unvoiced segments and voiced onsets that are present in a speech signal. In the output signal, the unvoiced sounds take on a tonal character, while voiced onsets become less gradual and smooth and are often smeared, especially when larger scale factors are used. The tonality in unvoiced sounds is introduced by the "repetition" mechanism which is inherently present in all time-domain algorithms. To overcome this problem, the present invention provides separate methods for expanding voiced and unvoiced speech. A method is provided for expansion of unvoiced speech, which is based on inserting an appropriately shaped noise sequence into the compressed unvoiced sequences. To avoid smearing of voiced onsets, the voiced onsets are excluded from TSM and are instead translated.
The combination of these concepts with SOLA has enabled the realisation of a time-scale companding system which outperforms traditional realisations that use a similar algorithm for both compression and expansion.
It will be appreciated that the introduction of a speech codec between the TSM stages may cause quality degradation, being more noticeable in proportion to the lowering of the bit-rate of the codec. When a particular codec and TSM are combined to produce a certain bit-rate, the resulting system performs worse than dedicated speech coders operating at a comparable bit-rate. At lower bit-rates, quality degradation is unacceptable. However, TSM can be beneficial in providing graceful degradation at higher bit-rates.
Although hereinbefore described with reference to one specific implementation it will be appreciated that several modifications are possible. Refinements of the proposed expansion method for unvoiced speech through deploying alternative ways of noise insertion and gain computation could be utilised.
Similarly, although the description of the invention is mainly addressed to time scale expanding a speech signal, the invention is further applicable to other signals such as but not limited to an audio signal.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Taori, Rakesh, Gerrits, Andreas Johannes, Burazerovic, Dzevdet
Patent | Priority | Assignee | Title |
10334384, | Feb 03 2015 | Dolby Laboratories Licensing Corporation | Scheduling playback of audio in a virtual acoustic space |
7853447, | Dec 08 2006 | Micro-Star Int'l Co., Ltd. | Method for varying speech speed |
8143620, | Dec 21 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for adaptive classification of audio sources |
8150065, | May 25 2006 | SAMSUNG ELECTRONICS CO , LTD | System and method for processing an audio signal |
8180064, | Dec 21 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for providing voice equalization |
8189766, | Jul 26 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for blind subband acoustic echo cancellation postfiltering |
8194880, | Jan 30 2006 | SAMSUNG ELECTRONICS CO , LTD | System and method for utilizing omni-directional microphones for speech enhancement |
8194882, | Feb 29 2008 | SAMSUNG ELECTRONICS CO , LTD | System and method for providing single microphone noise suppression fallback |
8204252, | Oct 10 2006 | SAMSUNG ELECTRONICS CO , LTD | System and method for providing close microphone adaptive array processing |
8204253, | Jun 30 2008 | SAMSUNG ELECTRONICS CO , LTD | Self calibration of audio device |
8259926, | Feb 23 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for 2-channel and 3-channel acoustic echo cancellation |
8345890, | Jan 05 2006 | SAMSUNG ELECTRONICS CO , LTD | System and method for utilizing inter-microphone level differences for speech enhancement |
8355511, | Mar 18 2008 | SAMSUNG ELECTRONICS CO , LTD | System and method for envelope-based acoustic echo cancellation |
8521530, | Jun 30 2008 | SAMSUNG ELECTRONICS CO , LTD | System and method for enhancing a monaural audio signal |
8744844, | Jul 06 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for adaptive intelligent noise suppression |
8774423, | Jun 30 2008 | SAMSUNG ELECTRONICS CO , LTD | System and method for controlling adaptivity of signal modification using a phantom coefficient |
8849231, | Aug 08 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for adaptive power control |
8867759, | Jan 05 2006 | SAMSUNG ELECTRONICS CO , LTD | System and method for utilizing inter-microphone level differences for speech enhancement |
8886525, | Jul 06 2007 | Knowles Electronics, LLC | System and method for adaptive intelligent noise suppression |
8934641, | May 25 2006 | SAMSUNG ELECTRONICS CO , LTD | Systems and methods for reconstructing decomposed audio signals |
8949120, | Apr 13 2009 | Knowles Electronics, LLC | Adaptive noise cancelation |
8996389, | Jun 14 2011 | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | Artifact reduction in time compression |
9008329, | Jun 09 2011 | Knowles Electronics, LLC | Noise reduction using multi-feature cluster tracker |
9015041, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9025777, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Audio signal decoder, audio signal encoder, encoded multi-channel audio signal representation, methods and computer program |
9043216, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Audio signal decoder, time warp contour data provider, method and computer program |
9076456, | Dec 21 2007 | SAMSUNG ELECTRONICS CO , LTD | System and method for providing voice equalization |
9185487, | Jun 30 2008 | Knowles Electronics, LLC | System and method for providing noise suppression utilizing null processing noise subtraction |
9263057, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9293149, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9293150, | Sep 12 2013 | International Business Machines Corporation | Smoothening the information density of spoken words in an audio signal |
9299363, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp contour calculator, audio signal encoder, encoded audio signal representation, methods and computer program |
9431026, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9466313, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9502049, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9536540, | Jul 19 2013 | SAMSUNG ELECTRONICS CO , LTD | Speech signal separation and synthesis based on auditory scene analysis and speech modeling |
9640194, | Oct 04 2012 | SAMSUNG ELECTRONICS CO , LTD | Noise suppression for speech processing based on machine-learning mask estimation |
9646632, | Jul 11 2008 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E V | Time warp activation signal provider, audio signal encoder, method for providing a time warp activation signal, method for encoding an audio signal and computer programs |
9799330, | Aug 28 2014 | SAMSUNG ELECTRONICS CO , LTD | Multi-sourced noise suppression |
9830899, | Apr 13 2009 | SAMSUNG ELECTRONICS CO , LTD | Adaptive noise cancellation |
Patent | Priority | Assignee | Title |
5809454, | Jun 30 1995 | Godo Kaisha IP Bridge 1 | Audio reproducing apparatus having voice speed converting function |
5828994, | Jun 05 1996 | Vulcan Patents LLC | Non-uniform time scale modification of recorded audio |
6070135, | Sep 30 1995 | QIANG TECHNOLOGIES, LLC | Method and apparatus for discriminating non-sounds and voiceless sounds of speech signals from each other |
6484137, | Oct 31 1997 | MATSUSHITA ELECTRIC INDUSTRIAL CO , LTD | Audio reproducing apparatus |
6718309, | Jul 26 2000 | SSI Corporation | Continuously variable time scale modification of digital audio signals |
EP817168, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 02 2002 | Koninklijke Philips Electronics N.V. | (assignment on the face of the patent) |
May 27 2002 | TAORI, RAKESH | Koninklijke Philips Electronics N V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 013079/0913 |
May 30 2002 | GERRITS, ANDREAS JOHANNES | Koninklijke Philips Electronics N V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 013079/0913 |
May 30 2002 | BURAZEROVIC, DZEVDET | Koninklijke Philips Electronics N V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 013079/0913 |
Date | Maintenance Fee Events |
Mar 26 2012 | REM: Maintenance Fee Reminder Mailed. |
Aug 12 2012 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |