A method for varying speech speed is provided. The method includes the following steps: receive an original speech signal; calculate a pitch period of the original speech signal; define search ranges according to the pitch period; find a maximum within each of the search ranges of the original speech signal; divide the original speech signal into speech sections according to the maxima; obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command; and finally, output the speed-varied speech signal.
1. A method for varying speech speed, comprising the steps of:
receiving an original speech signal;
calculating, using a microprocessor, a pitch period of the original speech signal;
defining search ranges according to the pitch period;
finding a maximum within each of the search ranges of the original speech signal;
dividing the original speech signal into a plurality of speech sections according to the maxima;
obtaining a speed-varied speech signal by applying a speed-varying algorithm to each of the speech sections according to a speed-varying command; and
outputting the speed-varied speech signal;
wherein the speed-varying algorithm comprises the steps of:
multiplying each of the speech sections in the original speech signal by a weighting function to obtain a plurality of weighting sections; and
adding up the weighting sections;
wherein in each of the search ranges the weighting function is an increasing function when prior to the maximum but a decreasing function when posterior to the maximum;
wherein the weighting function is a triangular wave function; and
wherein if the speech sections have different sizes, the overlapped portion of the speech sections is multiplied by the weighting function, and the unoverlapped portion is not multiplied by the weighting function.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
This non-provisional application claims priority under 35 U.S.C. §119(a) on Patent Application No(s). 95145977 filed in Taiwan, R.O.C. on Dec. 8, 2006, the entire contents of which are hereby incorporated by reference.
1. Field of Invention
The present invention relates to a method for varying speech speed, and more particularly to a method that varies the speech speed based on the pitch period of the speech signal.
2. Related Art
For electronic apparatuses equipped with language-learning functions, the language conversations to be learned may be recorded in the apparatus in advance. The electronic apparatus may be portable, allowing the user to learn the language wherever and whenever desired. However, users are at different learning levels; a playing speed that lets some users understand a section of conversation may be too fast for others. Therefore, a so-called speed-varying function has become one of the major functions of the language-learning apparatus.
Speed variation means that the language-learning apparatus varies the playing speed on the user's demand while playing speech, keeping the same tone at every speed. Ideally, whether the speech is slowed down or sped up, users can still hear it clearly, which is helpful to language learning.
Although the conventional language-learning apparatus has the speed-varying function, the speech played through speed variation is usually distorted. Since the speech signal is a continuous analog signal, the voiceprint frequencies generated by different persons' pronunciations or by different sound sources are different. A common speed-varying technology is to repeatedly play the sampled speech data, or to play it intermittently at intervals, thereby achieving the speed-varying function. Such an approach provides decelerated or accelerated playing speeds and the same signal envelope as the original speech. However, it also generates echoes and machine noises and lowers the voiceprint frequency; the effect is like decelerating or accelerating the rotation speed of a recorder motor, which causes obvious distortions.
Therefore, how to maintain the tone of the original speech without distortion while the user operates the speed-varying function on a language-learning apparatus has become an issue that urgently needs to be solved.
Accordingly, the present invention provides a method for varying speech speed, which processes the speech signal so that playback of the speech can be decelerated or accelerated on the user's demand. The speech output to the user after speed variation is clear and does not lose its original tone.
A method for varying speech speed provided by an exemplary embodiment of the present invention includes the following steps. First, receive an original speech signal. Calculate a pitch period of the original speech signal. Define search ranges according to the pitch period. Find a maximum within each of the search ranges of the original speech signal. Divide the original speech signal into speech sections according to the maxima. Obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command. Finally, output the speed-varied speech signal.
According to the present invention, the original speech signal is first divided into plural speech sections. The sections are not of fixed size as in the conventional technology, but are defined according to the Sum of Magnitude Difference Function (SMDF) or the Average of Magnitude Difference Function (AMDF). The pitch period of the original speech signal is obtained in advance, and a maximum is then found in the data around each pitch period. Afterwards, the found maxima are used to divide the original speech signal into the plural speech sections. The advantage of this solution is that the speed variation process operates on the smallest unit in the speech signal, namely the pitch period. Therefore, the present invention uses a more precise solution to improve the quality of the speed variation.
Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
The present invention will become more fully understood from the detailed description given hereinbelow, which is for illustration only and thus is not limitative of the present invention, and wherein:
Please refer to
Step S10: Receive an original speech signal. The original speech signal is a spoken-language recording, such as an English or Japanese conversation.
Step S20: Calculate a pitch period of the original speech signal. The range of the human voice is about 50 Hz to 1000 Hz. Different people reading the same section of conversation produce different speech, because every person has a different voice timbre. Differences between voice timbres correspond to different waveform shapes within the pitch period. Accordingly, different speech signals have different pitch periods. Because each individual's voice timbre is unique, speech signals generated by the same person have approximately the same pitch period, even when the speech has different contents.
Please refer to
Please refer to
In addition, the above SMDF calculation yields smaller values as the overlapped waveform becomes shorter. To avoid this, a normalized SMDF can be used: divide the sum computed over the overlapped portion by the number of overlapped sample points to obtain the conventional AMDF (Average of Magnitude Difference Function). Either the SMDF or the AMDF may therefore be used to calculate the pitch period of the original speech signal.
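The normalization described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `amdf` and `estimate_pitch_period`, the 8000 Hz sample rate, and the synthetic test tone are assumptions chosen for illustration. The lag search limits follow the 50 Hz to 1000 Hz voice range mentioned in step S20.

```python
import math

def amdf(signal, lag):
    """Average magnitude difference between the signal and a copy shifted by `lag`."""
    overlap = len(signal) - lag  # number of overlapped sample points
    smdf = sum(abs(signal[i] - signal[i + lag]) for i in range(overlap))
    return smdf / overlap  # normalizing the sum (SMDF) gives the AMDF

def estimate_pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest AMDF value.

    On ties, min() keeps the first (shortest) lag it encounters.
    """
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(signal, lag))

# Assumed test input: an exactly periodic 100 Hz tone at 8000 Hz (80 samples per period).
sample_rate = 8000
one_period = [math.sin(2 * math.pi * n / 80) for n in range(80)]
tone = one_period * 10

# Lags of 8..160 samples correspond to 1000 Hz..50 Hz at this sample rate.
period = estimate_pitch_period(tone, sample_rate // 1000, sample_rate // 50)
print(period)
```

For real speech the AMDF minimum is rarely exactly zero, and multiples of the true period can score almost as well, so a practical detector would likely need extra safeguards that this sketch omits.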
Step S30: Define search ranges according to the pitch period calculated in step S20. Although a section of the original speech signal is composed of multiple pitch periods, high and low sounds still differ as a result of different speech contents (different contents of the spoken language). The pitch periods will therefore differ slightly in length. Consequently, after the pitch period is calculated, a search range is defined around each pitch period to facilitate the following search operations.
Step S40: Find a maximum within each of the search ranges of the original speech signal. Use each of the search ranges defined in step S30 as a unit to search in the original speech signal. Record the maximum found in each of the search ranges in the original speech signal.
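Steps S30 and S40 can be sketched as follows. This is an illustrative reading, not the patented procedure: the function name `find_pitch_marks` and the `tolerance` parameter (the half-width of each search range, in samples) are hypothetical, and the text does not specify how wide a search range should be.

```python
def find_pitch_marks(signal, period, tolerance):
    """Locate one maximum per pitch period.

    `tolerance` is an assumed half-width (in samples) of the search range
    placed around the position where the next maximum is expected.
    """
    # First mark: the largest sample within the first pitch period.
    first = max(range(min(period, len(signal))), key=lambda i: signal[i])
    marks = [first]
    while True:
        center = marks[-1] + period          # expected position of the next maximum
        if center >= len(signal):
            break
        lo = max(center - tolerance, marks[-1] + 1)
        hi = min(center + tolerance, len(signal) - 1)
        if lo > hi:
            break
        # Record the maximum found inside this search range.
        marks.append(max(range(lo, hi + 1), key=lambda i: signal[i]))
    return marks

# Assumed test input: peaks roughly every 10 samples, with slight jitter.
signal = [0.0] * 40
for peak in (3, 13, 24, 33):
    signal[peak] = 1.0

marks = find_pitch_marks(signal, period=10, tolerance=2)
print(marks)
```

Centering each search range on the previous maximum plus one pitch period lets the marks follow the small period-to-period variations described in step S30.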
Step S50: Divide the original speech signal into plural speech sections according to the maxima. Please refer to
Step S60: Obtain a speed-varied speech signal by applying a speed-varying algorithm to each speech section of the original speech signal according to a speed-varying command. The speed-varying command is given by the user. When the user finds that the speech signal is played too fast, a speed-varying command to decelerate may be given to the apparatus. When the speed-varying command is to decelerate, the speed-varying algorithm duplicates some of the speech sections to make the speed-varied speech signal longer than the original speech signal. Please refer to
Conversely, when the speed-varying command is to accelerate, the speed-varying algorithm deletes some of the speech sections to make the speed-varied speech signal shorter than the original speech signal. Please refer to
Step S70: Finally, output the speed-varied speech signal. The speed variation procedure is now complete.
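The division of step S50 and the duplication or deletion of step S60 can be sketched in Python as below. The sketch rests on assumptions: the helper names `split_at_marks` and `vary_speed` are hypothetical, and the numeric `rate` parameter is one possible encoding of the speed-varying command, which the text does not define in numeric form.

```python
def split_at_marks(signal, marks):
    """Cut the signal at each maximum so every sample lands in exactly one section."""
    bounds = [0] + list(marks) + [len(signal)]
    return [signal[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def vary_speed(sections, rate):
    """Rebuild a signal from whole sections.

    `rate` is an assumed encoding of the speed-varying command:
    rate > 1 accelerates (some sections are deleted),
    rate < 1 decelerates (some sections are duplicated).
    """
    output = []
    position = 0.0
    while int(position) < len(sections):
        output.extend(sections[int(position)])
        position += rate  # step through the sections faster or slower than 1:1
    return output

# Assumed test input: 12 samples with maxima recorded at indices 3, 6 and 9.
signal = list(range(12))
sections = split_at_marks(signal, [3, 6, 9])

faster = vary_speed(sections, 2.0)   # keeps every other section
slower = vary_speed(sections, 0.5)   # plays every section twice
print(len(signal), len(faster), len(slower))
```

Because whole sections are repeated or dropped, the content inside each section is played at its original rate; only the overall duration changes.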
Please refer to
Step S62: Multiply each of the speech sections in the original speech signal by a weighting function to obtain a weighting section. In each of the search ranges, the weighting function increases before the maximum and decreases after the maximum. Therefore, the weighting function may be a triangular wave function.
Step S64: Add up the weighting sections. Since each of the speech sections has been multiplied by the weighting function to become a weighting section, these weighting sections can then be added up according to the speed-varying command. The speed-varied speech signal will therefore be as clear as the original speech signal, without distortion. Neither intermittent sounds nor echoes will be generated.
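Steps S62 and S64 can be illustrated with a simplified Python sketch. It is an assumption-laden reading rather than the patented algorithm: the function name `add_up` is hypothetical, and the weighting is reduced to the two linear halves of a triangle, a falling ramp on one section and a rising ramp on the other. When the sections differ in size, only the overlapped portion is weighted and the remainder is copied unchanged, following the condition stated in claim 1.

```python
def add_up(section_a, section_b):
    """Blend two speech sections into one added-up weighting section.

    Over the overlapped length, section_a is multiplied by a falling linear
    ramp and section_b by a rising one; the two weights always sum to 1.
    Samples beyond the overlap (when sizes differ) are copied unweighted.
    """
    overlap = min(len(section_a), len(section_b))
    blended = []
    for i in range(overlap):
        rising = (i + 0.5) / overlap        # rising half of the triangle, in (0, 1)
        falling = 1.0 - rising              # falling half of the triangle
        blended.append(falling * section_a[i] + rising * section_b[i])
    longer = section_a if len(section_a) > len(section_b) else section_b
    blended.extend(longer[overlap:])        # unoverlapped portion, not weighted
    return blended

# Assumed test inputs.
same = add_up([1.0] * 4, [1.0] * 4)          # identical sections are preserved
fade = add_up([0.0] * 4, [8.0] * 4)          # gradual transition from a to b
uneven = add_up([2.0, 2.0], [2.0, 2.0, 9.0]) # extra sample passes through as-is
print(same, fade, uneven)
```

Because the two weights sum to one at every overlapped sample, the blended section moves smoothly from one section to the next instead of jumping at the boundary, which is one plausible reason such weighting avoids intermittent sounds.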
The aforesaid add-up speed-varying algorithm may further include the step of inserting the added-up weighting section between the speech sections. Please refer to
Conversely, the add-up speed-varying algorithm may further include another step of replacing the speech section(s) with the added-up weighting section(s). Please refer to
Eventually, please refer to
The invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.
Yen, Ming Hsiang, Yen, Jui Yu, Kao, Kuang Chien
Patent | Priority | Assignee | Title |
4864620, | Dec 21 1987 | DSP GROUP, INC , THE, A CA CORP | Method for performing time-scale modification of speech information or speech signals |
5175769, | Jul 23 1991 | Virentem Ventures, LLC | Method for time-scale modification of signals |
5341432, | Oct 06 1989 | Matsushita Electric Industrial Co., Ltd. | Apparatus and method for performing speech rate modification and improved fidelity |
5479564, | Aug 09 1991 | Nuance Communications, Inc | Method and apparatus for manipulating pitch and/or duration of a signal |
5717829, | Jul 28 1994 | Sony Corporation | Pitch control of memory addressing for changing speed of audio playback |
5749064, | Mar 01 1996 | Texas Instruments Incorporated | Method and system for time scale modification utilizing feature vectors about zero crossing points |
5828995, | Feb 28 1995 | Motorola, Inc. | Method and apparatus for intelligible fast forward and reverse playback of time-scale compressed voice messages |
6173255, | Aug 18 1998 | Lockheed Martin Corporation | Synchronized overlap add voice processing using windows and one bit correlators |
6496794, | Nov 22 1999 | Google Technology Holdings LLC | Method and apparatus for seamless multi-rate speech coding |
6718309, | Jul 26 2000 | SSI Corporation | Continuously variable time scale modification of digital audio signals |
6944510, | May 21 1999 | KONINKLIJKE PHILIPS ELECTRONICS, N V | Audio signal time scale modification |
6982377, | Dec 18 2003 | Texas Instruments Incorporated | Time-scale modification of music signals based on polyphase filterbanks and constrained time-domain processing |
7412379, | Apr 05 2001 | Koninklijke Philips Electronics N V | Time-scale modification of signals |
20020133334, | |||
20030033140, | |||
20050273321, | |||
20060149535, | |||
CN1197976, | |||
EP681398, | |||
EP910065, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 22 2007 | YEN, MING HSIANG | MICRO-STAR INT L CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 018900 | /0867 | |
Jan 22 2007 | YEN, JUI YU | MICRO-STAR INT L CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 018900 | /0867 | |
Jan 22 2007 | KAO, KUANG CHIEN | MICRO-STAR INT L CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 018900 | /0867 | |
Feb 16 2007 | Micro-Star Int'l Co., Ltd. | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Mar 05 2014 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Mar 06 2018 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Mar 09 2022 | M1553: Payment of Maintenance Fee, 12th Year, Large Entity. |
Date | Maintenance Schedule |
Dec 14 2013 | 4 years fee payment window open |
Jun 14 2014 | 6 months grace period start (w surcharge) |
Dec 14 2014 | patent expiry (for year 4) |
Dec 14 2016 | 2 years to revive unintentionally abandoned end. (for year 4) |
Dec 14 2017 | 8 years fee payment window open |
Jun 14 2018 | 6 months grace period start (w surcharge) |
Dec 14 2018 | patent expiry (for year 8) |
Dec 14 2020 | 2 years to revive unintentionally abandoned end. (for year 8) |
Dec 14 2021 | 12 years fee payment window open |
Jun 14 2022 | 6 months grace period start (w surcharge) |
Dec 14 2022 | patent expiry (for year 12) |
Dec 14 2024 | 2 years to revive unintentionally abandoned end. (for year 12) |