An end-pointer determines a beginning and an end of a speech segment. The end-pointer includes a voice triggering module that identifies a portion of an audio stream that has an audio speech segment. A rule module communicates with the voice triggering module. The rule module includes a plurality of rules used to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment. A consonant detector detects occurrences of a high frequency consonant in the portion of the audio stream.
34. A system that determines a beginning and an end of an audio speech segment in an audio stream, comprising:
an /s/ detector that converts a difference between a signal-to-noise ratio in a high frequency band of the audio stream and a signal-to-noise ratio in a low frequency band of the audio stream into a probability value that predicts a likelihood of an /s/ sound in the audio stream; and
an end-pointer comprising a processor that varies an amount of an audio input sent to a recognition device based on a plurality of rules and an output of the /s/ detector;
where the end-pointer identifies a beginning of the audio input or an end of the audio input based on the output of the /s/ detector, and where the beginning of the audio input and the end of the audio input represent boundaries between speech and non-speech portions of the audio stream.
27. A system that identifies a beginning and an end of a speech segment comprising:
an end-pointer comprising a processor that analyzes a dynamic aspect of an audio stream to determine the beginning and the end of the speech segment; and
a high frequency consonant detector that marks the end of the speech segment, where the high frequency consonant detector calculates a difference between a signal-to-noise ratio in a high frequency band of the audio stream and a signal-to-noise ratio in a low frequency band of the audio stream, and where the high frequency consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood that a high frequency consonant exists in a frame of the audio stream;
where the beginning of the speech segment and the end of the speech segment represent boundaries between speech and non-speech portions of the audio stream, and where the end-pointer identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the high frequency consonant detector.
1. An end-pointer that determines a beginning and an end of a speech segment comprising:
a voice triggering module that identifies a portion of an audio stream comprising an audio speech segment;
a rule module in communication with the voice triggering module, the rule module comprising a plurality of rules used by a processor to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment; and
a consonant detector that calculates a difference between a signal-to-noise ratio in a high frequency band and a signal-to-noise ratio in a low frequency band, where the consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of a high frequency consonant in the portion of the audio stream;
where the beginning of the audio speech segment and the end of the audio speech segment represent boundaries between speech and non-speech portions of the audio stream, and where the rule module identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the consonant detector.
16. A method that identifies a beginning and an end of a speech segment using an end-pointer comprising:
receiving a portion of an audio stream;
determining whether the portion of the audio stream includes a triggering characteristic;
calculating a difference between a signal-to-noise ratio in a high frequency band of the portion of the audio stream and a signal-to-noise ratio in a low frequency band of the portion of the audio stream;
converting, by a consonant detector implemented in hardware or embodied in a computer-readable storage medium, the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of a high frequency consonant in the portion of the audio stream; and
applying a rule that passes only a portion of the audio stream to a device when the triggering characteristic identifies a beginning of a voiced segment and an end of a voiced segment;
where the identification of the end of the voiced segment is based on an output of the consonant detector, where the end of the voiced segment represents a boundary between speech and non-speech portions of the audio stream.
36. A non-transitory computer readable medium that stores software that determines at least one of a beginning and end of an audio speech segment comprising:
a detector that converts sound waves into operational signals;
a triggering logic that analyzes a periodicity of the operational signals;
a signal analysis logic that analyzes a variable portion of the sound waves that are associated with the audio speech segment to determine a beginning and end of the audio speech segment, and
a consonant detector that calculates a difference between a signal-to-noise ratio in a high frequency band and a signal-to-noise ratio in a low frequency band, where the consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of an /s/ sound in the sound waves, where the consonant detector provides an input to the signal analysis logic when the /s/ is detected;
where the beginning of the audio speech segment and the end of the audio speech segment represent boundaries between speech and non-speech portions of the sound waves, and where the signal analysis module identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the consonant detector.
5. The end-pointer of
6. The end-pointer of
7. The end-pointer of
8. The end-pointer of
10. The end-pointer of
11. The end-pointer of
12. The end-pointer of
13. The end-pointer of
14. The end-pointer of
where the consonant detector adds to the current probability value a portion of the difference between the current probability value and the probability value associated with the previous frame, upon determination that the current probability value is less than the probability value associated with the previous frame, where the consonant detector generates the portion of the difference by multiplying the difference by a percentage; and
where the smoothing factor is different than the percentage.
15. The end-pointer of
17. The method of
18. The method of
21. The method of
23. The method of
24. The method of
25. The method of
26. The method of
28. The system of
30. The system of
31. The system of
33. The system of
35. The system of
37. The non-transitory computer readable medium of
38. The non-transitory computer readable medium of
39. The non-transitory computer readable medium of
40. The non-transitory computer readable medium of
41. The non-transitory computer readable medium of
42. The non-transitory computer readable medium of
43. The non-transitory computer readable medium of
This application is a continuation-in-part of U.S. application Ser. No. 11/152,922 filed Jun. 15, 2005. The entire content of the application is incorporated herein by reference, except that in the event of any inconsistent disclosure from the present application, the disclosure herein shall be deemed to prevail.
1. Technical Field
These inventions relate to automatic speech recognition, and more particularly, to systems that distinguish speech from non-speech.
2. Related Art
Automatic speech recognition (ASR) systems convert recorded voice into commands that may be used to carry out tasks. Command recognition may be challenging in high-noise environments such as automobiles. One technique attempts to improve ASR performance by submitting only relevant data to an ASR system. Unfortunately, some techniques fail in non-stationary noise environments, where transient noises such as clicks, bumps, pops, and coughs trigger recognition errors. Therefore, a need exists for a system that identifies speech in noisy conditions.
An end-pointer determines a beginning and an end of a speech segment. The end-pointer includes a voice triggering module that identifies a portion of an audio stream that has an audio speech segment. A rule module communicates with the voice triggering module. The rule module includes a plurality of rules used to analyze a part of the audio stream to detect a beginning and end of an audio speech segment. A consonant detector detects occurrences of a high frequency consonant in the portion of the audio stream.
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.
The inventions can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like referenced numerals designate corresponding parts throughout the different views.
ASR systems are tasked with recognizing spoken commands. These tasks may be facilitated by sending voice segments to an ASR engine. A voice segment may be identified through end-pointing logic. Some end-pointing logic applies rules that identify the duration of consonants and pauses before and/or after a vowel. The rules may monitor a maximum duration of non-voiced energy, a maximum duration of continuous silence before a vowel, a maximum duration of continuous silence after a vowel, a maximum time before a vowel, a maximum time after a vowel, a maximum number of isolated non-voiced energy events before a vowel, and/or a maximum number of isolated non-voiced energy events after a vowel. When a vowel is detected, the end-pointing logic may follow a signal-to-noise (SNR) contour forward and backward in time. The limits of the end-pointing logic may occur when the amplitude reaches a predetermined level which may be zero or near zero. While searching, the logic identifies voiced and unvoiced intervals to be processed by an ASR engine.
Some end-pointers examine one or more characteristics of an audio stream for a triggering characteristic. A triggering characteristic may identify a speech interval that includes voiced or unvoiced segments. Voiced segments may have a near periodic structure in the time-domain like vowels. Non-voiced segments may have a noise-like structure (nonperiodic) in the time domain like a fricative. The end-pointers analyze one or more dynamic aspects of an audio stream. The dynamic aspects may include: (1) characteristics that reflect a speaker's pace (e.g., rate of speech), pitch, etc.; (2) a speaker's expected response (such as a “yes” or “no” response); and/or (3) environmental characteristics, such as a background noise level, echo, etc.
The local or remote memory 106 may buffer audio data received before or during an end-pointing process. The processor 104 may communicate through an input/output (I/O) interface 110 that receives input from devices that convert sound waves into electrical, optical, or operational signals 114. The I/O 110 may transmit these signals to devices 112 that convert signals into sound. The controller 104 and/or processor 104 may execute the software or code that implements each of the processes described herein, including those shown in the figures.
Initially, the process designates some or all of the initial frames as not speech at 304. When energy is detected, voicing analysis of the current frame, designated frame n, occurs at 306. The voicing analysis described in U.S. Ser. No. 11/131,150, filed May 17, 2005, which is incorporated herein by reference, may be used. The voicing analysis monitors triggering characteristics that may be present in frame n. The voicing analysis may detect higher frequency consonants such as an “s” or “x” in frame n. Alternatively, the voicing analysis may detect vowels. To explain the process further, a vowel triggering characteristic is described below.
Voicing analysis detects vowels in individual frames of the audio stream.
When the voicing analysis detects a vowel in frame n, frame n is marked as speech at 310. The system then processes one or more previous frames. A previous frame may be an immediately preceding frame, frame n−1, at 312. The system may determine whether the previous frame was previously marked as speech at 314. If the previous frame was marked as speech (e.g., answer of “Yes” to block 314), the system analyzes a new audio frame at 304. If the previous frame was not marked as speech (e.g., answer of “No” to 314), the process applies one or more rules to determine whether the frame should be marked as speech.
Decision block 316, “Outside EndPoint,” applies one or more rules to determine whether the frame should be marked as speech. The rules may be applied to any part of the audio segment, such as a frame or a group of frames. The rules may determine whether the current frame or frames contain speech. If speech is detected, the frame is designated within an end-point. If not, the frame is designated outside of the end-point.
If frame n−1 is outside of the end-point (e.g., no speech is present), a new audio frame, frame n+1, may be processed. It may be initially designated as non-speech at block 304. If the decision at 316 indicates that frame n−1 is within the end-point (e.g., speech is present), then frame n−1 is designated or marked as speech at 318. The previous audio stream is then analyzed until the last frame is read from a local or remote memory at 320.
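A brief sketch of this frame-marking flow is shown below. It is an illustrative assumption rather than the patented implementation: detect_vowel and outside_endpoint are hypothetical helpers standing in for the voicing analysis and the rule-based decision, and the block numbers in the comments refer to the process steps described above.

```python
def mark_speech_frames(frames, detect_vowel, outside_endpoint):
    """Illustrative end-pointing loop: frames default to non-speech; a detected
    vowel marks the frame as speech and earlier frames are then re-examined."""
    is_speech = [False] * len(frames)           # block 304: default to non-speech
    for n, frame in enumerate(frames):
        if not detect_vowel(frame):             # block 306: voicing analysis
            continue
        is_speech[n] = True                     # block 310: mark frame n as speech
        k = n - 1                               # block 312: look at frame n-1
        while k >= 0 and not is_speech[k]:      # block 314: stop at frames already marked
            if outside_endpoint(frames[k]):     # block 316: apply the rules
                break                           # outside the end-point: no speech present
            is_speech[k] = True                 # block 318: mark as speech
            k -= 1                              # block 320: continue with earlier frames
    return is_speech
```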
The rules may examine transitions into energy events from periods of silence or from periods of silence into energy events. A rule may analyze the number of transitions before a vowel is detected; for example, a rule may determine that speech includes no more than one transition between an unvoiced event or silence and a vowel. Other rules may analyze the number of transitions after a vowel is detected; for example, a rule may determine that speech includes no more than two transitions from an unvoiced event or silence after a vowel is detected.
One or more rules may be based on the occurrence of one or multiple events (e.g. voiced energy, un-voiced energy, an absence/presence of silence, etc.). A rule may analyze the time preceding an event. Some rules may be triggered by the lapse of time before a vowel is detected. A rule may expect a vowel to occur within a variable range such as about a 300 ms to 400 ms interval or a rule may expect a vowel to be detected within a predetermined time period (e.g., about 350 ms in some processes). Some rules determine a portion of speech intervals based on the time following an event. When a vowel is detected a rule may extend a speech interval by a fixed or variable length. In some processes the time period may comprise a range (e.g., about 400 ms to 800 ms in some processes) or a predetermined time limit (e.g., about 600 ms in some processes).
Some rules may examine the duration of an event. The rules may examine the duration of a detected energy (e.g., voiced or unvoiced) or the lack of energy. A rule may analyze the duration of continuous unvoiced energy. A rule may establish that continuous unvoiced energy may occur within a variable range (e.g., about 150 ms to about 300 ms in some processes), or may occur within a predetermined limit (e.g., about 200 ms in some processes). A rule may analyze the duration of continuous silence before a vowel is detected. A rule may establish that speech may include a period of continuous silence before a vowel is detected within a variable range (e.g., about 50 ms to about 80 ms in some processes) or at a predetermined limit (e.g., about 70 ms in some processes). A rule may analyze the time duration of continuous silence after a vowel is detected. Such a rule may establish that speech may include a duration of continuous silence after a vowel is detected within a variable range (e.g., about 200 ms to about 300 ms in some processes) or a rule may establish that silence occurs across a predetermined time limit (e.g., about 250 ms in some processes).
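As a rough illustration, these duration rules could be grouped into a set of configurable thresholds. The sketch below is only an assumption about how such parameters might be organized; the field names are invented here, and the default values are taken from the example ranges above rather than from any stated implementation.

```python
from dataclasses import dataclass

@dataclass
class EndpointRules:
    """Illustrative duration thresholds for the end-pointing rules (milliseconds)."""
    max_time_before_vowel_ms: int = 350      # vowel expected within ~300-400 ms
    extend_after_vowel_ms: int = 600         # extend interval ~400-800 ms after a vowel
    max_unvoiced_energy_ms: int = 200        # continuous unvoiced energy (~150-300 ms)
    max_silence_before_vowel_ms: int = 70    # continuous silence before a vowel (~50-80 ms)
    max_silence_after_vowel_ms: int = 250    # continuous silence after a vowel (~200-300 ms)
    max_transitions_before_vowel: int = 1    # unvoiced/silence-to-vowel transitions allowed
    max_transitions_after_vowel: int = 2     # transitions allowed after a vowel
```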
At 402, the process determines if a frame or group of frames has an energy level above a background noise level. A frame or group of frames having more energy than a background noise level may be analyzed based on its duration or its relationship to an event. If the frame or group of frames does not have more energy than a background noise level, then the frame or group of frames may be analyzed based on its duration or relationship to one or more events. In some systems the events may comprise a transition into energy events from periods of silence or a transition from periods of silence into energy events.
When energy is present in the frame or a group of frames, an “energy” counter is incremented at block 404. The “energy” counter tracks time intervals. It may be incremented by a frame length. If the frame size is about 32 ms, then block 404 may increment the “energy” counter by about 32 ms. At 406, the “energy” counter is compared to a threshold. The threshold may correspond to the continuous unvoiced energy rule, which may be used to determine the presence and/or absence of speech. If decision 406 determines that the threshold was exceeded, then the frame or group of frames is designated outside the end-point (e.g., no speech is present) at 408, at which point the system jumps back to block 304.
If the time threshold is not exceeded by the “energy” counter at 406, then the process determines if the “noenergy” counter exceeds an isolation threshold at 410. The “noenergy” counter 418 may track time and is incremented by the frame length when a frame or group of frames does not possess energy above a noise level. The isolation threshold may comprise a threshold of time between two plosive events. A plosive is a speech sound produced by a closure of the oral cavity and a subsequent release accompanied by a burst of air. Plosives may include the sounds /p/ in “pit” or /d/ in “dog.” An isolation threshold may vary within a range (e.g., about 10 ms to about 50 ms) or may be a predetermined value such as about 25 ms. If the isolation threshold is exceeded, an isolated unvoiced energy event (e.g., a plosive followed by silence) was identified, and the “isolatedevents” counter 412 is incremented. The “isolatedevents” counter 412 is incremented in integer values. After incrementing the “isolatedevents” counter 412, the “noenergy” counter 418 is reset at block 414. The “noenergy” counter may be reset due to the energy found within the frame or group of frames analyzed. If the “noenergy” counter 418 does not exceed the isolation threshold, the “noenergy” counter 418 is reset at block 414 without incrementing the “isolatedevents” counter 412. The “noenergy” counter 418 is reset because energy was found within the frame or group of frames analyzed. When the “noenergy” counter 418 is reset, the outside end-point analysis designates the frame or group of frames analyzed within the end-point (e.g., speech is present) by returning a “NO” value at 416. As a result, the system marks the analyzed frame(s) as speech at 318 or 322.
Alternatively, if the process determines that there is no energy above the noise level at 402, then the frame or group of frames analyzed contains silence or background noise. In this condition, the “noenergy” counter 418 is incremented. At 420, the process determines if the value of the “noenergy” counter exceeds a predetermined time threshold. The predetermined time threshold may correspond to the continuous non-voiced energy rule threshold, which may be used to determine the presence and/or absence of speech. At 420, the process evaluates the duration of continuous silence. If the process determines that the threshold is exceeded by the value of the “noenergy” counter at 420, then the frame or group of frames is designated outside the end-point (e.g., no speech is present) at block 408. The process then proceeds to block 304.
If no time threshold is exceeded by the value of the “noenergy” counter 418, then the process determines if the maximum number of allowed isolated events has occurred at 422. The maximum number of allowed isolated events is a configurable or programmed parameter. If a grammar is expected (e.g., a “yes” or “no” answer), the maximum number of allowed isolated events may be programmed to “tighten” the end-pointer's interval or band. If the maximum number of allowed isolated events is exceeded, then the frame or frames analyzed are designated as being outside the end-point (e.g., no speech is present) at block 408. The system then jumps back to block 304, where a new frame, frame n+1, is processed and marked as non-speech.
If the maximum number of allowed isolated events is not reached, the “energy” counter 404 is reset at block 424. The “energy” counter 404 may be reset when a frame with no energy is identified. When the “energy” counter 404 is reset, the outside end-point analysis designates the frame or frames analyzed inside the end-point (e.g., speech is present) by returning a “NO” value at block 416. The process then marks the analyzed frame as speech at 318 or 322.
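A compact sketch of the counter logic described above follows. It is a simplified assumption of how the “energy,” “noenergy,” and “isolatedevents” counters might interact; the thresholds mirror the example values given earlier, and the block numbers in the comments refer to the steps above.

```python
class OutsideEndpointCheck:
    """Illustrative 'Outside EndPoint' decision based on three counters."""

    def __init__(self, frame_ms=32, max_energy_ms=200, max_noenergy_ms=250,
                 isolation_ms=25, max_isolated_events=2):
        self.frame_ms = frame_ms
        self.max_energy_ms = max_energy_ms        # continuous unvoiced energy rule
        self.max_noenergy_ms = max_noenergy_ms    # continuous silence rule
        self.isolation_ms = isolation_ms          # gap marking an isolated plosive event
        self.max_isolated_events = max_isolated_events
        self.energy = 0            # "energy" counter, in ms
        self.noenergy = 0          # "noenergy" counter, in ms
        self.isolated_events = 0   # "isolatedevents" counter

    def is_outside(self, has_energy):
        """Return True when the frame is outside the end-point (no speech present)."""
        if has_energy:                                  # block 402: energy above noise level
            self.energy += self.frame_ms                # block 404: advance the energy counter
            if self.energy > self.max_energy_ms:        # block 406: unvoiced energy too long
                return True                             # block 408: outside the end-point
            if self.noenergy > self.isolation_ms:       # block 410: isolated unvoiced event found
                self.isolated_events += 1               # block 412
            self.noenergy = 0                           # block 414: energy found, reset counter
            return False                                # block 416: inside the end-point
        self.noenergy += self.frame_ms                  # silence or background noise only
        if self.noenergy > self.max_noenergy_ms:        # block 420: continuous silence too long
            return True
        if self.isolated_events > self.max_isolated_events:  # block 422: allowed events exceeded
            return True
        self.energy = 0                                 # block 424: reset the energy counter
        return False
```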
Block 512 illustrates how the end-pointer may respond to an input audio stream.
Some end-pointers determine the beginning and/or end of a speech segment by analyzing a dynamic aspect of an audio stream.
The global and local initializations may occur at various times throughout system operation. The background noise estimations (local aspect initialization) may occur during nonspeech intervals or when certain events occur such as when the system is powered up. The pace of a speaker's speech or pitch (global initialization) and monitoring of certain responses (local aspect initialization) may be initialized less frequently. Initialization may occur when an ASR engine communicates to an end-pointer or at other times.
During initialization periods 1002 and 1004, the end-pointer may operate at programmable default thresholds. If a threshold or timer needs to be changed, the system may dynamically change the thresholds or timing values. In some systems, thresholds, times, and other variables may be loaded into an end-pointer by reading specific or general user profiles from the system's local memory or a remote memory. These values and settings may also be changed in real time or near real time. If the system determines that a user speaks at a fast pace, the duration of certain rules may be changed and retained within the local or remote profiles. If the system uses a training mode, these parameters may also be programmed or set during a training session.
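One way to picture this profile-driven configuration is a simple lookup that overlays per-user values on the programmable defaults. The structure, function name, and keys below are illustrative assumptions only, not part of the described system.

```python
def load_endpointer_settings(profiles, user_id, defaults):
    """Illustrative sketch: start from default thresholds and overlay any stored
    per-user values (e.g., a faster speaking pace may shorten duration rules)."""
    settings = dict(defaults)                    # programmable default thresholds
    settings.update(profiles.get(user_id, {}))   # user-specific overrides, if any
    return settings

# Hypothetical usage with assumed keys and values:
defaults = {"max_silence_after_vowel_ms": 250, "max_unvoiced_energy_ms": 200}
profiles = {"fast_talker": {"max_silence_after_vowel_ms": 180}}
print(load_endpointer_settings(profiles, "fast_talker", defaults))
```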
The operation of some dynamic end-pointer processes may be similar to the processes described above.
An alternative end-pointer system includes a high frequency consonant detector or s-detector that detects high-frequency consonants. The high frequency consonant detector calculates the likelihood of a high-frequency consonant by comparing a temporally smoothed SNR in a high-frequency band to an SNR in one or more low frequency bands. Some systems select the low frequency bands from a predetermined plurality of lower frequency bands (e.g., two, three, four, five, etc. of the lower frequency bands). The difference between these SNR measurements is converted into a temporally smoothed probability through probability logic that generates a ratio between about zero and one hundred that predicts the likelihood of a consonant.
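The conversion from an SNR difference to a smoothed likelihood could look roughly like the sketch below. The logistic mapping, its constants, and the smoothing factor are assumptions; the description states only that the high-band versus low-band SNR difference is converted into a temporally smoothed value between about zero and one hundred.

```python
import math

def s_likelihood(snr_high_db, snr_low_db, prev_value=0.0,
                 smoothing=0.9, midpoint_db=6.0, slope=0.5):
    """Illustrative /s/ detector output: maps the high-band minus low-band SNR
    difference (in dB) to a 0-100 value and smooths it over successive frames."""
    diff_db = snr_high_db - snr_low_db
    raw = 100.0 / (1.0 + math.exp(-slope * (diff_db - midpoint_db)))  # 0..100 mapping
    return smoothing * prev_value + (1.0 - smoothing) * raw           # temporal smoothing
```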
One process may adjust the voice thresholds based on the detection of unvoiced speech, plosives, or a consonant such as an /s/.
In some processes the programmed number of audio frames comprises the difference between the originally stored frame number and the current frame number. In an alternative process, the programmed frame number comprises the number of frames occurring within a predetermined time period (e.g., a very short period, such as about 100 ms). In these processes the voice threshold is raised back to the previously stored voice threshold across that time period. In an alternative process, a counter tracks the number of frames processed, and the voice threshold is raised across a count of successive frames.
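A minimal sketch of this threshold recovery, assuming the threshold was lowered when a consonant was detected and assuming a linear ramp (the ramp shape is not specified above), might look like the following.

```python
def ramp_voice_threshold(stored_threshold, lowered_threshold,
                         frames_elapsed, ramp_frames):
    """Illustrative recovery of the voice threshold: after a high frequency
    consonant lowers it, raise it back to the previously stored value over a
    programmed number of frames."""
    if frames_elapsed >= ramp_frames:
        return stored_threshold                  # recovery complete
    step = (stored_threshold - lowered_threshold) / ramp_frames
    return lowered_threshold + step * frames_elapsed
```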
The methods described above may be encoded in software stored on a computer-readable medium.
A “computer-readable medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise any means that contains, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection “electronic” having one or more wires, a portable magnetic or optical disk, a volatile memory such as a Random Access Memory “RAM” (electronic), a Read-Only Memory “ROM” (electronic), an Erasable Programmable Read-Only Memory (EPROM or Flash memory) (electronic), or an optical fiber (optical). A machine-readable medium may also include a tangible medium upon which software is printed, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
While various embodiments of the inventions have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the inventions. Accordingly, the inventions are not to be restricted except in light of the attached claims and their equivalents.
Hetherington, Phillip A., Fallat, Mark