A wireless system comprises at least one subscriber unit in wireless communication with an infrastructure. Each subscriber unit implements a speech recognition client, and the infrastructure comprises a speech recognition server. A given subscriber unit takes as input an unencoded speech signal that is subsequently parameterized by the speech recognition client. The parameterized speech is then provided to the speech recognition server that, in turn, performs speech recognition analysis on the parameterized speech. Information signals, based in part upon any recognized utterances identified by the speech recognition analysis, are subsequently provided to the subscriber unit. The information signals may be used to control the subscriber unit itself, to control one or more devices coupled to the subscriber unit, or may be operated upon by the subscriber unit or devices coupled thereto.
0. 74. In a subscriber unit that wirelessly communicates with an infrastructure, the subscriber unit comprising a speech recognition client and the infrastructure comprising a speech recognition server, a method for providing information signals to the subscriber unit, the method comprising:
receiving, by the speech recognition client, an unencoded speech signal;
analyzing, by the speech recognition client, the unencoded speech signal to provide a parameterized speech signal;
transmitting, by the subscriber unit, the parameterized speech signal to the speech recognition server; and
receiving, by the subscriber unit from the infrastructure, the information signals based on the parameterized speech signal being associated with recognized utterances that comprise a query, wherein the information signals provide information responsive to the query.
0. 62. In a speech recognition server forming a part of an infrastructure that wirelessly communicates with one or more subscriber units, each of the one or more subscriber units comprising a speech recognition client, a method for providing information signals to a subscriber unit of the one or more subscriber units, the method comprising steps of:
receiving a parameterized speech signal, produced by the speech recognition client, from the subscriber unit;
performing speech recognition analysis on the parameterized speech signal to provide recognized utterances;
determining the recognized utterances comprise a query;
determining information signals based on the recognized utterances, wherein the information signals provide information responsive to the query; and
responsive to the recognized utterances, providing the information signals to the subscriber unit.
0. 57. A subscriber unit that wirelessly communicates with an infrastructure, the subscriber unit comprising:
a speech recognition client that takes as input an unencoded speech signal and analyzes that unencoded speech signal to provide a parameterized speech signal, performs a speech recognition analysis on the parameterized speech signal to provide recognized utterances, and determines whether the recognized utterances comprise a predetermined utterance;
a transmitter, coupled to the speech recognition client, that wirelessly communicates a second parameterized speech signal to the infrastructure based on the determination; and
a receiver that takes as input signals regarding information signals and provides the information signals as output, wherein the information signals are generated by a speech recognition server residing in the infrastructure and are based on the second parameterized speech signal.
0. 51. In a subscriber unit that wirelessly communicates with an infrastructure, the subscriber unit comprising a speech recognition client and the infrastructure comprising a speech recognition server, a method for providing information signals to the subscriber unit, the method comprising:
receiving, by the speech recognition client, an unencoded speech signal;
analyzing, by the speech recognition client, the unencoded speech signal to provide a parameterized speech signal;
analyzing, by the speech recognition client, the parameterized speech signal to provide recognized utterances;
determining, by the speech recognition client, the recognized utterances comprise a predetermined utterance;
transmitting, by the subscriber unit, a second parameterized speech signal to the speech recognition server based on the determination; and
receiving, by the subscriber unit from the infrastructure, the information signals based on the second parameterized speech signal.
0. 68. A speech recognition server for use in an infrastructure, wherein the infrastructure wirelessly communicates with one or more subscriber units, the speech recognition server comprising:
a receiver that takes as input signals regarding a parameterized speech signal output by a subscriber unit of the one or more subscriber units and provides as output the parameterized speech signal;
a speech recognition analyzer, coupled to the receiver, that performs speech recognition analysis on the parameterized speech signal to provide recognized utterances;
a control processor that takes as input the recognized utterances and provides information signals based on the recognized utterances by determining the recognized utterances comprise a query, wherein the information signals provide information responsive to the query; and
a transmitter, coupled to the control processor, that provides the information signals to the subscriber unit of the one or more subscriber units.
0. 1. In a speech recognition server forming a part of an infrastructure that wirelessly communicates with one or more subscriber units, each of the one or more subscriber units comprising a speech recognition client, a method for providing information signals to a subscriber unit of the one or more subscriber units, the method comprising steps of:
receiving a parameterized speech signal, produced by the speech recognition client, from the subscriber unit;
performing speech recognition analysis on the parameterized speech signal to provide recognized utterances;
determining information signals based on the recognized utterances; and
responsive to the recognized utterances, providing the information signals to the subscriber unit.
0. 2. The method of
0. 3. The method of
directing the information signals to the subscriber unit, wherein the information signals control operation of the subscriber unit.
0. 4. The method of
directing the information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 5. The method of
directing the information signals to the subscriber unit, wherein the subscriber unit operates upon the information signals.
0. 6. The method of
receiving user data in response to the information signals from the subscriber unit; and
responsive to the user data, providing additional information signals to the subscriber unit.
0. 7. The method of
directing the information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 8. The method of
receiving user data in response to the information signals from the subscriber unit; and
responsive to the user data, providing additional information signals to the subscriber unit.
0. 9. A computer-readable medium having computer-executable instructions for performing the steps recited in
0. 10. In a speech recognition server forming a part of an infrastructure that wirelessly communicates with one or more subscriber units, each of the one or more subscriber units comprising a speech recognition client, a method for providing information signals to a subscriber unit of the one or more subscriber units, the method comprising steps of:
receiving a parameterized speech signal from the subscriber unit, the parameterized speech signal being output by the speech recognition client;
performing speech recognition analysis on the parameterized speech signal to provide recognized utterances; and
providing information regarding the recognized utterances to a control entity forming a part of the infrastructure, wherein the control entity provides the information signals to the subscriber unit based on the information regarding the recognized utterances.
0. 11. The method of
0. 12. A computer-readable medium having computer-executable instructions for performing the steps recited in
0. 13. In a subscriber unit that wirelessly communicates with an infrastructure, the subscriber unit comprising a speech recognition client and the infrastructure comprising a speech recognition server, a method for providing information signals to the subscriber unit, the method comprising:
receiving, by the speech recognition client, an unencoded speech signal;
analyzing, by the speech recognition client, the unencoded speech signal to provide a parameterized speech signal;
transmitting, by the subscriber unit, the parameterized speech signal to the speech recognition server; and
receiving, by the subscriber unit from the infrastructure, the information signals based on the parameterized speech signal.
0. 14. The method of
using the information signals to control operation of the subscriber unit.
0. 15. The method of
based on the information signals, locally generating control signals for controlling operation of any of the subscriber unit and at least one device coupled to the subscriber unit.
0. 16. The method of
using the information signals to control operation of the at least one device.
0. 17. The method of
0. 18. The method of
operating upon the information signals.
0. 19. The method of
providing user data in response to the information signals to the infrastructure; and
responsive to the user data, receiving additional information signals from the infrastructure.
0. 20. The method of
operating, by the at least one device, upon the information signals.
0. 21. The method of
providing user data in response to the information signals to the infrastructure; and
responsive to the user data, receiving additional information signals from the infrastructure.
0. 22. The method of
0. 23. The method of
0. 24. A computer-readable medium having computer-executable instructions for performing the steps recited in
0. 25. In a wireless communications system comprising one or more subscriber units in wireless communication with an infrastructure, each of the one or more subscriber units comprising a speech recognition client and the infrastructure comprising a speech recognition server, a method for providing information signals to a subscriber unit of the one or more subscriber units, the method comprising steps of:
receiving, by the speech recognition client, an unencoded speech signal;
analyzing, by the speech recognition client, the unencoded speech signal to provide a parameterized speech signal;
transmitting, by the subscriber unit, the parameterized speech signal to the speech recognition server;
performing, by the speech recognition server, speech recognition analysis on the parameterized speech signal to provide recognized utterances;
determining, by the speech recognition server, the information signals based on the recognized utterances; and
responsive to the recognized utterances, providing, by the infrastructure, the information signals to the subscriber unit.
0. 26. The method of
based on the information signals, locally generating, by the subscriber unit, control signals for controlling operation of any of the subscriber unit and at least one device coupled to the subscriber unit.
0. 27. The method of
directing the information signals to the subscriber unit, wherein the information signals control operation of the subscriber unit.
0. 28. The method of
directing the information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 29. The method of
0. 30. The method of
directing the information signals to the subscriber unit, wherein the subscriber unit operates upon the information signals.
0. 31. The method of
receiving, by the infrastructure, user data in response to the information signals from the subscriber unit; and
responsive to the user data, providing, by the infrastructure, additional information signals to the subscriber unit.
0. 32. The method of
directing the information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 33. The method of
receiving, by the infrastructure, user data in response to the information signals from the subscriber unit; and
responsive to the user data, providing, by the infrastructure, additional information signals to the subscriber unit.
0. 34. The method of
0. 35. The method of
0. 36. A computer-readable medium having computer-executable instructions for performing the steps recited in
0. 37. A speech recognition server for use in an infrastructure of a wireless communication system, wherein the infrastructure wirelessly communicates with one or more subscriber units, the speech recognition server comprising:
a receiver that takes as input signals regarding a parameterized speech signal output by a subscriber unit of the one or more subscriber units and provides as output the parameterized speech signal;
a speech recognition analyzer, coupled to the receiver, that performs speech recognition analysis on the parameterized speech signal to provide recognized utterances; and
a transmitter, coupled to the speech recognition analyzer, that provides information regarding the recognized utterances to a control entity forming a part of the infrastructure, wherein the control entity provides information signals, based on the information regarding the recognized utterances, to the subscriber unit.
0. 38. A wireless communication system comprising the speech recognition server in accordance with
0. 39. A speech recognition server for use in an infrastructure, wherein the infrastructure wirelessly communicates with one or more subscriber units, the speech recognition server comprising:
a receiver that takes as input signals regarding a parameterized speech signal output by a subscriber unit of the one or more subscriber units and provides as output the parameterized speech signal;
a speech recognition analyzer, coupled to the receiver, that performs speech recognition analysis on the parameterized speech signal to provide recognized utterances;
a control processor that takes as input the recognized utterances and provides information signals based on the recognized utterances; and
a transmitter, coupled to the control processor, that provides the information signals to the subscriber unit of the one or more subscriber units.
0. 40. A wireless communication system comprising the speech recognition server in accordance with
0. 41. A subscriber unit that wirelessly communicates with an infrastructure, the subscriber unit comprising:
a speech recognition client that takes as input an unencoded speech signal and analyzes that unencoded speech signal to provide a parameterized speech signal;
a transmitter, coupled to the speech recognition client, that wirelessly communicates the parameterized speech signal to the infrastructure; and
a receiver that takes as input signals regarding information signals and provides the information signals as output, wherein the information signals are generated by a speech recognition server residing in the infrastructure and are based on the parameterized speech signal.
0. 42. The subscriber unit of
means, coupled to the receiver, for coupling the subscriber unit to at least one device, wherein the information signals are used to control operation of the at least one device.
0. 43. The subscriber unit of
0. 44. The subscriber unit of
0. 45. The subscriber unit of
means, based on the information signals, for locally generating control signals for controlling operation of any of the subscriber unit and at least one device coupled to the subscriber unit.
0. 46. The subscriber unit of
0. 47. The subscriber unit of
0. 48. The subscriber unit of
means, coupled to the receiver, for coupling the subscriber unit to at least one device, wherein the at least one device operates upon the information signals.
0. 49. The subscriber unit of
0. 50. The subscriber unit of
0. 52. The method of claim 51, wherein the received speech signal is an unencoded speech signal.
0. 53. The method of claim 51, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the received information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 54. The method of claim 51, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the received information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 55. The method of claim 51, wherein the predetermined utterance provides an indication of an interrupt.
0. 56. The method of claim 51, further comprising:
performing a speech recognition analysis on the second parameterized speech signal at the speech recognition server.
0. 58. The subscriber unit of claim 57, wherein the received speech signal is an unencoded speech signal.
0. 59. The subscriber unit of claim 57, wherein the subscriber unit is coupled to at least one device, and further wherein the information signals control operation of the at least one device.
0. 60. The subscriber unit of claim 57, wherein the subscriber unit is coupled to at least one device, and further wherein the at least one device operates upon the information signals.
0. 61. The subscriber unit of claim 57, wherein the predetermined utterance provides an indication of an interrupt.
0. 63. The method of claim 62, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 64. The method of claim 62, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 65. The method of claim 62, wherein the query comprises a query for an entity's telephone number.
0. 66. The method of claim 65, wherein the information signals comprise the entity's telephone number.
0. 67. The method of claim 66, wherein the information signals further comprise a control signal instructing the subscriber unit to dial the telephone number.
0. 69. The speech recognition server of claim 68, wherein the subscriber unit is coupled to at least one device, and further wherein the subscriber unit directs the information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 70. The speech recognition server of claim 68, wherein the subscriber unit is coupled to at least one device, and further wherein the subscriber unit directs the information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 71. The speech recognition server of claim 68, wherein the query comprises a query for an entity's telephone number.
0. 72. The speech recognition server of claim 71, wherein the information signals comprise the entity's telephone number.
0. 73. The speech recognition server of claim 72, wherein the information signals further comprise a control signal instructing the subscriber unit to dial the telephone number.
0. 75. The method of claim 74, wherein the received speech signal is an unencoded speech signal.
0. 76. The method of claim 74, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the received information signals to the at least one device, wherein the information signals control operation of the at least one device.
0. 77. The method of claim 74, wherein the subscriber unit is coupled to at least one device, the method further comprising:
directing the received information signals to the at least one device, wherein the at least one device operates upon the information signals.
0. 78. The method of claim 74, wherein the predetermined utterance provides an indication of an interrupt.
0. 79. The method of claim 74, further comprising:
performing a speech recognition analysis on the second parameterized speech signal at the speech recognition server.
0. 80. A non-transitory computer-readable medium having computer-executable instructions for performing the steps recited in claim 51.
0. 81. A non-transitory computer-readable medium having computer-executable instructions for performing the steps recited in claim 62.
0. 82. A non-transitory computer-readable medium having computer-executable instructions for performing the steps recited in claim 74.
The present application, Reissue application Ser. No. 13/891,273 filed on May 10, 2013, is a continuation of Reissue application Ser. No. 13/891,262, filed on May 10, 2013, which is a reissue application of U.S. Pat. No. 6,868,385.
The present invention relates generally to communication systems incorporating speech recognition and, in particular, to the provision of information signals to subscriber units and/or accompanying devices based upon speech recognition analysis.
Speech recognition systems are generally known in the art, particularly in relation to telephony systems. U.S. Pat. Nos. 4,914,692; 5,475,791; 5,708,704; and 5,765,130 illustrate exemplary telephone networks that incorporate speech recognition systems. A common feature of such systems is that the speech recognition element (i.e., the device or devices performing speech recognition) is typically centrally located within the fabric of the telephone network, as opposed to at the subscriber's communication device (i.e., the user's telephone). In a typical application, a combination of speech synthesis and speech recognition elements is deployed within a telephone network or infrastructure. Callers may access the system and, via the speech synthesis element, be presented with information prompts or queries in the form of synthesized speech. A caller will typically provide a spoken response to the synthesized speech and the speech recognition element will process the caller's spoken response in order to provide further service to the caller.
Although a substantial body of prior art exists regarding systems such as those described above, the incorporation of speech recognition systems into wireless communication systems is a relatively new development. In an effort to standardize the application of speech recognition in wireless communication environments, work has recently been initiated by the European Telecommunications Standards Institute (ETSI) on the so-called Aurora Project. A goal of the Aurora Project is to define a global standard for distributed speech recognition systems. Generally, the Aurora Project is proposing to establish a client-server arrangement in which front-end speech recognition processing, such as feature extraction or parameterization, is performed within a subscriber unit (e.g., a handheld wireless communication device such as a cellular telephone). The data provided by the front-end would then be conveyed to a server to perform back-end speech recognition processing.
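The front-end processing contemplated above can be outlined in code. The following sketch is purely illustrative: the function name, frame sizes, and the use of a simple per-frame log-energy parameter are assumptions made for exposition, whereas a real Aurora-style front end would extract mel-cepstral coefficients. It shows only the essential idea, namely that the client reduces raw speech samples to a compact sequence of parameters suitable for transmission.

```python
import math

def parameterize(samples, frame_len=80, hop=40):
    """Illustrative front end: split speech into overlapping frames and
    compute one log-energy parameter per frame. An Aurora-style front
    end would compute mel-cepstral coefficients instead."""
    params = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        params.append(math.log(energy + 1e-10))
    return params

# A parameterized sequence is far smaller than the raw sample stream,
# which is what makes conveying it over a wireless channel attractive.
speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
features = parameterize(speech)
```

Here 800 samples (100 ms at 8 kHz) collapse to 19 parameters, one per 5 ms hop.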
It is anticipated that the client-server arrangement being proposed by the Aurora Project will adequately address the need for a distributed speech recognition system. However, it is uncertain at this time what features and services the Aurora Project will enable. For example, efforts are currently under way to develop so-called telematics systems. Telematics systems may be broadly defined to include any technology concerning the delivery of information-based services to users and their devices in their vehicles. However, there do not appear to be any significant solutions defining how speech recognition technology can be incorporated into telematics systems. Thus, it would be advantageous to provide a technique incorporating speech recognition technology into telematics systems, as well as other systems, in order to enable various services.
The present invention provides a technique, principally applicable to wireless communication environments, for providing information to subscriber units based on speech recognition processing. In general, a wireless system in accordance with the present invention comprises at least one subscriber unit in wireless communication with an infrastructure. Preferably, each subscriber unit implements a speech recognition client, whereas the infrastructure comprises a speech recognition server. A given subscriber unit takes as input an unencoded speech signal that is subsequently parameterized by the speech recognition client. The parameterized speech is then provided to the speech recognition server which, in turn, performs speech recognition analysis on the parameterized speech. Information signals, based in part upon any recognized utterances identified by the speech recognition analysis, are subsequently provided to the subscriber unit. The information signals may comprise control signals used to control the subscriber unit itself or to control one or more devices coupled to the subscriber unit. Alternatively, the information signals may comprise data signals to be operated upon by the subscriber unit itself or devices coupled to the subscriber unit. Such data signals can be used to locally develop control signals, or may lead to the provision of additional user data to the speech recognition server which, in turn can respond with additional information signals as described above. In this manner, the present invention provides a technique for enabling services in wireless subscriber units based in part upon a client-server speech recognition model.
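The overall flow just summarized (client parameterizes, server recognizes, infrastructure returns information signals that are either control signals or data signals) can be sketched as follows. Every name and payload in this fragment is an illustrative assumption, not the patented implementation; in particular the canned recognition result stands in for a full back-end recognizer.

```python
from dataclasses import dataclass

@dataclass
class InformationSignal:
    kind: str      # "control": acted on directly; "data": operated upon
    payload: str

def server_recognize(parameterized_speech):
    """Stand-in for the speech recognition server's back end: maps a
    parameter sequence to a recognized utterance. A real server would
    run full speech recognition analysis here."""
    return "call home"  # canned utterance, for illustration only

def server_respond(utterance):
    """Turn a recognized utterance into information signals, as the
    infrastructure would before sending them to the subscriber unit."""
    if utterance == "call home":
        return [InformationSignal("data", "555-0100"),
                InformationSignal("control", "dial")]
    return []

signals = server_respond(server_recognize([0.1, 0.2]))
```

The subscriber unit would act on the "control" signal directly and operate upon the "data" signal, for example by displaying or dialing the number it carries.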
The present invention may be more fully described with reference to
The subscriber units may comprise any wireless communication device, such as a handheld cellphone 103 or a wireless communication device residing in a vehicle 102, capable of communicating with a communication infrastructure. It is understood that a variety of subscriber units, other than those shown in
The subscriber units 102-103 wirelessly communicate with the wireless system 110 via the wireless channel 105. The wireless system 110 preferably comprises a cellular system, although those having ordinary skill in the art will recognize that the present invention may be beneficially applied to other types of wireless systems supporting voice communications. The wireless channel 105 is typically a radio frequency (RF) carrier implementing digital transmission techniques and capable of conveying speech and/or data both to and from the subscriber units 102-103. It is understood that other transmission techniques, such as analog techniques, may also be used. In a preferred embodiment, the wireless channel 105 is a wireless packet data channel, such as the General Packet Radio Service (GPRS) defined by the European Telecommunications Standards Institute (ETSI). The wireless channel 105 transports data to facilitate communication between a client portion of the client-server speech recognition and synthesis system and the server portion of the client-server speech recognition and synthesis system. Other information, such as display, control, location, or status information, can also be transported across the wireless channel 105.
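One way the parameterized speech might be framed for transport over such a packet data channel is sketched below. The byte layout (a sequence number, a feature count, then 32-bit floats) is an assumption for exposition; it is not the GPRS bearer format or any standardized Aurora payload.

```python
import struct

def frame_packet(seq, features):
    """Pack one frame of parameterized speech into a binary payload:
    a 16-bit sequence number, a 16-bit count, then little-endian
    32-bit floats. Illustrative layout only."""
    return struct.pack("<HH%df" % len(features), seq, len(features), *features)

def unframe_packet(payload):
    """Recover the sequence number and feature list at the server side."""
    seq, count = struct.unpack_from("<HH", payload)
    features = struct.unpack_from("<%df" % count, payload, 4)
    return seq, list(features)

seq, feats = unframe_packet(frame_packet(7, [0.5, -1.25, 3.0]))
```

A sequence number of this kind lets the server detect lost or reordered packets, which matters on a wireless channel.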
The wireless system 110 comprises an antenna 112 that receives transmissions conveyed by the wireless channel 105 from the subscriber units 102-103. The antenna 112 also transmits to the subscriber units 102-103 via the wireless channel 105. Data received via the antenna 112 is converted to a data signal and transported to the wireless network 113. Conversely, data from the wireless network 113 is sent to the antenna 112 for transmission. In the context of the present invention, the wireless network 113 comprises those devices necessary to implement a wireless system, such as base stations, controllers, resource allocators, interfaces, and databases, as generally known in the art. As those having ordinary skill in the art will appreciate, the particular elements incorporated into the wireless network 113 are dependent upon the particular type of wireless system 110 used, e.g., a cellular system, a trunked land-mobile system, etc.
A speech recognition server 115 providing a server portion of a client-server speech recognition and synthesis system may be coupled to the wireless network 113 thereby allowing an operator of the wireless system 110 to provide speech-based services to users of the subscriber units 102-103. A control entity 116 may also be coupled to the wireless network 113. The control entity 116 can be used to send control signals, responsive to input provided by the speech recognition server 115, to the subscriber units 102-103 to control the subscriber units or devices interconnected to the subscriber units. As shown, the control entity 116, which may comprise any suitably programmed general purpose computer, may be coupled to the speech recognition server 115 either through the wireless network 113 or directly, as shown by the dashed interconnection.
As noted above, the infrastructure of the present invention can comprise a variety of systems 110, 120, 130, 140 coupled together via a data network 150. A suitable data network 150 may comprise a private data network using known network technologies, a public network such as the Internet, or a combination thereof. As alternatives to, or in addition to, the speech recognition server 115 within the wireless system 110, remote speech recognition servers 123, 132, 143, 145 may be connected in various ways to the data network 150 to provide speech-based services to the subscriber units 102-103. The remote speech recognition servers, when provided, are similarly capable of communicating with the control entity 116 through the data network 150 and any intervening communication paths.
A computer 122, such as a desktop personal computer or other general-purpose processing device, within a small entity system 120 (such as a small business or home) can be used to implement a speech recognition server 123. Data to and from the subscriber units 102-103 is routed through the wireless system 110 and the data network 150 to the computer 122. Executing stored software algorithms and processes, the computer 122 provides the functionality of the speech recognition server 123, which, in the preferred embodiment, includes the server portions of both a speech recognition system and a speech synthesis system. Where, for example, the computer 122 is a user's personal computer, the speech recognition server software on the computer can be coupled to the user's personal information residing on the computer, such as the user's email, telephone book, calendar, or other information. This configuration would allow the user of a subscriber unit to access personal information on their personal computer utilizing a voice-based interface. The client portions of the client-server speech recognition and speech synthesis systems in accordance with the present invention are described in conjunction with
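A server coupled to a user's personal information in this way could answer spoken queries of the kind recited in claims 65 through 67: a query for an entity's telephone number answered with information signals comprising the number plus a control signal instructing the subscriber unit to dial it. The sketch below is hypothetical; the phone-book dictionary, function name, and signal encoding are invented for illustration.

```python
# Hypothetical phone book residing on the user's personal computer.
PHONE_BOOK = {"acme corp": "555-0199", "dr. smith": "555-0123"}

def handle_query(recognized_utterance):
    """If the recognized utterance is a query for an entity's telephone
    number, return information signals carrying that number together
    with a control signal instructing the subscriber unit to dial it.
    Returns None when the utterance is not such a query."""
    prefix = "call "
    if recognized_utterance.startswith(prefix):
        entity = recognized_utterance[len(prefix):]
        number = PHONE_BOOK.get(entity)
        if number is not None:
            return {"telephone_number": number, "control": "dial"}
    return None

response = handle_query("call acme corp")
```

The "control" entry corresponds to the control signal of claim 67; the number itself is the data the subscriber unit operates upon.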
Alternatively, a content provider 130, which has information it would like to make available to users of subscriber units, can connect a speech recognition server 132 to the data network. Offered as a feature or special service, the speech recognition server 132 provides a voice-based interface to users of subscriber units desiring access to the content provider's information (not shown).
Another possible location for a speech recognition server is within an enterprise 140, such as a large corporation or similar entity. The enterprise's internal network 146, such as an Intranet, is connected to the data network 150 via a security gateway 142. The security gateway 142 provides, in conjunction with the subscriber units, secure access to the enterprise's internal network 146. As known in the art, the secure access provided in this manner typically relies, in part, upon authentication and encryption technologies. In this manner, secure communications between subscriber units and the internal network 146 via an unsecured data network 150 are provided. Within the enterprise 140, server software implementing a speech recognition server 145 can be provided on a personal computer 144, such as a given employee's workstation. Similar to the configuration described above for use in small entity systems, the workstation approach allows an employee to access work-related or other information through a voice-based interface. Also, similar to the content provider 130 model, the enterprise 140 can provide an internally available speech recognition server 143 to provide access to enterprise databases.
Regardless of where the speech recognition servers of the present invention are deployed, they can be used to implement a variety of speech-based services. For example, operating in conjunction with the control entity 116, when provided, the speech recognition servers enable operational control of subscriber units or devices coupled to the subscriber units. It should be noted that the term speech recognition server, as used throughout this description, is intended to include speech synthesis functionality as well.
The infrastructure of the present invention also provides interconnections between the subscriber units 102-103 and normal telephony systems. This is illustrated in
It is anticipated that the present invention can be applied with particular advantage to in-vehicle systems, as discussed below. When employed in-vehicle, a subscriber unit in accordance with the present invention also includes processing components that would generally be considered part of the vehicle and not part of the subscriber unit. For the purposes of describing the instant invention, it is assumed that such processing components are part of the subscriber unit. It is understood that an actual implementation of a subscriber unit may or may not include such processing components as dictated by design considerations. In a preferred embodiment, the processing components comprise a general-purpose processor (CPU) 201, such as a “POWER PC” by IBM Corp., and a digital signal processor (DSP) 202, such as a DSP56300 series processor by Motorola Inc. The CPU 201 and the DSP 202 are shown in contiguous fashion in
In a preferred embodiment, subscriber units also include a global positioning satellite (GPS) receiver 206 coupled to an antenna 207. The GPS receiver 206 is coupled to the DSP 202 to provide received GPS information. The DSP 202 takes information from the GPS receiver 206 and computes location coordinates of the wireless communications device. Alternatively, the GPS receiver 206 may provide location information directly to the CPU 201.
Various inputs and outputs of the CPU 201 and DSP 202 are illustrated in
In one embodiment of the present invention, the CPU 201 is coupled through a bi-directional interface 230 to an in-vehicle data bus 208. This data bus 208 allows control and status information to be communicated between various devices 209a-n in the vehicle, such as a cellphone, entertainment system, climate control system, etc. and the CPU 201. It is expected that a suitable data bus 208 will be an ITS Data Bus (IDB) currently in the process of being standardized by the Society of Automotive Engineers. Alternative means of communicating control and status information between various devices may be used such as the short-range, wireless data communication system being defined by the Bluetooth Special Interest Group (SIG). The data bus 208 allows the CPU 201 to control the devices 209 on the vehicle data bus in response to voice commands recognized either by a local speech recognizer or the client-server speech recognizer.
CPU 201 is coupled to the wireless data transceiver 203 via a receive data connection 231 and a transmit data connection 232. These connections 231-232 allow the CPU 201 to receive control information and speech-synthesis information sent from the wireless system 110. The speech-synthesis information is received from the server portion of a client-server speech synthesis system via the wireless data channel 105. The CPU 201 decodes the speech-synthesis information to provide the audio output 211. Any control information received via the receive data connection 231 may be used to control operation of the subscriber unit itself or sent to one or more of the devices in order to control their operation. Additionally, the CPU 201 can send status information, and the output data from the client portion of the client-server speech recognition system, to the wireless system 110. The client portion of the client-server speech recognition system is preferably implemented in software in the DSP 202 and the CPU 201, as described in greater detail below. When supporting speech recognition, the DSP 202 receives speech from the microphone input 220 and processes this audio to provide a parameterized speech signal to the CPU 201. The CPU 201 encodes the parameterized speech signal and sends this information to the wireless data transceiver 203 via the transmit data connection 232 to be sent over the wireless data channel 105 to a speech recognition server in the infrastructure.
The wireless voice transceiver 204 is coupled to the CPU 201 via a bi-directional data bus 233. This data bus allows the CPU 201 to control the operation of the wireless voice transceiver 204 and receive status information from the wireless voice transceiver 204. The wireless voice transceiver 204 is also coupled to the DSP 202 via a transmit audio connection 221 and a receive audio connection 210. When the wireless voice transceiver 204 is being used to facilitate a telephone (cellular) call, audio is received from the microphone input 220 by the DSP 202. The microphone audio is processed (e.g., filtered, compressed, etc.) and provided to the wireless voice transceiver 204 to be transmitted to the cellular infrastructure. Conversely, audio received by the wireless voice transceiver 204 is sent via the receive audio connection 210 to the DSP 202 where the audio is processed (e.g., decompressed, filtered, etc.) and provided to the speaker output 211. The processing performed by the DSP 202 will be described in greater detail with regard to
The subscriber unit illustrated in
Finally, the subscriber unit is preferably equipped with an annunciator 255 for providing an indication to a user of the subscriber unit in response to annunciator control 256 that the speech recognition functionality has been activated in response to the interrupt indicator. The annunciator 255 is activated in response to the detection of the interrupt indicator, and may comprise a speaker used to provide an audible indication, such as a limited-duration tone or beep. (Again, the presence of the interrupt indicator can be signaled using either the input device-based signal 260 or the speech-based signal 260a.) In another implementation, the functionality of the annunciator is provided via a software program executed by the DSP 202 that directs audio to the speaker output 211. The speaker may be separate from or the same as the speaker 271 used to render the audio output 211 audible. Alternatively, the annunciator 255 may comprise a display device, such as an LED or LCD display, that provides a visual indicator. The particular form of the annunciator 255 is a matter of design choice, and the present invention need not be limited in this regard. Further still, the annunciator 255 may be connected to the CPU 201 via the bi-directional interface 230 and the in-vehicle data bus 208.
Referring now to
Microphone audio 220 is provided as an input to the subscriber unit. In an automotive environment, the microphone would be a hands-free microphone typically mounted on or near the visor or steering column of the vehicle. Preferably, the microphone audio 220 arrives at the echo cancellation and environmental processing (ECEP) block 301 in digital form. The speaker audio 211 is delivered to the speaker(s) by the ECEP block 301 after undergoing any necessary processing. In a vehicle, such speakers can be mounted under the dashboard. Alternatively, the speaker audio 211 can be routed through an in-vehicle entertainment system to be played through the entertainment system's speaker system. The speaker audio 211 is preferably in a digital format. When a cellular phone call, for example, is in progress, received audio from the cellular phone arrives at the ECEP block 301 via the receive audio connection 210. Likewise, transmit audio is delivered to the cell phone over the transmit audio connection 221.
The ECEP block 301 provides echo cancellation of speaker audio 211 from the microphone audio 220 before delivery, via the transmit audio connection 221, to the wireless voice transceiver 204. This form of echo cancellation is known as acoustic echo cancellation and is well known in the art. For example, U.S. Pat. No. 5,136,599 issued to Amano et al. and titled “Sub-band Acoustic Echo Canceller”, and U.S. Pat. No. 5,561,668 issued to Genter and entitled “Echo Canceler with Subband Attenuation and Noise Injection Control” teach suitable techniques for performing acoustic echo cancellation, the teachings of which patents are hereby incorporated by this reference.
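The adaptive cancellation described above can be sketched in simplified form. The cited patents teach subband techniques; the following is only a minimal time-domain NLMS (normalized least-mean-squares) sketch of the same idea, in which a filter learns the speaker-to-microphone echo path and subtracts the predicted echo from the microphone signal. All parameter values are illustrative assumptions.

```python
import numpy as np

def nlms_echo_cancel(mic, speaker, taps=64, mu=0.5, eps=1e-8):
    """Cancel the speaker 'echo' from the microphone signal with an
    NLMS adaptive filter (illustrative; not the subband methods of
    the Amano or Genter patents)."""
    w = np.zeros(taps)                 # adaptive estimate of the echo path
    x = np.zeros(taps)                 # most recent speaker samples
    out = np.zeros(len(mic), dtype=float)
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = speaker[n]
        echo_est = w @ x               # predicted echo at the microphone
        e = mic[n] - echo_est          # error = echo-cancelled output
        w += mu * e * x / (x @ x + eps)  # normalized weight update
        out[n] = e
    return out
```

In the subscriber unit of the description, the reference signal would be the speaker audio 211 and the primary signal the microphone audio 220, with the cancelled output forwarded over the transmit audio connection 221.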
The ECEP block 301 also provides, in addition to echo-cancellation, environmental processing to the microphone audio 220 in order to provide a more pleasant voice signal to the party receiving the audio transmitted by the subscriber unit. One technique that is commonly used is called noise suppression. The hands-free microphone in a vehicle will typically pick up many types of acoustic noise that will be heard by the other party. This technique reduces the perceived background noise that the other party hears and is described, for example, in U.S. Pat. No. 4,811,404 issued to Vilmur et al., the teachings of which patent are hereby incorporated by this reference.
The ECEP block 301 also provides echo-cancellation processing of synthesized speech provided by the speech-synthesis back end 304 via a first audio path 316, which synthesized speech is to be delivered to the speaker(s) via the audio output 211. As in the case with received voice routed to the speaker(s), the speaker audio “echo” which arrives on the microphone audio path 220 is cancelled out. This allows speaker audio that is acoustically coupled to the microphone to be eliminated from the microphone audio before being delivered to the speech recognition front end 302. This type of processing enables what is known in the art as “barge-in”. Barge-in allows a speech recognition system to respond to input speech while output speech is simultaneously being generated by the system. Examples of “barge-in” implementations can be found, for example, in U.S. Pat. Nos. 4,914,692; 5,475,791; 5,708,704; and 5,765,130.
Echo-cancelled microphone audio is supplied to a speech recognition front end 302 via a second audio path 326 whenever speech recognition processing is being performed. Optionally, ECEP block 301 provides background noise information to the speech recognition front end 302 via a first data path 327. This background noise information can be used to improve recognition performance for speech recognition systems operating in noisy environments. A suitable technique for performing such processing is described in U.S. Pat. No. 4,918,732 issued to Gerson et al., the teachings of which patent are hereby incorporated by this reference.
Based on the echo-cancelled microphone audio and, optionally, the background noise information received from the ECEP block 301, the speech recognition front-end 302 generates parameterized speech information. Together, the speech recognition front-end 302 and the speech synthesis back-end 304 provide the core functionality of a client-side portion of a client-server based speech recognition and synthesis system. Parameterized speech information is typically in the form of feature vectors, where a new vector is computed every 10 to 20 msec. One commonly used technique for the parameterization of a speech signal is mel cepstra as described by Davis et al. in “Comparison Of Parametric Representations For Monosyllabic Word Recognition In Continuously Spoken Sentences,” IEEE Transactions on Acoustics Speech and Signal Processing, ASSP-28 (4), pp. 357-366, August 1980, the teachings of which publication are hereby incorporated by this reference.
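The mel cepstra parameterization cited above can be illustrated for a single frame. This sketch follows the standard recipe (power spectrum, triangular mel filterbank, log, DCT); the sample rate, filter count, and vector length are assumed values, not figures taken from the description.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mel_cepstrum(frame, sr=8000, n_filters=20, n_ceps=13):
    """One feature vector from one 10-20 ms frame of speech."""
    spec = np.abs(np.fft.rfft(frame)) ** 2                  # power spectrum
    energies = mel_filterbank(n_filters, len(frame), sr) @ spec
    log_e = np.log(energies + 1e-10)                        # log filterbank energies
    # DCT-II of the log energies yields the mel cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_e
```

Computed every 10 to 20 msec as the text states, such vectors form the parameterized speech information passed from the front-end 302 toward the local or server-side recognizer.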
The parameter vectors computed by the speech recognition front-end 302 are passed to a local speech recognition block 303 via a second data path 325 for local speech recognition processing. The parameter vectors are also optionally passed, via a third data path 323, to a protocol processing block 306 comprising speech application protocol interfaces (API's) and data protocols. In accordance with known techniques, the processing block 306 sends the parameter vectors to the wireless data transceiver 203 via the transmit data connection 232. In turn, the wireless data transceiver 203 conveys the parameter vectors to a server functioning as a part of the client-server based speech recognizer. While a single speech recognition front-end 302 is shown, the local speech recognizer 303 and the client-server based speech recognizer may in fact utilize different speech recognition front-ends.
The local speech recognizer 303 receives the parameter vectors 325 from the speech recognition front-end 302 and performs speech recognition analysis thereon, for example, to determine whether there are any recognizable utterances within the parameterized speech. In one embodiment, the recognized utterances (typically, words) are sent from the local speech recognizer 303 to the protocol processing block 306 via a fourth data path 324, which in turn passes the recognized utterances to various applications 307 for further processing. The applications 307, which may be implemented using either or both of the CPU 201 and DSP 202, can include a detector application that, based on recognized utterances, ascertains that a speech-based interrupt indicator has been received. For example, the detector compares the recognized utterances against a list of predetermined utterances (e.g., “wake up”) searching for a match. When a match is detected, the detector application issues a signal 260a signifying the presence of the interrupt indicator. The presence of the interrupt indicator, in turn, is used to activate a portion of the speech recognition element to begin processing voice-based commands. This is schematically illustrated in
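The detector application's matching step can be sketched directly from the description: the recognized utterances are compared against a predetermined list, and a match raises signal 260a. The phrase list beyond "wake up" is a hypothetical example.

```python
# Predetermined utterance list; only "wake up" appears in the description,
# the second phrase is a hypothetical addition.
WAKE_PHRASES = {"wake up", "hello device"}

def detect_interrupt(recognized_utterances):
    """Return True (i.e., assert signal 260a) when the recognized
    utterances contain a predetermined wake phrase."""
    text = " ".join(w.lower() for w in recognized_utterances)
    return any(phrase in text for phrase in WAKE_PHRASES)
```

In the architecture described, a True result would activate the annunciator 255 and enable the voice-command portion of the speech recognition element.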
The speech synthesis back end 304 takes as input a parametric representation of speech and converts the parametric representation to a speech signal which is then delivered to ECEP block 301 via the first audio path 316. The particular parametric representation used is a matter of design choice. One commonly used parametric representation is formant parameters as described in Klatt, “Software For A Cascade/Parallel Formant Synthesizer”, Journal of the Acoustical Society of America, Vol. 67, 1980, pp. 971-995. Linear prediction parameters are another commonly used parametric representation as discussed in Markel et al., Linear Prediction of Speech, Springer Verlag, New York, 1976. The respective teachings of the Klatt and Markel et al. publications are incorporated herein by this reference.
In the case of client-server based speech synthesis, the parametric representation of speech is received from the network via the wireless channel 105, the wireless data transceiver 203 and the protocol processing block 306, where it is forwarded to the speech synthesis back-end via a fifth data path 313. In the case of local speech synthesis, an application 307 would generate a text string to be spoken. This text string would be passed through the protocol processing block 306 via a sixth data path 314 to a local speech synthesizer 305. The local speech synthesizer 305 converts the text string into a parametric representation of the speech signal and passes this parametric representation via a seventh data path 315 to the speech synthesis back-end 304 for conversion to a speech signal.
It should be noted that the receive data connection 231 can be used to transport other received information in addition to speech synthesis information. For example, the other received information may include data (such as display information) and/or control information received from the infrastructure, and code to be downloaded into the system. Likewise, the transmit data connection 232 can be used to transport other transmit information in addition to the parameter vectors computed by the speech recognition front-end 302. For example, the other transmit information may include device status information, device capabilities, and information related to barge-in timing.
Referring now to
A network interface 405 provides connectivity between a CPU 401 and the network connection 411. The network interface 405 routes data from the network 411 to CPU 401 via a receive path 408, and from the CPU 401 to the network connection 411 via a transmit path 410. As part of a client-server arrangement, the CPU 401 communicates with one or more clients (preferably implemented in subscriber units) via the network interface 405 and the network connection 411. In a preferred embodiment, the CPU 401 implements the server portion of the client-server speech recognition and synthesis system. Although not shown, the server illustrated in
A memory 403 stores machine-readable instructions (software) and program data for execution and use by the CPU 401 in implementing the server portion of the client-server arrangement. The operation and structure of this software is further described with reference to
As part of a client-server speech recognition arrangement, the speech recognition analyzer 504 takes speech recognition parameter vectors from a subscriber unit and completes the recognition processing. Recognized words or utterances 507 are then passed to the local control processor 508. A description of the processing required to convert parameter vectors to recognized utterances can be found in Lee et al., “Automatic Speech Recognition: The Development of the Sphinx System”, 1988, the teachings of which publication are herein incorporated by this reference.
The local control processor 508 receives the recognized utterances 507 from the speech recognition analyzer 504 and other information 506. Generally, the present invention requires a control processor to operate upon the recognized utterances and, based on the recognized utterances, provide control signals. In a preferred embodiment, these control signals are used to subsequently control the operation of a subscriber unit or at least one device coupled to a subscriber unit. To this end, the local control processor may preferably operate in one of two manners. First, the local control processor 508 can implement application programs. One example of a typical application is an electronic assistant as described in U.S. Pat. No. 5,652,789. Alternatively, such applications can run remotely on a remote control processor 516. For example, in the system of
The application program running either on the remote control processor 516 or the local control processor 508 determines a response to the recognized utterances 507 and/or the other information 506. Preferably, the response may comprise a synthesized message and/or control signals. Control signals 513 are relayed from the local control processor 508 to a transmitter (TX) 510. Information 514 to be synthesized, typically text information, is sent from the local control processor 508 to a text-to-speech analyzer 512. The text-to-speech analyzer 512 converts the input text string into a parametric speech representation. A suitable technique for performing such a conversion is described in Sproat (editor), “Multilingual Text-To-Speech Synthesis: The Bell Labs Approach”, 1997, the teachings of which publication are incorporated herein by this reference. The parametric speech representation 511 from the text-to-speech analyzer 512 is provided to the transmitter 510 that multiplexes, as necessary, the parametric speech representation 511 and the control information 513 over the transmit path 410 for transmission to a subscriber unit. Operating in the same manner just described, the text-to-speech analyzer 512 may also be used to provide synthesized prompts or the like to be played as an output audio signal at a subscriber unit.
Referring now to
As noted above, information signals in the context of the instant invention may comprise data signals that may be operated upon by a subscriber unit or devices coupled thereto, or control signals that may be used to control operation of the subscriber unit or its associated devices. In order to provide information signals in response to the recognized utterances, the recognized utterances may be processed in one of two ways. According to the first method, illustrated by steps 603 and 604, the speech recognition server (e.g., through the local control processor 508) first determines information signals based upon the recognized utterances. For example, this could be done through the use of lookup-tables, pattern matching and/or similar mechanisms to correlate a specific recognized utterance or string of utterances to one or more predefined information signals. For example, if the recognized utterances comprise a query regarding a certain party's phone number, the information signal provided in response may comprise that party's phone number ascertained by accessing a data base of phone numbers indexed by names. As another example, if the recognized utterances comprise an instruction to establish a telephone call with a named party, the information signal may comprise the relevant party's phone number, as determined from a data base, and a control signal instructing the subscriber unit to dial the party's phone number, as determined by yet another data base indexed by command content. A large variety of similar scenarios are readily identifiable by those having ordinary skill in the art. Regardless of the method used, the speech recognition server subsequently provides the resulting information signals to the subscriber unit.
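The lookup-table correlation just described can be sketched in a few lines. The description only says "lookup-tables, pattern matching and/or similar mechanisms"; the table contents and the signal tuples below are hypothetical illustrations of how a recognized utterance string could map to a data signal (a phone number) plus a control signal (a dial command).

```python
# Hypothetical data bases: names indexed to numbers, and command words
# indexed to control signals, per the two examples in the description.
PHONE_BOOK = {"alice": "555-0101", "bob": "555-0102"}
COMMANDS = {"call": "DIAL"}

def information_signals(utterance):
    """Correlate a recognized utterance string to predefined
    information signals: ("CONTROL", ...) and/or ("DATA", ...)."""
    signals = []
    for word in utterance.lower().split():
        if word in COMMANDS:
            signals.append(("CONTROL", COMMANDS[word]))   # e.g., dial instruction
        if word in PHONE_BOOK:
            signals.append(("DATA", PHONE_BOOK[word]))    # e.g., the party's number
    return signals
```

A pure query ("what is Alice's number") would yield only a data signal, while a command ("call Bob") yields both the number and the control signal instructing the subscriber unit to dial it.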
In the second method, the speech recognition server, rather than directly determining any control signals, provides the information regarding the recognized utterances to a control entity or remote control processor at step 605. In this manner, the control entity or remote control processor can perform the same processing as described above relative to steps 603 and 604, after which the control entity or remote control processor will route the information signals directly to the subscriber unit. Regardless of the method used, the speech recognition server of the present invention facilitates the provision of information signals to subscriber units in wireless communication systems.
Optional steps 606 and 607 are also illustrated in
Referring now to
At step 704, the subscriber unit receives the information signals, if any, which are based upon the parameterized speech signal. Consequently, at steps 705 and 706, the information signals are operated upon by, or used to control operation of, the subscriber unit itself or any devices coupled to the subscriber unit, as might be the case with an in-vehicle system. It should be noted that when the information signals comprise data, the data can be used to locally generate (i.e., at the subscriber unit) control signals. For example, receipt of a phone number from the infrastructure can be used to trigger a control signal instructing the subscriber unit to dial the phone number. Alternatively, receipt of a voice prompt to be rendered audible may cause the generation of a control signal instructing a stereo coupled to the subscriber unit to reduce the volume of, or mute altogether, its current audio output. Other examples incorporating such functionality are readily identifiable.
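The local generation of control signals from received data, as in the two examples just given, can be sketched as a simple mapping at the subscriber unit. The signal names and the received-data fields are assumptions for illustration only.

```python
def local_control_signals(received):
    """Locally generate control signals (step 705/706) from data
    received at step 704; field and signal names are hypothetical."""
    controls = []
    if received.get("phone_number"):
        # a received number triggers a dial instruction
        controls.append(("DIAL", received["phone_number"]))
    if received.get("voice_prompt"):
        # an incoming prompt first mutes the coupled stereo's output
        controls.append(("STEREO_VOLUME", "MUTE"))
    return controls
```

In an in-vehicle system, such control signals would be delivered to the devices 209a-n over the in-vehicle data bus 208 described earlier.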
Furthermore, optional steps 707 and 708 correspond to, and are the complement of, steps 606 and 607 described above. In particular, at step 707, the subscriber unit provides user data to the infrastructure (i.e., the speech recognition server and/or a control entity). Again, the user data provided at step 707 is responsive to the previously received information signals. At step 708, the subscriber unit receives additional information signals, comprising data and/or control signals, from the infrastructure, which control signals may be operated upon or used to control the subscriber unit or any devices coupled to the subscriber unit.
The present invention as described above provides a unique technique for providing control signals to subscriber units in a wireless communication system. Relying in part upon a client-server speech recognition arrangement, the present invention provides an efficient method for supplying information signals to subscriber units. As a result, the present invention can be used to enable services. What has been described above is merely illustrative of the application of the principles of the present invention. Other arrangements and methods can be implemented by those skilled in the art without departing from the spirit and scope of the present invention.
Patent | Priority | Assignee | Title |
4914692, | Dec 29 1987 | THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT | Automatic speech recognition using echo cancellation |
5155760, | Jun 26 1991 | THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT | Voice messaging system with voice activated prompt interrupt |
5199062, | Jan 20 1988 | Phone Base Systems Inc. | Telephone communications system including a digital telephone switch, a voice response unit and a stored program sequence for controlling both the switch and the voice response unit |
5475791, | Aug 13 1993 | Nuance Communications, Inc | Method for recognizing a spoken word in the presence of interfering speech |
5652789, | Sep 30 1994 | ORANGE S A | Network based knowledgeable assistant |
5703881, | Dec 06 1990 | Hughes Electronics Corporation | Multi-subscriber unit for radio communication system and method |
5708704, | Apr 07 1995 | Texas Instruments Incorporated | Speech recognition method and system with improved voice-activated prompt interrupt capability |
5742929, | Apr 21 1992 | Teliasonera AB | Arrangement for comparing subjective dialogue quality in mobile telephone systems |
5758317, | Oct 04 1993 | MOTOROLA SOLUTIONS, INC | Method for voice-based affiliation of an operator identification code to a communication unit |
5765130, | May 21 1996 | SPEECHWORKS INTERNATIONAL, INC | Method and apparatus for facilitating speech barge-in in connection with voice recognition systems |
5768308, | Dec 19 1994 | BlackBerry Limited | System for TDMA mobile-to-mobile VSELP codec bypass |
5774859, | Jan 03 1995 | Cisco Technology, Inc | Information system having a speech interface |
5940793, | Oct 25 1994 | Cisco Technology, Inc | Voice-operated services |
6085080, | Jun 26 1997 | Nokia Mobile Phones Limited | Rejection of incoming and outgoing calls in WLL terminal |
6615171, | Jun 11 1997 | Nuance Communications, Inc | Portable acoustic interface for remote access to automatic speech/speaker recognition server |
6868385, | Oct 05 1999 | Malikie Innovations Limited | Method and apparatus for the provision of information signals based upon speech recognition |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Nov 19 2007 | FASTMOBILE INC | Research In Motion Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030394 | /0587 | |
May 10 2013 | BlackBerry Limited | (assignment on the face of the patent) | / | |||
Jul 09 2013 | Research In Motion Limited | BlackBerry Limited | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 032495 | /0694 | |
May 11 2023 | BlackBerry Limited | Malikie Innovations Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 064104 | /0103 |