A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.

Patent
   RE49762
Priority
Jan 16 2015
Filed
Sep 28 2021
Issued
Dec 19 2023
Expiry
Jan 16 2035
0. 17. A method of performing speech recognition of a voice spoken by a user, the method comprising:
obtaining first audio data based on the voice spoken by the user received by a first electronic device;
obtaining second audio data based on the voice spoken by the user received by a second electronic device;
determining first audio quality of the first audio data;
determining second audio quality of the second audio data;
identifying an electronic device from among the first electronic device and the second electronic device, based on the first audio quality and the second audio quality;
performing speech recognition of a voice received by the identified electronic device; and
outputting a result of the speech recognition at the identified electronic device.
0. 27. A method of performing speech recognition of a voice spoken by a user, the method comprising:
obtaining first audio data based on the voice spoken by the user received by a first electronic device;
obtaining second audio data based on the voice spoken by the user received by a second electronic device;
determining first audio quality of the first audio data;
determining second audio quality of the second audio data;
identifying a closest electronic device that is closest to the user from among the first electronic device and the second electronic device, based on the first audio quality and the second audio quality;
performing speech recognition of a voice received by the closest electronic device; and
outputting a result of the speech recognition at the closest electronic device.
9. A method of performing speech recognition of a voice spoken by a user, the method comprising:
obtaining first audio data based on the voice spoken by the user detected by a first electronic device;
obtaining second audio data based on the voice spoken by the user detected by a second electronic device;
determining first audio quality of the first audio data;
determining second audio quality of the second audio data;
selecting a closest electronic device that is closest to the user from among the first electronic device and the second electronic device, based on the first audio quality and the second audio quality;
performing speech recognition of the voice spoken by the user based on the closest electronic device; and
outputting a result of the speech recognition at the closest electronic device.
1. A method of performing speech recognition of a voice spoken by a user, the method comprising:
obtaining first audio data based on the voice spoken by the user detected by a first electronic device;
obtaining second audio data based on the voice spoken by the user detected by a second electronic device;
determining first audio quality of the first audio data;
determining second audio quality of the second audio data;
selecting audio data from among the first audio data and the second audio data, based on the first audio quality and the second audio quality;
selecting an electronic device that obtained the audio data from among the first electronic device and the second electronic device;
performing speech recognition of the voice spoken by the user, based on the audio data; and
outputting a result of the speech recognition at the electronic device.
0. 22. An electronic device for performing speech recognition of a voice spoken by a user, the electronic device comprising:
a memory storing computer-readable instructions; and
at least one processor when executing the computer-readable instructions configured to:
obtain first audio data based on the voice spoken by the user received by the electronic device,
obtain second audio data based on the voice spoken by the user received by a second electronic device,
determine first audio quality of the first audio data,
determine second audio quality of the second audio data,
identify the electronic device from among the electronic device and the second electronic device, based on the first audio quality and the second audio quality,
perform speech recognition of a voice received by the identified electronic device, and
output a result of the speech recognition at the identified electronic device.
0. 31. An electronic device for performing speech recognition of a voice spoken by a user, the electronic device comprising:
a memory storing computer-readable instructions; and
at least one processor when executing the computer-readable instructions configured to
obtain first audio data based on the voice spoken by the user received by the electronic device,
obtain second audio data based on the voice spoken by the user received by a second electronic device,
determine first audio quality of the first audio data,
determine second audio quality of the second audio data,
identify the electronic device as a closest electronic device that is closest to the user from among the electronic device and the second electronic device, based on the first audio quality and the second audio quality,
perform speech recognition of a voice received by the closest electronic device, and
output a result of the speech recognition at the closest electronic device.
5. An electronic device for performing speech recognition of a voice spoken by a user, the electronic device comprising:
a memory storing computer-readable instructions; and
at least one processor when executing the computer-readable instructions configured to obtain first audio data based on the voice spoken by the user detected by the electronic device, obtain second audio data based on the voice spoken by the user detected by a second electronic device, determine first audio quality of the first audio data, determine second audio quality of the second audio data, select the first audio data from among the first audio data and the second audio data, based on the first audio quality and the second audio quality, select the electronic device that obtained the first audio data from among the electronic device and the second electronic device, perform speech recognition of the voice spoken by the user, based on the audio data, and output a result of the speech recognition at the electronic device.
13. An electronic device for performing speech recognition of a voice spoken by a user, the electronic device comprising:
a memory storing computer-readable instructions; and
at least one processor when executing the computer-readable instructions configured to obtain first audio data based on the voice spoken by the user detected by the electronic device, obtain second audio data based on the voice spoken by the user detected by a second electronic device, determine first audio quality of the first audio data, determine second audio quality of the second audio data, select the electronic device as a closest electronic device that is closest to the user from among the electronic device and the second electronic device, based on the first audio quality and the second audio quality, perform speech recognition of the voice spoken by the user based on the closest electronic device, and output a result of the speech recognition at the electronic device that is the closest electronic device closest to the user.
2. The method of claim 1, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
3. The method of claim 1, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
4. The method of claim 1, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
6. The electronic device of claim 5, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
7. The electronic device of claim 5, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
8. The electronic device of claim 5, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
10. The method of claim 9, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
11. The method of claim 9, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
12. The method of claim 9, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
14. The electronic device of claim 13, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
15. The electronic device of claim 13, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
16. The electronic device of claim 13, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
0. 18. The method of claim 17, wherein the identifying the electronic device from among the first electronic device and the second electronic device comprises:
selecting audio data from among the first audio data and the second audio data, based on the first audio quality and the second audio quality; and
identifying the electronic device that obtained the selected audio data from among the first electronic device and the second electronic device.
0. 19. The method of claim 17, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
0. 20. The method of claim 17, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
0. 21. The method of claim 17, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
0. 23. The electronic device of claim 22, wherein the at least one processor is configured to:
select audio data from among the first audio data and the second audio data, based on the first audio quality and the second audio quality; and
identify the electronic device that obtained the selected audio data from among the electronic device and the second electronic device.
0. 24. The electronic device of claim 22, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
0. 25. The electronic device of claim 22, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
0. 26. The electronic device of claim 22, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
0. 28. The method of claim 27, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
0. 29. The method of claim 27, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
0. 30. The method of claim 27, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
0. 32. The electronic device of claim 31, wherein the first audio quality comprises a first volume of the first audio data, and
wherein the second audio quality comprises a second volume of the second audio data.
0. 33. The electronic device of claim 31, wherein the first audio quality comprises a first signal to noise ratio of the first audio data, and
wherein the second audio quality comprises a second signal to noise ratio of the second audio data.
0. 34. The electronic device of claim 31, wherein the first audio data comprises at least one of a first volume of the first audio data and a first signal to noise ratio of the first audio data, and
wherein the second audio data comprises at least one of a second volume of the second audio data and a second signal to noise ratio of the second audio data.
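
For illustration only, the device selection recited in the claims above, which identifies an electronic device based on the audio quality of the audio data each device obtains, can be sketched as follows. This is a minimal sketch rather than the claimed implementation; the dictionary layout and the use of signal-to-noise ratio as the single quality measure are assumptions (the claims also recite volume as a quality measure).

```python
def identify_device(first_audio, second_audio):
    """Compare the audio quality of two captures of the same utterance
    (signal-to-noise ratio in this sketch) and return the device whose
    audio data has the higher quality."""
    best = max((first_audio, second_audio), key=lambda audio: audio["snr_db"])
    return best["device"]

# Hypothetical quality measurements for the two captures.
first_audio = {"device": "first electronic device", "snr_db": 18.0}
second_audio = {"device": "second electronic device", "snr_db": 24.5}
print(identify_device(first_audio, second_audio))  # -> 'second electronic device'
```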

In Equation 1, P(a|b) denotes an appearance probability regarding a under a condition that b appears before a. P1 and P2 denote appearance probabilities regarding a included in a first language model and a second language model, respectively. ω1 and ω2 denote weights that may be applied to P1 and P2, respectively. The number of right-side components of Equation 1 may increase according to the number of language models including appearance probability information regarding a.

Weights that may be applied to respective appearance probabilities may be determined based on situation information or various other conditions, e.g., information regarding a user, a region, a command history, a module being executed, etc.

According to Equation 1, an appearance probability may increase as information regarding the appearance probability is included in more language models. On the contrary, an appearance probability may decrease as information regarding the appearance probability is included in fewer language models. Therefore, a preferable appearance probability may not be determined in the case of determining an appearance probability according to Equation 1.
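
As an illustration of the dilution just described (assuming the simple two-model weighted sum described above for Equation 1; the function name, weights, and probabilities below are invented for the sketch):

```python
def combine_eq1(p1, p2, w1=0.5, w2=0.5):
    """Equation 1 style combination: a weighted sum of the appearance
    probabilities P1(a|b) and P2(a|b) reported by two language models.
    A probability missing from a model is treated as 0.0, so a word
    covered by only one model ends up with a smaller combined value."""
    return w1 * p1 + w2 * p2

print(combine_eq1(p1=0.30, p2=0.40))  # word present in both models -> 0.35
print(combine_eq1(p1=0.0, p2=0.40))   # new word in one model only  -> 0.20
```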

The language model combining unit 435 may obtain an appearance probability regarding a word or a subword according to Equation 2 based on the Bayesian interpolation. In the case of determining an appearance probability according to Equation 2, the appearance probability may not increase or decrease according to a number of language models including appearance probability information. In the case of an appearance probability included only in a first language model or a second language model, the appearance probability may not decrease and may be maintained according to Equation 2.

P(a|b) = {ω1P1(b)P1(a|b) + ω2P2(b)P2(a|b)} / {ω1P1(b) + ω2P2(b)}, where ω1 + ω2 = 1  Equation 2
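
A minimal sketch of Equation 2, assuming each language model can also report the probability P(b) of the preceding context b (the function and variable names are illustrative, not from the disclosure):

```python
def combine_eq2(p1_ab, p2_ab, p1_b, p2_b, w1=0.5, w2=0.5):
    """Equation 2 (Bayesian interpolation): the conditional probabilities
    are weighted by how much probability each model assigns to the
    context b, so a word known to only one model keeps its probability
    instead of being diluted."""
    assert abs((w1 + w2) - 1.0) < 1e-9
    numerator = w1 * p1_b * p1_ab + w2 * p2_b * p2_ab
    denominator = w1 * p1_b + w2 * p2_b
    return numerator / denominator if denominator > 0 else 0.0

# A word stored only in the second model (P1(b) = 0) is not diluted.
print(combine_eq2(p1_ab=0.0, p2_ab=0.40, p1_b=0.0, p2_b=0.05))  # -> 0.40
```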

Furthermore, the language model combining unit 435 may obtain an appearance probability according to Equation 3. According to Equation 3, an appearance probability may be the largest one from among the appearance probabilities included in the respective language models.
P(a|b) = max{P1(a|b), P2(a|b)}  Equation 3

In the case of determining an appearance probability according to Equation 3, the appearance probability may be the largest one from among the appearance probabilities, and thus an appearance probability regarding a word or subword included one or more times in each of the language models may have a relatively large value. Therefore, according to Equation 3, an appearance probability regarding a word added to a language model as a new word according to an embodiment may be prevented from being falsely reduced.
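
A corresponding sketch of Equation 3, which simply keeps the largest per-model probability (names again illustrative):

```python
def combine_eq3(*per_model_probs):
    """Equation 3: take the maximum of the appearance probabilities
    reported by the individual language models, so a newly added word
    never loses probability mass through combination."""
    return max(per_model_probs, default=0.0)

print(combine_eq3(0.0, 0.40))   # new word in one model only  -> 0.40
print(combine_eq3(0.30, 0.40))  # word present in both models -> 0.40
```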

In the operation S1315, the speech recognition device 430 may obtain a word corresponding to the phoneme sequence determined in the operation S1313 based on segment information. The segment information may include information regarding a correspondence relationship between at least one unit component constituting a phoneme sequence and a word. If a new word is detected according to a method of updating speech recognition data according to an embodiment, segment information regarding each new word may be generated as information regarding the new word. If a phoneme sequence is determined as a result of speech recognition based on probability information, the speech recognition device 430 may convert the phoneme sequence to a word based on the segment information, and thus a result of the speech recognition may be output as the word.
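
A minimal sketch of the restoration step described above, assuming segment information is held as a mapping from a run of unit components to the original word (the dictionary layout and helper name are assumptions for illustration):

```python
# Hypothetical segment information: unit components -> original word.
SEGMENT_INFO = {
    ("le", "rit", "go"): "Let it go",
}

def restore_words(recognized_units):
    """Scan the recognized unit components and, where a run of units has an
    entry in the segment information, replace it with the original word;
    units without an entry are passed through unchanged."""
    result, i = [], 0
    while i < len(recognized_units):
        for length in range(len(recognized_units) - i, 0, -1):
            chunk = tuple(recognized_units[i:i + length])
            if chunk in SEGMENT_INFO:
                result.append(SEGMENT_INFO[chunk])
                i += length
                break
        else:
            result.append(recognized_units[i])
            i += 1
    return " ".join(result)

print(restore_words(["le", "rit", "go", "dulryojyo"]))  # -> 'Let it go dulryojyo'
```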

FIG. 14 is a block diagram showing a speech recognition system that executes a module based on a result of speech recognition performed based on situation information, according to an embodiment.

Referring to FIG. 14, a speech recognition system 1400 may include a speech recognition data updating device 1420, a speech recognition device 1430, and a user device 1450. The speech recognition data updating device 1420, the speech recognition device 1430, and the user device 1450 may exist as independent devices as shown in FIG. 14. However, the present invention is not limited thereto, and the speech recognition data updating device 1420, the speech recognition device 1430, and the user device 1450 may be included in a single device as components of the device. The speech recognition data updating device 1420 and the speech recognition device 1430 of FIG. 14 may correspond to the speech recognition data updating devices 220 and 420 and the speech recognition devices 230 and 430 described above with reference to FIG. 13, and repeated descriptions thereof will be omitted.

First, a method of updating speech recognition data in consideration of situation information by using the speech recognition system 1400 shown in FIG. 14 will be described.

The speech recognition data updating device 1420 may obtain language data 1410 for updating speech recognition data. The language data 1410 may be obtained from various devices and transmitted to the speech recognition data updating device 1420. For example, the language data 1410 may be obtained by the user device 1450 and transmitted to the speech recognition data updating device 1420.

Furthermore, a situation information managing unit 1451 of the user device 1450 may obtain situation information corresponding to the language data 1410 and transmit the obtained situation information to the speech recognition data updating device 1420. The speech recognition data updating device 1420 may determine a language model to which to add a new word included in the language data 1410, based on the situation information received from the situation information managing unit 1451. If no language model corresponding to the situation information exists, the speech recognition data updating device 1420 may generate a new language model and add appearance probability information regarding the new word to the newly generated language model.

The speech recognition data updating device 1420 may detect new words ‘Let it go,’ and ‘born born born’ included in the language data 1410. Situation information corresponding to the language data 1410 may include an application A for music playback. Situation information may be determined with respect to the language data 1410 or may also be determined with respect to each of new words included in the language data 1410.

The speech recognition data updating device 1420 may add appearance probability information regarding ‘Let it go’ and ‘born born born’ to at least one language model corresponding to the application A. The speech recognition data updating device 1420 may update speech recognition data by adding appearance probability information regarding a new word to a language model corresponding to situation information. The speech recognition data updating device 1420 may update speech recognition data by re-determining appearance probability information included in the language model to which appearance probability information regarding a new word is added. A language model to which appearance probability information may be added may correspond to one application or a group including at least one application.

The speech recognition data updating device 1420 may update a language model in real time based on a user input. In relation to the speech recognition device 1430 according to an embodiment, a user may issue a voice command to an application or an application group according to a language defined by the user. If only an appearance probability regarding a command ‘Play [Song]’ exists in a language model, appearance probability information regarding a command ‘Let me listen to [Song]’ may be added to the language model based on a user definition.

However, if a language can be determined based on a user definition, an unexpected voice command may be performed when a language defined by another user is applied. Therefore, the speech recognition data updating device 1420 may limit the range in which a user-defined language model is applied, for example, to a particular application or to a particular period of time.

The speech recognition data updating device 1420 may update speech recognition data in real time based on situation information received from the situation information managing unit 1451 of the user device 1450. If the user device 1450 is located near a movie theater, the user device 1450 may transmit information regarding the corresponding movie theater to the speech recognition data updating device 1420 as situation information. Information regarding a movie theater may include information regarding movies being played at the corresponding movie theater, information regarding restaurants near the movie theater, traffic information, etc. The speech recognition data updating device 1420 may collect information regarding the corresponding movie theater via web crawling or from a content provider. Next, the speech recognition data updating device 1420 may update speech recognition data based on the collected information. Therefore, since the speech recognition device 1430 may perform speech recognition in consideration of the location of the user device 1450, speech recognition efficiency may be further improved.

Second, a method of performing speech recognition and executing a module based on a result of the speech recognition at the speech recognition system 1400 will be described.

The user device 1450 may include various types of terminal devices that may be used by a user. For example, the user device 1450 may be a mobile phone, a smart phone, a laptop computer, a tablet PC, an e-book terminal, a digital broadcasting device, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, a digital camera, or a wearable device (e.g., eyeglasses, a wristwatch, a ring, etc.). However, the present invention is not limited thereto.

The user device 1450 according to an embodiment may collect at least one of the speech data 1440 and situation information regarding the user device 1450, and may perform a task determined based on a word that is speech-recognized based on the situation information.

The user device 1450 may include the situation information managing unit 1451, a module selecting and instructing unit 1452, and an application A 1453 for performing a task based on a result of speech recognition.

The situation information managing unit 1451 may collect situation information for selecting a language model during speech recognition at the speech recognition device 1430 and transmit the situation information to the speech recognition device 1430.

Situation information may include information regarding a module being currently executed on the user device 1450, a history of using modules, a history of voice commands, information regarding an application that may be executed on the user device 1450 and corresponds to an existing language model, information regarding a user currently using the user device 1450, etc. The history of using modules and the history of voice commands may include information regarding time points at which the respective modules are used and time points at which the respective voice commands are received, respectively.

Situation information according to an embodiment may be configured as shown in Table 1 below.

TABLE 1
Situation Information
Currently Used Module:          Movie Player Module 1
History of Module Usage:        Music Player Module 1 / 1 Day Ago
                                Cable Broadcasting / 1 Hour Ago
                                Music Player Module 1 / 30 Minutes Ago
History of Voice Command:       Home Theater Play [Singer 1] Song / 10 Minutes Ago
                                Music Player Module 1 / 30 Minutes Ago
Application with Language       Broadcasting
Model:                          Music Player Module 1
                                Movie Player Module 1
                                Music Player Module 2

The speech recognition device 1430 may select at least one language model to be used during speech recognition based on situation information. If situation information indicates that the speech data 1440 is obtained from the user device 1450 while the application A is being executed, the speech recognition device 1430 may select a language model corresponding to at least one of the application A and the user device 1450.
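
As a rough sketch of this selection step, assuming the situation information of Table 1 has been parsed into a dictionary and that language models are registered per situation key (both assumptions for illustration):

```python
# Hypothetical registry: situation key -> language model identifier.
LANGUAGE_MODELS = {
    "application A": "lm_application_a",
    "user device 1450": "lm_user_device",
    "default": "lm_general",
}

def select_language_models(situation):
    """Pick every language model whose key appears in the reported situation
    (currently executed module, device, etc.); fall back to the general
    model when nothing matches."""
    selected = [LANGUAGE_MODELS[key] for key in situation if key in LANGUAGE_MODELS]
    return selected or [LANGUAGE_MODELS["default"]]

situation = {"application A": "music playback", "user device 1450": "mobile phone"}
print(select_language_models(situation))  # -> ['lm_application_a', 'lm_user_device']
```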

The module selecting and instructing unit 1452 may select a module based on a result of speech recognition performed by the speech recognition device 1430 and transmit a command to perform a task to the selected module. First, the module selecting and instructing unit 1452 may determine whether the result of speech recognition includes an identifier of a module and a keyword for a command. A keyword for a command may include identifiers indicating commands for requesting a module to perform respective tasks, e.g., play, pause, next, etc.

If a module identifier is included in the result of speech recognition, the module selecting and instructing unit 1452 may select a module corresponding to the module identifier and transmit a command to the selected module.

If a module identifier is not included in the result of speech recognition, the module selecting and instructing unit 1452 may obtain at least one of a keyword for a command included in the result of speech recognition and situation information corresponding to the result of speech recognition. Based on at least one of the keyword for a command and the situation information, the module selecting and instructing unit 1452 may determine a module for performing a task according to the result of speech recognition.

In detail, the module selecting and instructing unit 1452 may determine a module for performing a task based on a keyword for a command. Furthermore, the module selecting and instructing unit 1452 may determine a module that is the most suitable for performing the task based on situation information. For example, the module selecting and instructing unit 1452 may determine a module based on an execution frequency or whether the corresponding module is the most recently executed module.

Situation information that may be collected by the module selecting and instructing unit 1452 may include information regarding a module currently being executed on the user device 1450, a history of using modules, a history of voice commands, information regarding an application that corresponds to an existing language model, etc. The history of using modules and the history of voice commands may include information regarding time points at which the modules are used and time points at which the voice commands are received.

Even if a result of speech recognition includes a module identifier, the corresponding module may not be able to perform a task according to a command. In this case, the module selecting and instructing unit 1452 may determine a module to perform the task, as in the case where a result of speech recognition does not include a module identifier.

Referring to FIG. 14, the module selecting and instructing unit 1452 may receive ‘let me listen to Let it go’ from the speech recognition device 1430 as a result of speech recognition. Since the result of speech recognition does not include an application identifier, the application A for performing a task based on the result of speech recognition may be determined based on situation information or a keyword for a command. The module selecting and instructing unit 1452 may then request the application A to play back the song ‘Let it go.’
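
A rough sketch of this selection logic follows; the module table, command keywords, and fallback heuristic below are invented for illustration and are not the disclosed implementation:

```python
MODULES = {"application A": "music player", "movie player module 1": "movie player"}
COMMAND_KEYWORDS = {"play": "music player", "listen": "music player", "show": "movie player"}

def select_module(recognition_result, situation):
    """Prefer an explicit module identifier in the recognized text; otherwise
    fall back to command keywords and, finally, to the most recently used
    module from the situation information."""
    text = recognition_result.lower()
    for identifier, module in MODULES.items():
        if identifier.lower() in text:
            return module
    for keyword, module in COMMAND_KEYWORDS.items():
        if keyword in text:
            return module
    return situation.get("most_recently_used_module")

print(select_module("let me listen to Let it go",
                    {"most_recently_used_module": "music player"}))  # -> 'music player'
```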

FIG. 15 is a diagram showing an example of situation information regarding a module, according to an embodiment.

Referring to FIG. 15, an example of commands of a music player program 1510 for performing a task based on a voice command is shown. The speech recognition data updating device 1520 may correspond to the speech recognition data updating device 1420 of FIG. 14.

The speech recognition data updating device 1520 may receive situation information regarding the music player program 1510 from the user device 1450 and update speech recognition data based on the received situation information.

The situation information regarding the music player program 1510 may include a header 1511, a command language 1512, and music information 1513 as shown in FIG. 15.

The header 1511 may include information for identifying the music player program 1510 and may include information regarding type, storage location, and name of the music player program 1510.

The command language 1512 may include an example of commands regarding the music player program 1510. The music player program 1510 may perform a task when a speech-recognized sentence like the command language 1512 is received. A command of the command language 1512 may also be set by a user.

The music information 1513 may include information regarding music that may be played back by the music player program 1510. For example, the music information 1513 may include identification information regarding music files that may be played back by the music player program 1510 and classification information thereof, such as information regarding albums and singers.

The speech recognition data updating device 1520 may update a second language model regarding the music player program 1510 by using a sentence of the command language 1512 and words included in the music information 1513. For example, the speech recognition data updating device 1520 may obtain appearance probability information by including words included in the music information 1513 in a sentence of the command language 1512.
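
One way to picture this update, assuming the command language is kept as sentences with a slot marker that is filled from the music information (the template syntax is an assumption, not the stored format):

```python
from itertools import product

command_templates = ["play [Song]", "let me listen to [Song]"]
music_info = {"[Song]": ["Let it go", "born born born"]}

def expand_templates(templates, slots):
    """Fill each slot marker in the command sentences with every entry from
    the music information, producing sentences from which appearance
    probability information could be estimated."""
    sentences = []
    for template in templates:
        filled = [template]
        for marker, values in slots.items():
            filled = [s.replace(marker, v) for s, v in product(filled, values)]
        sentences.extend(filled)
    return sentences

for sentence in expand_templates(command_templates, music_info):
    print(sentence)  # e.g. 'play Let it go', 'let me listen to born born born', ...
```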

When a new application is installed, the user device 1450 according to an embodiment may transmit information regarding the application, which includes the header 1511, the command language 1512, and the music information 1513, to the speech recognition data updating device 1520. Furthermore, when a new event regarding an application occurs, the user device 1450 may update information regarding the application, which includes the header 1511, the command language 1512, and the music information 1513, and transmit the updated information to the speech recognition data updating device 1520. Therefore, the speech recognition data updating device 1520 may update a language model based on the latest information regarding the application.

When the speech recognition device 1430 performs speech recognition, the user device 1450 may transmit situation information for performing speech recognition to the speech recognition device 1430. The situation information may include the information regarding the music player program shown in FIG. 15.

The situation information may be configured as shown in Table 2.

TABLE 2
Situation Information
Currently Used Module:          Memo
Command History:                Music Player Module 3 Play [Song Title 1] / 10 Minutes Ago
                                Music Player Module 3 Play [Singer 1] Song / 15 Minutes Ago
History of Simultaneous         Memo - Music Player Module 3 / 1 Day Ago
Module Usage:                   Memo - Music Player Module 3 / 2 Days Ago
Module Information:             Music Player Module 1 [Singers 1-3] N Songs
                                Music Player Module 2 [Singers 3-6] N Songs
                                Music Player Module 3 [Singers 6-8] N Songs
SNS History:                    Music Player Module 1 Stated Once
                                Music Player Module 2 Stated Four Times
                                Music Player Module N Stated Twice

The speech recognition device 1430 may determine weights applicable to language models corresponding to respective music player programs based on the history of simultaneous module usage from among the situation information shown in Table 2. If a memo program is currently being executed, the speech recognition device 1430 may perform speech recognition by applying a weight to a language model corresponding to a music player program that has been used simultaneously with the memo program. As a voice input is received from a user, if a result of speech recognition performed by the speech recognition device 1430 is output as ‘Play all [Singer 3] songs,’ the module selecting and instructing unit 1452 may determine a module to perform the corresponding task. Since the speech-recognized command does not include a module identifier, the module selecting and instructing unit 1452 may determine the module based on the command and the situation information. In detail, the module selecting and instructing unit 1452 may select a module to play back music according to the command in consideration of various information included in the situation information, such as the history of simultaneous module usage, the history of recent module usage, and the SNS history. Referring to Table 2, from between music player modules 1 and 2, which are capable of playing back songs of [Singer 3], the number of times that the music player module 2 is mentioned on SNS is greater than that of the music player module 1, and thus the module selecting and instructing unit 1452 may select the music player module 2. Since the command does not include a module identifier, the module selecting and instructing unit 1452 may finally decide whether to play music by using the selected music player module 2 based on a user input.
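
A compact sketch of the ranking implied above, assuming the situation information of Table 2 has been parsed into per-module records (the field names and scoring are illustrative only):

```python
# Hypothetical parsed form of the Table 2 module information and SNS history.
candidates = [
    {"module": "Music Player Module 1", "singers": {1, 2, 3}, "sns_mentions": 1},
    {"module": "Music Player Module 2", "singers": {3, 4, 5, 6}, "sns_mentions": 4},
]

def pick_player(candidates, singer):
    """Keep only the modules able to play the requested singer, then prefer
    the one mentioned most often on SNS."""
    able = [c for c in candidates if singer in c["singers"]]
    return max(able, key=lambda c: c["sns_mentions"])["module"] if able else None

print(pick_player(candidates, singer=3))  # -> 'Music Player Module 2'
```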

The module selecting and instructing unit 1452 may request a plurality of modules to perform a plurality of tasks according to a speech-recognized command. It is assumed that situation information is configured as shown in Table 3 below.

TABLE 3
Situation Information
Currently Used Module:          Home Screen
Command History:                Music Player Module 3 Play [Song] / 10 Minutes Ago
                                I Will Write Memo / 20 Minutes Ago
History of Using Settings       Movie Player Module - Volume 1 / 1 Day Ago
for Using Modules:              Movie Player Module - Increase Brightness / 1 Day Ago

If a speech-recognized command is ‘show me [Movie],’ the module selecting and instructing unit 1452 may select a movie player module capable of playing back the [Movie] as a module to perform a corresponding task. The module selecting and instructing unit 1452 may also determine a plurality of modules to perform the command, other than the movie player module, based on the history of using settings for using modules from among the situation information.

In detail, the module selecting and instructing unit 1452 may select a volume adjusting module and an illumination adjusting module for adjusting volume and illumination based on the information regarding the history of using settings for using modules. Next, the module selecting and instructing unit 1452 may transmit requests for adjusting volume and illumination to the modules selected based on the information regarding the history of using settings for using modules.

FIG. 16 is a flowchart showing an example of methods of performing speech recognition according to an embodiment.

Referring to FIG. 16, in an operation 1610, the speech recognition device 1430 may obtain speech data to perform speech recognition.

In an operation 1620, the speech recognition device 1430 may obtain situation information regarding the speech data. If an application A for music playback is being executed on the user device 1450 at which the speech data is obtained, the situation information may include situation information indicating that the application A is being executed.

In an operation 1630, the speech recognition device 1430 may determine at least one language model based on the situation information obtained in the operation 1620.

In operations 1640 and 1670, the speech recognition device 1430 may obtain phoneme sequences corresponding to the speech data. Phoneme sequences corresponding to speech data including a speech ‘Let it go’ may include phoneme sequences ‘leritgo’ and ‘naerigo.’ Furthermore, phoneme sequences corresponding to speech data including a speech ‘dulryojyo’ may include phoneme sequences ‘dulryojyo’ and ‘dulyeojyo.’

If a word corresponding to a pronunciation dictionary exists in the obtained phoneme sequences, the speech recognition device 1430 may convert the phoneme sequences to words. Furthermore, a phoneme sequence without a word corresponding to the pronunciation dictionary may be divided into predetermined unit components.

From among the phoneme sequences, since a word corresponding to the phoneme sequence ‘leritgo’ does not exist in the pronunciation dictionary, the phoneme sequence ‘leritgo’ may be divided into predetermined unit components. Furthermore, regarding the phoneme sequence ‘naerigo’ from among the phoneme sequences, a corresponding word ‘naerigo’ in the pronunciation dictionary and predetermined unit components ‘nae ri go’ may be obtained.

Since words corresponding to the phoneme sequences ‘dulryojyo’ and ‘dulyeojyo’ exist in the pronunciation dictionary, the phoneme sequences ‘dulryojyo’ and ‘dulyeojyo’ may be obtained.

In an operation 1650, the speech recognition device 1430 may determine ‘le rit go’ from among ‘le rit go,’ ‘naerigo,’ and ‘nae ri go’ based on appearance probability information. Furthermore, in an operation 1680, the speech recognition device 1430 may determine ‘dulryojyo’ from between ‘dulryojyo’ and ‘dulyeojyo’ based on appearance probability information.

From among the phoneme sequences, there are two pieces of appearance probability information regarding the phoneme sequence ‘naerigo,’ and thus an appearance probability regarding the phoneme sequence ‘naerigo’ may be determined by combining language models as described above.

In an operation 1660, the speech recognition device 1430 may restore ‘le rit go’ to the original word ‘Let it go’ based on segment information. Since ‘dulryojyo’ is not a divided word and segment information does not include information regarding ‘dulryojyo,’ an operation like the operation 1660 may not be performed thereon.

In an operation 1690, the speech recognition device 1430 may output ‘Let it go dulryojyo’ as a final result of speech recognition.
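
The flow of FIG. 16 can be summarized with the rough sketch below; the candidate lists, scores, and helper names are invented for illustration and are not taken from the figure:

```python
# Hypothetical candidates and combined-language-model scores for the two
# speech segments of FIG. 16.
candidates = {
    "segment 1": {"le rit go": 0.42, "naerigo": 0.31, "nae ri go": 0.27},
    "segment 2": {"dulryojyo": 0.65, "dulyeojyo": 0.35},
}
segment_info = {"le rit go": "Let it go"}  # subword run -> original word

def recognize(candidates, segment_info):
    """Pick the highest-scoring candidate per segment, then restore any
    subword sequence to its original word using segment information."""
    words = []
    for scored in candidates.values():
        best = max(scored, key=scored.get)
        words.append(segment_info.get(best, best))
    return " ".join(words)

print(recognize(candidates, segment_info))  # -> 'Let it go dulryojyo'
```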

FIG. 17 is a flowchart showing an example of methods of performing speech recognition according to an embodiment.

Referring to FIG. 17, in an operation 1710, the speech recognition device 1430 may obtain speech data to perform speech recognition.

In an operation 1703, the speech recognition device 1430 may obtain situation information regarding the speech data. In an operation 1730, the speech recognition device 1430 may determine at least one language model based on the situation information obtained in the operation 1703.

In operations 1707, 1713, and 1719, the speech recognition device 1430 may obtain phoneme sequences corresponding to the speech data. Phoneme sequences corresponding to speech data including speeches ‘oneul’ and ‘gim yeon a’ may include ‘oneul’ and ‘gi myeo na,’ respectively. Furthermore, phoneme sequences corresponding to speech data including a speech ‘boyeojyo’ may include ‘boyeojeo’ and ‘boyeojyo.’ However, the phoneme sequences are not limited to the above-stated examples, and different phoneme sequences may be obtained according to the speech data.

In an operation 1707, the speech recognition device 1430 may obtain a word ‘oneul’ corresponding to the phoneme sequence ‘oneul’ by using a pronunciation dictionary. In an operation 1713, the speech recognition device 1430 may obtain a word ‘gim yeon a’ corresponding to the phoneme sequence ‘gi myeo na’ by using the pronunciation dictionary.

Furthermore, in operations 1713 and 1719, the speech recognition device 1430 may divide ‘gimyeona,’ ‘boyeojyo,’ and ‘boyeojeo’ into designated unit components and obtain ‘gi myeo na,’ ‘bo yeo jyo,’ and ‘bo yeo jeo,’ respectively.

In operations 1709, 1715, and 1721, the speech recognition device 1430 may determine ‘oneul,’ ‘gi myeo na,’ and ‘bo yeo jeo’ based on appearance probability information. From among the phoneme sequences, two pieces of appearance probability information may exist in relation to ‘gi myeo na,’ and thus an appearance probability regarding ‘gi myeo na’ may be determined by combining language models as described above.

In operations 1717 and 1723, the speech recognition device 1430 may restore original words ‘gimyeona’ and ‘boyeojyo’ based on segment information. Since ‘oneul’ is not a word divided into predetermined unit components and segment information does not include ‘oneul,’ a restoration operation may not be performed.

In an operation 1725, the speech recognition device 1430 may output ‘oneul gimyeona boyeojyo’ as a final result of speech recognition.

FIG. 18 is a block diagram showing a speech recognition system that executes a plurality of modules according to a result of speech recognition performed based on situation information, according to an embodiment.

Referring to FIG. 18, the speech recognition system 1800 may include a speech recognition data updating device 1820, a speech recognition device 1830, a user device 1850, and external devices 1860 and 1870. The speech recognition data updating device 1820, the speech recognition device 1830, and the user device 1850 may be embodied as independent devices as shown in FIG. 18. However, the present invention is not limited thereto, and the speech recognition data updating device 1820, the speech recognition device 1830, and the user device 1850 may be embedded in a single device as components of the device. The speech recognition data updating device 1820 and the speech recognition device 1830 of FIG. 18 may correspond to the speech recognition data updating devices 220 and 420 and the speech recognition devices 230 and 430 described above with reference to FIGS. 1 through 17, and repeated descriptions thereof will be omitted below.

First, a method of updating speech recognition data in consideration of situation information by using the speech recognition system 1800 shown in FIG. 18 will be described.

The speech recognition data updating device 1820 may obtain language data 1810 for updating speech recognition data. Furthermore, a situation information managing unit 1851 of the user device 1850 may obtain situation information corresponding to the language data 1810 and transmit the obtained situation information to the speech recognition data updating device 1820. The speech recognition data updating device 1820 may determine a language model to which to add new words included in the language data 1810, based on the situation information received from the situation information managing unit 1851.

The speech recognition data updating device 1820 may detect new words ‘winter kingdom’ and ‘5.1 channels’ included in the language data 1810. Situation information regarding the word ‘winter kingdom’ may include information related to a digital versatile disc (DVD) player device 1860 for movie playback. Furthermore, situation information regarding the word ‘5.1 channels’ may include information regarding a home theatre device 1870 for audio output.

The speech recognition data updating device 1820 may add appearance probability information regarding ‘winter kingdom’ and ‘5.1 channels’ to one or more language models respectively corresponding to the DVD player device 1860 and the home theatre device 1870.

Second, a method by which the speech recognition system 1800 shown in FIG. 18 performs speech recognition and each device performs a task based on a result of the speech recognition will be described.

The user device 1850 may include various types of terminals that may be used by a user.

The user device 1850 according to an embodiment may collect at least one of speech data 1840 and situation information regarding the user device 1850. Next, the user device 1850 may request at least one device to perform a task determined according to a speech-recognized language based on situation information.

The user device 1850 may include the situation information managing unit 1851 and a module selecting and instructing unit 1852.

The situation information managing unit 1851 may collect situation information for selecting a language model for speech recognition performed by the speech recognition device 1830 and transmit the situation information to the speech recognition device 1830.

The speech recognition device 1830 may select at least one language model to be used for speech recognition based on situation information. If situation information includes information indicating that the DVD player device 1860 and the home theatre device 1870 are available to be used, the speech recognition device 1830 may select language models corresponding to the DVD player device 1860 and the home theatre device 1870. Alternatively, if a voice signal includes a module identifier, the speech recognition device 1830 may select a language model corresponding to the module identifier and perform speech recognition. A module identifier may include information for identifying not only a module, but also a module group or a module type.

The module selecting and instructing unit 1852 may determine at least one device to transmit a command thereto based on a result of speech recognition performed by the speech recognition device 1830 and transmit a command to the determined device.

If a result of speech recognition includes information for identifying a device, the module selecting and instructing unit 1852 may transmit a command to a device corresponding to the identification information.

If a result of speech recognition does not include information for identifying a device, the module selecting and instructing unit 1852 may obtain at least one of a keyword for a command included in the result of the speech recognition and situation information. The module selecting and instructing unit 1852 may determine at least one device to which to transmit a command based on at least one of the keyword for a command and the situation information.

Referring to FIG. 18, the module selecting and instructing unit 1852 may receive ‘show me winter kingdom in 5.1 channels’ as a result of speech recognition from the speech recognition device 1830. Since the result of the speech recognition does not include a device identifier or an application identifier, the DVD player device 1860 and the home theatre device 1870 to which a command is to be transmitted may be determined based on situation information or a keyword for a command.

In detail, the module selecting and instructing unit 1852 may determine a plurality of devices capable of outputting sound in 5.1 channels and capable of outputting moving pictures from among currently available devices. The module selecting and instructing unit 1852 may finally determine a device for performing a command from among the plurality of determined devices based on situation information, such as a history of usage of the respective devices.

Situation information that may be obtained by the situation information managing unit 1851 may be configured as shown below in Table 4.

TABLE 4
Situation Information
Currently Used Module:          TV Broadcasting Module
History of Simultaneous         TV Broadcasting Module - Home Theater Device / 20 Minutes Ago
Module Usage:                   DVD Player Device - Home Theater Device / 1 Day Ago
History of Voice Command:       Home Theater Play [Singer 1] Song / 10 Minutes Ago
                                DVD Player Play [Movie 1] / 1 Day Ago
Application having              TV Broadcasting Module
Language Model:                 DVD Player Device
                                Movie Player Module 1
                                Home Theater Device

Next, the module selecting and instructing unit 1852 may transmit a command to the finally determined device. In detail, based on a result of recognition of a speech ‘show me winter kingdom in 5.1 channels,’ the module selecting and instructing unit 1852 may transmit a command requesting to play back ‘winter kingdom’ to the DVD player device 1860. Furthermore, the module selecting and instructing unit 1852 may transmit a command requesting to output a sound signal of ‘winter kingdom’ in 5.1 channels to the home theatre device 1870.

Therefore, according to an embodiment, based on a single result of speech recognition, commands may be transmitted to a plurality of devices or modules, and the plurality of devices or modules may simultaneously perform tasks. Furthermore, even if a result of speech recognition does not include a module or device identifier, the module selecting and instructing unit 1852 according to an embodiment may determine the most appropriate module or device for performing a task based on a keyword for a command and situation information.
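
A minimal sketch of dispatching one recognized sentence to several devices, as in the ‘show me winter kingdom in 5.1 channels’ example (the capability registry and command format are assumptions for illustration):

```python
# Hypothetical capability registry for the currently available devices.
DEVICES = {
    "DVD player device 1860": {"video playback"},
    "home theatre device 1870": {"5.1 channel audio"},
}

def dispatch(recognized, required_capabilities):
    """Map each capability required by the recognized command to a device
    that provides it and build one command per device."""
    commands = {}
    for capability in required_capabilities:
        for device, capabilities in DEVICES.items():
            if capability in capabilities:
                commands[device] = f"{capability}: {recognized}"
                break
    return commands

for device, command in dispatch("show me winter kingdom in 5.1 channels",
                                ["video playback", "5.1 channel audio"]).items():
    print(device, "->", command)
```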

FIG. 19 is a diagram showing an example of a voice command with respect to a plurality of devices, according to an embodiment.

Referring to FIG. 19, an example of commands for devices capable of performing tasks according to voice commands via the module selecting and instructing unit 1922 is shown. The module selecting and instructing unit 1922 may correspond to the module selecting and instructing unit 1852 of FIG. 18. Furthermore, a DVD player device 1921 and a home theatre device 1923 may correspond to the DVD player device 1860 and the home theatre device 1870 of FIG. 18, respectively.

A speech instruction 1911 is an example of a result of speech recognition that may be output based on speech recognition according to an embodiment. If the speech instruction 1911 includes the name of a video and ‘5.1 channels,’ the module selecting and instructing unit 1922 may select the DVD player device 1921 and the home theatre device 1923 capable of playing back the video as devices for transmitting commands thereto.

As shown in FIG. 19, the module selecting and instructing unit 1922 may include headers 1931 and 1934, command languages 1932 and 1935, video information 1933, and a sound preset 1936 in information regarding the DVD player device 1921 and the home theatre device 1923.

The headers 1931 and 1934 may include information for identifying the DVD player device 1921 and the home theatre device 1923, respectively. The headers 1931 and 1934 may include information including types, locations, and names of the respective devices.

The command languages 1932 and 1935 may include examples of commands with respect to the devices 1921 and 1923. When voices identical to the command languages 1932 and 1935 are received, the respective devices 1921 and 1923 may perform tasks corresponding to the received commands.

The video information 1933 may include information regarding a video that may be played back by the DVD player device 1921. For example, the video information 1933 may include identification information and detailed information regarding a video file that may be played back by the DVD player device 1921.

The sound preset 1936 may include information about available settings regarding sound output of the home theatre device 1923. If the home theatre device 1923 may be set to 7.1 channels, 5.1 channels, and 2.1 channels, the sound preset 1936 may include 7.1 channels, 5.1 channels, and 2.1 channels as information regarding available settings regarding channels of the home theatre device 1923. Other than channels, the sound preset 1936 may include an equalizer setting, a volume setting, etc., and may further include information regarding various available settings with respect to the home theatre device 1923 based on user settings.

The module selecting and instructing unit 1922 may transmit information 1931 through 1936 regarding the DVD player device 1921 and the home theatre device 1923 to the speech recognition data updating device 1820. The speech recognition data updating device 1820 may update second language models corresponding to the respective devices 1921 and 1923 based on the received information 1931 through 1936.

The speech recognition data updating device 1820 may update language models corresponding to the respective devices 1921 and 1923 by using words included in sentences of the command languages 1932 and 1935, the video information 1933, or the sound preset 1936. For example, the speech recognition data updating device 1820 may include words included in the video information 1933 or the sound preset 1936 in the sentences of the command languages 1932 and 1935 and obtain appearance probability information regarding the same.

FIG. 20 is a block diagram showing an example of speech recognition devices according to an embodiment.

Referring to FIG. 20, a speech recognition device 2000 may include a front-end engine 2010 and a speech recognition engine 2020.

The front-end engine 2010 may receive speech data or language data from the speech recognition device 2000 and output a result of speech recognition regarding the speech data. Furthermore, the front-end engine 2010 may perform a pre-processing with respect to the received speech data or language data and transmit the pre-processed speech data or language data to the speech recognition engine 2020.

The front-end engine 2010 may correspond to the speech recognition data updating devices 220 and 420 described above with reference to FIGS. 1 through 17. The speech recognition engine 2020 may correspond to the speech recognition devices 230 and 430 described above with reference to FIGS. 1 through 18.

Since updating of speech recognition data and speech recognition may be respectively performed by independent engines, speech recognition and updating of speech recognition data may be simultaneously performed in the speech recognition device 2000.

The front-end engine 2010 may include a speech buffer 2011 for receiving speech data and transmitting the speech data to a speech recognizer 2022, and a language model updating unit 2012 for updating speech recognition data. Furthermore, the front-end engine 2010 may include segment information 2013 including information for restoring speech-recognized subwords to words, according to an embodiment. The front-end engine 2010 may restore subwords speech-recognized by the speech recognizer 2022 to words by using the segment information 2013 and output a speech-recognized language 2014 including the restored words as a result of speech recognition.

The speech recognition engine 2020 may include a language model 2021 updated by the language model updating unit 2012. Furthermore, the speech recognition engine 2020 may include the speech recognizer 2022 capable of performing speech recognition based on the speech data and the language model 2021 received from the speech buffer 2011.

When speech data is input as recording is performed, the speech recognition device 2000 may collect language data including new words at the same time. Next, as speech data including a recorded speech is stored in the speech buffer 2011, the language model updating unit 2012 may update a second language model of the language model 2021 by using the new words. When the second language model is updated, the speech recognizer 2022 may receive the speech data stored in the speech buffer 2011 and perform speech recognition. A speech-recognized language may be transmitted to the front-end engine 2010 and restored based on the segment information 2013. The front-end engine 2010 may output a result of speech recognition including restored words.
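The flow above may be sketched as follows in Python. This is only an illustration: FrontEndEngine, StubRecognizer, and the subword-splitting function are hypothetical stand-ins for the speech buffer 2011, the language model updating unit 2012, the segment information 2013, and the speech recognizer 2022, not an implementation taken from the figures.

from collections import deque

class FrontEndEngine:
    """Sketch of the buffer-update-recognize-restore flow; all names are illustrative."""
    def __init__(self, recognizer):
        self.speech_buffer = deque()      # corresponds to the speech buffer 2011
        self.segment_info = {}            # subword sequence -> original word
        self.recognizer = recognizer

    def update_language_model(self, new_words, to_subwords):
        # Divide each new word into subwords, remember how to restore it,
        # and add the subwords to the recognizer's second language model.
        for word in new_words:
            subwords = to_subwords(word)
            self.segment_info[tuple(subwords)] = word
            self.recognizer.add_subwords(subwords)

    def recognize_buffered_speech(self):
        results = []
        while self.speech_buffer:
            speech = self.speech_buffer.popleft()
            subwords = tuple(self.recognizer.recognize(speech))
            # Restore speech-recognized subwords to the original word if known.
            results.append(self.segment_info.get(subwords, " ".join(subwords)))
        return results

class StubRecognizer:
    """Stand-in for the speech recognition engine."""
    def __init__(self):
        self.known = set()
    def add_subwords(self, subwords):
        self.known.update(subwords)
    def recognize(self, speech):
        # Pretend the engine returns the subword sequence for the utterance.
        return speech

engine = FrontEndEngine(StubRecognizer())
engine.update_language_model(["gangnam"], lambda w: [w[:4], w[4:]])
engine.speech_buffer.append(["gang", "nam"])
print(engine.recognize_buffered_speech())   # ['gangnam']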

FIG. 21 is a block diagram showing an example of performing speech recognition at a display device, according to an embodiment.

Referring to FIG. 21, a display device 2110 may receive speech data from a user, transmit the speech data to a speech recognition server 2120, receive a result of speech recognition from the speech recognition server 2120, and output the result of speech recognition. The display device 2110 may perform a task based on the result of speech recognition.

The display device 2110 may include a language data generating unit 2114 for generating language data for updating speech recognition data at the speech recognition server 2120. The language data generating unit 2114 may generate language data from information currently displayed on the display device 2110 or content information related to the information currently displayed on the display device 2110 and transmit the language data to the speech recognition server 2120. For example, the language data generating unit 2114 may generate language data from a text 2111 and current broadcasting information 2112 included in content that is currently displayed, was previously displayed, or will be displayed. Furthermore, the language data generating unit 2114 may receive information regarding a conversation displayed on the display device 2110 from a conversation managing unit 2113 and generate language data by using the received information. Information that may be received from the conversation managing unit 2113 may include texts included in a social network service (SNS), texts included in a short message service (SMS), texts included in a multimedia message service (MMS), and information regarding a conversation between the display device 2110 and a user.

A language model updating unit 2121 may update a language model by using language data received from the language data generating unit 2114 of the display device 2110. Next, a speech recognition unit 2122 may perform speech recognition based on the updated language model. If a speech-recognized language includes subwords, a text restoration unit 2123 may perform text restoration based on segment information according to an embodiment. The speech recognition server 2120 may transmit a text-restored and speech-recognized language to the display device 2110, and the display device 2110 may output the speech-recognized language.

In the case of updating speech recognition data by dividing a new word into predetermined unit components according to an embodiment, the speech recognition data may be updated within a couple of milliseconds. Therefore, the speech recognition server 2120 may immediately add a new word included in a text displayed on the display device 2110 to a language model.

A user may not only speak a set command, but may also speak the name of a broadcasting program that is currently being broadcast or a text displayed on the display device 2110. Therefore, the speech recognition server 2120 according to an embodiment may receive a text displayed on the display device 2110 or information regarding contents displayed on the display device 2110, which are likely to be spoken. Next, the speech recognition server 2120 may update speech recognition data based on the received information. Since the speech recognition server 2120 is capable of updating a language model within a couple of milliseconds to a couple of seconds, a new word that is likely to be spoken may be recognized as soon as the new word is obtained.

FIG. 22 is a block diagram showing an example of updating a language model in consideration of situation information, according to an embodiment.

A speech recognition data updating device 2220 and a speech recognition device 2240 of FIG. 22 may correspond to the speech recognition data updating devices 220 and 420 and the speech recognition devices 230 and 430 shown in FIGS. 2 through 17, respectively.

Referring to FIG. 22, the speech recognition data updating device 2220 may obtain personalized information 2221 from a user device 2210 or a service providing server 2230.

The speech recognition data updating device 2220 may receive information regarding a user from the user device 2210, the information including an address book 2211, an installed application list 2212, and a stored album list 2213. However, the present invention is not limited thereto, and the speech recognition data updating device 2220 may receive various information regarding the user device 2210 from the user device 2210.

Since individual users have different articulation patterns from one another, the speech recognition data updating device 2220 may periodically receive information for performing speech recognition for each of the users and store the information in the personalized information 2221. Furthermore, a language model updating unit 2222 of the speech recognition data updating device 2220 may update language models based on the personalized information 2221 of the respective users. Furthermore, the speech recognition data updating device 2220 may collect information regarding service usage in relation to the respective users from the service providing server 2230 and store the information in the personalized information 2221.

The service providing server 2230 may include a preferred channel list 2231, a frequently viewed video-on-demand (VOD) list 2232, a conversation history 2233, and a speech recognition result history 2234 for each user. In other words, the service providing server 2230 may store information regarding services provided to the user device 2210, e.g., a broadcasting program providing service, a VOD service, an SNS service, a speech recognition service, etc. The collectable information is merely an example and is not limited thereto. The service providing server 2230 may collect various information regarding each of the users and transmit the collected information to the speech recognition data updating device 2220. The speech recognition result history 2234 may include information regarding results of speech recognition performed by the speech recognition device 2240 with respect to the respective users.

In detail, the language model updating unit 2222 may determine a second language model 2223 corresponding to each user. In the speech recognition data updating device 2220, at least one second language model 2223 corresponding to each user may exist. If there is no second language model 2223 corresponding to a user, the language model updating unit 2222 may newly generate a second language model 2223 corresponding to the user. Next, the language model updating unit 2222 may update language models corresponding to the respective users based on the personalized information 2221. In detail, the language model updating unit 2222 may detect new words from the personalized information 2221 and update the second language models 2223 corresponding to the respective users by using the detected new words.
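The per-user handling described above may be sketched as follows in Python. The dictionary-based second language models, the relative-frequency probability estimate, and every name in the sketch are illustrative assumptions rather than the implementation of the figures.

from collections import defaultdict

class LanguageModelUpdatingUnit:
    """Sketch of per-user second language models; names are illustrative."""
    def __init__(self, first_language_model_vocab):
        self.first_vocab = set(first_language_model_vocab)
        self.second_models = {}        # user_id -> {word: appearance probability}

    def get_or_create(self, user_id):
        # Newly generate a second language model for the user if none exists yet.
        return self.second_models.setdefault(user_id, {})

    def update_for_user(self, user_id, personalized_texts):
        model = self.get_or_create(user_id)
        counts = defaultdict(int)
        for text in personalized_texts:
            for word in text.split():
                if word not in self.first_vocab:   # detect new words only
                    counts[word] += 1
        total = sum(counts.values()) or 1
        for word, count in counts.items():
            model[word] = count / total            # simple relative-frequency estimate
        return model

updater = LanguageModelUpdatingUnit(["play", "call"])
print(updater.update_for_user("user_1", ["call john_doe", "play my_album"]))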

A voice recognizer 2241 of the speech recognition device 2240 may perform speech recognition by using the second language models 2223 established with respect to the respective users. When speech data including a voice command is received, the voice recognizer 2241 may perform speech recognition by using the second language model 2223 corresponding to a user who is issuing voice commands.

FIG. 23 is a block diagram showing an example of a speech recognition system including language models corresponding to respective applications, according to an embodiment.

Referring to FIG. 23, a second language model 2323 of a voice recognition data updating device 2320 may be updated or generated based on device information 2321 regarding at least one application installed on a user device 2310. Therefore, each of the applications installed on the user device 2310 need not perform speech recognition by itself, and speech recognition may be performed on a separate platform for speech recognition. Next, based on a result of performing speech recognition on the platform for speech recognition, a task may be requested of at least one application.

The user device 2310 may include various types of terminal devices that may be used by a user, where at least one application may be installed thereon. An application 2311 installed on the user device 2310 may include information regarding tasks that may be performed according to commands. For example, the application 2311 may include ‘Play,’ ‘Pause,’ and ‘Stop’ as information regarding tasks corresponding to the commands ‘Play,’ ‘Pause,’ and ‘Stop.’ Furthermore, the application 2311 may include information regarding texts that may be included in commands. The user device 2310 may transmit at least one of information regarding tasks of the application 2311 that may be performed based on commands and information regarding texts that may be included in commands to the voice recognition data updating device 2320. The voice recognition data updating device 2320 may perform speech recognition based on the information received from the user device 2310.

The voice recognition data updating device 2320 may include the device information 2321, a language model updating unit 2322, the second language model 2323, and segment information 2324. The voice recognition data updating device 2320 may correspond to the speech recognition data updating devices 220 and 420 shown in FIGS. 2 through 20.

The device information 2321 may include information regarding the application 2311, the information being received from the user device 2310. The voice recognition data updating device 2320 may receive, from the user device 2310, at least one of information regarding tasks of the application 2311 that may be performed based on commands and information regarding texts that may be included in commands. The voice recognition data updating device 2320 may store at least a part of the information regarding the application 2311 received from the user device 2310 as the device information 2321. The voice recognition data updating device 2320 may store the device information 2321 for each of the user devices 2310.

The voice recognition data updating device 2320 may receive information regarding the application 2311 from the user device 2310 periodically or when a new event regarding the application 2311 occurs. Alternatively, when the speech recognition device 2330 starts performing speech recognition, the voice recognition data updating device 2320 may request information regarding the application 2311 from the user device 2310. Furthermore, the voice recognition data updating device 2320 may store the received information as the device information 2321. Therefore, the voice recognition data updating device 2320 may update a language model based on the latest information regarding the application 2311.

The language model updating unit 2322 may update a language model, which may be used to perform speech recognition, based on the device information 2321. A language model that may be updated based on the device information 2321 may include a second language model corresponding to the user device 2310 from among the at least one second language model 2323. Furthermore, a language model that may be updated based on the device information 2321 may include a second language model corresponding to the application 2311 from among the at least one second language model 2323.

The second language model 2323 may include at least one independent language model that may be selectively applied based on situation information. The speech recognition device 2330 may select at least one of the second language models 2323 based on situation information and perform speech recognition by using the selected second language model 2323.
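As an illustration only, the selection of second language models from situation information may be sketched in Python as follows; the keyed dictionary of models and the situation keys ('device', 'application', 'user') are assumptions made for the example, not structures defined by the description.

def select_second_language_models(second_models, situation):
    """Pick every second language model whose key appears in the situation information.

    second_models: dict mapping keys such as ('device', 'tv_1') or
                   ('application', 'music_app') to independent language models.
    situation:     dict describing the current situation, e.g.
                   {'device': 'tv_1', 'application': 'music_app', 'user': 'user_1'}.
    """
    selected = []
    for (kind, value), model in second_models.items():
        if situation.get(kind) == value:
            selected.append(model)
    return selected

models = {("device", "tv_1"): {"channel_up": 0.4},
          ("application", "music_app"): {"play_song": 0.6}}
print(select_second_language_models(models, {"device": "tv_1", "user": "user_1"}))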

The segment information 2324 may include information regarding predetermined unit components of a new word that may be generated when speech recognition data is updated, according to an embodiment. The voice recognition data updating device 2320 may divide a new word into subwords and update speech recognition data according to an embodiment to add new words to the second language model 2323 in real time. Therefore, when a new word divided into subwords is speech-recognized, a result of speech recognition thereof may include subwords. If speech recognition is performed by the speech recognition device 2330, the segment information 2324 may be used to restore speech-recognized subwords to an original word.

The speech recognition device 2330 may include a speech recognition unit 2331, which performs speech recognition with respect to a received voice command, and a text restoration device 2332, which restores subwords to an original word. The text restoration device 2332 may restore speech-recognized subwords to an original word and output a final result of speech recognition.

FIG. 24 is a diagram showing an example of a user device transmitting a request to perform a task based on a result of speech recognition, according to an embodiment. A user device 2410 may correspond to the user devices 1850, 2210, and 2310 of FIGS. 18, 22, and 23.

Referring to FIG. 24, if the user device 2410 is a television (TV), a command based on a result of speech recognition may be transmitted via the user device 2410 to external devices connected to the user device 2410, that is, an air conditioner 2420, a cleaner 2430, and a laundry machine 2450.

When a user issues a voice command at a location 2440, speech data may be collected by the air conditioner 2420, the cleaner 2430, and the user device 2410. The user device 2410 may compare speech data collected by the user device 2410 to speech data collected by the air conditioner 2420 and the cleaner 2430 in terms of a signal-to-noise ratio (SNR) or volume. As a result of the comparison, the user device 2410 may select speech data of the highest quality and transmit the selected speech data to a speech recognition device for performing speech recognition. Referring to FIG. 24, since the user is at a location closest to the cleaner 2430, speech data collected by the cleaner 2430 may be speech data of the highest quality.
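A minimal sketch of such a comparison is given below in Python, assuming each device provides its captured samples as a NumPy array and that the quietest frames approximate the noise floor; the framing, the quantile-based noise estimate, and the synthetic recordings are illustrative assumptions, not the comparison actually used by the devices.

import numpy as np

def estimate_snr_db(samples, frame=400, noise_quantile=0.2):
    """Rough SNR estimate: treat the quietest frames of the capture as noise."""
    samples = np.asarray(samples, dtype=np.float64)
    n_frames = len(samples) // frame
    if n_frames == 0:
        return float("-inf")
    energies = (samples[: n_frames * frame].reshape(n_frames, frame) ** 2).mean(axis=1)
    noise = np.quantile(energies, noise_quantile) + 1e-12
    signal = energies.max() + 1e-12
    return 10.0 * np.log10(signal / noise)

def select_best_device(captures):
    """captures: dict mapping a device name to its recorded samples."""
    return max(captures, key=lambda name: estimate_snr_db(captures[name]))

# Synthetic example: the 'cleaner' capture carries the strongest voice burst.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
voice = np.zeros(16000)
voice[6000:10000] = np.sin(2 * np.pi * 200 * t[6000:10000])   # short burst of "speech"
captures = {"tv": 0.05 * voice + rng.normal(0, 0.01, 16000),
            "cleaner": 0.5 * voice + rng.normal(0, 0.01, 16000)}
print(select_best_device(captures))   # expected: 'cleaner'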

According to an embodiment, speech data may be collected by using a plurality of devices, and thus high quality speech data may be collected even if a user is far from the user device 2410. Therefore, variation of success rates according to distances between a user and the user device 2410 may be reduced.

Furthermore, even if the user is at a location 2460 in a laundry room far from a living room in which the user device 2410 is located, speech data including a voice command of the user may be collected by the laundry machine 2450. The laundry machine 2450 may transmit the collected speech data to the user device 2410, and the user device 2410 may perform a task based on the received speech data. Therefore, the user may issue voice commands at a high success rate regardless of a distance to the user device 2410 by using various devices.

Hereinafter, a method of performing speech recognition regarding each user will be described in closer detail.

FIG. 25 is a block diagram showing a method of generating a personal preferred content list regarding classes of speech data, according to an embodiment.

Referring to FIG. 25, the speech recognition device 230 may obtain acoustic data 2520 and content information 2530 from speech data and text data 2510. The text data and the acoustic data 2520 may correspond to each other, where the content information 2530 may be obtained from the text data, and the acoustic data 2520 may be obtained from the speech data. The text data may be obtained as a result of performing speech recognition on the speech data.

The acoustic data 2520 may include voice feature information for distinguishing voices of different persons. The speech recognition device 230 may distinguish classes based on the acoustic data 2520, and, if the acoustic data 2520 regarding a same user differs due to different voice features according to time slots, the acoustic data 2520 may be classified into different classes. The acoustic data 2520 may include feature information regarding speech data, such as an average and a variance of pitches indicating how high or low a sound is, a jitter (variation in the vibration of the vocal cords), a shimmer (regularity of voice waveforms), a duration, and an average and a variance of Mel-frequency cepstral coefficients (MFCC).
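A minimal sketch in Python follows, assuming per-frame pitch estimates, per-frame amplitudes, and MFCC frames have already been produced by an acoustic front end; the relative-perturbation formulas used for jitter and shimmer are common illustrative definitions, not ones taken from the description.

import numpy as np

def acoustic_feature_vector(pitch_hz, frame_amplitude, mfcc_frames):
    """Summarize frame-level measurements into the features named above.

    pitch_hz:        1-D array of per-frame pitch estimates (Hz).
    frame_amplitude: 1-D array of per-frame peak amplitudes.
    mfcc_frames:     2-D array, one row of MFCCs per frame.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    amp = np.asarray(frame_amplitude, dtype=float)
    mfcc = np.asarray(mfcc_frames, dtype=float)
    periods = 1.0 / pitch_hz                                   # vocal-cord vibration periods
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    shimmer = np.mean(np.abs(np.diff(amp))) / np.mean(amp)
    return {
        "pitch_mean": pitch_hz.mean(),
        "pitch_variance": pitch_hz.var(),
        "jitter": jitter,
        "shimmer": shimmer,
        "duration_frames": len(pitch_hz),
        "mfcc_mean": mfcc.mean(axis=0),
        "mfcc_variance": mfcc.var(axis=0),
    }

print(acoustic_feature_vector([120, 118, 122], [0.8, 0.78, 0.81],
                              [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]))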

The content information 2530 may be obtained based on title information included in the text data. The content information 2530 may include a title included in the text data as-is. Furthermore, the content information 2530 may further include words related to a title.

For example, if the titles included in the text data are ‘weather’ and ‘professional baseball game result,’ then ‘weather information,’ which is related to ‘weather,’ and ‘sports news’ and ‘professional baseball replay,’ which are related to ‘professional baseball game result,’ may be obtained as the content information 2540.

The speech recognition device 230 may determine a class related to speech data based on the acoustic data 2520 and the content information 2540 obtained from the text data. Classes may include acoustic data and personal preferred content lists corresponding to the respective classes. The speech recognition device 230 may determine a class regarding speech data based on acoustic data and a personal preferred content list regarding the corresponding class.

Since no personal preferred content list exists before speech data is initially classified or after the lists are initialized, the speech recognition device 230 may initially classify speech data based on acoustic data alone. Next, the speech recognition device 230 may extract the content information 2540 from the text data corresponding to the respective classified speech data and generate personal preferred content lists corresponding to the respective classes. Next, as the extracted content information 2540 is added to the personal preferred content lists during later speech recognition, the weights applied to the personal preferred content lists during classification may be gradually increased.

A method of updating a class may be performed based on Equation 4 below.
Class_similarity = Wa·Av + Wl·Lv   [Equation 4]

In Equation 4, Av and Wa respectively denote a class based on acoustic data of speech data and a weight regarding the same, whereas Lv and Wl respectively denote a class based on a personal preferred content list and a weight regarding the same.

Initially, the value of Wl may be 0, and the value of Wl may increase as a personal preferred content list is updated.
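A minimal sketch of Equation 4 in Python follows; the assumption that Av and Lv are similarity scores between 0 and 1, and the particular rule that grows Wl with the size of the personal preferred content list, are illustrative choices rather than details given in the description.

def class_similarity(acoustic_similarity, content_similarity, w_acoustic, w_content):
    """Equation 4: Class_similarity = Wa*Av + Wl*Lv."""
    return w_acoustic * acoustic_similarity + w_content * content_similarity

def content_weight(preferred_list_size, max_weight=0.5, saturation=50):
    """Wl starts at 0 and grows as the personal preferred content list is updated."""
    return max_weight * min(preferred_list_size, saturation) / saturation

w_l = content_weight(preferred_list_size=0)       # initially 0
print(class_similarity(0.9, 0.0, w_acoustic=1.0, w_content=w_l))
w_l = content_weight(preferred_list_size=25)      # grows after later recognitions
print(class_similarity(0.9, 0.7, w_acoustic=1.0, w_content=w_l))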

Furthermore, the speech recognition device 230 may generate language models corresponding to the respective classes based on personal preferred content lists and speech recognition histories of the respective classes. Furthermore, the speech recognition device 230 may generate personalized acoustic models for the respective classes by applying a speaker-adaptive algorithm (e.g., maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) adaptation, etc.) to speech data corresponding to the respective classes and a global acoustic model.

During speech recognition, the speech recognition device 230 may identify a class from speech data and determine a language model or an acoustic model corresponding to the identified class. The speech recognition device 230 may perform speech recognition by using the determined language model or acoustic model.

After the speech recognition is performed, the speech recognition data updating device 220 may update a language model and an acoustic model, to which speech-recognized speech data and text data respectively belong, by using a result of the speech recognition.

FIG. 26 is a diagram showing an example of determining a class of speech data, according to an embodiment.

Referring to FIG. 26, each piece of acoustic data may have feature information including acoustic information and content information. Each piece of acoustic data may be indicated on a graph in which the x-axis indicates acoustic information and the y-axis indicates content information. The acoustic data may be classified into n classes based on the acoustic information and the content information by using a K-means clustering method.
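For illustration, one possible realization of such clustering in Python uses scikit-learn's KMeans; the two-dimensional points below, which summarize each utterance as an (acoustic information, content information) pair, and the choice of three classes are assumptions made only for the example.

import numpy as np
from sklearn.cluster import KMeans

# Each row: [acoustic information score, content information score] for one utterance.
points = np.array([[0.10, 0.20], [0.15, 0.25], [0.80, 0.90],
                   [0.82, 0.85], [0.50, 0.10], [0.55, 0.15]])

n_classes = 3
labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(points)
print(labels)   # class index assigned to each utterance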

FIG. 27 is a flowchart showing a method of updating speech recognition data according to classes of speech data, according to an embodiment. Referring to FIG. 27, in an operation S2701, the speech recognition data updating device 220 may obtain speech data and a text corresponding to the speech data. The speech recognition data updating device 220 may obtain the text corresponding to the speech data as a result of speech recognition performed by the speech recognition device 230.

In an operation S2703, the speech recognition data updating device 220 may detect content information including the text obtained in the operation S2701 or information related to the text. For example, the content information may further include words related to the text.

In an operation S2705, the speech recognition data updating device 220 may extract acoustic information from the speech data obtained in the operation S2701. The acoustic information that may be extracted in the operation S2705 may include information regarding acoustic features of the speech data, such as the above-stated feature information like a pitch, a jitter, and a shimmer.

In an operation S2707, the speech recognition data updating device 220 may determine a class corresponding to the content information and the acoustic information detected in the operation S2703 and the operation S2705.

In an operation S2709, the speech recognition data updating device 220 may update a language model or an acoustic model corresponding to the class determined in the operation S2707, based on the content information and the acoustic information. The speech recognition data updating device 220 may update a language model by detecting a new word included in the content information. Furthermore, the speech recognition data updating device 220 may update an acoustic model by applying the acoustic information, a global acoustic model, and a speaker-adaptive algorithm.
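The operations S2701 through S2709 may be tied together as in the Python sketch below; every callable passed in is a hypothetical stand-in for the corresponding component described above, and the tiny usage at the end exists only to show the flow.

def update_speech_recognition_data(speech_data, text,
                                   detect_content_info, extract_acoustic_info,
                                   determine_class, update_language_model,
                                   update_acoustic_model):
    """S2701 through S2709 as one pass; every callable is a stand-in for a real component."""
    # S2701: speech data and its recognized text are given as inputs.
    content_info = detect_content_info(text)                   # S2703
    acoustic_info = extract_acoustic_info(speech_data)         # S2705
    class_id = determine_class(content_info, acoustic_info)    # S2707
    update_language_model(class_id, content_info)              # S2709
    update_acoustic_model(class_id, acoustic_info)             # S2709
    return class_id

print(update_speech_recognition_data(
    speech_data=[0.0, 0.1], text="professional baseball game result",
    detect_content_info=lambda t: t.split(),
    extract_acoustic_info=lambda s: {"pitch_mean": 120.0},
    determine_class=lambda c, a: 0,
    update_language_model=lambda cid, c: None,
    update_acoustic_model=lambda cid, a: None))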

FIGS. 28 and 29 are diagrams showing examples of acoustic data that may be classified according to embodiments.

Referring to FIG. 28, speech data regarding a plurality of users may be classified into a single class. It is not necessary to classify users with similar acoustic characteristics and similar content preferences into different classes, and thus such users may be classified into a single class.

Referring to FIG. 29, speech data regarding a same user may be classified into different classes based on characteristics of the respective speech data. In the case of a user whose voice differs in the morning and in the evening, acoustic information regarding speech data may be detected differently, and thus speech data regarding the voice in the morning and speech data regarding the voice in the evening may be classified into different classes.

Furthermore, if content information of speech data regarding a same user differs, the speech data may be classified into different classes. For example, a same user may use ‘baby-related’ content for nursing a baby. Therefore, if content information of speech data differs, speech data including voices of a same user may be classified into different classes.

According to an embodiment, the speech recognition device 230 may perform speech recognition by using second language models determined for respective users. Furthermore, in the case where a same device ID is used and users cannot be distinguished with device IDs, users may be classified based on acoustic information and content information of speech data. The speech recognition device 230 may determine an acoustic model or a language model based on the determined class and may perform speech recognition.

Furthermore, if users cannot be distinguished based on acoustic information only due to similarity of voices of the users (e.g., brothers, family members, etc.), the speech recognition device 230 may distinguish classes by further considering content information, thereby performing speaker-adaptive speech recognition.

FIGS. 30 and 31 are block diagrams showing an example of performing a personalized speech recognition method according to an embodiment.

Referring to FIGS. 30 and 31, information for performing personalized speech recognition for respective classes may include language model updating units 3022, 3032, 3122, and 3132 that update second language models 3023, 3033, 3123, and 3133 based on the personalized information 3021, 3031, 3121, and 3131 including information regarding individuals, and segment information 3024, 3034, 3124, and 3134 that may be generated when the second language models 3023, 3033, 3123, and 3133 are updated. The information for performing personalized speech recognition for respective classes may be included in a speech recognition device 3010, which performs speech recognition, or the speech recognition data updating device 220.

When a plurality of persons are articulating, the speech recognition device 3010 may interpolate language models for the respective individuals for speech recognition.

Referring to FIG. 30, an interpolating method using a plurality of language models may be the method described above with reference to Equations 1 through 3. For example, the speech recognition device 3010 may apply a higher weight to a language model corresponding to a person holding a microphone. If a plurality of language models are used according to Equation 1, a word commonly included in the language models may have a high probability. According to Equations 2 and 3, words included in the language models for the respective individuals may be simply combined.

Referring to FIG. 31, if sizes of language models for respective individuals are not large, speech recognition may be performed based on a single language model 3141, which is a combination of the language models for a plurality of persons. As language models are combined, an amount of probabilities to be calculated for speech recognition may be reduced. However, in the case of combining language models, it is necessary to generate a combined language model by re-determining the respective probabilities. Therefore, if sizes of language models for respective individuals are small, it is efficient to combine the language models. If a group consisting of a plurality of individuals can be set up in advance, the speech recognition device 3010 may obtain a combined language model regarding the group before a time point at which speech recognition is performed.
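The two strategies may be contrasted with the Python sketch below, which assumes each personal language model is simply a dictionary of word probabilities; the weighted-sum form of the interpolation and the renormalization in the combined model are illustrative assumptions, since Equations 1 through 3 themselves are not reproduced here.

def interpolate(models, weights, word):
    """Score a word at recognition time as a weighted sum over personal models."""
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))

def combine_language_models(models, weights):
    """Pre-compute a single combined model by re-determining every probability."""
    combined = {}
    for model, weight in zip(models, weights):
        for word, prob in model.items():
            combined[word] = combined.get(word, 0.0) + weight * prob
    total = sum(combined.values()) or 1.0
    return {word: prob / total for word, prob in combined.items()}

person_a = {"play": 0.6, "pause": 0.4}
person_b = {"play": 0.3, "call": 0.7}
weights = [0.7, 0.3]          # e.g., a higher weight for the person holding the microphone
print(interpolate([person_a, person_b], weights, "play"))
print(combine_language_models([person_a, person_b], weights))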

FIG. 32 is a block diagram showing the internal configuration of a speech recognition data updating device according to an embodiment. The speech recognition data updating device of FIG. 32 may correspond to the speech recognition data updating device of FIGS. 2 through 23.

The speech recognition data updating device 3200 may be any of various types of devices that may be used by a user, or a server device that may be connected to a user device via a network.

Referring to FIG. 32, the speech recognition data updating device 3200 may include a controller 3210 and a memory 3220.

The controller 3210 may detect new words included in collected language data and update a language model that may be used during speech recognition. In detail, the controller 3210 may convert new words to phoneme sequences, divide each of the phoneme sequences into predetermined unit components, and determine appearance probability information regarding the components of the phoneme sequences. Furthermore, the controller 3210 may update a language model by using the appearance probability information.
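This update path may be sketched in Python as follows; the toy letter-to-phoneme table, the fixed two-phoneme unit length, and the relative-frequency probabilities are hypothetical simplifications standing in for a real pronunciation model and the predetermined unit components described above.

from collections import defaultdict

TOY_G2P = {"g": "G", "a": "AA", "n": "N", "m": "M"}   # hypothetical letter-to-phoneme table

def to_phoneme_sequence(word):
    return [TOY_G2P.get(ch, ch.upper()) for ch in word]

def divide_into_units(phonemes, unit_len=2):
    """Divide the phoneme sequence into predetermined unit components (subwords)."""
    return [tuple(phonemes[i:i + unit_len]) for i in range(0, len(phonemes), unit_len)]

def update_language_model(language_model, segment_info, new_words):
    counts = defaultdict(int)
    for word in new_words:
        units = divide_into_units(to_phoneme_sequence(word))
        segment_info[tuple(units)] = word          # remember how to restore the word
        for unit in units:
            counts[unit] += 1
    total = sum(counts.values())
    for unit, count in counts.items():
        language_model[unit] = count / total       # appearance probability of each unit
    return language_model

lm, seg = {}, {}
print(update_language_model(lm, seg, ["gangnam"]))
print(seg)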

The memory 3220 may store the language model updated by the controller 3210.

FIG. 33 is a block diagram showing the internal configuration of a speech recognition device according to an embodiment. The speech recognition device of FIG. 33 may correspond to the speech recognition device of FIGS. 2 through 31.

The speech recognition device 3300 may be any of various types of devices that may be used by a user, or a server device that may be connected to a user device via a network.

Referring to FIG. 33, the speech recognition device 3300 may include a controller 3310 and a communication unit 3320.

The controller 3310 may perform speech recognition by using speech data. In detail, the controller 3310 may obtain at least one phoneme sequence from speech data and obtain appearance probabilities regarding predetermined unit components obtained by dividing the phoneme sequence. Next, the controller 3310 may obtain one phoneme sequence based on the appearance probabilities and output a word corresponding to the phoneme sequence as a speech-recognized word based on segment information regarding the obtained phoneme sequence.

The communication unit 3320 may receive speech data including articulation of a user according to a user input. If the speech recognition device 3300 is a server device, the speech recognition device 3300 may receive speech data from a user device. Next, the communication unit 3320 may transmit a word speech-recognized by the controller 3310 to the user device.

FIG. 34 is a block diagram for describing the configuration of a user device 3400 according to an embodiment.

As shown in FIG. 34, the user device 3400 may be any of various types of devices that may be used by a user, e.g., a mobile phone, a tablet PC, a PDA, an MP3 player, a kiosk, an electronic frame, a navigation device, a digital TV, and a wearable device, such as a wristwatch or a head mounted display (HMD).

The user device 3400 may correspond to the user devices of FIGS. 2 through 24 and may receive a user's articulation, transmit the user's articulation to a speech recognition device, receive a speech-recognized language from the speech recognition device, and output the speech-recognized language.

For example, as shown in FIG. 34, the user device 3400 according to embodiments may include not only a display unit 3410 and a controller 3470, but also a memory 3420, a GPS chip 3425, a communication unit 3430, a video processor 3435, an audio processor 3440, a user inputter 3445, a microphone unit 3450, an image pickup unit 3455, a speaker unit 3460, and a motion detecting unit 3465.

Detailed descriptions of the above-stated components will be given below.

The display unit 3410 may include a display panel 3411 and a controller (not shown) for controlling the display panel 3411. The display panel 3411 may be embodied as any of various types of display panels, such as a liquid crystal display (LCD) panel, an organic light emitting diode (OLED) display panel, an active-matrix organic light emitting diode (AM-OLED) panel, and a plasma display panel (PDP). The display panel 3411 may be embodied to be flexible, transparent, or wearable. The display unit 3410 may be combined with a touch panel 3447 of the user inputter 3445 and provided as a touch screen. For example, the touch screen may include an integrated module in which the display panel 3411 and the touch panel 3447 are combined with each other in a stack structure.

The display unit 3410 according to embodiments may display a result of speech recognition under the control of the controller 3470.

The memory 3420 may include at least one of an internal memory (not shown) and an external memory (not shown).

For example, the internal memory may include at least one of a volatile memory (e.g., a dynamic random access memory (DRAM), a static RAM (SRAM), a synchronous dynamic RAM (SDRAM), etc.), a non-volatile memory (e.g., a one-time programmable read-only memory (OT-PROM), a programmable ROM (PROM), an erasable/programmable ROM (EPROM), an electrically erasable/programmable ROM (EEPROM), a mask ROM, a flash ROM, etc.), a hard disk drive (HDD), or a solid state disk (SSD). According to an embodiment, the controller 3470 may load a command or data received from at least one of a non-volatile memory or other components to a volatile memory and process the same. Furthermore, the controller 3470 may store data received from or generated by other components in the non-volatile memory.

The external memory may include at least one of a compact flash (CF), a secure digital (SD), a micro secure digital (Micro-SD), a mini secure digital (Mini-SD), an extreme digital (xD), and a memory stick.

The memory 3420 may store various programs and data used for operations of the user device 3400. For example, the memory 3420 may temporarily or permanently store at least one of speech data including articulation of a user and result data of speech recognition based on the speech data.

The controller 3470 may control the display unit 3410 to display a part of the information stored in the memory 3420 on the display unit 3410. In other words, the controller 3470 may display a result of speech recognition stored in the memory 3420 on the display unit 3410. Alternatively, when a user gesture is performed at a region of the display unit 3410, the controller 3470 may perform a control operation corresponding to the user gesture.

The controller 3470 may include at least one of a RAM 3471, a ROM 3472, a CPU 3473, a graphic processing unit (GPU) 3474, and a bus 3475. The RAM 3471, the ROM 3472, the CPU 3473, and the GPU 3474 may be connected to one another via the bus 3475.

The CPU 3473 accesses the memory 3420 and performs a booting operation by using an OS stored in the memory 3420. Next, the CPU 3473 performs various operations by using various programs, contents, and data stored in the memory 3420.

A command set for booting a system is stored in the ROM 3472. For example, when a turn-on command is input and power is supplied to the user device 3400, the CPU 3473 may copy an OS stored in the memory 3420 to the RAM 3471 according to commands stored in the ROM 3472, execute the OS, and boot the system. When the user device 3400 is booted, the CPU 3473 copies various programs stored in the memory 3420 to the RAM 3471 and performs various operations by executing the copied programs. When the user device 3400 is booted, the GPU 3474 displays a UI screen image in a region of the display unit 3410. In detail, the GPU 3474 may generate a screen image in which an electronic document including various objects, such as contents, icons, and menus, is displayed. The GPU 3474 calculates property values, such as coordinates, shapes, sizes, and colors, of the respective objects based on a layout of the screen image. Next, the GPU 3474 may generate screen images of various layouts including the objects based on the calculated property values. Screen images generated by the GPU 3474 may be provided to the display unit 3410 and displayed in respective regions of the display unit 3410.

The GPS chip 3425 may receive GPS signals from a global positioning system (GPS) satellite and calculate a current location of the user device 3400. When a current location of a user is needed for using a navigation program or other purposes, the controller 3470 may calculate the current location of the user by using the GPS chip 3425. For example, the controller 3470 may transmit situation information including a user's location calculated by using the GPS chip 3425 to a speech recognition device or a speech recognition data updating device. A language model may be updated or speech recognition may be performed by the speech recognition device or the speech recognition data updating device based on the situation information.

The communication unit 3430 may perform communications with various types of external devices via various forms of communication protocols. The communication unit 3430 may include at least one of a Wi-Fi chip 3431, a Bluetooth chip 3432, a wireless communication chip 3433, and an NFC chip 3434. The controller 3470 may perform communications with various external devices by using the communication unit 3430. For example, the controller 3470 may receive a request for controlling a memo displayed on the display unit 3410 and transmit a result based on the received request to an external device, by using the communication unit 3430.

The Wi-Fi chip 3431 and the Bluetooth chip 3432 may perform communications via the Wi-Fi protocol and the Bluetooth protocol, respectively. In the case of using the Wi-Fi chip 3431 or the Bluetooth chip 3432, various connection information, such as a service set identifier (SSID) and a session key, is transmitted and received first, communication is established by using the same, and then various information may be transmitted and received. The wireless communication chip 3433 refers to a chip that performs communications via various communication specifications, such as IEEE, Zigbee, 3rd generation (3G), 3rd generation partnership project (3GPP), and long term evolution (LTE). The NFC chip 3434 refers to a chip that operates according to the near field communication (NFC) protocol that uses the 13.56 MHz band from among various RF-ID frequency bands, e.g., the 135 kHz band, the 13.56 MHz band, the 433 MHz band, the 860-960 MHz band, and the 2.45 GHz band.

The video processor 3435 may process contents received via the communication unit 3430 or video data included in contents stored in the memory 3420. The video processor 3435 may perform various image processing operations with respect to video data, e.g., decoding, scaling, noise filtering, frame rate conversion, resolution conversion, etc.

The audio processor 3440 may process audio data included in contents received via the communication unit 3430 or included in contents stored in the memory 3420. The audio processor 3440 may perform various audio processing operations with respect to the audio data, e.g., decoding, amplification, noise filtering, etc. For example, the audio processor 3440 may play back speech data including a user's articulation.

When a program for playing back multimedia content is executed, the controller 3470 may operate the user inputter 3445 and the audio processor 3440 and play back the corresponding content. The speaker unit 3460 may output audio data generated by the audio processor 3440.

The user inputter 3445 may receive various commands input by a user. The user inputter 3445 may include at least one of a key 3446, the touch panel 3447, and a pen recognition panel 3448. The user device 3400 may display various contents or user interfaces based on a user input received from at least one of the key 3446, the touch panel 3447, and the pen recognition panel 3448.

The key 3446 may include various types of keys, such as a mechanical button or a wheel, formed at various regions of the outer surfaces, such as the front surface, side surfaces, or the rear surface, of the user device 3400.

The touch panel 3447 may detect a touch of a user and output a touch event value corresponding to a detected touch signal. If a touch screen (not shown) is formed by combining the touch panel 3447 with the display panel 3411, the touch screen may be embodied as any of various types of touch sensors, such as a capacitive type, a resistive type, and a piezoelectric type. When a body part of a user touches a surface of a capacitive type touch screen, coordinates of the touch are calculated by detecting a micro-electricity induced by the body part of the user. A resistive type touch screen includes two electrode plates arranged inside the touch screen and, when a user touches the touch screen, coordinates of the touch are calculated by detecting a current that flows as an upper plate and a lower plate at the touched location touch each other. A touch event occurring at a touch screen may usually be generated by a finger of a person, but a touch event may also be generated by an object formed of a conductive material that applies a capacitance change.

The pen recognition panel 3448 may detect a proximity pen input or a touch pen input of a touch pen (e.g., a stylus pen or a digitizer pen) operated by a user and output a detected pen proximity event or pen touch event. The pen recognition panel 3448 may be embodied as an electromagnetic resonance (EMR) type panel, for example, and is capable of detecting a touch input or a proximity input based on a change in intensity of an electromagnetic field due to an approach or a touch of a pen. In detail, the pen recognition panel 3448 may include an electromagnetic induction coil sensor (not shown) having a grid structure and an electromagnetic signal processing unit (not shown) that sequentially provides alternating signals having a predetermined frequency to respective loop coils of the electromagnetic induction coil sensor. When a pen including a resonating circuit exists near a loop coil of the pen recognition panel 3448, a magnetic field transmitted by the corresponding loop coil generates a current in the resonating circuit inside the pen based on mutual electromagnetic induction. Based on the current, an induction magnetic field is generated by a coil constituting the resonating circuit inside the pen, and the pen recognition panel 3448 detects the induction magnetic field at a loop coil in a signal reception mode, and thus a proximity location or a touch location of the pen may be detected. The pen recognition panel 3448 may be arranged to occupy a predetermined area below the display panel 3411, e.g., an area sufficient to cover the display area of the display panel 3411.

The microphone unit 3450 may receive a user's speech or other sounds and convert the same into audio data. The controller 3470 may use a user's speech input via the microphone unit 3450 for a phone call operation or may convert the user's speech into audio data and store the same in the memory 3420. For example, the controller 3470 may convert a user's speech input via the microphone unit 3450 into audio data, include the converted audio data in a memo, and store the memo including the audio data.

The image pickup unit 3455 may pick up still images or moving pictures under the control of a user. The image pickup unit 3455 may be embodied as a plurality of units, such as a front camera and a rear camera.

If the image pickup unit 3455 and the microphone unit 3450 are arranged, the controller 3470 may perform a control operation based on a user's speech input via the microphone unit 3450 or the user's motion recognized by the image pickup unit 3455. For example, the user device 3400 may operate in a motion control mode or a speech control mode. If the user device 3400 operates in the motion control mode, the controller 3470 may activate the image pickup unit 3455, pick up an image of a user, trace changes of a motion of the user, and perform a control operation corresponding to the same. For example, the controller 3470 may display a memo or an electronic document based on a motion input of a user that is detected by the image pickup unit 3455. If the user device 3400 operates in the speech control mode, the controller 3470 may operate in a speech recognition mode to analyze a user's speech input via the microphone unit 3450 and perform a control operation according to the analyzed speech of the user.

The motion detecting unit 3465 may detect motion of the main body of the user device 3400. The user device 3400 may be rotated or tilted in various directions. Here, the motion detecting unit 3465 may detect motion characteristics, such as a rotating direction, a rotating angle, and a tilted angle, by using at least one of various sensors, such as a geomagnetic sensor, a gyro sensor, and an acceleration sensor. For example, the motion detecting unit 3465 may receive a user's input by detecting a motion of the main body of the user device 3400 and display a memo or an electronic document based on the received input.

Furthermore, although not shown in FIG. 34, according to embodiments, the user device 3400 may further include a USB port via which a USB connector may be connected into the user device 3400, various external input ports to be connected to various external terminals, such as a headset, a mouse, and a LAN, a digital multimedia broadcasting (DMB) chip for receiving and processing DMB signals, and various other sensors.

Names of the above-stated components of the user device 3400 may vary. Furthermore, the user device 3400 according to the present embodiment may include at least one of the above-stated components, where some of the components may be omitted or additional components may be further included.

The present invention can also be embodied as computer readable codes on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, etc.

While the present invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims. The preferred embodiments should be considered in descriptive sense only and not for purposes of limitation. Therefore, the scope of the present invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.

Lee, Jae-Won, Kim, Nam-Hoon, Kim, Il-Hwan, Lee, Kyung-min, Park, Chi-youn

Dec 19 20372 years to revive unintentionally abandoned end. (for year 12)