The present invention relates to a method and apparatus of translating a language using voice recognition. The present invention provides a method of translating a language using voice recognition, comprising: receiving a voice input comprising a first language; acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input; providing a user interface for selecting at least one of the acquired at least one recognition candidate; and outputting a second language corresponding to the selected at least one recognition candidate, wherein the type of the user interface is determined according to the number of the acquired at least one recognition candidate, and an apparatus of translating a language using voice recognition for implementing the above method.
1. A method of translating a language using voice recognition, the method comprising:
receiving a voice input comprising a first language;
acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input;
providing a plurality of user interfaces comprising at least a first user interface and a second user interface for selecting at least one of the acquired at least one recognition candidate,
wherein the first user interface is provided for a first selection to decrease the acquired at least one recognition candidate,
wherein the second user interface is generated based on a result of the first selection and is provided for a second selection; and
outputting a second language corresponding to the second selection regarding the at least one recognition candidate,
wherein a type of the first user interface is determined according to the number of the acquired at least one recognition candidate.
6. An apparatus of translating a language using voice recognition, the apparatus comprising:
a voice input unit for receiving a voice input comprising a first language;
a voice recognition unit for acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input;
a memory storing a plurality of user interfaces comprising a first user interface and a second user interface for selecting at least one of the acquired at least one recognition candidate and a database in which the first and second languages are matched with each other for each of a plurality of categories; and
a controller configured to:
provide the first user interface for a first selection to decrease the acquired at least one recognition candidate,
provide the second user interface for a second selection based on a result of the first selection,
wherein a type of the first user interface is determined according to the number of the acquired at least one recognition candidate,
receive a second selection signal for at least one of the acquired at least one recognition candidate through the provided second user interface, and
output translation data comprising the second language corresponding to the selected at least one recognition candidate with reference to the database.
2. The method of
3. The method of
4. The method of
5. The method of
7. The apparatus of
8. The apparatus of
9. The apparatus of
10. The apparatus of
11. The apparatus of
wherein, if the second language is outputted in the form of voice information, the controller synthesizes the second language into voice and outputs it by controlling the voice synthesis unit.
This nonprovisional application claims priority under 35 U.S.C. §119(a) on Patent Application No. KR 10-2008-0078247 filed in the Republic of Korea on Aug. 11, 2008, the entire contents of which are hereby incorporated by reference.
1. Field
This document relates to translating a language using voice recognition. More particularly, this document relates to a method and apparatus of translating a language using voice recognition which allows a user to be efficiently provided with a desired translation result by providing various user interfaces according to a voice recognition result of a voice inputted from the user.
2. Related Art
Voice recognition technology implements a function that allows a computer to recognize the human voice. Recently, voice recognition technology has been applied so that existing text-input-based language translation systems can be used conveniently in portable situations. In particular, a portable voice translation system for military purposes has been developed and commercialized in the U.S.
A conventional translation system translates only expressions for limited situations and contexts due to the limitations of language processing technologies. Thus, the translation system presumes a limited set of situations that a user may experience and stores translation results of sentences useful for those situations. The conventional translation system classifies the limited situations into categories by theme, such as ‘travel’, ‘hotel’, ‘restaurant’, ‘transportation’, etc., and stores several hundred useful expressions in each of the categories. Therefore, in the conventional translation system, in order to input a desired expression, the user first has to select a category and then select his or her desired expression.
In the conventional translation system, the reason a category is selected first is to reduce the number of candidate expressions to be recognized and thereby increase the recognition rate. That is, if the user attempts recognition against all the expressions stored in the translation system without selecting a category, similar expressions across categories increase the possibility of false recognition and degrade the processing speed of the system. Accordingly, the user has the inconvenience of having to know in advance which category his or her desired expression belongs to.
According to a first aspect of the present invention, there is provided a method of translating a language using voice recognition, comprising: receiving a voice input comprising a first language; acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input; providing a user interface for selecting at least one of the acquired at least one recognition candidate; and outputting a second language corresponding to the selected at least one recognition candidate, wherein the type of the user interface is determined according to the number of the acquired at least one recognition candidate.
According to a second aspect of the present invention, there is provided a method of translating a language using voice recognition, comprising: receiving a voice input comprising a first language; outputting at least one category each including at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input; selecting one of the at least one category; outputting the at least one recognition candidate contained in the selected category; selecting any one of the outputted at least one recognition candidate; and outputting translation data corresponding to the selected recognition candidate and comprising a second language.
According to a third aspect of the present invention, there is provided a method of translating a language using voice recognition, comprising: receiving a voice input comprising a first language; acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input; providing a user interface for receiving one of expressions related to the voice input; receiving one of expressions related to the voice input through the user interface; outputting at least one recognition candidate related to the received expression; selecting any one of the at least one recognition candidate related to the received expression; and outputting translation data corresponding to the selected recognition candidate and comprising a second language.
According to a fourth aspect of the present invention, there is provided an apparatus of translating a language using voice recognition, comprising: a voice input unit for receiving a voice input comprising a first language; a voice recognition unit for acquiring at least one recognition candidate corresponding to the voice input by performing voice recognition on the voice input; a memory storing a plurality of user interfaces for selecting at least one of the acquired at least one recognition candidate and a database in which the first and second languages are matched with each other for each of a plurality of categories; and a controller for providing any one of the plurality of user interfaces according to the number of the acquired at least one recognition candidate, receiving a selection signal for at least one of the acquired at least one recognition candidate through the provided user interface, and outputting translation data comprising the second language corresponding to the selected at least one recognition candidate with reference to the database.
According to a fifth aspect of the present invention, there is provided a method of translating a language using voice recognition in an electronic device providing a voice recognition function and having a display, the method comprising: receiving a voice input comprising a first language and containing a plurality of words; acquiring a recognition candidate corresponding to the voice input by performing voice recognition on the voice input in units of words; displaying the recognition candidate on the display in such a manner as to distinguish a word having a confidence below a reference value among the plurality of words from the other words; changing the word having a confidence below a reference value into a new word; and replacing the word having a confidence below a reference value with a second language corresponding to the new word and displaying the second language corresponding to the recognition candidate on the display.
According to a sixth aspect of the present invention, there is provided a method of translating a language using voice recognition in an electronic device providing a voice recognition function and having a display, the method comprising: receiving a voice input comprising a first language and containing a plurality of words; and displaying translation data comprising a second language corresponding to the voice input by performing voice recognition on the voice input in units of words, wherein, in the displaying of translation data, a word comprising a second language corresponding to a word having a confidence below a reference value among the plurality of words is displayed distinguishably from the other words comprising the translation data.
According to a seventh aspect of the present invention, there is provided a method of translating a language using voice recognition in an electronic device providing a voice recognition function and having a display, the method comprising: (a) receiving a voice input comprising a first language; (b) acquiring translation data comprising a second language corresponding to the voice input by performing voice recognition on the voice input; (c) storing the translation data in a translation file upon receipt of a first preset command signal; and (d) repetitively performing steps (a) to (c) until a second preset command signal is received, wherein the translation data acquired each time steps (a) to (c) are repetitively performed is added and stored in the translation file.
According to the method and apparatus of translating a language using voice recognition of the present invention, the following effects can be obtained.
According to the present invention, it is possible to allow a user to be efficiently provided with a translation result of a language by providing different user interfaces in accordance with a result of voice recognition on a voice inputted from the user.
According to the present invention, it is possible for the user to be provided with a desired language translation result through a minimum search procedure, because a proper user interface is provided even if a large number of recognition candidates result from performing voice recognition.
According to the present invention, the user can easily and quickly find a language translation result even if he or she does not know the category his or her desired language translation result belongs to.
According to the present invention, in a case where the user cannot obtain a desired language translation result, recognition candidates and translation results acquired as a result of performing voice recognition can be corrected, thereby providing a user interface convenient for the user.
The implementation of this document will be described in detail with reference to the following drawings in which like numerals refer to like elements.
The above-mentioned objectives, features, and advantages will be more apparent from the following detailed description in association with the accompanying drawings. Hereinafter, embodiments of the present invention will be set forth in detail with reference to the accompanying drawings. It is to be noted that the same reference numerals are used to designate the same elements throughout the specification. In addition, detailed descriptions of known functions and configurations incorporated herein are omitted to avoid making the subject matter of the present invention unclear.
Voice recognition technology is an application of the pattern matching technique. That is, characteristic parameters of the words or phonemes to be recognized are stored in advance, and, when a voice is inputted, its characteristics are extracted by analysis, the similarities to the characteristics of the stored words or phonemes are measured, and the most similar word or phoneme is outputted as the recognition result. Because voice changes with time, its characteristics are stable only during a short frame. Therefore, the characteristics of the voice are analyzed frame by frame to generate characteristic vectors, and the voice is expressed as a sequence of these characteristic vectors.
Methods of voice recognition are roughly classified into two: 1) a method in which a voice is regarded as a kind of pattern and the similarity between a registered pattern and an input pattern is measured for recognition; and 2) a method in which a unique model is allocated to each target word or phoneme by modeling the voice production process, and it is detected from which voice model an input voice is most likely produced. In addition, there are methods using neural networks, combinations of various methods, and so on. Beyond such signal processing, a language model containing knowledge information associated with the language system can also be applied in the voice recognition method.
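The pattern-similarity approach described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each registered word is represented by a single averaged characteristic vector and that cosine similarity stands in for the similarity measure; a real recognizer compares frame-by-frame vector sequences (e.g., with dynamic time warping or hidden Markov models).

```python
import math

def cosine_similarity(a, b):
    """Similarity between two characteristic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def recognize(input_vector, templates):
    """Return the registered word whose template is most similar to the input."""
    return max(templates, key=lambda w: cosine_similarity(input_vector, templates[w]))

# Hypothetical templates for two registered words.
templates = {
    "reservation": [0.9, 0.1, 0.3],
    "restaurant":  [0.2, 0.8, 0.5],
}
print(recognize([0.85, 0.15, 0.25], templates))  # prints "reservation"
```

In practice the top few most-similar entries, rather than only the best match, would be kept as the recognition candidates discussed below.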
The voice input unit 101 receives a user's voice signal. For example, the voice input unit 101 may correspond to a microphone (MIC).
The memory 109 stores a predetermined program for controlling the overall operations of the language translating apparatus 10, and is able to temporarily or permanently store data input and output when the controller 119 carries out the general operations of the language translating apparatus 10, as well as various data processed by the controller 119.
The memory 109 may include a sound model, a recognition dictionary, and a translation database that are required for the operations of the present invention. Further, the memory 109 may include a language model.
The recognition dictionary may include at least one of words, phrases, keywords, and expressions comprising a specific language.
The translation database includes data which matches a plurality of languages with each other. For example, the translation database may include data which matches a first language (Korean) and a second language (English/Japanese/Chinese) with each other. The second language is a term used to distinguish it from the first language, and may be a plurality of languages. For example, the translation database may include data which matches a Korean sentence “ . . . ” with an English sentence “I'd like to make a reservation.”
The translation database can make the first language and the second language correspond to each other by categories. The categories may be by theme or by language. The categories by theme may include, for example, “general”, “travel”, “hotel”, “flight”, “shopping”, “government offices” and so forth. The categories by language may include, for example, “English”, “Japanese”, “Chinese”, “Spanish” and so forth.
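The category-wise correspondence between the first and second languages can be sketched as a nested mapping from theme category to matched sentence pairs. The Korean keys below are illustrative placeholders only (the specification elides the actual Korean sentences); the English translation shown is the one given in the example above.

```python
# Hypothetical translation database: theme category -> first-language
# expression -> matched second-language expression.
translation_db = {
    "flight": {
        "예약을 하고 싶습니다": "I'd like to make a reservation.",
    },
    "hotel": {
        "빈 방 있습니까": "Do you have a vacancy?",
    },
}

def translate(category, expression):
    """Look up the second-language sentence matched with a first-language one."""
    return translation_db.get(category, {}).get(expression)

print(translate("flight", "예약을 하고 싶습니다"))  # I'd like to make a reservation.
```

A categories-by-language variant would simply add one more level of keying (e.g., "English", "Japanese") above or below the theme level.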
The voice recognition unit 103 performs voice recognition on a voice signal inputted through the voice input unit 101, and acquires at least one recognition candidate corresponding to the recognized voice. For instance, the voice recognition unit 103 can recognize the inputted voice signal by detecting a voice section from the inputted voice signal, performing voice analysis, and then recognizing it in recognition units. Next, the voice recognition unit 103 can acquire the at least one recognition candidate corresponding to a result of the voice recognition with reference to the recognition dictionary and translation database stored in the memory 109.
The voice synthesis unit 105 converts text into speech by using a TTS (Text-To-Speech) engine. The TTS technology is a technology for converting text information or symbols into human speech and reading them aloud. In the TTS technology, a pronunciation database for all the phonemes of a language is built and the phonemes are connected to generate a continuous speech. At this moment, the amplitude, length, volume, etc. of the speech are adjusted to synthesize a natural speech. To this end, a natural language processing technology may be included. The TTS technology can easily be found in the field of electronic communications, such as CTIs, PCs, PDAs, and mobile phones, and in the field of home appliances, such as recorders, toys, and game devices; it is widely used to improve productivity in factories and in home automation systems for more convenient everyday living. Since the TTS technology is well known, a detailed description thereof will be omitted.
The communication unit 107 connects to a wired or wireless network that exists outside the language translating apparatus 10 to send or receive data. For example, the communication unit 107 may include at least one of a broadcasting reception module for receiving a broadcasting signal from a broadcasting station, a mobile communication module capable of connecting to a mobile communication network and sending and receiving data, and a portable internet module capable of connecting to a portable internet network, such as WiBRO or WiMAX, and sending and receiving data. The recognition dictionary and translation database stored in the memory 109 can be updated through the communication unit 107.
The display unit 111 displays various information by a control signal outputted from the controller 119.
The touch device 113 is an input device capable of recognizing an external touch. For example, the user can input various information or commands by touching a certain spot of the touch device 113 with a finger or a stylus pen. The touch device 113 includes, for example, a touch pad, a touch screen, and the like. A device in which the display unit 111 and the touch device 113 are integrated with each other is generally referred to as a touch screen. A touch input to be mentioned in the present invention includes both a physical touch and a proximity touch.
The key input unit 115 is an input device including at least one key. The user can input various information and commands through the key input unit 115.
The voice output unit 117 is a device for outputting voice through a speaker. For example, the voice output unit 117 receives a voice signal from the voice synthesis unit 105 and outputs it.
The controller 119 controls the above-described components, and controls the overall operations of the language translating apparatus 10 according to the embodiments of the present invention.
Hereinafter, referring to the required drawings, a concrete operation of the language translating apparatus 10 using voice recognition according to the embodiments of the present invention and a method of translating a language using voice recognition according to the embodiments of the present invention will be described in detail. Hereinafter, the display unit 111 and the touch device 113 will be described as a touch screen 111 and 113 for convenience.
<Provision of User Interface According to Result of Voice Recognition>
First, a voice comprising a first language is inputted through the voice input unit 101 [S100]. For example, the user says “ . . . (I'd like to make a reservation)” through a microphone provided in the language translating apparatus 10.
The controller 119 performs voice recognition on the voice input, and acquires at least one recognition candidate corresponding to the voice input [S110]. The controller 119 can acquire the at least one recognition candidate with reference to the recognition dictionary and translation database stored in the memory 109.
As stated above, the translation database can store the first language in categories by theme. In this case, the at least one recognition candidate acquired by performing the step S110 may belong to different categories, respectively.
The controller 119 provides a user interface for selecting at least one of the at least one recognition candidate acquired in step S110 [S120]. The type of the user interface may be determined according to the number of the acquired at least one recognition candidate.
The provision of a different user interface according to the number of the acquired recognition candidates in step S120 allows the user to easily select a desired recognition candidate among the acquired recognition candidates.
For example, if the number of the acquired recognition candidates is more than a reference value, the user interface may be one for selecting a category. If the number of the acquired recognition candidates is too large, providing all of them may cause great burden and inconvenience to the user. In this case, if the user is allowed to select a specific category and only the recognition candidates belonging to the selected category are provided, the user can obtain a desired translation result quickly and conveniently. The case where a user interface for selecting a category is provided will be described in detail in a second embodiment of the present invention to be described later.
Furthermore, for example, if the number of the acquired recognition candidates is more than a reference value, the user interface may be one for selecting one of the expressions contained in the voice input. As in the case of selecting a category, if the user is allowed to select one of the expressions and only the recognition candidates containing the selected expression are provided, the user can avoid the inconvenience of having to select a desired recognition candidate among too many recognition candidates. The case where a user interface for selecting one of expressions is provided will be described in detail in a third embodiment of the present invention to be described later.
Furthermore, for example, if the number of the acquired recognition candidates is less than a reference value, the user interface may be one that allows the user to directly select at least one recognition candidate among the acquired recognition candidates. In this case, because the number of the acquired recognition candidates is less than the reference value, even if all the acquired recognition candidates are provided, the user can select a desired recognition candidate without inconvenience.
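The branching among the three interface types described above can be sketched as follows. The reference value, the interface-type names, and the `has_categories` flag are illustrative assumptions, not terms defined in the specification.

```python
REFERENCE_VALUE = 5  # hypothetical threshold for "too many" candidates

def choose_interface(candidates, has_categories=True):
    """Pick a user-interface type from the number of acquired candidates."""
    if len(candidates) < REFERENCE_VALUE:
        return "direct-selection"        # few candidates: list them all
    if has_categories:
        return "category-selection"      # many candidates spread over categories
    return "expression-confirmation"     # ask the user to confirm an expression

print(choose_interface(["c1", "c2"]))           # direct-selection
print(choose_interface(["c"] * 10))             # category-selection
print(choose_interface(["c"] * 10, False))      # expression-confirmation
```

The essential point of step S120 is only that the interface type is a function of the candidate count; an implementation could equally use several thresholds or combine the category and expression interfaces.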
The user interface may be provided in combination with various devices. For example, the user interface may be an interface using a touch screen or an interface using voice recognition.
The controller 119 receives a selection signal for at least one of the at least one recognition candidate acquired in step S110 [S130], and a second language corresponding to the at least one recognition candidate selected in step S130 is outputted [S140].
In step S140, the controller 119 may output the second language with reference to the translation database stored in the memory 109. Once step S140 is performed, a recognition candidate for translation is determined. Thus, the second language corresponding to the determined recognition candidate can be determined.
Outputting of the second language can be performed in various manners. For example, the controller 119 may display the second language on the display unit 111. Further, for example, the controller 119 may synthesize the second language into voice and then output the synthesized voice through the voice output unit 117 by controlling the voice synthesis unit 105.
<User Interface for Selecting Category>
First, a voice comprising a first language is inputted through the voice input unit 101 [S200]. Step S200 is identical to step S100 of
The controller 119 performs voice recognition on the voice input by controlling the voice recognition unit 103 [S210], and outputs at least one category each containing at least one recognition candidate corresponding to the voice input [S220]. For example, in a case where an expression “ . . . (Are there)” is inputted as voice from the user, the controller 119 performs voice recognition and displays, on the touch screen 111 and 113, four categories containing recognition candidates corresponding to “ . . . (Are there)”. The four categories displayed in step S220 of
In step S220, the controller 119 may output the number of the at least one recognition candidate contained in each of the at least one category. For example, referring to step S220 of
The controller 119 receives a selection signal for one of the at least one category [S230], and outputs the at least one recognition candidate contained in the selected category [S240]. For example, as shown in
In step S240, if there exists no recognition candidate desired by the user among the outputted recognition candidates, the display can move directly to other categories without returning to the upper menu.
The controller 119 receives a selection signal for at least one recognition candidate among the recognition candidates outputted in step S240 [S250]. For example, as shown in
The controller 119 outputs translation data comprising a second language corresponding to at least one recognition candidate selected in step S250 [S260]. For instance, as shown in
In the above-described method of translating a language using voice recognition according to the second embodiment of the present invention, categories by language as well as the categories by theme can be provided to the user.
For example, the user interface for providing categories by language can be provided in step S220 or in step S250 in
In
Each of the categories by language may provide recognition candidates comprising a corresponding language. For example, in
<User Interface for Selecting One of Expressions>
First, a voice comprising a first language is inputted through the voice input unit 101 [S300]. Step S300 is identical to step S100 of
The controller 119 performs voice recognition on the voice input by controlling the voice recognition unit 103 and acquires at least one recognition candidate corresponding to the voice input [S310]. For example, in
The controller 119 provides a user interface for receiving one of expressions related to the voice input [S320], and receives one of expressions related to the voice input through the provided user interface [S330].
As mentioned in the first embodiment of the present invention, the user interface provided in step S320 is provided to allow the user to quickly and efficiently access his or her desired recognition candidate if there are too many recognition candidates corresponding to the expression spoken in voice by the user. For example, in
The user interface for receiving one of expressions may be implemented in various forms. As shown in
Here, the user may input by touching his or her desired expression among the expressions displayed on the touch screen 111 and 113.
The expressions may be ones contained in the voice input or keywords related to the voice input. The controller 119 may determine which expression the user is requested to confirm in a way that best reduces the number of recognition candidates. For example, if it is judged that the number of recognition candidates will significantly decrease upon receipt of a correct selection from the user about one of the expressions related to the inputted voice, the controller 119 may request the user to confirm one of the expressions related to the voice input as shown in
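The strategy of asking about the expression that most reduces the candidate set can be sketched as below. How keywords are extracted, and the candidate texts themselves, are illustrative assumptions; the specification only requires that the confirmed expression narrow the candidate list.

```python
def best_keyword_to_confirm(candidates, keywords):
    """Return the keyword whose confirmation would narrow the list the most."""
    def remaining_after(kw):
        # Candidates kept if the user confirms this keyword; if no candidate
        # contains it, confirming it would not narrow the list at all.
        matching = [c for c in candidates if kw in c]
        return len(matching) if matching else len(candidates)
    return min(keywords, key=remaining_after)

candidates = [
    "are there any seats",
    "are there any rooms",
    "is there a restaurant",
]
# "there" appears in all three candidates, so confirming it removes nothing;
# "seats" appears in only one, so it is the more informative question.
print(best_keyword_to_confirm(candidates, ["there", "seats"]))  # seats
```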
The controller 119 outputs at least one recognition candidate related to the expression inputted in step S330 among the at least one recognition candidate acquired in step S310 [S340]. For example, as shown in
The controller 119 receives a selection signal for at least one of the at least one recognition candidate related to the inputted expression [S350]. For example, if the user says “No. 1”, a recognition candidate “ . . . (Are there any seat?)” can be selected. In addition, for example, if the user touches a specific recognition candidate among the recognition candidates displayed on the touch screen 111 and 113, his or her desired recognition candidate can be selected.
The controller 119 outputs translation data comprising a second language corresponding to the at least one recognition candidate selected in step S350 [S360]. Step S360 corresponds to step S260 of
<User Interface for Correcting Recognition Candidate>
First, a voice comprising a first language is inputted through the voice input unit 101 [S400]. Step S400 is identical to step S100 of
The words mentioned in the present invention may be one word or a word group comprising two or more words. For example, the words mentioned in the present invention may be one noun, a word group comprising one noun and one postposition, or a word group comprising one noun and one preposition.
The controller 119 acquires recognition candidates corresponding to the voice input by performing voice recognition by controlling the voice recognition unit 103 [S410]. The voice recognition may be performed in units of words. The recognition dictionary and translation database stored in the memory 109 may have a structure required to perform voice recognition in units of words. For example, in matching a first language and a second language, the translation database may contain both matching information for matching between specific sentences and matching information for each word comprising the specific sentences.
The controller 119 displays the recognition candidates acquired in step S410 on the touch screen 111 and 113 in such a manner as to distinguish a word having a confidence below a reference value among the plurality of words comprising the voice inputted in step S400 from the other words [S420]. For example, in step S420 of
Since a confidence measure algorithm for measuring the confidence of a voice recognition result is a well-known technique, a detailed description thereof will be omitted. This technique related to confidence measurement is called utterance verification.
In step S420, the word having a confidence below a reference value is displayed in a different shape or color from the other words, so that the user can distinguish a word having a low confidence from the other words. For example, in step S420 of
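The display of step S420 can be sketched as follows, assuming each recognized word carries a confidence score from utterance verification. The bracket markers stand in for the different shape or color mentioned above, and the threshold and scores are hypothetical.

```python
REFERENCE_VALUE = 0.7  # hypothetical confidence threshold

def mark_low_confidence(words):
    """Render a recognition candidate, flagging words below the threshold."""
    return " ".join(
        f"[{word}]" if conf < REFERENCE_VALUE else word
        for word, conf in words
    )

# Hypothetical per-word confidences for the document's example sentence.
recognized = [("are", 0.95), ("there", 0.91), ("any", 0.88), ("seat", 0.42)]
print(mark_low_confidence(recognized))  # are there any [seat]
```

The same rendering applies in the fifth embodiment, where the low-confidence word's second-language counterpart is the part displayed distinguishably.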
The controller 119 changes the word having a confidence below a reference value into a new word [S430]. Step S430 may comprise steps S431 and S432 to be described later.
If the user selects the word having a confidence below a reference value [S431 of
The controller 119 may replace the word having a confidence below a reference value with a second language corresponding to the new word changed in step S430, and display the second language corresponding to the recognition candidate on the touch screen 111 and 113 [S440]. For example, the recognition candidate acquired in step S420 is “ . . . ”, and the corresponding second language is “Are there any seat on the flight”. However, since “ . . . (on the flight” is changed into “ . . . (on the flight)” in step S430, as shown in step S440 of
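Steps S430 and S440 amount to replacing one word in the recognized first-language sequence and substituting the corresponding second-language word in the displayed translation. The sketch below is an illustrative assumption about how such word-unit substitution could work; the function names are hypothetical.

```python
# Sketch of steps S430-S440: replace the low-confidence word the user
# selected with a new word, then substitute the matching second-language
# word into the displayed translation. Word-unit matching information in
# the translation database (see above) makes the per-word swap possible.

def replace_word(recognized_words, index, new_word):
    """Step S430: change the word at `index` into the user-selected new word."""
    corrected = list(recognized_words)
    corrected[index] = new_word
    return corrected

def update_translation(translation_words, index, new_second_language_word):
    """Step S440: rebuild the second-language output with the replacement."""
    updated = list(translation_words)
    updated[index] = new_second_language_word
    return " ".join(updated)
```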
The user may register the final translation result outputted in step S440 in the translation database. If there exists no sentence containing a specific word desired by the user in the translation database, the user can register a new sentence by using the sentences existing in the translation database.
<User Interface for Correcting Translation Results>
First, a voice comprising a first language is inputted through the voice input unit 101 [S500]. Step S500 is identical to step S100 of
The controller 119 performs voice recognition on the voice inputted in step S500 [S510], and displays translation data comprising a second language corresponding to the voice input on the touch screen 111 and 113 [S520]. Here, the controller 119 displays the second language corresponding to a word having a confidence below a reference value among the plurality of words in such a manner as to distinguish the word from the other words comprising the translation data. A method of displaying the second language corresponding to the word having a confidence below a reference value is identical to that described in the fourth embodiment. For example, referring to
For example, referring to
The user may select a word displayed distinguishably from the other words in step S520 [S530]. For example, in step S530 of
The controller 119 may provide a user interface for changing the selected word according to the performing of step S530 [S540]. For example, referring to step S540 of
In addition, for example, in step S540, the controller 119 may provide a Korean-to-English dictionary for changing “flight” into other English words (not shown). The user may search for a desired Korean word by using a search window provided in the Korean-to-English dictionary.
The controller 119 may output a new translation result, as shown in S560 of
Like in the fourth embodiment of the present invention, the user may register the new translation result outputted in step S560 in the translation database.
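The translation-correction flow of steps S530 through S560 above can be sketched as offering alternative second-language words for the selected low-confidence word and rebuilding the translation from the user's choice. The alternatives table below is a hypothetical stand-in for the dictionary lookup the embodiment describes.

```python
# Sketch of steps S530-S560: when the user selects a distinguishably
# displayed word in the translation, offer alternative second-language
# words and apply the chosen one. ALTERNATIVES is an illustrative
# assumption, not data from the patent.

ALTERNATIVES = {
    "flight": ["flight", "plane", "airplane"],
}

def alternatives_for(word):
    """Step S540: candidate replacements for the selected word."""
    return ALTERNATIVES.get(word, [word])

def apply_choice(translation, old_word, chosen_word):
    """Step S560: output a new translation result with the chosen word."""
    return translation.replace(old_word, chosen_word, 1)
```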
<Management and Cumulative Storage of Translation Result File>
First, a voice comprising a first language is inputted through the voice input unit 101 [S600]. Step S600 is identical to step S100 of
The controller 119 may acquire and output translation data comprising a second language corresponding to the input voice by performing voice recognition on the input voice [S610].
The controller 119 judges whether a first command signal about the storage of the acquired translation data is received or not [S620]. If the first command signal is received, the translation data acquired in step S610 is stored in a specific translation file [S630].
There may be various sources of the first command signal. For example, the controller 119 may receive the first command signal through an input device provided in the language translating apparatus 10. In addition, for example, the controller 119 may receive the first command signal by an internal setup algorithm preset and stored in the memory 109. If the first command signal is transmitted to the controller 119 by the internal setup algorithm, the user does not need to give a direct command to the language translating apparatus 10.
The controller 119 judges whether a second command signal about the completion of the storage of the acquired translation data is received or not [S640], and if the second command signal is received, completes the storage of the translation data.
As a result of the judgment of step S640, if the second command signal is not received, the controller 119 returns to step S600 in order to continuously receive voice from the outside, and repetitively performs steps S600 to S640.
In the sixth embodiment of the present invention, the translation data acquired according to the repetitive performing of steps S600 to S640 may be added and stored in the translation file. Additional storage in the translation file means that, for example, a translation result corresponding to a voice spoken by the user is cumulatively stored in the same translation file.
For example, if the user pronounces “Korean(1)” [S600], “English(1)”, which is translation data corresponding to “Korean(1)”, is acquired [S610] and stored in the translation file [S630] according to a user command [S620]. Then, suppose the user does not complete the storage of the translation data [S640]. If “Korean(2)” is pronounced [S600], “English(2)”, which is translation data corresponding to “Korean(2)”, is acquired [S610] and additionally stored [S630] in the translation file according to a user command [S620].
At this moment, the translation file stores both “English(1)” and “English(2)” therein. The translation file may match and store “Korean(1)” and “English(1)” and match and store “Korean(2)” and “English(2)”.
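The cumulative storage of steps S600 through S640 can be sketched as repeatedly appending matched language pairs to the same translation file until the second command signal arrives. In this minimal sketch the "file" is an in-memory list; a real implementation would persist it to storage, and the class and method names are illustrative assumptions.

```python
# Sketch of steps S600-S640: translation results are cumulatively stored
# in a single translation file, each first-language input matched with
# its second-language translation, until storage is completed.

class TranslationFile:
    def __init__(self):
        self.entries = []  # matched (first-language, second-language) pairs

    def store(self, first_language, second_language):
        """Step S630: add a translation result to the same file (cumulative)."""
        self.entries.append((first_language, second_language))

f = TranslationFile()
f.store("Korean(1)", "English(1)")  # first utterance stored [S630]
f.store("Korean(2)", "English(2)")  # storage not yet completed, so appended
```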
By the cumulative storage of translation results according to the embodiment of the present invention, translation results of a plurality of sentences can be managed in a single file, and the single file can be outputted in voice, thereby delivering a lot of meanings to the user.
In the above-described various embodiments, if a specific search result is outputted, the present invention can provide a re-search function in the outputted specific search result. By using the re-search function, the user can efficiently access a desired search result by performing search on the outputted specific search result.
While the various embodiments described above have been focused on a case in which all of the operations are performed in the same language translating apparatus, the present invention is not limited to such embodiments. The above-described embodiments can be implemented in a plurality of apparatuses related to language translation. As a typical example, a mobile terminal and a server can be assumed.
<Implementation in System Comprising Mobile Terminal and Server>
As shown in
The mobile terminal as shown in
The server 20 may store data related to language translation stored in the memory 109 of the language translating apparatus 10 in the foregoing embodiments. For example, the server 20 may store an acoustic model, a recognition dictionary, and a translation database. In addition, the server 20 may store a language model. Details of the acoustic model, recognition dictionary, translation database, and language model are identical to those described above. For example, the translation database may contain a first language and a second language that are matched with each other by category.
Further, the server 20 may comprise the voice recognition unit 103 provided in the language translating apparatus 10 in the foregoing embodiments.
The network as shown in
The mobile terminal 10 receives a voice comprising a first language through a voice input unit 101 [S700]. Step S700 is identical to step S200 of
The mobile terminal 10 transmits the voice input to the server 20 through a communication unit 107 [S702].
The server 20 receives the voice input transmitted from the mobile terminal 10, performs voice recognition on the received voice [S704], and acquires at least one category, each comprising at least one recognition candidate corresponding to the received voice [S706]. Steps S704 and S706 may correspond to steps S210 and S220, respectively, of
The server 20 transmits information about the at least one category acquired in step S706 to the mobile terminal 10 [S708]. At this moment, the server 20 may transmit the recognition candidates included in each of the acquired at least one category along with category information, or may transmit only the category information.
The mobile terminal 10 receives information about the at least one category through the communication unit 107 from the server 20, and outputs information about the received at least one category [S710]. Step S710 may correspond to step S220 of
The mobile terminal 10 receives a selection signal about one of the outputted at least one category [S712]. Step S712 may correspond to step S230 of
The mobile terminal 10 transmits information about the selected category to the server 20 through the communication unit 107 [S714].
The server 20 receives information about the selected category from the mobile terminal 10, acquires recognition candidates contained in the selected category [S716], and transmits the acquired recognition candidates to the mobile terminal 10 [S718]. Here, the server 20 may transmit the acquired recognition candidates only in the first language, or may transmit the acquired recognition candidates in the first language and the second language corresponding to the acquired recognition candidates.
The mobile terminal 10 receives and outputs the acquired recognition candidates through the communication unit 107 from the server 20, and receives a selection signal for at least one of the outputted recognition candidates [S720].
Step S714, step S718, and step S720 may correspond to step S240 and step S250 of
The mobile terminal 10 outputs translation data comprising a second language corresponding to the selected at least one recognition candidate [S722]. Step S722 may correspond to step S260 of
Here, in the case that, in step S718, the server 20 has transmitted the recognition candidates comprising only the first language to the mobile terminal 10, the mobile terminal 10 has to request the server 20 for translation data comprising the second language corresponding to the selected at least one recognition candidate. In the case that, in step S718, the server 20 has transmitted the recognition candidates comprising the first language and the second language corresponding to the recognition candidates, the mobile terminal 10 does not have to make a request for a data transmission to the server 20.
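The message flow of steps S700 through S722 between the mobile terminal and the server can be sketched as follows. The classes, method names, and data are illustrative assumptions; a real system would exchange these messages over the network through the communication unit 107 rather than by direct method calls.

```python
# Sketch of the seventh embodiment's flow: the terminal sends the voice
# input, the server returns category information, the terminal sends the
# selected category, the server returns its candidates, and the terminal
# outputs the translation for the selected candidate.

class Server:
    def __init__(self, categories):
        # category name -> list of (first-language, second-language) candidates
        self.categories = categories

    def recognize(self, voice_input):
        """S704/S706: voice recognition yielding category information."""
        return list(self.categories.keys())

    def candidates_for(self, category):
        """S716/S718: recognition candidates contained in the chosen category."""
        return self.categories[category]

class MobileTerminal:
    def __init__(self, server):
        self.server = server

    def translate(self, voice_input, pick_category, pick_candidate):
        cats = self.server.recognize(voice_input)          # S702-S710
        category = cats[pick_category]                     # S712-S714
        candidates = self.server.candidates_for(category)  # S716-S720
        first, second = candidates[pick_candidate]
        return second                                      # S722
```

Here the server returns candidates in both languages, so, as noted above, the terminal needs no second request for the translation data.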
The previous seventh embodiment of the present invention has been described with respect to a case where the server 20 comprises both a voice recognition module and translation-related data. However, the server 20 may comprise translation-related data only, and the voice recognition process may be performed in the mobile terminal 10 as in the foregoing embodiments. In this case, step S704 may be performed in the mobile terminal 10. To this end, the mobile terminal 10 may comprise the voice recognition unit 103. Further, step S702 may be replaced by a step in which the mobile terminal 10 performs voice recognition and then transmits a result of the voice recognition to the server 20.
The above-described method of translating a language using voice recognition according to the present invention can be recorded in a computer-readable recording medium by being prepared as a program for execution in a computer.
The method of translating a language using voice recognition according to the present invention can be realized as software. When the present invention is realized as software, components of the present invention are embodied as code segments for executing required operations. A program or the code segments can be stored in a processor readable recording medium and transmitted as computer data signals combined with a carrier using a transmission medium or a communication network.
The computer readable recording medium includes any type of recording device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium also include read-only memory (ROM), random-access memory (RAM), CD-ROMs, DVD±ROMs, magnetic tapes, floppy disks, hard disks, optical data storage devices, and so forth. The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
One advantage of the present invention is to provide a method and apparatus of translating a language using voice recognition which can efficiently provide a translation result of a language a user speaks.
Another advantage of the present invention is to provide a method and apparatus of translating a language which allows a user to be easily and quickly provided with a desired language translation result by providing different user interfaces in accordance with a result of voice recognition performed on a voice inputted by the user.
Still another advantage of the present invention is to provide a method and apparatus of translating a language which allows a user to be easily provided with a desired language translation result by providing a user interface capable of correcting a translation result obtained by performing voice recognition on the user's voice.
Those skilled in the art will appreciate that various substitutions, modifications and changes are possible, without departing from the technical spirit of the present invention. Thus, the present invention is not limited to the details of the aforementioned embodiments and the accompanying drawings.
Assignment: On Mar 26, 2009, KIM, YU JIN and SHIN, WON HO assigned their interest to LG Electronics Inc (Reel 022492, Frame 0079). Filed Mar 30, 2009 by LG Electronics Inc.