According to one embodiment, a television apparatus includes a speech input unit, an indication input unit, a speech recognition unit, and a control unit. The speech input unit is configured to input a speech. The indication input unit is configured to input an indication to start speech recognition from a user. The speech recognition unit is configured to recognize the user's speech inputted after the indication is inputted. The control unit is configured to execute an operation command corresponding to a recognition result of the user's speech. The control unit, if a volume of the television apparatus at a timing when the indication is inputted is larger than or equal to a threshold, temporarily sets the volume to a value smaller than the threshold while the speech recognition unit is recognizing.
6. A method for controlling a television apparatus, comprising:
replaying a broadcast program being broadcasted or a recorded broadcast program;
inputting a speech;
inputting an indication to start a speech recognition from a user;
recognizing the user's speech inputted after the indication is inputted;
executing an operation command corresponding to a recognition result of the user's speech;
deciding whether a content being replayed is the broadcast program at a timing when the indication is inputted;
if the content is the broadcast program, and if a volume of the television apparatus at the timing is larger than or equal to a threshold, temporarily setting the volume to a value smaller than the threshold while the user's speech is being recognized;
starting to record the broadcast program, when the volume is set to the value smaller than the threshold while the broadcast program is being replayed;
replaying the recorded broadcast program by chasing playback from the timing and by resetting the volume to a value prior to the speech recognition indication during the playback, after the speech recognition is completed and if the operation command is not channel-change of broadcasting wave; and
discarding the recorded broadcast program if the operation command is channel-change of broadcasting wave.
1. A television apparatus comprising:
a speech input unit configured to input a speech;
an indication input unit configured to input an indication to start a speech recognition from a user;
a speech recognition unit configured to recognize the user's speech inputted after the indication is inputted;
a control unit configured to execute an operation command corresponding to a recognition result of the user's speech;
a recording unit configured to acquire a recorded content by recording a broadcast program being broadcasted; and
a replay unit configured to replay the broadcast program or the recorded content;
wherein the replay unit decides whether a content being replayed is the broadcast program at a timing when the indication is inputted,
wherein the control unit, if the content is the broadcast program, and if a volume of the television apparatus at the timing is larger than or equal to a threshold, temporarily sets the volume to a value smaller than the threshold while the speech recognition unit is recognizing,
wherein the recording unit starts to record the broadcast program, when the volume is set to the value smaller than the threshold while the broadcast program is being replayed, and
wherein the replay unit replays the recorded content by chasing playback from the timing and by resetting the volume to a value prior to the speech recognition indication during the playback, after the speech recognition is completed and if the operation command is not channel-change of broadcasting wave, and discards the recorded content if the operation command is channel-change of broadcasting wave.
2. The television apparatus according to
wherein the replay unit replays a stored content recorded in a recording medium; and
when the indication is inputted during replaying the stored content, temporarily stops replay of the stored content while the speech recognition unit is recognizing.
3. The television apparatus according to
the control unit resets the volume to a value prior to the timing, if the recognition result is a specific one.
4. The television apparatus according to
an utterance detection unit configured to detect the user's utterance;
wherein the control unit displays a rest time to pass a predetermined time from the timing, and resets the volume to a value prior to the timing, if the user's utterance is not detected in the predetermined time from the timing.
5. The television apparatus according to
an echo canceller configured to cancel an output sound of the television apparatus from an input sound of the speech input unit; and
a calculation resource-monitor unit configured to monitor a calculation resource of a main processor of the television apparatus;
wherein the control unit switches control processing of the volume and echo cancel processing by the echo canceller, based on the calculation resource at the timing.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2011-044430, filed on Mar. 1, 2011; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a television apparatus and a remote operation apparatus each operable by a speech.
In a conventional technique, a user's utterance is recognized and used to operate a device. If the device (the operation target) outputs a sound (a broadcasted speech, an artificial speech, and so on), this sound acts as noise when the user's speech is recognized. A technique has been proposed that improves the accuracy of speech recognition by using an echo canceller to cancel the sound (outputted by the device) from an input signal in which that sound is mixed with a speech uttered by a speaker (user). However, in this case, computing processing for the echo canceller is necessary. Accordingly, this technique is difficult to realize on a device having restricted throughput.
On the other hand, a device that mutes the sound while recognizing a user's speech is also utilized. With this device, no sound is output while the user's speech is being recognized. Accordingly, the user's speech is recognized without influence of the sound. However, if the device (the operation target) is a television set, the user (viewer) cannot listen to the sound (speech) broadcasted from the television set while the speech is being recognized.
According to one embodiment, a television apparatus includes a speech input unit, an indication input unit, a speech recognition unit, and a control unit. The speech input unit is configured to input a speech. The indication input unit is configured to input an indication to start speech recognition from a user. The speech recognition unit is configured to recognize the user's speech inputted after the indication is inputted. The control unit is configured to execute an operation command corresponding to a recognition result of the user's speech. The control unit, if a volume of the television apparatus at a timing when the indication is inputted is larger than or equal to a threshold, temporarily sets the volume to a value smaller than the threshold while the speech recognition unit is recognizing.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
In the first embodiment, the speech recognition apparatus 100 includes a microphone 101, a speech input unit 102, a speech recognition start-detection unit 103, an utterance detection unit 104, a speech recognition completion-detection unit 105, an echo canceller 106, a speech recognition unit 107, and a signal sending unit 108. The speech input unit 102 inputs a speech from the microphone 101. The speech recognition start-detection unit 103 detects a predetermined sign to start speech recognition from a user. The utterance detection unit 104 detects existence (or non-existence) of the user's utterance. The speech recognition completion-detection unit 105 detects completion of speech recognition by detecting non-existence of the user's utterance. The speech recognition unit 107 recognizes a speech inputted from the speech input unit 102. The signal sending unit 108 sends a predetermined signal based on a speech recognition result. After the user's sign to start speech recognition is inputted, the utterance detection unit 104 detects existence (or non-existence) of the user's utterance. The echo canceller 106 cancels a speech (sound) that is outputted from a speaker 115 of the television set 110 and inputted to the speech input unit 102 via the microphone 101.
The television set 110 includes a television control unit 111, a calculation resource-monitor unit 112, a video replay unit 113, a recording unit 114, a speaker 115, and a display unit 116. The television control unit 111 controls television-volume and executes various operations of the television based on signals sent from the signal sending unit 108. The calculation resource-monitor unit 112 monitors a calculation resource of a main processor of the television set 110. The recording unit 114 records a program being broadcasted. The speaker 115 outputs a sound of the program being viewed. The display unit 116 displays a video of the program being viewed. The video replay unit 113 replays a program content being broadcasted, a program content recorded, or a video content recorded in a recording medium. As the recording medium, for example, DVD (Digital Versatile Disc) or BD (Blu-ray Disc) is used.
First, the speech recognition start-detection unit 103 waits for an input of a speech recognition start-indication from a user (S1). As the speech recognition start-indication, a predetermined sound is used. For example, the sound of clapping hands two times in succession is used as the indication. In this case, from sounds (speeches) inputted to the microphone 101, a clap sound (two successive hand claps) is detected.
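The double-clap indication can be illustrated with a small sketch. It assumes that clap onsets have already been detected and are available as timestamps in seconds; the function name `is_double_clap` and the gap limits `min_gap` and `max_gap` are hypothetical and do not appear in the embodiment.

```python
def is_double_clap(onsets, min_gap=0.1, max_gap=0.6):
    """Return True if two consecutive clap onsets form a double clap.

    onsets: clap onset times in seconds, in ascending order (assumed to
    come from some earlier onset detector, which is not shown here).
    min_gap / max_gap: hypothetical limits on the spacing of the two claps.
    """
    return any(min_gap <= later - earlier <= max_gap
               for earlier, later in zip(onsets, onsets[1:]))
```

For example, `is_double_clap([1.0, 1.3])` treats the two claps as one indication, while claps two seconds apart are ignored.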
As another example, a specific word uttered by a user may be used. In this case, a sign-recognition dictionary for recognizing a word used for a sign and a command-recognition dictionary for recognizing a word used for a television operation command are prepared. Regularly, the speech recognition unit 107 performs speech recognition by using the sign-recognition dictionary. When the word of the sign is recognized, the speech recognition unit 107 switches the sign-recognition dictionary to the command-recognition dictionary.
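The dictionary switching described above can be sketched as a two-state machine. The word lists, the class name `DictionarySwitcher`, and the choice to return to the sign-recognition dictionary after one command are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical vocabularies; the embodiment does not list the actual words.
SIGN_WORDS = {"television"}
COMMAND_WORDS = {"channel up", "volume down", "cancel"}


@dataclass
class DictionarySwitcher:
    """Tracks which recognition dictionary is currently active."""
    active: str = "sign"  # either "sign" or "command"

    def on_word(self, word):
        """Return a recognized command word, or None if there is none."""
        if self.active == "sign":
            if word in SIGN_WORDS:
                # The sign word was heard: switch to the command dictionary.
                self.active = "command"
            return None
        if word in COMMAND_WORDS:
            # Assumption: go back to the sign dictionary after one command.
            self.active = "sign"
            return word
        return None
```

A command word spoken before the sign word is ignored, because only the sign-recognition dictionary is active at that point.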
As another example, by providing a speech recognition-start button on a remote controller (in
When the speech recognition start-detection unit 103 detects a sign of speech recognition-start (S2), the signal sending unit 108 sends a speech recognition-start signal to the television control unit 111 of the television set 110 (S3). In this case, in order to notify the user that speech recognition has started, this may be indicated by lighting an LED (Light Emitting Diode) or by an OSD (On-Screen Display).
The television set 110 waits for a signal from the signal sending unit 108 of the speech recognition apparatus 100 (S101). When any signal is received from the signal sending unit 108, the television control unit 111 decides whether this signal is a speech recognition start-command (S102). If this signal is the speech recognition start-command, the video replay unit 113 of the television set 110 decides whether a video (being displayed) is a broadcast content or a stored content (S103). The broadcast content is a video broadcasted by digital terrestrial television broadcast, BS digital broadcast, CS digital broadcast, or CATV. The stored content is a program recorded by the recording unit 114 or a video recorded in a medium (DVD, BD).
If the video (being viewed) is a broadcast wave, the calculation resource-monitor unit 112 measures a calculation load of the CPU in the control unit 130 of the television set 110 (S104), and decides whether the calculation load is larger than a predetermined threshold (S105). In this case, this decision may be based on a ratio of the calculation load to all resources of the CPU. Alternatively, by defining a calculation quantity for each processing to be executed by the television set 110, the decision may be based on the sum of the calculation quantities of the processing being presently executed.
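The two ways of measuring the calculation load mentioned above can be sketched as follows. The function names, the task names, and the cost values in the usage note are hypothetical; the embodiment only states that the decision may use either a ratio or a sum of predefined calculation quantities.

```python
def load_ratio(busy, capacity):
    """Calculation load as a fraction of all CPU resources."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return busy / capacity


def summed_load(running_tasks, cost_table):
    """Sum of the predefined calculation quantities of running tasks."""
    return sum(cost_table[task] for task in running_tasks)


def load_exceeds(load, threshold):
    """S105: decide whether the calculation load is larger than the threshold."""
    return load > threshold
```

With an assumed cost table such as `{"decode": 40, "osd": 5, "record": 20}`, a set that is decoding and drawing an OSD has a summed load of 45, which is then compared against the threshold in the same units.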
By previously examining a calculation quantity required for echo cancel processing, the threshold is determined based on whether the CPU has a performance to execute the echo cancel processing. Accordingly, if a calculation load of the CPU is smaller than the threshold, the CPU has a performance to execute the echo cancel processing. When the calculation load is smaller than the threshold, the echo cancel processing is executed (S106), and the speech recognition unit 107 starts to input a speech signal as a target of speech recognition (S4). In this case, the television control unit 111 does not change the television-volume.
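The choice between echo cancel processing and volume control can be sketched as below. The rule that the threshold equals the CPU capacity minus the echo cancel cost is an assumption; the embodiment only says the threshold is determined from the previously examined cost. The case where the load equals the threshold is not specified in the text and is treated here as volume control.

```python
def echo_cancel_threshold(cpu_capacity, echo_cancel_cost):
    """Largest existing load at which echo cancelling still fits (assumed rule)."""
    return cpu_capacity - echo_cancel_cost


def choose_strategy(current_load, threshold):
    """Pick the processing used while the user's speech is recognized.

    A load below the threshold leaves room for echo cancel processing (S106),
    and the volume is left unchanged. Otherwise the television set falls back
    to volume control (S107 onward).
    """
    return "echo_cancel" if current_load < threshold else "volume_control"
```

For instance, with a capacity of 100 and an echo cancel cost of 30 (both illustrative units), a load of 50 selects echo cancel processing and a load of 80 selects volume control.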
When the calculation load is larger than the threshold, the television control unit 111 reads the present value of television-volume (S107). Depending on whether this value is larger than a predetermined value, the operation to change the television-volume differs.
In this way, after executing volume-change based on the television-volume, the speech recognition unit 107 executes input of a speech to be recognized (S4).
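A minimal sketch of the volume-change rule follows, based on the wording of the claims: if the volume at the timing of the indication is larger than or equal to a threshold, it is temporarily set to a value smaller than the threshold. The function name and the parameter `lowered` are hypothetical, and the specific values used in the embodiment's setting examples are not reproduced here.

```python
def volume_during_recognition(current, threshold, lowered):
    """Return the television-volume to use while speech is recognized.

    current: volume at the timing when the indication is inputted.
    threshold: volume at or above which the sound disturbs recognition.
    lowered: temporary volume; it must be smaller than the threshold.
    """
    if lowered >= threshold:
        raise ValueError("lowered must be smaller than threshold")
    # A volume already below the threshold is left unchanged.
    return lowered if current >= threshold else current
```

The caller is expected to remember `current` so that the volume can be reset to the value prior to the speech recognition start.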
On the other hand, if the stored content is viewed, the video replay unit 113 temporarily stops a video being replayed (S109), and the speech recognition unit 107 executes input of a speech to be recognized (S4). The stored content is, for example, a program recorded by the recording unit 114, or a video recorded in a medium such as DVD or BD.
The utterance detection unit 104 of the speech recognition apparatus 100 detects whether a user starts to utter. In case the user erroneously utters a sign to start speech recognition, or the speech recognition start-detection unit 103 erroneously detects the sign, a time-out to automatically return to the original status should be set. Furthermore, as shown in a display 1101 of
The speech recognition completion-detection unit 105 decides whether the speech recognition is completed (S5). For example, “a silent period continues over a predetermined time” is one condition of speech recognition completion. The speech recognition unit 107 executes speech recognition, and obtains a recognition result of the speech recognition (S6). Based on the recognition result, the signal sending unit 108 sends an operation command of the television set 110 to the television control unit 111 (S7).
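The silent-period condition can be sketched over a sequence of per-frame utterance flags, such as the utterance detection unit 104 might produce. The frame representation, the function name, and the requirement that some speech is heard before silence is counted are assumptions for illustration.

```python
def completion_frame(frame_has_speech, silence_frames):
    """Return the frame index at which completion is detected, or None.

    frame_has_speech: iterable of booleans, one per audio frame.
    silence_frames: number of consecutive silent frames that counts as
    "a silent period continuing over a predetermined time".
    """
    heard_speech = False
    silent_run = 0
    for index, has_speech in enumerate(frame_has_speech):
        if has_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Count silence only after the user has started to utter.
            silent_run += 1
            if silent_run >= silence_frames:
                return index
    return None
```

Silence before the first utterance is not counted in this sketch; that period would instead be covered by a separate time-out.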
In this case, the operation command corresponding to a specific speech command (the recognition result) such as “channel-change”, “volume-change”, “input-switch” and “screen mode-switch”, is sent. Examples of correspondence between the operation command and the speech command are shown in a table 1300 of
When the television set 110 receives an operation command except for the speech recognition start-command (No at S102), the television control unit 111 decides whether the operation command is a cancel command (S110). If the operation command is the cancel command (Yes at S110), the television control unit 111 resets the television-volume to a value prior to the speech recognition start without executing television-operation (S112). If the operation command is not the cancel command (No at S110), the television control unit 111 executes television-operation corresponding to the operation command received (S111), and resets the television-volume to a value prior to the speech recognition start (S112).
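The handling of a received operation command (S110 to S112) can be sketched as follows. The class `TelevisionControl`, its fields, and the command strings are hypothetical; executed operations are only logged here rather than applied to a real television.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TelevisionControl:
    volume: int
    saved_volume: Optional[int] = None
    executed: list = field(default_factory=list)

    def on_recognition_start(self, lowered):
        """Remember the present volume and lower it during recognition."""
        self.saved_volume = self.volume
        self.volume = min(self.volume, lowered)

    def on_command(self, command):
        """Handle an operation command other than the start-command."""
        if command != "cancel":             # S110: is it the cancel command?
            self.executed.append(command)   # S111: execute the operation
        if self.saved_volume is not None:   # S112: reset the volume
            self.volume = self.saved_volume
            self.saved_volume = None
```

In both branches the volume returns to its value prior to the speech recognition start; the cancel command simply skips the television operation.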
As mentioned above, in the television apparatus of the first embodiment, the television-volume during speech recognition processing is temporarily controlled based on the television-volume prior to the speech recognition start. As a result, the speech recognition is accurately executed with little calculation load, while disturbance of viewing by the speech operation is avoided.
Furthermore, when the stored content is replayed, this replay is temporarily stopped during the speech recognition. As a result, during operation by the user's speech, viewing of the stored content under incomplete condition is avoided.
The television apparatus of the second embodiment is explained by referring to Figs. As to the same processing/component as the first embodiment, the same sign is assigned and explanation thereof is omitted, and parts different from the first embodiment are only explained.
After receiving the speech recognition start-command, the television set 110 changes processing operation based on the present viewing media (S103). If the present viewing media is broadcast, the television control unit 111 freezes the screen and mutes the sound (S201). Afterwards, the recording unit 114 immediately begins to record the program (S202).
After the speech recognition is completed, the television control unit 111 receives an operation command based on a speech recognition result, and executes television operation corresponding to the operation command (S111). The television control unit 111 decides whether following two conditions are satisfied (S203).
(1) The viewing media before starting the speech recognition is broadcast.
(2) The television operation executed by the television control unit 111 is not channel-change of broadcasting wave.
If the two conditions (1) and (2) are satisfied, the television control unit 111 starts a chasing playback from the screen at the static timing (S203). Typically, this is the case where an operation other than channel-change (for example, volume-change) is executed.
On the other hand, if at least one of the two conditions (1) and (2) is not satisfied, the television control unit 111 resets the volume to a value prior to the speech recognition start without the chasing playback (S112). When the recording is executed (S202), and after that, if the viewing-channel is changed, the recording may be stopped. If the recording is stopped, recorded data may be erased.
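The decision made after the speech recognition in the second embodiment can be sketched as a small function. The names and the returned action strings are hypothetical. Discarding the recording on a channel-change follows the optional behavior described above ("may be stopped", "may be erased").

```python
CHANNEL_CHANGE = "channel-change"


def after_recognition(was_broadcast, command):
    """Return (action, discard_recording) once recognition has finished.

    was_broadcast: condition (1), the viewing media before the speech
    recognition started was broadcast.
    command: the television operation that was executed; condition (2)
    holds when it is not a channel-change of broadcasting wave.
    """
    chase = was_broadcast and command != CHANNEL_CHANGE
    discard = was_broadcast and command == CHANNEL_CHANGE
    action = "chasing_playback" if chase else "reset_volume"
    return action, discard
```

For example, a volume-change during a broadcast yields chasing playback of the recorded portion, whereas a channel-change yields no chasing playback and marks the recording for erasure.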
In the television set of the second embodiment, the speech recognition is executed under a condition that the sound is muted. As a result, the speech recognition can be accurately executed with little calculation cost. Furthermore, a broadcast content during the speech recognition is recorded, and, after the speech recognition, the broadcast content is replayed by chasing playback. As a result, even if a user operates the television by his/her speech, the user's viewing is not disturbed.
The television apparatus of the third embodiment is explained by referring to Figs. As to the same processing/component as the first and second embodiments, the same sign is assigned and explanation thereof is omitted, and parts different from the first and second embodiments are only explained.
As shown in
In the third embodiment, in order to estimate an ambient sound at a position where the speech recognition apparatus 100 is located, the speech recognition apparatus 100 includes a television-volume estimation unit 120. The television-volume estimation unit 120 estimates a television-volume from an averaged volume of the ambient sound inputted for a past predetermined period by the speech input unit 102.
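The estimation from an averaged ambient volume can be sketched with a fixed-length window of recent level measurements. The class name, the use of a plain arithmetic mean, and the unit of the levels are assumptions; the embodiment does not specify how the average is computed.

```python
from collections import deque


class TelevisionVolumeEstimator:
    """Averages the ambient sound level over a past predetermined period."""

    def __init__(self, window):
        # Keep only the most recent `window` measurements.
        self._levels = deque(maxlen=window)

    def add(self, level):
        """Store one ambient level measured by the speech input unit."""
        self._levels.append(level)

    def estimate(self):
        """Return the averaged level, or 0.0 if nothing was measured yet."""
        if not self._levels:
            return 0.0
        return sum(self._levels) / len(self._levels)
```

Because the window has a fixed length, older measurements drop out automatically, so the estimate follows the volume at the position of the speech recognition apparatus 100 just before the start-indication.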
The signal sending unit 108 changes a volume-level of the television set 110 during the speech recognition, based on the television-volume estimated by the television-volume estimation unit 120. Briefly, based on the volume-level estimated, the signal sending unit 108 calculates a volume-level during the speech recognition. As a correspondence relationship between the volume level estimated and the volume level during the speech recognition, for example, setting examples shown in
The signal sending unit 108 sends, to the television set 110, an operation command to set the calculated volume level. The signal sending unit 108 may repeatedly send an operation command to lower the volume-level, or may send an operation command (direct code) to directly indicate a value of the volume level. Furthermore, the signal sending unit 108 may send a special operation command to set the volume level to a half value (½ mute). Only if the volume level used during the speech recognition is lower than a specific level, another operation command may be sent.
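Two of the options above, repeated volume-down commands and a direct code, can be sketched as follows. The command tuples, the function name, and the assumption that one volume-down command lowers the level by exactly one step are hypothetical.

```python
def volume_commands(current, target, direct_code_supported):
    """Build the list of commands that brings the volume to the target level.

    With a direct code, a single command carries the target value.
    Otherwise one "volume_down" command is sent per step (assumed to lower
    the level by one), and nothing is sent if the volume is already low.
    """
    if direct_code_supported:
        return [("set_volume", target)]
    steps = max(0, current - target)
    return [("volume_down", None)] * steps
```

For example, lowering from level 12 to level 8 without a direct code produces four volume-down commands.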
When the speech recognition start-detection unit 103 detects the speech recognition start, the television-volume estimation unit 120 estimates a television-volume from an averaged volume of the ambient sound inputted for a past predetermined period by the speech input unit 102 (S10). Based on the television-volume, the signal sending unit 108 sends an operation command to change the television volume during the speech recognition (S11). After that, the speech recognition unit 107 recognizes a speech, and acquires a recognition result of the speech (S4, S5, S6). The signal sending unit 108 sends an operation command based on the recognition result (S7). After that, the signal sending unit 108 sends an operation command (such as a mute release command) to reset the volume to a value prior to the speech recognition (S12).
As mentioned above, in the third embodiment, the television-volume during the speech recognition is controlled based on the television-volume estimated by the television-volume estimation unit 120 of the speech recognition apparatus 100. As a result, the television-volume can be controlled within a range necessary for the speech recognition.
The television apparatus of the fourth embodiment is explained by referring to Figs. As to the same processing/component as the first, second and third embodiments, the same sign is assigned and explanation thereof is omitted. Parts different from the first, second and third embodiments are only explained.
As mentioned above, in the television apparatus of the fourth embodiment, the television-volume during speech recognition processing is temporarily controlled based on the television-volume prior to the speech recognition start. As a result, the speech recognition is accurately executed with little calculation load, while disturbance of viewing by the speech operation is avoided.
While certain embodiments have been described, these embodiments have been presented by way of examples only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Kawamura, Akinori, Suzuki, Kaoru, Sakai, Masaru, Ouchi, Kazushige, Kida, Yusuke
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Sep 19 2011 | Kabushiki Kaisha Toshiba | (assignment on the face of the patent) | / | |||
Oct 03 2011 | OUCHI, KAZUSHIGE | Kabushiki Kaisha Toshiba | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027103 | /0824 | |
Oct 03 2011 | KAWAMURA, AKINORI | Kabushiki Kaisha Toshiba | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027103 | /0824 | |
Oct 03 2011 | SAKAI, MASARU | Kabushiki Kaisha Toshiba | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027103 | /0824 | |
Oct 03 2011 | SUZUKI, KAORU | Kabushiki Kaisha Toshiba | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027103 | /0824 | |
Oct 03 2011 | KIDA, YUSUKE | Kabushiki Kaisha Toshiba | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 027103 | /0824 |
Date | Maintenance Fee Events |
May 27 2019 | REM: Maintenance Fee Reminder Mailed. |
Nov 11 2019 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Oct 06 2018 | 4 years fee payment window open |
Apr 06 2019 | 6 months grace period start (w surcharge) |
Oct 06 2019 | patent expiry (for year 4) |
Oct 06 2021 | 2 years to revive unintentionally abandoned end. (for year 4) |
Oct 06 2022 | 8 years fee payment window open |
Apr 06 2023 | 6 months grace period start (w surcharge) |
Oct 06 2023 | patent expiry (for year 8) |
Oct 06 2025 | 2 years to revive unintentionally abandoned end. (for year 8) |
Oct 06 2026 | 12 years fee payment window open |
Apr 06 2027 | 6 months grace period start (w surcharge) |
Oct 06 2027 | patent expiry (for year 12) |
Oct 06 2029 | 2 years to revive unintentionally abandoned end. (for year 12) |