Improved audio processing management in a portable converged communication device is provided. In a default mode of audio processing, audio input to a microphone is routed to and processed simultaneously by a plurality of different processors in parallel. The default mode is suspended in response to a keyword input to the microphone, the keyword being recognized by a speech recognition engine (ASR). The keyword enables audio processing through one of the plurality of processors while selectively suspending audio processing through the other processors.
1. A method for managing audio processing in a portable converged communication device, the method comprising:
operating in a default mode of audio processing where audio input to a microphone is routed and processed simultaneously to a plurality of different processors; and
suspending the default mode of audio processing in response to a keyword input to the microphone, the keyword being recognized by a speech recognition engine (ASR), the keyword enabling audio processing through one of the plurality of different processors while selectively suspending other audio processing through the plurality of different processors.
15. A portable communication device, comprising:
a land mobile radio (LMR) transceiver operatively managed by an LMR baseband processor (BP);
a broadband transceiver operatively managed by an applications processor (AP);
a serial bus interface operatively coupling the BP and the AP;
a first codec operatively coupled to the LMR BP;
a second codec operatively coupled to the AP;
a first microphone coupled within the portable communication device and operatively coupled to the first and second codecs in parallel;
a second microphone coupled to the first codec via an external accessory;
an automatic speech recognition (ASR) engine operating within the second codec with functionality shared between the AP and BP and operatively coupled to a microphone; and
the ASR engine being configured to recognize a keyword throughout any audio communications, the keyword selectively enabling and suspending audio processing in the AP and BP processors.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
BP audio processing is suspended in response to a keyword associated with AP audio processing; and
AP audio processing is suspended in response to a keyword associated with BP audio processing.
7. The method of
8. The method of
9. The method of
transferring BP receive audio over a first channel of an I2S bus from the BP to the AP;
transferring an echo reference signal formed of BP receive audio mixed with AP receive audio over a second channel of the I2S bus from the BP to the AP; and
recording the BP receive audio at the AP while applying echo cancellation at the AP.
10. The method of
transferring LMR receive audio over an I2S bus from the BP to the AP; and
recording the LMR receive audio by the AP while simultaneously playing out a combination of LMR receive audio and broadband receive audio at the speaker.
11. The method of
transferring the combined LMR receive audio with broadband receive audio from the BP to the AP as an echo reference signal; and
applying the echo reference signal to acoustic feed through picked up by the first internal microphone from the speaker, thereby cancelling echo to the combination of LMR receive audio and broadband receive audio.
12. The method of
dynamically adjusting audio processing sampling rates based on the predetermined keyword.
13. The method of
an internal microphone of the portable radio; and
an external microphone associated with an audio accessory coupled to the portable radio.
14. The method of
a VoIP call;
a video recording (VR);
a virtual assistant (VA); and
a voice control (VC) session.
16. The portable communication device of
an I2S serial bus for transferring digitized LMR receive audio from the LMR BP to the AP for LMR voice recording by the AP.
17. The portable communication device of
18. The portable communication device of
19. The portable communication device of
an I2S serial bus for transferring the VoIP receive audio generated during the full duplex call from the AP to the BP for mixing with an LMR receive audio signal within the LMR BP to generate a mixed signal, the mixed signal being processed through the first codec and played out at the speaker.
20. The portable communication device of
acoustic feed through from the speaker to the microphone within the portable communication device is cancelled via the echo reference signal within the AP.
Public safety personnel, such as police officers, firefighters, paramedics and the like typically utilize portable communication devices while working in the field. Public safety communication devices may include, for example, handheld radios operating with remote accessories, such as remote speaker microphones, headsets, and the like. Audio processing in a multi-microphone environment can be challenging, particularly in cases where different audio features and audio applications are operating at the same time, such as in the multi-processor environment of a converged communication device. There is a need for improved management of audio processing in a converged portable communication device.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed invention, and explain various principles and advantages of those embodiments.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
The apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
Briefly, there is provided herein a portable communication device with an audio accessory coupled thereto, which together provide converged functionality via two different processors, two different transceivers, and two codecs operating within a multi-microphone environment. Parallel microphone sharing in the multi-microphone environment is achieved by the use of a dynamic interface control between the two different processors. The parallel microphone sharing enables efficient co-existence during a converged default mode as well as selectivity of different audio features and applications. Messaging via an interprocessor communications (IPC) interface coordinates events on both processors to enable user selection of audio processing based on keywords spoken by the user.
In accordance with the embodiments provided herein, voice keyword detection is used at the beginning of, or during, a call to trigger radio software to determine an appropriate audio path. The type of call intended by the user, spoken as a keyword, routes the audio along the appropriate voice processing path.
Using event coordination between the two processors (an applications processor and a baseband processor) allows more audio applications and features to run concurrently while minimizing interference. The messages allow command and control over each activity on each processor, with flexibility to start, stop, and pause different audio related tasks. Pre-selected commands or keywords allow a user to vary their audio focus during interaction with different concurrent audio calls. For example, when a user is having a private or sensitive conversation in a land mobile radio (LMR) group call, and that user does not want to be heard in a concurrent VoIP call, the VoIP call can be suspended via a command message or keyword.
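For illustration only, the following minimal C sketch shows one way such start/stop/pause coordination messages could be represented and handled; the message fields, task identifiers, and the ipc_handle routine are assumptions for this sketch and are not the device's actual IPC protocol.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical IPC message exchanged over the inter-processor interface. */
typedef enum { TASK_LMR_VOICE, TASK_VOIP_CALL, TASK_VOICE_RECORD } task_id_t;
typedef enum { CMD_START, CMD_STOP, CMD_PAUSE, CMD_RESUME } ipc_cmd_t;

typedef struct {
    task_id_t task;   /* which audio task the command targets         */
    ipc_cmd_t cmd;    /* start, stop, pause, or resume that task      */
    uint32_t  seq;    /* sequence number for acknowledgement/ordering */
} ipc_msg_t;

/* Receiving side: apply the command to a local task table. */
static const char *task_state[3] = { "stopped", "stopped", "stopped" };

static void ipc_handle(const ipc_msg_t *m)
{
    switch (m->cmd) {
    case CMD_START:  task_state[m->task] = "running"; break;
    case CMD_STOP:   task_state[m->task] = "stopped"; break;
    case CMD_PAUSE:  task_state[m->task] = "paused";  break;
    case CMD_RESUME: task_state[m->task] = "running"; break;
    }
    printf("task %d -> %s (seq %u)\n",
           (int)m->task, task_state[m->task], (unsigned)m->seq);
}

int main(void)
{
    /* e.g. one processor asks the other to pause its VoIP task during a
       private LMR conversation, then resumes it afterwards. */
    ipc_msg_t pause  = { TASK_VOIP_CALL, CMD_PAUSE,  1 };
    ipc_msg_t resume = { TASK_VOIP_CALL, CMD_RESUME, 2 };
    ipc_handle(&pause);
    ipc_handle(&resume);
    return 0;
}
```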
The embodiments further provide for the minimization of interference, such as echo interference, by re-purposing an inter-processor I2S path as an auxiliary audio source between the two processors during concurrent LMR receive audio of a half-duplex call and broadband receive audio of a full duplex call. The I2S path is used to send an echo feedback reference signal from the baseband processor to the applications processor, the echo reference signal being a combination of the LMR receive audio and the broadband receive audio. The inter-processor auxiliary path, I2S, can also be re-used for different audio applications, for example the recording of LMR receive audio by the applications processor.
The architecture of
In broadband receive mode, incoming broadband signals (for example LTE, WiFi, or BLUETOOTH signals) are processed through the broadband transceiver 108, applications processor 104, second codec 112, and first codec 110 for amplification by the audio power amplifier 114 and routed via the audio switch 116 for playing out at the internal speaker 118 or the accessory speaker 120.
The architecture of
When the external accessory microphone 124 is being used, the analog LMR or app audio input 132 is routed to the first codec 110 for conversion of the analog signal to a digital signal 136, which is then transferred to the LMR baseband processor 102 for processing into digital signal 138. The LMR BP 102 forwards the processed digital signal 138 back to the first codec 110 for conversion to an analog signal 140. The analog signal 140 is then sent to the second codec 112 (in series) for conversion to a digital signal 142 which is sent to and processed by the AP 104. In this audio routing arrangement the BP 102 acts as the microphone audio master, whereas the AP 104 operates as the slave.
When the internal microphone 122 is being used, the audio input 134 (app audio or LMR audio) goes to the first and second codecs 110, 112 in parallel, each codec 110, 112 processing the analog input signal into respective app digital signals 144, 146. Each codec 110, 112 transfers the app digital signals 144, 146 to both the BP 102 and the AP 104. The purpose of parallel microphone sharing between the two processors 102, 104 is to allow independent audio processing through both processors as a default mode, unless the user explicitly suspends one of them through a spoken keyword. Only the BP 102 is used to control the speakers 118, 120 via the first codec 110, which, as discussed in later embodiments, will help with echo cancellation.
Hence, the architecture has provided for series and parallel processing of the applications or LMR audio input, depending on which microphone type (internal or external) is being used by the user. This architecture is particularly beneficial as additional LMR audio features are run simultaneously with app audio applications as will be described in the following processing figures.
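For illustration only, the following C sketch shows the parallel-sharing idea in miniature: one block of digitized microphone samples is fanned out to two independent processing paths, each gated by an enable flag that a later keyword may clear. The frame size, flag names, and processing stubs are assumptions, not the device's firmware.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define FRAME 160  /* assumed 20 ms frame at 8 kHz */

static bool bp_enabled = true;   /* baseband-processor path     */
static bool ap_enabled = true;   /* applications-processor path */

/* Stand-ins for the real codec/DSP chains on each processor. */
static void bp_process(const int16_t *pcm, size_t n) { (void)pcm; printf("BP got %zu samples\n", n); }
static void ap_process(const int16_t *pcm, size_t n) { (void)pcm; printf("AP got %zu samples\n", n); }

/* Default mode: the same microphone frame reaches both processors. */
static void mic_frame(const int16_t *pcm, size_t n)
{
    if (bp_enabled) bp_process(pcm, n);
    if (ap_enabled) ap_process(pcm, n);
}

int main(void)
{
    int16_t frame[FRAME] = { 0 };
    mic_frame(frame, FRAME);   /* both paths process the frame  */
    ap_enabled = false;        /* e.g. after an "LMR" keyword   */
    mic_frame(frame, FRAME);   /* only the BP path now responds */
    return 0;
}
```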
The VOX feature, also referred to as “voice activated transmission”, is found in most higher-end consumer two-way radios. Virtually all two-way radios and two-way radio accessories have a push-to-talk (PTT) button to enable transmitting. With the VOX feature switched on, the microphone is always listening for voice, and when the microphone detects voice, it will automatically start broadcasting. The VOX feature thus facilitates hands-free operation by not requiring the PTT to be pressed.
However, the LMR VOX feature transmits all microphone audio with present voice activity, which can be challenging when application input audio is present at one of the device's microphones. In the past, audio intended only for the AP 104 may have been unintentionally transmitted into an LMR talkgroup via VOX (e.g. a virtual assistant audio input being inadvertently transmitted to an LMR talkgroup). In accordance with the embodiments, the parallel microphone sharing approach facilitates the VOX feature being operated in parallel with an AP audio application; however, due to the lack of a PTT press as an indicator, the parallel processing also presents some challenges, which have been addressed.
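As a hedged illustration of the VOX concept itself (not the radio's actual detector), a simple energy-threshold gate with a hang timer can be sketched as follows; the threshold, hang time, and frame length are assumed values.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define FRAME       160     /* assumed 20 ms frame at 8 kHz           */
#define VOX_THRESH  4.0e5   /* assumed mean-square energy threshold   */
#define HANG_FRAMES 25      /* keep transmitting ~0.5 s after voice   */

static int hang = 0;

/* Returns true while the transmitter should be keyed. */
static bool vox_gate(const int16_t *pcm, int n)
{
    double energy = 0.0;
    for (int i = 0; i < n; i++)
        energy += (double)pcm[i] * pcm[i];
    energy /= n;

    if (energy > VOX_THRESH)
        hang = HANG_FRAMES;   /* voice present: (re)start hang timer */
    else if (hang > 0)
        hang--;               /* voice gone: count the hang time down */
    return hang > 0;
}

int main(void)
{
    int16_t loud[FRAME], quiet[FRAME] = { 0 };
    for (int i = 0; i < FRAME; i++) loud[i] = (int16_t)((i % 2) ? 2000 : -2000);
    printf("loud frame keys TX: %d\n", vox_gate(loud, FRAME));
    printf("quiet frame (hang): %d\n", vox_gate(quiet, FRAME));
    return 0;
}
```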
Operationally, an audio signal 202 is input to the accessory microphone 124 of audio accessory 130 which is processed through first codec 110 and translated to a digital signal 204 for processing by the BP 102. The audio signal 204 is sent as a digital signal to the first codec 110 where, similarly to
The VOX signaling thus, by default, hits both processors 102, 104—whether having been processed in series as shown in
The use of keyword detection is provided at the beginning of or during a call (or app usage) to trigger radio software within the AP 104 to determine which voice processing path the user intends to use. For example, a user can say words such as “private”, “LMR”, “VOX”, or “WhatsApp” to trigger an audio processing path prior to speaking the actual message (or during the message), in order to express user voice intent to control the processor and audio path that become active while other paths are suspended.
Upon detecting the pre-selected keywords in the microphone audio input (either at internal microphone 122 or the external accessory microphone 124), either at the beginning or during the call, the portable communication device can infer the user's intent and use that to temporarily suspend the audio processing through the non-selected processor by communicating the suspension via the IPC interface 126.
If the user does not speak any of the preselected keywords at the beginning or during the call, the device 100 will interpret this as the user wanting concurrent parallel processing of the microphone input through multiple features and applications in the different processors 102, 104—as the default behavior.
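For illustration only, the inferred-intent step can be pictured as a small keyword-to-route lookup, sketched below in C; the keyword table and route identifiers are assumptions, and a real implementation would act on the result by sending a suspension message over the IPC interface 126.

```c
#include <stdio.h>
#include <string.h>

typedef enum { ROUTE_DEFAULT, ROUTE_BP_ONLY, ROUTE_AP_ONLY } route_t;

/* Hypothetical mapping of spoken keywords to the processor they select. */
static const struct { const char *word; route_t route; } keywords[] = {
    { "private",  ROUTE_BP_ONLY },   /* mission-critical LMR audio */
    { "LMR",      ROUTE_BP_ONLY },
    { "VOX",      ROUTE_BP_ONLY },
    { "WhatsApp", ROUTE_AP_ONLY },   /* broadband application audio */
};

/* Returns the route implied by an ASR result; no match keeps the default. */
static route_t route_for_keyword(const char *asr_word)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(asr_word, keywords[i].word) == 0)
            return keywords[i].route;
    return ROUTE_DEFAULT;            /* no keyword: parallel processing */
}

int main(void)
{
    printf("\"private\" -> %d (BP only, AP suspended)\n", route_for_keyword("private"));
    printf("\"hello\"   -> %d (default, both active)\n",  route_for_keyword("hello"));
    return 0;
}
```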
In the example of
While the ASR engine 522 is described as being in the AP processor 104, it is appreciated that the ASR engine could be in one or both processors 102, 104. By not having the ASR on the BP 102, the BP provides the default processing operation triggered by preselected BP-type keywords, such as LMR, private, VOX, or another LMR-type audio feature.
If the ASR remains on the AP 104 and preselected words are directed to AP applications, such as “WhatsApp” or some other preselected keywords, the AP can control the audio processing by suspending the BP processor 102 via the IPC 126. Hence, audio routing remains intact (audio is still present at the input to both processors); only the processing of that audio changes. Audio signals are permitted to be present at the inputs to both processors, but only one processor will respond while the other is suspended.
The microphone input may be input to an internal microphone of the portable communication device and/or an external microphone associated with the audio accessory coupled to the portable communication device. At 404, keyword recognition monitoring takes place to determine if a preselected keyword has been input to a microphone (internal microphone or external microphone). If no keyword is detected at 404, the method goes into a default mode at 406 where parallel processing of audio takes place through two different processors, the BP 102 and the AP 104 of the previous figures.
Depending on the audio applications and audio features being run at 406, echo cancellation and/or additional audio applications may additionally be run at 408. For example, echo cancellation can be applied by the AP to improve audio processing of parallel AP audio and BP audio being played out of a speaker. LMR receive audio can also be recorded by the AP. Certain features, such as voice recording, can take place simultaneously with the echo cancellation. A user may choose to send voice to two destinations simultaneously, if desired. For example, a user may want the LMR audio to simultaneously reach an LMR talkgroup and a VoIP call recipient (e.g. a Skype user). As another example, a user may want to simultaneously transmit LMR voice in parallel through WiFi/LTE for diagnostic purposes. As another example, an LMR talkgroup call can take place while the talkgroup call is sent to the cloud for analytics. Echo cancellation can take place in parallel with these as well as other audio features being processed, as sketched below.
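For illustration only, the "one input, several destinations" behavior can be sketched as a list of sink callbacks that each receive the same transmit frame; the sink names below are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define FRAME     160
#define MAX_SINKS 4

typedef void (*audio_sink_fn)(const int16_t *pcm, int n);

static audio_sink_fn sinks[MAX_SINKS];
static int           n_sinks;

static void add_sink(audio_sink_fn fn) { if (n_sinks < MAX_SINKS) sinks[n_sinks++] = fn; }

/* Deliver one microphone frame to every registered destination. */
static void dispatch(const int16_t *pcm, int n)
{
    for (int i = 0; i < n_sinks; i++)
        sinks[i](pcm, n);
}

/* Illustrative destinations only. */
static void lmr_talkgroup_tx(const int16_t *pcm, int n) { (void)pcm; printf("LMR TX: %d samples\n", n); }
static void voip_call_tx(const int16_t *pcm, int n)     { (void)pcm; printf("VoIP TX: %d samples\n", n); }

int main(void)
{
    int16_t frame[FRAME] = { 0 };
    add_sink(lmr_talkgroup_tx);
    add_sink(voip_call_tx);
    dispatch(frame, FRAME);   /* one frame, two simultaneous destinations */
    return 0;
}
```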
If a preselected keyword is input to one of the microphones during the default mode of parallel audio processing at 406, then the audio processing associated with the non-selected processor is suspended and the audio input to the microphone is processed through the non-suspended processing path at 412. The audio processing is responsive to the keyword at any time during the default mode. Continued monitoring for pre-selected keywords takes place at 410 until the call ends at 414 and audio processing ends at 416.
If at 404 a preselected keyword is input to one of the microphones at the beginning of transmit, then the audio processing associated with the non-selected processor is likewise suspended, and the audio input to the microphone is processed through the non-suspended processing path at 412. Continuous keyword monitoring takes place at 410 until the call ends at 414 and audio processing ends at 416. Hence, keyword monitoring takes place continuously, and a keyword can be detected at the beginning of or during the audio processing.
The method 400 can be summarized as a method for managing audio processing in a portable converged communication device by operating in a default mode of audio processing where audio input to a microphone is routed and processed simultaneously to a plurality of different processors, and suspending the default mode of audio processing in response to a keyword input to a microphone, the keyword being recognized by a speech recognition engine (ASR), the keyword enabling audio processing through one of the plurality of different processors while selectively suspending other audio processing through the plurality of different processors.
The audio processing is responsive to the keyword at any time during the default mode. The method may dynamically adjust audio processing sampling rates based on the predetermined keyword. The plurality of different processors may comprise a baseband processor (BP) and an applications processor (AP). The plurality of different processors may selectively manage simplex audio communications and full duplex audio communications. The simplex audio communications may be managed by the BP and the full duplex audio communications may be managed by the AP. An example of suspended operations may be the BP performing an LMR voice call while the AP suspends broadband audio applications. The BP audio processing may be suspended in response to a keyword associated with AP audio processing, and AP audio processing may be suspended in response to a keyword associated with BP audio processing. The predetermined keyword may be selected to enable mission critical simplex audio communications through the baseband processor, while other BP managed audio applications continue to operate simultaneously and all audio applications managed by the AP are suspended.
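One hedged reading of dynamically adjusting sampling rates based on the keyword is a simple keyword-to-rate table, sketched below; the rates and keyword-to-rate pairings are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical keyword-to-sample-rate mapping. */
static unsigned rate_for_keyword(const char *kw)
{
    if (strcmp(kw, "LMR") == 0 || strcmp(kw, "VOX") == 0)
        return 8000;    /* assumed narrowband vocoder path   */
    if (strcmp(kw, "WhatsApp") == 0)
        return 16000;   /* assumed wideband VoIP path        */
    return 48000;       /* assumed full-rate capture for the default parallel mode */
}

int main(void)
{
    printf("LMR      -> %u Hz\n", rate_for_keyword("LMR"));
    printf("WhatsApp -> %u Hz\n", rate_for_keyword("WhatsApp"));
    printf("(none)   -> %u Hz\n", rate_for_keyword(""));
    return 0;
}
```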
The audio architecture of the embodiments provides improved echo cancellation to address the potential acoustic feed through 580. In accordance with the embodiment, the BP 102 provides a mixed signal 510 as an echo reference signal to the AP 104. Mixed signal 510 is a mixed audio signal comprising broadband receive audio, in this case VoIP receive audio 530, of a full duplex call mixed with LMR receive audio 540.
In accordance with this embodiment, an I2S bus 570 is utilized between the BP 102 and the AP 104 to control the transfer of the various receive audio signals (LMR receive 540, VoIP receive 530, mixed signal 510, and speaker data 550). The I2S bus 570, while typically used as a stereo bus (left and right channels), has been repurposed as an auxiliary audio path for the various signal transfers and echo cancellation. The architecture and use of I2S bus 570 allows for various audio applications to be run in parallel by the AP 104. For example, LMR receive audio 540 can be recorded by the AP 104, while echo cancellation takes place within the AP 104. Speaker priority data can be conveyed from the AP 104 to the BP 102 while VoIP receive audio 530 is sent from the AP 104 to the BP 102.
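For illustration only, the repurposed stereo link can be pictured as two PCM lanes per direction, one carrying the BP receive audio for recording and one carrying the mixed echo reference; the interleaved frame layout in the C sketch below is a generic stereo packing and only an assumption about how the samples are arranged.

```c
#include <stdint.h>
#include <stdio.h>

#define FRAME 160

/* Interleave two mono streams into one stereo I2S frame:
   left slot  = LMR receive audio (for AP-side recording),
   right slot = mixed echo reference (LMR + VoIP receive). */
static void i2s_pack(const int16_t *lmr_rx, const int16_t *echo_ref,
                     int16_t *stereo, int n)
{
    for (int i = 0; i < n; i++) {
        stereo[2 * i]     = lmr_rx[i];
        stereo[2 * i + 1] = echo_ref[i];
    }
}

/* AP side: split the lanes back out for the recorder and the canceller. */
static void i2s_unpack(const int16_t *stereo, int16_t *lmr_rx,
                       int16_t *echo_ref, int n)
{
    for (int i = 0; i < n; i++) {
        lmr_rx[i]   = stereo[2 * i];
        echo_ref[i] = stereo[2 * i + 1];
    }
}

int main(void)
{
    int16_t lmr[FRAME] = { 100 }, ref[FRAME] = { -100 }, bus[2 * FRAME];
    int16_t out_lmr[FRAME], out_ref[FRAME];
    i2s_pack(lmr, ref, bus, FRAME);
    i2s_unpack(bus, out_lmr, out_ref, FRAME);
    printf("lane check: %d %d\n", out_lmr[0], out_ref[0]);  /* 100 -100 */
    return 0;
}
```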
Operationally, when the communication device 100 is engaged in a VoIP full duplex call while simultaneously receiving an LMR call on a different talkgroup, the microphone input audio 520 of the full duplex call is encoded into a digital signal at the second codec 112 and sent to the AP 104 for processing and transmission over the broadband transceiver 108, such as over an LTE, WiFi, or BT modem.
The VoIP receive audio 530 is sent over I2S bus 570 to the BP 102 for mixing with the LMR receive audio 540 at mixer 560 of the BP 102. The mixed signal 510, which is a combination of VoIP receive audio and LMR receive audio is processed by the first codec 110 for conversion back to an analog signal which is forwarded through the speaker path and played out at speaker 118.
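For illustration only, the mixing step at mixer 560 can be sketched as a sample-wise sum with saturation so that loud simultaneous audio does not wrap around; the clamping approach shown is a generic choice, not necessarily the codec's.

```c
#include <stdint.h>
#include <stdio.h>

/* Sum VoIP and LMR receive samples, clamping to the 16-bit PCM range. */
static int16_t mix_sample(int16_t voip_rx, int16_t lmr_rx)
{
    int32_t s = (int32_t)voip_rx + (int32_t)lmr_rx;
    if (s >  32767) s =  32767;
    if (s < -32768) s = -32768;
    return (int16_t)s;
}

static void mix_frames(const int16_t *voip, const int16_t *lmr,
                       int16_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = mix_sample(voip[i], lmr[i]);
}

int main(void)
{
    int16_t voip[4] = { 1000, 30000, -30000, 0 };
    int16_t lmr[4]  = {  500,  5000,  -5000, 0 };
    int16_t out[4];
    mix_frames(voip, lmr, out, 4);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); /* 1500 32767 -32768 0 */
    return 0;
}
```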
The mixed signal 510 is also transferred over the I2S bus 570 from the BP 102 to the AP 104 as an echo reference signal. Improved echo cancellation is achieved using adaptive digital signal processing in the AP 104 that subtracts the echo reference signal 510 from the microphone input signal 520 using predetermined time alignment and delays, under the control of the AP 104.
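For illustration only, the adaptive subtraction can be pictured as a short normalized-LMS (NLMS) adaptive filter that models the speaker-to-microphone path and removes the filtered echo reference from the microphone signal; the filter length, step size, and floating-point arithmetic below are assumed choices, not the device's actual canceller.

```c
#include <stdio.h>

#define TAPS 64      /* assumed echo-path model length */
#define MU   0.5f    /* assumed NLMS step size         */

static float w[TAPS];    /* adaptive estimate of the echo path  */
static float xbuf[TAPS]; /* most recent echo-reference samples  */

/* One NLMS iteration: returns the echo-cancelled microphone sample. */
static float aec_sample(float mic, float ref)
{
    /* shift the reference history and insert the new sample */
    for (int i = TAPS - 1; i > 0; i--) xbuf[i] = xbuf[i - 1];
    xbuf[0] = ref;

    /* filter the reference to estimate the echo picked up by the mic */
    float est = 0.0f, power = 1e-6f;
    for (int i = 0; i < TAPS; i++) {
        est   += w[i] * xbuf[i];
        power += xbuf[i] * xbuf[i];
    }
    float err = mic - est;               /* cleaned output */

    /* adapt the filter toward the residual echo */
    for (int i = 0; i < TAPS; i++)
        w[i] += MU * err * xbuf[i] / power;
    return err;
}

int main(void)
{
    /* Toy run: the 'mic' signal is a scaled, one-sample-delayed copy of the
       reference, so the residual should shrink as the filter converges. */
    float ref_prev = 0.0f;
    for (int n = 0; n < 2000; n++) {
        float ref = (n % 50 < 25) ? 0.5f : -0.5f;  /* square-wave far-end  */
        float mic = 0.8f * ref_prev;               /* echo with 1-sample delay */
        float out = aec_sample(mic, ref);
        if (n % 500 == 499) printf("n=%d residual=%f\n", n, out);
        ref_prev = ref;
    }
    return 0;
}
```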
Accordingly, there has been provided a microphone processing approach for a converged communication device that does not rely on or require dedicated independent microphone paths. The use of continuous keyword monitoring to control microphone audio routing allows the user's intent to control transmit audio processing. Continuous voice keyword recognition across concurrently running apps, performed on the regular voice call microphone audio, is convenient and efficient from a user standpoint, in that it negates the need for any button presses or other user interface steps. The approach advantageously prevents, for example, AP audio applications from being transmitted into an LMR talkgroup via VOX. The approach is particularly beneficial to multi-processor converged communication devices by allowing the user to control the audio processing independently on each processor via spoken pre-selected keywords. The focus of the keyword being on the microphone/transmit side allows the user to have more control over audio in a converged device.
In the foregoing specification, specific embodiments have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of present teachings.
The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all of the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application and all equivalents of those claims as issued.
Moreover, in this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has”, “having,” “includes”, “including,” “contains”, “containing” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises a . . . ”, “has a . . . ”, “includes a . . . ”, “contains a . . . ” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises, has, includes, contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially”, “essentially”, “approximately”, “about” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art, and in one non-limiting embodiment the term is defined to be within 10%, in another embodiment within 5%, in another embodiment within 1% and in another embodiment within 0.5%. The term “coupled” as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
It will be appreciated that some embodiments may be comprised of one or more generic or specialized processors (or “processing devices”) such as microprocessors, digital signal processors, customized processors and field programmable gate arrays (FPGAs) and unique stored program instructions (including both software and firmware) that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the method and/or apparatus described herein. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used.
Moreover, an embodiment can be implemented as a computer-readable storage medium having computer readable code stored thereon for programming a computer (e.g., comprising a processor) to perform a method as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
Li, Haohua, Corretjer, Jesus F, Ng, Lian Kooi, Teh, Keh Yong