A method and system for enhancing a target sound signal from multiple sound signals is provided. An array of an arbitrary number of sound sensors positioned in an arbitrary configuration receives the sound signals from multiple disparate sources. The sound signals comprise the target sound signal from a target sound source, and ambient noise signals. A sound source localization unit, an adaptive beamforming unit, and a noise reduction unit are in operative communication with the array of sound sensors. The sound source localization unit estimates a spatial location of the target sound signal from the received sound signals. The adaptive beamforming unit performs adaptive beamforming by steering a directivity pattern of the array of sound sensors in a direction of the spatial location of the target sound signal, thereby enhancing the target sound signal and partially suppressing the ambient noise signals, which are further suppressed by the noise reduction unit.

Patent: RE48371
Priority: Sep 24 2010
Filed: Aug 02 2018
Issued: Dec 29 2020
Expiry: Mar 16 2031
Assg.orig Entity: Small
95
96
currently ok

REINSTATED
0. 38. A microphone array system for enhancing a target sound signal from a plurality of sound signals, comprising:
an array of sound sensors, wherein said sound sensors receive said sound signals from a plurality of disparate sound sources, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
a digital signal processor, said digital signal processor comprising:
a sound source localization unit that estimates a location of said target sound signal from said received sound signals by determining a delay between each of said sound sensors and a reference point of said array of said sound sensors as a function of distance between each of said sound sensors and said reference point and an angle of each of said sound sensors biased from a reference axis;
a beamforming unit that enhances said target sound signal and partially suppresses said ambient noise signals;
an echo cancellation unit that performs echo cancellation and further enhances said target sound signal; and
a noise reduction unit that suppresses said ambient noise signals and further enhances said target sound signal.
0. 30. A microphone array system for enhancing a target sound signal from a plurality of sound signals, comprising:
an array of sound sensors positioned in a linear, circular, or other configuration, wherein said sound sensors receive said sound signals from a plurality of disparate sound sources, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
a digital signal processor, said digital signal processor comprising:
a sound source localization unit that estimates a location of said target sound signal from said received sound signals, by determining a delay between each of said sound sensors and an origin of said array of said sound sensors as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a reference axis, and an azimuth angle between said reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a two dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for said array of sound sensors in a plurality of configurations;
an adaptive beamforming unit that steers directivity pattern of said array of said sound sensors in a direction of said location of said target sound signal, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals;
an echo cancellation unit that performs echo cancellation for further enhancing said target sound signal; and
a noise reduction unit that suppresses said ambient noise signals for further enhancing said target sound signal.
0. 22. A method for enhancing a target sound signal from a plurality of sound signals, comprising:
providing a microphone array system comprising an array of sound sensors positioned in a linear, circular, or other configuration, a sound source localization unit, an adaptive beamforming unit, a noise reduction unit, and an echo cancellation unit, wherein said sound source localization unit, said adaptive beamforming unit, said noise reduction unit, and said echo cancellation unit are implemented in a digital signal processor, and wherein said digital signal processor is in operative communication with said array of said sound sensors;
receiving said sound signals from a plurality of disparate sound sources by said sound sensors, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
determining a delay between each of said sound sensors and an origin of said array of said sound sensors as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a reference axis, and an azimuth angle between said reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a two dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for said array of said sound sensors in a plurality of configurations;
estimating a location of said target sound signal from said received sound signals by said sound source localization unit;
performing adaptive beamforming for steering a directivity pattern of said array of said sound sensors in a direction of said location of said target sound signal by said adaptive beamforming unit, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals;
performing echo cancellation by said echo cancellation unit for further enhancing said target sound signal; and
suppressing said ambient noise signals by said noise reduction unit for further enhancing said target sound signal.
0. 1. A method for enhancing a target sound signal from a plurality of sound signals, comprising:
providing a microphone array system comprising an array of sound sensors positioned in an arbitrary configuration, a sound source localization unit, an adaptive beamforming unit, and a noise reduction unit, wherein said sound source localization unit, said adaptive beamforming unit, and said noise reduction unit are in operative communication with said array of said sound sensors;
receiving said sound signals from a plurality of disparate sound sources by said sound sensors, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
determining a delay between each of said sound sensors and an origin of said array of said sound sensors as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a reference axis, and an azimuth angle between said reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a two dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for arbitrary numbers of said sound sensors and a plurality of arbitrary configurations of said array of said sound sensors;
estimating a spatial location of said target sound signal from said received sound signals by said sound source localization unit;
performing adaptive beamforming for steering a directivity pattern of said array of said sound sensors in a direction of said spatial location of said target sound signal by said adaptive beamforming unit, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals; and
suppressing said ambient noise signals by said noise reduction unit for further enhancing said target sound signal.
0. 2. The method of claim 1, wherein said spatial location of said target sound signal from said target sound source is estimated using a steered response power-phase transform by said sound source localization unit.
0. 3. The method of claim 1, wherein said adaptive beamforming comprises:
providing a fixed beamformer, a blocking matrix, and an adaptive filter in said adaptive beamforming unit;
steering said directivity pattern of said array of said sound sensors in said direction of said spatial location of said target sound signal from said target sound source by said fixed beamformer for enhancing said target sound signal, when said target sound source is in motion;
feeding said ambient noise signals to said adaptive filter by blocking said target sound signal received from said target sound source using said blocking matrix; and
adaptively filtering said ambient noise signals by said adaptive filter in response to detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 4. The method of claim 3, wherein said fixed beamformer performs fixed beamforming by filtering and summing output sound signals from said sound sensors.
0. 5. The method of claim 3, wherein said adaptive filtering comprises sub-band adaptive filtering performed by said adaptive filter, wherein said sub-band adaptive filtering comprises:
providing an analysis filter bank, an adaptive filter matrix, and a synthesis filter bank in said adaptive filter;
splitting said enhanced target sound signal from said fixed beamformer and said ambient noise signals from said blocking matrix into a plurality of frequency sub-bands by said analysis filter bank;
adaptively filtering said ambient noise signals in each of said frequency sub-bands by said adaptive filter matrix in response to detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources; and
synthesizing a full-band sound signal using said frequency sub-bands of said enhanced target sound signal by said synthesis filter bank.
0. 6. The method of claim 3, wherein said adaptive beamforming further comprises detecting said presence of said target sound signal by an adaptation control unit provided in said adaptive beamforming unit and adjusting a step size for said adaptive filtering in response to detecting one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 7. The method of claim 1, wherein said noise reduction unit performs noise reduction by using one of a Wiener-filter based noise reduction algorithm, a spectral subtraction noise reduction algorithm, an auditory transform based noise reduction algorithm, and a model based noise reduction algorithm.
0. 8. The method of claim 1, wherein said noise reduction unit performs noise reduction in a plurality of frequency sub-bands, wherein said frequency sub-bands are employed by an analysis filter bank of said adaptive beamforming unit for sub-band adaptive beamforming.
0. 9. A system for enhancing a target sound signal from a plurality of sound signals, comprising:
an array of sound sensors positioned in an arbitrary configuration, wherein said sound sensors receive said sound signals from a plurality of disparate sound sources, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
a sound source localization unit that estimates a spatial location of said target sound signal from said received sound signals, by determining a delay between each of said sound sensors and an origin of said array of said sound sensors as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a reference axis, and an azimuth angle between said reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a two dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for arbitrary numbers of said sound sensors and a plurality of arbitrary configurations of said array of said sound sensors;
an adaptive beamforming unit that steers directivity pattern of said array of said sound sensors in a direction of said spatial location of said target sound signal, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals; and
a noise reduction unit that suppresses said ambient noise signals for further enhancing said target sound signal.
0. 10. The system of claim 9, wherein said sound source localization unit estimates said spatial location of said target sound signal from said target sound source using a steered response power-phase transform.
0. 11. The system of claim 9, wherein said adaptive beamforming unit comprises:
a fixed beamformer that steers said directivity pattern of said array of said sound sensors in said direction of said spatial location of said target sound signal from said target sound source for enhancing said target sound signal, when said target sound source is in motion;
a blocking matrix that feeds said ambient noise signals to an adaptive filter by blocking said target sound signal received from said target sound source; and
said adaptive filter that adaptively filters said ambient noise signals in response to detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 12. The system of claim 11, wherein said fixed beamformer performs fixed beamforming by filtering and summing output sound signals from said sound sensors.
0. 13. The system of claim 11, wherein said adaptive filter comprises a set of sub-band adaptive filters comprising:
an analysis filter bank that splits said enhanced target sound signal from said fixed beamformer and said ambient noise signals from said blocking matrix into a plurality of frequency sub-bands;
an adaptive filter matrix that adaptively filters said ambient noise signals in each of said frequency sub-bands in response to detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources; and
a synthesis filter bank that synthesizes a full-band sound signal using said frequency sub-bands of said enhanced target sound signal.
0. 14. The system of claim 9, wherein said adaptive beamforming unit further comprises an adaptation control unit that detects said presence of said target sound signal and adjusts a step size for said adaptive filtering in response to detecting one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 15. The system of claim 9, wherein said noise reduction unit is one of a Wiener-filter based noise reduction unit, a spectral subtraction noise reduction unit, an auditory transform based noise reduction unit, and a model based noise reduction unit.
0. 16. The system of claim 9, further comprising one or more audio codecs that convert said sound signals in an analog form of said sound signals into digital sound signals and reconverts said digital sound signals into said analog form of said sound signals.
0. 17. The system of claim 9, wherein said noise reduction unit performs noise reduction in a plurality of frequency sub-bands employed by an analysis filter bank of said adaptive beamforming unit for sub-band adaptive beamforming.
0. 18. The system of claim 9, wherein said array of said sound sensors is one of a linear array of said sound sensors, a circular array of said sound sensors, and an arbitrarily distributed coplanar array of said sound sensors.
0. 19. The method of claim 1, wherein said delay (τ) is determined by a formula τ=fs*t, wherein fs is a sampling frequency and t is a time delay.
0. 20. A method for enhancing a target sound signal from a plurality of sound signals, comprising:
providing a microphone array system comprising an array of sound sensors positioned in an arbitrary configuration, a sound source localization unit, an adaptive beamforming unit, and a noise reduction unit, wherein said sound source localization unit, said adaptive beamforming unit, and said noise reduction unit are in operative communication with said array of said sound sensors;
receiving said sound signals from a plurality of disparate sound sources by said sound sensors, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
determining a delay between each of said sound sensors and an origin of said array of said sound sensors as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a first reference axis, an elevation angle between a second reference axis and said target sound signal, and an azimuth angle between said first reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a three dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for arbitrary numbers of said sound sensors and a plurality of arbitrary configurations of said array of said sound sensors;
estimating a spatial location of said target sound signal from said received sound signals by said sound source localization unit;
performing adaptive beamforming for steering a directivity pattern of said array of said sound sensors in a direction of said spatial location of said target sound signal by said adaptive beamforming unit, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals; and
suppressing said ambient noise signals by said noise reduction unit for further enhancing said target sound signal.
0. 21. A system for enhancing a target sound signal from a plurality of sound signals, comprising:
an array of sound sensors positioned in an arbitrary configuration, wherein said sound sensors receive said sound signals from a plurality of disparate sound sources, wherein said received sound signals comprise said target sound signal from a target sound source among said disparate sound sources, and ambient noise signals;
a sound source localization unit that estimates a spatial location of said target sound signal from said received sound signals as a function of distance between each of said sound sensors and said origin, a predefined angle between each of said sound sensors and a first reference axis, an elevation angle between a second reference axis and said target sound signal, and an azimuth angle between said first reference axis and said target sound signal, when said target sound source that emits said target sound signal is in a three dimensional plane, wherein said delay is represented in terms of number of samples, and wherein said determination of said delay enables beamforming for arbitrary numbers of said sound sensors and a plurality of arbitrary configurations of said array of said sound sensors;
an adaptive beamforming unit that steers directivity pattern of said array of said sound sensors in a direction of said spatial location of said target sound signal, wherein said adaptive beamforming unit enhances said target sound signal and partially suppresses said ambient noise signals; and
a noise reduction unit that suppresses said ambient noise signals for further enhancing said target sound signal.
0. 23. The method of claim 22, wherein said location of said target sound signal from said target sound source is estimated using a steered response power-phase transform by said sound source localization unit.
0. 24. The method of claim 22, wherein said adaptive beamforming comprises:
providing a fixed beamformer, a blocking matrix, and an adaptive filter in said adaptive beamforming unit;
steering said directivity pattern of said array of said sound sensors in said direction of said location of said target sound signal from said target sound source by said fixed beamformer for enhancing said target sound signal, when said target sound source is in motion;
feeding said ambient noise signals to said adaptive filter by blocking said target sound signal received from said target sound source using said blocking matrix; and
adaptively filtering said ambient noise signals by said adaptive filter in response to voice activity detection, wherein said voice activity detection comprises detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 25. The method of claim 24, wherein said fixed beamformer performs fixed beamforming by one of filtering and summing output sound signals from said sound sensors, and delaying and summing output sound signals from said sound sensors.
0. 26. The method of claim 24, wherein said adaptive filtering comprises sub-band adaptive filtering performed by said adaptive filter, and wherein said sub-band adaptive filtering comprises:
providing an analysis filter bank, an adaptive filter matrix, and a synthesis filter bank in said adaptive filter;
splitting said enhanced target sound signal from said fixed beamformer and said ambient noise signals from said blocking matrix into a plurality of frequency sub-bands by said analysis filter bank;
adaptively filtering said ambient noise signals in each of said frequency sub-bands by said adaptive filter matrix in response to said detection of one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources; and
synthesizing a full-band sound signal using said frequency sub-bands of said enhanced target sound signal by said synthesis filter bank.
0. 27. The method of claim 24, wherein said adaptive beamforming further comprises detecting said presence of said target sound signal by an adaptation control unit provided in said adaptive beamforming unit and adjusting a step size for said adaptive filtering in response to said detection of one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 28. The method of claim 22, wherein said noise reduction unit performs noise reduction by using one of a Wiener-filter based noise reduction algorithm, a spectral subtraction noise reduction algorithm, an auditory transform based noise reduction algorithm, and a model based noise reduction algorithm.
0. 29. The method of claim 22, wherein said noise reduction unit performs noise reduction in a plurality of frequency sub-bands, wherein said frequency sub-bands are employed by an analysis filter bank of said adaptive beamforming unit for sub-band adaptive beamforming, wherein said sound source localization unit calculates said delay (τ) based on said number of samples within a time period and a time delay for said target sound signal to travel said distance between each of said sound sensors in said microphone array and said origin of said array of said sound sensors, and wherein said distance between said each of said sound sensors in the microphone array and said origin of said array of said sound sensors is either a same distance or a different distance.
0. 31. The system of claim 30, wherein said sound source localization unit estimates said location of said target sound signal from said target sound source using a steered response power-phase transform.
0. 32. The system of claim 30, wherein said adaptive beamforming unit comprises:
a fixed beamformer that steers said directivity pattern of said array of said sound sensors in said direction of said location of said target sound signal from said target sound source for enhancing said target sound signal, when said target sound source is in motion;
a blocking matrix that feeds said ambient noise signals to an adaptive filter by blocking said target sound signal received from said target sound source; and
said adaptive filter adaptively filters said ambient noise signals in response to voice activity detection, wherein said voice activity detection comprises detecting one of presence and absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 33. The system of claim 32, wherein said fixed beamformer performs fixed beamforming by one of filtering and summing output sound signals from said sound sensors, and delaying and summing output sound signals from said sound sensors.
0. 34. The system of claim 32, wherein said adaptive filter comprises a set of sub-band adaptive filters comprising:
an analysis filter bank that splits said enhanced target sound signal from said fixed beamformer and said ambient noise signals from said blocking matrix into a plurality of frequency sub-bands;
an adaptive filter matrix that adaptively filters said ambient noise signals in each of said frequency sub-bands in response to said detection of one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources; and
a synthesis filter bank that synthesizes a full-band sound signal using said frequency sub-bands of said enhanced target sound signal.
0. 35. The system of claim 32, wherein said adaptive beamforming unit further comprises an adaptation control unit that detects said presence of said target sound signal and adjusts a step size for said adaptive filtering in response to said detection of one of said presence and said absence of said target sound signal in said sound signals received from said disparate sound sources.
0. 36. The system of claim 30, wherein said noise reduction unit is one of a Wiener-filter based noise reduction unit, a spectral subtraction noise reduction unit, an auditory transform based noise reduction unit, and a model based noise reduction unit, wherein said noise reduction unit performs noise reduction in a plurality of frequency sub-bands employed by an analysis filter bank of said adaptive beamforming unit for sub-band adaptive beamforming, wherein said sound source localization unit calculates said delay (τ) based on said number of samples within a time period and a time delay for said target sound signal to travel said distance between each of said sound sensors in said microphone array and said origin of said array of said sound sensors, and wherein said distance between said each of said sound sensors in the microphone array and said origin of said array of said sound sensors is either a same distance or a different distance.
0. 37. The system of claim 30, further comprising one or more audio codecs that convert said sound signals in an analog form of said sound signals into digital sound signals and reconverts said digital sound signals into said analog form of said sound signals.
0. 39. The system of claim 38, wherein said microphone array system is implemented in one of devices with speech acquisition capability, hands-free devices, handheld devices, conference phones and video conferencing applications, wherein said handheld devices comprise smart phones, tablet computers and laptop computers, and wherein said array of said sound sensors is one of a linear array of said sound sensors, a circular array of said sound sensors, and other types of array of said sound sensors.
0. 40. The method of claim 22, wherein said microphone array system is implemented in one of devices with speech acquisition capability, hands-free devices, handheld devices, conference phones and video conferencing applications, wherein said handheld devices comprise smart phones, tablet computers and laptop computers.
0. 41. The system of claim 30, wherein said microphone array system is implemented in one of devices with speech acquisition capability, hands-free devices, handheld devices, conference phones and video conferencing applications, wherein said handheld devices comprise smart phones, tablet computers and laptop computers.

This application
where $w^T=[w_0^T, w_1^T, w_2^T, w_3^T, \ldots, w_{N-1}^T]$ and
$g(\omega,\theta)=\{g_i(\omega,\theta)\}_{i=1 \ldots NL}=\{e^{-j\omega(k+\tau_n(\theta))}\}_{i=1 \ldots NL}$ is the steering vector, $i=1 \ldots NL$, $k=\mathrm{mod}(i-1,L)$, and $n=\mathrm{floor}((i-1)/L)$.
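The index mapping in the steering vector can be illustrated with a minimal Python sketch. It assumes that ω is a normalized angular frequency in radians per sample, that the per-sensor delays τn(θ) are supplied in samples, and that the response is H(ω,θ)=wTg(ω,θ), which is inferred from the expression for |H(ω,θ)|2 used later in the description. The function names `steering_vector` and `response` are hypothetical.

```python
import numpy as np

def steering_vector(omega, tau, L):
    """Build g(omega, theta) of length N*L.

    tau[n] is the delay tau_n(theta) of sensor n in samples (assumed input);
    L is the number of filter taps per sensor.
    """
    tau = np.asarray(tau, dtype=float)
    N = tau.size
    i = np.arange(1, N * L + 1)      # 1-based element index, i = 1 ... NL
    k = np.mod(i - 1, L)             # tap index within a sensor's filter
    n = (i - 1) // L                 # sensor index, floor((i-1)/L)
    return np.exp(-1j * omega * (k + tau[n]))

def response(w, omega, tau, L):
    """Filter-and-sum response H = w^T g (assumed form, see lead-in)."""
    return np.dot(np.asarray(w), steering_vector(omega, tau, L))
```

With two sensors and two taps per sensor, the elements are ordered as (sensor 0, tap 0), (sensor 0, tap 1), (sensor 1, tap 0), (sensor 1, tap 1).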

FIGS. 7A-7C exemplarily illustrate an embodiment of a microphone array 201 when the target sound source is in a three dimensional plane. In an embodiment where the target sound source that emits the target sound signal is in a three dimensional plane, the delay (τ) between each of the sound sensors 301 and the origin of the microphone array 201 is determined as a function of distance (d) between each of the sound sensors 301 and the origin, a predefined angle (Φ) between each of the sound sensors 301 and a first reference axis (Y), an elevation angle (Ψ) between a second reference axis (Z) and the target sound signal, and an azimuth angle (θ) between the first reference axis (Y) and the target sound signal. The determined delay (τ) is represented in terms of number of samples. The determination of the delay enables beamforming for arbitrary numbers of the sound sensors 301 and multiple arbitrary configurations of the microphone array 201.

Consider an example of a microphone array configuration with four sound sensors 301 M0, M1, M2, and M3. FIG. 7A exemplarily illustrates a graphical representation of a microphone array 201, when the target sound source is in a three dimensional plane. As exemplarily illustrated in FIG. 7A, the target sound signal from the target sound source is received from the direction (Ψ, θ) with reference to the origin of the microphone array 201, where Ψ is the elevation angle and θ is the azimuth.

FIG. 7B exemplarily illustrates a table showing the delay between each sound sensor 301 in a circular microphone array configuration and the origin of the microphone array 201, when the target sound source is in a three dimensional plane. The target sound source in a three dimensional plane emits a target sound signal from a spatial location (Ψ, θ). The delays between the origin O and the sound sensors 301 M0, M1, M2, and M3, when the incoming target sound signal from the target sound source is at an angle (Ψ, θ) from the Z-axis and the Y-axis respectively, are denoted as τ0, τ1, τ2, and τ3 respectively. When the spatial location of the target sound signal moves from the location Ψ=90° to a location Ψ=0°, sin(Ψ) changes from 1 to 0, and as a result, the difference in delay between the sound sensors 301 in the microphone array 201 becomes progressively smaller. When Ψ=0°, there is no difference in delay between the sound sensors 301, which implies that the target sound signal reaches each sound sensor 301 at the same time. Because the sample delay between the sound sensors 301 can only be an integer, the range of elevation angles over which the sample delays of all the sound sensors 301 are identical can be determined.
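The dependence of the delay on sin(Ψ) can be illustrated with a minimal Python sketch. It assumes a far-field plane wave, sensors lying in the X-Y plane, a speed of sound of 343 m/s, and one particular sign convention; the table of FIG. 7B is not reproduced here, so the expression below is a standard geometric form and not necessarily the exact entries of that table. The function name `delay_samples` and the example array dimensions are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def delay_samples(d, phi, theta, psi, fs, c=SPEED_OF_SOUND):
    """Delay, in (fractional) samples, between a sensor and the array origin.

    d     : distance between the sensor and the origin, in metres
    phi   : predefined angle of the sensor from the Y axis, in radians
    theta : azimuth of the target from the Y axis, in radians
    psi   : elevation of the target from the Z axis, in radians
    fs    : sampling frequency in Hz

    Assumed far-field form: tau = fs * d * sin(psi) * cos(theta - phi) / c.
    """
    return fs * d * math.sin(psi) * math.cos(theta - phi) / c

# Hypothetical circular array: four sensors, 2 cm radius, 16 kHz sampling.
sensor_angles = [math.radians(a) for a in (0.0, 90.0, 180.0, 270.0)]
for psi_deg in (90.0, 45.0, 0.0):
    taus = [delay_samples(0.02, phi, 0.0, math.radians(psi_deg), 16000.0)
            for phi in sensor_angles]
    print(psi_deg, [round(t, 3) for t in taus])
```

Under these assumptions the spread of the delays shrinks with sin(Ψ) and vanishes at Ψ=0°.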

FIG. 7C exemplarily illustrates a three dimensional working space of the microphone array 201, where the target sound signal is incident at an elevation angle Ψ<Ω, where Ω is a specific value of the elevation angle. When the target sound signal is incident at an elevation angle Ψ<Ω, all four sound sensors 301 M0, M1, M2, and M3 receive the same target sound signal for 0°<θ<360°. The delay τ is a function of both the elevation angle Ψ and the azimuth angle θ. That is, τ=τ(θ, Ψ). As used herein, Ω refers to the elevation angle such that all τi(θ, Ω) are equal to each other, where i=0, 1, 2, 3, etc. The value of Ω is determined by the sample delay between each of the sound sensors 301 and the origin of the microphone array 201. The adaptive beamforming unit 203 enhances sound from this range and suppresses sound signals from other directions, for example, S1 and S2, treating them as ambient noise signals.
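One way to estimate Ω numerically is to scan the elevation angle upward from 0° and stop as soon as the integer sample delays of the sensors differ for some azimuth. The Python sketch below assumes the far-field delay form fs·d·sin(Ψ)·cos(θ−Φ)/c, rounding to the nearest integer sample, a speed of sound of 343 m/s, and a 1° azimuth grid; none of these choices is stated in the description, and the function name `identical_delay_limit` is hypothetical.

```python
import math

def identical_delay_limit(sensors, fs, c=343.0, step_deg=0.5):
    """Largest scanned elevation (degrees) at which every sensor has the
    same rounded sample delay for all azimuths on a 1-degree grid.

    sensors : list of (d, phi) pairs, distance in metres and angle in radians
    fs      : sampling frequency in Hz
    """
    limit = 0.0
    psi_deg = 0.0
    while psi_deg <= 90.0:
        psi = math.radians(psi_deg)
        for az in range(360):
            theta = math.radians(az)
            delays = {round(fs * d * math.sin(psi) * math.cos(theta - phi) / c)
                      for d, phi in sensors}
            if len(delays) > 1:      # sensors no longer share one integer delay
                return limit
        limit = psi_deg
        psi_deg += step_deg
    return limit

# Hypothetical circular array: four sensors, 2 cm radius, 16 kHz sampling.
array = [(0.02, math.radians(a)) for a in (0.0, 90.0, 180.0, 270.0)]
omega_limit = identical_delay_limit(array, 16000.0)
```

For this hypothetical array the scan agrees with the closed-form bound asin(c/(2·fs·d)), about 32.4°, to within the 0.5° step. A larger radius or a higher sampling frequency narrows the range.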

Consider a least mean square solution for beamforming according to the method disclosed herein. Let the spatial directivity pattern be 1 in the passband and 0 in the stopband. The least square cost function is defined as:

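$$
\begin{aligned}
J(w) &= \int_{\Omega_p}\int_{\Theta_p} \left|H(\omega,\theta)-1\right|^{2}\,d\omega\,d\theta + \alpha\int_{\Omega_s}\int_{\Theta_s} \left|H(\omega,\theta)\right|^{2}\,d\omega\,d\theta \\
&= \int_{\Omega_p}\int_{\Theta_p} \left|H(\omega,\theta)\right|^{2}\,d\omega\,d\theta + \alpha\int_{\Omega_s}\int_{\Theta_s} \left|H(\omega,\theta)\right|^{2}\,d\omega\,d\theta - 2\int_{\Omega_p}\int_{\Theta_p} \mathrm{Re}\big(H(\omega,\theta)\big)\,d\omega\,d\theta + \int_{\Omega_p}\int_{\Theta_p} 1\,d\omega\,d\theta
\end{aligned}
\tag{3}
$$

Replacing

$$
\left|H(\omega,\theta)\right|^{2} = w^{T} g(\omega,\theta)\, g^{H}(\omega,\theta)\, w = w^{T}\big(G_R(\omega,\theta) + jG_I(\omega,\theta)\big) w = w^{T} G_R(\omega,\theta)\, w
\quad\text{and}\quad
\mathrm{Re}\big(H(\omega,\theta)\big) = w^{T} g_R(\omega,\theta),
$$

J(w) becomes

$$
J(w) = w^{T} Q w - 2 w^{T} a + d,
$$

where

$$
\begin{aligned}
Q &= \int_{\Omega_p}\int_{\Theta_p} G_R(\omega,\theta)\,d\omega\,d\theta + \alpha\int_{\Omega_s}\int_{\Theta_s} G_R(\omega,\theta)\,d\omega\,d\theta \\
a &= \int_{\Omega_p}\int_{\Theta_p} g_R(\omega,\theta)\,d\omega\,d\theta \\
d &= \int_{\Omega_p}\int_{\Theta_p} 1\,d\omega\,d\theta
\end{aligned}
\tag{4}
$$

where $g_R(\omega,\theta)=\cos[\omega(k+\tau_n)]$ and $G_R(\omega,\theta)=\cos[\omega(k-l+\tau_n-\tau_m)]$.
When ∂J/∂w=0, the cost function J is minimized. The least-square estimate of w is obtained by:

$$
w = Q^{-1} a \tag{5}
$$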

Applying linear constraints Cw=b, the spatial response is further constrained to a predefined value b at angle θf using the following equation:

$$
\begin{bmatrix} g_R^{T}(\omega_{start},\theta_f) \\ g_R^{T}(\omega_{end},\theta_f) \end{bmatrix} w = \begin{bmatrix} b_{start} \\ b_{end} \end{bmatrix} \tag{6}
$$

Now, the design problem becomes:

$$
\min_{w}\; w^{T} Q w - 2 w^{T} a + d \quad \text{subject to} \quad Cw = b \tag{7}
$$

and the solution of the constrained minimization problem is equal to:

$$
w = Q^{-1} C^{T}\left(C Q^{-1} C^{T}\right)^{-1}\left(b - C Q^{-1} a\right) + Q^{-1} a \tag{8}
$$
where w is the filter parameter for the designed adaptive beamforming unit 203.
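
The closed-form solution of equation (8) can be obtained from the design problem of equation (7) by the method of Lagrange multipliers. The steps are sketched below under the assumptions that Q is symmetric and invertible and that CQ⁻¹Cᵀ is invertible. The multiplier vector λ is introduced here for illustration only and does not appear in the disclosure.

$$
\begin{aligned}
L(w,\lambda) &= w^{T} Q w - 2 w^{T} a + d + 2\lambda^{T}(Cw - b) \\
\frac{\partial L}{\partial w} &= 2Qw - 2a + 2C^{T}\lambda = 0 \;\Rightarrow\; w = Q^{-1}\left(a - C^{T}\lambda\right) \\
Cw = b &\;\Rightarrow\; CQ^{-1}a - CQ^{-1}C^{T}\lambda = b \;\Rightarrow\; \lambda = \left(CQ^{-1}C^{T}\right)^{-1}\left(CQ^{-1}a - b\right) \\
w &= Q^{-1}a + Q^{-1}C^{T}\left(CQ^{-1}C^{T}\right)^{-1}\left(b - CQ^{-1}a\right)
\end{aligned}
$$

The last line agrees with equation (8). When no constraint is imposed, the second term vanishes and the result reduces to the unconstrained estimate of equation (5).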

In an embodiment, the beamforming is performed by a delay-sum method. In another embodiment, the beamforming is performed by a filter-sum method.
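
The delay-sum embodiment mentioned above can be illustrated with a short Python sketch that evaluates the directivity pattern at a single frequency. The sketch is illustrative only and assumes a far-field plane wave, a speed of sound of 343 m/s, ideal non-integer delay compensation, and sensors on a circle. The function names, the 5 cm radius, the 2 kHz test frequency, and the 60° steering direction are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def sensor_delay_s(radius_m, sensor_angle_deg, azimuth_deg):
    """Far-field delay in seconds of a sensor on a circle relative to the origin."""
    return (-radius_m
            * math.cos(math.radians(azimuth_deg - sensor_angle_deg))
            / SPEED_OF_SOUND)


def delay_sum_response(freq_hz, azimuth_deg, steer_deg, radius_m, sensor_angles_deg):
    """Magnitude response of a delay-sum beamformer steered to steer_deg for a
    plane wave of frequency freq_hz arriving from azimuth_deg."""
    omega = 2.0 * math.pi * freq_hz
    total = 0j
    for angle in sensor_angles_deg:
        arrival = sensor_delay_s(radius_m, angle, azimuth_deg)
        compensation = sensor_delay_s(radius_m, angle, steer_deg)
        # After compensation, only the residual delay contributes a phase term.
        total += cmath.exp(-1j * omega * (arrival - compensation))
    return abs(total) / len(sensor_angles_deg)


# Directivity pattern in 10 degree steps for a beam steered to 60 degrees.
pattern = [delay_sum_response(2000.0, az, 60.0, 0.05, [0, 90, 180, 270])
           for az in range(0, 360, 10)]
```

In this sketch the response equals one in the steering direction, where the compensated signals add coherently, and is smaller in the other directions. A filter-sum method replaces each pure delay with a finite impulse response filter, which allows the pattern to be shaped across frequency as in equations (3) through (8).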

FIG. 8 exemplarily illustrates a method for estimating a spatial location of the target sound signal from the target sound source by the sound source localization unit 202 using a steered response power-phase transform (SRP-PHAT). The SRP-PHAT combines the advantages of sound source localization methods, for example, the time difference of arrival (TDOA) method and the steered response power (SRP) method. The TDOA method performs the time delay estimation of the sound signals relative to a pair of spatially separated sound sensors 301. The estimated time delay is a function of both the location of the target sound source and the position of each of the sound sensors 301 in the microphone array 201. Because the position of each of the sound sensors 301 in the microphone array 201 is predefined, once the time delay is estimated, the location of the target sound source can be determined. In the SRP method, a filter-and-sum beamforming algorithm is applied to the microphone array 201 for sound signals in the direction of each of the disparate sound sources. The location of the target sound source corresponds to the direction in which the output of the filter-and-sum beamforming has the largest response power. The TDOA based localization is suitable under low to moderate reverberation conditions. The SRP method requires shorter analysis intervals and exhibits an elevated insensitivity to environmental conditions while not allowing for use under excessive multi-path. The SRP-PHAT method disclosed herein combines the advantages of the TDOA method and the SRP method, has a decreased sensitivity to noise and reverberations compared to the TDOA method, and provides more precise location estimates than existing localization methods.

For direction i (0≤i≤360), the delay Dit is calculated 801 between the tth pair of the sound sensors 301 (t=1, . . . , all pairs). The correlation value corr(Dit) between the tth pair of the sound sensors 301 corresponding to the delay of Dit is then calculated 802. For the direction i (0≤i≤360), the correlation value is given 803 by:

$$
\mathrm{CORR}_i = \sum_{t=1}^{\text{ALL PAIR}} \mathrm{corr}(D_{it})
$$

Therefore, the spatial location of the target sound signal is given 804 by:

$$
S = \underset{0 \le i \le 360}{\arg\max}\; \mathrm{CORR}_i .
$$
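
The search over directions described in steps 801 through 804 can be sketched in Python as follows. The sketch is illustrative only. It substitutes a plain time-domain cross-correlation for the phase transform weighting, assumes a far-field source in the plane of the array, a speed of sound of 343 m/s, and integer sample delays computed per sensor relative to the array origin. The function names, the 0.2 m radius, the 48 kHz sampling frequency, and the synthetic white noise source are hypothetical.

```python
import math
import random
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def sensor_delay(position, direction_deg, fs_hz):
    """Integer sample delay of a sensor relative to the array origin for a
    far-field source in direction_deg (negative means earlier arrival)."""
    ux = math.cos(math.radians(direction_deg))
    uy = math.sin(math.radians(direction_deg))
    return round(-(position[0] * ux + position[1] * uy) / SPEED_OF_SOUND * fs_hz)


def corr(x, y, lag):
    """Cross-correlation sum of x[n] * y[n + lag] over the valid overlap."""
    start = max(0, -lag)
    stop = min(len(x), len(y) - lag)
    return sum(x[n] * y[n + lag] for n in range(start, stop))


def localize(signals, positions, fs_hz, step_deg=10):
    """Return the direction i that maximizes CORR_i, the sum of corr(D_it)
    over all sensor pairs t."""
    best_direction, best_score = None, float("-inf")
    for i in range(0, 360, step_deg):
        delays = [sensor_delay(p, i, fs_hz) for p in positions]
        score = sum(corr(signals[a], signals[b], delays[b] - delays[a])
                    for a, b in combinations(range(len(signals)), 2))
        if score > best_score:
            best_direction, best_score = i, score
    return best_direction


# Synthetic check: white noise arriving from 130 degrees at a hypothetical
# four-sensor circular array of radius 0.2 m sampled at 48 kHz.
rng = random.Random(0)
fs, pad, length = 48000, 40, 2000
positions = [(0.2 * math.cos(math.radians(a)), 0.2 * math.sin(math.radians(a)))
             for a in (0, 90, 180, 270)]
source = [rng.gauss(0.0, 1.0) for _ in range(length + 2 * pad)]
signals = []
for p in positions:
    d = sensor_delay(p, 130, fs)
    signals.append([source[n + pad - d] for n in range(length)])
estimate = localize(signals, positions, fs)
print("estimated direction:", estimate)
```

A phase transform weighting would normalize the cross-spectrum of each sensor pair to unit magnitude before the correlation is evaluated, which sharpens the correlation peak under reverberation. That step is omitted here to keep the sketch free of frequency-domain processing.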

FIGS. 9A-9B exemplarily illustrate graphs showing the results of sound source localization performed using the steered response power-phase transform (SRP-PHAT). FIG. 9A exemplarily illustrates a graph showing the value of the SRP-PHAT for every 10°. The maximum value corresponds to the location of the target sound signal from the target sound source. FIG. 9B exemplarily illustrates a graph representing the estimated target sound signal from the target sound source and a ground truth.

FIG. 10 exemplarily illustrates a system for performing adaptive beamforming by the adaptive beamforming unit 203. The algorithm for fixed beamforming is disclosed with reference to equations (3) through (8) in the detailed description of FIG. 4, FIGS. 6A-6B, and FIGS. 7A-7C, which is extended herein to adaptive beamforming. Adaptive beamforming refers to a beamforming process where the directivity pattern of the microphone array 201 is adaptively steered in the direction of a target sound signal emitted by a target sound source in motion. Adaptive beamforming achieves better ambient noise suppression than fixed beamforming. This is because the target direction of arrival, which is assumed to be stable in fixed beamforming, changes with the movement of the target sound source. Moreover, the gains of the sound sensors 301, which are assumed to be uniform in fixed beamforming, exhibit significant variation. All these factors reduce speech quality. On the other hand, adaptive beamforming adaptively performs beam steering and null steering; therefore, the adaptive beamforming method is more robust against steering errors caused by the array imperfections mentioned above.

As exemplarily illustrated in FIG. 10, the adaptive beamforming unit 203 disclosed herein comprises a fixed beamformer 204, a blocking matrix 205, an adaptation control unit 208, and an adaptive filter 206. The fixed beamformer 204 adaptively steers the directivity pattern of the microphone array 201 in the direction of the spatial location of the target sound signal from the target sound source for enhancing the target sound signal, when the target sound source is in motion. The sound sensors 301 in the microphone array 201 receive the sound signals S1, . . . , S4, which comprise both the target sound signal from the target sound source and the ambient noise signals. The received sound signals are fed as input to the fixed beamformer 204 and the blocking matrix 205. The fixed beamformer 204 outputs a signal “b”. In an embodiment, the fixed beamformer 204 performs fixed beamforming by filtering and summing output sound signals from the sound sensors 301. The blocking matrix 205 outputs a signal “z” which primarily comprises the ambient noise signals. The blocking matrix 205 blocks the target sound signal from the target sound source and feeds the ambient noise signals to the adaptive filter 206 to minimize the effect of the ambient noise signals on the enhanced target sound signal.

The output “z” of the blocking matrix 205 may contain some weak target sound signals due to signal leakage. If the adaptation is active when the target sound signal, for example, speech, is present, the speech is cancelled out with the noise. Therefore, the adaptation control unit 208 determines when the adaptation should be applied. The adaptation control unit 208 comprises a target sound signal detector 208a and a step size adjusting module 208b. The target sound signal detector 208a of the adaptation control unit 208 detects the presence or absence of the target sound signal, for example, speech. The step size adjusting module 208b adjusts the step size for the adaptation process such that when the target sound signal is present, the adaptation is slow for preserving the target sound signal, and when the target sound signal is absent, the adaptation is quick for better cancellation of the ambient noise signals.

The adaptive filter 206 is a filter that adaptively updates filter coefficients of the adaptive filter 206 so that the adaptive filter 206 can be operated in an unknown and changing environment. The adaptive filter 206 adaptively filters the ambient noise signals in response to detecting presence or absence of the target sound signal in the sound signals received from the disparate sound sources. The adaptive filter 206 adapts its filter coefficients with the changes in the ambient noise signals, thereby eliminating distortion in the target sound signal, when the target sound source and the ambient noise signals are in motion. In an embodiment, the adaptive filtering is performed by a set of sub-band adaptive filters using sub-band adaptive filtering as disclosed in the detailed description of FIG. 11.
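
The interaction between the step size adjustment and the adaptive filtering described above can be sketched in Python as follows. The sketch is illustrative only. It uses a normalized least mean square (NLMS) update as a stand-in, since the disclosure does not name a specific update rule, and it takes the target presence decision as a given input rather than implementing a detector. The function names, the filter length, and the step size values are hypothetical.

```python
import random


def choose_step_size(target_present, slow=0.01, fast=0.5):
    """Small step while the target is present, large step while it is absent."""
    return slow if target_present else fast


def adaptive_noise_canceller(b, z, target_present, taps=8, eps=1e-8):
    """Subtract from b the component that an NLMS filter predicts from z.

    b: fixed beamformer output, z: blocking matrix output (noise reference),
    target_present: one boolean per sample from a target signal detector.
    """
    weights = [0.0] * taps
    output = []
    for n in range(len(b)):
        # Most recent `taps` samples of the noise reference, zero padded.
        x = [z[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        noise_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = b[n] - noise_estimate  # enhanced output sample
        mu = choose_step_size(target_present[n])
        power = sum(xi * xi for xi in x) + eps
        weights = [w + mu * error * xi / power for w, xi in zip(weights, x)]
        output.append(error)
    return output


# Noise-only check: b carries a scaled copy of the noise reference z.
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b = [0.8 * v for v in z]
out = adaptive_noise_canceller(b, z, [False] * len(z))
residual = sum(v * v for v in out[-200:]) / sum(v * v for v in b[-200:])
```

In the noise-only case above, the fast step size lets the filter converge so that the residual noise power becomes negligible. Marking samples as target-present switches the update to the slow step size, which limits cancellation of the target sound signal that leaks into z.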

FIG. 11 exemplarily illustrates a system for sub-band adaptive filtering. Sub-band adaptive filtering involves separating a full-band signal into different frequency ranges called sub-bands prior to the filtering process. Sub-band adaptive filtering using sub-band adaptive filters leads to a higher convergence speed compared to using a full-band adaptive filter. Moreover, the noise reduction unit 207 disclosed herein is developed in a sub-band framework, whereby applying sub-band adaptive filtering provides the same sub-band framework for both beamforming and noise reduction, and thus saves on computational cost.

As exemplarily illustrated in FIG. 11, the adaptive filter 206 comprises an analysis filter bank 206a, an adaptive filter matrix 206b, and a synthesis filter bank 206c. The analysis filter bank 206a splits the enhanced target sound signal (b) from the fixed beamformer 204 and the ambient noise signals (z) from the blocking matrix 205 exemplarily illustrated in FIG. 10 into multiple frequency sub-bands. The analysis filter bank 206a performs an analysis step where the outputs of the fixed beamformer 204 and the blocking matrix 205 are split into frequency sub bands. The sub-band adaptive filter 206 typically has a shorter impulse response than its full band counterpart. The step size of the sub-bands can be adjusted individually for each sub-band by the step-size adjusting module 208b, which leads to a higher convergence speed compared to using a full band adaptive filter.

The adaptive filter matrix 206b adaptively filters the ambient noise signals in each of the frequency sub-bands in response to detecting the presence or absence of the target sound signal in the sound signals received from the disparate sound sources. The adaptive filter matrix 206b performs an adaptation step, where the adaptive filter 206 is adapted such that the filter output only contains the target sound signal, for example, speech. The synthesis filter bank 206c synthesizes a full-band sound signal using the frequency sub-bands of the enhanced target sound signal. The synthesis filter bank 206c performs a synthesis step where the sub-band sound signal is synthesized into a full-band sound signal. Since the noise reduction and the beamforming are performed in the same sub-band framework, the noise reduction as disclosed in the detailed description of FIG. 13, by the noise reduction unit 207 is performed prior to the synthesis step, thereby reducing computation.

In an embodiment, the analysis filter bank 206a is implemented as a perfect-reconstruction filter bank, where the output of the synthesis filter bank 206c after the analysis and synthesis steps perfectly matches the input to the analysis filter bank 206a. That is, all the sub-band analysis filter banks 206a are factorized to operate on prototype filter coefficients, and a modulation matrix is used to take advantage of the fast Fourier transform (FFT). Both the analysis and synthesis steps require performing frequency shifts in each sub-band, which involves complex value computations with cosines and sines. The method disclosed herein employs the FFT to perform the frequency shifts required in each sub-band, thereby minimizing the number of multiply-accumulate operations. The implementation of the sub-band analysis filter bank 206a as a perfect-reconstruction filter bank ensures the quality of the target sound signal by ensuring that the sub-band analysis filter banks 206a do not distort the target sound signal itself.

FIG. 12 exemplarily illustrates a graphical representation showing the performance of a perfect-reconstruction filter bank. The solid line represents the input signal to the analysis filter bank 206a, and the circles represent the output of the synthesis filter bank 206c after analysis and synthesis. As exemplarily illustrated in FIG. 12, the output of the synthesis filter bank 206c perfectly matches the input, and is therefore referred to as the perfect-reconstruction filter bank.
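
The perfect-reconstruction property can be illustrated with a minimal Python sketch. The sketch is illustrative only and uses a two-band Haar filter bank as a stand-in, which is far simpler than the FFT-modulated prototype filter bank described above but exhibits the same property that synthesis exactly inverts analysis. The function names and the test signal are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def analysis(signal):
    """Split an even-length signal into low and high sub-bands (Haar),
    each at half the input sample rate."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [(a + b) * SCALE for a, b in pairs]
    high = [(a - b) * SCALE for a, b in pairs]
    return low, high


def synthesis(low, high):
    """Recombine the two sub-bands into a full-band signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * SCALE)
        out.append((l - h) * SCALE)
    return out


x = [math.sin(0.3 * n) for n in range(64)]
low, high = analysis(x)
y = synthesis(low, high)
max_error = max(abs(a - b) for a, b in zip(x, y))
```

In this sketch the reconstruction error is limited to floating point rounding, which corresponds to the circles coinciding with the solid line in FIG. 12. Any per-band processing, such as adaptive filtering or noise reduction, would be applied to the sub-band signals between the analysis and synthesis steps.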

FIG. 13 exemplarily illustrates a block diagram of a noise reduction unit 207 for performing noise reduction using, for example, a Wiener-filter based noise reduction algorithm. The noise reduction unit 207 performs noise reduction for further suppressing the ambient noise signals after adaptive beamforming, for example, by using a Wiener-filter based noise reduction algorithm, a spectral subtraction noise reduction algorithm, an auditory transform based noise reduction algorithm, or a model based noise reduction algorithm. In an embodiment, the noise reduction unit 207 performs noise reduction in multiple frequency sub-bands employed by an analysis filter bank 206a of the adaptive beamforming unit 203 for sub-band adaptive beamforming.

In an embodiment, the noise reduction is performed using the Wiener-filter based noise reduction algorithm. The noise reduction unit 207 explores the short-term and long-term statistics of the target sound signal, for example, speech, and the ambient noise signals, and the wide-band and narrowband signal-to-noise ratio (SNR) to support a Wiener gain filtering. The noise reduction unit 207 comprises a target sound signal statistics analyzer 207a, a noise statistics analyzer 207b, a signal-to-noise ratio (SNR) analyzer 207c, and a Wiener filter 207d. The target sound signal statistics analyzer 207a explores the short-term and long-term statistics of the target sound signal, for example, speech. Similarly, the noise statistics analyzer 207b explores the short-term and long-term statistics of the ambient noise signals. The SNR analyzer 207c of the noise reduction unit 207 explores the wide-band and narrow-band signal-to-noise ratio (SNR). After the spectrum of noisy-speech passes through the Wiener filter 207d, an estimation of the clean-speech spectrum is generated. The synthesis filter bank 206c, by an inverse process of the analysis filter bank 206a, reconstructs the signals of the clean speech into a full-band signal, given the estimated spectrum of the clean speech.
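
The Wiener gain filtering described above can be sketched per sub-band in Python as follows. The sketch is illustrative only. It uses the textbook Wiener gain SNR/(1+SNR), estimates the speech power by subtracting the noise power from the noisy-speech power, and applies a gain floor. The disclosure does not specify these details, and the function names, the gain floor value, and the example powers are hypothetical.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Textbook Wiener gain SNR / (1 + SNR) for one sub-band.

    The speech power is estimated by power subtraction, which is an
    assumption of this sketch rather than the disclosed estimator.
    """
    if noise_power <= 0.0:
        return 1.0  # no noise estimated, pass the sub-band unchanged
    speech_power = max(noisy_power - noise_power, 0.0)
    snr = speech_power / noise_power
    return max(snr / (1.0 + snr), gain_floor)


def wiener_gains(noisy_powers, noise_powers):
    """One gain per sub-band, to be multiplied with the noisy sub-band signal."""
    return [wiener_gain(p, n) for p, n in zip(noisy_powers, noise_powers)]


# Hypothetical sub-band powers: strong speech, equal speech and noise, noise only.
gains = wiener_gains([10.0, 2.0, 1.0], [1.0, 1.0, 1.0])
```

Sub-bands dominated by speech receive a gain close to one, while sub-bands dominated by noise are attenuated down to the gain floor. The floor limits the audible artifacts that arise when a sub-band is suppressed completely.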

FIG. 14 exemplarily illustrates a hardware implementation of the microphone array system 200 disclosed herein. The hardware implementation of the microphone array system 200 disclosed in the detailed description of FIG. 2 comprises the microphone array 201 having an arbitrary number of sound sensors 301 positioned in an arbitrary configuration, multiple microphone amplifiers 1401, one or more audio codecs 1402, a digital signal processor (DSP) 1403, a flash memory 1404, one or more power regulators 1405 and 1406, a battery 1407, a loudspeaker or a headphone 1408, and a communication interface 1409. The microphone array 201 comprises, for example, four or eight sound sensors 301 arranged in a linear or a circular microphone array configuration. The microphone array 201 receives the sound signals.

Consider an example where the microphone array 201 comprises four sound sensors 301 that pick up the sound signals. Four microphone amplifiers 1401 receive the output sound signals from the four sound sensors 301. The microphone amplifiers 1401, also referred to as preamplifiers, provide a gain to boost the power of the received sound signals for enhancing the sensitivity of the sound sensors 301. In an example, the gain of the preamplifiers is 20 dB.

The audio codec 1402 receives the amplified output from the microphone amplifiers 1401. The audio codec 1402 provides an adjustable gain level, for example, from about −74 dB to about 6 dB. The received sound signals are in an analog form. The audio codec 1402 converts the four channels of the sound signals in the analog form into digital sound signals. The pre-amplifiers may not be required for some applications. The audio codec 1402 then transmits the digital sound signals to the DSP 1403 for processing of the digital sound signals. The DSP 1403 implements the sound source localization unit 202, the adaptive beamforming unit 203, and the noise reduction unit 207.
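
The gain figures given above for the preamplifiers and the audio codec 1402 can be converted from decibels to linear factors, as sketched in Python below. The sketch is illustrative only and the function names are hypothetical. It uses the standard conventions that a gain of G dB corresponds to an amplitude factor of 10^(G/20) and a power factor of 10^(G/10).

```python
def db_to_amplitude(gain_db):
    """Linear amplitude (voltage) factor for a gain in decibels."""
    return 10.0 ** (gain_db / 20.0)


def db_to_power(gain_db):
    """Linear power factor for a gain in decibels."""
    return 10.0 ** (gain_db / 10.0)


preamp_amplitude = db_to_amplitude(20.0)   # 20 dB preamplifier gain
preamp_power = db_to_power(20.0)
codec_min = db_to_amplitude(-74.0)         # lower end of the codec gain range
codec_max = db_to_amplitude(6.0)           # upper end of the codec gain range
# Gains in decibels add along the signal chain, so the factors multiply.
chain_max = db_to_amplitude(20.0 + 6.0)
```

Under these conventions, the 20 dB preamplifier gain raises the signal amplitude by a factor of 10 and the signal power by a factor of 100, and the codec gain range spans amplitude factors from about 0.0002 to about 2.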

After the processing, the DSP 1403 either stores the processed signal from the DSP 1403 in a memory device for a recording application, or transmits the processed signal to the communication interface 1409. The recording application comprises, for example, storing the processed signal onto the memory device for the purposes of playing back the processed signal at a later time. The communication interface 1409 transmits the processed signal, for example, to a computer, the internet, or a radio for communicating the processed signal. In an embodiment, the microphone array system 200 disclosed herein implements a two-way communication device where the signal received from the communication interface 1409 is processed by the DSP 1403 and the processed signal is then played through the loudspeaker or the headphone 1408.

The flash memory 1404 stores the code for the DSP 1403 and compressed audio signals. When the microphone array system 200 boots up, the DSP 1403 reads the code from the flash memory 1404 into an internal memory of the DSP 1403 and then starts executing the code. In an embodiment, the audio codec 1402 can be configured for encoding and decoding audio or sound signals during the start up stage by writing to registers of the DSP 1403. For an eight-sensor microphone array 201, two four-channel audio codec 1402 chips may be used. The power regulators 1405 and 1406, for example, linear power regulators 1405 and switch power regulators 1406 provide appropriate voltage and current supply for all the components, for example, 201, 1401, 1402, 1403, etc., mechanically supported and electrically connected on a circuit board. A universal serial bus (USB) control is built into the DSP 1403. The battery 1407 is used for powering the microphone array system 200.

Consider an example where the microphone array system 200 disclosed herein is implemented on a mixed signal circuit board having a six-layer printed circuit board (PCB). Noisy digital signals easily contaminate the low voltage analog sound signals from the sound sensors 301. Therefore, the layout of the mixed signal circuit board is carefully partitioned to isolate the analog circuits from the digital circuits. Although both the inputs and outputs of the microphone amplifiers 1401 are in analog form, the microphone amplifiers 1401 are placed in a digital region of the mixed signal circuit board because of their high power consumption and switch amplifier nature.

The linear power regulators 1405 are deployed in an analog region of the mixed signal circuit board due to the low noise property exhibited by the linear power regulators 1405. Five power regulators, for example, 1405 are designed in the microphone array system 200 circuits to ensure quality. The switch power regulators 1406 achieve an efficiency of about 95% of the input power and have high output current capacity; however their outputs are too noisy for analog circuits. The efficiency of the linear power regulators 1405 is determined by the ratio of the output voltage to the input voltage, which is lower than that of the switch power regulators 1406 in most cases. The regulator outputs utilized in the microphone array system 200 circuits are stable, quiet, and suitable for the low power analog circuits.
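
The efficiency comparison stated above can be illustrated with a short Python sketch. The sketch is illustrative only. It approximates the efficiency of a linear regulator by the ratio of output voltage to input voltage, neglecting quiescent current, and the 5 V input, the 3.3 V output, and the function name are hypothetical values that are not taken from the disclosure.

```python
def linear_regulator_efficiency(input_voltage, output_voltage):
    """Approximate efficiency of a linear regulator, neglecting quiescent current."""
    if not 0.0 < output_voltage <= input_voltage:
        raise ValueError("a linear regulator requires 0 < output <= input")
    return output_voltage / input_voltage


SWITCH_REGULATOR_EFFICIENCY = 0.95  # figure stated in the description above

# Hypothetical rail: 3.3 V derived from a 5 V supply.
linear_efficiency = linear_regulator_efficiency(5.0, 3.3)
```

For these hypothetical voltages the linear regulator reaches an efficiency of about 66%, compared with about 95% for the switch power regulators 1406, which reflects the trade-off between low output noise and efficiency described above.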

In an example, the microphone array system 200 is designed with a microphone array 201 having dimensions of 10 cm×2.5 cm×1.5 cm, a USB interface, and an assembled PCB supporting the microphone array 201 and a DSP 1403 having a low power consumption design devised for portable devices, a four-channel codec 1402, and a flash memory 1404. The DSP 1403 chip is powerful enough to handle the DSP 1403 computations in the microphone array system 200 disclosed herein. The hardware configuration of this example can be used for any microphone array configuration, with suitable modifications to the software. In an embodiment, the adaptive beamforming unit 203 of the microphone array system 200 is implemented as hardware with software instructions programmed on the DSP 1403. The DSP 1403 is programmed for beamforming, noise reduction, echo cancellation, and USB interfacing according to the method disclosed herein, and fine tuned for optimal performance.

FIGS. 15A-15C exemplarily illustrate a conference phone 1500 comprising an eight-sensor microphone array 201. The eight-sensor microphone array 201 comprises eight sound sensors 301 arranged in a configuration as exemplarily illustrated in FIG. 15A. A top view of the conference phone 1500 comprising the eight-sensor microphone array 201 is exemplarily illustrated in FIG. 15A. A front view of the conference phone 1500 comprising the eight-sensor microphone array 201 is exemplarily illustrated in FIG. 15B. A handset 1502 that can be placed in a base holder 1501 of the conference phone 1500 having the eight-sensor microphone array 201 is exemplarily illustrated in FIG. 15C. In addition to a conference phone 1500, the microphone array system 200 disclosed herein with broadband beamforming can be configured for a mobile phone, a tablet computer, etc., for speech enhancement and noise reduction.

FIG. 16A exemplarily illustrates a layout of an eight-sensor microphone array 201 for a conference phone 1500. Consider an example of a circular microphone array 201 in which eight sound sensors 301 are mounted on the surface of the conference phone 1500 as exemplarily illustrated in FIG. 15A. The conference phone 1500 has a removable handset 1502 on top, and hence the microphone array system 200 is configured to accommodate the handset 1502 as exemplarily illustrated in FIGS. 15A-15C. In an example, the circular microphone array 201 has a diameter of about four inches. Eight sound sensors 301, for example, microphones, M0, M1, M2, M3, M4, M5, M6, and M7 are distributed along a circle 302 on the conference phone 1500. Microphones M4-M7 are separated by 90 degrees from each other, and microphones M0-M3 are rotated counterclockwise by 60 degrees from microphones M4-M7 respectively.

FIG. 16B exemplarily illustrates a graphical representation of eight spatial regions to which the eight-sensor microphone array 201 of FIG. 16A responds. The space is divided into eight spatial regions with equal spaces centered at 15°, 60°, 105°, 150°, 195°, 240°, 285°, and 330° respectively. The adaptive beamforming unit 203 configures the eight-sensor microphone array 201 to automatically point to one of these eight spatial regions according to the location of the target sound signal from the target sound source as estimated by the sound source localization unit 202.
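
The selection of a spatial region from an estimated location can be sketched in Python as follows. The sketch is illustrative only. It assumes that the sound source localization unit 202 reports an azimuth in degrees and that the region whose center is nearest in circular distance is selected. The function names are hypothetical.

```python
REGION_CENTERS_DEG = (15, 60, 105, 150, 195, 240, 285, 330)


def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a_deg - b_deg) % 360
    return min(diff, 360 - diff)


def select_region(azimuth_deg, centers=REGION_CENTERS_DEG):
    """Return the center of the spatial region nearest to the estimated azimuth."""
    return min(centers, key=lambda c: angular_distance(azimuth_deg, c))


# An estimate of 350 degrees lies 20 degrees from 330 and 25 degrees from 15.
region = select_region(350)
```

The same function applies to other partitions of the space by passing a different set of centers, for example, four regions centered at −90°, 0°, 90°, and 180°.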

FIGS. 16C-16D exemplarily illustrate computer simulations showing the steering of the directivity patterns of the eight-sensor microphone array 201 of FIG. 16A, in the directions 15° and 60° respectively, in the frequency range 300 Hz to 5 kHz. FIG. 16C exemplarily illustrates the computer simulation result showing the directivity pattern of the microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 15°.

The computer simulation for verifying the performance of the adaptive beamforming unit 203 when the target sound signal is received from the target sound source in the spatial region centered at 15° uses the following parameters:

Sampling frequency fs=16 kHz,

FIR filter taper length L=20

Passband (Θp, Ωp)={300-5000 Hz, −5°-35°}, designed spatial directivity pattern is 1.

Stopband (Θs, Ωs)={300˜5000 Hz, −180°˜−15°+45°˜180°}, the designed spatial directivity pattern is 0.

It can be seen that the directivity pattern of the microphone array 201 in the spatial region centered at 15° is enhanced while the sound signals from all other spatial regions are suppressed.

FIG. 16D exemplarily illustrates the computer simulation result showing the directivity pattern of the microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 60°. The computer simulation for verifying the performance of the adaptive beamforming unit 203 when the target sound signal is received from the target sound source in the spatial region centered at 60° uses the following parameters:

Sampling frequency fs=16 kHz,

FIR filter taper length L=20

Passband (Θp, Ωp)={300-5000 Hz, 40°-80°}, designed spatial directivity pattern is 1.

Stopband (Θs, Ωs)={300˜5000 Hz, −180°˜30°+90°˜180°}, the designed spatial directivity pattern is 0.

It can be seen that the directivity pattern of the microphone array 201 in the spatial region centered at 60° is enhanced while the sound signals from all other spatial regions are suppressed. The other six spatial regions have similar parameters. Moreover, in all frequencies, the main lobe has the same level, which means the target sound signal has little distortion in frequency.

FIGS. 16E-16L exemplarily illustrate graphical representations showing the directivity patterns of the eight-sensor microphone array 201 of FIG. 16A in each of the eight spatial regions, where each directivity pattern is an average response from 300 Hz to 5000 Hz. The main lobe is about 10 dB higher than the side lobe, and therefore the ambient noise signals from other directions are highly suppressed compared to the target sound signal in the pass direction. The microphone array system 200 calculates the filter coefficients for the target sound signal, for example, speech signals from each sound sensor 301 and combines the filtered signals to enhance the speech from any specific direction. Since speech covers a large range of frequencies, the method and system 200 disclosed herein covers broadband signals from 300 Hz to 5000 Hz.

FIG. 16E exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 15°. FIG. 16F exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 60°. FIG. 16G exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 105°. FIG. 16H exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 150°. FIG. 16I exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 195°. FIG. 16J exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 240°. FIG. 16K exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 285°. FIG. 16L exemplarily illustrates a graphical representation showing the directivity pattern of the eight-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 330°. 
The microphone array system 200 disclosed herein enhances the target sound signal from each of the directions 15°, 60°, 105°, 150°, 195°, 240°, 285°, and 330°, while suppressing the ambient noise signals from the other directions.

The microphone array system 200 disclosed herein can be implemented for a square microphone array configuration and a rectangular array configuration where a sound sensor 301 is positioned in each corner of the four-cornered array. The microphone array system 200 disclosed herein implements beamforming for sound sources located in a plane as well as for sound sources located in three dimensional space.

FIG. 17A exemplarily illustrates a graphical representation of four spatial regions to which a four-sensor microphone array 201 for a wireless handheld device responds. The wireless handheld device is, for example, a mobile phone. Consider an example where the microphone array 201 comprises four sound sensors 301, for example, microphones, uniformly distributed around a circle 302 having diameter equal to about two inches. This configuration is identical to positioning four sound sensors 301 or microphones on four corners of a square. The space is divided into four spatial regions with equal space centered at −90°, 0°, 90°, and 180° respectively. The adaptive beamforming unit 203 configures the four-sensor microphone array 201 to automatically point to one of these spatial regions according to the location of the target sound signal from the target sound source as estimated by the sound source localization unit 202.

FIGS. 17B-17I exemplarily illustrate computer simulations showing the directivity patterns of the four-sensor microphone array 201 of FIG. 17A with respect to azimuth and frequency. The results of the computer simulations performed for verifying the performance of the adaptive beamforming unit 203 of the microphone array system 200 disclosed herein for a sampling frequency fs=16 kHz and FIR filter taper length L=20, are as follows:

For the spatial region centered at 0°:

Passband (Θp, Ωp)={300-4000 Hz, −20°-20°}, designed spatial directivity pattern is 1.

Stopband (Θs, Ωs)={300˜4000 Hz, −180°˜−30°+30°˜180°}, the designed spatial directivity pattern is 0.

For the spatial region centered at 90°:

Passband (Θp, Ωp)={300-4000 Hz, 70°-110°}, designed spatial directivity pattern is 1.

Stopband (Θs, Ωs)={300˜4000 Hz, −180°˜60°+120°˜180°}, the designed spatial directivity pattern is 0. The directivity patterns for the spatial regions centered at −90° and 180° are similarly obtained.

FIG. 17B exemplarily illustrates the computer simulation result representing a three dimensional (3D) display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at −90°. FIG. 17C exemplarily illustrates the computer simulation result representing a 2D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at −90°.

FIG. 17D exemplarily illustrates the computer simulation result representing a 3D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 0°. FIG. 17E exemplarily illustrates the computer simulation result representing a 2D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 0°.

FIG. 17F exemplarily illustrates the computer simulation result representing a 3D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 90°. FIG. 17G exemplarily illustrates the computer simulation result representing a 2D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 90°.

FIG. 17H exemplarily illustrates the computer simulation result representing a 3D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 180°. FIG. 17I exemplarily illustrates the computer simulation result representing a 2D display of the directivity pattern of the four-sensor microphone array 201 when the target sound signal is received from the target sound source in the spatial region centered at 180°. The 3D displays of the directivity patterns in FIG. 17B, FIG. 17D, FIG. 17F, and FIG. 17H demonstrate that the passbands have the same height. The 2D displays of the directivity patterns in FIG. 17C, FIG. 17E, FIG. 17G, and FIG. 17I demonstrate that the passbands have the same width along the frequency axis, which demonstrates the broadband properties of the microphone array 201.
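For readers who wish to reproduce a directivity display of this kind, the following sketch evaluates a response magnitude over frequency and arrival angle. It deliberately substitutes a plain delay-and-sum beamformer for the broadband design disclosed herein, so it reproduces only the unit passband height at the steered direction and not the constant passband width. The sensor placement (four sensors evenly spaced on a circle of 2 cm radius), the speed of sound, the sign convention of the delays, and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def circular_delays(radius_m, sensor_angles_deg, source_deg):
    """Far-field delay of each sensor relative to the circle center, in seconds.

    Assumed sign convention: a sensor facing the source hears it earlier,
    so its delay is negative.
    """
    return [-radius_m * math.cos(math.radians(source_deg - phi)) / SPEED_OF_SOUND
            for phi in sensor_angles_deg]


def das_response(freq_hz, source_deg, steer_deg, radius_m, sensor_angles_deg):
    """Magnitude response of a delay-and-sum beamformer steered to steer_deg."""
    tau_src = circular_delays(radius_m, sensor_angles_deg, source_deg)
    tau_steer = circular_delays(radius_m, sensor_angles_deg, steer_deg)
    omega = 2.0 * math.pi * freq_hz
    # Each channel is phase-aligned for the steering direction, then averaged.
    total = sum(cmath.exp(-1j * omega * (ts - t0))
                for ts, t0 in zip(tau_src, tau_steer))
    return abs(total) / len(sensor_angles_deg)


# Evaluate a (frequency, angle) grid for a beam steered to -90 degrees.
SENSORS = [0.0, 90.0, 180.0, 270.0]  # hypothetical placement
grid = [[das_response(f, a, -90.0, 0.02, SENSORS) for a in range(-180, 180, 5)]
        for f in (300.0, 1000.0, 2000.0, 4000.0)]
```

In this substitute beamformer the main lobe narrows as frequency rises, which is the behavior a broadband design is meant to avoid.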

FIGS. 18A-18B exemplarily illustrate a microphone array configuration for a tablet computer. In this example, four sound sensors 301 of the microphone array 201 are positioned on a frame 1801 of the tablet computer, for example, the iPad® of Apple Inc. Geometrically, the sound sensors 301 are distributed on the circle 302 as exemplarily illustrated in FIG. 18B. The radius of the circle 302 is equal to the width of the tablet computer. The angle θ between the sound sensors 301 M2 and M3 is determined to avoid spatial aliasing up to 4000 Hz. This microphone array configuration enhances a front speaker's voice and suppresses background ambient noise. The adaptive beamforming unit 203 configures the microphone array 201 to form an acoustic beam 1802 pointing frontwards using the method and system 200 disclosed herein. The target sound signal, that is, the front speaker's voice within the range of Φ<30°, is enhanced compared to the sound signals from other directions.
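One way the angle θ could be bounded is sketched below. The sketch assumes the common half-wavelength rule of thumb, under which the spacing between two adjacent sensors stays below half the wavelength at the highest frequency of interest, and it assumes θ is the angle subtended at the center of the circle 302. The disclosure does not state which criterion is applied, and the 0.19 m radius and the 343 m/s speed of sound are hypothetical values chosen for illustration.

```python
import math


def max_spacing_m(f_max_hz, c=343.0):
    """Half-wavelength spacing limit at the highest frequency of interest."""
    return c / (2.0 * f_max_hz)


def max_sensor_angle_deg(radius_m, f_max_hz, c=343.0):
    """Largest central angle between two sensors on a circle whose chord
    does not exceed the half-wavelength limit (assumed criterion)."""
    d_max = max_spacing_m(f_max_hz, c)
    # Chord length between the two sensors is 2 * r * sin(theta / 2).
    ratio = d_max / (2.0 * radius_m)
    if ratio >= 1.0:
        return 180.0  # even diametrically opposite sensors satisfy the limit
    return math.degrees(2.0 * math.asin(ratio))


theta = max_sensor_angle_deg(radius_m=0.19, f_max_hz=4000.0)
chord = 2.0 * 0.19 * math.sin(math.radians(theta) / 2.0)
```

Under these assumptions the limit on the chord between M2 and M3 is about 4.3 cm at 4000 Hz, so a larger circle forces a smaller angle.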

FIG. 18C exemplarily illustrates an acoustic beam 1802 formed using the microphone array configuration of FIGS. 18A-18B according to the method and system 200 disclosed herein.

FIGS. 18D-18G exemplarily illustrate graphs showing processing results of the adaptive beamforming unit 203 and the noise reduction unit 207 for the microphone array configuration of FIG. 18B, in both a time domain and a spectral domain for the tablet computer. Consider an example where a speaker is talking in front of the tablet computer with ambient noise signals on the side. FIG. 18D exemplarily illustrates a graph showing the performance of the microphone array 201 before performing beamforming and noise reduction with a signal-to-noise ratio (SNR) of 15 dB. FIG. 18E exemplarily illustrates a graph showing the performance of the microphone array 201 after performing beamforming and noise reduction, according to the method disclosed herein, with an SNR of 15 dB. FIG. 18F exemplarily illustrates a graph showing the performance of the microphone array 201 before performing beamforming and noise reduction with an SNR of 0 dB. FIG. 18G exemplarily illustrates a graph showing the performance of the microphone array 201 after performing beamforming and noise reduction, according to the method disclosed herein, with an SNR of 0 dB.

It can be seen from FIGS. 18D-18G that the performance graph is noisier for the microphone array 201 before the beamforming and noise reduction are performed. Therefore, the adaptive beamforming unit 203 and the noise reduction unit 207 of the microphone array system 200 disclosed herein suppress ambient noise signals while maintaining the clarity of the target sound signal, for example, the speech signal.
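Test inputs at the 15 dB and 0 dB conditions of FIGS. 18D-18G could be prepared as sketched below. The sketch assumes SNR is defined as the ratio of average signal power to average noise power in decibels. The disclosure does not describe how its test signals were produced, so the function names and the synthetic stand-in signals are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels (power ratio, assumed definition)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Return a scaled copy of noise so that snr_db(signal, result) equals the target."""
    wanted_noise_power = mean_power(signal) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    return [gain * v for v in noise]


# Synthetic stand-ins for a speech signal and an ambient noise signal.
speech = [math.sin(0.1 * n) for n in range(1000)]
ambient = [math.cos(0.37 * n) + 0.5 * math.sin(1.3 * n) for n in range(1000)]

noisy_15db = [s + v for s, v in zip(speech, scale_noise_to_snr(speech, ambient, 15.0))]
noisy_0db = [s + v for s, v in zip(speech, scale_noise_to_snr(speech, ambient, 0.0))]
```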

FIGS. 19A-19F exemplarily illustrate tables showing different microphone array configurations and the corresponding values of delay τn for the sound sensors 301 in each of the microphone array configurations. The broadband beamforming method disclosed herein can be used for microphone arrays 201 with arbitrary numbers of sound sensors 301 and arbitrary locations of the sound sensors 301. The sound sensors 301 can be mounted on surfaces or edges of any speech acquisition device. For any specific microphone array configuration, the only parameter that needs to be defined to obtain the beamformer coefficients is the value of τn for each sound sensor 301 as disclosed in the detailed description of FIG. 5, FIGS. 6A-6B, and FIGS. 7A-7C and as exemplarily illustrated in FIGS. 19A-19F. In an example, the microphone array configuration exemplarily illustrated in FIG. 19F is implemented on a handheld device for hands-free speech acquisition. In a hands-free and non-close talking scenario, a user prefers to talk at a distance rather than speaking close to the sound sensor 301 and may want to talk while watching a screen of the handheld device. The microphone array system 200 disclosed herein allows the handheld device to pick up sound signals from the direction of the speaker's mouth and suppress noise from other directions. The method and system 200 disclosed herein may be implemented on any device or equipment, for example, a voice recorder where a target sound signal or speech needs to be enhanced.
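A table of delay values of this kind could be generated for an arbitrary configuration as sketched below. The sketch assumes a far-field plane wave, describes each sound sensor by its distance from the reference point and its angle from the reference axis, and adopts the convention that a sensor facing the source has a negative delay. The exact expression and sign convention are those given in the detailed description of FIG. 5, FIGS. 6A-6B, and FIGS. 7A-7C, which this sketch does not reproduce; the function name, the example geometry, and the speed of sound are hypothetical.

```python
import math


def delays_for_configuration(sensors, source_deg, c=343.0):
    """Delay of each sensor relative to the array reference point, in seconds.

    sensors    -- list of (distance_m, angle_deg) pairs, one per sensor,
                  measured from the reference point and the reference axis
    source_deg -- arrival direction of the target sound signal
    c          -- speed of sound in m/s (assumed value)
    """
    return [-(d / c) * math.cos(math.radians(source_deg - phi))
            for d, phi in sensors]


# Hypothetical four-sensor configuration: one sensor at the reference point
# and three sensors at different distances and angles around it.
config = [(0.0, 0.0), (0.0343, 0.0), (0.05, 90.0), (0.03, 200.0)]
tau = delays_for_configuration(config, source_deg=0.0)
```

Because only the list of (distance, angle) pairs changes between configurations, the same routine serves a linear, circular, or irregular layout.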

The foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting of the present invention disclosed herein. While the invention has been described with reference to various embodiments, it is understood that the words, which have been used herein, are words of description and illustration, rather than words of limitation. Further, although the invention has been described herein with reference to particular means, materials and embodiments, the invention is not intended to be limited to the particulars disclosed herein; rather, the invention extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims. Those skilled in the art, having the benefit of the teachings of this specification, may effect numerous modifications thereto and changes may be made without departing from the scope and spirit of the invention in its aspects.

Li, Qi; Zhu, Manli

Executed on | Assignor | Assignee | Conveyance | Reel/Frame
Aug 02 2018 | | VOCALIFE LLC | (assignment on the face of the patent) |
Jan 31 2019 | ZHU, MANLI | VOCALIFE LLC | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 049770/0423
Jan 31 2019 | LI, QI | VOCALIFE LLC | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 049770/0423
Date Maintenance Fee Events
Aug 02 2018 | BIG.: Entity status set to Undiscounted (note the period is included in the code).
Aug 09 2018 | SMAL: Entity status set to Small.
May 23 2024 | M2552: Payment of Maintenance Fee, 8th Yr, Small Entity.
May 23 2024 | M2558: Surcharge, Petition to Accept Pymt After Exp, Unintentional.


Date Maintenance Schedule
Dec 29 2023 | 4 years fee payment window open
Jun 29 2024 | 6 months grace period start (w surcharge)
Dec 29 2024 | patent expiry (for year 4)
Dec 29 2026 | 2 years to revive unintentionally abandoned end (for year 4)
Dec 29 2027 | 8 years fee payment window open
Jun 29 2028 | 6 months grace period start (w surcharge)
Dec 29 2028 | patent expiry (for year 8)
Dec 29 2030 | 2 years to revive unintentionally abandoned end (for year 8)
Dec 29 2031 | 12 years fee payment window open
Jun 29 2032 | 6 months grace period start (w surcharge)
Dec 29 2032 | patent expiry (for year 12)
Dec 29 2034 | 2 years to revive unintentionally abandoned end (for year 12)