A system and method for sonically connecting special devices is provided. A plurality of devices is monitored. One or more sound profiles are maintained on each of the devices, wherein at least one of the sound profiles on each device is for a sound emitted by one other device in the plurality. A sound is detected on one of the devices and the detected sound is compared to one or more of the sound profiles stored on that device. A match is identified between the detected sound and one of the sound profiles. One or more response actions are performed based on the identified match.
1. A method for sonically connecting communication devices, comprising:
generating, for a training sound, a training profile comprising a frequency only profile and an STFT profile, wherein the frequency only profile comprises a set of energy well coefficients, trigger regions, and cutoff thresholds, and the STFT profile comprises different trigger regions and different cutoff functions for a plurality of windows for the sound;
storing the training profile in a memory of a device;
comparing an observed sound received on the device as an alert to the frequency only profile via a processor of the device;
assigning via the processor a vote when at least one point along an average power spectrum for the observed sound is greater than or equal to a threshold defined by one such energy well coefficient;
determining a total number of votes via the processor;
when the total number of votes is greater than or equal to a predetermined minimum number of votes, comparing via the processor an observation power spectrum for a plurality of windows of the observed sound to the STFT profile;
comparing each profile window to each observed sound window via the processor; and
determining via the processor that the observed sound matches the sound profile when a predetermined number of matches between the profile windows and the sound observation windows exists.
2. A method according to
training a specialized device by loading the sound; and
generating the sound profile, comprising:
generating a frequency only profile for a sound associated with one such maintained sound profile;
generating an STFT profile for the sound associated with the maintained sound profile; and
combining the frequency only profile and the stft profile as the maintained sound profile.
3. A method according to
preprocessing the training sound, comprising:
transforming the training sound into a time series representation;
stratifying the time series representation into a plurality of windows;
computing a frequency transform for each window; and
computing a power density spectrum based on the frequency transforms for each of the windows.
4. A method according to
defining the cutoff thresholds, comprising:
obtaining an average power spectrum for the training sound;
determining a first cutoff threshold based on a mean amplitude of the average power spectrum for the training sound;
determining a second cutoff threshold as at least one standard deviation from the mean amplitude; and
determining a third cutoff threshold as a higher number of standard deviations from the mean amplitude than the second cutoff threshold.
5. A method according to
defining one such trigger region, comprising all contiguous points of the average power spectrum for the training sound that are above the second cutoff threshold.
6. A method according to
defining the energy well coefficients, comprising:
selecting an energy well template; and
stretching the energy well template across each trigger region.
7. A method according to
8. A method according to
performing one or more response actions based on the identified match.
This application relates in general to remotely interacting with customers, and, in particular, to a computer-implemented system and method for detecting sound and performing an action based on the detected sound.
Smoke alarms are heavily relied on to detect smoke, such as caused by a fire, and to alert individuals of the fire, while fire alarms detect a presence of fire and generate an alert. Generally, the alarms are the first indicator of a fire and are instrumental in assisting individuals to timely escape. The National Fire Protection Association estimates that almost two-thirds of home fire deaths resulted from fires in which the home did not have a working smoke alarm. Placement of the smoke alarms and maintenance are extremely important to ensure the alarms are effective in providing a warning. For example, the U.S. Fire Administration recommends installing a smoke alarm on each floor of a property. Further, with regard to residential properties, the U.S. Fire Administration recommends a smoke alarm in every bedroom and in the hallway outside of each bedroom.
Most battery powered alarms function autonomously. Thus, alarms closest to a fire will sound first, while other alarms will not sound until the fire is within a predefined range of those alarms. In one example, a two-story house has an alarm on the first floor and the second floor. A fire starts in the kitchen, which is on the first floor, and the alarm sounds. However, a family asleep on the second floor cannot hear the alarm sounding on the first floor. By the time the alarm on the second floor sounds, the family has lost precious time in escaping the fire and finds itself in a dangerous and possibly life-threatening situation.
To prevent such situations, wireless interconnectable smoke alarms, such as those by Kidde, utilize radio frequency to provide a warning such that when one alarm sounds, the other connected alarms also sound. However, the wireless smoke alarms of Kidde can only communicate with one another via radio frequency and merely provide a warning alarm, rather than instructions for escaping the facility in which a fire is burning. While wirelessly networked alarms provide the ability to communicate, they can only communicate with one another and are unable to communicate with other types of devices, such as those commonly found in residential and commercial dwellings. In contrast, sonically connected devices can bridge disparate technologies and can sense environmental sounds against which they are trained, lending themselves to triggering capabilities.
Accordingly, there is a need for a diverse communication system that allows different types of devices to communicate with one another to provide alarms at the same time, as well as instructions for escaping a fire. Preferably, the communication system includes sonic communication.
To quickly and effectively warn individuals of an impending fire, a group of specialized devices work together to detect the fire, alert other devices, and provide instructions for escaping the fire. Each individual device stores a profile for one or more sounds. When one of the devices observes a sound, such as from another device that has detected fire or smoke, the sound is compared with each of the stored profiles. If a match between the observed sound and one of the profiles is detected, the device sounds an alarm or repeats an existing alarm as a radio frequency wireless network notification to other devices within a common network. In lieu of or in addition to the alarm, the device provides an instruction for escaping the facility in which the fire has been detected.
An embodiment provides a system and method for sonically connecting special devices. A plurality of devices is monitored. One or more sound profiles are maintained on each of the devices, wherein at least one of the sound profiles on each device is for a sound emitted by one other device in the plurality. A sound is detected on one of the devices and the detected sound is compared to one or more of the sound profiles stored on that device. A match is identified between the detected sound and one of the sound profiles. One or more response actions are performed based on the identified match.
Still other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein are described embodiments by way of illustrating the best mode contemplated for carrying out the invention. The other embodiments can include an analysis of spectra or event detection in acoustic or optics fields. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
Smoke and fire alarms are conventionally used to detect fires and alert individuals. However, many of the alarms function autonomously and require individual detection of the fire before sounding the alarm. Alternatively, wireless alarm systems allow detection of a fire by a single alarm to trigger other alarms and generate an alert, rather than waiting for each individual alarm to detect the fire. However, current wireless systems include only specific alarms. Further, the wirelessly connected alarms only sound an alert and fail to provide instructions for escaping from or stopping the fire. In contrast, sonically connected devices offer a wide spectrum of environmental stimuli that can be sampled and directly responded to with association rules, unlike existing systems that typically relay alarm information to a central processing location, such as a 911 dispatch center. To provide individuals with a maximum amount of time to escape from a detected emergency, as well as instructions for escaping, sonically connected devices sound an alert and can provide instructions for the escape.
The devices are trained to recognize a sound emitted from one or more other devices.
In one example, the fire alarm 12 detects a fire 11 burning and emits a signal, such as a sound, that functions as an alert. Hereinafter, the terms “signal” and “sound” are used interchangeably with the same intended meaning, unless otherwise indicated. The sound of the alert is observed by the picture frame 13, fire extinguisher 14, and smart phone 16. Each of the frame 13, fire extinguisher 14, and phone 16 compares the observed sound to the respective stored sound profiles. A determination of whether the observed sound matches one of the sound profiles is made, and if a match is determined to exist, an emergency response action can be performed 17a, 17b. When the picture frame 13 identifies that the observed sound is from the fire alarm 12, the picture frame displays arrow signs 17a directing individuals present in the facility to an exit. Meanwhile, the fire extinguisher 14 plays a pre-recorded message “follow the illuminated arrows” upon detecting a match between the observed sound and one of the stored sound profiles. Alternatively, if the sound emitted from the fire alarm 12 does not match a sound profile, such as those profiles stored on the smart phone 16, no action is taken and the device continues to monitor sounds within the facility.
In a further embodiment, the system of sonically connected communication devices can also be connected via a mesh network to enable the devices to extend a communication range of the devices. For example, a communication device that includes both a mesh network node and a sonic coupling algorithm can communicate alerts from far off nodes that are outside a listening range of the sonic devices. If a fire breaks out in a basement of a two-story dwelling, a mesh network/sonic fire extinguisher located in the basement can be triggered by the sound of a fire alarm, also in the basement. The fire extinguisher can then relay the alert to a communication device located on a second floor of the dwelling. The communication device can then relay a pre-recorded message, such as “fire in the basement.”
In yet a further embodiment, other types of wireless networks can be used in addition to or in place of the mesh network. Use of the wireless network allows the communication devices to transmit profiles of trained sounds from one device to another. For instance, an environment that experiences high acoustic distortion includes three communication devices that are linked via a mesh network and are sonically coupled with the environment. A user triggers a test signal on a fire alarm and activates a “training” setting on one of the devices to generate a signal profile. The device conducts the training to sonically connect to the sound of the test signal. Concurrently, the device sends a wireless signal to the two remaining devices, which separately listen to the test signal. Thus, three different training profiles can be recorded in response to a single training test signal. Further, the devices can exchange their signal profiles with one another since the size of each profile is small. The ability to access the profiles of other devices enables each device to be resilient to changes in the environment or acoustical distortion.
To ensure that the devices effectively communicate with one another to provide a warning and instructions for escape, the devices should be carefully placed and installed.
The communication devices should be appropriately spaced so as to avoid sound distortion or masking between transmission from one device and receipt by another device. For instance, as the distance between devices increases, a natural energy loss of the sound is experienced, which may mask the sound and prevent a device from recognizing the sound. Also, the further the distance between devices, the more opportunity for competing environmental noise, which can cause interpretation problems. In one embodiment, the communicating devices are placed within a range of one to ten feet of each other device. However, other ranges of distances are possible.
In one embodiment, a maximum detection distance can be determined based on a function of alarm intensity, environmental noise intensity, a distance between the alarm and the receiving microphone, environmental materials, geometry of the environments and characteristics of the microphone itself. For example, communication devices can have a detection distance of around 600 meters when an alarm source is 100 dB and the environmental noise is a constant 30 dB. However, other detection distances, alarm sounds, and environmental noise levels are possible.
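The relationship between source level, noise floor, and range can be illustrated with a simple free-field model. The sketch below is a minimal illustration assuming spherical (inverse-square) spreading from a 1 meter reference distance; the detection margin parameter is hypothetical, and under this model the 600 meter figure cited above corresponds to requiring roughly 14 dB of signal above the noise floor.

```python
# Minimal sketch: maximum detection distance under free-field spherical
# spreading, where the level falls 20*log10(d) dB from a 1 m reference.
# margin_db is a hypothetical detection margin above the noise floor.
def max_detection_distance(source_db, noise_db, margin_db=0.0, ref_m=1.0):
    excess_db = source_db - (noise_db + margin_db)  # dB available for spreading loss
    return ref_m * 10 ** (excess_db / 20.0)

# A 100 dB alarm over a constant 30 dB noise floor: ~3.2 km with no margin;
# a ~14 dB margin reproduces a detection distance on the order of 600 m.
print(round(max_detection_distance(100, 30)))                  # 3162
print(round(max_detection_distance(100, 30, margin_db=14.4)))  # ~603
```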
When the devices are properly placed and installed, the devices can communicate to provide a warning to an individual, as well as instructions or assistance for responding to a cause of the alert. In addition to the example described above with respect to
The communication system can also be used for other scenarios in which alerts or warnings are beneficial. For example, a doorbell 32 outside the front door is pressed by a visitor and a sound emitted by the doorbell is received via the fire alarm 29a, which recognizes the sound and emits a further sound for recognition by the picture frame 30a. Subsequently, the picture frame 30a emits a sound or wireless radio frequency signal that is recognized by the upstairs picture frame 30b and fire alarm 29b, which each emit a sound so that a resident of the dwelling located in the bedroom 28 can hear the doorbell.
Each communication device includes components necessary to input a sound, detect a sound, and emit a sound, as well as perform a response to a detected sound.
A speaker 44 can emit a sound or warning upon identification of a high room temperature as determined by the temperature sensor 45 or a sound match by the processor 41. Additionally, the speaker 44 can continue sounding an alarm in case other sensors fail. For example, a detector located in a basement of a building is triggered due to a fire. A further detector located on a first floor of the building is unable to detect the smoke from the fire since a door leading to the basement is closed; however, intermittent temperature readings from the basement detector can be transmitted to the upstairs detector to provide notice of the fire, such as via a warning alarm. The fire continues to burn in the basement and eventually, the detector is destroyed. A determination is made as to whether the further detector will be silenced, such as because the threat has been removed, or will continue sounding a warning alarm. In this example, the further alarm continues to sound due to the previously detected fire in the basement and a failure of the further detector to connect with the basement detector, which can identify a continued threat even though no communication is received from the destroyed downstairs detector. In a further embodiment, the battery of the downstairs detector dies during the fire. The further alarm can continue to sound despite a termination of communication with the downstairs detector due to the dead battery. Subsequently, the further detector may terminate the alarm when a manual button is pressed or further notice is received that the fire has been put out. Optionally, the communication device 40 can include an accelerometer 46, which detects movement of the device. Upon movement detection, the speaker 44 can emit the sound or warning.
To identify a match, the processor can implement computer-executable code that can be implemented as modules. The modules can be implemented as a computer program or procedure written as source code in a conventional programming language and presented for execution by the central processing unit as object or byte code. Alternatively, the modules could also be implemented in hardware, either as integrated circuitry or burned into read-only memory components. Each of the communication devices can act as a specialized computer. For instance, when the modules are implemented as hardware, that particular hardware is specialized to perform the sound matching and other computers cannot be used. Additionally, when the modules are burned into read-only memory components, the computer storing the read-only memory becomes specialized to perform the sound matching in a way that other computers cannot. The various implementations of the source code and object and byte codes can be held on a computer-readable storage medium, such as a floppy disk, hard drive, digital video disk (DVD), random access memory (RAM), read-only memory (ROM) and similar storage mediums. Once a match has been determined, the communication device 40 can perform an action, such as sounding an alert or providing verbal instructions via a speaker 44. Other components are possible.
Accurately determining whether a match exists can trigger a response action to provide a warning and instructions to an individual for performing a task, such as escaping an emergency or answering a doorbell.
Each device in the communication system is trained to recognize one or more sounds by receiving a sound and generating a profile for the sound. Preprocessing of a training sound can include time series functional transforms, such as stretching of the data, to enhance the resolution of the communication device to distinguish between similar signals that only differ in pulse length. Stretching helps distinguish the similar signals with shared frequencies using a greater time granularity. During preprocessing, a variety of input formats are normalized to generate an appropriate data structure representation based on a fixed clock speed and an input medium sampling rate.
A power density spectrum that represents the sound is generated and used to build a training profile for the sound.
Next, a training sound is loaded into the device, as described above with respect to
Subsequently, the pulse code modulation representation is translated to a normalized time series representation of digital samples representing the sound. If background noise of a time series is known, the background noise can be aligned with the signal measured and subtracted from the signal to reduce the noise.
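As a minimal sketch of this noise-reduction step, assuming the background recording is already normalized and time-aligned with the measured signal, the subtraction is element-wise:

```python
import numpy as np

def subtract_background(signal, background):
    # Both inputs are normalized, time-aligned series of equal length;
    # subtracting the known background from the measurement reduces noise.
    return np.asarray(signal, dtype=float) - np.asarray(background, dtype=float)
```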
The time series representation is then stratified (block 63) to define a series of windows associated with a window width and window offset. An operational window of the time series has a known number of samples corresponding to a known duration in time since the data input is normalized based on the fixed clock speed. When the operational window is stratified, each of the stratified windows is associated with a known number of samples, or points that include time and amplitude that correspond to a fixed frequency bandwidth using, for example, the Nyquist relation.
The window width and a minimum number of samples for each window can be determined based on a maximum frequency that is expected to be observed and a sample rate of the analog-to-digital converter, or clock speed. In one embodiment, an observation signal has a time series of a known duration, such as 10 seconds, which is stratified into a set of time segments having a duration related to a maximum observable frequency of 7000 Hz. The frequency of 7000 Hz accounts for most standard alarm frequencies that generally range between 3000 and 5000 Hz. Other maximum frequencies are possible; however, alarms with frequencies above 7000 Hz are not likely to be found in residential or commercial structures. A number of samples can be determined using the Nyquist equation provided below:
sr=2fnyquist Eq. 1
where fnyquist is the maximum frequency of 7000 Hz and sr represents the minimum number of samples per second, which is equal to 14,000 samples per stratified window in this example. Thus, 14,000 time series samples per stratified window are needed to have a frequency spectrum that is capable of measuring power contribution up to 7000 Hz.
An amount of time to measure the determined number of samples, which is 14,000 samples in this example, can be determined based on the clock speed. In one example, the clock speed is 44,100 Hz, which is a default sample rate for generating CD quality sounds. However, other sample rates are possible. The 14,000 time samples per stratified window are divided by the clock speed, 44,100 Hz, to determine a window width of 0.317 seconds per stratified window having 14,000 samples at a clock speed of 44,100 Hz. The 10 second observation signal is then divided by the 0.317 seconds per stratified window to determine a number of 31 stratified windows for the time series representation. Meanwhile, the window offset determines a granularity of time resolution and a number of spectra that are used to compute an average power density spectrum.
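The arithmetic above reduces to a few lines, sketched here with the example values from the text (the variable names are illustrative):

```python
f_nyquist = 7000                     # maximum frequency to resolve, Hz
sr_min = 2 * f_nyquist               # Eq. 1: minimum samples required (14,000)

clock_speed = 44_100                 # analog-to-digital sample rate, Hz
window_width = sr_min / clock_speed  # 0.317 s per stratified window
observation_s = 10.0                 # duration of the observation signal

num_windows = int(observation_s / window_width)
print(f"{window_width:.3f} s per window, {num_windows} windows")  # 0.317 s, 31
```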
A frequency transform is computed (block 64) for each stratified window in the time series representation to convert the representation of the training sound based on time to a power density spectrum that is based on amplitude and frequency of the sound wave. The frequency transform can include a Fourier Transform, Discrete Fourier Transform, or Fast Fourier Transform, as well as other types of transforms. Finally, for each frequency transform of each window, a corresponding power density spectrum is calculated (block 65).
The power density spectrum for each stratified time window provides a frequency breakdown of the relative power contribution to the signal power for each frequency sampled.
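A minimal sketch of blocks 63 through 65 follows, assuming a real-valued time series and hypothetical window parameters; the normalization convention for the power density is one common choice, not necessarily the patent's.

```python
import numpy as np

def power_spectra(samples, window_len, offset):
    # Stratify the time series into windows, frequency-transform each
    # window, and compute the corresponding power density spectrum.
    spectra = []
    for start in range(0, len(samples) - window_len + 1, offset):
        window = samples[start:start + window_len]
        freq = np.fft.rfft(window)                        # frequency transform
        spectra.append((np.abs(freq) ** 2) / window_len)  # power density
    return np.array(spectra)
```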
Once the training sound has been preprocessed and the power density spectra are determined for the “frequency only” phase of generating a training profile, calibration constants and a “frequency only” contribution to the training profile can be calculated. The profile can include calibration data, such as a mean, standard deviation, or temperature, as well as other types of data, and frequency data that includes trigger regions with scaled energy wells.
Processing of the average power spectrum is performed to further reduce noise and optimize the signal to noise ratio (block 82). Noise reduction can occur via high pass and low pass filters, transformations to other spaces, including frequency and wavelet transforms, and statistical transforms. In one example, a moving average is used to reduce a baseline shift of the average power spectrum by determining a moving average for each point in the power density spectra of the stratified windows and subtracting each corresponding moving average point from the associated point along the power density spectrum.
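A sketch of the moving-average baseline correction just described, with a hypothetical window size:

```python
import numpy as np

def remove_baseline(spectrum, window=51):
    # Compute a centered moving average for each point and subtract it
    # from the spectrum to reduce baseline shift.
    kernel = np.ones(window) / window
    baseline = np.convolve(spectrum, kernel, mode="same")
    return spectrum - baseline
```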
Once the noise has been reduced, cutoff thresholds for the cleaned average power spectrum are defined (block 83). The cutoff thresholds, such as CT1(xi), CT2(xi), and CT3(xi), are functions based on signal statistics and are used to make decisions regarding signal landmarks or artifacts. The threshold functions can be linear, such as based upon a signal mean and standard deviation. However, other types of threshold functions are possible. The cutoff thresholds provide a scalable way to compare two different signals using common comparison means and allow for the ability to scale energy wells.
In a further embodiment, the cutoff threshold functions, CT1(xi), CT2(xi), and CT3(xi) can be defined based on signal statistics, such as a mean of the cleaned average power spectrum and a multiple of a corresponding standard deviation, using the following equations:
CT1=μ Eq. 2
CT2=μ+1σ Eq. 3
CT3=μ+2σ Eq. 4
To determine the cutoff thresholds, μ represents a mean of the amplitude for the cleaned average power spectrum, while σ represents a standard deviation from the mean. A general equation for the cutoff threshold functions is:
CTα(xi)=μ+ασ Eq. 5
where α represents an offset from the mean by a multiple of a standard deviation. In one embodiment, a value of 1.75 for α is used for CT2.
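In code, Equations 2 through 5 reduce to a few lines; the sketch below assumes the cleaned average power spectrum is an array of amplitudes:

```python
import numpy as np

def cutoff_thresholds(cleaned_spectrum, alphas=(0.0, 1.0, 2.0)):
    # Eq. 5: CT = mu + alpha * sigma, with alphas (0, 1, 2) yielding
    # CT1, CT2, and CT3 per Equations 2-4.
    mu = float(np.mean(cleaned_spectrum))
    sigma = float(np.std(cleaned_spectrum))
    return [mu + a * sigma for a in alphas]

# The alternative embodiment above corresponds to alphas=(0.0, 1.75, 2.0).
```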
Once determined, the cutoff thresholds are mapped.
After the cutoff thresholds are determined, a peak region of the cleaned average power spectrum is defined (block 84). The third cutoff threshold 113 is an energy threshold that defines whether a peak has been discovered during training. The peak is then used to identify a peak region, which includes a set of contiguous points that are greater than or equal to CT3. Each point in the cleaned average power spectrum is associated with a frequency, amplitude pair (xi, yi). For the points to be contiguous, each point should satisfy the following relationship:
xi+1−xi=1 Eq. 6
Next, an amplitude (yi) of each point should satisfy the following equation:
yi≧CT3(xi) Eq. 7
Points that are contiguous and that satisfy the constraints above, are considered to be members of a peak region. Left and right boundaries of each peak region occur at an intersection of the third cutoff threshold with the power density spectrum.
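A sketch of the peak-region extraction implied by Equations 6 and 7 follows; for simplicity CT3 is treated here as a constant, though the patent defines it as a function of signal statistics:

```python
import numpy as np

def peak_regions(spectrum, ct3):
    # Return (left, right) index bounds of each contiguous run of points
    # whose amplitude meets or exceeds CT3 (Eqs. 6 and 7).
    above = np.asarray(spectrum) >= ct3
    regions, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                       # a region opens
        elif not flag and start is not None:
            regions.append((start, i - 1))  # contiguity broken; region closes
            start = None
    if start is not None:
        regions.append((start, len(above) - 1))
    return regions
```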
For each peak region, a trigger region is next defined (block 85). Specifically, a trigger region is a bounding box that uses a peak region as a minimal representation and grows the dimensions of that peak region. A length of the peak region is increased by finding a next most appropriate cutoff threshold, such as CT2, or boundary end point rule that intersects with the cleaned average power spectrum. A width of the trigger region is determined via one of three peak termination conditions; however, other peak termination conditions are possible. In a first embodiment, the width of the trigger region is determined by identifying two points at which the amplitude of the cleaned average power spectrum intersects the second cutoff threshold. However, this termination condition may not always be effective at defining a trigger region. For example, if two different peaks exist but the lowest amplitude of the cleaned power spectrum separating those peaks is higher than the second cutoff threshold, only one trigger region is identified rather than two separate trigger regions.
In a second embodiment, the trigger region width is determined by computing N-point lead and lag trends that yield an inflection point, which is then taken as a division point distinguishing two different trigger regions. Finally, a third embodiment distinguishes trigger regions by a boundary collision of one peak region with another peak region. As trigger regions are formed, they increase in width; the mutual exclusion rule is a boundary condition that forces trigger region termination prior to intersection with a neighboring peak region. That is to say, a peak region is a subset of a trigger region, and a trigger region contains one and only one peak region. In one example, each of the peak termination conditions is applied and the highest number of peaks found determines the number of trigger regions and the width of each trigger region. Alternatively, one or more of the peak termination conditions can be applied.
The trigger regions within a sound profile are needed for power spectrum analysis because of frequency shifts, which occur when environmental conditions cause a frequency component of the spectrum to change, such as partial absorption by environmental surroundings, changes in temperature, and movement of a sound emitter, as well as other changes. At a minimum, the trigger region should be wide enough to accommodate an expected frequency shift, while remaining narrow enough to distinguish between different frequencies along the x-axis. Further, the widening, or additional points in the trigger region, must be contiguous with the set of points contained within a corresponding peak region. Additionally, every additional point must have an amplitude that is higher than the second cutoff threshold.
Once determined, the trigger regions are represented as subsequences of points defined by endpoints of each contiguous segment, which are stored in the memory of a communication device.
After the trigger region has been defined, an appropriate energy well template is selected (block 86) and applied to a trigger region. The templates are defined prior to training and are scaled to fit an energy well to a particular trigger region. An energy well EW(x) is a classification function that determines whether a peak within a training sound matches a peak within a trigger region for an observed sound. Different classification functions can be used for the energy well, such as parabolas, sine, cosine, piecewise functions, and functional transforms on the spectrum. A profile generated from a training signal will have the same number of energy wells as significant peaks, peak regions, and trigger regions.
An energy well template has a common interface specified by a set of placeholder parameters that define how an energy well is anchored to a location on a power spectrum for the training sound at a trigger region. The placeholder parameters also define how the energy well template is stretched along the x-axis and the y-axis of the trigger region, filling the entire rectangle. An energy well template interface is specified as: EWTemplate(XTa, XTp, XTb, Ymin, Ymax).
Each energy well profile can be determined using the following equation for a piecewise sine wave energy well:
where R in Equation 10 represents a trigger region to which the energy well is anchored. A value of x selected along the energy well must be positioned within the trigger region. The output range of f(x) is {0≦y≦1}. The parameters specified by the energy well template, XTa, XTp, and XTb, are landmarks of the energy well template that have specific filter characteristics for a particular detection context. More specific contexts will be determined through later engineering experiments. Other functions for determining energy wells are possible.
The energy well template 131 includes a left range 132 and a right range 133, which meet at a peak point (XTp, YTmin) 136. On one end, the left range 132 is bounded by an energy asymptote defined by a vertical line at XTa 134 at a minimum frequency. On the other end, the left range 132 is bounded at the peak frequency XTp 136, at a maximum frequency for the left range 132. The right range 133 is bounded on one end at a minimum frequency, at the peak frequency XTp 136, and on the other end at an energy asymptote at a vertical line intersecting the maximum frequency XTb 135. The minimum and maximum frequencies for the energy well template are defined by the left and right asymptotes, which each occur at a common peak amplitude.
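Because the patent's exact piecewise sine equations (Equations 8 through 10) are not reproduced in the text above, the sketch below is one plausible realization of the EWTemplate(XTa, XTp, XTb, Ymin, Ymax) interface: a well at its minimum over the peak frequency that rises toward the asymptotes at the trigger region boundaries.

```python
import math

def energy_well(x, xta, xtp, xtb, ymin=0.0, ymax=1.0):
    # Amplitude an observed spectrum must meet or exceed at frequency x.
    if not xta <= x <= xtb:
        raise ValueError("x must lie within the trigger region")
    if x <= xtp:
        t = (x - xta) / (xtp - xta)  # left range: ymax at XTa -> ymin at XTp
        return ymin + (ymax - ymin) * math.cos(math.pi / 2 * t)
    t = (x - xtp) / (xtb - xtp)      # right range: ymin at XTp -> ymax at XTb
    return ymin + (ymax - ymin) * math.sin(math.pi / 2 * t)
```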
Upon determination, an energy well template such as the piecewise sine function of
Prior to applying an energy well template to a trigger region, energy well calibration coefficients from a training power spectrum are determined. Each trigger region is bound to a particular location on a power spectrum by three points on the X axis: xa, xp, and xb where xa<=xp<=xb, and two points on the Y axis, CT2 at the peak location and yp the amplitude at the peak location. Referring back to
The energy well can be used as a filter query on an array representation of the cleaned average power spectrum to help identify a sound match. Only those frequencies that are contained within the trigger regions are searched using the energy well. During the search, amplitudes for each point are tested against the corresponding energy well's expectation amplitudes. The minimum expectation amplitude refers to a minimum amount of energy that an observed signal must have at a corresponding frequency for the observed signal, having a peak at a common expected location, to trigger the corresponding energy well. The maximum expectation amplitude refers to a maximum energy at the energy well boundaries allowing for triggering of a frequency shifted peak in an observed signal only if the peak is within the trigger region and the amplitude is greater than the energy well maximum. In one embodiment, a default maximum expectation amplitude is a height of the maximum peak within that trigger region identified during training.
Each sound has a distinct pattern of peaks within a corresponding power spectrum. Using the Pareto principle or the 80/20 rule, only a subset of highest amplitude peaks within a power spectrum need be used to perform an initial match of the sound represented by the power spectrum with a sound, such as observed in the environment. For example, only 20% of the signal peaks are responsible for generating 80% of the signal's power and thus, 20% or less of the signal data is required to make an accurate classification, which results in a corresponding set of mutually exclusive trigger regions. As described below in detail with reference to
xa≦xi≦xb Eq. 13
yi≧EWj(xi) Eq. 14
The values for xa and xb represent left and right boundaries of an energy well function that correspond with the boundaries of the associated trigger region. Determining a sound match is further described below with reference to
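A sketch of the peak query these constraints define, taking the spectrum as a sequence of (x, y) points and the energy well as a callable:

```python
def peak_query(spectrum_points, xa, xb, well):
    # Vote (return 1) when at least one point inside the trigger region
    # has an amplitude at or above the energy well (Eqs. 13 and 14).
    for x, y in spectrum_points:
        if xa <= x <= xb and y >= well(x):
            return 1  # the query stops on the first qualifying point
    return 0
```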
In addition to the energy well profile, the training profile also includes an STFT profile. The energy well, as described above with respect to
Using STFT, one or more parallel stacks of power spectra are computed by generating several time series channels from the sound, with each time series channel offset from the other time series channels. The stack of power spectra represents a sequence in time of mutually exclusive and contiguous stratified blocks of the same size. A stack has a default initial time position of zero, which starts immediately at the point at which the time series was sampled. For example, when a training button is pushed on a communication device, data flows into the device via a microphone and the first datum is included in the first stratified time series block. Each additional time series channel is a further sequence in time of mutually exclusive and contiguous stratified blocks of the same size; the difference is that the first datum, and potentially other time samples, are not included, since a time offset makes stratified blocks from different channels overlap one another. By changing time offsets and generating new stacks of power spectra, the granularity of the STFT operation can be influenced. In a further embodiment, the time series can be stretched so that the block size artificially decreases and the time resolution is increased. Other methods for influencing the STFT are possible.
A single average frequency spectrum frame set can then be computed using an alignment in time to calculate weighted averages between temporally overlapping frames. Each of the stratified time windows generates an array of amplitudes. When a stack of stratified time series windows is generated, a time offset of the windows is recorded and saved. In one example, the time series is 10 seconds long and a window width is two seconds. Thus, the time series would include five two-second windows. A time anchor is represented by a left window bound, which refers to a minimum time included in a time series block. The power spectra associated with each block represent frequency and amplitude as described above. The frequency and amplitude measures can be bound to a two second time window in time according to the above example. So, for every two seconds, a spectrum appears. When sorted in time, time series changes in frequency can be understood, but only at a two second granularity. As the size of the time block is reduced, the maximum frequency is also reduced. Thus, to increase the time granularity, the windows can be recalculated at different positions in time or stretching of the time series can occur, as described above.
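A sketch of the offset-channel stratification described above, with hypothetical block sizes and offsets:

```python
def stft_channels(samples, block_len, offsets=(0,)):
    # Each channel re-stratifies the same series from a different starting
    # offset, so blocks from different channels overlap in time; blocks
    # within a single channel remain mutually exclusive and contiguous.
    channels = {}
    for off in offsets:
        channels[off] = [samples[i:i + block_len]
                         for i in range(off, len(samples) - block_len + 1, block_len)]
    return channels

# Example: two channels, the second offset by half a block.
# channels = stft_channels(samples, block_len=14_000, offsets=(0, 7_000))
```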
The time offset is recorded for each of the stratified windows in a channel based on the left window bound and used to generate an average power spectrum between corresponding offset windows. The windows within a channel are mutually exclusive. For example,
W represents a stratified window 253 and t represents a time offset. Each window is also identified by a subscript for the channel to which the window belongs and the time offset. Subsequently, an average power spectrum (Wi) 254 is determined for each of the stratified windows 253 having overlapping windows in time between channels. For example, an average power spectrum W0 can be determined based on the power spectra of the first stratified window, Wa0 and Wb0, in each channel A and B, and the average power spectrum W1 is determined based on the set of power spectra overlapping Wa1 in the second stratified window in each channel, and so on until all the average power spectra have been determined.
The weighted average power spectra between the windows can be determined by using the following equations:
where Pwr represents a function that computes the power spectrum for a stratified window, as described above. W0 represents the average power spectrum for the first time offset, W1 represents the average power spectrum for the second time offset, Wi represents the average power spectrum for the ith time offset, and Wn represents the average power spectrum for the last time offset.
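The averaging equations themselves are not reproduced in the text above; one plausible reading, sketched below, takes each Wi as the equal-weight mean of the power spectra of all channel windows that overlap the i-th time offset:

```python
import numpy as np

def average_power_spectra(overlap_groups):
    # overlap_groups[i] holds the power spectra (equal-length arrays) of
    # every stratified window, across channels, that overlaps offset t_i.
    return [np.mean(np.stack(group), axis=0) for group in overlap_groups]
```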
Once the stratified windows and their corresponding power spectra are obtained, the noise of each average power spectrum is reduced (block 151) to generate a cleaned representation. Reducing the noise is described above in further detail with respect to
Once the training profile has been generated for a training sound, the profile is stored by a communication, or specialized, device, such as a fire alarm, fire extinguisher, sign, picture frame, or mirror, as well as other types of devices, on which the training sound was loaded. In one embodiment, the training profiles of different communication devices can be exchanged between the devices over a network, such as a mesh network. The communication device can act as a detector to observe sounds within the environment and determine whether the observed sound matches a sound associated with a training profile stored on the communication device. Prior to comparing the observed sound to the training profile, the observed sound must be preprocessed to generate a representation of the observed sound as a power spectrum.
Once the power spectra have been determined for each of the stratified windows, the observed sound can be compared with one or more training profiles stored on the communication device to determine whether the observed sound matches one of the profiles. Comparing the observed sound to a training profile occurs in two steps. The first step includes comparing the observed sound with the frequency only profile and the second step includes comparing the observed sound with the STFT profile.
To perform a comparison between the observed sound and the training sound, the frequency only profiles are accessed (block 174) from memory. However, prior to comparing the profile for the observed sound with the training profiles, the power spectrum of the training profile should be calibrated according to the speed of sound, which is estimated based on a temperature of the environment in which the communication device is located, and signal statistics, such as a mean and standard deviation of the power spectra. Specifically, each trigger region and corresponding energy well can be translated (block 175) to the center peak frequency of the observed sound and dilated according to left and right window boundaries. With regard to the training profiles, each training profile has one or more energy wells, which are associated with corresponding left, peak, and right landmarks that each represent frequencies on the power spectrum that occurred at the training temperature. If the temperature measured for the sample sound and the observed sound is constant, an expected location to “superimpose” an energy well from the training profile onto a power spectrum for an observed sound is identical to the positions defined by the training profile for the left and right boundaries, and the peak maximum. However, if the temperature is different, a frequency shift caused by the change in temperature should be calculated, since frequency shifts caused by a change in temperature are not constant. New left and right boundaries and a peak maximum along the frequency axis should be calculated for expected locations at the different temperature. Specifically, for each of the left and right boundaries of the trigger region, an expectation of an observation frequency for the observed sound can be calculated based on a training frequency at a training temperature and an observation temperature using the following equation:
where Tobs represents a temperature of the environment during loading of the observed sound, Ttrain represents a temperature of the environment during loading of the training sound, ftrain represents a frequency of the training sound, and fobs represents a frequency of the observed sound. The expected observation frequency is then used to translate the associated boundary of the trigger region to the expected location. For example, an expected peak maximum frequency is used to center the peak region at the expected location. Further, energy wells can be scaled using a linear transform for the left and right segments using reference landmarks between spectra. An example of a linear scaling between spectra is described next.
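Before turning to that linear scaling example, the frequency translation itself can be sketched. The patent's exact shift equation is not reproduced in the text above; the version below assumes the standard dependence of the speed of sound on the square root of absolute temperature, with temperatures in kelvin:

```python
import math

def expected_observation_frequency(f_train, t_train_k, t_obs_k):
    # Translate a training-profile landmark frequency to its expected
    # location at the observation temperature (speed of sound ~ sqrt(T)).
    return f_train * math.sqrt(t_obs_k / t_train_k)
```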
Within a trigger region of a training profile, a minimum value for an energy well emin and a maximum value for the energy well emax are determined. The minimum value can equal the second cutoff threshold, while the maximum value can equal the third cutoff threshold. An output value, outvalue, for an observation sound is determined via a linear transform that uses an input value, invalue, from the training profile scale and converts the input value to the output value for the observation sound, according to the equation below:
outvalue=((invalue−emin)/(emax−emin))(qmax−qmin)+qmin
where qmin is a known minimum energy that is equal to the second cutoff threshold for the observed sound and qmax is an estimated value that is equal to the third cutoff threshold for the observed sound. An energy well associated with a training profile can be scaled to fit a corresponding location within the average power spectrum for the observed sound.
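As a sketch, the rescaling is the familiar min-max linear map:

```python
def rescale(invalue, emin, emax, qmin, qmax):
    # Map a training-scale value in [emin, emax] onto the observation
    # scale [qmin, qmax] with the linear transform above.
    return (invalue - emin) / (emax - emin) * (qmax - qmin) + qmin
```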
Next, a peak query test is performed (block 176).
A vote is tallied for the observed sound when the amplitude of the average power spectrum for the input sound equals or exceeds an amplitude of the energy well. Specifically, if at least one point along the average power spectrum and located within the trigger region has an amplitude greater than or equal to the energy well amplitude at the corresponding frequency, then the query stops and a vote count is incremented. Alternatively, if no point along the average power spectrum that is located within the trigger region has an amplitude greater than or equal to the energy well amplitude at the corresponding frequency, then the vote count is not incremented for that trigger region. For example, a point 191, 192 is identified having a value xi along an x-axis of the graph 190. When the y-value yi1 of the point (Pt(xi1, yi1)) 191 is above the energy well, a vote count is incremented. However, when the y value yi2 of the point (Pt(xi1, yi2)) 192 has a lower amplitude than the energy well, the vote count is not incremented for the trigger region.
Each average power spectrum can have many potential trigger regions, as described above with respect to the training profile. A peak query is performed for each distinct trigger region. After all trigger regions have been processed, the vote count (VC) is totaled and then compared to a threshold (block 178), such as a minimum number of votes sufficient for identifying a match between the observed sound and one of the training profiles. The minimum number of votes threshold is based on a maximum score (MS) and a voting percentage parameter (VPP). The MS is defined by the number of trigger regions discovered in the training phase, while the VPP is a predefined parameter that is used to calculate a minimum number of votes needed to confirm a match between the observed sound and one of the training profiles according to the following equation:
VC≧VPP*MS Eq. 19
In one embodiment, the VPP can be set to 50%. However, other values are possible.
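The resulting decision is a one-line test; in this sketch the 50% default mirrors the embodiment above:

```python
def frequency_only_match(vote_count, num_trigger_regions, vpp=0.50):
    # Eq. 19: VC >= VPP * MS, where MS is the number of trigger regions
    # discovered during training.
    return vote_count >= vpp * num_trigger_regions
```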
If the vote count is equal to or greater (block 178) than the threshold, a match is identified (block 179) and further analysis of the observed sound is continued by comparing the observed sound with the STFT profile. Alternatively, if the vote count fails to satisfy the threshold (block 178), the detection analysis is terminated and a finding is made that no match exists (block 181).
During the STFT phase of detection, a change in peak frequency over time is analyzed to determine if a best fit time aligned sequence meets a user-defined threshold.
Cutoff thresholds for each of the stratified windows of the observed sound are determined (block 203). Afterward, the training STFT profiles are loaded (block 204) for comparison with the stratified windows of the observed sound. However, prior to the comparison, the frequency spectrum of the training STFT profiles should be calibrated to the speed of sound. Each peak region and corresponding energy well of the stratified windows for the STFT profiles can be translated (block 205) to a center peak frequency of the power spectra for the stratified windows of the observed sound as described above with respect to
The stratified windows of the STFT profile are aligned (block 206) with the stratified windows of the observed sound in a matrix. Since the number of time windows in a training STFT profile is equal to a number of time windows for an observed sound, the stratified windows can be compared using an n×n matrix of potential time sequence combinations from which a set of distinct peak queries can be identified. The windows are ordered within the matrix by time, which occurs at a summary level of granularity by generating an overview of matches over all the power spectra.
The trigger regions of the STFT profile are applied to the stratified windows of the observed sound within a single frame and queried (block 207) to determine whether a match exists between stratified windows of the training sound and the observed sound. Specifically, when a peak of the power spectrum of a stratified window for the observed sound is within the trigger region of a stratified window from the training STFT profile and yi>=CT2(xi), a vote is assigned, as described above with reference to
The windows of the observed sound and STFT profile can be organized in a matrix.
The matrix can be used to determine whether a match exists between the observed sound and the sound associated with the STFT profile being compared to the observed sound.
A scoring table 222 records the diagonal scores 225. The score table includes three columns: the first for observation time anchors, the second for profile time anchors, and the last for diagonal scores. An observation time anchor is identified as a window for the observed sound that acts as a starting point for the diagonal, while the profile time anchor is a window associated with the STFT profile and acts as a starting point for its diagonal contribution. Meanwhile, the diagonal score is determined by totaling the number of matching windows that occur along the diagonal.
Subsequently, a sound matching score is determined by dividing the maximum score by the total possible score. The maximum score is determined by a number of STFT windows in the profile. When each frame within a training profile matches each corresponding power spectrum frame for the observed sound, an exact match is identified. Each profile frame and power spectrum frame for the observed sound is illustrated by a single cell in the matrix. If the sound matching score is greater than or equal to a predetermined matching threshold, a match between the observed sound and the training sound is identified. However, if the sound matching score fails to satisfy the threshold, no match is identified. In one embodiment, the score is calculated as a percentage of matching frames on a diagonal. If no frames match, a percentage of zero is assigned. In contrast, if all the frames match, a percentage of 100 is applied. Returning to the discussion of
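A sketch of this diagonal scoring, assuming a square boolean match matrix of observation windows versus profile windows and a hypothetical matching threshold:

```python
import numpy as np

def sound_matching_score(match_matrix):
    # Best diagonal score divided by the total possible score (the number
    # of STFT windows in the profile).
    m = np.asarray(match_matrix, dtype=int)
    n = m.shape[0]
    best = max(np.trace(m, offset=k) for k in range(-(n - 1), n))
    return best / n

def is_match(match_matrix, threshold=0.8):
    # threshold stands in for the predetermined matching threshold above.
    return sound_matching_score(match_matrix) >= threshold
```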
Further, the communication system can be used to notify elderly individuals when the doorbell rings, when a laundry cycle has completed, or other household or business notifications. As well, the communication system can be used to help individuals locate items, such as a pair of glasses. A profile is generated for a sound of the glasses being placed on different materials and when a sound is observed, the observed sound is compared to the glasses sound profiles. If a match exists, a location of the glasses can be determined and relayed to the owner.
In a factory, the communication system can be used to ensure compliance with safety procedures. For example, a factory includes a cleanroom for scientific research in which sensitive products are located. The cleanroom may be entered only at certain times to ensure the products do not become contaminated. A sensor located outside the cleanroom can detect a presence of an individual via sound or imaging. The presence triggers a recorded speech phrase that plays “do not enter, cleanroom in process.” Other examples to ensure compliance are possible.
The communication system can also be useful for militaries to ensure that soldiers are safe even in enemy environments. For example, each soldier in a platoon can wear a sensor that can recognize the cocking of a gun, or enemy attack codes in English or another language, as well as other sounds. Once one of the sounds is detected via a sensor from at least one individual, the sensors of the other individual soldiers can sound to provide a warning of a potential threat.
Further, the communication system can be used to identify border crossers by placing sensors around a border and training the sensors to identify sounds, such as walking or running on the terrain, or airplane or boat motors, as well as other sounds that may indicate an individual is crossing the border.
In addition to sound, the communication system can be used to identify images and trigger an alarm based on a recognized image. For instance, a picture of a coffee mug can be taken. Vertical and horizontal scan lines of the picture are analyzed and an image for a cross section of the cup is recorded. Additionally, a target picture of an environment is taken. Bisection is used to split the target image into segments. Specifically, the target image is divided vertically and then horizontally. Next, a search is performed in each of the segments to determine whether the cross section image of the coffee mug is located in one of the segments. If so, an identification of the segment is provided as an identified location of the mug.
In yet a further embodiment, peak identification and classification can be used in spectroscopy and 2D image processing. Profiles can then be generated to query images for use in biometrics, face detection, object detection, radiology, and disease detection. In one example, a detector can utilize spectroscopy to identify a composition, such as organic compounds. Some type of analytic spectrum representing absorption, emission, or scatter can be processed and compared using a similar methodology. Based on a positive identification, an appropriate response can occur, such as classification or an alarm.
While the invention has been particularly shown and described as referenced to the embodiments thereof, those skilled in the art will understand that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.