The present disclosure is generally directed to audio visualization methods for visual pattern recognition of sound. In particular, the present disclosure is directed to plotting amplitude intensity as brightness/saturation and phase-cycles as hue-variations to create visual representations of sound.
1. An audio visualization method for recognition of a sound, the method comprising:
capturing a sound;
determining a brightness and saturation level corresponding to a logarithmic amplitude of the sound;
determining a coefficient phase angle of the sound; and
displaying the amplitude and phase of the sound simultaneously to generate an image of the sound.
15. A method of recreating a sound on a real-time basis, the method comprising:
capturing a sound;
determining a brightness and saturation level corresponding to a logarithmic amplitude of the sound;
determining a coefficient phase angle of the sound;
displaying the amplitude and phase of the sound simultaneously to generate an image of the sound;
analyzing the image of the sound; and
recreating the sound.
8. A method of reconstructing a sound from an image, the method comprising:
capturing a sound;
determining a brightness and saturation level corresponding to a logarithmic amplitude of the sound;
determining a coefficient phase angle of the sound;
displaying the amplitude and phase of the sound simultaneously to generate an image of the sound;
storing the sound as the generated image;
retrieving the generated image; and
reverse processing the generated image to recover the sound.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/427,499, filed Nov. 29, 2016, the entire contents of which are incorporated herein by reference.
The present disclosure is generally related to audio visualization methods for visual pattern recognition of sound. In particular, the present disclosure is directed to plotting amplitude intensity as brightness/saturation and phase-cycles as hue-variations to create visual representations of sound.
While traditional audio visualization methods depict amplitude intensities vs. time, such as in a time-frequency spectrogram, and while some may use complex phase information to augment the amplitude representation, such as in a reassigned spectrogram, the phase data are not generally represented in their own right. By plotting amplitude intensity as brightness/saturation and phase-cycles as hue-variations, the complex spectrogram method described herein displays both amplitude and phase information simultaneously, making the resulting images canonical visual representations of the source wave.
As disclosed herein, encoding log-amplitude visualization of complex-number amplitude and phase (over a wide range of intensities) into a single pixel allows for visualization of total sound. That is, visualization is provided for the total sound coming into a microphone such that every pressure front in time as it impacted the microphone's transducer is reconstructed from the resulting image. As a result, in some embodiments, the original sound is precisely reconstructed (down to the original phases) from an image, by reversing this process. This allows humans to apply their highly-developed visual pattern recognition skills to complete audio data in a new way. Applications of these methods, for example, include making “visual field guides” to sounds, as well as online image generation for sound visualization through mobile devices running browsers (e.g., in real-time and/or “without tiling of time-slices”).
One aspect of the present disclosure describes an audio visualization method for recognition of a sound. The method comprises capturing a sound, creating a logarithmic color amplitude of the sound, creating a coefficient phase angle of the sound, and displaying the amplitude and phase of the sound simultaneously to generate an image of the sound.
Another aspect of the present disclosure describes a method of reconstructing a sound from an image. The method comprises capturing a sound, creating a logarithmic color amplitude of the sound, creating a coefficient phase angle of the sound, displaying the amplitude and phase of the sound simultaneously to generate an image of the sound, and reverse processing the generated image to recover the sound.
Yet another aspect of the present disclosure describes a method of recreating a sound on a real-time basis. The method comprises capturing a sound, creating a logarithmic color amplitude of the sound, creating a coefficient phase angle of the sound, and displaying the amplitude and phase of the sound simultaneously to generate an image of the sound. The method further comprises analyzing the image of the sound and recreating the sound.
The patent application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
In some embodiments of the present disclosure, audio visualization methods for visual pattern recognition of sound are disclosed. In particular, plotting amplitude intensity as brightness/saturation and phase-cycles as hue-variations to create visual representations of sound is described.
While some current audio visualization methods use the complex fast Fourier transform (FFT) components to augment the accuracy of (real) amplitude readings, they tend to be highly application-specific and do not address generalized, total-sound analysis, in which simultaneous display of both amplitude and phase data in each pixel provides a canonical means of recording, analyzing, cataloguing, and displaying more sound than humans are generally considered capable of hearing. Disclosed herein is an efficient and robust real-time method of viewing total-sound spectrographs that combines log-intensity amplitude visualization (for improved dynamic range) with chroma-like phase visualization. Because both the real and imaginary FFT data sets are displayed simultaneously, the resulting image contains all the information of the original source, so it is always possible to recover the original sound, down to the original phases, from any image generated with this method.
This enables alternative data storage techniques, novel cataloguing methods such as visual sound field-guides (which, when combined with a mobile real-time visualization app, could allow for live imitation-feedback), improved sound-availability for the hearing-impaired, and more. Additional modifications, e.g., a Grand Staff musical overlay and/or stereo versions for wearable devices, help music readers without specific technical backgrounds and/or sensory capabilities make sense of such total-sound visualizations. The ever-increasing capability of modern mobile devices can already support implementation of this visualization method, leveraging their wide distribution as well as their pre-installed microphones, color displays, and processing speeds.
Methods
The study of spatial periodicities in nanocrystalline solids has shown the utility of representing both amplitude and phase with a single pixel: condensed-matter crystals contain periodicities in two and three spatial dimensions and therefore require higher-dimensional FFTs, whereas audio analysis involves periodicities in only the one time dimension. By applying this visualization method to audio signals, the complete complex FFT of a given time-slice is displayed as a single column of pixels, leaving the horizontal axis available for sequential slices in the time domain.
In contrast to current audio visualization methods such as traditional spectrograms, reassigned spectrograms, constant-Q transforms (CQTs), and chroma features, which use various techniques to optimize amplitude visualization, a simpler scheme is applied herein, based on complex FFTs that simultaneously display the amplitude and phase information associated with each pixel. As in many other applications, not the least of which is traditional Western musical notation, logarithmic scaling of the frequency axis is optionally adopted for some embodiments, since octaves and harmonics are equally spaced on that scale. While techniques like reassigned spectrograms utilize the imaginary part of the Fourier transform to enhance the accuracy of particular amplitude and harmonic representations, and chroma (i.e., saturation of a distinctive hue of color) visualizations show periodic changes in tone as hue-variations, the methods described herein simultaneously display both real and imaginary Fourier data to produce a canonical view of total sound. By showing Fourier coefficient amplitude as the brightness/saturation of the associated pixel, and Fourier phase as hue, each pixel simultaneously represents both real and imaginary components of a complex Fourier coefficient.
On a linear frequency scale, log-color phase-representation begins with each complex Fourier coefficient being converted to a color: the phase angle of the coefficient determines the hue, while the logarithm of its amplitude determines the brightness/saturation.
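The per-coefficient mapping can be sketched as follows. This is an illustrative Python sketch, not code from the disclosure; the −60 dB floor and the zero-phase-equals-red hue convention are assumptions chosen for concreteness.

```python
import math
import cmath
import colorsys  # stdlib HSV -> RGB conversion


def coeff_to_rgb(c, floor_db=-60.0):
    """Map one complex Fourier coefficient to an RGB pixel: hue encodes the
    coefficient's phase angle, brightness encodes its log amplitude.
    (Illustrative constants; the disclosure does not fix a dB floor.)"""
    amp = abs(c)
    if amp == 0.0:
        return (0.0, 0.0, 0.0)                       # true zero renders black
    db = 20.0 * math.log10(amp)                      # log amplitude in dB
    value = min(1.0, max(0.0, 1.0 - db / floor_db))  # floor_db -> 0, 0 dB -> 1
    hue = (cmath.phase(c) / (2.0 * math.pi)) % 1.0   # phase wraps onto [0, 1)
    return colorsys.hsv_to_rgb(hue, 1.0, value)
```

With this convention a full-scale real-positive coefficient renders as pure red, and anything quieter than the floor renders black, so a wide dynamic range fits in one pixel.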
As seen in
In order to achieve the benefits of the log-frequency scale from equally spaced samples in the time-domain, the linear-frequency data must be transformed, limiting the retention of some detailed sound information in favor of a more robust visual representation. In particular, since the transformation from linear- to log-frequency expands the lower-frequency coefficients and compresses the higher-frequency coefficients along the vertical axis, the lower-frequency coefficients (those below about 1200 Hz) require interpolation to sufficiently inform the brightness values for the multiple rows of a single coefficient. In contrast, the higher-frequency coefficients are under-sampled so that only coefficients closest to display-rows are represented. This optional nonlinear transformation of the frequency axis allows the discrete time-frequency spectrogram to be “warped” (different frequencies stretched or compressed differently, but frequency-order preserved) without being “scrambled” (order of represented frequencies not preserved), making it more amenable to visual pattern recognition techniques.
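The warp from linear to log frequency amounts to a row-to-bin mapping, sketched below. The row count and the 55 Hz to 14080 Hz display range are illustrative assumptions, not values from the disclosure.

```python
def log_rows_to_bins(n_rows, f_lo, f_hi, fs, fft_size):
    """Fractional linear-FFT-bin index sampled by each log-frequency display
    row (row 0 = f_lo).  Where adjacent rows differ by less than one bin,
    interpolation between coefficients is needed; where they differ by more,
    coefficients are undersampled (only those nearest a row are kept)."""
    bin_hz = fs / fft_size                                # Hz per linear bin
    return [f_lo * (f_hi / f_lo) ** (r / (n_rows - 1)) / bin_hz
            for r in range(n_rows)]


rows = log_rows_to_bins(600, 55.0, 14080.0, 44100, 2048)
```

Because the mapping is strictly monotonic, frequencies are stretched or compressed but never reordered, which is exactly the "warped but not scrambled" property described above.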
The log-frequency display is then rendered by first completing the linear-frequency counterpart as described above and then mapping the vertical axis to a log-frequency scale. At lower frequencies, this requires interpolation between complex-valued coefficients, for which there are two methods. Complex-color log-frequency interpolation of Fourier coefficients for a 10% frequency-modulated tone centered around 256 Hz is shown using rectangular (
While both polar and rectangular interpolation routines were applied to this task, rectangular interpolation (
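The two interpolation routines can be sketched directly on a pair of complex coefficients. This is a hypothetical illustration, not the disclosure's code; note how rectangular interpolation of out-of-phase neighbors passes through zero amplitude, while polar interpolation preserves amplitude and blends the angle instead.

```python
import cmath
import math


def interp_rect(c1, c2, t):
    """Rectangular interpolation: real and imaginary parts blended linearly."""
    return c1 + (c2 - c1) * t


def interp_polar(c1, c2, t):
    """Polar interpolation: amplitude and phase angle blended separately,
    taking the shorter way around the phase circle."""
    amp = abs(c1) + (abs(c2) - abs(c1)) * t
    dphi = (cmath.phase(c2) - cmath.phase(c1) + math.pi) % (2 * math.pi) - math.pi
    return cmath.rect(amp, cmath.phase(c1) + dphi * t)
```

Midway between two equal-amplitude, opposite-phase coefficients, `interp_rect` returns exactly zero (a black pixel), whereas `interp_polar` returns a full-amplitude value at an intermediate angle.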
Since each Fourier coefficient corresponds to a frequency range determined by the FFT size, a coefficient "center" is where a linear coefficient index plots on the log-frequency scale. Assuming a sampling rate of 44.1 kHz and a 2048-sample FFT, the separation of coefficient centers is 44100/2048 ≈ 21.533 Hz. Because phase variations are far more sensitive than amplitude variations, mapping Fourier phase to hue allows frequency-variations well below the resolution allowed by a typical FFT size to be visualized from one time-slice to the next as colored stripes. In this way, rougher frequency data are shown with brightness/saturation, while the finer details are represented in color.
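For the stated parameters, the coefficient-center spacing works out as a one-line computation (the 440 Hz example is added here for illustration):

```python
fs = 44100         # sampling rate (Hz)
fft_size = 2048    # FFT length from the text
bin_spacing = fs / fft_size          # Hz between coefficient centers

# A 440 Hz tone falls between centers, so its fine position shows up as a
# hue (phase) drift from slice to slice rather than a brightness change.
nearest_center = round(440 / bin_spacing)
```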
At various points between coefficient centers, rectangular interpolation results in zero-amplitude phase-inversions. During these transitions, the interpolated phases switch from being above the center of the lower coefficient to being below the center of the higher coefficient, or vice versa, at which point the Fourier phase undergoes an inversion. At these intersections, the interpolated amplitudes reach zero before immediately becoming positive again. The effect is that black lines appear between coefficient centers with alternating color rotations on either side. Such black lines are artifacts of the rectangular phase-interpolation routine, and, as an exception, do not actually correspond to zero-intensities in the input signal. This effect is seen in practice in
Results
Realizations of this log-color visualization method in HTML5/JavaScript have been shown to process and render audio signals on a variety of hardware platforms in about one-third the time necessary to maintain real-time synchronization. Since this method for showing variation in phase among Fourier coefficients allows for the representation of a complex number by a single pixel, the entire FFT is conveniently displayed as a vertical line of colored pixels with the brightness corresponding to the log of the intensity of the Fourier coefficient and the hue corresponding to the coefficient-phase. In the time direction, steady variations in Fourier-coefficient phase at the onset of each time-slice are seen as colored stripes, with stripes of opposing sequence (RGB vs. RBG) occupying opposite sides of the zero-amplitude lines. When the oscillation frequency is below the center of a coefficient, the hue alternates in the RBG direction, and when the oscillation frequency is above a Fourier-coefficient center, the hue alternates in the RGB direction, as seen in
Whenever the phase is centered on the Fourier coefficient, the hue remains constant, which allows highly accurate, well-centered data points to be easily distinguished and isolated even in real-time. In fact, the color-oscillations have a period inversely proportional to the frequency offset from the coefficient center, just as do amplitude beats used to tune woodwind instruments (see
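The relationship between frequency offset and phase (hue) rotation can be checked numerically. The sketch below evaluates a single Fourier coefficient per time-slice with a direct sum (an illustrative stand-in for a full FFT); the bin index and the 5 Hz offset are arbitrary choices.

```python
import cmath
import math

fs, N = 44100, 2048          # sampling rate and FFT size from the text
k = 12                       # analysis coefficient (arbitrary choice)
delta = 5.0                  # tone offset above the coefficient center (Hz)
f = k * fs / N + delta


def bin_phase(slice_index):
    """Phase of Fourier coefficient k for one N-sample time-slice."""
    start = slice_index * N
    coeff = sum(math.sin(2 * math.pi * f * (start + n) / fs) *
                cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
    return cmath.phase(coeff)


# The phase (hue) advances by about 2*pi*delta*(N/fs) per slice, so the
# colored stripes repeat with period 1/delta seconds, like tuning beats.
measured = (bin_phase(1) - bin_phase(0)) % (2 * math.pi)
expected = (2 * math.pi * delta * N / fs) % (2 * math.pi)
```

A tone 5 Hz above a coefficient center thus cycles through the full hue wheel every 0.2 s; halving the offset doubles the stripe period, mirroring the beat-frequency analogy in the text.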
Discussion
The connection of technologies like microphones, digital displays, and computing power with currently-existing, globally-interconnected, wireless networks of highly-portable devices provides a historically unique opportunity to drastically expand the scope of applications for visual audio analysis. In addition, versatile phase-sensitive audio-analysis applications incorporating both modern (log-frequency) and traditional (Grand Staff) optimizations for enhancing visual pattern recognition provide a meaningful (or at least relatable) basis from which anyone with experience reading music may make interpretations of phase-detailed audio data.
Several exemplary embodiments of applications involving these features are illustrated in
In addition to displaying data on the complete sound wave, in some embodiments, a generated image is reverse-processed to recover the original signal, including the original phases imparted by the interference of the digital detector with the source wave, which contain information such as the relative angle to the direction of source-wave propagation. While CQTs have also been shown to be invertible, they do not display phase information explicitly and generally require additional computational resources compared to the discrete FFT. Since musical notation provides a practical reference, and since each pixel can be mapped back to the original sound, both human imitation and direct recovery of the audio are possible. Other modifications, such as adjustment of the frequency axis so that Fourier coefficients match the frequencies of particular tuning standards, readily display whether a note is in tune or, if not, whether it is sharp or flat and by precisely how much. Such note-specific applications are completely accessible to anyone who reads music, opening technically sophisticated audio analysis software to a new class of potential users.
Browser implementations are only one facet of this development. More specialized implementations, e.g., in hardware instead of software, will enable other uses. For instance, by computing a separate transform for each half-step in a log-frequency display, a user avoids all interpolation artifacts and can put any sound into playable music notation. This is illustrated in
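One way to realize such a per-note transform is a single-bin DFT evaluated exactly at each note frequency, in the spirit of the Goertzel algorithm; the note range and the 1/N normalization below are illustrative assumptions, not details from the disclosure.

```python
import cmath
import math

fs = 44100  # sampling rate (Hz)


def note_coeff(samples, f_note):
    """Complex Fourier-style coefficient evaluated exactly at f_note, so each
    display row sits on its own coefficient center (no interpolation)."""
    N = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * f_note * n / fs)
               for n, x in enumerate(samples)) / N


# Equal-tempered half-steps from A3 (220 Hz) through A4 (440 Hz):
notes = [220.0 * 2 ** (s / 12) for s in range(13)]

# A pure A4 tone lights up its own note-row far more than a distant note's:
tone = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(2048)]
```

Because each row's analysis frequency coincides with its note, there is nothing to interpolate between rows, which is the artifact-avoidance argument made above.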
In some embodiments, the combination of processing and display techniques described herein enables total sound visualization that includes source-detector phase-interference information. The convenient and portable image format allows for improved accuracy in sound measurement, storage, analysis, and reproduction in a plethora of new and diverse environments and applications. Further development of robust audio visualization software, in parallel with semiconductor technology, will give the general public access to a growing variety of specialized, phase-interferometric tools to record, analyze, and recreate sounds on an increasingly real-time basis. As software is developed, applications which take advantage of traditional musical notation are likely to have the advantage of wider accessibility by the general public, as well as additional potential for musical reproduction and conceptual reference. Consequently, the ability to record and analyze audio in a visual form that retains precise information (i.e., regarding the physical orientation of the actual sound wave in space relative to the detector that recorded it) is significantly valuable for detailed sound-feature analysis.
In some embodiments, a sound is reconstructed from an image. A method for reconstructing a sound from an image comprises capturing a sound, creating a logarithmic color amplitude of the sound, creating a coefficient phase angle of the sound, generating an image of the sound by plotting the amplitude and the phase simultaneously and storing the generated image, and reverse processing the generated image to recover the sound. In some embodiments, such as those utilizing various software applications, a sound is captured and stored as a generated image. Upon retrieval of the generated image, the sound is reconstructed by reverse processing of the plotted amplitude and phase of the generated image. In some embodiments, the generated image is not displayed. In some embodiments, the generated image is displayed before and/or after the sound is reconstructed.
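The reverse-processing step can be illustrated with a toy roundtrip: each time-slice is "stored" as its complex coefficients (what the image's pixels encode) and then inverted back to samples. This sketch uses a naive DFT on a short synthetic slice; a real implementation would also quantize coefficients into pixel colors, which bounds the recovery precision.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT (stand-in for the FFT of one time-slice)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: 'reverse processing' stored coefficients back to samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]


N = 64
slice_ = [math.sin(2 * math.pi * 3 * n / N) +
          0.25 * math.cos(2 * math.pi * 7 * n / N) for n in range(N)]
stored = dft(slice_)        # complex amplitude and phase, one pixel per bin
recovered = idft(stored)    # original pressure samples, phases included
```

Since the transform is exactly invertible, the recovered samples match the captured slice to numerical precision, down to the original phases.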
When introducing elements of the present disclosure or embodiments thereof, the articles "a," "an," "the," and "said" are intended to mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
In view of the above, it will be seen that the several advantages of the disclosure are achieved and other advantageous results attained. As various changes could be made in the above processes and composites without departing from the scope of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
Fraundorf, Philip, Wedekind, Stephen, Garver, Wayne
Assignment: Stephen Wedekind (Nov. 2, 2016), Philip Fraundorf (Nov. 3, 2016), and Wayne Garver (Nov. 21, 2016) each assigned their interest to The Curators of the University of Missouri (Reel/Frame 044246/0177). Application filed Nov. 28, 2017.