A method for describing the composition of audio signals, which are encoded as separate audio objects. The arrangement and processing of the audio objects in a sound scene is described by hierarchically arranged nodes in a scene description. A node specified only for spatialization on a 2D screen using a 2D vector describes a 3D position of an audio object by means of said 2D vector and an additional 1D value describing the depth of said audio object. In a further embodiment a mapping of the coordinates is performed, which enables the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
1. A method using an audio processing apparatus for spatialization of a sound object, the sound object having associated a first parameter, 2D location information and depth information, wherein the first parameter defines whether or not the sound object is to be spatialized, the 2D location information comprises second and third parameters that define the 2D location of the sound object in terms of height and width respectively on a 2D plane, and the depth information comprises a fourth parameter, the method comprising the steps of:
using an audio processing apparatus to determine from the first parameter that the sound object is to be spatialized;
transforming the 2D location information and the depth information of the sound object to a 3D coordinate system, wherein said second parameter defining the height of the 2D location is mapped to audio depth information perpendicular to said 2D plane, said third parameter defining the width of the 2D location is mapped to the width information in the 3D coordinate system, and said fourth parameter is mapped to the height in the 3D coordinate system; and
spatializing the sound according to the resulting 3D location information.
2. Method according to
3. Method according to
4. Method according to
5. Method according to
6. Method according to
7. Method according to
8. Method according to
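For illustration only, the transformation of claim 1 can be summarized in a short sketch; the helper and parameter names below are hypothetical and not part of the claims, and the assignment of 3D axes is an assumption:

# Minimal sketch of the claim-1 mapping (hypothetical names).
# 2D height -> audio depth, 2D width -> 3D width, depth parameter -> 3D height.
def to_3d(height_2d, width_2d, depth):
    width_3d = width_2d    # third parameter keeps its role as width
    height_3d = depth      # fourth parameter becomes the height in 3D
    depth_3d = height_2d   # second parameter becomes the depth,
                           # perpendicular to the 2D plane
    return (width_3d, height_3d, depth_3d)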
This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP03/13394, filed Nov. 28, 2003, which was published in accordance with PCT Article 21(2) on Jun. 17, 2004 in English and which claims the benefit of European patent application No. 02026770.4, filed Dec. 2, 2002 and European patent application No. 03016029.5, filed Jul. 15, 2003.
The invention relates to a method and to an apparatus for coding and decoding a presentation description of audio signals, especially for the spatialization of MPEG-4 encoded audio signals in a 3D domain.
The MPEG-4 Audio standard ISO/IEC 14496-3:2001 and the MPEG-4 Systems standard ISO/IEC 14496-1:2001 facilitate a wide variety of applications by supporting the representation of audio objects. For the combination of the audio objects, additional information, the so-called scene description, determines the placement in space and time and is transmitted together with the coded audio objects.
For playback, the audio objects are decoded separately and composed using the scene description in order to prepare a single soundtrack, which is then played to the listener.
For efficiency, the MPEG-4 Systems standard ISO/IEC 14496-1:2001 defines a way to encode the scene description in a binary representation, the so-called Binary Format for Scene Description (BIFS). Correspondingly, audio scenes are described using so-called AudioBIFS.
A scene description is structured hierarchically and can be represented as a graph, wherein the leaf nodes of the graph form the separate objects and the other nodes describe the processing, e.g. positioning, scaling or effects. The appearance and behavior of the separate objects can be controlled using parameters within the scene description nodes.
The invention is based on the recognition of the following fact. The above mentioned version of the MPEG-4 Audio standard defines a node named "Sound" which allows spatialization of audio signals in a 3D domain. A further node with the name "Sound2D" only allows spatialization on a 2D screen. The use of the "Sound" node in a 2D graphical player is not specified due to different implementations of the properties in 2D and 3D players. However, from games, cinema and TV applications it is known that it makes sense to provide the end user with a fully spatialized "3D sound" presentation, even if the video presentation is limited to a small flat screen in front of the viewer. This is not possible with the defined "Sound" and "Sound2D" nodes.
In principle, the inventive coding method comprises the generation of a parametric description of a sound source including information which allows spatialization in a 2D coordinate system. The parametric description of the sound source is linked with the audio signals of said sound source. An additional 1D value is added to said parametric description which allows in a 2D visual context a spatialization of said sound source in a 3D domain.
Separate sound sources may be coded as separate audio objects and the arrangement of the sound sources in a sound scene may be described by a scene description having first nodes corresponding to the separate audio objects and second nodes describing the presentation of the audio objects. A field of a second node may define the 3D spatialization of a sound source.
Advantageously, the 2D coordinate system corresponds to the screen plane and the 1D value corresponds to depth information perpendicular to said screen plane.
Furthermore, a transformation of said 2D coordinate system values to said 3-dimensional positions may enable the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
The inventive decoding method comprises, in principle, the reception of an audio signal corresponding to a sound source linked with a parametric description of the sound source. The parametric description includes information which allows spatialization in a 2D coordinate system. An additional 1D value is separated from said parametric description. The sound source is spatialized in a 2D visual context in a 3D domain using said additional 1D value.
Audio objects representing separate sound sources may be separately decoded and a single soundtrack may be composed from the decoded audio objects using a scene description having first nodes corresponding to the separate audio objects and second nodes describing the processing of the audio objects. A field of a second node may define the 3D spatialization of a sound source.
Advantageously, the 2D coordinate system corresponds to the screen plane and said 1D value corresponds to depth information perpendicular to said screen plane.
Furthermore, a transformation of said 2D coordinate system values to said 3-dimensional positions may enable the movement of a graphical object in the screen plane to be mapped to a movement of an audio object in the depth perpendicular to said screen plane.
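As a rough, non-normative sketch of the decoding side (the field layout below is assumed; the actual binary syntax is defined by BIFS): the parametric description is read, the additional 1D depth value is separated, and a 3D position is derived for spatialization.

# Sketch of the decoding side (assumed field names; BIFS defines the
# real binary syntax).
def decode_sound_position(description):
    x_2d, y_2d = description["location"]   # 2D coordinate system (screen plane)
    depth = description.get("depth", 0.0)  # additional 1D value; 0.0 = screen
    # The sound source is spatialized in a 3D domain: the depth value
    # is perpendicular to the screen plane.
    return (x_2d, y_2d, depth)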
The Sound2D node is defined as follows:
Sound2D {
  exposedField SFFloat intensity  1.0
  exposedField SFVec2f location   0, 0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
and the Sound node, which is a 3D node, is defined as follows:
Sound {
  exposedField SFVec3f direction  0, 0, 1
  exposedField SFFloat intensity  1.0
  exposedField SFVec3f location   0, 0, 0
  exposedField SFFloat maxBack    10.0
  exposedField SFFloat maxFront   10.0
  exposedField SFFloat minBack    1.0
  exposedField SFFloat minFront   1.0
  exposedField SFFloat priority   0.0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
In the following, the general term for all sound nodes (Sound2D, Sound and DirectiveSound) will be written in lower case, e.g. 'sound nodes'.
In the simplest case the Sound or Sound2D node is connected via an AudioSource node to the decoder output. The sound nodes contain the intensity and the location information.
From the audio point of view, a sound node is the final node before the loudspeaker mapping. In the case of several sound nodes, the outputs will be summed up. From the systems point of view, the sound nodes can be seen as an entry point for the audio sub-graph. A sound node can be grouped with non-audio nodes in a Transform node that will set its original location.
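A minimal sketch of this summation stage, with output buffers simplified to lists of samples and all names illustrative:

# Sketch: sum the outputs of several sound nodes into a single mix
# before the loudspeaker mapping (illustrative only).
def mix_sound_nodes(node_outputs):
    length = max(len(buf) for buf in node_outputs)
    mix = [0.0] * length
    for buf in node_outputs:
        for i, sample in enumerate(buf):
            mix[i] += sample
    return mix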
With the phaseGroup field of the AudioSource node, it is possible to mark channels that contain important phase relations, as in the case of a "stereo pair", "multichannel" etc. Mixed operation of phase-related and non-phase-related channels is allowed. A spatialize field in the sound nodes specifies whether the sound shall be spatialized or not. This only applies to channels that are not members of a phase group.
The Sound2D node can spatialize the sound on the 2D screen. The standard states that the sound should be spatialized on a scene of size 2 m × 1.5 m at a distance of one meter. This constraint seems ineffective, because the value of the location field is not restricted and the sound can therefore also be positioned outside the screen area.
The Sound and DirectiveSound node can set the location everywhere in the 3D space. The mapping to the existing loudspeaker placement can be done using simple amplitude panning or more sophisticated techniques.
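As an example of the simple case, constant-power amplitude panning to a stereo pair could be sketched as follows; this is one possible mapping, not a technique mandated by the standard, and the function name is illustrative:

import math

# Sketch of constant-power amplitude panning to two loudspeakers.
# pan in [-1.0, 1.0]: -1 = full left, 0 = center, +1 = full right.
def pan_stereo(sample, pan):
    angle = (pan + 1.0) * math.pi / 4.0   # maps pan to 0 .. pi/2
    return sample * math.cos(angle), sample * math.sin(angle)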
Both Sound and Sound2D can handle multichannel inputs and basically have the same functionalities, but the Sound2D node cannot spatialize a sound other than to the front.
A possibility is to add Sound and Sound2D to all scene graph profiles, i.e. add the Sound node to the SF2DNode group.
However, one reason for not including the "3D" sound nodes in the 2D scene graph profiles is that a typical 2D player is not capable of handling 3D vectors (SFVec3f type), as would be required for the Sound direction and location fields.
Another reason is that the Sound node is specially designed for virtual reality scenes with moving listening points and attenuation attributes for distant sound objects. For this purpose, the ListeningPoint node and the Sound maxBack, maxFront, minBack and minFront fields are defined.
According to one embodiment of the invention, the existing Sound2D node is extended, or a new Sound2Ddepth node is defined. The Sound2Ddepth node is similar to the Sound2D node but has an additional depth field.
Sound2Ddepth {
  exposedField SFFloat intensity  1.0
  exposedField SFVec2f location   0, 0
  exposedField SFFloat depth      0.0
  exposedField SFNode  source     NULL
  field        SFBool  spatialize TRUE
}
The intensity field adjusts the loudness of the sound. Its value ranges from 0.0 to 1.0 and specifies a factor that is applied during playback of the sound.
The location field specifies the location of the sound in the 2D scene.
The depth field specifies the depth of the sound in the 2D scene, using the same coordinate system as the location field. The default value is 0.0, which refers to the screen position.
The spatialize field specifies whether the sound shall be spatialized. If this flag is set, the sound shall be spatialized with the maximum sophistication possible.
The same rules for multichannel audio spatialization apply to the Sound2Ddepth node as to the Sound (3D) node.
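Taken together, a 2D player supporting the node might interpret these fields as sketched below; the dictionary representation of the node and the function name are hypothetical:

# Sketch: derive a render position and gain from Sound2Ddepth fields
# (hypothetical in-memory representation of the node).
def render_parameters(node):
    if not node["spatialize"]:
        return None                  # present the sound unspatialized
    x, y = node["location"]          # 2D scene coordinates
    z = node["depth"]                # 0.0 corresponds to the screen position
    gain = node["intensity"]         # playback factor in [0.0, 1.0]
    return (x, y, z), gain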
Using the Sound2D node in a 2D scene allows surround sound to be presented as the author recorded it. It is not possible to spatialize a sound other than to the front. Spatializing means moving the location of a monophonic signal due to user interaction or scene updates.
With the Sound2Ddepth node it is possible to spatialize a sound also in the back, at the side or above the listener, if an audio presentation system has the capability to present such features.
The invention is not restricted to the above embodiment where the additional depth field is introduced into the Sound2D node. Also, the additional depth field could be inserted into a node hierarchically arranged above the Sound2D node.
According to a further embodiment, a mapping of the coordinates is performed. An additional field dimensionMapping in the Sound2Ddepth node defines a transformation, e.g. as a 2-row by 3-column matrix, used to map the 2D context coordinate system (ccs) from the ancestor's transform hierarchy to the origin of the node.
The node's coordinate system (ncs) will be calculated as follows:
ncs=ccs×dimensionMapping.
The location of the node is a 3-dimensional position, merged from the 2D input vector location and the depth value, i.e. {location.x, location.y, depth}, with regard to ncs.
Example: the node's coordinate system context is (xi, yi) and dimensionMapping is (1, 0, 0, 0, 0, 1). This leads to ncs = (xi, 0, yi), which enables the movement of an object in the y-dimension to be mapped to a movement of the audio object in depth.
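This mapping can be reproduced in a few lines. The sketch below multiplies the 2D row vector ccs with the 2-row by 3-column matrix and checks the example values from above (function name illustrative):

# Sketch: ncs = ccs x dimensionMapping, with ccs a 2D row vector and
# dimensionMapping given as six floats (two rows of three columns).
def map_dimensions(ccs, dm):
    rows = [dm[0:3], dm[3:6]]
    return tuple(ccs[0] * rows[0][c] + ccs[1] * rows[1][c] for c in range(3))

# dimensionMapping (1, 0, 0, 0, 0, 1) maps (xi, yi) to (xi, 0, yi):
assert map_dimensions((2.0, 5.0), [1, 0, 0, 0, 0, 1]) == (2.0, 0.0, 5.0)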
The field dimensionMapping may be defined as MFFloat. The same functionality could also be achieved by using the field data type SFRotation, which is another MPEG-4 data type.
The invention allows the spatialization of the audio signal in a 3D domain, even if the playback device is restricted to 2D graphics.