An apparatus configured, based on virtual reality content for viewing in virtual reality, the virtual reality content comprising visual content for display in a three dimensional virtual reality space and spatial audio content comprising audio for presentation such that it is audibly perceived to originate from one or more particular directions in the virtual reality space corresponding to one or more points or regions in the visual content, the virtual reality content defining at least a ground level of said virtual reality space; and based on display of a bird's-eye view of the virtual reality space to a user comprising a view of the visual content substantially downward towards said ground level from a point of view location in said virtual reality space elevated from said ground level; to provide for presentation of said spatial audio content with a spatial audio modification, the spatial audio modification configured to modify the one or more particular directions from which the user perceives the spatial audio as being heard in an upward direction relative to said ground level at least for spatial audio content having a particular direction that is outside a field of view of said bird's-eye view.
|
14. A method comprising:
receiving a virtual reality content comprising: (1) a visual content configured for display in a virtual reality space; and (2) a spatial audio content configured for a presentation that is audibly perceived by a user to originate from one or more directions in the virtual reality space, wherein the one or more directions correspond to at least one or more regions in the visual content relative to a point of view location, wherein the virtual reality content defines at least a ground level of the virtual reality space with the one or more directions from which the user audibly perceives the spatial audio content to originate being substantially in a plane corresponding to the ground level;
generating a display of a bird's-eye view of the virtual reality space comprising a view of the visual content downward towards the ground level from the point of view location, wherein the point of view location in the virtual reality space is elevated above the ground level; and
rendering, based on the display of the bird's-eye view and the virtual reality content, a presentation of the spatial audio content with a spatial audio modification configured to modify the one or more directions from which the user perceives the spatial audio content as being heard, wherein the spatial audio modification causes the spatial audio content to be heard from an upward direction relative to the ground level, wherein the spatial audio content is at least partially associated with a direction that is outside a field of view of the bird's-eye view.
20. A non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following:
receive a virtual reality content comprising: (1) a visual content configured for display in a virtual reality space; and (2) a spatial audio content configured for a presentation that is audibly perceived by a user to originate from one or more directions in the virtual reality space, wherein the one or more directions correspond to at least one or more regions in the visual content relative to a point of view location, wherein the virtual reality content defines at least a ground level of the virtual reality space with the one or more directions from which the user audibly perceives the spatial audio content to originate being substantially in a plane corresponding to the ground level;
generate a display of a bird's-eye view of the virtual reality space comprising a view of the visual content downward towards the ground level from the point of view location, wherein the point of view location in the virtual reality space is elevated above the ground level; and
render, based on the display of the bird's-eye view and the virtual reality content, a presentation of the spatial audio content with a spatial audio modification configured to modify the one or more directions from which the user perceives the spatial audio content as being heard, wherein the spatial audio modification causes the spatial audio content to be heard from an upward direction relative to the ground level, wherein the spatial audio content is at least partially associated with a direction that is outside a field of view of the bird's-eye view.
1. An apparatus comprising:
at least one processor; and
at least one memory including computer program code,
the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
receive a virtual reality content comprising: (1) a visual content configured for display in a virtual reality space; and (2) a spatial audio content configured for a presentation that is audibly perceived by a user to originate from one or more directions in the virtual reality space, wherein the one or more directions correspond to at least one or more regions in the visual content relative to a point of view location, wherein the virtual reality content defines at least a ground level of the virtual reality space with the one or more directions from which the user audibly perceives the spatial audio content to originate being substantially in a plane corresponding to the ground level;
generate a display of a bird's-eye view of the virtual reality space comprising a view of the visual content downward towards the ground level from the point of view location, wherein the point of view location in the virtual reality space is elevated above the ground level; and
render, based on the display of the bird's-eye view and the virtual reality content, a presentation of the spatial audio content with a spatial audio modification configured to modify the one or more directions from which the user perceives the spatial audio content as being heard, wherein the spatial audio modification causes the spatial audio content to be heard from an upward direction relative to the ground level, wherein the spatial audio content is at least partially associated with a direction that is outside a field of view of the bird's-eye view.
2. The apparatus according to
3. The apparatus according to
4. The apparatus according to
5. The apparatus according to
6. The apparatus according to
7. The apparatus according to
8. The apparatus according to
9. The apparatus according to
when the elevation of the point of view location is below an in-view-modification threshold, the spatial audio modification is applied to spatial audio content having a direction that is outside the field of view of the bird's-eye view; and
when the elevation of the point of view location is above the in-view-modification threshold, the spatial audio modification is applied to spatial audio content having a direction that is within and outside the field of view of the bird's-eye view.
10. The apparatus according to
11. The apparatus according to
12. The apparatus according to
13. The apparatus according to
15. The method according to
16. The method according to
17. The method according to
18. The method according to
19. The method according to
|
This application claims priority to PCT Application No. PCT/EP2018/077067, filed on Oct. 5, 2018, which claims priority to EP Application No. 17195576.8, filed on Oct. 10, 2017, each of which is incorporated herein by reference in its entirety.
The present disclosure relates to the field of virtual reality and, in particular, to the field of spatial audio presentation when providing a bird's eye view.
The size of virtual reality content, and of the spaces it defines, is growing as content producers capture and generate richer and larger virtual worlds for a user to explore. As virtual reality spaces grow, they may become more difficult for a user to navigate. Ensuring such virtual reality spaces are easy and intuitive to navigate is important.
The listing or discussion of a prior-published document or any background in this specification should not necessarily be taken as an acknowledgement that the document or background is part of the state of the art or is common general knowledge. One or more aspects/examples of the present disclosure may or may not address one or more of the background issues.
In a first example aspect there is provided an apparatus comprising:
In one or more examples, the spatial audio modification may only be applied when the point of view location is elevated above the ground level by more than a threshold amount. In one or more examples, the spatial audio modification may only be applied when the view from the elevated point of view location is within an angular threshold of directly downwards towards said ground level. In one or more examples, the spatial audio content has a default direction from which it is heard, which may correspond to said points or regions in the visual imagery, and when the spatial audio modification is not applied, the spatial audio content is heard from said default direction.
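The two example conditions above (an elevation threshold and an angular threshold relative to directly downwards) can be sketched as a simple predicate. This is a minimal illustration only: the function name, the parameter names and the z-up coordinate convention are assumptions made for the sketch and are not defined by the disclosure.

```python
import math


def should_apply_modification(pov_elevation, view_dir, min_elevation, max_angle_deg):
    """Return True when both example conditions hold.

    pov_elevation: height of the point of view location above ground level.
    view_dir: (x, y, z) viewing direction, z pointing up (assumed convention).
    min_elevation: hypothetical elevation threshold.
    max_angle_deg: hypothetical angular threshold from directly downwards.
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0.0:
        return False
    # Angle between the view direction and straight down, i.e. (0, 0, -1).
    cos_angle = max(-1.0, min(1.0, -view_dir[2] / norm))
    angle_deg = math.degrees(math.acos(cos_angle))
    return pov_elevation > min_elevation and angle_deg <= max_angle_deg
```

Under these assumptions, a view looking straight down from an elevation of 10 units with a threshold of 5 units would qualify, whereas a horizontal view from the same location would not.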
In one or more embodiments, the spatial audio content is defined by one or more spatial audio objects, each spatial audio object having an audio track associated therewith and a spatial audio object location in the virtual reality space, the spatial audio object location comprising said location of the one of the point or region and comprising the location from which the audio track is perceived to originate when presented to the user relative to the point of view location of the user in the virtual reality space, the spatial audio modification configured to modify the spatial audio objects by providing for elevation of each of the spatial audio object locations from the ground level to thereby provide the modification of the particular direction from which the user perceives the audio of the audio track of the or each spatial audio object as being heard in an upward direction relative to said ground level.
Accordingly, in one or more examples, the spatial audio object location is modified away from its corresponding point or region in the visual content should the point or region in the visual content be visible in the bird's-eye view. In one or more examples, the spatial audio object locations are modified, by the spatial audio modification, in terms of their position in a plane aligned at the ground level in addition to the elevation.
In one or more embodiments, the spatial audio modification is configured to provide for elevation of one or more of or each of the spatial audio object locations from the ground level to elevated spatial audio object locations as a function of a distance of the spatial audio object location from the bird's eye view.
In one or more examples, the spatial audio modification is also a function of the elevation of the point of view location.
In one or more examples, the spatial audio modification is also a function of one or more of a roll, pitch and yaw of the bird's eye view. In one or more examples, based on one or more of roll and pitch in the bird's eye view, the change in direction provided by the spatial audio modification is reduced at least for spatial audio content having a particular direction that is within the field of view of said bird's-eye view by virtue of one or both of the roll and pitch of the bird's eye view.
In one or more examples, the function is configured to elevate the spatial audio object locations to corresponding elevated spatial audio object locations, the elevated spatial audio object locations more elevated the greater the distance of the spatial audio object location from the bird's eye view.
In one or more examples, the distance may be measured between the respective spatial audio object location and one or more of i) the point of view location, ii) the ground location, iii) a centre of the field of view at the ground level, iv) an edge of the field of view, or v) any other location falling within the field of view of the bird's eye view in the virtual reality space.
In one or more embodiments, the function is configured to elevate the spatial audio object locations to corresponding elevated spatial audio object locations, the elevated spatial audio object locations having no more than a maximum elevation threshold, the maximum elevation threshold based on the elevation of the point of view location.
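One possible form of such a function is sketched below, purely as an illustration. It assumes that distance is measured in the ground plane from the centre of the field of view (one of the options listed above), that elevation grows linearly with that distance, and that the maximum elevation threshold is a fixed fraction of the elevation of the point of view location. The linear shape and the `gain` and `max_fraction` parameters are hypothetical choices, not taken from the disclosure.

```python
import math


def elevated_height(obj_xy, view_centre_xy, pov_elevation, gain=0.5, max_fraction=0.8):
    """Return a raised height for a spatial audio object located at ground level.

    obj_xy: (x, y) ground-plane position of the spatial audio object.
    view_centre_xy: (x, y) centre of the field of view at ground level.
    pov_elevation: elevation of the point of view location.
    gain, max_fraction: hypothetical tuning parameters.
    """
    distance = math.hypot(obj_xy[0] - view_centre_xy[0],
                          obj_xy[1] - view_centre_xy[1])
    raised = gain * distance              # further away -> more elevated
    cap = max_fraction * pov_elevation    # maximum elevation threshold
    return min(raised, cap)
```

With these example values, an object 10 units from the view centre would be raised by 5 units, while a very distant object would be held at 80% of the elevation of the point of view location.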
In one or more embodiments, the spatial audio modification is applied only to spatial audio objects having spatial audio object locations within a threshold distance of the ground level.
In one or more examples, spatial audio objects not arranged within the threshold distance of the ground level may also be modified but perhaps with a different spatial audio modification.
In one or more examples, spatial audio objects having a spatial audio location above said threshold distance of the ground level may not be modified by the spatial audio modification.
In one or more embodiments, the spatial audio modification is further configured to modify the volume at which the audio track of each spatial audio object is presented to the user.
In one or more examples, the spatial audio modification is configured to modify the volume at which the audio track of each modified spatial audio object is presented to the user as a function of a distance between any one of the ground location, the point of view location or any location based thereon and either the modified or unmodified spatial audio object location.
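The disclosure does not specify the form of the volume function. As one hedged illustration, a conventional inverse-distance gain between a reference location (for example the point of view location) and a spatial audio object location might look as follows; the 1/distance law and the `reference_distance` parameter are assumptions made for the sketch.

```python
import math


def distance_gain(reference_pos, object_pos, reference_distance=1.0):
    """Linear gain that falls off as 1/distance beyond a reference distance.

    reference_pos, object_pos: (x, y, z) positions in the virtual reality space.
    reference_distance: hypothetical distance within which the gain stays at 1.0.
    """
    d = math.dist(reference_pos, object_pos)
    if d <= reference_distance:
        return 1.0
    return reference_distance / d
```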
In one or more embodiments, the presentation of said spatial audio content with the spatial audio modification is conditional on the elevation of the point of view location being above a first elevation threshold.
In one or more examples, a second elevation threshold is provided at an elevation between the ground location and the first elevation threshold, and wherein when a bird's eye view is provided having an elevation between the second and the first elevation thresholds, the effect of the spatial audio modification is reduced relative to when applied above the first elevation threshold. In one or more examples, based on the user being provided with a bird's eye view having an elevation less than the second elevation threshold, the apparatus is caused not to apply the spatial audio modification.
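The two-threshold behaviour can be sketched as a strength factor between 0 and 1. The disclosure states only that the effect is reduced between the second and first thresholds; the linear ramp used here, and the function and parameter names, are assumptions for illustration.

```python
def modification_strength(elevation, second_threshold, first_threshold):
    """Return a factor in [0, 1] scaling the spatial audio modification.

    Below the second threshold: no modification (0.0).
    At or above the first threshold: full modification (1.0).
    In between: reduced effect, here an assumed linear ramp.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be above second_threshold")
    if elevation < second_threshold:
        return 0.0
    if elevation >= first_threshold:
        return 1.0
    return (elevation - second_threshold) / (first_threshold - second_threshold)
```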
In one or more embodiments, when the elevation of the point of view location is below an in-view-modification threshold, the spatial audio modification is only applied to spatial audio content having a particular direction that is outside the field of view of said bird's-eye view; and
In one or more embodiments, based on movement of the bird's eye view to a different field of view that thereby includes, at least in part, a different part of the visual content, the apparatus is caused to provide for removal of the spatial audio modification at least for spatial audio content having a particular direction that is within the different field of view of said bird's-eye view.
In one or more embodiments, the application of the spatial audio modification to spatial audio objects is dependent on a category assigned to said spatial audio objects.
In one or more examples, the category defines whether or not the user is within a predetermined interaction distance of the spatial audio object or the point or region of visual content associated therewith.
In one or more embodiments, based on one or both of a change in elevation of the bird's eye view and a change from a view provided substantially at the ground level to a bird's eye view at a higher elevation, the apparatus is caused to provide for gradual application of the spatial audio modification over a gradual-application time greater than the time to complete the change.
In one or more examples, the provision of the gradual application of the spatial audio modification is conditional on the change in elevation or the change in view occurring above a threshold rate.
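A minimal sketch of the gradual application follows. It assumes a linear ramp, and it assumes that when the change is not faster than the threshold rate the modification simply tracks the change itself; neither behaviour is specified by the disclosure, and all names are hypothetical.

```python
def applied_fraction(elapsed, change_duration, gradual_time, change_rate, rate_threshold):
    """Return the fraction (0..1) of the modification applied after `elapsed` seconds.

    change_duration: time taken to complete the change in elevation or view.
    gradual_time: gradual-application time; must exceed change_duration.
    change_rate, rate_threshold: rate of the change and the hypothetical
        threshold rate above which the gradual application is used.
    """
    if gradual_time <= change_duration:
        raise ValueError("gradual_time must be greater than change_duration")
    # Fast change: stretch the ramp over the longer gradual-application time.
    # Slow change (assumption): follow the change itself.
    ramp = gradual_time if change_rate > rate_threshold else change_duration
    if ramp <= 0.0 or elapsed >= ramp:
        return 1.0
    return max(0.0, elapsed / ramp)
```

For example, with a one-second change made faster than the threshold rate and a four-second gradual-application time, a quarter of the modification would be in effect at the moment the change completes.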
In one or more embodiments, the virtual reality content comprises six degrees of freedom virtual reality content in which the user is free to change the viewing location in the virtual reality space as well as their viewing direction in the virtual reality space. In one or more embodiments, the virtual reality content and the spatial audio content comprises captured real-world content.
In a further aspect there is provided a method, the method comprising:
In a further aspect there is provided a computer readable medium comprising computer program code stored thereon, the computer readable medium and computer program code being configured to, when run on at least one processor, perform the method of:
In a further aspect there is provided an apparatus, the apparatus comprising means configured to; based on virtual reality content for viewing in virtual reality, the virtual reality content comprising visual content for display in a three dimensional virtual reality space and spatial audio content comprising spatial audio for presentation such that it is audibly perceived to originate from one or more particular directions in the virtual reality space corresponding to one or more points or regions in the visual content relative to a point of view location, the virtual reality content defining at least a ground level of said virtual reality space; and based on display of a bird's-eye view of the virtual reality space to a user comprising a view of the visual content downward towards said ground level from the point of view location in said virtual reality space that is elevated from said ground level;
The present disclosure includes one or more corresponding aspects, examples or features in isolation or in various combinations whether or not specifically stated (including claimed) in that combination or in isolation. Corresponding means and corresponding functional units (e.g., function enabler, spatial audio presenter, spatial audio modifier, movement tracker, display device) for performing one or more of the discussed functions are also within the present disclosure.
Corresponding computer programs for implementing one or more of the methods disclosed are also within the present disclosure and encompassed by one or more of the described examples.
The above summary is intended to be merely exemplary and non-limiting.
A description is now given, by way of example only, with reference to the accompanying drawings, in which:
Virtual reality (VR) may use a VR display comprising a headset, such as glasses or goggles or virtual retinal display, or one or more display screens that surround a user to provide the user with an immersive virtual experience. A virtual reality apparatus, which may or may not include the VR display, may provide for presentation of multimedia VR content representative of a virtual reality scene to a user to simulate the user being present within the virtual reality scene. Accordingly, in one or more examples, the VR apparatus may provide signalling to a VR display for display of the VR content to a user while in one or more other examples, the VR apparatus may be part of the VR display, e.g. part of the headset. The virtual reality scene may therefore comprise the VR content displayed within a three-dimensional virtual reality space so that the user feels immersed in the scene, as if they were there, and may look around the VR space at the VR content presented around them. In one or more examples, the VR apparatus may provide signalling to speakers or headphones for presentation of VR content comprising spatial audio. The virtual reality scene may therefore comprise the VR content audibly presented so that the user feels immersed in an audio scene, as if they were there, and may look around and hear the audio presented around them. The virtual reality scene may replicate a real world scene to simulate the user being physically present at a real world location or the virtual reality scene may be computer generated or a combination of computer generated and real world multimedia content. Thus, the VR content may be considered to comprise the imagery (e.g. static or video imagery), audio and/or accompanying data from which a virtual reality scene may be generated for display. The VR apparatus may therefore provide the VR scene by generating the virtual, three-dimensional, VR space in which to display the VR content. 
The virtual reality scene may be provided by a panoramic video (such as a panoramic live broadcast), comprising a video having a wide or 360° field of view (or more, such as above and/or below a horizontally oriented field of view). A panoramic video may have a wide field of view in that it has a spatial extent greater than a field of view of a user or greater than a field of view with which the panoramic video is intended to be displayed.
The VR content provided to the user may comprise live or recorded images of the real world, captured by a VR content capture device, for example. An example VR content capture device comprises a Nokia Technologies OZO device. As the VR scene is typically larger than a portion a user can view with the VR display, the VR apparatus may provide, for display on the VR display, a virtual reality view of the VR scene to a user, the VR view showing only a spatial portion of the VR content that is viewable at any one time. The VR apparatus may provide for panning around of the VR view in the VR scene based on movement of a user's head and/or eyes.
The virtual reality content may comprise, and a VR apparatus presenting said VR content may provide, predefined-viewing-location VR or free-viewing-location VR. In predefined-viewing-location VR, the location of the user in the virtual reality space may be fixed or follow a predefined path. Accordingly, a user may be free to change their viewing direction with respect to the virtual reality imagery provided for display around them in the virtual reality space, but they may not be free to arbitrarily change their viewing location in the VR space to explore the VR space. Thus, the user may experience such VR content from a fixed point of view or viewing location (or a limited number of locations based on where the VR content capture devices were located in the scene). In some examples of predefined-viewing-location VR the imagery may be considered to move past them. In predefined-viewing-location VR content captured of the real world, the user may be provided with the point of view of the VR content capture device. Predefined-viewing-location VR content may provide the user with three degrees of freedom in the VR space comprising rotation of the viewing direction around any one of x, y and z axes and may therefore be known as three degrees of freedom VR (3DoF VR).
In free-viewing-location VR, the VR content and VR apparatus presenting said VR content may enable a user to be free to explore the virtual reality space. Thus, the user may be provided with a free point of view or viewing location in the virtual reality space. Free-viewing-location VR is also known as six degrees of freedom (6DoF) VR or volumetric VR to those skilled in the art. Thus, in 6DoF VR the user may be free to look in different directions around the VR space by modification of their viewing direction and also free to change their viewing location (their virtual location) in the VR space by translation along any one of x, y and z axes. The movement available in a 6DoF virtual reality space may be divided into two categories: rotational and translational movement (with three degrees of freedom each). Rotational movement enables a user to turn their head to change their viewing direction. The three rotational movements are around x-axis (roll), around y-axis (pitch), and around z-axis (yaw). Translational movement means that the user may also change their point of view in the space to view the VR space from a different virtual location, i.e., move along the x, y, and z axes according to their wishes. The translational movements may be referred to as surge (x), sway (y), and heave (z) using the terms derived from ship motions.
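By way of illustration only, the six degrees of freedom described above can be represented as a simple pose record. The class and method names are hypothetical, and summing angles per axis is a simplification made for the sketch; it is not a full composition of three-dimensional rotations.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose6DoF:
    # Translational degrees of freedom: surge (x), sway (y), heave (z).
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    # Rotational degrees of freedom in degrees: about x (roll), y (pitch), z (yaw).
    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0

    def translated(self, surge=0.0, sway=0.0, heave=0.0):
        """Return a new pose moved along the x, y and z axes."""
        return replace(self, x=self.x + surge, y=self.y + sway, z=self.z + heave)

    def rotated(self, roll=0.0, pitch=0.0, yaw=0.0):
        """Return a new pose with per-axis angles added (simplified model)."""
        return replace(
            self,
            roll=(self.roll + roll) % 360.0,
            pitch=(self.pitch + pitch) % 360.0,
            yaw=(self.yaw + yaw) % 360.0,
        )
```

In this model, a 3DoF presentation would use only `rotated`, while a 6DoF presentation would also allow `translated`, for example a heave to an elevated point of view location.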
One or more examples described herein relate to 6DoF virtual reality content in which the user is at least substantially free to move in the virtual space either by user-input through physically moving or, for example, via a dedicated user interface (UI).
Spatial audio comprises audio presented in such a way to a user that it is perceived to originate from a particular location, as if the source of the audio was located at that particular location. Thus, virtual reality content may be provided with spatial audio having directional properties, such that the audio is perceived to originate from a point in the VR space, which may be linked to the imagery of the VR content.
The spatial positioning of the spatial audio may be provided by the degree to which audio is presented to each channel of a multichannel audio arrangement, as well as by 3D audio effects, such as those that utilise a head related transfer function to create a spatial audio space in which audio can be positioned for presentation to a user. Spatial audio may be presented by headphones by using head-related-transfer-function (HRTF) filtering techniques or, for loudspeakers, by using vector-base-amplitude panning techniques to position the perceived aural origin of the audio content. The user or the headphones may be head-tracked so that movements of the user's head can be accounted for in the presentation of the spatial audio so that the audio is heard from the appropriate directions.
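As a hedged illustration of the loudspeaker case, the following sketch computes two-dimensional vector-base-amplitude-panning gains for a single pair of loudspeakers. It follows the standard pairwise formulation (express the source direction as a weighted sum of the two loudspeaker direction vectors, then normalise for constant power); the function name and the angle convention (degrees in the horizontal plane, 0 at the front) are assumptions of the sketch.

```python
import math


def vbap_pair_gains(source_deg, speaker1_deg, speaker2_deg):
    """Return constant-power gains (g1, g2) for a source between two loudspeakers."""
    p = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    l1 = (math.cos(math.radians(speaker1_deg)), math.sin(math.radians(speaker1_deg)))
    l2 = (math.cos(math.radians(speaker2_deg)), math.sin(math.radians(speaker2_deg)))
    # Solve g1 * l1 + g2 * l2 = p using Cramer's rule.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-9:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Normalise so that g1^2 + g2^2 = 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A negative gain indicates that the source direction lies outside the arc spanned by the pair, in which case a different pair would normally be selected.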
The spatial audio content may define an aural scene that defines the spatial, three-dimensional arrangement of audio in a scene, which is presentable to recreate the three-dimensional audible experience of being in that scene. Thus, the aural scene may be considered representative of a three-dimensional audio environment in which audio is perceived to be heard from different directions in the three-dimensional audio environment as defined by the spatial audio content.
Spatial audio content may be represented in many different ways. In one or more examples, the spatial audio content may be considered to define an aural scene defined by spatial audio objects that have an associated three-dimensional position in the virtual reality space. The audio objects may represent audio sources in the scene that have been captured or may be computer generated sources of audio. A spatial audio object may comprise an object that serves as a source of spatial audio in the virtual reality space. Accordingly, when presented to a user, the position of the spatial audio objects is used to render the spatial audio content associated with each object such that the user perceives the arrangement of the audio objects in the aural scene. In one or more other examples, the spatial audio may be considered to comprise an aural scene encoded using ambisonic processing techniques. Ambisonic processing may not define specific directions from which the audio is heard, but may be captured such that information representing the three-dimensional positioning of the audio is captured in the way the audio is captured. For example, ambisonic audio capture is done using an ‘ambisonic microphone’ comprising a number of microphone capsules. In a first order ambisonic case, the microphone is used to capture four signals W (omni-directional), X, Y and Z. During playback through a loudspeaker array for example, the signal rendered from each loudspeaker is a linear combination of the above signals which can recreate the aural scene.
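To make the first-order ambisonic case concrete, the sketch below encodes a mono sample into the four signals W, X, Y and Z and forms one loudspeaker feed as a linear combination of them. It assumes the traditional B-format convention (W scaled by 1/√2, x forward, y left, z up) and uses a simple virtual-cardioid decode; these choices and the function names are assumptions of the sketch rather than details from the disclosure.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample arriving from (azimuth, elevation) into (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                 # omni-directional component
    x = sample * math.cos(az) * math.cos(el)    # front-back
    y = sample * math.sin(az) * math.cos(el)    # left-right
    z = sample * math.sin(el)                   # up-down
    return w, x, y, z


def decode_virtual_cardioid(bformat, speaker_azimuth_deg, speaker_elevation_deg):
    """Return a loudspeaker feed as a linear combination of W, X, Y and Z."""
    w, x, y, z = bformat
    az = math.radians(speaker_azimuth_deg)
    el = math.radians(speaker_elevation_deg)
    dx = math.cos(az) * math.cos(el)
    dy = math.sin(az) * math.cos(el)
    dz = math.sin(el)
    # Equivalent to a cardioid microphone pointed at the loudspeaker direction.
    return 0.5 * (math.sqrt(2.0) * w + x * dx + y * dy + z * dz)
```

With this decode, a source encoded at the front is reproduced at full level by a front loudspeaker and is suppressed by a loudspeaker directly behind.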
In one or more other examples, the captured spatial audio may be considered to comprise an aural scene defined by way of spatial audio coding. For example, the audio of a scene is recorded with a microphone array. For every predetermined time frame (10 ms, for example) the microphone signals are split into frequency bands. For each frequency band a direction is determined (i.e. the direction of the sound at this frequency band). Accordingly, the audio is associated with directional information based on the spatial audio coding. During rendering the audio corresponding to the different frequencies may be played from the determined directions using vector base amplitude panning (VBAP), for example.
In one or more examples, the spatial audio content may be considered to include information representing the directions towards/locations of the audio sources from a point of view location or “listening position” or information that provides for determination of the directions towards/locations of the audio sources.
By whichever technique the spatial audio content is represented, it may be presented such that a user may hear different audio perceived from different directions in the virtual reality space.
As virtual reality spaces increase in size, navigating around said spaces, particularly in 6DoF virtual reality, can be challenging. In one or more examples, the virtual reality space includes a ground level, similar to a real-world ground level. The ground level may comprise a plane, or one or more interconnected planes, in the three-dimensional virtual reality space configured to provide a default level for the user to explore the virtual reality space. If a plurality of interconnected planes are provided, they may be at angles to one another to simulate hills or inclines, or the planes may be at different levels to simulate floors in a building. In one or more embodiments, the ground level provides a reference plane from which to measure elevation in the virtual reality space. A user may be given a point of view location in said virtual reality space substantially at said ground level. Accordingly, a VR apparatus may provide the user with a virtual reality view of the virtual reality space at a height above the ground level to replicate the user standing in said virtual reality space. In at least some examples, this height may depend on the user's height giving a more natural, personalized spatial audio (and visual) percept for the user. The audio of the spatial audio may be heard from one or more of a plurality of directions but may specifically be provided from directions substantially in a plane corresponding to ground level. Accordingly, for many instances of VR content, the spatial audio content may be heard from directions in said plane which may appear to the user as being in front, behind and to the left and right of them and directions in between.
In one or more examples, it may be advantageous to switch to a bird's eye view of the virtual reality space to view the visual content from above, which may assist in navigating the space or in providing a visual overview of the virtual reality content in the space.
While a bird's eye view may be visually useful, the presentation of spatial audio, at least for spatial audio where the source of the audio is perceived to originate from ground level, may be less useful. The bird's eye view may be provided by providing the user with a virtual reality view of the virtual reality space from an elevated point of view location, downward towards the ground level. From this elevated point of view location, some or all of the spatial audio (that has its source substantially located at ground level) may be heard from a direction in front of the user, i.e. from the ground level which, as the virtual reality view is provided looking substantially downward, is presented in front of the user. It will be appreciated that the immersive quality of spatial audio may be lost when providing a bird's eye view. It will also be appreciated that the ability for a user to audibly identify the direction towards a source of audio is hampered when providing a bird's eye view because much of the audio will be heard from substantially the same direction—in front of the user given the downward directed virtual reality view towards the ground level.
With reference to the example of
Accordingly, from the point of view location that provides the bird's eye view, the spatial audio content would normally be heard from directions towards the ground (apart from, perhaps, audio from birds or aircraft in the virtual reality space). However, the apparatus 101 provides for application of the spatial audio modification which may temporarily modify the presentation of the spatial audio so that rather than presenting the spatial audio such that it is heard from a direction corresponding to the visual imagery, the direction from which it is perceived relative to the user is modified upwards away from the ground level. Accordingly, as will be appreciated, when provided with a bird's eye view looking downward, the spatial audio that would previously be perceived as generally in front of the user at ground level, will, by virtue of the modification be perceived to be heard from directions above, below, left and right of a user relative to their downward view. Thus, put another way, the spatial audio modification widens a field of hearing from a narrow range of directions downward towards the ground level to a wider range of directions around the user. A technical effect of such a modification is to overcome the limited directions from which spatial audio (having its audio “origin” at ground level) is presented when a bird's eye view is provided, which in turn may improve the ability for a user to audibly navigate the virtual reality space. Thus, the spatial audio modification may allow for more useful user perception of the direction towards the perceived origin of spatial audio and therefore allow for more accurate user panning of the bird's eye view to locate an audible object of interest in said virtual reality content.
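The narrowing and subsequent widening of the field of hearing can be illustrated with simple geometry. The sketch below computes the angle between a straight-down view axis and the direction towards a spatial audio source, given the source's horizontal distance and the elevations involved. The function name and the assumption of a view directed exactly downwards are illustrative only.

```python
import math


def offset_from_view_axis_deg(horizontal_distance, pov_elevation, source_elevation=0.0):
    """Angle in degrees between a straight-down view axis and the source direction.

    horizontal_distance: ground-plane distance from the point below the user to the source.
    pov_elevation: elevation of the point of view location above ground level.
    source_elevation: elevation of the (possibly modified) spatial audio source.
    """
    return math.degrees(math.atan2(horizontal_distance, pov_elevation - source_elevation))
```

Under these assumptions, a ground-level source 10 units away horizontally is about 5.7 degrees off the view axis when heard from an elevation of 100 units, so most ground-level audio clusters in front of the user. Raising the same source to the elevation of the point of view location moves it to 90 degrees, and raising it further moves it beyond 90 degrees.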
The apparatus 101 may comprise or be connected to a processor 101A and a memory 101B configured to execute computer program code. The apparatus 101 may have only one processor 101A and one memory 101B but it will be appreciated that other embodiments may utilise more than one processor and/or more than one memory (e.g. same or different processor/memory types). Further, the apparatus 101 may be an Application Specific Integrated Circuit (ASIC).
The processor 101A may be a general purpose processor dedicated to executing/processing information received from other components, such as from a content store 110 and the respective apparatuses 101, 106 in accordance with instructions stored in the form of computer program code in the memory. The output signalling generated by such operations of the processor is provided onwards to further components, such as to a VR apparatus 102 and/or a VR headset 103 comprising a VR display 104 and headphones 105.
The memory 101B (not necessarily a single memory unit) is a computer readable medium (solid state memory in this example, but may be other types of memory such as a hard drive, ROM, RAM, Flash or the like) that stores computer program code. This computer program code stores instructions that are executable by the processor, when the program code is run on the processor. The internal connections between the memory and the processor can be understood to, in one or more example embodiments, provide an active coupling between the processor and the memory to allow the processor to access the computer program code stored on the memory.
In this example the respective processors and memories are electrically connected to one another internally to allow for electrical communication between the respective components. In this example the components are all located proximate to one another so as to be formed together as an ASIC, in other words, so as to be integrated together as a single chip/circuit that can be installed into an electronic device. In some examples one or more or all of the components may be located separately from one another.
The apparatus 101, in this example, forms part of the virtual reality apparatus 102 for presenting the virtual reality content to a user. The apparatus 101 and the VR apparatus 102 may share the processor 101A and memory 101B. In other examples, the apparatuses 101, 102 may have different processors and/or memories. In other examples the apparatus 101 may be physically independent of the VR apparatus 102 and may be in communication therewith to provide the spatial audio modification.
The apparatus 101 and/or VR apparatus 102 may receive input signalling from a head tracker 106, which may track the orientation of a user's head and, in some examples, their physical position to provide signalling to modify the direction of view provided to the user and/or to modify their point of view location in the virtual reality space. It will be appreciated that other input means may be used to control the direction of view provided to the user or their point of view location in the virtual reality space.
In the examples that follow the spatial audio content may be encoded in any form, such as ambisonic audio or as spatial audio objects. However, for ease of understanding, in the description that follows the spatial audio content is described in terms of spatial audio objects that define a source location of an audio track such that the user perceives the audio of the audio track from the spatial audio object based on the relative position of the user and the position of the audio object in the virtual reality space. Thus, the audio sources 203-207 may comprise spatial audio objects. The audio of a first of the audio objects 203, as presented to the user, will be perceived by the user as from direction 208. Likewise, the audio of the second audio object 204 will be heard from direction 209, the third audio object 205 will be heard from direction 210, the fourth audio object 206 will be heard from direction 211 and the fifth audio object 207 will be heard from direction 212.
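As an illustration of how a perceived direction may follow from the relative position of the user and a spatial audio object, the following is a minimal sketch only. It assumes a Cartesian coordinate system with Z measured upward from the ground level and reports the direction in world coordinates rather than relative to the user's head orientation. The function name `perceived_direction` and the coordinate convention are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def perceived_direction(listener, source):
    """Return (azimuth_deg, elevation_deg) from listener to source.

    Both positions are (x, y, z) tuples, with z measured upward from
    the ground level (an assumed convention). A negative elevation
    means the source is heard from below the listener.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    horizontal = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, horizontal))
    return azimuth, elevation


# A point of view location 100 units above ground and a ground-level
# audio object 100 units away horizontally: heard 45 degrees below.
az, el = perceived_direction((0.0, 0.0, 100.0), (100.0, 0.0, 0.0))
```

With an elevated point of view location, every ground-level object yields a negative elevation angle, which is the narrowing of perceived directions that the spatial audio modification is intended to counter.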
The example of
As will be described in more detail below with reference to later figures, the spatial audio modification provides for temporary modification of the directions 208-212 in an upwards direction 304 away from the ground level 301, such that the spatial audio is heard from a different direction shown by direction 210M for audio object 205. Thus, at least when the bird's eye view of
The example of
The example of
The example of
In the example of
It will be appreciated that at least a subset of the spatial audio objects are located substantially at the ground level 504, namely the amusement park 510, the bell 511 and the horn 512 (as well as the birds in the trees and cars). The apparatus may compare the location of the spatial audio objects to a threshold above the ground level to identify spatial audio objects that are “substantially at said ground level”. It will also be appreciated that a subset of the audio objects may be categorized as objects of interest. In one or more examples, the categorization may be provided by one or more of a) categorization information in the virtual reality content, b) the VR apparatus based on user preferences and information in the virtual reality content, such as audio object name or type or c) manual user-selection, among other examples. The spatial audio modification may be applied to only those spatial audio objects located within a threshold distance of the ground level 504 or, more generally, spatial audio that is associated with visual points or regions located within a threshold of ground level. The spatial audio modification may only be applied to spatial audio objects having a predetermined category or, more generally, spatial audio that is associated with visual points or regions having the predetermined category.
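The selection described above may be sketched as a simple filter. This is an illustrative sketch under stated assumptions only: the `AudioObject` type, the `eligible_for_modification` function, the category labels and the numeric threshold are hypothetical and are not defined by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical spatial audio object; z is height above z = 0."""
    name: str
    x: float
    y: float
    z: float
    category: str = "default"


def eligible_for_modification(obj, ground_z=0.0, ground_threshold=2.0,
                              categories=None):
    """Return True if obj is substantially at ground level and, when a
    set of categories is given, belongs to one of those categories."""
    near_ground = abs(obj.z - ground_z) <= ground_threshold
    in_category = categories is None or obj.category in categories
    return near_ground and in_category


bell = AudioObject("bell", 40.0, 10.0, 1.0, category="object_of_interest")
aircraft = AudioObject("aircraft", 0.0, 0.0, 300.0)
selected = [o.name for o in (bell, aircraft) if eligible_for_modification(o)]
# selected == ["bell"]
```

Passing `categories={"object_of_interest"}` would additionally restrict the modification to objects carrying that (assumed) category label.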
The example of
The example
The example
The change in direction from which the user perceives the spatial audio as being heard may be achieved by applying an ambisonic audio processing function to spatial audio content of ambisonic type to achieve the upward change in direction. In other examples, the spatial audio content may be of spatial audio coded type, and a different, spatial audio processing function may be applied to achieve the change in perceived direction in an upward direction. In the present examples, which show the spatial audio content encoded in terms of spatial audio objects that have a location in the virtual reality space, the spatial audio modification may be considered to change the location of the spatial audio objects in the virtual reality space.
Thus, as shown in
In other examples, the spatial audio modification may change the position of the audio objects not just in terms of their elevation relative to the ground level 504 (i.e. the Z direction of a Cartesian coordinate system), but also their position in a plane aligned with the ground level (i.e. the X and Y directions of a Cartesian coordinate system).
In the above example, the position of the audio object 812 representing the horn 512 is not changed. This is because in this example the amount of elevation applied to the spatial audio object to modify its perceived direction, or more generally the degree to which the perceived direction towards the spatial audio content is modified, is dependent on the distance between the ground location 601 and the location of the spatial audio object.
In one or more examples, the spatial audio modification is configured to provide for elevation of each of the locations of the spatial audio objects from the ground level 504 as a function of a distance of the spatial audio object location from the bird's eye view. How the distance between the spatial audio object location and the bird's eye view may be measured may vary between embodiments. For example, the distance may be measured from the ground location 601 (i.e. the point at ground level directly below the point of view location 700) or from the point of view location 700 or from an edge or point within the bird's eye view of
In one or more examples, the spatial audio modification may only be applied to spatial audio objects greater than a threshold distance from the bird's eye view, for example from a point within the bird's eye view including the point of view location. In this example, the spatial audio object 812 representing the horn may be less than the threshold distance away and is therefore not modified by the spatial audio modification. The threshold distance may be based on the field of view of the bird's eye view shown in
In one or more examples, the spatial audio modification may elevate the location of the spatial audio objects to corresponding elevated spatial audio object locations, the elevated spatial audio object locations more elevated the greater the distance between the spatial audio object locations from the bird's eye view. Thus, spatial audio objects having a location close to the bird's eye view may be elevated by a first amount while spatial audio objects having a location further from the bird's eye view may be elevated by a greater second amount.
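A distance-dependent elevation of this kind may be sketched as follows. The sketch assumes, purely for illustration, that distance is measured in the ground plane from the ground location and that the elevation grows linearly with that distance; the `gain` parameter and the function name `elevated_location` are hypothetical, and other monotonically increasing functions would serve equally well.

```python
import math


def elevated_location(obj_xy, ground_location_xy, gain=0.5):
    """Return an (x, y, z) location whose z grows with the horizontal
    distance of the object from the ground location.

    The X and Y coordinates are retained; only the elevation changes.
    A linear relationship is an assumption made for this sketch.
    """
    distance = math.dist(obj_xy, ground_location_xy)
    return (obj_xy[0], obj_xy[1], gain * distance)


near = elevated_location((10.0, 0.0), (0.0, 0.0))  # elevated by a first amount
far = elevated_location((40.0, 0.0), (0.0, 0.0))   # elevated by a greater amount
```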
The elevation of the location of the spatial audio objects above the ground level (i.e. in the Z direction) while retaining their position in the plane of the ground level (i.e. in the X, Y directions) may advantageously widen the range of directions from which spatial audio is heard when provided with a bird's eye view. Thus, a user may be provided with an improved understanding of the location of spatial audio objects relative to the bird's eye view because they will perceive the audio as originating from a wider range of directions rather than in front of them on the ground level. Further, with the spatial audio modification being a function of distance, the further the spatial audio object the greater the elevation provided and thus the user may be provided with an appreciation of the distance they are from the spatial audio object even when they cannot see the visual content representation of it in the bird's eye view.
In one or more examples, the spatial audio modification may be such that the locations of the spatial audio object may not be elevated more than a maximum elevation threshold. In one or more examples, the maximum elevation threshold is based on the elevation of the point of view location while in other examples it may be a predetermined threshold. Thus, while it may be advantageous to elevate spatial audio objects as a function of distance from the location of the bird's eye view, having the spatial audio objects elevated higher than the point of view location 700 may not be desirable as the audio from distant spatial audio objects could be perceived as originating behind the user, relative to the user's downwardly directed bird's eye view. Thus, the spatial audio modification may be a function of distance but beyond a threshold distance the location to which the spatial audio objects are elevated is the maximum elevation threshold. In other words, the spatial audio modification may be a function of distance of the spatial audio object from a point within the current bird's eye view provided to the user as well as a function of the elevation of the point of view location 700 from the ground level 504.
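The interaction between the distance-dependent elevation and the maximum elevation threshold may be sketched as a clamped function. The sketch assumes that the maximum elevation threshold equals the elevation of the point of view location and that the elevation rises linearly up to a threshold distance; the names `clamped_elevation` and `saturation_distance` are hypothetical.

```python
def clamped_elevation(distance, pov_elevation, saturation_distance):
    """Return the elevation applied to an audio object at the given
    distance, never exceeding the point of view elevation.

    The elevation rises linearly with distance (an assumption) and is
    held at pov_elevation beyond saturation_distance.
    """
    if saturation_distance <= 0:
        raise ValueError("saturation_distance must be positive")
    fraction = min(max(distance / saturation_distance, 0.0), 1.0)
    return fraction * pov_elevation


inside = clamped_elevation(50.0, pov_elevation=100.0, saturation_distance=200.0)
beyond = clamped_elevation(500.0, pov_elevation=100.0, saturation_distance=200.0)
# inside == 25.0, beyond == 100.0
```

Because the result never exceeds `pov_elevation`, distant objects are not placed above the point of view location and so are not perceived as originating behind a user who is looking downward.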
As mentioned above, the spatial audio modification may only be applied to spatial audio objects outside the field of view of the bird's eye view (i.e. as shown in
Example
The presentation of spatial audio may include the presentation of audio from more distant spatial audio objects at a lower volume than those that are closer to the point of view location of the user. In one or more examples, however, the spatial audio modification may provide for modification of the volume at which the audio track of each spatial audio object is presented to the user. The modification may comprise an increase in volume dependent on the distance from a point in the bird's eye view. In one or more examples, the volume modification may provide for equalization of the volume of the spatial audio objects affected by the spatial audio modification. In other examples, the volume of one or more spatial audio objects beyond a threshold distance may be increased to a predetermined audible level or boosted by a particular factor. The particular factor may be a function of the direction of travel in the virtual space. Spatial audio objects within the threshold distance may be presented with a volume dependent on their distance from the user, that is, without a change in volume provided by the spatial audio modification.
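One of the volume options above, raising distant objects to a predetermined audible level, may be sketched as follows. The inverse-distance attenuation used as the default, the `floor_gain` value and the function names are assumptions made for illustration only.

```python
def default_gain(distance, reference_distance=1.0):
    """Assumed default attenuation: gain falls off inversely with
    distance and is 1.0 at or inside the reference distance."""
    return reference_distance / max(distance, reference_distance)


def modified_gain(distance, threshold_distance, floor_gain=0.1):
    """Return the presentation gain for an audio object.

    Objects within threshold_distance keep their default,
    distance-dependent gain. Objects beyond it are raised to at least
    floor_gain, a hypothetical predetermined audible level.
    """
    gain = default_gain(distance)
    if distance <= threshold_distance:
        return gain
    return max(gain, floor_gain)


near_gain = modified_gain(5.0, threshold_distance=20.0)   # unchanged default
far_gain = modified_gain(100.0, threshold_distance=20.0)  # raised to the floor
```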
Example
In one or more examples, a transition region may be provided between application and non-application of the spatial audio modification. Thus, a second elevation threshold 1302 may be provided at an elevation between the ground level 504 and the first elevation threshold 1301, wherein, when the user is provided with a bird's eye view having an elevation between the second and the first elevation thresholds, the effect of the spatial audio modification may be reduced relative to when applied above the first elevation threshold. Thus, below the second elevation threshold the spatial audio modification may not be applied and above the first elevation threshold the spatial audio modification is applied. However, at an elevation of the point of view location 700 in a transition region between the second elevation threshold 1302 and the first elevation threshold 1301, the change in the perceived direction towards the audio objects is reduced. As discussed above, when considering the change in perceived direction to the spatial audio in terms of spatial audio objects, we can consider that the spatial audio object has been elevated. As an example, if the spatial audio modification would result in the elevation of a spatial audio object by X when the point of view location has an elevation above the first elevation threshold 1301, the elevation for a point of view location having an elevation in the transition region may be 0.5X, or may increase gradually from 0X to 1X across the transition region. Accordingly, as the elevation from which the bird's eye view is provided changes, there is not an abrupt change in the directions from which the spatial audio content is perceived to be heard.
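The gradual increase across the transition region may be sketched as a weighting factor that scales the full elevation X. A linear ramp is assumed here for illustration; the function name `modification_weight` and the example threshold values are hypothetical.

```python
def modification_weight(pov_elevation, second_threshold, first_threshold):
    """Return a factor in [0, 1] by which the full elevation X is scaled.

    Below the second (lower) elevation threshold the modification is
    not applied, above the first (upper) threshold it is fully applied,
    and in between the factor ramps linearly (an assumed profile).
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must exceed second_threshold")
    if pov_elevation <= second_threshold:
        return 0.0
    if pov_elevation >= first_threshold:
        return 1.0
    return (pov_elevation - second_threshold) / (first_threshold - second_threshold)


full_elevation_x = 40.0
applied = modification_weight(20.0, 10.0, 30.0) * full_elevation_x  # 0.5X
```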
Example
With reference to
While in some embodiments the spatial audio modification is removed when the unmodified location of the spatial audio object is within the field of view, example
While the apparatus 101 may, in some embodiments, apply the spatial audio modification to spatial audio objects within the field of view (as in
Example
The example of
The example of
The choice of whether to apply the spatial audio modification to a spatial audio object may be taken based on various different factors. For example, it may be chosen by the VR content creator that some spatial audio objects should not be subject to the spatial audio modification. In one or more examples, the proximity of visual content with which the spatial audio object is associated to the user may be a deciding factor. In one or more examples, whether or not it is possible for the user to interact with the visual object associated with the spatial audio object or the spatial audio object itself may be a deciding factor.
In general, the application of the spatial audio modification to spatial audio objects may be dependent on a category assigned to said spatial audio objects. The category may be assigned by the content creator or it may be assigned at playback of the VR content based on the ability to interact with the object, user preferences or any other factor. In other examples, the category may define whether the spatial audio object is a point of interest or not and the spatial audio modification may be applied or not applied based on such a category. In one or more examples, the category defines whether or not the user is within a predetermined interaction distance of the spatial audio object or the point or region of visual content associated therewith. If the user is within the interaction distance it may be preferable to not apply the spatial audio modification, while if the user is beyond the interaction distance the spatial audio modification may be applied.
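The interaction-distance criterion, combined with a content-creator exclusion, may be sketched as a single decision function. The function name, the `creator_opt_out` flag and the use of a straight-line distance are assumptions made for this sketch.

```python
import math


def should_apply_modification(user_pos, obj_pos, interaction_distance,
                              creator_opt_out=False):
    """Decide whether the spatial audio modification applies to an object.

    Objects excluded by the content creator are never modified, and
    objects within the interaction distance of the user keep their
    default location so that audio and visuals stay aligned.
    """
    if creator_opt_out:
        return False
    return math.dist(user_pos, obj_pos) > interaction_distance


close = should_apply_modification((0.0, 0.0, 2.0), (1.0, 0.0, 0.0), 5.0)
distant = should_apply_modification((0.0, 0.0, 2.0), (60.0, 0.0, 0.0), 5.0)
# close is False, distant is True
```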
In one or more embodiments, the apparatus may be configured to apply different spatial audio modifications (such as different functions in terms of applied elevation and/or volume changes) based on the category of the spatial audio object.
The example
In part (a), the first visual object 1701 corresponding to the first spatial audio object 1702 is shown within the field of view 1705, 1706. Accordingly, in this example, the spatial audio modification is not applied to the first spatial audio object 1702 and it is thus presented such that the audio is perceived from its default location, aligned with the visual object of the car 1701. The second visual object 1703 corresponding to the second spatial audio object 1704 is shown outside the field of view 1705, 1706. Accordingly, in this example, the spatial audio modification is applied to the second spatial audio object 1704 and it is thus presented such that the audio is perceived from a direction corresponding to a location elevated from the second visual object 1703. Thus, the audio of the second spatial audio object 1704 is perceived from a more upward direction, away from the ground level 504.
In this example, the spatial audio modification is a function of distance of the spatial audio object from the ground location 1707. The function is illustrated by the bowl 1708. Further, in this embodiment, the maximum elevation threshold comprises the elevation of the point of view location 1700. The second spatial audio object 1704 is shown beyond a threshold distance from the bird's eye view, represented in this example by the edges of the bowl 1708, and is thus elevated to the maximum elevation threshold.
Part (b) shows the point of view location 1700 translated to a position between the first visual object 1701 and the second visual object 1703 and both of them are outside the field of view 1705, 1706. In such a position, both the first and second spatial audio objects 1702, 1704 are subject to the spatial audio modification, which provides for their elevation in accordance with the function 1708 based on their distance from the ground location 1707.
Part (c) shows the point of view location 1700 translated further to the right such that the second visual object 1703 is within the field of view 1705, 1706 while the first visual object 1701 is outside. In such a position, the first spatial audio object 1702 is subject to the spatial audio modification and is thus elevated to the maximum elevation threshold. The second spatial audio object 1704, being within the field of view, is not subject to the spatial audio modification and is thus located at its default location, aligned with the second visual object 1703.
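The determination of whether a visual object lies within the field of view of the bird's eye view, as used in parts (a) to (c), may be sketched as an angular test. The sketch assumes a view directed straight down and a conical field of view with a given full angle; both assumptions, and the function name `in_birds_eye_fov`, are illustrative only.

```python
import math


def in_birds_eye_fov(pov, obj, fov_deg=90.0):
    """Return True if obj lies inside a cone of full angle fov_deg whose
    axis points straight down from the point of view location.

    Positions are (x, y, z) tuples with z measured upward.
    """
    dx = obj[0] - pov[0]
    dy = obj[1] - pov[1]
    dz = obj[2] - pov[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance == 0.0:
        return True
    # Cosine of the angle between the downward axis (0, 0, -1) and the
    # direction from the point of view location to the object.
    cos_angle = -dz / distance
    return cos_angle >= math.cos(math.radians(fov_deg / 2.0))


pov = (0.0, 0.0, 100.0)
car_visible = in_birds_eye_fov(pov, (10.0, 0.0, 0.0))    # nearly below the user
horn_visible = in_birds_eye_fov(pov, (300.0, 0.0, 0.0))  # far off to one side
```

Objects for which the test returns False would be candidates for the spatial audio modification, while objects for which it returns True would be presented from their default location.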
In summary of the effect of
The example
Parts (a), (b) and (c) illustrate a change in elevation of the point of view location 1800 of the bird's eye view. Two spatial audio objects 1801, 1802 are shown. At the first elevation shown in part (a) the spatial audio objects are at a height that is level with the point of view location 1800. However, in part (b) the point of view location is changed, such as at a rate above a threshold rate comprising a rate-of-change-in-elevation threshold. Providing an abrupt change in the location of the spatial audio objects in accordance with the modification indicated by the spatial audio modification may be confusing to a user. Accordingly, the elevation of the spatial audio objects to reach the elevation indicated by the spatial audio modification is gradually applied over a gradual-application time. Thus, while the change in elevation of the point of view location is provided over the time between parts (a) and (b), the modified location of the spatial audio object in accordance with the spatial audio modification is not reached until a later time shown in part (c). Accordingly, the time between part (a) and part (c) may represent the gradual-application time, while the time between part (a) and part (b) represents the time over which the change in elevation occurred.
Part (d) shows the elevation of the point of view location decreasing, shown by arrow 1803. As with the increase in elevation of the point of view location 1800, the spatial audio modification may be applied gradually based on a decrease in elevation.
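The gradual application over the gradual-application time may be sketched as an interpolation between the elevation of a spatial audio object before the change and the elevation indicated by the spatial audio modification. Linear interpolation is assumed for illustration, and the function name `gradual_elevation` is hypothetical; the same function covers both an increase and a decrease in elevation.

```python
def gradual_elevation(start, target, elapsed, gradual_application_time):
    """Return the elevation of an audio object `elapsed` time units
    after the point of view elevation began to change.

    The object moves linearly (an assumption) from `start` to `target`
    and reaches `target` once gradual_application_time has passed.
    """
    if gradual_application_time <= 0:
        return target
    t = min(max(elapsed / gradual_application_time, 0.0), 1.0)
    return start + (target - start) * t


rising = gradual_elevation(0.0, 50.0, elapsed=1.0, gradual_application_time=4.0)
settled = gradual_elevation(0.0, 50.0, elapsed=4.0, gradual_application_time=4.0)
falling = gradual_elevation(50.0, 10.0, elapsed=2.0, gradual_application_time=4.0)
# rising == 12.5, settled == 50.0, falling == 30.0
```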
User inputs may be gestures which comprise one or more of a tap, a swipe, a slide, a press, a hold, a rotate gesture, a static hover gesture proximal to the user interface of the device, a moving hover gesture proximal to the device, bending at least part of the device, squeezing at least part of the device, a multi-finger gesture, tilting the device, or flipping a control device. User inputs may provide for changing to or from the bird's eye view, rotating or translating the bird's eye view or changing elevation of the point of view location. Further the gestures may be any free space user gesture using the user's body, such as their arms, or a stylus or other element suitable for performing free space user gestures.
The apparatus shown in the above examples may be a portable electronic device, a laptop computer, a mobile phone, a Smartphone, a tablet computer, a personal digital assistant, a digital camera, a smartwatch, smart eyewear, a pen based computer, a non-portable electronic device, a desktop computer, a monitor, a smart TV, a server, a wearable apparatus, a virtual reality apparatus, or a module/circuitry for one or more of the same.
Any mentioned apparatus and/or other features of particular mentioned apparatus may be provided by apparatus arranged such that they become configured to carry out the desired operations only when enabled, e.g. switched on, or the like. In such cases, they may not necessarily have the appropriate software loaded into the active memory in the non-enabled (e.g. switched off) state and may only load the appropriate software in the enabled (e.g. on) state. The apparatus may comprise hardware circuitry and/or firmware. The apparatus may comprise software loaded onto memory. Such software/computer programs may be recorded on the same memory/processor/functional units and/or on one or more memories/processors/functional units.
In some examples, a particular mentioned apparatus may be pre-programmed with the appropriate software to carry out desired operations, and wherein the appropriate software can be enabled for use by a user downloading a “key”, for example, to unlock/enable the software and its associated functionality. Advantages associated with such examples can include a reduced requirement to download data when further functionality is required for a device, and this can be useful in examples where a device is perceived to have sufficient capacity to store such pre-programmed software for functionality that may not be enabled by a user.
Any mentioned apparatus/circuitry/elements/processor may have other functions in addition to the mentioned functions, and these functions may be performed by the same apparatus/circuitry/elements/processor. One or more disclosed aspects may encompass the electronic distribution of associated computer programs and computer programs (which may be source/transport encoded) recorded on an appropriate carrier (e.g. memory, signal).
Any “computer” described herein can comprise a collection of one or more individual processors/processing elements that may or may not be located on the same circuit board, or the same region/position of a circuit board or even the same device. In some examples one or more of any mentioned processors may be distributed over a plurality of devices. The same or different processor/processing elements may perform one or more functions described herein.
The term “signalling” may refer to one or more signals transmitted as a series of transmitted and/or received electrical/optical signals. The series of signals may comprise one, two, three, four or even more individual signal components or distinct signals to make up said signalling. Some or all of these individual signals may be transmitted/received by wireless or wired communication simultaneously, in sequence, and/or such that they temporally overlap one another.
With reference to any discussion of any mentioned computer and/or processor and memory (e.g. including ROM, CD-ROM etc.), these may comprise a computer processor, Application Specific Integrated Circuit (ASIC), field-programmable gate array (FPGA), and/or other hardware components that have been programmed in such a way to carry out the inventive function.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole, in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that the disclosed aspects/examples may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.
While there have been shown and described and pointed out fundamental novel features as applied to examples thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the scope of the disclosure. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the disclosure. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or examples may be incorporated in any other disclosed or described or suggested form or example as a general matter of design choice. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures. Thus although a nail and a screw may not be structural equivalents in that a nail employs a cylindrical surface to secure wooden parts together, whereas a screw employs a helical surface, in the environment of fastening wooden parts, a nail and a screw may be equivalent structures.
Lehtiniemi, Arto, Tammi, Mikko, Laaksonen, Lasse, Vilermo, Miikka
Patent | Priority | Assignee | Title |
10235010, | Jul 28 2016 | Canon Kabushiki Kaisha | Information processing apparatus configured to generate an audio signal corresponding to a virtual viewpoint image, information processing system, information processing method, and non-transitory computer-readable storage medium |
6330486, | Jul 16 1997 | RPX Corporation | Acoustic perspective in a virtual three-dimensional environment |
7048632, | Mar 19 1998 | KONAMI CO , LTD , A JAPANESE CORPORATION | Image processing method, video game apparatus and storage medium |
9041741, | Mar 14 2013 | Qualcomm Incorporated | User interface for a head mounted display |
9268406, | Sep 30 2011 | Microsoft Technology Licensing, LLC | Virtual spectator experience with a personal audio/visual apparatus |
20090005961, | |||
20100001993, | |||
20100265399, | |||
20110283865, | |||
20150297949, | |||
20180014135, | |||
20180109900, | |||
WO2009128859, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 23 2017 | VILERMO, MIIKKA | Nokia Technologies Oy | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052572 | /0531 | |
Oct 23 2017 | TAMMI, MIKKO | Nokia Technologies Oy | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052572 | /0531 | |
Oct 30 2017 | LAAKSONEN, LASSE | Nokia Technologies Oy | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052572 | /0531 | |
Oct 30 2017 | LEHTINIEMI, ARTO | Nokia Technologies Oy | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 052572 | /0531 | |
Oct 05 2018 | Nokia Technologies Oy | (assignment on the face of the patent) | / | |||
May 13 2024 | Nokia Technologies Oy | PIECE FUTURE PTE LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 068407 | /0454 | |
May 13 2024 | NOKIA SOLUTIONS AND NETWORKS OY | PIECE FUTURE PTE LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 068407 | /0454 |
Date | Maintenance Fee Events |
Apr 06 2020 | BIG: Entity status set to Undiscounted (note the period is included in the code). |
Date | Maintenance Schedule |
Jul 27 2024 | 4 years fee payment window open |
Jan 27 2025 | 6 months grace period start (w surcharge) |
Jul 27 2025 | patent expiry (for year 4) |
Jul 27 2027 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jul 27 2028 | 8 years fee payment window open |
Jan 27 2029 | 6 months grace period start (w surcharge) |
Jul 27 2029 | patent expiry (for year 8) |
Jul 27 2031 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jul 27 2032 | 12 years fee payment window open |
Jan 27 2033 | 6 months grace period start (w surcharge) |
Jul 27 2033 | patent expiry (for year 12) |
Jul 27 2035 | 2 years to revive unintentionally abandoned end. (for year 12) |