Spatial 3D audio technologies are helping headset-wearers locate sounds beyond the field of view.
Recognizing where a sound comes from (a gunshot, a car horn, a dog’s bark) is key to how we understand and respond to our environment.
And that’s true in virtual reality, just as in real life.
But sound cues, which are influenced by everything from head size to sound pressure differences from one ear to another, are incredibly difficult to replicate in a virtual environment.
That’s why audio has become an important focus of innovation for VR leaders. The latest advances in VR audio technology, including sophisticated “spatial audio” solutions, are making better sound “localization” possible in virtual environments.
- How the brain ‘localizes’ sound position
- Capturing audio in three dimensions
- Recognizing how our heads move
- Encoding your head into your audio
How the brain ‘localizes’ sound position
Our brains process sound and visual inputs in very different ways, which poses challenges for simulating an alternate reality in VR.
When we use our eyes to see, receptor cells in our retinas detect the light coming from objects, whether those objects are physical or rendered on a headset display. Because each part of the retina corresponds to a particular direction in our field of view, the information sent to the brain already tells it where those objects are in our line of sight.
So when it comes to visuals, we still “see” VR media through the same process we use to see anything else.
Audio is a different story.
Sound location, in particular, is computed by the brain based on cues for things like direction and distance. We recognize lateral direction – aka a sound’s position to our right or left – using factors such as:
- Interaural time difference (ITD), the difference between the times a sound reaches each of the two ears
- Interaural level difference (ILD), the difference in sound pressure level reaching the two ears
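To give a sense of scale for ITD, here is a minimal Python sketch of the Woodworth spherical-head approximation, a textbook model that treats the head as a rigid sphere and a distant source. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, not figures from this article, and real heads deviate from the model.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875      # assumed average head radius; real heads vary


def itd_woodworth(azimuth_deg, head_radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate interaural time difference (seconds) for a distant source.

    azimuth_deg: 0 = straight ahead, +90 = directly right, 180 = directly behind.
    A positive result means the sound reaches the right ear first.
    """
    # Fold azimuth onto the lateral angle in [-90, +90] degrees. Sources at
    # 30 and 150 degrees map to the same lateral angle, so this model (like
    # ITD itself) cannot tell front from back.
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    theta = abs(lateral)
    # Woodworth: extra path around the sphere = r * (theta + sin(theta)).
    itd = (head_radius / c) * (theta + math.sin(theta))
    return math.copysign(itd, lateral)


for az in (0, 30, 90, 150):
    print(f"{az:>3} deg -> {itd_woodworth(az) * 1e6:6.1f} microseconds")
```

Under these assumptions a source directly to one side yields an ITD of roughly 0.66 ms, and the sources at 30 and 150 degrees print identical values, which is one reason timing and level differences alone can’t place a sound in front of or behind us.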
And that’s just right to left. Our brains also need to recognize whether sounds are in front of or behind us, and above or below us.
To do that “localization,” we pick up on the ways sounds arriving from different directions interact with the geometry of our bodies: the unique shapes and sizes of our heads, necks, torsos, and outer ears filter and reflect sound differently depending on where it comes from. These direction-dependent changes, known as spectral modifications, are what our brains use to infer direction and elevation.
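One simple way to see how a body reflection becomes a spectral cue is a single-reflection toy model: if a sound reaches the ear canal directly and again via a short reflection off the outer ear, the two copies partly cancel at frequencies where the reflection arrives half a cycle late, carving notches into the spectrum. The Python sketch below assumes exactly one reflection with a fixed gain and delay; the 0.1 ms delay and 0.8 gain are illustrative values, and real pinna filtering involves many overlapping reflections and resonances. Because the reflection delay changes with the direction of the sound, so do the notch frequencies.

```python
import cmath
import math


def reflection_gain_db(freq_hz, delay_s, reflection_gain=0.8):
    """Magnitude (dB) of direct sound plus one delayed, attenuated reflection."""
    # H(f) = 1 + g * exp(-j * 2*pi*f*tau): a feed-forward comb filter
    h = 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(h))


def notch_frequencies(delay_s, max_freq_hz=20000.0):
    """Frequencies where the reflection arrives half a cycle out of phase."""
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)
        if f > max_freq_hz:
            return notches
        notches.append(f)
        k += 1


tau = 0.0001  # illustrative 0.1 ms reflection delay (assumption)
for f in notch_frequencies(tau):
    print(f"notch near {f:.0f} Hz: {reflection_gain_db(f, tau):.1f} dB")
```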
These localization cues are just a few of the factors our brains consider when determining sound position. Distance is another, which we estimate using cues such as loudness, reverberation, and the time delay between an event taking place and the sound reaching our ears.
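Some of these distance cues can be put into rough numbers. The Python sketch below assumes a point source in free field, where the direct sound loses about 6 dB per doubling of distance (the inverse-square law), inside a room whose diffuse reverberation level stays roughly constant regardless of source distance, a common simplification. The 1 m reference distance and the -20 dB reverberation level are illustrative assumptions. It also computes the acoustic travel time, the event-to-ear delay mentioned above.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def distance_cues(distance_m, ref_distance_m=1.0, reverb_level_db=-20.0):
    """Return (direct level dB, direct-to-reverberant ratio dB, delay s).

    Levels are relative to the direct sound at ref_distance_m.
    reverb_level_db is an assumed, distance-independent room reverb level.
    """
    # Inverse-square law for a point source: -20*log10(d / d_ref) dB
    direct_db = -20.0 * math.log10(distance_m / ref_distance_m)
    # Far sources sound "roomier": less direct sound relative to the reverb
    direct_to_reverb_db = direct_db - reverb_level_db
    # Time for the sound to travel from the event to the listener
    delay_s = distance_m / SPEED_OF_SOUND_M_S
    return direct_db, direct_to_reverb_db, delay_s


for d in (1, 2, 10, 34.3):
    level, drr, delay = distance_cues(d)
    print(f"{d:>5} m: direct {level:6.1f} dB, DRR {drr:5.1f} dB, delay {delay * 1000:5.1f} ms")
```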
All of these considerations (among others) play into the development of VR audio technologies, which use specialized recording systems and algorithms to mimic natural audio.
Capturing audio in three dimensions
Audio companies have long tried to recreate location cues to better emulate real-world audio. One of the most effective approaches, which has made its way into VR, is “binaural” or 3D audio.
Binaural literally means “two ears,” but binaural audio is very different from traditional stereo audio.
Binaural recording systems attempt to recreate the way the human head captures sound: audio is recorded through microphones placed on both sides of a stand or dummy head (in the spots where our ears would be), enabling the microphones to capture level and time differences (aka ILD and ITD) just like our actual ears would.
Binaural microphones are typically housed in cavities that look just like the outer ear, which further improves their ability to capture audio in a way that sounds natural upon playback.
(Our outer ears, or pinnae, dramatically alter incoming sound waves; our brains understand these alterations as directional cues.)
Binaural audio has been around since the late 19th century, but the rise of headphones and handheld audio devices has brought it back to the fore.
Whereas earlier eras of audio were about filling up big spaces with sound, headphone-delivered audio (including in VR) can benefit from the sense of 3D immersion that binaural audio provides.
“Binaural audio is critical to an immersive experience within the context of VR. We consider audio to be 50 percent of the immersive experience.”
–Adam Somers, Engineering Manager of JauntVR
But VR isn’t the only application. On YouTube, there are hundreds of binaural audio clips that are used for relaxation, studying, entertainment, and other purposes.
The effects are remarkable: In the “virtual haircut” video below, for example, you can hear the sounds of trimming behind you or atop your head. (Wear headphones to hear it properly.)
But for all its advantages, binaural audio on its own is not ideal for VR.
Consider a simulation environment, in which you might need to hear or sense a gunman behind you through headphones.
With only binaural audio in your ears, the gunman would always remain behind you – even as you turn to face him – since the static audio recording isn’t inherently responsive to changes in your head position.
To create the most effective audio in a VR environment, sounds need to stay anchored to their positions in the virtual world, which means the audio reaching each ear has to change as the user moves.
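The gunman example comes down to a coordinate change: a world-anchored sound should be rendered at its direction relative to where the head currently points, not at a fixed direction inside the headphones. A minimal Python sketch, assuming yaw-only head rotation and angles in degrees measured clockwise (0 = straight ahead, 90 = right, 180 = behind); the function name is hypothetical:

```python
def head_relative_azimuth(source_world_deg, head_yaw_deg):
    """Direction of a world-anchored source relative to the listener's nose.

    Both angles are in degrees, clockwise from the same world reference.
    The result is wrapped into [-180, 180).
    """
    return (source_world_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


gunman_world = 180.0  # gunman directly behind the starting view direction

for yaw in (0.0, 90.0, 180.0):
    static = gunman_world  # a fixed recording never changes
    tracked = head_relative_azimuth(gunman_world, yaw)
    print(f"head yaw {yaw:5.1f}: static {static:6.1f}, tracked {tracked:6.1f}")
```

With tracking, turning 180 degrees brings the gunman to 0 (straight ahead), while the static recording keeps him at 180 no matter how the head moves.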
Recognizing how our heads move
Headphones are the standard-bearer for delivering VR audio, since they isolate the listener acoustically. (External speakers are not ideal for VR, since even the best “surround sound” can be infiltrated by outside sounds or reverberations around the room, reducing immersion in the virtual environment.)
But as the gunman example suggests, headphones still have innate drawbacks that must be addressed for effective recognition of sound location in VR.
The most basic problem is head motion: when we turn our heads in VR, the soundscape needs to change to reflect the corresponding change in distance or direction. If it doesn’t, the virtual experience stops feeling real.
Head-tracking technologies help mitigate the problem: if your headphones know you’re turning your head from left to right, your audio can be kept in sync with the visuals in view.
Delivering binaural audio that responds to positioning changes picked up by head-tracking headphones thus makes it more effective. In fact, head tracking is key to making both binaural and “ambisonic” (full-sphere surround) audio more immersive in virtual reality.
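For ambisonic audio, head tracking is commonly applied by rotating the whole sound field against the listener’s head rotation before it is decoded for headphones. The Python sketch below shows this for first-order ambisonics (the four B-format channels W, X, Y, Z) and yaw only. It assumes the traditional convention in which azimuth is measured counter-clockwise (+90 degrees = left) and W carries a 1/sqrt(2) weight; the function names are illustrative, and the final decode to binaural is not shown.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Azimuth is counter-clockwise from straight ahead (+90 = left).
    """
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)  # front/back component
    y = sample * math.sin(az) * math.cos(el)  # left/right component
    z = sample * math.sin(el)                 # up/down component
    return (w, x, y, z)


def compensate_head_yaw(bformat, head_yaw_deg):
    """Rotate the sound field by -head_yaw so sources stay fixed in the world.

    head_yaw_deg uses the same counter-clockwise convention (+ = turned left).
    W and Z are unaffected by rotation about the vertical axis.
    """
    w, x, y, z = bformat
    c = math.cos(math.radians(head_yaw_deg))
    s = math.sin(math.radians(head_yaw_deg))
    return (w, x * c + y * s, y * c - x * s, z)


# A source 90 degrees to the left; the listener turns left 90 degrees to face it.
field = encode_foa(1.0, 90.0)
rotated = compensate_head_yaw(field, 90.0)
straight_ahead = encode_foa(1.0, 0.0)
print([round(v, 6) for v in rotated], [round(v, 6) for v in straight_ahead])
```

Rotating the four channels once per audio block is cheap, which is one reason a sound field can follow head tracking without re-rendering each source individually.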
A number of startups, including 3D Sound Labs, sell headphones with motion-sensing technology. Dysonics also sells a motion sensor that can be attached to existing headphones to make audio more location-aware.
And Ossic (formerly SonicVR) is developing the Ossic X, a highly sophisticated headphone in which every component – including head tracking – is designed for immersive 3D-audio playback.
The “Ossic X” is not yet available for purchase, but the 10,000+ backers who pledged over $2.7M to the project on Kickstarter indicate the high level of interest in the company’s (potentially) forthcoming product.
Why? Because the Ossic X’s “spatialized” 3D sound functionality could offer the best audio experience for VR of any product out there.
But to understand what makes the Ossic X unique, it’s important to understand how spatial audio technologies fool the brain into localizing sound.
Encoding your head into your audio
Whereas binaural recording systems attempt to capture sound just like our ears do, spatial audio technologies use algorithms to process sound so that it reaches the listener the way it would in a real-world experience. That means mimicking the same pitch, volume, reverberation levels, and other cues the brain would expect in real life.
To create a realistic sense of direction and distance, for example, spatial audio algorithms adjust the timing, level, and frequency content of the sound sent to each ear, recreating interaural time and level differences (ITD/ILD).
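A minimal Python sketch of how a renderer might apply a given ITD and ILD to a mono signal: the ear farther from the source gets a delayed, attenuated copy. It assumes a whole-sample delay and a single broadband gain for simplicity; real renderers typically use fractional delays and frequency-dependent level differences, with the ITD/ILD values coming from a head model or HRTF. The function name and the example values are illustrative.

```python
def apply_itd_ild(mono, itd_s, ild_db, sample_rate=48000):
    """Return (left, right) sample lists for a source placed by ITD and ILD.

    The sign of itd_s picks the side: positive = source on the right, so the
    left ear is the far ear. ild_db is treated as a magnitude.
    """
    delay = round(abs(itd_s) * sample_rate)    # far-ear lag in whole samples
    far_gain = 10.0 ** (-abs(ild_db) / 20.0)   # far-ear attenuation
    near = list(mono) + [0.0] * delay          # pad so both channels match
    far = [0.0] * delay + [s * far_gain for s in mono]
    if itd_s >= 0:
        return far, near   # source on the right
    return near, far       # source on the left


# 0.5 ms and 6 dB are illustrative values for a source off to the right.
left, right = apply_itd_ild([1.0, 0.5, 0.25], itd_s=0.0005, ild_db=6.0)
print(len(left), right[:3], left[24:27])
```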
But direction and distance aren’t the only cues the brain computes to understand sound location.
As mentioned earlier, the spectral modifications of sounds (the filtering and reflection of sound waves by our heads and bodies) provide important information to our brains about sound location. (Adjusting ITD and ILD alone doesn’t reproduce these cues, and a binaural recording captures the dummy head’s spectral cues rather than the listener’s own.)
For highly immersive spatial audio, these spectral cues can be mimicked by filtering the audio through what are known as HRTFs, or head-related transfer functions. An HRTF describes how our bodies alter sound waves arriving from a particular direction before they reach the eardrum (depending on the origin of the sound, per the graphic below).
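In the time domain, an HRTF corresponds to a pair of head-related impulse responses (HRIRs), one per ear, and applying it amounts to convolving the sound with each. A minimal pure-Python sketch, assuming the HRIRs for the desired direction are already available; the short toy HRIRs below are made-up placeholders, whereas measured HRIRs run to hundreds of samples and real renderers use FFT-based convolution.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution (O(N*M); fine for a demo)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_with_hrir(mono, hrir_left, hrir_right):
    """Filter a mono source through one direction's HRIR pair -> (left, right)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the right (placeholders, not measured data):
# the right ear gets the sound first and louder, the left ear later and softer.
toy_hrir_left = [0.0, 0.0, 0.5, -0.2, 0.1]
toy_hrir_right = [0.9, 0.3, -0.1]

left, right = render_with_hrir([1.0, 0.0, 0.0], toy_hrir_left, toy_hrir_right)
print(left)
print(right)
```

Feeding in a single impulse, as above, simply returns each HRIR, which is a quick way to check that the filtering is wired to the correct ears.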
Since we all have different-sized heads and torsos, delivering audio through head-tracking headphones, calibrated to our unique, custom-measured HRTFs, would be the holy grail of spatial audio.
But today, capturing the unique HRTF data for an individual VR headset-wearer would require an extensive process: microphones are placed inside your ears, and sound is played from many directions around you (a full 360 degrees) while the microphones record how your body alters it.
Since that process is too onerous to do at scale, “generalist” HRTFs are built into the algorithms of many spatial audio technologies. (The generalist HRTF data comes from publicly available data sets measured on a range of people’s heads and torsos.)
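Generic HRTF sets are measured at a finite grid of directions, so a renderer has to pick (or interpolate) filters for whatever direction a sound comes from. The Python sketch below shows the simplest option, nearest-neighbor lookup by great-circle angle. It assumes azimuth measured clockwise (90 = right); the tiny dataset and its HRIR values are hypothetical placeholders, and production renderers usually interpolate between neighboring measurements instead of snapping to one.

```python
import math


def angular_distance_deg(dir_a, dir_b):
    """Great-circle angle between two (azimuth_deg, elevation_deg) directions."""
    az1, el1 = map(math.radians, dir_a)
    az2, el2 = map(math.radians, dir_b)
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard acos against rounding error
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def nearest_hrir(dataset, azimuth_deg, elevation_deg):
    """Return (measured_direction, (hrir_left, hrir_right)) closest to the target."""
    target = (azimuth_deg, elevation_deg)
    best = min(dataset, key=lambda d: angular_distance_deg(d, target))
    return best, dataset[best]


# Hypothetical generic dataset: (azimuth, elevation) -> (left HRIR, right HRIR)
generic_set = {
    (0, 0): ([1.0, 0.2], [1.0, 0.2]),
    (90, 0): ([0.0, 0.4], [1.0, 0.3]),
    (180, 0): ([0.7, 0.1], [0.7, 0.1]),
    (270, 0): ([1.0, 0.3], [0.0, 0.4]),
    (0, 45): ([0.9, 0.3], [0.9, 0.3]),
}

print(nearest_hrir(generic_set, 80, 5)[0])
print(nearest_hrir(generic_set, 350, 0)[0])
```

Measuring distance on the sphere rather than comparing raw angles means a request at 350 degrees correctly snaps to the measurement at 0 degrees.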
But obtaining custom HRTF data may not be “too onerous” for much longer. With its Ossic X, Ossic hopes to make head-measured audio customization possible outside a sound lab.
The Ossic X is equipped with sensors designed to begin a custom “HRTF anatomy calibration” process as soon as you put the headphones on.
The data is then fed into Ossic’s “smart algorithms” to deliver accurate spatial audio playback based on your anatomy and position.
The Ossic X has yet to ship beyond a select tier of Kickstarter pre-orders, which may speak to the challenges of creating an HRTF-smart headphone solution at consumer-friendly prices.
So until the Ossic X arrives, VR enthusiasts are stuck with spatial audio solutions that use generalist HRTFs and work with motion-sensing headphones (unless other tools, such as bone conduction sensors, emerge to assist with sound localization).
Startups are also creating new spatial audio technologies for producers and sound engineers.
- G’Audio, for example, sells production kits that enable detailed placement of sound in a 3D environment. They recently piloted a new solution for livestreaming spatial audio in VR.
- Kinicho, which was recently selected to participate in Augmentor (an AR/VR accelerator program), is developing a “sonic reality engine” for creating high-fidelity spatial audio for AR/VR.
And the spatial audio algorithms of these companies, among others, will continue to improve… but only as fast as VR adoption increases: improving the algorithms will demand more user data, which tech companies will only get as more and more people use VR.