3D spatial audio

Spatial audio processing refers to the wide range of signal processing and audio reproduction techniques that allow audio content to be perceived in space, and thus in context.

The quest for increased auditory realism has a long history, starting as early as Alan D. Blumlein's invention of stereophonic sound reproduction in the first half of the 1930s.

Stereophony, with the added convenience of using only two channels of audio, defined the standard in spatial audio for a long time despite its disadvantages, such as a limited optimal listening area and a lack of spatial immersion. Quadraphony and other four-channel systems followed suit but, despite providing better realism and immersion than stereophony, did not attract great commercial or consumer interest.

Spatial audio concepts that followed, such as binaural and transaural audio, multichannel audio, Ambisonics, wave field synthesis (WFS), and hybrids of these systems, are still actively researched today. They are used in a variety of contexts, including home entertainment systems, movie theatres, acoustic design, virtual and augmented reality applications, and computer games. A fair categorisation of existing spatial audio systems is difficult, as each system has its own merits.

Binaural audio

Binaural audio reproduction originated from recordings made with microphones that resemble an average human head. Such a device is called a dummy-head (or binaural) microphone and gives very good results wherever a recording can be made. Another approach is to measure the transfer functions between a sound source located at certain positions and the eardrums of a listener, and to design filters based on these transfer functions. These filters can then be used to synthesise a virtual sound source arbitrarily positioned in the listener's auditory space when audio is played back over headphones. This method allows a single user to experience an egocentric auditory reality. Due to its low computational complexity, binaural reproduction is the preferred choice for mobile terminals. For VR and AR applications, the rotation of the listener's head needs to be tracked so that the filters used in the system can be updated accordingly.
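The filtering step described above can be sketched as follows: a minimal example, assuming a pair of head-related impulse responses (HRIRs) measured for the desired source position is already available, that convolves a dry mono signal with each ear's response to produce a two-channel headphone feed.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Place a mono signal at the position the HRIRs were measured for.

    mono, hrir_left, hrir_right: 1-D arrays of samples.
    Returns an (N, 2) array: left and right headphone channels.
    """
    # Full convolution of the dry signal with each ear's impulse response.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)
```

In a head-tracked system, the HRIR pair would be swapped (or interpolated) as the listener's head rotates, so that the virtual source stays fixed in the world rather than in the head.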

Transaural audio

If a pair of loudspeakers is used instead of headphones, cross-talk occurs because the two channels of audio are not acoustically isolated from each other. This problem can be solved with filters that provide cross-talk cancellation, eliminating the need for headphones. However, transaural systems also provide spatial audio for a single listener only, and at a single listening position, unless the position of the listener is tracked in addition to the rotation of his or her head.
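One common way to design such cancellation filters is in the frequency domain: if the 2x2 matrix of acoustic transfer functions from the two loudspeakers to the two ears is known at each frequency bin, inverting it yields filters that deliver the binaural signals to the ears uncorrupted. The sketch below shows a regularised inversion (the matrix H and the regularisation weight are assumptions for illustration, not a specific published design).

```python
import numpy as np

def crosstalk_cancellers(H, beta=1e-3):
    """Per-bin cross-talk cancellation filters.

    H: (F, 2, 2) complex array; H[f][i][j] is the acoustic transfer
       function from loudspeaker j to ear i at frequency bin f.
    Returns C with H @ C ~= I, i.e. the ears receive the intended
    binaural channels. beta regularises ill-conditioned bins.
    """
    Hh = np.conj(np.swapaxes(H, -1, -2))          # conjugate transpose
    I = np.eye(2)
    # Regularised least-squares inverse: C = H^H (H H^H + beta I)^-1
    return Hh @ np.linalg.inv(H @ Hh + beta * I)
</antml>```

Regularisation trades a small residual cross-talk for bounded filter gains, which matters in practice because the plant matrix becomes nearly singular at some frequencies.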


Ambisonics

Ambisonics is based on the reproduction of the exact sound field at a measurement or recording position. Its principles were laid out in the early 1970s by Michael Gerzon. In general practice, the complete sound field, including the pressure and velocity components at a given recording position, is recorded using a four-element microphone array called the B-format microphone. The recorded sound field can be played back over a set of carefully positioned loudspeakers. Depending on the number and positioning of the loudspeakers, Ambisonic reproduction allows a periphonic (full-sphere) display of recorded sound sources. It is also possible to synthesise audio and use it for Ambisonic reproduction. The major limitation of Ambisonics, however, is the small listening area in which the reproduced sound field is correct.
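Synthetic Ambisonic content can be produced by encoding a mono signal directly into B-format. A minimal horizontal-only sketch, assuming the classical first-order convention (W carries pressure with a 1/sqrt(2) weight, X and Y carry the front and left velocity components) and a basic decode to a square of four loudspeakers:

```python
import numpy as np

def encode_b_format(s, azimuth):
    # First-order horizontal B-format encoding of a mono signal s
    # arriving from the given azimuth (radians, counter-clockwise from front).
    W = s / np.sqrt(2.0)          # omnidirectional pressure component
    X = s * np.cos(azimuth)       # front-back velocity component
    Y = s * np.sin(azimuth)       # left-right velocity component
    return W, X, Y

def decode_square(W, X, Y):
    # Basic decode to four loudspeakers at 45, 135, 225, 315 degrees:
    # each feed re-weights W and projects X, Y onto the speaker direction.
    az = np.deg2rad([45.0, 135.0, 225.0, 315.0])
    return np.array([W * np.sqrt(2.0) + X * np.cos(a) + Y * np.sin(a)
                     for a in az]) / 4.0
```

With this decode, a source encoded at 45 degrees drives the loudspeaker in that direction hardest and the diagonally opposite one not at all, which is the expected first-order panning behaviour.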

Wave field synthesis

Wave field synthesis (WFS) is based on Huygens' principle, which states that the wave front generated by a primary source can be reproduced by a distribution of secondary sources placed on that wave front. Instead of using secondary sources on the wave front itself, it is possible to calculate delays and gains for a linear or circular array of loudspeakers that simulate the wave front associated with a primary source. The resulting sound field is physically accurate in the horizontal plane at the height of the array. The typical optimal listening area is large, and WFS allows multiple listeners to experience the same auditory reality. WFS is very realistic and can be regarded as the acoustic equivalent of holography.
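The delay-and-gain computation can be illustrated very simply: each loudspeaker is delayed by its propagation time from the virtual source and attenuated with distance. This is a heavily simplified sketch (real WFS driving functions also include spectral pre-filtering and array tapering, which are omitted here).

```python
import numpy as np

def wfs_driving(source_xy, speaker_xy, c=343.0):
    """Per-loudspeaker delays and gains for a virtual point source.

    source_xy:  (2,) position of the virtual (primary) source in metres.
    speaker_xy: (N, 2) positions of the array loudspeakers.
    c: speed of sound in m/s.
    """
    # Distance from the virtual source to each loudspeaker.
    d = np.linalg.norm(speaker_xy - source_xy, axis=1)
    delays = d / c               # propagation delay per speaker (seconds)
    gains = 1.0 / np.sqrt(d)     # approximate amplitude decay with distance
    return delays, gains
```

Feeding the primary-source signal to every loudspeaker with these delays and gains makes the array's superposed wavelets approximate the wave front the virtual source would have produced.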

Spatial audio research in I-Lab spans all of the technologies described above. It is mainly concerned with recording, synthesis, and reproduction techniques, and with the perception of sound fields. A particular area of interest is audio systems that combine acoustical models of enclosures with spatial audio reproduction. The effect of visual reproduction on the perception of spatial audio is also being investigated.

Find us
Centre for Vision Speech and Signal Processing
Alan Turing Building (BB)
University of Surrey