Surrey to demo world’s first computer-vision-assisted audio system for VR
The world’s first computer-vision-assisted spatial audio system could help virtual reality gaming ‘level up’, claim experts from the University of Surrey’s Centre for Vision, Speech and Signal Processing (CVSSP).
In a paper to be presented at the IEEE VR Conference in Osaka, Japan, researchers detail how they developed a computer-vision system that takes two images from off-the-shelf 360° cameras and uses AI to understand the acoustics of the user’s environment.
This allows the spatial audio to intelligently adapt to the user’s environment and produce a more immersive virtual or augmented reality experience.
The system uses a machine learning technique called a convolutional neural network to analyse the images and extract a high-level understanding of the environment: its shape, the types of objects it contains (such as furniture, doors, windows and carpet) and the acoustic properties of their materials. It then uses this information to build an acoustic model of the room, which adapts the spatial audio reproduction to the environment and so provides a more realistic audio experience.
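To give a rough sense of how recognised materials can feed into an acoustic model, the sketch below estimates a room's reverberation time (RT60) using Sabine's classic formula, RT60 = 0.161 × V / A, where V is the room volume in cubic metres and A is the sum of each surface area multiplied by its absorption coefficient. This is an illustration only, not the method described in the paper: the simple box-shaped room, the `Surface` class, the function names and the absorption values are all assumptions made for the example.

```python
from dataclasses import dataclass

# Illustrative mid-frequency absorption coefficients.
# These are assumed placeholder values, not figures from the paper.
ABSORPTION = {
    "plaster": 0.03,
    "carpet": 0.30,
    "glass": 0.05,
    "wood": 0.10,
}


@dataclass
class Surface:
    """One surface of the room, as a vision system might label it (hypothetical)."""
    material: str
    area_m2: float


def shoebox_surfaces(length, width, height, floor, ceiling, walls):
    """Build the surfaces of a simple box-shaped room (an assumed simplification)."""
    return [
        Surface(floor, length * width),
        Surface(ceiling, length * width),
        Surface(walls, 2 * height * (length + width)),
    ]


def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time in seconds with Sabine's formula."""
    total_absorption = sum(s.area_m2 * ABSORPTION[s.material] for s in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


if __name__ == "__main__":
    length, width, height = 5.0, 4.0, 2.5
    volume = length * width * height
    for floor in ("carpet", "wood"):
        room = shoebox_surfaces(length, width, height, floor, "plaster", "plaster")
        print(f"{floor} floor: RT60 is about {sabine_rt60(volume, room):.2f} s")
```

Swapping the floor material from wood to carpet shortens the estimated reverberation time, which is the kind of room-dependent difference a spatial audio renderer could use when adapting playback to the listener's surroundings.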
Dr Hansung Kim, lead author from CVSSP, said: “This is a fruit of long-term collaboration between vision and audio researchers in the S3A project at CVSSP. I believe this work has built a strong cross-disciplinary link between vision and audio research fields for complete VR/AR content production.”
Professor Adrian Hilton, co-author of the paper and Director of CVSSP, said: “This research advances the use of AI and computer vision to create a unique, immersive and compelling listening experience for users. This could be a game changer for VR and AR applications.
“We tend to focus on the creation of highly realistic visual content for VR, but spatial audio is equally, if not more, important to engage participants and open immersive experiences to a mass audience.”
This research was conducted as part of S3A: Future Spatial Audio for Immersive Listener Experiences at Home – a flagship project funded by the EPSRC in collaboration with the BBC and Universities of Salford and Southampton.