Audio and video based speech separation for multiple moving sources within a room environment

Overview

Human beings have a remarkable ability to communicate in noisy environments, such as a cocktail party. This skill depends on combining the aural and visual senses with sophisticated processing in the brain. Mimicking this ability in a machine is very challenging, particularly when the speakers are moving. This project aims to address major challenges in audio-visual speaker localization, tracking and separation.

Funder

EPSRC

Team

Principal investigators

Professor Wenwu Wang

Professor in Signal Processing and Machine Learning

w.wang@surrey.ac.uk

Professor Josef Kittler

Distinguished Professor

j.kittler@surrey.ac.uk