Blind source separation

Blind source separation (BSS) aims to recover unknown source signals from their mixtures, and possibly to estimate the unknown mixing channels, using only the mixtures observed at the output of each channel, with no or very limited knowledge of the source signals and the mixing channels.

BSS has a wide range of potential applications, for example in audio, speech, music, image, video, biomedical, and communication signal processing (including enhancement and recognition).

Our activities in BSS focus mainly on the following challenges:

Convolutive BSS

In this problem, the observed signals are assumed to be mixtures of linear convolutions of the unknown sources. To address it, we are especially interested in frequency-domain methods, mask-based techniques (such as the ideal binary mask and statistical soft masks), and methods that enforce additional constraints on the estimation of the unmixing filters and source signals. We also study the ambiguities associated with frequency-domain BSS, such as the permutation and scaling ambiguities.
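
As an illustration of the convolutive mixing model and of why frequency-domain methods are attractive, the following is a minimal NumPy sketch, not the group's implementation. The function name `convolutive_mix`, the array shapes, and the random toy signals and filters are all assumptions made for the example. It builds each mixture as a sum of linearly convolved sources and then checks that, after zero-padding to the full output length, every frequency bin reduces to an instantaneous mixture X(f) = H(f) S(f).

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources through FIR filters (hypothetical helper for this sketch).

    sources: array of shape (n_src, n_samples)
    filters: array of shape (n_mic, n_src, filt_len), filters[i, j] is the
             impulse response from source j to sensor i
    Returns mixtures of shape (n_mic, n_samples + filt_len - 1).
    """
    n_mic, n_src, filt_len = filters.shape
    n_samples = sources.shape[1]
    mixtures = np.zeros((n_mic, n_samples + filt_len - 1))
    for i in range(n_mic):
        for j in range(n_src):
            # Each sensor observes the sum of linearly convolved sources.
            mixtures[i] += np.convolve(filters[i, j], sources[j])
    return mixtures

# Toy data (assumed): 2 random sources, 2 sensors, 8-tap random filters.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 256))
filters = rng.standard_normal((2, 2, 8))
mixtures = convolutive_mix(sources, filters)

# Frequency-domain view: zero-pad to the full linear-convolution length so
# that convolution in time becomes a per-bin matrix product.
n_fft = mixtures.shape[1]
S = np.fft.rfft(sources, n=n_fft, axis=-1)   # shape (n_src, n_bins)
H = np.fft.rfft(filters, n=n_fft, axis=-1)   # shape (n_mic, n_src, n_bins)
X_model = np.einsum("ijf,jf->if", H, S)      # X(f) = H(f) S(f) in every bin
X_observed = np.fft.rfft(mixtures, n=n_fft, axis=-1)
freq_domain_matches = np.allclose(X_model, X_observed)
```

Because each bin can then be separated independently, the order and scale of the recovered sources may differ from bin to bin, which is where the permutation and scaling ambiguities arise.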

Underdetermined BSS

In this problem, the number of unknown sources exceeds the number of observed mixtures. The problem is ill-posed, so solving it requires additional constraints and assumptions, such as the sparsity assumption widely used in the literature. We have been studying several techniques for this problem, including sparse representations, convolutive sparse coding, dictionary learning, non-negative matrix factorization, and statistical modelling approaches based on binaural cues.
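
To make one of the techniques listed above concrete, here is a minimal sketch of non-negative matrix factorization using the standard multiplicative updates for the Euclidean (Frobenius) cost. It is an illustration under assumptions, not the method used in the group's publications: the function name `nmf`, its parameters, and the synthetic low-rank matrix standing in for a magnitude spectrogram are all hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (n_freq x n_frames) as V ~ W @ H.

    Uses multiplicative updates for the Frobenius cost. W holds spectral
    patterns (columns), H holds their activations over time (rows).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy "spectrogram" (assumed): 3 random spectral patterns with random activations.
rng = np.random.default_rng(1)
true_W = rng.random((20, 3))
true_H = rng.random((3, 50))
V = true_W @ true_H

W, H, errors = nmf(V, rank=3)
```

In a separation setting, the learned components would still need to be grouped into sources and converted back to time-domain signals, for example by masking the mixture; those steps are not shown here.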

Audio-visual BSS

In this problem, the aim is to use the interaction (coherence) between different modalities (e.g. audio and visual) to improve source separation for mixtures observed in adverse (e.g. noisy) environments. We have been studying several aspects of this problem, including audio-visual fusion, statistical modelling of audio-visual coherence, using audio-visual coherence as a statistical prior to supervise source separation, learning audio-visual dictionaries from audio-visual data, audio-visual localisation and tracking of moving sources, and visual lip-reading for audio enhancement and separation.


These research activities have been funded by the EPSRC, DSTL, and RAENG, and have also led to consultancy for the Home Office.

For more details about these activities, please see the publications tab on Wenwu Wang's staff profile page.

Find us

Centre for Vision Speech and Signal Processing
Alan Turing Building (BB)
University of Surrey