We propose a novel method for full-sphere binaural sound source localization that is designed to be robust to real world recording conditions. A mask is proposed that is designed to remove diffuse noise and early room reflections. The method makes use of the interaural phase difference (IPD) for lateral angle localization and spectral cues for polar angle localization. The method is tested using different HRTF datasets to generate the test data and training data. The method is also tested with the presence of additive noise and reverberation. The method outperforms the state of the art binaural localization methods for most testing conditions.
For a sound source on the median-plane of a binaural system, interaural localization cues are absent. So, for robust binaural localization of sound sources on the median-plane, localization methods need to be designed with this in consideration. We compare four median-plane binaural sound source localization methods. Where appropriate, adjustments to the methods have been made to improve their robustness to real world recording conditions. The methods are tested using different HRTF datasets to generate the test data and training data. Each method uses a different combination of spectral and interaural localization cues, allowing for a comparison of the effect of spectral and interaural cues on median-plane localization. The methods are tested for their robustness to different levels of additive noise and different categories of sound.
A binaural sound source localization method is proposed that uses interaural and spectral cues for localization of sound sources with any direction of arrival on the full-sphere. The method is designed to be robust to the presence of reverberation, additive noise and different types of sounds. The method uses the interaural phase difference (IPD) for lateral angle localization, then interaural and spectral cues for polar angle localization. The method applies different weighting to the interaural and spectral cues depending on the estimated lateral angle. In particular, only the spectral cues are used for sound sources near or on the median plane.
Binaural recording technology offers an inexpensive, portable solution for spatial audio capture. In this paper, a full-sphere 2D localization method is proposed which utilizes the Model-Based Expectation-Maximization Source Separation and Localization system (MESSL). The localization model is trained using a full-sphere head related transfer function dataset and produces localization estimates by maximum-likelihood of frequency-dependent interaural parameters. The model’s robustness is assessed using matched and mismatched HRTF datasets between test and training data, with environmental sounds and speech. Results show that the majority of sounds are estimated correctly with the matched condition in low noise levels; for the mismatched condition, a ‘cone of confusion’ arises with albeit effective estimation of lateral angles. Additionally, the results show a relationship between the spectral content of the test data and the performance of the proposed method.