Published: 20 September 2017

CVSSP wins Google-sponsored ‘audio-tagging’ challenge

Advanced signal processing systems developed at Surrey, which can recognise sounds within everyday environments, have won first prize in the DCASE 2017 challenge.

Going head to head with top industry and academic players in the field of acoustics, the team from Surrey’s Centre for Vision, Speech and Signal Processing (CVSSP) have won first prize in the DCASE (Detection and Classification of Acoustic Scenes and Events) 2017 challenge, sponsored by Google and Audio Analytic.

The competition challenged teams to demonstrate systems which can recognise types of sounds within everyday environments such as a busy street, office or park. By enabling machines to understand their surroundings, which provides situational awareness and improves safety, this technology has huge potential in sectors such as security monitoring, autonomous cars, smart homes and robotics.

The Surrey team won the ‘large-scale weakly supervised sound event detection for smart cars’ challenge after demonstrating their technology using data from YouTube video excerpts. The systems were shown to be able to identify a wide range of vehicle sounds (such as cars, buses, bikes and skateboards) and warning sounds (such as car alarms, ambulance sirens and reversing beeps).

The four systems submitted by CVSSP took the top four places in the ‘audio tagging’ results table, beating the 25 systems submitted by other competitors. A system developed by CVSSP was also ranked in third place in the ‘sound event detection’ subtask.

The challenge submission was made by Dr Yong Xu, PhD student Qiuqiang Kong, Dr Wenwu Wang and Professor Mark Plumbley of CVSSP. It was funded by an EPSRC project, ‘Making Sense of Sounds’, a collaboration between the Universities of Surrey and Salford. The project is led by Professor Mark Plumbley and involves Dr Wenwu Wang and Dr Philip Jackson of CVSSP, and Professor David Frohlich, Director of Surrey’s Digital World Research Centre.

The DCASE 2017 challenge – the third since the competition launched in 2013 – was organised by Tampere University of Technology (TUT), Carnegie Mellon University (CMU) and public research organisation INRIA. Other competitors included CMU, New York University, Bosch, USC, TUT, Singapore’s A*STAR, Korea Advanced Institute of Science and Technology, Seoul National University and National Taiwan University.

Dr Wang said: “Our success in the DCASE 2017 challenge demonstrates CVSSP’s world-leading research in the area of machine perception and understanding of complex audio-visual scenes. Our work is likely to have a significant impact on shaping future technologies for the robotics, creative, security, healthcare, electronics and sound engineering industries.”

Professor Adrian Hilton, Head of CVSSP, added: “This is another great achievement by CVSSP researchers on the world stage.”

Why not explore our programmes in Electrical and Electronic Engineering, including our MSc in Computer Vision, Robotics and Machine Learning?

