Automatic sound labelling for broadcast audio

The aim of this project is to develop new methods for automatic labelling of sound environments and events in broadcast audio, assisting production staff to find and search through content, and helping the general public access archive content.

Start date
1 October 2021
Duration
3 years
Application deadline
Funding information

A stipend of £17,909 for 2021/22, which will increase each year in line with the UK Research and Innovation (UKRI) rate, plus a home-rate fee allowance of £4,500 (with automatic increase to the UKRI rate each year). Exceptional international applicants will be considered for a fee scholarship covering the full international tuition fee. The studentship is offered for 3 years.

About

The aim of this project is to develop new methods for automatic labelling of sound environments and events in broadcast audio, assisting production staff to find and search through content, and helping the general public access archive content. The project will combine interviews and user profiling, analysis of audio search datasets, and categorisation by audio experts to determine the most useful terminology for two user groups: production staff and the general public. It will develop a taxonomy of labels and examine the similarities and differences between the two groups. It will also investigate the application of a labelled library in a production environment, examining workflows with common broadcast tools, then integrating and evaluating prototype systems.

The project will also investigate methods for automatic subtitling of non-speech sounds, such as end-to-end encoder-decoder models with alignment, which map the acoustic signal directly to text sequences. Working with BBC R&D, the student will develop software tools to demonstrate the results, especially for broadcasting and the management of audiovisual archive data, and benchmark the results against human-assigned tags and descriptions of audio content. Using archive data provided by BBC R&D, the student will engage with audio production and research experts through Expert Panels, and with potential end users through Focus Groups.

The project will be supervised by Prof Mark Plumbley, as part of the EPSRC Fellowship on AI for Sound in the Centre for Vision, Speech and Signal Processing. As part of this PhD, you will have the opportunity for close day-to-day collaboration with the BBC as a member of the R&D Audio Team. You will have inside access to meetings, data, tools and technology, and be able to work alongside a wide range of BBC staff.

We acknowledge, understand and embrace diversity.

Related links
AI for Sound
Centre for Vision, Speech and Signal Processing (CVSSP)

Eligibility criteria

All applicants should have (or expect to obtain) a first-class degree in a numerate discipline (mathematics, science or engineering) or an MSc with Distinction (or 70% average), and a strong interest in pursuing research in this field. Additional experience relevant to the area of research is also advantageous.

This studentship is open to UK, EU or overseas students.

Non-native English speakers will be required to have IELTS 6.5 or above (or equivalent), with no sub-test score below 6.

How to apply

For enquiries, contact Nan Bennett, indicating your areas of interest and including your CV with qualification details (copies of transcripts and certificates).

Shortlisted applicants will be contacted directly to arrange a suitable time for an interview.

For further information about our research portfolio and how to apply, visit the Centre for Vision, Speech and Signal Processing (CVSSP) research page.

Vision, Speech and Signal Processing PhD



Contact details

Mark Plumbley
03 BB 01
Telephone: +44 (0)1483 689843
E-mail: m.plumbley@surrey.ac.uk

About CVSSP

CVSSP is a leading UK research centre in audio-visual signal processing, computer vision and machine learning, ranked 1st in the UK and 3rd in Europe for Computer Vision. Our Centre is one of the largest in Europe, with over 170 researchers and a grant portfolio in excess of £27 million, bringing together a unique combination of cutting-edge sound and vision expertise. Our aim is to advance the state of the art in multimedia signal processing and computer vision, with a focus on image, video and audio applications. Our Centre has a strong track record of innovative research leading to technology transfer and exploitation in biometrics, creative industries (film, TV, games, VR), communication, healthcare, robotics and consumer electronics.

CVSSP is a destination of choice for postgraduate talent and is part of the Department of Electrical and Electronic Engineering, which is ranked second in the Guardian newspaper league table 2020. The University of Surrey has recently been ranked 7th in the UK in the 2020 Advance HE Postgraduate Research Experience Survey (PRES).

This PhD project is associated with an EPSRC Fellowship in AI for Sound awarded to Prof Mark Plumbley, in collaboration with BBC R&D.


Studentships at Surrey

We have a wide range of studentship opportunities available.