Deep Learning for Audio-Visual Scene Analysis

A fully funded PhD studentship available in the area of Deep Learning for Audio-Visual Scene Analysis.

Start date

1 October 2024

Duration

3 years

Application deadline

Funding information

Full UK/EU/International tuition fees covered for 3 years.

Stipend of £19,237 per annum (2024/25 rate) for 3 years initially, with a possible extension of up to 6 months.

International students are also welcome to apply.

For exceptional international candidates, there is the possibility of obtaining a scholarship to cover overseas fees.


The University of Surrey is offering a fully funded PhD studentship on the topic of Deep Learning for Audio-Visual Scene Analysis, with industrial partner Bang & Olufsen. This project aims to develop new deep learning methods for audio-visual scene analysis in a smart home environment. It will involve the use of heterogeneous sensors, e.g. microphones and cameras, to analyse the various sources and events present in the acoustic environment. Tasks to be considered include audio-visual source separation, localization/tracking, and audio-visual event detection/recognition.

The successful candidate will be supervised by Professor Wenwu Wang and Professor Philip Jackson in the Centre for Vision, Speech and Signal Processing (CVSSP) and the Surrey Institute for People-Centred Artificial Intelligence at the University of Surrey. The PhD student will be based at CVSSP and will benefit from the resources of CVSSP and the Surrey Institute for People-Centred AI, as well as potential secondment opportunities at Bang & Olufsen.

Eligibility criteria

Open to any UK or international candidates.

You will need to meet the minimum entry requirements for our Vision, Speech and Signal Processing PhD programme.

All applicants should have (or expect to obtain) a first-class degree in a numerate discipline (mathematics, science or engineering) or an MSc with Distinction (or a 70% average), and a strong interest in pursuing research in this field.

Additional experience relevant to the area of research is also advantageous.

English language requirements: 

IELTS Academic 6.5 or above (or equivalent) with 6.0 in each individual category.

How to apply

Applications should be submitted via our Vision, Speech and Signal Processing PhD programme page.

Studentship FAQs

Read our studentship FAQs to find out more about applying and funding.

Contact details

Wenwu Wang
06 BB 01
Telephone: +44 (0)1483 686039
Philip Jackson
07 BB 01
Telephone: +44 (0)1483 686044
