My research project
Audio-visual object-based dynamic scene representation from monocular video
This project will investigate transforming monocular video and its accompanying audio into a spatially localised, object-based audio-visual representation. Self-supervised and weakly supervised deep learning will be explored for decomposing general scenes into semantically labelled and localised objects. Learning from in-the-wild and BBC archive datasets is intended to support generalisation to complex scenes. Specific use cases, such as sports and programme recommendation, will serve as constrained contexts for evaluation. The approach will be evaluated on both live and legacy content.