CVSSP projects

We have a collaborative research and innovation portfolio of over £30m in current projects supported by government, industry and third sector organisations. Our centre leads multiple national and international flagship programmes in AI and machine learning, together with a large portfolio of collaborative research, development and technology transfer projects.

Primary funders include UKRI (EPSRC, InnovateUK, BBSRC, MRC, AHRC) with a current EPSRC portfolio of £20m, Royal Society, Royal Academy of Engineering, Wellcome Trust, Cancer Research UK, Dementia Research UK, Alzheimer’s Society, BBC, NPL, MoD, Dstl, EU and SNSF. Direct industry funding of research and licensing of CVSSP technology is over £4m with substantial additional in-kind support for research and facilities.

National flagship programmes in AI

CVSSP leads national and international flagship research programmes in AI including: the DECaDE UKRI Digital Economy Centre in AI and Blockchain; a UKRI-EPSRC Prosperity Partnership in AI for Creative Industries; and two UKRI-EPSRC programme grants in face recognition and spatial audio. International partnerships include the MURI/EPSRC programme leading fundamental AI advances for multimodal semantic information.

The centre has also received strategic UKRI-EPSRC Platform Grant support since 2003 to underpin continuity of leading UK expertise in audio-visual AI.


CVSSP hosts a number of prestigious personal fellowships for established and early career academics to support national and international research leadership. The Centre is keen to host and support individual fellowship programmes to become independent research leaders and grow impactful areas of research and collaboration across multiple disciplines related to AI.

Please get in touch if you are interested in applying for a fellowship to be hosted in CVSSP.


Project portfolio

Our research has pioneered new technologies for the benefit of society and the economy, with applications spanning healthcare, security, entertainment, robotics, autonomous vehicles, communication and audio-visual data analysis.

Creative vision and sound

Creative Vision focuses on machine perception for creative technologies, specialising in 4D immersive VR content production, performance capture and video-based animation for film and games.

Creative Sound works on spatial audio and machine audition, developing audio signal processing technology related to sound recognition and immersive audio experiences.

Future Personalised Media Experiences

AI4ME: AI for Media Experiences

Principal investigator: Prof. Adrian Hilton

Funder: EPSRC

Dates: 2021 - 2026


This 5-year Prosperity Partnership, with the BBC and universities of Surrey and Lancaster, will develop and trial new approaches for creating and delivering object-based media at scale. Through groundbreaking collaborative research we aim to transform how our audiences can enjoy personalised content and services in the years ahead. The goal is to enable scaled delivery of a vast array of content experiences in an efficient and sustainable way to mainstream audiences via multiple platforms and devices.

Find out more on the project page.

Collaborators: University of Surrey, BBC, Lancaster University.


AI for Sound - EPSRC Senior Fellowship 

Principal investigator: Prof. Mark Plumbley

Funder: EPSRC

Dates: 2020 - 2025


The aim of this project is to transform how AI understands everyday sounds, from our homes and outdoor environments to the workplace, tackling key issues that have prevented computational analysis of sound from reaching its potential.

The Fellowship will focus on four key areas:

  • Monitoring sounds of human activity in the home to help with assisted living
  • Measuring sounds in non-domestic buildings to improve the office and workplace environment
  • Measuring sounds in smart cities to improve the urban environment
  • Developing sound tools to help producers and consumers of broadcast creative content

Professor Mark Plumbley is an expert in the detection, analysis and separation of sounds. He founded the influential Detection and Classification of Acoustic Scenes and Events (DCASE) challenge – an annual showcase for researchers to discuss and present their work world-wide. Find out more on the project page.

Collaborators: University of Surrey, Accenture, Audio Analytic, BBC R&D, the Connected Places Catapult, the Digital Catapult, the Environment Agency, Pindrop, Samsung, The Alan Turing Institute, and the UK Dementia Research Institute.


InHEAR: Intelligent hearables with environment-aware rendering 

Principal investigator: Dr Philip Jackson

Funder: EPSRC

Dates: 2020 - 2024


"Hearables"—in-ear headphones with built-in microphones and signal processing capability—offer many new opportunities to reproduce sound that adapts and responds to your preferences, your activities, and what's happening around you. For example, they can change the properties of what you're listening to if it's a noisy train compared to a quiet library, or reproduce a telephone conversation so that the talker sounds as if they're coming from a position that is as clear and intelligible as possible.

The InHEAR project will develop new algorithms to maximise the listener's experience regardless of their listening context. The project brings together three doctoral researchers spanning disciplines including sound analysis, perceptually optimised audio rendering and neuroscience.

Find out more on the project page.

Collaborators: University of Surrey, Bang & Olufsen.


4D Vision for Perceptive machines - The Royal Academy of Engineering Fellowship 

Principal investigator: Dr Armin Mustafa

Funder: The Royal Academy of Engineering

Dates: 2018 - 2024


The natural world is 4D – three-dimensional objects, such as people, which move over time. 4D machine perception of natural dynamic scenes is critical to enabling autonomous systems, such as robots, to work safely alongside people at home or work. The five-year RAEng Fellowship will provide the foundation for 4D machine perception of complex dynamic scenes, to help machines reconstruct, interpret and interact with the natural world.

The rise in popularity of smart devices – from mobile phones to autonomous cars and robotic care assistants – has led to a demand for improved machine understanding of the world they operate in, so they can integrate with our day-to-day lives.

Find out more on the RAE page.

Collaborators: University of Surrey.


DUO-India Fellowship Programme (Professors) on Audio Scene Analysis and Source Separation

Principal investigator: Prof. Wenwu Wang

Funder: DUO-India

Dates: 2020 - 2022


This project is a collaboration between Surrey and the Indian Institute of Technology at Kanpur (Dr Vipul Arora). The project aims to develop new deep embedding techniques for audio scene analysis and source separation. Source identification and separation are fundamental research areas in signal processing. Traditionally, matrix factorization approaches have been used for these tasks. Recently, data-driven methods, especially deep learning-based methods, have been found to be very effective in learning the variations in sources. This project will focus on developing more robust and agile embeddings for source identification and separation. These will be mostly oriented towards problems in audio event detection in environmental sounds and towards music-related applications. These two areas are of great interest in academia as well as industry. In academia, there have been annual international challenges on these problems. Industries providing audio-based smart home technologies (Alexa, Google Home, Siri, Cortana, etc.) and music information retrieval and recommendation technologies (Spotify, Amazon Prime Music, Google Play Music, etc.) are greatly interested in such systems.

Find out more on the ASEM-Duo (India) website.

Collaborators: University of Surrey, Indian Institute of Technology at Kanpur.


MAchine learNing acousTIc Surveillance MANTIS Phase 2

Principal investigator: Prof. Wenwu Wang

Funder: MoD DASA

Dates: 2020 - 2021


This project is a collaboration with Airspeed, developing novel AI and signal processing techniques for drone sound detection and localization.

Collaborators: University of Surrey, Airspeed.


Automated Captioning of Image and Audio for Visually and Hearing Impaired

Principal investigator: Prof. Wenwu Wang

Funder: British Council (Newton Institutional Links Award)

Dates: 2021 - 2023


The project is led by Prof Wenwu Wang at CVSSP, University of Surrey, jointly with Izmir Katip Celebi University (IKCU) in Turkey (Dr Volkan Kilic), with project partners from charities (e.g. Cued Speech UK) and industrial sectors working with the hearing and visually impaired. The focus at Surrey will be to develop machine learning and signal processing algorithms that extract information from audio data, recognize audio classes (i.e. tags and labels), and generate text descriptions of audio content. This work builds on the recent contributions of CVSSP in the areas of acoustic scene analysis, audio event detection, environmental sound recognition, and audio tagging, together with some of the latest results on audio captioning. The algorithms developed will be integrated by the partner university IKCU into a smartphone app to prototype and demonstrate the concept. The postdoctoral researcher working on this project is co-supervised by Prof Sabine Braun, Director of the Centre for Translation Studies, University of Surrey. Find out more on the funding scheme website.

Collaborators: University of Surrey, Izmir Katip Celebi University (IKCU) in Turkey, Cued Speech UK.


SIGNetS: signal and information gathering for networked surveillance

Principal investigator: Prof. Wenwu Wang

Funder: DoD & MoD (UDRC phase 3 call on the application theme Signal and Information Processing for Decentralized Intelligence, Surveillance, and Reconnaissance)

Dates: 2020 - 2023


SIGNetS is a collaborative project between the University of Surrey, the University of Cambridge (Prof Simon Godsill) and the University of Sheffield (Prof Lyudmila Mihaylova), to address fundamental challenges in distributed sensing, multimodal data fusion and autonomous sensor management. The work at Surrey will focus on new methods and algorithms of autonomous sensor management and fusion for large scale distributed sensor networks. In a complex environment, a variety of heterogeneous sensors may be deployed for network surveillance, where the sensor measurements differ in dimension, resolution, reliability and power consumption. As a result, it is often infeasible to schedule the execution of the task a priori without consuming additional resources, including communication loads and computational overheads. To address this issue, new strategies are required for autonomous sensor management, with consideration of practical constraints and performance trade-offs. More specifically, we will focus on developing autonomous sensor management algorithms, scalable algorithms for large scale sensor networks, and uncertainty modelling and quantification with heterogeneous sensor measurements. Find out more on the project page.

Collaborators: University of Surrey, University of Cambridge, University of Sheffield.


Animo – Tracking and Understanding Animal Motion in the Wild

Principal investigator: Dr Charles Malleson

Funder: The Leverhulme Trust

Dates: 2019 - 2022


Through this fellowship Dr Malleson will investigate non-contact tracking and understanding of animal motion in unconstrained environments.

Motion capture has numerous applications, from entertainment (film, video and game production, virtual/augmented reality) to bio-mechanics (gait analysis for rehabilitation and high-performance sport analysis for athlete training). Human motion capture from natural video has been a central challenge in computer vision for the past two decades, but little attention has been directed towards video-based motion capture of animals. Dr Malleson’s fellowship will address the problem of non-contact motion capture of animals in the wild and will open up new collaborative research directions. The research will advance our understanding of animal motion and ultimately impact animal welfare and healthcare applications by enabling non-invasive monitoring of animals such as pets, livestock and wild animals.

Dr Malleson is a Research Fellow in Computer Vision at CVSSP. His research includes image and video processing, immersive content production, general dynamic 3D reconstruction and marker-less human and animal motion capture using multi-modal input, including multiple-view video, depth and inertial sensors. Charles obtained his PhD in computer vision from the University of Surrey in 2015 with the thesis ‘Dynamic scene modelling and representation from video and depth’ and was a post-doctoral research associate at Disney Research, working on computer vision for immersive content production. Find out more on the news page.

Collaborators: University of Surrey.


Polymersive: Immersive Video Production Tools for Studio and Live Events 

Principal investigator: Prof. Adrian Hilton

Funder: Innovate UK

Dates: 2019 - 2021


Whilst audiovisual production pipelines have recently been developing to include stereoscopic 360-degree video content and directional or spatial audio, the viewer is still assumed to be at a fixed location, able to turn their head but not actually move. Headsets that can track head movement (often called 6DOF) are becoming available and finding application within video games, but there is currently no production-ready pipeline for 6DOF audiovisual content capture. We refer to such content as "Polymersive". This project sets out to develop a production tool chain to support Polymersive content.

IMRSVRay, Surrey University, and the BBC will bring together their work on lightfield capture and spatial audio to give conventional production teams the tools and processes they need to rapidly create immersive content. We call this toolset the Polymersive Production system. We will also develop tools for post-production to enable these teams to create experiences and test level distribution systems for end-user feedback.

Find out more on the project page.

Collaborators: University of Surrey, BBC Research and Development, IMRSVRay.


Healthcare focuses on medical imaging technologies for cancer detection and machine learning in personalised care for better living and healthy ageing.


Radiomics and Data Science in Medical Imaging for Cancer 

Contact: Prof. Philip Evans

Funder: EPSRC, NPL, Alliance Medical Limited, Royal Surrey NHS Foundation Trust

Dates: 2018 - 2025


The goal of this project is to develop new approaches to extracting information from medical images to help diagnosis, treatment planning and outcome prediction, particularly in cancer. It is carried out working with the largest independent medical imaging provider in Europe, a major UK cancer centre and the national measurements institute. 

Radiomics involves extracting texture features from images. We have developed a model to determine which features can provide stable imaging biomarkers and produce good results across a range of datasets. We have also developed methods of reducing noise in the images to make them more reliable. We are also currently working on a test object or “phantom” to allow standardisation of measurement of texture across a range of imaging devices.   

We have developed the use of deep learning AI to help radiologists detect both the primary tumour location and metastatic spread for oesophageal cancer. The next stage of this work will be to apply the methodology to detection of lung cancer.  

Collaborators: Alliance Medical Limited, The National Physical Laboratory, Royal Surrey NHS Foundation Trust.


Phantoms for Audit of the MR-Linac 

Contact: Prof. Philip Evans

Funder: Elekta, NPL

Dates: 2017 - 2021


This project involves developing phantoms and related materials to allow audit of the new Unity MR-linac system developed by Elekta for image-guided radiotherapy. Elekta Unity combines state-of-the-art MR imaging with state-of-the-art radiotherapy delivery. Elekta Unity has set a new standard for treatment and it is anticipated that the clinical use of this technology will increase in the future. One challenge is how to demonstrate its accuracy and audit the quality of treatment delivered with it at different centres. The project has involved using new 3D printing technology and state-of-the-art alanine dosimetry (which is delivered by NPL), coupled with anatomical phantom design, to produce a system that allows stringent testing of the new technology and can be used for audit around the country and around the world.

Collaborators: Elekta Limited, The National Physical Laboratory, Royal Surrey NHS Foundation Trust.


Engineering Integrated Dementia cAre (EIDA) UK Dementia Research Institute (DRI) Care Research & Technology Centre 

Contact: Dr Kevin Wells

Funder: Medical Research Council, Alzheimer’s Society, Alzheimer’s Research UK

Dates: 2019 - 2025


The goal of this project is to bring together a diverse team of doctors, engineers and scientists who together can harness recent advances in artificial intelligence, engineering, robotics and sleep science to create novel technologies that will deliver the highest quality dementia care in the home by establishing a Healthy Home environment. Find out more on the project page.

Collaborators: University of Surrey, Imperial College London, Surrey and Borders Partnership NHS Foundation Trust (SABP).


PROTEIN: PeRsOnalized nutriTion for hEalthy livINg 

Contact: Dr Kevin Wells

Funder: European Commission

Dates: 2018 - 2022


Proper nutrition is essential for good health, well-being and the prevention, mitigation or treatment of a number of non-communicable diseases (NCDs). Food is not only a source of calories, but also a complex mixture of dietary chemicals, some of which are directly related to cardiovascular diseases, diabetes, allergies and some types of cancer.

Foods, diet and nutritional status, including overweight and obesity, are also associated with elevated blood pressure and blood cholesterol or even resistance to the action of insulin. These conditions are not only risk factors for non-communicable diseases, but major causes of illness themselves. However, today's diet is characterized by irregular and poorly balanced meals.

Unhealthy eating habits in our daily life are not only risk factors for non-communicable diseases, but also major causes of stress and tiredness, i.e., lack of energy. Knowledge about our dietary habits based on the analysis of diverse types of information, including individual parameters, can contribute greatly towards answering key questions to respond to societal challenges regarding food and health. Find out more on the PROTEIN project page.

Collaborators: University of Surrey (United Kingdom), Intrasoft International Sa (Luxembourg), Ocado Group Plc (United Kingdom), Biosense Institute - Research And Development Institute For Information Technologies In Biosystems (Serbia), Aristotelio Panepistimio Thessalonikis (Greece), Katholieke Universiteit Leuven (Belgium), Datawizard Srl (Italy), Charite - Universitaetsmedizin Berlin (Germany), Cognicase Management Consulting Sl (Spain), The European Association For The Study Of Obesity - Ireland Company Limited By Guarantee (Ireland), Plux - Wireless Biosignals S.A. (Portugal), Diethnes Panepistimio Ellados (Greece), Istituto Comprensivo Di Boscochiesanuova (Italy), Fluviale - Societa A Responsabilita Limitata (Italy), Healthium - Healthcare Software Solutions, Sa (Portugal), Agrifood Capital Bv (Netherlands), Sport Lisboa E Benfica - Futebol Sad (Portugal), Istituto Comprensivo Statale B. Lorenzi Fumane Vr (Italy), Virtuagym Bv (Netherlands).


Optical Characterisation of Epithelial Tissue Function and Metabolism for Early Cancer Diagnosis and Treatment Monitoring - Wellcome Trust Fellowship

Principal investigator: Dr Lucia Florescu

Funder: The Wellcome Trust

Dates: 2017 - 2022


More than 80% of cancers occur in directly or endoscopically accessible epithelial tissue lining the surfaces and cavities of organs and are preceded by a curable precancerous stage. For these cancers, clinical evaluation based on subjective visual impression is still the most widely employed method for assessing lesions. There is an urgent need for new approaches to epithelial tissue characterisation for objective early cancer diagnosis and monitoring.

We will develop and evaluate an optical imaging technology that will enable clinicians to see accurate 3D images of precancerous and cancerous changes in the epithelial tissue, the extent of disease, and early changes associated with treatment. The technology will be applied to tissue samples in the laboratory and to live tissue to define clinical criteria for objective early diagnosis and assessment of treatment success.

This development will improve cancer screening, diagnosis and treatment and greatly increase the chances of a cure while also reducing the cost to patients and healthcare systems. The new technology will also represent a valuable tool for cancer biology studies, which in turn may lead to new cancer treatments.

Find out more on the Wellcome Trust page.

Collaborators: University of Surrey.


RetinaUWF: AI Detection of Diabetic Retinopathy in Ultra-Wide-Field Retinal Images 

Contact: Prof. Miroslaw Bober

Funder: EPSRC, Innovate UK

Dates: 2019 - 2021


Diabetic Retinopathy (DR) is a common complication of diabetes mellitus, which affects around half of 430m diabetics worldwide (WHO). It is a major cause of blindness (>7% of UK blindness) but can be easily ameliorated if detected and treated early. Hence the importance of annual screening, in which images of the retina are taken and reviewed by qualified graders for symptomatic features.

The goal of this project is to develop the world's first AI competent to analyse retinal images for DR and to further innovate the deep-learning architecture to achieve high detection performance and robustness to distortions such as blurring and occlusions. Find out more on the UKRI project page.

Collaborators: University of Surrey, RetinaScan Ltd, Gloucestershire Hospitals NHS Foundation Trust.


LMDP: Low-cost Portable Molecular Diagnostic Platform for Rapid Detection of Poultry Infectious Pathogens

Contact: Dr Anil Fernando

Funder: EPSRC, BBSRC Newton Fund

Dates: 2018 - 2021


The goal of this project is to develop a rapid smartphone diagnostic test to identify bacterial and viral pathogens in poultry in remote areas of the Philippines.

This potentially groundbreaking diagnostic test will consist of a sample collection and preparation device and a small instrument which will wirelessly connect to a smartphone. A smartphone app will run the test and display the results, which can be sent to a central database and used for disease surveillance purposes. Find out more on the news page.

Collaborators: University of Surrey, Brunel University, The Pirbright Institute.


Covid-19 IF - Smart Rapid COVID-19 Testing and Tracing system 

Contact: Dr Anil Fernando

Funder: EPSRC

Dates: 2020 - 2021


A team of scientists from the University of Surrey, Lancaster University and Brunel University are developing an easy-to-use test that can inform people if they have COVID-19 in just half an hour.

The proposed molecular test and smartphone app would let people who are self-isolating test themselves, and allow health care workers to test both patients and themselves, helping the UK to dramatically upscale its testing capacity.

The battery-operated, hand-held, smartphone-linked device is highly cost effective (£100/device) and easy to use. Developed in collaboration with Surrey’s Centre for Vision Speech and Signal Processing (CVSSP) and Lancaster, it works by taking nasal or throat swabs, which are put into the device. Then, in 30 minutes, it can determine if someone has COVID-19. The samples don’t need to go to a laboratory and the same device can test six people at once at a cost of around £4 per person.

The science behind the test has been used and evaluated in the Philippines to check chickens for viral and bacterial infections. The UK-based team is adapting the Philippines method to detect COVID-19 in humans and is calling on backers to help them mass-produce the kits.

Find out more on the news page.

Collaborators: University of Surrey, Lancaster University and Brunel University.


Robotics works on autonomous systems, covering a broad range of technologies related to visual human-machine interaction. These include sign language and autonomous vehicles.


Scalable Multimodal sign language technology for sIgn language Learning and assessment Phase-II

Principal investigator: Prof. Richard Bowden

Dates: 2021 - 2024


The goal of the proposed project SMILE is to pioneer an assessment system for Swiss German Sign Language (Deutschschweizerische Gebärdensprache, DSGS) using automatic sign language recognition technology.

The SMILE project will involve not only experienced and internationally known researchers in their respective fields, but also young hearing and Deaf team members. The results of the project are expected to have an echo in the larger Deaf community – not only through the involvement of many Swiss German Deaf signers with the project as subjects, but also because the national Swiss Deaf Association has recently decided to align its sign language curricula to the levels of the Common European Framework of Reference for Languages (CEFR). This SMILE project follows the CEFR approach by developing an assessment system that tests the production of vocabulary of DSGS at level A1 with first-time integration of new technologies for sign language. SMILE will thus lay an advanced platform for teaching and learning systems, both specifically for DSGS and as a model for other sign languages. Find out more on the project page.

Collaborators: University of Surrey, ETH Zurich, Eidgenossische Technische Hochschule.


Reflexive robotics using asynchronous perception

Principal investigator: Dr Simon Hadfield

Funder: EPSRC

Dates: 2020 - 2023


This project will develop a fundamentally different approach to visual perception & autonomy where the concept of an image itself is replaced with a stream of independently firing pixels, similar to unsynchronised biological cells in the retina. Recent advances in computer vision & machine learning have enabled robots which can perceive, understand, and interact intelligently with their environments. However, this "interpretive" behaviour is just one of the fundamental models of autonomy found in nature. The techniques developed in this project will exploit recent breakthroughs in instantaneous, non-image-based visual sensing to enable entirely new types of autonomous system. The corresponding step-change in robotic capabilities will impact the manufacturing, space, autonomous vehicles and medical sectors. Find out more on the project page.

Collaborators: University of Surrey.


ROSSINI: Reconstructing 3D structure from single images: a perceptual reconstruction approach

Principal investigator: Prof. Richard Bowden

Funder: EPSRC

Dates: 2019 - 2022


Consumers enjoy the immersive experience of 3D content in cinema, TV and virtual reality (VR), but it is expensive to produce. Filming a 3D movie requires two cameras to simulate the two eyes of the viewer. A common but expensive alternative is to film a single view, then use video artists to create the left and right eyes' views in post-production. What if a computer could automatically produce a 3D model (and binocular images) from 2D content: 'lifting images into 3D'? This is the overarching aim of this project. Lifting into 3D has multiple uses, such as route planning for robots, obstacle avoidance for autonomous vehicles, alongside applications in VR and cinema.

ROSSINI will develop a new machine vision system for 3D reconstruction that is more flexible and robust than previous methods. Focussing on static images, we will identify key structural features that are important to humans. We will combine neural networks with computer vision methods to form human-like descriptions of scenes and 3D scene models. Our aims are to (i) produce 3D representations that look correct to humans even if they are not strictly geometrically correct, (ii) do so for all types of scene and (iii) express the uncertainty inherent in each reconstruction. To this end we will collect data on human interpretation of images and incorporate this information into our network. Our novel training method will learn from humans and existing ground truth datasets, with the training algorithm selecting the most useful human tasks (i.e. judge depth within a particular image) to maximise learning. Importantly, the inclusion of human perceptual data should reduce the overall quantity of training data required, while mitigating the risk of over-reliance on a specific dataset. Moreover, when fully trained, our system will produce 3D reconstructions alongside information about the reliability of the depth estimates. Find out more on the project page.

Collaborators: University of Surrey, Aston University, CrossWing, Double Negative Ltd, Microsoft.


ExTOL: End to End Translation of British Sign Language

Principal investigator: Prof. Richard Bowden

Funder: EPSRC

Dates: 2019 - 2021


British Sign Language (BSL) is the natural language of the British Deaf community and is as rich and expressive as any spoken language. However, BSL is not just English words converted into hand motions. It is a language in its own right, with its own grammar, very different from English. BSL also uses different elements of the body simultaneously: not just the movement and shape of the hands, but the body, face, mouth and the space around the signer are all used to convey meaning.

The ultimate goal of this project is to take the annotated data and understanding from linguistic study and to use this to build a system that is capable of watching a human signing and turning this into written English. This will be a world first and an important landmark for deaf-hearing communication. To achieve this the computer must be able to recognise not only hand motion and shape but also the facial expression and body posture of the signer. It must also understand how these aspects are put together into phrases and how these can be translated into written/spoken language. Find out more on the project page.

Collaborators: University of Surrey, Aston University, CrossWing, Double Negative Ltd, Microsoft.

Security and data

The Security theme works on biometrics-related technologies, specialising in facial recognition and natural language interfaces for human-AI collaboration.

The Data research theme addresses the application of AI for audio-visual information search, understanding and preservation, including visual recognition, distributed ledger technologies and the understanding of AI systems.


MVSE: Multimodal Video Search by Examples

Principal investigator: Prof. Josef Kittler

Funder: EPSRC

Dates: 2021 - 2024


How to effectively and efficiently search for content from large video archives such as BBC TV programmes is a significant challenge. Search is typically done via keyword queries using pre-defined metadata such as titles, tags and viewer's notes. However, it is difficult to use keywords to search for specific moments in a video where a particular speaker talks about a specific topic at a particular location. Most videos have little or no metadata about content in the video, and automatic metadata extraction is not yet sufficiently reliable. Furthermore, metadata may change over time and cannot cover all content. Therefore, search by keyword is not a desirable approach for a comprehensive and long-lasting video search solution. 

In this project we will study efficient, effective, scalable and robust MVSE where video archives are large, historical and dynamic; and the modalities are person (face or voice), context, and topic. The aim is to develop a framework for MVSE and validate it through the development of a prototype search tool. Such a search tool will be useful for organisations such as the BBC and British Library, who maintain large collections of video archives and want to provide a search tool for their own staff as well as for the public. It will also be useful for companies such as YouTube who host videos from the public and want to enable video search by examples. We will address key challenges in the development of an efficient, effective, scalable and robust MVSE solution, including video segmentation, content representation, hashing, ranking and fusion. Find out more on the project page.

Collaborators: University of Surrey, BBC, Cambridge University, Ulster University.


FACER2VM: Face Matching for Automatic Identity Retrieval, Recognition, Verification and Management 

Principal investigator: Prof. Josef Kittler.


Dates: 2016 - 2021


Although face biometrics is beginning to be deployed in several sectors, it is currently limited to applications where a strict control can be imposed on the process of face image capture (frontal face recognition in controlled lighting). However, automatic face recognition in uncontrolled scenarios is an unsolved problem because of the variability of face appearance in images captured in different poses, with diverse expressions, under changing illumination. Furthermore, the image variability is aggravated by degradation phenomena such as noise, blur and occlusion.

The project will develop unconstrained face recognition technology, which is robust to a range of degradation factors, for applications in the Digital Economy and in a world facing global security issues, as well as demographic changes. The approach adopted will endeavour to devise novel machine learning solutions, which combine the technique of deep learning with sophisticated prior information conveyed by 3D face models. Find out more on the FACER2VM project page.

Collaborators: University of Surrey, Imperial College London, University of Stirling, 3rd Forensic Ltd, IBM United Kingdom Limited, Home Office, Cognitec Systems GmbH, Digital Barriers Ltd, British Broadcasting Corporation - BBC, European Association for Biometrics (EAB), University of Oxford, Jiangnan University, University of York, Metropolitan Police Service, Stage Technologies Ltd.


MURI: Semantic information pursuit for multimodal data analysis

Principal investigator: Prof. Josef Kittler.

Funder: EPSRC

Dates: 2018 - 2023


The project comes under the topic 'Characterisation of Information Content in Data for Multimodal Data Analysis' in the MURI call for proposals. It will address the challenges surrounding information extraction from visual and audio sensors, such as cameras and microphones.

Machine perception concerns itself with extracting useful information from data gathered by sensors, in order to assist with tasks. The goal of the research will be to advance machine perception technology, allowing cameras and microphones to extract useful information from an environment and separate it from ‘nuisance’ factors such as illumination, blur, noise and object pose.

The collaborative research team will also work towards developing information theory and finding effective ways to characterise ‘information semantics’. This will enable future machine perception systems to extract meaningful and actionable information from sensors mounted on autonomous vehicles, installed in smart cities, or supporting assisted living. Find out more on the MURI project page.

Collaborators: UK universities of Surrey, Oxford, University College London and Imperial College London, and the US universities of Maryland, Johns Hopkins, Stanford, California (Berkeley), California (Los Angeles) and Southern California.


IoT-Crawler: A Distributed Framework for Massive Multi-modal Data Stream Discovery and Predictive Analysis in Internet of Things

Principal investigator: Prof. Adrian Hilton.

Funder: European Commission

Dates: 2018 - 2021


IoTCrawler is a three-year research project focusing on developing a search engine for the Internet of Things (IoT), enabling search on its devices. The project spans industry, universities and cities.
IoTCrawler will focus on integration and interoperability across different platforms; dynamic and reconfigurable solutions for discovery and integration of data and services from legacy and new systems; and adaptive, privacy-aware and secure algorithms and mechanisms for crawling, indexing and search in distributed IoT systems. Find out more on the project page.

Collaborators: University of Surrey (United Kingdom), Universidad de Murcia (Spain), University of Applied Sciences Osnabrück (Germany), Aarhus University (Denmark), Siemens AG Österreich (Austria), NEC Corporation (Germany), AGT Group (R&D) GmbH (Germany), Digital worx GmbH (Germany), Odin Solutions S.L (Spain), City of Aarhus (Denmark).


We are investigating alternative uses for distributed ledger technology (DLT), including safe online identity, healthcare, and secure digital archives. The new approach, fusing DLT (trusted data) and AI (making sense of that data), is a common thread across all of our projects in DLT and a unique perspective on this emerging technology pioneered by the University of Surrey.


DECaDE: Centre for the Decentralised Digital Economy

Principal investigator: Prof. John Collomosse.


Dates: 2020 - 2025


In today’s Decentralised Digital Economy (DDE) everyone has the opportunity to be both a producer and consumer of goods and services. For example, I could hire or drive a rideshare, rent or host my apartment, watch a video blog or monetize my own.

But these dynamic, peer-to-peer markets are all underpinned by centralised digital platforms. Users rarely have a say in their governance decisions, which are often made in isolation from the global impacts they have on society.

Launched in October 2020, DECaDE is a five-year National Research Centre exploring how emerging data technologies such as Distributed Ledger Technology (aka ‘Blockchain’) and Artificial Intelligence (AI) could transform our digital economy through decentralised platforms.

DECaDE’s mission is to accelerate research in DLT, AI, and Human Data Interaction (HDI), working with industry and end-users to create the tools and techniques that will shape the evolution of the digital economy toward a new 21st century model of work and value creation, ensuring a prosperous, safe and inclusive society for all. Find out more on the DECaDE project page.

Collaborators: University of Surrey, University of Edinburgh, and the Digital Catapult.

DLT & blockchain research at CVSSP

Blockstart: Blockchain-based applications for SME competitiveness

Principal investigator: Prof. John Collomosse.

Funder: European Commission

Dates: 2019 - 2022


Blockchain innovation is a big yet unseized opportunity for SMEs in North-West Europe (NWE). It has the potential to transform three of NWE’s top five sectors (agrofoods, logistics and health), which have a combined €600bn turnover and share transnational data challenges. SMEs in agrofoods, health and logistics rely heavily on data transactions across countries, e.g. tracking shipments in real time across 20-30 organisations, sharing sensitive medical records between practitioners or for research, or assuring the safety, quality and origin of food produce. SMEs are particularly vulnerable to risks in delayed invoices and cash flow, data security, and errors in data. Blockchain transforms data transactions by allowing for automated, efficient and secure transactions between parties that do not need to trust each other. Find out more on the project page.

Collaborators: University of Surrey, Brightlands Smart Services Campus, ChainPoint B.V., Multitel asbl, Christelijke Hogeschool Windesheim, Option Public, Ontwikkelingsmaatschappij Oost-Nederland NV, BioRegio STERN Management GmbH, Medicen Paris Region.

Digital Inspiration and Search in the National Archives

Principal investigator: Prof. John Collomosse.

Funder: Arts and Humanities Research Council

Dates: 2020 - 2021

Strategic funding

MSc in AI

Audio-Visual Media Research Platform 

Principal investigator: Prof. Yi-Zhe Song.

Funder: IFLYTEK (Suzhou) Technology Co., Ltd.

Dates: 2019 - 2025


Collaborators: University of Surrey, IFLYTEK (Suzhou) Technology Co., Ltd.


Audio-Visual Media Research Platform 

Principal investigator: Prof. Adrian Hilton.


Dates: 2017 - 2022


Platform Grant support for audio-visual media processing is supporting a critical mass of joint research expertise in multi-sensory machine perception. This is critical to achieving machines which can hear and see to understand and interact with real-world dynamic scenes.

Recent advances in both the audio and vision research communities, with the introduction of deep learning methodologies, have achieved a step change in the capability of machine understanding, enabling for the first time automatic interpretation of audio and visual data of real-world complexity. Our research seeks to capitalise on these advances, developing the next generation of research leaders in multi-sensory machine perception and realising autonomous systems capable of combining sensing modalities to robustly understand and interpret audio-visual media. Research will address the open challenge of machine understanding of complex dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. Find out more on the UKRI project page.

Collaborators: University of Surrey, University of Edinburgh, and the Digital Catapult.

EPSRC Capital award for core equipment

Principal investigator: Prof. Adrian Hilton.

Funder: EPSRC

Dates: 2019 - 2021

Experimental Equipment Call

Principal investigator: Prof. Adrian Hilton.

Funder: EPSRC

Dates: 2015 - 2021

Research at CVSSP

Over the past thirty years, we have become an international centre of excellence for training and research in audio and visual machine perception in collaboration with industry.

Past projects

Further details about these projects can be obtained either by visiting the relevant websites or by contacting those involved in the research. The list is non-exhaustive.

Creative vision and sound

Artistic rendering of consumer video (HP Labs)

Principal investigator: J. Collomosse.

Developing technologies for the automatic organisation of large consumer media (photo and video) collections, and their presentation on ambient displays in novel styles and formats.

Spot the difference (JISC)

Principal investigator: J. Collomosse.

Developing visual search technology to enable the detection of visual plagiarism in the arts. A collaboration with the University of the Creative Arts (UCA), and the Visual Arts Data Service (VADS).

Digital Doubles for Film Production - Royal Society Industry Fellowship with Framestore (2008-2012)

Principal investigator: Prof. Adrian Hilton.

4D computer vision modelling - Royal Society Wolfson Research Merit Award 

Principal investigator: Prof. Adrian Hilton.

4D vision can sense both 3D shape and motion, enabling ‘seeing’ machines that can understand and model dynamic scenes – such as the exact line of a football across a goal mouth.

Digital doubles: From real actors to computer generated digital doubles (Framestore, Royal Society)

Principal investigator: Prof. Adrian Hilton.

3D capture of digital doubles of actors and integration in the film production pipeline.

RE @ CT: Immersive production and delivery of interactive 3D content (EU FP7)

Principal investigators: A. Hilton, J. Collomosse.

Production of interactive animated 3D characters in conjunction with conventional broadcast production using multiple view video acquisition and 3D video reconstruction.

Collaborators: BBC, Vicon, Artefacto, Fraunhofer HHI, INRIA.

SCENE: Novel scene representations for richer networked media (EU FP7)

Principal investigator: Prof. Adrian Hilton.

3D acquisition and representation of real-world scenes from video+depth capture for film production.

Collaborators: Technicolor, Intel Visual Computing Institute, ARRI, Brainstorm Multimedia, 3DLIZED, BarcelonaMedia, Fraunhofer HHI, IBBT.

SyMMM synchronised multimodal movie metadata (TSB)

Principal investigator: Prof. Adrian Hilton.

Improved on-set processing of multimodal data sources (video, image, 3D, high-dynamic range, user annotation) in film production.

Collaborators: DoubleNegative, Filmlight.

Dicta-Sign (EU FP7)

Principal investigator: R. Bowden.

Dicta-Sign is a three-year EU-funded research project that aims to make online communications more accessible to deaf sign language users. Principally, the project develops Web 2.0 tools for the deaf community, and incorporates visual sign language recognition and production. Visit the Dicta-Sign project page for more information.

Sports performance analysis in broadcast (BBC)

Principal investigator: Prof. Adrian Hilton.

Video analysis to enable through-the-lens analysis of athlete performance.

Bodyshape (EPSRC)

Principal investigator: Prof. Adrian Hilton.

Investigating body shape measurement in the home for online clothing retail. A collaboration with the London College of Fashion, Bodymetrics and Guided Collective.

Digital Dance Archives (AHRC)

Principal investigator: J. Collomosse.

Developing the UK’s first cross-collection online portal to explore 100 years of archival dance performance.

Developing visual search technologies to search dance by pose and choreographic example. Collaboration with Department of Dance (Surrey), National Resource Centre for Dance (NRCD) (Surrey), Department of Dance (Coventry). Visit the DDA website for more information.

POSZ: Reproduction of personal sound zones (IoSR)

Principal investigator: P. Jackson.

Increasingly, there is a demand for multiple personal sound zones which allow listeners in different parts of an acoustic space to listen to different programme material. This project involves two main strands: psychoacoustic research into the criteria for acceptability of sound leakage/interference between the two zones; and engineering research into methods to create personal sound zones for a range of applications.

Multimodal blind source separation for robot audition (EPSRC, DSTL)

Principal investigator: W. Wang.

This project attempts to use both the audio and visual modalities for the problem of source separation of target speech in the presence of multiple competing speech interferences and sound sources in room environments for a robotic system.

Audio and video based speech separation for multiple moving sources within a room environment (EPSRC)

Principal investigators: W. Wang, J. Kittler.

Human beings have developed a unique ability to communicate within a noisy environment, such as at a cocktail party. This skill is dependent upon the use of both the aural and visual senses together with sophisticated processing within the brain. To mimic this ability within a machine is very challenging, particularly if the humans are moving. This project attempts to address major challenges in audio-visual speaker localization, tracking and separation.



Fully integrating MRI into the Radiotherapy workflow using the MR-Linac - The Royal Society Industrial Fellowship

Principal investigator: Dr Nikolaos Dikaios

Funder: The Royal Society

Dates: 2019 - 2021


Radiotherapy is a vital part of most cancer patients’ treatment plans, with 50 per cent of patients receiving it. Key to radiotherapy treatment is image guidance that shows doctors the location and size of the tumour, allowing them to provide maximum dosage to the specific site without affecting surrounding, healthy tissue. It is also essential for oncologists to have accurate images so they understand how the tumour is responding to treatment.

The awarded Royal Society Industry Fellowship will see Dr Dikaios work with Elekta to further explore the encouraging indications that MRI can provide unprecedented imaging accuracy during radiotherapy treatment.

Dr Dikaios, Lecturer in Image Analysis for Medicine and Healthcare at CVSSP, said: “I am proud to receive this Royal Society Industry Fellowship, but most of all, I am eager to get on with the task of further developing this very promising technology. We will be using MRI to perform real-time visualisations of radiotherapy treatments and monitoring of tumours – testing whether our methods give doctors the precious time needed to adapt according to how a tumour is reacting.” Find out more on the news page.

Collaborators: University of Surrey, Elekta.

Optimam (Cancer Research UK)

Principal investigator: K. Wells.

Investigating the optimal adoption of digital imaging techniques and technology for the UK breast screening programme. Involves simulation of imaging systems, lesion simulation, generating synthetic mammograms, etc.


Principal investigator: K. Wells.

The MI3 project is developing the largest rad-hard CMOS imaging sensor for biomedical applications. The device is being used at Surrey for electrophoresis imaging applications.

Motion correction in medical imaging (EPSRC, Malaysian government)

Principal investigator: K. Wells.

The project is applying principal component analysis (PCA), particle filters (PFs) and kernel density estimators (KDEs) to model, correct and predict respiratory motion present in medical images, with application in therapeutic radiotherapy.


Do androids see optical illusions? - Royal Society Leverhulme Trust Senior Research Fellowship

Principal investigator: Prof. Richard Bowden.


ACASVA: Adaptive cognition for automated sports video annotation (EPSRC)

Principal investigators: J. Kittler, W. Christmas, D. Windridge.

This project addresses the problem of autonomous cognition at the interface of vision and language. The goal is to develop mechanisms that would allow the transfer of knowledge from an existing system for tennis video annotation so as to be capable of automatically annotating video of novel sports.

This will be accomplished via the cross-modal bootstrapping of high-level visual/linguistic structures in a manner paralleling human capabilities. Find out more on the ACASVA project page.

Security and data

Genetics of the people of the British Isles and their faces (Wellcome Trust)

Principal investigator: J. Kittler.

The project is concerned with 3D face analysis.

The scientific hypothesis to be tested is that face phenotype is determined by groups of genes. Similar faces will be associated with similar genetic code. The study attempts to identify which elements of the genetic morphology determine face characteristics and what aspects of the relevant parts of the genetic code give rise to differences in face phenotype.

The aim of the project is to investigate the relationship between genotype and its expression in face appearance. Visit our Faces of the British Isles project page for more information.



Principal investigator: R. Bowden.

This project is concerned with automated detection of terrorist activity and looking for patterns in data.

Strategic funding

Visual media research platform grant (EPSRC)

The EPSRC has provided strategic long-term support for visual media research within CVSSP for the period 2003-2013, through the platform grant scheme.

Audio-visual research partnership (BBC)

Strategic partnership with the BBC for collaboration in audio-visual research, involving the BBC and other companies.