
Jiantao Wu
Surrey Institute for People-Centred Artificial Intelligence (PAI)
My research project
Foundation Models for Visual Understanding

Much of the success of AI is attributed to supervised pretraining (SP) of deep neural networks, which are then adapted to downstream, application-specific tasks. However, labelled data brings several challenges: labelling cost, noisy labels, incomplete and inadequate labels (focusing only on the dominant concept or concepts in an image), and inherent human labelling bias. An alternative to supervised learning is self-supervised (or unsupervised) learning, which learns without labels. Two key principles of self-supervised learning (SSL) in computer vision are: a) generate augmented views of the same input and enforce some notion of consistency between the views; b) mask part of the input and recover it from the unmasked remainder.
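To make these two principles concrete, here is a minimal PyTorch sketch, not the project's actual code: `encoder` and `model` stand in for any embedding network and any reconstruction network, and the loss choices (cosine consistency, MSE on masked patches) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Principle (a): enforce consistency between two augmented views of the
# same input. `encoder` is a placeholder for any embedding network.
def view_consistency_loss(encoder: nn.Module,
                          view1: torch.Tensor,
                          view2: torch.Tensor) -> torch.Tensor:
    z1 = F.normalize(encoder(view1), dim=-1)       # unit-norm embeddings, view 1
    z2 = F.normalize(encoder(view2), dim=-1)       # unit-norm embeddings, view 2
    return (2 - 2 * (z1 * z2).sum(dim=-1)).mean()  # cosine-distance consistency

# Principle (b): mask part of the input and reconstruct it from the rest.
def masked_reconstruction_loss(model: nn.Module,
                               tokens: torch.Tensor,
                               mask_ratio: float = 0.6) -> torch.Tensor:
    # tokens: (batch, num_patches, dim) patch embeddings of an image
    mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
    corrupted = tokens.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches
    recon = model(corrupted)                                 # predict every patch
    return F.mse_loss(recon[mask], tokens[mask])             # score masked ones only
```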
AI pioneers have predicted that SSL is the future of AI research; as they put it, “The revolution will not be supervised.” Until recently, however, self-supervised pretraining lagged behind supervised pretraining for computer vision applications, hindering the realisation of the SSL dream. This changed with Group Masked Model Learning (GMML), proposed in our seminal work SiT: Self-supervised vIsion Transformers in 2021. GMML marked a milestone in AI development as the first method to outperform supervised pretraining, learning semantic concepts without using any labels. Its core idea, masking groups of connected patches and recovering them from the visible context, has been adopted by tech giants such as Microsoft and Facebook, among others, for application areas including computer vision, audio, medical image analysis, anomaly detection, and multimodal analysis.
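The group-masking idea can be illustrated with the hypothetical sketch below; the exact strategy is specified in the SiT paper, and `group_mask`, its block size, and its mask ratio are assumptions of this example. Masking contiguous blocks of neighbouring patches, rather than isolated ones, forces the model to recover whole local structures from the surrounding context.

```python
import torch

def group_mask(num_h: int, num_w: int, mask_ratio: float = 0.6,
               group: int = 4, device: str = "cpu") -> torch.Tensor:
    """Illustrative GMML-style masking over a num_h x num_w patch grid:
    hide rectangular groups of neighbouring patches until roughly
    `mask_ratio` of the grid is covered. Returns a flattened boolean
    mask of shape (num_h * num_w,) where True means masked."""
    mask = torch.zeros(num_h, num_w, dtype=torch.bool, device=device)
    target = int(mask_ratio * num_h * num_w)
    while mask.sum() < target:
        # pick a random group-sized block of patches and mask it
        top = torch.randint(0, max(num_h - group, 1), (1,)).item()
        left = torch.randint(0, max(num_w - group, 1), (1,)).item()
        mask[top:top + group, left:left + group] = True
    return mask.flatten()

# Example: a 14x14 patch grid (a 224x224 image split into 16x16 patches)
m = group_mask(14, 14)
print(m.sum().item(), "of", m.numel(), "patches masked in connected groups")
```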