
Sauradip Nag
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering.
About
My research project
Hierarchical visual activity analysis
In this project, we aim to develop models for recognising complex and hierarchical visual activities. Current state-of-the-art methods, such as variants of the two-stream network or 3D CNNs, work very well on sports and film videos but struggle with complex, hierarchically structured activities such as cooking and assembling flat-pack furniture. Analysing these activities is only achievable by delving deep into the interaction between humans and objects and compositionally modelling the hierarchical structure. To this end, we will develop novel compositional learning models and exploit deep graph convolutional networks for structured data analysis.
Publications
Existing Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence, before temporal boundary estimation and action classification. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. In essence, this is due to a temporal quantization error introduced during the resolution downsampling and recovery. This could negatively impact the TAD performance, but is largely ignored by existing methods. To address this problem, in this work we introduce a novel model-agnostic post-processing method without model redesign and retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution for enabling temporal boundary inference at a sub-snippet level. We further introduce an efficient Taylor-expansion based approximation, dubbed as Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that our GAP can consistently improve a wide variety of pre-trained off-the-shelf TAD models on the challenging ActivityNet (+0.2%∼0.7% in average mAP) and THUMOS (+0.2%∼0.5% in average mAP) benchmarks. Such performance gains are already significant and highly comparable to those achieved by novel model designs. Also, GAP can be integrated with model training for further performance gain. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications. The code will be available in https://github.com/sauradip/GAP
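The sub-snippet refinement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository is the reference for that), and the function names `refine_boundary` and `snippet_to_seconds` are hypothetical. The sketch assumes that the boundary scores near a peak are roughly Gaussian, so their logarithm is roughly quadratic, and a second-order Taylor expansion around the highest-scoring snippet yields a fractional offset to the estimated true peak. It also assumes that snippet index i denotes the centre of snippet i.

```python
import math


def refine_boundary(scores, eps=1e-10):
    """Refine the argmax of a 1D boundary-score sequence to sub-snippet precision.

    Assumption: scores near the peak are roughly Gaussian, so f = log(scores)
    is roughly quadratic and the peak offset is delta = -f'(m) / f''(m).
    """
    m = max(range(len(scores)), key=scores.__getitem__)
    # Finite differences need both neighbours; otherwise keep the integer peak.
    if m == 0 or m == len(scores) - 1:
        return float(m)
    # Clamp before the logarithm so zero scores do not raise an error.
    left, mid, right = (math.log(max(scores[i], eps)) for i in (m - 1, m, m + 1))
    d1 = 0.5 * (right - left)       # central estimate of the first derivative
    d2 = right - 2.0 * mid + left   # central estimate of the second derivative
    if d2 >= 0:                     # not a strict local maximum: no refinement
        return float(m)
    return m - d1 / d2


def snippet_to_seconds(position, num_snippets, duration):
    """Map a (possibly fractional) snippet index to a timestamp in seconds."""
    return (position + 0.5) * duration / num_snippets


# Hypothetical example: a 16-second video reduced to 8 snippets, with a
# start-boundary score curve whose true peak lies between snippets 3 and 4.
true_peak, sigma = 3.3, 1.5
start_scores = [math.exp(-((i - true_peak) ** 2) / (2 * sigma ** 2)) for i in range(8)]

coarse = max(range(8), key=start_scores.__getitem__)  # integer snippet index
refined = refine_boundary(start_scores)               # fractional index close to 3.3
start_time = snippet_to_seconds(refined, 8, 16.0)
```

In this synthetic case the scores are exactly Gaussian, so the refinement recovers the true peak up to floating-point error, whereas the integer argmax is off by 0.3 snippets (0.6 seconds here). Real model outputs are only approximately Gaussian, so the gain in practice would be smaller.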