2pm–3pm
Monday 17 September 2018
Learning video representations for human action recognition
Anoop was a postdoctoral researcher in the LEAR group at Inria from 2012 to 2015, where his research was on the estimation and tracking of human poses in videos. From 2015 to 2017, he was a Research Fellow at the Australian National University, where he worked on the problem of recognizing human activities in video sequences. Anoop is the recipient of the Best Student Paper award at the International Conference on Image Processing in 2012. Currently, his research focus is on modeling the semantics of video data.
- Dr Anoop Cherian
Representations that can compactly and effectively capture the semantic content of multivariate time-series data are important to several applications in computer vision, machine learning, and robotics. In this talk, I will present several such representations that we have explored recently, motivated by the task of video-based human action recognition.
In our problem setting, each video frame is encoded by a multivariate feature (for example, intermediate features from a deep CNN), and the action dynamics are characterized by how these features vary over time. Our main idea is to train a local model on each video separately and use the parameters of that model as our representation for the respective video.
The local models are trained to capture some desirable property of the videos. I will discuss two such properties, namely (i) capturing the spatio-temporal evolution of the features and (ii) identifying what makes a video different from others. For the former, we propose a subspace estimation problem with partial-ordering constraints, while for the latter, we present a discriminative representation learning scheme that classifies the video features against those generated by a learned adversary.
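To make the "local model per video" idea concrete, here is a minimal sketch of a closely related, well-known instance: rank pooling, which fits a linear ranking function to a video's frame features so that its projections respect temporal order, and then uses the fitted parameter vector as the video descriptor. This is an illustration of the general recipe, not the speaker's exact formulation; the function name and the ridge parameter `lam` are assumptions for the sketch.

```python
import numpy as np

def rank_pool(frames, lam=1e-3):
    """Fit a linear ranker w so that <w, x_t> increases with time t.

    frames: (T, d) array of per-frame features (e.g. CNN activations).
    lam:    ridge regularizer for numerical stability (illustrative value).
    Returns w of shape (d,), usable as a compact video descriptor.
    """
    T, d = frames.shape
    t = np.arange(1, T + 1, dtype=float)
    # Ridge-regularized least squares: regress the temporal index t
    # onto the frame features, so w encodes their temporal evolution.
    A = frames.T @ frames + lam * np.eye(d)
    w = np.linalg.solve(A, frames.T @ t)
    return w
```

The key point mirrored from the talk abstract: the representation is the *parameters* of a model trained on one video alone, so videos with different dynamics yield different descriptors even when their per-frame features overlap.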
We cast our learning formulations as non-linear optimization problems on well-known mathematical manifolds and provide efficient algorithms for solving them. Experiments on several standard benchmarks demonstrate the empirical benefits of our schemes against state-of-the-art methods for action recognition.
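As a small illustration of what "optimization on a manifold" means here, the sketch below takes one Riemannian gradient-descent step on the Stiefel manifold (matrices with orthonormal columns), a standard ingredient in subspace estimation. This is a generic textbook step with a QR retraction, not the specific algorithm from the talk; the function name and step size are assumptions.

```python
import numpy as np

def stiefel_step(X, egrad, step):
    """One Riemannian gradient-descent step on the Stiefel manifold
    {X : X^T X = I}, using a QR-based retraction.

    X:     (n, p) matrix with orthonormal columns.
    egrad: Euclidean gradient of the objective at X.
    step:  step size.
    """
    # Project the Euclidean gradient onto the tangent space at X.
    sym = (X.T @ egrad + egrad.T @ X) / 2.0
    rgrad = egrad - X @ sym
    # Move along the negative tangent direction, then retract back
    # onto the manifold via the Q factor of a QR decomposition.
    Q, R = np.linalg.qr(X - step * rgrad)
    # Flip column signs so the retraction is continuous in X.
    return Q * np.sign(np.diag(R))
```

Iterating such steps keeps the iterates exactly on the constraint set, which is why casting subspace-learning objectives on manifolds avoids explicit orthogonality penalties.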