Dr Armin Mustafa
Academic and research departments: Centre for Vision, Speech and Signal Processing (CVSSP), Faculty of Engineering and Physical Sciences.
I am currently a Royal Academy of Engineering Research Fellow in the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, working on 4D vision for perceptive machines. I completed my PhD on general dynamic scene reconstruction from multi-view video in 2016 at the University of Surrey, supervised by Prof. Adrian Hilton, after which I became a Research Fellow at CVSSP. I previously worked in Computer Vision at Samsung Research Institute, Bangalore, India for three years (2010 - 2013). In 2010 I received the M.Tech. degree from the Indian Institute of Technology (IIT), Kanpur, India, supervised by Prof. K.S. Venkatesh in Computer Vision.
Areas of specialism
Human-Computer Interaction
2018 - Research Fellowship, The Royal Academy of Engineering, UK.
2017 - Young Researcher award, CVPR.
2016 - Doctoral Consortium grant, CVPR.
2015 - BMVA travel grant for ICCV.
2014 - Set-Squared Research to Innovator grant, Global #1 University incubator, UK.
2013 - Overseas Research Scholarship, FEPS, The University of Surrey, UK.
2010 - Cadence Silver Medal, Indian Institute of Technology, Kanpur, India.
The emergence of machines that interact with their environment has led to an increasing demand for automatic visual understanding of real-world scenes. My research aims to better understand complex scenes so that machines can efficiently model and interpret the real world for a range of socially beneficial applications, including autonomous systems, augmented reality and healthcare.
4D temporally coherent models of complex dynamic scenes.
No prior knowledge of scene structure or camera calibration is required, allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities, resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star-convexity constraint. Comparative evaluation on a variety of unstructured indoor and outdoor dynamic scenes, with hand-held cameras and multiple people, demonstrates reconstruction of complete, temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction.
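The star-convexity idea behind the constraint above can be sketched simply: a segmentation is star convex with respect to a centre point if every object pixel is connected to that centre by a path lying entirely inside the object. Below is a minimal sketch of the simpler Euclidean variant (straight line-of-sight rather than geodesic paths) on a binary mask; the function `is_star_convex` and the toy mask are illustrative assumptions, not the paper's implementation, which uses geodesic distances and enforces the constraint inside a joint segmentation-reconstruction optimisation.

```python
def is_star_convex(mask, centre):
    """Check Euclidean star convexity of a binary mask w.r.t. `centre`.

    For each foreground pixel, sample the straight segment from the pixel
    to the centre; every sampled pixel must also be foreground.
    """
    h, w = len(mask), len(mask[0])
    cr, cc = centre
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue  # only object pixels are constrained
            steps = max(abs(r - cr), abs(c - cc), 1)
            for t in range(steps + 1):
                rr = round(r + (cr - r) * t / steps)
                sc = round(c + (cc - c) * t / steps)
                if not mask[rr][sc]:
                    return False  # line of sight leaves the object
    return True

# A solid square is star convex about its centre; punching a hole on a
# line of sight to the centre breaks the property.
square = [[1] * 5 for _ in range(5)]
print(is_star_convex(square, (2, 2)))   # True
square[2][1] = 0                        # hole between (2, 0) and centre
print(is_star_convex(square, (2, 2)))   # False
```

In the paper the straight segments are replaced by geodesic paths computed from image gradients, which lets the constraint follow object boundaries rather than cut across them.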
Wide-baseline matching with conventional detectors such as SIFT, SURF, FAST and MSER yields a sparse and non-uniform distribution of correspondences. In this paper we introduce SFD, a novel segmentation-based feature detector that produces an increased number of 'good' features for accurate wide-baseline reconstruction. Each image is over-segmented into regions, and feature points are detected at the intersections of the boundaries of three or more regions. Segmentation-based feature detection locates features at local maxima, giving a relatively large number of feature points which are consistently detected across wide-baseline views and accurately localised. A comprehensive comparative performance evaluation with previous feature detection approaches demonstrates that: SFD produces a large number of features with increased scene coverage; detected features are consistent across wide-baseline views for images of a variety of indoor and outdoor scenes; and the number of wide-baseline matches is increased by an order of magnitude compared to alternative detector-descriptor combinations. Sparse scene reconstruction from multiple wide-baseline stereo views using the SFD feature detector demonstrates at least a six-fold increase in the number of reconstructed points, with a reduced error distribution compared to SIFT when evaluated against ground truth, at computational cost similar to SURF/FAST.
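The junction-detection step of SFD can be sketched directly: given an over-segmentation (e.g. from watershed or SLIC, assumed already computed), candidate features are placed where the boundaries of three or more regions meet. A minimal sketch, assuming a label map as input; `detect_junctions` and the hand-made toy label map are hypothetical illustrations of this one step, not the full detector.

```python
def detect_junctions(labels):
    """Return (row, col) points where three or more region labels meet.

    Each 2x2 window of the label map is inspected; if it contains three
    or more distinct labels, its top-left corner is reported as a
    candidate feature point (a region-boundary junction).
    """
    h, w = len(labels), len(labels[0])
    points = []
    for r in range(h - 1):
        for c in range(w - 1):
            window = {labels[r][c], labels[r][c + 1],
                      labels[r + 1][c], labels[r + 1][c + 1]}
            if len(window) >= 3:
                points.append((r, c))
    return points

# Toy 4x4 label map with three regions (0, 1, 2) meeting near the centre:
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]
print(detect_junctions(labels))  # [(1, 1)]
```

Because junctions are fixed by the region boundaries themselves, the same points tend to reappear under large viewpoint changes, which is the property the abstract attributes to SFD's wide-baseline consistency.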
The emergence of machines that interact with their environment has led to an increasing demand for machine perception of the real world that is more robust, accurate and human-like. Research in visual scene understanding over the past two decades has focused on machine perception in controlled environments: indoor scenes with static, rigid objects. There is a gap in the literature for machine perception in general complex scenes (outdoor, with multiple interacting people). The proposed research addresses the limitations of existing methods with an unsupervised framework to simultaneously model, semantically segment and estimate motion for general dynamic scenes captured as multi-view video from a network of static or moving cameras. In this talk I will explain the proposed joint framework for understanding general dynamic scenes for machine perception; give a comprehensive performance evaluation against state-of-the-art techniques on challenging indoor and outdoor sequences; and demonstrate applications such as virtual, augmented and mixed reality (VR/AR/MR) and broadcast production (free-viewpoint video, FVV).