In 2018 I obtained a PhD in Robot Vision, and have continued to work in Robotics Research. I am currently a Research Fellow at the University of Surrey. I am interested in the fields of Robotics, Computer Vision and Deep Learning. I’ve spent many years developing and building real-time robotic systems that can leverage the advances of Deep Learning to perform difficult computer vision tasks such as SLAM, 3D Reconstruction and Multi-View Geometry. I am also interested in collaboration and automation between robotic agents, specifically emergent behaviours that are not hard-coded into systems.
University roles and responsibilities
- Lecturer for EEE1035 (Programming in C)
a floorplan? It is probably safe to say that we do not explicitly measure the depth to every visible surface and try to match those measurements against different pose estimates in the floorplan. Yet this is exactly how most robotic scan-matching algorithms operate. Similarly, we do not extrude the 2D geometry of the floorplan into 3D and try to align it to the real world. Yet this is how most vision-based approaches localise.

Humans do the exact opposite. Instead of depth, we use high-level semantic cues. Instead of extruding the floorplan into the third dimension, we collapse the 3D world into a 2D representation. Evidence of this is that many of the floorplans we use in everyday life are not metrically accurate; they favour highly discriminative landmarks instead.

In this work, we use this insight to present a global localisation approach that relies solely on the semantic labels present in the floorplan and extracted from RGB images. While our approach can use range measurements where available, we demonstrate that they are unnecessary: we achieve results comparable to the state of the art without them.
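The core idea above, scoring candidate poses purely by agreement between observed semantic labels and the labels read out of the floorplan, can be sketched as follows. This is an illustrative toy, not the paper's actual method: the floorplan, the label classes, the 4-neighbourhood "visibility" model and all function names are invented here.

```python
import numpy as np

# Toy floorplan stored as a grid of semantic class IDs
# (0 = wall, 1 = door, 2 = window). Purely illustrative.
FLOORPLAN = np.array([
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 2, 0, 0],
])

def score_pose(pose, observed_labels):
    """Score a candidate (row, col) pose by how well the semantic labels
    observed from it agree with the labels read out of the floorplan.
    'Visibility' is reduced to the 4-neighbourhood (up, down, left, right)
    for brevity; observed_labels follows that same order."""
    r, c = pose
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    matches = 0
    for (dr, dc), obs in zip(offsets, observed_labels):
        rr, cc = r + dr, c + dc
        if 0 <= rr < FLOORPLAN.shape[0] and 0 <= cc < FLOORPLAN.shape[1]:
            matches += int(FLOORPLAN[rr, cc] == obs)
    return matches

def localise(observed_labels):
    """Global localisation: exhaustively score every cell and return the
    best-matching pose. No range measurements are used anywhere."""
    candidates = [(r, c) for r in range(FLOORPLAN.shape[0])
                  for c in range(FLOORPLAN.shape[1])]
    return max(candidates, key=lambda p: score_pose(p, observed_labels))
```

A real system would replace the exhaustive scan with a particle filter and the 4-neighbourhood with a proper visibility model, but the principle of label agreement rather than depth matching is the same.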
In the real world, reconstructing a 3D scene requires a nuanced understanding of the environment. Moreover, it is not enough to simply “understand” the world; autonomous agents must be capable of actively acquiring this understanding. Achieving all of this using simple monocular sensors is extremely challenging. Agents must be able to understand which areas of the world are navigable, how egomotion affects reconstruction, and how other agents may be leveraged to provide an advantage. All of this must be considered alongside the traditional 3D reconstruction problems of correspondence estimation, triangulation and data association.
Simultaneous Localisation and Mapping (SLAM) solutions are not particularly well suited to autonomous multi-agent reconstruction. They typically require the sensors to be in constant communication, do not scale well with the number of agents (or the size of the map), and require expensive optimisations. Instead, this thesis attempts to develop more proactive techniques from the ground up.
First, an autonomous agent must have the ability to actively select what it is going to reconstruct. Known as view-selection, or Next-Best View (NBV), this has recently become an active topic in autonomous robotics and forms the first contribution of this thesis. Second, once a view is selected, an autonomous agent must be able to plan a trajectory to arrive at that view. This problem, known as path-planning, is a core topic in the robotics field and forms the second contribution of this thesis. Finally, the 3D reconstruction must be anchored to a globally consistent map that corresponds to the real world. This is addressed as a floorplan localisation problem, an emerging field for the vision community, and forms the third contribution of this thesis.
To give autonomous agents the ability to actively select what data to process, this thesis addresses the NBV problem in the context of Multi-View Stereo (MVS). The proposed approach can massively reduce the computing resources required for any given 3D reconstruction. More importantly, it autonomously selects the views that improve the reconstruction the most. All of this is done using the sensor poses alone; the images are not used for view-selection and are only loaded into memory once they have been selected for reconstruction. Experimental evaluation shows that NBV applied to this problem can achieve results comparable to the state of the art using as little as 3.8% of the views.
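The pose-only selection described above might be sketched as a greedy coverage maximiser: repeatedly pick the pose whose footprint adds the most not-yet-covered scene, without ever touching the images. This is a hedged simplification under invented assumptions (the square-window visibility model and all names are illustrative, not the thesis' actual criterion).

```python
def visible_cells(pose, extent=2):
    """Toy visibility model: a pose (x, y) 'covers' the grid cells within
    a square window around it. A real system would use the camera frustum."""
    x, y = pose
    return {(x + dx, y + dy)
            for dx in range(-extent, extent + 1)
            for dy in range(-extent, extent + 1)}

def select_views(poses, budget):
    """Greedy NBV from poses alone: at each step, choose the pose whose
    footprint adds the most uncovered cells. Images are never loaded;
    only the selected poses would later be passed to MVS."""
    covered, selected = set(), []
    remaining = list(poses)
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda p: len(visible_cells(p) - covered))
        selected.append(best)
        covered |= visible_cells(best)
        remaining.remove(best)
    return selected, covered
```

Note how a nearby redundant pose is skipped in favour of one that covers new territory, which is what lets a small fraction of the views carry most of the reconstruction.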
To provide the ability to execute an autonomous 3D reconstruction, this thesis proposes a novel computer-vision-based goal-estimation and path-planning approach. The method proposed in the previous chapter is extended into a continuous pose-space. The resulting view then becomes the goal of a Scenic Pathplanner that plans a trajectory between the current robot pose and the NBV. This is done using an NBV-based pose-space that biases the paths towards areas of high information gain. Experimental evaluation shows that Scenic Planning enables performance similar to state-of-the-art batch approaches using less than 3% of the views, which corresponds to 2.7 × 10⁻⁴% of the possible stereo pairs (under a naive interpretation of plausible stereo pairs). Comparison against length-based path-planning approaches shows that the Scenic Pathplanner produces more complete and more accurate maps with fewer frames. Finally, the ability of the Scenic Pathplanner to generalise to live scenarios is demonstrated by mounting cameras on autonomous ground-based sensor platforms and exploring an environment.
has been widely addressed in the literature. While many approaches exist to perform reconstruction, few of them take an active role in deciding where the next observations should come from. Furthermore, the problem of travelling from the camera’s current position to the next, known as path-planning, usually focuses on minimising path length. This approach is ill-suited to reconstruction applications, where learning about the environment is more valuable than speed of traversal.

We present a novel Scenic Route Planner that selects paths which maximise information gain, both in terms of total map coverage and reconstruction accuracy. We also introduce a new type of collaborative behaviour into the planning stage called opportunistic collaboration, which allows sensors to switch between acting as independent Structure from Motion (SfM) agents or as a variable-baseline stereo pair.

We show that Scenic Planning enables performance similar to state-of-the-art batch approaches using less than 0.00027% of the possible stereo pairs (3% of the views). Comparison against length-based path-planning approaches shows that our approach produces more complete and more accurate maps with fewer frames. Finally, we demonstrate the Scenic Pathplanner’s ability to generalise to live scenarios by mounting cameras on autonomous ground-based sensor platforms and exploring an environment.
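One way to picture the information-gain bias described above is to make traversal cheap where gain is high, so that an ordinary shortest-cost search detours through informative regions instead of minimising length. The sketch below does this with Dijkstra on a toy grid; the gain map, the cost shaping and all names are assumptions made for illustration, not the paper's formulation.

```python
import heapq

# Toy gain map: two cells in the middle column are highly informative.
GAIN = {
    (0, 0): 0.0, (0, 1): 0.0, (0, 2): 0.0,
    (1, 0): 0.0, (1, 1): 5.0, (1, 2): 0.0,
    (2, 0): 0.0, (2, 1): 5.0, (2, 2): 0.0,
}

def step_cost(cell):
    # High information gain -> low traversal cost (always positive, so
    # Dijkstra remains valid). The shaping function is illustrative.
    return 1.0 / (1.0 + GAIN[cell])

def scenic_path(start, goal):
    """Dijkstra over the gain-weighted grid: the minimum-cost path is the
    'scenic' one that passes through high-gain cells."""
    frontier = [(0.0, start, [start])]
    seen = set()
    while frontier:
        cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for nxt in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if nxt in GAIN and nxt not in seen:
                heapq.heappush(frontier,
                               (cost + step_cost(nxt), nxt, path + [nxt]))
    return None
```

With a length-based planner every 4-step route from corner to corner would be equally good; here the optimiser is pulled through the two high-gain cells even though it gains nothing in distance.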
same area to become familiar with it? To begin with, we might first build a mental model of our surroundings. Upon revisiting this area, we can use this model to extrapolate to new, unseen locations and imagine the views from them.

Based on this, we propose an approach where an agent is capable of modelling new environments after a single visitation. To this end, we introduce “Deep Imagination”, a combination of classical visual-based Monte Carlo Localisation and deep learning. By making use of a feature-embedded 3D map, the system can “imagine” the view from any novel location. These “imagined” views are contrasted with the current observation in order to estimate the agent’s current location. To build the embedded map, we train a deep Siamese Fully Convolutional U-Net to perform dense feature extraction. Because these features are trained to be generic, no additional training or fine-tuning is required to adapt to new environments.

Our results demonstrate the generality and transfer capability of our learnt dense features by training and evaluating on multiple datasets. Additionally, we include several visualisations of the feature representations and resulting 3D maps, as well as their application to localisation.
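The imagine-then-compare loop above can be sketched as a Monte Carlo Localisation measurement update: "imagining" a view means reading a descriptor out of the embedded map, and particles are weighted by its similarity to the observed descriptor. Everything here is a stand-in; random fixed vectors replace the dense U-Net features, the 1-D pose space, the cosine weighting and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

POSES = np.arange(10)                    # toy 1-D pose space
MAP_FEATURES = rng.normal(size=(10, 8))  # one 8-D descriptor per pose

def imagine(pose):
    """'Imagine' the view at a pose by reading the feature-embedded map."""
    return MAP_FEATURES[pose]

def mcl_update(particles, observation):
    """One MCL measurement update: weight each particle by the cosine
    similarity between its imagined view and the observation, then
    resample in proportion to those weights."""
    sims = np.array([
        imagine(p) @ observation /
        (np.linalg.norm(imagine(p)) * np.linalg.norm(observation))
        for p in particles
    ])
    weights = np.exp(5.0 * sims)         # sharpen; 5.0 is arbitrary
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return [particles[i] for i in idx]

# Usage: observe the descriptor at pose 7; repeated updates should pull
# the particle set towards that pose.
particles = list(POSES)
for _ in range(3):
    particles = mcl_update(particles, MAP_FEATURES[7])
```

A motion model between updates is omitted for brevity; in the full system the comparison runs on dense per-pixel features rather than a single global vector.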