I am currently a Research Fellow in the Centre for Vision, Speech and Signal Processing at the University of Surrey. I am interested in using computer vision and graphics techniques to facilitate volumetric video capture, reconstruction, editing and rendering from multiple camera systems for applications in film, broadcast and gaming. I also have a keen interest in photography.
In 2016, I received a PhD from the University of Surrey, supervised by Prof. Adrian Hilton and Prof. Graham Thomas (BBC R&D). My PhD introduced novel techniques to compactly represent multiple view video to allow efficient free-viewpoint rendering. During this time, I undertook an internship at the University of Southern California, Institute for Creative Technologies, where I designed and built a photogrammetry scanning cage for rapid avatar creation, and a three-month internship at BBC R&D integrating my PhD research into a WebGL-based renderer.
In 2011, I obtained an MEng in Electronic Engineering (Distinction) at the University of Surrey, during which I spent a professional training year at Sony Broadcast and Professional Research Labs and a three-month summer internship at the Intelligent Systems Research Laboratory, University of Reading.
Areas of specialism
which uses input from a sparse set of inertial measurement
units (IMUs) along with images from two or more standard
video cameras and requires no optical markers or specialized
infra-red cameras. A real-time optimization-based
framework is proposed which incorporates constraints from
the IMUs, cameras and a prior pose model. The combination
of video and IMU data allows the full 6-DOF motion to
be recovered including axial rotation of limbs and drift-free
global position. The approach was tested using both indoor
and outdoor captured data. The results demonstrate the effectiveness
of the approach for tracking a wide range of human
motion in real time in unconstrained indoor/outdoor

fidelity volumetric reconstructions of human performance to be captured
from multi-view video comprising only a small set of camera views. Our
method yields similar end-to-end reconstruction error to that of a probabilistic
visual hull computed using significantly more (double or more)
viewpoints. We use a deep prior implicitly learned by the autoencoder
trained over a dataset of view-ablated multi-view video footage of a wide
range of subjects and actions. This opens up the possibility of high-end
volumetric performance capture in on-set and prosumer scenarios where
time or cost prohibit a high witness camera count.

augmented reality applications to increase realism and immersion.
However, existing light-field methods are generally
limited to static scenes due to the requirement to acquire
a dense scene representation. The large amount of
data and the absence of methods to infer temporal coherence
pose major challenges in storage, compression and
editing compared to conventional video. In this paper, we
propose the first method to extract a spatio-temporally coherent
light-field video representation. A novel method to
obtain Epipolar Plane Images (EPIs) from a sparse light-field
camera array is proposed. EPIs are used to constrain
scene flow estimation to obtain 4D temporally coherent representations
of dynamic light-fields. Temporal coherence is
achieved on a variety of light-field datasets. Evaluation of
the proposed light-field scene flow against existing multi-view
dense correspondence approaches demonstrates a significant
improvement in accuracy of temporal coherence.
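
For readers unfamiliar with EPIs, the textbook construction for a rectified horizontal row of cameras can be sketched as follows. This covers only the classic dense, linear-array case and is not the sparse-array method proposed in the paper; the function name `extract_epi` and the nested-list image layout are assumptions made purely for illustration.

```python
from typing import List

Image = List[List[float]]  # image[y][x], grayscale; this layout is an assumption


def extract_epi(views: List[Image], row: int) -> List[List[float]]:
    """Stack scanline `row` from every view of a rectified horizontal camera row.

    The result is indexed as epi[s][x], where s is the camera index along the
    baseline and x is the image column. A scene point traces a line in this
    slice whose slope depends on its disparity (and hence its depth).
    """
    if not views:
        raise ValueError("at least one view is required")
    width = len(views[0][row])
    epi = []
    for view in views:
        scanline = view[row]
        if len(scanline) != width:
            raise ValueError("all views must share the same width")
        epi.append(list(scanline))
    return epi


# Toy example (hypothetical data): one bright point that shifts by one pixel
# per camera, as a point at fixed depth would in a rectified camera row.
views = [
    [[1.0 if x == 2 + s else 0.0 for x in range(6)] for _ in range(3)]
    for s in range(3)
]
epi = extract_epi(views, row=1)
```

In the toy example the bright pixel moves one column per camera, so it forms a straight diagonal line in `epi`; constraining correspondences to such lines is the general idea behind EPI-based matching.
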
that can be delivered via the Internet and interactively controlled
in a WebGL-enabled web browser. Four-dimensional performance
capture is used to capture realistic human motion and appearance.
The captured data is processed into efficient and compact
representations for geometry and texture. Motions are analysed
against a high-level, user-defined motion graph and suitable
inter- and intra-motion transitions are identified. This processed
data is stored on a webserver and downloaded by a client application,
which manages the state of the character, responds to user
input and sends required frames to a WebGL-based renderer for
display. Through the efficient geometry, texture and motion graph
representations, a game character capable of performing a range of
motions can be represented in 40-50 MB of data. This highlights
the potential use of four-dimensional performance capture for creating
web-based content. Datasets are made available for further
research and an online demo is provided.
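
The character-control idea described above can be illustrated with a minimal motion graph: nodes are captured motions, edges are permitted transitions, and a search finds a chain of transitions toward the motion a user requests. The motion names, the `MotionGraph` class and the use of breadth-first search are illustrative assumptions, not details taken from the paper, and Python is used here only for brevity (the described client runs in a web browser).

```python
from collections import deque
from typing import Dict, List, Optional


class MotionGraph:
    """Directed graph of motions; an edge means a transition is allowed."""

    def __init__(self, transitions: Dict[str, List[str]]):
        self.transitions = transitions

    def path(self, current: str, target: str) -> Optional[List[str]]:
        """Return the shortest chain of motions from `current` to `target`.

        Uses breadth-first search; returns None if `target` is unreachable.
        """
        queue = deque([[current]])
        visited = {current}
        while queue:
            route = queue.popleft()
            if route[-1] == target:
                return route
            for nxt in self.transitions.get(route[-1], []):
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append(route + [nxt])
        return None


# Hypothetical user-defined graph: the character must pass through
# intermediate motions to get from standing still to jumping.
graph = MotionGraph({
    "idle": ["walk"],
    "walk": ["idle", "jog"],
    "jog": ["walk", "jump"],
    "jump": ["jog"],
})
route = graph.path("idle", "jump")
```

Here `route` lists the motions the character would play in order; a real client would additionally pick the transition frames within each motion and stream the corresponding geometry and texture data to the renderer.
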
(HSDSR) approach to generate temporally consistent
meshes from multiple view video of human subjects.
2D pose detections from multiple view video are used to
estimate 3D skeletal pose on a per-frame basis. The 3D
pose is embedded into a 3D surface reconstruction allowing
any frame to be reposed into the shape from any other
frame in the captured sequence. Skeletal motion transfer
is performed by selecting a reference frame from the surface
reconstruction data and reposing it to match the pose
estimation of other frames in a sequence. This allows an
initial coarse alignment to be performed prior to refinement
by a patch-based non-rigid mesh deformation. The
proposed approach overcomes limitations of previous work
by reposing a reference mesh to match the pose of a target
mesh reconstruction, providing a closer starting point
for further non-rigid mesh deformation. It is shown that the
proposed approach is able to achieve comparable results to
existing model-based and model-free approaches. Finally,
it is demonstrated that this framework provides an intuitive
way for artists and animators to edit volumetric video.
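
The abstract does not say how the 2D detections are lifted to 3D. One common option, shown here purely as an assumption, is to back-project each detection of a joint into a viewing ray and find the 3D point closest to all rays in the least-squares sense. The function name `triangulate_joint` and the ray-based input format are hypothetical; the rays are assumed to have been computed already from calibrated cameras.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_joint(rays: Sequence[Tuple[Vec3, Vec3]]) -> Vec3:
    """Return the 3D point closest, in least squares, to all viewing rays.

    Each ray is (camera_centre, direction); directions need not be unit
    length. Solves (sum P_i) X = sum P_i c_i with P_i = I - d_i d_i^T.
    """
    if len(rays) < 2:
        raise ValueError("at least two rays are required")
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for centre, direction in rays:
        norm = sum(v * v for v in direction) ** 0.5
        d = [v / norm for v in direction]
        for i in range(3):
            for j in range(3):
                # Projector onto the plane orthogonal to the ray direction.
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * centre[j]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (near-)parallel; point is not determined")
    point = []
    for k in range(3):
        # Cramer's rule: replace column k of A with b.
        m = [row[:] for row in A]
        for i in range(3):
            m[i][k] = b[i]
        point.append(_det3(m) / det)
    return (point[0], point[1], point[2])


# Hypothetical setup: three cameras whose rays all pass through (0, 0, 5).
joint = triangulate_joint([
    ((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0)),
    ((1.0, 0.0, 0.0), (-1.0, 0.0, 5.0)),
    ((0.0, 2.0, 0.0), (0.0, -2.0, 5.0)),
])
```

Repeating this for every detected joint in a frame gives a per-frame 3D skeleton; with noisy detections the rays no longer meet exactly and the result is the least-squares compromise between them.
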