
Dr Matthew Trumble

Postgraduate Research Student
+44 (0)1483 686046
05 BB 00

My publications


Trumble Matthew, Gilbert Andrew, Malleson Charles, Hilton Adrian, Collomosse John, Total Capture, University of Surrey
Trumble M, Gilbert A, Hilton ADM, Collomosse JP (2016) Learning Markerless Human Pose Estimation from Multiple Viewpoint Video, Computer Vision – ECCV 2016 Workshops. Lecture Notes in Computer Science 9915 pp. 871-878
We present a novel human performance capture technique capable of robustly estimating the pose (articulated joint positions) of a performer observed passively via multiple viewpoint video (MVV). An affine-invariant pose descriptor is learned using a convolutional neural network (CNN) trained over volumetric data extracted from an MVV dataset of diverse human pose and appearance. A manifold embedding is learned via Gaussian Processes for the CNN descriptor and articulated pose spaces, enabling regression and thus estimation of human pose from MVV input. The learned descriptor and manifold are shown to generalise over a wide range of human poses, providing an efficient performance capture solution that requires no fiducials or other markers to be worn. The system is evaluated against ground truth joint configuration data from a commercial marker-based pose estimation system.
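The abstract describes regressing pose from a learned CNN descriptor via Gaussian Processes. As a rough illustration only, the sketch below performs plain GP regression (the posterior mean under a squared-exponential kernel) from descriptor vectors to flattened joint positions. It does not reproduce the learned manifold embedding. The function names `rbf_kernel` and `gp_fit_predict` and the default hyperparameters are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n x d) and b (m x d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_fit_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-4):
    """GP posterior mean mapping descriptors (rows of x) to pose vectors (rows of y)."""
    k_train = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    # Solve K alpha = Y with a Cholesky factorisation for numerical stability.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_cross = rbf_kernel(x_test, x_train, length_scale)
    return k_cross @ alpha

# Hypothetical data: 30 four-dimensional descriptors, each paired with a 2-value "pose".
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(30, 4))
poses = np.hstack([np.sin(descriptors[:, :1]), descriptors[:, 1:2] ** 2])
predicted = gp_fit_predict(descriptors, poses, descriptors)
```

With a small noise term, predicting at the training descriptors approximately reproduces the training poses. In practice the kernel hyperparameters would be fitted, for example by maximising the marginal likelihood.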
Trumble Matthew, Gilbert Andrew, Malleson Charles, Hilton Adrian, Collomosse John (2017) Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors, Proceedings of the 28th British Machine Vision Conference pp. 1-13
We present an algorithm for fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data to accurately estimate 3D human pose. A 3D convolutional neural network is used to learn a pose embedding from volumetric probabilistic visual hull (PVH) data derived from the MVV frames. We incorporate this model within a dual-stream network that integrates pose embeddings derived from MVV with a forward kinematic solve of the IMU data. A temporal model (LSTM) is incorporated within both streams prior to their fusion. Hybrid pose inference using these two complementary data sources is shown to resolve ambiguities within each sensor modality, yielding improved accuracy over prior methods. A further contribution of this work is a new hybrid MVV dataset (TotalCapture) comprising video, IMU data and skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at
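The "forward kinematic solve of the IMU data" can be pictured as follows. If each IMU gives a world orientation for one body segment, joint positions follow by rotating the rest-pose bone offsets and accumulating them down the skeleton from the root. This is a minimal sketch under assumed conventions, not the paper's solver. It assumes that sensor-to-segment calibration has already been applied, that joints are ordered parent-first, and that there is one rotation matrix per bone. The names `forward_kinematics` and `rot_z` are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, offsets, bone_rotations, root_position):
    """Accumulate rotated rest-pose bone offsets from the root outwards.

    parents[i]: index of joint i's parent, or -1 for the root; every parent
        must appear before its children.
    offsets[i]: rest-pose vector from the parent joint to joint i.
    bone_rotations[i]: world orientation (3x3) of the segment ending at
        joint i, e.g. from an IMU strapped to that segment (assumed calibrated).
    """
    positions = np.zeros((len(parents), 3))
    for i, parent in enumerate(parents):
        if parent < 0:
            positions[i] = root_position
        else:
            bone = bone_rotations[i] @ np.asarray(offsets[i], dtype=float)
            positions[i] = positions[parent] + bone
    return positions

# Hypothetical three-joint chain with unit bones along x in the rest pose.
parents = [-1, 0, 1]
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
rotations = [np.eye(3), rot_z(np.pi / 2), np.eye(3)]
joints = forward_kinematics(parents, offsets, rotations, root_position=np.zeros(3))
```

Rotating the first bone by 90 degrees about z places the middle joint at (0, 1, 0). With an identity rotation on the second bone, the end joint lands at (1, 1, 0). Because each bone uses its own world orientation, an error in one IMU displaces every joint further down the chain. This is one source of the ambiguity that fusion with video can help resolve.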
Malleson Charles, Volino Marco, Gilbert Andrew, Trumble Matthew, Collomosse John, Hilton Adrian (2017) Real-time Full-Body Motion Capture from Video and IMUs, 3DV 2017 Proceedings CPS
A real-time full-body motion capture system is presented which uses input from a sparse set of inertial measurement units (IMUs) along with images from two or more standard video cameras, and requires no optical markers or specialized infra-red cameras. A real-time optimization-based framework is proposed which incorporates constraints from the IMUs, the cameras and a prior pose model. The combination of video and IMU data allows the full 6-DOF motion to be recovered, including axial rotation of limbs and drift-free global position. The approach was tested using both indoor and outdoor captured data. The results demonstrate the effectiveness of the approach for tracking a wide range of human motion in real time in unconstrained indoor/outdoor
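An optimisation framework that "incorporates constraints from the IMUs, cameras and a prior pose model" can be illustrated with a toy weighted least-squares problem. The actual system optimises body-model pose parameters with nonlinear terms. The sketch below assumes instead that every term is linear in an unknown 3D joint position, so one stacked least-squares solve finds the minimiser. The function name, weights and measurements are hypothetical.

```python
import numpy as np

def fuse_weighted_least_squares(blocks):
    """Minimise sum_k w_k * ||A_k x - b_k||^2 over x.

    Each block is a tuple (A, b, w): linear constraints A x = b from one
    source (camera, IMU, prior) with a non-negative scalar weight w.
    Scaling a block by sqrt(w) turns the weighted problem into a single
    ordinary least-squares solve on the stacked system.
    """
    a_rows, b_rows = [], []
    for a, b, w in blocks:
        scale = np.sqrt(w)
        a_rows.append(scale * np.atleast_2d(np.asarray(a, dtype=float)))
        b_rows.append(scale * np.atleast_1d(np.asarray(b, dtype=float)))
    x, *_ = np.linalg.lstsq(np.vstack(a_rows), np.concatenate(b_rows), rcond=None)
    return x

# Hypothetical example: one camera constrains only the x and y coordinates
# of a joint, the IMU chain predicts all three, and a weak prior pulls
# towards a rest position.
camera = (np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), [0.5, 1.2], 1.0)
imu = (np.eye(3), [0.6, 1.0, 0.3], 1.0)
prior = (np.eye(3), [0.0, 1.0, 0.0], 0.01)
estimate = fuse_weighted_least_squares([camera, imu, prior])
```

The camera term alone leaves depth unconstrained, so the z coordinate is fixed by the IMU and prior terms. This mirrors, in miniature, how the two sensor types complement each other.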
Trumble Matthew, Gilbert Andrew, Hilton Adrian, Collomosse John (2018) Deep Autoencoder for Combined Human Pose Estimation and Body Model Upscaling, Proceedings of ECCV 2018: European Conference on Computer Vision Springer Science+Business Media
We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. We train a symmetric convolutional autoencoder with a dual loss that enforces learning of a latent representation encoding skeletal joint positions, while simultaneously learning a deep representation of volumetric body shape. We harness the latter to up-scale input volumetric data by a factor of 4, whilst recovering a 3D estimate of joint positions with equal or greater accuracy than the state of the art. Inference runs in real time (25 fps) and has the potential for passive human behaviour monitoring where high-fidelity estimation of human body shape and pose is required.
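The dual loss can be sketched in a few lines of NumPy. The abstract does not give the exact form of either term. This sketch assumes that both are mean squared errors combined with a hypothetical weight `joint_weight`, and that joint positions are read out from the autoencoder's latent code. Treat it as illustrative rather than as the paper's objective.

```python
import numpy as np

def dual_loss(pred_volume, target_volume, pred_joints, target_joints, joint_weight=1.0):
    """Volume reconstruction error plus a weighted joint-position error.

    pred_volume / target_volume: decoder output and higher-resolution target
        occupancy grids of the same shape.
    pred_joints / target_joints: (J, 3) joint positions, assumed here to be
        regressed from the latent code.
    Both terms are mean squared errors; joint_weight balances them.
    """
    volume_term = np.mean((np.asarray(pred_volume) - np.asarray(target_volume)) ** 2)
    joint_term = np.mean((np.asarray(pred_joints) - np.asarray(target_joints)) ** 2)
    return float(volume_term + joint_weight * joint_term)

# Hypothetical tensors: an 8^3 predicted grid against its target, and 2 joints.
rng = np.random.default_rng(1)
target = (rng.random((8, 8, 8)) > 0.5).astype(float)
loss = dual_loss(np.clip(target + 0.1, 0.0, 1.0), target,
                 np.zeros((2, 3)), np.ones((2, 3)), joint_weight=0.5)
```

Training on the sum pushes a single latent code to serve both objectives. This is what lets the same network estimate pose and up-scale the volume.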
Gilbert Andrew, Trumble Matthew, Malleson Charles, Hilton Adrian, Collomosse John (2018) Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation, International Journal of Computer Vision Springer Verlag
We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull (PVH). The learnt pose stream is concurrently processed with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints; the two streams are then fused in a final fully connected layer. The two complementary data sources allow ambiguities to be resolved within each sensor modality, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human3.6M dataset [26], the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU data and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at
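A probabilistic visual hull can be approximated by projecting each voxel centre into every camera and multiplying the per-view probabilities it lands on. The sketch extends this to several channels per view, for example a silhouette channel plus 2D joint-confidence maps. It assumes the same product rule is applied channel by channel, which the abstract does not confirm. Pinhole 3x4 projection matrices and nearest-pixel sampling are further simplifying assumptions, and `multichannel_pvh` is a hypothetical name.

```python
import numpy as np

def multichannel_pvh(voxel_centres, projections, view_maps):
    """Per-voxel, per-channel product of probabilities sampled from every view.

    voxel_centres: (N, 3) world-space points.
    projections: list of (3, 4) camera projection matrices.
    view_maps: list of (C, H, W) probability maps, one per camera.
    Voxels that fall behind a camera or outside its image get 0 in that view.
    Returns an (N, C) array.
    """
    n = len(voxel_centres)
    homog = np.hstack([voxel_centres, np.ones((n, 1))])
    channels = view_maps[0].shape[0]
    volume = np.ones((n, channels))
    for proj, maps in zip(projections, view_maps):
        pix = homog @ proj.T                      # (N, 3) homogeneous image points
        depth = pix[:, 2]
        safe_depth = np.where(depth > 0, depth, 1.0)  # avoid dividing by zero
        u = np.round(pix[:, 0] / safe_depth).astype(int)
        v = np.round(pix[:, 1] / safe_depth).astype(int)
        _, h, w = maps.shape
        inside = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        sampled = np.zeros((n, channels))
        sampled[inside] = maps[:, v[inside], u[inside]].T
        volume *= sampled
    return volume

# Hypothetical setup: two identical cameras at the origin with identity intrinsics.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
maps_a = np.full((2, 5, 5), 0.9)
maps_a[1] = 0.0
maps_a[1, 3, 2] = 0.5
maps_b = np.full((2, 5, 5), 0.5)
voxels = np.array([[2.0, 3.0, 1.0], [10.0, 10.0, 1.0], [1.0, 1.0, -1.0]])
pvh = multichannel_pvh(voxels, [P, P], [maps_a, maps_b])
```

The product rule means that a single view with low foreground probability can veto a voxel. This makes the hull conservative but sensitive to segmentation errors in any one camera.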
Gilbert Andrew, Trumble Matt, Hilton Adrian, Collomosse John (2018) Inpainting of Wide-baseline Multiple Viewpoint Video, IEEE Transactions on Visualization and Computer Graphics Institute of Electrical and Electronics Engineers (IEEE)
We describe a non-parametric algorithm for multiple-viewpoint video inpainting. Uniquely, our algorithm addresses the domain of wide-baseline multiple-viewpoint video (MVV) with no temporal look-ahead, at near-real-time speed. A Dictionary of Patches (DoP) is built using multi-resolution texture patches reprojected from geometric proxies available in the alternate views. We dynamically update the DoP over time, and a Markov Random Field optimisation over depth and appearance is used to resolve and align a selection of multiple candidates for a given patch; this ensures that large regions are inpainted plausibly, conserving both spatial and temporal coherence. We demonstrate the removal of large objects (e.g. people) from challenging indoor and outdoor MVV exhibiting cluttered, dynamic backgrounds and moving cameras.
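The MRF step chooses one candidate patch per location so that each choice fits its own evidence while agreeing with its neighbours. The abstract does not specify the graph structure or the solver. The sketch below swaps in the simplest case, a chain of patch sites, where exact MAP inference is possible by dynamic programming (Viterbi). Unary costs stand in for depth and appearance mismatch, and per-edge pairwise costs stand in for coherence between neighbouring candidates. The name `chain_mrf_map` and the cost arrays are hypothetical. On a full 2D grid of patches, an approximate solver such as loopy belief propagation or graph cuts would typically be needed instead.

```python
import numpy as np

def chain_mrf_map(unary, pairwise):
    """Exact MAP labelling of a chain MRF by dynamic programming (Viterbi).

    unary: (n_sites, n_labels) cost of choosing each candidate at each site.
    pairwise: (n_sites - 1, n_labels, n_labels); pairwise[i][j, k] is the cost
        of candidate j at site i next to candidate k at site i + 1.
    Returns the candidate index per site that minimises the total cost.
    """
    n_sites, n_labels = unary.shape
    cost = unary[0].astype(float)
    backptr = np.zeros((n_sites, n_labels), dtype=int)
    for i in range(1, n_sites):
        # total[j, k]: best cost so far with label j at site i-1 and k at site i.
        total = cost[:, None] + pairwise[i - 1]
        backptr[i] = np.argmin(total, axis=0)
        cost = total[backptr[i], np.arange(n_labels)] + unary[i]
    labels = np.zeros(n_sites, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(n_sites - 1, 0, -1):
        labels[i - 1] = backptr[i, labels[i]]
    return labels

# Hypothetical costs: three sites, two candidate patches each.
unary = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
potts = 5.0 * (1.0 - np.eye(2))  # penalty for switching candidates
smooth = chain_mrf_map(unary, np.tile(potts, (2, 1, 1)))
independent = chain_mrf_map(unary, np.zeros((2, 2, 2)))
```

With no pairwise cost, each site simply takes its cheapest candidate (0, 1, 0). A strong coherence penalty instead makes the consistent labelling (0, 0, 0) optimal, even though it is locally worse at the middle site.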