Dr Marco Volino

Research Fellow in Computer Vision


Areas of specialism

Computer Vision; Computer Graphics; Computer Animation; Volumetric Video; Light Fields; Virtual Reality; Augmented Reality

Research projects

My publications


Malleson Charles, Volino Marco, Gilbert Andrew, Trumble Matthew, Collomosse John, Hilton Adrian (2017) Real-time Full-Body Motion Capture from Video and IMUs, 3DV 2017 Proceedings CPS
A real-time full-body motion capture system is presented
which uses input from a sparse set of inertial measurement
units (IMUs) along with images from two or more standard
video cameras and requires no optical markers or specialized
infra-red cameras. A real-time optimization-based
framework is proposed which incorporates constraints from
the IMUs, cameras and a prior pose model. The combination
of video and IMU data allows the full 6-DOF motion to
be recovered including axial rotation of limbs and drift-free
global position. The approach was tested using both indoor
and outdoor captured data. The results demonstrate the effectiveness
of the approach for tracking a wide range of human
motion in real time in unconstrained indoor/outdoor scenes.
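As a rough illustration of how an optimisation can combine several constraint sources, the sketch below fuses scalar estimates by weighted least squares. It is a toy under stated assumptions, not the authors' solver: the paper optimises a full-body pose, whereas here a single number stands in for the unknown, and the names `fuse_estimates` and `cost` and all weights are hypothetical.

```python
def cost(x, terms):
    """Weighted sum of squared residuals for a scalar unknown x.

    `terms` is a list of (weight, target) pairs, e.g. one pair each for a
    hypothetical IMU term, camera term and pose-prior term.
    """
    return sum(w * (x - t) ** 2 for w, t in terms)


def fuse_estimates(terms):
    """Return the x that minimises cost(x, terms).

    Setting the derivative of the cost to zero gives the weighted mean
    of the targets, so no iterative solver is needed in this toy case.
    """
    total_weight = sum(w for w, _ in terms)
    if total_weight <= 0:
        raise ValueError("at least one term needs a positive weight")
    return sum(w * t for w, t in terms) / total_weight
```

Raising the weight of one term pulls the solution towards that term's target, which is the basic mechanism by which a drift-free source can correct a drifting one.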
Gilbert Andrew, Volino Marco, Collomosse John, Hilton Adrian (2018) Volumetric performance capture from minimal camera viewpoints, Computer Vision – ECCV 2018. Lecture Notes in Computer Science 11215 pp. 591-607 Springer Science+Business Media
We present a convolutional autoencoder that enables high
fidelity volumetric reconstructions of human performance to be captured
from multi-view video comprising only a small set of camera views. Our
method yields similar end-to-end reconstruction error to that of a
probabilistic visual hull computed using significantly more (double or more)
viewpoints. We use a deep prior implicitly learned by the autoencoder
trained over a dataset of view-ablated multi-view video footage of a wide
range of subjects and actions. This opens up the possibility of high-end
volumetric performance capture in on-set and prosumer scenarios where
time or cost prohibit a high witness camera count.
Mustafa Armin, Volino Marco, Guillemaut Jean-Yves, Hilton Adrian (2018) 4D Temporally Coherent Light-field Video, 3DV 2017 Proceedings IEEE
Light-field video has recently been used in virtual and
augmented reality applications to increase realism and immersion.
However, existing light-field methods are generally
limited to static scenes due to the requirement to acquire
a dense scene representation. The large amount of
data and the absence of methods to infer temporal coherence
pose major challenges in storage, compression and
editing compared to conventional video. In this paper, we
propose the first method to extract a spatio-temporally coherent
light-field video representation. A novel method to
obtain Epipolar Plane Images (EPIs) from a sparse light-field
camera array is proposed. EPIs are used to constrain
scene flow estimation to obtain 4D temporally coherent representations
of dynamic light-fields. Temporal coherence is
achieved on a variety of light-field datasets. Evaluation of
the proposed light-field scene flow against existing multi-view
dense correspondence approaches demonstrates a significant
improvement in accuracy of temporal coherence.
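For readers unfamiliar with EPIs, the sketch below shows the textbook construction for a rectified linear camera array: the same scanline is taken from every view and stacked, so a scene point traces a line whose slope is its per-view disparity. This is only an illustration of the general idea, not the paper's method for sparse arrays; the function names and the synthetic data are hypothetical.

```python
def make_shifted_views(num_views, width, height, point_x, disparity):
    """Build synthetic views containing one bright vertical line.

    The line moves `disparity` pixels per camera, imitating a scene point
    seen from evenly spaced cameras on a horizontal baseline.
    """
    views = []
    for cam in range(num_views):
        image = [[0] * width for _ in range(height)]
        x = point_x + cam * disparity
        if 0 <= x < width:
            for y in range(height):
                image[y][x] = 1
        views.append(image)
    return views


def extract_epi(views, row):
    """Stack scanline `row` from every view: one EPI row per camera."""
    return [list(view[row]) for view in views]


def epi_line_slope(epi):
    """Pixel shift per view of the bright point, from first to last view."""
    first = epi[0].index(1)
    last = epi[-1].index(1)
    return (last - first) / (len(epi) - 1)
```

With four views and a disparity of two pixels per camera, the bright point moves from column 2 in the first EPI row to column 8 in the last, giving a slope of 2 pixels per view.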
Casas D, Volino Marco, Collomosse JP, Hilton A (2014) 4D Video Textures for Interactive Character Appearance, Computer Graphics Forum: the international journal of the Eurographics Association
4D Video Textures (4DVT) introduce a novel representation for rendering video-realistic interactive character animation from a database of 4D actor performance captured in a multiple camera studio. 4D performance capture reconstructs dynamic shape and appearance over time but is limited to free-viewpoint video replay of the same motion. Interactive animation from 4D performance capture has so far been limited to surface shape only. 4DVT is the final piece in the puzzle enabling video-realistic interactive animation through two contributions: a layered view-dependent texture map representation which supports efficient storage, transmission and rendering from multiple view video capture; and a rendering approach that combines multiple 4DVT sequences in a parametric motion space, maintaining video quality rendering of dynamic surface appearance whilst allowing high-level interactive control of character motion and viewpoint. 4DVT is demonstrated for multiple characters and evaluated both quantitatively and through a user-study which confirms that the visual quality of captured video is maintained. The 4DVT representation achieves >90% reduction in size and halves the rendering cost.
Volino Marco, Casas D, Collomosse JP, Hilton A (2014) Optimal Representation of Multi-View Video, Proceedings of BMVC 2014 - British Machine Vision Conference BMVC
Multi-view video acquisition is widely used for reconstruction and free-viewpoint rendering of dynamic scenes by directly resampling from the captured images. This paper addresses the problem of optimally resampling and representing multi-view video to obtain a compact representation without loss of the view-dependent dynamic surface appearance. Spatio-temporal optimisation of the multi-view resampling is introduced to extract a coherent multi-layer texture map video. This resampling is combined with a surface-based optical flow alignment between views to correct for errors in geometric reconstruction and camera calibration which result in blurring and ghosting artefacts. The multi-view alignment and optimised resampling results in a compact representation with minimal loss of information allowing high-quality free-viewpoint rendering. Evaluation is performed on multi-view datasets for dynamic sequences of cloth, faces and people. The representation achieves >90% compression without significant loss of visual quality.
Volino Marco, Huang Peng, Hilton Adrian (2015) Online interactive 4D character animation, Proceedings of the 20th International Conference on 3D Web Technology (Web3D '15) pp. 289-295
This paper presents a framework for creating realistic virtual characters
that can be delivered via the Internet and interactively controlled
in a WebGL enabled web-browser. Four-dimensional performance
capture is used to capture realistic human motion and appearance.
The captured data is processed into efficient and compact
representations for geometry and texture. Motions are analysed
against a high-level, user-defined motion graph and suitable
inter- and intra-motion transitions are identified. This processed
data is stored on a webserver and downloaded by a client application
when required. A JavaScript-based character animation engine
is used to manage the state of the character which responds to user
input and sends required frames to a WebGL-based renderer for
display. Through the efficient geometry, texture and motion graph
representations, a game character capable of performing a range of
motions can be represented in 40-50 MB of data. This highlights
the potential use of four-dimensional performance capture for creating
web-based content. Datasets are made available for further
research and an online demo is provided.
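The interplay between a motion graph and a character state manager can be sketched as follows. This is a minimal illustration in Python rather than the JavaScript engine described in the paper; the class names, clip names and the rule that a requested transition is taken at the end of the current clip are all assumptions made for the example.

```python
class MotionGraph:
    """Toy motion graph: nodes are motion clips, edges are allowed transitions."""

    def __init__(self, clip_lengths, transitions):
        self.clip_lengths = clip_lengths  # clip name -> number of frames
        self.transitions = transitions    # clip name -> clips that may follow

    def can_transition(self, src, dst):
        return dst in self.transitions.get(src, [])


class CharacterState:
    """Tracks the current clip and frame and reacts to user requests."""

    def __init__(self, graph, start_clip):
        self.graph = graph
        self.clip = start_clip
        self.frame = 0
        self.requested = None

    def request(self, clip):
        """Queue a transition if the graph allows it; report success."""
        if self.graph.can_transition(self.clip, clip):
            self.requested = clip
            return True
        return False

    def step(self):
        """Return the (clip, frame) to render, then advance one frame."""
        current = (self.clip, self.frame)
        self.frame += 1
        if self.frame >= self.graph.clip_lengths[self.clip]:
            # End of clip: follow the queued transition, otherwise loop.
            if self.requested is not None:
                self.clip = self.requested
                self.requested = None
            self.frame = 0
        return current
```

In this sketch a renderer would call `step()` once per displayed frame and fetch the corresponding geometry and texture, while user input only ever calls `request()`.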
Regateiro Joao, Volino Marco, Hilton Adrian (2018) Hybrid Skeleton Driven Surface Registration for Temporally Consistent Volumetric Video, Proceedings of 2018 International Conference on 3D Vision (3DV) pp. 514-522 Institute of Electrical and Electronics Engineers (IEEE)
This paper presents a hybrid skeleton-driven surface registration
(HSDSR) approach to generate temporally consistent
meshes from multiple view video of human subjects.
2D pose detections from multiple view video are used to
estimate 3D skeletal pose on a per-frame basis. The 3D
pose is embedded into a 3D surface reconstruction allowing
any frame to be reposed into the shape from any other
frame in the captured sequence. Skeletal motion transfer
is performed by selecting a reference frame from the surface
reconstruction data and reposing it to match the pose
estimation of other frames in a sequence. This allows an
initial coarse alignment to be performed prior to refinement
by a patch-based non-rigid mesh deformation. The
proposed approach overcomes limitations of previous work
by reposing a reference mesh to match the pose of a target
mesh reconstruction, providing a closer starting point
for further non-rigid mesh deformation. It is shown that the
proposed approach is able to achieve comparable results to
existing model-based and model-free approaches. Finally,
it is demonstrated that this framework provides an intuitive
way for artists and animators to edit volumetric video.
Berghi Davide, Stenzel Hanne, Volino Marco, Hilton Adrian, Jackson Philip (2020) Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events, IEEE VR 2020 IEEE
Immersive audio-visual perception relies on the spatial integration of both auditory and visual information, which are heterogeneous sensing modalities with different fields of reception and spatial resolution. This study investigates the perceived coherence of audio-visual object events presented either centrally or peripherally with horizontally aligned/misaligned sound. Various object events were selected to represent three acoustic feature classes. Subjective test results in a simulated virtual environment from 18 participants indicate a wider capture region in the periphery, with an outward bias favoring more lateral sounds. Centered stimulus results support previous findings for simpler scenes.