Dr Jean-Yves Guillemaut


Senior Lecturer in 3D Computer Vision
MEng (hons), PhD, MIEEE, MBMVA, FHEA
+44 (0)1483 686042
32 BA 00

Biography

Areas of specialism

3D Computer Vision; 3D Reconstruction; Computational Photography; Virtual and Augmented Reality; Lightfield Imaging; 3D Video; Artificial Intelligence

University roles and responsibilities

  • Senior Lecturer in 3D Computer Vision
  • CVSSP Postgraduate Research Director
  • Department Prizes Officer
  • Professional Training Tutor
  • MSc Personal Tutor

My qualifications

2014
Graduate Certificate in Learning and Teaching
University of Surrey
2005
PhD degree in 3D Computer Vision
University of Surrey
2001
MEng degree (first class honours) with specialisation in Automatic Control and Robotics
Ecole Centrale de Nantes

Previous roles

2012 - 2018
Lecturer in 3D Computer Vision
University of Surrey
2012 - 2017
CVSSP External Seminar Organiser
University of Surrey
2005 - 2012
Research Fellow
University of Surrey

Research

Research interests

Research projects

Research collaborations

Indicators of esteem

  • Best Poster Award at European Conference on Visual Media Production (CVMP 2016)

  • Best Student Paper Award at Int. Conference on Computer Vision Theory and Applications (VISAPP 2014)

  • University of Surrey Faculty of Engineering and Physical Sciences Researcher of the Year Award (2012)

  • Honorable Mention for the Best Paper Award at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2012

  • Best Poster Prize at EPSRC/BMVA Summer School on Computer Vision 2002

Supervision

Postgraduate research supervision

Completed postgraduate research projects I have supervised

My teaching

My publications

Sarim M, Hilton A, Guillemaut JY (2011) Temporal trimap propagation for video matting using inferential statistics, Proceedings - International Conference on Image Processing, ICIP pp. 1745-1748
This paper introduces a statistical inference framework to temporally propagate trimap labels from sparsely defined key frames to estimate trimaps for the entire video sequence. A trimap is a fundamental requirement for digital image and video matting approaches. Statistical inference is coupled with Bayesian statistics to allow robust trimap labelling in the presence of shadows, illumination variation and overlap between the foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high quality video matting using existing natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation with less user interaction than state-of-the-art techniques. © 2011 IEEE.
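For context, a trimap partitions each frame into definite foreground, definite background and an unknown region, over which matting solves the standard compositing equation (a textbook relation from the matting literature, not specific to this paper):

    C_p = \alpha_p F_p + (1 - \alpha_p) B_p

where C_p is the observed colour at pixel p, F_p and B_p are the unknown foreground and background colours, and \alpha_p \in [0, 1] is the matte. Propagating the trimap therefore determines where this under-constrained equation must be solved in each frame.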
Brujic-Okretic V, Guillemaut J, Hitchin L, Michielen M, Parker G (2004) Real-time scene reconstruction for remote vehicle navigation, Geometric Modeling and Computing: Seattle 2003 pp. 113-123 Nashboro Press
Hall S, Williamson T, Guillemaut J, Goddard T, Baumann A, Hutter J (2017) Modeling the Dynamics of Tamponade Multicomponent Gases During Retina Reattachment Surgery, AIChE Journal 63 (9) pp. 3651-3662 Wiley, for American Institute of Chemical Engineers
Vitrectomy and pneumatic retinopexy are common surgical procedures used to treat retinal detachment. To reattach the retina, gases are used to inflate the vitreous space, allowing the retina to attach through surface tension and buoyancy forces acting superior to the location of the bubble. These procedures require the injection of either a pure tamponade gas, such as C3F8 or SF6, or mixtures of these gases with air. The location of the retinal detachment, the anatomical spread of the retinal defect, and the length of time the defect has persisted determine the suggested volume and duration of the gas bubble to allow reattachment. After inflation, the gases are slowly absorbed by the blood, allowing the vitreous to be refilled by aqueous. We have developed a model of the mass transfer dynamics of tamponade gases during pneumatic retinopexy or pars plana vitrectomy procedures. The model predicts the expansion and persistence of intraocular gases (C3F8, SF6), oxygen, nitrogen, and carbon dioxide, as well as the intraocular pressure. The model was validated using published literature in rabbits and humans. In addition to correlating the mass transfer dynamics by surface area, permeability, and partial pressure driving forces, the mass transfer dynamics are affected by the percentage of the tamponade gases. Rates were also correlated with the physical properties of the tamponade and blood gases. The model gave accurate predictions in humans.
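As a rough illustration of the kind of model described (a minimal sketch assuming a well-mixed bubble with fixed exchange area; the gas species are from the abstract, but all coefficients and partial pressures are hypothetical placeholders, not the paper's fitted values):

    # Sketch of partial-pressure-driven gas exchange for an intraocular bubble:
    # flux of each gas ~ permeability * area * (blood partial pressure - bubble partial pressure)
    import numpy as np
    from scipy.integrate import solve_ivp

    GASES = ["C3F8", "N2", "O2", "CO2"]
    K = np.array([0.002, 0.05, 0.08, 0.9])          # permeabilities (hypothetical)
    P_BLOOD = np.array([0.0, 573.0, 100.0, 46.0])   # blood partial pressures, mmHg (illustrative)

    def dn_dt(t, n, area=5.0, p_total=760.0):
        """Rate of change of moles of each gas in the bubble."""
        p_bubble = p_total * n / n.sum()            # bubble partial pressures
        return K * area * (P_BLOOD - p_bubble)

    n0 = np.array([1.0, 0.0, 0.0, 0.0])             # pure C3F8 injection (moles)
    sol = solve_ivp(dn_dt, (0.0, 200.0), n0)        # simulate the early post-operative period

Under these assumptions the total mole count (and hence bubble volume at constant pressure) first rises as blood gases diffuse in, then falls as the tamponade gas is slowly absorbed, reproducing the expansion/persistence behaviour described above.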
Roubtsova NS, Guillemaut J (2016) Colour Helmholtz Stereopsis for Reconstruction of Dynamic Scenes with Arbitrary Unknown Reflectance, International Journal of Computer Vision 124 pp. 18-48 Springer
Helmholtz Stereopsis is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose Colour Helmholtz Stereopsis - a novel framework for Helmholtz Stereopsis based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed Colour Helmholtz Stereopsis pipeline uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for spatio-temporal surface chromaticity calibration and a state-of-the-art Bayesian formulation necessary for accurate reconstruction from a minimal number of reciprocal pairs. In this framework, reflectance is spatially unconstrained both in terms of its chromaticity and the directional component dependent on the illumination incidence and viewing angles. The proposed approach for the first time enables modelling of dynamic scenes with arbitrary unknown and spatially varying reflectance using a practical acquisition set-up consisting of a small number of cameras and light sources. Experimental results demonstrate the accuracy and flexibility of the technique on a variety of static and dynamic scenes with arbitrary unknown BRDF and chromaticity ranging from uniform to arbitrary and spatially varying.
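All Helmholtz Stereopsis variants, including the colour variant above, exploit the reciprocity constraint (standard form from the Helmholtz Stereopsis literature, stated here in our notation rather than quoted from the paper):

    \left( i_1 \frac{\hat{v}_1}{\|o_1 - p\|^2} - i_2 \frac{\hat{v}_2}{\|o_2 - p\|^2} \right) \cdot n = 0

where i_1 and i_2 are the intensities of a reciprocal pair observed at the interchanged camera/light positions o_1 and o_2, p is the surface point, n its normal, and \hat{v}_j the unit vector from p towards o_j. Because the BRDF cancels under reciprocity, the constraint holds for arbitrary unknown reflectance, which is the property wavelength multiplexing preserves for dynamic capture.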
Fastovets M, Guillemaut JY, Hilton A (2013) Athlete pose estimation from monocular TV sports footage, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops pp. 1048-1054
Human pose estimation from monocular video streams is a challenging problem. Much of the work on this problem has focused on developing inference algorithms and probabilistic prior models based on learned measurements. Such algorithms face challenges in generalization beyond the learned dataset. We propose an interactive model-based generative approach for estimating the human pose in 2D from uncalibrated monocular video in unconstrained sports TV footage without any prior learning on motion captured or annotated data. Belief-propagation over a spatio-temporal graph of candidate body part hypotheses is used to estimate a temporally consistent pose between key-frame constraints. Experimental results show that the proposed generative pose estimation framework is capable of estimating pose even in very challenging unconstrained scenarios. © 2013 IEEE.
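A minimal sketch of the temporal inference step (an assumed simplification for illustration: max-product belief propagation on a chain reduces to Viterbi-style dynamic programming; the candidate scoring and full spatio-temporal graph of the paper are not reproduced):

    import numpy as np

    def best_pose_path(unary, pairwise):
        """unary: (T, K) log-scores of K pose candidates per frame;
        pairwise: (K, K) temporal consistency log-scores between frames.
        Returns the candidate index chosen for each frame."""
        T, K = unary.shape
        score = unary[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            trans = score[:, None] + pairwise      # score of candidate i at t-1 followed by j at t
            back[t] = trans.argmax(axis=0)
            score = trans.max(axis=0) + unary[t]
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):              # backtrack to recover the best path
            path.append(int(back[t, path[-1]]))
        return path[::-1]

Key-frame constraints can be imposed by setting the unary scores of all but the annotated candidate to -inf at key-frames, which pins the recovered path to the operator-specified poses.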
Guillemaut J-Y, Illingworth J (2008) The normalised image of the absolute conic and its application for zooming camera calibration, PATTERN RECOGNITION 41 (12) pp. 3624-3635 PERGAMON-ELSEVIER SCIENCE LTD
Sarim M, Hilton A, Guillemaut JY (2009) Alpha matte estimation of natural images using local and global template correspondence, 2009 International Conference on Emerging Technologies, ICET 2009 pp. 229-234
Natural image matting is an interesting and difficult problem of computer vision because of its under-constrained nature. It often requires user interaction, in the form of a trimap, to aid the algorithm in identifying the initial definite foreground and background regions. Current techniques use local or global image statistics of these definite regions to estimate the alpha matte for the undefined region. In this paper we propose a novel nonparametric template correspondence approach to estimate the alpha matte. This technique alleviates the problem of previous parametric algorithms that rely solely on colour information and hence are unable to exploit the image structure to their advantage. The proposed technique uses global and local template correspondence, to the definite known regions, to construct the background and foreground layers. Once the foreground and background colours are estimated, the final alpha matte is computed. According to the quantitative analysis against the ground truth, the proposed algorithm outperforms the current state-of-the-art parametric matting techniques. ©2009 IEEE.
Klaudiny M, Tejera M, Malleson C, Guillemaut J-Y, Hilton A SCENE Digital Cinema Datasets, University of Surrey
Fastovets M, Guillemaut JY, Hilton A (2014) Athlete pose estimation by non-sequential key-frame propagation, ACM International Conference Proceeding Series 2014-November
Copyright 2014 ACM. This paper considers the problem of estimating human pose in challenging monocular sports videos, where manual intervention is often required in order to obtain useful results. Fully automatic approaches focus on developing inference algorithms and probabilistic prior models based on learned measurements and often face challenges in generalisation beyond the learned dataset. This work expands on the idea of using an interactive model-based generative technique for accurately estimating the human pose from uncalibrated, unconstrained monocular TV sports footage. A method of keyframe propagation is introduced to obtain reliable tracking from limited operator input, together with optimal keyframe selection assistance for the operator. Experimental results show that the approach produces results competitive with those produced with twice the number of manually annotated keyframes, halving the amount of interaction required.
Guillemaut JY, Hilton A (2012) Space-time joint multi-layer segmentation and depth estimation, Proceedings - 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012 pp. 440-447
Video-based segmentation and reconstruction techniques are predominantly extensions of techniques developed for the image domain treating each frame independently. These approaches ignore the temporal information contained in input videos which can lead to incoherent results. We propose a framework for joint segmentation and reconstruction which explicitly enforces temporal consistency by formulating the problem as an energy minimisation generalised to groups of frames. The main idea is to use optical flow in combination with a confidence measure to impose robust temporal smoothness constraints. Optimisation is performed using recent advances in the field of graph-cuts combined with practical considerations to reduce run-time and memory consumption. Experimental results with real sequences containing rapid motion demonstrate that the method is able to improve spatio-temporal coherence both in terms of segmentation and reconstruction without introducing any degradation in regions where optical flow fails due to fast motion. © 2012 IEEE.
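The energy being minimised has the generic space-time form below (schematic notation assumed for illustration, not quoted from the paper):

    E(\ell) = \sum_{p} D_p(\ell_p) + \lambda \sum_{(p,q) \in \mathcal{N}_s} V_{pq}(\ell_p, \ell_q) + \mu \sum_{(p,p') \in \mathcal{N}_t} c_{pp'} \, V_{pp'}(\ell_p, \ell_{p'})

where D_p is the data term for assigning layer/depth label \ell_p to pixel p, \mathcal{N}_s are spatial neighbours within a frame, \mathcal{N}_t are temporal neighbours linked by optical flow across the group of frames, and c_{pp'} is the flow confidence, which switches the temporal smoothness term off where fast motion makes the flow unreliable.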
Guillemaut J-Y, Hilton A (2011) Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications, International Journal of Computer Vision 93 (1) pp. 73-100 Springer
Sarim M, Hilton A, Guillemaut J-Y, Kim H, Takai T (2010) Wide-Baseline Multi-View Video Segmentation For 3D Reconstruction, Proceedings of the 1st international workshop on 3D video processing pp. 13-18 ACM
Obtaining a foreground silhouette across multiple views is one of the fundamental steps in 3D reconstruction. In this paper we present a novel video segmentation approach, to obtain a foreground silhouette, for scenes captured by a wide-baseline camera rig given a sparse manual interaction in a single view. The algorithm is based on trimap propagation, a framework used in video matting. Bayesian inference coupled with camera calibration information are used to spatio-temporally propagate high confidence trimap labels across the multi-view video to obtain coarse silhouettes which are later refined using a matting algorithm. Recent techniques have been developed for foreground segmentation, based on image matting, in multiple views but they are limited to narrow baseline with low foreground variation. The proposed wide-baseline silhouette propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance. The approach has demonstrated good performance in silhouette estimation for views up to 180 degree baseline (opposing views). The segmentation technique has been fully integrated in a multi-view reconstruction pipeline. The results obtained demonstrate the suitability of the technique for multi-view reconstruction with wide-baseline camera set-ups and natural background.
Guillemaut J-Y, Kittler J, Sadeghi MT, Christmas WJ (2006) General pose face recognition using frontal face model, PROGRESS IN PATTERN RECOGNITON, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS 4225 pp. 79-88 SPRINGER-VERLAG BERLIN
Sarim M, Hilton A, Guillemaut J (2009) Non-parametric patch based video matting, British Machine Vision Association
In computer vision, matting is the process of accurate foreground estimation in images and videos. In this paper we present a novel patch-based approach to video matting relying on non-parametric statistics to represent image variations in appearance. This overcomes the limitation of parametric algorithms which rely only on strong colour correlation between nearby pixels. Initially we construct a clean background by utilising the foreground object's movement across the background. For a given frame, a trimap is constructed using the background and the last frame's trimap. A patch-based approach is used to estimate the foreground colour for every unknown pixel and finally the alpha matte is extracted. Quantitative evaluation shows that the technique performs better, in terms of accuracy and required user interaction, than the current state-of-the-art parametric approaches.
Roubtsova N, Guillemaut J-Y (2015) Extended Bayesian Helmholtz Stereopsis for Enhanced Geometric Reconstruction of Complex Objects, 550 pp. 223-238 SPRINGER-VERLAG BERLIN
Roubtsova N, Guillemaut J-Y Colour Helmholtz Stereopsis Dataset, University of Surrey
Imre E, Guillemaut J-Y, Hilton A (2010) Moving Camera Registration for Multiple Camera Setups in Dynamic Scenes, Proceedings of the 21st British Machine Vision Conference
Many practical applications require an accurate knowledge of the extrinsic calibration (i.e., pose) of a moving camera. The existing SLAM and structure-from-motion solutions are not robust to scenes with large dynamic objects, and do not fully utilize the available information in the presence of static cameras, a common practical scenario. In this paper, we propose an algorithm that addresses both of these issues for a hybrid static-moving camera setup. The algorithm uses the static cameras to build a sparse 3D model of the scene, with respect to which the pose of the moving camera is estimated at each time instant. The performance of the algorithm is studied through extensive experiments that cover a wide range of applications, and is shown to be satisfactory.
Imre E, Guillemaut JY, Hilton A (2012) Through-the-lens multi-camera synchronisation and frame-drop detection for 3D reconstruction, Proceedings - 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012 pp. 395-402
Synchronisation is an essential requirement for multi-view 3D reconstruction of dynamic scenes. However, the use of HD cameras and large set-ups puts considerable stress on the hardware and causes frame drops, which are usually detected by manually verifying very large amounts of data. This paper improves [9], and extends it with frame-drop detection capability. In order to spot frame-drop events, the algorithm fits a broken line to the frame index correspondences for each camera pair, and then fuses the pairwise drop hypotheses into a consistent, absolute frame-drop estimate. The success and practical utility of the improved pipeline are demonstrated through a number of experiments, including 3D reconstruction and free-viewpoint video rendering tasks. © 2012 IEEE.
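A minimal sketch of the pairwise detection idea (an assumed simplification: the fitted broken line is approximated by detecting steps in the frame-index offset between two cameras; the hypothesis fusion across camera pairs is omitted):

    import numpy as np

    def detect_frame_drops(idx_a, idx_b):
        """idx_a, idx_b: corresponding frame indices matched between two cameras.
        Returns (frame in camera A, number of frames dropped in camera B) pairs."""
        offset = np.asarray(idx_b) - np.asarray(idx_a)
        jumps = np.flatnonzero(np.diff(offset) != 0) + 1
        return [(int(idx_a[j]), int(offset[j] - offset[j - 1])) for j in jumps]

    # Example: a drop of two frames in camera B, detected at camera A frame 103:
    # detect_frame_drops([100, 101, 102, 103], [200, 201, 202, 205]) -> [(103, 2)]

Fitting a broken (piecewise-constant offset) line rather than thresholding individual jumps, as the paper does, makes the estimate robust to occasional mismatched correspondences.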
Neophytou A, Guillemaut J-Y, Hilton A (2015) A dense surface motion capture system for accurate acquisition of cloth deformation, CVMP 2015: PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON VISUAL MEDIA PRODUCTION ASSOC COMPUTING MACHINERY
Mustafa A, Kim H, Guillemaut J-Y, Hilton A (2016) Temporally coherent 4D reconstruction of complex dynamic scenes, IEEE
This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration, allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities, resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction.
Brown M, Windridge D, Guillemaut JY (2015) A generalisable framework for saliency-based line segment detection, Pattern Recognition 48 (12) pp. 3993-4011 Elsevier
© 2015 The Authors. Here we present a novel, information-theoretic salient line segment detector. Existing line detectors typically only use the image gradient to search for potential lines. Consequently, many lines are found, particularly in repetitive scenes. In contrast, our approach detects lines that define regions of significant divergence between pixel intensity or colour statistics. This results in a novel detector that naturally avoids the repetitive parts of a scene while detecting the strong, discriminative lines present. We furthermore use our approach as a saliency filter on existing line detectors to more efficiently detect salient line segments. The approach is highly generalisable, depending only on image statistics rather than image gradient; and this is demonstrated by an extension to depth imagery. Our work is evaluated against a number of other line detectors and a quantitative evaluation demonstrates a significant improvement over existing line detectors for a range of image transformations.
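A minimal sketch of the scoring idea (an assumed statistic for illustration: symmetrised KL divergence between intensity histograms sampled from strips on either side of a candidate line; the paper's exact saliency measure may differ):

    import numpy as np

    def line_saliency(side_a, side_b, bins=32, eps=1e-8):
        """side_a, side_b: 1D arrays of pixel intensities from strips on either
        side of a candidate line segment. Larger score = more salient line."""
        ha, _ = np.histogram(side_a, bins=bins, range=(0, 256), density=True)
        hb, _ = np.histogram(side_b, bins=bins, range=(0, 256), density=True)
        ha, hb = ha + eps, hb + eps                  # avoid log(0)
        ha, hb = ha / ha.sum(), hb / hb.sum()
        return float(np.sum(ha * np.log(ha / hb)) + np.sum(hb * np.log(hb / ha)))

Because the score depends only on the divergence of pixel statistics, not on gradient magnitude, repeated texture (e.g. brickwork) yields near-identical side histograms and is naturally suppressed.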
Guillemaut JY, Aguado AS, Illingworth J (2005) Using points at infinity for parameter decoupling in camera calibration, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 27 (2) pp. 265-270 IEEE COMPUTER SOC
Brown M, Windridge D, Guillemaut J-Y (2015) Globally Optimal 2D-3D Registration from Points or Lines Without Correspondences, 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) pp. 2111-2119 IEEE
Guillemaut JY, Drbohlav O, Sara R, Illingworth J (2004) Helmholtz stereopsis on rough and strongly textured surfaces, 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS pp. 10-17 IEEE COMPUTER SOC
Helmholtz Stereopsis (HS) has recently been explored as a promising technique for capturing shape of objects with unknown reflectance. So far, it has been widely applied to objects of smooth geometry and piecewise uniform Bidirectional Reflectance Distribution Function (BRDF). Moreover, for nonconvex surfaces the inter-reflection effects have been completely neglected. We extend the method to surfaces which exhibit strong texture, nontrivial geometry and are possibly nonconvex. The problem associated with these surface features is that Helmholtz reciprocity is apparently violated when point-based measurements are used independently to establish the matching constraint as in the standard HS implementation. We argue that the problem is avoided by computing radiance measurements on image regions corresponding exactly to projections of the same surface point neighbourhood with appropriate scale. The experimental results demonstrate the success of the novel method proposed on real objects.
Casas D, Tejera M, Guillemaut J-Y, Hilton A (2012) Parametric animation of performance-captured mesh sequences, Computer Animation and Virtual Worlds 23 (2) pp. 101-111 Wiley
In this paper, we introduce an approach to high-level parameterisation of captured mesh sequences of actor performance for real-time interactive animation control. High-level parametric control is achieved by non-linear blending between multiple mesh sequences exhibiting variation in a particular movement. For example, walking speed is parameterised by blending fast and slow walk sequences. A hybrid non-linear mesh sequence blending approach is introduced to approximate the natural deformation of non-linear interpolation techniques whilst maintaining the real-time performance of linear mesh blending. Quantitative results show that the hybrid approach gives an accurate real-time approximation of offline non-linear deformation. An evaluation of the approach shows good performance not only for entire meshes but also with specific mesh areas. Results are presented for single and multi-dimensional parametric control of walking (speed/direction), jumping (height/distance) and reaching (height) from captured mesh sequences. This approach allows continuous real-time control of high-level parameters such as speed and direction whilst maintaining the natural surface dynamics of captured movement.
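A minimal sketch of the parameterisation (an assumed data layout of temporally aligned vertex arrays; the paper's hybrid non-linear blend is approximated here by the plain linear baseline):

    import numpy as np

    def blend_walk(slow_walk, fast_walk, speed):
        """slow_walk, fast_walk: (T, V, 3) vertex positions of two temporally
        aligned mesh sequences; speed in [0, 1] maps slow (0) to fast (1)."""
        w = float(np.clip(speed, 0.0, 1.0))
        return (1.0 - w) * slow_walk + w * fast_walk  # per-vertex interpolation

The paper's hybrid scheme approximates non-linear interpolation while retaining roughly the cost of this linear blend, so intermediate parameter values keep natural surface deformation at real-time rates.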
Wang T, Guillemaut J, Collomosse J (2010) Multi-label Propagation for Coherent Video Segmentation and Artistic Stylization, Proceedings of Intl. Conf. on Image Proc. (ICIP) pp. 3005-3008 IEEE
We present a new algorithm for segmenting video frames into temporally stable colored regions, applying our technique to create artistic stylizations (e.g. cartoons and paintings) from real video sequences. Our approach is based on a multilabel graph cut applied to successive frames, in which the color data term and label priors are incrementally updated and propagated over time. We demonstrate coherent segmentation and stylization over a variety of home videos.
Imre E, Guillemaut JY, Hilton A (2011) Calibration of nodal and free-moving cameras in dynamic scenes for post-production, Proceedings - 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2011 pp. 260-267
In film production, many post-production tasks require the availability of accurate camera calibration information. This paper presents an algorithm for through-the-lens calibration of a moving camera for a common scenario in film production and broadcasting: The camera views a dynamic scene, which is also viewed by a set of static cameras with known calibration. The proposed method involves the construction of a sparse scene model from the static cameras, with respect to which the moving camera is registered, by applying the appropriate perspective-n-point (PnP) solver. In addition to the general motion case, the algorithm can handle the nodal cameras with unknown focal length via a novel P2P algorithm. The approach can identify a subset of static cameras that are more likely to generate a high number of scene-image correspondences, and can robustly deal with dynamic scenes. Our target applications include dense 3D reconstruction, stereoscopic 3D rendering and 3D scene augmentation, through which the success of the algorithm is demonstrated experimentally. © 2011 IEEE.
Malleson C, Klaudiny M, Guillemaut JY, Hilton A (2015) Structured representation of non-rigid surfaces from single view 3D point tracks, Proceedings - 2014 International Conference on 3D Vision, 3DV 2014 pp. 625-632
© 2014 IEEE. This work considers the problem of structured representation of dynamic surfaces from incomplete 3D point tracks from a single viewpoint. The surface is segmented into a set of connected regions each of which can be represented by a fixed intrinsic shape and a parametrised rigid/non-rigid motion trajectory. Neither the model parameters nor the point-to-model assignments are known upfront. Motion and geometric shape parameters are estimated in alternation with a graph-cuts based point-to-model assignment. This modelling process facilitates in-filling of missing data as well as de-noising of measurements by temporal integration while adding meaningful structure to the geometry and reducing storage cost by an order of magnitude. Experiments are presented for real and synthetic sequences to validate the approach and show how a single tuning parameter can be used to trade modelling error with extrapolation level and storage cost.
Imber J, Volino M, Guillemaut JY, Fenney S, Hilton A (2013) Free-viewpoint video rendering for mobile devices, ACM International Conference Proceeding Series
Free-viewpoint video renderers (FVVR) allow a user to view captured video footage from any position and direction. Despite the obvious appeal of such systems, they have yet to make a major impact on digital entertainment. Current FVVR implementations have been on desktop computers. Media consumption is increasingly through mobile devices, such as smart phones and tablets; adapting FVVR to mobile platforms will open this new form of media up to a wider audience. An efficient, high-quality FVVR, which runs in real time with user interaction on a mobile device, is presented. Performance is comparable to recent desktop implementations. The FVVR supports relighting and integration of relightable free-viewpoint video (FVV) content into computer-generated scenes. A novel approach to relighting FVVR content is presented which does not require prior knowledge of the scene illumination or accurate surface geometry. Surface appearance is separated into a detail component, and a set of materials with properties determining surface colour and specular behaviour. This allows plausible relighting of the dynamic FVV for rendering on mobile devices. © 2013 ACM.
Kim H, Sarim M, Takai T, Guillemaut J-Y, Hilton A (2010) Dynamic 3D Scene Reconstruction in Outdoor Environments, In Proc. IEEE Symp. on 3D Data Processing and Visualization IEEE
A number of systems have been developed for dynamic 3D reconstruction from multiple view videos over the past decade. In this paper we present a system for multiple view reconstruction of dynamic outdoor scenes, transferring studio technology to uncontrolled environments. A synchronised portable multiple camera system is composed of off-the-shelf HD cameras for dynamic scene capture. For foreground extraction, we propose a multi-view trimap propagation method which is robust against dynamic changes in appearance between views and over time. This allows us to apply state-of-the-art natural image matting algorithms for multi-view sequences with minimal interaction. Optimal 3D surfaces of the foreground models are reconstructed by integrating multi-view shape cues and features. For background modelling, we use a line scan camera with a fish eye lens to capture a full environment with high resolution. The environment model is reconstructed from a spherical stereo image pair with sub-pixel correspondence. Finally the foreground and background models are merged into a 3D world coordinate frame and the composite model is rendered from arbitrary viewpoints. We show that the proposed system generates high quality scene images with dynamic virtual camera actions.
Guillemaut JY, Aguado AS, Illingworth J (2003) Calibration of a zooming camera using the Normalized Image of the Absolute Conic, FOURTH INTERNATIONAL CONFERENCE ON 3-D DIGITAL IMAGING AND MODELING, PROCEEDINGS pp. 225-232 IEEE COMPUTER SOC
Fastovets M, Guillemaut JY, Hilton A (2014) Estimating athlete pose from monocular TV sports footage, 71 pp. 161-178
© Springer International Publishing Switzerland 2014. Human pose estimation from monocular video streams is a challenging problem. Much of the work on this problem has focused on developing inference algorithms and probabilistic prior models based on learned measurements. Such algorithms face challenges in generalisation beyond the learned dataset. We propose an interactive model-based generative approach for estimating the human pose from uncalibrated monocular video in unconstrained sports TV footage. Belief propagation over a spatio-temporal graph of candidate body part hypotheses is used to estimate a temporally consistent pose between user-defined keyframe constraints. Experimental results show that the proposed generative pose estimation framework is capable of estimating pose even in very challenging unconstrained scenarios.
Haccius C, Herfet T, Matvienko V, Eisert P, Feldmann I, Hilton A, Guillemaut J, Klaudiny M, Jachalsky J, Rogmans S (2013) A Novel Scene Representation for Digital Media,
Roubtsova N, Guillemaut JY (2014) A Bayesian framework for enhanced geometric reconstruction of complex objects by helmholtz stereopsis, VISAPP 2014 - Proceedings of the 9th International Conference on Computer Vision Theory and Applications 3 pp. 335-342
Helmholtz stereopsis is an advanced 3D reconstruction technique for objects with arbitrary reflectance properties that uniquely characterises surface points by both depth and normal. Traditionally, in Helmholtz stereopsis consistency of depth and normal estimates is assumed rather than explicitly enforced. Furthermore, conventional Helmholtz stereopsis performs maximum likelihood depth estimation without neighbourhood consideration. In this paper, we demonstrate that reconstruction accuracy of Helmholtz stereopsis can be greatly enhanced by formulating depth estimation as a Bayesian maximum a posteriori probability problem. In reformulating the problem we introduce neighbourhood support by formulating and comparing three priors: a depth-based, a normal-based and a novel depth-normal consistency enforcing one. Relative performance evaluation of the three priors against standard maximum likelihood Helmholtz stereopsis is performed on both real and synthetic data to facilitate both qualitative and quantitative assessment of reconstruction accuracy. Observed superior performance of our depth-normal consistency prior indicates a previously unexplored advantage in joint optimisation of depth and normal estimates.
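Schematically (our notation, not quoted from the paper), the reformulation replaces per-point maximum likelihood depth estimation with a maximum a posteriori estimate over the depth map:

    \hat{d} = \arg\max_{d} \; p(I \mid d) \, p(d)

where p(I | d) is the Helmholtz reciprocity likelihood of the observed reciprocal images and the prior p(d) introduces the neighbourhood support, e.g. the proposed depth-normal consistency prior, which penalises disagreement between the photometrically estimated normals and the normals implied by neighbouring depth estimates.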
Guillemaut J-Y, Kilner J, Hilton A (2010) Robust Graph-Cut Scene Segmentation and Reconstruction for Free-Viewpoint Video of Complex Dynamic Scenes, 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) pp. 809-816 IEEE
Casas D, Tejera M, Guillemaut J-Y, Hilton A (2011) Parametric control of captured mesh sequences for real-time animation, Lecture Notes in Computer Science: Motion in Games 7060 pp. 242-253 Springer
In this paper we introduce an approach to high-level parameterisation of captured mesh sequences of actor performance for real-time interactive animation control. High-level parametric control is achieved by non-linear blending between multiple mesh sequences exhibiting variation in a particular movement. For example, walking speed is parameterised by blending fast and slow walk sequences. A hybrid non-linear mesh sequence blending approach is introduced to approximate the natural deformation of non-linear interpolation techniques whilst maintaining the real-time performance of linear mesh blending. Quantitative results show that the hybrid approach gives an accurate real-time approximation of offline non-linear deformation. Results are presented for single and multi-dimensional parametric control of walking (speed/direction), jumping (height/distance) and reaching (height) from captured mesh sequences. This approach allows continuous real-time control of high-level parameters such as speed and direction whilst maintaining the natural surface dynamics of captured movement.
Kilner J, Guillemaut J-Y, Hilton A (2010) Summarised hierarchical Markov models for speed-invariant action matching, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009 pp. 1065-1072
Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However due to poor calibration and matting this reconstruction is of a very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games. ©2009 IEEE.
Malleson C, Klaudiny M, Hilton A, Guillemaut J-Y (2013) Single-view RGBD-based reconstruction of dynamic human geometry, Proceedings of the IEEE International Conference on Computer Vision - Workshop on Dynamic Shape Capture and Analysis (4DMOD 2013) pp. 307-314 IEEE
We present a method for reconstructing the geometry and appearance of indoor scenes containing dynamic human subjects using a single (optionally moving) RGBD sensor. We introduce a framework for building a representation of the articulated scene geometry as a set of piecewise rigid parts which are tracked and accumulated over time using moving voxel grids containing a signed distance representation. Data association of noisy depth measurements with body parts is achieved by online training of a prior shape model for the specific subject. A novel frame-to-frame model registration is introduced which combines iterative closest-point with additional correspondences from optical flow and prior pose constraints from noisy skeletal tracking data. We quantitatively evaluate the reconstruction and tracking performance of the approach using a synthetic animated scene. We demonstrate that the approach is capable of reconstructing mid-resolution surface models of people from low-resolution noisy data acquired from a consumer RGBD camera. © 2013 IEEE.
Sarim M, Hilton A, Guillemaut JY, Kim H (2009) Non-parametric natural image matting, Proceedings - International Conference on Image Processing, ICIP pp. 3213-3216
Natural image matting is an extremely challenging image processing problem due to its ill-posed nature. It often requires skilled user interaction to aid definition of foreground and background regions. Current algorithms use these predefined regions to build local foreground and background colour models. In this paper we propose a novel approach which uses non-parametric statistics to model image appearance variations. This technique overcomes the limitations of previous parametric approaches which are purely colour-based and thereby unable to model natural image structure. The proposed technique consists of three successive stages: (i) background colour estimation, (ii) foreground colour estimation, (iii) alpha estimation. Colour estimation uses patch-based matching techniques to efficiently recover the optimum colour by comparison against patches from the known regions. Quantitative evaluation against ground truth demonstrates that the technique produces better results and successfully recovers fine details such as hair where many other algorithms fail. ©2009 IEEE.
Guillemaut J, Aguado A, Illingworth J (2002) Using points at infinity for parameter decoupling in camera calibration, 1 pp. 263-272
Imber J, Guillemaut J-Y, Hilton A (2014) Intrinsic textures for relightable free-viewpoint video, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8690 LNCS (PART 2) pp. 392-407 Springer
This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing.
Guillemaut J-Y, Drbohlav O, Illingworth J, Sara R (2008) A maximum likelihood surface normal estimation algorithm for Helmholtz stereopsis, VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2 pp. 352-359 INSTICC-INST SYST TECHNOLOGIES INFORMATION CONTROL & COMMUNICATION
Budd C, Guillemaut J, Klaudiny M, Hilton A (2012) Scene Modelling for Richer Media Content,
Brujic-Okretic V, Guillemaut J, Hitchin L, Michielen M, Parker G (2003) Remote vehicle manoeuvring using augmented reality, pp. 186-189
Kilner J, Guillemaut J-Y, Hilton A (2010) 3D action matching with key-pose detection, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009 pp. 1-8
This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera, however this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event. ©2009 IEEE.
Guillemaut J-Y, Hilton A, Starck J, Kilner J, Grau O (2007) A Bayesian framework for simultaneous matting and 3D reconstruction, 3DIM 2007: Sixth International Conference on 3-D Digital Imaging and Modeling, Proceedings pp. 167-174 IEEE COMPUTER SOC
Conventional stereoscopic video content production requires use of dedicated stereo camera rigs which is both costly and lacking video editing flexibility. In this paper, we propose a novel approach which only requires a small number of standard cameras sparsely located around a scene to automatically convert the monocular inputs into stereoscopic streams. The approach combines a probabilistic spatio-temporal segmentation framework with a state-of-the-art multi-view graph-cut reconstruction algorithm, thus providing full control of the stereoscopic settings at render time. Results with studio sequences of complex human motion demonstrate the suitability of the method for high quality stereoscopic content generation with minimum user interaction.
Guillemaut J, Kilner J, Starck J, Hilton A (2007) Dynamic feathering: Minimising blending artefacts in view-dependent rendering, IET Conference Publications 534 (534 CP)
Conventional view-dependent texture mapping techniques produce composite images by blending subsets of input images, weighted according to their relative influence at the rendering viewpoint, over regions where the views overlap. Geometric or camera calibration errors often result in a loss of detail due to blurring or double exposure artefacts, which tends to be exacerbated by the number of blending views considered. We propose a novel view-dependent rendering technique which optimises the blend region dynamically at rendering time, and reduces the adverse effects of camera calibration or geometric errors otherwise observed. The technique has been successfully integrated in a rendering pipeline which operates at interactive frame rates. Improvements over state-of-the-art view-dependent texture mapping techniques are illustrated on a synthetic scene as well as real imagery of a large scale outdoor scene where large camera calibration and geometric errors are present.
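For context, a common baseline weighting in view-dependent texture mapping assigns camera i, at each surface point, the normalised weight below (a generic form from the literature; the contribution above is to adapt the blend region dynamically rather than use such a fixed rule):

    w_i = \frac{\max(0, \cos\theta_i)^k}{\sum_j \max(0, \cos\theta_j)^k}

where \theta_i is the angle between the rendering view direction and the direction to camera i, and k controls blend sharpness: larger k narrows the blend region, trading the double-exposure artefacts described above for more visible seams.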
Mustafa A, Kim H, Guillemaut J-Y, Hilton A (2015) General Dynamic Scene Reconstruction from Multiple View Video, 2015 IEEE International Conference on Computer Vision (ICCV) pp. 900-908 IEEE
This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.
Kilner J, Starck J, Guillemaut J-Y, Hilton A (2009) Objective Quality Assessment in Free-viewpoint Video Production, Signal Processing: Image Communication 24 (1-2) pp. 3-16 Elsevier
Brown MR, Windridge D, Guillemaut J (2016) A Generalised Framework for Saliency-Based Point Feature Detection, Computer Vision and Image Understanding 157 pp. 117-137 Elsevier
Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By contrast, our histogram-based approach is highly generalisable and as such, may be meaningfully applied between 2D and 3D data. Using the generalised approach, we propose salient point detectors for images, and both untextured and textured 3D data. The approach naturally allows for the detection of salient 3D points based jointly on both the geometry and texture of the scene, allowing for broader applicability. The repeatability of the feature detectors is evaluated using a range of datasets including image and LiDAR input from indoor and outdoor scenes. Experimental results demonstrate a significant improvement in terms of 2D-2D and 2D-3D repeatability compared to existing multi-modal feature detectors.
Galkandage CVP, Calic J, Dogan S, Guillemaut J (2017) Stereoscopic Video Quality Assessment Using Binocular Energy, Journal of Selected Topics in Signal Processing 11 (1) pp. 102-112 IEEE
Stereoscopic imaging is becoming increasingly popular. However, to ensure the best quality of experience, there is a need to develop more robust and accurate objective metrics for stereoscopic content quality assessment. Existing stereoscopic image and video metrics are either extensions of conventional 2D metrics (with added depth or disparity information) or are based on relatively simple perceptual models. Consequently, they tend to lack the accuracy and robustness required for stereoscopic content quality assessment. This paper introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. The proposed approach is based on the following three contributions. First, it introduces a novel HVS model extending previous models to include the phenomena of binocular suppression and recurrent excitation. Second, an image quality metric based on the novel HVS model is proposed. Finally, an optimised temporal pooling strategy is introduced to extend the metric to the video domain. Both image and video quality metrics are obtained via a training procedure to establish a relationship between subjective scores and objective measures of the HVS model. The metrics are evaluated using publicly available stereoscopic image/video databases as well as a new stereoscopic video database. An extensive experimental evaluation demonstrates the robustness of the proposed quality metrics. This indicates a considerable improvement with respect to the state-of-the-art with average correlations with subjective scores of 0.86 for the proposed stereoscopic image metric and 0.89 and 0.91 for the proposed stereoscopic video metrics.
Casas D, Tejera M, Guillemaut J, Hilton A (2013) Interactive Animation of 4D Performance Capture, IEEE Trans. Vis. Comput. Graph. 19 (5) pp. 762-773
Hilton A, Guillemaut J, Kilner J, Grau O, Thomas G (2011) 3D-TV Production from Conventional Cameras for Sports Broadcast, IEEE Transactions on Broadcasting 57 (2) pp. 462-476 IEEE
3DTV production of live sports events presents a challenging problem involving conflicting requirements of maintaining broadcast stereo picture quality with practical problems in developing robust systems for cost effective deployment. In this paper we propose an alternative approach to stereo production in sports events using the conventional monocular broadcast cameras for 3D reconstruction of the event and subsequent stereo rendering. This approach has the potential advantage over stereo camera rigs of recovering full scene depth, allowing inter-ocular distance and convergence to be adapted according to the requirements of the target display and enabling stereo coverage from both existing and "virtual" camera positions without additional cameras. A prototype system is presented with results of sports TV production trials for rendering of stereo and free-viewpoint video sequences of soccer and rugby.
Kim H, Guillemaut J, Takai T, Sarim M, Hilton A (2012) Outdoor Dynamic 3D Scene Reconstruction, IEEE Transactions on Circuits and Systems for Video Technology 22 (11) pp. 1611-1622 IEEE
Existing systems for 3D reconstruction from multiple view video use controlled indoor environments with uniform illumination and backgrounds to allow accurate segmentation of dynamic foreground objects. In this paper we present a portable system for 3D reconstruction of dynamic outdoor scenes which require relatively large capture volumes with complex backgrounds and non-uniform illumination. This is motivated by the demand for 3D reconstruction of natural outdoor scenes to support film and broadcast production. Limitations of existing multiple view 3D reconstruction techniques for use in outdoor scenes are identified. Outdoor 3D scene reconstruction is performed in three stages: (1) 3D background scene modelling using spherical stereo image capture; (2) multiple view segmentation of dynamic foreground objects by simultaneous video matting across multiple views; and (3) robust 3D foreground reconstruction and multiple view segmentation refinement in the presence of segmentation and calibration errors. Evaluation is performed on several outdoor productions with complex dynamic scenes including people and animals. Results demonstrate that the proposed approach overcomes limitations of previous indoor multiple view reconstruction approaches enabling high-quality free-viewpoint rendering and 3D reference models for production.
Brown M, Guillemaut J-Y, Windridge D (2015) A Saliency-based Framework for 2D-3D Registration, Proc. International Conference on Computer Vision Theory and Applications (VISAPP 2014)
Here we propose a saliency-based filtering approach to the problem of registering an untextured 3D object to a single monocular image. The principle of saliency can be applied to a range of modalities and domains to find intrinsically descriptive entities from amongst detected entities, making it a rigorous approach to multi-modal registration. We build on the Kadir-Brady saliency framework due to its principled information-theoretic approach which enables us to naturally extend it to the 3D domain. The salient points from each domain are initially aligned using the SoftPosit algorithm. This is subsequently refined by aligning the silhouette with contours extracted from the image. Whereas other point based registration algorithms focus on corners or straight lines, our saliency-based approach is more general as it is more widely applicable, e.g. to curved surfaces where a corner detector would fail. We compare our salient point detector to the Harris corner and SIFT keypoint detectors and show it generally achieves superior registration accuracy.
Sarim M, Hilton A, Guillemaut J (2009) Wide-baseline matte propagation for indoor scenes, 2009 Conference for Visual Media Production: CVMP 2009 pp. 195-204
This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.
Sarim M, Hilton A, Guillemaut J, Takai T, Kim H (2010) Natural image matting for multiple wide-baseline views, Proceedings of 17th IEEE International Conference on Image Processing (ICIP) pp. 2233-2236
In this paper we present a novel approach to estimate the alpha mattes of a foreground object captured by a wide-baseline circular camera rig provided a single key frame trimap. Bayesian inference coupled with camera calibration information are used to propagate high confidence trimap labels across the views. Recent techniques have been developed to estimate an alpha matte of an image using multiple views but they are limited to narrow baseline views with low foreground variation. The proposed wide-baseline trimap propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance for cameras with opposing views, enabling high quality alpha matte extraction using any state-of-the-art image matting algorithm.
Hilton A, Guillemaut J, Kilner J, Grau O, Thomas G (2010) Free-Viewpoint Video for TV Sport Production, In: Ronfard R, Taubin G (eds.), Image and Geometry Processing for 3-D Cinematography 5 Springer
Casas D, Tejera M, Guillemaut J, Hilton A (2012) 4D parametric motion graphs for interactive animation, I3D '12 Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games pp. 103-110 ACM
A 4D parametric motion graph representation is presented for interactive animation from actor performance capture in a multiple camera studio. The representation is based on a 4D model database of temporally aligned mesh sequence reconstructions for multiple motions. High-level movement controls such as speed and direction are achieved by blending multiple mesh sequences of related motions. A real-time mesh sequence blending approach is introduced which combines the realistic deformation of previous non-linear solutions with efficient online computation. Transitions between different parametric motion spaces are evaluated in real-time based on surface shape and motion similarity. 4D parametric motion graphs allow real-time interactive character animation while preserving the natural dynamics of the captured performance. © 2012 ACM.
Sarim M, Hilton A, Guillemaut J, Kim H, Takai T (2010) Multiple view wide-baseline trimap propagation for natural video matting, Proc. European Conference on Visual Media Production (CVMP 2010) pp. 82-91
This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.
Roubtsova N, Guillemaut J (2015) Colour Helmholtz Stereopsis for reconstruction of complex dynamic scenes, Proceedings - 2014 International Conference on 3D Vision, 3DV 2014 pp. 251-258
Helmholtz Stereopsis (HS) is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose Colour HS, a novel variant of the technique based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed novel pipeline for Colour HS uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for surface chromaticity calibration and the state-of-the-art Bayesian HS suitable for reconstruction from a minimal number of reciprocal pairs. Experimental results including quantitative and qualitative evaluation demonstrate that the method is suitable for flexible (single-shot) reconstruction of static scenes and reconstruction of dynamic scenes with complex surface reflectance properties.
Roubtsova Nadejda, Guillemaut Jean-Yves (2017) Bayesian Helmholtz Stereopsis with Integrability Prior, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (9) pp. 2265-2272 Institute of Electrical and Electronics Engineers (IEEE)
Helmholtz Stereopsis is a 3D reconstruction method uniquely independent of surface reflectance. Yet, its sub-optimal maximum likelihood formulation with drift-prone normal integration limits performance. Via three contributions this paper presents a complete novel pipeline for Helmholtz Stereopsis. Firstly, we propose a Bayesian formulation replacing the maximum likelihood problem by a maximum a posteriori one. Secondly, a tailored prior enforcing consistency between depth and normal estimates via a novel metric related to optimal surface integrability is proposed. Thirdly, explicit surface integration is eliminated by taking advantage of the accuracy of the prior and the high resolution of the coarse-to-fine approach. The pipeline is validated quantitatively and qualitatively against alternative formulations, reaching sub-millimetre accuracy and coping with complex geometry and reflectance.
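Schematically, the move from maximum likelihood to maximum a posteriori estimation described above can be written as (notation illustrative rather than the paper's exact formulation)

    \hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\, p(I \mid \theta)
    \quad\longrightarrow\quad
    \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(I \mid \theta)\, p(\theta),

where \theta denotes the per-pixel depth and normal estimates, I the reciprocal image measurements, and p(\theta) the tailored integrability prior enforcing depth-normal consistency.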
Mustafa Armin, Volino Marco, Guillemaut Jean-Yves, Hilton Adrian (2018) 4D Temporally Coherent Light-field Video, 3DV 2017 Proceedings IEEE
Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent light-field video representation. A novel method to obtain Epipolar Plane Images (EPIs) from a sparse light-field camera array is proposed. EPIs are used to constrain scene flow estimation to obtain 4D temporally coherent representations of dynamic light-fields. Temporal coherence is achieved on a variety of light-field datasets. Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.
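An Epipolar Plane Image, central to the method above, is a 2D light-field slice obtained by stacking the same scanline across a line of cameras; a scene point then traces a line whose slope is proportional to its disparity. A minimal sketch under the strong assumption of a rectified, equally spaced camera row (the paper's contribution, constructing EPIs from a sparse array, is precisely what this sketch omits):

    import numpy as np

    def epipolar_plane_image(images, row):
        """Build an EPI from a rectified row of cameras.

        images : list of (H, W) grayscale views from equally spaced,
                 rectified cameras along a horizontal baseline.
        row    : scanline index shared by all views.
        """
        return np.stack([img[row] for img in images], axis=0)

    # Hypothetical usage with 8 synthetic views.
    views = [np.random.rand(480, 640) for _ in range(8)]
    epi = epipolar_plane_image(views, row=240)  # shape (8, 640)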
High resolution brain magnetic resonance (MR) images acquired at multiple time points across the treatment of a patient allow the quantification of localised changes brought about by disease progression. The aim of this thesis is to address the challenge of performing automatic longitudinal analysis of magnetic resonance imaging (MRI) in paediatric brain tumours.

The first contribution in this thesis is the validation of a semi-automated segmentation technique. This technique was applied to intra-operative MR images acquired during the surgical resection of hypothalamic tumours in children, in order to assess the volume of tumour resected at different stages of the surgical procedure.

The second contribution in this thesis is the quantification of a rare condition known as hypertrophic olivary degeneration (HOD) in structures within the brain stem known as the inferior olivary nuclei (ION), in relation to the development of posterior fossa syndrome (PFS) following tumour resection in the hindbrain. The change in grey-level intensity over time in the left ION has been identified as a suitable biomarker that correlates with the occurrence of PFS following tumour resection surgery. This study demonstrates the application of machine learning techniques to T2-weighted brain MR images.

The third contribution presents a novel approach to longitudinal brain MR analysis, focusing on the cerebellum and brain stem. It introduces a technique for interpolating multi-slice 2D MR images of the brain stem and cerebellum, both to infill gaps between slices and longitudinally over time, that is, in four-dimensional space. This study also investigates the application of machine learning techniques directly to the MR images, and develops a further novel imaging feature: the Jacobian of deformations in the brain over time (a brief sketch follows this abstract). Unlike the second contribution, the third is not hypothesis-driven, and automatically detects six potential biomarkers related to the development of PFS following tumour resection in the posterior fossa.

The limited number of patients considered in each study posed a major challenge. This has prompted the use of multiple validation techniques in order to provide accurate results despite the small dataset. These techniques are presented in the second and third contribution chapters.
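The Jacobian-of-deformations feature from the third contribution can be sketched briefly: for a displacement field u(x) relating two registered scans, the determinant of the Jacobian of the mapping x + u(x) measures local volume change (greater than 1 for expansion, less than 1 for contraction). A minimal finite-difference sketch; array names are hypothetical.

    import numpy as np

    def jacobian_determinant(u):
        """Determinant of the Jacobian of phi(x) = x + u(x).

        u : (3, D, H, W) displacement field between two registered scans.
        Returns a (D, H, W) map of local volume change.
        """
        grads = [np.gradient(u[i]) for i in range(3)]  # du_i/dx_j
        J = np.empty(u.shape[1:] + (3, 3))
        for i in range(3):
            for j in range(3):
                J[..., i, j] = grads[i][j]
        J += np.eye(3)  # add the Jacobian of the identity map
        return np.linalg.det(J)

    # Hypothetical usage: a small random field, determinants near 1.
    u = 0.01 * np.random.randn(3, 32, 32, 32)
    vol_change = jacobian_determinant(u)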

Williamson Tom H., Guillemaut Jean-Yves, Hall Sheldon K., Hutter Joseph C., Goddard Tony (2018) Theoretical gas concentrations achieving 100% fill of the vitreous cavity in the postoperative period, a gas eye model study (GEMS), RETINA, The Journal of Retinal and Vitreous Diseases 38 pp. S60-S64 Lippincott, Williams & Wilkins
Précis: A mathematical model of the physical properties of intraocular gases is described, providing a guide to the gas concentrations required to achieve 100% fill of the vitreous cavity postoperatively. A table for the instruction of surgeons is provided and the effects of different axial lengths examined.

ABSTRACT

Purpose: To determine the concentrations of different gas tamponades in air required to achieve 100% fill of the vitreous cavity postoperatively, and to examine the influence of eye volume on these concentrations.

Methods: A mathematical model of the mass transfer dynamics of tamponade and blood gases (O2, N2, CO2) injected into the eye was used. Mass transfer surface areas were calculated from published anatomical data. The model was calibrated using published volumetric decay and composition results for three gases: sulphur hexafluoride (SF6), hexafluoroethane (C2F6) and perfluoropropane (C3F8). The concentrations of these gases (in air) required to achieve 100% fill of the vitreous cavity postoperatively without a rise in intraocular pressure were determined. The concentrations were calculated for three volumes of the vitreous cavity to test whether ocular size influenced the results.

Results: A table of gas concentrations was produced. In a simulation of pars plana vitrectomy operations in which an 80% to 85% fill of the vitreous cavity with gas was achieved at surgery, the concentrations of the three gases in air required to achieve 100% fill postoperatively were 10-13% for C3F8, 12-15% for C2F6 and 19-25% for SF6. These were similar to the so-called "non-expansive" concentrations used in the clinical setting. The calculations were repeated for three different sizes of eye. Aiming for an 80% fill at surgery and 100% postoperatively, an eye with a 4 ml vitreous cavity required 24% SF6, 15% C2F6 or 13% C3F8; a 7.2 ml cavity required 25% SF6, 15% C2F6 or 13% C3F8; and a 10 ml cavity required 25% SF6, 16% C2F6 or 13% C3F8. When using 100% gas (as employed, for example, in pneumatic retinopexy), the minimum vitreous cavity fill at surgery required to achieve 100% fill postoperatively was 43% for SF6, 29% for C2F6 and 25% for C3F8, and was only minimally changed by variation in the size of the eye.

Conclusions: A table has been produced which could be used for surgical innovation in gas usage in the vitreous cavity. It provides the concentrations, for different percentage fills at surgery, that achieve a moment postoperatively at which the cavity is completely filled without a pressure rise. Variation in axial length and size of the eye does not appear to alter the values in the table significantly. Those using pneumatic retinopexy need to increase the volume of gas injected with increased size of the eye in order to match the percentage fill of the vitreous cavity recommended for a given tamponade agent.
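The species-by-species dynamics referred to in the Methods can be sketched with a toy partial-pressure-driven model (coefficients entirely hypothetical, not the calibrated values of the paper; bubble volume is taken as proportional to total moles at roughly 1 atm):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Hypothetical permeability-area coefficients (per hour) and
    # blood partial pressures (atm) for each species.
    K = {"C3F8": 0.001, "N2": 0.05, "O2": 0.08, "CO2": 0.5}
    P_BLOOD = {"C3F8": 0.0, "N2": 0.76, "O2": 0.13, "CO2": 0.06}
    SPECIES = list(K)

    def rates(t, n):
        """dn_i/dt = -K_i * (p_i - p_blood_i), p_i from mole fractions."""
        n = np.maximum(n, 0.0)
        p = n / n.sum()  # partial pressures at ~1 atm total
        return [-K[s] * (p[i] - P_BLOOD[s]) for i, s in enumerate(SPECIES)]

    # A bubble of pure C3F8: blood gases diffuse in first (expansion),
    # then all species slowly wash out (absorption).
    sol = solve_ivp(rates, (0.0, 1000.0), [0.3, 0.0, 0.0, 0.0])
    total_moles = sol.y.sum(axis=0)  # proxy for bubble volume over time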

Recent developments in 3D media technology have brought to life numerous interactive entertainment applications such as 3D cinema, 3DTV and gaming. Due to the data-intensive nature of 3D visual content, Quality of Experience (QoE) has become a major driving factor in optimising the end-to-end content delivery process. However, ensuring QoE requires more robust and accurate objective metrics for stereoscopic image and video quality assessment. Existing stereoscopic QoE metrics tend to lack accuracy and robustness compared to their 2D counterparts, as they are either extensions of 2D metrics or based on simple perceptual models; measuring stereoscopic QoE calls for more perceptually inspired metrics. This research introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. Firstly, a novel HVS model extending existing models in the literature is proposed, incorporating the phenomena of binocular suppression and recurrent excitation for stereoscopic image quality assessment. Secondly, the research is extended to the temporal domain, using temporal pooling of per-frame HVS model outputs and a spatio-temporal extension of the HVS model to obtain two distinct temporally inspired stereoscopic video quality metrics. Finally, motion sensitivity is introduced into the HVS model to produce a perception-inspired stereoscopic video quality metric. The proposed QoE metrics are trained, verified and tested on four publicly available stereoscopic image databases and two stereoscopic video datasets. They increase the average correlation index from 0.66 (baseline method) to 0.86 for stereoscopic images, and from 0.57 (baseline method) to a maximum of 0.93 for stereoscopic videos. These results demonstrate the benefits of the perceptually inspired approach taken in this research.
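For reference, the correlation indices quoted above (and in the stereoscopic video quality paper later in this list) are standard agreement measures between objective metric outputs and subjective mean opinion scores; a minimal sketch with synthetic stand-in data:

    import numpy as np
    from scipy.stats import pearsonr, spearmanr, kendalltau

    mos = np.array([1.2, 2.5, 3.1, 3.9, 4.6])     # subjective scores
    metric = np.array([1.0, 2.7, 2.9, 4.1, 4.5])  # objective outputs

    plcc, _ = pearsonr(metric, mos)    # linear correlation (PLCC)
    srcc, _ = spearmanr(metric, mos)   # rank correlation (SRCC)
    krcc, _ = kendalltau(metric, mos)  # rank correlation (KRCC)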
Kilner J, Starck J, Guillemaut Jean-Yves, Hilton A (2009) Objective quality assessment in free-viewpoint video production, Signal Processing: Image Communication 24 (1-2) pp. 3-16 Elsevier
Malleson Charles, Guillemaut Jean-Yves, Hilton Adrian (2018) Hybrid modelling of non-rigid scenes from RGBD cameras, IEEE Transactions on Circuits and Systems for Video Technology IEEE
Recent advances in sensor technology have introduced low-cost RGB video plus depth sensors, such as the Kinect, which enable simultaneous acquisition of colour and depth images at video rates. This paper introduces a framework for representation of general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel graph surface segmentation and modelling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are (1) extension of a prior piecewise surfel graph modelling approach for improved accuracy and completeness, (2) combination of this surfel graph modelling with TSDF surface fusion to generate dense geometry, and (3) proposal of means for validation of the reconstructed 4D scene model against the input data and efficient storage of any unmodelled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be efficiently represented with temporally consistent structure and enhanced levels of detail and completeness where possible, but gracefully falls back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously require more complex capture setups or manual processing.
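The TSDF fusion component in contribution (2) follows the classic weighted-averaging scheme of Curless and Levoy: each depth map updates a truncated signed distance and a weight per voxel. A minimal orthographic toy (a real system projects voxels through the camera model and handles visibility):

    import numpy as np

    def fuse_depth(tsdf, weights, depth, trunc=0.05, voxel_size=0.01):
        """Integrate one depth map into a TSDF volume (orthographic toy).

        tsdf, weights : (X, Y, Z) running TSDF values and weights.
        depth         : (X, Y) depth map aligned with the volume z-axis.
        """
        Z = tsdf.shape[2]
        z = np.arange(Z) * voxel_size               # voxel depths
        sdf = depth[:, :, None] - z[None, None, :]  # signed distance
        valid = sdf > -trunc                        # not far behind surface
        d = np.clip(sdf, -trunc, trunc) / trunc     # truncate and normalise
        w_new = weights + valid
        tsdf_new = np.where(valid,
                            (tsdf * weights + d) / np.maximum(w_new, 1),
                            tsdf)
        return tsdf_new, w_new

    # Hypothetical usage: fuse a flat surface at 0.3 m into a 64^3 volume.
    tsdf = np.zeros((64, 64, 64)); w = np.zeros((64, 64, 64))
    tsdf, w = fuse_depth(tsdf, w, np.full((64, 64), 0.3))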
Shape information has been recognised as playing a role in intrinsic image estimation since its inception. However, it is only in recent years that hints of the importance of geometry have been found in decomposing surface appearance into albedo and shading estimates. This thesis establishes the central importance of shape in intrinsic surface property estimation for static and dynamic scenes, and introduces methods for the use of approximate shape in a wide range of related problems to provide high-level constraints on shading.

A key contribution is intrinsic texture estimation. This is a generalisation of intrinsic image estimation, in which appearance is processed as a function of surface position rather than pixel position. This approach has numerous advantages, in that the shape can be used to resolve occlusion, inter-reflection and attached shading as a natural part of the method. Unlike previous bidirectional texture function estimation approaches, high-quality albedo and shading textures are produced without prior knowledge of materials or lighting.

Many of the concepts in intrinsic texture estimation can be extended to single-viewpoint capture for which depth information is available. Depth information greatly reduces the ambiguity of the shading estimation problem, allowing online intrinsic video to be developed for the first time. The availability of a lighting function also allows high-level temporal constraints on shading to be applied over video sequences, which previously required per-pixel correspondence between frames to be established. A number of applications of intrinsic video are investigated, including augmented reality, video stylisation and relighting, all of which run at interactive framerates. The albedo distribution of the input video is preserved, even in the case of natural scenes with complex appearance, and a globally-consistent shading estimate is obtained which remains robust over dynamic sequences.

Finally, an integrated framework bridging the gaps between intrinsic image, video and texture estimation is presented for the first time. Approximate scene geometry provides a convenient means of achieving this, and is used in establishing pixel constraints between adjacent cameras, reconstructing scene lighting, and removing cast shadows and inter-reflections. This introduces a unified geometry-based approach to intrinsic image estimation and related fields, which achieves high-quality results for complex natural scenes for a wide range of capture modalities.
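The decomposition running through all of these contributions is the intrinsic image model, which the thesis lifts from pixel space to surface space: observed appearance factors into albedo and shading,

    I(x) = A(x)\,S(x)
    \quad\Longleftrightarrow\quad
    \log I(x) = \log A(x) + \log S(x),

where, in intrinsic texture estimation, x indexes position on the recovered surface rather than an image pixel, and the approximate geometry supplies the high-level constraints on the shading term S.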

Addari Gianmarco, Guillemaut Jean-Yves (2019) An MRF Optimisation Framework for Full 3D Helmholtz Stereopsis, Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
Accurate 3D modelling of real world objects is essential in many applications such as digital film production and cultural heritage preservation. However, current modelling techniques rely on assumptions to constrain the problem, effectively limiting the categories of scenes that can be reconstructed. A common assumption is that the scene's surface reflectance is Lambertian or known a priori. These constraints rarely hold true in practice and result in inaccurate reconstructions. Helmholtz Stereopsis (HS) addresses this limitation by introducing a reflectance agnostic modelling constraint, but prior work in this area has been predominantly limited to 2.5D reconstruction, providing only a partial model of the scene. In contrast, this paper introduces the first Markov Random Field (MRF) optimisation framework for full 3D HS. First, an initial reconstruction is obtained by performing 2.5D MRF optimisation with visibility constraints from multiple viewpoints and fusing the different outputs. Then, a refined 3D model is obtained through volumetric MRF optimisation using a tailored Iterative Conditional Modes (ICM) algorithm. The proposed approach is evaluated with both synthetic and real data. Results show that the proposed full 3D optimisation significantly increases both geometric and normal accuracy, being able to achieve sub-millimetre precision. Furthermore, the approach is shown to be robust to occlusions and noise.
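Iterated Conditional Modes, the optimiser tailored in this work, is a coordinate-descent scheme for MRF energies: each site is repeatedly assigned the label minimising its local energy (data term plus pairwise terms against the current neighbour labels) until no label changes. A generic sketch on a 4-connected grid with a Potts smoothness term, not the paper's volumetric formulation:

    import numpy as np

    def icm(unary, smooth, iters=10):
        """Generic ICM on a 4-connected grid MRF.

        unary  : (H, W, L) per-site data costs for L labels.
        smooth : Potts penalty per neighbour with a different label.
        """
        labels = unary.argmin(axis=2)  # greedy initialisation
        H, W, L = unary.shape
        for _ in range(iters):
            changed = False
            for y in range(H):
                for x in range(W):
                    cost = unary[y, x].astype(float)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W:
                            cost += smooth * (np.arange(L) != labels[ny, nx])
                    best = int(cost.argmin())
                    changed |= best != labels[y, x]
                    labels[y, x] = best
            if not changed:
                break
        return labels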
Brown Mark, Windridge David, Guillemaut Jean-Yves (2019) A family of globally optimal branch-and-bound algorithms for 2D-3D correspondence-free registration, Pattern Recognition 93 pp. 36-54 Elsevier
We present a family of methods for 2D-3D registration spanning both deterministic and non-deterministic branch-and-bound approaches. Critically, the methods exhibit invariance to the underlying scene primitives, enabling e.g. points and lines to be treated on an equivalent basis, potentially enabling a broader range of problems to be tackled while maximising available scene information, all scene primitives being simultaneously considered. Being a branch-and-bound based approach, the method furthermore enjoys intrinsic guarantees of global optimality; while branch-and-bound approaches have been employed in a number of computer vision contexts, the proposed method represents the first time that this strategy has been applied to the 2D-3D correspondence-free registration problem from points and lines. Within the proposed procedure, deterministic and probabilistic procedures serve to speed up the nested branch-and-bound search while maintaining optimality. Experimental evaluation with synthetic and real data indicates that the proposed approach significantly increases both accuracy and robustness compared to the state of the art.
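The global-optimality guarantee of branch-and-bound comes from maintaining provable bounds over subdomains: a region of parameter space is pruned only once its lower bound exceeds the best cost found so far. A generic 1D sketch of the control flow (the paper's contribution lies in the 2D-3D registration bounds themselves, which this sketch does not attempt):

    import heapq

    def branch_and_bound(lower, upper, lo=0.0, hi=1.0, tol=1e-4):
        """Minimise a 1D cost given interval bound functions.

        lower(a, b) must underestimate the cost on [a, b];
        upper(a, b) must be an achievable cost on [a, b].
        """
        best = upper(lo, hi)
        heap = [(lower(lo, hi), lo, hi)]
        while heap:
            lb, a, b = heapq.heappop(heap)
            if lb > best or b - a < tol:
                continue                       # prune / converged interval
            m = 0.5 * (a + b)
            for l, r in ((a, m), (m, b)):      # branch
                best = min(best, upper(l, r))  # update incumbent
                if lower(l, r) <= best:
                    heapq.heappush(heap, (lower(l, r), l, r))
        return best

    # Hypothetical usage: minimise f(x) = (x - 0.3)^2 on [0, 1].
    f = lambda x: (x - 0.3) ** 2
    print(branch_and_bound(
        lambda a, b: 0.0 if a <= 0.3 <= b else min(f(a), f(b)),  # bound
        lambda a, b: f(0.5 * (a + b))))                          # feasible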
Spiteri Michaela, Guillemaut Jean-Yves, Windridge David, Avula Shivaram, Kumar Ram, Lewis Emma (2019) Fully-Automated Identification of Imaging Biomarkers for Post-Operative Cerebellar Mutism Syndrome Using Longitudinal Paediatric MRI, Neuroinformatics 18 (1) pp. 151-162 Springer US
Post-operative cerebellar mutism syndrome (POPCMS) in children is a post-surgical complication which occurs following the resection of tumors within the brain stem and cerebellum. High resolution brain magnetic resonance (MR) images acquired at multiple time points across a patient's treatment allow the quantification of localized changes caused by the progression of this syndrome. However, MR images are not necessarily acquired at regular intervals throughout treatment and are often not volumetric. This restricts the analysis to 2D space and causes difficulty in intra- and inter-subject comparison. To address these challenges, we have developed an automated image processing and analysis pipeline. Multi-slice 2D MR image slices are interpolated in space and time to produce a 4D volumetric MR image dataset providing a longitudinal representation of the cerebellum and brain stem at specific time points across treatment. The deformations within the brain over time are represented using a novel metric known as the Jacobian of deformations determinant. This metric, together with the changing grey-level intensity of areas within the brain over time, are analyzed using machine learning techniques in order to identify biomarkers that correspond with the development of POPCMS following tumor resection. This study makes use of a fully automated approach which is not hypothesis-driven. As a result, we were able to automatically detect six potential biomarkers that are related to the development of POPCMS following tumor resection in the posterior fossa.
Malleson Charles, Guillemaut Jean-Yves, Hilton Adrian (2019) 3D Reconstruction from RGB-D Data, In: Rosin Paul L., Lai Yu-Kun, Shao Ling, Liu Yonghuai (eds.), RGB-D Image Analysis and Processing pp. 87-115 Springer Nature Switzerland AG
A key task in computer vision is that of generating virtual 3D models of real-world scenes by reconstructing the shape, appearance and, in the case of dynamic scenes, motion of the scene from visual sensors. Recently, low-cost video plus depth (RGB-D) sensors have become widely available and have been applied to 3D reconstruction of both static and dynamic scenes. RGB-D sensors contain an active depth sensor, which provides a stream of depth maps alongside standard colour video. The low cost and ease of use of RGB-D devices as well as their video rate capture of images along with depth make them well suited to 3D reconstruction. Use of active depth capture overcomes some of the limitations of passive monocular or multiple-view video-based approaches since reliable, metrically accurate estimates of the scene depth at each pixel can be obtained from a single view, even in scenes that lack distinctive texture. There are two key components to 3D reconstruction from RGB-D data: (1) spatial alignment of the surface over time and (2) fusion of noisy, partial surface measurements into a more complete, consistent 3D model. In the case of static scenes, the sensor is typically moved around the scene and its pose is estimated over time. For dynamic scenes, there may be multiple rigid, articulated, or non-rigidly deforming surfaces to be tracked over time. The fusion component consists of integration of the aligned surface measurements, typically using an intermediate representation, such as the volumetric truncated signed distance field (TSDF). In this chapter, we discuss key recent approaches to 3D reconstruction from depth or RGB-D input, with an emphasis on real-time reconstruction of static scenes.
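Of the two components, alignment for a moving sensor in a static scene is classically handled with ICP-style registration; at its core sits the closed-form least-squares rigid transform between corresponded points (Arun et al. / Kabsch). A minimal sketch, with correspondences assumed given (in practice finding them is the hard part):

    import numpy as np

    def rigid_align(src, dst):
        """Least-squares rigid transform mapping src onto dst.

        src, dst : (N, 3) corresponding 3D points.
        Returns rotation R (3, 3) and translation t (3,).
        """
        mu_s, mu_d = src.mean(0), dst.mean(0)
        H = (src - mu_s).T @ (dst - mu_d)  # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                 # nearest proper rotation
        t = mu_d - R @ mu_s
        return R, t

    # Hypothetical usage: recover a known rotation about z plus shift.
    src = np.random.rand(100, 3)
    Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R, t = rigid_align(src, src @ Rz.T + 0.5)  # R ~ Rz, t ~ (0.5, 0.5, 0.5)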
Galkandage Chathura Vindana Perera, Calic Janko, Dogan Safak, Guillemaut Jean-Yves (2020) Full-Reference Stereoscopic Video Quality Assessment Using a Motion Sensitive HVS Model, IEEE Transactions on Circuits and Systems for Video Technology Institute of Electrical and Electronics Engineers
Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly based on stereoscopic image quality metrics extended to the time domain via, for example, temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metrics which uniquely combine non-motion sensitive and motion sensitive energy terms to mimic the response of the HVS. A tailored two-stage multi-variate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics including other recent HVS-based metrics.
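A minimal sketch of the kind of motion modulation described above (the gating function and weights are hypothetical, not the paper's tuned model): dense optical flow supplies a per-pixel speed which gates the energy response of an earlier model stage.

    import cv2
    import numpy as np

    def motion_modulated_energy(prev, curr, energy, sigma=5.0):
        """Modulate a per-pixel energy map by local motion amplitude.

        prev, curr : consecutive grayscale frames (uint8).
        energy     : per-pixel response of a (hypothetical) complex-cell
                     model stage.
        """
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        speed = np.linalg.norm(flow, axis=2)  # pixels per frame
        gain = np.exp(-speed / sigma)         # illustrative gating
        return gain * energy

    # Hypothetical usage on random frames.
    f0 = np.random.randint(0, 255, (120, 160), np.uint8)
    f1 = np.random.randint(0, 255, (120, 160), np.uint8)
    e = motion_modulated_energy(f0, f1, np.ones((120, 160)))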
Scarles Caroline, van Evan Suzanne, Klepacz Naomi, Guillemaut Jean-Yves, Humbracht Michael Bringing The Outdoors Indoors: Immersive Experiences of Recreation in Nature and Coastal Environments in Residential Care Homes, E-review of Tourism Research Texas A&M AgriLife
This paper critiques the opportunities afforded by immersive experience technology to create stimulating, innovative living environments for long-term residents of care homes for the elderly. We identify the ways in which virtual mobility can facilitate reconnection with recreational environments. Specifically, the project examines the potential of two assistive and immersive experiences: virtual reality (VR) and multisensory stimulation environments (MSSE). Findings identify three main areas of knowledge contribution. First, the introduction of VR and MSSE facilitated participants' re-engagement with, and sharing of, past experiences as they recalled past family holidays, day trips or everyday practices. Secondly, the combination of the VR and MSSE hardware with the physical objects of the sensory trays created alternative, multisensual ways of engaging with the experiences presented to participants. Lastly, the clear preference for the MSSE experience over the VR experience highlighted the importance of social interaction and exchange for participants.