Dr Jean-Yves Guillemaut


Senior Lecturer in 3D Computer Vision
MEng (hons), PhD, MIEEE, MBMVA, FHEA
+44 (0)1483 686042
32 BA 00

About

Areas of specialism

3D Computer Vision; 3D Reconstruction; Computational Photography; Virtual and Augmented Reality; Lightfield Imaging; 3D Video; Artificial Intelligence

University roles and responsibilities

  • Senior Lecturer in 3D Computer Vision
  • CVSSP Postgraduate Research Director
  • Department Prizes Officer
  • Professional Training Tutor
  • MSc Personal Tutor

    My qualifications

    2014
    Graduate Certificate in Learning and Teaching
    University of Surrey
    2005
    PhD degree in 3D Computer Vision
    University of Surrey
    2001
    MEng degree (first class honours) with specialisation in Automatic Control and Robotics
    Ecole Centrale de Nantes

    Previous roles

    2012 - 2018
    Lecturer in 3D Computer Vision
    University of Surrey
    2012 - 2017
    CVSSP External Seminar Organiser
    University of Surrey
    2005 - 2012
    Research Fellow
    University of Surrey

    Research

    Indicators of esteem

    • Best Poster Award at European Conference on Visual Media Production (CVMP 2016)

    • Best Student Paper Award at Int. Conference on Computer Vision Theory and Applications (VISAPP 2014)

    • University of Surrey Faculty of Engineering and Physical Sciences Researcher of the Year Award (2012)

    • Honorable Mention for the Best Paper Award at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2012

    • Best Poster Prize at EPSRC/BMVA Summer School on Computer Vision 2002

    Publications

    Matthew James Bailey, Adrian Hilton, Jean-Yves Guillemaut Finite Aperture Stereo Datasets, In: Finite Aperture Stereo CVSSP

    This landing page contains the datasets presented in the paper "Finite Aperture Stereo". The datasets are intended for defocus-based 3D reconstruction and analysis. Each download link contains images of a static scene, captured from multiple viewpoints and with different focus settings. The captured objects exhibit a range of reflectance properties and are physically small in scale. Calibration images are also available. A CC BY-NC licence is in effect. Use of this data must be for non-commercial research purposes. Acknowledgement must be given to the original authors by referencing the dataset DOI, the dataset web address, and the aforementioned publication. Re-distribution of this data is prohibited. Before downloading, you must agree with these conditions as presented on the dataset webpage.

    Matthew James Bailey, Adrian Douglas Mark Hilton, Jean-Yves Guillemaut (2022) Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes, In: Finite Aperture Stereo Institute of Electrical and Electronics Engineers (IEEE)

    While the accuracy of multi-view stereo (MVS) has continued to advance, its performance reconstructing challenging scenes from images with a limited depth of field is generally poor. Typical implementations assume a pinhole camera model, and therefore treat defocused regions as a source of outliers. In this paper, we address these limitations by instead modelling the camera as a thick lens. Doing so allows us to exploit the complementary nature of stereo and defocus information, and overcome constraints imposed by traditional MVS methods. Using our novel reconstruction framework, we recover complete 3D models of complex macro-scale scenes. Our approach demonstrates robustness to view-dependent materials, and outperforms state-of-the-art MVS and depth from defocus across a range of real and synthetic datasets.

    Gianmarco Addari, Jean-Yves Guillemaut (2019) Towards Globally Optimal full 3D reconstruction of scenes with complex reflectance using Helmholtz Stereopsis, In: CVMP '19: Proceedings of the 16th ACM SIGGRAPH European Conference on Visual Media Production 8, pp. 1-10, Association for Computing Machinery (ACM)

    Many 3D reconstruction techniques are based on the assumption of prior knowledge of the object's surface reflectance, which severely restricts the scope of scenes that can be reconstructed. In contrast, Helmholtz Stereopsis (HS) employs Helmholtz Reciprocity to compute the scene geometry regardless of its Bidirectional Reflectance Distribution Function (BRDF). Despite this advantage, most HS implementations to date have been limited to 2.5D reconstruction, with the few extensions to full 3D being generally limited to a local refinement due to the nature of the optimisers they rely on. In this paper, we propose a novel approach to full 3D HS based on Markov Random Field (MRF) optimisation. After defining a solution space that contains the surface of the object, the energy function to be minimised is computed based on the HS quality measure and a normal consistency term computed across neighbouring surface points. This new method offers several key advantages with respect to previous work: the optimisation is performed globally instead of locally; a more discriminative energy function is used, allowing for better and faster convergence; a novel visibility handling approach to take advantage of Helmholtz reciprocity is proposed; and surface integration is performed implicitly as part of the optimisation process, thereby avoiding the need for an additional step. The approach is evaluated on both synthetic and real scenes, with an analysis of the sensitivity to input noise performed in the synthetic case. Accurate results are obtained on both types of scenes. Further, experimental results indicate that the proposed approach significantly outperforms previous work in terms of geometric and normal accuracy.

    Matthew James Bailey, Adrian Douglas Mark Hilton, Jean-Yves Guillemaut (2022) Finite Aperture Stereo, In: Finite Aperture Stereo Datasets Springer Nature

    Multi-view stereo remains a popular choice when recovering 3D geometry, despite performance varying dramatically according to the scene content. Moreover, typical pinhole camera assumptions fail in the presence of the shallow depth of field inherent to macro-scale scenes, limiting application to larger scenes with diffuse reflectance. However, the presence of defocus blur can itself be considered a useful reconstruction cue, particularly in the presence of view-dependent materials. With this in mind, we explore the complementary nature of stereo and defocus cues in the context of multi-view 3D reconstruction, and propose a complete pipeline for scene modelling from a finite aperture camera that encompasses image formation, camera calibration and reconstruction stages. As part of our evaluation, an ablation study reveals how each cue contributes to the higher performance observed over a range of complex materials and geometries. Though of lesser concern with large apertures, the effects of image noise are also considered. By introducing pre-trained deep feature extraction into our cost function, we show a step improvement over per-pixel comparisons, and verify the cross-domain applicability of networks using largely in-focus training data applied to defocused images. Finally, we compare to a number of modern multi-view stereo methods, and demonstrate how the use of both cues leads to a significant increase in performance across several synthetic and real datasets.

    Jean-Yves Guillemaut, J Kilner, J Starck, Adrian Hilton (2007) Dynamic feathering: Minimising blending artefacts in view-dependent rendering, In: IET Conference Publications 534(534 CP)

    Conventional view-dependent texture mapping techniques produce composite images by blending subsets of input images, weighted according to their relative influence at the rendering viewpoint, over regions where the views overlap. Geometric or camera calibration errors often result in a loss of detail due to blurring or double exposure artefacts, which tends to be exacerbated by the number of blending views considered. We propose a novel view-dependent rendering technique which optimises the blend region dynamically at rendering time, and reduces the adverse effects of camera calibration or geometric errors otherwise observed. The technique has been successfully integrated in a rendering pipeline which operates at interactive frame rates. Improvements over state-of-the-art view-dependent texture mapping techniques are illustrated on a synthetic scene as well as real imagery of a large scale outdoor scene where large camera calibration and geometric errors are present.

    Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2015) General Dynamic Scene Reconstruction from Multiple View Video, In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 900-908, IEEE

    This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.

    JJ Kilner, J Starck, A Hilton, JY Guillemaut, O Grau (2007) Dual Mode Deformable Models for Free-Viewpoint Video of Outdoor Sports Events, In: IEEE Int. Conf. on 3D Imaging and Modeling, pp. 177-184
    JJ Kilner, J-Y Guillemaut, A Hilton (2009) Summarised Hierarchical Markov Models for Speed Invariant Action Matching, In: ICCV Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences, pp. 1065-1072

    Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments, and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However, due to poor calibration and matting, this reconstruction is of a very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games.

    Michaela Spiteri, Jean-Yves Guillemaut, David Windridge, Shivaram Avula, Ram Kumar, Emma Lewis (2019) Fully-Automated Identification of Imaging Biomarkers for Post-Operative Cerebellar Mutism Syndrome Using Longitudinal Paediatric MRI, In: Neuroinformatics 18(1), pp. 151-162, Springer US

    Post-operative cerebellar mutism syndrome (POPCMS) in children is a post-surgical complication which occurs following the resection of tumors within the brain stem and cerebellum. High resolution brain magnetic resonance (MR) images acquired at multiple time points across a patient’s treatment allow the quantification of localized changes caused by the progression of this syndrome. However, MR images are not necessarily acquired at regular intervals throughout treatment and are often not volumetric. This restricts the analysis to 2D space and causes difficulty in intra- and inter-subject comparison. To address these challenges, we have developed an automated image processing and analysis pipeline. Multi-slice 2D MR image slices are interpolated in space and time to produce a 4D volumetric MR image dataset providing a longitudinal representation of the cerebellum and brain stem at specific time points across treatment. The deformations within the brain over time are represented using a novel metric known as the Jacobian of deformations determinant. This metric, together with the changing grey-level intensity of areas within the brain over time, is analyzed using machine learning techniques in order to identify biomarkers that correspond with the development of POPCMS following tumor resection. This study makes use of a fully automated approach which is not hypothesis-driven. As a result, we were able to automatically detect six potential biomarkers that are related to the development of POPCMS following tumor resection in the posterior fossa.

    M Fastovets, J-Y Guillemaut, A Hilton (2014) Athlete pose estimation by non-sequential key-frame propagation, In: P Hall, JP Collomosse, D Cosker (eds.), CVMP, pp. 3:1-3:1
    J-Y Guillemaut, J Kilner, A Hilton (2009) Robust Graph-Cut Scene Segmentation and Reconstruction for Free-Viewpoint Video of Complex Dynamic Scenes, In: 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp. 809-816
    M Brown, D Windridge, JY Guillemaut (2015) A generalisable framework for saliency-based line segment detection, In: Pattern Recognition 48(12), pp. 3993-4011, Elsevier

    Here we present a novel, information-theoretic salient line segment detector. Existing line detectors typically only use the image gradient to search for potential lines. Consequently, many lines are found, particularly in repetitive scenes. In contrast, our approach detects lines that define regions of significant divergence between pixel intensity or colour statistics. This results in a novel detector that naturally avoids the repetitive parts of a scene while detecting the strong, discriminative lines present. We furthermore use our approach as a saliency filter on existing line detectors to more efficiently detect salient line segments. The approach is highly generalisable, depending only on image statistics rather than image gradient; and this is demonstrated by an extension to depth imagery. Our work is evaluated against a number of other line detectors and a quantitative evaluation demonstrates a significant improvement over existing line detectors for a range of image transformations.

    C Malleson, M Klaudiny, A Hilton, J-Y Guillemaut (2013) Single-view RGBD-based reconstruction of dynamic human geometry, In: Proceedings of the IEEE International Conference on Computer Vision - Workshop on Dynamic Shape Capture and Analysis (4DMOD 2013), pp. 307-314

    We present a method for reconstructing the geometry and appearance of indoor scenes containing dynamic human subjects using a single (optionally moving) RGBD sensor. We introduce a framework for building a representation of the articulated scene geometry as a set of piecewise rigid parts which are tracked and accumulated over time using moving voxel grids containing a signed distance representation. Data association of noisy depth measurements with body parts is achieved by online training of a prior shape model for the specific subject. A novel frame-to-frame model registration is introduced which combines iterative closest-point with additional correspondences from optical flow and prior pose constraints from noisy skeletal tracking data. We quantitatively evaluate the reconstruction and tracking performance of the approach using a synthetic animated scene. We demonstrate that the approach is capable of reconstructing mid-resolution surface models of people from low-resolution noisy data acquired from a consumer RGBD camera.

    M Sarim, A Hilton, Jean-Yves Guillemaut, T Takai, Hansung Kim (2010) Natural image matting for multiple wide-baseline views, In: Proceedings of 17th IEEE International Conference on Image Processing (ICIP), pp. 2233-2236

    In this paper we present a novel approach to estimate the alpha mattes of a foreground object captured by a wide-baseline circular camera rig, given a single key-frame trimap. Bayesian inference coupled with camera calibration information is used to propagate high-confidence trimap labels across the views. Recent techniques have been developed to estimate an alpha matte of an image using multiple views, but they are limited to narrow-baseline views with low foreground variation. The proposed wide-baseline trimap propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance for cameras with opposing views, enabling high quality alpha matte extraction using any state-of-the-art image matting algorithm.

    SK Hall, TH Williamson, Jean-Yves Guillemaut, T Goddard, AP Baumann, JC Hutter (2017) Modeling the Dynamics of Tamponade Multicomponent Gases During Retina Reattachment Surgery, In: AIChE Journal 63(9), pp. 3651-3662, Wiley, for American Institute of Chemical Engineers

    Vitrectomy and pneumatic retinopexy are common surgical procedures used to treat retinal detachment. To reattach the retina, gases are used to inflate the vitreous space allowing the retina to attach by surface tension and buoyancy forces that are superior to the location of the bubble. These procedures require the injection of either a pure tamponade gas, such as C3F8 or SF6, or mixtures of these gases with air. The location of the retinal detachment, the anatomical spread of the retinal defect, and the length of time the defect has persisted, will determine the suggested volume and duration of the gas bubble to allow reattachment. After inflation, the gases are slowly absorbed by the blood allowing the vitreous to be refilled by aqueous. We have developed a model of the mass transfer dynamics of tamponade gases during pneumatic retinopexy or pars plana vitrectomy procedures. The model predicts the expansion and persistence of intraocular gases (C3F8, SF6), oxygen, nitrogen, and carbon dioxide, as well as the intraocular pressure. The model was validated using published literature in rabbits and humans. In addition to correlating the mass transfer dynamics by surface area, permeability, and partial pressure driving forces, the mass transfer dynamics are affected by the percentage of the tamponade gases. Rates were also correlated with the physical properties of the tamponade and blood gases. The model gave accurate predictions in humans.

    Caroline Scarles, Suzanne van Evan, Naomi Klepacz, Jean-Yves Guillemaut, Michael Humbracht (2020) Bringing The Outdoors Indoors: Immersive Experiences of Recreation in Nature and Coastal Environments in Residential Care Homes, In: E-review of Tourism Research Texas A&M AgriLife

    This paper critiques the opportunities afforded by immersive experience technology to create stimulating, innovative living environments for long-term residents of care homes for the elderly. We identify the ways in which virtual mobility can facilitate reconnection with recreational environments. Specifically, the project examines the potential of two assistive and immersive experiences: virtual reality (VR) and multisensory stimulation environments (MSSE). Findings identify three main areas of knowledge contribution. First, the introduction of VR and MSSE facilitated participants' re-engagement with and sharing of past experiences as they recalled past family holidays, day trips or everyday practices. Secondly, the combination of the hardware of the VR and MSSE technology with the physical objects of the sensory trays created alternative, multisensual ways of engaging with the experiences presented to participants. Lastly, the clear preference for the MSSE experience over the VR experience highlighted the importance of social interaction and exchange for participants.

    Nadejda Roubtsova, Jean-Yves Guillemaut (2017) Bayesian Helmholtz Stereopsis with Integrability Prior, In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40(9), pp. 2265-2272, Institute of Electrical and Electronics Engineers (IEEE)

    Helmholtz Stereopsis is a 3D reconstruction method uniquely independent of surface reflectance. Yet, its sub-optimal maximum likelihood formulation with drift-prone normal integration limits performance. Via three contributions this paper presents a complete novel pipeline for Helmholtz Stereopsis. Firstly, we propose a Bayesian formulation replacing the maximum likelihood problem by a maximum a posteriori one. Secondly, a tailored prior enforcing consistency between depth and normal estimates via a novel metric related to optimal surface integrability is proposed. Thirdly, explicit surface integration is eliminated by taking advantage of the accuracy of prior and high resolution of the coarse-to-fine approach. The pipeline is validated quantitatively and qualitatively against alternative formulations, reaching sub-millimetre accuracy and coping with complex geometry and reflectance.

    D Casas, M Tejera, Jean-Yves Guillemaut, A Hilton (2013) Interactive Animation of 4D Performance Capture, In: IEEE Trans. Vis. Comput. Graph. 19(5), pp. 762-773
    M Brown, D Windridge, J Guillemaut (2015) Globally Optimal 2D-3D Registration from Points or Lines Without Correspondences, In: Proceedings of International Conference on Computer Vision (ICCV 2015)

    We present a novel approach to 2D-3D registration from points or lines without correspondences. While there exist established solutions in the case where correspondences are known, there are many situations where it is not possible to reliably extract such correspondences across modalities, thus requiring the use of a correspondence-free registration algorithm. Existing correspondence-free methods rely on local search strategies and consequently have no guarantee of finding the optimal solution. In contrast, we present the first globally optimal approach to 2D-3D registration without correspondences, achieved by a Branch-and-Bound algorithm. Furthermore, a deterministic annealing procedure is proposed to speed up the nested branch-and-bound algorithm used. The theoretical and practical advantages this brings are demonstrated on a range of synthetic and real data where it is observed that the proposed approach is significantly more robust to high proportions of outliers compared to existing approaches.

    Nadejda Roubtsova, Jean-Yves Guillemaut (2020) Helmholtz Stereopsis Synthetic Dataset, University of Surrey
    Nadejda Roubtsova, Jean-Yves Guillemaut (2016) Colour Helmholtz Stereopsis for Reconstruction of Dynamic Scenes with Arbitrary Unknown Reflectance, In: International Journal of Computer Vision 124, pp. 18-48, Springer

    Helmholtz Stereopsis is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose Colour Helmholtz Stereopsis - a novel framework for Helmholtz Stereopsis based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed Colour Helmholtz Stereopsis pipeline uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for spatio-temporal surface chromaticity calibration and a state-of-the-art Bayesian formulation necessary for accurate reconstruction from a minimal number of reciprocal pairs. In this framework, reflectance is spatially unconstrained both in terms of its chromaticity and the directional component dependent on the illumination incidence and viewing angles. The proposed approach for the first time enables modelling of dynamic scenes with arbitrary unknown and spatially varying reflectance using a practical acquisition set-up consisting of a small number of cameras and light sources. Experimental results demonstrate the accuracy and flexibility of the technique on a variety of static and dynamic scenes with arbitrary unknown BRDF and chromaticity ranging from uniform to arbitrary and spatially varying.

    A Neophytou, J-Y Guillemaut, A Hilton (2015) A dense surface motion capture system for accurate acquisition of cloth deformation, In: CVMP 2015: Proceedings of the 12th European Conference on Visual Media Production
    J Kilner, J-Y Guillemaut, A Hilton (2010) 3D action matching with key-pose detection, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 1-8

    This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera; however, this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event.

    Gianmarco Addari, Jean-Yves Guillemaut (2019) An MRF Optimisation Framework for Full 3D Helmholtz Stereopsis, In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Institute for Systems and Technologies of Information, Control and Communication (INSTICC)

    Accurate 3D modelling of real world objects is essential in many applications such as digital film production and cultural heritage preservation. However, current modelling techniques rely on assumptions to constrain the problem, effectively limiting the categories of scenes that can be reconstructed. A common assumption is that the scene’s surface reflectance is Lambertian or known a priori. These constraints rarely hold true in practice and result in inaccurate reconstructions. Helmholtz Stereopsis (HS) addresses this limitation by introducing a reflectance agnostic modelling constraint, but prior work in this area has been predominantly limited to 2.5D reconstruction, providing only a partial model of the scene. In contrast, this paper introduces the first Markov Random Field (MRF) optimisation framework for full 3D HS. First, an initial reconstruction is obtained by performing 2.5D MRF optimisation with visibility constraints from multiple viewpoints and fusing the different outputs. Then, a refined 3D model is obtained through volumetric MRF optimisation using a tailored Iterative Conditional Modes (ICM) algorithm. The proposed approach is evaluated with both synthetic and real data. Results show that the proposed full 3D optimisation significantly increases both geometric and normal accuracy, being able to achieve sub-millimetre precision. Furthermore, the approach is shown to be robust to occlusions and noise.

    HE Imre, J-Y Guillemaut, ADM Hilton (2012) Moving Camera Registration for Multiple Camera Setups in Dynamic Scenes, In: Proceedings of the 21st British Machine Vision Conference

    Many practical applications require an accurate knowledge of the extrinsic calibration (i.e., pose) of a moving camera. The existing SLAM and structure-from-motion solutions are not robust to scenes with large dynamic objects, and do not fully utilize the available information in the presence of static cameras, a common practical scenario. In this paper, we propose an algorithm that addresses both of these issues for a hybrid static-moving camera setup. The algorithm uses the static cameras to build a sparse 3D model of the scene, with respect to which the pose of the moving camera is estimated at each time instant. The performance of the algorithm is studied through extensive experiments that cover a wide range of applications, and is shown to be satisfactory.

    D Casas, M Tejera, J-Y Guillemaut, A Hilton (2012) Parametric animation of performance-captured mesh sequences, In: Computer Animation and Virtual Worlds 23(2), pp. 101-111, Wiley-Blackwell
    J Kilner, J Starck, Jean-Yves Guillemaut, A Hilton (2009) Objective Quality Assessment in Free-viewpoint Video Production, In: Signal Processing: Image Communication 24(1-2), pp. 3-16, Elsevier
    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2019) 3D Reconstruction from RGB-D Data, In: Paul L. Rosin, Yu-Kun Lai, Ling Shao, Yonghuai Liu (eds.), RGB-D Image Analysis and Processing, pp. 87-115, Springer Nature Switzerland AG

    A key task in computer vision is that of generating virtual 3D models of real-world scenes by reconstructing the shape, appearance and, in the case of dynamic scenes, motion of the scene from visual sensors. Recently, low-cost video plus depth (RGB-D) sensors have become widely available and have been applied to 3D reconstruction of both static and dynamic scenes. RGB-D sensors contain an active depth sensor, which provides a stream of depth maps alongside standard colour video. The low cost and ease of use of RGB-D devices as well as their video rate capture of images along with depth make them well suited to 3D reconstruction. Use of active depth capture overcomes some of the limitations of passive monocular or multiple-view video-based approaches since reliable, metrically accurate estimates of the scene depth at each pixel can be obtained from a single view, even in scenes that lack distinctive texture. There are two key components to 3D reconstruction from RGB-D data: (1) spatial alignment of the surface over time and, (2) fusion of noisy, partial surface measurements into a more complete, consistent 3D model. In the case of static scenes, the sensor is typically moved around the scene and its pose is estimated over time. For dynamic scenes, there may be multiple rigid, articulated, or non-rigidly deforming surfaces to be tracked over time. The fusion component consists of integration of the aligned surface measurements, typically using an intermediate representation, such as the volumetric truncated signed distance field (TSDF). In this chapter, we discuss key recent approaches to 3D reconstruction from depth or RGB-D input, with an emphasis on real-time reconstruction of static scenes.

    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2004)Real-time scene reconstruction for remote vehicle navigation, In: Geometric Modeling and Computing: Seattle 2003pp. 113-123
    Tom H. Williamson, Jean-Yves Guillemaut, Sheldon K. Hall, Joseph C. Hutter, Tony Goddard (2018)Theoretical gas concentrations achieving 100% fill of the vitreous cavity in the postoperative period, a gas eye model study (GEMS), In: RETINA, The Journal of Retinal and Vitreous Diseases38pp. S60-S64 Lippincott, Williams & Wilkins

    Precis. A mathematical model is described of the physical properties of intraocular gases providing a guide to the correct gas concentrations to achieve 100% fill of the vitreous cavity postoperatively. A table for the instruction of surgeons is provided and the effects of different axial lengths examined. ABSTRACT Purpose – To determine the concentrations of different gas tamponades in air to achieve 100% fill of the vitreous cavity postoperatively and to examine the influence of eye volume on these concentrations. Methods – A mathematical model of the mass transfer dynamics of tamponade and blood gases (O2, N2, CO2) when injected into the eye was used. Mass transfer surface areas were calculated from published anatomical data. The model has been calibrated from published volumetric decay and composition results for three gases: sulphur hexafluoride (SF6), hexafluoroethane (C2F6) and perfluoropropane (C3F8). The concentrations of these gases (in air) required to achieve 100% fill of the vitreous cavity postoperatively without an intra-ocular pressure rise were determined. The concentrations were calculated for three volumes of the vitreous cavity to test if ocular size influenced the results. Results – A table of gas concentrations was produced. In a simulation of pars plana vitrectomy operations in which an 80% to 85% fill of the vitreous cavity with gas was achieved at surgery, the concentrations of the three gases in air to achieve 100% fill postoperatively were 10-13% for C3F8, 12-15% for C2F6 and 19-25% for SF6. These were similar to the so-called "non-expansive" concentrations used in the clinical setting. The calculations were repeated for three different sizes of eye. Aiming for an 80% fill at surgery and 100% postoperatively, an eye with a 4 ml vitreous cavity required 24% SF6, 15% C2F6 or 13% C3F8; 7.2 ml required 25% SF6, 15% C2F6 or 13% C3F8; and 10 ml required 25% SF6, 16% C2F6 or 13% C3F8. When using 100% gas (for example, employed in pneumatic retinopexy), in order to achieve 100% fill postoperatively, the minimum vitreous cavity fill at surgery was 43% for SF6, 29% for C2F6 and 25% for C3F8 and was only minimally changed by variation in the size of the eye. Conclusions – A table has been produced which could be used for surgical innovation in gas usage in the vitreous cavity. It provides concentrations for different percentage fills, which will achieve a moment post-operatively with a full fill of the cavity without a pressure rise. Variation in axial length and size of the eye does not appear to alter the values in the table significantly. Those using pneumatic retinopexy need to increase the volume of gas injected with increased size of the eye in order to match the percentage fill of the vitreous cavity recommended for a given tamponade agent.

    We propose a multi-view framework for joint object detection and labelling based on pairs of images. The proposed framework extends the single-view Mask R-CNN approach to multiple views without need for additional training. Dedicated components are embedded into the framework to match objects across views by enforcing epipolar constraints, appearance feature similarity and class coherence. The multi-view extension enables the proposed framework to detect objects which would otherwise be mis-detected in a classical Mask R-CNN approach, and achieves coherent object labelling across views. By avoiding the need for additional training, the approach effectively overcomes the current shortage of multi-view datasets. The proposed framework achieves high quality results on a range of complex scenes, being able to output class, bounding box, mask and an additional label enforcing coherence across views. In the evaluation, we show qualitative and quantitative results on several challenging outdoor multi-view datasets and perform a comprehensive comparison to verify the advantages of the proposed method.

    J-Y Guillemaut, O Drbohlav, J Illingworth, R Sara (2008)A maximum likelihood surface normal estimation algorithm for Helmholtz stereopsis, In: VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2pp. 352-359
    J Imber, J-Y Guillemaut, A Hilton (2014)Intrinsic textures for relightable free-viewpoint video, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)8690 L(PART 2)pp. 392-407

    This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing.

    J-Y Guillemaut, A Hilton (2012)Space-Time Joint Multi-Layer Segmentation and Depth Estimation, In: SECOND JOINT 3DIM/3DPVT CONFERENCE: 3D IMAGING, MODELING, PROCESSING, VISUALIZATION & TRANSMISSION (3DIMPVT 2012)pp. 440-447
    Mark Brown, David Windridge, Jean-Yves Guillemaut (2019)A family of globally optimal branch-and-bound algorithms for 2D–3D correspondence-free registration, In: Pattern Recognition93pp. 36-54 Elsevier

    We present a family of methods for 2D–3D registration spanning both deterministic and non-deterministic branch-and-bound approaches. Critically, the methods exhibit invariance to the underlying scene primitives, enabling e.g. points and lines to be treated on an equivalent basis, potentially enabling a broader range of problems to be tackled while maximising available scene information, all scene primitives being simultaneously considered. Being a branch-and-bound based approach, the method furthermore enjoys intrinsic guarantees of global optimality; while branch-and-bound approaches have been employed in a number of computer vision contexts, the proposed method represents the first time that this strategy has been applied to the 2D–3D correspondence-free registration problem from points and lines. Within the proposed procedure, deterministic and probabilistic procedures serve to speed up the nested branch-and-bound search while maintaining optimality. Experimental evaluation with synthetic and real data indicates that the proposed approach significantly increases both accuracy and robustness compared to the state of the art.

    Matthew Bailey, Jean-Yves Guillemaut (2020)A Novel Depth from Defocus Framework Based on a Thick Lens Camera Model, In: 2020 International Conference on 3D Vision (3DV)pp. 1206-1215 IEEE

    Reconstruction approaches based on monocular defocus analysis such as Depth from Defocus (DFD) often utilise the thin lens camera model. Despite this widespread adoption, there are inherent limitations associated with it. Coupled with invalid parameterisation commonplace in literature, the overly-simplified image formation it describes leads to inaccurate defocus modelling; especially in macro-scale scenes. As a result, DFD reconstructions based around this model are not geometrically consistent, and are typically restricted to single-view applications. Subsequently, the handful of existing approaches which attempt to include additional viewpoints have had only limited success. In this work, we address these issues by instead utilising a thick lens camera model, and propose a novel calibration procedure to accurately parameterise it. The effectiveness of our model and calibration is demonstrated with a novel DFD reconstruction framework. We achieve highly detailed, geometrically accurate and complete 3D models of real-world scenes from multi-view focal stacks. To our knowledge, this is the first time DFD has been successfully applied to complete scene modelling in this way.

    M Sarim, JY Guillemaut, H Kim, A Hilton (2009)Wide-baseline Image Matting, In: European Conference on Visual Media Production(CVMP)
    Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2020)Temporally coherent general dynamic scene reconstruction, In: International Journal of Computer Vision Springer

    Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. Contributions of the work are: an automatic method for initial coarse reconstruction to initialize joint estimation; sparse-to-dense temporal correspondence integrated with joint multi-view segmentation and reconstruction to introduce temporal coherence; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes by introducing shape constraint. Comparison with state-of-the-art approaches on a variety of complex indoor and outdoor scenes demonstrates improved accuracy in both multi-view segmentation and dense reconstruction. This paper demonstrates unsupervised reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction and its application to various applications such as free-view rendering and virtual reality.

    M Sarim, A Hilton, Jean-Yves Guillemaut (2009)WIDE-BASELINE MATTE PROPAGATION FOR INDOOR SCENES, In: 2009 CONFERENCE FOR VISUAL MEDIA PRODUCTION: CVMP 2009pp. 195-204

    This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with viewpoint and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.

    Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton (2018)4D Temporally Coherent Light-field Video, In: 3DV 2017 Proceedings IEEE

    Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent light-field video representation. A novel method to obtain Epipolar Plane Images (EPIs) from a sparse light-field camera array is proposed. EPIs are used to constrain scene flow estimation to obtain 4D temporally coherent representations of dynamic light-fields. Temporal coherence is achieved on a variety of light-field datasets. Evaluation of the proposed light-field scene flow against existing multiview dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.

    J-Y Guillemaut, J Kittler, MT Sadeghi, WJ Christmas (2006)General pose face recognition using frontal face model, In: PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS4225pp. 79-88
    M Klaudiny, M Tejera, C Malleson, J-Y Guillemaut, A Hilton (2020)SCENE Digital Cinema Datasets University of Surrey
    JY Guillemaut, A Hilton, J Starck, JJ Kilner, O Grau (2007)A Bayesian Framework for Simultaneous Reconstruction and Matting, In: IEEE Int. Conf. on 3D Imaging and Modelingpp. 167-176
    ADM Hilton, Jean-Yves Guillemaut, JJ Kilner, O Grau, G Thomas (2011)3D-TV Production from Conventional Cameras for Sports Broadcast, In: IEEE Transactions Broadcasting57(2)pp. 462-476 IEEE

    3DTV production of live sports events presents a challenging problem involving conflicting requirements of maintaining broadcast stereo picture quality with practical problems in developing robust systems for cost effective deployment. In this paper we propose an alternative approach to stereo production in sports events using the conventional monocular broadcast cameras for 3D reconstruction of the event and subsequent stereo rendering. This approach has the potential advantage over stereo camera rigs of recovering full scene depth, allowing inter-ocular distance and convergence to be adapted according to the requirements of the target display and enabling stereo coverage from both existing and ‘virtual’ camera positions without additional cameras. A prototype system is presented with results of sports TV production trials for rendering of stereo and free-viewpoint video sequences of soccer and rugby.

    Conventional stereoscopic video content production requires use of dedicated stereo camera rigs, which are both costly and lack video editing flexibility. In this paper, we propose a novel approach which only requires a small number of standard cameras sparsely located around a scene to automatically convert the monocular inputs into stereoscopic streams. The approach combines a probabilistic spatio-temporal segmentation framework with a state-of-the-art multi-view graph-cut reconstruction algorithm, thus providing full control of the stereoscopic settings at render time. Results with studio sequences of complex human motion demonstrate the suitability of the method for high quality stereoscopic content generation with minimum user interaction.

    M Fastovets, J-Y Guillemaut, A Hilton (2013)Athlete Pose Estimation from Monocular TV Sports Footage, In: 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)pp. 1048-1054
    A Mustafa, H Kim, J-Y Guillemaut, ADM Hilton (2016)Temporally coherent 4D reconstruction of complex dynamic scenes, In: CVPR 2016 Proceedings

    This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction.

    Mark Brown, David Windridge, Jean-Yves Guillemaut (2016)A Generalised Framework for Saliency-Based Point Feature Detection, In: Computer Vision and Image Understanding157pp. 117-137 Elsevier

    Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By contrast, our histogram-based approach is highly generalisable and as such, may be meaningfully applied between 2D and 3D data. Using the generalised approach, we propose salient point detectors for images, and both untextured and textured 3D data. The approach naturally allows for the detection of salient 3D points based jointly on both the geometry and texture of the scene, allowing for broader applicability. The repeatability of the feature detectors is evaluated using a range of datasets including image and LiDAR input from indoor and outdoor scenes. Experimental results demonstrate a significant improvement in terms of 2D-2D and 2D-3D repeatability compared to existing multi-modal feature detectors.

    M Sarim, A Hilton, J Guillemaut (2009)Non-parametric patch based video matting

    In computer vision, matting is the process of accurate foreground estimation in images and videos. In this paper we present a novel patch based approach to video matting relying on non-parametric statistics to represent image variations in appearance. This overcomes the limitation of parametric algorithms which only rely on strong colour correlation between nearby pixels. Initially we construct a clean background by utilising the foreground object’s movement across the background. For a given frame, a trimap is constructed using the background and the last frame’s trimap. A patch-based approach is used to estimate the foreground colour for every unknown pixel and finally the alpha matte is extracted. Quantitative evaluation shows that the technique performs better, in terms of the accuracy and the required user interaction, than the current state-of-the-art parametric approaches.

    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2003)Remote vehicle manoeuvring using augmented realitypp. 186-189
    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2018)Hybrid modelling of non-rigid scenes from RGBD cameras, In: IEEE Transactions on Circuits and Systems for Video Technology IEEE

    Recent advances in sensor technology have introduced low-cost RGB video plus depth sensors, such as the Kinect, which enable simultaneous acquisition of colour and depth images at video rates. This paper introduces a framework for representation of general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel graph surface segmentation and modelling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are (1) extension of a prior piecewise surfel graph modelling approach for improved accuracy and completeness, (2) combination of this surfel graph modelling with TSDF surface fusion to generate dense geometry, and (3) proposal of means for validation of the reconstructed 4D scene model against the input data and efficient storage of any unmodelled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be efficiently represented with temporally consistent structure and enhanced levels of detail and completeness where possible, but gracefully falls back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously require more complex capture setups or manual processing.

    Chathura Vindana Perera Galkandage, Janko Calic, Safak Dogan, Jean-Yves Guillemaut (2020)Full-Reference Stereoscopic Video Quality Assessment Using a Motion Sensitive HVS Model, In: IEEE Transactions on Circuits and Systems for Video Technology Institute of Electrical and Electronics Engineers

    Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly based on stereoscopic image quality metrics extended to the time domain via, for example, temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metrics which uniquely combine non-motion sensitive and motion sensitive energy terms to mimic the response of the HVS. A tailored two-stage multi-variate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics including other recent HVS-based metrics.

    JY Guillemaut, AS Aguado, J Illingworth (2005)Using points at infinity for parameter decoupling in camera calibration, In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE27(2)pp. 265-270 IEEE COMPUTER SOC
    Chathura Galkandage, Janko Calic, S Dogan, Jean-Yves Guillemaut (2017)Stereoscopic Video Quality Assessment Using Binocular Energy, In: Journal of Selected Topics in Signal Processing11(1)pp. 102-112 IEEE

    Stereoscopic imaging is becoming increasingly popular. However, to ensure the best quality of experience, there is a need to develop more robust and accurate objective metrics for stereoscopic content quality assessment. Existing stereoscopic image and video metrics are either extensions of conventional 2D metrics (with added depth or disparity information) or are based on relatively simple perceptual models. Consequently, they tend to lack the accuracy and robustness required for stereoscopic content quality assessment. This paper introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. The proposed approach is based on the following three contributions. First, it introduces a novel HVS model extending previous models to include the phenomena of binocular suppression and recurrent excitation. Second, an image quality metric based on the novel HVS model is proposed. Finally, an optimised temporal pooling strategy is introduced to extend the metric to the video domain. Both image and video quality metrics are obtained via a training procedure to establish a relationship between subjective scores and objective measures of the HVS model. The metrics are evaluated using publicly available stereoscopic image/video databases as well as a new stereoscopic video database. An extensive experimental evaluation demonstrates the robustness of the proposed quality metrics. This indicates a considerable improvement with respect to the state-of-the-art with average correlations with subjective scores of 0.86 for the proposed stereoscopic image metric and 0.89 and 0.91 for the proposed stereoscopic video metrics.

    M Sarim, A Hilton, JY Guillemaut (2009)Non-parametric Patch Based Video Matting, In: British Machine Vision Conference (BMVC)
    T Wang, J Guillemaut, J Collomosse (2010)Multi-label Propagation for Coherent Video Segmentation and Artistic Stylization, In: Proceedings of Intl. Conf. on Image Proc. (ICIP)pp. 3005-3008

    We present a new algorithm for segmenting video frames into temporally stable colored regions, applying our technique to create artistic stylizations (e.g. cartoons and paintings) from real video sequences. Our approach is based on a multilabel graph cut applied to successive frames, in which the color data term and label priors are incrementally updated and propagated over time. We demonstrate coherent segmentation and stylization over a variety of home videos.

    JY Guillemaut, O Drbohlav, R Sara, J Illingworth (2004)Helmholtz stereopsis on rough and strongly textured surfaces, In: Y Aloimonos, G Taubin (eds.), 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGSpp. 10-17

    Helmholtz Stereopsis (HS) has recently been explored as a promising technique for capturing shape of objects with unknown reflectance. So far, it has been widely applied to objects of smooth geometry and piecewise uniform Bidirectional Reflectance Distribution Function (BRDF). Moreover, for nonconvex surfaces the inter-reflection effects have been completely neglected. We extend the method to surfaces which exhibit strong texture, nontrivial geometry and are possibly nonconvex. The problem associated with these surface features is that Helmholtz reciprocity is apparently violated when point-based measurements are used independently to establish the matching constraint as in the standard HS implementation. We argue that the problem is avoided by computing radiance measurements on image regions corresponding exactly to projections of the same surface point neighbourhood with appropriate scale. The experimental results demonstrate the success of the novel method proposed on real objects.

    M Fastovets, JY Guillemaut, A Hilton (2014)Estimating athlete pose from monocular tv sports footage71pp. 161-178

    © Springer International Publishing Switzerland 2014. Human pose estimation from monocular video streams is a challenging problem. Much of the work on this problem has focused on developing inference algorithms and probabilistic prior models based on learned measurements. Such algorithms face challenges in generalisation beyond the learned dataset. We propose an interactive model-based generative approach for estimating the human pose from uncalibrated monocular video in unconstrained sports TV footage. Belief propagation over a spatio-temporal graph of candidate body part hypotheses is used to estimate a temporally consistent pose between user-defined keyframe constraints. Experimental results show that the proposed generative pose estimation framework is capable of estimating pose even in very challenging unconstrained scenarios.