Dr Jean-Yves Guillemaut


Senior Lecturer in 3D Computer Vision
MEng (hons), PhD, MIEEE, MBMVA, FHEA

Biography

Areas of specialism

3D Computer Vision; 3D Reconstruction; Computational Photography; Virtual and Augmented Reality; Lightfield Imaging; 3D Video; Artificial Intelligence

University roles and responsibilities

  • Senior Lecturer in 3D Computer Vision
  • CVSSP Postgraduate Research Director
  • Department Prizes Officer
  • Professional Training Tutor
  • MSc Personal Tutor

    My qualifications

    2014
    Graduate Certificate in Learning and Teaching
    University of Surrey
    2005
    PhD degree in 3D Computer Vision
    University of Surrey
    2001
    MEng degree (first class honours) with specialisation in Automatic Control and Robotics
    Ecole Centrale de Nantes

    Previous roles

    2012 - 2018
    Lecturer in 3D Computer Vision
    University of Surrey
    2012 - 2017
    CVSSP External Seminar Organiser
    University of Surrey
    2005 - 2012
    Research Fellow
    University of Surrey

    Research

    Research interests

    Research projects

    Research collaborations

    Indicators of esteem

    • Best Poster Award at European Conference on Visual Media Production (CVMP 2016)

    • Best Student Paper Award at Int. Conference on Computer Vision Theory and Applications (VISAPP 2014)

    • University of Surrey Faculty of Engineering and Physical Sciences Researcher of the Year Award (2012)

    • Honorable Mention for the Best Paper Award at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2012

    • Best Poster Prize at EPSRC/BMVA Summer School on Computer Vision 2002

    Supervision

    Postgraduate research supervision

    Completed postgraduate research projects I have supervised

    My teaching

    My publications

    Publications

    Matthew Bailey, Jean-Yves Guillemaut (2020) A Novel Depth from Defocus Framework Based on a Thick Lens Camera Model, In: 2020 International Conference on 3D Vision (3DV) pp. 1206-1215 IEEE
    Reconstruction approaches based on monocular defocus analysis such as Depth from Defocus (DFD) often utilise the thin lens camera model. Despite this widespread adoption, there are inherent limitations associated with it. Coupled with invalid parameterisation commonplace in literature, the overly-simplified image formation it describes leads to inaccurate defocus modelling; especially in macro-scale scenes. As a result, DFD reconstructions based around this model are not geometrically consistent, and are typically restricted to single-view applications. Subsequently, the handful of existing approaches which attempt to include additional viewpoints have had only limited success. In this work, we address these issues by instead utilising a thick lens camera model, and propose a novel calibration procedure to accurately parameterise it. The effectiveness of our model and calibration is demonstrated with a novel DFD reconstruction framework. We achieve highly detailed, geometrically accurate and complete 3D models of real-world scenes from multi-view focal stacks. To our knowledge, this is the first time DFD has been successfully applied to complete scene modelling in this way.
    We propose a multi-view framework for joint object detection and labelling based on pairs of images. The proposed framework extends the single-view Mask R-CNN approach to multiple views without need for additional training. Dedicated components are embedded into the framework to match objects across views by enforcing epipolar constraints, appearance feature similarity and class coherence. The multi-view extension enables the proposed framework to detect objects which would otherwise be mis-detected in a classical Mask R-CNN approach, and achieves coherent object labelling across views. By avoiding the need for additional training, the approach effectively overcomes the current shortage of multi-view datasets. The proposed framework achieves high quality results on a range of complex scenes, being able to output class, bounding box, mask and an additional label enforcing coherence across views. In the evaluation, we show qualitative and quantitative results on several challenging outdoor multi-view datasets and perform a comprehensive comparison to verify the advantages of the proposed method.
    J-Y Guillemaut, J Illingworth (2008) The normalised image of the absolute conic and its application for zooming camera calibration, In: PATTERN RECOGNITION 41(12) pp. 3624-3635 PERGAMON-ELSEVIER SCIENCE LTD
    JY Guillemaut, AS Aguado, J Illingworth (2005) Using points at infinity for parameter decoupling in camera calibration, In: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 27(2) pp. 265-270 IEEE COMPUTER SOC
    J-Y Guillemaut, O Drbohlav, J Illingworth, R Sara (2008) A maximum likelihood surface normal estimation algorithm for Helmholtz stereopsis, In: VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2 pp. 352-359
    JY Guillemaut, AS Aguado, J Illingworth (2003) Calibration of a zooming camera using the Normalized Image of the Absolute Conic, In: FOURTH INTERNATIONAL CONFERENCE ON 3-D DIGITAL IMAGING AND MODELING, PROCEEDINGS pp. 225-232
    JY Guillemaut, O Drbohlav, R Sara, J Illingworth (2004) Helmholtz stereopsis on rough and strongly textured surfaces, In: 2ND INTERNATIONAL SYMPOSIUM ON 3D DATA PROCESSING, VISUALIZATION, AND TRANSMISSION, PROCEEDINGS pp. 10-17
    Helmholtz Stereopsis (HS) has recently been explored as a promising technique for capturing shape of objects with unknown reflectance. So far, it has been widely applied to objects of smooth geometry and piecewise uniform Bidirectional Reflectance Distribution Function (BRDF). Moreover, for nonconvex surfaces the inter-reflection effects have been completely neglected. We extend the method to surfaces which exhibit strong texture, nontrivial geometry and are possibly nonconvex. The problem associated with these surface features is that Helmholtz reciprocity is apparently violated when point-based measurements are used independently to establish the matching constraint as in the standard HS implementation. We argue that the problem is avoided by computing radiance measurements on image regions corresponding exactly to projections of the same surface point neighbourhood with appropriate scale. The experimental results demonstrate the success of the novel method proposed on real objects.
    J Guillemaut, A Aguado, J Illingworth (2002) Using points at infinity for parameter decoupling in camera calibration 1 pp. 263-272
    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2003) Remote vehicle manoeuvring using augmented reality pp. 186-189
    V Brujic-Okretic, J Guillemaut, L Hitchin, M Michielen, G Parker (2004) Real-time scene reconstruction for remote vehicle navigation, In: Geometric Modeling and Computing: Seattle 2003 pp. 113-123
    Visual odometry, the process of tracking the trajectory of a moving camera based on its captured video is a fundamental problem behind autonomous mobile robotics and augmented reality applications. Yet, despite almost 40 years of extensive research on the problem, state-of-the-art systems are still vulnerable to several pitfalls that arise in challenging environments due to specific sensor limitations and restrictive assumptions. This thesis, in particular, investigates the use of RGB-D cameras for robust visual odometry in man-made environments, such as industrial plants. These spaces, contrary to natural environments, follow mainly a rectilinear structure made of simple geometric entities. Thus, this work exploits this structure by taking a feature-based approach, where lines, planes and cylinder segments are explicitly extracted as visual cues for egomotion estimation. While the depth captured by RGB-D cameras helps to resolve the ambiguity inherent of passive cameras especially on uniform and low textured surfaces, these active cameras suffer from several limitations, which may deteriorate the performance of RGB-D Odometry, such as, limited operating range, near-infrared light interference and systematic errors, leading to incomplete and noisy depth maps. To address these issues, we have first developed a visual odometry framework that leverages both depth measurements from active sensing and depth estimates from temporal stereo obtained via probabilistic filtering. Our experiments demonstrate that this framework is able to operate on large indoor and outdoor spaces, where the absence and inaccuracy of depth measurements is too high to rely just on RGB-D Odometry. Secondly, this thesis considers the depth sensor error by proposing a depth fusion framework based on Mixture of Gaussians to denoise the depth measurements and model their uncertainties through spatio-temporal observations. 
Extensive results on RGB-D sequences show that applying this depth model to RGB-D odometry improves significantly its performance and supports our hypothesis that the uncertainty of fused depth needs to be exposed. To fully exploit this probabilistic depth model, the depth uncertainty needs to be propagated throughout the visual odometry pipeline. Therefore, we reformulated the visual odometry system as a probabilistic process by (i) deriving plane and 3D line fitting solutions that model the uncertainties of the feature parameters and (ii) estimating the camera pose by combining different feature-type matches weighted by their respective uncertainties. Lastly, this thesis addresses man-made environments made also of smooth curved surfaces by proposing a curve-aware plane and cylinder extraction algorithm which is shown empirically to be more efficient and accurate than an alternative state-of-the-art plane extraction approach, leading ultimately to better visual odometry performance in scenes made of cylindrical surfaces. To incorporate this feature extractor in visual odometry, the system described above is extended to handle cylinder primitives.
    Nadejda Roubtsova, Jean-Yves Guillemaut (2020) Helmholtz Stereopsis Synthetic Dataset University of Surrey
    M Klaudiny, M Tejera, C Malleson, J-Y Guillemaut, A Hilton (2020) SCENE Digital Cinema Datasets University of Surrey
    M Sarim, A Hilton, Jean-Yves Guillemaut, Hansung Kim, T Takai (2010) Multiple view wide-baseline trimap propagation for natural video matting, In: Proc. European Conference on Visual Media Production (CVMP 2010) pp. 82-91
    This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.
    JJ Kilner, J Starck, A Hilton, JY Guillemaut, O Grau (2007) Dual Mode Deformable Models for Free-Viewpoint Video of Outdoor Sports Events, In: IEEE Int. Conf. on 3D Imaging and Modeling pp. 177-184
    Gianmarco Addari, Jean-Yves Guillemaut (2019) An MRF Optimisation Framework for Full 3D Helmholtz Stereopsis, In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
    Accurate 3D modelling of real world objects is essential in many applications such as digital film production and cultural heritage preservation. However, current modelling techniques rely on assumptions to constrain the problem, effectively limiting the categories of scenes that can be reconstructed. A common assumption is that the scene’s surface reflectance is Lambertian or known a priori. These constraints rarely hold true in practice and result in inaccurate reconstructions. Helmholtz Stereopsis (HS) addresses this limitation by introducing a reflectance agnostic modelling constraint, but prior work in this area has been predominantly limited to 2.5D reconstruction, providing only a partial model of the scene. In contrast, this paper introduces the first Markov Random Field (MRF) optimisation framework for full 3D HS. First, an initial reconstruction is obtained by performing 2.5D MRF optimisation with visibility constraints from multiple viewpoints and fusing the different outputs. Then, a refined 3D model is obtained through volumetric MRF optimisation using a tailored Iterative Conditional Modes (ICM) algorithm. The proposed approach is evaluated with both synthetic and real data. Results show that the proposed full 3D optimisation significantly increases both geometric and normal accuracy, being able to achieve sub-millimetre precision. Furthermore, the approach is shown to be robust to occlusions and noise.
    M Sarim, A Hilton, JY Guillemaut (2009) Non-parametric Patch Based Video Matting, In: British Machine Vision Conference (BMVC)
    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2018) Hybrid modelling of non-rigid scenes from RGBD cameras, In: IEEE Transactions on Circuits and Systems for Video Technology IEEE
    Recent advances in sensor technology have introduced low-cost RGB video plus depth sensors, such as the Kinect, which enable simultaneous acquisition of colour and depth images at video rates. This paper introduces a framework for representation of general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel graph surface segmentation and modelling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are (1) extension of a prior piecewise surfel graph modelling approach for improved accuracy and completeness, (2) combination of this surfel graph modelling with TSDF surface fusion to generate dense geometry, and (3) proposal of means for validation of the reconstructed 4D scene model against the input data and efficient storage of any unmodelled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be efficiently represented with temporally consistent structure and enhanced levels of detail and completeness where possible, but gracefully falls back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously require more complex capture setups or manual processing.
    J-Y Guillemaut, J Kilner, A Hilton (2009) Robust Graph-Cut Scene Segmentation and Reconstruction for Free-Viewpoint Video of Complex Dynamic Scenes, In: IEEE Int. Conf. on Computer Vision, ICCV pp. 809-816
    M Sarim, A Hilton, Jean-Yves Guillemaut, T Takai, Hansung Kim (2010) Natural image matting for multiple wide-baseline views, In: Proceedings of 17th IEEE International Conference on Image Processing (ICIP) pp. 2233-2236
    In this paper we present a novel approach to estimate the alpha mattes of a foreground object captured by a wide-baseline circular camera rig provided a single key frame trimap. Bayesian inference coupled with camera calibration information is used to propagate high confidence trimap labels across the views. Recent techniques have been developed to estimate an alpha matte of an image using multiple views but they are limited to narrow baseline views with low foreground variation. The proposed wide-baseline trimap propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance for cameras with opposing views, enabling high quality alpha matte extraction using any state-of-the-art image matting algorithm.
    HE Imre, J-Y Guillemaut, ADM Hilton (2010) Moving Camera Registration for Multiple Camera Setups in Dynamic Scenes, In: Proceedings of the 21st British Machine Vision Conference
    Many practical applications require an accurate knowledge of the extrinsic calibration (i.e., pose) of a moving camera. The existing SLAM and structure-from-motion solutions are not robust to scenes with large dynamic objects, and do not fully utilize the available information in the presence of static cameras, a common practical scenario. In this paper, we propose an algorithm that addresses both of these issues for a hybrid static-moving camera setup. The algorithm uses the static cameras to build a sparse 3D model of the scene, with respect to which the pose of the moving camera is estimated at each time instant. The performance of the algorithm is studied through extensive experiments that cover a wide range of applications, and is shown to be satisfactory.
    Conventional stereoscopic video content production requires use of dedicated stereo camera rigs which is both costly and lacking video editing flexibility. In this paper, we propose a novel approach which only requires a small number of standard cameras sparsely located around a scene to automatically convert the monocular inputs into stereoscopic streams. The approach combines a probabilistic spatio-temporal segmentation framework with a state-of-the-art multi-view graph-cut reconstruction algorithm, thus providing full control of the stereoscopic settings at render time. Results with studio sequences of complex human motion demonstrate the suitability of the method for high quality stereoscopic content generation with minimum user interaction.
    JJ Kilner, J-Y Guillemaut, A Hilton (2009) Summarised Hierarchical Markov Models for Speed Invariant Action Matching, In: ICCV Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences pp. 1065-1072
    Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However due to poor calibration and matting this reconstruction is of a very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games.
    Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2020) Temporally coherent general dynamic scene reconstruction, In: International Journal of Computer Vision Springer
    Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene structure, appearance, or illumination. Contributions of the work are: an automatic method for initial coarse reconstruction to initialize joint estimation; sparse-to-dense temporal correspondence integrated with joint multi-view segmentation and reconstruction to introduce temporal coherence; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes by introducing shape constraint. Comparison with state-of-the-art approaches on a variety of complex indoor and outdoor scenes demonstrates improved accuracy in both multi-view segmentation and dense reconstruction. This paper demonstrates unsupervised reconstruction of complete temporally coherent 4D scene models with improved non-rigid object segmentation and shape reconstruction and its application to various applications such as free-view rendering and virtual reality.
    Michaela Spiteri, Jean-Yves Guillemaut, David Windridge, Shivaram Avula, Ram Kumar, Emma Lewis (2019) Fully-Automated Identification of Imaging Biomarkers for Post-Operative Cerebellar Mutism Syndrome Using Longitudinal Paediatric MRI, In: Neuroinformatics 18(1) pp. 151-162 Springer US
    Post-operative cerebellar mutism syndrome (POPCMS) in children is a post-surgical complication which occurs following the resection of tumors within the brain stem and cerebellum. High resolution brain magnetic resonance (MR) images acquired at multiple time points across a patient’s treatment allow the quantification of localized changes caused by the progression of this syndrome. However, MR images are not necessarily acquired at regular intervals throughout treatment and are often not volumetric. This restricts the analysis to 2D space and causes difficulty in intra- and inter-subject comparison. To address these challenges, we have developed an automated image processing and analysis pipeline. Multi-slice 2D MR image slices are interpolated in space and time to produce a 4D volumetric MR image dataset providing a longitudinal representation of the cerebellum and brain stem at specific time points across treatment. The deformations within the brain over time are represented using a novel metric known as the Jacobian of deformations determinant. This metric, together with the changing grey-level intensity of areas within the brain over time, are analyzed using machine learning techniques in order to identify biomarkers that correspond with the development of POPCMS following tumor resection. This study makes use of a fully automated approach which is not hypothesis-driven. As a result, we were able to automatically detect six potential biomarkers that are related to the development of POPCMS following tumor resection in the posterior fossa.
    ADM Hilton, Jean-Yves Guillemaut, JJ Kilner, O Grau, G Thomas (2011) 3D-TV Production from Conventional Cameras for Sports Broadcast, In: IEEE Transactions on Broadcasting 57(2) pp. 462-476 IEEE
    3DTV production of live sports events presents a challenging problem involving conflicting requirements of maintaining broadcast stereo picture quality with practical problems in developing robust systems for cost effective deployment. In this paper we propose an alternative approach to stereo production in sports events using the conventional monocular broadcast cameras for 3D reconstruction of the event and subsequent stereo rendering. This approach has the potential advantage over stereo camera rigs of recovering full scene depth, allowing inter-ocular distance and convergence to be adapted according to the requirements of the target display and enabling stereo coverage from both existing and ‘virtual’ camera positions without additional cameras. A prototype system is presented with results of sports TV production trials for rendering of stereo and free-viewpoint video sequences of soccer and rugby.
    M Fastovets, JY Guillemaut, A Hilton (2014) Estimating athlete pose from monocular TV sports footage 71 pp. 161-178
    Human pose estimation from monocular video streams is a challenging problem. Much of the work on this problem has focused on developing inference algorithms and probabilistic prior models based on learned measurements. Such algorithms face challenges in generalisation beyond the learned dataset. We propose an interactive model-based generative approach for estimating the human pose from uncalibrated monocular video in unconstrained sports TV footage. Belief propagation over a spatio-temporal graph of candidate body part hypotheses is used to estimate a temporally consistent pose between user-defined keyframe constraints. Experimental results show that the proposed generative pose estimation framework is capable of estimating pose even in very challenging unconstrained scenarios.
    E Imre, J-Y Guillemaut, A Hilton (2011) Calibration of nodal and free-moving cameras in dynamic scenes for post-production, In: Proceedings - 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2011 pp. 260-267
    In film production, many post-production tasks require the availability of accurate camera calibration information. This paper presents an algorithm for through-the-lens calibration of a moving camera for a common scenario in film production and broadcasting: The camera views a dynamic scene, which is also viewed by a set of static cameras with known calibration. The proposed method involves the construction of a sparse scene model from the static cameras, with respect to which the moving camera is registered, by applying the appropriate perspective-n-point (PnP) solver. In addition to the general motion case, the algorithm can handle the nodal cameras with unknown focal length via a novel P2P algorithm. The approach can identify a subset of static cameras that are more likely to generate a high number of scene-image correspondences, and can robustly deal with dynamic scenes. Our target applications include dense 3D reconstruction, stereoscopic 3D rendering and 3D scene augmentation, through which the success of the algorithm is demonstrated experimentally.
    Chathura Vindana Perera Galkandage, Janko Calic, Safak Dogan, Jean-Yves Guillemaut Full-Reference Stereoscopic Video Quality Assessment Using a Motion Sensitive HVS Model, In: IEEE Transactions on Circuits and Systems for Video Technology Institute of Electrical and Electronics Engineers
    Stereoscopic video quality assessment has become a major research topic in recent years. Existing stereoscopic video quality metrics are predominantly based on stereoscopic image quality metrics extended to the time domain via for example temporal pooling. These approaches do not explicitly consider the motion sensitivity of the Human Visual System (HVS). To address this limitation, this paper introduces a novel HVS model inspired by physiological findings characterising the motion sensitive response of complex cells in the primary visual cortex (V1 area). The proposed HVS model generalises previous HVS models, which characterised the behaviour of simple and complex cells but ignored motion sensitivity, by estimating optical flow to measure scene velocity at different scales and orientations. The local motion characteristics (direction and amplitude) are used to modulate the output of complex cells. The model is applied to develop a new type of full-reference stereoscopic video quality metrics which uniquely combine non-motion sensitive and motion sensitive energy terms to mimic the response of the HVS. A tailored two-stage multi-variate stepwise regression algorithm is introduced to determine the optimal contribution of each energy term. The two proposed stereoscopic video quality metrics are evaluated on three stereoscopic video datasets. Results indicate that they achieve average correlations with subjective scores of 0.9257 (PLCC), 0.9338 and 0.9120 (SRCC), 0.8622 and 0.8306 (KRCC), and outperform previous stereoscopic video quality metrics including other recent HVS-based metrics.
    SK Hall, TH Williamson, Jean-Yves Guillemaut, T Goddard, AP Baumann, JC Hutter (2017) Modeling the Dynamics of Tamponade Multicomponent Gases During Retina Reattachment Surgery, In: AIChE Journal 63(9) pp. 3651-3662 Wiley, for American Institute of Chemical Engineers
    Vitrectomy and pneumatic retinopexy are common surgical procedures used to treat retinal detachment. To reattach the retina, gases are used to inflate the vitreous space allowing the retina to attach by surface tension and buoyancy forces that are superior to the location of the bubble. These procedures require the injection of either a pure tamponade gas, such as C3F8 or SF6, or mixtures of these gases with air. The location of the retinal detachment, the anatomical spread of the retinal defect, and the length of time the defect has persisted, will determine the suggested volume and duration of the gas bubble to allow reattachment. After inflation, the gases are slowly absorbed by the blood allowing the vitreous to be refilled by aqueous. We have developed a model of the mass transfer dynamics of tamponade gases during pneumatic retinopexy or pars plana vitrectomy procedures. The model predicts the expansion and persistence of intraocular gases (C3F8, SF6), oxygen, nitrogen, and carbon dioxide, as well as the intraocular pressure. The model was validated using published literature in rabbits and humans. In addition to correlating the mass transfer dynamics by surface area, permeability, and partial pressure driving forces, the mass transfer dynamics are affected by the percentage of the tamponade gases. Rates were also correlated with the physical properties of the tamponade and blood gases. The model gave accurate predictions in humans.
    Caroline Scarles, Suzanne van Evan, Naomi Klepacz, Jean-Yves Guillemaut, Michael Humbracht Bringing The Outdoors Indoors: Immersive Experiences of Recreation in Nature and Coastal Environments in Residential Care Homes, In: E-review of Tourism Research Texas A&M AgriLife
    This paper critiques the opportunities afforded by immersive experience technology to create stimulating, innovative living environments for long-term residents of care homes for the elderly. We identify the ways in which virtual mobility can facilitate reconnection with recreational environments. Specifically, the project examines the potential of two assistive and immersive experiences: virtual reality (VR) and multisensory stimulation environments (MSSE). Findings identify three main areas of knowledge contribution. First, the introduction of VR and MSSE facilitated participants' re-engagement and sharing of past experiences as they recalled past family holidays, day trips or everyday practices. Secondly, the combination of the hardware of the VR and MSSE technology with the physical objects of the sensory trays created alternative, multisensual ways of engaging with the experiences presented to participants. Lastly, the clear preference for the MSSE experience over the VR experience highlighted the importance of social interaction and exchange for participants.
    Mark Brown, David Windridge, Jean-Yves Guillemaut (2019) A family of globally optimal branch-and-bound algorithms for 2D–3D correspondence-free registration, In: Pattern Recognition 93 pp. 36-54 Elsevier
    We present a family of methods for 2D–3D registration spanning both deterministic and non-deterministic branch-and-bound approaches. Critically, the methods exhibit invariance to the underlying scene primitives, enabling e.g. points and lines to be treated on an equivalent basis, potentially enabling a broader range of problems to be tackled while maximising available scene information, all scene primitives being simultaneously considered. Being a branch-and-bound based approach, the method furthermore enjoys intrinsic guarantees of global optimality; while branch-and-bound approaches have been employed in a number of computer vision contexts, the proposed method represents the first time that this strategy has been applied to the 2D–3D correspondence-free registration problem from points and lines. Within the proposed procedure, deterministic and probabilistic procedures serve to speed up the nested branch-and-bound search while maintaining optimality. Experimental evaluation with synthetic and real data indicates that the proposed approach significantly increases both accuracy and robustness compared to the state of the art.
    J-Y Guillemaut, A Hilton (2011)Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications, In: International Journal of Computer Vision93(1)pp. 73-100 Springer
    Mark Brown, David Windridge, Jean-Yves Guillemaut (2016)A Generalised Framework for Saliency-Based Point Feature Detection, In: Computer Vision and Image Understanding157pp. 117-137 Elsevier
    Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By contrast, our histogram-based approach is highly generalisable and as such, may be meaningfully applied between 2D and 3D data. Using the generalised approach, we propose salient point detectors for images, and both untextured and textured 3D data. The approach naturally allows for the detection of salient 3D points based jointly on both the geometry and texture of the scene, allowing for broader applicability. The repeatability of the feature detectors is evaluated using a range of datasets including image and LiDAR input from indoor and outdoor scenes. Experimental results demonstrate a significant improvement in terms of 2D-2D and 2D-3D repeatability compared to existing multi-modal feature detectors.
    Tom H. Williamson, Jean-Yves Guillemaut, Sheldon K. Hall, Joseph C. Hutter, Tony Goddard (2018)Theoretical gas concentrations achieving 100% fill of the vitreous cavity in the postoperative period, a gas eye model study (GEMS), In: RETINA, The Journal of Retinal and Vitreous Diseases38pp. S60-S64 Lippincott, Williams & Wilkins
    Precis. A mathematical model is described of the physical properties of intraocular gases providing a guide to the correct gas concentrations to achieve 100% fill of the vitreous cavity postoperatively. A table for the instruction of surgeons is provided and the effects of different axial lengths examined.


    ABSTRACT

    Purpose – To determine the concentrations of different gas tamponades in air to achieve 100% fill of the vitreous cavity postoperatively and to examine the influence of eye volume on these concentrations.

    Methods – A mathematical model of the mass transfer dynamics of tamponade and blood gases (O2, N2, CO2) when injected into the eye was used. Mass transfer surface areas were calculated from published anatomical data. The model has been calibrated from published volumetric decay and composition results for three gases: sulphur hexafluoride (SF6), hexafluoroethane (C2F6) and perfluoropropane (C3F8). The concentrations of these gases (in air) required to achieve 100% fill of the vitreous cavity postoperatively without an intra-ocular pressure rise were determined. The concentrations were calculated for three volumes of the vitreous cavity to test if ocular size influenced the results.

    Results – A table of gas concentrations was produced. In a simulation of pars plana vitrectomy operations in which an 80% to 85% fill of the vitreous cavity with gas was achieved at surgery, the concentrations of the three gases in air to achieve 100% fill postoperatively were 10-13% for C3F8, 12-15% for C2F6 and 19-25% for SF6. These were similar to the so-called "non-expansive" concentrations used in the clinical setting. The calculations were repeated for three different sizes of eye. Aiming for an 80% fill at surgery and 100% postoperatively, an eye with a 4ml vitreous cavity required 24% SF6, 15% C2F6 or 13% C3F8; 7.2ml required 25% SF6, 15% C2F6 or 13% C3F8; and 10ml required 25% SF6, 16% C2F6 or 13% C3F8. When using 100% gas (for example, employed in pneumatic retinopexy), in order to achieve 100% fill postoperatively, the minimum vitreous cavity fill at surgery was 43% for SF6, 29% for C2F6 and 25% for C3F8 and was only minimally changed by variation in the size of the eye.

    Conclusions – A table has been produced which could be used for surgical innovation in gas usage in the vitreous cavity. It provides concentrations for different percentage fills, which will achieve a moment post-operatively with a full fill of the cavity without a pressure rise. Variation in axial length and size of the eye does not appear to alter the values in the table significantly. Those using pneumatic retinopexy need to increase the volume of gas injected with increased size of the eye in order to match the percentage fill of the vitreous cavity recommended for a given tamponade agent.
    Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton (2015)General Dynamic Scene Reconstruction from Multiple View Video, In: 2015 IEEE International Conference on Computer Vision (ICCV)pp. 900-908 IEEE
    This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches for outdoor dynamic scene reconstruction assume prior knowledge of the static background appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.
    Nadejda Roubtsova, Jean-Yves Guillemaut (2017)Bayesian Helmholtz Stereopsis with Integrability Prior, In: IEEE Transactions on Pattern Analysis and Machine Intelligence40(9)pp. 2265-2272 Institute of Electrical and Electronics Engineers (IEEE)
    Helmholtz Stereopsis is a 3D reconstruction method uniquely independent of surface reflectance. Yet, its sub-optimal maximum likelihood formulation with drift-prone normal integration limits performance. Via three contributions this paper presents a complete novel pipeline for Helmholtz Stereopsis. Firstly, we propose a Bayesian formulation replacing the maximum likelihood problem by a maximum a posteriori one. Secondly, a tailored prior enforcing consistency between depth and normal estimates via a novel metric related to optimal surface integrability is proposed. Thirdly, explicit surface integration is eliminated by taking advantage of the accuracy of prior and high resolution of the coarse-to-fine approach. The pipeline is validated quantitatively and qualitatively against alternative formulations, reaching sub-millimetre accuracy and coping with complex geometry and reflectance.
    J Imber, M Volino, J-Y Guillemaut, S Fenney, A Hilton (2013)Free-viewpoint video rendering for mobile devices., In: MIRAGEpp. 11:1-11:1
    M Fastovets, J-Y Guillemaut, A Hilton (2013)Athlete Pose Estimation from Monocular TV Sports Footage, In: 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)pp. 1048-1054
    J-Y Guillemaut, A Hilton (2012)Space-Time Joint Multi-Layer Segmentation and Depth Estimation, In: SECOND JOINT 3DIM/3DPVT CONFERENCE: 3D IMAGING, MODELING, PROCESSING, VISUALIZATION & TRANSMISSION (3DIMPVT 2012)pp. 440-447
    D Casas, M Tejera, Jean-Yves Guillemaut, A Hilton (2012)4D parametric motion graphs for interactive animation, In: I3D '12 Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Gamespp. 103-110 ACM
    A 4D parametric motion graph representation is presented for interactive animation from actor performance capture in a multiple camera studio. The representation is based on a 4D model database of temporally aligned mesh sequence reconstructions for multiple motions. High-level movement controls such as speed and direction are achieved by blending multiple mesh sequences of related motions. A real-time mesh sequence blending approach is introduced which combines the realistic deformation of previous non-linear solutions with efficient online computation. Transitions between different parametric motion spaces are evaluated in real-time based on surface shape and motion similarity. 4D parametric motion graphs allow real-time interactive character animation while preserving the natural dynamics of the captured performance. © 2012 ACM.
    JJ Kilner, J-Y Guillemaut, A Hilton (2009)3D Action Matching with Key-Pose Detection, In: IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops)pp. 1-8
    This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera; however, this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event.
    Charles Malleson, Jean-Yves Guillemaut, Adrian Hilton (2019)3D Reconstruction from RGB-D Data, In: RGB-D Image Analysis and Processingpp. 87-115 Springer Nature Switzerland AG
    A key task in computer vision is that of generating virtual 3D models of real-world scenes by reconstructing the shape, appearance and, in the case of dynamic scenes, motion of the scene from visual sensors. Recently, low-cost video plus depth (RGB-D) sensors have become widely available and have been applied to 3D reconstruction of both static and dynamic scenes. RGB-D sensors contain an active depth sensor, which provides a stream of depth maps alongside standard colour video. The low cost and ease of use of RGB-D devices as well as their video rate capture of images along with depth make them well suited to 3D reconstruction. Use of active depth capture overcomes some of the limitations of passive monocular or multiple-view video-based approaches since reliable, metrically accurate estimates of the scene depth at each pixel can be obtained from a single view, even in scenes that lack distinctive texture. There are two key components to 3D reconstruction from RGB-D data: (1) spatial alignment of the surface over time and, (2) fusion of noisy, partial surface measurements into a more complete, consistent 3D model. In the case of static scenes, the sensor is typically moved around the scene and its pose is estimated over time. For dynamic scenes, there may be multiple rigid, articulated, or non-rigidly deforming surfaces to be tracked over time. The fusion component consists of integration of the aligned surface measurements, typically using an intermediate representation, such as the volumetric truncated signed distance field (TSDF). In this chapter, we discuss key recent approaches to 3D reconstruction from depth or RGB-D input, with an emphasis on real-time reconstruction of static scenes.
    Chathura Galkandage, Janko Calic, S Dogan, Jean-Yves Guillemaut (2017)Stereoscopic Video Quality Assessment Using Binocular Energy, In: Journal of Selected Topics in Signal Processing11(1)pp. 102-112 IEEE
    Stereoscopic imaging is becoming increasingly popular. However, to ensure the best quality of experience, there is a need to develop more robust and accurate objective metrics for stereoscopic content quality assessment. Existing stereoscopic image and video metrics are either extensions of conventional 2D metrics (with added depth or disparity information) or are based on relatively simple perceptual models. Consequently, they tend to lack the accuracy and robustness required for stereoscopic content quality assessment. This paper introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. The proposed approach is based on the following three contributions. First, it introduces a novel HVS model extending previous models to include the phenomena of binocular suppression and recurrent excitation. Second, an image quality metric based on the novel HVS model is proposed. Finally, an optimised temporal pooling strategy is introduced to extend the metric to the video domain. Both image and video quality metrics are obtained via a training procedure to establish a relationship between subjective scores and objective measures of the HVS model. The metrics are evaluated using publicly available stereoscopic image/video databases as well as a new stereoscopic video database. An extensive experimental evaluation demonstrates the robustness of the proposed quality metrics. This indicates a considerable improvement with respect to the state-of-the-art with average correlations with subjective scores of 0.86 for the proposed stereoscopic image metric and 0.89 and 0.91 for the proposed stereoscopic video metrics.
    T Wang, J Guillemaut, J Collomosse (2010)Multi-label Propagation for Coherent Video Segmentation and Artistic Stylization, In: Proceedings of Intl. Conf. on Image Proc. (ICIP)pp. 3005-3008
    We present a new algorithm for segmenting video frames into temporally stable colored regions, applying our technique to create artistic stylizations (e.g. cartoons and paintings) from real video sequences. Our approach is based on a multilabel graph cut applied to successive frames, in which the color data term and label priors are incrementally updated and propagated over time. We demonstrate coherent segmentation and stylization over a variety of home videos.
    M Sarim, A Hilton, J-Y Guillemaut (2011)TEMPORAL TRIMAP PROPAGATION FOR VIDEO MATTING USING INFERENTIAL STATISTICS, In: 2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)pp. 1745-1748
    A Mustafa, H Kim, J-Y Guillemaut, ADM Hilton (2016)Temporally coherent 4D reconstruction of complex dynamic scenes, In: CVPR 2016 Proceedings
    This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction.
    N Roubtsova, Jean-Yves Guillemaut (2015)Colour Helmholtz Stereopsis for reconstruction of complex dynamic scenes, In: Proceedings - 2014 International Conference on 3D Vision, 3DV 2014pp. 251-258
    Helmholtz Stereopsis (HS) is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose colour HS, a novel variant of the technique based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed novel pipeline for colour HS uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for surface chromaticity calibration and the state-of-the-art Bayesian HS suitable for reconstruction from a minimal number of reciprocal pairs. Experimental results including quantitative and qualitative evaluation demonstrate that the method is suitable for flexible (single-shot) reconstruction of static scenes and reconstruction of dynamic scenes with complex surface reflectance properties.
    JY Guillemaut, A Hilton, J Starck, JJ Kilner, O Grau (2007)A Bayesian Framework for Simultaneous Reconstruction and Matting, In: IEEE Int.Conf. on 3D Imaging and Modelingpp. 167-176
    H Kim, Jean-Yves Guillemaut, T Takai, M Sarim, A Hilton (2012)Outdoor Dynamic 3D Scene Reconstruction, In: IEEE Transactions on Circuits and Systems for Video Technology22(11)pp. 1611-1622 IEEE
    Existing systems for 3D reconstruction from multiple view video use controlled indoor environments with uniform illumination and backgrounds to allow accurate segmentation of dynamic foreground objects. In this paper we present a portable system for 3D reconstruction of dynamic outdoor scenes which require relatively large capture volumes with complex backgrounds and non-uniform illumination. This is motivated by the demand for 3D reconstruction of natural outdoor scenes to support film and broadcast production. Limitations of existing multiple view 3D reconstruction techniques for use in outdoor scenes are identified. Outdoor 3D scene reconstruction is performed in three stages: (1) 3D background scene modelling using spherical stereo image capture; (2) multiple view segmentation of dynamic foreground objects by simultaneous video matting across multiple views; and (3) robust 3D foreground reconstruction and multiple view segmentation refinement in the presence of segmentation and calibration errors. Evaluation is performed on several outdoor productions with complex dynamic scenes including people and animals. Results demonstrate that the proposed approach overcomes limitations of previous indoor multiple view reconstruction approaches enabling high-quality free-viewpoint rendering and 3D reference models for production.
    M Brown, D Windridge, JY Guillemaut (2015)A generalisable framework for saliency-based line segment detection, In: Pattern Recognition48(12)pp. 3993-4011 Elsevier
    © 2015 The Authors. Here we present a novel, information-theoretic salient line segment detector. Existing line detectors typically only use the image gradient to search for potential lines. Consequently, many lines are found, particularly in repetitive scenes. In contrast, our approach detects lines that define regions of significant divergence between pixel intensity or colour statistics. This results in a novel detector that naturally avoids the repetitive parts of a scene while detecting the strong, discriminative lines present. We furthermore use our approach as a saliency filter on existing line detectors to more efficiently detect salient line segments. The approach is highly generalisable, depending only on image statistics rather than image gradient; and this is demonstrated by an extension to depth imagery. Our work is evaluated against a number of other line detectors and a quantitative evaluation demonstrates a significant improvement over existing line detectors for a range of image transformations.
    J Imber, J-Y Guillemaut, A Hilton (2014)Intrinsic textures for relightable free-viewpoint video, In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)8690 L(PART 2)pp. 392-407
    This paper presents an approach to estimate the intrinsic texture properties (albedo, shading, normal) of scenes from multiple view acquisition under unknown illumination conditions. We introduce the concept of intrinsic textures, which are pixel-resolution surface textures representing the intrinsic appearance parameters of a scene. Unlike previous video relighting methods, the approach does not assume regions of uniform albedo, which makes it applicable to richly textured scenes. We show that intrinsic image methods can be used to refine an initial, low-frequency shading estimate based on a global lighting reconstruction from an original texture and coarse scene geometry in order to resolve the inherent global ambiguity in shading. The method is applied to relighting of free-viewpoint rendering from multiple view video capture. This demonstrates relighting with reproduction of fine surface detail. Quantitative evaluation on synthetic models with textured appearance shows accurate estimation of intrinsic surface reflectance properties. © 2014 Springer International Publishing.
    C Malleson, M Klaudiny, J-Y Guillemaut, A Hilton (2014)Structured Representation of Non-Rigid Surfaces from Single View 3D Point Tracks., In: 3DVpp. 625-632
    C Malleson, M Klaudiny, A Hilton, J-Y Guillemaut (2013)Single-view RGBD-based reconstruction of dynamic human geometry, In: Proceedings of the IEEE International Conference on Computer Vision - Workshop on Dynamic Shape Capture and Analysis (4DMOD 2013)pp. 307-314
    We present a method for reconstructing the geometry and appearance of indoor scenes containing dynamic human subjects using a single (optionally moving) RGBD sensor. We introduce a framework for building a representation of the articulated scene geometry as a set of piecewise rigid parts which are tracked and accumulated over time using moving voxel grids containing a signed distance representation. Data association of noisy depth measurements with body parts is achieved by online training of a prior shape model for the specific subject. A novel frame-to-frame model registration is introduced which combines iterative closest-point with additional correspondences from optical flow and prior pose constraints from noisy skeletal tracking data. We quantitatively evaluate the reconstruction and tracking performance of the approach using a synthetic animated scene. We demonstrate that the approach is capable of reconstructing mid-resolution surface models of people from low-resolution noisy data acquired from a consumer RGBD camera. © 2013 IEEE.
    M Fastovets, J-Y Guillemaut, A Hilton (2014)Athlete pose estimation by non-sequential key-frame propagation., In: CVMPpp. 3:1-3:1
    A Hilton, Jean-Yves Guillemaut, J Kilner, O Grau, G Thomas (2010)Free-Viewpoint Video for TV Sport Production, In: Image and Geometry Processing for 3-D Cinematography5 Springer
    J-Y Guillemaut, J Kittler, MT Sadeghi, WJ Christmas (2006)General pose face recognition using frontal face model, In: PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS4225pp. 79-88
    J Kilner, J Starck, Jean-Yves Guillemaut, A Hilton (2009)Objective Quality Assessment in Free-viewpoint Video Production, In: Signal Processing: Image Communication24(1-2)pp. 3-16 Elsevier
    Jean-Yves Guillemaut, J Kilner, J Starck, Adrian Hilton (2007)Dynamic feathering: Minimising blending artefacts in view-dependent rendering, In: IET Conference Publications534(534 CP)
    Conventional view-dependent texture mapping techniques produce composite images by blending subsets of input images, weighted according to their relative influence at the rendering viewpoint, over regions where the views overlap. Geometric or camera calibration errors often result in a loss of detail due to blurring or double exposure artefacts which tends to be exacerbated by the number of blending views considered. We propose a novel view-dependent rendering technique which optimises the blend region dynamically at rendering time, and reduces the adverse effects of camera calibration or geometric errors otherwise observed. The technique has been successfully integrated in a rendering pipeline which operates at interactive frame rates. Improvements over state-of-the-art view-dependent texture mapping techniques are illustrated on a synthetic scene as well as real imagery of a large scale outdoor scene where large camera calibration and geometric errors are present.
    C Budd, J Guillemaut, M Klaudiny, A Hilton (2012)Scene Modelling for Richer Media Content
    M Brown, Jean-Yves Guillemaut, D Windridge (2015)A Saliency-based Framework for 2D-3D Registration, In: Proc. International Conference on Computer Vision Theory and Applications (VISAPP 2014)
    Here we propose a saliency-based filtering approach to the problem of registering an untextured 3D object to a single monocular image. The principle of saliency can be applied to a range of modalities and domains to find intrinsically descriptive entities from amongst detected entities, making it a rigorous approach to multi-modal registration. We build on the Kadir-Brady saliency framework due to its principled information-theoretic approach which enables us to naturally extend it to the 3D domain. The salient points from each domain are initially aligned using the SoftPosit algorithm. This is subsequently refined by aligning the silhouette with contours extracted from the image. Whereas other point-based registration algorithms focus on corners or straight lines, our saliency-based approach is more general as it is more widely applicable, e.g. to curved surfaces where a corner detector would fail. We compare our salient point detector to the Harris corner and SIFT keypoint detectors and show it generally achieves superior registration accuracy.
    M Sarim, A Hilton, J-Y Guillemaut (2009)Alpha Matte Estimation of Natural Images Using Local and Global Template Correspondence, In: ICET: 2009 INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES, PROCEEDINGSpp. 229-234
    M Sarim, A Hilton, Jean-Yves Guillemaut (2009)WIDE-BASELINE MATTE PROPAGATION FOR INDOOR SCENES, In: 2009 CONFERENCE FOR VISUAL MEDIA PRODUCTION: CVMP 2009pp. 195-204
    This paper presents a method to estimate alpha mattes for video sequences of the same foreground scene from wide-baseline views given sparse key-frame trimaps in a single view. A statistical inference framework is introduced for spatio-temporal propagation of high-confidence trimap labels between video sequences without a requirement for correspondence or camera calibration and motion estimation. Multiple view trimap propagation integrates appearance information between views and over time to achieve robust labelling in the presence of shadows, changes in appearance with view point and overlap between foreground and background appearance. Results demonstrate that trimaps are sufficiently accurate to allow high-quality video matting using existing single view natural image matting algorithms. Quantitative evaluation against ground-truth demonstrates that the approach achieves accurate matte estimation for camera views separated by up to 180°, with the same amount of manual interaction required for conventional single view video matting.
    J Kilner, J-Y Guillemaut, A Hilton (2010)Summarised hierarchical Markov models for speed-invariant action matching, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009pp. 1065-1072
    Action matching, where a recorded sequence is matched against, and synchronised with, a suitable proxy from a library of animations, is a technique for generating a synthetic representation of a recorded human activity. This proxy can then be used to represent the action in a virtual environment or as a prior on further processing of the sequence. In this paper we present a novel technique for performing action matching in outdoor sports environments. Outdoor sports broadcasts are typically multi-camera environments and as such reconstruction techniques can be applied to the footage to generate a 3D model of the scene. However due to poor calibration and matting this reconstruction is of a very low quality. Our technique matches the 3D reconstruction sequence against a predefined library of actions to select an appropriate high quality synthetic representation. A hierarchical Markov model combined with 3D summarisation of the data allows a large number of different actions to be matched successfully to the sequence in a rate-invariant manner without prior segmentation of the sequence into discrete units. The technique is applied to data captured at rugby and soccer games. ©2009 IEEE.
    M Sarim, A Hilton, J Guillemaut (2009)Non-parametric patch based video matting
    In computer vision, matting is the process of accurate foreground estimation in images and videos. In this paper we present a novel patch-based approach to video matting relying on non-parametric statistics to represent image variations in appearance. This overcomes the limitation of parametric algorithms which only rely on strong colour correlation between the nearby pixels. Initially we construct a clean background by utilising the foreground object’s movement across the background. For a given frame, a trimap is constructed using the background and the last frame’s trimap. A patch-based approach is used to estimate the foreground colour for every unknown pixel and finally the alpha matte is extracted. Quantitative evaluation shows that the technique performs better, in terms of the accuracy and the required user interaction, than the current state-of-the-art parametric approaches.
    D Casas, M Tejera, J-Y Guillemaut, A Hilton (2011)Parametric control of captured mesh sequences for real-time animation, In: Lecture Notes in Computer Science: Motion in Games7060pp. 242-253
    In this paper we introduce an approach to high-level parameterisation of captured mesh sequences of actor performance for real-time interactive animation control. High-level parametric control is achieved by non-linear blending between multiple mesh sequences exhibiting variation in a particular movement. For example, walking speed is parameterised by blending fast and slow walk sequences. A hybrid non-linear mesh sequence blending approach is introduced to approximate the natural deformation of non-linear interpolation techniques whilst maintaining the real-time performance of linear mesh blending. Quantitative results show that the hybrid approach gives an accurate real-time approximation of offline non-linear deformation. Results are presented for single and multi-dimensional parametric control of walking (speed/direction), jumping (height/distance) and reaching (height) from captured mesh sequences. This approach allows continuous real-time control of high-level parameters such as speed and direction whilst maintaining the natural surface dynamics of captured movement.
    Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton (2018)4D Temporally Coherent Light-field Video, In: 3DV 2017 Proceedings IEEE
    Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing compared to conventional video. In this paper, we propose the first method to extract a spatio-temporally coherent light-field video representation. A novel method to obtain Epipolar Plane Images (EPIs) from a sparse light-field camera array is proposed. EPIs are used to constrain scene flow estimation to obtain 4D temporally coherent representations of dynamic light-fields. Temporal coherence is achieved on a variety of light-field datasets. Evaluation of the proposed light-field scene flow against existing multiview dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.
    D Casas, M Tejera, J-Y Guillemaut, A Hilton (2012)Parametric animation of performance-captured mesh sequences, In: COMPUTER ANIMATION AND VIRTUAL WORLDS23(2)pp. 101-111 WILEY-BLACKWELL
    M Sarim, A Hilton, J-Y Guillemaut, H Kim Non-parametric Natural Image Mattingpp. 3213-3216
    Nadejda Roubtsova, Jean-Yves Guillemaut (2016)Colour Helmholtz Stereopsis for Reconstruction of Dynamic Scenes with Arbitrary Unknown Reflectance, In: International Journal of Computer Vision124pp. 18-48 Springer
    Helmholtz Stereopsis is a powerful technique for reconstruction of scenes with arbitrary reflectance properties. However, previous formulations have been limited to static objects due to the requirement to sequentially capture reciprocal image pairs (i.e. two images with the camera and light source positions mutually interchanged). In this paper, we propose Colour Helmholtz Stereopsis - a novel framework for Helmholtz Stereopsis based on wavelength multiplexing. To address the new set of challenges introduced by multispectral data acquisition, the proposed Colour Helmholtz Stereopsis pipeline uniquely combines a tailored photometric calibration for multiple camera/light source pairs, a novel procedure for spatio-temporal surface chromaticity calibration and a state-of-the-art Bayesian formulation necessary for accurate reconstruction from a minimal number of reciprocal pairs. In this framework, reflectance is spatially unconstrained both in terms of its chromaticity and the directional component dependent on the illumination incidence and viewing angles. The proposed approach for the first time enables modelling of dynamic scenes with arbitrary unknown and spatially varying reflectance using a practical acquisition set-up consisting of a small number of cameras and light sources. Experimental results demonstrate the accuracy and flexibility of the technique on a variety of static and dynamic scenes with arbitrary unknown BRDF and chromaticity ranging from uniform to arbitrary and spatially varying.
    M Sarim, A Hilton, J-Y Guillemaut, H Kim, T Takai (2010)Wide-Baseline Multi-View Video Segmentation For 3D Reconstruction, In: Proceedings of the 1st international workshop on 3D video processingpp. 13-16
    Obtaining a foreground silhouette across multiple views is one of the fundamental steps in 3D reconstruction. In this paper we present a novel video segmentation approach to obtain a foreground silhouette for scenes captured by a wide-baseline camera rig, given sparse manual interaction in a single view. The algorithm is based on trimap propagation, a framework used in video matting. Bayesian inference coupled with camera calibration information is used to spatio-temporally propagate high-confidence trimap labels across the multi-view video to obtain coarse silhouettes, which are later refined using a matting algorithm. Recent techniques have been developed for foreground segmentation in multiple views based on image matting, but they are limited to narrow baselines with low foreground variation. The proposed wide-baseline silhouette propagation is robust to inter-view foreground appearance changes, shadows and similarity in foreground/background appearance. The approach has demonstrated good performance in silhouette estimation for views up to a 180-degree baseline (opposing views). The segmentation technique has been fully integrated in a multi-view reconstruction pipeline. The results obtained demonstrate the suitability of the technique for multi-view reconstruction with wide-baseline camera set-ups and natural background.
    J Kilner, J-Y Guillemaut, A Hilton (2010)3D action matching with key-pose detection, In: 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009pp. 1-8
    This paper addresses the problem of human action matching in outdoor sports broadcast environments, by analysing 3D data from a recorded human activity and retrieving the most appropriate proxy action from a motion capture library. Typically pose recognition is carried out using images from a single camera; however, this approach is sensitive to occlusions and restricted fields of view, both of which are common in the outdoor sports environment. This paper presents a novel technique for the automatic matching of human activities which operates on the 3D data available in a multi-camera broadcast environment. Shape is retrieved using multi-camera techniques to generate a 3D representation of the scene. Use of 3D data renders the system camera-pose-invariant and allows it to work while cameras are moving and zooming. By comparing the reconstructions to an appropriate 3D library, action matching can be achieved in the presence of significant calibration and matting errors which cause traditional pose detection schemes to fail. An appropriate feature descriptor and distance metric are presented as well as a technique to use these features for key-pose detection and action matching. The technique is then applied to real footage captured at an outdoor sporting event. ©2009 IEEE.
    A Neophytou, J-Y Guillemaut, A Hilton (2015)A dense surface motion capture system for accurate acquisition of cloth deformation, In: CVMP 2015: PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON VISUAL MEDIA PRODUCTION
    M Brown, D Windridge, J Guillemaut (2015)Globally Optimal 2D-3D Registration from Points or Lines Without Correspondences, In: Proceedings of International Conference on Computer Vision (ICCV 2015)
    We present a novel approach to 2D-3D registration from points or lines without correspondences. While there exist established solutions in the case where correspondences are known, there are many situations where it is not possible to reliably extract such correspondences across modalities, thus requiring the use of a correspondence-free registration algorithm. Existing correspondence-free methods rely on local search strategies and consequently have no guarantee of finding the optimal solution. In contrast, we present the first globally optimal approach to 2D-3D registration without correspondences, achieved by a Branch-and-Bound algorithm. Furthermore, a deterministic annealing procedure is proposed to speed up the nested branch-and-bound algorithm used. The theoretical and practical advantages this brings are demonstrated on a range of synthetic and real data where it is observed that the proposed approach is significantly more robust to high proportions of outliers compared to existing approaches.
    H Kim, M Sarim, T Takai, J-Y Guillemaut, A Hilton (2010)Dynamic 3D Scene Reconstruction in Outdoor Environments, In: Proc. IEEE Symp. on 3D Data Processing and Visualization
    A number of systems have been developed for dynamic 3D reconstruction from multiple view videos over the past decade. In this paper we present a system for multiple view reconstruction of dynamic outdoor scenes, transferring studio technology to uncontrolled environments. A synchronised portable multiple camera system is composed of off-the-shelf HD cameras for dynamic scene capture. For foreground extraction, we propose a multi-view trimap propagation method which is robust against dynamic changes in appearance between views and over time. This allows us to apply state-of-the-art natural image matting algorithms to multi-view sequences with minimal interaction. Optimal 3D surfaces of the foreground models are reconstructed by integrating multi-view shape cues and features. For background modelling, we use a line scan camera with a fish eye lens to capture a full environment with high resolution. The environment model is reconstructed from a spherical stereo image pair with sub-pixel correspondence. Finally, the foreground and background models are merged into a 3D world coordinate system and the composite model is rendered from arbitrary viewpoints. We show that the proposed system generates high quality scene images with dynamic virtual camera actions.
    J-Y Guillemaut, J Kilner, A Hilton (2009)Robust Graph-Cut Scene Segmentation and Reconstruction for Free-Viewpoint Video of Complex Dynamic Scenes, In: 2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)pp. 809-816
    M Sarim, JY Guillemaut, H Kim, A Hilton (2009)Wide-baseline Image Matting, In: European Conference on Visual Media Production(CVMP)
    D Casas, M Tejera, Jean-Yves Guillemaut, A Hilton (2013)Interactive Animation of 4D Performance Capture, In: IEEE Trans. Vis. Comput. Graph.19(5)pp. 762-773
    J-Y Guillemaut, A Hilton, J Starck, J Kilner, O Grau (2007)A Bayesian framework for simultaneous matting and 3D reconstruction, In: 3DIM 2007: Sixth International Conference on 3-D Digital Imaging and Modeling, Proceedingspp. 167-174
    In recent years the Digital Film Production process has seen a huge increase in the amount of data captured, resulting in the need for automated tools within the pipeline. In particular, it typically involves the capture of multi-modal data such as 3D Light Detection And Ranging (LiDAR) scans, 2D images and videos, whose alignment and registration provide valuable information within the production process. There are significant challenges posed in this particular multi-modal registration problem that are not faced in the majority of feature-based registration pipelines. In particular, many existing feature detectors make modality-specific assumptions about the attributes a good, repeatable feature should possess, and as a result cannot be applied in a general, multi-modal manner. To combat this we take a saliency-based approach to feature detection that may be more meaningfully applied across modalities than other feature detectors. Furthermore, by extracting only the most salient features of a scene, significantly fewer features are obtained, resulting in a lower computational cost for the registration process. The first contribution of this thesis is a generalisation of the Kadir-Brady salient point detector. The generalisation allows for both a more robust alternative for 2D images, and a 3D extension, where in particular it may operate on both the geometry and texture of the scene. As a result, it allows for more meaningful multi-modal feature detection, and higher repeatability results are observed when compared to existing 2D-3D point feature detectors. The second contribution is the proposal of a novel salient line segment detector. By explicitly accounting for the surroundings of a line, the approach naturally avoids repetitive parts of a scene while detecting the strong, discriminative lines present. 
Its general, histogram-based framework allows for a natural extension to depth imagery and 3D, where lines are detected based jointly on both texture and geometry. The final contribution is centred around the registration phase, where a globally optimal solution to 2D-3D registration from points or lines based on a Branch-and-Bound (BnB) approach is proposed. Novel search procedures are proposed to speed up the algorithm, taking advantage of the special nested BnB structure used. The optimality properties of the proposed approach allow 2D-3D registration to be achieved for significantly higher rates of outliers compared to existing approaches.
    This thesis presents work on athlete pose estimation in single-view broadcast videos. Human pose estimation is an important problem in computer vision and has received much interest in the research community due to the wide range of applications. This thesis presents a novel framework for the semi-automatic estimation of human pose in television quality sports footage. The focus is on achieving accurate pose estimation results on sports video sequences, with the assistance of a human operator in a broadcast studio setting, that can be used to drive post-action analysis and graphical overlays. A method for extracting and tracking off-the-shelf scale-invariant features on athletes is tested. Evaluation shows that such features are ill-suited for tracking articulated motion due to drift, data association, and a general lack of stable features to track. A keyframe-driven approach, inspired by the Pictorial Structures model, is developed for estimating 2D pose of athletes in sports sequences. This approach models the human body as a tree of loosely linked parts and introduces a temporal smoothness term aimed at ensuring temporal consistency of pose throughout the sequence. The evaluation demonstrates that such an approach is able to extract human pose in such videos, but requires a significant amount of manual interaction to do so with the accuracy required for broadcast settings. A novel non-sequential method for maximising benefit from manually annotated keyframe poses using minimum spanning trees is developed. The developed algorithm serves two purposes: keyframe selection, and keyframe information propagation. Optimal keyframes are automatically selected and suggested to the operator for labelling. Once labelled, information from these keyframes is propagated throughout the sequence and automatically generated keyframes are created in visually similar frames. 
Qualitative and quantitative evaluation demonstrates an increase in accuracy and a decrease in the number of required keyframes. Finally, a geometric method for converting 2D poses into 3D is developed. The algorithm assumes a weak perspective projection for the video sequence and known relative limb lengths for the athlete, and is able to recover the relative scale given at least three labelled keyframes by solving a continuous optimisation problem. Evaluation against a baseline geometric method shows improved stability and lower residual error.
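The geometric constraint behind lifting a 2D pose to 3D under weak perspective can be sketched for a single limb. This is a minimal sketch, assuming image coordinates are related to 3D coordinates by a single scale factor; the function name `limb_depth_offset` and its parameters are hypothetical. It shows only the per-limb foreshortening relation, not the thesis's continuous optimisation for recovering the scale from keyframes.

```python
import math


def limb_depth_offset(du, dv, limb_length, scale):
    """Magnitude of the relative depth between the two joints of one limb.

    Under weak perspective, image displacement (du, dv) equals scale times
    the 3D displacement in X and Y, so
        (du / scale)^2 + (dv / scale)^2 + dz^2 = limb_length^2.
    The sign of dz (towards or away from the camera) is not recoverable
    from this relation alone.
    """
    squared = limb_length ** 2 - (du ** 2 + dv ** 2) / scale ** 2
    if squared < 0:
        # The projected limb is longer than the assumed scale allows.
        raise ValueError("scale too small for the observed limb projection")
    return math.sqrt(squared)
```

For instance, a limb of length 5 whose projection spans 3 image units at unit scale has a relative depth of 4. A projection that is too long for the assumed scale raises an error, which is why the scale has to be estimated consistently across the sequence.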
    Autonomous 3D reconstruction, the process whereby an agent can produce its own representation of the world, is an extremely challenging area in both vision and robotics. However, 3D reconstructions have the ability to grant robots the understanding of the world necessary for collaboration and high-level goal execution. Therefore, this thesis aims to explore methods that will enable modern robotic systems to autonomously and collaboratively achieve an understanding of the world. In the real world, reconstructing a 3D scene requires nuanced understanding of the environment. Additionally, it is not enough to simply “understand” the world; autonomous agents must be capable of actively acquiring this understanding. Achieving all of this using simple monocular sensors is extremely challenging. Agents must be able to understand what areas of the world are navigable, how egomotion affects reconstruction and how other agents may be leveraged to provide an advantage. All of this must be considered in addition to the traditional 3D reconstruction issues of correspondence estimation, triangulation and data association. Simultaneous Localisation and Mapping (SLAM) solutions are not particularly well suited to autonomous multi-agent reconstruction. They typically require the sensors to be in constant communication, do not scale well with the number of agents (or map size) and require expensive optimisations. Instead, this thesis attempts to develop more pro-active techniques from the ground up. First, an autonomous agent must have the ability to actively select what it is going to reconstruct. Known as view-selection, or Next-Best View (NBV), this has recently become an active topic in autonomous robotics and will form the first contribution of this thesis. Second, once a view is selected, an autonomous agent must be able to plan a trajectory to arrive at that view. 
This problem, known as path-planning, can be considered a core topic in the robotics field and will form the second contribution of this thesis. Finally, the 3D reconstruction must be anchored to a globally consistent map that co-relates to the real world. This will be addressed as a floorplan localisation problem, an emerging field for the vision community, and will be the third contribution of this thesis. To give autonomous agents the ability to actively select what data to process, this thesis discusses the NBV problem in the context of Multi-View Stereo (MVS). The proposed approach has the ability to massively reduce the amount of computing resources required for any given 3D reconstruction. More importantly, it autonomously selects the views that improve the reconstruction the most. All of this is done exclusively on the sensor pose; the images are not used for view-selection and only loaded into memory once they have been selected for reconstruction. Experimental evaluation shows that NBV applied to this problem can achieve results comparable to state-of-the-art using as little as 3.8% of the views. To provide the ability to execute an autonomous 3D reconstruction, this thesis proposes a novel computer-vision based goal-estimation and path-planning approach. The method proposed in the previous chapter is extended into a continuous pose-space. The resulting view then becomes the goal of a Scenic Pathplanner that plans a trajectory between the current robot pose and the NBV. This is done using an NBV-based pose-space that biases the paths towards areas of high information gain. Experimental evaluation shows that the Scenic Planning enables similar performance to state-of-the-art batch approaches using less than 3% of the views, which corresponds to 2.7 × 10⁻⁴% of the possible stereo pairs (using a naive interpretation of plausible stereo pairs). 
Comparison against length-based path-planning approaches shows that the Scenic Pathplanner produces more complete and more accurate maps with fewer frames. The ability of the Scenic Pathplanner to generalise to live scenarios is also demonstrated using low-cost robotic platforms. Finally, to allow global consistency and provide a basis for indoor robot-human interaction, this thesis proposes a novel human-inspired floorplan localisation approach. This method uses the intuition that humans use semantic cues, such as doors and windows, to localise within a floorplan. These semantic cues are extracted from an RGB image, presented as a novel sensor modality called Semantic Detection and Ranging (SeDAR) and used as observations within a Monte-Carlo Localisation (MCL) framework. Experimental evaluation shows that SeDAR-based MCL has the ability to outperform state-of-the-art MCL when using range measurements. It is also demonstrated that the semantic cues are sufficient for localisation, as this approach achieves results comparable to state-of-the-art without range measurements. When combined, these contributions provide solutions to some of the most fundamental issues facing autonomous and collaborative robots. They advance the fields of 3D Reconstruction, Path-planning and Localisation by allowing autonomous agents to reconstruct complex scenes. The field of 3D reconstruction is advanced by demonstrating that intelligent view selection is capable of drastically improving performance of established methods. The field of Path-planning is advanced by establishing that pro-active behaviours can be encoded into low-cost robotics, such that high-level goals result in emergent strategies for collaboration. Finally, the field of Localisation is advanced by validating that human-inspired localisation based on distinctive semantic landmarks is an effective alternative to traditional scan-matching. 
The experiments in this thesis demonstrate that autonomous agents can navigate unknown complex scenes using simple monocular cameras. This thesis lays the foundation for autonomous, collaborative 3D reconstruction that goes beyond simple SLAM-based solutions and enables high-level collaboration towards a common goal.
    Shape information has been recognised as playing a role in intrinsic image estimation since its inception. However, it is only in recent years that hints of the importance of geometry have been found in decomposing surface appearance into albedo and shading estimates. This thesis establishes the central importance of shape in intrinsic surface property estimation for static and dynamic scenes, and introduces methods for the use of approximate shape in a wide range of related problems to provide high-level constraints on shading. A key contribution is intrinsic texture estimation. This is a generalisation of intrinsic image estimation, in which appearance is processed as a function of surface position rather than pixel position. This approach has numerous advantages, in that the shape can be used to resolve occlusion, inter-reflection and attached shading as a natural part of the method. Unlike previous bidirectional texture function estimation approaches, high-quality albedo and shading textures are produced without prior knowledge of materials or lighting. Many of the concepts in intrinsic texture estimation can be extended to single-viewpoint capture for which depth information is available. Depth information greatly reduces the ambiguity of the shading estimation problem, allowing online intrinsic video to be developed for the first time. The availability of a lighting function also allows high-level temporal constraints on shading to be applied over video sequences, which previously required per-pixel correspondence between frames to be established. A number of applications of intrinsic video are investigated, including augmented reality, video stylisation and relighting, all of which run at interactive framerates. The albedo distribution of the input video is preserved, even in the case of natural scenes with complex appearance, and a globally-consistent shading estimate is obtained which remains robust over dynamic sequences. 
Finally, an integrated framework bridging the gaps between intrinsic image, video and texture estimation is presented for the first time. Approximate scene geometry provides a convenient means of achieving this, and is used in establishing pixel constraints between adjacent cameras, reconstructing scene lighting, and removing cast shadows and inter-reflections. This introduces a unified geometry-based approach to intrinsic image estimation and related fields, which achieves high-quality results for complex natural scenes for a wide range of capture modalities.
    Recent developments in 3D media technology have brought to life numerous applications of interactive entertainment such as 3D cinema, 3DTV and gaming. Due to the data intensive nature of 3D visual content, Quality of Experience (QoE) has become a major driving factor to optimise the end-to-end content delivery process. However, to ensure the QoE, there is a need to develop more robust and accurate objective metrics for stereoscopic image and video quality assessment. Existing stereoscopic QoE metrics tend to lack accuracy and robustness compared to their 2D counterparts as they are either extensions of 2D metrics or are based on simple perceptual models. However, measuring stereoscopic QoE requires more perceptually inspired metrics. This research introduces full-reference stereoscopic image and video quality metrics based on a Human Visual System (HVS) model incorporating important physiological findings on binocular vision. Firstly, a novel HVS model extending existing models in the literature is proposed to include the phenomena of binocular suppression and recurrent excitation towards stereoscopic image quality assessment. Secondly, the research is extended to the temporal domain using temporal pooling of the HVS model outputs for individual frames and using a spatio-temporal model in the HVS model towards two distinct temporally inspired stereoscopic video quality metrics. Finally, motion sensitivity is introduced to the HVS model towards a perception inspired stereoscopic video quality metric. The proposed QoE metrics are trained, verified and tested using four publicly available stereoscopic image databases and two stereoscopic video datasets. They indicate an increase of average correlation index from 0.66 (baseline method) to 0.86 for the stereoscopic images and a maximum increase of average correlation index from 0.57 (baseline method) to 0.93 for stereoscopic videos. 
These results demonstrate the benefits of using a perceptually inspired approach in this research.
    High resolution brain magnetic resonance (MR) images acquired at multiple time points across the treatment of a patient allow the quantification of localised changes brought about by disease progression. The aim of this thesis is to address the challenge of performing automatic longitudinal analysis of magnetic resonance imaging (MRI) in paediatric brain tumours. The first contribution in this thesis is the validation of a semi-automated segmentation technique. This technique was applied to intra-operative MR images acquired during the surgical resection of hypothalamic tumours in children, in order to assess the volume of tumour resected at different stages of the surgical procedure. The second contribution in this thesis is the quantification of a rare condition known as hypertrophic olivary degeneration (HOD) in lobes within the brain known as inferior olivary nuclei (ION) in relation to the development of posterior fossa syndrome (PFS) following tumour resection in the hind brain. The change in grey-level intensity over time in the left ION has been identified as a suitable biomarker that correlates with the occurrence of posterior fossa syndrome following tumour resection surgery. This study demonstrates the application of machine learning techniques to T2 brain MR images. The third contribution presents a novel approach to longitudinal brain MR analysis, focusing on the cerebellum and brain stem. This contribution presents a technique developed to interpolate multi-slice 2D MR image slices of the brain stem and cerebellum both to infill gaps between slices as well as longitudinally over time, that is, in four-dimensional space. This study also investigates the application of machine learning techniques directly to the MR images. Another novel method developed in this study is the Jacobian of deformations in the brain over time, and its use as an imaging feature. 
Unlike the previous contribution chapter, the third contribution is not hypothesis-driven, and automatically detects six potential biomarkers that are related to the development of PFS following tumour resection in the posterior fossa. The limited number of patients considered in each study posed a major challenge. This has prompted the use of multiple validation techniques in order to provide accurate results despite the small dataset. These techniques are presented in the second and third contribution chapters.
    Accurate 3D geometry modelling is an essential technology for many practical applications (computer generated imagery, assisted surgery, heritage preservation, automated quality control, robotics etc.). While the existing reconstruction methods mainly operate assuming the simplistic Lambertian model, real scenes, static or dynamic, are characterised by arbitrarily complex a priori unknown reflectance properties. The reflectance limitation of the state-of-the-art causes a gap between the practical demand for photometrically arbitrary scene modelling and the constrained applicability scope of existing methods. In response to the gap, this dissertation proposes a solution to the challenging problem of accurate geometric reconstruction of dynamic scenes with arbitrary a priori unknown reflectance. This is achieved by introducing a novel approach which generalises Helmholtz Stereopsis (HS) - a niche technique known to be independent of surface reflectance but until now limited to static scenes requiring sequential acquisition of a large number of input views. The undertaken generalisation extends the technique to dynamic scenes by two mutually tailored developments in response to the shortcomings of conventional HS. These developments are 1) a framework to fundamentally improve the geometric reconstruction accuracy from a small set of input images and 2) the design of a novel wavelength-multiplexing-based pipeline for dynamic scene modelling. Together these constitute a novel practical system which, for the first time, enables reconstruction of dynamic scenes with arbitrary surface properties. To improve the quality of geometric reconstruction by HS, a novel Bayesian formulation of the technique is proposed to replace its sub-optimal maximum likelihood formulation. Further, a tailored prior enforcing consistency of per-point depth and normal estimates and related to integrability is developed. 
The prior purposely exploits the unique ability of HS to characterise the surface by both estimates. The formulation embedded into a coarse-to-fine framework without explicit surface integration achieves unprecedented accuracy and resolution of geometric modelling by HS regardless of reflectance, competitive with what the non-HS state-of-the-art achieves with strictly constrained reflectance. To generalise HS to dynamic scenes, Colour Helmholtz Stereopsis (CL HS) is proposed which utilises wavelength multiplexing for simultaneous acquisition of the minimal set of input images required for reconstruction. The challenges imposed by wavelength multiplexing in CL HS are addressed using a specially designed calibration consisting of two mutually dependent parts: one infers the photometric properties of the acquisition equipment while the other estimates the reconstructed surface chromaticity spatially and propagates it temporally to accommodate dynamic surface deformation. By integrating the proposed coarse-to-fine Bayesian HS with integrability prior into CL HS, remarkable accuracy and resolution of reconstruction are achieved with the minimal input using just three RGB cameras. Evaluation validates the approach by reconstruction of dynamic scenes with arbitrary a priori unknown reflectance, which includes unconstrained spatially varying chromaticity. The reconstructed dynamic sequences exhibit high per-frame geometric accuracy and resolution as well as temporal consistency.