My publications


Fowler S, Kim H, Hilton A (2017) Towards Complete Scene Reconstruction from Single-View Depth and Human Motion, Proceedings of the 28th British Machine Vision Conference (BMVC 2017)
Complete scene reconstruction from single view RGBD is a challenging task, requiring estimation of scene regions occluded from the captured depth surface. We propose that scene-centric analysis of human motion within an indoor scene can reveal fully occluded objects and provide functional cues to enhance scene understanding tasks. Captured skeletal joint positions of humans, utilised as naturally exploring active sensors, are projected into a human-scene motion representation. Inherent body occupancy is leveraged to carve a volumetric scene occupancy map initialised from captured depth, revealing a more complete voxel representation of the scene. To obtain a structured box model representation of the scene, we introduce unique terms to an object detection optimisation that overcome depth occlusions whilst deriving from the same depth data. The method is evaluated on challenging indoor scenes with multiple occluding objects such as tables and chairs. Evaluation shows that human-centric scene analysis can be applied to effectively enhance state-of-the-art scene understanding approaches, resulting in a more complete representation than single view depth alone.
Fowler S, Kim H, Hilton A (2018) Human-Centric Scene Understanding from Single View 360 Video, 2018 International Conference on 3D Vision (3DV), pp. 334-342, Institute of Electrical and Electronics Engineers (IEEE)
In this paper, we propose an approach to indoor scene understanding from observation of people in single view spherical video. As input, our approach takes a centrally located spherical video capture of an indoor scene, estimating the 3D localisation of human actions performed throughout the long term capture. The central contribution of this work is a deep convolutional encoder-decoder network trained on a synthetic dataset to reconstruct regions of affordance from captured human activity. The predicted affordance segmentation is then applied to compose a reconstruction of the complete 3D scene, integrating the affordance segmentation into 3D space. The mapping learnt between human activity and affordance segmentation demonstrates that omnidirectional observation of human activity can be applied to scene understanding tasks such as 3D reconstruction. We show that our approach using only observation of people performs well against previous approaches, allowing reconstruction of occluded regions and labelling of scene affordances.