This paper presents a hybrid skeleton-driven surface registration
(HSDSR) approach to generate temporally consistent
meshes from multiple view video of human subjects.
2D pose detections from multiple view video are used to
estimate 3D skeletal pose on a per-frame basis. The 3D
pose is embedded into a 3D surface reconstruction allowing
any frame to be reposed into the shape from any other
frame in the captured sequence. Skeletal motion transfer
is performed by selecting a reference frame from the surface
reconstruction data and reposing it to match the pose
estimation of other frames in a sequence. This allows an
initial coarse alignment to be performed prior to refinement
by a patch-based non-rigid mesh deformation. The
proposed approach overcomes limitations of previous work
by reposing a reference mesh to match the pose of a target
mesh reconstruction, providing a closer starting point
for further non-rigid mesh deformation. It is shown that the
proposed approach is able to achieve comparable results to
existing model-based and model-free approaches. Finally,
it is demonstrated that this framework provides an intuitive
way for artists and animators to edit volumetric video.
4D human performance capture aims to create volumetric representations of observed human subjects performing arbitrary motions with the ability to replay and render dynamic scenes with the realism of the recorded video. This representation has the potential to enable highly realistic content production for immersive virtual and augmented reality experiences. Human models are typically rendered using detailed, explicit 3D models, which consist of meshes and textures, and animated using tailored motion models to simulate human behaviour and activity. However, designing a realistic 3D human model is still a costly and laborious process. Hence, this work investigates techniques to learn models of human body shape and appearance, aiming to facilitate the generation of highly realistic human animation, and demonstrate its potential contributions, applications, and versatility.
The first contribution of this work is a skeleton driven surface registration approach to generate temporally consistent meshes from multi-view video of human subjects. 2D pose detections from multi-view video are used to estimate 3D skeletal pose on a per-frame basis, which allows a reference frame to match the pose estimation of other frames in a sequence. This allows an initial coarse alignment followed by a patch-based non-rigid mesh deformation to generate temporally consistent mesh sequences.
The second contribution presents techniques to represent human-like shape using a compressed learnt model from 4D volumetric performance capture data. Sequences of 4D dynamic geometry representing a human are encoded with a generative network into a compact space representation, whilst maintaining the original properties, such as surface non-rigid deformations. This compact representation enables synthesis, interpolation and generation of 3D shapes.
The third contribution is Deep4D generative network that is capable of compact representation of 4D volumetric video sequences from skeletal motion of people with two orders of magnitude compression. A variational encoder-decoder is employed to learn an encoded latent space that maps from 3D skeletal pose to 4D shape and appearance. This enable high-quality 4D volumetric video synthesis to be driven by skeletal animation.
Finally, this thesis introduces Deep4D motion graph to implicitly combine multiple captured motions in a unified representation for character animation from volumetric video, allowing novel character movements to be generated with dynamic shape and appearance detail. Deep4D motion graphs allow character animation to be driven by skeletal motion sequences providing a compact encoded representation capable of high-quality synthesis of the 4D volumetric video with two orders of magnitude compression.