Person tracking is an often-studied facet of computer vision, with applications in security, automated driving and entertainment. However, despite the advantages 360° cameras offer, few current solutions support them, due to projection distortion. This paper presents a simple yet robust method for 3D tracking of multiple people in a scene from a pair of 360° cameras. By using 2D pose information, rather than potentially unreliable 3D position or repeated colour information, we create a tracker that is both appearance-independent and capable of operating at narrow baseline. Our results demonstrate state-of-the-art performance on 360° scenes, as well as the capability to handle vertical-axis rotation.
This paper presents a 3D human pose estimation system that uses a stereo pair of 360° sensors to capture the complete scene from a single location. The approach combines the advantages of omnidirectional capture, the accuracy of multiple-view 3D pose estimation and the portability of monocular acquisition. Monocular belief maps for joint locations are estimated from 360° images and are used to fit a 3D skeleton to each frame. Temporal data association and smoothing are performed to produce accurate 3D pose estimates throughout the sequence. We evaluate our system on the Panoptic Studio dataset, as well as real 360° video for tracking multiple people, demonstrating an average Mean Per Joint Position Error of 12.47 cm with 30 cm baseline cameras. We also demonstrate improved capabilities over perspective and 360° multi-view systems when presented with limited camera views of the subject.
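The Mean Per Joint Position Error (MPJPE) reported above is the standard 3D pose metric: the Euclidean distance between each predicted and ground-truth joint, averaged over joints (and over frames). A minimal sketch, assuming joints are given as (num_joints, 3) arrays in a shared coordinate frame and common units (the function name and array shapes here are illustrative, not taken from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error.

    pred, gt: arrays of shape (num_joints, 3) holding predicted and
    ground-truth 3D joint positions in the same units (e.g. cm).
    Returns the mean Euclidean distance over all joints.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Per-joint Euclidean distance, then average over joints.
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Example: every predicted joint offset by a (3, 4, 0) vector,
# so each per-joint error is 5.0 and the mean is 5.0.
gt = np.zeros((17, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))
```

Sequence-level figures such as the 12.47 cm reported here are typically obtained by averaging this per-frame value over all frames and subjects.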