A typical high-end film production generates several terabytes of data per day, either as footage from multiple cameras or as background information regarding the set (laser scans, spherical captures, etc.). This paper presents solutions that improve the integration of these multiple data sources and the understanding of their quality; the data are used both to support creative decisions on set (or near it) and to enhance the post-production process. The main contributions covered in this paper are: a public multisource production dataset made available for research purposes, monitoring and quality assurance of multicamera set-ups, multisource registration, anthropocentric visual analysis for semantic content annotation, acceleration of 3D reconstruction, and integrated 2D-3D web visualisation tools. Furthermore, this paper presents a toolset for the analysis and visualisation of multi-modal media production datasets, which enables on-set data quality verification and management, thus significantly reducing the risk and time required in production. Some of the basic techniques used for acceleration, clustering and visualisation could be applied to much larger classes of big data problems.
Many practical applications require accurate knowledge of the extrinsic calibration (i.e., pose) of a moving camera. Existing SLAM and structure-from-motion solutions are not robust to scenes with large dynamic objects, and do not fully utilize the available information when static cameras are present, a common practical scenario. In this paper, we propose an algorithm that addresses both of these issues for a hybrid static-moving camera set-up. The algorithm uses the static cameras to build a sparse 3D model of the scene, with respect to which the pose of the moving camera is estimated at each time instant. The performance of the algorithm is studied through extensive experiments covering a wide range of applications, and is shown to be satisfactory.
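The abstract does not state how the sparse model is triangulated from the static cameras. As a hedged illustration only, the sketch below uses the classic midpoint method for two calibrated pinhole cameras. The helper names (`pixel_to_ray`, `triangulate_midpoint`) and the camera parameterisation (focal length `f`, principal point `cx`, `cy`, world-to-camera rotation `R`, camera centres `c1`, `c2`) are assumptions, not the paper's interface.

```python
# Hypothetical helpers: minimal 3-vector operations on tuples.
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def _scale(a, s):
    return tuple(x * s for x in a)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pixel_to_ray(u, v, f, cx, cy, R):
    """World-space viewing direction of pixel (u, v) for a pinhole camera.

    R is the world-to-camera rotation (3x3 nested lists), so the world-space
    direction is R^T applied to the camera-space ray ((u - cx)/f, (v - cy)/f, 1).
    """
    ray_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    return tuple(sum(R[i][j] * ray_cam[i] for i in range(3)) for j in range(3))

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point cannot be triangulated")
    s = (b * e - c * d) / denom   # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom   # parameter of the closest point on ray 2
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)

# Usage: two static cameras with identity rotation, one unit apart along x.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d1 = pixel_to_ray(0.25, 0.25, 1.0, 0.0, 0.0, I)
d2 = pixel_to_ray(-0.25, 0.25, 1.0, 0.0, 0.0, I)
X = triangulate_midpoint((0.0, 0.0, 0.0), d1, (1.0, 0.0, 0.0), d2)  # (0.5, 0.5, 2.0)
```

Points triangulated in this way could then be matched to features in the moving camera to supply the 3D-2D correspondences on which its per-frame pose estimate rests.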
In film production, many post-production tasks require accurate camera calibration information. This paper presents an algorithm for through-the-lens calibration of a moving camera in a common scenario in film production and broadcasting: the camera views a dynamic scene that is also viewed by a set of static cameras with known calibration. The proposed method constructs a sparse scene model from the static cameras and registers the moving camera with respect to it by applying the appropriate perspective-n-point (PnP) solver. In addition to the general motion case, the algorithm can handle nodal cameras with unknown focal length via a novel P2P algorithm. The approach can identify a subset of static cameras that are more likely to generate a high number of scene-image correspondences, and can robustly deal with dynamic scenes. Our target applications include dense 3D reconstruction, stereoscopic 3D rendering and 3D scene augmentation, through which the success of the algorithm is demonstrated experimentally.
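The novel P2P solver itself is not described in the abstract. To show why two correspondences can suffice for a nodal camera, the following sketch (a hypothetical function, not the paper's algorithm) recovers candidate focal lengths from the rotation-invariant angle between two viewing rays. It assumes a known camera centre, square pixels, zero skew, and image coordinates measured from a known principal point; the rotation would then follow from the two ray pairs and is not shown.

```python
import math

def _cos_between(a, b):
    """Cosine of the angle between two vectors of equal length."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nodal_focal_candidates(u1, u2, d1, d2, tol=1e-9):
    """Candidate focal lengths for a nodal camera from two correspondences.

    u1, u2: image points (x, y) relative to the principal point.
    d1, d2: world-space directions from the camera centre to the two 3D points.
    """
    c = _cos_between(d1, d2)  # angle between the rays is unchanged by rotation
    a = u1[0] * u2[0] + u1[1] * u2[1]
    p = u1[0] ** 2 + u1[1] ** 2
    q = u2[0] ** 2 + u2[1] ** 2
    # Camera-frame rays are (x, y, f). Equating their cosine to c and squaring
    # gives (a + t)^2 = c^2 (p + t)(q + t), a quadratic in t = f^2.
    A = 1.0 - c * c
    B = 2.0 * a - c * c * (p + q)
    C = a * a - c * c * p * q
    if abs(A) < 1e-15:
        roots = [-C / B] if abs(B) > 1e-15 else []
    else:
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            return []
        s = math.sqrt(disc)
        roots = [(-B + s) / (2.0 * A), (-B - s) / (2.0 * A)]
    focals = []
    for t in roots:
        if t <= 0.0:
            continue
        f = math.sqrt(t)
        # Squaring can introduce spurious roots, so re-check the unsquared equation.
        if abs(_cos_between((*u1, f), (*u2, f)) - c) < tol:
            focals.append(f)
    return sorted(focals)

# Usage: points (1, 0.5, 4) and (-0.8, 0.3, 5) imaged with f = 800.
fs = nodal_focal_candidates((200.0, 100.0), (-128.0, 48.0), (1.0, 0.5, 4.0), (-0.8, 0.3, 5.0))
```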
Synchronisation is an essential requirement for multiview 3D reconstruction of dynamic scenes. However, the use of HD cameras and large set-ups puts considerable stress on the hardware and causes frame drops, which are usually detected by manually verifying very large amounts of data. This paper improves [9] and extends it with a frame-drop detection capability. To spot frame-drop events, the algorithm fits a broken line to the frame-index correspondences for each camera pair, and then fuses the pairwise drop hypotheses into a consistent, absolute frame-drop estimate. The success and practical utility of the improved pipeline are demonstrated through a number of experiments, including 3D reconstruction and free-viewpoint video rendering tasks. © 2012 IEEE.
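The exact line model is not given in the abstract. A minimal sketch, assuming both cameras run at the same frame rate (unit slope) and at most one drop event per camera pair, fits two unit-slope segments by exhaustive search over the break position; using medians makes the fit tolerant to a few wrong correspondences. The function name and the `(i, j)` pair format are hypothetical, and the fusion of pairwise hypotheses into absolute estimates is not shown.

```python
from statistics import median

def fit_broken_line(pairs):
    """Fit j = i + o1 before a break and j = i + o2 after it (unit slope, one break).

    pairs: (i, j) frame-index correspondences between cameras A and B, sorted by i.
    Returns (break_frame, o1, o2) minimising the total absolute residual.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two correspondences")
    offsets = [j - i for i, j in pairs]
    best = None
    for k in range(1, len(pairs)):  # break between pairs[k - 1] and pairs[k]
        left, right = offsets[:k], offsets[k:]
        o1, o2 = median(left), median(right)  # L1-optimal offset of each segment
        cost = sum(abs(o - o1) for o in left) + sum(abs(o - o2) for o in right)
        if best is None or cost < best[0]:
            best = (cost, pairs[k][0], o1, o2)
    _, break_frame, o1, o2 = best
    return break_frame, o1, o2

# Usage: camera B drops 2 frames at A's frame 50; one correspondence is wrong.
pairs = [(i, i + 5) for i in range(50)] + [(i, i + 3) for i in range(50, 100)]
pairs[10] = (10, 40)
b, o1, o2 = fit_broken_line(pairs)
```

With this convention, `o1 - o2` is the number of frames dropped by camera B at the break; zero indicates no drop, and a negative value would instead point to a drop in camera A.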
Realistic 3D models of the environment are beneficial in many fields, from natural or man-made structure inspection and volumetric analysis to movie-making, in particular the integration of special effects into natural scenes. Spherical cameras are becoming popular in environment modelling because they capture the full surrounding scene visible from the camera location as a consistent, seamless image at once. In this paper, we propose a novel pipeline to obtain fast and accurate 3D reconstructions from spherical images. To obtain a better estimate of the structure, the system integrates a joint camera pose and structure refinement step. This strategy proves to be much faster, yet equally accurate, when compared with the conventional method, registration of dense point clouds via iterative closest point (ICP). Both methods require an initial estimate for successful convergence. The initial positions of the 3D points are obtained from stereo processing of pairs of spherical images with known baseline. The initial positions of the cameras are obtained from a robust wide-baseline matching procedure. The performance and accuracy of the 3D reconstruction pipeline are analysed through extensive tests on several indoor and outdoor datasets.
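The abstract does not specify the stereo geometry. The sketch below assumes one common configuration: two equirectangular images captured by vertically displaced spherical cameras with a known baseline and identical orientation, in which matched points share a column and the distance follows from the angular disparity via the law of sines. The function name and the coordinate conventions are assumptions.

```python
import math

def spherical_stereo_point(u, v_top, v_bottom, width, height, baseline):
    """Triangulate a scene point from two vertically stacked equirectangular images.

    Assumed set-up: the top camera sits at the origin and the bottom camera at
    (0, 0, -baseline) with identical orientation; z points up; image coordinates
    are continuous, with v = 0 at the zenith and u = 0 at azimuth 0. Under this
    set-up a scene point appears in the same column u of both images.
    """
    theta1 = v_top / height * math.pi     # polar angle seen by the top camera
    theta2 = v_bottom / height * math.pi  # polar angle seen by the bottom camera
    disparity = theta1 - theta2           # angle subtended by the baseline at the point
    if disparity <= 0.0:
        raise ValueError("angular disparity must be positive")
    # Law of sines in the triangle (top camera, bottom camera, point):
    # distance from the top camera = baseline * sin(theta2) / sin(theta1 - theta2).
    r = baseline * math.sin(theta2) / math.sin(disparity)
    phi = u / width * 2.0 * math.pi       # azimuth
    return (r * math.sin(theta1) * math.cos(phi),
            r * math.sin(theta1) * math.sin(phi),
            r * math.cos(theta1))
```

For a point at (1, 0, 0) and a unit baseline, the top camera sees it on the horizon (v = H/2) and the bottom camera at 45 degrees of elevation (v = H/4), which recovers a distance of 1.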
For statistical analysis purposes, RANSAC is usually treated as a Bernoulli process: each hypothesis is a Bernoulli trial with the outcome outlier-free/contaminated, and a run is a sequence of such trials. However, this model only covers the special case where all outlier-free hypotheses are equally good, e.g. when they are generated from noise-free data. In this paper, we explore a more general model that obviates the noise-free data assumption: we consider RANSAC a random process returning the best hypothesis, $\hat{h}$, among a number of hypotheses drawn from a finite set, $\mathcal{H}$. We employ the rank of $\hat{h}$ within $\mathcal{H}$ for the statistical characterisation of the output, present a closed-form expression for its exact probability mass function, and demonstrate that the $\beta$-distribution is a good approximation thereof. This characterisation leads to two novel termination criteria, which indicate the number of iterations required to come arbitrarily close to the global minimum in $\mathcal{H}$ with a specified probability. We also establish the conditions under which a RANSAC process is statistically equivalent to a cascade of shorter RANSAC processes. These conditions justify a RANSAC scheme with dedicated stages that handle the outliers and the noise separately. We demonstrate the validity of the developed theory via Monte-Carlo simulations and real-data experiments on a number of common geometry estimation problems. We conclude that a two-stage RANSAC process offers similar performance guarantees at a much lower cost than the equivalent one-stage process, and that a cascaded set-up performs better than LO-RANSAC, without the added complexity of a nested RANSAC implementation. © 2014 Springer Science+Business Media New York.
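To make the two statistical models concrete, the sketch below contrasts the classic Bernoulli iteration count with a rank-based one. It assumes hypotheses are drawn uniformly with replacement from a finite set of distinct-cost hypotheses; then the best of $n$ draws satisfies $P(\text{rank} \le k) = 1 - (1 - k/|\mathcal{H}|)^n$, which for large sets approaches the CDF of a $\beta(1, n)$ variable in the normalised rank. This is an illustrative textbook model under those assumptions, not the paper's exact probability mass function or termination criteria; all function names are hypothetical.

```python
import math

def bernoulli_iterations(inlier_ratio, sample_size, confidence):
    """Classic criterion: draws n such that, treating each minimal sample as a
    Bernoulli trial, P(at least one outlier-free sample) >= confidence."""
    w = inlier_ratio ** sample_size  # probability that one sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w))

def rank_cdf(k, n, set_size):
    """P(rank of the best of n hypotheses <= k), with hypotheses drawn uniformly
    with replacement from set_size distinct-cost hypotheses (rank 1 = lowest cost)."""
    return 1.0 - ((set_size - k) / set_size) ** n

def rank_iterations(top_fraction, confidence):
    """Smallest n with 1 - (1 - top_fraction)^n >= confidence, i.e. the best
    hypothesis lies in the top `top_fraction` of the set with that probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))

# Usage: 50% inliers, 4-point samples, 99% confidence; and the number of draws
# needed to land in the best 1% of the hypothesis set with 99% probability.
n_classic = bernoulli_iterations(0.5, 4, 0.99)
n_rank = rank_iterations(0.01, 0.99)
```

Unlike the Bernoulli count, the rank-based count does not depend on the inlier ratio or the sample size: it only asks how close to the best hypothesis in the set the output should be.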
In this paper, an algorithm is proposed to solve the multi-frame structure-from-motion (MFSfM) problem for monocular video sequences with multiple rigid moving objects. The algorithm uses the epipolar criterion to segment feature trajectories belonging to the background scene and to each of the independently moving objects. As a long baseline is essential for the reliability of the epipolar geometry, the geometric robust information criterion (GRIC) is employed for key-frame selection within the sequences. Once the features are segmented, the corresponding objects are reconstructed individually using a sequential algorithm that prioritizes frame pairs according to their reliability and information content. Experimental results on synthetic and real data demonstrate that our approach has the potential to deal effectively with the multi-body MFSfM problem.
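The GRIC score is commonly defined (after Torr) as a robust residual sum plus penalties on the model dimension and parameter count. Key-frame selection then prefers frame pairs for which a fundamental matrix scores lower than a homography, which signals sufficient baseline. A minimal sketch under that standard definition follows; computing the residuals is not shown, the function names are hypothetical, and the paper's exact selection rule may differ.

```python
import math

# Standard two-view constants: r = 4 is the dimension of a correspondence
# (x1, y1, x2, y2); d is the dimension of the model manifold; k is the
# number of model parameters.
MODELS = {"F": {"d": 3, "k": 7}, "H": {"d": 2, "k": 8}}

def gric(sq_residuals, sigma, model, r=4, lambda3=2.0):
    """GRIC = sum(min(e^2 / sigma^2, lambda3 * (r - d))) + ln(r) * d * n + ln(r * n) * k."""
    n = len(sq_residuals)
    if n == 0:
        raise ValueError("need at least one correspondence")
    d, k = MODELS[model]["d"], MODELS[model]["k"]
    rho = sum(min(e2 / sigma ** 2, lambda3 * (r - d)) for e2 in sq_residuals)
    return rho + math.log(r) * d * n + math.log(r * n) * k

def is_key_frame_pair(sq_res_f, sq_res_h, sigma):
    """Accept the pair if epipolar geometry explains the data better than a homography."""
    return gric(sq_res_f, sigma, "F") < gric(sq_res_h, sigma, "H")
```

When both models fit equally well (for example, a very short baseline), the lower complexity penalty of the homography wins and the pair is rejected; the fundamental matrix is preferred only once the homography residuals grow.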