
Dr James Ross
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Faculty of Engineering and Physical Sciences.
About
My research project
Computer vision/deep learning for autonomous vehicle navigation and control
Developing novel methods and systems for autonomous vehicle navigation and control, using deep learning with monocular vision to enhance spatial reasoning and scene understanding in challenging domains.
Supervisors
My qualifications
Research
Research interests
Research focuses on the application of computer vision and deep learning to solve problems in autonomous robotics and transfer these techniques across multiple domains.
Research interests include:
- Monocular Bird's Eye View (BEV) prediction
- Simultaneous Localisation and Mapping (SLAM)
- Generative Adversarial Networks (GANs), Simulation and Domain Transfer
- Appearance transfer and conditional diffusion models
Publications
Significant advances in robotics and machine learning have resulted in many datasets designed to support research into autonomous vehicle technology. However, these datasets are rarely suitable for a wide variety of navigation tasks. For example, datasets that include multiple cameras often have short trajectories without loops that are unsuitable for the evaluation of longer-range SLAM or odometry systems, and datasets with a single camera often lack other sensors, making them unsuitable for sensor fusion approaches. Furthermore, alternative environmental representations such as semantic Bird's Eye View (BEV) maps are growing in popularity, but datasets often lack accurate ground truth and are not flexible enough to adapt to new research trends.

To address this gap, we introduce Campus Map, a novel large-scale multi-camera dataset with 2M images from 6 mounted cameras that includes GPS data and 64-beam, 125k-point LiDAR scans totalling 8M points (raw packets also provided). The dataset consists of 16 sequences in a large car park and 6 long-term trajectories around a university campus that provide data to support research into a variety of autonomous driving and parking tasks. Long trajectories (average 10 min) and many loops make the dataset ideal for the evaluation of SLAM, odometry and loop closure algorithms, and we provide several state-of-the-art baselines.

We also include 40k semantic BEV maps rendered from a digital twin. This novel approach to ground truth generation allows us to produce more accurate and crisper semantic maps than are currently available. We make the simulation environment available to allow researchers to adapt the dataset to their specific needs. Dataset available at: cvssp.org/data/twizy_data
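As an aside on how such long-trajectory data can support SLAM and odometry evaluation, the sketch below computes Absolute Trajectory Error (ATE), a standard metric, against a GPS reference. It is a minimal illustration only: the file names and the assumption that both trajectories are stored as time-associated "x y z" positions are placeholders, not the actual Campus Map format or evaluation code.

```python
import numpy as np

def rigid_align(est, gt):
    """Least-squares rigid alignment (rotation + translation) of the
    estimated trajectory onto the ground truth before computing ATE."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(g.T @ e)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:      # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return R, t

def absolute_trajectory_error(est, gt):
    """RMSE of translational error (metres) after rigid alignment."""
    R, t = rigid_align(est, gt)
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))

# Hypothetical usage: each file holds one "x y z" position per line,
# already associated by timestamp (file names and format are assumptions).
est = np.loadtxt("slam_estimate.txt")    # (N, 3) estimated positions
gt = np.loadtxt("gps_ground_truth.txt")  # (N, 3) GPS reference positions
print(f"ATE: {absolute_trajectory_error(est, gt):.3f} m")
```

In practice, evaluation toolkits also report relative pose error over fixed distances, which is less sensitive to drift accumulated early in a long sequence.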
The ability to produce large-scale maps for navigation, path planning and other tasks is a crucial step for autonomous agents, but has always been challenging. In this work, we introduce BEV-SLAM, a novel type of graph-based SLAM that aligns semantically-segmented Bird's Eye View (BEV) predictions from monocular cameras. We introduce a novel form of occlusion reasoning into BEV estimation and demonstrate its importance to aid spatial aggregation of BEV predictions. The result is a versatile SLAM system that can operate across arbitrary multi-camera configurations and can be seamlessly integrated with other sensors. We show that the use of multiple cameras significantly increases performance, and achieves lower relative error than high-performance GPS. The resulting system is able to create large, dense, globally-consistent world maps from monocular cameras mounted around an ego vehicle. The maps are metric and correctly-scaled, making them suitable for downstream navigation tasks.
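To illustrate the general idea of aggregating per-frame BEV predictions into a global map while respecting occlusion, here is a minimal sketch. The grid resolution, the (x, y, yaw) pose format, the visibility mask, and the log-odds fusion rule are all assumptions chosen for illustration; this is not the BEV-SLAM implementation, which performs graph-based alignment of the BEV predictions themselves rather than simple scattering with known poses.

```python
import numpy as np

CELL_SIZE = 0.1  # metres per BEV grid cell (assumed resolution)

def warp_bev_to_world(local_bev, visibility, pose, world_shape):
    """Scatter a local BEV probability grid into world-frame indices,
    keeping only cells that occlusion reasoning marked as observed.

    local_bev  : (H, W) class probabilities in the ego frame
    visibility : (H, W) boolean mask, True where the cell was visible
    pose       : (x, y, yaw) 2D rigid ego pose (assumed format)
    world_shape: (Hw, Ww) size of the global map
    """
    H, W = local_bev.shape
    x0, y0, yaw = pose
    # Cell centres in the ego frame (origin at the grid centre).
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ex = (xs - W / 2.0) * CELL_SIZE
    ey = (ys - H / 2.0) * CELL_SIZE
    # Rigid transform into the world frame.
    wx = np.cos(yaw) * ex - np.sin(yaw) * ey + x0
    wy = np.sin(yaw) * ex + np.cos(yaw) * ey + y0
    # Nearest-neighbour world-grid indices.
    gx = np.round(wx / CELL_SIZE + world_shape[1] / 2.0).astype(int)
    gy = np.round(wy / CELL_SIZE + world_shape[0] / 2.0).astype(int)
    keep = (visibility & (gx >= 0) & (gx < world_shape[1])
            & (gy >= 0) & (gy < world_shape[0]))
    return gy[keep], gx[keep], local_bev[keep]

def fuse(world_logodds, gy, gx, probs, eps=1e-6):
    """Log-odds fusion of new observations into the global map."""
    p = np.clip(probs, eps, 1.0 - eps)
    np.add.at(world_logodds, (gy, gx), np.log(p / (1.0 - p)))
    return world_logodds

# Hypothetical usage with already-optimised poses (inputs are placeholders).
world = np.zeros((2000, 2000))                   # global log-odds map
for bev, vis, pose in []:                        # per-frame predictions + poses
    gy, gx, p = warp_bev_to_world(bev, vis, pose, world.shape)
    world = fuse(world, gy, gx, p)
occupancy = 1.0 / (1.0 + np.exp(-world))         # back to probabilities
```

The key design point the sketch tries to convey is that occluded cells are excluded before fusion, so unobserved regions do not dilute the aggregated map; in a full SLAM system the poses would come from graph optimisation over the aligned BEV predictions.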