Dr Peng Wang
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Surrey Institute for People-Centred Artificial Intelligence (PAI)
Biography
Dr Peng Wang joined the University of Surrey as a Lecturer in Robotics and is a Royal Society Short Industry Fellow. His research focuses on robotic perception, learning, and collaboration, approached through digital twinning and AI-based methods to enable robust, real-world deployment of robotic systems, with applications in areas such as manufacturing. His work is supported by EPSRC, the Royal Society, and the Henry Royce Institute, and involves close collaboration with industry partners. Beyond his research, he actively engages with the digital twinning and robotics communities: he is the ECR Lead of the UKRI DTNet+ Trust and Uncertainty Special Interest Group, and organises robot learning workshops at venues such as IROS 2025 and CVPR 2026.
University roles and responsibilities
- Lecturer in Robotics
My qualifications
Research interests
- Robotics
- Digital Twinning
- Generative AI
- Deep Learning
- 3D Reconstruction
Research projects
TwinCat (Royal Society Short Industry Fellowship): Digital twin and robotics for catalyst design and manufacturing.
Publications
Robotic task planning in real-world environments requires not only object recognition but also a nuanced understanding of spatial relationships between objects. We present a spatial-relationship-aware dataset of nearly 1,000 robot-acquired indoor images, annotated with object attributes, positions, and detailed spatial relationships. Captured using a Boston Dynamics Spot robot and labelled with a custom annotation tool, the dataset reflects complex scenarios with similar or identical objects and intricate spatial arrangements. We benchmark six state-of-the-art scene-graph generation models on this dataset, analysing their inference speed and relational accuracy. Our results highlight significant differences in model performance and demonstrate that integrating explicit spatial relationships into foundation models, such as ChatGPT 4o, substantially improves their ability to generate executable, spatially aware plans for robotics. The dataset and annotation tool are publicly available at https://github.com/PengPaulWang/SpatialAwareRobotDataset, supporting further research in spatial reasoning for robotics.
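As a rough illustration of how explicit spatial relationships can be handed to a language-model planner, the sketch below serialises subject-predicate-object relations into a planning prompt. The field names and prompt wording are hypothetical and do not reflect the dataset's actual annotation schema.

```python
# A minimal, hypothetical sketch: serialising spatial relationships into a planning
# prompt for a foundation model. Field names and prompt wording are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class SpatialRelation:
    subject: str    # e.g. "red mug"
    predicate: str  # e.g. "on top of", "left of", "behind"
    obj: str        # e.g. "wooden table"

def scene_graph_to_prompt(relations: List[SpatialRelation], task: str) -> str:
    """Turn explicit subject-predicate-object relations into a text prompt for an LLM planner."""
    facts = "\n".join(f"- The {r.subject} is {r.predicate} the {r.obj}." for r in relations)
    return (
        "Scene description (spatial relationships):\n"
        f"{facts}\n\n"
        f"Task: {task}\n"
        "Produce a step-by-step, executable plan for the robot."
    )

if __name__ == "__main__":
    scene = [
        SpatialRelation("red mug", "on top of", "wooden table"),
        SpatialRelation("blue mug", "left of", "red mug"),
    ]
    print(scene_graph_to_prompt(scene, "Pick up the mug nearest to the robot."))
```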
Flow matching has emerged as a competitive framework for learning high-quality generative policies in robotics; however, we find that generalisation arises and saturates early along the flow trajectory, in accordance with recent findings in the literature. We further observe that increasing the number of Euler integration steps during inference counter-intuitively and universally degrades policy performance. We attribute this to (i) additional, uniformly spaced integration steps oversampling the late-time region, thereby constraining actions towards the training trajectories and reducing generalisation; and (ii) the learned velocity field becoming non-Lipschitz as integration time approaches 1, causing instability. To address these issues, we propose a novel policy that utilises non-uniform (e.g., U-shaped) time scheduling during training, which emphasises both early and late temporal stages to regularise policy training, and a dense-jump integration schedule at inference, which uses a single-step integration to replace the multi-step integration beyond a jump point, avoiding the unstable region near 1. Essentially, our policy is an efficient one-step learner that still pushes forward performance through multi-step integration, yielding up to 23.7% performance gains over state-of-the-art baselines across diverse robotic tasks.
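The sketch below illustrates the two scheduling ideas in isolation for a generic flow-matching policy v_theta(x, t, obs): a U-shaped time sampler for training and a dense-then-jump Euler schedule for inference. The Beta(0.5, 0.5) sampler, jump point, and step count are illustrative assumptions, not the paper's exact hyperparameters.

```python
# A minimal sketch of U-shaped time sampling and dense-jump integration for a generic
# flow-matching policy v_theta(x, t, obs). All hyperparameters are illustrative.
import torch

def sample_u_shaped_t(batch_size: int, alpha: float = 0.5) -> torch.Tensor:
    """Draw training times t in (0, 1) from a U-shaped density (Beta(a, a) with a < 1),
    emphasising both early and late stages of the flow."""
    return torch.distributions.Beta(alpha, alpha).sample((batch_size,))

def flow_matching_loss(v_theta, x0, x1, obs):
    """Conditional flow-matching loss along a linear path, with U-shaped time sampling."""
    t = sample_u_shaped_t(x0.shape[0]).to(x0)              # (B,)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))               # broadcastable over x0's shape
    xt = (1 - t_) * x0 + t_ * x1                           # point on the interpolation path
    target = x1 - x0                                       # constant target velocity
    return ((v_theta(xt, t, obs) - target) ** 2).mean()

@torch.no_grad()
def dense_jump_integrate(v_theta, x, obs, n_dense: int = 8, t_jump: float = 0.8):
    """Euler-integrate densely on [0, t_jump], then take one large step from t_jump to 1,
    skipping the region near 1 where the learned velocity field can be poorly behaved."""
    ts = torch.linspace(0.0, t_jump, n_dense + 1, device=x.device)
    for i in range(n_dense):
        dt = ts[i + 1] - ts[i]
        x = x + dt * v_theta(x, ts[i].expand(x.shape[0]), obs)
    return x + (1.0 - t_jump) * v_theta(x, torch.full((x.shape[0],), t_jump, device=x.device), obs)
```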
Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. The proposed models incorporate techniques such as embedding accessible robot pose information and applying semantic mask regulation within the scalable and efficient ConvNeXt backbone network. These techniques are designed to refine intermediate outputs, thereby improving the retention of shape and location. Through extensive experimentation, our models have demonstrated notable improvements in maintaining the shape and location of different robots, as well as enhancing overall video generation quality, compared to the benchmark diffusion model. Code will be open-sourced at: https://github.com/PengPaulWang/diffusion-robots.
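A minimal sketch of the two conditioning mechanisms described above, assuming a generic denoising backbone: a robot pose embedding added to intermediate features, and a semantic-mask-weighted denoising loss. The module names, weighting, and interfaces are illustrative, not the released architecture.

```python
# A minimal sketch (not the released architecture): pose conditioning on intermediate
# features and a mask-regulated denoising loss that up-weights robot pixels.
import torch
import torch.nn as nn

class PoseConditioning(nn.Module):
    """Embed a robot pose vector and add it channel-wise to an intermediate feature map."""
    def __init__(self, pose_dim: int, feat_channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pose_dim, feat_channels), nn.SiLU(),
            nn.Linear(feat_channels, feat_channels),
        )

    def forward(self, feats: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); pose: (B, pose_dim)
        return feats + self.mlp(pose)[:, :, None, None]

def mask_regulated_loss(pred_noise, true_noise, robot_mask, robot_weight: float = 5.0):
    """Up-weight the denoising error on robot pixels; robot_mask is (B, 1, H, W) in {0, 1}."""
    weights = 1.0 + (robot_weight - 1.0) * robot_mask
    return (weights * (pred_noise - true_noise) ** 2).mean()
```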
In Industry 5.0, Digital Twins bring flexibility and efficiency to smart manufacturing. Recently, the success of artificial intelligence techniques such as deep learning has led to their adoption in manufacturing, and especially in human–robot collaboration. Collaborative manufacturing tasks involving human operators and robots pose significant safety and reliability concerns. In response to these concerns, a deep learning-enhanced Digital Twin framework is introduced through which human operators and robots can be detected and their actions classified during the manufacturing process, enabling autonomous decision making by the robot control system. Developed using Unreal Engine 4, our Digital Twin framework complies with the Robot Operating System (ROS) specification, and supports synchronous control and communication between the Digital Twin and the physical system. In our framework, a fully-supervised detector based on a faster region-based convolutional neural network (Faster R-CNN) is first trained on synthetic data generated by the Digital Twin, and then tested on the physical system to demonstrate the effectiveness of the proposed Digital Twin-based framework. To ensure safety and reliability, a semi-supervised detector is further designed to bridge the gap between the twin system and the physical system, and it achieves improved performance compared to the fully-supervised detector trained on either synthetic data or real data alone. Evaluation of the framework in multiple scenarios in which human operators collaborate with a Universal Robot 10 shows that it can accurately detect the human and robot, and classify their actions under a variety of conditions. The data from this evaluation have been made publicly available and can be widely used for research and operational purposes. Additionally, a semi-automated annotation tool from the Digital Twin framework is published to benefit the collaborative robotics community.
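The sketch below outlines the synthetic-to-real detection pipeline in torchvision terms: a Faster R-CNN fine-tuned for the twin's object classes, plus a pseudo-labelling step that keeps high-confidence detections on real frames for the semi-supervised round. The class count and confidence threshold are placeholder assumptions, not values from the paper.

```python
# A minimal sketch of the synthetic-to-real pipeline: fine-tune a Faster R-CNN for the
# Digital Twin's classes, then pseudo-label real frames for semi-supervised training.
# The class count and score threshold are placeholder assumptions.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes: int):
    """Faster R-CNN with its box predictor replaced for the Digital Twin's object classes."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

@torch.no_grad()
def pseudo_label(model, real_images, score_threshold: float = 0.8):
    """Run the synthetic-trained detector on real frames and keep high-confidence
    detections as pseudo ground truth for semi-supervised fine-tuning."""
    model.eval()
    targets = []
    for output in model(real_images):  # list of dicts with "boxes", "labels", "scores"
        keep = output["scores"] >= score_threshold
        targets.append({"boxes": output["boxes"][keep], "labels": output["labels"][keep]})
    return targets
```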