Dr Mohammad Hossein Bamorovat Abadi
About
Biography
I am a researcher and engineer specializing in computer vision, sensor fusion, and robotics. I build robots that can see, understand, and move, and sometimes they find the door faster than I do before coffee. Currently, I am a Research Fellow at the University of Surrey, leading a NERC-funded project that uses autonomous, kite-tethered ground vehicles to map greenhouse gas emissions. It proves that robots can indeed fly kites for a cause.
Areas of specialism
University roles and responsibilities
- Research Fellow A in Autonomous Robotics for Environmental Science
My qualifications
Previous roles
Research interests
- Robotics Engineering: Industrial & service robots, Autonomous cars (self-driving cars)
- Computer Vision: Perception systems, Sensor fusion (Camera + LiDAR + Radar), 2D/3D Object detection
- Machine Learning: Deep learning, Agentic AI, Embodied AI, Vision Language Models, World Models
Research projects
Leading technical development of an autonomous mobile gas flux tower project funded by the Natural Environment Research Council (NERC). The project integrates a tethered helium-filled kite with an autonomous ground vehicle to generate high-resolution spatial maps of greenhouse gas emissions and airflow patterns at international field sites.
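At its simplest, the mapping step amounts to binning georeferenced gas readings onto a spatial grid. Below is a minimal sketch of that idea, assuming metre-referenced coordinates and simple per-cell averaging; the function name, field names, and cell size are illustrative, not the project's actual pipeline.

```python
# Hypothetical sketch: bin georeferenced gas-concentration readings
# into a 2D emission map by averaging within square grid cells.
import numpy as np

def grid_flux_map(x_m, y_m, flux, cell_size=5.0):
    """Average flux readings into square grid cells (coordinates in metres)."""
    x_m, y_m, flux = map(np.asarray, (x_m, y_m, flux))
    xi = np.floor((x_m - x_m.min()) / cell_size).astype(int)
    yi = np.floor((y_m - y_m.min()) / cell_size).astype(int)
    grid_sum = np.zeros((yi.max() + 1, xi.max() + 1))
    grid_cnt = np.zeros_like(grid_sum)
    np.add.at(grid_sum, (yi, xi), flux)
    np.add.at(grid_cnt, (yi, xi), 1)
    with np.errstate(invalid="ignore"):
        return grid_sum / grid_cnt   # NaN where no samples fell

# Example: three readings collapse onto a coarse map
flux_map = grid_flux_map([0, 3, 12], [0, 4, 9], [1.2, 0.8, 2.1])
```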
Publications
The significance of Human-Robot Interaction (HRI) is increasingly evident when integrating robotics within human-centric settings. A crucial component of effective HRI is Human Activity Recognition (HAR), which is instrumental in enabling robots to respond aptly in human presence, especially within Ambient Assisted Living (AAL) environments. Since robots are generally mobile and their visual perception is often compromised by motion and noise, this paper evaluates methods by merging the robot's mobile perspective with a static viewpoint utilising multi-view deep learning models. We introduce a dual-stream Convolutional 3D (C3D) model to improve vision-based HAR accuracy for robotic applications. Utilising the Robot House Multiview (RHM) dataset, which encompasses a robotic perspective along with three static views (Front, Back, Top), we examine the efficacy of our model and conduct comparisons with the dual-stream ConvNet and Slow-Fast models. The primary objective of this study is to enhance the accuracy of robot viewpoints by integrating them with static views using dual-stream models. The metrics for evaluation include Top-1 and Top-5 accuracy. Our findings reveal that the integration of static views with robotic perspectives significantly boosts HAR accuracy in both Top-1 and Top-5 metrics across all models tested. Moreover, the proposed dual-stream C3D model demonstrates superior performance compared to the other contemporary models in our evaluations.
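The dual-stream idea described above can be illustrated with a minimal PyTorch sketch: two 3D-CNN branches, one per viewpoint, whose clip-level features are concatenated before classification. The layer sizes and names below are illustrative assumptions, not the paper's exact C3D architecture.

```python
import torch
import torch.nn as nn

class StreamC3D(nn.Module):
    """One 3D-CNN branch over a clip of shape (B, C, T, H, W)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, clip):
        return self.fc(self.features(clip).flatten(1))

class DualStreamHAR(nn.Module):
    """Late fusion of a robot-view stream and a static-view stream."""
    def __init__(self, num_classes=14):
        super().__init__()
        self.robot_stream = StreamC3D()
        self.static_stream = StreamC3D()
        self.classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, robot_clip, static_clip):
        fused = torch.cat([self.robot_stream(robot_clip),
                           self.static_stream(static_clip)], dim=1)
        return self.classifier(fused)

# Example: 8-frame 112x112 clips from each viewpoint
model = DualStreamHAR()
logits = model(torch.randn(2, 3, 8, 112, 112), torch.randn(2, 3, 8, 112, 112))
```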
With the recent rapid development of deep neural networks and dataset capabilities, the Human Action Recognition (HAR) domain is growing quickly in terms of both available datasets and deep models. Despite this, there is a lack of datasets specifically covering the robotics field and human-robot interaction. We prepare and introduce a new multi-view dataset to address this. The Robot House Multi-View dataset (RHM) contains four views: Front, Back, Ceiling, and Robot. There are 14 classes with 6701 video clips per view, making a total of 26804 video clips across the four views. The video clips are between 1 and 5 seconds long, and clips with the same index and class are synchronised across the views. In the second part of this paper, we consider how single streams afford activity recognition using established state-of-the-art models. We then assess the affordance of each view based on information-theoretic modelling and the concept of mutual information. Furthermore, we benchmark the performance of the different views, establishing the strengths and weaknesses of each view with respect to its information content and benchmark performance. Our results lead us to conclude that multi-view and multi-stream activity recognition has the potential to further improve activity recognition results.
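As a rough illustration of the information-theoretic comparison of views mentioned above, one can estimate the mutual information between a discretised per-view feature and the activity label. The sketch below uses scikit-learn; the toy feature, binning scheme, and class count are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def view_label_mi(features, labels, n_bins=16):
    """Estimate MI (in nats) between a 1-D per-clip feature and class labels."""
    features = np.asarray(features, dtype=float)
    # Discretise the continuous feature into equal-frequency bins
    edges = np.quantile(features, np.linspace(0, 1, n_bins + 1)[1:-1])
    binned = np.digitize(features, edges)
    return mutual_info_score(labels, binned)

# Example: a toy "mean motion energy" feature per clip, for two views
rng = np.random.default_rng(0)
labels = rng.integers(0, 14, size=500)           # 14 activity classes
front_feat = labels + rng.normal(0, 2.0, 500)    # more informative view
robot_feat = labels + rng.normal(0, 6.0, 500)    # noisier view
print(view_label_mi(front_feat, labels), view_label_mi(robot_feat, labels))
```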
Human and activity detection has always been a vital task in Human-Robot Interaction (HRI) scenarios, such as those involving assistive robots. In particular, skeleton-based Human Activity Recognition (HAR) offers a robust and effective detection method based on human biomechanics. Recent advancements in human pose estimation have made it possible to extract skeleton positioning data accurately and quickly using affordable cameras. In interaction with a human, robots can therefore capture detailed information from a close distance and flexible perspective. However, recognition accuracy is susceptible to robot movements, where the robot often fails to capture the entire scene. To address this we propose the adoption of external cameras to improve the accuracy of activity recognition on a mobile robot. In support of this proposal, we present the dataset RHM-HAR-SK, which combines multiple camera perspectives augmented with human skeleton extraction obtained by HRNet pose estimation. We apply qualitative and quantitative analysis to the extracted skeleton and its joints to evaluate the coverage of extracted poses per camera perspective and activity, and to demonstrate the additional value of external cameras to the robot's recognition pipeline. Results indicate that recognition accuracy for the skeleton varies between camera perspectives and joints, depending on the type of activity.
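The joint-coverage analysis above can be sketched as counting, per camera and per joint, how often the pose estimator returns a confident detection. The snippet below assumes HRNet-style per-joint confidence scores; the array shapes and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def joint_coverage(confidences, threshold=0.5):
    """
    confidences: array of shape (n_frames, n_joints) with per-joint
    detection confidence from a pose estimator (e.g. HRNet output).
    Returns the fraction of frames in which each joint was detected.
    """
    confidences = np.asarray(confidences)
    return (confidences >= threshold).mean(axis=0)

# Example: compare coverage of 17 COCO-style joints across two cameras
rng = np.random.default_rng(1)
robot_view = rng.uniform(0, 1, size=(300, 17))                 # noisier, moving camera
front_view = np.clip(rng.uniform(0, 1, (300, 17)) + 0.2, 0, 1)  # steadier, fixed camera
print("robot:", joint_coverage(robot_view).round(2))
print("front:", joint_coverage(front_view).round(2))
```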
Ambient Assisted Living (AAL) systems aim to improve the safety, comfort, and quality of life of their target populations, with specific attention given to prolonging personal independence during the later stages of life. Human Activity Recognition (HAR) plays a crucial role in enabling AAL systems to recognise and understand human actions. Multi-view human activity recognition (MV-HAR) techniques are particularly useful for AAL systems as they can use information from multiple sensors to capture different perspectives of human activities, helping to improve the robustness and accuracy of activity recognition. In this work, we propose a lightweight activity recognition pipeline that utilises skeleton data from multiple perspectives with the objective of enhancing an assistive robot's perception of human activity. The pipeline includes data sampling, spatial-temporal data transformation, and representation and classification methods. This work compares a modified classic LeNet classification model (M-LeNet) with a Vision Transformer (ViT) for detecting and classifying human activities. Both methods are evaluated on a multi-perspective dataset of human activities in the home (RHM-HAR-SK). Our results indicate that combining camera views can improve recognition accuracy. Furthermore, our pipeline provides an efficient and scalable solution in the AAL context, where bandwidth and computing resources are often limited.
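The representation step in the pipeline above can be sketched as packing a skeleton sequence into a small image-like tensor (coordinates x joints x frames) and feeding it to a LeNet-style CNN. The layer sizes below are illustrative, not the paper's M-LeNet configuration.

```python
import torch
import torch.nn as nn

def skeleton_to_image(seq):
    """seq: (T, J, 3) joint coordinates -> (3, J, T) image-like tensor."""
    return torch.as_tensor(seq, dtype=torch.float32).permute(2, 1, 0)

class SmallLeNet(nn.Module):
    """LeNet-style classifier over skeleton 'images' of shape (3, J, T)."""
    def __init__(self, num_classes=14):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

# Example: one 30-frame, 17-joint skeleton clip
clip = skeleton_to_image(torch.randn(30, 17, 3)).unsqueeze(0)  # (1, 3, 17, 30)
print(SmallLeNet()(clip).shape)                                # (1, 14)
```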
Human activity recognition (HAR) plays a critical role in diverse applications and domains, from assessments of ambient assistive living (AAL) settings and the development of smart environments to human-robot interaction (HRI) scenarios. However, using mobile robot cameras in such contexts has limitations such as a restricted field of view and possible noise. Therefore, employing additional fixed cameras can enhance the field of view and reduce susceptibility to noise. Nevertheless, integrating additional camera perspectives increases complexity, a concern exacerbated by the number of real-time processes that robots must perform in the AAL scenario. This paper introduces our methodology for combining multiple views and compares different aspects of fusing information at low, medium and high levels. The comparison is guided by parameters such as the number of training parameters, floating-point operations per second (FLOPs), training time, and accuracy. Our findings uncover a paradigm shift, challenging conventional beliefs by demonstrating that simpler CNN models outperform their more complex counterparts under this approach. Additionally, the pivotal role of pipeline and data combination emerges as a crucial factor in achieving better accuracy. In this study, integrating the additional view with the Robot-view resulted in an accuracy increase of up to 25%. Ultimately, we attain a streamlined and efficient multi-view HAR pipeline, which will now be incorporated into AAL interaction scenarios.
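The contrast between fusion levels can be illustrated with a short sketch: input-level (low) fusion stacks the two views into one tensor before a single backbone, while score-level (high) fusion averages the predictions of two independent backbones. The toy backbone below is an assumption for illustration only, not the models compared in the paper.

```python
import torch
import torch.nn as nn

def backbone(in_ch, num_classes=14):
    """Toy CNN over a single frame, shared by both fusion variants."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
    )

class LowLevelFusion(nn.Module):
    """Stack robot and static frames channel-wise, then one backbone."""
    def __init__(self):
        super().__init__()
        self.net = backbone(in_ch=6)
    def forward(self, robot, static):
        return self.net(torch.cat([robot, static], dim=1))

class HighLevelFusion(nn.Module):
    """Independent backbones per view, fuse by averaging class scores."""
    def __init__(self):
        super().__init__()
        self.robot_net, self.static_net = backbone(3), backbone(3)
    def forward(self, robot, static):
        return (self.robot_net(robot) + self.static_net(static)) / 2

robot = torch.randn(2, 3, 112, 112)
static = torch.randn(2, 3, 112, 112)
print(LowLevelFusion()(robot, static).shape, HighLevelFusion()(robot, static).shape)
```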
This paper presents a sonar vision algorithm applied to omnidirectional vision. It provides autonomous navigation for a mobile robot in an unknown environment. It uses omnidirectional images without any prior calibration, detects static and dynamic obstacles, and estimates the most suitable path based on visual sonar beams cast in front of the robot. The proposed method was tested on a mobile robot in an indoor environment. The experimental results show acceptable performance considering the computational cost.
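The beam idea can be sketched as casting radial rays over an obstacle mask derived from the omnidirectional image and steering toward the longest free range. The mask, beam count, and angular window below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sonar_beams(obstacle_mask, center, n_beams=31, fov_deg=180.0, max_range=200):
    """
    Cast radial beams from `center` (x, y) of an image over an obstacle mask
    (True = obstacle). Returns (angles_deg, free_range_px) per beam.
    """
    h, w = obstacle_mask.shape
    angles = np.linspace(-fov_deg / 2, fov_deg / 2, n_beams)
    ranges = np.full(n_beams, float(max_range))
    for i, ang in enumerate(np.deg2rad(angles)):
        for r in range(1, max_range):
            x = int(center[0] + r * np.cos(ang))
            y = int(center[1] + r * np.sin(ang))
            if not (0 <= x < w and 0 <= y < h) or obstacle_mask[y, x]:
                ranges[i] = r          # beam blocked or left the image
                break
    return angles, ranges

# Example: pick the heading with the longest free beam
mask = np.zeros((240, 320), dtype=bool)
mask[100:140, 200:260] = True                      # a synthetic obstacle
angles, ranges = sonar_beams(mask, center=(160, 120))
heading = angles[np.argmax(ranges)]
```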
In this paper, we present an omnidirectional vision-based navigation system comprising three approaches: obstacle avoidance based on sonar vision, direction estimation based on sonar vision, and a method combining obstacle avoidance with direction estimation. The paper also examines the effects of the mirror used in omnidirectional vision when applied to mobile robot navigation. We design and build four mirrors: a small non-uniform pixel-density hyperbolic mirror, a small uniform pixel-density hyperbolic mirror, a large non-uniform pixel-density hyperbolic mirror, and a spherical mirror. The system provides autonomous navigation for a mobile robot in an unknown environment. We use omnidirectional images without any prior calibration and detect static and dynamic obstacles. Our experiments are carried out in an indoor environment with our particular sonar vision. The results show that the small uniform pixel-density hyperbolic mirror performs best and the large non-uniform pixel-density hyperbolic mirror performs worst in vision-based mobile robot navigation. The experimental results also show that our sonar vision algorithm achieves acceptable performance considering its computational cost.
This paper presents a novel sonar vision method, called side sonar vision (SSV), to navigate mobile robots in a known environment. It adopts omnidirectional images and divides the surrounding sonar vision into three parts: the front, right, and left sides. Each side is under the continuous scrutiny of an individual agent. SSV analyses the data of each side separately and produces two key parameters: angle and length. These parameters are sent to a multi-layer navigation control module with two main nodes: path estimation and trajectory. The proposed method does not require any calibration or image conversion. The experiments show that the robot moves along the path smoothly and could track up to 98% of the path automatically without any collision with obstacles. The processing time was about 120 ms.
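The per-side analysis described above can be sketched by splitting a fan of beams into left, front, and right sectors and reducing each sector to the (angle, length) pair of its longest free beam, which a downstream controller could then consume. The sector boundaries and the simple arg-max rule are assumptions for illustration, not the paper's controller.

```python
import numpy as np

def side_sonar_parameters(angles_deg, ranges_px):
    """
    Split beams into left / front / right sectors and return, for each,
    the (angle, length) of its longest free beam.
    """
    angles_deg = np.asarray(angles_deg)
    ranges_px = np.asarray(ranges_px)
    sectors = {
        "left":  angles_deg < -30,
        "front": (angles_deg >= -30) & (angles_deg <= 30),
        "right": angles_deg > 30,
    }
    params = {}
    for name, in_sector in sectors.items():
        idx = np.argmax(np.where(in_sector, ranges_px, -1))  # longest beam in sector
        params[name] = (float(angles_deg[idx]), float(ranges_px[idx]))
    return params

# Example: reduce a fan of 31 beams (e.g. from a sonar-vision step) to per-side parameters
angles = np.linspace(-90, 90, 31)
ranges = np.random.default_rng(2).uniform(20, 200, size=31)
print(side_sonar_parameters(angles, ranges))
```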