Academic and research departmentsCentre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences.
Small space robots have the potential to revolutionise space exploration by facilitating the on-orbit assembly of infrastructure, in shorter time scales, at reduced costs. Their commercial appeal will be further improved if such a system is also capable of performing on-orbit servicing missions, in line with the current drive to limit space debris and prolong the lifetime of satellites already in orbit. Whilst there have been a limited number of successful demonstrations of technologies capable of these on-orbit operations, the systems remain large and bespoke. The recent surge in small satellite technologies is changing the economics of space and in the near future, downsizing a space robot might become be a viable option with a host of benets. This industry wide shift means some of the technologies for use with a downsized space robot, such as power and communication subsystems, now exist. However, there are still dynamic and control issues that need to be overcome before a downsized space robot can be capable of undertaking useful missions. This paper rst outlines these issues, before analyzing the effect of downsizing a system on its operational capability. Therefore presenting the smallest controllable system such that the benefits of a small space robot can be achieved with current technologies. The sizing of the base spacecraft and manipulator are addressed here. The design presented consists of a 3 link, 6 degrees of freedom robotic manipulator mounted on a 12U form factor satellite. The feasibility of this 12U space robot was evaluated in simulation and the in-depth results presented here support the hypothesis that a small space robot is a viable solution for in-orbit operations. Keywords: Small Satellite; Space Robot; In-orbit Assembly and Servicing; In-orbit operations; Free-Flying; Free-Floating.
The successful performance of any system is dependant on the hardware of the agent, which is typically immutable during RL training. In this work, we present ORCHID (Optimisation of Robotic Control and Hardware In Design) which allows for truly simultaneous optimisation of hardware and control parameters in an RL pipeline. We show that by forming a complex differential path through a trajectory rollout we can leverage a vast amount of information from the system that was previously lost in the ‘black-box’ environment. Combining this with a novel hardware-conditioned critic network minimises variance during training and ensures stable updates are made. This allows for refinements to be made to both the morphology and control parameters simultaneously. The result is an efficient and versatile approach to holistic robot design, that brings the final system nearer to true optimality. We show improvements in performance across 4 different test environments with two different control algorithms - in all experiments the maximum performance achieved with ORCHID is shown to be unattainable using only policy updates with the default design. We also show how re-designing a robot using ORCHID in simulation, transfers to a vast improvement in the performance of a real-world robot.
The use of reinforcement learning (RL) has led to huge advancements in the field of robotics. However data scarcity, brittle convergence and the gap between simulation & real world environments, mean that most common RL approaches are subject to over fitting and fail to generalise to unseen environments. Hardware agnostic policies would mitigate this by allowing a single network to operate in a variety of test domains, where dynamics vary due to changes in robotic morphologies or internal parameters.We utilise the idea that learning to adapt a known and successful control policy is easier and more flexible than jointly learning numerous control policies for different morphologies. This paper presents the idea of Hardware Agnostic Reinforcement Learning using Adversarial selection (HARL-A). In this approach training examples are sampled using a novel adversarial loss function. This is designed to self regulate morphologies based on their learning potential. Simply applying our learning potential based loss function to current stateof- the-art already provides ~ 30% improvement in performance. Meanwhile experiments using the full implementation of HARL-A report an average increase of 70% to a standard RL baseline and 55% compared with current state-of-the-art.