The proposed system uses an Advantage Actor-Critic (A2C) algorithm with recurrent layers to introduce temporal context into the network. This allows the learned system to evaluate continuous control actions based on previous states and actions in addition to the current state. Moreover, the slow training caused by the algorithm's sample inefficiency is addressed by using a second neural network to approximate the vehicle dynamics. Using a neural network as a proxy for the simulator benefits training in two ways: it reduces how often the reinforcement learning agent must query the simulation, which is a major bottleneck in learning, and, because both the reinforcement learning network and the proxy network can be deployed on the same GPU, learning speed is considerably improved. Simulation results from testing in IPG CarMaker show the effectiveness of our recurrent A2C algorithm compared to an A2C without recurrent layers.
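As a rough illustration of the advantage signal that gives A2C its name, the sketch below computes bootstrapped n-step advantages, A_t = R_t - V(s_t), from a short rollout. It is a minimal, framework-free example and not the implementation used in this work: the function name `n_step_advantages`, the discount factor, and the sample numbers are all assumptions made for illustration. The rollout could equally come from the simulator or from a proxy dynamics network.

```python
from typing import List, Sequence


def n_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> List[float]:
    """Return A_t = R_t - V(s_t) for each step of a rollout.

    R_t is the discounted return, bootstrapped from the critic's value
    estimate of the state that follows the final step of the rollout.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    ret = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages


# Hypothetical two-step rollout that ends in a terminal state (bootstrap 0).
example = n_step_advantages([1.0, 1.0], [0.5, 0.5], bootstrap_value=0.0, gamma=0.5)
```

In an actor-critic update, the actor's log-probabilities are weighted by these advantages while the critic is regressed towards the returns. With recurrent layers, the value estimates would additionally depend on the hidden state carried over from earlier time steps.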
an autonomous vehicle. However, functional safety validation is seen
as a critical issue for these systems due to the lack of transparency in
deep neural networks and the safety-critical nature of autonomous vehicles.
The black box nature of deep neural networks limits the effectiveness
of traditional verification and validation methods. In this paper, we propose
two software safety cages, which aim to limit the control action
of the neural network to a safe operational envelope. The safety cages
impose limits on the control action during critical scenarios; if these
limits are breached, the control action is changed to a more conservative value. This
has the benefit that the behaviour of the safety cages is interpretable,
and therefore traditional functional safety validation techniques can be
applied. The work here presents a deep neural network trained for longitudinal
vehicle control, with safety cages designed to prevent forward
collisions. Simulated testing in critical scenarios shows the effectiveness
of the safety cages in preventing forward collisions, whilst, under normal
highway driving, unnecessary interruptions are eliminated and the deep
learning control policy is able to perform unhindered. Interventions by
the safety cages are also used to re-train the network, resulting in a more
robust control policy.
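
The idea of a safety cage can be sketched as a small, interpretable rule wrapped around the network's output. The example below is only an illustration under stated assumptions: the abstract does not specify which quantities the cages monitor, so the use of time headway and time-to-collision, the threshold values, the action range of [-1, 1] (negative meaning braking), and the name `safety_cage` are all hypothetical.

```python
import math
from typing import Tuple


def safety_cage(
    action: float,
    gap_m: float,
    ego_speed_mps: float,
    closing_speed_mps: float,
    min_headway_s: float = 1.0,  # assumed threshold, for illustration only
    min_ttc_s: float = 2.0,      # assumed threshold, for illustration only
    safe_action: float = -1.0,   # assumed "full braking" command
) -> Tuple[float, bool]:
    """Limit a longitudinal control action in critical scenarios.

    Returns the (possibly overridden) action and a flag that records
    whether the cage intervened.
    """
    # Time headway: time for the ego vehicle to cover the current gap.
    headway_s = gap_m / ego_speed_mps if ego_speed_mps > 0 else math.inf
    # Time-to-collision: only finite while the gap is closing.
    ttc_s = gap_m / closing_speed_mps if closing_speed_mps > 0 else math.inf

    critical = headway_s < min_headway_s or ttc_s < min_ttc_s
    if critical and action > safe_action:
        # The network's action is less conservative than the limit: override it.
        return safe_action, True
    return action, False


# Normal driving: large gap, not closing, so the network's action passes through.
normal = safety_cage(0.5, gap_m=100.0, ego_speed_mps=30.0, closing_speed_mps=0.0)
# Critical scenario: short gap that is closing quickly, so the cage overrides.
critical_case = safety_cage(0.5, gap_m=10.0, ego_speed_mps=30.0, closing_speed_mps=10.0)
```

Because the rule is a handful of explicit comparisons, its behaviour can be inspected and tested directly. The returned intervention flag is one way the overridden samples could be collected for later re-training of the network.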