Robust Adversarial Attack and Defence of Convolutional Neural Networks
Adversarial examples are malicious inputs to computer vision systems that lead to unexpectedly incorrect results. Their existence raises concerns about the viability of using deep-learning-based systems in applications where safety and security are critical. For adversarial examples to be useful to a real-world attacker, they must be physically realisable, not restricted to the digital domain. This thesis explores the two complementary problems of finding such robust adversarial attacks, and defending against them.
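To make the threat concrete, here is a minimal sketch of an adversarial example against a toy linear classifier, using an FGSM-style signed-gradient step; the model, sizes, and perturbation budget are all illustrative assumptions, not drawn from the thesis:

```python
import numpy as np

# Toy linear classifier: predict class 1 when w @ x + b > 0.
# An FGSM-style attack steps every coordinate against the gradient of the
# score, flipping the prediction with a small L-infinity perturbation.
# Illustrative only; the model and budget are assumptions.

rng = np.random.default_rng(0)
w = rng.normal(size=64)          # fixed "trained" weights
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

x = 0.5 * w / (w @ w)            # clean input, classified positive (score 0.5)
eps = 0.02                       # per-coordinate perturbation budget
x_adv = x - eps * np.sign(w)     # step that maximally decreases the score
```

The signed step lowers the score by eps times the L1 norm of w, so a perturbation bounded by 0.02 per coordinate is enough to flip the prediction here, despite being small relative to the input.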
First, we present a novel method for generating robust adversarial image examples, making use of a Deep Internal Learning approach that exploits convolutional neural network (CNN) architectures to enforce plausible texture in image synthesis. Adversarial images are commonly generated by perturbing images to introduce high-frequency noise that induces misclassification but is fragile to simple image transformations. We show that using Deep Image Prior (DIP) regularisation to reconstruct an image under an adversarial constraint induces perturbations that are more robust to transformations, whilst remaining visually imperceptible. Furthermore, we show that our DIP approach can also be adapted to produce adversarial patch attacks (APAs), which insert visually overt, local regions (patches) into an image to induce misclassification.
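A toy numerical check of the intuition behind this claim (purely illustrative, not the DIP method itself): smooth, texture-like perturbations survive common image transformations, whereas high-frequency noise cancels out. Here 2x average pooling on a 1-D signal stands in for resizing:

```python
import numpy as np

# Why low-frequency perturbations are more robust to transformations:
# compare how much of a high-frequency vs a smooth perturbation survives
# 2x average pooling (a stand-in for image resizing). Purely illustrative.

n = 256
t = np.arange(n)
hi_freq = 0.1 * np.where(t % 2 == 0, 1.0, -1.0)   # alternating +/- noise
lo_freq = 0.1 * np.sin(2 * np.pi * t / 64)        # smooth, texture-like signal

def downsample(x):
    return x.reshape(-1, 2).mean(axis=1)          # 2x average pooling

surviving_hi = np.linalg.norm(downsample(hi_freq))  # adjacent values cancel
surviving_lo = np.linalg.norm(downsample(lo_freq))  # largely preserved
```

The alternating noise averages to exactly zero under pooling, while the smooth perturbation keeps most of its energy, which is the property the DIP regularisation encourages.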
Second, we present Vax-a-Net, a technique for immunising CNN classifiers against APAs. We introduce a conditional Generative Adversarial Network (GAN) architecture that simultaneously learns to synthesise patches for use in APAs and exploits those attacks to adapt a pre-trained target CNN, reducing its susceptibility to them. This approach enables resilience against APAs to be conferred to pre-trained models, which would be impractical with conventional adversarial training due to the slow convergence of APA methods. We demonstrate transferability of this protection to defend against existing APAs, and show its efficacy across several contemporary CNN classification architectures.
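A schematic of this attack-and-adapt game, shrunk to a toy logistic classifier in NumPy; the "patch" here is a learned additive vector rather than an image region, and every size and step size is an assumption for illustration, not Vax-a-Net itself:

```python
import numpy as np

# Two-player loop: an attacker learns an additive "patch" p that raises the
# classifier's loss, while the classifier weights w are updated on patched
# inputs to resist it. Toy stand-in for the GAN-style defence; all values
# are illustrative.

rng = np.random.default_rng(2)
d = 16
X = np.vstack([rng.normal(+1.0, 1.0, size=(50, d)),    # class +1 samples
               rng.normal(-1.0, 1.0, size=(50, d))])   # class -1 samples
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(d)          # classifier weights (defender)
p = np.zeros(d)          # additive patch (attacker)

def grads(w, p):
    z = -y * ((X + p) @ w)               # per-sample negated margin
    s = 1.0 / (1.0 + np.exp(-z))         # sigmoid(z) = d loss / d z
    gw = ((s * -y) @ (X + p)) / len(y)   # grad of mean logistic loss w.r.t. w
    gp = np.mean(s * -y) * w             # grad of mean logistic loss w.r.t. p
    return gw, gp

for _ in range(300):
    _, gp = grads(w, p)
    p = np.clip(p + 0.1 * gp, -0.5, 0.5)  # attacker: ascend, bounded patch
    gw, _ = grads(w, p)
    w -= 0.5 * gw                          # defender: descend on patched data

acc = np.mean(np.sign((X + p) @ w) == y)  # accuracy under the learned patch
```

Alternating the two updates is what lets a pre-trained model acquire resistance without waiting for a slow patch optimisation to converge at every step.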
Third, we explore adversarial attacks and defences for CNNs used in Micro-Doppler-based automatic target recognition (ATR). The adversarial robustness of CNN classifiers in this application is an open question, and an important one to address, since such networks can be used in safety-critical applications such as drone detection at airports. We investigate whether simple CNN classifiers working on the frequency spectrum of radar data are vulnerable to adversarial examples, producing two kinds of adversarial example for these networks: an imperceptible kind that perturbs the entire input spectrum, and a perceptible kind that adds extra bands to the frequency spectrum. We show that all the tested classifiers are vulnerable to both of these attacks, and we successfully defend all the networks against both types of attack.
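A synthetic illustration of the band-injection idea, using a nearest-centroid "classifier" over 1-D spectra as a stand-in for a CNN on Micro-Doppler data; the band positions, widths, and amplitudes are invented:

```python
import numpy as np

# Two toy classes of frequency spectra: class A has a band at bin 20,
# class B at bin 50. A nearest-centroid rule stands in for a CNN.
# Injecting class B's band into an A spectrum flips the decision,
# mirroring the perceptible "extra band" attack. Entirely synthetic.

bins = 80

def band(center, width=3, height=1.0):
    s = np.zeros(bins)
    s[center - width:center + width + 1] = height
    return s

centroid_a = band(20)
centroid_b = band(50)

def classify(s):
    da = np.linalg.norm(s - centroid_a)
    db = np.linalg.norm(s - centroid_b)
    return "A" if da < db else "B"

x = band(20)                   # clean class-A spectrum
x_band = x + 1.5 * band(50)    # perceptible attack: inject an extra band
```

The injected band is plainly visible in the spectrum, yet the original signature is untouched, which is what makes this attack style plausible for a physical emitter.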
Finally, we present the first defence against APAs for deep networks that perform semantic segmentation of scenes. Such networks have potential applications in visual navigation tasks such as autonomous driving. We show that a conditional generator can be trained to produce patches on demand that target specific classes, achieving superior performance versus conventional pixel-optimised patch attacks. We then combine this generator with the segmentation network in a GAN that trains the CNN to ignore the adversarial patches produced by the generator, while simultaneously training the generator to produce updated patches that attack the fine-tuned network. We show that our process confers strong protection against adversarial patches, and that this protection generalises to traditional pixel-optimised patches.
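One way to picture the defence's training signal (a sketch under assumed shapes and names, not the thesis's implementation): paste the generator's patch into the image but keep the clean labels as the segmentation target, so the network is trained to predict what was behind the patch:

```python
import numpy as np

# Build one patched training pair for a segmentation defence of this kind:
# the input image gets the adversarial patch pasted in, while the target
# label map is left unchanged, i.e. "predict what was there without the
# patch". Shapes, values, and the paste location are illustrative.

H, W, C = 64, 64, 3
image = np.zeros((H, W, C))              # stand-in for a scene image
labels = np.ones((H, W), dtype=int)      # stand-in for clean label map

patch = np.full((16, 16, C), 0.9)        # patch from the (assumed) generator
r, c = 10, 20                            # paste location
patched = image.copy()
patched[r:r + 16, c:c + 16] = patch

target = labels.copy()                   # unchanged: patch pixels keep clean labels
```

Keeping the clean labels over the patched region is what encodes "ignore the patch" as an ordinary supervised objective for the segmentation network.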
Attend the Event
This is a free hybrid event open to everyone.