The Challenges of Anomaly Detection
Anomaly Detection (AD), a.k.a. one-class classification, has recently received considerable attention in a variety of applications such as biometrics, computer vision and machine learning. An anomaly is defined as an observation that does not conform to the expected normal behaviour. Modelling and encapsulating normal data in order to detect anomalies is still an open problem, especially when only normal (non-anomalous) data is available at training time. To mitigate this issue, several existing works have tried to leverage the available anomalies in the training set to improve AD performance. However, this design may not be effective in real-world scenarios where few or no anomalous samples are available. This thesis studies the challenge of a pure AD design that uses non-anomalous samples only. As it would not be feasible to develop a generic framework covering all the aforementioned applications, several pure AD models are developed, each dealing with a specific domain.
Although significant improvements have been achieved in face recognition, presentation attacks (PAs), in which an impostor tries to access a service illegally, are recognised as a considerable threat to biometric devices. To counteract PAs, the majority of approaches formulate the presentation attack detection (PAD) problem, a.k.a. face spoofing detection, as a two-class classification problem. Nevertheless, the two-class formulation generalises poorly in the presence of novel PAs. To address this limitation, a pure AD model is trained in which real accesses are considered normal and PAs are presumed to be anomalous observations. An aspect of PAD design that has been overlooked is the use of client-specific information in the context of AD. It has been shown that client identity information can be deployed to achieve better discrimination between real accesses and PAs. As the first contribution, client-specific information is adopted to build the one-class classifiers (OCCs) and to determine a client-specific threshold.
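The idea of a client-specific one-class classifier with its own threshold can be illustrated with a minimal sketch. This is not the model developed in the thesis: the per-dimension Gaussian-style scoring, the quantile rule for the threshold, and the names `fit_client_model`, `anomaly_score` and `is_attack` are all assumptions made purely for illustration. The only property carried over from the text is that each client's model and threshold are estimated from that client's real-access samples alone.

```python
import math
import statistics


def anomaly_score(model, x):
    """Distance of x from the client's real-access mean, scaled per dimension."""
    return math.sqrt(
        sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, model["mean"], model["std"]))
    )


def fit_client_model(real_access_features, quantile=0.95):
    """Fit one client's model from that client's real-access features only.

    Assumption: features are fixed-length tuples of floats, and the threshold
    is set at a quantile of the client's own training scores.
    """
    dims = list(zip(*real_access_features))
    model = {
        "mean": [statistics.fmean(d) for d in dims],
        # Fall back to 1.0 when a dimension has zero spread.
        "std": [statistics.pstdev(d) or 1.0 for d in dims],
    }
    scores = sorted(anomaly_score(model, x) for x in real_access_features)
    idx = min(len(scores) - 1, math.ceil(quantile * len(scores)) - 1)
    model["threshold"] = scores[idx]
    return model


def is_attack(model, x):
    """Flag x as a presentation attack if it exceeds the client's threshold."""
    return anomaly_score(model, x) > model["threshold"]
```

In this sketch a sample is always tested against the model of the claimed identity, so a sample that looks normal for one client can still be rejected for another.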
To further improve the generalisation performance of OCCs, the idea of constructing a fusion of OCCs has received increasing attention. Nevertheless, very few studies in the literature have been concerned with developing a general methodology for OCC fusion design and examining its effectiveness in a broad range of applications. This thesis aims to redress this limitation by proposing a generic OCC fusion method. To boost performance, three novel contributions are proposed. Firstly, as very few studies consider the effect of population outliers on the normalisation process, a new score normalisation method that copes well with heavy-tailed non-anomalous data distributions is proposed as a pre-processing step to multiple classifier fusion. Secondly, to be faithful to the pure AD design philosophy, a novel fitness function is defined which requires only normal observations to estimate the competency of OCCs. Thirdly, a new pruning method is proposed to discard OCCs carrying little or no informative data from the fusion, improving the AD results.
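To make the role of outlier-robust score normalisation concrete, the sketch below normalises each classifier's scores with the median and the median absolute deviation (MAD) before averaging them. This is a well-known robust stand-in, not the normalisation method proposed in the thesis, and the function names `fit_robust_normaliser`, `normalise` and `fuse` are hypothetical. The fitness function and pruning step are not sketched because the text does not specify them.

```python
import statistics


def fit_robust_normaliser(normal_scores):
    """Estimate location and scale from normal (non-anomalous) scores only.

    Median and MAD are far less sensitive to heavy tails than mean and
    standard deviation. The factor 1.4826 makes the MAD comparable to a
    standard deviation for Gaussian data.
    """
    med = statistics.median(normal_scores)
    mad = statistics.median([abs(s - med) for s in normal_scores])
    scale = (1.4826 * mad) or 1.0  # fall back to 1.0 if the MAD is zero
    return med, scale


def normalise(score, normaliser):
    """Map a raw classifier score onto a common, robustly scaled range."""
    med, scale = normaliser
    return (score - med) / scale


def fuse(per_classifier_scores, normalisers):
    """Average the normalised scores of several one-class classifiers."""
    z = [normalise(s, n) for s, n in zip(per_classifier_scores, normalisers)]
    return sum(z) / len(z)
```

With training scores such as `[1, 2, 3, 4, 100]`, the single extreme value barely moves the median or the MAD, whereas it would dominate a mean and standard deviation estimate.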
Up to this point, pre-trained ImageNet CNNs have been used in the thesis to extract features from image data. To train a CNN from scratch, a deep network is pre-trained using self-supervised learning on an auxiliary geometric transformation (GT) classification task. The key contribution is a novel loss function that augments the standard cross-entropy with an additional term that plays a significant role in the later stages of self-supervised learning. The proposed innovation is a triplet centre loss with an adaptive margin and a learnable metric, which drives the GT classes to exhibit continuously improving compactness and inter-class separation. The pre-trained network is fine-tuned for the downstream task using non-anomalous data only, and a GT model for the data is constructed. Anomalies are detected by fusing the outputs of several decision functions defined using the learnt GT class model.
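As a rough illustration of how a triplet centre term encourages compact, well-separated GT classes, the sketch below computes a common fixed-margin form of the loss. It deliberately omits the adaptive margin and the learnable metric that the thesis proposes, since their definitions are not given here. It assumes plain Euclidean distance, fixed class centres, and a hypothetical function name `triplet_centre_loss`.

```python
import math


def triplet_centre_loss(embeddings, labels, centres, margin=1.0):
    """Mean fixed-margin triplet centre loss.

    For each embedding, the distance to its own class centre should be
    smaller than the distance to the nearest other class centre by at
    least `margin`. Any shortfall is penalised.

    Assumptions: `centres` maps a GT class index to a centre vector, and
    distances are Euclidean rather than a learnt metric.
    """
    total = 0.0
    for x, y in zip(embeddings, labels):
        d_own = math.dist(x, centres[y])
        d_nearest_other = min(
            math.dist(x, c) for k, c in centres.items() if k != y
        )
        total += max(0.0, d_own + margin - d_nearest_other)
    return total / len(embeddings)
```

An embedding sitting exactly on its own centre and far from the others contributes nothing, while an embedding halfway between two centres contributes the full margin.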
Extensive experiments on publicly available AD datasets demonstrate the effectiveness of the proposed contributions and show significant performance gains over state-of-the-art methods. The datasets include benchmark datasets in PAD, conventional tabular datasets in the machine learning domain and common computer vision databases.
Attend the event
This is a hybrid event, free for everyone to join.