Dr Colin O'Reilly
I am a Research Fellow at the Institute for Communication Systems (ICS), University of Surrey. My research is in the area of machine learning and data mining. I am involved in the FP7 SocIoTal project.
I received the B.Sc. degree in Mathematics from Queen Mary College, University of London, the M.Eng. degree in Telecommunications Engineering from Dublin City University, and the Ph.D. degree from the University of Surrey. My Ph.D. thesis focused on anomaly detection.
Machine learning; Anomaly detection; Distributed data; Non-stationary data; Univariate and multivariate time-series analysis; Kernel methods; Applications of machine learning in a wide variety of domains
In non-stationary environments the data distribution may alter, meaning that the concepts to be learned evolve in time. Anomaly detection techniques must be able to adapt to a non-stationary data distribution in order to perform optimally. This requires an update to the model that is being used to classify data. A batch approach to the problem requires a reconstruction of the model each time an update is required. Incremental learning overcomes this issue by using the previous model as the basis for an update. Two kernel-based incremental anomaly detection techniques are proposed. One technique uses kernel principal component analysis to perform anomaly detection. The kernel eigenspace is incrementally updated by splitting and merging kernel eigenspaces. The technique is shown to be more accurate than current state-of-the-art solutions. The second technique offers a reduction in the number of computations by using an incrementally updated hypersphere in kernel space.
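The idea of a hypersphere in kernel space that is updated rather than rebuilt can be illustrated with a minimal sketch. It is not the technique proposed in the thesis: it assumes an RBF kernel, a centre placed at the feature-space mean of the stored points, and a radius taken as a quantile of the training distances. The class name KernelHypersphere and its parameters gamma and quantile are hypothetical. Only the kernel sum that defines the centre is updated incrementally; the radius is refitted from all stored points for simplicity.

```python
import numpy as np

def rbf(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

class KernelHypersphere:
    """Illustrative detector: sphere around the feature-space mean (an assumption)."""

    def __init__(self, gamma=0.5, quantile=0.95):
        self.gamma = gamma
        self.quantile = quantile
        self.X = None        # stored training points
        self.kss = 0.0       # running sum of all entries of K(X, X)
        self.radius2 = None  # squared radius of the sphere

    def update(self, batch):
        """Add new points, reusing the previous kernel sum instead of rebuilding it."""
        batch = np.atleast_2d(batch)
        self.kss += rbf(batch, batch, self.gamma).sum()
        if self.X is None:
            self.X = batch
        else:
            # Only the old-versus-new cross terms are computed.
            self.kss += 2.0 * rbf(self.X, batch, self.gamma).sum()
            self.X = np.vstack([self.X, batch])
        # Simplification: the radius is refitted from all stored points.
        self.radius2 = np.quantile(self.dist2(self.X), self.quantile)

    def dist2(self, x):
        """Squared feature-space distance from each row of x to the centre."""
        x = np.atleast_2d(x)
        n = len(self.X)
        cross = rbf(x, self.X, self.gamma).mean(axis=1)
        # For the RBF kernel, k(x, x) = 1.
        return 1.0 - 2.0 * cross + self.kss / n**2

    def is_anomaly(self, x):
        return self.dist2(x) > self.radius2
```

Calling update with successive batches gives the same centre as a single batch fit, because the stored kernel sum is extended with the cross terms between old and new points.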
In a non-stationary environment, the parameters of the model must be updated as well as the model itself. Anomaly detection algorithms require the selection of appropriate parameters in order to perform optimally for a given data set. If the distribution of the data changes, an update to the parameters of the model is required. An automatic parameter optimization procedure is proposed for the one-class quarter-sphere support vector machine, where the ν parameter is selected automatically based on the anomaly rate in the training set.
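The role of the ν parameter can be illustrated with a short sketch. It assumes the common quarter-sphere formulation in which the centre is fixed at the origin of the centred feature space, so that the radius is approximately the (1 - ν) quantile of the training norms. How the anomaly rate is estimated is not covered here: the value estimated_rate in the usage note is a hypothetical input, and the function names are illustrative only.

```python
import numpy as np

def centred_norms2(K):
    """Squared norm of each training point after centring in feature space.

    K is a symmetric kernel matrix; the result is the diagonal of the
    centred kernel matrix, computed without forming that matrix.
    """
    return np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()

def quarter_sphere_radius2(norms2, nu):
    """Squared radius leaving roughly a fraction nu of the points outside.

    Assumption: the radius is approximated by the (1 - nu) quantile of
    the norms rather than by solving the linear program explicitly.
    """
    return np.quantile(norms2, 1.0 - nu)
```

With a kernel matrix K and an estimate estimated_rate of the anomaly rate, the points flagged as anomalous would be those satisfying centred_norms2(K) > quarter_sphere_radius2(centred_norms2(K), estimated_rate). If the anomaly rate drifts, re-estimating it and recomputing the radius adjusts the detector without manual tuning.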
In environments such as wireless sensor networks, data might be distributed amongst a number of nodes. In this case, distributed learning is required where nodes construct a classifier, or an approximation of the classifier, that would have been formed had all the data been available to one instance of the algorithm. A principal component analysis based anomaly detection method is proposed that uses the solution to a convex optimization problem. The convex optimization problem is then derived in a distributed form, with each node running a local instance of the algorithm. Nodes are able to iterate towards an anomaly detector equivalent to the global solution by exchanging short messages.
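The general principle of nodes converging on a global model by exchanging short messages can be sketched as follows. This is not the distributed convex optimization derived in the thesis. It substitutes a simpler, well-known technique, consensus averaging of local sufficient statistics, and then applies ordinary principal component analysis at each node. The ring topology, the equal weights and all function names are assumptions made for illustration.

```python
import numpy as np

def local_stats(X):
    """Short message held by a node: count, sum and sum of outer products."""
    return np.concatenate(([len(X)], X.sum(axis=0), (X.T @ X).ravel()))

def ring_weights(m):
    """Doubly stochastic weights for a ring of m >= 3 nodes (self plus two neighbours)."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] = 1.0 / 3.0
    return W

def consensus(stats, W, iters):
    """Each round, every node replaces its vector by a weighted neighbour average."""
    S = np.array(stats, dtype=float)
    for _ in range(iters):
        S = W @ S
    return S

def mean_and_cov(s, d):
    """Recover the mean and (biased) covariance from an averaged statistics vector."""
    n, sx, sxx = s[0], s[1:1 + d], s[1 + d:].reshape(d, d)
    mean = sx / n
    return mean, sxx / n - np.outer(mean, mean)

def residual2(x, mean, cov, k):
    """Anomaly score: squared distance from x to the top-k principal subspace."""
    _, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    V = vecs[:, -k:]                # top-k principal directions
    c = x - mean
    return float(c @ c - (V.T @ c) @ (V.T @ c))
```

After enough rounds each node holds the network-wide average of the statistics, so the mean and covariance it recovers match those of a centralized computation over all the data, and each node can score new observations locally with residual2.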
Detailed evaluations of the proposed techniques are performed against existing state-of-the-art techniques using a variety of synthetic and real-world data sets. Results for non-stationary environments illustrate the need to adapt an anomaly detection model to the changing data distribution. It is shown that the proposed incremental techniques maintain accuracy while reducing the number of computations.
In addition, optimal parameters derived from an unlabelled training set are shown to exhibit superior performance to statically selected parameters.
In a distributed environment, it is shown that local learning is insufficient because each node has too few examples. Distributed learning can be performed in a manner where a centralized model can be derived by passing small amounts of information between neighbouring nodes. This approach yields a model whose performance equals that of the centralized model.