Dr Spencer Thomas

Senior Lecturer in Data Science and Computational Intelligence
MPhys PhD


Areas of specialism

Data Science; Machine Learning; Artificial Intelligence; Computational Intelligence; Healthcare Data; Large and High Dimensional Data Analysis


Research interests


Postgraduate research supervision




F. Ntelemis, Y. Jin and S. A. Thomas, "Image Clustering Using an Augmented Generative Adversarial Network and Information Maximization," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2021.3085125.

S. A. Thomas, F. Brochu, A framework for traceable storage and curation of measurement data, Measurement: Sensors, 2021, doi: 10.1016/j.measen.2021.100201.

A. Lemanska, S. A. Thomas, et al., Study into COVID-19 Crisis Using Primary Care Mental Health Consultations and Prescriptions Data, Studies in Health Technology and Informatics, 2021, doi: 10.3233/SHTI210277.

Spencer A. Thomas, Frederic Brochu (2022) Curation at the point of measurement and traceability of measurement workflows, In: Measurement: Sensors 23, 100399. Elsevier

In this paper we introduce a method that can digitally capture machine actionable metadata, tag them to the associated measurement data, and upload to a curated database. Our method is packaged as a tool to enable scientists to capture and store curated data at the point of measurement. By ‘data’ we include the primary measurement and any associated information such as calibration data, processing/analysis scripts, multi-modal data, etc. Combining the associated data together enhances re-usability through metadata and confidence through calibration data. We extend this process by adding new data at each stage of the data capture and analysis workflow to develop a completely traceable data processing pipeline. We achieve this by cumulatively updating the ‘data’ at each stage and by using versioning in our database for complete generality. Here each version is a self-contained curated container of all relevant data and codes providing a reproducible ‘snapshot’ in a traceable analytical pipeline. Within each ‘snapshot’ we store the outputs from the relevant data analysis (figures, models, hypothesis tests, etc.), the raw data, and each step (codes, converters, etc.) between them, resulting in a fully transparent and reproducible workflow. The ‘snapshots’ are updated along the analytical pipeline and we demonstrate this with several steps including: at the point of measurement; conversion to an open format; pre-processing (feature selection, noise reduction, etc.); and data analysis. We demonstrate our method with a large cohort of mass spectrometry imaging experiments as an exemplar case study.
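The cumulative, versioned 'snapshot' idea described in this abstract can be illustrated with a minimal in-memory sketch. It is not the authors' tool or database: the `Snapshot` class, the `add_stage` helper, the digest chaining and all file names are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One self-contained version in a (hypothetical) traceable pipeline."""
    version: int
    stage: str
    artifacts: dict      # cumulative mapping: artifact name -> file reference
    parent_digest: str   # digest of the previous snapshot ("" for the first)

    @property
    def digest(self) -> str:
        # Hash the full content so any change to a snapshot is detectable.
        payload = json.dumps(
            {"version": self.version, "stage": self.stage,
             "artifacts": self.artifacts, "parent": self.parent_digest},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_stage(history, stage, new_artifacts):
    """Return a new history with one more snapshot; earlier ones are untouched."""
    previous = history[-1] if history else None
    artifacts = dict(previous.artifacts) if previous else {}
    artifacts.update(new_artifacts)  # cumulative update at each stage
    snapshot = Snapshot(
        version=len(history) + 1,
        stage=stage,
        artifacts=artifacts,
        parent_digest=previous.digest if previous else "",
    )
    return history + [snapshot]


# Hypothetical file names for the four stages named in the abstract.
history = []
history = add_stage(history, "measurement",
                    {"raw": "spectrum.raw", "calibration": "cal.json"})
history = add_stage(history, "conversion",
                    {"open_format": "spectrum.imzML", "converter": "convert.py"})
history = add_stage(history, "pre-processing",
                    {"features": "peaks.csv", "preprocess_script": "preprocess.py"})
history = add_stage(history, "analysis",
                    {"figure": "clusters.png", "analysis_script": "analyse.py"})

for snap in history:
    print(snap.version, snap.stage, sorted(snap.artifacts))
```

Each snapshot carries everything recorded so far plus the digest of its parent, so the chain from raw measurement to final figure can be checked link by link.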

Ambra Morisi, Taran Rai, Nicholas J. Bacon, Spencer A. Thomas, Miroslaw Bober, Kevin Wells, Michael J. Dark, Tawfik Aboellail, Barbara Bacci, Roberto M. La Ragione (2023) Detection of Necrosis in Digitised Whole Slide Images for Better Grading of Canine Soft Tissue Sarcoma Using Machine-Learning, In: Veterinary Sciences 10(1), 45

The definitive diagnosis of canine soft-tissue sarcomas (STSs) is based on histological assessment of formalin-fixed tissues. Assessment of parameters, such as degree of differentiation, necrosis score and mitotic score, gives rise to a final tumour grade, which is important in determining prognosis and subsequent treatment modalities. However, grading discrepancies are reported to occur in human and canine STSs, which can result in complications regarding treatment plans. The introduction of digital pathology has the potential to help improve STS grading via automated determination of the presence and extent of necrosis. The detected necrotic regions can be factored into the grading scheme or excluded before analysing the remaining tissue. Here we describe a method to detect tumour necrosis in histopathological whole-slide images (WSIs) of STSs using machine learning. Annotated areas of necrosis were extracted from WSIs and the patches containing necrotic tissue fed into a pre-trained DenseNet161 convolutional neural network (CNN) for training, testing and validation. The proposed CNN architecture reported favourable results, with an overall validation accuracy of 92.7% for necrosis detection, where accuracy is the number of correctly classified data instances over the total number of data instances. The proposed method, when rigorously validated, represents a promising tool to assist pathologists in evaluating necrosis in canine STS tumours, by increasing efficiency and accuracy and reducing inter-rater variation.

Foivos Ntelemis, Yaochu Jin, Spencer A. Thomas (2022) Information maximization clustering via multi-view self-labelling, In: Knowledge-Based Systems 250, 109042. Elsevier

Image clustering is a particularly challenging computer vision task, which aims to generate annotations without human supervision. Recent advances focus on the use of self-supervised learning strategies in image clustering, by first learning valuable semantics and then clustering the image representations. These multiple-phase algorithms, however, involve several hyper-parameters and transformation functions, and are computationally intensive. By extending the grouping based self-supervised approach, this work proposes a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier net. Specifically, the proposed clustering objective employs mutual information to maximise the dependency of the integrated discrete representation on a discrete probability distribution. The discrete probability distribution is derived by means of a self-supervised process that compares the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual information across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques with an average clustering accuracy of 89.1%, 49.0%, 83.1%, and 27.9%, respectively, on the baseline datasets of CIFAR-10, CIFAR-100/20, STL10 and Tiny-ImageNet/200. Finally, the proposed method also demonstrates attractive robustness to parameter settings, and to a large number of classes, making it ready to be applicable to other datasets.
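As a rough illustration of the quantity this abstract refers to, the sketch below estimates the mutual information between two sets of soft cluster assignments over a batch. It is a generic batch estimator, not the paper's training objective: the function name `mutual_information`, the `eps` smoothing term and the toy inputs are assumptions made for this example.

```python
import numpy as np


def mutual_information(p, q, eps=1e-12):
    """Mutual information (in nats) between two soft assignments.

    p, q: arrays of shape (n_samples, n_classes) whose rows sum to 1.
    The joint distribution is estimated by averaging outer products
    of the per-sample assignments over the batch.
    """
    joint = p.T @ q / p.shape[0]                 # (n_classes, n_classes)
    p_marg = joint.sum(axis=1, keepdims=True)    # marginal of p
    q_marg = joint.sum(axis=0, keepdims=True)    # marginal of q
    log_ratio = np.log(joint + eps) - np.log(p_marg + eps) - np.log(q_marg + eps)
    return float(np.sum(joint * log_ratio))


# Toy inputs (assumed): balanced one-hot assignments over two classes.
one_hot = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
uniform = np.full((4, 2), 0.5)

mi_identical = mutual_information(one_hot, one_hot)   # fully dependent
mi_independent = mutual_information(one_hot, uniform) # no dependence
print(mi_identical, mi_independent)
```

When the two assignments agree and are balanced, the estimate approaches log(2) for two classes; when one of them carries no information, it approaches zero. Maximising such a term therefore rewards assignments that are both confident and evenly spread across classes.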

Taranpreet Rai, Ambra Morisi, Barbara Bacci, Nicholas J. Bacon, Michael J. Dark, Tawfik Aboellail, Spencer A. Thomas, Miroslaw Bober, Roberto La Ragione, Kevin Wells (2022) Deep learning for necrosis detection using canine perivascular wall tumour whole slide images, In: Scientific Reports 12, 10634

Necrosis seen in histopathology Whole Slide Images is a major criterion that contributes towards scoring tumour grade, which then determines treatment options. However, conventional manual assessment suffers from poor inter-operator reproducibility, impacting grading precision. To address this, automatic necrosis detection using AI may be used to assess necrosis for final scoring that contributes towards the final clinical grade. Using deep learning AI, we describe a novel approach for automating necrosis detection in Whole Slide Images, tested on a canine Soft Tissue Sarcoma (cSTS) data set consisting of canine Perivascular Wall Tumours (cPWTs). A patch-based deep learning approach was developed where different variations of training a DenseNet-161 Convolutional Neural Network architecture were investigated as well as a stacking ensemble. An optimised DenseNet-161 with post-processing produced a hold-out test F1-score of 0.708, demonstrating state-of-the-art performance. This represents a novel first-time automated necrosis detection method in the cSTS domain, as well as specifically in detecting necrosis in cPWTs, demonstrating a significant step forward in reproducible and reliable necrosis assessment for improving the precision of tumour grading.

The usage of chemical imaging technologies is becoming a routine accompaniment to traditional methods in pathology. Significant technological advances have developed these next generation techniques to provide rich, spatially resolved, multidimensional chemical images. The rise of digital pathology has significantly enhanced the synergy of these imaging modalities with optical microscopy and immunohistochemistry, enhancing our understanding of the biological mechanisms and progression of diseases. Techniques such as imaging mass cytometry provide labelled multidimensional (multiplex) images of specific components used in conjunction with digital pathology techniques. These powerful techniques generate a wealth of high dimensional data that create significant challenges in data analysis. Unsupervised methods such as clustering are an attractive way to analyse these data; however, they require the selection of parameters such as the number of clusters. Here we propose a methodology to estimate the number of clusters in an automatic data-driven manner using a deep sparse autoencoder to embed the data into a lower dimensional space. We compute the density of regions in the embedded space, the majority of which are empty, enabling the high density regions (i.e. clusters) to be detected as outliers and provide an estimate for the number of clusters. This framework provides a fully unsupervised and data-driven method to analyse multidimensional data. In this work we demonstrate our method using 45 multiplex imaging mass cytometry datasets. Moreover, our model is trained using only one of the datasets and the learned embedding is applied to the remaining 44 images, providing an efficient process for data analysis. Finally, we demonstrate the high computational efficiency of our method, which is two orders of magnitude faster than estimating via computing the sum squared distances as a function of cluster number.
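The density step described in this abstract can be sketched as follows, assuming a two-dimensional embedding has already been computed. The autoencoder itself is omitted, and several details are assumptions made only for illustration: the grid size, the outlier rule (bin count above mean plus three standard deviations), the grouping of adjacent dense bins into one cluster, and the synthetic data standing in for a learned embedding.

```python
import numpy as np
from scipy import ndimage


def estimate_num_clusters(embedded, bins=20, value_range=(-2.0, 7.0), k=3.0):
    """Estimate the number of clusters from density in a 2D embedded space.

    embedded: array of shape (n_samples, 2).
    Most grid cells are empty, so cells whose counts are outliers
    (assumed rule: count > mean + k * std) are treated as dense regions.
    """
    counts, _, _ = np.histogram2d(
        embedded[:, 0], embedded[:, 1],
        bins=bins, range=[value_range, value_range],
    )
    threshold = counts.mean() + k * counts.std()
    dense = counts > threshold
    # Group touching dense cells (including diagonals) into one region each.
    structure = np.ones((3, 3), dtype=int)
    _, num_regions = ndimage.label(dense, structure=structure)
    return num_regions


# Synthetic stand-in for an embedding: three well-separated groups.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
embedded = np.vstack([c + 0.3 * rng.standard_normal((200, 2)) for c in centres])

n_clusters = estimate_num_clusters(embedded)
print(n_clusters)
```

Because the estimate needs only one histogram and one labelling pass, its cost does not depend on fitting a clustering model for each candidate cluster number, which is the kind of saving the abstract reports relative to the sum-of-squared-distances approach.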

Additional publications