
Dr Spencer Thomas
Academic and research departments
Nature Inspired Computing and Engineering Research Group, Department of Computer Science.
About
Biography
Dr Spencer Thomas is a Senior Lecturer in Data Science and Computational Intelligence and a Senior Research Scientist in Data Science at the National Physical Laboratory (NPL). Spencer works part-time at the University (Wednesdays and Thursdays) and at NPL the remainder of the time.
Spencer joined the University in March 2022 following an Associate Lecturer position and a long-term visiting position in the Department of Computer Science. Prior to this he joined NPL in 2016 in the National Centre of Excellence in Mass Spectrometry Imaging (NiCE-MSI) group, leading activity in computational methods and data analysis, before joining the Data Science group in 2019. Prior to NPL, Spencer was a Research Fellow in Applied Mathematics working on manifold learning and the analysis of complex systems and stochastic dynamical systems. He received an undergraduate master’s degree in physics (M.Phys) in 2010 and completed his Ph.D. in Optimisation and Computational Biology in 2014 at the University of Surrey.
Spencer is involved in a number of committees, including as co-Chair of the Computational Intelligence in Big Data track at the IEEE SSCI conference (2016-present) and as a member of the Royal Microscopy Society Focus Interest Group on Mass Spectrometry Imaging. He is also a member of the Analytics Hub of the Early Detection of Neurodegenerative diseases (EDoN) Project.
Spencer is currently leading activity in trustworthy and explainable machine learning, curation of large complex data and healthcare applications in Data Science.
Research
Research interests
Spencer is working on trustworthy and explainable machine learning, transfer learning, data integration, representation learning, large scale high dimensional systems analysis, data science applications in healthcare, and tools for traceable storage and curation of complex data. Spencer also has experience in complex systems, dynamical systems, optimisation, and applied mathematics.
Research projects
- DS and AI in complex high dim data
- Trusted and explainable AI in healthcare
- Data integration in ML
- Representation learning
- Machine Learning for Evaluating Disease and Drug Effectiveness in Fibre-Bundle Endo-Microscopy Systems
- COVID response projects
- Digital Pathology classification of animal tissues and data analysis of kidney data
- Early detection of recurrence of GBM
- Early detection of cancerous lung nodules
- Traceable and curated FAIR data storage
- Early Detection of Neurodegenerative diseases (EDoN) Analytical Hub
- Royal Microscopy Society Focus Interest Group on Mass Spectrometry Imaging (RMS MSI FIG)
- NPL’s interdisciplinary Digital Health project
Supervision
Postgraduate research supervision
I am looking for PhD students in machine learning, computer vision and pattern recognition theory and applications. I am particularly interested in healthcare applications.
Current PhD students
- Foivos Ntelemis (CS, NPL): Deep learning and representation learning for high dimensional data
- Taran Rai (CVSSP, Vet School, NPL): Deep learning in Digital Pathology
- Tarek Haloubi (Uni Edinburgh, GSK, NPL): Machine Learning for Evaluating Disease and Drug Effectiveness in Fibre-Bundle Endo-Microscopy Systems.
Previous students
- Cameron Cook (MMath): Exploring the limits of transfer learning on image classification
Teaching
COM3018 Practical Business Analytics
COMM053 Practical Business Analytics (Data Science MSc)
Publications
Highlights
F. Ntelemis, Y. Jin and S. A. Thomas, "Image Clustering Using an Augmented Generative Adversarial Network and Information Maximization," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2021.3085125.
S. A. Thomas, F. Brochu, A framework for traceable storage and curation of measurement data, Measurement: Sensors, 2021, doi: 10.1016/j.measen.2021.100201.
A. Lemanska, S. A. Thomas, et al, Study into COVID-19 Crisis Using Primary Care Mental Health Consultations and Prescriptions Data, Studies in Health Technology and Informatics. 2021 doi: 10.3233/SHTI210277
S. A. Thomas, et al, Analysis of Primary Care Computerized Medical Records (CMR) Data With Deep Autoencoders (DAE), Front. Appl. Math. Stat., 06 August 2019 | https://doi.org/10.3389/fams.2019.00042
In this paper we introduce a method that can digitally capture machine actionable metadata, tag them to the associated measurement data, and upload to a curated database. Our method is packaged as a tool to enable scientists to capture and store curated data at the point of measurement. By ‘data’ we include the primary measurement and any associated information such as calibration data, processing/analysis scripts, multi-modal data, etc. Combining the associated data together enhances re-usability through metadata and confidence through calibration data. We extend this process by adding new data at each stage of the data capture and analysis workflow to develop a completely traceable data processing pipeline. We achieve this by cumulatively updating the ‘data’ at each stage and by using versioning in our database for complete generality. Here each version is a self-contained curated container of all relevant data and codes providing a reproducible ‘snapshot’ in a traceable analytical pipeline. Within each ‘snapshot’ we store the outputs from the relevant data analysis (figures, models, hypothesis tests, etc), the raw data, and each step (codes, converters, etc) between them, resulting in a fully transparent and reproducible workflow. The ‘snapshots’ are updated along the analytical pipeline and we demonstrate this with several steps including: at the point of measurement; conversion to an open format; pre-processing (feature selection, noise reduction, etc); and data analysis. We demonstrate our method with a large cohort of mass spectrometry imaging experiments as an exemplar case study.
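The cumulative, versioned ‘snapshot’ idea described above can be illustrated with a minimal sketch. This is not the authors' tool or database: the class names `Snapshot` and `SnapshotStore`, the use of SHA-256 digests to link versions, and the example file names are all assumptions made purely for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One self-contained version holding everything captured so far (hypothetical)."""
    version: int
    stage: str
    items: dict  # name -> reference to raw data, metadata, scripts or outputs
    parent_digest: Optional[str]

    def digest(self) -> str:
        # Hash the full content so any later change to a version is detectable.
        payload = json.dumps(
            {"version": self.version, "stage": self.stage,
             "items": self.items, "parent": self.parent_digest},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SnapshotStore:
    """Append-only list of snapshots; each commit carries forward earlier items."""

    def __init__(self):
        self.versions = []

    def commit(self, stage, new_items):
        previous = self.versions[-1] if self.versions else None
        # Cumulative update: copy everything from the previous version, then add.
        items = dict(previous.items) if previous else {}
        items.update(new_items)
        snapshot = Snapshot(
            version=len(self.versions) + 1,
            stage=stage,
            items=items,
            parent_digest=previous.digest() if previous else None,
        )
        self.versions.append(snapshot)
        return snapshot


store = SnapshotStore()
s1 = store.commit("measurement", {"raw": "spectrum.bin", "calibration": "cal.json"})
s2 = store.commit("conversion", {"open_format": "spectrum.imzML"})
```

In this sketch each version is self-contained (it holds the raw data reference alongside every later addition), and the parent digest gives a simple chain from any analysis output back to the point of measurement.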
The definitive diagnosis of canine soft-tissue sarcomas (STSs) is based on histological assessment of formalin-fixed tissues. Assessment of parameters, such as degree of differentiation, necrosis score and mitotic score, gives rise to a final tumour grade, which is important in determining prognosis and subsequent treatment modalities. However, grading discrepancies are reported to occur in human and canine STSs, which can result in complications regarding treatment plans. The introduction of digital pathology has the potential to help improve STS grading via automated determination of the presence and extent of necrosis. The detected necrotic regions can be factored in the grading scheme or excluded before analysing the remaining tissue. Here we describe a method to detect tumour necrosis in histopathological whole-slide images (WSIs) of STSs using machine learning. Annotated areas of necrosis were extracted from WSIs and the patches containing necrotic tissue fed into a pre-trained DenseNet161 convolutional neural network (CNN) for training, testing and validation. The proposed CNN architecture reported favourable results, with an overall validation accuracy of 92.7% for necrosis detection, where accuracy is the number of correctly classified data instances over the total number of data instances. The proposed method, when rigorously validated, represents a promising tool to assist pathologists in evaluating necrosis in canine STS tumours, by increasing efficiency and accuracy and reducing inter-rater variation.
Image clustering is a particularly challenging computer vision task, which aims to generate annotations without human supervision. Recent advances focus on the use of self-supervised learning strategies in image clustering, by first learning valuable semantics and then clustering the image representations. These multiple-phase algorithms, however, involve several hyper-parameters and transformation functions, and are computationally intensive. By extending the grouping based self-supervised approach, this work proposes a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier net. Specifically, the proposed clustering objective employs mutual information to maximise the dependency of the integrated discrete representation on a discrete probability distribution. The discrete probability distribution is derived by means of a self-supervised process that compares the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual information across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques with an average clustering accuracy of 89.1%, 49.0%, 83.1%, and 27.9%, respectively, on the baseline datasets of CIFAR-10, CIFAR-100/20, STL10 and Tiny-ImageNet/200. Finally, the proposed method also demonstrates attractive robustness to parameter settings, and to a large number of classes, making it ready to be applicable to other datasets.
Necrosis seen in histopathology Whole Slide Images is a major criterion that contributes towards scoring tumour grade, which then determines treatment options. However, conventional manual assessment suffers from poor inter-operator reproducibility, impacting grading precision. To address this, automatic necrosis detection using AI may be used to assess necrosis for final scoring that contributes towards the final clinical grade. Using deep learning AI, we describe a novel approach for automating necrosis detection in Whole Slide Images, tested on a canine Soft Tissue Sarcoma (cSTS) data set consisting of canine Perivascular Wall Tumours (cPWTs). A patch-based deep learning approach was developed where different variations of training a DenseNet-161 Convolutional Neural Network architecture were investigated as well as a stacking ensemble. An optimised DenseNet-161 with post-processing produced a hold-out test F1-score of 0.708, demonstrating state-of-the-art performance. This represents a novel first-time automated necrosis detection method in the cSTS domain, as well as specifically in detecting necrosis in cPWTs, demonstrating a significant step forward in reproducible and reliable necrosis assessment for improving the precision of tumour grading.
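For readers unfamiliar with the F1-score quoted above, the following is a minimal sketch of how it is computed from patch-level counts. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 necrotic patches found, 2 false alarms, 2 missed.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

Unlike plain accuracy, the F1-score ignores true negatives, which makes it a more informative summary when necrotic patches are much rarer than non-necrotic ones.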
The usage of chemical imaging technologies is becoming a routine accompaniment to traditional methods in pathology. Significant technological advances have developed these next generation techniques to provide rich, spatially resolved, multidimensional chemical images. The rise of digital pathology has significantly enhanced the synergy of these imaging modalities with optical microscopy and immunohistochemistry, enhancing our understanding of the biological mechanisms and progression of diseases. Techniques such as imaging mass cytometry provide labelled multidimensional (multiplex) images of specific components used in conjunction with digital pathology techniques. These powerful techniques generate a wealth of high dimensional data that create significant challenges in data analysis. Unsupervised methods such as clustering are an attractive way to analyse these data, however, they require the selection of parameters such as the number of clusters. Here we propose a methodology to estimate the number of clusters in an automatic data-driven manner using a deep sparse autoencoder to embed the data into a lower dimensional space. We compute the density of regions in the embedded space, the majority of which are empty, enabling the high density regions (i.e. clusters) to be detected as outliers and provide an estimate for the number of clusters. This framework provides a fully unsupervised and data-driven method to analyse multidimensional data. In this work we demonstrate our method using 45 multiplex imaging mass cytometry datasets. Moreover, our model is trained using only one of the datasets and the learned embedding is applied to the remaining 44 images providing an efficient process for data analysis. Finally, we demonstrate the high computational efficiency of our method which is two orders of magnitude faster than estimating via computing the sum squared distances as a function of cluster number.
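The density-based estimate described above can be sketched roughly as follows. This is an illustrative simplification, not the published method: it assumes the data have already been embedded into two dimensions (the autoencoder is omitted), bins the embedding on a regular grid, flags cells whose counts exceed the mean by `k` standard deviations as outliers, and counts connected groups of flagged cells. The function name, the grid size, the threshold rule and the toy data are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def estimate_num_clusters(points, bins=20, k=3.0):
    """Estimate a cluster count from 2-D embedded points via grid density outliers."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def cell(value, lo, hi):
        # Map a coordinate to a grid index in [0, bins - 1].
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = Counter((cell(x, x_lo, x_hi), cell(y, y_lo, y_hi)) for x, y in points)
    # Include empty cells: most of the embedded space is expected to be empty.
    all_counts = [counts.get((i, j), 0) for i in range(bins) for j in range(bins)]
    threshold = mean(all_counts) + k * pstdev(all_counts)
    dense = {c for c, n in counts.items() if n > threshold}

    # Count connected components of dense cells (8-neighbourhood).
    seen, clusters = set(), 0
    for start in dense:
        if start in seen:
            continue
        clusters += 1
        seen.add(start)
        stack = [start]
        while stack:
            i, j = stack.pop()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    neighbour = (i + di, j + dj)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
    return clusters


# Toy embedding: three tight, well-separated groups of 25 points each.
centres = [(0.0, 0.0), (4.2, 4.2), (10.0, 0.0)]
toy_points = [(cx + 0.01 * (i % 5), cy + 0.01 * (i // 5))
              for cx, cy in centres for i in range(25)]
n_clusters = estimate_num_clusters(toy_points)
```

Because the estimate needs only a single pass over the points plus a scan of the grid, a sketch like this avoids re-running a clustering algorithm for every candidate cluster number, which is the kind of cost the sum-of-squared-distances approach incurs.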