Areas of specialism

Systems Biology; Bioinformatics; Bayesian statistics; Statistics of networks


Research interests




There is an up-to-date publication list on my personal website.

Justina Zurauskiene, Paul Kirk, Tom Thorne, John Pinney, Michael Stumpf (2014) Derivative processes for modelling metabolic fluxes, In: Bioinformatics 30(13), pp. 1892-1898. Oxford University Press

Motivation: One of the challenging questions in modelling biological systems is to characterize the functional forms of the processes that control and orchestrate molecular and cellular phenotypes. Recently proposed methods for the analysis of metabolic pathways, for example dynamic flux estimation, can only provide estimates of the underlying fluxes at discrete time points and fail to capture the complete temporal behaviour. To describe the dynamic variation of the fluxes, we additionally require the assumption of specific functional forms that can capture the temporal behaviour. It also remains unclear how to address the noise that may be present in experimentally measured metabolite concentrations. Results: Here we propose a novel approach to modelling metabolic fluxes: derivative processes based on multiple-output Gaussian processes (MGPs), a flexible non-parametric Bayesian modelling technique. The main advantages of the MGP approach include the natural non-parametric representation of the fluxes and the ability to impute missing data between measurements. Our derivative process approach allows us to model changes in metabolite derivative concentrations and to characterize the temporal behaviour of metabolic fluxes from time course data. Because the derivative of a Gaussian process is itself a Gaussian process, we can readily link metabolite concentrations to metabolic fluxes and vice versa. Here we discuss how this can be implemented in an MGP framework and illustrate its application to simple models, including nitrogen metabolism in Escherichia coli.
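
The identity "the derivative of a Gaussian process is itself a Gaussian process" rests on differentiation being linear: f and its derivative f' are jointly Gaussian, and their cross-covariances are obtained by differentiating the kernel. The minimal sketch below assumes a single-output squared-exponential kernel with hypothetical hyperparameters `sigma` (amplitude) and `ell` (length-scale); it shows only this kernel algebra, checked against finite differences, and is not the paper's multiple-output implementation.

```python
import math


# Squared-exponential kernel k(t, s) = sigma^2 * exp(-(t - s)^2 / (2 ell^2)).
# The kernel choice and default hyperparameters are illustrative assumptions.
def k(t, s, sigma=1.0, ell=1.0):
    r = t - s
    return sigma**2 * math.exp(-r * r / (2 * ell**2))


def k_f_df(t, s, sigma=1.0, ell=1.0):
    """Cov(f(t), f'(s)): derivative of k with respect to its second argument."""
    r = t - s
    return k(t, s, sigma, ell) * r / ell**2


def k_df_df(t, s, sigma=1.0, ell=1.0):
    """Cov(f'(t), f'(s)): mixed second derivative of k."""
    r = t - s
    return k(t, s, sigma, ell) * (1 / ell**2 - r * r / ell**4)


if __name__ == "__main__":
    # Central finite differences as an independent check of the formulas.
    h, t, s = 1e-5, 0.3, 1.1
    fd1 = (k(t, s + h) - k(t, s - h)) / (2 * h)
    fd2 = (k(t + h, s + h) - k(t + h, s - h)
           - k(t - h, s + h) + k(t - h, s - h)) / (4 * h * h)
    print(abs(fd1 - k_f_df(t, s)), abs(fd2 - k_df_df(t, s)))
```

Stacking k, k_f_df and k_df_df into one joint covariance matrix is what lets observations of concentrations inform the derivative (flux) process, and vice versa.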

Justina Zurauskiene, Paul Kirk, Tom Thorne, Michael P.H Stumpf (2014) Bayesian non-parametric approaches to reconstructing oscillatory systems and the Nyquist limit, In: Physica A: Statistical Mechanics and its Applications 407, pp. 33-42. Elsevier

Reconstructing continuous signals from discrete time points is a challenging inverse problem encountered in many scientific and engineering applications. For oscillatory signals, classical results due to Nyquist set the limit below which it becomes impossible to reliably reconstruct the oscillation dynamics. Here we revisit this problem for vector-valued outputs and apply Bayesian non-parametric approaches in order to solve the function estimation problem. The main aim of the current paper is to map out how correlations among different outputs can be used to reconstruct signals at a sampling rate that lies below the Nyquist rate. We show that it is possible to use multiple-output Gaussian processes to capture dependences between outputs that facilitate the reconstruction of signals in situations where conventional Gaussian processes (i.e. those aimed at describing scalar signals) fail, and we delineate the phase and frequency dependence of the reliability of this type of approach. In addition to simple toy models, we also consider the dynamics of the tumour suppressor gene p53, which exhibits oscillations under physiological conditions, and which can be reconstructed more reliably in our new framework.
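
Why a scalar reconstruction breaks down below the Nyquist rate can be seen in a few lines: once the sampling rate fs is below twice the signal frequency, a higher-frequency cosine yields exactly the same samples as a lower-frequency alias. The plain-Python sketch below uses an arbitrary 10 Hz sampling rate with 7 Hz and 3 Hz cosines as illustrative choices; it demonstrates only the aliasing effect, not the multiple-output Gaussian process reconstruction described in the paper.

```python
import math


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0, ..., n_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


fs = 10.0                           # sampling rate; Nyquist frequency is fs / 2 = 5 Hz
above = sample_cosine(7.0, fs, 50)  # 7 Hz lies above the Nyquist frequency
alias = sample_cosine(3.0, fs, 50)  # its alias at |7 - 10| = 3 Hz
other = sample_cosine(4.0, fs, 50)  # an unrelated frequency below Nyquist

# Near zero (floating-point rounding only): the two signals are indistinguishable.
print(max(abs(a - b) for a, b in zip(above, alias)))
# Clearly non-zero: a non-aliased frequency gives different samples.
print(max(abs(a - b) for a, b in zip(above, other)))
```

A scalar model sees only these samples, so it cannot decide between 7 Hz and 3 Hz; correlated outputs sampled at other phases are what supply the missing information.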

Ann C. Babtie, Michael P.H Stumpf, Tom Thorne (2019) Gene Regulatory Network Inference, In: E. Voit (ed.), Elsevier Reference Module in Biomedical Sciences. Elsevier

Transcriptomic data quantifying gene expression states for single cells or cell populations at a genomic level is now readily available. Changes in transcriptional state during cell development and function are governed by gene regulatory networks, comprising a collection of genes and regulatory interactions between these genes (or gene products). Network inference algorithms aim to infer functional interactions between genes from experimentally observed expression profiles, and identify the structure of the underlying regulatory networks. Here we describe popular classes of network inference algorithms, highlighting their respective strengths and weaknesses, along with some general challenges faced by these methods. Analyzing inferred network structures can provide insight into the genes, transcriptional changes, and regulatory interactions that play key roles in biological and disease-related processes of interest.

Paul Kirk, Tom Thorne, Michael P.H Stumpf (2013) Model selection in systems and synthetic biology, In: Current Opinion in Biotechnology 24(4). Elsevier

Developing mechanistic models has become an integral aspect of systems biology, as has the need to differentiate between alternative models. Parameterizing mathematical models has been widely perceived as a formidable challenge, which has spurred the development of statistical and optimisation routines for parameter inference. But the focus is now increasingly shifting to problems that require us to choose from among a set of different models to determine which one offers the best description of a given biological system. Here we provide an overview of recent developments in the area of model selection. We focus on approaches that are both practical and built on solid statistical principles, and outline the conceptual foundations and the scope for application of such methods in systems biology.

Background: Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters. Results: The methodology is benchmarked on synthetic data designed to replicate the distribution of real-world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure. Conclusions: Our method improves performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. Applying approximate inference techniques, we can learn network structures quickly with only moderate computing resources.

Huizhi Liang, Umarani Ganeshbabu, Tom Thorne (2020) A Dynamic Bayesian Network Approach for Analysing Topic-Sentiment Evolution, In: IEEE Xplore 8. IEEE

Sentiment analysis is one of the key tasks of natural language understanding. Sentiment evolution models the dynamics of sentiment orientation over time, and can give a deeper understanding of the opinions and sentiment implied in user-generated content. Existing work mainly focuses on sentiment classification, while the question of how the sentiment orientation of a topic is influenced by other topics, or of how topics interact dynamically in terms of sentiment, has been ignored. In this paper, we propose to construct a Gaussian Process Dynamic Bayesian Network to model the dynamics and interactions of the sentiment of topics on social media such as Twitter. We use Dynamic Bayesian Networks to model time series of the sentiment of related topics and learn relationships between them. The network model itself applies Gaussian Process Regression to model the sentiment at a given time point based on related topics at previous time points. We conducted experiments on a real-world dataset of 9.72 million tweets crawled from Twitter. The experiments include a case study analysing the sentiment dynamics of topics related to Brexit.

Tom Thorne (2015) Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data, In: Statistical Applications in Genetics and Molecular Biology 14(6), pp. 575-583. De Gruyter

The availability of large quantities of transcriptomic data in the form of RNA-seq count data has necessitated the development of methods to identify genes differentially expressed between experimental conditions. Many existing approaches apply a parametric model of gene expression and so place strong assumptions on the distribution of the data. Here we explore an alternative nonparametric approach that applies an empirical likelihood framework, allowing us to define likelihoods without specifying a parametric model of the data. We demonstrate the performance of our method when applied to gold standard datasets and to existing experimental data. Our approach outperforms or closely matches the performance of existing methods in the literature, and requires modest computational resources. An R package, EmpDiff, implementing the methods described in the paper is available from: http://homepages.inf.ed.ac.uk/tthorne/software/packages/EmpDiff_0.99.tar.gz.
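
For readers unfamiliar with empirical likelihood, the simplest case is Owen's test for a mean: weights on the observed points are chosen to maximise their product subject to summing to one and giving a weighted mean equal to a hypothesised value mu, and -2 log R(mu) is asymptotically chi-squared with one degree of freedom. The plain-Python sketch below implements only this textbook one-sample case (the function name `el_log_ratio` is hypothetical); it is not the EmpDiff test for differential expression, whose statistic is defined in the paper.

```python
import math


def el_log_ratio(x, mu, iters=200):
    """-2 log R(mu) for Owen's empirical likelihood ratio of a mean."""
    z = [xi - mu for xi in x]
    zmin, zmax = min(z), max(z)
    if not zmin < 0 < zmax:
        return math.inf  # mu outside the convex hull of the data: R(mu) = 0
    # Weights w_i = 1 / (n (1 + lam z_i)) must be positive, so lam lies in
    # (-1/zmax, -1/zmin); shrink the interval slightly to stay inside it.
    lo = (-1.0 / zmax) * (1 - 1e-12)
    hi = (-1.0 / zmin) * (1 - 1e-12)

    def score(lam):
        # A zero of this function enforces sum_i w_i (x_i - mu) = 0.
        return sum(zi / (1 + lam * zi) for zi in z)

    for _ in range(iters):  # score is strictly decreasing in lam: bisection
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2 * sum(math.log1p(lam * zi) for zi in z)


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]
    for mu in (3.0, 3.5, 4.0):
        print(mu, el_log_ratio(data, mu))
```

Values above about 3.84 (the 95% quantile of the chi-squared distribution with one degree of freedom) would reject mu at roughly the 5% level, asymptotically. No parametric form for the data is assumed at any point, which is the property the abstract emphasises.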

A.R Gafson, Tom Thorne, C.I.J McKechnie, B. Jimenez, R. Nicholas, P.M Matthews (2018) Lipoprotein markers associated with disability from multiple sclerosis, In: Scientific Reports 8, 17026. Nature Research

Altered lipid metabolism is a feature of chronic inflammatory disorders. Increased plasma lipids and lipoproteins have been associated with multiple sclerosis (MS) disease activity. Our objective was to characterise the specific lipids and associated plasma lipoproteins increased in MS and to test for an association with disability. Plasma samples were collected from 27 patients with relapsing-remitting MS (RRMS) (median Expanded Disability Status Scale (EDSS) score 1.5, range 1–7) and 31 healthy controls. Concentrations of lipids within lipoprotein sub-classes were determined from NMR spectra. Plasma cytokines were measured using the MesoScale Discovery V-PLEX kit. Associations were tested using multivariate linear regression. Differences between the patient and volunteer groups were found for lipids within VLDL and HDL lipoprotein sub-fractions (p < 0.05). Multivariate regression demonstrated a high correlation between lipids within VLDL sub-classes and the EDSS (p < 0.05). An optimal model for EDSS included free cholesterol carried by VLDL-2, gender and age (R² = 0.38, p < 0.05). Free cholesterol carried by VLDL-2 was highly correlated with plasma cytokines CCL-17 and IL-7 (R² = 0.78, p < 0.0001). These results highlight relationships between disability, inflammatory responses and systemic lipid metabolism in RRMS. Altered lipid metabolism with systemic inflammation may contribute to immune activation

Arie R. Gafson, Constantinos Savva, Tom Thorne, Mark David, Maria Gomez-Romero, Matthew R. Lewis, Richard Nicholas, Amanda Heslegrave, Henrik Zetterberg, Paul M. Matthews (2019) Breaking the cycle: Reversal of flux in the tricarboxylic acid cycle by dimethyl fumarate, In: Neurology, Neuroimmunology and Neuroinflammation 6(3). Lippincott, Williams & Wilkins

Objective: To infer molecular effectors of therapeutic effects and adverse events for dimethyl fumarate (DMF) in patients with relapsing-remitting MS (RRMS) using untargeted plasma metabolomics. Methods: Plasma from 27 patients with RRMS was collected at baseline and 6 weeks after initiating DMF. Patients were separated into discovery (n = 15) and validation cohorts (n = 12). Ten healthy controls were also recruited. Metabolomic profiling using ultra-high-performance liquid chromatography mass spectrometry (UPLC-MS) was performed on the discovery cohort and healthy controls at Metabolon Inc (Durham, NC). UPLC-MS was performed on the validation cohort at the National Phenome Centre (London, UK). Plasma neurofilament concentration (pNfL) was assayed using the Simoa platform (Quanterix, Lexington, MA). Time course and cross-sectional analyses were performed to identify pharmacodynamic changes in the metabolome secondary to DMF and relate these to adverse events. Results: In the discovery cohort, tricarboxylic acid (TCA) cycle intermediates fumarate and succinate, and TCA cycle metabolites succinyl-carnitine and methyl succinyl-carnitine increased 6 weeks following treatment (q < 0.05). Methyl succinyl-carnitine increased in the validation cohort (q < 0.05). These changes were not observed in the control population. Increased succinyl-carnitine and methyl succinyl-carnitine were associated with adverse events from DMF (flushing and abdominal symptoms). pNfL concentration was higher in patients with RRMS than in controls and reduced over 15 months of treatment. Conclusion: TCA cycle intermediates and metabolites are increased in patients with RRMS treated with DMF. The results suggest reversal of flux through the succinate dehydrogenase complex. The contribution of succinyl-carnitine ester agonism at hydroxycarboxylic acid receptor 2 to both therapeutic effects and adverse events requires investigation.

Tom Thorne (2016) NetDiff – Bayesian model selection for differential gene regulatory network inference, In: Scientific Reports 6, 39224. Nature Research

Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically, we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, which enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real-world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
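
In a Gaussian graphical model, an edge between two genes corresponds to a non-zero entry of the precision (inverse covariance) matrix Omega, equivalently a non-zero partial correlation -Omega_ij / sqrt(Omega_ii Omega_jj). The plain-Python sketch below uses a hypothetical three-gene chain to show that two genes can be marginally correlated yet have zero partial correlation, i.e. no edge. It illustrates only the network representation, not NetDiff's variational inference or model selection.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def partial_correlations(cov):
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), with Omega = cov^-1."""
    omega = invert(cov)
    n = len(omega)
    return [[1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Hypothetical three-gene chain 0 - 1 - 2: no direct edge between genes 0 and 2,
# encoded as a zero in the (0, 2) entry of the precision matrix.
omega_true = [[2.0, -0.8, 0.0],
              [-0.8, 2.0, -0.8],
              [0.0, -0.8, 2.0]]
cov = invert(omega_true)
pc = partial_correlations(cov)

print(f"marginal covariance(0, 2) = {cov[0][2]:.3f}")
print(f"partial correlation(0, 2) = {pc[0][2]:.3f}")
```

A differential network analysis then asks whether such zero patterns in the precision matrix are conserved or differ between conditions.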

Tom Wilson, Hoang Thuy Duong Vo, Thomas William Thorne (2023) Identifying sub-populations of cells in single cell transcriptomic data – a Bayesian mixture modelling approach to zero-inflation of counts, In: Journal of Computational Biology. Mary Ann Liebert

In the study of single cell RNA-seq data, a key component of the analysis is to identify sub-populations of cells in the data. A variety of approaches to this have been considered, and although many machine learning based methods have been developed, these rarely give an estimate of uncertainty in the cluster assignment. To allow for this, probabilistic models have been developed, but single cell RNA-seq data exhibit a phenomenon known as dropout, whereby a large proportion of the observed read counts are zero. This poses challenges in developing probabilistic models that appropriately model the data. We develop a novel Dirichlet process mixture model which employs both a mixture at the cell level to model multiple populations of cells, and a zero-inflated negative binomial mixture of counts at the transcript level. By taking a Bayesian approach we are able to model the expression of genes within clusters, and to quantify uncertainty in cluster assignments. It is shown that this approach outperforms previous approaches that applied multinomial distributions to model single cell RNA-seq counts and negative binomial models that do not take into account zero-inflation. Applied to a publicly available data set of single cell RNA-seq counts of multiple cell types from the mouse cortex and hippocampus, we demonstrate how our approach can be used to distinguish sub-populations of cells as clusters in the data, and to identify gene sets that are indicative of membership of a sub-population. The methodology is implemented as an open source Snakemake pipeline available from https://github.com/tt104/scmixture.
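
The zero-inflated negative binomial observation model adds a point mass at zero to a negative binomial: with probability pi a count is a structural zero (dropout), otherwise it is drawn from NB(mu, r). The minimal sketch below gives only that probability mass function, assuming a mean/size parameterisation (mu, r); the Dirichlet process mixture over cells and the inference scheme used in the paper are not shown.

```python
import math


def nb_logpmf(k, mu, r):
    """Log-pmf of a negative binomial with mean mu and size (inverse dispersion) r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: a structural zero with probability pi, else NB(mu, r)."""
    nb = math.exp(nb_logpmf(k, mu, r))
    return pi + (1 - pi) * nb if k == 0 else (1 - pi) * nb


if __name__ == "__main__":
    mu, r, pi = 5.0, 2.0, 0.3  # illustrative values, not fitted to any data
    print("P(0) without inflation:", math.exp(nb_logpmf(0, mu, r)))
    print("P(0) with inflation:   ", zinb_pmf(0, mu, r, pi))
```

Under this model the expected count is (1 - pi) * mu, so excess zeros are absorbed by pi rather than by shrinking mu, which is why ignoring zero-inflation can distort expression estimates within clusters.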

Thomas Thorne, Paul D W Kirk, Heather A Harrington (2022) Topological Approximate Bayesian Computation for Parameter Inference of an Angiogenesis Model, In: Bioinformatics. Oxford University Press

Motivation: Inferring the parameters of models describing biological systems is an important problem in the reverse engineering of the mechanisms underlying these systems. Much work has focused on parameter inference of stochastic and ordinary differential equation models using Approximate Bayesian Computation (ABC). While there is some recent work on inference in spatial models, this remains an open problem. Simultaneously, advances in topological data analysis (TDA), a field of computational mathematics, have enabled spatial patterns in data to be characterized. Results: Here, we focus on recent work using TDA to study different regimes of parameter space for a well-studied model of angiogenesis. We propose a method for combining TDA with ABC to infer parameters in the Anderson–Chaplain model of angiogenesis. We demonstrate that this topological approach outperforms ABC approaches that use simpler statistics based on spatial features of the data. This is a first step toward a general framework of spatial parameter inference for biological systems, for which there may be a variety of filtrations, vectorizations and summary statistics to be considered. Availability and implementation: All code used to produce our results is available as a Snakemake workflow from github.com/tt104/tabc_angio.
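
The ABC component can be illustrated with the basic rejection sampler: draw parameters from the prior, simulate data, and keep draws whose summary statistic lies within a tolerance epsilon of the observed one. In the sketch below the exponential toy simulator, the sample-mean summary, the uniform prior and all settings are hypothetical stand-ins; the paper instead simulates the Anderson–Chaplain angiogenesis model and summarises its spatial patterns topologically. This is plain rejection ABC and not necessarily the ABC variant used in the paper.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical toy simulator: n exponential waiting times with rate theta."""
    return [rng.expovariate(theta) for _ in range(n)]


def summary(data):
    """Stand-in summary statistic (the paper uses topological summaries)."""
    return statistics.mean(data)


def abc_rejection(observed, prior_sample, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary is within epsilon of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) < epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate(2.0, 100, rng)  # synthetic "data" with true rate 2.0
posterior = abc_rejection(observed, lambda g: g.uniform(0.1, 5.0), 5000, 0.05, rng)
print(len(posterior), statistics.mean(posterior))
```

The quality of the approximate posterior depends heavily on how informative the summary statistic is, which is the motivation for replacing simple spatial statistics with topological ones.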

R Perryman, A Renziehausen, H Shaye, A D Kostagianni, A D Tsiailanis, Thomas William Thorne, M V Chatziathanasiadou, G B Sivolapenko, M A El Mubarak, G Won Han, B Zarzycka, V Katritch, G Lebon, C L Nigro, L Lattanzio, Stephen Morse, JK Choi, K O’Neill, Z Kanaki, T Crook, A Klinakis, V Cherezov, A Tzakos, N Syed (2022) Inhibition of the angiotensin II type 2 receptor AT2R is a novel therapeutic strategy for glioblastoma, In: Proceedings of the National Academy of Sciences of the United States of America. National Academy of Sciences

Glioblastoma (GBM) is an aggressive malignant primary brain tumor with limited therapeutic options. We show that the angiotensin II (AngII) type 2 receptor (AT2R) is a novel therapeutic target for GBM and that AngII, endogenously produced in GBM cells, promotes proliferation through AT2R. We repurposed EMA401, an AT2R antagonist originally developed as a peripherally restricted analgesic, for GBM and showed that it inhibits the proliferation of AT2R-expressing GBM spheroids and blocks their invasiveness and angiogenic capacity. The crystal structure of AT2R bound to EMA401 was determined and revealed the receptor to be in an active-like conformation with helix-VIII blocking G protein or β-arrestin recruitment. The architecture and interactions of EMA401 in AT2R differ drastically from complexes of AT2R with other relevant compounds. To enhance central nervous system (CNS) penetration of EMA401, we exploited the crystal structure to design an angiopep-2 tethered EMA401 derivative, A3E. A3E exhibited enhanced CNS penetration, leading to reduced tumor volume, inhibition of proliferation and increased levels of apoptosis in an orthotopic xenograft model of GBM.