Areas of specialism

Systems Biology; Bioinformatics; Bayesian statistics; Statistics of networks


Research interests

My teaching

My publications


There is an up-to-date publication list on my personal website.


Liang Huizhi, Ganeshbabu Umarani, Thorne Tom (2020) A Dynamic Bayesian Network Approach for Analysing Topic-Sentiment Evolution, IEEE Xplore 8 IEEE
Sentiment analysis is one of the key tasks of natural language understanding. Sentiment
Evolution models the dynamics of sentiment orientation over time. It can help people have a more profound
and deep understanding of opinion and sentiment implied in user generated content. Existing work mainly
focuses on sentiment classification, while the analysis of how the sentiment orientation of a topic has been
influenced by other topics or the dynamic interaction of topics from the aspect of sentiment has been ignored.
In this paper, we propose to construct a Gaussian Process Dynamic Bayesian Network to model the dynamics
and interactions of the sentiment of topics on social media such as Twitter. We use Dynamic Bayesian
Networks to model time series of the sentiment of related topics and learn relationships between them.
The network model itself applies Gaussian Process Regression to model the sentiment at a given time point
based on related topics at previous time. We conducted experiments on a real world dataset that was crawled
from Twitter with 9.72 million tweets. The experiment demonstrates a case study of analysing the sentiment
dynamics of topics related to the event Brexit.
Gafson Arie R., Savva Constantinos, Thorne Tom, David Mark, Gomez-Romero Maria, Lewis Matthew R., Nicholas Richard, Heslegrave Amanda, Zetterberg Henrik, Matthews Paul M. (2019) Breaking the cycle: Reversal of flux in the tricarboxylic acid cycle by dimethyl fumarate, Neurology, Neuroimmunology and Neuroinflammation 6 (3) Lippincott, Williams & Wilkins
To infer molecular effectors of therapeutic effects and adverse events for dimethyl fumarate
(DMF) in patients with relapsing-remitting MS (RRMS) using untargeted plasma metabolomics.

Plasma from 27 patients with RRMS was collected at baseline and 6 weeks after initiating DMF.
Patients were separated into discovery (n = 15) and validation cohorts (n = 12). Ten healthy
controls were also recruited. Metabolomic profiling using ultra-high-performance liquid
chromatography mass spectrometry (UPLC-MS) was performed on the discovery cohort and
healthy controls at Metabolon Inc (Durham, NC). UPLC-MS was performed on the validation
cohort at the National Phenome Centre (London, UK). Plasma neurofilament concentration
(pNfL) was assayed using the Simoa platform (Quanterix, Lexington, MA). Time course and
cross-sectional analyses were performed to identify pharmacodynamic changes in the metabolome
secondary to DMF and relate these to adverse events.

In the discovery cohort, tricarboxylic acid (TCA) cycle intermediates fumarate and succinate,
and TCA cycle metabolites succinyl-carnitine and methyl succinyl-carnitine increased 6 weeks
following treatment (q and methyl succinyl-carnitine were associated with adverse events from DMF
(flushing and abdominal symptoms). pNfL concentration was higher in patients with RRMS
than in controls and reduced over 15 months of treatment.

TCA cycle intermediates and metabolites are increased in patients with RRMS treated with
DMF. The results suggest reversal of flux through the succinate dehydrogenase complex. The
contribution of succinyl-carnitine ester agonism at hydroxycarboxylic acid receptor 2 to both
therapeutic effects and adverse events requires investigation.

Gafson A.R, Thorne Tom, McKechnie C.I.J, Jimenez B., Nicholas R., Matthews P.M (2018) Lipoprotein markers associated with disability from multiple sclerosis, Scientific Reports 8 17026 Nature Research
Altered lipid metabolism is a feature of chronic inflammatory disorders. Increased plasma lipids and lipoproteins have been associated with multiple sclerosis (MS) disease activity. Our objective was to characterise the specific lipids and associated plasma lipoproteins increased in MS and to test for an association with disability. Plasma samples were collected from 27 RRMS patients (median EDSS, 1.5, range 1–7) and 31 healthy controls. Concentrations of lipids within lipoprotein sub-classes were determined from NMR spectra. Plasma cytokines were measured using the MesoScale Discovery V-PLEX kit. Associations were tested using multivariate linear regression. Differences between the patient and volunteer groups were found for lipids within VLDL and HDL lipoprotein sub-fractions (p
Thorne Tom (2015) Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data, Statistical Applications in Genetics and Molecular Biology 14 (6) pp. 575-583 De Gruyter
The availability of large quantities of transcriptomic data in the form of RNA-seq count data has necessitated the development of methods to identify genes differentially expressed between experimental conditions. Many existing approaches apply a parametric model of gene expression and so place strong assumptions on the distribution of the data. Here we explore an alternate nonparametric approach that applies an empirical likelihood framework, allowing us to define likelihoods without specifying a parametric model of the data. We demonstrate the performance of our method when applied to gold standard datasets, and to existing experimental data. Our approach outperforms or closely matches performance of existing methods in the literature, and requires modest computational resources. An R package, EmpDiff, implementing the methods described in the paper is available from:….
Kirk Paul, Thorne Tom, Stumpf Michael P.H (2013) Model selection in systems and synthetic biology, Current Opinion in Biotechnology 24 (4) Oxford University Press
Developing mechanistic models has become an integral aspect of systems biology, as has the need to differentiate between alternative models. Parameterizing mathematical models has been widely perceived as a formidable challenge, which has spurred the development of statistical and optimisation routines for parameter inference. But now focus is increasingly shifting to problems that require us to choose from among a set of different models to determine which one offers the best description of a given biological system. We will here provide an overview of recent developments in the area of model selection. We will focus on approaches that are both practical as well as build on solid statistical principles and outline the conceptual foundations and the scope for application of such methods in systems biology.
Babtie Ann C., Stumpf Michael P.H, Thorne Tom (2019) Gene Regulatory Network Inference, In: Voit E. (eds.), Elsevier Reference Module in Biomedical Sciences Elsevier
Transcriptomic data quantifying gene expression states for single cells or cell populations at a genomic level is now readily available. Changes in transcriptional state during cell development and function are governed by gene regulatory networks, comprising a collection of genes and regulatory interactions between these genes (or gene products). Network inference algorithms aim to infer functional interactions between genes from experimentally observed expression profiles, and identify the structure of the underlying regulatory networks. Here we describe popular classes of network inference algorithms, highlighting their respective strengths and weaknesses, along with some general challenges faced by these methods. Analyzing inferred network structures can provide insight into the genes, transcriptional changes, and regulatory interactions that play key roles in biological and disease-related processes of interest.
Thorne Tom (2016) NetDiff – Bayesian model selection for differential gene regulatory network inference, Scientific Reports 6 39224 Nature Research
Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, that enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters.

The methodology is benchmarked on synthetic data designed to replicate the distribution of real world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure.

Our method is able to improve performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. Applying approximate inference techniques we can learn network structures quickly with only moderate computing resources.

Zurauskiene Justina, Kirk Paul, Thorne Tom, Stumpf Michael P.H (2014) Bayesian non-parametric approaches to reconstructing oscillatory systems and the Nyquist limit, Physica A: Statistical Mechanics and its Applications 407 pp. 33-42 Elsevier
Reconstructing continuous signals from discrete time-points is a challenging inverse problem encountered in many scientific and engineering applications. For oscillatory signals classical results due to Nyquist set the limit below which it becomes impossible to reliably reconstruct the oscillation dynamics. Here we revisit this problem for vector-valued outputs and apply Bayesian non-parametric approaches in order to solve the function estimation problem. The main aim of the current paper is to map how we can make use of correlations among different outputs to reconstruct signals at a sampling rate that lies below the Nyquist rate. We show that it is possible to use multiple-output Gaussian processes to capture dependences between outputs which facilitate reconstruction of signals in situations where conventional Gaussian processes (i.e. those aimed at describing scalar signals) fail, and we delineate the phase and frequency dependence of the reliability of this type of approach. In addition to simple toy-models we also consider the dynamics of the tumour suppressor gene p53, which exhibits oscillations under physiological conditions, and which can be reconstructed more reliably in our new framework.
Zurauskiene Justina, Kirk Paul, Thorne Tom, Pinney John, Stumpf Michael (2014) Derivative processes for modelling metabolic fluxes, Bioinformatics 30 (13) pp. 1892-1898 Oxford University Press
Motivation: One of the challenging questions in modelling biological systems is to characterize the functional forms of the processes that control and orchestrate molecular and cellular phenotypes. Recently proposed methods for the analysis of metabolic pathways, for example, dynamic flux estimation, can only provide estimates of the underlying fluxes at discrete time points but fail to capture the complete temporal behaviour. To describe the dynamic variation of the fluxes, we additionally require the assumption of specific functional forms that can capture the temporal behaviour. However, it also remains unclear how to address the noise which might be present in experimentally measured metabolite concentrations.

Results: Here we propose a novel approach to modelling metabolic fluxes: derivative processes that are based on multiple-output Gaussian processes (MGPs), which are a flexible non-parametric Bayesian modelling technique. The main advantages that follow from the MGP approach include the natural non-parametric representation of the fluxes and the ability to impute the missing data in between the measurements. Our derivative process approach allows us to model changes in metabolite derivative concentrations and to characterize the temporal behaviour of metabolic fluxes from time course data. Because the derivative of a Gaussian process is itself a Gaussian process, we can readily link metabolite concentrations to metabolic fluxes and vice versa. Here we discuss how this can be implemented in an MGP framework and illustrate its application to simple models, including nitrogen metabolism in Escherichia coli.