I joined Computer Science at the University of Surrey in June 2020 as a Senior Lecturer, having previously been a Lecturer at the University of Reading. Prior to that I was a Safra Research Fellow in the Division of Brain Sciences at Imperial College London and a Chancellor's Fellow in the School of Informatics at the University of Edinburgh. I originally studied Computer Science at King's College, Cambridge, before taking an MSc and PhD in Bioinformatics at Imperial College London.
My research applies statistical methods to analysing large biological data sets, especially in learning networks and studying their structure. You can read more about my current research interests under the research section.
Areas of specialism
I am open to applications from funded PhD students with interests in statistical methods in Systems Biology.
My research focusses on statistical methods in Systems Biology and Bioinformatics, including the inference of biological networks, and the statistical analysis of networks. I am also interested in the use of approximate Bayesian methods for complex models and large data sets, and the acceleration of statistical methods with GPUs.
Recent research has involved developing tools for the analysis of single-cell transcriptomic data, which provide information about the heterogeneity of gene expression across a population of cells. Another area of interest is comparing data sets between different conditions to learn differential networks that jointly infer network structures across data sets but also allow differences to be identified.
I am currently leading the teaching team of COMM054 Data Science Principles and Practices on the MSc Data Science at the University of Surrey. This module introduces the fundamental concepts in probability and statistics that provide a solid background for students studying data science.
Evolution models the dynamics of sentiment orientation over time. It can help people gain a deeper
understanding of the opinion and sentiment implied in user-generated content. Existing work mainly
focuses on sentiment classification, while the analysis of how the sentiment orientation of a topic is
influenced by other topics, or of the dynamic interaction of topics from the aspect of sentiment, has been ignored.
In this paper, we propose to construct a Gaussian Process Dynamic Bayesian Network to model the dynamics
and interactions of the sentiment of topics on social media such as Twitter. We use Dynamic Bayesian
Networks to model time series of the sentiment of related topics and learn relationships between them.
The network model itself applies Gaussian Process Regression to model the sentiment at a given time point
based on related topics at the previous time point. We conducted experiments on a real-world dataset that was crawled
from Twitter with 9.72 million tweets. The experiment demonstrates a case study of analysing the sentiment
dynamics of topics related to the event Brexit.
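The idea of regressing a topic's sentiment on related topics at the previous time point can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: the squared-exponential kernel, the fixed hyperparameters, the helper names (`rbf_kernel`, `gp_predict`, `lagged_design`) and the synthetic three-topic series are all hypothetical, and the parent set is fixed by hand rather than learned.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=0.1):
    """Posterior mean and variance of a zero-mean GP regression."""
    K = rbf_kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var

def lagged_design(series, target, parents):
    """Pair parent sentiment at time t-1 with target sentiment at time t."""
    X = series[:-1, parents]
    y = series[1:, target]
    return X, y

# Hypothetical toy data: 60 time points for 3 topics, where topic 1
# depends non-linearly on topic 0 at the previous time point.
rng = np.random.default_rng(0)
T = 60
series = np.zeros((T, 3))
series[0] = rng.normal(size=3)
for t in range(1, T):
    series[t, 0] = 0.8 * series[t - 1, 0] + 0.1 * rng.normal()
    series[t, 1] = np.tanh(series[t - 1, 0]) + 0.1 * rng.normal()
    series[t, 2] = 0.1 * rng.normal()

# Fit on the first 40 transitions and predict the remaining ones.
X, y = lagged_design(series, target=1, parents=[0])
mean, var = gp_predict(X[:40], y[:40], X[40:])
```

In a network-learning setting one would compare candidate parent sets for each topic, for example by their marginal likelihood; that step is omitted here.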
To infer molecular effectors of therapeutic effects and adverse events for dimethyl fumarate
(DMF) in patients with relapsing-remitting MS (RRMS) using untargeted plasma metabolomics.
Plasma from 27 patients with RRMS was collected at baseline and 6 weeks after initiating DMF.
Patients were separated into discovery (n = 15) and validation cohorts (n = 12). Ten healthy
controls were also recruited. Metabolomic profiling using ultra-high-performance liquid
chromatography mass spectrometry (UPLC-MS) was performed on the discovery cohort and
healthy controls at Metabolon Inc (Durham, NC). UPLC-MS was performed on the validation
cohort at the National Phenome Centre (London, UK). Plasma neurofilament concentration
(pNfL) was assayed using the Simoa platform (Quanterix, Lexington, MA). Time course and
cross-sectional analyses were performed to identify pharmacodynamic changes in the metabolome
secondary to DMF and relate these to adverse events.
In the discovery cohort, tricarboxylic acid (TCA) cycle intermediates fumarate and succinate,
and TCA cycle metabolites succinyl-carnitine and methyl succinyl-carnitine increased 6 weeks
following treatment (q and methyl succinyl-carnitine were associated with adverse events from DMF
(flushing and abdominal symptoms). pNfL concentration was higher in patients with RRMS
than in controls and reduced over 15 months of treatment.
TCA cycle intermediates and metabolites are increased in patients with RRMS treated with
DMF. The results suggest reversal of flux through the succinate dehydrogenase complex. The
contribution of succinyl-carnitine ester agonism at hydroxycarboxylic acid receptor 2 to both
therapeutic effects and adverse events requires investigation.
Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters.
The methodology is benchmarked on synthetic data designed to replicate the distribution of real-world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure.
Our method is able to improve performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. Applying approximate inference techniques we can learn network structures quickly with only moderate computing resources.
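The generative side of such a model can be sketched in a few lines. This is a hedged illustration only, not the published method: it draws sparse interaction weights from a horseshoe prior, simulates count time series with a negative binomial observation model, and evaluates the negative binomial log-likelihood. No variational inference is performed. The log link, the `log1p` transform of the previous counts, the clipping of the log-mean and all function and parameter names are assumptions made for this example.

```python
import math
import numpy as np

# Element-wise log-gamma using only the standard library.
_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nb_log_pmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and dispersion r
    (variance mu + mu**2 / r)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return (
        _lgamma(y + r) - _lgamma(r) - _lgamma(y + 1.0)
        + r * np.log(r / (r + mu))
        + y * np.log(mu / (r + mu))
    )

def sample_horseshoe_weights(rng, shape, tau=0.1):
    """Draw weights w ~ N(0, (lambda * tau)^2) with lambda ~ half-Cauchy(0, 1)."""
    local_scale = np.abs(rng.standard_cauchy(shape))
    return rng.normal(size=shape) * local_scale * tau

def simulate_counts(rng, weights, baseline_log_mean, r, n_steps):
    """Simulate counts whose log-mean depends on log1p of the previous counts."""
    n_genes = weights.shape[0]
    counts = np.zeros((n_steps, n_genes), dtype=np.int64)
    mu0 = np.full(n_genes, np.exp(baseline_log_mean))
    counts[0] = rng.negative_binomial(r, r / (r + mu0))
    for t in range(1, n_steps):
        predictors = np.log1p(counts[t - 1]) - baseline_log_mean
        # Clipping is a numerical guard for this toy example only.
        log_mu = np.clip(baseline_log_mean + weights @ predictors, -5.0, 10.0)
        mu = np.exp(log_mu)
        counts[t] = rng.negative_binomial(r, r / (r + mu))
    return counts

rng = np.random.default_rng(42)
weights = sample_horseshoe_weights(rng, (4, 4))
counts = simulate_counts(
    rng, weights, baseline_log_mean=np.log(20.0), r=5.0, n_steps=30
)
log_lik = nb_log_pmf(counts[0], mu=20.0, r=5.0).sum()
```

The horseshoe draws are mostly close to zero with occasional large values, which is what makes the prior suitable for sparse networks in which each gene has few regulators.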
Results: Here we propose a novel approach to modelling metabolic fluxes: derivative processes based on multiple-output Gaussian processes (MGPs), a flexible non-parametric Bayesian modelling technique. The main advantages of the MGP approach include the natural non-parametric representation of the fluxes and the ability to impute missing data between measurements. Our derivative process approach allows us to model changes in metabolite derivative concentrations and to characterize the temporal behaviour of metabolic fluxes from time course data. Because the derivative of a Gaussian process is itself a Gaussian process, we can readily link metabolite concentrations to metabolic fluxes and vice versa. Here we discuss how this can be implemented in an MGP framework and illustrate its application to simple models, including nitrogen metabolism in Escherichia coli.
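The link between a Gaussian process and its derivative can be illustrated with a small single-output sketch. It is a simplification under stated assumptions rather than the MGP framework of the paper: it uses a squared-exponential kernel with fixed, hand-picked hyperparameters, a synthetic "concentration" curve sin(t), and hypothetical function names. The cross-covariance between the derivative at t and the function at t' is the partial derivative of the kernel with respect to t, so the posterior mean of the rate of change follows from the same linear solve as ordinary GP regression.

```python
import numpy as np

def rbf(t1, t2, length_scale, variance):
    """Squared-exponential kernel k(t1, t2) for 1-D inputs."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * diff**2 / length_scale**2)

def rbf_dt1(t1, t2, length_scale, variance):
    """Partial derivative of k with respect to t1, i.e. Cov(f'(t1), f(t2))."""
    diff = t1[:, None] - t2[None, :]
    return -diff / length_scale**2 * rbf(t1, t2, length_scale, variance)

def derivative_posterior_mean(t_obs, y_obs, t_new,
                              length_scale=1.0, variance=1.0, noise=0.02):
    """Posterior mean of f'(t_new) given noisy observations of f(t_obs)."""
    K = rbf(t_obs, t_obs, length_scale, variance) + noise**2 * np.eye(len(t_obs))
    alpha = np.linalg.solve(K, y_obs)
    return rbf_dt1(t_new, t_obs, length_scale, variance) @ alpha

# Synthetic time course: noisy samples of sin(t), whose true derivative is cos(t).
rng = np.random.default_rng(1)
t_obs = np.linspace(0.0, 2.0 * np.pi, 25)
y_obs = np.sin(t_obs) + 0.02 * rng.normal(size=t_obs.size)

t_new = np.linspace(1.0, 5.0, 9)
rate = derivative_posterior_mean(t_obs, y_obs, t_new)
```

Comparing `rate` with `np.cos(t_new)` gives a quick check that the estimated rate of change tracks the true derivative away from the ends of the time course.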