Samaneh Kouchaki

Dr Samaneh Kouchaki

Lecturer in Machine Learning for Healthcare
+44 (0)1483 689261
38 BA 01



PhD research positions

Please contact me if you are interested in a PhD in machine learning for healthcare applications.


Alexey Youssef, Samaneh Kouchaki, Farah Shamout, Jacob Armstrong, Rasheed El‐Bouri, Thomas Taylor, Drew Birrenkott, Baptiste Vasey, Andrew Soltan, Tingting Zhu, David A Clifton, David W Eyre (2021) Development and validation of early warning score systems for COVID‐19 patients, In: Health Technology Letters 8(5) pp. 105-117 Wiley

COVID‐19 is a major, urgent, and ongoing threat to global health. Globally more than 24 million have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important to guiding individual patient treatment and also to ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration defined as the need for advanced respiratory support (high‐flow nasal oxygen, continuous positive airways pressure, non‐invasive ventilation, intubation) within a prediction window of 24 h is evaluated. It is shown that these scores perform sub‐optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with high AUROC 94% and an accuracy, sensitivity, and specificity of 70%, 96%, 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes the prediction based on the current and baseline measures of routinely available vital signs and blood tests.

Yang Yang, Timothy M Walker, Samaneh Kouchaki, Chenyang Wang, Timothy E A Peto, Derrick W Crook, David A Clifton (2021) An end-to-end heterogeneous graph attention network for Mycobacterium tuberculosis drug-resistance prediction, In: Briefings in Bioinformatics 22(6) Oxford University Press

Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method, the heterogeneous graph attention network (‘HGAT–AMR’), to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic curve (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.

Martin Hunt, Brice Letcher, Giang Nguyen, Michael B. Hall, Rachel M. Colquhoun, Michael C. Schatz, Srividya Ramakrishnan, CRyPTIC consortium, Zamin Iqbal (2022) Minos: variant adjudication and joint genotyping of cohorts of bacterial genomes, In: Genome Biology 23, 147 BMC

There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables joint genotyping; we demonstrate on a large (N=13k) M. tuberculosis cohort, building a map of non-synonymous SNPs and indels in a region where all such variants are assumed to cause rifampicin resistance. We quantify the correlation with phenotypic resistance and then replicate in a second cohort (N=10k).

Nivedita Bijlani, Ramin Nilforooshan, Samaneh Kouchaki (2022) An Unsupervised Data-Driven Anomaly Detection Approach for Adverse Health Conditions in People Living With Dementia: Cohort Study, In: JMIR Aging 5(3) e38211

Background: Sensor-based remote health monitoring can be used for the timely detection of health deterioration in people living with dementia with minimal impact on their day-to-day living. Anomaly detection approaches have been widely applied in various domains, including remote health monitoring. However, current approaches are challenged by noisy, multivariate data and low generalizability.
Objective: This study aims to develop an online, lightweight unsupervised learning-based approach to detect anomalies representing adverse health conditions using activity changes in people living with dementia. We demonstrated its effectiveness over state-of-the-art methods on a real-world data set of 9363 days collected from 15 participant households by the UK Dementia Research Institute between August 2019 and July 2021. Our approach was applied to household movement data to detect urinary tract infections (UTIs) and hospitalizations.
Methods: We propose and evaluate a solution based on Contextual Matrix Profile (CMP), an exact, ultrafast distance-based anomaly detection algorithm. Using daily aggregated household movement data collected via passive infrared sensors, we generated CMPs for location-wise sensor counts, duration, and change in hourly movement patterns for each patient. We computed a normalized anomaly score in 2 ways: by combining univariate CMPs and by developing a multidimensional CMP. The performance of our method was evaluated relative to Angle-Based Outlier Detection, Copula-Based Outlier Detection, and Lightweight Online Detector of Anomalies. We used the multidimensional CMP to discover and present the important features associated with adverse health conditions in people living with dementia.
Results: The multidimensional CMP yielded, on average, 84.3% recall with 32.1 alerts, or a 5.1% alert rate, offering the best balance of recall and relative precision compared with Copula-Based and Angle-Based Outlier Detection and Lightweight Online Detector of Anomalies when evaluated for UTI and hospitalization. Midnight to 6 AM bathroom activity was shown to be the most important cross-patient digital biomarker of anomalies indicative of UTI, contributing approximately 30% to the anomaly score. We also demonstrated how CMP-based anomaly scoring can be used for a cross-patient view of anomaly patterns.
Conclusions: To the best of our knowledge, this is the first real-world study to adapt the CMP to continuous anomaly detection in a health care scenario. The CMP inherits the speed, accuracy, and simplicity of the Matrix Profile, providing configurability, the ability to denoise and detect patterns, and explainability to clinical practitioners. We addressed the need for anomaly scoring in multivariate time series health care data by developing the multidimensional CMP. With high sensitivity, a low alert rate, better overall performance than state-of-the-art methods, and the ability to discover digital biomarkers of anomalies, the CMP is a clinically meaningful unsupervised anomaly detection technique extensible to multimodal data for dementia and other health care scenarios.
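
The general idea of distance-based anomaly scoring on daily activity vectors can be illustrated with a minimal sketch. This is not the Contextual Matrix Profile itself and not the authors' implementation: it is a simplified nearest-neighbour distance score, and the function names, the toy data and the threshold are all hypothetical. Each day is scored by its distance to the most similar other day, and the alert rate is taken here to mean the fraction of days whose score exceeds a chosen threshold.

```python
import math


def nearest_neighbour_scores(days):
    """Score each day by its Euclidean distance to the most similar other day.

    A day that resembles no other day receives a high score.
    (Simplified stand-in for a distance-based anomaly score; hypothetical.)
    """
    scores = []
    for i, day in enumerate(days):
        best = math.inf
        for j, other in enumerate(days):
            if i != j:
                best = min(best, math.dist(day, other))
        scores.append(best)
    return scores


def alert_rate(scores, threshold):
    """Fraction of days whose score exceeds the threshold (assumed definition)."""
    alerts = sum(1 for s in scores if s > threshold)
    return alerts / len(scores)


# Toy activity counts per day; illustrative values only, not study data.
days = [
    [2, 0, 5, 3],
    [2, 1, 5, 3],
    [3, 0, 4, 3],
    [9, 8, 1, 0],  # a day unlike the others
]
scores = nearest_neighbour_scores(days)
most_anomalous_day = scores.index(max(scores))
```

Raising the threshold lowers the alert rate at the cost of recall, which is the trade-off the abstract reports on.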

Elaheh Kalantari, Samaneh Kouchaki, Christine Miaskowski, Kord M. Kober, Payam Barnaghi (2022) Network analysis to identify symptoms clusters and temporal interconnections in oncology patients, In: Scientific Reports 12(1) 17052 Nature Research

Oncology patients experience numerous co-occurring symptoms during their treatment. The identification of sentinel/core symptoms is a vital prerequisite for therapeutic interventions. In this study, using Network Analysis, we investigated the inter-relationships among 38 common symptoms over time (i.e., a total of six time points over two cycles of chemotherapy) in 987 oncology patients with four different types of cancer (i.e., breast, gastrointestinal, gynaecological, and lung). In addition, we evaluated the associations between and among symptoms and symptoms clusters and examined the strength of these interactions over time. Eight unique symptom clusters were identified within the networks. Findings from this research suggest that changes occur in the relationships and interconnections between and among co-occurring symptoms and symptoms clusters that depend on the time point in the chemotherapy cycle and the type of cancer. The evaluation of the centrality measures provides new insights into the relative importance of individual symptoms within various networks that can be considered as potential targets for symptom management interventions.
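
As an illustration of one common centrality measure, the sketch below computes strength centrality (the sum of absolute edge weights attached to each node) on a small weighted network. The abstract does not state which centrality measures were used, so this choice is an assumption, and the node labels and weights are hypothetical placeholders rather than study results.

```python
from collections import defaultdict


def strength_centrality(edges):
    """Sum the absolute weights of all edges attached to each node."""
    strength = defaultdict(float)
    for node_a, node_b, weight in edges:
        strength[node_a] += abs(weight)
        strength[node_b] += abs(weight)
    return dict(strength)


# Hypothetical symptom network: (symptom, symptom, association weight).
edges = [
    ("A", "B", 0.4),
    ("A", "C", 0.5),
    ("B", "D", 0.2),
]
centrality = strength_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Under this measure, the node with the largest total association weight is the candidate "core" node of the toy network.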

Elham Khalili, Samaneh Kouchaki, Shahin Ramazi, Faezeh Ghanati (2020) Machine Learning Techniques for Soybean Charcoal Rot Disease Prediction, In: Frontiers in Plant Science 11, 590529 Frontiers Media S.A

Early prediction of pathogen infestation is a key factor to reduce the disease spread in plants. Macrophomina phaseolina (Tassi) Goid, as one of the main causes of charcoal rot disease, suppresses the plant productivity significantly. Charcoal rot disease is one of the most severe threats to soybean productivity. Prediction of this disease in soybeans is very tedious and impractical using traditional approaches. Machine learning (ML) techniques have recently gained substantial traction across numerous domains. ML methods can be applied to detect plant diseases prior to the full appearance of symptoms. In this paper, several ML techniques were developed and examined for prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was suggested as inputs to the ML models. All developed ML models performed better than 90% in terms of accuracy. Gradient Tree Boosting (GBT) was the best performing classifier, obtaining 96.25% and 97.33% in terms of sensitivity and specificity. Our findings supported the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrated the importance of including physiological features in the learning. The collected dataset and source code can be found in

Mohammad Amin Zare, R Boostani, M Mohammadi, Samaneh Kouchaki (2022) A Dopamine Based Adaptive Emotional Neural Network, In: IEEE Access 10 pp. 109460-109475 Institute of Electrical and Electronics Engineers (IEEE)

Due to the inevitable role of emotions in human learning and decision-making, different types of emotions in the form of emotional weights/neurons have also been considered in shallow neural networks. Emotional neural networks suffer from a low convergence rate as well as batch learning instability, mainly because of the improper tuning of learning coefficients. To overcome these drawbacks, we introduced two solutions: (i) a heuristic upgrading method, inspired by the behavior of dopamine secretion in the human brain, to adaptively regulate the learning rate based on positive and negative emotional states at each epoch and (ii) a stochastic learning technique to stabilize the learning process. The proposed dopamine based adaptive emotional neural network statistically outperforms state-of-the-art methods like emotional neural network, prototype-incorporated emotional neural network, multi-layer perceptron, and deep convolutional neural networks such as LeNet, AlexNet, DenseNet, MobileNet and EfficientNet in terms of different measures such as accuracy and convergence rate on several high dimensional and big datasets.

Andrew A S Soltan, Samaneh Kouchaki, Tingting Zhu, Dani Kiyasseh, Thomas Taylor, Zaamin B Hussain, Tim Peto, Andrew J Brent, David W Eyre, David A Clifton (2021) Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test, In: The Lancet. Digital health 3(2) pp. e78-e87 Elsevier

The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. 
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. Funding: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
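
The dependence of predictive values on prevalence can be shown with a short sketch that applies Bayes' rule to the emergency department model's reported sensitivity (77.4%) and specificity (95.7%). The function name and the prevalence values in the loop are illustrative assumptions; this is a textbook calculation, not the authors' evaluation code, which used resampled test sets.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for the ED model;
# the prevalence values are chosen here for illustration.
for prevalence in (0.01, 0.02, 0.05):
    ppv, npv = predictive_values(0.774, 0.957, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At low prevalence the negative predictive value stays high even though the positive predictive value is modest, which is why such a model is suited to ruling the illness out.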

Roonak Rezvani, Samaneh Kouchaki, Ramin Nilforooshan, David J Sharp, Payam Barnaghi (2021) Analysing behavioural changes in people with dementia using in‐home monitoring technologies, In: Alzheimer's & Dementia: The Journal of the Alzheimer's Association 17(S11) e052181 Wiley

Background: Behavioural changes and neuropsychiatric symptoms such as agitation are common in people with dementia. These symptoms impact the quality of life of people with dementia and can increase the stress on caregivers. This study aims to identify the likelihood of having agitation in people affected by dementia (i.e., patients and carers) using routinely collected data from in‐home monitoring technologies. We have used a digital platform and analytical methods, developed in our previous study, to generate alerts when changes occur in the digital markers collected using in‐home sensing technologies (i.e., vital signs, environmental and activity data). A care monitoring team use the platform and interact with participants and caregivers when an alert is generated.
Method: We have used connected sensory devices to collect environmental markers, including Passive Infra‐Red (PIR), smart power plugs for monitoring home appliance use, motion and door sensors. The environmental marker data have been aggregated within each hour and used to train an agitation risk analysis model. We have trained a model using data collected from 88 homes (∼6 months of data from each home). The proposed model has two components: a self‐supervised transformation learning and an ensemble classification model for agitation likelihood. Ten different neural network encoders are learned to create pseudo‐labels using the samples from the unlabelled data. We use these pseudo‐labels to train a classification model with a convolutional block and a decision layer. The trained convolutional block is then used to learn a latent representation of the data for an ensemble classification block.
Results: Compared with baseline models such as LSTM network, Bidirectional LSTM (BiLSTM) network, VGG, ResNet, Inception, Random Forest (RF), Support Vector Machine (SVM) and Gaussian Process (GP) classifiers, the proposed model performs better in sensitivity (recall) and area under the precision‐recall curve with at most 40% improvement. The recall measure using the 10‐fold cross‐validation technique is 61%.
Conclusion: This method can support early interventions and help develop new pathways to support people affected by dementia. A limitation in our current study is that the environmental and movement data is at the home level and not personalised.

Francesca Palermo, Honglin Li, Alexander Capstick, Nan Fletcher-Lloyd, Yuchen Zhao, Samaneh Kouchaki, Ramin Nilforooshan, David Sharp, Payam Barnaghi (2021) Designing A Clinically Applicable Deep Recurrent Model to Identify Neuropsychiatric Symptoms in People Living with Dementia Using In-Home Monitoring Data

Agitation is one of the neuropsychiatric symptoms with high prevalence in dementia which can negatively impact the Activities of Daily Living (ADL) and the independence of individuals. Detecting agitation episodes can assist in providing People Living with Dementia (PLWD) with early and timely interventions. Analysing agitation episodes will also help identify modifiable factors such as ambient temperature and sleep as possible components causing agitation in an individual. This preliminary study presents a supervised learning model to analyse [...] We apply a recurrent deep learning model to identify agitation episodes validated and recorded by a clinical monitoring team. We present the experiments to assess the efficacy of the proposed model. The proposed model achieves an average of 79.78% recall, 27.66% precision and 37.64% F1 score when employing the optimal parameters, suggesting a good ability to recognise agitation events. We also discuss using machine learning models for analysing the behavioural patterns using continuous monitoring data and explore clinical applicability and the choices between sensitivity and specificity in home monitoring applications.

Sensor-based remote health monitoring is used in industrial, urban and healthcare settings to monitor ongoing operation of equipment and human health. An important aim is to intervene early if anomalous events or adverse health is detected. In the wild, these anomaly detection approaches are challenged by noise, label scarcity, high dimensionality, explainability and wide variability in operating environments. The Contextual Matrix Profile (CMP) is a configurable 2-dimensional version of the Matrix Profile (MP) that uses the distance matrix of all subsequences of a time series to discover patterns and anomalies. The CMP is shown to enhance the effectiveness of the MP and other SOTA methods at detecting, visualising and interpreting true anomalies in noisy real-world data from different domains. It excels at zooming out and identifying temporal patterns at configurable time scales. However, the CMP does not address cross-sensor information, and cannot scale to high-dimensional data. We propose a novel, self-supervised graph-based approach for temporal anomaly detection that works on context graphs generated from the CMP distance matrix. The learned graph embeddings encode the anomalous nature of a time context. In addition, we evaluate other graph outlier algorithms for the same task. Given that our pipeline is modular, graph construction, generation of graph embeddings, and pattern recognition logic can all be chosen based on the specific pattern detection application. We verified the effectiveness of graph-based anomaly detection and compared it with the CMP and 3 state-of-the-art methods on two real-world healthcare datasets with different anomalies. Our proposed method demonstrated better recall, alert rate and generalisability.

Samaneh Kouchaki, Paolo Miotto, Claudio U. Köser, Philip W. Fowler, Jeff Knaggs, Zamin Iqbal, Martin Hunt, Leonid Chindelevitch, Maha Farhat, Daniela Maria Cirillo, Iñaki Comas, James Posey, Shaheed V. Omar, Timothy E A Peto, Anita Suresh, Swapna Uplekar, Sacha Laurent, Rebecca E. Colman, Carl-Michael Nathanson, Matteo Zignol, Ann Sarah Walker, Derrick W. Crook, Nazir Ismail, Timothy C. Rodwell (2021) The 2021 WHO catalogue of Mycobacterium tuberculosis complex mutations associated with drug resistance: A genotypic analysis, In: The Lancet Microbe Elsevier

Samaneh Kouchaki, Narges Pourshahrokhi, Kord M. Kober, Christine Miaskowski, Payam Barnaghi (2021) A Hybrid Bayesian Model to Analyse Healthcare Data

Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. We propose a Hybrid Bayesian inference using Hamiltonian Monte Carlo (F-HMC) as a more practical approach to process cross-dimensional relations by applying a random walk and Hamiltonian dynamics to adapt the posterior distribution and generate large-scale samples. The proposed method is applied to cancer symptom assessment and MNIST datasets, and is confirmed to enrich data quality in terms of precision, accuracy, recall, F1-score, and propensity metric.