Dr Samaneh Kouchaki

Lecturer in Machine Learning for Healthcare
+44 (0)1483 689261
38 BA 01



Research interests

Research collaborations

PhD research positions

Please contact me if you are interested in a PhD in machine learning for healthcare applications.


Postgraduate research supervision

Completed postgraduate research projects I have supervised

My teaching

My publications


Samaneh Kouchaki, Paolo Miotto, Claudio U. Köser, Philip W. Fowler, Jeff Knaggs, Zamin Iqbal, Martin Hunt, Leonid Chindelevitch, Maha Farhat, Daniela Maria Cirillo, Iñaki Comas, James Posey, Shaheed V. Omar, Timothy E A Peto, Anita Suresh, Swapna Uplekar, Sacha Laurent, Rebecca E. Colman, Carl-Michael Nathanson, Matteo Zignol, Ann Sarah Walker, Derrick W. Crook, Nazir Ismail, Timothy C. Rodwell (2021) The 2021 WHO catalogue of Mycobacterium tuberculosis complex mutations associated with drug resistance: A genotypic analysis, In: The Lancet Microbe, Elsevier
Samaneh Kouchaki, Francesca Palermo, Honglin Li, Alexander Capstick, Nan Fletcher-Lloyd, Yuchen Zhao, Ramin Nilforooshan, David Sharp, Payam Barnaghi (2021) Designing A Clinically Applicable Deep Recurrent Model to Identify Neuropsychiatric Symptoms in People Living with Dementia Using In-Home Monitoring Data

Agitation is one of the neuropsychiatric symptoms with high prevalence in dementia, which can negatively impact the Activities of Daily Living (ADL) and the independence of individuals. Detecting agitation episodes can assist in providing People Living with Dementia (PLWD) with early and timely interventions. Analysing agitation episodes will also help identify modifiable factors such as ambient temperature and sleep as possible components causing agitation in an individual. This preliminary study presents a supervised learning model to analyse agitation episodes. We apply a recurrent deep learning model to identify agitation episodes validated and recorded by a clinical monitoring team. We present experiments to assess the efficacy of the proposed model. The proposed model achieves an average of 79.78% recall, 27.66% precision and 37.64% F1 score when employing the optimal parameters, suggesting a good ability to recognise agitation events. We also discuss using machine learning models for analysing behavioural patterns from continuous monitoring data, and explore clinical applicability and the trade-off between sensitivity and specificity in home monitoring applications.
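The recall, precision and F1 figures quoted above can be reproduced mechanically. A minimal sketch, using invented labels and predictions rather than the study's data (1 marks an agitation episode flagged by the monitoring team):

```python
# Hypothetical illustration of the evaluation metrics reported in the abstract.

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented hourly labels: 1 = agitation episode, 0 = no episode.
y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)  # → (0.6, 0.75, 0.667)
```

A high recall with modest precision, as in the study, reflects a model tuned to miss few true episodes at the cost of more false alarms.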

Roonak Rezvani, Samaneh Kouchaki, Ramin Nilforooshan, David J Sharp, Payam Barnaghi (2021) Analysing behavioural changes in people with dementia using in‐home monitoring technologies, In: Alzheimer's & Dementia: The Journal of the Alzheimer's Association 17(S11), e052181, Wiley

Background Behavioural changes and neuropsychiatric symptoms such as agitation are common in people with dementia. These symptoms impact the quality of life of people with dementia and can increase the stress on caregivers. This study aims to identify the likelihood of having agitation in people affected by dementia (i.e., patients and carers) using routinely collected data from in‐home monitoring technologies. We have used a digital platform and analytical methods, developed in our previous study, to generate alerts when changes occur in the digital markers collected using in‐home sensing technologies (i.e., vital signs, environmental and activity data). A care monitoring team use the platform and interact with participants and caregivers when an alert is generated. Method We have used connected sensory devices to collect environmental markers, including Passive Infra‐Red (PIR), smart power plugs for monitoring home appliance use, motion and door sensors. The environmental marker data have been aggregated within each hour and used to train an agitation risk analysis model. We have trained a model using data collected from 88 homes (∼6 months of data from each home). The proposed model has two components: a self‐supervised transformation learning and an ensemble classification model for agitation likelihood. Ten different neural network encoders are learned to create pseudo‐labels using the samples from the unlabelled data. We use these pseudo‐labels to train a classification model with a convolutional block and a decision layer. The trained convolutional block is then used to learn a latent representation of the data for an ensemble classification block. 
Results Compared with baseline models such as an LSTM network, a Bidirectional LSTM (BiLSTM) network, VGG, ResNet, Inception, Random Forest (RF), Support Vector Machine (SVM) and Gaussian Process (GP) classifiers, the proposed model performs better in sensitivity (recall) and area under the precision‐recall curve, with up to 40% improvement. The recall measure using the 10‐fold cross‐validation technique is 61%. Conclusion This method can support early interventions and help develop new pathways to support people affected by dementia. A limitation of our current study is that the environmental and movement data are at the home level and not personalised.

Samaneh Kouchaki, Narges Pourshahrokhi, Kord M. Kober, Christine Miaskowski, Payam Barnaghi (2021) A Hybrid Bayesian Model to Analyse Healthcare Data

Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. We propose a hybrid Bayesian inference method using Hamiltonian Monte Carlo (F-HMC) as a more practical approach to process cross-dimensional relations, applying a random walk and Hamiltonian dynamics to adapt the posterior distribution and generate large-scale samples. The proposed method was applied to a cancer symptom assessment dataset and to MNIST, and was confirmed to enrich data quality in terms of precision, accuracy, recall, F1-score, and a propensity metric.
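The random-walk component of such samplers can be sketched in a few lines. This is an illustrative random-walk Metropolis sampler only; the target is a standard normal posterior, an assumption for the demo rather than the paper's F-HMC model, which additionally uses Hamiltonian dynamics:

```python
import math
import random

# Illustrative random-walk Metropolis sampler: the random-walk building block
# that a hybrid scheme combines with Hamiltonian dynamics.

def log_post(x):
    return -0.5 * x * x  # log density of N(0, 1), up to a constant

def metropolis(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # random-walk proposal
        # Accept with probability min(1, post(proposal) / post(x)).
        if math.log(rng.random()) < log_post(proposal) - log_post(x):
            x = proposal
        samples.append(x)
    return samples

# A missing clinical value could then be imputed with such posterior draws.
draws = metropolis(5000)
mean = sum(draws) / len(draws)  # close to the posterior mean of 0
```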

Yang Yang, Timothy M Walker, Samaneh Kouchaki, Chenyang Wang, Timothy E A Peto, Derrick W Crook, David A Clifton (2021) An end-to-end heterogeneous graph attention network for Mycobacterium tuberculosis drug-resistance prediction, In: Briefings in Bioinformatics 22(6), Oxford University Press

Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method—heterogeneous graph attention network (‘HGAT–AMR’)—to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode which inputs the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic curve (AUROC) for isoniazid and rifampicin (98.53% and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of the overall resistance profile when considering resistance to a particular drug, which is consistent with domain knowledge.
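The 'attention score' idea can be illustrated in miniature. The sketch below reduces it to a single softmax over per-gene relevance logits; the logit values are invented for the demo, though katG and inhA are genuinely associated with isoniazid resistance:

```python
import math

# Toy illustration of attention scores: a softmax turns unnormalised
# per-gene relevance logits into weights that sum to 1.

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

genes = ["katG", "inhA", "ahpC", "rpoB"]
scores = softmax([2.0, 1.0, 0.2, -1.0])  # hypothetical logits
ranked = sorted(zip(genes, scores), key=lambda pair: -pair[1])
# ranked[0] is the gene this toy 'model' attends to most strongly.
```

In the real HGAT–AMR model these weights come from learned attention layers over a heterogeneous graph, but the normalisation step is the same.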

Alexey Youssef, Samaneh Kouchaki, Farah Shamout, Jacob Armstrong, Rasheed El‐Bouri, Thomas Taylor, Drew Birrenkott, Baptiste Vasey, Andrew Soltan, Tingting Zhu, David A Clifton, David W Eyre (2021) Development and validation of early warning score systems for COVID‐19 patients, In: Healthcare Technology Letters 8(5), pp. 105-117, Wiley

COVID‐19 is a major, urgent, and ongoing threat to global health. Globally, more than 24 million people have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important for guiding individual patient treatment and for ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration, defined as the need for advanced respiratory support (high‐flow nasal oxygen, continuous positive airways pressure, non‐invasive ventilation, intubation) within a prediction window of 24 h, is evaluated. It is shown that these scores perform sub‐optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes the prediction based on the current and baseline measures of routinely available vital signs and blood tests.
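The gradient-boosting idea behind such a model can be sketched with depth-1 trees (stumps) fitted to residuals of a squared-error loss. The features and labels below are invented, vitals-like data, not the study's cohort or its actual GBT implementation:

```python
# Toy gradient-boosted stumps: each round fits a one-split tree to the
# residuals of the running prediction, then adds it with a learning rate.

def fit_stump(X, residuals):
    # Exhaustive search over features and thresholds for the best
    # squared-error split.
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, threshold, lmean, rmean)
    _, j, t, lmean, rmean = best
    return lambda row: lmean if row[j] <= t else rmean

def gbt_fit(X, y, n_rounds=20, lr=0.5):
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(row) for pi, row in zip(pred, X)]
    return lambda row: sum(lr * s(row) for s in stumps)

# Invented data: [respiratory rate, heart rate]; label 1 = deteriorated.
X = [[18, 70], [30, 110], [16, 65], [28, 120], [20, 80], [32, 125]]
y = [0, 1, 0, 1, 0, 1]
model = gbt_fit(X, y)  # model(row) > 0.5 flags likely deterioration
```

Production libraries add regularisation, deeper trees and gradient-based losses, but the fit-to-residuals loop is the core of the technique.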

Andrew A S Soltan, Samaneh Kouchaki, Tingting Zhu, Dani Kiyasseh, Thomas Taylor, Zaamin B Hussain, Tim Peto, Andrew J Brent, David W Eyre, David A Clifton (2021) Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test, In: The Lancet Digital Health 3(2), pp. e78-e87, Elsevier

The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. At the 80% sensitivity configuration, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (NPV 97·7%, AUROC 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). Our models performed effectively as a screening test for COVID-19, excluding the illness with high confidence by use of clinical data routinely available within 1 h of presentation to hospital.
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
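Calibrating a classifier to a target sensitivity, as done for the 70%, 80% and 90% operating points above, amounts to choosing a decision threshold on a validation set. A minimal sketch with hypothetical scores and labels (not the study's data):

```python
import math

# Pick the highest score threshold at which at least `target` of the
# positive cases still score at or above it.

def threshold_for_sensitivity(scores, labels, target=0.8):
    pos = sorted((s for s, l in zip(scores, labels) if l == 1), reverse=True)
    if not pos:
        raise ValueError("no positive cases to calibrate on")
    k = math.ceil(target * len(pos))  # positives that must clear the threshold
    return pos[k - 1]

# Hypothetical validation scores; label 1 = PCR-confirmed COVID-19.
scores = [0.95, 0.80, 0.60, 0.40, 0.90, 0.30, 0.70, 0.20]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold = threshold_for_sensitivity(scores, labels, target=0.75)
preds = [1 if s >= threshold else 0 for s in scores]
# At this threshold, at least 75% of the positive cases are recovered.
```

Fixing sensitivity this way trades specificity for a predictable miss rate, which suits a screening test where false negatives are the costlier error.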

Elham Khalili, Samaneh Kouchaki, Shahin Ramazi, Faezeh Ghanati (2020) Machine Learning Techniques for Soybean Charcoal Rot Disease Prediction, In: Frontiers in Plant Science 11, 590529, Frontiers Media S.A.

Early prediction of pathogen infestation is a key factor in reducing disease spread in plants. Macrophomina phaseolina (Tassi) Goid, one of the main causes of charcoal rot disease, significantly suppresses plant productivity. Charcoal rot disease is one of the most severe threats to soybean productivity. Prediction of this disease in soybeans is tedious and impractical using traditional approaches. Machine learning (ML) techniques have recently gained substantial traction across numerous domains and can be applied to detect plant diseases prior to the full appearance of symptoms. In this paper, several ML techniques were developed and examined for prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was suggested as input to the ML models. All developed ML models performed better than 90% in terms of accuracy. Gradient Tree Boosting (GBT) was the best-performing classifier, obtaining 96.25% and 97.33% in terms of sensitivity and specificity, respectively. Our findings support the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrated the importance of including physiological features in the learning. The collected dataset and source code can be found in