
Dr Samaneh Kouchaki
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Department of Electrical and Electronic Engineering.
Biography
I joined CVSSP in July 2020 to lead research and teaching in machine learning for health and dementia care in collaboration with the UK Dementia Research Institute (UK DRI) Care Research & Technology Centre (a joint initiative between CVSSP, the Surrey Sleep Research Centre and Department of Mathematics at Surrey, and Imperial College London).
I previously spent three years as a postdoctoral researcher within the Institute of Biomedical Engineering at the University of Oxford. I was the senior machine learning researcher for the ‘100,000 Genomes Project for Tuberculosis’, an international consortium involving the Centres for Disease Control of most major nations (including the USA, UK and China), jointly funded by the Gates Foundation and the Wellcome Trust. My research focus was on the prediction of antibiotic resistance in pathogens such as those that cause tuberculosis.
Prior to this, I was at the University of Manchester within the Division of Evolutionary and Genomic Sciences where I was funded by the EU Horizon 2020 Virogenesis project, working on next-generation DNA sequencing using signal and image processing techniques coupled with unsupervised machine learning.
I obtained my PhD in Computer Science at Surrey in 2015. My PhD focused on developing novel multi-way techniques for source separation with application to biomedical signals.
Areas of specialism
Biomedical signal processing; deep supervised/semi-supervised learning for healthcare data; graph learning and embedding for omics data; time-series data processing and pattern analysis.
Research
Research interests
Dr Kouchaki’s research aims to improve patient care by providing decision support. Her objective is to develop intelligent tools, based on hybrid architectures of advanced probabilistic and deep learning techniques, that lead to better patient outcomes.
Research collaborations
Collaboration with the UK Dementia Research Institute (UK DRI) Care Research & Technology Centre on machine learning for health and dementia care.
PhD research positions
Please contact me if you are interested in a PhD in machine learning for healthcare applications.
Supervision
Postgraduate research supervision
Narges Pourshahrokhi
Elaheh Kalantari
Nivedita Bijlani
Bahar Khora
Co-supervision:
Adam Dowse
Completed postgraduate research projects I have supervised
Roonak Rezvani
My teaching
Laboratories Design & Professional Studies III and IV (EEE2036 and EEE2037)
Computer and Digital Logic (EEE1033)
My publications
Agitation is one of the neuropsychiatric symptoms with high prevalence in dementia which can negatively impact the Activities of Daily Living (ADL) and the independence of individuals. Detecting agitation episodes can assist in providing People Living with Dementia (PLWD) with early and timely interventions. Analysing agitation episodes will also help identify modifiable factors such as ambient temperature and sleep as possible components causing agitation in an individual. This preliminary study presents a supervised learning model to analyse agitation episodes. We apply a recurrent deep learning model to identify agitation episodes validated and recorded by a clinical monitoring team, and present experiments to assess the efficacy of the proposed model. The proposed model achieves an average of 79.78% recall, 27.66% precision and 37.64% F1 score when employing the optimal parameters, suggesting a good ability to recognise agitation events. We also discuss using machine learning models for analysing behavioural patterns from continuous monitoring data, and explore clinical applicability and the trade-off between sensitivity and specificity in home monitoring applications.
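As a rough illustration of the kind of recurrent classifier described above, a minimal PyTorch sketch might look as follows (the layer sizes, window length and class weighting are illustrative assumptions, not the published configuration):

import torch
import torch.nn as nn

class AgitationLSTM(nn.Module):
    """Recurrent classifier over windows of in-home sensor data (illustrative)."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # binary agitation score per window

    def forward(self, x):                  # x: (batch, timesteps, n_features)
        _, (h, _) = self.lstm(x)           # h: final hidden state
        return self.head(h[-1])            # raw logit; apply sigmoid at eval time

model = AgitationLSTM()
# Agitation hours are rare, so a weighted loss reflects the recall/precision
# imbalance reported above; the weight value here is a placeholder.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))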
Background: Behavioural changes and neuropsychiatric symptoms such as agitation are common in people with dementia. These symptoms impact the quality of life of people with dementia and can increase the stress on caregivers. This study aims to identify the likelihood of agitation in people affected by dementia (i.e., patients and carers) using routinely collected data from in-home monitoring technologies. We have used a digital platform and analytical methods, developed in our previous study, to generate alerts when changes occur in the digital markers collected using in-home sensing technologies (i.e., vital signs, environmental and activity data). A care monitoring team uses the platform and interacts with participants and caregivers when an alert is generated.
Method: We have used connected sensory devices to collect environmental markers, including Passive Infra-Red (PIR) sensors, smart power plugs for monitoring home appliance use, and motion and door sensors. The environmental marker data have been aggregated within each hour and used to train an agitation risk analysis model. We have trained the model using data collected from 88 homes (∼6 months of data from each home). The proposed model has two components: self-supervised transformation learning and an ensemble classification model for agitation likelihood. Ten different neural network encoders are learned to create pseudo-labels using samples from the unlabelled data. We use these pseudo-labels to train a classification model with a convolutional block and a decision layer. The trained convolutional block is then used to learn a latent representation of the data for an ensemble classification block.
Results: Compared with baseline models such as the LSTM network, Bidirectional LSTM (BiLSTM) network, VGG, ResNet, Inception, Random Forest (RF), Support Vector Machine (SVM) and Gaussian Process (GP) classifiers, the proposed model performs better in sensitivity (recall) and area under the precision-recall curve, with up to 40% improvement. The recall measure using the 10-fold cross-validation technique is 61%.
Conclusion: This method can support early interventions and help develop new pathways to support people affected by dementia. A limitation of our current study is that the environmental and movement data are at the home level and not personalised.
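The self-supervised stage rests on a simple idea: perturb an unlabelled window with one of several fixed transformations and train an encoder to recognise which transformation was applied, so the transformation index acts as a free pseudo-label. A minimal sketch of that idea follows; the transformations, shapes and single encoder below are illustrative assumptions (the study learns ten separate encoders and an ensemble on top):

import torch
import torch.nn as nn

# Illustrative signal transformations; each index is a pseudo-label.
TRANSFORMS = [
    lambda x: x,                                # identity
    lambda x: torch.flip(x, [-1]),              # time reversal
    lambda x: -x,                               # negation
    lambda x: x + 0.05 * torch.randn_like(x),   # jitter
]

encoder = nn.Sequential(                        # shared convolutional block
    nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(8), nn.Flatten(),
)
head = nn.Linear(16 * 8, len(TRANSFORMS))       # predicts the applied transform

x = torch.randn(32, 1, 24)                      # batch of hourly windows (unlabelled)
labels = torch.randint(len(TRANSFORMS), (32,))  # pseudo-labels, no annotation needed
x_aug = torch.stack([TRANSFORMS[l](xi) for l, xi in zip(labels.tolist(), x)])
loss = nn.CrossEntropyLoss()(head(encoder(x_aug)), labels)
loss.backward()                                 # trained encoder later feeds the ensemble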
Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. We propose a hybrid Bayesian inference using Hamiltonian Monte Carlo (F-HMC) as a more practical approach to process cross-dimensional relations, applying a random walk and Hamiltonian dynamics to adapt the posterior distribution and generate large-scale samples. Applied to a cancer symptom assessment dataset and to MNIST, the proposed method is confirmed to enrich data quality in terms of precision, accuracy, recall, F1-score, and the propensity metric.
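For reference, the Hamiltonian Monte Carlo core that such a sampler builds on can be written in a few lines: simulate Hamiltonian dynamics with a leapfrog integrator, then apply a Metropolis accept/reject step. This is a minimal sketch of standard HMC, not the published F-HMC, which additionally combines random-walk proposals:

import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step=0.1, n_leapfrog=20):
    """One HMC transition: leapfrog dynamics, then Metropolis accept/reject."""
    p = np.random.randn(*q.shape)                  # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step * grad_log_prob(q_new)     # half step for momentum
    for _ in range(n_leapfrog):
        q_new += step * p_new                      # full step for position
        p_new += step * grad_log_prob(q_new)       # full step for momentum
    p_new -= 0.5 * step * grad_log_prob(q_new)     # undo extra half step
    # Metropolis correction keeps the target distribution invariant.
    log_accept = (log_prob(q_new) - log_prob(q)
                  - 0.5 * (p_new @ p_new - p @ p))
    return (q_new, True) if np.log(np.random.rand()) < log_accept else (q, False)

# Example: sampling a standard normal, where grad log p(q) = -q.
q = np.zeros(2)
q, accepted = hmc_step(q, lambda q: -0.5 * q @ q, lambda q: -q)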
Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method, a heterogeneous graph attention network (‘HGAT–AMR’), to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores indicate which inputs the model is ‘paying attention to’ when making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic curve (AUROC) for isoniazid and rifampicin (98.53% and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the confounding effect of the overall resistance profile when considering resistance to a particular drug, which is consistent with domain knowledge.
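The attention scores the model exposes follow the usual graph-attention pattern: each node aggregates its neighbours, weighted by learned coefficients that can be read off afterwards. A minimal single-head, homogeneous sketch of that pattern (not the heterogeneous HGAT–AMR architecture itself; dimensions are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention, illustrative; adj must include self-loops."""
    def __init__(self, in_dim=16, out_dim=8):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):             # x: (N, in_dim), adj: (N, N) 0/1 mask
        h = self.W(x)                      # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))   # raw pairwise scores (N, N)
        e = e.masked_fill(adj == 0, float('-inf'))    # restrict to graph edges
        att = torch.softmax(e, dim=-1)     # per-neighbour attention, inspectable
        return att @ h, att                # aggregated features + attention scores

The returned att matrix is what ‘attention scores’ refers to in this setting: row i shows how strongly node i weights each neighbour, which is what makes per-gene and per-SNP contributions inspectable.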
COVID-19 is a major, urgent, and ongoing threat to global health. Globally, more than 24 million people have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important for guiding individual patient treatment and also for ensuring sufficient resources are available. We evaluate the ability of six common Early Warning Scores (EWS) to identify respiratory deterioration, defined as the need for advanced respiratory support (high-flow nasal oxygen, continuous positive airways pressure, non-invasive ventilation, intubation) within a prediction window of 24 h, and show that these scores perform sub-optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best conventional EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes its prediction based on current and baseline measures of routinely available vital signs and blood tests.
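In outline, such a deterioration predictor can be approximated with scikit-learn's gradient boosting; everything below (the synthetic data and hyperparameters) is a placeholder rather than the study's pipeline:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder stand-in for vital signs and blood tests (current and baseline).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1, random_state=0)
gbt.fit(X_tr, y_tr)

# Score deterioration risk within the prediction window and evaluate AUROC.
risk = gbt.predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, risk))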
The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak) were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results.
We assessed 155,689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114,957 patients were included in the COVID-19-negative cohort and 437 in the COVID-19-positive cohort, for a full study population of 115,394 patients, with 72,310 admitted to hospital. In the 80% sensitivity configuration, our emergency department (ED) model achieved 77.4% sensitivity and 95.7% specificity (area under the receiver operating characteristic curve [AUROC] 0.939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77.4% sensitivity and 94.8% specificity (AUROC 0.940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98.5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3,326 patients) achieved 92.3% accuracy (NPV 97.6%, AUROC 0.881), and the admissions model (1,715 patients) achieved 92.5% accuracy (NPV 97.7%, AUROC 0.871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95.1%, admissions model 94.1%) and NPV (ED model 99.0%, admissions model 98.5%).
Our models performed effectively as a screening test for COVID-19, excluding the illness with high confidence by use of clinical data routinely available within 1 h of presentation to hospital. Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. Funding: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
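Calibrating a classifier to a fixed sensitivity, as done for the 70%, 80%, and 90% operating points, amounts to choosing the decision threshold on validation scores. A minimal sketch of one way to do this (function and variable names are illustrative, not the study's code):

import numpy as np

def threshold_for_sensitivity(y_val, scores_val, target=0.80):
    """Pick the largest threshold whose validation sensitivity >= target."""
    pos_scores = np.sort(scores_val[y_val == 1])   # positives, ascending
    # Allow at most (1 - target) of positives to fall below the threshold.
    k = int(np.floor((1.0 - target) * len(pos_scores)))
    return pos_scores[k]

# Illustrative use with synthetic scores in [0, 1] and 0/1 labels.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, 500)
scores_val = np.clip(rng.normal(0.3 + 0.4 * y_val, 0.15), 0, 1)
thr = threshold_for_sensitivity(y_val, scores_val, target=0.80)
print("threshold:", thr, "sensitivity:",
      (scores_val[y_val == 1] >= thr).mean())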
Early prediction of pathogen infestation is key to reducing disease spread in plants. Macrophomina phaseolina (Tassi) Goid, one of the main causes of charcoal rot disease, significantly suppresses plant productivity, and charcoal rot disease is one of the most severe threats to soybean productivity. Predicting this disease in soybeans is tedious and impractical using traditional approaches. Machine learning (ML) techniques have recently gained substantial traction across numerous domains, and ML methods can be applied to detect plant diseases prior to the full appearance of symptoms. In this paper, several ML techniques were developed and examined for prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was proposed as input to the ML models. All developed ML models performed better than 90% in terms of accuracy. Gradient Tree Boosting (GBT) was the best-performing classifier, obtaining 96.25% sensitivity and 97.33% specificity. Our findings support the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrated the importance of including physiological features in the learning. The collected dataset and source code can be found at https://github.com/Elham-khalili/Soybean-Charcoal-Rot-Disease-Prediction-Dataset-code.
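Since the conclusion rests on which features matter, one quick way to probe that with a boosted-tree model is its impurity-based feature importances. A sketch with placeholder feature names and synthetic data (not the paper's measurements):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder names mixing 'physiological' and 'morphological' features.
names = ["leaf_water_content", "stem_diameter", "root_length",
         "chlorophyll_index", "canopy_height", "lesion_count"]
X, y = make_classification(n_samples=500, n_features=len(names),
                           n_informative=4, random_state=0)

gbt = GradientBoostingClassifier(random_state=0).fit(X, y)
# Rank features by their contribution to the fitted trees' splits.
for name, imp in sorted(zip(names, gbt.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>20s}  {imp:.3f}")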