Lecturer A, School of Health Sciences

Matt Spick


Lecturer in Health and Biomedical Data Analytics
PhD

Academic and research departments

School of Health Sciences.

Publications

Danny Maupin, Tulsi Suchak, Adrian Barnett, Matt Spick Dramatic increases in redundant publications in the Generative AI era, In: medRxiv Cold Spring Harbor Laboratory Press

Redundant publication, the practice of submitting the same or substantially overlapping manuscripts multiple times, distorts the scientific record and wastes resources. Since 2022, publications using large open-science data resources have increased substantially, raising concerns that Generative AI (GenAI) may be facilitating the production of formulaic, redundant manuscripts. In this work we aim to quantify the extent of redundant publication from a single, large health dataset and to investigate whether GenAI can create submissions that evade standard integrity checks. We conducted a systematic search for the years 2021 to 2025 (year to end-July) to identify redundant publications using the US Centers for Disease Control and Prevention National Health and Nutrition Examination Survey (NHANES) dataset. Redundancy was defined as publications analysing the same exposures associated with the same outcomes in the same national population. To test whether GenAI could facilitate creating these papers, we prompted large language models to write three synthetic manuscripts using redundant publications from our dataset as input, instructing them to maximise syntactic differences and evade plagiarism detectors. These three synthetic manuscripts were then tested using a leading plagiarism detection platform to assess their similarity scores. Our search identified 411 redundant publications across 156 unique exposure-outcome pairings; for example, the association between oxidative balance score and chronic kidney disease using NHANES data was published six times in one year. In many instances, redundant articles appeared within the same journals. The three synthetic manuscripts created by GenAI to evade detection yielded overall similarity scores of 26%, 19%, and 14%, with individual similarity contributions below the typical 5% warning thresholds used by plagiarism detectors. 
The rapid growth in redundant publications (a 17-fold increase between 2022 and 2024) is suggestive of a systemic failure of editorial checks. These papers distort meta-analyses and scientometric studies, waste scarce peer review resources and pose a significant threat to the integrity of the scientific record. We conclude that current checks for redundant publications and plagiarism are no longer fit for purpose in the GenAI era.

Rupert Milsom, Magdalena Zasada, Cath Taylor, Matt Spick (2025)Machine Learning Applied to NHS Electronic Staff Records Identifies Key Areas of Focus for Staff Retention, In: Administrative sciences15(8)297 MDPI

Background: In this work, we examine determinants of staff departure rates in the NHS, a critical issue for workforce stability and continuity of care. High turnover, particularly among clinical staff, undermines service delivery and incurs substantial replacement costs. Methods: Here, we analyse a unique dataset derived from Electronic Staff Records at Ashford and St. Peter’s NHS Foundation Trust, using a machine learning approach to move beyond traditional survey-based methods, to assess propensity to leave. Results: In addition to established predictors such as salary and length of service, we identify drivers of increased risks of staff exits, including the distance between home and workplace and, especially for medical staff, cost centre vacancy rates. Conclusions: These findings highlight the multifactorial nature of staff retention and suggest the potential of local administrative data to improve workforce planning, for example, through hyperlocal recruitment strategies. Whilst further work will be required to assess the generalisability of our findings beyond a single Trust, our analysis offers insights for NHS managers seeking to stabilise staffing levels and reduce attrition through targeted interventions beyond pay and tenure.

Matt Spick, Cheryl M. Isherwood, Lee A. Gethings, Christopher J. Hughes, Matthew E. Daly, Hana Hassanin, Daan R. van der Veen, Debra J. Skene, Jonathan D. Johnston (2025)Challenges and opportunities for statistical power and biomarker identification arising from rhythmic variation in proteomics, In: Npj biological timing and sleep2(1) Nature Publishing Group UK

Time-of-day variation in the molecular profile of biofluids and tissues is a well-described phenomenon, but—especially for proteomics—is rarely considered in terms of the challenges this presents to reproducible biomarker identification. We provide a case study analysis of human circadian and ultradian rhythmicity in proteins, including in the complement and coagulation cascades and apolipoproteins, with PLG, CFAH, ZA2G and ITIH2 demonstrated as rhythmic for the first time. We also show that rhythmicity increases the risk of Type II errors due to the reduction in statistical power from increased variance, and that controlling for rhythmic time-of-day variation improves statistical power and reduces the chances of Type II errors. We recommend that best practice in proteomics study design should account for temporal variation and that time of sampling be reported as part of study metadata. These simple steps can mitigate against both false and missed discoveries, as well as improving reproducibility.

Kris Elomaa, Matt Spick, Earn H Gan, Simon H Pearce, Nophar Geifman (2025)Variable hyperthyroidism outcomes related to different treatment regimens: an analysis of UK Biobank data, In: European thyroid journal14(2)e240393 BIOSCIENTIFICA LTD

Background: UK guidance on the assessment and management of thyroid disease was set out in NICE guideline NG145 in 2019 and is expected to result in an increase in radioactive iodine (RAI) being offered as a first-line definitive treatment for hyperthyroidism. Methodology: In this work we analyse longitudinal UK Biobank data to assess all-cause mortality and comorbidity risks associated with the main treatment modalities for 793 participants with hyperthyroidism, specifically antithyroid drugs (ATDs), RAI and thyroidectomy. Results: Participants treated with RAI showed reduced all-cause mortality compared with those treated with ATD alone (time to event ratio: 1.8, 95% CI: 0.9-3.6), albeit the result did not reach statistical significance, as did those treated by thyroidectomy (time ratio: 2.0, 95% CI: 1.1-3.9). For treated patients, odds ratios were generally elevated for osteoporosis, cardiovascular events and atrial fibrillation, but again did not reach statistical significance except for those patients treated by ATDs, with an odds ratio for atrial fibrillation of 2.2 (95% CI: 1.2-4.1) versus controls. Conclusion: Our findings were consistent with those previously reported in the literature and do not reveal any evidence from the UK Biobank to contradict the safety of RAI being offered as a first-line treatment. The data are also suggestive, however, that treatments do not fully eliminate risks of complications related to hyperthyroidism. This reinforces the need for both clear communication where there may be risks of complications such as osteoporosis as well as clinical support for patients even after definitive treatment.

Inaki Deza-Cruz, Alexandre de Menezes, Brian Gardner, Ilknur Aktan, Sarhad Alnajjar, Martha Elizabeth Betson, Adriana Cabal Rosel, Manuela Caniça, Mark Chambers, Georgina Tarrant, Francesca Marie Contadini, Olukayode Daramola, Rani de la Rivière, Mary Bernadette Egan, Abel Bulamu Ekiri, Catherine Finnegan, Laura Cristina Gonzalez Villeta, Richard Green, Belinda Suzette Hall, Martin Hawes, Marwa Hassan, Sara Healy, Lisa Marie Holbrook, Guldane Damla Kaya, Prashant Kumar, Roberto Marcello La Ragione, Daniel James Maupin, Jai W. Mehat, Davide Messina, Kelly Moon, Elizabeth Mumford, Gordon Nichols, Daniel V. Olivença, Joaquin Prada, Claire Price, Christopher John Proudman, Retha Queenan, Miguel Ramos, Jaime Riccomini Closa, Jennifer M. Ritchie, Lorenzo Santorelli, Nick Selemetas, Matt Spick, Yashwanth Subbannayya, Shelini Surendran, Pedro Teixeira, Mukunthan Tharmakulasingam, Damian Valle, Arnoud H. M. Van Vliet, Marco Videira, Hazel Wallace-Williams, Klara Wanelik, Markus Woegerbauer, Danika Wright, Giovanni Lo Iacono (2025)Mapping the evidence of the effects of environmental factors on the prevalence of antibiotic resistance in the non-built environment, In: Environment International202109634 Elsevier

Background: Antibiotic resistance increasingly threatens the interconnected health of humans, animals, and the environment. While misuse of antibiotics is a known driver, environmental factors also play a critical role. A balanced One Health approach, including the environmental sector, is necessary to understand the emergence and spread of resistance. Methods: We systematically searched English-language literature (1990–2021) in MEDLINE, Embase, and Web of Science, plus grey literature. Titles, abstracts, and keywords were screened, followed by full-text reviews using a structured codebook and dual-reviewer assessments. Results: Of 13,667 records screened, 738 met the inclusion criteria. Most studies focused on freshwater and terrestrial environments, particularly associated with wastewater or manure sources. Evidence of research has predominantly focused on Escherichia coli and Pseudomonas spp., with a concentration on ARGs conferring resistance to sulphonamides (sul1–3), tetracyclines (tet), and beta-lactams. Additionally, the People’s Republic of China has produced a third of the studies (twice that of the next country, the United States), and research was largely domestic, with closely linked author networks. Conclusion: Significant evidence gaps persist in understanding antibiotic resistance in non-built environments, particularly in marine, atmospheric, and non-agricultural settings. Stressors such as climate change and microplastics remain notably under-explored. There is also an urgent need for more research in low-income regions, which face higher risks of antibiotic resistance, to support the development of targeted, evidence-based interventions.

Tulsi Suchak, Anietie E Aliu, Charlie Harrison, Reyer Zwiggelaar, Nophar Geifman, Matt Spick (2025)Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database, In: PLoS biology23(5)e3003152 PUBLIC LIBRARY SCIENCE

With the growth of artificial intelligence (AI)-ready datasets such as the National Health and Nutrition Examination Survey (NHANES), new opportunities for data-driven research are being created, but also generating risks of data exploitation by paper mills. In this work, we focus on two areas of potential concern for AI-supported research efforts. First, we describe the production of large numbers of formulaic single-factor analyses, relating single predictors to specific health conditions, where multifactorial approaches would be more appropriate. Employing AI-supported single-factor approaches removes context from research, fails to capture interactions, avoids false discovery correction, and is an approach that can easily be adopted by paper mills. Second, we identify risks of selective data usage, such as analyzing limited date ranges or cohort subsets without clear justification, suggestive of data dredging, and post-hoc hypothesis formation. Using a systematic literature search for single-factor analyses, we identified 341 NHANES-derived research papers published over the past decade, each proposing an association between a predictor and a health condition from the wide range contained within NHANES. We found evidence that research failed to take account of multifactorial relationships, that manuscripts did not account for the risks of false discoveries, and that researchers selectively extracted data from NHANES rather than utilizing the full range of data available. 
Given the explosion of AI-assisted productivity in published manuscripts (the systematic search strategy used here identified an average of 4 papers per annum from 2014 to 2021, but 190 in 2024 up to 9 October alone), we highlight a set of best practices to address these concerns, aimed at researchers, data controllers, publishers, and peer reviewers, to encourage improved statistical practices and mitigate the risks of paper mills using AI-assisted workflows to introduce low-quality manuscripts to the scientific literature.

Laura C. Gonzalez Villeta, Linda Chamane-Pinedo, Alasdair James Charles Cook, Eelco Franz, Theo Kanellos, Lapo Mughini-Gras, Gordon Nichols, Roan Pijnacker, Joaquin M. Prada, Christophe Sarran, Matt Spick, Jessica Wu, Giovanni Lo Iacono (2025)Identifying Key Weather Factors Influencing Human Salmonellosis: A Conditional Incidence Analysis in England, Wales, and the Netherlands, In: Journal of Infection90(2)106410 Elsevier

The accelerating rate of global climate and environmental changes is expected to affect the distribution, frequency, and patterns of established infectious diseases, as well as the emergence and re-emergence of both new and known diseases. Salmonella is a leading cause of foodborne illnesses in Europe, accounting for nearly one in three foodborne outbreaks. The seasonal pattern observed in cases of human salmonellosis reported suggests that weather may be a relevant driver of disease. Many studies show associations of salmonellosis with weather factors, but the exact extent of this influence is still unclear. Elucidating how the disease depends on relevant weather factors provides insights into the underlying mechanisms of transmission and provides a tool to anticipate the risk when relevant weather factors are known. This study provides new insights into the relationship between weather factors and the occurrence of salmonellosis, addressing a crucial issue in the context of climate change. By utilizing long-term, high-resolution epidemiological data from England and Wales linked with local weather data, the study offers a comprehensive phenomenological description of specific weather conditions that are related to the incidence of salmonellosis. Unlike previous studies that often rely on regression models or predefined parameterizations, the methodology used in this study employs a transparent and straightforward approach to estimate disease incidence based on a wide range of 14 local weather factors linked to individual cases. A key contribution of this study is its ability to account for the simultaneous effect of up to three weather factors, providing a more holistic understanding of their combined impact on disease incidence. Air temperature (>10 °C), relative humidity, precipitation (dry conditions), dewpoint temperature (7-10 °C), and day length (12-15 h) were identified as key weather factors associated with salmonellosis, irrespective of geographical location.
These findings were validated both in England and Wales and the Netherlands, which encourages the application of the model in other regions with different climatic and social characteristics to gain new insights on the incidence of salmonellosis. Likewise, the methodology can be adapted to explore other environmental factors, such as land use, proximity to animal farms, or socio-economic factors, providing a more holistic understanding of disease dynamics. The methodology used in this study, the conditional incidence, provides a robust framework to select key weather factors and exclude less relevant ones and to better understand climate-sensitive diseases and their response to climate changes. Early warning systems enhanced with weather data can improve incidence patterns predictions and tailor interventions to specific geographic areas.

Matt Spick, Holly M. Lewis, Michael J. Wilde, Christopher Hopley, Jim Huggett, Melanie J. Bailey (2022)Systematic review with meta-analysis of diagnostic test accuracy for COVID-19 by mass spectrometry, In: Metabolism, clinical and experimental126154922pp. 154922-154922 Elsevier

Background: The global COVID-19 pandemic has led to extensive development in many fields, including the diagnosis of COVID-19 infection by mass spectrometry. The aim of this systematic review and meta-analysis was to assess the accuracy of mass spectrometry diagnostic tests developed so far, across a wide range of biological matrices, and additionally to assess risks of bias and applicability in studies published to date. Method: 23 retrospective observational cohort studies were included in the systematic review using the PRISMA-DTA framework, with a total of 2858 COVID-19 positive participants and 2544 controls. Risks of bias and applicability were assessed via a QUADAS-2 questionnaire. A meta-analysis was also performed focusing on sensitivity, specificity, diagnostic accuracy and Youden's Index, in addition to assessing heterogeneity. Findings: Sensitivity averaged 0.87 in the studies reviewed herein (interquartile range 0.81-0.96) and specificity 0.88 (interquartile range 0.82-0.98), with an area under the receiver operating characteristic summary curve of 0.93. By subgroup, the best diagnostic results were achieved by viral proteomic analyses of nasopharyngeal swabs and metabolomic analyses of plasma and serum. The performance of other sampling matrices (breath, sebum, saliva) was less good, indicating that these protocols are currently insufficiently mature for clinical application. Conclusions: This systematic review and meta-analysis demonstrates the potential for mass spectrometry and 'omics in achieving accurate test results for COVID-19 diagnosis, but also highlights the need for further work to optimize and harmonize practice across laboratories before these methods can be translated to clinical applications. (c) 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Anastasia Kontiza, Johanna Von Gerichten, Kyle D.G. Saunders, Matt Spick, Anthony D. Whetton, Carla F Newman, Melanie Jane Bailey (2024)Single-Cell Lipidomics: An Automated and Accessible Microfluidic Workflow Validated by Capillary Sampling, In: Analytical chemistry ACS

We report the first demonstration of a microfluidics-based approach to measure lipids in single living cells using widely available liquid chromatography mass spectrometry (LC-MS) instrumentation. The method enables the rapid sorting of live cells into liquid chambers formed on standard Petri dishes and their subsequent dispensing into vials for analysis using LC-MS. This approach facilitates automated sampling, data acquisition, and analysis and carries the additional advantage of chromatographic separation, aimed at reducing matrix effects present in shotgun lipidomics approaches. We demonstrate that our method detects comparable numbers of features at around 200 lipids in populations of single cells versus established live single-cell capillary sampling methods and with greater throughput, albeit with the loss of spatial resolution. We also show the importance of optimization steps in addressing challenges from lipid contamination, especially in blanks, and demonstrate a 75% increase in the number of lipids identified. This work opens up a novel, accessible, and high-throughput way to obtain single-cell lipid profiles and also serves as an important validation of single-cell lipidomics through the use of different sampling methods.

Matt Spick, Katherine Longman, Cecile Frampas, Holly Lewis, Catia Costa, Deborah Dunn Walters, Alex Stewart, Michael Wilde, Danni Greener, George Evetts, Drupad Trivedi, Perdita Barran, Andy Pitt, Melanie Bailey (2021)Changes to the sebum lipidome upon COVID-19 infection observed via rapid sampling from the skin, In: Data for Changes to the sebum lipidome upon COVID-19 infection observed via rapid sampling from the skin Elsevier Ltd

The COVID-19 pandemic has led to an unprecedented demand for testing - for diagnosis and prognosis - as well as for investigation into the impact of the disease on the host metabolism. Sebum sampling has the potential to support both needs by looking at what the virus does to us, rather than looking for the virus itself. In this pilot study, sebum samples were collected from 67 hospitalised patients (30 COVID-19 positive and 37 COVID-19 negative) by gauze swab. Lipidomics analysis was carried out using liquid chromatography mass spectrometry, identifying 998 reproducible features. Univariate and multivariate statistical analyses were applied to the resulting feature set. Lipid levels were depressed in COVID-19 positive participants, indicative of dyslipidemia; p-values of 0·022 and 0·015 were obtained for triglycerides and ceramides respectively, with effect sizes of 0·44 and 0·57. Partial Least Squares-Discriminant Analysis showed separation of COVID-19 positive and negative participants with sensitivity of 57% and specificity of 68%, improving to 79% and 83% respectively when controlled for confounding comorbidities. COVID-19 dysregulates many areas of metabolism; in this work we show that the skin lipidome can be added to the list. Given that samples can be provided quickly and painlessly, we conclude that sebum is worthy of future consideration for clinical sampling. The authors acknowledge funding from the EPSRC Impact Acceleration Account for sample collection and processing, as well as EPSRC Fellowship Funding EP/R031118/1, the University of Surrey and BBSRC BB/T002212/1. Mass Spectrometry was funded under EP/P001440/1.

Matt Spick, Jan Higgins, Cynthia L. Green, Roland Matsouaka, Daniel B. Shin, Russell P. Hall III, Nophar Geifman (2024)Observations from statistical review editors: a commentary, In: JID Innovations100302 Elsevier

Reproducibility and replicability are crucial components of the scientific method, but they may be compromised when there are inherent issues related to a study and analytic choices such as statistical errors, or misalignments between the study’s objectives and implementation. Indeed, statistical errors and misunderstandings contribute to low reproducibility and replicability, hindering independent verification or changes in the direction of research (McNutt, 2014). Such problems can easily occur in health science, where there are many confounding factors and low prior odds of genuine findings (Ioannidis, 2005). Guidelines for statistical reporting that can minimize these issues are well-established, but are not always followed. In January of 2023, to help address these challenges in a more targeted way, JID Innovations established a statistical review board as part of its overall editorial process, nominating editors with expertise in statistical analysis and data science (Hall, 2023). All submissions to the journal are reviewed by one of the statistical review editors to provide specialist evaluation and feedback on study design, statistical tests, and analyses as well as bioinformatic aspects of the manuscript. In this commentary, common themes identified by statistical review editors in their peer reviews are brought forth along with comments that are made during the ‘routine’ peer review process in order to highlight prevalent issues in statistical methodologies and reporting seen in submissions to JID Innovations. The goal of this commentary is to propose easy steps that authors can take to inform study design at the outset of any data-driven project, reduce the number of potential revisions to statistical methodology and presentation in the original submission and ultimately to improve the reproducibility and replicability of the work published in JID Innovations, with the added benefit of a more efficient submission process.

Anthony Onoja, Johanna Von Gerichten, Holly-May Lewis, Melanie Jane Bailey, Debra Jean Skene, Nophar Geifman, Matt Spick (2023)Meta-Analysis of COVID-19 Metabolomics Identifies Variations in Robustness of Biomarkers, In: International journal of molecular sciences24(18)14371 Mdpi

The global COVID-19 pandemic resulted in widespread harms but also rapid advances in vaccine development, diagnostic testing, and treatment. As the disease moves to endemic status, the need to identify characteristic biomarkers of the disease for diagnostics or therapeutics has lessened, but lessons can still be learned to inform biomarker research in dealing with future pathogens. In this work, we test five sets of research-derived biomarkers against an independent targeted and quantitative Liquid Chromatography-Mass Spectrometry metabolomics dataset to evaluate how robustly these proposed panels would distinguish between COVID-19-positive and negative patients in a hospital setting. We further evaluate a crowdsourced panel comprising the COVID-19 metabolomics biomarkers most commonly mentioned in the literature between 2020 and 2023. The best-performing panel in the independent dataset, measured by F1 score (0.76) and AUROC (0.77), included nine biomarkers: lactic acid, glutamate, aspartate, phenylalanine, β-alanine, ornithine, arachidonic acid, choline, and hypoxanthine. Panels comprising fewer metabolites performed less well, showing weaker statistical significance in the independent cohort than originally reported in their respective discovery studies. Whilst the studies reviewed here were small and may be subject to confounders, it is desirable that biomarker panels be resilient across cohorts if they are to find use in the clinic, highlighting the importance of assessing the robustness and reproducibility of metabolomics analyses in independent populations.

Kyle D. G. Saunders, Johanna von Gerichten, Holly-May Lewis, Priyanka Gupta, Matt Spick, Catia Costa, Eirini Velliou, Melanie J. Bailey (2023)Single-Cell Lipidomics Using Analytical Flow LC-MS Characterizes the Response to Chemotherapy in Cultured Pancreatic Cancer Cells, In: Analytical Chemistry95(39)pp. 14727-14735 American Chemical Society

In this work, we demonstrate the development and first application of nanocapillary sampling followed by analytical flow liquid chromatography–mass spectrometry for single-cell lipidomics. Around 260 lipids were tentatively identified in a single cell, demonstrating remarkable sensitivity. Human pancreatic ductal adenocarcinoma cells (PANC-1) treated with the chemotherapeutic drug gemcitabine can be distinguished from controls solely on the basis of their single-cell lipid profiles. Notably, the relative abundance of LPC(0:0/16:0) was significantly affected in gemcitabine-treated cells, in agreement with previous work in bulk. This work serves as a proof of concept that live cells can be sampled selectively and then characterized using automated and widely available analytical workflows, providing biologically relevant outputs.

Matt P. Spick, Nathaniel M. Bingham, Yuman Li, Janella De Jesus, Catia Costa, Melanie J. Bailey, Peter J. Roth (2020)Fully Degradable Thioester-Functional Homo- and Alternating Copolymers Prepared through Thiocarbonyl Addition-Ring-Opening RAFT Radical Polymerization, In: Macromolecules53(2)pp. 539-547 Amer Chemical Soc

The radical ring-opening polymerization (RROP) of thionolactones provides access to thioester backbone-functional copolymers but has, to date, only been demonstrated on acrylic copolymers. Herein, the thionolactone dibenzo[c,e]oxepane-5-thione (DOT) was subjected to azobisisobutyronitrile (AIBN)-initiated free-radical homopolymerization, which produced a thioester-functional homopolymer with a glass-transition temperature of 95 °C and the ability to degrade exclusively into predetermined small molecules. However, the homopolymerization was impractically slow and precluded the introduction of functionality. Conversely, the reversible addition-fragmentation chain-transfer (RAFT)-mediated copolymerization of DOT with N-methylmaleimide (MeMI), N-phenylmaleimide (PhMI), and N-2,3,4,5,6-pentafluorophenylmaleimide (PFPMI) rapidly produced well-defined copolymers with the tendency to form alternating sequences increasing in the order MeMI

Johanna von Gerichten, Kyle Saunders, Melanie J. Bailey, Lee A. Gethings, Anthony Onoja, Nophar Geifman, Matt Spick (2024)Challenges in Lipidomics Biomarker Identification: Avoiding the Pitfalls and Improving Reproducibility, In: Metabolites14(8)461 Mdpi

Identification of features with high levels of confidence in liquid chromatography-mass spectrometry (LC-MS) lipidomics research is an essential part of biomarker discovery, but existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in biomarker identification. In this work, we illustrate the reproducibility gap for two open-access lipidomics platforms, MS DIAL and Lipostar, finding just 14.0% identification agreement when analyzing identical LC-MS spectra using default settings. Whilst the software platforms performed more consistently using fragmentation data, agreement was still only 36.1% for MS2 spectra. This highlights the critical importance of validation across positive and negative LC-MS modes, as well as the manual curation of spectra and lipidomics software outputs, in order to reduce identification errors caused by closely related lipids and co-elution issues. This curation process can be supplemented by data-driven outlier detection in assessing spectral outputs, which is demonstrated here using a novel machine learning approach based on support vector machine regression combined with leave-one-out cross-validation. These steps are essential to reduce the frequency of false positive identifications and close the reproducibility gap, including between software platforms, which, for downstream users such as bioinformaticians and clinicians, can be an underappreciated source of biomarker identification errors.

Cecile F. Frampas, Katie Longman, Matt Spick, Holly-May Lewis, Catia D. S. Costa, Alex Stewart, Deborah Dunn-Walters, Danni Greener, George Evetts, Debra J. Skene, Drupad Trivedi, Andy Pitt, Katherine Hollywood, Perdita Barran, Melanie J. Bailey (2022)Untargeted saliva metabolomics by liquid chromatography-Mass spectrometry reveals markers of COVID-19 severity, In: PloS one17(9)e0274967 Public Library Science

Background: The COVID-19 pandemic is likely to represent an ongoing global health issue given the potential for new variants, vaccine escape and the low likelihood of eliminating all reservoirs of the disease. Whilst diagnostic testing has progressed at a fast pace, the metabolic drivers of outcomes, and whether markers can be found in different biofluids, are not well understood. Recent research has shown that serum metabolomics has potential for prognosis of disease progression. In a hospital setting, collection of saliva samples is more convenient for both staff and patients, and therefore offers an alternative sampling matrix to serum. Methods: Saliva samples were collected from hospitalised patients with clinical suspicion of COVID-19, alongside clinical metadata. COVID-19 diagnosis was confirmed using RT-PCR testing, and COVID-19 severity was classified using clinical descriptors (respiratory rate, peripheral oxygen saturation score and C-reactive protein levels). Metabolites were extracted and analysed using high resolution liquid chromatography-mass spectrometry, and the resulting peak area matrix was analysed using multivariate techniques. Results: Positive percent agreement of 1.00 between a partial least squares-discriminant analysis metabolomics model employing a panel of 6 features (5 of which were amino acids, one that could be identified by formula only) and the clinical diagnosis of COVID-19 severity was achieved. The negative percent agreement with the clinical severity diagnosis was also 1.00, leading to an area under receiver operating characteristics curve of 1.00 for the panel of features identified. Conclusions: In this exploratory work, we found that saliva metabolomics and in particular amino acids can be capable of separating high severity COVID-19 patients from low severity COVID-19 patients.
This expands the atlas of COVID-19 metabolic dysregulation and could in future offer the basis of a quick and non-invasive means of sampling patients, intended to supplement existing clinical tests, with the goal of offering timely treatment to patients with potentially poor outcomes.

Matt Spick, Olivier Cexus, Hardev Singh Pandha, Agnieszka Michael, Anthony David Whetton, Nophar Geifman, Paul Andrew Townsend (2023) A Novel Blood Proteomic Signature for Prostate Cancer, In: Cancers 15(4) 1051 MDPI

Prostate cancer is the most common malignant tumour in men. Improved testing for diagnosis, risk prediction, and response to treatment would improve care. Here, we identified a proteomic signature of prostate cancer in peripheral blood using data-independent acquisition mass spectrometry combined with machine learning. A highly predictive signature was derived, which was associated with relevant pathways, including the coagulation, complement, and clotting cascades, as well as plasma lipoprotein particle remodeling. We further validated the identified biomarkers against a second cohort, identifying a panel of five key markers (GP5, SERPINA5, ECM1, IGHG1, and THBS1) which retained most of the diagnostic power of the overall dataset, achieving an AUC of 0.91. Taken together, this study provides a proteomic signature complementary to PSA for the diagnosis of patients with localised prostate cancer, with the further potential for assessing risk of future development of prostate cancer. Data are available via ProteomeXchange with identifier PXD025484.

Jaime Gonzalez, Manuel Salvador, Ozhan Ozkaya, Matt Spick, Catia Costa, Melanie Bailey, Claudio Avignone Rossa, Rolf Kuemmerli, Jose Jimenez The loss of the pyoverdine secondary receptor in Pseudomonas aeruginosa results in a fitter strain suitable for population invasion, In: bioRxiv Cold Spring Harbor Laboratory Press

The rapid emergence of antibiotic resistant bacterial pathogens constitutes a critical problem in healthcare and requires the development of novel treatments. Potential strategies include the exploitation of microbial social interactions based on public goods, which are produced at a fitness cost by cooperative microorganisms, but can be exploited by cheaters that do not produce these goods. Cheater invasion has been proposed as a "Trojan horse" approach to infiltrate pathogen populations with strains deploying built-in weaknesses (e.g. sensitivity to antibiotics). However, previous attempts have often been unsuccessful because population invasion by cheaters was prevented by various mechanisms including the presence of spatial structure (e.g. growth in biofilms), which limits the diffusion and exploitation of public goods. Here we followed an alternative approach and examined whether the manipulation of public good uptake, rather than its production, could result in potential "Trojan horses" suitable for population invasion. We focused on the siderophore pyoverdine produced by the human pathogen Pseudomonas aeruginosa MPAO1 and manipulated its uptake by deleting and/or overexpressing the pyoverdine primary (FpvA) and secondary (FpvB) receptors. We found that receptor synthesis feeds back on pyoverdine production and uptake rates, which led to strains with altered pyoverdine-associated costs and benefits. Moreover, we found that the receptor FpvB was advantageous under iron-limited conditions but revealed hidden costs in the presence of an antibiotic stressor (gentamicin). As a consequence, FpvB mutants became the fittest strain under gentamicin exposure, displacing the wildtype in liquid cultures, in biofilms, and during infections of larvae of the wax moth Galleria mellonella, the latter two of which represent structured environments. Our findings reveal that an evolutionary trade-off associated with the costs and benefits of a versatile pyoverdine uptake strategy can be harnessed for devising a Trojan horse candidate for medical interventions. Competing Interest Statement: The authors have declared no competing interest.

Matt Spick, Ammara Muazzam, Hardev Singh Pandha, Agnieszka Michael, Lee A Gethings, Christopher J. Hughes, Nyasha Munjoma, Robert S. Plumb, Ian D. Wilson, Anthony David Whetton, Paul Andrew Townsend, Nophar Geifman (2023) Multi-omic diagnostics of prostate cancer in the presence of benign prostatic hyperplasia, In: Heliyon 9(12) e22604 Elsevier

There is an unmet need for improved diagnostic testing and risk prediction for cases of prostate cancer (PCa) to improve care and reduce overtreatment of indolent disease. Here we have analysed the serum proteome and lipidome of 262 study participants by liquid chromatography-mass spectrometry, including participants diagnosed with PCa, participants diagnosed with benign prostatic hyperplasia (BPH), and otherwise healthy volunteers, with the aim of improving biomarker specificity. Although a two-class machine learning model separated PCa from controls with sensitivity of 0.82 and specificity of 0.95, adding BPH resulted in a statistically significant decline in specificity for prostate cancer to 0.76, with half of BPH cases being misclassified by the model as PCa. A small number of biomarkers differentiating between BPH and prostate cancer were identified, including proteins in MAP Kinase pathways as well as lipids containing oleic acid; these may offer a route to greater specificity. These results highlight, however, that whilst there are opportunities for machine learning, these will only be realised by use of appropriate training sets that include confounding comorbidities, especially when calculating the specificity of a test.
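The effect described in this abstract, where specificity falls once a confounding condition joins the non-cancer group, can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the study; the function name and group sizes are assumptions chosen only to show the arithmetic.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of non-cancer participants correctly classified as non-cancer."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("specificity is undefined with no negative cases")
    return true_negatives / total


# Hypothetical counts for illustration only (NOT data from the paper).
# Healthy volunteers: 95 correctly called non-cancer, 5 wrongly called PCa.
healthy_tn, healthy_fp = 95, 5
# Hypothetical BPH group: half are wrongly called PCa.
bph_tn, bph_fp = 50, 50

# Specificity when the only negatives are healthy volunteers.
spec_healthy_only = specificity(healthy_tn, healthy_fp)

# Specificity when BPH participants are pooled into the non-cancer group.
spec_with_bph = specificity(healthy_tn + bph_tn, healthy_fp + bph_fp)

print(f"healthy controls only: {spec_healthy_only:.3f}")
print(f"healthy + BPH:         {spec_with_bph:.3f}")
```

Under these assumed counts, pooling a frequently misclassified comorbidity into the negative class lowers the measured specificity, which is the general point the abstract makes about training and evaluation sets.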