Dr Xilu Wang
About
Biography
Dr. Xilu Wang joined the Computer Science Department at Surrey in 2024 as a Surrey Future Fellow and Lecturer. Before this she was a postdoctoral researcher at Bielefeld University in Germany for two years, having completed her PhD at the University of Surrey in 2022, funded by the Honda Research Institute Europe.
Her research focuses on making AI optimisation smarter, more efficient, and trustworthy — particularly in settings where data is scarce, distributed, or sensitive. She has published on multi-objective Bayesian optimisation, surrogate-assisted evolutionary algorithms, and federated learning, with applications ranging from audio deepfake detection to energy systems and healthcare. Dr Wang is an active leader in the international research community, serving as Vice Chair of the IEEE CIS Task Force on Data-Driven Evolutionary Optimization.
Research
Research interests
My research focuses on the design of fair, privacy-preserving, and trustworthy machine learning and data-driven optimization algorithms, with a growing interest in the efficiency and trustworthiness of large language and audio models. Topics include:
• Multi-objective optimization, Bayesian optimization, and surrogate-assisted evolutionary algorithms
• Trustworthy AI: federated learning, fair and privacy-preserving optimization
• Efficient large language and audio models: parameter-efficient training, speculative decoding
• Audio AI: audio deepfake detection and audio language models
Supervision
Postgraduate research supervision
PhD Supervision
Graduated PhD Students
Jinhao Zhang (Co-supervisor)
Co-supervised with Dr. Xiaowei Gu, Prof. Zhenhua Feng, and Prof. Yaochu Jin.
Current PhD Students
Rong Wan (Primary supervisor)
Research topic: Explainable audio deepfake detection.
Co-supervised with Prof. Wenwu Wang.
Zijian Jiang (Primary supervisor)
Co-supervised with Prof. Ferrante Neri.
Yixia Zhang (Co-supervisor)
Co-supervised with Prof. Ferrante Neri.
Jiaxi Li (Primary supervisor)
Research topic: Efficiency of large language models.
Co-supervised with Prof. Wenwu Wang.
Junqi Zhao (Co-supervisor)
Research topic: Generative models for audio separation and generation.
Co-supervised with Prof. Wenwu Wang.
Completed postgraduate research projects I have supervised
Abel Alexander University of Surrey
“A comparative study of Bayesian algorithms on expensive optimization”, September 2021-September 2022
• Master's student; graduated with Distinction.
• He currently works at Bank of America.
Adrian Kruse Bielefeld University
“Bayesian Multi-objective Evolutionary Optimization”, January 2023-September 2023
• Bachelor student.
Scheu Louis Lamar Bielefeld University
“Federated learning in object detection”, September 2023-Ongoing
• Master student.
Teaching
BSc:
COM3013 COMPUTATIONAL INTELLIGENCE
COM3001 FINAL YEAR PROJECT (supervision)
MSc Data Science:
COMM070 MSC DISSERTATION (supervision)
COMM062 COMPUTATIONAL INTELLIGENCE
Publications
This reproducibility companion paper provides implementation details of our paper ''Learning differentiable particle filter on the fly''[10] presented at the 57th Asilomar Conference on Signals, Systems, and Computers. We provide detailed documentation to replicate our research, which proposes a differentiable particle filter capable of online learning. This paper includes our Python code repository, experimental configurations, dataset description, and step-by-step instructions to reproduce the results. By sharing these resources, we aim to encourage open source and further research in this direction.
In surrogate-assisted evolutionary optimization, privacy preservation and trusted data sharing have become increasingly important concerns, especially in scenarios involving distributed sensitive data. Existing privacy-preserving surrogate-assisted evolutionary optimization algorithms heavily rely on the basic federated learning framework. However, recent findings have revealed possible vulnerabilities within this framework, including susceptibility to adversarial threats like gradient leakage and inference attacks. To address the above challenges and enhance privacy protection, this article proposes to protect the raw data by applying a differentially private stochastic gradient descent method to train surrogate models. A differential evolution operator is designed to generate personalized new samples for multiple clients based on promising and additional auxiliary samples, avoiding the exposure of newly generated online data. Moreover, a similarity-based aggregation algorithm is integrated to effectively construct the global surrogate model. A rigorous security analysis is provided to further validate the effectiveness of the proposed method in privacy protection. Experimental results show that the proposed method exhibits remarkable optimization performance on a set of synthetic problems with federated settings while maintaining data privacy.
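The core privacy mechanism described above, differentially private SGD, can be sketched in a few lines: per-sample gradients are clipped and Gaussian noise is added before the update. The linear surrogate, constants, and function names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for a toy linear surrogate (illustrative sketch).

    Each per-sample gradient is clipped to `clip_norm`; Gaussian noise scaled
    by `noise_mult * clip_norm` is added to the summed gradients before the
    averaged update, so no single sample dominates what is revealed.
    """
    rng = rng or np.random.default_rng(0)
    grads = []
    for xi, yi in zip(X, y):
        g = 2.0 * (w @ xi - yi) * xi                 # squared-error gradient
        grads.append(g / max(1.0, np.linalg.norm(g) / clip_norm))  # clip
    noisy_sum = np.sum(grads, axis=0) + rng.normal(
        0.0, noise_mult * clip_norm, size=w.shape)   # add calibrated noise
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                                       # noiseless toy targets
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
```

Despite the clipping and injected noise, the surrogate weights still approach the true ones on this toy problem; the noise multiplier controls the privacy/accuracy trade-off.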
Recent years have seen the rapid development of fairness-aware machine learning in mitigating unfairness or discrimination in decision-making in a wide range of applications. However, much less attention has been paid to fairness-aware multi-objective optimization, which is commonly encountered in real life, for example in fair resource allocation problems and data-driven multi-objective optimization problems. This paper aims to illuminate and broaden our understanding of multi-objective optimization from the perspective of fairness. To this end, we start with a discussion of user preferences in multi-objective optimization. Subsequently, we explore their relationship to fairness in machine learning and multi-objective optimization. Following the above discussions, representative cases of fairness-aware multi-objective optimization are presented, further elaborating the importance of fairness in traditional multi-objective optimization, data-driven optimization, and federated optimization. Finally, challenges and opportunities in fairness-aware multi-objective optimization are addressed. We hope that this article takes a solid step towards understanding fairness in the context of optimization, and that it promotes research interest in fairness-aware multi-objective optimization.
With the development of edge devices and cloud computing, the question of how to accomplish machine learning and optimization tasks in a privacy-preserving and secure way has attracted increased attention over the past decade. As a privacy-preserving distributed machine learning method, federated learning (FL) has become popular in the last few years. However, the data privacy issue also occurs when solving optimization problems, which has received little attention so far. This survey paper is concerned with privacy-preserving optimization, with a focus on privacy-preserving data-driven evolutionary optimization. It aims to provide a roadmap from secure privacy-preserving learning to secure privacy-preserving optimization by summarizing security mechanisms and privacy-preserving approaches that can be employed in machine learning and optimization. We provide a formal definition of security and privacy in learning, followed by a comprehensive review of FL schemes and cryptographic privacy-preserving techniques. Then, we present ideas on the emerging area of privacy-preserving optimization, ranging from privacy-preserving distributed optimization to privacy-preserving evolutionary optimization and privacy-preserving Bayesian optimization (BO). We further provide a thorough security analysis of BO and evolutionary optimization methods from the perspective of inferring attacks and active attacks. On the basis of the above, an in-depth discussion is given to analyze what FL and distributed optimization strategies can be used for the design of federated optimization and what additional requirements are needed for achieving these strategies. Finally, we conclude the survey by outlining open questions and remaining challenges in federated data-driven optimization. We hope this survey can provide insights into the relationship between FL and federated optimization and will promote research interest in secure federated optimization.
Tablets are an efficient dosage form for delivering probiotics. Prior studies have identified compression pressure, compression speed, and precompression pressure as critical process parameters determining probiotic survival during tabletting. However, due to the labour-intensive and time-consuming nature of experimental investigations, most previous studies focused on evaluating the impact of individual parameters in isolation. Consequently, the rapid and systematic identification of optimal process parameters to maximise probiotic survival remains a significant and unresolved challenge in pharmaceutical formulation research. To address this gap, an integrated approach combining active learning (AL) based Gaussian process regression (GPR) with finite element (FE) modelling was developed to systematically explore the compaction parameter space and identify optimal process conditions. All data utilised in AL were generated using an FE model that was specifically developed to predict viability of probiotics during tabletting. Remarkably, the integrated approach achieved high prediction performance after only 78 iterations, demonstrating a coefficient of determination (R2) of 0.96 across the entire design space for predicting probiotic survival rate during tabletting. Using the well-trained model, a global random sampling strategy combined with threshold filtering was employed to identify regions of the design space likely to yield near-optimal survival rates. Furthermore, the exploration of compression speed and precompression pressure at selected fixed main compression pressures enabled the generation of survival rate maps, providing insights into the interplay between probiotic survival rate and tablet mechanical performance. This study demonstrated the potential of hybrid data-driven and first-principles modelling approaches as a robust strategy for optimising probiotic tabletting processes and accelerating pharmaceutical development.
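The active-learning loop described above, a Gaussian process surrogate that repeatedly queries the point it is most uncertain about, can be sketched on a toy 1-D "survival rate" function. Everything here (the toy function, a minimal numpy GP, the kernel length scale) is an illustrative assumption standing in for the paper's FE model and design space:

```python
import numpy as np

def rbf(A, B, ls=0.1):
    # Squared-exponential kernel between row vectors of A and B
    d = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2 * ls ** 2))

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = np.ones(len(Xs)) - np.sum(v ** 2, axis=0)   # k(x,x) = 1 for RBF
    return Ks.T @ alpha, np.sqrt(np.maximum(var, 0.0))

def toy_survival(x):
    # Stand-in for the FE model's survival-rate response (peak at x = 0.3)
    return np.exp(-((x - 0.3) ** 2) / 0.05)

pool = np.linspace(0.0, 1.0, 200).reshape(-1, 1)      # candidate settings
rng = np.random.default_rng(0)
idx = list(rng.choice(len(pool), size=5, replace=False))  # initial design

for _ in range(25):                                   # active-learning loop
    mean, std = gp_posterior(pool[idx], toy_survival(pool[idx]).ravel(), pool)
    std[idx] = -1.0                                   # never re-query a point
    idx.append(int(np.argmax(std)))                   # query most uncertain

mean, _ = gp_posterior(pool[idx], toy_survival(pool[idx]).ravel(), pool)
best = float(pool[np.argmax(mean), 0])                # near-optimal setting
```

With only 30 evaluations the surrogate's mean locates the optimum of the toy response, mirroring the paper's finding that active learning needs far fewer FE simulations than exhaustive sampling.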
Many real-world problems involve optimizing numerous decision variables and are expensive to evaluate, known as large-scale expensive optimization problems (LSEOPs). While surrogate-assisted evolutionary algorithms have proven effective for expensive problems, training proper models for LSEOPs remains challenging due to insufficient training data. In this paper, we adopt the divide-and-conquer approach, decomposing LSEOPs into lower-dimensional sub-problems and constructing models for the sub-problems, and introduce a multi-view synthetic sampling technique for new sample selection. Specifically, we propose sorting all evaluated solutions in ascending order and dividing them into intervals, from which data are sampled to obtain informative training data for the models. The population for the LSEOP is updated by employing cooperative environmental selections on the population, formed by recombining all renewed populations for the sub-problems to balance exploration and exploitation. Finally, a solution is selected from the current population for the true evaluation based on its multi-view performance predicted across all sub-problems. Results on CEC'2013 benchmark problems show the effectiveness and efficiency of our proposed method compared to three prevalent large-scale expensive optimization algorithms. Additionally, results on 2000-dimensional CEC'2010 benchmark problems and a 1200-dimensional real-world problem demonstrate encouraging scalability and robustness of the proposed method for addressing higher-dimensional problems.
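The interval-based sampling step described above can be sketched directly: evaluated solutions are sorted by fitness, split into intervals, and each surrogate's training set is drawn from every interval so it sees solutions of all quality levels. This is one illustrative reading of the sampling idea, with hypothetical parameter names, not the paper's code:

```python
import numpy as np

def interval_sample(X, f, n_intervals=5, per_interval=4, rng=None):
    """Sample training data across fitness intervals (illustrative sketch).

    Solutions are sorted by fitness (ascending) and split into
    `n_intervals` equal chunks; `per_interval` solutions are drawn from
    each chunk, so good, average, and poor regions are all represented.
    """
    rng = rng or np.random.default_rng(0)
    order = np.argsort(f)                          # ascending fitness
    chunks = np.array_split(order, n_intervals)
    picked = np.concatenate([
        rng.choice(c, size=min(per_interval, len(c)), replace=False)
        for c in chunks
    ])
    return X[picked], f[picked]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))   # 100 evaluated solutions, 50 variables
f = np.sum(X ** 2, axis=1)       # toy fitness (sphere function)
Xs, fs = interval_sample(X, f)   # 20 stratified training samples
```

Drawing from every interval keeps the surrogate informed about both promising and poor regions, which plain truncation to the best solutions would not.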
Traditional large-scale evolutionary algorithms are limited in their ability to solve certain real-world applications with high-dimensional, black-box, and computationally expensive objectives due to their need for numerous objective evaluations. Surrogate-assisted evolutionary algorithms (SAEAs) have proven effective for expensive black-box optimization by relying on inexpensive surrogate models. However, large-scale optimization remains challenging for SAEAs due to the exponentially growing search space and the presence of multiple local optima, resulting in difficulty in training a proper model due to the lack of samples. To address these challenges, we propose constructing an initial surrogate model on randomly selected dimensions and calculating a Gaussian distribution for each sampled dimension. The surrogate then provides predictions when perturbing each sampled dimension by sampling from the distribution, enabling the identification of the most important variables for constructing an active sub-problem to reduce the search space. A secondary surrogate model, built for the active sub-problem, guides the offspring generation and environmental selection of a modified particle swarm optimization algorithm to effectively explore the sub-space while escaping local optima in large-scale problems. Experimental results on CEC'2013 and CEC'2010 benchmark problems show that the proposed method outperforms state-of-the-art algorithms in addressing large-scale expensive optimization problems. The efficiency of the proposed method is further verified on CEC'2010 benchmark problems extended to 2000 dimensions.
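The variable-selection idea above, perturb each dimension with draws from its fitted Gaussian and rank dimensions by how much the surrogate's predictions change, can be sketched as a sensitivity score. The toy surrogate and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def rank_variables(surrogate, X, dims, n_perturb=20, rng=None):
    """Rank dimensions by surrogate sensitivity (illustrative sketch).

    For each dimension, a Gaussian is fitted to that dimension's observed
    values; the dimension is resampled from it while others are held fixed,
    and the mean absolute change in surrogate predictions is the score.
    """
    rng = rng or np.random.default_rng(0)
    base = surrogate(X)
    scores = {}
    for d in dims:
        mu, sigma = X[:, d].mean(), X[:, d].std() + 1e-12
        deltas = []
        for _ in range(n_perturb):
            Xp = X.copy()
            Xp[:, d] = rng.normal(mu, sigma, size=len(X))  # Gaussian perturbation
            deltas.append(np.mean(np.abs(surrogate(Xp) - base)))
        scores[d] = float(np.mean(deltas))
    return sorted(scores, key=scores.get, reverse=True)    # most important first

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 10))
surrogate = lambda Z: 10.0 * Z[:, 2] ** 2 + Z[:, 7]  # toy model: dim 2 dominates
active = rank_variables(surrogate, X, list(range(10)))[:3]  # active sub-problem
```

The top-ranked dimensions form the active sub-problem, shrinking the search space the secondary surrogate and the PSO variant then work in.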
Blackbox optimization problems are commonly seen in the real world, ranging from experimental design to hyperparameter tuning of machine learning models. In numerous scenarios, addressing a collection of similar data-driven blackbox optimization tasks distributed on multiple clients not only raises privacy concerns, but also suffers from non-independent and identically distributed (non-IID) data, seriously deteriorating the optimization performance. To address the above challenges, this paper focuses on handling non-IID data in federated data-driven many-task optimization. To construct a high-quality global surrogate by robustly aggregating the local models, the server first fits a Gaussian distribution for each model parameter upon receiving local parameters, from which an ensemble model can be sampled. To reduce the communication cost and provide a generalized global model, a student surrogate model is derived by means of knowledge distillation from the ensemble. In addition, each client is allowed to retain both local and global models, so that the mean and variance of the predictions can be used to guide the selection of new samples. Experimental results demonstrate the reliability and efficacy of our proposed method on both benchmark problems and a real machine learning problem in the presence of non-IID data.
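The server-side aggregation step described above, fitting a Gaussian to each model parameter across clients and sampling an ensemble of global models from it, can be sketched with toy linear-model weights. This illustrates only the Gaussian-fit-and-sample idea; the paper additionally distills the ensemble into a student surrogate, which is omitted here, and all names are hypothetical:

```python
import numpy as np

def aggregate_clients(client_params, n_samples=10, rng=None):
    """Sample an ensemble of global models from per-parameter Gaussians.

    Fits a Gaussian (mean, std) to every parameter position across the
    clients' uploaded parameter vectors, then draws `n_samples` global
    models from those Gaussians, robust to non-IID local models.
    """
    rng = rng or np.random.default_rng(0)
    P = np.stack(client_params)                   # (n_clients, n_params)
    mu, sigma = P.mean(axis=0), P.std(axis=0)
    return rng.normal(mu, sigma, size=(n_samples, P.shape[1]))

# Three clients with non-IID local surrogates (toy linear-model weights)
clients = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.4])]
ensemble = aggregate_clients(clients)
x = np.array([1.0, 1.0])
preds = ensemble @ x                              # ensemble predictions at x
mean_pred, uncertainty = preds.mean(), preds.std()
```

The spread of the ensemble predictions doubles as an uncertainty estimate, which is what lets clients use prediction mean and variance to guide the selection of new samples.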