
Dr Xilu Wang
Research
Research interests
My main areas of interest are Data-driven Optimization, Artificial Intelligence, and Machine Learning. My research focuses
on the development and application of concepts and algorithms for fair, privacy-preserving, and trustworthy machine learning
and data-driven optimization. Topics in this area include:
• Multi-objective optimization
• Data-driven optimization and Bayesian optimization
• Evolutionary Machine Learning: neural architecture search, transfer learning
• Trustworthy AI: fair and federated optimization
Supervision
Postgraduate research supervision
Postgraduate research projects I have supervised
Abel Alexander University of Surrey
“A comparative study of Bayesian algorithms on expensive optimization” September 2021-September 2022
• Master's student; graduated with Distinction.
• He currently works at Bank of America.
Adrian Kruse Bielefeld University
“Bayesian Multi-objective Evolutionary Optimization” January 2023-September 2023
• Bachelor student.
Scheu Louis Lamar Bielefeld University
“Federated learning in object detection” September 2023-ongoing
• Master student.
Publications
Many real-world problems involve optimizing numerous decision variables and are expensive to evaluate; these are known as large-scale expensive optimization problems (LSEOPs). While surrogate-assisted evolutionary algorithms have proven effective for expensive problems, training proper models for LSEOPs remains challenging due to insufficient training data. In this paper, we adopt a divide-and-conquer approach, decomposing LSEOPs into lower-dimensional sub-problems and constructing models for the sub-problems, and introduce a multi-view synthetic sampling technique for new sample selection. Specifically, we propose sorting all evaluated solutions in ascending order and dividing them into intervals, from which data are sampled to obtain informative training data for the models. The population for the LSEOP is updated by employing cooperative environmental selection on the population formed by recombining all renewed sub-problem populations, balancing exploration and exploitation. Finally, a solution is selected from the current population for true evaluation based on its multi-view performance predicted across all sub-problems. Results on the CEC'2013 benchmark problems show the effectiveness and efficiency of the proposed method compared with three prevalent large-scale expensive optimization algorithms. Additionally, results on 2000-dimensional CEC'2010 benchmark problems and a 1200-dimensional real-world problem demonstrate encouraging scalability and robustness of the proposed method on higher-dimensional problems.
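The interval-based sampling step above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; all function and variable names are hypothetical.

```python
import random

def interval_sample(solutions, fitnesses, n_intervals, per_interval=1, seed=0):
    """Sketch of interval-based training-data selection: sort all evaluated
    solutions by fitness in ascending order, split them into intervals, and
    sample from each interval so the surrogate's training set covers the
    whole fitness range rather than clustering near the best solutions."""
    rng = random.Random(seed)
    order = sorted(range(len(solutions)), key=lambda i: fitnesses[i])
    size = len(order) // n_intervals
    training = []
    for k in range(n_intervals):
        start = k * size
        end = len(order) if k == n_intervals - 1 else start + size
        picks = rng.sample(order[start:end], min(per_interval, end - start))
        training.extend((solutions[i], fitnesses[i]) for i in picks)
    return training
```

With 10 evaluated solutions and 5 intervals, one pick per interval yields a 5-point training set spread across the fitness range.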
Traditional large-scale evolutionary algorithms are limited in their ability to solve certain real-world applications with high-dimensional, black-box, and computationally expensive objectives because they require numerous objective evaluations. Surrogate-assisted evolutionary algorithms (SAEAs) have proven effective for expensive black-box optimization by relying on inexpensive surrogate models. However, large-scale optimization remains challenging for SAEAs: the search space grows exponentially and contains multiple local optima, making it difficult to train a proper model from the limited samples available. To address these challenges, we propose constructing an initial surrogate model on randomly selected dimensions and calculating a Gaussian distribution for each sampled dimension. The surrogate then provides predictions when each sampled dimension is perturbed by sampling from its distribution, enabling the identification of the most important variables for constructing an active sub-problem that reduces the search space. A secondary surrogate model, built for the active sub-problem, guides offspring generation and environmental selection in a modified particle swarm optimization algorithm to effectively explore the sub-space while escaping local optima in large-scale problems. Experimental results on CEC'2013 and CEC'2010 benchmark problems show that the proposed method outperforms state-of-the-art algorithms on large-scale expensive optimization problems. Its efficiency is further verified on CEC'2010 benchmark problems extended to 2000 dimensions.
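The variable-importance idea above (perturb each sampled dimension via a fitted Gaussian and observe how the surrogate's predictions react) can be sketched as follows. This is an illustrative toy, not the paper's code; the surrogate is passed in as an opaque callable, and all names are hypothetical.

```python
import random
import statistics

def dimension_importance(surrogate, samples, dims, n_perturb=20, seed=0):
    """Sketch: for each sampled dimension, fit a Gaussian (mean, std) to the
    evaluated data along that dimension, perturb a base solution by drawing
    from the Gaussian, and score the dimension by the variance of the
    surrogate's predictions. Higher variance suggests the dimension has more
    influence and should enter the active sub-problem."""
    rng = random.Random(seed)
    base = samples[0][:]  # any evaluated solution works as the base point
    scores = {}
    for d in dims:
        values = [s[d] for s in samples]
        mu = statistics.mean(values)
        sigma = statistics.stdev(values) or 1e-9  # avoid a degenerate Gaussian
        preds = []
        for _ in range(n_perturb):
            x = base[:]
            x[d] = rng.gauss(mu, sigma)
            preds.append(surrogate(x))
        scores[d] = statistics.pvariance(preds)
    return scores
```

A dimension with a large surrogate gradient produces far higher prediction variance under perturbation than a nearly inert one, so ranking the scores picks out candidate variables for the sub-problem.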
Black-box optimization problems are common in the real world, ranging from experimental design to hyperparameter tuning of machine learning models. In many scenarios, addressing a collection of similar data-driven black-box optimization tasks distributed across multiple clients not only raises privacy concerns but also suffers from non-independent and identically distributed (non-IID) data, which seriously deteriorates optimization performance. To address these challenges, this paper focuses on handling non-IID data in federated data-driven many-task optimization. To construct a high-quality global surrogate by robustly aggregating the local models, the server fits a Gaussian distribution to each model parameter upon receiving the local parameters, from which an ensemble model can be sampled. To reduce communication cost and provide a generalized global model, a student surrogate model is derived from the ensemble by means of knowledge distillation. In addition, each client retains both the local and global models, so that the mean and variance of their predictions can be used to guide the selection of new samples. Experimental results demonstrate the reliability and efficacy of the proposed method on both benchmark problems and a real machine learning problem in the presence of non-IID data.
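The server-side aggregation step above can be sketched as follows, under the assumption that each client sends a flat parameter vector. This is a simplified illustration of fitting a per-parameter Gaussian and sampling an ensemble from it; the distillation step is omitted, and all names are hypothetical.

```python
import random
import statistics

def aggregate_clients(client_params, n_ensemble=5, seed=0):
    """Sketch of robust server-side aggregation: fit a Gaussian (mean, std)
    to each model parameter across the received client parameter vectors,
    then sample an ensemble of global parameter vectors from those
    Gaussians. A student model could later be distilled from the ensemble."""
    rng = random.Random(seed)
    n_params = len(client_params[0])
    gaussians = []
    for j in range(n_params):
        col = [p[j] for p in client_params]
        mu = statistics.mean(col)
        sigma = statistics.stdev(col) if len(col) > 1 else 0.0
        gaussians.append((mu, sigma))
    ensemble = [[rng.gauss(mu, sigma) for mu, sigma in gaussians]
                for _ in range(n_ensemble)]
    return gaussians, ensemble
```

Because each parameter is summarized by its own mean and spread across clients, an outlier client shifts the sampled ensemble less than it would shift a single averaged model, which is the robustness motivation described above.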