
Dr Xilu Wang
Research
Research interests
My main areas of interest are Data-driven Optimization, Artificial Intelligence, and Machine Learning. My research focuses
on the development and application of concepts and algorithms for fair, privacy-preserving, and trustworthy machine learning
and data-driven optimization. Topics in this area include:
• Multi-objective optimization
• Data-driven optimization and Bayesian optimization
• Evolutionary Machine Learning: neural architecture search, transfer learning
• Trustworthy AI: fair and federated optimization
Supervision
Postgraduate research supervision
Postgraduate research projects I have supervised
Abel Alexander University of Surrey
“A comparative study of Bayesian algorithms on expensive optimization” September 2021-September 2022
• Master's student; graduated with Distinction.
• He currently works at Bank of America.
Adrian Kruse Bielefeld University
“Bayesian Multi-objective Evolutionary Optimization” January 2023-September 2023
• Bachelor student.
Scheu Louis Lamar Bielefeld University
“Federated learning in object detection” September 2023-ongoing
• Master student.
Publications
Many real-world problems involve optimizing numerous decision variables and are expensive to evaluate; these are known as large-scale expensive optimization problems (LSEOPs). While surrogate-assisted evolutionary algorithms have proven effective for expensive problems, training proper models for LSEOPs remains challenging due to insufficient training data. In this paper, we adopt a divide-and-conquer approach, decomposing LSEOPs into lower-dimensional sub-problems and constructing models for the sub-problems, and introduce a multi-view synthetic sampling technique for new sample selection. Specifically, we propose sorting all evaluated solutions in ascending order and dividing them into intervals, from which data are sampled to obtain informative training data for the models. The population for the LSEOP is updated by employing cooperative environmental selection on the population formed by recombining all renewed sub-problem populations, balancing exploration and exploitation. Finally, a solution is selected from the current population for true evaluation based on its multi-view performance predicted across all sub-problems. Results on the CEC'2013 benchmark problems show the effectiveness and efficiency of the proposed method compared with three prevalent large-scale expensive optimization algorithms. Additionally, results on 2000-dimensional CEC'2010 benchmark problems and a 1200-dimensional real-world problem demonstrate encouraging scalability and robustness of the proposed method on higher-dimensional problems.
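The interval-based sampling step above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; all function and variable names are hypothetical.

```python
import random

def interval_sample(solutions, fitnesses, n_intervals, per_interval=1, seed=0):
    """Sketch of interval-based training-data selection: sort all evaluated
    solutions by fitness in ascending order, split them into intervals, and
    sample from each interval so the surrogate's training set covers the
    whole fitness range rather than clustering near the best solutions."""
    rng = random.Random(seed)
    order = sorted(range(len(solutions)), key=lambda i: fitnesses[i])
    size = len(order) // n_intervals
    training = []
    for k in range(n_intervals):
        start = k * size
        end = len(order) if k == n_intervals - 1 else start + size
        picks = rng.sample(order[start:end], min(per_interval, end - start))
        training.extend((solutions[i], fitnesses[i]) for i in picks)
    return training
```

With 10 evaluated solutions and 5 intervals, one pick per interval yields a 5-point training set spread across the fitness range.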
Traditional large-scale evolutionary algorithms are limited in their ability to solve certain real-world applications with high-dimensional, black-box, and computationally expensive objectives because they require numerous objective evaluations. Surrogate-assisted evolutionary algorithms (SAEAs) have proven effective for expensive black-box optimization by relying on inexpensive surrogate models. However, large-scale optimization remains challenging for SAEAs: the search space grows exponentially and contains multiple local optima, making it difficult to train a proper model from the limited samples available. To address these challenges, we propose constructing an initial surrogate model on randomly selected dimensions and calculating a Gaussian distribution for each sampled dimension. The surrogate then provides predictions when each sampled dimension is perturbed by sampling from its distribution, enabling the identification of the most important variables for constructing an active sub-problem that reduces the search space. A secondary surrogate model, built for the active sub-problem, guides offspring generation and environmental selection in a modified particle swarm optimization algorithm to effectively explore the sub-space while escaping local optima in large-scale problems. Experimental results on CEC'2013 and CEC'2010 benchmark problems show that the proposed method outperforms state-of-the-art algorithms on large-scale expensive optimization problems. Its efficiency is further verified on CEC'2010 benchmark problems extended to 2000 dimensions.
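The variable-importance idea above (perturb each sampled dimension via a fitted Gaussian and observe how the surrogate's predictions react) can be sketched as follows. This is an illustrative toy, not the paper's code; the surrogate is passed in as an opaque callable, and all names are hypothetical.

```python
import random
import statistics

def dimension_importance(surrogate, samples, dims, n_perturb=20, seed=0):
    """Sketch: for each sampled dimension, fit a Gaussian (mean, std) to the
    evaluated data along that dimension, perturb a base solution by drawing
    from the Gaussian, and score the dimension by the variance of the
    surrogate's predictions. Higher variance suggests the dimension has more
    influence and should enter the active sub-problem."""
    rng = random.Random(seed)
    base = samples[0][:]  # any evaluated solution works as the base point
    scores = {}
    for d in dims:
        values = [s[d] for s in samples]
        mu = statistics.mean(values)
        sigma = statistics.stdev(values) or 1e-9  # avoid a degenerate Gaussian
        preds = []
        for _ in range(n_perturb):
            x = base[:]
            x[d] = rng.gauss(mu, sigma)
            preds.append(surrogate(x))
        scores[d] = statistics.pvariance(preds)
    return scores
```

A dimension with a large surrogate gradient produces far higher prediction variance under perturbation than a nearly inert one, so ranking the scores picks out candidate variables for the sub-problem.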
Black-box optimization problems are common in the real world, ranging from experimental design to hyperparameter tuning of machine learning models. In many scenarios, addressing a collection of similar data-driven black-box optimization tasks distributed across multiple clients not only raises privacy concerns but also suffers from non-independent and identically distributed (non-IID) data, which seriously deteriorates optimization performance. To address these challenges, this paper focuses on handling non-IID data in federated data-driven many-task optimization. To construct a high-quality global surrogate by robustly aggregating the local models, the server fits a Gaussian distribution to each model parameter upon receiving the local parameters, from which an ensemble model can be sampled. To reduce communication cost and provide a generalized global model, a student surrogate model is derived from the ensemble by means of knowledge distillation. In addition, each client retains both the local and global models, so that the mean and variance of their predictions can be used to guide the selection of new samples. Experimental results demonstrate the reliability and efficacy of the proposed method on both benchmark problems and a real machine learning problem in the presence of non-IID data.
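The server-side aggregation step above can be sketched as follows, under the assumption that each client sends a flat parameter vector. This is a simplified illustration of fitting a per-parameter Gaussian and sampling an ensemble from it; the distillation step is omitted, and all names are hypothetical.

```python
import random
import statistics

def aggregate_clients(client_params, n_ensemble=5, seed=0):
    """Sketch of robust server-side aggregation: fit a Gaussian (mean, std)
    to each model parameter across the received client parameter vectors,
    then sample an ensemble of global parameter vectors from those
    Gaussians. A student model could later be distilled from the ensemble."""
    rng = random.Random(seed)
    n_params = len(client_params[0])
    gaussians = []
    for j in range(n_params):
        col = [p[j] for p in client_params]
        mu = statistics.mean(col)
        sigma = statistics.stdev(col) if len(col) > 1 else 0.0
        gaussians.append((mu, sigma))
    ensemble = [[rng.gauss(mu, sigma) for mu, sigma in gaussians]
                for _ in range(n_ensemble)]
    return gaussians, ensemble
```

Because each parameter is summarized by its own mean and spread across clients, an outlier client shifts the sampled ensemble less than it would shift a single averaged model, which is the robustness motivation described above.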