Xiaolong Han

Publications

Zehong Wang, Xiaolong Han, Yanru Chen, Xiaotong Ye, Keli Hu, Donghua Yu (2022) Prediction of willingness to pay for airline seat selection based on improved ensemble learning

Airlines have launched various ancillary services to meet passengers' requirements and to increase their revenue. Ancillary revenue from seat selection is an important source of revenue for airlines and is commonly promoted through advertising. However, advertisements are generally delivered to all customers, including a significant proportion who do not wish to pay for seat selection. Indiscriminate advertising may therefore reduce profit, since users tire of irrelevant advertisements, which in turn decreases user stickiness. To solve this problem, we propose a Bagging in Certain Ratio Light Gradient Boosting Machine (BCR-LightGBM) to predict passengers' willingness to pay for seat selection. The experimental results show that the proposed model outperforms all 12 comparison models in terms of the area under the receiver operating characteristic curve (ROC-AUC) and F1-score. Furthermore, we studied two typical samples to demonstrate the decision-making phase of a decision tree in BCR-LightGBM and applied the Shapley additive explanation (SHAP) model to analyse the most influential factors, further enhancing interpretability. We conclude that customer value, the ticket fare, and the length of the trip are three factors that airlines should consider in their seat selection service.
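
A minimal sketch of the bagging-at-a-fixed-ratio idea behind BCR-LightGBM, assuming each bag keeps all positive samples and subsamples negatives to a fixed ratio; the ratio, bag count, and LGBMClassifier settings below are illustrative, not the paper's actual configuration:

```python
import numpy as np
from lightgbm import LGBMClassifier

def bcr_fit(X, y, n_bags=10, neg_pos_ratio=1.0, seed=0):
    """Train one LightGBM model per bag (X, y are NumPy arrays); each bag
    keeps every positive sample and draws negatives at a fixed ratio."""
    rng = np.random.default_rng(seed)
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]
    n_neg = int(len(pos_idx) * neg_pos_ratio)
    models = []
    for _ in range(n_bags):
        bag = np.concatenate([pos_idx, rng.choice(neg_idx, n_neg, replace=False)])
        model = LGBMClassifier(n_estimators=200)
        model.fit(X[bag], y[bag])
        models.append(model)
    return models

def bcr_predict_proba(models, X):
    """Average the positive-class probability over all bags."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```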

Zehong Wang, Qi Li, Donghua Yu, Xiaolong Han (2022) Temporal graph transformer for dynamic network

Graph neural networks (GNNs) have received great attention in recent years due to their unique role in mining graph-structured data. Although most work focuses on learning low-dimensional node representations in static graphs, the dynamic nature of real-world networks makes temporal graphs more practical and significant. The continuous-time dynamic graph (CTDG) is a general approach to expressing temporal networks at fine granularity. Owing to the high time cost of training and inference, existing CTDG-based algorithms capture information only from 1-hop neighbors, ignoring messages from higher-order neighbors, which inevitably leads to model degradation. To overcome this challenge, we propose the Temporal Graph Transformer (TGT) to efficiently capture the evolving and semantic information from high-order neighborhoods in dynamic graphs. The proposed TGT consists of three modules, i.e., an update module, an aggregation module, and a propagation module. Unlike previous works that aggregate messages layer by layer, the model captures messages from 1-hop and 2-hop neighbors in a single layer. In particular, (1) the update module learns from messages derived from interactions; (2) the aggregation module aggregates 1-hop temporal neighbors to compute node embeddings; (3) the propagation module re-updates the hidden states of temporal neighbors to introduce 2-hop information. Experimental results on three real-world networks demonstrate the superiority of TGT in both efficacy and efficiency.
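
The three-module flow can be pictured roughly as in the PyTorch schematic below; the GRU-style updates, mean aggregation, and tensor shapes are assumptions for illustration, not the paper's exact equations:

```python
import torch
import torch.nn as nn

class TGTBlock(nn.Module):
    """Schematic of the update / aggregation / propagation flow.
    h_src: (1, dim) source-node state; msg: (1, dim) interaction message;
    h_neighbors: (k, dim) states of the source's 1-hop temporal neighbors."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)        # update module
        self.aggregate = nn.Linear(2 * dim, dim)  # aggregation module
        self.propagate = nn.GRUCell(dim, dim)     # propagation module

    def forward(self, h_src, msg, h_neighbors):
        # (1) update the source node's state from the interaction message
        h_src = self.update(msg, h_src)
        # (2) aggregate 1-hop temporal neighbors into the node embedding
        agg = h_neighbors.mean(dim=0, keepdim=True)
        z = torch.relu(self.aggregate(torch.cat([h_src, agg], dim=-1)))
        # (3) re-update the neighbors' hidden states so that 2-hop
        #     information enters later computations within a single layer
        h_neighbors = self.propagate(h_src.expand_as(h_neighbors), h_neighbors)
        return z, h_src, h_neighbors
```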

Xiaolong Han, Yu Xue, Zehong Wang, Yong Zhang, Anton Muravev, Moncef Gabbouj (2024) SaDENAS: A self-adaptive differential evolution algorithm for neural architecture search

Evolutionary neural architecture search (ENAS) and differentiable architecture search (DARTS) are both prominent approaches to neural architecture search, enabling the automated design of deep neural networks. To leverage the strengths of both methods, a framework called continuous ENAS alternates between using gradient descent to optimize the supernet and employing evolutionary algorithms to optimize the architectural encodings. However, continuous ENAS suffers from premature convergence accompanied by the small-model trap, a common issue in NAS. To address this, this paper proposes a self-adaptive differential evolution algorithm for neural architecture search (SaDENAS), which reduces the interference that small models cause to other individuals during optimization, thereby avoiding premature convergence. Specifically, SaDENAS treats architectures within the search space as architectural encodings, leveraging vector differences between encodings as the basis for evolutionary operators. To achieve a trade-off between exploration and exploitation, we integrate both local and global search strategies with a mutation scaling factor that adaptively balances the two. Empirical findings demonstrate that the proposed algorithm achieves better performance and superior convergence compared to other algorithms.
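
As a rough illustration of the encoding-level differential evolution described above, the sketch below blends a global (rand-based) and a local (best-based) mutation through a scaling factor F; the blending rule and the handling of F are assumptions, not the paper's self-adaptation scheme:

```python
import numpy as np

def sade_mutation(pop, fitness, F=0.5, rng=None):
    """One mutation pass over a population of real-valued architecture
    encodings; vector differences between encodings drive the search."""
    rng = rng or np.random.default_rng()
    best = pop[np.argmin(fitness)]
    mutants = np.empty_like(pop)
    for i in range(len(pop)):
        r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
        explore = pop[r1] + F * (pop[r2] - pop[r3])   # global, rand-based step
        exploit = pop[i] + F * (best - pop[i])        # local, best-based step
        # a larger F leans toward exploration, a smaller F toward exploitation
        mutants[i] = F * explore + (1.0 - F) * exploit
    return mutants
```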

Yu Xue, Xiaolong Han, Ferrante Neri, Jiafeng Qin, Danilo Pelusi (2024) A gradient-guided evolutionary neural architecture search

Neural architecture search (NAS) is a popular method for automatically designing deep neural network structures. However, designing a neural network with NAS is computationally expensive. This article proposes a gradient-guided evolutionary NAS (GENAS) to design convolutional neural networks (CNNs) for image classification. GENAS is a hybrid algorithm that combines evolutionary global and local search operators to evolve a population of subnets sampled from a supernet. Each candidate architecture is encoded as a table describing which operations are associated with the edges between nodes signifying feature maps. In addition, the evolutionary optimization uses novel crossover and mutation operators that manipulate the subnets through the proposed tabular encoding. Every n generations, the candidate architectures undergo a local search inspired by differentiable NAS. GENAS is designed to overcome the limitations of both evolutionary and gradient-descent NAS. This algorithmic structure enables performance assessment of candidate architectures without retraining, thus limiting the NAS computation time. Furthermore, subnet individuals are decoupled during evaluation to prevent strong coupling of operations in the supernet. The experimental results indicate that the searched structures achieve test errors of 2.45%, 16.86%, and 23.9% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, at a cost of only 0.26 GPU days on a single graphics card. GENAS can effectively expedite the training and evaluation processes and obtain high-performance network structures.
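
A hypothetical illustration of operating on such a tabular encoding, where each entry assigns an operation index to an edge between feature-map nodes; the operation-set size, edge count, and operator details below are assumptions:

```python
import numpy as np

N_OPS = 8     # assumed size of the candidate operation set
N_EDGES = 14  # assumed number of edges in one cell

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each edge inherits its operation from either parent."""
    mask = rng.random(N_EDGES) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(encoding, rng, rate=0.1):
    """Resample the operation index on a random fraction of the edges."""
    child = encoding.copy()
    flip = rng.random(N_EDGES) < rate
    child[flip] = rng.integers(0, N_OPS, size=flip.sum())
    return child

# usage: produce one offspring encoding from two random parents
rng = np.random.default_rng(0)
a = rng.integers(0, N_OPS, size=N_EDGES)
b = rng.integers(0, N_OPS, size=N_EDGES)
child = mutate(crossover(a, b, rng), rng)
```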

Yu Xue, Xiaolong Han, Zehong Wang (2024) Self-adaptive weight based on dual-attention for differentiable neural architecture search

Differentiable architecture search is a popular gradient-based method for neural architecture search and has achieved great success in automating the design of neural network architectures. However, it still has limitations such as performance collapse, which seriously degrades the performance of the searched architectures. To address this issue, this article proposes self-adaptive weight based on dual-attention for differentiable neural architecture search (SWD-NAS). SWD-NAS uses a dual-attention mechanism to measure architectural weights. Specifically, an upper-attention module adaptively selects channels based on their weights before they are fed into the search space, and a lower-attention (LA) module computes the architectural weights. In addition, we propose an architectural weight normalization to alleviate unfair competition among connection edges. Finally, we evaluate the searched architectures on CIFAR-10 and CIFAR-100, achieving test errors of 2.51% and 16.13%, respectively. Furthermore, we transfer the architecture searched on CIFAR-10 to ImageNet, achieving top-1 and top-5 errors of 24.5% and 7.6%, respectively. This demonstrates the superior performance of the proposed algorithm compared with many gradient-based algorithms.
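
One way to picture the edge-level normalization mentioned above is sketched below: operation weights are softmax-normalized within each edge and then rescaled by a per-edge attention score, so edges compete on a common footing. The exact attention form is an assumption, not the paper's formulation:

```python
import torch

def normalized_edge_weights(arch_logits, edge_attention):
    """arch_logits: (n_edges, n_ops) raw architectural parameters.
    edge_attention: (n_edges,) per-edge scores, e.g. from an attention module."""
    op_weights = torch.softmax(arch_logits, dim=-1)    # softmax within each edge
    edge_scale = torch.softmax(edge_attention, dim=0)  # normalization across edges
    return op_weights * edge_scale.unsqueeze(-1)
```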

Zehong Wang, Qi Li, Donghua Yu, Xiaolong Han, Xiao-Zhi Gao, Shigen Shen (2023) Heterogeneous graph contrastive multi-view learning

Inspired by the success of Contrastive Learning (CL) in computer vision and natural language processing, Graph Contrastive Learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design a contrastive objective that fully captures the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate the sampling bias in heterogeneous GCL is another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective to maximize the mutual information between any pair of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets. To enhance the reproducibility of our work, we make all the code publicly available at https://github.com/Zehong-Wang/HGCML.
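
A minimal sketch of an InfoNCE-style objective between two metapath-induced views, assuming for illustration that each node is its own positive across views; HGCML's actual positive-sampling strategy is richer than this diagonal pairing:

```python
import torch
import torch.nn.functional as F

def multiview_infonce(z1, z2, tau=0.5):
    """z1, z2: (n_nodes, dim) node embeddings from two metapath views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / tau            # pairwise cosine similarities
    labels = torch.arange(z1.size(0))  # node i in view 1 pairs with node i in view 2
    # symmetric InfoNCE over both directions of the view pair
    return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))
```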