Research

Research interests

Publications

Xingchi Liu, Qing Li, Jiaming Liang, Jinzheng Zhao, Peipei Wu, Chenyi Lyu, Shidrokh Goudarzi, Jemin George, Tien Pham, Wenwu Wang, Lyudmila Mihaylova, Simon Godsill (2022)Advanced Machine Learning Methods for Autonomous Classification of Ground Vehicles with Acoustic Data, In: T Pham, L Solomon (eds.), ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS IV12113121131pp. 121131P-121131P-10 Spie-Int Soc Optical Engineering

This paper presents a distributed multi-class Gaussian process (MCGP) algorithm for ground vehicle classification using acoustic data. In this algorithm, the harmonic structure analysis is used to extract features for GP classifier training. The predictions from local classifiers are then aggregated into a high-level prediction to achieve the decision-level fusion, following the idea of divide-and-conquer. Simulations based on the acoustic-seismic classification identification data set (ACIDS) confirm that the proposed algorithm provides competitive performance in terms of classification error and negative log-likelihood (NLL), as compared to an MCGP based on the data-level fusion where only one global MCGP is trained using data from all the sensors.

JINZHENG ZHAO, PEIPEI WU, SHIDROKH GOUDARZI, XUBO LIU, JIANYUAN SUN, Yong Xu, WENWU WANG (2022)Visually Assisted Self-supervised Audio Speaker Localization and Tracking

—Training a robust tracker of objects (such as vehicles and people) using audio and visual information often needs a large amount of labelled data, which is difficult to obtain as manual annotation is expensive and time-consuming. The natural synchronization of the audio and visual modalities enables the object tracker to be trained in a self-supervised manner. In this work, we propose to localize an audio source (i.e., speaker) using a teacher-student paradigm, where the visual network teaches the audio network by knowledge distillation to localize speakers. The introduction of multi-task learning, by training the audio network to perform source localization and semantic segmentation jointly, further improves the model performance. Experimental results show that the audio localization network can learn from visual information and achieve competitive tracking performance as compared to the baseline methods that are based on the audio-only measurements. The proposed method can provide more reliable measurements for tracking than the traditional sound source localization methods, and the generated audio features aid in visual tracking.

Jinzheng Zhao, PEIPEI WU, XUBO LIU, WENWU WANG, Yong-Xu Hu, Lyudmila Mihaylova, Simon Godsill (2022)AUDIO-VISUAL TRACKING OF MULTIPLE SPEAKERS VIA A PMBM FILTE

Intensity Particle Flow (IPF) SMC-PHD has been proposed recently for multi-target tracking. In this paper, we extend IPF-SMC-PHD filter to distributed setting, and develop a novel consensus method for fusing the estimates from individual sensors, based on Arithmetic Average (AA) fusion. Different from conventional AA method which may be degraded when unreliable estimates are presented, we develop a novel arithmetic consensus method to fuse estimates from each individual IPF-SMC-PHD filter with partial consensus. The proposed method contains a scheme for evaluating the reliability of the sensor nodes and preventing unreliable sensor information to be used in fusion and communication in sensor network, which help improve fusion accuracy and reduce sensor communication costs. Numerical simulations are performed to demonstrate the advantages of the proposed algorithm over the uncooperative IPF-SMC-PHD and distributed particle-PHD with AA fusion.