
Yaru Chen
Academic and research departments
About
My research project
Fine-grained audio-visual video understanding based on natural scenes
Yaru Chen has been a Ph.D. candidate at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, since 2022. Her research focuses on video understanding in natural scenes, particularly audio-visual video parsing. During her doctoral studies, she has published papers at conferences and in journals including ICASSP, ICMR, CVPR, and Information Fusion.
Supervisors
Publications
Highlights
Chen Y, Zhang P, Li F, et al. TeMTG: Text-Enhanced Multi-Hop Temporal Graph Modeling for Audio-Visual Video Parsing[C]//Proceedings of the 2025 International Conference on Multimedia Retrieval. 2025: 1978-1982.
Zhang P, Chen Y, Liu Y, et al. Multi-Category Fusion Contrastive Learning with Core Data Selection for robust RGB image-based dental caries classification[J]. Information Fusion, 2025: 103390.
Guo R, Ying X, Chen Y, et al. Audio-visual instance segmentation[C]//Proceedings of the Computer Vision and Pattern Recognition Conference. 2025: 13550-13560.
Zhang P, Chen Y, Liu Y, et al. Core Inter-Category Contrastive Learning for Enhancing Robustness of Caries Classification[C]//Proceedings of the 2025 International Conference on Multimedia Retrieval. 2025: 2108-2112.
Zhou Z, Hu F, Jiang H, et al. Find the stem along grape: a grape and stem segmentation method based on region images[J]. Signal, Image and Video Processing, 2025, 19(12): 973.
Chen Y, Guo R, Liu X, et al. CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing[C]//ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024: 8421-8425.
Wu P, Zhao J, Chen Y, et al. PLDISET: Probabilistic localization and detection of independent sound events with transformers[C]//Detection and Classification of Acoustic Scenes and Events 2023. 2023.