11am - 12 noon

Friday 5 September 2025

Multimodal Machine Learning for Fish Feeding Behaviour Analysis

PhD Viva Open Presentation - Meng Cui

Online event - All Welcome!

Free

ONLINE

Speakers

Meng Cui

Abstract:

Fish feeding behaviour is a vital indicator of fish health and appetite in aquaculture systems, directly influencing growth rates and production efficiency. Fish Feeding Intensity Recognition (FFIR) has emerged as a key technology for quantifying feeding intensity during the feeding process, enabling precise assessment of changes in fish appetite. Despite recent advances, FFIR still faces significant challenges in robustness, adaptability, and computational efficiency. This thesis investigates FFIR from three perspectives: novel multimodal approaches, efficient model architectures, and adaptive learning capabilities.

Traditional FFIR methods rely primarily on computer vision techniques, which are limited by water refraction and uneven illumination. We therefore begin by studying the feasibility of acoustic methods for identifying fish feeding behaviour, introducing AFFIR3K, a novel audio-based dataset of 3,000 labelled audio clips covering different fish feeding intensities. Our proposed deep learning framework, which classifies mel spectrograms with CNN-based models, achieves an mAP of 0.74 on the test set, demonstrating the potential of audio-based approaches in aquaculture applications.
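As a rough illustration of this kind of audio pipeline, the sketch below classifies a raw waveform by converting it to a log-mel spectrogram and passing it through a small CNN. It is a minimal sketch in PyTorch/torchaudio, not the thesis code: the sample rate, clip length, mel parameters, network size, and the four intensity classes are all assumptions.

```python
# Minimal sketch of an audio FFIR pipeline: log-mel spectrogram + small CNN.
# Sample rate, clip length, and the four intensity classes are assumptions.
import torch
import torch.nn as nn
import torchaudio


class FFIRAudioCNN(nn.Module):
    def __init__(self, n_classes: int = 4):  # e.g. none/weak/medium/strong (assumed)
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=1024, hop_length=512, n_mels=64
        )
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool over frequency/time to a fixed size
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> log-mel image: (batch, 1, n_mels, frames)
        x = self.mel(waveform).clamp(min=1e-10).log().unsqueeze(1)
        x = self.features(x).flatten(1)
        return self.classifier(x)


logits = FFIRAudioCNN()(torch.randn(2, 16000 * 5))  # two 5-second clips
print(logits.shape)  # torch.Size([2, 4])
```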

To overcome the limitations of single-modality approaches, we present AV-FFIR, a comprehensive dataset of 27,000 labelled audio and video clips, and develop multimodal methods for FFIR. Such multimodal approaches, however, often incur higher computational costs because each modality requires its own encoder. We therefore propose U-FFIR, a unified mixed-modality model that processes audio, visual, or audio-visual inputs with a single architecture. Through modality dropout during training and knowledge distillation, U-FFIR achieves performance comparable to state-of-the-art modality-specific models while reducing parameters by 54% and computational overhead by 77%.
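The sketch below illustrates the modality-dropout idea only, under stated assumptions: a single fusion trunk is trained to accept audio-only, visual-only, or audio-visual embeddings by randomly hiding one stream during training. The feature sizes, fusion design, and the `UnifiedFFIR` name are hypothetical, and the knowledge-distillation stage from modality-specific teachers is omitted.

```python
# Hedged sketch of modality dropout for a unified mixed-modality model.
# Embedding sizes and the fusion design are assumptions, not thesis details.
import random
import torch
import torch.nn as nn


class UnifiedFFIR(nn.Module):
    def __init__(self, dim: int = 256, n_classes: int = 4):
        super().__init__()
        self.audio_proj = nn.Linear(128, dim)  # assumed audio feature size
        self.video_proj = nn.Linear(512, dim)  # assumed video feature size
        self.fusion = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, audio=None, video=None, p_drop: float = 0.3):
        assert audio is not None or video is not None
        a = self.audio_proj(audio) if audio is not None else None
        v = self.video_proj(video) if video is not None else None
        if self.training and a is not None and v is not None:
            # Modality dropout: randomly hide one stream so the shared trunk
            # also learns the audio-only and visual-only cases.
            r = random.random()
            if r < p_drop:
                a = None
            elif r < 2 * p_drop:
                v = None
        zeros = torch.zeros_like(a if a is not None else v)
        a = a if a is not None else zeros  # missing stream becomes a zero vector
        v = v if v is not None else zeros
        return self.head(self.fusion(torch.cat([a, v], dim=-1)))


model = UnifiedFFIR().train()
out = model(audio=torch.randn(8, 128), video=torch.randn(8, 512))
```

Because the same trunk sees all three input configurations during training, one set of weights can replace three modality-specific models, which is one plausible route to the kind of parameter savings described above.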

Finally, we address the challenge of adapting FFIR systems to new fish species and environments. We introduce HAIL-FFIR, a novel hierarchical audio-visual class-incremental learning framework, along with the AV-CIL-FFIR dataset containing 81,932 labelled clips across six fish species. Our framework combines hierarchical representation learning, prototype-enhanced parameter updates, and dynamic modality balancing to mitigate catastrophic forgetting. Experimental results demonstrate that HAIL-FFIR achieves 75.92% average accuracy with only 9.36% forgetting, significantly outperforming existing incremental learning approaches. This represents the first application of class-incremental learning to FFIR and demonstrates the potential of continual audio-visual learning in enhancing the adaptability of aquaculture monitoring systems.
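As a toy illustration of the prototype idea in class-incremental learning, the sketch below keeps a mean-embedding prototype per class and classifies by nearest prototype, so classes for newly introduced species can be added without revisiting old data. This shows only the generic mechanism; HAIL-FFIR's hierarchical representation learning and dynamic modality balancing are not reproduced here, and all names and sizes are assumptions.

```python
# Toy sketch of prototype-based class-incremental classification:
# one mean-embedding prototype per class, nearest-prototype prediction.
# This is a generic illustration, not the HAIL-FFIR framework.
import torch


class PrototypeClassifier:
    def __init__(self):
        self.prototypes = {}  # class label -> prototype embedding

    def add_class(self, label: int, embeddings: torch.Tensor) -> None:
        # Classes from a later task only add prototypes; existing prototypes
        # are untouched, which limits catastrophic forgetting.
        self.prototypes[label] = embeddings.mean(dim=0)

    def predict(self, embedding: torch.Tensor) -> int:
        dists = {c: torch.norm(embedding - p).item()
                 for c, p in self.prototypes.items()}
        return min(dists, key=dists.get)


clf = PrototypeClassifier()
clf.add_class(0, torch.randn(20, 64))        # task 1: first species
clf.add_class(1, torch.randn(20, 64) + 2.0)
clf.add_class(2, torch.randn(20, 64) - 2.0)  # task 2: a new species, added later
print(clf.predict(torch.randn(64) + 2.0))    # most likely class 1
```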