Wenjie Ai
Pronouns: he/him
About
My research project
Multi-Rater Learning with Noisy Labels
TBD
Publications
As datasets grow, expert-based annotation becomes impractical, making crowdsourcing a scalable alternative. In crowdsourcing, samples are typically annotated by multiple workers and aggregated via majority voting, which ignores annotator-specific biases and introduces noisy labels that impair downstream models. Traditional multi-rater methods attempt to model annotator biases but often overfit with many classes or few, noisy annotators. Learning with Noisy Labels (LNL) methods offer more robust strategies for handling noisy labels, but their assumption of a single noisy label per sample makes extending them to multi-annotator settings non-trivial. To bridge this gap, we propose the Reciprocal Teacher-student Learning from Multi-rater Noisy Annotation (RETINA), which trains annotator-specific models and employs a dynamic teacher-student process to separate clean from noisy samples. Progress in multi-rater learning has also been limited by benchmarks with few classes, fixed noise rates, and no control over annotators. To address this, we introduce the Synthetic MRL (SynMRL) benchmark that contains many classes and controllable noise and annotator settings for systematic evaluation. Experiments on synthetic and real-world data show that RETINA outperforms existing multi-rater methods, particularly in high-noise, low-annotator, many-class settings.
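The majority-voting step described above can be illustrated with a minimal sketch. This is not the RETINA method; the toy annotations, the assumed ground-truth labels, and the tie-breaking rule (smallest label wins) are all hypothetical and chosen only to show how a shared annotator error becomes a noisy aggregated label.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label (an assumption)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


# Hypothetical toy data: 3 samples, each labelled by 3 annotators.
annotations = [
    [0, 0, 1],  # one annotator disagrees, majority is still correct
    [2, 1, 1],  # one annotator disagrees, majority is still correct
    [1, 2, 2],  # two annotators share the same mistake
]
true_labels = [0, 1, 1]  # assumed ground truth, for illustration only

aggregated = [majority_vote(row) for row in annotations]

# Samples whose aggregated label differs from the assumed ground truth.
noisy_indices = [i for i, (a, t) in enumerate(zip(aggregated, true_labels)) if a != t]
```

In the third sample two annotators make the same error, so the vote outputs the wrong class. Majority voting treats every annotator as equally reliable, which is why annotator-specific bias ends up as label noise in the aggregated training set.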
With the development of Human-AI Collaboration in Classification (HAI-CC), integrating user and AI predictions becomes challenging due to the complex decision-making process. This process has three options: 1) AI autonomously classifies, 2) learning to complement, where AI collaborates with users, and 3) learning to defer, where AI defers to users. Despite their interconnected nature, these options have been studied in isolation rather than as components of a unified system. In this paper, we address this weakness with a novel HAI-CC methodology called Learning to Complement and to Defer to Multiple Users (LECODU). LECODU not only combines the learning-to-complement and learning-to-defer strategies, but also estimates the optimal number of users to engage in the decision process. The training of LECODU maximises classification accuracy and minimises the collaboration costs associated with user involvement. Comprehensive evaluations across real-world and synthesized datasets demonstrate LECODU’s superior performance compared to state-of-the-art HAI-CC methods. Remarkably, even when relying on unreliable users with high rates of label noise, LECODU exhibits significant improvement over both human decision-makers alone and AI alone. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) through grant EP/Y018036/1. Code is available at https://github.com/zhengzhang37/LECODU.git.
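The trade-off between accuracy and collaboration cost can be made concrete with a small sketch. This is not the LECODU training objective or its code: the function name `choose_option`, the option names, the accuracy estimates, and the linear per-user cost are all assumptions introduced here for illustration.

```python
def choose_option(expected_accuracy, num_users, cost_per_user):
    """Pick the option with the highest accuracy minus a linear user cost.

    expected_accuracy: option name -> estimated accuracy (hypothetical values)
    num_users: option name -> number of users consulted by that option
    cost_per_user: penalty per consulted user (hypothetical scalar)
    """
    scores = {
        name: expected_accuracy[name] - cost_per_user * num_users[name]
        for name in expected_accuracy
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical estimates for the three kinds of decision described above.
expected_accuracy = {
    "ai_alone": 0.80,           # AI classifies autonomously
    "complement_1_user": 0.88,  # AI combines its prediction with one user
    "defer_3_users": 0.90,      # AI defers to three users
}
num_users = {"ai_alone": 0, "complement_1_user": 1, "defer_3_users": 3}

cheap_choice, _ = choose_option(expected_accuracy, num_users, cost_per_user=0.05)
costly_choice, _ = choose_option(expected_accuracy, num_users, cost_per_user=0.10)
```

With these made-up numbers, a low per-user cost favours collaborating with one user, while a higher cost makes the autonomous AI prediction the better choice. The point is only that the number of users engaged is part of the decision, not a fixed setting.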