11am - 12 noon

Wednesday 25 June 2025

Discrete Representation Learning for Retrieval and Generation

PhD Viva Open Presentation - Kam Woh Ng

Online event (Teams) - All Welcome!

Free

Online

Speakers

Kam Woh Ng

Abstract:

Discrete representation learning plays a pivotal role in image retrieval and generation, offering advantages in efficiency, scalability, and interpretability. By mapping high-dimensional data to compact hash codes, discrete representations speed up computation, reduce storage requirements, and enable large-scale retrieval through efficient similarity search using measures such as Hamming distance. In generative tasks, discretization simplifies control over outputs by reducing prompt complexity, allowing users to manipulate predefined attributes -- such as color schemes or styles -- with greater consistency. This thesis explores the impact of discrete representations on image retrieval (supervised and unsupervised hashing) and on image and 3D generation.
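
To make the retrieval idea above concrete, here is a minimal sketch of Hamming-distance search over hash codes. It is not the method proposed in the thesis: the sign-threshold binarization, the toy 4-dimensional feature vectors, and the helper names (`binarize`, `pack`, `hamming`, `search`) are all assumptions chosen for illustration.

```python
# Illustrative sketch only: sign-threshold hashing plus Hamming ranking.
# All names and the toy data are hypothetical, not taken from the thesis.

def binarize(features):
    """Threshold a real-valued vector at zero to get a tuple of 0/1 bits."""
    return tuple(1 if x > 0 else 0 for x in features)

def pack(bits):
    """Pack bits into one integer so XOR plus a bit count gives the distance."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code

def hamming(a, b):
    """Number of bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")

def search(query_code, database_codes, k=2):
    """Return indices of the k database codes closest to the query."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:k]

# Toy continuous features standing in for the output of an image encoder.
database = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.6, -0.4, -0.3, -0.9],
]
db_codes = [pack(binarize(v)) for v in database]
query_code = pack(binarize([0.7, -0.1, 0.2, -0.5]))

top = search(query_code, db_codes, k=2)
print(top)  # indices of the two nearest items in Hamming space
```

Because each code fits in a machine word, comparing two items costs one XOR and one bit count, which is where the storage and speed advantages mentioned above come from.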

The contributions of this thesis are fourfold. First, we propose a novel supervised hashing approach that unifies deep hashing requirements into a single classification objective, improving retrieval efficiency while maintaining bit balance and maximizing retrieval accuracy. Second, we address the similarity collapse issue in unsupervised hashing -- where continuous features lose meaningful distances in hash space -- by introducing a calibration technique that better preserves relative similarity, significantly outperforming state-of-the-art methods. Third, we explore the compositional nature of objects in image generation, introducing a part-based framework that enhances fine-grained control in text-to-image diffusion models. Instead of relying solely on textual prompts, our approach allows users to generate objects by explicitly selecting specific parts, ensuring greater precision and customization in the output. Lastly, we tackle 3D generation by leveraging a multi-view diffusion model trained solely on 2D images, allowing the synthesis of high-quality 3D objects with species-specific details beyond existing exemplars. This approach bridges the gap between 2D image-based modeling and 3D object synthesis without requiring large-scale 3D datasets.
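
The "bit balance" property mentioned in the first contribution is usually understood to mean that each bit is set for roughly half of the items, so that every bit carries information. The snippet below is a small diagnostic sketch under that assumption; the function name `bit_balance` and the toy codes are hypothetical, and this is a way to inspect codes, not the training objective used in the thesis.

```python
# Illustrative diagnostic only: per-bit activation rate of a set of hash codes.
# A value near 0.5 suggests a balanced bit; 0.0 or 1.0 means the bit is constant.

def bit_balance(codes):
    """Return, for each bit position, the fraction of codes with that bit set."""
    n = len(codes)
    n_bits = len(codes[0])
    return [sum(c[j] for c in codes) / n for j in range(n_bits)]

# Toy 4-bit codes (hypothetical data).
codes = [
    (1, 0, 1, 0),
    (0, 1, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 0),
]

balance = bit_balance(codes)
print(balance)
```

In this toy set the first two bits are balanced, the third is skewed towards 1, and the last bit is never set, so it cannot help distinguish any pair of items.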

Featured Academics

Prof Tao Xiang

Professor of Computer Vision and Machine Learning