Dr Anjan Dutta


About

Areas of specialism

Computer Vision; Machine Learning; Representation Learning; Learning from Fewer Labels; Structured Representation Learning

My qualifications

2021
Postgraduate Certificate in Academic Practice (PCAP / FHEA)
University of Exeter
2014
PhD in Computer Science
Autonomous University of Barcelona
2010
MSc in Computer Vision and Artificial Intelligence
Autonomous University of Barcelona
2009
Master of Computer Applications (MCA)
Maulana Abul Kalam Azad University of Technology
2006
BSc in Mathematics (Honours)
University of Calcutta

Previous roles

2019 - 2022
Lecturer in Computer Vision & Machine Learning
University of Exeter
2017 - 2019
Marie Curie Fellow
Computer Vision Centre
2016 - 2017
Postdoctoral Researcher
Computer Vision Centre
2014 - 2015
Postdoctoral Researcher
Télécom ParisTech

Publications

Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta (2022) Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information. As instances from different modalities can often provide complementary information describing the underlying concept, we propose a cross-attention framework for Vision Transformers (XModalViT) that fuses modality-specific information instead of discarding it. Our framework first maps paired datapoints from the individual photo and sketch modalities to fused representations that unify information from both modalities. We then decouple the input space of the aforementioned modality fusion network into independent encoders of the individual modalities via contrastive and relational cross-modal knowledge distillation. Such encoders can then be applied to downstream tasks like cross-modal retrieval. We demonstrate the expressive capacity of the learned representations by performing a wide range of experiments and achieving state-of-the-art results on three fine-grained sketch-based image retrieval benchmarks: Shoe-V2, Chair-V2 and Sketchy. Implementation is available at https://github.com/abhrac/xmodal-vit.
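A minimal PyTorch sketch of the core idea under simplifying assumptions: a cross-attention "teacher" fuses paired photo and sketch features, and two independent unimodal "student" encoders are distilled to reproduce the fused embedding with a contrastive-style objective. The module names, sizes and simplified losses below are illustrative, not the authors' implementation (see the linked repository).

```python
# Illustrative sketch of the XModalViT idea: fuse paired modalities with
# cross-attention, then distil the fused representation into independent
# single-modality encoders. All dimensions and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionTeacher(nn.Module):
    """Fuses paired photo/sketch token sequences with cross-attention."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, photo_tokens, sketch_tokens):
        # Photo tokens attend to sketch tokens (could be symmetrised).
        fused, _ = self.cross_attn(photo_tokens, sketch_tokens, sketch_tokens)
        return self.proj(fused.mean(dim=1))            # pooled fused embedding

class UnimodalStudent(nn.Module):
    """Independent single-modality encoder used after distillation."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, tokens):
        return self.encoder(tokens.mean(dim=1))

def distillation_loss(student_emb, fused_emb, temperature=0.07):
    # Contrastive-style distillation: each student embedding should match the
    # fused embedding of its own pair and repel the others in the batch.
    logits = F.normalize(student_emb, dim=-1) @ F.normalize(fused_emb, dim=-1).T
    targets = torch.arange(student_emb.size(0))
    return F.cross_entropy(logits / temperature, targets)

# Toy usage with random token sequences standing in for ViT features.
photo, sketch = torch.randn(8, 16, 256), torch.randn(8, 16, 256)
teacher, photo_student, sketch_student = FusionTeacher(), UnimodalStudent(), UnimodalStudent()
fused = teacher(photo, sketch).detach()                # teacher targets
loss = distillation_loss(photo_student(photo), fused) + \
       distillation_loss(sketch_student(sketch), fused)
loss.backward()
```

After distillation, each student can embed its own modality alone, so retrieval at test time does not require the paired input that the fusion teacher needs.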

Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta (2022) Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Relational Proxies, a novel approach that leverages the relational information between the global and local views of an object for encoding its semantic label. Starting with a rigorous formalization of the notion of distinguishability between fine-grained categories, we prove the necessary and sufficient conditions that a model must satisfy in order to learn the underlying decision boundaries in the fine-grained setting. We design Relational Proxies based on our theoretical findings and evaluate it on seven challenging fine-grained benchmark datasets, achieving state-of-the-art results on all of them and surpassing the performance of all existing works by a margin exceeding 4% in some cases. We also experimentally validate our theory on fine-grained distinguishability and obtain consistent results across multiple benchmarks. Implementation is available at https://github.com/abhrac/relational-proxies.
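A minimal sketch of the underlying idea, under stated assumptions: classify an image from the relationship between its global view and a set of local views, scoring that relational embedding against learnable per-class proxies. The attention-based relation module, cosine scoring and all dimensions below are illustrative choices, not the authors' implementation (see the linked repository).

```python
# Illustrative sketch of the Relational Proxies idea: encode how local parts
# relate to the global structure, then compare against learnable class proxies.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalProxies(nn.Module):
    def __init__(self, dim=256, num_classes=200, heads=4):
        super().__init__()
        # Relation module: local features attend to the global feature, so the
        # output encodes how the parts relate to the overall object structure.
        self.relate = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proxies = nn.Parameter(torch.randn(num_classes, dim))  # one proxy per class

    def forward(self, global_feat, local_feats):
        # global_feat: (B, D) from a global crop; local_feats: (B, L, D) from local crops.
        rel, _ = self.relate(local_feats, global_feat.unsqueeze(1), global_feat.unsqueeze(1))
        rel = F.normalize(rel.mean(dim=1), dim=-1)                  # pooled relational embedding
        return rel @ F.normalize(self.proxies, dim=-1).T            # cosine logits against proxies

# Toy usage with random backbone features.
model = RelationalProxies()
logits = model(torch.randn(4, 256), torch.randn(4, 8, 256))
loss = F.cross_entropy(logits / 0.1, torch.randint(0, 200, (4,)))
loss.backward()
```

The key design point is that the classifier never sees raw part features in isolation; only the emergent global-local relationship is matched against the class proxies.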

Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata (2022) Abstracting Sketches through Simple Primitives

Humans show high-level abstraction capabilities in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task, where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget. To solve this task, our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval given a communication budget, while at the same time being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/ExplainableML/sketch-primitives.
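A minimal sketch of the stroke-to-primitive matching idea, under stated assumptions: encode a stroke, score it against a fixed primitive library, and predict an affine transform that aligns the chosen primitive to the stroke. The tiny encoder, the Chamfer-style stand-in for the paper's distance-transform loss, and all sizes below are illustrative, not the authors' implementation (see the linked repository).

```python
# Illustrative sketch of the Primitive-Matching Network (PMN) idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrimitiveMatcher(nn.Module):
    def __init__(self, num_primitives=8, points_per_stroke=32, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(points_per_stroke * 2, hidden), nn.ReLU())
        self.primitive_logits = nn.Linear(hidden, num_primitives)   # which primitive to use
        self.affine = nn.Linear(hidden, 6)                          # 2x3 affine parameters

    def forward(self, stroke):
        # stroke: (B, P, 2) polyline points in normalised coordinates.
        h = self.encoder(stroke)
        return self.primitive_logits(h), self.affine(h).view(-1, 2, 3)

def align(points, theta):
    # Apply the predicted affine transform to primitive points (B, P, 2).
    ones = torch.ones(*points.shape[:-1], 1)
    return torch.cat([points, ones], dim=-1) @ theta.transpose(1, 2)

def chamfer(a, b):
    # Symmetric nearest-point distance: a simple proxy for the paper's
    # distance-transform reconstruction loss.
    d = torch.cdist(a, b)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# Toy usage: reconstruct a random stroke with a (soft-selected) primitive.
primitives = torch.rand(8, 32, 2)                       # fixed primitive library
stroke = torch.rand(4, 32, 2)
model = PrimitiveMatcher()
logits, theta = model(stroke)
weights = F.softmax(logits, dim=-1)                     # soft selection keeps it differentiable
selected = torch.einsum('bk,kpc->bpc', weights, primitives)
loss = chamfer(align(selected, theta), stroke)
loss.backward()
```

At inference one would pick the highest-scoring primitive per stroke, so a sketch is summarised as a short list of (primitive, transform) pairs that fits a communication budget.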