My research publications




J12: Ayan Kumar Bhunia*, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Indic Handwritten Script Identification using Offline-Online Multi-modal Deep Network”, Information Fusion, Elsevier, 2019. (Impact Factor - 10.716)

C11: Ankan Kumar Bhunia, Ayan Kumar Bhunia*, Aneeshan Sain, Partha Pratim Roy, “Improving Document Binarization via Adversarial Noise-Texture Augmentation”, IEEE International Conference on Image Processing (ICIP), 2019. [PDF] [Github]

C10: Pranay Mukherjee, Abhirup Das, Ayan Kumar Bhunia*, Partha Pratim Roy, “Cogni-Net: Cognitive Feature Learning through Deep Visual Perception”, IEEE International Conference on Image Processing (ICIP), 2019. [PDF] [Github]

J11: Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy, Subrahmanyam Murala, “A Novel Feature Descriptor for Image Retrieval by Combining Modified Color Histogram and Diagonally Symmetric Co-occurrence Texture Pattern”, Pattern Analysis and Applications, 2019, Springer. (Impact Factor - 1.28) [PDF]

C9: Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy, “Handwriting Recognition in Low-resource Scripts using Adversarial Learning”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. [PDF]

C8: Sauradeep Nag, Ayan Kumar Bhunia*, Aishik Konwer, Partha Pratim Roy, “Facial Micro-expression Spotting and Recognition using Time Contrasted Feature with Visual Memory”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [PDF]

C7: Perla Sai Raj Kishore, Ayan Kumar Bhunia*, Shuvozit Ghose, Partha Pratim Roy, “User Constrained Thumbnail Generation using Adaptive Convolutions”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [PDF] [Oral] [Code]

C6: Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy, “Texture Synthesis Guided Deep Hashing for Texture Image Retrieval”, IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. [PDF] [YouTube]



C5: Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal, “Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF] [Code]

C4: Ankan Kumar Bhunia, Ayan Kumar Bhunia, Prithaj Banerjee, Aishik Konwer, Abir Bhowmick, Partha Pratim Roy, Umapada Pal, “Word Level Font-to-Font Image Translation using Convolutional Recurrent Generative Adversarial Networks”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF]

C3: Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal, “Staff line Removal using Generative Adversarial Networks”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF] [Oral]

J10: Partha Pratim Roy, Ayan Kumar Bhunia, Avirup Bhattacharyya, Umapada Pal, “Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding”, Multimedia Tools and Applications, 2018, Springer. (Impact Factor - 1.541) (DOI:10.1007/s11042-018-6484-5) [PDF]

J9: Ankan Kumar Bhunia*, Aishik Konwer*, Ayan Kumar Bhunia, Abir Bhowmick, Partha Pratim Roy, Umapada Pal, “Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network”, Pattern Recognition, Elsevier, 2018. (Impact Factor - 3.962) (DOI:10.1016/j.patcog.2018.07.034) [PDF] [Github]

J8: Prithaj Banerjee, Ayan Kumar Bhunia, Avirup Bhattacharyya, Partha Pratim Roy, Subrahmanyam Murala, “Local Neighborhood Intensity Pattern – A new texture feature descriptor for image retrieval”, Expert Systems with Applications, 2018. (Impact Factor - 3.928) (DOI:10.1016/j.eswa.2018.06.044) [PDF]

J7: Ayan Kumar Bhunia, Partha Pratim Roy, Akash Mohta, Umapada Pal, “Cross-language Framework for Word Recognition and Spotting of Indic Scripts”, Pattern Recognition, 2018, Elsevier. (Impact Factor - 4.582) (DOI:10.1016/j.patcog.2018.01.034) [PDF]

J6: Aneeshan Sain, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Multi-Oriented Text Detection and Verification in Video Frames and Scene Images”, Neurocomputing, Elsevier. (Impact Factor - 3.317) (DOI:10.1016/j.neucom.2017.09.089) [PDF]

J5: Ayan Kumar Bhunia, Gautam Kumar, Partha Pratim Roy, R. Balasubramanian, Umapada Pal, “Text Recognition in Scene Image and Video Frames using Color Channel Selection”, Multimedia Tools and Applications, Volume 77, Pages 8551–8578, 2018, Springer. (Impact Factor - 1.530) (DOI:10.1007/s11042-017-4750-6) [PDF]

J4: Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal, “Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding”, Neurocomputing, Elsevier. (Impact Factor - 2.392) (DOI:10.1016/j.neucom.2016.08.141) [PDF]



J3: Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal, “HMM-based Writer Identification in Music Score Documents without Staff-Line Removal”, Expert Systems with Applications, Volume 89, Pages 222-240, 2017, Elsevier. (Impact Factor - 3.928) (DOI:10.1016/j.eswa.2017.07.031) [PDF]

J2: Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prithviraj Dhar, Umapada Pal, “Keyword Spotting in Doctor's Handwriting on Medical Prescriptions”, Expert Systems with Applications, Volume 76, Pages 113-128, 2017, Elsevier. (Impact Factor - 2.981) (DOI:10.1016/j.eswa.2017.01.027) [PDF]



J1: Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal, “HMM-based Indic Handwriting Recognition using Zone Segmentation”, Pattern Recognition, Volume 60, Pages 1057-1075, 2016, Elsevier. (Impact Factor - 3.399) (DOI:10.1016/j.patcog.2016.04.012) [PDF]



C1: Ayan Kumar Bhunia, Ayan Das, Partha Pratim Roy, Umapada Pal, “A Comparative Study of Features for Handwritten Bangla Text Recognition”, 13th International Conference on Document Analysis and Recognition (ICDAR), Pages 636-640, Nancy, France, 2015, IEEE. (DOI:10.1109/ICDAR.2015.7333839) [PDF] [Oral]

C2: Ayan Das, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Handwritten Word Spotting in Indic Scripts using Foreground and Background Information”, 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Pages 426-430, Kuala Lumpur, Malaysia, 2015, IEEE. (DOI:10.1109/ACPR.2015.7486539) [PDF]

Pinaki Nath Chowdhury, Aneeshan Sain, Ayan Kumar Bhunia, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song (2022) FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information by 100 non-expert individuals, offering both object- and scene-level abstraction. Each sketch is augmented with its text description. Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions. We draw insights on: (i) Scene salience encoded in sketches via the strokes' temporal order; (ii) Performance comparison of image retrieval from a scene sketch and an image caption; (iii) Complementarity of information in sketches and image captions, as well as the potential benefit of combining the two modalities. In addition, we extend a popular vector sketch LSTM-based encoder to handle sketches with larger complexity than was supported by previous work. Namely, we propose a hierarchical sketch decoder, which we leverage at a sketch-specific “pretext” task. Our dataset enables, for the first time, research on freehand scene sketch understanding and its practical applications. We release the dataset under CC BY-NC 4.0 license: https://fscoco.github.io

Ayan Kumar Bhunia, Ayan Das, Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song (2020) Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?, In: ACM Transactions on Graphics, 39(6), Association for Computing Machinery (ACM)

We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly or faster than a human competitor. The key to victory is learning the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors’ strategies, we conducted a further human study with participants being given unlimited thinking time and training in early recognizability by feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at http://sketchx.ai/pixelor.

Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song (2021) More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity – model performances are largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper-bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gain. In particular, we introduce a novel semi-supervised framework for cross-modal retrieval that can additionally leverage large-scale unlabelled photos to account for data scarcity. At the center of our semi-supervision design is a sequential photo-to-sketch generation model that aims to generate paired sketches for unlabelled photos. Importantly, we further introduce a discriminator-guided mechanism to guard against unfaithful generation, together with a distillation loss-based regulariser to provide tolerance against noisy training samples. Last but not least, we treat generation and retrieval as two conjugate problems, where a joint learning procedure is devised for each module to mutually benefit from the other. Extensive experiments show that our semi-supervised model yields a significant performance boost over the state-of-the-art supervised alternatives, as well as existing methods that can exploit unlabelled photos for FG-SBIR.

Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song (2021) MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

Handwritten Text Recognition (HTR) remains a challenging problem to date, largely due to the varying writing styles that exist amongst us. Prior works however generally operate with the assumption that there is a limited number of styles, most of which have already been captured by existing datasets. In this paper, we take a completely different perspective – we work on the assumption that there is always a new style that is drastically different, and that we will only have very limited data during testing to perform adaptation. This creates a commercially viable solution – being exposed to the new style, the model has the best shot at adaptation, and the few-sample nature makes it practical to implement. We achieve this via a novel meta-learning framework which exploits additional new-writer data via a support set, and outputs a writer-adapted model via a single gradient step update, all during inference (see Figure 1). We discover and leverage the important insight that there exist a few key characters per writer that exhibit relatively larger style discrepancies. For that, we additionally propose to meta-learn instance-specific weights for a character-wise cross-entropy loss, which is specifically designed to work with the sequential nature of text data. Our writer-adaptive MetaHTR framework can be easily implemented on top of most state-of-the-art HTR models. Experiments show an average performance gain of 5-7% can be obtained by observing very few new-style samples (≤ 16).
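The single-gradient-step adaptation described above can be illustrated with a toy, MAML-style sketch. This is a hypothetical minimal example with a linear model and made-up data, not the MetaHTR implementation; all names (`single_step_adapt`, `w_meta`, the learning rate) are assumptions for illustration only.

```python
import numpy as np

def single_step_adapt(w, support_x, support_y, inner_lr=0.1):
    """One MAML-style inner update: adapt weights w to a new writer's
    support set with a single gradient step (toy linear model, MSE loss)."""
    preds = support_x @ w
    # Gradient of mean squared error w.r.t. w
    grad = 2.0 * support_x.T @ (preds - support_y) / len(support_y)
    return w - inner_lr * grad

rng = np.random.default_rng(0)
w_meta = np.array([1.0, 0.0])          # stands in for a meta-learned initialisation
x = rng.normal(size=(16, 2))           # 16 support samples (<= 16, as in the abstract)
y = x @ np.array([1.2, 0.3])           # the new "writer" shifts the target mapping

w_adapted = single_step_adapt(w_meta, x, y)
err_before = np.mean((x @ w_meta - y) ** 2)
err_after = np.mean((x @ w_adapted - y) ** 2)
assert err_after < err_before          # a single step already reduces support error
```

The point is only the mechanics: the adapted weights are produced at inference time from a handful of support samples, without retraining the base model.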

Ayan Kumar Bhunia, Yongxin Yang, T.M. Hospedales, Tao Xiang, Yi-Zhe Song (2020) Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval, In: Proceedings Conference on Computer Vision and Pattern Recognition (CVPR) 2020

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user’s query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.
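The idea of a rank-based reward accumulated over a sketching episode can be sketched as follows. This is a hypothetical simplification (reciprocal rank summed per stroke step), not the paper's actual reward scheme; the function names and gallery sizes are invented for illustration.

```python
def rank_of_true_photo(similarities, true_idx):
    """1-based rank of the ground-truth photo, given similarity scores
    between the current partial sketch and every gallery photo."""
    order = sorted(range(len(similarities)), key=lambda i: -similarities[i])
    return order.index(true_idx) + 1

def episode_reward(per_step_similarities, true_idx):
    """Sum of reciprocal ranks over the episode: ranking the target photo
    highly after few strokes earns more total reward, so the agent is
    pushed toward early retrieval."""
    return sum(1.0 / rank_of_true_photo(s, true_idx)
               for s in per_step_similarities)

# Toy episode: a gallery of 4 photos, ground truth at index 2.
step1 = [0.2, 0.5, 0.4, 0.1]   # after a few strokes: target ranked 2nd
step2 = [0.1, 0.3, 0.6, 0.2]   # after more strokes: target ranked 1st
reward = episode_reward([step1, step2], true_idx=2)   # 1/2 + 1/1 = 1.5
```

An episode whose partial sketches rank the target photo highly from the start would accumulate a strictly larger reward than one that only succeeds at the final stroke, which is the behaviour the on-the-fly design optimizes for.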

Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song (2021) Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pretext tasks are challenging to design and usually modality-specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal data (sound or text) modalities, a common pretext task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pretext task for sketches and handwriting data. This data is uniquely characterised by its existence in dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pretext tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pretext tasks surpass existing single and multi-modal self-supervision methods.
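The rasterization direction (vector coordinates to image space) can be made concrete with a minimal toy renderer. This is a hypothetical point-sampling sketch of the mapping itself, not the learned model from the paper; the function name, grid size, and coordinate convention are assumptions.

```python
import numpy as np

def rasterize(strokes, size=32):
    """Render a vector sketch (a list of (x, y) polylines with coordinates
    in [0, 1]^2) onto a binary raster grid by marking sampled points along
    every line segment."""
    img = np.zeros((size, size), dtype=np.uint8)
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            for t in np.linspace(0.0, 1.0, num=size):
                x = x0 + t * (x1 - x0)       # sample along the segment
                y = y0 + t * (y1 - y0)
                c = min(int(x * (size - 1) + 0.5), size - 1)  # round to cell
                r = min(int(y * (size - 1) + 0.5), size - 1)
                img[r, c] = 1
    return img

strokes = [[(0.0, 0.0), (1.0, 1.0)]]   # a single diagonal stroke
img = rasterize(strokes)               # lights up the 32-pixel diagonal
```

The learned vectorization task is the inverse mapping (pixels back to an ordered coordinate sequence), which is what makes the two pretext tasks a natural cross-modal pair.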

Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song (2020) Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval, In: Proceedings of The 31st British Machine Vision Virtual Conference (BMVC 2020), pp. 1-14, British Machine Vision Association

Sketch as an image search query is an ideal alternative to text in capturing the fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail – a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level to form a better embedding space to conduct retrieval. Experiments on common benchmarks show our method to outperform the state of the art by a significant margin.
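The cross-modal co-attention step mentioned above can be illustrated with a bare-bones dot-product version: each sketch feature attends over the photo features (and vice versa) and is enriched with what it attends to. This is a generic textbook-style sketch under invented shapes, not the paper's network; `co_attention` and the feature dimensions are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(sketch, photo):
    """Dot-product co-attention: enrich each modality's features with an
    attention-weighted summary of the other modality's features."""
    scores = sketch @ photo.T                           # (n_sketch, n_photo) affinities
    sketch_out = sketch + softmax(scores, axis=1) @ photo
    photo_out = photo + softmax(scores.T, axis=1) @ sketch
    return sketch_out, photo_out

rng = np.random.default_rng(1)
s = rng.normal(size=(5, 8))     # 5 sketch-node features, dimension 8
p = rng.normal(size=(7, 8))     # 7 photo-node features, dimension 8
s2, p2 = co_attention(s, p)     # shapes are preserved; content is enriched
```

In the paper this enrichment is applied per hierarchy level before node fusion; the sketch here only shows the attention exchange itself.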

Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song (2021) StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has been largely ignored so far, that is, sketches are drawn by humans and considerable style variations exist amongst different users. An effective SBIR model needs to explicitly account for this style diversity, crucially, to generalise to unseen user styles. To this end, a novel style-agnostic SBIR model is proposed. Different from existing models, a cross-modal variational autoencoder (VAE) is employed to explicitly disentangle each sketch into a semantic content part shared with the corresponding photo, and a style part unique to the sketcher. Importantly, to make our model dynamically adaptable to any unseen user styles, we propose to meta-train our cross-modal VAE by adding two style-adaptive components: a set of feature transformation layers to its encoder and a regulariser to the disentangled semantic content latent code. With this meta-learning framework, our model can not only disentangle the cross-modal shared semantic content for SBIR, but can adapt the disentanglement to any unseen user style as well, making the SBIR model truly style-agnostic. Extensive experiments show that our style-agnostic model yields state-of-the-art performance for both category-level and instance-level SBIR.