My research project

My publications



J12: Ayan Kumar Bhunia*, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Indic Handwritten Script Identification using Offline-Online Multi-modal Deep Network”, Information Fusion, Elsevier, 2019. (Impact Factor - 10.716)

C11: Ankan Kumar Bhunia, Ayan Kumar Bhunia*, Aneeshan Sain, Partha Pratim Roy, “Improving Document Binarization via Adversarial Noise-Texture Augmentation”, IEEE International Conference on Image Processing (ICIP), 2019. [PDF] [Github]

C10: Pranay Mukherjee, Abhirup Das, Ayan Kumar Bhunia*, Partha Pratim Roy, “Cogni-Net: Cognitive Feature Learning through Deep Visual Perception”,  IEEE International Conference on Image Processing (ICIP), 2019. [PDF] [Github]

J11: Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy, Subrahmanyam Murala, “A Novel Feature Descriptor for Image Retrieval by Combining Modified Color Histogram and Diagonally Symmetric Co-occurrence Texture Pattern”, Pattern Analysis and Applications, 2019, Springer. (Impact Factor - 1.28) [PDF]

C9: Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy, “Handwriting Recognition in Low-resource Scripts using Adversarial Learning”, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. [PDF]

C8: Sauradip Nag, Ayan Kumar Bhunia*, Aishik Konwer, Partha Pratim Roy, “Facial Micro-expression Spotting and Recognition using Time Contrasted Feature with Visual Memory”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [PDF]

C7: Perla Sai Raj Kishore, Ayan Kumar Bhunia*, Shuvozit Ghose, Partha Pratim Roy, “User Constrained Thumbnail Generation using Adaptive Convolutions”, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019. [PDF] [Oral] [Code]

C6: Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy, “Texture Synthesis Guided Deep Hashing for Texture Image Retrieval”, IEEE Winter Conference on Applications of Computer Vision (WACV), 2019. [PDF] [YouTube]



C5: Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal, “Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF] [Code]

C4: Ankan Kumar Bhunia, Ayan Kumar Bhunia, Prithaj Banerjee, Aishik Konwer, Abir Bhowmick, Partha Pratim Roy, Umapada Pal, “Word Level Font-to-Font Image Translation using Convolutional Recurrent Generative Adversarial Networks”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF]

C3: Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal, “Staff line Removal using Generative Adversarial Networks”, 24th International Conference on Pattern Recognition (ICPR), 2018. [PDF] [Oral]

J10: Partha Pratim Roy, Ayan Kumar Bhunia, Avirup Bhattacharyya, Umapada Pal, “Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding”, Multimedia Tools and Applications, 2018, Springer. (Impact Factor - 1.541) (DOI:10.1007/s11042-018-6484-5) [PDF]

J9: Ankan Kumar Bhunia*, Aishik Konwer*, Ayan Kumar Bhunia, Abir Bhowmick, Partha Pratim Roy, Umapada Pal, “Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network”, Pattern Recognition, Elsevier, 2018. (Impact Factor - 3.962) (DOI:10.1016/j.patcog.2018.07.034) [PDF] [Github]

J8: Prithaj Banerjee, Ayan Kumar Bhunia, Avirup Bhattacharyya, Partha Pratim Roy, Subrahmanyam Murala, “Local Neighborhood Intensity Pattern – A new texture feature descriptor for image retrieval”, Expert Systems with Applications, 2018, Elsevier. (Impact Factor - 3.928) (DOI:10.1016/j.eswa.2018.06.044) [PDF]

J7: Ayan Kumar Bhunia, Partha Pratim Roy, Akash Mohta, Umapada Pal, “Cross-language Framework for Word Recognition and Spotting of Indic Scripts”, Pattern Recognition, 2018, Elsevier. (Impact Factor - 4.582) (DOI:10.1016/j.patcog.2018.01.034) [PDF]

J6: Aneeshan Sain, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Multi-Oriented Text Detection and Verification in Video Frames and Scene Images”, Neurocomputing, Elsevier. (Impact Factor - 3.317) (DOI:10.1016/j.neucom.2017.09.089) [PDF]

J5: Ayan Kumar Bhunia, Gautam Kumar, Partha Pratim Roy, R. Balasubramanian, Umapada Pal, “Text Recognition in Scene Image and Video Frames using Color Channel Selection”, Multimedia Tools and Applications, Volume 77, Pages 8551-8578, 2018, Springer. (Impact Factor - 1.530) (DOI:10.1007/s11042-017-4750-6) [PDF]

J4: Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal, “Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding”, Neurocomputing, Elsevier. (Impact Factor - 2.392) (DOI:10.1016/j.neucom.2016.08.141) [PDF]



J3: Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal, “HMM-based Writer Identification in Music Score Documents without Staff-Line Removal”, Expert Systems with Applications, Volume 89, Pages 222-240, 2017, Elsevier. (Impact Factor - 3.928) (DOI:10.1016/j.eswa.2017.07.031) [PDF]

J2: Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prithviraj Dhar, Umapada Pal, “Keyword Spotting in Doctor's Handwriting on Medical Prescriptions”, Expert Systems with Applications, Volume 76, Pages 113-128, 2017, Elsevier. (Impact Factor - 2.981) (DOI:10.1016/j.eswa.2017.01.027) [PDF]



J1: Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal, “HMM-based Indic Handwriting Recognition using Zone Segmentation”, Pattern Recognition, Volume 60, Pages 1057-1075, 2016, Elsevier. (Impact Factor - 3.399) (DOI:10.1016/j.patcog.2016.04.012) [PDF]



C1: Ayan Kumar Bhunia, Ayan Das, Partha Pratim Roy, Umapada Pal, “A Comparative Study of Features for Handwritten Bangla Text Recognition”, 13th International Conference on Document Analysis and Recognition (ICDAR), Pages 636-640, Nancy, France, 2015, IEEE. (DOI:10.1109/ICDAR.2015.7333839) [PDF] [Oral]

C2: Ayan Das, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal, “Handwritten Word Spotting in Indic Scripts using Foreground and Background Information”, 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Pages 426-430, Kuala Lumpur, Malaysia, 2015, IEEE. (DOI:10.1109/ACPR.2015.7486539) [PDF]


Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song (2020) Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval, In: Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR) 2020
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user’s query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.
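The episode-level objective described above can be illustrated with a toy rank-based reward. This is only a minimal sketch of the idea, not the paper's implementation: `episode_rank_reward`, its arguments, and the cosine-similarity retrieval are illustrative assumptions.

```python
import numpy as np

def episode_rank_reward(sketch_feats, photo_feats, gt_index):
    """Toy illustration: reward retrieval quality after each partial sketch.

    sketch_feats: (T, d) embedding of the sketch after each of T strokes
    photo_feats:  (N, d) embeddings of the photo gallery
    gt_index:     index of the ground-truth photo in the gallery
    """
    rewards = []
    # Normalise the gallery once for cosine similarity
    gallery = photo_feats / np.linalg.norm(photo_feats, axis=1, keepdims=True)
    for t in range(sketch_feats.shape[0]):
        query = sketch_feats[t] / np.linalg.norm(sketch_feats[t])
        sims = gallery @ query
        # Rank of the ground-truth photo (1 = retrieved first)
        rank = 1 + np.sum(sims > sims[gt_index])
        # Reciprocal rank at every step rewards early retrieval
        rewards.append(1.0 / rank)
    return float(np.mean(rewards))
```

In an RL formulation, a per-step reward like this would be accumulated over the drawing episode and used to update the sketch encoder, encouraging the ground-truth photo to rank highly from the earliest strokes.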
Ayan Kumar Bhunia, Ayan Das, Umar Riaz Muhammad, Yongxin Yang, Timothy M. Hospedales, Yi-Zhe Song, Yulia Gryaditskaya, Tao Xiang (2020) Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?, In: ACM Transactions on Graphics, 39(6), Association for Computing Machinery (ACM)
We present the first competitive drawing agent Pixelor that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is a winner. Our AI agent can autonomously sketch a given visual concept, and achieve a recognizable rendition as quickly or faster than a human competitor. The key to victory for the agent’s goal is to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting; and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors’ strategies, we conducted a further human study with participants being given unlimited thinking time and training in early recognizability by feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at
Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song (2020) Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval, In: Proceedings of The 31st British Machine Vision Virtual Conference (BMVC 2020), pp. 1-14, British Machine Vision Association
Sketch as an image search query is an ideal alternative to text in capturing the fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail – a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level to form a better embedding space to conduct retrieval. Experiments on common benchmarks show our method to outperform state-of-the-art methods by a significant margin.