Dr Diptesh Kanojia


Lecturer in Artificial Intelligence for Natural Language Processing
Doctor of Philosophy (IITB-Monash Research Academy), Bachelor of Technology in Computer Science & Engineering

About

Areas of specialism

Natural Language Processing; Machine Translation; Lexical Semantics; Cognitive Psycholinguistics; Computational Phylogenetics; Sarcasm Detection and Sentiment Analysis

News

In the media

2016
Emergency Response, Courtesy of Twitter
Featured || An article on PBS's NOVA Next covering our research on rapidly responding to emergency situations based on tweets.
Public Broadcasting Service (PBS)
2020
Helping Computers Understand Language
Author || PhD research story || The article describes the research I conducted during my PhD on cognates and phylogenetics.
IITB-Monash Research Academy

Research

Research interests

Publications

Highlights

Google Scholar has a complete list of my publications. A select few publications are shown below.

Sourabh Deoghare, Paramveer Choudhary, Diptesh Kanojia, Tharindu Ranasinghe, Pushpak Bhattacharyya, Constantin Orasan (2023) A Multi-task Learning Framework for Quality Estimation, In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 9191-9205, Association for Computational Linguistics

Quality Estimation (QE) is the task of evaluating machine translation output in the absence of a reference translation. Conventional approaches to QE involve training separate models at different levels of granularity, viz., word-level, sentence-level, and document-level, which sometimes leads to inconsistent predictions for the same input. To overcome this limitation, we focus on jointly training a single model for sentence-level and word-level QE tasks in a multi-task learning framework. Using two multi-task learning-based QE approaches, we show that multi-task learning improves the performance of both tasks. We evaluate these approaches by performing experiments in different settings, viz., single-pair, multi-pair, and zero-shot. We compare the multi-task learning-based approach with baseline QE models trained on single tasks and observe an improvement of up to 4.28% in Pearson's correlation (r) at the sentence level and 8.46% in F1-score at the word level, in the single-pair setting. In the multi-pair setting, we observe improvements of up to 3.04% at the sentence level and 13.74% at the word level, while in the zero-shot setting, we also observe improvements of up to 5.26% and 3.05%, respectively. We make the models proposed in this paper publicly available.

Shenbin Qian, Constantin Orasan, Diptesh Kanojia, Hadeel Saadany, Felix do Carmo (2022) SURREY-CTS-NLP at WASSA2022: An Experiment of Discourse and Sentiment Analysis for the Prediction of Empathy, Distress and Emotion, In: Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pp. 271-275, Association for Computational Linguistics

This paper summarises the submissions our team, SURREY-CTS-NLP, made to the WASSA 2022 Shared Task for the prediction of empathy, distress and emotion. In this work, we tested different learning strategies, such as ensemble learning and multi-task learning, as well as several large language models, but our primary focus was on analysing and extracting emotion-intensive features from both the essays in the training data and the news articles, to better predict empathy and distress scores from the perspective of discourse and sentiment analysis. We propose several text feature extraction schemes to compensate for the small number of training examples available for fine-tuning pretrained language models, including methods based on Rhetorical Structure Theory (RST) parsing, cosine similarity and sentiment score. Our best submissions achieve an average Pearson correlation score of 0.518 for the empathy prediction task and an F1 score of 0.571 for the emotion prediction task, indicating that using these schemes to extract emotion-intensive information can help improve model performance.

Jordan Painter, Helen Treharne, Diptesh Kanojia (2022) Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset, In: Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), Association for Computational Linguistics

Additional publications