
Dr Diptesh Kanojia
Academic and research departments
Surrey Institute for People-Centred Artificial Intelligence (PAI), Department of Computer Science.
About
Biography
I am a researcher working on problems at the intersection of Natural Language Processing (NLP) and Machine or Deep Learning (ML/DL). As a Lecturer at the Surrey Institute for People-Centred AI and the Department of Computer Science at the University of Surrey, I co-teach the NLP module. I also teach NLP at the Indian Institute of Information Technology Lucknow as Visiting Faculty.
The focus of my research is Machine Translation, where I work with Prof. Sabine Braun, Prof. Constantin Orăsan, Dr. Félix do Carmo and other colleagues.
Beyond that, my interests lie in the NLP sub-areas of Cognitive NLP, Distributional and Lexical Semantics, Multimodality/Multilingualism in NLP, and Sentiment/Emotion Analysis, as well as in teaching. I graduated with a Joint PhD from the IITB-Monash Research Academy (IIT Bombay, India & Monash University, Australia) in 2021, advised by Prof. Pushpak Bhattacharyya, Prof. Gholamreza Haffari, and Prof. Malhar Kulkarni.
Before starting my PhD, I was a Research Engineer at the CFILT Lab, IIT Bombay, India, where my investigations led to publications in diverse sub-areas of NLP and AI.
Research
Research interests
Lexical Semantics, Machine Translation, Translation Technologies,
Cognitive Psycholinguistics, Automated Essay Grading, Computational Phylogenetics,
Multilingualism, Low-resource Languages, and Sentiment/Emotion Analysis.
Publications
Highlights
Google Scholar has a complete list of my publications. A select few publications are shown below.
Quality Estimation (QE) is the task of evaluating machine translation output in the absence of a reference translation. Conventional approaches to QE involve training separate models at different levels of granularity, viz., word-level, sentence-level, and document-level, which sometimes leads to inconsistent predictions for the same input. To overcome this limitation, we focus on jointly training a single model for sentence-level and word-level QE tasks in a multi-task learning framework. Using two multi-task learning-based QE approaches, we show that multi-task learning improves the performance of both tasks. We evaluate these approaches by performing experiments in different settings, viz., single-pair, multi-pair, and zero-shot. We compare the multi-task learning-based approach with baseline QE models trained on single tasks and observe an improvement of up to 4.28% in Pearson's correlation (r) at sentence-level and 8.46% in F1-score at word-level, in the single-pair setting. In the multi-pair setting, we observe improvements of up to 3.04% at sentence-level and 13.74% at word-level, while in the zero-shot setting, we observe improvements of up to 5.26% at sentence-level and 3.05% at word-level. We make the models proposed in this paper publicly available.
This paper summarises the submissions our team, SURREY-CTS-NLP, has made for the WASSA 2022 Shared Task for the prediction of empathy, distress and emotion. In this work, we tested different learning strategies, like ensemble learning and multi-task learning, as well as several large language models, but our primary focus was on analysing and extracting emotion-intensive features from both the essays in the training data and the news articles, to better predict empathy and distress scores from the perspective of discourse and sentiment analysis. We propose several text feature extraction schemes to compensate for the small number of training examples available for fine-tuning pretrained language models, including methods based on Rhetorical Structure Theory (RST) parsing, cosine similarity and sentiment score. Our best submissions achieve an average Pearson correlation score of 0.518 for the empathy prediction task and an F1 score of 0.571 for the emotion prediction task, indicating that using these schemes to extract emotion-intensive information can help improve model performance.
Additional publications
Significant publications:
Kanojia, D., Sharma, P., Ghodekar, S., Bhattacharyya, P., Haffari, G., & Kulkarni, M. (2021). Cognition-aware Cognate Detection. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL 2021. [Best Paper Honourable Mention]
Kanojia, D., Dabre, R., Dewangan, S., Bhattacharyya, P., Haffari, G., & Kulkarni, M. (2020). Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages. Proceedings of The 28th International Conference on Computational Linguistics (COLING). COLING 2020.
Mathias, S., Kanojia, D., Mishra, A., & Bhattacharyya, P. (2020). A survey on using gaze behaviour for natural language processing. IJCAI International Joint Conference on Artificial Intelligence, 2021-January, 4907–4913. https://doi.org/10.24963/ijcai.2020/683
Mathias, S., Murthy, R., Kanojia, D., Mishra, A., & Bhattacharyya, P. (2020). Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP). AACL-IJCNLP 2020.
Kanojia, D., Kulkarni, M., Bhattacharyya, P., & Haffari, G. (2020). Challenge Dataset of Cognates and False Friend Pairs from Indian Languages. Proceedings of The 12th Language Resources and Evaluation Conference, 3096–3102.
Mathias, S., Murthy, R., Kanojia, D., & Bhattacharyya, P. (2021). Cognitively Aided Zero-Shot Automatic Essay Grading. Proceedings of the 17th International Conference on Natural Language Processing (ICON). ICON 2020.
Sheoran, A., Kanojia, D., Joshi, A., & Bhattacharyya, P. (2020). Recommendation chart of domains for cross-domain sentiment analysis: findings of a 20 domain study. The 12th Language Resources and Evaluation Conference (LREC). LREC 2020.
Kumar, S., Kumar, S., Kanojia, D., & Bhattacharyya, P. (2020). "A Passage to India": Pre-trained Word Embeddings for Indian Languages. Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), 352–357. https://www.aclweb.org/anthology/2020.sltu-1.49
Kanojia, D., Patel, K., Bhattacharyya, P., Kulkarni, M., & Haffari, R. (2019). Utilizing Wordnets for Cognate Detection among Indian Languages. Global Wordnet Conference (GWC), 404. GWC 2019.
Kanojia, D., Dubey, A., Kulkarni, M., Bhattacharyya, P., & Haffari, G. (2019). Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts. Proceedings of the 6th International Sanskrit Computational Linguistics Symposium, 152–165.
Kanojia, D., Kulkarni, M., Bhattacharyya, P., & Kahrs, E. (2019). An Introduction to the Textual History Tool. Proceedings of the 6th International Sanskrit Computational Linguistics Symposium (ISCLS), 166–180. ISCLS 2019.
Mathias, S., Kanojia, D., Patel, K., Agrawal, S., Mishra, A., & Bhattacharyya, P. (2018). Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2352–2362. ACL 2018.
Mishra, A., Kanojia, D., Nagar, S., Dey, K., & Bhattacharyya, P. (2017). Scanpath Complexity: Modeling Reading Effort Using Gaze Information. Proceedings of the 31st Annual AAAI Conference on Artificial Intelligence, 4429–4436. AAAI 2017.
Joshi, A., Kanojia, D., & Bhattacharyya, P. (2017). Sarcasm Suite: A browser-based engine for sarcasm detection and generation. Association for the Advancement of Artificial Intelligence Conference: Demo Track (AAAI 2017), 31, 2. AAAI 2017.
Mishra, A., Kanojia, D., Nagar, S., Dey, K., & Bhattacharyya, P. (2016, August). Harnessing Cognitive Features for Sarcasm Detection. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1095–1104. ACL 2016.
Mishra, A., Kanojia, D., Nagar, S., Dey, K., & Bhattacharyya, P. (2016, August). Leveraging cognitive features for sentiment analysis. Proceedings of The SIGNLL Conference on Computational Natural Language Learning (CoNLL). CoNLL 2016.
Mishra, A., Kanojia, D., & Bhattacharyya, P. (2016). Predicting Readers’ Sarcasm Understandability by Modeling Gaze Behavior. Proceedings of the 30th Annual AAAI Conference on Artificial Intelligence, 3747–3753. AAAI 2016.
Joshi, S., Kanojia, D., & Bhattacharyya, P. (2013). More than meets the eye: Study of Human Cognition in Sense Annotation. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 733–738.