Terminology-Aware Machine Translation for Accessible Science (TamTAS)
Start date
February 2026
End date
January 2029
About the project
Summary
This project addresses the urgent need for accurate and accessible scientific communication by enabling multilingual access to scientific knowledge. It challenges the dominance of English in scientific dissemination and aims to empower both researchers and the general public to engage with science in their native languages. To achieve this, we propose a terminology-aware machine translation (MT) framework tailored to scientific texts. The core technology is based on Large Reasoning Models (LRMs)—an advanced class of Large Language Models (LLMs) that treat translation as a reasoning task. LRMs incorporate chain-of-thought prompting, self-correction, and document-level understanding, ensuring terminological consistency and coherence across longer texts.
To further improve translation robustness, we will develop a tightly integrated pipeline combining Quality Estimation (QE) and Automatic Post-Editing (APE). QE models will detect and assess terminology-related errors, guiding APE modules—using reinforcement learning techniques like direct preference optimization—to refine the translations. Feedback from QE will also be used to improve the LRM itself. The training of these components will be supported by specialized corpora annotated with terminology errors. The project also explores post-translation text augmentation to adapt outputs for different audiences. This includes simplification and explanation strategies to make scientific content more accessible and reusable in education and public engagement.
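The QE-guided post-editing loop described above can be illustrated with a deliberately simplified sketch. It is not the project's implementation: the glossary, the list of dispreferred variants, and the function names `estimate_quality`, `post_edit`, and `refine` are all hypothetical, and simple string matching stands in for the learned QE and APE models the project proposes to build.

```python
import re
from dataclasses import dataclass

# Hypothetical English->Spanish glossary and dispreferred variants (illustrative only).
GLOSSARY = {"stem cell": "célula madre"}
DISPREFERRED = {"célula madre": ["célula troncal"]}


@dataclass(frozen=True)
class TermError:
    source_term: str
    expected_target: str


def estimate_quality(source, translation, glossary=GLOSSARY):
    """Toy QE step: flag glossary terms present in the source whose
    expected target term is missing from the translation."""
    src, tgt = source.lower(), translation.lower()
    return [
        TermError(s, t)
        for s, t in glossary.items()
        if s in src and t.lower() not in tgt
    ]


def post_edit(translation, errors, dispreferred=DISPREFERRED):
    """Toy APE step: replace known dispreferred variants with the glossary term."""
    for err in errors:
        for variant in dispreferred.get(err.expected_target, []):
            translation = re.sub(
                re.escape(variant),
                lambda _m, target=err.expected_target: target,
                translation,
                flags=re.IGNORECASE,
            )
    return translation


def refine(source, draft, max_rounds=3):
    """Alternate QE and APE until no terminology errors remain,
    no further edit is possible, or the round limit is reached."""
    translation = draft
    for _ in range(max_rounds):
        errors = estimate_quality(source, translation)
        if not errors:
            break
        edited = post_edit(translation, errors)
        if edited == translation:
            break  # APE could not fix the remaining errors
        translation = edited
    return translation
```

In this toy setting, `refine("Each stem cell divides.", "Cada célula troncal se divide.")` rewrites the draft to use the glossary term. In a real system, the errors flagged by QE would also serve as a training signal, as the paragraph above notes.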
We focus on five languages (English, Spanish, Catalan, Estonian, and Irish), covering both well-resourced and under-resourced contexts, and apply our approach to the Life Sciences domain. Pilot collaborations with the Centre de Recerca Genòmica (CRG) in Barcelona, the Institute of Family Medicine and Public Health in Tartu, and Conradh na Gaeilge in Ireland will ensure real-world validation and user-driven refinement from both the scientific and the general public perspectives. Bridging MT, NLP, and scientific expertise, the project will be evaluated using both adapted automatic metrics and human assessments by domain experts and general users. The final system, validated at TRL 5–6, will significantly enhance the accuracy, inclusivity, and trustworthiness of scientific communication across languages, supporting a more equitable and globally connected research ecosystem.
People
Principal investigator
Professor Constantin Orasan
Professor of Language and Translation Technologies
Biography
I am Professor of Language and Translation Technologies at the Centre for Translation Studies, University of Surrey. Before starting this role, I was Reader in Computational Linguistics at the University of Wolverhampton, UK, and the deputy head of the Research Group in Computational Linguistics at the same university. I received my BSc in computer science from Babeș-Bolyai University, Cluj-Napoca, Romania, and was awarded my PhD by the University of Wolverhampton.
I have over 20 years' experience of working in the fields of (applied) Natural Language Processing (NLP), Artificial Intelligence and Machine Learning for language processing. My research interests are largely focused on facilitating information access and include translation technology, sentiment analysis, question answering, text summarisation, anaphora and coreference resolution, and the building, annotation and exploitation of corpora.
I recently coordinated the EXPERT project, an extremely successful Initial Training Network (ITN) funded under the People Programme of the Seventh Framework Programme (FP7) of the European Community which trained the next generation of world-class researchers in the field of data-driven translation technology. In addition to coordinating this project between nine partners across both academia and industry, I was actively involved in the training of the Early Stage Researchers (ESRs) appointed in the project and, in collaboration with these ESRs, I carried out research on translation memories and quality estimation for machine translation. I continue researching these topics.
I was also the deputy coordinator of the FIRST project, which developed language technologies for making texts more accessible to people with autism. In addition to managing a consortium of nine partners from academia, industry and health care organisations, I carried out research on text simplification and contributed to the development of a powerful editor which carers of people with autism can use to make texts more accessible.
In the past, I was the Local Course Coordinator of the Erasmus Mundus programme on Technology for Translation and Interpretation and the Erasmus Mundus International Masters in Natural Language Processing and Human Language Technology, and the scientist in charge for the University of Wolverhampton in two European projects, QALL-ME and MESSAGE. I also worked as a research fellow on the CAST project.
I love programming and in my spare time I contribute to some open source projects and have my own GitHub repository.
Co-investigator
Dr Diptesh Kanojia
Senior Lecturer in People-Centred AI
Biography
I am a researcher working on problems in Natural Language Processing (NLP) and Machine Learning (ML) at the Institute for People-Centred AI (PAI) and the School of Computer Science and Electronic Engineering. As a research lead, I manage the Human-Machine Interaction theme at PAI and the NLP subgroup within the Nature Inspired Computing and Engineering (NICE) group at the Computer Science Research Centre. I also lead the teaching on the NLP module offered to undergraduate (COM3029) and postgraduate (COMM061) students.
My research focuses on developing scalable and safe human-machine interaction using foundation models. Guided by the principles of Responsible and Inclusive AI, my work emphasises cross-lingual and multimodal learning to address challenges like online toxicity, misinformation, and digital accessibility for low-resource languages. All our research outcomes (code, data, and models) are publicly available on the SurreyNLP GitHub and HuggingFace.
My prior roles include a Postdoctoral Fellowship at the Centre for Translation Studies, a joint PhD from IIT Bombay and Monash University, and Research Engineer at the CFILT Lab.
Funding amount
£310,525.94
Funder
Engineering and Physical Sciences Research Council (EPSRC)
Contact
For enquiries or potential collaboration on this topic, please contact Prof Constantin Orasan, the Principal Investigator of the project.
See other research projects carried out at the Centre for Translation Studies.