Terminology-Aware Machine Translation for Accessible Science (TamTAS)

Start date

February 2026

End date

January 2029

About the project

Summary

This project addresses the urgent need for accurate and accessible scientific communication by enabling multilingual access to scientific knowledge. It challenges the dominance of English in scientific dissemination and aims to empower both researchers and the general public to engage with science in their native languages. To achieve this, we propose a terminology-aware machine translation (MT) framework tailored to scientific texts. The core technology is based on Large Reasoning Models (LRMs), a class of Large Language Models (LLMs) that treat translation as a reasoning task. LRMs incorporate chain-of-thought prompting, self-correction, and document-level understanding, which help maintain terminological consistency and coherence across longer texts.

To further improve translation robustness, we will develop a tightly integrated pipeline combining Quality Estimation (QE) and Automatic Post-Editing (APE). QE models will detect and assess terminology-related errors and guide the APE modules, which will be trained with preference-based learning techniques such as direct preference optimization, to refine the translations. Feedback from QE will also be used to improve the LRM itself. The training of these components will be supported by specialized corpora annotated with terminology errors. The project also explores post-translation text augmentation to adapt outputs for different audiences, including simplification and explanation strategies that make scientific content more accessible and reusable in education and public engagement.
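As a rough illustration of the QE-then-APE loop described above, the following minimal Python sketch checks a draft translation against a glossary and repairs flagged terms. Everything in it is an assumption made for illustration: the glossary entries, the variant list, and the function names are hypothetical, and simple string matching stands in for the learned QE and APE models the project proposes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical glossary (English -> Spanish); not taken from the project.
GLOSSARY = {
    "gene expression": "expresión génica",
    "stem cell": "célula madre",
}

# Hypothetical non-approved renderings, keyed by the approved target term.
VARIANTS = {
    "expresión génica": ["expresión de genes"],
}


@dataclass
class TermError:
    source_term: str
    expected: str


def estimate_term_errors(source: str, translation: str,
                         glossary: dict[str, str]) -> list[TermError]:
    """Toy QE step: flag glossary terms that occur in the source but whose
    approved target term is missing from the translation."""
    src, tgt = source.lower(), translation.lower()
    return [
        TermError(s, t)
        for s, t in glossary.items()
        if s in src and t.lower() not in tgt
    ]


def post_edit(translation: str, errors: list[TermError],
              variants: dict[str, list[str]]) -> str:
    """Toy APE step: swap known non-approved variants for the approved term."""
    for err in errors:
        for wrong in variants.get(err.expected, []):
            translation = translation.replace(wrong, err.expected)
    return translation


def qe_guided_post_edit(source: str, draft: str) -> tuple[str, list[TermError]]:
    """Run QE, post-edit only if errors were flagged, then re-check."""
    errors = estimate_term_errors(source, draft, GLOSSARY)
    if not errors:
        return draft, []
    edited = post_edit(draft, errors, VARIANTS)
    return edited, estimate_term_errors(source, edited, GLOSSARY)


source = "Gene expression varies between stem cell types."
draft = "La expresión de genes varía entre tipos de célula madre."
edited, remaining = qe_guided_post_edit(source, draft)
```

Here the QE step flags only "gene expression", the post-editing step rewrites that one term, and the second QE pass returns no remaining errors. In a real system, any errors left after post-editing could be logged as feedback for retraining, in the spirit of the QE-to-LRM feedback mentioned above.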

We focus on five languages (English, Spanish, Catalan, Estonian, and Irish), covering both well-resourced and under-resourced contexts, and apply our approach to the Life Sciences domain. Pilot collaborations with the Centre de Recerca Genòmica (CRG) in Barcelona, the Institute of Family Medicine and Public Health in Tartu, and Conradh na Gaeilge in Ireland will ensure real-world validation and user-driven refinement from both the scientific and the general public perspectives. The project bridges MT, natural language processing (NLP), and scientific expertise, and its outputs will be evaluated using both adapted automatic metrics and human assessments by domain experts and general users. The final system, validated at Technology Readiness Level (TRL) 5–6, aims to enhance the accuracy, inclusivity, and trustworthiness of scientific communication across languages, supporting a more equitable and globally connected research ecosystem.

Funding amount

£310,525.94

Funder

Contact

For enquiries or potential collaboration on this topic, please contact Prof Constantin Orasan, the Principal Investigator of the project.

See other research projects carried out at the Centre for Translation Studies.

Related sustainable development goals

Quality Education (UN Sustainable Development Goal 4)
Reduced Inequalities (UN Sustainable Development Goal 10)
