Dr Geraint Rees

Research Fellow
+44 (0)1483 689960
14 LC 03



Geraint joined the University of Surrey in 2017 as a Research Fellow on the AHRC-funded ColloCaid project. The project combines corpus-based lexicographical data with text editors to help writers of English for Academic Purposes. Geraint has a PhD in Language and Translation Sciences from Universitat Pompeu Fabra, Barcelona. His doctoral scholarship, awarded by the Universitat Pompeu Fabra and the University Institute for Applied Linguistics (IULA), supported his research on disciplinary differences in English for Academic Purposes and their representation in lexicographical resources and teaching materials.

Geraint specialises in ESP lexicography (with particular emphasis on EAP), the use of corpora in language teaching and translation, and digital learning environments. He is particularly interested in phraseological approaches to language teaching.

Research interests

Special purpose and pedagogical lexicography

Academic discourse


Language testing

Materials design

Corpus linguistics


Geraint is a member of the European Association for Lexicography (EURALEX) and the Spanish Association of Applied Linguistics (AESLA).



Frankenberg-Garcia Ana, Lew R, Roberts J, Rees Geraint, Sharma N (2018) Developing a writing assistant to help EAP writers with collocations in real time, ReCALL 31 (1) pp. 23-39 Cambridge University Press

DOI: 10.1017/S0958344018000150

Roberts Jonathan C., Frankenberg-Garcia Ana, Lew Robert, Rees Geraint, Sharma Nirwan (2018) Visualisation Approaches for Corpus Linguistics: towards Visual Integration of Data-Driven Learning, 3rd Workshop on Visualization for the Digital Humanities – Vis4DH 2018 pp. 1-5 Institute of Electrical and Electronics Engineers (IEEE)

Ana Frankenberg-Garcia, Geraint Rees, Robert Lew, Jonathan Roberts, Nirwan Sharma, Peter Butcher (2019) ColloCaid: a tool to help academic English writers find the words they need, In: Fanny Meunier, Julie Van de Vyver, Linda Bradley, Sylvie Thouësny (eds.), CALL and complexity – short papers from EUROCALL 2019 pp. 144-150 Research-publishing.net

This short paper summarizes the development of ColloCaid (www.collocaid.uk), a text editor that supports writers with academic English collocations. After a brief introduction, the paper summarizes how the lexicographic database underlying ColloCaid was compiled, how text editor integration was achieved, and results from initial user studies. The paper concludes by outlining future developments.

Ana Frankenberg-Garcia, R Lew, J Roberts, Geraint Rees, N Sharma (2018) Developing a writing assistant to help EAP writers with collocations in real time, In: ReCALL 31(1) pp. 23-39 Cambridge University Press

Corpora have given rise to a wide range of lexicographic resources aimed at helping novice users of academic English with their writing. This includes academic vocabulary lists, a variety of textbooks, and even a bespoke academic English dictionary. However, writers may not be familiar with these resources or may not be sufficiently aware of the lexical shortcomings of their emerging texts to trigger the need to use such help in the first place. Moreover, writers who have to stop writing to look up a word can be distracted from getting their ideas down on paper. The ColloCaid project aims to address this problem by integrating information on collocation with text editors. In this paper, we share the research underpinning the initial development of ColloCaid by detailing the rationale of (1) the lexicographic database we are compiling to support novice EAP users’ collocation needs and (2) the preliminary visualisation decisions taken to present information on collocation to EAP users without disrupting their writing. We conclude the paper by outlining the next steps in the research.

Geraint Rees, Ana Frankenberg-Garcia (2020) Slipping Through the Cracks in e-Lexicography, In: International Journal of Lexicography Oxford University Press

Despite the remarkable advances made in recent years to facilitate the lexicographer’s work of interpreting and synthesizing the complexity of language uncovered by corpora, an uncritical use of cutting-edge corpus tools and resources can instill a false sense of assurance. In this paper, authentic examples pertaining to wordlist use, collocation research and example selection that arose when compiling a real-world lexical database are discussed through the lens of problems that can easily slip through the cracks in e-lexicography. In doing so, we emphasize the importance of solid training and sound lexicographic judgment when using corpora, corpus tools and corpus-derived resources, and provide an opportunity to reflect on how e-lexicography can be further refined in the future.

J. C. Roberts, H. Al‐maneea, P. W. S. Butcher, R. Lew, G. Rees, N. Sharma, A. Frankenberg-Garcia (2019) Multiple Views: different meanings and collocated words, In: Computer Graphics Forum - proceedings of the 21st Eurographics / IEEE VGTC Conference on Visualization (EuroVis 2019) 38(3) pp. 79-93 Wiley

We report on an in‐depth corpus linguistic study on ‘multiple views’ terminology and word collocation. We take a broad interpretation of these terms, and explore the meaning and diversity of their use in visualisation literature. First we explore senses of the term ‘multiple views’ (e.g., ‘multiple views’ can mean juxtaposition, many viewport projections or several alternative opinions). Second, we investigate term popularity and frequency of occurrences, investigating usage of ‘multiple’ and ‘view’ (e.g., multiple views, multiple visualisations, multiple sets). Third, we investigate word collocations and terms that have a similar sense (e.g., multiple views, side‐by‐side, small multiples). We built and used several corpora, including a 6‐million‐word corpus of all IEEE Visualisation conference articles published in IEEE Transactions on Visualisation and Computer Graphics 2012 to 2017. We draw on our substantial experience from early work in coordinated and multiple views, and with collocation analysis develop several lists of terms. This research provides insight into term use, a reference for novice and expert authors in visualisation, and contributes a taxonomy of ‘multiple view’ terms.

Paula Tavares Pinto, Geraint Paul Rees, Ana Frankenberg-Garcia (2020) Identifying collocation issues in English L2 research article writing, In: Corpora in ESP/EAP Writing Instruction: Preparation, Exploitation, Analysis Routledge

Although there are numerous studies on collocation in English writing by L2 university students, little is known about the problems encountered by mature researchers writing authentic L2 English texts in their fields. This study investigates collocation issues in L2 English research papers in Brazil. Its starting point was the compilation of the Brazilian Academic Corpus of English (BrACE), a 906,035-word multidisciplinary corpus of journal articles written in English that have been published in Brazilian journals. The most frequent noun collocations in this corpus were contrasted with the expert writing lexical database underlying the ColloCaid academic writing assistant. No evidence of systematic miscollocation was found in the published papers represented in BrACE. However, many general academic English collocations were conspicuous by their absence from BrACE, including collocations with L1 Portuguese cognates. We also observed that the collocations in BrACE were less diverse and tended to score higher in terms of strength of association than their equivalents in the reference data. In addition to feedback on miscollocations which might arise in unedited manuscripts, our findings lead us to conclude that Brazilian (and other English L2) research writers can benefit from suggestions to expand their collocation repertoire, enhance their perceptions of collocation strength, and offset collocation avoidance.

Ana Lucia Frankenberg-Garcia, Geraint Paul Rees (2021) Academic Collocation Errors and Other Problems by ColloCaid, Figshare

This dataset can be found on Figshare (https://doi.org/10.6084/m9.figshare.13640624.v2). The Academic Collocation Errors and Other Problems database by ColloCaid (www.collocaid.uk) comprises 370 common collocation errors and other collocation problems affecting how 76 frequently used words are employed in English academic writing. Solutions to the problems are also provided. A variety of sources were used to compile the database, including learner corpora, textbooks, dictionaries, and grammars. For more information, read the documentation file. Funding: AHRC AH/P003508/1

Robert Lew, Ana Frankenberg-Garcia, Geraint Paul Rees, Jonathan C. Roberts, Nirwan Sharma (2018) ColloCaid: A Real-time Tool to Help Academic Writers with English Collocations, In: Proceedings of the XVIII EURALEX International Congress: Lexicography in global contexts pp. 247-254 Ljubljana University Press (Faculty of Arts)

Writing is a cognitively challenging activity that can benefit from lexicographic support. Academic writing in English presents a particular challenge, given the extent of use of English for this purpose. The ColloCaid tool, currently under development, responds to this challenge. It is intended to assist academic English writers by providing collocation suggestions, as well as alerting writers to unconventional collocational choices as they write. The underlying collocational data are based on a carefully curated set of about 500 collocational bases (nouns, verbs, and adjectives) characteristic of academic English, and their collocates with illustrative examples. These data have been derived from state-of-the-art corpora of academic English and academic vocabulary lists. The manual curation by expert lexicographers and reliance on specifically Academic English textual resources are what distinguishes ColloCaid from existing collocational resources. A further characteristic of ColloCaid is its strong emphasis on usability. The tool draws on dictionary-user research, findings in information visualization, as well as usability testing specific to ColloCaid in order to find an optimal amount of collocation prompts, and the best way to present them to the user.

Ana Frankenberg-Garcia, Geraint Paul Rees, Robert Lew (2020) ColloCaid Sample Data, Figshare

COLLOCAID SAMPLE DATA
The ColloCaid Sample Data comprises approximately 2% of the ColloCaid lexical database. The sample covers 692 strong academic English collocations (LogDice >5.0) for 16 core academic lemmas used as collocation bases (or nodes): 5 nouns, 5 verbs, and 6 adjectives. The selection aims to give an overview of the range of data included in the full dataset. This includes collocations with bases classified with more than one part-of-speech tag (e.g. DEBATE, INDIVIDUAL), polysemous collocation bases giving rise to distinct collocation patterns (e.g. CODE), as well as collocation bases that evoke a very large and a very small number of collocations. The strongest eight lexical collocations listed for each base are enriched with three different curated example sentences adapted from corpora of expert academic English writing.

COLLOCAID LEXICAL DATA 1.0
The full ColloCaid lexical dataset consists of:
• 572 core academic English lemmas
• 32,655 academic collocations with the above lemmas
• 29,055 example sentences of collocations in context

Further information at http://www.collocaid.uk/

Jonathan C. Roberts, Ana Frankenberg-Garcia, Robert Lew, Geraint Rees, Nirwan Sharma (2018) Visualisation Approaches for Corpus Linguistics: towards Visual Integration of Data-Driven Learning, In: 3rd Workshop on Visualization for the Digital Humanities – Vis4DH 2018 pp. 1-5 Institute of Electrical and Electronics Engineers (IEEE)

The compilation and use of corpora is not solely for research in linguistics. Among many other practical applications, corpora can be used to inform dictionaries, grammars and syllabuses. They can also help language users directly by providing concrete examples of common practices and good examples. Data-Driven Learning (DDL) describes situations where tools and techniques of corpus linguistics are used to learn a language or a particular type of language. However, DDL has remained largely confined to the research community. Consequently, there is a need to better integrate corpora with language pedagogy, develop visual techniques that will enable DDL to be used by wider audiences, and explore how visualisation could help make DDL more integrated and interactive. This paper addresses this question by exploring how visualisation approaches for corpus linguistics can enhance DDL, with particular focus on improving academic writing.

Additional publications