
Dr Leonardo Zilio
Academic and research departments
Centre for Translation Studies, School of Literature and Languages, Faculty of Arts and Social Sciences.
About
Biography
I graduated in Translation Studies (German-Portuguese) in 2006, and completed my Master's dissertation in 2009 and my doctoral thesis in 2015 at the Institute of Linguistics, Universidade Federal do Rio Grande do Sul (UFRGS). During this time, I also spent a year as a visiting doctoral researcher at the Laboratoire d'Informatique de Grenoble in 2012-2013.
After my PhD, I carried out two post-doctoral research projects in Natural Language Processing and Computational Linguistics: the first at the Institute of Informatics, UFRGS, from 2015 to 2016, and the second at the Centre de traitement automatique du langage, Université catholique de Louvain, from 2016 to 2018.
In 2019, I worked as a freelance translator in Brazil, before joining the Centre for Translation Studies, University of Surrey, at the end of the year.
Throughout these years, I have always dedicated at least part of my time to professional translation (English / French / German > Portuguese) and proofreading (English / Portuguese).
Research
Research interests
Text Simplification
Translation Studies
Corpus Linguistics
Computational Linguistics
Natural Language Processing
Semantic Role Labeling
Research projects
2018 - Current. Associate Researcher
This project deals with textual and terminological accessibility, mainly focusing on the process of producing scientific texts that can be understood by laymen and people with low literacy skills, especially in a health care environment.
2016 - 2018. Post-doctoral Researcher
Language learning is a process that requires focus and immersion in the target language. In e-learning, where learners are much more autonomous in their learning process, providing an immersive virtual platform that enables them to pursue their own goals is a considerable challenge. In this scenario, it is important that the platform offers both ready-to-use instruments and adaptive resources that can recognize user-developed or user-prompted materials. In this way, learners are not tied to the predeveloped material and can focus on their own interests. Distance learning is a flexible solution for coping with the extreme diversity of language learners interested in vocational training. All this calls for the integration of Artificial Intelligence tools and advanced language technologies. In short, the learning platform needs to become "smarter". This project addresses this scenario through a partnership between the Catholic University of Louvain, specifically the Centre de traitement automatique du langage (CENTAL), and Altissia, an e-learning company that handles Erasmus+ Online Linguistic Support (http://erasmusplusols.eu/).
2014 - 2016. Post-doctoral Researcher
The goal of this project is to investigate and develop techniques, resources and tools for automatic text simplification. The idea is to rewrite texts to make them more accessible and easier to understand for a larger audience. Our focus is on lexical simplification, where more difficult words are replaced by more familiar synonyms, and we are particularly interested in the lexical simplification of complex expressions (such as compound nouns like brain teaser, access road, nut case), which requires a high degree of precision to maintain the original meaning and produce a natural and readable alternative.
Our target groups include children as first or second language learners, and texts in both Brazilian Portuguese and English. The aim is to help reduce the literacy deficit that is an unfortunate reality of a large part of the Brazilian population, by means of resources and tools that facilitate the understanding of a text.
The project Simplification of Complex Expressions is funded by Samsung Eletrônica da Amazônia Ltda. under the terms of Brazilian federal law No. 8.248/91.
Publications
Extracting data and knowledge dispersed across old Portuguese medical records is important, especially for researchers dealing with historical epidemiology and health sciences. An essential task in Natural Language Processing for processing textual information is Named Entity Recognition (NER). In this paper, our main objective is to test the performance of NER systems for Portuguese in extracting information from 18th-century medical texts, so that we can provide an annotated version of an important work of this type.
The advent of AI-supported, cloud-based collaborative translation platforms has enabled a new form of online collaborative translation: ‘concurrent translation’ (CT). CT refers to commercial translation performed on such platforms by multiple agents (translators, editors, subject-matter experts, etc.) simultaneously, via concurrent access. Although the practice has recently gained more ground, research on CT is scarce. The present article reports on selected key findings of a study that investigates translators' experiences with CT via a survey of 804 professional translators working in CT mode across different commercial platforms. Despite affordances such as peer learning, positive competition, speed, flexibility in the volume of work and working time, and reduced responsibility and stress, the CT workflow comes with substantial challenges such as time pressure, negative competition, and reduced self-revision and research, all of which result in quality being compromised for speed.
This paper presents MedSimples, an authoring tool that combines Natural Language Processing, Corpus Linguistics and Terminology to help writers to convert health-related information into a more accessible version for people with low literacy skills. MedSimples applies parsing methods associated with lexical resources to automatically evaluate a text and present simplification suggestions that are more suitable for the target audience. Using the suggestions provided by the tool, the author can adapt the original text and make it more accessible. The focus of MedSimples lies on texts for special purposes, so that it not only deals with general vocabulary, but also with specialized terms. The tool is currently under development, but an online working prototype exists and can be tested freely. An assessment of MedSimples was carried out aiming at evaluating its current performance with some promising results, especially for informing the future developments that are planned for the tool.
Additional publications
- Doctoral Thesis
Zilio, L., 2015. Verblexpor: um recurso léxico com anotação de papéis semânticos para o português. 196 f. (Doctoral thesis - Doutorado em Letras – Universidade Federal do Rio Grande do Sul). [PDF]
- Master's Dissertation
Zilio, L., 2009. Colocações especializadas e 'Komposita': um estudo contrastivo alemão-português na área de cardiologia. (Master's dissertation - Mestrado em Letras – Universidade Federal do Rio Grande do Sul). [PDF]
- Papers in Journals
Zilio, L., Ramisch, C. and Finatto, M.J.B., 2013. Desenvolvimento de um recurso léxico com papéis semânticos para o português. Linguamática, 5(2), pp.23-41. [PDF]
Zilio, L., 2012. Colocações especializadas em alemão e português na área de Cardiologia. Tradterm, 20, pp.146-177. [PDF]
Zilio, L., 2011. Termo e valor linguístico: uma abordagem ensaística. Cadernos do IL, (42), pp.119-128. [PDF]
Zilio, L., Fichtner, M.L.F. and Finatto, M.J.B., 2006. Resíduos e Abfälle: um reconhecimento terminológico para a busca de equivalências entre o português e o alemão. Tradterm, 12, pp.269-292. [PDF]
- Book
Finatto, M. and Zilio, L., 2015. Textos e termos para Lothar Hoffmann: um convite para o estudo das linguagens técnico-científicas. Porto Alegre, Editora Pallotti/FAPERGS. [No PDF Available]
- Book Chapters
Wilkens, R., Zilio, L. and Fairon, C., 2018, March. Document Ranking Applied to Second Language Learning. In European Conference on Information Retrieval (pp. 618-624). Springer, Cham. [Springer Link]
Zilio, L., Wilkens, R. and Fairon, C., 2018, September. SMILLE for Portuguese: Annotation and Analysis of Grammatical Structures in a Pedagogical Context. In International Conference on Computational Processing of the Portuguese Language (pp. 13-23). Springer, Cham. [Springer Link]
Ramisch, C., Ramisch, R., Zilio, L., Villavicencio, A. and Cordeiro, S., 2018, September. A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese. In International Conference on Computational Processing of the Portuguese Language (pp. 24-34). Springer, Cham. [Springer Link]
Zilio, L., Wilkens, R. and Fairon, C., 2018, September. PassPort: A Dependency Parsing Model for Portuguese. In International Conference on Computational Processing of the Portuguese Language (pp. 479-489). Springer, Cham. [Springer Link]
Zilio, L., Wilkens, R. and Fairon, C., 2017. Enhancing grammatical structures in web-based texts. CALL in a climate of change: adapting to turbulent global conditions, p.345. [PDF]
Wilkens, R., Zilio, L., Ferreira, E. and Villavicencio, A., 2016, July. The Portuguese B2SG: A Semantic Test for Distributional Thesaurus. In International Conference on Computational Processing of the Portuguese Language (pp. 333-339). Springer, Cham. [Springer Link]
Wagner Filho, J.A., Wilkens, R., Zilio, L., Idiart, M. and Villavicencio, A., 2016, July. Crawling by readability level. In International Conference on Computational Processing of the Portuguese Language (pp. 306-318). Springer, Cham. [Springer Link]
Zilio, L., Wilkens, R., Möllmann, L., Wehrli, E., Cordeiro, S. and Villavicencio, A., 2016, July. Joining forces for multiword expression identification. In International Conference on Computational Processing of the Portuguese Language (pp. 233-238). Springer, Cham. [Springer Link]
Zilio, L., Zanette, A. and Scarton, C., 2014. Automatic extraction of subcategorization frames from Portuguese corpora. New Languages Technologies and Linguistic Research: A Two-Way Road, pp.78-96. [No PDF Available]
Zilio, L., 2014, October. Development of a Lexical Resource Annotated with Semantic Roles for Portuguese. In International Conference on Computational Processing of the Portuguese Language (pp. 195-200). Springer, Cham. [Springer Link]
Zilio, L., 2010. Terminologia Textual e Linguística de Corpus: estudo em parceria. Linguagens Especializadas em Corpora, p.128. [PDF]
Zilio, L., 2007. Contraste Alemão-Português de Fraseologias Especializadas em Textos de Cardiologia. Anais do VI Encontro de Lingüística de Corpus. [No PDF Available]
- Papers in Conference Proceedings
Zilio, L., Paraguassu, L.B., Hercules, L.A.L., Ponomarenko, G., Berwanger, L. and Finatto, M.J.B., 2020. A Lexical Simplification Tool for Promoting Health Literacy. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) (pp. 70-76). [PDF]
Paraguassu, L., Zilio, L., Hercules, T.L. and Finatto, M.J.B., 2020. MedSimples: An Automated Simplification Tool for Promoting Health Literacy in Brazil. In DHandNLP@PROPOR (pp. 76-78). [PDF]
Wilkens, R., Zilio, L. and Fairon, C., 2018, May. SW4ALL: a CEFR classified and aligned corpus for language learning. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). [PDF]
Zilio, L., Wilkens, R. and Fairon, C., 2018, May. An SLA corpus annotated with pedagogically relevant grammatical structures. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). [PDF]
Wilkens, R., Zilio, L., Cordeiro, S.R., Paula, F., Ramisch, C., Idiart, M. and Villavicencio, A., 2017. LexSubNC: A dataset of lexical substitution for nominal compounds. In IWCS 2017, 12th International Conference on Computational Semantics, Short papers. [PDF]
Zilio, L., Wilkens, R. and Fairon, C., 2017, September. Using NLP for Enhancing Second Language Acquisition. In RANLP (pp. 839-846). [PDF]
Zilio, L. and Fairon, C., 2017, July. Adaptive system for language learning. In 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT) (pp. 47-49). IEEE. [PDF]
Wilkens, R., Zilio, L., Ferreira, E. and Villavicencio, A., 2016, May. B2SG: a TOEFL-like task for Portuguese. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 3659-3662). [PDF]
Zilio, L., Finatto, M.J.B. and Villavicencio, A., 2016, May. VerbLexPor: a lexical resource with semantic roles for Portuguese. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 2656-2661). [PDF]
Ramisch, C., Cordeiro, S., Zilio, L., Idiart, M. and Villavicencio, A., 2016, August. How naked is the naked truth? a multilingual lexicon of nominal compound compositionality. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 156-161). [PDF]
Wilkens, R., Zilio, L., Ferreira, E., Gonçalves, G. and Villavicencio, A., 2015, November. Tesauros distribucionais para o português: avaliação de metodologias. In Anais do X Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (pp. 131-140). SBC. [PDF]
Zilio, L., 2015, November. Verblexpor: um recurso léxico com anotação de papéis semânticos para o português. In Anais do X Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (pp. 131-140). SBC. [PDF]
Zilio, L., Zanette, A. and Scarton, C., 2012. Extração automática de estruturas de subcategorização a partir de corpora em português. Anais do ELC 2012.
Zilio, L., Prestes, K., Wilkens, R. and Villavicencio, A., 2012. Geração semiautomática de uma ontologia geral de língua comum. Anais do ELC 2012. [PDF]
Schreiner, P., Villavicencio, A., Zilio, L. and Caseli, H.M., 2011. Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques. In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology. [PDF]
Zilio, L., Svoboda, L., Rossi, L.H.L. and Feitosa, R.M., 2011. Automatic extraction and evaluation of MWE. In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology. [PDF]
Prestes, K., Wilkens, R., Zilio, L. and Villavicencio, A., 2011. Extração e Validação de Ontologias a partir de Recursos Digitais. In ONTOBRAS-MOST (pp. 183-188). [PDF]
Finatto, M.J.B., Zilio, L. and Scheeren, F., 2009, August. Artigos científicos de Cardiologia: contraste de macro e microestruturas para caracterização de tipo textual. In V SIGET - Simpósio Internacional de Estudos de Gêneros Textuais. [PDF]