
Dr Sacha Beniamine
About
Biography
I am a British Academy Newton International Fellow in the Surrey Morphology Group (SMG), a research centre based in the School of Literature & Languages. Before joining Surrey in February 2012, I was a post-doctoral researcher at the Department of Linguistic and Cultural Evolution (DLCE) of the Max Planck Institutes in Jena and Leipzig. I completed my PhD at the Laboratoire de Linguistique Formelle (LLF) of the University of Paris.
Research
Research interests
My research in computational linguistics focuses on language evolution and typology. The title of my current project is: Solving the word puzzle: morphological analysis beyond stem and affixes. I see computational tools as an opportunity to systematize linguistic analyses, a way to study large amounts of data precisely, and a necessary methodological step towards typological investigation.
Before joining the SMG, I was a post-doctoral researcher at the DLCE (MPG EVA), where I worked on inflectional lexicons, evolutionary models of inflectional paradigms and sound correspondence. During my PhD, I studied the typological variation of inflection classes (declensions or conjugations) using computational methods.
Research projects
Solving the word puzzle: morphological analysis beyond stem and affixes
In the few milliseconds necessary for speakers to say a word and for listeners to understand it, they both make several elaborate deductions. The internal structure of words can be a crucial source of information for these deductions, particularly when words have multiple grammatical forms, a process known as inflection. Across languages, the nature and number of contrasts expressed through inflection can vary greatly. While a language such as English has only a handful of grammatical distinctions, some languages can have up to thousands. Moreover, these distinctions can be manifested by diverse intricate sound contrasts. For example, the verbal system of English would be simple if all verbs conformed to the pattern of jump~jumped, which can be neatly segmented into a stem (jump) and affixes (-ed). But across languages, many words behave more like the pair think~thought, which resists segmentation. In many languages, layers of regularity and idiosyncrasy further complicate the matter. Understanding the puzzling complexity of inflection is essential to explain the structure and evolution of the world's languages. Yet, linguistics still lacks a consistent, predictable methodology to study inflection.
To assess inflectional complexity across languages, this project investigates word structures across typologically diverse languages, using quantitative, computational tools.
Current studies in this area have two main – but related – shortcomings. First, they often start from pre-analysed paradigms, where forms have been segmented by hand into stems (removed from the data) and affixes. These affixal tables are not commensurate across languages. Second, studies focus on assessing how difficult it is for speakers to predict forms for a given meaning, and ignore the parallel problem of deducing the grammatical meaning of a given form. This question is key to automating word structure analysis.
The project remedies both by providing data, developing computational tools to analyse inflected words, and studying the organisation of inflectional exponence. We work on gathering, digitising, and standardising inflectional lexicons, coordinating with the international morphology community to spread the use of common standards and ensure interoperability. To solve the long-standing Segmentation Problem, we write computational tools which focus on characterizing gradient information in words. Finally, our goal is to build a quantitative typology of inflected word structure.
Publications
Recent literature has highlighted the extent to which inflectional paradigms are organised into systems of implications allowing speakers to make full use of the inflection system on the basis of exposure to only a few forms of each word. The present paper contributes to this line of research by investigating in detail the implicative structure of European Portuguese verbal paradigms. After outlining the computational methods we use to that effect, we deploy these methods on a lexicon of about 5000 verbs, and show how the morphological and phonological properties of European Portuguese verbs lead to the observed patterns of predictability.
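As an illustration only, and not the method used in the paper, the idea of predictability between paradigm cells can be sketched as a conditional entropy: how uncertain is one cell's ending once another cell's ending is known? The four verbs, the use of orthographic rather than phonemic endings, and the function name conditional_entropy are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations."""
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (x, y), n in joint.items():
        # p(x, y) * log2 p(y | x), summed with a minus sign
        h -= (n / total) * log2(n / marginal[x])
    return h


# Toy data (assumption): orthographic endings of four regular verbs,
# as (infinitive ending, 1sg present ending).
verbs = {
    "falar": ("ar", "o"),
    "cantar": ("ar", "o"),
    "comer": ("er", "o"),
    "partir": ("ir", "o"),
}

inf_to_sg = [(inf, sg) for inf, sg in verbs.values()]
sg_to_inf = [(sg, inf) for inf, sg in verbs.values()]

print(conditional_entropy(inf_to_sg))  # 0.0: the 1sg ending is fully predictable
print(conditional_entropy(sg_to_inf))  # 1.5: the infinitive ending is not
```

In this toy sample the implication is asymmetric: knowing the infinitive leaves no uncertainty about the first person singular, while the reverse direction leaves 1.5 bits.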
Stem allomorphy plays a central role in the recent history of morphology, in no small part thanks to a research program initiated by Aronoff (1994). Yet, there is no agreed-upon way of deciding whether some bit of form should be considered a proper part of a stem allomorph or an independent exponent. We explore the possibility of just doing away with the notion of stem allomorphy in inflection. We use computational methods to identify within each word a sequence of strings that do not take part in any alternation within that word’s paradigm. We then discuss the relationship of such sequences to the classical notion of a stem, and argue that discontinuous stems are both conceptually and empirically more satisfactory.
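A rough way to picture material that does not alternate within a paradigm is a subsequence shared by all of its forms. The sketch below is a simplification under stated assumptions, not the procedure of the paper: it works on English spellings rather than phonemic transcriptions, and it folds a pairwise longest common subsequence over the forms, which yields a common subsequence but not necessarily the longest one shared by all forms. The function names lcs and shared_material are hypothetical.

```python
from functools import reduce


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two strings (dynamic programming)."""
    # table[i][j] holds the LCS length of a[i:] and b[j:]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def shared_material(forms):
    """Fold pairwise LCS over all forms of one paradigm."""
    return reduce(lcs, forms)


print(shared_material(["jump", "jumped", "jumps"]))  # jump
print(shared_material(["sing", "sang", "sung"]))     # sng
```

For sing~sang~sung the shared material is s, n, g with a gap where the vowel alternates, which is the kind of discontinuous sequence the abstract refers to.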
Most models of inflectional morphology rely at their core on the identification of recurrent and diverging material across inflected forms. Across theoretical frameworks, this can be expressed in terms of morpheme segmentation, rules, processes, patterns or analogies. Finding these recurrences in large structured lexicons is an important step in empirical computational morphology, where analyses are induced bottom-up from inflected forms. This can be done by aligning all the forms in each paradigm, a Multiple Sequence Alignment task which is well known in other fields such as evolutionary biology and historical linguistics. In this paper, we present the specific problems which arise when aligning inflected forms, provide a simple alignment format, define evaluation measures and compare two implemented methods on 13 inflectional lexicons. Our intent is to provide the conditions for the interoperability of future systems, and for incremental improvements in this fundamental step for quantitative morphology.
This is a collection of European Portuguese verbal paradigms, in phonemic notation. They are suited for both computational and manual analysis.
This paper discusses the nature of inflection classes (ICs) and provides a fully implemented methodology to conduct typological investigations into their structure. ICs (conjugations or declensions) are sets of lexemes which inflect similarly. They are often described as partitioning the set of lexemes, but similarities across classes lead some authors to favor hierarchical descriptions. While some formalisms allow for multiple inheritance, where one class takes after two or more others, it is usually taken as an exceptional situation. I submit that the structure of ICs is a typological property of inflectional systems. As a result, ICs are best modelled as semi-lattices, which by design capture non-canonical phenomena. I show how these monotonous multiple inheritance hierarchies can be inferred automatically from raw paradigms using alternation patterns and formal concept analysis. Using quantitative measures of canonicity, I compare six inflectional systems and show that multiple inheritance is in fact pervasive across inflectional systems.
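For readers unfamiliar with formal concept analysis, the following is a minimal sketch of the general technique, not the implementation described in the paper. It takes a table of which classes show which alternation patterns and enumerates every formal concept, that is, every maximal pairing of a set of classes with the set of patterns they all share. The class names, the pattern labels p1 to p3, and the function name concepts are hypothetical.

```python
def concepts(context):
    """Return all formal concepts (extent, intent) of a context.

    `context` maps each object to the set of attributes it has.
    """
    objects = list(context)
    all_attrs = frozenset().union(*context.values())
    # Intents are exactly the intersections of object rows,
    # plus the full attribute set (the empty intersection).
    intents = {all_attrs}
    for obj in objects:
        row = frozenset(context[obj])
        intents |= {row & intent for intent in intents}
    result = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent
        extent = frozenset(o for o in objects if intent <= context[o])
        result.append((extent, intent))
    return result


# Toy context (assumption): three classes, each sharing one pattern
# with each of the other two.
toy = {
    "class_A": {"p1", "p2"},
    "class_B": {"p2", "p3"},
    "class_C": {"p1", "p3"},
}

for extent, intent in sorted(concepts(toy), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

In this toy context each class sits directly under two more general nodes (class_A under both the p1 node and the p2 node, for instance), a small picture of the multiple inheritance the abstract discusses.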