A DATR Theory of Russian Morphology

ESRC Project R000233633

(Completed September 1995; Rated 'Outstanding')

Introduction

The aim of this project was to model a significant proportion of the inflectional morphology (word-structure) of a relatively complex language, Russian, expressing the analysis formally using the lexical knowledge representation language DATR (Evans and Gazdar 1989a, 1989b, 1995; Keller 1995).

The project began in September 1992 and ran for three years. It was managed by Professor Greville Corbett and Dr Norman Fraser, and employed Dunstan Brown as a Research Fellow.

The project set out to combine earlier work by Corbett on syncretism, inflectional features, and the status and number of paradigms in Russian (Corbett 1981; 1982), together with work on gender (Corbett 1991), with formal modelling of natural language structures using default inheritance, an area in which one of the researchers has considerable experience (Andry et al 1992; Fraser and Hudson 1992).

Objectives

The project had five stated objectives.

  1. Encoding in machine readable form a substantial fragment of Russian inflectional morphology

    We concentrated on particularly difficult issues of the inflectional system: the organisation of Russian paradigms, gender assignment, the problematic case of the Russian genitive plural of nouns whose stems end in a 'soft' (palatalised or palatoalveolar) consonant, the question of related verbal stems, and the issue of whether verbs which are paired aspectually are separate lexemes (syntactic words).

    Once we had developed theoretical solutions to the problematic cases, we set about developing lexicons based on frequency information from Zasorina's (1977) frequency dictionary. Accounting for the most frequent items means accounting for the most irregular, as irregular items are nearly always among the most frequent.

    The objective has been met by creating lexicons of the first 1500 most frequent nouns and 700 most frequent verbs and these, combined with the theoretical fragment, have been checked computationally.

  2. Comparing inheritance hierarchies with the more traditional organisation in terms of morphological paradigms

    This was a long-term goal of the project. The treatment of declension classes as nodes in an inheritance hierarchy contrasts strongly with the traditional notion of paradigms as discrete entities which do not share information. When default inheritance hierarchies are used to model word structure, it becomes clear that a great deal of information is shared.

  3. Demonstrating new insights into specific, well-established problem areas of Russian morphology

    Difficult and challenging areas of Russian have been modelled by the researchers using DATR: animacy (Corbett and Fraser, 1993), gender and animacy assignment (Fraser and Corbett, 1995), conflict in genitive plural assignment (Brown and Hippisley 1994), the nominal stress system (Brown et al, to appear) and the stem structure of the Russian verb (Brown, to appear).

  4. Evaluation of the usability and utility of DATR as a tool for linguists

    As the researcher had never worked with DATR before, the project provided a good test of its value for theoretical linguists. During the initial stages of familiarisation a diary was kept, and the insights gained from this have been translated into a document entitled DATR for Linguists.

  5. The generation of design recommendations for improving the usability and utility of DATR for linguists

    Dr Fraser has written a document entitled Practical DATR which contains a number of recommendations for modifications and changes to DATR. This document has arisen as a result of close observation of conceptual and practical problems encountered by the researcher during his period of familiarisation.
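
The contrast drawn under objective 2, between paradigms as self-contained tables and declension classes as nodes that share information by default, can be illustrated with a small sketch. The Python below is not the project's DATR theory: it is a toy default-inheritance network in which the node names (NOUN, N_O, N_I, N_II, N_IV), the path names and the choice of example stems are assumptions made for illustration, and only a handful of transliterated endings are included.

```python
# Toy default-inheritance network. Illustrative only: this is not the
# project's DATR theory, and the node and path names are assumptions.
from typing import Dict, Optional


class Node:
    """A node that states some facts locally and inherits the rest."""

    def __init__(self, name: str, parent: Optional["Node"] = None,
                 **facts: str) -> None:
        self.name = name
        self.parent = parent
        self.facts: Dict[str, str] = dict(facts)

    def lookup(self, path: str) -> str:
        """Return the local value for path, else defer to the parent."""
        if path in self.facts:
            return self.facts[path]
        if self.parent is None:
            raise KeyError(f"{self.name} has no value for {path}")
        return self.parent.lookup(path)


# Oblique plural endings are stated once, for all nouns.
NOUN = Node("NOUN", dat_pl="am", ins_pl="ami", loc_pl="ax")
# Classes I and IV share their oblique singular endings.
N_O = Node("N_O", NOUN, gen_sg="a", dat_sg="u", ins_sg="om", loc_sg="e")
N_I = Node("N_I", N_O, nom_sg="")
N_IV = Node("N_IV", N_O, nom_sg="o")
# Class II states its own singular endings but still inherits the plural.
N_II = Node("N_II", NOUN, nom_sg="a", gen_sg="y", dat_sg="e",
            ins_sg="oj", loc_sg="e")


def form(stem: str, inflection_class: Node, path: str) -> str:
    """Build a word form from a stem and the ending found by inheritance."""
    return stem + inflection_class.lookup(path)


if __name__ == "__main__":
    for stem, cls in [("zakon", N_I), ("vin", N_IV), ("komnat", N_II)]:
        print(cls.name, form(stem, cls, "gen_sg"), form(stem, cls, "dat_pl"))
```

In this sketch the dative plural ending is stored only at the top node, yet every class can produce it; a table-per-paradigm encoding would have to repeat it in each class.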

Outcomes

  1. A sizeable fragment of Russian inflectional morphology

    DATR lexicons for the first 1500 nouns and first 700 verbs from Zasorina (1977) have been developed. For nouns the combined lexicon and theoretical fragment give over 22,500 (15 x 1500) lines of information, providing 18,000 (12 x 1500) grammatical words, and 4,500 (3 x 1500) lines of agreement information (information as to gender and animacy, together with a gloss). For verbs there is information on 11,900 grammatical words (17 x 700) plus information on aspect and meaning, making a total of 13,300 lines of information (19 x 700).

    2200 of the most frequent lexical items have been covered, a significant fragment of Russian inflectional morphology.

  2. A set of recommendations for revisions of DATR

    A document entitled Practical DATR, based on the experience of the researcher, has been written by Dr Fraser. It makes a number of recommendations for possible improvements to DATR.

  3. An account of the value of DATR for theoretical linguistics

    A document entitled DATR for Linguists discusses important conceptual issues that the researchers on this and a related project have had to confront when working with DATR. It also adumbrates possible theoretical considerations for those wishing to represent their theories formally using DATR. The document is available from the authors or online.

  4. Dissemination

    Ten publications have appeared or are about to appear in refereed journals or as book sections. Thirty-nine presentations have been made at national and international venues. The DATR fragments and lexicons, together with theorem dumps, are available online.
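
The totals reported under outcome 1 follow from the per-lexeme counts given there. As a quick check, the short Python sketch below recomputes them; it uses only figures stated in the report, the variable names are illustrative, and the figure of two extra lines per verb (aspect and meaning) is inferred from the difference between 19 and 17.

```python
# Recompute the totals quoted under outcome 1. Variable names are illustrative.
NOUNS, VERBS = 1500, 700

NOUN_FORMS = 12        # grammatical words per noun
NOUN_AGREEMENT = 3     # gender, animacy and gloss
VERB_FORMS = 17        # grammatical words per verb
VERB_EXTRA = 2         # aspect and meaning (inferred: 19 - 17)

noun_words = NOUNS * NOUN_FORMS
noun_agreement = NOUNS * NOUN_AGREEMENT
noun_lines = noun_words + noun_agreement

verb_words = VERBS * VERB_FORMS
verb_lines = VERBS * (VERB_FORMS + VERB_EXTRA)

lexical_items = NOUNS + VERBS

if __name__ == "__main__":
    print("noun grammatical words:", noun_words)
    print("noun agreement lines:", noun_agreement)
    print("noun lines in total:", noun_lines)
    print("verb grammatical words:", verb_words)
    print("verb lines in total:", verb_lines)
    print("lexical items covered:", lexical_items)
```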

 

Conclusion

Using DATR to represent our analyses of Russian has enabled us to take a fresh look at problem areas of Russian: gender assignment; inflection classes; the genitive plural; verbal stems. It has allowed us to start to develop a theoretical framework, Network Morphology, which constrains possible DATR representations. A number of informal principles of this framework have guided the computational modelling of Russian morphology using DATR. These include the assumption that morphology is a network of hierarchies, a distinction between a lexemic hierarchy and an inflectional hierarchy and the treatment of inflection classes as nodes in the inflectional hierarchy.

Future work will concentrate on the development of principles which constrain relations between the hierarchies. Because these principles are being developed for a relatively complex language, Russian, it is reasonable to assume that they might well carry over to the representation of other, typologically diverse languages.

References

Andry, Francois, Norman Fraser, Scott McGlashan, Simon Thornton and Nick Youd. 1992. Making DATR work for speech: lexicon compilation in SUNDIAL. Computational Linguistics 18. 245-267.

Brown, Dunstan. To appear. Setevaja morfologija i russkaja glagol'naja sistema. To appear in Vestnik Moskovskogo Universiteta.

Brown, Dunstan and Andrew Hippisley. 1994. Conflict in Russian Genitive Plural Assignment: A Solution Represented in DATR. Journal of Slavic Linguistics 2, 1 (winter - spring). 48-76.

Brown, Dunstan, Greville G. Corbett, Norman M. Fraser, Andrew Hippisley and Alan Timberlake. To appear. Russian Noun Stress and Network Morphology. To appear in Linguistics 34, No 1 (1996).

Corbett, Greville G. 1981. Syntactic Features. Journal of Linguistics 17. 55-76.

Corbett, Greville G. 1982. Gender in Russian: an account of gender specification and its relationship to declension. Russian Linguistics 6. 197-232.

Corbett, Greville G. 1991. Gender. Cambridge: Cambridge University Press.

Corbett, Greville G. and Norman M. Fraser. 1993. Network Morphology: a DATR account of Russian nominal inflection. Journal of Linguistics 29. 113-42.

Evans, Roger and Gerald Gazdar. 1989a. Inference in DATR. Proceedings of the 4th Conference of the European Chapter of the Association for Computational Linguistics, 66-71. Manchester, England.

Evans, Roger and Gerald Gazdar. 1989b. The semantics of DATR. In A. G. Cohn (ed.) Proceedings of the Seventh Conference of the Society for the Study of Artificial Intelligence and Simulation of Behaviour, 79-87. London: Pitman/Morgan Kaufmann.

Evans, Roger and Gerald Gazdar. 1995. DATR: A Language For Lexical Knowledge Representation, University of Sussex. CSRP 382.

Fraser, Norman M. and Richard A. Hudson. 1992. Inheritance in Word Grammar. Computational Linguistics 18. 133-58.

Fraser, Norman M. and Greville G. Corbett. 1995. Gender, animacy and declensional class assignment: a unified account for Russian. In: Geert Booij and Jaap van Marle (eds.) Yearbook of Morphology 1994. 123-50. Dordrecht: Kluwer.

Keller, Bill. 1995. DATR Theories and DATR Models. To appear in: Proceedings of the Association for Computational Linguistics '95.

Zasorina, L. N. 1977. Častotnyj slovar' russkogo jazyka. Moscow: Russkij jazyk.

 

Outputs

Corbett, Greville G. and Norman M. Fraser. 1993. Network Morphology: a DATR account of Russian nominal inflection. Journal of Linguistics 29. 113-42.

Brown, Dunstan and Andrew Hippisley. 1994. Conflict in Russian Genitive Plural Assignment: A Solution Represented in DATR. Journal of Slavic Linguistics 2, 1. 48-76.

Corbett, Greville G. 1994. Systems of Grammatical Number in Slavonic. Slavonic and East European Review 72, 2. 201-217.

Fraser, Norman M. and Greville G. Corbett. 1995. Gender, animacy and declensional class assignment: a unified account for Russian. In: Geert Booij and Jaap van Marle (eds.) Yearbook of Morphology 1994. 123-150. Dordrecht: Kluwer.

Corbett, Greville G. and Norman M. Fraser. 1995. Vyčislitel´naja lingvistika i tipologija/Computational linguistics meets typology. [Abstract] In: A. E. Kibrik, I. M. Kobozeva, A. I. Kuznecova, T. B. Nazarova (eds) Lingvistika na isxode XX veka: itogi i perspektivy: Tezisy meždunarodnoj konferencii: tom I, 256-258. Moscow: Filologičeskij fakul´tet MGU imeni M. V. Lomonosova.

Brown, Dunstan. 1995. Setevaja morfologija i russkij glagol/Network Morphology and the Russian verb. [Abstract] In: A. E. Kibrik, I. M. Kobozeva, A. I. Kuznecova, T. B. Nazarova (eds) Lingvistika na isxode XX veka: itogi i perspektivy: Tezisy meždunarodnoj konferencii: tom I, 74-76. Moscow: Filologičeskij fakul´tet MGU imeni M. V. Lomonosova.

Brown, Dunstan, Greville G. Corbett, Norman M. Fraser, Andrew Hippisley and Alan Timberlake. 1996. Russian Noun Stress and Network Morphology. Linguistics 34, 53-107.

Brown, Dunstan. 1995. Setevaja morfologija i russkaja glagol´naja sistema. Vestnik Moskovskogo Universiteta, ser. 9. Filologija no. 6. 91-108.