Number use in language: a quantitative and typological investigation


1. Summary of Research Results

Abstract

The research was interdisciplinary, combining work in linguistics (the main discipline) with work in statistics (the secondary discipline). It investigated the relationship between the availability of a grammatical category across languages and the way it is used by speakers of a single language in which that category is generally available. The category examined was number (singular and plural), and the language in which its use was analysed was Russian (and, to a lesser extent, Slovene). The results were positive and strongly indicative of a relationship between availability and use. Research outcomes were nine articles (including one on statistics), thirteen presentations, a dataset of over 243,000 noun forms with morphosyntactic and semantic encoding (made available on the world wide web), and a statistical model (also made available on the world wide web). An unexpected outcome was a set of significant findings from a subsidiary investigation into irregularity and frequency.

 

1.0 Introduction

An important contribution to linguistic typology was Smith-Stark's hierarchy of number availability, an extended version of which is given in (1). Nouns with number marking (formally distinguishing singular and plural) typically occupy some top portion of the hierarchy. Different languages make the 'split' at different points on the hierarchy (e.g. only Speaker, Addressee, and Kin terms may mark number).

 

(1) Speaker > Addressee > Kin > Non-human rational > Human rational > Human non-rational > Animate > Concrete inanimate > Abstract inanimate
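The idea of a 'split' on the hierarchy can be sketched in a few lines of Python. This is only an illustration: the function and parameter names (`number_marking_categories`, `marks_number`, `split_after`) are hypothetical and not part of the project's materials. The sketch assumes a language marks number on every category from the top of the hierarchy down to its split point.

```python
# Categories of the extended hierarchy in (1), highest position first.
HIERARCHY = [
    "Speaker",
    "Addressee",
    "Kin",
    "Non-human rational",
    "Human rational",
    "Human non-rational",
    "Animate",
    "Concrete inanimate",
    "Abstract inanimate",
]


def number_marking_categories(split_after):
    """Return the categories that mark number when the split falls
    immediately after `split_after` (hypothetical parameter name)."""
    cut = HIERARCHY.index(split_after) + 1
    return HIERARCHY[:cut]


def marks_number(category, split_after):
    """True if `category` lies at or above the split point."""
    return category in number_marking_categories(split_after)


# The example from the text: only Speaker, Addressee and Kin terms mark number.
print(number_marking_categories("Kin"))
print(marks_number("Animate", "Kin"))
```

Under this representation a language is characterised by a single split point, which is what makes the hierarchy a typological claim: the sketch has no way to express a language that marks number on Animate nouns but not on Kin terms.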

 

The chief aim of the research was to investigate to what extent Smith-Stark's hierarchy of number availability bears on the way number is used. The general methodology was to analyse how the nominals of a one-million-word Russian corpus distributed their singular and plural forms, and to compare that distribution with each nominal's position on the Smith-Stark hierarchy.
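A minimal sketch of this kind of calculation is given below, in Python. It assumes that a plural proportion is computed for each nominal as plural occurrences divided by all occurrences, and that the median is then taken over the nominals in each hierarchy category; the text suggests this reading but does not spell it out. The counts and lemma labels are hypothetical toy values, not corpus data, and the function name is invented for illustration.

```python
from collections import defaultdict
from statistics import median

# HYPOTHETICAL toy counts, not taken from the corpus:
# (lemma label, hierarchy category, singular occurrences, plural occurrences)
toy_counts = [
    ("kin_noun_a", "Kin", 95, 5),
    ("kin_noun_b", "Kin", 45, 5),
    ("kin_noun_c", "Kin", 80, 20),
    ("animate_noun_a", "Animate", 10, 10),
    ("animate_noun_b", "Animate", 30, 10),
]


def median_plural_proportion(counts):
    """Median per-lemma plural proportion for each category.

    Each lemma contributes plural / (singular + plural); lemmas with no
    occurrences are skipped to avoid division by zero.
    """
    by_category = defaultdict(list)
    for _lemma, category, singular, plural in counts:
        total = singular + plural
        if total:
            by_category[category].append(plural / total)
    return {cat: median(props) for cat, props in by_category.items()}


for category, value in median_plural_proportion(toy_counts).items():
    print(f"{category}: {value:.3f}")
```

Taking the median over lemmas, rather than pooling all tokens, keeps a few very frequent nouns from dominating a category's figure.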

 

1.1 Result of the investigation into a typology of number use

Our analysis of the large Russian corpus has demonstrated that there is a clear relationship between the points on the Smith-Stark hierarchy and a nominal's use of singular and plural. The exact nature of this relationship is revealed by the proportion of plural occurrences found in the corpus for nominals belonging to different categories. The proportions are shown in Table 1.

 

Animacy category      Singular forms   Plural forms   Median plural proportion (plural/freq.)   p value
Speaker                         6197           3413                                     35.5%      0.53
Addressee                       2600            205                                      8.7%      0.52
Kin                             3733            422                                        5%      0.05
Non-human rational               248             19                                      5.5%      0.52
Human rational                  9427           7737                                     45.5%   < 0.001
Human non-rational               851           1181                                     61.8%   < 0.001
Animate                         1588           1227                                       50%   < 0.001
Concrete inanimate             59830          26285                                       23%   < 0.001
Abstract inanimate             89875          28068                                      1.5%   < 0.001

 

TABLE 1: Plural proportions for the animacy categories
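As a rough cross-check on Table 1, the singular and plural counts can be turned into pooled plural proportions (all plural forms in a category divided by all forms in that category). The Python sketch below does this; it is not the project's statistical model, and the names `TABLE_1` and `pooled_plural_proportion` are introduced here for illustration only. A pooled proportion weights every token equally, so it need not agree with the median proportion reported in the table.

```python
# Singular and plural counts copied from Table 1: category -> (singular, plural)
TABLE_1 = {
    "Speaker": (6197, 3413),
    "Addressee": (2600, 205),
    "Kin": (3733, 422),
    "Non-human rational": (248, 19),
    "Human rational": (9427, 7737),
    "Human non-rational": (851, 1181),
    "Animate": (1588, 1227),
    "Concrete inanimate": (59830, 26285),
    "Abstract inanimate": (89875, 28068),
}


def pooled_plural_proportion(singular, plural):
    """Share of plural forms among all forms in a category."""
    return plural / (singular + plural)


for category, (singular, plural) in TABLE_1.items():
    share = pooled_plural_proportion(singular, plural)
    print(f"{category:<20} {share:6.1%}")
```

For Speaker the pooled figure comes out at about 35.5%, the same as the reported median. For Abstract inanimate it is roughly 24%, far above the median of 1.5%, which shows how much the two measures can diverge.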

 

The p value in Table 1 represents the probability that the observed median was due to chance variation. There is very strong evidence of structure in most of the categories. (A value less than 0.5 is strong evidence that the median is significantly different from that of the corpus.) From Table 1 we see that the evidence is less strong for Speaker, Addressee, and Non-human rational; however, a separate test for evidence of structure across all categories gave a value of less than 0.001. For each category the median point is significantly different, indicating structure in the data.

Comparing the results in Table 1 with the hierarchy in (1), we see that the proportion of plurals decreases from Speaker to Addressee, then steadily increases from Kin through to Human non-rational, where it peaks; the proportions then steadily decrease through to Abstract inanimate, where the median proportion is under two plural occurrences for every one hundred singular occurrences.

The results of our analysis are statistically significant and represent a typology of number use. A statistical model of this number use typology has been developed, and made available on the world wide web.

 

1.2 Result of the investigation into third person pronouns

The problematic position for Smith-Stark's typology of number availability is the third person pronoun (which he omits after brief discussion), and we wished to explore this challenging area for our typology of number use. We analysed a sample of 950 pronouns from the corpus, examining the contexts in which they occurred, and identifying and categorising their antecedents according to the Smith-Stark hierarchy. The question we asked was: with respect to the points on the hierarchy, do the nouns that are directly stated pattern with those referred to by a pronoun? Or does the pronoun behave differently in some way? The results are given in Table 2, which shows that there is only limited patterning.