Analysis of Estonian external locative cases in semi-spontaneous speech using an automatic transcription system

Keywords: corpus linguistics, semi-spontaneous speech, external cases, Estonian language

We proceed from the tenets of usage-based linguistics, which stresses the importance of studying language use, especially quantitative frequency measures, in order to make (qualitative) inferences about linguistic knowledge. We focus on the frequency counts of the Estonian external locative cases (allative, adessive, ablative) and their different functions in semi-spontaneous everyday speech. The automatic transcription system applied in the present study can be accessed free of charge via a web application. In our study we used recordings of 2,681 radio broadcasts containing 15,318,158 transcribed words in total. The average word error rate of the automatic transcription system is around 9%. The transcriptions were morphologically analysed with EstNLTK 1.6 using Vabamorf. For the analysis of external locative cases, we extracted data about nouns (S) and pronouns (P) whose word form included one of the following tags: [sg all], [pl all], [sg ad], [pl ad], [sg abl], [pl abl]. The second part of the study focused on the different functions of the locative cases. Since the functions need to be annotated manually, a smaller sample was analysed as a pilot study (15 broadcasts, 101,575 transcribed words in total).
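The extraction step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the morphological analysis has already been exported as (word, part of speech, form) triples, which is a hypothetical layout chosen for illustration, and that the form tags appear as plain strings without the square brackets. The function name and the toy tokens are likewise invented.

```python
from collections import Counter

# Form tags for the external locative cases, as listed in the text
# (written here without the square brackets; an assumption about the export).
EXTERNAL_CASE_TAGS = {
    "sg all": "allative", "pl all": "allative",
    "sg ad": "adessive", "pl ad": "adessive",
    "sg abl": "ablative", "pl abl": "ablative",
}

# Only nouns (S) and pronouns (P) are counted, as in the study.
NOMINAL_POS = {"S", "P"}


def count_external_cases(tokens):
    """Count external locative case forms among nouns and pronouns.

    `tokens` is an iterable of (word, pos, form) triples; this layout
    is hypothetical and stands in for real analyser output.
    """
    counts = Counter()
    for _word, pos, form in tokens:
        if pos in NOMINAL_POS and form in EXTERNAL_CASE_TAGS:
            counts[EXTERNAL_CASE_TAGS[form]] += 1
    return counts


# Toy sample invented for illustration; not data from the study.
sample = [
    ("laual", "S", "sg ad"),       # noun, adessive
    ("mulle", "P", "sg all"),      # pronoun, allative
    ("sõpradelt", "S", "pl abl"),  # noun, ablative
    ("ilusal", "A", "sg ad"),      # adjective: skipped (not S or P)
    ("majas", "S", "sg in"),       # inessive: skipped (internal case)
    ("meil", "P", "pl ad"),        # pronoun, adessive
]

counts = count_external_cases(sample)
for case, n in counts.most_common():
    print(case, n)
```

Singular and plural tags are merged into one count per case here, since the overall ranking is reported per case; keeping them apart would only require using the form tag itself as the counter key.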

The results of the present study confirm earlier findings about the overall frequency of the external locative cases. The most frequent case is the adessive (it ranks fourth among all Estonian cases), followed by the allative and then the ablative (the latter is among the four least frequently used Estonian cases). Very broadly speaking, our study indicates that external cases are used less frequently in semi-spontaneous speech than in newspaper texts and fiction. As for the different functions expressed by the external locative cases, we were interested in the overall proportion of uses in which the cases express a spatial relation. As expected, the ablative has the highest proportion of spatial uses (58% of 172 uses), followed by the adessive (18% of 1,687 uses) and the allative (15% of 990 uses). For the adessive and the allative, expressing a spatial relation is clearly not the most frequent function. These two cases carry other important functional loads in Estonian, e.g. expressing the experiencer, addressee or possessor. Very broadly, it seems that expressing a spatial relation is proportionally more frequent for the adessive than for the allative. Overall, our study presents a number of frequency counts pertaining to the use of the Estonian external locative cases, which can serve as input for further qualitative studies. Based on the results of the present study, we confirm that automatic transcription of recorded speech combined with automatic morphological analysis of the transcriptions is accurate enough to serve as a basis for studying Estonian morphosyntax in semi-spontaneous speech.


Jane Klavan (b. 1983), PhD, University of Tartu, Faculty of Arts and Humanities, College of Foreign Languages and Cultures, Lecturer in English Language (Lossi 3, 51003 Tartu)

Tanel Alumäe (b. 1976), PhD, Tallinn University of Technology, School of Information Technologies, Department of Software Science, Senior Researcher (Akadeemia tee 21B, 12618 Tallinn)

Arvi Tavast (b. 1969), PhD, Institute of the Estonian Language, Director (Roosikrantsi 6, 10119 Tallinn)



EstNLTK. Free software for processing Estonian-language texts.

Frequency list of parts of speech and frequency lists of the grammatical categories of declinable words, based on the Balanced Corpus.

Vabamorf. Morphological analyser for Estonian.

Web-based speech recognition.



