Analysis of Estonian external locative cases in semi-spontaneous speech using an automatic transcription system


Keywords: corpus linguistics, semi-spontaneous speech, external cases, Estonian language

We proceed from the tenets of usage-based linguistics, which stresses the importance of studying language use, especially quantitative frequency measures, in order to make (qualitative) inferences about linguistic knowledge. We focus on the frequency counts of the Estonian external locative cases (allative, adessive, ablative) and their different functions in semi-spontaneous everyday speech. The automatic transcription system used in the present study can be accessed free of charge via the web application http://bark.phon.ioc.ee/webtrans. We used recordings of 2,681 radio broadcasts containing 15,318,158 transcribed words in total. The average word error rate of the automatic transcription system is around 9%. The transcriptions were morphologically analysed with EstNLTK 1.6 using Vabamorf. For the analysis of the external locative cases, we extracted data on nouns (S) and pronouns (P) whose word form carried one of the following tags: [sg all], [pl all], [sg ad], [pl ad], [sg abl], [pl abl]. The second part of the study focused on the different functions of the locative cases. Since the functions have to be annotated manually, a smaller sample was analysed as a pilot study (15 broadcasts, 101,575 transcribed words in total).
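The extraction step described above can be illustrated with a minimal sketch. It assumes that the morphological analyses have already been exported as (part of speech, form tag) pairs, with the tags written without square brackets; the function name, the data layout and the toy input are hypothetical and are not taken from the study or from the EstNLTK API.

```python
from collections import Counter

# Form tags named in the text, mapped to the case they mark.
CASE_OF_TAG = {
    "sg all": "allative", "pl all": "allative",
    "sg ad": "adessive", "pl ad": "adessive",
    "sg abl": "ablative", "pl abl": "ablative",
}
TARGET_POS = {"S", "P"}  # nouns (S) and pronouns (P)


def count_external_cases(analyses):
    """Count external locative case forms among (pos, form) pairs.

    `analyses` is an assumed, simplified export format: one
    (part-of-speech tag, form tag) pair per analysed word.
    """
    counts = Counter()
    for pos, form in analyses:
        if pos in TARGET_POS and form in CASE_OF_TAG:
            counts[CASE_OF_TAG[form]] += 1
    return counts


# Hypothetical toy input, not data from the study.
sample = [
    ("S", "sg ad"),   # noun, adessive singular
    ("P", "sg all"),  # pronoun, allative singular
    ("S", "pl abl"),  # noun, ablative plural
    ("V", "b"),       # verb: ignored
    ("S", "sg n"),    # noun in the nominative: ignored
    ("P", "sg ad"),   # pronoun, adessive singular
]
counts = count_external_cases(sample)
```

Keeping the tag-to-case mapping in a single dictionary means singular and plural forms are pooled per case, which matches the case-level frequency counts discussed in the text.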

The results of the present study confirm earlier findings about the overall frequency of use of the external locative cases. The most frequent case is the adessive (it ranks fourth among all Estonian cases), followed by the allative and then the ablative (the latter is among the four least frequently used Estonian cases). Broadly speaking, our study indicates that external cases are used less frequently in semi-spontaneous speech than in newspaper texts and fiction. As for the different functions expressed by the external locative cases, we were interested in the overall proportion of uses in which the cases express a spatial relation. As expected, the ablative has the highest proportion of spatial uses (58% of 172 uses), followed by the adessive (18% of 1,687 uses) and the allative (15% of 990 uses). For the adessive and the allative, expressing a spatial relation is thus not the most frequent function. These two cases carry other important functional loads in Estonian, e.g. expressing the experiencer, addressee or possessor. Expressing a spatial relation appears to be proportionally somewhat more frequent for the adessive than for the allative. Overall, our study presents a number of frequency counts pertaining to the use of the Estonian external locative cases, which can serve as input for further qualitative studies. Based on the results of the present study, we confirm that automatic transcription of recorded speech combined with automatic morphological analysis of the transcriptions is accurate enough to serve as a basis for studying Estonian morphosyntax in semi-spontaneous speech.
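As a rough illustration of the arithmetic behind these proportions, the sketch below derives the approximate number of spatial uses implied by each reported percentage and ranks the cases by spatial share. The totals and percentages are the ones quoted in the text; because the percentages are rounded, the implied counts are approximations only, and the variable and function names are hypothetical.

```python
# Figures quoted in the text: (total annotated uses, share of spatial uses).
reported = {
    "ablative": (172, 0.58),
    "adessive": (1687, 0.18),
    "allative": (990, 0.15),
}


def approx_spatial_uses(total, share):
    """Approximate count implied by a rounded percentage."""
    return total * share


# Implied number of spatial uses per case (approximate, unrounded).
implied = {
    case: approx_spatial_uses(total, share)
    for case, (total, share) in reported.items()
}

# Cases ordered from the highest to the lowest spatial share.
ranking = sorted(reported, key=lambda case: reported[case][1], reverse=True)

# Total number of annotated uses across the three cases.
total_uses = sum(total for total, _ in reported.values())
```

Under these figures, the ablative has by far the fewest uses overall but the largest spatial share, whereas the adessive contributes the most spatial uses in absolute terms.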


Jane Klavan (b. 1983), PhD, University of Tartu, Faculty of Arts and Humanities, College of Foreign Languages and Cultures, Lecturer in English Language (Lossi 3, 51003 Tartu), jane.klavan@ut.ee

Tanel Alumäe (b. 1976), PhD, Tallinn University of Technology, School of Information Technologies, Department of Software Science, Senior Researcher (Akadeemia tee 21B, 12618 Tallinn), tanel.alumae@taltech.ee

Arvi Tavast (b. 1969), PhD, Institute of the Estonian Language, Director (Roosikrantsi 6, 10119 Tallinn), arvi@tavast.ee



EstNLTK. Free software for processing Estonian-language texts. https://github.com/estnltk/estnltk

Frequency list of parts of speech and frequency lists of the grammatical categories of nominals, based on the Balanced Corpus. https://www.cl.ut.ee/ressursid/gram-kat

Vabamorf. Morphological analyser of Estonian. https://github.com/Filosoft/vabamorf

Web-based speech recognition. http://bark.phon.ioc.ee/webtrans



Alumäe, Tanel; Tilk, Ottokar; Asadullah 2018. Advanced rich transcription system for Estonian speech. – Human Language Technologies: the Baltic Perspective. Proceedings of the Eighth International Conference, Baltic HLT 2018. (Frontiers in Artificial Intelligence and Applications 307.) Eds. Kadri Muischnek, Kaili Müürisep. Amsterdam: IOS Press, pp. 1-8.
Anderson, John M. 1971. The Grammar of Case: Towards a Localistic Theory. (Cambridge Studies in Linguistics 4.) Cambridge: Cambridge University Press.
Anderson, John M. 2006. Modern Grammars of Case. Oxford: Oxford University Press.
Arnon, Inbal; Snider, Neal 2010. More than words: Frequency effects for multi-word phrases. – Journal of Memory and Language, vol. 62, no. 1, pp. 67-82.
Bartens, Raija 1978. Synteettiset ja analyyttiset rakenteet Lapin paikanilmauksissa. (Suomalais-ugrilaisen seuran toimituksia 166.) Helsinki: Suomalais-ugrilainen seura.
Bod, Rens; Hay, Jennifer; Jannedy, Stefanie 2003. Introduction. – Probabilistic Linguistics. Eds. R. Bod, J. Hay, S. Jannedy. Cambridge-Massachusetts-London: The MIT Press, pp. 1-10.
Brezina, Vaclav 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
Bybee, Joan L. 1995. Regular morphology and the lexicon. – Language and Cognitive Processes, vol. 10, no. 5, pp. 425-455.
Bybee, Joan L. 2006. From usage to grammar: The mind’s response to repetition. – Language, vol. 82, no. 4, pp. 711-733.
Bybee, Joan L. 2007. Frequency of Use and the Organization of Language. Oxford: Oxford University Press.
Bybee, Joan 2010. Language, Usage, and Cognition. Cambridge: Cambridge University Press.
Bybee, Joan; Hopper, Paul J. (eds.) 2001. Frequency and the Emergence of Linguistic Structure. (Typological Studies in Language 45.) Amsterdam: John Benjamins.
Diessel, Holger 2017. Usage-Based Linguistics. Oxford Research Encyclopedia of Linguistics.
Divjak, Dagmar 2019. Frequency in Language: Context, Memory and Attention. Cambridge: Cambridge University Press.
EKG I = Mati Erelt, Reet Kasik, Helle Metslang, Henno Rajandi, Kristiina Ross, Henn Saari, Silvi Vare, Eesti keele grammatika I. Morfoloogia. Tallinn: Eesti Teaduste Akadeemia Eesti Keele Instituut, 1995.
Erelt, Mati; Erelt, Tiiu; Ross, Kristiina 2007. Eesti keele käsiraamat. Tallinn: Eesti Keele Sihtasutus.
Haspelmath, Martin 2006. Against markedness (and what to replace it with). – Journal of Linguistics, vol. 42, no. 1, pp. 25-70.
Hay, Jennifer 2001. Lexical frequency in morphology: Is everything relative? – Linguistics, vol. 39, no. 6, pp. 1041-1070.
Heine, Bernd 1997. Possession: Cognitive Sources, Forces, and Grammaticalization. (Cambridge Studies in Linguistics 83.) Cambridge: Cambridge University Press.
Klavan, Jane 2012. Evidence in Linguistics: Corpus-Linguistic and Experimental Methods for Studying Grammatical Synonymy. (Dissertationes linguisticae Universitatis Tartuensis 15.) Tartu: Tartu University Press.
Klavan, Jane 2017. Pitting corpus-based classification models against each other: A case study for predicting constructional choice in written Estonian. – Corpus Linguistics and Linguistic Theory.
Klavan, Jane; Pilvik, Maarja-Liisa; Uiboaed, Kristel 2015. The use of multivariate statistical classification models for predicting constructional choice in spoken, non-standard varieties of Estonian. – SKY Journal of Linguistics, no. 28, pp. 187-224.
Lindström, Liina; Tragel, Ilona 2007. Eesti keele impersonaali ja seisundipassiivi vahekorrast adessiivargumendi kasutamise põhjal. – Keel ja Kirjandus, no. 7, pp. 532-553.
Lindström, Liina; Tragel, Ilona 2010. The possessive perfect construction in Estonian. – Folia Linguistica, vol. 44, no. 2, pp. 371-399.
Lindström, Liina; Vihman, Virve-Anneli 2017. Who needs it? Variation in experiencer marking in Estonian ‘need’-constructions. – Journal of Linguistics, vol. 53, no. 4, pp. 789-822.
Lindström, Liina; Uiboaed, Kristel; Vihman, Virve-Anneli 2014. Varieerumine tarvis-/vaja-konstruktsioonides keelekontaktide valguses. – Keel ja Kirjandus, no. 8-9, pp. 609-630.
Lyons, John 1977. Semantics. Vol. 2. Cambridge: Cambridge University Press.
Matsumura, Kazuto 1994. Is the Estonian adessive really a local case? – Journal of Asian and African Studies, no. 46/47, pp. 223-235.
Ojutkangas, Krista 2008. Mihin suomessa tarvitaan sisä-grammeja. – Virittäjä, no. 3, pp. 382-400.
Orasmaa, Siim; Petmanson, Timo; Tkachenko, Alexander; Laur, Sven; Kaalep, Heiki-Jaan 2016. EstNLTK – NLP toolkit for Estonian. – Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož: European Language Resources Association, pp. 2460-2466.
Tilk, Ottokar; Alumäe, Tanel 2016. Bidirectional recurrent neural network with attention mechanism for punctuation restoration. – Proceedings of the INTERSPEECH 2016: Understanding Speech Processing in Humans and Machines. San Francisco: International Speech Communication Association, pp. 3047-3051.
Vainik, Ene 1995. Eesti keele väliskohakäänete semantika kognitiivse grammatika vaatenurgast. Tallinn: Eesti Keele Instituut.
Valge, Jüri 1970. Eesti keele käänete sagedused kolmes funktsionaalses stiilis. – Keel ja struktuur 4. Tartu, pp. 145-162.