Doing numbers and Cognitive Linguistics

Keywords: corpus linguistics, forced choice task, logistic regression, Estonian

This paper gives a short overview of recent trends in Cognitive Linguistics. It focuses on the methodological aspects involved and illustrates how the performance of a corpus-based statistical model can be evaluated by comparing it with the behaviour of native speakers in a linguistic experiment. A mixed-effects logistic regression model is fitted to corpus data on the alternation between the Estonian adessive case and the adposition peal ‘on’ in present-day written Estonian. To evaluate how well the corpus-based model performs, its predictions are compared with the choices native speakers make in a forced choice task. Overall, the results show that an adequately constructed probabilistic model based on richly annotated corpus data can perform at roughly the same level as human beings.
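The evaluation logic summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model reported in the paper. It assumes a binary outcome (adessive vs. peal) and a linear predictor made up of an intercept, fixed-effect terms and a random intercept for one grouping factor. It also assumes that each test item has a single majority forced-choice response. The function names (`inv_logit`, `prob_adessive`, `predict`, `agreement`), the predictor names (`landmark_length`, `trajector_animate`), the lemmas used as grouping levels, all coefficient values, the items and the 0.5 threshold are hypothetical and invented for illustration.

```python
import math
from dataclasses import dataclass

# Minimal sketch: every name, coefficient and item below is hypothetical and
# invented for illustration; none of it reproduces the paper's fitted model or data.


def inv_logit(eta: float) -> float:
    """Convert a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


@dataclass
class Item:
    predictors: dict[str, float]  # hypothetical fixed-effect predictor values
    group: str                    # hypothetical grouping factor (e.g. landmark lemma)
    human_choice: str             # assumed majority forced-choice response: "adessive" or "peal"


# Hypothetical parameter estimates, as if read off a fitted mixed-effects model.
INTERCEPT = 0.5
FIXED = {"landmark_length": -0.8, "trajector_animate": 1.2}
RANDOM_INTERCEPTS = {"laud": 0.3, "maja": -0.4}


def prob_adessive(item: Item) -> float:
    """P(adessive) = inv_logit(intercept + sum of b_k * x_k + u_group)."""
    eta = INTERCEPT + sum(FIXED[name] * value for name, value in item.predictors.items())
    # A group unseen in the corpus gets a random intercept of 0 (population-level prediction).
    eta += RANDOM_INTERCEPTS.get(item.group, 0.0)
    return inv_logit(eta)


def predict(item: Item, threshold: float = 0.5) -> str:
    """Turn the predicted probability into a categorical choice."""
    return "adessive" if prob_adessive(item) >= threshold else "peal"


def agreement(items: list[Item]) -> float:
    """Proportion of items on which the model's choice matches the human choice."""
    matches = sum(predict(item) == item.human_choice for item in items)
    return matches / len(items)


# Three invented items standing in for forced-choice stimuli.
EXAMPLE_ITEMS = [
    Item({"landmark_length": 0.0, "trajector_animate": 1.0}, "laud", "adessive"),
    Item({"landmark_length": 2.0, "trajector_animate": 0.0}, "maja", "peal"),
    Item({"landmark_length": 1.0, "trajector_animate": 0.0}, "tool", "adessive"),
]

if __name__ == "__main__":
    for item in EXAMPLE_ITEMS:
        print(item.group, round(prob_adessive(item), 3), predict(item), item.human_choice)
    print("agreement:", round(agreement(EXAMPLE_ITEMS), 3))
```

With these invented values the model's choice matches the human choice on the first two items and misses the third, so agreement is 2/3. Simple percentage agreement is only one possible comparison. Another option is to correlate the predicted probabilities with the proportion of participants choosing each variant. This summary does not state which measure the paper itself uses. In R, `predict()` on a model fitted with lme4's `glmer()` behaves similarly for unseen groups when called with `allow.new.levels = TRUE` and `type = "response"`.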

Jane Klavan (b. 1983), PhD, University of Tartu, Lecturer in English Language.