Doing numbers and Cognitive Linguistics


Keywords: corpus linguistics, forced choice task, logistic regression, Estonian

This paper gives a brief overview of recent trends in Cognitive Linguistics, focusing on methodology. It illustrates how the performance of a corpus-based statistical model can be evaluated by comparing it with the behaviour of native speakers in a linguistic experiment. A mixed-effects logistic regression model is fitted to corpus data on the alternation between the Estonian adessive case and the adposition peal ‘on’ in present-day written Estonian. To assess how well the corpus-based model performs, its performance is compared with the behaviour of native speakers in a forced choice task. Overall, the results show that an adequately constructed probabilistic model based on richly annotated corpus data can perform at a level roughly comparable to that of native speakers.

Jane Klavan (b. 1983), PhD, University of Tartu, Lecturer in English Language, jane.klavan@ut.ee


