Dialects, variation and corpus data

Position of the negation word in Võru and Seto

https://doi.org/10.54013/kk764a7

Keywords: dialects, language variation, multivariate analysis, corpus linguistics, negation, Võru, Seto

The article exemplifies contemporary language variation studies by examining variation in the position of the negation word in the Võru and Seto varieties of South Estonian. A frequency analysis of dialect corpus data shows that, despite being close both geographically and linguistically, Seto and Võru exhibit opposite tendencies in their preferences for pre- and postverbal negation. Võru speakers predominantly use preverbal negation patterns, showing more variation only in two southwestern parishes, Rõuge and Vastseliina. Seto, in turn, prefers the typologically rare postverbal negation construction but occasionally also uses preverbal negation.
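The frequency comparison described above can be sketched as a simple tally of negation position per parish or variety. The token records below are invented toy data, not counts from the dialect corpus, and the function name `preverbal_share` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter

# Invented toy tokens (NOT corpus data): (parish or variety, negation position).
tokens = [
    ("Rõuge", "pre"), ("Rõuge", "pre"), ("Rõuge", "post"),
    ("Vastseliina", "pre"), ("Vastseliina", "post"),
    ("East Seto", "post"), ("East Seto", "post"), ("East Seto", "pre"),
]

def preverbal_share(records):
    """Return {group: share of preverbal negation} for (group, position) pairs."""
    totals = Counter(group for group, _ in records)
    pre = Counter(group for group, position in records if position == "pre")
    # Counter returns 0 for groups with no preverbal tokens.
    return {group: pre[group] / totals[group] for group in totals}

shares = preverbal_share(tokens)
for group, share in shares.items():
    print(f"{group}: {share:.2f} preverbal")
```

A share above 0.5 would indicate a preference for preverbal negation in that group and a share below 0.5 a preference for postverbal negation; real corpus counts would of course be needed to say anything about Võru or Seto.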

A mixed-effects logistic regression analysis was conducted to compare newer East Seto data with older data from Rõuge and Vastseliina in terms of the importance, strength and direction of the factors conditioning the variation in Seto and Võru. The results suggest that the variation in both varieties is primarily affected by factors related to memory, processing, and communication. In particular, the word order of the previously activated negation construction had the strongest effect: the use of either pre- or postverbal negation significantly raised the probability that the same word order would be used again. In East Seto, the lexical negation word (which expresses grammatical tense) also played a significant role: a preverbal negation construction was more likely with the present tense negation word ei. This probably had to do with copying the word order of frequent fixed expressions (ma ei tiijäq ‘I don’t know’) from Estonian and Russian as contact languages. The individual effects of speakers and verb lemmas were also accounted for in the analyses as important sources of random variation.
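The structure of such a model can be illustrated with a minimal sketch: the log-odds of preverbal negation are the sum of fixed effects (the word order of the previous negation construction and the negation word) and speaker- and lemma-specific adjustments, which are then converted to a probability. All coefficient values below are placeholders chosen for illustration, not estimates from the article, and the symmetric persistence effect is an assumption made only to keep the example small. The sketch shows how a fitted model would produce predictions; it does not fit one.

```python
import math

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Placeholder coefficients for illustration only (NOT the article's estimates).
INTERCEPT = -1.0
PERSISTENCE = {"none": 0.0, "pre": 1.5, "post": -1.5}  # previous negation construction
NEG_WORD_EI = 0.8  # present tense negation word "ei"

def p_preverbal(previous, neg_word_is_ei, speaker_adj=0.0, lemma_adj=0.0):
    """Predicted probability of preverbal negation for one token.

    speaker_adj and lemma_adj stand in for the random intercepts of the
    speaker and the verb lemma (hypothetical values supplied by the caller).
    """
    log_odds = (INTERCEPT + PERSISTENCE[previous]
                + (NEG_WORD_EI if neg_word_is_ei else 0.0)
                + speaker_adj + lemma_adj)
    return inv_logit(log_odds)

baseline = p_preverbal("none", False)
after_pre = p_preverbal("pre", False)
after_post = p_preverbal("post", False)
print(f"no prime: {baseline:.2f}, after preverbal: {after_pre:.2f}, "
      f"after postverbal: {after_post:.2f}")
```

With these placeholder values, a preceding preverbal construction raises the predicted probability of preverbal negation and a preceding postverbal one lowers it, which mirrors the direction of the persistence effect reported above.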

Maarja-Liisa Pilvik (b. 1989), MA, University of Tartu, PhD Student, Junior Research Fellow (Jakobi 2-430, 51005 Tartu), maarja-liisa.pilvik@ut.ee

Helen Plado (b. 1981), PhD, University of Tartu, Research Fellow; Võro Institute, Researcher (Jakobi 2-426, 51005 Tartu), helen.plado@ut.ee

Liina Lindström (b. 1973), PhD, University of Tartu, Professor of Modern Estonian, Head of the Centre for Digital Humanities and Information Society (Jakobi 2-443, 51005 Tartu), liina.lindstrom@ut.ee
