From a collection of dictionaries to a language portal

Keywords: e-lexicography, corpora, dictionary writing system, language portal, data model, API

This article describes major changes that have taken place in e-lexicography in recent decades, in Europe generally and in Estonia in particular. Digitalisation has permeated not only the dictionary compilation process but also the whole workflow, from the creation of lexicographic content to its publication. The focus has shifted from building individual dictionaries to building a central database and infrastructure that can be adapted for further user-facing and NLP applications.

We describe methods and technologies for better integrating lexicographic data and for improving access to lexicographic information. Several of these tools have been developed within the Horizon 2020 project European Lexicographic Infrastructure.

We consider the turning point for digital change in Estonian lexicography to be 2017, when development began on the new dictionary writing system Ekilex and its user interface, Sõnaveeb. The long-term goal is to have a single data source that provides consistent information about the Estonian language. In connection with Ekilex and Sõnaveeb, we discuss three issues: the theoretical foundations of the EKI Combined Dictionary, the largest lexicographic dataset in Ekilex; improvements in the lexicographic workflow; and the Ekilex data model and API. The EKI Combined Dictionary contains information layers imported from several monolingual explanatory dictionaries, bilingual dictionaries, a collocations dictionary, and an etymology and morphology database. The workflow improvements include working in one general database, closer cooperation between the institute's research groups, and more active involvement of external users.

The Ekilex data model meets the requirements for treating both words and meanings as independent entities and for representing both semasiological (word-to-meaning) and onomasiological (meaning-to-word) data. The data are stored in Ekilex’s PostgreSQL database and comply with all current standards of data exchange. As of April 2021, Ekilex contains approximately 300,000 headwords from general-language dictionaries, as well as more than 90 terminological databases.
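To illustrate what it means to treat words and meanings as independent entities, the following minimal Python sketch may help. It is not the Ekilex schema or API: the class names (`Word`, `Meaning`, `Lexeme`, `Lexicon`), the method names, and the example entries are all assumptions made for this illustration. The only idea taken from the text is that words and meanings exist separately and are connected by links, so the same data can be read in either direction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    id: int
    value: str
    lang: str


@dataclass(frozen=True)
class Meaning:
    id: int
    definition: str


@dataclass(frozen=True)
class Lexeme:
    """Hypothetical link record: one word used in one meaning."""
    word_id: int
    meaning_id: int


class Lexicon:
    """Illustrative in-memory store; not the actual Ekilex database."""

    def __init__(self) -> None:
        self.words: dict[int, Word] = {}
        self.meanings: dict[int, Meaning] = {}
        self.lexemes: list[Lexeme] = []

    def add_word(self, word: Word) -> None:
        self.words[word.id] = word

    def add_meaning(self, meaning: Meaning) -> None:
        self.meanings[meaning.id] = meaning

    def link(self, word_id: int, meaning_id: int) -> None:
        # Words and meanings are created independently and linked afterwards.
        if word_id not in self.words or meaning_id not in self.meanings:
            raise KeyError("unknown word or meaning id")
        self.lexemes.append(Lexeme(word_id, meaning_id))

    def meanings_of(self, value: str, lang: str) -> list[str]:
        """Semasiological lookup: from a word form to its meanings."""
        word_ids = {
            w.id for w in self.words.values()
            if w.value == value and w.lang == lang
        }
        return [
            self.meanings[lx.meaning_id].definition
            for lx in self.lexemes
            if lx.word_id in word_ids
        ]

    def words_for(self, meaning_id: int) -> list[tuple[str, str]]:
        """Onomasiological lookup: from a meaning to the words expressing it."""
        return [
            (self.words[lx.word_id].lang, self.words[lx.word_id].value)
            for lx in self.lexemes
            if lx.meaning_id == meaning_id
        ]


# Example data (invented for this sketch): Estonian "keel" covers two meanings.
lexicon = Lexicon()
lexicon.add_word(Word(1, "keel", "est"))
lexicon.add_word(Word(2, "language", "eng"))
lexicon.add_word(Word(3, "tongue", "eng"))
lexicon.add_meaning(Meaning(1, "system of communication"))
lexicon.add_meaning(Meaning(2, "muscular organ in the mouth"))
lexicon.link(1, 1)
lexicon.link(1, 2)
lexicon.link(2, 1)
lexicon.link(3, 2)

print(lexicon.meanings_of("keel", "est"))  # word -> meanings
print(lexicon.words_for(1))                # meaning -> words
```

Because neither words nor meanings own the other, a general-language dictionary view (starting from a headword) and a terminological view (starting from a concept) can be drawn from the same records.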


Margit Langemets (b. 1961), PhD, Institute of the Estonian Language, Leading Lexicographer (Roosikrantsi 6, 10119 Tallinn)

Kristina Koppel (b. 1985), PhD, Institute of the Estonian Language, Senior Computational Lexicographer (Roosikrantsi 6, 10119 Tallinn)

Jelena Kallas (b. 1976), PhD, Institute of the Estonian Language, Senior Computational Lexicographer (Roosikrantsi 6, 10119 Tallinn)

Arvi Tavast (b. 1969), PhD, Institute of the Estonian Language, Director (Roosikrantsi 6, 10119 Tallinn)