Slovak WordNet

Slovak WordNet is a database of semantic relations between the most frequent Slovak nouns, adjectives, verbs and adverbs. It follows the general model of the English WordNet, and Slovak entries (synsets) are mapped to English synsets.

The project is still under development. The database currently contains 25 000 synsets. It has been made available to give an insight into the data and the processing technologies. The file format may change.

File format

The files are encoded in UTF-8 with Unix line endings (LF, \n, U+000A ...). Each synset is a single line consisting of two records separated by the symbol ␞ (U+241E SYMBOL FOR RECORD SEPARATOR). The first record is the Slovak one; it is linked to the second record, which comes from Princeton WordNet.
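
As an illustration of the line layout described above, the following Python sketch splits one synset line into its Slovak and Princeton records. The function names and the file path passed to read_synsets are hypothetical and not part of the distribution. The sketch assumes that every non-empty line contains exactly one ␞ separator.

```python
from typing import Iterator

# U+241E SYMBOL FOR RECORD SEPARATOR, as named in the format description.
RECORD_SEPARATOR = "\u241e"


def split_synset_line(line: str) -> tuple[str, str]:
    """Split one line into (slovak_record, princeton_record).

    Only the trailing LF is removed; tabs inside the records are kept,
    because they separate the annotations within a record.
    """
    parts = line.rstrip("\n").split(RECORD_SEPARATOR)
    if len(parts) != 2:
        # Assumption: exactly two records per line.
        raise ValueError(f"expected 2 records, found {len(parts)}")
    return parts[0], parts[1]


def read_synsets(path: str) -> Iterator[tuple[str, str]]:
    """Yield record pairs from a UTF-8 file with LF line endings."""
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                yield split_synset_line(line)
```

Because the file format may change, a strict check such as the ValueError above is a deliberate choice: it makes a layout change visible instead of silently producing wrong pairs.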

Format of the Slovak record

Each record includes 4 annotations separated by a tab (\t):

A Slovak synset can be linked to several English synsets.
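
A record with tab-separated annotations can be taken apart as in the Python sketch below. The meaning of the individual annotations is not listed in this section, so the sketch returns them as a plain list without naming them. The function name split_record and the constant EXPECTED_FIELDS are hypothetical.

```python
# Number of annotations per record, as stated in the format description.
EXPECTED_FIELDS = 4


def split_record(record: str) -> list[str]:
    """Split one record into its tab-separated annotations."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        # Assumption: a record with a different field count is malformed
        # (or the file format has changed).
        raise ValueError(
            f"expected {EXPECTED_FIELDS} annotations, found {len(fields)}"
        )
    return fields
```

Empty annotations are preserved as empty strings, since str.split with an explicit separator does not merge adjacent tabs.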

Please cite

Ondrej Dzurjuv, Ján Genči and Radovan Garabík: Generating Sets of Synonyms between Languages. In: Natural Language Processing, Multilinguality. Proceedings of the 6th International Conference SLOVKO 2011. Eds. D. Majchráková, R. Garabík. November 2011, Tribun, Brno.


Slovak WordNet is available under the following licenses: