Lithuanian WordNet

The WordNet contains semantic relations among the most frequent Lithuanian nouns, verbs, adjectives and adverbs, following the general model of the English WordNet. Lithuanian entries (synsets) are mapped to English and Slovak synsets.

The project is still in the development stage. Currently, the database contains 15 000 synsets. It has been made available to give an insight into the data and processing technologies. The file format may change.

File format

The files are encoded in UTF-8 with Unix line endings (LF, \n, U+000A). Each synset occupies a single line consisting of three records separated by the symbol ␞ (U+241E SYMBOL FOR RECORD SEPARATOR). Within each line, the records appear in the following order: Lithuanian record, Slovak record, English record.
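The line layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the `SynsetLine` type, the function names and the skipping of blank lines are assumptions made for illustration, and the file format itself may still change.

```python
from dataclasses import dataclass
from typing import Iterator

# U+241E SYMBOL FOR RECORD SEPARATOR (␞) separates the three records on a line.
RECORD_SEPARATOR = "\u241e"


@dataclass
class SynsetLine:
    """Hypothetical container for one line: the three records in file order."""
    lithuanian: str
    slovak: str
    english: str


def parse_line(line: str) -> SynsetLine:
    """Split one synset line into its Lithuanian, Slovak and English records."""
    # Remove the LF line ending, if present.
    line = line.rstrip("\n")
    records = line.split(RECORD_SEPARATOR)
    if len(records) != 3:
        raise ValueError(f"expected 3 records, found {len(records)}")
    lithuanian, slovak, english = records
    return SynsetLine(lithuanian, slovak, english)


def read_synsets(path: str) -> Iterator[SynsetLine]:
    """Yield one SynsetLine per non-empty line of a UTF-8, LF-terminated file."""
    # newline="\n" splits lines on LF only and performs no newline translation.
    with open(path, encoding="utf-8", newline="\n") as f:
        for line in f:
            # Skipping blank lines is an assumption about the distributed files.
            if line.strip("\n"):
                yield parse_line(line)
```

For example, `for synset in read_synsets("synsets.txt"): print(synset.lithuanian)` would iterate over a file whose name, `synsets.txt`, is hypothetical. Failing loudly on a line without exactly three records makes format changes visible early rather than silently misaligning the languages.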

Format of the Lithuanian and Slovak records

Each record includes 4 annotations separated by tabs (\t):

A Lithuanian synset can be linked to several English or Slovak synsets.
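A Lithuanian or Slovak record can then be split into its four tab-separated annotations. The sketch below is hedged: the function name `split_annotations` is hypothetical, and because the meaning of each annotation is not spelled out here, it returns them by position only.

```python
from typing import List

# Annotations within a Lithuanian or Slovak record are separated by tabs.
FIELD_SEPARATOR = "\t"
ANNOTATION_COUNT = 4


def split_annotations(record: str) -> List[str]:
    """Return the 4 tab-separated annotations of a Lithuanian or Slovak record.

    The annotations are returned positionally; no meaning is assumed for them.
    """
    annotations = record.split(FIELD_SEPARATOR)
    if len(annotations) != ANNOTATION_COUNT:
        raise ValueError(
            f"expected {ANNOTATION_COUNT} annotations, found {len(annotations)}"
        )
    return annotations
```

Since a Lithuanian synset may be linked to several English or Slovak synsets, code built on this should not assume a one-to-one mapping between languages. How multiple links are encoded is not described in this section, so the sketch does not attempt to handle it.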


Lithuanian WordNet is available under the following licenses: