Lithuanian WordNet

The WordNet contains semantic relations of the most frequent Lithuanian nouns, verbs, adjectives and adverbs, following the general model of the English WordNet. Lithuanian entries (synsets) are mapped to English and Slovak synsets.

The project is still under development. The database currently contains 15 000 synsets. It has been released to give insight into the data and the processing technologies; the file format may change.

File format

The files are encoded in UTF-8 with Unix line endings (LF, \n, U+000A). Each synset is a single line consisting of three records separated by the character ␞ (U+241E SYMBOL FOR RECORD SEPARATOR). The records are ordered as follows: Lithuanian record, Slovak record, English record.
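A minimal Python sketch of reading this layout, relying only on what is stated above: one synset per LF-terminated UTF-8 line, with three records separated by U+241E. The function names `split_synset_line` and `read_synsets` are hypothetical, and the records are returned as raw strings because their internal layout is described separately.

```python
from typing import Dict, Iterator

# ␞ U+241E SYMBOL FOR RECORD SEPARATOR, as named in the format description.
RECORD_SEPARATOR = "\u241e"
RECORD_NAMES = ("lt", "sk", "en")  # order: Lithuanian, Slovak, English


def split_synset_line(line: str) -> Dict[str, str]:
    """Split one synset line into its three raw records."""
    # Remove only the LF terminator so that empty trailing fields are preserved.
    records = line.rstrip("\n").split(RECORD_SEPARATOR)
    if len(records) != len(RECORD_NAMES):
        raise ValueError(f"expected 3 records, got {len(records)}")
    return dict(zip(RECORD_NAMES, records))


def read_synsets(path: str) -> Iterator[Dict[str, str]]:
    """Yield the records of every non-empty line in a UTF-8, LF-terminated file."""
    # newline="\n" treats only LF as a line ending and disables newline translation.
    with open(path, encoding="utf-8", newline="\n") as f:
        for line in f:
            if line.strip():
                yield split_synset_line(line)
```

For example, `for synset in read_synsets("wordnet.txt")` (a hypothetical file name) yields one dictionary per synset, with the Lithuanian record under `"lt"`.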

Format of the Lithuanian and Slovak records

Each record includes 4 annotations separated by tabs (\t):

- Number is the synset identifier.
- Part of speech classification:
  - n for nouns
  - v for verbs
  - a for adjectives
  - r for adverbs
- Words are literals grouped by similarity of meaning; literals are separated by a semicolon, and an explanation or further clarification can be given in brackets. A plus sign (+) marks the semantically most important literal in the synset. A minus sign (-) indicates that there is no direct equivalent in the target language. A question mark (?) marks an unclear synset.
- Gloss is an optional comment on the synset; in most cases this annotation is empty.
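The record layout above can be sketched as a small Python parser. This is an illustration, not the project's own tooling, and it rests on several assumptions not stated in the description: the four fields appear in the listed order, an empty trailing gloss may be missing, a clarification is a final group in round parentheses, and the +, - and ? markers are attached to either end of a literal. The names `parse_record`, `parse_literal`, `Record` and `Literal` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Part-of-speech codes listed in the format description.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}

# Marker characters; their placement on a literal is an assumption.
MARKERS = "+-?"

# Assumption: a clarification is a final "(...)" group on the literal.
CLARIFICATION = re.compile(r"\s*\(([^)]*)\)\s*$")


@dataclass
class Literal:
    text: str
    clarification: Optional[str] = None
    most_important: bool = False  # "+" marker
    no_equivalent: bool = False   # "-" marker
    unclear: bool = False         # "?" marker (may really apply to the whole synset)


@dataclass
class Record:
    synset_id: str
    pos: str
    literals: List[Literal]
    gloss: str


def parse_literal(raw: str) -> Literal:
    text = raw.strip()
    clarification = None
    match = CLARIFICATION.search(text)
    if match:
        clarification = match.group(1).strip()
        text = text[: match.start()].strip()
    flags = set()
    # Peel marker characters off both ends of the literal.
    while text and text[0] in MARKERS:
        flags.add(text[0])
        text = text[1:]
    while text and text[-1] in MARKERS:
        flags.add(text[-1])
        text = text[:-1]
    return Literal(
        text=text.strip(),
        clarification=clarification,
        most_important="+" in flags,
        no_equivalent="-" in flags,
        unclear="?" in flags,
    )


def parse_record(record: str) -> Record:
    """Parse one tab-separated Lithuanian or Slovak record."""
    fields = record.split("\t")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    synset_id, pos, words = fields[0].strip(), fields[1].strip(), fields[2]
    gloss = fields[3].strip() if len(fields) == 4 else ""
    if pos not in POS_NAMES:
        raise ValueError(f"unknown part of speech: {pos!r}")
    # Naive split: a ";" inside a clarification would break this.
    literals = [parse_literal(w) for w in words.split(";") if w.strip()]
    return Record(synset_id, pos, literals, gloss)
```

With an invented record, `parse_record("123\tn\tnamas+ (pastatas); būstas\t")` returns two literals, the first flagged as most important and carrying the clarification `pastatas`. Check the assumptions against the released files before relying on this; in particular, a literal that legitimately begins or ends with a hyphen would be misread as carrying the minus marker.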

A Lithuanian synset can be linked to several English or Slovak synsets.
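The format description does not say how such one-to-many links are stored. One plausible reading, which is only an assumption here, is that a Lithuanian synset with several links appears on several lines sharing its identifier. Under that assumption, the links can be collected as in this Python sketch (`group_links` is a hypothetical name; the Slovak and English records are kept as raw strings).

```python
from collections import defaultdict
from typing import Dict, Iterable, List

RECORD_SEPARATOR = "\u241e"  # ␞ U+241E SYMBOL FOR RECORD SEPARATOR


def group_links(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Collect the Slovak and English records linked to each Lithuanian synset ID.

    Assumption: a Lithuanian synset with several links occurs on several lines,
    each starting with the same synset identifier.
    """
    links: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"sk": [], "en": []})
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        lithuanian, slovak, english = line.split(RECORD_SEPARATOR)
        # The identifier is the first tab-separated field of the Lithuanian record.
        lt_id = lithuanian.split("\t", 1)[0].strip()
        targets = links[lt_id]
        # Keep each linked record once, in the order it first appears.
        if slovak not in targets["sk"]:
            targets["sk"].append(slovak)
        if english not in targets["en"]:
            targets["en"].append(english)
    return dict(links)
```

If the files turn out to encode multiple links differently, only the grouping key in this sketch needs to change.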

Please cite

Radovan Garabík and Indrė Pileckytė: From Multilingual Dictionary to Lithuanian Wordnet. In: Natural Language Processing, Corpus Linguistics, E-learning. Proceedings of the SLOVKO 2013 conference, pp. 74–80, 2013.

Licenses

Lithuanian WordNet is available under the following licenses:
