Structure of the Slovak National Corpus

Overview of SNC corpora.
Frequency lists of lemmata, word forms and parts of speech from the publicly available SNC corpora.

Monolingual corpus of written texts

The current version, prim-8.0, has been available since January 2018. Its publicly available subcorpus contains more than 1,500 million tokens. Access is free but requires registration.

The previous version of the corpus, prim-7.0, which contains over 1,250 million tokens, is also available.

Earlier versions are available on request.

Manually morphologically annotated corpus

r-mak versions

Morphological database of the Slovak language

Other text corpora

Corpora of texts from before 1955

Spoken corpora

Corpus of Spoken Slovak

Corpus of Dialects of the Slovak National Corpus


Corpus of Historical Slovak


Corpus of the Crimean Tatar Language


Slovak Terminology Database


WordNet

The Slovak WordNet is a lexical database that records semantic relations between words. It is aligned with Princeton WordNet 3.0, and its Slovak synsets are linked to their English equivalents.