Monolingual written language corpus
The current version is prim-5.0, released at the beginning of 2011, containing over 719 million tokens.
Older versions:
- prim-4.0 – released at the beginning of 2009, 526 million tokens
- prim-3.0 – released at the beginning of 2007, 350 million tokens
- prim-2.1 – released at the beginning of 2006, 300 million tokens
- prim-2.0 – released in 2005, 250 million tokens
- prim1 – released in 2004, 182 million tokens
Manually morphologically annotated corpus
- r-mak-2.0 (58.1 % fiction, 28.9 % journalism, 13.0 % professional texts). 511 534 tokens.
- r-mak-1.0 (57.9 % fiction, 41.8 % journalism, 0.2 % professional texts). 322 600 tokens.
Grants
- Budovanie Slovenského národného korpusu a elektronizácia jazykovedného výskumu na Slovensku (Construction of the Slovak National Corpus and the electronization of linguistic research in Slovakia)
- Sémantická a distribučná analýza adjektív v nemčine a slovenčine (Semantic and distributional analysis of adjectives in German and Slovak)
Slovak Terminology Database
Corpus of Spoken Slovak
Parallel corpora
