Corpus of Slovak Wikipédia and Necyklopédia

1. Version 6.0

The sixth version wiki-2019-08 containing 50 619 991 tokens was made available in January 2020.

The corpus contains texts from Slovak Wikipédia, as of 2019-08-01.

This version carries four notable changes:

It is lemmatized (a lemma is capitalized when it is a proper noun) and morphologically annotated; information on the source is provided.

2. Version 5.0

The fifth version wiki-2018-03 containing 47 283 205 tokens was made available in May 2018.

The corpus contains texts from Slovak Wikipédia and Necyklopédia, as of 2018-03-15.

It is lemmatized (a lemma is capitalized when it is a proper noun) and morphologically annotated; information on the source is provided.

3. Version 4.0

The fourth version wiki-2017-02 containing 45 109 693 tokens was made available in March 2017.

The corpus contains texts from Slovak Wikipédia and Necyklopédia, as of 2017-02-28.

It is lemmatized (a lemma is capitalized when it is a proper noun) and morphologically annotated; information on the source is provided.

4. Version 3.0

The third version wiki-2016-02 containing 42 615 597 tokens was made available in March 2016.

The corpus contains texts from Slovak Wikipédia and Necyklopédia, as of 2016-02-26.

It is lemmatized and morphologically annotated; information on the source is provided.

5. Version 2.0

The second version wiki-2015-02 containing 40 million tokens was released in March 2015. It includes texts from Slovak Wikipédia and Necyklopédia, as of February 2015.

6. Version 1.0

The first version wiki-2014-02 containing 37 548 997 tokens was released in February 2014.