
Language data

The data is jointly released by the Slovak National Corpus, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, and the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.

These (and other) datasets relevant for machine translation (MT) are also available from the CLARIN ERIC repository, accessible through the LINDAT-CLARIN project page.

To get access to the files, please contact us.

Translation tables for the Moses MT system

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

These tables will help you build your own MT system.
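
As a rough illustration of how such a table can be inspected, the sketch below parses one line in the plain-text phrase-table layout commonly used by Moses, where fields are separated by `|||` (source phrase, target phrase, scores, then optional alignment and count fields). The function name `parse_phrase_table_line`, the `PhraseEntry` type, and the sample line are hypothetical and only serve the example; the files released here may differ in detail, for instance in the number of scores or in being compressed.

```python
from typing import List, NamedTuple


class PhraseEntry(NamedTuple):
    """One phrase pair with its feature scores (hypothetical helper type)."""
    source: str
    target: str
    scores: List[float]


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Split a '|||'-separated phrase-table line into its first three fields.

    Assumes the common Moses layout: source ||| target ||| scores ||| ...
    Any further fields (alignments, counts) are ignored in this sketch.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source, target = fields[0], fields[1]
    scores = [float(value) for value in fields[2].split()]
    return PhraseEntry(source, target, scores)


# Made-up sample line, not taken from the released tables.
sample = "the house ||| dom ||| 0.5 0.25 0.5 0.125 ||| 0-0 1-0 ||| 4 4 2"
entry = parse_phrase_table_line(sample)
print(entry.source, "->", entry.target, entry.scores)
```

A full table can be processed by applying the same function to each line of the file, opening it with `gzip.open(path, "rt", encoding="utf-8")` if it is gzip-compressed.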

You can get more complete language models here.

Parallel corpora (English-Slovak)

Slovak texts are automatically morphologically annotated with the Slovak National Corpus tagset. English texts are part-of-speech tagged with the Penn Treebank tagset.

Parallel corpora (Slovak-Czech)

Slovak texts are automatically morphologically annotated with the Slovak National Corpus tagset. Czech texts are automatically morphologically annotated with the Czech National Corpus tagset.

Supported by the EC grant FP7-ICT-2009-5, Bringing Machine Translation for European Languages to the User – Enlarged European Union (EuroMatrixPlus-X).