Slovak-German Parallel Corpus

The latest version, par-skde-2.0, was released in May 2016. The database contains approximately 446.2 million tokens (219.8 million tokens in the Slovak half and 226.4 million tokens in the German half).

The corpus consists of two parts – the subcorpus of fiction (7.5 million tokens) and the free subcorpus (containing EU documents).

You can query the subcorpus of fiction with NoSketch Engine, starting from either the German half or the Slovak half.

You can query the whole Slovak-German corpus with NoSketch Engine, starting from either the German half or the Slovak half.

Previous experience with NoSketch Engine and CQL is highly recommended.

The Slovak-German Parallel Corpus is a database of texts in both Slovak and German. Slovak texts are translated into German or vice versa, and the corpus also includes translations of both from a third language. The texts are included in the form in which they were originally written or published, so older texts preserve their original orthography.

The texts are automatically aligned at the sentence level. Slovak texts are automatically morphologically annotated with the taggers Morče and MorphoDiTa, which have been trained and tuned on the tagset developed by the Slovak National Corpus (SNC). German texts are part-of-speech tagged with the TreeTagger software.

Version 1.0

The corpus par-skde-1.0 was released in December 2014. The database contained almost 263 million tokens (129.5 million tokens in the Slovak half, 133 million tokens in the German half).

The subcorpus of fiction contained 7.5 million tokens.
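The token figures quoted for the two versions can be cross-checked with a little arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, the inputs are the rounded figures given on this page, and the size of the free subcorpus is derived under the assumption that the fiction figure counts both language halves and that the two subcorpora do not overlap.

```python
# Token counts in units of 100,000 tokens, so integer arithmetic stays exact.
# All inputs are the rounded figures quoted in the text, not exact counts.
V2 = {"sk": 2198, "de": 2264, "fiction": 75}   # par-skde-2.0 (May 2016)
V1 = {"sk": 1295, "de": 1330, "fiction": 75}   # par-skde-1.0 (December 2014)


def total(version):
    """Sum of the Slovak and German halves, in units of 100,000 tokens."""
    return version["sk"] + version["de"]


def millions(units):
    """Convert units of 100,000 tokens to millions of tokens."""
    return units / 10


def summary():
    """Derived figures, in millions of tokens."""
    v2_total = total(V2)
    v1_total = total(V1)
    return {
        "v2_total": millions(v2_total),
        "v1_total": millions(v1_total),
        # Assumption: whole corpus minus fiction equals the free subcorpus.
        "v2_free_subcorpus": millions(v2_total - V2["fiction"]),
        "growth_v1_to_v2": millions(v2_total - v1_total),
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value} million tokens")
```

Under these assumptions, the two halves of version 1.0 sum to 262.5 million tokens, which matches the "almost 263 million" stated above.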

Slovak National Corpus
