Slovak-German Parallel Corpus
The latest version, par-skde-2.0, was released in May 2016. The database contains about 446.2 million tokens (219.8 million tokens in the Slovak half, 226.4 million tokens in the German half).
The corpus consists of two parts – the subcorpus of fiction (7.5 million tokens) and the free subcorpus (containing EU documents).
Previous experience with NoSketch Engine and CQL is highly recommended.
The Slovak-German Parallel Corpus is a database containing texts in both Slovak and German. Slovak texts are translated into German or vice versa, and the corpus also includes translations from a third language. The database contains written or published texts in their original form; the original orthography of older texts is therefore preserved.
The texts are automatically aligned at the sentence level. Slovak texts are automatically morphologically annotated by the taggers Morče and MorphoDiTa, which have been trained and tuned on the tagset developed by the SNC. German texts are part-of-speech tagged using the TreeTagger software.
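To illustrate what sentence-level alignment means in practice, here is a minimal Python sketch of one way aligned segments could be represented and filtered. The class `AlignedPair`, the helper `one_to_one`, and the sample sentences are hypothetical and are not taken from the corpus or its actual storage format. The sketch also assumes that an alignment may link more than one sentence on either side, which the description above does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedPair:
    """One alignment unit: Slovak sentences linked to German sentences.

    Hypothetical structure for illustration; not the corpus's own format.
    """
    slovak: List[str]
    german: List[str]


def one_to_one(pairs: List[AlignedPair]) -> List[AlignedPair]:
    """Keep only units where exactly one sentence maps to one sentence."""
    return [p for p in pairs if len(p.slovak) == 1 and len(p.german) == 1]


# Invented sample data, used only to exercise the helper.
pairs = [
    AlignedPair(["Dobrý deň."], ["Guten Tag."]),
    AlignedPair(["Ďakujem.", "Dovidenia."], ["Danke und auf Wiedersehen."]),
]

simple = one_to_one(pairs)
```

Restricting a study to one-to-one units is a common simplification when comparing sentences directly, at the cost of discarding segments where the translator merged or split sentences.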
The corpus par-skde-1.0 was released in December 2014. The database contained almost 263 million tokens (129.5 million tokens in the Slovak half, 133 million tokens in the German half).
The subcorpus of fiction contained 7.5 million tokens.
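The token counts quoted for the two releases can be cross-checked with simple arithmetic. The following Python sketch uses only the figures stated above (in millions of tokens); the names `RELEASES`, `total_tokens`, and `slovak_share` are hypothetical helpers introduced for this illustration.

```python
# Token counts in millions, copied from the release descriptions above.
RELEASES = {
    "par-skde-1.0": {"slovak": 129.5, "german": 133.0},
    "par-skde-2.0": {"slovak": 219.8, "german": 226.4},
}


def total_tokens(release: str) -> float:
    """Sum of the Slovak and German halves, rounded to one decimal place."""
    halves = RELEASES[release]
    return round(halves["slovak"] + halves["german"], 1)


def slovak_share(release: str) -> float:
    """Fraction of all tokens that belong to the Slovak half."""
    halves = RELEASES[release]
    return halves["slovak"] / (halves["slovak"] + halves["german"])


total_v1 = total_tokens("par-skde-1.0")  # 262.5
total_v2 = total_tokens("par-skde-2.0")  # 446.2
growth = total_v2 / total_v1             # ratio of version 2.0 to version 1.0
```

On these figures the two halves sum to the stated totals, version 2.0 is roughly 1.7 times the size of version 1.0, and the Slovak half accounts for slightly under half of the tokens in both releases.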