Slovak-Romanian Parallel Corpus

The first version, par-skro-fic-1.1, was released on 24 August 2017 as a small, experimental corpus of about 1.3 million tokens (603 111 tokens in the Slovak half and 688 867 tokens in the Romanian half).

Use the NoSketch Engine web interface to query the Romanian texts or the Slovak texts.

The Slovak-Romanian Parallel Corpus is a database containing three literary texts translated from Romanian into Slovak and one document about mutual collaboration. The texts are automatically aligned at the sentence level. The Slovak texts are automatically morphologically annotated with the MorphoDiTa tagger, which has been trained and tuned on the tagset developed by the Slovak National Corpus. The Romanian texts are annotated with TreeTagger.