Corpus of Dialects of the Slovak National Corpus
We started preparing the Corpus of Dialects of the Slovak National Corpus (hereinafter CD SNC) in 2013. The aim of the initial phase is to gather existing dialect audio recordings and handwritten transcriptions, in particular those already published, to process them using corpus methodology and tools, and to make them available for research.
The new version, dialekt-3.0, containing 494 722 text units, was made available in December 2016. Compared with the previous version, the corpus has been supplemented with ten newly processed texts.
CD SNC is neither lemmatised nor morphologically annotated. Users can search the corpus for a word or with CQL (Corpus Query Language). The transcribed texts include sociolinguistic metadata about the respondents and informants and about the origin and content of each recording. The corpus is accessible through the NoSketch Engine web interface, but users must first register for an account.
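As a rough illustration of what a CQL query looks like, the following sketch builds a query that matches a sequence of word forms. It assumes that word forms are stored under the positional attribute `word` (the usual default in NoSketch Engine corpora, not confirmed here for CD SNC); since the corpus has no lemma or tag layer, the word form is the only attribute the sketch uses. The helper `cql_phrase` is hypothetical.

```python
import re


def cql_phrase(*forms: str) -> str:
    """Hypothetical helper: build a CQL query matching word forms in sequence.

    Assumes the word forms are exposed under the positional attribute
    `word`; no lemma or tag attributes are used, as CD SNC has none.
    """
    tokens = []
    for form in forms:
        # CQL attribute values are regular expressions, so regex
        # metacharacters in the form are escaped to match literally.
        escaped = re.escape(form)
        # Escape double quotes so they cannot close the value early.
        escaped = escaped.replace('"', '\\"')
        tokens.append(f'[word="{escaped}"]')
    return "".join(tokens)


print(cql_phrase("na", "poli"))  # [word="na"][word="poli"]
```

Each bracketed expression matches one token, so placing them side by side matches the forms as an adjacent sequence.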
Since version dialekt-2.0, the NoSketch Engine interface has offered a specialised virtual keyboard, SNK-DIALEKT, for entering the special characters used in the transcriptions. Since version 2.0, the corpus has also included several specific values.