Projects made available from 2 November 2015 to 4 April 2016
The new version of the monolingual corpus of written texts prim-7.0
The new version of the manually morphologically annotated corpus r-mak-5.0
The new version of the specialized Corpus of Economic Texts ecn-2.0
The new version of the Corpus of Slovak Wikipédia and Necyklopédia
The new version of the Slovak-English Parallel Corpus
The new version of the Slovak-Hungarian Parallel Corpus
List of SNC corpora
Frequency lists of lemmata, word forms and parts of speech from the publicly available SNC corpora
The new version of the tool Developer for visualization of word usage over time in SVG format: the tool allows users to query the corpus prim-7.0 and compare word trends in different subcorpora according to text style. Clicking on a data point in the graph leads to the number of occurrences of the queried word in the corpus in the analysed time span (for registered users only).
In recent months, new texts together with new names have arrived in the databases of the Slovak National Corpus, e.g. translator Andrej Chovan, numismatist Vladimír Kovár, journalist and translator Agneša Kalinová, who was one of our most influential movie critics in the 1950s, and Ján Ladislav Kalina, Slovak writer, humorist, screenwriter, translator and journalist (both Kalinová and Kalina courtesy of their daughter Julia Sherwood). The corpus has also been enriched with texts by German translator Mirko Kraetsch and writer Ivana Auxtová, who began to write children's literature. We appreciate your help and your cooperation.
In 2015, as many as 538 active users were registered with the SNC databases, 60 of them based in 12 countries outside Slovakia: Poland (18), Czech Republic (16), Serbia (9), Hungary (4), Germany (3), Belgium (2), Russia (2), Ukraine (2), England (1), Croatia (1), Romania (1), Scotland (1). Excluding SNC members, the most active users were: ulej.tomas (6515 queries), dvoran.jaroslav (2144), halagova.gabriela (1898), kyselova.miroslava (1862), jasinska.lucia (1813), gren.zbigniew (1806), timova.zuzana (1788), mzourkova.hana (1524), poppelkova.magdalena (1227), gembalova.jana (1217), sokolova.jana (1083), manurova.veronika (1076), marko.martin (991), makovska.maria (911), nabelkova.mira (885). The most popular query types were: basic (62781 queries), lemma (14618) and CQL (12755). The two most used corpora were prim-6.1 with its subcorpora and the Slovak-English parallel corpus with its subcorpora. Among the specialised corpora, the corpus of economic texts was queried the most. Our users were also interested in the Corpus of Spoken Slovak and the Corpus of Dialects of the SNC.
In accordance with the conditions of use of the SNC, all users can get a free account with their own user name and password. Registration is personal, non-transferable and valid for one year. Usually, around the turn of January and February, access is renewed for long-time users and users registered in the previous year. Each year, only active accounts are kept. Access can be renewed at any time, i.e. a previously registered user may request renewal whenever needed.
Spring practical seminars
5 April 2016 – practical workshop on Querying Written Corpora for Beginners, 09.00 – 17.00 (maximum number of participants: 7; registration required)
6 April 2016 – practical workshop on Querying Using Regular Expressions, 09.00 – 17.00 (maximum number of participants: 7; registration required)
7 April 2016 – practical workshop on Querying Parallel Corpora, 14.00 – 17.00 (maximum number of participants: 7; registration required)
Registration is available at korpus @ korpus.sk