
Slovakia Road Show: Development of Human Language Technologies and Resources in Slovakia and in the World (10 Years of the Slovak National Corpus)

(a complete report in Slovak is here)

In June 2012, Bratislava hosted a two-day scientific and informational conference organized by the Slovak National Corpus Department of the Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences (SAS). The conference was held within the CESAR project, which aims to mobilize national industry and research and to strengthen support for language technologies and tools at the national level.

At the press conference, the invited guests stressed the importance of language technologies in the multilingual European society. Slovak NLP research received the full support of Ľ. Falťan (Vice President of SAS), M. Cimbáková (General Director – Science and Technology Division of the Ministry of Education, Science, Research and Sport of the Slovak Republic) and P. Žigo (Director of the Ľ. Štúr Institute of Linguistics, SAS).

The talk by G. Rehm (META-NET manager at DFKI GmbH in Berlin) was dedicated to fostering the technological foundations of a multilingual European information society. T. Váradi (CESAR project coordinator from the Research Institute for Linguistics, Hungarian Academy of Sciences in Budapest) emphasized the current size of the Slovak National Corpus, but also pointed out that support for text analysis, speech analysis and machine translation is weak or absent.

National, parallel and several specialized corpora were presented by F. Čermák, L. Dimitrova, R. Garabík, J. Hajič, L. Iomdin, T. Pintér, V. Stoykova, M. Šimková and M. Tadić. Applied research in language technologies was presented by NEWTON Technologies, a company that provides speech recognition services; Seznam.cz, the first catalogue search engine in the Czech Republic; and Education@Internet, an international non-profit organization supporting intercultural learning.

D. Katuščák introduced a project on a digital library and archive and the possibilities of using its digital text content for linguistic research. L. Hluchý, M. Rusko, J. Staš, D. Hládek and J. Juhár reported on computing technologies and tools for speech and text processing for Slovak. K. Pala and P. Rychlý presented the building of large corpora and tools for computer lexicography. P. Žigo introduced the cartographic processing of Slavic dialects. J. Kravjar spoke about the national corpus of theses and its plagiarism detection system.

The international conference provided information on the current state of language technologies in Slovakia and presented new trends and visions for their development. Experts, stakeholders and the general public shared the latest knowledge and expressed their most pressing requirements and ideas for cooperation in the field.