Bibliographical, Style and Genre Annotation

Bibliographical, style and genre annotation is an essential part of the primary processing of corpus texts. Information about the identity and basic structure of a text is useful for archiving, citation, statistical evaluation of parameters, and investigating the distribution of language units and language phenomena in particular texts. The annotation is displayed at the bottom of the Bonito client window after right-clicking the desired line in a concordance list.

The annotation consists of keys with values, which can be either free (e.g. author's name) or chosen from a fixed set (e.g. genre). Some keys describe the style and genre characteristics of a text. The main categories are type of text (literary, journalistic, professional, live communication), genre (poem, novel, short story, article, etc.) and domain (subject area, e.g. science, law, politics, economy). These categories can be further subdivided. Other keys provide the bibliographic details of the source and information about the author and the text. The keys under which the relevant information can be found are listed below.

Date format

All dates are expressed in the ISO 8601 format YYYY-MM-DD, e.g. 1998-05-23, to keep them unambiguous and well organized.
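
A minimal sketch of how a date value could be checked against this format, using only the Python standard library. The function name `is_valid_annotation_date` is hypothetical and not part of the corpus tooling.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_annotation_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 1998-02-30
    except ValueError:
        return False
    return True
```

The regular expression enforces the zero-padded layout, while `datetime.date` rejects dates that do not exist in the calendar.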

External annotation

External annotation uses a key-value structure. A value is a string of characters that ends at the end of the line; multi-line values (e.g. names) are therefore excluded. Values may be either free (e.g. name of author) or chosen from a specified set (e.g. genre). Optional flags are given as a comma-separated set; each flag establishes a particular characteristic of a value. The following values have a special meaning (they are not necessarily meaningful for all keys):

... (three dots)
undefined value. The value has not been defined because it could not be determined; the annotation is incomplete. It should not occur in real annotation.
(an empty string or whitespace)
the same as „...“. Default value in automatic annotation. We nevertheless expect it to appear.
missing key
has the same meaning as the undefined value („...“ or empty)
XXX
unknown value. It cannot be determined, e.g. the name of the author of an article.
YYY
undefinable value. It cannot be defined or has no meaning, e.g. gender of author (in collaborative work), gender of translator (if not a translation).
MIX
mixture. Mixed values, e.g. author is a hermaphrodite.
MSC
other. If the value is not defined in the set of values, e.g. author is a eunuch.
TTT
unknown value which needs to be defined. The annotation must be completed, the value added.
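
The special values above can be handled with a small lookup. The following is a sketch under stated assumptions: the annotation is taken to be already parsed into a Python dictionary (the on-disk syntax is not described here), and the names `SPECIAL_VALUES`, `classify_value` and `needs_attention` are hypothetical.

```python
# Labels follow the descriptions in the list of special values.
SPECIAL_VALUES = {
    "...": "undefined",
    "XXX": "unknown",
    "YYY": "undefinable",
    "MIX": "mixture",
    "MSC": "other",
    "TTT": "to be completed",
}


def classify_value(value):
    """Classify one annotation value; None stands for a missing key."""
    # A missing key, an empty string and whitespace all count as undefined.
    if value is None or value.strip() == "":
        return "undefined"
    return SPECIAL_VALUES.get(value.strip(), "regular")


def needs_attention(annotation, key):
    """Return True if the value for key still has to be filled in."""
    return classify_value(annotation.get(key)) in {"undefined", "to be completed"}
```

For example, `needs_attention({"Auth": "TTT"}, "Auth")` is true, while a value of `XXX` is treated as final because it states that the author is unknown.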

Annotation of the bank

None of the following keys is mandatory except SourceId. Keys are given in the form Title (abbreviation). The meaning of each key is described under it, and its possible values are listed unless the value is free.

Name (Name)

Origname (OrgN)

Author (Auth)

Origauthor (OrgA)

Translator (Trnr)

Translation (Trnn)

Values:

ISBN (ISBN)

ISSN (ISSN)

SourceId (ScId)

Id (Id)

Rhyme (Rhym)

Values:

Type (Type)

Values:

Subtype (SubT)

Subtype (SubT): subtype of text, values by Type

for Type=img (literary (imaginative) text)
poe
poetry
pro
prose
dra
drama

for Type=inf (journalistic (informative) text)
pub
public press
adv
advertisement
adm
administration

for Type=prf (professional text)
sci
scientific literature, articles, journals, university textbooks
pop
popular science, special interest magazines
txb
primary and high school textbooks
enc
encyclopedia and similar works
man
manuals, recipes...

for Type=liv (live communication)
spk
spoken
wri
written (Internet, telex if used interactively, communication of speech-impaired people)
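
The dependency of Subtype on Type can be captured in a small lookup table. The sketch below is a hypothetical helper, not part of the corpus tools; it assumes the assignment of subtype codes to types as read from the table above.

```python
# Subtype codes allowed for each Type (assumed column assignment).
ALLOWED_SUBTYPES = {
    "img": {"poe", "pro", "dra"},
    "inf": {"pub", "adv", "adm"},
    "prf": {"sci", "pop", "txb", "enc", "man"},
    "liv": {"spk", "wri"},
}


def subtype_matches_type(text_type: str, subtype: str) -> bool:
    """Return True if subtype is one of the codes listed for text_type."""
    return subtype in ALLOWED_SUBTYPES.get(text_type, set())
```

A check like `subtype_matches_type("inf", "poe")` returns false, which flags an annotation whose Type and Subtype do not belong together.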

Genre (Genr)

Genre (Genr): genre values

for Type=img (literary (imaginative) text)

for Type=inf (journalistic (informative) text)

for Type=prf (professional text)

ver
verse

doc (documentary)
minute, protocol, resolution, contract

mon
monograph

son
song, libretto

ann (announce)
directive, decree, questionnaire, commercials, announcements, offers

hnd
handbook

scd
drama script, drama play

lst (list)
lists, programmes, rules, statutes, content, masthead

dis
dissertation

scf
film script, film subtitles

rpt (report)
report, interview, announcement, communique

std
study

scr
radio script

anl (analytic)
editorial, comment, gloss, review, criticism, discussion, polemic, debate, caricature

abs
abstract

nov
novel

pbb (belles-lettres)
feuilleton, report, feature, column

tcl
article

col
short story, collection of short stories

spc
speeches (political, occasional)

rfl
reflection

ess
essay

dsc
discussion/polemic/debate paper

lct
lecture

mem
memoirs

crs
characteristics

let
letters

crt
criticism, review

chr
chronicle

opn
opinion

sen
short epic genres (quotes, aphorisms etc.)

ins
instruction

dia
dialogues

rig
doctoral thesis

dpl
diploma, bachelor and final works

ref
paper, term paper

Subgenre (SubG)

Values:

Domain (Domn)

Values:

Subdomain (SubD)

Subdomain (SubD): subdomain values by Domain

for Domain = ars
mus
music, opera, operetta, ballet
cin
cinema, film
arc
architecture
art
art, photos, sculpture
the
theatre, theatre studies and criticism
lit
literature, literary studies and criticism

for Domain = hum
his
history, archeology
psy
psychology
edu
education
soc
sociology, communication, media
phi
philosophy
inf
library science and information sources
pol
political science
lin
linguistics
eth
ethnology, ethnography
cul
cultural science
swo
social work

for Domain = law
bil
bills, statutes, regulations
jud
judicatures
jur
jurisdiction (other legal texts)

for Domain = nat
agr
agriculture
med
medicine
pha
pharmacy
zoo
zoology
bot
botany
bio
biology
che
chemistry
mat
mathematics
ggr
geography
phy
physics (including astronomy)
met
meteorology
geo
geology
env
environmental studies, ecology

for Domain = tec
tra
transport, lines, telecommunication
ene
energetics
ind
industry
com
computer science
bui
building industry
sta
standardisation

for Domain = ecn
eco
economy, banking, business
mng
management, control
mer
merchandising, consumer area

for Domain = blf
rel
religion, belief, sects
teo
theology
exc
the supernatural, occult, magic, astrology

for Domain = lif
hou
household (flat, garden, handicraft, kitchen, breeding)
fsh
clothing, fashion
spo
sport
sct
social life
amu
amusement, games, hobbies, free time, travelling
min
ethnic minorities
reg
region
cnl
counselling
clt
culture

for Domain = ins
no subdomain

for Domain = plt
no subdomain

Medium (Medi)

Values:

Authsex (AutS)

Values:

Lang (Lang)

Varieta (Vari)

Values:

Paragraphs (Para)

Values:

Emphasis (Emph)

Values:

Diacritics (Dcrt)

Values:

Transsex (TrnS)

Origlang (OrgL)

Date (Date)

Dateorig (OrgD)

Conglomerate (Cong)

Bogocong (Bogo)

Comment (Comn)

Corrected (Corr)

Bibliography (Bibl)

Noises

Images

image

Head-lines

Highlighted text

Hyphen/dash

If the type cannot be easily identified, we use U+002D HYPHEN-MINUS (-). U+2010 HYPHEN is used in, for example, „Rakúsko-Uhorsko“.

U+2014 EM DASH (—) is used for dashes, e.g. „Peniaze — radosť“. U+2212 MINUS SIGN (−) can be used as a unary or binary operator. Presumably, however, the source document will not distinguish the operator from a hyphen; in that case we use U+002D HYPHEN-MINUS (-).

U+00AD SOFT HYPHEN is not clearly defined. It does not usually appear in the Corpus.

U+2011 NON-BREAKING HYPHEN is treated as equivalent to U+2010 HYPHEN and is never used in the Corpus.
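
Because these characters look almost identical on screen, it can help to list them by code point. The following sketch uses the standard `unicodedata` module; the helper names `dash_report` and `replace_non_breaking_hyphen` are hypothetical, and replacing U+2011 with U+2010 is an assumption drawn from the statement that the two are equivalent, not a documented processing step.

```python
import unicodedata

# The hyphen- and dash-like characters discussed in this section.
DASH_LIKE = {"\u002D", "\u2010", "\u2011", "\u2014", "\u2212", "\u00AD"}


def dash_report(text):
    """List (index, code point, Unicode name) for each dash-like character."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char))
        for index, char in enumerate(text)
        if char in DASH_LIKE
    ]


def replace_non_breaking_hyphen(text):
    """Assumed normalisation: write U+2011 as U+2010 HYPHEN."""
    return text.replace("\u2011", "\u2010")
```

Running `dash_report` over a line before it enters the bank shows at a glance which of the six characters it actually contains.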

Formulae

Mathematical, chemical and other formulae are marked with <equation/>. Simple formulae, chemical compounds, reactions, etc. that are familiar to the general public (such as H₂O) are written with Unicode characters. We do not use characters from the Letterlike Symbols block (e.g. instead of U+212A KELVIN SIGN we use U+004B LATIN CAPITAL LETTER K).

For subscripts and superscripts we use the corresponding Unicode characters, such as U+00B9 SUPERSCRIPT ONE, U+2074 SUPERSCRIPT FOUR and U+207B SUPERSCRIPT MINUS, e.g. 10⁶ km².

For multiplication signs we use U+00D7 MULTIPLICATION SIGN × or U+00B7 MIDDLE DOT ·, depending on the source text. We do not make corrections! If “H2O” is used in the original text, we use “H2O”.
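
The character conventions for simple formulae can be illustrated with two translation tables. This is only a sketch: the function names are hypothetical, and of the three letterlike replacements only KELVIN SIGN is named in the text, while ANGSTROM SIGN and OHM SIGN are added here as an assumption by analogy.

```python
# Letterlike symbols replaced by the ordinary letters they duplicate.
LETTERLIKE_TO_LETTER = str.maketrans({
    "\u212A": "K",       # KELVIN SIGN -> LATIN CAPITAL LETTER K
    "\u212B": "\u00C5",  # ANGSTROM SIGN -> A WITH RING ABOVE (assumption)
    "\u2126": "\u03A9",  # OHM SIGN -> GREEK CAPITAL LETTER OMEGA (assumption)
})

# ASCII digits and signs mapped to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans(
    "0123456789-+",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u207B\u207A",
)


def replace_letterlike(text):
    """Replace the letterlike symbols listed above with plain letters."""
    return text.translate(LETTERLIKE_TO_LETTER)


def to_superscript(exponent):
    """Write an exponent such as '6' or '-1' with superscript characters."""
    return exponent.translate(SUPERSCRIPTS)
```

`to_superscript` is meant only for encoding an exponent that is already typeset as a superscript in the source; a plain "H2O" in the original stays "H2O".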

Tables

Tables are tagged as <table/> or <table caption="information on table"/>.

Quotation marks

Quotation marks of the original document are used. There are the following main styles:

"double English ASCII quotation marks"

'single English ASCII quotation marks'

„correct Slovak double quotation marks“

„incorrect Slovak double quotation marks”

‚correct Slovak single quotation marks‘

‚incorrect Slovak single quotation marks’

“correct English double quotation marks”

‘correct English single quotation marks’

‹guillemet single›

«guillemet double»

›inverted guillemet single‹

»inverted guillemet double«

There is a difference between U+0027 APOSTROPHE (') and U+2019 RIGHT SINGLE QUOTATION MARK (’), between U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK (‹) and U+003C LESS-THAN SIGN (<) (likewise for the right-pointing pair), and between U+201A SINGLE LOW-9 QUOTATION MARK (‚) and U+002C COMMA (,).

If incorrect quotation marks are found in the source document (e.g. ,comma and apostrophe' or , ,two commas and two apostrophes' '), we do not change them. They are corrected during the transformation from the bank into the corpusoid. Note that in LaTeX, two commas (, ,) stand for a double low-9 quotation mark.
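
The quotation styles listed above differ only in their opening and closing characters, so a style can be recognised from that pair. The sketch below is illustrative; `QUOTE_STYLES` and `quote_style` are hypothetical names, and the function merely reports the style without changing the text, in line with the rule that source quotation marks are kept.

```python
# (opening, closing) character pairs and the style names used in this section.
QUOTE_STYLES = {
    ('"', '"'): "double English ASCII",
    ("'", "'"): "single English ASCII",
    ("\u201E", "\u201C"): "correct Slovak double",
    ("\u201E", "\u201D"): "incorrect Slovak double",
    ("\u201A", "\u2018"): "correct Slovak single",
    ("\u201A", "\u2019"): "incorrect Slovak single",
    ("\u201C", "\u201D"): "correct English double",
    ("\u2018", "\u2019"): "correct English single",
    ("\u2039", "\u203A"): "guillemet single",
    ("\u00AB", "\u00BB"): "guillemet double",
    ("\u203A", "\u2039"): "inverted guillemet single",
    ("\u00BB", "\u00AB"): "inverted guillemet double",
}


def quote_style(quoted):
    """Return the style name for a quoted string, or None if unrecognised."""
    if len(quoted) < 2:
        return None
    return QUOTE_STYLES.get((quoted[0], quoted[-1]))
```

A pseudo-quotation built from a comma and an apostrophe returns `None`, which marks it as one of the incorrect forms left for the later transformation step.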

Some keys in the bank

Conglomerate

For books and similar works, a conglomerate consists of the author's name, a hyphen (-) and the name of the work.

For journals, newspapers, etc. conglomerates are:

* journals

* newspapers

* miscellanies

* books and similar works

Bogocong

For authorial books, the bogocong consists of a multi-letter abbreviation made up of the author's initials, followed by the ordinal number assigned to the work of that particular author (starting at 1). For collaborative works, we use only the first letters of the surnames and the ordinal number. For journals and newspapers, the bogocong consists of a journal abbreviation followed by YY/MM (YY = year, MM = month) or YY/CC (CC = journal issue).

* journals

* newspapers

* miscellanies

* books and similar works
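
The bogocong rules described above could be sketched as follows. Several details are assumptions made for illustration only: the function names, the upper-casing of initials, and the absence of any separator between the abbreviation and the number are not confirmed by the text. The author names in the usage note are invented.

```python
def book_bogocong(authors, ordinal):
    """Build a bogocong for a book.

    authors is a list of (given_names, surname) pairs; ordinal is the
    number of the work for this author or group, starting at 1.
    """
    if ordinal < 1:
        raise ValueError("ordinal numbers start at 1")
    if len(authors) == 1:
        given_names, surname = authors[0]
        # Single author: initials of all given names and of the surname.
        parts = (given_names + " " + surname).split()
        initials = "".join(part[0] for part in parts)
    else:
        # Collaborative work: first letters of the surnames only.
        initials = "".join(surname[0] for _, surname in authors)
    return f"{initials.upper()}{ordinal}"


def periodical_bogocong(abbreviation, year, month_or_issue):
    """Build a bogocong for a journal or newspaper: abbreviation + YY/MM or YY/CC."""
    return f"{abbreviation}{year % 100:02d}/{month_or_issue:02d}"
```

With these assumptions, `book_bogocong([("Ján", "Novák")], 1)` gives `JN1`, and `periodical_bogocong("ABC", 1998, 5)` gives `ABC98/05`.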

Bibliography

* journals

* newspapers

* miscellanies

* books and similar works