Introduction to Corpus Linguistics and Computational Lexicography
Information technologies and language (text) corpora.
Beginning of corpus linguistics, purpose of corpora.
Building corpora, collecting corpus data and their standardization,
SGML, TEI, representativeness of corpora, their maintenance.
Corpus tools, query processors: CQP, CUE, CQM, concordance
programmes - XKWIC, OCP, LEXA, WORDCRUNCHER. Queries, regular
expressions and their use. Statistical programmes, absolute
and relative frequencies, MI-score (mutual information) and t-score. Sorting programmes,
different codings, code conversions.
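
To illustrate the statistical measures named above, here is a minimal
sketch of how absolute and relative frequencies, the MI-score and the
t-score might be computed for a pair of adjacent words. It assumes the
commonly used definitions MI = log2(O/E) and t = (O - E)/sqrt(O), where
O is the observed pair frequency and E = f(w1)*f(w2)/N is the frequency
expected by chance. The function name association_scores and the toy
token list are hypothetical and are not taken from any of the programmes
listed in this syllabus.

```python
import math
from collections import Counter


def association_scores(tokens, w1, w2):
    """Frequencies, MI-score and t-score for the adjacent pair (w1, w2).

    Assumed definitions (a common textbook formulation):
      E  = f(w1) * f(w2) / N       expected pair frequency
      MI = log2(O / E)             O = observed pair frequency
      t  = (O - E) / sqrt(O)
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs

    f1, f2 = unigrams[w1], unigrams[w2]
    observed = bigrams[(w1, w2)]
    if observed == 0:
        raise ValueError("the pair does not occur in the token list")

    expected = f1 * f2 / n
    return {
        "abs_freq": observed,        # absolute frequency of the pair
        "rel_freq": observed / n,    # relative to corpus size in tokens
        "mi": math.log2(observed / expected),
        "t_score": (observed - expected) / math.sqrt(observed),
    }


# Toy example; a real corpus would contain millions of tokens.
tokens = "the czech national corpus is a corpus of czech texts".split()
scores = association_scores(tokens, "national", "corpus")
```

On real data the two scores behave differently: MI tends to favour rare,
tightly bound pairs, while the t-score tends to favour frequent ones, so
they are often consulted together.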
Annotated corpora, tagging on various levels: structural tagging
(SGML), grammatical tagging - POS, lemmata, word forms,
programme LEMMA.
Syntactic tagging, treebanks, skeleton analysis, constraint
grammars, disambiguation at the morphological and syntactic
levels.
Parallel corpora, alignment programmes.
Czech National Corpus, working with CNC, words, constructions,
collocations. Building dictionaries.
Basic concepts of Computational Lexicography.
Teacher's information
Within the corpus linguistics course, several interesting topics
for diploma theses are offered, e.g.
1) Sentence boundary recognition in Czech texts
2) Processing of multi-word expressions for tagging corpus texts
3) Semantic tagging of corpus texts