I047 Introduction to Corpus Linguistics and Computer Lexicography

Faculty of Informatics
Spring 1999
Extent and Intensity
2/0. 2 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
prof. PhDr. Karel Pala, CSc. (lecturer)
Guaranteed by
Contact Person: prof. PhDr. Karel Pala, CSc.
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
fields of study / plans the course is directly associated with
Syllabus
  • Introduction to Corpus Linguistics and Computational Lexicography
  • Information technologies and language (text) corpora. Beginnings of corpus linguistics, purpose of corpora.
  • Building corpora, collecting corpus data and their standardization, SGML, TEI, representativeness of corpora, their maintenance.
  • Corpus tools, query processors: CQP, CUE, CQM; concordance programmes: XKWIC, OCP, LEXA, WORDCRUNCHER. Queries, regular expressions and their use. Statistical programmes, absolute and relative frequencies, MI score and t-score. Sorting programmes, different character encodings, code conversions.
  • Annotated corpora, tagging on various levels: structural tagging (SGML), grammatical tagging - POS, lemmata, word forms, the LEMMA programme.
  • Syntactic tagging, treebanks, skeleton analysis, constraint grammars, disambiguation at the morphological and syntactic levels.
  • Parallel corpora, alignment programmes.
  • Czech National Corpus, working with CNC, words, constructions, collocations. Building dictionaries.
  • Basic concepts of Computational Lexicography.
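As an illustration of the frequency and association measures named in the syllabus, the following is a minimal sketch, not course material. It assumes the commonly used definitions MI = log2(f(x,y) · N / (f(x) · f(y))) and t = (f(x,y) − f(x) · f(y) / N) / sqrt(f(x,y)), where N is the corpus size in tokens. The function name `association_scores` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def association_scores(tokens, x, y):
    """Return frequency and association measures for the adjacent pair (x, y).

    Assumes a flat list of word tokens; a bigram is two neighbouring tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    f_x, f_y, f_xy = unigrams[x], unigrams[y], bigrams[(x, y)]
    if f_xy == 0:
        raise ValueError(f"the pair ({x!r}, {y!r}) does not occur in the corpus")

    # Co-occurrence frequency expected if x and y were independent.
    expected = f_x * f_y / n

    return {
        "absolute_frequency": f_xy,
        "relative_frequency": f_xy / n,
        "mi": math.log2(f_xy / expected),
        "t_score": (f_xy - expected) / math.sqrt(f_xy),
    }


# Toy corpus (hypothetical data, for illustration only).
tokens = "a b a b c c c c".split()
scores = association_scores(tokens, "a", "b")
print(scores)
```

MI tends to favour rare pairs, while the t-score favours frequent ones, which is why the two measures are usually read side by side.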
Language of instruction
Czech
Further comments (probably available only in Czech)
The course is taught annually.
The course is taught: every week.
The course is also listed under the following terms: Spring 1996, Spring 1997, Spring 1998, Spring 2000, Spring 2001, Spring 2002.
  • Permalink: https://is.muni.cz/course/fi/spring1999/I047