PA154 Corpus Tools

Faculty of Informatics
Spring 2004
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
prof. PhDr. Karel Pala, CSc. (lecturer)
Guaranteed by
prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing - Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
Tue 18:00–19:50 B204
The course is an introduction to corpus linguistics and computer lexicography. It covers the basics of corpus types, corpus tools, tagging, and disambiguation. The part on computer lexicography explains machine-readable dictionaries and lexical databases and the principles of building them.
  • Text corpora and their types. Standardization of corpus data: SGML, XML, TEI. Building corpora. Corpus managers and processors (CQP, Manatee), graphical interfaces (GCQP, Bonito), concordance programs (OCP). Tagging and taggers (ajka for Czech). Morphological, syntactic and semantic tagging (WSD). Disambiguation and disambiguators (rule-based: DIS; stochastic; and others). Parallel corpora, alignment and aligners. Using corpora in computer lexicography: context, word sense disambiguation. Machine-readable dictionaries and their types. Tools for electronic dictionaries: browsers and editors. Lexicographer's workbench. Lexical databases WordNet and EuroWordNet and tools for handling them: Polaris, Periscope, VisDic.
  • RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace. Brno, 2000. xiv, 128 pp.
  • Studie z korpusové lingvistiky. 1st ed. Prague: Karolinum, 2000. 531 pp. ISBN 80-7184-893-X.
