IB047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of Informatics Spring 2012
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Alternate Types of Completion: k (colloquium), z (credit).
The course is also offered to students of fields other than those it is directly associated with.
Fields of study the course is directly associated with
There are 28 fields of study the course is directly associated with.
Course objectives
A basic introduction to the fields of corpus linguistics and
computational lexicography. Students will study types of corpora,
corpus building and corpus usage, especially for the purpose of dictionary building.
Syllabus
Information technologies and language (text) corpora.
Origins of corpus linguistics, the purpose of corpora.
Corpus data, corpus types and their standardization: SGML, XML, TEI, CES.
Annotated corpora, tagging on various levels: structural tagging, grammatical tagging -- POS, lemmata, word forms.
Syntactic tagging, treebanks, skeleton analysis.
Parallel corpora, alignment programs.
Tools for automatic and semi-automatic annotation, disambiguation.
Building and maintaining corpora.
Corpus tools: corpus managers.
Concordance programs.
Queries, regular expressions and their use.
Statistical programs: absolute and relative frequencies, mutual information (MI) and T-score.
Work with corpus attributes and tags.
Working with corpora -- CNC (Czech National Corpus), SUSANNE, Prague Dependency Treebank.
Words, constructions, collocations.
Computational lexicography, lexicology.
Description of meanings (semantic features).
Types of computer dictionaries. Lexicography standards.
Data for dictionary building -- corpora.
Lexicographic software tools. Lemmatizers.
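The statistics named in the syllabus (absolute and relative frequencies, MI and T-score) can be illustrated with a minimal sketch. It assumes a pre-tokenized corpus held in a Python list and uses the common collocation formulations MI = log2(f(x,y) * N / (f(x) * f(y))) and T = (f(x,y) - f(x) * f(y) / N) / sqrt(f(x,y)), where N is the corpus size in tokens. The function name `collocation_scores` is hypothetical and is not taken from any tool used in the course.

```python
import math
from collections import Counter


def collocation_scores(tokens, w1, w2):
    """Hypothetical helper: frequency, MI and T-score for the bigram (w1, w2).

    Assumes `tokens` is an already tokenized corpus (a list of strings) and
    uses the token count N as the normalizer for both unigram and bigram
    counts, a common simplification.
    """
    n = len(tokens)
    unigrams = Counter(tokens)                   # absolute frequencies f(x)
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent word pairs f(x, y)

    f_x, f_y = unigrams[w1], unigrams[w2]
    f_xy = bigrams[(w1, w2)]
    if f_xy == 0:
        raise ValueError("the bigram does not occur in the corpus")

    # Expected co-occurrence count if w1 and w2 were independent.
    expected = f_x * f_y / n

    return {
        "f_xy": f_xy,                                    # absolute frequency
        "rel_freq_per_million": f_xy / n * 1_000_000,    # relative frequency
        "MI": math.log2(f_xy / expected),                # mutual information
        "T": (f_xy - expected) / math.sqrt(f_xy),        # T-score
    }


if __name__ == "__main__":
    corpus = "the strong tea and the strong coffee and the weak tea".split()
    print(collocation_scores(corpus, "strong", "tea"))
```

In this toy corpus of 11 tokens, "strong" and "tea" each occur twice and co-occur once, so MI = log2(11/4) and T = 1 - 4/11. Real corpus managers compute such counts from indexed corpora rather than from in-memory lists.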
Literature
SAMPSON, Geoffrey. English for the computer: the SUSANNE corpus and analytic scheme. Oxford: Clarendon Press, 1995. ix, 499 p. ISBN 0-19-824023-6.
RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace [Corpus managers and their efficient implementation]. Brno, 2000. xiv, 128 p.
Computational lexicography for natural language processing. Edited by Ted Briscoe and Bran Boguraev. London: Longman, 1989. xiv, 310 p. ISBN 0-470-21187-3.
SAMPSON, Geoffrey. Empirical linguistics. London: Continuum, 2001. viii, 226 p. ISBN 0-8264-4883-6.
Corpus processing for lexical acquisition. Edited by Bran Boguraev and J. (James) Pustejovsky. Cambridge: Bradford Book, 1996. xi, 245 p. ISBN 0-262-02392-.
Teaching methods
Teaching takes the form of lectures and seminars that combine slides with demonstrations of the relevant software tools. Students complete homework assignments, give presentations based on the literature they have read, and develop small projects. Where appropriate, teaching relies on open dialogue between the teacher and the students.