FI:IB047 Intro to Corpus Linguistics - Course Information
IB047 Introduction to Corpus Linguistics and Computer Lexicography
Faculty of Informatics, Spring 2009
- Extent and Intensity
- 2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- prof. PhDr. Karel Pala, CSc. (lecturer)
doc. Mgr. Pavel Rychlý, Ph.D. (lecturer)
- Guaranteed by
- prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
- Timetable
- Thu 12:00–13:50 B204
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- Fields of study / plans the course is directly associated with
- There are 22 fields of study the course is directly associated with.
- Course objectives
- A basic introduction to the field of corpus linguistics and computational lexicography. Students will study types of corpora and how corpora are built and used, especially for the purpose of building dictionaries.
- Syllabus
- Information technologies and language (text) corpora. Beginning of corpus linguistics, purpose of corpora.
- Corpus data, corpus types and their standardization, SGML, XML, TEI, CES. Annotated corpora, tagging on various levels: structural tagging, grammatical tagging -- POS, lemmata, word forms. Syntactic tagging, treebanks, skeleton analysis. Parallel corpora, alignment programmes. Tools for automatic and semi-automatic annotation, disambiguation.
- Building corpora, maintenance. Corpus tools: corpus manager. Concordance programmes. Queries, regular expressions and their use. Statistical programmes, absolute and relative frequencies, MI and T-score. Work with corpus attributes and tags.
- Working with corpora -- CNC, SUSANNE, Prague Dependency Treebank. Words, constructions, collocations.
- Computational lexicography, lexicology.
- Description of meanings (semantic features).
- Types of computer dictionaries. Lexicography standards.
- Data for dictionary building -- corpora.
- Lexicographic software tools. Lemmatizers.
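As an illustration of the statistics named in the syllabus (absolute and relative frequencies, MI and T-score), here is a minimal Python sketch. It is not part of the course materials. The function names `tokenize` and `association_scores` and the toy text are hypothetical. It assumes the formulas commonly used in corpus linguistics for adjacent word pairs: MI = log2(f(x,y) · N / (f(x) · f(y))) and T-score = (f(x,y) − f(x) · f(y) / N) / √f(x,y), where N is the corpus size in tokens.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"\w+", text.lower())


def association_scores(tokens, w1, w2):
    """Return (MI, t-score) for the adjacent pair (w1, w2), or None if unseen.

    Assumes the formulas stated in the lead-in; real corpus managers may
    use windows wider than one token or different normalisations.
    """
    n = len(tokens)                              # corpus size N
    unigrams = Counter(tokens)                   # absolute word frequencies
    bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent-pair frequencies

    f_xy = bigrams[(w1, w2)]
    if f_xy == 0:
        return None                              # scores undefined for unseen pairs

    # Co-occurrence frequency expected if the two words were independent.
    expected = unigrams[w1] * unigrams[w2] / n
    mi = math.log2(f_xy / expected)
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    return mi, t_score


# Toy "corpus" of 8 tokens (hypothetical data, for illustration only).
tokens = tokenize("Strong tea and strong coffee but weak tea")

absolute = Counter(tokens)["strong"]             # absolute frequency: 2
relative = absolute / len(tokens)                # relative frequency: 0.25

mi, t_score = association_scores(tokens, "strong", "tea")
print(absolute, relative, mi, t_score)           # prints: 2 0.25 1.0 0.5
```

On a corpus this small the scores mean little. In practice MI tends to favour rare pairs and the T-score frequent ones, which is why the two measures are usually consulted together.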
- Literature
- SAMPSON, Geoffrey. English for the computer: the SUSANNE corpus and analytic scheme. Oxford: Clarendon Press, 1995, ix, 499 p. ISBN 0198240236.
- RYCHLÝ, Pavel. Korpusové manažery a jejich efektivní implementace. Brno, 2000, xiv, 128 p.
- Computational lexicography for natural language processing. Edited by Ted Briscoe and Bran Boguraev. London: Longman, 1989, xiv, 310 p. ISBN 0-470-21187-3.
- SAMPSON, Geoffrey. Empirical linguistics. London: Continuum, 2001, viii, 226 p. ISBN 0-8264-4883-6.
- Corpus processing for lexical acquisition. Edited by Bran Boguraev and James Pustejovsky. Cambridge: Bradford Book, 1996, xi, 245 p. ISBN 0-262-02392-X.
- Assessment methods
- Lectures, written exam.
- Language of instruction
- Czech
- Follow-Up Courses
- Further Comments
- The course is taught annually.
- Permalink: https://is.muni.cz/course/fi/spring2009/IB047