P110 Corpus Linguistics and Computational Lexicography

Faculty of Informatics
Spring 2001
Extent and Intensity
2/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
Dr. Patrick Hanks (lecturer)
prof. PhDr. Karel Pala, CSc. (lecturer)
Guaranteed by
prof. PhDr. Karel Pala, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. PhDr. Karel Pala, CSc.
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
The capacity limit for the course is 15 students.
Current registration and enrolment status: enrolled: 0/15, only registered: 0/15, only registered with preference (fields directly associated with the programme): 0/15
Syllabus
  • Building a corpus. Design criteria; obtaining permissions; spoken and written texts; size; type and token; Zipf's law; sampling (representativeness); contrasting genres; a monitor corpus.
  • Monolingual dictionary structure. Headwords and subentries; pronunciation transcription; word class and grammar; structure of definitions; example sentences; pragmatics and function words; word origins and word histories; usage notes.
  • Why build a corpus? Language performance and language competence; the problem of introspection; patterns of linguistic behaviour; metaphor and other aspects of language creativity; discourse structure; anaphora; register.
  • Preparing a corpus for use [with Karel Pala]. Indexing; tagging; lemmatization; concordancing programs; sorting the matches; displaying the wider context; identifying source texts.
  • Characteristics of natural language. Cognitive and syntactic prototypes; phraseological norms; "possible" vs. "normal"; probability and certainty; variability; typicality; statistical significance; analytic delicacy.
  • Using the corpus. Parsing and chunking; lexical statistics; collocates; sorting and classifying; linking word use to word meaning.
  • Naturalness. Syntactic well-formedness and textual well-formedness; cohesion; given and new; idiomaticity; neutrality.
  • Bilingual dictionary structure. Target language and metalanguage; word class; domain indicators; glosses; phraseology.
  • Using corpora in language comparisons. Parallel corpora and comparable corpora; sentence alignment; lexical gaps; terminology.
Language of instruction
Czech
Further comments (probably available only in Czech)
The course is taught annually.
The course is taught every week.
The course is also listed under the following terms: Spring 1999, Spring 2000, Spring 2002.
  • Permalink: https://is.muni.cz/course/fi/spring2001/P110