FI:PV030 Textual Information Systems - Course Information
Faculty of Informatics, Spring 2011
- Extent and Intensity
- 2/1. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
- Teacher(s)
- doc. RNDr. Petr Sojka, Ph.D. (lecturer)
- Guaranteed by
- prof. Ing. Jiří Sochor, CSc.
Department of Visual Computing – Faculty of Informatics
Contact Person: doc. RNDr. Petr Sojka, Ph.D.
- Timetable
- Mon 12:00–13:50 B411, Mon 14:00–14:50 B116, Mon 14:00–14:50 B411
- Prerequisites
- Students are strongly advised to bring some basic knowledge of automata theory (IB005 Formal Languages and Automata) and natural language processing (IB030 Introduction to Natural Language Processing or IB047 Introduction to Corpus Linguistics and Computer Lexicography). Some database basics (PB154 Database Systems) will be helpful as well.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those it is directly associated with.
- fields of study / plans the course is directly associated with
- there are 44 fields of study the course is directly associated with
- Course objectives
- At the end of the course students should be able to: apply basic techniques and algorithms used in textual information systems; understand text search algorithms (KMP, AC, BM, RK, ...); and be familiar with the data structures used for index storage, with query languages, and with architectures of textual information systems (e.g. Google), including those that use natural language processing techniques.
- Syllabus
- Basic notions. TIS - text information system. Classification of information systems.
- Searching in TIS. Searching and pattern matching classification and data structures.
- Algorithms of Knuth-Morris-Pratt, Aho-Corasick, Boyer-Moore, Commentz-Walter, Buczilowski.
- Theory of automata for searching. Classification of searching problems.
- Indexes. Indexing methods. Data structures for searching and indexing.
- Google as an example of a search and indexing engine. PageRank.
- Signature methods.
- Query languages and document models: boolean, vector, probabilistic, MMM, Paice.
- Data compression. Basic notions. Statistical methods.
- Dictionary-based compression methods. Neural nets for text compression.
- Syntactic methods. Context modeling.
- Spell checking. Filtering information channels. Document classification.
- Literature
- Jaroslav Pokorný, Václav Snášel, Dušan Húsek: Dokumentografické informační systémy, lecture notes, MFF UK Praha, 1998.
- KORFHAGE, Robert R. Information storage and retrieval. New York: Wiley Computer Publishing, 1997, xiii, 349. ISBN 0471143383.
- Information retrieval: data structures & algorithms. Edited by William B. Frakes and Ricardo Baeza-Yates. Upper Saddle River: Prentice Hall, 1992, viii, 504. ISBN 0-13-463837-9.
- Finite-state language processing. Edited by Emmanuel Roche and Yves Schabes. Cambridge: Bradford Book, 1997, xv, 464. ISBN 0262181827.
- Teaching methods
- Classical lectures, intermixed with brainstorming, class discussions and lectures by experts from industry (e.g. Seznam).
- Assessment methods
- Teaching follows the classical format; students are examined by written tests during the course and at its end. The final test is worth 70 % of the points and the midterm test 30 %. Sample tests are posted on the course web page. During the course, students are motivated by brainstorming sessions, questions, and small exercises rewarded with extra points.
- Language of instruction
- English
- Follow-Up Courses
- Further comments (probably available only in Czech)
- Study Materials
The course is taught annually.
- Teacher's information
- http://www.fi.muni.cz/~sojka/PV030/
- Permalink: https://is.muni.cz/course/fi/spring2011/PV030