IA167 String algorithms

Faculty of Informatics
Spring 2014
Extent and Intensity
2/2/0. 4 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: z (credit).
Teacher(s)
Alexandru Popa, Ph.D. (lecturer)
Guaranteed by
prof. RNDr. Mojmír Křetínský, CSc.
Department of Computer Science – Faculty of Informatics
Timetable
Mon 14:00–15:50 G123, Tue 14:00–15:50 G126
Prerequisites
The students should have a good understanding of algorithms and their worst-case analysis. A basic understanding of Graph Theory is also expected.
Course Enrolment Limitations
The course is offered to students of any study field.
Course objectives
This course expands on courses Algorithm design I and Algorithm design II and aims to introduce the students into the world of stringology. At the end of the course the students should be able to: understand and implement the main algorithms used in this field; understand the connection of the topics presented in the course with other research areas; read and summarise a research paper on stringology.
Syllabus
  • 1. Introduction. Classical offline pattern matching model and basic pattern matching algorithms (Rabin-Karp, Boyer-Moore, Knuth-Morris Pratt).
  • 2. Suffix trees, Ukkonen's algorithm for constructing a suffix tree in linear time.
  • 3. Suffix arrays. Building suffix arrays in linear time. Longest Common Prefix and Range Minimum Queries.
  • 4. Pattern matching with don't care symbols, errors, and mistakes. Abrahamson algorithm for computing mismatches. Wu-Manber algorithm. Fast Fourier Transforms and Kangaroo hopping.
  • 5. Pattern matching in the streaming model (where the text arrives online and our memory is less than the size of the pattern).
  • Porat&Porat algorithm solving the pattern matching in the streaming model using only O(log m log n) bits of extra memory.
  • 6. Edit distance vs longest common subsequence. LCS in linear space. Myers' algorithm for edit distance (algorithm used in 'diff' program from UNIX). Computing LCS in subquadratic time (the algorithm of Masek and Paterson).
  • 7. Indexing for approximate pattern matching.
  • 8. Two-dimensional pattern matching algorithms. Zhu-Takaoka algorithm. Bird/Baker algorithm.
  • 9. Introduction to equations on words.
  • 10. Shortest Common Superstring problem. Approximation algorithms and the connection with the Travelling Salesman problem.
  • 11. Introduction to combinatorics on words.
Literature
    recommended literature
  • Jewels of stringology. Edited by Maxime Crochemore - Wojciech Rytter. River Edge, NJ: World Scientific, 2002, x, 310 p. ISBN 9810247826. info
  • GUSFIELD, Dan. Algorithms on strings, trees, and sequences : computer science and computional biology. Cambridge: Cambridge University Press, 1997, xviii, 534. ISBN 0521585198. info
Teaching methods
Lectures, class discussion.
Language of instruction
English
Further comments (probably available only in Czech)
Study Materials
The course is taught annually.

  • Enrolment Statistics (recent)
  • Permalink: https://is.muni.cz/course/fi/spring2014/IA167