PLIN057 Automatic processing of text

Faculty of Arts
Spring 2018
Extent and Intensity
0/2/0. 4 credit(s). Type of Completion: z (credit).
Mgr. et Mgr. Ondřej Mrázek (lecturer)
Guaranteed by
doc. PhDr. Zdeňka Hladká, Dr.
Department of Czech Language - Faculty of Arts
Contact Person: Jaroslava Vybíralová
Supplier department: Department of Czech Language - Faculty of Arts
Mon 10:50–12:25 G13
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
The capacity limit for the course is 20 student(s).
Current registration and enrolment status: enrolled: 6/20, only registered: 0/20, only registered with preference (fields directly associated with the programme): 0/20
fields of study / plans the course is directly associated with
There are 10 fields of study the course is directly associated with.
Course objectives
In the humanities it is often important to be able to transform textual data into a structured form. This ability enables textual analysis and text information retrieval, and it provides input for further research regardless of the text's semantics.
The aim of the course is to teach students the basic ways of processing textual information with selected computer tools. A secondary aim is to teach students to treat text as a data type independent of its meaning, and to cope with different text encodings and with the portability of text between operating systems.
The course is designed for students who have no experience with this topic.
The pace and content of the course will be tailored to the students' needs. Understanding and practising the topics covered will take priority over the number of topics covered.
Learning outcomes
After the course the student will be familiar with the problems of text processing and will be able to:
  • search the text
  • transform the text into a different form
  • compare texts to each other
  • compile simple databases from the information obtained.
The student will also be capable of:
  • using and implementing regular expressions
  • basic work in the Linux terminal
  • using UNIX text tools (grep, sort, uniq, cut, etc.)
  • using UNIX text editors (nano, sed, vim).
Depending on the students' abilities and interests, also:
  • the basics of scripting in Bash
  • basic text processing in Python.
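To give a rough idea of what searching and transforming text can look like in Python, here is a minimal sketch. It is not taken from the course materials: the sample text and the function names `search_lines` and `word_frequencies` are hypothetical, chosen only for illustration. The first function loosely mirrors what grep does with a regular expression, the second loosely mirrors a tr, sort and uniq -c pipeline.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this illustration.
SAMPLE = """The cat sat on the mat.
The dog sat on the log.
A cat and a dog met."""


def search_lines(text, pattern):
    """Return the lines in which the regular expression matches (similar in spirit to grep)."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


def word_frequencies(text):
    """Count lowercased word forms (similar in spirit to tr | sort | uniq -c)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    # Lines containing the whole word "cat".
    print(search_lines(SAMPLE, r"\bcat\b"))
    # The three most frequent word forms with their counts.
    print(word_frequencies(SAMPLE).most_common(3))
```

The word pattern `[a-z]+` only covers unaccented ASCII letters; text in Czech or other languages would need a broader pattern, which is one place where the encoding issues mentioned in the course objectives come into play.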
Syllabus
    • Overview of the semester
    • Regular expressions and their use
    • Getting to know the UNIX terminal
    • Data flow management (input, output, redirection)
    • cat, tac, head, tail, wc
    • grep, sort, uniq, cut
    • comm, diff, join, paste, csplit
    • tr, nano, sed
    • vim
    • Basics of scripting in Bash
    • Practice
    • Work with text in Python
Recommended literature
    • Manual pages of the individual utilities.
    • BRANDEJS, Michal. UNIX - Linux : praktický průvodce. 1st ed. Praha: Grada, 1996. 340 pp. ISBN 8071691704.
Teaching methods
Teaching, practice, discussion.
Assessment methods
Credit is awarded for attendance, active participation, and passing the test.
    Language of instruction
    Further Comments
    Study Materials
The course is also listed under the following terms: Spring 2019.