FI:PA212 Advanced Search Techniques - Course Information

PA212 Advanced Search Techniques for Large Scale Data Analytics

Faculty of Informatics
Spring 2026

Extent and Intensity

2/0/0. 2 credits (plus credits for completion). Type of completion: exam (zk).
Taught in person

Teachers

doc. RNDr. Jan Sedmidubský, Ph.D. (lecturer)
prof. Ing. Pavel Zezula, CSc. (lecturer)

Guaranteed by

doc. RNDr. Jan Sedmidubský, Ph.D.
Department of Machine Learning and Data Processing – Faculty of Informatics
Supplier department: Department of Machine Learning and Data Processing – Faculty of Informatics

Timetable

Wed 18 Feb to Wed 13 May, Wed 10:00–11:50, A319

Prerequisites

Knowledge of the basic principles of data processing is assumed.

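Course Enrolment Limitations

The course is also offered to students of fields other than those it is directly associated with.

Fields of study the course is directly associated with

The course is directly associated with 34 fields of study.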

Abstract

The objective of the course is to explain the problems of information retrieval in large collections of unstructured data, such as text documents or multimedia objects. The main emphasis will be on describing the basic principles of distributed algorithms for processing large volumes of data, e.g., Locality Sensitive Hashing, MapReduce, or PageRank. The algorithms for processing stream data will be introduced as well. The students will also acquire practical experience by applying the presented algorithms to specific tasks.

Learning outcomes

After completing the course, students are able to:

Describe the algorithmic differences between processing offline data collections and online data streams;

Understand the basic principles of distributed algorithms for processing large volumes of data;

Evaluate the results of algorithms by several metrics;

Apply presented algorithms, such as k-Means, Locality Sensitive Hashing, MapReduce, or PageRank, to specific tasks.

Key topics

Introduction – what is searching, things useful to know
Support for distributed processing – distributed processing, MapReduce, performance evaluation
Retrieval operators and metrics – common similarity search operators, retrieval metrics for evaluating search results
Clustering – clustering in Euclidean and non-Euclidean spaces; hierarchical, k-means, and BFR clustering algorithms
Finding frequent item sets – counting frequent items; A-Priori and PCY algorithms
Finding similar items – near-neighbor search, shingling of documents, min-hashing, Locality Sensitive Hashing
Processing data streams – sampling data from a stream, queries over sliding windows, filtering a stream
Link analysis – PageRank, topic-sensitive PageRank, link spam
Search applications – advertising on the web, recommender systems

Study resources and literature

recommended literature

P, Deepak and Prasad M. DESHPANDE. Operators for similarity search: semantics, techniques and usage scenarios. Cham: Springer, 2015, xi, 115. ISBN 9783319212562.
LESKOVEC, Jurij; Anand RAJARAMAN and Jeffrey D. ULLMAN. Mining of massive datasets. 2nd ed. Cambridge: Cambridge University Press, 2014, xi, 467. ISBN 9781107077232.
BAEZA-YATES, R. and Berthier de Araújo Neto RIBEIRO. Modern information retrieval: the concepts and technology behind search. 2nd ed. Harlow: Pearson, 2011, xxx, 913. ISBN 9780321416919.

Approaches, practices, and methods used in teaching

Lectures with slides in English. The approach combines theory, algorithms, and practical examples.

Method of verifying learning outcomes and course completion requirements

The final exam is written only. Students answer several theoretical and practical questions that verify the knowledge they gained in the lectures.

Language of instruction

English

Further comments

The course is taught annually.

The course is also listed under the following terms: Spring 2017, Spring 2018, Spring 2019, Spring 2020, Spring 2021, Spring 2022, Spring 2023, Spring 2024, Spring 2025.

Permalink: https://is.muni.cz/predmet/fi/jaro2026/PA212
