News

The course is a regular research seminar. Every enrolled student must give a presentation during the term on a research topic: a topic of their interest, their thesis, or a research paper or area. Presentation topics focus primarily (but not exclusively) on the research areas of the Math Information Retrieval@FI group: machine learning, information retrieval, representation learning, and scientific visualization.
In addition to the course outline below, there is a discussion group with official course information and a communication channel: check both frequently!

Topics and Course Outline

Week 1 – Introduction

Join us at A502, Faculty of Informatics MU, on February 22nd at 10 AM (CET) [or on Zoom].

[9:50 Catering preparation (tea, cakes, ...)]
10:00 Class introduction and warm-up round-up discussion (expectations, topics/expertise/background, suggested presentations and readings). Why do you go to college? Bring your research presentation proposals (one slide) and ideas to present, read, study, and discuss! Take inspiration from last term's presentations.
10:30 The importance of "selling the ideas and work." Picking the right topics and questions, researching "big issues." Picking the right publication forums (in CS and NLP). The danger of the tyranny of metrics.
10:50 Motivating video: DEK's (Donald E. Knuth's) advice to young students.
10:55 Preparing this term's schedule of talks and the topics to cover.
11:20 Varia, socializing (team builder wanted!), lunch.

Week 2 – Research strategy, evaluation, and course schedule planning

Join us at A502, Faculty of Informatics MU, on February 29th at 10 AM (CET) [or on Zoom].

[9:50 Catering preparation (tea, cakes, ...)]

10:00 Principles of research communication and scientific work: put yourself in your readers' place!
Specifics of CS research and doctoral studies and their evaluation at FI MU: CS conference rankings.

10:30 The importance of "selling the ideas and work," picking the right topics and questions, researching "big issues," and picking the right publication forums (in CS and NLP). The h-index as a measure of impact. The danger of the tyranny of metrics.
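The h-index is the largest number h such that an author has h papers, each cited at least h times. A minimal sketch of the computation follows; the citation counts are made up for illustration. It also hints at why a single number can mislead: in the second example, one very highly cited paper raises the total citation count but does not raise the index.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # counts only decrease from here on, so h cannot grow
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```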

10:45 Class round-up discussion (suggested presentations, readings, and the presentation schedule for the course).

Week 3 – Martin Čermák: Spatial Sharpening of Land Surface Temperature Data

Join us at A502, Faculty of Informatics MU, on March 7th at 10 AM (CET) [or on Zoom].

[Martin Čermák]: Spatial Sharpening of Land Surface Temperature Data 7/3/2024

Remote sensing is an essential tool for efficiently gathering information about the Earth's surface on a global scale. This information is used in areas ranging from agriculture and forest management to mitigating the effects of urban heat islands. Many satellite missions carry different types of sensors measuring different quantities, each with its own advantages and disadvantages. Because of physical constraints, one of the most common trade-offs in remote sensing is between spatial and temporal resolution. This thesis explores existing methods for enhancing the spatial resolution of land surface temperature (LST) images using environmental variables obtained at a finer resolution. Additionally, we formulate and evaluate the hypothesis that predictors chosen to model anthropogenic heat flux improve the accuracy of the overall sharpening model.
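A common family of sharpening methods (for example DisTrad and TsHARP) fits a regression between coarse LST and aggregated fine-resolution predictors, applies it at the fine resolution, and then adds back the coarse-scale residuals. The sketch below shows that generic scheme with NumPy on synthetic data; it is not the thesis's model, and the predictor names `ndvi` and `built` and all numbers are hypothetical. Predictors for anthropogenic heat flux would, under this scheme, simply be further arrays in `predictors_fine`.

```python
import numpy as np


def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a fine grid to a coarse grid by averaging non-overlapping factor x factor blocks."""
    h, w = fine.shape
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling: copy each coarse pixel into a factor x factor block."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def sharpen_lst(lst_coarse: np.ndarray, predictors_fine: list, factor: int) -> np.ndarray:
    """Regression-based LST sharpening with residual correction."""
    # 1. Aggregate each fine predictor to the LST grid and fit LST ~ predictors by least squares.
    design_coarse = np.column_stack(
        [block_mean(p, factor).ravel() for p in predictors_fine] + [np.ones(lst_coarse.size)]
    )
    coef, *_ = np.linalg.lstsq(design_coarse, lst_coarse.ravel(), rcond=None)
    # 2. Apply the coarse-scale relationship to the fine predictors (assumes it holds across scales).
    design_fine = np.stack(list(predictors_fine) + [np.ones_like(predictors_fine[0])], axis=-1)
    lst_fine = design_fine @ coef
    # 3. Residual correction: add back, uniformly per block, what the regression missed,
    #    so the sharpened image averages back to the observed coarse LST.
    residual = lst_coarse - block_mean(lst_fine, factor)
    return lst_fine + upsample(residual, factor)


# Synthetic example; "ndvi" and "built" stand in for real fine-resolution predictors.
rng = np.random.default_rng(42)
ndvi = rng.uniform(0.0, 0.9, size=(8, 8))
built = rng.uniform(0.0, 1.0, size=(8, 8))
true_lst = 305.0 - 12.0 * ndvi + 6.0 * built + 2.0 * ndvi * built  # kelvin
lst_coarse = block_mean(true_lst, 2)  # what a coarser sensor would observe
sharpened = sharpen_lst(lst_coarse, [ndvi, built], factor=2)
print(np.allclose(block_mean(sharpened, 2), lst_coarse))  # True
```

The residual step is what keeps the result consistent with the measurement: whatever the regression cannot explain is not discarded, only spread evenly within each coarse pixel.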

The teacher recommends studying this material from 25/2/2024 to 9/3/2024.

Week 4 – Martin Kňažovič, Jan Franěk: Interactive Learning Tool Utilizing AI

Join us at A502, Faculty of Informatics MU, on March 14th at 10 AM (CET) [or on Zoom].

[Martin Kňažovič, Jan Franěk]: Interactive Learning Tool Utilizing AI 14/3/2024

With recent improvements in large language models (LLMs), their use has become relevant for many everyday tasks. Yet many of these areas do not exploit the full potential of LLMs. One such area, and the focus of our project, is learning. Our main goal is to create a platform that integrates LLMs to help students and speed up their learning. There are obstacles, however; for example, study materials come in different formats (videos, PDFs, and slides), which we want to address. In this presentation, we will analyze available tools for learning enhancement, propose a possible solution, and evaluate its results.

The teacher recommends studying this material from 7/3/2024 to 22/3/2024.

Week 5 – 21. 3. No contact meeting (EACL)

Week 6 – 28. 3. No contact meeting (spa)

Week 7 – 4. 4. No contact meeting (spa)

Week 8 – Katarína Hudcovicová

Join us at A502, Faculty of Informatics MU, on April 11th at 10 AM (CET) and on Zoom.

[Katarína Hudcovicová]: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language 11. 4. 2024

Previous work on natural language inference has examined how well transformer models can reason with text. What remained unaddressed was whether they understand the logical semantics of natural language. The main obstacle is that the previously studied logical problems vary in computational complexity depending on their structure. It is therefore unclear whether lower performance stems from differences in computational complexity or from an inability to comprehend the logical semantics of natural language. The authors chose the model-checking problem because its computational complexity is always PTIME. The results suggest that the form and type of language used significantly affect how well transformer models perform: the models can grasp some logical meanings in natural language but still fall short of learning the underlying model-checking algorithm.
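Model checking here means deciding whether a given sentence is true in a given finite model (a domain of individuals with properties and relations). The toy sketch below is not the authors' dataset or formal language fragments; the model and the sentences are hypothetical. It evaluates English-style quantifiers directly over the model. Each nested quantifier loops over the domain once, so a fixed sentence with k nested quantifiers is checked in O(n^k) steps for a domain of n individuals, which is polynomial in the size of the model.

```python
from typing import Callable, Iterable

Predicate = Callable[[str], bool]


def every(domain: Iterable[str], restrictor: Predicate, scope: Predicate) -> bool:
    """'Every R is S': all individuals satisfying R also satisfy S."""
    return all(scope(x) for x in domain if restrictor(x))


def some(domain: Iterable[str], restrictor: Predicate, scope: Predicate) -> bool:
    """'Some R is S': at least one individual satisfies both R and S."""
    return any(scope(x) for x in domain if restrictor(x))


def no(domain: Iterable[str], restrictor: Predicate, scope: Predicate) -> bool:
    """'No R is S': no individual satisfies both R and S."""
    return not some(domain, restrictor, scope)


# Hypothetical toy model: a domain, two properties, and one binary relation.
domain = {"ann", "bob", "cid"}
artists = {"ann", "bob"}
beekeepers = {"cid"}
admires = {("ann", "cid"), ("bob", "cid")}


def is_artist(x: str) -> bool:
    return x in artists


def is_beekeeper(x: str) -> bool:
    return x in beekeepers


# "Every artist admires some beekeeper."
s1 = every(domain, is_artist, lambda x: some(domain, is_beekeeper, lambda y: (x, y) in admires))
# "Some beekeeper admires every artist."
s2 = some(domain, is_beekeeper, lambda x: every(domain, is_artist, lambda y: (x, y) in admires))
# "No beekeeper is an artist."
s3 = no(domain, is_beekeeper, is_artist)
print(s1, s2, s3)  # True False True
```

The first two sentences use the same words with the quantifiers swapped, yet receive different truth values; telling such cases apart is exactly the kind of logical semantics the talk asks whether transformers can learn.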

The teacher recommends studying this material from 5/4/2024 to 12/4/2024.

Week 9 – Jiří Žák

Join us at A502, Faculty of Informatics MU, on April 18th at 10 AM (CET) (the teacher will connect via Zoom).

[Jiří Žák]: Invoices Recognition with Large Language Models 18. 4.

This research focuses on automated invoice processing. It begins with an analysis of existing methods and systems to build a comprehensive overview. The objective is to develop a pipeline based on established software, together with a corresponding testing framework, and then to refine the pipeline and evaluate its performance within that framework. The findings shed light on the pipeline's performance and on possible improvements.
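The abstract does not say which metric the testing framework uses. One common choice for key-value extraction is field-level precision, recall, and F1 over exact matches after light normalization; the sketch below assumes that choice. The field names, the normalization rule, and the example values are all hypothetical.

```python
from typing import Dict, Tuple


def normalize(value: str) -> str:
    """Hypothetical normalization: drop all whitespace and ignore case."""
    return "".join(value.split()).casefold()


def field_scores(predicted: Dict[str, str], gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Field-level precision, recall and F1 over exact (field, normalized value) matches."""
    pred = {(field, normalize(v)) for field, v in predicted.items()}
    ref = {(field, normalize(v)) for field, v in gold.items()}
    tp = len(pred & ref)  # fields extracted with the correct value
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {"invoice_number": "2024-017", "total": "1 210.00", "currency": "CZK", "due_date": "2024-05-31"}
predicted = {"invoice_number": "2024-017", "total": "1210.00", "currency": "EUR"}
p, r, f = field_scores(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Precision penalizes wrong values (the currency), recall penalizes missing fields (the due date); reporting both makes it visible which kind of error a pipeline change affects.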

The teacher recommends studying this material from 12/4/2024 to 19/4/2024.

Week 10 – Zuzana Pitsmausová

Join us at A502, Faculty of Informatics MU, on April 25th at 10 AM (CET).

[Zuzana Pitsmausová]: Feature Construction 25/4/2024

Feature construction (FC) is a crucial step in the machine learning pipeline, as the quality of features can significantly affect model performance. This presentation introduces feature construction and gives a brief overview of state-of-the-art FC methods. Its primary focus is an experiment conducted with two FC frameworks based on genetic programming (GP): Evolutionary Forest and M3GP.
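To make the idea of feature construction concrete, the sketch below builds new features of the form `x_i op x_j` and keeps the one that best explains the target with a linear fit. This is a deliberately simple stand-in: it enumerates all depth-one expressions exhaustively, whereas GP-based frameworks such as Evolutionary Forest and M3GP evolve deeper expression trees with selection, crossover, and mutation. The operator set, the synthetic data, and the R² criterion are assumptions for illustration.

```python
import itertools

import numpy as np

# Candidate binary operators for constructing new features (a hypothetical, minimal set).
OPERATORS = {"+": np.add, "-": np.subtract, "*": np.multiply}


def r2_linear(feature: np.ndarray, y: np.ndarray) -> float:
    """R^2 of the least-squares fit y ~ a * feature + b."""
    A = np.column_stack([feature, np.ones_like(feature)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def best_constructed_feature(X: np.ndarray, y: np.ndarray, names: list):
    """Exhaustively try 'x_i op x_j' for all pairs i < j and return the best by R^2."""
    best_score, best_expr = -np.inf, None
    pairs = itertools.combinations(range(X.shape[1]), 2)
    for (i, j), (symbol, op) in itertools.product(pairs, OPERATORS.items()):
        score = r2_linear(op(X[:, i], X[:, j]), y)
        if score > best_score:
            best_score, best_expr = score, f"{names[i]} {symbol} {names[j]}"
    return best_expr, best_score


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = 3.0 * X[:, 0] * X[:, 1] + 0.5  # synthetic target driven by an interaction
raw_best = max(r2_linear(X[:, k], y) for k in range(X.shape[1]))
expr, score = best_constructed_feature(X, y, ["x0", "x1", "x2"])
print(expr, round(score, 3), round(raw_best, 3))
```

No single raw feature explains this target linearly, but the constructed product `x0 * x1` does; GP-based FC automates the search for such expressions when they are too deep to enumerate.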

The teacher recommends studying this material from 17/4/2024 to 26/4/2024.

Week 11 – David Čechák

Join us at A502, Faculty of Informatics MU, on May 2nd at 10 AM (CET) [or on Zoom].

[David Čechák]: Prediction of mRNA decay sites 2/5/2024

Messenger RNA (mRNA) decay is a crucial process in regulating gene expression, influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help us understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss predicting mRNA decay sites with a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells and focuses on subsequences within mRNA coding regions, determining whether they contain a decay site. Based on our model, we also analyze the role of single-nucleotide variants in mRNA decay.
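The abstract describes classifying subsequences of coding regions by whether they contain a decay site. One plausible way to prepare such training examples is sketched below; the window width, stride, coordinate convention (0-based, half-open), and the toy transcript are all assumptions, not the talk's actual preprocessing. Tokenization for DeBERTa is not shown.

```python
from typing import List, Tuple


def make_windows(seq: str, decay_sites: List[int], cds: Tuple[int, int],
                 width: int, stride: int) -> List[Tuple[str, int]]:
    """Cut the coding region into fixed-width windows labelled 1 if they contain a decay site."""
    cds_start, cds_end = cds  # half-open [start, end), 0-based (an assumption)
    examples = []
    for start in range(cds_start, cds_end - width + 1, stride):
        end = start + width
        label = int(any(start <= site < end for site in decay_sites))
        examples.append((seq[start:end], label))
    return examples


seq = "GG" + "AUGAAACCCGGGUUUUAA" + "GG"  # toy transcript: 5' UTR, CDS, 3' UTR
windows = make_windows(seq, decay_sites=[10], cds=(2, 20), width=6, stride=6)
print(windows)  # [('AUGAAA', 0), ('CCCGGG', 1), ('UUUUAA', 0)]
```

Windows that only partially fit at the end of the coding region are skipped here; overlapping windows (stride smaller than width) would give each decay site several training contexts.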

The teacher recommends studying this material from 30/4/2024 to 10/5/2024.

Week 12 – no lecture, individual consultations only

Join us at A502, Faculty of Informatics MU, on May 9th at 10 AM (CET) [or on Zoom].

Week 13 – Marek Kadlčík

Join us at A502, Faculty of Informatics MU, on May 16th at 10 AM (CET) [or on Zoom].

[Marek Kadlčík, Michal Štefánik, Petr Sojka]: ICLR presentation breaking news and report 16/5/2024

ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on our paper accepted to and presented in the main track. We will comment on the main research directions in empirical NLP and show you the research highlights that caught our attention during the event.

The teacher recommends studying this material from 9/5/2024 to 24/5/2024.

Tips for readings, discussions, and presentation preparations:

To a disciple who dreaded making mistakes, the Master said: "Those who make no mistakes err most of all: they attempt nothing new." Anthony de Mello, O cestě

[Martin Čermák]: Spatial Sharpening of Land Surface Temperature Data 7/3/2024

Visual Abstract

Visual abstract


Slides

Presentation slides in PowerPoint format

Lecture recording

Spatial Sharpening of Land Surface Temperature Data

Readings

[1] Mohammad Karimi Firozjaei et al.: Satellite-derived land surface temperature spatial sharpening: A comprehensive review on current status and perspectives. European Journal of Remote Sensing, 2022, https://doi.org/10.1080/22797254.2022.2144764

[Martin Kňažovič, Jan Franěk]: Interactive Learning Tool Utilizing AI 14/3/2024

Visual Abstract

Image

Slides

Interactive learning tool utilizing AI

Lecture recording

Interactive Learning Tool Utilizing AI

Readings

TBA

Catering

Modest: water and cakes.

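[Katarína Hudcovicová]: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language 11. 4. 2024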


Slides

Presentation

Lecture Recordings

Identifying the Limits of Transformers when Performing Model-Checking with Natural Language

Readings

Tharindu Madusanka, Ian Pratt-Hartmann, and Riza Theresa Batista-Navarro: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language

Catering

A surprise.

[Jiří Žák]: Invoices Recognition with Large Language Models 18. 4.

Visual Abstract

Visual abstract


Lecture Recordings

Invoices Recognition with Large Language Models

Readings

Šárka Ščavnická et al.: Towards General Document Understanding through Question Answering. https://nlp.fi.muni.cz/raslan/2022/paper17.pdf
Martin Geletka et al.: Information Extraction from Business Documents: A Case Study. https://nlp.fi.muni.cz/raslan/2022/paper18.pdf
Rossum AI: DocILE. https://docile.rossum.ai/
Štěpán Šimša et al.: DocILE Benchmark for Document Information Localization and Extraction. https://arxiv.org/abs/2302.05658

Catering

None :-/.

[Zuzana Pitsmausová]: Feature Construction 25/4/2024

Visual Abstract

Visual abstract

Slides

Feature construction

Readings

Vouk, B., Guid, M., Robnik-Šikonja, M.: Feature construction using explanations of individual predictions. Engineering Applications of Artificial Intelligence 120, 105823 (2023). https://doi.org/10.1016/j.engappai.2023.105823
GP (figure): https://www.researchgate.net/figure/Genetic-programming-Tree-based-crossover_fig4_282769665
Zhang, H., Zhou, A., Zhang, H.: An Evolutionary Forest for Regression. IEEE Transactions on Evolutionary Computation, vol. 26, no. 4, pp. 735-749, Aug. 2022. https://doi.org/10.1109/TEVC.2021.3136667
Zhang, H., Zhou, A., Chen, Q., Xue, B., Zhang, M.: SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2023.3243172, code: https://github.com/hengzhe-zhang/EvolutionaryForest
Batista, J. E., Silva, S.: Comparative study of classifier performance using automatic feature construction by M3GP. 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, pp. 1-8 (2022). https://doi.org/10.1109/CEC55065.2022.9870343
Muñoz, L., Trujillo, L., Silva, S.: M3GP - multiclass classification with GP. In: Genetic Programming, 18th European Conference, EuroGP 2015, Proceedings. LNCS, vol. 9025, pp. 78-91. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16501-1_7, code: https://github.com/jespb/Python-M3GP

Catering

TBA

[David Čechák]: Prediction of mRNA decay sites 2/5/2024

Visual abstract

TBA

Abstract

Messenger RNA (mRNA) decay is crucial in regulating gene expression and influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help to understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss the prediction of mRNA decay sites using a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells, focusing on subsequences within mRNA coding regions to determine the presence of decay sites. Based on our model, we also analyze the role of single nucleotide variants in mRNA decay.

Slides

TBA

Lecture Recordings

Prediction of mRNA decay sites

Readings

Determinants of Functional MicroRNA Targeting: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880601/
miRBind: A Deep Learning Method for miRNA Binding Classification: https://www.mdpi.com/2073-4425/13/12/2323
Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction: https://www.mdpi.com/2079-7737/12/3/369

Catering

Water

[Marek Kadlčík, Michal Štefánik, Petr Sojka]: ICLR presentation breaking news and report 16/5/2024

Abstract

ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on the conference and on two papers presented at ICLR workshops. We will comment on the main research directions in ML and representation learning (e.g., LLM training) and show you the research highlights that caught our attention during the event.

Visual and Photo Gallery

ICLR 2024 FI MU team

From right: Marek Kadlčík, Michal Štefánik and Petr Sojka (photo by Petr Sojka)

Yoshua Bengio workshop keynote speaker and panelist
(photo by Petr Sojka)

AGI panel discussion: the remaining hurdle is the absence of Kahneman's System 2 (thinking slow, e.g., reasoning)
(photo by Petr Sojka)

Yoshua Bengio worrying about uncontrolled AGI
(photo by Petr Sojka)

Young researcher Marek Kadlčík presenting his poster Calc-X
(photo by Petr Sojka)

Young researcher Michal Štefánik presenting his poster to the lady from AI2
(photo by Petr Sojka)

Marek and Michal coping with the storm of interest about Calc-X
(photo by Petr Sojka)

Final minutes of the exhausting conference (6,000 people registered for ICLR)
(photo by Petr Sojka)

Lecture recordings

ICLR presentation breaking news and report

Readings

Catering

Melon, cheeses, grapes, ...

Exemplary catering from ingredients brought by participants (from the autumn run of the course, for inspiration)

Camembert, grapes, strawberries, cookies, mini sausages and salami, and much more that you can see in the photo.

Other

Showing off the wonderful gifts from students to the teacher (a T-shirt and a mug :-).

The story behind both gifts is in the video https://www.youtube.com/watch?v=v678Em6qyzk, and my connection to it is described, for example, here: https://www.fi.muni.cz/app/news?feed_id=1&lang=cs&id=3244&archive=1
