Interactive Syllabus
News
- The course is a regular research seminar. Every enrolled student is required to give a presentation on a research topic during the term (a topic of interest, their thesis, or a research paper or area). Presentation topics focus primarily (but not necessarily) on areas related to the Math Information Retrieval@FI group: machine learning, information retrieval, representation learning, and scientific visualization.
- In addition to the course outline below, there is a discussion group that serves as the official course information and communication channel: watch both frequently!
Topics and Course Outline
Week 1 – Introduction
Join us at A502, Faculty of Informatics MU, on February 22nd at 10 AM (CET) [or on Zoom].
- [9:50 Catering preparation (tea, cakes, ...)]
- 10:00 Class introduction and warm-up discussion (expectations, topics, expertise and background, suggested presentations and readings). Why do you go to college? Bring your research presentation proposals (one slide each) and ideas to present, read, study, and discuss! Take inspiration from last term's presentations.
- 10:30 The importance of "selling" ideas and work. Picking the right topics and questions, researching "big issues." Picking the right publication forums (in CS and NLP). The danger of the tyranny of metrics.
- 10:50 Motivating video: DEK's advice to young students.
- 10:55 Preparing the schedule of talks for this term and the topics to cover.
- 11:20 Varia, socializing (team builder wanted!), lunch.
Week 2 – Research strategy, evaluation, and course schedule planning
Join us at A502, Faculty of Informatics MU, on February 29th at 10 AM (CET) [or on Zoom].
Specifics of CS research and doctoral studies and their evaluation at FI MU: CS conference rankings
Week 3 – Martin Čermák: Spatial Sharpening of Land Surface Temperature Data
Join us at A502, Faculty of Informatics MU, on March 7th at 10 AM (CET) [or on Zoom].
Remote sensing is an essential tool in efficiently gathering information about the Earth's surface on a global scale. This information can be used in various areas ranging from agriculture and forest management to battling the effects of heat islands in urban environments. There are many satellite missions with various types of sensors measuring different metrics. All come with their advantages and disadvantages. One of the most common trade-offs in remote sensing is spatial vs. temporal resolution due to physical constraints. This thesis explores the existing ways of enhancing the spatial resolution of land surface temperature (LST) images using environmental variables obtained at finer resolution. Additionally, we form and evaluate a hypothesis that chosen predictors that model anthropogenic heat flux will aid in improving the accuracy of the overall sharpening model.
Week 4 – Martin Kňažovič, Jan Franěk: Interactive Learning Tool Utilizing AI
Join us at A502, Faculty of Informatics MU, on March 14th at 10 AM (CET) [or on Zoom].
With recent improvements in large language models (LLMs), their use has become relevant for most daily tasks. Yet many of these areas underutilize the full potential of LLMs. One such area is learning, which our project focuses on. Our main goal is to create a platform that integrates LLMs to help students and speed up their learning process. However, there are obstacles in this area; for example, study materials come in different formats (videos, PDFs, and slides), which we want to address. In this presentation, we will analyze available tools for learning enhancement, propose a possible solution, and evaluate its results.
Week 5 – 21. 3. No contact meeting (EACL)
Week 6 – 28. 3. No contact meeting (spa)
Week 7 – 4. 4. No contact meeting (spa)
Week 8 – Katarína Hudcovicová: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language
Join us at A502, Faculty of Informatics MU, on April 11th at 10 AM (CEST) and on Zoom.
Previous works on natural language inference have examined how well transformer models can reason with text. What remained open was whether they can understand the logical semantics of natural language. This is mainly because the previously studied logical problems can be more or less computationally complex, depending on their structure. It is therefore unclear whether lower performance stems from differences in computational complexity or from an inability to comprehend the logical semantics of natural language. The authors chose the model-checking problem because its computational complexity is always in PTIME. The results suggest that the form and type of language used significantly affect how well transformer models perform: the models can grasp some logical meanings in natural language but still fall short of learning the underlying algorithm of model-checking problems.
Week 9 – Jiří Žák: Invoices Recognition with Large Language Models
Join us at A502, Faculty of Informatics MU, on April 18th at 10 AM (CEST) (the teacher will connect via Zoom).
This research centers on automated invoice processing, starting with an analysis of existing methods and systems to build a comprehensive overview. The objective is to develop a pipeline based on established software together with a corresponding testing framework. Building on this foundation, the aim is to refine the pipeline and evaluate its efficacy within the testing framework. The findings shed light on the pipeline's performance and on potential enhancements.
Week 10 – Zuzana Pitsmausová: Feature Construction
Join us at A502, Faculty of Informatics MU, on April 25th at 10 AM (CEST).
Feature construction (FC) is a crucial step in the machine learning pipeline, as the quality of features can significantly impact the model's performance. This presentation aims to acquaint listeners with feature construction and to give a brief overview of state-of-the-art FC methods. The primary focus of the presentation will be an experiment conducted using two FC frameworks based on genetic programming (GP): Evolutionary Forest and M3GP.
Week 11 – David Čechák: Prediction of mRNA Decay Sites
Join us at A502, Faculty of Informatics MU, on May 2nd at 10 AM (CEST) [or on Zoom].
Messenger RNA (mRNA) decay is a crucial process in regulating gene expression, influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help us understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss the prediction of mRNA decay sites using a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells, focusing on subsequences within mRNA coding regions to determine the presence of decay sites. Based on our model, we also analyze the role of single nucleotide variants in mRNA decay.
Week 12 – no lecture, individual consultations only
Join us at A502, Faculty of Informatics MU, on May 9th at 10 AM (CEST) [or on Zoom].
Week 13 – Marek Kadlčík
Join us at A502, Faculty of Informatics MU, on May 16th at 10 AM (CEST) [or on Zoom].
ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on the acceptance of our paper in the main track. We will comment on the main research directions in empirical NLP and show you the research highlights that caught our attention during the event.
Tips for readings, discussions, and presentation preparations:
- Top2Vec towardsdatascience.com/top2vec-new-way-of-topic-modelling
- How to Speak by Patrick Winston (YouTube video)
To a disciple who was afraid of making mistakes, the Master said: "Those who make no mistakes err most of all: they never attempt anything new." Anthony de Mello: O cestě
[Martin Čermák]: Spatial Sharpening of Land Surface Temperature Data 7/3/2024
Visual Abstract
Abstract
Remote sensing is an essential tool in efficiently gathering information about the Earth's surface on a global scale. This information can be used in various areas ranging from agriculture and forest management to battling the effects of heat islands in urban environments.
There are many satellite missions with various types of sensors measuring different metrics. All come with their advantages and disadvantages. One of the most common trade-offs in remote sensing is spatial vs. temporal resolution due to physical constraints.
This thesis explores the existing ways of enhancing the spatial resolution of land surface temperature (LST) images using environmental variables obtained at finer resolution. Additionally, we form and evaluate a hypothesis that chosen predictors that model anthropogenic heat flux will aid in improving the accuracy of the overall sharpening model.
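The abstract does not commit to a specific sharpening method. As a hedged illustration of the general idea of regression-based sharpening (in the spirit of methods such as TsHARP, which regresses LST on NDVI), the sketch below fits a linear relationship between coarse LST and a single fine-resolution predictor, applies it at the fine scale, and adds back the coarse residuals. The choice of NDVI as the predictor, the single linear model, the synthetic data, and the function names are assumptions; the thesis works with several environmental variables, including ones modelling anthropogenic heat flux.

```python
import numpy as np


def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a fine grid to a coarse one by averaging non-overlapping factor x factor blocks."""
    h, w = fine.shape
    if h % factor or w % factor:
        raise ValueError("fine grid must tile exactly into coarse pixels")
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def sharpen_lst(lst_coarse: np.ndarray, predictor_fine: np.ndarray, factor: int) -> np.ndarray:
    """Regression-based LST sharpening with one predictor and residual correction (sketch only)."""
    predictor_coarse = block_mean(predictor_fine, factor)
    # 1. Fit LST = a + b * predictor on the coarse pixels (ordinary least squares).
    b, a = np.polyfit(predictor_coarse.ravel(), lst_coarse.ravel(), deg=1)
    # 2. Apply the coarse-scale relationship to the fine-resolution predictor.
    lst_fine = a + b * predictor_fine
    # 3. Residuals hold what the predictor cannot explain; spread each one evenly over its block,
    #    so every coarse pixel of the result averages back to the observed coarse LST.
    residual = lst_coarse - (a + b * predictor_coarse)
    return lst_fine + np.kron(residual, np.ones((factor, factor)))


# Synthetic demo: an 8x8 "NDVI" grid and 2x2 coarse LST (kelvin), cooler where vegetation is denser.
rng = np.random.default_rng(0)
ndvi_fine = rng.uniform(0.1, 0.9, size=(8, 8))
lst_coarse = 320.0 - 25.0 * block_mean(ndvi_fine, 4) + rng.normal(0.0, 0.5, size=(2, 2))
lst_fine = sharpen_lst(lst_coarse, ndvi_fine, factor=4)
print(lst_fine.shape)  # (8, 8)
```

The residual step is a design choice that keeps the sharpened image consistent with the original observation: averaging the fine result over each coarse pixel reproduces the measured coarse LST exactly.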
Slides
Lecture recording
Readings
[1] Firozjaei Mohammad Karimi et al., Satellite-derived land surface temperature spatial sharpening: A comprehensive review on current status and perspectives, European Journal of Remote Sensing, 2022, https://doi.org/10.1080/22797254.2022.2144764
[Martin Kňažovič, Jan Franěk]: Interactive Learning Tool Utilizing AI 14/3/2024
Abstract
With recent improvements in large language models (LLMs), their use has become relevant for most daily tasks. Yet many of these areas underutilize the full potential of LLMs. One such area is learning, which our project focuses on.
Our main goal is to create a platform that integrates LLMs to help students and speed up their learning process. However, there are obstacles in this area; for example, study materials come in different formats (videos, PDFs, and slides), which we want to address.
In this presentation, we will analyze available tools for learning enhancement, propose a possible solution, and evaluate its results.
Visual Abstract
Slides
Lecture recording
Readings
- TBA
Catering
Modest: water and cakes.
[Katarína Hudcovicová]: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language 11/4/2024
Abstract
Previous works on natural language inference have examined how well transformer models can reason with text. What remained open was whether they can understand the logical semantics of natural language. This is mainly because the previously studied logical problems can be more or less computationally complex, depending on their structure. It is therefore unclear whether lower performance stems from differences in computational complexity or from an inability to comprehend the logical semantics of natural language. The authors chose the model-checking problem because its computational complexity is always in PTIME. The results suggest that the form and type of language used significantly affect how well transformer models perform: the models can grasp some logical meanings in natural language but still fall short of learning the underlying algorithm of model-checking problems.
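To make the PTIME point concrete, here is a toy model checker for a tiny hypothetical fragment of English ("every/some/no NOUN is [not] ADJECTIVE"). Deciding each sentence takes a single pass over the finite model, i.e. time linear in the model size. The fragment, the model encoding (entity mapped to its set of unary predicates), and the names are assumptions for illustration; the fragments studied in the presented paper are considerably richer.

```python
from typing import Dict, Set

# Hypothetical finite model: each entity maps to the set of unary predicates true of it.
Model = Dict[str, Set[str]]


def check(sentence: str, model: Model) -> bool:
    """Decide '<every|some|no> <noun> is [not] <adjective>' in a finite model (toy fragment)."""
    words = sentence.lower().rstrip(".").split()
    if len(words) not in (4, 5) or words[2] != "is" or (len(words) == 5 and words[3] != "not"):
        raise ValueError(f"unsupported sentence: {sentence!r}")
    quantifier, noun, adjective = words[0], words[1], words[-1]
    negated = len(words) == 5
    # One pass over the domain: truth of the (possibly negated) adjective for each entity of the noun's kind.
    values = [(adjective in props) != negated for props in model.values() if noun in props]
    if quantifier == "every":
        return all(values)  # vacuously true when no entity has the noun
    if quantifier == "some":
        return any(values)
    if quantifier == "no":
        return not any(values)
    raise ValueError(f"unsupported quantifier: {quantifier!r}")


model = {
    "ann": {"artist", "happy"},
    "bob": {"artist"},
    "cid": {"beekeeper", "happy"},
}
print(check("Every artist is happy.", model))     # False: bob is not happy
print(check("Some artist is not happy.", model))  # True
print(check("No beekeeper is happy.", model))     # False
```

A transformer that has truly learned this procedure should generalize to any model size; the paper's finding is that models pick up some of the semantics without reliably learning the algorithm.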
Slides
Lecture Recordings
Readings
Catering
Surprise.
[Jiří Žák]: Invoices Recognition with Large Language Models 18/4/2024
Visual Abstract
Abstract
This research centers on automated invoice processing, starting with an analysis of existing methods and systems to build a comprehensive overview. The objective is to develop a pipeline based on established software together with a corresponding testing framework. Building on this foundation, the aim is to refine the pipeline and evaluate its efficacy within the testing framework. The findings shed light on the pipeline's performance and on potential enhancements.
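The abstract mentions a testing framework without naming its metrics. One common way to score information-extraction pipelines is field-level precision, recall, and F1. The sketch below assumes exact matching after light normalization (case folding, whitespace collapsing) and uses hypothetical field names (`invoice_number`, `total_amount`, `vendor_name`); it is not the framework used in the presented work.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict per invoice, field name -> extracted string value.
Fields = Dict[str, str]


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(value.lower().split())


def field_scores(predictions: List[Fields], gold: List[Fields]) -> Tuple[float, float, float]:
    """Micro-averaged field-level precision, recall, and F1 over a set of invoices."""
    if len(predictions) != len(gold):
        raise ValueError("need exactly one prediction per gold invoice")
    tp = fp = fn = 0
    for pred, ref in zip(predictions, gold):
        correct = {name for name, value in pred.items()
                   if name in ref and normalize(value) == normalize(ref[name])}
        tp += len(correct)
        fp += len(pred) - len(correct)  # extracted, but wrong or absent from the gold data
        fn += len(ref) - len(correct)   # gold fields that were missed or extracted wrongly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{"invoice_number": "INV-001", "total_amount": "1200.00", "vendor_name": "ACME s.r.o."}]
pred = [{"invoice_number": "inv-001", "total_amount": "1250.00"}]
print(field_scores(pred, gold))
```

Here a wrongly extracted value counts both as a false positive and as a false negative, which penalizes confident mistakes more than omissions; whether that is desirable depends on how the pipeline's output is consumed.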
Lecture Recordings
Readings
- Šárka Ščavnická et al.: Towards General Document Understanding through Question Answering https://nlp.fi.muni.cz/raslan/2022/paper17.pdf
- Martin Geletka et al.: Information Extraction from Business Documents: A Case Study https://nlp.fi.muni.cz/raslan/2022/paper18.pdf
- Rossum AI: DocILE https://docile.rossum.ai/
- Štěpán Šimša et al.: DocILE Benchmark for Document Information Localization and Extraction https://arxiv.org/abs/2302.05658
Catering
None :-/.
[Zuzana Pitsmausová]: Feature Construction 25/4/2024
Abstract
Feature construction (FC) is a crucial step in the machine learning pipeline, as the quality of features can significantly impact the model's performance. This presentation aims to acquaint listeners with feature construction and to give a brief overview of state-of-the-art FC methods.
The primary focus of the presentation will be an experiment that was conducted using two FC frameworks based on genetic programming (GP) – Evolutionary Forest and M3GP.
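To illustrate what "constructing a feature" means, the sketch below uses a deliberately simple, swapped-in technique: an exhaustive search over pairwise arithmetic combinations of the input columns, ranked by absolute Pearson correlation with the target. This is not genetic programming; Evolutionary Forest and M3GP instead evolve whole expression trees with crossover and mutation. The function name, the operator set, the protected-division convention (returning 0 for a zero denominator), and the synthetic data are all assumptions.

```python
import itertools

import numpy as np


def construct_features(X: np.ndarray, names, y: np.ndarray, top_k: int = 3):
    """Score every pairwise arithmetic combination of columns by |Pearson r| with the target."""
    ops = {
        "+": np.add,
        "-": np.subtract,
        "*": np.multiply,
        # Protected division (assumed convention): 0 wherever the denominator is 0.
        "/": lambda a, b: np.divide(a, b, out=np.zeros_like(a, dtype=float), where=b != 0),
    }
    candidates = []
    for i, j in itertools.permutations(range(X.shape[1]), 2):
        for symbol, op in ops.items():
            if symbol in "+*" and i > j:
                continue  # commutative operator: the mirrored pair was already scored
            values = op(X[:, i], X[:, j])
            if np.std(values) == 0:
                continue  # a constant feature carries no signal
            r = np.corrcoef(values, y)[0, 1]
            if np.isfinite(r):
                candidates.append((abs(r), f"{names[i]} {symbol} {names[j]}", values))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:top_k]


# Synthetic demo: the target depends on an interaction no single raw column captures.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X[:, 0] * X[:, 1] + rng.normal(0.0, 0.05, size=200)
for score, name, _ in construct_features(X, ["x0", "x1", "x2"], y):
    print(f"{name}: |r| = {score:.3f}")
```

The exhaustive search grows quadratically in the number of columns and cannot nest operations, which is exactly the limitation GP-based frameworks address by searching over deeper expressions.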
Visual Abstract
Slides
Readings
- Vouk, B., Guid, M., Robnik-Sikonja, M.: Feature construction using explanations of individual predictions. Engineering Applications of Artificial Intelligence 120, 105823 (2023) https://doi.org/10.1016/j.engappai.2023.105823
- GP (figure): https://www.researchgate.net/figure/Genetic-programming-Tree-based-crossover_fig4_282769665
- H. Zhang, A. Zhou and H. Zhang: An Evolutionary Forest for Regression. IEEE Transactions on Evolutionary Computation, vol. 26, no. 4, pp. 735-749, Aug. 2022, https://doi.org/10.1109/TEVC.2021.3136667
- H. Zhang, A. Zhou, Q. Chen, B. Xue, and M. Zhang: SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method. IEEE Transactions on Evolutionary Computation, https://doi.org/10.1109/TEVC.2023.3243172 https://github.com/hengzhe-zhang/EvolutionaryForest
- J. E. Batista and S. Silva: Comparative study of classifier performance using automatic feature construction by M3GP. 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 2022, pp. 1-8, https://doi.org/10.1109/CEC55065.2022.9870343
- Muñoz, L., Trujillo, L., & Silva, S. (2015). M3GP - multiclass classification with GP. In Genetic Programming - 18th European Conference, EuroGP 2015, Proceedings (Vol. 9025, pp. 78-91). LNCS Vol. 9025. Springer, Cham. https://doi.org/10.1007/978-3-319-16501-1_7 https://github.com/jespb/Python-M3GP
Catering
TBA
[David Čechák]: Prediction of mRNA decay sites 2/5/2024
Visual abstract
TBA
Abstract
Messenger RNA (mRNA) decay is crucial in regulating gene expression and influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help us understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss the prediction of mRNA decay sites using a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells, focusing on subsequences within mRNA coding regions to determine the presence of decay sites. Based on our model, we also analyze the role of single nucleotide variants in mRNA decay.
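The abstract does not detail the data preparation, so the following is only a hedged sketch of how such a setup might look. It shows three steps: cutting fixed-length windows from a coding sequence and labelling each by whether it contains a known decay site; overlapping k-mer tokenization, as used by DNABERT-style models (whether the presented DeBERTa model tokenizes this way is an assumption); and enumerating all single nucleotide variants of a sequence for in-silico mutagenesis, whose effects a trained model could then score. The window length, stride, k, the toy sequence, and all function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

NUCLEOTIDES = "ACGU"


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (one common tokenization for nucleotide models)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def labeled_windows(seq: str, decay_sites: Sequence[int], window: int, stride: int) -> List[Tuple[str, int]]:
    """Cut fixed-length windows; label 1 if any (0-based) decay-site position falls inside."""
    examples = []
    for start in range(0, len(seq) - window + 1, stride):
        end = start + window
        label = int(any(start <= pos < end for pos in decay_sites))
        examples.append((seq[start:end], label))
    return examples


def single_nucleotide_variants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, alternative base, mutated sequence) for every possible SNV."""
    for pos, ref in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield pos, alt, seq[:pos] + alt + seq[pos + 1:]


cds = "AUGGCUACGUUAGCAUCGGAUCCA"  # toy 24-nt coding sequence
for window_seq, label in labeled_windows(cds, decay_sites=[15], window=12, stride=6):
    print(window_seq, label, kmer_tokens(window_seq)[:3])
```

With a trained classifier, comparing its prediction for the reference window against each mutated window from `single_nucleotide_variants` would give a simple per-position estimate of how a variant might affect decay-site presence.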
Slides
TBA
Lecture Recordings
Readings
- Determinants of Functional MicroRNA Targeting https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880601/
- miRBind: A Deep Learning Method for miRNA Binding Classification https://www.mdpi.com/2073-4425/13/12/2323
- Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction https://www.mdpi.com/2079-7737/12/3/369
Catering
Water
[Marek Kadlčík, Michal Štefánik, Petr Sojka]: ICLR presentation breaking news and report 16/5/2024
Abstract
ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on the conference and on two papers presented at the ICLR workshops. We will comment on the main research directions in ML and representation learning (e.g., LLM training) and show you the research highlights that caught our attention during the event.
Visual and Photo Gallery
Yoshua Bengio workshop keynote speaker and panelist
(photo by Petr Sojka)
Yoshua Bengio worrying about uncontrolled AGI
(photo by Petr Sojka)
Young researcher Marek Kadlčík presenting his poster Calc-X
(photo by Petr Sojka)
Young researcher Michal Štefánik presenting his poster to the lady from AI2
(photo by Petr Sojka)
Marek and Michal coping with the storm of interest in Calc-X
(photo by Petr Sojka)
Final minutes of the exhausting conference (6000 people registered for ICLR)
(photo by Petr Sojka)
Lecture recordings
Readings
- https://iclr.cc/
- https://openreview.net/forum?id=cFky5GcER_J
- https://openreview.net/forum?id=zBh79GuLNO
Catering
Watermelon, cheeses, grapes, ...