👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization
doc. RNDr. Petr Sojka, Ph.D.

News

  • The course is a regular research seminar in the stated research areas. Every enrolled student must give a presentation during the term on a research topic: a topic of their own interest, their thesis, or a research paper or area. Presentation topics focus primarily (but not exclusively) on those related to the group: machine learning, information retrieval, representation learning, and scientific visualization.
  • There is a discussion group with official course information and a communication channel in addition to the course outline below: watch both frequently!

Topics and Course Outline

Week 1 – Introduction

Join us at A502 Faculty of Informatics MU on February 22nd at 10 AM (CET) [or on Zoom].

  1. [9:50 Catering preparation (tea, cakes, ...)]
  2. 10:00 Class introduction, warm-up round-up discussion (expectation, topics/expertise/background, suggested presentations, and readings). Bring your research presentation offers (one slide) and ideas to present, read, study, and discuss! Take inspiration from last term's presentations.
  3. 10:30 The importance of "selling the ideas and work." Picking the right topics and questions, researching "big issues." Picking the right publication forums (in CS and NLP). The danger of Tyranny of metrics.
  4. 10:50 Motivating video: DEK's advice to young students.
  5. 10:55 Preparation of schedule of talks for this term, topics to cover.
  6. 11:20 Varia, socializing (team builder wanted!), lunch.

Week 2 – Research strategy, evaluation, and course schedule planning

Join us at A502, Faculty of Informatics MU, on February 29th at 10 AM (CET) [or on Zoom].

  • [9:50 Catering preparation (tea, cakes, ...)]
  • 10:00 Principles of research communication and scientific work: Put readers in your place!
    Specifics of CS research and doctoral studies and their evaluation at FI MU: CS conference rankings
  • 10:30 Importance of "selling the ideas and work," picking the right topics and questions, researching "big issues," and picking the right publication forums (in CS and NLP). The h-index as a measure of impact. The danger of Tyranny of metrics.
  • 10:45 Class round-up discussion (suggested presentations, readings, and presentation course schedule).

    Week 3 – Martin Čermák: Spatial Sharpening of Land Surface Temperature Data

    Join us at A502, Faculty of Informatics MU, on March 7th at 10 AM (CET) [or on Zoom].

    Remote sensing is an essential tool in efficiently gathering information about the Earth's surface on a global scale. This information can be used in various areas ranging from agriculture and forest management to battling the effects of heat islands in urban environments. There are many satellite missions with various types of sensors measuring different metrics. All come with their advantages and disadvantages. One of the most common trade-offs in remote sensing is spatial vs. temporal resolution due to physical constraints. This thesis explores the existing ways of enhancing the spatial resolution of land surface temperature (LST) images using environmental variables obtained at finer resolution. Additionally, we form and evaluate a hypothesis that chosen predictors that model anthropogenic heat flux will aid in improving the accuracy of the overall sharpening model.

    The teacher recommends studying this material from 25/2/2024 to 9/3/2024.

    Week 4 – Martin Kňažovič, Jan Franěk: Interactive Learning Tool Utilizing AI

    Join us at A502, Faculty of Informatics MU, on March 14th at 10 AM (CET) [or on Zoom].

    With recent improvements in large language models (LLMs), their use has become relevant to most daily tasks. Yet many of these areas underutilize the full potential of LLMs. One such area is learning, which our project focuses on. Our main goal is to create a platform that helps students and speeds up their learning by integrating LLMs. However, there are obstacles in this area; for example, study materials come in different formats (videos, PDFs, and slides), a problem we want to touch on. In this presentation, we will analyze available tools for learning enhancement, propose a possible solution, and evaluate its results.

    The teacher recommends studying this material from 7/3/2024 to 22/3/2024.

    Week 5 – 21. 3. No contact meeting (EACL)

    Week 6 – 28. 3. No contact meeting (spa)

    Week 7 – 4. 4. No contact meeting (spa) 

    Week 8 – Katarína Hudcovicová

    Join us at A502, Faculty of Informatics MU, on April 11th at 10 AM (CET) and on Zoom.

    Previous work on natural language inference has examined how well transformer models can reason with text. What was still lacking was an investigation of whether they can understand the logical semantics of natural language. This is mainly because the previously studied logical problems can be, depending on their structure, more or less computationally complex. It is therefore unclear whether lower performance is due to the difference in computational complexity or to an inability to comprehend the logical semantics of natural language. The authors chose the model-checking problem, as its computational complexity is always in PTIME. The results suggest that the form and type of language used significantly affect how well transformer models perform. The models can grasp some logical meaning in natural language but still fall short of learning the underlying algorithm of model-checking problems.

    The teacher recommends studying this material from 5/4/2024 to 12/4/2024.

    Week 9 – Jiří Žák 

    Join us at A502, Faculty of Informatics MU, on April 18th at 10 AM (CET) (the teacher will connect via Zoom).

    This research centers on automated invoice processing, entailing an analysis of existing methods and systems to construct a comprehensive overview. The objective is to develop a pipeline based on established software and to construct a corresponding testing framework. Leveraging this foundation, the aim is to refine the pipeline and evaluate its efficacy within the established testing framework. The subsequent findings shed light on the performance and potential enhancements of the automated invoice processing pipeline.

    The teacher recommends studying this material from 12/4/2024 to 19/4/2024.

    Week 10 – Zuzana Pitsmausová

    Join us at A502, Faculty of Informatics MU, on April 25th at 10 AM (CET).

    Feature construction (FC) is a crucial step in the machine learning pipeline, as the quality of features can significantly impact the model's performance. This presentation aims to acquaint listeners with feature construction and briefly overview the state-of-the-art FC methods. The primary focus of the presentation will be an experiment that was conducted using two FC frameworks based on genetic programming (GP) – Evolutionary Forest and M3GP.

    The teacher recommends studying this material from 17/4/2024 to 26/4/2024.

    Week 11 – David Čechák

    Join us at A502, Faculty of Informatics MU, on May 2nd at 10 AM (CET) [or on Zoom].

    Messenger RNA (mRNA) decay is a crucial process in regulating gene expression, influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help to understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss the prediction of mRNA decay sites using a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells, focusing on subsequences within mRNA coding regions to determine the presence of decay sites. Based on our model, we also analyze the role of single nucleotide variants in mRNA decay.

    The teacher recommends studying this material from 30/4/2024 to 10/5/2024.

    Week 12 – no lecture, individual consultations only

    Join us at A502, Faculty of Informatics MU, on May 9th at 10 AM (CET) [or on Zoom].

    Week 13 – Marek Kadlčík

    Join us at A502, Faculty of Informatics MU, on May 16th at 10 AM (CET) [or on Zoom].

    ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on the acceptance of our paper presented in the main track. We will comment on the main research directions in empirical NLP and show you the highlights of the research that caught our attention during the event.

    The teacher recommends studying this material from 9/5/2024 to 24/5/2024.

    Tips for readings, discussions, and presentation preparations:

    1. Top2Vec towardsdatascience.com/top2vec-new-way-of-topic-modelling 
    2. How to Speak by Patrick Winston (YouTube video)

    To a disciple who was terrified of making mistakes, the Master said: "Those who make no mistakes are making the biggest mistake of all: they are attempting nothing new." Anthony de Mello: O cestě

    [Martin Čermák]: Spatial Sharpening of Land Surface Temperature Data 7/3/2024

    Visual Abstract

    Abstract

    Remote sensing is an essential tool in efficiently gathering information about the Earth's surface on a global scale. This information can be used in various areas ranging from agriculture and forest management to battling the effects of heat islands in urban environments.

    There are many satellite missions with various types of sensors measuring different metrics. All come with their advantages and disadvantages. One of the most common trade-offs in remote sensing is spatial vs. temporal resolution due to physical constraints.

    This thesis explores the existing ways of enhancing the spatial resolution of land surface temperature (LST) images using environmental variables obtained at finer resolution. Additionally, we form and evaluate a hypothesis that chosen predictors that model anthropogenic heat flux will aid in improving the accuracy of the overall sharpening model.

    Slides

    Lecture recording

    Readings

    [1] Firozjaei Mohammad Karimi et al., Satellite-derived land surface temperature spatial sharpening: A comprehensive review on current status and perspectives. European Journal of Remote Sensing, 2022, https://doi.org/10.1080/22797254.2022.2144764

    [Martin Kňažovič, Jan Franěk]: Interactive Learning Tool Utilizing AI 14/3/2024

    Abstract

    With recent improvements in large language models (LLMs), their use has become relevant to most daily tasks. Yet many of these areas underutilize the full potential of LLMs. One such area is learning, which our project focuses on.

    Our main goal is to create a platform that helps students and speeds up their learning by integrating LLMs. However, there are obstacles in this area; for example, study materials come in different formats (videos, PDFs, and slides), a problem we want to touch on.

    In this presentation, we will analyze available tools for learning enhancement, propose a possible solution, and evaluate its results.

    Visual Abstract

    Slides

    Lecture recording

    Readings

    1. TBA

    Catering

    Modest: water and cakes.

    [Katarína Hudcovicová]: Identifying the Limits of Transformers when Performing Model-Checking with Natural Language 11/4/2024


    Abstract

    Previous work on natural language inference has examined how well transformer models can reason with text. What was still lacking was an investigation of whether they can understand the logical semantics of natural language. This is mainly because the previously studied logical problems can be, depending on their structure, more or less computationally complex. It is therefore unclear whether lower performance is due to the difference in computational complexity or to an inability to comprehend the logical semantics of natural language. The authors chose the model-checking problem, as its computational complexity is always in PTIME. The results suggest that the form and type of language used significantly affect how well transformer models perform. The models can grasp some logical meaning in natural language but still fall short of learning the underlying algorithm of model-checking problems.

    Slides

    Lecture Recordings

    Readings

    1. Tharindu Madusanka, Ian Pratt-Hartmann, and Riza Theresa Batista-Navarro: Identifying the limits of transformers when performing model-checking with natural language

    Catering

    Surprise.

    [Jiří Žák]: Invoices Recognition with Large Language Models 18/4/2024

    Visual Abstract

    Abstract

    This research centers on automated invoice processing, entailing an analysis of existing methods and systems to construct a comprehensive overview. The objective is to develop a pipeline based on established software and to construct a corresponding testing framework. Leveraging this foundation, the aim is to refine the pipeline and evaluate its efficacy within the established testing framework. The subsequent findings shed light on the performance and potential enhancements of the automated invoice processing pipeline.

    Lecture Recordings

    Readings

    1. Šárka Ščavnická et al.: Towards General Document Understanding through Question Answering https://nlp.fi.muni.cz/raslan/2022/paper17.pdf
    2. Martin Geletka et al.: Information Extraction from Business Documents: A Case Study https://nlp.fi.muni.cz/raslan/2022/paper18.pdf
    3. Rossum AI: DocILE https://docile.rossum.ai/
    4. Štěpán Šimša et al.: DocILE Benchmark for Document Information Localization and Extraction https://arxiv.org/abs/2302.05658

    Catering

    None :-/.

    [Zuzana Pitsmausová]: Feature Construction 25/4/2024

    Abstract

    Feature construction (FC) is a crucial step in the machine learning pipeline, as the quality of features can significantly impact the model's performance. This presentation aims to acquaint listeners with feature construction and briefly overview the state-of-the-art FC methods.

    The primary focus of the presentation will be an experiment that was conducted using two FC frameworks based on genetic programming (GP) – Evolutionary Forest and M3GP.

    Visual Abstract

    Slides

    Feature construction
    PDF to download

    Readings

    1. Vouk, B., Guid, M., Robnik-Sikonja, M.: Feature construction using explanations of individual predictions. Engineering Applications of Artificial Intelligence 120, 105823 (2023) https://doi.org/10.1016/j.engappai.2023.105823
    2. GP (figure): https://www.researchgate.net/figure/Genetic-programming-Tree-based-crossover_fig4_282769665 
    3. H. Zhang, A. Zhou and H. Zhang: An Evolutionary Forest for Regression. IEEE Transactions on Evolutionary Computation, vol. 26, no. 4, pp. 735-749, Aug. 2022, https://doi.org/10.1109/TEVC.2021.3136667  
    4. H. Zhang, A. Zhou, Q. Chen, B. Xue, and M. Zhang: SR-Forest: A Genetic Programming based Heterogeneous Ensemble Learning Method. IEEE Transactions on Evolutionary Computation, https://doi.org/10.1109/TEVC.2023.3243172  https://github.com/hengzhe-zhang/EvolutionaryForest 
    5. J. E. Batista and S. Silva: Comparative study of classifier performance using automatic feature construction by M3GP. 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 2022, pp. 1-8, https://doi.org/10.1109/CEC55065.2022.9870343 
    6. Muñoz, L., Trujillo, L., & Silva, S. (2015). M3GP - multiclass classification with GP. In Genetic Programming - 18th European Conference, EuroGP 2015, Proceedings (Vol. 9025, pp. 78-91). LNCS Vol. 9025. Springer, Cham.  https://doi.org/10.1007/978-3-319-16501-1_7  

    Catering

    TBA

    [David Čechák]: Prediction of mRNA decay sites 2/5/2024

    Visual abstract

    TBA

    Abstract

    Messenger RNA (mRNA) decay is crucial in regulating gene expression and influencing cellular functions and organismal phenotypes. Precise identification of mRNA decay sites will help to understand the post-transcriptional control mechanisms that affect mRNA stability. In this presentation, we will discuss the prediction of mRNA decay sites using a DeBERTa transformer model. Our model is trained on sequences derived from direct RNA sequencing of polyadenylated RNA from HeLa cells, focusing on subsequences within mRNA coding regions to determine the presence of decay sites. Based on our model, we also analyze the role of single nucleotide variants in mRNA decay.

    Slides

    TBA

    Lecture Recordings


    Readings

    1. Determinants of Functional MicroRNA Targeting
      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880601/
    2. miRBind: A Deep Learning Method for miRNA Binding Classification
      https://www.mdpi.com/2073-4425/13/12/2323
    3. Using Attribution Sequence Alignment to Interpret Deep Learning Models for miRNA Binding Site Prediction
      https://www.mdpi.com/2079-7737/12/3/369

    Catering

    Water 

    [Marek Kadlčík, Michal Štefánik, Petr Sojka]: ICLR presentation breaking news and report 16/5/2024

    Abstract

    ICLR is among the most impactful ML conferences (A* in the CORE ranking). In this presentation, we will report on the conference and on two papers presented at the ICLR workshops. We will comment on the main research directions in ML and learning representations (i.e., LLM training) and show you the research highlights that caught our attention during the event.

    Visual and Photo Gallery

    ICLR 2024 FI MU team
    From right: Marek Kadlčík, Michal Štefánik and Petr Sojka (photo by Petr Sojka)

    Lecture recordings

    Readings

    Catering

    Melon, cheeses, grapes, ...

    Exemplary catering made from ingredients brought by participants (from the autumn run of the course, for inspiration):
    camembert, grapes, strawberries, cookies, mini sausages and salami, and much more that you can see in the photo.


    Other

    Showing off the wonderful gifts the students gave the teacher (a T-shirt and a mug :-).
    The story behind both gifts is in the video https://www.youtube.com/watch?v=v678Em6qyzk and my relationship to it is described, for example, here: https://www.fi.muni.cz/app/news?feed_id=1&lang=cs&id=3244&archive=1