👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization

doc. RNDr. Petr Sojka, Ph.D.

News

The course is a regular research seminar in the stated research areas. Every enrolled student must give a presentation on a research topic during the term. Presentation topics focus primarily (but not exclusively) on those of the Math Information Retrieval@FI group: machine learning, information retrieval, representation learning, and scientific visualization.
There is a discussion group with official course information and a communication channel, in addition to the course outline below: check both frequently!

Topics and Course Outline

Week 1

Join us in room A502 at the Faculty of Informatics MU on September 21st at 10 AM (CET).

10:00 Class introduction, warm-up round of discussion (expectations, topics/expertise/background, suggested presentations, and readings). Why do you go to college? Bring your research presentation offers and ideas to present, read, study, and discuss!
10:30 Principles of research communication and scientific work, Reading a scientific paper. Put readers in your place!
Specifics of CS research and doctoral studies and their evaluation at FI MU: CS conference rankings
10:40 Importance of "selling the ideas and work", picking the right topics and questions, researching "big issues", picking the right publication forums (in CS and NLP), and h-index as a measure of impact. The danger of the tyranny of metrics.
10:50 Motivating video: DEK's advice to young students.
10:55 Preparation of schedule of talks for this term, topics to cover.
11:20 Varia, socializing (team builder wanted!), lunch.

Week 2 – no meeting (state holiday)

Week 3 – Michal Štefánik: Can In-context Learners Learn a New Reasoning Concept from Demonstrations?

Join us in room A502 at FI MU on October 5th at 10 AM (CET) [or on Zoom].

[Michal Štefánik]: Can In-context Learners Learn a new Reasoning Concept from Demonstrations? 5. 10. 2023

The teacher recommends studying from 24. 9. 2023 to 7. 10. 2023.

Week 4 – Denisa Šrámková: Interpretability of Deep Learning

Canceled due to the speaker's illness. Will be presented next week together with Adam's talk. ~~Join us in room A502 at FI MU on October 12th at 9:15 AM! (CET) [or on Zoom.]~~

[Denisa Šrámková]: Interpretability of Binary Protein Knot Classification 19. 10. 2023

Proteins with knotted backbones are an exceedingly rare phenomenon, and the mechanisms governing knot formation and its functional implications remain poorly understood. We fine-tuned the ProtBert-BFD Transformer to classify proteins as either knotted or unknotted solely from their primary structure. As a training set, we used a collection of proteins from selected protein families whose 3D structures were predicted by AlphaFold2. The knotted status of proteins was assigned using Topoly (a polymer topology analysis tool). While the model exhibits high accuracy (98%) in predicting a protein's knot status, it does not directly provide a biological explanation or pinpoint which regions of the protein contribute to knot formation. To address this, we propose a patching technique: a sliding window (patch) replaces part of the sequence, thereby testing the importance of that part for knot formation. We tested this method on proteins from the SPOUT family and found that the most influential patches reside within the C-terminal portion of the knot core, which is also responsible for substrate binding.
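
The patching idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the filler residue `X`, the patch length, and the `toy_classifier` stand-in for the fine-tuned model are all assumptions made for illustration. The sketch slides a window over the sequence, masks it, and records how much the classifier score drops.

```python
from typing import Callable, List, Tuple


def patch_importance(
    sequence: str,
    predict_knotted: Callable[[str], float],
    patch_len: int = 5,
    stride: int = 1,
    filler: str = "X",  # assumption: patched residues are replaced by a filler symbol
) -> List[Tuple[int, float]]:
    """Return (start, score_drop) for every patch position.

    score_drop is the baseline score minus the score of the patched sequence;
    a large drop suggests the patched region matters for the prediction.
    """
    baseline = predict_knotted(sequence)
    drops = []
    for start in range(0, len(sequence) - patch_len + 1, stride):
        patched = sequence[:start] + filler * patch_len + sequence[start + patch_len:]
        drops.append((start, baseline - predict_knotted(patched)))
    return drops


def toy_classifier(seq: str) -> float:
    """Hypothetical stand-in for the model: scores 1.0 only if the motif 'GKW' survives."""
    return 1.0 if "GKW" in seq else 0.0


seq = "MAAAGKWLLL"
drops = patch_importance(seq, toy_classifier, patch_len=3)
influential_starts = [start for start, drop in drops if drop > 0]
print(influential_starts)
```

With a real classifier, `predict_knotted` would wrap the model's predicted probability of the knotted class, and the per-position drops could be plotted along the sequence.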

The teacher recommends studying from 5. 10. 2023 to 20. 10. 2023.

Week 5 – Adam Hájek: De-novo identification of small molecules from their GC-EI-MS spectra

Join us in room A502 at FI MU on October 19th at 10:00 AM (CET) [or on Zoom.]

[Adam Hájek]: De-Novo Identification of Small Molecules from their GC-EI-MS Spectra 19. 10. 2023

Mass spectrometry is an analytical technique used to determine the mass-to-charge ratio of ions. When combined with chromatography, it becomes a powerful tool for identifying molecules in chemical samples. Typically, the analysis of experimental spectra relies on comparing them to a well-maintained database of reference data. However, a significant challenge arises because existing spectral databases don't adequately cover the vast chemical space. To address this limitation, recent attention has shifted towards machine learning-based de-novo methods. These methods can directly derive the molecular structure from the mass spectrum. In this context, we introduce a novel approach that addresses a specific use case involving GC-EI-MS spectra. This case is particularly challenging because it lacks additional information from the initial stage of MS/MS experiments, which previous methods depend on.

The teacher recommends studying from 13. 10. 2023 to 21. 10. 2023.

Week 6 –

Canceled (no speaker found; we will meet in Week 14 instead).

Week 7 – Adam Hájek: De-novo identification (cont.) + Vlastimil Martinek: TBA

Join us in room A502 at FI MU on November 2nd at 10 AM (CET) [or on Zoom.]

[Vlastimil Martinek]: Predicting RNA Halflife 2. 11. 2023

The teacher recommends studying from 19. 10. 2023.

Week 8 – Dávid Meluš: Enhancing Quality of Optical Character Recognition for Financial Document Processing

Join us in room A502 at FI MU on November 9th at 10 AM (CET) [or on Zoom.]

[Dávid Meluš, Šárka Ščavnická]: Intelligent Back Office Work in Progress Thesis Reports 9. 11. 2023

[Dávid Meluš]: Enhancing Quality of Optical Character Recognition for Financial Document Processing

[Šárka Ščavnická]: CIVQA

The teacher recommends studying from 7. 11. 2023 to 16. 11. 2023.

Week 9 – David Valecký: Transformers in Computer Vision

Join us in room A502 at FI MU on November 16th at 10 AM (CET) [or on Zoom.]

[David Valecký]: Transformers in Computer Vision 16. 11. 2023

The teacher recommends studying from 9. 11. 2023 to 30. 11. 2023.

Week 10 – Marek Kadlčík: TBA

Join us in room A502 at FI MU on November 23rd at 10 AM (CET) [or on Zoom.]

[Marek Kadlčík]: Teaching Models to Use a Calculator for Solving Math Word Problems 23. 11. 2023

Large language models (LLMs) are commonly used for solving natural language tasks like question answering or generating text. However, their outputs can be outdated, factually incorrect, or untruthful. In particular, LLMs are notoriously bad at arithmetic computation. A promising way to mitigate this problem is to allow LLMs to interact with external tools, such as a calculator, a computer algebra system, or a code interpreter. In this talk, we will cover the training of calculator-using models, compare their capability of solving math word problems to vanilla LLM baselines, and discuss possible improvements in the training workflow.
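
As a rough illustration of tool use, the sketch below post-processes model output that contains calculator calls. The `<calc>...</calc>` markup, the function names, and the post-hoc substitution are assumptions for this example only, not the format or training workflow presented in the talk. In a real setup, generation would typically pause at the call and continue with the tool result added to the context.

```python
import ast
import operator
import re

# Supported arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def evaluate(expression: str) -> float:
    """Safely evaluate a plain arithmetic expression (no names, no calls)."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


# Hypothetical call markup: <calc>expression</calc>
CALL = re.compile(r"<calc>(.*?)</calc>")


def fill_calculator_calls(model_output: str) -> str:
    """Replace each <calc>expr</calc> with 'expr = result'."""
    return CALL.sub(lambda m: f"{m.group(1)} = {evaluate(m.group(1))}", model_output)


text = "Each box holds 12 eggs, so 7 boxes hold <calc>12 * 7</calc> eggs."
answer = fill_calculator_calls(text)
print(answer)
```

Evaluating through `ast` rather than `eval` keeps the calculator restricted to arithmetic, which matters when the expression comes from model output.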

The teacher recommends studying from 16. 11. 2023 to 24. 11. 2023.

Week 11 – Jan Rodák: TBA

Join us in room A502 at FI MU on November 30th at 10 AM (CET) [or on Zoom.]

[Jan Rodák]: Uses Machine Learning for Security Compliance 30. 11. 2023

Cybersecurity has become increasingly important in recent years. Many organisations such as large enterprises, governments, hospitals, the military and airports use computers to communicate, process data, serve customers, etc. These computers may contain sensitive data or be part of a company's critical infrastructure. Protecting and securing these machines is becoming an increasingly important issue for many companies. One approach to securing the system is an automated security audit (according to the organization's security policy). These security policies define, through a set of rules and recommendations, what a secure system should look like for specific use cases. SCAP is used to check whether the system complies with the policy. There are ideas on how to use machine learning to improve user-friendliness and simplify the work of developers.

The teacher recommends studying from 14. 11. 2023 to 30. 11. 2023.

Week 12 – Andrej Kubanda: Forecasting of glycemia

Join us in room A502 at FI MU on December 7th at 10 AM (CET) [or on Zoom.]

[Andrej Kubanda]: Forecasting of glycemia 7. 12. 2023

The teacher recommends studying from 30. 11. 2023 to 7. 12. 2023.

Week 13 – David Čechák: TBA

[David Čechák]: Understanding miRNA Binding Behavior Through Deep Learning Models 14. 12. 2023

MicroRNAs are small non-coding RNAs that play a central role in many molecular processes, but the exact rules of their activity are not known. One of these processes is gene regulation, in which the miRNA pairs with the Ago protein and, as a pair, binds to mRNA. The common techniques used in this field are manual feature selection followed by a classical ML method. However, these methods depend heavily on a short sequence pattern called the seed. As a result, they work well on conventional binding caused by the seed, but they fall short in the less frequent cases of unconventional binding. We build an explainable CNN model for the binding of miRNA and a subsequence of mRNA. Subsequently, we use this model to scan the whole mRNA sequence (transcript) and produce a signal of SHAP values. This scanning method creates a signal sample for each transcript. We try to correlate the signal with the fold change in gene expression that the miRNA would cause if introduced in large quantities to the environment. We build a CNN + RNN regression to predict the fold change based on the signal. We hypothesize that using the signal could help shield the model from overfitting on simple sequence patterns and help with cases where the conventional seed pattern is not strongly present.

The teacher recommends studying from 30. 11. 2023 to 8. 12. 2023.

Join us in room A502 at FI MU on December 14th at 10 AM (CET) [or on Zoom.]

Week 14 – [Michal Štefánik, Marek Kadlčík]: EMNLP breaking news

Join us in room A502 at FI MU on December 21st at 10 AM (CET) [or on Zoom].

[Michal Štefánik]: EMNLP presentation breaking news and report 21. 12. 2023

EMNLP is among the most impactful NLP conferences (A* in CORE ranking). In this presentation, we will report on the acceptance of our paper presented in the main track. We will comment on the main research directions in empirical NLP and show you highlights of the research that caught our attention during the event.

The teacher recommends studying from 7. 12. 2023 to 22. 12. 2023.

Tips for readings, discussions, and presentation preparations:

Žákovi, který se hrozil chyb, Mistr řekl: "Ti, kdo nedělají chyby, chybují nejvíc ze všech – nepokoušejí se o nic nového." Anthony de Mello: O cestě

To a student who dreaded making mistakes, the Master said: "Those who make no mistakes are making the biggest mistake of all: they are not trying anything new." Anthony de Mello
