Final thesis: Bc. Václav Novák, učo 505947: Similarity search system for molecular dynamics
Master's thesis
Similarity search system for molecular dynamics
Annotation
Molecular dynamics simulations generate large volumes of structural data, and comparing and searching these data efficiently is computationally demanding with traditional methods based on structural alignment. This thesis proposes a similarity search system for molecular dynamics data based on vector representations. Two approaches to similarity search are investigated: static and dynamic …
Abstract
Molecular dynamics (MD) simulations generate large volumes of data that are difficult to compare and search efficiently using traditional alignment-based methods. This thesis proposes a similarity search system for molecular dynamics data based on vector embeddings. Both static and dynamic similarity search are investigated. Static search retrieves individual protein structures similar to a given query …
Thesis assignment
Molecular dynamics (MD) simulations are crucial for understanding the behavior of complex molecular systems over time. As MD data repositories grow, efficient similarity search becomes increasingly important for identifying structural similarities, functional changes, recurring patterns, and molecular mechanisms across different simulations. Large-scale similarity search systems rely heavily on embedding-based retrieval. While many embedding methods exist for static proteins, no robust embedding for MD exists yet.
The goal of this thesis is to propose, implement, and evaluate a similarity search method that applies embedding methods for static proteins to molecular dynamics simulations in which proteins are the primary biomolecule.
In particular, the student will:
- Report on the current state-of-the-art embedding methods for static proteins and survey the state of available vector representations for MD simulations.
- Propose and implement a k-NN search method that can determine which simulations contain a protein of interest.
- Propose and implement a k-NN search method that can determine which conformations over the lifetime of the simulation are most similar to the protein of interest.
- Propose and implement a method that transforms a set of static embeddings for each selected simulation frame into one “embedding” to represent the entire simulation. This method will then be used within a k-NN search system to find the most similar simulation.
- Optionally, the student will use the MD tool GROMACS (gmx cluster, https://manual.gromacs.org/current/onlinehelp/gmx-cluster.html) to perform a clustering analysis that informs the selection of suitable frames within the simulation.
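The search tasks listed above can be illustrated with a minimal sketch. It assumes each simulation frame has already been turned into a fixed-length vector by some static-protein embedding model, that cosine similarity is the ranking criterion, and that mean pooling is used to collapse frame embeddings into one simulation-level vector. None of these choices is prescribed by the assignment; the function names and toy vectors are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def simulation_embedding(frame_embeddings):
    """Mean-pool frame embeddings (n_frames, dim) into one vector (dim,).

    Mean pooling is an assumption made for this sketch, not the thesis method.
    """
    return l2_normalize(np.mean(frame_embeddings, axis=0))

def knn(query, database, k):
    """Return indices and cosine similarities of the k rows closest to query."""
    sims = l2_normalize(database) @ l2_normalize(query)
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Hypothetical toy data: three simulations, two frames each, 3-dimensional embeddings.
simulations = [
    np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
    np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]),
]
query = np.array([0.0, 0.8, 0.2])  # embedding of the protein of interest

# Simulation-level search: which simulations are closest to the query?
sim_vectors = np.stack([simulation_embedding(f) for f in simulations])
sim_idx, sim_scores = knn(query, sim_vectors, k=2)

# Frame-level search: which conformation in the best simulation is closest?
frame_idx, frame_scores = knn(query, simulations[sim_idx[0]], k=1)
```

A brute-force scan like this is only practical for small collections; a larger repository would typically use an approximate nearest-neighbor index instead.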
All proposed methods will be evaluated using domain-specific metrics for structural similarity (TM-score) and simulation similarity (Path Similarity Analysis, PSA) on a representative set (at least tens to a hundred, optionally more) of real-world MD simulations from MDRepo (https://mdrepo.org/). The methods will be open-sourced and made available as Docker images.
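As an illustration of the simulation-level metric, PSA is commonly computed as a Hausdorff or discrete Fréchet distance between two trajectories, with RMSD as the distance between individual frames. The sketch below shows the Hausdorff variant only. It assumes both trajectories contain the same atoms and have already been aligned to a common reference. Which PSA variant the thesis uses is not stated here, and the function names are hypothetical.

```python
import numpy as np

def pairwise_rmsd(traj_p, traj_q):
    """RMSD between every frame of traj_p and every frame of traj_q.

    Inputs have shape (n_frames, n_atoms, 3); the result has shape (n_p, n_q).
    No superposition is performed: frames are assumed to be pre-aligned.
    """
    diff = traj_p[:, None, :, :] - traj_q[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def hausdorff(traj_p, traj_q):
    """Symmetric Hausdorff distance between two trajectories."""
    d = pairwise_rmsd(traj_p, traj_q)
    # Largest nearest-neighbor distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Hypothetical toy trajectories: a single atom moving along the x-axis.
path_p = np.array([[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]])
path_q = np.array([[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]]])
distance = hausdorff(path_p, path_q)
```

In this toy case the last frame of `path_q` lies 3 units from its nearest frame in `path_p`, so that frame determines the distance.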
17. 12. 2025 08:57, RNDr. Terézia Slanináková, Ph.D., učo 445526
Consultant
Theses on a related topic
List of theses that share the same keywords.
- Computational and NMR characterization of intrinsically disordered protein regions (Mgr. Vojtěch Zapletal, Ph.D., učo 357261)
- Paralelizace analýzy molekulárně dynamických simulací [Parallelization of the analysis of molecular dynamics simulations] (Mgr. Jakub Štěpán, Ph.D.)
- Self-organizing Similarity Search - The Social Network Approach (doc. RNDr. Jan Sedmidubský, Ph.D., učo 60474)
- Vyhledávání podobných obrázků tetování [Searching for similar tattoo images] (Bc. Petr Hájek, učo 256613)
- Multi-Index Approach for Similarity Searching (RNDr. Martin Kyselák)
- In silico predikce vazebných vlastností lektinu RS20L [In silico prediction of the binding properties of the lectin RS20L] (Mgr. Michal Ďurech, Ph.D.)
- Similarity Models for Human Motion Data (RNDr. Jakub Valčík, Ph.D.)
- Homologní modelování a virtuální screening potenciálních inhibitorů intelektinu [Homology modeling and virtual screening of potential intelectin inhibitors] (Mgr. Veronika Horská)




