Thesis: Samuel Kuziel: The quality assessment of high-quality near-complete genome assemblies
Bachelor's thesis
The quality assessment of high-quality near-complete genome assemblies
Annotation
Genome assemblies have reached a level of quality at which traditional k-mer-based evaluation methods lose sensitivity and cannot reliably distinguish between the highest-quality assemblies. This thesis presents RAAQA, a command-line tool for the quantitative assessment of genome assembly quality using alignment-based and phasing-based metrics. The tool provides two analytical modules. The alignment quality module …
Abstract
Genome assemblies have reached a level of quality where traditional k-mer-based evaluation methods lose sensitivity and can no longer reliably differentiate between the highest-quality assemblies. This thesis presents RAAQA, a command-line tool for quantitative genome assembly quality assessment using alignment-based and phasing-based metrics. The tool provides two analytical modules. The alignment …
Thesis assignment
Recent technological and algorithmic advances have made it possible to generate high-quality genome assemblies with estimated error rates of 1 in 1 million base pairs or lower. As a consequence, many traditional methods for quality control of newly generated assemblies are no longer useful. For example, k-mer-based quality estimates cannot differentiate well between high-quality and ultra-high-quality genome assemblies. Therefore, new computational pipelines or tools that use different quality metrics are needed. The objective of this thesis is to implement a computational tool for the quantitative assessment of genome assembly quality using novel evaluation metrics. The proposed solution will address some of the limitations of traditional quality control approaches. The developed tool will support multiple complementary metrics for assessing assembly completeness and accuracy: 1) soft-clipped base pairs, indicating potential misassemblies or alignment inconsistencies; 2) mapping quality (MAPQ), reflecting the overall confidence of read alignments across the genome; and 3) Hamming errors and switch errors, calculated by an experimental module and providing measures of haplotype phasing correctness. This experimental module will be applicable only to Hifiasm assemblies with available .paf alignment files and parental data, and will help scientists benchmark various assembly recipes. Finally, the tool will be implemented as a command-line application, with emphasis on usability and reproducibility, ensuring accessibility for biologists and genomic researchers.
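To make two of the metrics named above concrete, the following is a minimal Python sketch. It is not taken from RAAQA. The function names `soft_clipped_bases` and `switch_and_hamming` are hypothetical, and so is the input encoding, in which each haplotype is a string with one character per heterozygous site (`0` or `1` for the parent of origin). The switch and Hamming error definitions are the ones commonly used in the phasing literature and may differ from those the thesis implements.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return the number of soft-clipped bases ('S' operations) in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "S")


def switch_and_hamming(truth, pred):
    """Return (switch_errors, hamming_errors) for one phased block.

    Assumption: truth and pred are equal-length strings with one character
    per heterozygous site, naming the parent assigned to that site.
    """
    if len(truth) != len(pred):
        raise ValueError("haplotype strings must have the same length")
    # True where the predicted phase agrees with the truth at that site.
    agree = [t == p for t, p in zip(truth, pred)]
    # A switch error is a change between agreement and disagreement
    # at two consecutive sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    # Hamming error: mismatching sites under the better of the two
    # possible haplotype labelings.
    mismatches = agree.count(False)
    hamming = min(mismatches, len(agree) - mismatches)
    return switches, hamming


print(soft_clipped_bases("5S90M5S"))           # 10
print(switch_and_hamming("000000", "000111"))  # (1, 3)
print(switch_and_hamming("0000", "0100"))      # (2, 1)
```

The two phasing examples show why the metrics are complementary: a single switch in the middle of a block produces a large Hamming error, whereas one isolated wrong site counts as two switches but only one Hamming error.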
25. 5. 2026 08:00, Mgr. Monika Čechová, Ph.D., učo 256590
Attachments
hprc_results_20kb_5kb_part4.zip
visualise_mapq_softclip_examples.zip
hese_inputs_and_results.zip
hprc_results_20kb_5kb_part5.zip
raaqa-0.3.0.zip
hprc_results_20kb_5kb_part2.zip
hprc_summary_figures.zip
hprc_results_20kb_5kb_part3.zip
hprc_results_20kb_5kb_part1.zip
chrY_summary_figures.zip
chrY_results.zip
Theses on a related topic
List of theses that share the same keywords.
- Technical aspects of telomere length predictions from human sequencing data, Bc. Jan Petkov
- Celogenomové optické mapování – nová éra cytogenetiky?, Mgr. Kristína Handzušová
- The Most Important Skills in the Field of Professional Translation, Mgr. Roman Kuběna
- Současné možnosti detekce a analýzy proteinových interakcí, Mgr. Filip Trčka, Ph.D., učo 184375
- Bioinformatická analýza glykosyltransferas z Mycobacterium tuberculosis, Mgr. Bc. Lucie Čtveráčková, Ph.D.
- Vizualizace elektrostatického potenciálu v proteinech, Mgr. Jiří Andrlík
- Vývoj bioinformatických nástrojů a databází pro proteinové inženýrství, Mgr. Jan Dvorský, Ph.D.
- PredictSNP onco: server pro automatickou analýzu mutací přispívajících k rozvoji rakoviny, Mgr. Adam Dobiáš




