J 2018

ToTem: a tool for variant calling pipeline optimization

TOM, Nikola, O. TOM, Jitka MALČÍKOVÁ, Šárka PAVLOVÁ, Blanka KUBEŠOVÁ et al.

Basic information

Original name

ToTem: a tool for variant calling pipeline optimization

Authors

TOM, Nikola (203 Czech Republic, belonging to the institution), O. TOM (203 Czech Republic), Jitka MALČÍKOVÁ (203 Czech Republic, belonging to the institution), Šárka PAVLOVÁ (203 Czech Republic, belonging to the institution), Blanka KUBEŠOVÁ (203 Czech Republic, belonging to the institution), T. RAUSCH (276 Germany), M. KOLARIK (203 Czech Republic), V. BENES (276 Germany), Vojtěch BYSTRÝ (203 Czech Republic, belonging to the institution) and Šárka POSPÍŠILOVÁ (203 Czech Republic, guarantor, belonging to the institution)

Edition

BMC Bioinformatics, London, BioMed Central, 2018, 1471-2105

Other information

Language

English

Type of outcome

Article in a professional journal

Field of Study

10608 Biochemistry and molecular biology

Country of publisher

United Kingdom of Great Britain and Northern Ireland

Confidentiality degree

is not subject to a state or trade secret

Impact factor

2.511

RIV identification code

RIV/00216224:14740/18:00101855

Organization unit

Central European Institute of Technology

UT WoS

000436517200004

Keywords in English

Variant calling; Benchmarking; Next generation sequencing; Parameter optimization

Tags

International impact, Reviewed
Changed: 13/3/2019 17:22, Mgr. Pavla Foltynová, Ph.D.

Abstract

In the original

Background: High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Results: Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. Conclusions: ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.

Links

EF16_013/0001818, research and development project
Name: Modernizace a podpora výzkumných aktivit národní infrastruktury pro translační medicínu EATRIS-CZ
LM2015064, research and development project
Name: Český národní uzel Evropské infrastruktury pro translační medicínu (Acronym: EATRIS-ERIC-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
LQ1601, research and development project
Name: CEITEC 2020 (Acronym: CEITEC2020)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0968/2017, internal MU code
Name: Nové přístupy ve výzkumu, diagnostice a terapii hematologických malignit V (Acronym: VýDiTeHeMA V)
Investor: Masaryk University, Category A
NV15-30015A, research and development project
Name: Analýza klonální heterogenity chronické lymfocytární leukemie pomoci sekvenování nové generace genu pro B-buněčný receptor. Národní studie.
NV15-31834A, research and development project
Name: Vliv selekce genomických poškození na průběh chronické lymfocytární leukémie
TE02000058, research and development project
Name: Centrum kompetence pro molekulární diagnostiku a personalizovanou medicínu (Acronym: MOLDIMED)
Investor: Technology Agency of the Czech Republic
692298, internal MU code
Name: MEDGENET - Medical genomics and epigenomics network (Acronym: MEDGENET)
Investor: European Union, Spreading excellence and widening participation