SPIŠAKOVÁ, Viktória, Lukáš HEJTMÁNEK and Jakub HYNŠT. Nextflow in Bioinformatics: Executors Performance Comparison Using Genomics Data. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE. NETHERLANDS: ELSEVIER, 2023, vol. 142, May 2023, pp. 328-339. ISSN 0167-739X. Available from: https://dx.doi.org/10.1016/j.future.2023.01.009.
Other formats:
BibTeX
LaTeX
RIS
@article{2246279,
  author = {Spišaková, Viktória and Hejtmánek, Lukáš and Hynšt, Jakub},
  article_location = {NETHERLANDS},
  article_number = {May 2023},
  doi = {10.1016/j.future.2023.01.009},
  keywords = {Kubernetes;HPC;Cloud;Performance comparison;Genomics;Nextflow;Big data},
  language = {eng},
  issn = {0167-739X},
  journal = {FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE},
  title = {Nextflow in Bioinformatics: Executors Performance Comparison Using Genomics Data},
  url = {https://www.sciencedirect.com/journal/future-generation-computer-systems/special-issue/10KV6BLSVBL},
  volume = {142},
  pages = {328--339},
  publisher = {ELSEVIER},
  year = {2023}
}
TY  - JOUR
ID  - 2246279
AU  - Spišaková, Viktória
AU  - Hejtmánek, Lukáš
AU  - Hynšt, Jakub
PY  - 2023
TI  - Nextflow in Bioinformatics: Executors Performance Comparison Using Genomics Data
JF  - FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
VL  - 142
IS  - May 2023
SP  - 328
EP  - 339
PB  - ELSEVIER
SN  - 0167739X
KW  - Kubernetes
KW  - HPC
KW  - Cloud
KW  - Performance comparison
KW  - Genomics
KW  - Nextflow
KW  - Big data
UR  - https://www.sciencedirect.com/journal/future-generation-computer-systems/special-issue/10KV6BLSVBL
N2  - Processing big data is a computationally demanding task which has usually been fulfilled by HPC batch systems. These complex systems pose a challenge to scientists due to their cumbersome nature and changing environment. The scientists often lack deeper informatics understanding and experiment reproducibility is increasingly becoming a hard request on the research validity. A new computational paradigm — containers — are meant to contain all dependencies and persist the state which help reproducibility. They have gained a lot of popularity in the informatics community but HPC community remains skeptical and doubts that container platforms are appropriate for demanding tasks or that such infrastructure can reach significant performance. In this paper, we observe the performance of various infrastructure types (HPC, Kubernetes, local) on a Sarek Nextflow bioinformatics workflow with real life genomics data of various sizes. We analyze obtained workload trace and discuss pros and cons of utilized infrastructures. We also show some approaches perform better in terms of available resources but others are more suitable for diversified workflows. Based on the results, we provide recommendations for life science groups which plan to analyze data in large scale.
ER  -
SPIŠAKOVÁ, Viktória, Lukáš HEJTMÁNEK and Jakub HYNŠT. Nextflow in Bioinformatics: Executors Performance Comparison Using Genomics Data. \textit{FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE}. NETHERLANDS: ELSEVIER, 2023, vol.~142, May 2023, pp.~328--339. ISSN~0167-739X. Available from: https://dx.doi.org/10.1016/j.future.2023.01.009.