Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants

J 2019

HYNŠT, Jakub, Karla PLEVOVÁ, Lenka RADOVÁ, Vojtěch BYSTRÝ, Karol PÁL et al.

Basic information

Original name

Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants

Authors

HYNŠT, Jakub (203 Czech Republic, belonging to the institution), Karla PLEVOVÁ (203 Czech Republic, belonging to the institution), Lenka RADOVÁ (203 Czech Republic, belonging to the institution), Vojtěch BYSTRÝ (203 Czech Republic, belonging to the institution), Karol PÁL (703 Slovakia, belonging to the institution) and Šárka POSPÍŠILOVÁ (203 Czech Republic, guarantor, belonging to the institution)

Edition

PeerJ, London, PEERJ INC, 2019, 2167-8359

Other information

Language

English

Type of outcome

Article in a professional journal

Field of Study

30204 Oncology

Country of publisher

United Kingdom of Great Britain and Northern Ireland

Confidentiality degree

is not subject to a state or trade secret

Impact factor

2.379

RIV identification code

RIV/00216224:14740/19:00108521

Organization unit

Central European Institute of Technology

DOI

http://dx.doi.org/10.7717/peerj.7071

UT WoS

000471213700009

Keywords in English

Chromothripsis; Complex structural variants; Fusion gene; Gene expression; Bioinformatic pipeline; Next-generation sequencing; Leukemia; Transcriptomics; Chronic lymphocytic leukemia; Statistics

Abstract

In the original

Background. Extensive genome rearrangements, known as chromothripsis, have recently been identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development or worsen its clinical course. The functional impact of cSVs can be studied at the RNA level using whole transcriptome sequencing (total RNA-Seq). Total RNA-Seq is a powerful tool for discovering, profiling, and quantifying changes in gene expression in the overall genomic context. However, bioinformatic analysis of transcriptomic data, especially in cases with cSVs, is a complex and challenging task, and the development of proper bioinformatic tools for transcriptome studies is necessary.

Methods. We designed a bioinformatic workflow for the analysis of total RNA-Seq data consisting of two separate parts (pipelines). The first pipeline incorporates a statistical solution for differential gene expression analysis in a biologically heterogeneous sample set. To increase the precision of the analysis, we utilized results from transcriptomic arrays that were carried out in parallel. The second pipeline is used for the identification of de novo fusion genes. Special attention was given to the filtering of false positives (FPs), which was achieved through consensus fusion calling with several fusion gene callers. We applied the workflow to the data obtained from ten patients with chronic lymphocytic leukemia (CLL) to describe the consequences of their cSVs in detail. The fusion genes identified by our pipeline were correlated with genomic breakpoints detected by genomic arrays.

Results. We set up a novel solution for differential gene expression analysis of individual samples and de novo fusion gene detection from total RNA-Seq data. The results of the differential gene expression analysis were concordant with results obtained by transcriptomic arrays, which demonstrates the analytical capabilities of our method. We also showed that the consensus fusion gene detection approach was able to identify true positives (TPs) efficiently. Detected coordinates of fusion gene junctions were in concordance with genomic breakpoints assessed using genomic arrays.

Discussion. By applying our methods to real clinical samples, we showed that our approach for total RNA-Seq data analysis generates results consistent with other genomic analytical techniques. The data obtained by our analyses provided clues for the study of the biological consequences of cSVs with far-reaching implications for clinical outcome and management of cancer patients. The bioinformatic workflow is also widely applicable for addressing other research questions in other contexts in which transcriptomic data are generated.
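The consensus fusion calling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name consensus_fusions, the min_callers threshold, the caller labels, and the gene names are all hypothetical placeholders. The sketch only assumes that each fusion caller reports ordered gene pairs (5' partner, 3' partner) and that a candidate is kept when enough independent callers agree on it.

```python
from collections import defaultdict


def consensus_fusions(calls_by_caller, min_callers=2):
    """Return fusion candidates reported by at least `min_callers` callers.

    `calls_by_caller` maps a caller label to an iterable of
    (five_prime_gene, three_prime_gene) tuples. Both the input layout and
    the threshold are illustrative assumptions, not the published method.
    """
    support = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for five_prime, three_prime in fusions:
            # A set ensures a caller is counted once per gene pair,
            # even if it reports the same pair several times.
            support[(five_prime, three_prime)].add(caller)

    return {
        pair: sorted(callers)
        for pair, callers in support.items()
        if len(callers) >= min_callers
    }


if __name__ == "__main__":
    # Placeholder calls; gene and caller names are invented for illustration.
    example_calls = {
        "caller_1": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
        "caller_2": [("GENE_A", "GENE_B")],
        "caller_3": [("GENE_A", "GENE_B"), ("GENE_E", "GENE_F")],
    }
    for pair, callers in consensus_fusions(example_calls).items():
        print(pair, "supported by", callers)
```

Raising min_callers trades sensitivity for fewer false positives; the appropriate threshold depends on which callers are combined and is not specified in this record.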

Links

LM2015091, research and development project

Name: Národní centrum lékařské genomiky (Acronym: NCLG)

Investor: Ministry of Education, Youth and Sports of the CR

MUNI/A/1105/2018, internal MU code

Name: Nové přístupy ve výzkumu, diagnostice a terapii hematologických malignit VI (Acronym: VýDiTeHeMA VI)

Investor: Masaryk University, Category A

NV15-31834A, research and development project

Name: Vliv selekce genomických poškození na průběh chronické lymfocytární leukémie

90091, large research infrastructures

Name: NCMG

Cite

HYNŠT, Jakub, Karla PLEVOVÁ, Lenka RADOVÁ, Vojtěch BYSTRÝ, Karol PÁL and Šárka POSPÍŠILOVÁ. Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants. PeerJ. London: PEERJ INC, 2019, vol. 7, JUN, p. 7071-7086. ISSN 2167-8359. Available from: https://dx.doi.org/10.7717/peerj.7071.

@article{1567877,
   author = {Hynšt, Jakub and Plevová, Karla and Radová, Lenka and Bystrý, Vojtěch and Pál, Karol and Pospíšilová, Šárka},
   article_location = {London},
   article_number = {JUN},
   doi = {http://dx.doi.org/10.7717/peerj.7071},
   keywords = {Chromothripsis; Complex structural variants; Fusion gene; Gene expression; Bioinformatic pipeline; Next-generation sequencing; Leukemia; Transcriptomics; Chronic lymphocytic leukemia; Statistics},
   language = {eng},
   issn = {2167-8359},
   journal = {PeerJ},
   title = {Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants},
   url = {https://peerj.com/articles/7071.pdf},
   volume = {7},
   year = {2019}
}

TY  - JOUR
ID  - 1567877
AU  - Hynšt, Jakub
AU  - Plevová, Karla
AU  - Radová, Lenka
AU  - Bystrý, Vojtěch
AU  - Pál, Karol
AU  - Pospíšilová, Šárka
PY  - 2019
TI  - Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants
JF  - PeerJ
VL  - 7
IS  - JUN
SP  - 7071
EP  - 7071
PB  - PEERJ INC
SN  - 21678359
KW  - Chromothripsis
KW  - Complex structural variants
KW  - Fusion gene
KW  - Gene expression
KW  - Bioinformatic pipeline
KW  - Next-generation sequencing
KW  - Leukemia
KW  - Transcriptomics
KW  - Chronic lymphocytic leukemia
KW  - Statistics
UR  - https://peerj.com/articles/7071.pdf
L2  - https://peerj.com/articles/7071.pdf
N2  - Background. Extensive genome rearrangements, known as chromothripsis, have been recently identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development, or worsen its clinical course. The functional impact of cSVs can be studied at the RNA level using whole transcriptome sequencing (total RNA-Seq). It represents a powerful tool for discovering, profiling, and quantifying changes of gene expression in the overall genomic context. However, bioinformatic analysis of transcriptomic data, especially in cases with cSVs, is a complex and challenging task, and the development of proper bioinformatic tools for transcriptome studies is necessary. Methods. We designed a bioinformatic workflow for the analysis of total RNA-Seq data consisting of two separate parts (pipelines): The first pipeline incorporates a statistical solution for differential gene expression analysis in a biologically heterogeneous sample set. We utilized results from transcriptomic arrays which were carried out in parallel to increase the precision of the analysis. The second pipeline is used for the identification of de novo fusion genes. Special attention was given to the filtering of false positives (FPs), which was achieved through consensus fusion calling with several fusion gene callers. We applied the workflow to the data obtained from ten patients with chronic lymphocytic leukemia (CLL) to describe the consequences of their cSVs in detail. The fusion genes identified by our pipeline were correlated with genomic break-points detected by genomic arrays. Results. We set up a novel solution for differential gene expression analysis of individual samples and de novo fusion gene detection from total RNA-Seq data. The results of the differential gene expression analysis were concordant with results obtained by transcriptomic arrays, which demonstrates the analytical capabilities of our method. We also showed that the consensus fusion gene detection approach was able to identify true positives (TPs) efficiently. Detected coordinates of fusion gene junctions were in concordance with genomic breakpoints assessed using genomic arrays. Discussion. By applying our methods to real clinical samples, we proved that our approach for total RNA-Seq data analysis generates results consistent with other genomic analytical techniques. The data obtained by our analyses provided clues for the study of the biological consequences of cSVs with far-reaching implications for clinical outcome and management of cancer patients. The bioinformatic workflow is also widely applicable for addressing other research questions in different contexts, for which transcriptomic data are generated.
ER  -

HYNŠT, Jakub, Karla PLEVOVÁ, Lenka RADOVÁ, Vojtěch BYSTRÝ, Karol PÁL and Šárka POSPÍŠILOVÁ. Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants. \textit{PeerJ}. London: PEERJ INC, 2019, vol.~7, JUN, p.~7071-7086. ISSN~2167-8359. Available from: https://dx.doi.org/10.7717/peerj.7071.
