Lecture 2: DNA re-sequencing
Modern Genomic Technologies (LF:DSMGT01)
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

NGS data analysis
[Diagram: NGS data analysis workflow. Labels, in order of appearance:]
•Raw data (.fastq)
•Genome/Transcriptome Reference
•Mapping (.bam)
•Interaction analysis: ChIP-seq
•Expression analysis: RNAseq
•Variant analysis: WES
•de-multiplexing
•Not known reference
•QC
•QC
•Experiment design
•Not "classic" reference
•Metagenomics
•Reference assembly
•Immunogenetic VDJ-genes
•CRISPR sgRNA
•Methylation Bisulfite-seq
•…

DNA re-sequencing
•Variant Calling
•Medical purposes
•Cancer genomics
•Small variants (SNV + small indels) vs. Structural Variants
•Germline vs. Somatic

Mapping
•Computationally the most demanding step
•More or less standardized
•Output .bam
‒.bam = binary (zipped) .sam
‒.sam = Sequence Alignment/Map
•Tools
‒BWA - DNA
‒STAR - RNA

DNA re-sequencing
Small Variant calling

Mapping QC
[Figure: chart]

Mapping QC
[Figure: table]

Mapping QC - coverage
[Figures: two line charts]

Mapping QC – cumulative coverage
[Figures: line chart, chart]

Mapping QC
[Figure: application screenshot]

Mapping QC
[Figures: line chart, application screenshot]

Variant Calling - Germline
•What you have from birth
•Family trio sequencing
•Predispositions

Variant Calling - Germline
•Tools:

Variant Calling - Somatic
•Diagnosis / prognosis / therapy decisions
•Tumor – normal paired
‒Somatic variant calling without normal needs high coverage
•Expected variant heterogeneity
•Indirectly correlates with the necessary coverage

Variant Calling - Somatic
•Multiple tools:
‒Strelka2, VarDict, Mutect2, SomaticSniper, LoFreq, MuSE, VarScan
•Ensemble caller
‒SomaticSeq
‒Uses machine learning to distinguish true positives (TP) from false positives (FP)
•Sensitivity vs. specificity
‒Sensitivity is usually preferred
‒Accuracy is preferred for derived information

Small Variant annotation
•VEP – Variant Effect Predictor
•Transcript "selection"
‒RefSeq vs. Ensembl
•Population frequency
‒1000 Genomes Project
‒gnomAD
•Many clinical variant DBs
‒Gene based vs. variant based
‒dbSNP
‒COSMIC
‒ClinVar
‒CGC

Small Variant annotation – functional prediction
•General variant consequence
‒Based on the position
‒Impact
•Effect of the variant on protein structure
‒PolyPhen
‒SIFT

Small Variant interpretation
•Hardest part
•Usually manual work
‒Clinical genetics
‒Select 5 probable causal variants from ~1000
•Bioinformatics can help

Variant interpretation – gene networks
•Gene ontology
•Biological pathway DB
‒KEGG
‒Reactome
‒WikiPathways

Variant interpretation – derived information
•Tumor mutational burden
‒Several definitions
‒Mutations per million bases
•Mutational Signatures
‒COSMIC
‒Exposure to ultraviolet light
‒Tobacco smoking
‒Defective DNA damage repair

Thank you for your attention!
Vojta Bystry
vojtech.bystry@ceitec.muni.cz
www.ceitec.eu
CEITEC @CEITEC_Brno
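
Supplementary note to the "derived information" slide: the "mutations per million bases" definition of tumor mutational burden can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the lecture. The function name `tumor_mutational_burden`, the example numbers, and the choice to count all reported somatic variants over the callable target size are hypothetical; as the slide notes, several definitions exist.

```python
def tumor_mutational_burden(n_somatic_variants: int, callable_bases: int) -> float:
    """Return somatic mutations per million bases (one of several TMB definitions).

    n_somatic_variants: number of somatic variants counted (hypothetical input;
        which variant classes are counted depends on the chosen definition).
    callable_bases: size of the sequenced region that could be called, in bases.
    """
    if n_somatic_variants < 0:
        raise ValueError("n_somatic_variants must not be negative")
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    # Convert the region size to megabases, then normalise the variant count.
    return n_somatic_variants / (callable_bases / 1_000_000)


# Hypothetical example: 350 somatic variants over a 35 Mb exome target.
tmb = tumor_mutational_burden(350, 35_000_000)
print(f"TMB = {tmb:.1f} mutations/Mb")
```

Because the result is normalised by region size, values from panels of different sizes are only comparable when the same counting rules are applied to both.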