Lecture 1: NGS Overview
Modern Genomic Technologies (LF:DSMGT01)
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

Introduction to LF:DSMGT01
•NGS data analysis for non-bioinformaticians
 ‒Focus on experiment planning and result interpretation
 1. Introduction; NGS overview
 2. DNA resequencing
 3. DNA resequencing
 4. RNA-seq
 5. RNA-seq
 6. ChIP-seq (CLIP-seq)
 7. Everything else :) + colloquium
•The plan is open to change (based on your suggestions and wishes)

What is NGS?
•Next generation sequencing
 ‒New generation sequencing
 ‒HTP = high throughput
 ‒Massively parallel sequencing
•Contrast to Sanger sequencing
•Illumina – sequencing by synthesis
•Oxford Nanopore – nanopore sequencing
•Pacific Biosciences – Single Molecule, Real-Time (SMRT)
•New competitor from China – BGI DNBSeq platforms
 ‒Technical comparison of DNBSeq™ and Illumina platforms

Illumina – sequencing by synthesis
•https://www.youtube.com/watch?v=fCd6B5HRaZ8

Raw data
•10^5 – 10^10 reads
•75 – 300 bp
•Can be paired-end

Basic workflow
Experimental design → Library preparation → Sequencing → Data analysis
•How we sequence
•What we sequence
•Why we sequence
•Consultation regarding data analysis is highly advisable.
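
To put the raw-data numbers above (10^5 to 10^10 reads of 75 to 300 bp, possibly paired-end) into perspective when planning an experiment, the total yield can be estimated with simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, the read count is assumed to mean sequenced fragments (so a paired-end run yields two reads per fragment), and the 3 Gbp target size is an assumed round figure for a human-sized genome, not a value from the lecture.

```python
def sequencing_yield(n_fragments: int, read_length_bp: int, paired_end: bool = False) -> int:
    """Total number of sequenced bases.

    Assumption: n_fragments counts sequenced fragments; a paired-end run
    produces two reads (one from each end) per fragment.
    """
    reads_per_fragment = 2 if paired_end else 1
    return n_fragments * read_length_bp * reads_per_fragment


def mean_coverage(total_bases: int, target_size_bp: int) -> float:
    """Average number of times each base of the target is sequenced,
    assuming reads are spread evenly (real data is never perfectly even)."""
    return total_bases / target_size_bp


if __name__ == "__main__":
    # Smallest and largest runs from the ranges quoted on the slide.
    smallest = sequencing_yield(10**5, 75)
    largest = sequencing_yield(10**10, 300, paired_end=True)
    print(f"smallest run: {smallest:,} bases")
    print(f"largest run:  {largest:,} bases")

    # Hypothetical experiment: 300 million fragments, 2 x 150 bp,
    # against an assumed 3 Gbp target.
    total = sequencing_yield(300_000_000, 150, paired_end=True)
    print(f"mean coverage: {mean_coverage(total, 3_000_000_000):.1f}x")
```

Such an estimate is only a starting point; the depth actually needed depends on the application and is one of the questions worth raising in a consultation before sequencing.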

NGS library preparation
[Diagram: living material → DNA or RNA → select some parts]

NGS library overview

NGS data analysis
[Workflow diagram]
•De-multiplexing → raw data (.fastq) → QC
•Known reference: mapping to a genome/transcriptome reference (.bam) → QC
 ‒Variant analysis (WES)
 ‒Expression analysis (RNA-seq)
 ‒Interaction analysis (ChIP-seq)
•Unknown reference
 ‒Metagenomics
 ‒Reference assembly
•Not a "classic" reference
 ‒Immunogenetics (VDJ genes)
 ‒CRISPR (sgRNA)
 ‒Methylation (bisulfite-seq)
 ‒…
•Experiment design

Metagenomics
•Environmental statistics about populations
 ‒Alpha, beta, gamma diversity
 ‒Identify known bacterial species
 ‒Eventually functional profiling
  •E.g. antimicrobial resistance genes
•Sequencing techniques
 ‒16S rRNA sequencing
 ‒Shotgun metagenomic sequencing

Metagenomics – 16S rRNA vs. Shotgun

| Factors | 16S rRNA sequencing | Shotgun metagenomic sequencing |
| Cost | ~$50 USD | Starting at ~$150, but the price depends on the sequencing depth required |
| Sample preparation | Similar complexity to shotgun sequencing | Similar complexity to 16S rRNA sequencing |
| Functional profiling (profile microbial genes) | No (but "predicted" functional profiling is possible) | Yes (but it only reveals information on functional potential) |
| Taxonomic resolution: genus, species, strain? | Bacterial genus (sometimes species); dependent on region(s) targeted | Bacterial species (sometimes strains and single nucleotide variants, if sequencing is deep enough) |
| Taxonomic coverage | Bacteria and archaea | All taxa, including viruses |
| Bioinformatics requirements | Beginner to intermediate expertise | Intermediate to advanced expertise |
| Databases | Established, well-curated | Relatively new, still growing |
| Sensitivity to host DNA contamination | Low (but PCR success depends on the absence of inhibitors and the presence of a detectable microbiome) | High, varies with sample type (but this can be mitigated by calibrating the sequencing depth) |
| Bias | Medium to high (retrieved taxonomic composition depends on the selected primers and the targeted variable region) | Lower (while metagenomics is "untargeted", experimental and analytical biases can be introduced at various stages) |

Metagenomics – 16S rRNA vs. Shotgun
•Study examples
 ‒Assessment of the bacterial microbiome of Amazonian soil
  •16S rRNA sequencing may provide more taxonomic resolution
 ‒Changes in microbiome composition and antimicrobial gene carriage following fecal transplant
  •Shotgun sequencing to assess both compositional and functional differences
 ‒Daily fluctuations in gut microbiome following a 2-week dietary fiber intervention
  •Shotgun sequencing to assess both compositional and functional differences

Reference Assembly
•Genome – DNA – very hard and costly
•Transcriptome – RNA
•Multiple sequencing types are highly beneficial
 ‒Paired-end
 ‒Long reads
 ‒Mate-pairs
•A similar reference is helpful – assembly by homology

Immunogenetics
•T-cell receptor, immunoglobulin (B-cell)
•Gene rearrangement during cell maturation
 ‒VDJ recombination
•Different cell populations
 ‒Clonal studies
 ‒Repertoire usage
•Main usage – blood malignancies (leukemias)

Genome-wide CRISPR-Cas9 knockout screens
[Figure source: How to use CRISPR Libraries for Screening, GenScript, CRISPR/Cas9 Applications]
•Cas9 (CRISPR-associated protein 9) is a protein that plays a vital role in the immunological defense of certain bacteria against DNA viruses
•sgRNA libraries
 ‒Each sgRNA knocks out a specific gene
 ‒76,000 guide RNAs (sgRNAs) with four highly active guides per gene, targeting about 19,000 genes, plus non-targeting sgRNA controls
[Figure label: Lentivirus]

Genome-wide CRISPR-Cas9 knockout screens
•Screen: selection + expansion/enrichment of surviving cells
•NGS sequencing

Genome-wide CRISPR-Cas9 knockout screens
•NGS data analysis
 ‒Counting cells with different genes knocked out
 ‒Counting sgRNA fragments
 ‒Compare conditions

Genome-wide CRISPR-Cas9 knockout screens
•Example study
[Figure 1 from: Wei, L., Lee, D., Law, CT. et al. Genome-wide CRISPR/Cas9 library screening identified PHGDH as a critical driver for Sorafenib resistance in HCC. Nat Commun 10, 4681 (2019). https://doi.org/10.1038/s41467-019-12606-7]

De-multiplexing
•bcl2fastq tool
 ‒Needs a sample sheet with indexes
 ‒Number of allowed barcode mismatches
•Check the undetermined reads

Primary data – fastq file

Fastq format – quality
•Fastq – the "q" stands for quality, encoded as a Phred score

| Quality | Error probability |
| 5 | ~32% |
| 10 | 10% |
| 20 | 1% |
| 30 | 0.1% |

•Very good for early problem detection
•Reasonable for trimming and read filtering
•RNA-seq – above Phred score 5
•Example quality string:
 CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,

Fastq – quality control
•FastQC tool
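
The relationship between the Phred score and the error probability in the fastq quality table can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not how FastQC or any trimming tool is implemented: it assumes the standard Phred definition P = 10^(-Q/10) and an ASCII offset of 33 (the lecture does not name the encoding, but characters such as "+" and "," in the example string only make sense with offset 33). All function names are hypothetical, and the tail trimmer is deliberately naive.

```python
def phred_to_error_probability(q: float) -> float:
    """Probability that a base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Turn a fastq quality line into per-base Phred scores.

    Assumption: Phred+33 encoding (each character's ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality]


def trim_low_quality_tail(sequence: str, quality: str,
                          min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Naive 3'-end trimming: drop trailing bases whose score is below min_q.

    Real trimmers usually use sliding windows or running sums; this only
    illustrates the idea of quality-based trimming.
    """
    scores = decode_quality_string(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    # Reproduce the table: quality score versus error probability.
    for q in (5, 10, 20, 30):
        print(f"Q{q}: {phred_to_error_probability(q):.2%}")

    # Decode the example quality string from the slide.
    example = "CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,"
    scores = decode_quality_string(example)
    print("first bases:", scores[:5])
    print("last bases: ", scores[-5:])
    print(f"mean score:  {sum(scores) / len(scores):.1f}")
```

In the example string the scores start high and drop sharply towards the end of the read, which is the kind of pattern that quality control is meant to reveal early and that trimming is meant to address.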