RNA-seq+ - Analysis
Vojtěch Bystrý
16. December 2019

NGS experiments
• Static view of the cell: DNA sequencing, used to recognize differences from "normal"
• Dynamics of the cell: RNA-seq, ChIP-seq, ATAC-seq, ..., used to count elements
• Both are done with Next Generation Sequencing (NGS)

NGS data analysis workflow
• Experiment design
• Raw data (.fastq): de-multiplexing, QC
• Mapping (.bam): QC
• Downstream analysis: expression analysis, alternative splicing, peak analysis, single cell analysis, special cases, ...

UMI – unique molecular identifiers
• Each molecular fragment gets a unique n-base sequence (n ~ 8-12)
• Usage:
  • Marking duplicates

Raw data - QC
• FASTQ: the "q" stands for quality, stored per base as an encoded phred score
  Q = −10 · log10(P), where P is the probability that the base call is wrong

  Quality (Q)   Error probability (P)
  5             31.6%
  10            10%
  20            1%
  30            0.1%

• Very good for early problem detection
• Reasonable for trimming and read filtering
• RNA-seq: trim gently, keeping bases above phred score 5
• Example quality string:
  CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,

Alignment - QC
• Per-gene coverage
• Variability of per-gene mapping
• Gene count distribution
• rRNA content estimate
• Tissue expression check against GTEx
• QC example: MultiQC HTML report

Expression analysis - planning
• A three-way balance between:
  • Read depth (RD)
  • Number of biological replicates (BR)
  • Sensitivity to fold changes (number of genes detected)
[Figure: sensitivity rises from "not sensitive" (1 BR, low RD) to "very sensitive" (many BR, high RD)]
• Depth
  • Human: ~22,000 genes → minimum of 20 million mapped reads
  • Good: 25 million mapped reads
  • Count mapped reads, not raw reads!
  • rRNA removal
  • Size selection for sRNA
• Technical vs. biological replicates
  • Technical replicates only for testing a technique
  • Batch effect: sequence samples in randomized order
  • Strongly recommended minimum: 4 replicates

Expression analysis (1/2/20)
• Raw counts
• Result columns:
  • normCounts: normalized counts
  • rpkm: Reads Per Kilobase of transcript per Million mapped reads
  • fpkm: Fragments Per Kilobase of transcript per Million mapped reads
  • tpm: Transcripts Per Million; for every 1,000,000 RNA molecules in the RNA-seq sample, x came from this gene/transcript
  • log2FoldChange: log2 of the expression ratio between conditions
  • pvalue
  • padj: p-value adjusted for multiple testing
• Report example

Single cell analysis
• Cluster cells based on expression:
  • Cleaning/filtering step
  • Clustering
  • Dimension reduction: PCA, t-SNE
  • Visualization

Alternative splicing

Peak analysis

Thank you for your attention
Central European Institute of Technology
Masaryk University
Kamenice 753/5, 625 00 Brno, Czech Republic
www.ceitec.muni.cz | info@ceitec.muni.cz
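The UMI slide says UMIs are used to mark duplicates. The following is a minimal sketch of that idea. It assumes that reads sharing chromosome, 5' start, strand and exact UMI sequence are PCR copies of one molecule. The `AlignedRead` type and `dedup_by_umi` function are hypothetical stand-ins, not a BAM library API. Real tools such as UMI-tools also tolerate sequencing errors in the UMI, which this exact-match version does not.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal stand-in for an aligned read (not a real BAM API)."""
    name: str
    chrom: str
    start: int   # 5' alignment position
    strand: str  # "+" or "-"
    umi: str
    mapq: int


def dedup_by_umi(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chrom, start, strand, UMI), preferring the highest MAPQ.

    Reads sharing position AND UMI count as PCR copies of one molecule;
    reads at the same position with different UMIs are kept as separate molecules.
    """
    best: Dict[Tuple[str, int, str, str], AlignedRead] = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    return list(best.values())


if __name__ == "__main__":
    reads = [
        AlignedRead("r1", "chr1", 1000, "+", "ACGTACGT", 60),
        AlignedRead("r2", "chr1", 1000, "+", "ACGTACGT", 42),  # PCR duplicate of r1
        AlignedRead("r3", "chr1", 1000, "+", "TTGCAAGC", 60),  # same position, new molecule
        AlignedRead("r4", "chr2", 5000, "-", "ACGTACGT", 60),
    ]
    print(sorted(r.name for r in dedup_by_umi(reads)))  # ['r1', 'r3', 'r4']
```

Without UMIs, position-only deduplication would also collapse r3 into r1. That loss of real molecules is what the UMI avoids.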
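The phred relation Q = −10 · log10(P) from the Raw data - QC slide can be checked directly in code. This sketch assumes the Phred+33 (Sanger / Illumina 1.8+) ASCII offset, which the slide does not state. Under that assumption it reproduces the Q/P table and decodes the example quality string. The function names are illustrative only.

```python
def decode_phred33(quality: str) -> list:
    """Turn a FASTQ quality string into phred scores (assumes the Phred+33 offset)."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P): probability that the base call is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Reproduce the table on the slide: prints P=31.6%, 10.0%, 1.0%, 0.1%.
    for q in (5, 10, 20, 30):
        print(f"Q={q:>2}  P={error_probability(q):.1%}")

    # Decode the example quality string from the slide.
    example = "CFFFFEFFGCEEGECFGGGGAFF87@E:++6C<++3:,8,33,,:,,,:,,:,,,"
    scores = decode_phred33(example)
    print("first 5:", scores[:5], "last 5:", scores[-5:])
    print("bases with Q < 20:", sum(q < 20 for q in scores))
```

The decoded scores are high at the start of the read and low toward its end. That 3' quality drop is the typical pattern early QC is meant to catch.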
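The Raw data - QC slide calls phred scores reasonable for trimming and read filtering, with a gentle threshold of 5 for RNA-seq. The sketch below shows trailing-quality trimming followed by a minimum-length filter, similar in spirit to Trimmomatic's TRAILING and MINLEN steps. It assumes Phred+33 encoding. The `min_len=20` cutoff is a hypothetical value, not taken from the slides.

```python
from typing import Tuple


def trim_trailing(seq: str, qual: str, threshold: int = 5,
                  offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their phred score is below `threshold`.

    Assumes Phred+33 encoding; threshold=5 mirrors the gentle RNA-seq cutoff.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def passes_length_filter(seq: str, min_len: int = 20) -> bool:
    """Read filter: drop reads that are too short after trimming (hypothetical cutoff)."""
    return len(seq) >= min_len


if __name__ == "__main__":
    # 'I' encodes Q=40 and '#' encodes Q=2 under Phred+33.
    seq, qual = trim_trailing("ACGTACGTAC", "IIIIIII###")
    print(seq, qual, passes_length_filter(seq))  # ACGTACG IIIIIII False
```

Only the 3' end is trimmed here, so a single low-quality base in the middle of a read is kept. That matches the "gentle" intent of a low RNA-seq threshold.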
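The Expression analysis - planning slide lists batch effect and randomized sample order in sequencing. One way to do this is balanced (blocked) randomization. Samples of each condition are spread as evenly as possible across sequencing runs, and the order within each run is shuffled. This is a swapped-in illustration of the idea, not the author's procedure. The function name, `n_runs` and `seed` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def assign_to_runs(samples: Sequence[Tuple[str, str]], n_runs: int,
                   seed: int = 0) -> Dict[int, List[str]]:
    """Spread each condition's samples as evenly as possible over runs, in random order.

    samples: (sample_id, condition) pairs. Samples are dealt round-robin across
    runs so a condition is not confined to one run; order within a run is shuffled.
    """
    rng = random.Random(seed)
    by_condition: Dict[str, List[str]] = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    runs: Dict[int, List[str]] = {r: [] for r in range(n_runs)}
    slot = 0
    for condition in sorted(by_condition):
        ids = list(by_condition[condition])
        rng.shuffle(ids)  # random choice of which sample lands in which run
        for sample_id in ids:
            runs[slot % n_runs].append(sample_id)
            slot += 1
    for run in runs.values():
        rng.shuffle(run)  # random loading order within each run
    return runs


if __name__ == "__main__":
    design = ([(f"ctrl_{i}", "control") for i in range(1, 5)]
              + [(f"trt_{i}", "treated") for i in range(1, 5)])
    for run, ids in assign_to_runs(design, n_runs=2, seed=42).items():
        print(run, ids)
```

With the recommended 4 replicates per condition and two runs, each run receives two samples of each condition. A run effect then cannot masquerade as a condition effect.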
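The rpkm, fpkm and tpm columns on the Expression analysis results slide differ mainly in the order of normalization. RPKM/FPKM divide by library size first and then by gene length. TPM divides by gene length first and then rescales so every sample sums to 1,000,000, which is why the TPM slide can read "for every 1,000,000 RNA molecules". The sketch below is illustrative, with made-up gene names and counts. It assumes total mapped reads equal the sum of the gene counts unless given. For FPKM, pass fragment (read-pair) counts instead of read counts.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads Per Kilobase of transcript per Million mapped reads.

    For FPKM, pass fragment counts instead of read counts. If total_mapped is
    omitted, the sum of the gene counts is used (a simplifying assumption).
    """
    if total_mapped is None:
        total_mapped = sum(counts.values())
    # count / (length in kb) / (total in millions) == count * 1e9 / (length * total)
    return {g: c * 1e9 / (lengths_bp[g] * total_mapped) for g, c in counts.items()}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts Per Million: length-normalize first, then scale to sum to 1e6."""
    rates = {g: c / (lengths_bp[g] / 1e3) for g, c in counts.items()}
    scale = sum(rates.values())
    return {g: r / scale * 1e6 for g, r in rates.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 100, "geneC": 800}
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 4000}
    print(rpkm(counts, lengths))
    t = tpm(counts, lengths)
    print({g: round(v, 1) for g, v in t.items()})
    print(round(sum(t.values())))  # 1000000
```

geneA and geneB receive the same number of reads, yet geneA gets twice the RPKM and TPM because it is half as long. TPM values always sum to one million per sample, whereas RPKM sums vary between samples.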
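The padj column is described only as the p-value adjusted for multiple testing, and the slide does not name the method. As an assumption, this sketch uses the Benjamini-Hochberg (FDR) adjustment, the default padj method in DESeq2. It is the same definition as R's `p.adjust(method = "BH")`. In Python, `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` provides the same correction. The function name below is illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of ascending rank k among m tests:
    padj = min over j >= k of (m * p_(j) / j), capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, keeping a running minimum
    # so the adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    print([round(x, 4) for x in benjamini_hochberg(pvals)])  # [0.02, 0.04, 0.04, 0.02]
```

With ~22,000 genes tested at once, an unadjusted cutoff of 0.05 would be expected to flag on the order of a thousand genes by chance alone. This is why the padj column, not pvalue, is normally used to call differentially expressed genes.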