Next-gen sequencing
Roman Hobza
CG920 Genomics, Lecture 7

Sequencing of genomes
GenBank originated in 1982 from the Los Alamos Sequence Database (Walter Goad).

Why do we need sequencing?
• Comparative genomics
• Biomedical research
• Personal genomes

Frederick Sanger
• 1958 – Nobel Prize – protein sequencing
• 1977 – dideoxy (chain-termination) sequencing method
• 1977 – Φ-X174 (5,386 bp)
• 1980 – Nobel Prize – DNA sequencing
• Phage λ – shotgun method (48,502 bp)

Genome sequencing
• 1986 Leroy Hood: automated sequencer
• 1986 Human Genome Initiative
• 1990 HUGO
• 1995 John Craig Venter – the first bacterial genome
• 1996 the first eukaryotic genome (yeast)

Craig Venter
• Global Ocean Sampling Expedition
• Synthetic genomics
• Human Longevity, Inc.
http://www.youtube.com/watch?v=J0rDFbrhjtI

Genome sequencing
• 1997 E. coli genome sequence
• 1998 Caenorhabditis elegans genome (the first multicellular genome)
• 1999 human chromosome 22
• 2000 Drosophila melanogaster genome
• 2001 human genome: draft sequence
• December 2002 mouse draft genome
• April 2004 rat draft genome

2010 – the "perfect" human genome

Race for sequences
The Race for the $1000 Genome. Science 311: 1544–1546, 2006
• Human genome (first draft) – $300 million (2001)
• Rhesus macaque – $22 million (2006)

Complexity reduction
• Hi-Cot selection
• Methylation filtration (MF): E. coli McrBC (restriction of 5mC-containing DNA)
• Chromosome sorting
  [Figure: chromosome flow sorting – chromosomes in suspension, flow chamber with sheath fluid, laser excitation, scattered light and fluorescence emission, deflection plates, left/right collectors and waste; flow karyotype plotted as number of events vs. relative fluorescence intensity]
• Laser microdissection – advantage: purity; disadvantage: small amount of material

Methods

454 pyrosequencing (2005) – Genome Sequencer 20 System
http://www.454.com
• DNA library preparation: DNA fragmentation, adaptor ligation
• DNA capture (onto beads), denaturation
• Emulsion PCR (emPCR)
• Bead capture, denaturation, annealing of the sequencing primer
• Dispersion of beads into microwells (parameters of the microreactors)
• Sequencing

SOLiD (Sequencing by Oligonucleotide Ligation and Detection) – 2-base encoding sequencing (2007)

Solexa (2007)

Helicos (2008) – True Single Molecule Sequencing (tSMS)

Single Molecule Real-Time (SMRT) sequencing – Pacific Biosciences (20-zeptoliter observation volume)

Ion Torrent

Oxford Nanopore

CHALLENGES IN GENOME SEQUENCING
De novo genome assemblies built only from short NGS reads are generally incomplete and highly fragmented because of:
• large duplications
• a high proportion of repetitive DNA (chromosomal approach, BAC-by-BAC sequencing – a challenge!)
• large genome size (~17 Gb)
• polyploidy (3 subgenomes)
Chromosomal approach

BAC-BY-BAC SEQUENCING
• The physical map is composed of contigs of overlapping BAC clones.
• BAC contigs are anchored to the chromosome through markers contained in the contigs.

SOLUTIONS FOR THE REPEATS
• Long mate-pair reads (> 10 kb)
• Long-read technologies – PacBio, Oxford Nanopore
• Optical mapping
  • single-molecule mapping of genomic DNA molecules hundreds of kilobases to several megabases long
  • creates sequence-motif maps, which provide a long-range template for ordering genomic sequences
  • visualisation of reality: "Seeing is believing"

OPTICAL MAPPING – three enzymatic approaches
• Restriction enzymes: sequence-specific cleavage of DNA immobilized on a surface
• Nicking enzymes: fluorescent labelling of the nicking sites in solution (BioNano Genomics – Irys); nicking → strand displacement → incorporation of fluorescent nucleotides
• Methyltransferases: labelling at ultra-high density

BIONANO GENOME MAPPING ON NANOCHANNEL ARRAYS (Lam et al., Nat. Biotechnol. 30(8), 2012)
1. Sequence-specific labelling with a nickase (Nt.BspQI)
2. DNA linearization
3. Fluorescence imaging
4. Map construction
5. Building the consensus map
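The 454 pyrosequencing step listed under Methods can be pictured as a flowgram. Nucleotides are flowed one at a time in a fixed cyclic order. The light signal of each flow is roughly proportional to the number of bases incorporated, so a homopolymer appears as one larger signal rather than as several cycles. The minimal sketch below assumes the flow order `TACG`, a noise-free signal model and hypothetical function names. It is a teaching model, not the instrument's software.

```python
# Illustrative sketch: an ideal 454-style flowgram (assumed flow order TACG).
FLOW_ORDER = "TACG"  # assumption: cyclic nucleotide flow order


def flowgram(read: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Number of bases incorporated in each flow for the synthesized read (noise-free)."""
    if set(read) - set(flow_order):
        raise ValueError("read may only contain bases present in flow_order")
    signals = []
    i = 0
    flow = 0
    while i < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # every consecutive copy of the flowed base is incorporated in this one flow
        while i < len(read) and read[i] == base:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Decode a (possibly noisy) flowgram by rounding each signal to a run length."""
    return "".join(
        flow_order[f % len(flow_order)] * round(s) for f, s in enumerate(signals)
    )


if __name__ == "__main__":
    print(flowgram("TTAGGGC"))  # [2, 1, 0, 3, 0, 0, 1]
    print(call_bases([1.9, 1.1, 0.1, 2.8, 0.2, 0.1, 0.9]))  # TTAGGGC
```

A homopolymer of length n is read from a single signal. Misjudging a large signal therefore turns directly into an insertion or deletion, which is the characteristic error mode of pyrosequencing.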
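SOLiD's 2-base encoding, mentioned under Methods, can be sketched as follows. Each color reports a pair of adjacent bases. With the common convention A=0, C=1, G=2, T=3, the color equals the XOR of the two base codes. A read is decoded starting from a known first base, for example the last base of the adapter. The sketch shows two consequences:

* A true SNP changes two adjacent colors.
* A single miscalled color corrupts every downstream base when the read is decoded naively.

The function names are hypothetical, and this is a simplified model rather than the vendor's pipeline.

```python
# Illustrative sketch of SOLiD 2-base (color-space) encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list[int]:
    """One color (0-3) per pair of adjacent bases: color = code(b1) XOR code(b2)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate colors back into bases, starting from a known first base."""
    bases = [first_base]
    for c in colors:
        bases.append(BASES[CODE[bases[-1]] ^ c])
    return "".join(bases)


def changed_positions(a: list[int], b: list[int]) -> list[int]:
    """Indices at which two equal-length color strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    ref = encode("ATGGCA")  # [3, 1, 0, 3, 1]
    snp = encode("ATGACA")  # G->A substitution at base position 3
    print(changed_positions(ref, snp))  # [2, 3]: a SNP changes two adjacent colors
    bad = ref.copy()
    bad[2] = 2  # one miscalled color
    print(decode("A", bad))  # ATGATG: every base after the error is wrong
```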
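A toy model makes the repeat problem from the challenges slide concrete. Suppose two genomes differ only in the order of the segments lying between copies of a repeat R. Then every error-free read of length up to len(R) + 1 occurs equally often in both genomes, and the reads alone cannot tell them apart. Only reads that span a whole repeat copy plus one flanking base on each side separate the two arrangements. Long reads, long mate pairs and optical maps each supply that long-range information in different ways.

This is a sketch under simplifying assumptions:

* The sequences are invented for illustration.
* "Reads" are modelled as all substrings of a given length, i.e. perfect coverage and no errors.
* The function names are hypothetical.

```python
from collections import Counter


def read_spectrum(genome: str, read_len: int) -> Counter:
    """All error-free reads of length read_len (one per start position), with counts."""
    return Counter(genome[i:i + read_len] for i in range(len(genome) - read_len + 1))


def indistinguishable_up_to(a: str, b: str, max_len: int) -> int:
    """Largest read length <= max_len for which a and b give identical read sets."""
    best = 0
    for k in range(1, max_len + 1):
        if read_spectrum(a, k) != read_spectrum(b, k):
            break
        best = k
    return best


# Hypothetical toy genomes: three copies of repeat R, with segments Y and Z swapped.
R = "GATTACA"
X, Y, Z, W = "C" * 10, "T" * 10, "G" * 10, "A" * 10
genome_a = X + R + Y + R + Z + R + W
genome_b = X + R + Z + R + Y + R + W

if __name__ == "__main__":
    print(indistinguishable_up_to(genome_a, genome_b, 30))  # 8, i.e. len(R) + 1
```

With 9-base reads, genome_a yields "TGATTACAG" (end of Y, the repeat, start of Z), which genome_b never produces. A read of that length is therefore enough to place Y before Z.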
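The nicking-enzyme labelling used for BioNano maps can be approximated in silico. The sketch finds every occurrence of the nickase recognition motif on either strand and records the distances between consecutive labels. That interval pattern is the kind of sequence-motif map that consensus optical maps are compared against.

Assumptions and simplifications:

* The motif `GCTCTTC` for Nt.BspQI comes from general knowledge and should be checked against the enzyme supplier's documentation.
* Each label is placed at the motif start rather than at the exact nick position.
* All function names are hypothetical.

```python
import re

MOTIF = "GCTCTTC"  # assumed Nt.BspQI recognition sequence (verify before use)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def label_positions(seq: str, motif: str = MOTIF) -> list[int]:
    """0-based start positions of the motif on the forward or reverse strand."""
    seq = seq.upper()
    sites = set()
    for pattern in {motif, revcomp(motif)}:  # a set also handles palindromic motifs
        # the zero-width lookahead also reports overlapping occurrences
        for hit in re.finditer(f"(?={re.escape(pattern)})", seq):
            sites.add(hit.start())
    return sorted(sites)


def in_silico_map(seq: str, motif: str = MOTIF) -> list[int]:
    """Distances (bp) between consecutive labels: a simple sequence-motif map."""
    pos = label_positions(seq, motif)
    return [b - a for a, b in zip(pos, pos[1:])]


if __name__ == "__main__":
    demo = "AA" + MOTIF + "A" * 5 + revcomp(MOTIF) + "T" * 10 + MOTIF + "A"
    print(label_positions(demo))  # [2, 14, 31]
    print(in_silico_map(demo))  # [12, 17]
```

Real optical maps work with label positions measured on molecules hundreds of kilobases long. An alignment tool therefore has to tolerate sizing error and missing or extra labels, which this exact-match sketch ignores.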