SYLICA DNA Sequencing – Bowater Feb 2013

SYLICA 2013 Bowater lectures
Contemporary DNA Sequencing Technologies

Bowater Lectures in Brno, Feb. 2013
4 lectures on linked topics will be delivered during the coming week:
•Contemporary DNA Sequencing Technologies – 26/2/2013 @ 10:00
•Using ‘Omic Technologies to Investigate Gene Function – 26/2/2013 @ 14:00
•Biophysical Methods to Study Molecular Interactions – 27/2/2013 @ 10:00
•Synthetic Biology & Nanotechnology: Tomorrow’s Molecular Biology? – 28/2/2013 @ 10:00

DNA STRUCTURE

Bases
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 5th edn, 2008, p. 272

DNA
DNA is a double-stranded, helical molecule

Alternative DNA Helices
•The DNA helix may take different forms: A- and B-helices are right-handed; Z-DNA is left-handed.
[Figure labels: MAJOR GROOVE, MINOR GROOVE]
The typical (average) structure of a DNA molecule is in the B-form.

dsDNA has an Anti-Parallel Structure
The two strands of dsDNA have an anti-parallel polarity
[Figure labels: 5’ and 3’ strand ends]

Genomes
Genome of E. coli codes for 4,500 genes in 4.6 Mbp
Human genome codes for ~30,000 genes in 3,000 Mbp

Cellular DNA
•“Coding” DNA is the name given to genes that are transcribed & translated to make protein
•Eukaryotic genomes contain large amounts of non-coding DNA
•The length of DNA inside cells is extremely large relative to cell size

The Length of DNA in Cells is Very Large!
•The length of DNA inside cells is extremely large relative to cell size
E. coli – lysed to show chromosomal DNA
For related discussion, see also: Nelson & Cox, “Lehninger, Principles of Biochemistry”, 5th edn, 2008, p.
950

Coding & Non-coding DNA

Organism                No. of genes   Total size of DNA (Mbp)   % of genome as coding DNA*
E. coli                 4,500          4.6                       98
Yeast (S. cerevisiae)   5,885          12                        49
Human                   30,000 (?)     3,000                     1
* Assuming average gene size ~ 1,000 bp

Chromosomal DNA
•Chromosomes are complexes of protein & DNA
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 5th edn, 2008, p. 951

DNA Compaction within Cells
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 5th edn, 2008, p. 968

DNA Structure: Overview
•Inside cells, the structure of DNA is dynamic: usually in B-form helix, but can exist in different structures/conformations
•DNA molecules can be linear or circular
•Most genomes have significant amounts of non-coding DNA
•DNA molecules in cells are very long – therefore the helix has further levels of well-organised structure that allow it to be contained and used in cells

Genomics & Technology
•Molecular biology: major scientific discipline for past ~50 years
•Genomics: became important science during the 1990s
•Transcriptomics/Proteomics: developed during past 5 years
•Bioinformatics: has developed as major branch of science - enables efficient analysis of data from “omics” experiments

A Primer About DNA Sequencing
•Major advance in DNA sequencing occurred with use of DNA polymerases
Synthesis requires template strand, “primer” & dNTPs
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 4th edn, 2004, p.
297
ddNTPs interrupt DNA synthesis

A Primer About DNA Sequencing
•DNA to be sequenced acts as template
•Oligonucleotide primer allows sequencing to start at any point
•Small amounts of “labelled” ddNTP
•Identification of specific bases after electrophoresis
•<500 bases per day
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 4th edn, 2004, p. 297

Improved DNA Sequencing
•“Labelled” DNAs separated by capillary electrophoresis
•DNA sequence read as series of colours
•Computer deciphers sequence
•>2,000 bases per day
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 4th edn, 2004, p. 298

Genomic Sequencing
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 4th edn, 2004, p. 324
•Sequencing centres have hundreds of machines working continuously
•Each can generate equivalent of human genome sequence each month

The Human Genome Project
•Sequencing of the human genome allows for:
–Identification and categorization of different haplotypes
–Understanding the differences between humans and chimpanzees
  •Based on phylogenetic trees and comparison of differences
  •Especially in regulatory sequences, which may be more important to evolution than protein changes
–Identification of genes involved in disease
–Tracking the path of human migration

Human genome contains many different sequence types
FIGURE 9–29a A snapshot of the human genome. (a) This pie chart shows the proportions of various types of sequences in our genome. Transposons, including SINEs and LINEs, are described in Chapters 25 and 26.

Human genome contains many different protein types
FIGURE 9–29b A snapshot of the human genome.
(b) The approximately 25,000 protein-coding genes in the human genome can be classified by the type of protein encoded.

New Generation of DNA Sequence Analysis
•Full genome is immobilized on a chip in fragments a few hundred bases long
–All sequenced at once, allowing for faster detection
•Pyrosequencing
–DNA synthesized from the template a single nucleotide at a time, each generating a pulse of light
–Can read 400–500 nucleotides in the sequence
•Reversible terminator sequencing
–Fluorescently labeled terminator nucleotide is added to the sequence and detected
–Blocking group and fluorescent label are removed, sequence extended, and next nucleotide is detected

Pyrosequencing
FIGURE 9–25 Next-generation pyrosequencing. (a) Pyrosequencing uses the enzymes sulfurylase (see Fig. 22–15) and luciferase (see Box 13–1) to detect nucleotide addition with flashes of light. The plot shows the light intensities observed during successive sequencing cycles for a DNA segment immobilized at a particular spot of a picotiter plate and (at top) the DNA nucleotide sequence derived from them. (b) An image of a very small part of one cycle of a 454 sequencing run. Each individual segment of DNA to be sequenced is attached to a tiny DNA capture bead, then amplified on the bead by PCR. Each bead is immersed in an emulsion and placed in a tiny (~29 μm) well on a picotiter plate. The reaction of luciferin and ATP with luciferase produces light flashes when a nucleotide is added to a particular DNA cluster in a particular well. Circles represent the same cluster over multiple cycles. In this case, reading the top (or bottom) circle from left to right across each row gives the sequence for that cluster.

Reversible Terminator Sequencing
Nelson & Cox, “Lehninger, Principles of Biochemistry”, 6th edn, Fig. 9.26
FIGURE 9–26 Next-generation reversible terminator sequencing.
(a) The reversible terminator method of sequencing uses fluorescent tags to identify nucleotides. Blocking groups on each fluorescently labeled nucleotide prevent multiple nucleotides from being added per cycle. (b) Six successive cycles from one very small part of an Illumina sequencing run. Each colored spot represents the location of an immobilized DNA oligonucleotide affixed to the surface of the flow cell. The circled clusters represent the same spot on the surface over successive cycles and give the sequences indicated. Data are automatically recorded and analyzed digitally. (c) Typical flow cell used for a next-generation sequencer. Millions of DNA fragments can be sequenced simultaneously in each of the eight channels.

High-throughput Sequencing Technologies
•Recently, several new technologies have increased the throughput and reduced the cost for genome sequencing
•Examples are:
–454 Sequencing
–Illumina method
•Animations illustrating these methods available at:
www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/index.htm

DNA Sequencing: Overview
•Production of pure DNA polymerases made it feasible to consider sequencing of genomes
•Sequencing of large genomes became possible with:
–Increased sensitivity of nucleic acid detection
–Automated robotic technology
–Improved computer power
•Further advances are increasing speed, reducing size and cost – suggesting it will soon be possible to sequence individual human genomes for $1,000

Polymerase Chain Reaction (PCR)
•Used to amplify DNA in the test tube
–Can amplify regions of interest (genes) within DNA
–Can amplify complete circular plasmids
•Mix together
–Target DNA
–Primers (oligonucleotides complementary to target)
–Nucleotides: dATP, dCTP, dGTP, dTTP
–Thermostable DNA polymerase
•Place the mixture into thermocycler
–Melt DNA at ~95°C
–Cool to ~50–60°C, primers anneal to target
–Polymerase extends
primers in 5’→3’ direction
–After a round of elongation is done, repeat steps

General Steps of PCR
FIGURE 9–12a (parts 1 and 2) Amplification of a DNA segment by the polymerase chain reaction (PCR). (a) The PCR procedure has three steps. DNA strands are (1) separated by heating, then (2) annealed to an excess of short synthetic DNA primers (orange) that flank the region to be amplified (dark blue); (3) new DNA is synthesized by polymerization catalyzed by DNA polymerase. The three steps are repeated for 25 or 30 cycles. The thermostable Taq DNA polymerase (from Thermus aquaticus, a bacterial species that grows in hot springs) is not denatured by the heating steps.
•Repeat steps 1–3 many times
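Each round of melting, annealing and extension can at best double the target, which is why 25 or 30 cycles are enough to produce a large amount of product. The sketch below computes that idealized copy number. It is a simplification, not a model of a real reaction: the function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions, and the plateau that real reactions reach is ignored.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    Assumption: every cycle multiplies the target by (1 + efficiency).
    efficiency = 1.0 is perfect doubling; real reactions are less efficient
    and eventually plateau, which this sketch does not model.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Copies obtained from a single template after 25 and 30 ideal cycles.
for n in (25, 30):
    print(n, "cycles:", pcr_copies(1, n), "copies")
```

With perfect doubling, a single template gives 2^30 (about 10^9) copies after 30 cycles; lowering `efficiency` shows how quickly the yield falls when copying is incomplete.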
DNA Fingerprinting
•Humans have short sequences that repeat next to each other (short tandem repeats, STRs)
•Differences in the number of repeats cause varying fragment lengths when a sample is subjected to PCR using primers specific for that region
•Fragment sizes are determined by using a capillary gel
•Multiple STR locations exist in the human genome
•Allows matching of “suspect” samples to known individuals
•13 well-studied locations are used in identifications
–Based on the number of alleles at each location, the chance of misidentification is <1 in 10^18 (with good data)

DNA Genotyping
BOX 9-1 FIGURE 1 (a) STR loci can be analyzed by PCR. Suitable PCR primers (with an attached dye to aid in subsequent detection) are targeted to sequences on either side of the STR, and the region between them is amplified. Individuals usually have two different alleles at a particular locus (one inherited from each parent). If the STR sequences have different lengths on the two chromosomes of an individual, two PCR products of different lengths will result. (b) The PCR products from amplification of 16 STR loci run on a single capillary acrylamide gel are shown. Particular loci are targeted with primers labeled with only one of three different fluorescent dyes (six loci with a green dye, five with a blue dye, and five with a yellow dye, plotted here with black ink for visibility). Determination of which locus corresponds to which signal depends on the color of the fluorescent dye attached to the primers used in the process and on the size range in which the signal appears (the size range can be controlled by which sequences, closer to or more distant from the STR, are targeted by the designed PCR primers).
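The arithmetic behind STR genotyping can be sketched in two steps: converting a PCR fragment length into a repeat count, and multiplying per-locus genotype frequencies to estimate how likely a chance match is. The sketch below is illustrative only. The function names, the flank and repeat-unit lengths, and all frequencies are hypothetical, and the use of Hardy-Weinberg proportions with independent loci is an assumption that is not stated in the lecture.

```python
from math import prod
from typing import Iterable, Optional


def repeat_count(fragment_length: int, flank_length: int, unit_length: int) -> int:
    """Number of tandem repeats in a PCR product.

    flank_length is the total non-repeat sequence between the primer ends
    (hypothetical value); unit_length is the size of one repeat unit.
    """
    repeats, remainder = divmod(fragment_length - flank_length, unit_length)
    if repeats < 0 or remainder != 0:
        raise ValueError("fragment is not a whole number of repeat units")
    return repeats


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Genotype frequency under assumed Hardy-Weinberg proportions.

    One allele frequency -> homozygote (p^2); two -> heterozygote (2pq).
    """
    return p * p if q is None else 2.0 * p * q


def random_match_probability(genotype_frequencies: Iterable[float]) -> float:
    """Product of per-locus genotype frequencies (loci assumed independent)."""
    return prod(genotype_frequencies)


# Hypothetical locus: 100 bp of flanking sequence, 4 bp repeat unit.
print(repeat_count(140, 100, 4))

# Hypothetical profile: 13 loci, each heterozygous with made-up allele
# frequencies of 0.1 and 0.2.
profile = [genotype_frequency(0.1, 0.2) for _ in range(13)]
print(random_match_probability(profile))
```

Because the per-locus frequencies are multiplied, even moderately common genotypes at 13 loci give a very small combined probability, which is the reasoning behind the "<1 in 10^18" figure quoted above.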
Adaptations to PCR
•Reverse Transcriptase PCR (RT-PCR)
–Used to amplify RNA sequences
–First step uses reverse transcriptase to convert RNA to DNA
•Quantitative PCR (Q-PCR)
–Used to show quantitative differences in gene levels

qPCR
FIGURE 9–13 Quantitative PCR. PCR can be used quantitatively, by carefully monitoring the progress of a PCR amplification and determining when a DNA segment has been amplified to a specific threshold level. (a) The amount of PCR product present is determined by measuring the level of a fluorescent probe attached to a reporter oligonucleotide complementary to the DNA segment that is being amplified. Probe fluorescence is initially not detectable due to a fluorescence quencher attached to the same oligonucleotide. When the reporter oligonucleotide pairs with its complement in a copy of the amplified DNA segment, the fluorophore is separated from the quenching molecule and fluorescence results. (b) As the PCR reaction proceeds, the amount of the targeted DNA segment increases exponentially, and the fluorescent signal also increases exponentially as the oligonucleotide probes anneal to the amplified segments. After many PCR cycles, the signal reaches a plateau as one or more reaction components become exhausted. When a segment is present in greater amounts in one sample than another, its amplification reaches a defined threshold level earlier. The “No template” line follows the slow increase in background signal observed in a control that does not include added sample DNA. CT is the cycle number at which the threshold is first surpassed.
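The relationship between CT and starting amount can be illustrated with a short calculation. The sketch below finds the first cycle at which a fluorescence trace surpasses a threshold, and converts a difference in CT between two samples into a fold difference in starting template. The function names and the example trace are hypothetical, and the conversion assumes the same amplification efficiency in both samples (perfect doubling by default), which real assays need to verify.

```python
from typing import Optional, Sequence


def threshold_cycle(signal: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based cycle number at which the signal first exceeds
    the threshold, or None if it never does (e.g. a no-template control)."""
    for cycle, value in enumerate(signal, start=1):
        if value > threshold:
            return cycle
    return None


def fold_difference(ct_sample: float, ct_reference: float,
                    efficiency: float = 1.0) -> float:
    """Starting amount of the sample relative to the reference.

    Assumption: both reactions amplify by (1 + efficiency) per cycle, so a
    sample that crosses the threshold n cycles earlier started with
    (1 + efficiency)**n times more template.
    """
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


# Hypothetical fluorescence trace that doubles each cycle.
trace = [0.1, 0.2, 0.4, 0.8, 1.6]
print(threshold_cycle(trace, 0.5))

# A sample crossing the threshold 3 cycles before the reference.
print(fold_difference(ct_sample=20, ct_reference=23))
```

Under the doubling assumption, reaching the threshold three cycles earlier corresponds to eight times more starting template, which is the quantitative content of "reaches a defined threshold level earlier".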