www.BioTechniques.com 105 Vol. 48 | No. 2 | 2010

In the late 1990s, scientists J. Craig Venter and Francis Collins became household names during the epic race to sequence the human genome. Eventually that race ended in a tie in 2000, but similar levels of fame and publicity, not to mention millions in potential revenue, await the company or individual that wins sequencing's latest race: to develop and implement a so-called third-generation sequencing system.

The sequencing community turned its sights toward the promise of third-generation sequencing instruments less than five years ago, when a number of companies started promising single-molecule sequencing instruments that could provide faster, cheaper genome sequencing. These instruments would finally enable the burgeoning field of personalized medicine and take new fields of study such as transcriptomics to the next level. So strong is the interest in new sequencing technologies that the National Institutes of Health (NIH) is supporting the race toward third-generation technologies through its $1000 Genome Initiative, which provides funding to companies and individuals developing innovative solutions aimed at rapid, efficient DNA sequencing. Given the frenzied pace of development and the millions of dollars in financial support, many suspect that it is only a matter of time before someone crosses the finish line, commercializing a third-generation sequencing system that will finally give researchers a full human genome sequence for less than $1000.

Slow, long reads: the starting line

Frederick Sanger was awarded part of the 1980 Nobel Prize in Chemistry for his development of a method to sequence DNA. His approach uses gel or capillary electrophoresis to separate DNA fragments of differing lengths that have labeled dideoxynucleotides incorporated at the ends. By identifying each labeled nucleotide in the resulting DNA ladder, the ordered sequence of any DNA fragment can be determined.
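The ladder-reading step described above can be illustrated with a short sketch. This is not the software used by any actual sequencer; it is a toy model that assumes each fragment has already been reduced to a pair of (length, terminal base), and the function name `read_ladder` is hypothetical.

```python
from typing import List, Tuple


def read_ladder(fragments: List[Tuple[int, str]]) -> str:
    """Read a sequence from a toy Sanger ladder.

    Each fragment is (length, labeled terminal base). Sorting by length
    mimics electrophoresis; reading the terminal labels in that order
    gives the base at each successive position.
    """
    ordered = sorted(fragments)  # shortest fragment first
    # Every position must be represented exactly once, otherwise the
    # ladder has a gap or an ambiguous rung and cannot be read.
    for (prev_len, _), (next_len, _) in zip(ordered, ordered[1:]):
        if next_len != prev_len + 1:
            raise ValueError(f"ladder is not contiguous at length {next_len}")
    return "".join(base for _, base in ordered)


# Fragments arrive in no particular order, as they would before separation.
ladder = [(3, "T"), (1, "G"), (4, "C"), (2, "A")]
print(read_ladder(ladder))  # GATC
```

Real traces are far noisier than this, which is one reason the computational assembly tools mentioned later in the article matter.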
After nearly 30 years, Sanger sequencing, as it has come to be known, is still used by many researchers for its simplicity and effectiveness, along with its ability to accurately determine the sequence of long stretches of DNA. A few years after Sanger won his Nobel Prize, developers at Applied Biosystems launched an automated DNA sequencer based on his method, which used fluorescently labeled dideoxynucleotides. Sanger's sequencing-by-synthesis approach to DNA sequencing, along with the newly developed automated instruments, would serve as the enabling technology for the Human Genome Project. Although the method has proved durable, reliable, and accurate, it suffers from being slow, expensive, and relatively low-throughput. These are key reasons why Collins' government-funded researchers required $3 billion over 13 years to generate their draft map of the human genome.

Recent advances in the Sanger methodology and associated computational assembly tools, along with the availability of the draft human genome sequence, enabled the assembly of another human genome sequence in 2007 using the dideoxy approach. This time, though, the genome was that of a single individual: J. Craig Venter (1).

Despite the obvious success of the Human Genome Project and the other recent whole-genome sequencing efforts, Sanger sequencing is slowly becoming outdated. Costs remain high and throughput is not high enough to support the growing interest in personalized medicine and the desire to uncover the genetic basis of human disease. Since these efforts could require tens of thousands of individual genome sequences, geneticists have been searching for alternatives.

Special News Feature: Sequencing's new race
Next-generation sequencing has pushed forward the boundaries of genetic research and enabled the completion of a rapidly growing number of whole-genome sequencing projects. But the impending arrival of third-generation DNA sequencing technology could change the landscape yet again.
Illustration credit: Tavares Jones

[Figure: Applied Biosystems' SOLiD sequencing system was commercialized in 2008, vastly improving upon their Sanger method-based automated sequencing system. Courtesy of Applied Biosystems.]

Claire Wade, a professor of veterinary science at the University of Sydney in Australia, recently completed the sequencing of the horse genome using Sanger methodology (2). "The horse was one of the last sequencing projects to be fully conducted using Sanger sequencing," says Wade. "It represents the peak of our understanding of mammalian assembly on that platform."

The next generation: ready to go

As it turns out, the desire to improve upon the Sanger approach and related instrumentation over the last decade has led to the development of a new generation of sequencing platforms that are currently used in labs around the world. But even these newer and faster "next-generation" platforms might merely be holding scientists over until the arrival of third-generation technologies.

In 2005, Roche 454 Life Sciences became the first company to commercialize a next-generation sequencing platform (3). Their core technology sequences genomes by attaching DNA fragments to beads, which are then amplified, enriched, and loaded onto a multi-well plate for sequencing so that each well contains a single bead. Nucleotides are then flowed across the plate wells. When a nucleotide complementary to the template strand is incorporated, a chemiluminescence reaction produces a signal that is imaged and recorded. By performing the nucleotide flow step repeatedly with each nucleotide, the current GS-FLX system can generate 400 million raw bases of data per 10-hour instrument run, from sequence read lengths in excess of 400 bases.

Illumina's Genome Analyzer (GA) IIx system was introduced in 2006 and takes a slightly different route.
The GA generates millions of clone clusters on a solid support, which are then sequenced using reversible dideoxy terminator-based sequencing chemistry, which is akin to the Sanger methodology. Read lengths using Illumina's system are shorter than those from Roche's GS-FLX (averaging 100 bp), but the number of sequence reads per run is dramatically greater: the GA IIx generates 300 million reads per flow cell, compared with the GS-FLX's 1 million per run.

Applied Biosystems developed the SOLiD system in 2008, which combines elements of various approaches to sequence clonally amplified DNA fragments linked to beads. While the amplification and attachment to beads are similar to the Roche and Illumina platforms, Applied Biosystems' platform relies on a unique sequencing-by-ligation approach using dye-labeled oligonucleotides, rather than sequencing-by-synthesis. The method provides two-base redundancy in sequencing reads, which the company says results in higher accuracy. Similar to Illumina's platform, the latest SOLiD 3 system generates large numbers of shorter reads: the current SOLiD system can generate more than 60 gigabases of raw data from more than one billion sequence tags per instrument run.

While shorter read lengths can cause problems with de novo genome assembly, the increased number of reads per instrument run, along with new computational tools designed specifically for next-generation sequencing analysis, has made it possible to decode whole genomes rapidly on these platforms at a significantly reduced cost. In August 2009, Illumina announced the mapping of a single human genome at a cost of $48,000.
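The trade-off between read length and read count quoted above is easier to see as arithmetic. The sketch below multiplies the per-run figures given in this article; the genome size of roughly 3 billion bases is an assumption not stated in the article, the function names are hypothetical, and raw yield ignores read quality, duplicates, and mapping losses.

```python
# Assumption (not from the article): a haploid human genome of ~3 billion bases.
HUMAN_GENOME_BP = 3_000_000_000


def raw_yield_bp(reads: int, read_length_bp: int) -> int:
    """Raw bases per run = number of reads x average read length."""
    return reads * read_length_bp


def fold_coverage(yield_bp: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """How many times the raw yield would cover the genome, on average."""
    return yield_bp / genome_bp


# (reads per run, approximate read length in bases), as quoted in the text.
platforms = {
    "Roche GS-FLX": (1_000_000, 400),
    "Illumina GA IIx": (300_000_000, 100),
}

for name, (reads, length) in platforms.items():
    total = raw_yield_bp(reads, length)
    print(f"{name}: {total / 1e9:.1f} Gb raw, "
          f"{fold_coverage(total):.2f}x human genome")
```

Under these assumptions, one GS-FLX run covers about an eighth of a human genome, while one GA IIx flow cell yields roughly tenfold coverage. That gap helps explain why short-read platforms made whole-genome projects affordable despite their harder assembly problem.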
With sequencing projects that once took more than a decade and billions of dollars now being done in just over 2 weeks for a little less than $50,000, genomics researchers are applying next-generation instrumentation to projects that 7 years ago seemed impossible. Indeed, the NIH-funded 1000 Genomes Project, launched in 2008, involves labs around the globe and seeks to create an improved map of the variation found in human DNA by sequencing genomes from 1200 individuals. Even more ambitious is the Genome 10K project, an international effort to sequence 10,000 vertebrate species, or approximately one genome for every vertebrate genus.

The high throughput and digital nature of next-generation sequencing technologies have enabled their application within emerging fields like transcriptomics, where it is important to quantify numbers of low-abundance messenger RNA molecules and understand how they regulate protein expression. But in the end, according to Wade, it might be the developing "third-generation" sequencing methods that have the potential to reveal even more about the composition and dynamics of complex mammalian genomes. "There needs to be sufficient read length to span longer repetitive sequences, and this represents a challenge with short read length massively parallel methods," she explains.

[Figure: In 2005, Roche 454 Life Sciences' sequencing-by-synthesis instrument was the first next-generation platform to be commercialized. Courtesy of Roche 454 Life Sciences.]

[Figure: Roche 454 Life Sciences' approach to sequencing relies on attaching DNA fragments to beads, which are amplified and placed into wells on a microwell plate. Nucleotides are flowed over the well, and addition of a specific nucleotide to the DNA fragment on the bead results in a chemiluminescent reaction that can be imaged, allowing for the determination of the DNA sequence of the fragment attached to the bead. Courtesy of Roche 454 Life Sciences.]

The third generation: almost there

Cambridge, Massachusetts-based Helicos Biosciences is sitting at the nexus of current next-generation sequencing systems and the emergence of third-generation sequencing. The company is the first to develop a platform based on single-molecule sequencing (4), an approach the company calls True Single Molecule Sequencing (tSMS). The platform can sequence several billion bases in one instrument run in real time, but the true advantage of single-molecule approaches like Helicos' might lie in the ability to sequence without amplifying template DNA, permitting rapid, accurate quantification of specific RNA or DNA molecules from very small starting template amounts.

While Helicos' tSMS may have been the first approach out of the single-molecule gate, other methods are beginning to emerge. "We anticipate that by 2013, the SMRT sequencer will be available and able to sequence a human genome for a few hundred dollars, and in a matter of minutes," says Sejal Sheth, director of product marketing at Pacific Biosciences. Bold statements such as this are common in the world of third-generation development.

Pacific Biosciences made a splash at the 2008 Advances in Genome Biology and Technology (AGBT) conference when the company's chief technology officer, Stephen Turner, showed early data on the effectiveness of SMRT, their approach to single-molecule sequencing. Turner's talk at AGBT was quickly followed by two articles describing the method in greater detail (5,6).

Pacific Biosciences' approach works by sequencing strands of DNA on chips containing thousands of zero-mode waveguides (ZMWs). Each ZMW functions as a nanophotonic visualization chamber, providing an incredibly small detection volume of 20 zeptoliters (1 zeptoliter = 10^-21 liters). At this volume, the activity of a single molecule can be detected against a background of thousands of labeled nucleotides, which provides a window for watching DNA polymerase as it performs sequencing-by-synthesis.
In each chamber, a single DNA polymerase molecule is attached to the bottom surface so that it permanently resides in the detection volume. Phospholinked nucleotides, each labeled with a different colored fluorophore, are introduced into the reaction solution at concentrations that promote enzyme speed, accuracy, and processivity. As the single molecule of DNA polymerase incorporates complementary nucleotides, each base is held within the detection volume for tens of milliseconds. During this time, the engaged fluorophore emits fluorescent light whose color corresponds to the identity of a particular base. The polymerase then cleaves the bond holding the fluorophore in place and the dye diffuses out of the detection volume. Following incorporation and cleavage, the signal immediately returns to baseline and the process repeats. The result of this process could be very fast sequencing, on the order of 50 nucleotides per second, with longer read lengths than are currently available with any next-generation or Sanger-based system.

Expectations at Pacific Biosciences are high. Sheth believes that with their current development timeline, the company will be the first to reach the $1000 genome. "We firmly believe that we have the leading third-generation single-molecule DNA sequencing technology," he says.

Others disagree. "The technology that will deliver a true $1000 genome will not rely on light and optics," says Zoe McDougall, director of communications at Cambridge, UK-based Oxford Nanopore Technologies. Oxford Nanopore is advancing the use of protein nanopores for DNA sequencing in a platform that uses an electronic rather than an optical signal to identify DNA bases (7). According to McDougall, this places Oxford Nanopore's approach in a unique position to drive down sequencing costs.

Using nanopores (small openings in the lipid bilayer of a cell) to sequence DNA was proposed years ago, but the technology to make the idea feasible has proved elusive.
McDougall says Oxford Nanopore's approach makes nanopore sequencing a viable option. The company uses silicon chips containing a series of microwells where DNA samples are introduced. The lipid bilayer that lies across the top of each well gives a high-resistance electrical seal, across which a voltage is applied to drive a current toward the bottom of the well. The nanopores are the only points across which the current can flow. As each DNA base passes through a nanopore, it must first bind to an adaptor cyclodextrin. During this binding event, each base blocks the flow of current across the nanopore to a different degree; the variations allow researchers to identify which bases are passing through the pore. Like Pacific Biosciences, Oxford Nanopore has presented initial data suggesting that its nanopore sequencing approach will be able to read as many as 25 bases per second (7), with the potential to sequence very long fragments of DNA without interruption.

Another company is taking their third-generation offering one step further, not only by creating a novel technology platform, but by offering it as a sequencing service. In 2006, Clifford Reid, Radoje Drmanac, and John Curson came together to develop a new approach to how DNA is sequenced and how researchers obtain their sequences of interest. For the next 3 years, their company, Complete Genomics, worked on developing their proprietary sequencing technology. In 2009, Complete Genomics announced their intention to offer whole-genome sequencing services to researchers, and in February of that same year, the company released their first complete human genome sequence (8).

[Figure: The Complete Genomics sequencing method includes the creation of novel clusters of DNA fragments and pieces of known sequence called DNA nanoballs. Courtesy of Complete Genomics.]

[Figure: Clifford Reid cofounded the company Complete Genomics in 2006 in an effort to develop a faster method to sequence DNA. Courtesy of Complete Genomics.]
The Complete Genomics model differs from other next-generation sequencing companies in that they do not sell their instruments; instead, the company sequences DNA samples that customers provide, and then reports back the results.

Complete Genomics published schematics of their sequencing system in November 2009 (9). The approach is to create 500-bp libraries in which genomic DNA fragments are interspersed at regular intervals with pieces of known sequence. These are then amplified in solution, in a single reaction chamber, which enables higher density and lower reagent usage. The resulting DNA nanoballs (DNBs) consist of more than 200 copies of the head-to-tail concatemers in a ball-like configuration. The DNBs are then transferred onto patterned silicon substrates with arrays of spots that are activated to capture and hold the DNBs in place, essentially allowing for self-assembly of the DNBs into DNA nanoarrays. The next step involves Complete Genomics' combinatorial probe-anchor ligation (cPAL), which is an unchained, non-iterative, base-reading sequencing assay. According to Reid, this method has the advantage of dramatically reducing the required probe and enzyme concentrations and reducing imaging time, which substantially cuts reagent and imaging costs, respectively. The collected images are then assembled with Complete Genomics' computing software.

Currently, Complete Genomics offers sequencing at $20,000 per genome for projects with a minimum of eight genomes, and volume discounts for projects with more than 24 genomes. For those researchers with projects of even larger volume, the cost per genome is around $5,000.

But McDougall cautions against "comparing apples to pears" when it comes to advancing toward the $1000 genome. "When we [at Oxford Nanopore] speak about the cost of the genome, we like to consider the full cost. That means reagents, amortization of our instruments, labor, IT, informatics, project management, and sample preparation," she says.
Oxford Nanopore's technology does not require reagents, which McDougall believes gives them an advantage when it comes to reducing total costs.

With fame and funds abounding, the third-generation sequencing race has attracted numerous competitors. The Archon X-Prize for Genomics was established in October 2006 and offers a $10-million prize to the first team able to sequence 100 human genomes in 10 days. Research teams and companies from around the world have stepped up to the challenge. Participants include team cracker from Taiwan, base4innovation from the UK, and the US-based companies Visigen Biotechnologies, Reveo, ZS Genetics, and 454 Life Sciences. Also from the US are the Foundation for Applied Molecular Evolution and George Church's Personal Genome X.

At the finish line

Which company will be the one to break the $1000 genome barrier first? What will be included in the total cost of that $1000? How will a winner actually be decided? Though these questions remain unanswered, Pacific Biosciences, Oxford Nanopore, and Complete Genomics all predict there will be a winner soon, and according to each company, it's going to be them.

From the development of the Sanger method to the completion of the Human Genome Project, geneticists have made significant strides in understanding and accessing the information stored in our genes. Whether using optics, nanopores, nanoballs, or some yet-unannounced sequencing methodology, the completion of the third-generation sequencing race will mark another milestone in the history of genetics. Yet like all previously lauded achievements, it will be improved upon. Speed, accuracy, easily assembled long reads, and reduced cost will only satisfy for so long. Once the $1000 goal is reached, developers are likely to set their sights on the $100 genome, and, perhaps, someday even the $1 genome.

References

1. Levy, S., G. Sutton, P.C. Ng, L. Feuk, and A.L. Halpern. 2007. The diploid genome sequence of an individual human. PLoS Biol. 5:e254.
2. Wade, C.M., E. Giulotto, S. Sigurdsson, M. Zoli, S. Gnerre, F. Imsland, T.L. Lear, D.L. Adelson, et al. 2009. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326:865-867.
3. Margulies, M., M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, and L.A. Bemben. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380.
4. Harris, T.D., P.R. Buzby, H. Babcock, E. Beer, J. Bowers, I. Braslavsky, M. Causey, J. Colonell, et al. 2008. Single molecule DNA sequencing of a viral genome. Science 320:106-109.
5. Korlach, J., P. Marks, R. Cicero, J. Gray, D. Murphy, D. Roitman, T. Pham, G. Otto, et al. 2008. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc. Natl. Acad. Sci. USA 105:1176-1181.
6. Eid, J., A. Fehr, J. Gray, K. Luong, J. Lyle, G. Otto, P. Peluso, D. Rank, and S. Turner. 2008. Real-time DNA sequencing from single polymerase molecules. Science 323:133-138.
7. Clarke, J., H.-C. Wu, L. Jayasinghe, A. Patel, S. Reid, and H. Bayley. 2009. Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 4:265-270.
8. Rosenbaum, A.M., J.V. Thakuria, X. Wu, A.W. Zaranek, J. Li, P. Hulick, M. Murray, M.F. Browning, et al. 2009. Clinical analysis of individual genomes. Nature (In press).
9. Drmanac, R. and C.A. Reid. 2009. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327:78-81.

Written by Erin Podolak.

Supplementary material for this article is available at www.BioTechniques.com/article/113371.

BioTechniques 48:105-111 (February 2010) doi 10.2144/000113371

[Figure: Helicos Biosciences' approach to single-molecule sequencing involves binding of targets to a flow cell followed by sequential rounds of nucleotide addition, imaging, and cleavage. Courtesy of Helicos Biosciences.]