
New Method To Revolutionize DNA Sequencing 239

Posted by ScuttleMonkey
from the start-saving-up-to-buy-a-clone dept.
An anonymous reader writes "A new method of DNA sequencing published this week in Science identifies incorporation of single bases by fluorescence. This has been shown to increase read lengths from 20 bases (454 sequencing) to >4000 bases, with a 99.3% accuracy. Single molecule reading can reduce costs and increase the rate at which reads can be performed. 'So far, the team has built a chip housing 3000 ZMWs [waveguides], which the company hopes will hit the market in 2010. By 2013, it aims to squeeze a million ZMWs [waveguides] onto a single chip and observe DNA being assembled in each simultaneously. Company founder Stephen Turner estimates that such a chip would be able to sequence an entire human genome in under half an hour to 99.999 per cent accuracy for under $1000.'"
This discussion has been archived. No new comments can be posted.

  • by mapkinase (958129) on Monday January 05, 2009 @02:35PM (#26332805) Homepage Journal


    We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.

  • Re:99.3% accurate? (Score:4, Informative)

    by Anonymous Coward on Monday January 05, 2009 @02:35PM (#26332813)

    It's common practice in bioinformatics to measure the same data repeatedly in an effort to reduce the error. While a 0.007 error rate isn't very good, an error surviving three independent reads has probability around (0.007)^3, which is pretty awesome. In practice, the errors might be correlated (as in a flaw in the measuring system), so the benefit of re-measuring might not be exponential...however it should be darn close.
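    As a rough sketch (my own, not from the article): with independent repeated reads and a majority vote, the residual error comes from a binomial tail. This ignores correlated errors, and since there are four bases a wrong majority would also have to agree on the same wrong base, so treat it as an upper bound.

```python
# Hedged sketch: residual error after majority voting over n
# independent reads, each wrong with probability p. Upper bound,
# since a wrong consensus also needs the erroneous reads to agree.
from math import comb

def majority_error(p, n):
    """P(majority of n independent reads is wrong), n odd."""
    need = n // 2 + 1  # wrong votes needed to flip the majority
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(need, n + 1))

p = 0.007  # 99.3% per-read accuracy
print(majority_error(p, 1))   # 0.007
print(majority_error(p, 3))   # ~1.5e-4
print(majority_error(p, 15))  # dramatically smaller still
```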

  • Article in Science (Score:2, Informative)

    by prograde (1425683) on Monday January 05, 2009 @02:56PM (#26333125)

    I assume that the hardware at Science can withstand a slashdotting better than the crappy blog linked in the summary: []

  • This technique should be a shoo-in for the Archon X Prize [].
  • Re:99.3% accurate? (Score:2, Informative)

    by prograde (1425683) on Monday January 05, 2009 @03:14PM (#26333401)

    It's common practice in bioinformatics to measure the same data repetitively in an effort to reduce the error.

    It's common practice on Slashdot to read the article before posting. From the abstract of the Science article:

    Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.

    So that's 99.3% after averaging 15 reads. Not exactly replicating the same read 15 times... more like taking random starting points and aligning the results where they overlap, so that each base is covered by 15 different reads.
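    A toy illustration (my own sketch, not the paper's pipeline) of that per-position voting, assuming the reads have already been aligned to a common coordinate system:

```python
# Toy consensus call: per-position majority vote over reads that
# have already been aligned. '-' marks positions a read doesn't cover.
from collections import Counter

def consensus(aligned_reads):
    length = max(len(r) for r in aligned_reads)
    called = []
    for i in range(length):
        votes = Counter(r[i] for r in aligned_reads
                        if i < len(r) and r[i] != '-')
        called.append(votes.most_common(1)[0][0])
    return ''.join(called)

# Three toy reads; two of them carry a single error each.
print(consensus(["ACGTAC", "ACGTGC", "ACCTAC"]))  # ACGTAC
```

    With real data the hard part is the alignment itself, not the vote, but the voting step really is this simple.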

    Don't get me wrong - this is really cool, and a massive speed-up over current "next-gen" sequencing. And I'm sure that it will get better.

    To answer the GP - yes, this is an acceptable error rate, for now.

  • Re:99.3% accurate? (Score:5, Informative)

    by peter303 (12292) on Monday January 05, 2009 @03:22PM (#26333547)
    One in 10^8 is the DNA base-pair copy error rate. Even so, that's around 60 errors when a sperm meets an egg. There are many more over a human lifetime, with a trillion somatic cells dividing on average 50 times each. The vast majority of errors are neutral, but a cell that accumulates ten or so specifically unlucky ones may become a cancer.
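    The arithmetic behind those figures, using assumed round numbers for illustration:

```python
# Back-of-the-envelope check of the replication-error figures above.
# All values are assumed round numbers, not measurements.
error_rate = 1e-8        # copy errors per base pair per division
diploid_genome = 6e9     # base pairs in a diploid human genome
errors_per_copy = error_rate * diploid_genome
print(errors_per_copy)   # ~60 new errors per full genome replication

cells = 1e12             # somatic cells, order of magnitude
divisions = 50           # average divisions per lineage in a lifetime
lifetime_errors = cells * divisions * errors_per_copy
print(f"{lifetime_errors:.0e}")  # vastly more over a lifetime
```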
  • by chihowa (366380) on Monday January 05, 2009 @03:23PM (#26333585)
    It looks to be inaccessible. Here are the abstract [] and full-text [] links.
  • by Anonymous Coward on Monday January 05, 2009 @03:24PM (#26333593)

    DOI: 10.1126/science.1162986

  • Re:99.3% accurate? (Score:1, Informative)

    by Anonymous Coward on Monday January 05, 2009 @03:35PM (#26333779)

    It's super fine! It's very, very good. I think there is some confusion since they are using different metrics for accuracy. The 99.3% accuracy likely does mean an expected 28 errors per read of 4000 base pairs. But their 99.999% accuracy likely means that there's 99.999% confidence that there are no errors at all.

    Here's a crash course in DNA sequencing, with hand waving and generalization. You slice the DNA into (mostly) random bits. With old methods they were about 20 bp long. With newer 454 pyrosequencing I think they were 100 bp long, or longer. This new technique uses fragments about 4000 bp long. Then you use some kind of magic to look at those slices. Pyrosequencing gets its name from reagents involving pyrophosphate that attach to your base pairs. So one at a time you add reagents to extend the chain, and when the pyrophosphate is released it glows. The brighter the glow, the more of the same base there was in a row. So you put them all on a slide and can do thousands at once: mix reagents, and your CCD looks for glows. Neato! I haven't read this article to see how this one works, but it doesn't really matter how, so long as it does work.

    But what do you do with these nice little fragments? (Not so little if they're doing 4000-long reads!) Well, you run a pattern-matching algorithm to line them up! You are right that you'll have 28 errors on average per fragment. But with 4000-long fragments, if you have an overlap of, say, 1000, even with all 56 errors occurring in that overlap (bad luck), I think you'll find that the odds of this being the correct matching, with 56 errors, are massively, massively higher than the odds of the other 944 base pairs matching by random chance or read errors! So these errors won't disrupt the assembly; at least, it would be amazing if they actually did. Now, typically one would use 15-fold coverage. That is, you run enough fragments so that, statistically speaking, you have high confidence that every base pair will be a member of at least 15 different fragments. So, after assembling, even with some assembly errors, you can do majority voting: if at least 9 out of those 15 (you could in fact use 8, but I'm leaving 1 extra!) agree, then that's the base pair that goes there!
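    To see why even 56 shared errors can't fake an alignment, here is a hedged log-odds sketch with assumed numbers: the likelihood of the observed matches under a correct overlap (match probability near 0.993^2) versus a random one (match probability 1/4).

```python
# Toy log-odds comparison for a 1000 bp overlap with 944 matches.
# Assumed numbers for illustration, not the paper's aligner.
from math import log

overlap, matches = 1000, 944
# Correct alignment: both reads right, or both wrong with the same
# wrong base (1/3 chance given an error on each).
p_correct = 0.993 * 0.993 + (0.007 * 0.007) / 3
p_random = 0.25  # unrelated sequence matches 1 base in 4

log_odds = (matches * log(p_correct / p_random)
            + (overlap - matches) * log((1 - p_correct) / (1 - p_random)))
print(log_odds)  # hugely positive: correct placement wins easily
```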

    Now that that's "explained," what are the actual odds of an error in a genome of length 3*10^9? Well, let's imagine we have the whole human genome done in this way, with reads of length 4000 and 99.3% accuracy for any given base pair. Then in our assembled genome, the odds of an error at a given base pair will be (0.993)^14 * (0.007); the odds of two, three, etc. are also calculable. However, when assembling, because the reads are so long, you can be extremely confident about where they go, so one error won't disrupt you at all. Even 6 is fine, because the other 9 fragments that overlap that base pair still all agree. The odds of 7 or more errors on the same base pair is something like 10^(-15).

    The probability of this occurring ZERO times out of 3 billion base pairs is 99.9997%. So that's where their number comes from. It's not that out of these 3 billion base pairs, 99.9997% are right, meaning an expected OVER 9000 base pairs will be wrong. It's saying that with 99.9997% confidence, there are ZERO errors. If there is an error, there's again a many-9s chance that it's only a single error. That's pretty damn good! Nearly a 1-in-a-million chance of even a single error!

    Of course, the errors won't be uniform, and they may be position dependent (pyrosequencing sure is; I don't know about this technique, though). And the coverage won't be uniform either, so a base pair that already has a higher-than-average proportion of errors could, by bad luck, end up with a low amount of coverage to boot! But I'm sure if you double your coverage for an extra $1000 you can be super sure of the coverage, and can also afford to be extra pessimistic in your error rates, and still get your 99.9999% chance of an error-free sequence.
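    A sketch of that coverage calculation with assumed parameters (per-read error 0.007, 15-fold coverage, and a consensus error requiring at least 7 wrong reads at a position). The exact threshold and error model shift the answer by orders of magnitude, so this illustrates the method rather than reproducing the post's 10^(-15) figure:

```python
# Binomial-tail sketch of the consensus error rate at 15x coverage.
# Assumed parameters; not the paper's model.
from math import comb, exp

p, n, k = 0.007, 15, 7   # per-read error, coverage, wrong reads needed
tail = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

genome = 3e9
expected_errors = genome * tail
p_error_free = exp(-expected_errors)  # Poisson approx. of (1-tail)^genome
print(f"per-position consensus error ~ {tail:.1e}")
print(f"P(zero errors genome-wide)  ~ {p_error_free:.4f}")
```

    Tightening the voting threshold or adding coverage drives the tail down fast, which is the whole point of the argument above.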

  • Re:99.3% accurate? (Score:2, Informative)

    by Anonymous Coward on Monday January 05, 2009 @03:44PM (#26333889)

    If you RTFP (requires subscription): no systematic errors were detected.

  • nitpick (Score:3, Informative)

    by Zenaku (821866) on Monday January 05, 2009 @03:51PM (#26333971)

    One base-pair does not a gene make.

  • by Fnord666 (889225) on Monday January 05, 2009 @04:28PM (#26334497) Journal
    Here [] is an article in New Scientist about the new process. It explains it fairly well and even defines what a ZMW is.
  • Re:Gattica... (Score:2, Informative)

    by Noddegamra (1270574) on Monday January 05, 2009 @06:31PM (#26336295)
    Since inosine.
