
Graphs Show Costs of DNA Sequencing Falling Fast

Posted by timothy
from the we've-got-your-number dept.
kkleiner writes "You may know that the cost to sequence a human genome is dropping, but you probably have no idea how fast that price is coming down. The National Human Genome Research Institute, part of the US National Institutes of Health, has compiled extensive data on the costs of sequencing DNA over the past decade and used that information to create two truly jaw-dropping graphs. NHGRI's research shows that not only are sequencing costs plummeting, they are outstripping the exponential curves of Moore's Law. By a big margin."

  • by the_humeister (922869) on Sunday March 06, 2011 @12:11PM (#35397488)

    How about the cost of analysis of said genomes?

    • Re:Great! (Score:4, Insightful)

      by hedwards (940851) on Sunday March 06, 2011 @12:18PM (#35397548)

      Sequencing is where the focus on cost has been. It doesn't make much sense to try to reduce the cost of analysis when sequencing still takes a very long time and a huge amount of money. The graph was hard to read, but with the cost still well over $10k, there's a lot more to be done before analysis is worth spending much time economizing on.

      But as sequencing gets cheaper, more and more of the focus will shift to the analysis side, and the cost of analysis will come down. Given that insurance isn't going to cover sequencing at this point, analysis is moot in most cases anyway. As more research analyzes sequenced DNA, I'm sure tricks will be discovered to bring the cost down. But right now you're dealing with low volumes, so the cost is higher than it will be at higher volumes.

      • Define "cost of analysis". I paid 23andme.com $100 per person (myself, my wife, and my brother) to sequence 1 million SNPs per person using Illumina's V3 chip (plus $5/month/person for as long as we have accounts with them) and to provide current and future research data with regard to those SNPs. That is *super cheap* for the kind of data I'm getting out of it (I'd be happy to post an imgur link with an anonymized print-version of the report, although I guess it doesn't matter since I've already uploaded t

        • Re:Great! (Score:5, Informative)

          by varcher (156670) on Sunday March 06, 2011 @12:56PM (#35397836)

          to sequence 1 million SNPs per person

          Actually, they're not sequencing.

          They're checking.

          The way 23andme and most personal genome companies work is that they have those genochips (Illumina) with one million DNA sequences on them, and they check whether or not your DNA has one of those sequences.

          If you have a SNP that isn't on the chip (and you have lots of SNPs that aren't on the chip), it won't be listed. If, at a given chromosome locus, they have "all" of the "known" SNPs, but you happen to have a mutant variant that isn't in their library, then it won't be detected.

          "Sequencing" involves taking your DNA and reading every base, no matter what. That's still slow and very expensive. We're in the era of the "thousand genomes", meaning we expect to complete a thousand full sequences in a couple of years. Of course, 10 years later, we'll sequence everyone, but for now that's still a way off.
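A toy sketch of the genotyping-versus-sequencing distinction above, assuming made-up 20-base sequences and a hypothetical chip that probes only two positions. The chip reports the sample's base at its probed positions; a base-by-base comparison against the reference finds every difference, including one the chip never looks at. Real chips and variant callers are far more involved; the names here are illustrative only.

```python
# Hypothetical data: a short reference and a sample that differs at 2 positions.
REFERENCE = "ACGTACGTACGTACGTACGT"
SAMPLE = "ACGTACTTACGTACGAACGT"

# Hypothetical chip: it only probes a fixed set of "known" SNP positions.
CHIP_POSITIONS = {6, 10}


def genotype_with_chip(sample: str, positions: set[int]) -> dict[int, str]:
    """Report the sample's base at each probed position and nothing else."""
    return {pos: sample[pos] for pos in sorted(positions)}


def variants_by_sequencing(reference: str, sample: str) -> dict[int, tuple[str, str]]:
    """Compare every base and report all (reference, sample) differences."""
    return {
        i: (ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    }


print(genotype_with_chip(SAMPLE, CHIP_POSITIONS))
print(variants_by_sequencing(REFERENCE, SAMPLE))
```

The variant at position 15 shows up only in the full comparison, because it is not one of the chip's probed positions.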

        • by RDW (41497)

          Progress in SNP chips, though they were a big breakthrough when introduced and remain very important in research, has been pretty static compared to the dramatic speed with which 'next generation' sequencing technologies have brought down the cost and increased the amount of data we have to cope with. Whole genome sequencing is on an entirely different scale - 3 billion bases rather than a million. Even an 'exome' (the sequence of all the actual genes in your genome) runs to about 40 million bases.

          • That's monetary cost - not social and personal. ;-)

            Soon the $ cost will be free - and mandatory. If you want to fly, or even drive a car.

            Hey! And to think, they said it couldn't happen here!

            • by Garridan (597129)
              Hah. You have a funny definition of "free". Know those ever-increasing fees that don't get added onto your plane ticket cost until you're about to pay? That's the compulsory gratuity for your freedom pat.
            • I find it ironic that whenever a biotech related story gets posted here, the crowd always goes to the extreme worst abuse that tech could be used for, or the most absurd nightmare scenario. Anything using bacteria = whatcouldpossiblygowrong, anything with viruses (even cures) = iamlegend. DNA manipulation will undoubtedly produce zombies. DNA sequencing means the government will steal your DNA and you'll be discriminated against. Somehow. And half of those suggestions are serious.

              Yet new computer ha
    • by RDW (41497)

      "How about the cost of analysis of said genomes?"

      It's computationally expensive and pretty much subject to Moore's Law (though improved algorithms like Burrows-Wheeler alignment have helped to speed things up in the last couple of years). So it's getting cheaper, but not fast enough to keep up with the expected deluge of data. If you're just interested in sequencing a fixed number of genomes you benefit from both cheaper/faster sequencing and cheaper/faster processing power. But if you're a major genome cen
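Burrows-Wheeler alignment is mentioned above. The following toy sketch shows the core idea behind BWT-based aligners: the Burrows-Wheeler transform plus FM-index "backward search" to count exact matches of a short read. It is not BWA's implementation. It builds the transform from sorted rotations and counts characters naively, whereas real tools precompute rank tables and suffix-array samples and also handle mismatches.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs)."""
    text += "$"  # sentinel; '$' sorts before A/C/G/T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using FM-index backward search."""
    # first_index[c]: row where c first appears in the sorted first column,
    # i.e. the number of characters in the text smaller than c.
    first_index: dict[str, int] = {}
    for i, ch in enumerate(sorted(bwt_str)):
        first_index.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real indexes precompute this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first_index:
            return 0
        lo = first_index[ch] + occ(ch, lo)
        hi = first_index[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("ACGTACGTAC")
print(count_occurrences(transformed, "ACG"))
```

Each step of the loop narrows the range using only the transformed string. This is why the index can be searched without scanning the genome for every read.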

    • by mapkinase (958129)

      Full annotation takes about half an hour for a bacterial genome of 4Mbases on 100 processors.

    • by mapkinase (958129)

      In addition to my previous comment: assuming a processor costs $2000 per year (a year is about pi*10^7 s), 100 processors * 1.8x10^3 s = 1.8x10^5 processor-seconds, which comes to roughly $11. That's only the hardware cost.
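The hardware-cost arithmetic for the annotation run (half an hour on 100 processors) can be checked with a short sketch. It assumes, as the comment does, $2000 per processor per year and a year of about pi*10^7 seconds, and it ignores power, storage and staff. With those inputs, 100 * 1.8x10^3 s is 1.8x10^5 processor-seconds, or about $11.

```python
import math


def hardware_cost_usd(processors: int, wall_seconds: float,
                      usd_per_processor_year: float = 2000.0) -> float:
    """Hardware cost of a job, amortizing each processor's yearly cost."""
    seconds_per_year = math.pi * 1e7  # the "pi * 10^7 s" approximation of a year
    processor_seconds = processors * wall_seconds
    return usd_per_processor_year * processor_seconds / seconds_per_year


# Half an hour (1.8e3 s) on 100 processors
cost = hardware_cost_usd(100, 1.8e3)
print(f"${cost:.2f}")
```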

  • The industry has been trying to get sequencing done cheaply for a while now. Good to see that there has been success. Now if only the doctors get on the bandwagon and start diagnosing people based on an individual's genome.
    • by x14n (935233)

      Now if only the doctors get on the bandwagon and start diagnosing people based on an individual's genome.

      Sorry, that'll take 10+ years of basic research, 10+ years of clinical trials to provide practical applications for findings of said research, plus another 10+ years for a new generation of doctors to matriculate with knowledge of said applications. More likely, big pharma will be "farming" human genome data for drugs with rapid development platforms [wikipedia.org]. The scary part here is that, without basic research, more unintended consequences are to be expected...

  • by gclef (96311)

    Question for the bio-folks: is there a way for someone (okay, me) to DIY this? I'm curious to know my own genome, but I'm *very* leery of having that data living in some company's database. What I'd like to be able to do is have the data, and be able to look up how any discoveries later on map to what I've got. Is that possible? What I don't want is what seems to be the prevalent pattern right now of companies telling customers: "You have indicators for x & y. Re-do this & pay us again in a few year

    • by MoobY (207480)

      Even in research, most sequencing at the whole-genome level is outsourced to big companies (for example, Complete Genomics), since the investment in capabilities, machinery and computing power needed to sequence whole genomes is simply too large to justify for one or a few individual genomes (you currently need to invest a few million to get started with whole-genome sequencing). You can DIY sequencing of small fragments (for example, to determine whether a known genetic cause of a hereditary disease

    • If you could get your hands on a V3 chip from Illumina (the same used by 23andme.com) you could do it, although you could always pay 23andme.com the $200 for the sequencing, download the raw data, and delete your account.

      I myself paid 23andme.com to do genetic profiles for myself, my wife, and my brother: $100/person and then $5/month/person ongoing as they add more research each month (come on, $5 is cheap for the research data they add, although I guess some people don't see the value).

      • As some ACs have pointed out in response to a few of your posts on this thread, 23andme does not sequence your DNA.
        https://www.23andme.com/you/faqwin/sequencing/ [23andme.com]
        my emphasis:

        What is the difference between genotyping and sequencing?

        Though you may hear both terms in reference to obtaining information about DNA, genotyping and sequencing refer to slightly different things.

        Genotyping is the process of determining which genetic variants an individual possesses. Genotyping can be performed through a variety of different methods, depending on the variants of interest and resources available. At 23andMe, we look at SNPs, and a good way of looking at many SNPs in a single individual is a recently developed technology called a “DNA chip.”

        Sequencing is a method used to determine the exact sequence of a certain length of DNA. Depending on the location, a given stretch may include some DNA that varies between individuals, like SNPs, in addition to regions that are constant. So sequencing is one way to genotype someone, but not the only way.

        You might wonder, then, why we don't just sequence everyone's entire genome, and find every single genetic variant they possess. Unfortunately, sequencing technology has not yet progressed to the point where it is feasible to sequence an entire genome quickly and cheaply. It took the Human Genome Project over 10 years' work by multiple labs to sequence the three billion base pair genomes of just a few individuals. For now, genotyping technologies such as those used by 23andMe provide an efficient and cost-effective way of obtaining more than enough genetic information for scientists—and you—to study.

        To be sure, you have gained interesting information for your $200, but you have neither your sequence nor a complete list of differences from a reference human sequence (which, of course, would amount to your sequence).
        23andme only gives you a list of many SNPs.

    • by RDW (41497)

      'What I'd like to be able to do is have the data, and be able to look up how any discoveries later on map to what I've got. Is that possible?'

      You can't do genome sequencing, or even SNP chip genotyping, in a DIY lab, so you'll have to involve a large company or research centre at some point. But you can do this anonymously (e.g. through a physician) and get hold of the raw data afterwards to analyse as you please, assuming you have the technical knowledge to make sense of it. Illumina is one company that pr

    • Of course it would be great if we could each get our full genomes - full coverage of every chromosome at high confidence - for an affordable price. However, if you did that, you would find that the vast majority of the information would be quite uninteresting or even borderline meaningless. There are large regions of the chromosomes that do not code for anything, and some of those end up being particularly difficult to sequence accurately. While changes in those regions can be important, changes in those
    • Question for the bio-folks: is there a way for someone (okay, me) to DIY this?

      Wait ten years and then buy ten-year-old sequencers from today's companies.

      (I'm not being a sarcastic dick. A number of DIY bio-hackers/Makers have "cheaply" stocked their sheds/basements with high-end analysis and synthesis equipment by buying the stuff that mainstream biotech labs have moved away from. When a field is progressing as quickly as biotech, once equipment is one generation out of date, it's completely out of date.)

  • 1 cent apiece to find out why CowboyNeal is the way CowboyNeal is?

  • by MoobY (207480) on Sunday March 06, 2011 @12:22PM (#35397588)

    We've been observing this decrease over the last few years at our sequencing lab too. Some people might find it fascinating, but I, as a bioinformatician, find it frightening.

    We're still keeping up with maintaining and analysing our sequenced reads and genomes at work, but the amount of incoming sequencing data (currently a few terabytes per month) is increasing four- to five-fold per year (compared to doubling every 18-24 months under Moore's law). Our lab had its first human genomes at the end of 2009, almost 9 years after the world's first human genome; now we're getting a few genomes per month. We're not far from the point where we can no longer install enough processing power (which grows according to Moore's law) to process all of this data.

    So yes, the more-than-exponential decrease in sequencing costs is cool and offers a lot of possibilities in getting to know your own genome, advances in personalized medicine, and possibilities for population-wide genome sequencing research, but there's no way we'll be able to process all of this interesting data because Moore's law is simply way too slow as compared to advances in biochemical technologies.
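To see how fast the gap described above opens up, here is a small sketch comparing compound growth rates. It assumes the least pessimistic ends of the quoted figures (data growing 4x per year, computing capacity doubling every 18 months). Both rates are treated as constant, which is a simplification.

```python
def growth_factor(years: float, factor_per_period: float, period_years: float) -> float:
    """Total growth after `years` if a quantity multiplies by
    `factor_per_period` every `period_years`."""
    return factor_per_period ** (years / period_years)


# Assumed rates: sequencing data x4 per year; compute x2 every 1.5 years.
for years in (1, 3, 5):
    data = growth_factor(years, 4.0, 1.0)
    compute = growth_factor(years, 2.0, 1.5)
    print(f"after {years} yr: data x{data:.0f}, compute x{compute:.1f}, "
          f"gap x{data / compute:.1f}")
```

Under these assumptions, after five years the data has grown about a thousandfold while computing capacity has grown roughly tenfold, a gap of about 100x.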

    • by Kjella (173770)

      I assume you're talking about incoming data, not the final DNA sequence. As I understand it, the final result is 2 bits/base pair and about 3 billion base pairs, so about a CD's worth of data per human. And if you were talking about a genetic database, I guess 99%+ is common, so you could just store a "reference human" and diffs against that. So at 750 MB for the first person and 7.5 MB for each additional person, I guess you could store 200,000-300,000 full genetic profiles on a 2 TB disk. Probably the whole human ra
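The back-of-envelope storage estimate above can be reproduced as follows. The sketch takes the comment's assumptions at face value: 3 billion bases at 2 bits each, a naive "diff" that is 1% of the genome stored at the same 2 bits per base (ignoring the space needed to record positions), and a decimal 2 TB disk.

```python
GENOME_BASES = 3_000_000_000  # ~3 billion base pairs, as in the comment
BITS_PER_BASE = 2             # A/C/G/T fit in 2 bits
DIFF_PERCENT = 1              # assumption: ~1% of bases differ from the reference
DISK_BYTES = 2 * 10**12       # a 2 TB disk (decimal terabytes)

reference_bytes = GENOME_BASES * BITS_PER_BASE // 8       # one full genome
diff_bytes = reference_bytes * DIFF_PERCENT // 100        # naive per-person diff
profiles = 1 + (DISK_BYTES - reference_bytes) // diff_bytes  # reference + diffs

print(f"reference: {reference_bytes / 1e6:.0f} MB, "
      f"diff: {diff_bytes / 1e6:.1f} MB, profiles: {profiles:,}")
```

The result is about 266,000 profiles, which falls inside the 200,000-300,000 range quoted above.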

      • by RDW (41497) on Sunday March 06, 2011 @01:06PM (#35397942)

        Yes, the incoming (and intermediate) data sets are huge. You don't just sequence each base once, but 30-50 times over on average (required to call variants accurately). And you don't want to throw this data away, since analysis algorithms are improving all the time. But it's true that the final 'diff' to the reference sequence is very small, and has been compressed to as little as 4 MB in one publication:

        http://www.ncbi.nlm.nih.gov/pubmed/18996942 [nih.gov]
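To put numbers on the "30-50 times over" point, here is a sketch of the raw read volume at a given average coverage. It assumes roughly 2 bytes per sequenced base (one base character plus one quality character, as in uncompressed FASTQ) and ignores read names and other per-read overhead, so it underestimates real file sizes.

```python
def raw_read_volume(genome_bases: int, coverage: int,
                    bytes_per_base: int = 2) -> tuple[int, int]:
    """Bases sequenced and approximate uncompressed bytes at a given coverage.

    Assumes ~2 bytes per sequenced base (base + quality character) and
    ignores per-read headers.
    """
    bases = genome_bases * coverage
    return bases, bases * bytes_per_base


for coverage in (30, 50):
    bases, size = raw_read_volume(3_000_000_000, coverage)
    print(f"{coverage}x: {bases:.2e} bases, ~{size / 1e9:.0f} GB uncompressed")
```

Under these assumptions, a single genome produces roughly 180-300 GB of reads, compared with a few megabytes for the final compressed diff.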

      • by jda104 (1652769)

        I assume you're talking about incoming data, not the final DNA sequence. As I understand it, the final result is 2 bits/base pair and about 3 billion base pairs, so about a CD's worth of data per human. And if you were talking about a genetic database, I guess 99%+ is common, so you could just store a "reference human" and diffs against that. So at 750 MB for the first person and 7.5 MB for each additional person, I guess you could store 200,000-300,000 full genetic profiles on a 2 TB disk. Probably the whole human race in less than 100 TB.

        The incoming data is image-based, so yes, it will be huge. Regarding the sequence data: yes; in its most condensed format it could be stored in 750MB. There are a couple of issues that you're overlooking, however:
        1. The reads aren't uniform quality -- and methods of analysis that don't consider the quality score of a read are quickly being viewed as antiquated. So each two bit "call" also has a few more bits representing the confidence in that call.
        2. This technology is based on redundant reads. In order
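Point 1 above (a quality score stored alongside each 2-bit call) can be illustrated with the standard Phred scale, Q = -10 * log10(P(error)). The one-byte layout below is a hypothetical packing chosen for illustration: 2 bits for the base and 6 bits for a quality capped at 63. It is not the format any sequencer or file standard actually uses.

```python
import math

BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASES = "ACGT"


def phred_quality(p_error: float) -> int:
    """Phred score Q = -10 * log10(P(error)), rounded and clamped to 0..63."""
    q = round(-10 * math.log10(p_error))
    return max(0, min(q, 63))


def pack_call(base: str, quality: int) -> int:
    """Hypothetical layout: high 2 bits = base, low 6 bits = quality."""
    if not 0 <= quality <= 63:
        raise ValueError("quality must fit in 6 bits")
    return (BASE_CODES[base] << 6) | quality


def unpack_call(byte: int) -> tuple[str, int]:
    """Inverse of pack_call."""
    return CODE_BASES[byte >> 6], byte & 0x3F


q = phred_quality(0.001)  # a 1-in-1000 chance that the call is wrong
packed = pack_call("G", q)
print(q, packed, unpack_call(packed))
```

Even this compact layout quadruples the 2-bit-per-base figure, before accounting for the redundant reads in point 2.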

        • by RDW (41497)

          'The incoming data is image-based, so yes, it will be huge.'

          The image data is routinely discarded by at least some major centres; the raw sequence and quality data alone is huge enough to be a major issue! See:

          http://www.bio-itworld.com/news/09/16/10/Broad-approach-genome-sequencing-partI.html [bio-itworld.com]

          'It's been a couple of years since we saved the primary [raw image] data. It is cheaper to redo the sequence and pull it out of the freezer. There are 5,000 tubes in a freezer. Storing a tube isn't very expensive. Stor

    • Problems of success are a bear.
    • by jda104 (1652769)
      Interesting. I view this from a completely different perspective: if DNA sequencing really is outpacing Moore's Law, that just means that the results become disposable. You use them for your initial analysis and store whatever summarized results you want from this sequence, then delete the original data.

      If you need the raw data again, you can just resequence the sample.

      The only problem with this approach, of course, is that samples are consumable; eventually there wouldn't be any more material left to
      • by RDW (41497)

        'If you need the raw data again, you can just resequence the sample.'

        See my reply above to another post - this is exactly the approach that some centres are taking. But as you say, some samples can't be regarded as a consumable resource (e.g. archival clinical material is often only available in limiting quantities).

    • by Anonymous Coward

      Indeed, as a developer/sysadmin in a bioinformatics lab, I find this equally terrifying.

      As of two years ago, when my supervisor went to a meeting at Sanger (for those reading this, one of the largest sequencing centres in the world and the granddaddies of large-scale sequencing), they said a few frightening things. First, they were spending more on power and other items related to data storage than on chemical supplies for sequencing. Second, the cost of resequencing something compared to storing the sequenced dat

    • by mapkinase (958129)

      Do not forget storage problems. One center I know of is already dropping annotations closer than a substrain. Given recent budget setbacks at American national centers (one of them has dropped its raw sequence data storage project), the problem will only get bigger.

    • by timeOday (582209)
      It sounds like the problem is storage capacity more so than processing capacity, is that so?
    • but there's no way we'll be able to process all of this interesting data because Moore's law is simply way too slow as compared to advances in biochemical technologies.

      And yet tens if not hundreds of millions of handheld supercomputers are being shipped every year. The processing power and storage capacity exist to do this; we just need to build and encourage the use of BOINC-style p2p computing apps on all platforms, and solve the various energy-related problems keeping people from turning their com
  • by diewlasing (1126425) on Sunday March 06, 2011 @12:26PM (#35397618)
    ...Oct 2007?
    • by Anonymous Coward

      ...Oct 2007?

      FTFA: "From 2001 to 2007, first generation techniques (dideoxy chain termination or ‘Sanger’ sequencing) for sequencing DNA were already following an exponential curve. Starting around January of 2008, however, things go nuts. That’s the time when, according to the NHGRI, a significant part of production switched to second generation techniques [wikipedia.org]."

    • by Anonymous Coward

      In 2008, 454 Life Sciences released the Genome Sequencer FLX, which was the first affordable next-generation sequencer to become widely available. Since then a number of other high-throughput sequencers have been released (including Illumina and SOLiD). This marks the beginning of the 2nd-generation sequencing era. Prior to this, the method used was Sanger-based sequencing, and although this is completely automated nowadays, it is still based on principles that were established in the 1970s, which are comparativ

  • I saw the same thing back in the mid-1990s.
    Sequencing technology was ramping up hyper-exponentially.
    That means that it curves up on semi-log paper.
    It was outstripping Moore's Law, and crushing our data systems.
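To make "curves up on semi-log paper" concrete: exponential growth is a straight line on a logarithmic y-axis, so the second differences of its logarithm are zero. Faster-than-exponential growth has positive second differences. The minimal sketch below uses synthetic series (not real sequencing data), and its tolerance guards against floating-point noise.

```python
import math


def curves_up_on_semilog(values: list[float], tol: float = 1e-9) -> bool:
    """True if log10(values) has positive second differences everywhere,
    i.e. the growth rate itself keeps increasing."""
    logs = [math.log10(v) for v in values]
    second_diffs = [logs[i + 1] - 2 * logs[i] + logs[i - 1]
                    for i in range(1, len(logs) - 1)]
    return all(d > tol for d in second_diffs)


exponential = [2.0 ** t for t in range(8)]       # straight line on semi-log paper
hyper = [2.0 ** (t * t) for t in range(8)]       # bends upward on semi-log paper

print(curves_up_on_semilog(exponential), curves_up_on_semilog(hyper))
```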

    Finished DNA sequence only needs 2 bits/base pair,
    but the raw data behind those 2 bits can be much bigger;
    in our case, the raw data was scanned images of radiograms.
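The "2 bits/base pair" figure comes from the fact that four letters fit in two bits, so four bases pack into one byte. The sketch below uses an illustrative bit order (first base in the lowest bits). The sequence length has to be stored separately, because the padding bits in the last byte would otherwise decode as extra 'A's.

```python
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11]
                   for i in range(length))


packed = pack("GATTACA")
print(len(packed), unpack(packed, 7))
```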

    In the early '90s, a typical sequencing project was a few hundred DNA fragments.
    Each fragment is a few hundred base pairs.
    You put each f

  • Looks even better in terms of the raw data in spreadsheet format: DNA sequencing analysis data [dnasequencing.org]. The fact that the cost per Mb drops from thousands of dollars to under a dollar is astounding.
  • What happened is that in July '07 a new way of doing it was introduced. As a new technique, it started off 'expensive' and became cheaper. For the last three months it has been back on the standard curve, just as it was before.

    It's like comparing the drop in one family's household costs after the family moved to a cheaper estate.

    So yes, in numbers it has become cheaper, but it must also be clear that you are comparing two ways of doing things, and obviously people will select the cheaper one.

  • "they are outstripping the exponential curves of Moore's Law. By a big margin"

    Moore's law simply states that the quantity of transistors that can be inexpensively placed on a circuit doubles every two years.

    This is a relatively new area of science. New techniques can be expected to evolve, as would refinements of existing techniques. As it moves from the domain of a very few skilled individuals at universities to more of a commodity where $100 buys you your family tree, economies of scale kick in. And then

  • Personally, I blame Jerry Springer.

  • The Economist had an excellent article about this a while back, using the number of blades in a razor as the example. They made a graph with time along the bottom and the number of blades on the left, then drew a curve that fit. For a long time, there was only one blade. Then there were two, and that held a while. Then came three, then four, and now we have five. Now, using sound mathematical methods to extrapolate this curve, The Economist projected that by 2020, a razor will have something like 4

  • by fahrbot-bot (874524) on Sunday March 06, 2011 @01:23PM (#35398058)

    NHGRI's research shows that not only are sequencing costs plummeting, they are outstripping the exponential curves of Moore's Law. By a big margin.

    Moore's Law [wikipedia.org] is about the number of transistors on a wafer and other directly-related hardware density issues, not about cost - and certainly not the cost of gene sequencing.

    • Gordon Moore's paper [intel.com] disagrees. He directly addresses the fact that increased density leads to lower per-component costs.

  • The only question that remains is, "How long does it take?"
  • As a person in the field, I have to say that one has to consider the quality of genomes in the field (at least bacterial genomes). So-called "complete genome" submissions of the past, in the form of full continuous sequences of the organism's chromosomes and plasmids, are becoming a thing of the past; almost all of the new submissions are WGS (whole genome sequences), which are basically a bunch of "contigs": pieces of sequence not connected together, 10s, 100s and sometimes 1000s of them. (This is a result of adopti

  • A page advertisement ran in Nature, a leading science journal. The company was Axeq, based in Korea. Here is the ad [axeq.com]. The exome is the 2% of the genome that appears to code for proteins.
  • Moore's Law shouldn't be what the graph is based on. A best fit line would be much more accurate at predicting the price. Technology can have an effect on the price: increases in processor speed and more efficient algorithms can both decrease the time spent processing, thereby decreasing the overall cost. But economies of scale will have a much more dramatic effect. When firms increase in size or increase their production, their costs generally go down until they hit a certain point. http://en.wikipedia.org/ [wikipedia.org]
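One way to read "best fit line" for a cost curve plotted on a log axis is an ordinary least-squares line through log10(cost) versus time. That is the assumption this sketch makes. It is tested on synthetic data (cost falling tenfold every two years from $1000), not on the NHGRI figures.

```python
import math


def fit_exponential(years: list[float], costs: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(cost) = a + b * year; returns (a, b)."""
    n = len(years)
    logs = [math.log10(c) for c in costs]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a: float, b: float, year: float) -> float:
    """Cost predicted by the fitted exponential at a given year."""
    return 10 ** (a + b * year)


# Synthetic example: years 0..5, cost falls 10x every 2 years from $1000.
years = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
costs = [1000 * 10 ** (-0.5 * t) for t in years]
a, b = fit_exponential(years, costs)
print(round(a, 6), round(b, 6), predict(a, b, 6.0))
```

A fitted exponential keeps shrinking toward $0 but never reaches it, which is worth keeping in mind when reading extrapolations off a log-scale axis.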
    • by mlush (620447)
      Moore's Law is exactly what we should be measuring this against. CPU speed is proportional to the amount of data that can be processed, and it looks like we're headed for an era where there is more data than we can process!
  • Jaw-dropping graphs indeed, especially the Cost per Megabase one. Extend the graph further to the right, and it appears that Moore's law will reach $0 sometime in 2033.

    They mislabeled the graph. That should be $0.1 in the lower left, not $0. Of course, you could say they are rounding, but then they are rounding to a number more granular than what their chosen Y axis range calls for.

  • Complete Genomics claims they will be able to sequence a genome for $5,000. [nature.com] Although I haven't heard from them in a while.
  • This also means the costs to podunk police departments to synthesize and plant DNA evidence where they want drops as well. The proof will be the exponential increase in the number of calls on innocent citizens to 'donate' DNA samples to track down the 'criminal in our midst du jour'. After all: if you've done nothing wrong, you have nothing to hide, right?

