Supercomputing Biotech Science

Decoding the Genome: Serious Infrastructure

Posted by timothy
from the mine-decodes-to-'not-a-winner-try-again' dept.
Roland Piquepaille writes "The Wellcome Trust Sanger Institute is one of the largest genomics data centers in the world. In "The Hum and the Genome," the Scientist writes about the IT infrastructure needed to handle the avalanche of data that researchers have to analyze. With its 2,000 processors and its 300 terabytes of storage, the data center uses today about 0.75 megawatts (MW) of power at a cost of 140,000 per year (about $170K). But the data center will need more than a petabyte of storage within three years, and its yearly electricity bill will reach 500,000 (more than $600K) for about 1.4 MW, enough to power more than a thousand homes. The original article gets all the facts, but this summary contains all the essential numbers."
This discussion has been archived. No new comments can be posted.

  • by JamesD_UK (721413) on Tuesday June 07, 2005 @05:33AM (#12744828) Homepage
    Lots of computers use lots of power which costs lots of money!
    • I don't know which is the best in the ix86 market at the moment, possibly Via, possibly Intel, used to be Transmeta.

      It's one of the things many geeks tend not to consider when they're dreaming up their ideal ultra powerful, ultra cheap beowulf cluster: the fact that you need a megawatt's worth of power and a megawatt's worth of cooling to go along with those $400 1U high density servers running the latest 4GHz AMD CPUs. Suddenly those cheap servers don't look so cheap.

      • Good to see you've got your facts straight before you posted.

        AMD does not have a CPU running anywhere NEAR 4GHz, you're thinking of Intel.

        As for power consumption:
        "Even the Athlon 64 X2 4800+ consumes less power than all single core 90nm Pentium 4 CPUs" - Anandtech

        For more information please see this [anandtech.com] and this [anandtech.com]

        For less power and better performance, use AMD.
      • If you're doing this seriously, 1U servers are too big. IBM BladeCenter: 14 dual-processor blades in a 7U chassis, 6 of which fit in a rack. The power cables look like garden hoses.
    • by Anonymous Coward
      Quit badmouthing my man Roland! He provides a valuable service to slashdot, and he even gave me a blowjob for only $10! Roland will do ANYTHING for many, god bless his 'im.
    • enough to power 1000 homes with the equivalent power of distributed computing software?

      probably not.
  • Amazing! (Score:3, Funny)

    by poopdeville (841677) on Tuesday June 07, 2005 @05:34AM (#12744831)
    The Wellcome Trust Sanger Institute is amazing. It will:

    - optimize seamless communities
    - generate vertical e-services
    - leverage synergistic convergence

    and best of all

    - engage e-business content Perfect solution
    • Re:Amazing! (Score:1, Insightful)

      by Anonymous Coward

      The Trust also believes that the basic DNA sequence of humans and other organisms as such should be placed in the public domain as soon as is practical, without any fees, patents, licences or limitations on use, giving free and equal access to all. Subject to this, the Trust is supportive of patents encompassing genes and their products when there is research data or information indicating that a particular DNA sequence has a utility such that the legal criteria for patenting can be met.

      Exactly the same

  • by LiquidCoooled (634315) on Tuesday June 07, 2005 @05:34AM (#12744833) Homepage Journal
    I misread that and thought it involved a spotlight and torture methods to a poor garden gnome :(

    "You will tell us what we need to know. WHERE IS THE LAWN MOWER!"
  • by Anonymous Coward
    I think I'm immune to large numbers. This article summary has absolutely no effect on me whatsoever.
  • by Dancin_Santa (265275) <DancinSanta@gmail.com> on Tuesday June 07, 2005 @05:36AM (#12744841) Journal
    The idea behind all this mapping is to find genetic sequences that can be used to mend ailing people. Using a computer to throw every single combination possible against the wall and seeing what sticks is certainly a way to go about this, but it also raises the spectre of a single large company owning all these combinations. This wouldn't be such a terrible thing if there was some sort of actual science involved, but by brute-forcing results, they are doing nothing more complicated than running a counting program with an infinite number of bits.

    So each result is directly traceable to a number. Will these companies own these numbers? Can you even take out a patent on a number? In the DeCSS case, it was argued that the decoding algorithm was protected even though some implementations of it were nothing more than a carefully crafted prime number.

    I don't like the idea of someone owning numbers any more than I think someone should be entitled to the fruits of their own work. This whole patent "creation/reward" system is getting turned on its head because of the power of computers. What would have been prohibitive even 10 or 15 years ago is possible (even easy) now. How can we keep our rights without sacrificing the progress of science and the arts?
    • by Hittite Creosote (535397) on Tuesday June 07, 2005 @06:12AM (#12744954)
      The centre is funded by the Wellcome Trust and the UK's Medical Research Council. The Wellcome Trust Sanger Institute is a non-trading, non-profit making registered charity. And they tend to make their results open - these are the people who said that the genome should belong to no one individual or company. In other words, if you want to keep your rights without sacrificing the progress of science - we need more places like the Sanger centre.
    • by Gurdy (155196) on Tuesday June 07, 2005 @06:14AM (#12744962)
      > "it also raises the spectre of a single large company owning all these combinations."

      You might be interested to read our data release policy http://www.sanger.ac.uk/Projects/release-policy.shtml [sanger.ac.uk] which describes how the finished data is made publicly available, to all, no charge.

      (I work at the Sanger Centre.)

      Dave
    • Well the nice thing about the Wellcome Trust is that they are an independent charity and the largest non-corporate non-governmental source of biomedical research funding in the UK.

      Maybe you'd like to read their constitution: here [wellcome.ac.uk]

      Sure, there's a chance that things can get tied up in the hands of companies - but let's look at the human genome project. The best data came out of the academic sector, the private data (held by Celera) didn't turn out to be too profitable after all (or even better quality) and is
    • This wouldn't be such a terrible thing if there was some sort of actual science involved, but by brute-forcing results, they are doing nothing more complicated than running a counting program with an infinite number of bits.

      This way of genome decoding is much more spectacular. It attracts investors. And doing it the hard way will definitely bring some results.

      But a bunch of boring scientists, writing boring equations, can also result in zero success. On the other hand it could save a whole lotta money for

    • As noted, the Sanger, like the rest of the public HGP centers, makes their data accessible to everyone for free. You're thinking of Celera, who got out of the business.

      (Claimer: I work at another of the centers. Similar scale, but those folks at Sanger have more server room floor space...sigh.)
  • I wish I could get the submitter's exchange rate. I'd be rich rich rich. It's currently around 1.9 dollars to the pound, meaning annual running costs are more like $260k, which could rise to around $1m.

    Having said that everything is cheaper on the US side of the pond so the submitter is probably about right. Sigh.

    • I think the prices quoted are Euros.
      Currently 1 Euro = 1.23 USD, so I think the article is about right.
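For what it's worth, here is a quick sketch of both readings. It assumes the rates quoted in this thread (1.9 USD per pound, 1.23 USD per euro) and the summary's figures of 140,000 and 500,000; the function name `to_usd` is made up for illustration.

```python
# Rates as quoted in this thread (June 2005); treat them as assumptions.
USD_PER_GBP = 1.9
USD_PER_EUR = 1.23


def to_usd(amount, rate):
    """Convert an amount to US dollars at the given rate."""
    return amount * rate


for label, amount in (("today", 140_000), ("in three years", 500_000)):
    print(f"{label}: read as GBP ${to_usd(amount, USD_PER_GBP):,.0f}, "
          f"read as EUR ${to_usd(amount, USD_PER_EUR):,.0f}")
```

Read as euros, the figures land close to the summary's $170K and $600K; read as sterling, they come out near $266K and $950K.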
      • The trouble is, the poster was too lazy to a) look up the ascii code for or b) type EUR in front of the figures (as in EUR400,000).
        Both sterling and euros are used when quoting costs in Britain.
        • Ok, I'll revise that

          The trouble is, Slashdot is too shit to handle the Euro symbol.
          • Re:Exchange Rate (Score:2, Informative)

            by Anonymous Coward
            Ways to put the Euro symbol in webpages:
            • Hex code 0xA4 (decimal 164) in codepage 8859-15 is what you get when you press AltGr+e. This happens to be the general currency symbol in 8859-1, so it's not a good choice if you can't make sure that the document comes with the correct encoding declaration. ""
            • HTML entity &euro; "€"
            • Unicode character reference &#8364; ""
            • Hexadecimal unicode character reference &#x20AC; ""

            As you can see, Slashcode filters all but the html entity, so that's your onl
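The encoding point can be checked with a few lines of Python. This is only a sketch using the standard library's codecs and `html.unescape`; it says nothing about how Slashcode itself handles the characters.

```python
import html

EURO = "\u20ac"  # the euro sign

# ISO 8859-15 puts the euro sign at byte 0xA4 ...
assert EURO.encode("iso-8859-15") == b"\xa4"
# ... but the same byte read as ISO 8859-1 is the generic currency sign.
assert b"\xa4".decode("iso-8859-1") == "\u00a4"

# The named entity and both numeric references resolve to the same character.
for ref in ("&euro;", "&#8364;", "&#x20AC;"):
    print(ref, "->", html.unescape(ref))
```

This is why a page with a missing or wrong encoding declaration can show the currency sign where a euro sign was meant, while the entity and numeric references do not depend on the page encoding.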

  • Doing some quick math here: 2000 processors+1petabyte, divide by 1000=
    2 processors + 1TB per house.
    In processors: Way past it
    In storage: Getting there (quick count of harddisks lying around= 750GB at least)

    Since my energy bill is lower, even with the hardware running 24/7/365, are they buying their energy too expensive or what?
    • The computing power density of 1000 homes is a lot lower. 1000 homes + the ground they sit on also cost more to buy/build than the one datacenter.
      • It is more about the power consumption. It just seems too expensive. I just tried to compare it in a way which makes their claims just sound too big.

        I do not propose a shared/distributed infrastructure, especially not for the storage (if they use up my 750GB, where do I leave my own data? Offline on DVDs?)
        • It is more about the power consumption

          Yep.

          First off, utility companies generally charge a higher rate for business/industrial power than they do for residential power; so even if all things were equal, they'd still be paying more per kWh than you.

          Secondly, you can't compare a couple of desktop machines running in a home office to a datacenter with multiple fully-populated 72U racks. Running 2 or 3 computers in a 120 ft^2 room isn't going to require any additional cooling. Running 2000 machines


          • First off, utility companies generally charge a higher rate for business/industrial power than they do for residential power; so even if all things were equal, they'd still be paying more per kWh than you.


            Actually, you have that backwards. Residential power is usually more expensive. Think of it as buying in volume. Additionally, some businesses and many industrial power users negotiate lower rates with the stipulation that in case of a certain stage of power consumption/power shortage, their power wil
    • It's simple really. They have to power more stuff than your house does.

      Whereas one computer doesn't really produce enough heat to cause a problem in the house (well, depending...), 2000 do. This requires an in-building air-conditioning system to vent the heat, which adds a LOT to the energy bill.

      The computers themselves are usually a small load when it comes to the utilities of the building. Oh, of course there are monitors, and things you'd find in an ordinary home (well, probably microwaves, coffee p
    • You have to add in the 'other' stuff too - network boxes, firewalls, routers, switches, etc. Also lights and AC for it all in one place. Not all the processors in the houses will be top of the line either, which adds to the AC bill. Maybe factor in redundancy in the network too - I wouldn't want a SPOF in an array that large. Stuff adds up...
  • Windows (Score:2, Funny)

    by Elshar (232380)
    They must be using Windows ClusterFun edition.
  • by goneutt (694223) on Tuesday June 07, 2005 @05:50AM (#12744892) Journal
    TANSTAAFL. This post seems drawn into the spinning power meter dials and not caring about what the computer is. If you want a lot of power, you need a lot of power. Chip scale efficiency could reduce their bill, but it's a research foundation crunching numbers all day. If they need more money they just ask their contributors politely.
    How does this stack up against Google's server farm bill?
  • by itsme (6372) on Tuesday June 07, 2005 @05:54AM (#12744903) Homepage
    http://www.archive.org/web/petabox.php [archive.org]

    It uses only 60 kW for 1 petabyte.
  • by manavendra (688020) on Tuesday June 07, 2005 @05:58AM (#12744916) Homepage Journal
    What about the costs of scaling and maintaining such an infrastructure? The routine administrative tasks, reporting, etc? The costs for someone actually looking at the generated results to see if they are meaningful at all, and if it is all going in the right direction?
  • Math (Score:5, Interesting)

    by Alphanos (596595) on Tuesday June 07, 2005 @06:08AM (#12744943)
    Cost of 0.75 MW: ~$170K
    $/MW: ~$227K

    Cost of 1.4 MW: >$600K
    $/MW: >$429K

    Why the difference?
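The arithmetic above can be reproduced with a short sketch. The figures are the summary's; the helper name `cost_per_mw` is just for illustration.

```python
# Figures from the summary: (power in MW, yearly cost in thousands of USD).
today = (0.75, 170)
in_three_years = (1.4, 600)


def cost_per_mw(power_mw, cost_kusd):
    """Yearly cost per megawatt of draw, in thousands of USD."""
    return cost_kusd / power_mw


now = cost_per_mw(*today)
later = cost_per_mw(*in_three_years)
print(f"today: ~${now:.0f}K/MW, in three years: >${later:.0f}K/MW, "
      f"ratio {later / now:.2f}x")
```

On these numbers the projected cost per MW is a little under twice today's.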
    • Part of it is probably inflation, but personally I'm hoping my electricity bill isn't going to double in the next three years...
    • Re:Math (Score:3, Interesting)

      by Walkiry (698192)
      >Why the difference?

      Presumably, the infrastructure to get 1.4 MW safely inside the same building and distribute it is more complicated and expensive than what two independent 0.75 MW would be. Things tend to go down in price when you buy in bulk, until you reach a point where the amount you're asking for is giving more trouble than what is usually dealt with.
    • Maybe it's a cleaner grade of electron ... you know the stuff audiophiles use
    • Re:Math (Score:3, Insightful)

      by Renraku (518261)
      Diminishing returns.

      You've gotta have a lot of infrastructure outside the facility to be able to support 1.4MW. Infrastructure that is probably taken care of by the power company, for a fee.

      And the more power you push down the line, the more power that is lost to the environment. Especially if you're overcharging the lines, which causes acceleration of the loss the more power you pump into them.
  • Units (Score:3, Insightful)

    by Hank Chinaski (257573) on Tuesday June 07, 2005 @06:37AM (#12745033) Homepage
    They use Megawatts as a measurement of energy consumption? Shouldn't that be Megawatt/hour? P.S.: Don't click the link. Editors could at least include a "Signup required" warning.
    • No, MW is the correct way to express it.
    • Re:Units (Score:1, Insightful)

      by Anonymous Coward
      They use Megawatts as a measurement of energy consumption? Shouldn't that be Megawatt/hour?

      No. Didn't you pay attention in high school? A megawatt is a unit of power. Power is energy divided by time. A watt is one Joule per second. A joule is a unit of energy.

      So, watts means joules per second. When you get your household electric bill it is in kilowatt-hours, which is the number of kilowatts multiplied (not divided) by the number of hours for which you consumed that many kilowatts.

      So, since a watt is energy/time, a kilowatt-h
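As a small worked example of power versus energy, the sketch below multiplies power by time. It assumes, purely for illustration, that the data center draws a constant 0.75 MW all year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def energy_kwh(power_kw, hours):
    """Energy (kWh) = power (kW) multiplied by time (h)."""
    return power_kw * hours


# Assumption: a constant draw of 0.75 MW (750 kW) for the whole year.
yearly_kwh = energy_kwh(750, HOURS_PER_YEAR)
print(f"{yearly_kwh:,} kWh per year")
```

So a megawatt figure describes the rate of consumption, and the kilowatt-hours on a bill are that rate multiplied by the time it was sustained.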
  • by Pegasus (13291)
    It seems all their boxen are based on Alpha processors. Why? Simply, because even today, you can get the most flops per clock tick out of Alpha. It's a shame such a wonderful architecture was buried.

    Anyway, I think I'll be the first in line when they decide to retire their gs320 servers :)
    • not any more: the Itanium2 is 2 times as fast by that metric. Alpha @ 833MHz: SPECfp2000 644 SPECfp_base2000 571. Itanium2 @ 1.6 GHz: SPECfp2000 2675 SPECfp_base2000 2675. Alpha was cool in its day, some of those ideas went into Itanium. Itanium will "flop" in market for same reasons 8D
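The "2 times as fast" claim can be checked from the numbers quoted in that comment. The sketch uses SPECfp2000 score per MHz as a rough stand-in for "flops per clock tick", which is an assumption and not the same measurement.

```python
# SPECfp2000 scores and clock speeds as quoted in the comment above.
chips = {
    "Alpha": {"mhz": 833, "specfp2000": 644},
    "Itanium2": {"mhz": 1600, "specfp2000": 2675},
}

per_mhz = {name: c["specfp2000"] / c["mhz"] for name, c in chips.items()}
for name, value in per_mhz.items():
    print(f"{name}: {value:.2f} SPECfp2000 per MHz")

ratio = per_mhz["Itanium2"] / per_mhz["Alpha"]
print(f"ratio: {ratio:.1f}x")
```

On these figures the Itanium2 comes out at a bit more than twice the Alpha per clock.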
  • by Wayne247 (183933) <slashdot@laurent.ca> on Tuesday June 07, 2005 @06:43AM (#12745051) Homepage
    The interesting bit about genome research is this: suppose we do find out what the human genetic code all means. We can then start treatments to correct genetic problems, right? If we do so, and say we correct illness X on some kid. When this kid grows up, becomes an adult and has kids of his own, what kind of genetic heritage will he give his own kids? Will these kids inherit the original bad gene of their parent? If so, we'd be running at a loss since defects would multiply across generations...
    • To correct the kid's kids, you need to make the correction in the gamete, before the original kid is conceived. Maybe I'm not reading enough lately, but from Huxley to Gattaca, I don't recollect anyone actually trying that method...
    • We can do either (Score:4, Informative)

      by cookie_cutter (533841) on Tuesday June 07, 2005 @08:56AM (#12745714)
      Will these kids inherit the original bad gene of their parent?

      It depends. If you are doing somatic cell genetic engineering, then you only fix those cells in the patient in which the defect manifests itself, and not the germ-line cells (ie, sperm and eggs), so the 'fix' is not passed on to the next generation. If instead you modify the germ-line cells as well, then the 'fix' is passed on to the next generation.

      One of the main reasons for doing the somatic fix rather than the germ-line fix is that we're still pretty damned new to this genetic engineering thingy, so it's probably a good idea to not fuck with the genetic heritage of future generations just to cure a patient today. However, as the science and technology develops, and we gain more experience with it, our self-assuredness in our abilities will likely increase, and we'll think we know what we're doing enough to risk making 'permanent' changes to the germ-line. I put 'permanent' in quotes, because if we make genetic changes one way, we should be able to turn them back if and when we decide they are mistakes.

    • Every person has about 7 major genetic defects in their DNA, which only become apparent when two parents with the same defective gene have children.

      Dealing with genetic diseases relies on three stages:

      1. Identify which genes cause the problem and how they are passed on through the generations; whether they are dominant or recessive.

      2. Create a test which determines which genes each parent has. From this information, it is possible to determine whether the disease will be passed on to their children.

      If
  • Don't bother decoding the thing!! Ask God for the password!!
  • I wonder what the target of this research is. Daily I hear news on TV about people dying of hunger in Africa and other parts of the world. Can't this money be used there? Or am I nuts to think that way.
    • Re:Research target (Score:1, Informative)

      by Anonymous Coward

      The human genome is only one of the many genomes being studied at the Institute.

      Among the organisms being actively studied at the Sanger Institute are Plasmodium falciparum, the organism that causes malaria, and Anopheles gambiae, a mosquito. Study of both of these will hopefully reap huge benefits in the treatment, prevention and perhaps eventually eradication of malaria.

      The Pathogen Sequencing Unit that's doing that is also studying other major third world diseases, such as plague.

      And much of what

    • So could every dime you spend on CableTV, eating out, video games, video cards, and anything else that you do not need to survive. You could also stop wasting time and get a second job and donate all that money to feed the poor. Remember that when you say "they" should do something about it... You are they.
      That rant aside this research could lead to cures for all sorts of diseases that are currently killing people. So yes you are nuts to think that way.
    • Brand new trolling account?
    • people dying of hunger in Africa and other parts of the world. Can't this money be used there?
      Yes, I'm sure $FAVOURITE_KLEPTOCRAT could find the space to squeeze another Mercedes in.
  • That remark should be sufficient here. I mean... whoa...
  • Do YOU hate Roland Piquepaille? It doesn't have to be so. With my scientifically proven brainwashing program, you can rid yourself of piquephobia forever!

    http://www.bemmu.com/pique/ [bemmu.com]
  • My car requires 1.21 jigawatts and a flux-capacitor.
  • Don't tell me endlessly repeated combinations of the same four base pairs needs 300 TB...
    • Don't tell me endlessly repeated combinations of the same four base pairs needs 300 TB...

      I think it needs more than 300TB; in fact, it probably needs an infinite amount of space
    • Compression would add to the need for a) greater data redundancy (because compression errors DO happen from time to time), b) more computational time (unless someone made a Gzip chip and stuck it on an HD controller.. *ponders*), and c) would be terribly cost-inefficient.

      HD's are a dime a dozen. CPUs are not. If you have to have more costly CPUs running your File servers, that means less costly CPUs to run your Genetic Algorithms (pardon the pun).
  • Friend of mine manages a cluster that models the world's oceans. One thing they forgot about when planning it was the cooling needs. That added a nice chunk to the budget.

    I doubt they even looked at the power requirements.

    But it is cool to have access to a super computing cluster.

  • Those stats sound roughly comparable, if anything slightly lower, than what a private company I know of runs for seismic data processing.
  • So much for the whole, "only as complex as a fruit fly" blurb which people use to say humans are simple creatures.

  • as to what they are actually doing with all this computing power.

    OK I broadly understand 'sequencing the human genome' is mapping out all the combinations of genes. There are 23 chromosomes in the human genome. That chromosomes are a pair of the genes. I understand that each gene is one of four DNA molecules called A,G,C & T. There are 16 combinations of those molecules and I can map those out with a pencil and paper, I can produce all 23 sets with desktop computing power.

    So why does it take so much com
    • by oneandoneis2 (777721) on Tuesday June 07, 2005 @09:43AM (#12746055) Homepage
      No, no!

      There are 23 chromosomes in the human genome. That chromosomes are a pair of the genes. I understand that each gene is one of four DNA molecules called A,G,C & T. There are 16 combinations of those molecules and I can map those out with a pencil and paper, I can produce all 23 sets with desktop computing power.

      There are 23 chromosomal pairs. Each half of each pair contains the same (more or less) information - you could think of it as a genetic back-up system. (Except for the XY chromosomal pair in males). At the start, one chromosome is maternal, the other is paternal. But over time, they actually swap bits around until there's a mixture.

      Each chromosome contains one immensely long strand of DNA, a double-helix. This double helix is NOT redundant, only one of the two strands contains genetic information: The other strand is only there to make it easier to copy the helix.

      The human genome is approximately 3 billion bases long, and it takes three bases (known as a codon) to code one amino acid. 4 x 4 x 4 = 64 possible codons (although they only actually code for 20 or so amino acids). Then you have to filter out all the codons that don't actually code anything, and are discarded before the gene is translated into a protein.

      NOW do the math!
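Taking up that invitation, here is a back-of-the-envelope sketch. It assumes a naive packing of 2 bits per base and the roughly 3 billion bases stated above; nothing here reflects how the Sanger Institute actually stores its data.

```python
from itertools import product

BASES = "ACGT"
GENOME_LENGTH = 3_000_000_000  # approximate, as stated above

# Four possible bases fit in 2 bits each (an assumed, naive packing).
BITS_PER_BASE = 2
genome_bytes = GENOME_LENGTH * BITS_PER_BASE // 8
print(f"one genome at 2 bits/base: {genome_bytes / 1e6:,.0f} MB")

# Three-base codons: 4 * 4 * 4 combinations.
codons = ["".join(c) for c in product(BASES, repeat=3)]
print(len(codons), "possible codons")
```

One bare sequence at this packing is well under a gigabyte, so whatever fills 300 TB, it is more than a single finished sequence.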

      • and overlap. please see my other post linking to the http://www.ensembl.org/ [ensembl.org] genome browser.

        if you want to see a very dense genome, try looking at some viruses. they take advantage of the fact that each amino acid used to make the protein machinery is encoded using three bases, and so can put three genes almost on top of each other. It's on the level of funkiness of a programmer writing a sequence of bits in machine language where 8 fully functional programs could be derived depending on whether you
    • I don't really know what these guys are doing with their computing power, but one cool free bioinformatics resource that allows you to browse the genome is

      http://www.ensembl.org/ [ensembl.org]

      User interface is fairly intuitive and well documented.

      You can see that serving this information is a non-trivial engineering problem.
  • Unfortunately I'm lying
    • I heard it wasn't an arrest, but that he volunteered to be a defense witness at the Michael Jackson trial. The deal Piquepaille offered was that in return for testimony admitting to being the real owner of Michael's porn mags, he'd get an exclusive interview on primidi.com, complete with pix of Michael's detachable nose collection, and banner ads from Crazy Glue. Or was it 3M duct tape?

      The defense lawyers said they'd wait and see - they want to hear back from a rebuttal witness (the goat.cx guy) first, bec

  • Only one person in the world has ever claimed to have met him - in the pressroom at Microsoft Devshed Conference in Boston complete with a Roland Pipaquelle badge - and described him as a fortyish reddish-blonde who giggled a lot.

    Oh yeah? Wonder what cold crème he uses. Rolland Pipaquelle is a 61-year-old Jehovah's Witness who lives in a shabby genteel garden apartment in desperate need of an interior decorator on a heavily trafficked commercial road at nnnn XXXXXXXXX XXX. XXXXXX, New York. XXXXXX is
  • by KoReE (4358)
    It's kind of sad that the datacenter I work in does nothing anywhere near as important as genome number crunching.....yet uses a TON more power, and has WAY more storage than this genome DC in this story...
  • I have a sony DMS-8400 petabyte storage array sitting in storage. They should buy it from me.

    You wouldn't believe how hard it is to sell something like this. It seems like any of the companies that need it have the money to purchase it new. Argh!

  • Be careful not to get a Flux Capacitor too close to this
  • The Wellcome Institute... I wonder how they get their samples?
  • "A shift from proprietary to commodity hardware will also help keep costs down, as will the planned move away from a proprietary 64-bit operating system to open-source Linux. Though the move chimes with Sanger's open-source ethos for its sequence data, Butcher cites solid practical reasons for the change. "HP [Hewlett Packard] pulled the plug on the Alpha chip," he says, "so we have nowhere to go." Moving to another proprietary system means it could happen again, he says. "I want something we can rely on an
  • I just saw a commercial the other day from IBM. In the commercial, 2 scientists were looking for computers to help them map the human genome and replicate the folding of proteins.

    The IBM representative said "Here is Gene, it is able to fold proteins and map the entire human genome". It was a cluster of IBM systems (maybe 40 total).

    I just laughed and tried to explain to my wife how much BS this was (which basically describes all marketing).
  • That would mean that the mean power consumption of a house is 1.4 kW. A good hair dryer uses that much. Large appliances like air-conditioners, refrigerators, dishwashers, clothes dryers, etc. are all around 1-2 kW.

    1.4 kW is about 2 horsepower. At 110 V, 1.4 kW is a current draw of 12 A. (At 220 V, it's 6 A.) I guess "over 1000 houses" sounds much better than "a few hundred houses."
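The per-house figures can be checked with a short sketch. It takes the summary's 1.4 MW and "a thousand homes" at face value and assumes the mechanical horsepower of about 745.7 W; the current is plain power divided by voltage.

```python
TOTAL_POWER_W = 1.4e6   # 1.4 MW, from the summary
HOMES = 1000            # "more than a thousand homes"
WATTS_PER_HP = 745.7    # mechanical horsepower (assumed definition)

per_home_w = TOTAL_POWER_W / HOMES
print(f"per home: {per_home_w / 1000:.1f} kW, "
      f"{per_home_w / WATTS_PER_HP:.1f} hp")

# Current draw at a given supply voltage: I = P / V.
for volts in (110, 220):
    print(f"at {volts} V: {per_home_w / volts:.1f} A")
```

That works out to a bit under 2 hp and roughly 12.7 A at 110 V, in line with the rounded figures above.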

  • 300 terabytes, advancing to a petabyte in three years? I can see it now.

    "Yeah! We got it! The whole Human Genome! We scanned that sucker in... wait, what? You meant... /all/ the genome? Okay. Uhm. Well, we already got Alice scanned in. I guess we can have Bob done in another couple years or so, maybe Charlie after that, but I think we're going to need to look at how we're doing this..."
