Decoding the Genome: Serious Infrastructure
Roland Piquepaille writes "The Wellcome Trust Sanger Institute is one of the largest genomics data centers in the world. In "The Hum and the Genome," The Scientist writes about the IT infrastructure needed to handle the avalanche of data that researchers have to analyze. With its 2,000 processors and its 300 terabytes of storage, the data center today uses about 0.75 megawatts (MW) of power at a cost of 140,000 per year (about $170K). But the data center will need more than a petabyte of storage within three years, and its yearly electricity bill will reach 500,000 (more than $600K) for about 1.4 MW, enough to power more than a thousand homes. The original article has all the facts, but this summary contains all the essential numbers."
Breaking news from our reporter Roland Piquepaille (Score:5, Funny)
Some computers use more power and do less (Score:2)
It's one of the things many geeks tend not to consider when they're dreaming up their ideal ultra powerful, ultra cheap beowulf cluster: the fact that you need a megawatt's worth of power and a megawatt's worth of cooling to go along with those $400 1U high density servers running the latest 4GHz AMD CPUs. Suddenly those cheap servers don't look so cheap.
Re:Some computers use more power and do less (Score:2, Informative)
AMD does not have a CPU running anywhere NEAR 4GHz, you're thinking of Intel.
As for power consumption:
"Even the Athlon 64 X2 4800+ consumes less power than all single core 90nm Pentium 4 CPUs" - Anandtech
For more information please see this [anandtech.com] and this [anandtech.com]
For less power and better performance, use AMD.
Re:Some computers use more power and do less (Score:2)
Re:Breaking news from our reporter Roland Piquepai (Score:1, Funny)
Re:Breaking news from our reporter Roland Piquepai (Score:2, Funny)
probably not.
Amazing! (Score:3, Funny)
- optimize seamless communities
- generate vertical e-services
- leverage synergistic convergence
and best of all
- engage e-business content Perfect solution
Re:Amazing! (Score:1, Insightful)
Exactly the same
Decoding the gnome? (Score:5, Funny)
"You will tell us what we need to know. WHERE IS THE LAWN MOWER!"
Re:Decoding the gnome? (Score:1, Funny)
Re:Decoding the gnome? (Score:2)
emerging gnome, package 3 of 46438847...
That's... interesting (Score:1, Funny)
Who owns the results? (Score:5, Interesting)
So each result is directly traceable to a number. Will these companies own these numbers? Can you even take out a patent on a number? In the DeCSS case, it was argued that the decoding algorithm was protected even though some implementations of it were nothing more than a carefully crafted prime number.
I don't like the idea of someone owning numbers any more than I think someone should be entitled to the fruits of their own work. This whole patent "creation/reward" system is getting turned on its head because of the power of computers. What would have been prohibitive even 10 or 15 years ago is possible (even easy) now. How can we keep our rights without sacrificing the progress of science and the arts?
Re:Who owns the results? (Score:4, Informative)
Re:Who owns the results? (Score:5, Informative)
You might be interested to read our data release policy http://www.sanger.ac.uk/Projects/release-policy.s
(I work at the Sanger Centre.)
Dave
Re:Who owns the results? (Score:2, Informative)
http://www.ensembl.org/ [ensembl.org]
another sangerite
Re:Who owns the results? (Score:1)
Re:Who owns the results? (Score:3, Informative)
Maybe you'd like to read their constitution: here [wellcome.ac.uk]
Sure, there's a chance that things can get tied up in the hands of companies - but let's look at the human genome project. The best data came out of the academic sector, the private data (held by Celera) didn't turn out to be too profitable after all (or even better quality) and is
Re:Who owns the results? (Score:1)
This way of genome decoding is much more spectacular. It attracts investors. And doing it the hard way will definitely bring some results.
But a bunch of boring scientists, writing boring equations, can also result in zero success. On the other hand it could save a whole lotta money for
Re:Who owns the results? (Score:2)
(Claimer: I work at another of the centers. Similar scale, but those folks at Sanger have more server room floor space...sigh.)
Exchange Rate (Score:1)
I wish I could get the submitter's exchange rate. I'd be rich rich rich. It's currently around 1.9 dollars to the pound, meaning annual running costs are more like $260k, which could rise to around $1m.
Having said that everything is cheaper on the US side of the pond so the submitter is probably about right. Sigh.
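A quick sketch of that conversion, for anyone who wants to check it. The amounts are the ones in the summary and the 1.9 rate is the one quoted above; the variable names are made up for illustration, and the currency of the original figures is not stated in the summary.

```python
# Amounts as given in the summary (currency symbol missing there).
COST_NOW = 140_000       # yearly cost today
COST_FUTURE = 500_000    # projected yearly cost
USD_NOW = 170_000        # summary's dollar figure for today

RATE_STERLING = 1.9      # dollars per pound, as quoted in this comment

# What the bills would be in dollars if the amounts were in pounds.
usd_now_if_sterling = COST_NOW * RATE_STERLING
usd_future_if_sterling = COST_FUTURE * RATE_STERLING

# The rate the submitter must have used to get from 140,000 to $170K.
implied_rate = USD_NOW / COST_NOW

print(f"Today, at 1.9:    ${usd_now_if_sterling:,.0f}")
print(f"Future, at 1.9:   ${usd_future_if_sterling:,.0f}")
print(f"Submitter's rate: {implied_rate:.2f}")
```

At 1.9 the two bills come to roughly $266K and $950K, and the rate implied by the summary works out to about 1.21.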
Re:Exchange Rate (Score:1)
Currently 1 Euro = 1.23 USD, so I think the article is about right.
Re:Exchange Rate (Score:1)
Both sterling and euros are used when quoting costs in Britain.
Re:Exchange Rate (Score:1)
The trouble is, Slashdot is too shit to handle the Euro symbol.
Re:Exchange Rate (Score:2, Informative)
As you can see, Slashcode filters all but the html entity, so that's your onl
Re:Exchange Rate (Score:1)
Enough to power a thousand homes (Score:2, Interesting)
2 processors + 1TB per house.
In processors: Way past it
In storage: Getting there (quick count of harddisks lying around= 750GB at least)
Since my energy bill is lower, even with the hardware running 24/7/365, are they buying their energy too expensive or what?
Re:Enough to power a thousand homes (Score:2)
Re:Enough to power a thousand homes (Score:1)
I do not propose a shared/distributed infrastructure, especially not for the storage (if they use up my 750GB, where do I leave my own data? Offline on DVDs?)
Re:Enough to power a thousand homes (Score:3, Insightful)
Yep.
First off, utility companies generally charge a higher rate for business/industrial power than they do for residential power; so even if all things were equal, they'd still be paying more per kWh than you.
Secondly, you can't compare a couple of desktop machines running in a home office to a datacenter with multiple fully-populated 72U racks. Running 2 or 3 computers in a 120 ft^2 room isn't going to require any additional cooling. Running 2000 machines
Re:Enough to power a thousand homes (Score:2)
First off, utility companies generally charge a higher rate for business/industrial power than they do for residential power; so even if all things were equal, they'd still be paying more per kWh than you.
Actually, that is backwards. Residential power is usually more expensive. Think of it as buying in volume. Additionally, some businesses and many industrial power users negotiate lower rates with the stipulation that in case of a certain stage of power consumption/power shortage, their power wil
Re:Enough to power a thousand homes (Score:2)
Whereas one computer doesn't really produce enough heat to cause a problem in the house (well, dependingly...), 2000 do. This requires an in-building air-conditioning system to vent the heat, which adds a LOT to the energy bill.
The computers themselves are usually a small load when it comes to the utilities of the building. Oh, of course there are monitors, and things you'd find in an ordinary home (well, probably microwaves, coffee p
Re:Enough to power a thousand homes (Score:2)
Re:Enough to power a thousand homes (Score:1)
(I work at the Sanger Institute)
Windows (Score:2, Funny)
Re:Windows (Score:4, Funny)
I think you mis-spelled that.
Big computers = big power (Score:5, Insightful)
How does this stack up against Google's server farm bill?
have they heard of the petabox? (Score:5, Interesting)
It uses only 60 kW for 1 petabyte.
Re:have they heard of the petabox? (Score:1)
Shame. If I've got a petabyte, I'd rather have my system running a real OS.
VMS forever!!!
How's them uptime records going, eh?
Re:have they heard of the petabox? (Score:2)
Petafiles
If you didn't laugh, say it out loud.
Whats with the emphasis on power and its costs? (Score:4, Insightful)
Re:Whats with the emphasis on power and its costs? (Score:1)
Within the IT budget, the cost of maintaining the equipment & running it (including power & cooling) far outstrips the original purchase price.
[I work here too]
Math (Score:5, Interesting)
$/MW: ~$227K
Cost of 1.4 MW: >$600K
$/MW: >$429K
Why the difference?
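The arithmetic behind those figures can be sketched as below. The dollar and MW values come from the summary; the function name is just an illustration.

```python
def cost_per_mw(yearly_cost, power_mw):
    """Yearly cost divided by average power draw, in cost units per MW."""
    return yearly_cost / power_mw

# Dollar figures from the summary.
usd_today = cost_per_mw(170_000, 0.75)
usd_future = cost_per_mw(600_000, 1.4)

# Same calculation on the summary's original (non-dollar) amounts,
# to check that the jump is not an exchange-rate artifact.
local_today = cost_per_mw(140_000, 0.75)
local_future = cost_per_mw(500_000, 1.4)

print(f"$/MW today:   {usd_today:,.0f}")
print(f"$/MW future:  {usd_future:,.0f}")
print(f"Increase in dollars:        x{usd_future / usd_today:.2f}")
print(f"Increase in local currency: x{local_future / local_today:.2f}")
```

Either way the cost per MW nearly doubles (about 1.9x), so the conversion to dollars does not explain the gap.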
Re:Math (Score:1)
Re:Math (Score:3, Interesting)
Presumably, the infrastructure to get 1.4 MW safely inside the same building and distribute it is more complicated and expensive than what two independent
Re:Math (Score:1)
Re:Math (Score:3, Insightful)
You've gotta have a lot of infrastructure outside the facility to be able to support 1.4MW. Infrastructure that is probably taken care of by the power company, for a fee.
And the more power you push down the line, the more power that is lost to the environment. Especially if you're overcharging the lines, which causes acceleration of the loss the more power you pump into them.
Units (Score:3, Insightful)
Re:Units (Score:1)
Re:Units (Score:1, Insightful)
No. Didn't you pay attention in high school? A megawatt is a unit of power. Power is energy divided by time. A watt is one Joule per second. A joule is a unit of energy.
So, watts means joules per second. When you get your household electric bill it is in kilowatt-hours, which is the number of kilowatts multiplied (not divided) by the number of hours for which you consumed that many kilowatts.
So, since a watt is energy/time, a kilowatt-h
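To make the power-versus-energy distinction concrete, here is a small sketch using the summary's numbers. It assumes the 0.75 MW draw is constant all year, which the summary does not actually state, and the names are illustrative only.

```python
POWER_MW = 0.75           # average draw, from the summary
YEARLY_COST = 140_000     # yearly bill, from the summary (currency not given)

HOURS_PER_YEAR = 24 * 365             # assumption: constant load, non-leap year
power_kw = POWER_MW * 1000            # power: a rate, in kilowatts
energy_kwh = power_kw * HOURS_PER_YEAR  # energy: power multiplied by time

cost_per_kwh = YEARLY_COST / energy_kwh

print(f"Energy per year: {energy_kwh:,.0f} kWh")
print(f"Implied price:   {cost_per_kwh:.4f} per kWh")
```

Under that assumption the site uses about 6.57 million kWh a year, which puts the implied price at a little over 0.02 per kWh.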
Re:Units (Score:2)
Alpha! (Score:2)
Anyway, I think I'll be the first in line when they decide to retire their GS320 servers.
Re:Alpha! (Score:1)
Re:Alpha! (Score:1)
Genome - the dog chasing its tail? (Score:4, Insightful)
Re:Genome - the dog chasing its tail? (Score:2, Informative)
We can do either (Score:4, Informative)
It depends. If you are doing somatic cell genetic engineering, then you only fix those cells in the patient in which the defect manifests itself, and not the germ-line cells (ie, sperm and eggs), so the 'fix' is not passed on to the next generation. If instead you modify the germ-line cells as well, then the 'fix' is passed on to the next generation.
One of the main reasons for doing the somatic fix rather than the germ-line fix is that we're still pretty damned new to this genetic engineering thingy, so it's probably a good idea to not fuck with the genetic heritage of future generations just to cure a patient today. However, as the science and technology develops, and we gain more experience with it, our self-assuredness in our abilities will likely increase, and we'll think we know what we're doing enough to risk making 'permanent' changes to the germ-line. I put 'permanent' in quotes, because if we make genetic changes one way, we should be able to turn them back if and when we decide they are mistakes.
Re:Genome - the dog chasing its tail? (Score:2)
Dealing with genetic diseases relies on three stages:
1. Identify which genes cause the problem and how they are passed on through the generations; whether they are dominant or recessive.
2. Create a test which determines which genes each parent has. From this information, it is possible to determine whether the disease will be passed on to their children.
If
Password (Score:1)
Research target (Score:1)
Re:Research target (Score:1, Informative)
The human genome is only one of the many genomes being studied at the Institute.
Two of the organisms being actively studied at the Sanger Institute are Plasmodium falciparum, the parasite that causes malaria, and Anopheles gambiae, a mosquito that carries it. Study of both of these will hopefully reap huge benefits in the treatment, prevention and perhaps eventually eradication of malaria.
The Pathogen Sequencing Unit that's doing that is also studying other major third world diseases, such as plague.
And much of what
Re:Research target (Score:2)
That rant aside, this research could lead to cures for all sorts of diseases that are currently killing people. So yes, you are nuts to think that way.
Re:Research target (Score:2)
Re:Research target (Score:2)
Whoa (Score:1)
Cure your piquephobia (Score:1)
http://www.bemmu.com/pique/ [bemmu.com]
Big deal! (Score:2)
Compression, people, compression! (Score:1)
Re:Compression, people, compression! (Score:1)
I think it needs more than 300TB; in fact, it probably needs an infinite amount of space.
Re:Compression, people, compression! (Score:2)
HDs are a dime a dozen. CPUs are not. If you have to have more costly CPUs running your file servers, that means fewer costly CPUs to run your genetic algorithms (pardon the pun).
How true (Score:1)
I doubt they even looked at the power requirements.
But it is cool to have access to a super computing cluster.
It's not *that* special (Score:2)
Those stats sound roughly comparable to, if anything slightly lower than, what a private company I know of runs for seismic data processing.
Oh well. (Score:2)
humans are simple creatures (Score:2)
An Arabidopsis plant on the other hand, that like most plants survives by modifying its cells rather than running away from danger, that's complex.
I haven't a clue... (Score:2)
as to what they are actually doing with all this computing power.
OK, I broadly understand 'sequencing the human genome' is mapping out all the combinations of genes. There are 23 chromosomes in the human genome. The chromosomes are a pair of the genes. I understand that each gene is one of four DNA molecules called A, G, C & T. There are 16 combinations of those molecules and I can map those out with a pencil and paper; I can produce all 23 sets with desktop computing power.
So why does it take so much com
Re:I haven't a clue... (Score:5, Insightful)
There are 23 chromosomes in the human genome. The chromosomes are a pair of the genes. I understand that each gene is one of four DNA molecules called A, G, C & T. There are 16 combinations of those molecules and I can map those out with a pencil and paper; I can produce all 23 sets with desktop computing power.
There are 23 chromosomal pairs. Each half of each pair contains the same (more or less) information - you could think of it as a genetic back-up system. (Except for the XY chromosomal pair in males). At the start, one chromosome is maternal, the other is paternal. But over time, they actually swap bits around until there's a mixture.
Each chromosome contains one immensely long strand of DNA, a double-helix. This double helix is NOT redundant, only one of the two strands contains genetic information: The other strand is only there to make it easier to copy the helix.
The human genome is approximately 3 billion bases long, and it takes three bases (known as a codon) to code one amino acid. 4 x 4 x 4 = 64 possible codons (although they only actually code for 20 or so amino acids). Then you have to filter out all the stretches that don't actually code anything, and are discarded before the gene is translated into a protein.
NOW do the math!
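Taking that invitation literally, here is a rough sketch of the numbers. The 3 billion bases and three-base codons are from this comment; the 2-bits-per-base packing is an assumption made for illustration, and the result covers only one finished sequence, not raw instrument data, other genomes, or analysis output.

```python
from itertools import product

BASES = "ACGT"
GENOME_LENGTH = 3_000_000_000   # approximate base count, from the comment

# Every possible three-base codon: 4 x 4 x 4 combinations.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
num_codons = len(codons)

# Codon-sized positions in a single reading frame of the whole genome.
codons_per_frame = GENOME_LENGTH // 3

# Assumption: pack each base into 2 bits (4 symbols need log2(4) = 2 bits).
BITS_PER_BASE = 2
raw_bytes = GENOME_LENGTH * BITS_PER_BASE // 8

print(f"Possible codons:        {num_codons}")
print(f"Codons per frame:       {codons_per_frame:,}")
print(f"Packed sequence size:   {raw_bytes / 1e6:,.0f} MB")
```

Packed that way, one copy of the finished sequence is on the order of 750 MB.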
genes run in both directions (Score:2)
If you want to see a very dense genome, try looking at some viruses. They take advantage of the fact that each amino acid that is used to make the protein machinery is encoded using three bases, and so can put three genes almost on top of each other. It's on the level of funkiness of a programmer writing a sequence of bits in machine language where 8 fully functional programs could be derived depending on whether you
Re:I haven't a clue... (Score:2)
http://www.ensembl.org/ [ensembl.org]
User interface is fairly intuitive and well documented.
You can see that serving this information is a non-trivial engineering problem.
Roland Piquepaille Arrested (Score:2)
Re:Roland Piquepaille Arrested (Score:2)
The defense lawyers said they'd wait and see - they want to hear back from a rebuttal witness (the goat.cx guy) first, bec
Who is Roland Pipaquelle? (Score:2)
Oh yeah? Wonder what cold crème he uses. Rolland Pipaquelle is a 61-year-old Jehovah's Witness who lives in a shabby genteel garden apartment in desperate need of an interior decorator on a heavily trafficked commercial road at nnnn XXXXXXXXX XXX. XXXXXX, New York. XXXXXX is
Sad... (Score:1)
Re: (Score:2)
Uh Oh (Score:1)
hummm (Score:1)
The Relevant Quote Is (Score:2)
Hmmm maybe they should call IBM (Score:2)
The IBM representative said "Here is Gene, it is able to fold proteins and map the entire human genome". It was a cluster of IBM systems (maybe 40 total).
I just laughed and tried to explain to my wife how much BS this was (which basically describes all marketing).
1.4 MW = over 1000 homes? (Score:2)
1.4 kW is about 2 horsepower. At 110 V, 1.4 kW is a current draw of about 13 A. (At 220 V, it's about 6 A.) I guess "over 1000 houses" sounds much better than "a few hundred houses."
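Those per-home numbers can be checked with a few lines. The 1.4 MW and 1000 homes are from the summary; the voltages are the ones used in this comment, the horsepower conversion is the standard mechanical one, and the names are illustrative.

```python
TOTAL_POWER_W = 1.4e6     # 1.4 MW, from the summary
HOMES = 1000              # "more than a thousand homes"
WATTS_PER_HP = 745.7      # mechanical horsepower

per_home_w = TOTAL_POWER_W / HOMES

def current_amps(power_w, volts):
    """Current for a given power at a given voltage (I = P / V)."""
    return power_w / volts

amps_110 = current_amps(per_home_w, 110)
amps_220 = current_amps(per_home_w, 220)
horsepower = per_home_w / WATTS_PER_HP

print(f"Per home: {per_home_w:.0f} W, {horsepower:.2f} hp")
print(f"Current at 110 V: {amps_110:.1f} A")
print(f"Current at 220 V: {amps_220:.1f} A")
```

That gives 1400 W per home, a bit under 2 hp, drawing roughly 12.7 A at 110 V or 6.4 A at 220 V.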
Odd storage requirements. (Score:2)
"Yeah! We got it! The whole Human Genome! We scanned that sucker in... wait, what? You meant...
Re:Overkill (Score:1)
Re:Overkill (Score:1)
Really, there wouldn't be any decent ST: TNG episodes if that were the case...
Re:Roland (Score:5, Funny)
They're trying to decode his genome to find the missing link.
Which will lead to his website, of course.
Re:Roland (Score:3, Funny)
Awww, he's just French... =)
Re:Roland (Score:2)
Seriously, I'm putting "127.0.0.1 primidi.com" in my hosts file TODAY.
Re:Fuck Roland (Score:5, Informative)
Just have a look on http://www.google.com/search?query=Roland+Piquepa
Real whore here is Timothy. I bet he'll post an ad for your site for some change, too.
Re:Fuck Roland (Score:4, Funny)
Anyone else want to buy Roland and make him shut up?
Oh for goodness sake (Score:1, Offtopic)
The fact that the same person who submitted it also submitted a whole bunch of other stories is beside the point.
Re:Fuck Roland (Score:2, Offtopic)
At least this story is interesting. Why does it piss you off so much that someone makes some money off finding this story? If Roland makes some coin because he's bothered to pay attention to news sites I don't read and report interesting articles to a site I do read, by all means, more power to him! I'm glad he's doing the legwork s
/. malcontents need someone to hate (Score:1, Offtopic)
When Jon Katz left they decided to hate Michael Sims. Now Sims is gone so they need a new target. Not that Katz or Sims offered any great insight or content to slashdot, but the hatred and paranoia against them is beyond reason. That said, I don't much like Piquepaille's site and don't click on his links. But he does offer the service of collecting and collating technology information - a service not much differ
Re:Fuck Roland (Score:1, Offtopic)