
Grid Computes 420 Years Worth of Data in 4 Months

Posted by Zonk
from the that-is-a-lot-of-nerd-hours dept.
Da Massive writes with a ComputerWorld article about a grid computing approach to malaria. By running the problem across 5,000 computers for a total of four months, the WISDOM project analyzed some 80,000 drug compounds every hour. The search for new drug compounds is normally a time-intensive process, but the grid approach did the work of 420 years of computation in just 16 weeks. Individuals in over 25 countries participated. "All computers ran open source grid software, gLite, which allowed them to access central grid storage elements which were installed on Linux machines located in several countries worldwide. Besides being collected and saved in storage elements, data was also analyzed separately with meaningful results stored in a relational database. The database was installed on a separate Linux machine, to allow scientists to more easily analyze and select useful compounds." Are there any other 'big picture' problems out there you think would benefit from the grid approach?
  • Excellent (Score:5, Funny)

    by President_Camacho (1063384) on Friday February 16, 2007 @11:59PM (#18048132) Homepage
    The search for new drug compounds is normally a time-intensive process, but the grid approach did the work of 420 years of computation in just 16 weeks.

    Cue the stoners in 5, 4, 3, 2....
    • Re: (Score:1, Troll)

      by Mullen (14656)
      It's Friday night and they are too stoned to respond to that setup.
    • by Mr2001 (90979)
      It's a shame the tagging system doesn't allow numeric tags. I had to tag this story "fourtwenty" instead.
    • I'll take a Gin and Tonic (with Quinine, of course).
    • Re: (Score:2, Funny)

      The Answer to The Ultimate Question Of Life, the Universe, and Everything == 42

      Malarial Drug Research == 420

      420 is 419 + 1 (419 - remember Nigeria?)

      Malarial Drug Research/Answer to Ultimate question = 420/42 = 10
        remove 0 from 10 :-) and it is 1 , subtract 1 from 420 and it is 419.....

      Something is really really fishy...

      ----
      comment already exceeded retard limit, hence no sig.
    • I am not as think as you stoned I am.
    • sorry... nothing to contribute today but my name {thats what alcohol does... if i was (just) stoned, i would have a few pages to contribute}....damn alcohol....
  • Wikipedia? (Score:3, Interesting)

    by JonathanR (852748) on Saturday February 17, 2007 @12:06AM (#18048176)
    It strikes me as strange that something like Wikipedia could not be distributed across users' PCs in more of a peer-to-peer fashion. Surely the web itself could benefit from further decentralisation. This issue bothered me some years ago, when I discovered that my desktop PC at work had about 40GB of unpartitioned disk space. I often wondered about the sense of running file servers in big organisations, when each user probably has a few tens of gigabytes of unused or unpartitioned disk space. If illicit music and video can be distributed by P2P, why not all information?
    • by MrCoke (445461)
      Because Wikipedia contains information, 'facts', figures and data. Place it on P2P and it will be tampered with a lot more than now.

      IMO.
      • Because Wikipedia contains information, 'facts', figures and data.

        Oh man... That's a good one!
      • I really don't think tampering is much of an issue since people can freely edit facts and figures in wikipedia anytime they want anyway. However, the previous concerns about speed and bandwidth costs are valid points. As for corporate fileservers, aside from the backup issues mentioned previously, there is also the issue of ensuring everyone sees the same thing. It's much easier, say, with software, to have all the software stored centrally and have all computers get their software from this one place. Make
    • by Jessta (666101)
      Yes, wikipedia could be decentralised, but between different web servers in data centres, not onto people's desktop computers.
      I would find it really annoying if the particular data I wanted was on your computer and your computer wasn't on, or was infested with malware because you don't know how to properly administer your computer.

      The main reasons for having central file servers are:
      1. Backups - By storing all the data in a central location it's much easier to make sure that all data is properly backed up
      2. Sec
      • by mollymoo (202721)
        1 and 3 come down to having multiple nodes and copies of the data, which is trivial for a distributed network. The security thing can be dealt with using existing crypto stuff - hashes for integrity if it's not sensitive, encryption if it is sensitive. 40GB may be cheap, but 400TB (a mere 10 000 users) isn't so cheap.
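A minimal sketch of the hash-for-integrity idea mentioned above, assuming each chunk of data is published alongside a SHA-256 digest obtained from a source you already trust. The function names are hypothetical and not taken from any existing P2P system:

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest that a trusted index would publish."""
    return hashlib.sha256(data).hexdigest()


def verify_chunk(data: bytes, expected_digest: str) -> bool:
    """Accept a chunk from an untrusted peer only if its hash matches."""
    return hashlib.sha256(data).hexdigest() == expected_digest


# Hypothetical usage: the article text comes from some random desktop PC,
# while the digest comes from a small, trusted index.
article = b"Malaria is a mosquito-borne infectious disease."
trusted = digest(article)
tampered = b"Malaria is a hoax."

print(verify_chunk(article, trusted))
print(verify_chunk(tampered, trusted))
```

This only detects tampering; it does nothing for availability, which is the separate problem of peers being switched off.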
    • Re: (Score:2, Informative)

      by elchuppa (602031)
      Well this is an excellent question. Actually Van Jacobson is on google video [google.com] with a presentation on this precise pet peeve of yours. The main concern I have with the idea, at least with how Van Jacobson presents it is that with information addressed by content rather than location, it's slightly more challenging to locate it. At least with the IP system you can route closer towards your destination at each hop up and then down... But data without an authoritative source is basically lost. If you don't
    • Your work computer can be managed by the company you work at; they can even revoke root if they're concerned about security. There are actually a few existing distributed filesystems for Linux, though most of them suck, and the few I've seen with the potential not to suck either cost money or are a long way from being stable on Linux. Haven't seen ANY of these on Windows.

      Someone mentioned backup, which isn't a big deal. Ever heard of RAID? Yeah, it could be something like that.

      Although if it's a desktop PC,
    • Overhead mostly. Also there is a good chance of important data being lost when one of the computers holding part of a critical application goes offline. Sure you could add redundancy, and rate the importance of certain things so they would go to dedicated systems, but then you have even more overhead.
    • It's called information assurance. There are reasons that a Netapp/EMC array costs $25,000 per terabyte when a 1TB maxtor usb drive costs less than $1000. The first is that it is made to tolerate faults and be redundant. Sure you could do this in an enterprise, but then you end up with massive duplication to get around people turning off their computers, a massively expensive and complex distribution and tracking system, and higher failure and lower performance of desktop drives that are now running your
  • by zappepcs (820751) on Saturday February 17, 2007 @12:07AM (#18048180) Journal
    If the grid solution finds THE cure for H5N1, will it be patentable? If not, who pays for the R&D to implement it? Who gets the patent? Do the thousands of people who allowed their PCs to be used get anything? Will big drug companies be able to use this and keep the prices low for the final product?
    • by Dr. Spork (142693) on Saturday February 17, 2007 @12:25AM (#18048280)
      These are all good questions, and every user who volunteers their computer for something like this should find answers to them. I'm quite sure that the stuff discovered by distributed networks does not automatically enter the public domain, but in cases like SETI and protein folding, the organizers explicitly state that it will. But it wouldn't be illegal for a drug company to use volunteers' computers just for corporate profit. You have to judge the merit of each of these projects on a case-by-case basis. Remember also that there is a cost to participating: you have to run your computer at peak power, and this will add several hundred dollars to your utility bills each year while polluting the planet with extra coal smoke and CO2.
    • by iminplaya (723125)
      Will big drug companies be able to use this and keep the prices low for the final product?

      And people accuse me of living in a fantasy world...
    • I think you are confusing grids with distributed peer-to-peer computing networks. Grids are formed of (usually) clusters of nodes, usually running Linux. They are designed for problems that require massive amounts of computation, often involving multiple cooperating nodes running in parallel, and normally amounts of data so large that it is simpler to send the programs to a cluster near the data. http://en.wikipedia.org/wiki/Grid_computing [wikipedia.org] Shared processing systems a la seti or BOINC are loosely coupled s
    • What the grid computing will have produced is likely to be a set of predicted structure-activity relationships (SARs), i.e. calculations that say that molecules of a certain shape and with a certain charge distribution might be active. You can patent a group of molecules for a certain disease, so I guess that this would be patentable. (Who gets the patent is not actually that important. Licenses have been invented to solve that problem.) However, if you want to have a claim that stands up to some contest, y

    • by bunratty (545641)

      Will big drug companies be able to use this and keep the prices low for the final product?
      No. The main expense in developing a new drug is doing the pre-clinical and clinical trials to prove the drug is safe and effective and to determine the proper dosage. That requires patients, doctors, nurses, hospitals, research scientists, and so on, over a period of many years. Paying for all of those people to do the work costs tens or hundreds of millions of dollars.
  • by convolvatron (176505) on Saturday February 17, 2007 @12:07AM (#18048182)
    sorry, i missed that definition. what is that in library of congresses per human hair?
  • by Saint Stephen (19450) on Saturday February 17, 2007 @12:08AM (#18048190) Homepage Journal
    (420 years / 16 weeks) / 5000 computers = 1:4 scalability!!!

    Frickin amazing! No one's EVER done that before.
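For what it's worth, the arithmetic above can be checked in a few lines. This is a rough sketch that assumes 52 weeks per year and treats all 5,000 machines as busy for the full 16 weeks (the article only says "up to" 5,000). `parallel_efficiency` is an illustrative name, not anything from the WISDOM project:

```python
WEEKS_PER_YEAR = 52  # assumption: round figure, good enough for an estimate


def parallel_efficiency(serial_years: float, wall_weeks: float, machines: int) -> float:
    """Speedup over one machine, divided by the number of machines used."""
    serial_weeks = serial_years * WEEKS_PER_YEAR
    speedup = serial_weeks / wall_weeks
    return speedup / machines


eff = parallel_efficiency(serial_years=420, wall_weeks=16, machines=5000)
print(f"{eff:.3f}")  # prints 0.273, i.e. roughly 1:4
```

A figure near 0.27 is consistent with the 1:4 claim, though it says nothing about why the ratio is that low.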
    • by Bob54321 (911744)
      I don't know - the Mozix cluster at work achieves about the same efficiency... I.T. won't listen that there is something incredibly wrong, as I am employed as a statistician so obviously know nothing about computers!
    • That 420 was based on some benchmark. Perhaps a 1GHz Pentium or something. Perhaps the average CPU on the grid was higher.
    • by jZnat (793348) *
      Working on generic PCs using idle CPU, that's probably pretty good, right? These aren't dedicated grid computers as far as I can tell.
    • by DeQuincey (221531)
      You're glossing over some important points.

      1) I'm pretty sure that the servers have to send the same job out to multiple clients. That is, you can't assume that it's sufficient to have only one computer return a result for one job. There's the possibility that the result is incorrect or never returned.

      2) The point of grid computing is to reduce both the cost and time required to do the computation. The entire endeavor would be more efficient if you had full control over the entire grid, i.e. a huge cluste
    • by nick_davison (217681) on Saturday February 17, 2007 @01:45AM (#18048662)
      Worse...

      It's over 4 months, not a fraction of a second.

      If I have a task that takes 100 seconds to run and I want it completed in under a second, scalability becomes a challenge... I have to figure out how to break it into at least 100 distinct parts and deal with all of the communication lags associated. To have any kind of fault tolerance, I probably want to break it into at least 1,000 tasks so that if one processor is running fast, it can get fed more and if one processor corrupts its process, I don't find out right at the end of the second, with no room to compensate, that I have to re-run that full second's worth of processing elsewhere to make up for it. That's where the challenge comes in.

      If I have a task that takes 100 seconds to run and all I'm trying to do is run it a lot of times over a period of time that's many times greater, I can run it 864 times a day per system with absolutely no scalability issues whatsoever and simply send the relatively small complete result sets back. With 100 systems, if each one can run a distinct task from start to finish, I'd be expecting pretty much dead on 100 times the total number crunching as there are absolutely no issues with task division, synchronization or network lag.

      In this case, they ran 5,000 computers over 4 months. Assuming a single task is solvable in under 4 months by a single system, they should have had no difficult task division problems to solve, absolutely minimal synchronization issues and next to no lag issues to address. In short, even a pretty inefficient programmer should be able to approach 1:1 scalability in that easy of a scenario.

      Efficiency of algorithms is a challenge when you want a single result fast. When you want many results and are prepared to wait so long as you're getting very many of them, that's an incredibly easy distributed computing problem.
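The back-of-envelope throughput in this comment can be sketched as follows, assuming every task is fully independent and ignoring the time needed to ship results back. `runs_per_day` is a hypothetical helper, not part of any grid toolkit:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(task_seconds: int, machines: int = 1) -> int:
    """Whole tasks completed per day when no task depends on any other."""
    return (SECONDS_PER_DAY // task_seconds) * machines


print(runs_per_day(100))                # prints 864
print(runs_per_day(100, machines=100))  # prints 86400
```

Throughput scales linearly with machine count here only because the model has no shared state; the hard case described above, one result in under a second, does not fit this formula.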
  • by cultrhetor (961872) on Saturday February 17, 2007 @12:09AM (#18048196) Journal
    how abouutt a drog thet maks slshdaughters spel gooder and youze gooderest grammer?
  • Here's one (Score:5, Funny)

    by realmolo (574068) on Saturday February 17, 2007 @12:10AM (#18048200)
    Would it be possible to use all that computing power to make an electronic voting machine that works?

    Oh wait! How about a voting machine based on "quantum computing"! Then we wouldn't even have to vote, the machine would already know who won.

    Goddamn liberal qubits! Bunch of flip-floppers!

    Stupid conservative qubits! They think that there is ONE and ONLY ONE answer for everything!
  • by imstanny (722685) on Saturday February 17, 2007 @12:10AM (#18048204)
    Up to 5,000 computers were used at any one time, generating a total of 2,000GB of useful data.


    Based on the size of useful data GRID collected from 5,000+ machines and the quantity of pornography on my computer, they are claiming that: porn != useful.
    ...GRID computing; you disappoint me.

  • by cowscows (103644) on Saturday February 17, 2007 @12:11AM (#18048208) Journal
    In an amazing breakthrough which will no doubt have profound implications on Moore's Law, it has been discovered that multiple computers can accomplish in a shorter time what would take much longer on a single computer! Researchers will next launch a study to see how much faster 6000 video ipods working simultaneously can play through all the songs on the iTMS compared to a single first generation ipod shuffle.
  • Perhaps if enough PCs were put to the task they could create new Shakespearean masterpieces. 64bit Night and all that.
  • Up to a full megawatt or more for sixteen weeks. How much does that cost where you live? Still, it's a great bang for the buck. So how long would it take with a Beowulf cluster of these?
  • by Raul654 (453029) on Saturday February 17, 2007 @12:55AM (#18048454) Homepage
    (Preface - My research group specializes in parallel computing) There are classes of problems so computationally intensive that the computers that can do them in a reasonable amount of time won't be invented for decades. Almost all of these are simulations of physical reactions (in vitro drug simulation, climate simulation, biomolecular engineering sims, physics sims, etc.). As a general rule, these problems scale weakly (meaning that as you add more computers, you can simulate more datapoints, and get more accurate results). If memory serves, the hardest problem I can recall involved hydrogen fusion simulations, requiring computers 10-1000 times faster than the best in the world today.

  • by classh_2005 (855543) on Saturday February 17, 2007 @12:59AM (#18048472)
    This looks interesting:

    http://www.majestic12.co.uk/ [majestic12.co.uk]

  • You know, I think the thing that aggravates me the most is that these distributed computing systems are helping drug companies find cures to illnesses using OUR processing power and computers WE paid for, only to sell us the drug that they would have been hard pressed to develop without our hardware back to us at an extremely inflated price.
    • If you have, for example, cancer and a drug company is making money by providing you the cure for it, you really don't care how they got it, as long as you are able to pay for the drug. It is better to have an expensive drug than not to have it at all. You are not forced to participate.

      But there is of course a better solution also. Drug research could be funded by governments, with tax money. This would allow cheap drugs and all the research data could be public, which would speed up the research a lot, as
    • Re: (Score:3, Insightful)

      by scottv67 (731709)
      You know, I think the thing that aggravates me the most is that these distributed computing systems are helping drug companies find cures to illnesses using OUR processing power and computers WE paid for, only to sell us the drug that they would have been hard pressed to develop without our hardware back to us at an extremely inflated price.

      Posting a reply to your comment is going to un-do my moderation this morning but I can't let your comment go by without a response. Yes, we (people who run the distr
  • Are there any other 'big picture' problems out there you think would benefit from the grid approach?

    The development of models to find relationships among individuals based upon their phone records, email communications, webpage preferences and other easily recorded and identified identifying tidbits of digital transactional receipts. Of course, I'm sure that there are various three letter agencies already well ahead of me on that one. (High guys!)

  • by 1mck (861167)
    I've been donating my processor time for quite a while now for the Malaria research, and even though the drug companies will probably benefit from my donation, they would not have these breakthroughs if people didn't donate that time, and the fact that a breakthrough will be found is what keeps me donating my processor time. It's a great feeling knowing that I've contributed to a possible cure towards this disease! Other projects that could need the services of Grid Computing, I believe that was the or
    • Re: (Score:3, Interesting)

      by AlXtreme (223728)

      I can also see Grid Computing being used for computer animations, where the time to render animations would be greatly reduced, allowing movies and shows to be released much faster than before.

      I'm afraid that that will take quite some time to realize. Rendering CG, besides taking a lot of processing time, also requires enormous amounts of data, which restricts the rendering to render farms, the data being pumped over a high-speed LAN.

      Actually the amount of problems solvable by using Grid Computing


    • Malaria would be a forgotten disease if the ecopagans hadn't outlawed DDT.

      Tens of millions of human beings [typically brown & black, and suffering in the most politically correct of third world cesspools] die every year because of our arrogant and narcissistic obsession with this pagan religion.

      • by gordgekko (574109)
        Well said. If it were millions of cute white babies dying every year, they'd be selling DDT in grocery stores. What's that? They're just black babies? Carry on!
  • Much of this discussion is totally misdirected because the writers are confusing a distributed computing project like SETI or BOINC - http://en.wikipedia.org/wiki/BOINC_client-server_technology [wikipedia.org] - with a grid system - http://en.wikipedia.org/wiki/Grid_computing [wikipedia.org]. They are completely different things.
  • Gus Gorman has a patent on this already called the Ultimate Computer, I believe it's in the Grand Canyon.
  • It did 4 months worth of computation in 4 months. If it had been 420 years worth of computation it would have taken 420 years. It's like infomercials that say "and you get all this, a $899 value, for $30!" Obviously it's not "a $899 value" or you would be selling it for that instead of $30. Perhaps, though, they mean that they did 420 processor-years of work over the course of 4 months (meaning that they would have had an average of 1260 cores doing something useful at any time).
    • by Raxxon (6291)
      Assuming your math is accurate (and I'm too tired to look honestly) why were 5k systems used? Were the remaining 3740 used for control/coordination? If so that seems a little... sloppy in my mind. Or were these linked via a SETI-like configuration where nodes were coming on-/going off-line "at random"?
      • by arodland (127775)
        Your guess is as good as mine. My math is good as far as I can tell. It might have just been one of those things where you can only parallelize a problem so far, so nodes spent more of their time waiting for required results than doing anything. Or it could have been, as you said, an unreliable environment where the computers weren't dedicated. Or the numbers could be entirely off. I'm not horribly worried about finding out the specifics either :)
      • by Firethorn (177587)
        Maybe it wasn't 1260 core-years of the average client, but the best computer they could have otherwise been able to afford. Toss in double or triple redundancy for security and accuracy* and there you go.

        *because SETI at home discovered some asses hacked the clients to artificially raise their score.
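A minimal sketch of the redundancy idea, assuming each work unit is sent to several volunteers and a result is accepted only when enough of them agree. This is an illustration only, not how WISDOM, gLite, or SETI@home actually validate results; `accept_result` and `quorum` are made-up names:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def accept_result(results: Sequence[Hashable], quorum: int = 2) -> Optional[Hashable]:
    """Return the most common result if at least `quorum` clients reported it."""
    if not results:
        return None  # nobody returned anything for this work unit
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Triple redundancy: one hacked or faulty client is outvoted.
print(accept_result(["score=0.82", "score=0.82", "score=9999"]))
# No two clients agree, so the work unit would have to be reissued.
print(accept_result(["a", "b", "c"]))
```

The cost is obvious: with triple redundancy, two thirds of the donated cycles go to checking rather than new work, which is one plausible reason 5,000 machines might deliver far less than 5,000 machines' worth of results.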
  • by iamacat (583406) on Saturday February 17, 2007 @03:00AM (#18048956)
    I imagined a beowulf cluster of those, nekked and petrified. Then I got ashamed of myself for rehashing the old meme and dumped hot grits in my pants. As I was convulsing on the ground, there was only one thought left in my mind:

    "Does it run Linux?"
  • So actually it is just 4 months of data with the new standard they set.
  • If the headline were "NEW MALARIA DRUGS FOUND WITH AID OF GRID COMPUTING" I would be much more impressed.

    It's all well and good to tie a big grid to a problem, but if you don't ask the right questions, you won't get useful answers.

    Are there any significant grid computing success stories?

    -pvh
  • Are there any other 'big picture' problems out there you think would benefit from the grid approach?

    I can think of two:

    this [microsoft.com]

    ...and this. [3drealms.com]

  • It's worth mentioning that malaria was nearly wiped out by the simple and inexpensive use of DDT before Rachel Carson and her sympathizers managed to get the stuff banned. And 35 years later, pretty much all her arguments have been shown to have been fabricated. But hey, only 30 million+ died as a result.

    It's nice to know that grid computing can be used to evaluate the potential of all those compounds, of course, as there are certainly applications for that. But the context of the current test is one that w
    • Ah, the old "well-meaning but naive environmentalists make stupid decisions" meme.

      The ban on DDT in the U.S. did not result in 30 million deaths from malaria. There is no international ban on DDT: it is still used in developing countries to combat malaria, and it can only be used up to a point before the mosquitoes start developing immunity to it. (In fact, it is even used in the U.S. occasionally for disease control as a residential insecticide; it is only banned as a general-use agricultural pesticide.
  • ...is "Grid computing finds cure for malaria."

    I could look through the threads of my bedroom rug for 420 years and not find the cure either.

    Eyes on the prize, people.
  • Let's consider these numbers. It is 420 years of data for one machine, spread out over a number of machines.
    • Spread out over five machines, that's 84 years.
    • Spread out over ten machines, that's 42 years.
    • Spread out over 50 machines, that's 8.4 years (about 101 months, or 8 years and just under five months).
    • Spread out over 500 machines, that's 0.84 years (about 307 days, or roughly ten months).
    • Spread out over 5000 machines, that's 31 days or about one month.

    Since the work took 4 months, it implies that each mach


  • With a botnet of a few hundred thousand machines, brute-forcing the crypto application of your choice would immediately come to mind. Whether that would be one of the better uses of the botnet is questionable, but hey, if you have something that's really important to you to try to crack...

    steve
  • "...Are there any other 'big picture' problems out there you think would benefit from the grid approach?..."

    One of today's greatest problems facing all humanity is Gravity. Use the Grid to solve Anti-Gravity.
  • Brain: "Pinky, tonight we will take over the world!"

    That sounds like a good one!
