CERN Collider To Trigger a Data Deluge 226
slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together in hopes of glimpsing subatomic particles thought not to have existed since the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."
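For scale, converting the article's 15 petabytes per year into an average sustained data rate is straightforward; a minimal Python sketch (the 15 PB figure is from the article, everything else is plain arithmetic):

    # Average rate implied by 15 PB/year, using decimal units.
    petabytes_per_year = 15
    bits_per_year = petabytes_per_year * 1e15 * 8
    seconds_per_year = 365 * 24 * 3600
    print(f"{bits_per_year / seconds_per_year / 1e9:.1f} Gbps")  # ~3.8 Gbps, around the clock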
No Search Function (Score:5, Interesting)
Google it?
If Google is so awesome, maybe they can put their money where their mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
Never mind the data (Score:5, Interesting)
Is there a danger or isn't there? (Score:2, Interesting)
So, is the above quote simply from a poster who doesn't know what he's talking about (someone more interested in a catchy phrase than in actually getting the facts across), or are these colliders actually capable of generating particles that haven't existed since the Big Bang? I tend to think the former - but I'm not a physicist, just a geek.
Re:No Search Function (Score:2, Interesting)
My guess is that they are looking for anomalies in the data that would indicate the presence of one of these subatomic particles. My further guess is that once they have analyzed enough data, they will be able to build a model to base a search function on.
That, or the summary lies (it wouldn't be the first time) and they in fact know exactly what they are searching for and do have a search function; of course, someone still has to look at the output of those searches to determine what impact it has on their models and ideas.
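If it helps to picture it, a "search" over collision data is less grep and more a filter over reconstructed events; a minimal Python sketch of the idea (the event fields, mass values, and window here are hypothetical illustrations, not the real experiment's data model):

    # Flag events whose reconstructed invariant mass falls outside the
    # expected background window -- the crude essence of an anomaly search.
    def find_anomalies(events, expected_mass_gev=91.2, window_gev=5.0):
        for event in events:
            if abs(event["mass_gev"] - expected_mass_gev) > window_gev:
                yield event

    sample = [{"id": 1, "mass_gev": 90.8}, {"id": 2, "mass_gev": 125.3}]
    print(list(find_anomalies(sample)))  # flags event 2 only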
All pages are identical (Score:5, Interesting)
Re:Too much for the 'Net (Score:2, Interesting)
Check out http://www.geant2.net/ [geant2.net]
Re:I predict the end of the universe (Score:5, Interesting)
IMHO: This is a GoodThing(TM); it could mean the LoC is well on its way to becoming an accepted SI unit.
Re:OT: The size of the internet (Score:3, Interesting)
Re:Too much for the 'Net (Score:5, Interesting)
LHC-related experiments will eventually have 70 Gbps of private fiber across the Atlantic (most of it NY -> Geneva, but at least 10 Gbps NY -> Amsterdam), and at least 10 Gbps across the Pacific.
For what it's worth, here are the current transfer rates for one LHC experiment [cmsdoc.cern.ch]. You'll notice that there's one site, Nebraska (my site), which has averaged 3.2 Gbps over the last day. That's a Tier 2 site - meaning it won't even receive the raw data, just reconstructed data.
Our peak is designed to be 200 TB/week (2.6 Gbps averaged over a whole week). That's one site out of 30 Tier 2 sites and 7 Tier 1 sites (each Tier 1 should be about four times as big as a Tier 2).
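(For anyone checking the math, the 2.6 Gbps number follows directly from 200 TB/week; a quick sketch, plain arithmetic only:)

    # 200 TB written per week, expressed as an average bit rate.
    bytes_per_week = 200e12
    seconds_per_week = 7 * 24 * 3600   # 604,800 s
    print(f"{bytes_per_week * 8 / seconds_per_week / 1e9:.1f} Gbps")  # ~2.6 Gbps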
Of course, the network backbone work has been progressing for years. It's to the point where Abilene, the current I2 network [iu.edu], rarely runs at even 50% capacity.
The network part is easy; it's a matter of buying the right equipment and hiring smart people. The extremely hard part is putting disk servers in place that can handle the load. When we went from OC-12 (622 Mbps) to OC-192 (~10 Gbps), we had RAIDs crash because we wrote at 2 Gbps to some servers for days at a time. Try building up such a system without the budget to buy high-end Fibre Channel equipment, too!
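To get a feel for why days of sustained writes break RAIDs, here's a rough sketch of the volume involved (the three-day duration is an illustrative assumption; only the 2 Gbps rate comes from the comment above):

    # Data pushed through one server's disks at a sustained 2 Gbps.
    rate_bps = 2e9
    days = 3
    terabytes = rate_bps / 8 * days * 86400 / 1e12
    print(f"~{terabytes:.0f} TB written")  # ~65 TB with no idle time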
And yes, I am on a development team that works to provide data transfer services for the CMS experiment.
Think for a moment (Score:3, Interesting)
Think about it: the only thing stopping us is the ability to store and transfer the large amounts of data necessary to describe the precise makeup of a human being. I have a feeling this project will branch off into that area.
Re:Too much for the 'Net (Score:3, Interesting)