CERN Collider To Trigger a Data Deluge 226
slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together in hopes of glimpsing subatomic particles thought not to have existed since the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."
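For scale, converting the article's 15 petabytes per year into an average sustained data rate is straightforward; a minimal Python sketch (the 15 PB figure is from the article, everything else is plain arithmetic):

    # Average rate implied by 15 PB/year, using decimal units.
    petabytes_per_year = 15
    bits_per_year = petabytes_per_year * 1e15 * 8
    seconds_per_year = 365 * 24 * 3600
    print(f"{bits_per_year / seconds_per_year / 1e9:.1f} Gbps")  # ~3.8 Gbps, around the clock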
No Search Function (Score:5, Interesting)
Google it?
If Google is so awesome, maybe they can put their money where their mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
Never mind the data (Score:5, Interesting)
Is there a danger or isn't there? (Score:2, Interesting)
So, is the above quote simply from a poster who doesn't know what he's talking about (someone more interested in a catchy phrase than in actually getting the facts across), or are these colliders actually capable of generating particles that haven't existed since the Big Bang? I tend to think the former - but I'm not a physicist, just a geek.
Re:No Search Function (Score:2, Interesting)
My guess is that they are looking for anomalies in the data that would indicate the presence of one of these subatomic particles. My further guess is that once they have analyzed enough data, they will be able to build a model to base a search function on.
That, or the summary lies (it wouldn't be the first time) and they in fact know exactly what they are searching for and do have a search function; of course, someone still has to look at the output of those searches to determine what impact it has on their models and ideas.
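If it helps to picture it, a "search" over collision data is less grep and more a filter over reconstructed events; a minimal Python sketch of the idea (the event fields, mass values, and window here are hypothetical illustrations, not the real experiment's data model):

    # Flag events whose reconstructed invariant mass falls outside the
    # expected background window -- the crude essence of an anomaly search.
    def find_anomalies(events, expected_mass_gev=91.2, window_gev=5.0):
        for event in events:
            if abs(event["mass_gev"] - expected_mass_gev) > window_gev:
                yield event

    sample = [{"id": 1, "mass_gev": 90.8}, {"id": 2, "mass_gev": 125.3}]
    print(list(find_anomalies(sample)))  # flags event 2 only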
All pages are identical (Score:5, Interesting)
Re:Too much for the 'Net (Score:2, Interesting)
Check out http://www.geant2.net/ [geant2.net]
Re:I predict the end of the universe (Score:5, Interesting)
IMHO: This is a GoodThing(TM); it could mean the LoC is well on its way to becoming an accepted SI unit.
Re:OT: The size of the internet (Score:3, Interesting)
Re:Too much for the 'Net (Score:5, Interesting)
LHC-related experiments will eventually have 70 Gbps of private fiber across the Atlantic (most of it NY -> Geneva, but at least 10 Gbps NY -> Amsterdam), and at least 10 Gbps across the Pacific.
For what it's worth, here are the current transfer rates for one LHC experiment [cmsdoc.cern.ch]. You'll notice that there's one site, Nebraska (my site), which has averaged 3.2 Gbps over the last day. That's a Tier 2 site - meaning it won't even receive the raw data, just reconstructed data.
Our peak is designed to be 200 TB/week (2.6 Gbps averaged over a whole week). That's one site out of 30 Tier 2 sites and 7 Tier 1 sites (each Tier 1 should be about four times as big as a Tier 2).
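(For anyone checking the math, the 2.6 Gbps number follows directly from 200 TB/week; a quick sketch, plain arithmetic only:)

    # 200 TB written per week, expressed as an average bit rate.
    bytes_per_week = 200e12
    seconds_per_week = 7 * 24 * 3600   # 604,800 s
    print(f"{bytes_per_week * 8 / seconds_per_week / 1e9:.1f} Gbps")  # ~2.6 Gbps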
Of course, the network backbone work has been progressing for years. It's to the point where Abilene, the current I2 network [iu.edu], rarely runs at even 50% capacity.
The network part is easy; it's a matter of buying the right equipment and hiring smart people. The extremely hard part is putting disk servers in place that can handle the load. When we went from OC-12 (622 Mbps) to OC-192 (~10 Gbps), we had RAIDs crash because we wrote at 2 Gbps to some servers for days at a time. Try building up such a system without the budget to buy high-end Fibre Channel equipment, too!
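To get a feel for why days of sustained writes break RAIDs, here's a rough sketch of the volume involved (the three-day duration is an illustrative assumption; only the 2 Gbps rate comes from the comment above):

    # Data pushed through one server's disks at a sustained 2 Gbps.
    rate_bps = 2e9
    days = 3
    terabytes = rate_bps / 8 * days * 86400 / 1e12
    print(f"~{terabytes:.0f} TB written")  # ~65 TB with no idle time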
And yes, I am on a development team that works to provide data transfer services for the CMS experiment.
Think for a moment (Score:3, Interesting)
Think about it: the only thing stopping us is the ability to store and transfer the large amounts of data necessary to describe the precise makeup of a human being. I have a feeling this project will branch off into that area.
Re:Too much for the 'Net (Score:3, Interesting)