CERN Collider To Trigger a Data Deluge
slashthedot sends us to High Productivity Computing Wire for a look at the effort to beef up computing and communications infrastructure at a number of US universities in preparation for the data deluge anticipated later this year from two experiments coming online at CERN. The collider will smash protons together hoping to catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang. From the article: "The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists... The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year... [This] would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function."
OT: The size of the internet (Score:5, Informative)
Okay, the Library of Congress has been estimated to contain about 10 terabytes, so I buy the 1000 * LoC = 15 petabytes. But archive.org alone expanded its storage capacity to 1 petabyte in 2005, so CERN is not going to generate anything near "22 Internets" (whatever that might be). This estimate [berkeley.edu] from 2002 calculates the size of the internet as about 530 exabytes, 440 exabytes of which are email, 157 petabytes for the "surface web"
re: 15 petabytes? (Score:2, Funny)
On the other hand, I'm sure it will be available on some torrent soon.
Re: 15 petabytes? (Score:4, Funny)
Re: (Score:2)
Re: (Score:2)
I don't know for sure that that is how it will be at CERN, but I know that that is how we do it at Fermilab, and I don't know of any change in technology between when that was set up and now that would invalidate the reasoning behind using tape at Fermilab. So, I would expect that CERN would do the sam
Re: (Score:2, Insightful)
Re:OT: The size of the internet (Score:4, Informative)
Me bad, miscalculated, off by a factor of 1000.
Re:OT: The size of the internet (Score:5, Funny)
Re: (Score:3, Interesting)
Re: (Score:2)
I think I'd rather the random sensor data, given those two options. It's kind of like staring at the wall in front of you when you're at a urinal. It's not that the wall is so interesting...
I suspect (Score:2)
Re: (Score:2)
Either way, the Archive also keeps old versions of the sites, meaning multiple copies of what is essentially the same site.
Re: (Score:2)
Re: (Score:2)
I predict the end of the universe (Score:5, Funny)
Okay... maybe not, but if they ever did put this data in the LoC, the effort required to re-factor all the LoC based measurements would bankrupt the world. And the confusion that goes on while this re-factoring is happening will surely crash at least one probe into Mars, where the English have used the new LoC units and the Americans will have used the old LoC units.
Re:I predict the end of the universe (Score:5, Interesting)
IMHO: This is a GoodThing(TM), it could mean the LoC is well on its way to becoming an accepted SI unit.
Re: (Score:2)
Now we will have a whole other schism over whether the 10 TB is binary (10 x 2^40) or decimal (10 x 10^12), with SI purists demanding the binary be distinguished as 10 tebibytes.
Re: (Score:2)
It's far from perfect, but it's better than a recursively expanding unit of information gobbling up the universe.
Re: (Score:2)
Too much for the 'Net (Score:3, Insightful)
FYI 15 petabytes per year = 120 petabits per year = 120,000,000 gigabits per year
120,000,000 gigabits per year / ~30,000,000 seconds per year = 4 Gbps of continuous transmission. They could run a fiber across the Atlantic that could handle 4 Gbps.
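For anyone who wants to redo the division above, here is a minimal sketch. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, neither of which the article specifies; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the sustained rate implied by 15 PB/year.
# Assumptions (not from the article): decimal units (1 PB = 10**15 bytes)
# and a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; the comment rounds to ~30,000,000


def sustained_gbps(petabytes_per_year: float, seconds: float = SECONDS_PER_YEAR) -> float:
    """Average rate in gigabits per second needed to move the yearly volume."""
    bits = petabytes_per_year * 1e15 * 8
    return bits / seconds / 1e9


if __name__ == "__main__":
    print(f"365-day year: {sustained_gbps(15):.2f} Gbps")
    print(f"rounded year: {sustained_gbps(15, 30_000_000):.2f} Gbps")
```

With the rounded 30,000,000-second year this gives the 4 Gbps quoted above; with the exact year length it comes out slightly lower. Either way it is an average, so real links would need headroom for peaks.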
Neutrinos (Score:5, Funny)
You know with the right sort of particle accelerator you could send messages straight through the Earth and save a heap of latency.
Re:Neutrinos (Score:5, Funny)
It's called the "Death Star" project, and we've been having a hell of a time with the receiver...
Never underestimate the bandwidth of a 747 (Score:3, Insightful)
Re:Never underestimate the bandwidth of a 747 (Score:4, Informative)
When you add up the time, money, kit and effort that would go into either burning that many optical disks or filling that many hard drives, then connecting them at the other end and reading the data out, it becomes less attractive than fiber optics.
On the other hand, if the 747 is crammed full of ultra-high-capacity hard-drives (say, the new Hitachi 1TB) in high-density racks that do not need unloading from the aircraft (it lands, it plugs into a power/multiple-10GbE-grid, offloads the data to a local ground facility, then goes out for the next run), you get something that'd possibly be competitive with fiber, as well as a possible business model avenue.
You would, of course, need someone to be willing to pay the rough equivalent of
Slight problem (Score:2)
Re:Never underestimate the bandwidth of a 747 (Score:5, Funny)
I'm sorry, how much is that in Cessna 172's again?
Re:Never underestimate the bandwidth of a 747 (Score:5, Informative)
Using the maximum payload weight of an A380F (freighter model), we get with Google calc: (152 400 kg / 700 grams) * 1Tbytes = 193.36913 petabytes, which is 12.8912753 years worth of CERN CMS data over a maximum distance of 5,600 nautical miles.
The maximum useful load of a Cessna 172 is 371 kg, which gives a meager 0.0313823042 years worth of data over a maximum distance of 687 nm.
The raw distance between CERN and Purdue University (not including distances to airports and such) is about 3838 nm, well within range of the A380F. The Cessna 172 falls into the ground/ocean long before that however. Since there's no air-refueling option for the Cessna, the plan calls for a fleet of at least 179 Cessna 172's constantly working in relay, just to keep up with the data production rate!
So, to answer your question: If you want the same leisurely pace of using one A380F, you'll need a massive 2148 Cessnas flying for a full year, every 12 years (the total weight of which is equivalent to 531 A380F's, which should tell you something about the efficiency of said plan).
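The payload figures above can be reproduced with a few lines. This is only a sketch under stated assumptions: 700 g per 1 TB drive and the payload masses come from the comment, the 15 PB/year from the story, and the unit convention (decimal terabytes divided by binary petabytes) is my guess at what Google calc did, chosen because it reproduces the quoted numbers.

```python
# Reproduces the "years of CERN data per planeload" arithmetic.
# Assumptions: 700 g per 1 TB drive (from the comment); 1 TB = 10**12 bytes
# and 1 PB = 2**50 bytes (a guess that matches the quoted figures);
# 15 PB of data per year (from the story).

DRIVE_MASS_KG = 0.7
DRIVE_BYTES = 10**12
PETABYTE = 2**50
PB_PER_YEAR = 15


def payload_petabytes(payload_kg: float) -> float:
    """Petabytes carried if the whole payload mass is 1 TB drives."""
    drives = payload_kg / DRIVE_MASS_KG
    return drives * DRIVE_BYTES / PETABYTE


def years_of_data(payload_kg: float) -> float:
    """How many years of 15 PB/year output one full load represents."""
    return payload_petabytes(payload_kg) / PB_PER_YEAR


if __name__ == "__main__":
    for name, kg in [("A380F", 152_400), ("Cessna 172", 371)]:
        print(f"{name}: {payload_petabytes(kg):.3f} PB, {years_of_data(kg):.4f} years")
```

Racks, power and cabling are ignored here, just as in the original estimate, so treat the result as an upper bound.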
Re: (Score:2, Informative)
They have been getting sustained performance (with simulated data) of more than that for several years now. This is the sort of thing that Internet2 does well, when it's not on fire.
Re: (Score:2)
Two hard drives can fit 1 TB of data now (1 TB hard drives are also available), so 15 PB can fit on 'just' 30,000 hard drives. A large number, but manageable.
Re: (Score:2, Interesting)
The
Che
Re: (Score:3, Interesting)
Re: (Score:3, Informative)
Re: (Score:3, Funny)
Re: (Score:2)
Re: (Score:2, Funny)
Re:Too much for the 'Net (Score:5, Interesting)
LHC-related experiments will eventually have 70 Gbps of private fibers across the Atlantic (most NY -> Geneva, but at least 10 Gbps NY -> Amsterdam), and at least 10 Gbps across the Pacific.
For what it's worth, here are the current transfer rates for one LHC experiment [cmsdoc.cern.ch]. You'll notice that there's one site, Nebraska (my site), which averages 3.2 Gbps over the last day. That's a Tier 2 site - meaning it won't even receive the raw data, just reconstructed data.
Our peak is designed to be 200TB / week (2.6Gbps averaged over a whole week). That's one site out of 30 Tier 2 sites and 7 Tier 1 sites (each Tier 1 should be about 4-times as big as a Tier 2).
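A quick sketch of the per-site figure, plus a rough scale-up across all tiers. The scale-up assumes every Tier 2 site has the same 200 TB/week design peak as this one and every Tier 1 is exactly four times that; both are simplifications of what is said above, not published numbers. Decimal terabytes are assumed.

```python
# Converts a weekly volume to an average rate and scales it across sites.
# Assumptions: 1 TB = 10**12 bytes; every Tier 2 matches the 200 TB/week
# design peak; every Tier 1 is exactly 4x a Tier 2.

SECONDS_PER_DAY = 86_400


def average_gbps(terabytes: float, days: float) -> float:
    """Average rate in Gbps needed to move `terabytes` in `days`."""
    return terabytes * 1e12 * 8 / (days * SECONDS_PER_DAY) / 1e9


def aggregate_gbps(tier2_gbps: float, tier2_sites: int = 30,
                   tier1_sites: int = 7, tier1_factor: float = 4.0) -> float:
    """Sum of all site rates if every site ran at its design peak at once."""
    return tier2_gbps * (tier2_sites + tier1_sites * tier1_factor)


if __name__ == "__main__":
    one_site = average_gbps(200, 7)
    print(f"one Tier 2 site: {one_site:.2f} Gbps")
    print(f"all sites at peak: {aggregate_gbps(one_site):.0f} Gbps")
```

The aggregate is a worst case, since it is unlikely that every site peaks in the same week.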
Of course, the network backbone work has been progressing for years. It's to the point where Abilene, the current I2 network, [iu.edu] rarely is at 50% capacity.
The network part is easy; it's a function of buying the right equipment and hiring smart people. The extremely hard part is putting disk servers in place that can handle the load. When we went from OC-12 (622 Mbps) to OC-192 (~10Gbps), we had RAIDs crash because we wrote at 2Gbps on some servers for days at a time. Try building up such a system without the budget to buy high-end Fiber Channel equipment too!
And yes, I am on a development team that works to provide data transfer services for the CMS experiment.
Re: (Score:2)
*ducks and then goes back to read some article on some exotic particle which we will never find*
Don't forget the security... (Score:2)
Re: (Score:2)
No Search Function (Score:5, Interesting)
Google it?
If Google is so awesome, maybe they can put their money where their mouth is and do something commendable. Of course, they'll probably have a hard time turning this data into marketing material.
Re: (Score:3, Informative)
The data is suitable for high-throughput (ie, batch processing) and the idea is to keep copies of the experimental data in several places during processing. Interesting results g
Re: (Score:2, Interesting)
My guess is that they are looking for anomalies within the data that would indicate the presence of one of these subatomic particles. My guess furthermore is that once they get enough data analyzed they will be able to form a model to base a search function around.
That or the summary lies (wouldn't be the firs
Re:No Search Function (Score:5, Informative)
For a lot of the physics, the researchers know what they are looking for. For example, with the Higgs boson, theories constrain the decay and production to certain channels that have characteristic signatures. So they would be looking for events that have a muon at a certain energy with a hadron jet with another given energy coming off x degrees away and so on. There have been Monte Carlo simulations and other calculations done to predict what the interesting events should look like using various different theories. Of course there may be interesting events that pop up that no one has predicted, but everyone has a fairly good idea of what the expected events should look like.
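To make "looking for a characteristic signature" concrete, here is a toy sketch of that kind of selection. Everything in it is hypothetical: the `Event` fields, the `Window` and `Signature` types, and all the numbers are made up for illustration and are not taken from any real experiment or analysis framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Toy reconstructed event; fields are hypothetical."""
    muon_energy_gev: float
    jet_energy_gev: float
    opening_angle_deg: float  # angle between the muon and the jet


@dataclass(frozen=True)
class Window:
    """Closed interval [low, high] that a measured quantity must fall in."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class Signature:
    """Predicted pattern, e.g. taken from simulation, expressed as windows."""
    muon_energy: Window
    jet_energy: Window
    opening_angle: Window

    def matches(self, event: Event) -> bool:
        return (self.muon_energy.contains(event.muon_energy_gev)
                and self.jet_energy.contains(event.jet_energy_gev)
                and self.opening_angle.contains(event.opening_angle_deg))


def select(events: list[Event], signature: Signature) -> list[Event]:
    """Keep only the events that look like the predicted signature."""
    return [e for e in events if signature.matches(e)]


if __name__ == "__main__":
    # Made-up cut values and events, purely to show the mechanics.
    sig = Signature(Window(40, 60), Window(80, 120), Window(150, 180))
    events = [Event(50, 100, 170), Event(10, 100, 170), Event(50, 100, 90)]
    print(select(events, sig))
```

Real analyses use far more variables and statistical fits rather than hard cuts alone, but the idea of "scan everything, keep what matches the predicted pattern" is the same, which is why a keyword-style search function is not really what is needed.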
Re:No Search Function (Score:5, Funny)
Buy books about Bosons at Amazon.com
Re: (Score:2)
* scintillators fix your mortgage
* Viagra particles
* free teen bosons
60% (Score:5, Funny)
And 60% of it will be porn.
GASP (Score:3, Funny)
Re:60% (Score:5, Funny)
See the hottest collisions on the web! Watch as innocent particles get ripped apart, revealing their inner quarks! See protons get exploited and penetrated in their luscious gluons!
Re:60% (Score:5, Funny)
Re: (Score:2)
Never mind the data (Score:5, Interesting)
Re: (Score:2)
I don't want to be the one who has to stay back at night to change backup tapes.
Re: (Score:2)
Re: (Score:3, Funny)
Is there a danger or isn't there? (Score:2, Interesting)
catch a glimpse of the subatomic particles that are thought to have last been seen at the Big Bang
I read of "fringe" scientists who warn that there could be potential catastrophic consequences to the coming generation of colliders. The answer to these warnings seems to be that cosmic rays of higher energy than our colliders can generate have been zipping around for billions of years - so if something "bad" could come of it, then it would have already happened.
So, is the above quote simply a poster who doesn't know what he is talking about (someone more interested in a catchy phrase in an article th
Re: (Score:2)
On a related note, all the particle collid
Re:Is there a danger or isn't there? (Score:5, Funny)
Well, yeah, but the probability is about the same as that of you generating a small black hole by clapping your hands together really hard.
Re: (Score:2)
Reminds me of Commander Blood [wikipedia.org].
Rendez-vous at the Big-Bang
Has someone played it ?
Re: (Score:2)
Re: (Score:2)
9 millimeters? That's huge.
You'd need a mass of about 6x10^24 kg to get a Schwarzschild radius of 9mm.
Microscopic (much smaller than a proton) black holes, yes, but 9mm just doesn't sound credible unless you've got some very outlandish theories about black holes.
(I've just been to read your link - a 9mm hole is what is left when the entire Earth is consumed by a microscopic black hole.)
Tim.
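The 9mm figure is easy to check from the Schwarzschild radius formula r = 2GM/c^2. A minimal sketch, assuming the standard values of G and c and an Earth mass of about 5.972x10^24 kg (that last number is not from the thread):

```python
# Schwarzschild radius r = 2 G M / c**2 and its inverse.
# Assumptions: G and c are standard values; the Earth mass below is the
# commonly quoted figure, not something stated in the thread.

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0      # speed of light, m/s
EARTH_MASS_KG = 5.972e24


def schwarzschild_radius_m(mass_kg: float) -> float:
    """Radius of the event horizon for a non-rotating mass."""
    return 2 * G * mass_kg / C**2


def mass_for_radius_kg(radius_m: float) -> float:
    """Mass needed for a given Schwarzschild radius."""
    return radius_m * C**2 / (2 * G)


if __name__ == "__main__":
    print(f"mass for 9 mm: {mass_for_radius_kg(0.009):.2e} kg")
    print(f"radius for Earth's mass: {schwarzschild_radius_m(EARTH_MASS_KG) * 1000:.2f} mm")
```

So 9mm does correspond to roughly one Earth mass, which fits the reading that the linked figure describes the end state after the whole planet has been swallowed.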
Re: (Score:2)
Re: (Score:2)
What do you think the big bang was?
22 Internets per year? (Score:4, Funny)
Re: (Score:2)
You don't understand!! Argh, slashdot makes me so aggravated. Don't you understand that you can't just dump stuff on the tubes? It's not like a truck, you know.
Re: (Score:2, Funny)
Too much, and that's why we should pay the good companies all our hard earned cash to drill giant tubes for all our torrents, MP3s, smut and VoIP calls. Or at least, wasn't that what they were arguing for?
Re: (Score:3, Funny)
OTOH owning the harddrives capable of holding this much data gives you about 730 kilometers of e-penis.
Skynet? (Score:2)
2007: CernNET becomes self aware.
All pages are identical (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Informative)
Generally the data coming out of these experiments is filtered in two or more stages. It has to run in real time since the data volume is enormous. A detector like this can easily spew out several TB a second of raw data. The first layer of filtering will look at very small portions of the data and make very loose requirements on it, but can run very fast in dedicated electronics. This might
Re: (Score:2)
I doubt they'll actually delete any of this data once they have it safely on disk, but you can bet your life that most of it is going to be filtered out and basically ignored.
Re: (Score:2, Insightful)
Gaaa aaaaa aaaaaaa (Score:4, Funny)
Physics locker room.
Re: (Score:2)
Physics locker room.
It's called a chess club.
So.. (Score:2)
Who was at the Big Bang to see them then? I suspect that the numbers are a lot lower than the number of people that heard that tree fall in the woods and heard the sound of one hand clapping put together.
Re: (Score:2, Funny)
Don't be daft. Everyone here at UU knows that the sound of one hand clapping is 'cl-'
Worst Hyperbole Ever... (Score:4, Insightful)
That line is some of the worst hyperbole ever. Here's why. First, there was (almost by definition) no one there to 'see' anything at the Big Bang. (Supernatural explanations aside, and this purports to be a science article.) Second, these subatomic particles are formed frequently in nature, as high-energy astronomy has found various natural particle accelerators that are FAR more powerful than anything we're likely to build on Earth.
One hopes the author will do better next time.
Re: (Score:2)
Unlikely. This is his explanation of bosons:
"...theory of particle physics (boson is the name physicists give subatomic particles with particular properties)."
"One hopes the author will do better next time" (Score:2)
Bush and his internets (Score:2, Funny)
Re: (Score:2)
5 internets, please ... (Score:2)
That's a LOT of data (Score:2, Funny)
Umm, question. Is this BEFORE or AFTER time stops?
Think for a moment (Score:3, Interesting)
Think about it, the only thing stopping us is the ability to store and transfer large amounts of data necessary to describe the precise makeup of a human being. I have a feeling this project will branch off into that area.
Re: (Score:3, Funny)
Data != Information (Score:2)
I suspect that 15 petabytes of data will actually be equivalent to at most 2x the information in a number of standard model journal articles and texts. They just have to figure out the right compression kernel.
22 Internets? (Score:2, Funny)
22 Internets (Score:2, Funny)
Google deal (Score:2)
Re: (Score:2)
Re: (Score:2)
Last I heard, they'll be able to add to the structures in-place. FNAL will have to spend some money, but things will be fixed without delaying the project.
And you were aware that FNAL's work passed multiple independent review committees and CERN signed off on it? It just turned out that the same oversight was made by all.
In the end, a little egg-on-face for the US, but not a h
Re: (Score:2, Informative)
Truth: There are several news agencies that have booked flights to descend upon CERN at the "supposed" start of the LHC in November. What will they come and see? Lots of hype and not much else!
What will happen? Single beam commissioning earliest in May. Collisions probably in August. Not earlier.
I hate being an Anon Coward, but there you go... Yes, I am sitting at a CERN office right now.
Re: (Score:2)
Re: (Score:2)
Only if it really exists... how can you discover something that you have already discov...gurk too much recursion.
Re: (Score:2)
Re:Remember (Score:5, Informative)
I think total - transatlantic fiber plus the European equivalent of Internet2 - bandwidth to CERN will amount to 100 Gbps - about 10 OC-192s. Universities buy into private global fiber networks, which are independent of the public internet.
We then use gridFTP as a transport, which is basically PKI-protected FTP which transfers in N many parallel TCP streams. Then, we use a protocol called SRM to control the gridFTP transfers and (well, the CMS experiment) uses a higher-level application called PhEDEx to control worldwide data movement. Right now, PhEDEx directs about 8-10 Gbps worldwide, and we aren't "doing anything" big.
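One way to picture "N parallel TCP streams" is to carve a file into N contiguous byte ranges and hand each range to its own stream. The sketch below only shows that partitioning step; it is an illustration of the idea, not GridFTP's actual wire protocol or API, and `split_ranges` is a made-up helper.

```python
# Illustration only: splits a file of `size` bytes into byte ranges that
# could each be sent over a separate stream. This is NOT how GridFTP is
# implemented; it just shows the general idea of a striped transfer.


def split_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges, one per stream.

    Ranges are contiguous, cover [0, size) exactly, and differ in length
    by at most one byte. Streams that would get no data are omitted.
    """
    if size < 0 or streams <= 0:
        raise ValueError("size must be >= 0 and streams must be > 0")
    base, extra = divmod(size, streams)
    ranges = []
    start = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # remaining streams would also be empty
        ranges.append((start, start + length))
        start += length
    return ranges


if __name__ == "__main__":
    # A hypothetical 2 GB file over 8 streams.
    for start, end in split_ranges(2 * 10**9, 8):
        print(f"bytes {start}..{end - 1}")
```

The point of multiple streams is that one TCP connection on a long, fat transatlantic link often cannot fill the pipe by itself; several in parallel get much closer to line speed.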
GridFTP is a fairly effective protocol. I can get near-line speed - 2Gbps from a channel bonded RAID device. Locally, we've been buying large RAIDs - 30TB a box, building up to 200TB this fall. Some sites take a more "clustered" approach - they put a few 500-750 GB drives in each of the cluster's worker nodes, and build up to 200TB that way. Costs are lower, but you have to keep 2 copies of each file in the cluster, plus have the headache of swapping out drives. Of course, I like our method better. In addition, larger, T1 sites have a few petabytes in tape silos.
Funding agencies don't just throw money into projects for years at a time, then wait for results. Two years ago, we did a test at 25% of the turn-on "complexity" (in terms of jobs run and data movement). Last year, we increased that to 50% complexity. Toward the end of this summer, we will have a challenge called CSA07 which should be between 75-100% complexity. Finally, turn-on should be around November this year.
This is a multi-billion dollar project which has been under development for 10-15 years. We've been doing lots and lots of careful planning.
Re: (Score:2, Informative)
Re: (Score:2)
you broke (Score:2)