IT At the LHC — Managing a Petabyte of Data Per Second

Posted by Soulskill
from the take-a-drink-from-the-science-firehose dept.
schliz writes "iTnews in Australia has published an interview with CERN's deputy head of IT, David Foster, who explains what last month's discovery of a 'particle consistent with the Higgs Boson' means for the organization's IT department, why it needs a second 'Tier Zero' data center, and how it is using grid computing and the cloud. Quoting: 'If you were to digitize all the information from a collision in a detector, it’s about a petabyte a second or a million gigabytes per second. There is a lot of filtering of the data that occurs within the 25 nanoseconds between each bunch crossing (of protons). Each experiment operates their own trigger farm – each consisting of several thousand machines – that conduct real-time electronics within the LHC. These trigger farms decide, for example, was this set of collisions interesting? Do I keep this data or not? The non-interesting event data is discarded, the interesting events go through a second filter or trigger farm of a few thousand more computers, also on-site at the experiment. [These computers] have a bit more time to do some initial reconstruction – looking at the data to decide if it’s interesting. Out of all of this comes a data stream of some few hundred megabytes to 1GB per second that actually gets recorded in the CERN data center, the facility we call "Tier Zero."'"
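The numbers in the quote can be checked with a little arithmetic, and the two-stage trigger it describes is essentially a filter cascade. Below is a minimal Python sketch, assuming decimal units (1 PB = 10^15 bytes, consistent with "a million gigabytes per second") and reading the recorded rate as 1 GB/s. The `Event` fields, thresholds, and function names are hypothetical illustrations, not how any LHC experiment's trigger is actually implemented.

```python
"""Back-of-envelope check of the quoted LHC figures, plus a toy trigger cascade.

Assumptions (not from the article): decimal SI units (1 PB = 10**15 bytes),
1 GB/s as the recorded rate, and hypothetical event fields and cut values.
"""
import random
from dataclasses import dataclass
from typing import List, Tuple

BUNCH_SPACING_S = 25e-9    # 25 ns between bunch crossings, as quoted
RAW_RATE_BPS = 10**15      # ~1 PB/s if every collision were fully digitized
RECORDED_RATE_BPS = 10**9  # assumed upper end of the rate written to Tier Zero


def crossing_rate_hz(spacing_s: float) -> float:
    """Bunch crossings per second implied by a fixed crossing spacing."""
    return 1.0 / spacing_s


def reduction_factor(raw_bps: int, recorded_bps: int) -> float:
    """Ratio of the raw data rate to the rate that is actually recorded."""
    return raw_bps / recorded_bps


@dataclass
class Event:
    energy_gev: float  # hypothetical quantity the fast first stage can cut on
    n_tracks: int      # hypothetical quantity needing slower reconstruction


def first_stage(event: Event) -> bool:
    """Cheap, fixed-latency keep/discard decision (hypothetical threshold)."""
    return event.energy_gev > 50.0


def second_stage(event: Event) -> bool:
    """More time per event allows a more detailed check (hypothetical cut)."""
    return event.n_tracks >= 4


def run_trigger(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Apply the stages in sequence; stage two only sees stage-one survivors."""
    after_first = [e for e in events if first_stage(e)]
    after_second = [e for e in after_first if second_stage(e)]
    return after_first, after_second


if __name__ == "__main__":
    print(f"crossing rate: {crossing_rate_hz(BUNCH_SPACING_S):.0f} Hz")
    print(f"reduction: {reduction_factor(RAW_RATE_BPS, RECORDED_RATE_BPS):.0e}x")
    rng = random.Random(42)  # fixed seed so the toy run is repeatable
    events = [Event(rng.expovariate(1 / 20.0), rng.randint(0, 10))
              for _ in range(100_000)]
    kept1, kept2 = run_trigger(events)
    print(f"{len(events)} events -> {len(kept1)} after stage 1 -> {len(kept2)} after stage 2")
```

With 25 ns spacing the crossing rate works out to 40 MHz, and going from 1 PB/s to 1 GB/s is a reduction by a factor of a million. The point of the cascade is that the slower second stage only has to examine what the first stage kept, which is why it can afford "a bit more time" per event.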


  • Keeping us humble... (Score:3, Interesting)

    by Anonymous Coward on Friday August 03, 2012 @08:37AM (#40867235)

    My wife, a staff physicist at Fermilab in their computing division, manages to keep me humble when I talk about the "big data" work I'm doing in my commercial engineering position. I think having to deal with a billion or so data points per day is big... Not so much in her universe!

  • GRID ack (Score:4, Interesting)

    by PiMuNu (865592) on Friday August 03, 2012 @08:49AM (#40867365)
    I tried using the GRID - it's deeply embedded in acronyms and crud, practically impossible to use without a PhD. For crying out loud, it's just a batch farm!
  • Re:You mean... (Score:5, Interesting)

    by cduffy (652) <charles+slashdot@dyfis.net> on Friday August 03, 2012 @10:58AM (#40868955)

    VMware is pretty widely recognized as the king of virtualization, at least so long as you aren't concerned with money. Its overhead is far, far smaller than the others', especially when dealing with huge numbers of connections, and it simply has more features than its competitors.

    Which doesn't mean those features are implemented well.

    Not so long ago, I built an automated QA platform on top of Qumranet's KVM. Partway through the project, my employer was bought by Dell, a VMware licensee. As a result, we ended up putting software through automated testing on VMware, manual testing on Xen (legacy environment, pre-acquisition), and deployment to a mix of real hardware and VMware.

    In terms of accurate hardware implementation, KVM kicked the crap out of what VMware (ESX) shipped with at the time. We had software break because VMware didn't implement some very common SCSI mode pages (which the real hardware and QEMU both did), we had software break because of funkiness in their PXE implementation, and we otherwise just plain had software *break*. I sometimes hit bugs in the QEMU layer KVM uses for hardware emulation, but when that happened, I could fix them myself half the time, and get good support from the dev team and mailing list otherwise. With VMware, I just had to wait and hope that they'd eventually get around to it in some future release.

    "King of virtualization"? Bah.

