
LHC Data Generation Expected To Scale Up To 400PB a Year

DW100 writes: CERN has said it expects its experiments with the Large Hadron Collider to generate as much as 400PB of data per year by 2023 as the scope of its work continues to expand. LHC experiments have so far generated an archive of 100PB, and this is growing by 27PB per year. CERN infrastructure manager Tim Bell, speaking at the OpenStack Summit in Paris, said the organization is using OpenStack to underpin this huge data growth, hoping it can handle such vast volumes of potentially universe-altering information.
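
A rough sense of what those figures imply for the archive size, assuming the annual rate ramps roughly linearly from today's 27PB to 400PB in 2023 (the endpoints come from CERN; the linear ramp is an assumption):

    # Sketch only: projects the archive size quoted above under an assumed
    # linear ramp of the yearly data rate from 27PB (2014) to 400PB (2023).
    def projected_archive_pb(start_year=2014, end_year=2023,
                             archive_pb=100.0, rate_start=27.0, rate_end=400.0):
        years = end_year - start_year
        for i in range(1, years + 1):
            # assumed linear growth of the annual rate
            rate = rate_start + (rate_end - rate_start) * i / years
            archive_pb += rate
        return archive_pb

    print(projected_archive_pb())   # ~2200 PB in the archive by 2023, under these assumptions
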
  • by Racemaniac ( 1099281 ) on Wednesday November 05, 2014 @03:19AM (#48315701)

    You mean how we see the universe? Because I doubt the universe cares much about the data we generate...

    • by Urkki ( 668283 )

      Well, the universe damn well should care! Once we trigger the next phase change in the currently metastable quantum space, there'll be no going back. There won't even be time, so even talking about going back makes no sense.

      It's a bit like kids: once you get one started, you won't have much time (or any, depending on your moral values and local laws) to get rid of it before it triggers an irreversible phase change in your life.

      If only someone had explained these things to our universe before it started to expand...

    • Technically speaking, everything we generate, including data, alters the universe.

    • by necro81 ( 917438 )

      Because I doubt the universe cares much about the data we generate...

      Oh, I don't know. Eventually we'll have so many hard drives dedicated to it that it'll collapse into a black hole.

      Or - wait for it - the computing power requirements scale so large that the only way to keep the whole enterprise going is to build a Dyson sphere.

      Maybe the universe won't care even then, but we'll at least come closer to leaving our mark!

    • by Krymzn ( 1812686 )
      Changing our understanding changes our brains, which are part of the universe. Then the knowledge in those brains can be used to change the universe.
  • Compared to Facebook (Score:4, Informative)

    by pmontra ( 738736 ) on Wednesday November 05, 2014 @03:20AM (#48315707) Homepage

    To put this in perspective, Facebook states [facebook.com] that it generates 4 PB per day, about 3.6 times the LHC's projected rate. Does anybody know of anything generating more data than that?
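
    As a rough check of that ratio (assuming the comparison is against the LHC's projected 400PB/year rather than today's 27PB/year):

        # Sanity-checking the "3.6 times" figure above; the daily-vs-yearly
        # conversion is the only thing added here.
        facebook_pb_per_day = 4.0
        lhc_pb_per_day_2023 = 400.0 / 365   # ~1.1 PB/day at the projected rate
        lhc_pb_per_day_now = 27.0 / 365     # ~0.07 PB/day at today's rate

        print(facebook_pb_per_day / lhc_pb_per_day_2023)  # ~3.65x the 2023 rate
        print(facebook_pb_per_day / lhc_pb_per_day_now)   # ~54x today's rate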

    • by Bob_Who ( 926234 )
      Reality.
      • Actually, the LHC generates more data than this. The talk only covers the data stored at CERN. The last count of all the files in the ATLAS experiment's DQ2 store (a distributed dataset access system with storage around the globe) was 161PB. This value includes all the simulated data, analysis data, etc. I'm certain CMS has a comparable amount, and then there are ALICE and LHCb as well, so the total will be well over the 300PB that Facebook stores.

        While Facebook generates 4 PB of new data per day, they o
    • by Anonymous Coward

      NSA, they generate all the data of everyone combined.

    • by Guspaz ( 556486 )

      Facebook is generating 4 PB per day *now*, while the LHC will be generating 400PB per year by *2023*. 27PB to 400PB in 9 years is MUCH slower than Moore's Law, so their annual storage costs/space requirements will decrease each year.

      With the highest density servers I know of (1U 136TB SSD servers), LHC generates around five racks of data per year today. By 2023, they will only be generating around one rack of data per year, based on an 18-month Moore's Law.
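
      A back-of-the-envelope version of that estimate (the 42U rack size and a strict 18-month doubling are my assumptions; the 136TB/U and 27/400PB figures are from above):

          # Sketch of the rack estimate above, under assumed parameters.
          def racks_per_year(data_pb_per_year, tb_per_u, rack_units=42):
              return data_pb_per_year * 1000 / (tb_per_u * rack_units)

          tb_per_u_2014 = 136.0
          years = 2023 - 2014
          tb_per_u_2023 = tb_per_u_2014 * 2 ** (years / 1.5)  # 18-month doubling

          print(racks_per_year(27, tb_per_u_2014))    # ~4.7 racks/year today
          print(racks_per_year(400, tb_per_u_2023))   # ~1.1 racks/year in 2023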

      • I'm not doubting or challenging you, but I'm interested in knowing about your 1U 136TB SSD servers. Can you suggest some specs?

        The highest-density boxes I get to have some familiarity with are Netflix's OpenConnect caches, described at https://openconnect.itp.netfli... [netflix.com] -- where it's mentioned that they fit 36 6TB drives in a 2U chassis, for a total of 216TB, or 108TB/U. You're beating that, and with SSDs, which is ... impressive.

        • I believe these [anandtech.com] are the boxes that are being used.
        • Just about $400K for their 136TB 1U server. I don't think anyone needs any more detailed specs than that.
          • by tibit ( 1762298 )

            For that kind of money, you could hire the equivalent of two full-time engineers for a year to design that thing from scratch for you, and you'd probably get a couple of production units out of that deal, too. You'd need an ME to do the thermal and case design; that shouldn't take longer than a couple of months. An EE to do any custom mezzanine boards you might need, plus wiring and the overall electrical design. Finally, a software guy to make a config console etc. I assume that project management is not included

        • by mlts ( 1038732 )

          On the cheap, there are always Backblaze's storage pods. They take up more than 1 RU, but at about $10,000, the price is right.

          This is tier 3 storage, though. If you want actual enterprise-grade stuff, it costs a lot more, but it will come with enterprise-grade performance and enterprise-grade warranties.

          Of course, for long term storage for a lot of data, it is hard to beat LTO-6 for I/O speed and cheap capacity. After the drives and silos are in, if another PB is needed, that

      • The 4PB they generate is actually highly compressible, and they are most likely referring to raw logs. According to the page [facebook.com] which quotes the 4 PB figure, FB states they have 300 petabytes of data in 800,000 tables in total. Since that's only about 75 days' worth at 4 PB/day, and FB has been around far longer than that, the 4 PB must refer to raw log data. So the actual disk space they need per day is far less than 4PB.
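
        The arithmetic behind that, with an illustrative (assumed) compression ratio for the raw logs:

            # Checks the "about 75 days" figure; the 10:1 log compression ratio
            # is purely an assumption for illustration.
            total_warehouse_pb = 300.0
            incoming_pb_per_day = 4.0
            assumed_log_compression = 10.0

            print(total_warehouse_pb / incoming_pb_per_day)        # 75 days' worth
            print(incoming_pb_per_day / assumed_log_compression)   # ~0.4 PB/day on disk if logs compress 10:1
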
      • by Thanshin ( 1188877 ) on Wednesday November 05, 2014 @04:58AM (#48316015)

        So, the LHC should just create a Facebook profile and store all the data steganographically in selfies and baby pictures.

    • by Zocalo ( 252965 )
      The porn industry would be a good bet, but on a more serious note, the SKA (Square Kilometre Array) will easily exceed that once it becomes operational. Lots of radio telescopes, listening on lots of frequencies, and able to run 24/7... I found a few articles on likely data sizes and some of the figures are insanely large - the test array, a mere 1% of the size the final project will reach, spits out raw data at 60Tbit/s, and even after compression that's 1GByte/s. Figures for the completed array are in the range of 1EByte
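
      Working those numbers through (the 60Tbit/s and 1GByte/s figures are from the articles as quoted above; the rest is just unit conversion):

          # Implied reduction factor and yearly volume for the ~1% test array,
          # plus a naive x100 scale-up to the full array.
          raw_gbyte_per_s = 60.0 * 1000 / 8          # 60 Tbit/s = 7500 GB/s raw
          kept_gbyte_per_s = 1.0
          seconds_per_year = 365 * 24 * 3600

          print(raw_gbyte_per_s / kept_gbyte_per_s)              # ~7500:1 reduction
          print(kept_gbyte_per_s * seconds_per_year / 1e6)       # ~31.5 PB/year kept (1% array)
          print(kept_gbyte_per_s * seconds_per_year * 100 / 1e9) # ~3.2 EB/year if it scaled linearly to 100%
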
    • Very true, but consider the sources and what is generating it.

      Facebook is a large percentage of the Internet.

      CERN is ONE project (with multiple experiments).

      Also, this data has to be ARCHIVED and ACCESSIBLE for all time so that scientists can go back and compare/research past experiments.

      Although I'm sure Facebook is archiving a large portion of its data, I doubt they archive ALL of it for all time.

  • I doubt that OpenStack can handle it, but if they have the $$ for it, I'm sure that it's no big deal for Oracle.
    • OpenStack is simply a cloud framework. What does any of that have to do with Oracle? In any case, this would be a great test case for a ginormous ceph cluster. I use ceph in conjunction with approximately 10PB of storage and am looking to increase that by at least an order of magnitude over the next year or two.

      More info on ceph: http://en.wikipedia.org/wiki/C... [wikipedia.org]
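
      For a sense of scale, the raw drive count behind "10PB now, 10x later" looks roughly like this (drive size, 3-way replication, and fill ratio are all assumptions on my part; erasure coding would cut the overhead considerably):

          # Not Ceph tooling, just capacity arithmetic under assumed parameters.
          def drives_needed(usable_pb, drive_tb=6.0, replicas=3, max_fill=0.7):
              raw_tb = usable_pb * 1000 * replicas / max_fill
              return int(raw_tb / drive_tb)

          print(drives_needed(10))    # ~7,100 drives for 10 PB usable
          print(drives_needed(100))   # ~71,400 drives for 100 PB usable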

  • by Anonymous Coward

    (Disclaimer: Grad student)

    ATLAS generates O(PB) of raw data per second, but we only trigger on events that look interesting (e.g. those with an isolated lepton, a sign that something more than QCD background happened in the event) and save those for offline analysis. That works out to something on the order of hundreds of MB/s being saved during run time. I assume the other experiments have similar data rates.
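
    A toy sketch of that filtering idea (this is not ATLAS's actual trigger software; the event structure, selection rate, and event size are made up for illustration):

        import random

        def make_event():
            # Hypothetical event record; real input would be detector readout.
            return {"has_isolated_lepton": random.random() < 0.001,  # assumed pass rate
                    "size_mb": 1.5}                                  # assumed event size

        events = [make_event() for _ in range(100_000)]
        kept = [e for e in events if e["has_isolated_lepton"]]   # "trigger" decision

        print(f"kept {len(kept)} of {len(events)} events, "
              f"{sum(e['size_mb'] for e in kept):.1f} MB written to storage")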

  • Hmm... let's see:

    COBOL.. dead
    BSD.. dead
    TAPE .. dead

    And yet there would be no LHC datacenter without tape.

    The CERN Tier-0 in Meyrin currently provides around 45 PetaBytes of data storage on disk and 90 PetaBytes on tape for physics, and includes the majority of the 100,000 processing cores in the CERN Data Centre.

    ref:

    http://information-technology.... [web.cern.ch]
    http://www.economist.com/blogs... [economist.com]
    http://storageservers.wordpres... [wordpress.com]
    http://scribol.com/science/hal... [scribol.com]
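
    A rough check on that tape figure, assuming LTO-6 cartridges at their native (uncompressed) 2.5TB capacity:

        # 90 PB on tape, as quoted from the CERN links above.
        tape_pb = 90.0
        lto6_native_tb = 2.5

        print(tape_pb * 1000 / lto6_native_tb)   # ~36,000 cartridges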

  • ... PornHub for physicists.

  • by koan ( 80826 )

    That amount of data is something only an AI could get through.

    • There is no AI in high-energy physics and none is planned; the usual methods of filtering and statistics will be used instead, and they will suffice.

  • Sounds like they really don't know what they want, except they want it all. Using other people's money, of course!
  • LHC experiments have generated an archive of 100PB...

    Torrent link?
