
Computer Science Tools Flood Astronomers With Data

Posted by Soulskill
from the time-to-build-the-data-ark dept.
purkinje writes "Astronomy is getting a major data-gathering boost from computer science, as new tools like real-time telescopic observations and digital sky surveys provide astronomers with an unprecedented amount of information — the Large Synoptic Survey Telescope, for instance, generates 30 terabytes of data each night. Using informatics and other data-crunching approaches, astronomers — with the help of computer science — may be able to get at some of the biggest, as-yet-unanswerable cosmological questions."
  • I gots to nitpick (Score:1, Insightful)

    by Anonymous Coward

    methinks it's the sensors that are doing the flooding, not "computer science tools"

    • The sensors wouldn't be picking up anything interesting if they weren't automatically being pointed at interesting things. There aren't enough astronomers to do the pointing manually.
      • by mwvdlee (775178)

        Yeah, but the "computer science tools" wouldn't know which things were interesting unless the astronomers told them. So basically this is just the astronomers flooding themselves.

  • by Anonymous Coward

    But I think "tools" is a bit offensive. They're trying to help the astronomers in a meaningful way.

  • My biggest issue would be if there is too much information. What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000 page data report. Still, it's better than the opposite problem, of just not having the data to search.
    • Re:too much? (Score:5, Insightful)

      by SoCalChris (573049) on Tuesday July 19, 2011 @10:14PM (#36818658) Journal

      There's no such thing as too much data in a case like this, assuming that they can store it all. Even if it's too much to parse now, it won't be in a few years. Get as much data as we can now, while there's funding for it.

      • Disk I/O and the ability to back up that data can be a bitch. Especially if the delta changes overlap within a 24-hour period. Of course, there are ways of addressing this problem with multiple servers, but that comes at a financial cost. Also, SAN and DAS technology still lags behind in I/O compared to the explosive growth in storage capacity.

        Personally, I have clients that deal with 30+ TB worth of science data. Data retention is a major headache for me because as of four years ago, they only needed 2TB of

      • by mwvdlee (775178)

        30TB a night, for a single telescope. The cost of storing such amounts of data would be astronomical *wink*.

    • What if the scientists are using the wrong search queries and missing something important? Or maybe something important is just buried on page 931 of a 2,000 page data report?

      Which is pretty much the same problem astronomy has had since roughly forever... Looking in the wrong place. Looking at the wrong time. Looking in the wrong wavelength. Looking for the wrong search terms. Looking on the wrong page... It's all pretty much the same.

      The sky and the data will be there tomorrow and they'll try again.

      • by ginbot462 (626023)

        For a lot of things, yes, but obviously not all. Still, the odds that you will catch that one transient that will help you are astronomical as well.

  • True in all fields (Score:4, Interesting)

    by eparker05 (1738842) on Tuesday July 19, 2011 @09:38PM (#36818424)

    Many sciences are experiencing this trend. A branch of biochemistry known as metabolomics is a growing field right now (in which I happen to be participating). Using tools like liquid chromatography coupled to mass spectrometry, we can get hundreds of megabytes of data per hour. Even worse is the fact that a large percentage of that data is explicitly relevant to a metabolomic profile. The only practical way of analyzing all of this information is through computational analysis, either through statistical techniques used to condense and compare the data, or through searches on painstakingly generated metabolomic libraries.

    That is just my corner of the world, but I imagine that many of the low-hanging fruits of scientific endeavor have already been picked. Going forward, I believe that the largest innovations will come from the people willing to tackle data sets that a generation ago would have been seen as insurmountable.

    • Many sciences are experiencing this trend.

      Yes, the piracy sciences have been particularly hard hit. Modern piracy engineering can easily generate the equivalent of 10 Blu-rays, or 500 gigabytes, per day. Modern data reduction tools such as x264 have been developed to deal with this data overload, and can frequently reduce a 50GB Blu-ray by more than 6:1, down to 8GB or less, without a significant loss of information in the processed data.

    • We got Bioinformatics, now what would this field be called? Astroinformatics? The Square Kilometre Array project is another example of this.
    • Hm, small world--I'm also in metabolomics (more on the computational end than the biological side of things, what I like to call computational metabolomics). I was going to write a post similar to your own, but more generalized for those who aren't familiar with the biology behind it. The issue now is that well established informatics/statistical/computer science approaches are used as general tools in biology/astronomy/biochemistry, and there is a great need to formulate novel algorithms to take advantage
    • by mwvdlee (775178)

      I download Linux distro torrents faster than "hundreds of megabytes per hour".
      At that speed, a full day's worth of data is only a few GB, or roughly 10,000× less than discussed in TFA.
      Still, analysing even a few GB of data a day is no task for mere men.
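      Spelled out, the arithmetic behind that ratio (using the comment's own round figures, which are assumptions rather than measurements) looks like this:

```python
# Back-of-the-envelope check of the "roughly 10,000x" ratio above,
# using decimal units and the comment's own round numbers (assumed).
GB_PER_TB = 1000

per_day_gb = 3                       # "a few GB" per day from ~125 MB/hour
lsst_per_night_gb = 30 * GB_PER_TB   # TFA's 30 TB per night

ratio = lsst_per_night_gb / per_day_gb
print(ratio)  # 10000.0 -- same order of magnitude as claimed
```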

      • by Xest (935314)

        "Still, analysing even a few GB of data a day is no task for mere men."

        Unless it's a word document or power point presentation in which someone has embedded an uncompressed video or bunch of uncompressed images. Then you can get through it in about 5 minutes flat, not counting the half hour it takes Word/Powerpoint to load.

        No, in all seriousness though, it really depends what the data is. That's why I'm not keen on this arbitrary "many gigabytes of data" metric which articles like this are supposed to wow us with.

  • I'm not an expert in Astronomy, but in general, I don't think you can collect too much data, as long as it's stored in an at least somewhat intelligible format. This way, even if professional astronomers miss something today, amateurs and/or future astronomers will have tons of data to pick apart and scavenge tomorrow.

    Plus, more data should make it easier to test hypotheses with more certainty. Hopefully, the data will be made publicly available after the gatherers have had a shot or two at it.

  • 30TB per day works out to about 10 petabytes per year. If you compare this to the total amount of data produced in a year (from all human sources), around a zettabyte, it's not that huge. In fact, IIRC, the yearly transfer rate of the internet is around 250 exabytes. The people with the really hard job of data processing are internet search engines. Not only do they have to churn through several orders of magnitude more data, they have to do it faster, and with much less clearly defined queries.
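    A quick check of those figures (decimal units assumed throughout):

```python
# Verifies the "30 TB/night is about 10 PB/year" arithmetic above.
TB = 10**12
PB = 10**15

yearly_bytes = 30 * TB * 365    # 30 TB every night for a year
print(yearly_bytes / PB)        # ~10.95 -- "about 10 petabytes per year"
```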

    I sometimes wonder

    • by blueAt0m (2393244)
      Sounds like another task for IBM's Watson. The way I understand the problem, most scientists must be in cahoots with skilled CS folk to generate these kinds of answers from such large datasets, or they must be half CS folk themselves in order to traverse such scales of data. Quite an undertaking when professionals should be focused in one area, let alone conveying the ideas of either field to the other as they themselves see/understand them. However the dawn of asking Watson or Enterprise to figure something
  • For some reason, that word scares me [marketinginformatics.com]..

  • by oneiros27 (46144) on Tuesday July 19, 2011 @10:24PM (#36818720) Homepage

    *WILL* generate. LSST isn't operating yet.

    And yes, 30TB is a lot of data now, but we have some time before they finally have first light.

    Operations isn't supposed to start 'til 2019 : http://www.lsst.org/lsst/science/timeline [lsst.org]

    We just need network and disk drive sizes to keep doubling at the rate they have, and we'll be laughing about how we thought 30TB/night was going to be a problem.
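    As a sketch of that scaling argument (the starting size and the ~2-year doubling period are assumptions for illustration, not a guarantee that the trend holds):

```python
# Projects drive capacity forward under an assumed fixed doubling period.
def projected_capacity_tb(start_tb, start_year, target_year, doubling_years=2.0):
    """Capacity after compounding doublings from start_year to target_year."""
    return start_tb * 2 ** ((target_year - start_year) / doubling_years)

# A 3 TB drive (typical in 2011), projected to LSST's planned 2019 start:
print(projected_capacity_tb(3, 2011, 2019))  # 48.0 -- TB per drive
```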

    SDO finally launched last year with a data rate of over 1TB/day ... and all through planning, people were complaining about the data rates ... it's a lot, but it's not as insurmountable as it might've been 8 years ago, when we were looking at 80 to 120GB disks.

    Although, it'd be nice if monitor resolutions had kept growing ... if anything, they've gotten worse the last couple of years.

    (Disclaimer : I work in science informatics; I've run into Kirk Bourne at a lot of meetings, and we used to work in the same building, but we deal with different science disciplines)

    • by Carnivore (103106) on Tuesday July 19, 2011 @11:09PM (#36818990)

      In fact, they just started blasting the site. I actually live next door to the LSST's architect, which is pretty cool.

      Astronomers generate a tremendous amount of data, bested only by particle physicists. Storing it all is a challenge, to put it mildly. Backup is basically impossible.
      The real problem is that the data lines that go from the summit to the outside world are still not fast. The summits here are pretty remote and even when you get to a major road, it's still in farm country. And then getting it out of the country is tough--all of our network traffic to North America hits a major bottleneck in Panama, so if you're trying to mirror the database or access the one in Chile, it can be frustratingly slow.

      • by dkf (304284)

        Astronomers generate a tremendous amount of data, bested only by particle physicists.

        Earth scientists will merrily generate far more — they're purely limited by what they can store and process, since deploying more sensors is always possible — but they're mostly industrially funded, so physicists and astronomers pretend to not notice.

        • by csrster (861411)
          Theoreticians surely generate most, because they're only limited by how fast a CPU can churn out floating-point numbers.
        • by mwvdlee (775178)

          Deploying more telescopes is always possible as well.
          This isn't a race about who can fill up storage space the quickest.

      • by ginbot462 (626023)

        At least you aren't at Dome A. You might have to use some tropospheric scatter (to not pay outrageous SAT usage rates).

    • by Shag (3737)

      *WILL* generate. LSST isn't operating yet.

      This, unless they have a time machine. ;)

      The first Pan-STARRS scope with its 1.3-gigapixel camera has been doing science for a little while now, and I think it might do something like 2.5TB a night. That's still a lot of disk (and keep in mind that they originally planned to have 4 of those scopes), but I think their pipeline reduces it all to coordinates for each bright thingy in the frame and then throws away the actual image (though I could be wrong).

      Where I work, our highest-resolution toy is 80 megap

  • Ok, I know this doesn't solve the problem of actually ANALYZING the data but for storing and moving the data around, what's the best compression algorithm for astronomical (I mean the discipline, not the size!) data.

    I used to work for a company that developed a really good compression algorithm using wavelets. At the time it was the only one to be accepted by A-list movie directors (the people with the real power in Hollywood); they refused to go with any of the JPEG or MPEG variants (this was before JPEG

    • by dargaud (518470)

      Do they "clean up" the images first to make it easier to compress?

      Normally they don't. Compression algorithms, almost by definition, create artifacts that are difficult if not impossible to distinguish from potentially interesting data. So science imagery is almost always saved in 'raw' format, unless you have no other option, like with your Galileo example. Imagine applying a dead pixel detection to an astronomy image: 'poof!', all the stars magically disappear!

      • by mwvdlee (775178)

        Not all compression algorithms are lossy, though the lossless ones aren't nearly as space-efficient.
        But some form of lossy compression might work too; it would be easy to filter the images so, for instance, any "nearly-black" pixel is set to black. Add some RLE and you have compression.
        The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.
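        A minimal sketch of that threshold-then-RLE idea (the cutoff value and flat 8-bit pixel list are assumptions for illustration; real pipelines would more likely use FITS-aware codecs):

```python
# Lossy step: zero out "nearly-black" pixels so dark sky becomes long runs.
def threshold(pixels, cutoff=8):
    """Set any pixel below cutoff to pure black (0)."""
    return [0 if p < cutoff else p for p in pixels]

# Lossless step: run-length encode the result as (value, count) pairs.
def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

row = [3, 1, 0, 0, 200, 199, 0, 2, 0, 0, 0, 150]
compressed = rle_encode(threshold(row))
# Long black runs collapse to single (0, count) pairs; bright pixels survive.
print(compressed)  # [(0, 4), (200, 1), (199, 1), (0, 5), (150, 1)]
```

        The lossy thresholding is exactly where the "what isn't important" judgment call happens; everything after it round-trips exactly.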

        • by dargaud (518470)

          The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.

          The problem with research is that until you've looked, you don't know what you are looking for...
