Computer Science Tools Flood Astronomers With Data
purkinje writes "Astronomy is getting a major data-gathering boost from computer science, as new tools like real-time telescopic observations and digital sky surveys provide astronomers with an unprecedented amount of information — the Large Synoptic Survey Telescope, for instance, generates 30 terabytes of data each night. Using informatics and other data-crunching approaches, astronomers — with the help of computer science — may be able to get at some of the biggest, as-yet-unanswerable cosmological questions."
Re: (Score:2)
Re: (Score:2)
That's the speed of illumination you're talking about, not the speed of light.
I gots to nitpick (Score:1, Insightful)
Methinks it's the sensors that are doing the flooding, not "computer science tools".
Re: (Score:2)
Re: (Score:2)
Yeah, but the "computer science tools" wouldn't know which things were interesting unless the astronomers told them. So basically this is just the astronomers flooding themselves.
Re: (Score:1)
Yeah, but the "computer science tools"
all sit together at the back of the class
They might be annoying, (Score:1, Funny)
But I think "tools" is a bit offensive. They're trying to help the astronomers in a meaningful way.
too much? (Score:2)
Re:too much? (Score:5, Insightful)
There's no such thing as too much data in a case like this, assuming that they can store it all. Even if it's too much to parse now, it won't be in a few years. Get as much data as we can now, while there's funding for it.
Re: (Score:3)
Re: (Score:2)
30TB a night, for a single telescope. The cost of storing such amounts of data would be astronomical *wink*.
Re: (Score:3)
Which is pretty much the same problem astronomy has had since roughly forever... Looking in the wrong place. Looking at the wrong time. Looking in the wrong wavelength. Looking for the wrong search terms. Looking on the wrong page... It's all pretty much the same.
The sky and the data will be there tomorrow and they'll try ag
Re: (Score:1)
For a lot of things... but obviously not all. And the odds that you will catch that one transient that will help you are astronomical as well.
True in all fields (Score:4, Interesting)
Many sciences are experiencing this trend. A branch of biochemistry known as metabolomics is a growing field right now (one in which I happen to be participating). Using tools like liquid chromatography coupled to mass spectrometry, we can get hundreds of megabytes of data per hour. Even worse is the fact that a large percentage of that data is directly relevant to a metabolomic profile. The only practical way of analyzing all of this information is through computational analysis, either through statistical techniques used to condense and compare the data, or through searches on painstakingly generated metabolomic libraries.
That is just my corner of the world, but I imagine that many of the low-hanging fruits of scientific endeavor have already been picked. Going forward, I believe that the largest innovations will come from the people willing to tackle data sets that a generation ago would have been seen as insurmountable.
Re: (Score:2)
Many sciences are experiencing this trend.
Yes, the piracy sciences have been particularly hard hit. Modern piracy engineering can easily generate the equivalent of 10 Blu-rays, or 500 gigabytes, per day. Modern data reduction tools such as x264 have been developed to deal with this data overload, and can frequently reduce a 50GB Blu-ray by more than 10:1, down to 8GB or less, without a significant loss of information in the processed data.
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
I download Linux distro torrents faster than "hundreds of megabytes per hour".
At that speed, a full day's worth of data is only a few GB, or roughly 10,000 times less than discussed in TFA.
Still, analysing even a few GB of data a day is no task for mere men.
Re: (Score:2)
"Still, analysing even a few GB of data a day is no task for mere men."
Unless it's a Word document or PowerPoint presentation in which someone has embedded an uncompressed video or a bunch of uncompressed images. Then you can get through it in about 5 minutes flat, not counting the half hour it takes Word/PowerPoint to load.
No, in all seriousness though, it really depends what the data is. That's why I'm not keen on this arbitrary "many gigabytes of data" metric which articles like this are supposed to wow u
Re:If you'd like to help with all that data... (Score:4, Interesting)
Software that could 'be surprised' would be nice, but it's a long, long way off.
Never Too much Data (Score:1)
I'm not an expert in astronomy, but in general, I don't think you can collect too much data, as long as it's stored in an at least somewhat intelligible format. This way, even if professional astronomers miss something today, amateurs and/or future astronomers will have tons of data to pick apart and scavenge tomorrow.
Plus, more data should make it easier to test hypotheses with more certainty. Hopefully, the data will be made publicly available after the gatherers have had a shot or two at it.
Google writ small... (Score:1)
30TB per day works out to about 10 petabytes per year. If you compare this to the total amount of data produced in a year (from all human sources), around a zettabyte, it's not that huge. In fact, IIRC, the yearly transfer rate of the internet is around 250 exabytes. The people with the really hard job of data processing are internet search engines. Not only do they have to churn through several orders of magnitude more data, they have to do it faster, and with much less clearly defined queries.
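The comparison above is easy to sanity-check with some back-of-the-envelope arithmetic (decimal units throughout; the 250EB internet figure is the recollection above, not a verified number):

```python
# Sanity check of the data-rate comparison: LSST's nightly output
# vs. a zettabyte and vs. the claimed yearly internet transfer.
TB = 10**12   # decimal units throughout
PB = 10**15
EB = 10**18
ZB = 10**21

nightly = 30 * TB
yearly = nightly * 365        # assume observing every night of the year

print(yearly / PB)            # ~10.95 -> "about 10 petabytes per year"
print(yearly / ZB)            # ~1.1e-05 -> a tiny fraction of a zettabyte
print(250 * EB / yearly)      # ~22831 -> several orders of magnitude more
```

So even at full tilt, the survey is four-plus orders of magnitude below the claimed internet transfer volume.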
I sometimes wonder
Re: (Score:2)
What're you, Dan fucking Quayle???
OK, OK, It's totally a "get off my lawn joke"...
Re: (Score:1)
Except... that's the right way to spell it... it's not "potatos".
Re: (Score:1)
informatics? (Score:1)
For some reason, that word scares me [marketinginformatics.com]...
Generates? Wrong tense. (Score:5, Informative)
*WILL* generate. LSST isn't operating yet.
And yes, 30TB is a lot of data now, but we have some time before they finally have first light.
Operations aren't supposed to start 'til 2019: http://www.lsst.org/lsst/science/timeline [lsst.org]
We just need network and disk drive sizes to keep doubling at the rate they have, and we'll be laughing about how we thought 30TB/night was going to be a problem.
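To put rough numbers on that doubling argument, here's a toy projection; the 3TB starting capacity and the 2-year doubling period are assumptions for illustration, not vendor roadmap facts:

```python
# Toy projection: consumer drives needed to hold one night of LSST data
# if drive capacity doubles every 2 years. All numbers are illustrative.
night_tb = 30.0
drive_tb = 3.0                    # assumed top consumer drive capacity today

for years in range(0, 13, 2):
    capacity = drive_tb * 2 ** (years / 2)      # one doubling per 2 years
    print(f"+{years:2d} yrs: {capacity:6.1f} TB/drive, "
          f"{night_tb / capacity:5.2f} drives per night")
```

Under those assumptions, one night goes from ten drives to a fraction of a single drive within a dozen years.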
SDO finally launched last year with a data rate of over 1TB/day ... and all through planning, people were complaining about the data rates ... it's a lot, but it's not as insurmountable as it might've been 8 years ago, when we were looking at 80 to 120GB disks.
Although, it'd be nice if monitor resolutions had kept growing ... if anything, they've gotten worse the last couple of years.
(Disclaimer: I work in science informatics; I've run into Kirk Bourne at a lot of meetings, and we used to work in the same building, but we deal with different science disciplines.)
Re:Generates? Wrong tense. (Score:5, Informative)
In fact, they just started blasting the site. I actually live next door to the LSST's architect, which is pretty cool.
Astronomers generate a tremendous amount of data, bested only by particle physicists. Storing it all is a challenge, to put it mildly. Backup is basically impossible.
The real problem is that the data lines that go from the summit to the outside world are still not fast. The summits here are pretty remote and even when you get to a major road, it's still in farm country. And then getting it out of the country is tough--all of our network traffic to North America hits a major bottleneck in Panama, so if you're trying to mirror the database or access the one in Chile, it can be frustratingly slow.
Re: (Score:2)
As far as I understand it, the data will be available also to the general public. I assume that means they will need to have a global network of caches?
Possibly. It depends on how much the general public actually wants to download the data; if it is just selected images instead of the bulk (most of which will be boring "not much happening here" stuff) then serving it from a single site will be quite practical.
Re: (Score:2)
Astronomers generate a tremendous amount of data, bested only by particle physicists.
Earth scientists will merrily generate far more — they're purely limited by what they can store and process, since deploying more sensors is always possible — but they're mostly industrially funded, so physicists and astronomers pretend to not notice.
Re: (Score:1)
Re: (Score:2)
Deploying more telescopes is always possible as well.
This isn't a race about who can fill up storage space the quickest.
Re: (Score:1)
At least you aren't at Dome A. You might have to use some tropospheric scatter (to not pay outrageous SAT usage rates).
Re: (Score:2)
*WILL* generate. LSST isn't operating yet.
This, unless they have a time machine. ;)
The first Pan-STARRS scope with its 1.3-gigapixel camera has been doing science for a little while now, and I think it might do something like 2.5TB a night. That's still a lot of disk (and keep in mind that they originally planned to have 4 of those scopes), but I think their pipeline reduces it all to coordinates for each bright thingy in the frame and then throws away the actual image (though I could be wrong).
Where I work, our highest-resolution toy is 80 megap
Re: (Score:1)
Re: (Score:1)
Do astronomers compress? (Score:2)
Ok, I know this doesn't solve the problem of actually ANALYZING the data but for storing and moving the data around, what's the best compression algorithm for astronomical (I mean the discipline, not the size!) data.
I used to work for a company that developed a really good compression algorithm using wavelets. At the time it was the only one to be accepted by A-list movie directors (the people with the real power in Hollywood); they refused to go with any of the JPEG or MPEG variants (this was before JPEG
Re: (Score:2)
Do they "clean up" the images first to make it easier to compress?
Normally they don't. Compression algorithms, almost by definition, create artifacts that are difficult if not impossible to distinguish from potentially interesting data. So science imagery is almost always saved in 'raw' format, unless you have no other option, like with your Galileo example. Imagine applying dead pixel detection to an astronomy image: 'poof!', all the stars magically disappear!
Re: (Score:2)
Not all compression algorithms are lossy, though the lossless ones aren't nearly as space-efficient.
But some form of lossy compression might work too; it would be easy to filter the images so, for instance, any "nearly-black" pixel is set to black. Add some RLE and you have compression.
The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.
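A minimal sketch of that clip-then-RLE idea; `clip_and_rle` and the threshold value are made up for illustration, and real survey pipelines generally prefer lossless schemes (e.g. Rice-compressed FITS) precisely because of the artifact problem raised above:

```python
# Sketch of the "set near-black pixels to black, then RLE" idea.
# Lossy by design: anything below the threshold is discarded.

def clip_and_rle(pixels, threshold=5):
    """Zero out near-black pixels, then run-length encode the row."""
    clipped = [0 if p < threshold else p for p in pixels]
    runs = []                      # list of (value, run_length) pairs
    for p in clipped:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)
        else:
            runs.append((p, 1))
    return runs

# Mostly-dark scanline with two bright "stars":
row = [1, 2, 0, 3, 200, 201, 1, 0, 0, 0, 150, 2]
print(clip_and_rle(row))
# -> [(0, 4), (200, 1), (201, 1), (0, 4), (150, 1), (0, 1)]
```

The long runs of zeros compress to almost nothing, which is exactly where sparse sky images would win, and exactly where a faint threshold-straddling source would be silently erased.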
Re: (Score:2)
The key to lossy compression is having a way to determine what type of data isn't as important and approximating that data.
The problem with research is that until you've looked, you don't know what you are looking for...
Re: (Score:2)
Which is why I said "for instance". I don't know what the researchers are looking for, but I'm pretty sure the researchers themselves have a decent understanding of what data they want. In contrast to what dargaud mentioned above, most researchers set out to find specific data to prove or disprove a hypothesis; they only need a specific subset of all the data collected. Very few researchers try to discover things in a random set of raw data.
If all you want to know is the amount of stars in a specific picture, you can kee