The Astronomical Event Search Engine 93
eldavojohn writes "Google has signed on with the Large Synoptic Survey Telescope project that will construct a powerful telescope in Chile by 2013. Google's part will be to 'develop a search engine that can process, organize, and analyze the voluminous amounts of data coming from the instrument's data streams in real time. The engine will create "movie-like windows" for scientists to view significant space events.' Google's been successful at turning its search technology to several different media and realms. Will they be successful in helping scientists tag and catalog events in our universe?" The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.
3/4 LoC a night (Score:2, Interesting)
Seriously though, processing something the equivalent of three-quarters of the LoC [loc.gov] every night is nothing to be sneezed at. Over the course of those 10 years that's about 110 petabytes (30 TB * 365.25 * 10) of unprocessed data.
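As a sanity check on that arithmetic (assuming the quoted 30 TB/night figure and nightly operation for 10 years, which overstates real uptime):

```python
# Back-of-the-envelope total data volume for the survey.
TB_PER_NIGHT = 30
NIGHTS_PER_YEAR = 365.25
YEARS = 10

total_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
total_pb = total_tb / 1000
print(f"{total_pb:.0f} PB")  # 110 PB
```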
Re:3/4 LoC a night (Score:4, Interesting)
Near Earth Objects (Score:2, Interesting)
Re: (Score:3, Informative)
Well, I was a very small cog for a very large telescope. But my understanding is pretty much exactly what you just said.
Re: (Score:2)
Re: (Score:2)
GPU-accel (Score:1)
Re: (Score:2)
Re: (Score:3, Insightful)
Google's PhDs and big thinkers could certainly play a part here. Google is about solving problems with large chunks of constantly changing data that have patterns, and creating systems to identify and use those patterns. The web is simply one place for Google to apply the model.
Re: (Score:2)
I'm no expert here -- that's just my gut reaction, coming from the slightly-related field of experimental particle physics.
Re: (Score:1)
1. Google indexes massive amounts of data. The telescope imagery will be a massive amount of data.
2. Google has huge data centers capable of a great amount of distributed processing. The telescope data will require a lot of possibly parallel data processing (multiple images, FFTs on the images, comparison between sequences, etc)
3. Google has a plethora of graduate level employees - who better than a bunch of PhD scientists to stor
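The "comparison between sequences" step in point 2 can be sketched as naive frame differencing. The pixel grids and threshold below are toy values invented for illustration, not anything from the actual LSST pipeline, which works on calibrated FITS frames:

```python
# Toy transient detection: flag pixels that brightened between exposures.
def detect_transients(ref, new, threshold=50):
    """Return (row, col) of pixels that brightened by more than threshold."""
    hits = []
    for r, (ref_row, new_row) in enumerate(zip(ref, new)):
        for c, (a, b) in enumerate(zip(ref_row, new_row)):
            if b - a > threshold:
                hits.append((r, c))
    return hits

reference = [[10, 12, 11], [9, 10, 10], [11, 10, 12]]
tonight   = [[10, 12, 11], [9, 200, 10], [11, 10, 12]]
print(detect_transients(reference, tonight))  # [(1, 1)]
```

A real pipeline would first register and PSF-match the frames; the point is only that the per-pixel work parallelizes trivially.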
Re: (Score:2)
True enough, but Google indexes massive amounts of data substantially different from imagery data. This would be something more akin to Google Earth, which is really nice but not particularly groundbreaking technology so far.
2. Google has huge data centers capable of a great amount of distributed processing. The telescope data will require a lot of possibly parallel data processing (multiple images, FFTs on the
Re: (Score:3, Informative)
HAMR (Score:1)
Re: (Score:2)
Re: (Score:2, Funny)
Re: (Score:2, Interesting)
LSST v. PanSTARRS Approach (Score:4, Interesting)
Conventional: LSST will use a single large telescope and detector; PanSTARRS (as it stands) intends to use a dedicated compute cluster for data reduction.
Novel: LSST is leaning towards distributing its data reduction task over Google's huge server farm; PanSTARRS will use four off-the-shelf 1.8m telescopes, each with a 1.4GP detector, mounted together to image the same piece of sky, and merging the overlapping images in post processing.
When I was working on the project, one of PanSTARRS requirements was to finish analyzing one night's viewing before the following sunset. Early on, the principal investigators decided to solve the image storage issue by not storing them permanently. Instead, once the science for a night's imaging had been extracted (astrometry, LEO or supernova detection, etc), the original images would hit the bit bucket. Whether they've stuck with that I don't know.
Re: (Score:2)
I don't like the idea of deleting the data - there's all sorts of ways it could be useful, most noticeably you could add many images to get a deep field. (Hm, I suppose you could do that anyway - keep a 'running total' image.)
This is going to have a big effect on the microlensing surveys - they won't be able to compete with this rate of data acquisition, s
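The 'running total' idea can be sketched like this (toy 1-D frames rather than real calibrated images; a real co-add would also handle registration, weighting, and outlier rejection):

```python
# Keep a running co-add so individual frames can be discarded
# without losing the accumulated depth.
class RunningCoadd:
    def __init__(self, size):
        self.total = [0.0] * size
        self.n = 0

    def add(self, frame):
        for i, value in enumerate(frame):
            self.total[i] += value
        self.n += 1

    def mean(self):
        return [t / self.n for t in self.total]

stack = RunningCoadd(3)
stack.add([1.0, 2.0, 3.0])
stack.add([3.0, 2.0, 1.0])
print(stack.mean())  # [2.0, 2.0, 2.0]
```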
Re: (Score:2)
Yeah. Let's keep in mind that astronomical observatory images are taken in a standardized lossless format, which is to say FITS. There's a helluva lot of data in every image; each individual file is huge.
BTW,
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
Die ist ein Kinnerhunder und zwei Mackel über und der bitte schön is der Wunder
Great news (Score:2, Funny)
Re: (Score:2, Funny)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
And the new name is already picked out... (Score:1)
Re: (Score:2)
Why Google? (Score:1, Interesting)
Re: (Score:2)
Re: (Score:2, Funny)
Re: (Score:3, Interesting)
Re: (Score:2)
In my opinion, they can. They're just not sharing it with the rest of the world.
well (Score:1, Flamebait)
That depends. Can you sell advertising doing that?
The only problem (Score:2, Funny)
Funny Thing (Score:1)
Re: (Score:1)
Many corps do have Hubbles; they point them downwards, not upwards.
Re: (Score:1)
30 TB a night... (Score:2)
Hopefully though by 2013 this will be a lot easier.
Re:30 TB a night... (Score:5, Informative)
Re: (Score:2)
Yeah, I did say without losing any detail. Video would usually still have a lot of potential for compression. Of course if you *can* do the analysis on the raw data before you store it ...
Anyway, my point was: at 30 TB per day for 10 years, that's about 110K x 1TB disks assuming no further compression is possible. Google is definitely the company I would go to for that much distributed storage, processing and retrieval. I wouldn't want to manage that myself.
Re: (Score:1)
Re: (Score:2)
And even considering the general noise level, the high dynamic range of the images will mean that outside of actual stars, nearly all of the bits will be zero.
I'm sure a lossless reduction of one order of magnitude is entirely in the realm of the possible.
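A quick stdlib-only illustration of that claim, using a synthetic noise-free frame where almost every 16-bit pixel is zero. This is the best case; real sky-background noise would cut the ratio considerably:

```python
import random
import struct
import zlib

random.seed(42)
width = height = 512
pixels = [0] * (width * height)
for _ in range(width * height // 1000):          # light up ~0.1% of pixels
    pixels[random.randrange(len(pixels))] = random.randrange(1, 65536)

raw = struct.pack(f"<{len(pixels)}H", *pixels)   # 16 bits per pixel
compressed = zlib.compress(raw)
print(len(raw) / len(compressed))                # comfortably over 10x
```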
Re: (Score:2)
[TMB]
Re: (Score:2, Informative)
This is not strictly true. What's true is that the current standard lossy compression techniques mess up photometry. However, if you know what you are going to photometer and how you are going to photometer it, it is certainly possible to compress in a lossy way without ruining the photometry. In a trivial sense, photometry is lossy compression of data (you have turned huge images into a few numbers with
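In that spirit, here's a toy aperture-photometry sketch; the cutout values, aperture radius, and flat background are invented for illustration, not any real pipeline API. The entire cutout collapses to a single background-subtracted flux, which is about as lossy as compression gets:

```python
# Sum background-subtracted pixel values inside a circular aperture.
def aperture_flux(image, cx, cy, radius, background):
    flux = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                flux += value - background
    return flux

cutout = [
    [5, 5, 5, 5, 5],
    [5, 5, 9, 5, 5],
    [5, 9, 30, 9, 5],
    [5, 5, 9, 5, 5],
    [5, 5, 5, 5, 5],
]
print(aperture_flux(cutout, 2, 2, 1, background=5))  # 41.0
```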
Re: (Score:1)
Re: (Score:1)
wow 3 gigapixels? (Score:1)
Re: (Score:1)
Science... worth every penny. (Score:2)
This however provides real, obvious value. I couldn't care less about the astronomical events... I guess there is some physics and maths and stuff that can be done... but the database and algorithms and computing systems needed to process all of this data will drive innovation, particularly since it's being do
Lots of data, but not as much as the LHC (Score:5, Informative)
I'd be interested to know more about the data handling methods they have in place for the LHC. I don't think they'll be using Excel.
*Note the correct, non-Freudian-Slip spelling of "hadron [google.com]"
Re: (Score:2)
It looks like you have begun collecting ginormous amounts of data. Paperclip recommends you use Microsoft Access to handle large amounts of data. Would you like me to launch Microsoft Access now?
Re: (Score:1)
Re:Lots of data, but not as much as the LHC (Score:5, Funny)
Funny, but CERN itself makes that same misspelling of 'hadron' here [web.cern.ch]. "This is the underground tunnel of the Large Hardon (sic) Collider (LHC)..."
Re: (Score:2)
Misspelling?
You may be right. Proof By Googlefight shows that hadron beats hardon by an eight to one margin [googlefight.com]. That's surprising. I didn't think that I would be able to find anything that beat a hardon on the Internet.
Re:Lots of data, but not as much as the LHC (Score:5, Interesting)
The LHC will produce more data, but we also don't care about most of it. The vast majority of it is junk. The "interesting" physics (particles like W and Z bosons, top quarks, Higgs, etc) makes up about 10^-9 of the events. It is a huge needle-in-a-haystack problem and we throw out most data. We have many experts and professors who design "triggers" which, based on a subset of information that can be delivered to them in a reasonable time, decide whether a given proton-proton collision contains new physics. Many theorists these days are making dents in walls with their heads trying to think of ways these triggers might be missing important information, so that we can suggest changes before it's too late. This is a lot of dedicated silicon, FPGAs, VME crates, etc. Slashdotters should drool. Anyway, we throw out the vast majority of information.
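A hedged sketch of the trigger idea: keep only events whose coarse summary features pass fast cuts. The feature names and thresholds below are invented for illustration and bear no relation to real LHC trigger menus:

```python
# A toy "level-1" trigger decision over per-event summary features.
def level1_trigger(event):
    """event: dict of coarse detector sums available within microseconds."""
    return (event["max_muon_pt"] > 20.0          # GeV, hypothetical cut
            or event["total_calo_energy"] > 200.0)

events = [
    {"max_muon_pt": 3.0,  "total_calo_energy": 50.0},   # soft junk: rejected
    {"max_muon_pt": 45.0, "total_calo_energy": 80.0},   # hard muon: kept
]
kept = [e for e in events if level1_trigger(e)]
print(len(kept))  # 1
```

The real system makes this decision in hardware at MHz rates; the point is just that a cheap, fixed predicate decides what ever reaches disk.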
By comparison, LSST is trying to store everything. Scroll up for an interesting comment about calibrating ambient brightness and seeing. I can't answer which will deliver more information, but both are incredibly interesting challenges.
Data challenges abound. We have designed the LHC Grid [web.cern.ch] to distribute this information. There will be several data warehouses located around the world at national labs and universities. Even after the triggers decide what is "interesting", more sophisticated algorithms, with access to all the data in a single proton-proton collision are applied. Then, humans are applied to the data and we will try to dig out new signals from this.
In all this we expect to find (among other things) the origin of mass [wikipedia.org] and Dark Matter [wikipedia.org], and we're working hard to prepare for the onslaught of data. :)
-- Bob
Re: (Score:2)
But can it work... (Score:1)
Yeah, right (Score:2)
The real purpose of Google's involvement is to scan the skies for evidence of other Google-like entities, so they can gang up on us carbon-based lifeforms and take over the galaxy.
Don't think you can seduce us with your efficient search engine and high stock value. We're onto you!!
Search this! (Score:2, Interesting)
Re: (Score:1)
Re:+5 Informative (Score:4, Insightful)
That's a lot of info.
No, that's a lot of data. Info is the result of analysing the data.
Nevermind Google Earth... (Score:1)
Impressive...
Re: (Score:1)
Imagine... (Score:2)
Dull viewing (Score:3, Funny)
I bet it makes dull viewing. Sort of like the recent Ashes Tests in Australia. If you're English.
Curling's not quite cricket (Score:1)
Why Wait? World Wind has SDSS NOW (Score:2)
I just worry that with Google "helping" the imagery could be locked up so not everyone has free and equal access to the data.
edgy science at the edge of computing (Score:2)