News for nerds, stuff that matters

The Astronomical Event Search Engine

Posted by kdawson on Tue Jan 09, 2007 11:05 PM
from the cataloging-a-firehose dept.
eldavojohn writes "Google has signed on with the Large Synoptic Survey Telescope project that will construct a powerful telescope in Chile by 2013. Google's part will be to 'develop a search engine that can process, organize, and analyze the voluminous amounts of data coming from the instrument's data streams in real time. The engine will create "movie-like windows" for scientists to view significant space events.' Google's been successful in turning its search technology on several different media and realms. Will they be successful with helping scientists tag and catalog events in our universe?" The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.

50 Comments
  • 3/4 LoC a night (Score:2, Interesting)

    Will they be successful with helping scientists tag and catalog events in our universe? Will they defeat the monster and get the girl? And will they be home in time for tea? Find out next on GoogleTrek.

    Seriously though, processing something the equivalent
    • Re:3/4 LoC a night (Score:4, Interesting)

      by Wavicle (181176) on Tuesday January 09 2007, @11:23PM (#17534796)
      I actually did a small, insignificant portion of LSST's computation feasibility study at LLNL during my internship there a couple summers ago. And yeah, the computational requirements were nothing to sneeze at. I'm not sure where they are now; the specs changed seemingly every month, but when I left the CCD array was up to 3 gigapixels of 16-bit greyscale. I believe the observing cadence (at that time; again, everything was changing on a regular basis) was two of those for the same piece of sky every 30 seconds. Wish I could have stayed... ahh well. I did get a really nice full-color research poster (that I had to design) out of it though!
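      A rough back-of-the-envelope check of those figures, as a sketch only. The 3 gigapixels, 16 bits per pixel, and two exposures every 30 seconds come from the comment above; the 10-hour observing night is an assumption made here for illustration, not a project spec.

```python
# Figures quoted in the comment above.
PIXELS = 3_000_000_000        # 3-gigapixel CCD array
BYTES_PER_PIXEL = 2           # 16-bit greyscale
EXPOSURES_PER_CADENCE = 2     # two exposures of the same piece of sky...
CADENCE_SECONDS = 30          # ...every 30 seconds

# Assumption for illustration only: length of one observing night.
NIGHT_HOURS = 10


def bytes_per_exposure():
    """Raw size of a single uncompressed exposure."""
    return PIXELS * BYTES_PER_PIXEL


def bytes_per_second():
    """Sustained raw data rate implied by the quoted cadence."""
    return bytes_per_exposure() * EXPOSURES_PER_CADENCE / CADENCE_SECONDS


def bytes_per_night(hours=NIGHT_HOURS):
    """Raw volume for one night of the assumed length."""
    return bytes_per_second() * hours * 3600


print(bytes_per_exposure() / 1e9, "GB per exposure")
print(bytes_per_second() / 1e6, "MB/s sustained")
print(bytes_per_night() / 1e12, "TB per assumed 10-hour night")
```

      Under these assumptions the raw pixel stream is about 14.4 TB a night, the same order of magnitude as the 30 TB a night quoted in the summary.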
      • Near Earth Objects (Score:2, Interesting)

        I saw a documentary not long ago about doing just this: photographing the same piece of sky, only with longer intervals than 30 seconds. Anything moving would automagically be flagged by the software, its vector computed. Correct me if I'm wrong, but fr
        • Re: (Score:3, Informative)

          Correct me if I'm wrong, but from what I can tell of this project, it's going to do exactly that (and more), but on a larger scope, and with better accuracy?

          Well, I was a very small cog for a very large telescope. But my understanding is pretty much exactl
      • Re: (Score:2)

        Just archiving that much data is bad enough, and google certainly has experience there. But what about making use of all that imagery? No human can look at that much data, and google's experience indexing the web seems only tangentially related.
        • Re: (Score:2)

          When I was working on it, I never once heard the name "google" dropped, so I don't know exactly the relationship. We were researching ways to have the computer identify phenomena based on pre-existing photometric pipelines already in use. In my case I was
            • Re: (Score:2)

              It's possible. The big problem with using a GPU to handle the processing is that they only support single precision floats and aren't very good at branching algorithms. Not sure how a pixel shader would handle 16 bit grey scale pixels either.
        • Re: (Score:3, Insightful)

          No human can look at that much data, and google's experience indexing the web seems only tangentially related.

          Google's PhDs and big thinkers could certainly play a part here. Google is about solving problems with large chunks of constantly changing data
          • Re: (Score:2)

            My only "concern" here is that Google is used to producing search engine front-ends for casual users, and not for the scientist. When digging through data like this, new ideas generally require new kinds of searches (new algorithms). So instead of a poli
          • Re: (Score:2)

            1. Google indexes massive amounts of data. The telescope imagery will be a massive amount of data.

            True enough, but google indexes massive amounts of data substantially different from imagery data. This would be something more akin to google earth, which is
    • Re: (Score:2, Funny)

      Pffft, I sift through that much data every night on limewire looking for por... err, movies.
    • Re: (Score:2, Interesting)

      According to Google (how appropriate), 30 terabytes * 365.25 * 10 = 107.006836 petabytes.
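      The same arithmetic as a small sketch. It assumes the telescope observes every single night for the full 10 years, and it reproduces the quoted figure only when a petabyte is taken as 1024 terabytes (binary units); with 1000 terabytes per petabyte the total comes out slightly larger.

```python
TB_PER_NIGHT = 30          # from the story summary
NIGHTS_PER_YEAR = 365.25   # as used in the comment above
YEARS = 10                 # planned survey length


def total_terabytes():
    """Total volume, assuming every night yields the full 30 TB."""
    return TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS


def to_petabytes(terabytes, binary=True):
    """Convert TB to PB using 1024 (binary) or 1000 (decimal) TB per PB."""
    return terabytes / (1024 if binary else 1000)


total = total_terabytes()
print(round(to_petabytes(total), 6))        # 107.006836 (binary units)
print(to_petabytes(total, binary=False))    # 109.575 (decimal units)
```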
    • LSST v. PanSTARRS Approach (Score:4, Interesting)

      by cmholm (69081) <`cmholm' `at' `mauiholm.org'> on Wednesday January 10 2007, @03:32AM (#17536346) Homepage Journal
      The shop I'm at has been working the image processing and data storage problem for PanSTARRS [hawaii.edu], another sky survey project that is a bit further along (they have a test scope up and running on Maui). It's interesting to me that both projects are at once using conventional solutions and thinking outside of the box.

      Conventional: LSST will use a single large telescope and detector; PanSTARRS (as it stands) intends to use a dedicated compute cluster for data reduction.
      Novel: LSST is leaning towards distributing its data reduction task over Google's huge server farm; PanSTARRS will use four off-the-shelf 1.8m telescopes, each with a 1.4GP detector, mounted together to image the same piece of sky, and merging the overlapping images in post processing.

      When I was working on the project, one of PanSTARRS' requirements was to finish analyzing one night's viewing before the following sunset. Early on, the principal investigators decided to solve the image storage issue by not storing them permanently. Instead, once the science for a night's imaging had been extracted (astrometry, LEO or supernova detection, etc.), the original images would hit the bit bucket. Whether they've stuck with that I don't know.
      • Thanks - I've been out of the loop in astronomy for some time, so I didn't know about this. (I had a minute amount of involvement in planning for the SDSS.)

        I don't like the idea of deleting the data - there's all sorts of ways it could be useful, most noti
    • Re: (Score:2)

      Seriously though, processing something the equivalent of 3/4th's of the LoC every night is nothing to be sneezed at.

      Yeah. Let's keep in mind that all astronomical observatory images are taken in a standardized lossless format, which is to say tiff. There
  • Great news (Score:2, Funny)

    Now Google will be serving up advertisements on Uranus.
    • Re: (Score:2, Funny)

      How long until Microsoft wants to jump into this marketspace as well?
      • Re: (Score:2)

        The sooner the better. Just because it's Google doesn't mean we should just go ahead and believe that they will implement the "best" solution without some pressure from a competitor. It's become real clear that Google can't be trusted to stay honest withou
        • Re: (Score:2)

          Yup, until everyone's had a chance with Uranus, we'll never know who does it best.
      • Re: (Score:2)

        Marketplace?? I don't think Google will be making a profit on this, even if they somehow put it online with ads.
        • Re: (Score:2)

          Yeah, why would any business make a profit when they provide a service? Preposterous!!
    • Re: (Score:2)

      Nah, Google is going to rename Uranus in 2036 to end that stupid joke once and for all.
    • Re: (Score:2)

      They've been doing that ever since I accepted a check from Mr. Miller.
  • The only problem (Score:2, Funny)

    Is arranging adwords to not get in the way of viewing planetary nebula.
  • .. of raw video data. I'm sure you could compress that a lot without losing any detail. I thought Google ran most of their data centers on fairly normal hard drives. At that rate, even with Hitachi's new 1 TB disks, that's a lot of drives.

    Hopefully though b

    • Re:30 TB a night... (Score:5, Informative)

      by Capt'n Hector (650760) on Wednesday January 10 2007, @12:26AM (#17535268)
      You can't compress this stuff unless you do it losslessly. Compression artifacts mess up photometry - if you're trying to compute apparent brightness, you need to factor in things like how bright the ambient sky is, and how much point sources get spread out (FWHM, seeing). That is, a point source that passes through the atmosphere looks like a normal probability distribution because of atmospheric distortions. So to get an apparent brightness, you have to correct for this effect. If compression artifacts are introduced, FWHM is thrown off, and you have no idea how "crisp" your image really is. That's why these data sets are so large. Quite literally, they're doing a pixel dump from their massive CCD all night. But hey, somehow I doubt they'll be using this telescope for anything but object detection. There's no reason to store it all except to compare a current picture to one in a base set, kinda like KAIT [berkeley.edu] on steroids.
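      For readers unfamiliar with the FWHM mentioned above, here is a minimal sketch of how it relates to the width of a Gaussian ("normal distribution") profile. It assumes an idealized one-dimensional Gaussian point source; the sigma value is arbitrary, and real seeing profiles are only approximately Gaussian.

```python
import math


def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def gaussian(x, sigma, peak=1.0):
    """Idealized 1-D point-source profile centred on x = 0."""
    return peak * math.exp(-x * x / (2.0 * sigma * sigma))


sigma = 1.5                     # in pixels; arbitrary illustrative value
fwhm = fwhm_from_sigma(sigma)   # roughly 2.355 * sigma
print("FWHM in pixels:", fwhm)

# Half a FWHM away from the centre, the profile has dropped to half its peak.
print("profile at FWHM/2:", gaussian(fwhm / 2.0, sigma))
```

      Anything that reshapes the profile, lossy compression artifacts included, changes the measured FWHM and with it the correction applied to the brightness.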
      • Yeah, I did say without losing any detail. Video would usually still have a lot of potential for compression. Of course if you *can* do the analysis on the raw data before you store it ...

        Anyway, my point was, at 30TB per day for 10 years that's about 100K

      • Re: (Score:2)

        Still, the night has TONS of black.
        And even considering the general noise level, the high dynamic range of the images will mean that outside of actual stars, nearly all of the bits will be zero.

        I'm sure a lossless reduction of one order of magnitude is ent
        • Re: (Score:2)

          It's still a very bad idea. One of the uses of LSST data will be to co-add many images of the same piece of sky to detect fainter objects. For an object that produces one electron in a CCD pixel 20% of the time, the difference between "1" and "0" in those
      • Re: (Score:2, Informative)

        You can't compress this stuff unless you do it losslessly. Compression artifacts mess up photometry

        This is not strictly true. What's true is that the current standard lossy compression techniques mess up photometry. However, if you know what you are go

  • In another thread about the collapsed Pillars of Creation I questioned the value of that type of research... who cares if they collapsed or not. I asked... where is the value in that particular research.

    This, however, provides real, obvious value. I could c
  • by Phat_Tony (661117) on Wednesday January 10 2007, @12:34AM (#17535332) Homepage
    That's a lot of data, but it's less than 1/10 as much data [physorg.com] as the Large Hadron Collider [web.cern.ch] will put out, and the LHC is supposed to be coming online within a year, not in six years. By the time the Large Synoptic Survey Telescope comes online, the LHC may have produced more data than the Large Synoptic Survey Telescope will over the life of the project.

    I'd be interested to know more about the data handling methods they have in place for the LHC. I don't think they'll be using Excel.

    *Note the correct, non-Freudian-slip spelling of "hadron [google.com]"
    • Re: (Score:2)

      I'd be interested to know more about the data handling methods they have in place for the LHC. I don't think they'll be using Excel.

      It looks like you have begun collecting ginormous amounts of data. Paperclip recommends you use Microsoft Access to handle l
    • by dido (9125) <dido.imperium@ph> on Wednesday January 10 2007, @02:14AM (#17535934) Homepage

      Funny, but CERN itself makes that same misspelling of 'hadron' here [web.cern.ch]. "This is the underground tunnel of the Large Hardon (sic) Collider (LHC)..."

    • by mcelrath (8027) on Wednesday January 10 2007, @02:30AM (#17536030) Homepage

      The LHC will produce more data, but we also don't care about most of it. The vast majority of it is junk. The "interesting" physics (particles like W and Z bosons, top quarks, Higgs, etc.) are about 10^-9 of the events. It is a huge needle-in-a-haystack problem and we throw out most data. We have many experts and professors who design "triggers" which, based on a subset of information that can be delivered to them in a reasonable time, decide whether a given proton-proton collision contains new physics. Many theorists these days are making dents in walls with their heads trying to think of ways these triggers might be missing important information, so that we can suggest changes before it's too late. This is a lot of dedicated silicon, FPGAs, VME crates, etc. Slashdotters should drool. Anyway, we throw out the vast majority of information.

      By comparison, LSST is trying to store everything. Scroll up for an interesting comment about calibrating ambient brightness and seeing. I can't answer which will deliver more information, but both are incredibly interesting challenges.

      Data challenges abound. We have designed the LHC Grid [web.cern.ch] to distribute this information. There will be several data warehouses located around the world at national labs and universities. Even after the triggers decide what is "interesting", more sophisticated algorithms, with access to all the data in a single proton-proton collision are applied. Then, humans are applied to the data and we will try to dig out new signals from this.

      In all this we expect to find (among other things) the origin of mass [wikipedia.org] and Dark Matter [wikipedia.org], and we're working hard to prepare for the onslaught of data. :)

      -- Bob
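      The trigger idea described in this comment can be sketched in a few lines. This is a toy illustration only, not how any LHC experiment implements its triggers: the event structure, the `summary_energy` field, and the threshold are all hypothetical, and the events are random numbers rather than physics data.

```python
import random


def cheap_trigger(event, threshold=0.999):
    """Keep or drop an event using only a small, fast summary value.

    'summary_energy' is a hypothetical stand-in for the subset of detector
    information that can be delivered to the trigger in time.
    """
    return event["summary_energy"] > threshold


def run(n_events=100_000, seed=1):
    """Stream synthetic events through the trigger and count the survivors."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_events):
        event = {"summary_energy": rng.random()}   # synthetic, not real data
        if cheap_trigger(event):
            kept += 1                              # only these would be stored
    return kept


kept = run()
print(kept, "of 100000 synthetic events kept; the rest are discarded unread")
```

      The point of the design is that the decision has to be made from partial information before the full event is ever stored, which is why a badly chosen trigger can silently discard the interesting events.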

    • Re: (Score:2)

      Should be easy to compress though. Think of all of that blank space between objects.
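      A quick way to test that intuition, sketched with Python's standard `zlib` module. The frame here is synthetic and entirely noise-free (zeros with an occasional bright pixel), which is an assumption made for illustration; as other comments in this discussion point out, real detector data carries noise in every pixel and would compress far less well.

```python
import array
import zlib


def make_frame(n_pixels=100_000, star_every=1000, star_value=40_000):
    """Synthetic 16-bit frame: all zeros except an occasional bright pixel."""
    frame = array.array("H", [0] * n_pixels)   # "H" = unsigned 16-bit values
    for i in range(0, n_pixels, star_every):
        frame[i] = star_value
    return frame


def roundtrip(frame):
    """Losslessly compress a frame, then restore it from the compressed bytes."""
    raw = frame.tobytes()
    packed = zlib.compress(raw, 9)             # level 9 = strongest compression
    restored = array.array("H")
    restored.frombytes(zlib.decompress(packed))
    return raw, packed, restored


frame = make_frame()
raw, packed, restored = roundtrip(frame)
print(len(raw), "bytes raw ->", len(packed), "bytes compressed")
print("bit-exact after round trip:", restored == frame)
```

      Because the compression is lossless, the restored pixels are identical to the originals; the only open question is how much real, noisy data would shrink.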
  • The real purpose of Google's involvement is to scan the skies for evidence of other Google-like entities, so they can gang up on us carbon-based lifeforms and take over the galaxy.

    Don't think you can seduce us with your efficient search engine and high s

  • Search this! (Score:2, Interesting)

    Hm, Google searching space... I'm waiting for the time google will search in people's bodies and catalog their illnesses.
  • Imagine... (Score:2)

    The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.
    Imagine a Beowulf cluster of these!
  • Dull viewing (Score:3, Funny)

    by caluml (551744) <slashdot@nOSPam.spamgoeshere.calum.org> on Wednesday January 10 2007, @04:30AM (#17536636) Homepage
    The telescope will generate 30 TB of data a night, for 10 years, from a 3-gigapixel CCD array.

    I bet it makes dull viewing. Sort of like the recent Ashes Tests in Australia. If you're English.
  • Why wait for this when the Sloan Digital Sky Survey http://www.worldwindcentral.com/wiki/Sdss [worldwindcentral.com] is available in NASA World Wind .. NOW. (Yet again, Google is not the first to do something)

    I just worry that with Google "helping" the imagery could be locked

  • It's routine in physics collider experiments and seismic exploration to collect several terabytes a day. The limiting factor seems to be data management.
    • Re: (Score:2)

      In addition to that question, what does that say about the future of Google?

      • Re: (Score:2, Funny)

        Skynet?
      • Re: (Score:3, Interesting)

        It says they're smart enough to take on challenging and related problems that they can learn from and use to enhance their information business. This is a real-time application. Imagine if Google could, based on all of the data Google is collecting and ind
        • Re: (Score:2)

          "Imagine if Google could, based on all of the data Google is collecting and indexing, provide a real time view of current trends and patterns of consumers on the web."

          My opinion, they can. They just are not sharing it with the rest of the world.

    • Re:+5 Informative (Score:4, Insightful)

      by VitaminB52 (550802) on Wednesday January 10 2007, @05:14AM (#17536888)
      The telescope will generate 30 TB of data a night

      That's a lot of info.

      No, that's a lot of data. Info is the result of analysing the data.
