Space Science

Digitizing 100 Years of Astronomical Data

Posted by ScuttleMonkey
from the jobs-for-photosynth dept.
Maximum Prophet writes to mention that a collection of glass plates containing astronomical information from the late 19th century through the mid-1980s is being considered for digitization. "The accumulated result weighs heavily on its keepers on Observatory Hill, just up Garden Street from Harvard Square: more than half a million images constituting humanity's only record of a century's worth of sky. 'Besides being 25 percent of the world's total of astronomical photographic plates, this is the only collection that covers both hemispheres,' said Alison Doane, curator of a glass database occupying three floors, two of them subterranean, connected by corkscrew stairs. It weighs 165 tons and contains more than a petabyte of data. The scary thing is that there is no backup." I'm sure that anyone with a spare $5 million or so would be welcomed with open arms.

  • by L. VeGas (580015) on Wednesday July 11, 2007 @06:34PM (#19831693) Homepage Journal
    165 tons of glass plates?

    Sounds like a typical lunch clean-up after Rosie O'Donnel.

    Sorry. I'm truly sorry.
  • ... and call it Alta Vista?
  • by gatkinso (15975) on Wednesday July 11, 2007 @06:39PM (#19831741)
    now there is some irony.
    • Are you sure about the stability of glass plates? I hear a lot of people have real trouble with windows stability! Sorry, I'll go now....
    • by KokorHekkus (986906) on Wednesday July 11, 2007 @07:10PM (#19832101)

      now there is some irony.
      But currently that also makes them vulnerable to a single point of failure (as indirectly pointed out in the article). If you have some data that has any real value for you, then having only one copy (or only one storage facility) isn't any real protection, whatever method you use. In this case we have data that would be readily accepted for backup by organisations all around the globe, and barring a worldwide upheaval the safety of the data would be much better than any single glass plate could offer.

      Of course the ideal would be if we could develop a cheap digital permanent storage that had guaranteed physical longevity, say several millennia. That combination would allow easy dissemination of the data and safety by using a multiplicity of sources.
      • by bryan1945 (301828)
        "Of course the ideal would be if we could develop a cheap digital permanent storage that had guaranteed physical longevity, say several millennia"

        But would it be compatible with Office 5007 or OO v3452.23?
      • Of course the ideal would be if we could develop a cheap digital permanent storage that had guaranteed physical longevity, say several millennia. That combination would allow easy dissemination of the data and safety by using a multiplicity of sources.

        Ceramic/fired clay tablets will do nicely. There are likely to be density issues, however.
    • by seaturnip (1068078) on Wednesday July 11, 2007 @07:52PM (#19832515)

      So what? Copy the digital version onto a second set of disks when it comes close to expiring.

      Lossless copying means that given a little bit of maintenance, expiration of digital media is a nonissue.
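The "lossless copying" point can be made concrete with a checksum comparison. What follows is only a minimal sketch, assuming the scans are ordinary files on a filesystem; the function names `sha256_of` and `copy_and_verify` are made up for illustration and do not come from the article or any archive's actual tooling.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and confirm both files hash to the same value.

    Hypothetical helper: raises OSError if the copy does not match.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise OSError(f"copy of {src} to {dst} is corrupt")
    return after
```

Storing the returned digest next to each copy would let a later migration check the old media before trusting it, which is the "little bit of maintenance" the comment refers to.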
      • by Venik (915777)
        Ever tried to maintain archival backups for a petabyte-worth of data? You would need a team of operators, millions of dollars worth of storage hardware, expensive climate-controlled facilities. This will not be a one-time expense either. The best way to back up 165 tons of photographic plates is to use more photographic plates to hold microfiche sets. Not exactly cutting-edge technology, but it will outlast any tape archive or storage array. And it will be cheaper.
        • by Smauler (915644)
          A petabyte is not that impressive any more - it's only a few thousand consumer-level hard drives. I do realise that the challenge of managing all that data is far from easy, but claiming that a petabyte requires "millions of dollars worth of storage hardware" is wrong. I just checked on newegg: 500GB is $110, thus 1TB is $220, thus 1PB is $220,000, and these are retail prices. Managing all that crap would be the main problem.
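The drive arithmetic in that comment can be checked in a few lines. This is a rough sketch that uses only the quoted 2007 retail figure ($110 per 500 GB drive) and assumes decimal units with no redundancy, enclosures, or spares; the function names are illustrative.

```python
# Figures quoted in the comment; 1 PB is taken as 1,000,000 GB (decimal units).
DRIVE_PRICE_USD = 110
DRIVE_CAPACITY_GB = 500
ARCHIVE_SIZE_GB = 1_000_000


def drives_needed(archive_gb, drive_gb):
    """Number of whole drives required to hold archive_gb (ceiling division)."""
    return -(-archive_gb // drive_gb)


def raw_drive_cost(archive_gb, drive_gb, drive_price):
    """Cost of the bare drives only, with no redundancy or enclosures."""
    return drives_needed(archive_gb, drive_gb) * drive_price


if __name__ == "__main__":
    count = drives_needed(ARCHIVE_SIZE_GB, DRIVE_CAPACITY_GB)
    cost = raw_drive_cost(ARCHIVE_SIZE_GB, DRIVE_CAPACITY_GB, DRIVE_PRICE_USD)
    print(f"{count} drives, ${cost:,} for a single unprotected copy")
```

Any mirroring or parity scheme multiplies the drive count, which is where the replies about arrays and servers come in.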
          • by sendai2ci (629417)
            I don't think they want management as much as equal or better longevity. Some of these plates have lasted over a century, most over 25 years...

            How long would a RAID5|ZFS system last? Ten years? At which point you can no longer get spare-parts for the system (well not easily.)

            Tape systems would have slightly longer longevity (due to tapes being mainly used for backups.)

            Of course cost of storage will go down, so perhaps they could continually upgrade to smaller higher capacity systems, but then it becomes a m
            • Re: (Score:3, Insightful)

              by FST777 (913657)
              By that time, other techniques will be available to copy the digital archive over. Heck, it might even be possible to make a copy of the digital data on glass plates, complete with descriptions of the used protocol.

              It's true that digitized data is more prone to failure than most analog carriers. The whole point is that digitized data is much more easily copied over and over again, without loss, independent of whatever carrier is used.
            • by Kadin2048 (468275) *
              I think the point is that you'd pay (let's round up for the sake of overhead) $250,000 now, to store a petabyte. Let's also say that the drives are good for five years, at which point you'd need to replace them.

              So in five years you go out and buy another petabyte of storage. If we assume that the price of storage halves every 24 months or so, then 1PB should only cost you around $62,500 (again, factoring in some overhead). And five years after that, maybe $16k.
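As a back-of-the-envelope check on those figures, here is a small sketch of the halving model. It assumes a clean exponential price decline, which is a simplification; the 24-month halving period and the $250,000 starting price are the comment's own numbers, and `cost_after` is an illustrative name.

```python
def cost_after(initial_cost, years, halving_months=24):
    """Projected price of the same capacity after `years`, if the price
    halves every `halving_months` months (a simplifying assumption)."""
    halvings = years * 12 / halving_months
    return initial_cost * 0.5 ** halvings


if __name__ == "__main__":
    start = 250_000  # rounded-up cost of 1 PB today, per the comment
    for years in (0, 5, 10):
        print(f"year {years:2d}: ${cost_after(start, years):,.0f}")
```

Under these assumptions the raw projection is somewhat below the quoted $62,500 and $16k, which correspond to two and four full halvings; that is consistent with the comment rounding up for overhead.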

              The "halves every 24 months" argument could be
              • by Venik (915777)
                And where would you put your 2000 hard drives? Your VCR? What if - however surprising this may seem to you - one of your 2000 hard drives fails? Would you tell the customer that you just lost 500Gb-worth of photos and three years of their research data? What if you lose two hard drives? We all store a great deal of porn on our PCs, but this is not the same.
          • by Venik (915777)
            The final price depends entirely on what equipment and methods you choose, and, therefore, what level of reliability you can expect. Don't look at the price of just the hard drives: you still need something to put them into. Look at the prices of high-end disk arrays capable of storing and managing 1PB of data with adequate redundancy, expansion capability, and enough space left over to refresh data every so often.

            Add the cost of a workgroup- or enterprise-level Unix server to handle data integrity maintenanc
        • Re: (Score:3, Interesting)

          by profplump (309017)
          And your photographic copy would A) degrade over time and B) lose quality with each copy. IMHO that's not a very good archive. Moreover, in order to slow the inevitable decay that comes with time and reactive chemicals on paper/plastic/metal/whatever, you'd still need a climate-controlled facility. And you'd still need a team of operators to make the copy, and to make later copies as the earlier ones degrade. And more than anything else, you'd need someplace to store *another* 165 tons of photos, which is c
          • by Venik (915777)
            Photographic copies have a few advantages over digital data storage media. Photographic plates, properly stored, will reliably last 20-30 years, at least. Tapes and hard drives will not. Digital data will need to be maintained. Storing such a large amount of digital data will be a continuous process of reading and writing; replacing disks, drives or tapes; upgrading software, hardware and networking. This is not the kind of a backup that you will be able to put away for 20 years and then go back and "refres
        • by Cecil (37810) on Thursday July 12, 2007 @12:35AM (#19834555) Homepage
          Ever tried to maintain archival backups for a petabyte-worth of data?

          Yes, as a matter of fact. Definitely a lot of work is involved, but do you believe that you wouldn't need a team of document managers, millions of dollars worth of floor space, and expensive climate controlled facilities for archival of microfiche? You most certainly do. It's a lot of data. Period. No matter what you try to do with it, it's a lot of data. It's going to require a lot of resources. That's just a fact of life.

          Anyway, no one in their right mind would choose microfiche for that type of data. If you're only storing plain text pages it's adequate (though I still don't think it would be the "right way to do it" in this day and age), but for photographic plates? Not going to work.

          Microfiche is vastly overrated, in my opinion. My current project involves taking 2 floors worth of 30-50 year old microfiche and scanning it, OCRing it, and PDFing it. Yes it certainly does age. Quite poorly, in fact. The quality is absolutely terrible compared to the paper versions, some of it is stuck together, and indexing and cataloging it is a nightmare all of its own.

          Yes, there are challenges in the digital world too, but most are easily surmountable given a little bit of common sense in understanding that digital is not magic. It doesn't mean you can "fire and forget". The documents will still require maintenance, cataloging, protection and monitoring. Format obsolescence is very nearly a nonissue, it is blown way out of proportion. That's where the "maintenance" comes in. The key benefit of digital is that you can and should losslessly upgrade your format whenever obsolescence is becoming a concern. Formats do not disappear overnight and suddenly everyone forgets what to do with them, you have plenty of time to make your transition if you're paying attention (which you must be: again, digital is not magic).
          • by Venik (915777)
            I think this digital optimism stems from the fact that very few people tried to access any large body of 50-year-old digital data. I know that in the aerospace industry in the US average retention of backups rarely exceeds five years. This is because reliably storing data for longer periods of time causes the costs of backup environment to skyrocket. Sometimes I had to deal with recovering data from 10-15 years ago with mixed success. I remember one instance when data could be recovered from tapes but lucki
            • by Cecil (37810)
              I don't think you understand... you're not supposed to ever let them sit that long. That's my whole point. Maintenance, maintenance, maintenance. You can't just drop your "backups" into a vault and go "ahhh, now they are safe for eternity, because they are digital". It's not magic.
              • by Venik (915777)
                This is exactly my point. I think you meant to reply to someone else's posting.
        • Microfiche has a short life span. When I was working at the Royal Greenwich Observatory they'd done some research and discounted that as a feasible option. Something like 25 years if you're lucky? The glass plates in the RGO were from the same period as these American ones, and in equally reasonable condition (in most cases.... the problem was there as well... we were transferring them to acid free paper sleeves).
          • by Venik (915777)
            If your glass plate with microfiche reliably lasts for 20 years, this will be about 10 years more than I would give any hard drive or tape. It amazes me how willing people are to believe in the longevity of digital data. This supposed longevity is largely an urban myth. Banks, universities, insurance companies keep lots of data in digital format for extended periods of time. But it costs them a pretty penny and almost invariably they still maintain paper archives.
    • by jon287 (977520)
      Hey now...

      People who curate glass museums shouldn't throw stones!
  • ...all the alien porn I've been missing!
  • That's about 6e6 Bytes per gram. Digitizing that data means lots of redundancy while preserving the total mass of this collection.
    • Re:data/mass ratio (Score:4, Informative)

      by HappyEngineer (888000) on Wednesday July 11, 2007 @08:11PM (#19832677) Homepage
      It depends on the number of pounds in a ton, but if it's short tons then

      165 short tons = 149,685,482 grams
      1e15 / 149,685,482 = 6,680,674 bytes per gram

      A quick check of amazon turns up a 1TB drive which weighs 2.4 pounds.
      That's 1,089 grams which is 918,592,757 bytes per gram.

      Unless I've messed up my math, it looks like hard drives store 137 times more information per gram. That's not as large a multiple as I had imagined though. The whole thing should still be between 1 and 2 tons when put on hard drives.
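For anyone who wants to re-run the numbers, here is a short sketch of the same calculation. It assumes short tons (2,000 lb), decimal bytes, and the quoted 2.4 lb drive; the helper names are illustrative.

```python
GRAMS_PER_POUND = 453.59237
POUNDS_PER_SHORT_TON = 2000


def bytes_per_gram(total_bytes, mass_pounds):
    """Storage density in bytes per gram for a given mass in pounds."""
    return total_bytes / (mass_pounds * GRAMS_PER_POUND)


# Glass plates: about 1 PB (1e15 bytes) in 165 short tons.
plates = bytes_per_gram(1e15, 165 * POUNDS_PER_SHORT_TON)

# Hard drive: 1 TB (1e12 bytes) in 2.4 pounds.
drive = bytes_per_gram(1e12, 2.4)

ratio = drive / plates

# Mass of 1,000 such drives (1 PB, no redundancy) in short tons.
drive_archive_short_tons = 1000 * 2.4 / POUNDS_PER_SHORT_TON

if __name__ == "__main__":
    print(f"plates: {plates:,.0f} bytes/gram")
    print(f"drive:  {drive:,.0f} bytes/gram")
    print(f"ratio:  {ratio:.1f}x")
    print(f"1 PB of drives: {drive_archive_short_tons:.1f} short tons")
```

The script agrees with the figures above: a ratio of roughly 137 and a bit over one ton of bare drives.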
  • by MDMurphy (208495) on Wednesday July 11, 2007 @06:50PM (#19831897)
    Google provides views of the Earth, Moon and Mars, why not stars? If the information was made available for them to deliver to their users, they might be interested.

  • by Stranger4U (153613) on Wednesday July 11, 2007 @06:54PM (#19831947)
    This seems like a great opportunity for either corporate sponsorship, or a grass-roots donation drive. In all honesty, $5 million isn't a whole lot of money for the likes of any real corporation, and it probably wouldn't be that hard to raise it through small donations from individuals. Especially if you could ascribe names to some or all of it. How would it feel to be able to personally identify which plates you paid to have scanned? (this image of the Crab Nebula brought to you by John Smith) I'm surprised Paul Allen or Richard Branson aren't all over this like stink on shit.
  • Google (Score:5, Insightful)

    by blhack (921171) on Wednesday July 11, 2007 @06:55PM (#19831953)
    I'm sure that a company like google would be MORE than willing to fund a project archiving these. The positive press, proliferation of their intended "do no evil/good guy/just another bunch of geeks" image, having their name on a major scientific project would easily be worth the investment.
    • by dpilot (134227)
      I'm sure Microsoft has a way to store this data, no doubt with an "open" format. Things like this are too important to trust to anyone but Microsoft!
  • Backup? (Score:1, Interesting)

    by Anonymous Coward
    Of course, as long as they can keep mildew at bay, odds are that the plates will long outlast any digital record. Of course it always makes sense to keep a backup, not to mention the value of an instantly-retrievable library.
    • Re: (Score:3, Interesting)

      by TheSHAD0W (258774)
      How about we make a backup of the backup on glass plates...

      Ack! Put down that knife!
  • for Google. Man if I had a spare 5 million I'd be all over that, I love data.
  • InfiniBytes (Score:5, Informative)

    by Doc Ruby (173196) on Wednesday July 11, 2007 @07:10PM (#19832095) Homepage Journal

    contains more than a petabyte of data

    Glass photographic plates, especially from silver emulsion, are analog at extremely fine granularity. Effectively molecular, depending on how flat the glass surface was settled from its molten liquid state. The features of its silver oxide crystals, laid in place by individual photons arriving from vastly distant stars, could be meaningful at less than a nanometer. Especially when measuring extremely subtle influences, like the gravity from one distant star bending the light of another distant star, measured across a century in which those stars lost gravitational mass, for comparison.

    There is a practically infinite amount of data on each of those plates, limited by our precision in measuring them. It's a smaller degree of infinity than that of the sky. But the original infinite sky is lost. While the plates' lesser infinities are impossible to replace, and all we'll get to use to look back across all the billions of years we saw in a long century of them.
    • Re:InfiniBytes (Score:4, Insightful)

      by modecx (130548) on Wednesday July 11, 2007 @07:50PM (#19832497)
      There is a practically infinite amount of data on each of those plates, limited by our precision in measuring them.

      And limited by the lenses/mirrors, and limited by atmospheric effects, and inconsistencies in the glass, and the silver, and, and....

      I can't testify to the quality of the glass negatives, but I can testify to the fact that as much as people like to believe, even the best modern analog capture sources aren't anywhere near practically infinite, even in the best laboratory conditions.
      • Re: (Score:3, Insightful)

        by Doc Ruby (173196)
        Well, the lenses/mirrors that are now lost to history do introduce noise. But the atmospheric effects, and inconsistencies in the glass and silver, and probably much of the "writing" noise from the optics do all hold the possibility of being filtered out. Maybe not now, with today's early signal processing tech. But in another hundred or more years, that signal info could be available. If we don't damage them in the interim.
    • by Smauler (915644)

      You're ignoring noise, and noise is a huge factor with analogue (and digital) readings. Many people in the past have ascribed meaning to analogue readings that were not there. At the nanometer level, especially, I think it's almost bound to be all noise in most applications.

      Saying that "The features of silver oxide crystals [...] could be meaningful at less than a nanometer" is not actually saying anything. If they are meaningful, say they are. If there is evidence of them becoming meaningful, reference

    • Re: (Score:3, Insightful)

      by monopole (44023)
      Having worked with holographic media for decades (which is about as fine a resolution as you can get optically), the maximum resolution is on par with the grain size, 40 nm (Agfa 8E75), and considerably worse in practice, due both to the wavelength of light and to the expansion of grains during exposure. To get 'molecular' resolution you'd have to go over to dichromate plates, which are far too slow.

      Due to speed considerations the grain of these plates would be much worse. But well within the resolution of the 'scope used for recording.

    • by Teancum (67324)
      The article does mention that the staff tried to strike a compromise between scanning so finely that they would record noise (a problem when doing analog to digital conversion in any format, but especially image data) and scanning so coarsely that they would lose data.

      You can debate the choice here in terms of where to draw this line, and to suggest that they use some archival quality file format that does lossless image compression, but this is something that did go into their consideration.

      I'm so glad that they h
      • by djmink (1131997)
        We are preserving all of the raw scans, and the plates will be preserved, too. Our scans resolve the photographic grains, going well beyond the actual resolution of objects in the sky. We are also saving a few more bits of intensity than are found in the photographs. This is a pretty good time to be creating this archive, not just because the scanning technology is mature and storage technology is tracking our needs. There is also a greater interest in the astrophysics of temporal variations in anticipat
        • by Teancum (67324)

          We are preserving all of the raw scans, and the plates will be preserved, too. Our scans resolve the photographic grains, going well beyond the actual resolution of objects in the sky. We are also saving a few more bits of intensity than are found in the photographs. This is a pretty good time to be creating this archive, not just because the scanning technology is mature and storage technology is tracking our needs. There is also a greater interest in the astrophysics of temporal variations in anticipation

  • by Anonymous Coward on Wednesday July 11, 2007 @07:20PM (#19832221)

    anyone with a spare $5 million or so would be welcomed with open arms

    That's what she said!
  • Maybe we'll get some data on possible extrasolar planets from this?
  • by syousef (465911) on Wednesday July 11, 2007 @08:12PM (#19832689) Journal
    When I completed my Astronomy masters access to publicly available data from various sources (most notably NASA data made free to the public) was a real boon. It meant we could do analysis on actual real data instead of artificial or sanitized textbook material. A couple of the students built on this to do some original research. (Sadly that's not the way I went, as my time was more limited).

    There are also lots of amateurs out there running a wide variety of very specialized packages to do everything from discovering asteroids to keeping tabs on the brightness of stars and watching for supernovae.
    • Two years ago I worked in a place [] doing some preliminary astronomy experiments previous to going BIG. I was doing atmosphere science. During a chat with the resident astronomer, I asked where their data was publicly available. His answer, in short: "absolutely not, it's our funding, so it's our data. We release only the final paper. We don't want competition from other labs/astronomers."

      That answer astounded me as in our own project the point was to make the data public as efficiently as possible. I mean,

      • by faxafloi (228519)

        ...but I've searched high-res sky images in the past without finding anything systematic except some specific projects such as the Sloan Sky Survey (which are just coordinates) or the odd marketing Hubble shot.

        If that's all you found, you didn't look hard enough. Sloan serves imaging and spectral data, and all of Hubble's science data (for example) has been available from three different data centers since 1992. (This is data we're talking about, not pretty pictures.) In fact, all NASA-funded missions are re

      • by syousef (465911)
        If you were a professional astronomer I'd say it sounds like you'd be better off finding a different organization to work for.

        Try looking at cited sources on published papers for starters. [] will give you plenty of pre-publications. Here too []

        I'm well out of touch but here's what you get just from Google:

        Skyview is a must. Images in any wavelength (multiple instruments) []

        Learn about the FITS data format. Not just pretty pictures b
  • by tchdab1 (164848) on Wednesday July 11, 2007 @08:27PM (#19832835) Homepage
    From here: l [],

    Harvard University's endowment, valued at $25.9 billion at the end of FY 2005, is a collection of more than 10,800 separate funds established over the years to provide scholarships; to maintain libraries, museums, and other collections; to support teaching and research activities; and to provide ongoing support for a wide variety of other activities. The great majority of these funds carry some type of restriction.

    I think they can scare up the change.
    • Re: (Score:2, Informative)

      by Anonymous Coward
      Absolutely correct. According to records, Harvard saw their endowment fund appreciate over 16% in a single year (FY2005). Sixteen percent of $30 billion is nearly $5 billion which would allow them to quite easily fund this project. Even if Harvard has the fund invested in an interest-bearing account at 5%, they're still seeing around $1.5 billion per year in interest income - something more than $4 million per day. This project is chump change.
    • Re: (Score:2, Interesting)

      by moosesocks (264553)
      They might, but I doubt it. Unless they could potentially turn it into a media blitz, I genuinely doubt that Harvard (or any private institution for that matter) would pick up this sort of project.

      If they did, they'd keep it private, and only share it amongst other institutions "prestigious" enough to be deserving of the blood and sweat of Harvard scientists.

      I'm sorry, but the Ivy League has quickly degenerated into a billionaire's playground. If they turn away thousands of "perfectly qualified" applicant
    • by drwho (4190)
      The reason why Harvard HAS so much money is because they don't SPEND it. Kind of like that crazy uncle Joe you have (or wish you had), the one that drove a school bus for his whole adult life and died with an estate worth over a million dollars. Yeah, Harvard is weird like that. I should know. I am depending on a NSF grant for my salary at Harvard, the school doesn't seem to want to give me any of its money. However, they seem to fund all sorts of useless "humanities" programs (I am not saying that all huma
  • A great idea. (Score:5, Interesting)

    by niktemadur (793971) on Wednesday July 11, 2007 @09:14PM (#19833201)
    If they manage to standardize a century of these plates, it would significantly extend the time range of data to digitally extrapolate and detect objects previously missed. Just to speak of mapping our own cosmic backyard, a significant number of slow-moving, previously undetected Kuiper Belt Objects, for example, would more easily pop into view. Surely a bunch of comets, too.

    Clyde Tombaugh captured Pluto several times during his three decades long hunt for the elusive Planet X, but failed to put the pieces together. If he had had digital technology, he would have shaved off at least a decade of effort. So imagine all the extremely useful raw data still stored in those plates.
    • by dwarmstr (993558)
      Uh... Tombaugh was 24 when he discovered Pluto aka Planet X. He later searched for other objects on the ecliptic, is that what you mean?
      • Right! Let me set the record straight:

        1. Pluto had been captured on plate as far back as 1915 by another astronomer, and it was he that missed it, not Tombaugh. So if they had known what they were looking at back then, the search would have been shortened by a decade and a half.
        2. Tombaugh kept on searching for other candidates for Planet X after discovering Pluto.

        An interesting tidbit is that during the 1930's, the only 24 hour radio station whose signal reached Flagstaff transmitted from Ciudad Juarez i
  • that's astronomical!
  • by CraterGlass (893417) on Thursday July 12, 2007 @03:34AM (#19835279)

    There is more to this than simply scanning a flat image. The emulsion on these plates is a three-dimensional medium, and different data can be extracted depending on your focal depth into the emulsion. I believe David Malin did much pioneering work on this kind of thing, including the use of different layers for unsharp masking.

    There will be information in the plates that is not yet part of human knowledge, and a simple scan of one focal plane is not going to get it all.

    Certainly it is worth taking backup images of these plates in any way we know how, but we should remain aware that, as of today, no technology exists that will make exact duplicates of them, so great care should always be taken to preserve the originals.

    • no technology exists that will make exact duplicates of them

      Well, I mean, there is no technology that exists to make an exact duplicate of ANYTHING. That doesn't mean that digitizing this information is useless, however.

      You mention that these are three dimensional mediums. Fine, three-dimensional data capture is nothing new. You're right, plopping this thing on a scanner is not going to work. But I'm sure that in some way we can get an image at a set of depth intervals and save those. Now there is an additi
  • GoogleSky (Score:3, Insightful)

    by 12357bd (686909) on Thursday July 12, 2007 @07:00AM (#19836021)
    Seriously, let Google index not only that collection, but any stellar image information and launch GoogleSky.
  • Step 1: Ask 15 to 20 major companies to each sponsor a "scanning trailer". They'd get their name and logo all over it and be part of the on-going story and never-ending literature, etc.
    Step 2: Build-out a tractor trailer per sponsor to include everything needed to do scanning of archived materials (books, papers, photos, glass photo plates, etc.). Power source, scanners (many per trailer), etc.
    Step 3: Drive the swarm of scanning trucks to the parking lots of an archive in need of backup.
    Step 4: Connec
  • ...contains more than a petabyte of data.

    You can be sure that all those PETA-bytes [] are vegetarian!

  • If we assume there are a half million plates as the article states (let's call it a 512K, i.e. 2^19), and there's a petabyte (2^50) worth of uncompressed data on them, that's 2^31 bytes (2GB) per plate. Assuming 3 bytes/pixel and square plates, that's about 26750x26750 pixels. With a 12x12 inch plate, that'd be about 2230 pixels/inch. If the plates are smaller, say 4 inches, that goes up to a more respectable 6700 pixels/inch.
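The per-plate estimate is easy to reproduce. This sketch takes the comment's assumptions as given (2^19 plates, 2^50 bytes, 3 bytes per pixel, square plates), none of which comes from the observatory's actual scan specification.

```python
import math

PLATES = 2 ** 19         # "half a million", rounded to a power of two
TOTAL_BYTES = 2 ** 50    # one binary petabyte
BYTES_PER_PIXEL = 3      # assumed uncompressed, 3 bytes per pixel

bytes_per_plate = TOTAL_BYTES // PLATES          # 2**31, i.e. 2 GiB
pixels_per_plate = bytes_per_plate // BYTES_PER_PIXEL
side_pixels = math.isqrt(pixels_per_plate)       # side of a square plate


def pixels_per_inch(side_px, plate_inches):
    """Linear scan resolution for a square plate of the given size."""
    return side_px / plate_inches


if __name__ == "__main__":
    print(f"{bytes_per_plate} bytes per plate")
    print(f"{side_pixels} x {side_pixels} pixels")
    for inches in (12, 4):
        print(f"{inches} inch plate: {pixels_per_inch(side_pixels, inches):.0f} ppi")
```

The result matches the rounded figures in the comment: a side of roughly 26,750 pixels, about 2,230 pixels/inch at 12 inches and about 6,700 at 4 inches.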

    Cost of storage? Free!!! They should get a few gmail accounts and store the sca

  • Not sure what's meant exactly by it being the "only collection to cover both hemispheres". The Digitized Sky Survey [] covers the whole sky and it's been online [] for 12 years.
    • by bmk67 (971394)
      It's the only collection of photographic plates that covers both hemispheres. Despite DSS, this data is extremely important to astronomical research.
      • by faxafloi (228519)
        The DSS is a collection of plates that covers both hemispheres.
        • by bmk67 (971394)
          You are correct... I didn't check the link you provided and assumed you were talking about the Sloan DSS (which neither uses photographic plates nor is full-sky).
  • I don't think it would be hard to find some company to pay $5m, if they could keep the rights to the images, and pull a Westlaw type of scam. I am sure Harvard-Smithsonian isn't going to fall for this. They want to keep these images for the public, which makes it difficult for anyone to build a business model on and therefore difficult to get funding for. How would Google make money on this? Google adwords for a particular star? Or perhaps on google maps - "coffee near Barnard's star"? I am not saying that

Man must shape his tools lest they shape him. -- Arthur R. Miller