Digitizing 100 Years of Astronomical Data
Maximum Prophet writes to mention that a collection of glass plates containing astronomical information from the late 19th century through the mid-1980s is being considered for digitization. "The accumulated result weighs heavily on its keepers on Observatory Hill, just up Garden Street from Harvard Square: more than half a million images constituting humanity's only record of a century's worth of sky. 'Besides being 25 percent of the world's total of astronomical photographic plates, this is the only collection that covers both hemispheres,' said Alison Doane, curator of a glass database occupying three floors, two of them subterranean, connected by corkscrew stairs. It weighs 165 tons and contains more than a petabyte of data. The scary thing is that there is no backup." I'm sure that anyone with a spare $5 million or so would be welcomed with open arms.
That's quite a bit. (Score:3, Funny)
Sounds like a typical lunch clean-up after Rosie O'Donnell.
Sorry. I'm truly sorry.
That's at least... (Score:3, Funny)
Make it searchable... (Score:1)
Glass plates will outlive the digital "backup" (Score:5, Insightful)
Re:Glass plates will outlive the digital "backup" (Score:5, Funny)
Luckily glass isn't a liquid.... (Score:3, Insightful)
Re: (Score:1)
Re:Luckily glass isn't a liquid.... (Score:4, Interesting)
Re: (Score:3, Informative)
Re:Glass plates will outlive the digital "backup" (Score:5, Insightful)
Of course the ideal would be if we could develop a cheap, permanent digital storage medium with guaranteed physical longevity, say several millennia. That combination would allow easy dissemination of the data, and safety through a multiplicity of sources.
Re: (Score:2)
But would it be compatible with Office 5007 or OO v3452.23?
Re: (Score:2)
Ceramic/fired clay tablets will do nicely. There are likely to be density issues, however.
Re:Glass plates will outlive the digital "backup" (Score:4, Informative)
So what? Copy the digital version onto a second set of disks when it comes close to expiring.
Lossless copying means that, given a little bit of maintenance, expiration of digital media is a nonissue.
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
How long would a RAID5|ZFS system last? Ten years? At which point you can no longer get spare parts for the system (well, not easily).
Tape systems would have slightly better longevity (due to tapes being mainly used for backups).
Of course cost of storage will go down, so perhaps they could continually upgrade to smaller higher capacity systems, but then it becomes a m
Re: (Score:3, Insightful)
It's true that digitized data is more prone to failure than most analog carriers. The whole point is that digitized data is much more easily copied over and over again, without loss, independent of whatever carrier is used.
Re: (Score:2)
So in five years you go out and buy another petabyte of storage. If we assume that the price of storage halves every 24 months or so, then 1PB should only cost you around $62,500 (again, factoring in some overhead). And five years after that, maybe $16k.
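For anyone who wants to play with that halving assumption, here is a rough sketch. It uses only the two numbers quoted above ($62,500 and a 24-month halving period); the function name `projected_cost` is made up for illustration, and real storage prices won't follow a clean curve like this.

```python
def projected_cost(cost_now: float, years: float, halving_months: float = 24.0) -> float:
    """Cost after `years`, assuming the price halves every `halving_months` months."""
    halvings = (years * 12.0) / halving_months
    return cost_now / (2.0 ** halvings)

# $62,500 is the figure quoted above; the 24-month halving period is
# the same comment's assumption, not a measured trend.
in_five_years = projected_cost(62_500, 5)  # 2.5 halvings
in_four_years = projected_cost(62_500, 4)  # exactly 2 halvings

print(f"after 5 years: ${in_five_years:,.0f}")  # after 5 years: $11,049
print(f"after 4 years: ${in_four_years:,.0f}")  # after 4 years: $15,625
```

Under a strict 24-month halving, five more years takes $62,500 down to roughly $11k; the "maybe $16k" above matches two full halvings (four years), so it reads as a slightly conservative round number.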
The "halves every 24 months" argument could be
Re: (Score:2)
Re: (Score:2)
Add the cost of a workgroup- or enterprise-level Unix server to handle data integrity maintenanc
Re: (Score:3, Interesting)
Re: (Score:2)
Re:Glass plates will outlive the digital "backup" (Score:5, Insightful)
Yes, as a matter of fact. Definitely a lot of work is involved, but do you believe that you wouldn't need a team of document managers, millions of dollars worth of floor space, and expensive climate controlled facilities for archival of microfiche? You most certainly do. It's a lot of data. Period. No matter what you try to do with it, it's a lot of data. It's going to require a lot of resources. That's just a fact of life.
Anyway, no one in their right mind would choose microfiche for that type of data. If you're only storing plain text pages it's adequate (though I still don't think it would be the "right way to do it" in this day and age), but for photographic plates? Not going to work.
Microfiche is vastly overrated, in my opinion. My current project involves taking 2 floors worth of 30-50 year old microfiche and scanning it, OCRing it, and PDFing it. Yes it certainly does age. Quite poorly, in fact. The quality is absolutely terrible compared to the paper versions, some of it is stuck together, and indexing and cataloging it is a nightmare all of its own.
Yes, there are challenges in the digital world too, but most are easily surmountable given a little bit of common sense in understanding that digital is not magic. It doesn't mean you can "fire and forget". The documents will still require maintenance, cataloging, protection and monitoring. Format obsolescence is very nearly a nonissue; it is blown way out of proportion. That's where the "maintenance" comes in. The key benefit of digital is that you can and should losslessly upgrade your format whenever obsolescence is becoming a concern. Formats do not disappear overnight, with everyone suddenly forgetting what to do with them. You have plenty of time to make your transition if you're paying attention (which you must be: again, digital is not magic).
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Microfiche is terrible, short life span (Score:2)
Re: (Score:2)
Re: (Score:1)
People who curate glass museums shouldn't throw stones!
I can't wait to see... (Score:2)
Re: (Score:1)
Re: (Score:3, Insightful)
Re:Would Google archive it, perhaps? (Score:5, Funny)
Searchable in Lat/Lon/time/intensity
that would be awesome...
data/mass ratio (Score:1)
Re:data/mass ratio (Score:4, Informative)
165 short tons = 149,685,482 grams
1e15 / 149,685,482 = 6,680,674 bytes per gram
A quick check of Amazon turns up a 1TB drive which weighs 2.4 pounds.
That's 1,089 grams which is 918,592,757 bytes per gram.
Unless I've messed up my math, it looks like hard drives store 137 times more information per gram. That's not as large a multiple as I had imagined though. The whole thing should still be between 1 and 2 tons when put on hard drives.
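The math above can be re-run in a few lines. This is only a sketch using the numbers already quoted (165 short tons, 1 PB, a 1 TB drive at 2.4 pounds); the variable names are made up for illustration, and decimal units (1 PB = 1e15 bytes) are assumed, as in the post.

```python
GRAMS_PER_POUND = 453.59237   # exact by definition of the avoirdupois pound
POUNDS_PER_SHORT_TON = 2000

# Figures quoted in the post above.
plates_bytes = 1e15                                           # "more than a petabyte"
plates_grams = 165 * POUNDS_PER_SHORT_TON * GRAMS_PER_POUND   # 165 short tons
drive_bytes = 1e12                                            # one 1 TB drive
drive_grams = 2.4 * GRAMS_PER_POUND                           # 2.4 pounds

plate_density = plates_bytes / plates_grams   # bytes per gram of glass
drive_density = drive_bytes / drive_grams     # bytes per gram of hard drive
ratio = drive_density / plate_density

# Weight of enough 1 TB drives to hold the whole petabyte.
drives_needed = plates_bytes / drive_bytes
drives_short_tons = drives_needed * 2.4 / POUNDS_PER_SHORT_TON

print(f"plates: {plate_density:,.0f} bytes/gram")
print(f"drives: {drive_density:,.0f} bytes/gram")
print(f"ratio:  {ratio:.1f}x")
print(f"1 PB on 1 TB drives: {drives_short_tons:.1f} short tons")
```

It comes out to about 137.5x, and 1,000 bare drives weigh in at about 1.2 short tons, so the "between 1 and 2 tons" estimate holds before counting enclosures or redundancy.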
Re: (Score:1)
This sounds like a job for Google (Score:3, Insightful)
Re: (Score:3, Informative)
Re: (Score:2)
What a lousy internship that'll be.
"Yeah. Mmmm. I'm going to have to have you do this job for me; it's digitizing this 165 tons of astronomical photographs on glass plates. Mmm. It would be good if you're done before the summer is over. That's great. Mmmm. Thanks."
A Million People With $5 (Score:3, Insightful)
Re: (Score:1)
Google (Score:5, Insightful)
Re: (Score:2)
Backup? (Score:1, Interesting)
Re: (Score:3, Interesting)
Ack! Put down that knife!
This is a job (Score:2)
InfiniBytes (Score:5, Informative)
Glass photographic plates, especially from silver emulsion, are analog at extremely fine granularity. Effectively molecular, depending on how flat the glass surface was settled from its molten liquid state. The features of its silver oxide crystals, laid in place by individual photons arriving from vastly distant stars, could be meaningful at less than a nanometer. Especially when measuring extremely subtle influences, like the gravity from one distant star bending the light of another distant star, measured across a century in which those stars lost gravitational mass, for comparison.
There is a practically infinite amount of data on each of those plates, limited by our precision in measuring them. It's a smaller degree of infinity than that of the sky. But the original infinite sky is lost. The plates' lesser infinities are impossible to replace, and they are all we'll get to use to look back across all the billions of years we saw in a long century of them.
Re:InfiniBytes (Score:4, Insightful)
And limited by the lenses/mirrors, and limited by atmospheric effects, and inconsistencies in the glass, and the silver, and, and....
I can't testify to the quality of the glass negatives, but I can testify to the fact that, as much as people like to believe otherwise, even the best modern analog capture sources aren't anywhere near practically infinite, even in the best laboratory conditions.
Re: (Score:3, Insightful)
Re: (Score:1)
You're ignoring noise, and noise is a huge factor with analogue (and digital) readings. Many people in the past have ascribed meaning to analogue readings that was not there. At the nanometer level, especially, I think it's almost bound to be all noise in most applications.
Saying that "The features of silver oxide crystals [...] could be meaningful at less than a nanometer" is not actually saying anything. If they are meaningful, say they are. If there is evidence of them becoming meaningful, reference
Re: (Score:3, Insightful)
Due to speed considerations the grain of these plates would be much worse. But well within the resolution of the 'scope used for recording.
Al
Re: (Score:2)
You can debate the choice here in terms of where to draw this line, and to suggest that they use some archival quality file format that does lossless image compression, but this is something that did go into their consideration.
I'm so glad that they h
Re: (Score:1)
found in the photographs. This is a pretty good time to be creating this archive, not just because the scanning technology is mature and storage technology is tracking our needs. There is also a greater interest in the astrophysics of temporal variations in anticipat
Re: (Score:2)
Re: (Score:2)
And when the sampled phenomenon is as vast as all of interstellar space, that infinitude is relevant.
Re: (Score:2)
Re: (Score:2)
It's funny how many people jumped on me for my pointing out how much more than a "petabyte" is on those plates. But no one has joined me in laughing at the "petabytes" claim.
Slashdot is retarded.
Re: (Score:2)
Maybe if you didn't make completely ridiculous claims people would join in with you laughing at other people making less ridiculous claims? The claimer in question, curator of the collection and hence possibly knows a little about it - though less about computing and digital versus analog, is certainly
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Informative)
If the astronomers who recorded these plates weren't anal, then astronomy wouldn't be advanced enough by now for you to enjoy it as an amateur.
sounds familiar (Score:4, Funny)
That's what she said!
Planets et al...? (Score:2)
All this stuff should be digitized and made public (Score:3, Insightful)
There are also lots of amateurs out there running a wide variety of very specialized packages to do everything from discovering asteroids to keeping tabs on the brightness of stars and watching for supernovae.
Re:All this stuff should be digitized and made pub (Score:2)
That answer astounded me as in our own project the point was to make the data public as efficiently as possible. I mean,
Re: (Score:2)
If that's all you found, you didn't look hard enough. Sloan serves imaging and spectral data, and all of Hubble's science data (for example) has been available from three different data centers since 1992. (This is data we're talking about, not pretty pictures.) In fact, all NASA-funded missions are re
Re: (Score:2)
Try looking at cited sources on published papers for starters. http://arxiv.org/ [arxiv.org] will give you plenty of pre-publications. Here too http://sesame.stsci.edu/library.html [stsci.edu]
I'm well out of touch but here's what you get just from Google:
Skyview is a must. Images in any wavelength (multiple instruments)
http://skyview.gsfc.nasa.gov/ [nasa.gov]
Learn about the FITS data format. Not just pretty pictures b
Harvard can handle the burden (Score:5, Informative)
This:
Harvard University's endowment, valued at $25.9 billion at the end of FY 2005, is a collection of more than 10,800 separate funds established over the years to provide scholarships; to maintain libraries, museums, and other collections; to support teaching and research activities; and to provide ongoing support for a wide variety of other activities. The great majority of these funds carry some type of restriction.
I think they can scare up the change.
Re: (Score:2, Informative)
Re: (Score:2, Interesting)
If they did, they'd keep it private, and only share it amongst other institutions "prestigious" enough to be deserving of the blood and sweat of Harvard scientists.
I'm sorry, but the Ivy League has quickly degenerated into a billionaire's playground. If they turn away thousands of "perfectly qualified" applicant
Re: (Score:2)
A great idea. (Score:5, Interesting)
Clyde Tombaugh captured Pluto several times during his three decades long hunt for the elusive Planet X, but failed to put the pieces together. If he had had digital technology, he would have shaved off at least a decade of effort. So imagine all the extremely useful raw data still stored in those plates.
Re: (Score:1)
Re: (Score:2)
1. Pluto had been captured on plate as far back as 1915 by another astronomer, and it was he that missed it, not Tombaugh. So if they had known what they were looking at back then, the search would have been shortened by a decade and a half.
2. Tombaugh kept on searching for other candidates for Planet X after discovering Pluto.
An interesting tidbit is that during the 1930's, the only 24 hour radio station whose signal reached Flagstaff transmitted from Ciudad Juarez i
165 tons? My God (Score:2, Funny)
More than just a flat scan (Score:4, Informative)
There is more to this than simply scanning a flat image. The emulsion on these plates is a three dimensional medium, and different data can be extracted depending on your focal depth into the emulsion. I believe David Malin did much pioneering work on this kind of thing, including the use of different layers for unsharp masking.
There will be information in the plates that is not yet part of human knowledge, and a simple scan of one focal plane is not going to get it all.
Certainly it is worth taking backup images of these plates in any way we know how, but we should remain aware that, as of today, no technology exists that will make exact duplicates of them, so great care should always be taken to preserve the originals.
Re: (Score:2)
Well, I mean, there is no technology that exists to make an exact duplicate of ANYTHING. That doesn't mean that digitizing this information is useless, however.
You mention that these are three dimensional mediums. Fine, three-dimensional data capture is nothing new. You're right, plopping this thing on a scanner is not going to work. But I'm sure that in some way we can get an image at a set of depth intervals and save those. Now there is an additi
GoogleSky (Score:3, Insightful)
Solution: Roaming "Scanner" Trucks (Score:1)
Step 2: Build-out a tractor trailer per sponsor to include everything needed to do scanning of archived materials (books, papers, photos, glass photo plates, etc.). Power source, scanners (many per trailer), etc.
Step 3: Drive the swarm of scanning trucks to the parking lots of an archive in need of backup.
Step 4: Connec
PETA-byte??? (Score:2)
You can be sure that all those PETA-bytes [peta.org] are vegetarian!
Crunching the numbers (Score:2)
Cost of storage? Free!!! They should get a few gmail accounts and store the sca
Both hemispheres? (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Funding, complexity, ownership (Score:2)
Re:Why does it have to cost so much? (Score:4, Insightful)
Re:Why does it have to cost so much? (Score:5, Informative)
> It took 2 years and way WAY WAY less than $5,000,000 to do it
500,000 plates. Over 2 years, assuming 50 wks/yr means just 5000 plates need be scanned per week. 1000 plates per day. 125 plates per hour. And this is large, fragile glass with really high data density, so you have to be a) careful in handling and b) use slow high-res scanning.
Let's take a guess that it takes only 10 minutes per plate (to fetch, tag, load, scan, and return). So we need only 20 people to scan 125 plates/hour.
Well, assume 20 scanning people and 1 IT guy handling the sysadmin work for the petabyte storage. Also one scientist/manager. Take a low intern/grad student $35k, 1 sysadmin at $65k, 1 PM/sci at $85K. All x2.5 for overhead, for 2 years. That's $4.25 mil in salaries.
There's also buying a redundant petabyte and all the necessary gear. I'm amazed they figure $5mil can do it.
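Here is the same back-of-envelope estimate as a sketch you can tweak. All inputs come from the post above, except the 5-day week and 8-hour day, which are assumptions inferred from the 5000 -> 1000 -> 125 steps; the variable names are made up for illustration.

```python
import math

# Throughput needed to finish in two years.
plates = 500_000
weeks = 2 * 50                  # 2 years at 50 working weeks per year
per_week = plates / weeks       # 5000 plates per week
per_day = per_week / 5          # assumes a 5-day week
per_hour = per_day / 8          # assumes an 8-hour day

# Staffing at the guessed 10 minutes per plate.
minutes_per_plate = 10
plates_per_person_hour = 60 / minutes_per_plate
scanners_exact = per_hour / plates_per_person_hour   # about 20.8
scanners_rounded_up = math.ceil(scanners_exact)      # the post rounds down to 20

# Salaries, using the post's headcount of 20 scanning staff.
yearly_salaries = 20 * 35_000 + 65_000 + 85_000
overhead_multiplier = 2.5
years = 2
total_salaries = yearly_salaries * overhead_multiplier * years

print(f"{per_hour:.0f} plates/hour, {scanners_exact:.1f} scanning staff")
print(f"salaries over {years} years: ${total_salaries:,.0f}")
```

Strictly, 125 plates an hour at 10 minutes each needs a bit under 21 people rather than 20, which only nudges the total up slightly; with the post's 20 the salary figure lands at $4.25 million as stated.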
Re: (Score:2)
as well as onsite.
Re: (Score:2)
>>coat.
Re: (Score:2)
Re: (Score:2)
You are comparing 90 filing cabinets of paper to this.
What about the fact that (a) the paper isn't nearly as fragile, (b) it's not nearly as much data, and (c) they need special scanners? That they would need at least 5 people to do this? Probably 9, since you are going to want people whose only job is to move the plates.
I wonder what you think the true cost of the work you mentioned was?
I would guess it at about 200K, for you
Re: (Score:2)
Re: (Score:3, Insightful)
"I scanzord 90 filing cabinets of paper into teh computerz"
You know what, I used to launch model rockets. It's really easy to make stuff go up. Just buy the kit, attach a little engine and off it goes. $30 easy! Freakin NASA, I bet they're spending all of our tax dollars on pr0n.
"cheapish 20megapixe
Re: (Score:2, Informative)
You need specialized scanning machines for astronomy. Office equipment doesn't do the job.
My colleagues in the UK had such a scanner. It was ~7 tonnes of metal, gl
Re: (Score:1, Funny)