Storage Dilemma Looms for NASA
John Keeton writes "Guys,
This story talks about how NASA is moving its data from
tapes as old as seven-track to newer media, but when
they get done, they have to start moving it again to new media, and how they are falling behind and may have to lose terabytes' worth of data. Really interesting."
It says it will take them 4 years to move all the data to
tapes that have a 6 year life expectancy. Hmmm.
20+ (Score:1)
This is just one project, and we haven't even launched the spacecraft yet.
No Subject Given (Score:1)
was just collected by automatic telemetry and
archived. It has never been analyzed, there
are no plans to analyze it in the future, it
can take major research just to figure out
the file and data structures, etc...
The proper conceptual model for visualizing the
structure of this archive is the buildup of
ocean sediments, and the proper data mining
technique is based on the analysis of drill
cores... AHA... here we have reached the
boundary layer between the 7090/Fortran II
sediments and some traces of an early 360...
There is NO lasting format/medium solution (Score:1)
This data must remain accessible in the future as well. Some time ago they got rid of their once-multi-million, now-obsolete mainframe. After that they wasted a lot of money finding a method to read the legacy mainframe tapes on Unix. The costs involved, such as reverse-engineering the interfaces of the tape device, the file format, etc., and simply the effort of having all of those clumsy tapes read, taught them a lesson: as data volumes continue to grow, ANY backup medium will become obsolete in a few years. In a few years you won't be able to buy a DVD drive anymore because it will have grown obsolete, just like C=64 cassette tapes today. Repeated massive transfer operations from obsolete media are simply out of the question.
Their solution was to keep ALL of the data ONLINE. Backups are naturally taken regularly, with whatever equipment is in use at a given time. As all of the old data migrates to the new whizbang online storage along with everything else, there is no need to worry about aging archive libraries and preserving the technology to read the obsolete media. Not to mention the newfound ease of access to the more rarely needed old data.
Sure, you need to invest in large RAID arrays or what have you. Sure, you need to invest in a high-volume backup system. But those systems are replaced "often", since they are a part of the production system. In a few years, the extra volume that required the extra investment today will seem non-existent.
Now this may sound wild at first, but think about it: Data grows. Disks grow larger and cheaper, and are replaced. Backup media grow obsolete. The conclusion is a no-brainer, actually.
--
Teemu Yli-Elsila
This one's fairly easy to get round. (Score:1)
Get a hierarchical storage system. They make it fairly easy to migrate data across different media types. When a new bigger, faster, better storage medium comes along, just add it to the hierarchy. Course, BIG automated media libraries help a lot; saves on the arms.
Two possible solutions (Score:1)
I think their problem calls for a layered solution. First, they should copy their oldest tapes onto modern archival-grade tape. Optical solutions just won't cut it yet, and if they feel the data is worth keeping (and it probably is) then they need to get it into a format that is easier to work with.
I would duplicate the new tapes too: take an old tape and copy it onto 2 new tapes.
Secondly, they need to go optical. Pressed CDs have an average life of 100 years. CD-Rs are good for a couple of decades. If you take good care of them they are expected to last longer. NASA needs to push on blue, green, or violet laser stacked CDs and DVDs. That stuff is in the labs now, but if NASA and some government agencies started sounding the alarm for it, maybe production could be expedited. A blue-laser DVD should hold between 50 and 100GB of data; that is still short of some of the really big tapes out there, but I would think it would work. Get STC to build an automatic tape-to-DVD jukebox machine and the problem would start to go away.
No Subject Given (Score:1)
mke2fs
they are just lazy! (Score:1)
as long as they don't have the government on their back (the communists are coming!!!) to pump their energies up, they don't do zilch!
looks good to me. (Score:1)
As an employee of StorageTek I like reading this article. It gives me hope for the future. :)
#include
speed (Score:1)
I can place 50 gigabytes on a single tape, uncompressed. I can read or write that entire tape in the same time that a single-speed CD-ROM drive can read 640 megs. A 40x CD-ROM drive would need more media changes (a robot would do the changes, of course), and not counting the media changes would take 1.5 times as long!
Optical storage has promise, don't get me wrong, but when you're trying to spool data as fast as NASA does for some applications, optical drives aren't suitable.
#include <stddisclaimer.h> I'm not speaking for anyone here; all numbers have been rounded and estimated.
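As a rough check of the comparison above, here is a minimal Python sketch that takes the comment's figures at face value. It assumes a 40x drive sustains a full 40x over the whole disc (real drives generally average less) and ignores media changes, as the comment does. The variable names are illustrative only. Under those assumptions the ratio lands in the same ballpark as the comment's rounded 1.5x, somewhat nearer 2x.

```python
# Figures from the comment; the sustained 40x rate is an assumption.
TAPE_MB = 50_000      # one 50 GB tape, uncompressed (decimal megabytes)
DISC_MB = 640         # one CD-ROM, as quoted
DRIVE_SPEED = 40      # a "40x" drive, assumed to hold 40x end to end

# The comment says the tape streams in the time a single-speed drive
# needs for one disc, so that time is used as the unit here.
tape_time = 1.0

# Whole discs needed to hold one tape (ceiling division).
discs_needed = -(-TAPE_MB // DISC_MB)

# Read time for the same data on the 40x drive, media changes excluded.
disc_time = (TAPE_MB / DISC_MB) / DRIVE_SPEED

print(f"discs needed per tape: {discs_needed}")
print(f"40x CD-ROM read time relative to tape: {disc_time / tape_time:.2f}x")
```

Every one of those discs is also a media change, which is the part a robot has to absorb.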
speed (Score:1)
I agree: use the right tool for the job. I also agree that optical storage can work, but optical drives go obsolete too. My prediction is that in 5 years manufacturers are going to notice that CD-ROM is never used, and DVD players will cost $.50 less because they no longer put the ability to read CD-ROMs in DVD drives. Whoops, gotta move that optical data off CD-ROM now, while you can still find a reader.
You're still missing the point though: They want to use that data. I just stated the speed they can get from tape. They can't tell me which tapes they will need to read next year, but some of those tapes will be needed for some project. A supercomputer is a device for turning a CPU-bound problem into an I/O-bound problem. While many supercomputers run Unix and can multitask, the users still want the answer fast, and waiting for data to come off an optical cartridge isn't a good use of time. In today's world human time is more expensive than computer time, so it is worth the cost to make sure human time isn't wasted.
Don't forget that we're talking about several hundred terabytes of data at NASA. Even in the optical storage system everyone is envisioning (which may eventually be made, but it isn't effective today) it will take up significant space, and unless the media never changes (like CDROM->DVD media didn't change, right) they still need to migrate. I'm not a prophet; I'm not about to predict formats won't change.
robots vs by hand (Score:1)
I know for a fact that most current NASA storage is from StorageTek, and they are famous for robots, so it is likely that the current stuff is robotic. I'm also well aware that 2 years ago it was someone else, and if they aren't careful it could be someone else again as they keep upgrading capacities.
So it is a safe bet that the new tapes are in a robot. I'm comfortable saying, though unsure, that the old tapes were at least partially manual.
Actually I know the new systems are robotic, because NASA keeps their data in the same building where they handle the deadly chemicals for the Shuttle booster rockets. The data center people really hate to be in a room that shares ventilation with a room where they mix two deadly gases to make something even more deadly. I don't know why they don't move it.
Why Tape? (Score:1)
Access time is a legitimate concern, if it becomes a bottleneck. In computer science terms: a 50 GB tape takes O(r+n) time. A bunch of CD-ROMs take r^75+n time. Also note that the constant before n is bigger with a CD-ROM. Simply put, a dense tape is faster than optical, and has about the same lifetime. (CD-R is not good for 100 years; as others have noted it is guaranteed for 20 years, and tape can be that good.)
I'm not against an optical storage system. I'd seriously consider investing money in research for such systems. But magnetic media still has life, and is still in general a better solution than optical. Yes, I expect this to change in the future, but NASA is dealing with today; they will probably migrate to better media again in 5 years. SOP for many businesses as they try to reclaim the space consumed by the older storage.
If you don't get this... (Score:1)
If you don't want to, the premise of the movie was that some aliens picked up the Voyager probe, read the programming that said that it needed to return the information it collected to Earth, and sent it back towards Earth with a bunch of new machinery inside a gas cloud several solar diameters big. It destroyed everything in its path trying to get back to Earth, and although it was sending out some data, no one on Earth remembered how to activate Voyager's transmit sequence.
HD-ROM -- particle-beam, not optical (Score:1)
They are a DOE spin-off working on archival technologies. The idea is to use particle beams to do the writing instead of lasers: you can focus the beam much more tightly, hence make much smaller dots. They have two technologies -- digital holding 165GB/disk, with 20MB/s storage rate, and analog, holding 90,000 pages scanned at 300dpi. Both use _very_ durable silicon-wafer substrates.
At that density, a 6-platter changer holds a terabyte, and a dozen 500-platter jukeboxes hold a petabyte. If you want really fast access, stripe across multiple platters -- if you stripe 8-way at 20MB/s each, you get a transfer speed of about 10 gigabytes per minute, which does better than NASA's old tapes (someone said 23 months, iirc).
fwiw
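The capacity and throughput figures in this comment can be checked with a few lines of Python. This is a sketch only: it uses the 165GB/disk and 20MB/s numbers quoted above with decimal units, and the 100 TB archive size is borrowed from another comment in this thread rather than from the HD-ROM description. With those inputs the 8-way striped rate works out to roughly 10 GB per minute.

```python
# Quoted figures for the digital HD-ROM format (decimal units assumed).
DISK_GB = 165        # capacity per platter
RATE_MB_S = 20       # write rate per platter

# Capacity of a 6-platter changer, in terabytes.
changer_tb = 6 * DISK_GB / 1000

# Capacity of a dozen 500-platter jukeboxes, in petabytes.
jukebox_pb = 12 * 500 * DISK_GB / 1_000_000

# Aggregate rate when striping across 8 platters, in GB per minute.
striped_gb_per_min = 8 * RATE_MB_S * 60 / 1000

# Time to write a 100 TB archive at that rate (archive size is an
# assumption taken from elsewhere in the thread).
ARCHIVE_GB = 100_000
days_for_archive = ARCHIVE_GB / striped_gb_per_min / (60 * 24)

print(f"6-platter changer: {changer_tb:.2f} TB")
print(f"12 x 500-platter jukeboxes: {jukebox_pb:.2f} PB")
print(f"8-way stripe: {striped_gb_per_min:.1f} GB/min")
print(f"100 TB at that rate: {days_for_archive:.1f} days")
```

That is days rather than months for the writing side; it says nothing about how fast the old tapes can be read.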
There is NO lasting format/medium solution (Score:1)
Sure, but how about a DC600A tape drive? For that matter, now that you've got the C64 tape, what will you read it with? Sure, if the data is desperately important, you could do some sort of hack involving a soundcard and a tape player, but that's for data measured in K.
5.25 IBM formatted still isn't too hard, but how about 8 inch from a PDP-11?
The point is, no matter how long lived a storage method looks now, in 10 to 20 years, it'll be a big pain.
Low tech vs. High (Score:1)
Although painful to think about (given the volume of data), perhaps constantly migrating to the latest and greatest isn't the best answer. For example, I have some old WORM media, and some old punch cards. Guess which one I can still read (if I were really desperate to preserve COBOL code)?
I also have some ancient 78 rpm records that I can still play, and some 10 year old audio CDs that I can't. It seems that there's been a wee little bit of spec drift in CD players so that not all new players work well with some old CDs. I say that because I have an older CD player that has no problems with the same old CDs. Weird but true.
Two possible solutions (Score:1)
They MUST find a solution to the capacity problem first. Optical doesn't cut it.
Steve Jobs (Score:1)
All those Macintoshes at NASA, now they have to copy their data from floppies to USB-connected ZIP drives. Just because Steve Jobs says floppies are obsolete!
Thousands of terabytes? Think how many ZIP disks that is!
Giga-click-of-death!
Stones don't scroll (Score:1)
...and they're difficult to grep....
dylan_-
--
500 terabyte RAID array? (Score:1)
Wow!
dylan_-
--
Why tape? (Score:1)
6 years of life expectancy LEFT (Score:1)
Not that it makes it any better. Someone tell NASA about DVD and other digital storage methods.
asinus sum et eo superbio
Better hurry (Score:1)
! ! ! ! ! ! ! ! ! ! !
Wired Magazine Article (Score:1)
This is exactly the subject of a really fantastic article at Wired's magazine archives. Thought I'd contribute the URL [wired.com].
Enjoy,
-p
--
Modern Tape Technology... (Score:2)
Shortage of Money? (Score:1)
Alternatively, are they limited by the speed of the
old tape drives, rather than CPU throughput?
CDs? (Score:1)
Come to think of it, is there any high-capacity digital medium that could be reasonably expected to securely hold its data for centuries (as printed text on paper lasts)?
Parcel the tapes out to qualified parties (Score:1)
If all else fails, draw up a full-color glossy ad and sell 'em off to the SF nuts in the back of STARLOG, complete with velvet-lined display case and commemorative plaque. Oh, and when the display case is opened, you get a cheap, tinny-sounding rendition of the first few bars of "2001".
Kewl!
What about stones? (Score:1)
size of tape (Score:1)
speed (Score:1)
It may not be perfect, but it could tide them over for the next 10 years at least, until we start using bio-chemical storage devices.
Too bad 'CD Quality' audio isn't the end-all... (Score:1)
20+ (Score:1)
What's AXAF? :-) See the satellite formerly known as AXAF [harvard.edu].
Optical? (Score:2)
I wonder about some of these time estimates though. Are they talking about the total time to copy all the tapes one at a time? Seems like they could just add more tape drives. The one bottleneck might be the fixed number of readers, since the formats are so old they can't buy new drives to read them...
(gack! The media loves to latch on to "disaster" stories. :-/)
Solution Sought for Brain Imaging Data Storage (Score:2)
Micah
malpern@princeton.edu
speed (Score:1)
A system could be set up to give them real-time access to all their information and provide storage for less than 10 million, and that's total cost up to 2005, while they were stating storage costs of 50 million a year.
Something is obviously wrong here. Am I missing something?
why not hard drives? (Score:1)
What about optical (Score:2)
Maybe they would have to use Laser Disc-sized DVD technology? Any other thoughts?
Optical is definitely an option, but... (Score:1)
Cost is a problem right now.
In case everyone forgets, we're not spending hordes of money on NASA and related departments anymore. In fact, they generally have either static or slightly shrinking budgets. So, naturally, they've gone to strictly commercial stuff whenever possible. No custom-built stuff here.
The biggest problem isn't the throughput of modern tech (I do suspect that DVD-RAM/DVD-RW will be the format of choice), but the rate at which they can read data off the old systems. As other people have pointed out, a huge chunk of the data is on VCR-style tapes (or 9-track reels) - the readers are old, hard to find, and I suspect can't do more than a couple hundred kB per minute.
OK, math quiz: 100TBytes / 1MB per minute =~ 100,000,000 minutes =~ 190.25 YEARS. Say you have maybe 100 such readers at your site. It still takes almost 23 months of completely continuous reading to read it all off. No wonder they have a problem...
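The math quiz above is easy to rerun with other guesses. The following Python sketch uses the comment's own figures (100 TB, 1 MB per minute per reader) with decimal units and a 365-day year; `years_to_read` is just an illustrative helper name, and the reader counts are whatever you care to plug in.

```python
# Figures from the comment above (decimal units, 365-day year assumed).
ARCHIVE_MB = 100 * 1_000_000       # 100 TB expressed in megabytes
RATE_MB_PER_MIN = 1                # per-reader read rate
MINUTES_PER_YEAR = 60 * 24 * 365

def years_to_read(readers: int) -> float:
    """Years of continuous reading with `readers` drives in parallel."""
    minutes = ARCHIVE_MB / (RATE_MB_PER_MIN * readers)
    return minutes / MINUTES_PER_YEAR

one_reader_years = years_to_read(1)
hundred_reader_months = years_to_read(100) * 12

print(f"1 reader: {one_reader_years:.2f} years")
print(f"100 readers: {hundred_reader_months:.1f} months")
```

The result scales linearly with both the read rate and the number of working readers, which is why the supply of old drives matters more than the speed of the new media.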
Oh, and the stab at all the old farts at NASA was unjustified. Most of the people I know at NASA (@ Ames, Goddard, etc.) are real engineers. Many are getting long in the tooth, but I can safely say most of them are extremely competent, and I'm completely sure this isn't their fault. Probably just the typical upper-level funding problems (ie - complain to the top dogs @ NASA, and, more likely, to the dolts in Congress who don't have the vision to properly fund them).
Why Tape? (Score:2)
I read a similar story several years ago. NASA collects an enormous amount of data from the various probes that are wandering about the solar system. At that time, I'm not sure that CD-ROM was proven for data yet and they were placing everything onto magtape.
Now that CD-ROM is pretty well established, I can't see why it wouldn't be suitable for copying those old tapes onto. OK, OK, DVD will hold more but even CD-ROM will hold tons more than an old 9-track tape. A simple calculation (feel free to correct me if I messed up here) shows
(2400 * 12 * 6250) / 8 = up to about 21 MB
I'm guessing that a 9-track tape takes up about the same amount of shelf space as about 6-7 CD-ROMs. Let's see, that's 21 MB vs. 3600-4200 MB. Looks to me like they gain back some floor/shelf space as well as longer life for the data.
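Since the comment invites corrections, here is the same calculation as a small Python sketch. The reel length and density are the comment's figures. The 600 MB per CD-ROM is inferred from its 3600-4200 MB range, and the alternative reading of 6250 as bytes per inch (one byte per frame across the nine tracks, before inter-record gaps) is an assumption that the comment does not make.

```python
# Figures from the comment: a 2400 ft reel recorded at 6250 per inch.
REEL_FEET = 2400
DENSITY_PER_INCH = 6250
CD_MB = 600                      # inferred from the 3600-4200 MB range

frames = REEL_FEET * 12 * DENSITY_PER_INCH

# The comment's reading: 6250 is bits per inch, so divide by 8 for bytes.
tape_bytes_as_bits = frames / 8
tape_mib_as_bits = tape_bytes_as_bits / 2**20

# Alternative reading (assumption): each frame already holds one byte.
tape_mib_as_bytes = frames / 2**20

# Shelf-space comparison: 6 to 7 CD-ROMs in the space of one reel.
cd_shelf_mb = (6 * CD_MB, 7 * CD_MB)

print(f"reel, comment's reading: {tape_mib_as_bits:.1f} MiB")
print(f"reel, byte-per-frame reading: {tape_mib_as_bytes:.1f} MiB")
print(f"same shelf space in CD-ROMs: {cd_shelf_mb[0]}-{cd_shelf_mb[1]} MB")
```

Either way the CD-ROMs come out well ahead on density, so the shelf-space argument holds under both readings.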
The concern about access time can't be that legitimate. Robotic tape handlers aren't any faster than CD-ROM handlers/jukeboxes.
I hope NASA acts on this before those old tapes become totally unreadable. Loss of this data, IMHO, would be a catastrophe.
Why? (Score:1)
Too much data (Score:1)
How about compressing the data? Not just lzh or something, but things like peaks/troughs and other statistically significant items? Once the raw data has been around for a while, say a year, they can reduce it to what's significant. If later they change their mind, realize they need the original raw data, too bad! They'll just have to revise their algorithms for the future. No big loss.
What about stones? (Score:1)
On mass storage migration... (Score:1)
What about stones? (Score:1)
So why not just carve everything in stone? Yabadabadoo!