Spirit 'Will Be Perfect Again' 331
G. Holst writes "NASA technicians are preparing to wipe Spirit's flash memory clean of science and engineering files that have stymied its software. The fix, likely to be made Friday, could completely restore Spirit. 'I think it will be perfect again,' says the mission manager. Chalk this one up for Earth!" There are numerous stories about Spirit and Mars: one describes being careful with rm -rf. Reader Tablizer sends in an interesting site: "I discovered Bill Momsen's website where he describes his experiences working on the first successful photographic mission to another planet: Mariner IV to Mars."
Re:My question (Score:5, Informative)
Re:Science and Engineering files? (Score:2, Informative)
Re:Any theories on what caused the corruption? (Score:2, Informative)
Re:Any theories on what caused the corruption? (Score:5, Informative)
Re:Repeat? (Score:5, Informative)
I imagine that someone is keeping an eye on it.
Re:My question (Score:5, Informative)
-B
Re:Backup ROM? (Score:4, Informative)
Re:Any theories on what caused the corruption? (Score:3, Informative)
Although that is the simple version that most of the press has been relaying, if you've watched the press conferences, the engineers have been careful to say that they have not been able to fully reproduce the exact same errors on their test rovers here on Earth. The exact cause of the problem really hasn't been determined. And yes, they did stress-test the file system before they sent the rovers up, and they never saw the type of problem they're having now.
Re:My question (Score:5, Informative)
Re:rm -rf?! (Score:3, Informative)
rm -- -rf
(The "--" ends option parsing, so a file literally named "-rf" is treated as an operand rather than as flags.)
Or just use your favorite GUI file manager.
VxWorks memory, embedded protection (Score:5, Informative)
Released versions of VxWorks do not have protected memory. (The development version does.) So nothing is there to prevent overwrites by concurrent tasks, etc.
Those of you in the audience experienced in embedded systems know that this makes sense for embedded hardware -- VxWorks or not -- for three main reasons:
Stuff running in such environments is damn near bug-free. It's not like, say, Mozilla, or even the Linux kernel, or even /bin/ls. These things get tested rigorously, not as an afterthought delegated to the junior programmer.
In systems which are allowed to fail once in a while, reboots are fast. There's no hard drive to spin up, no filesystem to fsck, etc. It can just go *click* and humans won't typically see an interruption in [whatever it was the doohickey was doing].
There's usually no point in memory protection. If the propulsion system walks off the end of a garbage pointer, mission's over. No real use in keeping the guidance system going; it's already on a ballistic uncontrollable arc. If some critical part of the super-smart pacemaker fails (see #1), there's no victory in digging the device out of the corpse and saying, see, this other critical part wasn't affected, thanks to the memory protection! In those cases, memory protection just increases the cost and size of a device, without helping anything.
Protected memory is good for systems which do more than one thing, and/or have parts which can die without killing the whole device (e.g., a desktop computer). And as I said above, some embedded OSes have added such protection for customers who want to adapt their technology to more general-purpose tasks.
Re:Pretty much OT but an interesting question (Score:3, Informative)
The U.S. gov't owns them. But, they're probably considered "Abandoned in place" or something.
Related to this topic, I read somewhere that NASA has officially stated that the lunar rover vehicle left on the Moon is available for anyone who wants it. At a development cost of over $2 million, it's one of the most expensive cars ever developed. I call shotgun!
Re:Pretty much OT but an interesting question (Score:3, Informative)
I don't know what the general answer to this question is, but I do know that ownership of the Viking 1 lander was transferred from NASA to the Smithsonian [si.edu]. This implies that NASA believed itself to still be the owner of these landers; presumably they consider them to be just awaiting collection, not abandoned.
Re:My question (Score:3, Informative)
None of the MER simulators ever ran for more than a few days at a time. The (highly reasonable) assumption was that a computer that could routinely run for a few days was stable for much longer periods. Since the test machine was rebooted regularly and set up for specific tests, there was never enough time for 'garbage' to accumulate to the point where it became a problem. This could be solved by running longer tests, but when you only have a few years between the start of a lander-mission program and its launch, it's very difficult to arrange for months-long tests. You could extend the development phase, but that increases expenses significantly. (And unless you are *very* careful and lucky, you end up with some hardware sitting around for extended periods of time before integration, which is not without significant risks of its own.)
This is why NASA performs what many call 'Wile E. Coyote' engineering. If it works once, keep doing it. If it fails once, never do it again.
The enormous cleanroom requirements for all phases of spacecraft assembly come from some minor but recurring failures due to minor contamination *all the way back in the Ranger program*. Airbags were used with MER because they had worked on the (very similar) Pathfinder mission, while the rockets of MPL (Mars Polar Lander) had been a failure. (Even though the cause of that failure was clearly and completely understood.)
NASA engineering, spacecraft, policies, and procedures as a result are a very weird mix of cutting edge and "we've always done it this way and never had a problem". Poorly understood cutting edge (pure O2 atmospheres in spacecraft) has killed, but then so has "always done this way" (O-rings, foam shedding).
Re:VxWorks memory, embedded protection (Score:3, Informative)
I'd want to see real, hard numbers.
Check out any book on OS design; that's the typical amortized cost. The processor's memory manager often holds only a limited number of memory-region descriptors, e.g., 8 or 16, which act as a cache for the currently active regions. Whenever the processor accesses memory outside those regions, an exception/interrupt is raised and the cache must be updated (or an exception delivered to the offending task). Handling that takes a while, and the amortized cost typically works out to around that figure.
Clearly there are applications where the N% lost to protection is too much
In my experience, this is exceptionally rare. That final 3% can nearly always be clawed back with subtle changes to the software. If it really can't, then the processor was too slow for the task at hand in the first place; normal feature creep has a much bigger than 3% effect on system performance.
Re:Any theories on what caused the corruption? (Score:3, Informative)
Not only did it take ~7 months (I think) to get to Mars, meaning it was already 7 months "obsolete" when it landed, it was also probably completed a long time before it was launched. They have to test it extensively, covering every scenario they can come up with, and the hardware has to be qualified for the extremes of space travel: high G forces, extreme vibration, extreme temperature swings, radiation, and probably thousands of other things I don't know about.
Put simply, YOUR PC THAT SITS ON YOUR DESKTOP WON'T SURVIVE A ROCKET LAUNCH INTO SPACE. And if it did survive the actual launch and made it into space, it would fail very quickly.