Spirit 'Will Be Perfect Again' 331
G. Holst writes "NASA technicians are preparing to wipe Spirit's flash memory clean of science and engineering files that have stymied its software. The fix, likely to be made Friday, could completely restore Spirit. 'I think it will be perfect again,' says the mission manager. Chalk this one up for Earth!" There are numerous stories about Spirit and Mars: one describes being careful with rm -rf. Reader Tablizer sends in an interesting site: "I discovered Bill Momsen's website where he describes his experiences working on the first successful photographic mission to another planet: Mariner IV to Mars."
Re:My question (Score:5, Informative)
Re:Science and Engineering files? (Score:2, Informative)
Re:Any theories on what caused the corruption? (Score:2, Informative)
Re:Any theories on what caused the corruption? (Score:5, Informative)
Re:Repeat? (Score:5, Informative)
I imagine that someone is keeping an eye on it.
Re:My question (Score:5, Informative)
-B
Re:Backup ROM? (Score:4, Informative)
Re:Any theories on what caused the corruption? (Score:3, Informative)
Although that is the simple version that most of the press has been relaying, if you've watched the press conferences, the engineers have been careful to say that they have not been able to fully reproduce the exact same errors on their test rovers here on Earth. The exact cause of the problem really hasn't been determined. And yes, they did stress-test the file system before they sent the rovers up, and they never saw the type of problem they're having now.
Re:My question (Score:5, Informative)
Re:rm -rf?! (Score:3, Informative)
rm -- -rf
(The "--" ends option parsing, so a file literally named "-rf" is treated as an operand rather than as flags.)
Or just use your favorite GUI file manager.
VxWorks memory, embedded protection (Score:5, Informative)
Released versions of VxWorks do not have protected memory. (The development version does.) So nothing is there to prevent overwrites by concurrent tasks, etc.
Those of you in the audience experienced in embedded systems know that this makes sense for embedded hardware -- VxWorks or not -- for three main reasons:
Stuff running in such environments is damn near bug-free. It's not like, say, Mozilla, or even the Linux kernel, or even /bin/ls. These things get tested rigorously, not as an afterthought delegated to the junior programmer.
In systems which are allowed to fail once in a while, reboots are fast. There's no hard drive to spin up, no filesystem to fsck, etc. It can just go *click* and humans won't typically see an interruption in [whatever it was the doohickey was doing].
There's usually no point in memory protection. If the propulsion system walks off the end of a garbage pointer, mission's over. No real use in keeping the guidance system going; it's already on a ballistic uncontrollable arc. If some critical part of the super-smart pacemaker fails (see #1), there's no victory in digging the device out of the corpse and saying, see, this other critical part wasn't affected, thanks to the memory protection! In those cases, memory protection just increases the cost and size of a device, without helping anything.
Protected memory is good for systems which do more than one thing, and/or have parts which can die without killing the whole device (e.g., a desktop computer). And as I said above, some embedded OSes have added such protection for customers who want to adapt their technology to more general-purpose tasks.
Re:Pretty much OT but an interesting question (Score:3, Informative)
The U.S. gov't owns them. But, they're probably considered "Abandoned in place" or something.
Related to this topic, I read somewhere that NASA has officially stated that the lunar rover vehicle left on the Moon is available for anyone who wants it. At a development cost of over $2 million, it's one of the most expensive cars ever developed. I call shotgun!
Re:Pretty much OT but an interesting question (Score:3, Informative)
I don't know what the general answer to this question is, but I do know that ownership of the Viking 1 lander was transferred from NASA to the Smithsonian [si.edu]. This implies that NASA believed itself to still be the owner of these landers; presumably they consider them to be just awaiting collection, not abandoned.
Re:My question (Score:3, Informative)
None of the MER simulators ever ran for more than a few days at a time. The (highly reasonable) assumption was that a computer that could routinely run for a few days was stable for much longer periods. Since the test machine was rebooted regularly and set up for specific tests, there was never enough time for 'garbage' to accumulate to the point where it became a problem. This could be solved by running longer tests, but when you only have a few years between the start of a lander-mission program and its launch, it's very difficult to arrange for months-long tests. You could extend the development phase, but that increases expenses significantly. (And unless you are *very* careful and lucky, you end up with some hardware sitting around for extended periods of time before integration, which is not without significant risks of its own.)
This is why NASA performs what many call 'Wile E. Coyote' engineering. If it works once, keep doing it. If it fails once, never do it again.
The enormous cleanroom requirements for all phases of spacecraft assembly come from some minor but recurring failures due to minor contamination *all the way back in the Ranger program*. Airbags were used with MER because they had worked on the (very similar) Pathfinder mission, while the rockets of MPL (Mars Polar Lander) had been a failure. (Even though the cause of that failure was clearly and completely understood.)
NASA engineering, spacecraft, policies, and procedures as a result are a very weird mix of cutting edge and "we've always done it this way and never had a problem". Poorly understood cutting edge (pure O2 atmospheres in spacecraft) has killed, but then so has "always done this way" (O-rings, foam shedding).
Re:VxWorks memory, embedded protection (Score:3, Informative)
I'd want to see real, hard numbers.
Check out any book on OS design; that's the typical amortized cost. The processor's memory manager often holds only a limited number of memory-region descriptors, e.g., 8 or 16, which act as a cache for the currently active regions. Whenever the processor accesses memory outside those regions, an exception/interrupt is raised and the cache must be updated (or an exception delivered to the offending task). Handling that takes a while, and the amortized cost typically works out to around that figure.
Clearly there are applications where the N% lost to protection is too much
In my experience, this is exceptionally rare. That final 3% can nearly always be clawed back with subtle changes to the software. If it really can't, then the processor was too slow for the task at hand in the first place; normal feature creep has a much bigger than 3% effect on system performance.
Re:Any theories on what caused the corruption? (Score:3, Informative)
Not only did it take ~7 months (I think) to get to Mars, meaning it was already 7 months "obsolete" when it landed, it was also probably completed a long time before it was launched. They have to test it extensively, covering every scenario they can come up with, and the hardware has to be qualified for the extremes of space travel: high G forces, extreme vibration, extreme temperature swings, radiation, and probably thousands of other things I don't know about.
Put simply, YOUR PC THAT SITS ON YOUR DESKTOP WON'T SURVIVE A ROCKET LAUNCH INTO SPACE. And if it did survive the actual launch and made it into space, it would fail very quickly.