Space Science

Spirit 'Will Be Perfect Again'

G. Holst writes "NASA technicians are preparing to wipe Spirit's flash memory clean of science and engineering files that have stymied its software. The fix, likely to be made Friday, could completely restore Spirit. "I think it will be perfect again," says the Mission Manager. Chalk this one up for earth!" There are numerous stories about Spirit and Mars: one describes being careful with rm -rf. Reader Tablizer sends in an interesting site: "I discovered Bill Momsen's website where he describes his experiences working on the first successful photographic mission to another planet: Mariner IV to Mars."
This discussion has been archived. No new comments can be posted.

  • Repeat? (Score:5, Insightful)

    by sabrex15 ( 746201 ) on Friday January 30, 2004 @02:48PM (#8137287)
    One has to wonder: is Opportunity going to run into the same problems as Spirit? They are "identical" robots, after all. Have steps been put in place to prevent the second robot from "getting full"? I should certainly hope so; we don't want this to happen again, as they might not be as lucky in recovering it.
  • Apparently it was simply too many files and the FS ran out of inodes. Remember that they're constrained to a 256MB file system. It wouldn't surprise me if they used an 8 bit or 16 bit number for the inode count. (Ah, the joys of Vx(Doesn't)Works.)

    On another note, does anyone know exactly what they're deleting here? While I understand that they need to get this mission underway, is there a chance they could lose valuable mission or navigational information?
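The guess above about a narrow inode counter can be illustrated with a toy model. This is only a sketch under stated assumptions: `TinyFlashFS`, its 65,536-entry limit, and the 1 KB file size are hypothetical, and nothing here reflects the rover's actual file system or VxWorks internals. It shows how a store can refuse new files while most of its bytes are still free.

```python
class TinyFlashFS:
    """Toy file system with a byte budget and a separate file-count budget."""

    def __init__(self, capacity_bytes, max_files):
        self.capacity_bytes = capacity_bytes
        self.max_files = max_files  # hypothetical limit on file entries
        self.files = {}             # name -> size in bytes
        self.used = 0               # running total of bytes stored

    def create(self, name, size):
        old = self.files.get(name)
        # A new name needs a free file entry, regardless of free bytes.
        if old is None and len(self.files) >= self.max_files:
            raise OSError("out of file entries")
        new_used = self.used - (old or 0) + size
        if new_used > self.capacity_bytes:
            raise OSError("out of space")
        self.files[name] = size
        self.used = new_used


def fill_with_small_files(fs, size):
    """Create files of `size` bytes until the file system refuses.

    Returns (number of files created, reason for the refusal).
    """
    count = 0
    while True:
        try:
            fs.create(f"f{count}", size)
        except OSError as exc:
            return count, str(exc)
        count += 1


if __name__ == "__main__":
    fs = TinyFlashFS(capacity_bytes=256 * 1024 * 1024, max_files=2 ** 16)
    created, reason = fill_with_small_files(fs, 1024)
    print(created, reason, fs.used / fs.capacity_bytes)
```

In this model the entry table fills up when only a quarter of the byte capacity is in use, which is the kind of failure that a "how full is the disk?" check would never catch.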
  • Re:My question (Score:5, Insightful)

    by PineGreen ( 446635 ) on Friday January 30, 2004 @03:00PM (#8137434) Homepage
    Yes, actually it seems to be a filesystem bug... I mean, a reasonably stable filesystem - every OS has this, I am really surprised they messed this up! I wouldn't mind if it was an obscure kernel race condition or something, but filesystem!!!
  • by sabrex15 ( 746201 ) on Friday January 30, 2004 @03:04PM (#8137469)
    It kind of makes me wonder: hadn't they done extensive testing of the rovers before they sent them off? I thought they did full walkthrough-type tests on the rovers here on Earth, i.e. taking pictures, navigating... but what? Now the darn thing runs out of memory? I bet someone is smacking themselves for not checking that out properly beforehand.
  • Backup ROM? (Score:2, Insightful)

    by xTown ( 94562 ) on Friday January 30, 2004 @03:06PM (#8137491)
    I'm neither a rocket scientist nor a computer scientist, so maybe this is a dumb question, but how come there's not some sort of ROM somewhere in the rover itself that contains a backup of the system in its initial state? Obviously, you'd only use it in a worst-case scenario, but you could restore it and then there'd at least be something and they could reapply all the patches one by one.
  • Re:My question (Score:5, Insightful)

    by Mr2cents ( 323101 ) on Friday January 30, 2004 @03:07PM (#8137498)
    Even if the memory handling is shitty, I wonder how it could have caused so much havoc. How could it have caused Spirit to go into the reset loop? It seems like some bad error handling code was also in play here (just guessing; the details aren't public to my knowledge).
    Another thing that surprised me is that if the flash had been broken, all data would have had to be uploaded before the rover went to sleep. Every modern PC can continue to refresh its DRAM while sleeping. Why can't Spirit? Maybe a feature to consider on future missions?
  • by Jarnis ( 266190 ) on Friday January 30, 2004 @03:09PM (#8137525)
    The same reason why your hard drive is cluttered with old unused files.

    Why delete, when you still have room on the flash and you *just* might need that file later...

    Of course they then found out that their filesystem handler borks out way before the flash is actually filled up, and that almost brought the whole show to an end... Software QA testing failure in my book, but they seem to be recovering from the fumble pretty well...
  • Re:My question (Score:5, Insightful)

    by techiemac ( 118313 ) <techiemac AT yahoo DOT com> on Friday January 30, 2004 @03:24PM (#8137692)
    Ok ok ok... chill out everyone...
    VxWorks is not that bad (I use it on almost a daily basis). Every single OS has its problems. Before we all go and start calling VxWorks or Spirit's software a crappy piece of code, you have to understand what goes into writing space-qualified software.
    This is not something you hack together over the weekend. In fact, something you wrote for a space system over the weekend would be tested over a period of months and possibly even years, depending on the criticality of the code. We're talking life-critical system testing here. That means all paths, for you code heads out there.
    That said, even when the rubber hits the road, there are always unexpected situations. Something that you didn't anticipate, a bug that made its way through under circumstance x. Hands up, everyone here who has written a complex bug-free system right out of the gate. Anyone who just lifted their hand does not understand what a complex system is, or what a bug is. Though stuff that flies tends to be pretty darn close to bug-free.
    We are dealing with many complex unknowns when we land something on another planet.
    VxWorks is actually very popular with the space program. It's not perfect, but neither is Linux (though someday it will be, right ;) ). In fact, the whole system that they are using on the rover has flown quite a few times (VxWorks running on rad-hardened PowerPCs with a VME bus for its backbone).
    Trust me, the software running on the rover is not crappy. In fact, the fact they can bring it back to life like they did says a lot.
  • Re:My question (Score:3, Insightful)

    by techiemac ( 118313 ) <techiemac AT yahoo DOT com> on Friday January 30, 2004 @03:30PM (#8137747)
    To answer your question, there was probably a watchdog timer that caused it to go into a reset loop.
    Yes, modern PCs have all of these whiz-bang features, but let me ask you this: would you want to be on an airplane whose fly-by-wire system was controlled by your PC? No, probably not.
    Systems that fly and are life critical (yes, there is no one on it, but space systems are held to that standard) cannot have a bunch of whiz-bang features on board. The more you add, the more potential for failures. So you try to mitigate your risks as much as possible. You can't go out there to simply tweak the chip that failed because it got zapped by radiation as it was heading over to Mars.
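The watchdog explanation above can be sketched as a small simulation. This is a guess at the general mechanism, not the rover's flight software: `boot`, `run_with_watchdog`, and the file-count limit are hypothetical names and numbers. The point is that a reset does not clear persistent flash, so a start-up step that fails because of what is stored there fails identically after every reset.

```python
def boot(stored_files, max_files):
    """Return True if start-up completes, False if it hangs.

    Assumption: start-up hangs whenever flash holds more files than the
    software can handle. A hang is what lets the watchdog timer expire.
    """
    return stored_files <= max_files


def run_with_watchdog(stored_files, max_files, max_resets=5):
    """Boot repeatedly, resetting whenever the watchdog expires.

    `stored_files` never changes between attempts, because a reset does
    not erase flash. Returns (final state, number of resets performed).
    """
    resets = 0
    while not boot(stored_files, max_files):
        if resets == max_resets:
            return ("reset loop", resets)
        resets += 1  # watchdog expired: reset and try the same boot again
    return ("running", resets)


if __name__ == "__main__":
    print(run_with_watchdog(stored_files=70_000, max_files=65_536))
    print(run_with_watchdog(stored_files=100, max_files=65_536))
```

Breaking such a loop requires changing the persistent state or the boot path, which fits the plan described in the story of wiping the flash.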
  • Re:My question (Score:5, Insightful)

    by crawling_chaos ( 23007 ) on Friday January 30, 2004 @03:42PM (#8137884) Homepage
    Bottom line, it WAS a bug that could only surface with thousands of files in flash, which is something they didn't try on the ground.

    Which is a reminder to always test the boundary conditions, no matter how ridiculous they may seem. If it is possible to have that many files, then the regression test scripts should generate that many files during testing.

    At least it's fixable.
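A boundary test along those lines might look like the following minimal sketch. It runs against an ordinary temporary directory on the host, so it only illustrates the shape of the test; the function names and the file count are assumptions, and a real regression script would target the flight file system image instead.

```python
import os
import tempfile


def create_many_files(directory, count):
    """Create `count` one-byte files and return how many the directory lists."""
    for i in range(count):
        path = os.path.join(directory, f"file_{i:06d}.dat")
        with open(path, "wb") as fh:
            fh.write(b"x")
    return len(os.listdir(directory))


def boundary_test(count):
    """Return True if the file system can hold and list `count` files."""
    with tempfile.TemporaryDirectory() as tmp:
        return create_many_files(tmp, count) == count


if __name__ == "__main__":
    # Pick the largest number of files the system could plausibly
    # accumulate, however ridiculous that number seems.
    print(boundary_test(3000))
```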

  • by PhuCknuT ( 1703 ) on Friday January 30, 2004 @03:55PM (#8137988) Homepage
    Well, it's not as risky as it sounds. The memory they are wiping is secondary storage and doesn't contain any of the OS. It would be like formatting a floppy or erasing a CD-RW: they will lose the data that has been saved there since the mission started, but they aren't risking any of the OS itself.
  • Re:Mars Rover (Score:2, Insightful)

    by LedZeplin ( 41206 ) on Friday January 30, 2004 @04:15PM (#8138152)
    I like your point, but with over half the craft sent to Mars lost, I wouldn't call that part routine just yet.
  • by srleffler ( 721400 ) on Friday January 30, 2004 @04:30PM (#8138294)
    You have to remember that the computer wasn't built this year. It was probably assembled several years ago and has been undergoing testing since.

    It also probably waited a while to be launched, and it took seven months just to get there.

  • The memory is not faulty. It is a bug in the filesystem software. The memory isn't full, but there are more files than the rover can handle. They were basically letting everything pile up, so the rover had eighteen days' worth of files (and pre-landing files on top of that). With the other rover they are deleting the files after they are received on Earth.

    Tim
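The delete-after-receipt policy described above can be sketched in a few lines. The function name and the idea of a set of acknowledged file names are assumptions made for illustration; the source does not describe how receipt is actually confirmed.

```python
def prune_acknowledged(onboard, acknowledged):
    """Delete onboard files that the ground has confirmed receiving.

    `onboard` maps file name -> size in bytes and is modified in place.
    `acknowledged` is the set of names confirmed as received on Earth.
    Returns the sorted names of the files still held onboard.
    """
    for name in list(onboard):  # copy the keys so deletion is safe
        if name in acknowledged:
            del onboard[name]
    return sorted(onboard)


if __name__ == "__main__":
    files = {"sol01.img": 4096, "sol02.img": 4096, "sol03.img": 4096}
    print(prune_acknowledged(files, {"sol01.img", "sol03.img"}))
```

Only files with a confirmed copy on the ground are removed, so the file count stays bounded without discarding anything that has not been received.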
  • by WolfWithoutAClause ( 162946 ) on Friday January 30, 2004 @06:13PM (#8139270) Homepage
    There's usually no point in memory protection. If the propulsion system walks off the end of a garbage pointer, mission's over. No real use in keeping the guidance system going; it's already on a ballistic uncontrollable arc. If some critical part of the super-smart pacemaker fails (see #1), there's no victory in digging the device out of the corpse and saying, see, this other critical part wasn't affected, thanks to the memory protection! In those cases, memory protection just increases the cost and size of a device, without helping anything.

    Garbage. A well-designed system would reboot in the middle of thrusting without affecting the system at all, except that anything that was supposed to happen during the reboot would have to wait until afterwards.

    And if a pacemaker failed to kick the heart once, the patient is dead? No. The pacemaker is there to keep the heart running at a particular (often faster) rate, not to keep the patient alive second by second.

    I mean, very few computer systems are real-time critical 100% of the time.

    And having memory protection only costs you maybe 3% of run speed. On the plus side, it allows you to find bugs, including really nasty ones such as 'memory tramplers', that can corrupt the whole system. You never quite know what that corruption would do; it could do anything at all. Anything.

  • by nathanh ( 1214 ) on Friday January 30, 2004 @06:43PM (#8139536) Homepage
    Stuff running in such environments is damn near bug-free. It's not like, say, Mozilla, or even the Linux kernel, or even /bin/ls. These things get tested rigorously, not as an afterthought delegated to the junior programmer.

    That's false reasoning.

    1. No practical software is bug-free.

    2. Testing is never complete.

    3. People make mistakes, even during testing.

    4. Spirit broke down.

    It makes sense, when building a robust system, to do rigorous testing AND have the memory protection.

    VxWorks obviously has a brilliant team of brainwashers^Wsalesmen because they've convinced you that you don't need a feature they don't offer. Perfect!

  • by devphil ( 51341 ) on Friday January 30, 2004 @07:31PM (#8139946) Homepage
    It makes sense, when building a robust system, to do rigorous testing AND have the memory protection.

    Absolutely. While building it.

    VxWorks obviously has a brilliant team of brainwashers^Wsalesmen because they've convinced you that you don't need a feature they don't offer. Perfect!

    I forgot, this is slashdot, where VxWorks is the eternal enemy, and second-guessing actual rocket scientists is the national sport.

    IIRC, memory protection was removed from the early versions by popular request, because the cost was too high. Clearly not everyone out there agrees with the opinions stated by sibling posts to yours.

    Me, personally, I don't give a rat's ass one way or the other. (I don't use VxWorks, and haven't had a single segfault in any of my code since I stopped using C.) I just dislike seeing the groupthink mentality defended so vigorously, thus my initial post.

  • by angst_ridden_hipster ( 23104 ) on Friday January 30, 2004 @09:03PM (#8140793) Homepage Journal
    From the Christian Bible, Matthew 26:41.

    "Watch and pray, that ye enter not into temptation: the spirit indeed is willing, but the flesh is weak."

  • by grozzie2 ( 698656 ) on Friday January 30, 2004 @10:39PM (#8141368)
    We'd be winning if Earth would quit sending in the second string players (ie. Russia). ;-)

    The reality of the situation is that the first string (Russia) is all tied up doing manned missions, so they have delegated the robot probes to the second string (USA). This is mostly due to one little detail: the second string has no operational man-rated vehicles to work with....

    Not quite sure how China plays in yet, but they also have a manned program these days, so second place is actually up for grabs, and the robot probes may soon have to take the third bench....

  • Skirt-cam! (Score:1, Insightful)

    by Anonymous Coward on Saturday January 31, 2004 @04:53PM (#8145848)
    From the 1965 Mariner 4 website: We also had access to TV cameras located throughout the building, some with tilt and pan controls. A favorite pastime was to train the camera on the front door, and when a particularly attractive female entered, to follow her on her walk through the building.

    First skirt-cam? Geeks never change.
