
Software Error Likely Killed MGS Spacecraft

Posted by kdawson
from the off-by-one dept.
Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."

  • by Ancient_Hacker (751168) on Thursday January 11, 2007 @10:45AM (#17556956)
    Just one more example of how Computer Science isn't quite up to the reliability requirements of Space:
    • A missing comma in a Do-loop statement caused the rocket for the first mission to Mars to go off course and blow up.
    • The space-shuttle programs had a race condition that caused the first launch to be scrubbed.
    • The space-shuttle re-entry program had one important variable off by a factor of -4, causing the first re-entry to be a bit wobbly.
    • An Ariane guidance program had multiple basic design glitches that caused the first launch to blow up.
    • The F-16 autopilot worked very well, until the plane was deployed to Australia, where on its way there it bounced off the equator.
    • The LEM landing program didn't protect itself from spurious radar data, causing the computer to get behind.

    Aero and space are very unforgiving of human coding errors.

  • by the_humeister (922869) on Thursday January 11, 2007 @10:54AM (#17557088)
    Because it'd be even less user friendly than Linux. Plus they'd also require people to run 80386 processors with 4 MB memory, if that.
  • by Fishbulb (32296) on Thursday January 11, 2007 @11:50AM (#17557984)
    The F-16 didn't "bounce off the equator". Before it ever flew, in simulation the computer flipped the plane over when it crossed the equator due to a bug that incorrectly handled southern latitudes. Additionally, since the computer "flip" happened instantaneously, and the f-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot (and the F-16 would have happily continued on its way).

    http://portal.acm.org/ft_gateway.cfm?id=163293&type=pdf&coll=GUIDE&dl=GUIDE&CFID=11154656&CFTOKEN=19136062 [acm.org]
  • Nope. (Score:2, Informative)

    by Anonymous Coward on Thursday January 11, 2007 @02:33PM (#17561124)
    Additionally, since the computer "flip" happened instantaneously, and the f-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot

    A single half-roll to inverted in the Falcon wouldn't have exerted enough Gs on the pilot to do anything worse than make him exclaim WTF! and disengage the a/p. A roll in and of itself in an aircraft doesn't really induce many Gs... a "bank-and-yank" turn does, and that's what the F-16 can do at higher Gs than the pilot can take... not the roll.

  • by iamlucky13 (795185) on Thursday January 11, 2007 @02:49PM (#17561504)
    It wasn't one engineer. It was a team effort. And it wasn't a very simple matter of "forgetting". Several factors combined, including re-use of code from the MGS mission (a conversion factor was in the old code, but not recognized when the code was adapted for the doomed MCO) and budget constraints that limited pre-flight testing (so the bug was missed... and in fact might still have been missed even with more testing). The effects of the bug were also subtle enough that 3 minor main engine firings were conducted without enough error showing up to reveal the problem. It wasn't until the long orbital insertion firing that the error in the trajectory became noticeable, and by then it was too late. The team's first clue something was wrong was when the spacecraft didn't radio home after the engine burn.

    The details are really convoluted, but the Wikipedia page [wikipedia.org] on the mission has a decent write-up explaining how the mistake was made, with additional resources cited. The PDF paper giving a perspective from the MCO team is particularly revealing, if you've got some time on your hands.
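
For anyone wondering how a unit mix-up like that slips through reused code: the commonly reported MCO mismatch was thruster impulse supplied in pound-force seconds where newton-seconds were expected. Below is a minimal Python sketch of one defensive pattern, tagging the quantity with its unit at the boundary. The names `Impulse` and `delta_v` and the numbers used are hypothetical illustrations, not anything from the actual flight or ground software.

```python
from dataclasses import dataclass

# 1 lbf = 4.4482216152605 N, so 1 lbf*s = 4.4482216152605 N*s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Impulse stored internally in newton-seconds (hypothetical type)."""
    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        # Convert explicitly at the boundary instead of deep in the math.
        return cls(value * LBF_S_TO_N_S)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)


def delta_v(impulse: Impulse, mass_kg: float) -> float:
    """Velocity change in m/s for a given impulse and spacecraft mass."""
    if not isinstance(impulse, Impulse):
        # Refuse bare numbers: their unit is unknown.
        raise TypeError("impulse must be an Impulse, not a bare number")
    return impulse.newton_seconds / mass_kg


# The same raw number means very different things depending on the unit.
as_si = delta_v(Impulse.from_n_s(10.0), 100.0)
as_imperial = delta_v(Impulse.from_lbf_s(10.0), 100.0)
print(as_imperial / as_si)  # ratio equals the conversion factor, about 4.45
```

Passing a bare float, as in `delta_v(10.0, 100.0)`, raises `TypeError`, so the caller has to say which unit the number is in before it reaches the trajectory math.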
  • by Ancient_Hacker (751168) on Friday January 12, 2007 @06:56AM (#17571566)
    >Additionally, since the computer "flip" happened instantaneously, and the f-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot.

    Well, your whole post is called into question due to quite a few questionable items:

    • It seems unlikely that the latitude would enter at all into any calculation of roll attitude. If so, it's more than a "bug", it's a basic design mistake.
    • The F-16 does have a high roll rate, about 320 degrees per second, but since the pilot is very close to the roll axis, there's very little acceleration at the pilot's position during your basic aileron roll. Pilots routinely apply maximum roll without dying, or even passing out.
    • Nobody dies instantly from excess G's... Fighter pilots overdo it all the time. Usually they let off the stick as they feel the early effects, such as a narrowing or darkening field of vision. If they keep on commanding too many G's, they'll pass out and that will let pressure off the controls, which quickly reduces the G forces. Good fail-safe system.
    • Flipping upside down will quickly send blood to the head, which is exactly what's needed to recover from too many positive G's.
