Software Error Likely Killed MGS Spacecraft
Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."
Don't believe it (Score:5, Funny)
It's most likely the Martian automated defense system, set up just before we sent a probe and destroyed their civilisation [slashdot.org].
Should have used Gentoo!! (Score:2)
Re:Should have used Gentoo!! (Score:5, Insightful)
No sandbox can avoid the fact that one test was missing.
Re: (Score:2)
Re: (Score:2)
What you need to do is hold back on producing all those "fun" bugs that we all introduce into systems until you've the reputation as one of the best coders in the world, then go work for NASA and just go wild on some system that won't be used until it's in deep space and you're off working for Google, having destroyed the paper trail.
Re: (Score:2)
[rubs hands together in childlike glee, picturing large & spectacular catastrophes to come]
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Where's K'Breel? (Score:3, Insightful)
Re: (Score:2)
Battery (Score:5, Funny)
a Technical solution I see: (Score:3, Insightful)
What if Microsoft wrote it? (Score:5, Interesting)
Re: (Score:3, Informative)
Re: (Score:2)
Re:What if Microsoft wrote it? (Score:5, Funny)
Luxury! (Score:5, Funny)
Or at least, that's how I remember it...
Re: (Score:3, Insightful)
Why don't all computers use just a single configuration (peripherals, cards, interfaces)?
The purpose of an operating system is so much wider than what the Mars Global Surveyor had to do.
Re:What if Microsoft wrote it? (Score:5, Insightful)
That said, you could get software written to this level of perfection if you wanted. It's easy: follow the space shuttle team's example. You have a stable team of mature developers who work reasonable hours. You test the hell out of the software, to the point that a single bug in a test is reason to redo the software. You run the software on four identical computers and make sure they all agree.
Then you hire another entire team to write code that does the same thing, but otherwise has no contact with the first team. That software runs on a fifth computer that takes over if something happens to the other four.
Willing to pay for that?
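For anyone who wants the "four computers agree, fifth takes over" idea in concrete form, here is a minimal sketch. It is not the shuttle's actual logic; the function name `vote`, the strict-majority rule, and the choice to hand over to the backup whenever the primaries fail to agree are all assumptions made for illustration.

```python
from collections import Counter


def vote(primary_outputs, backup_output):
    """Pick the value a strict majority of the primary computers agree on.

    Hypothetical rule: if no strict majority exists among the primaries,
    trust the independently written backup instead.
    """
    # most_common(1) returns [(value, count)] for the most frequent output.
    value, count = Counter(primary_outputs).most_common(1)[0]
    if count > len(primary_outputs) // 2:
        return value
    return backup_output


# One faulty primary is outvoted by the other three.
print(vote(["fire", "fire", "fire", "hold"], "hold"))

# A 2-2 split has no strict majority, so the backup decides.
print(vote(["fire", "fire", "hold", "hold"], "hold"))
```

The expensive part is not the voter, of course. It is the second team writing the backup from scratch, so that a shared bug in the four primaries does not also live in the fifth machine.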
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
But why actually do it when Microsoft can just pocket the money instead?
It's an interesting idea if you could sell a Linux distro with that goal in mind (you get nothing for your money but a disc and a promise of future R&D). It might be too clever for most to accept.
Re: (Score:2)
Re: (Score:2)
"Honey, Is it Verb 37, Noun 40 to start Solitaire, or Verb 40, Noun 37?" [spaceborn.dk]
*phew* (Score:5, Funny)
Glad I'm not the programmer who came up with that bit of code! Their next performance review is going to be _lots_ of fun!
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
And Slashdot will blame Microsoft, even though they had zero to do with it...
"Safe" mode? (Score:5, Funny)
Maybe NASA's 'safe mode' just put 'safe mode' in the corners of all the returned images and did them in 8-bit colour...
Bits (Score:2)
This is one of the joys of embedded systems (Score:2)
In a realtime control system, a fault is a system failure. If there is no backup/recovery procedure then there is no such thing as a "safe mode".
Re: (Score:2)
Different design decisions, I guess, but it still sounds kind of fishy...
YACCS -Yet Another Computer Corkup in Space (Score:5, Informative)
Aero and space are very unforgiving of human coding errors.
Re: (Score:2, Interesting)
Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
Maybe we CS types need our own safety movies, perhaps When Buffers Attack!, Threads: Your Parallel Friends or Quagmires of Debugging DOOM?, or maybe Metric or Imperial: You Mean there's a Difference? Or maybe we need to recognize that many of us have the same awesome responsibility that other engineers do of protecting human lives from the consequences of our mistakes. I'm told that this point is hammered home in engineering schools; why not in CS departments?
Re:YACCS -Yet Another Computer Corkup in Space (Score:4, Funny)
Re: (Score:2)
*Sigh*
Re: (Score:3, Insightful)
Things like this are built by teams, and team members have to make certain assumptions about the accuracy of the other team members' work. Those algorithms should have been validated before even being handed off
Re:YACCS -Yet Another Computer Corkup in Space (Score:4, Insightful)
Engineering and applied mathematics are much more demanding than computer programming. Sure, one could argue that "computer science is math too", but my experience is that CS majors don't graduate with a strong math background. And even if they did once know some calculus and linear algebra, they were never required to apply it like an EE or Applied Math person would.
So while you could find a rigorous programmer or software engineer (and I use the term "software engineer" very loosely, because few individuals actually fit that description), it's often a lot easier to look for an engineer or applied mathematician with good programming skills. Their math and physics is usually significantly stronger, and they actually understand what they're programming.
We need Computer Engineering, not Scientists. (Score:2)
There is no discipline in Computer Programming these days, because Computer Programmers don't know how
Re: (Score:2)
One reason I love OSS is that its goal is to standardize and reuse code. I hate reinventing the wheel over and over again just because of some dumb non-disclosure clause.
I think part of the problem is that computer programming takes a special blend of language, mat
Re: (Score:2)
And in a lot of the top institutions that's exactly what has happened. My degree is an engineering one, not a science.
The simplest program is done differently by every programmer, whereas if engineers were doing it they'd all be taught to do it the exact same way.
If you re-write the exact same code over and over you're an idiot. The problem is that (at an application level) no
Re: (Score:3, Insightful)
We're never going to improve as long as people insist on comparing software development to building bridges, i.e. a more sophisticated understanding of the problem is needed. In software, once you have a program for a bridge you can make a billion bridges, all alike or customized by certain parameters, just by running the program. So being "able to build the same damn bridge 100 times" doesn't get you anywhere. Making it better and safer each time? That's another story, and once again, the comparison t
Re: (Score:3, Insightful)
"Well, here we're using the global "qzv" as a loop variable, but over here we'll use it to mean how many widgets we're looking at, and over here, it's our exit condition. Oh, and we'll set it to '5' over here for no discernable reason. Now, here's where we've cut and pasted the code 15 times so that we could change one variable's type (instead of usi
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Insightful)
Mistakes happen when you code. Sure, you try to minimize them but even the most carefully designed code can't be guaranteed to be 100% error free. That's why you employ, presumably, a top-notch QA team to check and recheck, testing your "perfect" code in ways that perhaps you never even considered.
This is what you would expect in a terrestrial application. When the platform that your code is going to run on isn't bound to th
Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
NASA has got it rough, has since the mid 70s. Their wildest successes are regarded as routine and hardly noticed by the public eye. Their failures, on the other hand, are spun to be the worst disasters in human history. Granted, when shuttles explode and people die, it's reasonable that the public be concerned. But it seems to me that for every 20 great things that NASA accomplishes, the media picks 1 failure (and sometimes blows that failure out of proportion) to rile the masses into a furious frenzy calling for the dissolution of NASA.
Re: (Score:2)
Re: (Score:2)
And even then - there exists the non-trivial possibility that something might slip through. No QA is ever going to reach
Reliability compared to what? (Score:2)
Just one more example of how Computer Science isn't quite up to the reliability requirements of Space
And how many failures have happened because of an engineering mistake?
You seem to assume that there's zero failure in space for everything else, and 6 problems in.. 30 years? is some horrible record.
All information only makes sense in context. What's the failure rate of other components of the system?
Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Informative)
http://portal.acm.org/ft_gateway.cfm?id=163293&ty
Nope. (Score:2, Informative)
A single, half-roll to inverted in the Falcon wouldn't have exerted enough Gs on the pilot to do anything worse than to exclaim WTF!, and disengage the a/p. A roll in and of itself in an aircraft doesn't really induce much Gs.... a "bank-and-yank" turn does, and that's what the F16 can do at higher Gs than the pilot can take... not the r
Re: (Score:3, Informative)
Well your whole post is called into question due to quite a few questionable items:
Re: (Score:2)
And don't forget the Mars Climate Orbiter "Dirt Dart" mission (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter). Okay, the operators helped by plugging in the wrong units, but the software didn't catch the discrepancy in the values either.
The systems aboard the spacecraft were not able to reconcile the two systems of measurement, resulting in the navigation error.
Operator error but it would be interesting to figure in the number of accidents that the software could have prevented the operator from ente
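A rough sketch of what "the software catching it" could look like: carry the unit along with the number and refuse anything unlabeled or unknown. The `Impulse` class and its unit strings are hypothetical names invented for this example, not anything from the actual MCO ground or flight software.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always travels with its unit (hypothetical type)."""
    value: float
    unit: str  # expected: "N*s" or "lbf*s"

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # Anything else is rejected loudly instead of being silently trusted.
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


# The same physical quantity, reported in either unit, converts consistently.
print(Impulse(10.0, "lbf*s").to_newton_seconds())
print(Impulse(44.482216152605, "N*s").to_newton_seconds())
```

A bare float passed between two teams carries no such label, which is exactly the gap a unit mix-up falls through.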
Re: (Score:3, Insightful)
"On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
Plus ça change, plus c'est la même chose.
Re: (Score:2)
What are the chances you would use 3248532346863247 as a test
Emulation? Testing? (Score:2)
Why is it that the iPod, hell, even my Java phone is more reliable than these aerospace things?
Re: (Score:2)
Mars lander Spirit started randomly rebooting [wikipedia.org] due to a flash memory access problem.
Mars Polar Lander was lost - there's no definitive proof but it's thought [wikipedia.org] it was a software error (sensing leg deployment ready for landing as actual landing, and hence deactivating thrust too early).
Mars Climate Orbiter? Chalk that one up to a metric/imperial conversion error [wikipedia.org]... in software.
Of course, the argument could be made that there's no real alternative to doing it in software...
Michael
Re: (Score:2)
MGS spaceship? (Score:2)
MGS? (Score:2)
We hardware types always blame software (Score:2)
Re: (Score:2)
reminds me of that old sig:
The 3 most dangerous situations:
A hardware guy with a software patch.
A user with an idea.
A coder with an electric iron.
Pilot said.... (Score:2, Funny)
Is this a sign? (Score:5, Insightful)
Some expert is always trumpeting the fact that "Johnny can't program," to which many of us roll our eyes and go back to coding. But could this be a sign that the quality of the help NASA is hiring is such that these kinds of mistakes are now rampant? I mean, this could have been avoided if the code had been tested out on a full-scale mock-up of the machine, to verify that it did what it was supposed to do, before ever sending the commands to the actual machine. If anything, it's a QA failure.
Re:Is this a sign? (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
I used to wor
one serious error in ten years rampant? (Score:2)
Re: (Score:2)
Better than a metric-English conversion error (Score:3, Insightful)
On a positive note, it has provided me an instructive example for when I help my teenagers with their math homework. If they say it's "almost" correct, I tell them that the guy who screwed up the Mars mission probably said the same thing.
-ccm
Re: (Score:2, Insightful)
KFG
Re:"almost" correct (Score:2)
Re: (Score:2, Insightful)
Why should the bank even care? I don't even remember the last time I balanced my checkbook.
"Almost correct" is someone being spineless.
I just measured the height of a tree with a meter long chunk of 2x4 and a bubble protractor. I get a figure of 10 meters. How many feet is that? 32.808399 is not the right answer. Using it is likely to result in your shell missing the top of the tree. 30 is the right answer. Why?
Neither you nor yo
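The tree arithmetic above is a significant-figures argument: a measurement good to one significant figure should still have one significant figure after conversion. A small sketch, assuming the 10 m reading really is a one-significant-figure measurement; `round_sig` is a helper written for this example, not a standard library function.

```python
import math

FEET_PER_METER = 3.280839895  # 1 / 0.3048


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit, e.g. 1 for 32.8 and -2 for 0.0123.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)


raw_feet = 10 * FEET_PER_METER       # the falsely precise 32.8... figure
honest_feet = round_sig(raw_feet, 1)  # one significant figure in, one out
print(raw_feet, honest_feet)
```

The extra decimals in the raw conversion come from the conversion factor, not from the 2x4 and the bubble protractor.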
Re: (Score:2)
And before you say it's all wrong if part of it's wrong, think about applying that standard to the entire assignment and you'll realize how specious it is.
As a lab instructor, I've even had to mark things wrong which have the answer correct: there
Re: (Score:2)
Calculation errors, process errors, logistics errors
Re: (Score:2)
In an ideal situation, a large majority of a grade ought
Re: (Score:2)
Re:Better than a metric-English conversion error (Score:5, Informative)
The details are really convoluted, but the Wikipedia page [wikipedia.org] on the mission has a decent write-up explaining how the mistake was made, with additional resources cited. The PDF paper giving a perspective from the MCO team is particularly revealing, if you've got some time on your hands.
So what if the battery is dead? (Score:2)
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:2)
zing! (Score:2, Funny)
[quote]at least if something went wrong some guy at nasa could tell his grand kids that he bricked something from ~140 million miles away.[/quote]
http://slashdot.org/comments.pl?sid=214508&cid=17
Time for a recall of bad parts (Score:2)
It could probably do repairs to the ISS as well (spacewalks should be for fun, not for work).
Vampire Hackers (Score:2)
An easy fix... (Score:2)
Oh... right... manned exploration is a waste of money and robots are all we ever need.
Lack of QA strikes NASA AGAIN. (Score:2)
These aren't 'normal workplace errors' that you have to live with, they're -stupid- errors, made because of stupid managers.
Re: (Score:2)
Never forget, what we use, abuse, and refuse, all is created by human hands. Which by design are imperfect.
MGS was currently a low priority for NASA (Score:2, Interesting)
Milestone (Score:2)
O_O (Score:2, Funny)
More propaganda (Score:2)