Software Error Likely Killed MGS Spacecraft
Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."
a Technical solution I see: (Score:3, Insightful)
Re:What is Microsoft wrote it? (Score:3, Insightful)
Why don't all computers use just a single configuration (peripherals, cards, interfaces)?
The purpose of an operating system is so much wider than what the Mars Global Surveyor had to do.
Is this a sign? (Score:5, Insightful)
Some expert is always trumpeting the fact that "Johnny can't program," to which many of us roll our eyes and go back to coding. But could this be a sign that the quality of the help NASA is hiring is such that these kinds of mistakes are now rampant? I mean, this could have been avoided if the code had been tested out on a full-scale mock-up of the machine, to verify that it did what it was supposed to do, before ever sending the commands to the actual machine. If anything, it's a QA failure.
Better than a metric-English conversion error (Score:3, Insightful)
On a positive note, it has provided me an instructive example for when I help my teenagers with their math homework. If they say it's "almost" correct, I tell them that the guy who screwed up the Mars mission probably said the same thing.
-ccm
Re:Should have used Gentoo!! (Score:5, Insightful)
No sandbox can avoid the fact that one test was missing.
Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
Maybe we CS types need our own safety movies, perhaps When Buffers Attack!, Threads: Your Parallel Friends or Quagmires of Debugging DOOM?, or maybe Metric or Imperial: You Mean there's a Difference? Or maybe we need to recognize that many of us have the same awesome responsibility that other engineers do of protecting human lives from the consequences of our mistakes. I'm told that this point is hammered home in engineering schools, why not in CS departments?
Where's K'Breel? (Score:3, Insightful)
Re:What is Microsoft wrote it? (Score:5, Insightful)
That said, you could get software written to this level of perfection if you wanted. It's easy: follow the space shuttle team's example. You have a stable team of mature developers who work reasonable hours. You test the hell out of the software to the point a single bug in a test is reason to redo the software. You run the software on four identical computers and make sure they all agree.
Then you hire another entire team to write code that does the same thing, but otherwise has no contact with the first team. That software runs on a fifth computer that takes over if something happens to the other four.
Willing to pay for that?
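For anyone who wants to see the shape of that arrangement, here is a rough sketch in Python. It is not the shuttle's actual flight software, just the idea as described above: four copies of the primary program must all agree, and an independently written backup takes over if they don't. Every name in it (`vote`, `run`, `primary`, `backup`) and the angle-clamping task itself are made up for illustration.

```python
from collections import Counter

def vote(outputs):
    """Return the value all redundant computers agree on, or None if any disagrees."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count == len(outputs) else None

def primary(angle):
    # Hypothetical primary routine: keep a commanded angle within 0..180 degrees.
    return max(0, min(180, angle))

def backup(angle):
    # Hypothetical second implementation of the same requirement,
    # written independently of primary().
    if angle < 0:
        return 0
    if angle > 180:
        return 180
    return angle

def run(command, primaries, backup_fn):
    """Run the command on every primary; fall back to the backup on disagreement."""
    outputs = [p(command) for p in primaries]
    agreed = vote(outputs)
    if agreed is not None:
        return agreed, "primary"
    return backup_fn(command), "backup"
```

Calling `run(200, [primary] * 4, backup)` uses the primaries' answer. Swap one of the four for a faulty routine and the same call hands control to the backup. Note that the code is the cheap part; the second team and the testing are where the money goes.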
Re:YACCS -Yet Another Computer Corkup in Space (Score:3, Insightful)
Mistakes happen when you code. Sure, you try to minimize them but even the most carefully designed code can't be guaranteed to be 100% error free. That's why you employ, presumably, a top-notch QA team to check and recheck, testing your "perfect" code in ways that perhaps you never even considered.
This is what you would expect in a terrestrial application. When the platform that your code is going to run on isn't bound to the same gravitational source that you are, you would think...you would *hope*...that the QA team might do an even more thorough job.
If this event is at all indicative of the QA efforts that NASA will be making for our return to the moon, perhaps we'd be better off staying at home.
Re:Is this a sign? (Score:5, Insightful)
Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
NASA has got it rough, has since the mid 70s. Their wildest successes are regarded as routine and hardly noticed by the public eye. Their failures, on the other hand, are spun to be the worst disasters in human history. Granted, when shuttles explode and people die, it's reasonable that the public be concerned. But it seems to me that for every 20 great things that NASA accomplishes, the media picks 1 failure (and sometimes blows that failure out of proportion) to rile the masses into a furious frenzy calling for the dissolution of NASA.
Re:Battery (Score:1, Insightful)
Re:YACCS -Yet Another Computer Corkup in Space (Score:3, Insightful)
Things like this are built by teams, and team members have to make certain assumptions about the accuracy of the other team members' work. Those algorithms should have been validated before even being handed off to the programmers, and then validated *again* as part of integrated testing.
Re:Better than a metric-English conversion error (Score:2, Insightful)
KFG
Re:So what if the battery is dead? (Score:3, Insightful)
Re:YACCS -Yet Another Computer Corkup in Space (Score:3, Insightful)
"On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
Plus ça change, plus c'est la même chose.
Re:YACCS -Yet Another Computer Corkup in Space (Score:4, Insightful)
Engineering and applied mathematics are much more demanding than computer programming. Sure, one could argue that "computer science is math too", but my experience is that CS majors don't graduate with a strong math background. And even if they did once know some calculus and linear algebra, they were never required to apply it like an EE or Applied Math person would.
So while you could find a rigorous programmer or software engineer (and I use the term "software engineer" very loosely, because few individuals actually fit that description), it's often a lot easier to look for an engineer or applied mathematician with good programming skills. Their math and physics is usually significantly stronger, and they actually understand what they're programming.
Re:YACCS -Yet Another Computer Corkup in Space (Score:3, Insightful)
"Well, here we're using the global "qzv" as a loop variable, but over here we'll use it to mean how many widgets we're looking at, and over here, it's our exit condition. Oh, and we'll set it to '5' over here for no discernable reason. Now, here's where we've cut and pasted the code 15 times so that we could change one variable's type (instead of using templates), but naturally, all of the bugfixes we've applied since then haven't all migrated into all of the versions. Ah, here's the core of the code, where we cast structs and function pointers to void pointers, and then pass those around, with a jurry-rigged method to figure out what they actually contain duplicated in every piece of code that uses. If you scroll up in this 23,000 line file, you'll see eighteen pages of commented out code. Scroll down, and you can see the famous Sea of TODO Notes -- the only place in the file in which comment are actually associated with descriptive text. Unfortunately, most of them contain only the word 'Fixme'. Now, on to the diverse species of macros you'll find scattered about, defined and redefined throughout the code..."
Re:"almost" correct (Score:2, Insightful)
Why should the bank even care? I don't even remember the last time I balanced my checkbook.
"Almost correct" is someone being spineless.
I just measured the height of a tree with a meter long chunk of 2x4 and a bubble protractor. I get a figure of 10 meters. How many feet is that? 32.808399 is not the right answer. Using it is likely to result in your shell missing the top of the tree. 30 is the right answer. Why?
Neither you nor your wife is correct, or incorrect either. Define what "correct" means and define the degree of incorrectness and precisely why it is incorrect.
Arithmetic is exact; the things you use it to model often are not. Modeling states and calculating figures are two separate acts and skills. They both need to be taught and understood.
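One way to read the tree example, assuming a measurement that crude is good to about one significant figure: the converted number can't carry more precision than the measurement did. A minimal Python sketch, where the helper names `round_sig` and `meters_to_feet` are mine and the conversion factor is itself rounded:

```python
import math

FEET_PER_METER = 3.28084  # rounded conversion factor

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the last digit to keep, relative to the decimal point.
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

def meters_to_feet(meters, sig):
    """Convert meters to feet, keeping only `sig` significant figures."""
    return round_sig(meters * FEET_PER_METER, sig)

print(meters_to_feet(10, 1))  # one significant figure: 30.0
print(meters_to_feet(10, 2))  # two significant figures: 33.0
```

The multiplication is exact enough either way. The modeling decision is how many figures the 2x4 and the protractor actually earned you.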
Telling me that I'm stooopid is a personal attack; telling me my calculation is incorrect is a statement of fact. Folks need to learn that the latter statement isn't necessarily a bad thing.
Here I am with you 100%.
KFG
Re:We need Computer Engineering, not Scientists. (Score:3, Insightful)
We're never going to improve as long as people insist on comparing software development to building bridges, i.e. a more sophisticated understanding of the problem is needed. In software, once you have a program for a bridge you can make a billion bridges, all alike or customized by certain parameters, just by running the program. So being "able to build the same damn bridge 100 times" doesn't get you anywhere. Making it better and safer each time? That's another story, and once again, the comparison to bridge building doesn't hold up, because you're talking about improving the design, not the building practices or materials.
If there was any merit in this canard, don't you think that before now, you'd have had some engineers who also knew software come along and revolutionize the software industry?
You haven't written a line of code in your life, have you? If you have, tell me what level of standardization you're even talking about, in the software context.
/. is Populated by self-righteous ignorami (Score:1, Insightful)
First of all, what was the move for personal gain that caused this? A software bug? Typically those are accidental, and they are most certainly not limited to NASA. Do you have some evidence of underhanded action happening in this case? No? Didn't think so.
Secondly, what is the rapid demise you're referring to? Do you realize that the last 10 years have seen a brilliant upswing for NASA? With the exception of the unfortunate Columbia tragedy, which itself opened a lot of eyes and spurred many improvements within NASA and especially the manned space program, successful missions have been practically hand-delivered to the American people. Let me name a few: Stardust, Mars Rovers, Mars Reconnaissance Orbiter, Odyssey, Pathfinder, Deep Space 1, Deep Impact, Spitzer, Cassini, and Clementine. There are quite a few excellent missions coming up soon or en route, too: New Horizons, Messenger, Mars Phoenix Lander, James Webb Space Telescope, Mars Science Laboratory, and the Lunar Reconnaissance Orbiter.
Third, did you have any clue when you opened your trap that the Mars Global Surveyor completed its mission 5 years ago? Every orbit it made after that and picture it returned was a bonus to the American taxpayers and the global scientific community. MGS mapped the entire planet, much of it twice. NASA had been considering finally shutting down the project to free up resources for newer, higher priority missions, like MRO.
Fourth, what is your brilliant plan of firing everyone going to achieve? It will leave an organization with no one who has a clue what's going on. The people who know how all the missions currently in operation work will be gone. There will be no one to train any replacements. People who know the ins and outs of spacecraft design will be sitting at home jobless watching as people who have no experience in space exploration try to start back up from the 1950's. The best you can do is to identify those people who are genuine problems or true underachievers and fire them. Then you get rid of specific problems and motivate everyone else to be straight shooters, without eliminating key talent.
Any questions?