Software Error Likely Killed MGS Spacecraft

Posted by kdawson on Thursday January 11, 2007 @11:32AM from the off-by-one dept.

Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."
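The failure chain in the summary, where a bad command drives the solar arrays into a hard stop, is the kind of fault a software limit short of the mechanical stop is meant to catch. A minimal Python sketch of that defensive pattern, not necessarily what the MGS flight software did; the limit values and the function name are hypothetical:

```python
# Hypothetical limits in degrees; real MGS gimbal limits are not given in the story.
HARD_STOP_DEG = 90.0    # assumed position of the mechanical stop
SOFT_MARGIN_DEG = 5.0   # assumed margin kept clear of the stop


def clamp_array_command(angle_deg: float) -> float:
    """Clamp a commanded solar-array gimbal angle to symmetric soft limits.

    Any command beyond the soft limit is pulled back so the actuator is never
    asked to drive into the hard stop, whatever upstream code requested.
    """
    limit = HARD_STOP_DEG - SOFT_MARGIN_DEG
    return max(-limit, min(limit, angle_deg))
```

With these assumed values, a corrupted command of 120 degrees becomes 85 degrees, while a normal command of 30 degrees passes through unchanged.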

Don't believe it (Score:5, Funny)

by LiquidCoooled ( 634315 ) writes: on Thursday January 11, 2007 @11:35AM (#17556812) Homepage Journal

I don't believe it.
It's most likely the Martian automated defense system, set up just before we sent a probe and destroyed their civilisation [slashdot.org].

- Should have used Gentoo!! (Score:2)
  
  by Marcion ( 876801 ) writes:
  
  The updates would have been added in a sandbox and then only moved to the main system if they passed all the tests.
  - Re:Should have used Gentoo!! (Score:5, Insightful)
    
    by zootm ( 850416 ) writes: on Thursday January 11, 2007 @12:19PM (#17557462)
    
    No sandbox can avoid the fact that one test was missing.
    
    - Re: (Score:2)
      
      by bhsurfer ( 539137 ) writes:
      
      Man, I'd feel really super important if I wrote a bug that destructive! I feel so inadequate... I need a hug.
      - Re: (Score:2)
        
        by zootm ( 850416 ) writes:
        
        What you need to do is hold back on producing all those "fun" bugs that we all introduce into systems until you've the reputation as one of the best coders in the world, then go work for NASA and just go wild on some system that won't be used until it's in deep space and you're off working for Google, having destroyed the paper trail.
        
        - Re: (Score:2)
          
          by bhsurfer ( 539137 ) writes:
          
          That's it! Great idea! I'm hiring you as my personal manager.
          [rubs hands together in childlike glee, picturing large & spectacular catastrophes to come]
          
        - Re: (Score:2)
          
          by lysergic.acid ( 845423 ) writes:
          
          As far as I'm concerned, the largest space disaster to date is still Leprechaun 4 [imdb.com].
  - Re: (Score:2)
    
    by the_tsi ( 19767 ) writes:
    
    ...But if they installed the update on a Gentoo sandbox before installing it on the MGS itself, it wouldn't be compiled for EXACTLY that machine, and as we all know, it's the precise compiling that results in Gentoo's 20% performance increase (that and funrolling loops and putting flashy stripes on the computer, along with maybe an 8" exhaust).
  - Re: (Score:2)
    
    by Hatta ( 162192 ) writes:
    
    Isn't Mars one big sandbox [smh.com.au]?
- Where's K'Breel? (Score:3, Insightful)
  
  by Amazing Quantum Man ( 458715 ) writes:
  
  We need his report! Tripmaster Monkey, where are you?
- Re: (Score:2)
  
  by orasio ( 188021 ) writes:
  
  Martians were previously killed by all the MSG [truthinlabeling.org] in the spacecraft.
Battery (Score:5, Funny)

by Anonymous Coward writes: on Thursday January 11, 2007 @11:37AM (#17556846)

overheated the battery and destroyed it
Have NASA been using Dell batteries?

a Technical solution I see: (Score:3, Insightful)

by pilgrim23 ( 716938 ) writes: on Thursday January 11, 2007 @11:37AM (#17556850)

Typical response to a problem: form a committee!

What if Microsoft wrote it? (Score:5, Interesting)

by quadelirus ( 694946 ) writes: on Thursday January 11, 2007 @11:37AM (#17556852)

One crash in ten years? Why don't the NASA guys write consumer operating systems?

- Re: (Score:3, Informative)
  
  by the_humeister ( 922869 ) writes:
  
  Because it'd be even less user friendly than Linux. Plus they'd also require people to run 80386 processors with 4 MB of memory, if that.
  - Re: (Score:2)
    
    by h2g2bob ( 948006 ) writes:
    
    Well, 4 MB should be enough for anybody
      - Re:What if Microsoft wrote it? (Score:5, Funny)
        
        by the_humeister ( 922869 ) writes: on Thursday January 11, 2007 @01:04PM (#17558246)
        
        I don't know. And people with their "keyboard" and "mouse." Idiots, I say. The only true way to interact with a computer is by plugging wires into the serial port and generating the necessary electrical pulses myself.
        
        
        - Luxury! (Score:5, Funny)
          
          by avronius ( 689343 ) * writes: on Thursday January 11, 2007 @02:28PM (#17559686) Homepage Journal
          
          We used to live in a vacuum tube. When the computer was running, and your bit was accessed, you almost had enough light to read by. Mother would disconnect the tube when she went to bed, causing floating point errors for almost eight clock-cycles...
          
          Or at least, that's how I remember it...
        
- Re: (Score:3, Insightful)
  
  by Calinous ( 985536 ) writes:
  
  Why don't computers use NASA-quality hardware, ready for space?
  Why don't all computers use just a single configuration (peripherals, cards, interfaces)?
  
  The purpose of an operating system is so much wider than what the Mars Global Surveyor had to do.
- Re:What if Microsoft wrote it? (Score:5, Insightful)
  
  by edremy ( 36408 ) writes: on Thursday January 11, 2007 @12:36PM (#17557770) Journal
  
  Actually, they buy their OSes off the shelf (VxWorks for the rovers, for example).
  That said, you could get software written to this level of perfection if you wanted. It's easy: follow the space shuttle team's example. You have a stable team of mature developers who work reasonable hours. You test the hell out of the software, to the point that a single bug found in testing is reason to redo the software. You run the software on four identical computers and make sure they all agree.
  Then you hire another entire team to write code that does the same thing, but otherwise has no contact with the first team. That software runs on a fifth computer that takes over if something happens to the other four.
  Willing to pay for that?
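The "make sure they all agree" step can be sketched as a majority vote over the outputs of redundant computers. This is a toy Python illustration under a simplifying assumption (each computer's output is a directly comparable value), not the shuttle's actual redundancy-management scheme, which is considerably more involved:

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value a strict majority of redundant channels agree on.

    Returns None when there is no strict majority (e.g. a 2-2 split among
    four computers) or when no outputs were supplied.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of all channels must report this value.
    return value if count * 2 > len(outputs) else None
```

A single faulty computer among four is simply outvoted. A `None` result is the point where a design like the one described would hand control to the independently written backup software.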
  
  - Re: (Score:2)
    
    by timeOday ( 582209 ) writes:
    
    Anyway, the shuttle flight control is only 420,000 [nap.edu] lines of code (plus another 1.5M of support code). Nothing to sneeze at, but Vista and Linux are said [wikipedia.org] to have 50 and 30 million lines of code, respectively. So that's about two orders of magnitude! I'm also willing to bet the flight control software for the Shuttle hasn't changed much over the past 25 years, yet 275 people support it.
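A quick check of the "two orders of magnitude" figure, using only the line counts quoted in the comment (they are the commenter's numbers and are not independently verified here):

```python
import math

SHUTTLE_FLIGHT_LOC = 420_000  # shuttle flight-control figure quoted above
QUOTED_LOC = {"Vista": 50_000_000, "Linux": 30_000_000}  # figures as quoted


def orders_of_magnitude(loc: int, baseline: int = SHUTTLE_FLIGHT_LOC) -> float:
    """Base-10 logarithm of the size ratio between two code bases."""
    return math.log10(loc / baseline)


for name, loc in QUOTED_LOC.items():
    print(f"{name}: {loc / SHUTTLE_FLIGHT_LOC:.0f}x, "
          f"{orders_of_magnitude(loc):.2f} orders of magnitude")
```

The ratios come out to roughly 119x (about 2.08 orders of magnitude) and 71x (about 1.85), so "about two orders of magnitude" holds up.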
    - Re: (Score:2)
      
      by Spikeles ( 972972 ) writes:
      
      Here is some of it http://www.hq.nasa.gov/office/pao/History/computers/Appendix-II.html [nasa.gov]
    - Re: (Score:2)
      
      by bill_mcgonigle ( 4333 ) * writes:
      
      Yes I am. Spread the cost over all the servers in the world and the cost would still be far less than the cost of all the crashes, infections, and data corruptions that are due to the sloppy way Microsoft writes and tests operating systems.
      
      But why actually do it when Microsoft can just pocket the money instead?
      
      It's an interesting idea if you could sell a Linux distro with that goal in mind (you get nothing for your money but a disc and a promise of future R&D). It might be too clever for most to accept.
    - Re: (Score:2)
      
      by StikyPad ( 445176 ) writes:
      
      Virtual OS's are virtually useless for redundancy because they don't protect against hardware malfunctions such as the unscheduled combustion of a component.
- Re: (Score:2)
  
  by camperdave ( 969942 ) writes:
  
  One crash in ten years? Why don't the NASA guys write consumer operating systems?
  
  "Honey, Is it Verb 37, Noun 40 to start Solitaire, or Verb 40, Noun 37?" [spaceborn.dk]
*phew* (Score:5, Funny)

by Daetrin ( 576516 ) writes: on Thursday January 11, 2007 @11:40AM (#17556886)

NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode.
Glad I'm not the programmer who came up with that bit of code! Their next performance review is going to be _lots_ of fun!

- Re: (Score:2)
  
  by Intron ( 870560 ) writes:
  
  There goes the SEI level 5 certification...
  - Re: (Score:2)
    
    by timeOday ( 582209 ) writes:
    
    Not at all. SEI has no problem with bugs, so long as you follow an elaborate process to fix them, track them, and reconsider the process that led to them. It's very process-oriented rather than outcome-oriented.
- Re: (Score:2)
  
  by __aaclcg7560 ( 824291 ) writes:
  
  Actually, a subcontractor will blame another subcontractor for the fault and fighting will break out. NASA will keep peace among the subcontractors by blaming a hacker for mistaking the update for a patch for the Metal Gear Solid video game, and vowing not to create any acronyms that could be misconstrued as a video game.
  - Re: (Score:2)
    
    by Mister Whirly ( 964219 ) writes:
    
    "a subcontractor will blame another subcontractor for the fault"
    And Slashdot will blame Microsoft, even though they had zero to do with it...
"Safe" mode? (Score:5, Funny)

by Bazman ( 4849 ) writes: on Thursday January 11, 2007 @11:43AM (#17556930) Journal

Funny definition of 'safe mode'. I'd get the main antenna pointing at the earth, the battery radiator pointing away from the sun, and the computer going 'what do I do now, smarty earthlings?' and waiting for a command.

Maybe NASA's 'safe mode' just put 'safe mode' in the corners of all the returned images and did them in 8-bit colour...

- Bits (Score:2)
  
  by michaelmalak ( 91262 ) writes:
  
  Maybe NASA's 'safe mode' just put 'safe mode' in the corners of all the returned images and did them in 8-bit colour...
  
  I think you meant to say 4-bit color.
- This is one of the joys of embedded systems (Score:2)
  
  by EmbeddedJanitor ( 597831 ) writes:
  
  In a desktop app you can (generally) hit some sort of assert or exception or whatever and halt the software. The user might get annoyed but nobody gets killed etc.
  In a realtime control system, a fault is a system failure. If there is no backup/recovery procedure then there is no such thing as a "safe mode".
- Re: (Score:2)
  
  by joeljkp ( 254783 ) writes:
  
  Yeah, that's what I was thinking too. When the MESSENGER [wikipedia.org] spacecraft enters safe mode, it'll turn itself so the antenna's toward Earth and its heat shield is toward the sun. It'll even rotate itself during its orbit to keep itself in that position.
  
  Different design decisions, I guess, but it still sounds kind of fishy...
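Both comments describe safe mode as a set of attitude constraints: antenna toward Earth, sensitive surfaces away from the sun. Checking the second constraint reduces to the angle between two vectors. A hypothetical Python sketch; the 60-degree keep-out cone, the shared vector frame, and the function names are assumptions, not MGS or MESSENGER parameters:

```python
import math
from typing import Sequence


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two non-zero 3-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


def radiator_sun_safe(radiator_normal: Sequence[float],
                      sun_vector: Sequence[float],
                      keep_out_deg: float = 60.0) -> bool:
    """True if the sun lies outside an assumed keep-out cone around the radiator normal."""
    return angle_between_deg(radiator_normal, sun_vector) > keep_out_deg
```

A safe-mode routine built this way would treat a `False` result as a constraint violation to correct, rather than settling into an attitude that points the radiator at the sun.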
YACCS -Yet Another Computer Corkup in Space (Score:5, Informative)

by Ancient_Hacker ( 751168 ) writes: on Thursday January 11, 2007 @11:45AM (#17556956)
Just one more example of how Computer Science isn't quite up to the reliability requirements of Space:
- A missing comma in a Do-loop statement caused the rocket for the first mission to Mars to go off course and blow up.
- The space-shuttle programs had a race condition that caused the first launch to be scrubbed.
- The space-shuttle re-entry program had one important variable off by a factor of -4, causing the first re-entry to be a bit wobbly.
- An Ariane guidance program had multiple basic design glitches that caused the first launch to blow up.
- The F-16 autopilot worked very well, until the plane was deployed to Australia, where on its way there it bounced off the equator.
- The LEM landing program didn't protect itself from spurious radar data, causing the computer to get behind.
Aero and space are very unforgiving of human coding errors.
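The Ariane item most likely refers to Ariane 5 flight 501 (1996), where software reused from Ariane 4 converted a 64-bit floating-point value related to horizontal velocity into a 16-bit signed integer. On Ariane 5's trajectory the value was too large, the conversion raised an unhandled operand error, and both inertial reference units shut down. A Python sketch of that failure class, contrasting a checked conversion with a saturating one; it illustrates the idea and is not the original Ada code:

```python
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed integer


def to_int16_checked(value: float) -> int:
    """Truncate toward zero; raise OverflowError if the result does not fit in int16."""
    truncated = int(value)  # int() truncates toward zero; NaN and inf also raise here
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Truncate a finite float toward zero, clamping to the int16 range instead of failing."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

Neither choice is automatically right: saturating silently hides the fault, while raising is only safe if something sensible handles the exception.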
- Re: (Score:2, Interesting)
  
  by zyl0x ( 987342 ) writes:
  
  Be careful not to place too much of the blame on us programmers. Most of these crazy "business logic" equations were created by some math genius in another department. Since most of these equations mean nothing to programmers, we just make sure we're typing them in correctly, because there's no way we would ever recognize any type of mistake. Most of the time the problem lies with the math guy, who was too lazy to carry a remainder, or who thought the equation was good enough being precise to four decimal places.
  - Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
    
    by spun ( 1352 ) writes: <loverevolutionary&yahoo,com> on Thursday January 11, 2007 @12:21PM (#17557510) Journal
    
    In other disciplines, the engineers ARE math guys. Face it, compared to other engineering types, software engineers and programmers are SLOPPY. This is because engineering has thousands of years worth of spectacular cork-ups with enormous death tolls to look back on, and engineering students are (I'm guessing, IANAE) shown horrific, traffic-safetyesque movies like Blood on the Protractor, Slide Rule Massacre, and London Bridge is Falling Down, Killing Little Johnny's Entire Family.
    
    Maybe we CS types need our own safety movies, perhaps When Buffers Attack!, Threads: Your Parallel Friends or Quagmires of Debugging DOOM?, or maybe Metric or Imperial: You Mean there's a Difference? Or maybe we need to recognize that many of us have the same awesome responsibility that other engineers do of protecting human lives from the consequences of our mistakes. I'm told that this point is hammered home in engineering schools, why not in CS departments?
    
    - Re:YACCS -Yet Another Computer Corkup in Space (Score:4, Funny)
      
      by unix_core ( 943019 ) writes: on Thursday January 11, 2007 @12:35PM (#17557758)
      
      I think I've seen some of those, starring Troy McClure, right?
      
      - Re: (Score:2)
        
        by spun ( 1352 ) writes:
        
        When I came up with those names, I pictured Troy saying them. Dammit, Phil Hartman, why'd you have to marry a crazy murdering alcoholic bitch?
        
        *Sigh*
    - Re: (Score:3, Insightful)
      
      by caerwyn ( 38056 ) writes:
      
      CS people are math guys too, at least many of us are. That doesn't mean we necessarily have the expertise to validate aerospace control algorithms on the fly; that's why there's an entire discipline of aerospace engineers, because you can't expect all the *other* engineers to have sufficient knowledge.
      
      Things like this are built as teams- and team members have to make certain assumptions about the accuracy of the other team members' work. Those algorithms should have been validated before even being handed off
    - Re:YACCS -Yet Another Computer Corkup in Space (Score:4, Insightful)
      
      by Flavio ( 12072 ) writes: on Thursday January 11, 2007 @02:17PM (#17559450)
      
      In other disciplines, the engineers ARE math guys. Face it, compared to other engineering types, software engineers and programmers are SLOPPY. This is because engineering has thousands of years worth of spectacular cork-ups with enormous death tolls to look back on, and engineering students are (I'm guessing, IANAE) shown horrific, traffic-safetyesque movies like Blood on the Protractor, Slide Rule Massacre, and London Bridge is Falling Down, Killing Little Johnny's Entire Family.
      
      Engineering and applied mathematics are much more demanding than computer programming. Sure, one could argue that "computer science is math too", but my experience is that CS majors don't graduate with a strong math background. And even if they did once know some calculus and linear algebra, they were never required to apply it like an EE or Applied Math person would.
      
      So while you could find a rigorous programmer or software engineer (and I use the term "software engineer" very loosely, because few individuals actually fit that description), it's often a lot easier to look for an engineer or applied mathematician with good programming skills. Their math and physics is usually significantly stronger, and they actually understand what they're programming.
      
    - We need Computer Engineering, not Scientists. (Score:2)
      
      by Banner ( 17158 ) writes:
      
      What is really needed is to get RID of Computer SCIENCE and move it over to the Engineering department and give us Computer ENGINEERING. Scientists don't build stuff, they investigate things, they don't -care- about better ways to build things, better ways to avoid mistakes, it's not their job. Engineers however are all about building the same damn bridge 100 times and making it better and safer each time.
      
      There is no discipline in Computer Programming these days, because Computer Programmers don't know how
      - Re: (Score:2)
        
        by spun ( 1352 ) writes:
        
        Don't get rid of computer science. Theorists are needed in all fields. Breakthrough advances rarely come from the applied sciences. But I really do wish we CS types could take some lessons from the engineering guys, who have a much longer history.
        
        One reason I love OSS is that its goal is to standardize and reuse code. I hate reinventing the wheel over and over again just because of some dumb non-disclosure clause.
        
        I think part of the problem is that computer programming takes a special blend of language, mat
      - Re: (Score:2)
        
        by radish ( 98371 ) writes:
        
        What is really needed is to get RID of Computer SCIENCE and move it over to the Engineering department and give us Computer ENGINEERING
        
        And in a lot of the top institutions that's exactly what has happened. My degree is an engineering one, not a science.
        
        The simplest program is done differently by every programmer where if engineers were doing it they'd all be taught to do it the exact same way.
        
        If you re-write the exact same code over and over you're an idiot. The problem is that (at an application level) no
      - Re: (Score:3, Insightful)
        
        by alienmole ( 15522 ) writes:
        
        We're never going to improve as long as people insist on comparing software development to building bridges, i.e. a more sophisticated understanding of the problem is needed. In software, once you have a program for a bridge you can make a billion bridges, all alike or customized by certain parameters, just by running the program. So being "able to build the same damn bridge 100 times" doesn't get you anywhere. Making it better and safer each time? That's another story, and once again, the comparison t
    - Re: (Score:3, Insightful)
      
      by Rei ( 128717 ) writes:
      
      To put the shoe on the other foot, have you ever seen software written by people who aren't programmers? Uck. The code is usually a nightmare. Things like:
      
      "Well, here we're using the global "qzv" as a loop variable, but over here we'll use it to mean how many widgets we're looking at, and over here, it's our exit condition. Oh, and we'll set it to '5' over here for no discernable reason. Now, here's where we've cut and pasted the code 15 times so that we could change one variable's type (instead of usi
    - Re: (Score:2)
      
      by lysergic.acid ( 845423 ) writes:
      
      just don't be spun when you write that shuttle life-support system code, that's all... ;-)
      - Re: (Score:2)
        
        by spun ( 1352 ) writes:
        
        This is a very important point, one I allude to in my comment. Engineers have tradition to go on, thousands of years of experience with what works and what doesn't. CS is too new; we're in a stage like that of Egyptian engineers when they first started putting up the pyramids. "Okay, so a 45 degree angle didn't work, let's try 37..." Give us a few thousand years like you guys have had and I think we'll do okay ;)
- Re: (Score:2)
  
  by shawn(at)fsu ( 447153 ) writes:
  
  It's not like the only problems with air and space vehicles have been caused by coding errors; I'm sure engineering has done fairly well for itself too.
- Re: (Score:2)
  
  by MBCook ( 132727 ) writes:
  
  Like the F-16 thing. Let's not forget that the shuttle has NEVER been in space over New Year's. It is untested (at least in space) and they are not positive what will happen. That's why they were worried in December: they didn't want bad weather to force the shuttle to stay in space during the transition.
  - Re: (Score:2)
    
    by camperdave ( 969942 ) writes:
    
    Let's also not forget that the shuttle was DESIGNED FROM DAY ONE NOT TO BE IN SPACE OVER NEW YEAR'S. Too many support staff are on holiday during that time to make it safe.
    - Re: (Score:2)
      
      by wik ( 10258 ) writes:
      
      Houston, we have the solution [blogspot.com].
- Re: (Score:3, Insightful)
  
  by januth ( 1000892 ) writes:
  
  I wouldn't call it a failure of Computer Science; it's a QA failure without a doubt.
  
  Mistakes happen when you code. Sure, you try to minimize them but even the most carefully designed code can't be guaranteed to be 100% error free. That's why you employ, presumably, a top-notch QA team to check and recheck, testing your "perfect" code in ways that perhaps you never even considered.
  
  This is what you would expect in a terrestrial application. When the platform that your code is going to run on isn't bound to th
  - Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Insightful)
    
    by Mayhem178 ( 920970 ) writes: on Thursday January 11, 2007 @12:48PM (#17557946)
    
    For the uninformed, QA = Quality Assurance. A must-have for any self-respecting software model.
    
    NASA has got it rough, has since the mid 70s. Their wildest successes are regarded as routine and hardly noticed by the public eye. Their failures, on the other hand, are spun to be the worst disasters in human history. Granted, when shuttles explode and people die, it's reasonable that the public be concerned. But it seems to me that for every 20 great things that NASA accomplishes, the media picks 1 failure (and sometimes blows that failure out of proportion) to rile the masses into a furious frenzy calling for the dissolution of NASA.
    
    - Re: (Score:2)
      
      by Chris Burke ( 6130 ) writes:
      
      NASA has got it rough, has since the mid 70s. Their wildest successes are regarded as routine and hardly noticed by the public eye. Their failures, on the other hand, are spun to be the worst disasters in human history. Granted, when shuttles explode and people die, it's reasonable that the public be concerned. But it seems to me that for every 20 great things that NASA accomplishes, the media picks 1 failure (and sometimes blows that failure out of proportion) to rile the masses into a furious frenzy calli
  - Re: (Score:2)
    
    by DerekLyons ( 302214 ) writes:
    
    wouldn't call it a failure of Computer Science; it's a QA failure without a doubt.
    
    Mistakes happen when you code. Sure, you try to minimize them but even the most carefully designed code can't be guaranteed to be 100% error free. That's why you employ, presumably, a top-notch QA team to check and recheck, testing your "perfect" code in ways that perhaps you never even considered.
    And even then - there exists the non-trivial possibility that something might slip through. No QA is ever going to reach
- Reliability compared to what? (Score:2)
  
  by Vellmont ( 569020 ) writes:
  
  Just one more example of how Computer Science isn't quite up to the reliability requirements of Space
  
  And how many failures have happened because of an engineering mistake?
  
  You seem to assume that there's zero failure in space for everything else, and 6 problems in.. 30 years? is some horrible record.
  
  All information only makes sense in context. What's the failure rate of other components of the system?
- Re:YACCS -Yet Another Computer Corkup in Space (Score:5, Informative)
  
  by Fishbulb ( 32296 ) writes: on Thursday January 11, 2007 @12:50PM (#17557984)
  
  The F-16 didn't "bounce off the equator". Before it ever flew, in simulation the computer flipped the plane over when it crossed the equator due to a bug that incorrectly handled southern latitudes. Additionally, since the computer "flip" happened instantaneously, and the F-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot (and the F-16 would have happily continued on its way).
  
  http://portal.acm.org/ft_gateway.cfm?id=163293&type=pdf&coll=GUIDE&dl=GUIDE&CFID=11154656&CFTOKEN=19136062 [acm.org]
  
  - Nope. (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    Additionally, since the computer "flip" happened instantaneously, and the f-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot
    
    A single, half-roll to inverted in the Falcon wouldn't have exerted enough Gs on the pilot to do anything worse than to exclaim WTF!, and disengage the a/p. A roll in and of itself in an aircraft doesn't really induce much Gs.... a "bank-and-yank" turn does, and that's what the F16 can do at higher Gs than the pilot can take... not the r
  - Re: (Score:3, Informative)
    
    by Ancient_Hacker ( 751168 ) writes:
    >Additionally, since the computer "flip" happened instantaneously, and the f-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot.
    Well, your whole post is called into question by quite a few questionable items:
    
    It seems unlikely that the latitude would enter at all into any calculation of roll attitude. If so, it's more than a "bug", it's a basic design mistake.
    The F-16 does have a high roll rate, about 320 degrees per second, but since the pilot is
- Re: (Score:2)
  
  by HangingChad ( 677530 ) writes:
  
  And don't forget the Mars Climate Orbiter "Dirt Dart" mission (http://en.wikipedia.org/wiki/Mars_Climate_Orbiter ). Okay, the operators helped by plugging in the wrong units, but the software didn't catch the discrepancy in the values either.
  The systems aboard the spacecraft were not able to reconcile the two systems of measurement, resulting in the navigation error.
  Operator error but it would be interesting to figure in the number of accidents that the software could have prevented the operator from ente
  - Re: (Score:3, Insightful)
    
    by Minwee ( 522556 ) writes:
    
    Okay the operators helped by plugging in the wrong units but neither did the software catch the discrepancy in the values.
    
    "On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
    Plus ça change, plus c'est la même chose.
- Re: (Score:2)
  
  by ChrisA90278 ( 905188 ) writes:
  
  The problem is the inability to test the software in a realistic environment. In fact you CAN'T fully test software. For example, let's say you write a program to add two numbers and print the sum. Very simple program, but all you can do is "spot check" it with a few test numbers. For example, I doubt testing would catch the bug in the following program:
    get a value for "A"
    get a value for "B"
    if (A == 3248532346863247) add 3 to A
    print (A+B)
  
  What are the chances you would use 3248532346863247 as a test value?
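The pseudocode above, rendered as runnable Python together with a naive random tester. With inputs drawn uniformly from 0 to 2^63 - 1, each trial has about a 1 in 9.2 x 10^18 chance of hitting the planted value. The function names and the choice of random 64-bit inputs are illustrative assumptions:

```python
import random

MAGIC = 3248532346863247  # the one input value that triggers the planted bug


def buggy_add(a: int, b: int) -> int:
    """Return a + b, except for a single hidden special case (the bug)."""
    if a == MAGIC:
        a += 3
    return a + b


def random_test(trials: int, seed: int = 0) -> int:
    """Run `trials` random spot checks and count how many expose the bug."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    failures = 0
    for _ in range(trials):
        a = rng.randint(0, 2**63 - 1)
        b = rng.randint(0, 2**63 - 1)
        if buggy_add(a, b) != a + b:
            failures += 1
    return failures
```

Even a run such as `random_test(100_000)` will almost certainly report zero failures, although `buggy_add(MAGIC, 0)` is wrong: random spot checks cannot find a defect confined to a single input.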
- Emulation? Testing? (Score:2)
  
  by SanityInAnarchy ( 655584 ) writes:
  
  Subject says it all... When developing for, say, the OLPC, or handheld computers (or PDAs, or smartphones, hell, even the iPhone), you either actually run everything on the device before shipping it to consumers, or (more likely) you emulate the embedded device on your desktop, so you can dig into the guts of it with a debugger, and then you test it on the device anyway.
  
  Why is it that the iPod, hell, even my Java phone is more reliable than these aerospace things?
- Re: (Score:2)
  
  by Mike1024 ( 184871 ) writes:
  
  Don't forget:
  
  Mars lander Spirit started randomly rebooting [wikipedia.org] due to a flash memory access problem.
  
  Mars Polar Lander was lost - there's no definitive proof but it's thought [wikipedia.org] it was a software error (sensing leg deployment ready for landing as actual landing, and hence deactivating thrust too early).
  
  Mars climate orbiter? Chalk that one up to a metric/imperial conversion error [wikipedia.org]... in software.
  
  Of course, the argument could be made that there's no real alternative to doing it in software...
  
  Michael
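The most probable cause named in the Mars Polar Lander investigation, as commonly summarized, was that a transient touchdown-sensor signal generated when the legs deployed was retained and then acted on once touchdown sensing was enabled near the surface, shutting the engines down early. A hypothetical Python sketch of a monitor that discards anything seen before it is armed and requires consecutive readings; the class name and the two-sample threshold are illustrative assumptions:

```python
class TouchdownMonitor:
    """Hypothetical touchdown logic: ignore sensor data until armed, then
    require several consecutive 'contact' readings before confirming touchdown."""

    def __init__(self, required_consecutive: int = 2) -> None:
        self.required = required_consecutive
        self.armed = False
        self.count = 0

    def arm(self) -> None:
        # Start fresh: nothing seen before arming (such as a leg-deployment
        # transient) is allowed to carry over.
        self.armed = True
        self.count = 0

    def update(self, contact: bool) -> bool:
        """Feed one sensor sample; return True once touchdown is confirmed."""
        if not self.armed:
            return False
        self.count = self.count + 1 if contact else 0
        return self.count >= self.required
```

A single spurious sample, or one that arrives before `arm()` is called, never triggers engine cutoff in this design.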
  - Re: (Score:2)
    
    by camperdave ( 969942 ) writes:
    
    Actually, the "warranty period" was only 90 days. The rovers are now entering the fourth year of operation. Kudos to the design and build teams (JPL, I think).
MGS spaceship? (Score:2)

by DrXym ( 126579 ) writes:

Perhaps Big Boss killed it
MGS? (Score:2)

by RobTFirefly ( 844560 ) writes:

Everyone knows, it was Solid Snake that destroyed Metal Gear Solid.
We hardware types always blame software (Score:2)

by Quiet_Desperation ( 858215 ) writes:

It's just the way of the world. :)
- Re: (Score:2)
  
  by daniel23 ( 605413 ) writes:
  
  reminds me of that old sig:
  
  The 3 most dangerous situations:
  
  A hardware guy with a software patch.
  A user with an idea.
  A coder with an electric iron.
Pilot said.... (Score:2, Funny)

by isieo ( 1049808 ) writes:

Houston, I B.S.O.Ded
Is this a sign? (Score:5, Insightful)

by Billosaur ( 927319 ) * writes: <wgrother@NoSPam.optonline.net> on Thursday January 11, 2007 @12:07PM (#17557284) Journal

Some expert is always trumpeting the fact that "Johnny can't program," to which many of us roll our eyes and go back to coding. But could this be a sign that the quality of the help NASA is hiring is such that these kinds of mistakes are now rampant? I mean, this could have been avoided if the code had been tested out on a full-scale mock-up of the machine, to verify that it did what it was supposed to do, before ever sending the commands to the actual machine. If anything, it's a QA failure.

- Re:Is this a sign? (Score:5, Insightful)
  
  by benevixit ( 754447 ) writes: on Thursday January 11, 2007 @12:43PM (#17557856)
  
  In all fairness, writing code for a spacecraft is a lot harder than most of our Earthbound coding projects. These are custom-built machines running one-of-a-kind hardware; one can simulate components independently but it's very difficult to figure out how the hardware is going to behave up there in the vacuum.
  
  For example, consider the one function of maintaining orientation. Most spacecraft use telescopes that look for star reference points. They look for particular star configurations and use microthrusters or gyroscopes to adjust their orientation. Imagine what it would take to simulate this: a zero-gravity vacuum with a realistic star-field at focus=infinity. Any laboratory mock up is going to cost a lot more than launching a new spacecraft. And that's just one subsystem.
  
  Software upgrades at NASA go through a really rigorous quality control regimen, often requiring programmers to justify _individual_lines_ of their code to a review committee. Even then they usually won't patch noncritical bugs until the primary mission is completed.
  
  I think your point is a good one. And the key lesson is not that NASA QA sucks, it's that programming for spacecraft is _tough_. I know they are constantly investigating new ways (like more standardization, code re-use, and formal verification procedures) of improving software reliability.
  
  - Re: (Score:2)
    
    by ivan256 ( 17499 ) writes:
    
    You also need to take into account that the error could be something like failing to deal with a broken component. They may not have known something was broken, and things don't always break in a predictable way. It's still a software bug for not properly handling an error condition, but some error conditions are unlikely to be predicted.
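ivan256's point, that software should handle a component misbehaving in ways nobody predicted, can be sketched as defensive actuator code. This is a hypothetical Python illustration, not MGS flight software: `drive_to`, `DriveFault`, and `SimulatedDrive` are invented names, and the soft limits, tolerance, and stall threshold are arbitrary assumptions. The idea is that the software refuses out-of-range commands and treats "commanded to move but not moving" as a fault to report, rather than pushing against whatever is in the way.

```python
from typing import Callable


class DriveFault(Exception):
    """Raised when a mechanism stops moving while it is still being commanded."""


def drive_to(target: float,
             read_position: Callable[[], float],
             step: Callable[[float], None],
             soft_min: float,
             soft_max: float,
             tolerance: float = 0.5,
             stall_limit: int = 3,
             max_steps: int = 1000) -> float:
    """Step a hypothetical actuator toward `target`, stopping on faults."""
    # Reject commands outside the software limits instead of driving into a hard stop.
    if not soft_min <= target <= soft_max:
        raise ValueError(f"target {target} outside soft limits [{soft_min}, {soft_max}]")

    position = read_position()
    stalled_steps = 0
    for _ in range(max_steps):
        if abs(target - position) <= tolerance:
            return position  # close enough: done
        step(1.0 if target > position else -1.0)
        new_position = read_position()
        # A commanded step that produces no motion suggests a jam or a broken sensor.
        if abs(new_position - position) < 1e-9:
            stalled_steps += 1
            if stalled_steps >= stall_limit:
                raise DriveFault(f"no motion for {stall_limit} steps at {new_position}")
        else:
            stalled_steps = 0
        position = new_position
    raise DriveFault("target not reached within max_steps")


class SimulatedDrive:
    """Toy actuator: moves one unit per step until it hits a (possibly unknown) stop."""

    def __init__(self, position: float = 0.0, hard_stop: float = 100.0):
        self.position = position
        self.hard_stop = hard_stop

    def read(self) -> float:
        return self.position

    def step(self, direction: float) -> None:
        self.position = min(self.position + direction, self.hard_stop)
```

With a healthy `SimulatedDrive()`, `drive_to(10.0, d.read, d.step, -90.0, 90.0)` returns 10.0. With `SimulatedDrive(hard_stop=5.0)`, standing in for a broken or mis-modeled mechanism, the same call raises `DriveFault` after three steps with no motion instead of continuing to drive. What to do after the fault (safe mode, retry, wait for the ground) is the hard design question, and it is not addressed here.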
- Re: (Score:2)
  
  by Zontar_Thing_From_Ve ( 949321 ) writes:
  
  Some expert is always trumpeting the fact that "Johnny can't program," to which many of us roll our eyes and go back to coding. But could this be a sign that the quality of the help NASA is hiring is such that these kinds of mistakes are now rampant? I mean, this could have been avoided if the code had been tested out on a full-scale mock-up of the machine, to verify that it did what it was supposed to do, before ever sending the commands to the actual machine. If anything, it's a QA failure.
  
  I used to wor
- one serious error in ten years rampant? (Score:2)
  
  by peter303 ( 12292 ) writes:
  
  My PC doesn't last ten days before crashing.
- Re: (Score:2)
  
  by Spikeles ( 972972 ) writes:
  
  Some expert is always trumpeting the fact that "Johnny can't program,"
  His name isn't "Johnny", it's N. Pence - Flight Software http://mars.jpl.nasa.gov/mgs/people/ [nasa.gov]
Better than a metric-English conversion error (Score:3, Insightful)

by ccmay ( 116316 ) writes: on Thursday January 11, 2007 @12:11PM (#17557344)

I guess those things happen. But at least it wasn't an error converting units, like the other Mars spacecraft that was lost. That is just incredibly stupid. Glad I'm not the "engineer" who wasted thousands of man-years and hundreds of millions of taxpayers' dollars because I was too stupid or lazy to convert between meters and feet.
On a positive note, it has provided me with an instructive example for when I help my teenagers with their math homework. If they say it's "almost" correct, I tell them that the guy who screwed up the Mars mission probably said the same thing.
-ccm

- Re: (Score:2, Insightful)
  
  by kfg ( 145172 ) writes:
  
  If you wish them to grow up to be good little engineers, ask them to define how "almost" correct it is.
  
  KFG
  - Re:"almost" correct (Score:2)
    
    by Migraineman ( 632203 ) writes:
    
    Funny, I have this conversation with my wife all the time. She's an elementary school teacher, and we regularly butt heads about how to deal with this. She's willing to grade a math problem as "correct" if the student demonstrated the correct process, but made a simple clerical error resulting in the wrong answer. She argues that the method is more important than a single result. Uh huhhh. So if I botch the balance in my checkbook, the bank will pat me on the head, say "that's okay," and front me the m
    - Re: (Score:2, Insightful)
      
      by kfg ( 145172 ) writes:
      
      So if I botch the balance in my checkbook, the bank will pat me on the head. . .
      
      Why should the bank even care? I don't even remember the last time I balanced my checkbook.
      
      "Almost correct" is someone being spineless.
      
      I just measured the height of a tree with a meter-long chunk of 2x4 and a bubble protractor. I get a figure of 10 meters. How many feet is that? 32.808399 is not the right answer. Using it is likely to result in your shell missing the top of the tree. 30 is the right answer. Why?
      
      Neither you nor yo
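kfg's tree example is presumably a significant-figures question: a height read off as 10 m with a meter-long stick carries roughly one significant figure, so the converted value should carry one too. A minimal Python sketch, assuming the measurement has exactly one significant figure (the comment implies this but the explanation is cut off); `round_sig` and `meters_to_feet` are illustrative names, not from any library.

```python
import math

METERS_PER_FOOT = 0.3048  # exact, by the international definition of the foot


def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Decimal position of the leading digit: 1 for 32.8 (tens), -3 for 0.0042.
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig - 1 - exponent)


def meters_to_feet(meters: float, sig: int) -> float:
    """Convert, then keep only as many significant figures as the measurement had."""
    return round_sig(meters / METERS_PER_FOOT, sig)
```

Here `meters_to_feet(10, 1)` gives 30.0, while claiming three significant figures gives 32.8. The raw quotient, 32.808398950..., implies a precision the 2x4 never had.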
    - Re: (Score:2)
      
      by zippthorne ( 748122 ) writes:
      
      Which is why you don't necessarily assign points at the per-question level of granularity. If she gave partial credit for partially correct problems, her students would still feel the burn of missing part of the problem (the part where you actually do the math correctly).
      
      And before you say it's all wrong if part of it's wrong, think about applying that standard to the entire assignment and you'll realize how specious it is.
      
      As a lab instructor, I've even had to mark things wrong which have the answer correct: there
      - Re: (Score:2)
        
        by Migraineman ( 632203 ) writes:
        
        I've got no problem with partial credit in cases with clerical errors. I have a big problem when it's graded as correct when it isn't. There's a certain amount of discipline required to solve problems. If you're sloppy, you'll make simple errors, but the result is still wrong.
        
        Calculation errors, process errors, logistics errors ... they're all part of the real world. The teacher who doesn't mark the simple errors is doing two things - providing an oversight function that won't always be available, a
    - Re: (Score:2)
      
      by alienmole ( 15522 ) writes:
      
      Your wife is correct that understanding the process is much more important than not making clerical errors. Clerical errors can always be caught by cross-checking, but if you don't understand the process, you can't get anywhere. Math tests are artificial: you're not trying to build a spaceship, you're trying to test whether someone has learned something. In a more realistic situation, cross-checks would occur, and you'd have time to correct errors.
      
      In an ideal situation, a large majority of a grade ought
      - Re: (Score:2)
        
        by Migraineman ( 632203 ) writes:
        
        I hate to do this, but I'm block-copying from another reply I made because I think it's relevant:
        I've got no problem with partial credit in cases with clerical errors. I have a big problem when it's graded as correct when it isn't. There's a certain amount of discipline required to solve problems. If you're sloppy, you'll make simple errors, but the result is still wrong.
        
        Calculation errors, process errors, logistics errors ... they're all part of the real world. The teacher who doesn't mark the simpl
- Re:Better than a metric-English conversion error (Score:5, Informative)
  
  by iamlucky13 ( 795185 ) writes: on Thursday January 11, 2007 @03:49PM (#17561504)
  
  It wasn't one engineer. It was a team effort. And it wasn't a very simple matter of "forgetting". Several factors combined, including reuse of code from the MGS mission (a conversion factor was in the old code, but was not recognized when the code was adapted for the doomed MCO) and budget constraints that limited pre-flight testing (so the bug was missed, and in fact might still have been missed even with more testing). The effects of the bug were also subtle enough that three minor main engine firings were conducted without enough error showing up to reveal the problem. It wasn't until the long orbital insertion firing that the error in the trajectory became noticeable, and by then it was too late. The team's first clue that something was wrong was when the spacecraft didn't radio home after the engine burn.
  
  The details are really convoluted, but the Wikipedia page [wikipedia.org] on the mission has a decent write-up explaining how the mistake was made, with additional resources cited. The PDF paper giving a perspective from the MCO team is particularly revealing, if you've got some time on your hands.
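The failure mode iamlucky13 describes, a conversion factor lost when code is reused, is easy to reproduce in miniature. The MCO mishap investigation attributed the loss to ground software that reported thruster impulse in pound-force seconds where newton-seconds were expected. The sketch below is hypothetical Python, not the actual MCO code; `Impulse`, `naive_total`, and `checked_total` are invented names. It shows how tagging each value with its unit turns a silent factor-of-4.45 error into either an explicit conversion or an exception.

```python
from dataclasses import dataclass
from typing import Iterable

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N (exact by definition).
NEWTON_SECONDS_PER_LBF_SECOND = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that carries its unit with it (hypothetical helper)."""
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * NEWTON_SECONDS_PER_LBF_SECOND
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


def naive_total(values: Iterable[float]) -> float:
    # The bug pattern: bare floats are summed with no record of their units,
    # so anything produced in lbf*s is silently treated as N*s downstream.
    return sum(values)


def checked_total(impulses: Iterable[Impulse]) -> float:
    # Each value is converted to newton-seconds before it is combined.
    return sum(i.to_newton_seconds() for i in impulses)
```

With `burns = [Impulse(1.0, "lbf*s"), Impulse(2.0, "lbf*s")]`, summing the raw `.value` fields gives 3.0, which downstream code would read as newton-seconds, while `checked_total(burns)` gives about 13.34 N·s. As the comment notes, an error like this can be small enough per burn to hide until a long firing accumulates it.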
  
So what if the battery is dead? (Score:2)

by Viol8 ( 599362 ) writes:

Surely it can still function on its solar arrays when it's on the daylight side of the planet? Or would it drift too much out of alignment when in the dark? Or is there some other issue?
- Re: (Score:3, Insightful)
  
  by smoker2 ( 750216 ) writes:
  
  I expect the electronics run off the battery, and the solar panels just charge the battery. If the battery's dead, nothing will run.
- Re: (Score:2)
  
  by Beryllium Sphere(tm) ( 193358 ) writes:
  
  The battery's failure mode matters. If it has an internal short, nothing will help.
  - Re: (Score:2)
    
    by Viol8 ( 599362 ) writes:
    
    Actually, it clearly states it cooked the battery, not the whole craft. So why don't you go RTFA instead of attempting some lame karma whoring, you cretin.
zing! (Score:2, Funny)

by steak ( 145650 ) writes:

that was the sound of me hitting the bullseye.

"at least if something went wrong some guy at nasa could tell his grand kids that he bricked something from ~140 million miles away."

http://slashdot.org/comments.pl?sid=214508&cid=17427542 [slashdot.org]
Time for a recall of bad parts (Score:2)

by Fry-kun ( 619632 ) writes:

Does anyone else think it's about time to make a small satellite with a few "claws" to fly around our existing satellites and replace their various parts?
It could probably do repairs to the ISS as well (spacewalks should be for fun, not for work).
Vampire Hackers (Score:2)

by Doc Ruby ( 173196 ) writes:

No, everyone knows it's the Martian vampires. That SW glitch pointed the solar collectors at the Martian surface, overpowering the thin layer of blood that protects the biters from the weak rays of the Sun. We need to find out how the vampires reached the MGS to destroy it. Probably they have moles at NASA or a contractor with access to the controllers. We have to fund deployment of my SOLASER Space Debt Inc (SDI) weapon to fry them before they fry us.
An easy fix... (Score:2)

by Autonomous Crowhard ( 205058 ) writes:

Just have a nearby human replace the dead battery and restart the machine.
Oh... right... manned exploration is a waste of money and robots are all we ever need.
Lack of QA strikes Nasa AGAIN. (Score:2)

by Banner ( 17158 ) writes:

I tell you, you see all these ridiculous failures at NASA, and it's pretty obvious that they either don't do QA, or that the QA teams are literally hamstrung. These things are the stuff that good QA and test programs find; making people check bolts on a tilt table before ruining a 50 million dollar satellite is what process and checklists are all about.

These aren't 'normal workplace errors' that you have to live with, they're -stupid- errors, made because of stupid managers.
- Re: (Score:2)
  
  by JhohannaVH ( 790228 ) writes:
  
  And this is different from any other US Corporation how?
  
  Never forget, what we use, abuse, and refuse, all is created by human hands. Which by design are imperfect.
MGS was currently a low priority for NASA (Score:2, Interesting)

by jespley ( 1006115 ) writes:

I'm a scientist who works with the MGS data, so I don't know the engineering side well. However, I do know that last year NASA was strongly considering dropping all support for MGS in order to spend the limited Mars program money on newer missions (the idea being that we had gotten 90% of the useful science from MGS). Instead they decided to keep MGS funded with a bare minimum of money and hence a bare minimum number of personnel. I imagine that the poor overworked engineers running the operational show a
Milestone (Score:2)

by Hugonz ( 20064 ) writes:

So now we can milestone the first paperweight in space...
O_O (Score:2, Funny)

by Vacardo ( 1048640 ) writes:

Well, that tops my list of "Worst Times to Get the Blue Screen of Death".
More propaganda (Score:2)

by dreemernj ( 859414 ) writes:

When are they going to admit the truth of how this was destroyed? Oh well, we'll all know once Megatron lays siege to the Earth for our delicious oil and rubies and everything else that can be made into energon cubes.
