
Mars Global Surveyor Died from Single Bad Command

Posted by Zonk on Saturday April 14, 2007 @04:33AM from the damn-space-bugs dept.

wattsup writes "The LA Times reports that a single wrong command sent to the wrong computer address caused a cascade of events that led to the loss of the Mars Global Surveyor spacecraft last November. The command was an orientation instruction for the spacecraft's main communications antenna. The mistake caused a problem with the positioning of the solar power panels, which in turn caused one of the batteries to overheat, shutting down the solar power system and draining the batteries some 12 hours later. 'The review panel found the management team followed existing procedures in dealing with the problem, but those procedures were inadequate to catch the errors that occurred. The review also said the spacecraft's onboard fault-protection system failed to respond correctly to the errors. Instead of protecting the spacecraft, the programmed response made it worse.'"



It wasn't a single wrong command (Score:4, Informative)

by 91degrees ( 207121 ) writes: on Saturday April 14, 2007 @04:40AM (#18729745) Journal

It was a whole series of errors. Either that or every accident ever is caused by a single minor fault. Here's what the article says
The review panel found that the management team followed procedures in dealing with the problem but that the procedures "were inadequate to catch the errors that occurred."

The review also said the spacecraft's onboard fault protection system failed to respond to the errors. Instead of protecting the spacecraft, the programmed response made it worse.

So, if the procedures were better, this wouldn't have happened. If the fault protection system was better, this wouldn't have happened. If the designers had predicted this exact problem might occur this wouldn't have happened.

Of course, these things do happen. All we can do is find out why, and stop it from happening again.

- Re: (Score:2)
  
  by cheater512 ( 783349 ) writes:
  
  Yes but one thing was bad and everything else did its job and the whole thing fell apart.
  - Re: (Score:2)
    
    by Poltras ( 680608 ) writes:
    
    Yes, one minor error and everything else went according to plan. gnak gnak gnak
- Re: (Score:1)
  
  by maestroX ( 1061960 ) writes:
  
  Of course, these things do happen. All we can do is find out why, and stop it from happening again.
  
  Misalignment of the solar panels should have been handled properly in *any* case, as the machine is relying on solar power mainly. These things do happen at a local car shop -- we're talking NASA with smart people.
  - Re: (Score:3, Insightful)
    
    by Jerry Beasters ( 783525 ) writes:
    
    Smart people make mistakes, deal with it.
  - Re: (Score:2)
    
    by TapeCutter ( 624760 ) writes:
    
    Yep smart people all right, and careful too. It was planned to last two years and they ended up crashing it by accident on the tenth. Ten years is a long time, maybe they got bored and wanted to go map another planet?
- Re:It wasn't a single wrong command (Score:5, Funny)
  
  by roaddemon ( 666475 ) writes: on Saturday April 14, 2007 @06:57AM (#18730273)
  
  "Either that or every accident ever is caused by a single minor fault."
  
  I agree. Otherwise WWII was caused by Hitler's mom having one too many drinks the night she met his dad.
  
  - Re:It wasn't a single wrong command (Score:4, Funny)
    
    by Darth_brooks ( 180756 ) writes: <clipper377@nOSpaM.gmail.com> on Saturday April 14, 2007 @07:42AM (#18730523) Homepage
    
    I agree. Otherwise WWII was caused by Hitler's mom having one too many drinks the night she met his dad.
    
    How can you come up with such a woefully shortsighted and limited in scope analysis? Honestly. There are at least two theories to work under for the cause of World War II.
    
    WWII was caused by a series of reactions several billion years ago between amino acids. Or it was started 5000 years ago when God created Eve for Adam. Everything else in between is just a smattering of minor details.
    
  - Re: (Score:2)
    
    by The_Wilschon ( 782534 ) writes:
    
    Does this invoke Godwin's Law? IANALawyer, so I'm not sure of the applicability in this situation.
    - Re: (Score:2)
      
      by networkBoy ( 774728 ) writes:
      
      I don't think so, as it was not used so much as an attack as a funny... borderline though.
      -nB
      - Re: (Score:2)
        
        by toddestan ( 632714 ) writes:
        
        Godwin's law is simply that Nazi Germany will eventually come up in any discussion on the internet, given enough time. It says nothing about context. So it does apply here.
    - Re: (Score:2)
      
      by Boronx ( 228853 ) writes:
      
      Does falling on your face invoke the law of gravity?
- Re:It wasn't a single wrong command (Score:5, Interesting)
  
  by MichaelSmith ( 789609 ) writes: on Saturday April 14, 2007 @07:47AM (#18730561) Homepage Journal
  
  So, if the procedures were better, this wouldn't have happened. If the fault protection system was better, this wouldn't have happened. If the designers had predicted this exact problem might occur this wouldn't have happened.
  
  TFA:
  
  over the years budgets and staff had been cut "in an effort to operate the mission as economically as possible."
  
  MGS was well into bonus time in the sense that the original goals had been reached. The project was running on a reduced budget and this made a mistake inevitable. I can't help thinking that at a higher level this was considered to be a good thing. When you have new missions to run and a fixed budget to run them on you want your old missions to stop so that you can draw a line under it and go on to the next thing.
  
  The last thing management want is to have to decide to shut the spacecraft down because they don't have the budget for operations on the ground. Reducing the budget is a way of inducing the shutdown.
  
  - Re: (Score:3, Informative)
    
    by DerekLyons ( 302214 ) writes:
    
    The last thing management want is to have to decide to shut the spacecraft down because they don't have the budget for operations on the ground.
    
    A nice theory, but one that fails to coincide with the facts. NASA routinely shuts down missions for lack of budget.
  - Re: (Score:2, Interesting)
    
    by osu-neko ( 2604 ) writes:
    
    Score: 5 Interesting for a really, really lame theory?
    
    There are many ways to end a mission. The one best for NASA is to close it, point to its budget, and wait for the cries of underfunding, using the closure of a perfectly good mission as evidence that the agency is truly underfunded based on its needs.
    The worst is for the mission to fail in some spectacular fashion, making people wonder if they should be giving these bozos any more money.
    So, you're telling me you think NASA would intentionally end a m
  - Re: (Score:2)
    
    by Nethead ( 1563 ) writes:
    
    Your first name wouldn't happen to be Valentine? Maybe you have a hidden agenda in wanting the MGS to fail. Hitting a little close to home, maybe?
Emmentaler vs. Gruyere (Score:5, Insightful)

by DingerX ( 847589 ) writes: on Saturday April 14, 2007 @04:42AM (#18729753) Journal

One bad command started the chain, but it needed a series of system failures to kill it. In other words, a slight misalignment of the solar panels (or whatever it was) may have been a necessary cause, but not sufficient. The thing needed a safe-mode that wasn't safe, and battery logic that failed to consider environmental variables. All the conditions lined up.

It's like saying that a mid-air collision occurred because two jetliners were assigned the same altitude and jetway in opposite directions at the same time. Yeah, but A) How they got that assignment is kinda complicated and B) any number of traffic control and collision avoidance systems have to fail too.
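
The "necessary but not sufficient" point can be put as arithmetic. What follows is only a sketch: it assumes the defensive layers fail independently, and every probability in it is invented for illustration rather than taken from the article or the NASA report.

```python
from math import prod


def loss_probability(layer_failure_probs):
    """Chance that every independent defensive layer fails at the same time."""
    return prod(layer_failure_probs)


# Hypothetical per-layer failure probabilities (made up for this example).
layers = {
    "bad command slips past review": 0.01,
    "safe mode picks a harmful orientation": 0.05,
    "battery logic misreads overheating": 0.10,
}

p_all_fail = loss_probability(layers.values())

# If any single layer is made perfect, the whole chain can no longer complete.
p_with_safe_mode_fixed = loss_probability([0.01, 0.0, 0.10])

print(f"all layers fail: {p_all_fail:.6f}")
print(f"with one layer fixed: {p_with_safe_mode_fixed:.6f}")
```

Under these assumptions the bad command alone is not enough to lose the spacecraft; the loss needs every hole in the cheese to line up.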

- Re: (Score:2, Funny)
  
  by GTMoogle ( 968547 ) writes:
  
  Oh, Gruyere, definitely. Emmentaler is fine, but certainly its popularity can be attributed merely to the region's inadequate defense of the name, allowing cheap knockoffs to proliferate.
  
  *cricket* *cricket*
  
  What?
- Re: (Score:3, Insightful)
  
  by v1 ( 525388 ) writes:
  
  In most complex problems where catastrophic failure occurs, the problem manifests as a result of multiple smaller failures that combine in an unfortunate way, or as a chain reaction. By nature, people will want to narrow down the problem so they can identify a "cause". This is sometimes not appropriate, as we see here, where a collection of less critical failures led to catastrophe, any of which having been avoided would have prevented disaster. It's a bit like team theory... after losing a game the coa
  - Re: (Score:2)
    
    by DingerX ( 847589 ) writes:
    
    When the anti-collision system kicks in and issues a Resolution Advisory, it's because ATC has failed. When the box on board effectively says "ATC has failed in its task, please start climbing." then "Dude, ATC has failed, climb a hell of a lot more, now", only more laconically and imperatively, you don't continue to listen to ATC and dive.
    
    So, yeah, like Ueberlingen, here you had a chain of events that results in a catastrophic failure. Was a "bad command" to blame? Only if the system has zero tolerance fo
That'll Teach 'Em (Score:5, Funny)

by Anonymous Coward writes: on Saturday April 14, 2007 @04:42AM (#18729755)

That'll teach those NASA folks to stop just using "sudo" when a command doesn't work under regular user permissions...

- Re:That'll Teach 'Em (Score:5, Funny)
  
  by Tibor the Hun ( 143056 ) writes: on Saturday April 14, 2007 @09:00AM (#18731043)
  
  You've just lost thousands of Windows folks...
  su..do...?
  
  - Re:That'll Teach 'Em (Score:4, Funny)
    
    by Anonymous Coward writes: on Saturday April 14, 2007 @09:42AM (#18731343)
    
    ku!
    
  - Re:That'll Teach 'Em (Score:4, Funny)
    
    by Spudtrooper ( 1073512 ) writes: on Saturday April 14, 2007 @11:07AM (#18732083)
    
    Mars Global Surveyor wants to commit seppuku: Cancel or Allow?
    
Oblig. (Score:3, Funny)

by TehBlahhh ( 947819 ) writes: on Saturday April 14, 2007 @04:44AM (#18729759)

It was the Tamil Tigers that hacked it, and inserted this insidious command! The threat of terrorists is everywhere! This would have been prevented if we had kept up the war on terror.

- Re: (Score:1, Funny)
  
  by Anonymous Coward writes:
  
  NASA has been militarized, and given the singular task of bringing Democracy to Mars.
- Re: (Score:2)
  
  by Dogtanian ( 588974 ) writes:
  
  Yes, you were joking; but I've said this before. What's to stop hostile parties from hacking, DOSsing or simply hijacking your average space probe?
  - Re: (Score:1)
    
    by maxume ( 22995 ) writes:
    
    $$$. I have no idea what hitting the right spot with a signal strong enough to cause problems (let alone with properly encoded commands) would take, but it strikes me as likely being non-trivial, with little benefit gained (because the bad publicity can be controlled by claiming that there was a measurement problem or whatever).
  - Re: (Score:2)
    
    by WindBourne ( 631190 ) writes:
    
    The lack of an antenna with enough power to send the commands? Not knowing the sequence to start it? Not knowing exactly where the probe is? I would be VERY surprised if anybody would spend the millions required to pull off something like this, except for maybe another country. And that would then amount to an attack on America. While this admin is incapable of doing a war correctly, I suspect that damn few countries want the havoc that we seem to cause.
    - Re: (Score:2)
      
      by Dogtanian ( 588974 ) writes:
      
      When this thought occurred to me, I *was* thinking of foreign governments; particularly during the cold war era. Not to mention the fact that some of the probes out there were launched quite a long time ago and won't have modern standards of "security".
- Re: (Score:2)
  
  by uncoveror ( 570620 ) writes:
  
  The only "Mars probes" that haven't crashed are in Arizona. [uncoveror.com]
Bad command or filename (Score:1)

by anss123 ( 985305 ) writes:

C:\>
- Re:Bad command or filename (Score:5, Funny)
  
  by robably ( 1044462 ) writes: on Saturday April 14, 2007 @04:56AM (#18729797) Journal
  
  C:\>
  
  You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate.
  
  - Re: (Score:1, Funny)
    
    by NeilTheStupidHead ( 963719 ) writes:
    
    Egghead? No, no. It's the over developed cranium of our new martian overlords.
  - Re:Bad command or filename (Score:5, Funny)
    
    by Dasher42 ( 514179 ) writes: on Saturday April 14, 2007 @07:08AM (#18730317)
    
    *sniff*
    
    You just made a beautifully appropriate commentary on a common fixture of my childhood. Dude.
    
  - Re: (Score:1)
    
    by srussia ( 884021 ) writes:
    
    C:\>
    
    You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate.
    Or tilting your head the other way, a monkey wearing a dunce cap... equally appropriate.
    - Re: (Score:2)
      
      by Zaiff Urgulbunger ( 591514 ) writes:
      
      C:\>
      
      You know, that looks like the emoticon for an egghead with a beard, frowning. Very appropriate.
      
      Or tilting your head the other way, a monkey wearing a dunce cap... equally appropriate.
      
      Or a *really* happy dude wearing a pointy hat at a jaunty angle!
  - Re: (Score:2)
    
    by jo7hs2 ( 884069 ) writes:
    
    Looks like a big-nosed wizard to me.
You nits (Score:3, Insightful)

by Bastard of Subhumani ( 827601 ) writes: on Saturday April 14, 2007 @04:45AM (#18729767) Journal

The mistake caused a problem with the positioning of the solar power panels
What was it this time, degrees vs radians?

- Re: (Score:1, Funny)
  
  by harry666t ( 1062422 ) writes:
  
  > What was it this time, degrees vs radians?
  
  Grads.
*Design* flaw (Score:3, Insightful)

by CatoNine ( 638960 ) writes: on Saturday April 14, 2007 @04:51AM (#18729789)

From TFA: ".... That exposed one of the batteries to direct sunlight, causing it to overheat." So even a small navigation error or small mechanical failure could cause this thing to overheat. It should have been constructed more robust.

- Re: (Score:3, Interesting)
  
  by dsanfte ( 443781 ) writes:
  
  Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today. In fact, how about many of them, arranged at strategic positions on the spacecraft hull to give real-time temperature information to the satellite's computer? I guess complicated ideas like these get ignored in favor of simpler solutions, like relying on large chains of command and bureaucratic procedures carried out 30 light-minutes from the p
  - Re:*Design* flaw (Score:5, Informative)
    
    by Mike1024 ( 184871 ) writes: on Saturday April 14, 2007 @05:24AM (#18729875)
    
    Some temperature monitors on critical, exposed devices would also help. All you need is the CPU temperature diode present on just about every motherboard sold today.
    
    I looked at the actual report on the NASA website; it said "the spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current."
    
    There was a temperature monitor on the critical, exposed component. Furthermore, the information from the sensor was used in a sensible manner: Li-poly/li-ion batteries can catch fire under some circumstances (see also: sony laptop batteries) so if your li-poly battery overheats while being charged you stop charging it (because you'd rather have a flat battery than an exploded battery).
    
    After the craft stopped charging the battery it never started charging the battery again. The battery ran down and the craft stopped working.
    
    The obvious question is: why didn't charging resume after the battery had cooled down? It might not have cooled down (as it was hot in the first place due to being exposed to the sun) or the system might have been waiting for a 'resume charging' command from ground control, which was never received as the high-gain antenna was in the wrong position.
    
    Personally if I was designing a space craft I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.
    
    Just my $0.02.
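
For what it's worth, the "stop charging when hot, resume once cooled" behaviour asked about above could look something like the sketch below. This is not the MGS power management software. The class name, thresholds, and voltage are all hypothetical, and the only point is that over-temperature and overcharge are tracked as separate conditions, with hysteresis so that charging restarts without a ground command.

```python
from dataclasses import dataclass


@dataclass
class ChargeController:
    """Toy battery charge logic; every threshold here is a made-up value."""

    max_temp_c: float = 40.0      # pause charging at or above this temperature
    resume_temp_c: float = 30.0   # resume only after cooling to this (hysteresis)
    full_voltage: float = 32.0    # treat the battery as full at this voltage
    charging: bool = True
    paused_for_heat: bool = False

    def update(self, temp_c: float, voltage: float) -> bool:
        """Feed one sensor reading; return True if charge current should flow."""
        if self.paused_for_heat:
            # Over-temperature pause: wait for the battery to cool, then
            # fall back to the ordinary "is it full yet?" decision.
            if temp_c <= self.resume_temp_c:
                self.paused_for_heat = False
                self.charging = voltage < self.full_voltage
        elif temp_c >= self.max_temp_c:
            # Hot battery: stop charging, but remember that heat was the
            # reason, so it is not mistaken for a full battery.
            self.paused_for_heat = True
            self.charging = False
        else:
            self.charging = voltage < self.full_voltage
        return self.charging


controller = ChargeController()
for temp_c, voltage in [(25, 28), (45, 28), (35, 28), (29, 28)]:
    print(temp_c, voltage, controller.update(temp_c, voltage))
```

Of course, if the battery sits in direct sunlight and never cools below the resume threshold, this logic keeps it paused too, which is the first possibility raised above.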
    
    - Re: (Score:2)
      
      by ckedge ( 192996 ) writes:
      
      > Personally if I was designing a space craft
      
      If you were designing a space craft it would be 2000 lbs overweight and require a $1-billion launch vehicle instead of a $100-million launch vehicle, and you'd get your ass fired.
      
      What kind of moron "quips offhand" that he'd have been smart enough to fix/predict all 50,000 imaginary possible 5-factor problems (remember, this is just the one of 50,000 that occurred) as compared to the $100-million dollars worth of PhD-Years that someone else already spent on the
    - Re: (Score:3, Interesting)
      
      by Tablizer ( 95088 ) writes:
      
      Personally if I was designing a space craft I'd duplicate the (presumably quite small) onboard computer and radio hardware, because it seems quite common for software/electronics failures to result in loss of communications. Having two processors running different software, each capable of reprogramming the other one if it became broken, would seem like a sensible route to take.
      
      Or maybe have a small back-up battery in the center of the probe where it cannot be heated by the sun even if the probe gets poin
      - Re: (Score:2)
        
        by Jonathan_S ( 25407 ) writes:
        
        We may have reached a threshold in unmanned exploration where the operations and software are more expensive than building and launching the hardware itself (at least for going to Mars). This may mean that it is actually cheaper to let failures slip through every now and then. For example, it may be cheaper to have 5 probes with a 40% chance of failure than 2 probes with a 10% chance of failure. However, such may result in national embarrassment.
        One possible method to try to address this (other than just letting some probes di
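
The 5-probes-versus-2-probes trade-off quoted above is easy to work through. A rough sketch, assuming each probe fails independently and ignoring cost entirely (the function names are just for this example):

```python
def expected_successes(n_probes: int, p_fail: float) -> float:
    """Average number of probes that survive, assuming independent failures."""
    return n_probes * (1.0 - p_fail)


def p_at_least_one_success(n_probes: int, p_fail: float) -> float:
    """Chance that not every probe is lost, assuming independent failures."""
    return 1.0 - p_fail ** n_probes


# The two fleets from the quoted comment.
fleets = {"5 probes at 40% failure": (5, 0.40), "2 probes at 10% failure": (2, 0.10)}

for label, (n, p) in fleets.items():
    print(
        f"{label}: expect {expected_successes(n, p):.2f} working probes, "
        f"P(at least one works) = {p_at_least_one_success(n, p):.5f}"
    )
```

On those assumptions the cheap fleet expects about 3 working probes against about 1.8, while the chance of getting at least one working probe is nearly the same for both (roughly 0.990). The catch is the one already noted: the cheap fleet also expects about two public failures.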
      - Re: (Score:2)
        
        by shawnce ( 146129 ) writes:
        
        Not easy to simulate the environment accurately or fully. In the end it was an environmental factor that triggered the slide to death.
  - Re: (Score:2)
    
    by Detritus ( 11846 ) writes:
    
    You should win the Olympic medal for jumping to conclusions.
    Almost all spacecraft do have a large number of temperature sensors that are connected to the spacecraft telemetry system. They are used to detect equipment problems and thermal management issues. This has been standard procedure for many decades.
    - Re: (Score:3, Insightful)
      
      by dsanfte ( 443781 ) writes:
      
      This has been standard procedure for many decades.
      
      And yet, it failed.
  - Re: (Score:2)
    
    by FSWKU ( 551325 ) writes:
    
    ...large chains of command and bureaucratic procedures carried out 30 light-minutes from the point of failure.
    
    Which (sadly enough) makes NASA exactly the same as every other workplace with >50 people...
- More robust == heavier (Score:4, Insightful)
  
  by mangu ( 126918 ) writes: on Saturday April 14, 2007 @07:32AM (#18730441)
  
  It should have been constructed more robust.
  
  So, which scientific experiment would you remove in order to put additional heat shielding? No, the thermal shielding and other protection systems are just right for a spacecraft that had to travel a hundred million kilometers.
  
  What really failed was the ground-based software, that didn't have a good enough thermal model, and the technical support team. Equipment may fail, operators may commit errors, but there should be enough experienced engineers around to do a correct analysis to catch those errors. Downgrading of the engineering team is the true problem here. Look at what happened to Columbia. It blew up on reentry because of a failure that had happened on take-off, was caught on video, but not analyzed correctly.
  
  NASA isn't alone in these failures, perhaps one could say they set the pace for the rest of the industry. The lack of a good thermal model is typical of a whole generation of engineers used to doing everything in Excel. With the current CPUs one has at each desktop, it wouldn't be so hard to do a correct thermal model of the spacecraft, but it would involve solving a system of partial differential equations in C++, something very few engineers are able to do, even when given an extensive library [diffpack.com].
  
  - Re: (Score:1, Informative)
    
    by Anonymous Coward writes:
    
    Not a lack of thermal model (such things DO exist for most spacecraft), and they DO spend quite a lot of time in both modeling and test (thermal balance) where they shine artificial sunlight on the spacecraft in a vacuum chamber while it's operating to verify that the model works. http://mpfwww.jpl.nasa.gov/martianchronicle/martianchron7/mgs.html [nasa.gov]
    
    In fact, because MGS used aerobraking, which heats the spacecraft during the dips into the atmosphere, I'll bet the thermal model for MGS is better than most.
    
    But,
    - Re: (Score:2)
      
      by mangu ( 126918 ) writes:
      
      let's also remember that this puppy has been going for >10 years, which means it was designed 15 years ago... Call it 1990. Somehow I don't think the thermal design engineers at Martin Marietta and JPL were rookies using Excel for the first time
      Exactly my point, more so considering that not so many people used Excel at that time. I have been working with commercial satellites since 1984 and, although I've never had direct contact with NASA, we have the same suppliers.
      Thermal models are the least develop
- Re: (Score:3, Interesting)
  
  by maxume ( 22995 ) writes:
  
  They don't get to build the best damn space probe they can build, they get to build the best damn space probe they can build for $X. Thermal management isn't easy; controlling orientation allows them to spend money on the stuff they are interested in, rather than insulation and shielding.
- Re: (Score:1)
  
  by DogDude ( 805747 ) writes:
  
  That's easy for you to say, and maybe you're right. But you have to remember that NASA is driving remote-controlled cars, complete with all kinds of sensing equipment MILLIONS OF MILES AWAY. I for one am amazed by this project, and I have the utmost respect for NASA.
- Re: (Score:2)
  
  by sjames ( 1099 ) writes:
  
  It WAS more robust. The craft was well beyond its design lifetime when it was finally lost. All batteries lose capacity over time. I don't know for sure, but it wouldn't surprise me that if this had happened within its design lifetime, the remaining battery would have been adequate to maintain contact and allow for corrective commands from the ground such as rotate x degrees and resume charging.
  
  Everyone wants everything built robust enough to last till the end of life on earth or longer, but nobody want
Give NASA a break (Score:2, Interesting)

by patio11 ( 857072 ) writes:

It worked for a decade at a cost of a piddling $220 million, plus $20 million a year in upkeep. At a hair over $40 million a year, that's much, much less wasteful than most NASA missions. (Yeah, I suppose you could consider whether the return was worth it. Heh, who are we kidding -- did YOU get $40 million a year out of those desktop photos? I didn't.)

I propose that next time NASA spend $150 million on the construction phase, which is just a slush fund for defense contractors anyhow, and then issue the l
- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  "did YOU get $40 million a year out of those desktop photos?"
  
  MGS mapped targeted parts of the surface of Mars at much higher resolution than any previous mission. Among other things, it was responsible for finding the gullies that are probably signs of water being expelled recently at the surface. The length of the mission allowed it to detect changes at these sites, suggesting the process is still occurring today.
  
  What you really seem to be saying is that exploration of Mars by any means is a big waste
- Re: (Score:3, Insightful)
  
  by dreamchaser ( 49529 ) writes:
  
  I propose that you get a clue. It's hard to place a value on science like this, but the advancement of knowledge and working towards getting off this rock are both highly valuable fields of endeavor.
  
  As for your 'proposal'...did you just pull those numbers out of your ass? If anything costs increase over the years due to rising wages and inflation. $220 million was *dirt cheap* by space mission standards.
  
  We waste far more money on subsidies and entitlements in the US than we spend on science like this.
  - Nope, those are the real numbers (Score:2, Informative)
    
    by patio11 ( 857072 ) writes:
    
    http://nssdc.gsfc.nasa.gov/database/MasterCatalog?sc=1996-062A [nasa.gov]
    
    I realize this was dirt cheap by space mission standards. A laptop encrusted with diamonds which costs $80,000 is dirt cheap by laptop-encrusted-with-diamonds standards. That *doesn't make it worth the money*. I know we waste far more than $40 million a year on many things -- and, logically, every one of them except one can be justified by "We waste more money on another program, don't cut *my* hobby horse!"
    
    It's interesting that you draw the d
    - Re: (Score:2)
      
      by inviolet ( 797804 ) writes:
      
      Drat, where's my mod points when I need them to un-Troll a great post like yours.
- Re:Give NASA a break (Score:4, Insightful)
  
  by brassman ( 112558 ) writes: on Saturday April 14, 2007 @10:36AM (#18731839) Homepage
  
  > did YOU get $40 million a year out of those desktop photos? I didn't.
  
  So divide that by 200 million (roughly) to get your share.
  
  I got two quarters' worth. Heck, you can't get a comic book for fifty cents.
  
- Re: (Score:2)
  
  by Nethead ( 1563 ) writes:
  
  I'm willing to bet that 40 million of us got more than a dollar a year worth of postcards. I know I feel I did.
  - Re: (Score:2)
    
    by patio11 ( 857072 ) writes:
    
    Splendid, then we should have no problem privatizing the next probe and funding it with postcard sales.
The actual report (Score:5, Informative)

by Mike1024 ( 184871 ) writes: on Saturday April 14, 2007 @05:10AM (#18729839)

The preliminary official report is available from here [nasa.gov]. The summary conclusions are:

* A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006. The incorrect memory load resulted in the following unintended actions:
** Disabled the solar array positioning limits.
** Corrupted the HGA's pointing direction used during contingency operations.
* A command sent to MGS on November 2, 2006 caused the solar array to attempt to exceed its hardware constraint, which led the onboard fault protection system to place the spacecraft in a somewhat unusual contingency orientation.
* The spacecraft contingency orientation with respect to the sun caused one of the batteries to overheat.
* The spacecraft's power management software misinterpreted the battery over temperature as a battery overcharge and terminated its charge current.
* The spacecraft could not sufficiently recharge the remaining battery to support the electrical loads on a continuing basis.
* Spacecraft signals and all functions were determined to be lost within five to six orbits (ten-twelve hours) preventing further attempts to correct the situation.
* Due to loss of power, the spacecraft is assumed to be lost and all recovery operations ceased on January 28, 2007.
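
The first bullet (a parameter written to the wrong memory address) is the kind of slip a ground-side check might catch before uplink. Here is a minimal sketch of such a check. It assumes a named parameter table, which the report does not describe, and the addresses, names, and `upload` helper are all hypothetical rather than anything from the actual MGS ground system.

```python
class UploadError(Exception):
    """Raised when a requested memory write fails validation."""


# Hypothetical map of parameter names to memory addresses (invented values).
PARAMETER_TABLE = {
    "hga_contingency_pointing": 0x1000,
    "solar_array_position_limits": 0x1004,
}


def upload(memories, name, address, value):
    """Write `value` to `address` on every redundant memory, after checks.

    `memories` is a list of dicts standing in for the redundant computers.
    Nothing is written unless the address matches the named parameter.
    """
    expected = PARAMETER_TABLE.get(name)
    if expected is None:
        raise UploadError(f"unknown parameter: {name}")
    if address != expected:
        raise UploadError(
            f"{name} lives at {expected:#x}, refusing to write to {address:#x}"
        )
    for memory in memories:
        memory[address] = value
    # Read back and confirm that the redundant copies agree with each other.
    if len({memory[address] for memory in memories}) != 1:
        raise UploadError(f"redundant copies of {name} disagree after upload")


side_a, side_b = {}, {}
upload([side_a, side_b], "hga_contingency_pointing", 0x1000, 1.25)

try:
    # Right parameter, wrong address: rejected before anything is written.
    upload([side_a, side_b], "hga_contingency_pointing", 0x1004, 1.25)
except UploadError as err:
    print("rejected:", err)
```

A check like this only helps if the lookup table itself is right, so it narrows the hole rather than closing it.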

- Re: (Score:2)
  
  by jacksonj04 ( 800021 ) writes:
  
  So hang on... they *overwrote* the memory which contained the contingency operations plan and the hardware limitations data for the solar array? Surely that's bad design, you shouldn't be able to overwrite something like that (Unless the hardware limits plan on changing mid-mission). NASA fault protection modules evidently don't do their job too well :-/
  - Re: (Score:3, Informative)
    
    by gfilion ( 80497 ) writes:
    
    So hang on... they *overwrote* the memory which contained the contingency operations plan and the hardware limitations data for the solar array? Surely that's bad design, you shouldn't be able to overwrite something like that (Unless the hardware limits plan on changing mid-mission). NASA fault protection modules evidently don't do their job too well :-/
    Actually, they had to correct a previous error by writing directly to memory. I believe that writing directly to memory is not a standard operating procedure. The PDF report linked by the GP states that:
    [...] The HGA parameter was actually updated on the two redundant control systems at two different times. The updates were commanded with slightly different (operator input) precision. This difference in precision, while numerically inconsequential, resulted in an inconsistency between the computer memori
    - Re: (Score:1)
      
      by Tablizer ( 95088 ) writes:
      
      Actually, they had to correct a previous error by writing directly to memory. I believe that writing directly to memory is not a standard operating procedure. The PDF report linked by the GP states that:
      
      Either you don't allow change to the safety programs, or you risk breaking them if you do allow changes. Changing as much as possible dynamically has proven very useful in the past. One problem with the Huygens Titan lander was that the radio broadcast code was in firm-ware instead of software, and a doppl
- Re: (Score:2)
  
  by MichaelSmith ( 789609 ) writes:
  
  Hmmm. When they went looking for it the MGS wasn't where they expected it to be. Hard to see how the failure mode they describe would have made it change its trajectory by a significant degree.
- Re: (Score:3, Informative)
  
  by gfilion ( 80497 ) writes:
  
  The preliminary official report is available from here [nasa.gov].
  Thanks for the link. The report is only three pages long and very interesting to read. The cause (quoted below) is really stunning, I wonder what the probability of this sequence of events happening is.
  The LM team performed a fault analysis to determine the cause of the spacecraft anomaly. An LM spacecraft engineer ultimately determined that the likely cause of the anomaly was an incorrect parameter upload that had occurred 5 months earlier (June 2006). A direct memory command to update the HGA's positioni
- VxWorks == no protected memory?!? (Score:2)
  
  by mosel-saar-ruwer ( 732341 ) writes:
  
  A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006.
  
  I am well aware that you can do some nifty things [slashdot.org] in VxWorks, but at some point, shouldn't you be using an OS [like QNX, Integrity, or, gasp!, WinCE] that offers a little more memory protection?
  
  Especially if you're writing code in a language with pointers?
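The kind of check being asked for above can be sketched in software, independent of any OS-level memory protection. This is a minimal illustration only: the region names, addresses, and the `checked_write` helper are all hypothetical, and nothing here reflects how VxWorks or the MGS flight software actually handled direct memory commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A named, contiguous range of parameter memory."""
    name: str
    start: int
    length: int

    def contains(self, address, size):
        # True only if the whole write fits inside this region.
        return self.start <= address and address + size <= self.start + self.length


# Hypothetical memory map; the real addresses are not given in the thread.
MEMORY_MAP = [
    Region("hga_contingency_pointing", 0x1000, 8),
    Region("neighbouring_parameter", 0x1008, 8),
]


def checked_write(memory, address, payload, intended):
    """Write payload only if it lies entirely inside the intended region."""
    region = next((r for r in MEMORY_MAP if r.name == intended), None)
    if region is None:
        raise KeyError(f"unknown region: {intended}")
    if not region.contains(address, len(payload)):
        raise ValueError(f"write to {address:#x} falls outside {intended}")
    memory[address:address + len(payload)] = payload


memory = bytearray(0x2000)
checked_write(memory, 0x1000, b"\x01" * 8, "hga_contingency_pointing")
```

The operator has to state which parameter the write is meant for, so a mistyped address is rejected instead of silently landing in a neighbouring parameter.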
- Re: (Score:2)
  
  by ScrewMaster ( 602015 ) writes:
  
  A modification to a spacecraft parameter, intended to update the High Gain Antenna's (HGA) pointing direction used for contingency operations, was mistakenly written to the incorrect spacecraft memory address in June 2006.
  
  To me, that sounds like they need some "managed code" up there. Windows SE.NET (Space Edition) would have prevented this, I'm sure.
An old error strikes back! (Score:2)

by Gazzonyx ( 982402 ) writes:

Deleted from TA:
In a tragic comedy of errors, NASA accidentally sends the Mars Global Surveyor a confirmation to execute "con/con". Microsoft explains that this will be patched in TerraWindows (TM), and for the moment their only suggestion is to "...do the Microsoft '1,2 shuffle'; sigh heavily and do a hard reboot..."
John Dvorak has been contacted as a possible candidate to go manually reboot the Surveyor, but has yet to accept the proposition.
*ducks*
Solar panel caused battery to overheat ? (Score:1, Troll)

by Ace905 ( 163071 ) writes:

The article mentions that a new round of global-warming may be taking place on Mars - does this lend any credence to the theory that global warming is an unavoidable solar event? Maybe Mars and Earth switch off and on in turns - making one hospitable to life while the other becomes a desolate barren wasteland. Maybe we all just need to move 35 Million Miles away.

Sometimes I feel like I need to.

Also, the Slashdot write-up says a 'wrong command to the wrong computer address'. It was the right command, to
- Original post not a troll, responses are. (Score:2)
  
  by Ace905 ( 163071 ) writes:
  
  Wow, I take a commonly discussed 'question' about global warming and reference it, as if it is, you know, discussed by people, and I'm called a Troll.
  
  All of your links to other web sites appear, to me, to be Trolling. Am I one of the scientists debating the causes of global warming? No. So I'm not going to look at your chosen data sets and do the math - I'm not qualified to. Is global warming a linear process? No, and all of the scientists agree on that. Until someone wants to prove it's caused by CO2
- - Re: (Score:2)
    
    by Ace905 ( 163071 ) writes:
    
    You sure have a lot of evidence to argue against such a cut and dry subject.
    
    Mods, put the parent back up; it's obvious this has been a discussion from the first second. Discussion != troll.
Are you telling me ... (Score:1)

by WrongSizeGlass ( 838941 ) writes:

... that NASA doesn't have an undo command? I guess they really have cut their budget.
- Re: (Score:1)
  
  by dominious ( 1077089 ) writes:
  
  I guess they really have cut their budget.
  
  This must be true! In fault tolerance, when the system reaches a bad state, it should undo by
  using backward error recovery: variables saved at past checkpoints allow a rollback to a good state.
  This is of course costly in terms of processing and data storage, and may not always succeed:
  "Please ignore incoming missile."
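The backward error recovery described above can be sketched roughly as follows. This is an illustrative toy, not anything from the spacecraft: the `CheckpointedState` class, the `apply_update` helper, and the antenna-angle variable are all assumed names chosen for the example.

```python
import copy


class CheckpointedState:
    """Hypothetical holder for state variables with checkpoint and rollback."""

    def __init__(self, **variables):
        self.variables = dict(variables)
        self._checkpoints = []  # stack of saved snapshots, newest last

    def checkpoint(self):
        # Deep-copy so later mutations cannot alter the saved snapshot.
        self._checkpoints.append(copy.deepcopy(self.variables))

    def rollback(self):
        # Restore the most recent snapshot; fail loudly if none exists.
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.variables = self._checkpoints.pop()


def apply_update(state, name, value, is_valid):
    """Apply one update; undo it if the resulting state fails validation."""
    state.checkpoint()
    state.variables[name] = value
    if is_valid(state.variables):
        return True
    state.rollback()
    return False


def in_range(variables):
    # Assumed validity rule for the example: angle must stay within +/- 90.
    return -90.0 <= variables["antenna_angle_deg"] <= 90.0


state = CheckpointedState(antenna_angle_deg=10.0)
accepted = apply_update(state, "antenna_angle_deg", 45.0, in_range)
rejected = apply_update(state, "antenna_angle_deg", 400.0, in_range)
```

The cost mentioned in the comment shows up directly: every update pays for a full copy of the state, and rollback only helps for effects that can actually be undone.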
Impressive (Score:5, Interesting)

by hcdejong ( 561314 ) writes: <hobbes@xAAAmsnet.nl minus threevowels> on Saturday April 14, 2007 @05:23AM (#18729873)

Not the error itself, but the fact NASA was able to figure out what happened in such detail, when the spacecraft it happened to is not giving any diagnostic information and cannot be examined directly.

wrong parameter? (Score:5, Funny)

by advocate_one ( 662832 ) writes: on Saturday April 14, 2007 @05:26AM (#18729881)

/sudo shutdown -h now sent instead of /sudo shutdown -r now

- Re:wrong parameter? No, wrong command! (Score:2)
  
  by chris_sawtell ( 10326 ) writes:
  
  /sudo shutdown -h now
  bash: /sudo: No such file or directory
  
  That would not cause any problems whatsoever.
What's in a name.. (Score:3, Funny)

by owlnation ( 858981 ) writes: on Saturday April 14, 2007 @06:53AM (#18730253)

Admittedly offtopic, but...

Somehow I find it reassuring that NASA employs someone called "Dolly Perkins". It has that warm cosy 1950's feeling of Golden Age Space Exploration. Now, if only we could get the astronauts named "Buck", "Rock", or "Trent".

- Re: (Score:2)
  
  by i_want_you_to_throw_ ( 559379 ) writes:
  
  I AM Dolly Perkins you insensitive clod!!!!
- Re: (Score:1)
  
  by mangu ( 126918 ) writes:
  
  I find it reassuring that NASA employs someone called "Dolly Perkins". It has that warm cosy 1950's feeling of Golden Age Space Exploration.
  
  So, what do you make of "Fuk Li, manager of the Mars exploration program at JPL"?...
Good code Is For Old People (Score:2, Informative)

by AHuxley ( 892839 ) writes:

In Capitalist West you send sloppy code to perfect probe.
In Soviet Russia perfect probe sends lens cap code back to you!

A wiki link to help with the lens part.
http://en.wikipedia.org/wiki/Venera_program [wikipedia.org]
The command (Score:2)

by bl8n8r ( 649187 ) writes:

[root@surveyor]# dd if=/dev/urandom of=/dev/solar_panels
Fewer than I expected... (Score:2)

by WgT2 ( 591074 ) writes:

There sure are fewer MS jokes than I expected.
Batteries Overheating (Score:2, Funny)

by ptelligence ( 685287 ) writes:

Guess they weren't aware of the recall on those Dell batteries.
Hearken Ye Back (Score:2)

by Aquitaine ( 102097 ) writes:

That command was:

win
Obligatory Simpson's reference (Score:1)

by sizzzzlerz ( 714878 ) writes:

D'oh!
Better, faster, cheaper - the reality (Score:4, Insightful)

by kilodelta ( 843627 ) writes: on Saturday April 14, 2007 @08:57AM (#18731015) Homepage

NASA has been on this kick of doing quick, reduced-cost projects for some time now. They really have no choice, since Congress will only give them funding for unmanned and low-cost missions.

So occasionally you get the stunning successes, e.g. the Mars rovers Spirit and Opportunity. Considering they were only supposed to last 90 sols and they're somewhere out to 1075 or more sols, it means that Steve Squyres is currently the star of NASA.

But more likely you get the devastating failures.

It's really sad that we blow a few billion a month on our little Iraq and Afghanistan ventures yet sciences take a back seat.

- Re: (Score:2)
  
  by JWW ( 79176 ) writes:
  
  Yeah, and Mars Global Surveyor was considered a stunning success too! This is an old spacecraft that was operating beyond its mission's original lifespan. While it's sad it can't continue its mission, it did achieve its original goals and then some.
  
  I say NASA should mark this as a success, take the lessons learned from this mission and build the next probe to send to Mars to take Global Surveyor's place, as was part of the plan anyway.
  
  I really hate seeing all the harping on NASA here when this was a REALL
- - Re: (Score:2)
    
    by Thomas Shaddack ( 709926 ) writes:
    
    At the risk of heresy: If something lasts 10X beyond its success criteria, isn't that overbuilt?
    You spelled "properly engineered" wrong.
Typical multiple-factor catastrophe (Score:5, Interesting)

by mattr ( 78516 ) writes: on Saturday April 14, 2007 @10:45AM (#18731905) Homepage Journal

There are an awful lot of posts here that disparage the people who have built and operated this system. To me it looked very much like the explanation for an aircraft accident. The easy failure modes are all known, so the really hard ones are left. In aircraft accidents, and it seems space accidents now too, a fatal result is generally the result of a number of seemingly disparate factors including system states, environmental state, and human impressions of what is going on.

In one major aircraft accident I know a lot about, the (Airbus) jet crashed in part because it ended up being a tug of war between a human pilot and a robot autopilot that should have been disengaged, causing an up-and-down roller coaster ride. There were lots of other distracting things that were maybe wrong or maybe not, but a key part was the difficulty in knowing what state the machine was in.

It was a similar situation with this accident, it seems, and though the misuse of metric units caused another recent accident, it appears that these incidents have elements in common. They are also made more probable, it strikes me, by funding pressures and by the way that operating these systems involves radical commands while the systems lack enough power to be self-aware enough to preserve themselves.

I am not going to do any more guessing because the people involved can probably figure it out themselves, and it seems that these combined factor accidents at least are not costing human lives, while they are adding to knowledge about how not to make the accident the next time.

In that regard my hope is that some of the money being spent on Mars can be used to improve autonomous robotic systems to reduce accidents both on Mars and on Earth.

- Re: (Score:1)
  
  by PPH ( 736903 ) writes:
  
  It all depends on how one evaluates possible failures and deals with them. There are two schools of thought when it comes to this: One is that systems must be designed to deal with a certain combination of faults, regardless of the probability of occurring. The other attempts to address the probability of various combinations of faults, determine their effect on the overall system and deal only with those where the probability of a significant outcome exceeds some threshold. Both approaches have their proble
- Re: (Score:2)
  
  by sjames ( 1099 ) writes:
  
  A big factor not being considered in this discussion is that the spacecraft was well past its design lifetime. I don't know all of the details of the design, but it's quite likely that at one time the unaffected battery WOULD have been adequate to maintain contact and allow for corrective action.
  
  The loss really was a combination of many small things including the increasingly fragile state of the surveyor itself. This is not any sort of embarrassment for NASA (or shouldn't be) or any particular person's g
I'm sorry, Dave (Score:1)

by nanosquid ( 1074949 ) writes:

Hal. Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
Command leaked out in special report (Score:2)

by elgatozorbas ( 783538 ) writes:

HLT
Orbiting brick (Score:2)

by Tekoneiric ( 590239 ) writes:

It sucks when you're hacking the firmware in your gadgets and brick them.
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
