Mars Failures: Bad luck or Bad Programs?
HobbySpacer writes "One European mission is on its way to Mars and two US landers will soon launch. They face tough odds for success. Of 34 Mars missions since the start of the space age, 20 have failed. This article looks at why Mars is so hard. It reports, for example, that a former manager on the Mars Pathfinder project believes that "Software is the number one problem". He says that since the mid-70s "software hasn't gone anywhere. There isn't a project that gets their software done."" Or maybe it has to do with being an incredible distance, on an inhumane climate. Either or.
It'll make me think twice (Score:3, Insightful)
Its a shame (Score:2, Insightful)
because software is one of the only things that could and should be theoretically perfect.
Maths (especially that based on 1 or 0) is either right or wrong; it seems to be only when humans get involved that things go wrong and mistakes happen.
I disagree, Mr. Editor (Score:4, Insightful)
Just as in the NFL when a receiver drops an easy pass and someone yells that he gets paid to catch passes like that, programmers get PAID not to fuck things up.
The software motto... (Score:5, Insightful)
If you underestimate the resources you need to do software right, of course you'll have problems -- either getting it done on time, or getting the quality to the level it needs to be (or both).
That problem is hardly unique to the space programs. And of course, it would be a little tricky trying to upload a software patch to a hunk of solar-powered metal a few million miles away.
I wonder how much NASA et al. really tap the resources they should be tapping -- I mean, there ARE areas of industry where mission-critical or life-critical software has been developed and deployed for some time now. Maybe it's just a question of getting the right kind of experience in-house...
Xentax
Software not the problem... (Score:2, Insightful)
Budget and motivation (Score:3, Insightful)
However, just throwing money at the problem isn't going to solve it. I'd suggest throwing away the rulebook and starting over for unmanned systems: better craft, fewer of the multimillion dollar single units and more cheaper devices that can carry out multiple landings at once.
For once, it might be worth imagining a Beowulf cluster of those things - because with many cheaper devices, the mission would most likely have a modicum of success.
Methodolgies (Score:2, Insightful)
Re:We landed on the moon with 512 bytes of RAM (Score:5, Insightful)
With 512 BYTES of ram you can literally look at the entire contents. You can be aware of every single bit on the system.
Now, where we have gigabytes of RAM, and even more other storage, it is simply impossible to sort through every bit. Thus errors roll in.
I'm not sure what to do about it, but I see why there is difficulty.
Sorting out the stages (Score:2, Insightful)
It looks as if the testing and debugging starts at the beginning and works through the mission. I suppose this will eventually work, but it seems to be an expensive way to do it.
Rocket Science is hard (Score:5, Insightful)
The flip side is that. After Mars Observer spectacularly failed in 1993 ("Martians"), NASA started to go with faster, cheaper, better. The idea was, instead of a single $1 billion mission every 5 years with a 90% chance of success, why not 2 $200 million missions every two years, with an 80% chance of success? Everyone loves this idea when it works (Pathfinder), but when a cheap spacecraft fails, the public doesn't care if it cost $10 million or $10 billion, all we know is that NASA is wasting money.
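To put rough numbers on that trade-off, here is a minimal Python sketch using only the figures quoted above. The ten-year window, the `Program` class and the assumption that missions fail independently of one another are mine, added for illustration; they are not NASA's accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """A hypothetical mission programme over one decade (illustrative only)."""
    name: str
    cost_per_mission_musd: float  # cost per mission, in millions of US dollars
    missions_per_decade: int
    p_success: float              # per-mission success probability, assumed independent

    def total_cost_musd(self) -> float:
        return self.cost_per_mission_musd * self.missions_per_decade

    def expected_successes(self) -> float:
        return self.missions_per_decade * self.p_success

    def p_at_least_one_failure(self) -> float:
        # Complement of "every mission succeeds"
        return 1.0 - self.p_success ** self.missions_per_decade


# One $1 billion mission every 5 years, 90% success
FLAGSHIP = Program("flagship", 1000.0, 2, 0.90)
# Two $200 million missions every two years, 80% success
CHEAP = Program("faster-cheaper-better", 200.0, 10, 0.80)

if __name__ == "__main__":
    for prog in (FLAGSHIP, CHEAP):
        print(
            f"{prog.name}: ${prog.total_cost_musd():.0f}M per decade, "
            f"{prog.expected_successes():.1f} expected successes, "
            f"{prog.p_at_least_one_failure():.0%} chance of at least one failure"
        )
```

Under these assumptions both programmes cost the same per decade, the cheap one returns several times as many expected successes, and it is also far more likely to produce at least one visible failure, which is the public-perception problem described above.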
So, the answer is, NASA has hit some bad luck. But the idea of faster, cheaper, better is ultimately a cost-effective one, so if we can solve these software problems (I mean, can't someone independently design a landing simulator?), and NASA can get 80-90%, we'll be getting a lot more science for the dollar. But NASA-haters will always have some missions to point to as a "waste" of money, and try to cut funding as it's mismanaged; other space junkies will insist that anything under 100% is unacceptable, and costs should double to move from 80% to 100%. I don't know which attitude is more damaging.
NASA has a "good" track record since Observer; unfortunately, the highest profile missions have generally failed. If MER-1 and MER-2 are both successful, and SIRTF flies this summer, then everyone should get off of NASA unmanned program's back for a while.
Tough assignment... (Score:5, Insightful)
Which is why we should continue to try. Giving up, saying "space travel is just too costly and risky" is a big cop-out. If we could send people to a different celestial body (the moon) in 1969 with the equivalent of a pocket calculator but not now, what does that say of our technology? Or sociology? Sure you could take the narrow-minded approach and say "and what does that bring us? The ability to jump from rock to rock in our solar system?" If so, you might as well ask why people decided to go to the poles (just ice) or whatever. You're still missing the point.
Kjella
NASA Management Practices and Quality of Software (Score:5, Insightful)
In my years at NASA Goddard I saw a dysfunctional management operate in ignorance of reality.
There was much praise of the employee who "went the extra mile", "put in long hours" and "served the customer" (that applied to contractor employees). There was also very little thought paid to the consequences of those practices.
What's the first thing to go when you're tired? It's not your body -- it's your mind. That's right -- if you're staying at work until you're feeling tired, you're making mistakes that need to be corrected later. The tireder you are, the more mistakes. The tireder you are, the less you can actually do.
I witnessed people who wore their exhaustion as a badge of honor. And, when they got into management, insisted that others emulate their bad example. The result that I saw was people who should have been kept out of management becoming increasingly dominant. This was accentuated by the "faster, better, cheaper" ideology promulgated by former NASA administrator Goldin. This ideology was used to get rid of more experienced (and thus costly) people who were aware of the consequences of trying to squeeze more work out of fewer people.
It could take a long time for NASA to recover from this culture. The failure of projects in the past few years and the crash of Columbia could be turning points -- or they could be used by incompetents to justify even more dysfunctional behavior.
Re:Its a shame (Score:2, Insightful)
Software is human beings communicating with each other in ambiguous natural languages and then trying to convert what they think they understand into a hyper-specific computer language that a program (i.e. a compiler) will translate into machine code.
The hard part is trying to eliminate all the killer misunderstandings. One of the early Geminis came down several hundred miles from the planned spot because some programmer assumed that there were 24 hours in a day. Not in celestial navigation!
Software is hard to do right.
Programmers (Score:5, Insightful)
Yes, programmers have erred. To err is human; to allow errors to propagate into mission failures is a failure of systems engineering, and I think that is where the real blame lies. A lot of the problem is that spacecraft systems engineers often have a very amateurish grasp of software, if any at all.
For example, on Mars Climate Orbiter, a junior programmer failed to properly understand the requirements. However, systems failed to:
It's really quite simple (Score:5, Insightful)
Look at the Space Shuttle. The space shuttle has never had a catastrophic computer failure-- but every line of code on that truck has survived review by a group of programmers. They've examined it, line by line, multiple times, in order to ensure that it's exactly right, because the cost of failure is 7 astronauts and a multimillion dollar orbiter.
The new Mars programs, however, are part of the streamlined "do it on the cheap" NASA. NASA put the Mars Rover down using mostly off-the-shelf and open-source software and a small amount of home-brew stuff. No matter how good open source software gets, it still hasn't undergone the level of review that the Space Shuttle code has seen. No matter how popular an off-the-shelf package is, it's not cost-effective for the manufacturer to give it that sort of treatment. NASA can't afford to do that level of code review because that costs them the ability to do some other program.
NASA is simply trying to do more with less in the unmanned launches, and the cost of that is we need to expect some failures. These failures are unfortunately very visible...
-JDF
Disagreeing with Hemos (Score:5, Insightful)
I have to really disagree with this. NASA is used to dealing with alien climates and terrain and astronomical distances. NASA is also used to dealing with problems. They have some of the best problem solvers out there, and when something goes wrong, they tend to pinpoint why. When NASA says A, B, and C are the causes of failure, I believe them. When NASA cannot figure out why something went wrong, I worry.
What I'm trying to say is, distance and inhuman conditions shouldn't have that much of an effect on how well a probe works. We built Voyagers I and II, didn't we? They worked even better than expected. And they encountered climates and conditions which make Mars look easy.
NASA has dealt with so many varying circumstances and climates over the years, and been so blunt about their mistakes, I find it hard to believe that they would blame the failures of an entire class of missions on something "easy." And yes, blaming failures on software is an easy way out. How many times have you heard someone say "Oh! It must be the software!" when something doesn't go as expected?
Now, I know this guy doesn't speak for NASA as a whole, but as a NASA trained administrator, and the head of some very large projects, I'm willing to take his opinions at face value. If he says it looks like software has really been a cause of failure, who am I to laugh at his expertise and belittle his explanations? I might not like his explanation, but I buy it.
Re:I disagree, Mr. Editor (Score:4, Insightful)
Regarding the losses of the two space shuttles, it is hardly fair to compare hardware failure to software failure. The physical behavior of a mechanical system is not deterministic--stress something hard enough and it will break, and it is impossible to predict when a particular part will fail in advance. You can do lots of testing to get a sense of when, on average, a part will fail under certain conditions, and you can design and engineer as best as possible for something to work even if a part fails, but parts will fail and sometimes hardware failures are irrecoverable.
Software, on the other hand, is completely deterministic. With error-checking and proper testing, it is possible, at least in principle, to write software that will not fail. Software failure that results in loss of life is simply inexcusable.
Software is Hard (Score:5, Insightful)
PHB's also haven't figured out that developers aren't interchangeable widgets. If you know C, it doesn't mean you'll be immediately productive in Korn shell scripting, and vice-versa.
PHB's also haven't figured out that experience is key. There are exceptions, but generally speaking, a young hotshot isn't going to be as productive as an experienced professional. Sure, the young hotshot might get v1.0 done first, but it'll be buggy, unreliable, unscalable, hard to maintain, etc.
The "problem with software" is almost entirely a management issue, imho.
-Teckla
Management Failure (Score:3, Insightful)
We haven't seen software failures taking out manned missions; two shuttles failed from the high stresses of takeoff and re-entry. Just a guess, but the engineering standards are probably much higher for the manned programs, and more people review the code. Also, keep in mind that NASA has been experimenting with the idea of saving money with faster paced development, which means some reduction in review and other QA standards, particularly on unmanned planetary missions. It may even be that this method is cost effective in spite of some high profile failures.
Re:Methodolgies (Score:4, Insightful)
Or, at least when it comes to writing rock-solid code that reliably does the wrong thing...
Re:We landed on the moon with 512 bytes of RAM (Score:5, Insightful)
Yes, but can your computer recover from a triple memory failure? Can you rewire your computer remotely to fall back on a redundant system? Frankly I keep the covers off my case to keep my CPU from overheating.
State of the art is not always measured in Gigahertz.
Re:Software is Hard (Score:4, Insightful)
PHB's also haven't figured out that developers aren't interchangeable widgets. If you know C, it doesn't mean you'll be immediately productive in Korn shell scripting, and vice-versa.
I think this statement is true, but only because of the failure of education (or lack thereof). A good software analyst is trained to think about the concepts, not the language. When I was a senior, we had a class where every project was a new language. One of the professors summed it up, "Any monkey can learn a programming language by reading a book. An analyst will know what he's doing, no matter the language." It's all too sad that most employers hire based on language experience, and not successful software engineering practices.
The "problem with software" is almost entirely a management issue, imho.
For many reasons, but proper software engineering is understood, just not popular. The results of a Cleanroom Engineering project have been well documented. Why isn't it popular? It doesn't have a fun sounding name and it's tedious to do correctly.
Re:I disagree, Mr. Editor (Score:3, Insightful)
Software is NEVER deterministic in an operating environment. Just because you can put it on a bench and test the snot out of it does not certify its behavior in the real world. I have written many programs that work perfectly in testing, only to have a user punch in an unexpected value and bring things to a crashing halt.
Oh no, all design documents dissolve on contact with the real world. The best software is the type that realizes it is operating in an imperfect world, and takes pains to vet its data before processing, or die in a manner that is the least catastrophic to life and property.
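A minimal Python sketch of that "vet the data, or fail in the least catastrophic way" idea might look like the following. The sensor, the plausible altitude range and the two operating modes are all hypothetical, chosen only to illustrate the pattern; they are not taken from any real flight software.

```python
import math
from enum import Enum
from typing import Optional


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE = "safe"  # hypothetical least-harm fallback state


# Hypothetical plausible range for an altitude reading, in metres
ALTITUDE_RANGE_M = (0.0, 150_000.0)


def vet_altitude(raw: object) -> Optional[float]:
    """Return the reading as a float if it is usable, otherwise None."""
    try:
        value = float(raw)  # rejects None, non-numeric strings, etc.
    except (TypeError, ValueError):
        return None
    if math.isnan(value) or math.isinf(value):
        return None
    low, high = ALTITUDE_RANGE_M
    if not low <= value <= high:
        return None
    return value


def next_mode(raw: object) -> Mode:
    """Carry on only with vetted data; otherwise drop to the safe mode."""
    return Mode.NOMINAL if vet_altitude(raw) is not None else Mode.SAFE


if __name__ == "__main__":
    for reading in (1200.5, "garbage", float("nan"), -5, None):
        print(repr(reading), "->", next_mode(reading).value)
```

The point of the design is that the unexpected value never reaches the processing code: it is either converted into a known-good number or it triggers a deliberate, predictable fallback instead of a crash.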
Software (Score:4, Insightful)
I always thought that there should be a way to build a probe's navigation and propulsion systems in a standardized way so that avionics software wouldn't need to change that much.
Sort of a standardized platform if you will for doing solar system exploration.
This platform would consist of a number of parts that would not change, and could be reusable in a number of different configurations for building a probe, depending on what its job was.
Cameras, photometers, spectrometers, and power sources could all be packaged in the same way depending on the probe's job.
Every probe that NASA launches is always customized and built around cost and included packages.
I am not so sure that is the best way to go about it as you have to reinvent all the software to manage the probe every time you build one.
Probes should be cheap, produced in high volume, (thousands) and interchangeable.
With a standardized approach, failure rates should come down a bit and costs should be reduced.
-Hack
Re:I disagree, Mr. Editor (Score:5, Insightful)
That's just bunk. As a programmer writing software for spacecraft you must be able to anticipate every possible value and account for it. Every condition should be able to be gracefully handled by an error checking routine. There is zero room for failure. If that means it takes 20 years to write, test, rewrite, and retest the perfect program, then so be it. When human life is involved price is not an object. (well, within reason of course since there's a dollar value on human life in the space program, but the negative publicity value is astronomically more than the dollar value of the loss of human life.)
Nobody cares about QC, only budget (Score:1, Insightful)
So then they spent what, twice as much? Three times as much as a QC regime would have cost, to actually design, build, and install compensation electronics on the Hubble to correct for the aberrations in the mirror.
Problem is, then as a result budgets STILL get cut. There's no money to do things "right".
Re:Software is Hard (Score:2, Insightful)
Too bad the OSI doesn't believe in it [troed.se].
My dad was on the Viking project... (Score:4, Insightful)
Re:Tough assignment... (Score:2, Insightful)
Big difference between the shuttle... (Score:2, Insightful)
... and the Mars vehicles. The Shuttle carries people. You can afford to cut corners a little if no one's going to get killed.
Sean
Failure (Score:3, Insightful)
Re:We landed on the moon with 512 bytes of RAM (Score:3, Insightful)
The grandparent post's point still stands. 128MB is one huge mass of program and data to debug. I know I wouldn't stake my reputation on a "bug free" multi-megabyte program--only a fool would.
Remember, the true complexity of a program increases exponentially with the size of the program.
This is why I will never trust Windows for anything more than a gaming platform (millions of lines of hastily-written code == one hell of a buggy program). I would bet that any recent version of Windows has several hundred thousand bugs in it.
From a complexity standpoint, UNIX is an order-of-magnitude better than Windows but is still big enough to have lots of bugs. Linux is similar to UNIX in complexity.
No software in wide use today is bug free. I have never seen software that was bug free. Even the printf() call in a "Hello World" program probably has bugs in it, regardless of whether the "Hello World" program exposes them.
Personally, I would never feel confident enough to write software that puts human life directly at risk, unless there are fail-safe non-software-controlled mechanisms in place. Sometimes, we just have to put software aside and let real Engineers do what they do best. And, yes, there is no such thing as a Software Engineer (it is still very much a made-up job title that anyone can have, even me:).
Re:Its a shame (Score:3, Insightful)
And theoretically prohibitively expensive.
I have yet to meet someone who is genuinely willing to pay for software quality. They simply don't care or understand. Once the software reaches some minimum threshold of "working", the project gets cut off or put on some other tangent.
Re:Mistakes (Score:3, Insightful)
That's why some folks at NASA develop [nasa.gov] more sophisticated control software that can take care of failures. The RAX experiment on the DS1 probe successfully demonstrated that this approach is viable.
However, at the moment the project is suffering a major rewrite in C++, notorious for its 'safety', for reasons having very little to do with engineering...
Faster, better, cheaper - choose any two (Score:2, Insightful)
Software can be done right. Anyone who doesn't believe this either (a) does not know how many millions of lines of software are involved in avionics and air traffic control, (b) never flies on an airplane, or (c) has a death wish. Of course I guess there's also a fourth possibility - when all else fails, blame the software. The space shuttle's record proves that software can be dependable, but also illustrates that making it that way is very, very expensive. Just a matter of priorities.
Orbital Mechanics a contributing factor (Score:3, Insightful)
It's physics, dudes. (Score:1, Insightful)
1. Distance from Earth to Mars is about 35,000,000 miles at the closest and the mean distance is something like 48,000,000 miles.
2. The velocity of light is constant at 186,000 miles/second.
3. This means it takes 6.5 to 9 minutes or so, round trip, for a radio signal to reach the spacecraft and get feedback in either direction.
4. If the spacecraft encounters difficulties that would require it to report, receive instructions, report back, and receive additional instructions, if necessary, then we are talking about a 13 - 18 minute process, just for minor corrections.
5. This is akin to remotely driving an unmanned car with messages transmitted by carrier pigeon.
6. So, for all practical purposes, the landing craft must be autonomous, which means that the software must be reliable, fast, and comprehensive.
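The arithmetic in the list above can be checked with a short Python sketch. It takes the distances and the speed of light exactly as quoted there; the function names and the "two round trips per exchange" reading of item 4 are my own assumptions.

```python
# Figures as quoted in the list above (miles and miles per second)
SPEED_OF_LIGHT_MILES_PER_S = 186_000.0
CLOSEST_MILES = 35_000_000.0
QUOTED_MEAN_MILES = 48_000_000.0


def round_trip_minutes(distance_miles: float) -> float:
    """Time for a signal to reach the spacecraft and a reply to return."""
    one_way_seconds = distance_miles / SPEED_OF_LIGHT_MILES_PER_S
    return 2.0 * one_way_seconds / 60.0


def exchange_minutes(distance_miles: float, round_trips: int = 2) -> float:
    """Report, instructions, report back, more instructions: two round trips."""
    return round_trips * round_trip_minutes(distance_miles)


if __name__ == "__main__":
    for label, miles in (("closest", CLOSEST_MILES), ("quoted mean", QUOTED_MEAN_MILES)):
        print(
            f"{label}: round trip {round_trip_minutes(miles):.1f} min, "
            f"full exchange {exchange_minutes(miles):.1f} min"
        )
```

With those inputs the round trip comes out at roughly 6 to 9 minutes and a full exchange at roughly 13 to 17 minutes, close to the figures in the list. Either way, a landing that lasts only a few minutes is over before the first reply could arrive.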
I don't know about you folks, but I haven't seen any software that I would trust to drive my car from my house to the office unmanned (about 7 blocks), much less take millions of dollars worth of hardware millions of miles from home and expect it to get there safely.
In my opinion, manned missions make more sense because they have a significantly greater chance of success even though the cost is also significantly higher.
Re:NASA Management Practices and Quality of Softwa (Score:3, Insightful)
(1) Schedule realistically, so that tasks can be completed without overtime. This may mean some things just cannot be done in the desired time period. Learn to accept that.
(2) Hire and retain sufficient staff, so that the work can be shared between multiple people. This may mean that some of the time the company will be overstaffed. Accept that too.
Obviously both these suggestions come with a pricetag, but lost missions aren't free either...
Re:It's really quite simple (Score:2, Insightful)
You could do one $500 million mission or 30 $20m missions