Space Science

Mars Failures: Bad luck or Bad Programs?

Posted by Hemos
from the what-makes-the-machine dept.
HobbySpacer writes "One European mission is on its way to Mars and two US landers will soon launch. They face tough odds for success. Of 34 Mars missions since the start of the space age, 20 have failed. This article looks at why Mars is so hard. It reports, for example, that a former manager on the Mars Pathfinder project believes that "Software is the number one problem". He says that since the mid-70s "software hasn't gone anywhere. There isn't a project that gets their software done."" Or maybe it has to do with being an incredible distance away, in an inhospitable climate. Either/or.
This discussion has been archived. No new comments can be posted.

  • by bigattichouse (527527) on Monday June 09, 2003 @08:46AM (#6149532) Homepage
    Make it simple. The original software (like in the moonshots) was a set of very simple control loops: no OS, no overhead, just a simple program doing a VERY simple job over and over. Read stick, fire retros as appropriate.
    Also, solid state, however big and bulky, isn't susceptible to the radiation that many mega-tiny chips are... By writing (and testing) the software in the simplest manner, building a VERY specific piece of hardware out of solid state components, and doing lots of unit testing, you're more likely to get there.
    That's the same reason the 486 was the only space-rated Intel processor for quite a long time (not sure if that's still true).

    I'd rather go on "slower", simpler hardware that does a very specific job and that you can repair with a soldering iron.
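
    To make the "read stick, fire retros" idea concrete, here is a minimal sketch of a fixed-rate control loop stepping a toy one-dimensional descent. It is not flight code from any mission: the function names, the 2 m/s target rate, the retro acceleration, and the Mars-like gravity figure are all assumptions chosen for illustration.

```python
# Hypothetical sketch: one input, one decision, repeated at a fixed rate.
# All names and numbers are illustrative assumptions, not real flight values.

def retro_command(altitude_m: float, descent_rate_mps: float,
                  target_rate_mps: float = 2.0) -> bool:
    """Return True if the retro engine should fire during this tick."""
    if altitude_m <= 0.0:
        return False  # already on the ground: never fire
    return descent_rate_mps > target_rate_mps


def run_loop(altitude_m: float, descent_rate_mps: float, dt_s: float = 0.1,
             gravity_mps2: float = 3.71, retro_accel_mps2: float = 8.0,
             max_ticks: int = 100_000) -> float:
    """Step a toy 1-D descent and return the touchdown speed in m/s.

    descent_rate_mps is positive when the craft is moving downward.
    """
    for _ in range(max_ticks):
        if altitude_m <= 0.0:
            break
        firing = retro_command(altitude_m, descent_rate_mps)
        # Gravity speeds the descent up; the retro (if firing) slows it down.
        accel = gravity_mps2 - (retro_accel_mps2 if firing else 0.0)
        descent_rate_mps += accel * dt_s
        altitude_m -= descent_rate_mps * dt_s
    return descent_rate_mps


if __name__ == "__main__":
    speed = run_loop(altitude_m=1000.0, descent_rate_mps=50.0)
    print(f"touchdown speed: {speed:.2f} m/s")
```

    The whole decision lives in one small pure function, which is what makes this style easy to test exhaustively. A real lander has noisy sensors, fuel limits, and more than one axis, none of which this sketch models.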

  • Mistakes (Score:5, Interesting)

    by Restil (31903) on Monday June 09, 2003 @08:48AM (#6149555) Homepage
    Of course, the stupid metric conversion problem only accounted for one of the failures, but it's indicative of a larger problem. There's obviously a shortcoming in quality control and verification if such an obvious mistake could be overlooked. What less obvious problems are we missing altogether? Most of the failures occurred during the orbital entry phase, during which time they shut off the transmitter, and therefore don't have up-to-the-second data on the reason for the failure. Sure, they likely wouldn't have much of an opportunity to save the mission, but they would have a good chance at figuring out what the problem actually was so it could be fixed the next time around. Instead, we're left to guess. Cost concerns are always mentioned as the reason, but how much have we "saved" really? An extra million dollars to keep the transmitter on would probably have paid for itself a long time ago.

    -Restil

  • by Anonymous Coward on Monday June 09, 2003 @08:53AM (#6149601)
    Same as the Volkswagen Beetle (old versions), which is still deemed the world's most reliable car: no water cooling, engine management systems, injection, turbos, massive wiring looms, air con, etc.
    It's so basic that the error rate is significantly reduced, to the point that identifying and fixing errors is trivial, without the need to plug a single computer in or sort through 2 miles of cables looking for a single break.

    I digress. Technology makes life harder, not easier.

    cheers
  • by mcheu (646116) on Monday June 09, 2003 @08:54AM (#6149610)

    Thing is, space exploration isn't done with *current* technology. The computing technology used in a lot of aerospace applications is 20-30 years old. There are a number of reasons for this, but the ones I've heard of are:

    1. The projects are long-term and have been in development for a lot of years, especially when it comes to government projects. They can't just up and switch to the latest tech whenever it comes around, otherwise the project will end up like DNF and never see the light of day.

    2. The engineers don't trust the latest and greatest. The technology isn't considered mature enough. All the bugs have been worked out in the older tech, so it's more robust, the engineers are more familiar with it, and more often than not, manufacturers have shrunk and simplified the designs significantly since introduction.

    It's more likely that you'd find an 8086 processor in the space shuttle than a Pentium 4, unless someone brings a laptop aboard. It wasn't all that long ago that NASA put ads on websites and in geek magazines appealing for old 8086 processors for spare parts. I haven't heard anything since, so either they found a supplier, or they're too busy piecing together the Columbia.

  • by Anonymous Coward on Monday June 09, 2003 @08:55AM (#6149617)
    Programmers get paid to do their job to the best of their ability, just like any other employee.

    When not even the best programmers can get it right, it might be time to start thinking that there's a hard problem in there. Docking pay isn't the way to fix it.
  • by Larthallor (623891) on Monday June 09, 2003 @08:59AM (#6149651)
    Perhaps one of the reasons that the software isn't getting done on time is that much of the system is written from the ground up. Perhaps it would be better to design a common, open source spacecraft platform. So many of the basic tasks that spacecraft software must perform are essentially identical. The main differences for critical spacecraft systems would be the hardware. If a general purpose OS and spacecraft toolkit were designed, then the main things that would have to be written from scratch for different missions would be drivers for the hardware and various configuration settings.

    I'm not sure how suitable RT Linux would be from a technical/performance standpoint, but having a highly portable open source OS would give a flexibility and availability that would make adoption much easier.
  • by Gerry Gleason (609985) <gerry@NoSPAM.geraldgleason.com> on Monday June 09, 2003 @09:00AM (#6149661)
    And it is finite as well, but I don't see anyone with a closed form solution to that either. Even with a very small, searchable code space for possible programs, it is not possible to completely characterize the program's behavior.

    Theoretically, all programs have latent bugs, unless they are too simple to do much.

  • I'm not surprised. (Score:5, Interesting)

    by dnnrly (120163) on Monday June 09, 2003 @09:01AM (#6149671)
    I've seen the code for some MAJOR blue chip companies and I really do wonder how these people stay in business with the rubbish that they put out. For example, some of the code drops from our clients don't even compile! The reason for all the crap is that it's very easy to cut corners without it being obvious immediately. Typically, the first thing that gets stopped when things are getting tight (either time or money) is documentation, quickly followed by testing. Next it's individual features, removed from the requirements 1 by 1.

    Since software engineering is still a 'black art' as far as most traditional engineers and project managers are concerned, there isn't the real intuition/understanding of when things are starting to look bad. Without looking at code AND knowing something about it, you won't stand a chance 'intuiting' whether or not things are going well.

    Writing software is an expensive business in both time and money. It's also a very young business without the same 'discipline of implementation' as other areas. Until the process matures and people realise that doing it on the cheap gives you cheap software, things aren't going to change and Mars probes are going to continue to produce craters.
  • Re:Its a shame (Score:2, Interesting)

    by tomstdenis (446163) <tomstdenis@@@gmail...com> on Monday June 09, 2003 @09:04AM (#6149696) Homepage
    Why wait 100 years? I'm ashamed of most programmers *TODAY*. Stupid three week IT majors with a background in ASP.NET or some shit...

    Used to be comp.sci was about comp.sci, not staying up to date with the latest code monkey script language.

    There is still a reason why the majority of *real* work is coded in C. It's a simple language that gets things done.

    The dot.com busta VB script kiddies [e.g. three week IT grads] come and go. True comp.sci'ers stick along better.

    Tom
  • by AndroidCat (229562) on Monday June 09, 2003 @09:06AM (#6149720) Homepage
    Yeah but... The Apollo 11 LEM computer crashed several times [wikipedia.org] during the landing.
  • Funny. Of all of the things that went wrong mechanically with the shuttle, from engines that had to be tweaked beyond what a Rice-Boy would consider safe, to a protective housing made of glass, to strapping on 2 solid fuel boosters just to get the sucker off the ground, the software on the Space Shuttle worked well, and worked the first time.

    Part of it was the fact they had absolute geniuses working on the problem. Think of it: they designed a system in the late 1970s, tested it on the ground, and had it successfully fly for 20 years without a major "oopsie". Or rather, if a major "oopsie" happened, they had ways around, over, or through it. They spent YEARS developing the flight software for the Shuttle.

    Software CAN be done right. It just has to be a priority.

  • How about this? We're launching fairly small, very complex probes, that aim to do a lot more than the moon missions in some respects...certainly the craft are responsible for accomplishing a lot more 'unsupervised'.

    With the moon missions, there were manned craft, and so every line of code had to be checked and rechecked--and hundreds of guys were on the ground watching everything that happened, twenty-four seven, until the astronauts were safely back on the ground.

    Now, windows for a Mars launch come much less frequently. There might be a temptation to rush some of the QA and just cross fingers. Speed of light delay means that NASA can't intervene in most situations--problems are resolved one way or another before anyone on the ground even hears about them.

    Moon launch hardware had to last for a few days in space--stressful, busy, lengthy days, but a few days nonetheless. We expect Mars craft to spend months in hard vacuum and harder radiation, and then land successfully without human help, on a planet with higher gravity than the moon...

    Just some thoughts. The parent is right--Mars missions are hard because it's far away, and you have to travel through space to get there.

  • by MartyC (85307) on Monday June 09, 2003 @09:25AM (#6149927) Homepage
    According to this page [ucar.edu] only 3 of 26 missions to Venus have been total failures. When you consider that Venus is a much more hostile environment than Mars then you have to conclude that either Mars is just plain unlucky or mission planners are getting something wrong.
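
    For reference, the arithmetic behind that comparison can be sketched as below. The counts are simply the figures quoted in this thread (20 of 34 for Mars from the story summary, 3 of 26 for Venus from the linked page) and have not been independently checked; note too that "failed" and "total failure" may not be counted the same way in the two sources.

```python
# Failure counts exactly as quoted in this thread (not independently verified).
missions = {
    "Mars": {"total": 34, "failed": 20},
    "Venus": {"total": 26, "failed": 3},
}


def failure_rate(total: int, failed: int) -> float:
    """Fraction of missions that failed."""
    return failed / total


for planet, counts in missions.items():
    rate = failure_rate(counts["total"], counts["failed"])
    print(f"{planet}: {rate:.1%} failed")
```

    With the quoted numbers this comes out to roughly 59% for Mars against roughly 12% for Venus.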
  • by vondo (303621) * on Monday June 09, 2003 @09:27AM (#6149948)
    Also, solid state, however big and bulky, isn't susceptible to the radiation that many mega-tiny chips are...

    Actually, the current microchips are inherently rad-hard (radiation-resistant). This wasn't the case in the past. It's something about the size of the features being small and also shallow, so that not much charge is deposited as a charged particle passes through. Feature sizes of 0.25 and 0.18 microns are apparently especially good. However, as feature size continues to go down, things will get worse again.

    You might find this link [vanderbilt.edu] interesting too.

  • by reddish (646830) on Monday June 09, 2003 @09:27AM (#6149957) Homepage
    They claim that "there was this crack" or "we confused metrics" but at the very core of the problem they didn't understood the problem and the tools to solve it.

    However much you may disagree, simple Newtonian dynamics is all it takes to get a space probe from A to B in the vast majority of cases. It's a well-understood problem domain.

    Dragging in stuff like chaotic long-term behavior of n-body systems, while an interesting fact in itself and worthy of study, has very little to do with the engineering problem at hand. Ephemerides for all major bodies in the solar system for the coming hundreds of years are known up to uncanny accuracies (metres) and plotting the trajectory of a probe is simply a matter of numerical integration, to put it bluntly.
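
    As a rough illustration of "a matter of numerical integration", the sketch below propagates a point mass around the Sun with a velocity-Verlet integrator and checks that a circular orbit at 1 AU closes after one period. It is a deliberately reduced two-body toy: real trajectory work uses full ephemerides, planetary perturbations, and higher-order integrators, and the step size and function names here are assumptions.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, solar gravitational parameter
AU = 1.495978707e11         # m, astronomical unit


def accel(x: float, y: float) -> tuple:
    """Newtonian acceleration toward the Sun at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -GM_SUN * x / r3, -GM_SUN * y / r3


def propagate(x, y, vx, vy, dt, steps):
    """Velocity-Verlet integration of a point mass in the Sun's gravity."""
    ax, ay = accel(x, y)
    for _ in range(steps):
        x += vx * dt + 0.5 * ax * dt * dt
        y += vy * dt + 0.5 * ay * dt * dt
        ax_new, ay_new = accel(x, y)
        vx += 0.5 * (ax + ax_new) * dt
        vy += 0.5 * (ay + ay_new) * dt
        ax, ay = ax_new, ay_new
    return x, y, vx, vy


# Circular orbit at 1 AU, one-hour steps (an arbitrary choice), one full period.
v_circ = math.sqrt(GM_SUN / AU)
period = 2 * math.pi * math.sqrt(AU ** 3 / GM_SUN)
dt = 3600.0
steps = round(period / dt)
x, y, vx, vy = propagate(AU, 0.0, 0.0, v_circ, dt, steps)
print(f"radius after one orbit: {math.hypot(x, y) / AU:.6f} AU")
```

    The integration itself is the easy part; the point of the comment stands that errors creep in at the inputs (units, constants, initial state) rather than in the physics.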

    Now when someone mixes up metres and feet things go awry. But don't claim stuff like this could have been prevented by hiring more mathematicians. It's simply a case of human error, something that happens in the Real World.

    Having a high IQ, my friend, is no excuse for making stupid claims about things you don't know anything about.

  • by jellomizer (103300) on Monday June 09, 2003 @09:30AM (#6149992)
    Perhaps it explains why there should be a manned mission. The main problem with exploring the unknown is that there are a lot of unknown variables out there, and computer technology is not always adaptable to all of them. This is why there are software failures and lost contact. A manned mission gives some extra control and the ability to improvise new solutions to unforeseen problems, like fixing a broken part with another material that is available, or realigning the craft so it maintains contact. The big problem with Mars is that it can take around 20 minutes to send a signal telling a probe to do something different. A well-trained human on the spot would be able to make these decisions and issue the new instructions in far less time (within seconds). If it weren't so expensive to do a manned mission to Mars, I am sure manned missions would have a much higher success rate.
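
    The signal delay mentioned above is just distance divided by the speed of light. A small sketch follows; the two Earth-Mars distances are rounded approximations assumed for illustration, since the actual separation varies continuously with the orbits.

```python
C_KM_S = 299_792.458      # speed of light in km/s (exact by definition)
AU_KM = 149_597_870.7     # astronomical unit in km (exact by definition)


def one_way_delay_min(distance_au: float) -> float:
    """One-way light travel time in minutes for a distance given in AU."""
    return distance_au * AU_KM / C_KM_S / 60.0


# Rounded, assumed distances for a close approach and for the far side of the Sun.
for label, d_au in [("near closest approach", 0.38), ("near farthest", 2.67)]:
    print(f"{label}: {one_way_delay_min(d_au):.1f} min one way")
```

    A round trip (command out, confirmation back) takes twice the one-way figure, which is why anything time-critical such as landing has to run autonomously.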
  • by Malc (1751) on Monday June 09, 2003 @09:32AM (#6150014)
    What, the programming teams worked in a vacuum from each other? You're telling me that the products of their efforts didn't communicate with each other? The programmers should have noticed and/or documented properly. Personally, if I were a programmer on this project, I would have been VERY surprised if we weren't using SI units, and I would have questioned it strongly. Anybody who's taken any physics courses knows that even in the US, people use SI units. It was not a software problem: the software obviously did what it was told to do.

    GIGO.
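
    One way to keep garbage from getting in at a module boundary is to make every quantity carry its unit, so a bare number cannot cross the interface. The sketch below is a hypothetical illustration, not how any NASA or contractor code was written; the `Impulse` class and `apply_thruster_impulse` function are invented names. It uses pound-force seconds versus newton-seconds because that was the widely reported mismatch in the Mars Climate Orbiter loss.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # newton-seconds in one pound-force second


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in SI units (newton-seconds)."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Conversion happens once, at the boundary, where it is easy to review.
        return cls(float(value) * LBF_S_TO_N_S)


def apply_thruster_impulse(impulse: Impulse) -> float:
    """Hypothetical consumer: accepts only unit-tagged values, returns N*s."""
    if not isinstance(impulse, Impulse):
        raise TypeError("impulse must carry its unit; bare numbers are rejected")
    return impulse.newton_seconds


if __name__ == "__main__":
    reading = Impulse.from_pound_force_seconds(10.0)
    print(f"{apply_thruster_impulse(reading):.3f} N*s")
```

    The check costs almost nothing at run time, and it turns a silent factor-of-4.45 error into an immediate failure during integration testing.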
  • The Apollo 1 accident was caused by bad wiring and a pure oxygen atmosphere. It had nothing to do with the computers.

    And when I point out an aerospace system that does work, showing me a zillion ones that don't doesn't invalidate my point. The difference between the systems that work and the systems that fail is craftsmanship.

  • My space failure (Score:3, Interesting)

    by TheSync (5291) on Monday June 09, 2003 @10:23AM (#6150631) Journal
    Heh, I was a part of a space failure [umd.edu] myself. We were using pretty much off-the-shelf equipment, but it passed NASA spec shake and thermal testing. What probably did it in was radiation...in low earth orbit we figured there wouldn't be much risk of radiation problems.

    If we were to do it again, we would probably add some kind of radiation-resistant reset system, because building the whole thing in rad-hard parts would be very expensive (our budget was $1500 plus donated equipment!). But having a few rad-hard devices to reset the box in case of a crash would probably have been affordable.
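
    The reset system described here is essentially a watchdog timer: the main computer must "kick" it regularly, and if the kicks stop, the watchdog forces a reset. The sketch below models that logic in software purely for illustration. A real watchdog of the kind described would be a separate hardware device, and the class name, tick counts, and simulation are all assumptions.

```python
class Watchdog:
    """Toy software model of a watchdog timer (real ones are separate hardware)."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.ticks_since_kick = 0
        self.resets = 0

    def kick(self) -> None:
        """Called by the healthy main program to prove it is still running."""
        self.ticks_since_kick = 0

    def tick(self) -> bool:
        """Advance one tick; return True if a reset was triggered."""
        self.ticks_since_kick += 1
        if self.ticks_since_kick >= self.timeout_ticks:
            self.resets += 1
            self.ticks_since_kick = 0
            return True
        return False


def simulate(hang_at: int, total_ticks: int, timeout_ticks: int = 5) -> int:
    """Main program kicks every tick until it hangs at `hang_at`.

    Returns the number of resets the watchdog issued; a reset revives the program.
    """
    dog = Watchdog(timeout_ticks)
    healthy = True
    for t in range(total_ticks):
        if t == hang_at:
            healthy = False      # e.g. a radiation-induced crash
        if healthy:
            dog.kick()
        if dog.tick():
            healthy = True       # the reset brings the payload back
    return dog.resets


if __name__ == "__main__":
    print("resets:", simulate(hang_at=10, total_ticks=30))
```

    Only the watchdog itself has to survive the radiation, which is what keeps the approach cheap compared with hardening the whole payload.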

    About 100 amateur radio operators contacted our payload, and relayed their GPS coordinates to others using amateur packet radio. At the same time, the GPS unit on board the Spartan satellite transmitted its position to listeners on the ground as well. But had it not crashed after about 17 hours, it is possible that several hundred other amateur radio operators would have used it.
  • by marauder404 (553310) <(moc.oohay) (ta) (404reduaram)> on Monday June 09, 2003 @10:23AM (#6150633)
    NASA software engineering is actually quite remarkable, at least for the shuttle program. I read a paper once about how they actually break many of the paradigms of writing code that so many programmers are accustomed to, so that the code is absolutely perfect. Deadlines are met well ahead of schedule and nobody works late. They're not allowed to work late, because the pressure or fatigue could cause an error to occur. The chief software engineer personally signs off that the code won't hurt anyone. Every line of code is fully documented. The code is virtually written twice by two separate teams. This article actually details some of it at great length: They Write the Right Stuff [fastcompany.com]. I don't disagree with you that maybe the way they write software needs to be reviewed, but it seems that they already go a long way to ensure that happens.
  • by Anonymous Coward on Monday June 09, 2003 @10:54AM (#6151009)
    Consider that it is a much shorter trip to Venus, the orbital dynamics are easier (and quicker) for going to the inner planets, and the types of missions to Venus were much simpler, with much more modest goals (because the Venusian environment really limits the kind of things you could land there: no fancy airbags, detached rovers, etc.).
  • by Waffle Iron (339739) on Monday June 09, 2003 @11:04AM (#6151111)
    Even with only 20K or so of code, the Apollo guidance computer software development nearly slipped the schedule of the entire moon program. This page [nasa.gov] on this very interesting site [nasa.gov] describes the software development.

    I haven't read the whole site in a while, but IIRC, it describes the typical problems with software: underscoping the problem (in the 60s, most people assumed that the computer hardware development would be the majority of the effort), code bloat (the computer required much more memory than originally planned), buggy production code, schedule slips, problems caused by cruft. When the project started, they just waded right in to coding with few tools and little awareness of the need for proper engineering practice.

    This particular case was made more difficult by the program loading procedure: the program ROM was made one bit at a time by hand-threading magnetic cores onto tiny wires, then embedding it in a solid block of epoxy. The write-compile-debug cycle could be weeks. If bugs were discovered late in the schedule, the astronauts just had to work around them. The software developers did have mainframe-based simulators for development, though.

    With the gigabytes of space available for today's software, I'm surprised that any modern space projects get finished at all.

  • by EccentricAnomaly (451326) on Monday June 09, 2003 @11:09AM (#6151168) Homepage
    Just look at the rate of failure for early moon missions [inconstantmoon.com].

    It's a hard problem to send a probe to the Moon or Mars. Landing and aerocapture at Mars are dicey things.

  • by confused one (671304) on Monday June 09, 2003 @11:47AM (#6151536)
    We've all heard of the "faster, better, cheaper" game NASA's been playing lately.

    Here's the problem as I see it: As software and hardware have become more complicated, there's a need to increase testing. Instead, in order to meet NASA's new budgetary requirements, funding in general, and specifically for testing, has gone down. So, it's not possible to completely test all of the hardware AND software, as it should be.

    As an analogy: if we were talking about commercial airliners, these probes would never be certified to fly.

    I'm not putting all the blame on NASA here; although, it is apparent to me that they need to start reporting what it's actually going to cost. Having said that, Congress is equally complicit; they need to come to the realization that it's expensive to do work outside the atmosphere (they apparently don't understand this...)

  • by Phil Karn (14620) <karn@@@ka9q...net> on Monday June 09, 2003 @12:18PM (#6151909) Homepage
    So how do you explain the significantly higher success rates to planets other than Mars, e.g., Venus and Jupiter? They share the same problems of long delay times and the need for autonomous control.

    Your comment about manned vs unmanned makes absolutely no sense. One could buy a hundred or a thousand unmanned planetary missions for what a single manned mission would cost, and there would still be no guarantee that the manned mission would succeed. Yet we could easily afford to have many of those unmanned missions fail.

    I say that the manned space program is one of the major contributing factors to the poor Mars success rate. More specifically, the enormous sums of money that the Shuttle and ISS have siphoned from the far more productive unmanned planetary program and flushed down the drain.

  • by dinog (582015) on Monday June 09, 2003 @12:37PM (#6152110)
    Venus has a much thicker atmosphere, and this makes things easier. Many of the Mars craft have used aerobraking, and there just isn't much room for error when the atmosphere can't be measured in whole-number millibars. Another failed attempt dealt with landing, which is also more difficult in a thin atmosphere because parachutes are far less effective. This is why some of the probes resorted to airbags: an ugly option, but not much uglier than the alternatives. No one would even think about that on Venus.
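
    To put a rough number on "parachutes are far less effective", terminal velocity under a chute scales as the inverse square root of air density: v = sqrt(2mg / (rho * Cd * A)). The sketch below compares the same hypothetical lander and parachute near the surface of three planets. The surface densities and gravities are rounded approximations, and the 500 kg mass, 10 m chute, and drag coefficient of 0.8 are assumptions picked only for illustration.

```python
import math

# Rounded near-surface values: (gravity in m/s^2, air density in kg/m^3).
PLANETS = {
    "Venus": (8.87, 65.0),
    "Earth": (9.81, 1.225),
    "Mars": (3.71, 0.020),
}


def terminal_velocity(mass_kg: float, gravity: float, density: float,
                      chute_diameter_m: float, drag_coeff: float = 0.8) -> float:
    """Steady descent speed where parachute drag balances weight, in m/s."""
    area = math.pi * (chute_diameter_m / 2.0) ** 2
    return math.sqrt(2.0 * mass_kg * gravity / (density * drag_coeff * area))


if __name__ == "__main__":
    for name, (g, rho) in PLANETS.items():
        v = terminal_velocity(mass_kg=500.0, gravity=g, density=rho,
                              chute_diameter_m=10.0)
        print(f"{name}: {v:.1f} m/s")
```

    Even with Mars's lower gravity, the thin air leaves the same chute descending several times faster than it would on Earth, which is the gap that retro rockets or airbags have to cover.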

    On the other hand, once the probes get to Mars, they last much longer than the ones sent to Venus. That is where the hostile environment on Venus becomes an issue.

    Dean G.

  • by macdaddy357 (582412) <macdaddy357@hotmail.com> on Monday June 09, 2003 @01:22PM (#6152558)
    Did you know that a law allowing the use of the metric system in the United States was signed by President Johnson? Andrew Johnson!
  • by aebrain (184502) <aebrain@webone.com.au> on Monday June 09, 2003 @10:47PM (#6158168) Homepage Journal
    Well, yes.
    A quote from a recent newspaper article [theage.com.au]:
    Spaceflight avionics software development is not for the faint-hearted either.

    "The question for software developers is not, 'Are you paranoid?', the question is, 'Are you paranoid enough?' " Brain says. "Every software module, every function, procedure or method has to assume that information coming in may have been spoilt by a malfunction and be prepared for the worst. The system must be ductile - bending, not breaking - when things go wrong. In space no one can press Control/Alt/Delete."
    A team of Australian programmers developed FedSat's onboard software, building on work done in Britain. It is written in Ada-95, a programming language designed for embedded systems and safety-critical software. All it has to work with is 16MB of RAM, 2MB of flash memory for storing the program, a 128K boot PROM (programmable read only memory) and 320MB of DRAM in place of a hard disk that would never survive the launch process. All essential data is stored in three physically different locations.
    Language is important. The numbers say it, the metrics say it, the successful projects say it, even some /. posts say it. But the "programmer gods" don't believe it, or more often, won't bother doing the research.
    The rest of us will just have to settle for actually doing this work, satellites, laser eye surgery systems, aircraft, subs etc instead of making yet another kludgy VB system to sell the latest in sportswear or whatever.
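
    The quoted article says essential data is kept in three physically different locations but does not describe how the copies are reconciled. One common approach, shown here purely as an assumption about how such a scheme could work, is majority voting with repair: read all three copies, trust the value at least two agree on, and rewrite the odd one out. The sketch is in Python for brevity rather than Ada-95, and the function names are invented.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(copies: Sequence[bytes]) -> Optional[bytes]:
    """Return the value held by a strict majority of copies, else None."""
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    return value if count * 2 > len(copies) else None


def read_with_repair(stores: List[bytes]) -> Optional[bytes]:
    """Vote across the stores and overwrite any copy that disagrees.

    Returns None (and changes nothing) when no majority exists, so the caller
    is forced to handle the worst case explicitly.
    """
    winner = majority_vote(stores)
    if winner is not None:
        for i, copy in enumerate(stores):
            if copy != winner:
                stores[i] = winner  # scrub the corrupted copy
    return winner


if __name__ == "__main__":
    stores = [b"\x2a", b"\x2b", b"\x2a"]   # middle copy has a flipped bit
    print(read_with_repair(stores), stores)
```

    Returning `None` instead of guessing when all three copies disagree is in the "paranoid enough" spirit of the quote: the caller has to decide what bending, rather than breaking, means for that piece of data.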
