Space Bug Mars Programming

Bad Code May Have Crashed Schiaparelli Mars Lander (nature.com) 163

cadogan west writes "In accordance with the longstanding tradition of bad software wrecking space probes (See Mariner 1), it appears a coding bug crashed the ESA's latest attempt to land on Mars." Nature reports: Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off, because the lander's computer thought it was on the ground. The lander even switched on its suite of instruments, ready to record Mars's weather and electrical field, although they did not collect data...

The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was, says Andrea Accomazzo, ESA's head of solar and planetary missions. Accomazzo says that this is a hunch; he is reluctant to diagnose the fault before a full post-mortem has been carried out... But software glitches should be easier to fix than a fundamental problem with the landing hardware, which ESA scientists say seems to have passed its test with flying colours.

This discussion has been archived. No new comments can be posted.

  • by phantomfive ( 622387 ) on Saturday October 29, 2016 @06:41PM (#53176765) Journal
    This wouldn't have happened if they'd used imperial not metric!
    New age hippie liberal airheads. If it's not a hogshead, it's not fresh!
  • Martians (Score:5, Funny)

    by meglon ( 1001833 ) on Saturday October 29, 2016 @06:41PM (#53176767)
    They're still unwilling to concede that their defenses against the Martians' OBDS (Orbital Bombardment Defense System) are inadequate.
    • The most illustrious Council of the Elders met beneath the purple sky. Fields of loyal adepts filled the gathering grounds, as many loyal civilians waited on the perimeter, pushing ever forward to hear the words of the great one speaking, to even catch a glimpse of one of his most reputable gelsacs. As K'breel, speaker for the Council stood to speak, a hush fell over the crowd, and all stood in rapt attention, speaking thusly:

      Behold how the weaklings have fallen! Our priests and soldiers have toiled many days to finish our planetary defenses, and now they are operational! Our prayers during the last eclipse were especially effective.

      A junior reporter who asked a question about 'metric' was hastily removed from the

    • They panicked when they heard we were sending a "probe" to learn more about them.

  • by ZecretZquirrel ( 610310 ) on Saturday October 29, 2016 @06:43PM (#53176771)
    Only bad testing.
    • Testing on another planet is not that easy, though.
      • by m00sh ( 2538182 ) on Saturday October 29, 2016 @08:49PM (#53177123)

        Testing on another planet is not that easy, though.

        Yes, test it all in production.

        Since testing is sooooooooo hard.

        Landing is the most complicated part and Beagle and others have failed exactly here. There should be x100 or even more code for unit and integration testing than the actual code itself for the landing code. And, those tests should run through every permutation possible of every possible failure point or bad sensor readings.

        There is no way it thinks it has landed with that many sensor inputs. It is simply code that is not put through a good enough testing system.

        • Well, it's not like this is the first time a system thought it was near the ground while in fact it wasn't. Turkish Airlines flight 1951 [wikipedia.org] just to name one.

      • by Gr8Apes ( 679165 )
        Seems a lot of app code gets tested on another planet, because it certainly doesn't work on this one.
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Working in a company that makes automotive electronics, I can say that any problem without an obvious hardware assembly cause becomes defined as a software problem.

      Faulty sensor causing false readings that cause the software to detect that the craft is on the ground? That's the software's fault for not detecting that the sensor was faulty and using magic as a backup method to get the right result.

      • Yeah, and this occurs on earth too.

        "No, sorry, you have to replace your 1200 dollar catalytic converter"

        "Can you just replace the O2 sensors since they are a problem on my jeep as evidenced by online posts???" //invalidates money spent to fix the problem by adding the false error the O2 sensors made

        "Yeah sure - 200 bucks in labor for two sixty dollar sensors. Jerk"

        "ok, thanks. (passes emissions test) "

      • As someone else who works designing and building automotive radar systems, I take umbrage at your defensiveness. For one thing, car parts have to be dirt cheap to be competitive. Parts for a Mars lander don't. We don't know whether there were redundant altimeters (but normally there are three of everything), and we don't even know that it was in fact a faulty altitude signal, regardless of root cause, that led to the crash.

      • It's actually worse than that. If there's a problem with the hardware, i.e. it's known to be failing to do what it's supposed to, it's the software people who're tasked with making a "workaround", i.e. frigging their own code to correct the error rather than the (often more expensive) hardware mod. I've got so many software projects behind me with hacks for hardware bugs you wouldn't believe. "isn't that what software is for?" is the inevitable bollocks you get from hardware engineers when confronted wit
        • It's actually worse than that. If there's a problem with the hardware, i.e. it's known to be failing to do what it's supposed to, it's the software people who're tasked with making a "workaround", i.e. frigging their own code to correct the error rather than the (often more expensive) hardware mod. I've got so many software projects behind me with hacks for hardware bugs you wouldn't believe. "isn't that what software is for?" is the inevitable bollocks you get from hardware engineers when confronted with the problem.

          Detecting when the hardware fails -is- part of the software's job. Industrial software does this routinely. Why can't aerospace software?

          But launching with known-bad hardware is criminal... 8-P

    • Um... there is bad code AND bad testing! 8-P

      All code should be considered bad, until it passes testing. All testing should be considered bad, until it finds some bugs. Wash and repeat... ;-)

      Of course, testing costs money and it's better to buy the boss a new big desk... right??

  • The ridicule 'Murrica took for its lander crashing goes silent.

    A code problem eh? Shit happens, and my condolences - it can happen to any of us.

    • Funny though only US has success on the red planet. This is not only Europe's second attempt, but the same bug. In 1999 the computer shut off its thrusters too early too and it crashed its probe 100 feet in the air.

      I forgot the name of the probe for that one.

      • Funny though only US has success on the red planet. This is not only Europe's second attempt, but the same bug. In 1999 the computer shut off its thrusters too early too and it crashed its probe 100 feet in the air.

        I forgot the name of the probe for that one.

        I'm not certain. The Beagle in 2004 didn't deploy correctly, but otherwise I'm not certain. The Phobos-Grunt didn't make it out of earth gravity in 2011. It is so darn hard to land on Mars. The martian atmosphere is thick enough to make for a lot of heat, but not thick enough to slow you down anywhere near a normal earth type landing. And of course, the distances and resulting delays in communications time were making for around 14 minutes radio time, and 7 minutes of time between atmospheric contact and la

  • by gTsiros ( 205624 ) on Saturday October 29, 2016 @07:04PM (#53176825)

    ...in recent years, it wouldn't surprise me one bit

    • ...in recent years, it wouldn't surprise me one bit

      Yeah, ever since they stopped using goto and started with these class things, everything is going to hell.

      • by wonkey_monkey ( 2592601 ) on Sunday October 30, 2016 @05:09AM (#53178015) Homepage

        everything is going to hell.

        No, everything is calling hell() as a function.

    • Well, we are not talking about the iPhone, whose code quality and testing have been deteriorating with every new iOS since the late Jobs left. It's about specific hardware that requires specific software to do specific tasks, that require specifically competent people. ESA staff is expected to provide an outcome of the highest quality. Nobody knows yet for sure the cause of the Mars lander crash.
      • by gTsiros ( 205624 )

        the fact that the possibility of "software bug" is even listed as a possibility, speaks volumes.

    • Please, do publish the names and faces of ESA programmers! Anyone thinks you can program with a few years or even months of expertise and such is not the case. I would still suspect SCHIZOPHRENIC sabotage: the programmer was right, but a voice told him... therefore he does something different. International competition? GO TO HELL. I cannot live 500 years to wait to have more information when all time spans for space endeavour are more than a lifetime long.
  • It's the only way to be sure.

  • The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was,

    I don't remember which lander, but a previous one somewhere suffered a similar problem, mistaking landing leg deployment for surface contact. (the legs came down, and when they hit full stop they bounced back up a bit, triggering it to think the foot hit the surface) which caused it to shut off the landing

    • "So when do we know we're on the ground?" "Just have it switch modes when the accelerometer registers one Mars gravity."
      • "Just have it switch modes when the accelerometer registers one Mars gravity."

        All three accelerometers, and that they've registered no movement for long enough to confirm they're not moving. It's easy to armchair-quarterback stuff like this, but really, you'd think there was sufficient hardware on the machine to deal with problems like this.
        • by Tablizer ( 95088 )

          I thought radar was typically used to know how far one is from the ground? Seems a lot more straightforward than detecting thumps and bumps.

          • I thought radar was typically used to know how far one is from the ground? Seems a lot more straightforward than detecting thumps and bumps.

            Radar takes a lot of power to run and a non-trivial amount of weight and space for something you only use ONE time. So when possible they try to come up with other ways to do it.

            In this case I have no idea what they were using but nobody has mentioned radars, but they have mentioned that the lander was running on very small batteries designed to only last a couple days on the surface. This suggests they didn't have the power for a radar.

            • They don't take that much power (maybe a dozen watts) and have been used on all previous landings. Other than 3d stereoscopic estimates, I don't see another way and nobody is sending processors that powerful out into space yet.
    • You don't experience lower gravity until after you land. Everything up to that point is freefall and not detectable by accelerometers other than the force from the engines.
      • by v1 ( 525388 )

        If it were a vacuum, you'd be right. But there is no true freefall in an atmosphere. The air slows you down, and you do experience some gravity. (if you can reach terminal velocity, you'll be experiencing 100% gravity, as gravity is trying to accelerate you, but can't) The denser the atmosphere (which varies depending on your altitude) the more gravity you feel. They frequently rely on that to deploy the landing gear, it doesn't need to be powered when landing on a body with appreciable gravity.

        It's admi

    • Re:sounds familiar (Score:5, Interesting)

      by Solandri ( 704621 ) on Sunday October 30, 2016 @03:42AM (#53177911)
      Usually when that sort of thing happens, it's not because the programmer did something obviously wrong. It's usually because the programmer had two (or more) competing scenarios to design for. He tried to design something which would split the difference, and ended up erring too much to one side.

      Lufthansa flight 2904 [wikipedia.org] is a good example. The plane had to land in an expected crosswind on a wet runway. A crosswind landing requires landing with the plane's orientation misaligned from the runway. The plane is pointed into the crosswind, so is actually landing diagonally, then when it hits the ground it has to quickly yaw so it's aligned with the runway (so the wheels are pointed in the right direction). The way this is done is it lands on one gear first, pivots around on that gear to point the nose at the end of the runway, then drops down the second gear, then the nose gear.

      The A320's flight computer was programmed to avoid the disastrous scenario of a thrust reverser deploying in mid-air [wikipedia.org]. It prohibited deployment of the thrust reversers unless both rear landing gear had 6.3 tons of force each on them. Full deployment of the spoilers (disrupts lift to plant the plane firmly on the ground) was prohibited unless the 6.3 tons criteria was met or the wheels were spinning faster than 72 knots.

      Unfortunately, in flight 2904's case, the crosswind landing maneuver placed most of the initial force on a single landing gear, so the thrust reversers didn't deploy. The wet runway caused hydroplaning so the spoilers failed to deploy, hindering the pilots from getting the second landing gear down. By the time the above criteria were met and the plane began slowing down, it was well past the halfway point of the runway, and ended up going off the end. Design criteria selected to prevent one type of accident inadvertently caused another.
    • by sjames ( 1099 )

      I would suggest more defensive programming next time. For example, it isn't reasonable that the probe could make ground contact only 3 seconds after firing the landing thrusters. Determine the minimum possible time that is reasonable and don't even consider shutting the thrusters down until that much time has elapsed.
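
The minimum-time guard suggested above could be sketched roughly as follows. This is only an illustration, not anything from the actual lander software: the class name `TouchdownGuard`, the method `may_cut_thrusters`, and the 10-second threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: the shortest burn that could plausibly end in a
# real touchdown. An assumption for illustration, not a mission value.
MIN_BURN_SECONDS = 10.0


@dataclass
class TouchdownGuard:
    """Ignores touchdown signals until a minimum burn time has elapsed."""

    min_burn_seconds: float = MIN_BURN_SECONDS

    def may_cut_thrusters(self, burn_elapsed_s: float, touchdown_signal: bool) -> bool:
        # Too early for ground contact to be plausible: keep the thrusters
        # on no matter what the sensors claim.
        if burn_elapsed_s < self.min_burn_seconds:
            return False
        # Past the plausibility window, defer to the touchdown signal.
        return touchdown_signal


guard = TouchdownGuard()
print(guard.may_cut_thrusters(3.0, True))   # signal arrives 3 s into the burn
print(guard.may_cut_thrusters(29.0, True))  # signal arrives near the planned end
```

With a guard like this, a spurious "on the ground" signal 3 seconds into a planned 30-second burn would be rejected, while the same signal late in the burn would still be honored.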

  • what 'cha gonna do
  • But software glitches should be easier to fix than a fundamental problem with the landing hardware

    Oh good, then just fix the glitch, recompile, and restart the landing sequence. Lucky they sent the Debug version the first time, maybe they should try the Release next?

  • by Billly Gates ( 198444 ) on Saturday October 29, 2016 @07:46PM (#53176955) Journal

    A quick glance at the low resolution screenshot showed an explosion with black soot. Engineers said it was caused by the rockets still being on.

    If they were turned off it would leak fuel in its crater but would not ignite.

    • by Anonymous Coward
      The lander used monopropellant engines fueled by hydrazine, which decomposes into hydrogen and nitrogen exothermically (~800°C) when passed by an appropriate catalyst. In the case of a crash landing, it probably decomposed a lot more vigorously, and that much heat presumably would've affected the composition of the Martian soil enough to turn it black.
  • QA (Score:5, Informative)

    by bradgoodman ( 964302 ) on Saturday October 29, 2016 @08:10PM (#53177017) Homepage
    I've been in organizations that had pretty light SQA departments. I used to say that the "really" good shops had 1-to-3 ratios - 1 engineer doing QA for every 3 doing implementation. When I started working for more "mission critical" stuff - that ratio went even higher.

    I know people that work in companies that design chips. Those manufacturing cycles are MUCH longer and expensive - you can't just recompile when you test and find a bug. Thus, their QA is probably more like 10 people doing simulation (behavioral, thermal, timing, power, emissions, RF susceptibility, etc) before a design is even fabricated.

    I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions. The point is - saying "it's just software" doesn't help you here. Software is *very* complex and the intricacies of advanced logic, variability of factors - trying to do this stuff probably dwarfs that of the hardware components in this day and age.

    • Re:QA (Score:4, Informative)

      by ShakaUVM ( 157947 ) on Saturday October 29, 2016 @10:04PM (#53177321) Homepage Journal

      >I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions.

      It is. Well, at least it is at JPL - I've gone through their coding standards and testing process for spaceflight, and it's extremely intensive.

      I watched a video on their standards before, and without rewatching it I don't know if this is the same one, but it looks pretty good skimming through it.

      https://www.youtube.com/watch?... [youtube.com]

      I'd be really interested in seeing someone go through the process and finding out where it went wrong.

    • My experience is the opposite. One instance we had 3 development engineers to about 8 test engineers. This was for a safety critical system.
    • Re:QA (Score:5, Informative)

      by johannesg ( 664142 ) on Sunday October 30, 2016 @03:20AM (#53177875)

      I work for a company that writes those simulations. Generally a simulation consists of a CPU emulator that runs the onboard software, and a whole bunch of models for each aspect of the spacecraft and environment: the orbit model, the communication model, various instrument models, etc. These systems are generally set up to allow gradual replacement of each model with real hardware as it becomes available, so the software development is already underway long before the spacecraft hardware has even been built. Each model is a hard real-time program (to allow drop-in replacement of hardware), and has extensive capabilities for error injection in order to simulate things like flipped bits, broken communication channels, broken sensors, etc.

      I don't know what happened on Schiaparelli and they weren't a customer of ours anyway, but a scenario where a sensor breaks and sends bogus information could and should have been tested for during development.

      I'm not sure what the software engineer:QA ratio is - most of that happens internally by the spacecraft people. You run into their QA people everywhere though, while I have yet to personally meet my first flight software engineer.

      Oh, and back in the day I wrote the very first software-only environment for testing flight software on the ground. Up until then, the test environment used real hardware for the flight computer, thus requiring an expensive second set of flight computers just for doing the onboard software development. I hacked together a proof of concept that showed that you _could_ in fact model and simulate the flight computer as well, leading to a substantial cost saving on space projects since...

      The flight computer _simulator_ generally speaking runs on Linux. I'm not sure what the models use these days, but I have seen IRIX and Sun systems around for this purpose. As for the flight computer itself, VxWorks is not an uncommon choice of OS, and the on-board CPU is usually something like ERC32 or Leon - both are radiation-hardened SPARCs.
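
The error-injection idea described in this thread (a sensor model that can be made to send bogus data during a test) might look something like the sketch below. Everything here is an assumption for illustration: the names `AltimeterModel`, `inject_fault`, and `voted_altitude`, the straight-line descent profile, and the use of three redundant sensors with median voting. None of it is taken from Schiaparelli or from any real simulator.

```python
from statistics import median
from typing import Callable, Optional, Sequence


class AltimeterModel:
    """Toy sensor model whose output can be corrupted by an injected fault."""

    def __init__(self, truth: Callable[[float], float]) -> None:
        self._truth = truth  # maps time (s) to true altitude (m)
        self._fault: Optional[Callable[[float], float]] = None

    def inject_fault(self, fault: Callable[[float], float]) -> None:
        # The fault function rewrites the true reading, e.g. a stuck value.
        self._fault = fault

    def read(self, t: float) -> float:
        value = self._truth(t)
        return self._fault(value) if self._fault is not None else value


def voted_altitude(sensors: Sequence[AltimeterModel], t: float) -> float:
    # Median voting: with three sensors, one bogus reading is outvoted.
    return median(s.read(t) for s in sensors)


def descent(t: float) -> float:
    # Hypothetical profile: start at 1000 m and descend at 50 m/s.
    return 1000.0 - 50.0 * t


sensors = [AltimeterModel(descent) for _ in range(3)]
sensors[0].inject_fault(lambda _value: -1.0)  # broken sensor reports "below ground"
print(voted_altitude(sensors, 2.0))
```

A test scenario would then inject one fault at a time and check that the voted value still tracks the true altitude. With two of the three sensors faulted in the same way, median voting fails as well, which is the kind of case such a simulator is meant to expose before launch.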

  • The kind that grew up in a world where the code you delivered had to work because you can't simply ship an update after you find out it barfs in all but laboratory conditions. I am guilty of it myself, I have to admit, I start to slack and deliver bananaware because, hey, a cursory test will do, if everything fails, just send a patch to the customer!

    We need programmers back that knew how to write code that, you know, WORKS!

    • You said it yourself - the problem isn't so much the programmers as it is management. Programmers often want to validate/verify/test their work a lot more, but if management's attitude is "ship it!", the coders don't stand a chance no matter how good they are. My experience is that if you're stuck on a difficult problem that only shows itself under very hard-to-duplicate conditions and are spending lots of time trying to fix it, you get written up instead of appreciation for trying to create a quality pro
      • A lot of scientific software isn't written like commercial software. Scientific software does amazing things, but the source code is often completely awful. I know, I used to write scientific software. Once I started looking at how commercial people wrote software I got much better - but very many scientists are great at science but write terrible, terrible code from the perspective of professional commercial developers. It may not be the case in the instance of this lander, I'm just putting it out ther
        • by NotAPK ( 4529127 )

          I actually make a good chunk of my living re-writing scientific software. My physics PhD allows me to come rapidly up to speed on the domain knowledge that went into the original code, and then I can apply software development best practices to produce a final product that is actually maintainable. However, I would be surprised to hear that any actual scientists would have contributed functional code to this space probe mission. At the end of the day, such software is an engineering item: said scientists

      • The difference is that an old progger will probably be more inclined to say "screw you, pimple face, I don't give a shit what you learned in your MBA courses, obviously it's short for mastering bullshit annoyingly. Now go back into your office, play with your numbers and don't stand in the way of people who're actually working. It ships when it's done".

        • And I've done just that, followed by a visit to HR for being "insubordinate". My favorite was when we had a project that got cancelled halfway through. I took great care to make notes in the source that it wasn't finished, hadn't been properly tested, and under no circumstance should it be shipped. About a year later, I got written up because "my code" was causing crashes in a another customer's mission critical system. They didn't tell me specifically what the problem was, so I went exploring through t
          • "I have to give you 2 weeks notice. I have more than 2 weeks of vacation time standing. See you. Or rather, never gonna again."

    • Old programmers also made mistakes. Remember the Morris worm? It exploited vulnerabilities in several Unix applications.

  • send it to space to land on earth.
    The fact is, that any approach that will work on Mars, with minor mods, will work on Earth. So, it is easy enough to test this.
    Once ESA has a REAL FULLY TESTED LANDING SYSTEM, then everything else is 1 offs.
    And considering the money that ESA has spent on going to mars, only to crash, it would be worth their effort to fully test one.
    • by dbIII ( 701233 )
      The xplane software has an interesting Mars atmosphere simulation mode that shows how wildly different things are.
      Having to fly near the speed of sound to avoid stalling and controls not having much of a grip on the air to respond are two things that rub in the many differences.
      • Can you fly a modified VTOL aircraft (Super MarsHarrier) in a special semi-vertical mode so you have upward thrust AND forward thrust?
        • by dbIII ( 701233 )
          If you have some incredibly huge jet engines (to compress that not very dense air) that can pivot why not, but rockets sound easier. It's not so much a plane then as a "flying bedstead" like the Eagle lander simulator of the 1960s.
  • Government work = lowest bidder
    You get what you pay for, sometimes.

  • I told them not to run Microsoft Lander, but nobody would listen. "But everyone else is using it blah blah."

  • Here's my take on it: The lander's radar got a reflection from the plasma from the descent engine and indicated close proximity to the ground. The software then did exactly what it was programmed to do -- it shut down the thrusters. Once that event occurred, the software entered a new state and possibly even shut down the ground radar. Thus it was doomed from that point forward. No amount of pre-mission testing could have detected this scenario.

    The above is complete speculation, but I believe that the

    • Kind of reckless if they did not have plausibility checking in there. Standard practice is to put checks in that won't even start looking for the ground until an appropriate amount of time has passed - to avoid exactly that sort of thing.

      I can't imagine why you wouldn't put that sort of check in. Well no, I guess I can imagine it, and it makes me sad.

    • The lander's radar got a reflection from the plasma from the descent engine and indicated close proximity to the ground.

      Not very likely, even if the engine exhaust was plasma, the geometry of the spacecraft and the low density of the exhaust militate against that occurrence.

  • From what is written in TFA, it could be a software bug, but it could be as well a sensor fault. It's probably too early to figure exactly what happened. Nevertheless, it is likely the best to present it that way for now, as a software bug is easier to fix than re-designing the sensor suite.
  • Did they have any coders recently imported from console gaming backgrounds? They have a very relaxed view on fixing bugs after go-live!
  • Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off

    I must have put a decimal point in the wrong place or something. Shit! I always do that. I always mess up some mundane detail!

  • Seems a little unprofessional for the head of solar and planetary missions to be publicly spreading theories for which he has zero evidence, even if they are qualified?

    On another note, as a Software Engineer I know for certain that 99% of all failures are Hardware related ;-)

  • Isn't it astounding that in 1968 NASA sent a mission to the moon, with hand wired graphite memory ropes.. termed by NASA as "Little Old Lady" memory, and with less memory than a commodore 64 they sent men to the moon, landed, toured, came back to the ship, then came back to earth? Anyone who believes any of NASA's lies, has to be among the most gullible people in the world.
