Bad Code May Have Crashed Schiaparelli Mars Lander (nature.com) 163
cadogan west writes "In accordance with the longstanding tradition of bad software wrecking space probes (see Mariner 1), it appears a coding bug crashed the ESA's latest attempt to land on Mars." Nature reports:
Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off, because the lander's computer thought it was on the ground. The lander even switched on its suite of instruments, ready to record Mars's weather and electrical field, although they did not collect data...
The most likely culprit is a flaw in the craft's software or a problem in merging the data coming from different sensors, which may have led the craft to believe it was lower in altitude than it really was, says Andrea Accomazzo, ESA's head of solar and planetary missions. Accomazzo says that this is a hunch; he is reluctant to diagnose the fault before a full post-mortem has been carried out... But software glitches should be easier to fix than a fundamental problem with the landing hardware, which ESA scientists say seems to have passed its test with flying colours.
Mark my words (Score:5, Funny)
New age hippie liberal airheads. If it's not a hogshead, it's not fresh!
Re: (Score:2)
This wouldn't have happened if they'd used imperial not metric!
New age hippie liberal airheads.
Naahh, just Europeans.
Re: (Score:2)
I think this problem is a manifestation of a scientific units monoculture.
We need diverse units to keep our calculations robust.
Re: (Score:3)
Not naming it after a crater [wikipedia.org] might've helped, too.
Re:Mark my words (Score:4, Funny)
Now whenever an ESA scientist wants to talk about the Schiaparelli crater, you can ask "which one?" to shut him up.
Re: (Score:2)
I take it that you'll be holding yourself down and beating yourself to death, to save us the effort?
Re: (Score:1)
Martians (Score:5, Funny)
Re: (Score:3)
Behold how the weaklings have fallen! Our priests and soldiers have toiled many days to finish our planetary defenses, and now they are operational! Our prayers during the last eclipse were especially effective.
A junior reporter who asked a question about 'metric' was hastily removed from the
Re: (Score:2)
They panicked when they heard we were sending a "probe" to learn more about them.
There is no bad code. (Score:3)
Re: (Score:2)
Re:There is no bad code. (Score:4, Insightful)
Testing on another planet is not that easy, though.
Yes, test it all in production.
Since testing is sooooooooo hard.
Landing is the most complicated part and Beagle and others have failed exactly here. There should be x100 or even more code for unit and integration testing than the actual code itself for the landing code. And, those tests should run through every permutation possible of every possible failure point or bad sensor readings.
There is no way it thinks it has landed with that many sensor inputs. It is simply code that is not put through a good enough testing system.
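Running through "every permutation" of bad sensor readings can be sketched in a few lines of Python. This is only a toy under stated assumptions: the three redundant altimeters, the fault modes, the median voter and the 2 m touchdown threshold are all made up for illustration and say nothing about Schiaparelli's real design.

```python
from itertools import product
from statistics import median

# Hypothetical fault modes for each of three redundant altimeters.
FAULT_MODES = ("ok", "stuck_zero", "saturated")


def reading(true_altitude_m: float, mode: str) -> float:
    """Return what a sensor reports under the given (hypothetical) fault mode."""
    if mode == "stuck_zero":
        return 0.0
    if mode == "saturated":
        return 1e6
    return true_altitude_m


def thinks_landed(readings) -> bool:
    """Toy landing decision: the median altitude is below 2 m."""
    return median(readings) < 2.0


def failing_cases(true_altitude_m: float):
    """List every fault combination that wrongly reports touchdown."""
    failures = []
    for modes in product(FAULT_MODES, repeat=3):
        readings = [reading(true_altitude_m, m) for m in modes]
        if thinks_landed(readings):
            failures.append(modes)
    return failures


bad = failing_cases(true_altitude_m=3000.0)
# Cases where at most one sensor is faulty; the median voter should survive these.
single_fault_bad = [m for m in bad if m.count("ok") >= 2]
print(len(bad), "of", len(FAULT_MODES) ** 3, "fault combinations wrongly report touchdown")
```

In this toy the voter survives any single fault, and the only combinations that report a false touchdown are those where at least two altimeters are stuck at zero. The point is that exhaustive enumeration finds those cases on the ground instead of on Mars.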
Re: (Score:1)
Well, it's not like this is the first time a system thought it was near the ground while in fact it wasn't. Turkish Airlines flight 1951 [wikipedia.org] just to name one.
Re: (Score:2, Interesting)
If the people writing the simulation are too close to the people writing the control software, I can see this happening.
When I worked on this stuff (and I did, including a Mars probe) we had three independent teams on different sides of the building, each with their own set of requirements, design, code and tests. Not only that, but the development environments and languages were different to avoid common mode bugs. Fun times; I have no idea how things are done today.
Re: (Score:1)
Googled DATDP and couldn't find anything. What does it stand for?
He must work for a European space agency; they don't define their acronyms. Or maybe it was NASA...
Re: (Score:1)
Re: (Score:2, Interesting)
Working in a company that makes automotive electronics, I can say that any problem without an obvious hardware assembly cause becomes defined as a software problem.
Faulty sensor causing false readings that cause the software to detect that the craft is on the ground? That's the software's fault for not detecting that the sensor was faulty and using magic as a backup method to get the right result.
Re: (Score:1)
Yeah, and this occurs on earth too.
"No, sorry, you have to replace your 1200 dollar catalytic converter"
"Can you just replace the O2 sensors since they are a problem on my jeep as evidenced by online posts???" //invalidates money spent to fix the problem by adding the false error the O2 sensors made
"Yeah sure - 200 bucks in labor for two sixty dollar sensors. Jerk"
"ok, thanks. (passes emissions test) "
Re: (Score:2)
As someone else who works designing and building automotive radar systems, I take umbrage at your defensiveness. For one thing, car parts have to be dirt cheap to be competitive; parts for a Mars lander don't. We don't know whether there were redundant altimeters (but normally there are three of everything), and we don't even know that it was in fact a faulty altitude signal, regardless of root cause, that led to the crash.
Re: (Score:2)
Re: (Score:2)
It's actually worse than that. If there's a problem with the hardware, i.e. it's known to be failing to do what it's supposed to, it's the software people who're tasked with making a "workaround", i.e. frigging their own code to correct the error rather than the (often more expensive) hardware mod. I've got so many software projects behind me with hacks for hardware bugs you wouldn't believe. "isn't that what software is for?" is the inevitable bollocks you get from hardware engineers when confronted with the problem.
Detecting when the hardware fails -is- part of the software's job. Industrial software does this routinely. Why can't aerospace software?
But launching with known-bad hardware is criminal... 8-P
Re: (Score:1)
Um... there is bad code AND bad testing! 8-P
All code should be considered bad, until it passes testing. All testing should be considered bad, until it finds some bugs. Wash and repeat... ;-)
Of course, testing costs money and it's better to buy the boss a new big desk... right??
Re: (Score:1)
It's not like the normal software release scheduling governed by trade shows, salesmen selling shit that doesn't exist yet, or the normal "we have to ship so we can book revenue for this quarter!".
From all i've heard tho in this case it does seem to be "blame the software until we find the root cause", which i think any software dev is
Re: (Score:2)
Re: (Score:2, Informative)
As a manufacturing engineer I can tell you from experience even in tightly regulated industries the instances of the print not matching the part is more common than you would think, even on parts produced for decades. When you are talking about one-offs that just self-destructed on another planet and cannot compare the as-produced part to the print it becomes exceedingly difficult to account for last-minute design changes.
Re: easier to fix? (Score:1)
That one indeed. I remember years ago, in the manufacturing facility where my father worked, a machinist retired and suddenly the quality of a certain part went down to unusable. What they found out was that the part was rotationally cast and the original machinist didn't read the actual casting instruction that came with the blueprints. He was an experienced machinist and knew what part he had to deliver, and he always delivered it just right. The ones that the guy who came after him produced were brittle and no
Re: (Score:2)
This is common everywhere: how to capture specific domain knowledge? And remember, it's a two-way process. The knowledge of the current expert has to be recorded in some way, and then the new guy has to be trained in the intricacies of the previous procedure. Mix in the observation that some players may not want this process to be successful, and you're probably boned before you even realise it.
Re: (Score:1)
This is common everywhere: how to capture specific domain knowledge? And remember, it's a two-way process. The knowledge of the current expert has to be recorded in some way, and then, the new guy has to be trained to the intricacies of the previous procedure. ...
The place that records the previous expert's knowledge is called "Source Code". Destroy that and you will start as an ignorant beginner. 8-)
Don't believe what "everyone says". ;-)
Re: (Score:2)
Re: (Score:2)
Easier to fix - in that, "we don't have to spend 10 years redesigning the entire fucking lander, we just need to fix the fucking software and send a new fucking copy of the same hardware in six fucking months."
Jesus fucking christ, are you really that fucking Asperger's?
I don't see much evidence that the OP has Asperger's, but you may want to get checked out about your possible case of Tourette syndrome.
Re: (Score:3)
What the hell is that "easier to fix" comment about?
How are you going to issue a software patch to the pile of rubble on another planet? This is not a situation where you can ship the product without testing and fix it in firmware later!
I've been doing a lot of reading about the early space programs of the US and the Soviet Union, and in that context the meaning is clear: you can use the same approach in the next Mars landing attempt; you don't have to redesign an entirely new system.
"Rocket science" is hard, because you not only have to be smart, you have to be able to stand repeated failure. Normal people when faced with a spectacular fiasco give up, or they wipe the slate clean and start over. But in something as complicated as a mission
Re: (Score:2)
You can't if the people who did it before and were helping since 2009 pull out in 2012, take their bat and ball and go home.
NASA pulled out due to budget cuts and IP restrictions meant the ESA couldn't use their stuff after they pulled out.
So in this case politi
Re: (Score:2)
I think you are a little messed up.
Re:easier to fix? (Score:5, Funny)
How are you going to issue a software patch to the pile of rubble on another planet? This is not a situation where you can ship the product without testing and fix it in firmware later!
It's Agile. The product owner will raise this issue as a priority in the backlog, they'll fix it in this sprint, and it will ship in the next release.
And suddenly (Score:2)
A code problem eh? Shit happens, and my condolences - it can happen to any of us.
Re: And suddenly (Score:1)
Funny though, only the US has had success on the red planet. This is not only Europe's second attempt, but the same bug. In 1999 the computer shut off its thrusters too early too, and it crashed its probe from 100 feet in the air.
I forgot the name of the probe for that one.
Re: (Score:2)
Funny though, only the US has had success on the red planet. This is not only Europe's second attempt, but the same bug. In 1999 the computer shut off its thrusters too early too, and it crashed its probe from 100 feet in the air.
I forgot the name of the probe for that one.
I'm not certain. The Beagle in 2004 didn't deploy correctly, but otherwise I'm not certain. The Phobos-Grunt didn't make it out of earth gravity in 2011. It is so darn hard to land on Mars. The martian atmosphere is thick enough to make for a lot of heat, but not thick enough to slow you down anywhere near a normal earth type landing. And of course, the distances and resulting delays in communications time were making for around 14 minutes radio time, and 7 minutes of time between atmospheric contact and la
Considering the decline in code quality... (Score:3)
...in recent years, it wouldn't surprise me one bit
Re: (Score:3)
...in recent years, it wouldn't surprise me one bit
Yeah, ever since they stopped using goto and started with these class things, everything is going to hell.
Re:Considering the decline in code quality... (Score:5, Funny)
everything is going to hell.
No, everything is calling hell() as a function.
Re: (Score:2)
No, goto hell is still worse!
Re: (Score:2)
Done right, C++ classes can run pretty efficiently. In any of my classes, I know exactly what memory that class is occupying. Dynamic memory allocation is easy enough to manage, much more flexible than memory maps. Debugging is far easier with today's IDE, rather than doing by watching an I/O pin on an oscilloscope or LED.
Java and C# can be worse memory wise but still manageable if you know what you are doing and I've found C# in particular allows me to prototype code a lot faster. I'm taking an algorit
Re: (Score:2)
Re: (Score:2)
the fact that the possibility of "software bug" is even listed as a possibility, speaks volumes.
Re: (Score:1)
Nuke the bugs from orbit (Score:2)
It's the only way to be sure.
sounds familiar (Score:2)
I don't remember which lander, but a previous one somewhere suffered a similar problem, mistaking landing leg deployment for surface contact (the legs came down, and when they hit full stop they bounced back up a bit, triggering it to think the foot hit the surface), which caused it to shut off the landing
Re: (Score:2)
Re: (Score:2)
All three accelerometers, and that they've registered no movement for long enough to confirm they're not moving. It's easy to armchair-quarterback stuff like this, but really, you'd think there was sufficient hardware on the machine to deal with problems like this.
Re: (Score:1)
I thought radar was typically used to know how far one is from the ground? Seems a lot more straightforward than detecting thumps and bumps.
Re: (Score:2)
I thought radar was typically used to know how far one is from the ground? Seems a lot more straightforward than detecting thumps and bumps.
Radar takes a lot of power to run and a non-trivial amount of weight and space for something you only use ONE time. So when possible they try to come up with other ways to do it.
In this case I have no idea what they were using but nobody has mentioned radars, but they have mentioned that the lander was running on very small batteries designed to only last a couple days on the surface. This suggests they didn't have the power for a radar.
Re: (Score:2)
Re: (Score:2)
Pretty sure you're thinking of Sonar, not Radar, which as the OP says, does use lots of power (and a vacuum tube, which means a separate high voltage power supply, no such thing as solid state radar, because SS can't handle the power needed).
Welcome, time traveller from the 1970s, to the year 2016, where 3.3V low-power solid-state radar [nxp.com] most definitely is a thing, and is mass-deployed in cars and other moving objects.
Re: (Score:2)
Compare to the cost, weight, and power consumption of simple contact switches.
Re: (Score:2)
Re: (Score:2)
If it were a vacuum, you'd be right. But there is no true freefall in an atmosphere. The air slows you down, and you do experience some gravity. (If you can reach terminal velocity, you'll be experiencing 100% gravity, as gravity is trying to accelerate you, but can't.) The denser the atmosphere (which varies depending on your altitude), the more gravity you feel. They frequently rely on that to deploy the landing gear; it doesn't need to be powered when landing on a body with appreciable gravity.
It's admi
Re:sounds familiar (Score:5, Interesting)
Lufthansa flight 2904 [wikipedia.org] is a good example. The plane had to land in an expected crosswind on a wet runway. A crosswind landing requires landing with the plane's orientation misaligned from the runway. The plane is pointed into the crosswind, so is actually landing diagonally, then when it hits the ground it has to quickly yaw so it's aligned with the runway (so the wheels are pointed in the right direction). The way this is done is it lands on one gear first, pivots around on that gear to point the nose at the end of the runway, then drops down the second gear, then the nose gear.
The A320's flight computer was programmed to avoid the disastrous scenario of a thrust reverser deploying in mid-air [wikipedia.org]. It prohibited deployment of the thrust reversers unless both rear landing gear had 6.3 tons of force each on them. Full deployment of the spoilers (disrupts lift to plant the plane firmly on the ground) was prohibited unless the 6.3 tons criteria was met or the wheels were spinning faster than 72 knots.
Unfortunately, in flight 2904's case, the crosswind landing maneuver placed most of the initial force on a single landing gear, so the thrust reversers didn't deploy. The wet runway caused hydroplaning, so the spoilers failed to deploy, hindering the pilots from getting the second landing gear down. By the time the above criteria were met and the plane began slowing down, it was well past the halfway point of the runway, and ended up going off the end. Design criteria selected to prevent one type of accident inadvertently caused another.
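The interlock as described in this comment can be written out as a small Python sketch. The thresholds are the ones quoted above; the function names and the exact comparison operators are assumptions for illustration, not the actual A320 logic.

```python
GEAR_LOAD_THRESHOLD_TONS = 6.3      # per main gear, as quoted in the comment
WHEEL_SPEED_THRESHOLD_KNOTS = 72.0  # as quoted in the comment


def both_gears_loaded(left_tons: float, right_tons: float) -> bool:
    """True only if each main gear carries at least the threshold load."""
    return (left_tons >= GEAR_LOAD_THRESHOLD_TONS
            and right_tons >= GEAR_LOAD_THRESHOLD_TONS)


def reversers_allowed(left_tons: float, right_tons: float) -> bool:
    """Thrust reversers require weight on both main gears."""
    return both_gears_loaded(left_tons, right_tons)


def spoilers_allowed(left_tons: float, right_tons: float,
                     wheel_speed_knots: float) -> bool:
    """Full spoilers require weight on both gears OR fast-spinning wheels."""
    return (both_gears_loaded(left_tons, right_tons)
            or wheel_speed_knots > WHEEL_SPEED_THRESHOLD_KNOTS)


# Crosswind touchdown on one gear, wheels hydroplaning (illustrative numbers).
one_gear_reversers = reversers_allowed(7.0, 1.0)
one_gear_spoilers = spoilers_allowed(7.0, 1.0, wheel_speed_knots=40.0)
# Both gears firmly down.
settled_reversers = reversers_allowed(7.0, 7.0)
print(one_gear_reversers, one_gear_spoilers, settled_reversers)
```

With one gear loaded and slow wheels, both checks come back false, which is the situation the comment describes: every condition is individually sensible, and together they lock out the braking aids.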
Re: (Score:2)
I would suggest more defensive programming next time. For example, it isn't reasonable that the probe could make ground contact only 3 seconds after firing the landing thrusters. Determine the minimum possible time that is reasonable and don't even consider shutting the thrusters down until that much time has elapsed.
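A minimal Python sketch of that kind of time gate follows. The class name and the 25-second threshold are hypothetical, picked only because the burn was meant to last about 30 seconds; none of this is taken from the real flight software.

```python
from dataclasses import dataclass

# Hypothetical threshold: earliest time at which touchdown is physically plausible.
MIN_BURN_SECONDS = 25.0


@dataclass
class TouchdownGate:
    """Ignore 'on the ground' indications until a minimum burn time has elapsed."""
    min_burn_seconds: float = MIN_BURN_SECONDS

    def may_shut_down(self, seconds_since_ignition: float,
                      sensors_say_landed: bool) -> bool:
        # Before the minimum burn time, a 'landed' indication is treated as implausible.
        if seconds_since_ignition < self.min_burn_seconds:
            return False
        return sensors_say_landed


gate = TouchdownGate()
early = gate.may_shut_down(3.0, sensors_say_landed=True)   # 3 s into the burn
late = gate.may_shut_down(29.5, sensors_say_landed=True)   # near the planned end
print(early, late)
```

The gate does not decide that the craft has landed; it only refuses to believe the sensors during a window in which landing cannot have happened yet.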
bad code bad code (Score:2)
Well if that's the only problem... (Score:2)
Oh good, then just fix the glitch, recompile, and restart the landing sequence. Lucky they sent the Debug version the first time, maybe they should try the Release next?
Then why black oily explosion? (Score:3)
A quick glance at the low resolution screenshot showed an explosion with black soot. Engineers said it was caused by the rockets still being on.
If they were turned off it would leak fuel in its crater but would not ignite.
Re: (Score:1)
QA (Score:5, Informative)
I know people that work in companies that design chips. Those manufacturing cycles are MUCH longer and more expensive - you can't just recompile when you test and find a bug. Thus, their QA is probably more like 10 people doing simulation (behavioral, thermal, timing, power, emissions, RF susceptibility, etc.) before a design is even fabricated.
I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions. The point is - saying "it's just software" doesn't help you here. Software is *very* complex and the intricacies of advanced logic, variability of factors - trying to do this stuff probably dwarfs that of the hardware components in this day and age.
Re:QA (Score:4, Informative)
>I would imagine that in Space Exploration - this would go even higher - given the time and expense of these missions.
It is. Well, at least it is at JPL - I've gone through their coding standards and testing process for spaceflight, and it's extremely intensive.
I watched a video on their standards before, and without rewatching it I don't know if this is the same one, but it looks pretty good skimming through it.
https://www.youtube.com/watch?... [youtube.com]
I'd be really interested in seeing someone go through the process and finding out where it went wrong.
Re: (Score:2)
Re:QA (Score:5, Informative)
I work for a company that writes those simulations. Generally a simulation consists of a CPU emulator that runs the onboard software, and a whole bunch of models for each aspect of the spacecraft and environment: the orbit model, the communication model, various instrument models, etc.. These systems are generally set up to allow gradual replacement of each model with real hardware as it becomes available, so the software development is already underway long before the spacecraft hardware has even been built. Each model is a hard real-time program (to allow drop-in replacement of hardware), and has extensive capabilities for error injection in order to simulate things like flipped bits, broken communication channels, broken sensors, etc.
I don't know what happened on Schiaparelli and they weren't a customer of ours anyway, but a scenario where a sensor breaks and sends bogus information could and should have been tested for during development.
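To make the error injection idea concrete, here is a toy Python sketch of one instrument model with two injectable faults. The class and its attributes are invented for illustration; real simulator models are hard real-time programs and look nothing like this.

```python
class AltimeterModel:
    """Toy stand-in for one simulated instrument model (hypothetical interface)."""

    def __init__(self, true_altitude_m: int):
        self.true_altitude_m = true_altitude_m
        self.flip_bit = None   # index of a bit to flip in the reading, or None
        self.stuck_at = None   # fixed value the sensor is stuck at, or None

    def read(self) -> int:
        """Return the reading the flight software would see."""
        if self.stuck_at is not None:
            return self.stuck_at          # broken sensor: always the same value
        value = self.true_altitude_m
        if self.flip_bit is not None:
            value ^= 1 << self.flip_bit   # single flipped bit in the data word
        return value


altimeter = AltimeterModel(true_altitude_m=3000)
nominal = altimeter.read()

altimeter.flip_bit = 11                   # inject a bit flip
flipped = altimeter.read()

altimeter.stuck_at = 0                    # inject a dead sensor
stuck = altimeter.read()
print(nominal, flipped, stuck)
```

A test campaign would then run the onboard software against each injected fault and check that it never concludes it has landed while the true altitude is still in the kilometres.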
I'm not sure what the software engineer:QA ratio is - most of that happens internally by the spacecraft people. You run into their QA people everywhere though, while I have yet to personally meet my first flight software engineer.
Oh, and back in the day I wrote the very first software-only environment for testing flight software on the ground. Up until then, the test environment used real hardware for the flight computer, thus requiring an expensive second set of flight computers just for doing the onboard software development. I hacked together a proof of concept that showed that you _could_ in fact model and simulate the flight computer as well, leading to a substantial cost saving on space projects since...
The flight computer _simulator_ generally speaking runs on Linux. I'm not sure what the models use these days, but I have seen IRIX and Sun systems around for this purpose. As for the flight computer itself, VxWorks is not an uncommon choice of OS, and the on-board CPU is usually something like ERC32 or Leon - both are radiation-hardened SPARCs.
Re: (Score:2)
Great post, thanks.
We need the old programmers back. (Score:2)
The kind that grew up in a world where the code you delivered had to work because you can't simply ship an update after you find out it barfs in all but laboratory conditions. I am guilty of it myself, I have to admit, I start to slack and deliver bananaware because, hey, a cursory test will do, if everything fails, just send a patch to the customer!
We need programmers back that knew how to write code that, you know, WORKS!
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
I actually make a good chunk of my living re-writing scientific software. My physics PhD allows me to come rapidly up to speed on the domain knowledge that went into the original code, and then I can apply software development best practices to produce a final product that is actually maintainable. However, I would be surprised to hear that any actual scientists would have contributed functional code to this space probe mission. At the end of the day, such software is an engineering item: said scientists
Re: (Score:2)
The difference is that an old progger will probably be more inclined to say "screw you, pimple face, I don't give a shit what you learned in your MBA courses, obviously it's short for mastering bullshit annoyingly. Now go back into your office, play with your numbers and don't stand in the way of people who're actually working. It ships when it's done".
Re: (Score:2)
Re: (Score:2)
"I have to give you 2 weeks notice. I have more than 2 weeks of vacation time standing. See you. Or rather, never gonna again."
Re: (Score:2)
Old programmers also made mistakes. Remember the Morris worm? It exploited vulnerabilities in several Unix applications.
Re: (Score:2)
Come to Europe. We're hiring!
this is why they need to drop test it and then... (Score:2)
The fact is, that any approach that will work on Mars, with minor mods, will work on Earth. So, it is easy enough to test this.
Once ESA has a REAL FULLY TESTED LANDING SYSTEM, then everything else is 1 offs.
And considering the money that ESA has spent on going to Mars, only to crash, it would be worth their effort to fully test one.
Re: (Score:2)
Having to fly near the speed of sound to avoid stalling and controls not having much of a grip on the air to respond are two things that rub in the many differences.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
this particular issue would NOT have happened.
Government work = lowest bidder (Score:2)
Government work = lowest bidder
You get what you pay for, sometimes.
What you git (Score:1)
I told them not to run Microsoft Lander, but nobody would listen. "But everyone else is using it blah blah."
Re: (Score:2)
They didn't even upgrade to Kerbal Space Program.
My Theory (Score:2)
The above is complete speculation, but I believe that the
Re: (Score:2)
Kind of reckless if they did not have plausibility checking in there. Standard practice is to put checks in that won't even start looking for the ground until an appropriate amount of time has passed - to avoid exactly that sort of thing.
I can't imagine why you wouldn't put that sort of check in. Well no, I guess I can imagine it, and it makes me sad.
Re: (Score:2)
Not very likely; even if the engine exhaust was plasma, the geometry of the spacecraft and the low density of the exhaust militate against that occurrence.
Software bug, or... (Score:2)
Known issue (Score:1)
Mundane Details (Score:1)
Thrusters, designed to decelerate the craft for 30 seconds until it was metres off the ground, engaged for only around 3 seconds before they were commanded to switch off
I must have put a decimal point in the wrong place or something. Shit! I always do that. I always mess up some mundane detail!
Unsubstantiated gossip (Score:2)
Seems a little unprofessional for the head of solar and planetary missions to be publicly spreading theories for which he has zero evidence, even if they are qualified?
On another note, as a Software Engineer I know for certain that 99% of all failures are Hardware related ;-)
Software bug on Mars. (Score:1)