Programming Error Doomed Russian Mars Probe

Posted by Soulskill on Tuesday February 07, 2012 @03:25PM from the to-infinite-loops-and-beyond dept.

astroengine writes "So it turns out U.S. radars weren't to blame for the unfortunate demise of Russia's Phobos-Grunt Mars sample return mission — it was a computer programming error that doomed the probe, a government board investigating the accident has determined." According to the Planetary Society Blog's unofficial translation and paraphrasing of the incident report, "The spacecraft computer failed when two of the chips in the electronics suffered radiation damage. (The Russians say that radiation damage is the most likely cause, but the spacecraft was still in low Earth orbit beneath the radiation belts.) Whatever triggered the chip failure, the ultimate cause was the use of non-space-qualified electronic components. When the chips failed, the on-board computer program crashed."

headline fail (Score:3, Informative)

by jamessnell ( 857336 ) writes: on Tuesday February 07, 2012 @03:29PM (#38958095) Homepage

"the ultimate cause was the use of non-space-qualified electronic components" != "programming error". That's a hardware fail.

Re:Excuse me... not a programmer's fault. (Score:5, Informative)

by Cochonou ( 576531 ) writes: on Tuesday February 07, 2012 @03:37PM (#38958221) Homepage

Well... if you read TFA (or actually the first TFA linked), it is clearly written:
In a report to be presented to Russian Deputy Prime Minister Dmitry Rogozin on Tuesday, investigators concluded that the primary cause of the failure was "a programming error which led to a simultaneous reboot of two working channels of an onboard computer" [...] Likewise, cosmic rays and/or defective electronics are not the leading suspects behind Phobos-Grunt’s demise.
The summary is clearly bolting together two contradicting reports.

Re:headline fail (Score:2, Informative)

by Anonymous Coward writes: on Tuesday February 07, 2012 @03:39PM (#38958259)

They probably just had someone ordering parts that didn't know to order mil spec (I'm assuming mil spec is fine for space stuff)
No, not even close. "Mil spec" is basically industrial grade with a slightly extended temperature range. Radiation-hardened stuff is a completely different ballpark.

Contradictions (Score:5, Informative)

by Aladrin ( 926209 ) writes: on Tuesday February 07, 2012 @03:39PM (#38958265)

The summary is so contradictory because it quotes from 2 articles, and each of them is completely different. One says that the parts were space-tested and fine, and the other says they were never space-certified and were definitely bad. The first one says instead that a software bug caused parts of the system to reboot. The second doesn't know what happened and just blames faulty hardware.

Re:So how much? (Score:4, Informative)

by stewbee ( 1019450 ) writes: on Tuesday February 07, 2012 @03:53PM (#38958499)

If only. The reason ICs cost so little is that the cost is spread out over millions of parts. As my analog circuits prof would say, "Your very first IC off the line is going to cost a million dollars. Everything else after that is free." So to buy one or two ICs that are radiation hardened is probably going to cost that much, since it will most likely be custom. Now that's not to say they can't reuse some of the masks for an existing IC to make it cheaper, but it won't be that much cheaper. My guess is that they would want to redesign the part anyway if it is going to be in a radiation-intense environment. The radiation could cause some weird quantum effects in the IC that might mean they want the transistors to be larger for reliability purposes. But that last part is just a guess, since I am not an IC designer and thought my electronic materials class was nothing short of voodoo.

Long story short, they probably saved more than $5 for using a COTS part, but they probably lost the probe by the part not being radiation hardened.

Re:Excuse me... not a programmer's fault. (Score:3, Informative)

by Anonymous Coward writes: on Tuesday February 07, 2012 @04:29PM (#38959025)

In that case, the primary CPU is already up and running; it's booting additional processors.

Re:TFS - obviously written by a hardware guy (Score:5, Informative)

by mevets ( 322601 ) writes: on Tuesday February 07, 2012 @04:50PM (#38959293)

Try this one on your hardware guys:
"The main purpose of software is to make hardware reliable".
Drives them nuts...

Re:Excuse me... not a programmer's fault. (Score:5, Informative)

by K. S. Kyosuke ( 729550 ) writes: on Tuesday February 07, 2012 @05:06PM (#38959515)

I'm not a satellite engineer, but wouldn't it be easy enough to just install a lead shield around the PCB to protect from most radiation? As long as the shield's not too thick, it shouldn't add too much weight, especially compared to using older-technology chips that'll take up more board space.
Well, that depends. Even on Earth's surface, we have to use ECC in more demanding applications. In LEO, you lose the protection of the atmosphere but you still have Earth's rather strong and large magnetosphere. But this was an interplanetary probe. Once you get out of the radiation belts, interstellar and intergalactic particles start hitting you. You can't protect from those with a lead shield of any reasonable size. Pretty much the only way is simply to make the chip simple and rugged, and to design it with components (transistors) large enough that a particle flying through won't bother you much. Or add redundancy. Or both, if possible (that's the usual case).
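
To make the "add redundancy" option concrete, here is a minimal sketch of triple modular redundancy with a bitwise majority voter. It is an illustration only, not a description of the Phobos-Grunt computer; the function name `majority_vote` and the sample values are assumptions made up for this example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value that
    at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


# Three redundant copies of the same 8-bit word (hypothetical data).
stored = 0b10110010
# Simulate a single-event upset: bit 3 flips in the second copy only.
copies = [stored, stored ^ 0b00001000, stored]

# The voter masks the upset as long as only one copy is wrong per bit.
recovered = majority_vote(*copies)
```

The voter only helps while at most one copy is corrupted in any given bit position, which is why it is usually combined with periodic rewriting of the voted value back into all three copies.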

Re:Worse than on the ground... (Score:4, Informative)

by Panaflex ( 13191 ) writes: <convivialdingoNO@SPAMyahoo.com> on Tuesday February 07, 2012 @05:35PM (#38959887)

There's hardware to deal with that - a watchdog timer can reboot the system quickly.
Assuming the system comes back up with a working CPU and RAM, then the main computer should be able to work around bad peripherals or components on the bus. I think that's what the article is getting at.
On military aircraft, they use VMs to run the OS and software. Communication between systems is passed synchronously and requires that each module know the state of the other modules. There is never an assumption that the other system will just work - all messages require acknowledgement and verification of results.
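
As a rough sketch of the watchdog idea mentioned above, the following simulates a countdown timer that forces a reset when the software stops "kicking" it. Real watchdogs are hardware peripherals; the `Watchdog` class, the `run` helper, and the tick counts here are all hypothetical and exist only for illustration.

```python
class Watchdog:
    """Simulated watchdog: counts down once per tick and signals a
    reset if the software has not kicked it within the timeout."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.resets = 0

    def kick(self) -> None:
        # Healthy software calls this regularly to reload the counter.
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        # Called once per time step; returns True when a reset fires.
        self.remaining -= 1
        if self.remaining <= 0:
            self.resets += 1
            self.remaining = self.timeout_ticks
            return True
        return False


def run(healthy_ticks: int, total_ticks: int, timeout_ticks: int = 3) -> int:
    """Kick the watchdog only while the software is 'healthy' and
    return how many resets were forced over the whole run."""
    dog = Watchdog(timeout_ticks)
    for t in range(total_ticks):
        if t < healthy_ticks:
            dog.kick()
        dog.tick()
    return dog.resets
```

In this toy model the software never recovers after it hangs, so the watchdog keeps firing. On a real system the reset would restart the software, which would then resume kicking the timer.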

Re:Excuse me... not a programmer's fault. (Score:4, Informative)

by ChrisMaple ( 607946 ) writes: on Tuesday February 07, 2012 @06:04PM (#38960239)

There are many aspects to radiation hardness. Radiation can flip one or more bits, resulting in bad data or a program crash. Radiation can cause latchup, which will last until power is cycled; if the design is bad, latchup can fry a part. Rad hard parts are designed to be resistant to latchup. Really bad radiation can damage a part that isn't even powered.
A laptop can live through bit flips, and with luck it can live through latchup, and be functional after power cycling. Spacecraft control generally has to be always on; power cycling is not an option. Thus the design requirements for spacecraft control must be much stricter.
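
For the bit-flip case specifically, an error-correcting code can repair a single flipped bit without any power cycling. Below is a minimal sketch using the textbook Hamming(7,4) code, chosen here purely as an illustration; nothing in the thread says which code, if any, the probe used. The function names are assumptions for this example.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword.
    Codeword positions 1..7 hold: p1 p2 d0 p4 d1 d2 d3."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    # Bit i of the returned integer is codeword position i + 1.
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    # The syndrome is the XOR of the positions of all set bits; it is
    # zero for a valid codeword and equals the position of a single error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the faulty bit back
    d = [bits[2], bits[4], bits[5], bits[6]]
    return sum(bit << i for i, bit in enumerate(d))
```

This corrects one flipped bit per 7-bit codeword. Two flips in the same codeword would be miscorrected, which is one reason practical memories use stronger codes plus periodic scrubbing.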

Re:So how much? (Score:2, Informative)

by Anonymous Coward writes: on Tuesday February 07, 2012 @06:33PM (#38960537)

I have worked (not for long) as an electrical engineer in a team developing electronics for scientific instruments mounted aboard space probes, rovers, etc. This means interplanetary travel and operation, so this is the kind of place where you definitely want to use rad-hard components, unlike low orbit where you are still well within the magnetosphere. The Phobos-Grunt orbit-boosting stage had no good reason to use hardened components.
Concerning prices: I have done some design/prototyping but I wasn't involved with the procurement process of flight-qualified rad-hard components, so what I know is from discussion with colleagues. First, lead times can reach one year, even for quite basic components. Then, the cheapest rad-hard discrete MOSFET from International Rectifier (which is basically the only rad-hard MOSFET manufacturer - there is no room for competition in such a small market as rad-hard components) is in the vicinity of 400 €. And this is no high-power transistor, but the closest equivalent (although with higher specs most often not needed) to the 2N2222, the most basic low-power, logic-level MOSFET ever that you can buy for a few cents. The price ratio is more around 1000 here...

Re:Excuse me... not a programmer's fault. (Score:5, Informative)

by bughunter ( 10093 ) writes: <bughunter&earthlink,net> on Tuesday February 07, 2012 @07:44PM (#38961229) Journal

Speaking as another EE with experience in rad-hard, space-qualified design: he's not being self-contradictory. He's spot on.
If your CMOS structures are prone to latchup in the presence of single high-energy events, then shielding does you no good. The amount of shielding necessary would more than consume the entire payload mass budget. Adding insufficient shielding just creates showers of secondary particles, each with more than enough energy to cause latchup alone, therefore leaving you statistically worse off than with no shielding whatsoever.
With this in mind, rad-hard design means designing the CMOS structure so that shielding is unnecessary. For example, build your circuits on bulk insulators instead of bulk semiconductor.
Just because you can't understand it doesn't mean he's self contradictory. You just missed his point. And then attacked him.
