Programming Error Doomed Russian Mars Probe (276 comments)

Posted by Soulskill on Tuesday February 07, 2012 @03:25PM from the to-infinite-loops-and-beyond dept.

astroengine writes "So it turns out U.S. radars weren't to blame for the unfortunate demise of Russia's Phobos-Grunt Mars sample return mission — it was a computer programming error that doomed the probe, a government board investigating the accident has determined." According to the Planetary Society Blog's unofficial translation and paraphrasing of the incident report, "The spacecraft computer failed when two of the chips in the electronics suffered radiation damage. (The Russians say that radiation damage is the most likely cause, but the spacecraft was still in low Earth orbit beneath the radiation belts.) Whatever triggered the chip failure, the ultimate cause was the use of non-space-qualified electronic components. When the chips failed, the on-board computer program crashed."

This discussion has been archived. No new comments can be posted.

Excuse me... not a programmer's fault. (Score:5, Insightful)

by LostCluster ( 625375 ) * writes: on Tuesday February 07, 2012 @03:25PM (#38958015)

We've got a contradictory summary here. Chip failure isn't a programming fault, it's a hardware problem. Stop confusing hardware and software you insensitive clod.

- Re: (Score:3, Insightful)
  
  by Anonymous Coward writes:
  
  Obviously the error handling routine was poorly written.
  - Re:Excuse me... not a programmer's fault. (Score:5, Funny)
    
    by Anonymous Coward writes: on Tuesday February 07, 2012 @03:31PM (#38958143)
    
    sure, it missed:
    if (cpu_melted)
        abort();
    
    - Re:Excuse me... not a programmer's fault. (Score:5, Funny)
      
      by MSesow ( 1256108 ) writes: on Tuesday February 07, 2012 @03:39PM (#38958267)
      
      That could throw a ProcessorNotFoundException, be sure to code accordingly.
      
      - Re: (Score:3, Funny)
        
        by Anonymous Coward writes:
        
        The Linux kernel throws an error about unsupported CPUs; how that code manages to execute in the first place is a mystery.
        
        Re:Excuse me... not a programmer's fault. (Score:4, Funny)
        
        by tripleevenfall ( 1990004 ) writes: on Tuesday February 07, 2012 @04:08PM (#38958725)
        
        In Soviet Russia, code executes you!
        
        
        Re: (Score:3, Informative)
        
        by Anonymous Coward writes:
        
        In that case, the primary CPU is already up and running; it's booting additional processors.
    - Re:Excuse me... not a programmer's fault. (Score:5, Funny)
      
      by wjsteele ( 255130 ) writes: on Tuesday February 07, 2012 @04:16PM (#38958825)
      
      Actually, that code worked perfectly!!!
      
      Bill
      
  - TFS - obviously written by a hardware guy (Score:3)
    
    by Thud457 ( 234763 ) writes:
    
    "Cosmic rays?"
    "That's a software problem..."
    They're lucky those chips they bought from China weren't made of lead and didn't contain deadly melamine!!!
    - Re:TFS - obviously written by a hardware guy (Score:5, Interesting)
      
      by sconeu ( 64226 ) writes: on Tuesday February 07, 2012 @04:17PM (#38958847) Homepage Journal
      
      You laugh, but how many of you low-level guys have had to work around buggy hardware?
      I once sent a memo to my boss that I was doing the equivalent of "working around a burnt out lightbulb in software".
      E.g.: How many hardware guys does it take to change a lightbulb? None, we'll just have the software work around it.
      
      - Re:TFS - obviously written by a hardware guy (Score:5, Informative)
        
        by mevets ( 322601 ) writes: on Tuesday February 07, 2012 @04:50PM (#38959293)
        
        Try this one on your hardware guys:
        "The main purpose of software is to make hardware reliable".
        Drives them nuts...
        
      - Re:TFS - obviously written by a hardware guy (Score:5, Interesting)
        
        by garyebickford ( 222422 ) writes: <gar37bic.gmail@com> on Tuesday February 07, 2012 @07:22PM (#38961039)
        
        Not even necessarily low level. I once had a weird intermittent problem in a PHP-driven web system. After a couple of weeks of diagnosing (largely trying to find a case that could more-or-less reliably tickle the bug), it turned out to be an interaction of a bug in the Red Hat version of that day (2001) with a bug in the particular CPU we were using. PHP code just happened to trigger it under certain conditions. Since the box was at Level 3, we had to drive an hour down there and replace the machine.
        And long ago I worked on Perq workstations, which had a stack-machine CPU (the CPU was a 15x15 inch board filled with TTL). The expression stack was four chips. The system was designed around the chip spec - NEVER DO THAT!!! Chips cannot be depended on to run at exactly the design spec - some are slow, some are fast. As a result, every CPU had to be tested at installation with those four chips inserted in different locations, essentially in order of speed. If a fast one came after a slow one in the slots, the CPU would barf. Basically someone just kept swapping chips around until it worked.
        We were just discussing some of the remarkable repairs done in software to accommodate problems in various interplanetary probes - truly amazing stuff.
        
    - Re: (Score:2)
      
      by jd2112 ( 1535857 ) writes:
      
      "Cosmic rays?" "That's a software problem..."
      They're lucky those chips they bought from China weren't made of lead and didn't contain deadly melamine!!!
      If they were made of lead they might have blocked enough radiation to prevent them from crashing.
  - Re:Excuse me... not a programmer's fault. (Score:5, Interesting)
    
    by icebike ( 68054 ) * writes: on Tuesday February 07, 2012 @03:54PM (#38958521)
    
    Obviously the error handling routine was poorly written.
    I'll assume your tongue was firmly planted in your cheek, and suggest a +1 Funny mod.
    But on the chance you were serious, depending on where that chip was, it may have been beyond something manageable by software.
    A chip in a power controller could take down any or all of the processor components, or render access to control circuits impossible.
    The linked article also states
    Everything was working well with the spacecraft immediately after launch, including deployment of the solar panels, until the command to start the engines was issued. When that did not happen, the spacecraft went into a safe mode, keeping the solar panels pointed to the Sun to maintain power.
    How many times do you suppose they actually tested engine start IN THE SPACE CRAFT? I'm guessing ZERO.
    non-space qualified parts being used in some of the electronics circuits. This is a design failure by the spacecraft engineers that might have been caught had they performed adequate component and system testing prior to flight. But they did not.
    
    So design failure, due to radiation, prior to the craft getting near the strongest radiation belts. Unbelievable. Occam would be skeptical.
    This sounds to me like some on-board internal source of radiation, or induction, or simple overload, fried a chip somewhere in some unspecified circuitry, most probably in the engine controls. This seems far more likely than an external radiation source given the shielding the physical design would provide.
    I doubt space qualification made any difference at all. The window for space radiation in the brief time it was operational was small.
    Rather I suspect under-spec parts, overvoltage or high current draw, or internal shielding oversights.
    
    - Re: (Score:3)
      
      by Rakishi ( 759894 ) writes:
      
      How many times do you suppose they actually tested engine start IN THE SPACE CRAFT? I'm guessing ZERO.
      I'm sure they tested the engine multiple times. I'd figure the stress of the launch (vibrations, etc.) caused something to fail, either due to shoddy construction or small debris falling onto something.
      I doubt space qualification made any difference at all. The window for space radiation in the brief time it was operational was small.
      Exactly. I doubt all those laptops on the ISS are radiation hardened but they last quite a while anyway.
      - Re: (Score:3)
        
        by icebike ( 68054 ) * writes:
        
        How many times do you suppose they actually tested engine start IN THE SPACE CRAFT? I'm guessing ZERO.
        I'm sure they tested the engine multiple times. I'd figure the stress of the launch (vibrations, etc.) caused something to fail, either due to shoddy construction or small debris falling onto something.
        I'm sure they tested the engines too. It's probably a tried and true engine. The Russians tend to make very good motors.
        But I seriously doubt they tested it in the space craft using the space craft's wiring harness. They used the harness on the test bed platform.
      - Re:Excuse me... not a programmer's fault. (Score:4, Informative)
        
        by ChrisMaple ( 607946 ) writes: on Tuesday February 07, 2012 @06:04PM (#38960239)
        
        There are many aspects to radiation hardness. Radiation can flip one or more bits, resulting in bad data or program crash. Radiation can cause latchup, which will last until power is cycled; if the design is bad, latchup can fry a part. Rad hard parts are designed to be resistant to latchup. Really bad radiation can damage a part that isn't even powered.
        A laptop can live through bit flips, and with luck it can live through latchup, and be functional after power cycling. Spacecraft control generally has to be always on; power cycling is not an option. Thus the design requirements for spacecraft control must be much stricter.
        
      - Re:Excuse me... not a programmer's fault. (Score:4, Interesting)
        
        by robot256 ( 1635039 ) writes: on Tuesday February 07, 2012 @07:59PM (#38961347)
        
        Actually, darwin is kind of right. The difference between 120nm transistors and 45nm transistors is quite substantial. Between random radiation, natural wear due to thermal cycling, and periodic electrostatic discharges from handling and plugging in connectors, it is not surprising that the older chips are sturdier in general.
        But he may have just invoked the "They don't make them like they used to" logical fallacy, because sure there are some 20-year-old SNES machines, but how many of them died 2 years after production? Compare that percentage to the figure for PS3s and you have your answer.
        
    - - Re:Excuse me... not a programmer's fault. (Score:5, Funny)
        
        by pixelpusher220 ( 529617 ) writes: on Tuesday February 07, 2012 @06:24PM (#38960437)
        
        Except no one knows for certain the computers crashed at all.
        I'm quite sure that the computers crashed. Right along with the spacecraft ;-)
        
    - - Re:Excuse me... not a programmer's fault. (Score:5, Informative)
        
        by bughunter ( 10093 ) writes: <bughunter@@@earthlink...net> on Tuesday February 07, 2012 @07:44PM (#38961229) Journal
        
        As another EE with experience in rad hard space qualified design, he's not being self-contradictory. He's spot on.
        If your CMOS structures are prone to latchup in the presence of single high energy events, then shielding does you no good. The amount of shielding necessary would more than consume the entire payload mass budget. Adding insufficient shielding just creates showers of secondary particles, each with more than enough energy to cause latchup alone, therefore rendering you at a statistical loss compared to no shielding whatsoever.
        Designing with this in mind means building the CMOS structure so that shielding is unnecessary. For example, build your circuits on bulk insulators instead of bulk semiconductor.
        Just because you can't understand it doesn't mean he's self-contradictory. You just missed his point. And then attacked him.
        
        
        Re: (Score:3)
        
        by icebike ( 68054 ) * writes:
        
        100 times smaller in area per bit? Which makes it 100 times more susceptible,
        Or 100 times less susceptible assuming a random dispersal of cosmic rays. Smaller targets.
        Depends on the density of the rays I suppose.
        But in any case, that amount of errors WOULD be noticed if it were in fact occurring and going undetected and uncorrected
        by the hardware. Just about zero memory goes unused in the modern computer. They strive to use it all in one way or
        another. Unused memory is wasted memory.
        Computers correct for these errors. Parity checking either in hardware or software. You can compare
  - - - Comment removed (Score:5, Interesting)
        
        by account_deleted ( 4530225 ) writes: on Tuesday February 07, 2012 @09:06PM (#38961891)
        
        Comment removed based on user account deletion
        
        
        Re:Excuse me... not a programmer's fault. (Score:4, Interesting)
        
        by EETech1 ( 1179269 ) writes: on Wednesday February 08, 2012 @05:25AM (#38964619)
        
        I asked one of the main AVR designers from Norway if it was ok to set a configuration, or a constant in RAM during initialization and trust with 100% certainty that it would not change during operation. He said that even on the world's cleanest power supply, and in the absence of any EMI, he would still NOT recommend it.
        If you run 10 AVRs for 1000 hours you will see bits flipped. Many times it only affects a RAM variable that is constantly being recalculated anyways, so it causes little if any disruption to the operation of the device.
        It really sucks when it's something critical like a timer counter control register.
        If anyone would like to duplicate my testing, I'd be glad to send code, but all you have to do is set everything to a known value, and then read it over and over til it changes. It doesn't take as long as you'd think (or hope) it would! It also gives you a good idea of how well your PCB takes care of your Micro.
        Always check, and if necessary, reset your hardware configs during runtime! Those "all of a sudden it started acting up, so I turned it off and back on again and it was fine" problems just disappear!
        I still remember the time my CON_0 register read 8! Although I'm sure it'll happen again, you'll never notice it!
        Cheers
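
The check-and-reset advice in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: `ConfigScrubber`, `flip_bit`, and the `TIMER_CTRL` register name are hypothetical, the "registers" are a plain dictionary rather than memory-mapped hardware, and on a real microcontroller the golden copy would normally live in flash or ROM.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigScrubber:
    """Keeps a golden copy of configuration values and repairs drift.

    Hypothetical illustration: `registers` stands in for hardware
    configuration registers held in volatile memory.
    """
    golden: dict
    registers: dict = field(default_factory=dict)
    repairs: int = 0

    def __post_init__(self):
        # Initialise the live registers from the golden copy.
        self.registers = dict(self.golden)

    def scrub(self):
        """Compare every register with its golden value and rewrite mismatches.

        Returns the names of the registers that had to be repaired.
        """
        repaired = []
        for name, expected in self.golden.items():
            if self.registers.get(name) != expected:
                self.registers[name] = expected
                repaired.append(name)
        self.repairs += len(repaired)
        return repaired


def flip_bit(value, bit):
    """Simulate a single bit upset by toggling one bit of an integer."""
    return value ^ (1 << bit)


scrubber = ConfigScrubber(golden={"CON_0": 0x00, "TIMER_CTRL": 0x05})
# Simulate the upset from the comment: CON_0 unexpectedly reads 8.
scrubber.registers["CON_0"] = flip_bit(scrubber.registers["CON_0"], 3)
first_pass = scrubber.scrub()   # finds and repairs CON_0
second_pass = scrubber.scrub()  # nothing left to repair
```

In firmware the `scrub` call would typically run from the main loop or a timer tick, so a flipped configuration bit is corrected within one period instead of lingering until the next power cycle.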
        
- Re:Excuse me... not a programmer's fault. (Score:5, Informative)
  
  by Cochonou ( 576531 ) writes: on Tuesday February 07, 2012 @03:37PM (#38958221) Homepage
  
  Well... if you read TFA (or actually the first TFA linked), it is clearly written:
  In a report to be presented to Russian Deputy Prime Minister Dmitry Rogozin on Tuesday, investigators concluded that the primary cause of the failure was "a programming error which led to a simultaneous reboot of two working channels of an onboard computer" [...] Likewise, cosmic rays and/or defective electronics are not the leading suspects behind Phobos-Grunt’s demise.
  The summary is clearly bolting together two contradicting reports.
  
  - - Re:Excuse me... not a programmer's fault. (Score:5, Funny)
      
      by Anonymous Coward writes: on Tuesday February 07, 2012 @03:52PM (#38958485)
      
      This has nothing to do with reading TFA. It has everything to do with the summary
      You just defined all of slashdot. What was your point again?
      
- Re: (Score:2)
  
  by Rary ( 566291 ) writes:
  
  The second link makes the following claim:
  In a report to be presented to Russian Deputy Prime Minister Dmitry Rogozin on Tuesday, investigators concluded that the primary cause of the failure was "a programming error which led to a simultaneous reboot of two working channels of an onboard computer," the Russian state-owned news agency RIA Novosti reported.
  However, the third link says nothing of the sort. It sounds like TFS is just a mishmash of conflicting theories from different articles.
  - Re:Excuse me... not a programmer's fault. (Score:4, Interesting)
    
    by Rary ( 566291 ) writes: on Tuesday February 07, 2012 @03:45PM (#38958359)
    
    To follow up, the article saying that it was a chip failure is dated yesterday, while the article claiming it was a programming failure is dated today. Presumably, this is new information to shoot down the previous claims, but TFS (in typical Slashdot "editorial" style) fails to actually make that distinction, and puts both claims together as part of a single summary.
    
- Re: (Score:2)
  
  by MindStalker ( 22827 ) writes:
  
  Chip failure, but it was a software error that led to not handling the chip failure gracefully. Space qualified stuff has to be much more redundant and capable of handling failures of multiple components.
  - Re: (Score:3)
    
    by 0123456 ( 636235 ) writes:
    
    A while back I read some interesting discussions between satellite engineers about the tradeoffs between space qualified and not space qualified chips. From what I remember you gain resistance to radiation, but lose in other areas such as resistance to physical damage (e.g. a solder joint coming loose due to launch vibrations) because they're so far behind the state of the art that you may have to put a lot more chips on the same circuit board.
    So it doesn't seem a clear-cut choice... rebooting the computer
    - Re: (Score:2)
      
      by Grishnakh ( 216268 ) writes:
      
      I'm not a satellite engineer, but wouldn't it be easy enough to just install a lead shield around the PCB to protect from most radiation? As long as the shield's not too thick, it shouldn't add too much weight, especially compared to using older-technology chips that'll take up more board space.
      - Re:Excuse me... not a programmer's fault. (Score:5, Informative)
        
        by K. S. Kyosuke ( 729550 ) writes: on Tuesday February 07, 2012 @05:06PM (#38959515)
        
        I'm not a satellite engineer, but wouldn't it be easy enough to just install a lead shield around the PCB to protect from most radiation? As long as the shield's not too thick, it shouldn't add too much weight, especially compared to using older-technology chips that'll take up more board space.
        Well, that depends. Even on Earth's surface, we have to use ECC in more demanding applications. In LEO, you lose the protection of the atmosphere but you still have Earth's rather strong and large magnetosphere. But this was an interplanetary probe. Once you get out of the radiation belts, interstellar and intergalactic particles start hitting you. You can't protect from those with a lead shield of any reasonable size. Pretty much the only way is simply to make the chip simple, rugged and design it with components (transistors) large enough that a particle flying through won't bother you much. Or add redundancy. Or both, if possible (that's the usual case).
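
One common form of the redundancy mentioned here is majority voting across three copies of a value (triple modular redundancy). The sketch below is a generic illustration, not a description of how Phobos-Grunt or any particular spacecraft did it; the function names `majority_vote` and `bitwise_vote` are made up for the example.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by a strict majority of redundant channels.

    Raises ValueError when no value has a strict majority, which a real
    system would have to treat as an unrecoverable disagreement.
    """
    if not readings:
        raise ValueError("no readings to vote on")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        raise ValueError("no majority among redundant channels")
    return value


def bitwise_vote(a, b, c):
    """Bit-by-bit majority of three integer words (a classic TMR voter)."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011_0010
upset = good ^ (1 << 5)                 # one channel suffers a single bit flip
voted_word = bitwise_vote(good, upset, good)
voted_value = majority_vote([42, 42, 7])
```

The bitwise voter masks any single corrupted copy, and even tolerates flips in two copies as long as they hit different bit positions. It cannot help when two copies are wrong in the same bit, which is why voting is usually combined with the larger, more rugged transistors described above.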
        
      - Re:Excuse me... not a programmer's fault. (Score:5, Insightful)
        
        by ChrisMaple ( 607946 ) writes: on Tuesday February 07, 2012 @06:29PM (#38960487)
        
        Many chips are never designed to meet military or space specifications: the extra certification is very, very expensive and there are design compromises between performance and ruggedness. Furthermore, the testing you suggest for space qualification, if failed, results not in a mil-spec component but a component that has been destroyed by the test. In some cases, samples of a given batch are heavily tested to verify the batch, but those devices are considered damaged and not sold.
        Some rad hard type devices are of no interest to consumer design due to the poor performance caused by the compromises involved in achieving hardness. Rad hard devices aren't designed as often due to the small market, and the design is more difficult and takes longer, and certification takes time, too. Thus, the devices are older technology. Additionally, rad-hard parts (the actual transistors inside the ICs) are bigger physically than conventional devices, which also means they can be fabricated on older technology equipment. Thus, with respect to current commercial technology, space-qualified devices are often older technology.
        
      - Re: (Score:3)
        
        by Grishnakh ( 216268 ) writes:
        
        Maybe they should try magnetic shielding [thespacereview.com]. For a human spacecraft, it'd be quite an undertaking, but for protecting a small electronics module, maybe it wouldn't be so difficult.
- Re: (Score:3)
  
  by icebike ( 68054 ) * writes:
  
  The second link in summary leads to an article that is internally contradictory. That page from Discovery News is all over the place.
  Which is not surprising given the bio of the author [discovery.com]:
  Klotz came to Brevard County, Fla. (aka The Space Coast) as a copy editor for the local paper 24 years ago. She switched to writing because it was obvious the reporters were having way more fun than the editors for the same money. After a year or so of writing for the business section,
  Journalism major trying to wear the big girl shoes.
  The link to the Planetary Society page seems much more reliable.
- Re:Excuse me... not a programmer's fault. (Score:4, Funny)
  
  by smcdow ( 114828 ) writes: on Tuesday February 07, 2012 @04:06PM (#38958699) Homepage
  
  You can't possibly call yourself a programmer if your code can't recover from a hardware fault.
  
  - Re: (Score:2)
    
    by VortexCortex ( 1117377 ) writes:
    
    You can't possibly call yourself a programmer if your code can't recover from a hardware fault.
    I agree.
    "Beware of programmers who carry screwdrivers." - Leonard Brandwein
  - Re:Excuse me... not a programmer's fault. (Score:4, Funny)
    
    by Beardo the Bearded ( 321478 ) writes: on Tuesday February 07, 2012 @05:31PM (#38959827)
    
    Amateur. My software is so good it doesn't even NEED hardware.
    
    - Re: (Score:3)
      
      by pixelpusher220 ( 529617 ) writes:
      
      Bah. My software turns hardware INTO software! Mostly molten pools....
- Re: (Score:2)
  
  by geekoid ( 135745 ) writes:
  
  If a software failover fails, and the currently used chip fails, then it's both.
  Please, do some low-level software/hardware work before opening your mouth.
  This isn't one of your slapped-together VB3 front ends.
  Yeah, YOU Herd me.
- Re: (Score:2)
  
  by gatkinso ( 15975 ) writes:
  
  Space rated hardware, software, and (more relevantly) firmware is designed to handle this type of problem (to the fullest extent possible).
- Re: (Score:3, Insightful)
  
  by alienzed ( 732782 ) writes:
  
  On the other hand, this demonstrates so aptly why they failed in the first place. "Yep, it's a software problem, because the hardware failed to run any after it was damaged."
- Re: (Score:3)
  
  by jamstar7 ( 694492 ) writes:
  
  At least they didn't fuck up a meters-to-feet conversion.
- Re:Excuse me... not a programmer's fault. (Score:4, Interesting)
  
  by crutchy ( 1949900 ) writes: on Tuesday February 07, 2012 @04:39PM (#38959169)
  
  to my knowledge, only the Apollo Guidance Computer has ever truly achieved hardware failure tolerance. the Apollo 11 LM radar fault overloaded the computer, but it was able to continue thanks to restart logic built into the AGC that picked up critical tasks from where they were when the computer was restarted and dropped non-critical tasks, and all with a very small fraction of the capabilities of current technology (although I think from memory they were able to fit 2 transistors on a single chip!). the AGC is really a marvel of (past) engineering and computer science. the reliability problem alone would be insurmountable with today's garbage. probably part of the reason why we haven't been back there since.
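
The restart behaviour described in this comment (keep critical tasks where they left off, drop the rest) can be sketched as follows. This is a toy model written for illustration only, not the AGC's actual implementation; `Task`, `restart`, and the task names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    critical: bool
    checkpoint: int = 0   # last completed step, saved as the task runs

    def step(self):
        """Do one unit of work and record the progress made."""
        self.checkpoint += 1


def restart(tasks):
    """Model a software restart: keep critical tasks, drop the rest.

    Critical tasks keep their checkpoint, so they resume where they
    stopped instead of starting over.
    """
    return [t for t in tasks if t.critical]


tasks = [
    Task("guidance", critical=True),
    Task("display_update", critical=False),
    Task("radar_poll", critical=False),
]
for t in tasks:
    for _ in range(3):
        t.step()

survivors = restart(tasks)   # overload detected, so the system restarts
survivors[0].step()          # guidance continues from its saved checkpoint
```

The design point is that progress is recorded as the work happens, so a restart costs only the non-critical work rather than everything done so far.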
  
Programming error? (Score:5, Funny)

by mehrotra.akash ( 1539473 ) writes: on Tuesday February 07, 2012 @03:28PM (#38958075)

the ultimate cause was the use of non-space-qualified electronic components
Programming error?
Perhaps in the software used to order the parts

headline fail (Score:3, Informative)

by jamessnell ( 857336 ) writes: on Tuesday February 07, 2012 @03:29PM (#38958095) Homepage

"the ultimate cause was the use of non-space-qualified electronic component" != "programming error" hardware fail.

- Re: (Score:2)
  
  by X0563511 ( 793323 ) writes:
  
  Even better... a design fail! The hardware worked (or not) as per its specifications. It's not the hardware's fault you put it where it wasn't meant to go!
  - - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      They probably just had someone ordering parts that didn't know to order mil spec (I'm assuming mil spec is fine for space stuff)
      No, not even close. "Mil spec" is basically industrial grade with a slightly extended temperature range. Radiation hardened stuff is a completely different ballpark.
      - Re: (Score:2)
        
        by sjames ( 1099 ) writes:
        
        Sometimes mil spec isn't even extended at all, but just has more rigorous testing to make sure it's within the standard specs.
    - Re: (Score:3)
      
      by Tastecicles ( 1153671 ) writes:
      
      mil spec isn't proofed against hard radiation; it does some soft radiation and EM not quite up to airburst-strength pulse. Space spec has to withstand high energy radiation such as Cosmic, X- and Gamma rays way beyond what you'd encounter 5 miles below a thermonuclear burst, otherwise it'll get outside the VA belts and simply die.
    - Re:headline fail (Score:5, Funny)
      
      by smitty97 ( 995791 ) writes: on Tuesday February 07, 2012 @03:49PM (#38958431)
      
      (I'm assuming mil spec is fine for space stuff)
      You don't happen to work at the Russian Space Agency purchasing department, do you?
      
- Re: (Score:3)
  
  by geekoid ( 135745 ) writes:
  
  A) Some hardware has software embedded into it, yeah shocking.
  B) Parts fail in space craft. If the software failed to detect a failed piece and roll over to the backup, the software has its role in the incident as well.
  C) If it jumped to the wrong mode after the error, that's also a software error.
  I'm not saying one way or another in the specific incident. The idea that there is a hard line between all software and hardware is false, and technical people should know better.
  - - Re: (Score:2)
      
      by yurtinus ( 1590157 ) writes:
      
      In an embedded system - particularly a critical system - you usually have software aware of the state of interfacing hardware. Additionally, you should have some redundant systems so you can handle a hardware fault on one of them. The article says "two chips failed," with no further details. I'd assume the guys calling it a software error are doing so for a reason - likely those chips were part of some databus interface, D/A or A/D converter, or something that the software *talks* to (as opposed to runs on)
- Re: (Score:2)
  
  by jd2112 ( 1535857 ) writes:
  
  Programmer didn't include if($component.SpaceRating != TRUE) {throw "INITIALIZATION ERROR: NON SPACE RATED COMPONENT!"}
So how much? (Score:3)

by cvtan ( 752695 ) writes: on Tuesday February 07, 2012 @03:31PM (#38958145)

How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?

- Re:So how much? (Score:5, Funny)
  
  by Spykk ( 823586 ) writes: on Tuesday February 07, 2012 @03:50PM (#38958447)
  
  Not even the government could save money by buying something at Radio Shack.
  
- Re:So how much? (Score:4, Informative)
  
  by stewbee ( 1019450 ) writes: on Tuesday February 07, 2012 @03:53PM (#38958499)
  
  If only. The reason ICs cost so little is that the cost is spread out over millions of parts. As my analog circuits prof would say, "Your very first IC off the line is going to cost a million dollars. Everything else after that is free." So to buy one or two ICs that are radiation hardened is probably going to cost that much since it will most likely be custom. Now that's not to say they can't reuse some of the masks for an existing IC to make it cheaper, but it won't be that much cheaper. My guess is that they would want to redesign the part anyway if it is going to be in a radiation intense environment. The radiation could cause some weird quantum effects in the IC that might mean they want the transistors to be larger for reliability purposes. But that last part is just a guess since I am not an IC designer and thought my electronic materials class was nothing short of voodoo.
  
  Long story short, they probably saved more than $5 for using a COTS part, but they probably lost the probe by the part not being radiation hardened.
  
  - Re:So how much? (Score:4, Interesting)
    
    by autophile ( 640621 ) writes: on Tuesday February 07, 2012 @04:48PM (#38959281)
    
    For want of a rad-hard chip, the board died.
    For want of a board, the software couldn't cope.
    For want of good software, the engine start failed.
    For want of engine start, the probe died.
    For want of a probe, the human race didn't detect the slimy aliens from Phobos and all perished in a hot and somewhat greasy fireball.
    
- Re: (Score:2)
  
  by John Bresnahan ( 638668 ) writes:
  
  How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?
  Based on my last visit to Radio Shack, I don't think their parts are any cheaper than the special-purpose, radiation-hardened parts they should have used.
  But when you can't wait until tomorrow for a part for your space probe, Radio Shack is convenient.
  - Re: (Score:2)
    
    by yurtinus ( 1590157 ) writes:
    
    I dunno, seems to me it'd be quicker just to order your parts from Digikey instead of going to Radio Shack, buying a cell phone and contract, then dismantling the phone to desolder the part you need (and hope you didn't bust the part in the process)... Sure, Radio Shack is convenient for a lot of things, as long as all of those things are cell phones and expensive Ethernet cables.
- Re: (Score:3)
  
  by systemeng ( 998953 ) writes:
  
  When I worked in the test equipment industry, we had a term for the lowest grade of parts that still worked when binning components: The radio shack bin. I once built part of an emergency prototype for a test equipment cooling system with radio shack parts. The prototype was sent to Taiwan where it failed prematurely due to the marginal components. Never Again!
- Re: (Score:3)
  
  by K. S. Kyosuke ( 729550 ) writes:
  
  How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?
  This is not the first time something like this happened to the Russians. In the 1970's, the Soviet Mars 4 [wikipedia.org] probe failed in flight. The reason? Due to cost savings, the transistors used had had their gold parts replaced with aluminium ones, which were prone to chemical degradation (a.k.a. corrosion). The Soviets then realized that they had manufactured three more probes of the same series using the same (unfit) transistors. Now what did they do? Of course they launched them! Guess what happened? Mars 5 failed
- Re:So how much? (Score:5, Interesting)
  
  by jd ( 1658 ) writes: <imipak@nOsPaM.yahoo.com> on Tuesday February 07, 2012 @04:40PM (#38959187) Homepage Journal
  
  Space Micro [spacemicro.com] doesn't list the prices of their components or systems, nor can I find any from anyone else. Honeywell [honeywellm...ronics.com] don't list their prices either. Atmel seem to have dropped out of the field. Linear [linear.com] don't list the prices for their space-hardened stuff. Don't see any for BAE [baesystems.com] either, or Intersil [intersil.com]. Empire Magnetics [empiremagnetics.com] require a lot of personal data before they give you access to even the price classification information. Not the prices, just how they're classified.
  You've got to allow for a year's worth of traveling outside of an atmosphere and then operating on Mars for the duration of the mission. This analysis of radiation for manned missions [esa.int] suggests you're looking at 3.5 mSv per day, then 20 rems per year [solarstorms.org] in most of the places of interest.
  Converting everything to rads, it's 0.1 rads per mSv and 1 rad per rem, so that's 127.75 rads to get to Mars if you assume a year-long trip (3.5 mSv per day for 365 days), plus 20 rads for the mission, so anything with a rating of less than 147.75 rads is pretty much guaranteed to fail. However, over the course of two years, the odds of there being a [hps.org] solar flare [nasa.gov] are not insignificant. To be safe, you want resistance to a further 400 rad. 547.75 rad is within the tolerance of most of the space-hardened components (some components can be taken up to 1000 rad, others up to 10,000). However, the cheapest space components would NOT survive. You're talking high-end on the space scale.
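  A minimal sketch of that dose arithmetic, for anyone who wants to rerun it with other inputs. It uses the quoted figures (3.5 mSv per day in transit, 20 rem per year on site, a 400 rad flare allowance). The 365-day transit, the one-year mission default, and treating 1 rem as 1 rad are assumptions made for illustration; with them the transit term works out to about 127.75 rad.

```python
# Rough dose budget built from the figures quoted in the comment.
# Assumptions (not from any mission document): a 365-day transit and
# a quality factor of 1, so that 1 rem is treated as 1 rad.

MSV_PER_DAY_IN_TRANSIT = 3.5   # quoted transit dose rate
RAD_PER_MSV = 0.1              # 1 mSv = 0.1 rem, taken here as 0.1 rad
TRANSIT_DAYS = 365             # assumed year-long trip
SITE_RAD_PER_YEAR = 20.0       # quoted 20 rem/year, taken here as 20 rad
FLARE_MARGIN_RAD = 400.0       # quoted allowance for a solar flare


def dose_budget(mission_years: float = 1.0) -> dict:
    """Return the transit, mission, baseline, and flare-inclusive doses in rad."""
    transit = MSV_PER_DAY_IN_TRANSIT * TRANSIT_DAYS * RAD_PER_MSV
    mission = SITE_RAD_PER_YEAR * mission_years
    baseline = transit + mission
    return {
        "transit": transit,
        "mission": mission,
        "baseline": baseline,
        "with_flare": baseline + FLARE_MARGIN_RAD,
    }


budget = dose_budget()
for name, rads in budget.items():
    print(f"{name:>10}: {rads:.2f} rad")
```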
  I'm going to figure that the top-line components will cost 100x that of their conventional counterparts, due to the higher-level of precision and QA that are required. It might well be a good deal more. In Russia, you've also got to pay for smuggling decent-grade hardware out of the US, as all of this stuff will be under massive amounts of regulation.
  My guess is that the cuts would have saved enough that those doing the cost-cutting could buy second homes in Switzerland.
  
  - - Re:So how much? (Score:4, Interesting)
      
      by jd ( 1658 ) writes: <imipak@nOsPaM.yahoo.com> on Tuesday February 07, 2012 @08:08PM (#38961419) Homepage Journal
      
      The links for International Rectifier, for those *#$% off with Congress and wanting to build their own damn Rover:
      Rad Hardened Single chip MOSFETs [irf.com]
      Rad Hardened Multi Chip MOSFETs [irf.com]
      Space-Rated DC-DC Converters [irf.com]
      Space-Rated Low RF Power DC-DC Converters [irf.com]
      Rad-Hardened Voltage Regulators (fixed) [irf.com]
      Rad-Hardened Voltage Regulators (variable) [irf.com]
      Rad-Hardened Gate Drivers [irf.com]
      Some of their other military/avionics stuff may be space-rated or rad-hardened but it doesn't say so.
      
Always Blame Software (Score:5, Insightful)

by invid ( 163714 ) writes: on Tuesday February 07, 2012 @03:33PM (#38958163)

Is it just me, or is it the responsibility of all software engineers to find the hardware problem in order to prove to people that the cause isn't software?

- Re: (Score:2)
  
  by Hognoxious ( 631665 ) writes:
  
  is it the responsibility of all software engineers to find the hardware problem in order to prove to people that the cause isn't software?
  Find someone else, I'm busy.
  In any case, it's usually orders of magnitude easier to blame the spec. It's written by management/users, after all...
- Re: (Score:3)
  
  by rwv ( 1636355 ) writes:
  
  In my experience... hardware problems are acceptable if there's a software work-around. Special acknowledgement isn't given to software for fixing hardware bugs... it's just expected since hardware is arguably more expensive to change.
Obligatory Armageddon quote (Score:2)

by Kinthelt ( 96845 ) writes:

Components. American components, Russian Components, ALL MADE IN TAIWAN!
http://www.imdb.com/title/tt0120591/quotes?qt=qt0459113 [imdb.com]
- Re: (Score:2)
  
  by Tastecicles ( 1153671 ) writes:
  
  Ob. Clancy (mis?)quote:
  "See? We have the best technology in our missiles, Tovarisch."
  "What does it say?"
  "Texas Instruments."
- Re: (Score:2)
  
  by K. S. Kyosuke ( 729550 ) writes:
  
  Hey, my movie beats your movie every day!
  No wonder this circuit failed. It says "Made in Japan". [imdb.com]
Contradictions (Score:5, Informative)

by Aladrin ( 926209 ) writes: on Tuesday February 07, 2012 @03:39PM (#38958265)

The summary is so contradictory because it quotes from 2 articles, and each of them is completely different. One says that the parts were space-tested and fine, and the other says they were never space-certified and were definitely bad. The first one says instead that a software bug caused parts of the system to reboot. The second doesn't know what happened and just blames faulty hardware.

- Re: (Score:2)
  
  by mbone ( 558574 ) writes:
  
  The summary is so contradictory because it quotes from 2 articles, and each of them is completely different.
  "A foolish consistency is the hobgoblin of little minds" (Emerson)
Sounds like an editor failure to me (Score:5, Funny)

by kbob88 ( 951258 ) writes: on Tuesday February 07, 2012 @03:43PM (#38958327)

In other news, U.S. radars were not responsible for the highly confusing and contradictory summary posted this morning to a Slashdot story about Russia's Phobos-Grunt probe. A thorough investigation has determined that the story's chips should have been able to withstand the radiation received when the story was transmitted through the intertubes and routed over northern Alaska. Instead, investigators blamed a typing failure on the story editors. "A series of tests showed that the editing was lousy and sloppy, and disciplinary action will be taken on those responsible," a spokesman said.

Obligatory... (Score:2)

by Cruciform ( 42896 ) writes:

In Soviet Russia probe causes programming bug!
They have very strict security measures. It can be traumatic.
What is it with Mars and probes? (Score:2)

by g0bshiTe ( 596213 ) writes:

What's with Mars and probes? Seriously, how many have been lost either going or coming from?
- how long does it take YOU to walk a mile? (Score:3)
  
  by Thud457 ( 234763 ) writes:
  
  Mars is 60,000,000 miles away.
  Phobos Grunt would have taken three years to get there.
  If it didn't die of dysentery on the journey there.
  - Re: (Score:2)
    
    by PlatyPaul ( 690601 ) writes:
    
    Radiation bites Phobos Grunt.
    Radiation bites Phobos Grunt.
    Phobos Grunt dies.
- Re: (Score:2)
  
  by geekoid ( 135745 ) writes:
  
  It's HARD.
  I mean, we have pretty much mapped every spot on the planet, yet airplanes still crash.
Staffing Error Doomed American Tech News Site (Score:5, Insightful)

by billcopc ( 196330 ) writes: <vrillco@yahoo.com> on Tuesday February 07, 2012 @03:51PM (#38958477) Homepage

Okay, we still have a respectable though dwindling community of commenters, so can we please get rid of these editors who can't even be bothered to read four lines of summary text before posting?
The headline and summary do not make sense. Come on, we're supposed to be nerds, aka intelligent, focused, attentive knowledge aggregators.
What the fuck is wrong with this goddamned site?! These failures are starting to make Digg look good!

- Re: (Score:2)
  
  by geekoid ( 135745 ) writes:
  
  No. You are welcome to go to the times and pay for a subscription that uses actual editors.
Fun to read the comments (Score:5, Insightful)

by vlm ( 69642 ) writes: on Tuesday February 07, 2012 @03:52PM (#38958479)

Fun to read the comments here. I've done embedded stuff and you need to be defensive. You can see at a glance who here has never done defensive programming before, or embedded or safety critical programming, all blaming the hardware. There's 3 states so you got 2 bits of input and a disallowed state comes in. Deal with it, don't just curl up and die and blame the hardware designer. There's a 12 bit A/D conversion result stored in two bytes, and there's a 14 bit number found there, deal with it don't just curl up and die and blame the ... . There's a cycle start button and an emergency stop button and both are simultaneously on. Deal with it. You reboot a mission critical (or safety critical!) CPU and a minor auxiliary input A/D doesn't initialize, do you burn the plant down in a woe is me pity party because one out of 237 sensors isn't coming on line, or do you deal with it?
Finally radiation is a statistical phenomenon. There is no such thing as radiation free. If they used non-rad hardened parts, it's gonna crash maybe 10000 times more often. That's OK, you program around that, assuming you know what you're doing. Radiation hardened does not equal radiation-proof. If there was a single bit error, or a latchup on a rad-hardened unit, with a poorly programmed control system it would have failed just as well, it's just that a rad hardened chip would have made it a couple orders of magnitude less likely. A shitty design that has a 1 in 20000 failure rate due to better hardware instead of 1 in 2 is still a shitty programming design, even if the odds are "good enough" that it makes it most of the time with the better hardware.
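The three "deal with it" cases above can be sketched roughly as follows. This is only an illustration in Python, not flight code: the `Mode` names, the choice to map the undefined bit pattern to `FAULT`, and the hold-last-good-value policy for the A/D reading are all assumptions made up for the example.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    """Three legal states packed into two bits (names are hypothetical)."""
    IDLE = 0b00
    RUN = 0b01
    FAULT = 0b10
    # 0b11 is deliberately left undefined.


ADC_MAX = 0x0FFF  # largest value a 12-bit converter can legitimately report


def decode_mode(bits: int) -> Mode:
    """Map a 2-bit field to a Mode; the disallowed pattern becomes FAULT."""
    try:
        return Mode(bits & 0b11)
    except ValueError:
        return Mode.FAULT


def read_adc(raw: int, last_good: int) -> Tuple[int, bool]:
    """Return (value, ok). An out-of-range sample is rejected and the
    last good value is held instead."""
    if 0 <= raw <= ADC_MAX:
        return raw, True
    return last_good, False


def may_start_cycle(start_pressed: bool, estop_pressed: bool) -> bool:
    """Emergency stop always wins, even if both buttons read as pressed."""
    return start_pressed and not estop_pressed
```

The point of the sketch is that every input has a defined outcome, including the ones the hardware "should never" produce.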

- - Re: (Score:3)
    
    by systemeng ( 998953 ) writes:
    
    You checksum memory with all processor cycles that are not dedicated to a specific task. If you detect a failure, you reload the system from read-only memory. . .
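    A minimal sketch of that scrubbing idea, in Python for readability. The class name, the use of CRC-32 as the checksum, and the "golden copy" standing in for read-only memory are assumptions for illustration; a real system would do this in firmware against actual ROM.

```python
import zlib


class ScrubbedMemory:
    """Working copy of an image plus a golden copy standing in for ROM."""

    def __init__(self, rom_image: bytes):
        self._rom = bytes(rom_image)            # immutable reference copy
        self._rom_crc = zlib.crc32(self._rom)   # checksum computed once
        self.ram = bytearray(self._rom)         # mutable working copy
        self.reloads = 0

    def scrub(self) -> bool:
        """Meant to run in otherwise idle time.
        Returns True if the working copy was intact."""
        if zlib.crc32(self.ram) == self._rom_crc:
            return True
        self.ram[:] = self._rom                 # reload from the golden copy
        self.reloads += 1
        return False


mem = ScrubbedMemory(b"flight software image")
mem.ram[3] ^= 0x10      # simulate a single-bit upset
print(mem.scrub())      # False: corruption detected and repaired
print(mem.scrub())      # True: image matches the golden copy again
```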
Baloney (Score:5, Interesting)

by mbone ( 558574 ) writes: on Tuesday February 07, 2012 @03:58PM (#38958573)

What are the chances chips would fail in a 20-30 minute period just after launch but before Mars transfer orbit insertion?
No, I bet this was a programming error, coupled with a near total failure to test the software.

- Re: (Score:2)
  
  by DerekLyons ( 302214 ) writes:
  
  What are the chances chips would fail in a 20-30 minute period just after launch but before Mars transfer orbit insertion?
  Small, but decidedly non-zero. So I should point out that "improbable" != "impossible".
Oh come on. (Score:2)

by JustAnotherIdiot ( 1980292 ) writes:

I read the title and I was going to make a joke about forgetting a ;, or something of the like.
But this wasn't a programming error, it was a hardware failure |:
Did the editor even read what he wrote?
- Re: (Score:2)
  
  by Fnord666 ( 889225 ) writes:
  
  Did the editor even read what he wrote?
  The editors no longer write or read anything. They just cut and paste. Submitters no longer write anything, they just copy the first paragraph or two of an article. I swear that some days all of the articles are probably just submitted by a very short perl script.
Top Ten reasons for failure of Mars Probe. (Score:4, Funny)

by walterbyrd ( 182728 ) writes: on Tuesday February 07, 2012 @04:40PM (#38959189)

Ripped from old David Letterman "Top Ten List"
10. "Mars probe? What Mars probe?"
9. Forgot to use The Club
8. Those lying weasels at Radio Shack
7. Too much Tang
6. Made by G.E.
5. Them Martians musta shot it down with a ray gun
4. Heh, heh, heh ... Our space probe sucks -- heh, heh, heh
3. At least we didn't blow all our money on some dork screwing around with a car phone
2. Remember Watergate? Well, Nixon's up to his old tricks again!
1. Space monkeys

- Top 10 reasons for failure of Mars Probe. (Score:4, Funny)
  
  by EnsilZah ( 575600 ) writes: <EnsilZah @ G m a i l .com> on Tuesday February 07, 2012 @06:38PM (#38960591)
  
  01 Hardware
  10 Software
  And it seems the article opted for 11 which is an undefined state.
  (Monospace used for effect)
  
Radiation Damage? (Score:3)

by funkboy ( 71672 ) writes: on Tuesday February 07, 2012 @06:24PM (#38960441) Homepage

Well, if there was an RTG [wikipedia.org] onboard, then maybe the radiation damage was from inside the spacecraft.
It seems strange to me that they'd blame radiation damage as they have a separate institution dedicated to developing rad-hard SPARC chips for space applications [wikipedia.org] that has a very successful track record.
Question: how do they know it was radiation damage if they never heard back from the probe?

Darn you Id Software! (Score:3)

by Darth Hubris ( 26923 ) writes: on Tuesday February 07, 2012 @07:10PM (#38960935)

Who saw "Doom", "Mars", and "Phobos" and reached for your shotgun?

- Re:How is "chip failure" a "programming error"? (Score:5, Funny)
  
  by Hognoxious ( 631665 ) writes: on Tuesday February 07, 2012 @03:47PM (#38958377) Homepage Journal
  
  A 4 digit ID and never heard of microcode.
  Seriously Gramps, the distinction between hardware and software isn't as clear cut as it was when shit was all powered by steam.
  
  - Re: (Score:3)
    
    by Capt.DrumkenBum ( 1173011 ) writes:
    
    Stop dissing Steam, it is the power source of the future. :)
    Also, get off my lawn.
  - Re: (Score:2)
    
    by geekoid ( 135745 ) writes:
    
    Problem came on board during the first SW:EP1 discussion, not any of the technical ones. Not that there was any real technical ones at the time.
  - Re: (Score:2)
    
    by invid ( 163714 ) writes:
    
    The turbines in a nuclear power plant are run by steam, sonny.
- Re:Description Fail (Score:5, Interesting)
  
  by expatriot ( 903070 ) writes: on Tuesday February 07, 2012 @03:47PM (#38958391)
  
  The Planetary Society entry says that two modules failed and then the main computer crashed. Probably irrelevant if the computer crashed or not if there were significant failures in the electronics. Perhaps if the computer had kept going there would have been some communication of what had gone wrong.
  One of the commenters wrote "It is rather unlikely radiation caused the failure. Russians said the failure was due to an SRAM WS512K32V20G24M from White Electronics. This part is a module containing 4 CY7C1049 chips from Cypress and is actually screened. While the Cypress part is very susceptible to Latchup," No idea if this is true or not.
  
  - Re: (Score:2)
    
    by Johann Lau ( 1040920 ) writes:
    
    this might be interesting for you and others (it's pretty much gibberish to me :D)
    http://russianspaceweb.com/phobos_grunt_aftermath.html [russianspaceweb.com]
    - - Re: (Score:3)
        
        by garyebickford ( 222422 ) writes:
        
        It's worth noting that the Space Shuttle's navigation system had three identical computers who all 'voted' on the result, and if one disagreed it took itself out of the system. And there was a fourth computer made by a different company, using a different architecture and different programming language, that monitored the three. In retrospect, I think that's a pretty good idea. Having two different architectures makes having the same programming error occur in two different systems very unlikely.
        Of cours
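        The voting idea can be sketched as a simple majority voter. This is a generic illustration in Python, not a description of how the Shuttle software actually did it; the function name and the "no majority" fallback are assumptions.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """Return (majority value, indices of dissenting channels).

    If no value is held by a strict majority, the result is
    (None, all indices), which a real system might treat as a cue
    to hand over to an independent backup.
    """
    if not outputs:
        raise ValueError("need at least one channel output")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, list(range(len(outputs)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


print(vote([42, 42, 41]))   # channel 2 is outvoted
print(vote([1, 2, 3]))      # no majority at all
```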
- Re: (Score:2)
  
  by vlm ( 69642 ) writes:
  
  Mythologically, which is where the moon got its name, Phobos is a dude. He's got a twin brother Deimos. Given that datapoint, guess the name of another Martian satellite...
  - Re: (Score:2)
    
    by crawling_chaos ( 23007 ) writes:
    
    Steve?
- Re: (Score:2)
  
  by geekoid ( 135745 ) writes:
  
  What we do know for sure: Bottom.
- Re:Worse than on the ground... (Score:4, Informative)
  
  by Panaflex ( 13191 ) writes: <convivialdingo@y ... m minus math_god> on Tuesday February 07, 2012 @05:35PM (#38959887)
  
  There's hardware to deal with that - a watchdog timer can reboot the system quickly.
  Assuming the system comes back up with a working CPU and RAM, then the main computer should be able to work around bad peripheral or components on the bus. I think that's what the article is getting at.
  On military aircraft, they use VMs to run the OS and software. Communication between systems is passed synchronously and requires that each module know the state of the other modules. There is never an assumption that the other system will just work - all messages require acknowledgement and verification of results.
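  A rough software stand-in for the watchdog idea, in Python. Real watchdogs are hardware timers that pull the reset line; the class below, its method names, and the injectable clock are assumptions made so the behaviour can be shown and tested.

```python
import time


class Watchdog:
    """Software stand-in for a hardware watchdog timer."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()
        self.resets = 0

    def kick(self) -> None:
        """Called by healthy software to prove it is still making progress."""
        self._last_kick = self._clock()

    def poll(self) -> bool:
        """Return True (and count a reset) if the deadline was missed."""
        if self._clock() - self._last_kick > self._timeout_s:
            self.resets += 1
            self._last_kick = self._clock()   # countdown restarts after reset
            return True
        return False
```

  The main loop kicks the watchdog only after it has done real work, so a hung or crashed program stops kicking and gets rebooted instead of sitting dead.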
  