
Why ISS Computers Failed (Comments: 324)

Posted by kdawson on Tuesday October 16 2007, @12:03AM
from the triply-redundant-is-not-foolproof dept.
space
science
Geoffrey.landis writes "It was only a small news item four months ago: all three of the Russian computers that control the International Space Station failed shortly after the Space Shuttle brought up a new solar array. But why did they fail? James Oberg, writing in IEEE Spectrum, details the detective work that led to a diagnosis." The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation — something that is likely to be crucial in any manned Mars mission.
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by Rebelgecko (893016) on Tuesday October 16 2007, @12:06AM (#20991565)
    They "upgraded" to Vista.
    • by Anonymous Coward on Tuesday October 16 2007, @01:42AM (#20992041)
      Clippy: It looks like you want to install a new solar array. Do you want help with that?
        • by FoolsGold (1139759) on Tuesday October 16 2007, @01:58AM (#20992127)
          Are you honestly saying that anyone who thinks Vista is decent is an MS shill?

          Why? Is defending a MS operating system for honest reasons impossible to believe anymore?
          • by Woy (606550) on Tuesday October 16 2007, @04:34AM (#20992839)
            > Is defending a MS operating system for honest reasons impossible to believe anymore?

            We don't do honest here. We do technically sound.
            • by hey! (33014) on Tuesday October 16 2007, @06:44AM (#20993423) Homepage Journal

              We don't do honest here. We do technically sound.


              We don't do technically sound here. We make do parroting the "common wisdom" and secretly praying nobody who actually knows something will be bothered to respond.

              Good form means getting an informative moderation rating without provoking an informative result. If you do provoke an informative result, you end up in the penalty box (i.e., spend a few days actually getting work done rather than wasting time on Slashdot).
            • by encoderer (1060616) on Tuesday October 16 2007, @10:35AM (#20996457)
              You missed one important part...

              Millions, nay Tens of Millions of people give Microsoft and their products "the time of day." People who have no dogmas or political agendas when it comes to computing. People who just see a computer and its software as a tool to get their desired job done. And not just MBA or Administration types, but also millions of software developers and network administrators and such.

              I don't think Windows is perfect, but I also don't think OSX is perfect nor do I think that Linux or any flavor of Unix is perfect. I do think that the O^n usefulness of the Windows install base provides so much opportunity that it ends up offering the most value to businesses and consumers.

              And with regard to their "self serving" ways... many on slashdot are anti-business or at least anti-corporation. They adopt the FSF malarkey that all code should be given away free. I put food on my family's table by developing software and the notion that it should be given away free just misses the mark. Market-based economics can bring out the best in innovation, which is why America has some of the highest paid and most productive workers in the world.

              Slashdot is full of idealistic college students and 20-somethings (of which I am a part) who think that corporations are "evil" and that we should all wear birkenstocks and eat crunchy granola and spend our days writing software that solves a problem that's already been solved on a Windows platform and then give it away for free just so we can say we fought the good fight. It's naive. Say what you want about Microsoft, but that company, and the efforts of billg have made THOUSANDS of people millionaires and probably a handful of billionaires, too. Many of those people took that money and started their own software companies solving their own unique, novel problems, and on their own hiring employees and fueling the economy and probably making a lot of those people millionaires, too, who perpetuate it.

              Business is good for all of us. Economic success and security is good for America.
  • Metric electricity vs Imperial electricity...
  • Urgh. (Score:5, Insightful)

    by Airconditioning (639167) on Tuesday October 16 2007, @12:12AM (#20991589) Journal
    The article reeked of condescension towards the Russians. It's no way to report on your partners in space.
    • Re:Urgh. (Score:5, Funny)

      by istartedi (132515) on Tuesday October 16 2007, @12:24AM (#20991661) Journal

      For a split second, I thought you said it reeked of condensation towards the Russians.

    • Re:Urgh. (Score:5, Interesting)

      by Jugalator (259273) on Tuesday October 16 2007, @01:54AM (#20992109) Journal
      I agree... That's what first came to mind after having watched this incident unfold live. What he fails to mention is that the Russian engineers were always open to suggestions and they cooperated pretty well when they needed to discuss the problems. The Russians were also working nearly 24/7 on trying to find and resolve the problems and come up with theories before they ran out of time. The article makes it sound like they early on got locked into "blaming the Americans" or something. It was merely one theory that was tossed around and discussed, and diagnosed early on. If there seems to be a power failure (which it ended up being all about), surely one logically suspected culprit could be a power feed problem?
    • Re:Urgh. (Score:4, Insightful)

      by Anonymous Coward on Tuesday October 16 2007, @02:04AM (#20992151)
      Yup. OK, it's a design flaw. We have been, and still are, capable of doing things just as bad, if not far worse. Look at the Shuttle fiascos.

      This item is hugely biased. It looks to me like a simple case of corrosion, which could easily have been patched up if it happened on a Mars flight. The engineers and crew all seemed to work well together, and the Russians were the ones who sorted the problem.

      I don't know if the Russian Program Managers got all political against us, but the item, written by a retired NASA manager, sure as hell gets political against the Russians. He's right in one thing - the managers need to stop getting political, and I suggest he starts with himself!

      It's just as well he's retired - looks like he's fighting long lost battles against cooperation with the Russians and Europeans.
      • Re:Urgh. (Score:4, Interesting)

        by DerekLyons (302214) <fairwater@gmaPERIODil.com minus punct> on Tuesday October 16 2007, @03:02PM (#21000861) Homepage

        I don't know if the Russian Program Managers got all political against us, but the item, written by a retired NASA manager, sure as hell gets political against the Russians.

        When you follow the space program/ISS day in and day out, rather than relying on the all too infrequent Slashdot coverage... you soon see why. Again and again when something goes wrong, the Russians' first (publicly) announced 'theory' is that the problem is 'the Americans' fault'. Only months later, if ever, does the truth come out. There are a couple of failures from the early flights of the current Soyuz version that were publicly blamed on the Americans - that the Russians have yet to disclose the real cause of. The Russians have a long habit of being less than candid when it comes to their space program, and NASA has gone right along with them in covering up safety and performance issues with MIR, Soyuz, and the ISS.
         
         

        This item is hugely biased. It looks to me like a simple case of corrosion, which could easily have been patched up if it happened on a Mars flight.

        Sure, this one failure could have been patched up - but this is only the latest in a long series of failures caused by poor design and manufacture of the Russian segments of the ISS. Failures nowhere matched on the US side. Failures consistently blamed on the US by the Russians. While both NASA and the Russians are publicly praising the performance of the Russian hardware.
         
        It's not just about the Russians.
         
         

        It's just as well he's retired - looks like he's fighting long lost battles against cooperation with the Russians and Europeans.

        It may seem that way to somebody unfamiliar with the backstory and history. (I.E. pretty much every Slashdot commentator so far.)
         
        [rant]The Slashdot hivemind frustrates the hell out of me when it comes to space issues. Too damn few bother to actually read and keep up with the field, and fewer still know much about the history.[/rant]
    • Re:Urgh. (Score:5, Insightful)

      by Ethanol-fueled (1125189) on Tuesday October 16 2007, @03:10AM (#20992481) Homepage
      Hell yeah. Mod parent up. The real heroes are in space cooperating and solving problems.
      Seriously, all of that political cold war-era cockwaving should stop.
    • Re:Urgh. (Score:4, Insightful)

      by JoelKatz (46478) on Tuesday October 16 2007, @05:58AM (#20993167)
      Absolutely.

      "It is dismaying that after decades of experience with manned space stations, Russian space engineers still couldn't keep unwanted condensation at bay."

      That's a bunch of crap. That's like saying it's dismaying that McDonald's has served billions of burgers and still can't figure out how to make them healthy.

      Condensation is "still" a problem because it's one of the big and tricky ones. To get rid of the condensation, you have to get rid of the people.
      • Re: (Score:3, Funny)

        by Anonymous Coward
        Yeah but I don't know if the thoughts of a guy who made jeans really applies to this situation.
      • Re:Urgh. (Score:5, Funny)

        by UncleTogie (1004853) * on Tuesday October 16 2007, @12:43AM (#20991757) Homepage Journal

        Hey, the truth hurts. Let's face it, Russian technology is not on the same level as US, Japanese, or Korean.

        Lev Andropov: Armageddon: "Components. American components, Russian Components, ALL MADE IN TAIWAN!"

      • Tell me, how many casualties have the Russians had in the last decade, even the last two decades? This was in the days of Mir, when the Russians maintained a continuous space presence year after year and the US was out of space for year after year for blowing up space shuttles.

        So whose tech is behind whose? The ISS didn't plunge out of the sky when the Space Shuttle was not available; apparently the Russian capability is more than enough to operate it.

        And finally, who built the dehumidifier that was at fault in the first place?

  • by Cyberax (705495) on Tuesday October 16 2007, @12:15AM (#20991601)

    ...They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape....


    Once again, duct tape saves the day! :)
  • Hmmm (Score:5, Funny)

    by K.os023 (1093385) on Tuesday October 16 2007, @12:15AM (#20991603)
    Could this be the one place where it would be appropriate to mention that in Russia, crashes compute?


    Or would that be "In Russia, crashes compute you!" ?
  • Duct Tape (Score:5, Insightful)

    by istartedi (132515) on Tuesday October 16 2007, @12:17AM (#20991611) Journal

    They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape

    Almost certainly, this was the duct tape we all know and love. They probably thought it was better not to actually say that, though. Pretty funny. And as an added side-benefit, they should be safe from terrorists.

  • by quanticle (843097) on Tuesday October 16 2007, @12:19AM (#20991623) Homepage

    I think NASA should have learned this lesson by now. After all, the Challenger disaster showed this principle as well. In that case, the same cold temperature that weakened the primary seal on the solid rocket booster weakened the secondary as well, sapping its ability to provide redundant backup. In this case, the same condensation affected all three computers equally.

    It's troubling to see them taking shortcuts on safety and redundancy, when such measures have resulted in loss of life before. How hard would it have been to have had three shut-off cables?

    • by 8-bitDesigner (980672) on Tuesday October 16 2007, @12:39AM (#20991731) Homepage
      Two nit-picky points here:
      1. It wasn't condensation that felled all three computers, it was a single corroded connector, which shorted and sent a kill-command to all three computers. Technically, redundancy here would've circumvented that issue.
      2. Actually, I believe the article stated that it was a Russian-manufactured component, not a NASA design.
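To make point 1 concrete, here is a minimal sketch of why a shared kill line defeats triple redundancy. The wiring layouts, line names, and helper functions are hypothetical, chosen only for illustration; nothing here reflects the actual ISS harness.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    powered: bool = True


def apply_short(computers, kill_lines, shorted_line):
    """Power off every computer wired to the shorted kill line.

    kill_lines maps a line name to the list of computer names it reaches.
    Returns the number of computers still running afterwards.
    """
    for c in computers:
        if c.name in kill_lines.get(shorted_line, []):
            c.powered = False
    return sum(c.powered for c in computers)


def make_computers():
    return [Computer("A"), Computer("B"), Computer("C")]


# Hypothetical wiring 1: one power-off line reaches all three units.
shared = {"line0": ["A", "B", "C"]}
# Hypothetical wiring 2: each unit has its own power-off line.
separate = {"lineA": ["A"], "lineB": ["B"], "lineC": ["C"]}

survivors_shared = apply_short(make_computers(), shared, "line0")
survivors_separate = apply_short(make_computers(), separate, "lineA")
print(survivors_shared, survivors_separate)
```

With the shared line, one short leaves no computer running; with separate lines, the same fault costs only one unit.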
    • by khallow (566160) on Tuesday October 16 2007, @01:39AM (#20992027)

      It's troubling to see them taking shortcuts on safety and redundancy, when such measures have resulted in loss of life before. How hard would it have been to have had three shut-off cables?

      At first, I was nodding in agreement. But then I realized, how do you find out when you've built in hidden single points of failure? Everyone knows that a single point of failure is bad. Hence, the ones that get into a space station weren't intended (or were due to shoddy work). One way to find them is to use the equipment in a real situation and vet it when it breaks. Exactly what they did. Now that they know this is a problem, they can fix it.
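One partial answer is a paper audit: list each component and which redundant units it can take down, then flag anything that reaches all of them. The sketch below assumes a made-up inventory (every component name is hypothetical). It can only catch dependencies someone has written down, so a truly hidden coupling would still need the in-service failure described above to surface.

```python
def single_points_of_failure(affects, redundant_units):
    """Return components whose failure alone disables every redundant unit.

    affects maps a component name to the set of units it can take down.
    """
    units = set(redundant_units)
    return sorted(name for name, hit in affects.items() if units <= set(hit))


# Hypothetical inventory, for illustration only.
affects = {
    "cpu_a_power_supply": {"A"},
    "cpu_b_power_supply": {"B"},
    "cpu_c_power_supply": {"C"},
    "power_off_cable": {"A", "B", "C"},
    "cooling_loop": {"A", "B"},
}

spofs = single_points_of_failure(affects, ["A", "B", "C"])
print(spofs)
```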
  • by cioxx (456323) on Tuesday October 16 2007, @12:22AM (#20991651) Homepage
    Look people, I can see that ISS personnel are really upset about this. I honestly think they ought to sit down calmly, take a stress pill, and think things over. I know the computers had made some very poor decisions recently, but they can give explorers their complete assurance that the work will be back to normal. These machines still got the greatest enthusiasm and confidence in the mission. And they want to help.
  • by dd1968 (1174479) on Tuesday October 16 2007, @12:50AM (#20991791)
    These computers functioned for months or years. When they failed, the right question to ask first was "what has changed?" This is exactly what the Russians did. According to the author the Russians first considered potential causes stemming from the newly installed solar power wing, the visiting shuttle, and the expanded station structure (the reason for the shuttle being there). One conclusion is that they were pointing the finger at NASA and playing the blame game. Another is that they were doing what good engineers anywhere would do to debug the problem.

    The author is obviously way more qualified than I to assess the situation and he may well be right but from the content of the article I came away thinking, wow, I would have looked first at all the recent changes to the station and the power supply too.

    • by DNS-and-BIND (461968) on Tuesday October 16 2007, @01:23AM (#20991959) Homepage
      I see you have never dealt with Russians. The ones in their space program are especially tetchy about taking ANY blame whatsoever. Their equipment is always perfect, and the foreign equipment MUST be the problem. You know, how when there's a problem, you kind of step back for a second and analyze the entire situation? That's what NASA does. The Russians merely blame the first thing they can think of. Then, when that's disproven, they have a lot of other proposed explanations, none of which involve the failure of Russian equipment. It's even worse when there is a semi-plausible event like the new solar panel.

      Look, the Russians as people are all right. But their management in the space program is obsessed with face. They feel that admitting any faults demeans the Russian nation and the Russian people. You can laugh but that's how it is.

      • by giafly (926567) on Tuesday October 16 2007, @03:33AM (#20992579)

        I see you have never dealt with Russians. The ones in their space program are especially tetchy about taking ANY blame whatsoever. Their equipment is always perfect, and the foreign equipment MUST be the problem.
        I see you have never worked in the computer industry, if you think this mindset is unique to Russians. Actually it is universal.
  • by JustShootMe (122551) * <rmiller@duskglow.com> on Tuesday October 16 2007, @12:53AM (#20991803) Homepage Journal
    That for all of the controls and quality control required of mission critical hardware such as this, it still comes down to:

    1) unexpected failure modes
    2) political battles

    Which really isn't a whole lot different than 1) the unexpected failure modes I see every day at work, and 2) the political wrangling (fingerpointing) that takes place when they happen. Apparently NASA and its Russian equivalent are no better than any old software company.

    The lesson being, people are people, and people are still the ones that design these things.
  • Power off command (Score:5, Interesting)

    by jsse (254124) on Tuesday October 16 2007, @01:01AM (#20991847) Homepage Journal

    Also, in a shocking design flaw, there was a "power off" command leading to all three of the supposedly redundant processing units.
    That reminds me of many years ago, when my friend worked as a programmer in a major bank writing small programs for an online international financial system. He issued a 'shutdown' command through JCL (Job Control Language) and that really shut down the entire system. He didn't realize he had the privilege to issue administration commands. Instead of reporting the crisis to his manager, he hid away until someone figured out what was going on. Needless to say, my friend was fired.

    Years later I met his manager. He told me that my friend could have been promoted for discovering one of the biggest loopholes ever in the bank's history, if he had reported the problem immediately. Though the unexpected shutdown caused considerable damage, it could have saved billions from a real break-in through this loophole.

    That's a lesson that every engineer should have learned. :)
  • I hope they don't (Score:5, Insightful)

    by khallow (566160) on Tuesday October 16 2007, @01:11AM (#20991897)

    The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation -- something that is likely to be crucial in any manned Mars mission.

    No offense to Russia or the US, both of which produce good space gear, but technology cooperation is probably a bad idea unless it is tested more thoroughly than in the ISS. The ISS is a great example of how to screw up international cooperation. The station has been delayed for more than a decade (and cost NASA around $50 billion so far) due to redesign and indecision, reliance on a single launch vehicle for key components (the Shuttle), and the inclusion of the Russians. There are parts of the station that can only communicate with the Russians and parts that can only communicate with NASA. Aside from basic utility hookup (electricity), there's no connection between the different parties on the ISS (at least between the Russians and NASA, the ESA and Japanese parts might work better with NASA's stuff). And if you want to make changes that affect more than one party, it becomes by default an international issue. Finally, there's no easy way to transfer ownership. NASA's communication system is integral (TDRSS [wikipedia.org]) to the NASA parts and is also a national secret (so I understand). So the communication system can't be transferred to another party like the Russians or the ESA.

    If there's any international cooperation between space agencies, it probably should be at a rather trivial and manageable level. Say including foreign astronauts or using off the shelf equipment that is known to work under the circumstances.

  • Here we go again... (Score:5, Informative)

    by LanceUppercut (766964) on Tuesday October 16 2007, @02:01AM (#20992137)
    Well, well, well... Here we go again. Jim Oberg. That same Jim Oberg who was almost blowing his gasket a couple of weeks ago when that journalist was asking him questions about alcohol abuse by astronauts (you all remember the story, I'm sure). It was all preposterous nonsense not backed up by any evidence, he said, barely keeping his cool. And what do we see now? He is happily making up stories about Russians accusing the US of the computer failures - something that never happened in reality. The power problems caused by some new US installations were indeed considered as intermediate working brainstormed versions of what could have happened. But nobody ever did any fingerpointing or made any accusations before the situation was sufficiently researched and the root cause determined. Of course, Jim Oberg could not refrain from distorting the truth "just a little". Tsk, tsk, tsk... Note how he refers to the hypothesis as both "blatant finger pointing" and just "guesses" within a single paragraph - just to keep his article a little fuzzy, so that he can flip-flop to either when the situation calls for it. Nothing surprising here, though...
  • by hazard (2541) on Tuesday October 16 2007, @02:02AM (#20992145)
    The article is misleading. The computers are not actually of Russian make, they were supplied to Russians by Europeans (EADS). See here [softpedia.com].
  • by Zymergy (803632) * on Tuesday October 16 2007, @02:11AM (#20992195)
    I had an '89 Nissan Pathfinder and it had factory wiring harness connectors to ALL of the various electrical connections which were water-tight with one or more ribbed red silicone gaskets.
    The connectors were not always easy to disconnect, however, after 177,000 miles and 11 years of original ownership, I never found any corrosion inside any one of them I ever disconnected for service.
    Additionally, the male/female electrical contacts within the sealed connectors appeared to be made from a tinned Copper and/or Brass metal. This is important to note, as Brass, and to a much larger extent, Copper, have ELECTRICALLY CONDUCTIVE oxide states (as surface corrosion by moisture and/or other aqueous solvents).
    In other words, you corrode a Copper or Brass metal electrical connector, and it will still conduct electricity just fine. It may degrade certain frequencies of network/data signaling and alter the dB loss and impedance, but it will still conduct.
    This is another reason why the top-post Nissan main battery terminal connectors for this vehicle were made from a Copper/Brass strap instead of a traditional Lead connector.
    Lead oxide powders (as found on many old standard Lead top-post automotive battery terminals) are not effective electrical conductors (as anyone who has wiggled/cleaned a corroded connection to allow their car to start could attest).
    Why did the design/production Engineers for the ISS not utilize Gold Plated Watertight industry standard (ISO, etc) wiring interconnects? (Even cheap RJ-45 connectors have gold-plated pins)
    -That is the REAL Question.
  • Wiring corrosion? (Score:5, Insightful)

    by Animats (122034) on Tuesday October 16 2007, @02:17AM (#20992227) Homepage

    I'm surprised that connector corrosion would be a problem. Aviation has a long history of wire problems [etsu.edu], but gold-plating connectors seems to be a stable solution to that problem. The ISS uses Kapton wire, which was popular in the 1980s and is lightweight and tough. But that material is hygroscopic and now banned by the USAF, US Navy, Boeing, etc. "Susceptible to aging in that it dries out forming hairline cracks which can lead to micro current leakage (i.e. electrical 'ticking' faults)"

    There are ways to do corrosion-resistant contacts without precious metals; the automotive industry has solved this problem. The alloys aren't simple; here's one used for under-hood automotive connectors. [olinbrass.com] Copper, iron, magnesium, and phosphorus, with upper limits on tin, zinc, nickel, lead, and manganese. But avionics connectors are usually gold plated; it doesn't add that much cost. And Russia is a major exporter of gold.

    The article doesn't go far enough. OK, the connectors corroded. Why? Wrong alloy? Plating failure? Wear from too many connector insertions? Was the spec wrong, or were the cables not made to spec?

    • Hmmmm. (Score:5, Informative)

      by WindBourne (631190) on Tuesday October 16 2007, @12:53AM (#20991805) Journal
      The original plans called for the ISS to be finished many years ago. It is not yet, because America has had issues with transportation. In addition, a few modules that were planned to make the ISS very useful were canceled because of us (in particular, CAM). In the end, both sides have had issues, and changes have occurred. That is normal for these kinds of projects. To be honest, I think that all of this has been handled pretty decently.
      • Re:Hmmmm. (Score:4, Insightful)

        by CharlieG (34950) on Tuesday October 16 2007, @03:23AM (#20992525) Homepage
        I think NASA's BIG mistake (pun intended) was designing the modules such that they could ONLY be lifted by the shuttle, instead of the then Titans, or today's Delta/Atlas heavy lift versions, particularly post Challenger, when all the commercial stuff got moved off the shuttle.

        If they had designed the modules for multiple lift modes, if one was NOT operational, the odds are the other would be. THAT is true redundancy - 2 totally different systems, each capable of doing the job
        • Re:Hmmmm. (Score:5, Interesting)

          by WindBourne (631190) on Tuesday October 16 2007, @03:38AM (#20992607) Journal
          Problem with doing the small lift, is that the ISS would have been a fraction of the size that it is. Until they developed transhab, each module would have to be rinky dink.

          Personally, I would argue that not moving forward on new lifters was THE real mistake. In particular, during Reagan's time was when the Challenger happened. Reagan should have started the development on a new lifter then. Clinton did start one (X-33), but it was killed off with W. Right now, I would have to say that if America can get multiple launchers that can lift 25 metric tonnes inexpensively AND perhaps 2 launchers that are true Saturn class (the Ares IV|V and the Falcon BFR), then we would be ok for some time, perhaps 2020-2025. What amazes me is that we expected a new class of rocket to last like an airliner. Yet, Rocket Science is in the same place that Airplanes were in the 40's; roughly undergoing all sorts of changes due to loads of new research. Hopefully, we learned from all this.
    • Re: (Score:3, Interesting)

      Russia has shown that they do not consider humidity to be an issue. In particular, the MIR was all but finished because it had mold everywhere.

      Russia taught us a lot about space construction and staying alive in a space station. But likewise, we have also done the same. But it is obvious that there is room for more growth.
    • by arivanov (12034) on Tuesday October 16 2007, @01:22AM (#20991953) Homepage
      Slashdot didn't want to let me cut-'n-paste it in.

      Nope it does not. I guess I will have to put that in phonetic transcription:

      Tovarish Dave: Otkroj luk skotina.
      Tovarish HAL: Pshel na huj

      I wonder how you sing "Daisy Daisy" in Russian?

      Margaritka, margaritka pshla na huj

      That is modern Russian, the wonderful language of Pushkin and Chehov may slightly differ..

    • by jamstar7 (694492) on Tuesday October 16 2007, @02:17AM (#20992225)
      I'm thinking it's relatively close to even. We lost 3 on the pad (early Apollo, where we learned that a full oxygen mix in a capsule with burnable stuff in it is Almost A Good Idea), & a pair of crewed space shuttles. Officially, the Russians haven't lost anybody but rumor around the water cooler is, they lost a couple when they couldn't deorbit a capsule in time and the cosmonauts ran out of oxygen, couple died on the pad in explosions, and a couple parachute failures pancaked a couple Vostoks into the Siberian tundra.
It's not that I'm afraid to die. I just don't want to be there when it happens. -- Woody Allen