ISS Computer Failure 289

Posted by kdawson
from the little-help-from-my-friends dept.
A number of readers wrote us with news of the computer problems on the International Space Station. Space.com has one of the better writeups on the failure of Russian computers that control the ISS's attitude and some life-support systems. Two out of six computers in a redundant system cannot be rebooted. The space shuttle Atlantis may have its mission extended until the problem is fixed. A NASA spokesman was optimistic that the problem can be resolved; worst-case scenario would be for the shuttle to evacuate everyone onboard the ISS. Engineers are working on the theory (among others) that the failure may have been triggered by new solar panels installed earlier in Atlantis's mission.

  • DFMEA (Score:5, Interesting)

    by ThosLives (686517) on Thursday June 14, 2007 @09:23AM (#19504167) Journal

    Hopefully they're starting with their DFMEA (Design Failure Mode and Effects Analysis) documentation... "guessing" at the problem and having "theories" is probably not a good way to go. Also, it's apparently a common-mode failure, which you shouldn't have in a safety-critical system; generally this is avoided by having different computer hardware and/or completely different code to do the same tasks.

    Quite unfortunate that it seems like systems engineering is lacking in more and more disciplines recently, although I suppose it makes good systems engineers more valuable.

    My list for this would be something like: "Computer doesn't boot." Possible reasons: "No Power", "Insufficient power", "Corrupt memory", "Broken circuits", etc. Then you go down that tree further and find the root cause. The most disturbing thing is that they had such a major common-mode failure...whatever happened to the "no single points of failure" mantra?

    * sigh *
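
The top-down approach described in the comment above (start at "Computer doesn't boot" and descend through candidate causes) can be sketched as a small fault tree. This is only an illustration, not anything taken from actual ISS documentation: the four first-level causes come from the comment, while the `FaultNode` class, the observation keys (`boots`, `bus_voltage`, and so on) and the "Ground fault on power bus" leaf are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Observations = Dict[str, Any]


@dataclass
class FaultNode:
    """One node in a hypothetical fault tree: a symptom or a candidate cause."""
    name: str
    # Returns True when the observations say this fault is present.
    check: Optional[Callable[[Observations], bool]] = None
    children: List["FaultNode"] = field(default_factory=list)


def find_root_causes(node: FaultNode, obs: Observations) -> List[str]:
    """Descend only into active branches and return the deepest active faults."""
    if node.check is not None and not node.check(obs):
        return []  # this branch is ruled out by the observations
    causes: List[str] = []
    for child in node.children:
        causes.extend(find_root_causes(child, obs))
    # If no child explains the fault, this node is the deepest known cause.
    return causes or [node.name]


# First-level causes are from the comment; checks and keys are assumptions.
BOOT_FAULT_TREE = FaultNode(
    "Computer doesn't boot",
    check=lambda o: not o["boots"],
    children=[
        FaultNode("No power", check=lambda o: o["bus_voltage"] == 0.0),
        FaultNode(
            "Insufficient power",
            check=lambda o: 0.0 < o["bus_voltage"] < o["min_voltage"],
            children=[
                FaultNode("Ground fault on power bus",
                          check=lambda o: o["ground_fault"]),
            ],
        ),
        FaultNode("Corrupt memory", check=lambda o: not o["memory_checksum_ok"]),
        FaultNode("Broken circuits", check=lambda o: not o["continuity_ok"]),
    ],
)

if __name__ == "__main__":
    readings = {
        "boots": False, "bus_voltage": 22.0, "min_voltage": 24.0,
        "ground_fault": True, "memory_checksum_ok": True, "continuity_ok": True,
    }
    print(find_root_causes(BOOT_FAULT_TREE, readings))
```

A real DFMEA also records severity, likelihood and detectability for each failure mode; the sketch only shows the tree walk.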

  • by elrous0 (869638) * on Thursday June 14, 2007 @09:24AM (#19504195)
    How about we evacuate the ISS and stop pumping money into that worthless money sink?

    No, no--I know it sounds crazy. But hear me out. Maybe we could actually pursue something NEW--you know, dare to violate that 30-year-old sacrosanct NASA policy of just repeating themselves over and over again and wasting trillions of $ on contractors and grandiose promises which never amount to squat.

    Just a thought.

  • by devnullkac (223246) on Thursday June 14, 2007 @09:24AM (#19504197) Homepage

    The stated worst case scenario is that the ISS will need to be evacuated, but if the remaining gyros are being overwhelmed, might the station enter an unrecoverable spin state before the problem is resolved?

  • by bronzey214 (997574) <jason...rippel@@@gmail...com> on Thursday June 14, 2007 @09:28AM (#19504251) Journal
    At this point, as a US taxpayer, I'd much rather see the ISS finished rather than just leaving it up there as a pile of space junk.

    It's kinda like finding out the house you're currently building will cost twice as much as normal.

    Do you just leave it half finished and abandon it or do you keep pumping money into it?
  • by clickclickdrone (964164) on Thursday June 14, 2007 @09:37AM (#19504359)
    Sort of related.. The trains on my line in the UK are run using some sort of Java-based system (we know because they were very buggy to begin with and the website used to give surprisingly honest updates on progress). Anyway, now and then it still goes a bit loopy and we have to sit in the station while the driver warns us over the Tannoy 'I'm just rebooting the train, back in a few minutes' and sure enough, the power drops, the lights go out, the fans stop, then whoosh, it's on again, the displays start scrolling logos and welcome messages and one by one you can hear the subsystems power up. Quite cool, if you're sad like me.
  • Re:DFMEA (Score:5, Interesting)

    by Sanat (702) on Thursday June 14, 2007 @09:47AM (#19504493)
    I have seen ground faults cause these types of problems. Maybe the new solar panels have a leakage path back to the mechanical structure, creating a voltage distribution problem after being interfaced with the ISS mechanically and electrically.

    These problems are not easy to diagnose when you have hands-on capability, let alone 200 miles above Earth.

    I do hope that it is sorted out swiftly and the ISS and its occupants remain safe.
  • Re:OS? (Score:5, Interesting)

    by T.E.D. (34228) on Thursday June 14, 2007 @09:53AM (#19504585)

    Could these computers have MicroSoft's Windows as the OS?


    No.

    On NASA's manned space equipment you will find no software that is not controlled by NASA. These folks don't just run a few tests. They spend thousands of dollars per SLOC in testing. They actually mathematically prove their software's correctness. Perhaps the Russian agency's quality isn't quite as high, but I still doubt their (or anyone else's) systems onboard the ISS have any OS at all. Most likely they are all custom embedded systems.

    I'd counsel against jumping to conclusions about the cause of this solely based on the Russian origin of these systems. I remember a lot of people did that with the early Ariane crash [embedded.com] based on it being written in Ada, and ended up looking pretty silly when the problem turned out to be some ported code that wasn't rewritten properly for the new platform.
  • by richdun (672214) on Thursday June 14, 2007 @10:09AM (#19504819)
    Evacuating ISS would be a very bad thing to have happen. The crew would be fine, as this luckily happened with a shuttle in dock, which can act as an emergency lifeboat for the whole crew (plus the Soyuz that's up there with them, if things got too crowded on Atlantis). The biggest problem would be for the hardware - without people up there to keep maintenance tasks going, the station would need to be completely shut down save for a few critical systems (attitude control, the ammonia (NH3) cooling systems, power, etc.). In this case, some of those few critical systems are what seem to be giving the trouble.

    Evacuating ISS is always a last resort, because should something happen to it while unoccupied, it'd be a total loss. We won't have another shuttle ready for a month or so, and I believe the Russians just recently did a Soyuz exchange, so there'd be no quick return, even if the problems were fixed. With attitude control in question, it could become too unstable for even a shuttle or Soyuz docking to occur.
  • Just for the record (Score:5, Interesting)

    by djupedal (584558) on Thursday June 14, 2007 @10:13AM (#19504901)
    For all those chucksters cracking wise about what a bucket of bolts the ISS is...

    The first piece of the space station was Zarya, the Russian control module that was launched into orbit November 20, 1998. A few weeks later, on December 4, 1998, the U.S. module Unity was launched into space. On December 7, 1998, the two modules were connected.

    That makes the ISS just over 8 years in service.

    How old is Atlantis?
    • Fourth orbiter to become operational
    • 01/29/79 Contract Awarded
    • 03/03/80 Started structural assembly of Crew Module
    • 04/10/84 Completed Final Assembly
    • 10/03/85 First Flight

    Space Shuttle Atlantis has completed 27 flights, spent 220.40 days in space, completed 3,468 orbits, and flown 89,908,732 miles in total, as of September 2006. Atlantis visited Mir in 1997!

    Atlantis is 23 years old as of last April. 21 years in service. More than twice as old as the ISS.

    Now, tell me again - which is the real bucket of bolts? ISS or Atlantis?
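
The age comparison in the comment above can be checked with a few lines of date arithmetic. The dates are the ones quoted in the comment; measuring everything as of the comment's posting date (June 14, 2007) and counting only completed years are assumptions made for this sketch.

```python
from datetime import date


def full_years(start: date, end: date) -> int:
    """Number of completed years between two dates."""
    years = end.year - start.year
    # Subtract one if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


POSTED = date(2007, 6, 14)                 # date of the comment (assumed reference point)
ATLANTIS_ASSEMBLED = date(1984, 4, 10)     # "Completed Final Assembly"
ATLANTIS_FIRST_FLIGHT = date(1985, 10, 3)  # "First Flight"
ZARYA_LAUNCH = date(1998, 11, 20)          # first ISS module

atlantis_age = full_years(ATLANTIS_ASSEMBLED, POSTED)
atlantis_service = full_years(ATLANTIS_FIRST_FLIGHT, POSTED)
iss_service = full_years(ZARYA_LAUNCH, POSTED)

if __name__ == "__main__":
    print(f"Atlantis: {atlantis_age} years old, {atlantis_service} in service")
    print(f"ISS: {iss_service} years in service")
    print("More than twice as old:", atlantis_service > 2 * iss_service)
```

Under these assumptions the figures match the comment: 23 and 21 years for Atlantis against 8 for the ISS.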
  • by peter303 (12292) on Thursday June 14, 2007 @10:18AM (#19504989)
    Many NASA spacecraft computers use a long-tested real-time operating system called VxWorks, from Wind River. It doesn't necessarily have the fancy stuff in modern *nix's, but it is fairly reliable. Even that has been known to fail. The flash memory driver in the Mars rovers had a bad free-list routine which shut them down for several weeks near the beginning of their mission after the flash memory filled up. A fix was uploaded. Flash memory was relatively new and hadn't been tested as much as the rest of the system.
  • Re:OS? (Score:1, Interesting)

    by Anonymous Coward on Thursday June 14, 2007 @10:56AM (#19505517)
    Here [sourceforge.net] there is no download and monte.sourceforge.net [sourceforge.net] is empty.

    Try here [kasperd.net].

  • Re:OS? (Score:2, Interesting)

    by LarsG (31008) on Thursday June 14, 2007 @07:57PM (#19513757) Journal
    Service restart isn't the problem. The problem is copying kernel state.

    The kernel holds a lot of information, such as which processes are running, memory allocation, drivers etc. For a true in-place switchover to a new kernel (i.e., all programs keep running as if nothing happened), all that information has to be copied over.

    The other option is to load the new kernel image to memory, shut down all processes and unload drivers, jump to new kernel and start a standard initialization. That would be the same as doing a 'shutdown -r', except that the new kernel is loaded by the old kernel instead of by the BIOS.
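
On Linux, one way to do the second option is the `kexec` tool from kexec-tools, which the comment does not name, so treating it as the mechanism here is an assumption. The sketch below only builds the two command lines and does not run them. `kexec_commands` and the kernel and initrd paths are hypothetical; `-l`, `--initrd=`, `--reuse-cmdline` and `-e` are kexec-tools options.

```python
import shlex
from typing import List, Optional


def kexec_commands(kernel: str,
                   initrd: Optional[str] = None,
                   reuse_cmdline: bool = True) -> List[List[str]]:
    """Build (but do not run) the two kexec steps: load the image, then jump."""
    load = ["kexec", "-l", kernel]            # stage the new kernel in memory
    if initrd:
        load.append(f"--initrd={initrd}")
    if reuse_cmdline:
        load.append("--reuse-cmdline")        # keep the running kernel's boot args
    # `kexec -e` jumps to the staged kernel immediately, without a clean
    # shutdown, so processes should be stopped and filesystems synced first.
    execute = ["kexec", "-e"]
    return [load, execute]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in kexec_commands("/boot/vmlinuz-new", "/boot/initrd-new.img"):
        print(shlex.join(cmd))
```

Since `kexec -e` skips the normal shutdown sequence, the "shut down all processes and unload drivers" step from the comment has to happen before the second command.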
