NASA Space Hardware Technology

Self-Healing Computers For NASA Spacecraft

Posted by Soulskill
from the it-worked-for-the-borg dept.
Roland Piquepaille writes "As you can guess, hardwired computer systems are much faster than general-purpose ones because they are designed to do a single task. But when they fail, they need to be totally reconfigured. This can be just a costly problem in a lab on Earth, but it can be vital in space. This is why a University of Arizona (UA) team is working with NASA to design self-healing computer systems for spacecraft. The UA engineers are working on hybrid hardware/software systems using Field Programmable Gate Arrays (FPGAs) to develop these reconfigurable processing systems. As the lead researcher said, 'Our objective is to go beyond predicting a fault to using a self-healing system to fix the predicted fault before it occurs.'"

  • Not new (Score:5, Informative)

    by Anonymous Coward on Saturday April 26, 2008 @04:45AM (#23206262)
    I used to work for JPL, in a group that was researching the feasibility and applications of FPGAs for this exact purpose. That was around 7-8 years ago, which significantly predates this "news," given the pace of technology. IIRC, they called it "evolvable hardware."
  • by ScrewMaster (602015) on Saturday April 26, 2008 @05:05AM (#23206308)
    Interestingly, that's pretty much how the Space Shuttle's on-board systems work. Three separate processors from two different vendors (IBM and Rockwell, if I recall correctly.) Nothing new under the Sun, I suppose.
  • by legonis (1053412) on Saturday April 26, 2008 @07:19AM (#23206592)
    I fail to see what is new in their approach. Both of these fields have been explored before, and their approach is essentially based on redundancy, except that the available standby gates are in the FPGA. I read their paper; it seems the biggest part they are still lacking is problem determination. Their approach is also prone to failure when the reconfiguration hardware, the processor, or the analog components are the faulty ones. Although it could have some potential, its reliability has to be analyzed, and I don't see it replacing classic N-version systems any time soon.
  • by Anonymous Coward on Saturday April 26, 2008 @09:30AM (#23207102)

    Well, at least you can't get a robot pregnant...
    I guess you're not a big fan of Battlestar Galactica?
  • by Animats (122034) on Saturday April 26, 2008 @10:46AM (#23207436) Homepage

    It's Roland the Plogger again, pushing his ad-laden blog. The actual research summary is here [arizona.edu]. The real paper won't be out until July.

    This isn't new. JPL has been trying various levels of self-healing for years.

    The original article describes a cluster of five machines, set up so that if one fails, others take over tasks running on the failed machine. That's what the better server management systems do. I went to a talk last week by Amazon's CTO, and he described how their platform does that.
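    The takeover behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: the node names, task names, and the "least-loaded survivor" policy are hypothetical and do not come from the UA project or from any particular server management system.

```python
def reassign_tasks(assignments, failed_node):
    """Return a new assignment map with the failed node's tasks moved
    onto the surviving nodes (hypothetical least-loaded-first policy)."""
    # Copy the task lists of every node except the failed one.
    survivors = {n: list(t) for n, t in assignments.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over")
    for task in assignments.get(failed_node, []):
        # Pick the survivor with the fewest tasks; break ties by name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(task)
    return survivors


# Hypothetical five-machine cluster; all names are made up for the example.
cluster = {
    "node0": ["telemetry"],
    "node1": ["navigation", "imaging"],
    "node2": ["comms"],
    "node3": ["thermal"],
    "node4": ["power"],
}
after = reassign_tasks(cluster, "node1")
print(after)
```

    A real system would also need failure detection (heartbeats, timeouts) and state transfer, which this sketch leaves out.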

    The project web site makes things clearer. There are two levels of recovery. The upper level works like cluster failover. The lower level tries to reconfigure the FPGAs to use different cells in the FPGA to work around faults. That's likely to be a delicate process; you'd need substantial on-chip test resources to reliably do gate-level fault isolation on an FPGA that's been hit hard by a cosmic ray. It's not clear how fine-grained this is; this may be more like having multiple units like GPU shaders replicated in an FPGA, with the ability to turn off the failed ones. Sort of like the way Sony ships PS3 machines with a Cell processor that has eight SPE cores, at least seven of which work.
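    The coarse-grained variant guessed at above (replicated units with the failed ones switched off) might look roughly like this. It is a sketch, not the project's method: the `UnitPool` class, the unit counts, and the selection rule are all assumptions made for illustration, and actual fault detection on an FPGA is the hard part that is not modeled here.

```python
class UnitPool:
    """Hypothetical pool of identical replicated units with spares."""

    def __init__(self, total_units, required_units):
        if required_units > total_units:
            raise ValueError("cannot require more units than exist")
        self.healthy = set(range(total_units))  # unit ids still usable
        self.required = required_units          # units the design needs

    def mark_failed(self, unit):
        # Disable a unit that a (not modeled) self-test reported as faulty.
        self.healthy.discard(unit)

    def active_units(self):
        # Use the lowest-numbered healthy units; the rest stay as spares.
        if len(self.healthy) < self.required:
            raise RuntimeError("not enough healthy units left")
        return sorted(self.healthy)[: self.required]


# Eight units built, seven needed: one fault can be absorbed.
pool = UnitPool(total_units=8, required_units=7)
pool.mark_failed(3)
print(pool.active_units())  # [0, 1, 2, 4, 5, 6, 7]
```

    With these numbers a second failure exhausts the spares and `active_units()` raises, which is the point where the upper, cluster-level recovery would have to take over.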

    The available info isn't enough to tell whether this is a good idea or not. About typical for Roland the Plogger.
