Why ISS Computers Failed
Geoffrey.landis writes "It was only a small news item four months ago: all three of the Russian computers that control the International Space Station failed shortly after the Space Shuttle brought up a new solar array. But why did they fail? James Oberg, writing in IEEE Spectrum, details the detective work that led to a diagnosis." The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation — something that is likely to be crucial in any manned Mars mission.
The REAL reason they failed (Score:5, Funny)
Re:The REAL reason they failed (Score:5, Funny)
Re: (Score:3, Insightful)
Re: (Score:3, Funny)
Re: (Score:3, Interesting)
Re:The REAL reason they failed (Score:4, Insightful)
Why? Is defending a MS operating system for honest reasons impossible to believe anymore?
Re:The REAL reason they failed (Score:5, Insightful)
We don't do honest here. We do technically sound.
Re:The REAL reason they failed (Score:5, Informative)
We don't do technically sound here. We make do parroting the "common wisdom" and secretly praying nobody who actually knows something will be bothered to respond.
Good form means getting an informative moderation rating without provoking an informative result. If you do provoke an informative result, you end up in the penalty box (i.e., spend a few days actually getting work done rather than wasting time on Slashdot).
Re: (Score:3, Interesting)
Here's the problem: the vast majority of Slashdotters are either: a) technically incompetent or b) Unix people, which also makes them technically incompetent but also gives them an unjustified superiority complex. After all, their OS of choice has gotten to the point that they have to assemble it themselves and then give it away for free. And despite all of that, people still don't want it. Go fig.
In all seriou
Re: (Score:3, Informative)
Nope, I'm including most of them in the statement as well.
Possibly because desktop users don't see any value in Windows: they either pirate it or it's installed on the machine for them.
Stick to the topic at hand, we're not talking home users. Home users don't see any value in their computers, let alone their OS.
Over time the everyday user experience has been confused by the amount of changes in the only GUI they've b
Re:The REAL reason they failed (Score:5, Insightful)
Millions, nay Tens of Millions of people give Microsoft and their products "the time of day." People who have no dogmas or political agendas when it comes to computing. People who just see a computer and its software as a tool to get their desired job done. And not just MBA or Administration types, but also millions of software developers and network administrators and such.
I don't think Windows is perfect, but I also don't think OSX is perfect nor do I think that Linux or any flavor of Unix is perfect. I do think that the O^n usefulness of the Windows install base provides so much opportunity that it ends up offering the most value to businesses and consumers.
And with regard to their "self serving" ways... many on slashdot are anti-business or at least anti-corporation. They adopt the FSF malarkey that all code should be given away free. I put food on my family's table by developing software and the notion that it should be given away free just misses the mark. Market-based economics can bring out the best in innovation, which is why America has some of the highest paid and most productive workers in the world.
Slashdot is full of idealistic college students and 20-somethings (of which I am a part) who think that corporations are "evil" and that we should all wear birkenstocks and eat crunchy granola and spend our days writing software that solves a problem that's already been solved on a Windows platform and then give it away for free just so we can say we fought the good fight. It's naive. Say what you want about Microsoft, but that company, and the efforts of billg have made THOUSANDS of people millionaires and probably a handful of billionaires, too. Many of those people took that money and started their own software companies solving their own unique, novel problems, and on their own hiring employees and fueling the economy and probably making a lot of those people millionaires, too, who perpetuate it.
Business is good for all of us. Economic success and security is good for America.
Re: (Score:3, Interesting)
Of cour
Re: (Score:3, Interesting)
Or maybe the "show-stoppers" you're hearing about are nothing but pure weapons-grade bullshit in the first place.
Underneath it all, many people are waiting for MS to release "a better Unix tha
Re:The REAL reason they failed (Score:4, Insightful)
Re: (Score:2)
It'll be another NT but worse.
Re: (Score:3, Informative)
Not totally immune to EMP; they'll saturate, then return to normal operation, whereas a transistor will just act like a fuse and burn out.
They didn't bring the right travel adapters. (Score:5, Funny)
Re:They didn't bring the right travel adapters. (Score:5, Funny)
Re:They didn't bring the right travel adapters. (Score:5, Funny)
You... will... DIE!! *force lightningz!*
Re:They didn't bring the right travel adapters. (Score:5, Funny)
Re:They didn't bring the right travel adapters. (Score:5, Funny)
I choose to listen to music in a specially-designed, oxygen-free space. You can really hear the increase in clarity and room dynamics. The mid-range sounds a lot brighter too.
Re: (Score:3, Informative)
Urgh. (Score:5, Insightful)
Re: (Score:2, Funny)
Re: (Score:3, Funny)
Re: (Score:3, Interesting)
Re:Urgh. (Score:5, Funny)
For a split second, I thought you said it reeked of condensation towards the Russians.
Re: (Score:2)
Though someone seems to have modded me funny anyway... life goes on.
Re:Urgh. (Score:5, Interesting)
Re:Urgh. (Score:4, Insightful)
This item is hugely biased. It looks to me like a simple case of corrosion, which could easily have been patched up if it happened on a Mars flight. The engineers and crew all seemed to work well together, and the Russians were the ones who sorted the problem.
I don't know if the Russian Program Managers got all political against us, but the item, written by a retired NASA manager, sure as hell gets political against the Russians. He's right in one thing - the managers need to stop getting political, and I suggest he starts with himself!
It's just as well he's retired - looks like he's fighting long lost battles against cooperation with the Russians and Europeans.
Re:Urgh. (Score:4, Interesting)
When you follow the space program/ISS day in and day out, rather than relying on the all too infrequent Slashdot coverage... you soon see why. Again and again when something goes wrong, the Russians' first (publicly) announced 'theory' is that the problem is 'the Americans' fault'. Only months later, if ever, does the truth come out. There are a couple of failures from the early flights of the current Soyuz version that were publicly blamed on the Americans, and the Russians have yet to disclose their real cause. The Russians have a long habit of being less than candid when it comes to their space program, and NASA has gone right along with them in covering up safety and performance issues with Mir, Soyuz, and the ISS.
Sure, this one failure could have been patched up, but this is only the latest in a long series of failures caused by poor design and manufacture of the Russian segments of the ISS. Failures nowhere matched on the US side. Failures consistently blamed on the US by the Russians. All while both NASA and the Russians are publicly praising the performance of the Russian hardware.
It's not just about the Russians.
It may seem that way to somebody unfamiliar with the backstory and history (i.e., pretty much every Slashdot commenter so far).
[rant]The Slashdot hivemind frustrates the hell out of me when it comes to space issues. Too damn few bother to actually read and keep up with the field, and fewer still know much about the history.[/rant]
Re:Urgh. (Score:5, Insightful)
Seriously, all of that political cold war-era cockwaving should stop.
Re: (Score:3, Insightful)
Re:Urgh. (Score:4, Insightful)
"It is dismaying that after decades of experience with manned space stations, Russian space engineers still couldn't keep unwanted condensation at bay."
That's a bunch of crap. That's like saying it's dismaying that McDonald's has served billions of burgers and still can't figure out how to make them healthy.
Condensation is "still" a problem because it's one of the big and tricky ones. To get rid of the condensation, you have to get rid of the people.
Re:Urgh. (Score:5, Funny)
Lev Andropov, Armageddon: "Components. American components, Russian components, ALL MADE IN TAIWAN!"
Re: (Score:2)
Actually it was the Chinese!
From Command Override [willthomasonline.net]:
LOST IN SPACE
The next hack came almost immediately, when Russian computers controlling the International Space Station's orientation and supplies of oxygen and water inexplicably failed while the station's three crewmembers were hosting seven visiting shuttle astronauts.
Among the station's network of six Russian computers, only two remained functioning. A system-wide re-boot usually resolved smaller hitches. But this time, the system was unable to re-boot.
"A failure of this type has not occurred before," the BBC reported. [BBC June 14/07]
"This is serious," stated James Oberg, a retired rocket scientist turned author and consultant. "These computers run their life support, so if they can't be restored, the space station could become uninhabitable." Oberg added, "Statistically, this is not random. There is some new environmental factor that must be identified and isolated, and neither step is trivial." [TechNewsWorld June 14/07]
Russian flight controllers and onboard engineers traced the problem to "odd readings" in electrical power cables feeding the Russian computers through a corroded junction box labeled BOK 3. [Space.com July 16/07]
The gremlins returned to the Russian machines on February 5, when another ISS computer system crashed in the Zvezda Service Module that routes data between orientation sensors and four positioning gyroscopes. The space station's solar power stopped supplying power, and communications were cut with Earth.
Though power and comms were restored three hours later, New Scientist reports, "The cause of the computer crash remains a mystery. NASA has so far not identified the cause of the crash." [New Scientist Feb 5/02]
But Hank was on it. "They had limited oxygen, a limited time frame," he observed. The astronauts onboard the space station didn't know if the next computer malfunction "would open an airlock." But like an airliner in flight, the station should have smoothly shifted over to backup systems.
It didn't.
"The word 'redundancy' never got into the story," Hank pointed out. Instead, all three backup circuit boards wired into three isolated circuits, "had to blow out in the same way at the exact same time. The fault that occurred in the first board, the second board, and the third board all had to be the same damn thing at the same damn time."
"Impossible," he declared. Especially, since each of the simultaneously faulty microchips had been "stress tested to hell and back. Except for internal stressors."
Except for "Made In China" microchip mischief.
While it is not yet confirmed that the February 5 microchip malfunction was related to the June 14 space station hack, according to Hank's sources, on that earlier date the Chinese pulled the equivalent of Cheney's Singapore diversion--in space. "Nobody got busted for it," he adds. "You always hear about the company at fault."
Not this time.
J / K! ;)
Indeed, how many Russian casualties have there bee (Score:5, Insightful)
Tell me, how many casualties have the Russians had in the last decade, even the last two decades? This was in the days of Mir, when the Russians maintained a continuous space presence year after year and the US was out of space for year after year for blowing up space shuttles.
So whose tech is behind whose? The ISS didn't plunge out of the sky when the Space Shuttle was not available; apparently the Russian capability is more than enough to operate it.
And finally, who built the dehumidifier that was at fault in the first place?
Re: (Score:2)
Duct tape saves the day! (Score:5, Informative)
Once again, duct tape saves the day!
Re: (Score:2)
Re:Duct tape saves the day! (Score:5, Funny)
Re:Duct tape saves the day! (Score:5, Funny)
Re: (Score:3, Insightful)
Two studies related to duct tape have been reported recently. The first was a government study of various commercial products for affixing insulation to HVAC systems, which found that every product performed well over time except duct tape. The second was a study which showed that the folk remedy for warts in which you cover them with duct tape was surprisingly effective.
There you have it: amazingly versati
Hmmm (Score:5, Funny)
Or would that be "In Russia, crashes compute you!" ?
Duct Tape (Score:5, Insightful)
They also decided to rig a thermal barrier out of a surplus reference book and all-purpose gray tape
Almost certainly, this was the duct tape we all know and love. They probably thought it was better not to actually say that, though. Pretty funny. And as an added side-benefit, they should be safe from terrorists.
Re: (Score:2)
Redundancy != Safety (Score:5, Insightful)
I think NASA should have learned this lesson by now. After all, the Challenger disaster showed this principle as well. In that case, the same cold temperature that weakened the primary seal on the solid rocket booster weakened the secondary as well, sapping its ability to provide redundant backup. In this case, the same condensation affected all three computers equally.
It's troubling to see them taking shortcuts on safety and redundancy, when such measures have resulted in loss of life before. How hard would it have been to have had three shut-off cables?
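To put rough numbers on the point above, here is a minimal sketch in Python. It uses the beta-factor model, a common simplification in reliability work in which some fraction (beta) of each unit's failure probability comes from a shared cause that takes out every unit at once. Nothing here comes from the article: the function name, the 1% per-unit failure probability, and the beta values are all hypothetical, chosen only to show the shape of the effect.

```python
def all_fail_probability(p_unit: float, n_units: int, beta: float) -> float:
    """Probability that every one of n redundant units fails.

    p_unit: failure probability of a single unit (hypothetical figure).
    n_units: number of redundant units.
    beta: fraction of p_unit attributed to a shared cause that hits all
          units at once (0.0 means the units fail fully independently).
    """
    if not (0.0 <= p_unit <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("p_unit and beta must be in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")

    p_common = beta * p_unit               # shared cause, e.g. cold or condensation
    p_independent = (1.0 - beta) * p_unit  # each unit's own, unrelated faults

    # The system survives only if the shared cause does not occur AND
    # at least one unit survives its own independent faults.
    return 1.0 - (1.0 - p_common) * (1.0 - p_independent ** n_units)


if __name__ == "__main__":
    p = 0.01  # assumed per-unit failure probability, for illustration only
    for beta in (0.0, 0.1, 1.0):
        result = all_fail_probability(p, 3, beta)
        print(f"beta={beta:.1f}: P(all 3 fail) = {result:.2e}")
```

With these made-up inputs, fully independent units (beta = 0) all fail about one time in a million, while a 10% shared-cause fraction pushes that to roughly one in a thousand, close to what a single non-redundant shared component would give. Under this model, adding a fourth or fifth identical unit barely helps once a shared cause dominates.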
Re:Redundancy != Safety (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
Re:Redundancy != Safety (Score:4, Funny)
Re: (Score:2)
It was a Russian subsystem that failed, so don't instinctively crap on NASA for every problem in the world.
If that Russian failure had resulted in NASA astronauts dying, then it would become a NASA failure. NASA can't foist off that kind of blame.
Depends on the Redundancy (Score:2)
To do this really well though, requires risk management software that I am not sure even exists. You'd have to simulate everything. The devil, as happened to Challenger, is that, there are so m
Re:Redundancy != Safety (Score:5, Insightful)
It's troubling to see them taking shortcuts on safety and redundancy, when such measures have resulted in loss of life before. How hard would it have been to have had three shut-off cables?
Give it a rest (Score:5, Funny)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
obligatory (Score:2, Funny)
Re: (Score:2)
Nyet, Dave. (Score:2)
Comrade Dave: Open ze Pod Bay Doors, HAL.
Comrade HAL: Nyet Comrade Dave, I cannot do that.
I wonder how you sing "Daisy Daisy" in Russian?
Re:Nyet, Dave. (Score:5, Funny)
Nope it does not. I guess I will have to put that in phonetic transcription:
Tovarish Dave: Otkroj luk skotina.
Tovarish HAL: Pshel na huj
I wonder how you sing "Daisy Daisy" in Russian?
Margaritka, margaritka pshla na huj
That is modern Russian, the wonderful language of Pushkin and Chekhov may slightly differ..
And this shows the value of ISS (Score:2)
Hate to break it to you... (Score:4, Insightful)
Re: (Score:2)
Rust proof gold anyone? (Score:2)
Couldn't the ISS, with its multi-billion-dollar cost, use contacts and cables that can't rust? Gold for contact points, aluminum for the bulk cable?
Heck, given the costs involved, it'd barely be a rounding error in the budget to use solid gold cables. One tonne of gold at $700 per ounce is about $25 million. Not
Re: (Score:2)
Re:Rust proof gold anyone? (Score:4, Informative)
Interesting hardware problem (Score:2)
This'll get worse and worse as exploration goes farther and farther afield. Even little things like mold, dust, and the black gunk that piles up on the bottom of a mo
Re: (Score:3, Interesting)
Russia taught us a lot about space construction and staying alive in a space station. But likewise, we have also done the same. But it is obvious that there is room for more growth.
Proper debugging technique (Score:5, Insightful)
The author is obviously way more qualified than I to assess the situation and he may well be right but from the content of the article I came away thinking, wow, I would have looked first at all the recent changes to the station and the power supply too.
Cascading failures (Score:2, Interesting)
I have always tried to learn from air crash investigations and so on how failure modes develop. In problem solving mode, it seems one should assume the distinct possibility of multiple problems all at once.
In this case, multiple failure paths existed, though it took a power spike to set it off as I interpreted it. Even without co
Re:Proper debugging technique (Score:4, Insightful)
Look, the Russians as people are all right. But their management in the space program is obsessed with face. They feel that admitting any faults demeans the Russian nation and the Russian people. You can laugh but that's how it is.
Re: (Score:2)
Re:Proper debugging technique (Score:5, Insightful)
Re: (Score:3, Funny)
Try debugging the electrics on an 80s BMW some time. The manual for the door locks is 3 pages thick.
Hint: fuse 11 is not your friend.
It's interesting... (Score:5, Insightful)
1) unexpected failure modes
2) political battles
Which really isn't a whole lot different than 1) the unexpected failure modes I see every day at work, and 2) the political wrangling (fingerpointing) that takes place when they happen. Apparently NASA and its Russian equivalent are no better than any old software company.
The lesson being, people are people, and people are still the ones that design these things.
Power off command (Score:5, Interesting)
Years later I met his manager; he told me that my friend could have been promoted for discovering one of the biggest loopholes ever in the bank's history, if he had reported the problem immediately. Though the unexpected shutdown caused considerable damage, finding the loophole could have saved billions from a real break-in.
That's a lesson that every engineer should have learned.
Jingoism (Score:2, Insightful)
It is dismaying that after decades of experience with manned space stations, Russian space engineers still couldn't keep unwanted condensation at bay. But what's worse is that they designed circuitry that would allow one spot of corrosion to fell a supposedly triply redundant control computer complex.
I find it more dismaying that an otherwise seemingly adult and mature article writer feels such an urge to childishly emphasize blame. What is it with this childish American and Russian jingoism? If blame is so important, can't you people at least blame the engineers and not the nationality?
Re: (Score:2)
Re: (Score:2)
When the failure happened, the Russians pointed the finger everywhere but themselves.
The Russians showed jingoism by pointing fingers at NASA, and the article author does the same kind of jingoistic finger-pointing in return. Childish on both sides.
Nobody is perfect. No need to point fingers. Just learn and move on. Like grown-ups.
Judging from the comments, the Slashdot crowd seems more mature than these people. That's rather surprising considering the trolls and other children we have here.
Re: (Score:2)
There is a lot of history behind this.
The Americans and the Russians have always taken very different approaches to dealing with safety engineering in space. The Russians have typically taken an empirical, "what me wo
I hope they don't (Score:5, Insightful)
The article has good insights into the role the ISS plays as a laboratory for US-Russian technology cooperation -- something that is likely to be crucial in any manned Mars mission.
No offense to Russia or the US, both of which produce good space gear, but technology cooperation is probably a bad idea unless it is tested more thoroughly than in the ISS. The ISS is a great example of how to screw up international cooperation. The station has been delayed for more than a decade (and cost NASA around $50 billion so far) due to redesign and indecision, reliance on a single launch vehicle for key components (the Shuttle), and the inclusion of the Russians. There are parts of the station that can only communicate with the Russians and parts that can only communicate with NASA. Aside from basic utility hookup (electricity), there's no connection between the different parties on the ISS (at least between the Russians and NASA; the ESA and Japanese parts might work better with NASA's stuff). And if you want to make changes that affect more than one party, it becomes by default an international issue. Finally, there's no easy way to transfer ownership. NASA's communication system (TDRSS [wikipedia.org]) is integral to the NASA parts and is also a national secret (so I understand). So the communication system can't be transferred to another party like the Russians or the ESA.
If there's any international cooperation between space agencies, it probably should be at a rather trivial and manageable level. Say, including foreign astronauts or using off-the-shelf equipment that is known to work under the circumstances.
Re: (Score:2)
The only way you can test is by doing. They're running the very test you're asking for.
Re: (Score:3, Insightful)
Re: (Score:2)
This is one of the most self-contradictory sentences I've read for quite some time. Because of the inclusion of the Russians, the ISS does not rely on a single launch vehicle! Which craft was sending astronauts and supplies when all the shuttles were grounded for years after the Col
Re: (Score:2)
Notice missions 22-26, from 2003 to 2005? Notice that Soyuz made more than half the flights to ISS?
Now please, show some respect for the noble efforts of the seriously underfunded Russian space program...
First things First (Score:2)
Cause you never know exactly how bad it's gonna get.
BBH
Lack of Restraint (Score:2)
Here we go again... (Score:5, Informative)
The computers are not Russian, but European (Score:5, Informative)
Superior Terrestrial Connector Technology! (Score:4, Informative)
The connectors were not always easy to disconnect, however, after 177,000 miles and 11 years of original ownership, I never found any corrosion inside any one of them I ever disconnected for service.
Additionally, the male/female electrical contacts within the sealed connectors appeared to be made from a tinned Copper and/or Brass metal. This is important to note, as Brass, and to a much larger extent, Copper, have ELECTRICALLY CONDUCTIVE oxide states (as surface corrosion by moisture and/or other aqueous solvents).
In other words, you corrode a Copper or Brass metal electrical connector, and it will still conduct electricity just fine. It may degrade certain frequencies of network/data signaling and alter the dB loss and impedance, but it will still conduct.
This is another reason why the top-post Nissan main battery terminal connectors for this vehicle were made from a Copper/Brass strap instead of a traditional Lead connector.
Lead oxide powders (as found on many old standard Lead top-post automotive battery terminals) are not effective electrical conductors (as anyone who has wiggled/cleaned a corroded connection to allow their car to start could attest).
Why did the design/production Engineers for the ISS not utilize Gold Plated Watertight industry standard (ISO, etc) wiring interconnects? (Even cheap RJ-45 connectors have gold-plated pins)
-That is the REAL Question.
Re:Superior Terrestrial Connector Technology! (Score:2)
Wiring corrosion? (Score:5, Insightful)
I'm surprised that connector corrosion would be a problem. Aviation has a long history of wire problems [etsu.edu], but gold-plating connectors seems to be a stable solution to that problem. The ISS uses Kapton wire, which was popular in the 1980s and is lightweight and tough. But that material is hygroscopic and now banned by the USAF, US Navy, Boeing, etc. "Susceptible to aging in that it dries out forming hairline cracks which can lead to micro current leakage (i.e. electrical 'ticking' faults)"
There are ways to do corrosion-resistant contacts without precious metals; the automotive industry has solved this problem. The alloys aren't simple; here's one used for under-hood automotive connectors. [olinbrass.com] Copper, iron, magnesium, and phosphorus, with upper limits on tin, zinc, nickel, lead, and manganese. But avionics connectors are usually gold plated; it doesn't add that much cost. And Russia is a major exporter of gold.
The article doesn't go far enough. OK, the connectors corroded. Why? Wrong alloy? Plating failure? Wear from too many connector insertions? Was the spec wrong, or were the cables not made to spec?
Hmmmm. (Score:5, Informative)
Re:Hmmmm. (Score:4, Insightful)
If they had designed the modules for multiple lift modes, if one was NOT operational, the odds are the other would be. THAT is true redundancy - 2 totally different systems, each capable of doing the job
Re:Hmmmm. (Score:5, Interesting)
Personally, I would argue that not moving forward on new lifters was THE real mistake. In particular, Challenger happened during Reagan's time. Reagan should have started the development on a new lifter then. Clinton did start one (X-33), but it was killed off with W. Right now, I would have to say that if America can get multiple launchers that can lift 25 metric tonnes inexpensively AND perhaps 2 launchers that are true Saturn class (the Ares IV|V and the Falcon BFR), then we would be ok for some time, perhaps 2020-2025. What amazes me is that we expected a new class of rocket to last like an airliner. Yet rocket science is in the same place that airplanes were in the '40s: undergoing all sorts of changes due to loads of new research. Hopefully, we learned from all this.
Re:A bit harsh on the Russians. (Score:5, Interesting)
Re: (Score:2)
Oh, what about the cosmonauts whose pressure equalization valve opened at an altitude of 160 km?
http://en.wikipedia.org/wiki/Soyuz_11 [wikipedia.org]
Re: (Score:2, Insightful)