Murphy's Law Rules NASA 274

Posted by michael on Friday October 22, 2004 @10:12AM from the cat-lands-butter-side-down dept.

3x37 writes "James Oberg, a former long-time NASA operations employee and now a journalist, wrote an MSNBC article about the reality of Murphy's Law at NASA. Interestingly, the incident that sparked Murphy's Law over 50 years ago had a cause nearly identical to that of the Genesis probe failure. The conclusion: Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail. NASA management still often chooses the latter."

interesting but it's not really true (Score:5, Interesting)

by spacerodent ( 790183 ) writes: on Friday October 22, 2004 @10:16AM (#10597692)

While there's always a possibility of a mistake, having people double-check a project from the ground up will almost always find the problems. NASA's current difficulties arise from scattered teams that each check only their own parts, rather than fully qualified teams that go over the entire vehicle. The fact that the whole thing is usually designed by committee, in several pieces, and then assembled at the last minute probably helps facilitate error. The Saturn V rockets and other technology we used to land on the Moon were potentially far less reliable than today's technology, but we still managed to use them for years without error.

Cost Effective (Score:5, Interesting)

by clinko ( 232501 ) writes: on Friday October 22, 2004 @10:18AM (#10597709) Journal

It's actually more cost-effective to allow for failures. You build the same satellite five times, and even if four fail in a cheaper launch situation, you still save money.

From this [scienceblog.com] article:

"Swales engineers worked closely with Space Sciences Laboratory engineers and scientists to define a robust and cost-effective plan to build five satellites in a short period time."
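A back-of-the-envelope sketch of why this can pay off. Every number below (costs and per-satellite success probabilities) is a hypothetical placeholder, not a figure from the article, and failures are assumed to be independent; a shared design flaw would make the cheap satellites fail together and break the argument.

```python
# Hypothetical figures for illustration only; none come from the article.
EXPENSIVE_COST = 100.0  # one high-reliability satellite (arbitrary cost units)
EXPENSIVE_P = 0.95      # assumed probability that it works
CHEAP_COST = 15.0       # cost of each cheaper satellite
CHEAP_P = 0.5           # assumed probability that each cheap one works
N_CHEAP = 5


def p_at_least_one(p_success: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_success) ** n


single_cost, single_p = EXPENSIVE_COST, p_at_least_one(EXPENSIVE_P, 1)
fleet_cost, fleet_p = CHEAP_COST * N_CHEAP, p_at_least_one(CHEAP_P, N_CHEAP)

print(f"1 expensive satellite: cost {single_cost:.0f}, P(at least one works) {single_p:.3f}")
print(f"{N_CHEAP} cheap satellites:    cost {fleet_cost:.0f}, P(at least one works) {fleet_p:.3f}")
```

With these made-up numbers the fleet costs less (75 vs. 100) and has a slightly better chance of at least one working satellite (about 0.97 vs. 0.95).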

Circular reasoning (Score:3, Interesting)

by D3 ( 31029 ) writes: <daviddhenning&gmail,com> on Friday October 22, 2004 @10:26AM (#10597812) Journal

The fact that human error isn't compensated for is the true human error that needs compensation.
I think I just sprained my brain thinking up that one.

The problem with errors (Score:4, Interesting)

by RealityProphet ( 625675 ) writes: on Friday October 22, 2004 @10:28AM (#10597838)

The problem with errors is that detecting all errors all the time is absolutely impossible. Think back to your intro CS theory class and to Turing recognizability. Think of the halting problem. Now, reduce the problem of finding all errors to the halting problem:

if (my_design_contains_any_errors) while (1);  /* loop forever */
else exit(0);                                   /* halt */

Feed this into a program that decides, for every input, whether it halts, and see what happens. You can't, because we know no such program can always return an answer. QED: errors are unavoidable. No need to sniff derisively in the direction of NASA's "middle management". Let's see if YOU can do a better job!
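The usual way to make this argument rigorous runs the reduction in the opposite direction: show that a perfect error detector would let you decide the halting problem, which Turing proved impossible. A minimal Python sketch under simplifying assumptions: a "design" is modelled as a zero-argument program, "containing an error" means reaching a failing assertion, and all names (`design_with_error_iff_halts`, `halts`, `naive_detector`) are hypothetical.

```python
from typing import Callable

Program = Callable[[], None]


def design_with_error_iff_halts(program: Program) -> Program:
    """Wrap `program` in a 'design' whose only error is reached after it halts.

    If `program` runs forever, the failing line is never reached, so the
    design contains no reachable error. If `program` returns, the design fails.
    """
    def design() -> None:
        program()  # may never return
        raise AssertionError("design error reached")
    return design


def halts(program: Program, contains_error: Callable[[Program], bool]) -> bool:
    """Decide halting, given a perfect error detector (which therefore cannot exist)."""
    return contains_error(design_with_error_iff_halts(program))


def naive_detector(design: Program) -> bool:
    """Hypothetical stand-in detector: just run the design and watch.

    It answers correctly when the wrapped program halts, but hangs forever
    on a program that loops, which is exactly the gap a perfect detector
    would have to close.
    """
    try:
        design()
    except AssertionError:
        return True
    return False
```

The result only rules out a detector that finds every error in every design; detectors that find many errors, or that sometimes answer "don't know", remain possible.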

Re:interesting but it's not really true (Score:3, Interesting)

by orac2 ( 88688 ) writes: on Friday October 22, 2004 @10:44AM (#10597964)

from scattered teams that all only check their parts rather than having fully qualified teams that go over the entire vehicle.

Your sentiment is correct, but your details are a little off. For example, the Saturn V rocket was built by "scattered teams" (and committees were heavily involved, despite the mythology around von Braun): the first stage was built by Boeing, the second by North American, the third by Douglas Aircraft, and the Instrument Unit (the control system) by IBM, while the LEM was built by Grumman and the CSM by North American, and so on, all the way down a huge chain of subcontractors. But Apollo had brilliant technical management: it was pricey, but it did an amazing job of system integration.

It's when you try for cheaper missions that having one team take a spacecraft from design through operation becomes important: this was done on the Mars Pathfinder mission to great success, but wasn't done on other "Faster, Better, Cheaper" missions, to great failure, as demonstrated by, well, take your pick.

John Gall's Systemantics (Score:4, Interesting)

by Anonymous Coward writes: on Friday October 22, 2004 @10:44AM (#10597968)

Systems display antics. John Gall has written a great book, Systemantics: The Underground Text of Systems Lore [generalsystemantics.com], which vastly expands on Murphy's law. I cannot recommend this book enough. It contains some truths about the world around us that are blindingly obvious once you see them, but until then you're part of the problem. Systemantics applied to political systems is very enlightening. Too bad the only people who think like this in politics are the selfish and egomaniacal Libertarians (yeah, yeah... I know. Libertarianism is the new cool for the self-styled nerd political wannabe).

Here are some of the highlights:
- 1. If anything can go wrong, it will. (see Murphy's law)
- 2. Systems in general work poorly or not at all.
- 3. Complicated systems seldom exceed five percent efficiency.
- 4. In complex systems, malfunction and even total non-function may not be detectable for long periods (if ever).
- 5. A system can fail in an infinite number of ways.
- 6. Systems tend to grow, and as they grow, they encroach.
- 7. As systems grow in complexity, they tend to oppose their stated function.
- 8. As systems grow in size, they tend to lose basic functions.
- 9. The larger the system, the less the variety in the product.
- 10. The larger the system, the narrower and more specialized the interfaces between individual elements.
- 11. Control of a system is exercised by the element with the greatest variety of behavioral responses.
- 12. Loose systems last longer and work better.
- 13. Complex systems exhibit complex and unexpected behaviors.
- 14. Colossal systems foster colossal errors.

Re:interesting but it's not really true (Score:3, Interesting)

by mikael ( 484 ) writes: on Friday October 22, 2004 @10:57AM (#10598097)

Just like the case in which the airport crew assigned to clean an aeroplane put some masking tape over the air pressure sensors but forgot to remove it. Or rather, since the airport was badly lit and the masking tape wasn't noticeably different from the skin of the aircraft, nobody noticed this small defect. Until the pilots came in to land the aeroplane, that is; then it became a large problem.

KISS... (Score:3, Interesting)

by Kong99 ( 618393 ) writes: on Friday October 22, 2004 @10:59AM (#10598120)

A great engineer demonstrates his/her skill not by designing something terribly complex, but by designing the object that meets required specifications as SIMPLY as possible with as FEW unique parts as possible. That is GREAT engineering.
That said, I really do not believe engineers are the problem at NASA. Bureaucracy is the enemy at NASA. NASA needs a complete top-to-bottom overhaul.
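One way to quantify the "as few parts as possible" point: if a system works only when every one of its parts works (a series system), and parts fail independently, its reliability is the product of the part reliabilities. A minimal sketch; the per-part reliability of 0.999 is a hypothetical value, and real systems with redundancy do not follow this simple series model.

```python
def series_reliability(part_reliability: float, n_parts: int) -> float:
    """Probability that a series system works: every one of n independent parts must work."""
    return part_reliability ** n_parts


for n_parts in (10, 100, 1000):
    print(f"{n_parts:>5} parts: {series_reliability(0.999, n_parts):.3f}")
```

Even with parts that are 99.9% reliable, a thousand of them in series leaves roughly a 37% chance that everything works, since 0.999^1000 is close to e^-1.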

Plan for Software Project Failure (Score:3, Interesting)

by StCredZero ( 169093 ) writes: on Friday October 22, 2004 @11:26AM (#10598383)

I'm just waiting for management to get a clue about this. Most software projects fail on some level. Most software in big corporations sucks on some level, in at least one important component. Management should get a clue and PLAN for this. They should:
- Have redundant competing projects
- Have standards that mandate how components/systems fit together
- NOT mandate that thou shalt use software X all across the enterprise
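The "standards that mandate how components fit together" suggestion can be sketched with Python's `typing.Protocol` (Python 3.8+): client code depends only on an agreed interface, so competing implementations can be swapped without touching it. All names here (`ReportStore`, `InMemoryStore`, `DirectoryStore`, `archive_report`) are hypothetical.

```python
from pathlib import Path
from typing import Dict, Protocol


class ReportStore(Protocol):
    """The agreed interface: every competing implementation must provide these methods."""

    def save(self, key: str, text: str) -> None: ...

    def load(self, key: str) -> str: ...


class InMemoryStore:
    """One team's implementation: keeps reports in a dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, key: str, text: str) -> None:
        self._data[key] = text

    def load(self, key: str) -> str:
        return self._data[key]


class DirectoryStore:
    """A competing team's implementation: writes one file per key under `root`."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def save(self, key: str, text: str) -> None:
        (self._root / key).write_text(text)

    def load(self, key: str) -> str:
        return (self._root / key).read_text()


def archive_report(store: ReportStore, key: str, text: str) -> str:
    """Client code sees only the interface, so either store can be plugged in."""
    store.save(key, text)
    return store.load(key)
```

The standard fixes the interface, not the implementation, so a losing implementation can be retired without rewriting its callers.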
What large corporations have been doing is Soviet-style central planning. As a result, they get stuck with mediocre or sucky software that they cannot replace. Eventually, a few smaller companies start up that manage to have good software (out of many that fail, in part because of sucky software), which gives them a competitive advantage. These either get bought up by, or grow into, ossified bureaucratic behemoths with no internal competition.

Someday a corporation is going to become the Bazaar within, instead of the Cathedral (The Cathedral and the Bazaar) [catb.org], and it will maintain a long-term competitive advantage through internal competition.

I'm not holding my breath, however.

Re:That is NOT correct. (Score:3, Interesting)

by at_18 ( 224304 ) writes: on Friday October 22, 2004 @11:27AM (#10598397) Journal

Scaled Composites, for example, has demonstrated a suborbital craft capable of barely reaching space for a cost of around $25 million. In comparison, NASA developed and flew three X-15 prototypes with similar capabilities for a cost of $300 million in 60's dollars (which incidentally was considered a cheap program).

With the small difference that Scaled Composites is benefitting from 30 years of technology advancements. I don't think that an equivalent company of the 60s could build three SpaceShipOnes for $300 million.

Re:That is NOT correct. (Score:3, Interesting)

by 0123456 ( 636235 ) writes: on Friday October 22, 2004 @11:30AM (#10598425)

"F1 race cars, Racing Sailboats, Nuclear Reactors - NO design is failsafe, and NO design is foolproof."

Not true: there are failsafe nuclear reactor designs that even a genius couldn't manage to melt down, let alone a fool... you just have to design them with safety guaranteed by the laws of physics, not the control systems. General Atomics built a lot of them decades ago, and the Chinese are developing modern versions today.

Good design prevents a heck of a lot of problems. If nothing else, you'd have thought that by now engineers would realise that if you design something so it can be fitted backwards, sooner or later it will be.
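The same "can't be fitted backwards" principle (often called poka-yoke, or mistake-proofing) has a software analogue in API design: make the risky choice explicit and reject anything ambiguous. A minimal sketch with hypothetical names (`Orientation`, `mount_sensor`); it cannot stop a physical part from being installed backwards, it only refuses ambiguous instructions.

```python
from enum import Enum


class Orientation(Enum):
    """Explicit mounting orientation, so 'which end is up' is never left implicit."""
    ARROW_UP = "arrow up"
    ARROW_DOWN = "arrow down"


def mount_sensor(*, serial: str, orientation: Orientation) -> str:
    """Keyword-only parameters: callers must name each value, so arguments cannot be silently swapped."""
    if not isinstance(orientation, Orientation):
        # Reject bare strings or booleans that could be misread or inverted.
        raise TypeError("orientation must be an Orientation member")
    return f"{serial} mounted {orientation.value}"
```

A positional call such as `mount_sensor("S1", Orientation.ARROW_UP)` or a bare string like `orientation="arrow up"` raises `TypeError` instead of silently doing the wrong thing.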

Re:That is NOT correct. (Score:3, Interesting)

by Dun Malg ( 230075 ) writes: on Friday October 22, 2004 @12:06PM (#10598898) Homepage

If nothing else, you'd have thought that by now engineers would realise that if you design something so it can be fitted backwards, sooner or later it will be.
Hah! Engineers are the most intelligent bunch of idiots you'll ever find. The problem with engineers is that their own cleverness and/or familiarity with the item they're designing often blinds them to the viewpoint of someone who's "not clever" or totally new to the item. With (for example) the classic non-reversible, yet perversely symmetrical accelerometers, it probably never occurred to the engineer designing them that someone could "not know" which end goes up. Sometimes it looks like just plain stupid engineering, like a particular telephone PBX control system I work with. It has two expansion slots, Slot 1 and Slot 2. When you want to add only one expansion card, where do you put it? Slot 1? No, that's too obvious. You put it in Slot 2. If you put a second card in later, that goes in Slot 1. At first I thought it was just an error in labeling the slots on the cabinet, but then I noticed that the circuit board itself is marked the same way! I'm sure there's a perfectly rational reason for it that makes sense only to the engineers who designed the system.

Re:The Gimli Glider (Score:4, Interesting)

by dtmos ( 447842 ) writes: on Friday October 22, 2004 @12:08PM (#10598919)

The jetliner you're referring to, I think, is the Gimli Glider [wadenelson.com], which, through a forehead-slapping number of independent goofs, ambiguities, and misunderstandings by a frighteningly large number of people, ran out of fuel over Canada in 1983.

Re:interesting but it's not really true (Score:3, Interesting)

by Rei ( 128717 ) writes: on Friday October 22, 2004 @12:21PM (#10599060) Homepage

You're saying that there was no way to see the risk of *fire in a pure oxygen atmosphere*?

They certainly couldn't have tested the circuit here. The circuit detonated a ballistically launched chute. Not exactly the sort of thing you want to be doing in the lab on a fully built craft you're about to launch. Yes, they could have tested the integration of the chute with the craft electronics, but then to justify that, you'd need to justify testing *every* integration made on the craft. And you people already complain about how much NASA costs - what, are you just looking for more fuel to throw on the "NASA is too expensive" fire?
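To see why "test every integration" gets expensive fast: if every subsystem could interact with every other, the number of pairwise integrations grows quadratically. A rough sketch; the all-pairs assumption is a worst case chosen for illustration, since real spacecraft have far fewer direct interfaces, and it ignores interactions among three or more subsystems.

```python
from math import comb


def pairwise_interfaces(n_subsystems: int) -> int:
    """Worst-case number of subsystem pairs to integration-test: n * (n - 1) / 2."""
    return comb(n_subsystems, 2)


for n in (10, 50, 200):
    print(f"{n:>4} subsystems -> up to {pairwise_interfaces(n)} pairwise integrations")
```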

Ah well. Let the NASA-bashing thread continue. God knows slashdot has enough of them, always participated in by those who have never designed a rocket or probe in their life....

Re: interesting but it's not really true (Score:3, Interesting)

by OwnedByTwoCats ( 124103 ) writes: on Friday October 22, 2004 @12:26PM (#10599126)

The flight control computers on the Shuttle are an interesting example. If you have one FCC and it dies, you lose the vehicle and crew (the vehicle is unflyable without the FCC). If you have two FCCs and one starts producing erroneous results, you don't know which one to trust. If you have three computers, you can survive any one of them failing, but a second failure then causes you problems.

If you have four computers, there's an outside chance that two will fail, and you will have to choose between two As and two Bs, and if you choose the wrong one, vehicle and crew are debris.

So the shuttle has five computers. The fifth runs software developed independently from the software running on the other four, and breaks two-two ties.

Probably the most reliable large software project ever. But horribly expensive in $/SLOC...
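A toy voter that follows the scheme as the comment describes it; it is not the Shuttle's actual redundancy-management logic. Assumptions: outputs are compared for exact equality, whereas real voters compare numeric outputs within a tolerance, and `vote` and its arguments are hypothetical names.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def vote(primary: Sequence[T], backup: T) -> T:
    """Return the strict-majority output of the primary computers.

    If no strict majority exists (for example a 2-2 split among four),
    the independently developed backup decides which tied group to trust.
    """
    counts = Counter(primary)
    value, count = counts.most_common(1)[0]
    if count * 2 > len(primary):  # e.g. at least 3 of 4 agree
        return value
    tied = [v for v, c in counts.items() if c == count]
    if backup in tied:
        return backup
    # Fail loudly rather than guess when nothing agrees.
    raise RuntimeError("no majority, and the backup agrees with no tied group")
```

For example, `vote(["A", "A", "A", "B"], "B")` returns `"A"` (the backup is ignored), while `vote(["A", "A", "B", "B"], "B")` returns `"B"`.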

Expect Failure (Score:1, Interesting)

by Anonymous Coward writes: on Friday October 22, 2004 @12:39PM (#10599257)

Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail.

Strike the word "complex" from the above quotation.
As a software developer who's responsible for developing protocols for various tasks, I've learned that any system needs to be robust against failure and should also fail safe. All too often I've seen people come up with systems that function well when every part works exactly as it should, but blow up in terrible ways when a single mistake is made. For example, compare the bronze/silver/gold way of doing revision control with a real revision control system like CVS et al. The former system works only so long as people copy the proper file from one area to another every time, for as long as the system's in use. I've witnessed developers completely trash a production environment by accidentally copying old files into the gold area.
Mistakes are going to happen and processes won't be followed 100% all of the time. The key is to design systems that expect this to occur and provide ways of dealing with the failure.
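A minimal sketch of the fail-safe idea applied to the copy-between-areas scheme: the promotion step checks version numbers and rejects an older file instead of trusting the person doing the copy. `Artifact`, `promote`, and the integer version scheme are hypothetical; this is not how CVS or any particular tool works.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Artifact:
    name: str
    version: int  # assumed to increase with every change


def promote(candidate: Artifact, gold: Dict[str, Artifact]) -> None:
    """Copy `candidate` into the gold area, refusing anything not strictly newer.

    The check turns the classic mistake (copying an old file over a new one)
    into a loud error instead of a silently trashed production environment.
    """
    existing = gold.get(candidate.name)
    if existing is not None and candidate.version <= existing.version:
        raise ValueError(
            f"refusing to replace {existing.name} v{existing.version} "
            f"with v{candidate.version}"
        )
    gold[candidate.name] = candidate
```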