Space Technology

Murphy's Law Rules NASA 274

Posted by michael
from the cat-lands-butter-side-down dept.
3x37 writes "James Oberg, a former long-time NASA operations employee turned journalist, wrote an MSNBC article about the reality of Murphy's Law at NASA. Interestingly, the incident that sparked Murphy's Law over 50 years ago had a nearly identical cause to the Genesis probe failure. The conclusion: Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail. NASA management still often chooses the latter."
  • by zerdood (824300) <null@dev.com> on Friday October 22, 2004 @10:14AM (#10597680)
    Someday all decisions will be made by machines. We'll just sit back while they do all the work. Then, no more human error.
    • Re:Mark my words (Score:5, Insightful)

      by wiggys (621350) on Friday October 22, 2004 @10:20AM (#10597733)
      Except, of course, that we programmed the machines in the first place.

      When a computer program crashes, it's usually down to the human(s) who programmed it; on the rare occasions it's a hardware glitch, it was humans who designed the hardware. Either way, we're still to blame, directly or indirectly.

      I suppose it's like the argument about whether bullets kill people or the person who pulled the trigger does.
    • Who will make the machines? Humans. With error.
    • Yes, but machines are programmed by people. Your average hacker is lazy, impatient, tired, horny and may or may not be intoxicated. They fuck up. That's why we have bugs.
    • You would think an article that outlines several failed autopilot systems might indicate a fundamental flaw in that thought process.
    • Computers cannot make decisions. They can perform computations. They can evaluate formulas. They can even pen new algorithms. But their decision-making power ultimately comes down to flipping a coin.

      It can be a very heavily weighted coin, but it is a coin nonetheless.

    • Re:Mark my words (Score:4, Insightful)

      by NonSequor (230139) on Friday October 22, 2004 @10:31AM (#10597856) Journal
      If you're expecting this to result from the development of human-level AI, I wouldn't bet on it. In order to solve problems not predicted by its creators, it will have to make some leaps of intuition the way humans do when they solve problems. The ability to propose original solutions also introduces the possibility of error. An AI will also have to rely on inductive reasoning in some situations, and there is no reason to believe that a computer can avoid making any false inductions. I suspect that human-level AIs will be able to do a lot of things better than us, but they will have at least some of the same flaws we do.
  • by spacerodent (790183) on Friday October 22, 2004 @10:16AM (#10597692)
    While it's always possible to make a mistake, having people double-check a project from the ground up will almost always find the problems. NASA's current difficulties arise from scattered teams that each only check their own parts, rather than fully qualified teams that go over the entire vehicle. The fact that the whole thing is usually designed by committee and in several pieces, then assembled at the last minute, probably helps facilitate error. The Saturn V rockets and other technology we used to land on the moon had the potential to be far less reliable than today's technology, but we still managed to use them for years without error.
    • by Moby Cock (771358) on Friday October 22, 2004 @10:20AM (#10597731) Homepage
      It's an oversimplification to say that older technology was used without errors. In fact, it's just downright incorrect. Apollo 1 and Apollo 13 both suffered catastrophic failures. Furthermore, the next generation of space vehicles, the shuttle, has had two very significant disasters and reams of other failures.
      • No, it isn't. The Saturn V rocket was the most complicated and largest system ever built by man at the time and launched without a SINGLE failure for its entire operational life. The vehicles and satellites it carried had problems but the rocket itself never failed.
        • by eggoeater (704775) on Friday October 22, 2004 @10:36AM (#10597900) Journal
          The fact that the Saturn V rockets never blew up doesn't mean they never had problems! There were plenty of things that went wrong. Even in the movie Apollo 13, one of the Saturn V engines malfunctioned during takeoff. We survive failures in rockets and other critical pieces of technology thanks not only to pragmatic design but also to redundancy. (Also, think about the design of airplanes: triple redundancy on hydraulic lines.)
          Also, there was some kind of semi-critical problem in EVERY SINGLE Apollo mission except Apollo 17, the very last one.
        • by 0123456 (636235) on Friday October 22, 2004 @11:24AM (#10598369)
          "The vehicles and satellites it carried had problems but the rocket itself never failed."

          No, but it came damn close. The 'pogo' problem on one of the launches, for example, almost led to the loss of the Saturn V: if I remember correctly it would have broken up in a few seconds, but one of the engines shut down due to excessive forces and that saved the rocket.

          The sad thing is that by the time we launched the last Saturn the worst of the bugs had been resolved, just in time to stop flying them...
          • The sad thing is that by the time we launched the last Saturn the worst of the bugs had been resolved, just in time to stop flying them

            Sounds like most software and hardware. About the time the bugs are ironed out, it's time for the next version with new bugs.

      • It's an oversimplification to say that older technology was used without errors. In fact, it's just downright incorrect. Apollo 1 and Apollo 13 both suffered catastrophic failures.

        Indeed. And it's interesting to look at the root causes: Apollo 1: poor design, poor management, poor craftsmanship. Apollo 13: poor management (the cockup regarding the thermostat switch) and foul-ups by the workers on the floor (damaging the tank that would later explode because of the damage and the switch).

        While it's fashio

    • I'm still trying to figure out why the Apollo formula of contractors with NASA oversight doesn't seem to work anymore.

      Then I remember Apollo 1, that killed 3 astronauts, and Apollo 13, that nearly killed 3 more.

      To invoke Heinlein, Space is a harsh mistress.

      To invoke Sun Tzu, success in defense is not based on the likelihood of your enemy attacking. It is based on your position being completely unassailable.

      • by _Sprocket_ (42527) on Friday October 22, 2004 @11:25AM (#10598372)


        I'm still trying to figure out why the Apollo formula of contractors with NASA oversight doesn't seem to work anymore.


        Take a look at Chapter 5 [speedera.net] of the CAIB Report. You might be especially interested in Section 5.3 - "An Agency Trying To Do Too Much With Too Little." And since you're comparing Apollo era NASA with today's program, look at diagrams 5.3-1 and 5.3-3. In short, the Apollo program enjoyed considerably more funding.
      • by sphealey (2855) on Friday October 22, 2004 @12:10PM (#10598941)
        I'm still trying to figure out why the Apollo formula of contractors with NASA oversight doesn't seem to work anymore.
        Two reasons. First, outsourcing requires more and better project managers and technical managers than insourcing. Many organizations learned this to their sorrow in the 1980s; many more are going to learn it around 2006.

        Second, the stable of competent contractors that existed in the 1940-1960 time frame is gone. North American, Grumman, McDonnell, dozens of others that could be named have been absorbed into 2-3 borg-like entities. The result is less competition, less choice, less innovation, few places for maverick employees to go, and in the end worse results from outsourcing.

        sPh

    • I think this goes along with the saying: 'If you make something idiot-proof, someone will build a better idiot.' Sure, maybe they could have designed the accelerometers so that they couldn't be installed backwards. But then again, what else might have failed? I guess in the end it all comes down to economics. What does the cost-benefit analysis say? Is it better to keep checking and double-checking, or to just send it out like it is? Now, I can understand cost-benefit is a little bit difficult when you are ta
    • by Wizzy Wig (618399) on Friday October 22, 2004 @10:32AM (#10597863)
      ...having people double-check a project from the ground up will almost always find the problems...


      Then you double-check the checkers, and so on... that's the point of the article... humans will err... Like Deming said... "you can't inspect quality into a process."

    • by Control Group (105494) * on Friday October 22, 2004 @10:37AM (#10597913) Homepage
      No, it is true. It's the "almost always" in your statement that's the key. It's simple statistics, really. Assume that a well-trained, expert engineer has a 5% chance of making a material error. This implies that 5% of the things s/he designs have flaws.

      Now suppose this output is double-checked by another engineer, who also has a 5% chance of error. 95% of the first engineer's errors will be caught, but that still leaves a .25% chance of an error getting through both engineers.

      No matter what the percentages, no matter how many eyes are involved, the only way to guarantee perfection is to have someone with a zero percent chance of error...and the chances of that happening are zero percent. Any other numbers mean that mistakes will occur. Period.

      I remember reading a story somewhere about a commercial jetliner that took off with almost no fuel. There are plenty of people whose job it is to check that every plane has fuel...but each of them has a probability of forgetting. Chain enough "I forgots" together, and you have a plane taking off without gas. At the level of complexity we're dealing with in our attempts to throw darts at objects 10^7 kilometers away, it is guaranteed that mistakes will propagate all the way through the process.
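      The arithmetic in the comment above can be sketched in a few lines (illustrative numbers only, assuming each reviewer misses errors independently):

```python
# Chance that a flaw slips past a chain of reviewers, each of whom
# independently misses it with probability p (5% in the example above).
def escape_probability(p, reviewers):
    return p ** reviewers

# One engineer: 5% of designs ship flawed.
# A second, equally fallible checker cuts that to 0.25%.
print(escape_probability(0.05, 1))   # 0.05
print(escape_probability(0.05, 2))   # ~0.0025
```

      The result is nonzero for any finite chain of checkers, which is exactly the poster's point.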

      • Just like the case in which the airport crew assigned to clean an aeroplane put some masking tape over the air pressure sensors, but forgot to remove it. Or rather, as the airport was badly lit and the masking tape wasn't noticeably different from the skin of the aircraft, nobody noticed this small defect. Until the pilots came in to land the aeroplane, that is; then it became a large problem.

      • Assume that a well-trained, expert engineer has a 5% chance of making a material error. This implies that 5% of the things s/he designs have flaws.

        Now suppose this output is double-checked by another engineer, who also has a 5% chance of error. 95% of the first engineer's errors will be caught...

        That doesn't follow. It's only true if the two errors are completely independent, which is a very big 'if'. In practice, the chances are that some types of error are more likely than others, and that the pro

        • I was being overly simplistic, admittedly, but I think the "model" (to put on airs) I used illustrated my point adequately: as long as there is a percent chance of an error being made at every step of the process, an error will eventually be made.

          Obviously, the trick is to minimize the odds, but you can't eliminate them.

        • The flight control computers on the Shuttle are an interesting example. If you have one fcc, and it dies, then you lose the vehicle and crew (vehicle is unflyable without the fcc). If you have two fccs, and one starts producing erroneous results, you don't know which one to trust. If you have three computers, you can survive any one of them failing, but then a second failure causes you problems.

          If you have four computers, there's an outside chance that two will fail, and you will have to choose between
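          The trade-off described above can be illustrated with a toy majority vote (a sketch only; the Shuttle's actual redundancy management was far more involved than this):

```python
from collections import Counter

# Voted output of n redundant computers: the value a strict majority
# agrees on, or None when no majority exists (the two-computer dilemma).
def vote(outputs):
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

print(vote([42, 42, 42]))   # healthy triplex: 42
print(vote([42, 42, 99]))   # one faulty unit outvoted: 42
print(vote([42, 99]))       # duplex disagreement: None, can't tell who failed
```

          With three units a single failure is outvoted; with two, disagreement gives you no way to pick the healthy one.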
      • by tentimestwenty (693290) on Friday October 22, 2004 @11:36AM (#10598511)
        There might always be errors, which you can reduce with many checks. The key is to have the checks done by someone who has an eye for potential problems. There is a particular skill set/personality that can foresee unknown problems better than, say, an engineer who is single-minded and focussed. You can get a hundred experts to check the same work, but often it's the one guy who says "why is that wheel upside down?" that reveals a completely unanticipated problem.
      • Re:The Gimli Glider (Score:4, Interesting)

        by dtmos (447842) on Friday October 22, 2004 @12:08PM (#10598919)
        The jetliner to which you refer, I think, is the Gimli glider [wadenelson.com] which, through a forehead-slapping number of independent goofs, ambiguities, and misunderstandings made by a frighteningly large number of people, ran out of fuel over Canada in 1983.
      • I already trust my computer. My computer has no business 'wondering' whether it trusts me or not.
        Your sig happens to explain this more than enough. Computers should never trust humans.
    • from scattered teams that all only check their parts rather than having fully qualified teams that go over the entire vehicle.

      Your sentiment is correct, but your details are a little off. For example the Saturn V rocket was built by "scattered teams" (and committees were heavily involved, despite the mythology around Von Braun)-- the first stage was built by Boeing, the second by North American, the third by Douglas Aircraft, the Instrument Unit (the control system) by IBM, the LEM by Grumman and the CSM
    • Finding problems is a good thing but I've found that nobody likes to be told their baby is ugly.... and if they're far enough up the corporate food chain... good luck getting 'em to listen.
    • Makes me think of the quote from Armageddon [imdb.com]:
      Rockhound : You know we're sitting on four million pounds of fuel, one nuclear weapon and a thing that has 270,000 moving parts built by the lowest bidder. Makes you feel good, doesn't it?

      And it will have flaws, no matter how often and how thoroughly you check.
    • Apollo had two gross errors in twelve launches: the Apollo 1 fire that killed Grissom, White, and Chaffee, and the Apollo 13 explosion.
  • Cost Effective (Score:5, Interesting)

    by clinko (232501) on Friday October 22, 2004 @10:18AM (#10597709) Homepage Journal
    It's actually more cost effective to allow for failures. You build the same sat 5 times and if 4 fail in a cheaper launch situation, you still save money.

    From this [scienceblog.com] article:

    "Swales engineers worked closely with Space Sciences Laboratory engineers and scientists to define a robust and cost-effective plan to build five satellites in a short period time."
    • I'm just waiting for management to get a clue about this. Most software projects fail on some level. Most software in big corporations sucks on some level, in at least one important component. Management should get a clue and PLAN for this. They should:
      • Have redundant competing projects
      • Have standards that mandate how components/systems fit together
      • NOT mandate that thou shalt use software X all across the enterprise

      What large corporations have been doing is Soviet style central planning. What happe

  • Good Point (Score:5, Insightful)

    by RAMMS+EIN (578166) on Friday October 22, 2004 @10:23AM (#10597766) Homepage Journal
    ``Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail.''

    This is a very good point, and I wish more people would realize it.

    For software development, the application is: Just because you can write 200 lines of correct code does not mean you can write 2 * 200 lines of correct code. Always have someone verify your code (not yourself, because you read over your errors without noticing them).
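    The point about 2 * 200 lines can be put in numbers with a toy model (assuming, purely for illustration, that each line is independently correct with some probability q):

```python
# Chance an entire file is flaw-free if each line is correct with
# probability q: q^n. Correctness compounds badly with length.
def all_correct(q, lines):
    return q ** lines

print(all_correct(0.999, 200))   # ~0.82
print(all_correct(0.999, 400))   # ~0.67 -- twice the code, much worse odds
```

    Doubling the line count squares the odds of a flaw-free file, which is why correctness does not scale linearly with program size.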
    • That's a very popular cliche. The fact is, with NASA's shrinking budgets, they don't have the resources to design around potential failures. There's old-school NASA that designed the Cassini probe, which has redundant systems and is properly designed and tested, and there's new-school NASA that makes the cheap Mars probes. Just looking at the Mars probes you'll see why they have moved to this method. If you can make five fault-intolerant probes for the same cost as one fault-tolerant probe, and odds are that onl
      • Re:You'd think so. (Score:3, Insightful)

        by RAMMS+EIN (578166)
        ``There have been a few companies who have discovered that it's cheaper to have paying customers find the flaws in their software, rather than do any kind of formalized testing before release.''

        Not only that, but it's actually beneficial to produce and ship buggy software. Bugs have to be fixed, and who can fix them better than the people who wrote the code? So, it makes sense for programmers to leave flaws in their programs. Companies that ship flawed products can make customers pay for upgrades that also
    • There is a third option which is often overlooked, even though it is nearly always applicable: Reduce the complexity of your system! It can be a lot harder to design a simple system to solve a complex problem, but it's the only way to truly defeat Murphy's Law. It may be that some problems simply can't be solved by a simple system, but I'll wager that the set of those problems is a lot smaller than most people realize.
  • by nels_tomlinson (106413) on Friday October 22, 2004 @10:24AM (#10597788) Homepage
    That's right, blame it all on the Irish. After all, it's not like anyone else ever screwed up...
  • by Puls4r (724907) on Friday October 22, 2004 @10:25AM (#10597795)
    Either you manage and design around it or fail. NASA management still often chooses the latter.

    This is hindsight at its best, and is the classic comment by bureaucrats who have no concept of what cutting-edge design is about. F1 race cars, racing sailboats, nuclear reactors - NO design is failsafe, and NO design is foolproof. Especially a one-off design that isn't mass-produced. Even mass-produced designs have errors, as in the auto industry. It is a simple fact of life that engineers and managers balance cost and safety constantly.

    What you SHOULD be comparing this against is other space agencies that launch a similar number of missions and satellites - i.e., other real-world examples.

    Expecting perfection is not realistic.
    • If you are going by sheer number of launches, body count, payload capacity, or cost effectiveness, the Russians have us beat hands down.

      Sure, we've been to the moon. But we haven't done a damn bit of fundamental research since then. (A lot of improvements to our unmanned rocket technology have been bought/borrowed/stolen from the Russian program.)

      • If you are going by sheer number of launches, body count, payload capacity, or cost effectiveness, the Russians have us beat hands down.

        Well, part of one out of four isn't bad. Let's examine these in detail, shall we?

        • Sheer number of launches - This is the only one where the Russians 'beat' the US, mostly because their hardware is unreliable and short-lived, thus requiring frequent replacement. So far as manned flights go, however, they've actually flown fewer. (87 Soyuz flights vs. 113 Shuttle flights a

    • This is hindsight at its best, and is the classic comment by bureaucrats who have no concept of what cutting-edge design is about. F1 race cars, racing sailboats, nuclear reactors - NO design is failsafe, and NO design is foolproof.

      But this isn't about design. It's about implementation. In each of the examples, the failure occurred because of incorrect assembly of key components.

      Having said that - there IS an issue of design brought up by the article. That is, the design of a system should not al

    • by orac2 (88688) on Friday October 22, 2004 @10:57AM (#10598101)
      This is hindsight at its best, and is the classic comment by bureaucrats who have no concept of what cutting-edge design is about.

      You only get to play the hindsight card the first time this kind of screw-up happens. If you actually read the article you'll see that Oberg (who isn't a bureaucrat but a 22-year veteran of mission control and one of the world's experts on the Russian space program) is indicting NASA for having a management structure that leads to technical amnesia: the same type of oversight failure keeps happening again and again.

      Oberg is not alone in this. The Columbia Accident Report despairingly noted the similarities between Columbia and Challenger: both accidents were caused by poor management, but what was worse with Columbia was that NASA had failed to really internalise the lessons of Challenger, or heed the warning flags about management and technical problems put up by countless internal and external reports.

      Sure, space is hard. But it's not helped by an organization that has institutionalised technical amnesia and abandoned many of its internal checks and balances (at least this was the case at the time of the Columbia report, maybe things have changed).

      And if you really want to compare against other agencies, NASA's astronaut body count does not compare favorably against the cosmonaut body count...

      Sadly, your post is a classic comment by slashdotters who have no concept of what effective technical management of risky systems looks like. (Hint: not all cutting-edge designs get managed the same way. There's a difference between building racing sailboats and spaceships. This is detailed in the Columbia accident report. Read it and get a clue.)

    • This is hindsight at its best, and is the classic comment by bureaucrats who have no concept of what cutting-edge design is about. F1 race cars, racing sailboats, nuclear reactors - NO design is failsafe, and NO design is foolproof. Especially a one-off design that isn't mass-produced. Even mass-produced designs have errors, as in the auto industry. It is a simple fact of life that engineers and managers balance cost and safety constantly.

      This advice better applies to yourself. Why does NASA use "one

      • by at_18 (224304)
        Scaled Composites, for example, has demonstrated a suborbital craft capable of barely reaching space for a cost of around $25 million. In comparison, NASA developed and flew three X-15 prototypes with similar capabilities for a cost of $300 million in 60's dollars (which incidentally was considered a cheap program).

        With the small difference that Scaled Composites is benefitting from 30 years of technology advancements. I don't think that an equivalent company of the 60s could build three SpaceShipOnes for

      • In comparison, NASA developed and flew three X-15 prototypes with similar capabilities for a cost of $300 million in 60's dollars (which incidentally was considered a cheap program).

        You've got good points. But you're being unfair on this one. Even Rutan notes [thespacereview.com] that the X-15's capabilities far outstrip SpaceShipOne's. That, and the X-15 provided some of the basic building blocks in aeronautics and astronautics on which SpaceShipOne could be built. Furthermore, SpaceShipOne enjoyed numerous high-performanc

      • Why does NASA use "one off" designs for all of its work (eg, the Space Shuttle, space probes, etc)?

        NASA usually doesn't use "one off" designs for its work. There were five shuttles built. Voyager 1 and 2 and the two Mars rovers are examples of "build two, so that it's more likely that one will work".

        The commercial launch business also has a few standard designs and tested optional configurations that get used over and over again.
    • by 0123456 (636235)
      "F1 race cars, Racing Sailboats, Nuclear Reactors - NO design is failsafe, and NO design is foolproof."

      Not true: there are failsafe nuclear reactor designs that even a genius couldn't manage to melt down, let alone a fool... you just have to design them with safety guaranteed by the laws of physics, not the control systems. General Atomics built a lot of them decades ago, and the Chinese are developing modern versions today.

      Good design prevents a heck of a lot of problems. If nothing else, you'd have thou
      • by Dun Malg (230075)
        If nothing else, you'd have thought that by now engineers would realise that if you design something so it can be fitted backwards, sooner or later it will be.

        Hah! Engineers are the most intelligent bunch of idiots you'll ever find. The problem with engineers is that often their own cleverness and/or familiarity with the item they're designing blinds them to the viewpoint of someone who's "not clever" or totally new to the item. With (for example) the classic non-reversible, yet perversely symmetrical acc

  • by computational super (740265) on Friday October 22, 2004 @10:25AM (#10597805)
    NASA management still often chooses the latter.

    Why be different than any other management?

  • by Anonymous Coward
    Human error is an inevitable input to any complex endeavor. Either you manage and design around it or fail. NASA management still often chooses the latter.

    There's a contradiction in that statement above, but I can't think what it is exactly. Along the lines of: humans manage and design the error handling, don't they?

    That said, there's nothing wrong with building in redundancy and failsafes.

    In space probes redundancy comes at the cost of number of unique mission goals and financial cost.

    Sometimes you just have to eat t
    • No one, least of all Oberg (a 22-year veteran of mission control), is asking NASA to have a 100% success rate. Space is harsh, unknown unknowns lurk, etc.

      What he is calling for is a management structure that allows solutions to problems that have occurred before to be implemented properly. Columbia was destroyed for almost the same root causes that were exposed after Challenger. I don't think it's unreasonable to expect people to have eliminated those problems, and kept them eliminated.

      The Columbia Accident
  • Circular reasoning (Score:3, Interesting)

    by D3 (31029) <daviddhenning AT gmail DOT com> on Friday October 22, 2004 @10:26AM (#10597812) Journal
    The fact that human error isn't compensated for is the true human error that needs compensation.
    I think I just sprained my brain thinking up that one.
  • All moderately-complex projects have to be built around:

    1. change
    2. error

  • by RealityProphet (625675) on Friday October 22, 2004 @10:28AM (#10597838)
    The problem with errors is that detecting all errors all the time is absolutely impossible. Think back to your intro CS theory class and to Turing recognizability. Think halting problem. Now, reduce the problem of finding all errors to the halting problem:

    if (my_design_contains_any_errors) while(1);
    else exit;

    Feed this into a program that is supposed to decide whether any input halts, and see what happens. You can't, because we know it is impossible for such a program to always return an answer. QED: errors are unavoidable. No need to sniff derisively in the direction of NASA's "middle management". Let's see if YOU can do a better job!
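    The diagonal argument behind that claim can be sketched directly (a toy construction; `halts` here stands in for any claimed perfect analyzer):

```python
# Given any claimed halting decider for zero-argument callables, build
# a function g that the decider must answer wrongly about.
def defeat(halts):
    def g():
        if halts(g):        # decider says "g halts" ...
            while True:     # ... so g loops forever instead
                pass
    return g

# A "decider" that claims nothing ever halts is wrong about g,
# which returns immediately:
g = defeat(lambda f: False)
g()   # returns at once, contradicting the decider
```

    Whatever answer the supposed decider gives about g, the construction makes g do the opposite, so no such decider can exist.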
    • No need to sniff derisively in the direction of NASA's "middle management".

      We're not talking about some unknown unknowns that crop up as the inevitable residue of your halting problem analogy.

      We're talking about a class of errors that have happened before. We already know about them, they've already been detected. And yet, because of management failure, they continue to persist. The Columbia Accident Board identified this as NASA's key problem, not the weakness of Reinforced Carbon-Carbon leading wing e

    • The problem with errors is that detecting all errors all the time is absolutely impossible.

      This is the thinking that provided an opportunity for the Japanese to become economic giants.

      Granted - it didn't start that way. Early Japanese production methods were error-prone to say the least. Then an American statistician named Dr. W. Edwards Deming taught a post-war reconstruction Japan how to improve quality. Ironically, Deming developed these ideas to improve production of military products.

      Severa

  • by woodsrunner (746751) on Friday October 22, 2004 @10:30AM (#10597850) Journal
    If you compare the advances in science and knowledge due to mistakes with those due to deliberate acts, it might come out that everything is a mistake.

    Recently I took a class on AI (insemination, not intelligence), and apparently the two biggest breakthroughs by Dr. Polge in preserving semen were due to mistakes. First, his lab mislabeled glycerol as fructose and they were able to find a good medium for suspension. Second, he blew off finishing freezing semen to go get a few pints and didn't make it back to the lab until the next day, thus discovering that it was actually better not to freeze the stuff right away.

    Mistakes are some of the best parts of science and life in general. It's best to try to make more mistakes (i.e. take risks) than it is to try and always be right. (unless you are obsessive compulsive).
  • It's important not to lose sight of the harshness of the environment these systems are designed to operate in. As a simple example, striking a match is an easy task to perform, yet people trapped in cold remote areas have died because they were under too much stress to light a fire even though they had the tools. It's also a question of consequences when something goes wrong, and space is not very forgiving.
    • Except the problems Oberg describes, including the proximate cause of the Genesis crash, have absolutely nothing to do with environmental conditions in space or anywhere else, but everything to do with mistakes made in nice clean rooms here on Earth, and magnified by poor management, again here on Earth.

  • Human Factor (Score:4, Insightful)

    by xnot (824277) on Friday October 22, 2004 @10:35AM (#10597887)
    I think the biggest difficulty surrounding large organizations is the lack of communication tools linking the right engineers together. It seems unfathomable that some of these mistakes were able to propagate throughout the entire engineering process and nobody caught them.

    Unless you consider the fact that often in large organizations, the left hand typically has no clue what the right hand is doing. I work at Lockheed Martin, and I'm typically involved in situations where one group makes an improvement that none of the other groups then know about; changes/decisions are poorly documented (if at all), so nobody knows where the process is going; people make poor decisions for lack of proper procedures from management about what to do; teams are not co-located; there is poor information about which people have the necessary knowledge to solve a particular problem; or any number of other things confuse the engineering process, to the detriment of the product. Most of these situations are caused by a lack of communication throughout the organization as a whole.

    This is a serious problem, and it needs to be acknowledged by the people in a position to make a difference.
    • The problem with every large organization is that building the communication networks linking the right people is hard to do. There is a tradeoff to make when deciding which and how many people information needs to propagate to. Too few, and the right people don't get the message. Too many, and the message gets lost in the noise.
  • Nasty Remark (Score:3, Insightful)

    by mathematician (14765) on Friday October 22, 2004 @10:40AM (#10597940) Homepage
    "Either you manage and design around it or fail. NASA management still often chooses the latter."

    I find this remark very unfair. It has a really nasty, snide attitude to it, like "we are perfect; why can't you be?"

    Come on guys, NASA is trying to do some really difficult and ground breaking stuff here. Cut them some slack.
    • As I posted elsewhere, but I believe it bears repeating:

      We're talking about a class of errors that have happened before. We already know about them; they've already been detected. And yet, because of management failure, they continue to persist. The Columbia Accident Investigation Board identified this failure as NASA's key problem, not the weakness of the reinforced carbon-carbon wing leading edges.

      Unknown errors are unavoidable, but ignoring solutions to known errors is unforgivable.

      During his 22 years at Mission Control
  • by onion_breath (453270) on Friday October 22, 2004 @10:43AM (#10597960) Homepage
    I love how journalists and others like to sit back and criticize these engineers' efforts. They are human, and they will do stupid things. Having been trained as a mechanical engineer (although I mostly do software engineering now), I have some idea of how many calculations have to be made to design even one aspect of a project. I can't imagine the complexity of such a system: trying to account for every scenario, making sure algorithms and processes work as planned, for ONE mission. No second chances. That we have individuals willing to dedicate the mental effort to this cause at all is worthy of praise. These people have pride and passion in what they do, and I'm sure they will continue to do their best.

    For anyone wanting to yack about poor performance... put your money where your mouth is. I just get sick of all the constant nagging.
    • Exactly - people should listen to themselves sometimes.

      "Can't you even fling a 2 tonne piece of incredibly delicate scientific apparatus a billion miles across space without one thing going wrong? Call yourself a scientist?"
    • Except that Oberg is a 22-year veteran of Mission Control.

      Except that everything he's saying here is an echo of what the Columbia Accident Investigation Board said about NASA's manned space program.

      Except that, if you'd bothered to read the article, you'd see that the criticism is not of "engineers' efforts" but of management.

      Except that "are human, and they will do stupid things" is the whole point of Oberg's article, and he's talking about the failure of NASA to provide oversight to catch these inevitable errors.
  • by Anonymous Coward on Friday October 22, 2004 @10:44AM (#10597968)
    Systems display antics. John Gall has written a great book that vastly expands on Murphy's Law, called Systemantics - The Underground Text of Systems Lore [generalsystemantics.com]. I cannot recommend this book enough. It contains truths about the world around us that are blindingly obvious once you see them, but until then you're part of the problem. Systemantics applied to political systems is very enlightening. Too bad the only people who think like this in politics are the selfish and egomaniacal Libertarians (yeah, yeah... I know. Libertarianism is the new cool for the self-styled nerd political wannabe).

    Here are some of the highlights:
    • 1. If anything can go wrong, it will. (see Murphy's law)
    • 2. Systems in general work poorly or not at all.
    • 3. Complicated systems seldom exceed five percent efficiency.
    • 4. In complex systems, malfunction and even total non-function may not be detectable for long periods (if ever).
    • 5. A system can fail in an infinite number of ways.
    • 6. Systems tend to grow, and as they grow, they encroach.
    • 7. As systems grow in complexity, they tend to oppose their stated function.
    • 8. As systems grow in size, they tend to lose basic functions.
    • 9. The larger the system, the less the variety in the product.
    • 10. The larger the system, the narrower and more specialized the interfaces between individual elements.
    • 11. Control of a system is exercised by the element with the greatest variety of behavioral responses.
    • 12. Loose systems last longer and work better.
    • 13. Complex systems exhibit complex and unexpected behaviors.
    • 14. Colossal systems foster colossal errors.
  • by wiredog (43288) on Friday October 22, 2004 @10:48AM (#10598009) Journal
    "these switches were reportedly developed as a nuclear warhead safety device"

    Very comforting to know how easy it is to wire the safeties on nuclear weapons up backwards.

  • he would roll over in his grave and say:

    After all this post has GENESIS and outer space in it.

  • from the article:
    "After all, these switches were reportedly developed as a nuclear warhead safety device, so one could just assume that they were properly wired."

    Nice to know those safety devices are foolproof.
  • Maybe we should pass the hat and send every NASA manager a copy of _Systemantics_, for their enlightenment. (Likely the scientists and engineers already have their own copies.)
  • KISS... (Score:3, Interesting)

    by Kong99 (618393) on Friday October 22, 2004 @10:59AM (#10598120)
    A great engineer demonstrates his/her skill not by designing something terribly complex, but by designing the object that meets required specifications as SIMPLY as possible with as FEW unique parts as possible. That is GREAT engineering.

    However, with that being said I really do not believe Engineers are the problem at NASA. Bureaucracy is the enemy at NASA. NASA needs a complete top to bottom overhaul.

  • by Ced_Ex (789138) on Friday October 22, 2004 @11:19AM (#10598332)
    Here's the real reason for NASA and their errors, as explained by former astronaut Gordon Cooper:

    "Well, you're sitting on top of this rocket, about to be flung into the most hostile environment known to man, and you keep thinking, 'Everything here was supplied by the lowest bidder.'"

  • by ishmalius (153450) on Friday October 22, 2004 @11:35AM (#10598482)
    During design and testing, Murphy is your best friend. Before the baby chick leaves the nest, you want everything that can possibly go wrong, to do so. You can address each of the failures encountered, and then move on to new opportunities for error. This is a mysterious process called "learning," which definitely has its good points.

    NASA does test everything. He doesn't mention it in the article, but I would be almost certain that the accelerometers were tested and passed; the tests themselves were improper.

    • What Oberg is objecting to (and it's one of the things the Columbia Accident Investigation Board objected to as well) is that NASA has a long history of forgetting lessons learned.

      That is why he is holding management's feet to the fire here, as did the CAIB.

      And NASA doesn't test everything. In fact, NASA's reliance on simulation, extrapolation, and plain guesswork was harshly criticized in the CAIB report, which is free for anyone to read.
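The scenario the parent describes, a test that passes even though the instrument is installed backwards, can be sketched in a few lines. This is a hypothetical illustration (the function names and thresholds are invented), not a description of how the Genesis accelerometers were actually tested:

```python
# Hypothetical sketch: a deceleration sensor whose mounting flips its sign.
# All names and values here are invented for illustration.

def read_g(true_deceleration: float, mounted_backwards: bool) -> float:
    """Return the sensed deceleration in g; a backwards mount flips the sign."""
    return -true_deceleration if mounted_backwards else true_deceleration

def improper_test(reading: float) -> bool:
    """An 'improper' test: it checks only signal magnitude,
    so it cannot catch a sensor installed in reverse."""
    return abs(reading) > 3.0

def proper_test(reading: float) -> bool:
    """A proper test: it also checks the sign convention."""
    return reading > 3.0

# A 5 g deceleration seen through a backwards-mounted sensor:
flipped = read_g(5.0, mounted_backwards=True)

print(improper_test(flipped))  # True  -- the flawed test passes anyway
print(proper_test(flipped))    # False -- the sign check catches the reversed mount
```

The point is the same one Oberg makes: a test suite that never encodes the installation's sign convention will happily bless hardware wired backwards.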

  • by Anonymous Coward
    It is a difficult thing to design something to face failures. It requires a mind set vastly different than that of most "builders of things." Those folks tend to think in the positive: my creation does this, and this, and this, and ... This is true whether the thing being built is a program, a car, or a team of people.

    If you want to see this in action, find your favorite developer and ask the following: "What does your program do, and how does it do that?" Prepare for a long response :-)

    Then ask: "How does it fail?"
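The contrast the parent draws can be made concrete with a toy example. The happy-path parser below encodes only what the program does; the defensive version also answers the failure question. Everything here (the function names, the config format) is invented for illustration:

```python
# Hypothetical sketch: "builder's view" vs. "failure-aware view" of the same task.

def parse_config_happy(text: str) -> dict:
    """Builder's view: turns 'key=value' lines into a dict.
    Crashes on blank lines, comments, or anything without an '='."""
    return dict(line.split("=", 1) for line in text.splitlines())

def parse_config_defensive(text: str) -> dict:
    """Failure-aware view: the same job, but every malformed
    line is either expected (skipped) or reported precisely."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are expected, not errors
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected key=value, got {line!r}")
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result
```

Ask the happy-path version "how do you fail?" and the answer is: with a confusing exception from deep inside `dict()`. The defensive version was designed with the failure modes enumerated up front, which is exactly the mindset the parent says most "builders of things" lack.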
  • And the subsequent controversies...

    is here [improb.com].

    (This paper won a prestigious 2003 Ig Nobel [improb.com] award for engineering.)

  • Summary of The fastest man on Earth [improb.com]:

    George Nichols: "The Law's namesake, was Capt. Ed Murphy Jr., a development engineer... Frustrated with a strap transducer which was malfunctioning due to an error in wiring the strain gauge bridges caused him to remark-- 'if there is any way to do it wrong, he will'-- referring to the technician who had wired the bridges. I assigned Murphy's Law to the statement and the associated variations..."

    David Hill: "Murphy was kind of miffed off. And that gave rise to his observation: 'If there's any way they can do it wrong, they will.' I kind of chuckled and said, that's the way it goes. Nothing more could be done really."

    John Paul Stapp: "we do all of our work in consideration of Murphy's Law. [defined as] the idea that you had to think through all possibilities before doing a test."

    Dr. Dana Kilanowski: "at the time I believe Stapp said something like, 'If anything can go wrong he'll do it.' A couple days later there was a press conference in Los Angeles and Stapp said something like, 'it was Murphy's Law -- if anything can go wrong, it will go wrong.' [...] I have heard that Murphy claimed he invented Murphy's Law, but Stapp is the one noted for his witticisms, his haikus, and his plays on words."

    Ed Murphy: "I didn't tell them that they had positively to orient them in only one direction. So I guess about that time I said, 'Well, I really have made a terrible mistake here, I didn't cover every possibility.' And about that time, Major Stapp says, 'Well, that's a good candidate for Murphy's Law'. I thought he was going to court martial me, but that's all he said. [Stapp reeled off a host of other Laws, and said] 'from now on we're going to have things done according to Murphy's Law'."

    Chuck Yeager: "Look, what you're getting into here is like a Pandora's Box. Goddamn it, that's the same kind of crap...you get out of guys who were not involved and came in many years after."

    And in the end it wasn't as extreme a failure as Genesis:

    According to Nichols, the failure was only a momentary setback -- "the strap information wasn't that important anyway," he says -- and in any case, good data had been collected from other instruments. The Northrop team rewired the gauges, calibrated them, and ran another test. This time Murphy's transducers worked perfectly, producing usable data. And from that point forward, Nichols notes, "we used them straight on" because they were a good addition to the telemetry package. But Murphy wasn't around to witness his devices' success. He'd returned to Wright Field and never visited the Gee Whiz track again.
  • `nuff said...

    Note that the "law" doesn't torture NASA exclusively; it just rears its head very visibly in their case.
  • HOW IT HAPPENS (Score:4, Insightful)

    by LaCosaNostradamus (630659) <LaCosaNostradamus&mail,com> on Friday October 22, 2004 @01:28PM (#10599782) Journal
    1. Manager issues a stupid order.
    2. Subordinates obey order out of fear.
    3. Manager gains confidence that stupidity is a valid method.
    4. Stupidity gains an increasing foothold until a catastrophe occurs.
    OR
    1. Manager cuts another corner or cost.
    2. Nothing immediately bad happens as a result.
    3. Manager gains confidence that cutting is a valid method.
    4. The cuttings increase until a catastrophe occurs.
    Managers are among the most moronic of the "educated" Western class ... because, after all, they don't understand the trends I outlined above.

You know that feeling when you're leaning back on a stool and it starts to tip over? Well, that's how I feel all the time. -- Steven Wright

Working...