Bug NASA Space Science

NASA Finds Cause of Voyager 2 Glitch 283

Posted by kdawson
from the blame-cosmic-rays dept.
astroengine writes "Earlier this month, engineers suspended Voyager 2's science measurements because of an unexpected problem in its communications stream. A glitch in the flight data system, which formats information for radioing to Earth, was believed to be the problem. Now NASA has found the cause of the issue: it was a single memory bit that had erroneously flipped from a 0 to a 1. The cause of the error is yet to be understood, but NASA plans to reset Voyager's memory tomorrow, clearing the error."

  • by BlackErtai (788592) on Wednesday May 19, 2010 @12:44AM (#32261368) Homepage
    Nobody knows you've done anything at all.
  • Really? (Score:3, Insightful)

    by atomicthumbs (824207) <atomicthumbs@g[ ]l.com ['mai' in gap]> on Wednesday May 19, 2010 @12:44AM (#32261370) Homepage

    The cause of the error is yet to be understood

    Let me guess: cosmic ray. Is it really that hard? What else causes a single bit-flip error in space?

    • Re: (Score:3, Insightful)

      by srothroc (733160)
      Age? Voyager is hardly brand new.
      • Just incredible! (Score:3, Insightful)

        by mcrbids (148650)

        Voyager is anything but brand new. Voyager is probably older than most Slashdotters, having been launched in 1977. Think about it: 1977 - when advanced microchips were not as powerful as the chip driving the shatty calculator you buy today at the dollar store. 1977 was a different time, when information technology usually didn't even involve transistors, yet, and vacuum tube testers (for your TV) were still found at the local drug store.

        And yet, some 33 years later, Voyager 2 is still chugging on, after vis

        • Re:Just incredible! (Score:5, Informative)

          by fdrebin (846000) on Wednesday May 19, 2010 @02:10AM (#32261792)

          1977 was a different time, when information technology usually didn't even involve transistors, yet, and vacuum tube testers (for your TV) were still found at the local drug store.

          Tube testers were pretty darned hard to find almost anywhere in 1977 (you could find them in old-used-electronics stores). I do recall testing tubes in drugstores in the early 70's.

          Solid state, and even (*gasp*) integrated circuits were in widespread use. Why, by gosh by golly, we even had *8080*'s then.

          I was a senior in college in physics+EE; I and a handful of my fellow students managed to coerce one of the EE profs to take a few hours and teach us about tubes (they had been removed from the curriculum). For the most part the interest was for us audio-nerds... tubes had that nice desirable sweet sound... (but I digress)

          /F

          • by Hurricane78 (562437) <deleted@slashd[ ]org ['ot.' in gap]> on Wednesday May 19, 2010 @03:39AM (#32262264)

            tubes had that nice desirable sweet distortion...

            There, fixed that for ya...

          • by mirix (1649853)

            The writing was on the wall for tubes well before 1977. TVs were almost all solid state by the mid 70's, with the exception of the GE "portacolor" sets, which somehow managed to stay in production into the 80's. (No idea how or why they kept them that way. I suppose GE had a massive glut of associated parts.)
            Radios and stereos would have been SS for a decade already, for the most part (couple stragglers here too).

            The latest production, normal, sort of consumer type tubes, (ie. not 100kW radio station tubes) made here tha

            • Re: (Score:3, Informative)

              by vlm (69642)

              I wonder what a brand new ancient rad-hard cpu costs.

              They're all kind of "ancient", by some definition. The BAE RAD6000 is at least 14 years old and they go for about 1/4 mil. Most recent launch was this February.

              http://en.wikipedia.org/wiki/IBM_RAD6000 [wikipedia.org]

              Some might consider the RAD750 to be "ancient" being about 9 years old. They retail for about $200K. The TSSM is going to launch in a decade with one, at which point that CPU will be 19 years old.

              http://en.wikipedia.org/wiki/RAD750 [wikipedia.org]

              The cost and licensing of the fault tolerant GPL LEON series is very confusing, s

        • New-fangled memory (Score:5, Informative)

          by dfsmith (960400) on Wednesday May 19, 2010 @03:48AM (#32262302) Homepage Journal
          One of the upgrades the Voyagers had over the Viking computers was CMOS memory (instead of plated wires). Read all about it at http://history.nasa.gov/computers/contents.html [nasa.gov]. Apparently, there was some debate at the time over whether these new-fangled memories would be reliable.
        • by dugeen (1224138)
          The thought of Voyager still on its way is an inspiring one. I can't visualise it without hearing the opening notes of the original Star Trek theme.
        • In the 1970s, computers were used for two things: to go to the moon, and to play pong. Nothing in between. That was back before every OS sucked [deadtroll.com].

          Just thought you would appreciate the song. Getting offa your lawn now.

        • Re:Just incredible! (Score:5, Informative)

          by vlm (69642) on Wednesday May 19, 2010 @06:35AM (#32263096)

          1977 - when advanced microchips were not as powerful as the chip driving the shatty calculator you buy today at the dollar store.

          Classic, ever repeated confusion of what "power" is. Unless you mean volts times amps, power is what you can do with it. An old mainframe can run a department of a small multinational corporation, maybe a large university, or perhaps a division of state government. We know this, because they did in fact do so, very profitably. You claim a dollar store calculator is more powerful. That means a dollar store calculator should be able to run, say, an entire multinational corporation, maybe multiple universities, or an entire state government. Oh wait, a dollar store calculator can, at best, slowly calculate someone's income tax, possibly correctly. I guess the old mainframe is more powerful after all.

          When I worked at a mainframe shop in the late 90s I heard a lot of similar tiresome comments... "Ha ha, mainframes, bet you didn't know my laptop can run NOPs faster than your mainframe can run floating point FFTs ha ha ha mainframes". At which point you simply tell them to put up or shut up, hand them a bus and tag cable, and have their infinitely "more powerful" laptop process 5% of the NYSE volume like our mainframes did, while supporting about 100K trader desks, a couple TB of tape robot storage, etc.

          • Re:Just incredible! (Score:5, Interesting)

            by commodore64_love (1445365) on Wednesday May 19, 2010 @07:52AM (#32263694) Journal

            >>>have their infinitely "more powerful" laptop process 5% of the NYSE volume like our mainframes did, while supporting about 100K trader desks, a couple TB of tape robot storage, etc.

            A laptop could do that if it had an efficient assembly-written OS (like Kolibri), rather than the bloated general purpose OSes like Windows NT or OS X. At my former company we used the equivalent of laptops (Pentium 2s) to manage, load mission data, and launch a ship full of Tomahawk missiles.

          • by Anonymous Coward on Wednesday May 19, 2010 @07:55AM (#32263718)

            Classic, ever repeated confusion of what "power" is. Unless you mean volts times amps, power is what you can do with it.

            Have you ever kissed a girl?

          • Re:Just incredible! (Score:4, Informative)

            by Anonymous Coward on Wednesday May 19, 2010 @08:44AM (#32264270)

            Exactly. The IBM 360 had a truly incredible I/O capacity, powered by multiple parallel processing elements called "channels." You programmed them with "channel command words" or CCWs. They were independent of the main CPU. When a channel needed memory, it got locked down (pfixed) and allocated to the channel, so the channel could piss into memory at high speed. Really large, thick cables connected the CPU with peripheral devices. These cables had lots of wires in them. Because lots of bits were flowing IN PARALLEL. Look up the transfer rate of a 2701 drum drive, still maintained and used for paging devices as late as the 1980's by companies who could not find anything faster.

            When DEC tried to claim that they could replace 360's with VAX's, guess what happened? They didn't have massively parallel I/O processors. They didn't have a massive transfer capability. They generated an interrupt on every character typed by every user, for God's sake. They were not I/O engines. They failed, utterly. Not that VAX wasn't a good machine, but no way could it replace a 360.

            How did a small 360 support hundreds of users? Why, through an innovation called "CICS." What happened was, the mainframe would fill a 3270 CRT terminal screen with a "form." You would fill in the form, locally, using the "smart" 3270's field-editing and checking capability, with no interaction with the mainframe. When you were finished filling in your form, you'd hit TRANSMIT. At which point, the variable data on your form would be glued together by the 3270 in one record and sent up for processing by the mainframe (along with everyone else's form data). A few seconds later, you'd get another form in response. Lather, rinse, repeat.

            Oh wait. That's exactly how most business Web applications work. Except the screens are prettier.

    • Re:Really? (Score:5, Insightful)

      by pclminion (145572) on Wednesday May 19, 2010 @12:50AM (#32261412)

      Let me guess: cosmic ray. Is it really that hard? What else causes a single bit-flip error in space?

      When you have a probe billions of miles from Earth, with no hope of ever physically retrieving it, and something weird happens, I don't think the first thing you do is start making assumptions.

      • Re:Really? (Score:4, Insightful)

        by Peach Rings (1782482) on Wednesday May 19, 2010 @01:01AM (#32261486) Homepage

        It's pretty amazing that they even were able to track the problem down to a particular bit. No general purpose operating system has anything even remotely having dreams of approaching that level of reliability and stability. It's nice to see the strengths of bare-metal hacking demonstrated in this bleary age of big-button-pushing Java and .NET.

        • Re:Really? (Score:5, Insightful)

          by BitZtream (692029) on Wednesday May 19, 2010 @01:11AM (#32261536)

          It's also extremely important to note that not a single item you own is made to the specifications the Voyagers were built to, even though they were made over 30 years ago.

          It's also rather important to note that, as unstable as most OSes are, they are several million times more complex than the code Voyager 1 and 2 run.

          Finally, joke about Windows all you want ... if you do a default installation of Windows and you don't install any additional drivers or software, it is extremely stable and will just sit there for ages happy to do nothing but tick away.

          It's also entirely feasible to find 1 stuck or flipped bit even using Java and .NET; you just have to actually understand the inner workings of the code, which is not something pretty much any developer working in these environments has time to do these days.

          Both things may be computers that run code and use electricity to do so, but that's about where the shared bits end. These guys have been using the same code for 30+ years ... they kinda know how it works and all its quirks at this point.

          With all that said ... you're still right, it's freaky impressive.

          • by sznupi (719324)

            Impressive how they established this one bit with certainty - a command for transmitting back, basically, RAM content? Or at least checksums for various parts of it, narrowing down the location? (what about the storage from which it will be restored?) Would that even work considering the gibberish transmitted?

            If that was determined based largely on a copy at hand - what if some other bit is also wrong?...

            • Re:Really? (Score:4, Interesting)

              by rew (6140) <r.e.wolff@BitWizard.nl> on Wednesday May 19, 2010 @02:18AM (#32261832) Homepage

              Certainty? I don't think so.

              I think they simulated Voyager with this bit flipped and saw the same output (that is transmitted to earth).

              I hope they tried to flip ALL bits, and found that only this one bit would give the results seen. If you would follow the code and find and test just a few likely places, I'd expect a few more unexpected places to give the same results.

              The quick fix is to send the correct byte to the craft and hope that fixes it. If the bit has become stuck in the new position, they will have to do a remote firmware upgrade (with the code rewritten to fit the stuck-at value...). Other memory cells may have broken down in the meantime, but with a stuck-at value that is correct for the current version of the firmware, which you won't know until you try them....

              • by RichiH (749257)

                I assume they did in fact try a _lot_ of combinations. Testing all of them is probably impossible due to the complexity of even the "basic" system they sent flying in 1977, but I assume they went through this stuff _carefully_.

                What I find more amazing than locating the actual problem is that they can reset the thing over all that distance and be reasonably certain that it will come back to life.

            • Re: (Score:3, Informative)

              I would imagine that it was relatively easy. Not only does Voyager have a small amount of memory (about 541kb), but about 10% of the command system's memory is dedicated to fault protection. Read here: Jet Propulsion Laboratory [nasa.gov]
          • Re: (Score:3, Funny)

            by dakameleon (1126377)

            Finally, joke about Windows all you want ... if you do a default installation of Windows and you don't install any additional drivers or software, it is extremely stable and will just sit there for ages happy to do nothing but tick away.

            Let me just OT for a moment here: if you didn't install any drivers or software... it'd just sit there, period, and you wouldn't be too happy about this slightly warm expensive paperweight you just bought. What on earth is the point of a computer without additional software?

            • by Jugalator (259273)

              The comparison is with Voyager, which also has software installed; Windows is so much more complex and still has the potential to run that stably. But yes, of course that complexity also drives the hw requirements.

          • Re: (Score:3, Funny)

            by IWannaBeAnAC (653701)

            Finally, joke about Windows all you want ... if you do a default installation of Windows and you don't install any additional drivers or software, it is extremely stable and will just sit there for ages happy to do nothing but tick away.

            Yeah, the problems only come when you try to use the keyboard or mouse.

        • Re:Really? (Score:5, Informative)

          by 0123456 (636235) on Wednesday May 19, 2010 @01:27AM (#32261602)

          It's pretty amazing that they even were able to track the problem down to a particular bit.

          To be fair, Voyager doesn't have many bits in its memory :). Tracking down a bad bit is much easier when you have 4k of RAM than when you have 4GB of RAM.

        • by rew (6140)

          It's happened before. Last time they just rearranged the code so that the particular bit that had become stuck-at-0 was required to be 0. Might have been a Mars mission and not Voyager.

        • Bah Kids today (Score:3, Insightful)

          by jellomizer (103300)

          You probably haven't had much experience with these older computer systems. They did what they needed to do and that was it. The hardware was wired to do what it needed to do. Every bit had a purpose. If that bit failed, you knew that something was wrong, making it fairly easy to find the bit that was bad.

          1K can be represented in a 32x32 square. These systems had only a few K of memory to view, and millions of dollars in funding. Finding a missing bit is actually very easy. Especially if you go through the d

      • by rew (6140)

        I can hope, can't I?

    • Re:Really? (Score:4, Funny)

      by mozumder (178398) on Wednesday May 19, 2010 @12:50AM (#32261420)

      Let me guess: cosmic ray. Is it really that hard? What else causes a single bit-flip error in space?

      Incredibly annoying alien hackers?

      • Let me guess: cosmic ray. Is it really that hard? What else causes a single bit-flip error in space?

        Incredibly annoying alien hackers?

        That's what I heard, and through a very reliable source [theonion.com]

        • The funny thing is that that part of the Onion is not fake - the responses people make are, but the news they are responding to is real.
    • Re:Really? (Score:5, Funny)

      by sznupi (719324) on Wednesday May 19, 2010 @12:59AM (#32261464) Homepage

      V'Ger is unwilling to just transfer the data to its Creator...

    • Re: (Score:3, Funny)

      by Anonymous Coward

      Actually it was a metric "0" that got switched to an imperial "1".

    • Re:Really? (Score:5, Funny)

      by ianezz (31449) on Wednesday May 19, 2010 @01:20AM (#32261580) Homepage
      M-x butterfly [xkcd.com]. Cosmic rays, but on purpose.
    • Re: (Score:3, Funny)

      by T Murphy (1054674)
      A tiny cosmic spatula.
    • by rjch (544288)

      Let me guess: cosmic ray. Is it really that hard? What else causes a single bit-flip error in space?

      • Age of equipment.
      • Electrical short.
      • Space debris.
      • Alien hackers.

      Pick one. Any one.

      • Re: (Score:3, Interesting)

        by rew (6140)

        • Age of equipment.

        You're the second one to suggest "age". When humans die of age, that's some failure in the human body that's common when people grow old. That's when we say someone died of old age. However when human made devices die, there is always a component that has failed. When you have a 5 year old mobile telephone that dies, you say it died of old age, and replace it. That's because you don't care and replacing it costs less than finding out the root cause for the failure.

        When a properly designed computer flips a b

        • Re: (Score:3, Informative)

          by Tapewolf (1639955)

          In any case, I don't know what memory technology voyager uses. The (slightly) more modern space shuttles used magnetic core memory for essential systems. These are not affected by cosmic rays. If it isn't magnetic core, then it is likely to be static RAM. This too is not easily modified by a cosmic ray.

          I got curious and looked it up: http://voyager.jpl.nasa.gov/faq.html [nasa.gov]

          ...apparently it uses Plated Wire memory [wikipedia.org] which I had not heard of before, but seems to be a relative of core store.

        • by kinnell (607819)

          You're the second one to suggest "age". When humans die of age, that's some failure in the human body that's common when people grow old. That's when we say someone died of old age. However when human made devices die, there is always a component that has failed. When you have a 5 year old mobile telephone that dies, you say it died of old age, and replace it. That's because you don't care and replacing it costs less than finding out the root cause for the failure.

          When a properly designed computer flips a bit, SOMETHING happened. We may never know, it might have been a cosmic ray. But don't you think that they would use space-certified RAM chips for such a project?

          Semiconductor devices deteriorate over time due to dopant diffusion in the substrate. It's entirely possible for that memory bit to flip because the threshold voltages have drifted too far out of specification over the years.

        • >But don't you think that they would use space-certified RAM chips for such a project?

          They did, but cosmic rays come in a wide range of intensities, from feeble all the way up to having enough energy in one photon to make (baseball analogy ahead) a baseball jump.

    • by Zoxed (676559)

      >> The cause of the error is yet to be understood

      Just to clarify: this was the submitter's comment; it does not appear in the source article.

    • Hardware that old uses sufficiently large components such that the mundane cosmic rays that regularly strike earth and earth-orbiting satellites are generally not strong enough to flip a bit. While it's certainly possible that one got a lucky shot, it's also quite possible that the hardware is failing, or that Voyager 2 is encountering much more energetic cosmic rays at the edge of the protective range of the Sun's magnetic field. Assuming the reset works, it'll be interesting to see how it fares as it fl

    • It was my understanding that Voyager's computers used CORE memory since it is not susceptible to radiation induced soft errors.

  • by superdave80 (1226592) on Wednesday May 19, 2010 @12:49AM (#32261408)

    Why don't they just always try that first?

  • by blind biker (1066130) on Wednesday May 19, 2010 @01:02AM (#32261488) Journal

    This is why you DO WANT nuclear energy in space! OK, Voyager 1 and 2 have RTGs, but even those are considered politically incorrect these days, especially such massive ones as in the Voyagers.

    More nuclear power in spacecraft, I say. To provide propulsion (ion drive, or even better, explosive drive) and energy when far from the Sun. Fuck PC.

    • Re: (Score:2, Informative)

      by eclectro (227083)

      Political incorrectness is not what is stopping RTGs from being launched, but lack of supply of plutonium 238 [discovery.com]. It's difficult to protest launches with radioactive elements because they all have been successful. And if one were to crash, the RTGs are sealed so there would not be any leakage. Unfortunately environmentalists want to protest anything radioactive, even though such criticisms may no longer be valid.

  • Hero (Score:5, Insightful)

    by LoudMusic (199347) on Wednesday May 19, 2010 @01:04AM (#32261500)

    NASA is my hero. They do cool shit all the time. Even when their stuff breaks, it's cool. Then they fix it and it's even more cool.

    • They do cool shit all the time.

      No kidding [youtube.com]

  • Cosmic Ruse (Score:2, Interesting)

    by XiaoMing (1574363)

    First I was going to suggest that this satellite would careen forward out of control like a Toyota, but then realized that wouldn't be quite accurate.

    The cosmic rays we get on Earth are actually short-lived particles such as muons (a fat electron, probably the best known aside from the standard protons-neutrons-electrons) that result from cosmic naked hydrogens hitting our atmosphere. Out in space though, it'd be interesting to see if those protons would have the same effect as a terrestrial "cosmic ray".

  • by Arker (91948) on Wednesday May 19, 2010 @01:45AM (#32261682) Homepage
    You telling me NASA doesn't even use parity memory? Seriously?
  • Just don't brick it! (Score:3, Informative)

    by WGFCrafty (1062506) on Wednesday May 19, 2010 @01:53AM (#32261714)
    The Voyagers are my favorite probes!

    I wonder how many bits they'll have to send to change the one wrong one, and how long that will take.

    Leave it to the stoner astrophysicist Carl Sagan to oversee one of the more amazing feats of space travel!!

    Radioisotope thermoelectric generator [wikipedia.org]s are awesome!
    Anyone know how much fuel is remaining? They've been heating up for knowledge for a long period of time.



    Personally, I want about 6 of the units in Voyager 2, screw solar!
  • I'm surprised that a single-bit error is even an issue on such an important (and expensive) piece of equipment.

    Hamming codes [wikipedia.org] have been around since the 1940's.
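
Editor's sketch, for anyone who hasn't seen one: a minimal Hamming(7,4) code in Python. It is purely illustrative of how a single flipped bit in a stored word can be located and corrected; the function names are made up for this example, and nothing here claims to describe what Voyager's memory actually does.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as 7 bits laid out as: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> Tuple[List[int], int]:
    """Fix at most one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # flip the bit at position 5
recovered, where = hamming74_decode(damaged)
print(recovered, where)
```

Note the limit: this code corrects one flipped bit per 7-bit word. Two flips in the same word get "corrected" to the wrong data, which is why real designs often add an extra overall parity bit to at least detect double errors.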

    • Re: (Score:3, Informative)

      by ledow (319597)

      The spacecraft is in an incredibly hostile environment. Who's to say that there *wasn't* ECC and it's just that its Hamming code wasn't enough to compensate for the error - it would make sense: as the hardware ages, the device leaves the solar system, the errors start getting closer and closer to the limits of error correction until one day - bam, even with error correction it slips through the net and ends up as a bad bit in memory.

      Technically, this is possible (but incredibly rare) on even the greatest

    • Hamming codes are designed for correcting transmission errors and are probably being used in the transmission. However, what we are talking about is a flipped bit inside the code that produces the string to be transmitted; in such a case the ECC will simply ensure that the string of garbage it produces is transmitted correctly. Even in modern spacecraft they don't rely on ECC to verify a running program; they use redundant systems, i.e. 3 computers running the same code "vote" on the correct answer.
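
Editor's sketch of the "3 computers vote" idea, assuming the simplest possible form: a bitwise majority over three copies of the same word. The function name `vote` and the sample values are invented for illustration; real flight systems vote on much more than a single word.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit is the value at least two inputs agree on."""
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
faulty = good ^ 0b00001000  # one copy has a single flipped bit

result = vote(good, good, faulty)
print(bin(result))
```

The vote masks a fault in any one copy, wherever it is. It does nothing if two copies go wrong in the same bit, which is the usual argument for also scrubbing or resetting memory periodically.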
  • NASA plans to reset Voyager's memory tomorrow

    Considering the distances involved, I found it funny that the sentence implied simultaneity. Voyager 2 is about 92 AU out (according to WP), which is 12 light-hours and 45 light-minutes. So if they send the signal in the morning, the memory will be reset in the afternoon, and they can hope for clean signals the day after.
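
Editor's sketch checking the arithmetic above in Python. The 92 AU distance is the commenter's figure; the astronomical unit and speed of light are the standard defined values.

```python
AU_METERS = 149_597_870_700    # astronomical unit, exact by definition
C_METERS_PER_SEC = 299_792_458  # speed of light, exact by definition


def light_time_seconds(distance_au: float) -> float:
    """One-way light travel time for a distance given in AU."""
    return distance_au * AU_METERS / C_METERS_PER_SEC


one_way = light_time_seconds(92)
hours, remainder = divmod(one_way, 3600)
minutes = remainder / 60

print(f"one way: {int(hours)} h {minutes:.0f} min")
print(f"round trip: {2 * one_way / 3600:.1f} h")
```

So a command takes roughly 12 h 45 min to arrive, and the confirmation takes as long again to come back, about 25.5 hours in total.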

    • by RoboRay (735839)

      Would you consider it less funny if they'd said "NASA plans to reset Voyager's memory tomorrow and they'll know the next day if it worked"?

  • by dltaylor (7510)

    So who misused the emacs macro?

    For those of you who don't get the (obligatory) xkcd reference:

    http://xkcd.com/378/ [xkcd.com]

  • i have never ever heard that a change of ONE byte in a software would cause orderly, neat and complex datasets to be produced. i bet everyone would prefer such bugs in their software, rather than the normal bugs everyone gets.
  • Cosmic ray examples (Score:3, Interesting)

    by MK_CSGuy (953563) on Wednesday May 19, 2010 @05:46AM (#32262824)

    While not naming specifically cosmic rays as the cause in this case, what examples of actual cosmic ray-induced debacles are there in software eng. history?

  • by KlausBreuer (105581) on Wednesday May 19, 2010 @06:31AM (#32263082) Homepage

    Well, okay, as long as they don't get the "Press any key to continue" message...

  • They are going to reboot it and it will solve the problem. Heck, just like Windows. If all the users I had to support would reboot their machine before calling half of them wouldn't need to call.

    Of course, they refuse to learn that...

  • > a single memory bit that had erroneously flipped from a 0 to a 1.

    Didn't this used to be known as a soft error, as in cosmic rays passing through the chip and flipping a bit?

