Space NASA

Pluto Probe Back To Normal, Cause of Snafu Found 80

Tablizer writes: NASA has provided an update to the problem with the New Horizons probe that will fly by Pluto next week. "The investigation into the anomaly that caused New Horizons to enter 'safe mode' on July 4 has concluded that no hardware or software fault occurred on the spacecraft. The underlying cause of the incident was a hard-to-detect timing flaw in the spacecraft command sequence that occurred during an operation to prepare for the close flyby. No similar operations are planned for the remainder of the Pluto encounter."
This discussion has been archived. No new comments can be posted.

  • by tomhath ( 637240 ) on Monday July 06, 2015 @08:46AM (#50053607)

    The underlying cause of the incident was a hard-to-detect timing flaw in the spacecraft command sequence that occurred during an operation to prepare for the close flyby.

    So a "flaw" in the command sequence isn't a software fault? Sure sounds like one to me. Glad to hear the craft is functioning again though.

    • by MightyYar ( 622222 ) on Monday July 06, 2015 @08:51AM (#50053649)

      I'm pretty sure that "fault" has a specific meaning in NASA parlance. There was obviously a software bug, but it probably didn't "fault".

      • I would guess "fault" is their word for crash. This one did not crash; some audit method failed, and it entered a safe fallback mode.

        Can't blame NASA though; when the commands are transmitted over 3 billion miles, the signal would degrade so much that it is possible some critical command or a command argument was not correctly received.

        • I'm not sure that's a sound argument. You should be checksumming such that you're confident that what you're doing is what you were asked to do, and working in transactions, such that if you've not received a whole command group, you're not running any of it. I'd think it was only in desperate circumstances you'd issue a command that says do this, or in fact do anything plausible if you don't fully receive this, because you're about to fly into something hard...
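The checksum-plus-transaction idea in the comment above can be sketched roughly as follows. This is only an illustration, not a description of how New Horizons handles uplinks: the frame layout, the function names `make_frame` and `accept_group`, and the command strings are all hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

# Hypothetical frame layout: (payload, CRC-32 of that payload).
Frame = Tuple[bytes, int]


def make_frame(payload: bytes) -> Frame:
    """Sender side: attach a CRC-32 checksum to the payload."""
    return payload, zlib.crc32(payload)


def accept_group(frames: List[Frame], expected_count: int) -> Optional[List[bytes]]:
    """Receiver side: return the commands only if the whole group is intact.

    A missing frame or a checksum mismatch rejects the entire group, so a
    partially received command group is never executed.
    """
    if len(frames) != expected_count:
        return None
    commands = []
    for payload, checksum in frames:
        if zlib.crc32(payload) != checksum:
            return None
        commands.append(payload)
    return commands
```

The all-or-nothing return value is the "transaction" part: the caller either gets every command in the group or gets `None` and runs nothing.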

          • That is probably what actually happened. It got some command, it knew it was garbled, so it did not execute it. Then what? More commands are sure to follow. Ground control would assume the command has been executed. It can wait to check status, it takes 5 hours for the status to be reported back. The ground control thinks the machine is in some state, but the machine knows that is not true due to one skipped command. So it probably sends out a status update saying, "Need to synch everything up. Going back t
        • by tambo ( 310170 )

          Can't blame NASA though; when the commands are transmitted over 3 billion miles, the signal would degrade so much that it is possible some critical command or a command argument was not correctly received.

          Nonsense - that's one of the easiest problems to solve in all of computer science: you just tack on a hashcode, checksum, parity bit, etc., and the receiver verifies that it got the right message. If it doesn't verify, the receiver doesn't follow it, and when the sender doesn't get an acknowledgment, it re

          • Yep. Not particularly strenuous CRC formulae can detect errors that may happen in a data stream running the entire age of the universe.

            • Yep. Not particularly strenuous CRC formulae can detect errors that may happen in a data stream running the entire age of the universe.

              Yep. Collision free and works with Ada and decision voting. Trivial when you thunk about it. (goddamned rocket scientists act like they know stuff)

          • by danlip ( 737336 )

            when the sender doesn't get an acknowledgment, it retransmits the message

            When your round-trip communication time is on the order of 10 hours you might want to modify that strategy.

            (not that it is hard to do so, just transmit the message multiple times with a sequence number so the client can detect the repeats)
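A minimal sketch of the "send it several times with a sequence number" strategy described above, assuming an in-order stream where individual copies may be lost. The names `send_redundant` and `receive` and the default of three copies are illustrative choices, not anything stated in the thread.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical message layout: (sequence number, command).
Message = Tuple[int, str]


def send_redundant(commands: List[str], copies: int = 3) -> List[Message]:
    """Sender: emit each command `copies` times, tagged with its sequence number."""
    return [(seq, cmd) for seq, cmd in enumerate(commands) for _ in range(copies)]


def receive(stream: Iterable[Message]) -> List[str]:
    """Receiver: act on each sequence number once and ignore the repeats."""
    seen: Set[int] = set()
    executed: List[str] = []
    for seq, cmd in stream:
        if seq in seen:
            continue  # duplicate copy of a command already handled
        seen.add(seq)
        executed.append(cmd)
    return executed
```

With this scheme the sender never waits for an acknowledgment, which is the point when the round trip is measured in hours.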

        • by Tablizer ( 95088 )

          Come on NASA, was it a "fault", "snafu", "glitch", or "bug"? Come clean now!

          Personally, I suspect it was a snag.

          • As long as it isn't a BFRC they're OK with it.
            • by Tablizer ( 95088 )

              Belgian Flatcoated Retriever Club?

              Sticking with the idea that "Pluto" is a dog, eh?

              • by bondsbw ( 888959 )

                Of course not. Pluto is a dwarf dog.

                • by Whiteox ( 919863 )

                  It's hard to determine the breed. Pluto does look a bit like a Rhodesian Ridgeback without the ridges. Otherwise a rather large Hungarian Vizsla except for the eyes.
                  Hmmm.....

          • I'm guessing it was an unanticipated race condition. Everything works correctly, everything passes all tests, but for some extremely rare constellation of input values software module "B" is able to complete its calculations and report its results before "A" can-- which has a probability of occurrence so low that it rounds to zero-- and that screws the pooch. If the probability of this happening again approaches zero, it would be fair for NASA to say there was no error in the programming, but instead an unexpected glitch in operations that is unlikely to ever recur.

            You can never test for every possible corner condition. More than that, in probably every real world situation, the longer the time since the last hard reboot, the more likely it is that the software will encounter some corner conditions. That Pluto bird has been running for quite a while.
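The kind of ordering race guessed at above can be illustrated with a small deterministic simulation. This is a sketch under stated assumptions: modules "A" and "B", the `run` function, and the "safe mode" outcome are hypothetical stand-ins, not the actual flight software.

```python
from typing import Dict


def run(durations: Dict[str, float]) -> str:
    """Simulate modules "A" and "B" that start together and finish after the
    given durations. Results are handled in completion order."""
    state: Dict[str, float] = {}
    for name in sorted(durations, key=lambda n: durations[n]):
        if name == "A":
            state["a_result"] = 1.0
        else:
            # B's handler silently assumes A's result is already present.
            if "a_result" not in state:
                return "safe mode"
    return "nominal"
```

Every test that uses typical durations passes; only the rare input where "B" finishes first exposes the hidden ordering assumption.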

            • Or maybe they just needed time to scrub all the pictures of the invading Vogon fleet. Don't want people to panic...

    • by Anonymous Coward on Monday July 06, 2015 @08:57AM (#50053727)

      There's a gap between "flawless" and "faulty" whose length, as it so happens, is remarkably similar to the distance that New Horizons has travelled so far.

    • by JeremyR ( 6924 )

      The article doesn't elaborate, so I'm guessing this refers to a command sequence sent from the ground. If these are generated by software, it still could have been a software fault, but not on the spacecraft.

    • by geogob ( 569250 )

      I believe they meant that the software (or hardware) on the spacecraft behaved as expected, but the error was rather due to a handling mistake, sending the commands with the wrong timing. If you ask me, such a handling mistake should be caught by the on-board software and handled properly (which means telling the operator right away to RTFM). I would thus qualify this as a software issue, regardless of what they say.

      The official statement is simply putting the "you're holding it wrong" response to a wh

      • I believe they meant that the software (or hardware) on the spacecraft behaved as expected, but the error was rather due to a handling mistake, sending the commands with the wrong timing. If you ask me, such a handling mistake should be caught by the on-board software and handled properly (which means telling the operator right away to RTFM). I would thus qualify this as a software issue, regardless of what they say.

        The official statement is simply putting the "you're holding it wrong" response to a whole new level.

        Well, ok, one could argue that any obscure corner case should be handled appropriately. But at some point, you have to launch the thing.

      • by ColaMan ( 37550 )

        If you ask me, such a handling mistake should be caught by the on-board software and handled properly (which means telling the operator right away to RTFM).

        Well, that's what happened. Commands were sent, probe responded with a WTF!? and halted, people double-checked things - Oh, there's the problem, probe was reset back to normal.

        Unfortunately, the round-trip time to the probe is nearly 9 hours, and nobody wants to be that guy that broke it good and proper, so they double check everything before replyin

    • by dissy ( 172727 )

      So a "flaw" in the command sequence isn't a software fault?

      I don't see why it must be.

      Imagine you wrote a shell script to first create a temp folder, then recursively delete the source data folder, followed by copying the source folder to the new temp folder.

      Oops, your data is gone!

      Is that a fault with the delete command doing exactly as you instructed it to?
      Or is that a fault in your sequence of commands in the script?
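The script described above can be sketched in Python against a throwaway directory, so nothing real is deleted. The step names and the `assumed_intent` ordering (copy first, then delete) are assumptions added for illustration; the comment only describes the broken ordering.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List

Step = Callable[[Path, Path], None]


def make_temp_dir(source: Path, temp: Path) -> None:
    temp.mkdir()


def delete_source(source: Path, temp: Path) -> None:
    shutil.rmtree(source)


def copy_source(source: Path, temp: Path) -> None:
    shutil.copytree(source, temp / "copy")


def run_script(steps: List[Step]) -> str:
    """Run the steps in order against a scratch directory tree."""
    with tempfile.TemporaryDirectory() as root:
        source = Path(root) / "source"
        temp = Path(root) / "temp"
        source.mkdir()
        (source / "data.txt").write_text("important")
        try:
            for step in steps:
                step(source, temp)
        except FileNotFoundError:
            # The copy step found nothing left to copy.
            return "data lost"
        return "ok"


# Ordering from the comment: create temp, delete source, then copy source.
as_written = [make_temp_dir, delete_source, copy_source]
# Assumed intended ordering: copy before deleting.
assumed_intent = [make_temp_dir, copy_source, delete_source]
```

Each step function behaves identically in both lists; only the sequence differs, which is the distinction the comment is drawing.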

      • by tomhath ( 637240 )

        Imagine you wrote a shell script...

        That's software.

        • by dissy ( 172727 )

          That's software.

          That's software doing exactly as instructed, and as expected.
          The question is: should software that is working perfectly be considered to have a software fault?

          A developer or operator fault most certainly. But there was no part of the software doing anything it wasn't told. No part that had any expectation of working differently than it did.

          Here we call that operator error.

          "I right clicked this file and selected delete. When it asked if I was sure I clicked Yes. Now I'm shocked, appalled, and confused why that file g

        • Congratulations, you can read! Now go practice reading the rest of the post, it describes how the "software" is not faulty yet gives an unwanted outcome due to command timing.

    • I believe the use of the word "fault" here means that there is nothing broken on the spacecraft, hardware or software. It behaved as it was supposed to, it was just fed a bad command sequence. i.e. any software fault was in the auditing software on the ground. Even then it may not be a "fault" (i.e. breakage) but just some conditions that aren't accounted for in the audit.

    • Sounds like: All software worked as designed, and two real-time events occurred (at exactly the same time / within the same timestamp resolution) || (in the reverse order to anticipated, possibly due to delayed reporting/recognition) || (at the same time as a higher-priority interrupt). Not technically a software fault; a *design* fault perhaps, but not a fault in the software as designed and implemented.
    • "no hardware or software fault occurred *on the spacecraft*"

      There may have been a hardware or software fault on the ground, that resulted in an invalid command sequence. The desired behaviour in this case may be to enter a safe mode, so that you have a known means to recover (rather than bricking).
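As a rough illustration of "enter a safe mode on an invalid sequence", here is a sketch in which a sequencer refuses to start a command while the previous one is still running. The `Command` fields, the overlap rule, and the `SAFE_MODE`/`NOMINAL` strings are hypothetical; NASA's statement does not describe the actual check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    name: str
    start: float     # seconds from the start of the sequence
    duration: float  # seconds the command keeps the system busy


def execute(sequence: List[Command]) -> str:
    """Run commands in start-time order; fall back to a known state on overlap."""
    busy_until = 0.0
    for cmd in sorted(sequence, key=lambda c: c.start):
        if cmd.start < busy_until:
            # Timing conflict: stop and wait for the ground rather than guess.
            return "SAFE_MODE"
        busy_until = cmd.start + cmd.duration
    return "NOMINAL"
```

Here the on-board logic works exactly as designed in both outcomes; the conflict lives entirely in the timing of the uploaded sequence.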

    • Maybe the software was working the way it should but not the way the humans intended it to? Like the killer robots/AI of sci-fi.

  • by Anonymous Coward

    No, the plans were drawn in miles!

  • by Anonymous Coward

    If Pluto is Mickey's Dog, then how can Goofy be Mickey's best friend?

    Truly NASA is the only one who can answer this important conundrum.

  • Not exactly a SNAFU (Score:4, Informative)

    by jgtg32a ( 1173373 ) on Monday July 06, 2015 @08:54AM (#50053707)

    While NASA has had some spectacular bugs in the past, they aren't common enough to start throwing around SNAFU.

    Situation Normal: All Fucked Up

  • by xxxJonBoyxxx ( 565205 ) on Monday July 06, 2015 @08:55AM (#50053713)

    I always take a shot when someone uses the word "anomaly" in a space story. The legacy of STTNG continues.

  • A few months or years ago to look for possible race conditions. A software simulator or backup craft is not quite the same. The main sequence is less than a day due to the high velocity of the spacecraft.
  • I'm just sayin'. Those creepies wouldn't want to be observed before they hit Tombaugh Station.

  • by radiumburn ( 4175735 ) on Monday July 06, 2015 @09:06AM (#50053795)
    Let's blame the 1 second time change - probably couldn't connect back to the local satellites because of a time certification error haha.
    • by Tablizer ( 95088 )

      They are not used to a time zone change to Pluto Local Time. The Plutonians* were not willing to help.

      * "Plutocrats"? "Plutoids"? Reminds me of a joke about Hillary allegedly selling nuke mines to Putie. She's the "Plutonium Plutocrat".

  • Someone sat on the keyboard.

  • by gsslay ( 807818 ) on Monday July 06, 2015 @10:02AM (#50054217)

    It can only be attributable to human error. They checked out the AE-35 Unit and it had no problems at all.

    I've still got the greatest enthusiasm and confidence in the mission.

  • The underlying cause of the incident was a hard-to-detect timing flaw in the spacecraft command sequence that occurred during an operation.

    I've been a sys admin for a very long time and this sounds very similar to many of the mad-libs-style answers I've provided to uninitiated management immediately following an irreparable mistake.
  • ... waited to the last minute, panicked because the details for Pluto were not complete, and bought themselves some time.
  • I'll start digging.

    (and showing myself out)

