Mars NASA IT

Reformatting a Machine 125 Million Miles Away (155 comments)

Posted by Soulskill
from the red-rover-red-rover-send-updates-right-over dept.
An anonymous reader writes: NASA's Opportunity rover has been rolling around the surface of Mars for over 10 years. It's still performing scientific observations, but the mission team has been dealing with a problem: the rover keeps rebooting. It's happened a dozen times this month, and the process is a bit more involved than rebooting a typical computer. It takes a day or two to get back into operation every time. To try and fix this, the Opportunity team is planning a tricky operation: reformatting the flash memory from 125 million miles away. "Preparations include downloading to Earth all useful data remaining in the flash memory and switching the rover to an operating mode that does not use flash memory. Also, the team is restructuring the rover's communication sessions to use a slower data rate, which may add resilience in case of a reset during these preparations." The team suspects some of the flash memory cells are simply wearing out. The reformat operation is scheduled for some time in September.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Saturday August 30, 2014 @10:57AM (#47790999)

    We're gonna need you to go out to the rover and reboot it. Yeah, it got stuck. You should probably leave ASAP.

  • by SternisheFan (2529412) on Saturday August 30, 2014 @10:59AM (#47791007)
    Easy, just gotta' replace the button battery.
    • It's running on solar power; that's how it has lasted 10 years. Though the rechargeable battery must be tough to survive so many rechargings.

      Ideally, you have redundant systems for such a situation, where you can take one of them down and use the other to do the booting, formatting, programming, as if there were a user sitting right next to it. They say it has a flashless mode of operation, but the way I think of it, as in a regular PC, with a BIOS, you can reformat the harddrive without booting off of and using th

      • by Anonymous Coward

        Wow. Talk about missing an obvious joke and over-thinking the response. Seriously epic *WHOOSH*

      • tl;dr on the whole post BUT... I've had my iPod nano in daily use for the past 8 years and it's still going strong. True, it doesn't need to power any motors - but the design specs probably also allocate a lot less weight to the battery.
    • what's this step 4??

      Press the reset button.

      Who the hell designed this stuff?

  • When I reboot machines in Asia or UK/EU using IPMI from the US.

  • Do they get somebody in, or from India?

    • I'll be glad to help you with that Sir.

      • A tech support guy from India helped me with my licensing problem. He was very nice and efficient and solved it right away. No complaints.
        • Sometimes when I sound mocking, ironic and sarcastic, I'm actually serious, as in ironic-ironic, or sarcastic-sarcastic. A lot of Americans simply smack the phone down on Indian tech support, saying gimme somebody who speaks English. I patiently listen to them struggle through it.

  • With a replacement SLC SSD and a screwdriver
  • ECC? (Score:5, Funny)

    by TechyImmigrant (175943) on Saturday August 30, 2014 @11:13AM (#47791059) Journal

    They didn't do any ECC on the flash memory? I thought these people were rocket scientists.

    • Re: (Score:3, Insightful)

      by Anonymous Coward

      As it happens, for flash, read errors are often transient. A better model than DRAM style ECC is to treat it more like a disk drive with checksums on each block. If you get an error, reread the block. And if you have a problem writing a block (e.g. the readback is wrong), just use a new block. Surely you've noticed that your USB thumbdrive gradually gets smaller with time as blocks wear out. (In space hardware, back in 2000, wear leveling was done manually.. still is as far as I know.. there's no nice rad

      • ECC use is standard with all flash storage. Flash is so unreliable that it can't be used without it, and it has nothing to do with the hard radiation environment on Mars. As for wear leveling, it's been standard since at least 1990 with the first attempts at flash storage. Why the rovers don't do it, I don't know. Maybe because it requires too many cycles of an already limited processor, plus dedicated storage space to keep "use counts" of all the flash blocks.
        • by Agripa (139780)

          ECC use is standard with all flash storage. Flash is so unreliable that it can't be used without it, and it has nothing to do with the hard radiation environment on Mars.

          NOR Flash does not normally use ECC and has reliability closer to that of EEPROM than NAND Flash.

      • by Agripa (139780)

        The long duration radiation performance of flash memory (particularly back in 2000, when these things were being designed) was/is not particularly well understood.

        Flash is another form of floating gate memory. Wouldn't the known long duration performance of EPROM and EEPROM apply?

    • by Anonymous Coward

      The rocket scientists did their job ten years ago. They're working at McDonalds now.

      • by Anonymous Coward

        This would make an interesting movie plot where they have to recall all the older, laid off rocket scientists working at McDonald's and bagging groceries at the supermarket to reboot an idle probe on a far away planet because it's the only one that can be repurposed to save the earth from an asteroid impact. But only the old guys know the hardware and can reprogram the firmware.

        Yeah I'm a laid off old guy. Get off my lawn!

        • And add in the volunteer group that decided to save the project, working out of an abandoned McDonald's.

          Oh, wait....

    • Well, in their defense, ECC on the flash memory isn't exactly rocket science.
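
The block-level scheme discussed in this thread (a checksum on each block, a reread when a read fails, retiring a block whose readback is wrong, and "use counts" for wear leveling) can be sketched roughly as below. This is a toy Python model under loud assumptions, not the rover's flight software: the class `ToyFlash`, its methods, the simulated `worn_out` set, and the choice of CRC-32 as the checksum are all hypothetical and exist only for illustration.

```python
import zlib


class ToyFlash:
    """Toy model of block storage with checksums; NOT the rover's design."""

    def __init__(self, num_blocks, bad_blocks=()):
        self.cells = [None] * num_blocks      # (stored bytes, crc) per physical block
        self.worn_out = set(bad_blocks)       # hypothetical: blocks that corrupt writes
        self.retired = set()                  # blocks taken out of service
        self.erase_counts = [0] * num_blocks  # per-block "use counts"
        self.mapping = {}                     # logical block id -> physical block

    def _raw_write(self, phys, payload):
        self.erase_counts[phys] += 1
        stored = payload
        if phys in self.worn_out and payload:
            # Simulate a worn cell by flipping one bit of the first byte.
            stored = bytes([payload[0] ^ 1]) + payload[1:]
        # The checksum is computed from the intended data, not the stored data.
        self.cells[phys] = (stored, zlib.crc32(payload))

    def _free_blocks(self):
        used = set(self.mapping.values())
        return [b for b in range(len(self.cells))
                if b not in used and b not in self.retired]

    def write(self, logical, payload):
        """Write, verify by readback, and move to a new block on failure."""
        while True:
            free = self._free_blocks()
            if not free:
                raise OSError("no usable blocks left")
            # Wear leveling: pick the free block with the lowest use count.
            phys = min(free, key=lambda b: self.erase_counts[b])
            self._raw_write(phys, payload)
            stored, crc = self.cells[phys]
            if zlib.crc32(stored) == crc:
                self.mapping[logical] = phys
                return phys
            # Readback was wrong: retire the block and try another one.
            self.retired.add(phys)

    def read(self, logical, retries=3):
        """Read with checksum verification, rereading on a mismatch."""
        phys = self.mapping[logical]
        for _ in range(retries):
            stored, crc = self.cells[phys]
            if zlib.crc32(stored) == crc:
                return stored
        raise OSError("block %d failed checksum after %d reads" % (phys, retries))
```

Each retired block shrinks the usable capacity, which is the "thumbdrive gets smaller" effect mentioned above. In this toy model a failed read never recovers because the simulated corruption is permanent; the reread loop only shows where a retry would go for genuinely transient errors.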

  • Alternative Title (Score:5, Insightful)

    by wisnoskij (1206448) on Saturday August 30, 2014 @11:14AM (#47791067) Homepage
    How to brick a 2.5 billion dollar device.
    • Not sure if it was Opportunity or its twin, but one of them required a modem reset not long after landing.
    • by rasmusbr (2186518) on Saturday August 30, 2014 @11:34AM (#47791167)

      I would imagine that the system probably boots itself off of a ROM chip that has a routine for receiving data from Earth and storing it in RAM and then flashing that data onto the flash chip.

      If the rover does not boot from ROM then it is a miracle that it hasn't bricked itself yet.

      • by Agripa (139780)

        I would imagine that the system probably boots itself off of a ROM chip that has a routine for receiving data from Earth and storing it in RAM and then flashing that data onto the flash chip.

        I wonder if the ROM would actually be a floating gate ROM instead of mask ROM or fuse based PROM in which case it would be more like EPROM or NOR Flash.

        Does anybody even make mask ROM or fuse based PROM any more?

        • by rasmusbr (2186518)

          I checked and it is EEPROM. And there are two EEPROMs; I presume those are for redundancy in case one gets zapped.

    • I dunno so much these days. It's 10 years old and got a few miles on the clock, plus collection for the new owner would be an issue. On the plus side vandalism won't be a worry. For a few centuries anyway.

    • They will almost certainly do a dummy run on an identical piece of flight hardware on Earth. The only difference is how the data are sent.
      • And the state of the hardware. Some unknown number of systems on the real curiosity are degraded to the point of malfunctioning; And they have little to no way of exactly measuring what and where.
        • And the state of the hardware. Some unknown number of systems on the real curiosity are degraded to the point of malfunctioning; And they have little to no way of exactly measuring what and where.

          Opportunity. Curiosity is on the other side of Mars, nurturing holes in its wheels and looking for cats to kill.

  • by mark_reh (2015546) on Saturday August 30, 2014 @11:16AM (#47791079) Journal

    Is it?

  • Why didn't they plan ahead for this sort of operation in the beginning, making it as painless and 'reliable' as possible?

  • I believe NASA is operating under the assumption that the rover's on board flash memory is still serviceable. 10 years ago flash memory was still in its relative infancy. A reformat and reload risks bricking the rover completely.
    • by beelsebob (529313)

      I believe you're assuming that the flash used on a rover that went to Mars, and encounters all kinds of crazy radiation, is in some way similar to the crappy OCZ thing you stuck in your PC 10 years ago.

    • by Nimey (114278)

      You're a poster child for Dunning-Kruger [wikipedia.org]: some random on the Internet who thinks he's smarter than the folks who designed a Mars rover that lasted over 10 years past its 90-day expected life.

    • by M1FCJ (586251)

      Don't forget, we don't hear what the techies are talking about. What we're hearing is what the techies told to the PR guy distilled down to a journo, being summarized in The Register (!) and some other soft-tech sites, finally an inaccurate summary on the frontpage of Slashdot.

      I wouldn't be surprised if it were just a "fsck.ext4 -cc" (I know it's not ext4; that wasn't even released when Opportunity soft-crashed and bounced around on Mars, nor does it run Linux).

  • I didn't realize they used OCZ for the storage tech. ;)
    • by Anonymous Coward

      It was designed to last 3 months and failed after 10 years.
      If OCZ was involved, it'd be the other way around. ;)

  • I'd hate to be the guy a) pitching this operation at the change control meeting, and b) the guy signing off on this change.

  • It worked on Spirit (Score:5, Interesting)

    by lemur3 (997863) on Saturday August 30, 2014 @03:37PM (#47792279)

    They had to do this type of thing on Spirit shortly after it arrived on Mars.

    read more here: http://trs-new.jpl.nasa.gov/ds... [nasa.gov]

    or the PDF linked therein here http://trs-new.jpl.nasa.gov/ds... [nasa.gov]

    It's got all sorts of awesome details.

    We commanded a shutdown, which terminated the current communication window, and the loss of signal occurred at the predicted time. Fifty minutes later, we commanded a beep at 7.8125 bps to alert us if the shutdown command did not work, and much to our disappointment, the beep was received!

    Really a fun read. I'm guessing they'll be doing a lot of similar stuff.

  • Man, hope they don't select the wrong partition.....
  • I'm sure they didn't get any of their capacitors from that bad batch a few years past.
  • ... handing over remote desktop access to tech support in Bangalore.

    Now if only we could get a Martian to IM during the process: "Yes. The little red LED is blinking ....."

  • Flash memory isn't the Rover's problem. It's still running XP and there are no more hot fixes. At this point the Rover's system has massive "bit rot," not to mention that it's been hacked countless times by the Chinese. Undeterred by this seemingly insurmountable problem, Microsoft has donated a Windows Phone for communications back to earth and a Surface Pro to power the Rover "because it's just like a computer." They didn't say just who's going to operate their touch-only interfaces. It all makes perfect
