Mars NASA IT

Reformatting a Machine 125 Million Miles Away

An anonymous reader writes: NASA's Opportunity rover has been rolling around the surface of Mars for over 10 years. It's still performing scientific observations, but the mission team has been dealing with a problem: the rover keeps rebooting. It's happened a dozen times this month, and the process is a bit more involved than rebooting a typical computer. It takes a day or two to get back into operation every time. To try to fix this, the Opportunity team is planning a tricky operation: reformatting the flash memory from 125 million miles away. "Preparations include downloading to Earth all useful data remaining in the flash memory and switching the rover to an operating mode that does not use flash memory. Also, the team is restructuring the rover's communication sessions to use a slower data rate, which may add resilience in case of a reset during these preparations." The team suspects some of the flash memory cells are simply wearing out. The reformat operation is scheduled for sometime in September.

  • Alternative Title (Score:5, Insightful)

    by wisnoskij ( 1206448 ) on Saturday August 30, 2014 @12:14PM (#47791067) Homepage
    How to brick a 2.5 billion dollar device.
  • by rasmusbr ( 2186518 ) on Saturday August 30, 2014 @12:34PM (#47791167)

    I would imagine that the system probably boots itself off of a ROM chip that has a routine for receiving data from Earth and storing it in RAM and then flashing that data onto the flash chip.

    If the rover does not boot from ROM then it is a miracle that it hasn't bricked itself yet.
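
    A minimal sketch of the ROM-resident update path described above, simulated in Python. Everything here is hypothetical: the class names, the CRC32 check (via `zlib.crc32`), and the rule that the loader refuses to write an image that fails verification are assumptions for illustration, not the rover's actual flight software.

```python
import zlib


class SimulatedFlash:
    """Hypothetical stand-in for the rover's rewritable flash chip."""

    def __init__(self) -> None:
        self.image = b""

    def write(self, data: bytes) -> None:
        self.image = data


class RomLoader:
    """Hypothetical boot-ROM routine: stage an uplinked image in RAM, then flash it."""

    def __init__(self, flash: SimulatedFlash) -> None:
        self.flash = flash
        self.ram_buffer = bytearray()  # image is staged in RAM, never written piecemeal

    def receive_chunk(self, chunk: bytes) -> None:
        # Each uplink packet is appended to the RAM staging buffer.
        self.ram_buffer.extend(chunk)

    def commit(self, expected_crc: int) -> bool:
        # Only touch flash once the whole image has arrived intact;
        # a corrupted uplink leaves the old flash contents alone.
        if zlib.crc32(bytes(self.ram_buffer)) != expected_crc:
            self.ram_buffer.clear()
            return False
        self.flash.write(bytes(self.ram_buffer))
        self.ram_buffer.clear()
        return True


if __name__ == "__main__":
    image = b"new flight software image"
    flash = SimulatedFlash()
    loader = RomLoader(flash)
    for i in range(0, len(image), 8):  # uplink in 8-byte packets
        loader.receive_chunk(image[i:i + 8])
    print(loader.commit(zlib.crc32(image)))  # True
    print(flash.image == image)  # True
```

    The point of the design is that the ROM code itself can never be overwritten, so even a failed flash write leaves a working path for the next uplink attempt.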

  • Re:ECC? (Score:3, Insightful)

    by Anonymous Coward on Saturday August 30, 2014 @12:38PM (#47791193)

    As it happens, for flash, read errors are often transient. A better model than DRAM-style ECC is to treat it more like a disk drive, with a checksum on each block. If you get an error, reread the block. And if you have a problem writing a block (e.g. the readback is wrong), just use a new block. Your USB thumb drive does the same thing behind the scenes, quietly retiring blocks as they wear out and substituting spares. (In space hardware back in 2000, wear leveling was done manually, and still is as far as I know: there are no nice rad-hard flash controller chips to make a big pile of MLC flash look like a disk drive, etc.)
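
    The per-block scheme described above (checksum each block, reread on a mismatch, and retire a block whose write does not read back correctly) can be sketched as a small simulation. It is hypothetical throughout: the `BlockStore` and `WornBlockStore` classes, the CRC32 checksum, the retry count, and the spare-block pool are assumptions, not a real flight or flash-controller API.

```python
import zlib
from typing import Dict, List, Optional, Set


class BlockStore:
    """Hypothetical flash block manager: checksummed blocks plus a spare pool."""

    def __init__(self, num_blocks: int, num_spares: int, max_reads: int = 3) -> None:
        self.physical: Dict[int, bytes] = {}  # physical block -> raw contents
        self.checksums: Dict[int, int] = {}  # physical block -> CRC32 of intended data
        self.mapping: Dict[int, int] = {i: i for i in range(num_blocks)}  # logical -> physical
        self.spares: List[int] = list(range(num_blocks, num_blocks + num_spares))
        self.bad: Set[int] = set()
        self.max_reads = max_reads

    def _raw_write(self, phys: int, data: bytes) -> None:
        self.physical[phys] = data  # subclasses override this to simulate worn cells

    def _raw_read(self, phys: int) -> bytes:
        return self.physical.get(phys, b"")

    def write(self, logical: int, data: bytes) -> bool:
        phys = self.mapping[logical]
        while True:
            self._raw_write(phys, data)
            if self._raw_read(phys) == data:  # read back to verify the write
                self.checksums[phys] = zlib.crc32(data)
                self.mapping[logical] = phys
                return True
            self.bad.add(phys)  # write did not stick: retire this block
            if not self.spares:
                return False  # out of spares: usable capacity is gone
            phys = self.spares.pop(0)

    def read(self, logical: int) -> Optional[bytes]:
        phys = self.mapping[logical]
        for _ in range(self.max_reads):
            data = self._raw_read(phys)
            if zlib.crc32(data) == self.checksums.get(phys):
                return data  # checksum matches
            # A mismatch may be transient, so just reread.
        return None


class WornBlockStore(BlockStore):
    """Simulates one physical block whose cells no longer hold writes."""

    def __init__(self, worn: int, **kwargs) -> None:
        super().__init__(**kwargs)
        self.worn = worn

    def _raw_write(self, phys: int, data: bytes) -> None:
        # The worn block silently stores zeros instead of the data.
        self.physical[phys] = b"\x00" * len(data) if phys == self.worn else data


if __name__ == "__main__":
    store = WornBlockStore(worn=1, num_blocks=4, num_spares=2)
    print(store.write(1, b"science data"))  # True
    print(store.mapping[1], store.bad)  # 4 {1}
    print(store.read(1))  # b'science data'
```

    Logical block 1 ends up remapped to spare block 4 after its original physical block fails the readback check; once the spare pool is empty, writes start failing outright.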

    The long-duration radiation performance of flash memory (particularly back in 2000, when these things were being designed) was, and still is, not particularly well understood. There are a lot of Enhanced Low Dose Rate Sensitivity (ELDRS) effects that remain poorly understood across semiconductor devices: you can't just blast the part in an accelerator at 1 kRad/hr for a week or two to get to a few hundred kRad and expect the result to match taking a few tens of Rad/hr over days and days and days, with 12 hours off after the sun goes down to anneal and heal.
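
    A quick worked version of the dose-rate arithmetic above, using the comment's own figures. The specific values are assumptions: "a few hundred kRad" is taken as 300 kRad, "a few tens of Rad/hr" as 30 Rad/hr, and the day/night split as a flat 12 of every 24 hours (a Mars sol is actually about 24.6 hours).

```python
def days_to_dose(total_rad: float, rate_rad_per_hr: float,
                 exposed_hours_per_day: float = 24.0) -> float:
    """Days needed to accumulate total_rad at a given dose rate and daily exposure."""
    rad_per_day = rate_rad_per_hr * exposed_hours_per_day
    return total_rad / rad_per_day


TARGET_RAD = 300_000  # assumed: "a few hundred kRad" taken as 300 kRad

# Accelerator test: 1 kRad/hr with the beam on around the clock.
lab_days = days_to_dose(TARGET_RAD, 1_000)
# Field exposure: assumed 30 Rad/hr for 12 of every 24 hours.
field_days = days_to_dose(TARGET_RAD, 30, exposed_hours_per_day=12)

print(f"lab: {lab_days:.1f} days, field: {field_days:.0f} days, "
      f"ratio: {field_days / lab_days:.0f}x")
# prints: lab: 12.5 days, field: 833 days, ratio: 67x
```

    The same total dose arrives roughly 67 times more slowly in the field, with annealing gaps in between, which is exactly the regime a short accelerator run cannot reproduce.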

    And, because resources on spacecraft are very precious, one doesn't blindly head off and say "let's just TMR (triple modular redundancy) everything". You make a rational choice based on the expected design life and the data you do have, and pray for the best.
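
    For context, TMR keeps three copies of every stored word (or three copies of a circuit) and takes a 2-of-3 majority vote, so the cost is roughly triple the hardware for each protected item. A minimal bitwise voter, offered as a generic illustration of the technique rather than anything from the rover's design:

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three redundant copies of a word."""
    # A result bit is 1 iff at least two of the three inputs have that bit set.
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    stored = 0b1011_0110
    upset = stored ^ 0b0000_1000  # one copy suffers a single-bit upset
    print(bin(tmr_vote(stored, upset, stored)))  # 0b10110110
```

    The voter masks any error confined to one copy, but it cannot help if two copies are corrupted in the same bit, which is part of why it is not a free lunch.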

    And, of course, the design life was 3-6 months, and here we are 10 years later, still cranking along. I think it's done pretty well, all things considered.

  • Re:ECC? (Score:5, Insightful)

    by Nimey ( 114278 ) on Saturday August 30, 2014 @02:48PM (#47791869) Homepage Journal

    You're a poster child for Dunning-Kruger [wikipedia.org]: some random on the Internet who thinks he's smarter than the folks who designed a Mars rover that lasted over 10 years past its 90-day expected life.
