Upgrading Software From 350 Million Miles Away
CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"
Wow (Score:5, Insightful)
Failsafe (Score:1, Insightful)
For such expensive projects, would it not make sense to have two EPROMs, one containing the original known-working system and one for the new version? If the new version fails, the machine can fall back to the older one, and the two can alternate if more OS upgrades are planned. If they have watchdog timers on board to keep the rover going, surely they could do a similar setup for the OS?
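To make the two-image idea concrete, here is a minimal sketch of a boot selector that tries the active bank and falls back to the other after repeated failures. Everything in it is hypothetical: the names `Bank`, `BootState`, `boot`, and the `MAX_BOOT_ATTEMPTS` limit are invented for illustration and say nothing about how Curiosity's flight software is actually organized.

```python
from dataclasses import dataclass

MAX_BOOT_ATTEMPTS = 3  # hypothetical limit before giving up on an image


@dataclass
class Bank:
    name: str
    healthy: bool  # stand-in for "image boots and passes its self-checks"


@dataclass
class BootState:
    active: int = 0    # index of the bank to try first
    attempts: int = 0  # consecutive failed boots of the active bank


def boot(banks, state):
    """Try the active bank; after too many failures, switch to the other one."""
    for _ in range(2 * MAX_BOOT_ATTEMPTS):
        bank = banks[state.active]
        if bank.healthy:
            state.attempts = 0
            return bank.name
        state.attempts += 1
        if state.attempts >= MAX_BOOT_ATTEMPTS:
            # Give up on this image and fall back to the other bank.
            state.active = 1 - state.active
            state.attempts = 0
    return None  # neither bank produced a working system


# A broken new image in bank 0, the known-good old image in bank 1.
banks = [Bank("new", healthy=False), Bank("old", healthy=True)]
state = BootState(active=0)
print(boot(banks, state))
```

The point of the attempt counter is that a single failed boot does not immediately discard the new image, while a persistently broken one cannot keep the machine stuck.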
Re:hmm (Score:5, Insightful)
More seriously, for space systems and embedded systems in general, on-board resource constraints usually mean you cannot fit all the functionality you would like into one software image. So you keep only what is necessary for the first mission, and then you replace the parts that have become obsolete with the next thing you want to do.
As a simplified example, when you launch a satellite, you will need it to deploy its solar arrays quickly (and do many initialization checks). When that is done, you could imagine replacing this part of the software with something else...
Also, they might have had schedule constraints on the project, and needed to launch with a simpler first version of the software while finalizing the second one. That does happen.
Re:And NASA has made mistakes with this before... (Score:5, Insightful)
why the NASA engineers want to take such a risk
Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and reflashes the updatable software back to its original version when such a condition occurs.
Such things tend to be present, but how many times have they tested the automatic revert in actual conditions? An alternative codepath is always a risk.
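The revert mechanism being discussed can be sketched as a small watchdog that reflashes a "golden" image once Earth has been silent for too long. This is only an illustration under stated assumptions: the class `RevertWatchdog`, the `flash` dictionary, and the timeout value are all made up, and time is passed in explicitly so the behavior can be exercised without waiting.

```python
from dataclasses import dataclass


@dataclass
class RevertWatchdog:
    timeout_s: float             # hypothetical silence threshold, in seconds
    last_contact_s: float = 0.0  # time of the last signal from Earth
    reverted: bool = False       # whether the revert has already happened

    def on_contact(self, now_s):
        """Record that a signal from Earth arrived at time now_s."""
        self.last_contact_s = now_s

    def tick(self, now_s, flash):
        """Reflash the golden image once if the silence exceeds the timeout."""
        silent_for = now_s - self.last_contact_s
        if not self.reverted and silent_for > self.timeout_s:
            flash["active"] = flash["golden"]
            self.reverted = True
        return self.reverted


flash = {"golden": "v1", "active": "v2"}  # hypothetical image labels
wd = RevertWatchdog(timeout_s=100)
wd.on_contact(10)
wd.tick(50, flash)   # only 40 s of silence: nothing happens
wd.tick(111, flash)  # 101 s of silence: revert to the golden image
print(flash["active"])
```

Even a sketch this small shows the concern raised above: the revert branch runs only in the failure case, so it is exactly the path that gets exercised least.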
Updating the software can have great advantages. Even a slightly more reliable connection would allow vastly more science to be done. Adapting the algorithms for autonomous functions such as simple navigation or sample processing also makes a great difference when your lag time for a single command is measured in minutes and you don't even have that level of "real-time" access most of the time.
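For a rough sense of that lag, the signal delay follows directly from the distance and the speed of light. The sketch below takes the headline's 350 million miles at face value purely as an input; the actual Earth-Mars separation changes continuously as the planets orbit, so treat the number as a placeholder. The function names are invented for the example.

```python
MILES_TO_M = 1609.344    # metres per statute mile (exact by definition)
C_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)


def one_way_light_time_min(distance_miles):
    """Minutes for a radio signal to cover the given distance once."""
    return distance_miles * MILES_TO_M / C_M_PER_S / 60


def round_trip_min(distance_miles):
    """Minutes between sending a command and seeing its result at best."""
    return 2 * one_way_light_time_min(distance_miles)


headline_distance = 350e6  # miles, taken from the story title
print(f"one way:    {one_way_light_time_min(headline_distance):.1f} min")
print(f"round trip: {round_trip_min(headline_distance):.1f} min")
```

With that input the one-way delay comes out at about 31 minutes, which is why interactive control is out of the question and autonomy on the rover pays off.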
Re:Failsafe (Score:5, Insightful)
Re:And NASA has made mistakes with this before... (Score:5, Insightful)
I think it is safe to assume that they purposely bricked the rover (or a test rover) before the mission, made sure the recovery played out as the GP stated, and did this in many different ways.