Upgrading Software From 350 Million Miles Away
CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"
Actually... only 157 million miles away (Score:5, Informative)
The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.
Re:Failsafe (Score:5, Informative)
Computers: The two identical on-board rover computers, called "Rover Compute Element" (RCE), contain radiation hardened memory to tolerate the extreme radiation from space and to safeguard against power-off cycles. Each computer's memory includes 256 kB of EEPROM, 256 MB of DRAM, and 2 GB of flash memory.[22] This compares to 3 MB of EEPROM, 128 MB of DRAM, and 256 MB of flash memory used in the Mars Exploration Rovers.[23]
The RCE computers use the RAD750 CPU, which is a successor to the RAD6000 CPU used in the Mars Exploration Rovers.[24][25] The RAD750 CPU is capable of up to 400 MIPS, while the RAD6000 CPU is capable of up to 35 MIPS.[26][27] Of the two on-board computers, one is configured as backup, and will take over in the event of problems with the main computer.[22]
http://en.wikipedia.org/wiki/Curiosity_rover#Specifications [wikipedia.org]
Data transfer speeds between Curiosity and each orbiter may reach 2 Mbit/s and 256 kbit/s, respectively, but each orbiter is only able to communicate with Curiosity for about eight minutes per day.
When you have little bandwidth, better get it right the first time.
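To put a number on "little bandwidth", here is a back-of-the-envelope sketch using only the rates and the eight-minute window quoted above. The function name is made up for illustration, and it assumes the peak rate is held for the entire pass, which is optimistic, so treat the results as upper bounds.

```python
# Rough upper bound on data moved per orbiter pass, from the quoted figures.
# Assumption: the peak rate is sustained for the whole contact window.

def daily_megabytes(rate_bits_per_s: float, window_minutes: float) -> float:
    """Data moved in one pass, in megabytes (10^6 bytes)."""
    bits = rate_bits_per_s * window_minutes * 60
    return bits / 8 / 1e6

fast_link = daily_megabytes(2_000_000, 8)  # orbiter reachable at 2 Mbit/s
slow_link = daily_megabytes(256_000, 8)    # orbiter reachable at 256 kbit/s

print(f"fast link: {fast_link:.2f} MB per pass")
print(f"slow link: {slow_link:.2f} MB per pass")
```

That works out to at most about 120 MB per day through the faster orbiter and about 15 MB through the slower one, before any protocol overhead or retransmission.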
Re:it can fly? (Score:5, Informative)
"flight software"? (Score:1, Informative)
Re:And NASA has made mistakes with this before... (Score:5, Informative)
99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do, on devices where important design details were hidden for commercial reasons.
This is unlikely (one would hope) to be the case here.
Re:And NASA has made mistakes with this before... (Score:4, Informative)
Sounds like it was not just a software update gone wrong but rather some mechanical problem which they were trying to work around. It was nothing like the usual bricking problem, where a firmware update overwrites code which is needed to perform future firmware updates.
The rovers have several mechanisms to make it safer to update firmware remotely. But ultimately a combination of multiple unfortunate events can still lead to the loss of a rover. And one of those events may have been human error. From the description it sounds like mechanical problems with the solar panel, combined with two cases of human error in coordination of updates, another case of human error trying to correct the previous human errors, an unfortunate condition triggering a latent problem introduced by previous errors, and finally ending up in a position causing the battery to overheat, and loss of power being the ultimate reason it was impossible to adjust the previous mistakes.
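For anyone wondering what "safer" can look like in practice, below is a minimal sketch of one common technique: keep two image slots, write the update to the spare slot only, and switch over only after a checksum matches. This is not a description of how the rovers actually do it; the class and method names are hypothetical and the "flash" is just a Python list.

```python
import hashlib

class DualBankStore:
    """Hypothetical two-slot image store. The running image is never overwritten."""

    def __init__(self, initial_image: bytes):
        self.banks = [initial_image, b""]
        self.active = 0

    def stage(self, image: bytes) -> int:
        """Write the new image into the inactive bank and return its index."""
        spare = 1 - self.active
        self.banks[spare] = image
        return spare

    def activate(self, bank: int, expected_sha256: str) -> bool:
        """Switch to the staged bank only if its checksum matches."""
        digest = hashlib.sha256(self.banks[bank]).hexdigest()
        if digest != expected_sha256:
            return False  # corrupted upload: keep running the old image
        self.active = bank
        return True

    def running_image(self) -> bytes:
        return self.banks[self.active]


store = DualBankStore(b"flight-software-v1")
new_image = b"flight-software-v2"
checksum = hashlib.sha256(new_image).hexdigest()

# A damaged upload is rejected and the old image keeps running.
slot = store.stage(b"flight-software-v2 with a flipped bit")
rejected = not store.activate(slot, checksum)

# A clean upload passes verification and becomes active.
slot = store.stage(new_image)
accepted = store.activate(slot, checksum)
```

The point of the design is that a bad upload can never touch the code you need in order to try again, which is exactly the failure the usual bricking story describes.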
Re:And NASA has made mistakes with this before... (Score:5, Informative)
Re:Failsafe (Score:5, Informative)
The radiation this thing emits is NOTHING compared to the solar and cosmic radiation it would experience both in transit and on Mars. Putting everything in a metal box only helps so much, you still need specifically designed electronics which can handle the odd bit of radiation without dying. Even with a thick metal box you can't run an i7 on Mars, or not for very long at least. Your standard DDR3 isn't going to work either, or your standard EEPROM.
The other thing to remember is that although this project is extremely important, they're still not going to throw more capabilities in than they need, because that is more that can go wrong. For a remote sensing platform, the amount of EEPROM isn't that important - you just need enough to hold your communication protocols, some basic reaction-to-obstacle algorithms and the motor control code. You aren't going to be pulling massive libraries in. The emphasis is on making it as simple as possible, so that there is less chance for bugs to creep in. Those extra MIPS will come in handy for the navigation and onboard image processing, and the flash for storing interesting info until you can upload, so those are what they have upgraded the most.
Re:Failsafe (Score:4, Informative)
Not really. That might have been true 10 years ago.
No.
All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash.
I wonder why they didn't think about that. A nice thick, heavy metal box. Easy! Perhaps you should go and work for NASA?
Let's ignore the earth's magnetosphere for the moment and make some massive assumptions.
The pressure on the ground is about 10^5 Pa. That means there's about 10^4 kg of stuff above each square metre of you to absorb radiation from space. That equates to 10 m of water, 1.25 m of steel or about 90 cm of lead. Quite a lot.
Mars is about 1.5 AU from the sun, so receives about 0.4 times the radiation.
The atmosphere is about 600 Pa, by comparison.
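Here is the same arithmetic as a quick sketch, so the comparison is explicit. The surface gravities and material densities are round textbook values I am assuming, not figures from the post, and like the post this ignores the magnetosphere and treats shielding as nothing more than mass per unit area.

```python
# Column mass of atmosphere above one square metre: pressure / gravity.
# Assumed constants (round textbook values):
G_EARTH = 9.81   # m/s^2
G_MARS = 3.71    # m/s^2
DENSITY = {"water": 1000.0, "steel": 7850.0, "lead": 11340.0}  # kg/m^3

def column_mass(pressure_pa: float, gravity: float) -> float:
    """Mass of gas above one square metre of ground, in kg."""
    return pressure_pa / gravity

def equivalent_thickness(mass_per_m2: float, material: str) -> float:
    """Thickness in metres of a slab with the same mass per unit area."""
    return mass_per_m2 / DENSITY[material]

earth = column_mass(1e5, G_EARTH)
mars = column_mass(600.0, G_MARS)
solar_fraction = 1 / 1.5 ** 2  # inverse-square law at 1.5 AU

for name, mass in (("Earth", earth), ("Mars", mars)):
    slabs = ", ".join(
        f"{equivalent_thickness(mass, m) * 100:.1f} cm {m}" for m in DENSITY
    )
    print(f"{name}: {mass:.0f} kg/m^2 ({slabs})")
print(f"Sunlight at Mars relative to Earth: {solar_fraction:.2f}")
```

On these assumptions Mars has roughly 160 kg of atmosphere over each square metre, the equivalent of about 16 cm of water or 2 cm of steel, against about 10 tonnes on Earth.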
Radiation hardening is a very well established field. Using some degree of shielding is just one of the many techniques in use. On Mars, it is simply not enough on its own.
It is very, very difficult to make a rad-hard processor, and then very thoroughly test it. You can't just keep shrinking the feature size, because as it goes down, the effect of radiation increases. Not only that but as the amount of crystal per transistor shrinks, the chance of unrecoverable lattice damage increases, due to the lack of redundancy.
There are faster rad-hardened DSPs, but those are, well, DSPs and only actually really fast for DSP-like tasks.
There also are almost certainly faster ones available now. But it's been in transit for a year, and they certainly weren't building it with a brand-new untested processor for which they had to write all the software on the way after they launched it.
So, given the constraints, it's a pretty great CPU to have on board.
Re:And NASA has made mistakes with this before... (Score:3, Informative)
Think you'd be able to code everything the rover is ever meant to do, in a single unchanging program image, into just a few hundred kB?
In other cases, upgraded software provides new capabilities that weren't envisioned during the original design. Spirit and Opportunity, for instance, were given lots of new capabilities over their mission life: like the ability to autonomously navigate based on Simultaneous Localization And Mapping (SLAM [wikipedia.org]) using the various cameras. These are capabilities that were just in development in academia when the rovers were originally programmed, but became proven during the MER mission. As a result of having that autonomous navigation capability, Spirit and Opportunity were able to travel much farther than they would have if every single wheel revolution needed to be commanded from Earth.
Re:And NASA has made mistakes with this before... (Score:4, Informative)
No, it follows from the Tsiolkovsky rocket equation, and it is linear. The amount of fuel required is exponential in the delta-V required, but linear in the payload mass. m_1 = m_0 e^{- \Delta v / v_e}
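A quick numerical check of that claim, as a sketch: here m_0 is the fuelled mass, m_1 the mass after the burn, and v_e the exhaust velocity. I am treating a single stage whose entire dry mass counts as payload, and the 4000 m/s and 3000 m/s figures are arbitrary illustrative numbers, not mission values.

```python
import math

def propellant_mass(payload_kg: float, delta_v: float, exhaust_velocity: float) -> float:
    """Propellant needed, m_0 - m_1, from m_1 = m_0 * exp(-delta_v / v_e).

    Simplification: single stage, all dry mass treated as payload (m_1).
    """
    mass_ratio = math.exp(delta_v / exhaust_velocity)  # m_0 / m_1
    return payload_kg * (mass_ratio - 1.0)

base = propellant_mass(1000.0, 4000.0, 3000.0)
double_payload = propellant_mass(2000.0, 4000.0, 3000.0)
double_delta_v = propellant_mass(1000.0, 8000.0, 3000.0)

print(f"payload x2 -> fuel x{double_payload / base:.2f}")
print(f"delta-v x2 -> fuel x{double_delta_v / base:.2f}")
```

Doubling the payload doubles the fuel exactly, while doubling the delta-V multiplies it by e^(delta_v / v_e) + 1, which is always more than two.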