Mars Rover Temporarily Froze In Place Following Software Error (extremetech.com) 45
UPDATE (1/25/2018): NASA has successfully unfrozen Curiosity, which will now live to rove another day.
But here's the original report shared by a reader detailing what the concerns were: NASA reports that Curiosity has suffered a system failure that left the robot unaware of its position and attitude on the red planet. Until it recovers, Curiosity is frozen in place. Mars is far enough away that we can't directly control Curiosity in real-time -- the rover gets batches of commands and then carries them out. That means it needs to have precise awareness of the state of all its joints, as well as environmental details like the location of nearby obstacles and the slope of the ground. This vital information ensures the rover doesn't bump anything with its arm or clip large rocks as it rolls along.
Curiosity stores all this attitude data in memory, but something went wrong during operations several days ago. As the rover was carrying out its orders, it suddenly lost track of its orientation. The attitude data didn't add up, so Curiosity froze in place to avoid damaging itself. While the rover is physically stuck in place, it's still in communication with the team here on Earth. Since everything else is working on the rover, NASA was able to develop a set of instructions that should get the rover moving again. When transmitted, the data will inform Curiosity of its attitude and confirm its current state. This should allow the rover to recover and keep performing its safety checks. However, NASA also hopes to gather data on what caused the issue in the first place. The hope is they can avoid another freeze-up in the future.
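To make the "attitude data didn't add up, so it froze" behavior concrete, here is a minimal sketch of that kind of guard. It is not NASA's flight software: the summary does not say how attitude is stored or checked, so the unit-quaternion representation, the tolerance, and the names `attitude_is_consistent` and `execute` are all assumptions made for illustration.

```python
import math

def attitude_is_consistent(q, tol=1e-3):
    """Assumed check: attitude is a unit quaternion (w, x, y, z)."""
    if len(q) != 4 or not all(math.isfinite(c) for c in q):
        return False
    norm = math.sqrt(sum(c * c for c in q))
    return abs(norm - 1.0) <= tol

def execute(commands, read_attitude):
    """Run a batch of commands, freezing as soon as attitude looks wrong.

    read_attitude is a hypothetical callable returning the stored attitude.
    Returns (commands completed, final state).
    """
    done = []
    for cmd in commands:
        if not attitude_is_consistent(read_attitude()):
            return done, "frozen"  # stop moving, but stay reachable
        done.append(cmd)
    return done, "complete"
```

The point of the design is that the check runs before every command in the batch, so a bad reading stops motion immediately instead of being discovered after the arm or wheels have already moved.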
Re: (Score:2)
Most likely it was coded by scientists and engineers, not code monkeys. And since it isn't yet clear what happened, it could be a hardware glitch.
Self-driving mode (Score:1, Offtopic)
Re: (Score:3, Funny)
Re: (Score:2)
Should be fine as long as there aren't broadside semi trailers in the way.
All it takes is one cosmic ray, flipping a bit.. (Score:5, Interesting)
Re: (Score:2)
Dammit, Boeing!
Re: (Score:1)
I'm not an expert in this, but hasn't ECC been consumer-level technology for decades now?
Re:All it takes is one cosmic ray, flipping a bit. (Score:5, Informative)
Yes, the rover uses a pair of hardened PowerPC 750 processors, as seen in the PowerMac G3 from last century.
And ECC was common even then in servers.
PowerPC 750 (Score:5, Informative)
More reading for architecture nerds.
https://en.wikipedia.org/wiki/... [wikipedia.org]
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
shit can still happen.
That's why you design these things so they can recover from single-bit errors no matter what.
Re: (Score:2)
Re: (Score:2)
That is why you use ECC memory with scrubbing. If this was a bit-flip, somebody screwed up in the design.
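For anyone who hasn't seen it, a toy version of ECC plus scrubbing looks roughly like this. It uses a Hamming(7,4) code, which is an assumption chosen for brevity; real ECC memory does this in hardware with wider codes, and nothing here reflects what the rover actually uses.

```python
def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))

def syndrome(word):
    """XOR of the positions of all set bits: 0 if clean, else the bad position."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s

def correct(word):
    """Flip back a single corrupted bit, if any."""
    s = syndrome(word)
    return word ^ (1 << (s - 1)) if s else word

def decode(word):
    """Correct the word, then pull the data bits out of positions 3, 5, 6, 7."""
    w = correct(word)
    return sum(((w >> src) & 1) << dst for dst, src in enumerate((2, 4, 5, 6)))

def scrub(memory):
    """Walk memory and rewrite any word holding a single-bit error."""
    fixed = 0
    for i, word in enumerate(memory):
        repaired = correct(word)
        if repaired != word:
            memory[i] = repaired
            fixed += 1
    return fixed
```

Scrubbing matters because this code only survives one flipped bit per word. Sweeping memory periodically repairs single errors before a second hit on the same word makes it uncorrectable.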
Re: (Score:2)
Or maybe they couldn't get error-correcting memory in a radiation-hardened version.
Re: (Score:2)
An alpha particle from the sun will flip half the chip's bits ...
Re: (Score:2)
It will do no such thing, especially not on a PPC 750. The particle size is tiny compared to transistors on the die.
Re: (Score:2)
It will cause an electron shower on impact. ...
So it depends on whether the chip is hardened or not.
No idea how big transistors are on a PPC 750, but modern transistors are just about 100 atoms ...
Re: (Score:2)
It will do no such thing, especially not on a PPC 750. The particle size is tiny compared to transistors on the die.
Radiation susceptibility is not as simple as feature size. Larger features are more resilient but also present more volume to capture charge from radiation events, which is why the last few generations of DRAM have become more resistant to single-event upsets; of course, memory sizes have grown enough to more than offset this.
IBM had access to a silicon-on-insulator process which eventually went to Global Foundries so they were in a special position for creating a relatively modern high performance radiation hardene
Re: (Score:2)
Don't forget our Sun is at an extremely low level of activity right now, which creates a very mild solar wind, which reduces the activity of the ionosphere, which allows more cosmic rays to hit the surface of Earth than at any other time during the modern era of computing.
Watch out for oddities and store your data on a checksumming filesystem. Or scare your boss for entertainment.
Bug (Score:1)
So that's what one in a million LOC looks like. Wow. Keep up the great work NASA!
Doesn't know its attitude. (Score:5, Funny)
Re: (Score:2)
I was literally settling in to make a similar comment. Since they launched the two rovers I have personified them a la WALL-E, and I honestly believe they have 'tude.
Thanks for beating me to it!
Re: (Score:2)
>> Curiosity stores all this attitude data in memory ...
All this time I thought "attitude control" was the American way of pronouncing "altitude control". Kind of like the way they pronounce "nuclear" (nukelar) and "soldering iron" (soddering iron). Now I know better: Attitude Control [wikipedia.org]
Re: (Score:1)
Or some alien life attack?
No, the rover is just waiting for a call to systemd to relaunch locationd. Ask Leonard how long it will take and most likely he will blame kernel latency issues for the systemd freeze, and not the time it takes to create and send new compiled binary command routines that will not crash systemd completely.
Re: (Score:2)
Or you could ask Linus about writing your own kernel scheduler for the Rover. You could circumnavigate the red planet 10 times on the amount of energy he'll release.
Re: (Score:2)
Mmmmm....little green alien women, yummy!!
Re: (Score:2)
So Discreet!! (Score:5, Funny)
One of my favorite things about aliens is how discreet they are. They avoid all cameras while messing with little things here and there on these vehicles.
At some point you have to call it what it is. They are clearly geniuses.
--
Change of weather is the discourse of fools. - Thomas Fuller
It's obvious (Score:3)
"suffered a system failure that left the robot unaware of its position and attitude on the red planet."
It has a bad attitude, that much is obvious.
Re: (Score:3)
Everybody needs a little time alone to flip their bits every now and then!
Drunk (Score:5, Funny)
If I had been trudging along that boring Mars landscape for seven years non-stop, I would also like to get drunk just this once. No wonder it has a bit of a hangover, and can't quite remember where it is and how it got there. Let it sleep it off, and tomorrow it will be fine again.
Re: (Score:2)
So the Dog Year ratio for Mars Rovers is 3:1? That would make it old enough to drink.
Already fixed... (Score:5, Informative)
announced a few days ago
https://mars.nasa.gov/msl/miss... [nasa.gov]
Re: (Score:1)
Microsoft (Score:3, Funny)
Have they tried turning it off, and then on again? (Score:2)
Legacy applications (Score:3)
I maintain a collection of legacy applications whose standard error recovery is to abort and restart. This is fine if the abort clears the error. If it doesn't, you're hooped. This was so pervasive in the product line that they actually wrote a C preprocessor to standardize crashes.
An "I'm broken. Fix me." safe mode would have made my life a whole lot easier...
...laura
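A rough sketch of the difference between plain abort-and-restart and an "I'm broken. Fix me." safe mode, for anyone curious. The function name, the retry limit, and the return shape are all made up for illustration and are not from any real product.

```python
def run_with_safe_mode(task, max_restarts=3, on_safe_mode=None):
    """Run task(); restart on failure, but give up and ask for help
    once the restarts stop clearing the error.

    Returns ("ok", result) or ("safe_mode", list_of_failure_descriptions).
    """
    failures = []
    for _ in range(1 + max_restarts):
        try:
            return ("ok", task())
        except Exception as exc:  # broad on purpose: any crash counts
            failures.append(repr(exc))
    # Restarting did not help: stop retrying and report what was seen.
    if on_safe_mode is not None:
        on_safe_mode(failures)
    return ("safe_mode", failures)
```

The restart loop still handles transient errors. The difference is that a persistent error ends in a stable state with the failure history preserved, instead of an endless crash loop.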