Upgrading Software From 350 Million Miles Away

Posted by samzenpus on Friday August 10, 2012 @03:06AM from the distant-support dept.

CWmike writes "Picture doing a remote software upgrade. Now picture doing it when the machine you're upgrading is a robotic rover sitting 350 million miles away, on the surface of Mars. That's what a team of programmers and engineers at NASA are dealing with as they get ready to download a new version of the flight software on the Mars rover Curiosity, which landed safely on the Red Planet earlier this week. 'We need to take a whole series of steps to make that software active. You have to imagine that if something goes wrong with this, it could be the last time you hear from the rover,' said Steve Scandore, a senior flight software engineer at NASA's Jet Propulsion Laboratory. 'It has to work,' he told Computerworld. 'You don't want to be known as the guy doing the last activity on the rover before you lose contact.'"

This discussion has been archived. No new comments can be posted.


And NASA has made mistakes with this before... (Score:5, Interesting)

by YesIAmAScript ( 886271 ) writes: on Friday August 10, 2012 @03:09AM (#40942705)

It is a difficult task. While NASA has done a lot better than most of us programmers ever have, they have made mistakes in updating from Earth to Mars before.
http://en.wikipedia.org/wiki/Mars_Global_Surveyor#Loss_of_contact [wikipedia.org]

- Re:And NASA has made mistakes with this before... (Score:5, Interesting)
  
  by Taco Cowboy ( 5327 ) writes: on Friday August 10, 2012 @03:27AM (#40942835) Journal
  
  That is why I do not understand why the NASA engineers want to take such a risk.
  Unless it is a totally fatal software bug (that is, if they do not upgrade the software, the Curiosity rover is going to be bricked), I do not think a regular software upgrade is worth the danger of bricking the rover, which is, as TFA has stated, 350 million miles away.
  
  - Re:And NASA has made mistakes with this before... (Score:5, Informative)
    
    by Anonymous Coward writes: on Friday August 10, 2012 @03:50AM (#40943013)
    
    99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do, on devices where important design details were hidden for commercial reasons.
    This is unlikely (one would hope) to be the case here.
    
    - Re:And NASA has made mistakes with this before... (Score:5, Funny)
      
      by K. S. Kyosuke ( 729550 ) writes: on Friday August 10, 2012 @04:55AM (#40943381)
      
      99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do
      In that case, that should happen with deep space probes quite a lot.
      
      - Re: (Score:3)
        
        by RockDoctor ( 15477 ) writes:
        
        99% of brickings are the result of people doing stuff that the manufacturer did not intend for you to do
        In that case, that should happen with deep space probes quite a lot.
        ... or it would do, if the manufacturers and the users weren't the same group. Or, for the likes of NASA, the manufacturers of the flight hardware computers and the manufacturers of the flight software weren't two groups of the same organisation, both of whom would take equal accountability for a failure like this. (And probably work in
    - Re: (Score:2, Funny)
      
      by Anonymous Coward writes:
      
      pff worst case scenario : they send him over to mars to jtag the rover by hand...
  - Re:And NASA has made mistakes with this before... (Score:5, Interesting)
    
    by hcs_$reboot ( 1536101 ) writes: on Friday August 10, 2012 @04:23AM (#40943213)
    
    why the NASA engineers want to take such a risk
    Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, a non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and flashes back the updatable software to its original version when that condition occurs.
    
    - Re:And NASA has made mistakes with this before... (Score:5, Insightful)
      
      by cnettel ( 836611 ) writes: on Friday August 10, 2012 @04:35AM (#40943279)
      
      why the NASA engineers want to take such a risk
      Similar to some devices here on Earth, the rover should have an automatic revert solution. For instance, a non-updatable software running on a separate processor detects specific conditions (like no signal from Earth for a while) and flashes back the updatable software to its original version when that condition occurs.
      Such things tend to be present, but how many times have they tested the automatic revert in actual conditions? An alternative codepath is always a risk.
      Updating the software can have great advantages. Only a slightly more reliable connection would allow vast amounts of more science to be done. Adapting the algorithms for autonomous functions such as simple navigation or sample processing also makes a great difference when your lag time for a single command is measured in terms of minutes and you don't even have that level of "real-time" access most of the time.
      
      - Re:And NASA has made mistakes with this before... (Score:5, Insightful)
        
        by Bigby ( 659157 ) writes: on Friday August 10, 2012 @10:20AM (#40945581)
        
        I think it is safe to assume that they purposely bricked the rover (or test rover) before the mission. And made sure it played out as the GP stated. And that they did this many different ways.
        
        
        Re: (Score:3)
        
        by Ruie ( 30480 ) writes:
        
        I think it is safe to assume that they purposely bricked the rover (or test rover) before the mission. And made sure it played out as the GP stated. And that they did this many different ways.
        Ideally - yes. In practice, they have limited funds and lots of deadlines.
        If they had lots of time to debug it, there would be no need to upload new software.
    - Re: (Score:2)
      
      by Jane Q. Public ( 1010737 ) writes:
      
      Haha, I wrote pretty much the same thing, at about the same time. See below.
      - Re: (Score:2)
        
        by somersault ( 912633 ) writes:
        
        I think most of us thought it. That probably means that NASA thought it too. Unless they were really against doing such a thing to save space/weight, but I think a few extra grams and square inches to have a recovery partition is definitely worth it, considering bricking the thing means you just wasted several billion dollars..
    - Re:And NASA has made mistakes with this before... (Score:5, Interesting)
      
      by Confusador ( 1783468 ) writes: on Friday August 10, 2012 @08:29AM (#40944439)
      
      They do indeed have systems like that, if you're interested it's worth looking into how they dealt with the Sol 18 Anomaly on Spirit. Of particular note is the "Shutdown Dammit" command that they used to override everything else the rover was doing so it would stop wasting battery overnight.
      Seeing as they were able to update the software on a device that wouldn't even finish booting, I imagine the procedures for doing it on a functioning device are pretty robust, even if they're still nailbiting.
      
    - Re:And NASA has made mistakes with this before... (Score:5, Funny)
      
      by DrXym ( 126579 ) writes: on Friday August 10, 2012 @01:03PM (#40947845)
      
      Similar to some devices here on Earth, the rover should have an automatic revert solution.
      
      It does. Scientists put a small switch in at the back which you hold down while powering it up and it will reset itself.
      
  - Re:And NASA has made mistakes with this before... (Score:5, Interesting)
    
    by Jane Q. Public ( 1010737 ) writes: on Friday August 10, 2012 @04:34AM (#40943277)
    
    "I do not think taking the risk of bricking the rover for a regular software upgrade is worth the danger of bricking the rover..."
    I guess it all depends on (A) what the perceived value of the upgrade is, versus (B) the perceived risk.
    
    It's probably a safe bet that they learned from the Surveyor issue, and built in better tests and safeguards. I imagine -- although I don't really know -- that they have implemented something like the "rolling upgrades" that are common now, which allow processes to be replaced on the fly one at a time, without reboot, and with a failsafe revert that runs at a higher level than any of those processes if anything goes wrong.
    
    It isn't like Windows, in which just about every time you install or upgrade something you have to make all the changes then "reboot". They get done one at a time, and they are tested individually after they are made.
    
    It sounds complicated but conceptually it's pretty simple: you have a top-layer monitor program that accepts commands to replace lower-level processes. All it needs to be pretty "fail-safe" is to wait for a specified period of time for an "okay" signal from Ground Control. If it doesn't receive one in the specified time, it automatically reverts the process back to the old version. It's a little more involved than that, but that's the idea.
    
    Lots of software does that now. A lot has improved since 1996.
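The monitor idea in this comment can be sketched roughly as follows. This is only an illustration of the "replace, wait for an okay, otherwise revert" pattern; the class and method names (`UpgradeMonitor`, `replace`, `confirm`, `tick`) are made up for the example, and nothing here is claimed to match what the rover actually runs.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class UpgradeMonitor:
    """Hypothetical top-level monitor that swaps one process version at a time."""

    def __init__(self, confirm_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.confirm_timeout_s = confirm_timeout_s
        self.clock = clock
        # process name -> version currently running
        self.active: Dict[str, str] = {}
        # process name -> (version to fall back to, confirmation deadline)
        self.pending: Dict[str, Tuple[Optional[str], float]] = {}

    def replace(self, name: str, new_version: str) -> None:
        """Switch to new_version, remembering the last confirmed version."""
        if name in self.pending:
            old = self.pending[name][0]   # keep the last confirmed version
        else:
            old = self.active.get(name)
        self.active[name] = new_version
        self.pending[name] = (old, self.clock() + self.confirm_timeout_s)

    def confirm(self, name: str) -> None:
        """Ground control sent its "okay": keep the new version."""
        self.pending.pop(name, None)

    def tick(self) -> None:
        """Call periodically; reverts any upgrade whose deadline has passed."""
        now = self.clock()
        for name, (old, deadline) in list(self.pending.items()):
            if now >= deadline:
                if old is None:
                    del self.active[name]
                else:
                    self.active[name] = old
                del self.pending[name]
```

The point of the design is that silence counts as failure: if the new version breaks the radio, no "okay" can arrive, and the revert happens without anyone having to command it.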
    
    - Re: (Score:2)
      
      by datapharmer ( 1099455 ) writes:
      
      as long as it isn't HP writing the installer I'm ok with it... Installing printer... error... rolling back... installing.. error... ad infinitum
  - Re: (Score:2)
    
    by TenDollarMan ( 1307733 ) writes:
    
    Yeah, but maybe the new JellyBean will be totally awesome!
  - Re: (Score:2)
    
    by coofercat ( 719737 ) writes:
    
    I imagine it's a lot easier to change the software than it is to change the hardware. I have no idea what kit the rover has in it, but since my phone camera used to take bad pictures until a software update came along, I should think NASA probably want to upgrade the software in their cameras in preference to biking a new camera out to Mars. I seem to remember that the drill may contaminate samples with teflon or something - that being the case, I'm sure they've got a fancy filter that can remove most of th
  - Re: (Score:3, Informative)
    
    by necro81 ( 917438 ) writes:
    
    In some cases, the software loaded on the device is not suited to the task the engineers want it to do. TFA mentions that the software on the device now is geared towards interplanetary cruise, EDL, and some very basic on-the-surface tasks. If they actually want the rover to do what they've sent it there to do, they need to perform the upgrade. Why not have the entire suite of mission software on the rover when it launches? Perhaps they hadn't gotten around to coding/testing the on-the-surface software
  - virus protection (Score:5, Funny)
    
    by gsgriffin ( 1195771 ) writes: on Friday August 10, 2012 @09:23AM (#40944891)
    
    Probably concerned that their virus software is now out of date after the long journey.
    
  - Re: (Score:2)
    
    by mcgrew ( 92797 ) * writes:
    
    They've upgraded software on the other two rovers, as well as probes even farther away. I doubt there's any reasonable chance they'll brick it.
  - Re: (Score:3)
    
    by Frans Faase ( 648933 ) writes:
    
    It is not such a big risk and it has been done many times before with all kinds of spacecraft. And you should also realize that many safety precautions have been built into the system. It is definitely not like doing an OS update on a PC. I presume that in case something goes wrong, the rover will get into some kind of safe mode sooner or later, allowing communication to be established again. Safe mode communication is at a very slow speed and it could take some time to establish contact again, but in many cas
  - Re: (Score:2)
    
    by pingbak ( 33924 ) writes:
    
    No, the likelihood of getting bricked is really small, although the likelihood of misaligned or damaged equipment failure is much greater.
    "Bricking" is really small because there is always a known, good image that preceded the update. In the case of a failure, these spacecraft go into a "safe hold" mode (there are actually several different safe hold levels). The lowest safe hold level ensures that the operator always has access to a low-level monitor. This monitor allows the operator to select which image
  - - Re: (Score:2, Interesting)
      
      by Anonymous Coward writes:
      
      Unbelievable, this is so stupid...
      WHY NOT INCLUDE SECOND BIOS? or whatever fuck they are using? if it's so precious and easily broken, why not use back up hardware? It's not like it would add another half kilo of weight???? Risk is TOO BIG not to do that. A few grams => problem solved.
      - Re:And NASA has made mistakes with this before... (Score:5, Interesting)
        
        by Sean Hederman ( 870482 ) writes: on Friday August 10, 2012 @07:06AM (#40944011) Homepage
        
        First off, shielded hardware is NOT a few grams. A second system adds a significant amount of weight. Each gram added to the rover is several hundred kilos more propellant required. In any case, they DID add a second system, which will take over in the event of an emergency. However, even then, an update is quite perilous, because you could theoretically brick the one system, and if something else goes wrong, you now have no backup.
        
        
        Re:And NASA has made mistakes with this before... (Score:4, Informative)
        
        by fisted ( 2295862 ) writes: on Friday August 10, 2012 @10:00AM (#40945319)
        
        It's not a linear relationship since you need additional propellant to move the additional propellant you needed for the extra payload
        
        
        Re:And NASA has made mistakes with this before... (Score:4, Informative)
        
        by jpmorgan ( 517966 ) writes: on Friday August 10, 2012 @05:24PM (#40951667) Homepage
        
        No, it follows from the Tsiolkovsky rocket equation, and it is linear. The amount of fuel required is exponential in the delta-V required, but linear in the payload mass. m_1 = m_0 e^{- \Delta v / v_e}
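To make the "linear in payload, exponential in delta-V" claim concrete, here is a small sketch built directly on the equation quoted above. The numbers for delta-V and exhaust velocity are illustrative assumptions, not mission values, and the model is the idealised single-burn case where tanks and structure count as part of the final mass.

```python
import math


def propellant_mass(final_mass_kg: float, delta_v: float,
                    exhaust_velocity: float) -> float:
    """Propellant needed so that final_mass_kg remains after a burn of delta_v.

    From m_1 = m_0 * exp(-delta_v / v_e):
        m_0 - m_1 = m_1 * (exp(delta_v / v_e) - 1)
    """
    return final_mass_kg * (math.exp(delta_v / exhaust_velocity) - 1.0)


# Illustrative assumptions only (m/s); not taken from the mission.
DELTA_V = 6000.0
V_E = 4400.0

base = propellant_mass(900.0, DELTA_V, V_E)
double_payload = propellant_mass(1800.0, DELTA_V, V_E)
double_delta_v = propellant_mass(900.0, 2 * DELTA_V, V_E)

print(f"ratio for 2x final mass: {double_payload / base:.2f}")
print(f"ratio for 2x delta-V:    {double_delta_v / base:.2f}")
```

Doubling the final mass doubles the propellant exactly, while doubling delta-V multiplies it by more than two. Whether added propellant also forces heavier tanks (which would grow the final mass) is outside this simple model.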
        
      - Re:And NASA has made mistakes with this before... (Score:4, Informative)
        
        by Frans Faase ( 648933 ) writes: on Friday August 10, 2012 @10:14AM (#40945499) Homepage
        
        If you would inform yourself, you would know that we are not talking about a general PC with 4 GB of memory here, but about a much smaller (but reliable and radiation-hardened) PowerPC-compatible system with limited RAM. The reason that they planned this update is that they want to remove the flight software for the trip to Mars and replace it with the software needed to drive and control the rover. It is true that they spent the time the spacecraft was flying to Mars improving the software. That would be more than logical to do. Please note that the software for the Spirit and Opportunity rovers has also been updated several times. It would not surprise me if, once they know the Curiosity rover better, they perform another software update.
        
- Oblig. (Score:5, Funny)
  
  by AliasMarlowe ( 1042386 ) writes: on Friday August 10, 2012 @03:29AM (#40942849) Journal
  
  So what's their problem? Just tell a sysadmin [xkcd.com] to fix it.
  
- Re: (Score:2)
  
  by Solandri ( 704621 ) writes:
  
  Also Viking 1: http://en.wikipedia.org/wiki/Viking_1#Lander [wikipedia.org]
- Re:And NASA has made mistakes with this before... (Score:4, Informative)
  
  by kasperd ( 592156 ) writes: on Friday August 10, 2012 @04:42AM (#40943329) Homepage Journal
  
  they have made mistakes in updating from Earth to Mars before.
  Sounds like it was not just a software update gone wrong but rather some mechanical problem which they were trying to work around. It was nothing like the usual bricking problem, where a firmware update overwrites code which is needed to perform future firmware updates.
  
  The rovers have several mechanisms to make it safer to update firmware remotely. But ultimately a combination of multiple unfortunate events can still lead to the loss of a rover. And one of those events may have been human error. From the description it sounds like mechanical problems with the solar panel, combined with two cases of human error in coordination of updates, another case of human error trying to correct the previous human errors, an unfortunate condition triggering a latent problem introduced by previous errors, and finally ending up in a position causing the battery to overheat, and loss of power being the ultimate reason it was impossible to adjust the previous mistakes.
  
- Re:And NASA has made mistakes with this before... (Score:5, Informative)
  
  by gagol ( 583737 ) writes: on Friday August 10, 2012 @04:49AM (#40943357)
  
  That is probably why a team of 100 software engineers issues about 1000 commands per day for the rover. My guess is a lot of the work is triple checking everything before they upload an update. There is just no room for error in this situation.
  
  - Re: (Score:3)
    
    by gagol ( 583737 ) writes:
    
    For those wondering where the numbers come from, just read the article!
- Re: (Score:3)
  
  by wmac1 ( 2478314 ) writes:
  
  There are 2 separate computers on the board. Perhaps they upgrade one of them and after it worked correctly they transfer control to it and upgrade the other one?
Actually... only 157 million miles away (Score:5, Informative)

by ronhip ( 465417 ) writes: on Friday August 10, 2012 @03:10AM (#40942711)

The spacecraft TRAVELLED 350 million miles to get there, but as of tonight, Mars is only about 157.5 million miles from Earth.

- Re:Actually... only 157 million miles away (Score:5, Funny)
  
  by Anonymous Coward writes: on Friday August 10, 2012 @03:36AM (#40942909)
  
  Forgot something and noticed halfway? Happens to me all the time...
  
  - Re: (Score:2)
    
    by K. S. Kyosuke ( 729550 ) writes:
    
    The space highways are curved a lot.
- Re:Actually... only 157 million miles away (Score:5, Funny)
  
  by TubeSteak ( 669689 ) writes: on Friday August 10, 2012 @03:53AM (#40943035) Journal
  
  Good news everyone!
  NASA will only have to wait half as long to find out if their software upgrade worked!
  
  - Re: (Score:3)
    
    by RaceProUK ( 1137575 ) writes:
    
    Good news everyone! NASA will only have to wait half as long to find out if their software upgrade worked!
    Now read that in Farnsworth-voice...
- Re: (Score:2)
  
  by toygeek ( 473120 ) writes:
  
  Oh is that all?
- Re: (Score:3, Funny)
  
  by TenDollarMan ( 1307733 ) writes:
  
  Curiosity made the Mars run in 1.82543347 × 10^-5 Parsecs
- Re: (Score:2)
  
  by Dave Whiteside ( 2055370 ) writes:
  
  1.684 AU or
  156,537,715 miles
  see
  http://www.fourmilab.ch/cgi-bin/uncgi/Solar/ [fourmilab.ch]
- Re: (Score:2)
  
  by MachineShedFred ( 621896 ) writes:
  
  If they do it right, it shouldn't matter if it's 157 feet, or 157 million miles.
  Are we supposed to believe that they haven't tested and hardened this before lighting the fuse?
- Re: (Score:2)
  
  by pingbak ( 33924 ) writes:
  
  Gravity slingshots and curvature. It's very inefficient to travel in a straight line from Earth to Mars.
Not the same cost to get wrong, but (Score:2, Interesting)

by Anonymous Coward writes:

Working in remote smart metering we have a similar problem, where you can brick meters if the signal drops at the wrong place, or firmware doesn't fit the hardware right.
Wow (Score:5, Insightful)

by undulato ( 2146486 ) writes: on Friday August 10, 2012 @03:15AM (#40942747) Homepage

NASA doing a software upgrade is not big news. This is going to be phenomenally safe. Much scarier doing software upgrades on millions of unknown hardware configurations globally than on one totally locked down platform no matter what distance or cost is involved.

- Re: (Score:2)
  
  by AchilleTalon ( 540925 ) writes:
  
  Agree, NASA has done a complete upgrade on the previous rover. This isn't new stuff and it has been tested, the procedure is well known. Well, yes, someone may do a stupid thing at the wrong time; however, the main difference is the speed of transfer and the delay between transmission and confirmation that everything went fine. The environment is well controlled and I do not doubt there are fallback mechanisms in place. So, I'm sorry, on this one I am not really impressed by the NASA team.
  - Pressure changes things (Score:5, Interesting)
    
    by jeko ( 179919 ) writes: on Friday August 10, 2012 @04:34AM (#40943273)
    
    Get a 10-foot 4X4 piece of lumber. Drop it flat on the ground. Walk from one end to the other like a balance beam. I'll bet you can do it. I'll bet you can do it blindfolded, walking backward. I'll bet you can do it reciting the alphabet backward. I'll bet you could do it drunk.
    Take that same 4X4, suspend it 20 stories in the air between a couple of cranes. Put a bunch of razor sharp, rotating propellers on the ground beneath it. Intersperse the propellers with oil drillbits pointed up, not down for once. Have a bunch of trained turkey vultures flying around to watch you fall. Take your wife, kids and your momma, put a gun in their mouths while the Joker cackles that when you fall, he's gonna blow their heads off. Bring in the television cameras and monitors so the whole World can watch and you can watch them watch. Have some intern read the tweets and comments sections about your plight over the loudspeakers.
    Now, there are a few ice-blooded "Licensed to Kill" Double-O men who could keep it together and walk that beam under that kind of pressure. Mary Lou Retton and Nadia could, no doubt. I seriously doubt I could.
    Is it a big deal to do a software upgrade under such tightly controlled conditions? Not really. But try doing that software upgrade when billions of dollars and your career are on the line, with the whole world watching. The guy who screws that up is gonna be a punchline and a byword for a few decades, a real Wilson if you've read that book. :-) You'll be known as the guy who screwed up Mars.
    Tell me there wouldn't be maybe one or two drops of sweat on the keyboard...
    
    - - Re: (Score:2)
        
        by jkflying ( 2190798 ) writes:
        
        And all your colleagues lose their last few years of work.
        
        Re: (Score:3)
        
        by fuzzyfuzzyfungus ( 1223518 ) writes:
        
        And hell hath no fury like epic nerd-rage...
        If the firmware guys brick this thing, they'll probably be found in either the decompression test chamber with their eyeballs boiling off, or floating in the old hydrazine tank out back.
  - Re: (Score:2)
    
    by Bigby ( 659157 ) writes:
    
    I'm impressed that they found budget dollars for proper testing. Maybe they estimated the cost of failure to be $2.5b. I wish I could do that at work.
- - - Re:Wow (Score:5, Interesting)
      
      by arth1 ( 260657 ) writes: on Friday August 10, 2012 @07:30AM (#40944105) Homepage Journal
      
      That reminds me... I have sometimes wondered what security protocols NASA (and their Russian counterparts) have in place for their probes, especially the older ones going back to the 1970s, when security wasn't nearly as advanced as it is today.
      Is it possible that someone with a large directional backyard antenna can hack some of the probes? To be remembered as the man who killed Voyager 2 might be attractive for some people.
      And who's to say that this hasn't already happened? There are non-responding probes out there, with no evidence for why they failed.
      
Hold F8, Boot to Safemode - which lacks networking (Score:3)

by DontScotty ( 978874 ) writes: on Friday August 10, 2012 @03:23AM (#40942807) Homepage Journal

By pressing F8 at the "Starting Windows 95" message, and then choosing Safe Mode from the Windows 95 start-up menu.
Following these steps will gain you ultimate FAME and FAILURE - for updating the Mars software!!!

- Re: (Score:2)
  
  by fatphil ( 181876 ) writes:
  
  I can't even get to that stage, it keeps giving me a keyboard error - did no-one stick one on Curiosity?
- Re: (Score:3)
  
  by wvmarle ( 1070040 ) writes:
  
  No keyboard found. Press to continue.
  - Re: (Score:2)
    
    by wvmarle ( 1070040 ) writes:
    
    No keyboard found. Press <F1> to continue.
    (correcting for HTML... preview? What preview? Oh, that preview...)
Imagine if it had been in kilometers (Score:2)

by G3ckoG33k ( 647276 ) writes:

Imagine how far it would have been if they had measured it in kilometers instead!
Whoaw!
.
.
-
.
.
. ;)
- Re: (Score:2)
  
  by fa2k ( 881632 ) writes:
  
  For those distances I just read "miles" as "kilometers". A factor of 1.6 doesn't really make a huge difference for a casual understanding.
Should have gone with Debian.. (Score:2, Funny)

by Anonymous Coward writes:

sudo apt-get update mars
Software upgrades.... (Score:2)

by disi ( 1465053 ) writes:

It will sit there forever: "Are you sure you want to update? Yes/No"
- Re: (Score:3)
  
  by lxs ( 131946 ) writes:
  
  Have you tried turning it off and on again?
Re: (Score:2, Insightful)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
Should be easy enough (Score:3)

by symes ( 835608 ) writes: on Friday August 10, 2012 @06:01AM (#40943697) Journal

They are bound to have a copy of Curiosity here on Earth, surely? So they should be able to thoroughly test the process first. Ok, it is not Mars and there might be issues specific to transmitting that data over such distances... but still. I'd be really surprised if this hasn't been thoroughly tried and tested.

- Re: (Score:2)
  
  by ledow ( 319597 ) writes:
  
  More than that, if you design the system properly it would never be a problem.
  Watchdog timers on everything - on the hardware coming up, on the communications with Earth, etc. If you don't get a response from the timers in X seconds/minutes/days, then completely revert to the previous version of the software and try again.
  So if you upgrade the software and break the radio, in a day or so of not being able to talk to Earth, the machine should notice and revert back to the previous software. If you break th
Well, it's not really the same (Score:2)

by aglider ( 2435074 ) writes:

But the technologies used in some botnets are a good starting point.
That'd be, call home and try to pull anything you need to do the upgrade.
The orbiter relay should be doing the same, first.
Terminology? (Score:2)

by Guppy06 ( 410832 ) writes:

a new version of the flight software on the Mars rover Curiosity
Is anybody else thinking that any changes to the flight software are now a few days too late?
- Re: (Score:3)
  
  by the eric conspiracy ( 20178 ) writes:
  
  Yeah, the freakin summary is potty as usual.
  They aren't upgrading the flight software. They are replacing the flight software with driving around and exploring software.
Can't they just SSH into it? (Score:2)

by scorp1us ( 235526 ) writes:

I mean, the lag is going to be on par with SSH in to a terrestrial server with my AT&T service and cell phone.
Latency! (Score:2)

by DarthVain ( 724186 ) writes:

People complain of 300ms of latency here on Earth with their ISP. I have heard it takes 14 MINUTES for a signal round trip. That's 840 seconds, or 840,000ms of latency. So you are not exactly programming on the fly.
The worst part would be that presumably there is some pretty robust simulated debuggery on Earth before anything gets transmitted. However, once you have finally tested, confirmed, compiled, packaged etc... and pressed the send button, you have to wait likely an eternal excruciating 14 minutes before yo
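As a rough check on that latency figure, the light travel time can be worked out from the distance. The sketch below assumes the roughly 157.5 million miles quoted elsewhere in this thread for the current Earth-Mars distance and ignores relay hops and processing time.

```python
SPEED_OF_LIGHT_M_S = 299_792_458   # exact, by definition of the metre
METRES_PER_MILE = 1609.344         # exact, international mile


def one_way_light_time_s(distance_miles: float) -> float:
    """Seconds for a radio signal to cover distance_miles in a straight line."""
    return distance_miles * METRES_PER_MILE / SPEED_OF_LIGHT_M_S


# Distance taken from another comment in this thread; it changes daily.
one_way = one_way_light_time_s(157.5e6)
print(f"one way: {one_way / 60:.1f} min, round trip: {2 * one_way / 60:.1f} min")
```

On that assumption the 14 minutes is close to the one-way time, so waiting for a confirmation would take about twice as long.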
This is NASA, give them a break (Score:2)

by TheSpoom ( 715771 ) writes:

I love how everyone here is like, "Y'know, they really should have a backup software solution on the rover" or "If I was doing this, I would do this, that, and the other thing, and they're stupid for not doing that".
An awful lot of assumptions being made about people who are probably at the very top of their game. I'm going to give NASA the benefit of the doubt here: I think they wouldn't do the upgrade unless it was very beneficial, and I'd bet they're doing it in a way that has layers upon layers of safeguard
- Re: (Score:3, Funny)
  
  by Anonymous Coward writes:
  
  Thank you so much Mr. Wowsers for giving NASA this great idea. I suspect, given the genius of the thought, you will be contacted for employment shortly.
- Re: (Score:2)
  
  by zachie ( 2491880 ) writes:
  
  This, and also having a full replica of the whole rover on Earth to double check that any software updates won't screw the whole operation. But I can't imagine they are not doing these already :?
- Re:Failsafe (Score:5, Informative)
  
  by fatphil ( 181876 ) writes: on Friday August 10, 2012 @03:37AM (#40942915) Homepage
  
  Exactly. That's how it's done in the telecomms world (infrastructure, not terminals). Typically the new software is given three attempts to boot, and if it doesn't acknowledge that it's fully booted after three attempts, the bootloader falls back to the previous version of the software. Of course, things get trickier if you need to update the bootloader, but those should be very rare situations. However, they in turn can be handled a similar way (typically there's a 3-stage boot, the initial being a ROM bootstrap, then your bootloader, then the OS which you'll want to change).
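The three-attempts-then-fall-back scheme described here might look something like this in outline. It is a sketch of the general pattern only; `BootState`, `select_image` and `try_boot` are hypothetical names, and a real bootloader would keep this state in non-volatile storage rather than in a Python object.

```python
from dataclasses import dataclass
from typing import Callable

MAX_BOOT_ATTEMPTS = 3  # the "three attempts" from the comment above


@dataclass
class BootState:
    """Hypothetical state a bootloader would persist across resets."""
    new_image: str
    previous_image: str
    attempts: int = 0
    confirmed: bool = False


def select_image(state: BootState) -> str:
    """Pick the image for this boot, counting unconfirmed tries of the new one."""
    if state.confirmed:
        return state.new_image
    if state.attempts >= MAX_BOOT_ATTEMPTS:
        return state.previous_image
    state.attempts += 1
    return state.new_image


def boot(state: BootState, try_boot: Callable[[str], bool]) -> str:
    """Keep booting until an image comes up; try_boot stands in for hardware."""
    while True:
        image = select_image(state)
        if try_boot(image):
            if image == state.new_image:
                state.confirmed = True   # the "fully booted" acknowledgement
            return image
        if image == state.previous_image:
            raise RuntimeError("previous image failed too")
```

For example, `boot(BootState("v2", "v1"), lambda image: image == "v1")` tries `v2` three times and then comes up on `v1`.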
  
  - Re: (Score:2)
    
    by Jane Q. Public ( 1010737 ) writes:
    
    It's extremely unlikely they will do anything even remotely resembling a "reboot". Instead they will carefully replace one process at a time, with no restarting.
    - Re: (Score:2)
      
      by fatphil ( 181876 ) writes:
      
      That depends on what OS they're running. And whether they need to change anything in that OS itself. And whether they think they can trust the current state of the system. If the reason you're patching the software is because there's a bug which means you can't trust the state of the system, such as a scribbler, then the last thing you want to do is to attempt to continue running in that state (even dumping your state for later debugging is dangerous - you can no longer trust the data that in the flash driv
      - Re: (Score:2)
        
        by ourlovecanlastforeve ( 795111 ) writes:
        
        They're running VxWorks and they do have a backup computer. First the backup is flashed and verified, then the primary is flashed and verified.
        
        Re: (Score:2)
        
        by fatphil ( 181876 ) writes:
        
        I had presumed it would be VxWorks, as I know they've used it in plenty of previous projects, and it certainly is one of the most capable embedded OSes in existence. By 'verified', do you mean just verifying an HMAC? Why 2 flashes - why not just a bank switch?
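        For reference, "just verifying an HMAC" would amount to something like the sketch below, using Python's standard `hmac` and `hashlib` modules. The key and image bytes are placeholders, and SHA-256 is an arbitrary choice; the thread does not say what verification the rover actually performs.

```python
import hashlib
import hmac


def sign_image(key: bytes, image: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the image bytes."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(key: bytes, image: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_image(key, image), tag)


key = b"placeholder-shared-key"      # hypothetical key
image = b"placeholder image bytes"   # hypothetical image contents
tag = sign_image(key, image)

print(verify_image(key, image, tag))
print(verify_image(key, image + b"\x00", tag))
```

        A check like this only shows that the bytes in flash match what was sent. It says nothing about whether the software behaves correctly once it runs.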
      - Re: (Score:2)
        
        by Brian Feldman ( 350 ) writes:
        
        If by computer you mean some kind of ground system....
- Re:Failsafe (Score:5, Informative)
  
  by Anonymous Coward writes: on Friday August 10, 2012 @03:42AM (#40942949)
  
  Computers: The two identical on-board rover computers, called "Rover Compute Element" (RCE), contain radiation hardened memory to tolerate the extreme radiation from space and to safeguard against power-off cycles. Each computer's memory includes 256 kB of EEPROM, 256 MB of DRAM, and 2 GB of flash memory.[22] This compares to 3 MB of EEPROM, 128 MB of DRAM, and 256 MB of flash memory used in the Mars Exploration Rovers.[23]
  The RCE computers use the RAD750 CPU, which is a successor to the RAD6000 CPU used in the Mars Exploration Rovers.[24][25] The RAD750 CPU is capable of up to 400 MIPS, while the RAD6000 CPU is capable of up to 35 MIPS.[26][27] Of the two on-board computers, one is configured as backup, and will take over in the event of problems with the main computer.[22]
  http://en.wikipedia.org/wiki/Curiosity_rover#Specifications [wikipedia.org]
  Data transfer speeds between Curiosity and each orbiter may reach 2 Mbit/s and 256 kbit/s, respectively, but each orbiter is only able to communicate with Curiosity for about eight minutes per day
  When you have little bandwidth, better get it right the first time.
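  To put numbers on that, here is a rough back-of-the-envelope calculation from the figures quoted above. It assumes the link runs at the quoted peak rate for the full eight minutes and ignores protocol overhead and retransmissions, so the real figures would be lower.

```python
PASS_MINUTES = 8  # "about eight minutes per day" per orbiter, from the quote


def megabytes_per_pass(rate_bits_per_s: float, minutes: float = PASS_MINUTES) -> float:
    """Upper bound on data moved in one pass, in megabytes (10^6 bytes)."""
    return rate_bits_per_s * minutes * 60 / 8 / 1e6


for label, rate in (("2 Mbit/s", 2e6), ("256 kbit/s", 256e3)):
    print(label, round(megabytes_per_pass(rate), 2), "MB per pass at best")
```

  That works out to roughly 120 MB and 15 MB per pass at the very best, which is not much room for a second try.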
  
  - Re: (Score:2)
    
    by Jane Q. Public ( 1010737 ) writes:
    
    In all honesty, except for the MIPS figure, that seems like pretty lame hardware for something of this importance.
    
    But I'll bet that it's misleading: the majority of the functions probably aren't performed directly in the CPU and main memory, but by sub-modules running off of PLAs.
    - Re:Failsafe (Score:5, Insightful)
      
      by gagol ( 583737 ) writes: on Friday August 10, 2012 @05:01AM (#40943413)
      
      Given it is radiation hardened specs, those are fabulous! You can't just get your latest Core i7 and expect it to work correctly once it escapes the protection of Earth's magnetosphere. Also, heat dissipation is much trickier when you don't have air to work with (space) or cannot afford to replace air filters for the cooling systems (Mars).
      
      - Re: (Score:2)
        
        by Jane Q. Public ( 1010737 ) writes:
        
        "Given it is radiation hardened specs, those are fabulous!"
        Not really. That might have been true 10 years ago.
        
        Hey... they had to radiation-harden this thing against ITSELF.
        
        All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash.
        
        So I suspect that they are using a system that is a bit more "distributed" (conceptually) than your everyday PC.
        
        Re:Failsafe (Score:5, Informative)
        
        by jkflying ( 2190798 ) writes: on Friday August 10, 2012 @05:45AM (#40943603)
        
        The radiation this thing emits is NOTHING compared to the solar and cosmic radiation it would experience both in transit and on Mars. Putting everything in a metal box only helps so much, you still need specifically designed electronics which can handle the odd bit of radiation without dying. Even with a thick metal box you can't run an i7 on Mars, or not for very long at least. Your standard DDR3 isn't going to work either, or your standard EEPROM.
        The other thing to remember is that although this project is extremely important, they're still not going to throw more capabilities in than they need, because that is more that can go wrong. For a remote sensing platform, the amount of EEPROM isn't that important - you just need enough to hold your communication protocols, some basic reaction-to-obstacle algorithms and the motor control code. You aren't going to be pulling massive libraries in. The emphasis is on making it as simple as possible, so that there is less chance for bugs to creep in. Those extra MIPS will come in handy for the navigation and onboard image processing, and the flash for storing interesting info until you can upload, so those are what they have upgraded the most.
        
        
        tried any of this in the field? (Score:2)
        
        by dutchwhizzman ( 817898 ) writes:
        
        It takes 3-5 years to field test this stuff. It takes years of preparation after the final decision of what hardware to use before you get to launch the thing and after that, to get it to Mars. You are looking at the best of the best, proven technology hardware available for this sort of radiation tolerance at the moment they had the last opportunity to make design changes.
        One does not simply fly to Mars.
        
        Re:Failsafe (Score:4, Informative)
        
        by serviscope_minor ( 664417 ) writes: on Friday August 10, 2012 @08:44AM (#40944531) Journal
        
        "Not really. That might have been true 10 years ago."
        No.
        "All I'm saying is: you can bet the hardware is in a well-shielded heavy metal box, and today all it takes is about 1/4 of a cubic inch to squeeze in another GB of RAM or flash."
        I wonder why they didn't think about that. A nice thick, heavy metal box. Easy! Perhaps you should go and work for NASA?
        Let's ignore the earth's magnetosphere for the moment and make some massive assumptions.
        The pressure on the ground is about 10^5 Pa. That means there's about 10^4 kg of stuff above each square metre of ground to absorb radiation from space. That equates to 10m of water, 1.25m of steel or about 90cm of lead. Quite a lot.
        Mars is about 1.5 AU from the sun, so receives about 0.4 times the solar radiation.
        The atmosphere on Mars is about 600Pa, by comparison.
        Radiation hardening is a very well established field. Using some degree of shielding is just one of the many techniques in use. On Mars, it is simply not enough on its own.
        It is very, very difficult to make a rad-hard processor, and then very thoroughly test it. You can't just keep shrinking the feature size, because as it goes down, the effect of radiation increases. Not only that but as the amount of crystal per transistor shrinks, the chance of unrecoverable lattice damage increases, due to the lack of redundancy.
        There are faster Rad-hardened DSPs, but those are, well, DSPs and only actually really fast for DSP like tasks.
        There also are almost certainly faster ones available now. But it's been in transit for a year, and they certainly weren't building it with a brand-new untested processor for which they had to write all the software on the way after they launched it.
        So, given the constraints, it's a pretty great CPU to have on board.
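        The shielding arithmetic above can be checked with a few lines of Python. The gravity values and material densities are typical handbook figures supplied here for illustration; they are not from the comment, which appears to use rounder numbers. Treat the output as order-of-magnitude only.

```python
G_EARTH = 9.81  # m/s^2
G_MARS = 3.71   # m/s^2 (assumed value, not given in the comment)

# Typical densities in kg/m^3 (assumed handbook values).
DENSITY = {"water": 1000.0, "steel": 7850.0, "lead": 11340.0}


def column_mass(pressure_pa: float, gravity: float) -> float:
    """Mass of atmosphere above one square metre of ground, in kg/m^2."""
    return pressure_pa / gravity


def equivalent_thickness(pressure_pa: float, gravity: float, material: str) -> float:
    """Thickness in metres of `material` with the same mass per unit area."""
    return column_mass(pressure_pa, gravity) / DENSITY[material]


for material in DENSITY:
    earth = equivalent_thickness(1e5, G_EARTH, material)
    mars = equivalent_thickness(600, G_MARS, material)
    print(f"{material}: Earth {earth:.2f} m, Mars {mars:.3f} m")

# Inverse-square falloff of sunlight at 1.5 AU relative to 1 AU.
print(f"solar flux at Mars: {1 / 1.5**2:.2f} of Earth's")
```

        With these inputs the Martian atmosphere comes out equivalent to roughly 16 cm of water, against about 10 m on Earth, which is the point of the comparison.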
        
    - Re: (Score:3)
      
      by fuzzyfuzzyfungus ( 1223518 ) writes:
      
      The RAD750 [baesystems.com] is quite limited in power; but has the advantage of being comparatively close to 'just going down to newegg and buying a motherboard' by the standards of projects that go into space and shop at mil/aero contractors... The price is still up in the "If you have to ask, don't ask" range; but doing a very-low-volume DIY would likely be worse still...
- Re: (Score:2)
  
  by Sollord ( 888521 ) writes:
  
  The rover has two computers, one a fully redundant backup, and I'd hope they didn't build a system that requires both systems to be upgraded at the same time...
- Re: (Score:3)
  
  by PhunkySchtuff ( 208108 ) writes:
  
  Not only am I absolutely sure they've got more than one copy of critical data in flash, but they have two identical and redundant computers on board
  http://en.wikipedia.org/wiki/Curiosity_rover#Specifications [wikipedia.org]
  From http://marsprogram.jpl.nasa.gov/msl/mission/rover/brains/ [nasa.gov]
  The rover has two "computer brains," one of which is normally asleep. In case of problems the other computer brain can be awakened to take over control and continue the mission.
- Re: (Score:2)
  
  by wvmarle ( 1070040 ) writes:
  
  According to the linked article, they have two computers on board.
  They're currently testing the computers to see everything works as intended, then upgrade the main computer, and if that goes fine upgrade the backup computer. Also the new software has been uploaded in transit, so at the moment they have both software systems (the landing system and the surface work system) on their craft.
  What is not clear from the article, is how independent these computers are. E.g. what would happen if the upgrade fails p
  - Re: (Score:3)
    
    by kasperd ( 592156 ) writes:
    
    What is not clear from the article, is how independent these computers are. E.g. what would happen if the upgrade fails partially, with the main computer trying to take over the craft, while the backup computer is still on the original program.
    That's always a risk if you have two computers for redundancy. To completely solve that problem, you need four computers. But the algorithms for coordinating in such a scenario are complicated. So it might be safer to rely on systems being able to use the proper compu
    - Re: (Score:3)
      
      by Jane Q. Public ( 1010737 ) writes:
      
      "If you had a 3 out of 4 setup with the four computers running identical software, it only takes one software bug to bring down the system."
      Not at all. You have a separate "supervisor" board that moderates among the computers. In a case like that, you only need 3 for Damned Good Redundancy, not 4.
      
      But I expect that NASA has good reason to have faith in the reliability of their dual machine.
      - Re:Failsafe (Score:5, Interesting)
        
        by kasperd ( 592156 ) writes: on Friday August 10, 2012 @06:29AM (#40943831) Homepage Journal
        
        "You have a separate "supervisor" board that moderates among the computers."
        And then that board becomes a single point of failure.
        "In a case like that, you only need 3 for Damned Good Redundancy"
        3 computers and a supervisor? That's already 4 components.
        
        If you want to handle t arbitrary node failures, then you need at least 3t+1 nodes in total. Whether you call the nodes computers or supervisor boards doesn't change that fact. If you have t failures among 3t or fewer total nodes, then the failures can happen in a way that causes the functional units to receive such inconsistent information that they are unable to do anything meaningful. It is a case of Byzantine agreement.
        
        Any system designed to handle failures of one third or more components is making assumptions about how the failed components behave. If the failed components behave differently than the assumption, it takes even fewer failures to break the entire system.
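        The 3t+1 bound above is easy to tabulate. The sketch below only encodes the counting argument (the classical bound for agreement among nodes that can fail arbitrarily); it does not implement an agreement protocol, and the function names are made up for illustration.

```python
def min_nodes(t: int) -> int:
    """Fewest total nodes needed to tolerate t arbitrary (Byzantine) failures."""
    return 3 * t + 1


def max_faults(n: int) -> int:
    """Most arbitrary failures that n nodes can tolerate under the 3t+1 bound."""
    return (n - 1) // 3


for n in (2, 3, 4, 7):
    print(f"{n} nodes tolerate {max_faults(n)} arbitrary failure(s)")
```

        Under this bound two or three nodes tolerate no arbitrary failures at all, and four is the smallest count that tolerates one. Designs with fewer nodes are relying on assumptions about how a failed node behaves.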
        
- Re: (Score:3)
  
  by c0lo ( 1497653 ) writes:
  
  "I hope there's a really, really good reason why they need to update the software at all"
  Well, zero-day exploits.. and Wikileaks... and anonymous not forgiving or forgetting... and Duqu/Flame/Mahdi...
  (grin)
- Re:hmm (Score:5, Insightful)
  
  by hey_popey ( 1285712 ) writes: on Friday August 10, 2012 @03:39AM (#40942929)
  
  Of course not! They do it just for the lulz!
  More seriously, for space systems and embedded systems in general, due to resource constraints on-board, you usually cannot fit all the functionality you would like to in one software image. So you keep only what is necessary for the first mission, and then you replace the obsolete parts with the next thing you want to do.
  As a simplified example, when you launch a satellite, you will need it to deploy its solar arrays quickly (and do many initialization checks). When that is done, you could imagine changing this part of the software with something else...
  
  Also, they might have had time planning constraints on the project, and needed to launch with a simpler first version of the software, while finalizing the second one. That does happen.
  
- Re:it can fly? (Score:5, Informative)
  
  by Bonobo_Unknown ( 925651 ) writes: on Friday August 10, 2012 @03:45AM (#40942967)
  
  The point of the exercise is to replace the no longer needed flight software with software it can use to better perform its tasks while on Mars.
  
- Wrong Question (Score:2)
  
  by mutube ( 981006 ) writes:
  
  What we really need to know is why it didn't need flight software BEFORE now?! Obviously it isn't really on Mars... if 'Mars' even exists. Lizards all the way down I tell you! LIZARDS!!
- - Re: (Score:2)
    
    by Jane Q. Public ( 1010737 ) writes:
    
    Whitespace and the Red Planet would probably not get along.
- Re: (Score:2)
  
  by MSojka ( 83577 ) writes:
  
  "...as they get ready to download a new version of the flight software on the Mars rover Curiosity..."
  Flight software? She flying back too?
  
  "Flight" as in "fight-or-flight response". You know, in case Curiosity encounters Martian life which think it's delicious ... or at least interesting enough to study and take apart.
  Those people at NASA think of everything ...
- - Re: (Score:2)
    
    by biodata ( 1981610 ) writes:
    
    I thought the point the poster was trying to make was that there should be two rovers ON EARTH - one that they try stuff out on (the dev rover), and one that they only do stuff to that actually gets done to the one on Mars (the test rover). That way they can hopefully control for the effects of doing and undoing changes, and they will always have one system here that is in the same state as the one on Mars, except for while the one on Mars is being updated. Engineers often like to set things up that way fo
- Re: (Score:2)
  
  by sunking2 ( 521698 ) writes:
  
  While it's a rover at this point, NASA considers it a spacecraft. The term "flight" is equivalent to "production". It means it's on the actual hardware and has passed the appropriate certifications.
