Crowdfunded, Solar-powered Spacecraft Goes Silent

Last week saw the successful launch of the Planetary Society's LightSail spacecraft, the solar-powered satellite that runs Linux and was crowdfunded on Kickstarter. The spacecraft worked flawlessly for two days, but then fell silent, and the engineering team has been working hard on a fix ever since. They've pinpointed the problem: a software glitch. "Every 15 seconds, LightSail transmits a telemetry beacon packet. The software controlling the main system board writes corresponding information to a file called beacon.csv. If you're not familiar with CSV files, you can think of them as simplified spreadsheets—in fact, most can be opened with Microsoft Excel. As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes—roughly the size of ten compressed music files—it can crash the flight system." Unfortunately, the only way to clear that CSV file is to reboot LightSail. It can be done remotely, but as anyone who deals with crashing computers understands, remote commands don't always work. The command has been sent a few dozen times already, but LightSail remains silent. The best hope may now be that the system spontaneously reboots on its own.
  • Seriously? (Score:5, Insightful)

    by Anonymous Coward on Friday May 29, 2015 @09:23AM (#49798615)

    I’m usually the first to defend others when some bug like this makes it through testing. Hindsight is always 20/20, it only takes one bug amongst a million good bits of code, etc. But this just seems like something that even basic testing should have caught.

    Did they not run this thing on the ground for a few weeks? That’s just basic testing, especially for something that is going to be inaccessible for a while. Also, the fact that some critical bit of processing relies on data being written to (and then presumably read back from) a CSV file is very worrying.

    This sounds like some very shoddy work.

    • Re:Seriously? (Score:5, Insightful)

      by Mr D from 63 ( 3395377 ) on Friday May 29, 2015 @09:31AM (#49798689)
      Testing might have found it, but I'd say that regardless of testing they should assume something bad will happen with the software and have a mechanism in place to force reboot & update on a locked up system. Maybe they thought they did. It's a shame if they can't get it fixed.
      • Re:Seriously? (Score:5, Informative)

        by NotInHere ( 3654617 ) on Friday May 29, 2015 @09:43AM (#49798809)

        Their current plan is to wait for charged particles to disturb the electronics and force a reboot:

        Spacecraft are susceptible to charged particles zipping through deep space, many of which get trapped inside Earth’s magnetic field. If one of these particles strikes an electronics component in just the right way, it can cause a reboot. This is not an uncommon occurrence for CubeSats, or even larger spacecraft [planetary.org], for that matter. Cal Poly’s experience with CubeSats suggests most experience a reboot in the first three weeks; I spoke with another CubeSat team that rebooted after six.

      • Re:Seriously? (Score:5, Insightful)

        by macs4all ( 973270 ) on Friday May 29, 2015 @11:05AM (#49799583)

        Testing might have found it, but I'd say that regardless of testing they should assume something bad will happen with the software and have a mechanism in place to force reboot & update on a locked up system. Maybe they thought they did. It's a shame if they can't get it fixed.

        Speaking as an embedded developer, this is completely inexcusable.

        Not having a Watchdog, PLUS not making the limited-filesize log file "roll-over", is clearly Amateur-Hour stuff. Who wrote this code, anyway? An eight-year-old???

        Next we're going to hear that they bricked it with a software update, because they didn't think they needed to checksum the uploads, or provide enough RAM to hold the updated code before they re-flashed the OS, or something similar.

        Pathetic. They deserve to lose their spacecraft.

        Fortunately, if extraterrestrials discover the floating hulk of this abomination, they will (rightly) conclude that there is no intelligent life worth exploiting on this planet, and will decide not to enslave us...
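The watchdog macs4all mentions can be illustrated with a toy model. This is a minimal Python sketch, not flight code: a real watchdog is a hardware timer (on Linux usually exposed as /dev/watchdog) that resets the board unless the software keeps writing to it. The `Watchdog` class, its `kick`/`poll` methods, the 60-second timeout, and the `simulate_hang` helper are hypothetical. Time is passed in explicitly so the behavior is deterministic.

```python
class Watchdog:
    """Toy model of a watchdog timer (hypothetical API, not a real driver).

    If the main loop fails to call kick() within timeout_s seconds,
    poll() reports that the board would be reset.
    """

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.deadline = now + timeout_s
        self.resets = 0

    def kick(self, now: float) -> None:
        # Called by healthy flight software on every loop iteration.
        self.deadline = now + self.timeout_s

    def poll(self, now: float) -> bool:
        # Evaluated independently of the (possibly hung) main loop.
        if now >= self.deadline:
            self.resets += 1
            self.deadline = now + self.timeout_s  # board restarts, timer re-arms
            return True
        return False


def simulate_hang(hang_at_s: float, end_s: float, period_s: float = 15.0,
                  timeout_s: float = 60.0) -> int:
    """Kick every period_s until the software hangs at hang_at_s; count resets."""
    wd = Watchdog(timeout_s)
    t = 0.0
    while t < end_s:
        if t < hang_at_s:
            wd.kick(t)  # the main loop is still alive
        wd.poll(t)
        t += period_s
    return wd.resets


print(simulate_hang(float("inf"), 86_400))  # healthy for a day: no resets
print(simulate_hang(172_800, 173_400))      # hangs after two days: resets fire
```

The design point is that `poll` runs outside the main loop, so a hung flight process cannot also suppress its own reset.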

        • Re:Seriously? (Score:5, Insightful)

          by amicusNYCL ( 1538833 ) on Friday May 29, 2015 @12:49PM (#49800471)

          Not having a Watchdog, PLUS not making the limited-filesize log file "roll-over", is clearly Amateur-Hour stuff. Who wrote this code, anyway? An eight-year-old???

          It's not even who wrote it, it's who designed it. Reading the summary actually made me angry that there is a group of people out there somewhere with the ability to build, launch, and track a satellite but without the common sense to recognize that they're creating a system that will grow infinitely in size without a mechanism to clear that data out. Does the satellite have unlimited storage space available? No? Then how about designing a way to monitor and clear the data other than saving it in /tmp?

          Pathetic. They deserve to lose their spacecraft.

          They definitely do. And no amount of descriptions of a CSV file meant for a grade school kid, or saying that 32MB is about the size of 10 songs, is going to minimize the schadenfreude that I'm feeling. Such a basic design error and they never even bothered to run tests for a significant period of time before putting the damn thing in space.

          Way to go, LightSail team. I dub thee LightFail.

          • Re:Seriously? (Score:4, Informative)

            by nitehawk214 ( 222219 ) on Friday May 29, 2015 @02:45PM (#49801389)

            These guys did not launch a satellite; ULA did. LightSail simply took a ride on an Atlas 5 that was deploying the X-37B and was thrown out as a secondary payload. Pretty much anybody can do that. Many CubeSats are made by college students.

            Also, describing CSV and measuring files in songs makes me want to punch Bill Nye, and I love Bill Nye.

    • Re:Seriously? (Score:5, Informative)

      by harperska ( 1376103 ) on Friday May 29, 2015 @09:37AM (#49798741)

      One report I read made it sound like they were aware of the bug for a while. It's possible that they had to launch with an old version of the software because the patch wasn't ready yet; as a secondary payload, you have no say whatsoever in the launch date. They probably expected to be able to upload the patch after launch, but the log filled up faster than expected.

      That being said, it is shoddy programming to blindly write to a log on a resource-constrained embedded platform (or any platform, really. Just especially so on something like this), so somebody definitely goofed. All I am saying is that it probably was caught by testing, but couldn't be fixed in time due to various constraints. It was a dumb move on the developer's part to not do enough diligence and to rely too heavily on QA in the first place.

      • by Yoda222 ( 943886 )

        That being said, it is shoddy programming to blindly write to a log on a resource-constrained embedded platform (or any platform, really. Just especially so on something like this), so somebody definitely goofed.

        Maybe they did not blindly write the log on a resource-constrained embedded platform. Depending on how much memory they had and how short the mission was, they could have computed that 32MB/(2d) * length_of_mission was sufficiently smaller than available_space.

        But as a test engineer in the space industry (having worked on big, expensive satellites with too much paperwork, not on the new-space version with cheaper, less-documented, simpler tech, which really looks interesting, but I don't know to what exte
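Yoda222's budget check, written out as a small function. The 28-day mission length, the 2x safety margin, and the storage sizes in the usage lines are assumptions for illustration, not LightSail figures; `log_fits` is a hypothetical name.

```python
MiB = 2**20


def log_fits(observed_bytes: int, observed_days: float,
             mission_days: float, available_bytes: int,
             margin: float = 2.0) -> bool:
    """Extrapolate the observed log growth rate over the whole mission and
    require `margin` times that much space to be available."""
    rate = observed_bytes / observed_days        # bytes per day
    needed = rate * mission_days * margin        # bytes, with safety margin
    return needed <= available_bytes


# Hypothetical numbers: 32 MiB in 2 days, 28-day mission, 2x margin -> 896 MiB needed.
print(log_fits(32 * MiB, 2, 28, 4096 * MiB))  # True
print(log_fits(32 * MiB, 2, 28, 512 * MiB))   # False
```

Note this only protects against running out of storage; it does not help if, as in LightSail's case, the software crashes at a file size well below the available space.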

    • by plopez ( 54068 ) on Friday May 29, 2015 @09:42AM (#49798791) Journal

      Roll your log files. I smell a DevOps debacle.
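plopez's "roll your log files" can be sketched with Python's standard-library `logging.handlers.RotatingFileHandler`, which starts a new file before the current one would reach `maxBytes` and keeps at most `backupCount` old ones. Python is used only for illustration (the thread does not say what language LightSail's logger used), and the 4 KiB cap, the beacon columns, and the helper names are made up.

```python
import logging
import logging.handlers
import os
import tempfile


def make_beacon_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Logger whose file is rolled over before it reaches max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    logger = logging.getLogger("beacon")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep beacon rows out of the root logger
    logger.addHandler(handler)
    return logger


def rotate_demo(rows: int, max_bytes: int = 4096, backups: int = 2) -> list[int]:
    """Write `rows` beacon lines and return the sizes of the files left on disk."""
    with tempfile.TemporaryDirectory() as d:
        logger = make_beacon_logger(os.path.join(d, "beacon.csv"), max_bytes, backups)
        for seq in range(rows):
            logger.info("%d,%d,%d", seq, 3300, 21)  # hypothetical columns
        for h in list(logger.handlers):
            h.close()
            logger.removeHandler(h)
        return sorted(os.path.getsize(os.path.join(d, f)) for f in os.listdir(d))


if __name__ == "__main__":
    print(rotate_demo(10_000))  # at most 3 files, each under 4096 bytes
```

Total disk use is bounded by roughly `maxBytes * (backupCount + 1)` regardless of how long the logger runs.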

    • But this just seems like something that even basic testing should have caught.
      Did they not run this thing on the ground for a few weeks?

      It was tested by the same guys that tested the Boeing 787 for only 247 days [engadget.com] ...

    • Re:Seriously? (Score:5, Insightful)

      by mnooning ( 759721 ) on Friday May 29, 2015 @09:56AM (#49798943) Journal

      As a retired QA guy, I can tell you that checking that no files can grow without bound is standard fare. Same with exercising all code for long periods of time, as you pointed out. That means there was not a single experienced QA guy on the team.

      By the way, CSV was the gold standard for many years. Given the tight compactness/memory budget that space projects have, CSV, with its small footprint, might well be the logical choice.

      • by unrtst ( 777550 )

        By the way, CSV was the gold standard for many years. Given the tight compactness/memory budget that space projects have, CSV, with its small footprint, might well be the logical choice.

        We're talking about telemetry beacon data written once every 15 seconds. CSV is NOT the ideal format for that, and is nowhere near compact. Naive CSV parsers are trivial, but they also break very, very easily (e.g., embedded newlines in a quoted field, quotes in a quoted field, mixed quotes, etc.). Also, while CSV can be read in a
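unrtst's point about naive CSV parsing, shown with Python's `csv` module: a single hypothetical beacon row containing a quote and an embedded newline survives a round trip through `csv.writer`/`csv.reader`, while splitting on newlines and commas breaks it apart. The row contents are invented for illustration.

```python
import csv
import io

# One hypothetical beacon row containing a quote and an embedded newline.
row = ["2015-05-22T14:05:00Z", 'mode "safe"', "line one\nline two"]

buf = io.StringIO()
csv.writer(buf).writerow(row)  # quotes fields and doubles embedded quotes
text = buf.getvalue()

# Naive approach: one record per line, fields split on commas.
naive = [line.split(",") for line in text.splitlines()]

# Proper parser; newline="" is what the csv docs recommend for file-like input.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive))       # 2: the naive split sees two broken "records"
print(parsed == [row])  # True: the csv module round-trips the row
```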

        • by Megane ( 129182 )
          They're waiting for a reboot because it froze the system completely. TFA says that the manufacturer of their "avionics board" had fixed this bug but it wasn't in the one that went up. So most likely it was a driver bug. A crash or lock-up in kernel space is a lot more problematic than just filling up a filesystem. And apparently they had scheduled an upload of the fix, but the satellite crashed right before the comms window. So now instead of a solar sail, they have a solar brick.
    • by petes_PoV ( 912422 ) on Friday May 29, 2015 @10:08AM (#49799071)

      when some bug like this makes it through testing

      Testing? What testing? If it compiles, it works. Every hacker knows this.

      I have to say, when I read that the spacecraft ran Linux and had died, I naturally assumed that someone had left the auto-update enabled and it was busy trying to apply about 50 million kernel patches.

    • You would think they would have test-run the software for uptime stability for weeks or months. Then again, since it runs Linux, one crashing program should not cripple the whole system. I think they have a bigger issue.
    • by prefec2 ( 875483 )

      In embedded systems, you do not go for testing (alone), you verify the system. And you certainly do not use dynamic data management, neither in RAM nor in storage (which is often the same). So you do not append something to a file. You can overwrite something in a file, but not append. And why store that data on the device at all? An incredible whackjob, an epic fail.
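prefec2's "overwrite, don't append" approach can be sketched as a fixed-size ring buffer on disk: the file is allocated once at startup and each beacon overwrites slot `seq % SLOTS`, so storage use is constant no matter how long the mission runs. The 64-byte record width, the 8 slots, the file name, and the helper names are hypothetical.

```python
import os
import tempfile
from typing import BinaryIO

RECORD = 64  # hypothetical fixed record width, bytes
SLOTS = 8    # hypothetical slot count; file size is RECORD * SLOTS forever


def write_slot(f: BinaryIO, seq: int, payload: bytes) -> None:
    """Overwrite slot seq % SLOTS in place; the file never grows."""
    record = payload[:RECORD].ljust(RECORD, b" ")  # truncate/pad to fixed width
    f.seek((seq % SLOTS) * RECORD)
    f.write(record)


def ring_demo(n_beacons: int) -> tuple[int, bytes]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "beacon.bin")
        with open(path, "wb") as f:   # allocate the whole file once, at startup
            f.write(b" " * (RECORD * SLOTS))
        with open(path, "r+b") as f:  # r+b: read/write, no truncation, no append
            for seq in range(n_beacons):
                write_slot(f, seq, b"%d,3300,21" % seq)
        with open(path, "rb") as f:
            data = f.read()
        return os.path.getsize(path), data


if __name__ == "__main__":
    size, data = ring_demo(1000)
    print(size)                                  # 512
    print(data[7 * RECORD:8 * RECORD].rstrip())  # b'999,3300,21'
```

The cost is that only the last `SLOTS` beacons are kept, which is usually acceptable for data that is also being transmitted.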

  • CSV (Score:5, Insightful)

    by Anonymous Coward on Friday May 29, 2015 @09:26AM (#49798647)

    I know the average IQ at /. has gone down over the years, but I think the explanation of what a CSV file is is slightly too much dumbing down.

    • Re:CSV (Score:5, Insightful)

      by ArcadeMan ( 2766669 ) on Friday May 29, 2015 @09:31AM (#49798697)

      I think the "32 megabytes—roughly the size of ten compressed music files" part is even more insulting.

      • Re:CSV (Score:5, Insightful)

        by gstoddart ( 321705 ) on Friday May 29, 2015 @09:34AM (#49798723) Homepage

        Honestly, I'm surprised they didn't try to define space, Linux, and solar.

        This sounds like someone failed to run a bench test where the system was up and running for an extended period of time.

        Which strikes me as utterly bizarre.

      • Can I get that in Libraries of Congress? I mean damn, how am I supposed to really know how big it is?

        • Oh, 32MB would be like pico- or femto-LOCs (possibly even less) ... it's slightly less than 4 empty .xlsx files (which are like CSV files, but different ;-)

          You can buy about 1000x that amount of storage in the express checkout at Walmart for under $10.

          • by ihtoit ( 3393327 )

            yeah but the track width on a 32GB SD card would last about 3 seconds in space before a charged particle zaps across it and blows the whole deal.

      • Re:CSV (Score:5, Interesting)

        by Megane ( 129182 ) on Friday May 29, 2015 @09:54AM (#49798923)

        To be fair, that was copypasta from TFA. And they carefully omitted the next sentence: "The manufacturer of the avionics board corrected this glitch in later software revisions. But alas, LightSail’s software version doesn’t include the update."

        That still doesn't excuse a problem that would have been found by bench-testing the thing for a few days before sending it up. Nor does it excuse constantly appending one file to store data in an unattended system. Also, anything that JPL sends up has a backup channel that can push that little red button on the main computer. All they can do now is hope for cosmic rays to reboot it randomly. At least it's in LEO and not zipping off into interplanetary space.

        In the meantime, the team is looking at several fixes to work around the software vulnerability once contact is reestablished. One is a Linux file redirect that would send the contents of the troublesome beacon.csv file to a null location, a sort-of software black hole. Lab testing on this fix has been promising—over a gigabyte of beacon packets have already been sent into nothingness without a system freeze.

        Well, isn't that special. Now they test it. So if they can just link it to /dev/null, did they really even need that data? It's always fun to cause a mission to fail by recording data that wasn't even needed.
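TFA only describes the workaround as "a Linux file redirect" to "a null location". Assuming it works the way Megane guesses, by linking beacon.csv to /dev/null, a minimal sketch looks like this. `redirect_to_null` is a hypothetical helper, and it needs a Unix-like system where `os.symlink` is permitted.

```python
import os
import tempfile


def redirect_to_null(path: str) -> None:
    """Replace `path` with a symlink to the null device (Unix-like systems)."""
    if os.path.lexists(path):
        os.remove(path)            # drop the old (possibly huge) file or link
    os.symlink(os.devnull, path)   # later writes to `path` go to /dev/null


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "beacon.csv")
        redirect_to_null(path)
        with open(path, "a") as f:  # the logger keeps appending as before...
            for seq in range(100_000):
                f.write(f"{seq},3300,21\n")
        print(os.path.islink(path), os.path.getsize(path))  # True 0
```

The trade-off is the one Megane points out: every beacon row written after the redirect is simply discarded.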

      • I think the "32 megabytes—roughly the size of ten compressed music files" part is even more insulting.

        True that. It should have been in Libraries of Congress.

      • I think the "32 megabytes—roughly the size of ten compressed music files" part is even more insulting.

        Especially when the -- well known -- standard unit of measure is "Libraries of Congresses".

      • It's just a copy/paste from TFA, which is apparently targeted at someone with no technical knowledge whatsoever.

      • What's a music file? Doesn't everything just stream in as needed?

      • I assume you mean the part where they think any of us listen to MP3s of music at such a shitty bitrate? ;)

  • Re: (Score:2, Insightful)

    Comment removed based on user account deletion
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      I say professional because NASA screwed up a few years back with a probe to Mars when two systems attempted to communicate. One "spoke" in Kilometers, the other "Miles".

      Actually, this particular failure wasn't as obvious an oversight as you may think. The reason it happened was that in an existing system one particular set of parameters was logged in miles, since they weren't responsible for flight control (which NASA mostly uses metric for). Later on, portions of this design were reused and an engineer

    • Actually, NASA had a "file system full" problem on one of the Mars rovers, almost exactly the same problem that LightSail has [extremetech.com]. Fortunately they were able to fix it remotely.
  • How embarrassing (Score:4, Insightful)

    by Tyrannosaur ( 2485772 ) on Friday May 29, 2015 @09:29AM (#49798673)

    You'd think that something as small as 32MB would have been tested before they launched the thing... It doesn't sound like it takes very long to fill up 32MB either.

    • by sycodon ( 149926 )

      Sounds like it's pretty much a transponder in an airplane.

      I wonder why they are even logging this information?

    • It doesn't sound like it takes very long to fill up 32MB either

      Nope. Just ten compressed audio files would do it.

  • I'll never understand how groups (especially NASA) can spend millions, or even BILLIONS, on projects like these and not even complete the sorts of rudimentary testing that those of us in the professional software fields have to do every day. OK, this computer's going into space and going to run for days/months/years... whatever... so hey, maybe we should boot it up while it's still on the ground and see if it'll run for a couple of months without crashing first?

    One of the Mars rovers had the same problem. It w

    • Re:UAT (Score:5, Insightful)

      by itzly ( 3699663 ) on Friday May 29, 2015 @09:44AM (#49798821)

      Well, how do you test it before you're happy ? If the beacon is 40 bytes, and transmitted every 15 seconds, it would take half a year before you fill up 32 MB. That's a long time for testing.

      This is the kind of mistake you shouldn't even make in the first place.
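itzly's estimate, written out. The 40-byte row is itzly's assumption, and 32 MB is taken as 32 MiB. Running the same arithmetic backwards shows how large each row would have to be to fill the file in about two days, assuming (the thread does not establish this) that the file started empty at launch and nothing else wrote to it.

```python
MiB = 2**20
SECONDS_PER_DAY = 86_400


def days_to_fill(limit_bytes: int, row_bytes: int, period_s: float) -> float:
    """Days until a file growing by row_bytes every period_s reaches limit_bytes."""
    rows = limit_bytes // row_bytes
    return rows * period_s / SECONDS_PER_DAY


def implied_row_bytes(limit_bytes: int, days: float, period_s: float) -> float:
    """Average row size needed to reach limit_bytes in the given number of days."""
    rows = days * SECONDS_PER_DAY / period_s
    return limit_bytes / rows


print(round(days_to_fill(32 * MiB, 40, 15), 1))   # 145.6 days, i.e. under 5 months
print(round(implied_row_bytes(32 * MiB, 2, 15)))  # 2913 bytes per row
```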

      • by bondsbw ( 888959 )

        One way to test that is to simulate time. A simulation wouldn't need to wait 15 actual seconds, it could speed up time such that transmissions run immediately after the last, until the test has surpassed the expected lifetime of the mission.

        If this were able to be done once every millisecond instead of once every 15 seconds, they would have run across the bug within 14 minutes.
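bondsbw's simulated-time idea as a minimal sketch: the logger reads time from an injectable clock, so a test can run a whole mission's worth of beacons without waiting. `FakeClock`, `BeaconStore`, its 32 MiB failure threshold, and `first_failure_day` are hypothetical stand-ins that count bytes in memory instead of writing a real file; the 40-byte row and 15-second period come from itzly's comment above.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400


class FakeClock:
    """Test clock that advances only when told to, so months pass in seconds."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def advance(self, seconds: float) -> None:
        self.t += seconds


class BeaconStore:
    """Hypothetical stand-in for the flight logger: fails past a size limit."""

    LIMIT = 32 * 2**20

    def __init__(self) -> None:
        self.size = 0

    def append(self, row: bytes) -> None:
        if self.size + len(row) > self.LIMIT:
            raise OSError("beacon store full")
        self.size += len(row)


def first_failure_day(mission_days: float, row_bytes: int = 40,
                      period_s: float = 15.0) -> Optional[float]:
    """Run the whole mission on the fake clock; return the simulated day of the
    first failure, or None if the store survives the mission."""
    clock, store = FakeClock(), BeaconStore()
    row = b"x" * row_bytes
    while clock.now() < mission_days * SECONDS_PER_DAY:
        try:
            store.append(row)
        except OSError:
            return clock.now() / SECONDS_PER_DAY
        clock.advance(period_s)  # no real waiting: just move the clock
    return None


print(first_failure_day(30))             # None: a 30-day test would miss the bug
print(round(first_failure_day(180), 1))  # 145.6: a 180-day simulation finds it
```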

    • by Lumpy ( 12016 )

      The problem is they hire programmers from EA.

      It compiled! OMG! Launch it!

    • Re: (Score:3, Informative)

      by Anonymous Coward

      First off.. LightSail isn't a NASA mission.. it's a low budget cubesat and cubesats tend to trade risk for rigor.

      NASA does run stuff for days/weeks/etc in testing. And you'll note that the Mars rover flash file system thing was able to be recovered from, thanks to smart people at JPL realizing that you always need a way to recover. This is not necessarily the case for cubesats, often built by enthusiastic grad students whose hair is not yet grey from living through near and actual disasters in flight proje

      • Re:UAT (Score:5, Informative)

        by grimmjeeper ( 2301232 ) on Friday May 29, 2015 @10:43AM (#49799389) Homepage

        Speaking as an engineer working on software that is on the Orion spacecraft, I can say that rigorous testing is budgeted into the project from the beginning because it helps to avoid most of the problems like this. The testing that goes on with flight software is orders of magnitude more than you find for a traditional commercial product. You have to. The consequences of failure are, obviously, a lot more significant.

        That being said, it's impossible to catch every single possible bug, especially as systems get more and more complex. But there are strategies that help reduce your risk. For example, you don't just run off to kernel.org and throw the latest stable release on a board. You pick operating systems that are maybe a bit harder to use (i.e. limited in what they can do) but are far better suited to real-time embedded work. And you certainly don't blindly append to a file without verifying that you're not going to overflow your space. And you always have an automated recovery plan for any dynamically allocated space in the event of an overflow.

        This kind of failure is caused by amateurs making amateur mistakes. It was caused by application programmers who don't understand the consequence of failure in a constrained environment where you can't just click a mouse to restart the program. It was caused by poor planning and a lack of understanding of the environment in which they were designing. This was caused by hiring coders instead of experienced engineers. It was caused by trying to do it cheap rather than spending the money to do it right. They got what they paid for.
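grimmjeeper's rule (never append blindly, always have an automated recovery) as a minimal sketch: check both a per-file quota and the free space on the volume before each append, and if either would be violated, run a recovery action first. Here the recovery is simply truncating the file, i.e. discarding old telemetry; `guarded_append`, the quota, and `min_free` are hypothetical, and a real system might instead rotate or downlink the data.

```python
import os
import shutil
import tempfile


def guarded_append(path: str, row: bytes, quota: int, min_free: int = 0) -> bool:
    """Append row unless that would exceed `quota` bytes or leave less than
    `min_free` bytes free on the volume; in that case run the recovery action
    (here: discard the old contents) before writing. Returns True on recovery."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(path))).free
    recover = size + len(row) > quota or free - len(row) < min_free
    with open(path, "wb" if recover else "ab") as f:  # "wb" truncates first
        f.write(row)
    return recover


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "beacon.csv")
        print([guarded_append(path, b"x" * 30, quota=100) for _ in range(7)])
        # [False, False, False, True, False, False, True]
```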

    • by Maow ( 620678 )

      I'll never understand how groups (especially NASA) can spend millions, or even BILLIONS, on projects like these and not even complete the sorts of rudimentary testing that those of us in the professional software fields have to do every day.

      This is not a NASA project, so you've made a stunningly basic error in your first sentence. Not looking too good for attention to detail for someone "in the professional software field".

      Regardless, if you want to see how NASA does software, or if you're even remotely interested in how truly mission-critical software gets written, you can't find a more interesting story than this one on the creation of space shuttle software [fastcompany.com]:

      The right stuff kicks in at T-minus 31 seconds.

      As the 120-ton space shuttle

  • It came across a tachyon eddy and is at warp speed on its way to the Cardassian homeworld.
  • ... the ability (small code here) to power cycle and come back up in maintenance mode where it doesn't do anything on its own except receive diagnostic commands.

    The computer also needs a sibling for fail-over.

    There may be reasons those were left out that I would agree with.

    I sure hope they can get this puppy lined out.

  • cat beacon >> beacon.csv
    instead of....
    cat beacon > beacon.csv

    oops.

  • SpaceX can retrieve it long enough to hit the reboot button...

  • by jpellino ( 202698 ) on Friday May 29, 2015 @10:11AM (#49799093)
    and not as a verb. Using "hope" as a verb in spaceflight hasn't always gone very well in the past.
  • by Anonymous Coward on Friday May 29, 2015 @10:18AM (#49799155)

    Shaka, when the walls fell

  • Coming up next on Slashdot... Linux is an operating system, kinda like Windows or Mac OS, but built by a bunch of neckbeards, and uses about the same amount of space as 10 compressed music files. Some versions use less, some use more depending upon how it's configured.

    Wow; I think it's time to move on from Slashdot. Taco would be spinning in his grave, assuming he was dead.

  • A satellite running Linux is contingent upon a spontaneous reboot to function again? Great, now we'll never hear from that satellite again.

    Clearly, the plan should have been to run the device on Windows 98. That way, it would only be out of commission for 49.7 days.
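The 49.7 days in the joke is the time it takes a 32-bit counter of milliseconds since boot to overflow, which is widely reported as the cause of the Windows 95/98 hang after that much uptime. The arithmetic, plus what such a counter reads just after wrapping (the helper names are illustrative):

```python
MS_PER_DAY = 1000 * 60 * 60 * 24
TICK_MASK = 2**32 - 1  # a 32-bit unsigned millisecond counter


def days_until_wrap(bits: int = 32) -> float:
    """Days until a `bits`-wide millisecond counter overflows."""
    return 2**bits / MS_PER_DAY


def tick_after(ms_since_boot: int) -> int:
    """What a 32-bit millisecond counter reads after ms_since_boot."""
    return ms_since_boot & TICK_MASK


print(round(days_until_wrap(), 1))  # 49.7
print(tick_after(2**32 + 5))        # 5: the counter has wrapped back to near zero
```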

  • We don't normally test our spacecraft systems, but when we do, we do it after launch.

  • by cve ( 181337 ) on Friday May 29, 2015 @10:45AM (#49799397)
    Last week (a week is approximately the amount of time between new 'Keeping up with the Kardashians' episodes) saw the successful launch of the Planetary Society's LightSail spacecraft, the solar-powered satellite that runs Linux (Linux is like Windows for smart people) and was crowdfunded on Kickstarter (Kickstarter is a place to buy digital watches). The spacecraft worked flawlessly for two days, but then fell silent, and the engineering team has been working hard on a fix ever since. They've pinpointed the problem: a software (software is like what you download from the app store) glitch. "Every 15 seconds, LightSail transmits a telemetry beacon packet (a telemetry beacon packet is like a tweet). The software controlling the main system board writes corresponding information to a file called beacon.csv. If you're not familiar with CSV files, you can think of them as simplified spreadsheets; in fact, most can be opened with Microsoft Excel. As more beacons are transmitted, the file grows in size. When it reaches 32 megabytes, roughly the size of ten compressed music files (32 MB is also approximately the size of 13 iPhone 6 selfies), it can crash the flight system (the satellite's Twitter feed blows up)." Unfortunately, the only way to clear that CSV file is to reboot LightSail (like holding down the power and home buttons on your iPhone at once -- don't try this unless instructed by someone at the Genius Bar). It can be done remotely, but as anyone who deals with crashing computers understands, remote commands don't always work (like when Siri plays Billy Ray instead of Miley). The command has been sent a few dozen times already, but LightSail remains silent. The best hope may now be that the system spontaneously reboots on its own (like when you drop your phone in the pool and it still works).
