Space Science Hardware

What's Inside the Mars Rovers 458

Captain Zion writes "Space.com has a story about the hardware and software of Mars Rovers Spirit and Opportunity. Basically, they're radiation-shielded, 20MHz PowerPC machines with 128Mb RAM and 256Mb of flash memory, running VxWorks. I wonder if I could make a nice firewall with one of these for my home network..."
This discussion has been archived. No new comments can be posted.

What's Inside the Mars Rovers

Comments Filter:
  • by CaptainAlbert ( 162776 ) on Thursday January 29, 2004 @11:07AM (#8123534) Homepage
    > Does a 20mhz processor really need 128mb of ram?

    A processor of any speed doesn't need RAM of any size.

    The application you want to run needs both processing power and memory. How much of each? Depends on the application.
  • by Cutriss ( 262920 ) on Thursday January 29, 2004 @11:09AM (#8123555) Homepage
    Basically, they're radiation-shielded, 20MHz PowerPC machines with 128Mb RAM and 256Mb of flash memory, running VxWorks.

    Mb = Megabits
    MB = Megabytes.

    The article writes out megabytes, so MB should be used, not Mb!
  • by the real darkskye ( 723822 ) on Thursday January 29, 2004 @11:15AM (#8123601) Homepage
    The CPU is fabricated to withstand the radiation, a brief summary can be found here [nasa.gov] or by googling
  • by PhuCknuT ( 1703 ) on Thursday January 29, 2004 @11:16AM (#8123607) Homepage
    Both.

    They have extra shielding on the outside, and the electronics on the inside are designed to dissipate sudden charges created by radiation hits.
  • by Hiroto. S ( 631919 ) on Thursday January 29, 2004 @11:16AM (#8123611) Journal
    I googled across the following presentation, which has a little more detail.

    Flying VxWorks to Mars [colorado.edu]

  • by shawnce ( 146129 ) on Thursday January 29, 2004 @11:18AM (#8123635) Homepage
    A lot is done with extra shielding, but radiation-hardened chips often use larger feature sizes than modern equivalents. The larger the features, the more resilient they can be to particle/energy hits. Basically, they are harder to damage permanently.
  • Re:Ouch (Score:4, Informative)

    by Quasar1999 ( 520073 ) on Thursday January 29, 2004 @11:19AM (#8123646) Journal
    Ah dude... Microsoft is totally different... VxWorks is based on the fact that all memory is linearly accessible... there is no memory protection... that works great when all your apps are written in-house, and you know the timer ain't gonna trash the motor controller...

    But try that on an OS for a desktop system, and your email program just may blow up your paint program... (remember Windows 3.1's stability? Make that 10x worse)... You can't use VxWorks for the desktop as Windows is used today... it needs a lot of protection... The ease of upgrading is due to the lack of protection...
  • by peter303 ( 12292 ) on Thursday January 29, 2004 @11:19AM (#8123649)
    A leading hypothesis is that flash memory overflow caused Spirit to be shut down for two weeks. Either it was a failure in the memory chips or OS software garbage collection. They are purging and patching now. A few days of testing and perhaps Spirit is active again.

    The lockup happened just as they were going to drill into the rock they've been sitting in front of for nine days. Perhaps there was a drill issue too. When the rover's memory crashed, it tried to reboot its computer at least a hundred times.
  • Not 20MHz (Score:2, Informative)

    by kuyttendaele ( 115164 ) on Thursday January 29, 2004 @11:21AM (#8123658) Homepage
    But 20 MIPS
  • by Anonymous Coward on Thursday January 29, 2004 @11:23AM (#8123673)
    You seem to assume that a real-time OS is faster than a non-real-time OS; that's usually not the case. The difference between the two is that an RTOS has a guaranteed response time, not a faster one...

    It is not uncommon on hard real-time systems to disable the processor cache; it makes the processor slower, but the response time to an interrupt is easier to calculate. In an RTOS, interrupt latency must be PROVEN not to be longer than a constraint.
  • Re:Self-warming (Score:5, Informative)

    by JDevers ( 83155 ) on Thursday January 29, 2004 @11:24AM (#8123678)
    If I'm not mistaken, virtually all probes have some sort of radioisotope heater...

    All radioactivity is NOT the same when you are considering things like this. Saying the people who don't want nuclear-powered rockets should hate this as well or else they are hypocrites is tantamount to saying that people who don't like oil spills should bitch about how some motor oil ALWAYS stays in the plastic container it is shipped in. Not quite the same problem. After all, these things aren't much more radioactive than a Coleman lantern wick or a smoke detector element...
  • Re:Redundency Check? (Score:5, Informative)

    by vofka ( 572268 ) on Thursday January 29, 2004 @11:29AM (#8123714) Journal
    If I recall correctly, the Shuttle has 5 GPCs (General Purpose Computers), three of which are "online" at any one time.

    The online GPCs each carry out the same set of calculations (potentially each uses code designed to do the same thing, but written by different programmers), and they compare each other's results. If any single GPC is considered to be too far wrong, the offline GPCs submit their answers. The three GPCs that are in closest agreement then become the new online GPCs, and the remaining two go offline. The GPCs can reboot themselves if they are too far out of whack, if they fail in one of the "results elections", and of course when they are told to do so by the crew.

    Also, whenever a GPC is sent offline by one of the others, a specific caution indicator (and potentially the master caution indicator and klaxon) is activated, and the relevant error codes are shown on one of the forward CRTs. The error codes, along with other information such as the currently running program and the current mission phase, determine the crew's actions. Actions can range from simply disabling the master caution klaxon for the current alert all the way to hand-checking certain results and manual GPC restarts.

    This is all from memory (from about 5 years back), so some of this may have changed recently, particularly on Atlantis with the "glass cockpit" upgrade that happened 18 months or so ago, but the general gist should be about right (and I'm sure I'll soon know if it isn't!!)
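
    To illustrate the "closest agreement" selection, here is a toy sketch in C (my own illustration of the idea, NOT the actual Shuttle voting logic): given one result from each of five redundant computers, pick the trio with the smallest spread as the new online set.

    #include <stdio.h>
    #include <math.h>

    #define NUM_GPC 5

    /* Find the trio of results with the smallest max-min spread. */
    static void pick_online_trio(const double result[NUM_GPC], int online[3])
    {
        double best_spread = INFINITY;
        for (int a = 0; a < NUM_GPC - 2; a++)
            for (int b = a + 1; b < NUM_GPC - 1; b++)
                for (int c = b + 1; c < NUM_GPC; c++) {
                    double lo = fmin(result[a], fmin(result[b], result[c]));
                    double hi = fmax(result[a], fmax(result[b], result[c]));
                    if (hi - lo < best_spread) {
                        best_spread = hi - lo;
                        online[0] = a; online[1] = b; online[2] = c;
                    }
                }
    }

    int main(void)
    {
        /* Hypothetical outputs of one calculation on five machines;
         * machine 2 has drifted. */
        double result[NUM_GPC] = { 101.30, 101.31, 98.70, 101.29, 101.33 };
        int online[3];

        pick_online_trio(result, online);
        printf("new online set: GPC %d, %d, %d\n", online[0], online[1], online[2]);
        return 0;
    }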
  • Re:Wait a second... (Score:5, Informative)

    by Rootbear ( 9274 ) on Thursday January 29, 2004 @11:32AM (#8123748) Homepage
    Actually, they are running at 20MHz. I've seen several write-ups which clearly state that. The RAD6000 can apparently run at up to 33MHz, with a claimed 35MIPS. The rovers are "underclocked", probably due to power budget concerns.

    Go to
    http://www.iews.na.baesystems.com/space/rad6000/rad6000.html
    and click on the rover picture to get a PDF brochure, which gives the 33MHz/35MIPS figure.

    Rootbear
  • by britt ( 50456 ) on Thursday January 29, 2004 @11:32AM (#8123750) Homepage
    It wasn't an overflow. They ran out of inodes. They had lots of space, but no more inodes, which is why deleting files should fix things up.

    The first intergalactic filesystem bug
  • by GileadGreene ( 539584 ) on Thursday January 29, 2004 @11:34AM (#8123773) Homepage
    radiation-shielded, 20MHz PowerPC machines

    No, they're not.

    The processors in MER are RAD6000 [baesystems.com]'s, which are radiation-hardened versions of the RS/6000, the predecessor to the PowerPC (see this [baesystems.com] for details). The RAD6000's younger brother, the RAD750, is indeed a rad-hardened PowerPC.

    As an aside, there is a big difference between a radiation-shielded processor and a radiation-hardened processor. Shielding implies just sticking some kind of rad-absorbent material between the processor and the environment. A rad-hardened processor is actually manufactured in a different way - different gate layout, different design rules, often different materials (Silicon-on-Insulator is popular). These things are done to minimize or prevent the effects of single-event upsets (when a bit is flipped by high-energy particles) and single-event latchups (which basically turn a couple of gates into a glorified short-to-ground). The materials changes may also improve the overall total dose tolerance of the processor. The work required for redesign is one of the reasons that space-qualified rad-hard processors lag the commercial market. The NASA Office of Logic Design [klabs.org] has some good papers on space processors available online if you're interested in learning more.

  • by Anonymous Coward on Thursday January 29, 2004 @11:35AM (#8123775)
    Writing to flash takes a long time. The difference between RAM and flash will be very noticeable, even for a 20MHz processor. For example, a write to the Fujitsu "high-speed" flash devices takes 12.6 µs (per 16-bit word), according to Fujitsu's specs. For a 20 MIPS processor, 1 instruction is 50 ns, so modern RAM can "keep up" with it. In this situation, writing to flash would be more than 250 times slower than RAM, even with very fast flash.

    Disclaimer: Even faster flash devices may exist, but I don't know about them.
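
    For what it's worth, the arithmetic above checks out (the figures are the ones assumed in the post, not from any datasheet I can cite):

    #include <stdio.h>

    int main(void)
    {
        double flash_write_ns = 12600.0; /* 12.6 µs per 16-bit word */
        double instr_ns       = 50.0;    /* one instruction at 20 MIPS */

        /* prints ~252, i.e. "more than 250 times slower" */
        printf("flash write = %.0f instruction times\n", flash_write_ns / instr_ns);
        return 0;
    }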
  • Re:Self-warming (Score:5, Informative)

    by The Fun Guy ( 21791 ) on Thursday January 29, 2004 @11:36AM (#8123795) Homepage Journal
    Radioisotope thermoelectric power units need to be hot enough to allow for electricity to be generated by thermocouples placed between the unit and the heat sink (space). A quick Google search gives 200-500 watts of power generated from multiple interleaved stacks of plutonium-238 or strontium-90, average radioactive source strength of around 50,000 curies, depending on design.

    Radioisotope heaters use much less material, as they only need enough heat to keep the warm electronics box above -40F or so. From the Environmental Impact Statement in the Federal Register ([wais.access.gpo.gov][DOCID:fr10de02-54]):

    "Each rover would employ two [calibration] instruments that use small quantities of cobalt-57 (not exceeding 350 millicuries) and curium-244 (not exceeding 50 millicuries) as instrument sources. Each rover would have up to 11 RHUs that use plutonium dioxide to provide heat to the electronics and batteries on board the rover. The radioisotope inventory of 11 RHUs would total approximately 365 curies of plutonium."

    Nothing you'd like to swallow, but still, much smaller than a radioisotope power unit.
  • No, not PowerPC... (Score:3, Informative)

    by noselasd ( 594905 ) on Thursday January 29, 2004 @11:37AM (#8123796)
    No. It's a RAD6000 CPU. Not entirely your stock PowerPC chip.
    RAD6000 [baesystems.com]
  • by emil ( 695 ) on Thursday January 29, 2004 @11:38AM (#8123808)

    ...as the substrate of the chip, rather than a silicon wafer, so the chip was a "sapphire" chip rather than a silicon chip (although doped silicon could then be used to form transistors, as could Gallium Arsenide or Germanium, through the regular lithographic process).

    This is the classic "Silicon On Insulator." IBM has a process of embedding a layer of glass beneath the surface of a standard silicon wafer, allowing SOI using silicon substrates. This and their work with copper set them apart from the other large silicon transistor foundries (TSMC, Intel, etc.).

    The processors on the rovers are probably SOI, but I don't know which process is used.

  • by brain159 ( 113897 ) on Thursday January 29, 2004 @11:42AM (#8123841) Journal
    Flash "disk" controllers take care of wear-levelling automatically; even though you're writing to the same logical block number on the disk, it's actually not the same spot in the flash every time.
  • by questamor ( 653018 ) on Thursday January 29, 2004 @11:49AM (#8123916)
    One chart I've seen of IBM's POWER chips and their derivatives had an entire section devoted to the PPC chips, and the RAD6000 wasn't included in these, but was an offshoot to the side, branching just before the PPC601.

    By all other standards, however, it seems closely related in time and technology to the 601, which powered the PowerMac 6100, 7100, 8100, 9150, 7200, 8200 and 7500, as well as, I think, one of IBM's ThinkPads.

    The 601 is a very nice chip, and quite underestimated at what it can do at low clock speeds. If the RAD6000 is anywhere similar, I can understand why it was picked.
  • RAD6K (Score:5, Informative)

    by Anonymous Coward on Thursday January 29, 2004 @12:03PM (#8124069)
    I am an engineer who works with the RAD6K processor boards. A couple of observations here.

    1. The RAD6K really does run at 20 MHz. They're creakingly slow. They're spec'd to run at up to 33 MHz, but the customer can get them clocked at lower speeds (I've seen them run at 12.5 MHz). The only drawback is that the PCI bus is clocked at the same speed as the CPU. This is a mixed bag - a slower PCI bus helps improve signal integrity and decreases power consumption.
    2. The board is PCI, but NOT CompactPCI. There is a proprietary PCI connector and a proprietary PCI backplane. You cannot plug in commercial PCI products unless you have an adapter to interface to the proprietary PCI connectors.
    3. For those who are not aware, there are three types of memory being used on the rovers. There is the SRAM (the RAD6K boards use SRAMs, not DRAMs), the EEPROM, and apparently, flash RAM. The EEPROM and the SRAM are on the processor board itself - there is probably more EEPROM memory in the system on another board. The EEPROM usually holds the flight code, and there are usually two copies: an original version that was launched with the spacecraft, and one patched version made possible via uplinks.
    4. I am amazed at the presence of flash RAM. I am not aware of any rad-hardened flash RAM devices for spaceflight use. In addition to radiation hardness, the device must be made reliable with an approved QML-Q or V manufacturing flow. Radiation hardness is just icing on the cake; the key is that the device must be reliable enough to withstand temperature extremes, shock and vibration. So, I have yet to see a flash RAM device that can be used. I am aware of the chalcogenide-based RAMs, which essentially use the same substrates as CD-ROMs for memory cells. These products are hard to come by right now and are a high risk because we don't have sufficient data and flight heritage. A catch-22 in flight design is that if it hasn't flown before, we don't want to fly it. But at some point, someone has to fly the first generation (someone who is willing to take a huge risk). Anyway, the flash RAMs on the rovers are in all likelihood upscreened commercial products. In other words, an entire lot of commercial flash RAM dies may have been bought in a mass buy, packaged in-house or through a vendor, and then screened for reliability at extended specifications. This is not the same as a manufacturer guaranteeing the specs by designing the part from the outset with increased reliability in mind.
    5. Radiation shielding? Minimal at best! The RAD6K shields its SRAMs by placing them on the underside of the processor board and orienting the board so that particles hit the processor side instead of the RAM side. There is some degree of radiation shielding for particles of sufficiently low energy. The truly high-energy particles are going to get through, and the only thing that will truly stop them is shielding whose thickness is measured in feet. That amount of shielding is too heavy for launch. The best we can do is mitigate the effects of radiation by guaranteeing that devices can withstand a certain radiation dosage (measured in kRads) and by designing for latchup protection (latchup is a parasitic condition in which an ionizing particle impacts a transistor structure in a way that forms an SCR, initiating a runaway current condition that burns out the device). Radiation effects in the form of SEEs (single event effects), such as bit flips, can be mitigated by redundancy and voting circuits, memory scrubbing, and error checking using checksums/CRCs -- a sketch of the idea follows below.
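
    Two of those mitigations are easy to show in plain C (a minimal illustration of the idea only; real flight systems do this in hardware EDAC and with CRCs over larger regions):

    #include <stdint.h>

    typedef struct {
        uint32_t copy[3];   /* three redundant copies of one word */
    } tmr_word_t;

    /* Majority vote: each result bit is set if at least 2 of 3 copies
     * agree, so a single-event upset in one copy is outvoted. */
    uint32_t tmr_read(const tmr_word_t *w)
    {
        return (w->copy[0] & w->copy[1]) |
               (w->copy[1] & w->copy[2]) |
               (w->copy[0] & w->copy[2]);
    }

    /* Scrub pass: periodically rewrite all copies with the voted value,
     * healing latent upsets before a second one can accumulate. */
    void tmr_scrub(tmr_word_t *w)
    {
        uint32_t good = tmr_read(w);
        w->copy[0] = w->copy[1] = w->copy[2] = good;
    }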
  • Re:Wait a second... (Score:3, Informative)

    by Have Blue ( 616 ) on Thursday January 29, 2004 @12:12PM (#8124169) Homepage
    Superscalar refers to multiple execution units, not pipelining.
  • by LordNimon ( 85072 ) on Thursday January 29, 2004 @12:13PM (#8124178)
    It can't be intergalactic if it's in the same solar system. It could be intragalactic, but a better word is interplanetary.
  • Re:RAD6K (Score:5, Informative)

    by demachina ( 71715 ) on Thursday January 29, 2004 @12:58PM (#8124661)
    "A catch-22 in flight design is if it hasn't flown before, we don't want to fly it. But at some point, someone has to fly the first generation (someone who is willing to take a huge risk)."

    Or you fly it as a non-mission-critical experimental payload, which is what we did back in the day when I worked on avionics. You fly it as an experimental package so it gets the stress, but if it breaks, either it's not in the mission-critical loop, or if it is in the loop, you can switch back to proven hardware. I kind of assumed this would be a standard part of qualifying electronics for spaceflight as well, though it's obviously a lot more expensive. It's not feasible to test it on Mars due to the expense, but you could test it in geostationary orbit, where it will get lots of radiation and temperature extremes, as well as launch vibration and G's.
  • by barawn ( 25691 ) on Thursday January 29, 2004 @01:05PM (#8124743) Homepage
    I would have thought that the memory would be swapped out

    Swapped out? To where? You're being a bit recursive here. The operation failing is the initialization and reading of the flash. You can't swap to flash while you're initializing it.

    However, I doubt they implemented virtual memory on the thing. It's far too much overhead in the OS to actually implement paging, etc. They were probably just very, very careful about the amount of memory in use at any one time, as with most embedded systems. Someone, however, didn't check that in a massively fragmented flash filesystem, the directory read wouldn't take up the entire RAM. Oops.
  • Re:Wait a second... (Score:3, Informative)

    by Thuktun ( 221615 ) on Thursday January 29, 2004 @01:13PM (#8124828) Journal
    deitel99: The machines aren't as slow as the top post says... they don't run at 20MHz, they are "capable of carrying out about 20 million instructions per second". Depending on the complexity of the instructions, the processor actually runs several times faster than 20MHz.

    danheskett: That's an excellent point. A lot of people are thinking instruction = 1 cycle. The real world is that it's not unusual for an instruction to take 2, 4, 10, or even 100 cycles. The reality of the matter is that instructions can be anything from a single two bit sum to a floating point division. I see this mistake a lot...

    You both assume that the one who wrote the article didn't make the same mistake in the opposite direction.

    In this article about the Stardust probe [spaceflightnow.com], the RAD6000 is said to be "a radiation-hardened version of the PowerPC chip used on some models of Macintosh computers" which "can be switched between clock speeds of 5, 10 or 20 MHz".
  • by rshadowen ( 134639 ) on Thursday January 29, 2004 @01:55PM (#8125335)
    The RAD6000 is based on a POWER CPU called RSC (RIOS single chip) - it's not a PPC chip. This was a design that consolidated the 5 - 6 chip RIOS processor complex onto a single lower performance die for low end workstations. I worked on the development team at IBM.

    The RSC design played a key role in bringing Apple and Motorola together with IBM to create the PowerPC line of CPUs. The 601 was the first PPC and was basically a redesign of RSC. It supported both POWER and PPC architectures, although there were deviations from PPC, since the architecture was actually still being defined at the time we were working on the chip.

    The RAD6000 version of the design happened because IBM wanted to pursue some government contracts, so had the RSC specially qualified. Another group then took the design and performed the radiation hardening.

    After Pathfinder we had some cool IBM/Mars posters hanging around the building, but oddly enough they vanished very quickly...

  • by egomaniac ( 105476 ) on Thursday January 29, 2004 @02:33PM (#8125774) Homepage
    Are all the millions of dollars spent on fuel for the rockets?! Why the fuck is my home system more advanced than NASA's? You'd think they'd have at least used a design similar to a Dual G4 or maybe even a Dual Xeon or P4. Can someone explain to me why NASA gets millions of dollars if they are using 1990 equipment?

    A) They don't need to play Doom 3 up there. 20MHz is sufficient for almost anything you would want to do on Mars. Why send up more than you need to?

    B) Your computer runs far hotter and consumes far more power than the Mars rovers do. Power is at a premium when you're millions of miles away from the nearest electrical outlet.

    C) The rovers are radiation-hardened. Your system is not. Your computer would last about twelve minutes in space before it locked up. A big part of radiation hardening is using larger (and therefore slower) transistors.

    D) It takes years to certify a particular piece of equipment as spaceworthy. NASA isn't going to just pop in the latest and greatest Athlon and assume it will work "because the last one did". That means that anything flying into space is automatically going to be at least a few years behind the curve.
  • by greygent ( 523713 ) on Thursday January 29, 2004 @02:43PM (#8125907) Homepage
    The B-52G/H models used 3 ACUs, each composed of (if I recall correctly) 4 Z-80 processors. I'm not sure about the number of CPUs in each system, but I am sure of the processors. These systems were loaded (very slowly) via tape, and they ran the RADAR and bombing computer subsystems.

    I'm not sure which computers this article is referring to, but it may have been an earlier revision of the computer system in the B-52.

    The only other computer system I can think of in the B-52s was the ECM system, and since it was highly classified and I did not work on that system, I'm not overly sure.

    For a little extra karma whoring, I'll relate a funny story. Our base, which shall remain nameless, would fly early morning ECM sorties while people were driving into work.

    Their target signals for jamming? The radar guns the cops used in their speed traps to try and catch our folks driving into work. Apparently, the cops finally caught on and complained about the high number of equipment failures with their radar gun equipment.
  • by Anonymous Coward on Thursday January 29, 2004 @03:13PM (#8126278)
    I'm not defending VxWorks, but my preference for real-time design is to NOT use dynamic memory allocation (malloc) or dynamic anything else except during startup. If I need to protect a region of memory, I'll set up the MMU myself (statically, with no page swapping), since their basic MMU package is not very robust.

    BTW: VxWorks 5.5 does support FAT32.

    --Robert
  • by Anonymous Coward on Thursday January 29, 2004 @03:21PM (#8126379)

    If you are allowed to pour your heart out about how VxWorks sucks for your router/broadband project, then can I offer some clues and guesses as to why NASA might have chosen it?

    The article stated that NASA needed to be able to patch code while it was running, a very reasonable demand, as one can't just get the rocket back to Earth and start the whole thing over.

    Quote: "In addition to VxWorks' reliability, the system allows users to add software patches -- such as a glitch fix or upgrade -- without interruption while a mission is in flight. "We've always had that [feature] so you don't have to shut down, reload and restart after every patch,""

    IIRC from the National Geographic documentaries, the Spirit code was still being developed while the rover was on its way to the planet. (How is that for missing a deadline? They must have learned this one from the game industry ;-) ) I may be mixing things up with the orbiter, though. This would be a perfect explanation for why there appears to be a bug/situation which would have been found by running long tests collecting lots of "science data" on Earth.

    Now, you go on about the VxWorks malloc implementation (dynamic memory management features). NASA may actually have little need for having the OS manage the memory. It appears most of the collected data is stored in a filesystem on flash, ready for pickup by a communication system sending it back to Earth (by any available orbiter). Now, the problem with fragmentation is that it is likely to get worse as soon as it starts, but with 128 MB of RAM and *very* few processes, I don't see how you could get a problematic or fatal amount of fragmentation unless there are multiple processes collecting a lot of data at the same time, mallocing tiny amounts (less than a page) at a time, with little memory left to spare. Neither of these conditions seems likely to be met in any big way, but I haven't seen this code, of course.

    Now, you mention TCP/IP is bad, and I fully believe you. (Even though my Motorola cable modem claims to be running VxWorks, so consumer broadband must at least be possible with this RTOS ;-) ) But, obviously, NASA is not going to use TCP/IP as a protocol, and the drivers for the radio hardware are not going to be a standard part of VxWorks. I think these pieces of code for sending stuff over high-frequency duplex radio data links have been lying around at NASA (or defense contractors), tested and proven in many applications, just like many other components of this machine's software. This brings me to my other guess as to the choice of VxWorks: the rover has to pretty much plot its own course over the planet. It seems possible that the rover now on Mars uses the exact code for that which early prototypes used while spending months in American deserts. VxWorks might have been the OS of choice of the prototype engineers, and porting this code to another OS might introduce bugs, which goes for all the code NASA already had (tested, and perhaps even proven in space).

    Now, unlike VxWorks, I have run QNX (as has everyone who has had the time to go over here [qnx.com]; have a look, if only to brag about using another OS on your desktop). While I agree that it is a very, very nice OS, I could very well see it not having any features over VxWorks that would make it worth it for NASA to port all their current VxWorks stuff over, only to have to test it again while still not knowing for sure if it will run *exactly* like on VxWorks. And of course, any features VxWorks did lack (or rather did have while not being supposed to ;-) ) could have been fixed by NASA coders in no time, just as you did, only with the help of the source.

  • by Anonymous Coward on Thursday January 29, 2004 @03:22PM (#8126398)
    Your comments about VxWorks show me that you don't have well-established practical Real-Time OS (RTOS) experience and are trying to compare an RTOS to a server or desktop OS. That is quite a bit misleading. For example, consider the use of malloc()/free(). In an RTOS it does not make much sense to dynamically allocate and free memory at run time. The same thing applies to other system resources, like tasks/processes: from a real-time point of view, it takes quite a bit of time to create new tasks, semaphores, objects, memory allocations, etc. Most RTOS engineers who know what they are doing pre-allocate all the memory they need at boot time, start all the processes they will need, and so on. In other words, you allocate all your system resources at boot time so you keep your real-time characteristics. An RTOS system is static: no dynamic process creation, no dynamic memory allocation (i.e. no unexpectedly running out of memory), etc. You start everything up and account for your needs. A very different model from what you are probably used to in a desktop/server OS. Think of it as a static appliance. Give me one good reason why you can't preallocate memory and all the tasks you will need.
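
    A bare-bones version of that pattern, in generic C rather than any particular RTOS API (an illustrative sketch only; a real system would guard the list with an interrupt lock or semaphore):

    #include <stddef.h>

    #define MAX_MSGS 64
    #define MSG_SIZE 128

    typedef struct msg {
        struct msg *next;
        char        payload[MSG_SIZE];
    } msg_t;

    static msg_t  msg_pool[MAX_MSGS];   /* entire budget fixed at build time */
    static msg_t *free_list;

    /* Called once at boot: chain every buffer onto the free list. */
    void pool_init(void)
    {
        free_list = NULL;
        for (int i = 0; i < MAX_MSGS; i++) {
            msg_pool[i].next = free_list;
            free_list = &msg_pool[i];
        }
    }

    /* O(1), deterministic, and can never fragment: allocation and free
     * just push and pop fixed-size buffers. */
    msg_t *msg_alloc(void)
    {
        msg_t *m = free_list;
        if (m != NULL)
            free_list = m->next;
        return m;   /* NULL means the boot-time budget was wrong */
    }

    void msg_free(msg_t *m)
    {
        m->next = free_list;
        free_list = m;
    }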

    Yes, memory protection has its benefits. Speed is not one of them. Remember, this is an RTOS. How many RTOSes do you know of that support memory protection? Hmm... one, I believe, and that's VxWorks AE. Which you say is slow. Gee. If you want a low-latency, fast RTOS, memory protection quickly falls off your "must have" list.

    So VxWorks supports FAT16. So that makes it suck?
    It also supports FAT32, NFS, a flash filesystem, etc. So all those points make it suck too. I see your reasoning. ;-P

    VxWorks, like every other true RTOS, does not use the system clock exclusively to multitask as you say. In fact, round-robin multitasking, which uses the system clock, is turned off by default. Unlike in a server/desktop OS, a context switch is not generated strictly via an interrupt (system clock, device interrupts); context switches can occur at any time as your code generates events -- for example, giving a semaphore to unblock a task, sending a message to a message queue which unblocks a waiting receiver, or trying to get hold of an object which is in use (this would block you). This does not incur high latency and is almost immediate, because all code runs in processor supervisor mode, i.e. no traps are needed to communicate from one task to another. This goes along with the no-memory-protection reasoning. It has benefits. Let me guess: you didn't read the manual or take a VxWorks 101 class, right?
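
    The semaphore case looks roughly like this with the classic semLib/taskLib calls (from memory of the VxWorks 5.x API, so treat the details as approximate): a semGive() from an interrupt handler immediately readies the blocked worker, with no clock tick involved.

    #include <vxWorks.h>
    #include <semLib.h>
    #include <taskLib.h>

    static SEM_ID dataReady;

    /* Worker: blocks on the semaphore, consuming no CPU until signaled. */
    static int workerTask(void)
    {
        for (;;) {
            semTake(dataReady, WAIT_FOREVER);
            /* ... process the device data here ... */
        }
    }

    /* Interrupt handler: giving the semaphore causes a context switch
     * right away if the worker outranks whatever was running. */
    void deviceIsr(void)
    {
        semGive(dataReady);
    }

    void startDemo(void)
    {
        dataReady = semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
        taskSpawn("tWorker", 50, 0, 4096, (FUNCPTR)workerTask,
                  0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    }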

    NASA has been using VxWorks on most of their projects since 1980! I think that says a lot by itself. Thank you for your "expert" opinion in helping people judge the quality of an RTOS.
  • by AaronW ( 33736 ) on Thursday January 29, 2004 @05:20PM (#8127813) Homepage
    While I would agree with you that avoiding malloc and preallocating memory is the way to go, it is not always possible. In my case, we are using various 3rd-party libraries, and changing them to use static memory allocation would be prohibitive. In at least one case, the third-party library source code was not available. Also, in many cases the dynamic nature of some algorithms requires dynamic memory management. You cannot statically allocate everything, especially in a limited memory environment.

    I know we're not the only ones to have been burned by Wind River's malloc. I know several major companies that also had to replace Wind River's code.

    As far as being able to dynamically replace code, VxWorks isn't alone in that. Numerous other RTOSes out there can do the same thing, including QNX. QNX even supports the concept of a hot standby process to take over if the main process dies.

    To give you an idea of how Wind River's malloc works, they keep a linked list of fragments sorted from the smallest to the largest. When you try to allocate a block, it walks the linked list until it finds a block large enough. Likewise, when you free a block, it checks whether it can coalesce the block with a neighboring block. It then goes through the linked list looking for a slot to insert the free block.
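
    In toy form, that allocator looks something like this (my reconstruction from the description above, not Wind River source; neighbor coalescing is omitted):

    #include <stddef.h>

    typedef struct frag {
        size_t       size;
        struct frag *next;   /* list kept sorted smallest to largest */
    } frag_t;

    static frag_t *free_list;

    /* First fit over a size-sorted list is effectively best fit: the
     * first fragment big enough is also the smallest one that fits.
     * The walk is O(n) in fragments, which is why heavy fragmentation
     * makes allocation slow. */
    frag_t *toy_alloc(size_t size)
    {
        frag_t **pp = &free_list;
        while (*pp != NULL) {
            if ((*pp)->size >= size) {
                frag_t *hit = *pp;
                *pp = hit->next;   /* unlink and hand it out */
                return hit;
            }
            pp = &(*pp)->next;
        }
        return NULL;               /* no fragment large enough */
    }

    /* Free: walk again to re-insert at the point that keeps the
     * size ordering. */
    void toy_free(frag_t *f)
    {
        frag_t **pp = &free_list;
        while (*pp != NULL && (*pp)->size < f->size)
            pp = &(*pp)->next;
        f->next = *pp;
        *pp = f;
    }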

    Yes, VxWorks may have been around since the 80s, but that's part of the problem too, and it is showing its age. In the 80s, embedded processors typically did not have MMUs. Now MMUs are quite common in the more powerful embedded processors.

    You say you can't have low latency and memory protection? QNX proves that you can. It is low latency and *very* robust. If your driver dies, no problem, restart it. Timesys Linux also has very low latency, although not as low as QNX. Timesys also has an interesting feature where you can guarantee CPU and networking resources. I can schedule a task to be guaranteed 5.8ms of execution every 8.3ms, and it will guarantee that the task gets the CPU time allotted to it with the desired resolution. This is without increasing the system tick rate (usually 10ms). Timesys can also schedule a task to be higher priority than an interrupt. I'm not as familiar with QNX's scheduler, but it's also quite flexible from what I've heard.

    As far as FAT goes, it is not a robust filesystem. It never has been. If the FAT gets corrupted or a directory entry gets corrupted, it's difficult to recover. Other than possibly having 2 copies of the FAT cluster table, any corruption can be difficult to repair. If the FAT table gets corrupted, which table is corrupt and which is not? If a directory entry gets corrupted, it can be impossible to fix. For flash memory, unless you are using a device with special wear-leveling, FAT is about the worst choice, since any file write that changes the size of a file requires a write to the directory entry and possibly the FAT table. If the table gets corrupted and you don't run a repair operation (which often ends up leaving orphaned files as lost clusters), the file system can happily corrupt itself to death. Why do you think every time DOS/Windows9x/ME crashed it had to repair the disk with scandisk? FAT is a poorly designed file system that was originally designed for 160K floppies and scales poorly. FAT32 is an improvement, but it's still not very robust. For flash, something like Linux's journalling flash file system 2 (JFFS2) [redhat.com] is a much better fit. More information on VxWorks file system support can be found here [xs4all.nl].

    Basic VxWorks information can be found at http://www.slac.stanford.edu/exp/glast/flight/docs/VxWorks_2.2/vxworks/guide/ [stanford.edu].

  • by AaronW ( 33736 ) on Thursday January 29, 2004 @06:05PM (#8128378) Homepage
    Take a look at QNX. I'm not very familiar with many other RTOSes. RTEMS is another interesting one I looked at (it's open source). If you do decide to go with VxWorks, make sure you get the source code. A word of warning, though: VxWorks TCP/IP support isn't all that great and performance is mediocre. It's based on an ancient BSD stack.

    QNX has a lot of reliability features built in that are not present in VxWorks, Linux, or RTEMS. It can take periodic snapshots of a task and recover if a task dies.

    Also, field debugging of VxWorks can be a real pain. It does not generate a core file you can debug later.

    Timesys has very good real-time support, but the context switching time is on the order of 10 µs depending on the processor; QNX is better.

    QNX is more like a Unix environment, with real threads and processes. QNX relies mostly on efficient message passing between tasks. QNX also has much better TCP/IP support than VxWorks, including support for IPv6. From what I can tell, development and debugging in QNX is much easier than in VxWorks. QNX does not have as many priority levels as VxWorks: QNX Neutrino supports 64 priorities, VxWorks supports 256, and Timesys Linux can support thousands.

    QNX also has built-in support for distributed processing as long as there is an interconnect. It also supports SMP (VxWorks does not).

    VxWorks, on the other hand, is like a single task with many threads. A "task" in VxWorks is more like a thread, in that all memory is globally accessible. Any function can call any other function.

    I haven't worked with QNX or Timesys personally. I've spent the last 4 years developing for VxWorks Tornado II. The group where I work that deals with OS-related issues has more or less decided to use either QNX or Timesys for our next-generation software. Timesys looks good from the tools perspective and the wide Linux experience out there, plus the fact that currently our product runs on a mixture of Unix and VxWorks, plus many of the drivers we need are available for Linux and would need to be ported to QNX. Some of the key software we are using has been ported to QNX, though, and the reliability of QNX is also of interest. QNX has also had some nice design wins, like Cisco. Since we make routing equipment, we tend to pay attention to things like that.

    Also, beware that Wind River's support isn't very good, in our experience. It basically boils down to "what support?" even with a support contract. Wind River has laid off a lot of people lately, and I'm sure that hasn't helped the company. They've been bleeding money like crazy and are hurting pretty badly. They've been losing market share to other embedded RTOSes and Linux. Hell, Wind River has ported their development tools to support embedded Linux (I wonder why?).

    As far as real-time Linux is concerned, I looked at several of them. RT-Linux requires that RT tasks talk to the RT kernel and use a separate API and drivers from the Linux tasks. It's basically Linux running on top of an RT kernel. Monta Vista does not provide hard real-time, and when I last spoke with them they had no solution for priority inversion. Timesys modified the stock Linux kernel so RT tasks don't need to use a separate set of APIs from non-RT tasks, and Timesys can guarantee timing, both for the CPU and networking. For soft real-time, Monta Vista seems to be pretty popular. The people I spoke with who used it said it was quite reliable, but couldn't comment on support since they never needed any.

    -Aaron
