
Same Programs + Different Computers = Different Weather Forecasts 240

Posted by timothy
from the climate-change-without-leaving-the-room dept.
knorthern knight writes "Most major weather services (the US NWS, Britain's Met Office, etc.) have their own supercomputers and their own weather models. But some models are used globally. A new paper has been published comparing the outputs of one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to the accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper deals with 10-day weather forecasts. Weather forecasts are generally done in steps of one hour, i.e. the output from hour 1 is used as the starting condition for the hour-2 forecast, the output from hour 2 is used as the starting condition for hour 3, and so on. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"


  • by 140Mandak262Jamuna (970587) on Sunday July 28, 2013 @09:34AM (#44405711) Journal
    Almost all CFD (Computational Fluid Dynamics) simulations use time marching of the Navier-Stokes equations. Despite being very nonlinear and very hard, one great thing about them is that they naturally parallelize very well. They partition the solution domain into many subdomains and distribute the finite volume mesh associated with each subdomain to a different node. Each mesh is also parallelized using GPUs. At the end of the day these threads complete execution at slightly different times and post updates asynchronously. So even if you use the same OS and the same basic cluster, if you run it twice you get two different results if you run it far enough, like 10 days. I am totally not surprised that if you change the OS, the architecture, big-endian/little-endian conventions, the math processor, or the GPU brand, the solutions of a 10-day forecast differ a lot.
    • Coincidentally, I went to a presentation a couple weeks ago that largely focused on HPC CFD work. The presenter's company doesn't use GPUs because things like memory bandwidth are more important, but that aside, the thing that surprised me the most was that the simulations are not independently clocked (self-clocking) - they use the hardware clock, so things like latency and state are extremely important. Self-clocking would be too expensive with current hardware. Depending on the HPC cluster setup (and
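
The asynchronous-updates point above can be sketched in a few lines of Python (a toy illustration, not the actual CFD code): floating-point addition is not associative, so the order in which workers' partial results are combined changes the final answer.

```python
# Floating-point addition is not associative, so the order in which
# parallel workers' partial sums arrive changes the final total.
chunks_a = [1e16, 1.0, -1e16]   # one arrival order of three partial results
chunks_b = [1e16, -1e16, 1.0]   # same numbers, different arrival order

total_a = sum(chunks_a)   # (1e16 + 1.0) absorbs the 1.0, then cancels to 0
total_b = sum(chunks_b)   # the big terms cancel first, so the 1.0 survives

print(total_a)  # 0.0
print(total_b)  # 1.0
```

Iterate a feedback loop on top of discrepancies like this for 10 simulated days and the runs part company entirely.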

  • by slashgordo. (2772763) on Sunday July 28, 2013 @09:38AM (#44405733)
    When doing spice simulations of a circuit many years ago, we ran across one interesting feature. When using the exact same inputs and the exact same executable, the sim would converge and run on one machine, but it would fail to converge on another. It just happened that one of the machines was an Intel server, and the other was an AMD, and we attributed it to ever so slightly different round off errors between the floating point implementation of the two. It didn't help that we were trying to simulate a bad circuit design that was on the hairy edge of convergence, but it was eye opening that you could not guarantee 100% identical results between different hardware platforms.
    • by Livius (318358) on Sunday July 28, 2013 @10:12AM (#44405933)

      Well, Arrakis melange is a pretty strong drug, so consistency in spice simulations is probably a little too much to expect.

      (Yes, I know the parent really meant SPICE [wikipedia.org].)

    • by rossdee (243626) on Sunday July 28, 2013 @10:14AM (#44405945)

      "When doing spice simulations "

      Weather forecasting on Arrakis is somewhat tricky: not only do you have the large storms, but also giant sandworms.
      (And sabotage by the Fremen)

    • by Cassini2 (956052) on Sunday July 28, 2013 @01:09PM (#44407015)

      This often happens when the simulation results are influenced by variations in the accuracy of the built-in functions. Every floating point unit (FPU) returns an approximation of the correct result, and the accuracy of that approximation varies considerably when built-in functions like sqrt(), sin(), cos(), ln(), and exp() are considered. Normally, the accuracy of these results is pretty high. However, Intel's original 8087 FPU was early hardware, and it necessarily made coarser approximations.

      At one point, Cyrix released an 80287 clone FPU that was faster and more accurate than Intel's 80287 equivalent. This broke many programs. Since then, Intel and AMD have been developing FPUs that are compatible with the 8087, ideally at least as accurate, and much faster. The GPU vendors have been doing something similar, however in video games, speed is more important than accuracy. For compatibility reasons (CPUs) and speed reasons (GPUs), vendors have focused on returning fast, compatible and reasonably accurate results.

      In terms of accuracy, the results of the key transcendental functions, exponential functions, logarithmic functions, and the sqrt function should be viewed with suspicion. At high-accuracy levels, the least-significant bits of the results may vary considerably between processor generations, and CPU/GPU vendors. Additionally, slight differences in the results of double-precision floating point to 64-bit integer conversion functions can be detected, especially when 80-bit intermediate values are considered. Given these approximations, getting repeatable results for accuracy-sensitive simulations is tough.

      It is likely that the article's weather simulations and the parent poster's simulations have differing results due to the approximations in built-in functions. Inaccuracies in the built-in functions are often much more significant than the differences due to round-off errors.

      • by matfud (464184) on Sunday July 28, 2013 @03:35PM (#44407797) Homepage

        Trig functions are nasty. CPUs (FPUs) tend to use lookup tables to get a starting point and then iteratively refine it to provide more accuracy. How they do this depends on the precision and rounding of the intermediate steps and on how many iterations they undertake. Very few FPUs produce IEEE-compliant results for trig. Even simple math operations tend to be rounded and kept at different precisions on different processors (let alone the instruction reordering done by the CPU and compiler).

        GPUs are great performance-wise at float (sometimes double) math but tend to be poor at giving the result you expect. IEEE-754 does not remove these issues; it just ensures that the issues are always the same.

        It is why languages like Java have java.lang.Math and java.lang.StrictMath for trig, and the strictfp keyword for float and double primitives. (Math often just delegates to StrictMath, but does not have to.) strictfp can kill performance, as a lot of fixups have to be done in software in the better cases (hotspot compilation can also be hindered by it), and in the worst cases the entire simple operation (+,-,*,/) has to be performed in software.

        • by matfud (464184)

          As an additional comment:

          There are reasons why people will pay a lot of money to use a POWER 6 and later processors

    • by AmiMoJo (196126) * <mojo@NOspaM.world3.net> on Sunday July 28, 2013 @02:18PM (#44407381) Homepage

      In theory both should have been the same, if they stuck rigidly to the IEEE specifications. There may be other explanations though.

      Sometimes compilers create multiple code paths optimized for different CPU architectures. One might use SSE4 and be optimized for Intel CPUs, another might use AMD extensions and be tuned for performance on their hardware. There was actually some controversy when it was discovered that Intel's compiler disabled code paths that would execute quickly on AMD CPUs just because they were not Intel CPUs. Anyway, the point is that perhaps one machine was using different code paths and different SIMD instructions, which operate at different word lengths. Compilers sometimes extend a 64-bit double to the 80-bit x87 registers, for example.

      Or one machine was a Pentium. Intel will never live that one down.

  • by Barbarian (9467) on Sunday July 28, 2013 @09:41AM (#44405759)

    The x86 architecture, since the 8081, has double precision 64 bit floats, and a special 80 bit float--some compilers call this long double and use 128 bits to store this. How does this compare to other architectures?

    • by sstamps (39313)

      1) There never was any such thing as an 8081.

      2) The earliest Intel math coprocessor was the 8087, for the 8086. The 80-bit float was a special temporary-precision representation which could be stored in memory, but was otherwise unique to the Intel MCP architecture.

    • IBM 360 and 370 mainframes have had 128 bit floating point since the 60's
    • The x86 architecture, since the 8081, has double precision 64 bit floats, and a special 80 bit float--some compilers call this long double and use 128 bits to store this. How does this compare to other architectures?

      The 80 bit format is not in any way "special", it is the standard extended precision format. Unfortunately, PowerPC didn't support it :-) Compilers tend to use 128 bits to store it, the hardware actually reads and writes 80 bits. In practice, long double isn't used very much.

      The real difference is elsewhere: 1. A C or C++ compiler can decide which precision to use for intermediate results. 2. A C or C++ compiler can decide whether fused multiply-add is allowed. 3. Java doesn't allow extended precision b
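
Points 1 and 2 can be illustrated with a small Python sketch (the values are contrived so that the exact product has a tail bit a double cannot hold; exact rationals stand in for what a hardware fused multiply-add effectively does):

```python
from fractions import Fraction

# Contrived operands: the exact value of a*a has a 2**-54 tail that a
# 53-bit double significand cannot represent.
a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

# Rounded twice: a*a is rounded to a double, THEN c is added.
twice_rounded = a * a + c
print(twice_rounded)   # 0.0 -- the 2**-54 tail was rounded away

# Rounded once: form a*a + c exactly, round only the final result.
# This is effectively what a fused multiply-add instruction does.
once_rounded = float(Fraction(a) * Fraction(a) + Fraction(c))
print(once_rounded)    # 5.551115123125783e-17, i.e. 2**-54
```

So whether the compiler emits FMA or separate multiply/add legitimately changes the last bits of the result, with both answers being "correct".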

  • Chaos (Score:5, Interesting)

    by pcjunky (517872) <walterp@cyberstreet.com> on Sunday July 28, 2013 @09:46AM (#44405781) Homepage

    This very effect was noted in weather simulations back in the 1960's. Read Chaos: The Making of a New Science, by Jmaes Gleick.

    • by Trepidity (597)

      Was noted in actual weather systems as well (at least as far as we understand them), which is part of what makes it particularly tricky to avoid in simulations. It's not only that our hurricane track models, for example, are sensitively dependent on parameters, but also that real hurricane trajectories appear to be sensitively dependent on surrounding conditions.

    • by Jmaes Gleick.

      Perfect example of the butterfly effect and floating point errors in weather. Over time, it can even change a person's name who wrote a book on weather simulations in the 60's. I bet no one predicted that!

      • by jamesh (87723)

        by Jmaes Gleick.

        Perfect example of the butterfly effect and floating point errors in weather. Over time, it can even change a person's name who wrote a book on weather simulations in the 60's. I bet no one predicted that!

        I did, but nobody listened to me until it was too late.

  • by Impy the Impiuos Imp (442658) on Sunday July 28, 2013 @10:05AM (#44405883) Journal

    This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.

    With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.

    Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.

    This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.

    To give yourself a feel for what's going on, imagine knocking a billiard ball on a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order with two different (very minor differences) angles, some balls are completely missing other balls. There's your entire butterfly effect.

    Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind direction, humidity measurement every cubic foot, you could only predict the weather accurately to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)

    Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.

    Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)
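
The "saved only the first 5 digits" accident is easy to re-enact with a toy chaotic map (a Python sketch, not the original weather model): iterate the logistic map from two starting points that agree to five decimal places.

```python
# Iterate the fully chaotic logistic map x -> 4*x*(1-x) from an in-memory
# value and from the same value truncated to 5 digits "on the printout".
r = 4.0
x_full, x_trunc = 0.123456789, 0.12345
max_gap, gap_at_5 = 0.0, 0.0
for step in range(1, 61):
    x_full  = r * x_full  * (1.0 - x_full)
    x_trunc = r * x_trunc * (1.0 - x_trunc)
    gap = abs(x_full - x_trunc)
    max_gap = max(max_gap, gap)
    if step == 5:
        gap_at_5 = gap

print(gap_at_5)  # still tiny: the two runs agree closely at first
print(max_gap)   # order one: within 60 steps the runs fully decorrelate
```

The tiny bit at the end doesn't average away; it multiplies up until it dominates the entire result, exactly as described above.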

  • The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time.

    In other words, they all gave different answers, but each one was equally certain that *it* was right.

    • they all gave different answers, but each one was equally certain that *it* was right.

      Perhaps that is where politicians got the idea from?

  • by Anonymous Coward

    They really need to standardize on what butterflies to use.

  • by 140Mandak262Jamuna (970587) on Sunday July 28, 2013 @10:58AM (#44406201) Journal
    These numerical simulation codes can sometimes do funny things when you port from one architecture to another. One of the most frustrating debugging sessions I had was when I ported my code to Linux. One of my tree class's comparison operators evaluates the key and compares the calculated key with the value stored in the instance. It was crapping out on Linux and not on Windows. I eventually discovered Linux was using 80-bit registers for the floating point computation, while the stored value in the instance was truncated to 64 bits.

    Basically they should be happy their code ported to two different architectures and ran all the way. Expecting the same results from processes that behave chaotically is asking too much.
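
A minimal Python sketch of why that comparison operator bites (the 80-bit register detail can't be reproduced here, but any mathematically equivalent recomputation shows the same failure mode):

```python
import math

# A stored key versus the "same" key recomputed later: mathematically
# equal, bit-for-bit different.  Here the difference comes from evaluation
# order; in the original bug it came from an 80-bit register comparing
# unequal to its own 64-bit stored copy.
stored_key = 0.1 + 0.2       # computed once, stored as a 64-bit double
recomputed = 0.3             # the same quantity, arrived at differently

print(stored_key == recomputed)              # False
print(math.isclose(stored_key, recomputed))  # True: compare with a tolerance
```

The usual fix is exactly this: never use exact equality on recomputed floats; compare against a tolerance instead.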

  • it's called Binary Coded Decimal (BCD) [wikipedia.org] and it works well. plenty of banks still use it because it's reliable and works. it's plenty slower but it's accurate regardless of the processor it's used on.

    • by HornWumpus (783565) on Sunday July 28, 2013 @12:01PM (#44406569)

      A little knowledge is a dangerous thing.

      Get back to us when you've recompiled the simulation using BCD and then realize that there is still rounding. .01 being a repeating fraction in binary floating point is a separate issue.

    • BCD is no better than fixed point binary in this instance. The banking industry relies on it because we use decimalized currency and it eliminates some types of errors to carry out all computations in decimal. For simulation inputs you're no better off than if you use a plain binary encoded number.

    • by blueg3 (192743)

      Problem discovered decades ago. Called "chaos theory". Turns out that for iterated feedback systems, even arbitrarily precise stored numbers accumulate round-off error eventually. Usually more quickly than people anticipate.

      People continue not to understand this admittedly subtle point, proceed to suggest known-bad solutions.
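
A quick Python illustration of both sides of this subthread: decimal arithmetic does make 0.1 exact, but anything that doesn't terminate in base 10 still gets rounded, so it only moves the problem.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

# What 0.1 actually is as a binary double -- a repeating fraction, rounded:
tenth_as_double = Fraction(0.1)
print(tenth_as_double)     # 3602879701896397/36028797018963968, not 1/10

# Decimal (the software cousin of BCD) makes 0.1 exact...
print(Decimal('0.1') * 3)  # 0.3, exactly

# ...but anything that doesn't terminate in base 10 is still rounded:
getcontext().prec = 28
third = Decimal(1) / Decimal(3)
print(third)               # 0.3333333333333333333333333333
```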

  • by Technomancer (51963) on Sunday July 28, 2013 @11:53AM (#44406517)

    Pretty much most iterative simulation systems like weather simulation will behave this way. When the result of one step of the simulation is the input for another step any rounding error will possibly get amplified.
    Also see Butterfly Effect https://en.wikipedia.org/wiki/Butterfly_effect (not the movie!).

  • by Anonymous Coward

    Floating Point arithmetic is not associative.

    Everyone who reads Stack Overflow knows this, because everyone who doesn't know it posts to Stack Overflow asking why they get weird results.

    Everyone who does numerical simulation or scientific programming work knows this because they've torn their hair out at least once wondering if they have a subtle bug or if it's just round-off error.

    Everyone who does cross-platform work knows this because different platforms implement compilers (and IEEE-754) in slightly diff
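
The canonical demonstration, in Python:

```python
# Same three operands, two groupings, two different answers:
# floating-point addition is not associative.
a, b, c = 0.1, 0.2, 0.3

left  = (a + b) + c
right = a + (b + c)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A compiler or parallel runtime that regroups these sums is therefore free to change the result without violating IEEE-754 on any single operation.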

  • by alanw (1822) <alan@wylie.me.uk> on Sunday July 28, 2013 @12:03PM (#44406577) Homepage

    Edward Lorenz discovered back in 1961 that floating point truncation causes weather simulations to diverge massively.
    This was the foundation of chaos theory, and it was Lorenz who coined the term "Butterfly Effect".

    http://www.ganssle.com/articles/achaos.htm [ganssle.com]

    • by alanw (1822)

      another link: http://www.aps.org/publications/apsnews/200301/history.cfm [aps.org]

      Instead of starting the whole run over, he started midway through, typing the numbers straight from the earlier printout to give the machine its initial conditions. Then he walked down the hall for a cup of coffee, and when he returned an hour later, he found an unexpected result. Instead of exactly duplicating the earlier run, the new printout showed the virtual weather diverging so rapidly from the previous pattern that, within just a
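
Lorenz's accident can be re-enacted with a crude forward-Euler integration of his equations (a sketch with an assumed step size and restart point, not his original setup):

```python
# Lorenz's 1961 accident, re-run: restart the integration from printout
# precision (3 decimals) and watch it diverge from the full-precision run.
def step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

full = (1.0, 1.0, 1.0)
for _ in range(1000):          # integrate onto the attractor first
    full = step(full)

restarted = tuple(round(v, 3) for v in full)   # "typed in from the printout"
max_gap = 0.0
for _ in range(3000):
    full, restarted = step(full), step(restarted)
    max_gap = max(max_gap, abs(full[0] - restarted[0]))

print(max_gap)   # grows to the size of the attractor itself
```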

  • I didn't know anyone was still using the old Pentiums anymore.
  • by goodmanj (234846) on Sunday July 28, 2013 @02:01PM (#44407291)

    This is what chaotic systems do. Not to worry, it doesn't change the accuracy of the forecast.

  • by slew (2918) on Sunday July 28, 2013 @02:18PM (#44407379)

    A better article [wattsupwiththat.com]...

    From what I can gather, the code was well scrubbed: the single-processor, threaded, and message-passing (MPI) versions produce bit-identical results, indicating no vectorization errors. Machine rounding differences still caused problems.

    Since all the platforms were IEEE 754 compliant and the code was mostly written in Fortran 90, I'm assuming that one of the main contributors to this rounding is the evaluation order of terms, and perhaps the way the double Fourier series and spherical harmonics were written.

    Both SPH and DFS operations use sine/cosine evaluations, which vary a great deal from platform to platform (since generally they only round within 1ulp, not within 1/2ulp of an infinitely precise result).

    I remember many moons ago, when I was working on fixed-point FFT accelerators, we were lazy and generated sine/cosine tables on the host platform (x86), neglecting the fact that with different compilers and different optimization levels on the same platform we got twiddle-factor tables that were different (off by one).

    With one bug report, we eventually tracked it down to different intrinsics (x87 FSIN or FSINCOS) being used, and sometimes library routines instead. Ack... In later library releases we compiled in a whole bunch of pregenerated tables to avoid this problem.

    Of course, putting in a table or designing your own FSIN function for a spherical harmonic or Fourier series numerical library solver might be a bit out of scope (not to mention it would tank the performance), so I'm sure that's why they didn't bother to make the code platform independent w/ respect to transcendental functions. Although with Fortran 90, it seems like they could have fixed the evaluation order issues (with appropriate parentheses to force a certain evaluation order, something you can't always rely on in C).

  • Handbrake transcodes video as a multi-threaded application. I have yet to try it, but if I re-encoded the same video multiple times from the same source, would I get different output files, as judged by an MD5 or SHA1 checksum?

  • by LF11 (18760) on Sunday July 28, 2013 @03:43PM (#44407831) Homepage

    It is surprising how quickly certain rounding errors can add up. I've had the dubious pleasure of writing an insurance rating algorithm based on multiplying tables of factors. The difference between half-up and banker's round at 6 decimal places makes for rating errors totalling > 50% of the expected premium in a surprisingly small number of calculations. It's one thing to know about error propagation from a theoretical standpoint, but it's quite another to see it happen in real life.

    I sympathize with the weather forecasters.
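
The half-up versus banker's rounding bias is easy to demonstrate with Python's decimal module (the amounts below are contrived to land on exact rounding ties, not actual rating factors):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Ten amounts that land exactly on a rounding tie at the third decimal.
cent = Decimal('0.01')
amounts = [Decimal(i) / 1000 for i in range(5, 100, 10)]  # 0.005 ... 0.095

half_up = sum(a.quantize(cent, rounding=ROUND_HALF_UP) for a in amounts)
bankers = sum(a.quantize(cent, rounding=ROUND_HALF_EVEN) for a in amounts)

print(half_up)            # 0.55 -- half-up rounds every tie away from zero
print(bankers)            # 0.50 -- half-even rounds ties up only half the time
print(half_up - bankers)  # 0.05 -- a systematic bias, not random noise
```

Half-up is biased upward on ties, so across thousands of rating calculations the error compounds in one direction rather than cancelling, which is exactly why the two modes diverge so badly.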

  • Didn't we know this? Take forecasts with a grain of salt because they could be wrong?

  • This problem is not going to go away unless/until computers start doing their math rationally and symbolically. That is, with fractional results stored as fractions with no rounding. Where irrational constants are used in calculations, they'll have to be carried through in symbolic form as you would using pencil and paper. That is, the computer actually stores a representation of 1/2pi, NOT 1.570796327.

    Of course, that leaves the 'minor matter' of performance.

    • by blueg3 (192743)

      These are non-algebraic simulations. Even symbolic math libraries -- of which there is no shortage -- cannot do better.
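
Both sides of this exchange can be seen in a few lines with Python's exact rationals: no round-off anywhere, but the "minor matter of performance" shows up immediately as fraction blow-up.

```python
from fractions import Fraction

# Iterate x -> r*x*(1-x) with exact rationals: every step is exact, but
# the number of digits needed to store the answer roughly doubles each
# iteration, so the cost grows exponentially.
r, x = Fraction(7, 2), Fraction(1, 3)
for step in range(1, 16):
    x = r * x * (1 - x)
    if step % 5 == 0:
        print(step, "denominator bits:", x.denominator.bit_length())
```

After only 15 iterations the denominator needs tens of thousands of bits; a 10-day forecast takes millions of such steps, which is why exact arithmetic is a non-starter here.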

  • I've seen Microsoft Access do the same thing. Apparently Person-B had loaded a slightly different OS date-handler DLL because they found a bug for date patterns of a specific country they happened to be interested in once. A specific spot on a report that calculated date differences thus produced slightly different answers than if run on the PC of Person-A, making the final totals not add up the same.
