Same Programs + Different Computers = Different Weather Forecasts

knorthern knight writes "Most major weather services (the US NWS, Britain's Met Office, etc.) have their own supercomputers and their own weather models, but some models are used globally. A new paper has been published comparing the output of one such program on different machines around the world. Apparently, the same code running on different machines can produce different results due to the accumulation of differing round-off errors. (The handling of floating-point numbers in computing is a field in its own right.) The paper deals with 10-day weather forecasts, which are generally computed in steps of one hour: the output from hour 1 is used as the starting condition for the hour-2 forecast, the output from hour 2 as the starting condition for hour 3, and so on. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
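The compounding is easy to demonstrate. Below is a minimal, hypothetical sketch in C++ (it uses the chaotic logistic map as a stand-in for an atmosphere model; it is not related to GRIMs or any real forecast code): two runs whose starting states differ by roughly one unit in the last place of a double agree for a while, then diverge completely within a few dozen steps, just as hour-by-hour forecasts feed each step's round-off into the next.

```cpp
// Toy illustration only: the logistic map is chaotic, like the atmosphere,
// but this is not a weather model. The two runs differ by ~1 ulp at step 0.
#include <cstdio>

int main() {
    double a = 0.4;           // run 1
    double b = 0.4 + 1e-16;   // run 2: perturbed by roughly one ulp
    for (int step = 1; step <= 60; ++step) {
        // Each step feeds on the previous step's output, just as each
        // forecast hour feeds on the previous hour.
        a = 4.0 * a * (1.0 - a);
        b = 4.0 * b * (1.0 - b);
        if (step % 10 == 0)
            std::printf("step %2d: run1=%.15f run2=%.15f\n", step, a, b);
    }
    return 0;
}
```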
  • by 140Mandak262Jamuna ( 970587 ) on Sunday July 28, 2013 @09:34AM (#44405711) Journal
    Almost all CFD (computational fluid dynamics) simulations use time marching of the Navier-Stokes equations. Despite being very nonlinear and very hard, one great thing about them is that they naturally parallelize very well. They partition the solution domain into many subdomains and distribute the finite-volume mesh associated with each subdomain to a different node. Each mesh is also parallelized using GPUs. At the end of the day these threads complete execution at slightly different times and post updates asynchronously. So even with the same OS and the same basic cluster, running it twice gives two different results if you run it far enough out, like 10 days. I am totally not surprised that if you change the OS, the architecture, the endianness, the math processor, or the GPU brand, the solutions differ a lot on a 10-day forecast.
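    The order sensitivity is easy to demonstrate, because floating-point addition is not associative: combine the same partial updates in two different orders and you get two different totals. A minimal sketch (the values are made up, not real CFD residuals):

```cpp
#include <cstdio>
#include <vector>

int main() {
    // Values of wildly mixed magnitude and sign, standing in for per-cell
    // updates arriving asynchronously from different nodes.
    std::vector<double> v;
    for (int i = 0; i < 1000000; ++i)
        v.push_back(((i % 2) ? 1.0e-8 : 1.0e8) * ((i % 3) ? 1.0 : -1.0));

    double fwd = 0.0, rev = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) fwd += v[i];   // one order
    for (std::size_t i = v.size(); i-- > 0; )  rev += v[i];   // another order
    std::printf("forward: %.17g\nreverse: %.17g\ndelta:   %.17g\n",
                fwd, rev, fwd - rev);
    return 0;
}
```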
  • by slashgordo. ( 2772763 ) on Sunday July 28, 2013 @09:38AM (#44405733)
    When doing SPICE simulations of a circuit many years ago, we ran across one interesting feature. With the exact same inputs and the exact same executable, the sim would converge and run on one machine but fail to converge on another. It just happened that one of the machines was an Intel server and the other was AMD, and we attributed it to ever-so-slightly different round-off errors between the floating-point implementations of the two. It didn't help that we were trying to simulate a bad circuit design that was on the hairy edge of convergence, but it was eye-opening that you could not guarantee 100% identical results between different hardware platforms.
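    The "hairy edge of convergence" is easy to reproduce with a toy solver (a bare Newton iteration, nothing to do with any real SPICE engine): Newton's method on f(x) = atan(x) converges only from starting points below a critical value near 1.39174, so two starts that differ in the fourth decimal place meet opposite fates -- exactly the kind of knife edge where platform round-off can decide whether a run converges.

```cpp
#include <cstdio>
#include <cmath>

// Newton's method for f(x) = atan(x): x <- x - atan(x) * (1 + x*x).
// The basin boundary is near x ~= 1.39174 (approximate, well-known value);
// below it the iteration spirals in to the root at 0, above it the
// iterates grow until they overflow to inf and then become NaN.
int main() {
    double lo = 1.3917;   // just below the critical point: converges
    double hi = 1.3918;   // just above it: diverges
    for (int it = 1; it <= 30; ++it) {
        lo -= std::atan(lo) * (1.0 + lo * lo);
        hi -= std::atan(hi) * (1.0 + hi * hi);
        if (it % 6 == 0)
            std::printf("iter %2d: lo=%.6g  hi=%.6g\n", it, lo, hi);
    }
    return 0;
}
```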
  • Chaos (Score:5, Interesting)

    by pcjunky ( 517872 ) <walterp@cyberstreet.com> on Sunday July 28, 2013 @09:46AM (#44405781) Homepage

    This very effect was noted in weather simulations back in the 1960s. Read Chaos: The Making of a New Science, by James Gleick.

  • by Impy the Impiuos Imp ( 442658 ) on Sunday July 28, 2013 @10:05AM (#44405883) Journal

    This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.

    With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.

    Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.

    This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.
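    The experiment is easy to reproduce today. A minimal sketch (a crude Euler integration of the Lorenz-63 equations, not the original program): restart from a state rounded to 5 significant digits, as in the story above, and watch the trajectories part ways.

```cpp
#include <cstdio>

// One crude Euler step of the Lorenz-63 system (sigma=10, rho=28, beta=8/3).
static void step(double& x, double& y, double& z) {
    const double dt = 0.01;
    double dx = 10.0 * (y - x);
    double dy = x * (28.0 - z) - y;
    double dz = x * y - (8.0 / 3.0) * z;
    x += dt * dx; y += dt * dy; z += dt * dz;
}

int main() {
    // Full-precision state vs the same state rounded to 5 significant
    // digits, mimicking the truncated printout in the story above.
    double x1 = 1.0001104, y1 = 1.0, z1 = 1.0;
    double x2 = 1.0001,    y2 = 1.0, z2 = 1.0;   // "first 5 digits or so"
    for (int i = 1; i <= 4000; ++i) {
        step(x1, y1, z1);
        step(x2, y2, z2);
        if (i % 800 == 0)
            std::printf("t=%4.0f: x_full=%8.3f  x_rounded=%8.3f\n",
                        i * 0.01, x1, x2);
    }
    return 0;
}
```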

    To give yourself a feel for what's going on, imagine knocking a billiard ball across a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order, two runs with very slightly different starting angles have some balls completely missing other balls. There's your entire butterfly effect.

    Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind direction, humidity measurement every cubic foot, you could only predict the weather accurately to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)

    Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.

    Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)

  • by Anonymous Coward on Sunday July 28, 2013 @10:06AM (#44405893)

    WTF are these amateurs doing? This is a solved problem and has been for several decades. Basic floating point is solved. How to condition your computations so that the order stays the same, or doesn't affect the results, is solved. Pathetic.

    I ran into this once when working on support for an AIX compiler - got a bug report that we were doing floating point wrong because the code gave different results on AIX than on some other machine (HP, I think). After looking into it, it turned out that the algorithm accumulated round-off errors quite badly and basically wasn't working right on _any_ platform; it just gave different results due to slightly different handling of round-off on the different platforms.
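    For what it's worth, the standard textbook conditioning trick is compensated (Kahan) summation, which carries each addition's rounding error along in a second variable so the total barely depends on accumulation order. A minimal sketch (illustrative only; note that flags like -ffast-math will optimize the compensation away):

```cpp
#include <cstdio>
#include <vector>

// Naive left-to-right summation.
static double naive_sum(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Kahan (compensated) summation: c accumulates the low-order bits that
// each addition would otherwise throw away.
static double kahan_sum(const std::vector<double>& v) {
    double s = 0.0, c = 0.0;
    for (double x : v) {
        double y = x - c;
        double t = s + y;   // low bits of y are lost in this addition...
        c = (t - s) - y;    // ...and recovered here (algebraically zero)
        s = t;
    }
    return s;
}

int main() {
    std::vector<double> v(10000000, 0.1);   // true sum: 1,000,000
    std::printf("naive: %.17g\n", naive_sum(v));
    std::printf("kahan: %.17g\n", kahan_sum(v));
    return 0;
}
```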

    The problem is that this kind of code is very often written by scientists, who have most likely never heard of this issue, or forgot about it, or thought they handled it right but didn't - it's not their area of expertise, so it's not surprising if you think about it. I only hope that for engineering software that designs bridges, airplanes, etc., they realize they had better have it looked over by someone who knows what they are doing.

    BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.

  • by 140Mandak262Jamuna ( 970587 ) on Sunday July 28, 2013 @10:58AM (#44406201) Journal
    These numerical simulation codes can sometimes do funny things when you port from one architecture to another. One of the most frustrating debugging sessions I ever had was when I ported my code to Linux. One of my tree class's comparison operators evaluates the key and compares the calculated key with the value stored in the instance. It was crapping out on Linux and not on Windows. I eventually discovered that Linux was using 80-bit registers for the floating-point computation, while the value stored in the instance had been truncated to 64 bits.

    Basically they should be happy their code ported to two different architectures and ran all the way through. Expecting the same results from processes that behave chaotically is asking too much.
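    That 80-bit surprise is reproducible. A minimal sketch (what it prints depends on compiler and flags: with 32-bit x87 code generation, e.g. gcc -m32 -mfpmath=387, the two lines differ; on x86-64, where SSE2 rounds every operation to 64 bits, they agree):

```cpp
#include <cstdio>

int main() {
    double x = 1.0e16;
    double y = 1.0;

    // Computed entirely in registers: an 80-bit x87 register can hold
    // 1e16 + 1 exactly, so the subtraction can recover 1.
    double in_register = (x + y) - x;

    // volatile forces the intermediate through a 64-bit memory slot,
    // where 1e16 + 1 rounds back to 1e16, so the subtraction gives 0.
    volatile double stored = x + y;
    double through_memory = stored - x;

    std::printf("in register:    %g\n", in_register);
    std::printf("through memory: %g\n", through_memory);
    return 0;
}
```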

  • by Gravis Zero ( 934156 ) on Sunday July 28, 2013 @11:19AM (#44406331)

    It's called Binary Coded Decimal (BCD) [wikipedia.org], and it works well. Plenty of banks still use it because it's reliable and it works. It's plenty slower, but it's accurate regardless of the processor it's used on.
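    The point is really about decimal versus binary representation: BCD hardware, or plain integer fixed point, represents amounts like 0.10 exactly, while binary floating point cannot. A minimal illustration (hypothetical, not any bank's actual code):

```cpp
#include <cstdio>

int main() {
    // Binary floating point: 0.10 is not representable exactly, so the
    // error accumulates with every addition.
    double d = 0.0;
    for (int i = 0; i < 1000000; ++i) d += 0.10;
    std::printf("double sum:  %.10f\n", d);          // not exactly 100000

    // Decimal fixed point (integer cents -- the same idea BCD implements
    // digit by digit): exact on every processor.
    long long cents = 0;
    for (int i = 0; i < 1000000; ++i) cents += 10;   // ten cents each time
    std::printf("decimal sum: %lld.%02lld\n", cents / 100, cents % 100);
    return 0;
}
```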

  • by kyrsjo ( 2420192 ) on Sunday July 28, 2013 @12:12PM (#44406655)

    *SNIP*

    BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.

    In the case of climate simulations, different models (both physics-wise and code-wise) are run on different computers with the same input data, and they yield basically the same results.

    When simulating chaotic behaviour, very small differences can make a big difference in the outcome of your simulations. As an example, I'm currently working on simulations of sparks in vacuum, which is a "runaway" process. In this case, adding a single particle early in the simulation (before the spark actually happens) can change the time for the spark to appear by several tens of percent. This also happens if we run with different library versions (SuperLU, LAPACK), different compilers, or different compiler flags. Once the spark happens, the behaviour is predictable and repeatable - but the time for it to happen, while the system is "balancing on the edge, before falling over", is quite random.

    Good points - in fact, in this case one can say that ALL of the calculations done by the different computer architectures are wrong, to varying degrees. When doing floating-point math without any rounding analysis, all bets are off. Measurements always have finite accuracy, and floating-point math adds its own inaccuracies on top.

    The Boost library can help: http://www.boost.org/doc/libs/1_54_0/libs/numeric/interval/doc/interval.htm [boost.org]

    Of course, all this extra interval management costs development effort and performance. But what is the cost of having supercomputers come up with answers of unknown accuracy?
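    A minimal sketch of what the linked Boost.Interval library buys you (assuming a reasonably recent Boost; the policy defaults vary a little between versions): every operation rounds outward, so the result is a guaranteed enclosure of the true value, and its width says exactly how much accuracy has been lost:

```cpp
#include <boost/numeric/interval.hpp>
#include <cstdio>

int main() {
    using I = boost::numeric::interval<double>;

    // 1/3 has no exact binary representation; the division rounds outward,
    // so 'third' is a tight interval guaranteed to contain the true 1/3.
    I third = I(1.0) / 3.0;

    // Accumulate it three million times; the true sum is exactly 1,000,000.
    I sum(0.0);
    for (int i = 0; i < 3000000; ++i) sum += third;

    std::printf("enclosure: [%.17g, %.17g]\n", lower(sum), upper(sum));
    std::printf("width:     %g\n", width(sum));   // guaranteed error bound
    return 0;
}
```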

  • by matfud ( 464184 ) <matfud@yahoo.com> on Sunday July 28, 2013 @03:35PM (#44407797) Homepage

    Trig functions are nasty. CPUs (FPUs) tend to use lookup tables to get a starting point and then iteratively refine it to provide more accuracy. How they do this depends on the precision and rounding of the intermediate steps and on how many iterations they undertake. Very few FPUs produce IEEE-compliant results for trig. Even sequences of simple math operations tend to be rounded and kept at different precisions on different processors (let alone the instruction reordering done by the CPU and the compiler).

    GPUs are great performance-wise at float (sometimes double) math, but tend to be poor at giving the result you expect. Now, IEEE 754 does not remove these issues; it just ensures that the issues are always the same.

    It is why languages like Java have java.lang.Math and java.lang.StrictMath for trig, and the strictfp keyword for the float and double primitives. (Math tends to just delegate to StrictMath, but does not have to.) strictfp can kill performance: in the better cases a lot of fixups have to be done in software (and HotSpot compilation can be hindered by it), and in the worst cases the entire simple operation (+, -, *, /) has to be performed in software.
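    One concrete instance of simple operations being rounded differently: whether a*b + c is computed with two roundings or contracted into a single fused multiply-add is up to the compiler and target. A minimal sketch (with GCC you may need -ffp-contract=off to keep the compiler from silently fusing the first form too - which is rather the point):

```cpp
#include <cstdio>
#include <cmath>

int main() {
    double a = 1.0 + 1e-8;
    double b = 1.0 - 1e-8;
    double c = -1.0;

    // Two roundings: a*b is rounded to 64 bits, then the add is rounded.
    double separate = a * b + c;

    // One rounding: the exact product enters the addition (what a compiler
    // may silently emit for the line above when FP contraction is enabled).
    double fused = std::fma(a, b, c);

    std::printf("separate roundings: %.17g\n", separate);
    std::printf("fused (FMA):        %.17g\n", fused);
    return 0;
}
```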
