Earth Math Supercomputing Science

Same Programs + Different Computers = Different Weather Forecasts

knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. I.e. the output from hour 1 is used as the starting condition for the hour 2 forecast. The output from hour 2 is used as the starting condition for hour 3, etc. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
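A minimal sketch of the effect the summary describes, under loud assumptions: the logistic map stands in for the forecast model (it is a toy chaotic system, not GRIMs or any real weather code), and the two "machines" differ only in whether each step's output is rounded to 32 or 64 bits. The helper names `to_float32`, `step`, and `run` are made up for illustration.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def step(x):
    """One step of the logistic map, a toy chaotic system (not a weather model)."""
    return 4.0 * x * (1.0 - x)

def run(x0, steps, round_fn=lambda x: x):
    """Feed each step's output into the next step, rounding with round_fn each time."""
    history = []
    x = x0
    for _ in range(steps):
        x = round_fn(step(x))
        history.append(x)
    return history

machine_a = run(0.4, 100)              # keeps 64-bit results at every step
machine_b = run(0.4, 100, to_float32)  # rounds every step's output to 32 bits

# Gap between the two "machines" at each step, same program, same start value.
gaps = [abs(a - b) for a, b in zip(machine_a, machine_b)]
for n in (1, 10, 20, 40, 80):
    print(f"step {n:3d}: gap = {gaps[n - 1]:.3e}")
```

The gap starts at the size of a single rounding error and grows until the two runs are unrelated, which is the same shape of behavior the abstract reports for the spread between systems.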
This discussion has been archived. No new comments can be posted.

  • by SlayerofGods ( 682938 ) on Sunday July 28, 2013 @09:40AM (#44405749)

    Yes... because that never rounds off numbers.

  • by cnettel ( 836611 ) on Sunday July 28, 2013 @09:47AM (#44405789)
    It doesn't help you that individual operations are rounded deterministically if the order of your operations is non-deterministic. You cannot expect bit-identical results if you parallelize or allow any level of operation reordering. Even very well-written code might implement a reduce operation in different hierarchies depending on memory layout. Enforcing all these things to be done in exactly the same order, with full IEEE754 compliance, has a significant performance cost. By taking numerical aspects into account, you can ensure that your result is not invalid or unreasonable. However, for a chaotic problem where a machine epsilon difference in input data might be enough for a macroscopically different end result, there is nothing you can do and still expect reasonable utilization of modern architectures.
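A small sketch of the reduce-order point, assuming IEEE 754 doubles (which CPython uses on common platforms). The same four values are summed left to right and as a two-level tree, the way a parallel reduction might split them; the function names and the input list are illustrative only.

```python
import math

def sequential_sum(values):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values):
    """Tree-shaped reduction: sum each half, then combine the halves."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

values = [1e16, 1.0, -1e16, 1.0]

seq = sequential_sum(values)   # ((1e16 + 1) - 1e16) + 1
tree = pairwise_sum(values)    # (1e16 + 1) + (-1e16 + 1)
exact = math.fsum(values)      # correctly rounded sum of the exact values

print("sequential:", seq)
print("pairwise:  ", tree)
print("fsum:      ", exact)
```

Every individual addition here is rounded deterministically, yet the two groupings disagree with each other and with the correctly rounded sum.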
  • Re:Damn you people (Score:2, Informative)

    by YoungManKlaus ( 2773165 ) on Sunday July 28, 2013 @10:19AM (#44405987)

    actually, that would be really good because you have a fixed spacing of values throughout the whole range which is a very important property in simulations (at least as far as I learned in numerical mathematics).

  • by AchilleTalon ( 540925 ) on Sunday July 28, 2013 @10:32AM (#44406065) Homepage
    Measurement errors are involved once, at the boundary conditions. Precision errors propagate through the computations. So even if a single precision error is orders of magnitude smaller than the measurement errors, it can have an impact on the result, depending on the computations involved in solving the problem.
  • by Xtifr ( 1323 ) on Sunday July 28, 2013 @11:08AM (#44406261) Homepage

    That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a good thing, since, once they start to diverge, you know you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you could figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results are becoming meaningless leaps out at you.

    Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.

  • by Rockoon ( 1252108 ) on Sunday July 28, 2013 @11:39AM (#44406431)

    So are you saying that enforcing predictable and correct answers has a significant performance cost?

    He said nothing about "correct."

    And yes, enforcing predictable answers across toolchains and architectures has a significant performance cost. Even ignoring optimizations, on the x87 FPU it means the compiler needs to emit a rounding operation after every single intermediate operation, because the x87 computes in 80-bit registers while IEEE754 specifies that all operations, even intermediate ones, are to be performed as if rounded like 32-bit or 64-bit floats.

    When you get into the effects of order-of-operations type optimizations even on hardware that only uses 64-bit floats, you find that in most cases (x + y + z) != (z + y + x) even when the same floating point precision is present in each step of the calculation. Even things like common-divisor optimizations (if z is used as a divisor many times, compute 1/z a single time and multiply because multiplication is much faster than division) destroy the chance of equal outcome between compilers that will do it and compilers that will not.

    The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.
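A short sketch of both reorderings mentioned above, assuming IEEE 754 doubles. The specific values (0.1, 0.2, 0.3 and the divisor range 1..99) are chosen for illustration and are not from the original comment.

```python
# 1) Addition order: the same three doubles, summed in opposite orders.
x, y, z = 0.1, 0.2, 0.3
forward = (x + y) + z
backward = (z + y) + x
print(forward, backward, forward == backward)

# 2) Reciprocal optimization: n / n versus n * (1 / n).
# A compiler that hoists 1/n and multiplies can change the last bit.
mismatches = [n for n in range(1, 100) if n * (1.0 / n) != n / n]
print("divisors where n * (1/n) != n / n:", mismatches)
```

A compiler that applies either transformation and one that does not will both produce defensible answers, just not the same bits.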

  • by HornWumpus ( 783565 ) on Sunday July 28, 2013 @12:01PM (#44406569)

    A little knowledge is a dangerous thing.

    Get back to us when you've recompiled the simulation using BCD and then realized that there is still rounding. 0.01 being a repeating fraction in binary floating point is another issue.

  • by alanw ( 1822 ) on Sunday July 28, 2013 @12:03PM (#44406577) Homepage

    Back in 1961, Edward Lorenz discovered that floating point truncation causes weather simulations to diverge massively.
    This was the foundation of Chaos Theory, and it was Lorenz who coined the term "Butterfly Effect".
