## Same Programs + Different Computers = Different Weather Forecasts

knorthern knight writes

*"Most major weather services (the US NWS, Britain's Met Office, etc.) have their own supercomputers and their own weather models, but some models are used globally. A new paper has been published comparing the output of one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to the accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper deals with 10-day weather forecasts, which are generally computed in steps of one hour: the output from hour 1 is used as the starting condition for the hour-2 forecast, the output from hour 2 as the starting condition for hour 3, and so on. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"*
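The hour-by-hour chaining the submitter describes is exactly what amplifies round-off: each step's tiny rounding error becomes part of the next step's initial condition. As a toy stand-in for a weather model (the logistic map here is purely illustrative, not the GRIMs model), the same iteration carried out in double and in single precision drifts apart within a "10-day forecast":

```python
import struct

def to_f32(x):
    # round a Python double to the nearest IEEE 754 single-precision value
    return struct.unpack('f', struct.pack('f', x))[0]

r = 3.9                      # chaotic regime of the logistic map
x64 = x32 = 0.5
max_diff = 0.0
for hour in range(240):      # a "10-day forecast" in 1-hour steps
    x64 = r * x64 * (1.0 - x64)
    x32 = to_f32(r * x32 * (1.0 - x32))
    max_diff = max(max_diff, abs(x64 - x32))

print(max_diff)              # macroscopic: the two "forecasts" end up disagreeing completely
```

The only difference between the two runs is the precision at which each intermediate result is rounded, which is precisely the kind of system dependency the paper measures.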
## Re:Have these people never heard of IEEE754???? (Score:4, Informative)

Yes... because that never rounds off numbers.

https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules [wikipedia.org]
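IEEE 754 defines the rounding precisely; it just can't make rounding go away. A quick illustration in Python (CPython floats are IEEE 754 doubles):

```python
# IEEE 754 doubles carry a 53-bit significand, so at 1e16 the spacing
# between adjacent representable values is already 2.0.  Adding 1 is
# simply rounded away:
print(1e16 + 1 == 1e16)    # True: the +1 vanishes in rounding
print((1e16 + 1) - 1e16)   # 0.0

# Even 0.1 is not exact -- it is rounded to the nearest double:
print('%.20f' % 0.1)       # 0.10000000000000000555...
```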

## Re:Have these people never heard of IEEE754???? (Score:5, Informative)

in the input data might be enough for a macroscopically different end result; there is nothing you can do about that and still expect reasonable utilization of modern architectures.

## Re:Damn you people (Score:2, Informative)

Actually, that would be really good, because you get a fixed spacing of values throughout the whole range, which is a very important property in simulations (at least as far as I learned in numerical mathematics).
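The fixed-spacing point is easy to see: a float's unit-in-the-last-place grows with the magnitude of the number, while a fixed-point (scaled-integer) representation keeps the same step everywhere. A small Python sketch, using `math.ulp` (Python 3.9+):

```python
import math

# Floating point: the gap between adjacent representable values
# grows with the magnitude of the number.
print(math.ulp(1.0))        # ~2.2e-16
print(math.ulp(1e6))        # ~1.2e-10 -- about a million times coarser

# Fixed point (here: integers counting thousandths): the step is
# 0.001 across the entire range, for large values and small alike.
def to_fixed(x, scale=1000):
    return round(x * scale)

print(to_fixed(1.0))        # 1000
print(to_fixed(1000000.0))  # 1000000000 -- same 0.001 resolution
```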


## Re:Have these people never heard of IEEE754???? (Score:5, Informative)

That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a *good* thing: once they start to diverge, you *know* you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you *could* figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results become meaningless leaps out at you.

Remember, the desired result here is not a set of identical numbers everywhere. It is an *accurate* simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.

## Re:Have these people never heard of IEEE754???? (Score:5, Informative)

So are you saying that enforcing predictable and correct answers has a significant performance cost?

He said nothing about "correct."

And yes, enforcing predictable answers across toolchains and architectures has a significant performance cost. Even ignoring optimizations, on the x87 FPU it means the compiler must emit a rounding operation (a store and reload) after every single intermediate operation, because the x87 computes in 80-bit internal registers while IEEE 754 requires that every operation, even an intermediate one, behave as if rounded to 32-bit or 64-bit precision.

When you get into the effects of order-of-operations optimizations, even on hardware that only uses 64-bit floats, you find that in many cases (x + y + z) != (z + y + x) even when the same floating-point precision is used at each step of the calculation. Even reciprocal optimizations (if z is used as a divisor many times, compute 1/z once and multiply, because multiplication is much faster than division) destroy any chance of equal outcomes between compilers that perform them and compilers that do not.
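Both effects are easy to reproduce in any IEEE 754 environment; here in Python doubles (the values are arbitrary, chosen only to make the mismatch visible):

```python
# Addition is not associative in floating point: each intermediate
# sum is rounded, and the rounding depends on evaluation order.
left  = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6
print(left == right)        # False

# The reciprocal-multiply optimization changes results too:
# x / z and x * (1.0 / z) each round once, but not always identically.
z = 49.0
mismatches = [x for x in range(1, 100) if x / z != x * (1.0 / z)]
print(len(mismatches) > 0)  # True: some dividends round differently
```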

The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.

## Re:problem solved decades ago (Score:4, Informative)

A little knowledge is a dangerous thing.

Get back to us when you've recompiled the simulation using BCD and then realized that there is still rounding (1/3, for instance). 0.01 being a repeating fraction in binary floating point is a separate issue.
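The distinction is easy to demonstrate with Python's `decimal` module standing in for BCD: decimal arithmetic makes 0.01 exact, but anything whose expansion doesn't terminate in base 10, such as 1/3, still has to be rounded.

```python
from decimal import Decimal

# 0.01 is a repeating fraction in binary, so binary floats drift:
print(sum([0.01] * 100))              # slightly more than 1.0, not 1.0

# Decimal arithmetic represents 0.01 exactly:
print(sum([Decimal('0.01')] * 100))   # exactly 1.00

# ...but it still rounds anything non-terminating in base 10:
third = Decimal(1) / Decimal(3)       # 0.3333... to 28 digits, then rounded
print(third * 3)                      # 0.9999...9, not 1
```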

## Lorenz, the Butterfly Effect and Chaos Theory (Score:4, Informative)

Edward Lorenz discovered back in 1961 that floating-point round-off causes weather simulations to diverge massively: restarting a run from printout values rounded to three decimal places produced a completely different forecast.

This was the foundation of chaos theory, and it was Lorenz who coined the term "Butterfly Effect."

http://www.ganssle.com/articles/achaos.htm [ganssle.com]
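Lorenz's observation is easy to reproduce. A minimal sketch (forward-Euler integration of the Lorenz equations; the step size and perturbation are chosen just for illustration): two runs whose initial conditions differ by one part in a billion end up in completely different states.

```python
# Lorenz system: dx/dt = sigma(y - x), dy/dt = x(rho - z) - y, dz/dt = xy - beta*z
def step(s, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return (x + sigma * (y - x) * dt,
            y + (x * (rho - z) - y) * dt,
            z + (x * y - beta * z) * dt)

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-9, 1.0, 1.0)    # perturbed by one part in a billion

max_sep = 0.0
for _ in range(10000):        # 50 model-time units
    a, b = step(a), step(b)
    sep = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    max_sep = max(max_sep, sep)

print(max_sep)                # macroscopic -- same order as the attractor itself
```

The perturbation plays the role of a differing rounding error: after enough steps, it grows from the fifteenth significant digit to the size of the whole solution, which is exactly why the paper's cross-machine spread grows with forecast time.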