
Same Programs + Different Computers = Different Weather Forecasts

knorthern knight writes "Most major weather services (the US NWS, Britain's Met Office, etc.) have their own supercomputers and their own weather models, but some models are used globally. A new paper compares the output of one such program on different machines around the world. It turns out that the same code, running on different machines, can produce different results due to the accumulation of differing round-off errors; the handling of floating-point numbers in computing is a field in its own right. The paper deals with 10-day weather forecasts, which are generally computed in one-hour steps: the output of hour 1 is the starting condition for the hour-2 forecast, the output of hour 2 is the starting condition for hour 3, and so on. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
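To get a feel for how differing round-off can snowball in an iterated forecast, here is a minimal sketch in C (a toy illustration, not the GRIMs code): the logistic map in its chaotic regime stands in for the hour-by-hour stepping, and a starting value perturbed by one unit in the last place stands in for a different machine's rounding. The printed gap grows by orders of magnitude and eventually becomes comparable to the values themselves.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double a = 0.4;                  /* nominal starting value          */
        double b = nextafter(0.4, 1.0);  /* the same value, one ULP higher  */

        for (int step = 1; step <= 100; ++step) {
            /* Logistic map in its chaotic regime, standing in for one
             * forecast step feeding the next. */
            a = 3.9 * a * (1.0 - a);
            b = 3.9 * b * (1.0 - b);
            if (step % 20 == 0)
                printf("step %3d   |a - b| = %.3e\n", step, fabs(a - b));
        }
        return 0;
    }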
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by SlayerofGods ( 682938 ) on Sunday July 28, 2013 @09:40AM (#44405749)

    Yes... because that never rounds off numbers.
    https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules [wikipedia.org]

  • by cnettel ( 836611 ) on Sunday July 28, 2013 @09:47AM (#44405789)
    It doesn't help that individual operations are rounded deterministically if the order of your operations is non-deterministic. You cannot expect bit-identical results if you parallelize or allow any level of operation reordering. Even very well-written code might implement a reduce operation with different groupings depending on memory layout. Enforcing all of these things to be done in exactly the same order, with full IEEE 754 compliance, carries a significant performance cost. By taking numerical aspects into account, you can ensure that your result is not invalid or unreasonable. But for a chaotic problem, where a machine-epsilon difference in the input data can be enough to produce a macroscopically different end result, there is nothing more you can do and still expect reasonable utilization of modern architectures. (A minimal sketch of the ordering effect appears after the thread below.)
  • by AchilleTalon ( 540925 ) on Sunday July 28, 2013 @10:32AM (#44406065) Homepage
    Measurement errors enter only once, through the boundary conditions. Precision errors propagate through the computations. So even if a single precision error is orders of magnitude smaller than the measurement errors, those errors can have an impact on the result, depending on the computations involved in solving the problem.
  • by Xtifr ( 1323 ) on Sunday July 28, 2013 @11:08AM (#44406261) Homepage

    That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a good thing, since, once they start to diverge, you know you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you could figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results are becoming meaningless leaps out at you.

    Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.

  • by Rockoon ( 1252108 ) on Sunday July 28, 2013 @11:39AM (#44406431)

    "So are you saying that enforcing predictable and correct answers has a significant performance cost?"

    He said nothing about "correct."

    And yes, enforcing predictable answers across toolchains and architectures carries a significant performance cost. Even ignoring optimizations, on the x87 FPU, which computes in 80-bit registers internally, it means the compiler has to emit a rounding operation after every single intermediate operation, because IEEE 754 semantics call for each operation, even an intermediate one, to be performed as if rounded to the 32-bit or 64-bit format.

    When you get into the effects of order-of-operations optimizations, even on hardware that only uses 64-bit floats, you find that (x + y) + z is not always equal to x + (y + z); floating-point addition is not associative, even when the same precision is used at every step of the calculation. Even things like common-divisor optimizations (if z is used as a divisor many times, compute 1/z once and multiply, because multiplication is much faster than division) destroy any chance of equal output between compilers that perform them and compilers that do not. (Minimal sketches of both effects appear after the thread below.)

    The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.

  • by HornWumpus ( 783565 ) on Sunday July 28, 2013 @12:01PM (#44406569)

    A little knowledge is a dangerous thing.

    Get back to us when you've recompiled the simulation using BCD and then realized that there is still rounding. The fact that 0.01 is a repeating fraction in binary floating point is another issue.

  • by alanw ( 1822 ) <alan@wylie.me.uk> on Sunday July 28, 2013 @12:03PM (#44406577) Homepage

    Edward Lorenz discovered back in 1961 that floating-point truncation causes weather simulations to diverge massively.
    This was the foundation of chaos theory, and it was Lorenz who coined the term "Butterfly Effect."

    http://www.ganssle.com/articles/achaos.htm [ganssle.com]
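
As cnettel and Rockoon point out above, floating-point addition is not associative, so the grouping chosen by a compiler or by a parallel reduction can change the result. Here is a minimal sketch in C (illustrative constants, not values from the article); it also shows, incidentally, that decimal constants like 0.1 are not exactly representable in binary floating point.

    #include <stdio.h>

    int main(void) {
        /* Two groupings of the same three addends. A compiler reassociating
         * the sum, or a parallel reduction combining partial sums in a
         * different order, amounts to switching between forms like these. */
        double left  = (0.1 + 0.2) + 0.3;   /* group the first two terms */
        double right = 0.1 + (0.2 + 0.3);   /* group the last two terms  */

        /* On a typical IEEE 754 double implementation these print
         * 0.60000000000000009 and 0.59999999999999998. */
        printf("left  = %.17g\n", left);
        printf("right = %.17g\n", right);
        printf("equal: %s\n", left == right ? "yes" : "no");
        return 0;
    }

Forcing one fixed grouping across compilers, optimization levels, and processor counts is exactly the reproducibility cost discussed in the comments.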

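And a companion sketch for the reciprocal optimization Rockoon describes (again with assumed, illustrative values): dividing by z and multiplying by a precomputed 1/z involve different roundings, which is why compilers typically apply this substitution only under relaxed floating-point flags.

    #include <stdio.h>

    int main(void) {
        double z = 49.0;          /* arbitrary divisor, chosen for illustration */
        double inv_z = 1.0 / z;   /* rounded once here, then reused             */
        int mismatches = 0;

        for (int i = 1; i <= 10000; ++i) {
            double x = (double)i;
            /* Correctly rounded division versus rounded-reciprocal multiply:
             * the two results do not always agree in the last bit. */
            if (x / z != x * inv_z)
                ++mismatches;
        }
        printf("mismatches: %d of 10000\n", mismatches);
        return 0;
    }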