Same Programs + Different Computers = Different Weather Forecasts 240
knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. I.e. the output from hour 1 is used as the starting condition for the hour 2 forecast. The output from hour 2 is used as the starting condition for hour 3, etc. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
It is the butterfly effect. (Score:5, Interesting)
Re: (Score:2)
Coincidentally, I went to a presentation a couple of weeks ago that largely focused on HPC CFD work. The presenter's company doesn't use GPUs because things like memory bandwidth are more important, but that aside, the thing that surprised me the most was that the simulations are not independently clocked (self-clocking) - they use the hardware clock, so things like latency and state are extremely important. Self-clocking would be too expensive with current hardware. Depending on the HPC cluster setup (and
I've seen this before (Score:5, Interesting)
Re:I've seen this before (Score:5, Funny)
Well, Arrakis melange is a pretty strong drug, so consistency in spice simulations is probably a little too much to expect.
(Yes, I know the parent really meant SPICE [wikipedia.org].)
Re:I've seen this before (Score:5, Funny)
"When doing spice simulations "
Weather forecasting on Arrakis is somewhat tricky: not only do you have the large storms, but also giant sandworms.
(And sabotage by the Fremen)
Re:I've seen this before (Score:5, Insightful)
This often happens when the simulation results are influenced by variations in the accuracy of the built-in functions. Every floating point unit (FPU) returns an approximation of the correct result at some level of accuracy, and the accuracy varies considerably when built-in functions like sqrt(), sin(), cos(), ln(), and exp() are considered. Normally, the accuracy of these results is pretty high. However, the original 8087 FPU from Intel was early, resource-constrained hardware, and it necessarily made coarser approximations.
At one point, Cyrix released an 80287 clone FPU that was faster and more accurate than Intel's 80287 equivalent. This broke many programs. Since then, Intel and AMD have been developing FPUs that are compatible with the 8087, ideally at least as accurate, and much faster. The GPU vendors have been doing something similar, however in video games, speed is more important than accuracy. For compatibility reasons (CPUs) and speed reasons (GPUs), vendors have focused on returning fast, compatible and reasonably accurate results.
In terms of accuracy, the results of the key transcendental functions, exponential functions, logarithmic functions, and the sqrt function should be viewed with suspicion. At high-accuracy levels, the least-significant bits of the results may vary considerably between processor generations, and CPU/GPU vendors. Additionally, slight differences in the results of double-precision floating point to 64-bit integer conversion functions can be detected, especially when 80-bit intermediate values are considered. Given these approximations, getting repeatable results for accuracy-sensitive simulations is tough.
It is likely that the article's weather simulations and the parent poster's simulations have differing results due to the approximations in built-in functions. Inaccuracies in the built-in functions are often much more significant than the differences due to round-off errors.
Re:I've seen this before (Score:4, Interesting)
Trig functions are nasty. CPUs (FPUs) tend to use lookup tables to get a starting point and then iteratively refine it to provide more accuracy. How they do this depends on the precision and rounding of the intermediate steps and on how many iterations they undertake. Very few FPUs produce IEEE-compliant results for trig. Even sequences of simple math operations tend to be rounded and kept at different precisions on different processors (let alone the instruction reordering done by the CPU and compiler).
GPUs are great performance-wise at float (sometimes double) math but tend to be poor at giving the result you expect. Note that IEEE-754 does not remove these issues; it just ensures that the issues are always the same.
It is why Java has java.lang.Math and java.lang.StrictMath for trig, and the strictfp keyword for float and double primitives. (Math tends to just delegate to StrictMath, but does not have to.) strictfp can kill performance: in the better cases a lot of fixups have to be done in software (and HotSpot compilation can be hindered by it), and in the worst cases the entire simple operation (+, -, *, /) has to be performed in software.
Re: (Score:2)
As an additional comment:
There are reasons why people will pay a lot of money to use a POWER 6 and later processors
Re:I've seen this before (Score:4, Insightful)
In theory both should have been the same, if they stuck rigidly to the IEEE specifications. There may be other explanations though.
Sometimes compilers create multiple code paths optimized for different CPU architectures. One might use SSE4 and be optimized for Intel CPUs; another might use AMD extensions and be tuned for performance on their hardware. There was actually some controversy when it was discovered that Intel's compiler disabled code paths that would execute quickly on AMD CPUs just because they were not Intel CPUs. Anyway, the point is that perhaps one machine was using different code and different vector instructions, which operate at different word lengths. Compilers sometimes keep a 64-bit double in an 80-bit x87 register, for example.
Or one machine was a Pentium. Intel will never live that one down.
double versus long double (Score:3)
The x86 architecture, since the 8081, has double precision 64 bit floats, and a special 80 bit float--some compilers call this long double and use 128 bits to store this. How does this compare to other architectures?
Re: (Score:2)
1) There never was any such thing as an 8081.
2) The earliest Intel math coprocessor was the 8087, for the 8086. The 80-bit float was a special temporary-precision representation which could be stored in memory, but was otherwise unique to the Intel MCP architecture.
Re: (Score:2)
Re: (Score:2)
The x86 architecture, since the 8081, has double precision 64 bit floats, and a special 80 bit float--some compilers call this long double and use 128 bits to store this. How does this compare to other architectures?
The 80 bit format is not in any way "special"; it is the standard extended precision format. Unfortunately, PowerPC didn't support it :-) Compilers tend to use 128 bits to store it, but the hardware actually reads and writes 80 bits. In practice, long double isn't used very much.
The real difference is elsewhere: 1. A C or C++ compiler can decide which precision to use for intermediate results. 2. A C or C++ compiler can decide whether fused multiply-add is allowed. 3. Java doesn't allow extended precision b
Chaos (Score:5, Interesting)
This very effect was noted in weather simulations back in the 1960's. Read Chaos - The making of a new science, by Jmaes Gleick.
Re: (Score:2)
Was noted in actual weather systems as well (at least as far as we understand them), which is part of what makes it particularly tricky to avoid in simulations. It's not only that our hurricane track models, for example, are sensitively dependent on parameters, but also that real hurricane trajectories appear to be sensitively dependent on surrounding conditions.
Re: (Score:2)
by Jmaes Gleick.
Perfect example of the butterfly effect and floating point errors in weather. Over time, it can even change a person's name who wrote a book on weather simulations in the 60's. I bet no one predicted that!
Re: (Score:2)
by Jmaes Gleick.
Perfect example of the butterfly effect and floating point errors in weather. Over time, it can even change a person's name who wrote a book on weather simulations in the 60's. I bet no one predicted that!
I did, but nobody listened to me until it was too late.
Yes, the Butterfly Effect, as others have said (Score:5, Interesting)
This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.
With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.
Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.
This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.
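That truncated-printout accident is easy to reproduce with any chaotic map. A minimal Python sketch (the logistic map stands in for the weather model; all constants here are illustrative, not from the original simulation):

```python
def logistic(x):
    # Logistic map with r = 4: a standard chaotic toy system.
    return 4.0 * x * (1.0 - x)

x_full = 0.123456789        # "full precision" initial condition
x_trunc = round(x_full, 5)  # what a 5-digit printout would preserve

max_gap = 0.0
for _ in range(60):
    x_full = logistic(x_full)
    x_trunc = logistic(x_trunc)
    max_gap = max(max_gap, abs(x_full - x_trunc))

# The initial difference is about 3e-6; after a few dozen iterations
# the two trajectories have nothing to do with each other.
print(max_gap)
```

The error roughly doubles every iteration until it saturates, which is exactly the "multiplied up to dominate the entire result" behavior described above.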
To give yourself a feel for what's going on, imagine knocking a billiard ball on a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order with two different (very minor differences) angles, some balls are completely missing other balls. There's your entire butterfly effect.
Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind direction, humidity measurement every cubic foot, you could only predict the weather accurately to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)
Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.
Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)
Translation (Score:2)
In other words, they all gave different answers, but each one was equally certain that *it* was right.
Re: (Score:2)
Perhaps that is where politicians got the idea from?
Just needs a little adjustment (Score:2, Funny)
They really need to standardize on what butterflies to use.
Hey, at it least it ran all the way. (Score:4, Interesting)
Basically they should be happy their code ported to different architectures and ran all the way. Expecting the same results from processes that behave chaotically is asking too much.
Re: (Score:2)
The study used FORTRAN, which is expected to be highly portable.
problem solved decades ago (Score:2, Interesting)
It's called Binary Coded Decimal (BCD) [wikipedia.org] and it works well. Plenty of banks still use it because it's reliable and it works. It's plenty slower, but it's accurate regardless of the processor it's used on.
Re:problem solved decades ago (Score:4, Informative)
A little knowledge is a dangerous thing.
Get back to us when you've recompiled the simulation using BCD and realized that there is still rounding. (0.01 having a repeating expansion in binary float is a separate issue.)
Re: (Score:3)
BCD is no better than fixed point binary in this instance. The banking industry relies on it because we use decimalized currency and it eliminates some types of errors to carry out all computations in decimal. For simulation inputs you're no better off than if you use a plain binary encoded number.
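The point is easy to see with Python's decimal module standing in for hardware BCD: a decimal representation makes 0.1 exact, but anything with a non-terminating decimal expansion still has to be rounded.

```python
from decimal import Decimal, getcontext

getcontext().prec = 28  # 28 significant decimal digits (the default)

tenth = Decimal("0.1")           # exact in decimal, unlike binary float
third = Decimal(1) / Decimal(3)  # 0.333...: has to be rounded somewhere

print(tenth * 10 == 1)  # True: decimal fixes the 0.1 problem
print(third * 3 == 1)   # False: rounding is alive and well
```

Decimal only moves which inputs are exact; it does not eliminate rounding in an iterated simulation.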
Re: (Score:2)
Problem discovered decades ago. Called "chaos theory". Turns out that for iterated feedback systems, even arbitrarily-large stored numbers cause round-off errors eventually. Usually more quickly than people anticipate.
People continue not to understand this admittedly subtle point, proceed to suggest known-bad solutions.
Welcome to Chaotic Systems 101 ;-) (Score:3)
Pretty much any iterative simulation system, like weather simulation, will behave this way. When the result of one step of the simulation is the input for the next step, any rounding error can get amplified.
Also see Butterfly Effect https://en.wikipedia.org/wiki/Butterfly_effect (not the movie!).
Utterly Unsurprising (Score:2, Insightful)
Floating Point arithmetic is not associative.
Everyone who reads Stack Overflow knows this, because everyone who doesn't know it posts to Stack Overflow asking why they get weird results.
Everyone who does numerical simulation or scientific programming work knows this because they've torn their hair out at least once wondering if they have a subtle bug or if it's just round-off error.
Everyone who does cross-platform work knows this because different platforms implement compilers (and IEEE-754) in slightly diff
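The classic demonstration, in Python (CPython floats are IEEE-754 doubles, so this behaves the same on any conforming machine):

```python
# Floating point addition is not associative: the grouping decides
# where rounding happens, so the two sums land on different doubles.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a)       # 0.6000000000000001
print(b)       # 0.6
print(a == b)  # False
```

A compiler that reassociates the sum during optimization silently chooses one of these two answers, which is exactly why different toolchains disagree.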
Lorenz, the Butterfly Effect and Chaos Theory (Score:4, Informative)
Edward Lorenz discovered that floating point truncation causes weather simulations to diverge massively back in 1961.
This was the foundation of chaos theory, and it was Lorenz who coined the term "Butterfly Effect".
http://www.ganssle.com/articles/achaos.htm [ganssle.com]
Re: (Score:2)
another link: http://www.aps.org/publications/apsnews/200301/history.cfm [aps.org]
Instead of starting the whole run over, he started midway through, typing the numbers straight from the earlier printout to give the machine its initial conditions. Then he walked down the hall for a cup of coffee, and when he returned an hour later, he found an unexpected result. Instead of exactly duplicating the earlier run, the new printout showed the virtual weather diverging so rapidly from the previous pattern that, within just a
Pentium 4 (Score:2)
Chaos (Score:3)
This is what chaotic systems do. Not to worry, it doesn't change the accuracy of the forecast.
A better article (Score:3)
A better article [wattsupwiththat.com]...
From what I can gather, although the code was well scrubbed so that the single-processor, threaded, and message-passing (MPI) versions produce the same binary result (indicating no vectorization errors), machine rounding differences caused problems.
Since all the platforms were IEEE754 compliant and the code was mostly written in Fortran 90, I'm assuming that one of the main contributors to this rounding is the evaluation order of terms, and perhaps the way that the double Fourier series and spherical harmonics were written.
Both SPH and DFS operations use sine/cosine evaluations, which vary a great deal from platform to platform (since generally they are only rounded to within 1 ulp, not within 1/2 ulp, of the infinitely precise result).
I remember many moons ago, when I was working on fixed-point FFT accelerators, we were lazy and generated sine/cosine tables using the host platform (x86), and neglected to worry about the fact that using different compilers and different optimization levels on the same platform we got twiddle-factor tables that were different (off by one).
With one bug report, we eventually tracked it down to different intrinsics being used (x87 FSIN or FSINCOS), and sometimes library routines. Ack... In later library releases we compiled in a whole bunch of pregenerated tables to avoid this problem.
Of course, putting in a table or designing your own FSIN function for a spherical harmonic or Fourier series numerical library solver might be a bit out of scope (not to mention tank the performance), so I'm sure that's why they didn't bother to make the code platform independent with respect to transcendental functions. Although with Fortran 90, it seems like they could have fixed the evaluation order issues (with appropriate parentheses to force a certain evaluation order, something you can't always rely on in C).
Re: (Score:2)
I have run into this (Score:3)
It is surprising how quickly certain rounding errors can add up. I've had the dubious pleasure of writing an insurance rating algorithm based on multiplying tables of factors. The difference between half-up and banker's round at 6 decimal places makes for rating errors totalling > 50% of the expected premium in a surprisingly small number of calculations. It's one thing to know about error propagation from a theoretical standpoint, but it's quite another to see it happen in real life.
I sympathize with the weather forecasters.
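The half-up vs. banker's divergence comes from the ties. A small sketch with Python's decimal module (the six-decimal quantum matches the rating example above; the value is contrived to land exactly on a tie):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

q = Decimal("0.000001")   # quantize to 6 decimal places
x = Decimal("0.0000025")  # exactly halfway between two representable results

up = x.quantize(q, rounding=ROUND_HALF_UP)
even = x.quantize(q, rounding=ROUND_HALF_EVEN)

print(up)    # 0.000003  (half-up rounds ties away from zero)
print(even)  # 0.000002  (banker's rounds ties to the even digit)
```

In a long chain of multiplications, each such tie nudges the running total a different way under the two modes, and the nudges compound rather than cancel.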
So the weatherman isn't always right? (Score:2)
Didn't we know this? Take forecasts with a grain of salt because they could be wrong?
Not going away (Score:2)
This problem is not going to go away unless/until computers start doing their math rationally and symbolically. That is, with fractional results stored as fractions, with no rounding. Where irrational constants are used in calculations, they'd have to be carried through in symbolic form, as you would with pencil and paper. That is, the computer actually stores a representation of &pi;/2, NOT 1.570796327.
Of course, that leaves the 'minor matter' of performance.
Re: (Score:2)
These are non-algebraic simulations. Even symbolic math libraries - of which there is no shortage - cannot do better.
Microsoft Access (Score:2)
I've seen Microsoft Access do the same thing. Apparently Person B had loaded a slightly different OS date-handling DLL, because they had once found a bug for date patterns of a specific country they happened to be interested in. A specific spot on a report that calculated date differences thus produced slightly different answers than when run on the PC of Person A, making the final totals not add up the same.
Re:Have these people never heard of IEEE754???? (Score:5, Insightful)
That said, many applied fields, including meteorology, could benefit from more well-disciplined computational science approaches. But don't expect all that much of a difference.
Re: (Score:2, Insightful)
I was in particular thinking about the section on rounding in IEEE754. You are also overlooking that "badly conditioned" is not the same as "behaves in a random fashion". My guess is they did not involve the numerics people in the optimization process, which is a complete fail when you know your problem is not well conditioned.
Re:Have these people never heard of IEEE754???? (Score:5, Informative)
Re: (Score:3)
So are you saying that enforcing predictable and correct answers has a significant performance cost?
Re:Have these people never heard of IEEE754???? (Score:5, Informative)
So are you saying that enforcing predictable and correct answers has a significant performance cost?
He said nothing about "correct."
And yes, enforcing predictable answers across toolchains and architectures has a significant performance cost. Even ignoring optimizations, on the x87 FPU it means the compiler needs to emit a rounding operation after every single intermediate operation, because the x87 uses 80-bit internal registers while IEEE754 specifies that all operations, even intermediate ones, are to be performed as if rounded to 32-bit or 64-bit floats.
When you get into the effects of order-of-operations optimizations, even on hardware that only uses 64-bit floats, you find that in many cases (x + y + z) != (z + y + x) even when the same floating point precision is used in each step of the calculation. Even things like reciprocal optimizations (if z is used as a divisor many times, compute 1/z once and multiply, because multiplication is much faster than division) destroy the chance of an equal outcome between compilers that do this and compilers that do not.
The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.
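A concrete Python illustration of the reordering point (values chosen so the effect is visible at double precision; nothing here is from the paper):

```python
# The same three numbers summed in two orders. Near 1e16 the spacing
# between adjacent doubles is 2.0, so adding 1.0 there rounds away.
xs = [1e16, 1.0, -1e16]

forward = (xs[0] + xs[1]) + xs[2]    # 1e16 + 1.0 rounds back to 1e16
reordered = (xs[0] + xs[2]) + xs[1]  # the big terms cancel first

print(forward)    # 0.0
print(reordered)  # 1.0
```

A compiler that reassociates this sum, or replaces repeated division by z with multiplication by a precomputed 1/z, silently picks one of these answers for you.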
Re:Have these people never heard of IEEE754???? (Score:5, Insightful)
Almost nothing you do with IEEE754 floating point numbers is correct in the strict mathematical sense. You can't even represent 0.1 (1/10) as an IEEE754 floating point number. There are entire series of lectures on the topic of scientific computing with floating point numbers. The errors are usually small enough that a few simple rules keep you safe (e.g., never compare floating point numbers for equality), but when you do many iterations, the errors can accumulate and mess with your results, and if in that case you do the calculations in a different order, the accumulated error will mess with your results in a different way. That's what's happening here.
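Both points are one-liners in Python:

```python
from decimal import Decimal
import math

# What the double nearest to 0.1 actually stores:
print(Decimal(0.1))
# -> 0.1000000000000000055511151231257827021181583404541015625

# Why you never compare floats for equality:
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: use a tolerance instead
```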
Re: Have these people never heard of IEEE754???? (Score:4, Interesting)
Good points - in fact, in this case one can say that ALL of the calculations done by the different computer architectures are wrong, to varying degrees. When doing floating point math without a rounding-error analysis, all bets are off. Measurements always have finite accuracy, and floating point math adds its own inaccuracies.
The Boost library can help: http://www.boost.org/doc/libs/1_54_0/libs/numeric/interval/doc/interval.htm [boost.org]
Of course all this extra interval management costs in terms of development and performance. But what is the cost of having supercomputers coming up with answers with unknown accuracy?
Calculators get 0.5 - 0.4 - 0.1 wrong ... (Score:3)
All IEEE754 would do is ensure that each FPU based calculator would yield the same non-zero result.
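In Python, which uses IEEE-754 doubles, the calculator example looks like this:

```python
from decimal import Decimal

bin_result = 0.5 - 0.4 - 0.1  # 0.4 and 0.1 are not exact in binary
dec_result = Decimal("0.5") - Decimal("0.4") - Decimal("0.1")

print(bin_result)  # a tiny nonzero residue on the order of 1e-17
print(dec_result)  # 0.0: all three values are exact in decimal
```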
Repeating digits can be expressed as fractions (Score:2)
And that doesn't help if you are trying to do operations that produce repeating numbers in base 10. You're just trading one set of problem numbers for a different set of problem numbers.
Yes and no. You get rounding in either base when you have insufficient significant digits. However, by not converting from one base to another, you avoid a second opportunity for rounding errors.
Also, numbers with repeating digits can be expressed as a fraction. In our calculator [perpenso.com] a fraction is a basic data type. If an operation includes a fraction, we try to produce a result that is a fraction. This can sometimes avoid a rounding error.
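Python's fractions module works like the fraction type the poster describes; a quick sketch:

```python
from fractions import Fraction

tenth = Fraction(1, 10)  # stored exactly as numerator/denominator
third = Fraction(1, 3)   # a repeating expansion is no problem here

total = tenth + third    # exact rational arithmetic
print(total)             # 13/30
print(third * 3 == 1)    # True: no rounding ever happened

# Converting to float only at the very end incurs a single rounding,
# instead of one rounding per operation.
print(float(total))
```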
... would still not solve some of the larger problems inherent in weather prediction ...
I'm not suggesting a solution to these problems. I am just
Re: Have these people never heard of IEEE754???? (Score:2)
Yes, guessing always has, and always will be, easier than deriving the correct answer.
Re: (Score:2)
I am not arguing about that; I know it is true. What gets me is that this is a surprise to anyone. I mean, have they done the optimization without error estimation? Have they completely ignored error when optimizing? You do not just calculate away on these problems and then check whether the results seem to match reality. The results are far too important for that amateur-level approach.
Re:Have these people never heard of IEEE754???? (Score:4, Informative)
Yes... because that never rounds off numbers.
https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules [wikipedia.org]
Re: (Score:2)
Re:Have these people never heard of IEEE754???? (Score:5, Insightful)
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer. It is not exactly a problem if the results differ at that point.
Re:Have these people never heard of IEEE754???? (Score:4, Insightful)
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer.
This is not true. If the model predicts rain at 2 pm two days out and different rounding moves it to 3 pm, that is still a useful forecast in a lot of cases.
Re: Have these people never heard of IEEE754???? (Score:4, Insightful)
another one says the earth will absorb the heat. Which one do you trust?
I think I'd have to go with the one that doesn't redefine "absorb" to mean "magically disappear".
Re: (Score:2)
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer. It is not exactly a problem if the results differ at that point.
Weather model forecasts are run as an ensemble, not a single run. Generally forecast modelers, like climate modelers, start with numerous small variations in the initial state, run the model multiple times, and average the results.
Thing is, reading the abstract (since the article is paywalled), it's not clear that the summary here is correct. To me, anyway, it seems like they may be saying that, in practice, ensemble forecasting solves this problem even though it's present in individual runs.
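A toy version of the ensemble idea in Python (a chaotic logistic map stands in for the forecast model; the perturbation size is illustrative):

```python
import statistics

def model(x, steps=30):
    # Toy "forecast model": iterate a chaotic logistic map.
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

base = 0.37
# Ensemble: the same model run from slightly perturbed initial states.
ensemble = [model(base + i * 1e-9) for i in range(-10, 11)]

mean = statistics.mean(ensemble)
spread = statistics.stdev(ensemble)
print(mean, spread)
```

Restating the abstract in these terms: once the spread caused by compiler/hardware round-off differences is comparable to this initial-condition spread, the software differences act as just another ensemble perturbation.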
Re: (Score:3)
Averaging the result makes sense for climate modeling. But for meteorological forecasts, it makes more sense to report the most commonly occurring prediction in the ensemble, plus something about risks if you're talking about dangerous weather.
Re: (Score:2, Interesting)
WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.
I ran into this once when working on support for an AIX compiler - got a bug report that we were doing floating point wrong because the code gave different results on AIX than some other machine (HP I think). After looking into it, it turned out that the algorithm accumulated roundoff errors quite badly, and basically wasn't working right on _any_ platform, but would give different results due to slightly different handling of round-off on the different platforms.
The problem is, this kind of code is very o
Re: (Score:2)
They didn't predict the rain correctly yesterday here, that's why I believe those predictions are obviously incorrect.
Re: (Score:2)
Nice, but no. He's pointing out the obvious: climate scientists are usually reliant on their own coding skills, which, love it or hate it, are not quite on the same level (usually) as a computer scientist's or software engineer's.
And yes, little errors do matter, since a little error in a preceding calculation may be used in the next series of calculations, and so on...the snowball effect.
Re: (Score:2)
Propagation of rounding errors is not a big problem in climate modeling. These models are run thousands of times in order to establish averages, very different from meteorological models (although they are basically the same!) which are run many times to find the most likely specific events.
Re:Have these people never heard of IEEE754???? (Score:5, Interesting)
*SNIP*
BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.
In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results.
When simulating chaotic behaviour, very small differences can make a big difference in the outcome of your simulations. As an example, I'm currently working on simulations of sparks in vacuum, which is a "runaway" process. In this case, adding a single particle early in the simulation (before the spark actually happens) can change the time for the spark to appear by several tens of percent. This also happens if we run with different library versions (SuperLU, Lapack), different compilers, or different compiler flags. Once the spark happens, the behaviour is predictable and repeatable - but the time for it to happen, while the system is "balancing on the edge, before falling over", is quite random.
Re: (Score:2)
In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results.
Yes, but how many of those basically same results were achieved by tweaking the model until the output was basically the same?
The problem with climate science is that it's not experimental. You cannot run controlled experiments on the climate. Thus, the quality of climate science research is determined not by how accurately it models reality (since it's impossible to test), but by how accepted your research is by other climate scientists. This can easily lead to the point where the science becomes totally
Re: (Score:2)
Yes, it is possible to estimate how well a climate model models reality. The parameters that vary in climate models are not unconstrained, but constrained by physics (experimental evidence). If your climate model accurately hindcasts the climate developments of the 20th century (say), but the parameters are at the extreme range of what's plausible from experimental physics, then it probably isn't a very good model.
Not all climate scientists focus on general circulation models either. If your particular GCM
Re: (Score:2)
Yes, it is possible to estimate how well a climate model models reality.
It's possible to make a climate model, then wait for reality to happen, then see how well they matched, yes. But you can't run experiments to see if your model is sound. And climate models do diverge from reality as reality happens, see this graph [wattsupwiththat.com] for example.
The parameters that vary in climate models are not unconstrained, but constrained by physics (experimental evidence). If your climate model accurately hindcasts the climate developments of the 20th century (say), but the parameters are at the extreme range of what's plausible from experimental physics, then it probably isn't a very good model.
That hasn't stopped astronomers from positing ridiculous things such as dark matter and dark energy.
Re: (Score:2)
The Berkeley Earth Surface Temperature (BEST) project https://en.wikipedia.org/wiki/Berkeley_Earth_Surface_Temperature [wikipedia.org] was done and funded by people who wanted to show global warming wrong, or already thought it was; no way would they tweak their model to fit the consensus of other climate researchers. Yet they came to the same conclusion.
They didn't make a model, they measured temperatures. I agree that you can measure temperatures accurately. From skimming the article it seems they discredited the 'urban heat bias' hypothesis which is interesting to know.
Also, you can test a climate model's ability to match reality: build it using a limited data set (e.g. 20k-1k years ago) and then test it on another (e.g. the last 1k years) to see whether they match. Again, this is not a hard method to understand: if the new set does not match predictions, you're wrong; if it does, you are more likely correct. This method is standard across biology as well as several other fields; not ideal, but good enough.
That doesn't show your model matches reality; it shows that you managed to build a complicated mathematical formula that uses some data points to generate some other data points.
Re: (Score:2)
"In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results."
Maybe that means that their models are bad and they're all fudging their data?
Re: (Score:2)
Climate predictions are not vulnerable to rounding errors the way meteorological predictions are. Meteorologists are solving an initial value problem; climate scientists are solving a boundary value problem.
You can make simple climate models that do not rely on computer simulations (energy budget calculations of various sorts), and those are certainly enough to predict big problems from anthropogenic global warming. Heavy-duty numerical climate models aren't used to "prove" global warming, they're used to get
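A minimal sketch of the kind of back-of-the-envelope energy budget the parent means (a standard zero-dimensional balance with textbook constants; this is an illustration, not a climate model, and none of the numbers come from TFA):

```python
# Zero-dimensional energy balance: absorbed solar flux = emitted thermal flux.
SOLAR_CONSTANT = 1361.0   # W/m^2, incoming solar irradiance (textbook value)
ALBEDO = 0.30             # fraction of sunlight reflected back to space
SIGMA = 5.67e-8           # Stefan-Boltzmann constant, W/(m^2 K^4)

# Balance: S*(1-a)/4 = sigma*T^4, so T = (S*(1-a)/(4*sigma))^(1/4)
absorbed = SOLAR_CONSTANT * (1.0 - ALBEDO) / 4.0
T_effective = (absorbed / SIGMA) ** 0.25
print(round(T_effective, 1))   # ~255 K
```

The ~33 K gap between this 255 K effective temperature and the observed ~288 K surface mean is the natural greenhouse effect, and you get it with a calculator, no supercomputer and no rounding-error worries.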
Re: (Score:3)
WTF are these amateurs doing?
Enjoying decent performance. Doing weather forecasts slower than real time is a lot easier but somewhat less useful.
My interpretation of the abstract (I cannot access the actual paper) is that they could not show that any particular compiler or architecture made the predictions any better, just different. In that case you just go with whichever runs fastest.
Re: (Score:2)
My interpretation of the abstract (I cannot access the actual paper) is that they could not show that any particular compiler or architecture made the predictions any better, just different. In that case you just go with whichever runs fastest.
Or you could, you know, compare the results with reality and go with whichever one is most accurate.
Re:Have these people never heard of IEEE754???? (Score:4, Funny)
It is so unfortunate that academics do not have the wisdom of Slashdot available before they submit papers. Alas, that is the reality they have to live with.
Re:Have these people never heard of IEEE754???? (Score:5, Informative)
That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a good thing, since, once they start to diverge, you know you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you could figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results are becoming meaningless leaps out at you.
Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.
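As an illustration of the parent's point, you can fake the "different machines" effect on a single machine by reassociating one algebraically identical expression inside a toy chaotic iteration (a logistic map standing in for a weather model; the constants are arbitrary):

```python
def step_a(x, r=3.8):
    return (r * x) * (1.0 - x)       # rounds r*x first

def step_b(x, r=3.8):
    return r * (x * (1.0 - x))       # same algebra, different rounding order

a = b = 0.6
history = []
for n in range(200):
    a, b = step_a(a), step_b(b)
    history.append(abs(a - b))

# The ulp-scale differences are amplified exponentially by the chaotic map.
# Once the two runs have visibly diverged, neither is trustworthy on its own,
# and that divergence point is exactly the information you want.
```

The first entries of `history` are at rounding-error scale; the last ones are order one, which is the "leaps out at you" behavior described above.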
Re:Have these people never heard of IEEE754???? (Score:5, Insightful)
"Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation."
*An* accurate simulation is not the desired result either, an accurate model is. Without reproducibility you don't have a model.
Reproducibility is important always.
Re: (Score:3)
But tweaking the FP to ensure reproducibility doesn't improve the accuracy of the model. In fact, it hides the inaccuracies of the model. So, while I completely agree with you in principle, I think that what you said has no bearing on this particular case.
Re: (Score:2)
Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation.
Well, I'd say a useful simulation, which entails some reasonable level of accuracy, but speed and cost are also important.
It isn't helpful if an algorithm gives you a slightly better simulation of tomorrow's weather if it takes a week to run. If your algorithm is faster or less expensive to run then you can run it more often, or use the saved computer time to run other models. Having an ensemble of models or more frequent updates might be more useful to forecasters than having one model that stays coheren
Re: (Score:2)
Much more useful than running your simulation on multiple different supercomputers is to run it multiple times on one supercomputer, but with your input variables perturbed slightly on each run. If you randomly perturb your input measurements proportional to the standard error in those measurements, then the differences between runs will directly tell you how accurate your forecast is. (This should work independent of whether inaccuracy is dominated by initial condition inaccuracy, or by round off. It doesn
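A sketch of the parent's suggestion, with a toy chaotic map standing in for the forecast model (the "measurement", its standard error, and the ensemble size are all made up for illustration):

```python
import random

def model_step(x, r=3.9):            # toy chaotic stand-in for a forecast model
    return r * x * (1.0 - x)

def ensemble_spread(lead_time, n_members=50, seed=1):
    random.seed(seed)
    measured, std_err = 0.5, 1e-4    # hypothetical observation and its error
    members = [measured + random.gauss(0.0, std_err)
               for _ in range(n_members)]
    for _ in range(lead_time):       # advance every member independently
        members = [model_step(x) for x in members]
    return max(members) - min(members)

# Spread grows with lead time; once it spans the whole attractor,
# the forecast carries no information beyond climatology.
print(ensemble_spread(5), ensemble_spread(60))
```

The spread between members at a given lead time is a direct estimate of forecast uncertainty, regardless of whether that uncertainty comes from the initial measurements or from round-off.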
Re: (Score:2)
I guess you have never written actual simulation code. The IEEE 754 standard tells you what happens and what kind of precision you get when applying basic operations on floats. But that does not guarantee anything at the higher level.
The order of operations is extremely important if you don't want to lose precision. For instance, how do you sum a set of floats to achieve maximal precision? Hint: you do not start from the first one and iterate to the last one. You basically need to keep them sorted by increasing absolute val
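The parent's point in a few lines (the magnitudes are contrived so the effect is visible in double precision):

```python
import math

vals = [1.0] + [1e-16] * 10_000   # one big value, many tiny ones

naive = 0.0
for v in vals:                    # big value first: every 1e-16 is rounded away
    naive += v

ascending = 0.0
for v in sorted(vals):            # tiny values accumulate before meeting 1.0
    ascending += v

print(naive)              # 1.0 exactly -- all 10,000 tiny terms were lost
print(ascending)          # ~1.000000000001
print(math.fsum(vals))    # exactly rounded reference sum
```

Kahan (compensated) summation, or Python's `math.fsum`, gets you the accurate answer without having to sort at all.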
Re:Doesn't matter much (Score:5, Informative)
Re: (Score:2)
Measurement errors are involved once, at the boundary conditions. Precision errors propagate through the computations.
If measurement errors are less than precision errors, and precision errors are sufficient to bring out chaos, then changing the initial state by epsilon would also bring out chaos.
Getting different results on different architectures is a good thing: it lets you see how chaotic the response to the initial conditions is, and evaluate the reliability of the result.
Re: (Score:2)
Those tiny rounding errors are causing different forecasts.
So are the measurement errors, and to a much higher degree. The roundoff errors just don't matter.
How accurate and reliable can these forecasts be in reality then?
Once they reach the point where errors have accumulated to this degree, not at all. Everybody knows that.
Re: (Score:2)
Once they reach the point where errors have accumulated to this degree, not at all. Everybody knows that.
Climatologists either don't know that, or are in denial of it.
Re: (Score:2)
At first I agreed with you and thought the GP wasn't aware of the concept of chaos (small errors in input give large errors in output). However, that's not what he wrote. He correctly pointed out that the rounding error is much smaller than the error from the initial measurement. Logically it should be the dominant error that first leads to chaotic behavior. The problem then seems to be over-belief in the forecast due to not accounting correctly for the measurement error. Long before any rounding errors sta
Re: (Score:2)
Alas, TFA is about a situation where they take the SAME inputs (initial measurements), run the program on ten different sets of hardware, and get ten different results.
I fail to see how the same program + same inputs == "differences in inputs cause most of the error"....
Re: (Score:2)
Inaccuracies in the input most likely did cause most of the error. Maybe nobody noticed, because that error was the same in all the calculations. Eventually a difference between the calculations started to build up because of differences in rounding between the different runs. This variation was noticed, but it would still be small compared to the differences caused by inaccuracies in the input. In short m
Re: (Score:2)
Again, the article says that they used the same input. This can be verified with a simple diff. Same input leading to different results means that some other input (that is, the circuitry of the CPU or the software libraries) has to be at fault, unless you want to start arguing that computer hardware is non-deterministic. Then you've opened an entirely different can of worms that your error margin system will do little or nothing to address.
Re: (Score:2)
Distributed systems are inherently non-deterministic. Moreover, it says right there in the title that the different results were produced on different computers.
Re: (Score:2)
I thought that was my point? It seemed that you were trying to argue that the input was actually different.
Re: (Score:2)
You're contradicting yourself.
Inaccuracies in the input most likely did cause most of the error.
is the exact logical opposite of
Eventually a difference between the calculations started to build up because of differences in rounding between the different runs.
It's the differences in rounding based on the same input data that the paper is talking about, not the inaccuracies in the input data (testing for which would involve, by definition, different sets of input data varying by a known quantity). If the rounding behaved the same, we would expect the same output given the same program and input. If a system produces different output every time it's run with the same input, then we have a useless syst
Re: (Score:2)
No. You are assuming if both calculations produce the same result, then that result is correct. In reality, you can run the same calculation twice and get the same error.
If you take the same source and compile it for two different systems, is it the same program? What the compiled program does is probably within the specs of the language.
Re: (Score:2, Informative)
Re:Damn you people (Score:5, Insightful)
Precision is the point. Mathematical chaos diverges exponentially. This means that if you have a value of 9.3440281 in one calculation and it returns 3.5, and a value of 9.344028147 in another, you can get completely different results (where the second case returns 8.1). Now you say: well, let's just make it more precise then! So you put in the value of 9.34402814672 and get a completely different result (1.7), and so on*. If you weren't dealing with mathematical chaos, you would continually refine the values down (e.g. 3.5, 3.45, 3.467, etc.).
* Note: I should be careful with this layman's description to point out that more precise values technically shrink the window down. But since it is exponentially divergent in the first place, this might not ever do you any good in a realistic setting. Ref Lyapunov exponents [wikipedia.org] and mathematical chaos [wikipedia.org]
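The footnote's "shrinks the window" caveat can be sketched with the standard r = 4 logistic map (a textbook chaotic example, nothing from TFA): each thousandfold improvement in initial precision buys only a roughly constant number of extra trustworthy steps.

```python
def steps_until_divergence(d0, threshold=0.1, r=4.0):
    """Iterate two trajectories separated by d0 until they differ by threshold."""
    a, b = 0.4, 0.4 + d0
    for n in range(1, 500):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        if abs(a - b) > threshold:
            return n
    return None

for d0 in (1e-6, 1e-9, 1e-12):
    print(d0, steps_until_divergence(d0))
# Shrinking the initial error by 10^3 adds only ~10 usable steps,
# because this map's Lyapunov exponent is ln 2 per step.
```

Divergence time grows only logarithmically with precision, which is why "just add more digits" never wins against an exponential.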
Re: (Score:2, Funny)
For being the first person ever to use exponentially correctly on slashdot I literally award you one (1) internet.
Re:Damn you people (Score:5, Funny)
For being one of the many to use should of where the correct phrase is should have (often abbreviated should've), I just point at you and laugh.
Re: (Score:2)
TL/RTS:
From what I'm seeing it's a three-fold issue:
1) The tool chain is violating the rules by rounding before the calculations are completed
2) The programmers have broken the rule of not rounding until your calculations are complete
3) The hardware does not have enough precision to actually deal with the number of places desired by the programmers
Someone else made the comment about going with a 128-bit int, and for this I see absolutely no reason to believe that's going to be sufficient. Instead, what they need to use is a 1024-bit (or larger
Re: (Score:3)
I highly doubt you need any more precision than that.
People doubting that you "need any more precision than that" is, roughly speaking, the origin of problems like this in the first place and, more generally, the origin of our understanding of chaos theory.
It turns out you, ultimately, need more precision than you can get. Always.
Re: (Score:3)
I highly doubt you need any more precision than that.
I'm reminded of something about 640k being enough...
Re: (Score:3)
All I was taught about floating point at that level was how wrong the results could be, and that we should avoid it. Several years later, in a more advanced course, I learned how to do floating point calculations, if you really need to.
Re: (Score:2)
And which point would you like to make? pow(x,-1) is equivalent to 1/x, so it's perfectly valid.
Re: (Score:2)
That's the simulation of climate rather than weather, which is a substantially different problem. It's a problem that's still hard and is still plagued by chaos-theory effects on numerical modeling. Not to worry, though: scientists have understood this problem and its implications for about 7 orders of magnitude longer than you've heard about it.