Gaussian Distribution being questioned
Robert Wilde writes "The Financial Times is reporting in two stories that a group of scientists have discovered that any scale-independent system does not follow the traditional Gaussian Bell Curve but a new curve." Interesting implications for the systems above. From what I can gather from the article, for those systems in which this curve is more appropriate, rare events will occur more often than predicted by the Gaussian distribution. Anyone have more comments on this?
Re:How does this relate to standard deviations? (Score:1)
More interesting, though, is the fact that the curves shown in the ft.com article weren't properly normalized; comparing these graphs visually doesn't begin to show what the differences are, and the axes on the graphs didn't make too much sense. If x = "rarity", then what does y correspond to? Typically you would show y as frequency and x as a value, and from this you would determine rarity.
anyway, my two cents
m.d.
hm? (Score:1)
Re:Interesting not exceptional. I agree!!! (Score:1)
Dry (Score:1)
Re:Sceptic in Slashdotia (Score:2)
Don't blame the scientists for poor reporting.
Re:Sceptic in Slashdotia (Score:1)
Re:..:) (Score:1)
Extremely Interesting, looking at God (Score:2)
Re:Gauss turns over in his grave (Score:1)
You're mistaken, sorry. This is only true for finite-variance variables. Observables in nature are not required to have finite variance - there are plenty of cases where they do not.
Your position is typical of those who just took some statistics classes but never bothered to check the fine print and understand what it means (no insult, please). But be careful when you make strong statements in public. They sound funny.
Check some references on "stable distributions" in statistics. For physical examples do a search on "Levy flights"
Also, the Gaussian distribution is not "normal" in the sense of occurring most frequently. I would claim that the scaling distribution, or "power law" as physicists refer to it, is much more common in natural phenomena.
What is the big deal? (Score:1)
Statisticians have said for ages that not all data follows the normal (a.k.a. gaussian) distribution. We even have names for the ways in which distributions differ from the normal. Skewness describes distributions where one tail is stretched out in one direction longer than the other like this [gu.edu.au], or this more extreme example.
Kurtosis describes the "thickness" of the tails in comparison to the height of the centre of the distribution (i.e. this [gu.edu.au] has more kurtosis than this [gu.edu.au]).
So, with some distributions, the chance of rare events is greater than some others.
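To make those two measures concrete, here is a minimal sketch (assuming Python with numpy and scipy; synthetic data, not the distributions from the linked plots):

```python
# Skewness and (excess) kurtosis of a symmetric vs. a skewed sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_data = rng.normal(size=100_000)        # symmetric, thin tails
lognormal_data = rng.lognormal(size=100_000)  # stretched right tail

for name, data in [("normal", normal_data), ("lognormal", lognormal_data)]:
    print(f"{name:9s} skewness={stats.skew(data):7.2f} "
          f"excess kurtosis={stats.kurtosis(data):7.2f}")
# The normal sample prints values near 0 for both; the lognormal sample
# shows large positive skewness and kurtosis (heavier right tail).
```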
Secondly, in the Financial Times (not my usual choice of statistical literature) articles, there seems to be little link between the "universal curve" stuff and distributions other than the normal.
Re:What is the big deal? (Score:1)
this link [gu.edu.au]
Ooh, i'm a believer (sung to music) (Score:2)
Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is as likely to occur as your birthday and your girlfriend's birthday combined into esoteric equations.
Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What are the probabilities of that happening? 1/144? No, just 1/12. At some point (and cryptos will be familiar with this), as you add people, it becomes a rare event NOT to find people with the same sign.
Both of the examples you give here are actually rare occurrences: not the number series themselves, but the fact that you recognize them as special series. You note their occurrence as extremely rare (imagine the water-cooler talk if the lotto came up 1-2-3-4-5-6!), thus in fact making them rare.
These guys were both looking at special curves, in fact random ones, that turned out to be the same. That is significant for the number of other patterns that can, or cannot, be explained. At the very least this will cause your insurance rates to go up.
We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.
That's what the story said: very rare occurrences are more likely. Check out the Drake Equation [irelands-web.ie] if you think that couldn't be significant
cold fusion
this is different (so far) in that it was two totally separate areas of study that found the same thing, not some freaks in the desert.
Cool stuff regardless.
Slashdotia
pronounced Slash-dosh-ya?
mathematical nonsense (Score:1)
The Gaussian distribution is 'the universal distribution' in the following sense:
Consider a series of events that generate some value. For example, rolling a die, which generates a value from 1 to 6. Assume that these events are independent, meaning that, say, the 10th outcome will in no way influence, say, the 20th outcome. Now take the first N outcomes, add them together and divide by N. The larger you take N, the better the distribution of this average follows the Gaussian distribution. (And I should add that there are some mild conditions that have to be satisfied.)
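Here is a quick simulation sketch of that averaging process (assuming Python with numpy; dice chosen to match the example above):

```python
# Central Limit Theorem with dice: averages of N rolls approach a Gaussian.
import numpy as np

rng = np.random.default_rng(42)

for n in (1, 2, 10, 100):
    # Average n independent die rolls, repeated 100,000 times.
    averages = rng.integers(1, 7, size=(100_000, n)).mean(axis=1)
    print(f"N={n:3d}  mean={averages.mean():.3f}  std={averages.std():.3f}")
# The mean stays near 3.5 while the spread shrinks like 1.71/sqrt(N),
# and a histogram of `averages` looks more and more like a bell curve.
```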
Now what are they saying here? That the 'rareness' of species does not follow the Gaussian distribution? How do you quantify 'rareness'? How can this satisfy any kind of independence condition (where there's one rare animal, there are bound to be more).
What's weirdest of all is the statement that rare species are more common than expected. What a joke! If something is more common than expected, then by definition it is not as rare as you thought!
Oh, another point on Chaos Theory... (Score:2)
They're trying to sweeten up the deal by casting the guys behind this as innovators who took a controversial path. That's just downright silly. I took Chaos Theory grad courses in college, and let me tell you it's so widely used that it's like saying electricity is a controversial theory. Let me also tell you that what they're trying to say has absolutely nothing to do with Chaos Theory.
I mean! I hope they never make a movie starring Jeff Goldblum about Newton's life, because we might end up refuting Classical Mechanics (even at non-relativistic speeds) tomorrow, wouldn't we? And those movies 'IQ' and 'Young Einstein' really ruined Relativity for me. Drat.
"There is no surer way to ruin a good discussion than to contaminate it with the facts."
Nothing new about this (Score:2)
why scientists/engineers use the gaussian dist. (Score:1)
Re:Interesting not exceptional (Score:1)
independent rv's. I wonder if self-similarity causes interdependent rv's, such that they can be shown to converge to a specific curve, or if their research is just "look at our data - it looks about the same" type stuff. Does anyone know if there is a formal theoretical basis for this work?
Gaussian has been out for a while.. (Score:1)
I'm glad to see that this is getting some press time, but, it does seem strange to me since much of this has been known since well before the 1970s as quoted in the articles.
I suppose it is time to get the word out a little more and throw off the limiting shackles of the Gaussian distribution and white noise
(try brown and pink noise instead.. much more pleasing).
Re:Having trouble understanding the graph... (Score:1)
"There are no shortcuts to any place worth going."
Rare Occurrences. (Score:1)
Certain failures and other occurrences happen much more frequently than one expects from a straightforward analysis of uptimes and standard accepted failure rates.
I knew my teachers were wrong... (Score:1)
In high school a few years back I was a teacher's aide, and my teachers all talked about the standard curve, blah blah blah.
Yet never once did I actually see the distribution - but no, I had to be wrong, right? I mean, who am I to tell the teacher they're wrong (not that I was ever slow to disagree >:P )
Anyway, looking at the graph, it seems a bit more realistic than a standard curve, because in reality, intelligence grows fast but falls faster
(much like my grades... quick to rise, quicker to fall)
Oh, and anyone notice how if you turn your head sideways this kind of looks like half a turnip? Between this and the poll option, has Rob revealed a secret fetish?
Re:Sceptic in Slashdotia (Score:2)
jump to "turcotte"
What the curve looks like. (Score:1)
Also, I don't understand how self-similarity would change the bell curve. You'd think every portion would still have the same probabilities, no?
I knew it (Score:2)
Chuck
Conspiracy theorist
It is distribution-distribution plot (Score:2)
The monkeys were less rare and therefore plot to the right, while the programmers and script kiddies are rare and plot to the left. The "mean" value is the dogs and cats; this plots more to the right.
So what they are saying is that there are more species that have a smaller (rarer) number of critters that they could find. The "most common" value corresponds to the "average" number of critters per species.
I'm guessing now, but if one did a similar survey of the world's population using nationality instead of species, one might get a similar type of distribution.
Ooops! (Score:1)
sorry.
Re:Bell Curve with a skew. (Score:1)
As we all know... (Score:3)
--
Having trouble understanding the graph... (Score:1)
Also, unrelated to the above question, how come it took scientists so long to analyze the obviousness of the microcosm in such detail within the field of statistics? Shouldn't this have been obvious? Why do you think it wasn't? I have no clue.
"There are no shortcuts to any place worth going."
Re:Rare Occurrences. (Score:1)
(The other end is because the device dies from old age. Of course, the MTBF numbers are just another statistic...)
No numbers, but it's the approximate curve.
Plagiarism (Score:2)
"The number of suckers born each minute doubles every 18 months."
Log Normal Linux (Score:2)
Suppose that people's programming skills are statistically Gaussian distributed. These people then decide to produce a "new" OS called Linux. The contributions of these people are then plotted up. One would find that the majority of people produced a lot of "minor" improvements, smaller programs, scripts, and responses on mailing lists. There would be a smaller group of people that contributed a lot of important stuff.
This is the lognormal statistical distribution, IIRC. A bunch of people are capable of writing good code in support of this new OS. Unfortunately, only a smaller subset of these people have the time to work on the project for a long period of time. Then only a smaller subset of those have the inclination to volunteer their services for this long period. Additionally, only a smaller portion have the overall skills to do this. The result is that there are only a few people that have all of these attributes.
Sorry for this simplistic explanation (it is late and I should really be sleeping now). A lognormal is really a summation of normal distributions in log space (a multiplication in regular space). Another way to view this is to ask yourself a bunch of statistical what-if questions (the questions should really generate sets of answers that are Gaussian distributed). When you answer no, you are out of the game. More people are eliminated early.
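To see the product-of-factors mechanism at work, here is a minimal sketch (assuming Python with numpy; the number of stages and the factor range are made up for illustration):

```python
# A product of many positive random factors is approximately lognormal,
# because the sum of their logs is approximately Gaussian.
import numpy as np

rng = np.random.default_rng(1)

# Each "stage" (time, inclination, skill, ...) multiplies a person's
# contribution by a random factor. Purely illustrative numbers.
factors = rng.uniform(0.5, 1.5, size=(100_000, 20))
outcome = factors.prod(axis=1)

print(f"outcome:      mean={outcome.mean():.2f}  median={np.median(outcome):.2f}")
print(f"log(outcome): mean={np.log(outcome).mean():.2f}  "
      f"median={np.median(np.log(outcome)):.2f}")
# The raw outcome is right-skewed (mean well above median): lognormal-ish.
# Its logarithm has mean ~ median: a bell curve in log space.
```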
In other words (Score:3)
Or, as wizards have known for years, million-to-one chances happen nine times out of ten.
But seriously, folks. This reminds me a lot in terms of its applicability to pretty much everything of an article [newscientist.com] in New Scientist [newscientist.com] that I also found darn interesting.
Old Hat (Score:1)
Re:Baloney (Score:1)
Just because something does not fit the current model does not mean it's wrong (well, it does when you're in college).
I bet you get A's, don't you? Fudge your lab data a lot?
This might turn out to be on par with cold fusion, or it might be significant. Let's wait for the additional research and find out.
-Rich
This is because of a crutch (Score:1)
What we really need to do is stop teaching statistics classes that depend on a gaussian distribution. Down with standard deviations:-).
Re:Another thought: When one side is near saturation. (Score:1)
Not every pseudo-random event that you plot will produce a Gaussian curve... tracking the rolls of a single die will give just a flat, uniform distribution, while tracking the sum of two dice rolled together will produce a bell-shaped curve (triangular for two dice, approaching Gaussian as you add more)...
There's also more fun you can have with a 'Lorentz' distribution... as well as however many other distributions there are out there.
If I remember correctly, though, a Poisson distribution looks like a discrete Gaussian in the limit as n goes to infinity.
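A quick sketch of the dice claim, for anyone who wants to check it (assuming Python with numpy):

```python
# One die is uniform; sums of dice head toward a bell shape.
import numpy as np

rng = np.random.default_rng(7)
rolls = rng.integers(1, 7, size=(100_000, 2))

one_die = np.bincount(rolls[:, 0], minlength=7)[1:]          # counts for 1..6
two_dice = np.bincount(rolls.sum(axis=1), minlength=13)[2:]  # counts for 2..12

print("one die: ", one_die)    # roughly flat across 1..6
print("two dice:", two_dice)   # peaked at 7 (triangular shape)
# Add more dice and the sum's histogram approaches a Gaussian.
```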
This article is nothing in and of itself.
link to the scientific paper(s)? (Score:1)
Re:Having trouble understanding the graph... (Score:1)
In my own field, the distribution of stock market returns is often taken to be log-normal. Unfortunately, extreme downturns that should be so rare under that assumption that they would essentially never be observed have in fact been observed repeatedly. A new distribution that gives increased weight to rare events would be very useful.
You ask, "Shouldn't this have been obvious?" No, not really. New distributions are not often found. One can mathematically derive any number of distributions, but they have little use unless you can find physical processes that exemplify them. With the development of chaos theory and fractal theory (the self-similarity referred to in the article) new physical processes have been defined. These have only been recognized in the last 25 years or so.
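To put a number on how much a fat-tailed alternative matters, here is a minimal sketch comparing tail probabilities (assuming Python with scipy; the Student-t law is just a stand-in for whatever distribution the researchers actually fit):

```python
# How much likelier is a "5-sigma" event under a fat-tailed law?
from scipy import stats

x = 5.0                          # five standard deviations out
p_gauss = stats.norm.sf(x)       # Gaussian tail probability, ~2.9e-07
p_t3 = stats.t.sf(x, df=3)       # Student-t with 3 dof (fat tails), ~7.7e-03

print(f"Gaussian   P(>5 sigma) = {p_gauss:.2e}")
print(f"Student-t3 P(>5)       = {p_t3:.2e}")
print(f"ratio = {p_t3 / p_gauss:.0f}x")
# The fat-tailed model makes the extreme event orders of magnitude
# more likely, which is exactly the pattern seen in market crashes.
```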
stats all over again! (Score:1)
not really too surprising (Score:1)
Re:Log Normal Linux (Score:1)
"I use Gaussian because it's mathematically pure!"
"Fat-tails is a distribution for the REAL WORLD!"
"Yeah, but fat-tails just copied Gaussian and added a little bit!"
etc.
--
fsck scheduling (Score:1)
Re:Isn't This Chi-Square Distribution??? (Score:1)
Is this what I think it is? (Score:1)
The difference between this and the gaussian model is that with the gaussian you are merely dealing with the summed behavior of a large number of independent variables. With the SOC model you are dealing with a particular pattern -- the frequency of changes as a function of their magnitude is described by a power law. It's not just a bunch of stuff happening randomly, it's a particular state in which the rarity of a change is correlated in a precise way to its magnitude.
If I'm correct and that *is* what they're talking about, this isn't all THAT new. I have a neuroscientist friend who's been working on applying the SOC model to brain function with some success for a couple years now.
But the article is vague enough that it's not totally clear that's what they're talking about.
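For reference, here is a minimal sketch of the power-law frequency-size signature that SOC-type models predict (assuming Python with numpy; the exponent is arbitrary):

```python
# Sample event sizes with P(S > s) = 1/s via inverse-transform sampling,
# then check that frequency falls off by a constant factor per decade.
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(size=100_000)
sizes = 1.0 / (1.0 - u)          # Pareto-type sizes: P(S > s) = 1/s, s >= 1

for s in (10, 100, 1000):
    print(f"P(S > {s:5d}) = {(sizes > s).mean():.4f}")
# Each factor of 10 in size costs only a factor of 10 in probability:
# a straight line on a log-log plot, unlike the Gaussian's rapid decay.
```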
This says *absolutely* nothing (Score:2)
It *may* be that they've found another distribution that appears in multiple fields, but there's not enough here to judge this as a statistician. If it has any parameters beyond mean and variance, I'm not likely to be impressed--I can probably produce a three parameter beta distribution that's close.
hawk,wearing his Ph.D. statistician hat for the moment
fractal, anyone? (Score:1)
Self-Similar is key (Score:1)
Distribution (Score:1)
Slap.
Relevent Sci. American article: COBE not Gaussian? (Score:1)
http://www.sciam.com/1999/0999issue/0999scicit5
Unfortunately, it does not reference the papers that it is based on. Sigh.
MobiusKlein.
Heart Data (Score:1)
The interpretation of this is that the heart is in a self-similar state; that is, all lengths of time between heartbeats occur, at all scales, and their distribution is a power law. The heart is in a state similar to a condensed-matter phase transition: its control mechanism keeps the heart in a critically balanced state, ready to change period rapidly.
Re:"universal" curves - a "universal" example (Score:1)
--
"take the red pill and you stay in wonderland and I'll show you how deep the rabitt hole goes"
Re:Baloney (Score:1)
You're right, the journalists missed the point. The part that's "wrong" is the over-application of the Gaussian distribution as a model of everything.
I think the authors missed the real point. (Score:1)
The biggest implication of the model is in the insurance industry. If it is found that floods, fires, earthquakes, and hurricanes follow the new distribution, it may allow insurers to go back to insuring against earthquakes and hurricanes, because they can actually predict long-term income and expenditures more accurately. Maybe they will actually do their job instead of claiming hardship whenever a disaster strikes somewhere.
Insurance Exec: Oh wahhh!!! We can't pay a billion in claims, go to the government.
Translation: We have taken in a net profit of 2 billion dollars over the last two years. But, the billion dollars for this disaster will affect our earnings numbers for the next quarter or two and my stock options will be worthless.
You're Surprised?? (Score:2)
I looked over the articles, and all I can say is "So what?" The Gaussian distribution is based on pure randomness. Did you expect everything to be a completely random event?
Neither article seems to go into great detail about how the new curve was calculated, but it's simply a _FACT_ that applying the Gaussian distribution to most events is considered a "simplification" of the problem, assuming it's random. Take away some randomness, and of course the Gaussian distribution won't fit.
Intelligence (however measured) will not be purely random, nor will floods, grade distributions, tornados, or anything else.
What's missing from both of these pieces is an explanation of how the new curve was built, and on what foundation. The Poisson distribution is frequently used in place of the Gaussian because it "fits better," but again, that doesn't prove that the events have much to do with the math.
This is a case of "curve fitting gone wild" here, and unless I see someone spell out in scientific detail the relationship between the events and the distribution, I don't buy it. So they have a new equation and a new curve; that doesn't mean the events are related to the math directly. If you look for anything hard enough, you will start to find it everywhere.
I do award them credit for a new curve that better fits some models, provided the equation for their curve is manageable. If it's a complex equation, it's worthless, because the whole point is to make some equation fit a distribution of events. If theirs fits, and it's easy to calculate, it's beneficial. But it does not imply a direct correlation between the functions and the variables in the distribution. How do I explain this in Slashdot terms... (/me gets frustrated).
OK, take Moore's Law; you all know that, right? Processor power doubles every 18 months? Or, more accurately, I believe he stated something to the effect that the number of circuits would double every 18 months. Well, a loosely fit exponential function will almost match this trend (roughly). But then you have to "adjust" the month scale between 12 and 24 until the curve fits well. Now, that's a "model," but it does not prove scientifically that circuits and design engineers are behaving exactly as can be predicted. Everyone has predicted that at some point in the future Moore's Law will fail. See... it's a model! Curve fitting... It doesn't PROVE anything about what's going on in developers' minds, or much else tangible other than the "estimation" that things will get more powerful in the computing world.
Now, take it a step further: say Moore's Law fails right as people develop a new method of increasing computing performance, like 3D circuits, or something not yet conceived, where with fewer "countable circuits" you get more performance. Suddenly, new devices start to use a few less circuits and deliver more power. Now the Moore's Law curve goes down, slowly at first, leveling off, and maybe dropping just a tad, and it starts to look like a "bell shaped curve" only half drawn. You could go "curve fitting crazy" and say "Hey, it's Gaussian, it's going to go down now, and within another 15 years we will all be back to 8-bit processors!" That's just idiotic.
In short, curve fitting is useful for predicting many things, but it cannot be assumed that the curve explains the natural phenomena. Any curve that fits data is useful. A curve that fits data does not directly imply complete correlation of events, or definitive proof that God does or doesn't play dice (hope he does, personally; he has to have fun sometime!). And furthermore:
For those who continue to doubt that it could all be so simple, Prof Turcotte has a suitably direct response. "People say: 'You can't do it because it's too complicated a problem'," he says. "We say: 'Just look at the data'."
So his data fit; so what? Any reasonable math whiz should be able to come up with a few dozen equations that fit a line. Doesn't prove a thing.
Forgive my typos, bad grammar, and spelling; I got pretty pissed at tabloid junk science, and I had to vent. Feel free to prove me wrong; I would like to see how you can prove the new equation and chaos theory is the best "insight into the universe" we have... BTW, if you can prove it, you'll probably be up for a Nobel Prize too.
Re:Baloney (Score:1)
My point exactly- the people doing that research probably never intended for the article to lean towards a "Scrap Gaussian! Look at us!" thesis, but that's what happens from time to time when you get journalists in the act of "reporting" science.
Crutches and probable hype over the mundane. (Score:1)
You should add "...approaches infinity, provided the fractional contribution from any one random variable to the sum uniformly converges to zero in the limit as N -> infinity." This is an important distinction. For instance, Levy distributions are a class of stable limit laws for which this is not the case--the largest variable in the sum can in fact dominate the sum. Symmetric Levy distributions may superficially resemble Gaussian laws, but with tails that decay slower (like power laws rather than exponentially fast).
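A minimal sketch of that "largest term dominates" behavior, using the Cauchy distribution, a stable law with infinite variance (assuming Python with numpy):

```python
# For Cauchy samples, the mean never settles down and the single
# largest term stays comparable to the entire sum.
import numpy as np

rng = np.random.default_rng(5)

for n in (100, 10_000, 1_000_000):
    sample = rng.standard_cauchy(size=n)
    print(f"n={n:8d}  mean={sample.mean():10.3f}  "
          f"max|term|/n={np.abs(sample).max() / n:7.3f}")
# Unlike the Gaussian case, the running mean does not converge as n
# grows, and max|term|/n does not shrink toward zero.
```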
This article is amusing if only because it is a nostalgic throwback to the days of P.R. and hype over "chaos theory." Call me dense, but I don't understand why something as simple as scale-invariance needs to be dressed in the extra jargon and hype. Assuming the author did not miss anything terribly fundamental, I don't see anything novel in what was reported. Perhaps someone in the know can fill me in on just how exactly this turns statistical physics on its head?
[For those who are interested, Levy distributions are treated quite adequately in Limit Distributions for Sums of Independent Random Variables (Gnedenko and Kolmogorov) (c)1954].
Re:This is because of a crutch (Score:1)
As I recall, we were encouraged to map our distributions manually, to discover the shapes of our curves, which resemble the Gaussian curve but are skewed in one direction or another.
The curve in the article looks much more like some of the curves we'd come up with, but the exercise was to demonstrate that the larger your sample, the more your curve began to look like the classic Gaussian curve.
Remember, non-math people (like me), that all a statistic can show is that something happened that cannot be attributed to random chance alone. That's it. Naturally, the closer your sample is to reality, the more you can be sure that you have results that are statistically significant, and the more discernible this will be when compared with the Gaussian curve, which is intended to be a close approximation of what you get with a distribution of truly random events. The goal of using statistics is to attempt to prove a correlation between conjecture and reality. Statistics is the only way we have of doing this.
Everyone should get a copy of "How to Lie with Statistics", because it explains much better than I can what exactly statistics actually "prove".
Exactly my thought (Score:1)
Indeed I suspect that they just have some variation on a lognormal curve. (Which does indeed show up in many different places.)
Incidentally, one of the few things that I disagree with in Knuth is his presentation of Benford's law. Sure, the toy mathematical model he generates is fun and all, but he says nothing about why it applies to the real world. And hence his "proof" says nothing about why real numbers that appear in real computers follow Benford's law. I personally find the general explanation in the article you listed far more convincing...
Cheers,
Ben
Graph is incorrect (Score:1)
The GOE isn't new... (Score:1)
Actually, I was wondering if it weren't related to the Gaussian Orthogonal Ensemble (GOE) distribution, which was a result of much of Wigner's work pioneering Random Matrix Theory (RMT) decades ago.
Mathematically, the GOE characterizes the eigenvalue statistics of real symmetric matrices with Gaussian-distributed random entries, an ensemble invariant under orthogonal transformations. (Forgive me if I've got the math a bit wrong; I'm a physicist by trade...)
Physically, the GOE distribution has been popping up in increasingly many physical systems for a while now. Years ago (maybe by Wigner himself? not sure) it was noticed that the energy level spacings of atomic nuclei have statistical properties consistent with the GOE distribution. Some time later, people fooling around with microwave cavities began seeing these distributions as well. The quantum dot folks have also run into the GOE distribution, I believe.
The GOE distribution seems to provide a good test for broken symmetries in a system. As a system's symmetry is gradually broken by, say, shaving off a corner of a piezoelectric crystal, the statistics followed by the eigenvalues (in this example, the resonant frequencies) gradually shift from GOE to Poisson, the latter of which characterizes the eigenvalues of a truly random system.
Now, two really cool things about the apparent universality of the GOE distribution are:
Neat, chaos! Well, sort of. If you take a classically chaotic system, say, a Sinai billiard, and quantize it (solve the Schroedinger equation), time after time you will discover that the eigenvalues of the quantized system have these nice statistical properties that happen to fall out of RMT, namely, the GOE distribution.
So does that mean all quantum systems that follow GOE statistics are chaotic? No. In fact, it's difficult to define what "chaos" really means for a quantum system that has no classical analog. But it implies there's a connection, it certainly is fun to think about, and perhaps continued research will reveal a deeper universal phenomenon at work. I wonder if these researchers haven't taken another step in that direction.
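For anyone who wants to poke at this, here is a minimal sketch of GOE-style level repulsion (assuming Python with numpy; the unfolding of the spectrum is deliberately crude):

```python
# Nearest-neighbor eigenvalue spacings of random real symmetric matrices
# show level repulsion, roughly following the Wigner surmise
# P(s) = (pi*s/2) * exp(-pi*s^2/4).
import numpy as np

rng = np.random.default_rng(11)
spacings = []
for _ in range(200):
    a = rng.normal(size=(50, 50))
    h = (a + a.T) / 2                  # symmetrize: a GOE-like matrix
    ev = np.linalg.eigvalsh(h)         # sorted eigenvalues
    mid = np.diff(ev)[10:-10]          # central spacings, away from edges
    spacings.extend(mid / mid.mean())  # crude unfolding to mean spacing 1
spacings = np.array(spacings)

print("fraction of spacings < 0.1:", (spacings < 0.1).mean())
# Tiny for GOE (levels repel); for independent Poisson-distributed levels
# this fraction would be about 1 - exp(-0.1), i.e. roughly 0.095.
```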
Dang, I wish I had something up on the web about how my research relates to all this... well, you can email [mailto] me.
Re:How does this relate to standard deviations? (Score:1)
Not to mention that the area under the new curve in their graph is significantly greater than that under the bell curve, which means that the total probability is above 1. To use their example, we would have a very neat species distribution: say 50% wolves, 50% rabbits, and 30% bears, for a total of 130%... My question is: is the Financial Times always that bad at math?
Everything doesn't fit in nice places (Score:1)
Re:What the curve looks like. (Score:2)
"The reason the systems behaved in the same fashion, they agreed, was that they shared a feature known as self-similarity. If an object is self-similar, it means it looks the same when viewed from far away or nearby. One example is the cauliflower: just as it is made up of individual florets, so each floret is made up of still smaller florets. If you were given a picture with no sense of scale, you could not tell if you were looking at a whole cauliflower or just one floret."
I grepped the article for "fractal" and not once was it mentioned. Gee, I'm pretty sure that's the term for what the author describes. Or is the target audience so simplistic that the proper terms have to be dumbed down?
Fear the popular press's interpretation of mathematical research data, especially when they need to mention Jurassic Park in the body of the story.
Einstein (Score:1)
Did I remember correctly?
Re:Aren't you guys missing a point? (Score:1)
In which case, you're absolutely right: WTF is the target audience for the article? Someone speculating on insurance companies stock?
Who is Mandelbrot? (Score:1)
Re:Psychohistory? (Score:1)
The reason I spoke of Psychohistory is that it is supposed to be using these statistics in an advanced probability engine. This is a step toward that equation. The more refined we get at deciding that "1 out of 10 blah blah blah because [insert reason]," the closer we get to figuring out the universe and how humanity acts as a whole.
-NYFreddie
Re:Ooh, i'm a believer (sung to music) (Score:1)
To me, the difference is that someone has made the claim that the Universe is radically different from what we know, based on a sample of data that was not peer-validated. If you remember the data on cold fusion, it made perfect sense if you adjusted the y-axis, and didn't lead to such mind-boggling conclusions.
I'm willing to bet this is exactly the same. Inventive scientists deduce important rules based on experimental data. Rigorous scientists double-check their data before deducing important rules. What we need is inventive, rigorous scientists.
Sounds good to me. :) As we all know, Slashdotia is the Capital of Slashdom. :)
"There is no surer way to ruin a good discussion than to contaminate it with the facts."
Hurst processes, anyone? (Score:2)
You can also start with log returns (instead of "normal" returns). This will give you an approximation to a Gaussian (as opposed to a lognormal distribution), plus they are summable across time. I work almost exclusively with log returns -- they are a pain when you need to calculate portfolios, but nice otherwise.
A new distribution that gives increased weight to rare events would be very useful
There are several (e.g. Cauchy), but the problem is that they are much harder to deal with (analytically) than the Gaussian. And if you don't like any, you can always work with the empirical distribution -- no need to pollute the facts with assumptions about what they should be. However, not much of statistics will be useful to you -- the Bayesians offer some good tools.
Getting back to the original point, I wonder if these guys heard of Hurst and Hurst processes. A persistent Hurst process (sometimes called black noise) will generate something like what they found, and Hurst himself developed his theory on the basis of natural phenomena (he started with the frequency of floods on the Nile which occurred, surprise, more often than should have been expected). Skim through Peters "Fractal market analysis" for more information.
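Here is a minimal sketch of the rescaled-range (R/S) analysis Hurst used on the Nile records, run on white noise as a baseline (assuming Python with numpy):

```python
# Estimate the Hurst exponent H as the log-log slope of R/S vs window size.
import numpy as np

def rescaled_range(x):
    """R/S of one window: range of cumulative mean-deviations over std."""
    y = np.cumsum(x - x.mean())
    return (y.max() - y.min()) / x.std()

rng = np.random.default_rng(8)
series = rng.normal(size=4096)   # plain white noise baseline

sizes = [64, 128, 256, 512, 1024]
rs = [np.mean([rescaled_range(w)
               for w in np.split(series, len(series) // n)])
      for n in sizes]

H = np.polyfit(np.log(sizes), np.log(rs), 1)[0]
print(f"estimated H = {H:.2f}")
# Roughly 0.5 for white noise (small-sample bias pushes it a bit higher);
# a persistent "black noise" process gives H > 0.5, with clustered extremes.
```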
I bet these guys rediscovered Hurst processes.
Kaa
What is the big deal? (Score:1)
Infinite variance means the tails of the distribution fall off slowly - it is more likely to get an event further from the mean value.
So fucking what? Big news? Hardly.
Stable distributions have a lot of applications in many areas of physics and finance. Do a literature search on "Levy flights" for examples. There was a good article on Levy flights in a recent "Nature" (IIRC). For some financial applications, check out Mandelbrot's "Fractals and Scaling in Finance": very easy reading (though for a specialist kinda useless, IMHO). It has some good discussion of the subject.
Guys, you look like fools, making news out of a rather well known field. And discussing it rather childishly...
SETI Client (Score:1)
SETI GBC Analysis [berkeley.edu]
Just kidding.
Re:no REAL information here (Score:1)
Not so clearly - lognormal has a claim (Score:1)
This is just as natural as a normal distribution, and appears more often than straight normal distributions in subjects like finance and stochastic analysis.
Cheers,
Ben
Re:Old Hat (Score:1)
Fat tails (Score:2)
Lots of Pratchett fans here? (Score:1)
Ben
Re:t-distribution (Score:1)
The t-distribution is NOT a BETTER FIT. It is the distribution of the sample mean of a Gaussian variable when you use the sample variance instead of the true variance (when the latter is unknown). As your sample grows bigger, the estimator of the variance becomes more accurate and your t-distribution approaches the Gaussian.
There is nothing painful about using t once you've got a clue.
As for the Gaussian, as I mentioned above, it is a particular case of a stable distribution: one with finite variance. This is hardly news, but some recent developments in self-affine processes have made other stable distributions more widely known.
unless... (Score:1)
umm is least rare most rare? (Score:1)
Anyway, I'll put money on the bet that whoever these profs are, they're trying to scam cash from financial Wall Street types ("New curves! New ways to predict the stock market! Give us money" *cough*). This article was a plant... but I'm feeling kind of cynical today.
-avi
Re:"universal" curves - a "universal" example (Score:1)
In school, I took a class called System Dynamics which is all about modeling dynamic behavior of systems. There is an interesting similarity of behavior between electrical, mechanical, and hydraulic systems in the equations used and how you define them.
Driver:
Electrical = voltage
Mechanical = force
Hydraulic = pressure
Flow:
Electrical = current
Mechanical = velocity
Hydraulic = flowrate of fluid
Resistance:
Electrical: voltage = constant*current
Mechanical: force = constant*velocity
Hydraulic: pressure = constant*flowrate of fluid
Capacitance:
Electrical: constant*integral(current) with time
Mechanical: constant*distance traveled
Hydraulic: constant*integral(flowrate) with time
Inductance:
Electrical: Voltage = constant*delta(current)/delta(time)
Mechanical: Force = constant*delta(velocity)/delta(time)
Hydraulic: Pressure = constant*delta(flowrate)/delta(time)
(In the mechanical example, mass is the constant)
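To make the analogy concrete, here is a minimal sketch that integrates the one first-order equation underlying all three columns above (assuming Python with numpy; the constants are arbitrary):

```python
# Euler-integrate storage * dx/dt = drive - resistance * x. Written for the
# mechanical case (m*dv/dt = F - c*v); the electrical and hydraulic systems
# follow by renaming the constants per the table above.
import numpy as np

def first_order_response(drive, resistance, storage, t_end=5.0, dt=0.01):
    x, out = 0.0, []
    for _ in np.arange(0.0, t_end, dt):
        x += dt * (drive - resistance * x) / storage
        out.append(x)
    return np.array(out)

# F = 1 N, damping c = 2, mass m = 1 kg  ->  velocity settles at F/c.
response = first_order_response(drive=1.0, resistance=2.0, storage=1.0)
print(f"steady state = {response[-1]:.3f}  (drive/resistance = 0.500)")
```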
The equations are very similar, but you don't see me calling the press and saying I've found a "universal" mathematical model.
Trying to claim a "universal" law is hype. That there is similar behavior for magnetic properties, turbulent flow, and distribution of species is interesting, but it doesn't suggest that everything is related in a similar way. I think that is why Mr. Turcotte got such a hostile reaction. Before you claim there might be a "universal law linking patterns of mineral deposits, floods and landslides" you had better look at the data first and not argue from the specific to the general the way he did in this case.
Re:t-distribution (Score:1)
but recent developments in the study of self-affine (fractal) processes have made other stable distributions used more often..
Still bad, but screw my English...
Re:fractal, anyone? (Score:1)
http://www.indep.k12.mo.us/THS/student/aforrest
Gauss turns over in his grave (Score:1)
It's almost nonsensical to state that nature does not follow the Gaussian curve just because some statistical variable does not follow it. Perhaps that tells you more about the variable itself. If a variable x has a perfect Gaussian distribution, the distribution of exp(x) will look nothing like a Gaussian (it is lognormal). Does that tell us the Gaussian curve is not the normal curve? It only tells us that even if x is truly Gaussian, exp(x) is not.
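A quick check of that claim (assuming Python with numpy):

```python
# x Gaussian does not make exp(x) Gaussian.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = np.exp(x)   # lognormal, not Gaussian

print(f"x: mean={x.mean():.2f}  median={np.median(x):.2f}")  # both ~0
print(f"y: mean={y.mean():.2f}  median={np.median(y):.2f}")  # ~1.65 vs ~1.00
# For the symmetric variable, mean and median agree; for exp(x) the
# stretched right tail drags the mean well above the median.
```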
Overstated case. (Score:1)
First, the bell curve is ubiquitous because so many random processes satisfy the assumptions of the Central Limit Theorem (finiteness of the variance, for one).
However, there are lots of natural phenomena that don't meet those requirements and so we use lots of probability distributions in science. Lorentzian and Poisson distributions come to mind.
It's fascinating, but unsurprising that self-similarity leads to a different kind of probability distribution.
The journalist heads into "Golly Gee" territory once he starts calling it a "Universal Curve".
DK
Central limit theorem (Score:1)
This is a breakthru? (Score:1)
It seems to me that external, modifying events are removed from scientific studies as much as possible.
This act automagically skews the results at least slightly, enough that you will find something else in nature.
It would seem to me that it would be impossible to take EVERYTHING (i.e. everything) that might affect the results into account, so we don't bother trying.
If we want to predict a mostly random event, we apply the bell-shaped curve. But I say 'mostly random' because most things are not truly random.
Just because we fail to predict or fully understand a problem does not mean that it is utterly random. This new curve helps to predict some things. Others might take a whole new curve. I do not believe that there will ever be a universally true curve. All that this points out (gasp) is that not all things are utterly random.
Baloney (Score:3)
Just because your data doesn't precisely fit the distribution does not mean the distribution is "wrong." What it means is that your data doesn't match your distribution.
This appears to be another case where journalists have missed the point.
The Gaussian distribution is not "wrong" in any shape or form.
Psychohistory? (Score:1)
I wonder if I should take this back to school and demand they raise my grades for all those times I "created the bell curve".
-NYFreddie
For whom the Bell Curve Tolls, or something.... (Score:1)
To explain the 'rare more common than common' phenomenon, one need look no further than Hallmark or Precious Moments or crap like that: "We are all special, we are all unique, etc." Blah!
Still giddy, this is cool!
The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk
How does this relate to standard deviations? (Score:4)
First, let me say that the graph in the article is poorly labeled (or at least their example poorly chosen), IMHO, since "rarity" is related to the number of standard deviations you are from the mean (whether or not the distribution is symmetrical), whereas their graph has rarity monotonically decreasing from left to right. I guess in this sense ("rarity of a species"), rarity != probability.
This new graph strikes me as a bit odd, since it's not symmetrical. With the bell curve, you only need to know how many standard deviations you are from the mean. With this curve, "above the mean" and "below the mean" are vastly different territories.
This curve brings up two questions for me:
I guess this new curve is just another way of saying that "Hey, there's a class of 'random' events out there that share a common non-uniform distribution!" While that's useful to know, I don't see it as the ultimate refutation of the Gaussian distribution.
--Joe--
Interesting not exceptional (Score:5)
As such, from a mathematical point of view this has nothing to do with replacing the Gaussian curve... it is still clearly the most 'natural' mathematical curve. However, what I understand the authors to be claiming is that certain types of real-world events are not actually Gaussian and are described better by this model. This shouldn't be that surprising, as often the 'extreme' cases are not caused by a mere sum of the independent random variables mentioned earlier.
For instance, intelligence might be regarded as the influence of a great many small random variables (how some genes got arranged, upbringing, etc.), but the truly tail-end cases, such as mental retardation, do not occur because all of these factors go bad (someone who is retarded is usually the result of some genetic defect, not a combination of bad upbringing, poor nutrition, etc.). This is probably not the kind of thing the distribution describes, but it shows that the Gaussian really never has been the be-all and end-all.
So while this is undoubtedly a very interesting subject, it really isn't that exciting. Oh, and the claim that the greater incidence of natural disasters disproves the Gaussian was really BS; while they may not be Gaussian, this doesn't appear to be a large enough sample size to make such definitive claims.
Another thought: When one side is near saturation. (Score:2)
I happened to think of one possible reason why so many phenomena might fit a lopsided curve better: the bell curve implies the possibility of infinite extension in both directions. If the mean of the distribution is near one physical extreme (for instance, looking at average rainfall levels: you can't have negative rainfall), then the curve must become lopsided.
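A minimal sketch of that floor effect (assuming Python with numpy; the rainfall numbers are made up):

```python
# Symmetric Gaussian "rainfall" shocks with the mean near the zero floor,
# clipped at zero because negative rainfall is physically impossible.
import numpy as np

rng = np.random.default_rng(9)
rainfall = np.clip(rng.normal(loc=1.0, scale=1.0, size=100_000), 0.0, None)

print(f"mean={rainfall.mean():.3f}  median={np.median(rainfall):.3f}")
# mean > median: the pile-up at zero plus the unbounded upper side yields
# exactly the lopsided, long-right-tail shape described above.
```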
Perhaps that's what they've stumbled onto?
--Joe--
Re:Lots of Pratchett fans here? (Score:2)
Re:Nor is it particularly right... (Score:2)
I feel the same way about the "least squares" technique for determining the line of best fit. It is popular precisely because it is easy to do calculus on x^2.
Mmmmm, curves (Score:2)
One of the "authors" replies (Score:2)
Wow!
It is interesting to see the response that this "research" article in the financial times generated. I'm a research associate (Bruce Malamud) working closely with Donald Turcotte. A student wrote me about the discussion your web site was having. Donald Turcotte was one of the scientists "quoted" in the financial times article. My research area has been in the areas of "time-series analysis" and also applying ideas of fractals and self-organized criticality to natural hazards. I did my Ph.D. with Donald Turcotte and am now doing a brief stint as a postdoc while I look for a "real" job in the world.
First of all, this Financial Times article was a "quickly" researched article on the part of the person who wrote it. Donald Turcotte was contacted and interviewed by phone on Tuesday/Wednesday, with no contact afterwards from the Financial Times to see how correct they got the overall picture. This is how things are and he and I both gulped when we saw how the article appeared. We quickly prepared a short "response" from him (below) to the deluge of e-mails and telephone calls that he received yesterday.
Bottom line, he was a bit misquoted, but the general idea holds. We are talking about applying the ideas of power-law frequency-size distributions (i.e., fractals) to extreme events, including floods, forest fires, earthquakes, landslides, etc. Donald Turcotte has been active for many years in applying fractals, self-organized criticality, and chaos theory to the earth sciences, and yes, he knows very well that he did not "invent" the idea, just made many applications (well, a bit more than that, but read his book).
On the most basic level (and no, I'm not trying to be insulting; I'm sure many people on this site know what I'm talking about already, as this is basic statistics), the idea is a very simple one. Plot the frequency-size distribution of a set of data and see what curve best fits the data, i.e., what might be the underlying distribution. For some sets of data (such as forest-fire burn areas, earthquakes, and many other "natural" data sets) the frequency-size distribution follows a nice straight line in log-log space, i.e., it follows a power-law (fractal or self-similar) distribution. Although one cannot say for SURE what an underlying distribution is, one can make certain (statistical) guesses as to whether a distribution is more nearly Gaussian, log-normal, power-law, etc.
Once one "believes" that a set of data follows a certain distribution, one can then begin to make some guesses as to what an "extension" of that curve might bring in time. If one has 30 years of flood-discharge data, one might then be able to make certain predictions as to the "size" of the 100-year flood. Same with earthquakes. One has a better idea of the probability of having an earthquake, flood, forest fire, etc. of a certain size or greater each year. It just happens that many of these events appear to follow power-law distributions, and these are not as "accepted" in the statistical community.
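To illustrate the fitting procedure just described, here is a minimal sketch on synthetic Pareto data (an illustration only, not the actual analysis; assumes Python with numpy, with a made-up exponent):

```python
# Bin a data set, take frequency vs. size in log-log space, and fit a
# straight line; the slope estimates the power-law exponent.
import numpy as np

rng = np.random.default_rng(4)
sizes = (1.0 - rng.uniform(size=50_000)) ** (-1.0 / 1.5)  # P(S > s) = s**-1.5

thresholds = np.logspace(0, 2, 15)                 # event-size thresholds
counts = np.array([(sizes > s).sum() for s in thresholds])

slope, _ = np.polyfit(np.log10(thresholds), np.log10(counts), 1)
print(f"log-log slope = {slope:.2f}")              # ~ -1.5
# A straight line (constant slope) in log-log space is the signature of a
# fractal, self-similar frequency-size distribution.
```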
Don just came in and is looking over my shoulder. He adds (to my above comments) that statisticians do not in general recognize power-law distributions because one cannot define a pdf for them. (Although one can define pdfs for certain distributions similar to the power law, such as the Pareto distribution.)
So... in terms of the insurance community, they are of course very interested in whether a given natural hazard appears to follow a power-law distribution vs. log-normal or Gaussian, as the resulting recurrence intervals will be very different. Power-law distributions tend to be very conservative for extreme events, i.e., one would expect more large events in a given period of time than under, say, a Gaussian distribution. Others interested in this underlying distribution would be engineers trying to decide how big a flood one might expect in a given area in a given amount of time (and yes, we're dealing with extreme events, so the statistics are small and unsure), so as to know where people can build houses, how deep to make the bridge supports, etc. The bottom line is that the statistics are unsure because the data sets are small, but people need some sort of a starting point, as a lot of money rides on the answers of what the "underlying" distribution might be.
There are also many scientific implications, ranging from simply "describing" what distribution a data set best follows, to understanding better (or in a different way) the underlying basic physics or equations that describe a given natural phenomenon, through a better understanding of the statistics resulting from the equations vs. the actual data. In addition, many scientists are now beginning to think that the pervasive power-law distribution in nature is a general indication of self-organized critical behavior. One definition of self-organized criticality is a complex system with a small steady input and a power-law distribution of the "avalanches" (the events). Donald Turcotte and I wrote a paper (in Science, see below) applying this general idea of self-organized criticality to computer models and forest fires. Of the references listed below, this is probably the easiest for people to get.
OK, before I start babbling. Below is the "reply" that Donald Turcotte wrote to many of the e-mails that came in during the last day.
Bruce Malamud
_________________________________________
Wednesday September 2, 1999
Ithaca, NY, USA
Dear Interested Reader:
Due to the large number of e-mails and telephone calls I have received with respect to the articles by Michael Peel, "New Curve Makes Life Predictable" and "Redrawing the Curve Reveals New Pattern of Events", that appeared in the Financial Times, September 2, 1999, I have prepared a short general reply. If you have further questions or comments after reading the below "comment" to the article, please do not hesitate to contact me for further information.
These Financial Times articles emphasize the importance of power-law (also called fractal or fat-tail) distributions in estimating the probability of occurrence of extreme events. It is unfortunate that the article implies that I invented the idea of power-law distributions, which have been recognized for many decades. For instance, earthquake hazard assessment is based mainly on the Gutenberg-Richter relation, which is a power-law distribution of the number of earthquakes as a function of their magnitude [for some papers where I discuss this, see DLT, Annual Review of Earth and Planetary Sciences, Vol. 19, p. 263-281, 1991; DLT, Physics of Earth and Planetary Interiors, Vol. 111, p. 275-293, 1999].
My work in power-law distributions is based on the concept of fractals, which is due to the pioneering work of Benoit Mandelbrot [for instance, see his book, The Fractal Geometry of Nature, Freeman, San Francisco, 1982]. Mandelbrot, along with many other researchers, has applied the concept of fractals to many phenomena in the natural and "man-made" world, including financial time series. Other distributions similar to the power-law, such as the Pareto distributions, have also been used for a long time. A good web page which discusses fractals and has many links is The Spanky Fractal Database (http://spanky.triumf.ca/).
My own contributions have concerned applications to natural hazards and related phenomena. These are set forward in detail in my book [DLT, Fractals and Chaos in Geology and Geophysics, 2nd ed., Cambridge University Press, Cambridge, 1997] and in a major review paper on self-organized criticality [DLT, Reports on Progress in Physics, Vol. 62, 1999, available as a pdf document (preprint) which can be sent upon request].
The principal contributions of my group have been the applications of fractal distributions to:
(1) Fragmentation (by explosions in asteroids, etc.). [DLT, Journal of Geophysical Research, Vol. 91, p. 1921-1926, 1986]
(2) Mineral deposits. [DLT, Economic Geology, Vol. 81, p. 1528-1532, 1986]
(3) Floods. [DLT and L. Greene, Stochastic Hydrology and Hydraulics, Vol. 7, p. 33-40, 1993; DLT, Journal of Research NIST, Vol. 99, p. 377-389, 1994; B.D. Malamud, DLT, and C.C. Barton, Environmental and Engineering Geosciences, Vol. 2, p. 479-486, 1996. The last paper is available as a pdf document at http://coastal.er.usgs.gov/barton/pubs_online.htm]
(4) Landslides. [J.D. Pelletier, B.D. Malamud, T. Blodgett, and DLT. Engineering Geology, Vol. 48., p. 255-268, 1997; available as a postscript file at http://www.gps.caltech.edu/~jon/]
(5) Forest Fires. [B.D. Malamud, G. Morein, and DLT. Science, Vol. 281, p. 1840-1842, 1998; available as a pdf document for subscribers of Science, web site: http://www.sciencemag.org/]
Many extreme-value events are directly related to time series that exhibit persistence or memory (for instance, time series of temperature, river discharge, the stock market, etc.). A good reference to applying persistent techniques (and a discussion of how to apply the techniques) is Advances in Geophysics, Vol. 40, B.D. Malamud, J.D. Pelletier, and DLT.
Two other colleagues that have used power-law techniques applied to natural hazards include Dr. Bruce D. Malamud (Cornell University, e-mail: Bruce@Malamud.Com) and Dr. Christopher C. Barton (USGS, e-mail: barton@usgs.gov, home page: http://coastal.er.usgs.gov/barton/).
Again, please do not hesitate to contact me for further questions.
Donald L. Turcotte
Maxwell Upson Professor of Engineering
:::::::::::::::::::::::::::::::::::::::::::::::
:: Donald L. Turcotte
:: Department of Geological Sciences
:: Cornell University, Snee Hall
:: Ithaca, NY 14853-1504, USA
:: Office: 607-255-7282; Fax: 607-254-4780
:: e-mail: turcotte@geology.cornell.edu
:::::::::::::::::::::::::::::::::::::::::::::::
Sceptic in Slashdotia (Score:4)
The problem here is how you define and measure a rare occurrence. Let me give you an example.
Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is as likely to occur as your birthday and your girlfriend's birthday combined into esoteric equations.
Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What are the probabilities of that happening? 1/144? No, just 1/12. At some point (and cryptos will be familiar with this), as you add people, it becomes a rare event NOT to find people with the same sign.
All that graph is showing me is that the guys (I hesitate to call them scientists - I mean, they published in "serious papers"? Come on. Names, please) looked purposefully for freak occurrences, discarding other "rare" occurrences that were perfectly normal. That's why the left side of the graph is wider.
Thing is, the Gaussian curve doesn't come out of nowhere; it's not arbitrary. For instance, in statistical mechanics and quantum mechanics, you get bell curve distributions precisely because of the distribution of particle states.
All these guys are saying is, "rare events are not as rare as we think they are." That's not because the bell curve is wrong; it's because we seem to forget how huge a sample the Earth provides.
What are the odds of being struck by lightning twice? One in a billion? We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.
And, again, same thing with floods or tornadoes. Yes, in themselves they're rare. When taken alone they seem improbable. But on the scale of the planet, that's the kind of thing that happens.
Alright, anyone got another article on cold fusion lying around?
"There is no surer way to ruin a good discussion than to contaminate it with the facts."
Re:Having trouble understanding the graph... (Score:2)
In other words: we aren't talking about the likelihood that you will encounter an individual of the species, we're talking about counting the species itself. A few really common species, a good spread of "average" species, and a few species represented by few individuals.
'Course I could just be full of it. Wouldn't be the first time...
Re:Having trouble understanding the graph... (Score:2)
It took me a minute, too - I'll try to distill my understanding into English. Assume that the rarity of a species is related to the number of times it is found (duh). The x-axis can be thought of as the number of findings of a given species. The y-axis can be thought of as the number of species that were found X number of times. Using the Gaussian distribution, you would expect a symmetric tail-off in both the more-rare and the less-rare directions from the peak value. {Yes, I know you can have skewed distributions.} What this new curve is showing is that the tail-off is much less in the more-rare direction. In other words, assume the peak of the curve is at 100 sightings of a species, with a standard deviation of 10 sightings. You would expect some number of species to have 130 sightings (3-sigma). Under the Gaussian distribution, you would expect to see the same number of species with only 70 sightings. This new distribution says that the number of species with only 70 sightings would be much higher than the number of species with 130 sightings.
Fascinating - I will certainly have to explore this further.