Science

Gaussian Distribution being questioned 205

Robert Wilde writes "The Financial Times is reporting in two stories that a group of scientists have discovered that any scale-independent system does not follow the traditional Gaussian bell curve but a new curve." Interesting implications for the systems above. From what I can gather from the article, for those systems in which this curve is more appropriate, rare events will occur more often than predicted by the Gaussian distribution. Anyone have more comments on this?
  • The curve is weird looking, but still readily quantifiable. Besides the mean and standard deviation, you can use skew (un-centeredness) and kurtosis (bulging in the middle) to describe how different a given curve is from a bell curve.

    More interesting, though, is the fact that the curves shown in the ft.com article weren't properly normalized; comparing these graphs visually doesn't begin to show what the differences are, and the axes on the graphs didn't make too much sense. If x = "rarity", then what does y correspond to? Typically you would show y as frequency and x as a value, and from this you would determine rarity.

    anyway, my two cents

    m.d.
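    As a concrete illustration of the skew and kurtosis measures mentioned above, here is a minimal sketch (assuming NumPy and SciPy are available) that computes them for a symmetric sample and a right-skewed one. Note that scipy.stats.kurtosis reports excess kurtosis, so a Gaussian comes out near zero; the sample sizes and parameters are made up purely for illustration.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)

        # A symmetric sample and a right-skewed sample for comparison.
        symmetric = rng.normal(loc=0.0, scale=1.0, size=100_000)
        skewed = rng.lognormal(mean=0.0, sigma=0.75, size=100_000)

        for name, sample in [("normal", symmetric), ("lognormal", skewed)]:
            print(
                f"{name:10s} mean={sample.mean():6.3f} "
                f"std={sample.std():6.3f} "
                f"skew={stats.skew(sample):6.3f} "          # asymmetry of the tails
                f"kurtosis={stats.kurtosis(sample):6.3f}"   # excess kurtosis; ~0 for a Gaussian
            )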
  • by stoney ( 780 )
    Easily. This is called the Slashdot effect.
  • The journalist completely missed the point. There are already lots of other curves that you can use to fit 'non-normal' data (e.g. Weibull, Gamma, etc) that could be more appropriate than a normal curve. The link with chaos theory and the new distribution is interesting though.
  • by heroine ( 1220 )
    This is the driest article to appear on slashdot ever. As Hemos gets closer and closer to grad school the articles get drier and drier. How about doing something interesting, like teaching lab mice how to roll over.
  • I admit that the popular journalism of science is lacking (from the articles given here it is really impossible to form any informed opinion about the scientific validity or merit of the work reported on, and no references to that work are given). I quickly checked the first name in the article (Donald Turcotte) and indeed he has been published in a number of peer-reviewed journals, including Science, about self-organized critical behavior. What I don't understand is how you can judge whether the scientists involved in this research are doing good work or quackery based on one popular press article, without trying to examine the facts before jumping to conclusions. I'm not saying what is described in the articles is right or groundbreaking; what I am saying is that these articles alone don't give nearly enough information to form a reasonable opinion (although I don't have enough time to go and do a full literature review to form an informed opinion either).

    Don't blame the scientists for poor reporting.

  • Turcotte has a 2-part paper on "Self-Affine Time Series" in a recent /Adv. Geophys./ that looks, by timing and title, as if it would provide the technical information those who are really interested would want.
  • by Axe ( 11122 )
    Do not suffer too much. That article is garbage. Read some books on related subjects instead. As an example, Mandelbrot is pretty funny, and many parts of his writing are accessible to a person without a strong mathematics/statistics background.
  • Not to be too crazy, but if this holds up, and others find this curve, it is exceptional. The basic curves of life, and chaos. This is the stuff that explains why a seashell and a universe have the same design. Chaos theory and quantum mechanics both show a certain unpredictability to reality. Science like this shows there is some underlying pattern. At the very least this is extremely interesting, at least for all of us who want our own universe some day.
  • That's not an observation or a fact.

    You miss, sorry. This is only true for finite-variance variables. Observables in nature are not required to have finite variance - there are plenty of cases when they do not.

    Your position is typical for those who just took some statistics classes, but never bothered to check the fine print and understand what it means (no insult, please). But be careful when you make strong statements in public. They sound funny.

    Check some references on "stable distributions" in statistics. For physical examples do a search on "Levy flights"

    Also, the Gaussian distribution is not "normal" in the sense of its frequent occurrence. I would claim that a scaling, or "power law" as physicists refer to it, distribution is much more common in natural phenomena.



  • A couple of comments.

    Statisticians have said for ages that not all data follows the normal (a.k.a. gaussian) distribution. We even have names for the ways in which distributions differ from the normal. Skewness describes distributions where one tail is stretched out in one direction longer than the other like this [gu.edu.au], or this more extreme example.

    Kurtosis describes the "thickness" of the tails in comparison to the height of the centre of the distribution (i.e. this [gu.edu.au] has more kurtosis than this [gu.edu.au]).

    So, with some distributions, the chance of rare events is greater than some others.

    Secondly, in the Financial Times (not my usual choice of statistical literature) articles there seems to be little link between the "universal curve" stuff and distributions other than the normal.

  • If you don't want to jump backwards and forwards, all the graphs are accessible from
    this link [gu.edu.au]
  • note: this is all dependent on whether this is actual science or some disillusioned scientists. I tend to believe it, mainly because these scientists would most likely not be the type to publish normally, but until I see it from another source I won't totally believe it. That being said, let me argue like I do.

    Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is as likely to occur as your birthday and your girlfriend's birthday combined into esoteric equations.

    Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What are the probabilities of that happening? 1/144? No, just 1/12. At some point (and crypto folks will be familiar with this), as you add people, it becomes a rare event that you do not find people with the same sign.


    Both of the examples you give here are actually rare occurrences - not the number series themselves, but the fact that you recognize them as special series. You note their occurrence as extremely rare (imagine the water-cooler talk if the lotto was 1-2-3-4-5-6!), thus in fact making them rare.

    These guys were both looking at special curves, in fact random ones, that turned out to be the same. That is significant in the number of other patterns that can, or cannot, be explained. At the very least this will cause your insurance rates to go up :)

    We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.

    That's what the story said: very rare occurrences are more likely. Check out the Drake Equation [irelands-web.ie] if you think that couldn't be significant.

    cold fusion
    this is different (so far) in that it was two totally separate areas of study that found the same thing, not some freaks in the desert.

    Cool stuff regardless.
    Slashdotia
    pronounced Slash-dosh-ya? :)

  • This is all bollocks.

    The Gaussian distribution is 'the universal distribution' in the following sense:
    Consider a series of events that generate some value. For example, rolling a die, which generates a value from 1 to 6. Assume that these events are independent, meaning that, say, the 10th outcome will in no way influence, say, the 20th outcome. Now take the first N outcomes, add them together and divide by N. The larger you take N, the better the distribution of this average follows the Gaussian distribution. (And I should add that there are some mild conditions that have to be satisfied.)

    Now what are they saying here? That the 'rareness' of species does not follow the Gaussian distribution? How do you quantify 'rareness'? How can this satisfy any kind of independence condition (where there's one rare animal, there are bound to be more)?

    What's weirdest of all is the statement that rare species are more common than expected. What a joke! If something is more common than expected, then by definition it is not as rare as you thought!
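    As a concrete check of the averaging argument above, here is a small simulation sketch (assuming NumPy; the trial counts are arbitrary): average N independent die rolls and watch the spread of the average shrink toward the Gaussian prediction as N grows.

        import numpy as np

        rng = np.random.default_rng(0)

        def average_of_rolls(n_rolls, n_trials=50_000):
            """Average n_rolls independent fair-die outcomes, repeated n_trials times."""
            rolls = rng.integers(1, 7, size=(n_trials, n_rolls))
            return rolls.mean(axis=1)

        for n in (1, 2, 10, 100):
            avg = average_of_rolls(n)
            # The central limit theorem predicts mean 3.5 and std sqrt(35/12) / sqrt(n).
            print(f"N={n:4d}  mean={avg.mean():.3f}  std={avg.std():.3f}  "
                  f"predicted std={np.sqrt(35/12)/np.sqrt(n):.3f}")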
  • Since when is Chaos Theory ridiculed by the scientific community because it's a little wild? And how in hell could the popularity of Jurassic Park ruin the work of Chaos Theorists?

    They're trying to sweeten up the deal by placing the guys behind this as innovators who took on a controversial path. That's just downright silly. I took Chaos Theory grad. courses in college, and let me tell you it's so widely-used that it's like saying electricity is a controversial theory. Let me also tell you that what they're trying to say has absolutely nothing to do with Chaos Theory.

    I mean! I hope they never make a movie starring Jeff Goldblum about Newton's life, because we might end up refuting Classical Mechanics (even at non-relativistic speeds) tomorrow, wouldn't we? And those movies 'IQ' and 'Young Einstein' really ruined Relativity for me. Drat.

    "There is no surer way to ruin a good discussion than to contaminate it with the facts."

  • This is exactly what one sees when plotting exponential distributions on a log scale. If you have a reaction where A -> B at some rate, then plotting, for example, event durations, you would get this distribution as long as the x-axis is log. When working with any system where there is a delta G of reaction(s), the distribution is not Gaussian, and you can see this graphically.
  • is not because it's more correct, but because it makes the math nice. and it turns out, the things that we do with that nice math seem to have pretty accurate results. for instance, my data has sufficient outliers that an exponential distribution would probably model them better, but using a gaussian distribution means i can use easy math to get answers that agree with my data to experimental uncertainty, and i'm not going to spend any time making a more precise model that won't get me any improvement in my results (my data was all taken long ago, so the experimental uncertainty is fixed)
  • As you said, for the central limit theorem you need a lot of independent rv's. I wonder if self-similarity causes interdependent rv's, such that they can be shown to converge to a specific curve, or if their research is just "look at our data - it looks about the same" type stuff. Does anyone know if there is a formal theoretical basis for this work?
  • I wonder why there is the sudden interest in this. While I'll admit that many of my colleagues still haven't figured out that the Gaussian curve is not supreme, it has been known for many decades that most things don't follow straight Gaussian randomness (or white noise, as many like to call it). Since I started looking at chaos and fractals many years ago, all of the research I've done and looked at - ranging from particle motion, to weather patterns, to fluid dynamics, to DNA, to Internet traffic, to images and textures, to EKG signals, and the list goes on - has had very non-Gaussian but still random characteristics. Our description of the randomness was through chaos and fractal theories.

    I'm glad to see that this is getting some press time, but, it does seem strange to me since much of this has been known since well before the 1970s as quoted in the articles.

    I suppose it is time to get the word out a little more and throw off the limiting shackles of the Gaussian distribution and white noise
    (try brown and pink noise instead... much more pleasing).
  • Thanks, people. I was looking at the graph merely in one dimension, and didn't even consider that it represented an X-Y axis. Now I understand perfectly. :)
    "There are no shortcuts to any place worth going."
  • I think that anyone who deals with large amounts of computer hardware (i.e. enough to be a statistically sound sample) could attest to the claim made.
    Certain failures and other occurrences happen much more frequently than one expects from a straightforward analysis of uptimes and standard accepted failure rates.
  • You know, this isn't really DIRECTLY on topic, however:

    In high school a few years back I was a teacher's aide, and my teachers all talked about the standard curve, blah blah blah.

    Yet, never once did I ever see the distribution - but no, I had to be wrong, right? I mean, who am I to tell the teacher they're wrong (not that I was ever slow to disagree >:P )

    Anyways, looking at the graph, it seems a bit more realistic than a standard curve, because in reality, intelligence grows fast but falls faster :)

    (much like my grades...quick to raise, quicker to fall)

    Oh, and anyone notice how if you turn your head sideways this kind of looks like half a turnip...which this and the poll option, has Rob revealed a secret fetish?
  • also go here [caltech.edu] halfway down the page
    jump to "turcotte"
  • http://www.ft.com/hippocampus/q14ae5a.htm [ft.com]
    Also, I don't understand how self-similarity would change the bell curve. You'd think every portion would still have the same probabilities, no?
  • this is just another ruse for the insurance company to raise my flood insurance premiums again!

    Chuck
    Conspiracy theorist
  • The graph is rather confusing. This is my interpretation of it. Go out in the field and count the number of critters and categorize them by their species (id). Then normalize this count by some factor (perhaps the total number of critters that were counted). For instance, I counted 1K monkeys, 500 cats, 500 dogs, 480 turnips, 200 rats, 50 snakes, 10 roaches, 5 hippos, 3 programmers, and 2 script kiddies. Now plot this distro.

    The monkeys were less rare and therefore plot to the right, while the programmers, and script kiddies are rare and plot to the left. The "mean" value is the dogs and cats; this plots more to the right.

    So what they are saying is that there are more species that have a smaller (rarer) number of critters that they could find. The "most common" value corresponds to the "average" number of critters per species.

    I'm guessing now, but if one did a similar survey of the world's population using nationality instead of species, one might get a similar type of distribution.

  • He had linked to both articles.
    sorry.
  • I think what they mean is that the breakthrough was the linkage between this particular skewed curve and a whole slew of previously 'unpredictable' events (e.g. demagnetisation of magnets using heat, turbulence flows, etc.). Also, the possible applications to other types of so-called 'self-similar' events, if their theory turns out to have some merit.
  • by mattdm ( 1931 ) on Thursday September 02, 1999 @11:57AM (#1708395) Homepage
    One-in-a-million chances happen nine times out of ten.

    --

  • In the 1st article, there is a graph about midway that appears to illustrate the notion that, with the new curve, you are more likely to find the rarest creature than the least-rare creature. I must not be interpreting it right, and I tried reading that part a few more times.

    Also, unrelated to the above question, how come it took scientists so long to analyze the obviousness of the microcosm in such detail within the field of statistics? Shouldn't this have been obvious? Why do you think it wasn't? I have no clue.

    "There are no shortcuts to any place worth going."
  • Well, equipment curves aren't gaussian. If you plotted Failure vs. Time, you'll find two sharp peaks - at the beginning (close to Time = 0), and near the end (Time = ??). In-between it's a flat line (that's not near zero). Ideally, manufacturers would burn in equipment until it's in the "flat" region, but in this "gotta-have-it-now" age, I think testing is going down the tubes...

    (The other end is because the device dies from old age. Of course, the MTBF numbers are just another statistic...)

    No numbers, but it's the approximate curve.
  • Isn't this just Murphy's Law?

    "The number of suckers born each minute doubles every 18 months."
  • How can one get a long-tailed statistical distribution as opposed to a symmetrical Gaussian distribution? There is one simple model that will generate this.

    Suppose that ppl's programming skills are statistically Gaussian distributed. These ppl then decide to produce a "new" OS called linux. The contributions of these ppl are then plotted up. One would find that the majority of ppl produced a lot of "minor" improvements, smaller programs, scripts, and responses on mailing lists. There would be a smaller group of ppl that contributed a lot of important stuff.

    This is the lognormal statistical distribution, IIRC. A bunch of ppl are capable of writing good code in support of this new OS. Unfortunately, only a smaller subset of these ppl have the time to work on the project for a long period of time. Then only a smaller subset of these ppl have the inclination to volunteer their services for this long period of time. Additionally, only a smaller portion of these ppl have the overall skills to do this. The result is that there are only a few ppl that have all of these attributes.

    Sorry for this simplistic explanation (it is late and I should really be sleeping now). A lognormal is really a summation of normal distributions in log space (multiplication in regular space). Another way to view this is to ask yourself a bunch of statistical what-if questions (the questions should really generate a set of answers that are Gaussian distributed). When you answer no, then you are out of the game. More ppl are eliminated early.

  • by rde ( 17364 ) on Thursday September 02, 1999 @11:59AM (#1708401)
    The new curve is broader and more gently sloping, suggesting that the rarest events occur more often than predicted by the bell-shaped curve.
    Or, as wizzards have known for years, million to one chances happen nine times out of ten.

    But seriously, folks. In terms of its applicability to pretty much everything, this reminds me a lot of an article [newscientist.com] in New Scientist [newscientist.com] that I also found darn interesting.
  • by bap ( 75675 )
    The Bayesian community has known about this for many years. It is a log Gaussian, which is the prior commonly used for SCALE PARAMETERS in Bayesian estimation. It is interesting that it applies to other scale parameters, but it's what you'd expect, not some big breakthrough.
  • Spoken like a true college student. Possibly even a grad student(?)

    Just because something does not fit the current model does not mean it's wrong (Well it does when you're in college).

    I bet you get A's, don't you? Fudge your lab data a lot?

    This might turn out to be on par with cold fusion, or it might be significant. Let's wait for the additional research and find out.

    -Rich
  • If I remember back to my Random Processes class, not much really is Gaussian. There are two reasons that that assumption is often made. The first is that we have so many tools that assume it, and they work OK if we are near it. The second is the abuse of the central limit theorem, which says (correct me here if I'm not precise) that the sum of a large set of random variables tends toward a Gaussian distribution as the number of variables approaches infinity. The problem is that people tend to shortchange the infinity part and exaggerate the "tends to" part.

    What we really need to do is stop teaching statistics classes that depend on a gaussian distribution. Down with standard deviations:-).
  • The infinitely defined portion of the Gaussian curve doesn't add anything... because as x -> infinity, gaussian(x) -> 0 much faster... off the top of my head, I can't remember the expression for the Gaussian though.

    Not every pseudo-random event that you plot will produce a Gaussian curve... tracking the rolls of a die will give just a flat linear curve, while tracking the sum of two dice rolled together will produce a Gaussian bell curve...

    There's also more fun you can have with a 'Lorentz' distribution.... as well as however many other distributions there are out there.

    If I remember correctly though, a Poisson distribution is just a discrete Gaussian distribution. Basically, in the limit of large n.

    This article is nothing in and of itself.
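    The remark above about the Poisson being roughly a "discrete Gaussian" in the appropriate limit can be checked numerically. A quick sketch, assuming SciPy is available (the lambda values are arbitrary), comparing the Poisson pmf to a normal density with matching mean and variance:

        import numpy as np
        from scipy import stats

        for lam in (2, 20, 200):
            # Look a few standard deviations either side of the mean.
            k = np.arange(max(0, int(lam - 4 * np.sqrt(lam))), int(lam + 4 * np.sqrt(lam)) + 1)
            poisson_pmf = stats.poisson.pmf(k, lam)
            normal_pdf = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
            # The maximum absolute difference shrinks as the mean grows.
            print(f"lambda={lam:4d}  max |Poisson - Normal| = {np.abs(poisson_pmf - normal_pdf).max():.4f}")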
  • Could someone please provide a link to the actual scientific paper(s)?
  • If the curve represents a count of species, the most common species would be represented by a single point near the right hand side of the curve. The work is just an observation that, in certain apparently unrelated fields, a new probability distribution is operating.

    In my own field, the distribution of stock market returns is often taken to be distributed log-normal. Unfortunately, extreme downturns in the market that have been observed should be so rare that they should never be observed with the frequency that they are. A new distribution that gives increased weight to rare events would be very useful.

    You ask, "Shouldn't this have been obvious?" No, not really. New distributions are not often found. One can mathematically derive any number of distributions, but they have little use unless you can find physical processes that exemplify them. With the development of chaos theory and fractal theory (the self-similarity referred to in the article) new physical processes have been defined. These have only been recognized in the last 25 years or so.
  • I wonder how this will affect the whole field of data analysis. If this curve proves to be pretty common, then wouldn't it affect the assumption (in at least the social sciences) that your distribution is normal (i.e. as in z-tests, t-tests, and ANOVA)? I guess if you can't assume normality then you will need to find other analysis techniques.
  • This really isn't the biggest discovery ever. In fact, what they've accomplished is to rediscover the base assumptions of the bell curve. The normal curve (bell curve) is a product of stochastic interactions between atomistic events; it pretty much only reflects behavior in systems where new actions are not affected by the history of the system. If you have a saturated system (like the ground being unable to absorb more water in the case of the flooding example) you've got a messed-up curve. Any decent book on statistics will give you the basics about this. [amazon.com]
  • Oh great, you just HAD to bring Linux into this... now there's going to be a new kind of distribution wars.

    "I use Gaussian because it's mathematically pure!"
    "Fat-tails is a distribution for the REAL WORLD!"
    "Yeah, but fat-tails just copied Gaussian and added a little bit!"
    etc.
    --
  • Least likely events happen when you least need them.
  • exactly: chi-square or Poisson more likely (the distribution of accidents, i.e. unlikely events). this is just more chaos drivel whipped along by ignorant journalism. cheers, tim
  • It sounds like what they're talking about is the Self-Organized Criticality model. (Do a Google search on Self-Organized Criticality -- you'll get tons of references.)

    The difference between this and the gaussian model is that with the gaussian you are merely dealing with the summed behavior of a large number of independent variables. With the SOC model you are dealing with a particular pattern -- the frequency of changes as a function of their magnitude is described by a power law. It's not just a bunch of stuff happening randomly, it's a particular state in which the rarity of a change is correlated in a precise way to its magnitude.
    If I'm correct and that *is* what they're talking about, this isn't all THAT new. I have a neuroscientist friend who's been working on applying the SOC model to brain function with some success for a couple years now.

    But the article is vague enough that it's not totally clear that's what they're talking about.
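    For anyone who wants to see what the power-law frequency-magnitude signature described above actually looks like, here is a minimal sketch (assuming NumPy; the exponent and sample size are made up for illustration). It draws power-law magnitudes by inverse-transform sampling and checks that the survival function is a straight line on a log-log plot:

        import numpy as np

        rng = np.random.default_rng(1)

        def sample_power_law(alpha, x_min, size):
            """Inverse-transform sampling for p(x) ~ x**(-alpha), x >= x_min."""
            u = rng.random(size)
            return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

        magnitudes = sample_power_law(alpha=2.5, x_min=1.0, size=200_000)

        # Empirical survival function P(X >= x): for a power law it is a straight
        # line on a log-log plot with slope -(alpha - 1).
        x = np.logspace(0, 2, 20)
        survival = np.array([(magnitudes >= xi).mean() for xi in x])
        slope, _ = np.polyfit(np.log(x), np.log(survival), 1)
        print(f"fitted log-log slope = {slope:.2f}  (expected about {-(2.5 - 1):.2f})")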

  • The authors have found things that were mismodeled as Gaussian and instead follow another distribution. So what? There are plenty of distributions besides the normal that are asymmetric and have fatter tails.

    It *may* be that they've found another distribution that appears in multiple fields, but there's not enough here to judge this as a statistician. If it has any parameters beyond mean and variance, I'm not likely to be impressed--I can probably produce a three parameter beta distribution that's close.

    hawk,wearing his Ph.D. statistician hat for the moment
  • Isn't this whole thing just a fractal?
  • The key that some people seem to be ignoring is that this is only for self-similar phenomena. This gives rise to the asymmetry and emphasis for rare events as compared to the gaussian distribution. The assertion is not that a Gaussian is incorrect, but that it does not accurately model certain self-similar phenomena.
  • Ok, I'm getting "arbetsskada". I read that and thought to myself, "Sweet name for a new distribution of linux."

    Slap.
  • The September Sci American has an interesting blurb about possible non-gaussian distribution of the Cosmic Background radiation.

    http://www.sciam.com/1999/0999issue/0999scicit5.html

    Unfortunately, it does not reference the papers that it is based on. Sigh.

    MobiusKlein.
  • I am currently studying ECG data from hearts. We have shown that a healthy heart follows just such a distribution as the one mentioned by the article in the Financial Times.

    The interpretation of this is that the heart is in a self-similar state; that is, all lengths of time between heartbeats occur, at all scales - the distribution of which is a power law. The heart is in a state similar to a condensed-matter phase transition; that is, its control mechanism keeps the heart in a critically balanced state, ready to change period rapidly.

  • I guess the difference with your example is that the equations used to explain the experiments you listed are rather basic ones. As I understood it, the curve explained by the article is very complicated, and it would make sense for it to explain just one specific experiment well. But it explained a lot more than expected; actually, it turned out that it could be a basic curve as well.

    --
    "take the red pill and you stay in wonderland and I'll show you how deep the rabbit hole goes"
  • You're right, the journalists missed the point. The part that's "wrong" is the over-application of the Gaussian distribution as a model of everything.

  • This doesn't prove a Gaussian curve is "wrong". What it is saying is that the new data is evidence that a Gaussian distribution is probably the wrong model to be using under the circumstances described. The theory is that self-similar phenomena do not follow a Gaussian distribution, but follow this new distribution. It seems to me that the deep mathematical analysis has not really been done, but the experimental evidence suggests the existence of this distribution. There is probably a lot of work ahead in coming up with a mathematical model for the new distribution. What would be really interesting would be if the mathematical model for the distribution reduced to the Gaussian under specific conditions, kind of like how special relativity reduces to classical mechanics at low speeds.

    The biggest implication of the model is in the insurance industry. If it is found that floods, fires, earthquakes, and hurricanes follow the new distribution, it may allow insurers to go back to insuring against earthquakes and hurricanes, because they can actually predict long-term income and expenditures more accurately. Maybe they will actually do their job instead of claiming hardship whenever a disaster strikes somewhere.

    Insurance Exec: Oh wahhh!!! We can't pay a billion in claims, go to the government.

    Translation: We have taken in a net profit of 2 billion dollars over the last two years. But, the billion dollars for this disaster will affect our earnings numbers for the next quarter or two and my stock options will be worthless.
  • Come on now people... Forgive me, but this is hardly shocking.

    I looked over the articles, and all I can say is "So what?" The Gaussian distribution is based on pure randomness. Did you expect everything to be a completely random event?

    Neither article seems to go into great detail about how the new curve was calculated, but it's simply a _FACT_ that applying the Gaussian distribution to most events is considered a "simplification" of the problem, assuming it's random. Take away some randomness, and of course the Gaussian distribution won't fit.

    Intelligence (however measured) will not be purely random, nor will floods, grade distributions, tornadoes, or anything else.

    What's missing from both of these pieces is an explanation of the way the new curve was built, and on what foundation. The Poisson distribution is frequently used in place of the Gaussian because it "fits better," but again, that doesn't prove that the events have much to do with the math.

    This is a case of "curve fitting gone wild" here, and unless I can see someone spell out in scientific detail the relationship between the events and the distribution, I don't buy it. So, they have a new equation and a new curve; it doesn't mean that the events are related to the math directly. If you look for anything hard enough, you will start to find it everywhere.

    I do award them credit for a new curve that better fits some models, if the equation for their curve is manageable. If it's a complex equation, it's worthless, because the whole point is to make some equation fit a distribution of events. If theirs fits, and it's easy to calculate, it's beneficial. But it does not imply a direct correlation between the functions and the variables in the distribution. How do I explain this in Slashdot terms... (/me gets frustrated).

    OK, take Moore's Law, you all know that, right? Processor power doubles every 18 months? Or, more accurately, I believe he stated something to the effect that the number of circuits would double every 18 months. Well, a loosely fit exponential function will almost match this trend (roughly). But then you have to "adjust" the month scale between 12 and 24 until the curve fits well. Now, that's a "model," but it does not prove scientifically that circuits and design engineers are behaving exactly as can be predicted. At some point in the future, everyone has predicted, Moore's Law will fail. See... it's a model! Curve fitting... it doesn't PROVE anything about what's going on in developers' minds, or much that is tangible other than the "estimation" that things will get more powerful in the computing world.

    Now, take it a step further: say Moore's Law fails right as people develop a new method of increasing computing performance, like say 3D circuits, or something not yet conceived, and with fewer "countable circuits" you get more performance. Suddenly, new devices start to have a few less circuits, and more power. Now the Moore's Law curve goes down, slowly at first, leveling off, and maybe dropping just a tad, and it starts to look like a "bell shaped curve" only half drawn. You could go "curve fitting crazy" and say "Hey, it's Gaussian, it's going to go down now, and within another 15 years we will all be back to 8-bit processors!" That's just idiotic.

    In short, curve fitting is useful to predict many things, but it cannot be assumed that the curve implies natural phenomena. Any curve that fits data is useful. A curve that fits data does not directly imply complete correlation of events, or definitive proof that God does or doesn't play dice (I hope he does, personally; he has to have fun sometime!). And furthermore:

    For those who continue to doubt that it could all be so simple, Prof Turcotte has a suitably direct response. "People say: 'You can't do it because it's too complicated a problem'," he says. "We say: 'Just look at the data'."

    So his data fit, so what? Any reasonable math whiz should be able to come up with a few dozen equations that fit a line. Doesn't prove a thing.

    Forgive my typos, bad grammar, and spelling; I got pretty pissed at tabloid junk science, and I had to vent. Feel free to prove me wrong; I would like to see how you can prove the new equation and chaos theory are the best "insight into the universe" we have... BTW, if you can prove it, you'll probably be up for a Nobel Prize too.

  • So the Gaussian distribution isn't wrong, it just isn't the correct choice of distribution for that particular experiment.

    My point exactly- the people doing that research probably never intended for the article to lean towards a "Scrap Gaussian! Look at us!" thesis, but that's what happens from time to time when you get journalists in the act of "reporting" science.
  • Statistics classes which teach Gaussian distributions are fine provided students learn what is required for the central limit theorem to apply. I think you may be mistaken regarding the "noninfinite" part. We can quantify the fluctuations from a Gaussian law in a rigorous manner in cases where a noninfinite number of variables are concerned. The real difficulty, on the other hand, is that people sometimes apply the CLT when it is not valid.

    You should add "...approaches infinity, provided the fractional contribution from any one random variable to the sum uniformly converges to zero in the limit as N -> infinity." This is an important distinction. For instance, Levy distributions are a class of stable limit laws for which this is not the case--the largest variable in the sum can in fact dominate the sum. Symmetric Levy distributions may superficially resemble Gaussian laws, but with tails that decay slower (like power laws rather than exponentially fast).

    This article is amusing if only because it is a nostalgic throwback to the days of P.R. and hype over "chaos theory." Call me dense, but I don't understand why something as simple as scale-invariance needs to be dressed in the extra jargon and hype. Assuming the author did not miss anything terribly fundamental, I don't see anything novel in what was reported. Perhaps someone in the know can fill me in on just how exactly this turns statistical physics on its head?

    [For those who are interested, Levy distributions are treated quite adequately in Limit Distributions for Sums of Independent Random Variables (Gnedenko and Kolmogorov) (c)1954].
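    For anyone curious how much fatter the tails of a symmetric Levy (stable) law are compared with a Gaussian, here is a rough sketch assuming a reasonably recent SciPy (it relies on scipy.stats.levy_stable; the alpha value and the 5-unit threshold are arbitrary choices for illustration):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        n = 100_000

        gaussian = rng.normal(size=n)
        # alpha < 2 gives a symmetric stable law with power-law tails (infinite variance);
        # alpha = 2 would recover the Gaussian case.
        levy = stats.levy_stable.rvs(alpha=1.5, beta=0.0, size=n, random_state=rng)

        for name, sample in [("gaussian", gaussian), ("levy a=1.5", levy)]:
            tail = np.mean(np.abs(sample) > 5.0)   # how often we land more than 5 units out
            print(f"{name:12s} P(|X| > 5) = {tail:.2e}")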
  • You are not necessarily required to use a true Gaussian curve in statistical analysis, if my memory serves me well.
    As I recall, we were encouraged to map our distributions manually, to discover the shape of our curves, which resemble the Gaussian curve, but are skewed in one direction or another.
    The curve in the article looks much more like some of the curves we'd come up with, but the exercise was to demonstrate that the larger your sample, the more your curve began to look like the classic Gaussian curve.
    Remember, non-math people (like me), that all a statistic can show is that something happened that cannot be attributed to random chance alone. That's it. Naturally, the closer your sample is to reality, the more you can be sure that you have results that are statistically significant, and the more discernible this will be when compared with the Gaussian curve, which is intended to be a close approximation of what you get with a distribution of truly random events. The goal of using statistics is to attempt to prove a correlation between conjecture and reality. Statistics is the only way we have of doing this.
    Everyone should get a copy of "How to Lie with Statistics", because it explains much better than I can what exactly statistics actually "prove".
  • I had not heard about the New Scientist article, but I have known about Benford's law for some years. Indeed without a description of what curve these people think that they have, I am not sure how it *differs* from Benford's law in practical import!

    Indeed I suspect that they just have some variation on a lognormal curve. (Which does indeed show up in many different places.)

    Incidentally, one of the few things that I disagree with in Knuth is his presentation of Benford's law. Sure, the toy mathematical model he generates is fun and all, but he says nothing about why it applies to the real world. And hence his "proof" says nothing about why real numbers that appear in real computers follow Benford's law. I personally find the general explanation in the article you listed to be far more convincing...

    Cheers,
    Ben
  • Notice that they place 'most common' on the rightmost of the graph, instead of in the center of the curve.
  • Actually, I was wondering if it weren't related to the Gaussian Orthogonal Ensemble (GOE) distribution, which was a result of much of Wigner's work pioneering Random Matrix Theory (RMT) decades ago.

    Mathematically, the GOE distribution characterizes the eigenvalues of a Gaussian distribution of orthogonal matrices containing random elements. (Forgive me if I've got the math a bit wrong; I'm a physicist by trade...)

    Physically, the GOE distribution has been popping up in increasingly many physical systems for a while now. Years ago (maybe by Wigner himself? not sure) it was noticed that the energy level spacings of atomic nuclei have statistical properties consistent with the GOE distribution. Some time later, people fooling around with microwave cavities began seeing these distributions as well. The quantum dot folks have also run into the GOE distribution, I believe.

    The GOE distribution seems to provide a good test for broken symmetries in a system. As a system's symmetry is gradually broken by, say, shaving off a corner of a piezoelectric crystal, the statistics followed by the eigenvalues (in this example, the resonant frequencies) gradually shift from GOE to Poisson, the latter of which characterizes the eigenvalues of a truly random system.

    Now, two really cool things about the apparent universality of the GOE distribution are:

    • The distribution is parameter-independent, and does not contain any information about the system being analyzed. I can glom together energy level data from many different nuclei and still obtain the same GOE distribution.
    • There appears to be a connection to chaos.

    Neat, chaos! Well, sort of. If you take a classically chaotic system, say, a Sinai billiard, and quantize it (solve the Schroedinger equation), time after time you will discover that the eigenvalues of the quantized system have these nice statistical properties that happen to fall out of RMT, namely, the GOE distribution.

    So does that mean all quantum systems that follow GOE statistics are chaotic? No. In fact, it's difficult to define what "chaos" really means for a quantum system that has no classical analog. But it implies there's a connection, it certainly is fun to think about, and perhaps continued research will reveal a deeper universal phenomenon at work. I wonder if these researchers haven't taken another step in that direction.

    Dang, I wish I had something up on the web about how my research relates to all this... well, you can email [mailto] me.

  • First, let me say that the graph in the article is poorly labeled (or at least their example poorly chosen)...


    Not to mention that the area under the new curve in their graph is significantly more than that under the bell curve. Which means that the total probability is above 1. To use their example, we have a very neat species distribution, say 50% wolves, 50% rabbits, and 30% bears, for a total of 130%... My question is, is the Financial Times always that bad at math?
  • Maybe in the long term, this will be "proven" or "disproven," but does anybody remember the phrase "Assume a point mass ..." or "Assume a sphere ..." way back in school? What happens in the long run is that it's never as nice as you'd like. That's why there are distributions other than the Gaussian. Why should we assume that everything that can be found has already been found? Take the time to think, folks. Maybe this is like cold fusion, but maybe it's like the transistor for model prediction. More advanced, more accurate models could be the result.

  • from the second article:

    "The reason the systems behaved in the same fashion, they agreed, was that they shared a feature known as self-similarity. If an object is self-similar, it means it looks the same when viewed from far away or nearby. One example is the cauliflower: just as it is made up of individual florets, so each floret is made up of still smaller florets. If you were given a picture with no sense of scale, you could not tell if you were looking at a whole cauliflower or just one floret."

    I grepped the article for "fractal" and not once was it mentioned. Gee, I'm pretty sure that's the term used for what the author describes, or is the target audience so simplistic that the proper terms have to be dumbed down?

    Fear the popular press's interpretation of mathematical research data, especially when they need to mention Jurassic Park in the body of the story.

  • If memory serves me, the "God doesn't play dice" line came from a dispute Einstein had with Niels Bohr, where Einstein didn't believe in the randomness of quantum mechanics but was proven wrong later: God does play dice.
    Did I remember correctly?
  • That's why I said "the article claims they use Gaussian for this and that". I thought someone would have more insight. Yeah, I also heard that insurance companies employ a few good men/women. I would imagine they would use some extrapolated curve based on claims data going a century back or something, not just some silly distribution.

    In which case, you're absolutely right: WTF is the target audience for the article? Someone speculating on insurance companies stock?
  • One of the original researchers in this field is Benoit Mandelbrot, who applied it to financial markets, showing that price changes are not (as is frequently assumed) gaussian. Why the Financial Times did not pick up on this angle is beyond me.
  • Yes, I realize that it is statistics. However, it deals with determining what seemingly random events are most likely to happen everywhere, not in a single closed environment. Statistics are compilations of data. How that data is used is not statistics. "1 out of 10 blah blah blah" is spouting of statistics. What causes that 1 out of 10 is not statistics. That's the gist of this article.

    The reason I spoke of Psychohistory is because it is supposed to be using the statistics on an advanced probability engine. This is a step toward that equation. The more refined we get at deciding that "1 out of 10 blah blah blah because [insert reason]", the closer we get to figuring out the universe and how humanity acts as a whole.

    -NYFreddie

  • cold fusion

    this is different (so far) in that it was two totally separate areas of study that found the same thing, not some freaks in the desert.

    To me, the difference is that someone has made the claim that the Universe is radically different from what we know, based on a sample of data that was not peer-validated. If you remember the data on cold fusion, it made perfect sense if you adjusted the y-axis, and didn't lead to such mind-boggling conclusions.

    I'm willing to bet this is exactly the same. Inventive scientists deduce important rules based on experimental data. Rigorous scientists double-check their data before deducing important rules. What we need is inventive, rigorous scientists.

    Slashdotia

    pronounced Slash-dosh-ya? :)

    Sounds good to me. :) As we all know, Slashdotia is the Capital of Slashdom. :)

    "There is no surer way to ruin a good discussion than to contaminate it with the facts."

  • In my own field, the distribution of stock market returns is often taken to be distributed log-normal

    You can also start with log returns (instead of "normal" returns). This will give you an approximation to a Gaussian (as opposed to a lognormal distribution), plus they are summable across time. I work almost exclusively with log returns -- they are a pain when you need to calculate portfolios, but nice otherwise.

    A new distribution that gives increased weight to rare events would be very useful

    There are several (e.g. Cauchy), but the problem is that they are much harder to deal with (analytically) than the Gaussian. And if you don't like any, you can always work with the empirical distribution -- no need to pollute the facts with assumptions about what they should be. However, not much of statistics will be useful to you -- the Bayesians offer some good tools.

    Getting back to the original point, I wonder if these guys heard of Hurst and Hurst processes. A persistent Hurst process (sometimes called black noise) will generate something like what they found, and Hurst himself developed his theory on the basis of natural phenomena (he started with the frequency of floods on the Nile which occurred, surprise, more often than should have been expected). Skim through Peters "Fractal market analysis" for more information.

    I bet these guys rediscovered Hurst processes.




    Kaa
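    Since Hurst processes came up, here is a rough sketch (assuming NumPy, and using the simple aggregated-variance estimator rather than Hurst's original rescaled-range method) of how one might estimate a Hurst exponent from a series. For uncorrelated noise it should come out near 0.5, while persistent "black noise" would give values closer to 1; the block sizes below are arbitrary.

        import numpy as np

        def hurst_aggregated_variance(x, block_sizes):
            """Estimate the Hurst exponent H from the scaling of block-mean variances.

            For self-similar increments the variance of means over blocks of size m
            scales like m**(2H - 2), so H is recovered from the log-log slope.
            """
            x = np.asarray(x, dtype=float)
            variances = []
            for m in block_sizes:
                n_blocks = len(x) // m
                block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
                variances.append(block_means.var())
            slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
            return 1.0 + slope / 2.0

        rng = np.random.default_rng(3)
        white_noise = rng.normal(size=100_000)
        print(hurst_aggregated_variance(white_noise, [10, 20, 50, 100, 200, 500]))  # ~0.5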
  • Yes, the Gaussian is one case of the family of so-called "stable" distributions - the only one with a finite variance. A stable distribution is one such that if you add random variables with this distribution you will get a variable with the same distribution - as in the law of large numbers, but without finite variance.
    Infinite variance means the tails of the distribution fall off slowly - it is more likely to get an event further from the mean value.

    So fucking what? Big news? Hardly.

    Stable distributions have a lot of applications in many areas of physics and finance. Do a literature search on "Levy flights" for examples. There was a good article on Levy flights in one recent "Nature" (IIRC) For some financial applications - check out very easily written (but for a specialist kinda useless - IMHO) Mandelbrot's "Fractals and Scaling in Finance". It has some good discussion on the subject.

    Guys, you look like fools, making news out of a rather well known field. And discussing it rather childishly...

  • by Anonymous Coward
    Quick! Somebody fix the SETI client! We don't want to miss any of the alien signals!!

    SETI GBC Analysis [berkeley.edu]

    Just kidding. :)
  • Yes, my sentiments also. Does anyone have a link to a page which actually SAYS something on this topic?
  • Normal distributions only make sense if your fundamental operation is addition. If, however, your fundamental operation is multiplying the random variables of interest, then you get a distribution whose logarithm is normal. Hence the name lognormal.

    This is just as natural as a normal distribution, and appears more often than straight normal distributions in subjects like finance and stochastic analysis.

    Cheers,
    Ben
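    A quick numerical illustration of the point above, assuming NumPy and SciPy are available (the factor distribution and counts are made up): multiplying many independent positive factors gives a variable whose logarithm is approximately normal, i.e. a lognormal.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(5)

        # Each outcome is the product of 50 independent positive factors.
        factors = rng.uniform(0.5, 1.5, size=(100_000, 50))
        products = factors.prod(axis=1)

        # The log of the product is a sum of independent terms, so it is close to
        # Gaussian; the product itself is heavily right-skewed.
        print("skew of product      :", round(stats.skew(products), 3))
        print("skew of log(product) :", round(stats.skew(np.log(products)), 3))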
  • I thought it was a Jeffreys prior one should use as a LIP for a scale parameter.
  • by Anonymous Coward
    This is a well-known fact in statistics and finance, so-called fat tails.
  • Just in case people don't know where in the heck the above jokes are coming from, go out, buy some books by Terry Pratchett, and have a ball...

    Ben
  • Arrgh... you missed your statistics class.
    The t-distribution is NOT a BETTER FIT. It is the distribution of the sample mean of a Gaussian variable when you use the sample variance instead of the true variance (when it is unknown). When your sample grows bigger, the estimator of the variance becomes more accurate and your t-distribution approaches the Gaussian.
    There is nothing painful about t to use once you've got a clue.

    As for the Gaussian, as I mentioned above, it is a particular case of a stable distribution: one with a finite variance. This is hardly news, but some recent developments in self-affine processes have made other stable distributions more widely known.
  • it's a one-in-a-million chance of winning the lottery...
  • Umm, is that article just on crack or what? I am sure that graph (Learning curves: a fresh approach) has to be a misrepresentation of something... Notice how the "old" graph says that to the left we have a small distribution of "rare" species, and then (as you move to the right) it gets larger, and then, for some reason only known to the business kidz, the very un-rare species they think become umm rare again :). Obviously this just isn't true; no one maps out Gaussian curves like that. I think by the asymmetry of the "new" curve they are showing that some things are better modelled using Poisson statistics... that's what the "new graph" looks like anyway... the question is what on earth are they trying to show... what is the Y axis??

    Anyway, I'll put money on the fact that whoever these profs are, they are trying to scam cash from financial Wall Street types... (New curves, new ways to predict the stock market, give us money *cough*.) This article was a plant... but I'm feeling kind of cynical today :)

    -avi
  • Here is an example of "universal" mathematical models from a Mechanical Engineer:

    In school, I took a class called System Dynamics which is all about modeling dynamic behavior of systems. There is an interesting similarity of behavior between electrical, mechanical, and hydraulic systems in the equations used and how you define them.

    Driver:
    Electrical = voltage
    Mechanical = force
    Hydraulic = pressure

    Flow:
    Electrical = current
    Mechanical = velocity
    Hydraulic = flowrate of fluid

    Resistance:
    Electrical: voltage = constant*current
    Mechanical: force = constant*velocity
    Hydraulic: pressure = constant*flowrate of fluid

    Capacitance:
    Electrical: constant*integral(current) with time
    Mechanical: constant*distance traveled
    Hydraulic: constant*integral(flowrate) with time

    Inductance:
    Electrical: Voltage = constant*delta(current)/delta(time)
    Mechanical: Force = constant*delta(velocity)/delta(time)
    Hydraulic: Pressure = constant*delta(flowrate)/delta(time)
    (In the mechanical example, mass is the constant)

    The equations are very similar, but you don't see me calling the press and saying I've found a "universal" mathematical model.

    Trying to claim a "universal" law is hype. The fact that there is similar behavior for magnetic properties, turbulent flow, and distribution of species is interesting, but it doesn't suggest that everything is related in a similar way. I think that is why Mr. Turcotte got such a hostile reaction. Before you claim there might be a "universal law linking patterns of mineral deposits, floods and landslides" you had better look at the data first and not argue from the specific to the general the way he did in this case.
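    To make the analogy above concrete: an RC circuit charging toward a supply voltage and a mass with viscous drag approaching a steady velocity both obey the same first-order equation, so one integrator serves both. A small sketch (all parameter values here are invented purely for illustration):

        def first_order_response(drive, tau, dt=0.001, t_end=1.0):
            """Integrate dx/dt = (drive - x) / tau with forward Euler, starting from x = 0."""
            x = 0.0
            for _ in range(int(t_end / dt)):
                x += dt * (drive - x) / tau
            return x

        # Electrical: capacitor voltage in an RC circuit, tau = R*C.
        v_capacitor = first_order_response(drive=5.0, tau=0.2)   # volts
        # Mechanical: velocity of a mass with drag, tau = m/b, steady state = F/b.
        v_mass = first_order_response(drive=2.0, tau=0.2)        # metres per second

        print(f"capacitor voltage after 1 s: {v_capacitor:.3f} V (approaches 5 V)")
        print(f"mass velocity after 1 s:     {v_mass:.3f} m/s (approaches 2 m/s)")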
  • I need to read before I submit... Try again:
    but recent developments in the study of self-affine (fractal) processes have made other stable distributions used more often.
    Still bad, but screw my English...
  • amen brother

    http://www.indep.k12.mo.us/THS/student/aforrester/mandelbig.html
  • As many have pointed out, there is nothing new or nothing surprising in the claim. What the statistical theories claim is that if a variable is truly (mathematically) random the statistical distribution asymptotes to a Gaussian distribution (or the Bell curve). That's not an observation or a fact. That's a theorem which one can prove, in other words, it's more like a definition of a "true randomness" of a variable. Roughly speaking, if something is truly random, its distribution will begin to look like a Bell curve. The real question is, "what is truly random?"

    It's almost nonsensical to state that nature does not follow the Gaussian curve just because a statistical variable does not follow it. Perhaps it tells you more about the variable itself. If a variable x has a perfect Gaussian distribution, the distribution of log(x) will look nothing like a Gaussian distribution. Does that tell us the Gaussian curve is not the normal curve? It only tells us that even if x is truly random, log(x) is not.
  • The journalist tries to hype up the result as somehow overturning a "universal truth", that is, ubiquity of the gaussian bell curve.


    First, the bell curve is ubiquitous because so many random processes satisfy the assumptions of the Central Limit Theorem. (finiteness...)


    However, there are lots of natural phenomena that don't meet those requirements and so we use lots of probability distributions in science. Lorentzian and Poisson distributions come to mind.


    It's fascinating, but unsurprising that self-similarity leads to a different kind of probability distribution.


    The journalist heads into "Golly Gee" territory once he starts calling it a "Universal Curve".


    DK

  • What you're talking about is called the "central limit theorem", which holds for the summation of iid (independent, identically distributed) random variables of any kind of distribution with finite mean and variance.
  • If you measure a set of mostly random events you will end up with a bell curve.

    It seems to me that external, modifying events are removed from scientific studies as much as possible. This act automagically skews the results at least slightly, enough that you will find something else in nature. It would seem to me that it would be impossible to take EVERYTHING (i.e. everything) that might affect the results into account, so we don't bother trying.

    If we want to predict a mostly random event we apply the bell-shaped curve. But I say 'mostly random' because most things are not truly random.

    Just because we fail to predict or fully understand a problem does not mean that it is utterly random. This new curve helps to predict some things. Others might take a whole new curve. I do not believe that there will ever be a universally true curve. All that this points out (gasp) is that not all things are utterly random.
  • by ruff ( 83941 ) on Thursday September 02, 1999 @11:59AM (#1708490)

    Just because your data doesn't precisely fit the distribution, it does not mean the distribution is "wrong." What it means is your data doesn't match your distribution.

    This appears to be another case where journalists have missed the point.

    The Gaussian distribution is not "wrong" in any shape or form.
  • It sounds like something out of an Asimov novel, actually. A common formula that can be used to judge seemingly random events when large masses are considered - the individual is random, but the collective is predictable.

    I wonder if I should take this back to school and demand they raise my grades for all those times I "created the bell curve".

    -NYFreddie

  • Goddess, I love it when the status quo gets shaken up! Woo!
    To explain the 'rare more common than common' phenomenon, one need look no further than Hallmark or Precious Moments or crap like that: "We are all special, we are all unique, etc." Blah!
    Still giddy, this is cool!

    The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk
  • by Mr Z ( 6791 ) on Thursday September 02, 1999 @12:04PM (#1708495) Homepage Journal

    First, let me say that the graph in the article is poorly labeled (or at least their example poorly chosen), IMHO, since "rarity" is related to the number of standard deviations you are from the mean (whether or not the distribution is symmetrical), whereas their graph has rarity monotonically decreasing from left to right. I guess in this sense ("rarity of a species"), rarity != probability.

    This new graph strikes me as a bit odd, since it's not symmetrical. With the bell curve, you only need to know how many standard deviations you are from the mean. With this curve, "above the mean" and "below the mean" are vastly different territories.

    This curve brings up two questions for me:

    • Are there processes/events for which the mirror-image of this curve is the more appropriate distribution?
    • Whatever happened to the other distributions we know and love, like the Poisson distribution? Not all random events are evenly distributed, and we've known this for a long time.

    I guess this new curve is just another way of saying that "Hey, there's a class of 'random' events out there that share a common non-uniform distribution!" While that's useful to know, I don't see it as the ultimate refutation of the Gaussian distribution.

    --Joe
    --
  • by PG13 ( 3024 ) on Thursday September 02, 1999 @12:04PM (#1708499)
    The use of the Gaussian curve is based on the assumption that the random variable we are considering is actually generated as an average of many, many independent random variables. It has been shown that for all 'reasonable' independent random variables, in the limit their average will follow a Gaussian distribution. This is straightforward mathematics; there's no arguing with it.

    As such, from a mathematical point of view this has nothing to do with replacing the Gaussian curve...it is still clearly the most 'natural' mathematical curve. However, what I understand the authors to be claiming is that certain types of real-world events are not actually Gaussian and are described better by this model. This shouldn't be that surprising, as often the 'extreme' cases are not caused by a mere sum of the independent random variables mentioned earlier.

    For instance, intelligence might be regarded as the influence of a great many small random variables (how some genes got arranged, upbringing, etc.), but the truly tail-end cases such as mental retardation do not occur because all of these factors go bad (someone who is retarded is usually the result of some genetic defect, not a combination of bad upbringing, poor nutrition, etc.). This is probably not the kind of thing the distribution describes, but it shows that the Gaussian really never has been the be-all and end-all.

    So while this is undoubtedly a very interesting subject, it really isn't that exciting. Oh, and the claim that the greater incidence of natural disasters disproves the Gaussian was really BS; while the disasters may not be Gaussian, this doesn't appear to be a large enough sample size to make such definitive claims.
  • I happened to think of one possible reason why so many phenomena might fit a lopsided curve better: The bell curve implies the possibility of infinite extension in both directions. If the mean of the distribution is near one physical extreme (for instance, looking at average rainfall levels -- you can't have negative rainfall), then the curve must become lopsided.

    Perhaps that's what they've stumbled onto?
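    A minimal sketch of that idea (using a lognormal purely as a stand-in for any quantity that can't go below zero, like rainfall): the hard floor at zero forces the distribution to be lopsided, with the mean pulled above the median and a long upper tail.

    ```python
    import random

    # Illustrative only: a positive-only quantity modeled as lognormal(mu=0, sigma=1).
    # The zero lower bound makes the distribution right-skewed: mean > median.
    samples = sorted(random.lognormvariate(0.0, 1.0) for _ in range(100_000))
    mean = sum(samples) / len(samples)
    median = samples[len(samples) // 2]
    print("mean   ~", round(mean, 2))    # ~ exp(0.5) = 1.65
    print("median ~", round(median, 2))  # ~ exp(0.0) = 1.00
    ```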

    --Joe
    --
  • And if you want the technical exposition (rather than the narrative one he provides in the novels), then pick up Pratchett's Discworld RPG from Steve Jackson Games. :) Hmmm...if Bill Gates lived on the Discworld, who would come for him when he died? And since rare events are much more common under this newly discovered distribution than under a Gaussian distribution, and since the Discworld is said to reside under the far tails of the probability curve, does that mean there are more Discworlds than were previously believed to exist?
  • YEAH! RIGHT ON!

    I feel the same way about the "least squares" technique for determining the line of best fit. It is popular precisely because it is easy to do calculus on x^2.
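    A minimal sketch of why (the data below is made up): because the squared-error loss is smooth in the slope and intercept, setting its derivatives to zero gives a closed-form fit; minimizing absolute error has no such formula and needs iterative methods.

    ```python
    # Ordinary least squares for y = slope * x + intercept, via the closed form
    # you get by differentiating sum((y - slope*x - intercept)**2) and setting
    # the derivatives to zero.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # made-up points near y = 2x

    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
            sum((x - x_bar) ** 2 for x in xs)
    intercept = y_bar - slope * x_bar
    print(f"y ~ {slope:.2f}x + {intercept:.2f}")   # y ~ 1.96x + 0.14
    ```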
  • I personally prefer the more voluptuous curves.
  • by Anonymous Coward
    September 3, 1999

    Wow!

    It is interesting to see the response that this "research" article in the Financial Times generated. I'm a research associate (Bruce Malamud) working closely with Donald Turcotte. A student wrote me about the discussion your web site was having. Donald Turcotte was one of the scientists "quoted" in the Financial Times article. My research has been in the areas of time-series analysis and applying ideas of fractals and self-organized criticality to natural hazards. I did my Ph.D. with Donald Turcotte and am now doing a brief stint as a postdoc while I look for a "real" job in the world.

    First of all, this Financial Times article was quickly researched by the person who wrote it. Donald Turcotte was contacted and interviewed by phone on Tuesday/Wednesday, with no follow-up contact from the Financial Times to check how correct they got the overall picture. This is how things are, and he and I both gulped when we saw how the article appeared. We quickly prepared a short "response" from him (below) to the deluge of e-mails and telephone calls that he received yesterday.

    Bottom line, he was a bit misquoted, but the general idea holds. We are talking about applying the ideas of power-law frequency-size distributions (i.e., fractals) to extreme events, including floods, forest fires, earthquakes, landslides, etc. Donald Turcotte has been active for many years in the area of applying fractals, self-organized criticality, and chaos theory to the earth sciences, and yes, he knows very well that he did not "invent" the idea, just made many applications of it (well, a bit more than that, but read his book).

    On the most basic level (and no, I'm not trying to be insulting; I'm sure many people on this site know what I'm talking about already, as this is basic statistics), the idea is a very simple one. Plot the frequency-size distribution of a set of data and see what curve best fits the data, i.e., what might be the underlying distribution. For some sets of data (such as forest-fire burn areas, earthquakes, and many other "natural" data sets) the frequency-size distribution follows a nice straight line in log-log space, i.e., it follows a power-law (fractal or self-similar) distribution. Although one cannot say for SURE what an underlying distribution is, one can make certain (statistical) guesses as to whether a distribution follows more a Gaussian, log-normal, power-law, etc.

    Once on "believes" that a set of data follows a certain distribution, one can then begin to make some guesses as to what an "extension" of that curve might bring in time. If one has 30 years of flood-discharge data, one might then be able to make certain predictions as to the "size" of what the 100-year flood might be. Same with earthquakes. One has a better idea of the probability of having a certain size or greater earthquake, flood, forest-fire, etc. each year. It just happens that many of these events appear to follow power-law distributions, and these are not as "accepted" in the statistical community.

    Don just came in and is looking over my shoulder. He adds (to my above comments) that statisticians do not in general recognize power-law distributions because one cannot define a pdf for them. (Although one can define pdf's for certain similar distributions, such as the Pareto distribution.)

    So...in terms of the insurance community, they are of course very interested in whether a given "natural hazard" appears to follow more a power-law distribution vs. log-normal or Gaussian, as the resulting recurrence intervals will be very different. Power-law distributions tend to be very conservative for extreme events, i.e., one would expect more large events in a given period of time than, say, a Gaussian distribution would predict. Others interested in this underlying distribution would of course be engineers trying to decide how big a flood one might expect in a given area in a given amount of time (and yes, we're dealing with extreme events, so the statistics are small and unsure), so as to know where people can build houses, how deep to make the bridge supports, etc. Bottom line is the statistics are unsure because the data sets are small, but people need some sort of a starting point, as a lot of money rides on the answers of what the "underlying" distribution might be.

    There are also many scientific implications, ranging from simply "describing" what distribution a data set best follows, to understanding better (or in a different way) the underlying basic physics or equations that describe a given natural phenomenon, by comparing the statistics resulting from the equations with the actual data. In addition, many scientists are now beginning to think that the pervasive power-law distributions in nature are a general indication of self-organized critical behavior. One definition of self-organized critical behavior is a complex system with a small steady input and a power-law distribution of the "avalanches" (the events). Donald Turcotte and I wrote a paper (in Science, see below) applying this general idea of self-organized criticality to computer models and forest fires. Of the references listed below, this is probably the easiest for people to get.

    OK, before I start babbling. Below is the "reply" that Donald Turcotte wrote to many of the e-mails that came in during the last day.

    Bruce Malamud

    _________________________________________
    Wednesday September 2, 1999
    Ithaca, NY, USA

    Dear Interested Reader:

    Due to the large number of e-mails and telephone calls I have received with respect to the articles by Michael Peel, "New Curve Makes Life Predictable" and "Redrawing the Curve Reveals New Pattern of Events", that appeared in the Financial Times, September 2, 1999, I have prepared a short general reply. If you have further questions or comments after reading the below "comment" to the article, please do not hesitate to contact me for further information.

    These Financial Times articles emphasize the importance of power-law (also called fractal or fat-tail) distributions in estimating the probability of occurrence of extreme events. It is unfortunate that the article implies that I invented the idea of power-law distributions, which have been recognized now for many decades. For instance, earthquake hazard assessment is based mainly on the Gutenberg-Richter relation, which is a power-law distribution of the number of earthquakes as a function of their magnitude [for some papers where I discuss this, see DLT, Annual Review of Earth and Planetary Sciences, Vol. 19, p. 263-281, 1991; DLT, Physics of Earth and Planetary Interiors, Vol. 111, p. 275-293, 1999].

    My work on power-law distributions is based on the concept of fractals, which is due to the pioneering work of Benoit Mandelbrot [for instance, see his book, The Fractal Geometry of Nature, Freeman, San Francisco, 1982]. Mandelbrot, along with many other researchers, has applied the concept of fractals to many phenomena in the natural and "man-made" world, including financial time series. Other distributions similar to the power-law, such as the Pareto distributions, have also been used for a long time. A good web page which discusses fractals and has many links is The Spanky Fractal Database (http://spanky.triumf.ca/).

    My own contributions have concerned applications to natural hazards and related phenomena. These are set forward in detail in my book [DLT, Fractals and Chaos in Geology and Geophysics, 2nd ed., Cambridge University Press, Cambridge, 1997] and in a major review paper on self-organized criticality [DLT, Reports on Progress in Physics, Vol. 62, 1999, available as a pdf document (preprint) which can be sent upon request].

    The principal contributions of my group have been the applications of fractal distributions to:

    (1) Fragmentation (by explosions in asteroids, etc.). [DLT, Journal of Geophysical Research, Vol. 91, p. 1921-1926, 1986]

    (2) Mineral deposits. [DLT, Economic Geology, Vol. 81, p. 1528-1532, 1986]

    (3) Floods. [DLT and L. Greene, Stochastic Hydrology and Hydraulics, Vol. 7, p. 33-40, 1993; DLT, Journal of Research NIST, Vol. 99, p. 377-389, 1994; B.D. Malamud, DLT, and C.C. Barton, Environmental and Engineering Geosciences, Vol. 2, p. 479-486, 1996. The last paper is available as a pdf document at http://coastal.er.usgs.gov/barton/pubs_online.html]

    (4) Landslides. [J.D. Pelletier, B.D. Malamud, T. Blodgett, and DLT, Engineering Geology, Vol. 48, p. 255-268, 1997; available as a postscript file at http://www.gps.caltech.edu/~jon/]

    (5) Forest Fires. [B.D. Malamud, G. Morein, and DLT. Science, Vol. 281, p. 1840-1842, 1998; available as a pdf document for subscribers of Science, web site: http://www.sciencemag.org/]

    Many extreme-value events are directly related to time series that exhibit persistence or memory (for instance, time series of temperature, river discharge, the stock market, etc.). A good reference on persistence techniques (and a discussion of how to apply them) is Advances in Geophysics, Vol. 40, B.D. Malamud, J.D. Pelletier, and DLT.

    Two other colleagues that have used power-law techniques applied to natural hazards include Dr. Bruce D. Malamud (Cornell University, e-mail: Bruce@Malamud.Com) and Dr. Christopher C. Barton (USGS, e-mail: barton@usgs.gov, home page: http://coastal.er.usgs.gov/barton/).

    Again, please do not hesitate to contact me for further questions.

    Donald L. Turcotte
    Maxwell Upson Professor of Engineering

    :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
    :: Donald L. Turcotte
    :: Department of Geological Sciences
    :: Cornell University, Snee Hall
    :: Ithaca, NY 14853-1504, USA
    :: Office: 607-255-7282; Fax: 607-254-4780
    :: e-mail: turcotte@geology.cornell.edu
    :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


  • by Enoch Root ( 57473 ) on Thursday September 02, 1999 @12:19PM (#1708559)
    Alright. I don't buy it.

    The problem here is how you define and measure a rare occurrence. Let me give you an example.

    Let's say one night you watch the results of the lottery on TV, and the numbers '1-2-3-4-5-6' come up. Is that a rare occurrence? No. That sequence is exactly as likely to occur as any other particular sequence -- say, your birthday and your girlfriend's birthday combined into some esoteric equation.

    Example number 2: I'm with this girl one night. I say my astrological sign is Scorpio. "Really!" she exclaims, "I'm Scorpio too!" What is the probability of that happening? 1/144? No, just 1/12. And past a certain point (crypto folks will recognize this as the birthday-paradox effect), as you keep adding people, it becomes the rare event NOT to find two people with the same sign.

    All that graph is showing me is that the guys (I'm hesitating to call them scientists - I mean, they published in "serious papers"? Come on. Names, please) looked purposefully for freak occurrences, discarding other "rare" occurrences that were perfectly normal. That's why the left side of the graph is wider.

    Thing is, the Gaussian curve doesn't come out of nowhere; it's not arbitrary. For instance, in statistical mechanics and quantum mechanics, you get bell curve distributions precisely because of the distribution of particle states.

    All these guys are saying is, "rare events are not as rare as we think they are". That's not because the bell curve is wrong; it's because we seem to forget how huge a sample the Earth provides.

    What are the odds of being struck by lightning twice? One in a billion? We're 6 billion on this Earth. It's bound to happen to someone. Same thing with winning the grand prize lottery once or twice.

    And, again, same thing with floods or tornadoes. Yes, in themselves they're rare. When taken alone they seem improbable. But on the scale of the planet, that's the kind of thing that happens.
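    To put rough numbers on both of those points (back-of-the-envelope only):

    ```python
    # 1) Shared zodiac signs: probability that NO two people in a group of n
    #    share a sign. By n = 5 a shared sign is already more likely than not,
    #    and past n = 12 it is guaranteed.
    def p_all_signs_differ(n, signs=12):
        p = 1.0
        for k in range(n):
            p *= max(signs - k, 0) / signs
        return p

    for n in (2, 4, 5, 13):
        print(f"{n:2d} people, no shared sign: {p_all_signs_differ(n):.3f}")

    # 2) A one-in-a-billion event across six billion people: the chance it
    #    happens to nobody is (1 - 1e-9)**6e9, roughly exp(-6), about 0.25%,
    #    so it almost certainly happens to someone.
    print(f"nobody hit: {(1 - 1e-9) ** 6_000_000_000:.4f}")
    ```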

    Alright, anyone got another article on cold fusion lying around?

    "There is no surer way to ruin a good discussion than to contaminate it with the facts."

  • The graph makes more sense if you relabel its axes: x=number of individuals of a species, y=number of species with exactly that number of individuals.

    In other words: we aren't talking about the likelihood that you will encounter an individual of the species, we're talking about counting the species itself. A few really common species, a good spread of "average" species, and a few species represented by few individuals.

    'Course I could just be full of it. Wouldn't be the first time...
  • >In the 1st article, there is a graph about midway that appears to illustrate the notion that, with the new curve, you are more likely to find the rarest creature than the least-rarest creature. I must not be interpreting right, and I tried reading that part a few more times.


    It took me a minute, too - I'll try to distill my understanding into English. Assume that the rarity of a species is related to the number of times it is found (duh). The x-axis can be thought of as the number of findings of a given species. The y-axis can be thought of as the number of species that were found X number of times. Using the Gaussian distribution, you would expect a symmetric tail-off in both the more-rare and the less-rare directions from the peak value. {Yes, I know there are skewed, Gaussian-like distributions.} What this new curve is showing is that the tail-off is much less steep in the more-rare direction. In other words, assume the peak of the curve is at 100 sightings of a species, with a standard deviation of 10 sightings. You would expect some number of species to have 130 sightings (3 sigma). Under the Gaussian distribution, you would expect to see the same number of species with only 70 sightings. This new distribution says that the number of species with only 70 sightings would be much higher than the number of species with 130 sightings.
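
    To make that concrete (illustrative only -- the power-law exponent of 2 below is made up, not taken from the article): a Gaussian centered at 100 sightings with a standard deviation of 10 predicts exactly as many 70-sighting species as 130-sighting species, while a power law predicts several times more of the rarer ones.

    ```python
    import math

    def gaussian(x, mu=100.0, sigma=10.0):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    def power_law(x, exponent=2.0):   # exponent chosen only for illustration
        return x ** -exponent

    # Both 70 and 130 sightings sit three standard deviations from the Gaussian peak.
    print("Gaussian,  70 vs 130 sightings:", round(gaussian(70) / gaussian(130), 2))    # 1.0  (symmetric)
    print("Power law, 70 vs 130 sightings:", round(power_law(70) / power_law(130), 2))  # ~3.45 (rare side fatter)
    ```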

    Fascinating - I will certainly have to explore this further.
