## Why Standard Deviation Should Be Retired From Scientific Use

An anonymous reader writes "Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop trying to use standard deviations in your work. He says it's misunderstood more often than not, and also not the best tool for its purpose. Taleb thinks researchers should use mean deviation instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"

• #### Issues (Score:5, Informative)

on Wednesday January 15, 2014 @04:56PM (#45969913)

On the other hand, you also need to use 2-pass algorithms to compute Mean Absolute Deviation, whereas STD can be easily calculated in one pass. And you still need standard deviation as it relates directly to the second moment about the mean.

Also, annoyingly, Median Absolute Deviation competes for the MAD name and is more robust against outliers.
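The one-pass versus two-pass point, and the clash over the MAD name, can be sketched roughly as follows. This is only an illustration: the function names are made up here, and the one-pass update is Welford's method, which the comment above does not name.

```python
import math
import statistics


def one_pass_std(stream):
    """Population standard deviation from a single pass (Welford's update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("need at least one value")
    return math.sqrt(m2 / n)


def mean_abs_dev(values):
    """Mean absolute deviation: pass 1 finds the mean, pass 2 averages |x - mean|."""
    values = list(values)  # the data must be kept (or re-read) for the second pass
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def median_abs_dev(values):
    """Median absolute deviation: the other statistic that goes by 'MAD'."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(one_pass_std(iter(data)))  # works on a stream that can be read only once
print(mean_abs_dev(data))
print(median_abs_dev(data))
```

Note that `one_pass_std` never stores the data, while both "MAD" variants need the full list in hand.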

• #### Standard Deviation is Important (Score:5, Informative)

on Wednesday January 15, 2014 @05:02PM (#45970005)

Standard Deviation is the square root of the second moment about the mean [wikipedia.org], an important fundamental concept to probability distributions. Looking at moments of probability distributions gives us lots of tools that have been developed over the years and in many cases we can apply closed form solutions with reasonably lenient assumptions. Then we apply the square root in order to put it in the same units as the original list of observations and get some of the heuristic advantages that he attributes to the mean absolute deviation.
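In symbols, as a small sketch (the notation $X$, $\mu$, $\mu_2$, $\sigma$ is chosen here for illustration and is not the poster's):

$$
\mu = \operatorname{E}[X], \qquad
\mu_2 = \operatorname{E}\!\left[(X-\mu)^2\right], \qquad
\sigma = \sqrt{\mu_2}.
$$

If $X$ is measured in metres, $\mu_2$ comes out in square metres and $\sigma$ is back in metres, which is the units argument made above.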

But it is a balance, and any data set should be looked at from multiple angles, with multiple summary statistics. To say MAD is better than standard deviation is a reasonable point (with which I would disagree), but to say we should stop using standard deviation (the point made in TFA) is totally incorrect.

• #### Re:Standard Deviation is Important (Score:1, Informative)

by Anonymous Coward on Wednesday January 15, 2014 @05:13PM (#45970103)

This.

Standard Deviation is the square root of the second moment about the mean [wikipedia.org], an important fundamental concept to probability distributions.

More generally, it is the L^2-norm of the deviation from the mean, which opens up the theory of Hilbert spaces and functional analysis in general. Try to beat that. You shouldn't discard anything because people use it wrong. You should teach students today to use it right instead. The p-value has been as big a problem, if not a bigger one.
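Spelled out as a sketch (notation assumed here, with $X$ and $Y$ random variables of finite variance and means $\mu_X$, $\mu_Y$), the two competing measures are the $L^2$ and $L^1$ norms of the same centred variable:

$$
\sigma_X = \lVert X-\mu_X \rVert_{2} = \left(\operatorname{E}\lvert X-\mu_X\rvert^{2}\right)^{1/2},
\qquad
\text{mean deviation} = \lVert X-\mu_X \rVert_{1} = \operatorname{E}\lvert X-\mu_X\rvert .
$$

Only the $L^2$ norm comes from an inner product, $\langle X, Y\rangle = \operatorname{E}[XY]$, so covariance is $\operatorname{Cov}(X,Y) = \langle X-\mu_X,\, Y-\mu_Y\rangle$ and, for uncorrelated $X$ and $Y$, Pythagoras gives

$$
\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).
$$

Cauchy-Schwarz also gives $\lVert X-\mu_X\rVert_1 \le \lVert X-\mu_X\rVert_2$, which is why the standard deviation is never the smaller of the two numbers.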

• #### Re:Basic Statistics (Score:4, Informative)

on Wednesday January 15, 2014 @05:20PM (#45970173)

The meaning of standard deviation is something you learn on a basic statistics course.

I took a statistics course in college. The statistics professor taught us to think of the standard deviation as the "average distance from the average". So if you know the average (mean) then any random data sample will be (on average) one SD away. That is simple, neat, and easy to remember.

It is also wrong.

• #### Re:That's not the problem. (Score:4, Informative)

on Wednesday January 15, 2014 @06:20PM (#45970747)

Not bashing modern life, it's great, but it isn't making many "great thinkers" in the mold of the 19th century mathematicians. We do more, with less understanding of how, or why.

The easier math problems are lower hanging fruit. As time goes on, the problems that are left become increasingly hard. Even when they get solved, average people can't understand what it means, and that makes it hard to care about, and hard for newspapers to make money covering that story.

Also when you read about the history of mathematics, it's easy to feel like these breakthroughs were happening all the time, compared with now, when in fact they came very slowly, and the pace of discovery is probably higher now than at any point in the past.

It's easy to say music was better in the 70's than now when you condense the 70's down to 100 truly great songs, forgetting all the crap, and compare it to what's playing on the radio today.

• #### Re:Basic Statistics (Score:5, Informative)

on Wednesday January 15, 2014 @06:59PM (#45971075)

... think of the standard deviation as the "average distance from the average" ... That is simple, neat, and easy to remember... It is also wrong.

In fact, it is wrong in exactly the way that TFA suggests: you're describing the mean deviation...
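A tiny made-up data set shows the gap between the two readings. The function names and the numbers below are illustrative only, and both functions use the population (divide by n) convention.

```python
import math


def mean_deviation(values):
    """'Average distance from the average': the mean of |x - mean|."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def standard_deviation(values):
    """Root of the mean squared distance from the average."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


data = [1, 1, 1, 1, 6]  # mean is 2
print(mean_deviation(data))      # 1.6
print(standard_deviation(data))  # 2.0
```

Squaring gives the single far-out value more weight, so the standard deviation lands above the "average distance" the professor described.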

• #### Re:The big picture (Score:5, Informative)

on Wednesday January 15, 2014 @07:01PM (#45971095)

Hi, I'm a statistician.

It's not so simple to just say "ok, we're going to use the Mean Absolute Deviation from now on." The use of standard deviation is not quite the historical accident that Taleb makes it out to be--there are good reasons for using it. Because it is a one-to-one function of the second central moment (variance), it inherits a bunch of nice properties that the mean absolute deviation does not. There is not a one-to-one correspondence between variance and mean absolute deviation.
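The lack of a one-to-one correspondence is easy to see with two invented data sets that share a variance but not a mean absolute deviation. This is a minimal sketch; the data and function names are assumptions made for illustration.

```python
def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def mean_abs_dev(values):
    """Mean absolute deviation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


a = [-1, -1, 1, 1]              # every point one unit from the mean
b = [-2, 0, 0, 0, 0, 0, 0, 2]   # mostly at the mean, two points far out

print(variance(a), variance(b))          # 1.0 1.0
print(mean_abs_dev(a), mean_abs_dev(b))  # 1.0 0.5
```

So knowing the variance alone does not pin down the mean absolute deviation; the shape of the distribution matters.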

Taleb is correct that the mean absolute deviation is easier to explain to people, but this is not just a matter of changing units of measure (where there is a one-to-one correspondence) or changing function and variable names in code (where there is again a one-to-one correspondence). Standard deviation and mean absolute deviation have different theoretical properties. These differences have led most statisticians over the last hundred years to conclude that the standard deviation is a better measure of variability, even though it is harder to explain.

• #### normal densities (Score:4, Informative)

on Wednesday January 15, 2014 @07:22PM (#45971293)

For normal densities, standard deviations and MAD are just proportional, with a factor of about 1.25, so it doesn't matter which you use.
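For reference, the factor of about 1.25 can be worked out directly, taking MAD here to mean the mean absolute deviation and assuming $X \sim N(\mu, \sigma^2)$ (the notation is introduced here for the sketch):

$$
\operatorname{E}\lvert X-\mu\rvert
= 2\int_{0}^{\infty} x \,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}\,dx
= \frac{2}{\sigma\sqrt{2\pi}}\cdot\sigma^{2}
= \sigma\sqrt{\frac{2}{\pi}} \approx 0.798\,\sigma ,
$$

so that

$$
\frac{\sigma}{\operatorname{E}\lvert X-\mu\rvert} = \sqrt{\frac{\pi}{2}} \approx 1.2533 .
$$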

For non-normal densities, neither of them really is universally "right" for characterizing the deviation, but it's mathematically a whole lot easier to understand how standard deviation behaves in those cases than MAD. So even there, standard deviations are usually the better choice.

• #### Re:Basic Statistics (Score:4, Informative)

on Wednesday January 15, 2014 @07:25PM (#45971315)

I should note that, contrary to the summary, Taleb is not properly a statistician--he's an economist

To be fair, economics has contributed a lot to the growth of statistics as a field of study. Due to various historical quirks, econometrics developed as almost a separate field from statistics for decades, and economists have often looked at statistical problems with a fresh eye, and had insights that people working in the mainstream of statistics and biostatistics might have missed. In my own work, biostatistics-flavored bioinformatics, I've often found myself referring to the econometric literature.

I have no idea if any of this applies to Taleb, though. Certainly TFA doesn't strike me as a particularly profound example of statistical reasoning ...

• #### Data Science (Score:4, Informative)

on Wednesday January 15, 2014 @09:51PM (#45972369)

Data science is a field that combines machine learning and statistics to derive meaning from data. Data scientists should be reasonably well-versed in classical stats, but the data sets they deal with are often huge, ill-defined, and not amenable to analysis using classical methods. To deal with such challenges, data science recruits a healthy combination of certain areas of comp-sci (databases, machine learning, NLP, AI), statistical methods, and, quite often, improvisation.

Strange that there are so many people on here that are unfamiliar with data science.

• #### Re:So you want to retire a statistical term... (Score:5, Informative)

on Wednesday January 15, 2014 @10:04PM (#45972445)

the mean *absolute* deviation, rather than the square root of the mean *squared* deviation (the standard deviation).

The mean absolute deviation is a simpler measure of variability. However....

The algebraic manipulation of the standard deviation is simpler; the absolute deviation is more difficult to deal with.

Further, when drawing a number of samples from a large population, their mean deviations scatter more, relative to the value they estimate, than their individual standard deviations do; that is to say, the standard deviation of a sample provides an estimate that is more in line with the whole.

That is to say.... there are cases where the Standard Deviation may be better, AND, much of statistics is using standard deviation as its basis.

Fisher, R. 1920 Monthly Notices of the Royal Astronomical Society, 80, 758-770:

the quality of any statistic could be judged in terms of three characteristics. The statistic, and the population parameter that it represents, should be consistent; the statistic should be sufficient; and the statistic should be efficient, i.e., have the smallest probable error as an estimate of the population. Both the standard deviation and mean deviation met the first two criteria (to the same extent); however, in meeting the third criterion, the standard deviation proves superior.
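The efficiency claim can be checked with a quick simulation. This is a hedged sketch, not Fisher's derivation: it assumes a standard normal population, the sample size, trial count, seed, and function names are arbitrary choices made here, and the comparison uses the coefficient of variation so that the two estimators are judged relative to what each one estimates.

```python
import math
import random


def coefficient_of_variation(estimates):
    """Spread of the estimates relative to their own average."""
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(var) / mean


def compare_estimators(n=50, trials=5000, seed=0):
    """Return (CV of sample std devs, CV of sample mean deviations)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sd_estimates, md_estimates = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sd_estimates.append(math.sqrt(sum((x - mean) ** 2 for x in sample) / n))
        md_estimates.append(sum(abs(x - mean) for x in sample) / n)
    return (coefficient_of_variation(sd_estimates),
            coefficient_of_variation(md_estimates))


if __name__ == "__main__":
    cv_sd, cv_md = compare_estimators()
    print(f"relative scatter of std dev:        {cv_sd:.4f}")
    print(f"relative scatter of mean deviation: {cv_md:.4f}")
```

For normal data, large-sample theory puts the efficiency of the mean deviation relative to the standard deviation at 1/(π − 2), about 0.88. The picture can change for heavy-tailed data, which this sketch does not cover.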

• #### Standard Deviation is fn of 2nd moment of the data (Score:4, Informative)

on Wednesday January 15, 2014 @11:06PM (#45972805)

I can really go for renaming standard deviation, but it should not be abolished.

Standard deviation is a function of the second moment of the data, and if you remember your laws for combining moments of inertia (the parallel axis theorem), then you'll understand better what you're dealing with.
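The statistical counterpart of the parallel axis theorem, written as a sketch with notation assumed here ($\mu = \operatorname{E}[X]$, $\sigma^2 = \operatorname{Var}(X)$, and $a$ any fixed reference point):

$$
\operatorname{E}\!\left[(X-a)^{2}\right] = \sigma^{2} + (\mu - a)^{2},
$$

which mirrors $I = I_{\mathrm{cm}} + M d^{2}$: the second moment about an arbitrary point is the second moment about the mean plus the squared offset. It follows that the second moment is smallest when taken about the mean.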

2nd moments detail resistance to spin, and thus the resilience of your findings to changes and errors.

• #### Re:So you want to retire a statistical term... (Score:5, Informative)

on Thursday January 16, 2014 @12:04AM (#45973065)

That actually was part of my point. In my day job (and night job and weekend job, and, oh god I need a vacation) I'm an astrophysicist. I have more data sets than I can recall, and the number of problems for which I'm confident that the errors are Gaussian is at most 2 or 3. We're finally in an era where computational power facilitates forward modeling & Bayesian techniques that can provide good estimates of true uncertainties. But I (and many of my colleagues) barely understand how they work. Any expectation that most researchers are willing to invest the time to understand anything beyond Gaussian statistics is unrealistic.
