## Why Standard Deviation Should Be Retired From Scientific Use

Posted by Soulskill

from the hope-it-gets-a-good-pension dept.

An anonymous reader writes

*"Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop trying to use standard deviations in your work. He says it's misunderstood more often than not, and also not the best tool for its purpose. Taleb thinks researchers should use **mean deviation** instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"*
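The conflation is easy to demonstrate: for the same data, the standard deviation (root mean square of deviations) is never smaller than the mean absolute deviation. A minimal sketch in Python; the sample values are invented for illustration:

```python
import math

# Invented sample of "moves" around a mean
xs = [1.0, -2.0, 3.0, -1.0, 2.0]
mean = sum(xs) / len(xs)

# Mean absolute deviation: the average distance from the mean
mad = sum(abs(x - mean) for x in xs) / len(xs)

# Standard deviation (population form): root mean square of the deviations
std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

print(f"MAD = {mad:.3f}, STD = {std:.3f}")  # MAD = 1.680, STD = 1.855
```

When a reader asked "how much does this vary, on average?", the intuitive answer is the MAD (1.68 here), yet the number usually reported is the larger STD.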
## So you want to retire a statistical term... (Score:5, Insightful)

...because people use it incorrectly in economics? Get bent. The standard deviation is a useful tool for statistical analysis of large populations.

## Basic Statistics (Score:5, Insightful)

The meaning of standard deviation is something you learn on a basic statistics course.

We don't ask biochemists to change their terms because the electron transport chain is complicated.

We don't ask cryptographers to change their terms because the difference between extra entropy and multiplicative prediction resistance is not obvious.

We should not ask statisticians to change their terms because people are too stupid to understand them.

## That's not the problem. (Score:5, Insightful)

The problem is that people think they understand statistics when all they know is how to enter numbers into a program to generate "statistics".

They mistake the tools-used-to-make-the-model for reality. Whether intentionally or not.

## Re:That's not the problem. (Score:5, Insightful)

The problem is that people's attention spans are rapidly approaching that of a water-flea.

Until the past 50 or so years, people who learned about Standard Deviation would do so in environments with far less stimulation and distraction. Their lives weren't so filled with extra-curricular activities and entertainments that they never sat for a moment from waking until sleep without some form of stimulus-based pastime. When they "understood" the concept, there was time to ruminate on it and let it gel into a meaningful set of connections with how it is calculated and commonly applied. Today, if you can guess the right answer from a set of 4 choices often enough, you are certified an expert and given a high-level degree in the subject.

I'm not bashing modern life (it's great), but it isn't producing many "great thinkers" in the mold of the 19th-century mathematicians. We do more, with less understanding of how or why.

## Re:Basic Statistics (Score:3, Insightful)

Someone should tell that to the lawyers!

## Re:Standard Deviation is Important (Score:4, Insightful)

I'm a little surprised at Nassim Taleb's position on this.

He has rightly pointed out that not all distributions that we encounter are Gaussian, and that the outliers (the 'black swans') can be more common than we expect. But moving to a mean absolute deviation hides these effects even more than standard deviation; outliers are further discounted. This would mean that the null hypothesis in studies is more likely to be rejected (mean absolute deviation is typically smaller than standard deviation), and we will be finding 'correlations' everywhere.

For non-Gaussian distributions, the solution is not to discard standard deviation, but to reframe the distribution. For example, for some scale invariant distributions, one could take the standard deviation of the log of the values, which would then translate to a deviation 'index' or 'factor'.
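A sketch of that reframing, using synthetic log-normal data (exponentials of Gaussian draws): taking the standard deviation of the logs recovers the sigma of the underlying Gaussian, and its exponential reads as a multiplicative deviation 'factor'.

```python
import math
import random

random.seed(0)
# Synthetic log-normal data: exp of Gaussian(mu=0, sigma=1) draws
xs = [math.exp(random.gauss(0.0, 1.0)) for _ in range(10_000)]

logs = [math.log(x) for x in xs]
m = sum(logs) / len(logs)
log_std = math.sqrt(sum((v - m) ** 2 for v in logs) / len(logs))

# log_std estimates the sigma of the underlying Gaussian (1.0 here);
# exp(log_std) reads as a multiplicative 'deviation factor'
print(log_std, math.exp(log_std))
```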

I agree with him that standard deviation is not trustworthy if you apply it blindly. If the standard deviation of a particular distribution is not stable, I want to know about it (not hide it), and come up with a better measure of deviation for that distribution. But I think the emphasis should be on identifying the distributions being studied, rather than trying to push mean absolute deviation as a catch-all measure.

And for Gaussian distributions (which are not uncommon), standard deviation makes a lot of sense mathematically (for the reasons outlined in the parent post).

## Re:That's not the problem. (Score:5, Insightful)

I think it's also true that a larger percentage of people are going to university, so the average "intelligence" of people in university, in terms of natural ability, is probably lower now than when only the very best students attended.

Most of the mediocre students today would simply not have gone to university in the past. I think the same principle holds for things like blogs: public discourse can sometimes make it seem as if people are getting dumber, when really it's just that more and more people know how to read and write and can now be published, whereas in the past there was a higher cost to publishing, and you were more likely to have something important to say before being willing to incur that cost.

## Re:The big picture (Score:4, Insightful)

I think NNT is saying that the MAD ought to be used when you are conveying a numerical representation of the "deviations" with the intent that readers use this number to imagine or intuit the size of the "deviations." His example is that of how much the temperature might change on a day-to-day basis. According to him, it's not just that the concept is easier to explain, but that it is the more accurate measure to use for this purpose.

Based on his other work I'm sure he understands that the STD is generally superior for optimization purposes, fit comparison, etc.

## Re:Would those data scientists with PhDs (Score:4, Insightful)