Stats Math Science

Why Standard Deviation Should Be Retired From Scientific Use

Posted by Soulskill
from the hope-it-gets-a-good-pension dept.
An anonymous reader writes "Statistician and author Nassim Taleb has a suggestion for scientific researchers: stop trying to use standard deviations in your work. He says it's misunderstood more often than not, and also not the best tool for its purpose. Taleb thinks researchers should use mean deviation instead. 'It is all due to a historical accident: in 1893, the great Karl Pearson introduced the term "standard deviation" for what had been known as "root mean square error." The confusion started then: people thought it meant mean deviation. The idea stuck: every time a newspaper has attempted to clarify the concept of market "volatility", it defined it verbally as mean deviation yet produced the numerical measure of the (higher) standard deviation. But it is not just journalists who fall for the mistake: I recall seeing official documents from the department of commerce and the Federal Reserve partaking of the conflation, even regulators in statements on market volatility. What is worse, Goldstein and I found that a high number of data scientists (many with PhDs) also get confused in real life.'"
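
The gap between the two measures in the summary can be sketched in a few lines. This is a minimal illustration, not anything from the linked article: the `returns` values are invented, and the function names are hypothetical helpers.

```python
import math

def mean_absolute_deviation(xs):
    """Average absolute distance of the observations from their mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def standard_deviation(xs):
    """Population standard deviation: root of the mean squared deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Hypothetical daily returns in percent, with one large move (made-up data).
returns = [0.1, -0.2, 0.3, -0.1, 5.0]

mad = mean_absolute_deviation(returns)
sd = standard_deviation(returns)
print(f"mean absolute deviation: {mad:.3f}")
print(f"standard deviation:      {sd:.3f}")
```

Squaring weights large deviations more heavily, so the standard deviation is never smaller than the mean absolute deviation, and a single outlier widens the gap.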
This discussion has been archived. No new comments can be posted.

  • by Deadstick (535032) on Wednesday January 15, 2014 @06:01PM (#45969987)

    Three characterizations of statistics, in ascending order of accuracy:

    1. There are lies, damned lies, and statistics.

    2. Figures don't lie, but liars figure.

    3. Statistics is like dynamite. Use it properly, and you can move mountains. Use it improperly, and the mountain comes down on you.

  • Re:Basic Statistics (Score:4, Interesting)

    by Mashdar (876825) on Wednesday January 15, 2014 @06:22PM (#45970181)

    Didn't you hear? Gaussians are so 1893. And so are all of the other distributions with convenient sigma terms...

    And TFS calls standard deviation "root mean square error", which is only true if you assume the mean is a constant estimator for the distribution :(
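
One way to read the point about "root mean square error" is the following identity, written here as a sketch. The symbols are assumptions introduced for illustration: $x_1, \dots, x_n$ are the observations, $\bar{x}$ is their mean, and $c$ is any constant used as the estimate.

```latex
\frac{1}{n}\sum_{i=1}^{n}(x_i - c)^2
  \;=\; \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \;+\; (\bar{x} - c)^2
```

The first term on the right is the (population) variance, so the root mean square error around $c$ equals the standard deviation only when $c = \bar{x}$; any other constant adds the squared offset $(\bar{x} - c)^2$.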

    In any case, no one picked Gaussians because they are so easy to integrate... While we're at it, TFA should suggest we round the number e to 3, because irrational numbers are hard, and who cares what natural law dictates.

  • by PacoSuarez (530275) on Wednesday January 15, 2014 @06:33PM (#45970275)

    Perhaps non-mathematicians don't have a problem with this, but it rubs me the wrong way.

    What makes the mean an interesting quantity is that it is the constant that best approximates the data, where the measure of goodness of the approximation is precisely the way I like it: As the sum of the squares of the differences.

    I understand that not everybody is an "L2" kind of guy, like I am. "L1" people prefer to measure the distance between things as the sum of the absolute values of the differences. But in that case, what makes the mean important? The constant that minimizes the sum of absolute values of the differences is the median, not the mean.

    So you either use mean and standard deviation, or you use median and mean absolute deviation. But this notion of measuring mean absolute deviation from the mean is strange.

    Anyway, his proposal is preposterous: I use the standard deviation daily and I don't care if others lack the sophistication to understand what it means.
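
The "L2" versus "L1" claim above can be checked numerically. The sketch below is a brute-force illustration under stated assumptions: the data set and the candidate grid are made up, and `sum_squared` and `sum_absolute` are hypothetical helper names.

```python
import statistics

def sum_squared(xs, c):
    """L2-style cost: sum of squared differences from the constant c."""
    return sum((x - c) ** 2 for x in xs)

def sum_absolute(xs, c):
    """L1-style cost: sum of absolute differences from the constant c."""
    return sum(abs(x - c) for x in xs)

# Made-up data with one large value so the mean and median differ clearly.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate constants 0.0, 0.1, ..., 100.0 (an arbitrary grid for the search).
candidates = [i / 10 for i in range(0, 1001)]

best_l2 = min(candidates, key=lambda c: sum_squared(data, c))
best_l1 = min(candidates, key=lambda c: sum_absolute(data, c))

print("mean:  ", statistics.mean(data), " best constant under squares:  ", best_l2)
print("median:", statistics.median(data), " best constant under absolutes:", best_l1)
```

On this grid the squared cost bottoms out at the mean and the absolute cost at the median, which is the pairing the comment argues for.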

  • I hate averages (Score:5, Interesting)

    by tthomas48 (180798) on Wednesday January 15, 2014 @06:38PM (#45970337) Homepage

    I also think averages should go away. Most people think they are being reported the median (the number in the middle) when people tell them the average. It's great for real estate agents, and people trying to advocate for tax reform, but the numbers are not what people think they are.
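
A small sketch of why the two numbers diverge on skewed data. The prices are invented for illustration and do not come from any real listing.

```python
import statistics

# Hypothetical home prices in one neighborhood: four modest houses and one mansion.
prices = [150_000, 160_000, 170_000, 180_000, 2_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"mean:   {mean_price:,.0f}")
print(f"median: {median_price:,.0f}")

# Count how many houses actually sell below the reported "average".
below_mean = sum(1 for p in prices if p < mean_price)
print(f"{below_mean} of {len(prices)} homes cost less than the mean")
```

With one expensive outlier, most of the houses sit below the mean, while the median still lands on the middle house.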

  • by dcollins (135727) on Wednesday January 15, 2014 @06:47PM (#45970423) Homepage

    Well... first of all, summary has it wrong. It's not "mean deviation", it's "mean absolute deviation", or just "absolute deviation" from the literature I've seen. (Mean deviation is actually always zero, the most useless thing you could possibly consider.)

    Keep in mind that standard deviation is the provably best basis if your goal is to estimate a population *mean*, the most commonly used measure of center. Absolute deviation, on the other hand, is the best basis to use for an estimate of a population *median*, which is maybe fine for finances, which is what the linked paper seems mostly focused on. (Bayesian best estimators, if I recall correctly.)

    If the main critique is that economists and social scientists don't know what the F they're doing, then I won't disagree with that. But no need to metastasize the infection to math and statistics in general.
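
The remark that the signed mean deviation is always zero is easy to verify. This is a minimal sketch with made-up numbers; the helper names are hypothetical.

```python
import statistics

def mean_signed_deviation(xs):
    """Average of (x - mean); the positive and negative parts cancel."""
    m = statistics.mean(xs)
    return sum(x - m for x in xs) / len(xs)

def mean_abs_deviation_about(xs, center):
    """Average of |x - center| for a chosen center."""
    return sum(abs(x - center) for x in xs) / len(xs)

# Made-up sample.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

signed = mean_signed_deviation(data)
about_mean = mean_abs_deviation_about(data, statistics.mean(data))
about_median = mean_abs_deviation_about(data, statistics.median(data))

print("mean signed deviation:             ", signed)
print("mean absolute deviation (mean):    ", about_mean)
print("mean absolute deviation (median):  ", about_median)
```

The signed version cancels to zero by construction, which is why the absolute value is needed, and the absolute deviation measured from the median is never larger than the one measured from the mean.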

  • Re:"many with PhDs" (Score:4, Interesting)

    by Daniel Dvorkin (106857) on Wednesday January 15, 2014 @07:07PM (#45970619) Homepage Journal

    What other existing specialization in computer science, physics, etc., do you feel is qualified to use Hadoop to process trillions of triple stores into a network and subsequently build highly multivariate link prediction models and evaluate their output statistically with respect to ground truth, to name but one trifling task?

    As it happens, one of my colleagues runs a project which, among other things, does exactly that. His PhD is in computer science. I'm a bioinformaticist with a background primarily in biostatistics; I couldn't develop a tool like that, but I can certainly see the value in it. In general, I'm not arguing that the tasks currently getting lumped together under "data science" aren't valuable. I'm just saying that I'm not convinced they fit together into a coherent field that can meaningfully be studied in a single degree program, and attempts to make them so may well run into the problem of "jack of all trades, master of none."

  • by Daniel Dvorkin (106857) on Wednesday January 15, 2014 @07:14PM (#45970685) Homepage Journal

    Cancer research and particle physics use data scientists. Unfortunately so does amazon.com.

    Okay, since cancer research is a very large field, I can't say for sure one way or the other ... but I do know that working in bioinformatics at a major academic research center, I've never known a single person in medical research of any kind who called themselves a "data scientist." We have lots of computer scientists and statisticians, most of whom, fortunately, get along well enough to make use of each other's strengths. Regarding particle physics I have no idea, but yeah, I'm willing to bet Amazon or any other large corporation hires more "data scientists" than all the scientific institutions in the world put together--and gets exactly the kind of buzzword bingo they're paying for in return.

  • Re:The big picture (Score:5, Interesting)

    by mythosaz (572040) on Wednesday January 15, 2014 @08:09PM (#45971175)

    I would have said "18 half gallon pottles to the quarter-barrel firkin."
    Wolfram Alpha says 15.75 pottles to the firkin, but that's because of US/UK gallon conversions, I reckon.

    352 nails in a chain - which was interesting to me, in that Google includes those units in its calculator.

    I now know more about pottles, firkins, nails and chains than I did when I woke up. I shudder to think about what got pushed out of my old head to make way for this new minutiae.
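
The unit arithmetic above can be reproduced with exact fractions. The definitions used here are assumptions stated in the comments: a nail as 1/16 of a yard, a chain as 66 feet, a pottle as half a gallon, and a firkin as a quarter barrel. The 31.5-gallon barrel is only one guess that happens to reproduce the 15.75 figure; it is not a claim about what Wolfram Alpha actually does.

```python
from fractions import Fraction

# Length: assumed definitions, nail = 1/16 yard = 2.25 inches, chain = 66 feet.
INCHES_PER_NAIL = Fraction(36, 16)
INCHES_PER_CHAIN = 66 * 12
nails_per_chain = INCHES_PER_CHAIN / INCHES_PER_NAIL

# Volume: a pottle is taken as half a gallon, a firkin as a quarter barrel.
GALLONS_PER_POTTLE = Fraction(1, 2)

def pottles_per_firkin(barrel_gallons):
    """Pottles in a quarter-barrel firkin for a given barrel size in gallons."""
    return (Fraction(barrel_gallons) / 4) / GALLONS_PER_POTTLE

print("nails per chain:", nails_per_chain)
print("pottles per firkin, 36-gallon barrel:  ", pottles_per_firkin(36))
print("pottles per firkin, 31.5-gallon barrel:", float(pottles_per_firkin(Fraction(63, 2))))
```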

  • I always hated frequentist statistical methods and the Gaussian Distribution.

    I see below an astrophysicist echoes your claim.

    I'm happily surprised to learn I am not the only one who thinks the whole 'gaussian' should be banished.

    I come from a Systems Science and Research Methodology background in this area. One of my favorite parts of grad school was a 4 hr one on one tutoring session every week I did for a semester with my large State Univ. Research Director who is the person faculty/staff go to for questions about this stuff. All across the disciplines faculty, post-docs, PhD candidates come to these guys b/c it is their job to know & I was doing work for a prof who got me this and it was really cool.

    He explained how each discipline used statistics in their published research & rules for PhD candidates.

    One personal thing I took away was a deep mistrust of anything Gaussian, beyond some astrophysics & math stuff.

    Without getting too technical, IMHO it's bullshit. It's a scientist assigning what amounts to, not random, but factor analysis that is accurate only because of non-quantifiable expertise-type decisions in how to define the research question, how to test, and what kind of statistics to expect.

    I'm not saying they don't make good choices; sometimes these PhDs come up with excellent work across the academic disciplines. I'm just saying that you can piss in a jar and call it statistical significance...
