
Science and the Shortcomings of Statistics

Posted by samzenpus
from the 14%-of-people-know-that-statistics-can-prove-anything dept.
Kilrah_il writes "The linked article provides a short summary of the problems scientists have with statistics. As an intern, I see it many times: Doctors do lots of research but don't have a clue when it comes to statistics — and in the social science area, it's even worse. From the article: 'Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.'"
  • by Shadow of Eternity (795165) on Wednesday March 17, 2010 @10:31PM (#31518344)

    In other news math may not lie but people still can, all the honesty and good statistics in the world doesn't help end-user stupidity, and there are statistically two popes per square kilometer in the Vatican.

  • by jeckled (1716002) on Wednesday March 17, 2010 @10:41PM (#31518400)
    Also, statistics are often manipulated to suggest correlations where there are none.
  • Re:Maths anxiety (Score:5, Informative)

    by Nefarious Wheel (628136) on Wednesday March 17, 2010 @10:50PM (#31518454) Journal
    How to Lie with Statistics by Darrell Huff. Recommended reading.
  • by cytoman (792326) on Wednesday March 17, 2010 @10:59PM (#31518532)
    Standard deviation is what you learn very early in school. And this was an endocrinologist - a specialist who no doubt took a lot of Biostatistics courses and such, and used a lot of statistics all through his education. And you are telling me that it's not his "job" to know? Wow! We are talking the most basic stuff that anyone with a degree in the sciences should know. It's almost like saying that an English major can be excused if he doesn't know that 2+2=4 because "it's not his job to know".
  • Re:No surprise here (Score:3, Informative)

    by skine (1524819) on Wednesday March 17, 2010 @11:02PM (#31518560)

    It's perfectly reasonable that someone use a calculator for sales tax (if an exact answer is desired).

    Also, sales tax is multiplication - not algebra.

  • Re:No surprise here (Score:1, Informative)

    by Anonymous Coward on Wednesday March 17, 2010 @11:18PM (#31518676)

    Arithmetic is not algebra. Arithmetic is "What's 10% of $24.45?" Algebra would be "On a given day i, John sells n_i apples to Peter at x_i dollars each, and this price includes sales tax which is a constant proportion p, 0 < p < 1. Let x_1= .. x_2= ... ... What is the tax on the apples sold on days 1 to 12 inclusive?"

    The difference is 24.45 · 10/100 versus p·\sum_{i=1}^{12} n_i x_i. Granted, there isn't much difference there really, but come on, there is a time and a place for everything, calculators included.
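    The arithmetic/algebra distinction above can be sketched in a few lines. The daily figures are hypothetical, invented purely for illustration; only the $24.45, the 10% rate, and the 12-day span come from the comment.

```python
# Arithmetic: one concrete computation.
tax_arithmetic = 24.45 * 10 / 100  # "What's 10% of $24.45?"

# Algebra: the same idea abstracted over days, quantities, and prices.
# n[i] apples sold on day i at x[i] dollars each; p is the tax proportion.
def total_tax(p, n, x):
    return p * sum(ni * xi for ni, xi in zip(n, x))

# Hypothetical figures for days 1 to 12:
n = [10, 12, 8, 15, 9, 11, 14, 7, 13, 10, 12, 9]
x = [2.00] * 12
print(round(tax_arithmetic, 3))          # 2.445
print(round(total_tax(0.10, n, x), 2))   # 26.0
```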

  • Re:Long winded troll (Score:3, Informative)

    by TapeCutter (624760) * on Wednesday March 17, 2010 @11:20PM (#31518694) Journal
    It's a troll because it implies scientists don't know about those things.
  • by Ethanol-fueled (1125189) * on Wednesday March 17, 2010 @11:33PM (#31518770) Homepage Journal
    s = sample standard deviation = sqrt((sum(x-xbar)^2)/(n-1)), where xbar is the mean
    sigma = population standard deviation = sqrt((sum(x-mu)^2)/N), where mu is the mean
    s is approximately equal to (highestValue-lowestValue)/4, range rule of thumb
    Unusual values are outside +/- 2 standard deviations
    Z = ((x-mu)/sigma) where Z is in terms of standard deviations.
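    The formulas above translate directly into code. This is a minimal sketch of those same definitions; the data set is an arbitrary example chosen so the population SD comes out to exactly 2.

```python
import math

def sample_sd(xs):
    """s = sqrt(sum((x - xbar)^2) / (n - 1)), xbar the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def population_sd(xs):
    """sigma = sqrt(sum((x - mu)^2) / N), mu the population mean."""
    N = len(xs)
    mu = sum(xs) / N
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / N)

def z_score(x, mu, sigma):
    """Z = (x - mu) / sigma: x expressed in standard deviations."""
    return (x - mu) / sigma

data = [2, 4, 4, 4, 5, 5, 7, 9]      # mean is 5
print(population_sd(data))           # 2.0
print(sample_sd(data))               # ~2.138 (divides by n-1 instead of N)
print(z_score(9, 5, 2.0))            # 2.0 -> "unusual" by the +/-2 rule
print((max(data) - min(data)) / 4)   # 1.75, the range rule of thumb for s
```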
  • by williamhb (758070) on Wednesday March 17, 2010 @11:38PM (#31518818) Journal
    Contrary to the parent poster's claim, the article does not focus on correlation vs causation. It focuses on people getting the correlation wrong in the first place. It lists several common mistakes scientists make when writing up research studies. (Not all scientists are very good at stats). These include:
    • If you run enough studies you are almost certain to find a difference that appears statistically significant at the p<0.05 level through chance alone. (It is incredibly unlikely that you will win the lottery; but across the whole pool of tickets someone wins it most weeks.) That makes studies that bulk analyze large amounts of data against many different factors, actively hunting for something that is significantly different, erroneous.
    • "p < 0.05" does not mean there is a 95% chance of your result being "true"; it just means that someone else rolling dice has a 5% chance of achieving the same result through chance alone.
    • Tests are often combined in ways that are mathematically inconsistent
    • Finding a statistical effect does not mean it is a strong effect
    • You cannot simply compare effect sizes between two studies because the results of their control groups may differ ("effect size analysis" is usually wrong)
    • Failing to find a significant effect does not mean there is no effect ("we found there was no significant effect on..." is misleading because "no statistical significance" means "no information" [your study didn't tell anybody anything], not "no effect" -- to prove "no effect" you need a different statistical test)

    And lots of others. It then suggests Bayesian reasoning as an alternative to traditional statistical tests.

    Most post-PhD scientists are aware of the common mistakes, but being aware that we make mistakes doesn't necessarily stop us from making them. If you choose a random set of conference proceedings, it is almost certain you will find at least one paper (and I suspect usually a dozen or more) with statistical mistakes in it.
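    The first bullet above (bulk analysis finds spurious significance) is easy to demonstrate by simulation. This sketch runs 1000 "studies" where both groups are drawn from the same distribution, so any apparent effect is pure chance; the |t| > 2 cutoff is a rough stand-in for p < 0.05.

```python
import random
import statistics

random.seed(0)

def fake_study(n=30):
    # Two groups drawn from the SAME distribution: any "effect" is chance.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return a, b

def t_stat(a, b):
    # Welch-style two-sample t statistic; |t| > 2 is roughly p < 0.05 here.
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

hits = sum(1 for _ in range(1000) if abs(t_stat(*fake_study())) > 2)
print(hits)  # around 50: ~5% of null comparisons look "significant"
```

    Hunt through enough comparisons and some of those fifty-odd false alarms will look like publishable findings.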

  • by solanum (80810) on Wednesday March 17, 2010 @11:57PM (#31518944)

    and IAAB (biologist) and I can tell you that most scientists don't have access to statisticians or don't have the grant money to pay for them. I also don't have time to learn SAS and code my own tests, therefore I use stuff like SPSS or Genstat (both of which do allow you to code your own tests as well). Just because they are easy to use doesn't mean I do or do not understand the tests, the assumptions or their results. I would say my grasp of stats is above average for my peer group, below where I would like it to be and obviously limited.

    One thing that is interesting to me is that throughout my education and career I have been warned off using multiple means comparisons and LSD in particular (I understand why, and have avoided the former where I can and the latter always). Yet the only actual statisticians I have dealt with in recent years have recommended that I use LSD for means comparisons with tens of means. I would be hard pressed to publish those results.

    In summary, whilst statisticians like to blame easy-to-use stats programs for bad stats, the reality is that they are just a tool, and if statisticians can't agree on the acceptable use of the simplest procedures I'm not sure what chance the rest of us have of getting it right.

  • by Z8 (1602647) on Wednesday March 17, 2010 @11:57PM (#31518946)
    I see a lot of posts bashing people for being idiots, and I'm sure that's often the case, but IMHO there are some big problems with statistics itself.
    • The most common school is the "classical" school, which is extremely counterintuitive. For instance, most people think that if a 95% confidence interval is 5 to 10, then the parameter has a 95% chance of being between 5 and 10. This would be true with Bayesian statistics, but it is exactly backwards for classical statistics. In classical statistics, it is the procedure that produced your 5-to-10 interval that captures the parameter 95% of the time! This is a subtle difference that many statisticians don't even explain well, and it screws up almost everyone. Furthermore, the classical statement is much less useful than the intuitive statement that people think it is.
    • Relatedly, other schools which make more sense, such as Bayesianism and likelihoodism, aren't taught. Furthermore, non-parametric statistics are usually not taught to undergrads (unless they are statistics majors, probably). In the real world, non-parametric statistics are often more useful because no parametric model is actually true (for instance, basic regression assumes that the Truth is in your model, and it almost never is).
    • Finally, a lot of statistics as it is normally taught depends on the central limit theorem. Any result that depends on the central limit theorem (or the law of large numbers) is often useless in real applications due to data poverty. The basic reason is that the average of i.i.d. random variables only converges to a normal distribution at a 1/sqrt(n) rate. Everyone knows this, and it's obvious that 1/sqrt(n) convergence is much slower than the typical 1/n convergence, but people still rely on the central limit theorem.

    Statistics is changing slowly (mostly because computers and R make non-classical statistics more practical) but the way it's taught still leads to problems.
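    The coverage interpretation in the first bullet above can be checked by simulation: fix a true mean, build many intervals, and count how often the procedure captures it. The true mean, noise level, and sample sizes here are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)
TRUE_MU = 7.0  # unknown in real life; fixed here so coverage can be checked

def ci95(sample):
    # Rough 95% CI for the mean: xbar +/- 2 * s / sqrt(n)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    half = 2 * s / len(sample) ** 0.5
    return xbar - half, xbar + half

trials = 1000
covered = 0
for _ in range(trials):
    lo, hi = ci95([random.gauss(TRUE_MU, 3) for _ in range(50)])
    if lo <= TRUE_MU <= hi:
        covered += 1
print(covered / trials)  # ~0.95: the *procedure* captures mu about 95% of the time
```

    Any single interval either contains the fixed parameter or it doesn't; the 95% belongs to the procedure, which is the classical reading described above.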

  • Re:Summery? (Score:3, Informative)

    by icannotthinkofaname (1480543) on Thursday March 18, 2010 @12:12AM (#31519032) Journal

    That would be Muphry's law.

    For details on Muphry's law, click on the above hyperlink. For more fun laws, click on the below hyperlink.

    More fun here.

  • by WeirdJohn (1170585) on Thursday March 18, 2010 @12:22AM (#31519074)

    It's the approach of just pumping the numbers into SPSS or Statistica, and then calling on a battery of tests until you get a "significant" result, that leads to the kind of errors the article describes (and that a disturbing number of /. readers fall into).

    Unless you're dealing with large samples, all z and t tests assume normality in the population, with insignificant skew or kurtosis. Yet by definition, if we have enough data to be sure we have a normal population, we have enough data that the central limit theorem makes the differences moot. Even more extreme, if we have a complete description of the population (a census) we have no need to use any inferential statistics.

    Meanwhile students are told to test the data for normality, homoscedasticity and linearity, to the point where the repeated tests on a single data set make the chance of a Type I error better than even. But by saying "SPSS said so" and burying assumptions beneath a mountain of waffle and misunderstood jargon we can still get these "results" published.

    No-one who can't perform a balanced block design ANOVA by hand, or explain what transforming data does to residuals under assumptions of a linear additive model, should be allowed near statistical software in my opinion. And the so-called statistics packages in popular spreadsheets should be banned, and any student relying on them should be failed.

  • Re:Long winded troll (Score:5, Informative)

    by crmarvin42 (652893) on Thursday March 18, 2010 @12:36AM (#31519140)
    Peer review is not about catching mistakes, although it can on occasion. Peer review is about clear communication, such that the experiment can be repeated as identically as possible and that readers can understand the authors' justification for their conclusions. At least that's what every journal article I've read on the topic indicated was the reason for the creation of the peer review process. One of my advisors asked me about it on my written preliminary exam and I needed to do a lot of reading to be prepared for the oral exam. There were several different societies that claimed to have originated the idea, but no one claimed that the purpose was to catch mistakes, fabrications, or data manipulations.
  • by rve (4436) on Thursday March 18, 2010 @01:05AM (#31519280)

    You're mixing up psychiatrists, psychologists and psychotherapists.
    A psychiatrist went to med school, got a doctor's degree and specialized in problems with the brain. A psychologist went to university to study the behavior of people. This involves a lot of statistics, and many of them probably do consider it something they didn't go to college for, but it's a course of study that is supposed to follow the scientific method and prepare students for doing research, not therapy.

    A psychotherapist is anyone who feels like calling themselves that. As preparation they may have studied psychology at university, or they may have spent 20 years meditating in the Himalayas, or followed a short course at a religious group such as an institute of multiple personality disorder therapists or Scientology.

  • Re:Long winded troll (Score:2, Informative)

    by TapeCutter (624760) * on Thursday March 18, 2010 @02:25AM (#31519586) Journal
    I'm not talking about original intent, I'm talking about contemporary practice. The first peer-review policy I looked at to check your assertion was that of the journal Nature. It doesn't say anything about clarity or repeatability; it appears to back up what I said. Quoth the policy...

    "Nature journals receive many more submissions than they can publish. Therefore, we ask peer-reviewers to keep in mind that every paper that is accepted means that another good paper must be rejected. To be published in a Nature journal, a paper should meet four general criteria:
    * Provides strong evidence for its conclusions.
    * Novel (we do not consider meeting report abstracts and preprints on community servers to compromise novelty).
    * Of extreme importance to scientists in the specific field.
    * Ideally, interesting to researchers in other related disciplines."


    "The editors then make a decision based on the reviewers' advice, from among several possibilities:
    * Accept, with or without editorial revisions
    * Invite the authors to revise their manuscript to address specific concerns before a final decision is reached
    * Reject, but indicate to the authors that further work might justify a resubmission
    * Reject outright, typically on grounds of specialist interest, lack of novelty, insufficient conceptual advance or major technical and/or interpretational problems"
  • Re:Summery? (Score:5, Informative)

    by Saroful (1364377) on Thursday March 18, 2010 @03:37AM (#31519830)
    And what's the law about spelling/grammar corrections that incorrectly correct the supposed spelling error? (Redundancy is purposefully deliberate.) "Its" is possessive. "It's" is a contraction of "it" and "is". -- This has been a message from your friendly neighborhood Spelling Nazi.
  • by Daniel Dvorkin (106857) * on Thursday March 18, 2010 @03:53AM (#31519906) Homepage Journal

    Devore's Probability and Statistics for Engineering and the Sciences is probably the best one-volume, undergrad-level intro to statistics out there. Get a copy (I think it's on the sixth or seventh edition now; you can pick up a fifth edition for cheap) and work your way through that, and you'll have a pretty good idea of where all those formulae come from and how they're used. Get a copy of R and check out the "Devore*" packages in the package list too. If you want to learn more after that, I recommend Kutner et al.'s Applied Linear Statistical Models for applications, and Casella and Berger's Statistical Inference for theory.

    The Wikipedia stats pages are pretty good for most things, but many of them are written with the assumption of a lot of background knowledge. If you open up a page on a particular stats subject and you comprehend it, great; if not, be prepared to do a lot of digging outside of Wikipedia, because trying to figure out the subject from the links to other WP pages is an exercise in circularity.

  • by Anonymous Coward on Thursday March 18, 2010 @05:37AM (#31520318)

    Are you referring to PSYCHOLOGY or PSYCHIATRY (or both?).

    The former do a lot more stats than the latter during their training. Psychology students in Australia (me) do at least 4 years of statistics (and often 6) before working in the field. Of course we all have a love-hate relationship with stats, but there is no way to get through the course(s) without working hard and actually learning the material.

    Psychiatry students essentially study medicine (and do intermediate stats), and then specialise afterward. They require a completely different skill-set from psychologists, and have different training.

  • by demonlapin (527802) on Thursday March 18, 2010 @07:08AM (#31520748) Homepage Journal
    Med schools hire statisticians for this. Read the thank-yous. Even small studies will thank the biostatistician. The biostatistician will be an author on a major study.
  • by kenp2002 (545495) on Thursday March 18, 2010 @10:29AM (#31522600) Homepage Journal

    The largest demographic in American prisons is black Americans. Real statistic, but is it true?

    Given a particular sample that indicates blacks are 60% of the prison population this would appear to be true.

    But what if I said: "The largest demographic in prison is minority, non-whites." Suddenly the % jumps from 60% (black) to 80% (minority). Which is more right? This is the problem with statistics. Context.

    Now I can say readily that the largest demographic in prison is actually right-handed people. The % now jumps to 90%.

    But wait! There is more! The largest demographic in prison is actually people who prior to arrest were below the poverty line, which jumps to 99% of the population. Again, all of the above are accurate based on a sample, but which is MORE correct? Linear algebra is coming into play here quickly....

    When that kind of issue comes into play, it is the classic "Correlation != Causation" confusion. The majority of people in prison are in there because of "Being black? Being a minority? being right handed? or being poor?" None of the above. The majority of them are in there because they were convicted of a crime and sentenced. That is the causation of their imprisonment, the rest is correlation which may have a direct causation on the conviction or sentencing, but no direct causation on being in prison. (e.g. You cannot be thrown into prison for being poor, black, minority, right handed)

    Same with medical research, politics, economics, etc. The price of oil rising 10% and a subsequent 5% drop in shipping orders. Measuring the significance of regressors is important but oddly is rarely reported. Many factors get masked or shadowed by higher level regressors (e.g. being a minority masks a variety of other social and economic factors. In addition it can distort statistical work by being too broad. Asians have a different set of economic and social factors than North American blacks, who in turn differ from even African immigrants.)

    Back to the original subject:

    We can take 100 prisoners and 100 non-prisoners and figure out rather quickly if being black is statistically significant in the prison population. Non-prison population blacks would account for 25%-45% of the population (depending on location). We can see that 60% of prisoners are black. There is a 20+% deviation from the norm. We can test the significance of that. Same with minorities. Now we quickly find that right-handedness is insignificant because it doesn't deviate from the norm. We can test left-handed and right-handed populations and rule out the handedness of a convict as significant.
    We can find that economic status is considerably MORE significant than minority or black as a status. We can determine that the reason minorities or blacks are disproportionately more prevalent in prison is that blacks and minorities have higher rates of poverty. We can extract and determine the statistical weight of POVERTY in regards to imprisonment (since we find a higher % of whites in prison that are poor compared to the normal population). Once we figure that out we can remove it and continue the investigation to figure out what weight minority and black have once we have removed POVERTY from the model (residual analysis).

    The problem in reporting is that without providing the whole, comprehensive analysis you can miss important things. For instance, to correct an injustice in sentencing, without reporting the weight POVERTY has in contrast to BLACK or MINORITY you may lose sight of the fact that you may have better success addressing POVERTY to normalize sentencing rather than MINORITY or BLACK (or not).

    The same happens in medical research. Given a cocktail of drugs, without having the whole analysis you may end up providing more of Medicine A versus B but lose sight that A & B are limited by the dosage of Medicine C.

    Statistics are not bullshit, but rather merely observations with no intrinsic agenda or even implication of truth. Purely amoral, like a handgun... useful to both the good and the evil.

    Statistics don't lie, nor do they tell the truth. They simply show the relationship of the data as it stands. The truth or truthiness of it is subjective and vulnerable to context.
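    The "20+% deviation from the norm" test described above is a one-proportion z-test. This is a minimal sketch: the 100-person sample, the 60% prison figure, and the ~90% right-handedness come from the comment, while the 35% general-population baseline is an assumed value from the 25%-45% range it gives.

```python
import math

def one_prop_z(p_hat, p0, n):
    # z statistic: how many standard errors the observed proportion p_hat
    # sits from the baseline proportion p0, for a sample of size n.
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

n = 100  # prisoners sampled, as in the comment
# 60% observed in prison vs an assumed 35% general-population baseline:
print(round(one_prop_z(0.60, 0.35, n), 2))  # 5.24 -> far outside +/-2, significant
# ~90% right-handed both in and out of prison:
print(round(one_prop_z(0.90, 0.90, n), 2))  # 0.0 -> no deviation from the norm
```

    A large |z| only flags a deviation worth investigating; as the comment argues, untangling poverty from race still requires residual analysis, not this test alone.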

  • by silentcoder (1241496) on Thursday March 18, 2010 @11:32AM (#31523384) Homepage

    The actual truth, as usual, is a bit more complex than the bit we all remember and quote.

    Where a correlation occurs there are four distinct possible reasons:
    Let's say that during the time when X is known to have increased, Y showed a correlated increase. Then:
    1) It is possible that this is because X caused Y - the correlation doesn't imply this causation, but it is one possible explanation.
    2) It is possible that Y in fact caused X (e.g. the causation is in fact in the opposite direction of what the quoter of the stat is trying to say).
    3) It is possible that X and Y were both caused by an unknown third factor Z.
    4) It is possible that X and Y were caused by completely different factors and their correlation is purely coincidental.

    The mere existence of a correlation does not imply any of these four possibilities more strongly than the other - they are equally likely unless additional data is presented to corroborate one.

    The example from my philosophy textbook (which I'm shamelessly citing here) was this:
    Between 1955 and 1965 the number of schools in the US where sex-ed was given increased by 75%; during the same period the number of teenage pregnancies increased by nearly 80% (both compared to the decade before). Conclusion: giving sex-ed led to more teenage pregnancies.

    This citation is a classic example of the correlation/causation mistake in that it assumes option 1.
    In this case option 2 actually seems quite likely - if teenage pregnancies were going up, that would put pressure on schools to give sex-ed to try to reverse the trend.
    But what if we consider more of the available data? Specifically, the pill came on the market in 1953, sparking the sexual revolution.
    If we consider that the pill led to a more relaxed attitude among teenagers about sex, but that this attitude probably spread a lot faster than actual usage of the pill, then that explains the increase in teen pregnancies, which, combined with the known presence of this attitude, would put pressure on the schools to give sex-ed.
    So it suggests that in fact we have options 2, 3 and 4 happening in a mutually reinforcing manner. The only conclusion that isn't supported by the data at all is option 1.
    With each bit of additional data added, including comparison with other times when there was a sharp increase in teenage pregnancy (like the early years of the current decade under the Bush administration), we find that the likelihood of X actually causing Y in this example gets smaller and smaller and in fact becomes statistically insignificant.

    But that doesn't mean option 1 is never the right answer. Sometimes a correlation really is due to causation. You just cannot assume it without further evidence.
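    Option 3 above (a hidden common cause) is easy to demonstrate with simulated data: here Z drives both X and Y, X never influences Y, yet the two are strongly correlated. The distributions and noise levels are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(2)

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length samples.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Option 3 in action: a hidden factor Z drives both X and Y.
z = [random.gauss(0, 1) for _ in range(500)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]
print(round(pearson_r(x, y), 2))  # ~0.8: strongly correlated, yet X never touches Y
```

    From the correlation alone, nothing distinguishes this data from a genuine X-causes-Y effect, which is exactly the point.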

  • Re:Long winded troll (Score:3, Informative)

    by crmarvin42 (652893) on Saturday March 20, 2010 @09:36AM (#31549090)
    That people are trying to use peer review as a method to detect fraud does not make it a good method for doing so. I've mentioned this before on /., although not in this thread, but I have no way of telling if the numbers in a table were generated by the experiment described, some other experiment, a random number generator, or the PR department of the company whose product is being evaluated. As long as the numbers are internally consistent, I have to "trust" that what they describe happened. I can catch obvious errors, such as the SEM not supporting claims of statistical significance made by the authors. However, if during the review process they claim that the SEM was a typo (the numbers were actually SD and not SEM, for example) and change it, I have no way of verifying that their explanation was true.

    Also, in your quote you highlighted 2 different lines. The first has to do with the soundness of the conclusions. This is most definitely a role of peer review, but not related to accuracy. It doesn't mean that they verify that your conclusions are correct. Conclusions are not objective. The data gives you objective facts from which to draw subjective conclusions. This line indicates that your discussion will be evaluated for how well the data (yours and previous literature) supports your conclusions. If you extrapolate, or ignore important results then your paper will be rejected.

    The second bolded section just indicates that if serious errors are found (using insufficiently large sample size, extrapolating results, etc.) then the paper will be rejected. That's totally understandable to reject, but obviously serious errors of this sort are uncommon. Most errors are much harder to detect, and are not picked up by the peer review process in my experience.
