Weak Statistical Standards Implicated In Scientific Irreproducibility

ananyo writes "The plague of non-reproducibility in science may be mostly due to scientists' use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson at Texas A&M University. Johnson found that a P value of 0.05 or less, commonly considered evidence in support of a hypothesis in many fields including social science, still means that as many as 17–25% of such findings are probably false (PDF). He advocates that scientists use more stringent P values of 0.005 or less to support their findings, and thinks that the 0.05 standard might account for most of the problem of non-reproducibility in science, even more than other issues such as biases and scientific misconduct."
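
For intuition about how a p < 0.05 threshold can coexist with a 17–25% false-finding rate, here is a minimal sketch of the standard false-discovery arithmetic. This is not Johnson's actual method (his paper derives its figures from uniformly most powerful Bayesian tests); the assumed 80% power and 20% fraction of true hypotheses are illustrative guesses, not values from the paper.

```python
# Fraction of "significant" results that are false, as a function of the
# significance threshold, test power, and base rate of true hypotheses.

def false_finding_rate(alpha, power, prior_true):
    true_pos = power * prior_true          # real effects that get detected
    false_pos = alpha * (1 - prior_true)   # null effects that slip through
    return false_pos / (true_pos + false_pos)

for alpha in (0.05, 0.005):
    rate = false_finding_rate(alpha, power=0.8, prior_true=0.2)
    print(f"alpha = {alpha}: ~{rate:.0%} of significant findings are false")
```

Under these made-up but not unreasonable assumptions, alpha = 0.05 yields roughly 20% false findings, within the quoted range, while 0.005 drops it to about 2%.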
  • Re:Or you know.. (Score:5, Informative)

    by Anonymous Coward on Tuesday November 12, 2013 @07:53PM (#45407181)

    This would have the same problems, maybe even worse ones. The problem with statistics is usually that the model is wrong, and Bayesian stats offers two chances to fuck that up: in the prior and in the generative model (i.e., the likelihood). Bayesian statistics still requires models (yes, you can do non-parametric Bayes, but you can do non-parametric frequentist stats too).

    Contrary to the hype and buzzwords, Bayesian statistics is not some magical solution. It is incredibly useful when done right, of course.
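
On the parent's point that the prior is one of the two places to go wrong, a minimal sketch with invented data and priors: the same ten Bernoulli trials under two defensible Beta priors lead to noticeably different posteriors.

```python
# Same binomial data, two different Beta priors: a conjugate update
# (Beta(a, b) prior + k successes in n trials -> Beta(a+k, b+n-k)).
from scipy import stats

successes, trials = 7, 10  # invented data

for name, (a, b) in [("flat Beta(1,1)", (1, 1)),
                     ("skeptical Beta(20,20)", (20, 20))]:
    post = stats.beta(a + successes, b + trials - successes)
    print(f"{name:21s}: posterior mean = {post.mean():.2f}, "
          f"P(rate > 0.5) = {post.sf(0.5):.2f}")
```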

  • by Derec01 ( 1668942 ) on Tuesday November 12, 2013 @07:55PM (#45407201)

    That is because of the central limit theorem (http://en.wikipedia.org/wiki/Central_limit_theorem), which states that the mean of a large number of independent samples is approximately normally distributed no matter what the original distribution was, provided it has finite variance. So we can reliably use the normal distribution for the sample mean. It is NOT unfounded.
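
For anyone who wants to see the theorem in action rather than take it on faith, a quick simulation sketch (the exponential distribution and the sample sizes are arbitrary choices):

```python
# Means of samples from a heavily skewed (exponential) distribution:
# the skewness of the sample-mean distribution shrinks like 2/sqrt(n),
# approaching the symmetric bell shape the CLT promises.
import numpy as np

rng = np.random.default_rng(0)
for n in (1, 5, 30, 200):
    means = rng.exponential(scale=1.0, size=(100_000, n)).mean(axis=1)
    skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
    print(f"n = {n:>3}: skewness of sample means = {skew:+.2f}")
```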

  • by Anonymous Coward on Tuesday November 12, 2013 @08:03PM (#45407293)

    Statistics does not, by any means, make that assumption. If it did, the entire field of statistics would have been completed by 1810.

    Mediocre (actually, sub-mediocre) practitioners of statistics make that assumption.

    It is true that many estimators tend to a normal distribution as the sample size gets large, but that is not the same as assuming that the data itself comes from a normal distribution.
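
The "many estimators" hedge is doing real work here. A toy counterexample, with arbitrary sizes: for Cauchy-distributed data the CLT's finite-variance condition fails, so the sample mean never settles down, while the sample median is still asymptotically normal.

```python
# Cauchy data: the sample mean is Cauchy again at every n (its spread
# never shrinks), while the sample median tightens like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
for n in (10, 100, 10_000):
    data = rng.standard_cauchy(size=(2_000, n))
    for name, est in (("mean", data.mean(axis=1)),
                      ("median", np.median(data, axis=1))):
        q25, q75 = np.percentile(est, [25, 75])
        print(f"n = {n:>6}: IQR of sample {name:6s} = {q75 - q25:7.3f}")
```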

  • by Anonymous Coward on Tuesday November 12, 2013 @08:28PM (#45407515)

    This is a fallacious understanding of the p-value.

    Something closer to (but still not quite) correct would be that there is a 75–83% chance that the claimed efficacy of the drug is within the stated error bars. For example, there may be a 75–83% chance that the drug is between 15% and 45% effective at treating your disease.

    That's much worse, isn't it?
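
A toy version of that claim in code: simulate many underpowered two-arm trials and compare how often the nominal 95% error bars contain the true effect overall versus among just the "significant" results that would get published. Every number here (effect size, arm size, trial count) is invented for the demo.

```python
# Conditioning on significance drops interval coverage below the nominal
# 95%, because significant estimates are biased away from zero
# (the "winner's curse").
import numpy as np

rng = np.random.default_rng(2)
true_effect, n, trials = 0.3, 30, 20_000

treat = rng.normal(true_effect, 1.0, size=(trials, n))
ctrl = rng.normal(0.0, 1.0, size=(trials, n))
diff = treat.mean(axis=1) - ctrl.mean(axis=1)
se = np.sqrt(treat.var(axis=1, ddof=1) / n + ctrl.var(axis=1, ddof=1) / n)

covered = np.abs(diff - true_effect) < 1.96 * se  # interval contains truth
significant = np.abs(diff / se) > 1.96            # would be "p < 0.05"

print(f"coverage, all trials:       {covered.mean():.1%}")
print(f"coverage, significant only: {covered[significant].mean():.1%}")
```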

  • Re:Or you know.. (Score:4, Informative)

    by wickerprints ( 1094741 ) on Wednesday November 13, 2013 @03:53AM (#45410331)

    First of all, recommending that hypothesis tests be conducted with smaller tolerances for Type I error almost invariably implies a large decrease in power. There is no free lunch. There are many experimental designs for which the importance of making a positive inference (i.e., accepting the alternative hypothesis) is so great that you do need to set the alpha level very small. But if the test is to have any power, the data you gather must be much, much more extensive (the first sketch after this comment puts rough numbers on that trade-off). So to simply say "alpha = 0.05 is too large because it admits too many irreproducible claims by random chance" sort of misses the basic point. A test conducted at such a level still has at most a 1 in 20 chance of observing a test statistic that would reject the null hypothesis even if the null is true, so a p-value of 0.04, for example, would merit further investigation. That's not so much a flaw of the frequentist methods as it is a flaw in interpretation, due to the natural tendency of investigators or clinicians to want a straightforward "yes/no" answer.

    Bayesian methods, then, don't really offer intrinsically more meaning than frequentist methods. The main difference is that Bayesian methods, by their construction, force the investigator to draw an inference that is not characterized by a "yes/no" answer--in fact, it becomes a bit of a contrivance (e.g., Bayes factors and the calculation of cumulative posterior distributions) to try to interpret Bayesian analyses in this way; the second sketch after this comment gives a minimal Bayes factor example. Don't get me wrong: that is an appealing and advantageous characteristic, whereas more care is needed to interpret the frequentist approach. But Bayesian methods also suffer from their own problems, many of which arise from the necessity of imposing some kind of prior distribution (so, for instance, Bayes factors are not monotone in the hypothesis).

    The takeaway here is that in statistics there is no magic bullet, no single approach that is "best" or "optimal" for inferential purposes. It is the role of scientists and investigators to perform the necessary follow-up analyses and meta-analyses to improve the credibility of a claim. So in a sense, the state of statistical methods in scientific research is NOT broken. It is working as intended: people find enough evidence to stimulate further investigation, and it is through this process that previous claims are tested further. The only part that concerns me is how policymakers lacking sufficient statistical background might put too much credence in a particular analysis--this idea that "oh, we found significance so this MUST be true"--or how the non-statistically-informed public or media all too often distort the meaning of an analysis to the point of absurdity. But I argue that this is not a weakness of statistics. It is a deficiency in understanding, brought about by the human desire to act upon perceived certainties in a fundamentally uncertain world.
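
On the power point above, a rough sketch of the cost, assuming a two-sided two-sample z-test at 80% power and an illustrative effect size of d = 0.5 (my numbers, not the commenter's):

```python
# Approximate per-group n for a two-sample z-test: tightening alpha
# from 0.05 to 0.005 at fixed 80% power costs ~70% more subjects here.
from math import ceil
from scipy.stats import norm

def n_per_group(alpha, power, d):
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_power = norm.ppf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: n = {n_per_group(alpha, 0.8, 0.5)} per group")
```

And on the Bayes factor point, a minimal closed-form example with made-up data: a fair-coin null against a uniform prior on the bias. For 60 heads in 100 flips the two-sided p-value is roughly 0.057, yet the Bayes factor is close to 1, i.e., inconclusive; this seems to be the kind of p-value/Bayes-factor divergence Johnson's argument builds on.

```python
# Bayes factor for H0: theta = 0.5 vs H1: theta ~ Uniform(0, 1), for
# k heads in n flips. Under H1 the marginal likelihood is 1 / (n + 1)
# for any k; under H0 it is the binomial probability at theta = 0.5.
from math import comb

def bayes_factor_01(k, n):
    m0 = comb(n, k) * 0.5 ** n   # P(data | fair coin)
    m1 = 1.0 / (n + 1)           # P(data | uniform prior on bias)
    return m0 / m1

print(f"BF01 = {bayes_factor_01(60, 100):.2f}")  # ~1.1: no evidence either way
```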
