
Weak Statistical Standards Implicated In Scientific Irreproducibility

ananyo writes "The plague of non-reproducibility in science may be mostly due to scientists' use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson at Texas A&M University. Johnson found that a P value of 0.05 or less — commonly considered evidence in support of a hypothesis in many fields, including social science — still means that as many as 17–25% of such findings are probably false (PDF). He advocates that scientists use more stringent P values of 0.005 or less to support their findings, and thinks that use of the 0.05 standard might account for most of the problem of non-reproducibility in science — even more than other issues, such as biases and scientific misconduct."
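(As a rough sanity check on those numbers, here is a small simulation of the idea; the 20% base rate of true effects and the effect size are illustrative assumptions, not figures from Johnson's paper:)

    # Illustrative simulation: how many p < 0.05 "discoveries" are false
    # when only some tested hypotheses are real? The base rate and effect
    # size are assumptions, not values from the paper.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_tests, n_per_group = 10_000, 30
    is_real = rng.random(n_tests) < 0.20   # assume 20% of hypotheses are true
    effect = 0.8                           # assumed standardized effect size

    hits = false_hits = 0
    for real in is_real:
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect if real else 0.0, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
            false_hits += not real

    print(f"p < 0.05 results: {hits}; fraction false: {false_hits / hits:.1%}")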
  • Scarcely productive (Score:5, Interesting)

    by fey000 ( 1374173 ) on Tuesday November 12, 2013 @07:59PM (#45407255)

    Such an admonishment is fine for the computational fields, where a few more permutations can net you a p-value of 0.0005 (assuming that you aren't crunching on a 4-month cluster problem). However, biological laboratory experiments are often very expensive and take a lot of time. Furthermore, additional tests are not always possible, since it can be damn hard to reproduce specific mutations or knockout sequences without altering the surrounding interactive factors.

    So, should we demand a better p-value for the experiment and scrap any complicated endeavour, or should we allow for difficult experiments and take their results with a grain of salt?

  • by mysidia ( 191772 ) on Tuesday November 12, 2013 @08:02PM (#45407289)

    Five sigma is the standard of proof in Physics. The probability of a background fluctuation is a p-value of something like 0.0000006.

    Of proof yes... that makes sense.

    Other fields should probably use a threshold of 0.005 or 0.001.

    If they move to five sigma... 2013 might be the last year that scientists get to keep their jobs.

    What are you supposed to do if no research in any field is admissible because the bar is set so high that no one can meet it, even with meaningful research?
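    (For reference, those sigma thresholds convert to p-values through the Normal tail; a quick scipy check shows why the quoted 0.0000006 is the two-sided five-sigma figure:)

        # Convert n-sigma thresholds to one- and two-sided p-values.
        from scipy.stats import norm

        for n_sigma in (2, 3, 5):
            one_sided = norm.sf(n_sigma)      # P(Z > n)
            two_sided = 2 * norm.sf(n_sigma)  # P(|Z| > n)
            print(f"{n_sigma} sigma: one-sided p = {one_sided:.1e}, "
                  f"two-sided p = {two_sided:.1e}")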

  • Re:Or you know.. (Score:5, Interesting)

    by Anonymous Coward on Tuesday November 12, 2013 @08:20PM (#45407439)

    Yes, I agree. If a p-value of 0.05 actually "means" 0.20 when evaluated, then any sane frequentist will tell you that things are fucked, since the limiting probability does not match the nominal probability (this is the definition of frequentism).

    The power of Bayesian stats is largely in being able to easily represent hierarchical models, which are very powerful for modeling dependence in the data through latent variables. But it's not the Bayesianism per se that fixes things, it's the breadth of models it allows. A mediocre modeler using Bayesian statistics will still create mediocre models, and if they use a bad prior, then things will be worse than they would be for a frequentist.

    Consider that if Bayesian statisticians are doing a better job than frequentists at the moment, it may be because Bayesian stats hasn't yet been drilled into the minds of the mediocre, as frequentist stats has been for decades. People doing Bayesian stats tend to be better modelers to begin with.
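    (A toy numerical illustration of the bad-prior point; all numbers here are made up. Shrinking a small sample toward a badly placed prior leaves the estimate further from the truth than the plain frequentist mean:)

        # Conjugate Normal-Normal update: a misplaced prior can make the
        # posterior mean worse than the MLE. Numbers are illustrative.
        import numpy as np

        rng = np.random.default_rng(1)
        true_mu, sigma, n = 2.0, 1.0, 10
        x = rng.normal(true_mu, sigma, n)

        def posterior_mean(prior_mu, prior_sd):
            # Precision-weighted average of data mean and prior mean.
            w = (n / sigma**2) / (n / sigma**2 + 1 / prior_sd**2)
            return w * x.mean() + (1 - w) * prior_mu

        print(f"frequentist mean (MLE):       {x.mean():.3f}")
        print(f"posterior, good prior (mu=2): {posterior_mean(2.0, 1.0):.3f}")
        print(f"posterior, bad prior (mu=-5): {posterior_mean(-5.0, 1.0):.3f}")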

  • Yes and no (Score:5, Interesting)

    by golodh ( 893453 ) on Tuesday November 12, 2013 @09:02PM (#45407805)
    As you say, there is the Central Limit Theorem (a whole bunch of them actually) that says that the Normal distribution is the asymptotic limit that describes unbelievably many averaging processes.

    So it gives you a very valid excuse to assume that the value distribution of some quantity occurring in nature will follow a Normal distribution when you know nothing else about it.

    But there's the crux: it remains an assumption; a hypothesis, and fortunately it's usually a *testable* hypothesis. It's the responsibility of a researcher to check if it holds, and to see how problematic it is when it doesn't.

    If something has a Normal distribution, its square or its square root (or any other power) doesn't have a Normal distribution. Take for example the diameter, surface area, and volume of berries: the diameter goes with the radius r, the surface area with r^2, and the volume with r^3. They cannot all be Normally distributed at the same time, so assuming any of them is starts you out on a shaky foundation.
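    (The berry example is easy to check numerically; the distribution parameters below are made up:)

        # If diameter d is Normal, d^2 and d^3 cannot also be Normal.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        d = rng.normal(10.0, 1.0, 100_000)   # Normal by construction

        for name, v in [("d", d), ("d^2", d**2), ("d^3", d**3)]:
            p = stats.normaltest(v).pvalue   # D'Agostino-Pearson test
            print(f"{name}: normality-test p = {p:.3g}")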

  • The real issue (Score:5, Interesting)

    by Okian Warrior ( 537106 ) on Tuesday November 12, 2013 @09:06PM (#45407845) Homepage Journal

    Okay, here's the real problem with scientific studies.

    All science is data compression, and all studies are intended to compress data so that we can make future predictions. If you want to predict the trajectory of a cannonball, you don't need an almanac cross-referencing cannonball weights, powder loads, and cannon angles - you can calculate the arc to any desired accuracy with a set of equations that fit on half a page. The half-page compresses the record of all prior experience with cannonball arcs and allows us to predict future arcs.
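    (The drag-free case really is a one-liner; drag and powder-load detail would add terms but not change the point:)

        # Closed-form projectile range replaces any cannonball lookup table.
        import math

        def range_m(v0, angle_deg, g=9.81):
            # Ideal (no-drag) range: R = v0^2 * sin(2*theta) / g
            return v0**2 * math.sin(2 * math.radians(angle_deg)) / g

        print(f"{range_m(100.0, 45.0):.1f} m")   # about 1019 m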

    Soft science studies typically make a set of observations which relate two measurable aspects. When plotted, the data points suggest a line or curve, and we accept the regression fit (line or polynomial) as the best approximation for the data. The theory is that the underlying mechanism is the regression, and unrelated noise in the environment or measurement system causes random deviations in the observations.

    This is the wrong method. Regression is based on minimizing squared error, which was chosen by Laplace for no other reason than that it is easy to calculate. There are lots of "rationalization" explanations of why it works and why it's "just the best possible thing to do", but there's no fundamental logic that can be used to deduce least squares from fundamental assumptions.

    Least squares introduces several problems:

    1) Outliers will skew the values, and there is no computable way to detect or deal with outliers (source [wikipedia.org]).

    2) There is no computable way to determine whether the data represent a line or a curve - it's done by "eye" and justified with statistical tests.

    3) The resultant function frequently looks "off" to the human eye; humans can often draw better-matching curves, meaning curves which better predict future data points.

    4) There is no way to measure the predictive value of the results. Linear regression will always return the best line to fit the data, even when the data is random.

    The right way is to show how much the observation data is compressed. If the regression function plus data (represented as offsets from the function) take fewer bits than the data alone, then you can say that the conclusions are valid. Further, you can tell how relevant the conclusions are, and rank and sort different conclusions (linear, curved) by their compression factor and choose the best one.

    Scientific studies should have a threshold of "compresses data by N bits", rather than "1-in-20 of all studies are due to random chance".
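    (Here is one sketch of such a criterion, using a two-part code with Gaussian residuals on synthetic data; the 0.5*log2(n)-bits-per-parameter charge is a standard MDL convention, and all numbers are illustrative:)

        # Compression-based model comparison: total bits = parameter cost
        # plus Shannon code length of residuals under a fitted Gaussian.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 200
        x = np.linspace(0.0, 10.0, n)
        y = 3.0 * x + 1.0 + rng.normal(0.0, 2.0, n)   # synthetic linear data

        def gaussian_bits(residuals):
            # Code length of residuals under a fitted Gaussian, in bits.
            var = residuals.var()
            return 0.5 * len(residuals) * np.log2(2 * np.pi * np.e * var)

        def model_bits(degree):
            coeffs = np.polyfit(x, y, degree)
            resid = y - np.polyval(coeffs, x)
            return gaussian_bits(resid) + 0.5 * (degree + 1) * np.log2(n)

        raw_bits = gaussian_bits(y - y.mean())
        print(f"raw data: {raw_bits:.0f} bits")
        for deg in (1, 2, 5):
            print(f"degree {deg} fit: {model_bits(deg):.0f} bits")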

  • by hawguy ( 1600213 ) on Tuesday November 12, 2013 @09:23PM (#45407973)

    And more importantly, a 17-25% chance that it's completely ineffective, no better than a placebo.

    My sister went through 4 different drugs before she found one that made her condition better. One made her (much) worse.

    Yet she likely wouldn't be alive today if none of those 4 drugs worked.

  • by martinux ( 1742570 ) on Wednesday November 13, 2013 @03:55AM (#45410335)

    I work in this field and usually see power calculations recommending samples of non-viable size.

    I can see recruiting hundreds of subjects as being feasible in the US or a large European country, but in smaller countries one simply has to state clearly in a paper's limitations that any findings must be interpreted in light of the available sample.
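    (For a sense of scale, statsmodels can solve for the per-group n a two-sample t-test needs at each threshold; the effect size of d = 0.3 and 80% power below are illustrative assumptions:)

        # Required per-group sample size at alpha = 0.05 vs. 0.005.
        from statsmodels.stats.power import TTestIndPower

        solver = TTestIndPower()
        for alpha in (0.05, 0.005):
            n = solver.solve_power(effect_size=0.3, alpha=alpha, power=0.8)
            print(f"alpha = {alpha}: about {n:.0f} subjects per group")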
