
Social Science Journal 'Bans' Use of p-values

sandbagger writes: Editors of Basic and Applied Social Psychology announced in a February editorial that researchers who submit studies for publication would not be allowed to use common statistical methods, including p-values. While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban. Biostatistician Steven Goodman said, "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however misused, they need to substitute it with something more meaningful."


  • by aepervius ( 535155 ) on Friday April 17, 2015 @11:25AM (#49494269)
    It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like social journals simply either have bad reviewers or suck at methodology.
    • Comment removed based on user account deletion
    • On average, reviewers have the same skill set as authors who will get accepted (since that is the pool they are taken from). If authors are getting it wrong then so will reviewers.
    • They're not crazy. This fantastic article from Nature in February 2014 [nature.com] shows how seemingly statistically certain events (e.g., p less than 0.01) can be thrown off by low probability events.


      Frankly, I've always been a bit confused by the p value. It just seems more straightforward to provide your 95% confidence interval limits.

      • by ceoyoyo ( 59147 )

        Your 95% confidence interval (roughly*) indicates an interval containing 95% of the probability. The p-value indicates how much probability lies within a cutoff region. What most people do with a 95% CI is look to see if it overlaps the null value (zero, or the mean of the other group, for example). The p-value gives the same information, except quantitatively.

        * yes, Bayesians, technically the 95% credible interval, from a Bayesian analysis, contains the area of 95% probability. The confidence interval,
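
A minimal sketch of the equivalence described above, for a one-sample comparison against a null value of zero. The simulated data, scipy, and the normal-theory CI formula are my illustration choices, not anything from the comment:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.4, 1.0, size=25)            # simulated measurements

n = len(x)
mean = x.mean()
sem = x.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
ci = (mean - t_crit * sem, mean + t_crit * sem)

p = stats.ttest_1samp(x, popmean=0.0).pvalue

# The yes/no answers agree: the 95% CI excludes the null value (0)
# exactly when the two-sided p-value is below 0.05.
print(ci, p, (ci[0] > 0 or ci[1] < 0) == (p < 0.05))
```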

    • Three puzzles (Score:5, Interesting)

      by Okian Warrior ( 537106 ) on Friday April 17, 2015 @01:10PM (#49495277) Homepage Journal

      It is the job of the reviewer to check that the statistic was used in the proper context: not to check the result, but the methodology. It sounds like social journals simply either have bad reviewers or suck at methodology.

      That's a good sentiment, but it won't work in practice. Here's an example:

      Suppose a researcher is running rats in a maze. He measures many things, including the direction that first-run rats turn in their first choice.

      He rummages around in the data and finds that more rats (by a lot) turn left on their first attempt. It's highly unlikely that this number of rats would turn left on their first choice based on chance (an easy calculation), so this seems like an interesting effect.

      He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.

      There's no realistic way that a reviewer can spot the flaw in this paper.

      Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?

      (Note that this is a flaw in statistical reasoning, not methodology. It's not because of latent scent trails in the maze or anything else about the setup.)

      ====

      Add to this the number of misunderstandings that people have about the statistical process, and it becomes clear that... what?

      Where does the 0.05 number come from? It comes from Pearson himself, of course - any textbook will tell you that. If P<0.05, then the results are significant and worthy of publication.

      Except that Pearson didn't *say* that - he said something vaguely similar and it was misinterpreted by many people. Can you describe the difference between what he said and what the textbooks claim he said?

      ====

      You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation.

      P<0.01 is the probability of the data, given the (null) hypothesis. Thus we assume that the probability of the hypothesis is low, given the data.

      Can you point out the flaw in this reasoning? Can you do it in a way that other readers will immediately see the problem?

      There is a further calculation/formula that will fix the flawed reasoning and allow you to make a correct inference. It's very well-known, the formula has a name, and probably everyone reading this has at least heard of the name. Can you describe how to fix the inference in a way that will make it obvious to the reader?

      • Do you really want someone to answer, or are these all rhetorical?
        Here's my take on this issue: Just because something is prone to be misused and misinterpreted doesn't mean it should be banned. In fact, some of the replacement approaches use the very same logic just with a different mathematical calculation process. However, it does illustrate the need for researchers to clearly communicate their results in ways that are less likely to be misused or misinterpreted. This wouldn't exclude the use of p-valu
      • As always there is an xkcd comic that answers your question in a nice and easy to understand fashion.

        I leave it to you to find the relevant link ;p

      • by lgw ( 121541 )

        He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.

        There's no realistic way that a reviewer can spot the flaw in this paper.

        Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?

        I guess I don't see it. While P<0.05 isn't all that compelling, it does seem like prima facie evidence that the rats used in the sample prefer to turn left at that intersection for some reason. There's no hypothesis as to why, and thus no way to generalize and no testable prediction of how often rats turn left in different circumstances, but it's still an interesting measurement.

        You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation. ...

        Can you point out the flaw in this reasoning?

        You have evidence that the null hypothesis is flawed, but none that the alternative hypothesis is the correct explanation?

        The scie

        • The answer is simple. He's taken dozens, if not hundreds of measurements. The odds are in favor of one of the measurements turning up a correlation by chance. The odds against this particular measurement being by chance are 19 to 1--but he's selected it out of the group. The chances that one of *any* of his measurements would show such a correlation by chance are quite high, and he's just selected out the one that got that correlation.
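
To make that answer concrete, here is a small simulation sketch. The particular numbers (50 measurements, 30 rats per measurement) and the use of scipy's binomtest are arbitrary illustration choices, not details from the thread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 1000      # simulated studies in which NOTHING is going on
n_measurements = 50       # independent things measured per study
n_rats = 30               # rats per measurement

hits = 0
for _ in range(n_experiments):
    # every measurement is pure noise: each rat "turns left" with prob 0.5
    lefts = rng.binomial(n_rats, 0.5, size=n_measurements)
    pvals = [stats.binomtest(int(k), n_rats, 0.5).pvalue for k in lefts]
    hits += min(pvals) < 0.05

# Most simulated studies contain at least one "significant" result
# purely by chance, which is exactly the cherry-picking trap above.
print(hits / n_experiments)
```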

        • He writes his paper and submits for publication: "Rats prefer to turn left", P<0.05, the effect is real, and all is good.

          There's no realistic way that a reviewer can spot the flaw in this paper.

          Actually, let's pose this as a puzzle to the readers. Can *you* spot the flaw in the methodology? And if so, can you describe it in a way that makes it obvious to other readers?

          I guess I don't see it. While P<0.05 isn't all that compelling, it does seem like prima facie evidence that the rats used in the sample prefer to turn left at that intersection for some reason. There's no hypothesis as to why, and thus no way to generalize and no testable prediction of how often rats turn left in different circumstances, but it's still an interesting measurement.

          Another poster got this correct: with dozens of measurements, the chance that at least one of them will be unusual by chance alone is very high.

          A proper study states the hypothesis *before* taking the data specifically to avoid this. If you have an anomaly in the data, you must state the hypothesis and do another study to make certain.

          You have a null hypothesis and some data with a very low probability. Let's say it's P<0.01. This is such a good P-value that we can reject the null hypothesis and accept the alternative explanation. ...

          Can you point out the flaw in this reasoning?

          You have evidence that the null hypothesis is flawed, but none that the alternative hypothesis is the correct explanation?

          The scientific method centers on making testable predictions that differ from the null hypothesis, then finding new data to see if the new hypothesis made correct predictions, or was falsified. Statistical methods can only support the new hypothesis once you have new data to evaluate.

          The flaw is called the "fallacy of the reversed conditional" [wikipedia.org].

          The researcher has "probability of data, given hypothesis" and assumes this implies "probability of hypothesis, given d
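
Spelling out the fix hinted at above: the well-known named formula is presumably Bayes' theorem. A toy calculation, with every number below invented purely for illustration, shows why a small P(data | H0) does not by itself make P(H0 | data) small:

```python
# Toy Bayes' theorem calculation; all inputs are made up for illustration.
p_h0 = 0.99                # prior: most effects tested in this field are null
p_h1 = 1 - p_h0
p_data_given_h0 = 0.01     # "p < 0.01": data this extreme are rare under H0
p_data_given_h1 = 0.50     # data like this are fairly likely if the effect is real

p_data = p_data_given_h0 * p_h0 + p_data_given_h1 * p_h1
p_h0_given_data = p_data_given_h0 * p_h0 / p_data

print(round(p_h0_given_data, 2))   # ~0.66, nowhere near 0.01
```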

      • by ceoyoyo ( 59147 )

        I assume you're getting at multiple comparisons because you said "he measures many things."

        You're right, the researcher should correct his p-value for the multiple comparisons. Unfortunately, alternatives to p-values ALSO give misleading results if not corrected and, in general, are more difficult to correct quantitatively.
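
As one sketch of what such a correction can look like, here is a Bonferroni adjustment, chosen only because it is the simplest common option (Holm or FDR control are alternatives); the toy p-values are made up:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Scale each raw p-value by the number of comparisons m, so the
    family-wise error rate stays at alpha across all m tests."""
    pvals = np.asarray(pvals, dtype=float)
    adjusted = np.minimum(pvals * len(pvals), 1.0)
    return adjusted, adjusted < alpha

# e.g. a "rats turn left" p-value of 0.04 found among five measurements:
adjusted, reject = bonferroni([0.04, 0.30, 0.72, 0.11, 0.55])
print(adjusted, reject)   # 0.04 becomes 0.20 and is no longer "significant"
```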

    • That's what happens when your reviewers are the unpaid submitters of other articles to your paper.
    • p-values are inherently bad statistics. You can't fix them with 'good methodology.' Can they be used properly in some situations? Maybe, if the author knows enough statistics to know when or when not to use them. But the people who use p-values are likely not to have that level of knowledge.

      p-values are like the PHP of statistics.

      > "This might be a case in which the cure is worse than the disease. The goal should be the intelligent use of statistics. If the journal is going to take away a tool, however m

    • by delt0r ( 999393 )
      I review for a fairly wide range of journals (ones I have published in), from biology to computer science to math and statistics. Often the stats are sloppy and misinterpreted. So reviewers, being the same group of people, have the same flawed ideas about that sort of thing.

      People put far too much faith in "science" and, mostly, in scientists. We are just people, like everyone else. We have the same failings, and just because we never left university does not make us special.
  • by PvtVoid ( 1252388 ) on Friday April 17, 2015 @11:28AM (#49494313)

    It's a war, I tell you, a war on frequentists! I'm 95% certain!

  • Comment removed based on user account deletion
  • From a blog [richmond.edu] by a colleague of mine on the subject: "Questions that p-values can answer" != "Interesting questions about the world".
  • "Look at this experimental evidence and tell me what you see?"

  • My Paper (Score:5, Interesting)

    by Anonymous Coward on Friday April 17, 2015 @11:56AM (#49494627)

    Ok, let me enlighten the readers a bit. The reviewers tend to be the typical researcher within the field. The typical social researcher does not have a very strong math background. A lot of them are into qualitative research, and the quantitative work tends to stop at ANOVA. I have multiple masters in business and social science and worked on a Ph.D. in social science (Being vague here for a reason). However, I have a dual bachelor's in comp sci and math. I know statistical analysis very well. My master's thesis for my MBA was an in-depth analysis of survey responses: 30 pages of body and really good graphs. My research professor, an econometrics professor, and I submitted it to a second tier journal associated with the field I specialized in...

    ... 6 pages got published. 6?!? They took out the vast majority of the math. Why? "Our readers are really bad at math," said the editor. If you knew the field... you would be scared shitless. The reviewers suggested we take out the math because it confused them. This is why they want the p-value out... it is misunderstood and abused. The reviewers have NO idea if it is being used correctly.

    • I have multiple masters in business and social science and worked on a Ph.D. in social science (Being vague here for a reason).

      And what reason is that? You're not even close to identifiable from this information, you know...

    • by bluFox ( 612877 )
      The unfortunate thing is that what they want could have been easily accomplished by requiring smaller p-values, and also effect sizes (or confidence intervals). Instead, it seems that the consensus is on using Bayesian tools, and the standard ways of using the Bayesian equivalents of t-tests [1] typically require a smaller number of samples than frequentist methods, depending on the prior. [1] http://www.sumsar.net/blog/201... [sumsar.net]
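
A sketch of the "effect sizes (or confidence intervals)" route mentioned above, not the Bayesian approach from the link: Cohen's d with a bootstrap 95% CI on simulated data. The pooled-SD definition of d and the bootstrap are my illustration choices:

```python
import numpy as np

def cohens_d(a, b):
    # difference in means scaled by the pooled standard deviation
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

rng = np.random.default_rng(0)
a = rng.normal(10.5, 2.0, 40)
b = rng.normal(10.0, 2.0, 40)

# bootstrap: resample each group with replacement and recompute d
boot = np.array([cohens_d(rng.choice(a, len(a)), rng.choice(b, len(b)))
                 for _ in range(5000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(a, b):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```
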
    • by Lehk228 ( 705449 )
      so you are saying the journal is shit and should be disregarded?

      Because that is what I got from that: they don't understand the material they are approving or rejecting, and so they serve no useful purpose.
  • by QuietLagoon ( 813062 ) on Friday April 17, 2015 @12:01PM (#49494659)
    This is why we can't have nice things.
  • While p-values are routinely misused in scientific literature, many researchers who understand its proper role are upset about the ban.

    Do they also know whether "p-values" is plural or singular?

  • by umafuckit ( 2980809 ) on Friday April 17, 2015 @12:23PM (#49494871)

    I don't think you even need to be pushing people to do Bayesian stats. You just need to force them to graph their data properly. In *a lot* of biological and social science sub-fields it's standard practice to present both the raw data and the results of stats tests only as tables. They aren't used to looking at graphs and raw data. You can hide a lot of terrible stuff that way, like weird outliers. Things would likely improve immediately in these fields if they banned tables and forced researchers to produce box plots (ideally with overlaid jittered raw data), histograms, overlaid 95% confidence intervals corresponding to their stats tests, etc.

    Having seen some of these people work, it's clear that many of them never make these plots in the first place. All they do is look at lists of numbers in summary tables. They have no clue what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads, they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.
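
A minimal matplotlib sketch of the kind of plot being described: a box plot with jittered raw points and a rough 95% CI for each group mean overlaid. The simulated data and the normal-approximation CI are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = {"control": rng.normal(10, 2, 30), "treated": rng.normal(11, 2, 30)}
data = list(groups.values())

fig, ax = plt.subplots()
ax.boxplot(data, labels=list(groups), showfliers=False)

for i, values in enumerate(data, start=1):
    # jittered raw data, so outliers and skew stay visible
    ax.scatter(rng.normal(i, 0.04, len(values)), values, s=15, alpha=0.5)
    # rough 95% CI for the group mean (normal approximation)
    half = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    ax.errorbar(i + 0.25, values.mean(), yerr=half, fmt="o", capsize=4)

ax.set_ylabel("measurement")
plt.show()
```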

    • They have no clue what their data really look like, and no good knowledge of how to properly analyse data and make graphs. Before they even teach stats to undergrads, they should be making them learn to plot data and read graphs. It's obvious most of them can't even do that.

      That........

      Explains why some people struggle horrifically in statistics, and others can sleep through class and still get an A.

    • Graphs can lie just as easily as statistics themselves.

      • They both can be deceptive if you read them incorrectly or they're designed to deceive. However, it's harder to deceive someone with a graph than with the results of a stats test. I really think graphs lie less easily than statistics.
  • by rgbatduke ( 1231380 ) <rgb@phy.du k e . e du> on Friday April 17, 2015 @01:03PM (#49495205) Homepage

    ...and this isn't even the first journal to do this. It's probably happening now because an entire book has just come out walking people through how universally abused p-values are as statistical measures.

    http://www.statisticsdonewrong... [statisticsdonewrong.com]

    The book is nice in that it does give one replacements that are more robust and less likely to be meaningless, although nothing can substitute for having a clue about data dredging etc.

    rgb

  • If this is important enough of an issue to consider such a radical change to policy, then they should also have considered other possible solutions, like requiring a statistician be included in the pool of reviewers. The journal I submit to most frequently uses 2 to 3 ad hoc reviewers plus the associate section editor. It could be possible to require the section editor who chooses the ad hoc reviewers to include a statistician as the 3rd reviewer. They would then review for the soundness of the statistical p
  • by SteveWoz ( 152247 ) on Friday April 17, 2015 @01:23PM (#49495363) Homepage

    I studied and tutored experimental design and this use of inferential statistics. I even came up with a formula for 1/5 the calculator keystrokes when learning to calculate the p-value manually: take the standard deviation and mean for each group, then calculate the standard deviation of these means (how different the groups are) divided by the mean of these standard deviations (how wide the groups of data are), and multiply by the square root of n (the sample size for each group). But that's off the point.

    We had 5 papers in our class for psychology majors (I almost graduated in that instead of engineering) that discussed why controlled experiments (using the p-value) should not be published. In each case my knee-jerk reaction was that the authors didn't like math or didn't understand math and just wanted to 'suppose' answers. But each article attacked the math abuse, written by proficient academics at universities who did this sort of research. I came around too.

    The math is established for random environments, but the scientists control every bit of the environment, not to get better results but to detect things so tiny that they really don't matter. The math lets them misuse the word 'significant' as though there is a strong connection between cause and effect. Yet every environmental restriction (same living arrangements, same diets, same genetic strain of rats, etc.) invalidates the result. It's called intrinsic validity (finding it in the experiment) vs. extrinsic validity (applying it in real life). You can also find things that are weaker (by the square root of n) by using larger groups.

    A study can be set up in a way so as to likely find 'something' tiny and get the research prestige, but another study can be set up with different controls that turn out an opposite result. And none apply to real life like reading the results of an entire population living normal lives. You have to study and think quite a while, as I did (even walking the streets around Berkeley to find books on the subject up to 40 years prior), to see that the words "99 percent significance level" mean not a strong effect but more likely one that is so tiny, maybe a part in a million, that you'd never see it in real life.
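
Purely as an illustration of the keystroke-saving shortcut described above (the function name, the use of the sample standard deviation, and the equal-group-size assumption are mine, not the poster's):

```python
import numpy as np

def shortcut_statistic(groups):
    """(std of the group means) / (mean of the group stds) * sqrt(n),
    with n the per-group sample size, as described in the comment above.
    It is a between-group vs within-group spread ratio; turning it into
    an actual p-value would still need a reference distribution."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = len(groups[0])                               # assumes equal group sizes
    means = np.array([g.mean() for g in groups])
    stds = np.array([g.std(ddof=1) for g in groups])
    return means.std(ddof=1) / stds.mean() * np.sqrt(n)

rng = np.random.default_rng(1)
print(shortcut_statistic([rng.normal(10, 2, 20), rng.normal(11, 2, 20)]))
```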

    • by myrdos2 ( 989497 )
      I had no idea it was so common to confuse the p-value with the magnitude of the effect being studied. I haven't seen anything like it in HCI.
  • and the other crazies were right all along, that psychiatry is not a real science?
    Or does it just prove that the general understanding of math and statistics (except among mathematicians) is in free fall, and that a few years from now, college graduates won't even be able to recite the multiplication table up to 10?
