Is Statistical Significance Significant? (npr.org) 184

More than 850 scientists and statisticians told the authors of a Nature commentary that they are endorsing an idea to ban "statistical significance." Critics say that declaring a result to be statistically significant or not essentially forces complicated questions to be answered as true or false. "The world is much more uncertain than that," says Nicole Lazar, a professor of statistics at the University of Georgia. An entire issue of the journal The American Statistician is devoted to this question, with 43 articles and a 17,500-word editorial that Lazar co-authored.

"In the early 20th century, the father of statistics, R.A. Fisher, developed a test of significance," reports NPR. "It involves a variable called the p-value, that he intended to be a guide for judging results. Over the years, scientists have warped that idea beyond all recognition, creating an arbitrary threshold for the p-value, typically 0.05, and they use that to declare whether a scientific result is significant or not. Slashdot reader apoc.famine writes: In a nutshell, what the statisticians are recommending is that we embrace uncertainty, quantify it, and discuss it, rather than set arbitrary measures for when studies are worth publishing. This way research which appears interesting but which doesn't hit that magical p == 0.05 can be published and discussed, and scientists won't feel pressured to p-hack.
This discussion has been archived. No new comments can be posted.

  • But not always.

    • by goombah99 ( 560566 ) on Thursday March 21, 2019 @08:11AM (#58309524)

      Then I took a course on statistics, and the stats professor told me that 47.37% of all statisticians make up their own statistics.

      • P-hacking (Score:4, Funny)

        by goombah99 ( 560566 ) on Thursday March 21, 2019 @08:13AM (#58309540)

        100% of all published incorrect results have a P value above 0.05

        • by goombah99 ( 560566 ) on Thursday March 21, 2019 @08:18AM (#58309558)

          A prime number is divisible only by itself and 1
          1 is prime (by this definition)
          3 is prime
          5 is prime
          7 is prime
          11 is prime
          13 is prime
          9 is experimental error.

          The proposition that "all odd numbers are prime" has a P value above 0.05.

          • by colinwb ( 827584 ) on Thursday March 21, 2019 @08:48AM (#58309744)
            1 is prime by that definition, but it's mostly called a unit and defined as *not* prime to make factorising integers into primes unique (up to the order of the factors): Prime number - Primality of 1 [wikipedia.org]
          • by rossdee ( 243626 )

            "A prime number is divisible only by itself and 1
                1 is prime (by this definition)"

            When I was learning Maths (Mathematics is plural where I come from) I was taught that 1 is not prime, it is a special case.

            Anyway for 1 the statement becomes:
            1 is divisible by 1 and 1

            But 1and1 is now IONOS

             

        • It would be nice to see how accurate .005 is with longitudinal studies of papers and their ultimate truthiness down the road.

          Like weather predictions of 30% chance of rain at 2 pm, did it actually rain 30% of the time?

          • by Anonymous Coward

            30% chance of rain means that 30% of the given area will experience rain at 2pm.

Not that it rains 30% of a given amount of time.

            Also, this is regardless of the rain volume.

          • Like weather predictions of 30% chance of rain at 2 pm, did it actually rain 30% of the time?

            That sort of research is done all the time. Usually it's on far more specific parts of weather models than the overall model. Weather models are ridiculously complicated, and scientists spend a lot of time on minor components of them like modeling aerosols better since they form the nuclei of clouds and thus rain, or the vertical humidity profile, or boundary layer dynamics. There are so many minor processes that make up weather that most of the research effort goes into things that 99.9% of the population

        • Re: P-hacking (Score:5, Insightful)

          by c6gunner ( 950153 ) on Thursday March 21, 2019 @08:30AM (#58309638) Homepage

          100% of all published incorrect results have a P value above 0.05

0.05 has always been intended to be the bare minimum, not a guarantee of absolute truth. If you hit 0.05, and you haven't engaged in P-hacking, it indicates that there may be an effect there and that more study is warranted.

          • by Sique ( 173459 )
But 0.05 is as arbitrary as any other value. A p-value threshold of 0.05 means that, out of 20 studies that consider themselves significant because of their p-value, one is a pure statistical fluke. So why not 0.1? Or 0.01? Or even 0.000001?
            • Re: P-hacking (Score:5, Insightful)

              by WhiplashII ( 542766 ) on Thursday March 21, 2019 @09:00AM (#58309802) Homepage Journal

              Worse than that, if you only publish one out of 20 studies, you are reporting noise.

              • Re: P-hacking (Score:5, Insightful)

                by ShanghaiBill ( 739463 ) on Thursday March 21, 2019 @12:23PM (#58310928)

                Worse than that, if you only publish one out of 20 studies, you are reporting noise.

                All publicly funded research should be published.

                Often the failed experiments are more important than the successes.

                Where would we be today if Michelson and Morley hadn't published their failure to measure the ether?

            • Re: P-hacking (Score:4, Insightful)

              by fropenn ( 1116699 ) on Thursday March 21, 2019 @11:43AM (#58310688)
Of course 0.05 is arbitrary. But researchers have to run studies on budgets that limit the number of subjects in the study, and they are also up against the level of accuracy of the test / instrument / survey. Obtaining extremely low p-values requires one or more of these:

              1. Very large sample sizes.

              2. Extremely effective intervention that produces huge differences between your groups.

              3. Extremely accurate instruments / measures.

              4. Lying.

These things all come at a cost, which has to be weighed when deciding between doing fewer studies at higher cost or more studies at lower cost (a rough sample-size sketch follows below).
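A rough sketch of the sample-size trade-off in point 1 (Python, assuming statsmodels; the effect sizes and the 80% power target are illustrative choices, not anything from the thread):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.8, 0.5, 0.2):  # Cohen's d: large, medium, small
    # Sample size per group needed to detect the effect at alpha = 0.05
    # with 80% power, for a two-sample t-test.
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"d = {effect_size}: about {n:.0f} subjects per group")
# Small effects already need hundreds of subjects per group at alpha = 0.05;
# demanding a much smaller alpha pushes the required n higher still.
```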
            • Because 95% accuracy is GOOD ENOUGH for everyday life. And should the results of at least 19 studies agree... that would be more than enough.

              Also, p-value of 0.05 doesn't mean one study out of 20 is a pure statistical fluke.
              It means that in 1 case out of 20 WITHIN the study - we don't know if it is a statistical fluke.
Which is why studies need large samples - so that 2 guys out of 20 who just happen to be allergic to something in the room don't mess up the entire study.
              56 guys out of a 1000 on the othe
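A small simulation of the "1 in 20" figure being argued about above (Python, assuming NumPy and SciPy): when the null hypothesis is true, roughly 5% of studies still cross the 0.05 threshold by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies = 10_000
false_positives = 0
for _ in range(n_studies):
    a = rng.normal(size=30)  # both groups drawn from the same distribution,
    b = rng.normal(size=30)  # so any "effect" found is a fluke by construction
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05

print(f"{false_positives / n_studies:.1%} of null studies reach p < 0.05")
# Prints roughly 5%, which is exactly what alpha = 0.05 allows in the long run.
```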

      • by dargaud ( 518470 )
        On average humans have one tit and one testicle...
  • by Anonymous Coward

    > 850 scientists and statisticians

    Not a statistically significant representation of the scientific community.

    • Bio/Medical Fields (Score:5, Insightful)

      by Roger W Moore ( 538166 ) on Thursday March 21, 2019 @08:40AM (#58309682) Journal
Plus, they are almost all from biology or medicine. Just because their fields don't seem to understand what statistically significant means does not mean that the rest of us do not. In their example, where two results measure the same value but one is within one sigma of a null result and the other is not, they claim that people interpret this as two incompatible results!? I do not know of any physicist who would look at those data and make that assertion.

Their paper reads more like an "I wish our colleagues understood simple statistics". Banning certain terms is not going to address the underlying problem they clearly have. The solution to ignorance is education, not censorship, as they really ought to know, working in universities!
      • by omnichad ( 1198475 ) on Thursday March 21, 2019 @09:27AM (#58309910) Homepage

Statistics in medicine are inherently messier. We don't clone people for experiments, and we don't intentionally kill people. You don't get clean control subjects.

        • by ceoyoyo ( 59147 )

          No it's not. *Data* in medicine is inherently messier. That makes good statistics more important. In most cases the actual stats are easier: measurements in medicine tend to be so much crap averaged together that the central limit theorem works quite well, Gaussian assumptions are valid, and the t-test reigns supreme.

          • That's just a narrow definition of statistics. Gathering that data falls under "statistics" as well.

            • by ceoyoyo ( 59147 )

              I absolutely include designing the study and gathering data under the heading of statistics.

        • Statistics in medicine are inherently messier.

I agree that it is harder to quantify your uncertainties because you have so many variables, but this is what leads to incorrect uncertainty values. What we are talking about here is the correct interpretation of a stated uncertainty, which is an entirely different problem from whether the stated uncertainty is correct.

          • I don't think you could ever accurately quantify the uncertainty.

            • That's actually a given in any experiment in any field: all uncertainties are themselves uncertain to some degree. However, this in no way stops you from being able to correctly interpret what a stated uncertainty means.
It's about publishing, Potsy. Something you would know if you were an actual researcher. The publishing game is about publishing statistically significant results. If you don't have that, your research does not get published.

        If you had RTFA you would know that no one is banning anything:

        "We are not calling for a ban on P values."

        "Nor are we saying they cannot be used as a decision criterion in certain specialized applications (such as determining whether a manufacturing process meets some quality-c
        • It's about publishing Potsy. Something you would know if you were an actual researcher.

          I am an actual researcher. Given your lack of understanding of statistics and reliance on ad hominem attacks, if you are a researcher too then you are clearly the target audience that this paper is trying to help by reducing your exposure to simple statistical concepts that you are likely to misinterpret.
          I never said that they were calling for a ban on p-values, I said that they were calling for an end to "statistical significance". To quote:

          We agree, and call for the entire concept of statistical significance to be abandoned.

          This is just stupid. You do not stop using a valuable and sens

        • by ceoyoyo ( 59147 )

Yes. They basically want people to stop saying p < 0.05 and instead say p = 0.xxxxx. It's a great idea. As far as I can tell, it mostly happened twenty years ago when people learned how to use computers. Every once in a while I review a paper by someone who didn't get the memo and make them include their actual p-values.

      • by ceoyoyo ( 59147 )

        Sad but true. And I do medical research.

        One time a particularly annoying research assistant came running down the hall all excited about two recently published papers that showed exactly the situation you mentioned. Look! Contradictory results! Who's wrong? Uh, those results are compatible with each other. One of them had a confidence interval that completely included the other.

  • Nope. (Score:5, Funny)

    by dohzer ( 867770 ) on Thursday March 21, 2019 @08:26AM (#58309608)

    Nope. I'll delete it from Wikipedia later today.

  • by nickovs ( 115935 ) on Thursday March 21, 2019 @08:28AM (#58309620)

    882: Significant [xkcd.com]

    • That is actually one of the problems with statistical significance. It's only relevant if you're reporting one single result. If you're reporting multiple results, then that creates a second layer of statistical significance, where on average you expect several of those results to surpass your single-sample threshold of significance just by random chance. And so your findings are only noteworthy if you get more than a certain number of results which surpass your single-sample threshold.

      If you've got ju
      • by ceoyoyo ( 59147 )

        "That is actually one of the problems with statistical significance. It's only relevant if you're reporting one single result."

        No, it's not. That's one of the problems with not knowing what you're doing. You're *supposed to* formulate a detailed hypothesis and analysis plan. That plan should include criteria for deciding what tests you'll do, and what combination of tests you will judge to be supporting each part of the hypothesis. Then you perform multiple comparisons correction based on the number of i
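A minimal sketch of the multiple-comparisons correction the comment above refers to (Python, assuming statsmodels; the p-values are hypothetical):

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.003, 0.021, 0.047, 0.061, 0.350]  # hypothetical per-test p-values
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

for p, p_adj, r in zip(raw_p, adjusted_p, reject):
    verdict = "reject null" if r else "keep null"
    print(f"raw p = {p:.3f} -> adjusted p = {p_adj:.3f} -> {verdict}")
# The raw 0.047 "hit" is no longer significant once the whole family of
# tests is accounted for; only the strongest result survives correction.
```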

  • by Anonymous Coward

    In particle physics, (the field in which I have my Ph.D. but--full disclosure--no longer work), the standard is 3 sigma to claim evidence for an effect, and 5 sigma to claim discovery. Publication of results below 3 sigma is not only encouraged, but required...it's unethical to conceal such results. A null result can be a theory killer.

    • by habig ( 12787 )
I still do work in particle physics. Yes, I do understand that error propagation is far easier to be careful with for particles than for anything medical. But still, we do spend 90% of the time on any given analysis "embracing uncertainty, quantifying it, and discussing it", as TFA says. Figuring out the error bars is the hard part, and also usually the part that referees pick at to make sure you did it right.

      (BTW: Isn't p=0.05 only a 2-sigma result? Ick.)

      • by sfcat ( 872532 )

        (BTW: Isn't p=0.05 only a 2-sigma result? Ick.)

It's a bit less than 2-sigma. It should be more like 3-sigma (about p=0.01), which would make p-hacking much more difficult as it would take 100 variations to see a probable null hypothesis. Although the exact methods of conversion are complex. [cochrane.org]
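For the conversion being discussed, a quick check (Python, assuming SciPy; using the two-sided normal convention, which differs slightly from the one-sided convention particle physicists often quote):

```python
from scipy.stats import norm

for p in (0.05, 0.01):
    print(f"two-sided p = {p}: {norm.isf(p / 2):.2f} sigma")
for sigma in (3, 5):
    print(f"{sigma} sigma: two-sided p = {2 * norm.sf(sigma):.1e}")
# p = 0.05 is about 1.96 sigma; 3 sigma is about p = 0.003, and the
# 5 sigma discovery standard is down near p = 6e-7.
```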

    • by ceoyoyo ( 59147 )

      The publication of inconclusive results is a problem outside physics. Particle physics does have an advantage though: the data and analyses tend to be from only a few places. In parts of physics where Joe Anybody can ask a few undergrads a handful of questions and then write a paper, there's likely less publication of all those inconclusive results.

In my International Relations graduate program there was a big push towards quantitative research and analysis; there were two mandatory classes on it. However, I always felt that it broke things down into too simplistic a view, and while it could tell you that things might be correlated, it never told you why. And with human systems like societies, states, conflict, politics, etc, there are so many inputs, so many factors that contribute to why people act the way they do, what decisions they make, that to boil it

    • Re:Quant vs Qual (Score:5, Insightful)

      by PacoSuarez ( 530275 ) on Thursday March 21, 2019 @09:20AM (#58309882)
      And this is why there is so little truth to be found in the humanities.

Here's a scenario: A white nationalist kills dozens of Muslims. Someone looks at this and sees evidence that the normalization of fringe views, characteristic of the way president Trump talks, is emboldening these maniacs to act violently. Someone else looks at this and sees evidence that white middle-class uneducated men have been marginalized by our economic system and are at their wits' end, which is the same phenomenon that led to Trump being elected.

      The kind of narrative-based elaborate analyses that you advocate doesn't help us decide which of the points of view above is right, and we carry on with our preconceptions, unable to learn anything.

      Narratives allow you to explain the past perfectly using models that have no predictive value. The only way to make progress when trying to understand a complex system is to come up with very simple hypotheses and try to validate them empirically. Of course this is very hard to do, but I think people in the humanities do a poor job and fool themselves into thinking they understand things they don't understand.
      • by Nidi62 ( 1525137 )

Here's a scenario: A white nationalist kills dozens of Muslims. Someone looks at this and sees evidence that the normalization of fringe views, characteristic of the way president Trump talks, is emboldening these maniacs to act violently. Someone else looks at this and sees evidence that white middle-class uneducated men have been marginalized by our economic system and are at their wits' end, which is the same phenomenon that led to Trump being elected.

        The kind of narrative-based elaborate analyses that you advocate doesn't help us decide which of the points of view above is right, and we carry on with our preconceptions, unable to learn anything.

        You've proven my point: they're both right. When people in power either espouse certain views or give support (whether implicit or explicit) for those views it emboldens others who hold those same views. At the same time, it's a commonly held belief that marginalization, perceived or actual, can lead one to more extremist views. Both of those very likely factored into why the person in your scenario acted the way that he did. Using numbers tries to break everything down into black and white. With peop

      • Best description of narratives ever. It also explains why marketers like them so much.
      • Re:Quant vs Qual (Score:4, Interesting)

        by Kjella ( 173770 ) on Thursday March 21, 2019 @02:16PM (#58311584) Homepage

        Narratives allow you to explain the past perfectly using models that have no predictive value. The only way to make progress when trying to understand a complex system is to come up with very simple hypotheses and try to validate them empirically. Of course this is very hard to do, but I think people in the humanities do a poor job and fool themselves into thinking they understand things they don't understand.

A person is not a die, no matter how much you want them to be. You can ask a fairly simple question like "Would you pose for nude art?" and get a survey answer. But if you break it down there'll be a ton of factors, and the more answers you get and the more fine-meshed you make your model, you'll only end up finding more and more differences; plus the answer will not remain constant in place or time, with strong group dynamics and feedback loops. And you still will not have found a meaningful answer to why, only a bunch of correlated variables. Qualitative studies do the exact opposite: they don't generalize, they ask subjects one by one to explain their reasoning and try to summarize that into common sentiments. It's a much more accurate description for each person and the group as a whole. It's just really hard to compare scores because it's not on a measurable willingness scale.

Yes, we've vaguely identified some risk factors that are usually present in a terrorist. We've got long manifestos on why exactly that person turned into a terrorist. But everyone at risk is somewhere in between; they're not just risk factors and they're not clones of the terrorist. It's something like Heisenberg's uncertainty principle for the social sciences: the more specific knowledge you have of an individual, the less applicable it is to the group, and the more general knowledge you have of the group, the less accurate it is for the individual. They're both circling what nobody knows for sure: what exactly goes on in somebody else's head. Until we discover mind-reading technology that's going to be an approximation at best. Just because you can sell power tools to most Americans doesn't mean you can sell them everywhere; throw a dart at a map and you could hit an Amish community.

  • Science is hard (Score:3, Interesting)

    by Sarten-X ( 1102295 ) on Thursday March 21, 2019 @08:38AM (#58309678) Homepage

    This way research which appears interesting but which doesn't hit that magical p == 0.05 can be published and discussed

    The significance value is essentially a measurement of how good a researcher is at their job. Unfortunately, a lot of academics feel that they shouldn't be bothered by silly things like "accountability", because they've chosen the noble ivory tower of research.

    If your experiment can't hit that level of certainty, redesign your experiment. Go get more samples, run more simulations, and grow more cultures. Alternatively, go ahead and publish, but include the note that the job isn't actually finished. Use the partial result to justify asking for more funding so you can complete the work.

    • Half of your samples died unexpectedly? If you were a better researcher with better lab practices, you'd have had someone check that the equipment stayed plugged in over spring break.
    • Nobody responded to your survey? Maybe you should try something more effective than standing in a corner of the local pub for an hour asking the drunks if you can "get something good from them real quick".
    • You can't get enough reagents for your chemical process? Perhaps you should have actually budgeted for supplies, rather than host an open-bar party celebrating that you received that grant.
    • You ran out of time on the cluster computer? Next time try asking the computer science students to review your program for efficiency, rather than trying to run a direct implementation of your whiteboard notes.

    (These are all things I saw first- or secondhand during my time in academia)

    I'd be fine getting rid of the p-value, but it would have to be replaced by something else that does an equal job of filtering out the half-assed crank "research" that makes more headlines than discoveries. The only replacement I can think of that wouldn't be vulnerable to similar "hack" methods would be to require that every experiment go through an exhaustive process inspection before, during, and after the run. That's an even more painful thing to deal with than making sure your experiment can produce significant results.

    • Re: (Score:3, Interesting)

      by Anonymous Coward

      This is absolute horseshit. There is often background noise in a measurement that you CAN NOT GET RID OF. Therefore you will never get a perfect 0 p-value. In fact, you will often be unable to reduce it beyond a certain point NO MATTER HOW GOOD YOUR EXPERIMENT IS.

      What the article is arguing is that we should not be using a blunt instrument like a p-value which is often a lazy person's (like the parent poster) substitute for quality, but instead should be assessing research on its relative merit and making j

      • I have a fair coin that always lands on heads, just with about 50% background noise.

        The whole point of an experiment is to remove the "background noise", which is another way of saying "uncontrolled variables". If your experiment can't isolate the target variable, then you need to fix your experiment. In the extremely rare case that the experiment can't be fixed, like in cases where a small number of particles matters (including the very small number of photons hitting a telescope sensor), you still should

    • by Anonymous Coward

      They should publish p-value and effect size. I'm also a big advocate of robust statistics; the assumption of an underlying normal distribution is not always justified for real world data, central limit theorem notwithstanding.
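A small sketch of that suggestion (Python, assuming NumPy and SciPy; Cohen's d is computed by hand here and the data are simulated): a p-value reported next to an effect size shows whether a "significant" result is also a meaningful one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(0.1, 1.0, size=5000)  # tiny true effect, very large sample

t, p = stats.ttest_ind(b, a)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd
print(f"p = {p:.4f}, Cohen's d = {cohens_d:.2f}")
# With enough subjects, even a trivially small effect yields a small p-value;
# the effect size is what reveals it may not matter in practice.
```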

    • Comment removed (Score:5, Insightful)

      by account_deleted ( 4530225 ) on Thursday March 21, 2019 @09:31AM (#58309926)
      Comment removed based on user account deletion
      • Well, that's the rub, isn't it? Rare is the experiment that proves an idea true. Most experiments are designed to falsify an hypothesis. Statistical noise comes in and complicates this, and can't always be accounted for. Hell, there's a philosophical case to be made that reality is the result of noise and its cancellation.

        It all really comes down to acknowledging that there's always some uncertainty to any measurement, whether due to limits of the measuring device, random noise, or previously unknown variab

    • by AmiMoJo ( 196126 )

      Alternatively, go ahead and publish, but include the note that the job isn't actually finished.

That's what they are arguing for. Too many scientists won't submit and too many journals will reject based on the p-value alone, which means some interesting ideas and data go unpublished.

      They are not arguing for "crank" research, just that the p value isn't the be-all and end-all, and in fact no single data point should be. Work should be considered on all its merits.

      • What they're arguing for, in their words [nature.com]:

        The editors [of The American Statistician] introduce the collection with the caution “don’t say ‘statistically significant’”. Another article with dozens of signatories also calls on authors and journal editors to disavow those terms.

        We agree, and call for the entire concept of statistical significance to be abandoned.

        TFA summarizes this as a "ban on p-values".

        I'm all in favor of evaluating work on its merits, but the p-value is still a useful tool for measuring one of the most important merits: the chance that the result was completely coincidental.

        • The scientists in the article are complaining that people conclude two things are the same when there is no statistical difference between the two. You can't conclude that: all you can say is "we aren't sure."
          • by ceoyoyo ( 59147 )

            They are absolutely right on that score. People do that ALL the time. I tell my students it is the first sin of statistics because I'm sure it's responsible for the vast majority of committed statistical fallacies.

            It's the root of the "difference of difference" error, which is apparently present in 50% of neuroscience papers that have the opportunity to make it.

            • What's the difference of difference error?
              • by ceoyoyo ( 59147 )

Here's a blog (with a link to a published paper) discussing the error and its incidence in neuroscience: https://www.theguardian.com/co... [theguardian.com]

Basically, imagine you've got a control group and two different treatments. You determine that treatment group A is not significantly different from control, but treatment group B is. So you conclude that treatment B works better than treatment A. Implicitly, you've assumed that the non-significance of group A means "no difference" or at least "less difference." Both o

                • Huh, so basically they are taking .05 and pretending it's zero. That is fascinating, I will look out for that now. I'm glad we had this conversation.
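A compact illustration of the "difference of differences" error ceoyoyo describes (Python, assuming NumPy and SciPy; the group means and sizes are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(0.0, 1.0, size=25)
treat_a = rng.normal(0.4, 1.0, size=25)  # real but modest effect
treat_b = rng.normal(0.6, 1.0, size=25)  # slightly larger effect

_, p_a = stats.ttest_ind(treat_a, control)   # may land just above 0.05
_, p_b = stats.ttest_ind(treat_b, control)   # may land just below 0.05
_, p_ab = stats.ttest_ind(treat_b, treat_a)  # the comparison that matters

print(f"A vs control: p = {p_a:.3f}")
print(f"B vs control: p = {p_b:.3f}")
print(f"B vs A:       p = {p_ab:.3f}")
# Concluding "B works and A doesn't" from the first two tests alone is the
# error; the direct B-vs-A comparison is typically nowhere near significant.
```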
        • by ceoyoyo ( 59147 )

          Meh, you just have to call it something else. Confidence interval. Likelihood ratio. Bayes factor. The dirty little secret is that these things are all mathematically equivalent, or very nearly so, for the vast majority of analyses that are actually conducted.

          People like simple solutions. A demon to exorcise. P-values fit that. The real problem is lazy interpretation. Any single result is questionable, no matter how well the data is collected and analyzed. A journal article is not truth, it's an obse

    • The significance value is essentially a measurement of how good a researcher is at their job

      Oh dear god no.

      Proving something is not true (aka, results are not significant) is an incredibly valuable thing.

Demanding that all researchers produce experiments that prove their hypothesis true and only true is awful, and it is how you get p-hacking. It is also what you are demanding here.

    • Re:Science is hard (Score:5, Interesting)

      by werepants ( 1912634 ) on Thursday March 21, 2019 @11:08AM (#58310500)

      The significance value is essentially a measurement of how good a researcher is at their job.

This is totally wrong, and reflects the exact misconception that the article is talking about. For quite a while my job was doing experiments on hardware that cost as much as $100k per sample, where test time would cost $1000/hr or more, and you needed hundreds of hours of testing to get any kind of reasonable certainty. Budgets are finite, and at some point you have to decide how good is good enough; even if it isn't good enough, there just isn't any money left to do better. We could only estimate effects to within a couple orders of magnitude at times. However, we put error bars on fucking everything, so we were very explicit about how much slop there was in the answers. How good a researcher is at their job is determined by how much they can get done with finite resources, and how deeply they understand the limitations of their knowledge. All researchers should be trying to get maximal knowledge per dollar (or per time, in some cases), and sometimes an experiment with large uncertainty is the appropriate approach, or the only thing that is feasible within time/funding/physics constraints.

      Sure, if you are doing something basic like surveys, it's not hard to increase statistics. But if you are doing medical research on a new drug, costs can run into billions and you've got major ethical quandaries every step along the way. If you are developing a drug for a rare condition, there might only be a handful of test candidates in the world, and so you literally can't increase your sample size unless you wait a decade for more incidences to crop up. In that interval, depending on the specifics of the disease, people could be suffering or dying needlessly because you haven't gotten your drug approved.

      Yes, bad research is bad, and journals are replete with examples of terrible studies being published. But the p-value doesn't help that situation - it makes it worse, because it's treated as a binary marker of success. You can easily produce a great p-value by approaching science in the exact wrong way... look for significant correlations in a large, highly multivariate dataset and you are guaranteed to find some total nonsense correlations that look flawless (like the insanely tight correlation between swimming pool drowning deaths and Nicolas Cage movies... true story).

      What we actually need is more rigorous peer review and greater transparency and information sharing in science. If it becomes standard practice to make all of your raw data and calculations public, then it will become obvious very quickly when people are fudging numbers and inflating their stats.
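A quick demonstration of the spurious-correlation point from the comment above (Python, assuming NumPy and SciPy): run enough tests on pure noise and some of them will look "significant".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 40))  # 40 variables of pure noise, 100 samples

hits, pairs = 0, 0
for i in range(40):
    for j in range(i + 1, 40):
        _, p = stats.pearsonr(data[:, i], data[:, j])
        pairs += 1
        hits += p < 0.05

print(f"{hits} of {pairs} noise-vs-noise correlations have p < 0.05")
# Roughly 5% of the 780 pairs, i.e. dozens of "flawless-looking" but
# meaningless correlations of the Nicolas-Cage-and-drownings variety.
```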

      • As I said, go ahead and publish, but include the note that the job isn't actually finished. Use the partial result to justify asking for more funding so you can complete the work.

        For a while, my job was finding the handful of drug-trial candidates you mentioned. I understand that there are times when you realistically cannot hit a high degree of probability, but that doesn't mean a p-value isn't critically important, and it certainly doesn't mean a p-value should be "banned" from peer-reviewed journals.

        If y

    • by epine ( 68316 )

      If your experiment can't hit that level of certainty, redesign your experiment. Go get more samples, run more simulations, and grow more cultures.

      Ridiculous. You have to budget before data collection. But this approach could be valid, I suppose, anywhere money grows on trees.

  • by SlaveToTheGrind ( 546262 ) on Thursday March 21, 2019 @08:46AM (#58309720)

Even without a magical "significant/insignificant" threshold, researchers will still evaluate, judge, and compare levels of significance. The pressure will just shift to coming up with results that are "MORE significant" rather than "LESS significant," and thus p-hacking will continue among those who were willing to cross that line in the first place.

The root cause is going to remain until peer reviewers force researchers to commit to how they're going to evaluate their measurements before they take those measurements. But the likely outcome would be either that a lot less research would get published at all, or that published research would start to lose some of the imprimatur it now enjoys, including that of the peer reviewers. So that's unlikely to happen.

    • I agree 90%. Another thing that would help would be to reduce pressure to always get a solid result at the end. "We did the experiment as designed and approved and we got very little to show for it," needs to be acceptable to avoid fraud, as well as to improve processes. "This experiment fails because 'x'," is a beautiful thing, since it has value: This experiment for this purpose doesn't work, so skip it or improve on it, please.

    • by ceoyoyo ( 59147 )

      Nope. What needs to happen is we need to give up on this idea that papers must be True. Scientific papers evolved from personal letters and presentations at scientific society meetings. A published paper is basically "hey guys, I think I found this thing that might be cool. Take a look?"

      The key being the last part. Have a look and see if you see the same thing. If a bunch of us do, we might be onto something. If, instead, you get a reputation for finding random crap nobody else can replicate, well....

  • On average, humans have one breast and one testicle.

    It's even worse when economic trends are reported in the popular press.

  • by plague911 ( 1292006 ) on Thursday March 21, 2019 @08:50AM (#58309748)
Sure, in a perfect world we would all discuss the exact probabilities. The reality is we all (even professionals in an industry) have a limited attention span. Benchmarks are useful, even imperfect benchmarks. This is just another example of purists thinking we should move to some idealized but impractical situation.
  • I'm really curious about what people think about this comment and my attempt to defend p-values and statistical significance testing as a concept. I used to hate p-values like any respectable scientist, but in teaching intro college stats class (targeted to behavioral science), I've come to appreciate them, for one major reason.

    1. We have to take uncertain science and make certain decisions about the conclusions. Science gets simplified to dichotomous decisions. You either approve the drug or not. You eithe

    • The idea of this proposal is not to abandon p values. It's to stop using p less than 0.05 as a magical threshold.

Loosening that limit is also far more useful in studies involving the "softer" sciences, where it's not possible to control all confounding variables. I wouldn't expect as much of a benefit in astrophysics as you'd get in nutrition.

      So you'd still report p values, and something with a p-value that indicates the result is basically random probably wouldn't get published. But there'd also be more p

    • You have to teach it because students need to be able to read articles and interpret what they are reading. The p-value will be reported in scientific studies for a long time to come, so this is an essential skill.
    • Everyone needs to understand statistics in the modern world. People who don't get lost and very, very confused.
  • Mostly, they don't understand that the world isn't black and white.

People want answers. That's a given. And they used to turn to science for this. I say used to, because more and more people think that woo has better answers for their questions. The reason is less that science does not have answers and more that the answers science has require thinking and understanding. They are rarely YES or NO. There's a lot of ifs and buts attached, but people don't want that. They want easy answers.

    And reality has rarely e

And reality rarely has easy answers.

      Which is why engineers answer most questions with "it depends".

      • And this is why we don't make good politicians. Politics need easy answers. They needn't be correct or even solve anything, but they have to be easy to understand.

  • If you understand what it means and how to apply it. If you blindly slap on the formula and use the resulting number to say, "Look, it's significant!", then, no, it isn't.

  • meh just set it to 0.051 and watch 90% of "science" publication burn
  • The issue I find with nearly every single biological application of p-value testing is that either the wrong test is used, or, far more frequently, the necessary validations of the assumptions of the test have not been made. I assume that among those many articles from The American Statistician (a journal that I do not read) that point will have been made because although it is a subtle one, it isn't that subtle, and it is important.

    The most commonly used statistical tests assume that unaccounted experimen

    • by ceoyoyo ( 59147 )

      No, there's a reason most people learn how to do Gaussian stats and then stop. MOST measurements have Gaussian error because that's what you get when you average or add up a bunch of random variables. Most measurements are really composites like that, and the noise is quite Gaussian.

      It's very important to recognize situations where that's not true though. Counts, surveys, ordinal scales, data that's been transformed, etc. People are legitimately terrible at doing that.
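A short check of the central-limit-theorem point (Python, assuming NumPy and SciPy; the exponential components stand in for "so much crap averaged together"):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Each "measurement" is the mean of 50 heavily skewed exponential components.
measurements = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(f"skewness of the raw components:        {stats.skew(rng.exponential(size=10_000)):.2f}")
print(f"skewness of the averaged measurements: {stats.skew(measurements):.2f}")
# The averages are nearly symmetric, so Gaussian assumptions (and the t-test)
# work far better on them than on the raw components; counts, surveys and
# ordinal scales are where those assumptions break down.
```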

If you browse around a typical statistics textbook, you will probably find a brief discussion about the difference between statistical significance and real-world significance. It seems like a lot of people in the sciences, especially in the soft sciences, are chasing after statistical significance because it's now some kind of a prerequisite to get published. However, their findings can amount to very little in the real world. Imagine for example that you find out that the commute distance is statistically
  • Yay.

    So we can look forward to even MORE broken, badly researched, pointless garbage being published as academically or scientifically relevant.

    Look at the finances of any journal pushing this crap. They're probably on borrowed time, in the financial sense.

  • I don't care how significant your p value is, if your n is less than 40 case/control match your values are meaningless, other than proof of concept for further study.

    Wake me up when you get 256/256 fully matched case/control with true randomization. Then we'll talk p values.

  • I've always thought "statistical reliability" was a better name.
