Science

Why Published Research Findings Are Often False

Posted by samzenpus
from the race-to-publish dept.
Hugh Pickens writes "Jonah Lehrer has an interesting article in the New Yorker reporting that all sorts of well-established, multiply confirmed findings in science have started to look increasingly uncertain as they cannot be replicated. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only anti-psychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants. 'One of my mentors told me that my real mistake was trying to replicate my work,' says researcher Jonathan Schooler. 'He told me doing that was just setting myself up for disappointment.' For many scientists, the effect is especially troubling because of what it exposes about the scientific process. 'If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?' writes Lehrer. 'Which results should we believe?' Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to 'put nature to the question,' but it now appears that nature often gives us different answers. According to John Ioannidis, author of Why Most Published Research Findings Are False, the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance, the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy.'"
This discussion has been archived. No new comments can be posted.

  • Hmmmmm (Score:5, Interesting)

    by Deekin_Scalesinger (755062) on Sunday January 02, 2011 @12:30PM (#34737664)
    Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
  • It's simple. (Score:5, Interesting)

    by Lord Kano (13027) on Sunday January 02, 2011 @12:31PM (#34737672) Homepage Journal

    Even in academia, there's an establishment and people who are powerful within that establishment are rarely challenged. A new upstart in the field will be summarily ignored and dismissed for having the arrogance to challenge someone who's widely respected. Even if that respected figure is incorrect, many people will just go along to keep their careers moving forward.

    LK

  • race to the bottom (Score:4, Interesting)

    by toomanyhandles (809578) on Sunday January 02, 2011 @12:38PM (#34737726)
    I see this as one more planted article in the mainstream press: "Science is there to mislead you, listen to fake news instead". The rising tide against education and critical thinking in the USA is reminiscent of the Cultural Revolution in China. It is even more ironic that the argument "against" metrics that usefully determine validity is couched in a pseudo-analytical format itself. At this point in the USA, most folks reading (even) the New Yorker have no idea what a p-value is or why these things matter, and they will just recall the headline "science is wrong". And then they wonder in Detroit why they can't make $100k a year anymore pushing the button on a robot that was designed overseas by someone else. You know, overseas, where engineering, science, etc. are still held in high regard.
  • Bogus article (Score:5, Interesting)

    by Anonymous Coward on Sunday January 02, 2011 @12:48PM (#34737794)

    That article is as flawed as the supposed errors it reports on. The author just "discovered" that biases exist in human cognition? The "effect" he describes is quite well understood, and is the very reason behind the controls in place in science. This is why we don't, in science, just accept the first study published, why scientific consensus is slow to emerge. Scientists understand that. It's journalists who jump on the first study describing a certain effect, and who lack the honesty to review it in the light of further evidence, not scientists.

  • Re:Science? (Score:2, Interesting)

    by hedwards (940851) on Sunday January 02, 2011 @12:48PM (#34737800)
    I'm not sure about ecology, but psychology and medicine are definitely not science, nor have they ever been science.

    Probably the best indictment of psychology as a pseudo-science I've ever seen is The Trauma Myth: The Truth About the Sexual Abuse of Children--and Its Aftermath by Susan Clancy [perseusbooks.com]

    She herself is basically a scientist: she tests hypotheses in order to determine their validity and has been willing to set aside ones that were demonstrated to be false in favor of better ones. But, unfortunately, most in her field are charlatans.
  • by onionman (975962) on Sunday January 02, 2011 @12:50PM (#34737816)

    After years of speculation, a study has revealed that scientists are, in fact, human. The poor wages, long hours, and relative obscurity that most scientists dwell in has apparently caused widespread errors, making them almost pathetically human and just like every other working schmuck out there...

    I'll add another cause to the list. The "publish or perish" mentality encourages researchers to rush work to print often before they are sure of it themselves. The annual review and tenure process at most mid-level research universities rewards a long list of marginal publications much more than a single good publication.

    Personally, I feel that many researchers publish far too many papers with each one being an epsilon improvement on the previous. I would rather they wait and produce one good well-written paper rather than a string of ten sequential papers. In fact, I find that the sequential approach yields nearly unreadable papers after the second or third one because they assume everything that is in the previous papers. Of course, I was guilty of that myself because if you wait to produce a single good paper, then you'll lose your job or get denied tenure or promotion. So, I'm just complaining without being able to offer a good solution.

  • by bcrowell (177657) on Sunday January 02, 2011 @01:10PM (#34737964) Homepage

    The article can be viewed on a single page here: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all [newyorker.com]

    Not surprisingly, most of the posts so far show no signs of having actually RTFA.

    Lehrer goes through all kinds of logical contortions to try to explain something that is fundamentally pretty simple: it's publication bias plus regression to the mean. He dismisses publication bias and regression to the mean as being unable to explain cases where the level of statistical significance was extremely high. Let's take the example of a published experiment where the level of statistical significance is so high that the result only had one chance in a million of occurring due to chance. One in a million is 4.9 sigma. There are two problems that you will see in virtually all experiments: (1) people always underestimate their random errors, and (2) people always miss sources of systematic error.

    It's *extremely* common for people to underestimate their random errors by a factor of 2. That means that the 4.9-sigma result is only a 2.45-sigma result. But 2.45-sigma results happen about 1.4% of the time. That means that if 71 people do experiments, typically one of them will result in a 2.45-sigma confidence level. That person then underestimates his random errors by a factor of 2, and publishes it as a result that could only have happened one time in a million by pure chance.
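
The sigma-to-probability arithmetic above can be checked in a few lines. This is a minimal sketch that assumes a two-sided test on a normally distributed statistic, which the comment does not state explicitly; the function name `two_sided_p` is made up for illustration.

```python
import math

def two_sided_p(sigma):
    """Two-sided tail probability of a standard normal beyond +/- sigma.
    Assumption: the quoted sigma levels follow this two-sided convention."""
    return math.erfc(sigma / math.sqrt(2))

# Significance as published, with error bars underestimated by a factor of 2.
claimed = two_sided_p(4.9)

# Significance once the error bars are doubled: 4.9 sigma becomes 2.45 sigma.
actual = two_sided_p(4.9 / 2)

# Number of null experiments needed, on average, for one to reach 2.45 sigma.
experiments_per_fluke = 1 / actual

print(f"4.9 sigma  -> p = {claimed:.2e}")
print(f"2.45 sigma -> p = {actual:.4f}")
print(f"roughly 1 in {experiments_per_fluke:.0f} null experiments reaches 2.45 sigma")
```

Under that assumption the numbers land close to the ones quoted: about one in a million for 4.9 sigma, a bit over 1.4% for 2.45 sigma, and one fluke per 70 or so experiments.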

    Missing a systematic error does pretty much the same thing.

    Lehrer cites an example of an ESP experiment by Rhine in which a certain subject did far better than chance at first, and later didn't do as well. Possibly this is just underestimation of errors, publication bias, and regression to the mean. There is also good evidence that a lot of Rhine's published work on ESP was tainted by his assistants' cheating: http://en.wikipedia.org/wiki/Joseph_Banks_Rhine#Criticism [wikipedia.org]

  • by fermion (181285) on Sunday January 02, 2011 @01:11PM (#34737968) Homepage Journal
    The scientific method derives from Galileo. He constructed apparatus and made observations that any trained academician and craftsperson of his day could have made, but they did not because it was not the custom. He built inclined planes and lenses, and recorded what he saw. From this he made models that included predictions. Over time those predictions were verified by others such as Newton, and the models became more mathematically complex. The math used is rigorous.

    Now science uses different math, and the results are expressed differently, even probabilistically. But in real science those probabilities are not what most think of as probability. A scanning tunneling microscope, for instance, works by the probability that a particle can jump an air gap. Though this is probabilistic, it is well understood, so it allows us to map atoms. There is minimal uncertainty in the outcome of the experiment.

    The research talked about in the article may or may not be science. First, anything having to do with human systems is going to be based on statistics. We cannot isolate human systems in a lab. The statistics used is very hard. From discussions with people in the field, I believe it is every bit as hard as the math used for quantum mechanics. The difference is that much of the math is codified in computer applications and researchers do not necessarily understand everything the computer is doing. In effect, everyone is using the same model to build results, but may not know if the model is valid. It is like using a constant acceleration model for a case where there is jerk. The results will not be quite right. However, if everyone uses the faulty model, the results will be reproducible.

    Second, the article talks about the drug dealers. The drug dealers are like the Catholic Church of Galileo's time. The purpose is not to do science, but to keep power and sell product. Science serves as a process to develop product and minimize legal liability, not to explore the nature of the universe. As such, calling what any pharmaceutical company does the 'scientific method' is at best misguided.

    The scientific method works. The scientific method may not be completely applicable to fields of study that try to find things that often, but not always, work in a particular case. The scientific method is also not resistant to group illusion. This was the basis of 'The Structure of Scientific Revolutions'. The issue here, if there is one, is the lack of education about the scientific method, which tends to make people give individual results more credence than is rational, or believe that it is some sort of magic.

  • Re:Yes it does. (Score:4, Interesting)

    by Rockoon (1252108) on Sunday January 02, 2011 @01:19PM (#34738026)
    There is a lot of science where new data is not generated at a rate where true reproducibility is an option.

    For example, anything to do with the general health of a person can only really be measured over long time scales (decades), as well as measurements of the climate and things like that.

    In those cases, 'reproduction' means taking the same data, sifting it in possibly the same way (but maybe not), and getting the same or similar result.

    Now take this fact in the context of data dredging.

    Data dredging does not have to be intentional (ie: an intent to defraud, although it certainly can be.)

    If you take 1000 scientists and give them all the same data, they will probably look at that data in several thousand ways. If you are dealing with 95% intervals, and the data is looked at in 2000 ways, then about 100 of those ways will present something 'significant' by simple random chance.
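
The "about 100 out of 2000" figure can be sketched as a small simulation. It assumes every one of the 2000 looks at the data is an independent test of a true null hypothesis, so each p-value is uniform on [0, 1]; real analyses of a single dataset overlap and are not independent. The function name and seed are illustrative only.

```python
import random

def count_false_positives(n_tests, alpha, seed=0):
    """Count how many of n_tests independent null tests come out 'significant'.
    Under a true null hypothesis, a p-value is uniformly distributed on [0, 1]."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))

n_ways = 2000   # number of ways the data is examined (figure from the comment)
alpha = 0.05    # the 95% criterion

expected = n_ways * alpha                        # long-run average count
observed = count_false_positives(n_ways, alpha)  # one simulated draw

print(f"expected false positives: {expected:.0f}")
print(f"observed in one simulated run: {observed}")
```

Any single run will scatter around the expected value of 100, which is the point: none of those "findings" reflects anything real.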

    The same phenomenon exists in that whole bullshit "Equidistant Letter Spacing" Bible-Code crap, but is much easier to dismiss because you have to believe something extremely unlikely (God exists, and orchestrated the translation of the bible into English so that it would have hidden codes.)

    When you really get into dismissing Bible Code in a mathematical manner, you end up realizing that in any data set there exist many things that are statistically significant and yet also complete bullshit.
  • Re:Hmmmmm (Score:5, Interesting)

    by digsbo (1292334) on Sunday January 02, 2011 @01:29PM (#34738090)
    Wow. I didn't pick up any of that at all, and I RTFA. It looked to me much more like acknowledgement of widespread difficulties with randomness, scale, and human fallibility. Exactly the kinds of things that would make someone who's a staunch defender of "science as a means to truth" disregard valuable critical information about it.
  • by Moof123 (1292134) on Sunday January 02, 2011 @01:32PM (#34738114)

    Agreed. Way too many papers from academia are ZERO value added. Most are a response to "publish or perish" realities.

    Case in point: One of my less favorite profs published approximately 20 papers on a single project, mostly written by his grad students. Most are redundant papers taking the most recent few months' data and producing fresh statistical numbers. He became department head, then dean of engineering.

    As a design engineer I find it maddening that 95% of the journals in the areas I specialize in are:

    1. Impossible to read (academia style writing and non-standard vocabulary).

    2. Redundant. Substrate integrated waveguide papers for example are all rehashes of original waveguide work done in the 50's and 60's, but of generally lower value. Sadly the academics have botched a lot of it, and for example have "invented" "novel" waveguide to microstrip transitions that stink compared to well known techniques from 60's papers.

    3. Useless. Most, once I decipher them, end up describing a widget that sucks at the intended purpose. New and "novel" filters should actually filter, and be in some way as good or better than the current state of the art, or should not be bothered to be published.

    4. Incomplete. Many interesting papers report on results, but don't describe the techniques and methods used. So while I can see that University of Dillweed has something of interest, I can't actually utilize it.

    So as a result, when I try to use the vast number of published papers and journals in my field, and in niches of my field in which I am darn near an expert, I cannot separate the wheat from the chaff. Searches yield time-wasting, useless results, many of which require laborious deciphering before I can figure out that they are stupid or incomplete. Maybe only 10% of the time does a day-long literature search yield something of utility. Ugh.

  • Re:Hmmmmm (Score:5, Interesting)

    by Anonymous Coward on Sunday January 02, 2011 @02:06PM (#34738398)

    Very well expressed. To put this in a context which will seem bizarre to many readers of slashdot, there is a whole range of products on the market to help "scientific astrologers" search out correlations between planetary positions and life circumstances. And a legion of astrologers making use of them -- at several hundred dollars a copy -- to pore over birth charts with dozens and dozens of factors. Unless things have changed in the years since I looked into this, what's usually conveniently sidestepped is that some of those factors will indeed show up significant by chance. After all, that is the very definition of probability expressions such as "p less than .05". On replication, these findings will normally disappear, resulting in a crestfallen astrologer. (Then again, why not just expand the original dataset and check again to see if different factors come up this time :-)

    But the motivation to get something out of the data is high, as the parent post points out, and researchers may be able to deceive themselves just as well as astrologers can, especially when academic careers are on the line.

  • Re:It's simple. (Score:4, Interesting)

    by FourthAge (1377519) on Sunday January 02, 2011 @02:07PM (#34738410) Journal

    Oh, it happens. And if you're in the academic business, then I'm very surprised you've not noticed it.

    Politics is very important in the business of accepting and rejecting papers. It's micro-politics, i.e. office politics. It's very important to get things accepted, but in order to do so, you have to be aware of the relevant political issues within the committee that will accept or reject your work. It's hard to write a paper that doesn't step on any toes, so you have to be sure you pick the right toes to step on.

    When I was part of this business I was aware of a few long-standing feuds between academics; their research students and coworkers all took sides and rejected work from the other side. It was bizarre. It would have been funny if it had not been so pathetic. Even now I cannot watch an old Newman and Baddiel sketch [youtube.com] without being reminded of childish feuding professors from real life.

    I don't think every sort of science is like this. Probably in physics and chemistry, you can get unpopular work published just by being overwhelmingly right. But in softer non-falsifiable sciences, it's mostly about politics, and saying the right things. There are a whole bunch of suspect sciences that I could list, but I know that some of them would earn me an instant troll mod (ah, politics again!), so I'll leave it at that.

  • Re:Hmmmmm (Score:5, Interesting)

    by bughunter (10093) <bughunter@ear[ ]ink.net ['thl' in gap]> on Sunday January 02, 2011 @02:50PM (#34738698) Journal

    Start with a ridiculous premise to get people reading, then break out what's really happening

    Welcome to corporate journalism. And corporate science.

    If there's one useful thing that 30 years of recreational gaming has taught me, it's this: Players will find loopholes in any set of rules, and exploit them relentlessly for an advantage. Corollaries include the tendency for games to degenerate into contests between different rulebreaking strategies and the observation that if you raise the stakes to include rewards of real value (like money) then the games with loopholes attract players who are not interested in the contest, but only in winning.

    This lesson applies to all aspects of life from gaming, to sports, business, and even dating.

    And so it's no surprise that when the publishers set up a set of rules to validate scientific results, those engaged in the business of science will game those rules to publish their results. They're being paid to publish; if they don't publish, they've "lost" or "failed" because they will receive no further funding. So the stakes are real. And while the business of science still attracts a lot of true scientists (those interested in the process of inquiry), it now also attracts a lot of players who are only interested in the stakes. Not to mention the corporate and political interests who have predetermined results that they wish to promulgate.

    What was really the point of implying that truth can change?

    To game the system, of course. The aforementioned corporate and political interests will use this line of argument now, in order to discredit established scientific premises.

  • Re:Hmmmmm (Score:5, Interesting)

    by JBMcB (73720) on Sunday January 02, 2011 @03:18PM (#34738900)

    The National Center for Complementary and Alternative Medicine has received billions of dollars of public NIH funding. They study "alternative" medicine, such as chiropractic and homeopathic remedies. So far, their strongest conclusion has been that ginger has a slight positive effect on upset stomachs.

    Billions of dollars. Ginger for upset stomachs. When asked why they haven't produced many solid results, the director of NCCAM usually says that they need more funding. I'd say we need a bit more results-based funding in some areas.

  • Re:It's simple. (Score:1, Interesting)

    by drsmack1 (698392) on Sunday January 02, 2011 @03:20PM (#34738910)

    I could easily do a "there, fixed that for you" on your statement, making it from the position of the other side. In some lights it would make better sense that way.

    TRILLIONS of dollars hang in the balance - and extraordinary claims require extraordinary evidence. You want *my* trillions? Give me proof worth a trillion dollars.

    Sooo many times in the last 110 years a large part of some segment of the scientific community was convinced that a disaster was upon us. Each time it was eventually discovered that there was nothing to worry about.

    Of course many suffered before the truth became known.

    Eugenics
    Hole in the ozone
    DDT
    Global Cooling
    Acid Rain
    Alar
    Global Warming

    All these things have one thing in common - they were pushed as a political and social agenda by the liberal/scientific elite. They were seized upon by dangerous power-hungry politicians as a way of grabbing power.

    All this Global Warming stuff smells the same to me. I am a student of Science History and I can clearly see that this is the same pattern, the same story with the names and places changed.

  • Re:Hmmmmm (Score:5, Interesting)

    by JDS13 (1236704) on Sunday January 02, 2011 @03:41PM (#34739046)

    > So, a capitalistic, fully performance based (with results being the performance metric)
    > environment does not seem to work well for science. / Surprised? / Me neither.

    This is a gratuitous, cheap shot. These problems appear only in scientific research that is funded, managed, or supervised by government agencies or academic review committees so that bureaucrats will grant money, or full professorships, or licenses to sell drugs. Hence the crack that if you want to study squirrels in the park, you title your grant proposal, "Global Warming and Squirrels in the Park."

    There are "capitalistic... performance-based environments" in science - but they're the corporate R&D departments that are seeking marketable innovations. There isn't much intellectual corruption or fudging of study results in, say, pushing the limits of video card performance.

  • Data snooping (Score:4, Interesting)

    by tgibbs (83782) on Sunday January 02, 2011 @03:58PM (#34739142)

    If you take 1000 scientists and give them all the same data, they will probably look at that data in several thousand ways. If you are dealing with 95% intervals, and the data is looked at in 2000 ways, then about 100 of those ways will present something 'significant' by simple random chance.

    Not really. This would be only true if all of those 2000 ways were statistically independent from one another. It would take a much larger dataset than most scientists deal with for there to be 2000 different ways of analyzing it, and even then they would not be statistically independent.

    So the problem is not as bad as you suggest, but it is real. If I compare 20 different statistically independent measurements, one is expected to meet the p < 0.05 criterion by pure random chance. There are ways of correcting for this bias, by requiring a higher criterion of statistical significance (say p < 0.0025), but that also reduces the power of my study to detect a real difference.
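
The 20-measurement example can be worked through numerically. A minimal sketch, assuming the 20 tests are independent and that the stricter criterion is the standard Bonferroni correction (dividing the threshold by the number of tests), which matches the 0.0025 figure but is not named in the comment; the function names are illustrative.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent null tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m

m = 20          # number of independent measurements
alpha = 0.05    # conventional per-test criterion

uncorrected = familywise_error(alpha, m)
per_test = bonferroni_threshold(alpha, m)
corrected = familywise_error(per_test, m)

print(f"chance of >= 1 false positive at p < {alpha}: {uncorrected:.3f}")
print(f"Bonferroni per-test threshold: {per_test:.4f}")
print(f"chance of >= 1 false positive after correction: {corrected:.4f}")
```

Without correction, the chance of at least one spurious hit across 20 tests is roughly 64%. With the 0.0025 threshold it drops to just under 5%, at the cost in statistical power noted above.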

    Which is appropriate really depends upon the nature of the experiment and the question being asked. If I do 20 measurements and half of them are statistically significant, I may not much care if one of them is by chance.

    If I want to minimize the likelihood of reporting an incorrect result, while maximizing the power of my study, my best bet is to decide in advance on a very few measurements and statistical tests, and stick with them. That's good for me, but it doesn't really help the reader who is looking at a bunch of different studies, because each finding reported at p = 0.05 still has one chance in 20 of being wrong. Added to that is an unknown magnitude of publication bias, because studies with significant findings are more likely to be published than those that find nothing of statistical significance.

  • Re:Hmmmmm (Score:4, Interesting)

    by dkf (304284) <donal.k.fellows@manchester.ac.uk> on Sunday January 02, 2011 @04:53PM (#34739444) Homepage

    nahh, the problem is a misunderstanding of statistics (thinking that post-hoc analysis, with this fishing for statistical significance, is as valid as proper hypothesis testing). The proper way is where the hypothesis is fully pre-formed and then tested. The numbers and statistics apply ONLY TO THE HYPOTHESIS being tested, so you cannot hunt for a statistical significance just somewhere in the data and then re-formulate your hypothesis.

    The problem is that there are a lot of fields (e.g., astronomy, economics) where it is not possible to conduct proper lab experiments. That means you've got to just collect all the data that you can and try to work with it. The best way to do that is to partition the data and use part of it to search for candidate hypotheses, and the rest (possibly with additional partitions) to check those hypotheses, and yet it's never entirely certain that enough data is present in either set for correct conclusions to be drawn. It's challenging statistically (and part of why I prefer to write programs instead).
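
The partitioning idea can be sketched as follows. This is only an illustration on made-up, pure-noise data: the 50/50 split, the 200 rows, the 50 "features", and the helper names are all assumptions, and a real analysis would use a proper significance test rather than a bare mean.

```python
import random

def split_explore_confirm(rows, explore_fraction=0.5, seed=0):
    """Shuffle rows and split them into an exploration set and a confirmation set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]

def column_mean(rows, j):
    """Mean of column j over the given rows."""
    return sum(r[j] for r in rows) / len(rows)

# Hypothetical dataset: 200 observations of 50 features that are pure noise.
n_features = 50
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1) for _ in range(n_features)] for _ in range(200)]

explore, confirm = split_explore_confirm(rows)

# Hypothesis search: pick the feature that looks most "interesting"
# (mean furthest from zero) using the exploration half only.
best = max(range(n_features), key=lambda j: abs(column_mean(explore, j)))

# Hypothesis check: evaluate that one pre-chosen feature on untouched data.
explore_effect = column_mean(explore, best)
confirm_effect = column_mean(confirm, best)

print(f"feature {best}: exploration mean {explore_effect:+.3f}, "
      f"confirmation mean {confirm_effect:+.3f}")
```

Because the feature was selected for looking extreme in the exploration half, its apparent effect there is inflated by the search. The confirmation half gives an estimate that the search did not touch, though, as the comment says, neither half is guaranteed to be large enough.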

  • Re:Not that simple. (Score:5, Interesting)

    by Simetrical (1047518) <Simetrical+sd@gmail.com> on Sunday January 02, 2011 @07:11PM (#34740052) Homepage

    Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists.

    Granted that hard sciences are probably more reliable, but unfortunately, a lot of the research even there is shaky. I overheard roughly the following conversation between a graduate student in mathematics and his thesis adviser one summer, while I was doing undergraduate summer math research at the CUNY Graduate Center on an NSF grant (RTG):

    • Student: So I looked into the paper by Smith, and when I did the same computations, I got a different answer. I haven't been able to figure out what I'm doing differently. Do you think I should e-mail him?
    • Adviser: No. If the results are inconsistent, pretend they don't exist. Don't use them, but don't tell anyone you got different results either. If you do, then they'll just suspect that your results are wrong.
    • Student: Yeah, I suspect that too.
    • Adviser: But don't contact him, because people don't like being proven wrong. You can point out errors in people's papers once you've got tenure – it's not something you want to do as a grad student. You don't want to make this guy your enemy.
    • Student: Oh, okay . . .

    Even if high-profile results are more reliable in the hard sciences, your average paper is still unreproducible garbage. The problem is the system, which forces everyone to publish as much as possible without heed to quality; and the journals, which publish only positive results. Researchers need to publish all their results publicly, including registering their hypotheses before they even begin the study. Universities need to take a stand by not focusing on quantity of publications. More emphasis must be placed on repeatability.

    The people who treat this kind of finding as an attack on science are perpetuating the problem. We should be looking to make the scientific process ever better and more accurate as we come to understand its pitfalls better, not shrug off its inadequacies as inevitable.

  • by tgibbs (83782) on Sunday January 02, 2011 @07:34PM (#34740142)

    The pharmaceutical industry is easily one of the most corrupt industries known to man. Perhaps some defense contractors are worse, but if so, then just barely. It's got just the right combination of billions of dollars at play, strong dependency on the part of many of its customers, a basis on intellectual property, financial leverage over most of the rest of the medical industry, and a strong disincentive against actually ever curing anything since it cannot make a profit from healthy people.

    One tends to hear this sort of thing from people who don't know anything about the pharmaceutical industry, and of course this attitude is pushed very hard by people who are hawking quack cures of one sort or another, and who are thus competitors of the pharmaceutical industry.

    I'm an academic pharmacologist, but I've met a lot of the people involved in industrial drug discovery, and trained more than a few of them. People tend to go into pharmacology because they are interested in curing disease and alleviating suffering. Many of them were motivated to enter the area by formative experiences with family members or other loved ones suffering from disease. They don't lose this motivation because they happen to become employed by a pharmaceutical company--indeed, many enter industry because it is there that they have the greatest opportunity to be directly involved in developing treatments that will actually cure people.

    It is certainly true that pharmaceutical companies are businesses, and their decisions regarding how much to spend on treatments for different illnesses are strongly influenced by the potential profits. A potential treatment for a widespread chronic disease can certainly justify a larger investment than a one-time cure. But it can also be very profitable to be the only company with a cure for a serious disease. And it would be very bad to spend a lot of money developing a symptomatic treatment only to have somebody else find a cure. So a company passes up an opportunity for a cure at its peril. There is definitely a great deal of research going on in industry on potential cures.

    The real reason why cures are rare is that curing disease is hard. Biology is complicated, and even where the cause is well understood, a cure can be hard to implement. For example, we understand in principle how many genetic diseases can be cured, but nobody in industry or academia knows how to reliably and safely edit the genes of a living person in practice. It is worth noting that the classic "folk" treatments for disease, including virtually all of the classic herbal treatments that have been found to actually be effective--aspirin, digitalis, ma huang, etc--are not cures; they are symptomatic treatments. Antibiotics were a major breakthrough in the curing of bacterial diseases, but they were not created from scratch, but by co-opting biological antibacterial weapons that were the product of millions of years of evolution. Unfortunately, for many diseases we are not lucky enough to find that evolution has already done the hardest part of the research for us.

