Stats Science

Many Surveys, About One In Five, May Contain Fraudulent Data (sciencemag.org) 115

sciencehabit writes: How often do people conducting surveys simply fabricate some or all of the data? Several high-profile cases of fraud over the past few years have shone a spotlight on that question, but the full scope of the problem has remained unknown. [Tuesday], at a meeting in Washington, D.C., a pair of well-known researchers presented a statistical test for detecting fabricated data in survey answers. When they applied it to more than 1000 public data sets from international surveys, a worrying picture emerged: About one in five of the surveys failed, indicating a high likelihood of fabricated data.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    Well, we know one does for sure.

  • by Anonymous Coward on Thursday February 25, 2016 @12:26PM (#51583321)

    When I take most surveys I answer calculated to confound the test as much as possible, on the assumption that anyone could be doing so; and thus, to accelerate the process is to bring it to the attention of the researchers faster. The problem is that it's an inherently flawed method.

    • The problem is that it's an inherently flawed method.

      The problem is that people believe what they see and hear

    • Re: (Score:1, Interesting)

      by Anonymous Coward

      Respondents lie and researchers lie. I've been unable to find it now, but a month ago I came across an essay from an established sociologist who was bemoaning the current number of researchers in his field who were conducting shit studies in the furtherance of promoting their political and cultural ideology. He examined a few prominent studies that were debunked when it came to light that the researchers had either manipulated, or dropped data points that conflicted with their study's conclusions, or outrig

    • When I take most surveys I answer calculated to confound the test as much as possible

      If a few people did this randomly, then it wouldn't skew the results much. But it is not random. Liberals are more willing to participate in surveys, and more willing to answer honestly. Conservatives tend to be more cynical and calculating. Other factors skewing the results are that Democrats are more likely to be home, more likely to answer the phone, and more likely to participate in social media polls. Republicans are more likely to let their phone calls roll over to voice mail, but also more likely

      • That is quite interesting. Can you link to a paper, etc., that calculates the adjustments for liberal vs. conservative survey-taking patterns? I would be interested to see the magnitude of this effect.
        • Can you link to a paper, etc., that calculates the adjustments for liberal vs. conservative survey-taking patterns?

          Sorry, I have seen these issues mentioned several places, including Nate Silver's blog, but I have never seen them actually quantified.

          I think the most famous skewed poll was in 1948. Phone surveys predicted a decisive win by Dewey, but instead Truman was re-elected. The reason was that back in 1948, households with lower incomes (and more likely to vote Democratic) often didn't have a phone.

          • No worries, just curious. I've read Nate's blog since we learned in the 2012 election that he's the first to successfully apply Bayes' rule to political science. That part of his model is probably proprietary.
    • Google does the whole survey thing with its Google Opinion Rewards app, with Google Play Store credit as an incentive. They address this problem by asking you a bogus question like "Have you been to any of the following locations recently?" and then listing locations that do not exist. If you answer in the affirmative, they know you're lying and cut you off from surveys in the future.

    • Re: (Score:2, Insightful)

      Your response will most likely be washed out by a sea of honest responses.

      Most participants respond to the best of their ability---although it cannot be assumed they are always correct. Respondent error, even about themselves, is more common than outright deception.

      Researchers are aware that participants lie due to self-deception, social desirability, deliberate sabotage, and other reasons. Surveys often incorporate measures to detect deception.

      If you answer in a nonsensical fashion, the worst you'll do is

    • The problem is that it's an inherently flawed method.

      Yet Nate Silver accurately predicted the last two presidential election outcomes. Put another way, just because you're a skeptic doesn't mean you're not the one making shit up and chasing ghosts that aren't there.

      • by jc79 ( 1683494 )

        As a gross estimate, the odds are 1/4 that he would have predicted both elections successfully. I'd give it a bit longer before deciding that he's got all the answers.
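        The 1/4 figure is not derived in the comment. One reading, offered here only as an assumption, is that each election is treated as an independent coin flip between two outcomes:

```latex
\[
P(\text{both calls correct}) = \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{4}
\]
```

        Under that assumption, a forecaster with no skill at all would still get both right about one time in four, which is why two correct calls are weak evidence on their own.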

    • by jrumney ( 197329 )
      I think the problem is that surveys always return an unusually high number of responses by 104 year old female construction workers from Kazakhstan holding PhDs, so researchers have come to think that it is normal to get so many responses from that group.
    • I see so many flaws in surveys, including multiple choices that don't include my answer, and page after page of ranking grids. Worse yet are the ones that show four product models -- say, server computers -- with random combinations of features and prices, asking me to rank. If they want to know what matters to me, just ask, don't try to trick it out of me. Other surveys lie about duration. All ways that people stop trying to answer and instead just pick randomly or on a straight line to get through the
  • Self referential? (Score:5, Insightful)

    by goombah99 ( 560566 ) on Thursday February 25, 2016 @12:28PM (#51583347)

    23.7 % of statistical analyses make up their statistics.

  • by Diss Champ ( 934796 ) on Thursday February 25, 2016 @12:39PM (#51583493)

    It's not that only 1 in 5 surveys may contain fraudulent data, it is that the fraud is only incompetent enough to be caught by this method in 1 in 5 surveys.

    • Exactly, and there is a negative correlation between the amount of incompetence and the amount of funding.
    • They could have both false positives and false negatives.

      They eliminated studies which had a prima facie case for having highly similar responses. However, it is possible for them to miss a study which generates fairly consistent response patterns for non-obvious reasons.

      I can buy the 17% number. A sizable majority are legitimate scientific endeavors, but there are enough bad actors that you need to actively seek them out.

      I seem to recall about 9% of Americans have a felony conviction. If you assume that's

  • by gstoddart ( 321705 ) on Thursday February 25, 2016 @12:41PM (#51583509) Homepage

    Like it or not, a lot of public opinion polls are paid for by people who want to support a specific point.

    Public opinion polls these days are as much PR and marketing as anything else.

    Honestly, Pew makes money doing this stuff; honest player or not, they have a vested interest in keeping up the belief that their stuff is honest, unbiased, and accurate.

    But I'm entirely willing to believe opinion polls are carefully crafted, or sneakily tweaked, to arrive at the conclusions they've been commissioned to arrive at.

    • by Anonymous Coward

      Right, like the Gartner Magic Quadrant shit that most C-suite people drool over. I've talked to several software vendors who have told me that Gartner approaches them on the side and allows them to "buy up" their rankings for the right price.

      • by gstoddart ( 321705 ) on Thursday February 25, 2016 @01:07PM (#51583841) Homepage

        Right, like the Gartner Magic Quadrant shit that most C-suite people drool over.

        Or bond ratings agencies.

        I suspect most people, except the people who cite those things, have long since assumed they're full of shit and the conclusions are paid for.

        Why would you assume it's honest and objective information? Someone has to make money off it.

        And if you don't like that, start your own foundation or think tank, and have them publish stuff to your liking.

        Sorry, it's all PR and marketing. It sure as hell ain't facts or accurate predictions.

    • Public opinion polls these days are as much PR and marketing as anything else.

      Honestly, Pew makes money doing this stuff; honest player or not, they have a vested interest in keeping up the belief that their stuff is honest, unbiased, and accurate.

      But I'm entirely willing to believe opinion polls are carefully crafted, or sneakily tweaked, to arrive at the conclusions they've been commissioned to arrive at.

      This is NOT AT ALL about public opinion polls which "people who want to support a specific point" would pay to skew in a certain direction NOR is it about polls that are designed (or "sneakily tweaked") to create certain results.
      It's not even about confirmation bias by the pollsters or researchers leaching into the data.
      This is about finding cases where a pollster would just sit down and fill out survey after survey by themselves instead of going door to door.
      I.e. Charging for a field survey, forging the

  • ...a pair of well-known researchers presented a statistical test for detecting fabricated data in survey answers...

    That sounds a little suspect...

  • by gurps_npc ( 621217 ) on Thursday February 25, 2016 @12:44PM (#51583547) Homepage
    There are a lot of ways you can screw with a study. For example, you get dramatically different answers by rephrasing the question.

    If you ask: "Do you believe that mothers should be able to legally murder their babies within 2 months of the creation of life?" you get a very different answer than if you ask "Do you believe that women should have the legal right to abortion when the fetus can be demonstrated to show no brain activity more significant than that of a snail."

    This might be intentional, or simple unconscious bias.

    • by Anonymous Coward

      Never attribute to incompetence that which can easily be explained by malice.

      Consider how loaded and/or leading these questions often are. How unwanted results are known to be thrown out. How population samples are carefully selected to maximize the odds of a certain result.

      To assume that fraudulent data would be some mere accidental whoopsie -though perhaps claimed as such when those defrauding us are caught- would be not only blind, but total sensory deprivation.

  • So one in five studies might contain false data. Essentially what they are saying is that there's a 20% chance that they are lying about there being a 20% chance of them lying.

    • No. TFA states exactly what they are saying: "About one in five of the surveys failed, indicating a high likelihood of fabricated data."
  • by sbaker ( 47485 ) on Thursday February 25, 2016 @12:55PM (#51583693) Homepage

    What they ACTUALLY said was that in surveys conducted in the Western World - only 5% failed their test - but in developing countries - the number was 26% of faked surveys.

    Then, they also say that the KIND of survey matters. Their approach is to say that if 85% of answers are identical between two or more respondents then the result is likely to be faked...but they recognize that (for example) in a health survey, all of the healthy people will answer identically to questions about how healthy they are. So that kind of survey is excluded.
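    The 85% rule described above can be sketched in a few lines. This is only an illustration of the idea as summarized in this comment, not the researchers' actual code: the function names, the toy `survey` data, and the choice to compare every pair of respondents are all assumptions.

```python
from typing import List, Sequence


def max_percent_match(responses: Sequence[Sequence[int]]) -> List[float]:
    """For each respondent, return the largest share of identical answers
    they have in common with any other respondent."""
    n = len(responses)
    best = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Count the questions on which respondents i and j agree.
            same = sum(a == b for a, b in zip(responses[i], responses[j]))
            share = same / len(responses[i])
            best[i] = max(best[i], share)
            best[j] = max(best[j], share)
    return best


def flagged_fraction(responses: Sequence[Sequence[int]],
                     threshold: float = 0.85) -> float:
    """Share of respondents whose best match meets the threshold
    (0.85 is the cutoff mentioned in the thread)."""
    scores = max_percent_match(responses)
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical toy data: 4 respondents, 10 questions, answers coded 1-4.
survey = [
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 3],  # differs from the first row on one question
    [4, 3, 2, 1, 4, 3, 2, 1, 4, 3],
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
]

print(max_percent_match(survey))
print(flagged_fraction(survey))
```

    In the toy data the first two respondents agree on 9 of 10 questions, so both are flagged. How large a flagged share makes a whole survey "fail" is left out on purpose, since this comment does not give that cutoff.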

    So if the research is to be taken at face value, then in the Western world, one in twenty of *some* classes of survey are probably faked. But they looked at 1000 surveys to arrive at that number - we don't know what fraction of those came from the developing world. If all you're interested in is Western World surveys - then maybe the sample size is very small. Given that there are some classes of survey that are known to be excluded - is it possible that they included a few of "the wrong kind" in their sample?

    All surveys have an error bar of a few percent - this is a survey about surveys.

    I think the conclusion here is that you should ignore surveys carried out by dubious agencies in the developing world. I don't think you should conclude that surveys done by reputable agencies in the western world are unreliable.

    • This is not a survey about surveys. This is a statistical test of the number of identical responses in a survey, and the likelihood that such uniformity could have resulted by chance. That probably indicates not just fraud, but incompetent fraud (or, to be fair, perhaps people in developing countries are just much more likely to answer independent questions in exactly the same way, and the researchers drastically underestimated the independence of these questions). Seems like grading math or physics homewor
  • by Anonymous Coward

    A few years ago a survey said 2/3 of the Danes support nuclear power. Two days later another survey said 1/3 support it. A journalist then figured at least one of them was wrong and started digging. It turned out that they didn't answer the same question. 2/3 said yes when asked if nuclear power can be used to reduce worldwide CO2 emissions. 1/3 said yes to Denmark gaining nuclear power for environmental reasons. Despite not answering the same question, the press releases and following headlines made it lo

    • It had absolutely no hint of the city being fictional.

      If they're willing to bomb a city and kill people without even knowing the specific reason for that bombing, their ignorance is truly dangerous to the world.

      If they cannot immediately recall the "where" and "why" to justify homicide, they deserve to be embarrassed.

      The only conceivable justification for their position is the lack of an option for indicating "unsure" or "no opinion". And I'd be shocked if a modern survey didn't include that.

  • ...if the data from THIS survey was deliberately falsified to see if anybody actually checked the sources?

    • THIS is not a survey.
      • You're right, I was mistaken in calling it a "survey". But my point was, did anybody verify their data? :D

        • Yes, it is a good point, and that's why we should be glad that the researchers eschewed Pew's threats and published their paper anyway. Hopefully other researchers will follow up using these, or perhaps more sophisticated, methods to identify the extent of the problem.
        • From TFA:

          During her turn on the stage, Kennedy mounted an attack on the test's methodology. For example, she points out, it does not account for the number of questions on a survey, the number of respondents, nor other factors that can skew the results. She also takes exception to the 85% similarity threshold. "I would choose a different threshold depending on the population and the survey," she says. By putting a number on the extent of data fabrication across all surveys, "they took it too far," Kennedy says. Pew's rebuttal is now online.

          Some at the meeting saw merit in both sides of the fight. Rather than overestimating data fabrication, the method of Kuriakose and Robbins "very likely underestimates the true extent of the problem," says Michael Spagat, an economist at Royal Holloway, University of London, who has investigated high-profile cases of possible data fabrication in war zones. Yet Kennedy's response impressed him, too. "I think the Pew paper is interesting and made some good points," he says. "Specifically, there isn't a hard and fast cutoff beyond which you know there is fabrication." Overall, however, Spagat remains very concerned about data fabrication in surveys. "Robbins and Kuriakose have uncovered a massive problem and the Pew paper doesn't change that."

          Nothing was settled by the end, says the meeting's co-organizer Steven Koczela, president of the MassINC Polling Group in Boston and a previous survey research leader for the U.S. State Department. The case laid out by Kuriakose and Robbins "seems unassailable to me," he says, "but [Pew] are giving it their level best."
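          Kennedy's point that a fixed 85% cutoff ignores the number of questions can be made concrete with a binomial tail. The sketch below is a rough illustration under a strong assumption that is not from TFA: two honest respondents agree on each question independently with probability `p_same` (0.5 here, as if both were answering yes/no questions by coin flip). The function name and parameters are made up for this example.

```python
from math import comb


def chance_of_near_duplicate(n_questions: int,
                             p_same: float = 0.5,
                             threshold_pct: int = 85) -> float:
    """Probability that two independent respondents agree on at least
    threshold_pct percent of n_questions, assuming each question matches
    independently with probability p_same (an illustrative assumption)."""
    # Smallest whole number of matching answers that reaches the threshold
    # (integer ceiling of threshold_pct * n_questions / 100).
    k_min = (threshold_pct * n_questions + 99) // 100
    return sum(
        comb(n_questions, k) * p_same**k * (1 - p_same) ** (n_questions - k)
        for k in range(k_min, n_questions + 1)
    )


for n in (5, 20, 100):
    print(n, chance_of_near_duplicate(n))
```

          On a 5-question survey the cutoff can only be met by agreeing on all five answers, which happens by chance about 3% of the time per pair under this assumption. On a 100-question survey the same cutoff is almost never met by chance. That gap is one way to read the objection that a single threshold should not be applied to every survey.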

  • Would be easier to write fixed questions wouldn't it? Also harder for the less observant to spot. :D Surveys in my mind go into the category of statistics, which can lie in a number of ways. Surprised at this news.
  • by JustAnotherOldGuy ( 4145623 ) on Thursday February 25, 2016 @01:02PM (#51583789)

    Many Surveys, About Eight In Three, May Contain Fraudulent Data

  • by drew_kime ( 303965 ) on Thursday February 25, 2016 @01:06PM (#51583837) Journal

    "Do you prefer chocolate or vanilla?" is different from "Do you support Falun Gong?" Opinion surveys always have to account for the confounding factor that each respondent may be more likely to provide the socially acceptable answer than their true opinion. The stronger the social stigma associated with the question, the more likely this will be a problem.

    This new test is a useful addition to the data analysis process, but doesn't "prove" anything. The challenge is how to refine the technique. If you want to eliminate "false positives" you would need some way to identify "true positives". And if we had a way to do that, we wouldn't need to do surveys.

    Bottom line: Surveys don't prove anything. At best they point to interesting ideas for future study.

    • If they publish their method with sufficient details for others to duplicate it, the cheaters will be able to use it as well.

      If they vet their fabricated data to ensure it passes muster, we will have a real problem.

      I fear this could end up like the arms race between malware coders and antivirus vendors. Because I doubt the good guys would have much of a chance here either.

  • by Anonymous Coward

    Over half of all analysis of surveys is flawed.

  • I once had a girlfriend who was doing her PhD. She habitually 'nudged' her data co-ordinates closer to the line of best fit. Other data she merely fabricated.

  • The other 4 of 5 is just biased by wording the question in a careful way or with an odd scale.
  • A survey showed that 1 in 5 surveys have fraudulent data. The other 4 surveys had different results.
  • by xxxJonBoyxxx ( 565205 ) on Thursday February 25, 2016 @01:30PM (#51584105)

    Roughly half the marketing departments at companies I've worked for have used half-baked surveys to gather statistics so the company name and the statistic get repeated in the industry over and over again.

    This often happens like this: "At (industry conference) this year, let's pass out a survey asking whether or not someone has ever heard of a coworker getting hacked by (whatever threat our product purports to mitigate)." Survey goes out to already half-paranoid people walking by, and the entire marketing and sales department fills one out that says 'yes I have'. A week later a press release goes out that says "(company) surveyed (# of people) IT managers and other attendees at (conference) and found that (high percentage) had direct knowledge of a coworker getting hacked by (threat)." Very often this stuff gets picked up by the press, bloggers and even other competitors, and the essentially made-up stat gets repeated and repeated until some people even think it's true.

    - http://www.tripwire.com/compan... [tripwire.com]
    - http://www.prnewswire.com/news... [prnewswire.com]
    - https://www.voltage.com/breach... [voltage.com]

  • When surveys ask for personal details that I don't want to supply, I often put crap data in if they don't offer a way to bypass it, such as a "Do not wish to state" option among the choices. For example, age, occupation, and income level.

  • How does this affect the outcome of every episode of FAMILY FEUD??? Maybe the Jones family won after all?

  • In a 100-question survey of 100 people, for example, fewer than five people would be expected to have identical answers on 85 of the questions.

    Who in the world takes a 100 question survey? Every survey I've been asked to take has been less than 10 questions.
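    For a feel of the 100-question, 100-person example quoted above, here is a small simulation. It uses a deliberately crude null model that is an assumption of this sketch, not the researchers' model: every answer is drawn independently and uniformly from `n_options` choices. The function name and defaults are hypothetical.

```python
import random


def simulate_flagged(n_people: int = 100, n_questions: int = 100,
                     n_options: int = 4, threshold: float = 0.85,
                     seed: int = 0) -> int:
    """Count respondents who share at least `threshold` of their answers
    with some other respondent when all answers are independent and
    uniformly random (an illustrative null model)."""
    rng = random.Random(seed)
    answers = [[rng.randrange(n_options) for _ in range(n_questions)]
               for _ in range(n_people)]
    flagged = set()
    for i in range(n_people):
        for j in range(i + 1, n_people):
            # Number of questions on which this pair gave the same answer.
            same = sum(a == b for a, b in zip(answers[i], answers[j]))
            if same / n_questions >= threshold:
                flagged.update((i, j))
    return len(flagged)


count = simulate_flagged()
print(count, "of 100 respondents flagged under purely random answers")
```

    With fully random answers essentially nobody reaches 85% agreement, so this is a floor rather than a realistic baseline. Real respondents give correlated answers, which pushes the expected number of near-duplicates up, and the quoted "fewer than five" presumably allows for that.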

  • But I can say that on every porn site that asks, I can truthfully say that I was born on January the 1st, 1927.

  • I used to own a polling company with my wife, Tracy Costin. The people that sponsor the surveys are always asking for more than is realistically possible. They will be making changes to survey questions right up to the last minute, then they want a certain number of responses. You can only call a certain number of people in a certain geographic area. Once you use up all your numbers and you can't get anyone else to answer questions, what do you do? Plus they are always in a hurry: you have to complete the survey in
  • ...why only 4 out of 5 dentists approve of Crest White Strips
