Science

Research Data: Share Early, Share Often 138

Posted by timothy
from the coin-toss-1-coin-toss-2-coin-toss-3 dept.
Shipud writes "Holland was recently in the news when a psychology professor at Tilburg University was found to have committed large-scale fraud over several years. Now, another Dutch psychologist is suggesting a way to avert this sort of problem, namely by 'sharing early and sharing often,' since fraud may start with small indiscretions due to career-related pressure to publish. In Wicherts' study, he requested raw data from the authors of some 49 papers. He found that the authors' reluctance to share data was associated with 'more errors in the reporting of statistical results and with relatively weaker evidence (against the null hypothesis). The documented errors are arguably the tip of the iceberg of potential errors and biases in statistical analyses and the reporting of statistical results. It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors and that the unwillingness to share data was most pronounced when the errors concerned statistical significance.'"
This discussion has been archived. No new comments can be posted.

  • by Urthas (2349644) on Tuesday December 06, 2011 @01:36PM (#38282096)
    ...most people who do it are downright bad at it. That they might take more time and care to be good at it without the perpetual axe of publish-publish-publish and grant funding hanging over their heads is another issue altogether.
  • A better way (Score:4, Insightful)

    by Hentes (2461350) on Tuesday December 06, 2011 @01:47PM (#38282242)

    Don't believe anything that hasn't been verified by an independent group of researchers.

  • by svendsen (1029716) on Tuesday December 06, 2011 @01:50PM (#38282280)
    One reason scientists don't share is that if the data gets out early and gets around (damn slutty data), other scientists might steal/copy/scoop/whatever the data. Unless there is a great way to prevent this, the suggestion proposed here will never go anywhere.
  • by Scareduck (177470) on Tuesday December 06, 2011 @01:52PM (#38282310) Homepage Journal

    The IPCC doesn't know about this. Or does this only apply to the "soft sciences"?

  • by Anonymous Coward on Tuesday December 06, 2011 @02:00PM (#38282418)

    Other scientists typically inquire for data after publication of the findings. (How else would someone know what to ask for?) This suggestion only stresses that error-checking be encouraged after the current process of publication.
    Note that this error checking (after attempts to reproduce findings failed) is what led to Gordon Gallup identifying Marc Hauser's recently-acknowledged academic fraud.
    https://en.wikipedia.org/wiki/Marc_Hauser#Previous_controversy_over_an_experiment [wikipedia.org]

  • by br00tus (528477) on Tuesday December 06, 2011 @02:06PM (#38282482)

    Einstein was unable to find a teaching post, and was working in a patent office when he published his annus mirabilis papers. Things have changed over the years though. John Dewey discovered a century ago how children best learned - let the child direct his own learning, and have an adult to facilitate this. This, of course, is not how children are taught. Things nowadays are very test-heavy, and becoming even more so, not as a means to help students in seeing what their deficiencies are, but as a punishment system - and the teachers, and the administrators are under the same punishment system. The carrot of reward is very vague and ill-defined and far-off. It is a system designed to try to squelch the curiosity of those handful of students who had been curious and wanted to learn. Businesses want to get into the education gravy train, and all this charter school stuff is being embraced by both parties, which isn't surprising if you look at the funding behind it.

    At the university, the financial incentives are all aligned so that publishing is a necessity. If one does not publish, one does not get tenure, and then all those years of work were for naught as the academic career is over. And what gets published? An average series of experiments done by the scientific method would usually lead to either inconclusive data and results, or just wind up in a dead end. And what journal wants to publish those results after months of work? One of the most popular PhD Comics is this one [phdcomics.com]. It seems fairly obvious to me - the more financial incentives are tied to getting published, the more bogus studies are going to be published. As far as the idea of honesty, integrity or whatever, these things will gradually subside for most people when they come into conflict with keeping a roof over one's head and food on the table.

  • Re:You Mean... (Score:5, Insightful)

    by jc42 (318812) on Tuesday December 06, 2011 @02:09PM (#38282512) Homepage Journal

    "Trust me I'm a scientist" isn't good enough anymore?

    Actually, in a very real sense, it never was. The story here is that those that were unwilling to let outsiders (i.e., independent researchers) study their data had a significant error rate. But this has generally been understood by scientists; it's why normal scientific procedure encourages getting second opinions from others outside the group.

    If you don't want us seeing your data, that will normally be taken as a sign that you know or suspect that there are problems with your data. Attempts to block independent researchers from replicating the experiments or data collection (which is one of the main uses of patent law) are generally taken as an open admission that there's something wrong with your data.

    "Trust me I'm a scientist" may sometimes work with the general public, but it really hasn't ever worked with scientists. A real scientist reacts to interesting scientific news with "Further research is needed", and applies for funding to carry out the research.

  • Re:Psychology (Score:3, Insightful)

    by rgbatduke (1231380) <rgb@@@phy...duke...edu> on Tuesday December 06, 2011 @02:29PM (#38282758) Homepage
    And continues. Phil Jones, for example, has stonewalled requests for the raw data used to e.g. create HadCRUT3 etc, although recently it seems that one reason he hasn't shared it is that he lost it and literally can't share it. So we have a rather important temperature series, openly available on the web and used by many, many climate researchers and nobody can reconstruct it, including the original author. The problem continues -- it is like pulling teeth, getting members of the hockey team to share data and/or methods so anyone can check them.

    Since the few times somebody has bulled through until they've succeeded, e.g. Steve Mcintyre vs Michael Mann, what has been discovered is that the published result (the infamous MBH "hockey stick") is nothing but amplified, distorted white noise that has absolutely no correlation with the data used to produce it, let alone skill at reconstructing actual past temperatures, it doesn't bode well for the discipline.

    I've recently written a guest article on WUWT calling for data/methods transparency in climate research. By transparent, I mean that you should not be allowed to publish a paper that could potentially influence lawmakers and public policy to the tune of hundreds of billions of dollars unless you simultaneously publish all contributory raw data (including any data you for any reason left out) and the actual computer code used to process it into figures and conclusions. Something this important needs full open source open data transparency even more than medical research (another discipline where reproducibility of results is abysmal, where there are vested interests galore, and where we spend/waste a phenomenal amount of both money and human morbidity and mortality on crap results).

    rgb
  • by sustik (90111) on Tuesday December 06, 2011 @02:32PM (#38282808)

    And I believe that they should not have the "right" to publish without others trying as well. Yes, having a topic and milking it for the rest of your life sounds like a wet dream, but it is not in the interest of society, so why would that approach be encouraged/protected?

    So publish your paper and disclose the data. Others after you will reference your work; in fact, even those who *just* use the data and otherwise do not have much in common with your ideas will still have to cite your paper. Sounds great to me. Also remove the quantity thinking in publishing. One paper in 5 years that will be referenced for 50 years to come is way better than 10 papers in 5 years that are reshufflings of the same material and instantly forgotten.

    I would replace the publish or perish with: be cited or perish.

    In fact, too many publications can be taken as warning signs that:
    1. There is little new material, but a lot of reuse of text.
    2. The paper is not carefully written and so it is not understood by the field and so the same gets republished over and over.
    3. Corners were cut regarding the experiments or methods, or reviewing related work etc. to save time.

    Of course there are exceptions, and just because someone publishes a lot, they are not necessarily guilty of the above.

  • Re:Lie or Die (Score:4, Insightful)

    by Anonymous Coward on Tuesday December 06, 2011 @02:45PM (#38282938)

    Sorry, but you're an intellectual bigot who resorts to citing well-known celebrities rather than actually researching what the content of a field actually is and making a principled argument. Unfortunately, your bigotry is only ameliorated by its ubiquity in communities such as Slashdot.

    A number of points need to be made:

    First, most people have a stereotyped idea of what psychology is, because they don't actually know what it is. It's the scientific study of human behavior and experience. If you think it's couches and Freud, you're uninformed. My guess is that Feynman took psychology courses and had his primary exposure to the field during the mid-20th century, when psychoanalysis was dominant in *one branch of psychology*; it isn't even dominant in that area anymore. Psychologists study molecular neurobiology, multivariate statistics, neurophysiology, immunology, and any number of other topics. Be prepared to argue that those fields aren't science (or math) if you're prepared to argue that psychology isn't a science.

    Second, it's worth noting that this fraud case (and the way the story is framed) focuses on psychology, but similar problems happen in other fields. E.g.:

    http://en.wikipedia.org/wiki/Controversy_over_the_discovery_of_Haumea
    http://abcnews.go.com/Health/Wellness/chronic-fatigue-researcher-jailed-controversy/story?id=15076224

    Finally, what would you propose to do instead? Study human behavior and experience nonscientifically? That's what you seem to be suggesting.

  • Re:Psychology (Score:5, Insightful)

    by sstamps (39313) on Tuesday December 06, 2011 @03:04PM (#38283176) Homepage

    And continues. Phil Jones, for example, has stonewalled requests for the raw data used to e.g. create HadCRUT3 etc, although recently it seems that one reason he hasn't shared it is that he lost it and literally can't share it.

    That is complete and utter bullshit.

    First, he has never stonewalled requests for the raw data. It's been out there for ANYONE to obtain. The problem is that, for some of it, you have to PAY to get it, and UEA was forbidden by contract to give away said data for free because then people wouldn't PAY for it anymore. So, if you want to piss and moan about access to the raw data, then apply your angst and woe to the most responsible parties, the Met offices which want to profit from their weather data-gathering businesses.

    Second, the "lost data" canard is a crock. Since the raw data is not owned or generated by UEA, but instead obtained from outside sources, they have NO mandate to keep the original raw data once they have processed it. They (and you and anyone else) can go and get it from the same sources at any time. Whip out your checkbook and get to it.

    So we have a rather important temperature series, openly available on the web and used by many, many climate researchers and nobody can reconstruct it, including the original author. The problem continues -- it is like pulling teeth, getting members of the hockey team to share data and/or methods so anyone can check them.

    You (and they) most certainly can get the original raw data and reconstruct it. There are literally mountains of data that have been released to the public on a large part of climate science. You just need to learn who and how to ask properly and, in some cases, how much it costs.

    Here's [realclimate.org] a huge FREE repository of all kinds of climate-related data, from the climate scientists themselves.

    Since the few times somebody has bulled through until they've succeeded, e.g. Steve Mcintyre vs Michael Mann, what has been discovered is that the published result (the infamous MBH "hockey stick") is nothing but amplified, distorted white noise that has absolutely no correlation with the data used to produce it, let alone skill at reconstructing actual past temperatures, it doesn't bode well for the discipline.

    Mann's work has been vindicated and replicated time and time again, McIntyre's (and others') quixotic attempts to discredit it notwithstanding.

    I've recently written a guest article on WUWT..

    That explains the ignorance of your previous comments a bit.

    ..calling for data/methods transparency in climate research. By transparent, I mean that you should not be allowed to publish a paper that could potentially influence lawmakers and public policy to the tune of hundreds of billions of dollars unless you simultaneously publish all contributory raw data (including any data you for any reason left out) and the actual computer code used to process it into figures and conclusions. Something this important needs full open source open data transparency even more than medical research (another discipline where reproducibility of results is abysmal, where there are vested interests galore, and where we spend/waste a phenomenal amount of both money and human morbidity and mortality on crap results).

    In large part, this is precisely what happens, with a few exceptions. Those exceptions usually revolve around whether any kind of contract with private entities to obtain said data, or to develop software/hardware, is in effect that would preclude giving them away. That said, the research should (and usually does) document the specifications for said hardware/software, and include where the original data came from so that anyone can pay to obtain it themselves.

    As a software developer who actually writes software for scienti

  • Re:You Mean... (Score:2, Insightful)

    by jimmerz28 (1928616) on Tuesday December 06, 2011 @03:28PM (#38283488)

    Your "fact" is utterly incorrect. Psychology isn't a science like math, physics, chemistry, biology and computer science are sciences.

    A science has laws with verifiable, reproducible outcomes that can be proven (psychology has theories of behavior, not laws). Look at Jung vs. Freud for a great example of why there are no laws of psychology: neither of them is wrong, but neither of them is right, and that doesn't make a science.

    Descartes used research and studied according to a scientific method to prove there was such a thing as a "mind", that didn't make him correct or the "science" of the mind an actual science.

  • Re:You Mean... (Score:5, Insightful)

    by jc42 (318812) on Tuesday December 06, 2011 @05:11PM (#38284844) Homepage Journal

    A lot of these errors have been found in neuroscience journals, too, which fancies itself a harder science...

    Actually, this is mostly a special case of a problem that's recognized in most scientific fields: Much scientific work (experimental or observational) has a statistical component, and scientists generally don't have as good an understanding of statistics as their work requires.

    Statistics shares a common problem with other basic subjects such as quantum theory, relativity, and chaos theory: They don't fit well with human "intuitive" concepts of how the world works. With quantum theory and relativity, this is fairly blatant, and people usually don't try to pretend to understand them until they've done some serious study. But with statistics (and chaos ;-), people tend to think they have at least a basic understanding of probability, and they also tend to think that that's all they need. They end up publishing data on the basis of output from packaged software that they don't understand well.

    A while back, there was a discussion in a linguistic forum that I follow, about the Pirahã language which lacks words for numbers. As a way of explaining how people could survive without numbers, one contributor came up with an informative parallel: In the modern Western world, there are many important things (economics and climate are hot-topic examples) that can't be understood without an understanding of the important concepts of statistics. But one can easily argue that the dominant "modern" languages lack words for statistical concepts.

    Nearly everyone will object that, for instance, English has well-known terms like "chance", "probability", "mean", "standard deviation", "correlation", etc. But, the author pointed out, these are "cargo-cult" terms, borrowed from an alien (i.e., scientific) language, with little or no actual understanding of their meanings by most of the native speakers of English. This is clear if you look for statistical terms in the English media, and figure out how they're being used. They are just magical terms used to sound convincing, but it's usually clear that the speaker/writer doesn't actually understand their technical meaning. Similarly, "quantum" is a common English word, but its common meaning is very nearly an antonym of the technical meaning in physics. Most English speakers have little or no understanding of the technical meanings of these terms.

    In the case of statistical terms, scientists do tend to have taken a course or two in college. But understanding is low, barely above the common understanding used in the media and politics. So it's not surprising that a good number of papers in many scientific fields claim results that don't strictly follow from the data. If there is any sampling done to get the data (and there usually is), it's likely that the conclusions came partly from an interpretation of some software's output that is based on a misunderstanding of the statistical terminology.

    Of course, when you get to the pseudo-sciences and the political arena, this process isn't accidental. Statistical buzz-words are often used as part of the psychological weaponry, to convince readers/listeners of whatever the writer/speaker is trying to convince them of. This is often done with malice aforethought, knowing that the public has almost no understanding of statistics.

