Science

Research Data: Share Early, Share Often

Shipud writes "Holland was recently in the news when a psychology professor at Tilburg University was found to have committed large-scale fraud over several years. Now, another Dutch psychologist is suggesting a way to avert this sort of problem, namely by 'sharing early and sharing often,' since fraud may start with small indiscretions due to career-related pressure to publish. In Wilchert's study, he requested raw data from the authors of some 49 papers. He found that the authors' reluctance to share data was associated with 'more errors in the reporting of statistical results and with relatively weaker evidence (against the null hypothesis). The documented errors are arguably the tip of the iceberg of potential errors and biases in statistical analyses and the reporting of statistical results. It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors and that the unwillingness to share data was most pronounced when the errors concerned statistical significance.'"
This discussion has been archived. No new comments can be posted.

  • "It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors"

    How many errors are present in this statement? Just saying.

  • You Mean... (Score:3, Funny)

    by Anonymous Coward on Tuesday December 06, 2011 @01:31PM (#38282004)
    "Trust me I'm a scientist" isn't good enough anymore?
    • by Anonymous Coward on Tuesday December 06, 2011 @01:49PM (#38282256)

      Did you know that you can just BUY labcoats?

    • Re:You Mean... (Score:5, Insightful)

      by jc42 ( 318812 ) on Tuesday December 06, 2011 @02:09PM (#38282512) Homepage Journal

      "Trust me I'm a scientist" isn't good enough anymore?

      Actually, in a very real sense, it never was. The story here is that those who were unwilling to let outsiders (i.e., independent researchers) study their data had a significant error rate. But this has generally been understood by scientists; it's why normal scientific procedure encourages getting second opinions from others outside the group.

      If you don't want us seeing your data, that will normally be taken as a sign that you know or suspect that there are problems with your data. Attempts to block independent researchers from replicating the experiments or data collection (which is one of the main uses of patent law) are generally taken as an open admission that there's something wrong with your data.

      "Trust me I'm a scientist" may sometimes work with the general public, but it really hasn't ever worked with scientists. A real scientist reacts to interesting scientific news with "Further research is needed", and applies for funding to carry out the research.

  • by Anonymous Coward

    It'll take money to store the data and make it available, and staff to manage it.

    So who is going to accept an increase in taxes to allow this to happen?

    • Are you kidding? What's the cost of storage on a webserver, per byte? Would that be effectively zero for any reasonable dataset in the discipline? It would. You could put up a single, e.g., 10 TB server in a single lab for a few thousand dollars; it would cost a few hundred dollars a year to run and would handle all the data associated with all the publications in psychology in a decade.

      What is expensive and wastes taxes is bozos who do crap research, publish the crap results, hide the cr
      • by Anonymous Coward

        My dissertation research data set is pushing half a terabyte, plus another half or so of transformed data in intermediate form. Mind you I'm in neuro, and we're a pretty data-driven lab. If you're curious, it's mostly confocal stacks, neuron recordings (each experiment is usually several hours, multiple channels, and sampled at 20kHz or more), integrator output, parameter-space vector data from optimizations, etc. Some of this *can* be regenerated from smaller seed data sets ... if you have access to a d

  • by Urthas ( 2349644 ) on Tuesday December 06, 2011 @01:36PM (#38282096)
    ...most people who do it are downright bad at it. That they might take more time and care to be good at it without the perpetual axe of publish-publish-publish and grant funding hanging over their heads is another issue altogether.
    • ...most people who do it are downright bad at it. That they might take more time and care to be good at it without the perpetual axe of publish-publish-publish and grant funding hanging over their heads is another issue altogether.

      I agree and I can think of something to illustrate your point.

      I was listening to a This American Life episode a few weeks back [thisamericanlife.org] and there was a story about two people -- one a music professor and the other a respected oncologist -- who were investigating a long-defunct theory that certain electromagnetic wavelengths can kill cancer cells, and only cancer cells, leaving healthy cells completely fine. When left to run the test, the music professor failed to maintain the control correctly, among other problems. But after being corrected by the respected researcher, they started getting positive sets of preliminary results. The respected researcher requested that the music professor not share this with anyone and not attach his name to it just yet.

      Well, the music professor did not follow this advice because he was so excited about the preliminary results and had, I guess, sort of felt like the respected researcher had shortchanged him and suppressed him. What the music professor wanted to do was blow the lid off this thing with possibly flawed data, and he sent it to other oncologists with the original researcher's name attached -- possibly misrepresenting flawed data as solid results. Now I can see why a researcher might fly off the handle when data is released extremely early. They were having problems recreating their own findings (with a sham control), which caused the original researcher to want to keep this very much out of the public eye. You might claim he was just trying to save himself embarrassment, but there's nothing embarrassing in science about finding out your hypothesis is wrong; I just think the best researchers avoid these "failures" and the subsequent investment of resources into them.

      I think that scientists figure out how to create the most data and separate the wheat from the chaff over a very lengthy (think decades-long) process, whereas the first sign of a breakthrough might cause more inexperienced researchers to rush to show the world. And the reason, as you mentioned, is probably the immediate funding they can get with it. But I think it badly neuters scientific news, the reward system and even the direction that research takes. And releasing and sharing early and often might just make everyone look bad when the whole background of the data is unknown to the person who receives it.

  • Lie or Die (Score:2, Interesting)

    by Chemisor ( 97276 )

    It is very difficult to make a man understand something when his job depends on not understanding it. If psychology research were made to adhere to any kind of stringent scientific standard, there would be no psychology research.

    • by ColdWetDog ( 752185 ) on Tuesday December 06, 2011 @01:52PM (#38282308) Homepage

      It is very difficult to make a man understand something when his job depends on not understanding it. If psychology research were made to adhere to any kind of stringent scientific standard, there would be no psychology research.

      Sounds like you have some issues with authority. Would you like to discuss it?

      • Only if he can discuss it in properly controlled, double-blind circumstances. For example, he can wait in a room until a man in a white lab coat enters to discuss it with him. Outside, the researcher can flip a coin to determine whether the individual in the lab coat is an actual psychologist or is a plumber or taxi cab driver. Afterwards he can be ordered to perform a really nasty task, such as cleaning up the urinals in a public restroom with a toothbrush, to determine whether or not he still has issu
        • Afterwards he can be ordered to perform a really nasty task, such as cleaning up the urinals in a public restroom with a toothbrush...

          It's not actually really nasty until he has to brush his teeth with said toothbrush afterwards.

    • Psychology is NOT science; see what Richard Feynman, a somewhat intelligent guy, had to say on the subject: "I would offer that very good minds can practice psychology, people with deep experience and wisdom and understanding. Psychology obviously has value to many, many people, and also makes deep metaphysical arguments about the world and our understanding of it, yet it's just not a science." Feynman assesses psychology as a cargo cult science: "(It) follows all the apparent precepts and forms of
      • Re: (Score:3, Interesting)

        by Toonol ( 1057698 )
        I wonder if we just haven't quite mastered the techniques necessary to deal scientifically with highly complex systems. Psychology, economics, climatology, etc., all are theoretically understandable, but are so chaotic that our standard scientific methodology can't be applied... you can't, for instance, repeat an experiment. You can't isolate one changing variable.
      • Re:Lie or Die (Score:4, Insightful)

        by Anonymous Coward on Tuesday December 06, 2011 @02:45PM (#38282938)

        Sorry, but you're an intellectual bigot who resorts to citing well-known celebrities rather than actually researching what the content of a field actually is and making a principled argument. Unfortunately, your bigotry is only ameliorated by its ubiquity in communities such as Slashdot.

        A number of points need to be made:

        First, most people have a stereotyped idea of what psychology is, because they don't actually know what it is. It's the scientific study of human behavior and experience. If you think it's couches and Freud, you're uninformed. My guess is that Feynman took psychology courses and had his primary exposure to the field during the mid-20th century, when psychoanalysis was dominant in *one branch of psychology*, and isn't even dominant in that area anymore. Psychologists study molecular neurobiology, multivariate statistics, neurophysiology, immunology, and any number of other topics. Be prepared to argue that those fields aren't science (or math) if you're prepared to argue that psychology isn't a science.

        Second, it's worth noting that this fraud case (and the way the story is framed) focuses on psychology, but similar problems happen in other fields. E.g.:

        http://en.wikipedia.org/wiki/Controversy_over_the_discovery_of_Haumea
        http://abcnews.go.com/Health/Wellness/chronic-fatigue-researcher-jailed-controversy/story?id=15076224

        Finally, what would you propose to do instead? Study human behavior and experience nonscientifically? That's what you seem to be suggesting.

        • Awright!! A good, well-reasoned argument from someone with a vocabulary. However, I would suggest that the definition of a Science does not include Psychology, in which experimentation does not produce results that can be DISPROVED. I would also suggest that many branches of Medicine fall into the same category. That is why an MD has a "practice". Possibly drug testing may fall into the Science category, but not the Practice of Psychology. It's an ART, not a SCIENCE. Call me a stickler for details, but peop
        • Intellectual bigot or not, the GP has a point. That Freud was ever considered a valid contributor to the field is a big fat black mark against psychology, and the fact that introductory courses still mention his name with any purpose other than to ridicule his stupid, foolhardy, unscientific perspective is another.

          Psychology is permeated by poor statistics (I read the journals), a poor understanding of experimental design and generally poor methodologies. Sure they make use of molecular neurobiology, multiva

      • by Eil ( 82413 )

        Head back to Wikipedia for a bit... Feynman was not talking about all of psychology, but mostly parapsychology. Reading minds, bending keys, that kind of thing. He was also speaking at a time when non-religious (or loosely religious) mysticism was fairly common and even mainstream compared to today. Psychology is a much different field nowadays than it was almost 40 years ago.

    • It is very difficult to make a man understand something when his job depends on not understanding it.

      Could you post the psychological research backing up this claim?

  • A better way (Score:4, Insightful)

    by Hentes ( 2461350 ) on Tuesday December 06, 2011 @01:47PM (#38282242)

    Don't believe anything that hasn't been verified by an independent group of researchers.

  • by DBCubix ( 1027232 ) on Tuesday December 06, 2011 @01:49PM (#38282254)
    I do research in textual web mining, and from time to time other researchers ask me for my collections, which I spider myself from copyrighted web sources. While my work is purely academic, I am covered by fair use. But since US intellectual property laws are obtuse and overbearing (imho), I cannot take the risk of sharing my collections with others for fear of running afoul of copyright law (I can't control what is done with the collection once it is out of my hands, and how do I know they would use it in a manner consistent with fair use?). So it may be less an unwillingness rooted in statistical fudging and more an unwillingness to become a target of copyright lawyers.
    • I do research in textual web mining, and from time to time other researchers ask me for my collections, which I spider myself from copyrighted web sources. While my work is purely academic, I am covered by fair use. But since US intellectual property laws are obtuse and overbearing (imho), I cannot take the risk of sharing my collections with others for fear of running afoul of copyright law (I can't control what is done with the collection once it is out of my hands, and how do I know they would use it in a manner consistent with fair use?). So it may be less an unwillingness rooted in statistical fudging and more an unwillingness to become a target of copyright lawyers.

      Why would that be an issue? The onus would be on the people you share the data with to keep it in the fair use domain. An analogy would be a professor quoting some copyrighted text in a syllabus and then saying she couldn't give a copy of the syllabus to another professor (or student) because she can't control what they do with it.

      • I do research in textual web mining, and from time to time other researchers ask me for my collections, which I spider myself from copyrighted web sources. While my work is purely academic, I am covered by fair use. But since US intellectual property laws are obtuse and overbearing (imho), I cannot take the risk of sharing my collections with others for fear of running afoul of copyright law (I can't control what is done with the collection once it is out of my hands, and how do I know they would use it in a manner consistent with fair use?). So it may be less an unwillingness rooted in statistical fudging and more an unwillingness to become a target of copyright lawyers.

        Why would that be an issue? The onus would be on the people you share the data with to keep it in the fair use domain. An analogy would be a professor quoting some copyrighted text in a syllabus and then saying she couldn't give a copy of the syllabus to another professor (or student) because she can't control what they do with it.

        There is a difference between copying a brief excerpt in a fair use context and copying the complete copyrighted work. The key point is that the fellow researchers want the complete data set, i.e. complete copies of copyrighted works. The original researcher is correct to fear legal consequences and, regrettably, should consult an attorney before sharing such a data set. Alternatively, the original researcher should have logged the URLs where the original data was found and provided these URLs to fellow researchers.
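
        As a rough sketch of that alternative (not anyone's actual pipeline -- the seed URLs, file name, and helper function here are hypothetical, and the third-party requests library is assumed), a crawler could persist only a shareable manifest of where and when each document was fetched, plus a content hash so others can verify they rebuilt the same collection, while the copyrighted text itself never leaves the lab:

        import csv
        import hashlib
        from datetime import datetime, timezone

        import requests

        SEED_URLS = [
            "https://example.com/articles/1",  # hypothetical sources
            "https://example.com/articles/2",
        ]

        def crawl_and_log(urls, manifest_path="collection_manifest.csv"):
            """Fetch each page for local analysis, but write out only a manifest
            (URL, retrieval time, content hash) that can be shared freely."""
            with open(manifest_path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(["url", "retrieved_at_utc", "sha256"])
                for url in urls:
                    resp = requests.get(url, timeout=30)
                    resp.raise_for_status()
                    digest = hashlib.sha256(resp.content).hexdigest()
                    writer.writerow([url, datetime.now(timezone.utc).isoformat(), digest])
                    # resp.text would feed the actual text-mining step locally;
                    # only the manifest row is ever redistributed.

        if __name__ == "__main__":
            crawl_and_log(SEED_URLS)

        Fellow researchers can then re-crawl the same URLs themselves and check the hashes to confirm they are working from an equivalent collection, without any copyrighted content changing hands.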

  • by svendsen ( 1029716 ) on Tuesday December 06, 2011 @01:50PM (#38282280)
    One reason scientists don't share is that if the data gets out early and gets around (damn slutty data), other scientists might steal/copy/scoop/whatever the data. Unless there is a great way to prevent this, the suggestion proposed here will never go anywhere.
    • by Anonymous Coward

      If you publish the data with the paper then that isn't a problem. If you want to publish the paper before you've finished analysing the data, that looks like a bigger cause for concern than someone else stealing the data.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      Other scientists typically inquire for data after publication of the findings. (How else would someone know what to ask for?) This suggestion only stresses that error-checking be encouraged after the current process of publication.
      Note that this error checking (after attempts to reproduce findings failed) is what led to Gordon Gallup identifying Marc Hauser's recently-acknowledged academic fraud.
      https://en.wikipedia.org/wiki/Marc_Hauser#Previous_controversy_over_an_experiment [wikipedia.org]

    • You mean, a way like "requiring the authors to put their data and numerical methods up on a website no later than the date of publication of the paper"?

      Unless their competitors are good at time travel, that seems as though it might be enough...

      rgb
  • The IPCC doesn't know about this. Or does this only apply to the "soft sciences"?

    • by hexghost ( 444585 ) on Tuesday December 06, 2011 @02:02PM (#38282442) Homepage

      What? The IPCC was just collecting already published data; there were no 'new' studies done.

      Careful - your bias is shining through.

      • Regardless of where the data came from and what agreements covered its release, the stated purpose of the people in the emails was to keep the data out of the hands of their opponents despite open freedom of information requests, and Phil Jones himself said he would be "hiding behind" the agreements in order to keep the data from being released. There was quite literally a conspiracy not only to avoid normal scientific sharing of information, but also to evade legal freedom of information requests.

        As scientists, the

      • No, the IPCC doesn't collect data, they collect published results from data that remains occult even now. Indeed, Phil Jones appears to have lost the raw data out of which e.g. HadCRUT3 was built. Not only is it impossible for anyone else to check his methods or reproduce his results from raw data, he can't reproduce his results from raw data.

        Doesn't matter to the IPCC or anybody else that uses that data.

        Look up the history of e.g. Steve Mcintyre's efforts to get actual data and methods out of any o
      • by Arker ( 91948 )

        What? The IPCC was just collecting already published data; there were no 'new' studies done.

        That's a very weak dodge. Metaresearch doesn't get some magical exemption from scientific procedure. Whether you are going out and collecting raw data to start with, or importing the results of a dozen earlier studies and going from there, all information necessary to replicate your results must be made openly available, or else you simply are not doing science.

  • by peter303 ( 12292 ) on Tuesday December 06, 2011 @01:59PM (#38282404)
    Some probes, like the Mars rovers, Cassini, and SOHO, post their data on the web within days. Others, like Kepler and ESA-Express, have posted very little of their data. The tradition is for Principal Investigators to embargo the data for one year.
  • Psychologist's statistical study suggesting that psychologists have possible psychological issues with sharing their psychological studies... perhaps this warrants a further psychological study of said psychologists?

  • by br00tus ( 528477 ) on Tuesday December 06, 2011 @02:06PM (#38282482)

    Einstein was unable to find a teaching post, and was working in a patent office when he published his annus mirabilis papers. Things have changed over the years, though. John Dewey discovered a century ago how children best learn - let the child direct his own learning, with an adult to facilitate it. This, of course, is not how children are taught. Things nowadays are very test-heavy, and becoming even more so, not as a means to help students see what their deficiencies are, but as a punishment system - and the teachers and the administrators are under the same punishment system. The carrot of reward is vague, ill-defined, and far off. It is a system designed to squelch the curiosity of the handful of students who were curious and wanted to learn. Businesses want in on the education gravy train, and all this charter school stuff is being embraced by both parties, which isn't surprising if you look at the funding behind it.

    At the university, the financial incentives are all aligned so that publishing is a necessity. If one does not publish, one does not get tenure, and then all those years of work were for naught, as the academic career is over. And what gets published? An average series of experiments done by the scientific method would usually lead either to inconclusive data and results or to a dead end. And what journal wants to publish those results after months of work? One of the most popular PhD Comics strips is this one [phdcomics.com]. It seems fairly obvious to me - the more financial incentives are tied to getting published, the more bogus studies are going to be published. As far as the idea of honesty, integrity or whatever, these things will gradually subside for most people when they come into conflict with keeping a roof over one's head and food on the table.

  • by macwhizkid ( 864124 ) on Tuesday December 06, 2011 @02:11PM (#38282534)

    Ultimately, everyone agrees that open sharing of research data funded by the taxpayers would be A Good Thing(TM). The problem is: how do you persuade people to actually do it? Much like advanced safety features on cars, free college tuition, and taxes on big banks, it sounds like a great idea until you look at what it will actually cost to implement. Not just "cost" in terms of money for infrastructure development, data storage, and support, but in terms of persuading an entire culture to change its workflow.

    In our lab, we already spend an extraordinary amount of time on administrative tasks only indirectly related to our research. Adding a mandatory data-sharing task and fielding questions from random people who want to use the data would be a serious additional chore. Then there's the embarrassment aspect... we actually had a project a couple of months ago where another group was doing an experiment that we wanted to do, and they had software already written. So we thought, "Great, we'll just ask them for the code." So we fired off an email... and after a couple of weeks we finally got a reply to the effect of "this is actually my first program, and I don't feel comfortable sharing it." So we had to spend 2-3 months writing our own version to do exactly the same thing.

    • by Trepidity ( 597 )

      In the latter case, I think sometimes this is actually for the best. Even though it results in redundant coding, that's one form of replication. If everyone reused the same code written by one grad student long ago and never rewrote it (the grad student's first program, no less!), there would be a lot of reliance on that program doing what it says it does, and doing it correctly in all cases. Sure, you could run test cases, read through the code carefully, even try to formally verify it, but in my exper

    • All true, so perhaps we should add a line or two limiting the scope of the rule, then make it law. If your research data is being used as the basis for major political policy decisions and the spending of unlimited amounts of taxpayer money (e.g. climate research), the public's need outweighs your own inconvenience or embarrassment. Cost we can ignore. Note well that in most enterprises putting data/methods up on a website should be almost free -- who doesn't have websites? Who cannot use archive/comress
      • So you say that different rules would apply to research in areas that politicians have gotten interested in, because the public needs more. In that case, the public should be prepared to pay more, since there's a need. After all, the government could take my house because they need to build a bypass there, but they'd have to pay me for it. Public need; public cost.

        Therefore, what you want to propose is that, in selected areas of research, money be automatically allocated for a data librarian's service

  • by Steavis ( 887731 ) on Tuesday December 06, 2011 @02:16PM (#38282596)
    The NSF now requires this [nsf.gov] as part of grant applications. You have to have a data management plan that includes the public deposit of both the data and the results from grant-funded work. Other funding orgs are following suit.

    This is a fairly major project at the university I work for, both from the in-process data management perspective (keeping field researchers from storing their only copies on thumbdrives and laptops) and from the long-term repository perspective for holding the data when the grant is completed (that's what I'm involved with).

    Storage is cheap. Convincing university administrators to pay for keeping it accessible is another problem, but the NSF position is helping.
    • And this is the way everybody should be doing it, including work funded by NIH, NOAA, DOE, EPA, DOD (well, not all of the DOD work, but some of it). It should be legally mandated for all granting agencies, especially agencies that fund research that is critical to public policy decisions, decisions on the spending of large amounts of tax money, human life and well being, or technological advances that belong in the public domain because (after all) the public paid for them.

      rgb
  • Surely people aren't just going to turn over the means to get themselves charged with fraud out of the goodness of their hearts. Somehow this has to be made mandatory by the institutions, or by the publications in which they hope to present their work (as suggested in the second linked article, and as I understand some of the top medical journals do nowadays).

    • by Toonol ( 1057698 )
      Surely people aren't just going to turn over the means to get themselves charged with fraud out of the goodness of their hearts.

      Well, maybe the scientists that aren't committing fraud will be happy to share their data... then, that small percentage of scientists that refuse to will be shamed and/or ignored.
  • Having something to hide. In some cases it is error or bias. What other attributes count as "something to hide"? And why didn't the researchers disclose them? What didn't they know, and when did they not know it?
  • This does show that the pressure to overstate the certainty of results is more common in academia than is otherwise claimed. This is not limited to psychology. Human beings respond to incentives, and the lack of a requirement to publish data acts as an incentive to overstate certainty.
  • In other words made up bullshit, with no cohesive theory tying it all together. So made up data is the least of their concerns...

  • If it's the median, then it means what the summary is implying.

    If it's the mean, it could just be that the ones who do commit fraud are skewing the sample. And that if you subtract them, the error rate of the people who don't share is no different than the error rate of those who do share.

    Not saying it's one or the other. Just pointing out that in this particular case, it's a very important distinction.
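
    A toy illustration of the difference (the numbers below are invented, not taken from the study; Python's standard statistics module is assumed): a single extreme offender drags the mean error count up while the median barely moves.

    from statistics import mean, median

    # Hypothetical per-paper error counts for two groups of authors.
    sharers     = [0, 0, 1, 1, 1, 2, 2]
    non_sharers = [0, 0, 1, 1, 1, 2, 25]  # one outlier skews the whole group

    for name, errors in [("sharers", sharers), ("non-sharers", non_sharers)]:
        print(f"{name:12s} mean={mean(errors):.2f}  median={median(errors):.1f}")

    # Prints:
    # sharers      mean=1.00  median=1.0
    # non-sharers  mean=4.29  median=1.0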
