Science

Major Scientific Journal Publisher Requires Public Access To Data

An anonymous reader writes "PLOS — the Public Library of Science — is one of the most prolific publishers of research papers in the world. 'Open access' is one of their mantras, and they've been working to push the academic publishing system into a state where research isn't locked behind paywalls and subscription services. To that end, they've announced a new policy for all of their journals: 'authors must make all data publicly available, without restriction, immediately upon publication of the article.' The data must be available within the article itself, in the supplementary information, or within a stable, public repository. This is good news for replicating experiments, building on past results, and science in general."
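As a concrete illustration of what the policy enables (the URL below is a placeholder and the helper is hypothetical; a real PLOS data-availability statement points to a DOI in a repository such as Dryad or figshare), re-using a deposited dataset can be as simple as fetching it:

```python
# Minimal sketch: fetch a dataset deposited in a stable public repository.
# The URL is a placeholder; a real paper's data-availability statement
# would supply a DOI resolving to a repository such as Dryad or figshare.
import urllib.request

DATA_URL = "https://repository.example.org/datasets/doi-10.5061-dryad.XXXX/data.csv"

def fetch_dataset(url: str, dest: str) -> None:
    """Download a published dataset so its analysis can be re-run locally."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())

fetch_dataset(DATA_URL, "data.csv")
```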
  • Good policy (Score:5, Interesting)

    by MtnDeusExMachina ( 3537979 ) on Tuesday February 25, 2014 @05:40PM (#46339223)
    It would be nice to see this result in pressure on other publishers to require similar access to data backing the papers in their journals.
  • good and bad (Score:4, Interesting)

    by eli pabst ( 948845 ) on Tuesday February 25, 2014 @06:04PM (#46339447)
    It will be interesting to see how this is balanced against patient privacy, particularly with the increasing number of human genomes being sequenced. I know a large proportion of the samples I work with in the lab have restrictions on how the data can be used or shared due to the wording of the informed consent forms. Many would certainly not allow public release of their genome sequence, so publishing in PLOS (or any other journal with this policy) would be impossible. So while I think the underlying principle is good, an unintended consequence might be less privacy for patients participating in research (or fewer patients electing to participate at all).
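    As a toy illustration (the consent tiers and field names here are invented), this is roughly the triage that has to happen before anything can be deposited publicly:

```python
# Hypothetical sketch: triage samples by consent tier before any deposit.
# The "consent" field and its values are invented for illustration.
samples = [
    {"id": "S001", "consent": "public"},      # broad consent: may be shared openly
    {"id": "S002", "consent": "restricted"},  # controlled access only
    {"id": "S003", "consent": "no_sharing"},  # may never leave the lab
]

releasable = [s for s in samples if s["consent"] == "public"]
controlled = [s for s in samples if s["consent"] == "restricted"]

print(f"{len(releasable)} of {len(samples)} samples could go in a public repository")
# Controlled-access samples would instead go to an archive such as dbGaP,
# which vets researchers before granting access.
```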
  • Practicalities (Score:5, Interesting)

    by Roger W Moore ( 538166 ) on Tuesday February 25, 2014 @06:17PM (#46339525) Journal
    Open data is a great idea, but it is not always practical. Particle physics experiments generate petabytes of extremely complex, hard-to-understand data. Making this publicly accessible is extremely expensive and ultimately useless: unless you understand the innards of the detector, how it responds to particles, and the complex analysis and reconstruction code, there is nothing useful you can do with the data. In fact, one of the previous experiments I worked on went to great trouble to put its data online in a heavily processed and far easier to understand format, in the hope that theorists or interested members of the public would look at it. IIRC they got about 10 hits on the site per year and one access to the data.

    So I agree with the principle that the public should be able to access all our data, but for experiments with massive, complex datasets there needs to be a serious discussion about whether this is practical given the expense involved. Do we best serve the public interest by spending 25% of our research funding on making the data available to the handful of people outside the experiment with the time, skills, and interest to access it, given that this loss of funds would significantly hamper the rate of progress?
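    To put rough numbers on it (every figure below is an assumption for illustration, not a real experiment's budget), a back-of-envelope estimate of keeping a petabyte-scale dataset publicly accessible:

```python
# Back-of-envelope sketch; all figures are assumptions, not real budgets.
DATASET_PB = 10            # assumed dataset size, petabytes
COST_PER_TB_YEAR = 100.0   # assumed $/TB/year for replicated, served storage
CURATION_FTE = 3           # assumed staff to document and support the data
FTE_COST = 150_000.0       # assumed fully loaded $/year per staff member

storage = DATASET_PB * 1000 * COST_PER_TB_YEAR   # $1,000,000/yr here
staff = CURATION_FTE * FTE_COST                  # $450,000/yr here
print(f"storage ${storage:,.0f}/yr + curation ${staff:,.0f}/yr "
      f"= ${storage + staff:,.0f}/yr")
# ~$1.45M/yr under these assumptions, before counting the detector-simulation
# and reconstruction software needed to make the bits interpretable at all.
```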

    Personally, I would regard data as something akin to a museum collection. Museums typically own far more than they can sensibly display to the public, so they select the most interesting items and display those for all to see. Perhaps we should take the same approach with scientific data: treat it as a collection of which only the most interesting selections are displayed to, and accessible by, the public, even though the entire collection is under public ownership.
  • Re:Good policy (Score:3, Interesting)

    by Pseudonym ( 62607 ) on Tuesday February 25, 2014 @07:32PM (#46340227)

    You know who needs to introduce this rule? The ACM.

    I'm fed up with so-called scientific papers with results based on proprietary software. It doesn't even have to be open source, though that would clearly be good for peer review. If I can't (given appropriate hardware and other appropriate caveats) run your software, I can't replicate your results. If I can't replicate your results, it's not science.
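    One small step in that direction, sketched below (an illustration, not any journal's actual requirement), is publishing a machine-readable environment manifest alongside the paper, so a reader at least knows what to install:

```python
# Sketch: record the exact software environment used to produce the results,
# so a reader with suitable hardware has a fighting chance of re-running them.
import json
import platform
import subprocess
import sys

manifest = {
    "python": sys.version,
    "platform": platform.platform(),
    # "pip freeze" pins the version of every installed package
    "packages": subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines(),
}

with open("environment.json", "w") as f:
    json.dump(manifest, f, indent=2)
```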

  • by the gnat ( 153162 ) on Tuesday February 25, 2014 @07:36PM (#46340261)

    Some of these data sets require decades of time and millions of dollars to produce, and the primary investigators want to use the data they've generated for multiple projects. . . There are plenty of scientists out there who poach free online data sets and mine them for additional findings.

    I work in a field (structural biology) that had this debate back when I was still in grade school. The issue was whether journals should require deposition of molecular coordinates in a public database, and later, whether those data should be released immediately on publication or whether authors could keep them private for a limited time. The responses at the time were very instructive: one of the foremost proponents of data sharing was accused of trying to "destroy crystallography as we know it", to which his response was yes, of course, but how was that a bad thing? Skipping to the punchline: nearly every journal now requires release of coordinates and the underlying experimental data immediately upon publication, and in the intervening years the field has grown exponentially, with at least six Nobel prizes awarded for crystallography (at least one of which went to an early opponent of data sharing). The top-tier journals (Science, Nature) average about a paper per week reporting a new structure. Not only did the predicted dire consequences never happen, but the availability of a large collection of protein structures has actually accelerated the field by making it easier to solve related structures (and easier to test new methods), and it has facilitated the emergence of protein structure prediction and design as a major field in its own right.

    The question I'm worried about: what form do the data need to take? Curating and archiving derived data (coordinates and structure factors) is already handled by the Protein Data Bank, but the raw images are a few orders of magnitude larger, and there is no public database available. Most experimental labs simply do not have the resources to make these data easily available. (The exceptions are a few structural genomics initiatives with dedicated computing support, but those are going away soon.)
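    For contrast, the derived data really are trivially accessible today. The sketch below pulls the coordinates for a classic public entry (4HHB, hemoglobin) from the PDB's public file service; exact sizes vary by entry:

```python
# Sketch: anyone can fetch deposited coordinates from the Protein Data Bank.
# 4HHB (hemoglobin) is a long-public entry; files.rcsb.org is the PDB's
# public download service.
import urllib.request

pdb_id = "4HHB"
url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
with urllib.request.urlopen(url) as resp:
    coords = resp.read()

print(f"{pdb_id}: {len(coords) / 1024:.0f} KiB of coordinates")
# A coordinate file is typically well under a megabyte; the raw diffraction
# images behind it can run to tens or hundreds of gigabytes per structure,
# which is exactly the curation gap described above.
```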
