Science

Scientific Data Disappears At Alarming Rate, 80% Lost In Two Decades

cold fjord writes "UPI reports, 'Eighty percent of scientific data are lost within two decades, disappearing into old email addresses and obsolete storage devices, a Canadian study (abstract, article paywalled) indicated. The finding comes from a study tracking the accessibility of scientific data over time, conducted at the University of British Columbia. Researchers attempted to collect original research data from a random set of 516 studies published between 1991 and 2011. While all data sets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that, they reported. "Publicly funded science generates an extraordinary amount of data each year," UBC visiting scholar Tim Vines said. "Much of these data are unique to a time and place, and are thus irreplaceable, and many other data sets are expensive to regenerate."' More at The Vancouver Sun and Smithsonian."

  • And in 20 years... (Score:5, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:04AM (#45743845)

    And in 20 years, these results too shall be lost.

  • Concerning... (Score:5, Insightful)

    by Adam Colley ( 3026155 ) <(eb.opuk) (ta) (gom)> on Friday December 20, 2013 @04:09AM (#45743873)

    Trying to ignore that a paper about the unavailability of scientific data is locked behind a paywall.

    This is nothing new though, I do occasional conversion from ancient data formats, people need to pay better attention, imagine trying to read an 8" CP/M floppy today.

    As libraries move to digital storage rather than the dead tree that's been fine for thousands of years, they are inviting a catastrophe, possibly only one well-aimed solar mass ejection away from massive data loss.

  • Precisely (Score:2, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:16AM (#45743891)

    This is bang on. As a system administrator for a STEM department at a Canadian institution, my budget is 0 for data retention. Long term data retention is just not in the mindset of researchers.

  • So...? (Score:4, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:22AM (#45743909)

    I'm a researcher and I don't have time or space to keep old data as I'm generating too much new data. We work hard to maximize the use of these data and analyses when we write and publish papers. If this was talking about the papers (or presentations), that were the product of the data, being lost at this rate it would be one thing, but the raw data isn't usually very useful to anyone without context or knowledge of subtle and poorly documented technicalities. This just seems like ammunition for the climate change deniers to bitch about. It's unreasonable to keep the old data indefinitely without a massive public repository that will be poorly indexed and organized.

  • Re:Concerning... (Score:5, Insightful)

    by Dutch Gun ( 899105 ) on Friday December 20, 2013 @04:26AM (#45743915)

    Paper has its own issues. Talk to me about the durability of paper after you recover the books lost throughout time due to natural decay, burning (intentional or otherwise), floods, wars, and social forces (politics, religion, etc). Digital data can be easily copied and archived (when not behind a paywall, of course). It seems to me that redundancy is the best form of insurance against data loss. A solar mass is not going to wipe out every computer with a copy of important data on it, and all the relevant backups. And if it does, we're probably in a lot more trouble for reasons other than losing some scientific research.

    Besides which, I sort of wonder if scientific data also follows the 80/20 rule. If so, how much are we really losing? I'm only half joking, of course, since it's difficult to ascertain the value of research immediately in some cases, but wouldn't it stand to reason that any important or groundbreaking research will naturally be widely disseminated, and thus protected against loss?

  • what the hell? (Score:2, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:34AM (#45743941)

    I think it is ridiculous that Slashdot keeps posting articles that are behind paywalls. How the hell are we supposed to see them? Do you expect us to pay for subscriptions to services we'd only use once? You, OP, are out of your mind. Articles such as this should be rejected, as most users, if not all, can't even access the story. This site really has gone downhill in the last few years, overpopulated with clueless simpletons, frauds, so-called armchair IT experts and -obvious- subscription-pushing trolls.

  • Re:Concerning... (Score:2, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:36AM (#45743945)

    The problem is not just an issue of digital storage, but also a problem of redundancy.

    In the "old days", people understood and accepted the risk that a paper copy would be lost. In fact, it was a GIVEN that they would eventually be lost (or damaged or misplaced or stolen or checked out and simply never returned). So multiple copies were kept because centuries of experience dictated that some copies would be lost no matter how strong, carefully maintained and well preserved the originals were.

    Nowadays, people simply think it's a matter of "copy and paste". But, as you point out, it's not. Different hardware formats on top of different software formats. The card catalog with its rigid but well-defined categories was switched for a nebulous and vague "tagging" system. And god help you if the files are corrupted.

  • Re:So...? (Score:0, Insightful)

    by Anonymous Coward on Friday December 20, 2013 @04:59AM (#45743973)

    but the raw data isn't usually very useful to anyone without context or knowledge of subtle and poorly documented technicalities

    Wow, what a load of patronising bollocks. "My data was so important that I published it in a peer reviewed journal, but nobody else is smart enough to review it."

    It's unreasonable to keep the old data indefinitely without a massive public repository

    The bound experiment notebook that any undergrad worth anything was taught to keep in the pre-computer era is just as reasonable to demand today. YOU make your living with data, YOU learn how to maintain backups, or use the democratic process in your academic institution to get someone else to do it.

    I do absolutely acknowledge that the move away from paper has made this vastly harder. Paper kept in a dry environment takes at least a lifetime to rot, and nearly every adult human in the developed world knows how to read and copy a sheet of paper. Maintaining electronic backup media usually takes far more frequent intervention, and greater expertise - not just with the hardware, but to ensure ongoing readability of the data format. This is one of those things where the technologists who are entirely hep with every buzzword of the last 5 years forget that the world's history is just slightly longer, and what seems like the only important set of tools in the world today will be a footnote in history tomorrow.

  • Re:Concerning... (Score:5, Insightful)

    by Eunuchswear ( 210685 ) on Friday December 20, 2013 @05:03AM (#45743993) Journal

    Digital data can be easily copied and archived

    Can be. But mostly isn't.

  • Re:Concerning... (Score:5, Insightful)

    by martin-boundary ( 547041 ) on Friday December 20, 2013 @06:10AM (#45744171)

    This is nothing new though, I do occasional conversion from ancient data formats, people need to pay better attention, imagine trying to read an 8" CP/M floppy today.

    It's not that it's a new problem as such, it's that for the first time in history we have a simple way to solve it, yet we have stupid greedy rich people who sponsor and enact laws to stop us from solving the problem.

    The way to solve the problem is through massive duplication of all the data, over and over again through time. We have the technical means to do this on an unprecedented scale.

    Even 1000 years ago, people had to painstakingly copy books, by hand, one at a time. And after a handful of copies were produced, there still weren't enough to guarantee that most would survive the ages, wars, fires, censorship, etc. So we generally have tiny collections from the past.

    But now it's digital data. Anyone could copy it. We could have millions of copies of some obscure scientific work, all perfect duplicates. If even 0.1% of these copies survive, that's still thousands of copies.

    And what do we do? We let a bunch of 1 percenters, who themselves barely know how or care to read, sponsor draconian copyright laws to stop everyone from copying all that stuff, just on the off chance that they might copy a bunch of songs or movies that are outmoded within two years. And the commercial scientific publishers are some of the worst.

    It's pathetic.

  • Re:Concerning... (Score:4, Insightful)

    by bfandreas ( 603438 ) on Friday December 20, 2013 @06:25AM (#45744203)
    The combination of insane copyright claims and the overreliance on comparatively volatile storage technology is steering us directly into another dark age.
    That's one take on things.
    On the other hand, we have already lost so much stuff over the centuries that perhaps what I just said is idiotic alarmism. After all, we rebuilt western civilisation after the fall of Rome (that just took the Dark Ages) and we didn't all die off after the Great Library of Alexandria burned down. The stuff that gets replicated often will probably not be lost. But let's hope it isn't a retweet of Miley Cyrus' knickers.
  • Re:Concerning... (Score:5, Insightful)

    by Teun ( 17872 ) on Friday December 20, 2013 @08:02AM (#45744491)
    In the nineties I had a friend working for a company that bought a lot of old Soviet geophysical data.

    It needed some very special transcription technology but once in the clear and fed to modern 3D seismic software it revealed a lot more than the original reports gave.

    Retaining old reports is nice, retaining old raw data even nicer.

  • Re:Concerning... (Score:5, Insightful)

    by Lisias ( 447563 ) on Friday December 20, 2013 @08:06AM (#45744497) Homepage Journal

    Wishful thinking.

    Let's make a deal: *first*, the gene therapy works. *THEN* we assume we can afford to lose the data the grandparent talks about.

  • by queazocotal ( 915608 ) on Friday December 20, 2013 @08:12AM (#45744523)

    That's not the point.
    The actual published results, even if published in an obscure journal, tend to stick around _much_ longer.

    Even old journals which go out of publication get their archives and the rights to distribute them bought - as there is some small amount of value there, in addition to the copies in the various reference libraries around the world.

    The problem is that if you are wondering about that graph on page 14 of the paper that the whole paper rests on, you can't get the original data to recreate that graph.

    This is a major problem because the only way to check that graph is now to redo the whole experiment.