Science

Millions of Research Papers at Risk of Disappearing From the Internet (nature.com) 26

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. From a report: The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January, indicate that systems to preserve papers online have failed to keep pace with the growth of research output. "Our entire epistemology of science and research relies on the chain of footnotes," explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. "If you can't verify what someone else has said at some other point, you're just trusting to blind faith for artefacts that you can no longer read yourself."

Eve, who is also involved in research and development at digital-infrastructure organization Crossref, checked whether 7,438,037 works labelled with digital object identifiers (DOIs) are held in archives. DOIs -- which consist of a string of numbers, letters and symbols -- are unique fingerprints used to identify and link to specific publications, such as scholarly articles and official reports. Crossref is the largest DOI registration agency, allocating the identifiers to about 20,000 members, including publishers, museums and other institutions.
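As a rough illustration of the identifier structure described above: a DOI name consists of a prefix beginning with `10.` that identifies the registrant, a slash, and a suffix chosen by the publisher. The minimal Python sketch below splits a DOI into those two parts. The function name `split_doi` and the example DOI are hypothetical, and the check is deliberately loose rather than a full validator.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix).

    Loose check only: the prefix must start with "10." and a
    non-empty suffix must follow the first slash.
    """
    doi = doi.strip()
    # Accept the common resolver URL form as well as the bare DOI.
    for head in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(head):
            doi = doi[len(head):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# Hypothetical example DOI, made up for illustration.
print(split_doi("https://doi.org/10.1234/example.5678"))
```

Splitting on the first slash only matters because suffixes may themselves contain slashes.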

The sample of DOIs included in the study was made up of a random selection of up to 1,000 registered to each member organization. Twenty-eight percent of these works -- more than two million articles -- did not appear in a major digital archive, despite having an active DOI. Only 58% of the DOIs referenced works that had been stored in at least one archive. The other 14% were excluded from the study because they were published too recently, were not journal articles or did not have an identifiable source.
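The percentages in the summary can be turned back into approximate counts. The sketch below is only a sanity check on the reported figures: the shares are the rounded values quoted above, so the resulting counts are approximate, and the variable names are made up for illustration.

```python
TOTAL_WORKS = 7_438_037  # sample size reported in the summary

# Shares reported in the summary (they sum to 100%).
shares = {
    "not in any major archive": 0.28,
    "in at least one archive": 0.58,
    "excluded from the study": 0.14,
}

for label, share in shares.items():
    # Rounded percentages in, so treat the counts as estimates.
    print(f"{label}: about {round(TOTAL_WORKS * share):,} works")
```

The first line of output comes to just over two million, which matches the "more than two million articles" in the summary.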


  • Let's face it: most journals make money in some shape or form, and they are in the business of publishing new content. Yes, some may be purely non-profit organizations. But archiving journals takes time and therefore costs volunteer hours or money - money that could be paid out as profits or as salaries for other work. For these reasons, I hypothesize that journals are not being archived either because archiving is not considered profitable or because the money that would pay an archivist is being diverted elsewhere. Storage i

    • Journals never assumed the responsibility of archiving themselves. They published. Libraries archived.
    • TFA mentions 7 million journal articles.

      Let's say they are 10 megabytes each. So that is 70 terabytes.

      20 TB drives are available on Amazon for $220.

      I have more than 70 TB of storage in a shoebox in my closet.
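The back-of-the-envelope estimate in the comment above can be written out as follows. This is a minimal sketch using only the commenter's own figures (the 10 MB average size is their guess, and the drive price is the one they quote). It assumes decimal units and a single copy with no redundancy, backups, or staffing costs.

```python
import math

ARTICLES = 7_000_000      # "7 million journal articles" from the comment
MB_PER_ARTICLE = 10       # the comment's guess at average size
DRIVE_TB = 20             # drive capacity quoted in the comment
DRIVE_PRICE_USD = 220     # price quoted in the comment

# Decimal units: 1 TB = 1,000,000 MB.
total_tb = ARTICLES * MB_PER_ARTICLE / 1_000_000
drives = math.ceil(total_tb / DRIVE_TB)   # whole drives only
cost = drives * DRIVE_PRICE_USD

print(f"{total_tb:.0f} TB -> {drives} drives, about ${cost}")
# prints "70 TB -> 4 drives, about $880"
```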

  • RIP Aaron Swartz. Universities, journal publishers and companies that require degrees are all responsible. It's not enough to give you a student debt larger than your mortgage; they destroy the fruits of your labour too. And it gets worse: they are all being rolled up into the proprietary katamari that are "large language models", where the AI will paraphrase random out-of-context chunks of it all.
  • and who's responsible for it? the article leaves a lot to be desired...
    • And nobody read most of these papers the first time around, so why would anyone want to archive them? So they can not read them again in the future?
  • Are you trying to tell me that the Internet forgets things?

  • by laughingskeptic ( 1004414 ) on Thursday March 07, 2024 @10:59AM (#64297306)
    The U.S. Copyright Office receives a copy of each work ... and then destroys the copy. The office will only retain works under specific conditions: written authorization from the copyright claimant, completion of a Copyright Office Litigation Statement Form in connection with litigation, or a court order for reproduction in litigation.

    Maybe we should change our Copyright Office rules. I am guessing they are in their current dumb, paranoid state because that is the way the copyright holders want them to be.
  • With academics pushing out papers simply because their performance is measured in quantity of papers written, how many of those papers are actually worth the bits they are stored in? Are people of the future ever going to want to read most of them?

    Not that I have a better idea to offer...

    • The master's thesis I wrote over 32 years ago still shows up in search results. I can't imagine anyone being interested in it. I just pounded it out with minimal effort as quickly as possible to meet my degree requirements so that I could start a new job. I'd be happy if it disappeared... I knew that the university library was going to archive it, but in 1991 I had no idea that it was going to end up on a new (at that time) thing called the internet.

  • ..and start tossing taxpayer billions at this instant-crisis, could we perhaps peruse the kind of ‘research’ that’s at risk of disappearing forever first?

    I mean, the future may no longer have a use for knowing how duck farts didn’t change migration patterns, or how water is in fact still wet (Revision 74), so if it’s that kind of thing we’re looking to Preserve for the Children, maybe we find the delete button instead.

  • And this proves that they should still mean something.

    The other problem being ignored is that the quality of published work is going down because everyone needs to be published and cite other published works just to have a chance at a job with more funding. The signal to noise has gotten so bad that there is just too much to archive properly in a library. And no, I am not talking about a digital copy. A real library, even if it is microfiche.

    In a century if someone needs to use that library there will
  • Hopefully, most of them are on https://en.wikipedia.org/wiki/... [wikipedia.org]

    But it may not matter that much anyway: https://journals.plos.org/plos... [plos.org] (Why Most Published Research Findings Are False)

  • Or are they more like AI-generated text that is now proliferating on the internet, that regurgitates what somebody else already said?

    • by gweihir ( 88907 )

      Research papers are quite useful, just not all of them. But you only know which ones are useful to you when you are working on something specific.

      • you only know which ones are useful to you when you are working on something specific

        That is true, assuming all research papers are of equal quality. It is possible to assess quality even if you aren't working on something specific. And it's the quality that I was getting at.

        • by gweihir ( 88907 )

          It is possible to assess quality even if you aren't working on something specific.

          Only sometimes, not regularly. As a reviewer, I have had papers by Asian authors who obviously did not speak English that looked like nonsense and read like garbled nonsense, but had good research in there that you could only see when you were quite familiar with the topic. And I have had nice-looking, well-written papers that sounded quite plausible but were utter garbage.

          • The ability to communicate well is an important component of good research; it's not just a "nice to have" feature. Perfectly good technique can be completely ruined by poor communication.

            What I mean by "it is possible to assess" is that there are certain factors that must be present in quality research. One can assess the soundness of the hypothesis being studied, the quality and relevance of the controls, the randomness and number of samples being studied. If the paper can't communicate well enough to mak

  • At least if you put them online properly and made sure you are allowed to, they will be in the Internet Archive. This possibly only refers to papers published via greedy publishers like Elsevier and the like. And even then, you just put a tech-report with the same title online.
