Millions of Research Papers at Risk of Disappearing From the Internet (nature.com)
More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. From a report: The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January, indicate that systems to preserve papers online have failed to keep pace with the growth of research output. "Our entire epistemology of science and research relies on the chain of footnotes," explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. "If you can't verify what someone else has said at some other point, you're just trusting to blind faith for artefacts that you can no longer read yourself."
Eve, who is also involved in research and development at digital-infrastructure organization Crossref, checked whether 7,438,037 works labelled with digital object identifiers (DOIs) are held in archives. DOIs -- which consist of a string of numbers, letters and symbols -- are unique fingerprints used to identify and link to specific publications, such as scholarly articles and official reports. Crossref is the largest DOI registration agency, allocating the identifiers to about 20,000 members, including publishers, museums and other institutions.
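A DOI has a simple structure. It starts with a prefix that begins with `10.` and identifies the registrant (for example, a publisher). That is followed by a slash and then a suffix chosen by the registrant, which may itself contain slashes. A DOI resolves through `https://doi.org/`. The minimal sketch below shows this structure. The helper names are hypothetical and the example DOI is made up.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Hypothetical helper: split a DOI into (prefix, suffix) at the FIRST slash."""
    prefix, sep, suffix = doi.partition("/")
    # A DOI prefix always begins with the "10." directory indicator.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Hypothetical helper: build a resolver link, percent-encoding unsafe characters."""
    split_doi(doi)  # validate before building the link
    return DOI_RESOLVER + quote(doi, safe="/")


print(split_doi("10.1234/example.2024/001"))   # ('10.1234', 'example.2024/001')
print(doi_to_url("10.1234/example.2024/001"))  # https://doi.org/10.1234/example.2024/001
```

Splitting only at the first slash keeps suffixes that contain slashes intact.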
The sample of DOIs included in the study was made up of a random selection of up to 1,000 registered to each member organization. Twenty-eight percent of these works -- more than two million articles -- did not appear in a major digital archive, despite having an active DOI. Only 58% of the DOIs referenced works that had been stored in at least one archive. The other 14% were excluded from the study because they were published too recently, were not journal articles or did not have an identifiable source.
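The sampling step and the reported breakdown can be sketched in a few lines. This is not the study's code. It assumes that "a random selection of up to 1,000" means a uniform random draw capped at 1,000 per member, and the helper names and toy member data are hypothetical. The percentages are the rounded figures quoted above, so the derived counts are approximate.

```python
import random

# Figures as reported in the article.
SAMPLE_SIZE = 7_438_037
SHARES = {"not in a major archive": 0.28,
          "in at least one archive": 0.58,
          "excluded": 0.14}


def capped_sample(dois_by_member, cap=1000, seed=None):
    """Hypothetical helper: draw up to `cap` DOIs uniformly at random per member."""
    rng = random.Random(seed)
    sample = []
    for dois in dois_by_member.values():
        # Members with `cap` or fewer DOIs contribute all of them.
        sample.extend(rng.sample(dois, min(cap, len(dois))))
    return sample


def approx_counts(total, shares):
    """Turn rounded percentage shares into approximate article counts."""
    return {label: round(total * share) for label, share in shares.items()}


# Toy data: one large member and one small member (made up for illustration).
toy = {"member-a": [f"10.1234/a{i}" for i in range(2500)],
       "member-b": [f"10.5678/b{i}" for i in range(40)]}
print(len(capped_sample(toy, seed=1)))  # 1040: 1,000 from member-a plus all 40 from member-b

for label, n in approx_counts(SAMPLE_SIZE, SHARES).items():
    print(f"{label}: ~{n:,}")
```

The first row works out to roughly 2.08 million, which matches "more than two million articles". Because the shares are rounded, the three counts sum to the sample size only to within a rounding error.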
Money (Score:2)
Let's face it: most journals make money in some shape or form, and they are in the business of publishing new content. Yes, some may be purely non-profit organizations. But archiving journals takes time and therefore costs volunteer hours or money: money that could be paid out as profits or as salaries for other work. For these reasons, I hypothesize that journals are not being archived either because it is not considered profitable or because the funds that would pay an archivist are being diverted elsewhere. Storage i
Re: (Score:2)
Re: (Score:3)
TFA mentions 7 million journal articles.
Let's say they are 10 megabytes each. So that is 70 terabytes.
20 TB drives are available on Amazon for $220.
I have more than 70 TB of storage in a shoebox in my closet.
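The back-of-envelope estimate above can be written out using the commenter's own assumptions: 7 million articles, 10 MB each, and $220 per 20 TB drive. The function name is hypothetical. It uses decimal units and prices a single unreplicated copy only.

```python
import math


def storage_estimate(n_articles, mb_per_article, drive_tb, drive_price_usd):
    """Cost of ONE copy, in decimal units (1 TB = 1,000,000 MB)."""
    total_tb = n_articles * mb_per_article / 1_000_000
    drives = math.ceil(total_tb / drive_tb)  # only whole drives can be bought
    return total_tb, drives, drives * drive_price_usd


total_tb, drives, cost = storage_estimate(7_000_000, 10, 20, 220)
print(f"{total_tb:.0f} TB -> {drives} drives -> ${cost}")  # 70 TB -> 4 drives -> $880
```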
They killed the person backing them up (Score:2)
define properly (Score:2)
Re: (Score:2)
Internet forgets? (Score:2)
Are you trying to tell me that the Internet forgets things?
Re: (Score:2)
The movie Rollerball warned of this in 1975 in a scene where everything from the 13th century was lost https://www.youtube.com/watch?... [youtube.com]
Maybe the U.S. Copyright Office? (Score:3)
Maybe we should change our Copyright Office rules. I am guessing they are in their current dumb, paranoid state because that is the way the copyright holders want them to be.
It's a system that was bound to break eventually (Score:2)
With academics pushing out papers simply because their performance is measured by the quantity of papers written, how many of those papers are actually worth the bits they are stored in? Are people of the future ever going to want to read most of them?
Not that I have a better idea to offer...
Re: (Score:2)
The master's thesis I wrote over 32 years ago still shows up in search results. I can't imagine anyone being interested in it. I just pounded it out with minimal effort, as quickly as possible, to meet my degree requirements so that I could start a new job. I'd be happy if it disappeared... I knew that the university library was going to archive it, but in 1991 I had no idea that it was going to end up on a new (at that time) thing called the internet.
Before we Panic.. (Score:1)
..and start tossing taxpayer billions at this instant-crisis, could we perhaps peruse the kind of ‘research’ that’s at risk of disappearing forever first?
I mean, the future may no longer have a use for knowing how duck farts didn’t change migration patterns, or how water is in fact still wet (Revision 74), so if it’s that kind of thing we’re looking to Preserve for the Children, maybe we find the delete button instead.
Libraries used to mean something. (Score:2)
The other problem being ignored is that the quality of published work is going down because everyone needs to publish, and to cite other published works, just to have a chance at a job with more funding. The signal-to-noise ratio has gotten so bad that there is just too much to archive properly in a library. And no, I am not talking about a digital copy. A real library, even if it is microfiche.
In a century, if someone needs to use that library there will
Maybe there's a backup (Score:1)
Hopefully, most of them are on https://en.wikipedia.org/wiki/... [wikipedia.org]
But it may not matter that much anyway: https://journals.plos.org/plos... [plos.org] (Why Most Published Research Findings Are False)
But are these research papers useful? (Score:2)
Or are they more like the AI-generated text now proliferating on the internet, which regurgitates what somebody else already said?
Re: (Score:2)
Research papers are quite useful, just not all of them. But you only know which ones are useful to you when you are working on something specific.
Re: (Score:2)
you only know which ones are useful to you when you are working on something specific
That is true, assuming all research papers are of equal quality. It is possible to assess quality even if you aren't working on something specific. And it's the quality that I was getting at.
Re: (Score:2)
It is possible to assess quality even if you aren't working on something specific.
Only sometimes, not regularly. As a reviewer, I have had papers by Asian authors who obviously did not speak English. Those papers looked like nonsense and read like garbled nonsense, but they contained good research that you could only see if you were quite familiar with the topic. I have also had nice-looking, well-written papers that sounded quite plausible but were utter garbage.
Re: (Score:2)
The ability to communicate well is an important component of good research, it's not just a "nice to have" feature. Perfectly good technique can be completely ruined by poor communication.
What I mean by "it is possible to assess" is that there are certain factors that must be present in quality research. One can assess the soundness of the hypothesis being studied, the quality and relevance of the controls, and the randomness and number of the samples. If the paper can't communicate well enough to mak
Re: (Score:2)
Actually, that is an idiotic comment clearly coming from an idiot. What is essential in research is _research_. The ability to communicate it nicely is optional and "nice to have". Obviously, the information must be in there, but that is it.
Re: (Score:2)
Apparently, there are other idiots like me, who also believe good communication is essential for research.
https://www.research.pitt.edu/... [pitt.edu]
https://learning.closer.ac.uk/... [closer.ac.uk]
https://www.rug.nl/education/p... [www.rug.nl]
Surprising (Score:2)
At least if you put them online properly and make sure you are allowed to, they will be in the Internet Archive. This possibly only refers to papers published by greedy publishers such as Elsevier. And even then, you can just put a tech report with the same title online.
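You can check whether a given paper URL has a Wayback Machine snapshot by querying the Internet Archive's public availability endpoint (`https://archive.org/wayback/available?url=...`). The minimal sketch below builds that query and parses the JSON response. The helper names are hypothetical, and only `check_archived` needs network access. A snapshot of a landing page is not necessarily a copy of the article's full text.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Hypothetical helper: build the availability query for a page URL."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(response):
    """Return the closest archived snapshot URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_archived(url, timeout=10):
    """Query the live endpoint (requires network access)."""
    with urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Keeping the parsing separate from the network call means the response handling can be tried offline on a hand-written dictionary.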