Science

Ensuring Permanence Of Online Scientific Journals 107

wtpooh writes: "Many librarians and archivists are concerned about the impermanence of online scientific journals. They are accustomed to saving paper journals for decades and do not have faith that the online versions will still be accessible in the future. (What happens when a publisher goes out of business and shuts down its Web site, for example?)" (Read more.)

"To help solve this problem, the Stanford Library is collaborating with the National Science Foundation and Sun to create a system called LOCKSS (Lots Of Copies Keep Stuff Safe). LOCKSS is an open source, java/linux based server system which is designed to run on cheap computers at libraries and permanently cache journals to which the libraries subscribe. The LOCKSS systems talk to each other to preserve the integrity of their caches and ensure that there are always at least a minumum number of copies of each article around the world. Read about the current alpha test at the LOCKSS homepage or in this article in the Chronicle of Higher Education "

Sounds like self-interrogating distributed file systems can be useful to people unlikely to get sued by rock bands, as if that wasn't obvious.

This discussion has been archived. No new comments can be posted.

  • This is a very serious issue. Paper is rotting away as we speak - valuable documents dating from the first person to put pen to paper. These need to be digitally scanned for future generations. Out
  • Excluding libraries bursting into flame or being flooded, paper documents are still more reliable than magnetic media (or most other computer-based storage methods).

    If an online-only journal is important enough, maybe some institutions (such as university libraries) should print out and archive the material, which would also help those without Internet access (gasp - these people exist?).

    Imagine - the slashdot collection. 23 square miles of library, every story and comment ever posted!
  • by spiralx ( 97066 ) on Wednesday May 03, 2000 @11:45PM (#1093287)

    The immediate benefit of this kind of thing over current distributed services such as FreeNet [sourceforge.net] is of course the fact that data stored on LOCKSS will be permanently available irrespective of how many times people actually request it. On FreeNet a page is only kept on the network for as long as people are actually requesting it - there is a "decay" of old information which makes it unsuitable for this kind of guaranteed archival.

    The other advantage of the LOCKSS system is that it maintains a certain number of redundant copies across the network, and regularly checks these against each other to ensure that the integrity of each copy is undisturbed by accidents and general bit rot. This system could keep data in pristine form for an indefinite amount of time - as long as the system runs the data is available and correct.

    But as for its use as an archive for other kinds of content, as suggested in the story? Well, given that it doesn't appear to be anonymous like FreeNet, the same problems that we're now seeing with Napster [slashdot.org] will undoubtedly occur. And given that the whole point of the system is to keep files there no matter what happens, the people running the LOCKSS servers will want to keep a close eye on what goes onto the system, since removal will be fairly difficult. I doubt that it'll take off for this kind of purpose without the guaranteed anonymity that FreeNet has.

    Another related project worth a look is the Internet Archive [archive.org], which provides snapshots of public Internet sites for researchers.

  • Finally - maybe we can have all the papers locally and search through them reasonably. It is incredibly annoying when you have to visit several sites to do a full-text search.
  • Looks like it is time to publish an online magazine covering mp3 files......
    Let's see if they are going to sue every library all over the world for exchanging illegal mp3s.

    Jeroen

  • I'll be amazed if they can get a significant number of major publishers to agree to this. I work in a company related to the electronic publishing industry and I know that publishers are just as fussy about their copyright as any other industry, if not more so.

    I would suspect that libraries participating in this kind of project leave themselves open to all kinds of action in similar ways to the Napster issue. Since most if not all libraries have a limited budget any threat from a publisher is likely to cause the software to be removed, which doesn't really produce a confident, secure archiving solution.

    It is certainly true that this is one of the biggest issues in the electronic publishing industry at the moment though, if not THE biggest.

    Q.
  • Imagine - the slashdot collection. 23 square miles of library, every story and comment ever posted!

    Heh, you'd have a whole corridor devoted to books full of "First Post!" comments. First Post Hall maybe?

  • ...the LOCKSS system is simply a webpage-caching system with the added feature of being able to talk to other PCs and compare webcaches of the same document. Doesn't sound like a replacement for Napster or Gnutella to me.

  • Not interesting??? Did you know that sciences like astronomy are among the reasons faster and better computers are being made as we speak (SETI is a nice example, trying out 'hot new networking techniques' and 'computer methods')?

    And because of psychology and sociology I know that your post is probably the result of some strange twist in your head caused by your direct social environment ;)

    Jeroen

  • by Anonymous Coward
    This would be A GOOD THING (tm) if there were any real value-add to these online journals, but let's face it, most of them are just transitions from the in-house print format to pdf/html.

    When 'proper' scientific online journals emerge - ones that allow online peer review and rapid publication (by which I mean hours, not weeks), and that generally facilitate scientific debate and progress whilst allowing access to all interested parties (e.g. how many medical journals do you know that accept submissions from patients?) - this will be an issue. But then they won't be able to have a paper version, the giant publishing houses will fall (yeah, naive, I know), and it SHOULD be different.

    What I want is an online journal in DocBook, or some other XML format that allows me to do proper contextual searching and ask questions like: what papers talk about knees, have been reviewed by 'respected' personages, are cited by at least 20 other authors, and are less than 2 months old? (Something like the sketch at the end of this comment.)

    At the moment this might be a good mechanism for public facilities such as schools and libraries that don't have the space/staff/cash to take the paper versions of these journals, but as for ensuring the permanence of online journals - there ain't really any such thing YET.

    Raist@postmaster.co.uk
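
    A minimal sketch of that kind of query, assuming a hypothetical in-memory corpus (none of these names are a real journal API):

        import java.time.LocalDate;
        import java.util.List;
        import java.util.Set;

        public class JournalQuerySketch {

            // Toy stand-in for a structured (DocBook/XML-backed) article record.
            record Paper(String title, Set<String> keywords, Set<String> reviewers,
                         int citations, LocalDate published) {}

            public static void main(String[] args) {
                Set<String> respected = Set.of("Dr. A. Professor");

                List<Paper> corpus = List.of(
                    new Paper("Knee ligament repair", Set.of("knee", "surgery"),
                              Set.of("Dr. A. Professor"), 31, LocalDate.now().minusWeeks(3)),
                    new Paper("Elbow mechanics", Set.of("elbow"),
                              Set.of("Dr. B. Nobody"), 55, LocalDate.now().minusYears(1)));

                // Papers about knees, reviewed by a 'respected' personage,
                // cited by at least 20 others, under 2 months old.
                corpus.stream()
                      .filter(p -> p.keywords().contains("knee"))
                      .filter(p -> p.reviewers().stream().anyMatch(respected::contains))
                      .filter(p -> p.citations() >= 20)
                      .filter(p -> p.published().isAfter(LocalDate.now().minusMonths(2)))
                      .forEach(p -> System.out.println(p.title()));
            }
        }
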
  • Methinks LOCKSS is good for safe and redundant storage of data. However, it gives *no* warranty for storage over the *long term*. Actually, I think that is a real problem for digital data. This problem is far from being solved.

    A totally different point of view is the following: what good is an ideal storage system to the scientific world? Probably none, because storage is not the real problem.

    As a PhD student, I'm deeply involved in the scientific world - indeed, I'm (going to be) a scientist myself. But a progressive one, who certainly likes the net and electronic publishing. I've published electronically myself, at Brain Research Interactive. However: ZERO response!

    The scientific culture has the following properties:

    (i) Scientists have a highly conservative attitude. If 'the others' don't like it, they simply will not touch it.

    (ii) Naturally, status is very important. Only scientists with very, very high status can change things. Furthermore, the status of your publications is everything. You will not publish in a journal with low status unless your data is bad. So, as long as the electronic journals don't have high status, they will be neglected.

    (iii) Apart from the aforementioned characteristics, the peer review is important. The peer review mechanism is more important than how the journal looks, or its medium. And for the peer review you need good editors and a good system, which of course is expensive.

    In conclusion: just putting up a website with articles and calling it a journal won't work. The safety of the data is only a minor point.

    But, let's keep on trying!

    Jeroen
  • Thanks for the flamage... saves me the trouble.
  • Why would you digitally scan an online document? If it's online, then it's already digitized. This is about bit degradation, not paper.

    I.e., congratulations on the 8th post. Maybe if karma-whoring were eliminated, people would at least read the links and think before posting.
  • My fear is that increasing amounts of resources will be poured into maintaining these academic papers for posterity, when many are nothing more than a rehash of earlier work, or turn out to be pure crap anyway.

    OK, storage and networking costs are coming down all the time. This may be so, but the infrastructure has to be provided and maintained for an open-ended volume of data. In addition, there is a cost to each researcher in the future, who may have to trawl through a load of lame crap to find the paper they need.

    In short, it needs moderating. Books and papers going out of print, or the final copies getting lost, are nature's way of moderating irrelevant crap. Because if it wasn't irrelevant, someone would have invested in preserving it.

    Once these papers are moderated out of usage through neglect, a future generation may indeed be interested in them. We have a name for such people - archaeologists. These are people who sift through the discarded, irrelevant crap of civilizations, to find out about the civilizations who went before them, the ones who valued these discarded things.
  • Thank you for that simple, but wrong, idea. Printing out their own "back-ups" faces the same copyright issues as making a digital copy. Fair use only gets you so far.

  • The purpose of LOCKSS, as described on its homepage, is to keep archived copies of scientific journals to prevent them from disappearing for good, since there are no physically distributed copies. In essence, it's an online archival system that caches journals for the libraries that implement it.

    Hence, it would be an apples-to-oranges comparison to compare LOCKSS to Napster or FreeNet, which are meant to provide a more dynamic sharing service with other users on the Internet.

    In fact, from the FAQ, it seems that LOCKSS can't be accessed from outside the implementing facility (it only caches the journals libraries subscribe to), so your whole concern is really moot.

    Go get your free Palm V (25 referrals needed only!)

  • Oddly enough, this topic is about online document preservation. Not searching through documents.

    Congrats on post #12
  • A bit off-topic, assuming this turns into yet-another mp3 discussion ...

    Before I'd so much as seen a webpage I was using DATASTAR and Dialog (and a couple of other big online databases). For those who haven't seen them, they are /awesome/ - they have complete, indexed fulltext of literally thousands of newspapers, newswires, magazines, and journals (academic and popular). I was thinking about this last night in the context of searching, i.e. that I was lucky to have had some training on searching those (they had their own oh-so-user-friendly command-line search languages).

    This is the biggest missed opportunity of the web/net. Searching for articles on something using standard web search engines is slow, painful, and often you end up with a random assortment of stuff. You spend ages sorting spurious hits from the real thing, following links that look like they might be relevant but actually aren't, and so on.

    What I would like is a web interface to one of those databases. I'd even be willing to pay small amounts to get the fulltext of an article once located. Too often the best info you can find is a mixture of someone's personal notes, a couple of academic sites' "top level overviews" without anything specific, and a bunch of lame niche sites. When I first heard about the web I naively imagined it might become something like the great free public lending library; alas, not so.

    Is there any chance of digital access to the LOCKSS info ? Not unless you're physically in the library, I guess. Ah well.

    vila: a long and noble tradition
    Camaron de la Isla [flamenco-world.com] 'When I sing with pleasure, my

  • And SETI is not astronomy, but rather astrology.

    There indeed is a difference, but I think you make a mistake here.

    Astrology is about predicting the future or horoscopes with some mumbling about the cosmos blah blah blah.

    Astronomy is a real science, and SETI is trying to prove that there is life out there. They are simply trying to provide evidence for a mathematical model that says there is a good chance of life outside our solar system. No astrology involved here.

    The assumption that it will not be of any use is also wrong: there is a chance that theorizing about quantum theory, relativity and such will provide faster computers/communication, though perhaps not in the near future. The caveman rubbing two sticks together probably got criticized too for not hunting, but once he got fire I bet people said something different....

    Jeroen

  • It's a good idea, but I can't see how they can 'ensure that there are always at least a minimum number of copies of each article around the world'.
    First of all, according to them, only libraries which subscribe to certain journals will have the articles - therefore if only 5 libraries subscribe, only 5 copies will exist.
    Secondly, conversely, what happens if 5,000 libraries subscribe? That'll be a lot of redundant storage - yeah, I know it is sorta what they are aiming for, but I'm guessing that these journals will need a fair bit of file space per issue.
    Oh, and I just love the bit about a 'java/linux based server system which is designed to run on cheap computers at libraries' - most library computers I've seen are still 1986-type models (no GUI). The 'high-spec' machines in my city library (which has around 6 public 'dumb terminals', 8 staff 'dummies' and 3 public 'PCs') have a per-hour charge for usage.

    Richy C. [beebware.com]
    --
  • Hmm, perhaps something along the same lines as Advogato [advogato.org] could be applied here. It's a weblog like /. but instead of a moderation system it relies on a "trust metric" where users are certified by other users, ensuring that people with relevance to the field are given more of a voice. Of course, it's not a perfect system, but it would definitely be more productive than /.'s moderation system for an online journal.

    You'd also need more advanced formatting (perhaps a LaTeX-to-HTML converter, since LaTeX seems to be the preferred choice for writing papers) so that equations, tables and graphs could be included in both the paper and the responses, and a decent search engine with multiple criteria for finding articles/comments.

    I think you could do this now, but it would be a very difficult project to code. Still, maybe someone out there's working on it?

  • You bring up an excellent point. Are you perhaps also suggesting a "dynamic" online journal? Perhaps with a point-counterpoint format, or a series of concurring parties adding supporting evidence.

    As long as the timeline of revisions is clear, it should still be possible to archive the journals. Having articles threaded together would not only make searching easier, but also reduce a lot of the redundant information that is included in each article.
  • It's OK to get drunk with your professors. Just don't sleep with them. Drink a tall glass of water NOW and take 2 aspirin. E-mail me in the morning.
  • (yes, I am one of the head Freenet developers)

    Except that we share the philosophy in the name of this program, it really is something completely different from what we are doing. There are other systems that I believe attempt to combine the "LOCKSS" idea of permanent storage with anonymity - "The Eternity Service" is a name one hears a lot, but to my knowledge it has never been implemented. The threat models of trying to protect data from being lost with time, and those of keeping it safe from censorship, are largely different.

    I have to say for once I agree with the moderator bashing ACs. This post was not particularly well informed.

    And, "guaranteed anonymity" is pushing it (there are no guarantees in life). See the FAQ.
    -
    We cannot reason ourselves out of our basic irrationality. All we can do is learn the art of being irrational in a reasonable way.
  • A-men. Aaaa-men. Aaaaaaa-men. A-men! A-MEN!
  • If I remember correctly, LOCKSS makes sure that articles are only available to libraries which actually have (or had) a subscription (and thus legal access) to those articles.

    melC
  • I shouldn't say this since I normally don't criticize ACs, but you are so full of shit that a bowl of hot grits down your pants would actually make you smell better.

    Thank you?

  • Yes, I read the article, but what I was responding to in part was the comment at the end of the article blurb:

    Sounds like self-interrogating distributed file systems can be useful to people unlikely to get sued by rock bands, as if that wasn't obvious.

    This system has the potential to be implemented in such a way as to be openly accessible rather than limited to a set userbase, and that is what HeUnique seemed to be implying, which is why I wrote what I did. What it is now is not what it could be in the future, and this type of mechanism can be implemented in other distributed networks as well. And given that the content it stores does not necessarily have to be scientific journals, I think my point does apply.

  • All you have to do is get the tape archive of Echelon and then you've got copies of everything.
  • This is because ACs have brains too. The problem is a biased moderation system that

    a) unfairly demotes ACs, often by 2 points
    b) encourages karma-whoring by offering a +1 attack

    The solution is not e-mail registration; that is for someone willing to have this garbage traced back to them (a liability) despite getting absolutely no compensation for posting this shit. Think about it: why should I have either an e-mail or an IP address associated with this drivel?

    Registration should only secure you a user name to aid continuity. E-mail and IP logging is unnecessary. User name and password - unix got it right, how come slash got it wrong?
  • To be honest, I can't imagine many of the online providers being too unhappy about their back-archives being available in a distributed format (it would save them the trouble of holding such a database, and I can't imagine that issues more than, say, three months old provide much of an income for them), but formats may be a chore - many will be in pdf or proprietary formats, and the odds of more than one or two being in a common format are low (and the publishers aren't going to foot the cost of conversion).

    What we could do with is an online-based submission and review site for scientific papers; something based on the /. model, with a discussion area for online discussion and analysis of papers, some sort of versioning to allow corrections by the author, and the ability to rate papers on a scale of 1-10. Papers scoring highly (on a weighted average of the scores - roughly the gate sketched below) could then be submitted to a more formal 'classic' peer review and then see real paper (thus allowing Real World income from the process). The distilled papers that emerge from this should be of a higher quality, with the authors of papers that make good points but have glaring holes given time to repair their mistakes; and in cases where a reader/reviewer is in a similar field and can fill in gaps the author missed, there are opportunities for both to produce a joint paper that neither could have competently completed alone.
    --
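
    A rough sketch of such a scoring gate (all names invented; the weights and threshold are arbitrary stand-ins): each reader's 1-10 rating is weighted by that reader's standing, and a paper whose weighted average clears the bar moves on to formal review.

        import java.util.List;

        public class ReviewGateSketch {

            record Rating(int score, double raterWeight) {}   // score is 1-10

            static double weightedAverage(List<Rating> ratings) {
                double num = 0, den = 0;
                for (Rating r : ratings) {
                    num += r.score() * r.raterWeight();
                    den += r.raterWeight();
                }
                return den == 0 ? 0 : num / den;
            }

            public static void main(String[] args) {
                List<Rating> ratings = List.of(
                        new Rating(9, 2.0),    // established reviewer counts double
                        new Rating(6, 1.0),
                        new Rating(8, 0.5));   // brand-new account counts half

                double avg = weightedAverage(ratings);   // 28 / 3.5 = 8.0
                System.out.printf("weighted average: %.2f%n", avg);
                System.out.println(avg >= 7.5
                        ? "forward to formal 'classic' peer review"
                        : "stays in open discussion");
            }
        }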

  • For the record, I've not only read your drunken rant, but enjoyed it. In fact, it is "funny".

  • This kind of information is currently provided commercially by any number of tools, most of which have web-based interfaces. SilverPlatter Information, Ovid, Dialog (amongst others) distribute large numbers of abstract/indexing databases of this kind of information which are fully searchable on keywords and many different indexes, thesaurus terms etc. Once you've found your record you'll generally get a URL link which will take you directly to the full text of the article you're looking for, assuming that the relevant publisher has made it available, and that your organisation has bought access.

    Although these databases are normally subject-specific, the above tools will allow you to search across databases and get unified results, which will pretty much achieve what you're looking for.

    The problem is that you have to pay for this - you can't just get it for free. It takes vast amounts of time compiling this kind of data and it isn't feasible to do it for nothing. Most (if not all) academic libraries will have bought one or other of these solutions so if you're a student or researcher you'll most likely be able to access databases of this kind, and if you're lucky the library will also have bought web-based access to the full text of the articles as well.

    Q.
  • There is also another area of concern with online journals. When you or your library subscribes to a hard-print journal, you get sent a paper copy which you keep and can refer to whenever you want - for the rest of eternity, if you keep it in good condition!

    This may not be the case with an online journal. Here the publisher can license the journal to you in such a way that if you decide to stop your subscription, you don't just lose access to future editions - you lose access to material you previously had access to.

    I don't know how prevalent this kind of licensing is, but I bet we are going to see more of it in the future.

    This may not be too much of an issue at the moment as many journals are hard copy + online access; but eventually the hard copies are going to go.

    If you use an online journal, check out its license and see where you stand.
  • It seems like most people are asking whether this could be used to store copyrighted data illegally.

    This is for SCIENTIFIC journals. While the journal does have copyright protection, they run articles written by researchers at various universities (and in industry).

    The desire of more researchers to be published has resulted in additional journals being formed. Because publishing a journal on the web is dirt cheap, it makes sense that with the Internet available, more of these journals will appear.

    The problem is, you need an archive of it. This system is a way to guarantee that we do not lose knowledge. It doesn't even have to be publicly available. They could cut a deal with the publishers of the journals that they will maintain the archive, but that if the company goes out of business or stops providing old articles, the archive can make them available.

    This would be voluntary, but the publishers would jump at it. Why? Because this system gives them more credibility than a web page alone. The guarantee against future loss is the best protection for their journal, which makes it more likely to get high-quality entries.

    While Freenet or other groups may use similar technology, this is a COMPLETELY different project. This isn't about letting people submit data and protect it; this is about preserving the body of scientific knowledge so we don't lose it when a company goes bankrupt. Digital versions are easier to duplicate than paper equivalents, but our system of copyrights is trying to discourage that. E-books, e-journals and e-magazines carry a significant risk. A copy of an article in a manila folder can be lost or destroyed, but is otherwise perfect. A bookmark to a website can disappear at the whim of a publisher, and there are legal AND technical attempts to prevent you from properly saving an article...

    It's a very strange situation, and projects like this are VERY important to prevent us from losing knowledge.

    I know this sounds elitist, because I'm worrying about the body of knowledge of scientists but not other people. Here is the thing: the information age has allowed more people to publish their ideas and beliefs. However, because we have all jumped onto this technology, we didn't take adequate safeguards to ensure that we don't LOSE anything in this transition.

    If we archived everything that was traditionally published, we'd have the old status quo. If we archive everything traditionally published and let others publish non-archived, we have a better system than the status quo. An environment where we publish everything and maintain nothing is questionable: in some ways it is better than the status quo (more liberal publishing), and in other ways worse (more data loss).

    The idea is to come up with a STRICTLY better system, where NOTHING is lost and we gain some advantages. Normally, there are tradeoffs. The goal is to avoid tradeoffs, and just make things better.

    Alex
  • The solution is not e-mail registration; that is for someone willing to have this garbage traced back to them (a liability) despite getting absolutely no compensation for posting this shit. Think about it: why should I have either an e-mail or an IP address associated with this drivel?

    Agreed. I mean it's only a weblog isn't it? It doesn't really matter who you are or what you post at all in the real world. And I don't really see the point in starting ACs at zero - after all, anyone can get an account and post the same stuff at 1. And WTF does an E-mail account say about who you are anyway? Nothing really.

  • Even one of the head developers of FreeNet has said your post was BS and still you defend it.

    He has also said, as of right now, that your posts are the truly pointless and stupid ones.
    -
    We cannot reason ourselves out of our basic irrationality. All we can do is learn the art of being irrational in a reasonable way.
  • I guess another problem is, there are a whole lot more people out there these days getting degrees, and producing papers. Some of it is just rehashing what someone else has done, or as you say "pure crap", but certainly some of it is valuable.

    Back in the early 20th century some brilliant researchers produced a wealth of new information and ideas that created an entire new subject, quantum physics. Back then it wasn't hard to keep track of the few hundred or so physicists and chemists who contributed to the developments. But now there are probably tens of thousands trying to continue research in the field. Too much to be able to adequately moderate. Interesting and important stuff can get lost in the sheer volume of information being produced.

    I'm rather fond of a radio commercial I heard not long ago that said something along the lines of "In the last 3 years more information has been produced than in the last 3 centuries." Too bad I can't remember what the commercial was for. But it illustrates my point. Information (and technology) is still increasing at an exponential rate, and it is becoming more and more difficult to begin to make sense of it all. I think the best we can hope for is a system that can store data so that at some point in the future, when development slows, maybe someone can begin to sift through the massive amounts of information. Like archaeologists, as you suggest (informationologists maybe?). However, doesn't that require preserving what we have? If information is "moderated out of usage through neglect", how will it be preserved for later generations?

    Spyky
  • by Anonymous Coward
    Excluding libraries bursting into flame or being flooded...

    There are a few other things that you need to add to your list of "advantages of paper over magnetic storage".

    1). Paper will not be erased by an EMP if some idiot starts detonating large numbers of nukes in the stratosphere.

    2). It's unlikely that all paper records would be lost in the event of a mid-range asteroidal impact event. Computer storage is another matter.

    3). Script kiddies can't HAX0R a paper document across the internet, no matter how many exploits they have available.

    4). Bill Gates can't "innovate" paper to define a proprietary paper standard that will make all of your existing books obsolete and unreadable on the latest version of his OS. The same can't be said for digital storage systems.

    5). You don't need lots of infrastructure to read paper in the daytime, so your data is readily accessible in the middle of nowhere. Try doing that with a laptop (even if it uses solar batteries).

    There are lots of good reasons why your most important data should be archived in paper form. Just remember to use acid-free paper. Otherwise it will start to disintegrate in ~30 years.

    You might be strangling my chicken, but you don't want to know what I'm doing to your hamster.

  • Something that boils my piss is the huge cost of journals, the majority of which report the results of academic work funded by the public sector. What's the deal here? Our taxes fund [much of] the research, but we don't get access to results since we cannot afford to subscribe to the journals.

    Understood, the business model developed in the paper age, when someone had to print & distribute academic papers. But I cannot see a good reason why firms like Elsevier should continue to be as hugely rich as they appear to be.

    The web offers an easy way to take most of the cost out of the loop. What costs remain - those of web publishing, and of having journals edited & papers reviewed - should (it seems to me) be capable of being funded from academic departmental budgets, in return for the academic kudos of being a reviewer/editor/web publisher.

    In this enlightened scenario, there would be very much greater dissemination of the knowledge produced, to the benefit of a very much wider set of users.

  • What's the problem here..? Everything I wrote refers to scientific journals..

    Copyright issues are paramount for publishers, even for material which they get for free from scientific researchers. Try asking a major publisher for permission to quote from an article in a major scientific journal..

    Q.
  • LOCKSS is a very good idea, but perhaps Project Gutenberg or something similar would be a suitable receptacle for all the knowledge. The only obstacles would be copyright, or a lack of resources to transfer the data.
  • Thanks. So I was wrong, but hey, I'm not going to lose any sleep over the issue :) And at least the replies which were more than "MODERATE THIS SHIT DOWN!" have allowed me to learn something, so in a way it was worth me making the original point anyway...

  • Most e-journals are simultaneously released on CD-ROM as well as on the internet. That should be the 'permanent' copy that exists in the library. Who cares if the web site goes down?

    If you're worried about the archival quality of CD-ROMs, that is a reasonable concern, but there are archival issues with books too that are dealt with by all libraries - they just have to learn some new stuff.

  • If they can do this they have solved the other major problem in the electronic publishing industry right now, sometimes called "the Harvard problem" or the "appropriate copy" problem, which is how on earth to tell whether a user has access rights to a given bit of text..

    This is one of the holy grails right now.

    Q.
  • I agree that the traditional online databases are a good source of information. Many people make the mistake of thinking that if information can't be found on the Web, it doesn't exist in searchable form. But librarians know different!

    Actually Datastar and Dialog do have web-driven interfaces now.

    Try
    http://www.datastarweb.com
    http://www.dialogweb.com

    However, you have to be an already-registered subscriber, as the first thing you get from these sites is a request to log in with your username and password. And I think Datastar is just launching a service that will link to full-text e-journals.

    Personally (and this may make me appear a bit of a Luddite) I prefer the command-line-driven interface - it's quicker, more powerful and more flexible. But then I admit I haven't used the web interface too much, so there may be features I'm missing.
  • There are other dead-tree archives that need saving.

    Museums also hold important archives that will simply not be available to the public in the near future as older books and journals become too fragile to allow casual browsing.

    For example, the Natural History Museum in London contains archives dating back hundreds of years. The original diaries of Darwin's voyages are held there, but the pages are so fragile now that ordinary visitors can no longer examine them.

    Paper sources do not have an indefinite life, and if they were reproduced electronically they could be available online as a resource to be treasured, not left to wither away accessible only to a few select researchers. A system such as LOCKSS could provide a cheap method to preserve ancient tomes and to promote wider access.
  • With the exception of reporting WHO has reviewed a paper (that's kept anonymous, at least in my neck of the woods), all journals offered by Stanford's Highwire Press (an e-publishing middleman) offer exactly the functionality you ask for: full-text searches, citation monitors, etc. In fact, a feature of e-journals via Highwire Press is that you can be alerted when someone cites a given paper, the assumption being that the citing paper is relevant to your interests as well.
  • This system has the potential to be implemented in such a way as to be openly accessible rather than limited to a set userbase, and that is what HeUnique seemed to be implying, which is why I wrote what I did. What it is now is not what it could be in the future, and this type of mechanism can be implemented in other distributed networks as well. And given that the content it stores does not necessarily have to be scientific journals, I think my point does apply.

    Huh?

    This kind of synchronizing mechanism is already implemented in all sorts of distributed networks. Any rudimentary distributed file system (e.g. Coda) will perform such synchronizations as necessary. The point is that LOCKSS was never intended to be a truly-open free-for-all system, and it respects copyrights by staying that way.

    No one in their right mind would create a Napster-like sharing program that automatically synchronizes files with many other users and implements a cache that NEVER erases (after all, isn't that your whole point about it being better than FreeNet?).

    I said it before, and I'll say it again - it's an apples-to-oranges comparison. Given sufficient effort and time, any software system has the potential to do almost anything. Saying this system has the potential to be a better FreeNet is akin to saying ICQ has the potential to be a better OS, if "implemented" in such a way. The goals of FreeNet and LOCKSS are fundamentally different in every sense, and forcing one to be the other would just give you a complex and inefficient hack of a system.

    Go get your free Palm V (25 referrals needed only!)

  • I am not associated with kuro5hin.

    I visit both sites.

    I point out k5 because I would like to see slashdot changed. If, before change, slashdot is replaced, so be it.

    I will not develop my own site using Perl or slashcode or whatever you call it.

  • If that's the case, please post while registered next time. No self-respecting AC would post that drivel. That AC checkmark box is a pox on this site. Either end AC posting or end the war on AC posters. Slashdot wants to have its cake and eat it too.
  • We live in an age where information is a commodity to be bought and sold. The power goes to those that possess the information. Corporatism dictates that nothing is free, so free information is anathema to the corporatist mindset. Corporations will always fight the notion of "free" anything.

    When 'proper' scientific online journals emerge - ones that allow online peer review, rapid publication (by which I mean hours, not weeks),

    That wouldn't be peer review. Certainly part of the submission latency is paper shuffling, but the larger problem is that a paper is condensed knowledge that has to be studied by someone who is really smart - ideally smarter than the people who submitted it ;) These people tend to be busy doing other stuff. The only way I can see of getting that kind of turnaround would be with a large pool of referees, and I don't think it's generally feasible. Current peer review has problems (viz. the publications that appeared after the initial cold fusion announcements, from, IIRC, Utah), but it isn't that bad (IMHO).

    A delay of weeks isn't a problem; that's excellent turnaround. Months is common, and I've seen years - that's a problem. I agree that in certain situations it might be very useful to have some kind of net meeting-house where scientists can discuss ideas and exchange results whilst leaving an information trail, so that there isn't a problem over credit when the dust settles and the papers are written.

    best wishes,

    Mike

  • Does anyone here understand the purpose of this?

    Yes, very much so. It is trying to solve a huge problem, and it's a nice try.

    Where people are missing the point is that they are assuming academic journal publishers are nice, happy people who live in the academic world and believe in sharing information. This just doesn't happen in real life.

    I work on a project involving linking abstract information to full text. Some publishers won't even allow "deep-linking" to individual articles on their web sites - I'm talking about one of the major scientific journal publishers here - their URLs have an encrypted hash at the end to prevent anyone from producing deep links.

    Publishers are incredibly protective of their copyright, and I just don't see something like this taking off for major journals. The article I read was talking about JSTOR - well, that's a not-for-profit organisation, and they only store back issues anyhow (a very useful service, don't get me wrong, but they're not a primary publisher).

    The desire of more researchers to be published has resulted in additional journals being formed. Because publishing a journal on the web is dirt cheap, it makes sense that with the Internet available, more of these journals will appear.

    The reason this isn't happening as quickly as it would appear is that existing scientific journals have a great "kudos" and prestige associated with them. Every article in those journals is peer-reviewed by respected academics in the same field. Having an article published in one of these big journals can greatly affect an academic's career prospects and pay.

    Doing this independently over the web is not impossible but it's a bit more difficult than just throwing up journal articles on a web site.

    I'd love to see a "Slashdot"-style journal where people put up articles and others within the field replied with comments, but we're a bit of a way away from this yet.

    Q.

  • I have removed kuro5hin from my profile because I am NOT associated with them and don't wish to cause confusion.

    I am also not associated with CNET - however they invariably post things 3 days before slashdot or 3 days after. Kinda funny actually.
  • My fear is that increasing amounts of resource will be poured into maintaining these academic papers for posterity, when many are nothing more than a rehash of earlier work, or turn out to be pure crap anyway.

    Yes, but the "pure crap" shouldn't get past the whole peer review process to start with, so you'll only be left with the material that is worthwhile. And any further moderation is censorship really - who decides what is and what isn't worthwhile? It'd have to be someone involved in the field in order to be able to properly judge it, but on what criterion are you going to judge whether one paper is worth more than another?

    In short, it needs moderating. Books and papers going out of print, or the final copies getting lost, are natures way of moderating irrelevant crap. Because, if it wasn't irrelevant, someone would have invested in preserving it.

    But what may seem to be completely irrelevant when it was first written may turn out to be essential to a later development a hundred years down the line. Especially in maths there are a lot of small developments which seem to be pointless at the time but which turn out to be a key part of a greater whole discovered later. You can't decide to throw things away on the basis of "relevance", since relevance is something which you can never tell at the time.

  • He was talking about scanning paper documents, i.e. those printed before computers became widespread.


    ...phil
  • I thought this was one of the reasons for the Library of Congress: to preserve information. Now, if we can just persuade the LoC to cat rectum | gunzip >head and realize the end of the 20th century is nigh we might not need things like this.
  • Many scientists keep electronic copies of articles they consider valuable or important in their field. (Not me, of course, I would never do this, unless it turns out it's not illegal.) I'm sure just about everyone I know keeps a little collection of pdf files, and collections of other kinds of data will certainly become popular to the extent that valuable articles are published in those formats.

    Anyway, it seems to me this ought to produce a natural survival of the fittest articles. Those articles that are most widely appreciated will be cached in the most locations. In a large enough field, this is almost certainly already the case. If less popular articles (i.e., those least appreciated by the scientific masses) are lost, this is probably no greater a tragedy than the loss of work which goes unpublished (i.e., those least appreciated by 2-3 reviewers).

    Of course, this enormous and rapidly growing archive is completely unorganized, and doesn't provide an easy mechanism for public access for the distant future. But the level of concern that valuable articles will be lost should be less than if people weren't already making enormous personal archives.
  • There's also the Internet Archive [archive.org], who are building a library of snapshots of publicly accessible Internet sites, currently standing at 14 terabytes of information stored on digital linear tape.

    The Internet grows at a rate of 10 percent a month, according to the Archive's estimates, while the average life of a Web page is only 75 days. Obviously, a lot of data is being lost. Much of that comes from commerce and media sites that often kill pages containing obsolete information. But some of this information is still relevant to researchers and historians.

    For example, the Internet Ecologies Area at Xerox's Palo Alto Research Center is using multiple snapshots from the Internet Archive on disk - "the Web in a box" - as a kind of test tube for understanding the Web.

    The ultimate goal of Internet Archive is to provide free access to the Internet's complete past, so that individuals looking for clues into how a culture changes will have one more medium to play around with.
  • Call it LIBNET.

    Hell you could even have Deja News index all the articles for you.

  • Agreed that paper is still the most useful form of storage if done right (acid-free, well stored, etc.), but the digitized form does address two major issues:

    (1) accessibility of the data from anywhere;
    (2) having an additional archive for redundancy.

    Being a history buff myself, I have something of an attachment to good old paper and its close friend microfiche. :)

    -L
  • This is definitely a great idea. A lot of people would lose out on information if it weren't for something like this. I hope experiments like this will even help push us towards preserving our printed material. Here at the university I attend, there has been a real push to preserve printed material in a digital format. I just wish this would help urge the Library of Congress to start some sort of "digital preservation" project.
  • And in an ideal world, archaeologists would be out of a job. And that's not just because an enemy of mine is an archaeologist :-)
    The reason it is both valuable and necessary to be able to store all kinds of documents in perpetuity is two-fold:
    (a) you will never lose anything by having more information available to you
    (b) only hindsight can tell you what kind of information will be valuable.

    If we had been able to store all the information about past civilisations then we wouldn't need archaeologists, who are in essence glorified hardware-based search engines!

    A far-fetched example to illustrate point B: far in the future, years after chickenpox and all other viral diseases have been eradicated, a random mutation creates a new chickenpox-like disease. Diseases were eradicated, so people decided storing information about how to cure them was unnecessary. The plague of ChickTwo wipes out all life on earth :-) (Hmm, I smell next year's blockbuster...)

    A more down-to-earth example (I always think of them later) for point A: you encounter an engineering problem that was attempted but never solved years ago. It would seem that documenting a failed attempt at something would be a bad idea, but in reality being able to check what everybody else has already tried would greatly accelerate your own attempts to solve the problem.

    Information is going to be so easy to store in such large amounts that soon the issue of whether to bother to store it will fade away. What does need work, as numerous others have mentioned, are better search engines and methods of ranking items by relevance, not importance: this is not the same as throwing them away.

    Relevance-based searching, as opposed to popularity-based, is why Google [google.com] is a better search engine than most/all others.

  • Well, it sounds like a good system.

    But..

    Papers scoring highly (a weighted average of the scores) could then be submitted to a more formal 'classic' peer review, then see real paper (thus allowing Real World income from the process).

    I would not use such a system, because there is no security for my valuable work... someone could take my idea, give it a twist, and push it into peer review. The current system assures a certain level of security (although bad things do happen, but rarely). However, if the system were used for a prolonged period, these problems would probably be solved...

    Jeroen

  • Yet another archiving solution? Ye-Gads!
    Disclaimer: I have an MLIS and I used to work for an organization affiliated with OCLC. I now work for a wholly owned subsidiary of Reed Elsevier, which btw is not participating in the project I am about to write about.

    I completely understand the need to archive data/research, especially research found in STM journals (Science/Technology/Medicine). History has shown us the dangers of not being diligent in archiving, AND it has shown us the difficulty of archiving. There already exists an organization in the library community that is providing an excellent archiving solution. That organization is OCLC [oclc.org]. They have been a repository since 1967, starting out by archiving cataloging records and sharing them (at cost, to preserve/maintain) with their membership.

    OCLC's archiving solution is called ECO, Electronic Collections Online, where a good number of publishers from around the world are supplying OCLC with digital copies of their journals to be maintained. Additionally, as technology and storage media change, OCLC has taken the lead in migrating that data to new standard formats as they evolve. Information on ECO may be found here [oclc.com]; specific information on the archiving is here [oclc.com] and the participating publishers are here [oclc.org].

    Of course everything has a cost. Any university taking on this type of activity should really do a serious study of why they are doing it, how much they are willing to spend, whether they or future administrations will continue funding the archiving project, and whether they should instead combine resources with agencies/organizations that are already doing this.

  • Your library must suck. The card-catalog computers in the Public Library of Charlotte & Mecklenburg County are new NEC computers built into an LCD monitor! They only run one app! I have rarely been to the library - I need to go more often - but I've also noticed that they have iMacs in what looks like the media section of the library, and other PCs scattered here and there.

    I think that this idea has some merit - nothing that libraries should allocate large amounts of money to, but worthwhile still. I hope they are archiving them in plaintext, though.
  • You're not entirely right about the impermanence of Freenet. First of all, nothing is ever discarded unless the allotted storage space is used up. Since this can be set by the node's owner, it is conceivable that nothing will be lost for a long time (especially in smaller Freenets, which could operate primarily among academics). This is discussed further in this section of the FAQ [sourceforge.net]. Additionally, one of the developers tells me that when fully implemented it may be possible to have a node request a file before deleting it, so that it can determine whether it has the last copy available; if so, the file could instead be compressed and archived, so that it can be recovered if need be (see the sketch at the end of this comment). Again, this would probably work better on an independent academic freenet than on a WWW-like one, where enough crap gets posted that saving everything would be cumbersome.

    Then again, something about these storage-space arguments strikes me as silly, in a world where there are multiple complete archives of Usenet...

    - Michael Cohn
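
    For what it's worth, the "request before delete" behavior described above could look something like this - NOT actual Freenet code, just a hypothetical sketch with a stubbed-out network lookup:

        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.util.zip.GZIPOutputStream;

        public class EvictOrArchiveSketch {

            // Stand-in for a real network request; a real node would ask
            // its peers for the key. Stubbed to "not found" here.
            static boolean peersStillHold(String key) {
                return false;
            }

            static byte[] compress(byte[] data) throws IOException {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                    gz.write(data);
                }
                return buf.toByteArray();
            }

            public static void main(String[] args) throws IOException {
                String key = "freenet:some-document";
                byte[] data = "document contents".getBytes();

                // Before evicting to reclaim space, check for other copies.
                if (peersStillHold(key)) {
                    System.out.println(key + ": other copies exist, safe to evict");
                } else {
                    byte[] archived = compress(data);
                    System.out.println(key + ": last known copy, compressed to "
                            + archived.length + " bytes and archived instead");
                }
            }
        }
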
  • Yes but the topic is about online documents. I.e., the goofball just read the headline and posted the first random thoughts that came to mind.
  • Right now we have little difficulty looking at journals (scientific or otherwise) from 150 years ago or more. The pages may be brittle, but the information is there. Obviously, documents thousands of years old still exist and provide priceless information to researchers.

    The larger goal of electronic preservation is to ensure (if possible) that the electronic data of today will have a lifespan equal to, or better than, a physical copy's. Right now this is not possible, and it actually seems that the situation for digital information is worse than for physical data. Think about it: depending on whom you talk to, a CD will last 25-75 years, and magnetic media have obvious limitations. Then there is the problem of technology/platform change. How can we guarantee that in 150+ years PDF, HTML, or even digital data itself will be readable? We could be using quantum computers with some bizarre storage medium that is completely incompatible with today's technology.

    A common solution seems to be to just transfer file formats to tomorrow's technology as it is created, and transfer the files from old media to new as they expire (a sketch of such a migration pass appears at the end of this comment). But at the rate humanity is accumulating knowledge, we could quickly be spending more of our time changing PDFs to whatever, and then that whatever to the next whatever, and so on. Another solution is to maintain the archaic platforms of today so that our files can be read. But however well cared for, mechanical devices will break down in time, so that is not a realistic option.

    Another possible option is to write and maintain emulators for all of the platforms in existence today, so that they can run linux or windows or whatever on the badass machines of tomorrow. But that runs into loads of proprietary technology/patent/copyright/legal issues that slashdotters are all well familiar with. Data needs to be freely available to researchers of the future. It should be just as easy (i.e., no license required) as it is to pick up a book off the shelf.

    So from what I know of digital library collection preservation, the situation at present is pretty grim. We are spending huge amounts of money in a rush to publish documents in digital format, with no assurance that 100 years from now (much less a thousand) this data will be available for general consumption. We are all hoping that it will just "work out", or that a technological panacea will emerge.

    For more on this topic, try this link [clir.org].
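
    To make the migration treadmill concrete, here is an entirely hypothetical sketch of such a pass: a registry maps at-risk formats to successors, and the archive is periodically swept, flagging anything whose format is about to become unreadable (the actual conversion is elided).

        import java.util.List;
        import java.util.Map;

        public class FormatMigrationSketch {

            record Item(String name, String format) {}

            // Formats we can no longer rely on, and what to convert them to.
            static final Map<String, String> MIGRATIONS =
                    Map.of("pdf-1.0", "pdf-1.4", "wordperfect", "xml");

            static Item migrate(Item item) {
                String target = MIGRATIONS.get(item.format());
                if (target == null) return item;        // format still healthy
                System.out.println("converting " + item.name()
                        + ": " + item.format() + " -> " + target);
                return new Item(item.name(), target);   // conversion itself elided
            }

            public static void main(String[] args) {
                List<Item> archive = List.of(
                        new Item("paper-001", "pdf-1.0"),
                        new Item("paper-002", "xml"));
                archive.forEach(FormatMigrationSketch::migrate);
            }
        }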

  • I think that 90% of slashdot readers would agree that the world has moved on since the days when math and physics were relavent

    Logically, the term 'relavent' (sic) requires a to: clause - relevant to what?

    ...today's economy demands is tech-savvy research ...

    OK so obviously it's a troll, but the points are worth refuting anyway

    Could a physicist have come up with Java ? MP3 ? Napster ? KDE ?

    dunno, but

    1. Tim Berners-Lee is a physicist, and we wouldn't be having this discussion if he hadn't looked into technological approaches to sharing scientific documents, and
    2. mp3 is a compression algorithm (the very word "algorithm" derives from al-Khwarizmi, the 9th-century mathematician), so a mathematician (or several) DID come up with mp3
    3. hardly life-changing examples, now are they?
    CS and Marketing, these are the Physics and Math of the new economy

    forget CS, think Engineering. Forget Marketing, it's an activity not a science. And then think, hmm, the biggest explosion in literature volume over the last twenty years is in the biomedical disciplines. That's not only where a lot of research is being done, but also where staggering amounts of money are being spent. And there can be no biology without chemistry, no chemistry without physics and no physics without maths.

    I think ...

    actually, it doesn't look much as though you do....

    I suggest you talk to some scientists at some point and get a clue

    TomV

  • True ACs rarely attack their own kind. Please uncheck the "post anonymously" box if you have any integrity.
  • Don't sweat it. I'm just being pissy.

    Your post was actually dead-on. The problems here are not technological; they are legal. Even if a company goes belly-up, its IP does not become public domain. It will most likely be held idle by liquidators until some other company can purchase it - even if it's bought just to be buried. Thus there is the very real possibility of losing the RIGHT to access this material once it's officially offline. Or at an online publisher's whim, if they put out something embarrassing.

    Rock on, Quaryon!

  • > (yes, I am one of the head Freenet developers)

    Then thank you for working on such an important system. If I ever get enough time in my schedule to give the code a good read-over... I will probably join development (actually... I am thinking about implementing a freenet server in Perl - is there an updated version of the protocol docs?)

    > And, "guaranteed anonymity" is pushing it
    > (there are no guarantees in life). See the FAQ.

    Well, not true... there is 1 guarantee... death. (Sorry to be so morbid... but it's true.)

    -Steve
  • While I'll agree that the "old" method of publishing does sometimes get a little conservative, I do think this method does a few things well. (I have been on both sides of the cueball, as a writer and as a referee...)

    • Slowing down publication is a good thing. It forces the writer to take the time to double-check the data, make sure that the paper is readable, and confirm that the conclusions being presented are correct. (You'd be amazed at what you catch.)
    • Contrary to popular belief, "peer review" does not (nor should it) happen instantly. When I received a paper to review, I would first read through it once, and then again to make sure that everything made sense. Then I would go to the library and look at some of the prominent references mentioned in the paper, to get familiar with the research and see the paper "in context". (Again, science does not happen in a vacuum.) Many times a scientific paper is very specialized, and even experienced referees may not be innately familiar with the subject matter. (Case in point: I was an experimentalist, and would often need to go find some of the latest theoretical work.)
    • Paper journals force you to live outside of your specialty. Online searches are too good at giving you 'exactly what you want'. When you thumb through the paper journal (or even the online version of a paper journal), you might find a paper that is pretty applicable to what you are doing. ("Hey, that's pretty close to what I'm trying to do!") You lose this with specific topic searches.
    Paper journals provide what online forums struggle with: crap control. It's not a perfect system, but it forces a little thoughtfulness into the process. (Besides, I can't imagine wading through a bunch of "FIRST POST" and "Hot grits in the pants" articles in the latest copy of Phys. Rev. Letters...)
  • . . . they make more money than God, by charging libraries $10K for a subscription simply because they KNOW THEY CAN. (No, really, there's at least one journal that costs ten thousand a year; I used to work in libraries in college.)

    Sure, journals cost a lot of money to produce (in the hardcopy world). But a whole lot of academic journals are simply an exercise in price-gouging. They charge $10K because they know damn well that the faculty of a university will DEMAND that the library carry a specific high-prestige journal. There isn't any fundamental reason to charge what they do -- witness their profit margins. I'm sure Elsevier would make noise about how their high-margin journals finance the low-margin ones, but that's simply a lie -- Elsevier makes too much raw profit for that to be the case.

    This, to my mind, is why online academic publishing is so important -- information won't be locked up in these expensive ghettoes any more, and more researchers (and students) will be able to access it.

    This is also, not coincidentally, why you won't see any major companies like Elsevier getting involved in low-cost online journals (for them it would be like killing the goose that laid the golden egg).

    The only way Elsevier will go online with their stuff is if they can charge multiple hundreds or thousands of dollars for access.

  • Surely the most basic difficulty with the idea is that even if the data is replicated enough times to ensure at least one copy always survives, that copy may never be found!

    If I were a university professor, I would be much more worried about the article being present in the same place it was when my student cited it in an assignment than about whether it existed at all. If the "place" where the knowledge lives isn't fixed, then it would be necessary to uniquely identify each copy of an article; otherwise you wouldn't know whether it was a true copy. (A content-derived identifier, sketched below, is one way to do this.)

    This is obviously a problem that needs to be tackled, but isn't the correct place to tackle it somewhere like the Library of Congress? The Library of Congress (and its counterparts worldwide) already gets a copy of each and every article, magazine, book, etc. printed. Would this not be the obvious place to locate an online scientific journal repository? They aren't going to go out of business!
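    One way to pin down what counts as "a true copy" without pinning down a place is to name each article by a cryptographic digest of its contents rather than by its location. A minimal sketch in Python -- purely illustrative, with hypothetical function names, and not the actual LOCKSS protocol:

        import hashlib

        def article_id(content: bytes) -> str:
            # Derive a stable identifier from the article's bytes. Any
            # holder of a copy can recompute this digest; a match means
            # the copy is bit-for-bit identical, wherever it was found.
            return hashlib.sha256(content).hexdigest()

        def is_true_copy(copy: bytes, cited_id: str) -> bool:
            # Check a retrieved copy against the identifier cited in,
            # say, a student's bibliography.
            return article_id(copy) == cited_id

    With identifiers like these, a citation stays checkable even after the article has migrated between servers, so the "place" no longer needs to be fixed.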

  • I think you may be a bit optimistic about a review time of hours. No matter how easy it is to publish the data on the web, the review process still involves people sitting down and reading the paper to see if it's Any Good(tm). This takes time. How much time depends on the paper and the field and the reviewer, but it's likely more than just hours.

    Maybe a good question to ask is what an electronic journal is supposed to offer over an electronic preprint server. Preprint servers exist for several of the sciences; the best known is xxx.lanl.gov, but there are other, more specialised ones too (my favorites are eprint.iacr.org / philby.ucsd.edu and the Electronic Colloquium on Computational Complexity).

    Generally preprint servers accept submissions from anyone and do almost no review before posting something -- something like "is this a scientific paper in English" (although standards do vary). Then it's up to the archive users to figure out whether something is any good or not. This means new submissions can go up almost as soon as they are received. The flip side is that sometimes things turn out to be wrong, and then they're marked as such *after* dozens of people have downloaded them and spent precious time trying to figure out what the heck is going on.

    So does the main "point" of an electronic journal come in having better peer review than a preprint server? or is there something else?
  • Our taxes also fund the local fire department, but that doesn't mean they have to give us rides in their big red trucks for free.

    I take it that by "Free software ethos" you really mean you want journals to be free? Part of the reason academic journals cost so much is that (a) they have a very limited audience, (b) they generally don't sell their space for advertising, and (c) their target audiences, research institutions and universities, can (usually) afford these prices. Subscription rates for individuals, while expensive, are not outrageously so, in my opinion, for most journals.

    If you've ever tried purchasing an esoteric book in the science or mathematics fields, you've probably experienced something similar: a 150-page book may retail for $150, while the local grocery store hawks pulp fiction by the metric ton. As you identified, this results from their business model: if you are only going to sell a few thousand of something, then a high markup is required in order to make even a modest profit on your work. While I agree that academic books and journals could be cheaper, and they should become so as distribution costs are lowered by electronic publishing, I doubt that they could be made completely free without sacrificing quality in the process. Many journals that publish electronically (for example, the Physical Review Letters) offer lower subscription rates for the electronic version of their journals than for the paper version.

    Incidentally, free electronic journal services do exist, e.g. the Los Alamos e-Print archive at xxx.lanl.gov [lanl.gov]. One thing you will probably notice is that while many of the articles are outstanding, just as many are "I wiped my nose this morning and decided what I saw on the tissue was publishable so here it is" quality. It's hit-or-miss with these articles sometimes. Standard practice among many disciplines is to archive an early draft of the work on the ePrint archive and then publish the refereed, edited, corrected version in a journal such as the Physical Review....

    In this enlightened scenario, there would be very much greater dissemination of the knowledge produced, to the benefit of a very much wider set of users.

    ...which brings me to my point: High quality, refereed journals that cost money are, in many ways, superior to unfiltered electronic archives precisely because they charge for their services and then in turn use a portion of that money to perform quality control. Part of what one pays for is the process of having experts in the field (hopefully) perusing each article closely to catch mistakes made by the authors or elucidate points the authors left unclear. Editors coordinate the refereeing process, and publishers maintain an infrastructure for ensuring this process happens in a timely manner. As long as people are willing to pay for quality control, then a market will exist for these journals. Electronic publishing can do away with many of the costs of publication and distribution of the information, but I don't see how it can reduce the cost to zero without asking publishers to simply get out of the publishing business altogether.
  • What we could do with is an online _based_ submission and review site for scientific papers; something based on the /. model (with a discussion area for online discussion and analysis of papers, some sort of versioning to allow corrections by the author, and the ability to rate papers on a scale of 1-10).

    This does sound like a good idea but I think the biggest problem here would be setting up a decent readership.

    I've been a PhD student now for a few years, and I find it difficult to read and critically assess the papers submitted to journals and conference proceedings in my field alone. And those are the papers that have already been peer reviewed. Slashdot works, I believe, because "many moderators make most trolls invisible". A Slashdot-esque review process in a small community would be susceptible to a small contingent of people overhyping garbage.

    Another problem that springs to mind is the current protocol of referencing sources when writing papers of your own. If these prototype papers are freely available, people will want to reference them. Care would have to be taken to make sure there is an easy way to do this reliably. There would be nothing worse than reading a good paper that makes impressive, bold claims only to find that all of its sources have disappeared off the face of the web! I guess this brings us back to the point of something like LOCKSS (which I believe is a very good idea).

    Ultimately, the scientific community is precisely that - a community. The internet is just starting to come up with excellent ways of getting communities to interact faster and better, but it's groups like Slashdot, web developers, and other net-centric collectives that are currently taking advantage of this, naturally. The scientific community was around a long time before the net and seemed to do okay, so I think it will take quite a bit of gentle persuasion to move it online. There are certainly worse goals to strive for, though, and I'd love to see a move like this. (A sketch of the versioned record such a site might keep follows below.)
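    For what it's worth, the versioning-plus-ratings site quoted at the top of this comment would not need exotic machinery. Here is a minimal sketch, in Python, of the kind of record such a site might keep; all names are hypothetical, and only the 1-10 scale is taken from the quoted proposal:

        from dataclasses import dataclass, field

        @dataclass
        class Revision:
            number: int     # 1, 2, 3, ... in submission order
            text: str       # full text of this draft
            changelog: str  # author's note on what changed

        @dataclass
        class Paper:
            title: str
            author: str
            revisions: list = field(default_factory=list)
            ratings: list = field(default_factory=list)  # ints, 1-10

            def revise(self, text: str, changelog: str) -> None:
                # Corrections append a new revision; old drafts are
                # kept, so references to earlier versions stay valid.
                self.revisions.append(
                    Revision(len(self.revisions) + 1, text, changelog))

            def rate(self, score: int) -> None:
                if not 1 <= score <= 10:
                    raise ValueError("ratings run from 1 to 10")
                self.ratings.append(score)

    Keeping every revision rather than overwriting, incidentally, also answers the next comment's worry about losing intermediate drafts.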

  • The other problem is not just that data is lost by simply disappearing; we also lose all the intermediate revisions of a document. All we end up with is the final final final, most recent draft. I suspect (especially with politically sensitive reporting, for historical purposes) that the intermediate copies could add a lot to the sum total of information.
  • I'm no History and Philosophy of Science expert, but my understanding was that one of the main reasons for "the rise of the conference" in fields like computer science was that publishing in journals is such a rigged game. Nowadays a lot of new results are first reported in conference papers, and when the theory is rounded out, a journal paper is prepared.

    There is no reason why the sums of money (which are, as mentioned in other posts, truly disgusting) given to these publishing companies cannot instead be used by universities and research institutes to build a distributed knowledge record for research.

    The Academic Publishing Industry is not long for this [brave new] world.

  • by gbnewby ( 74175 ) on Thursday May 04, 2000 @04:45AM (#1093367) Homepage
    One comment (the person with the MS LS/IS) mentioned part of the key: the cost of journals, including e-journals from 'regular' journal publishers, is astounding!

    Here at UNC-CH (where I serve on the Library Administrative Board; teach Library & Information Science; also I help run Project Gutenberg - good enough?), total subscription costs for journals published by Elsevier are around US$1 million/year. Subscription costs go up every year.

    The deal that Elsevier offered for access to their e-journal collection (electronic access to print journals) was a little complicated, but boiled down to:

    • UNC gets access to e-journals + print journals, with some extra e-journal titles thrown in "free"
    • UNC is not allowed to cancel ANY subscription for at least 3 years
    • Continued access is vaguely guaranteed - maybe it's through OCLC, maybe another organization. Basically, it's "trust us."

    Our solution has been to accept the deal (sort of Faustian, I'd say). At the same time, we've made local agreements with Duke and NC State to make sure one location keeps a print copy of every journal we're otherwise getting an electronic copy of.

    This way, libraries are sure they're continuing their archival role (with paper, in this case), but at the same time trying to offer the benefits of electronic access to their constituents.

    Bottom line: While we don't really know how to best maintain archives to ejournals, at least libraries can cooperate to make sure some sort of access is retained, while going forward with new e-journals.

  • Finally, someone who really gets it. I laugh at all the typical short-sighted beating on the method in the original post, and at the lame attacks on the value of archiving scientific data.
  • They are self-organizing and self-policing. If they do a bad job, then people start new ones to do a better job. The client chooses which ones to join or start based on their perception of quality.
  • I'm sure that if the LOCKSS system works out well the librarians and archivists will still complain.

    They are just the type who will complain about anything. I mean, they get to sit all day and stare at shelves of smelly old books, and they have nothing better to do than complain.

    Personally, I think that if they really want to do something they should all get themselves some really reliable laser printer [hp.com], and get started on making those electronic archives permanent. ;D
  • no problem! just put all your priceless articles on an NT webserver* in Word2000 format. they'll be available for the ages to Share and Enjoy!#

    * assuming the evil "hackers" don't trash your webserver (though we have no idea how, since there are no security holes in IIS)

    # at least until we release the next version of Word

    seriously, I don't think it's going to happen any time soon. I'm publishing a paper this summer, and there is a whole freaking PAGE I have to add in saying that SAE owns any and all rights to my paper, etc.

    personally, I find that a little scary.

    nor do I think that they're going to let just anyone archive them -- maybe that service that already archives about a zillion news sources. that's subscription-based, though, and I don't want to see what's going to happen once we don't have paper copies to rely on anymore!

    Lea
  • This doesn't just touch physics journals -- although the physicists are more likely to be rational about the issue than the record or movie industries. But no debate about copyright or intellectual ownership I have seen to date has looked at the long-term issues. And by long-term, I don't just mean years or decades; I mean thousands of years.

    Everyone decries the burning of the library of Alexandria. Its destruction has greatly impoverished us today; it denied us access to the thoughts and works of the people who wrote those books. The burning of the library of Alexandria was a lobotomization of human culture.

    But if Alexandria was a lobotomization, today's IP rules are senility. This is because, for short-term gain, they ignore the fact that the only way information survives long-term is if it is _copied_. No medium lasts forever. Pop quiz: how many books in existence today are more than 200 years old? Euclid's "Elements" survives today because it was copied, and copied again. The image of cloistered monks painfully hand-copying books survives to this day.

    We need to address the concerns not only of the artists, writers, creators, and IP corporations, and not only of the IP consumers, but also the concerns of our descendants a thousand years from now. Otherwise, we risk being a culture with no history, and therefore no future.
  • Why is paper considered more "permanent" by any standard? It is bulky; it requires trees, something we don't have a lot of; it is far too difficult to disseminate (as someone who has done quite some research, I can tell you how backbreaking it is to lug massive journals to and fro); and it is difficult to search. And if you find that the particular article you are looking for has been neatly cut out by someone, well, there's not much you can do about it except curse. Photocopying journals is a pain.

    Most important, the speed at which research is done today means that by the time a paper journal is compiled, reviewed, edited, and mailed, the whole thing is already outdated. Journals aren't something you curl up in bed with (usually). And data on hard drives is far easier to back up, archive, and disseminate than data on paper. Paper decays with time and degrades with use, which can't be said for electronic media. I, for one, think that paper journals can be completely done away with.

    Farhat.
  • If they're worried, they should just archive away.

    As for the journals making it difficult, I think the universities are paying way too much for an independent party to just organise and collate _their_ material.

    The universities should just get together and set up their own journals. They can then publish, subscribe and archive for whatever prices they want.

    Storage is pretty cheap considering the volume involved and how much they are already forking out to the journals per year.

    Cheerio,
    Link.
  • ...if there is already a good tool out there to do this sort of thing? Check out CVSUP [polstra.com], used to replicate all the web resources of FreeBSD [freebsd.org], as well as the online newspaper DaemonNews [daemonnews.org].

    Can anyone here say NIH?


  • paper will last longer than any computer.
    if you print on hemp or cotton paper, there are samples that have lasted more than 400 years. you really think that the html you create now and store on a hard disk or CD-ROM will still be accessible in 400 years... <ha ha ha ha ha!>

    anything physical/electronic is transitory -- get over it.

  • No, not obvious.

    I guess you are too young to remember the 8-track tape. Or videotapes in Beta format.

    CDs have been popular for about 15 years now, but what happens when the Next Big Thing hits?

  • Our taxes also fund the local fire department, but that doesn't mean they have to give us rides in their big red trucks for free.

    No, they put out fires for free; that is the purpose of funding them. The purpose of funding them is not to provide a taxi service.

    I concede some of the points you make: that there are free and low-cost electronic journals, that cost will never fully be wrung from the system, that you generally have to pay for quality, and that there is too much crud around.

    If government can fund research and education - as it does in the UK and EU, and presumably the US - can it not fund whatever costs cannot be wrung out of a system that dispenses with the Elseviers? Yes, you need reviewers and editors and administrators, and the better they do their jobs, the better the quality. But in my experience, reviewing for journals provides kudos for the reviewer, which they parlay into better and higher-paid academic jobs. The better the reputation of the journal they review for, the more the kudos. For me, writing and reviewing papers would seem to be part of the normal fare of the academic, whose salary is paid by the state. Editorial and admin tasks are surely not so onerous that their being funded by universities (The MIT Journal of This, The LSE Journal of That) would cause bankruptcy. I can see a Free Journal business model which appears to be Win (for readers), Win (for academics), Win (for government), Win (for society, and for the development and implementation of that which arises from the research), Win (for quality... or at least, no worse quality than is currently the case).

    Granted, finally, we might have to meet in the middle ground somewhere. Currently the stakes are well against the impoverished - or even the fairly wealthy - would be reader.

    And, I would argue, there is value in getting this information out of the University library shelf and into the hands of the masses (or that subset that is interested). Yes, we need better crud filters and less trash published. But we need better access now to the fruit of the research that we fund.

    Finally, I recall that back in 95-96, in the UK, there was much debate in government as to whether government publications should be sold or given away free on the internet. Thankfully, the decision from central government was "publish for free". And it's great. Now, with ease, I can read Hansard, Bills, Acts, Statutory Instruments, Research Papers: government recognising that citizens should not be charged to read those things they have already paid to have written. Let us see if the academics can be persuaded to look down from their lofty, comfortable, subsidised towers into the real world - small-town England - where there is the same demand to read that for which we have already paid.

    (I once had lunch with the chairman of Elsevier; nice bloke - he picked up the bill too.) Hmm. I think...

  • What you say is true for some large journal publishers, but certainly not for all. Remember that while many major journals are for-profit and have the interests you describe, many other journals (possibly more) are actually published by non-profit academic societies who have more interest in disseminating their information than in profit (although many of them still have serious issues with the control of their content, of course).

    In fact, as it happens, LOCKSS is being worked on in association with HighWire Press [stanford.edu] at Stanford (disclosure: also my employer :), which publishes the online versions of nearly 200 major scientific journals and has a _very_ good relationship with a large number of publishers. They also list the Journal of Biological Chemistry [jbc.org] and Science Online [sciencemag.org] as "partners".

    I wouldn't be so quick to dismiss the interest that the non-profit academic societies have in preserving this information.

  • Really, who cares? This isn't the burning issue. Fact is, any papers of any worth whatever will be downloaded and stored by others working in the same field. The University where the paper originates will certainly have a copy.

    The real issue, which everybody seems to be ignoring, is that most scientific journals charge exorbitant subscription fees. Remember that in order to cover some very active fields (e.g. neuroscience) you need access to a dozen journals. This puts scientific papers completely out of most people's reach, since most people don't live within short walking distance of a university library.

    Most published scientific research comes out of universities and government-funded laboratories and much of it is ultimately paid for by you the public. In any event most scientists today would surely agree that this most fertile fruit of human knowledge should belong to humanity at large, not a select few.

    Us open source types often bang on about how the open source model of software development is theoretically sound because it closely resembles the peer-review model of scientific progress. But while with open source software absolutely anybody can obtain code through easily accessible channels and absolutely anybody is welcome to contribute, with mainstream scientific research not only are laymen mostly excluded from actively participating, they also generally can't even get to read the published details.

    The whole system of publication in subscription-only journals is thoroughly outdated and completely inappropriate for the so-called "information age" IMHO.

    Consciousness is not what it thinks it is
    Thought exists only as an abstraction
  • This does sound like a good idea but I think the biggest problem here would be setting up a decent readership.

    I've been a PhD student now for a few years and I find it difficult to read and critically assess papers submitted to journals and conference proceedings in my field alone.
    Much of this is chicken-and-egg stuff - without a large, searchable database, the readers won't come, but without readers and submitters, you don't get a base to work from. It may work if the "classical" media allow their data to be imported.

    Another problem that springs to mind is the current protocol of referencing sources when writing papers of your own. If these prototype papers are freely available people will want to reference them.
    Given a suitably large database, references could be hyperlinks into the same server - obviously, you would have to stop people using papers still at Draft status as backreferences, though.

    I suspect that the problem is a startup one. Given a WEIGHTED average, a person with a number of accepted papers in the field would carry more weight than someone with none to his account; obviously, *enough* students would outweigh such an authority, but he would have to be pretty wrong for that many people to go against him (a toy version of this weighting is sketched below). The other requirement would be striking a balance between the advantages of anonymous posting (with an account - I don't think such a system would work with pure ACs) and people having to put their NAME to a piece they write.
    But hey, I am just throwing out ideas here, not offering to write the code :+)
    --
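    To make the parent's weighting idea concrete, here is a toy version in Python; the one-weight-point-per-accepted-paper scheme is my own assumption, not anything the poster specified:

        def weighted_verdict(reviews):
            # Each review is a (score, accepted_paper_count) pair.
            # A reviewer's weight grows with their accepted papers, but
            # everyone keeps a base weight of 1, so enough students can
            # still outvote a lone authority who is badly wrong.
            if not reviews:
                return 0.0
            total_weight = sum(1 + papers for _, papers in reviews)
            return sum(score * (1 + papers)
                       for score, papers in reviews) / total_weight

    For example, one authority with 40 accepted papers scoring a paper 2, against ten students all scoring it 9 -- weighted_verdict([(2, 40)] + [(9, 0)] * 10) -- comes out around 3.4: exactly the parent's point that the authority would have to be pretty wrong before that many people could override him.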

  • I would not use such a system, because there is no security for my valuable work... someone could take my idea, give it a twist, and push it into peer review.
    I suspect this may depend on the interaction between this system and the classical media. I suspect that the simplest method - logging who accesses draft papers, with the peer reviewers taking a dim view of the sort of claim-jumping you are proposing - would not work: people would end up asking friends to look things up in draft papers for them, just in case something they were already working on was already there and the access record made their own work seem like a seedy copy. Does anyone else here have any ideas on this?
    --
  • Good point. It should be legal for libraries that subscribe to journals to archive the articles they have paid for. This way, a distributed system of backups is maintained at all the universities that ever subscribed to a journal, much like the print versions. It can't take that much data storage to archive an existing online journal; in the worst-case scenario, several universities could band together to create multiple secure backups of the materials they have subscribed to. (A sketch of how such peers might cross-check their copies follows.)
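    To give a flavour of how subscribing libraries could cross-check those backups, here is a minimal majority-vote sketch in Python; it is my own illustration of the general idea, not the actual LOCKSS polling protocol:

        import hashlib

        def digest(article: bytes) -> str:
            # A fixed-length fingerprint of the cached article.
            return hashlib.sha256(article).hexdigest()

        def copy_is_healthy(local_copy: bytes,
                            peer_digests: list) -> bool:
            # Each participating library reports the digest of its own
            # cached copy. If ours disagrees with the majority of peers,
            # it has probably rotted or been tampered with, and should
            # be re-fetched from a library in the majority.
            mine = digest(local_copy)
            agreeing = sum(1 for d in peer_digests if d == mine)
            return agreeing > len(peer_digests) / 2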
