Web Pages Are Weak Links in the Chain of Knowledge

PizzaFace writes "Contributions to science, law, and other scholarly fields rely for their authority on citations to earlier publications. The ease of publishing on the web has made it an explosively popular medium, and web pages are increasingly cited as authorities in other publications. But easy come, easy go: web pages often get moved or removed, and publications that cite them lose their authorities. The Washington Post reports on the loss of knowledge in ephemeral web pages, which a medical researcher compares to the burning of ancient Alexandria's library. As the board chairman of the Internet Archive says, 'The average lifespan of a Web page today is 100 days. This is no way to run a culture.'"
  • by Anonymous Coward on Monday November 24, 2003 @10:36AM (#7547425)
    > Why don't we setup a sort of unique web page number ...

    Read the article. It mentions a system called DOI (Digital Object Identifier).
  • by ericspinder ( 146776 ) on Monday November 24, 2003 @10:36AM (#7547433) Journal
    You mean to tell me that those researchers found a dead link on the Internet? The horror! Where can I get one of those jobs?
    > Another study, published in January, found that 40 percent to 50 percent of the URLs referenced in articles in two computing journals were inaccessible within four years.
    That's because they were ads for companies that went out of business.

    Besides, if you want to see old pages, just go to the Wayback Machine [waybackmachine.org]. Between that and backup tapes, everything you ever wrote still lives (in many cases I wish it didn't!).

  • by kalidasa ( 577403 ) * on Monday November 24, 2003 @10:36AM (#7547434) Journal
    There already is such an identifier. It's called a Uniform Resource Identifier, or URI. See Berners-Lee's essay Cool URIs Don't Change [w3.org].
  • DSpace

    by Anonymous Coward on Monday November 24, 2003 @10:49AM (#7547540)
    Look at DSpace [mit.edu], the mission of which is "To create and establish an electronic system that captures, preserves and communicates the intellectual output of MIT's faculty and researchers."

    Each data set (collection) has a handle [handle.net], supposedly longer-lasting than URNs. We're talking about long-term data storage here.

    There's an implementation [cam.ac.uk] of it at Cambridge University, and my organisation will be evaluating it as soon as the SuSE Linux Enterprise Server software lands on my desk and I've installed my server.

    Tom.
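The DOI and handle suggestions above share one idea: a citation stores a stable name, and the current URL is looked up only when someone follows it. A minimal Python sketch of that lookup step, assuming the public resolvers at doi.org and hdl.handle.net (neither is named in the comments) and a "doi:"/"hdl:" prefix convention chosen here purely for illustration; the identifiers in the usage lines are illustrative, not real citations from the thread.

```python
from urllib.parse import quote

# Assumed public resolver endpoints. DOIs are themselves handles,
# so both names ultimately resolve through the Handle System.
DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "https://hdl.handle.net/"


def resolver_url(identifier: str) -> str:
    """Map a 'doi:PREFIX/SUFFIX' or 'hdl:PREFIX/SUFFIX' string to an HTTP URL."""
    scheme, sep, name = identifier.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'doi:...' or 'hdl:...', got {identifier!r}")
    bases = {"doi": DOI_RESOLVER, "hdl": HANDLE_RESOLVER}
    base = bases.get(scheme.strip().lower())
    if base is None:
        raise ValueError(f"unknown identifier scheme: {scheme!r}")
    # Keep '/' (it separates the naming-authority prefix from the local name)
    # and percent-encode anything else that is unsafe in a URL path.
    return base + quote(name.strip(), safe="/")


if __name__ == "__main__":
    # Illustrative identifiers only.
    print(resolver_url("doi:10.1000/182"))
    print(resolver_url("hdl:1234.5/678"))
```

The point of the design is that only the resolver has to track where a document currently lives; the citation itself never changes, which is exactly what a raw URL fails to guarantee.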

  • by YU Nicks NE Way ( 129084 ) on Monday November 24, 2003 @11:07AM (#7547650)
    Even if your statement accurately reflected the concerns in the article, it would still be misguided.

    Historians are concerned about all the ephemera of a civilization, not just the "official" ones. The random archives of everyday junk can, and often do, tell a very different story about the civilization than the story that the society would like to hear about itself, so historians treasure even the pictures you post for your family to see.

    For example, if you read the official press, you'd see a lot of articles about how bad the economy is for IT folk. That's entirely true, as far as it goes, but it only goes so far. The official press talks about the disappearance of jobs, and about the outsourcing of jobs, and about the unemployment rate, but doesn't talk about the fates of individual people displaced by the upheaval. Are the people who've been thrown out of work starving, or are they managing to live and to feed and clothe their families? The official story doesn't cover that -- but those silly little picture pages do, just by showing the children of these unemployed workers well-fed and dressed in new-ish clothes. Web pages are very cheap, so that indicates that the unemployed techies aren't starving.

    It's kind of like Monsieur Jourdain in Molière's play, who found out one day that he'd spent his whole life speaking prose. You've spent your whole life participating in the culture, and a record of that life is important to a historian interested in your culture.
  • by boneglorious ( 718907 ) on Monday November 24, 2003 @11:38AM (#7547861) Journal
    Do you think because you print it out it suddenly becomes a more stable reference? Sometimes people doing professional articles have to cite web pages because that's where the information they are talking about is.
  • by Xolotl ( 675282 ) on Monday November 24, 2003 @11:51AM (#7547963) Journal
    Printing it out merely saves the information in another location; it doesn't change the citation. Citations are not for yourself, but for other people to see where you got the information from: a journal, a private communication, or, in this case, a webpage. Once the webpage is gone, they can, of course, come to you for the information, but they can't check it for themselves, which is the point of citations in scientific articles.

    As for the point of citing webpages, they often contain information such as HOWTOs or work in progress, which may be very useful but is not yet published and perhaps never will be.

  • by c13v3rm0nk3y ( 189767 ) on Monday November 24, 2003 @11:51AM (#7547965) Homepage

    Hmmm. I'm not sure most scholarly works are allowed to just cite arbitrary URLs in inline references or footnotes.

    The idea is that you generally have to cite peer-reviewed, published and presented articles, criteria which the majority of web-published material simply does not satisfy. Web reading would fall under "course reading" and would have to be backed up by a "real" reference.

    According to my GF (currently working on a Master's in Anthropology), there is a lot of confusion about how to use the web for scholarly references. Many people cite URLs that are really just online archives of previously published work. In this case, noting the URL is like saying which library you checked the article out of and which shelf it was on. If you are an undergrad and cite a URL, it is almost a sure thing that the prof or the TAs will take marks off for improper citations.

    There are a few peer-reviewed journals that are (partly or completely) published online, in which case the URL might be a valid citation. This is likely to change, and it seems the original article was suggesting that we need to handle this case now, before we lose more good work.

    In a much smaller way, this is the kind of thing that those involved in the whole blog phenomenon are trying to resolve [xmlrpc.com]: making sure that their blogrolls, trackbacks and search-engine cached pages stay historically maintainable.

  • by mtpruitt ( 561752 ) on Monday November 24, 2003 @12:40PM (#7548477) Homepage

    Law journals have tried to cope with the proper weight of authority to grant web pages by following the Bluebook [legalbluebook.com], a citation manual.

    The general rule has been that whenever you can find something in print, cite to that, but add an internet cite when either it is available and would make it easier to find, or if it is only available online.

    Things that are only available online are surprisingly common in citations. The leading court reporter services (Westlaw and LexisNexis) both have cases that aren't "officially" printed but are available online.

    Also, many journal articles will cite to web pages such as a company's official description or press releases.

    In general, these citations are treated according to their functional purpose, not their medium: online cases are grouped (last) with other cases, and information from most web sites is treated as a pamphlet or other unofficial publication.

    This system seems to deal pretty well with the fact that web pages are ephemeral. The citations are really only used to make a point that is merely illustrative or is easily accessible to legal practitioners.

  • by cquark ( 246669 ) on Monday November 24, 2003 @02:13PM (#7549283)
    > The idea is that you generally have to cite peer-reviewed, published and presented articles, criteria which the majority of web-published material simply does not satisfy.

    While it's obvious that not every URL is appropriate for a research paper, papers in high-energy physics have cited preprints at arXiv [arxiv.org] by URL since 1991. It's not surprising to see less technical fields like anthropology further behind in understanding and using the technology, and high-energy physics has a particular advantage in that the web was originally created for disseminating information in that field.

    People interested in the evolution of an electronic knowledge architecture that's gradually replacing the print one in some scientific fields will likely find the articles Creating a global knowledge network [arxiv.org] and Can Peer Review be better Focused? [arxiv.org] interesting. Both are by Paul Ginsparg, who started the preprint archive 12 years ago at LANL.

    It's also worth noting that free, public access to preprints has democratized physics research: all researchers now have access to timely information, instead of only the few who had the right connections to get early copies of preprints before 1991. It also provides affordable access to physics articles for researchers at institutions whose libraries can't afford the five-figure subscription fees of many modern scientific journals.

  • by ihummel ( 154369 ) <ihummel.gmail@com> on Monday November 24, 2003 @02:59PM (#7549696)
    Archive.org provides an essential service to counteract the short lifespan of the typical webpage. It also allows for permanent links to webpages that might soon be gone. I personally think that academia should either pour money into archive.org or create its own specialized archive for academic websites.

    In the latter case, the service would archive sites of scholarly interest on its own, and it would have a feature allowing someone writing an academic paper to request that a particular page be archived. The page referenced in the paper would then be an http://academicarchive.org page, not the original.
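The archive-based proposal above amounts to citing a dated snapshot instead of the live page. A minimal Python sketch, using the Wayback Machine's URL layout (web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>) as a stand-in for the hypothetical academicarchive.org service; the function names, the example page, and the citation format are illustrative assumptions, not any style manual's rule.

```python
from datetime import datetime

# Assumed snapshot URL layout: /web/<14-digit YYYYMMDDhhmmss>/<original URL>.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def snapshot_url(original_url: str, captured: datetime) -> str:
    """Build a link to the archived copy of original_url taken at 'captured'."""
    timestamp = captured.strftime("%Y%m%d%H%M%S")  # e.g. 20031124103600
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def cite_web_page(title: str, original_url: str, captured: datetime) -> str:
    """Format a citation that keeps the original URL but also points at the snapshot."""
    return (
        f"{title}. {original_url} "
        f"(archived {captured:%Y-%m-%d}: {snapshot_url(original_url, captured)})"
    )


if __name__ == "__main__":
    # Hypothetical page and capture time, for illustration only.
    when = datetime(2003, 11, 24, 10, 36)
    print(cite_web_page("Example HOWTO", "http://www.example.com/howto.html", when))
```

Keeping the original URL alongside the snapshot records what the author actually consulted, while the dated copy gives readers something to check after the original page disappears.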
