Science

Elsevier Opens Its Papers To Text-Mining

Posted by samzenpus
from the take-a-look dept.
ananyo writes "Publishing giant Elsevier says that it has now made it easy for scientists to extract facts and data computationally from its more than 11 million online research papers. Other publishers are likely to follow suit this year, lowering barriers to the computer-based research technique. But some scientists object that even as publishers roll out improved technical infrastructure and allow greater access, they are exerting tight legal controls over the way text-mining is done. Under the arrangements, announced on 26 January at the American Library Association conference in Las Vegas, Nevada, researchers at academic institutions can use Elsevier's online interface (API) to batch-download documents in computer-readable XML format. Elsevier has chosen to provisionally limit researchers to 10,000 articles per week. These can be freely mined — so long as the researchers, or their institutions, sign a legal agreement. The deal includes conditions: for instance, that researchers may publish the products of their text-mining work only under a license that restricts use to non-commercial purposes, can include only snippets (of up to 200 characters) of the original text, and must include links to original content."
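The numeric conditions in the summary (10,000 articles per week, snippets of up to 200 characters, a link back to the original) can be sketched in a few lines of Python. This is only an illustration of those limits: the function names, the record layout, and the example URL are assumptions, and nothing here calls or reflects Elsevier's actual API.

```python
from itertools import islice

WEEKLY_ARTICLE_LIMIT = 10_000  # provisional weekly cap quoted in the summary
MAX_SNIPPET_CHARS = 200        # snippet length cap quoted in the summary


def make_snippet(text, source_url, limit=MAX_SNIPPET_CHARS):
    """Return a hypothetical record: a quote of at most `limit` characters plus a link."""
    collapsed = " ".join(text.split())  # collapse runs of whitespace first
    return {"snippet": collapsed[:limit], "source": source_url}


def weekly_batches(article_ids, per_week=WEEKLY_ARTICLE_LIMIT):
    """Yield successive lists of at most `per_week` IDs, one list per week of downloads."""
    iterator = iter(article_ids)
    while True:
        batch = list(islice(iterator, per_week))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    record = make_snippet("Some   extracted   sentence. " * 20,
                          "https://example.org/article/123")
    print(len(record["snippet"]), record["source"])
    print([len(b) for b in weekly_batches(range(25_000))])
```

Under these assumptions, a corpus of 25,000 articles would take three weeks to download at the provisional cap.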

  • Isn't this called search engine spamming, and several publishing outfits have been doing it for about a decade, with varying degree of success?

    • by c0lo (1497653)

      Isn't this called search engine spamming, and several publishing outfits have been doing it for about a decade, with varying degree of success?

      While it may be SEO spamming, I'm inclined to see this as an attempt to outsource the cost of indexing.
      Along the lines of: "You fools, I have a trove of papers you are drooling for. What about... I'll let you index it in whatever way your brilliant minds discover works best for you, then I'll use it to increase the value of my trove"

  • 1. Please generate as many sales leads as you can 2. Profit!!!
  • by Anonymous Coward on Monday February 03, 2014 @03:15PM (#46143227)

    Publishing giant Elsevier says that it has now made it easy for scientists to extract facts and data computationally from its more than 11 million online research papers. Other publishers are likely t

  • Get Watson over here will you?
  • If the Internet is killing newspapers, why isn't it killing this dead tree company?

    • by dj245 (732906) on Monday February 03, 2014 @03:30PM (#46143389) Homepage

      If the Internet is killing newspapers, why isn't it killing this dead tree company?

      When people stop buying newspapers, they fire the reporters and news correspondents.

      When people stop buying scientific journals (and electronic access to such), it doesn't matter. There are still hundreds of professors lined up around the block to try to get published, since it is basically required for them to earn tenure. Anytime you have a barrier to career advancement, the people who own that barrier have a near monopoly and can charge whatever the market will bear. And the market of people trying to advance their career will bear a lot.

    • by John Bokma (834313) on Monday February 03, 2014 @03:41PM (#46143503) Homepage

      Because news or "news" [1] can be gotten for free on the Internet, while peer-reviewed scientific papers are a bit harder to come by. My experience is that quite a few sites bait Google search results (see my earlier post; you google for PDFs but end up on a landing page which allows you to buy one-time access for 30+ USD for a handful of pages). My successful workaround (so far) has been contacting one of the authors for a copy (for personal study).

      [1] a lot of people don't seem to care if it's made up or not

    • by Jane Q. Public (1010737) on Monday February 03, 2014 @03:41PM (#46143505)

      "If the Internet is killing newspapers, why isn't it killing this dead tree company?"

      It isn't a dead tree company, per se. Elsevier publishes as much online as offline. And more than most.

      Having said that: they can still die in a fire.

  • I like this bit from TFA:
    Shillum says that Elsevier is ahead of the curve — but that other publishers are likely to follow soon. CrossRef, a non-profit collaboration of thousands of scholarly publishers, will in the next few months launch a service that lets researchers agree to standard text-mining terms and conditions by clicking a button on a publisher’s website, a ‘one-click’ solution similar to Elsevier’s set-up.

    I would like to see that.

  • by DeadDecoy (877617) on Monday February 03, 2014 @03:31PM (#46143401)
    ... publishers removed the paywall to publicly funded literature, or at least made the prices more sane.

    Also, while we're on the topic of text mining, would it be possible to get text-only or XML-based articles, with figures attached and cross-references as needed? It's quite annoying to manually convert a PDF when trying to set up an automated analysis over several documents. I know one could set up a shell script to dump it out using the pdftoxml converter, but the output is a bit messy to parse.
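A rough sketch of the batch conversion described above, written in Python rather than shell. It swaps in Poppler's `pdftotext` (called as `pdftotext input.pdf output.txt`) for the pdftoxml converter named in the comment, because that tool's options are not given here. The directory names and function names are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def conversion_commands(pdf_dir, out_dir, converter="pdftotext"):
    """Build one converter command per PDF in `pdf_dir`; nothing is executed here."""
    out = Path(out_dir)
    return [
        [converter, str(pdf), str(out / (pdf.stem + ".txt"))]
        for pdf in sorted(Path(pdf_dir).glob("*.pdf"))
    ]


def convert_all(pdf_dir, out_dir, converter="pdftotext"):
    """Run the converter on every PDF; raises CalledProcessError if a conversion fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in conversion_commands(pdf_dir, out_dir, converter):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical directories; requires pdftotext to be installed and on PATH.
    convert_all("papers", "papers_txt")
```

Keeping command construction separate from execution makes it easy to inspect what would run, or to substitute a different converter, before touching a large set of documents.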
    • It wouldn't be nicer. It would be the least they should possibly do.

      Publishers like Elsevier are leeches sucking at the teat of scientific institutions, weakening their libraries, which are the cornerstone of humanity's research efforts. The sooner they FOAD the better.

  • Soon... once the exclusive contracts and the End User License Agreements expire, the users will revolt. It was foretold in the Scientific Prophecy of Rebirth.
  • that should have been pruned long ago.
  • by Anonymous Coward

    Haha, back in the 90's, I worked at a company that built some websites for Elsevier. The effort was overseen by a young Dutch woman who came to our offices and wanted to know why we didn't have orange juice and buns for her every morning.

    We designed a background image that looked great at normal viewing distances from the screen, but when seen from far away it looked like it really said "GReed-Elsevier". The sites went public, but we were made to change the background about a week after launch.

  • According to “Why you and I should NOT sign up for Elsevier’s TDM service” [0], this is not all that good, as the Text and Data Mining policy is actually overly restrictive. Most notably, it forces you to go through their API to do the work, rather than parsing things locally at your leisure, and imposes conditions on the release of the uncovered data (namely a non-free CC-NC license).

    [0] http://blogs.ch.cam.ac.uk/pmr/... [cam.ac.uk]

  • Note:

    If you have to sign or agree to something in order to access it, it's not free, even if they say otherwise.

    • Even a "Public Domain" copyrighted work has rules embedded in copyright law, which apply whether you agree or not. Games played entirely without rules get very strange, very quickly, and inevitably wind up with rules that evolved very quickly and not necessarily well.

      Having the rules spelled out, in writing, is very helpful to let both sides know what _is_ allowed. This is often far better than the very confusing and potentially dangerous lawsuits involving what is _not_ allowed. Whether these agreements are rea

