Science

Hoax-Detecting Software Spots Fake Papers

sciencehabit writes: In 2005, three computer science Ph.D. students at the Massachusetts Institute of Technology created a program to generate nonsensical computer science research papers. The goal was "to expose the lack of peer review at low-quality conferences that essentially scam researchers with publication and conference fees." The program — dubbed SCIgen — soon found users across the globe, and before long its automatically generated creations were being accepted by scientific conferences and published in purportedly peer-reviewed journals. But SCIgen may have finally met its match. Academic publisher Springer this week is releasing SciDetect, an open-source program to automatically detect automatically generated papers. SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance but is easily spotted as nonsense by a human reader.
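To make the "context-free grammar" idea in the summary concrete, here is a minimal sketch of how such a generator can work. It is not SCIgen's actual code or grammar: the rule names, the vocabulary, and the `generate_sentence` helper are all invented for illustration. Each non-terminal is replaced by a randomly chosen production until only plain words remain.

```python
import random

# Toy grammar. Every rule name and word below is an illustrative assumption,
# not something taken from SCIgen's real grammar files.
# Keys are non-terminals; anything that is not a key is a terminal word.
GRAMMAR = {
    "SENTENCE": [["NOUN_PHRASE", "VERB", "NOUN_PHRASE", "."]],
    "NOUN_PHRASE": [["ADJ", "NOUN"], ["the", "ADJ", "NOUN"]],
    "ADJ": [["stochastic"], ["replicated"], ["scalable"], ["wireless"]],
    "NOUN": [["archetypes"], ["methodologies"], ["symmetries"], ["epistemologies"]],
    "VERB": [["refines"], ["visualizes"], ["obviates"], ["emulates"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal words."""
    if symbol not in GRAMMAR:
        # Terminal: emit the word itself.
        return [symbol]
    # Non-terminal: pick one production at random and expand each part.
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Return one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # The last token is the period; attach it without a leading space.
    text = " ".join(words[:-1]) + words[-1]
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

Because the grammar only constrains structure, every output parses as a sentence while meaning nothing, which is why the text looks plausible from a distance and falls apart when a person reads it.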

  • So? Surely, after coding this up, the first thing any scientist would do is scan, at the very least, all of arXiv, and see what comes out as fake? I mean I have seen my fair share of papers that might as well have been generated by SCIgen and the like.

    • Re:Results? (Score:5, Funny)

      by I'm not god any more ( 613402 ) on Friday March 27, 2015 @05:30PM (#49357831)
      1. The first thing SCIgen should do is to incorporate SciDetect, to make sure that their random papers pass the SciDetect test.
      2. SciDetect should then improve their algorithms, and SCIgen should again take a snapshot of the SciDetect source code and incorporate it.
      3. Run this loop a few times and what we'll have is some serious papers.
      4. Profit!!!
      • by zerro ( 1820876 )

        Is there such a thing as a Turing Race?!

      • Well why not automate the process? SCIgen should just subscribe to the SciDetect source repo, and auto-update its copy when the trunk updates. SciDetect should then subscribe to the SCIgen source repo, and ensure that it detects any newly missed sets.

        Leave this system alone for a while, and we won't need to write articles anymore, as SCIgen should do a better job of producing insightful but unintelligible drivel than you'd get from any peer-reviewed journal -- and it would detect itself to boot!

      • Sorta like turnitin.com's business model. Require students to give you their content in order to get a grade, and scrape the web for text content. Sell lookups of newly submitted content against that content archive back to educational institutions. Then start up a pre-processing service for students to check their submissions against first before they submit to the teacher for a grade.

    • by Roger W Moore ( 538166 ) on Friday March 27, 2015 @05:34PM (#49357855) Journal
      arXiv is not peer reviewed. What I found interesting though was the response of the publisher: write a program to detect fake papers. Even the most simplistic peer review - i.e. reading the paper - would immediately catch these papers. If they need to write a program to catch fake papers then their peer review model is essentially worthless, and frankly a journal that poor is no better, and likely worse, than arXiv: at least arXiv doesn't pretend to have peer review.
    • Re:Results? (Score:4, Interesting)

      by phantomfive ( 622387 ) on Friday March 27, 2015 @05:40PM (#49357889) Journal
      Of all the problems you might find at arXiv, I don't think "auto-generated papers going undetected" is one of their problems.

      ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.
      • ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.

        Actually, each arXiv section has an editor who screens the papers, checking whether they have reasonable content. Unfortunately, legitimate papers are sometimes withheld for several weeks, and the arXiv administration does not respond reliably to emails (being understaffed and having many submissions). So arXiv is no longer just a preprint server where anyone can upload; it has turned into an opaque, half peer-reviewed journal that scientists read every day.

    • Just because there's a way to scan papers (to help you trick the system) doesn't mean everyone is going to use it. The smart ones will, but that doesn't mean plenty of stupid people won't.

      Just because a tool can't stop every bad guy doesn't mean it's useless. Even a professional will miss some. It's about reducing the numbers that get through.
  • Got it... (Score:4, Funny)

    by Anonymous Coward on Friday March 27, 2015 @05:15PM (#49357747)

    Software detecting papers written by software -- in the dark.

  • by Irate Engineer ( 2814313 ) on Friday March 27, 2015 @05:22PM (#49357789)

    Chicken chicken, (chicken) chicken?

    https://www.improbable.com/airchives/paperair/volume12/v12i5/chicken-12-5.pdf [improbable.com]

  • Evil tech? (Score:5, Interesting)

    by Anonymous Coward on Friday March 27, 2015 @05:26PM (#49357809)

    The purpose of the scam papers was to expose scam journals.
    The purpose of this new software seems to be to allow scam journals to continue scamming.
    So it's evil software that should not have been developed, right?

    I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.

    • by Anonymous Coward

      Publishing houses have thousands of "peer reviewed" journals to print. They don't have the time or the actual experts to read them; that is the job of the peers who buy the journal.

    • by pla ( 258480 )
      I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.

      This, so much this!

      Seriously - If I don't do my job and my boss catches me playing online poker all day, should I attach a response to my HR writeup explaining that I have addressed my deficiency by rearranging my cube to make it harder for others to see my screen???


      The problem here has nothing to do with people submitting fake papers, Springer. Rather, you need to stop hiring fake editors.
  • by Attila Dimedici ( 1036002 ) on Friday March 27, 2015 @05:31PM (#49357833)
    Springer reveals that they are not interested in fixing the problem revealed by SCIgen, they just want to prevent that software from demonstrating that they have not fixed it. They aren't going to change the review process to ensure that they no longer publish papers which are nonsense. No, they developed software to eliminate those papers which were generated by other software.
  • by Registered Coward v2 ( 447531 ) on Friday March 27, 2015 @05:40PM (#49357883)

    So a program designed to write fake papers to unmask sham journals and conferences gets used to write fake papers to prop up sham degrees? Somewhat ironic, although in fairness to the authors of the paper-writing program, they never intended it to be used in such a manner. It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified. A more subtle point in the article is that claimed publications from some countries, such as China, should be viewed with suspicion.

    As a side note, the sham conference industry is interesting. I periodically get, via LinkedIn, invites to attend an "important conference" and speak and get a "prestigious award" based on my "outstanding accomplishments and renowned expertise" in my field. Funny how, when I send them my speaking fee requirements, they never get back to me, nor do they mail me the award as I request if I am unable to make the conference.

    • by Anonymous Coward

      It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified.

      Sorry, I don't buy it. It only takes what, 2 seconds or less for an actual human to detect a phoney paper like chicken chicken chicken. I don't care how "inconvenient" it is to Springer, if I am paying for a subscription to a peer reviewed magazine I expect the papers presented in that magazine to actually be peer reviewed.

  • "SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance"

    This is great for students who have lazy professors. Write a good introduction on page 1, a good conclusion on page 52, and use SCIgen on pages 2-51.

  • by aaaaaaargh! ( 1150173 ) on Friday March 27, 2015 @07:12PM (#49358369)

    What bothers me is that in the humanities there are whole communities and sub-disciplines in which there is barely any real peer reviewing. These are small niche areas in which everyone knows everyone and basically the whole research is based on invited contributions and papers that are not properly blind peer reviewed - they are cursorily scanned by colleagues who know who wrote the article. In such a field there are about 5-10 journals in total and the authors jump back and forth between them. Most of them are unable to publish articles in top journals of the discipline as a whole. I personally know professors who have built a whole career on the basis of quoting themselves and by doing light editorial work. I know a cross-disciplinary field of study in the humanities that is entirely dominated by two professors; all the rest are their students. Each of the two wrote around 40 books, always on the same topic, and all of them more or less repeating the same two pseudo-competing themes over and over.

    It's pretty sad to see these people recognized as experts when at the same time in other fields there is hard work and real progress.

  • From this I read that Springer, instead of promoting measures to ensure real peer review and avoid these scam conferences, actually builds a program that helps these scam conferences. Well done.
  • At least they have done something to warrant their publication costs. I figured the charges were just all going to the CEO, now we see that some very small part of them went to hire a CSci intern for a few weeks.
    • by Z00L00K ( 682162 )

      Raise the stakes and detect lying politicians.

      It may be easier to detect when they are speaking the truth however.

  • The biggest source of these fake papers appears to be PhD papers. And given that we're producing more PhDs than ever before, maybe we should reform the way we do that. Because in requiring that they actually discover or examine something new, the chances are that they're going to lie about something.

    If we had fewer PhDs maybe they wouldn't do that so much. But the issue is that there are so many papers that no one can read them. And that means trying to audit this stuff is impractical.

    The solution of having

  • The existence of this tool is admitting that these papers aren't peer reviewed. Wouldn't it be simpler to just admit that and stop committing fraud?
