Hoax-Detecting Software Spots Fake Papers
sciencehabit writes: In 2005, three computer science Ph.D. students at the Massachusetts Institute of Technology created a program to generate nonsensical computer science research papers. The goal was "to expose the lack of peer review at low-quality conferences that essentially scam researchers with publication and conference fees." The program — dubbed SCIgen — soon found users across the globe, and before long its automatically generated creations were being accepted by scientific conferences and published in purportedly peer-reviewed journals. But SCIgen may have finally met its match. Academic publisher Springer this week is releasing SciDetect, an open-source program to automatically detect automatically generated papers. SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance but is easily spotted as nonsense by a human reader.
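For readers unfamiliar with the term: a context-free grammar is a set of rewrite rules in which each named symbol expands into a sequence of words and other symbols, and text is produced by expanding a start symbol until only words remain. The sketch below illustrates that idea in Python. The grammar, vocabulary, and function names are made up for illustration; this is not SCIgen's actual grammar or code, which is far larger.

```python
import random

# Hypothetical toy grammar, invented for this example (not SCIgen's rules).
# Keys are non-terminals; each maps to a list of alternative productions.
GRAMMAR = {
    "SENTENCE": [["NP", "VP", "."]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB", "NP", "PP"]],
    "PP": [["in", "NP"]],
    "ADJ": [["scalable"], ["probabilistic"], ["omniscient"]],
    "NOUN": [["framework"], ["methodology"], ["archetype"]],
    "VERB": [["refines"], ["synthesizes"], ["obviates"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal words."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal and is emitted as-is.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Produce one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # Attach the final period to the last word and capitalize the first.
    text = " ".join(words[:-1]) + words[-1]
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for i in range(3):
        print(generate_sentence(seed=i))
```

Every sentence this produces parses as English, because the rules encode syntax, but nothing constrains meaning, which is why the output reads as plausible only from a distance.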
Results? (Score:2)
So? Surely, after coding this up, the first thing any scientist would do is scan, at the very least, all of arXiv, and see what comes out as fake? I mean I have seen my fair share of papers that might as well have been generated by SCIgen and the like.
Re:Results? (Score:5, Funny)
2. SciDetect should then improve its algorithms, and SCIgen should again take a snapshot of the SciDetect source code and incorporate it.
3. Run this loop a few times and what we'll have is some serious papers.
4. Profit!!!
Re: (Score:2)
Is there such a thing as a Turing Race?!
Re: (Score:2)
There is now!
Re: (Score:1)
Well why not automate the process? SCIgen should just subscribe to the SciDetect source repo, and auto-update its copy when the trunk updates. SciDetect should then subscribe to the SCIgen source repo, and ensure that it detects any newly missed sets.
Leave this system alone for a while, and we won't need to write articles anymore, as SCIgen should do a better job of producing insightful but unintelligible drivel than you'd get from any peer-reviewed journal -- and it would detect itself to boot!
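The subscribe-and-update loop described above can be sketched as a toy simulation. Everything here is hypothetical: the "generator" just picks a phrase the detector has not seen yet, and the "detector" is a plain phrase blacklist. Neither resembles how SCIgen or SciDetect actually works; the point is only the alternating update cycle.

```python
def detect(paper, known_phrases):
    """Toy detector: flag a paper if it contains any blacklisted phrase."""
    return any(phrase in paper for phrase in known_phrases)


def arms_race(phrase_pool, rounds):
    """Alternate generator and detector updates for a number of rounds.

    Returns a list of (paper, caught_before_update, caught_after_update).
    """
    known = set()  # the detector's blacklist, empty at the start
    history = []
    for _ in range(rounds):
        # Generator "update": use the first phrase the detector does not know.
        fresh = [p for p in phrase_pool if p not in known]
        if not fresh:
            break  # the generator has run out of unseen material
        paper = f"We show that {fresh[0]}."
        caught_before = detect(paper, known)
        # Detector "update": learn the phrase it just missed.
        known.add(fresh[0])
        caught_after = detect(paper, known)
        history.append((paper, caught_before, caught_after))
    return history


if __name__ == "__main__":
    pool = ["lambda calculus is NP-complete", "red-black trees are obsolete"]
    for paper, before, after in arms_race(pool, rounds=5):
        print(f"{paper!r}: caught before update={before}, after={after}")
```

In this toy version the race ends as soon as the generator exhausts its phrase pool; the joke in the comment above is that a real generator would keep getting better instead.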
Re: (Score:3)
Sorta like turnitin.com's business model. Require students to give you their content in order to get a grade, and scrape the web for text content. Sell lookups of newly submitted content against that content archive back to educational institutions. Then start up a pre-processing service for students to check their submissions against first before they submit to the teacher for a grade.
Interesting Response (Score:5, Insightful)
Re:Results? (Score:4, Interesting)
ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.
Re: (Score:2)
ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.
Actually, each ArXiv section has an editor who screens the papers, checking whether they have reasonable content. Unfortunately, legitimate papers are sometimes withheld for several weeks, and the ArXiv administration does not respond reliably to emails (being understaffed and having many submissions). So ArXiv is no longer just a pre-print server where everyone can upload; it has turned into an opaque, half peer-reviewed journal that scientists read every day.
Re: (Score:2)
Just because a tool can't stop every bad guy doesn't mean it's useless. Even a professional will miss some. It's about reducing the numbers that get through.
Got it... (Score:4, Funny)
Software detecting papers written by software -- in the dark.
Chicken chicken Chicken? (Score:4, Funny)
Chicken chicken, (chicken) chicken?
https://www.improbable.com/airchives/paperair/volume12/v12i5/chicken-12-5.pdf [improbable.com]
Re: (Score:1)
Don't forget the author's presentation on this article:
https://www.youtube.com/watch?v=yL_-1d9OSdk
Re: (Score:2)
No, I think it was Buffalo, and not Chicken: http://en.wikipedia.org/wiki/B... [wikipedia.org]
Re: (Score:2)
Let us not forget about the issues with mailing lists either.
http://www.scs.stanford.edu/~d... [stanford.edu]
http://www.vox.com/2014/11/21/... [vox.com]
Evil tech? (Score:5, Interesting)
The purpose of the scam papers was to expose scam journals.
The purpose of this new software seems to be to allow scam journals to continue scamming.
So it's evil software that should not have been developed, right?
I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.
How naive.... (Score:1)
Publishing houses have 1000's of "peer reviewed" journals to print. They don't have the time or the actual experts to read them; that is the job of the peers who buy the journal.
Re: (Score:3)
This, so much this!
Seriously - If I don't do my job and my boss catches me playing online poker all day, should I attach a response to my HR writeup explaining that I have addressed my deficiency by rearranging my cube to make it harder for others to see my screen???
The problem here has nothing to do with people submitting fake papers, Springer. Rather, you need to stop hiring fake editors.
It is too much trouble to fix the problem (Score:5, Interesting)
Authentic Frontier Gibberish (Score:3)
So a program designed to write fake papers to unmask sham journals and conferences gets used to write fake papers to prop up sham degrees? Somewhat ironic; although, in fairness to the authors of the paper-writing program, they never intended it to be used in such a manner. It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified. A more subtle point in the article is that claimed publications from some countries, such as China, should be viewed with suspicion.
As a side note, the sham conference industry is interesting. I periodically get, via LinkedIn, invites to attend an "important conference" and speak and get a "prestigious award" based on my "outstanding accomplishments and renowned expertise" in my field. Funny how, when I send them my speaking fee requirements, they never get back to me, nor do they mail me the award as I request if I am unable to make the conference.
Re: (Score:1)
It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified.
Sorry, I don't buy it. It only takes what, 2 seconds or less for an actual human to detect a phoney paper like chicken chicken chicken. I don't care how "inconvenient" it is to Springer, if I am paying for a subscription to a peer reviewed magazine I expect the papers presented in that magazine to actually be peer reviewed.
Lazy professors (Score:2)
"SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance"
This is great for students who have lazy professors. Write a good introduction on page 1, a good conclusion on page 52, and use SCIgen on pages 2-51.
Re: (Score:2)
I don't care about hoax papers (Score:3)
What bothers me is that in the humanities there are whole communities and sub-disciplines in which there is barely any real peer reviewing. These are small niche areas in which everyone knows everyone and basically the whole research is based on invited contributions and papers that are not properly blind peer reviewed - they are cursorily scanned by colleagues who know who wrote the article. In such a field there are about 5-10 journals in total and the authors jump back and forth between them. Most of them are unable to publish articles in top journals of the discipline as a whole. I personally know professors who have built a whole career on the basis of quoting themselves and by doing light editorial work. I know a cross-disciplinary field of study in the humanities that is entirely dominated by two professors, all the rest are scholars of them, and each of them wrote around 40 books, always on the same topic, and all of them more or less repeating the same two pseudo-competing themes over and over.
It's pretty sad to see these people recognized as experts when at the same time in other fields there is hard work and real progress.
Well done Springer (Score:1)
Good for Springer (Score:2)
Re: (Score:2)
Raise the stakes and detect lying politicians.
It may be easier to detect when they are speaking the truth however.
Perhaps there should be fewer papers (Score:2)
The biggest source of these fake papers appears to be phd papers. And given that we're producing more phds than ever before, maybe we should reform the way we do that. Because in requiring that they actually discover or examine something new the chances are that they're going to lie about something.
If we had fewer phds maybe they wouldn't do that so much. But the issue is that there are so many papers that no one can read them. And that means trying to audit this stuff is impractical.
The solution of having
Re: (Score:1)
I neither hate science nor discount the effort that goes into earning such accreditations. I merely point out that the number of such people has increased radically and that doing things the way we did them in the 19th century might not be the best way to do them in the 21st century.
*crushes mental insect and moves on*
Re: (Score:1)
How does saying phd papers instead of thesis papers either suggest or prove that I hate science?
Either back off that position and apologize or you've been caught in a lie and we're done.
Re: (Score:1)
I looked for a point in there and found only sputtering insults... so, I win?
because when my opposition is reduced to making sputtering insults... that's game over.
You want to try again or is this Good Game?
Re: (Score:1)
At no point did I trivialize a PhD.
Increase the amount of rat poison in your daily diet.
What I was saying is that we are producing so many of them that the means of auditing them used in the 19th century might not be applicable in the 21st.
Seriously... rat poison.
Re: (Score:2)
In what way does my statement trivialize the process?
BE SPECIFIC. SAY "WHY".
Then you say something is 100 percent bullshit but don't say why that is either.
Absent "why" you have no argument and therefore your post is a NULL statement.
Why am I wrong?
Why do people have such a fundamental difficulty with making a rational statement? It is baffling to me.
Re: (Score:2)
I didn't trivialize anything.
You're pushing a strawman and I'm tired of indulging your deceit.
We're done.
Re: (Score:2)
It exists only in your mind.
I have nothing but respect for those that go through a PhD program and I have nothing but respect for the education and the disciplines involved... so long as the people involved in them have respect for them as well. There are examples of fraud and I have no respect for them.
The mere fact that I am arguing against you so strenuously here proves that you misunderstood my intentions. If I did feel that way, then I would agree with your position... right? And yet I don't... which m
Re: (Score:2)
Stupid insults from an Anonymous Coward? Shocking.
Re: (Score:2)
Okay, so you open with a silly attempt to browbeat me on the grounds that my posts are often not grammatically correct... on an internet forum.
And on that basis you attempt to justify the statement that I am out of my depth in all issues... I mean, you say I don't proofread but you need to think over your arguments a bit more, sport. This crap is sad.
And then you say I am emotionally breaking down? On what basis? I assume your mind reading powers.
Your post was either logically unsustainable such as your
Re: (Score:2)
I actually do know how AC works. You chose to not use your fake name on the site because you're a weasel or too lazy to log on.
Peer review? (Score:2)