Education Science

Competition Seeks Best Approaches To Detecting Plagiarism 289

marpot writes "Does your school/university check your homework/theses for plagiarism? Nowadays, probably yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we are running a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
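
For readers wondering what a "random plagiarist" of this kind might look like, here is a minimal Python sketch. It is not the competition's actual generator: the function name `obfuscate`, the filler vocabulary, and the default number of operations are all assumptions made for illustration. It only applies the four word-level operations named in the summary (shuffle, delete, insert, replace) in random order.

```python
import random

def obfuscate(text, n_ops=5, vocabulary=("the", "a", "of", "and"), seed=None):
    """Apply n_ops random word-level operations to text (hypothetical sketch)."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    words = text.split()
    for _ in range(n_ops):
        op = rng.choice(("shuffle", "delete", "insert", "replace"))
        if op == "shuffle" and len(words) > 1:
            # "Shuffling" is modelled here as swapping two random positions.
            i, j = rng.sample(range(len(words)), 2)
            words[i], words[j] = words[j], words[i]
        elif op == "delete" and len(words) > 1:
            del words[rng.randrange(len(words))]
        elif op == "insert":
            words.insert(rng.randrange(len(words) + 1), rng.choice(vocabulary))
        elif op == "replace" and words:
            words[rng.randrange(len(words))] = rng.choice(vocabulary)
    return " ".join(words)

if __name__ == "__main__":
    print(obfuscate("the quick brown fox jumps over the lazy dog", n_ops=3, seed=42))
```

Raising `n_ops` corresponds loosely to the "varying degrees of obfuscation" mentioned above; a real generator would presumably draw replacements from a thesaurus rather than a four-word list.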
This discussion has been archived. No new comments can be posted.

  • by Erwos ( 553607 ) on Tuesday April 28, 2009 @11:39AM (#27746365)

    The tools are fairly good, but, in my experience, they'll always report 3-7% or so of your paper as plagiarized, just because it's pretty difficult to write about _anything_ without unknowingly using previously written words. I would _hope_ that anyone who would pursue disciplinary action from such a tool's results would at least take a look to see if the sections being flagged are consequential.

    I have no idea how good they are with catching paraphrasing, though... it strikes me that the semi-intelligent plagiarizers would be doing that more than a straight copy and paste. There's also the "acceptable vs unacceptable" distinction to be made.

  • by llamapater ( 1542875 ) on Tuesday April 28, 2009 @11:49AM (#27746481)
    It's a monkeys-on-typewriters thing. These companies add papers to their database as they compare them. If you feed enough papers into a database, eventually they will all come back plagiarized. There is not an infinite number of possible term papers; there are only so many things that make sense to write on a topic, and most English teachers recycle topics. Why English departments buy into this I don't understand. Let it go for long enough (it would only take another decade or two at most) and you will start seeing people who didn't even know they were plagiarizing getting kicked out of college. I'm not talking about improper citations; I'm talking about a guy in Washington having the same idea as a guy in New York 20 years later. I'm not a lawyer, so I don't know if this is possible, but couldn't they copyright these databases in some form or render them proprietary? If they did that, their business model could change to just collecting royalties.
  • by russotto ( 537200 ) on Tuesday April 28, 2009 @11:55AM (#27746557) Journal

    I once was on a Fido forum with someone who would often write responses nearly word-for-word identical to mine. It was uncanny; I'd see his post and recognize my own writing, only to realize it wasn't mine. Timestamps would sometimes show my post was written first, sometimes his. I imagine some others on the forum thought at least one of us was a sock puppet, but neither of us was.

    (If he's on slashdot, he's probably composing a post just like this one)

    That probably happens rarely. But build a big enough database, and it will happen often. Particularly given the restricted problem domains in undergraduate papers. It's not just a computer problem; even humans will think "plagiarism" when they see two papers with similar ideas and similar turns of phrase. Which I think demonstrates that plagiarism cannot be established satisfactorily merely by showing similarity between papers.

  • by mathx314 ( 1365325 ) on Tuesday April 28, 2009 @11:57AM (#27746587)
    It can be substantially higher than that as well. In high school I wrote a five-page paper about A Tale of Two Cities, a book by Dickens, with a few lengthy quotes. Since it wasn't a terribly long paper and I had lengthy quotes, I got somewhere around 20% plagiarized. Fortunately my teacher was smart enough to check before accusing me, but I remember hearing some talk from a later English teacher that the department was considering a 10% cutoff, above which you received disciplinary action regardless of the circumstances.
  • by BillCable ( 1464383 ) on Tuesday April 28, 2009 @11:59AM (#27746603)
    My wife teaches for Phoenix. Probably 90% of the plagiarism she sees is from students copying and pasting whole papers word-for-word from random cheat sites. Occasionally she'll get someone who fails to properly quote sources, but that's very much the minority. For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating. They're just hoping they get away with it.
  • by cpu_fusion ( 705735 ) on Tuesday April 28, 2009 @11:59AM (#27746619)

    Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

    Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke-out people who have no involvement in their "own work."

    What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

    We're trying to find a technology solution to a problem caused by too little student-teacher interaction. Typical!

  • by Areyoukiddingme ( 1289470 ) on Tuesday April 28, 2009 @12:03PM (#27746667)

    Seriously, the humanities are in trouble. With over 6 billion people on the planet, it's extremely difficult to have an original thought. This sets the stage for endless repetition. Add to that the fact that the very process of teaching the humanities usually means imparting a teacher's single interpretation of the source material to the students who then do the natural thing when it comes to writing a paper and parrot back to the teacher what they've heard, knowing that's the only way to get a good grade, and the resulting combination is deadly.

    The papers are all going to be similar from the beginning, because it's a rare instructor who actually encourages dissenting opinions (and that fault in teaching is a whole other discussion of its own). Then the papers are going to be similar because there really are only so many ways to interpret the source material that are defensible. And finally, the papers are heavily likely to be similar to at least one other paper written about the subject, when every paper ever written on the subject is considered (exactly what the plagiarism sites attempt to do).

    I think the problem this competition is trying to solve is intractable in the face of the current educational system. It's gotten to the point where, if the software considers a large enough number of sources, even the instructor's own papers are going to look like plagiarism.

    Hell, look at the Slashdot comment system. A million people read the front page, but only a few thousand post comments. Thousands more are content to simply moderate the comments, and face it, comments they agree with are more likely to be modded up, one way or another. Then compare the modded comments. We get a lot of duplicate or near duplicate thought, and hence near duplicate comments on every article. Why? Because when you get enough people together in one place, discussing the same subject in writing, there are only so many viewpoints and only so many comments that won't get modded down for being of the "cubic what?" variety.

    Time to go back to grading on spelling and grammar. We've reached the end of the grading on ideas road. Coherency of presentation is all we have left. (One could argue it's all we ever had.)

  • Uh, Use Google? (Score:2, Interesting)

    by chainLynx ( 939076 ) on Tuesday April 28, 2009 @12:04PM (#27746671) Homepage
    Here's a good article explaining how Google makes plagiarism detection easy: http://questioncopyright.org/node/4 [questioncopyright.org] There was a story a couple years ago about one of these plagiarism detection services, Turnitin, getting sued for copyright infringement... does anyone know if that went anywhere? http://education.zdnet.com/?p=953 [zdnet.com]
  • by SerpentMage ( 13390 ) on Tuesday April 28, 2009 @12:20PM (#27746917)

    I think that this is very dangerous...

    Let me tell you about a situation. I was a speaker until recently. Around '98 I was giving a talk on technology X. Another speaker, who was from the company that created the technology, also gave a talk on technology X. This other speaker and I knew each other, but we did not converse.

    Oddly our two talks were VERY VERY similar. He in a private manner accused me of copying his slide deck. Since he was a more well known speaker and I a newbie it seemed all logical.

    The matter was only dropped when a good friend of mine, who also worked at the company, jumped in and said, "Naa, he would not do that."

    Then when my good friend came later to talk to me he asked, "you did not copy, right?"

    Answer was a definite NO! I did not copy. We just happened to be thinking along the same lines and came up with a VERY VERY similar slide deck.

    In other words a fluke! And this is why I hate statistics and numbers without a thought behind it.

  • by jcohen ( 131471 ) * on Tuesday April 28, 2009 @12:24PM (#27746967) Homepage

    I realize that plagiarism detection represents an interesting problem in computer science, and that it goes some distance toward solving a serious problem. However, I read an article [chronicle.com] in the Chronicle of Higher Education, behind a paywall, alas, which leads me to believe that it is only a partial solution to academic dishonesty. The article suggested that, thanks to the Internet, the costs of human capital are now so low that hiring a ghostwriter to compose one's papers, sidestepping the problem of plagiarism to begin with, is far more expedient than plagiarism itself. It described a Russian-"businessman"-headed network of Filipino paper-writers, most paid between $1 and $3 a page, who are able to market their services to the West through a web site [bestessays.com] and remote call centers. At $20/page to the end-user, with no possibility of plagiarism detection, I think that most desperate students would find this a good deal. In my opinion, ghostwriting will supplant plagiarism as time goes on.

    What is a teacher to do? In-class writing samples would seem to be the only hope of detecting ghostwriting. Students could, of course, argue that at home, they can "polish" their papers, and that therefore they will not resemble the in-class samples. Moreover, checking samples against papers is a thankless and time-consuming task which is only a preliminary to actually evaluating the work. Perhaps there is a computer-based solution to this, but, in the meantime, perhaps potential ghostwriting customers could take their desires to their logical conclusion, and simply buy their degrees on the Internet directly.

  • My Dissertation (Score:3, Interesting)

    by Kryis ( 947024 ) on Tuesday April 28, 2009 @12:34PM (#27747113)
    The Computer Science department at my uni routinely scans final year dissertations using automated software. Mine was flagged up as "possibly plagiarised"; a significant amount of content could be found elsewhere on the web (can't remember the exact percentage).

    My project supervisor said that when he got the email from the system saying it came back positive he was very surprised, given the small amount of research in the area (there are only 5 or 6 papers on the same topic that I am aware of), and no other research on that exact method of solving the problem.

    When I found this out I was more than a little worried - I wasn't aware of copying any other work. It turns out that it had picked up on stupid stuff, like the boilerplate at the beginning of the dissertation, or phrases like "In conclusion,", and nothing longer than 3 or 4 words in any paragraph.

    This sort of plagiarism detection that detects word shuffling is fine for people that REALLY don't have a clue (i.e. the ones that forget to change the @author javadoc tag when copying their friend's Java coursework), but it would still be relatively trivial to change enough words in a sentence to fool the system.
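
The effect described in the comment above, where stock phrases of 3 or 4 words trigger a match, can be illustrated with a word n-gram ("shingle") overlap score. This is a minimal sketch, not how any particular commercial checker works: the function names `shingles` and `overlap_score` and the choice of `n` are assumptions. The point is only that a short window flags boilerplate like "In conclusion," while a longer window does not.

```python
import re

def shingles(text, n):
    """Return the set of word n-grams (lowercased, punctuation dropped) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(submission, sources, n=5):
    """Fraction of the submission's n-grams that also occur in any source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0  # submission shorter than n words: nothing to compare
    known = set()
    for source in sources:
        known |= shingles(source, n)
    return len(sub & known) / len(sub)

if __name__ == "__main__":
    paper = "In conclusion, the proposed method works well on small inputs"
    other = "In conclusion, we found nothing"
    print(overlap_score(paper, [other], n=2))  # stock phrase counts as a match
    print(overlap_score(paper, [other], n=5))  # longer window ignores it
```

A longer window trades false positives for false negatives: it stops flagging boilerplate, but changing one word in every five is then enough to slip through, which is the weakness the comment points out.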
  • by Urza9814 ( 883915 ) on Tuesday April 28, 2009 @12:35PM (#27747123)

    Here's the interesting thing: I have a professor that uses such tools (specifically TurnItIn.com), and I submitted a paper not too long ago and was told it was 6% plagiarized. No big deal. My prof said 20-30% would be allowable, as like you said, it's hard to write anything without seemingly plagiarizing. But the problem is this: I didn't cheat, I never saw anyone else's papers, and nobody ever saw mine. Yet a week later, after everyone else had submitted, suddenly I was at 23% plagiarized. Now, my professor didn't make any mention of it, but this raises a question about such services - they provide no way to see what the percentage was when submitted, only what it is now. And depending on how specific the prompt was that is being submitted, your percentage plagiarized can increase dramatically from other students submitting their own responses.

    Oh, and of course the reason nobody should act on such tools alone - I have yet to see one that can determine if a source has been cited or not. That doesn't mean there aren't ones that do that out there - I would be surprised if there weren't - but with TurnItIn as my example again, if I made heavy use of attributed quotes in my paper, I may start off with 20%+ plagiarized. And after everyone else submits I may even break 50%. Even without plagiarizing a single sentence. Anyone who is stupid enough to rely entirely on the score some program gives has no place in education.
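
One obvious mitigation for the attributed-quote problem raised above is to set quoted passages aside before scoring. The Python sketch below is purely illustrative and does not describe what TurnItIn or any other service does; `strip_quoted` and `quoted_fraction` are hypothetical names, and the code only looks for quotation marks, so it cannot tell whether a quote is actually cited.

```python
import re

# Matches text inside straight double quotes or inside curly double quotes.
QUOTED = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')

def strip_quoted(text):
    """Remove passages enclosed in double quotes and tidy the whitespace."""
    return re.sub(r"\s+", " ", QUOTED.sub(" ", text)).strip()

def quoted_fraction(text):
    """Share of the words in text that sit inside quotation marks."""
    total = len(text.split())
    if total == 0:
        return 0.0
    return 1 - len(strip_quoted(text).split()) / total

if __name__ == "__main__":
    essay = 'Dickens opens with "It was the best of times" and moves on.'
    print(strip_quoted(essay))
    print(quoted_fraction(essay))
```

A checker could score only the output of `strip_quoted`, and report `quoted_fraction` separately so a human can judge whether the paper leans too heavily on quotation.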

  • by kcdoodle ( 754976 ) on Tuesday April 28, 2009 @12:37PM (#27747155)
    If you have graded more than 2 assignments in your life, and really read each and every paper, and provided good critical feedback, then it is really easy to spot a plagiarized paper.

    Also, a grader usually knows the subject matter and has read many other good and bad works on the subject. You can get a feel for a person's writing style and depth of knowledge on a subject in just a few sentences. So when you "smell something fishy", something usually is.

    So far, whenever I "smell something fishy" I try to find the best sentence near the fishiness and paste it into Google. Plagiarists are not going to rewrite every sentence; if they do, then they probably learned something anyway. No, plagiarists are just lazy and in a hurry, and deep down they know they deserve to be caught.
  • Wrong Problem (Score:3, Interesting)

    by green1 ( 322787 ) on Tuesday April 28, 2009 @12:45PM (#27747219)

    They are trying to invalidate plagiarism detection software by proving that you can still manage to plagiarise in a way it won't detect (false negative). The thing is, this isn't the problem with plagiarism software; the real problem is where it detects plagiarism when none in fact took place (false positive). This will happen in a few ways:

    1) There have been several highly publicized incidents where students have been in big trouble for plagiarising their own work. This is ludicrous, they wrote it in the first place!

    2) A large enough database of phrases, paragraphs, etc. will eventually encompass the majority of ways of phrasing a particular idea, therefore when discussing an existing idea the odds of saying something that has been said before will eventually approach certainty.
    Now this wouldn't necessarily apply if you were inventing a whole new concept, but in most classes that is not what you are being asked to do; instead you are asked to research how something has already been done. There is bound to be duplication here, especially as the database grows. This doesn't mean you plagiarised something, merely that someone else has worded something similarly in the past. (For it to be plagiarism you would have had to have seen and copied that earlier work; in this case you may not even know about it.)
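
The "approach certainty" claim in point 2 can be put in numbers under a deliberately crude model. Assume, purely for illustration, that each document in the database independently has some small chance `p` of containing a phrase that coincides with yours; then the chance of at least one coincidental match among `n_documents` is 1 - (1 - p)^n. The value of `p` below is made up, and real documents on the same prompt are far from independent, so treat this as a sketch of the trend only.

```python
def chance_of_accidental_match(p, n_documents):
    """Probability of at least one match among n_documents, each matching
    independently with probability p (an assumed, simplified model)."""
    return 1 - (1 - p) ** n_documents

if __name__ == "__main__":
    p = 1e-6  # hypothetical per-document chance of a coincidental match
    for n in (1_000, 100_000, 10_000_000):
        print(n, chance_of_accidental_match(p, n))
```

Whatever the true `p` is, the result climbs toward 1 as the database grows, which is the false-positive drift the comment is worried about.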

  • by HikingStick ( 878216 ) <z01riemer@hotmaH ... minus herbivore> on Tuesday April 28, 2009 @02:09PM (#27748341)
    My problem with automatic checks is that there is always a chance that someone's seemingly original thought may actually reflect thoughts someone else already may have put down on paper. I remember being accused of copying someone else's work once. It was in the early '80s when the Internet as we know it was not part of general public awareness. When the instructor interrogated me on the sentence (one sentence in a paper at least five pages long), he insisted I copied it from some specific book or article. I had absolutely no clue what the guy was talking about. At the time, all I ever read was Fred Saberhagen, Tolkien, Piers Anthony, and Terry Brooks. After what seemed like forever (it was probably no more than ten minutes), he finally realized that I had no clue about his source, and I'm guessing he realized that whatever I wrote matched the way I wrote the rest of the paper and the way I used the spoken word. Yech! I haven't thought about that situation in a long time. I must be getting old.
  • by Archangel Michael ( 180766 ) on Tuesday April 28, 2009 @02:14PM (#27748409) Journal

    In my experience many professors (too many) basically ask for Plagiarized papers. There are a variety of reasons why they ask for plagiarized papers, mostly having to do with either laziness or wanting a particular view point regurgitated.

    People in my generation had Cliff Notes, from which one could spew forth a regurgitated version and get an A, while those with original thoughts would be graded much more harshly.

    I once researched a paper that was fully documented with sources and such, all my own research and writing, and got a D- on it. I finished the class basically plagiarizing my roommate's papers from the year before and scoring A's and B's on all the other papers.

    What is the point of using your brain when it is punished?

    Not all Professors are like this, but too many are.

  • by DudeTheMath ( 522264 ) on Tuesday April 28, 2009 @02:15PM (#27748439) Homepage

    The important point when "incorporating research done by someone else into your own" (as BrokenHalo mentions--see, I'm citing my quotation!) is to cite (not necessarily quote) the other someone. This is how I got my A's in college and HS. Failing to do so is plagiarism.

    If you use someone else's idea, you cite it ("Hey, someone else thought of this before me."). That's it. If you use someone else's words, you quote it ("Someone else said 'exactly this'.").

    If you don't use someone's exact words, it makes it harder to spot and/or prove plagiarism, but it doesn't mean you can't be caught. And the brain is an amazing thing: You'd be surprised how often that clever phrase you write two or three days later, regardless of intervening beer, is a nearly exact quote.

  • by Zerth ( 26112 ) on Tuesday April 28, 2009 @03:01PM (#27749159)

    I think it might be an April Fools' Day prank, but I found this: http://csma31.csm.jmu.edu/physics/rudmin/titan/titan.htm [jmu.edu]

  • by Anonymous Coward on Wednesday April 29, 2009 @10:37AM (#27759197)

    The students cannot fake it, if the teacher cares about them learning.

    Well, that sums it up fairly well.

    These "plagiarism detection" tools are only needed because the teachers don't (whether it's can't or won't) monitor and teach their students as individuals. After all, who really cares where they got their cribs/quotes - if they actually learned the stuff in the process.

