Education Science

Competition Seeks Best Approaches To Detecting Plagiarism

marpot writes "Does your school/university check your homework/theses for plagiarism? Nowadays, probably yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we are conducting a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
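
For readers wondering what a "random plagiarist" might look like in practice, here is a minimal sketch. It is not the competition's actual generator, which the submission does not describe in detail; the function name `obfuscate`, the window size of 4 words, and the filler vocabulary are all assumptions made up for illustration.

```python
import random

def obfuscate(text, n_ops=5, vocabulary=("lorem", "ipsum", "dolor"), seed=None):
    """Apply n_ops random word-level operations to text (hypothetical sketch)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    words = text.split()
    for _ in range(n_ops):
        if not words:
            break  # nothing left to operate on
        op = rng.choice(("shuffle", "delete", "insert", "replace"))
        i = rng.randrange(len(words))
        if op == "shuffle":
            # Shuffle a short window of up to 4 words starting at position i.
            window = words[i:i + 4]
            rng.shuffle(window)
            words[i:i + 4] = window
        elif op == "delete":
            del words[i]
        elif op == "insert":
            # Insert a filler word taken from the assumed vocabulary.
            words.insert(i, rng.choice(vocabulary))
        else:
            # Replace the word at position i with a filler word.
            words[i] = rng.choice(vocabulary)
    return " ".join(words)

if __name__ == "__main__":
    print(obfuscate("the quick brown fox jumps over the lazy dog", n_ops=3, seed=42))
```

Raising `n_ops` corresponds loosely to the "varying degrees of obfuscation" mentioned in the submission; a real corpus generator would presumably draw replacement words from something better than a fixed filler list.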


  • Hmmm.... (Score:1, Insightful)

    by Anonymous Coward on Tuesday April 28, 2009 @11:27AM (#27746177)

    Given that many, many teachers give out broadly similar assignments all over the country, how many years will it be until most possible ways of talking about, say, what Dante meant in a certain canto of the Inferno are in the database, making it impossible to write a paper without being suspected of plagiarizing? Especially if the system runs with a very low threshold (say, 3-4 words in a row that are the same = plagiarizing).

    It would really be interesting if all the published books on one particular subject (again, say, the Divine Comedy) were submitted to this service and a check was run about just how much 'plagiarizing' and 'original thinking' there is going around...
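
The low-threshold worry above can be made concrete with a small sketch. This is only an illustration of the "N words in a row" rule the comment imagines, not how any real detection product is known to work; `word_ngrams` and `shared_runs` are hypothetical names, and the two sample sentences are invented.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(suspect, source, n=4):
    """Return every n-word run that appears in both texts."""
    return word_ngrams(suspect, n) & word_ngrams(source, n)

if __name__ == "__main__":
    # Two sentences that could plausibly be written independently.
    a = "Dante places the traitors in the ninth circle of hell"
    b = "Satan is frozen in the ninth circle of hell"
    for run in sorted(shared_runs(a, b, n=4)):
        print(" ".join(run))
```

With a threshold of 4 words, these two sentences already share three runs, which is the false-positive problem in miniature: stock phrases about a common topic overlap whether or not anyone copied anything.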

  • Plausible test? (Score:5, Insightful)

    by fuzzyfuzzyfungus ( 1223518 ) on Tuesday April 28, 2009 @11:28AM (#27746193) Journal
    Now, I understand that plagiarism is common among the weakest of undergrad writers; but "machine translation from Spanish or German source documents" and "random text operations" seem like unrealistic experimental stimuli.

    In order to be a success, a plagiarized paper has to survive scrutiny by automated systems, if any are deployed, and human graders, if any are paying attention. Machine translation and text mangling should trivially defeat automated systems, at least any that aren't cranked well into World o' false positives territory; but would they pass human scrutiny? Even if they did, handing in something produced by machine translation and text mangling would probably earn you a referral to "Remedial English 101 For Life".
  • by DingerX ( 847589 ) on Tuesday April 28, 2009 @11:39AM (#27746363) Journal
    A plagiarised paper just smells bad, and is characterized by shifts in voice and writing style and sudden ignorance of the critical points raised earlier. The same author who can't write a grammatically correct sentence one moment is throwing down complex constructions the next. The harder part is identifying the source of the plagiarism. For undergraduate papers, even the harder part is trivial. After all, the point of plagiarism is that the author is too lazy to write anything original.

    For academics (professors), the situation isn't all that different. Plagiarism is usually a mix of stupidity, laziness and pressure to get stuff done. It usually happens where big, popularizing authors try to rip off the obscure ones (go back twenty years a la Mr. Ambrose, or pick something in a different language, preferably Italian), or when someone needs a book in an obscure field, and tries to pirate something really obscure.

    Even so, if a plagiarist has enemies who give a damn, they can find the source fairly fast. So why construct a test for the most obfuscated cases, when a plagiarist clever enough to obfuscate could simply write something original and sufficiently clever?
  • by eln ( 21727 ) on Tuesday April 28, 2009 @11:41AM (#27746375)

    That sort of thing is just unfair. In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe.

    However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

    I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs? Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

  • by MarkvW ( 1037596 ) on Tuesday April 28, 2009 @12:02PM (#27746649)

    Law enforcement uses automated fingerprint detection to identify possible matches. It never claims a match based on the computer.

    Using a program as the sole plagiarism judge and jury is profoundly unfair. If a university wants to discipline a student for a plagiarism hit, then it needs to obtain the source document--and pay the source document's creator if necessary to obtain it.

    Confronting the student with the alleged source gives the student a fair chance to defend himself/herself.

  • by Deagol ( 323173 ) on Tuesday April 28, 2009 @12:03PM (#27746669) Homepage

    I think that the objection here comes from the lack of transparency of the product being used. You input a paper, and you get a percentage answer. You're not given a list of papers/sources that registered a match (it would seem, anyway -- I don't know), thus you cannot verify the claims of the machine. Of course, being proprietary systems, I highly doubt that the vendor will allow inspection of the methods of detection or the database.

    The point is, that 35% means *nothing* useful without the exact context it was generated in.

    As we've seen with black-box voting machines, black-box web filters, and black-box breathalyzers, I suspect we'll see many lawsuits about black-box plagiarism detectors. After all, such a program can adversely affect one's long-term future, so the system better damned well be transparent and close to infallible (at least as much as the human-based method of detection).

  • by El_Muerte_TDS ( 592157 ) on Tuesday April 28, 2009 @12:06PM (#27746703) Homepage

    For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating.

    How would you know? The best cheaters won't be caught, but that doesn't mean they're not cheaters.

  • by Colonel Korn ( 1258968 ) on Tuesday April 28, 2009 @12:12PM (#27746791)

    Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

    Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke-out people who have no involvement in their "own work."

    What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

    We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!

    I never taught a class involving humanities paper writing (in the science classes I taught, I could detect borrowed work by asking our kids to explain the calculations in their presentations and reports), but my wife meets with students at least once after they turn in a required outline and bibliography to her. The bibliography, meeting, and my wife's extensive knowledge of scholarship in her field have made plagiarism rare and very obvious. Also, they make the students write vastly better papers and learn a lot more. Even having students meet with a TA to discuss paper ideas and progress is a huge help, and required outlines, drafts, and (especially) bibliographies should be part of the writing process in every lower level undergrad class. In upper level classes, the meeting is sufficient.

  • by johnsonav ( 1098915 ) on Tuesday April 28, 2009 @12:13PM (#27746813) Journal

    The best cheaters won't be caught, but that doesn't mean they're not cheaters.

    Sufficiently advanced cheating is indistinguishable from original work.

    How can you know that everyone isn't cheating? Do you give up? Or, try and pick the low-hanging fruit?

  • by Animats ( 122034 ) on Tuesday April 28, 2009 @12:14PM (#27746823) Homepage

    This is a useful mechanism for search engines, which need to distinguish original content from hundreds or thousands of blogs echoing it. Imagine the Web with all the duplicate, repetitive material ignored. No wonder Yahoo is supporting this. Someone over there is thinking.
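
As a rough idea of what spotting "echoed" content could involve, here is a sketch using word shingles and Jaccard similarity, a standard textbook approach to near-duplicate detection. Nothing in the story says this is what Yahoo or the competition entrants use; the function names and the shingle size `k=3` are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # both texts too short to compare; treated as no evidence
    return len(sa & sb) / len(sa | sb)

if __name__ == "__main__":
    original = "yahoo is sponsoring a competition on plagiarism detection"
    echo = "yahoo is sponsoring a competition on plagiarism detection this year"
    print(round(jaccard(original, echo), 2))
```

A search engine would need something far more scalable than pairwise set comparison, but the score illustrates the idea: pages that mostly repeat another page score close to 1, unrelated pages score close to 0.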

  • by bcrowell ( 177657 ) on Tuesday April 28, 2009 @12:14PM (#27746837) Homepage

    In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe. [...] the student should be expelled after the first offense

    I teach physics at a community college, and although I don't assign the kind of term papers you'd see in an English course, I do grade homework, lab writeups, and exams, and plagiarism is an issue that comes up. My school's policy is that the only punishment the professor can give for cheating is to assign a zero on that particular assignment. This is, in my opinion, almost no punishment at all; typically the reason people cheat is because they know they're going to fail, so assigning an F isn't a punishment, it's more like assigning the grade that the student actually earned. The school's administration tells us that this policy is the way it is because of a recent legal decision in California. Before this rule was imposed on us, my policy had been to give the student an F in the course if it was a serious case of cheating. In any case, my school, like most community colleges, has an extremely late drop deadline (the 14th week of the semester), so, e.g., if I give a student an F on an exam for cheating on the exam, the student will typically just drop the course, resulting in no penalty on his transcript other than a W, which will not affect his GPA.

    My school does provide a process where the professor can file a form to report academic misconduct. The form is then supposed to be followed up on by the dean, filed somewhere, and referred to later if the student shows a repeating pattern of cheating. Theoretically the student can be expelled, but never on the first offense. My experience is that this process doesn't actually seem to work, because the administrators involved aren't interested in spending the time and meeting with angry students. The threat hanging over the heads of the profs and deans is always that the parents will sue. Avoiding lawsuits is always the administration's top priority, far higher than education.

    The long and the short of it is that when a student makes a calculated decision to risk cheating, he's usually doing it based on a realistic assessment that the consequences of getting caught are extremely mild.

    However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

    There is absolutely no way, at least at my school, that a student would ever be expelled for plagiarism. To get expelled, you would have to physically attack someone. You seem to be imagining a situation in which the professor and/or the school punishes the student just because a particular piece of software flashes a message on the screen saying "plagiarized." I can't believe that anyone would ever do that. Of course you're going to look at the text that matched, and see whether you really believe that it looks like it was plagiarized.

    I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs?

    No, most professors do not have grad students to do this. I work at a community college. No grad students. My wife teaches at Cal State LA. They have grad students, but the grad students don't work as TAs or graders; the professors have to grade 100% of the written work.

    Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

    I don't think anyone does trust such a decision to a program. They use the program as a first step.

  • by Ralph Spoilsport ( 673134 ) on Tuesday April 28, 2009 @12:47PM (#27747257) Journal
    When you've got Markov Generators? [doctornerve.org]

    And the Postmodernism Generator? [elsewhere.org]

    You don't have to write much of anything at all. Would you get a good grade? Fuck no. Would they FLUNK YOU FOR IT? Fuck no. Because it's graded by untenured faculty who have to curry favour with students, or it's graded by Grad Assistants who don't give a shit, and why should they?

    Oh, look, a paper by Cindy Bleethstain. She's a fucking idiot. Let's see. Hmmmm. Yup. Incomprehensible bullshit, as usual. Give her a C+ because some of it is intelligible and kind of funny.

    Oh, look another paper by Guido LeDouchebag. Bottlecaps are smarter than this turnip. Hmmm. Yup. More incomprehensible bullshit. C+. At least he finally discovered the spellchecker.

    THAT'S what it is often like, unfortunately.

    I read the paper, and if there is a passage that is noticeably different in tone, I'll copy and paste a section into Google and see where they pulled it. 9 times out of 10, it's a direct lift from a web page, unattributed. I send it back, and tell them "Footnotes, please. Also, automatic single grade loss, right off the top."

    If it comes back still broken, then I nail 'em for plagiarism. It's a big deal, and requires paperwork I don't like to fill out...

    So far I've only had one student have the cojones to not bother fixing their attributions, and he got crucified by the Ethics board. He was an arrogant little prick, too.

    RS

  • by Idiomatick ( 976696 ) on Tuesday April 28, 2009 @12:56PM (#27747379)
    That is the goal. Culling the crappy cheaters is the same as culling the crappy students. So long as you are failing a high enough quota you will ensure a high enough quality of students make it through. This isn't new at all. Just make it so that to cheat and succeed it requires you to be as smart or smarter than someone doing the work legitimately.
  • by DangerFace ( 1315417 ) on Tuesday April 28, 2009 @01:05PM (#27747497) Journal

    I think this kind of hits the nail on the head. The problem with plagiarism detection is that if you're writing a paper on the Russo-German war of 1941, or classical conditioning, or yaddah yaddah yaddah, then unless you have found some significant new information, which is highly doubtful, everything you write will have been written before. The purpose of writing these papers - in general, at least - isn't to educate the entire field but to show that you have the ability to put together a coherent piece of work.

    In this day and age plagiarism is a bit like cheatbot.exe. When you can subcontract your work out to Indian PhDs, and Turnitin.com make every piece of work handed in to them, ever, available for download for a small fee, the only possible defence against plagiarism is decent teachers working decent hours and getting to know their pupils well enough to recognise their work. Admittedly, that isn't exactly ironclad but it's the best method for teaching anyway and it's the best way to avoid false positives, which is a priority for me since I don't plagiarise.

  • by Anonymous Coward on Tuesday April 28, 2009 @02:18PM (#27748489)

    I think the worrying part is that Filipino essay-factory workers earning $1/page can churn out papers deemed "good enough" by *higher*-education professors.

    Not worrying for the people earning $1/page, but worrying for the people paying $10,000 a semester for an education that doesn't noticeably distance them in ability from the guy making $1/page.

  • by AliasMarlowe ( 1042386 ) on Tuesday April 28, 2009 @03:24PM (#27749441) Journal
    The students cannot fake it, if the teacher cares about them learning.

    Many many many moons ago, I was a Chem. Eng. grad student. This was before the internet existed, and before my beard had turned gray. One of my duties to pay my way was supervising a lab course for undergrads, and marking the students' lab reports (they were expected to produce about 20 pages per week just on this one lab course). I insisted on interviewing them individually on their reports, where they had to explain their results and conclusions. Nobody tried faking anything twice, because it was caught immediately; they had to read up and understand the background, or they were in deep shit. That class got the highest average mark ever in the year-end exam on the associated theory (the professor was pleasantly surprised).
  • by CodeBuster ( 516420 ) on Tuesday April 28, 2009 @04:30PM (#27750343)

    Perhaps I'm letting my engineering background run away with me

    I don't think so. I received my undergraduate degree in CS and I remember the ongoing feuds between the Humanities and the Sciences and especially the engineering disciplines (which includes CS at many universities) which tend to be the more pragmatic and practical ones among the scientists. The humanists would always dismiss us and our profession(s) as merely a necessary evil of modern society, considering their own bullshit musings to be the highest form of human development, the epitome of achievement, and what we would all be doing if we were somehow freed from the concerns of daily living and left with unlimited time to devote to philosophical discussion of the human condition. Scientists, and to a lesser extent engineers, tend to view the entire history of this planet and humans in general as merely temporary concerns in a temporary society and seek instead to understand the universe itself which existed before us and will probably be around long after our demise. This necessarily leads them into the study of mathematics which is really the antithesis of the humanities and causes some rather spectacular misunderstandings as the two opposite world views clash; but enough of this bullshit, I am beginning to sound like a humanist rather than an engineer.

