Competition Seeks Best Approaches To Detecting Plagiarism

Posted by timothy
from the upsetting-the-market-in-online-term-papers dept.
marpot writes "Does your school/university check your homeworks/theses for plagiarism? Nowadays, probably Yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we conduct a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
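The "random plagiarist" in the summary can be pictured with a small sketch. This is only an illustration of the four word-level operations named above (shuffle, delete, insert, replace); the function name `obfuscate`, the filler `vocabulary`, and the window size are assumptions, not details of the competition's actual corpus generator.

```python
import random

def obfuscate(text, n_ops=5, vocabulary=("very", "indeed", "also"), seed=None):
    """Apply n_ops random word-level edits to text (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    words = text.split()
    for _ in range(n_ops):
        if not words:
            break
        op = rng.choice(("shuffle", "delete", "insert", "replace"))
        i = rng.randrange(len(words))
        if op == "shuffle":
            # Reorder a short window of up to four words starting at i.
            window = words[i:i + 4]
            rng.shuffle(window)
            words[i:i + 4] = window
        elif op == "delete":
            del words[i]
        elif op == "insert":
            words.insert(i, rng.choice(vocabulary))
        else:
            # Replace the word at i with a filler word.
            words[i] = rng.choice(vocabulary)
    return " ".join(words)
```

Raising `n_ops` corresponds loosely to the "varying degrees of obfuscation" mentioned in the summary: each operation changes the word count by at most one, so a handful of operations leaves most of the source text recognizable.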

  • by telchine (719345) * on Tuesday April 28, 2009 @11:22AM (#27746101)

    Here's an insightful fact related to this article:

    Little is known about plagiarism detection accuracy

    • by gnick (1211984) on Tuesday April 28, 2009 @11:31AM (#27746241) Homepage

      But a lot of faith is put in it. I've got a friend that works at the University of Phoenix. We caught up not long ago and he was singing praises about how you just dump a paper into this tool he uses and it instantly tells you the exact percentage of plagiarized content in the student's paper. Too high == disciplinary action, apparently without even bothering to track sources or verify specific plagiarized sections.

      Of course, this all came to me second hand - I've not used the tools myself.

      • by Erwos (553607) on Tuesday April 28, 2009 @11:39AM (#27746365)

        The tools are fairly good, but, in my experience, they'll always report 3-7% or so of your paper as plagiarized, just because it's pretty difficult to write about _anything_ without unknowingly using previously written words. I would _hope_ that anyone who would pursue disciplinary action from such a tool's results would at least take a look to see if the sections being flagged are consequential.

        I have no idea how good they are with catching paraphrasing, though... it strikes me that the semi-intelligent plagiarizers would be doing that more than a straight copy and paste. There's also the "acceptable vs unacceptable" distinction to be made.
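One way to see why a baseline of a few percent is hard to avoid is to compute a crude overlap score. The sketch below assumes a detector that counts shared word trigrams; that is an assumption made for illustration, not the method of any particular commercial tool, and the names `ngrams` and `overlap_percent` are made up.

```python
import re

def ngrams(text, n=3):
    """Return the set of word n-grams in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_percent(submission, sources, n=3):
    """Percentage of the submission's distinct n-grams found in any source."""
    sub = ngrams(submission, n)
    if not sub:
        return 0.0
    known = set()
    for source in sources:
        known |= ngrams(source, n)
    return 100.0 * len(sub & known) / len(sub)
```

Common phrases such as "on the other hand" match against almost any large collection, which fits the nonzero floor described above. A light paraphrase that changes one word in three can break every trigram, so a score like this says little about the "semi-intelligent" cases.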

        • Re: (Score:2, Interesting)

          by mathx314 (1365325)
          It can be substantially higher than that as well. In high school I wrote a five page paper about A Tale of Two Cities, with a few lengthy quotes, being a book by Dickens. Since it wasn't a terribly long paper and I had lengthy quotes, I got somewhere around 20% plagiarized. Fortunately my teacher was smart enough to check before accusing me, but I remember hearing some talk from a later English teacher that the department was considering a 10% cutoff, above which you received disciplinary action regardles
          • by samcan (1349105) on Tuesday April 28, 2009 @12:23PM (#27746947)

            Forget 20%, I had a rough draft with as high as 61%! The particular service we used in high school was Turnitin.com, and a research paper I wrote for high school had an appendix with a copy of the 1805 Treaty of Tripoli (as a help for the teacher)...the website flagged that as 18% plagiarized, from some random Bell Atlantic user's website.

            Excluding that, the site would flag random sentences, and would flag part of a sentence as plagiarized, skip a word or two, and then say the rest of the sentence was plagiarized from the same source!

            An example is shown below (words in bold are supposedly plagiarized from one source, words in italics from another):

            Thus, the Founding Fathers wanted to create a government that was stable, and protected the rights of the people.

            Another example from a paper on the Russo-German war of 1941:

            They propose that German troops push all the way to the outskirts of Moscow, causing Joseph Stalin to abandon the city. While escaping, his train is destroyed by German planes, removing all significant leadership to the Red Army.

            In another paper, when I quoted an article, I listed the title of the article in-text. Turnitin reported that the title of the article was plagiarism...of the article I was citing!

            Turnitin.com has "features" for excluding the quoted text, and excluding the bibliography, but as I use LaTeX, and like to use block quotes, the usefulness of these features is questionable.

            In my opinion, Turnitin.com is a joke.

            • Re: (Score:3, Insightful)

              by DangerFace (1315417)

              I think this kind of hits the nail on the head. The problem with plagiarism detection is that if you're writing a paper on the Russo-German war of 1941, or classical conditioning, or yaddah yaddah yaddah, then unless you have found some significant new information, which is highly doubtful, everything you write will have been written before. The purpose of writing these papers - in general, at least - isn't in order to educate the entire field but to show that you have the ability to put together a coher

              • by cayenne8 (626475)
                "I think this kind of hits the nail on the head. The problem with plagiarism detection is that if you're writing a paper on the Russo-German war of 1941, or classical conditioning, or yaddah yaddah yaddah, then unless you have found some significant new information, which is highly doubtful, everything you write will have been written before. The purpose of writing these papers - in general, at least - isn't in order to educate the entire field but to show that you have the ability to put together a cohe
                • by AliasMarlowe (1042386) on Tuesday April 28, 2009 @03:24PM (#27749441) Journal
                  The students cannot fake it, if the teacher cares about them learning.

                  Many many many moons ago, I was a Chem. Eng. grad student. This was before the internet existed, and before my beard had turned gray. One of my duties to pay my way was supervising a lab course for undergrads, and marking the students' lab reports (they were expected to produce about 20 pages per week just on this one lab course). I insisted on interviewing them individually on their reports, where they had to explain their results and conclusions. Nobody tried faking anything twice, because it was caught immediately; they had to read up and understand the background, or they were in deep shit. That class got the highest average mark ever in the year-end exam on the associated theory (the professor was pleasantly surprised).
        • by BillCable (1464383) on Tuesday April 28, 2009 @11:59AM (#27746603)
          My wife teaches for Phoenix. Probably 90% of the plagiarism she sees is from students copying and pasting whole papers word-for-word from random cheat sites. Occasionally she'll get someone who fails to properly quote sources, but that's very much the minority. For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating. They're just hoping they get away with it.
          • by El_Muerte_TDS (592157) <.elmuerte. .at. .drunksnipers.com.> on Tuesday April 28, 2009 @12:06PM (#27746703) Homepage

            For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating.

            How would you know? The best cheaters won't be caught, but that doesn't mean they're not cheaters.

            • by johnsonav (1098915) on Tuesday April 28, 2009 @12:13PM (#27746813) Journal

              The best cheaters won't be caught, but that doesn't mean they're not cheaters.

              Sufficiently advanced cheating is indistinguishable from original work.

              How can you know that everyone isn't cheating? Do you give up? Or, try and pick the low-hanging fruit?

              • Re: (Score:3, Insightful)

                by Idiomatick (976696)
                That is the goal. Culling the crappy cheaters is the same as culling the crappy students. So long as you are failing a high enough quota you will ensure a high enough quality of students make it through. This isn't new at all. Just make it so that to cheat and succeed it requires you to be as smart or smarter than someone doing the work legitimately.
              • by severoon (536737) on Tuesday April 28, 2009 @01:31PM (#27747807) Journal

                Sufficiently advanced cheating is called learning.

                Here's the proper way to cheat—it never failed me in university philosophy courses. Let's say you're supposed to read two or three of Nietzsche's works and write on the topic of Nietzsche: Feminist or Misogynist? You could try to read his books, but you won't understand them. Even if you do understand them, you will need to research his life in order to interpret his works in the proper context. And, after all that, when you finally do all this legwork, you'll only learn that he specifically designed his writings and behavior to lead you into a black hole. No normal human has a chance.

                So here's what you do. You put down the primary sources and go to the library. Read papers published by Ph.D. students that interpret Nietzsche's works and struggle to answer the question before you. Make notes on the general points of the argument and the supporting quotes across several of these papers (they're generally pretty short, and way easier to understand than the primary text). You can even read some Nietzsche if you're feeling adventurous, but I don't recommend it.

                Once you've formulated your own fervently held beliefs about Nietzsche in this way (by ripping them off of original thoughts by people that actually cared), you can leave with only your general notes outlining the arguments and citing the supporting quotes. If there's a good amount of material to choose from, make sure you choose an interpretation that is controversial (but well-supported)...don't turn in just another paper that will make the TA's eyes glaze or the professor want to put a gun in his mouth - liven things up a bit for those poor saps, they're stuck studying philosophy their entire lives! Look at all of the material you've collected and turn it over in your brain...try to synthesize your own controversial conclusion drawn from the points that others have worked so hard to create. Now go party for a couple of days to let it all sink in. The more beer you drink during this time, the less likely that some random quote you read will bubble up from the depths verbatim and get you busted. Once the requisite few days have been partied away, sit down and write the sentence or two that ties together all of the supporting material that you have decided to randomly & provocatively tie together. Include the points and supporting quotes to "prove" what you're saying.

                Instant A. Takes an hour, maybe two at the uni libe, and maybe another couple of hours to draft a typical 5-8 pager. This takes other students in the class weeks of devotion to achieve, leaving you plenty of time to study for your other courses, or study the local bar scene, or interact with the student body (as it were -winkwink-). The best part is, when you sit down to write after a couple of days of partying, it could be a paper, or it could be an in-class midterm or final. Whatever...either way, you're covered.

                • Re: (Score:3, Interesting)

                  by DudeTheMath (522264)

                  The important point when "incorporating research done by someone else into your own" (as BrokenHalo mentions--see, I'm citing my quotation!) is to cite (not necessarily quote) the other someone. This is how I got my A's in college and HS. Failing to do so is plagiarism.

                  If you use someone else's idea, you cite it ("Hey, someone else thought of this before me."). That's it. If you use someone else's words, you quote it ("Someone else said 'exactly this'.").

                  If you don't use someone's exact words, it makes

        • Re: (Score:3, Interesting)

          by SerpentMage (13390)

          I think that this is very dangerous...

          Let me tell you about a situation. I was a speaker until recently. And around 98 I was giving a talk on technology X. Another speaker who was from the company who created the technology also gave a talk on technology X. Me and this other speaker knew each other, but we did not converse.

          Oddly our two talks were VERY VERY similar. He in a private manner accused me of copying his slide deck. Since he was a more well known speaker and I a newbie it seemed all logical.

          It was

        • Re: (Score:3, Interesting)

          by Urza9814 (883915)

          Here's the interesting thing: I have a professor that uses such tools (specifically TurnItIn.com), and I submitted a paper not too long ago and was told it was 6% plagiarized. No big deal. My prof said 20-30% would be allowable, as like you said, it's hard to write anything without seemingly plagiarizing. But the problem is this: I didn't cheat, I never saw anyone else's papers, and nobody ever saw mine. Yet a week later, after everyone else had submitted, suddenly I was at 23% plagiarized. Now, my professo

      • by eln (21727) on Tuesday April 28, 2009 @11:41AM (#27746375) Homepage

        That sort of thing is just unfair. In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe.

        However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

        I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs? Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

        • by Erwos (553607)

          I don't really get what you're saying. If the program is showing 35%+ of the paper as plagiarized, that's pretty much a preponderance of evidence right there. The program will tell you where the plagiarism is from, too, if it's anything like what I used.

          • Re: (Score:3, Insightful)

            by Deagol (323173)

            I think that the objection here comes from the lack of transparency of the product being used. You input a paper, and you get a percentage answer. You're not given a list of papers/sources that registered a match (it would seem, anyway -- I don't know), thus you cannot verify the claims of the machine. Of course, being proprietary systems, I highly doubt that the vendor will allow inspection of the methods of detection or the database.

            The point is, that 35% means *nothing* useful without the exact contex

            • by Erwos (553607)

              The app I used not only told you what the plagiarized source was, but also gave you the passage that was plagiarized from. So your objection is irrelevant. In fact, I specifically addressed it in the post you're replying to.

              These detectors are not black boxes at all.

              • by radtea (464814)

                These detectors are not black boxes at all.

                False. The detector you used was not a black box. The discussion you're replying to specifically cited turnitin.com as a detector that IS a black box.

                At best you have a plausible conjecture based on your limited experience that all detectors are transparent and give adequate feedback on the sources of suspected plagiarism, but as someone once said, "plausible conjecture should not be misrepresented as proof positive."

            • Re: (Score:3, Informative)

              When I was at university, one of the lecturers showed us the plagiarism detection tool. Sure, it gave you a percentage, but it also gave you some output showing the passages in the text vs. what the program thought those passages had been taken from. He showed that most of the things that the tool had detected there were inconsequential, on the paper he was using for the demonstration.
          • The program needs to justify its accusation - if 35% of the paper is plagiarized, it should be able to provide some lengthy passages from some other site that match the paper.
          • I don't really get what you're saying. If the program is showing 35%+ of the paper as plagiarized, that's pretty much a preponderance of evidence right there. The program will tell you where the plagiarism is from, too, if it's anything like what I used.

            You raise a good point. If the computer says it's plagiarism, then it is. Assuming that plagiarism is defined as, "what the program catches".

        • by bcrowell (177657) on Tuesday April 28, 2009 @12:14PM (#27746837) Homepage

          In my opinion, plagiarism is indeed a heinous crime in an academic setting because it goes against everything the pursuit of academics is supposed to be about. Given that, the punishment should be severe. [...] the student should be expelled after the first offense

          I teach physics at a community college, and although I don't assign the kind of term papers you'd see in an English course, I do grade homework, lab writeups, and exams, and plagiarism is an issue that comes up. My school's policy is that the only punishment the professor can give for cheating is to assign a zero on that particular assignment. This is, in my opinion, almost no punishment at all; typically the reason people cheat is because they know they're going to fail, so assigning an F isn't a punishment, it's more like assigning the grade that the student actually earned. The school's administration tells us that this policy is the way it is because of a recent legal decision in California. Before this rule was imposed on us, my policy had been to give the student an F in the course if it was a serious case of cheating. In any case, my school, like most community colleges, has an extremely late drop deadline (the 14th week of the semester), so, e.g., if I give a student an F on an exam for cheating on the exam, the student will typically just drop the course, resulting in no penalty on his transcript other than a W, which will not affect his GPA.

          My school does provide a process where the professor can file a form to report academic misconduct. The form is then supposed to be followed up on by the dean, filed somewhere, and referred to later if the student shows a repeating pattern of cheating. Theoretically the student can be expelled, but never on the first offense. My experience is that this process doesn't actually seem to work, because the administrators involved aren't interested in spending the time and meeting with angry students. The threat hanging over the heads of the profs and deans is always that the parents will sue. Avoiding lawsuits is always the administration's top priority, far higher than education.

          The long and the short of it is that when a student makes a calculated decision to risk cheating, he's usually doing it based on a realistic assessment that the consequences of getting caught are extremely mild.

          However, since the punishment for plagiarism should be severe, there should be great care to investigate it properly. If you can show a preponderance of evidence that not only is a paper plagiarized, but you can accurately identify the source(s) from which each plagiarized section of it was copied, then the student should be expelled after the first offense. If you can't come up with that evidence, though, you should not be punishing the student.

          There is absolutely no way, at least at my school, that a student would ever be expelled for plagiarism. To get expelled, you would have to physically attack someone. You seem to be imagining a situation in which the professor and/or the school punishes the student just because a particular piece of software flashes a message on the screen saying "plagiarized." I can't believe that anyone would ever do that. Of course you're going to look at the text that matched, and see whether you really believe that it looks like it was plagiarized.

          I thought professors had legions of grad students to ferret this sort of thing out, why do they need these programs?

          No, most professors do not have grad students to do this. I work at a community college. No grad students. My wife teaches at Cal State LA. They have grad students, but the grad students don't work as TAs or graders; the professors have to grade 100% of the written work.

          Trusting a decision that could permanently impact a student's entire life to a computer program seems careless and dangerous.

          I don't think anyone does trust such a decision to a program. They use the program as a first step.

        • by MickLinux (579158)

          I thought professors had legions of grad students to ferret this sort of thing out

          Well, theoretically, yes. However, in my experience, the grad students would assign 10 times too much work, then not grade it all semester, and at the end of the semester call them all "90", minus 10x number of days late. So those who did the work get Ds, while the 80% who don't even bother get Bs. Then the faculty support them, because they weren't paying attention, and it would look bad for 80% of their class to fail the

    • Award

      Yahoo! Research will award a cash prize of 500 Euros to the winner of the competition.

      Wow, 500 Euros for solving a problem that every single college in the world would pay good money to have? Sounds like a gyp for the guy who wins.

      "Yeah, thanks for spending time and effort to solve this complex problem. Here's your 500 Euros. Now we're going to go sell the pants off of this and make millions. Have a nice day!"

  • by svendsen (1029716) on Tuesday April 28, 2009 @11:23AM (#27746131)
    Does your school/university check your homeworks/theses for plagiarism? Nowadays, probably Yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we conduct a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed who attempts to obfuscate his plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation
    • by snarfies (115214)

      Does your college check your theses/homeworks for plagiarism? Nowadays, probably so, but are they doing it correctly? Not much is known about the accuracy of plagiarism detection, which is why we conduct a competition on plagiarism detection, which was sponsored by Yahoo! We have set up a body of fake plagiarism which consists of plagiarism with varying degrees of translation plagiarism from Spanish or German source documents and obfuscation. A randomly selected plagiarist was used who tries to cover her

      • Does your school verify his thesis / homeworks of plagiarism? Today, probably, but they are done properly? Not much is known about the accuracy of detection of plagiarism, which is why we carry out a competition in the detection of plagiarism, which was sponsored by Yahoo! We created a body that is false plagiarism plagiarism plagiarism with varying degrees of translation from Spanish or German source documents and obfuscation. A randomly selected plagiarist who was trying to cover his theft with random seq

    • The issue I have with plagiarism has been the degree of sin it implies, especially with undergrad and grad work that isn't published. If you are found to have forgotten to quote or cite a quote, you could get kicked out of college. If you are found with drugs or have done a violent act on campus, you may lose your on-campus housing. I am weak at writing myself; it takes me a long time to even write a single page paper (1 to 2 hours), so after I have written the paper, I need to put all my effort into making

  • Oesday ouryay oolschay/universitysay eckchay ouryay omeworkshay/esesthay orfay agiarismplay?

    As long as your prof accepts foreign language papers, you're golden. Or, find a paper that you want to rip off written in German/French/Spanish/whatever and dump it through babelfish:

    Your school/university controls your homeworks/teses plagiat?

  • Plausible test? (Score:5, Insightful)

    by fuzzyfuzzyfungus (1223518) on Tuesday April 28, 2009 @11:28AM (#27746193) Journal
    Now, I understand that plagiarism is common among the weakest of undergrad writers; but "machine translation from Spanish or German source documents" and "random text operations" seem like unrealistic experimental stimuli.

    In order to be a success, a plagiarized paper has to survive scrutiny by automated systems, if any are deployed, and human graders, if any are paying attention. Machine translation and text mangling should trivially defeat automated systems, at least any that aren't cranked well into World o' false positives territory; but would they pass human scrutiny? Even if they did, handing in something produced by machine translation and text mangling would probably earn you a referral to "Remedial English 101 For Life".
  • Irony (Score:5, Funny)

    by Shadow Wrought (586631) * <<shadow.wrought> <at> <gmail.com>> on Tuesday April 28, 2009 @11:35AM (#27746303) Homepage Journal
    Just imagine everyone's surprise when all the entrants turn in the exact same process.
  • When George Harrison wrote the song "My Sweet Lord" for his solo debut album, he accidentally plagiarized a Ronald Mack song. He ended up losing a million dollar lawsuit over it [wikipedia.org]. What should he have done to avoid plagiarizing any of the millions of songs that had been written before then?
    • by gnick (1211984)

      Just a side note - George Harrison/Ronald Mack is a much better example of musical plagiarism than what sprang to my mind.

      Damn you xkcd [xkcd.com]. You've ruined me.

  • A while back I worked on a program to find duplicated code - CPD (copy/paste detector) [sourceforge.net]. It discards comments and whitespace and (optionally) normalizes variable names... but probably wouldn't deal well with tokens being moved around. There's a chapter on it in my PMD book [pmdapplied.com], too.

    What was interesting were some of the performance optimizations that folks came up with. My first version used JavaSpaces to distribute the computation - but subsequent versions (thanks to Brian Ewins and Steve Hawkins) were fast e
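The normalization steps described in this comment (discard comments and whitespace, optionally normalize variable names, then compare token runs) can be sketched roughly as follows. This is not CPD's actual implementation; the keyword list, the `ID` placeholder, the window size, and the function names are all assumptions, and the regex-based comment stripping would mishandle `//` inside string literals.

```python
import re

# Assumed, deliberately tiny keyword list for a Java/C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "class", "new"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def normalize(source, rename_identifiers=True):
    """Strip comments and whitespace; optionally map every identifier to ID."""
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)               # line comments
    tokens = TOKEN.findall(source)
    if rename_identifiers:
        tokens = ["ID" if re.match(r"[A-Za-z_]", t) and t not in KEYWORDS else t
                  for t in tokens]
    return tokens

def duplicated_windows(a, b, window=8):
    """Start offsets in b's token stream whose window also occurs in a."""
    ta, tb = normalize(a), normalize(b)
    seen = {tuple(ta[i:i + window]) for i in range(len(ta) - window + 1)}
    return [i for i in range(len(tb) - window + 1)
            if tuple(tb[i:i + window]) in seen]
```

Because matching is done on contiguous token windows, renaming variables and reformatting do not hide a copy, but reordering statements inside a window does, which is consistent with the caveat above about tokens being moved around.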

  • by DingerX (847589) on Tuesday April 28, 2009 @11:39AM (#27746363) Journal
    A plagiarised paper just smells bad, and is characterized by shifts in voices and writing styles, sudden ignorance of the critical points raised earlier. The same author who can't write a grammatically correct sentence one moment is throwing down complex constructions the next. The harder part is identifying the source of the plagiarism. For undergraduate papers, even the harder part is trivial. After all, the point of plagiarism is that the author is too lazy to write anything original.

    For academics (professors), the situation isn't all that different. Plagiarism is usually a mix of stupidity, laziness and pressure to get stuff done. It usually happens where big, popularizing authors try to rip off the obscure ones (go back twenty years a la Mr. Ambrose, or pick something in a different language, preferably Italian), or when someone needs a book in an obscure field, and tries to pirate something really obscure.

    Even so, if a plagiarist has enemies who give a damn, they can find the source fairly fast. So why construct a test for the most obfuscated cases, when a plagiarist clever enough to obfuscate could simply write something original and sufficiently clever?
    • by Erwos (553607)

      Very true. My wife reviews proposals at her work from time to time, and she has gotten surprisingly good at detecting which ones are doing wholesale plagiarizing. I suspect she'd probably miss it if it was a sentence or two, but some of these idiots are doing whole pages of it.

    • I get the distinct impression that most of the interest in automated plagiarism detection has little to do with circumstances where writers are writing to actually be read, and more to do with ensuring Compliance among high school students and undergrads in big lecture courses.

      As you say, if somebody actually reads it, it won't be too hard to detect. If the somebody reading has an ongoing familiarity with the writer's style, it'll be even easier. What they want, though, is something that can skip that st
    • by Thaelon (250687)

      Don't blame laziness!

      Progress is made by lazy men looking for easier ways to do things.

    • If you have graded more than 2 assignments in your life, and really read each and every paper, and provided good critical feedback, then it is really easy to spot a plagiarized paper.

      Also, a grader usually knows the subject matter and has read many other good and bad works on the subject. You can get a feel for a person's writing style and depth of knowledge on a subject in just a few sentences. When you "smell something fishy", it usually is.

      So far, whenever I "smell something fishy" I try to f
  • This isn't a particularly good test of plagiarism detection at all, since the data corpus is computer generated. Real-world plagiarism detection needs to take account of subject matter (correct answers to a physics paper will be less diverse than ones on wide ranging literary topics) and allowable duplication, such as quotations, restatement of the question, citations of sources, etc.

    • by Millennium (2451)

      A good detection tool understands that some duplication, as with the coincidences and quotations you mention, is going to be inevitable. But such tools also understand that coincidences and quotations can only go so far without raising eyebrows.

      The best a detection tool can do is flag those papers which seem to have an unusually high proportion of duplication -say, 20% or more- and present these to the teacher along with the works most likely to have been plagiarized from. In the end, the teacher needs to m

  • by PPH (736903) on Tuesday April 28, 2009 @11:42AM (#27746389)

    ... use the same system the US Patent Office uses for finding prior art.

    On second thought, scratch that idea.

  • by Anonymous Coward

    Calculate an md5 hash of the paper, if it matches the md5 of another, it's plagiarized.
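Taken literally, the hash suggestion looks like the sketch below. It uses Python's standard `hashlib`; the helper names are made up for illustration. It mainly shows the limitation: a whole-document hash only catches byte-for-byte identical copies.

```python
import hashlib

def md5_of(text):
    """Hex MD5 digest of the paper's text, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def is_exact_copy(paper, other):
    """True only if the two papers are character-for-character identical."""
    return md5_of(paper) == md5_of(other)
```

Changing a single character, even adding a trailing space, produces a different digest, so this would catch only the laziest copy-and-paste cases and none of the obfuscated ones the competition targets.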

  • It's a monkeys on a typewriter thing. These companies add papers to their database as they compare them. If you feed enough papers into a database, eventually they will all come back plagiarized: there are not an infinite number of possible term papers, there are only so many things that could be written for a topic that make sense, and most English teachers recycle topics. Why English departments buy into this I don't understand. Let it go for long enough (it would only take another decade or two at most) and y
  • Hard to detect in an academic paper, but easy to find on the web. Go to almost any Wikipedia article and you'll find it right there in front of you. Especially any article on a movie -- almost are are ripped directly from imdb.
    • ...and the IMDB article is ripped directly from a review, and the review is based on the Movie companies website, etc ....

  • by russotto (537200) on Tuesday April 28, 2009 @11:55AM (#27746557) Journal

    I once was on a Fido forum with someone who would often write responses nearly word-for-word identical to mine. It was uncanny; I'd see his post and recognize my own writing, only to realize it wasn't mine. Timestamps would sometimes show my post was written first, sometimes his. I imagine some others on the forum thought at least one of us was a sock puppet, but neither of us was.

    (If he's on Slashdot, he's probably composing a post just like this one.)

    That probably happens rarely. But build a big enough database, and it will happen often. Particularly given the restricted problem domains in undergraduate papers. It's not just a computer problem; even humans will think "plagiarism" when they see two papers with similar ideas and similar turns of phrase. Which I think demonstrates that plagiarism cannot be established satisfactorily merely by showing similarity between papers.

  • by cpu_fusion (705735) on Tuesday April 28, 2009 @11:59AM (#27746619)

    Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

    Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke out people who have no involvement in their "own work."

    What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

    We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!

    • Re: (Score:3, Insightful)

      by Colonel Korn (1258968)

      Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.

      Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke out people who have no involvement in their "own work."

      What, can't do that because you have 60 students in a class? Well, there's part of the problem too.

      We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!

      I never taught a class involving humanities paper writing (in the science classes I taught, I could detect borrowed work by asking our kids to explain the calculations in their presentations and reports), but my wife meets with students at least once after they turn in a required outline and bibliography to her. The bibliography, meeting, and my wife's extensive knowledge of scholarship in her field have made plagiarism rare and very obvious. Also, they make the students write vastly better papers

    • Most of my college papers had exactly one draft written the night before they were due with bibliography. Most of them received a B or better.
      I do however agree that more student-teacher interaction would be a better solution to this problem. Teaching is a "labor" intensive task in that it optimizes at some small number of students per teacher. I do not believe that technology is capable of changing that to a significant degree.
    • If you're on the other side of the equation, as a student, save your drafts.

      If you are ever wrongly accused of plagiarism (or for that matter, copyright infringement), having several earlier versions of a paper, along with outlines, notes, etc., will work greatly in your favor.

      Not only that, but it also lets you see the progression in your work, and can double as a backup in case something goes catastrophically wrong with your current document.

    • Actually the problem is our institutionalization of education.
      Somewhere along the line, the educational systems became gatekeepers to jobs, so to speak.

      Can't be a doctor without first doing well in school, getting accepted into a medical school... ...

      So grades are of prime importance as that is how the educational system ranks people. Otherwise, we could ALL be doctors and earn big bucks, we could ALL be lawyers and earn big bucks, we could all be X and earn big bucks (of course we could not ALL earn big b

  • by MarkvW (1037596) on Tuesday April 28, 2009 @12:02PM (#27746649)

    Law enforcement uses automated fingerprint detection to identify possible matches. It never claims a match based on the computer.

    Using a program as the sole plagiarism judge and jury is profoundly unfair. If a university wants to discipline a student for a plagiarism hit, then it needs to obtain the source document--and pay the source document's creator if necessary to obtain it.

    Confronting the student with the alleged source gives the student a fair chance to defend himself/herself.

  • by Areyoukiddingme (1289470) on Tuesday April 28, 2009 @12:03PM (#27746667)

    Seriously, the humanities are in trouble. With over 6 billion people on the planet, it's extremely difficult to have an original thought. This sets the stage for endless repetition. Add to that the fact that the very process of teaching the humanities usually means imparting a teacher's single interpretation of the source material to the students who then do the natural thing when it comes to writing a paper and parrot back to the teacher what they've heard, knowing that's the only way to get a good grade, and the resulting combination is deadly.

    The papers are all going to be similar from the beginning, because it's a rare instructor who actually encourages dissenting opinions (and that fault in teaching is a whole other discussion of its own). Then the papers are going to be similar because there really are only so many ways to interpret the source material that are defensible. And finally, the papers are heavily likely to be similar to at least one other paper written about the subject, when every paper ever written on the subject is considered (exactly what the plagiarism sites attempt to do).

    I think the problem this competition is trying to solve is intractable in the face of the current educational system. It's gotten to the point where, if the software considers a large enough number of sources, even the instructor's own papers are going to look like plagiarism.

    Hell, look at the Slashdot comment system. A million people read the front page, but only a few thousand post comments. Thousands more are content to simply moderate the comments, and face it, comments they agree with are more likely to be modded up, one way or another. Then compare the modded comments. We get a lot of duplicate or near duplicate thought, and hence near duplicate comments on every article. Why? Because when you get enough people together in one place, discussing the same subject in writing, there are only so many viewpoints and only so many comments that won't get modded down for being of the "cubic what?" variety.

    Time to go back to grading on spelling and grammar. We've reached the end of the grading on ideas road. Coherency of presentation is all we have left. (One could argue it's all we ever had.)

    • Re: (Score:3, Funny)

      Shit, by the time I came back to the keyboard after writing this post and not hitting submit, there were 30 other posts that said the same thing. I must be a plagiarist.... Damnit.

    • Re: (Score:3, Interesting)

      In my experience many professors (too many) basically ask for plagiarized papers. There are a variety of reasons why they ask for plagiarized papers, mostly having to do with either laziness or wanting a particular viewpoint regurgitated.

      People in my generation had Cliff Notes: one could spew forth a regurgitated version of them and get an A, while those with original thoughts would be graded much more harshly.

      I once researched a paper where it was fully documented with sources and such, all

  • Uh, Use Google? (Score:2, Interesting)

    by chainLynx (939076)
    Here's a good article explaining how Google makes plagiarism detection easy: http://questioncopyright.org/node/4 [questioncopyright.org] There was a story a couple years ago about one of these plagiarism detection services, Turnitin, getting sued for copyright infringement... does anyone know if that went anywhere? http://education.zdnet.com/?p=953 [zdnet.com]
  • by Animats (122034) on Tuesday April 28, 2009 @12:14PM (#27746823) Homepage

    This is a useful mechanism for search engines, which need to distinguish original content from hundreds or thousands of blogs echoing it. Imagine the Web with all the duplicate, repetitive material ignored. No wonder Yahoo is supporting this. Someone over there is thinking.

  • The next contest will be to see who can write an automated paper generator that fools the plagiarism detector.

  • by jcohen (131471) * on Tuesday April 28, 2009 @12:24PM (#27746967) Homepage

    I realize that plagiarism detection represents an interesting problem in computer science, and that it goes some distance toward solving a serious problem. However, I read an article [chronicle.com] in the Chronicle of Higher Education, behind a paywall, alas, which leads me to believe that it is only a partial solution to academic dishonesty. The article suggested that, thanks to the Internet, the costs of human capital are now so low that hiring a ghostwriter to compose one's papers, sidestepping the problem of plagiarism to begin with, is far more expedient than plagiarism itself. It described a Russian-"businessman"-headed network of Filipino paper-writers, most paid between $1 and $3 a page, who are able to market their services to the West through a web site [bestessays.com] and remote call centers. At $20/page to the end-user, with no possibility of plagiarism detection, I think that most desperate students would find this a good deal. In my opinion, ghostwriting will supplant plagiarism as time goes on.

    What is a teacher to do? In-class writing samples would seem to be the only hope of detecting ghostwriting. Students could, of course, argue that at home, they can "polish" their papers, and that therefore they will not resemble the in-class samples. Moreover, checking samples against papers is a thankless and time-consuming task which is only a preliminary to actually evaluating the work. Perhaps there is a computer-based solution to this, but, in the meantime, perhaps potential ghostwriting customers could take their desires to their logical conclusion, and simply buy their degrees on the Internet directly.

    • by radtea (464814)

      In-class writing samples would seem to be the only hope of detecting ghostwriting.

      The problem with this is that the same students could provide writing samples to ghosting services, ensuring a degree of similarity between them.

      The problem with all of this is not that students can so easily circumvent the nominal evaluative process, but that "higher" education has such low standards of evaluation. We grade students in ways that are completely unscientific, all in the name of cramming huge numbers of kids in

  • I'd maintain a database of all writing assignments submitted by a student over their college career. I remember an assignment in my CSCI classes that used an algorithm based on Euclidean distances and a count-table for each word to compare documents, so even using a simple metric would probably work well.

    Since this method would be based on vocabulary, studying for tests like the GRE vocab section may throw it off, since someone could conceivably rapidly change their vocabulary, and throw off the system
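
    A minimal Python sketch of that kind of metric: build a word-count table per document and take the Euclidean distance between the tables. The tokenizer regex and the function names are assumptions for illustration, not the actual classroom algorithm, and a real version would probably normalize for document length first.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count-table of lowercase words in text (crude tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def euclidean_distance(doc_a, doc_b):
    """Euclidean distance between the two documents' word-count tables."""
    counts_a, counts_b = word_counts(doc_a), word_counts(doc_b)
    vocabulary = counts_a.keys() | counts_b.keys()
    # Counter returns 0 for words a document never uses.
    return math.sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in vocabulary))
```

    A small distance between a new paper and the same student's earlier work suggests a consistent vocabulary; a large jump is only a hint to look closer, not proof of anything.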

  • The best way to do this is probably to have a cache of all likely possible sources from which the material could be copied, and who else has that but Google? Other search engines, of course... Your major limiting factor is that there's only so many ways to say the same thing. At some point, if you collect enough sample papers, you're going to discover that every paper actually on the topic can only be made up of so many possible phrases :)

  • My Dissertation (Score:3, Interesting)

    by Kryis (947024) on Tuesday April 28, 2009 @12:34PM (#27747113)
    The Computer Science department at my uni routinely scans final year dissertations using automated software. Mine was flagged up as "possibly plagiarised"; a significant amount of content could be found elsewhere on the web (can't remember the exact percentage).

    My project supervisor said when he got the email from the system saying it came back positive he was very surprised - given the small amount of research in the area (there are only 5 or 6 papers on the same topic that I am aware of), and no other research on that exact method of solving the problem.

    When I found this out I was more than a little worried - I wasn't aware of copying any other work. It turns out that it had picked up on stupid stuff, like the boilerplate at the beginning of the dissertation, or phrases like "In conclusion,", and nothing longer than 3 or 4 words in any paragraph.

    This sort of plagiarism detection that detects word shuffling is fine for people that REALLY don't have a clue (i.e. the ones that forget to change the @author javadoc tag when copying their friend's Java coursework), but it would still be relatively trivial to change enough words in a sentence to fool the system.
  • Wrong Problem (Score:3, Interesting)

    by green1 (322787) on Tuesday April 28, 2009 @12:45PM (#27747219)

    They are trying to invalidate plagiarism detection software by proving that you can still manage to plagiarise in a way it won't detect (false negative). The thing is, this isn't the problem with plagiarism software; the real problem is where it detects plagiarism when none in fact took place (false positive). This will happen in a few ways:

    1) There have been several highly publicized incidents where students have been in big trouble for plagiarising their own work. This is ludicrous, they wrote it in the first place!

    2) A large enough database of phrases, paragraphs, etc. will eventually encompass the majority of ways of phrasing a particular idea, therefore when discussing an existing idea the odds of saying something that has been said before will eventually approach certainty.
    Now this wouldn't necessarily apply if you were inventing a whole new concept, but in most classes that is not what you are being asked to do; instead you are asked to research how something has already been done. There is bound to be duplication here, especially as the database grows. This doesn't mean you plagiarised something, merely that someone else has worded something similarly in the past. (For it to be plagiarism you would have had to have seen and copied that earlier work; in this case you may not even know about it.)
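
    Point 2 can be put into rough numbers. Assume, purely for illustration, that a given phrasing has a small independent chance p of coincidentally matching any one document in the database; then the chance of at least one coincidental match among N documents is 1 - (1 - p)^N. The independence assumption, the function name, and the example values of p and N below are all hypothetical.

```python
def chance_of_coincidental_match(p, n_docs):
    """Probability of at least one coincidental match among n_docs documents,
    assuming each document matches independently with probability p."""
    return 1.0 - (1.0 - p) ** n_docs


# Hypothetical one-in-a-million chance per document, growing database sizes.
for n_docs in (1_000, 1_000_000, 10_000_000):
    print(n_docs, chance_of_coincidental_match(1e-6, n_docs))
```

    With p held fixed, the result climbs toward 1 as the database grows, which is the "approach certainty" effect described above.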

  • by Ralph Spoilsport (673134) on Tuesday April 28, 2009 @12:47PM (#27747257) Journal
    When you've got Markov Generators? [doctornerve.org]

    And the Postmodernism Generator? [elsewhere.org]

    You don't have to write much of anything at all. Would you get a good grade? Fuck no. Would they FLUNK YOU FOR IT? Fuck no. Because it's graded by untenured faculty who have to curry favour with students, or it's graded by Grad Assistants who don't give a shit, and why should they?

    Oh, look, a paper by Cindy Bleethstain. She's a fucking idiot. Let's see. Hmmmm. Yup. Incomprehensible bullshit, as usual. Give her a C+ because some of it is intelligible and kind of funny.

    Oh, look another paper by Guido LeDouchebag. Bottlecaps are smarter than this turnip. Hmmm. Yup. More incomprehensible bullshit. C+. At least he finally discovered the spellchecker.

    THAT'S what it is often like, unfortunately.

    I read the paper, and if there is a passage that is noticeably different in tone, I'll copy-paste a section into Google and see where they pulled it. 9 times out of 10, it's a direct lift from a web page, unattributed. I send it back, and tell them "Footnotes, please. Also, automatic single grade loss. right off the top."

    If it comes back still broken, then I nail 'em for plagiarism. It's a big deal, and requires paperwork I don't like to fill out...

    So far I've only had one student have the cojones to not bother fixing their attributions, and he got crucified by the Ethics board. He was an arrogant little prick, too.

    RS

    • by rgviza (1303161)

      >He was an arrogant little prick, too.

      It's a good thing there are no arrogant professors or TAs.

    • Re: (Score:3, Interesting)

      by HikingStick (878216)
      My problem with automatic checks is that there is always a chance that someone's seemingly original thought may actually reflect thoughts someone else already may have put down on paper. I remember being accused of copying someone else's work once. It was in the early '80s when the Internet as we know it was not part of general public awareness. When the instructor interrogated me on the sentence (one sentence in a paper at least five pages long), he insisted I copied it from some specific book or articl
  • I was talking to a comp sci prof who uses plagiarism software to detect copied source programs. Claims it detects common ruses like transposition, reformatting, and variable renaming. The school suspends for rest of year if claim is verified.

    Some professors now encourage group programming projects because that is how it works in the real world.
