AI Science

Scientists Claim 99% Identification Rate of ChatGPT Content (theregister.com) 39

Academics have apparently trained a machine learning algorithm to detect scientific papers generated by ChatGPT and claim the software has over 99 percent accuracy. From a report: Generative AI models have dramatically improved at mimicking human writing over a short period of time, making it difficult for people to tell whether text was produced by a machine or a human. Teachers and lecturers have raised concerns that students are using the tools to commit plagiarism, or to cheat by submitting machine-generated code. Software designed to detect AI-generated text, however, is often unreliable, and experts have warned against using these tools to assess students' work.

A team of researchers led by the University of Kansas thought it would be useful to develop a way to detect AI-generated science writing -- specifically written in the style of research papers typically accepted and published by academic journals. "Right now, there are some pretty glaring problems with AI writing," said Heather Desaire, first author of a paper published in the journal Cell Reports Physical Science, and a chemistry professor at the University of Kansas, in a statement. "One of the biggest problems is that it assembles text from many sources and there isn't any kind of accuracy check -- it's kind of like the game Two Truths and a Lie."

  • by ranton ( 36917 ) on Thursday June 08, 2023 @10:59AM (#63585770)

    Detecting AI-generated content will only be useful in most cases if it can still detect it after minor edits. I have seen YouTube videos where all you have to do to trick common AI detection tools is run the ChatGPT content through Grammarly or a similar tool to make minor adjustments. If all it takes to defeat detection is running ChatGPT and then passing the output through another automated tool, these detection efforts will not be successful. I have tried Originality.AI and it is very easy to trick.

    • by Luckyo ( 1726890 )

      It's a very narrow definition of detection, "in scientific papers".

      They're not claiming general accuracy, only accuracy in this very narrow and specific scenario. Considering the weight put on "truth vs lie", I'm guessing it fact-checks every claim and finds the small errors that AI hallucinations create and that a human would be unlikely to make.

      • by HiThere ( 15173 )

        But it can't do that. It could detect references to articles that don't exist, if the cited articles would otherwise be online, but verifying that a cited article actually makes the claims attributed to it is a much harder problem, since there can be LOTS of paraphrasing with slightly different meanings.

        And if it were checking the internal logic of the paper, failing that check wouldn't be proof that the paper was written by an AI.

        I suspect that it's more a word-frequency kind of detection, and is likely domain specific ...
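        If that guess is right, the detector amounts to a plain bag-of-words classifier. A minimal sketch, assuming scikit-learn; the training texts below are placeholders, and the paper does not say it works this way:

            # Sketch of the word-frequency style of detector suspected above.
            # scikit-learn is assumed; the training texts are placeholders.
            from sklearn.feature_extraction.text import TfidfVectorizer
            from sklearn.linear_model import LogisticRegression
            from sklearn.pipeline import make_pipeline

            human_texts = [
                "We measured binding affinity across three biological replicates.",
                "The spectra were collected at room temperature in triplicate.",
            ]
            ai_texts = [
                "In conclusion, this fascinating topic has many important implications.",
                "Overall, these results highlight the significance of the findings.",
            ]

            detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                     LogisticRegression(max_iter=1000))
            detector.fit(human_texts + ai_texts,
                         [0] * len(human_texts) + [1] * len(ai_texts))

            # 1 = flagged as AI-generated. The detector only knows the vocabulary
            # it was trained on, which is exactly why it would be domain specific.
            print(detector.predict(["Some paragraph of a new paper to classify."]))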

        • by Luckyo ( 1726890 )

          The problem with that approach is that you're trying to track a moving target. If you could freeze an LLM in a certain state, and only target outputs of that specific version, your approach would probably be a good way to track it.

          But the kinds of LLMs we're currently talking about are perpetually learning and changing SaaS LLMs. So this approach would fail.

        • It's more than likely reverse statistical analysis, i.e. the probability that each word (token, actually) follows the previous one. In normal human writing this probability will be relatively low & in LLM output, extraordinarily high. The problem so far has been that the discrimination rates between human & LLM texts aren't high enough, usually 80-90%, because there's too much variation in general language use to be usefully precise. In other words, it's a precision problem. They may have got around this by being more ...
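          What this comment describes is essentially perplexity scoring. A minimal sketch, assuming the Hugging Face transformers library, with GPT-2 standing in for whatever scoring model a real detector would use; the threshold is an arbitrary guess, not a published value:

              # Sketch of perplexity-based detection: score how predictable each
              # token is under a language model. Assumes transformers + torch;
              # GPT-2 is only a stand-in for a real detector's scoring model.
              import torch
              from transformers import GPT2LMHeadModel, GPT2TokenizerFast

              tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
              model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

              def perplexity(text: str) -> float:
                  enc = tokenizer(text, return_tensors="pt")
                  with torch.no_grad():
                      # Using the inputs as labels makes the model report its own
                      # mean cross-entropy on the text.
                      loss = model(enc.input_ids, labels=enc.input_ids).loss
                  return torch.exp(loss).item()

              def looks_machine_generated(text: str, threshold: float = 40.0) -> bool:
                  # Low perplexity means every token was highly predictable --
                  # the LLM signature described above. 40.0 is a made-up cutoff.
                  return perplexity(text) < threshold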
    • The article says maybe not: "The results, however, should be taken with a grain of salt. It's not clear how robust the algorithm is against studies that have been lightly edited by humans despite being written mostly by ChatGPT, or against real papers from other scientific journals."

      Keep in mind, they also tested against a dataset they themselves created, by prompting ChatGPT to churn out articles with no revision or proofreading at all. This to me is not a realistic model for how anybody would use ChatGPT ...

  • The thing is, ChatGPT's engine is really adaptable. A fun experiment would be to see how much that 99% is skewed by some simple coaching:

    me: i need you to role play that you are in a world where algorithm was developed that could detect whether text was generated by you (chatgpt) or a human and you now need to modify your text in subtle ways to avoid that detection, in other words to make very subtle efforts to sound 'more human'. could you do that?

    chatgpt: Certainly! I'll do my best to make subtle modifications ...

  • by TomGreenhaw ( 929233 ) on Thursday June 08, 2023 @11:17AM (#63585804)
    The article says that if you use just some of the response, accuracy drops to 92% on their small sample. They make no mention of how different versions of ChatGPT affect their tests. They also don't differentiate between false positives and false negatives.
    • They also don't differentiate between false positives and false negatives.

      Then they really suck, because I can detect it 100% of the time by doing exactly that: the answer is always yes, everything and anything is ChatGPT. Your comment, my comment, hell, even the King James Bible. There, a 100% detection rate.

  • ChatGPT will simply assimilate this knowledge and avoid detection.
    • by HiThere ( 15173 ) on Thursday June 08, 2023 @12:24PM (#63585966)

      You're thinking of ChatGPT as a single entity, and that's wrong. ChatGPT is an engine which is useless without a training database, and there are several quite different databases. So far it seems rather clear that there's a limit on the size of the database, so they can't really all be combined.

      • by micheas ( 231635 )

        You're thinking of ChatGPT as a single entity, and that's wrong. ChatGPT is an engine which is useless without a training database, and there are several quite different databases. So far it seems rather clear that there's a limit on the size of the database, so they can't really all be combined.

        It isn't a database, it is a model, and the differences are substantial -- the difference between a dataset of points on a graph and an equation that curve-fits most of those points.

        The difference becomes huge when you try to attack a model and interpolate the data that was used to train it. Sometimes you can narrow the training data down to a fairly small number of possibilities, and knowledge of the real world can sift those possibilities enough to let the attacker identify the actual training data. This becomes especially relevant ...

    • by ranton ( 36917 )

      OpenAI probably doesn't care whether ChatGPT can be detected by third parties. It will be some customers of OpenAI's services who want their results to be undetectable, and they will create their own techniques to do so. One technique I have seen on YouTube is to run the results through Grammarly and other similar tools, which is quite successful in tricking popular detection bots.

    • With their GPT-5 claims, I doubt it... I think someone else will do it.

  • by argStyopa ( 232550 ) on Thursday June 08, 2023 @11:42AM (#63585856) Journal

    You think this is about ChatGPT?

    This is academe scrambling in an ongoing credibility crisis.

    https://www.theatlantic.com/id... [theatlantic.com]

    The fact is that 'academic writing' has for years been suffused with UTTER BULLSHIT, to the point that it's an exercise in the Emperor's New Clothes. Too many people are deeply invested in "the system" to ever squeak an objection lest it all fall apart.

    "Over the past 12 months, three scholarsâ"James Lindsay, Helen Pluckrose, and Peter Boghossianâ"wrote 20 fake papers using fashionable jargon to argue for ridiculous conclusions, and tried to get them placed in high-profile journals in fields including gender studies, queer studies, and fat studies. Their success rate was remarkable: By the time they took their experiment public late on Tuesday, seven of their articles had been accepted for publication by ostensibly serious peer-reviewed journals. Seven more were still going through various stages of the review process. Only six had been rejected."

    If "serious" publications and peer-reviewers can't sort out the complete nonsense word salad from papers of merit, how/why is arguing about the machine generation of papers even meaningful? ...much less the assertion "we can tell 99% of the time if it's machine generated".

    Who cares, if - regardless of the source - most of it's NONSENSE?

    • I feel like I'm reading Bourdieu all over again: so impossible to read that you start to wonder whether you're the stupid one.
  • by laughingskeptic ( 1004414 ) on Thursday June 08, 2023 @11:43AM (#63585860)
    They tested the ability to discriminate between a specific type of article in Science -- Perspectives articles -- and content generated by ChatGPT from the titles of those articles. This is very different from being able to recognize ChatGPT in any context. They identified a number of features of ChatGPT's style that give it away in this specific context. They also do not state whether or not they requested ChatGPT to present results in an "Academic Style", so their use of ChatGPT may have been naive.
    • Verified that they did not request an "Academic Style". A sample prompt from their published Excel spreadsheet: "Can you produce a 300 to 400 word summary on this topic: GnRH improving cognition"
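      The "features of ChatGPT's style" being discussed here are hand-crafted style markers rather than anything deep. A sketch of what such features could look like -- the specific features below are illustrative guesses, not the paper's published feature set:

          # Sketch of hand-crafted stylistic features of the general kind the
          # study describes. The exact features here are illustrative guesses,
          # not the published feature set.
          import re

          def style_features(text: str) -> list[float]:
              sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
              words = text.split()
              return [
                  len(words) / max(len(sentences), 1),       # mean sentence length
                  text.count(";") + text.count(":"),         # "academic" punctuation
                  text.count("(") + text.count(")"),         # parenthetical asides
                  sum(any(c.isdigit() for c in w) for w in words),  # numbers cited
                  sum(w.lower().strip(".,") in {"however", "but", "although"}
                      for w in words),                       # contrast connectives
              ]

          # These vectors would then feed an ordinary classifier, as in the
          # bag-of-words sketch earlier in the thread.
          print(style_features("We found, however, that 3 of 7 samples (43%) differed."))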
  • Even if that accuracy holds, 1 in 100 people will be accused of cheating with no comeback ...

    We have already seen these tools used against school papers, with teachers accusing students of cheating.
    But we have also seen them classify the Declaration of Independence and the Bible as ChatGPT generated.

    • by Calydor ( 739835 )

      I have a 100% identification rate. I just assume EVERYTHING is from ChatGPT.

      My false positive rate is through the roof, of course, but I don't mention that.

      These kinds of reports should always give four values (the full confusion matrix; see the sketch after the list):

      1) Correct identifications (true positives).
      2) Wrong identifications (false positives).
      3) Missed identifications (false negatives).
      4) Correctly 'cleared' texts (true negatives).
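      Those four numbers make the joke precise: flag everything and recall is perfect while precision collapses. A quick sketch with scikit-learn, using made-up labels:

          # The four values above are the confusion matrix. Labels are made up;
          # the predictions come from the "everything is ChatGPT" detector.
          from sklearn.metrics import confusion_matrix, precision_score, recall_score

          y_true = [1, 1, 1, 0, 0, 0]   # 1 = actually AI-generated (placeholder)
          y_pred = [1, 1, 1, 1, 1, 1]   # flag absolutely everything

          tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
          print(tp, fp, fn, tn)                    # 3 3 0 0
          print(recall_score(y_true, y_pred))      # 1.0 -- the "100% detection rate"
          print(precision_score(y_true, y_pred))   # 0.5 -- half the flags are false alarms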

    • But we have also seen them classify the Declaration of Independence and the Bible as ChatGPT generated.

      The bible is full of completely made up stuff and contradictory nonsense. Just like half the things written by ChatGPT. Easy mistake to make.

  • Who cares if it can detect an AI-generated paper. Is the paper accurate? Does it actually reference real papers that are also accurate? Check that instead.

  • It is only bad if the entire paper is written by the bot and the facts are invented. If the information reported in the paper is correct, the conclusions are sound, and the discussion is helpful, does it really matter whether the person wrote it alone or a bot helped the person write it?

    We just have to agree that it is a tool similar to a calculator. Do we care if the scientist who wrote the paper used a calculator or computed all the numbers himself / herself manually?

  • Just submit 100 AI generated articles to 100 different journals.
  • The problem with peer-reviewed journals these days is that people are able to publish a) fraudulent results (e.g. a data table that shows up in a hundred different papers), b) AI-generated horseshit, and in some cases, c) literal nonsense (e.g. the famous "Take me off your mailing list" paper). And these articles often wind up indexed by PubMed alongside real research, and you never seem to hear of anyone facing any serious consequences.

    I'm not sure what the fix is. Certainly, there should be more consequences ...

  • It's particularly bothersome to me that this was published in Cell Reports, a biology journal, and not a computer science journal. It reeks of a set of peers who aren't very versed in the topic. It's true that ChatGPT is becoming an issue in schools; it's undeniable that my peers are using it to get easy answers to questions that would take a while to search for. But the scope of this paper is just impractical. No PhD student would directly go to a bot known to give bad answers with questions on something as ...
  • Is this really a problem with academia?
    If students want to cheat, they will find a way to cheat, and the current cheating methods are probably just as easy as using ChatGPT and risking a fully BS response, or a sudden change in the quality of the student's work.

    Back in my day when I took Computer Science, a developer IDE's key features were syntax coloring and being able to compile a project within the IDE. By the time I graduated, the IDE allowed for some type-ahead features, integrated ...

  • Made of stiff declaratives and snippets from the required three sources in the bibliography. All copied and rearranged (by hand! in writing!) and, thesaurus handy, rephrased. I ai-ed the synonyms (I made that a verb; it's my hot take on the fad -- you find your own creative outlet) by assuming the publisher laid out the first few offerings as closer in meaning to the original word than the later offerings in the list. Add a conclusion, "And so the Egyptians were, like, super-awes builders!" and hand it in ...
  • Here is an algorithm that will identify AI generated material with 0% false negatives:

    return True

  • Nice study, but if it types like my ex does, it's probably ChatGPT.
  • Why don't you just ask ChatGPT or those others if a certain paper was written by them?

  • The issue is, ChatGPT can literally copy any style, or modify its own based on the prompt. So a detector can maybe catch the "default" style, and maybe a few others, but no system will be good enough to detect all custom variations.

    For example, I just asked it to rewrite the summary in a different style:

    A bunch of boffins from the University of Kansas have reportedly trained a computerized brain to sniff out academic papers penned by ChatGPT, scoring a whopping 99% hit rate. The nitty-gritty:

    As AI copywriters get slicker, ...

  • Once it gets out what they're using to detect it, a one-sentence prompt modification can prevent ChatGPT from generating whatever that is. If it's special words it uses, or a special style, or even a particular ratio of something to something... if it's known, it can be countered, with nary a breath.

"Engineering without management is art." -- Jeff Johnson

Working...