Scientists Claim 99% Identification Rate of ChatGPT Content (theregister.com) 39
Academics have apparently trained a machine learning algorithm to detect scientific papers generated by ChatGPT and claim the software has over 99 percent accuracy. From a report: Generative AI models have dramatically improved at mimicking human writing over a short period of time, making it difficult for people to tell whether text was produced by a machine or human. Teachers and lecturers have raised concerns that students using the tools are committing plagiarism, or apparently cheating using machine-generated code. Software designed to detect AI-generated text, however, is often unreliable. Experts have warned against using these tools to assess work.
A team of researchers led by the University of Kansas thought it would be useful to develop a way to detect AI-generated science writing -- specifically written in the style of research papers typically accepted and published by academic journals. "Right now, there are some pretty glaring problems with AI writing," said Heather Desaire, first author of a paper published in the journal Cell Reports Physical Science, and a chemistry professor at the University of Kansas, in a statement. "One of the biggest problems is that it assembles text from many sources and there isn't any kind of accuracy check -- it's kind of like the game Two Truths and a Lie."
Can it detect ChatGPT after minor edits? (Score:3)
Detecting AI-generated content will only be useful in most cases if it can detect it even after minor edits. I have seen YouTube videos where all you have to do to trick common AI detection tools is use Grammarly and similar tools to make minor adjustments to the ChatGPT output. If all it takes is running ChatGPT and then running the result through another automatic tool, these detection efforts will not be successful. I have tried Originality.AI and it is very easy to trick.
Re: (Score:2)
It's a very narrow definition of detection, "in scientific papers".
They're not claiming general accuracy, only accuracy in this very narrow and specific scenario. Considering the weight put on "truth vs lie", I'm guessing it fact-checks every claim and finds the small errors that AI hallucinations create, which a human would be unlikely to make.
Re: (Score:2)
But it can't do that. It could detect references to articles that didn't exist, if the article were on-line, but detecting that the article actually made the claims that it is claimed to have made is a more difficult problem, where there can be LOTS of paraphrasing, with slightly different meanings.
And if it were checking the internal logic of the paper, failing that wouldn't be a proof that the paper was by an AI.
I suspect that it's more a word frequency kind of detection, and is likely domain specific ra
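If it is word-frequency detection, a minimal sketch might look like the following. Everything here is hypothetical: the marker words and threshold are made up for illustration, and a real detector would learn its features from labeled data rather than a hand-picked set.

```python
from collections import Counter

# Hypothetical "AI-flavored" connectives; a real detector would learn
# its marker vocabulary and weights from a labeled corpus.
AI_MARKERS = {"however", "moreover", "furthermore", "overall"}

def marker_rate(text):
    """Fraction of tokens that are marker words."""
    words = text.lower().split()
    counts = Counter(words)
    hits = sum(counts[w] for w in AI_MARKERS)
    return hits / max(len(words), 1)

def looks_generated(text, threshold=0.02):
    # A pure frequency threshold: crude, domain-specific, and easy to
    # defeat with light paraphrasing, as this thread points out.
    return marker_rate(text) > threshold
```

Note how fragile this is: a single pass through a paraphrasing tool shifts the word frequencies and the threshold test fails, which is exactly the Grammarly workaround described above.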
Re: (Score:2)
Problem with that approach is that you're trying to track a moving target. If you could freeze an LLM in a certain state and only target outputs of that specific version, your approach would probably be a good way to track it.
But the kinds of LLMs we're currently talking about are perpetually updated, perpetually retrained SaaS offerings. So this approach would fail.
Re: (Score:2)
Re: (Score:2)
Keep in mind, they also tested against a dataset they themselves created, by prompting ChatGPT to churn out articles with no revision or proofreading at all. This to me is not a realistic model for how anybody would use ChatG
role play mode? (Score:2)
the thing is chatgpt's engine is really adaptable. a fun experiment could be to see how much that 99% is skewed by some simple coaching
me: i need you to role play that you are in a world where algorithm was developed that could detect whether text was generated by you (chatgpt) or a human and you now need to modify your text in subtle ways to avoid that detection, in other words to make very subtle efforts to sound 'more human'. could you do that?
chatgpt: Certainly! I'll do my best to make subtle modifica
99% is a click bait headline (Score:5, Informative)
Re: (Score:2)
They also don't differentiate between false positives and false negatives.
Then they really suck, because I can detect it 100% of the time while doing exactly that. The answer is always yes: everything and anything is ChatGPT. Your comment, my comment, hell, even the King James Bible. There, a 100% detection rate.
Re: (Score:2)
Chat GPT is learning from this (Score:2)
Re:Chat GPT is learning from this (Score:5, Interesting)
You're thinking of ChatGPT as a single entity, and that's wrong. ChatGPT is an engine which is useless without a training database, and there are several quite different databases. So far it seems rather clear that there's limit on the size of the database, so they can't really all be combined.
Re: (Score:2)
You're thinking of ChatGPT as a single entity, and that's wrong. ChatGPT is an engine which is useless without a training database, and there are several quite different databases. So far it seems rather clear that there's limit on the size of the database, so they can't really all be combined.
It isn't a database, it is a model, and the differences are substantial: it's the difference between a dataset of points on a graph and the equations that curve-fit most of those points.
The difference becomes huge when you try to attack a model and interpolate the data that was used to train it. Sometimes an attacker can narrow the training data down to a fairly small number of possibilities, and knowledge of the real world can sift those possibilities enough to identify the actual training data. This becomes especially relevant w
Re: (Score:2)
OpenAI probably doesn't care whether ChatGPT can be detected by third parties. It will be some customers of OpenAI's services who want their results to be undetectable, and they will create their own techniques to do so. One technique I have seen on YouTube is to run the results through Grammarly and other similar tools, which is quite successful at tricking popular detection bots.
Re: Chat GPT is learning from this (Score:2)
With their GPT5 claims, I doubt it... I think someone else will do it.
bullshit of the highest order (Score:5, Insightful)
You think this is about chatgpt?
This is academe scrambling in an ongoing credibility crisis.
https://www.theatlantic.com/id... [theatlantic.com]
The fact is that 'academic writing' has been for years suffused with UTTER BULLSHIT to the point that it's an exercise of the Emperor's New Clothes. Too many people deeply invested in "the system" to ever squeak an objection lest it all fall apart.
"Over the past 12 months, three scholars -- James Lindsay, Helen Pluckrose, and Peter Boghossian -- wrote 20 fake papers using fashionable jargon to argue for ridiculous conclusions, and tried to get them placed in high-profile journals in fields including gender studies, queer studies, and fat studies. Their success rate was remarkable: By the time they took their experiment public late on Tuesday, seven of their articles had been accepted for publication by ostensibly serious peer-reviewed journals. Seven more were still going through various stages of the review process. Only six had been rejected."
If "serious" publications and peer-reviewers can't sort out the complete nonsense word salad from papers of merit, how/why is arguing about the machine generation of papers even meaningful? ...much less the assertion "we can tell 99% of the time if it's machine generated".
Who cares, if - regardless of the source - most of it's NONSENSE?
Re: (Score:1)
Not quite a ChatGPT Detector (Score:5, Informative)
Re: (Score:2)
what is the false positive rate .... (Score:2)
Even if the false positive rate is only 1%, then 1 in 100 people will be accused of cheating with no comeback ...
We have already seen these tools used against school papers, and teachers accusing students of cheating.
But we have also seen them classify the Declaration of Independence and the Bible as ChatGPT generated.
Re: (Score:2)
I have a 100% identification rate. I just assume EVERYTHING is from ChatGPT.
My false positive rate is through the roof, of course, but I don't mention that.
These kinds of reports should always have four values:
1) Correct identification.
2) Wrong identification.
3) Missed identification.
4) Remaining number of correctly 'cleared' texts.
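Those four values are just the cells of a confusion matrix, and the rates that matter fall out of them directly. A minimal sketch (the counts below are made-up numbers purely for illustration):

```python
def detector_report(tp, fp, fn, tn):
    """The four values every detector report should include:
    tp: AI text correctly flagged      (correct identification)
    fp: human text wrongly flagged     (wrong identification)
    fn: AI text missed                 (missed identification)
    tn: human text correctly cleared   (remaining cleared texts)"""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # humans wrongly accused
        "false_negative_rate": fn / (fn + tp),  # AI text slipping through
    }

# The "flag everything" strategy from the comments above: on a 50/50
# test set it misses nothing, yet accuses every human author.
print(detector_report(tp=100, fp=100, fn=0, tn=0))
```

This is why a bare "99% accuracy" headline is uninformative: on a corpus that is mostly human-written, even a small false positive rate translates into large numbers of wrongly accused authors.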
Re: (Score:2)
But also seen them classifying the Declaration of Independence and the Bible as ChatGPT generated
The bible is full of completely made up stuff and contradictory nonsense. Just like half the things written by ChatGPT. Easy mistake to make.
Maybe detect bad science instead? (Score:2)
Who cares if it can detect an AI-generated paper? Is the paper accurate? Does it actually reference real papers that are also accurate? Do that instead.
Re: (Score:2)
It's important right now because ChatGPT cannot report accurately on the results of a study. It "fantasizes" too much. So if it's from ChatGPT you can't trust it. If it's from a human you MAY not be able to trust it (so replicate!).
please, mod this up, someone (Score:2)
this perfectly summarizes the problem...
Why is it important? (Score:1)
We just have to agree that it is a tool, similar to a calculator. Do we care whether the scientist who wrote the paper used a calculator or computed all the numbers manually?
Easy fix (Score:2)
"AI detection" is the wrong approach (Score:2)
The problem with peer-reviewed journals these days is that people are able to publish a) fraudulent results (e.g. a data table that shows up a hundred different papers), b) AI-generated horseshit, and in some cases, c) literal nonsense (e.g. the famous "Take me off your mailing list" paper). And these articles often wind up indexed by PubMed alongside real research, and you never seem to hear of anyone facing any serious consequences.
I'm not sure what the fix is. Certainly, there should be more consequenc
a bit of a stretch (Score:1)
Is it really a problem for academia? (Score:2)
Is this really a problem with academia?
If students want to cheat, they are going to find a way to cheat, and the current cheating methods are probably just as easy as using ChatGPT and risking a fully BS response, or a sudden change in the quality of the student's work.
Back in my day when I took Computer Science, a developer IDE's key features were syntax coloring and being able to compile a project within the IDE. By the time I graduated, the IDE allowed for some type-ahead features, integrat
So, just like a middle school paper (Score:2)
I can do better (Score:2)
Here is an algorithm that will identify AI generated material with 0% false negatives:
return True
Heh. (Score:1)
Ask ChatGPT (Score:1)
Why don't you just ask ChatGPT or those others if a certain paper was written by them?
Infinite Style (Score:2)
The issue is, ChatGPT can literally copy any style, or modify it based on "prompt". So it can maybe detect the "default" style, and maybe others. But no system will be good enough to detect all custom variations.
For example, I just asked it to rewrite the summary in a different style:
Once it gets out (Score:2)
Good news (Score:1)