AI Writing Is Improving, But It Still Can't Match Human Creativity (science.org) 42
sciencehabit shares a report from Science Magazine: With a few keystrokes, anyone can ask an artificial intelligence (AI) program such as ChatGPT to write them a term paper, a rap song, or a play. But don't expect William Shakespeare's originality. A new study finds such output remains derivative -- at least for now. [...] [O]bjectively testing this creativity has been tricky. Scientists have generally taken two tacks. One is to use another computer program to search for signs of plagiarism -- though a lack of plagiarism does not necessarily equal creativity. The other approach is to have humans judge the AI output themselves, rating factors such as fluency and originality. But that's subjective and time intensive. So Ximing Lu, a computer scientist at the University of Washington, and colleagues created a program featuring both objectivity and a bit of nuance.
Called DJ Search, it collects pieces of text of a minimum length from whatever the AI outputs and searches for them in large online databases. DJ Search doesn't just look for identical matches; it also scans for strings whose words have similar meanings. To evaluate the meaning of a word or phrase, the program itself relies on a separate AI algorithm that produces a set of numbers called an "embedding," which roughly represents the contexts in which words are typically found. Synonymous words have numerically close embeddings. For example, phrases that swap "anticipation" and "excitement" are considered matches. After removing all matches, the program calculates the ratio of the remaining words to the original document length, which should give an estimate of how much of the AI's output is novel. The program conducts this process for various string lengths (the study uses a minimum of five words) and combines the ratios into one index of linguistic novelty. (The team calls it a "creativity index," but creativity requires both novelty and quality -- random gibberish is novel but not creative.)
The researchers compared the linguistic novelty of published novels, poetry, and speeches with works written by recent LLMs. Humans outscored AIs by about 80% in poetry, 100% in novels, and 150% in speeches, the researchers report in a preprint posted on OpenReview and currently under peer review. Although DJ Search was designed for comparing people and machines, it can also be used to compare two or more humanmade works. For example, Suzanne Collins's 2008 novel The Hunger Games scored 35% higher in linguistic originality than Stephenie Meyer's 2005 hit Twilight. (You can try the tool online.)
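To make the novelty calculation described above concrete, here is a minimal sketch of the idea, not the researchers' DJ Search code. It assumes a small in-memory reference corpus instead of large online databases, uses exact word matches only (the embedding-based near-synonym matching is left out), and combines the per-length ratios with a plain average, since the summary does not say how the ratios are combined. The function names and the `max_n` parameter are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def novelty_ratio(text: str, corpus: Iterable[str], n: int) -> float:
    """Fraction of words in `text` not covered by any n-gram found in `corpus`.

    Assumption: exact matching only; the real tool also matches
    strings whose words have similar meanings.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    seen: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        seen.update(ngrams(doc.lower().split(), n))
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in seen:
            # Mark every word of the matched span as "not novel".
            for j in range(i, i + n):
                covered[j] = True
    return covered.count(False) / len(words)


def novelty_index(text: str, corpus: List[str], min_n: int = 5, max_n: int = 7) -> float:
    """Average the novelty ratio over several n-gram lengths.

    Assumption: a simple mean; min_n=5 follows the summary, max_n is made up.
    """
    ratios = [novelty_ratio(text, corpus, n) for n in range(min_n, max_n + 1)]
    return sum(ratios) / len(ratios)


corpus = ["the quick brown fox jumps over the lazy dog"]
copied = "the quick brown fox jumps over the lazy dog"
mixed = "the quick brown fox jumps over a sleeping cat"
print(novelty_index(copied, corpus))  # fully copied text scores 0.0
print(novelty_index(mixed, corpus))
```

As the summary notes, a high score here only means the wording is unmatched in the reference corpus; random gibberish would also score close to 1.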
Inherent flaw? (Score:5, Insightful)
"A new study finds such output remains derivative -- at least for now."
For now? The whole principle they're building on is to replicate what it's seen. How can it be anything *other* than derivative?
Re: (Score:2)
You simply add a random number generator to it. Generate random stuff, then start polishing it and you have yourself an original story. That is not the hard part.
Hard part is to identify parts that humans enjoy. If you had a good scoring algorithm for that, you could just generate random stuff and pick the good stuff from the noise.
Re:Inherent flaw? (Score:4, Insightful)
You simply add a random number generator to it. Generate random stuff, then start polishing it and you have yourself an original story. That is not the hard part.
Hard part is to identify parts that humans enjoy. If you had a good scoring algorithm for that, you could just generate random stuff and pick the good stuff from the noise.
Creativity, and creative people are not normal people. Not throwing shade, but that they might see and think things that are not what most people think or see. So they create, and sometimes it is pretty profound. What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
Re: (Score:2, Interesting)
What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
Exactly. It is about doing something _meaningful_ within restrictions that make sense. It is about ideas and structures derived from that idea. AI can, say, replace a character in an existing story or it can mix some stories together, but it cannot add to things. It can only make derivative things that are of lower quality than the input.
Incidentally, the unavoidable problem of "model collapse" is a result of that.
Re: (Score:2)
What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
Exactly. It is about doing something _meaningful_ within restrictions that make sense. It is about ideas and structures derived from that idea. AI can, say, replace a character in an existing story or it can mix some stories together, but it cannot add to things. It can only make derivative things that are of lower quality than the input.
Incidentally, the unavoidable problem of "model collapse" is a result of that.
And the closest that AI comes to creativity is when it hallucinates. Of course that is still not creativity at all. At best it can be inadvertently funny.
Re: (Score:2)
Hallucinations are sort-of randomizations with worse randomness. That may look profound in some cases, but it is still complete bullshit. And the thing is the AI cannot tell in which cases it looks profound and in which cases it just looks like nonsense.
It is a bit like the 1000 monkeys with 1000 typewriters and unlimited time. Sure, at some point they will have _also_ written all great works of literature, but they cannot tell where they are in all the nonsense and random crap.
Re: (Score:3)
Your criticisms of AI always rely upon definitions, terminology, and benchmarks that you alone define and consider to be worthy of merit. Fortunately, many others of us try to think a little more critically about our statements.
Re: (Score:2)
No, they do not. Like at all. And you did not even notice that I do typically not criticise LLMs, I criticize the lies that get pushed about them.
I think you have no clue what I write here about LLMs at all. You just see something you do not understand but somehow admire being criticized and then you try to fling crap.
Re: (Score:2)
Nope. You will have a _random_ story. That is fundamentally different. Randomness cannot replace insight or creativity, even if some artists throughout history have tried that path.
Re: (Score:3, Informative)
Indeed. It will always be derivative and it will always be low quality with regard to content. Anything else would require insight and creativity and AI cannot do those. Period. What can get better is the language used, as that does not require insight or creativity.
No idea why people continue to expect things from AI that it fundamentally cannot do.
Strawman argument (was Re:Inherent flaw?) (Score:3)
Indeed. It will always be derivative and it will always be low quality with regard to content. Anything else would require insight and creativity and AI cannot do those. Period. What can get better is the language used, as that does not require insight or creativity.
No idea why people continue to expect things from AI that it fundamentally cannot do.
The claim that AI "will always be derivative" and "low quality" because it lacks "insight and creativity" is a strawman argument, trying to reframe the discussion. By framing the discussion around abstract qualities like insight and creativity—terms that are not well-defined and often subjective—it misrepresents the goals and capabilities of AI systems. Most advocates for AI development are not claiming these systems possess human-like consciousness or insight. Instead, they focus on achieving
Re: (Score:2)
We are talking about LLMs. And LLM results will always be derivative and lower quality than their training data because that is THEIR FUCKING MATHEMATICAL NATURE. No "strawman" in there, just a lot of people, like you, that refuse to see actual facts.
Re: (Score:2)
Re: (Score:3)
Re: Inherent flaw? (Score:3)
What needs to be done differently still isn't clear.
Two things: AI needs better semantic models. Or even just one would be better than what we have now. And then AI needs heuristics trained by semantic generate and test routines. Maybe supervised at the outset*. But eventually internalized, as it is with experienced humans. To throw the garbage out before it even surfaces as a creation.
*But this would fly in the face of AI investment. Having to pay actual human tutors rather than scrape "free" stuff off the Internet.
Re: (Score:2)
Children start writing highly derivative stories also.
I'm not sure that's true. Sometimes children's stories are literal hallucinations (Lucy in the Sky with Diamonds?).
Re: (Score:3)
"A new study finds such output remains derivative -- at least for now."
For now? The whole principle they're building on is to replicate what it's seen. How can it be anything *other* than derivative?
This assertion relies on flawed reasoning and underestimates both the emergent nature of creativity and the trajectory of AI development. A more productive discussion would explore how AI enhances innovation rather than reducing it to a narrow definition of "derivative."
The claim that AI "must always be derivative" because it "replicates what it's seen" makes several missteps. First, it begs the question by assuming that replication excludes novelty, ignoring that creativity often emerges from the recombina
Re: (Score:2)
Whenever a new AI service comes out, I put it through its paces and inevitably see the same pattern emerge:
For the first few days, it blows me away with seemingly unique, on-target responses to my prompts. This is true whether it's a chatbot for creative writing, an image generator, or a music generator.
Then, over time, I start to see that it's rehashing the same concept over and over. In an extended roleplay conversation, it repeats the same stock phrases no matter what the context.
What we're seeing is ELI [playclassic.games]
And a bear . . . (Score:2)
"relieves" itself in the woods?
Re: (Score:2, Troll)
"relieves" itself in the woods?
When thee white women are meeting them instead of a man.
Re: (Score:3)
"relieves" itself in the woods?
When thee white women are meeting them instead of a man.
And who can blame them [foxnews.com]?
Re: (Score:2)
"relieves" itself in the woods?
When thee white women are meeting them instead of a man.
And who can blame them [foxnews.com]?
Wahddya think? https://www.theguardian.com/us... [theguardian.com]
https://www.14news.com/story/9... [14news.com]
https://www.wjtv.com/news/loca... [wjtv.com]
taint just the evil men who seem to enjoy ending people. The ladies are getting into the game as well.
Re: (Score:2)
98.5% of all murders are committed by men.
80% of all violent crimes are committed by men.
Women have a long way to go to catch up.
Duh! (Score:2)
Wrong question (Score:5, Interesting)
Why should AI be creative when the whole source of its creativity is a long int seed? Without your own creativity, all you get is variants of what the model likes to write. It may read well, but after a while it will always be the same.
Give the model input from your creativity and use the model's writing skills to make your vision of a text come true. Why do we need to outsource this to the model?
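The point about the seed can be illustrated with a toy sampler. This is a sketch under loud assumptions: the three-word "vocabulary" and its logits are invented, the same fixed distribution is reused at every step, and nothing here reflects how any particular LLM is implemented. It only shows that, for a fixed distribution, the seed and the temperature are the sole sources of variation.

```python
import math
import random
from typing import Dict, List


def sample_token(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token from a softmax over logits scaled by 1/temperature."""
    scaled = [value / temperature for value in logits.values()]
    top = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(list(logits.keys()), weights=weights, k=1)[0]


def generate(logits: Dict[str, float], length: int, temperature: float, seed: int) -> List[str]:
    """Sample `length` tokens; the seed fully determines the result."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(length)]


# Hypothetical next-token scores, reused at every step for simplicity.
toy_logits = {"the": 2.0, "a": 1.0, "moon": 0.0}

print(generate(toy_logits, 10, temperature=1.0, seed=42))
print(generate(toy_logits, 10, temperature=1.0, seed=42))   # identical to the line above
print(generate(toy_logits, 10, temperature=0.01, seed=7))   # collapses onto the top-scoring token
```

Same seed, same output; a very low temperature gives the most likely token every time. Anything beyond those variations has to come from the prompt.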
Re:Wrong question (Score:4, Interesting)
I suspect this tool's analyses & results were a foregone conclusion when the researchers thought up the idea. Of course, GPT LLMs are going to produce bland prose; they're essentially "averaging machines" & all distinctiveness has been statistically cancelled out.
Re: (Score:2)
I think the largest problem with bland prose is bad datasets. If you look at the bland prose, most of it isn't all that bad. Yes, all common tropes and so on, but not rare in other literature and not bad per se. But the models have a way too high repetition rate and too little diversity.
One thing is, that the model starts anew with each text. Write one chapter without the previous one in the context, and you get repetitive phrasing, because the model doesn't know it (over)used this phrase in the last chap
Re: (Score:2)
joke to a brick wall (Score:2)
Forest and trees (Score:2)
They are measuring linguistic creativity, which concerns only the uniqueness of words in sentences and phrases, rather than attempting to measure the overall creativity of the work. Nor does the paper even once mention the temperature parameter. They select poor models like ChatGPT, well known for being highly overfit, and llama2, when there are way better models tuned for this kind of work readily available.
Overall I think the paper is fundamentally flawed and guaranteed to cause confusion in its choice
Re: (Score:2)
Well, I slammed a few of my poems into that tool, and got a creativity index between 75% and 80%.
I'm yet to figure out... 75%-80% of what, exactly?
Re: (Score:3)
Indeed. Essentially, they have created a metric and benchmark to (fake) support for the conclusions they wanted to find. That is junk-science and meaningless.
Re: (Score:2)
Since it is accepted that people possess creativity and the question is whether AI merits admission to the club, the judges of creativity must be human and the criteria must be subjective. This can still constitute proof if the judging is blind (the judges aren't to
Wrong priority (Score:3, Insightful)
We don't need robot artists.
We need AI systems that can solve previously intractable problems in physics, medicine and engineering.
The art problem has been solved a long time ago. People are good at art and don't need robot help.
but you don't need creativity anymore (Score:2)
We're headed to a world of zombies and automatons and "intelligent web agents" <ha haahahhaaa>
You're being trained to NOT be creative, actually, to NOT think at all, and it appears you're embracing it
press the button for your food pellet.
see what the eggheads at U of T say... what I've been saying for a while now
https://techxplore.com/news/2024-10-explores-impact-llms-human-creativity.html
AI is stealing works more and more (Score:2)
Input=output (Score:2)
The model doesn't stray far from the answer it has. It might be varied slightly, but not by much. It might have a different flavor or tone, but typically it is the same answer.
At work, some people were writing a very long and complex prompt to get some information about a topic. I went ahead and wrote a few sentences, asking the question more directly. And while the others might have gotten some verbose output, flavored and toned a certain way, I got essentially the same answer.
I say there is not enough dat
But It Still Can't Match Human Creativity (Score:2)
So, like 99% of US?
Originality? (Score:2)
It's the more "meta" parts of the question of "originality" that I don't see any sign of (in the incessant writing on the subject - never had the slightest interest in actually trying them for myself. I'd rather spend hours per day reading novel papers on arXiv)