AI Writing Is Improving, But It Still Can't Match Human Creativity (science.org) 54

Posted by BeauHD on Saturday December 21, 2024 @09:00AM from the humans-win-this-round dept.

sciencehabit shares a report from Science Magazine: With a few keystrokes, anyone can ask an artificial intelligence (AI) program such as ChatGPT to write them a term paper, a rap song, or a play. But don't expect William Shakespeare's originality. A new study finds such output remains derivative -- at least for now. [...] [O]bjectively testing this creativity has been tricky. Scientists have generally taken two tacks. One is to use another computer program to search for signs of plagiarism -- though a lack of plagiarism does not necessarily equal creativity. The other approach is to have humans judge the AI output themselves, rating factors such as fluency and originality. But that's subjective and time intensive. So Ximing Lu, a computer scientist at the University of Washington, and colleagues created a program featuring both objectivity and a bit of nuance.

Called DJ Search, it collects pieces of text of a minimum length from whatever the AI outputs and searches for them in large online databases. DJ Search doesn't just look for identical matches; it also scans for strings whose words have similar meanings. To evaluate the meaning of a word or phrase, the program itself relies on a separate AI algorithm that produces a set of numbers called an "embedding," which roughly represents the contexts in which words are typically found. Synonymous words have numerically close embeddings. For example, phrases that swap "anticipation" and "excitement" are considered matches. After removing all matches, the program calculates the ratio of the remaining words to the original document length, which should give an estimate of how much of the AI's output is novel. The program conducts this process for various string lengths (the study uses a minimum of five words) and combines the ratios into one index of linguistic novelty. (The team calls it a "creativity index," but creativity requires both novelty and quality -- random gibberish is novel but not creative.)
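
As a rough illustration of the ratio described above, here is a minimal sketch; it is not the researchers' implementation. It assumes a tiny in-memory reference corpus instead of large online databases, and a hypothetical `synonyms` lookup table as a crude stand-in for embedding similarity. The names `novelty_ratio`, `novelty_index` and `max_n`, and the use of a plain average to combine the per-length ratios, are assumptions made for the example.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def tokenize(text, synonyms):
    """Lowercase, strip punctuation, and map synonyms to one canonical word.

    The synonym table is a hypothetical stand-in for embedding similarity.
    """
    words = (w.lower().strip('.,;:!?"\'') for w in text.split())
    return [synonyms.get(w, w) for w in words if w]


def novelty_ratio(text, corpus, n, synonyms=None):
    """Fraction of words in `text` not covered by any n-gram found in `corpus`."""
    synonyms = synonyms or {}
    words = tokenize(text, synonyms)
    if not words:
        return 0.0
    known = set()
    for doc in corpus:
        known.update(ngrams(tokenize(doc, synonyms), n))
    covered = [False] * len(words)
    for i, gram in enumerate(ngrams(words, n)):
        if gram in known:
            # Every word inside a matched n-gram counts as "already seen".
            for j in range(i, i + n):
                covered[j] = True
    return covered.count(False) / len(words)


def novelty_index(text, corpus, min_n=5, max_n=7, synonyms=None):
    """Average the ratios over several n-gram lengths (assumed combination rule)."""
    ratios = [novelty_ratio(text, corpus, n, synonyms)
              for n in range(min_n, max_n + 1)]
    return sum(ratios) / len(ratios)


corpus = ["she waited with great excitement for the news"]
text = "She waited with great anticipation for the news today"
print(novelty_ratio(text, corpus, 5))
print(novelty_ratio(text, corpus, 5, {"anticipation": "excitement"}))
```

With exact matching only, swapping one word in the middle of the sentence breaks every five-word match; once the swap is treated as a match, only the final word still counts as new.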

The researchers compared the linguistic novelty of published novels, poetry, and speeches with works written by recent LLMs. Humans outscored AIs by about 80% in poetry, 100% in novels, and 150% in speeches, the researchers report in a preprint posted on OpenReview and currently under peer review. Although DJ Search was designed for comparing people and machines, it can also be used to compare two or more humanmade works. For example, Suzanne Collins's 2008 novel The Hunger Games scored 35% higher in linguistic originality than Stephenie Meyer's 2005 hit Twilight. (You can try the tool online.)

This discussion has been archived. No new comments can be posted.

Inherent flaw? (Score:5, Insightful)

by Chris Mattern ( 191822 ) writes: on Saturday December 21, 2024 @09:11AM (#65030573)

"A new study finds such output remains derivative -- at least for now."
For now? The whole principle they're building on is to replicate what it's seen. How can it be anything *other* than derivative?

- Re: (Score:2)
  
  by dvice ( 6309704 ) writes:
  
  You simply add a random number generator to it. Generate random stuff, then start polishing it and you have yourself an original story. That is not the hard part.
  Hard part is to identify parts that humans enjoy. If you had a good scoring algorithm for that, you could just generate random stuff and pick the good stuff from the noise.
  - Re:Inherent flaw? (Score:4, Insightful)
    
    by Ol Olsoc ( 1175323 ) writes: on Saturday December 21, 2024 @11:10AM (#65030695)
    
    You simply add a random number generator to it. Generate random stuff, then start polishing it and you have yourself an original story. That is not the hard part.
    Hard part is to identify parts that humans enjoy. If you had a good scoring algorithm for that, you could just generate random stuff and pick the good stuff from the noise.
    Creativity, and creative people are not normal people. Not throwing shade, but that they might see and think things that are not what most people think or see. So they create, and sometimes it is pretty profound. What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
    
    - Re: (Score:2, Interesting)
      
      by gweihir ( 88907 ) writes:
      
      What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
      Exactly. It is about doing something _meaningful_ within restrictions that make sense. It is about ideas and structures derived from that idea. AI can, say, replace a character in an existing story or it can mix some stories together, but it cannot add to things. It can only make derivative things that are of lower quality than the input.
      Incidentally, the unavoidable problem of "model collapse" is a result of that.
      - Re: (Score:2)
        
        by Ol Olsoc ( 1175323 ) writes:
        
        What is more, is the misunderstanding that creativity needs no bounds. Creativity is all about restrictions.
        Exactly. It is about doing something _meaningful_ within restrictions that make sense. It is about ideas and structures derived from that idea. AI can, say, replace a character in an existing story or it can mix some stories together, but it cannot add to things. It can only make derivative things that are of lower quality than the input.
        Incidentally, the unavoidable problem of "model collapse" is a result of that.
        And the closest that AI comes to creativity is when it hallucinates. Of course that is still not creativity at all. At best it can be inadvertently funny.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Hallucinations are sort-of randomizations with worse randomness. That may look profound in some cases, but it is still complete bullshit. And the thing is the AI cannot tell in which cases it looks profound and in which cases it just looks like nonsense.
        It is a bit like the 1000 monkeys with 1000 typewriters and unlimited time. Sure, at some point they will have _also_ written all great works of literature, but they cannot tell where they are in all the nonsense and random crap.
      - Re: (Score:3)
        
        by null etc. ( 524767 ) writes:
        
        Your criticisms of AI always rely upon definitions, terminology, and benchmarks that you alone define and consider to be worthy of merit. Fortunately, many others of us try to think a little more critically about our statements.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        No, they do not. Like at all. And you did not even notice that I do typically not criticise LLMs, I criticize the lies that get pushed about them.
        I think you have no clue what I write here about LLMs at all. You just see something you do not understand but somehow admire being criticized and then you try to fling crap.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Nope. You will have a _random_ story. That is fundamentally different. Randomness cannot replace insight or creativity, even if some artists throughout history have tried that path.
- Re: (Score:3, Informative)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. It will always be derivative and it will always be low quality with regard to content. Anything else would require insight and creativity and AI cannot do those. Period. What can get better is the language used, as that does not require insight or creativity.
  No idea why people continue to expect things from AI that it fundamentally cannot do.
  - Strawman argument (was Re:Inherent flaw? (Score:3)
    
    by rocket rancher ( 447670 ) writes:
    
    Indeed. It will always be derivative and it will always be low quality with regard to content. Anything else would require insight and creativity and AI cannot do those. Period. What can get better is the language used, as that does not require insight or creativity.
    No idea why people continue to expect things from AI that it fundamentally cannot do.
    The claim that AI "will always be derivative" and "low quality" because it lacks "insight and creativity" is a strawman argument, trying to reframe the discussion. By framing the discussion around abstract qualities like insight and creativity—terms that are not well-defined and often subjective—it misrepresents the goals and capabilities of AI systems. Most advocates for AI development are not claiming these systems possess human-like consciousness or insight. Instead, they focus on achieving
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      We are talking about LLMs. And LLM results will always be derivative and lower quality than their training data because that is THEIR FUCKING MATHEMATICAL NATURE. No "strawman" in there, just a lot of people, like you, that refuse to see actual facts.
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      ok, but the paper defines creativity [openreview.net] and developed a metric for measuring it. If you had even read the summary you would have known that.
      - Re: (Score:2)
        
        by rocket rancher ( 447670 ) writes:
        
        ok, but the paper defines creativity [openreview.net] and developed a metric for measuring it. If you had even read the summary you would have known that.
        I did read the paper.
        The CREATIVITY INDEX described in the paper is a statistical metric designed to quantify 'linguistic creativity' by analyzing the degree to which text can be reconstructed from existing web snippets. This metric provides insights into originality as defined by L-uniqueness. While this quantifies linguistic originality, it narrowly focuses on reconstructability and ignores broader aspects of AI creativity like conceptual novelty or structural innovation, underscoring that this metric d
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Thanks. That is pretty much what I expected.
        That said, here is a way to game the "Creativity" index: Use a thesaurus and change out words randomly. Use a grammatical transformer on top for extra effect.
        Anybody smart can immediately see this has no connection to the regular human-applicable definition of creativity and is just a form of adding a bit of "noise" with no insight or understanding or even real computational effort required. And hence this index is completely meaningless when talking about creativ
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        However, please leave the ad hominem attacks, like implying I did not read the article
        Ad hominem says point X is true because of personal attack Y. What I did was abuse/insult. There was no logical conclusion drawn, therefore it was not ad hominem.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Lots of papers claim lots of crap. I have done paper reviews for about 15 years. Some claims in some paper do not impress me. Also, people coming up with their own metrics to measure something they claim to have achieved is a BIG red flag. In most cases, that means they have discovered nothing of worth, in the rest of the cases it means they are lying. Exceptions to that are _extremely_ rare. If you do your own metrics, you are essentially evaluating your own work and that never goes well and means you proba
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        No, I'm wondering why you entered the discussion with a non sequitur. I'm glad to hear you are not impressed, but...
- Re: (Score:3)
  
  by JoshuaZ ( 1134087 ) writes:
  
  Children start writing highly derivative stories also. We're not really clear on what people do to actually write genuinely creative stories, but even highly skilled writers seem to start with a lot of derivative things. In that sense, ChatGPT's attempts to write fiction resemble that of about a 12 to 14 year old child (although I've seen 12 year olds who are better writers than it). What needs to be done differently still isn't clear. That said, I'm not sure that writers as writers really want this. Writin
  - Re: Inherent flaw? (Score:3)
    
    by PPH ( 736903 ) writes:
    
    What needs to be done differently still isn't clear.
    Two things: AI needs better semantic models. Or even just one would be better than what we have now. And then AI needs heuristics trained by semantic generate and test routines. Maybe supervised at the outset*. But eventually internalized, as it is with experienced humans. To throw the garbage out before it even surfaces as a creation.
    *But this would fly in the face of AI investment. Having to pay actual human tutors rather than scrape "free" stuff off the Internet.
  - Re: (Score:2)
    
    by phantomfive ( 622387 ) writes:
    
    Children start writing highly derivative stories also.
    
    I'm not sure that's true. Sometimes children's stories are literal hallucinations (Lucy in the Sky with Diamonds?).
- Re: (Score:3)
  
  by rocket rancher ( 447670 ) writes:
  
  "A new study finds such output remains derivative -- at least for now."
  For now? The whole principle they're building on is to replicate what it's seen. How can it be anything *other* than derivative?
  This assertion relies on flawed reasoning and underestimates both the emergent nature of creativity and the trajectory of AI development. A more productive discussion would explore how AI enhances innovation rather than reducing it to a narrow definition of "derivative."
  The claim that AI "must always be derivative" because it "replicates what it's seen" makes several missteps. First, it begs the question by assuming that replication excludes novelty, ignoring that creativity often emerges from the recombina
  - Re: (Score:2)
    
    by Chris Mattern ( 191822 ) writes:
    
    "The claim that AI "must always be derivative" because it "replicates what it's seen" makes several missteps"
    At no point did I claim that all AIs must be derivative. My position is that all *LLMs* must be derivative, because it is baked into the very mechanics of how they work.
- Re: (Score:2)
  
  by Deep Esophagus ( 686515 ) writes:
  
  Whenever a new AI service comes out, I put it through its paces and inevitably see the same pattern emerge:
  For the first few days, it blows me away with seemingly unique, on-target responses to my prompts. This is true whether it's a chatbot for creative writing, an image generator, or a music generator.
  Then, over time, I start to see that it's rehashing the same concept over and over. In an extended roleplay conversation, it repeats the same stock phrases no matter what the context.
  What we're seeing is ELI [playclassic.games]
- Re: (Score:2)
  
  by silverjacket ( 1467653 ) writes:
  
  LLMs can be fine-tuned using RLHF to write things that people prefer over an imitation of typical pre-training data. (Though ironically in this case the researchers found that fine-tuned models generated less novel output.)
And a bear . . . (Score:2)

by Latent Heat ( 558884 ) writes:

"relieves" itself in the woods?
- Re: (Score:2, Troll)
  
  by Ol Olsoc ( 1175323 ) writes:
  
  "relieves" itself in the woods?
  When thee white women are meeting them instead of a man.
  - Re: (Score:3)
    
    by quonset ( 4839537 ) writes:
    
    "relieves" itself in the woods?
    When thee white women are meeting them instead of a man.
    And who can blame them [foxnews.com]?
    - Re: (Score:2)
      
      by Ol Olsoc ( 1175323 ) writes:
      
      "relieves" itself in the woods?
      When thee white women are meeting them instead of a man.
      And who can blame them [foxnews.com]?
      Wahddya think? https://www.theguardian.com/us... [theguardian.com]
      https://www.14news.com/story/9... [14news.com]
      https://www.wjtv.com/news/loca... [wjtv.com]
      taint just the evil men who seem to enjoy ending people. The ladies are getting into the game as well.
      - Re: (Score:2)
        
        by quonset ( 4839537 ) writes:
        
        98.5% of all murders are committed by men.
        80% of all violent crimes are committed by men.
        Women have a long way to go to catch up.
        
        Re: (Score:2)
        
        by Ol Olsoc ( 1175323 ) writes:
        
        98.5% of all murders are committed by men.
        80% of all violent crimes are committed by men.
        Women have a long way to go to catch up.
        Women are in general inherently less violent than males. Do your statistics include men hired by women to kill their husbands? That's one of the favorite modes. But your being triggered enough to take my joke seriously? Wowsers. full disclaimer, you getting spun up was not unenjoyable. You appear to be deeply immersed in fear culture with Olsoc's silly little reference to the modern women's mantra of preferentially meeting a bear in the woods surely tells me that you agree that is the better option becau
Duh! (Score:2)

by methano ( 519830 ) writes:

Duh!
Wrong question (Score:5, Interesting)

by allo ( 1728082 ) writes: on Saturday December 21, 2024 @10:27AM (#65030641)

Why should AI be creative when the whole source of its creativity is a long int seed? Without your own creativity, all you get is variants of what the model likes to write. It may read well, but after a while it will always be the same.
Give the model input from your creativity and use the model's writing skills to make your vision of a text come true. Why do we need to outsource this to the model?

- Comment removed (Score:4, Interesting)
  
  by account_deleted ( 4530225 ) writes: on Saturday December 21, 2024 @11:19AM (#65030705)
  
  Comment removed based on user account deletion
  
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    I think the largest problem with bland prose is bad datasets. If you look at the bland prose, most of it isn't all that bad. Yes, all common tropes and so on, but not rare in other literature and not bad per se. But the models have a way too high repetition rate and too little diversity.
    One thing is, that the model starts anew with each text. Write one chapter without the previous one in the context, and you get repetitive phrasing, because the model doesn't know it (over)used this phrase in the last chap
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by allo ( 1728082 ) writes:
        
        The other method is to handle the statistics well. One method against "GPTisms" is for example the XTC (exclude top choices) sampler, that explicitly avoids the most likely tokens if the next tokens still have enough probability. Most people don't know that aspect about text models, but once you got the probabilities for all tokens (you always get all probabilities after the network is evaluated) you can chain a lot of samplers that change the distribution (temperature), cut off parts of the distribution (m
        
        Re: (Score:2)
        
        by account_deleted ( 4530225 ) writes:
        
        Comment removed based on user account deletion
        
        Re: (Score:2)
        
        by allo ( 1728082 ) writes:
        
        Yes and no. It's kind of "a good fake is as good as the real thing". The whole meaningful thing can mostly be judged by statistics and that's why LLMs prove to be useful even though one would not expect that architecture to do what people do with it. It is also interesting to see LLM-as-a-judge techniques to, for example, let a large network judge the outputs of a smaller one to distill knowledge into the smaller one. You won't be able to fully fact-check outputs with it (especially when none of the networks
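
For readers unfamiliar with the sampler chain allo describes in this thread (rescale the distribution with a temperature, cut off part of it, then apply XTC), here is a minimal sketch in plain Python. It is not taken from any particular inference library. The function names, default values, and the relative-probability cutoff are assumptions chosen for illustration, and the XTC step follows the common description of that sampler: with some probability, drop every token above a threshold except the least likely of them.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    """Rescale so the remaining probabilities sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


def tail_cutoff(probs, ratio=0.05):
    """Drop tokens far less likely than the top token (illustrative cutoff)."""
    floor = ratio * max(probs)
    return renormalize([p if p >= floor else 0.0 for p in probs])


def exclude_top_choices(probs, threshold=0.1, probability=0.5, rng=random):
    """XTC-style step: sometimes remove all but the weakest 'viable' token."""
    if rng.random() >= probability:
        return probs
    viable = [i for i, p in enumerate(probs) if p >= threshold]
    if len(viable) < 2:
        return probs  # nothing to exclude unless two tokens clear the bar
    keep = min(viable, key=lambda i: probs[i])
    dropped = set(viable) - {keep}
    return renormalize([0.0 if i in dropped else p
                        for i, p in enumerate(probs)])


def sample_token(logits, temperature=1.0, rng=random):
    """Chain the samplers, then draw one token index."""
    probs = softmax(logits, temperature)
    probs = tail_cutoff(probs)
    probs = exclude_top_choices(probs, rng=rng)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


print(exclude_top_choices([0.5, 0.3, 0.15, 0.05], probability=1.0))
```

The point of the XTC step is visible in the printed distribution: the two most predictable tokens are removed, so the draw is forced toward a less obvious but still plausible continuation.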
joke to a brick wall (Score:2)

by bobbutts ( 927504 ) writes:

sure honestly ai just cant write like us its all robotic and stuff never gets the feelings right sometimes the stories are just boring and lack the depth you need its like trying to explain a joke to a brick wall ai will never truly understand what makes writing special its just lines of code not real creativity
Forest and trees (Score:2)

by WaffleMonster ( 969671 ) writes:

They are measuring linguistic creativity, which concerns only a measure of uniqueness of words in sentences and phrases rather than attempting to measure overall creativity of the work. Nor does the paper even once mention the temperature parameter. They select poor models like ChatGPT, well known for being highly overfit, and llama2 when there are way better models tuned for this kind of work readily available.
Overall I think the paper is fundamentally flawed and guaranteed to cause confusion in its choice
- Re: (Score:2)
  
  by war4peace ( 1628283 ) writes:
  
  Well, I slammed a few of my poems into that tool, and got a creativity index between 75% and 80%.
  I'm yet to figure out... 75%-80% of what, exactly?
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  Indeed. Essentially, they have created a metric and benchmark to (fake) support for the conclusions they wanted to find. That is junk-science and meaningless.
- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  Yeah, I think there's a basic contradiction in trying to 'prove' that humans are more creative than AI by applying a metric that is itself an algorithm. Train the AI to optimize this metric of creativity and my guess is it will take the lead.
  Since it is accepted that people possess creativity and the question is whether AI merits admission to the club, the judges of creativity must be human and the criteria must be subjective. This can still constitute proof if the judging is blind (the judges aren't to
Wrong priority (Score:3, Insightful)

by MpVpRb ( 1423381 ) writes: on Saturday December 21, 2024 @11:51AM (#65030757)

We don't need robot artists.
We need AI systems that can solve previously intractable problems in physics, medicine and engineering.
The art problem has been solved a long time ago. People are good at art and don't need robot help.

- Re: (Score:2)
  
  by strikethree ( 811449 ) writes:
  
  The art problem has been solved a long time ago. People are good at art and don't need robot help.
  The entertainment industry is a multi-trillion dollar industry. Artists' salaries take a good chunk of that money. Someone wants that money without the artists' involvement.
  So no, the "art problem" has not been solved at all. It will be solved when no artists are getting paid.
but you don't need creativity anymore (Score:2)

by Big Hairy Gorilla ( 9839972 ) writes:

and you won't have any left after using AI to think for you... <see link>

We're headed to a world of zombies and automatons and "intelligent web agents" <ha haahahhaaa>
You're being trained to NOT be creative, actually, to NOT think at all, and it appears you're embracing it
press the button for your food pellet.

see what the eggheads at U of T say... what I've been saying for a while now
https://techxplore.com/news/2024-10-explores-impact-llms-human-creativity.html
AI is stealing works more and more (Score:2)

by BrendaEM ( 871664 ) writes:

Look, the more other people's works we steal, the better it gets!
Input=output (Score:2)

by mukundajohnson ( 10427278 ) writes:

The model doesn't stray far from the answer it has. It might be varied slightly, but not by much. It might have a different flavor or tone, but typically it is the same answer.
At work, some people were writing a very long and complex prompt to get some information about a topic. I went ahead and wrote a few sentences, asking the question more directly. And while the others might have gotten some verbose output, flavored and toned a certain way, I got essentially the same answer.
I say there is not enough dat
But It Still Can't Match Human Creativity (Score:2)

by nospam007 ( 722110 ) * writes:

So, like 99% of US?
Originality? (Score:2)

by RockDoctor ( 15477 ) writes:

One of the first responses made the point that, since they rely on a "corpus" to give them clues as to the most-likely response to a particular set of words in a query, these tools can't help but be derivative. Which is pretty obvious to me.
It's the more "meta" parts of the question of "originality" that I don't see any sign of (in the incessant writing on the subject - never had the slightest interest in actually trying them for myself. I'd rather spend hours per day reading novel papers on arXiv)
