Google DeepMind Uses LLM To Solve Unsolvable Math Problem

Google DeepMind Uses LLM To Solve Unsolvable Math Problem (technologyreview.com) 48

Posted by BeauHD on Thursday December 14, 2023 @11:30PM from the what-will-they-think-of-next dept.

An anonymous reader quotes a report from MIT Technology Review: In a paper published in Nature today, the researchers say it is the first time a large language model has been used to discover a solution to a long-standing scientific puzzle -- producing verifiable and valuable new information that did not previously exist. "It's not in the training data -- it wasn't even known," says coauthor Pushmeet Kohli, vice president of research at Google DeepMind. Large language models have a reputation for making things up, not for providing new facts. Google DeepMind's new tool, called FunSearch, could change that. It shows that they can indeed make discoveries -- if they are coaxed just so, and if you throw out the majority of what they come up with.

FunSearch (so called because it searches for mathematical functions, not because it's fun) continues a streak of discoveries in fundamental math and computer science that DeepMind has made using AI. First AlphaTensor found a way to speed up a calculation at the heart of many different kinds of code, beating a 50-year record. Then AlphaDev found ways to make key algorithms used trillions of times a day run faster. Yet those tools did not use large language models. Built on top of DeepMind's game-playing AI AlphaZero, both solved math problems by treating them as if they were puzzles in Go or chess. The trouble is that they are stuck in their lanes, says Bernardino Romera-Paredes, a researcher at the company who worked on both AlphaTensor and FunSearch: "AlphaTensor is great at matrix multiplication, but basically nothing else." FunSearch takes a different tack. It combines a large language model called Codey, a version of Google's PaLM 2 that is fine-tuned on computer code, with other systems that reject incorrect or nonsensical answers and plug good ones back in.

The researchers started by sketching out the problem they wanted to solve in Python, a popular programming language. But they left out the lines in the program that would specify how to solve it. That is where FunSearch comes in. It gets Codey to fill in the blanks -- in effect, to suggest code that will solve the problem. A second algorithm then checks and scores what Codey comes up with. The best suggestions -- even if not yet correct -- are saved and given back to Codey, which tries to complete the program again. After a couple of million suggestions and a few dozen repetitions of the overall process -- which took a few days -- FunSearch was able to come up with code that produced a correct and previously unknown solution to the cap set problem, which involves finding the largest size of a certain type of set. Imagine plotting dots on graph paper. [...] To test its versatility, the researchers used FunSearch to approach another hard problem in math: the bin packing problem, which involves trying to pack items into as few bins as possible. This is important for a range of applications in computer science, from data center management to e-commerce. FunSearch came up with a way to solve it that's faster than human-devised ones.
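
The propose/score/keep loop described in the summary can be sketched in a few lines. This is a toy under stated assumptions, not DeepMind's code: `propose` is a hypothetical stand-in that randomly perturbs a weight table where FunSearch would ask Codey for new Python, and `build_cap_set`, `make_priority`, `evaluate` and `search` are illustrative names. The scorer greedily builds a cap set in dimension 4 from a candidate priority function and returns its size.

```python
import itertools
import random

N = 4  # dimension; points are tuples in {0, 1, 2}^N


def build_cap_set(priority):
    """Greedily add points in priority order, skipping any that would complete a line."""
    points = sorted(itertools.product(range(3), repeat=N), key=priority, reverse=True)
    chosen, blocked = [], set()
    for p in points:
        if p in blocked:
            continue
        # Three distinct points are on a line exactly when they sum to 0 mod 3,
        # so for each chosen q the point -(p + q) mod 3 becomes forbidden.
        for q in chosen:
            blocked.add(tuple(-(a + b) % 3 for a, b in zip(p, q)))
        chosen.append(p)
    return chosen


def make_priority(weights):
    """Turn a weight table into a priority function over points (assumed form)."""
    return lambda p: sum(weights[i][v] for i, v in enumerate(p))


def evaluate(weights):
    """Score a candidate: the size of the cap set its priority function builds."""
    return len(build_cap_set(make_priority(weights)))


def propose(parent, rng):
    """Hypothetical stand-in for the LLM: nudge one entry of a saved candidate."""
    child = [row[:] for row in parent]
    child[rng.randrange(N)][rng.randrange(3)] += rng.gauss(0, 1)
    return child


def search(rounds=200, keep=5, seed=0):
    """Repeatedly propose from a saved candidate, score it, and keep the best few."""
    rng = random.Random(seed)
    start = [[0.0] * 3 for _ in range(N)]
    scored = [(evaluate(start), start)]
    for _ in range(rounds):
        parent = rng.choice(scored)[1]
        child = propose(parent, rng)
        scored.append((evaluate(child), child))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        del scored[keep:]
    return scored[0]
```

Calling `search()` returns the best `(size, weights)` pair found. Because the builder only ever adds legal points, a bad proposal costs a low score rather than an invalid answer.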

A million monkeys? (Score:5, Interesting)

by AmiMoJo ( 196126 ) writes: on Thursday December 14, 2023 @11:55PM (#64082887) Homepage Journal

It sounds an awful lot like they are producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works. Every iteration the ones that do anything resembling the problem are sent back to the monkeys for editing.

- Re:A million monkeys? (Score:5, Insightful)
  
  by phantomfive ( 622387 ) writes: on Friday December 15, 2023 @12:02AM (#64082891) Journal
  
  It sounds like they are using the LLM portion to help prod their genetic algorithm to escape local maxima [wikipedia.org].
  
  - Re: (Score:3)
    
    by kvezach ( 1199717 ) writes:
    
    Global search algorithms, particularly simulated annealing, benefit from a mutation/transition space where going from an awful solution to a good solution is possible in relatively few steps. I wouldn't be surprised if LLMs can provide a better such space than simpler methods, even if the LLMs can never get anywhere near optimal on their own.
- Re:A million monkeys? (Score:5, Funny)
  
  by ShanghaiBill ( 739463 ) writes: on Friday December 15, 2023 @12:03AM (#64082893)
  
  producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works.
  I've met humans who develop code the same way.
  
  - Re:A million monkeys? (Score:5, Insightful)
    
    by NFN_NLN ( 633283 ) writes: on Friday December 15, 2023 @12:56AM (#64082937)
    
    The more important question is: have you met humans that don't develop code that way?
    
  - Re: (Score:2)
    
    by VeryFluffyBunny ( 5037285 ) writes:
    
    Also sounds like the PR & marketing industry to a tee.
  - Re: (Score:2)
    
    by ThePawArmy ( 952965 ) writes:
    
    Unlike horses, you can flog a dead program into life.
  - Re: (Score:2)
    
    by hAckz0r ( 989977 ) writes:
    
    I've met humans who develop code the same way.
    Humans did after all write all the words on the Internet that these LLMs have processed. It's kind of like the convergent evolution of all BS. Talk enough like you know something and pretty soon it might start looking like you have an answer to some problem. Computers are just faster at connecting the dots between completely unrelated topics using statistics. With enough statistics all lies become true.
- Re: A million monkeys? (Score:2)
  
  by aldousd666 ( 640240 ) writes:
  
  Hi. It's not useless if it produces results. These are solutions we did not have previously. And it's a first-gen product. I'd say that's a win. 9 million versions of todo list applications that populate GitHub, on the other hand, that's a million monkeys.
  - Re: (Score:3)
    
    by Eunomion ( 8640039 ) writes:
    
    It is useless if the problems it causes are at least as troublesome as the results are helpful. And since tornadoes aren't known to spontaneously assemble communities out of random parts, the "million monkeys" approach to technology almost never works unless the goal is stalling or con artistry.
    - Re: (Score:3)
      
      by WaffleMonster ( 969671 ) writes:
      
      It is useless if the problems it causes are at least as troublesome as the results are helpful. And since tornadoes aren't known to spontaneously assemble communities out of random parts, the "million monkeys" approach to technology almost never works unless the goal is stalling or con artistry.
      Most of nature and our own technological development is a result of trial and error. It is essential to life and technological progress. What matters here, to quote a sentence from the paper: "The LLM is the creative core of FunSearch, in charge of coming up with improvements to the functions presented in the prompt and sending these for evaluation."
      Monkeys typing on keyboard is just a loose metaphor that shouldn't be taken literally... what is really happening is more akin to informed rather than random g
      - Re: (Score:2)
        
        by Eunomion ( 8640039 ) writes:
        
        Per what I said about con artistry, there's a fundamental difference between actually accomplishing a task vs. just manipulating the judges into believing it's been accomplished based on superficial appearances. I get the strong impression... and it would seem to be inherent in what they're doing... that AI researchers tend to think there literally is no difference between the two. It's the definition of a social engineering hack.
        
        - Re: (Score:2)
          
          by WaffleMonster ( 969671 ) writes:
          
          Per what I said about con artistry, there's a fundamental difference between actually accomplishing a task vs. just manipulating the judges into believing it's been accomplished based on superficial appearances. I get the strong impression... and it would seem to be inherent in what they're doing... that AI researchers tend to think there literally is no difference between the two. It's the definition of a social engineering hack.
          I don't know what you are trying to say with regards to manipulating judges. In this particular case there is a trivially checkable objective function that is the ultimate discriminator.
    - Re: (Score:3)
      
      by Linux Torvalds ( 647197 ) writes:
      
      That's the same idiotic argument you'd hear from any creationist.
      Hint: it's not a lottery ticket if the numbers can change.
      - Re: (Score:2)
        
        by Eunomion ( 8640039 ) writes:
        
        AI is not a brute-force computation like organic chemistry over eons. The analogy is completely irrelevant.
        
        - Re: (Score:2)
          
          by Linux Torvalds ( 647197 ) writes:
          
          If so, it's irrelevant in the direction that makes your argument even less valid.
          
          - Re: (Score:2)
            
            by Eunomion ( 8640039 ) writes:
            
            Again, no. There are not billions of years available to make an algorithm. The point of feedback occurs at human scale.
    - Re: (Score:2)
      
      by ceoyoyo ( 59147 ) writes:
      
      I think you've just found a counter argument to the million monkeys hypothesis.
      It produces real, useful results. That is an observation.
- Re: A million monkeys? (Score:3)
  
  by anonymouscoward52236 ( 6163996 ) writes:
  
  Dear AI gods, please do not destroy human civilization by "solving" (breaking) all encryption.
- Re:A million monkeys? (Score:4, Informative)
  
  by real_nickname ( 6922224 ) writes: on Friday December 15, 2023 @02:11AM (#64082977)
  
  Also it found a new set, it didn't find an algorithm to find all sets. It's great but the long-standing scientific puzzle is still long-standing. https://deepmind.google/discov... [deepmind.google]
  
  - Re: (Score:2)
    
    by aRTeeNLCH ( 6256058 ) writes:
    
    The algorithm is banging on the keyboard all day long. AI just bang faster.
- Re: (Score:2)
  
  by VeryFluffyBunny ( 5037285 ) writes:
  
  Yep, sounds very similar to p-hacking. Remember that it's the experts who evaluate the validity of the constrained but ultimately random responses that the LLM spits out. It doesn't sound all that different to asking ChatGPT to list suggestions in a variety of ways, systematically changing parameters to get a range of responses & then putting in the hard, uniquely human expert work of invalidating them, hoping that there might be an exception. Great work human experts!
- Re: (Score:1)
  
  by drinkypoo ( 153816 ) writes:
  
  It sounds an awful lot like they are producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works.
  So pretty much the same way the world of software development works overall... but in a box
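
WaffleMonster's point in the thread above about a "trivially checkable objective function" can be made concrete for the cap set problem. The sketch below is an illustration rather than the paper's evaluator, and `is_cap_set` is a hypothetical name. It relies on one standard fact: three distinct points with coordinates in {0, 1, 2} lie on a line exactly when they sum to zero mod 3 in every coordinate, so a claimed set of any size can be checked pair by pair.

```python
from itertools import combinations


def is_cap_set(points, n):
    """Return True if `points` are distinct vectors in {0,1,2}^n with no three on a line."""
    pts = set(points)
    if len(pts) != len(points):
        return False  # duplicates would inflate the claimed size
    if any(len(p) != n or any(c not in (0, 1, 2) for c in p) for p in pts):
        return False  # malformed point
    for p, q in combinations(pts, 2):
        # The unique third point on the line through p and q.
        third = tuple(-(a + b) % 3 for a, b in zip(p, q))
        if third in pts:
            return False
    return True
```

The check is quadratic in the size of the set, so however the candidate was produced, by monkeys or otherwise, nobody has to take the generator's word for it.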
Unsolvable? (Score:5, Insightful)

by burtosis ( 1124179 ) writes: on Friday December 15, 2023 @12:15AM (#64082907)

You keep using that word. I do not think it means what you think it means.

- Re: Unsolvable? (Score:3)
  
  by sabian2008 ( 6338768 ) writes:
  
  Yeah. The title is phrased like they solved the Javier Stokes blowup or something like that. In reality they found a speedup. I hope that at least this time they decreased the asymptotic order by an integer value and not from N^2.6x to N^2.5x (or something like that) like they did a couple of years ago for matmul.
  - Re: Unsolvable? (Score:3)
    
    by sabian2008 ( 6338768 ) writes:
    
    We all know about Javier Stokes, brother of Navier, right. Damn you autocorrect!
    - Re: (Score:3)
      
      by serviscope_minor ( 664417 ) writes:
      
      Nah it's Javier Xtoques. Unrelated.
Headline is stupid (Score:2)

by aldousd666 ( 640240 ) writes:

If it's impossible.... So therefore it's not impossible.
- Re: (Score:3)
  
  by CaseCrash ( 1120869 ) writes:
  
  I think they mean that the P=NP problem seems unsolvable (so far) and the AI made an algorithm for a specific one of those problems that gives a "good-enough" answer. It definitely didn't solve bin-packing, but maybe we have a better/faster decent answer now.
Do we understand why? (Score:4, Insightful)

by bidule ( 173941 ) writes: on Friday December 15, 2023 @01:20AM (#64082949) Homepage

So the new code is faster.
Do we understand how it became faster, what insight we missed? That's more interesting from my point of view.

- Re: (Score:3)
  
  by javaman235 ( 461502 ) writes:
  
  Is it? Or is the question of a generalized process that speeds things up without understanding anything about the specifics of how in each individual case more interesting? Not sure I know. Having the latter seems powerful, but not predictable in any way you can rely on.
  - Re: (Score:2)
    
    by bidule ( 173941 ) writes:
    
    If it's a trick like inverse square root, it's neat but we can't extend it further.
    Like you said, getting random improvement is useful but unpredictable.
    If it's more generalized, it can open new avenues of research and ultimately make us smarter.
    Because it's a waste if AI only makes us dumb and dumber.
- Re: (Score:3)
  
  by mattr ( 78516 ) writes:
  
  From TFA: “To be very honest with you, we have hypotheses, but we don’t know exactly why this works,” says Alhussein Fawzi, a research scientist at Google DeepMind. “In the beginning of the project, we didn’t know whether this would work at all.”
  In other words, "Uh-oh".
- Re: (Score:2)
  
  by trybywrench ( 584843 ) writes:
  
  So the new code is faster.
  Do we understand how it became faster, what insight we missed? That's more interesting from my point of view.
  does it matter? if it works and works well enough then it works. Do you really need to understand it? I don't fully understand how my car works but i can drive it just fine.
Hey! (Score:5, Funny)

by SuperKendall ( 25149 ) writes: on Friday December 15, 2023 @04:07AM (#64083033)

FunSearch (so called because it searches for mathematical functions, not because it's fun)
Hey maybe searching for mathematical functions is not fun for YOU but for some of us that's a good Saturday night!

So when will an AI win the Fields medal (Score:2)

by Bruce66423 ( 1678196 ) writes:

Asking for a friend... ;)
- Re: (Score:2)
  
  by airport76 ( 7682176 ) writes:
  
  When was the last time a calculator won?
Indeed (Score:2)

by nospam007 ( 722110 ) * writes:

" It shows that they can indeed make discoveries -- if they are coaxed just so, and if you throw out the majority of what they come up with. "
So, just like a human scientist.
The Nature preprint (Score:3)

by mattr ( 78516 ) writes: <mattr.telebody@com> on Friday December 15, 2023 @07:41AM (#64083111) Homepage Journal

is interesting and shows some code. In particular the discussion is short but describes the cap set and bin packing solutions.
The bin packing solution is neat. It seems to evolve a heuristic to score different bins and then chooses which to put the next package into... but not if the fit is too tight, so that no tiny spaces get left that will never be filled. It beats currently known heuristics. This alone seems to be quite valuable in a commercial sense.
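
For anyone who wants to see what "score the bins, pick the best" looks like, here is a minimal sketch. It is not the heuristic FunSearch evolved: `pack`, `first_fit`, `best_fit` and `avoid_slivers` are hypothetical names, and the `sliver` threshold and the penalty of 1000 are made-up parameters chosen only to show the idea of steering away from fits that leave a tiny unusable gap.

```python
def pack(items, capacity, score):
    """Online packing: put each item in the open bin with the highest score."""
    bins = []  # remaining capacity of each open bin
    for item in items:
        fits = [i for i, remaining in enumerate(bins) if remaining >= item]
        if fits:
            # max() returns the first bin among ties, i.e. the oldest one.
            best = max(fits, key=lambda i: score(item, bins[i]))
            bins[best] -= item
        else:
            bins.append(capacity - item)  # nothing fits: open a new bin
    return len(bins)


def first_fit(item, remaining):
    """Every bin scores the same, so the earliest bin that fits wins."""
    return 0


def best_fit(item, remaining):
    """Prefer the bin that would be left with the least free space."""
    return -(remaining - item)


def avoid_slivers(item, remaining, sliver=10):
    """Best fit, except near-misses that leave a small gap are penalised (assumed rule)."""
    leftover = remaining - item
    if leftover == 0:
        return float("inf")  # a perfect fit is always taken
    if leftover < sliver:
        return -leftover - 1000  # tight but not exact: last resort
    return -leftover
```

All three rules plug into the same `pack` loop, e.g. `pack([60, 50, 40, 30, 20], 100, avoid_slivers)`, which is what makes the scoring function a natural thing to search over.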

Why not FuncSearch? (Score:2)

by chas.williams ( 6256556 ) writes:

FunSearch is a terrible name. FuncSearch is just way cooler, even funkier one might say.
Actually read (skimmed) the paper (Score:3)

by coop247 ( 974899 ) writes: on Friday December 15, 2023 @09:17AM (#64083181)

While the theoretical problem of cap set was already solved, actual implementations used rudimentary "first fit" or "best fit" for bin selection/placement. Seems like the AI developed a custom heuristic search function that outperforms those human built fit ones.

Apparently the function is complex and they aren't even entirely sure what it is doing. To me this seems like a problem of overfitting. They trained the model on 4 synthetic test sets, and compared its performance on these same datasets. The millionth monkey managed to find some great function parameters to minimize the objective function on this particular dataset, not "universally solve" anything.

Maybe I'm wrong but that's how I read it. It did not seem clear that there were separate test sets it had never seen and compared to other methods.

error checker (Score:3)

by packrat0x ( 798359 ) writes: on Friday December 15, 2023 @10:35AM (#64083355)

The most impressive part of this story is about these *other* programs being able to review and check the output of the LLM.

And the answer is ... (Score:2)

by PPH ( 736903 ) writes:

... 42.
solve the unsolvable (Score:2)

by groobly ( 6155920 ) writes:

If it is unsolvable, then the "solution" is not a solution. If it is a solution, then the problem is not "unsolvable." Maybe they meant "previously unsolved"?
