Google DeepMind Uses LLM To Solve Unsolvable Math Problem (technologyreview.com)
An anonymous reader quotes a report from MIT Technology Review: In a paper published in Nature today, the researchers say it is the first time a large language model has been used to discover a solution to a long-standing scientific puzzle -- producing verifiable and valuable new information that did not previously exist. "It's not in the training data -- it wasn't even known," says coauthor Pushmeet Kohli, vice president of research at Google DeepMind. Large language models have a reputation for making things up, not for providing new facts. Google DeepMind's new tool, called FunSearch, could change that. It shows that they can indeed make discoveries -- if they are coaxed just so, and if you throw out the majority of what they come up with.
FunSearch (so called because it searches for mathematical functions, not because it's fun) continues a streak of discoveries in fundamental math and computer science that DeepMind has made using AI. First AlphaTensor found a way to speed up a calculation at the heart of many different kinds of code, beating a 50-year record. Then AlphaDev found ways to make key algorithms used trillions of times a day run faster. Yet those tools did not use large language models. Built on top of DeepMind's game-playing AI AlphaZero, both solved math problems by treating them as if they were puzzles in Go or chess. The trouble is that they are stuck in their lanes, says Bernardino Romera-Paredes, a researcher at the company who worked on both AlphaTensor and FunSearch: "AlphaTensor is great at matrix multiplication, but basically nothing else." FunSearch takes a different tack. It combines a large language model called Codey, a version of Google's PaLM 2 that is fine-tuned on computer code, with other systems that reject incorrect or nonsensical answers and plug good ones back in.
The researchers started by sketching out the problem they wanted to solve in Python, a popular programming language. But they left out the lines in the program that would specify how to solve it. That is where FunSearch comes in. It gets Codey to fill in the blanks -- in effect, to suggest code that will solve the problem. A second algorithm then checks and scores what Codey comes up with. The best suggestions -- even if not yet correct -- are saved and given back to Codey, which tries to complete the program again. After a couple of million suggestions and a few dozen repetitions of the overall process -- which took a few days -- FunSearch was able to come up with code that produced a correct and previously unknown solution to the cap set problem, which involves finding the largest size of a certain type of set. Imagine plotting dots on graph paper. [...] To test its versatility, the researchers used FunSearch to approach another hard problem in math: the bin packing problem, which involves trying to pack items into as few bins as possible. This is important for a range of applications in computer science, from data center management to e-commerce. FunSearch came up with a way to solve it that's faster than human-devised ones.
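In rough pseudocode, the loop described above looks something like this (a minimal sketch; llm_complete and score are hypothetical stand-ins for Codey and the automated evaluator, not DeepMind's actual interfaces):

```python
import random

def llm_complete(program: str) -> str:
    """Hypothetical stand-in for the code-tuned LLM (Codey in the article)."""
    raise NotImplementedError("plug an LLM call in here")

def score(program: str) -> float | None:
    """Hypothetical stand-in for the checker; returns None to reject bad code."""
    raise NotImplementedError("plug the problem-specific evaluator in here")

def fun_search_loop(skeleton: str, iterations: int, pool_size: int = 20) -> str:
    """Evolve completions of `skeleton`: propose, score, keep the best, repeat."""
    pool = [(0.0, skeleton)]                      # (score, program) pairs
    for _ in range(iterations):
        _, parent = random.choice(pool)           # reuse a saved suggestion
        candidate = llm_complete(parent)          # ask the LLM to fill the blanks
        s = score(candidate)                      # check and grade the attempt
        if s is None:
            continue                              # throw out incorrect/nonsense code
        pool.append((s, candidate))               # feed good ones back in
        pool = sorted(pool, reverse=True)[:pool_size]  # keep only the best
    return max(pool)[1]                           # highest-scoring program found
```

Note the division of labor: the LLM only proposes; a conventional program does all the accepting and rejecting.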
A million monkeys? (Score:5, Interesting)
It sounds an awful lot like they are producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works. Every iteration, the ones that produce anything resembling a solution to the problem are sent back to the monkeys for editing.
Re:A million monkeys? (Score:5, Insightful)
Re: (Score:3)
Re:A million monkeys? (Score:5, Funny)
producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works.
I've met humans who develop code the same way.
Re:A million monkeys? (Score:5, Insightful)
The more important question is: have you met humans that don't develop code that way?
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I've met humans who develop code the same way.
Humans did after all write all the words on the Internet that these LLMs have processed. It's kind of like the convergent evolution of all BS. Talk enough like you know something and pretty soon it might start looking like you have an answer to some problem. Computers are just faster at connecting the dots between completely unrelated topics using statistics. With enough statistics all lies become true.
Re: A million monkeys? (Score:2)
Re: (Score:3)
Re: (Score:3)
It is useless if the problems it causes are at least as troublesome as the results are helpful. And since tornadoes aren't known to spontaneously assemble communities out of random parts, the "million monkeys" approach to technology almost never works unless the goal is stalling or con artistry.
Most of nature and our own technological development is a result of trial and error. It is essential to life and technological progress. What matters here, to quote a sentence from the paper: "The LLM is the creative core of FunSearch, in charge of coming up with improvements to the functions presented in the prompt and sending these for evaluation."
Monkeys typing on keyboards is just a loose metaphor that shouldn't be taken literally... what is really happening is more akin to informed rather than random guessing.
Re: (Score:2)
Re: (Score:2)
Per what I said about con artistry, there's a fundamental difference between actually accomplishing a task vs. just manipulating the judges into believing it's been accomplished based on superficial appearances. I get the strong impression... and it would seem to be inherent in what they're doing... that AI researchers tend to think there literally is no difference between the two. It's the definition of a social engineering hack.
I don't know what you are trying to say with regards to manipulating judges. In this particular case there is a trivially checkable objective function that is the ultimate discriminator.
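For the cap set case that check really is trivial: a set of vectors over Z_3^n is a cap set iff no three distinct vectors sum to zero componentwise mod 3. A quick illustration in Python (my own sketch, not code from the paper):

```python
from itertools import combinations

def is_cap_set(vectors: list[tuple[int, ...]]) -> bool:
    """True iff no three distinct vectors sum to 0 componentwise mod 3."""
    unique = set(vectors)
    if len(unique) != len(vectors):
        return False  # duplicates are not allowed
    for a, b, c in combinations(unique, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False  # found three collinear points
    return True

# These four points form a cap set in Z_3^2 (4 is the maximum for n = 2).
print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)]))  # True
```

Verifying a candidate is cheap; the hard part, which is what FunSearch searched for, is constructing large ones.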
Re: (Score:3)
That's the same idiotic argument you'd hear from any creationist.
Hint: it's not a lottery ticket if the numbers can change.
Re: (Score:2)
Re: (Score:2)
If so, it's irrelevant in the direction that makes your argument even less valid.
Re: (Score:2)
Re: (Score:2)
I think you've just found a counter argument to the million monkeys hypothesis.
It produces real, useful results. That is an observation.
Re: A million monkeys? (Score:3)
Dear AI gods, please do not destroy human civilization by "solving" (breaking) all encryption.
Re:A million monkeys? (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
It sounds an awful lot like they are producing massive amounts of worthless code, like a million monkeys with a million typewriters, and then testing the output until it works.
So pretty much the same way the world of software development works overall... but in a box
Unsolvable? (Score:5, Insightful)
Re: Unsolvable? (Score:3)
Re: Unsolvable? (Score:3)
Re: (Score:3)
Nah it's Javier Xtoques. Unrelated.
Headline is stupid (Score:2)
Re: (Score:3)
Do we understand why? (Score:4, Insightful)
So the new code is faster.
Do we understand how it became faster, what insight we missed? That's more interesting from my point of view.
Re: (Score:3)
Is it? Or is a generalized process that speeds things up, without any understanding of the specifics of each individual case, the more interesting thing? Not sure I know. Having the latter seems powerful, but not predictable in any way you can rely on.
Re: (Score:2)
If it's a trick like the fast inverse square root, it's neat, but we can't extend it further.
Like you said, getting random improvement is useful but unpredictable.
If it's more generalized, it can open new avenues of research and ultimately make us smarter.
Because it's a waste if AI only makes us dumb and dumber.
Re: (Score:3)
From TFA: “To be very honest with you, we have hypotheses, but we don’t know exactly why this works,” says Alhussein Fawzi, a research scientist at Google DeepMind. “In the beginning of the project, we didn’t know whether this would work at all.”
In other words, "Uh-oh".
Re: (Score:2)
So the new code is faster.
Do we understand how it became faster, what insight we missed? That's more interesting from my point of view.
Does it matter? If it works, and works well enough, then it works. Do you really need to understand it? I don't fully understand how my car works, but I can drive it just fine.
Hey! (Score:5, Funny)
FunSearch (so called because it searches for mathematical functions, not because it's fun)
Hey maybe searching for mathematical functions is not fun for YOU but for some of us that's a good Saturday night!
So when will an AI win the Fields medal (Score:2)
Asking for a friend... ;)
Re: (Score:2)
Indeed (Score:2)
" It shows that they can indeed make discoveries -- if they are coaxed just so, and if you throw out the majority of what they come up with. "
So, just like a human scientist.
The Nature preprint (Score:3)
is interesting and shows some code. In particular, the discussion section is short but describes the cap set and bin packing solutions.
The bin packing solution is neat. It seems to evolve a heuristic to score different bins and then chooses which to put the next package into -- but not if the fit is too tight, so that no tiny spaces get left that will never be filled. It beats currently known heuristics. This alone seems to be quite valuable in a commercial sense.
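Something like the following captures the shape of that heuristic -- score each open bin for the incoming item, prefer snug fits, but refuse placements that would leave an unusably small gap. This is just an illustration of the idea, not the function FunSearch actually evolved:

```python
def pick_bin(item: float, bins: list[float], capacity: float = 1.0,
             min_useful_gap: float = 0.05) -> int | None:
    """Return the index of the best bin for `item`, or None to open a new one."""
    best_index, best_score = None, float("-inf")
    for i, used in enumerate(bins):
        gap = capacity - used - item
        if gap < 0:
            continue                      # item does not fit at all
        if 0 < gap < min_useful_gap:
            continue                      # would leave an unfillable sliver
        fit_score = -gap                  # tighter fits score higher
        if fit_score > best_score:
            best_index, best_score = i, fit_score
    return best_index

def pack(items: list[float]) -> list[float]:
    """Online packing: place each item via `pick_bin`, opening bins as needed."""
    bins: list[float] = []
    for item in items:
        i = pick_bin(item, bins)
        if i is None:
            bins.append(item)             # open a new bin
        else:
            bins[i] += item
    return bins

print(len(pack([0.4, 0.7, 0.3, 0.6, 0.5, 0.5])))  # 3 bins for these items
```

The min_useful_gap cutoff is the "not if the fit is too tight" part; without it this degenerates into plain best-fit.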
Why not FuncSearch? (Score:2)
Actually read (skimmed) the paper (Score:3)
Apparently the function is complex and they aren't even entirely sure what it is doing. To me this seems like a problem of overfitting. They trained the model on 4 synthetic test sets, and compared its performance on these same datasets. The millionth monkey managed to find some great function parameters that minimize the objective function on this particular dataset, not "universally solve" anything.
Maybe I'm wrong but that's how I read it. It did not seem clear that there were separate test sets it had never seen when it was compared to other methods.
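For what it's worth, the evaluation that would settle this is simple to set up: search for the heuristic on one set of instances, then report its score only on instances the search never saw. A toy sketch (all names here are hypothetical placeholders, not the paper's setup):

```python
import random

def split_instances(instances: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle and split problem instances into train and held-out test sets."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def search_heuristic(train: list):
    """Placeholder for the FunSearch-style search; sees training instances only."""
    raise NotImplementedError("run the evolutionary search here")

def evaluate(heuristic, test: list) -> float:
    """Placeholder: mean objective value of `heuristic` over unseen instances."""
    raise NotImplementedError("score the heuristic on held-out instances here")

# train, test = split_instances(all_bin_packing_instances)
# heuristic = search_heuristic(train)   # overfitting happens here, if anywhere
# print(evaluate(heuristic, test))      # generalization is measured here
```

If the reported numbers come only from the instances the search optimized against, the overfitting worry stands.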
error checker (Score:3)
The most impressive part of this story is these *other* programs being able to review and check the output of the LLM.
And the answer is ... (Score:2)
solve the unsolvable (Score:2)
If it is unsolvable, then the "solution" is not a solution. If it is a solution, then the problem is not "unsolvable." Maybe they meant "previously unsolved"?