OpenAI Claims It Solved an 80-Year-Old Math Problem
An anonymous reader quotes a report from TechCrunch: OpenAI claims its new reasoning model has produced an original mathematical proof disproving a famous unsolved conjecture in geometry, which was first posed by Paul Erdos in 1946. If this sounds familiar to you, it's because this isn't the first time OpenAI has made such a bold claim. Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others."
It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature. Taunts from rivals like Yann LeCun and Google DeepMind CEO Demis Hassabis followed, and Weil promptly took down his premature post. Today, at least, it seems OpenAI didn't make the same mistake twice. Alongside the announcement, the company published companion remarks (PDF) in support of the disproof from mathematicians like Noga Alon, Melanie Wood, and Thomas Bloom, who maintains the Erdos Problems website, and previously called Weil's post "a dramatic misrepresentation."
[...] The proof, per OpenAI, came from a new general-purpose reasoning model, not a system specifically designed to solve math problems or even this problem in particular. OpenAI says this is significant because it means AI systems are now more capable of holding together long, difficult chains of reasoning and connecting ideas across fields in ways researchers may not have previously explored. That has implications for biology, physics, engineering, and medicine.
Mathematician commentary included (Score:5, Informative)
"One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is in some sense “familiar” with all the previous work."
Re:Mathematician commentary included (Score:4, Interesting)
To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.
Re:Mathematician commentary included (Score:4, Interesting)
I don't think anyone is contesting that. What they are contesting is a sociopath's claim that his tool solved something humans couldn't ... when really 95+% of what it did was just leverage existing human knowledge.
Caveat: I'm not a mathematician and I didn't read the paper ... but Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.
Re: (Score:2)
when really 95+% of what it did was just leverage existing human knowledge.
All mathematicians build math by leveraging what other mathematicians have done. When Andrew Wiles proved enough of the Modularity Theorem to prove Fermat's Last Theorem, he was leveraging ideas from all over, from algebraic geometry, from representation theory, from complex analysis, from Galois theory, from elliptic curves, etc. Combining that was the big thing. When Peter Scholze and Dustin Clausen recently did their work on "condensed mathematics" (which may get Clausen a Fields Medal- not Scholze since
Re:Mathematician commentary included (Score:4, Informative)
Right, so you could make a claim like "all mathematicians leveraged (say) 95% prior knowledge". If OpenAI similarly leveraged 95%, and "discovered" the rest, it'd be (at least somewhat) legitimate to say "OpenAI invented a new theorem".
But, if OpenAI actually leveraged 99.9% existing knowledge (remember, my post said ""95+%"), then it's NOT fair to compare it to a human discovery. If a company claims as much, they're dishonestly promoting their product.
Again, I'm not a mathematician, and I did not read the paper. But given Altman's history of deception, I think any disinterested observer should lean towards assuming that he is falsely promoting the competency of his product, before assuming he's invented a machine that can out-invent humans.
Re: (Score:2)
Made up numbers are kind of silly. Especially when they refer to things that aren't easily quantifiable.
If the AI made a non-trivial contribution to a proof then that's interesting. In this case it seems like it did so. I doubt it was something mathematicians couldn't do, but it does seem to be something they hadn't done.
Re: (Score:2, Interesting)
Also, it's silly that people are acting like "all problems but this one were already in the literature". AI has solved a whole slew of Erdos problems, and only a fraction had anything to do with existing literature [github.com].
And even in "existing literature" examples, it's not "nobody ever thought to search before" as if all mathematicians are morons, or that mathematicians adore putting out Erdos problem solutions without claiming them. It's that nobody had ever thought to apply an obscure technique from a given pi
Re: (Score:2)
I am surprised that a site supposedly full of computer scientists is the least bit surprised that AI can be good at mathematical proofs. For any formalizable problem you know where you start, you know where you want to end up and you know the legal state transitions. It's a simple tree search that we have, in fact, written lots of standard computer programs to execute.
The difficulty comes because any non-trivial proof is a very big tree search. But learning style AI is really good at pruning really big trees
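The "start state, goal state, legal transitions" framing above can be sketched as a breadth-first search. This is only a toy illustration and says nothing about how any OpenAI system works: it uses Hofstadter's MIU string-rewriting puzzle as a stand-in formal system, and the names `successors`, `find_derivation`, and the `max_len` cap are assumptions made up for the example.

```python
from collections import deque


def successors(s):
    """All strings reachable from s by one MIU rewriting rule."""
    out = set()
    if s.endswith("I"):                      # rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):                    # rule 2: Mx -> Mxx
        out.add("M" + s[1:] * 2)
    for i in range(len(s) - 2):              # rule 3: III -> U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):              # rule 4: UU -> (deleted)
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def find_derivation(start, goal, max_len=12):
    """Breadth-first search for a shortest derivation of goal from start.

    max_len is a hypothetical cap on string length that keeps the search
    finite; returns the list of states, or None if nothing is found.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in sorted(successors(state)):
            if len(nxt) <= max_len and nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


print(find_derivation("MI", "MUI"))
```

Even in this toy, the number of reachable strings grows quickly with `max_len`, which is the "very big tree" problem the comment describes. The cap stands in for the pruning that a learned heuristic would have to supply.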
Re: (Score:2)
Literally no one has expressed that, as far as i can see, in this thread. You're fighting a straw man.
What I said, and stand by, is that we should be skeptical because Sam Altman is a sociopathic liar, and there is a long history of examples to support that. It has nothing to do with whether Open AI (or any AI) can create proofs.
Re: (Score:2)
I wasn't replying to you.
LOTS of people here have been skeptical that AI can do X where X is pretty much anything, and certainly where X is "formulate novel math proofs."
PS: I don't disagree with you that Sam Altman claiming something isn't good evidence. That was not the subject of the post I replied to and has nothing to do with my reply, which was not to your post.
Re: (Score:2)
Chess and Go pruned the search tree by applying heuristics. Simplified: Take the current board, do one step, play random until the end. If you win many times, the subtree is more interesting than the subtrees where you lose more often. There is no such heuristic for a proof. Nobody tells you "You're close to the correct solution" and some attempts may look good until the end and the missing step then shows that the whole idea didn't work out. Chess and Go also allow to find suboptimal solutions and still be
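The "do one step, play random until the end" idea can be sketched on a game much smaller than chess or Go. What follows is flat Monte Carlo move evaluation, not a full Monte Carlo Tree Search, on a hypothetical single-pile Nim variant (take 1 to 3 stones, whoever takes the last stone wins). The function names and the playout count are assumptions for illustration.

```python
import random


def random_playout(stones, rng):
    """Play uniformly random moves to the end.

    Returns True if the player who moves first in this playout wins.
    """
    first_player_to_move = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move
    return False  # no stones left: the player to move has already lost


def best_move(stones, playouts=2000, rng=None):
    """Score each legal move by the fraction of random playouts it wins."""
    rng = rng or random.Random()
    scores = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            scores[take] = 1.0  # taking the last stone wins outright
            continue
        # The opponent moves next, so we win whenever their playout loses.
        wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
        scores[take] = wins / playouts
    return max(scores, key=scores.get)


print(best_move(5, rng=random.Random(0)))
```

The playouts supply a win/loss signal at the end of every line of play. The comment's objection is that an unfinished proof attempt offers no comparable signal.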
Re: (Score:2)
The old method of building chess players was heuristics. You have a bunch of engineers who hopefully know something about chess come up with rules about how good a position is. That's what "heuristics" are.
The thing that made chess computers better than any human and go computers better than all but the very best (or probably all by now) was using neural networks that learn their heuristics by experience. And yes, human mathematicians absolutely do learn heuristics, i.e. "gut feelings" regarding good approa
Re: (Score:2)
Nah, I am talking about Monte Carlo Tree Search. That can use heuristics, but is mostly about pruning search trees in an automated manner. Heuristics are useful to better estimate the worth of a subtree, but in general it is a self-play strategy that can work without human preknowledge. The breakthrough for Go was to add neural networks to MCTS to further improve the algorithm, but for chess the tree search alone is enough.
And while you could say that some proofs may be overly complicated, you can only know
Re: (Score:2)
The simple fact is, AI has gotten much better at solving unsolved math problems than humans are. It's simply another field that it's taking over, the same way it has been taking over programming.
I'm not sure how well it works in programming. It helps in brainstorming and for a developer working in an area in which he is not advanced. There is a claim it works very well in crypto implemented in Rust with full test coverage and after human developer provided all the main interfaces. It is plausible. Most crypto code is opensource and easy to test so AI can work very well essentially rewriting the code it was trained on into Rust.[1]
But on the other side there is that study that did show that develope
Re: (Score:2)
The simple fact is, AI has gotten much better at solving unsolved math problems than humans are.
We're not at that point yet. Right now, we're not seeing it solve the genuinely hardest problems, like say the Riemann Hypothesis, or P ?= NP. What is true is that these systems are at least as good as a beginning grad student in all subfields and are outputting results equivalent to a top-notch mathematician on some problems. But it is also true that these systems are improving rapidly. So while your statement is false right now, it looks likely your statement i
Re: (Score:3)
What is true is that these systems are at least as good as a beginning grad student in all subfields and are outputting results equivalent to a top-notch mathematician on some problems.
That might not be true. They didn't release the actual output from the AI. Instead, they released a proof that was heavily modified by mathematicians before being released to the world. It's not clear what the AI actually did. https://cdn.openai.com/pdf/74c... [openai.com]
Presumably if the output from the AI was very good, they would have just released that.
Re: (Score:2)
If the mathematicians had done the main work, they would have claimed the result for themselves. And the prompts seem to be included in the paper; you are only linking the remarks.
https://cdn.openai.com/pdf/74c... [openai.com]
See page 3
Re: (Score:3)
" If the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT’s solution, the mathematicians would have found a counterexample. However, without the claimed proof by Chat GPT, there is no particular reason anyone would have tried to look for a counterexample, assembled a group of experts with the appropriate expertise, or that the experts would have agreed to turn their attention to this problem. We can all be reminded by this development of how frequently interesting and powerful things happen mathematically when one applies ideas from one field to another, and think about how AI can help us find more cross-field applications."
Re: (Score:2)
So it's entirely possible that all the machine did was a literature search, and then the humans fixed the last 1% or even the last 5%. Useful, but different than what is claimed.
Re: (Score:2)
No.There's no leveraging. The purpose of research papers in mathematics is to advance the frontiers of knowledge. That requires, by definition, original contribution. It's about as far removed from "leveraging" as you can get.
Engineers (of the software kind, typically) don't always understand the difference between using something someone else has done for solving a problem, and creating a new original solution to a problem.
Re: (Score:2)
Re: (Score:2)
a sociopath's claim....
Are you hinting at Sam Altman?
Re: (Score:2)
No, not hinting :)
Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.
Re: (Score:2)
No, the contention is that OpenAI is claiming the tool is solving when in reality it is only regurgitating a found solution.
LLMs do pretty well at being a search engine when they're illegally fed everything.
Re: (Score:2)
Re: (Score:2)
"Couldn't" is a strong word here. Better say "Didn't yet". Some things need many people trying before someone gets the idea, or luck, or a combination of both. Ai isn't the superhuman, but AI managed to solve that one.
Re:Mathematician commentary included (Score:4, Informative)
To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.
The problem for all the AI companies is that this means they've built a discount version of the Library Computer Access Retrieval System (LCARS) with questionable data. That is not AGI. That is not intelligence at all. That is a search engine with a different, and often terrible, interface.
That tool, while useful in some ways, is nowhere near worth the billions they've burned selling it as the tool to do everything. That means Sam, Dario, Elon, and the rest spent the GDP of several countries to build a fancier Wikipedia.
Re: (Score:2)
That is true. But that is not how this gets pushed, and such a search tool would never even remotely justify the extreme effort LLMs need to do this.
Re: (Score:2)
Re:Mathematician commentary included (Score:4, Informative)
They mostly use plain English, so their comments are very helpful.
Re: (Score:2)
Either that verification will take a few years or this was exceptionally easy to prove once the prerequisites were clear. Hence at this time there are only two options: The proof is wrong or it was not hard to find for a machine that cannot reason, but can trawl through vast amounts of data looking for correlations.
My guess is the second: The only reason nobody else found this is because it is very easy to do, but the prerequisites are very non-intuitive and hence nobody looked. Kind of like those "20 year
Re: (Score:2)
It's hard to say intelligent things about what the AI is actually doing, because they haven't released the source code. It's not just an LLM, it's an LLM with some kind of algorithm tied to it somehow. In particular, it didn't need to write out a logical train of thought (which would be a proof), all it had to do was say, "here is a tighter way to pack dots on a grid."
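For readers unfamiliar with the problem named elsewhere in this thread (the Erdős unit distance problem), the quantity at stake is the number of pairs of points exactly one unit apart. Here is a minimal sketch that assumes integer coordinates and uses the hypothetical helpers `count_unit_distances` and `square_grid`. It only shows what is being counted and says nothing about the construction in OpenAI's paper.

```python
from itertools import combinations


def count_unit_distances(points):
    """Count unordered pairs of points at Euclidean distance exactly 1.

    Squared distances are compared so integer inputs need no floating point.
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


def square_grid(n):
    """The n-by-n block of integer lattice points."""
    return [(x, y) for x in range(n) for y in range(n)]


for n in (2, 3, 4):
    print(n * n, "points:", count_unit_distances(square_grid(n)), "unit pairs")
```

An n-by-n square grid with unit spacing has 2n(n-1) such pairs. The open question concerns how much better than this a cleverly chosen point set can do.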
Re: (Score:2)
Good point. Obviously, a counter example is very simple in "proof structure", especially as it does not need to tell you anything about what an optimal result would look like.
I guess Mathematicians will continue to have good job opportunities after all.
As to what that LLM does, there are a lot of non-statistical tools it could be using. Obviously, if they tell us, say, "the LLM handed 100'000 possible counterexamples to Wolfram Alpha and Wolfram Alpha picked the single one that was not nonsense", that kind
Re: (Score:2)
"One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is
Re: (Score:3)
You didn't read the paper so you are ignorant by definition. RTFP or GTFO, you ignoramus. Be ignorant no longer.
Re: (Score:2)
The real question is how much effort was wasted on problems it could not solve. My guess is a _lot_. Even a statistical model can get lucky occasionally, if everything was already in the training data. This is not proof of any systematic or meaningful skill. It is like asking a student to find a problem they can solve and then have them solve it, instead of giving them a specific problem to solve. Meaningless.
Re: (Score:1)
LLMs are not "statistical models" (randomness only even comes into play in the final conversion from latent space to token space because latent space is high dimensional, token space is low dimension, you need a rounding mechanism, and a "noisy" rounding mechanism works best; what you're thinking of, by contrast, is Markov models). And you cannot just "get lucky and randomly solve an unsolved math problem"; that's not how any of this works.
Re: (Score:2)
My understanding is that LLMs are built on a foundation of ANNs, and that indeed the backpropagation used to train ANNs is a statistical process; the cost function that must be minimized (via vector calculus) is a least-squared-error variant, a decidedly statistical calculation. How does this not make the model statistical ?
Re: (Score:2)
Neural networks are approximations you use when the data is too large to be tractable with a Bayesian classifier.
Re: (Score:1)
Two responses. One, that's discussing individual-neuron scale processes rather than collective processes; and this was a discussion about inference, not training. Human neurons also learn by error minimization (Hebbian learning). But this does not describe the macroscopic processes that result from said minimization.
* During training, neurons develop into clas
Re: (Score:3)
The words to describe the process that you have identified are statistical inference, not logical inference. I don't believe that you can square that circle; it's why NNs are said to interpolate, but not extrapolate. But my beliefs are, shall we say, flexible -- I'm open to a counter-argument.
Re: (Score:2)
Essentially, yes. The problem with "statistical inference" is that it does not stack. Logical inference can be stacked as high as you want and the end-result (unless you made a real "hard" error in applying the rules) will always be valid. With "statistical inference", that is not true. Each step only has a probability smaller than 1 to succeed and that is fundamental. At some, not very high, number of steps, the results become arbitrary. Hence the "feat" here is that the LLM found the result to be very clo
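The "does not stack" arithmetic can be made concrete under the comment's own model, in which every step succeeds independently with the same probability. That independence is an assumption of the sketch, not an established fact about LLMs, and `chain_success` is a hypothetical helper name.

```python
def chain_success(p_step, n_steps):
    """Probability that n_steps independent steps all succeed.

    Assumes each step succeeds with the same probability p_step.
    """
    return p_step ** n_steps


for n in (10, 100, 1000):
    print(n, "steps at 99% each:", round(chain_success(0.99, n), 4))
```

Under that model, a chain of 100 steps at 99% reliability per step holds together only about 37% of the time. Whether real systems behave like independent coin flips is exactly what the replies dispute.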
Re: (Score:2)
So what is statistical inference to you and how does it relate to what LLM models do? While we're at it, what is extrapolation? I feel like you're using a lot of terminology in an incorrect or at least imprecise wa
Re: (Score:2)
The mathematicians in the paper you linked all contend that the AI came up with the key counterexample. Just because humans wrote it in a more readable form that is slightly generalized doesn't invalidate the fact that the counterexample the AI came up with works for the original conjecture. Are you going to say the same of Grigori Perelman's proof?
Did it use Lean ? (Score:2)
Or any other proof assistant / verifier ? Is this true NN reasoning, or just more LLM/NN spaghetti thrown up against a symbolic verifier ?
Re:Did it use Lean ? (Score:4, Informative)
Re: (Score:3)
I am in no way dismissing hybrid systems that incorporate Lean, or similar symbolic apparatus. To the contrary, I find that there must be some "symbolic assist" (i.e., some predicate calculus engine) to pure NN systems to verify that their pattern match is a valid proof. But I'm more than willing to be proven wrong. I just don't see how to get past the fact that statistical inference is not the same as logical inference, at least in the province of proofs. But WTH, feel free to educate me.
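To make the "symbolic assist" idea concrete, here is roughly what a proof checker contributes. This is a minimal Lean 4 sketch with made-up statements. Nothing in the story says whether Lean or any other verifier was used for the result under discussion.

```lean
-- The kernel accepts a proof only if every step type-checks.
theorem n_add_zero (n : Nat) : n + 0 = n := rfl

-- A disproof by counterexample: exhibit a witness and let the checker compute.
example : ¬ ∀ n : Nat, n * n ≠ 9 := fun h => h 3 rfl

-- A wrong claim is rejected no matter how plausible it looks:
-- example : 2 + 2 = 5 := rfl   -- does not type-check
```

Where the candidate proof comes from (a human, a neural network, a brute-force search) does not matter to the checker. That is the division of labor a hybrid system would rely on.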
Re: (Score:2)
Adding to your comment, mathematicians find logic engines useful tools too. Hybrid systems aren't clunky and shouldn't seem to be so. Whether it's some kind of trained AI plus a logic solver or a human plus a logic solver, it's good design taking advantages of the strengths of both systems.
Re: (Score:1)
It is a statistical model getting lucky. Works like "predictions" by stock analysts: You put out 1000s of predictions and if you are right by pure random chance once, you claim that it was your superior skills.
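The "thousands of predictions" argument rests on a simple formula: the chance of at least one hit among many independent long shots. The sketch below assumes independent attempts with an identical per-attempt probability. The numbers are invented for illustration and are not estimates of any real system, and `at_least_one_hit` is a hypothetical helper name.

```python
def at_least_one_hit(p_single, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


# Invented numbers: a 1-in-1000 shot, repeated 1000 times.
print(round(at_least_one_hit(0.001, 1000), 4))
```

With these made-up inputs the chance of at least one success is about 63%, which is why a single success says little unless the number of failed attempts is also reported.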
This is the real deal (Score:3, Insightful)
More than any other use of AI yet to solve an open problem, this one cannot be dismissed without just being completely irrational. Even with Erdos 1196, people could argue that maybe the solution was hidden somewhere in the training data (which, as a mathematician in a closely related area, seemed extremely unlikely for a whole bunch of reasons I'm happy to expand on) or that the problem just hadn't gotten a lot of attention (which was arguable there, even though it was a well known enough problem that I had heard of it). But the Erdos unit distance problem is a genuinely famous problem. There's no way that there was a lack of attention to the problem, and there's no way to say some solution was in an obscure journal no one noticed. This is a problem which literally gets discussed in some undergrad classes.
The Annals of Mathematics is the most prestigious math journal in the world, and most mathematicians will never get a paper published there at all (I certainly don't expect to). I talked with another mathematician whose work is closer to this problem and asked "So is this the time when an AI first gets a result that should be essentially in the Annals?" and his response was "delete essentially from that sentence and the answer is yes." I have a bet with another mathematician that there would be no papers in either the Annals, Inventiones, or Crelle where the result was discovered by an AI before 2028. 72 hours ago I thought I had a decent chance at winning that bet. Now, I'm seeing what is likely the result that is going to make me lose.
Re: (Score:2)
Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together. This is exactly a case of that: An entirely meaningless stunt. It is like instead of a student having to solve a specific exam question, you ask them to find any question they can solve and then have them solve that.
Re: (Score:3)
Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together.
Have you read the paper? I have, and that is very much not what is going on here. There are multiple deeply clever bits in this argument. If this were written by a human, it would be recognized as highly insightful. Moreover, you are also missing how much what human mathematicians often do really does look like what you are dismissing. I've worked on hundreds of problems, and gotten successful results in maybe 5 or 6 of them. If someone dismissed humans on that basis, you'd recognize the probl
Re: (Score:2)
There are multiple deeply clever bits in this argument.
Which is the part that you think is deeply clever?
How pathetic (Score:1, Troll)
They must have looked at 1000s of problems, burned mountains of tokens, and have had failure after failure, just to be able to find one "success". Of course the usual AI fans will not understand that this is a completely meaningless stunt.
Re: (Score:2)
Must have.
And that would be totes different than human mathematicians who pick a problem and work at it and only it until they succeed.
Dem gaps are a closin hey?
To be fair ... (Score:2)
Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others." ... It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature.
Technically, he said they "found solutions" and they did find them - in the literature. He didn't say they "solved them".
Everyone else assumed he meant solved and assumptions are like AI companies, everyone (apparently) has one. :-)
Re: (Score:2)
Along those lines, I can say that I have personally found a marvelous proof which is too large to fit in the margin of this comment...