OpenAI Claims It Solved an 80-Year-Old Math Problem
An anonymous reader quotes a report from TechCrunch: OpenAI claims its new reasoning model has produced an original mathematical proof disproving a famous unsolved conjecture in geometry, which was first posed by Paul Erdos in 1946. If this sounds familiar to you, it's because this isn't the first time OpenAI has made such a bold claim. Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others."
It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature. Taunts from rivals like Yann LeCun and Google DeepMind CEO Demis Hassabis followed, and Weil promptly took down his premature post. Today, at least, it seems OpenAI didn't make the same mistake twice. Alongside the announcement, the company published companion remarks (PDF) in support of the disproof from mathematicians like Noga Alon, Melanie Wood, and Thomas Bloom, who maintains the Erdos Problems website, and previously called Weil's post "a dramatic misrepresentation."
[...] The proof, per OpenAI, came from a new general-purpose reasoning model, not a system specifically designed to solve math problems or even this problem in particular. OpenAI says this is significant because it means AI systems are now more capable of holding together long, difficult chains of reasoning and connecting ideas across fields in ways researchers may not have previously explored. That has implications for biology, physics, engineering, and medicine.
Mathematician commentary included (Score:5, Informative)
"One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is in some sense “familiar” with all the previous work."
Re:Mathematician commentary included (Score:4, Interesting)
To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.
Re: (Score:2)
I don't think anyone is contesting that. What they are contesting is a sociopath's claim that his tool solved something humans couldn't ... when really 95+% of what it did was just leverage existing human knowledge.
Caveat: I'm not a mathematician and I didn't read the paper ... but Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.
Re: (Score:2)
when really 95+% of what it did was just leverage existing human knowledge.
All mathematicians build math by leveraging what other mathematicians have done. When Andrew Wiles proved enough of the Modularity Theorem to prove Fermat's Last Theorem, he was leveraging ideas from all over, from algebraic geometry, from representation theory, from complex analysis, from Galois theory, from elliptic curves, etc. Combining that was the big thing. When Peter Scholze and Dustin Clausen recently did their work on "condensed mathematics" (which may get Clausen a Fields Medal- not Scholze since
Re: (Score:2)
Right, so you could make a claim like "all mathematicians leveraged (say) 95% prior knowledge". If OpenAI similarly leveraged 95%, and "discovered" the rest, it'd be (at least somewhat) legitimate to say "OpenAI invented a new theorem".
But, if OpenAI actually leveraged 99.9% existing knowledge (remember, my post said "95+%"), then it's NOT fair to compare it to a human discovery. If a company claims as much, they're dishonestly promoting their product.
Again, I'm not a mathematician, and I did not read th
Re: (Score:2)
Made up numbers are kind of silly. Especially when they refer to things that aren't easily quantifiable.
If the AI made a non-trivial contribution to a proof then that's interesting. In this case it seems like it did so. I doubt it was something mathematicians couldn't do, but it does seem to be something they hadn't done.
Re: (Score:2)
Also, it's silly that people are acting like "all problems but this one were already in the literature". AI has solved a whole slew of Erdos problems, and only a fraction had anything to do with existing literature [github.com].
And even in "existing literature" examples, it's not "nobody ever thought to search before" as if all mathematicians are morons, or that mathematicians adore putting out Erdos problem solutions without claiming them. It's that nobody had ever thought to apply an obscure technique from a given pi
Re: (Score:2)
To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.
The problem for all the AI companies is that this means they've built a discount version of the Library Computer Access Retrieval System (LCARS) with questionable data. That is not AGI. That is not intelligence at all. That is a search engine with a different, and often terrible, interface.
That tool, while useful in some ways, is nowhere near worth the billions they've burned selling it as the tool to do everything. That means Sam, Dario, Elon, and the rest spent the GDP of several countries to buil
Re: (Score:2)
That is true. But that is not how this gets pushed, and such a search tool would never even remotely justify the extreme effort LLMs need to do this.
Re: (Score:2)
"One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is
Re: (Score:2)
The real question is how much effort was wasted on problems it could not solve. My guess is a _lot_. Even a statistical model can get lucky occasionally, if everything was already in the training data. This is not proof of any systematic or meaningful skill. It is like asking a student to find a problem they can solve and then have them solve it, instead of giving them a specific problem to solve. Meaningless.
Re: (Score:2)
LLMs are not "statistical models" (randomness only even comes into play in the final conversion from latent space to token space because latent space is high-dimensional, token space is low-dimensional, you need a rounding mechanism, and a "noisy" rounding mechanism works best; what you're thinking of, by contrast, is Markov models). And you cannot just "get lucky and randomly solve an unsolved math problem"; that's not how any of this works.
Did it use Lean ? (Score:2)
Or any other proof assistant / verifier ? Is this true NN reasoning, or just more LLM/NN spaghetti thrown up against a symbolic verifier ?
Re: (Score:2)
I am in no way dismissing hybrid systems that incorporate Lean, or similar symbolic apparatus. To the contrary, I find that there must be some "symbolic assist" (i.e., some predicate calculus engine) to pure NN systems to verify that their pattern match is a valid proof. But I'm more than willing to be proven wrong. I just don't see how to get past the fact that statistical inference is not the same as logical inference, at least in the province of proofs. But WTH, feel free to educate me.
Re: (Score:2)
Adding to your comment, mathematicians find logic engines useful tools too. Hybrid systems aren't clunky and shouldn't seem to be so. Whether it's some kind of trained AI plus a logic solver or a human plus a logic solver, it's good design taking advantages of the strengths of both systems.
Re: (Score:1)
It is a statistical model getting lucky. Works like "predictions" by stock analysts: You put out 1000s of predictions and if you are right by pure random chance once, you claim that it was your superior skills.
This is the real deal (Score:3)
More than any other use of AI yet to solve an open problem, this one cannot be dismissed without being completely irrational. Even with Erdos 1196, people could argue that maybe the solution was hidden somewhere in the training data (which, as a mathematician in a closely related area, seemed extremely unlikely to me for a whole bunch of reasons I'm happy to expand on) or that the problem just hadn't gotten a lot of attention (which was arguable there, even though it was a well-known enough problem that I had heard of it). But the Erdos unit distance problem is a genuinely famous problem. There's no way that there was a lack of attention to the problem, and there's no way to say some solution was in an obscure journal no one noticed. This is a problem which literally gets discussed in some undergrad classes.
The Annals of Mathematics is the most prestigious math journal in the world, and most mathematicians will never get a paper published there at all (I certainly don't expect to). I talked with another mathematician whose work is closer to this problem and asked "So is this the time when an AI first gets a result that should be essentially in the Annals?" and his response was "delete essentially from that sentence and the answer is yes." I have a bet with another mathematician that there would be no papers in either the Annals, Inventiones, or Crelle where the result was discovered by an AI before 2028. 72 hours ago I thought I had a decent chance at winning that bet. Now, I'm seeing what is likely the result that is going to make me lose.
Re: (Score:2)
Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together. This is exactly a case of that: An entirely meaningless stunt. It is like instead of a student having to solve a specific exam question, you ask them to find any question they can solve and then have them solve that.
Re: (Score:2)
Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together.
Have you read the paper? I have, and that is very much not what is going on here. There are multiple deeply clever bits in this argument. If this were written by a human, it would be recognized as highly insightful. Moreover, you are also missing how much what human mathematicians often do really does look like what you are dismissing. I've worked on hundreds of problems, and gotten successful results in maybe 5 or 6 of them. If someone dismissed humans on that basis, you'd recognize the probl
How pathetic (Score:2)
They must have looked at 1000s of problems, burned mountains of tokens, and have had failure after failure, just to be able to find one "success". Of course the usual AI fans will not understand that this is a completely meaningless stunt.