AI Math

OpenAI Claims It Solved an 80-Year-Old Math Problem

An anonymous reader quotes a report from TechCrunch: OpenAI claims its new reasoning model has produced an original mathematical proof disproving a famous unsolved conjecture in geometry, which was first posed by Paul Erdos in 1946. If this sounds familiar to you, it's because this isn't the first time OpenAI has made such a bold claim. Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others."

It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature. Taunts from rivals like Yann LeCun and Google DeepMind CEO Demis Hassabis followed, and Weil promptly took down his premature post. Today, at least, it seems OpenAI didn't make the same mistake twice. Alongside the announcement, the company published companion remarks (PDF) in support of the disproof from mathematicians like Noga Alon, Melanie Wood, and Thomas Bloom, who maintains the Erdos Problems website, and previously called Weil's post "a dramatic misrepresentation."

[...] The proof, per OpenAI, came from a new general-purpose reasoning model, not a system specifically designed to solve math problems or even this problem in particular. OpenAI says this is significant because it means AI systems are now more capable of holding together long, difficult chains of reasoning and connecting ideas across fields in ways researchers may not have previously explored. That has implications for biology, physics, engineering, and medicine.

  • by phantomfive ( 622387 ) on Thursday May 21, 2026 @11:17AM (#66154144) Journal
    Here is the paper [openai.com]. It has some really nice commentary from mathematicians at the bottom. I recommend reading (or at least skimming) it. It's not clear exactly what the AI did, since it was "human-digested, somewhat simplified, and somewhat generalized." This quote from Melanie Matchett Wood is clarifying:

    "One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is in some sense “familiar” with all the previous work."

    • by Hentes ( 2461350 ) on Thursday May 21, 2026 @11:46AM (#66154186)

      To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.

      • I don't think anyone is contesting that. What they are contesting is a sociopath's claim that his tool solved something humans couldn't ... when really 95+% of what it did was just leverage existing human knowledge.

        Caveat: I'm not a mathematician and I didn't read the paper ... but Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.

        • when really 95+% of what it did was just leverage existing human knowledge.

          All mathematicians build math by leveraging what other mathematicians have done. When Andrew Wiles proved enough of the Modularity Theorem to prove Fermat's Last Theorem, he was leveraging ideas from all over, from algebraic geometry, from representation theory, from complex analysis, from Galois theory, from elliptic curves, etc. Combining that was the big thing. When Peter Scholze and Dustin Clausen recently did their work on "condensed mathematics" (which may get Clausen a Fields Medal- not Scholze since

          • Right, so you could make a claim like "all mathematicians leveraged (say) 95% prior knowledge". If OpenAI similarly leveraged 95%, and "discovered" the rest, it'd be (at least somewhat) legitimate to say "OpenAI invented a new theorem".

            But, if OpenAI actually leveraged 99.9% existing knowledge (remember, my post said "95+%"), then it's NOT fair to compare it to a human discovery. If a company claims as much, they're dishonestly promoting their product.

            Again, I'm not a mathematician, and I did not read th

            • by ceoyoyo ( 59147 )

              Made up numbers are kind of silly. Especially when they refer to things that aren't easily quantifiable.

              If the AI made a non-trivial contribution to a proof then that's interesting. In this case it seems like it did so. I doubt it was something mathematicians couldn't do, but it does seem to be something they hadn't done.

              • Re: (Score:2, Interesting)

                by Rei ( 128717 )

                Also, it's silly that people are acting like "all problems but this one were already in the literature". AI has solved a whole slew of Erdos problems, and only a fraction had anything to do with existing literature [github.com].

                And even in "existing literature" examples, it's not "nobody ever thought to search before" as if all mathematicians are morons, or that mathematicians adore putting out Erdos problem solutions without claiming them. It's that nobody had ever thought to apply an obscure technique from a given pi

                • by ceoyoyo ( 59147 )

                  I am surprised that a site supposedly full of computer scientists is the least bit surprised that AI can be good at mathematical proofs. For any formalizable problem you know where you start, you know where you want to end up and you know the legal state transitions. It's a simple tree search that we have, in fact, written lots of standard computer programs to execute.

                  The difficulty comes because any non-trivial proof is a very big tree search. But learning style AI is really good at pruning really big trees
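
                  A minimal sketch of that tree-search framing, on a made-up toy system rather than real proofs: states are integers, the "legal state transitions" are two hypothetical rules (`double`, `add3`), and the `keep` predicate stands in for whatever pruning a learned model would supply. None of these names come from the paper or from any real prover.

```python
from collections import deque


def search_derivation(start, goal, rules, keep):
    """Breadth-first search for a list of rule names that turns start into goal.

    rules: dict mapping a rule name to a function state -> next state.
    keep:  pruning predicate; states for which it returns False are dropped.
    Returns the shortest list of rule names, or None if the pruned tree
    contains no path to the goal.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, rule in rules.items():
            nxt = rule(state)
            # Skip states already explored or rejected by the pruning rule.
            if nxt in seen or not keep(nxt):
                continue
            seen.add(nxt)
            queue.append((nxt, path + [name]))
    return None


def replay(start, path, rules):
    """Apply the named rules in order; used to check a derivation."""
    for name in path:
        start = rules[name](start)
    return start


# Hypothetical toy rules, purely illustrative.
TOY_RULES = {"double": lambda x: 2 * x, "add3": lambda x: x + 3}

if __name__ == "__main__":
    path = search_derivation(1, 11, TOY_RULES, keep=lambda x: x <= 11)
    print(path, replay(1, path, TOY_RULES))
```

                  The toy pruning rule here (never exceed the goal) is trivially safe. For real proofs, choosing what to prune is the hard part, which is the role the comment assigns to learning.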

                  • Literally no one has expressed that, as far as I can see, in this thread. You're fighting a straw man.

                    What I said, and stand by, is that we should be skeptical because Sam Altman is a sociopathic liar, and there is a long history of examples to support that. It has nothing to do with whether Open AI (or any AI) can create proofs.

                    • by ceoyoyo ( 59147 )

                      I wasn't replying to you.

                      LOTS of people here have been skeptical that AI can do X where X is pretty much anything, and certainly where X is "formulate novel math proofs."

                      PS: I don't disagree with you that Sam Altman claiming something isn't good evidence. That was not the subject of the post I replied to and has nothing to do with my reply to not your post.

                  • by allo ( 1728082 )

                    Chess and Go pruned the search tree by applying heuristics. Simplified: Take the current board, do one step, play random until the end. If you win many times, the subtree is more interesting than the subtrees where you lose more often. There is no such heuristic for a proof. Nobody tells you "You're close to the correct solution" and some attempts may look good until the end and the missing step then shows that the whole idea didn't work out. Chess and Go also allow to find suboptimal solutions and still be
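
                    The "do one step, play random until the end" idea can be sketched on a toy game. This is an illustration under assumptions, not how any chess or Go engine is actually built: the game is a hypothetical take-1-to-3-stones pile game where taking the last stone wins, and `random_playout` and `score_moves` are made-up names.

```python
import random


def random_playout(stones, my_turn, rng):
    """Play uniformly random moves until the pile is empty.

    my_turn says whether 'we' are the player about to move.
    Returns True if we took the last stone (a win for us).
    """
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return my_turn
        my_turn = not my_turn


def score_moves(stones, playouts, rng):
    """Estimate each legal first move by the win rate of random playouts."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        remaining = stones - take
        if remaining == 0:
            scores[take] = 1.0  # taking the last stone wins immediately
            continue
        # After our move it is the opponent's turn, hence my_turn=False.
        wins = sum(random_playout(remaining, False, rng) for _ in range(playouts))
        scores[take] = wins / playouts
    return scores


if __name__ == "__main__":
    print(score_moves(10, 2000, random.Random(0)))
```

                    The win rate only works as a signal because every random playout ends in a clear win or loss. A half-finished proof attempt has no comparable cheap outcome to sample, which is the gap the comment points at.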

                    • by ceoyoyo ( 59147 )

                      The old method of building chess players was heuristics. You have a bunch of engineers who hopefully know something about chess come up with rules about how good a position is. That's what "heuristics" are.

                      The thing that made chess computers better than any human and go computers better than all but the very best (or probably all by now) was using neural networks that learn their heuristics by experience. And yes, human mathematicians absolutely do learn heuristics, i.e. "gut feelings" regarding good approa

                • by vyvepe ( 809573 )

                  The simple fact is, AI has gotten much better at solving unsolved math problems than humans are. It's simply another field that it's taking over, the same way it has been taking over programming.

                  I'm not sure how well it works in programming. It helps in brainstorming and for a developer working in an area in which he is not advanced. There is a claim it works very well in crypto implemented in Rust with full test coverage and after human developer provided all the main interfaces. It is plausible. Most crypto code is opensource and easy to test so AI can work very well essentially rewriting the code it was trained on into Rust.[1]

                  But on the other side there is that study that did show that develope

                • I am mostly in agreement. Disagreement here:

                  The simple fact is, AI has gotten much better at solving unsolved math problems than humans are.

                  We're not at that point yet. Right now, we're not seeing it solve the genuinely hardest problems, like say the Riemann Hypothesis, or P ?= NP. What is true is that these systems are at least as good as a beginning grad student in all subfields and are outputting results equivalent to a top-notch mathematician on some problems. But it is also true that these systems are improving rapidly. So while your statement is false right now, it looks likely your statement i

                  • What is true is that these systems are at least as good as a beginning grad student in all subfields and are outputting results equivalent to a top-notch mathematician on some problems.

                    That might not be true. They didn't release the actual output from the AI. Instead, they released a proof that was heavily modified by mathematicians before being released to the world. It's not clear what the AI actually did. https://cdn.openai.com/pdf/74c... [openai.com]

                    Presumably if the output from the AI was very good, they would have just released that.

                • You need to actually read the paper [openai.com]. For example:

                  " If the level and type of human expertise that is represented on this note had been assembled to find a counterexample to this conjecture a month ago, and those people put in similar amounts of time working on it than they did to reading and thinking about Chat GPT’s solution, the mathematicians would have found a counterexample. However, without the claimed proof by Chat GPT, there is no particular reason anyone would have tried to look for a counterexample, assembled a group of experts with the appropriate expertise, or that the experts would have agreed to turn their attention to this problem. We can all be reminded by this development of how frequently interesting and powerful things happen mathematically when one applies ideas from one field to another, and think about how AI can help us find more cross-field applications."

            • The proof was processed quite a bit by humans after the computer wrote it, and they didn't release the original computer output (afaict).

              So it's entirely possible that all the machine did was a literature search, and then the humans fixed the last 1% or even the last 5%. Useful, but different than what is claimed.
          • All mathematicians build math by leveraging what other mathematicians have done

            No. There's no leveraging. The purpose of research papers in mathematics is to advance the frontiers of knowledge. That requires, by definition, original contribution. It's about as far removed from "leveraging" as you can get.

            Engineers (of the software kind, typically) don't always understand the difference between using something someone else has done for solving a problem, and creating a new original solution to a problem.

        • a sociopath's claim....

          Are you hinting at Sam Altman?

          • No, not hinting :)

            Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.

        • by evanh ( 627108 )

          No, the contention is that OpenAI is claiming the tool is solving when in reality it is only regurgitating a found solution.

          LLMs do pretty well at being a search engine when they're illegally fed everything.

          • This is a *famous* unsolved math problem. It was already highly unlikely that there was a solution hiding in the literature for Problem 1196. The Unit Distance Problem is so much more famous, with so much more work, it is genuinely hard to express how fantastically unlikely it was for this solution to be somehow hidden in the literature.
        • by allo ( 1728082 )

          "Couldn't" is a strong word here. Better to say "didn't yet". Some things need many people trying before someone gets the idea, or luck, or a combination of both. AI isn't superhuman, but AI managed to solve that one.

      • To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.

        The problem for all the AI companies is that this means they've built a discount version of the Library Computer Access Retrieval System (LCARS) with questionable data. That is not AGI. That is not intelligence at all. That is a search engine with a different, and many times terrible, interface.

        That tool, while useful in some ways, is nowhere near worth the billions they've burned selling it as the tool to do everything. That means Sam, Dario, Elon, and the rest spent the GDP of several countries to buil

      • by gweihir ( 88907 )

        That is true. But that is not how this gets pushed, and such a search tool would never even remotely justify the extreme effort LLMs need to do this.

    • I am waiting for the paper to be thoroughly reviewed before I would declare that the model proved anything. Andrew Wiles made a mistake in his first attempt proving Fermat's Last Theorem where he relied on logic that had not been proven previously. It was a fundamental problem where he had to rework his proof around that flaw.
      • You can look, they had several mathematicians review the proof (scroll down): https://cdn.openai.com/pdf/74c... [openai.com]

        They mostly use plain English, so their comments are very helpful.
        • by gweihir ( 88907 )

          Either that verification will take a few years or this was exceptionally easy to prove once the prerequisites were clear. Hence at this time there are only two options: The proof is wrong or it was not hard to find for a machine that cannot reason, but can trawl through vast amounts of data looking for correlations.

          My guess is the second: The only reason nobody else found this is because it is very easy to do, but the prerequisites are very non-intuitive and hence nobody looked. Kind of like those "20 year

          • One of the mathematicians commenting pointed out that this is hardly a proof at all, but rather a counter-example.

            It's hard to say intelligent things about what the AI is actually doing, because they haven't released the source code. It's not just an LLM, it's an LLM with some kind of algorithm tied to it somehow. In particular, it didn't need to write out a logical train of thought (which would be a proof), all it had to do was say, "here is a tighter way to pack dots on a grid."
    • by dfghjk ( 711126 )

      "One other concern that directly arises in this development is that there is a history of closely related ideas in the literature,.. which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is

    • by gweihir ( 88907 )

      The real question is how much effort was wasted on problems it could not solve. My guess is a _lot_. Even a statistical model can get lucky occasionally, if everything was already in the training data. This is not any proof of a systematic or meaningful skill. It is like asking a student to find a problem they can solve and then have them solve it, instead of giving them a specific problem to solve. Meaningless.

      • by Rei ( 128717 )

        LLMs are not "statistical models" (randomness only even comes into play in the final conversion from latent space to token space because latent space is high dimensional, token space is low dimension, you need a rounding mechanism, and a "noisy" rounding mechanism works best; what you're thinking of, by contrast, is Markov models). And you cannot just "get lucky and randomly solve an unsolved math problem"; that's not how any of this works.

        • My understanding is that LLMs are built on a foundation of ANNs, and that indeed the backpropagation used to train ANNs is a statistical process; the cost function that must be minimized (via vector calculus) is a least-squared-error variant, a decidedly statistical calculation. How does this not make the model statistical ?

          • A precise model would be Bayesian classification, because you can output the exact right answer in every case based on the training data.

            Neural networks are approximations you use when the data is too large to be tractable with a Bayesian classifier.
          • by Rei ( 128717 )

            My understanding is that LLMs are built on a foundation of ANNs, and that indeed the backpropagation used to train ANNs is a statistical process;

            Two responses. One, that's discussing individual-neuron scale processes rather than collective processes; and this was a discussion about inference, not training. Human neurons also learn by error minimization (Hebbian learning). But this does not describe the macroscopic processes that result from said minimization.

            * During training, neurons develop into clas

            • The words to describe the process that you have identified are statistical inference, not logical inference. I don't believe that you can square that circle; it's why NNs are said to interpolate, but not extrapolate. But my beliefs are, shall we say, flexible -- I'm open to a counter-argument.

              • by gweihir ( 88907 )

                Essentially, yes. The problem with "statistical inference" is that it does not stack. Logical inference can be stacked as high as you want and the end-result (unless you made a real "hard" error in applying the rules) will always be valid. With "statistical inference", that is not true. Each step only has a probability smaller than 1 to succeed and that is fundamental. At some, not very high, number of steps, the results become arbitrary. Hence the "feat" here is that the LLM found the result to be very clo
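
                The compounding claim can be made concrete with a small calculation. This sketch assumes every step succeeds independently with the same probability, which is a simplifying assumption of the illustration and not something established about how any model reasons; the function names are made up.

```python
def chain_success_probability(per_step, steps):
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps


def max_steps_above(per_step, threshold):
    """Largest number of steps whose overall success stays >= threshold."""
    if not (0.0 < per_step < 1.0 and 0.0 < threshold <= 1.0):
        raise ValueError("need 0 < per_step < 1 and 0 < threshold <= 1")
    steps, prob = 0, 1.0
    # Keep adding steps while the chain as a whole still clears the threshold.
    while prob * per_step >= threshold:
        prob *= per_step
        steps += 1
    return steps


if __name__ == "__main__":
    print(chain_success_probability(0.99, 100))  # roughly 0.37
    print(max_steps_above(0.99, 0.5))
```

                Under that assumption, a step that is right 99% of the time gives a 100-step chain that is right only about 37% of the time. An external checker on each step changes the picture, since a verified step no longer contributes error.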

              • The words to describe the process that you have identified are statistical inference, not logical inference. I don't believe that you can square that circle; it's why NNs are said to interpolate, but not extrapolate. But my beliefs are, shall we say, flexible -- I'm open to a counter-argument.

                So what is statistical inference to you and how does it relate to what LLM models do? While we're at it, what is extrapolation? I feel like you're using a lot of terminology in an incorrect or at least imprecise wa

    • It's not clear exactly what the AI did, since it was "human-digested, somewhat simplified, and somewhat generalized."

      The mathematicians in the paper you linked all contend that the AI came up with the key counterexample. Just because humans wrote it up in a more readable, slightly generalized form doesn't change the fact that the counterexample the AI came up with works for the original conjecture. Are you going to say the same of Grigori Perelman's proof?

      This quote from Melanie Matchett Wood is clarif

  • Or any other proof assistant / verifier ? Is this true NN reasoning, or just more LLM/NN spaghetti thrown up against a symbolic verifier ?

    • Re:Did it use Lean ? (Score:4, Informative)

      by JoshuaZ ( 1134087 ) on Thursday May 21, 2026 @12:51PM (#66154282) Homepage
      This system did not use Lean. But note that systems which do use Lean as a direct verification shouldn't be dismissed either. The fact that LLMs with symbolic verifiers are powerful doesn't get to be less true because it seems like a really clunky architecture to some people. We don't know the exact way this system functions since it is an internal model used by OpenAI that they have not released to general use or given a lot of details about.
      • I am in no way dismissing hybrid systems that incorporate Lean, or similar symbolic apparatus. To the contrary, I find that there must be some "symbolic assist" (i.e., some predicate calculus engine) to pure NN systems to verify that their pattern match is a valid proof. But I'm more than willing to be proven wrong. I just don't see how to get past the fact that statistical inference is not the same as logical inference, at least in the province of proofs. But WTH, feel free to educate me.

      • by ceoyoyo ( 59147 )

        Adding to your comment, mathematicians find logic engines useful tools too. Hybrid systems aren't clunky and shouldn't seem to be so. Whether it's some kind of trained AI plus a logic solver or a human plus a logic solver, it's good design taking advantage of the strengths of both systems.

    • by gweihir ( 88907 )

      It is a statistical model getting lucky. Works like "predictions" by stock analysts: You put out 1000s of predictions and if you are right by pure random chance once, you claim that it was your superior skills.

  • by JoshuaZ ( 1134087 ) on Thursday May 21, 2026 @12:59PM (#66154292) Homepage

    More than any other AI use yet to solve an open problem, this one cannot be dismissed without just being completely irrational. Even with Erdos 1196, people could argue that maybe the solution was hidden somewhere in the training data (which, as a mathematician in a closely related area, seemed extremely unlikely to me for a whole bunch of reasons I'm happy to expand on) or that the problem just hadn't gotten a lot of attention (which was arguable there, even though it was a well-known enough problem that I had heard of it). But the Erdos unit distance problem is a genuinely famous problem. There's no way that there was a lack of attention to the problem, and there's no way to say some solution was in an obscure journal no one noticed. This is a problem which literally gets discussed in some undergrad classes.

    The Annals of Mathematics is the most prestigious math journal in the world, and most mathematicians will never get a paper published there at all (I certainly don't expect to). I talked with another mathematician whose work is closer to this problem and asked "So is this the time when an AI first gets a result that should be essentially in the Annals?" and his response was "delete essentially from that sentence and the answer is yes." I have a bet with another mathematician that there would be no papers in either the Annals, Inventiones, or Crelle where the result was discovered by an AI before 2028. 72 hours ago I thought I had a decent chance at winning that bet. Now, I'm seeing what is likely the result that is going to make me lose.

    • by gweihir ( 88907 )

      Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together. This is exactly a case of that: An entirely meaningless stunt. It is like instead of a student having to solve a specific exam question, you ask them to find any question they can solve and then have them solve that.

      • Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together.

        Have you read the paper? I have, and it is very much not the case of what is going on here. There are multiple deeply clever bits in this argument. If this were written by a human, it would be recognized as highly insightful. Moreover, you are also missing how much what human mathematicians often do really does look like what you are dismissing. I've worked on hundreds of problems, and gotten successful results in maybe 5 or 6 of them. If someone dismissed humans under that basis, you'd recognize the probl

        • There are multiple deeply clever bits in this argument.

          Which is the part that you think is deeply clever?

          • Lemma 2.2 struck me as a type of bound on an extension with complex multiplication that I had not seen before and seemed clever. I was also struck that, even though the Lemma itself was clever, the proof of that Lemma was pretty straightforward. The overall approach is in many respects pretty similar to existing work and feels in some respects in the same spirit as Erdos's own lower bound construction, but it uses a tower of fields, which seemed clever to me; however, the writeup notes three prior papers where a to
  • by gweihir ( 88907 )

    They must have looked at 1000s of problems, burned mountains of tokens, and have had failure after failure, just to be able to find one "success". Of course the usual AI fans will not understand that this is a completely meaningless stunt.

    • by ceoyoyo ( 59147 )

      Must have.

      And that would be totes different than human mathematicians who pick a problem and work at it and only it until they succeed.

      Dem gaps are a closin hey?

  • Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others." ... It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature.

    Technically, he said they "found solutions" and they did find them - in the literature. He didn't say they "solved them".

    Everyone else assumed he meant solved and assumptions are like AI companies, everyone (apparently) has one. :-)

    • Good point.

      Along those lines, I can say that I have personally found a marvelous proof which is too large to fit in the margin of this comment...
