AI Math

OpenAI Claims It Solved an 80-Year-Old Math Problem

An anonymous reader quotes a report from TechCrunch: OpenAI claims its new reasoning model has produced an original mathematical proof disproving a famous unsolved conjecture in geometry, which was first posed by Paul Erdos in 1946. If this sounds familiar to you, it's because this isn't the first time OpenAI has made such a bold claim. Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdos problems and made progress on 11 others."

It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature. Taunts from rivals like Yann LeCun and Google DeepMind CEO Demis Hassabis followed, and Weil promptly took down his premature post. Today, at least, it seems OpenAI didn't make the same mistake twice. Alongside the announcement, the company published companion remarks (PDF) in support of the disproof from mathematicians like Noga Alon, Melanie Wood, and Thomas Bloom, who maintains the Erdos Problems website and previously called Weil's post "a dramatic misrepresentation."

[...] The proof, per OpenAI, came from a new general-purpose reasoning model, not a system specifically designed to solve math problems or even this problem in particular. OpenAI says this is significant because it means AI systems are now more capable of holding together long, difficult chains of reasoning and connecting ideas across fields in ways researchers may not have previously explored. That has implications for biology, physics, engineering, and medicine.

  • by phantomfive ( 622387 ) on Thursday May 21, 2026 @11:17AM (#66154144) Journal
    Here is the paper [openai.com]. It has some really nice commentary from mathematicians at the bottom. I recommend reading (or at least skimming) it. It's not clear exactly what the AI did, since it was "human-digested, somewhat simplified, and somewhat generalized." This quote from Melanie Matchett Wood is clarifying:

    "One other concern that directly arises in this development is that there is a history of closely related ideas in the literature, ... which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is in some sense “familiar” with all the previous work."

    • by Hentes ( 2461350 ) on Thursday May 21, 2026 @11:46AM (#66154186)

      To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.

      • I don't think anyone is contesting that. What they are contesting is a sociopath's claim that his tool solved something humans couldn't ... when really 95+% of what it did was just leverage existing human knowledge.

        Caveat: I'm not a mathematician and I didn't read the paper ... but Sam Altman is VERY well known for lying (constantly), so everything he says should be taken with multiple bags of salt.

        • when really 95+% of what it did was just leverage existing human knowledge.

          All mathematicians build math by leveraging what other mathematicians have done. When Andrew Wiles proved enough of the Modularity Theorem to prove Fermat's Last Theorem, he was leveraging ideas from all over: algebraic geometry, representation theory, complex analysis, Galois theory, elliptic curves, etc. Combining that was the big thing. When Peter Scholze and Dustin Clausen recently did their work on "condensed mathematics" (which may get Clausen a Fields Medal, not Scholze since

          • Right, so you could make a claim like "all mathematicians leveraged (say) 95% prior knowledge". If OpenAI similarly leveraged 95%, and "discovered" the rest, it'd be (at least somewhat) legitimate to say "OpenAI invented a new theorem".

            But if OpenAI actually leveraged 99.9% existing knowledge (remember, my post said "95+%"), then it's NOT fair to compare it to a human discovery. If a company claims as much, they're dishonestly promoting their product.

            Again, I'm not a mathematician, and I did not read th

            • by ceoyoyo ( 59147 )

              Made up numbers are kind of silly. Especially when they refer to things that aren't easily quantifiable.

              If the AI made a non-trivial contribution to a proof then that's interesting. In this case it seems like it did so. I doubt it was something mathematicians couldn't do, but it does seem to be something they hadn't done.

              • Re: (Score:2, Interesting)

                by Rei ( 128717 )

                Also, it's silly that people are acting like "all problems but this one were already in the literature". AI has solved a whole slew of Erdos problems, and only a fraction had anything to do with existing literature [github.com].

                And even in "existing literature" examples, it's not that "nobody ever thought to search before", as if all mathematicians are morons or mathematicians adore putting out Erdos problem solutions without claiming them. It's that nobody had ever thought to apply an obscure technique from a given pi

                • by ceoyoyo ( 59147 )

                  I am surprised that a site supposedly full of computer scientists is the least bit surprised that AI can be good at mathematical proofs. For any formalizable problem you know where you start, you know where you want to end up and you know the legal state transitions. It's a simple tree search that we have, in fact, written lots of standard computer programs to execute.

                  The difficulty comes because any non-trivial proof is a very big tree search. But learning style AI is really good at pruning really big trees
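The framing above (a start state, a goal, legal transitions, and a search tree too big to explore without pruning) can be illustrated on a toy formal system. This is a minimal sketch, assuming Hofstadter's MIU puzzle as a stand-in for a real proof calculus and a hand-written greedy heuristic as a stand-in for the pruning a learned model would supply; the function names are hypothetical and none of this reflects how OpenAI's system works.

```python
import heapq
from typing import Optional


def successors(s: str) -> set[str]:
    """Every string derivable from s in one step of Hofstadter's MIU system."""
    out = set()
    if s.endswith("I"):                  # Rule 1: xI -> xIU
        out.add(s + "U")
    out.add("M" + s[1:] * 2)             # Rule 2: Mx -> Mxx
    for i in range(len(s) - 2):          # Rule 3: any III -> U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):          # Rule 4: delete any UU
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def prove(goal: str, axiom: str = "MI", max_len: int = 12,
          max_nodes: int = 50_000) -> Optional[list[str]]:
    """Greedy best-first search from axiom to goal; returns the derivation or None."""
    def score(s: str) -> int:
        # Hand-written heuristic standing in for a learned one: length gap plus mismatches.
        return abs(len(s) - len(goal)) + sum(a != b for a, b in zip(s, goal))

    frontier = [(score(axiom), axiom)]
    parent: dict[str, Optional[str]] = {axiom: None}
    expanded = 0
    while frontier and expanded < max_nodes:
        _, s = heapq.heappop(frontier)
        if s == goal:
            path = []
            node: Optional[str] = s
            while node is not None:      # walk back to the axiom
                path.append(node)
                node = parent[node]
            return path[::-1]
        expanded += 1
        for t in successors(s):
            if len(t) > max_len or t in parent:   # prune overlong or already-seen states
                continue
            parent[t] = s
            heapq.heappush(frontier, (score(t), t))
    return None


print(prove("MIUIU"))
print(prove("MU"))
```

Note that prove("MU") returns None here only because the length-capped search space runs out. The actual reason MU is underivable is an invariant (the number of I's is never a multiple of 3), which is the kind of argument a search by itself does not supply.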

                • by vyvepe ( 809573 )

                  The simple fact is, AI has gotten much better at solving unsolved math problems than humans are. It's simply another field that it's taking over, the same way it has been taking over programming.

                  I'm not sure how well it works in programming. It helps in brainstorming and for a developer working in an area where he is not an expert. There is a claim that it works very well for crypto code implemented in Rust with full test coverage, after a human developer has provided all the main interfaces. That is plausible. Most crypto code is open source and easy to test, so AI can work very well there, essentially rewriting the code it was trained on into Rust.[1]

                  But on the other side there is that study that did show that develope

                • I am mostly in agreement. Disagreement here:

                  The simple fact is, AI has gotten much better at solving unsolved math problems than humans are.

                  We're not at that point yet. Right now, we're not seeing it solve the genuinely hardest problems, like say the Riemann Hypothesis, or P ?= NP. What is true is that these systems are at least as good as a beginning grad student in all subfields and are outputting results equivalent to a top-notch mathematician on some problems. But it is also true that these systems are improving rapidly. So while your statement is false right now, it looks likely your statement i

      • To be fair, even just a tool that can search the literature for solutions of similar problems is extremely useful.

        The problem for all the AI companies is that this means they've built a discount version of the Library Computer Access Retrieval System (LCARS) with questionable data. That is not AGI. That is not intelligence at all. That is a search engine with a different, often terrible, interface.

        That tool, while useful in some ways, is nowhere near worth the billions they've burned selling it as the tool to do everything. That means Sam, Dario, Elon, and the rest spent the GDP of several countries to buil

      • by gweihir ( 88907 )

        That is true. But that is not what this is being pushed as, and such a search tool would never even remotely justify the extreme effort LLMs need to do this.

    • I am waiting for the paper to be thoroughly reviewed before I declare that the model proved anything. Andrew Wiles made a mistake in his first attempt at proving Fermat's Last Theorem, where he relied on logic that had not been proven previously. It was a fundamental problem, and he had to rework his proof around that flaw.
    • by dfghjk ( 711126 )

      "One other concern that directly arises in this development is that there is a history of closely related ideas in the literature, ... which are not appropriately referenced in Chat GPT’s paper. If a human came up with this argument and didn’t cite such previous work, we would assume that they were unfamiliar with the previous work and came up with the ideas independently, since our professional norms require us to cite previous work whose ideas influenced our work. On the other hand, Chat GPT is

    • by gweihir ( 88907 )

      The real question is how much effort was wasted on problems it could not solve. My guess is a _lot_. Even a statistical model can get lucky occasionally, if everything was already in the training data. This is not any proof of a systematic or meaningful skill. It is like asking a student to find a problem they can solve and then having them solve it, instead of giving them a specific problem to solve. Meaningless.

      • Re: (Score:2, Informative)

        by Rei ( 128717 )

        LLMs are not "statistical models" (randomness only comes into play in the final conversion from latent space to token space: latent space is high-dimensional and token space is low-dimensional, so you need a rounding mechanism, and a "noisy" rounding mechanism works best; what you're thinking of, by contrast, is Markov models). And you cannot just "get lucky and randomly solve an unsolved math problem"; that's not how any of this works.
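The "noisy rounding" step described above corresponds to what is usually called temperature sampling: the model produces one score (logit) per vocabulary token, and the decoder either takes the top score or samples from a softmax over the scores. A minimal sketch, assuming a hypothetical three-token vocabulary; production decoders add refinements such as top-k or nucleus filtering that are not shown. It only illustrates where randomness enters decoding, not either side of the "statistical model" argument.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_token(logits, temperature=1.0, rng=random):
    """Stochastic decoding: draw a token index in proportion to its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0]                     # hypothetical scores for a 3-token vocabulary
print(greedy_token(logits))                  # 1
print([sample_token(logits, 1.0, rng) for _ in range(10)])
```

As the temperature approaches zero, sampling collapses to the greedy choice; higher temperatures spread probability onto lower-scoring tokens.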

        • My understanding is that LLMs are built on a foundation of ANNs, and that the backpropagation used to train ANNs is indeed a statistical process; the cost function that must be minimized (via vector calculus) is a least-squares-error variant, a decidedly statistical calculation. How does this not make the model statistical?
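The training loop described in this reply can be sketched on the smallest possible "network": a single weight and bias fit by gradient descent on mean squared error. This is an illustration under simplifying assumptions; real language models have billions of parameters, compute gradients by backpropagation through many layers, and are usually trained with a cross-entropy loss on next-token prediction rather than squared error, though the update rule has the same shape. The function name fit_line is hypothetical.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of L = (1/n) * sum(err^2) with respect to w and b.
        dw = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        db = 2 * sum(errs) / n
        # Step against the gradient.
        w -= lr * dw
        b -= lr * db
    return w, b


w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # data generated from y = 2x + 1
print(round(w, 4), round(b, 4))
```

On this noise-free data the loop recovers values very close to w = 2 and b = 1.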

  • Or any other proof assistant / verifier? Is this true NN reasoning, or just more LLM/NN spaghetti thrown up against a symbolic verifier?

    • This system did not use Lean. But note that systems which do use Lean as a direct verification shouldn't be dismissed either. The fact that LLMs with symbolic verifiers are powerful doesn't get to be less true because it seems like a really clunky architecture to some people. We don't know the exact way this system functions since it is an internal model used by OpenAI that they have not released to general use or given a lot of details about.
      • I am in no way dismissing hybrid systems that incorporate Lean, or similar symbolic apparatus. To the contrary, I find that there must be some "symbolic assist" (i.e., some predicate calculus engine) to pure NN systems to verify that their pattern match is a valid proof. But I'm more than willing to be proven wrong. I just don't see how to get past the fact that statistical inference is not the same as logical inference, at least in the province of proofs. But WTH, feel free to educate me.

      • by ceoyoyo ( 59147 )

        Adding to your comment, mathematicians find logic engines useful tools too. Hybrid systems aren't clunky and shouldn't seem so. Whether it's some kind of trained AI plus a logic solver or a human plus a logic solver, it's good design that takes advantage of the strengths of both systems.
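For readers who have not used Lean: a proof assistant accepts a statement together with a proof term or tactic script and mechanically checks every step against its logic. Two trivial Lean 4 examples using only the core library are shown; the theorem name my_add_comm is hypothetical, and as noted earlier in this thread, the OpenAI system in question did not use Lean at all.

```lean
-- The kernel checks that both sides evaluate to the same natural number.
example : 2 + 2 = 4 := rfl

-- A general statement, proved by citing a lemma from Lean's core library.
theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

A wrong claim such as `example : 2 + 2 = 5 := rfl` is rejected by the checker, which is what makes a verifier useful regardless of where a candidate proof came from.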

    • by gweihir ( 88907 )

      It is a statistical model getting lucky. Works like "predictions" by stock analysts: You put out 1000s of predictions and if you are right by pure random chance once, you claim that it was your superior skills.
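The hybrid design discussed in this subthread, an untrusted proposer paired with a trusted checker, can be sketched generically. In this minimal sketch a seeded random guesser stands in for the model and an exact arithmetic check stands in for a proof assistant; both, and the function names, are hypothetical placeholders. The property the pattern illustrates is that soundness comes entirely from the verifier: lucky or unlucky proposals can only produce a verified answer or a failed search, never an accepted false one.

```python
import math
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def generate_and_verify(
    propose: Callable[[random.Random], T],
    verify: Callable[[T], bool],
    budget: int,
    seed: int = 0,
) -> Optional[T]:
    """Draw up to `budget` candidates and return the first one the verifier accepts."""
    rng = random.Random(seed)
    for _ in range(budget):
        candidate = propose(rng)
        if verify(candidate):        # only verified candidates are ever returned
            return candidate
    return None                      # budget exhausted: no claim is made either way


def sum_of_two_squares(n: int) -> Optional[tuple[int, int]]:
    """Hypothetical toy task: find x, y >= 0 with x*x + y*y == n."""
    bound = math.isqrt(n)

    def propose(rng: random.Random) -> tuple[int, int]:
        return rng.randint(0, bound), rng.randint(0, bound)

    def verify(xy: tuple[int, int]) -> bool:
        return xy[0] ** 2 + xy[1] ** 2 == n

    return generate_and_verify(propose, verify, budget=5000)


print(sum_of_two_squares(325))   # some verified pair such as (10, 15)
print(sum_of_two_squares(3))     # None: 3 is not a sum of two squares
```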

  • by JoshuaZ ( 1134087 ) on Thursday May 21, 2026 @12:59PM (#66154292) Homepage

    More than any previous use of AI to solve an open problem, this one cannot be dismissed without being completely irrational. Even with Erdos 1196, people could argue that maybe the solution was somewhere hidden in the training data (which, as a mathematician in a closely related area, I considered extremely unlikely for a whole bunch of reasons I'm happy to expand on) or that the problem just hadn't gotten a lot of attention (which was arguable there, even though it was a well-known enough problem that I had heard of it). But the Erdos unit distance problem is a genuinely famous problem. There's no way that there was a lack of attention to the problem, and there's no way to say some solution was in an obscure journal no one noticed. This is a problem which literally gets discussed in some undergrad classes.

    The Annals of Mathematics is the most prestigious math journal in the world, and most mathematicians will never get a paper published there at all (I certainly don't expect to). I talked with another mathematician whose work is closer to this problem and asked "So is this the time when an AI first gets a result that should be essentially in the Annals?" and his response was "delete essentially from that sentence and the answer is yes." I have a bet with another mathematician that there would be no papers in the Annals, Inventiones, or Crelle where the result was discovered by an AI before 2028. 72 hours ago I thought I had a decent chance of winning that bet. Now I'm seeing what is likely the result that is going to make me lose.
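For reference, the Erdos unit distance problem mentioned above asks: among n points in the plane, how many pairs can be at distance exactly 1? Rescaling the plane does not change that count, so one can work with integer grid points and count pairs at a fixed squared distance r2, which keeps the arithmetic exact. The brute-force counter below (function names hypothetical) only illustrates the quantity being studied; it says nothing about the new argument in OpenAI's paper.

```python
from itertools import combinations


def grid(k):
    """The k-by-k integer lattice {0..k-1} x {0..k-1}."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at(points, r2):
    """Count unordered pairs whose squared distance equals r2, using exact integer math."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == r2
    )


def best_unit_distance(points):
    """Pick the squared distance that occurs most often and treat it as the 'unit'."""
    counts = {}
    for (x1, y1), (x2, y2) in combinations(points, 2):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
        counts[d2] = counts.get(d2, 0) + 1
    return max(counts.items(), key=lambda kv: kv[1])


print(count_pairs_at(grid(3), 1))   # 12 axis-parallel neighbor pairs in a 3x3 grid
print(best_unit_distance(grid(10)))
```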

    • by gweihir ( 88907 )

      Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together. This is exactly a case of that: An entirely meaningless stunt. It is like instead of a student having to solve a specific exam question, you ask them to find any question they can solve and then have them solve that.

      • Au contraire. If you look at 1000s of problems and burn a mountain of tokens, you are bound to find some rare cases where everything was already there but nobody put it together.

        Have you read the paper? I have, and that is very much not what is going on here. There are multiple deeply clever bits in this argument. If this were written by a human, it would be recognized as highly insightful. Moreover, you are also missing how often what human mathematicians do really does look like what you are dismissing. I've worked on hundreds of problems, and gotten successful results in maybe 5 or 6 of them. If someone dismissed humans on that basis, you'd recognize the probl

  • by gweihir ( 88907 ) on Thursday May 21, 2026 @01:19PM (#66154322)

    They must have looked at 1000s of problems, burned mountains of tokens, and have had failure after failure, just to be able to find one "success". Of course the usual AI fans will not understand that this is a completely meaningless stunt.
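The argument in this comment is a multiple-attempts effect: if each attempt independently has a small chance p of success, the chance of at least one success over n attempts grows quickly with n. Using purely hypothetical numbers, p = 0.001 and n = 1000 (not figures from OpenAI or from the thread), and assuming independent attempts:

```latex
P(\text{at least one success}) = 1 - (1 - p)^{n},
\qquad
1 - 0.999^{1000} \approx 1 - e^{-1} \approx 0.632
```

Whether this effect explains the result depends on how many problems were actually attempted and on whether a single success is independently checkable, which is exactly what the replies in this thread dispute.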

  • Seven months ago, the AI giant's former VP Kevin Weil posted on X: "GPT-5 found solutions to 10 (!) previously unsolved Erdos problems and made progress on 11 others." ... It turns out, GPT-5 didn't actually solve those problems; it just found solutions that already existed in the literature.

    Technically, he said they "found solutions" and they did find them - in the literature. He didn't say they "solved them".

    Everyone else assumed he meant solved and assumptions are like AI companies, everyone (apparently) has one. :-)
