
Advanced Version of Gemini With Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad (deepmind.google)

An anonymous reader shares a blog post: The International Mathematical Olympiad is the world's most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Medals are awarded to the top half of contestants, with approximately 8% receiving a prestigious gold medal.

Recently, the IMO has also become an aspirational challenge for AI systems as a test of their advanced mathematical problem-solving and reasoning capabilities. Last year, Google DeepMind's combined AlphaProof and AlphaGeometry 2 systems achieved the silver-medal standard, solving four out of the six problems and scoring 28 points. By making use of specialist formal languages, these systems demonstrated that AI was beginning to approach elite human mathematical reasoning.

This year, we were amongst an inaugural cohort to have our model results officially graded and certified by IMO coordinators using the same criteria as for student solutions. Recognizing the significant accomplishments of this year's student-participants, we're now excited to share the news of Gemini's breakthrough performance. An advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points and achieving gold-medal-level performance.


Comments Filter:
  • I have to wonder. Did "Gemini Deep Think" solve the problems or simply regurgitate the answer from the billions of sucked up webpages, math research papers, etc. used to train the model? Actual competitors don't have the complete history of https://math.stackexchange.com... [stackexchange.com] at their fingertips.

    • Re:AI Training (Score:5, Informative)

      by JoshuaZ ( 1134087 ) on Monday July 21, 2025 @03:57PM (#65535218) Homepage
I'm a mathematician, so I may have some expertise here: Humans spend years training for the IMO, so they are functionally sucking all of that up too. And the IMO problems themselves vary a lot. Even extremely bright people, including professional mathematicians, would have trouble with some IMO problems. Mere regurgitation is insufficient to solve them.
      • Thank you

      • by gweihir ( 88907 )

        But what about putting them into a Computer Algebra system? We have had these for decades now. In fact, I used one when I started my CS studies 35 years ago.

        The very point of such a competition is to have a human do it, not a machine.

The easy IMO problems, like some P1s or P2s, could sometimes be done by a computer algebra system, but those are the exception. And sure, the point of the competition is for humans to do it. This isn't about the IMO as a competition, but about the usefulness of the IMO as essentially a natural metric of how effective these AI systems are at difficult reasoning problems.
          • by gweihir ( 88907 )

            but the usefulness of the IMO as essentially a natural metric of how effective these AI systems are at difficult reasoning problems.

            Since LLMs have zero reasoning capability (the math does not allow it), it is obviously a failure at this.

Does it worry you that there's a circularity in your own reasoning? You are taking for granted that LLMs cannot reason, and therefore deciding that any metric which shows they can reason must be flawed and so cannot be evidence against your claim. Aside from the circularity, you are also confusing whether an LLM AI has "reasoning capability" with whether the system can succeed at problems which humans solve via difficult reasoning. This is essentially akin to insisting that an airplane must flap its wings like a bird before we can say it flies.
              • Or that watches based on quartz crystals are just an idle curiosity because they aren't built using intricate gear assemblies.

No, they are not designed to reason. You're arguing from ignorance of the subject, using nothing but rhetorical techniques. You're mistaking technical questions with real answers for philosophical questions where you can just blow any horseshit you want out your ass and then use rhetoric to "argue" it.

                And your rhetoric is mostly red herrings and strawmen.

                • Instead of insults and general claims of ignorance, do you want to go and explain what "red herrings and strawmen" I've engaged in?

                  You're mistaking technical questions with real answers for philosophical questions where you can just blow any horseshit you want out your ass and then use rhetoric to "argue" it.

What technical questions do you think I'm mistaking for philosophical questions? I must confess I also find it somewhat strange that you make a claim about strawmen and then say "No, they are not designed to reason," as if that somehow responds to what I wrote, where I explicitly said that whether an LLM AI can reason is a distinct question from whether it can solve problems humans solve via reasoning. So it seems very odd to tell me that they are not designed to reason. What straw positions are there here, other than yours?


                • by gweihir ( 88907 )

                  That nicely sums it up. Most of these people (like the one you responded to) have zero clue how LLMs actually work and what limits that imposes. They just want to believe like fucking morons in a cult.

              • by gweihir ( 88907 )

                You are so utterly without insight here, it is embarrassing. If it is a benchmark for difficult reasoning questions, it must measure reasoning or it is a failure. Take chess: It is not a reasoning problem to a computer. It is, in part, to a human. But because it is not to a computer, it is not suitable as a benchmark for reasoning capabilities.

                • You are so utterly without insight here, it is embarrassing. If it is a benchmark for difficult reasoning questions, it must measure reasoning or it is a failure. Take chess: It is not a reasoning problem to a computer. It is, in part, to a human. But because it is not to a computer, it is not suitable as a benchmark for reasoning capabilities.

Please reread the last two sentences of my previous reply, since they address exactly this sort of issue. The fact that humans use reasoning to do something doesn't mean it cannot be solved any other way, hence the analogy with the birds. If you think there's a problem with that analogy, by all means please explain it. That said, given that in the other subthread you are actively refusing to test a claim you've made that should take you just a few minutes, I'm guessing that this is not going to be a terribly productive conversation.

                • by HiThere ( 15173 )

                  To justify that assertion you must have a very specific definition of what "reasoning" entails. Would you care to share it?

                  In the definition of reasoning that I use, e.g., alpha-beta pruning would count as reasoning. Clearly you disagree.
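To make the parent comment's example concrete, here is a minimal sketch of alpha-beta pruning, the classic game-tree search being cited as a mechanical form of "reasoning." The tiny hand-built tree and the function name are purely illustrative, not from any particular system:

```python
# Alpha-beta pruning: minimax search that skips branches which
# provably cannot change the final decision.

def alphabeta(node, alpha, beta, maximizing):
    """Return the minimax value of `node`. Leaves are numbers
    (static evaluations); inner nodes are lists of children."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cutoff: the opponent would avoid this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:   # alpha cutoff: the maximizer already has better
                break
        return value

# Example: inner lists are decision points, numbers are leaf scores.
tree = [[3, 5], [2, 9], [0, 7]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # → 3
```

Whether a deterministic cutoff rule like this counts as "reasoning" is exactly the definitional question the two posters disagree on.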

    • You could ask the same question about the contestants. They solved these problems because they trained. They recognized techniques and ideas similar to other problems they encountered, and they applied them to a new problem.
  • by ebunga ( 95613 ) on Monday July 21, 2025 @03:43PM (#65535190)

    What will they think of next?

I'm wondering about the latest Grok 4 Heavy: was it tested? I didn't see any information about other models taking this test for comparison.

  • by gweihir ( 88907 )

    I have no doubt that Maple or any other decent Computer Algebra system could have done the same ... 30 years ago. Or Wolfram Alpha.

    This is a completely meaningless stunt. The only purpose is to deceive the stupid about what these systems can do, or rather cannot do.

    • If you really believe this, by all means try this. This year's problems are here https://artofproblemsolving.com/wiki/index.php/2025_IMO_Problems [artofproblemsolving.com] and you can try to see if you can get Maple, or Mathematica to do these problems. The same link has all the previous contests back to 1959 (1980 did not have a contest). Given format changes, you are more likely to see problems that these systems can solve, especially some of the real inequality problems (which Maple can certainly do), but those problems have not
      • by gweihir ( 88907 )

Why would I waste time on what is clearly a failed technology? The stupid always need years and years to find out a hype is just a hype and never has any of the substance claimed. I can do it directly.

So, your evidence is to make a claim about what Maple or another computer algebra system must be able to do, and you aren't willing to actually spend 5 minutes checking whether you can actually do that. Here, I'll even make the following offer. Take the 2025 IMO. If you can get Maple or any other computer algebra system to solve more than 1 problem on it, and provide screenshots of it doing so, I'll donate $100 to a charity of your choice. So now, do you want to actually go and do what you claim must be obviously true?
        • by ceoyoyo ( 59147 )

          Ah, the gaps close even tighter.

          "I'm totally sure X or Y could do that."

          "Okay, try it!"

          "Why would I waste my time?"

          Why indeed. You might find out that you're wrong.

The IMO is pretty pissed off at Google, because the results were embargoed from publication until the 28th of July. Apparently Google did their own metrics and claimed the gold medal, and the IMO is not happy:
    https://arstechnica.com/ai/202... [arstechnica.com]

    But hey, good on Google for demonstrating some level of success, whether the IMO gives them a gold medal or not. Shame on them for violating the publication embargo, but I haven't read that contract or those T&Cs, so I can only judge by what the IMO says and what Google says.

  • What's worse than the self-grading is that the rules include not using calculators.

    How can an AI do math without using a calculator?

    There is no way a computer anything can score even 1% using the same rules as humans.

    • If you read the summary, you'd see it was not self-graded but graded by IMO people. Also, while it is true that AI systems functionally have in some sense access to calculators, you can go and look at the IMO problems yourself https://artofproblemsolving.com/wiki/index.php/2025_IMO [artofproblemsolving.com] and see that the assistance one would get from a standard calculator is pretty minimal.
I think when people talk about calculator assistance for LLMs, they are conflating a general class of approaches that bolt reasoning capabilities onto those LLMs by delegating arithmetic, and delegating logic, to specialized, provably correct software tools. These tools are not AIs themselves and do not use ML algorithms in their implementations, so they should be considered traditional software. The tools are necessary, though, as the chatbot/language models alone are incapable of performing arithmetic, and the delegated tools compensate for that weakness.
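The delegation pattern described above can be sketched in a few lines. This is only an illustration of the idea, assuming a made-up "CALC:" tool-call format; the function names and dispatch convention are hypothetical, not any real system's API:

```python
# Sketch of tool delegation: the model emits a structured tool call
# instead of doing arithmetic itself, and a deterministic, non-ML
# function computes an exact answer.

from fractions import Fraction
import operator

def calculator(expression: str) -> Fraction:
    """Exact rational arithmetic over space-separated tokens,
    evaluated strictly left-to-right (no operator precedence)."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    tokens = expression.split()
    result = Fraction(tokens[0])
    i = 1
    while i < len(tokens):
        result = ops[tokens[i]](result, Fraction(tokens[i + 1]))
        i += 2
    return result

def handle_model_output(text: str) -> str:
    """If the (hypothetical) model emitted 'CALC: <expr>', answer it
    with the tool; otherwise pass the text through unchanged."""
    if text.startswith("CALC: "):
        return str(calculator(text[len("CALC: "):]))
    return text

print(handle_model_output("CALC: 1/3 + 1/6"))  # → 1/2, exact
```

The point of the pattern is that the arithmetic result is checkable and exact, regardless of how unreliable the language model's own token-by-token "calculation" would be.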
