Advanced Version of Gemini With Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad (deepmind.google) 64

Posted by msmash on Monday July 21, 2025 @03:08PM from the moving-forward dept.

An anonymous reader shares a blog post: The International Mathematical Olympiad is the world's most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Medals are awarded to the top half of contestants, with approximately 8% receiving a prestigious gold medal.

Recently, the IMO has also become an aspirational challenge for AI systems as a test of their advanced mathematical problem-solving and reasoning capabilities. Last year, Google DeepMind's combined AlphaProof and AlphaGeometry 2 systems achieved the silver-medal standard, solving four out of the six problems and scoring 28 points. Making use of specialist formal languages, this breakthrough demonstrated that AI was beginning to approach elite human mathematical reasoning.

This year, we were amongst an inaugural cohort to have our model results officially graded and certified by IMO coordinators using the same criteria as for student solutions. Recognizing the significant accomplishments of this year's student-participants, we're now excited to share the news of Gemini's breakthrough performance. An advanced version of Gemini Deep Think solved five out of the six IMO problems perfectly, earning 35 total points, and achieving gold-medal level performance.

Advanced Version of Gemini With Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad

Post Load All Comments

Search 64 Comments Log In/Create an Account

Comments Filter:

- Re: (Score:3)
  
  by backslashdot ( 95548 ) writes:
  
  Oh, so you must have found it easy when you got your gold medal?
  - Re: (Score:2)
    
    by Junta ( 36770 ) writes:
    
    The point is we have a myriad of "tests that are hard for humans, but don't necessarily translate to anything vaguely useful". In academics, a lot of tests are only demanding of reasoning ability because the human has limited memory. Computers short on actual "reasoning" largely make up for it by having just mind boggling amounts of something more akin to recall than reasoning (it's something a bit weirder, but as far as analogies go, recall is closer).
    It's kind of like bragging that your RC boat could ge
    - Re: (Score:2)
      
      by ShanghaiBill ( 739463 ) writes:
      
      The five states of grief are denial, anger, bargaining, depression, and acceptance.
      You seem to be stuck on denial.
      - Re: (Score:2)
        
        by Junta ( 36770 ) writes:
        
        I'm not in denial, the LLMs and other forms of AI have utility, but expectations have to be mitigated.
        Was in a discussion with a software executive a couple weeks back who said he fully anticipates he can lay off every one of his software developers and testers in the next year and only have to retain the 'important' people: the executives and sales people.
        People see articles like this show how LLMs enable computing to reach another tier of 'stupid human tricks', which is certainly novel, but people overext
    - Re: (Score:2)
      
      by phantomfive ( 622387 ) writes:
      
      It's kind of like bragging that your RC boat could get a gold medal in the Olympic 1500 meter freestyle. It didn't complete the challenge in the same way, and that boat would be unable to, for example, save someone that is about to drown, because the boat can just go places, it can't do what that human swimmer could do. That person swimming 1500m itself is a useless feat of interest, and not really directly useful in and of itself.
      Nice example.
    - Re: (Score:2)
      
      by JoshuaZ ( 1134087 ) writes:
      
      It's kind of like bragging that your RC boat could get a gold medal in the Olympic 1500 meter freestyle. It didn't complete the challenge in the same way, and that boat would be unable to, for example, save someone that is about to drown, because the boat can just go places, it can't do what that human swimmer could do. That person swimming 1500m itself is a useless feat of interest, and not really directly useful in and of itself.
      Let's extend your analogy a bit more. An actual boat, not just a little RC, with an engine can go faster and farther than any human. Yes the boat cannot rescue a person by itself, but it doesn't need to. And if you are drowning two miles off shore I strongly suspect you'd rather the rescuers get in a boat and motor over to you and then rescue you rather than trying to swim out two miles. The question is not "Is this technology identical to a human?"
      - Re: (Score:2)
        
        by Junta ( 36770 ) writes:
        
        I chose RC boat because some people are using this competing at "stupid human trick" as an example of intrinsic proof of LLMs being able to supersede humans.
        In the scenario of an olympic swimming competition, an autonomous boat vs a manned boat would show no difference to each other, both would compete the task much better than a human. It's a useless test to measure general utility. Just like a person swimming a 1500 meter distance is not really a useful indicator on its own of how useful they are. Thes
        
        Re: (Score:2)
        
        by JoshuaZ ( 1134087 ) writes:
        
        It seems then that there's not actually much disagreement there, but at least some disagreement on emphasis. I probably read you as making a less nuanced claim since your reply was part of the larger thread where one person called this evidence that the IMO "repetitive bullshit" and the other person made a point about a gold medal being difficult. I don't see anyone in this thread though claiming that IMO success from a system is a perfect metric or that it is a good metric of usefulness. But it is a metric
AI Training (Score:1)

by ebonum ( 830686 ) writes:

I have to wonder. Did "Gemini Deep Think" solve the problems or simply regurgitate the answer from the billions of sucked up webpages, math research papers, etc. used to train the model? Actual competitors don't have the complete history of https://math.stackexchange.com... [stackexchange.com] at their fingertips.
- Re:AI Training (Score:5, Informative)
  
  by JoshuaZ ( 1134087 ) writes: on Monday July 21, 2025 @03:57PM (#65535218) Homepage
  
  I'm a mathematician, so I may have some expertise here: Humans spend years training for the IMO so they functionally sucking all of that up. And the IMO problems themselves vary a lot. Even extremely bright people including professional mathematicians, would have trouble with some IMO problems. Mere regurgitation is insufficient to solve the problems.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re: AI Training (Score:1)
    
    by subie ( 1062756 ) writes:
    
    Thank you
  - - Re:Ok (Score:5, Interesting)
      
      by JoshuaZ ( 1134087 ) writes: on Monday July 21, 2025 @05:03PM (#65535378) Homepage
      
      Yes, they are high school students, but the students who get gold medals are students who frequently started studying for the IMO in 9th or 10th grade, and sometimes even earlier. And yes, mathematicians aren't always going to see the trick that a given problem relies on. And it is true that IMO problems often involve tricks or approaches that one can study for. The problems are not research mathematics, and a lot of very good mathematicians never did well at the IMO as a young person. At the same time, some problems which very hard in the IMO are things where mathematicians in specific areas would not have trouble. For example, P3 of 2007 is a tough graph theory problem https://artofproblemsolving.com/wiki/index.php/2007_IMO_Problems/Problem_3 [artofproblemsolving.com] but if one has done a lot of graph theory it may seem simpler. But the IMO also frequently involves some problems which require a degree not just of pre-existing knowledge, but also something we would normally call creativity or involve concepts that are just not standard. For example, P6 from 2014 involves a problem where the idea of the problem is essentially a creative thematic connection between geometry and Ramsey theory https://artofproblemsolving.com/wiki/index.php/2014_IMO_Problems/Problem_6 [artofproblemsolving.com]. P6 from 2009 is another interesting way things can go https://artofproblemsolving.com/wiki/index.php/2009_IMO_Problems/Problem_6 [artofproblemsolving.com] and is a curious one because although very few people got it right, once one has seen the solution it feels completely obvious (unlike some other IMO problems where even after having seen a solution it isn't clear where it came from). (Note that traditionally they aim it so that Problem 6 is the hardest problem.)
      Your last point that these problems are more suited to solving with AI on a whole probably has some validity. A lot of the geometry-style IMO problems are highly narrow in framing, and there's been a lot more success with AI with those problems. And all IMO problems are in an important sense easier than genuine research problems, because you know either what you need to prove or very close to it, whereas one of the big issues in research is that you are often spending a massive amount of time trying to prove something that turns out to be false. And the IMO problems are also selected so that they do not require any "advanced" techniques such as calculus, which also drastically reduces what one's functional search space looks like. In fact, one issue some LLM AIs had early on when trying to do IMO problems is that the AIs would sometimes try to jump to developing solutions which used high-powered techniques which just weren't useful in that context. So no this isn't research math, but it remains extremely impressive and shows how fast these systems have gone. My own opinion has been from where things have gone in the last two years that the current AI systems would likely fizzle out in the sense discussed here https://scottaaronson.blog/?p=7266 [scottaaronson.blog]. This is evidence that I was wrong in that assessment.
      In terms of where AI and research mathematics is going, we're still not there. But we are getting closer to the point where AI systems can be genuinely useful. For example, not too long ago, I was looking for a specific result of a type I had seen before, and I asked an LLM about it. The LLM hallucinated a bunch of junk, but it also kept hallucinating papers by a specific mathematician, and it turned out that there was an actual paper by that person that had the sort of result I was looking for. But more direct use of AI for not just looking up things like that but to do actual research is being developed. The major hope is that we'll use systems like Lean http [wikipedia.org]
      Read the rest of this comment...
      
      Reply to This Parent Share
      Flag as Inappropriate
      - Re: (Score:3)
        
        by sinij ( 911942 ) writes:
        
        As a kid I participated in a version of this. Being good at math is not enough to do well in such competition no matter how smart you are (3sigma no idea past that). To do well, you need to study competition-style problems and learn broad array of various moves and shortcuts. This is much closer to chess than to actual logic and reasoning of free-form problem solving.
      - Re: (Score:2)
        
        by mesterha ( 110796 ) writes:
        
        The major hope is that we'll use systems like Lean https://en.wikipedia.org/wiki/... [wikipedia.org] [wikipedia.org] or other formal proof systems, where the AIs will generate formal code and then a separate system will use Lean or another similar system to check if the code is valid, and then the LLM will keep trying to generate Lean code until it either times out or it finds a proof or disproof of the given system that compiles in Lean. But we are pretty far from that point.
        That's the basics of what the Google system
  - Re: (Score:1)
    
    by gweihir ( 88907 ) writes:
    
    But what about putting them into a Computer Algebra system? We have had these for decades now. In fact, I used one when I started my CS studies 35 years ago.
    The very point of such a competition is to have a human do it, not a machine.
    - Re: (Score:2)
      
      by JoshuaZ ( 1134087 ) writes:
      
      The easy IMO problems, like some P1s or P2s could be sometimes done by a computer algebra system, but that's an exception there. And sure, the point of the competition is for humans to do it. This isn't about the IMO as a competition, but the usefulness of the IMO as essentially a natural metric of how effective these AI systems are at difficult reasoning problems.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        but the usefulness of the IMO as essentially a natural metric of how effective these AI systems are at difficult reasoning problems.
        Since LLMs have zero reasoning capability (the math does not allow it), it is obviously a failure at this.
        
        Re:AI Training (Score:4, Insightful)
        
        by JoshuaZ ( 1134087 ) writes: on Monday July 21, 2025 @06:23PM (#65535516) Homepage
        
        Does it worry you that there's a circularity in your own reasoning? You are taking for granted that LLMs cannot reason and therefore choosing to decide that any metric which shows they can reason must be flawed, and therefore cannot be evidence against your claim. Aside from the circularity, you are also confusing whether an LLM AN has "reasoning capability" with whether the system can succeed at problems which humans solve via difficult reasoning. This is essentially akin to insisting that an airplane must be unable to fly because its wings don't flap like the wings of a bird.
        
        Reply to This Parent Share
        Flag as Inappropriate
        
        Re: (Score:2)
        
        by Petersko ( 564140 ) writes:
        
        Or that watches based on quartz crystals are just an idle curiosity because they aren't built using intricate gear assemblies.
        
        Re: (Score:2)
        
        by Aighearach ( 97333 ) writes:
        
        No, they are not designed to reason. You're arguing from ignorance of the subject just using rhetorical techniques. You're mistaking technical questions with real answers for philosophical questions where you can just blow any horseshit you want out your ass and then use rhetoric to "argue" it.
        And your rhetoric is mostly red herrings and strawmen.
        
        Re: (Score:2)
        
        by JoshuaZ ( 1134087 ) writes:
        
        Instead of insults and general claims of ignorance, do you want to go and explain what "red herrings and strawmen" I've engaged in?
        You're mistaking technical questions with real answers for philosophical questions where you can just blow any horseshit you want out your ass and then use rhetoric to "argue" it.
        What technical questions do you think I'm mistaking for philosophical questions? I must confess also that I find it somewhat strange that you make a claim about strawmen and then say "No, they are not designed to reason" as if that somehow responds to what I wrote where I explicitly said that the question about whether an LLM AI can reason is a distinct question from whether they can solve problems humans solve via reasoning. So it seems very odd to tell me that they are not designed to reason. So what straw positions are there here other than yours?
        
        Re: (Score:2)
        
        by JoshuaZ ( 1134087 ) writes:
        
        Sorry, messed up the formatting with the previous comment, so replying separately to make it more readable. My apologies. Instead of insults and general claims of ignorance, do you want to go and explain what "red herrings and strawmen" I've engaged in?
        You're mistaking technical questions with real answers for philosophical questions where you can just blow any horseshit you want out your ass and then use rhetoric to "argue" it.
        What technical questions do you think I'm mistaking for philosophical questions? I must confess also that I find it somewhat strange that you make a claim about strawmen and then say "No, they are not designed to reason" as if that somehow responds to what I
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        You seem to be unfamiliar with Searle's Chinese Room, among other things.
        
        Re: (Score:2)
        
        by JoshuaZ ( 1134087 ) writes:
        
        I'm famiiiar with the Chinese room. Do you want to explain how its relevant here? Since again, we're explicitly talking about distinguishing whether the system is reasoning from whether it is solving problems which humans use reasoning to solve, it isn't relevant. That's aside from the serious problems with the Chinese room argument.
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        The point of the Chinese room is that something can appear to be intelligent without actually being intelligent.
        
        You can't do that. You have to actually look at what the machine is doing.
        
        Re: (Score:2)
        
        by JoshuaZ ( 1134087 ) writes:
        
        So, there are two issues here. One is that one has to actually agree with the conclusion that that thought problem shows that; not all philosophers agree on that. But second, and more importantly this is precisely why I made the point that whether or not these systems are reasoning is a separate claim from whether or not they are solving problems that humans would apply reasoning to. So the Chinese room thought experiment isn't relevant to the question at hand, which is how effective these systems are, not
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        That nicely sums it up. Most of these people (like the one you responded to) have zero clue how LLMs actually work and what limits that imposes. They just want to believe like fucking morons in a cult.
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You are so utterly without insight here, it is embarrassing. If it is a benchmark for difficult reasoning questions, it must measure reasoning or it is a failure. Take chess: It is not a reasoning problem to a computer. It is, in part, to a human. But because it is not to a computer, it is not suitable as a benchmark for reasoning capabilities.
        
        Re: (Score:3)
        
        by JoshuaZ ( 1134087 ) writes:
        
        You are so utterly without insight here, it is embarrassing. If it is a benchmark for difficult reasoning questions, it must measure reasoning or it is a failure. Take chess: It is not a reasoning problem to a computer. It is, in part, to a human. But because it is not to a computer, it is not suitable as a benchmark for reasoning capabilities.
        Please reread the last two sentences of my previous reply since they address exactly this sort of issue. Whether humans use reasoning to do something doesn't mean something cannot be solved any other way, hence the analogy with the birds. If you think there's a problem with that analogy, by all means please explain it. That said, given that in the other subthread you are actively refusing to test a claim you've made that should just take you a few minutes, I'm guessing that this is not going to be a terribl
        
        Re: (Score:2)
        
        by HiThere ( 15173 ) writes:
        
        To justify that assertion you must have a very specific definition of what "reasoning" entails. Would you care to share it?
        In the definition of reasoning that I use, e.g., alpha-beta pruning would count as reasoning. Clearly you disagree.
        
        Re: (Score:3)
        
        by Moridineas ( 213502 ) writes:
        
        In the definition of reasoning that I use, e.g., alpha-beta pruning would count as reasoning. Clearly you disagree.
        That's a really good point, and I would love to hear the GP's response! Why isn't alpha-beta pruning considered reasoning? Why isn't A* search over a generated graph considered reasoning?
        I've said it before, but I truly do think many of the most vitriolic anti-LLM voices come from a place of being worried about what LLM capabilities say about the nature of human intelligence and reasoning. Perhaps the bottom line is that human intelligence is not so special as we make it out to be?
        Go stands out to me as an
        
        Re: AI Training (Score:2)
        
        by St.Creed ( 853824 ) writes:
        
        At what point do humans become unable to reason, according to your rather strict requirements?
        
        Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Since we do not know how the human mind works (or often fails, as this AI hype nicely shows), I have no idea. But I am a PhD level CS type and I have followed AI research for 35 years now. At one point I was considering making it a career myself, but the constant lying and overpromising turned me away.
        And no, I do not use a narrow definition. I use one that makes sense. Unless you want to sell something and then a vacuum cleaner becomes "intelligent" and borderline AGI.
  - Re: (Score:2)
    
    by Dixie_Flatline ( 5077 ) writes:
    
    Maybe you've seen this paper, maybe not:
    "Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, Gemini-2.5-Pro, achieving scores comparable to top human competitors. However, these benchmarks evaluate models solely based on final numerical answers, neglecting rigorous reasoning and proof generation which are essential for real-world mathematical ta
    - Re: AI Training (Score:2)
      
      by St.Creed ( 853824 ) writes:
      
      In the mean time maths and physics students are using them at masters level to explain questions, give step by step explanations of proofs and create test questions.
      The whole "can it reason" is arguably a moot issue if LLMs are better at maths and physics than 99% of the population.
- Re:AI Training (Score:4, Insightful)
  
  by fph il quozientatore ( 971015 ) writes: on Monday July 21, 2025 @04:06PM (#65535238)
  
  You could ask the same question about the contestants. They solved these problems because they trained. They recognized techniques and ideas similar to other problems they encountered, and they applied them to a new problem.
  
  Reply to This Parent Share
  Flag as Inappropriate
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  If you had spent even a few minutes actually learning about LLMs you would understand that training data is not directly stored in an LLM. In fact, you don't even need to research it, you can just use your brain and look at open releases. You can load the 8B parameter release of deepseek into 24gb of ram. But it was trained on 14.8 Trillion tokens (approximately 50TB of data). You cannot store 50TB of data in 24GB.
  - Re: (Score:1)
    
    by martin-boundary ( 547041 ) writes:
    
    You're not even wrong. Perhaps start by taking a first year course on data compression. Once you get to a point where you understand how JPEG works, you will have everything you need to conceptually understand how it is possible for an LLM to represent its training data inside the parameters.
    If you're lazy, then I'll just point you to the fact that LLM exploits have been found that make the LLM spit out its training data in clear text. [theregister.com]
    - Re: AI Training (Score:2)
      
      by St.Creed ( 853824 ) writes:
      
      Only some locally coherent snippets. And that only after poking it for a long time. If you claim the LLMs will puke up their entire training set, just take that to the lawyers for the authors in the case vs OpenAI. They'll be quite happy to see you.
A computer that can do math? (Score:2, Funny)

by ebunga ( 95613 ) writes:

What will they think of next?
- Re:good (Score:4, Interesting)
  
  by JoshuaZ ( 1134087 ) writes: on Monday July 21, 2025 @05:05PM (#65535380) Homepage
  
  Extremely unlikely. The Riemann Hypothesis has been thought about by more mathematicians than almost any other serious open problem. LLMs have some limited "creativity" in the sense that they can try to combine existing techniques, but they aren't really capable of inventing entire new techniques out of whole cloth, and it seems pretty clear that fundamentally new insights are needed to resolve RH.
  
  Reply to This Parent Share
  Flag as Inappropriate
  - Re: (Score:2)
    
    by martin-boundary ( 547041 ) writes:
    
    Unfortunately, the economic justification for AI assumes problems of the caliber of the Riemann Hypothesis will be solved, hopefully without the input of humans, in the next two years. It's laughable, but these claims have had a strong economic impact by distorting investment flows. Until the inevitable reality check occurs, you'll get many people expecting feats such as the RH, and much misinterpretation of the role of AI assistants.
    - Re: (Score:2)
      
      by Moridineas ( 213502 ) writes:
      
      Unfortunately, the economic justification for AI assumes problems of the caliber of the Riemann Hypothesis will be solved
      Does it? It would be fascinating if AI could eventually solve problems like that, but I think the economic justification is more along the lines of "we have billions of dollars of capital available, billions of people who want to use this technology, tens of thousands of other companies working on AI technology, and we don't want to be left behind when the next big discoveries are made."
  - Re: good (Score:2)
    
    by St.Creed ( 853824 ) writes:
    
    Yet they are already answering some long standing unsolved problems. So while the Riemann hypothesis may be out of reach, the tools required to crack it may still come out of an LLM.
    - Re: (Score:2)
      
      by JoshuaZ ( 1134087 ) writes:
      
      I don't know of any long-standing open problems solved by an LLM. There have been open problems solved by AI systems for a long time since even before LLMs. The classic example is the Robbins conjecture https://en.wikipedia.org/wiki/Robbins_algebra#History [wikipedia.org] all the way back in 1996. And other systems have made progress since then. But in the last few years, the primary way LLMs have been used has been not to answer long-standing open questions but rather to tweak some existing bounds or improve searches for
And other AI engines? (Score:1)

by Ummite ( 195748 ) writes:

I have in mind, the latest Grok 4 Heavy, was he tested? I didn't saw any information about other engines doing that test to compare.
So? (Score:1)

by gweihir ( 88907 ) writes:

I have no doubt that Maple or any other decent Computer Algebra system could have done the same ... 30 years ago. Or Wolfram Alpha.
This is a completely meaningless stunt. The only purpose is to deceive the stupid about what these systems can do, or rather cannot do.
- Re: (Score:2)
  
  by JoshuaZ ( 1134087 ) writes:
  
  If you really believe this, by all means try this. This year's problems are here https://artofproblemsolving.com/wiki/index.php/2025_IMO_Problems [artofproblemsolving.com] and you can try to see if you can get Maple, or Mathematica to do these problems. The same link has all the previous contests back to 1959 (1980 did not have a contest). Given format changes, you are more likely to see problems that these systems can solve, especially some of the real inequality problems (which Maple can certainly do), but those problems have not
  - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    You're replying to slashdot's local resident AI retard! Don't waste time reasoning with him. It won't work. For some reason, he feels it is his duty to shit on anything related to AI. He will never give any explanation or reasoning behind his comments and doesn't care what anyone says.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Why would I waste time in what is clearly a failed technology? The stupid always need years and years to find out a hype is just a hype and never has any of the substance claimed. I can do it directly.
    - Re:So? (Score:4, Insightful)
      
      by JoshuaZ ( 1134087 ) writes: on Monday July 21, 2025 @06:28PM (#65535522) Homepage
      
      So, your evidence is to make a claim about what Maple or another computer algebra system must be able to do, and you aren't willing to actually spend 5 minutes checking whether you can actually do that. Here: I'll even make the following offer. Take the 2025 IMO. If you can get Maple or any other computer algebra system to solve more than 1 problem on it, and provide screenshots of it doing so, I'll donate $100 to a charity of your choice. So now want to actually go and do what you claim must be obviously the case. I suspect you aren't however going to respond to this by actually trying what you claim is possible, and I'll let interested readers of this thread make their own conclusions about why that's the case.
      
      Reply to This Parent Share
      Flag as Inappropriate
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        Not interested. You are deep in delusion and cannot be reached.
      - Re: (Score:2)
        
        by Moridineas ( 213502 ) writes:
        
        Not relevant to your post, but I appreciate your insights as a mathematician on this article (says the guy who tapped out after 3rd semester calc!). For what it's worth, I don't _think_ gweihir is trolling, though he could be be, but he does seem to spend a substantial number of his waking hours posting ad hominems and just outright calling people stupid if anyone suggests any possible productive usage of an LLM. He doesn't engage in good faith, so the conversations are never particularly interesting.
    - Re: (Score:2)
      
      by ceoyoyo ( 59147 ) writes:
      
      Ah, the gaps close even tighter.
      "I'm totally sure X or Y could do that."
      "Okay, try it!"
      "Why would I waste my time?"
      Why indeed. You might find out that you're wrong.
      - Re: (Score:2)
        
        by gweihir ( 88907 ) writes:
        
        You might find out that you're wrong.
        Or I could waste a lot of time. Which is the more likely case.
        
        Re: (Score:2)
        
        by ceoyoyo ( 59147 ) writes:
        
        You're nicely illustrating the issue with AI reasoning. Humans "reason" mostly by deciding what they believe then coming up with truthy stories to support that belief. When confronted with actual conflicting evidence or proper reasoning to the contrary, they make up more stories to "rationalize" it, or, that failing, make excuses.
        The reasoning systems are generally made up of an LLM, a bunch of more general purpose neural network layers, and some conventional logic systems. The neural network part comes up
NOTHING OFFICIAL AT ALL (Score:1)

by gavron ( 1300111 ) writes:

The IMO is pretty pissed off at Google because the results are embargoed from publication until the 28th of July.
Apparently Google did their own metrics and claimed the Gold-Medal. IMO is not happy,
https://arstechnica.com/ai/202... [arstechnica.com]
But hey, good on Goog for demonstrating some level of success, whether IMO gives them a gold medal or not.
Shame on them for violating the publication embargo, but I haven't read that contract or those T&Cs so I only
judge by what IMO says and what Goog says.
Complete Fail (Score:2)

by Aighearach ( 97333 ) writes:

What's worse than the self-grading is that the rules include not using calculators.
How can an AI do math without using a calculator?
There is no way a computer anything can score even 1% using the same rules as humans.
- Re: (Score:2)
  
  by JoshuaZ ( 1134087 ) writes:
  
  If you read the summary, you'd see it was not self-graded but graded by IMO people. Also, while it is true that AI systems functionally have in some sense access to calculators, you can go and look at the IMO problems yourself https://artofproblemsolving.com/wiki/index.php/2025_IMO [artofproblemsolving.com] and see that the assistance one would get from a standard calculator is pretty minimal.
  - Re: (Score:2)
    
    by martin-boundary ( 547041 ) writes:
    
    I think when people talk about calculator assistance for LLMs, they are conflating a general class of approaches to bolt on reasoning capabilities into those LLMs by delegating arithmetic, and delegating logic, to specialized, provably correct software tools. These tools are not AIs themselves and do not use ML algorithms in their implementations, so should be considered traditional software. The tools are necessary though, as the Chatbot/language models alone are incapable of performing arithmetic, and th

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Advanced Version of Gemini With Deep Think Officially Achieves Gold-Medal Standard at the International Mathematical Olympiad More | Reply Login

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

AI Training (Score:1)

Re:AI Training (Score:5, Informative)

Re: AI Training (Score:1)

Re:Ok (Score:5, Interesting)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re:AI Training (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: AI Training (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: AI Training (Score:2)

Re:AI Training (Score:4, Insightful)

Re: (Score:1)

Re: (Score:1)

Re: AI Training (Score:2)

A computer that can do math? (Score:2, Funny)

Re:good (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: good (Score:2)

Re: (Score:2)

And other AI engines? (Score:1)

So? (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re:So? (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

NOTHING OFFICIAL AT ALL (Score:1)

Complete Fail (Score:2)

Re: (Score:2)

Re: (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals