OpenAI wins gold at prestigious math competition - why that matters more than you think

OpenAI has achieved a new milestone in the race to build AI models that can reason their way through complex math problems.

On Saturday, the company announced that one of its models achieved gold medal-level performance on the International Math Olympiad (IMO), widely regarded as the most prestigious and difficult math competition in the world.

We achieved gold medal-level performance 🥇 on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM!
Our model solved world-class math problems, at the level of top human contestants. A major milestone for AI and mathematics. https://t.co/u2RlFFavyT
— OpenAI (@OpenAI) July 19, 2025

Critically, the winning model wasn’t designed specifically to solve IMO problems, in the way that earlier systems like DeepMind’s AlphaGo (which famously beat the world’s leading Go player in 2016) were trained on a massive dataset within a very narrow, task-specific domain. Rather, the winner was a general-purpose reasoning model, designed to think through problems methodically using natural language.

“This is an LLM doing math and not a specific formal math system,” OpenAI wrote in its X post. “It is part of our main push towards general intelligence.”

(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems. Ziff Davis also owns DownDetector.)

Not much is known at this point about the identity of the model that was used. Alexander Wei, a researcher at OpenAI who led the IMO research, called it “an experimental reasoning LLM” in an X post, which included an illustration of a strawberry wreathed in a gold medal, suggesting it’s built atop the company’s o1 family of reasoning models, which debuted in September 2024.

“To be clear: We’re releasing GPT-5 soon, but the model we used at IMO is a separate experimental model,” OpenAI added on X. “It uses new research techniques that will show up in future models, but we don’t plan to release a model with this level of capability for many months.”

How well did the model perform?

The IMO, which began in 1959, attracts around 600 contestants from more than 100 countries each year.

Contestants must provide proof-based answers to a total of six questions over the course of two days. These proofs are assessed by former IMO gold medalists, with unanimous consensus required for each final score. Fewer than 9% of participants receive gold.

According to Wei, OpenAI’s experimental model solved five out of the six problems and earned 35 out of 42 possible points (about 83%), earning a gold medal. Each proof comprised hundreds of lines of text, representing the individual steps the model took to work through its reasoning process. In line with the competition’s prohibition against the use of calculators or other external tools, OpenAI’s model had no access to the internet; it was purely reasoning through each of the problems step by step.
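For readers who want to check the arithmetic behind that score, here is a minimal sketch. It assumes the standard IMO marking scheme, in which each of the six problems is worth up to 7 points, and it further assumes that every solved problem received full marks and the unsolved one received none. The function name `imo_score` is purely illustrative and does not come from OpenAI or the IMO.

```python
# Assumption: standard IMO marking, where each problem is scored from 0 to 7.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6


def imo_score(problems_solved: int) -> tuple[int, int, float]:
    """Return (points, max_points, percent).

    Assumes full marks for each solved problem and zero for the rest;
    real IMO grading can also award partial credit.
    """
    if not 0 <= problems_solved <= NUM_PROBLEMS:
        raise ValueError("problems_solved must be between 0 and 6")
    max_points = POINTS_PER_PROBLEM * NUM_PROBLEMS
    points = POINTS_PER_PROBLEM * problems_solved
    percent = 100 * points / max_points
    return points, max_points, percent


points, max_points, percent = imo_score(5)
print(f"{points}/{max_points} = {percent:.1f}%")  # prints "35/42 = 83.3%"
```

Under those assumptions, five fully solved problems yield exactly the 35 of 42 points cited, which rounds to the "about 83%" figure.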

The “model thinks for a long time,” Noam Brown, another OpenAI researcher involved in the research project, wrote in an X post. “o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it’s also more efficient with its thinking.”

Analysts had previously estimated that there was only an 18% chance that an AI system would win gold in the IMO by 2025, according to OpenAI.

The big picture

For all of its impressive abilities, AI has long struggled with simple arithmetic and basic math word problems, tasks that one might think should be relatively easy for advanced algorithms. But unlike more narrow logical puzzles, math requires a level of abstract reasoning and conceptual juggling that has been beyond the reach of most AI systems.

That’s been changing, however, at a very rapid pace. A little over a year ago, AI models were still being assessed using grade school-level math benchmarks like GSM8K. Reasoning models like o1 and DeepSeek’s R1 quickly excelled, first acing high school-level benchmarks like AIME and then advancing to the university level and beyond.

A capacity for high-level mathematics has become the gold standard for reasoning models, since even a small amount of hallucination or corner-cutting can very quickly and obviously ruin a model’s output. Such lapses are easier to get away with in other kinds of responses, such as help with a written essay, since those are very often open to various kinds of interpretation.

OpenAI’s IMO gold medal shows that a scalable, general-purpose reasoning approach can surpass domain-specific models in tasks that have long been believed to be beyond the reach of current AI systems. As it turns out, you don’t need to build hyperfocused, AlphaGo-like models trained to do nothing but math; it’s enough to train them to parse language and carefully reason through their thought process, and if they’re given enough time, they can compete on par with world-class human mathematicians.

According to Brown, the current pace of innovation happening throughout the AI industry suggests that its mathematical and reasoning prowess will only grow from here. “I fully expect the trend to continue,” he wrote on X. “Importantly, I think we’re close to AI substantially contributing to scientific discovery.”
