DeepMind and OpenAI models achieve gold-medal math scores on par with top students
Google DeepMind announced on July 21 that its AI system achieved gold medal-level performance on a set of mathematics problems from the International Mathematical Olympiad (IMO), matching the skills of the world's top high-school students. Although the score was only slightly higher than the previous year's, when the system finished at the upper end of silver-medal territory, the result represents a major shift in how AI approaches mathematical problem-solving.

DeepMind's earlier successes relied on specialized AI tools, AlphaGeometry and AlphaProof, which were designed to perform precise logical steps in mathematical reasoning. Those systems required human experts to first translate problem statements into a format resembling programming code and then convert the AI's solutions back into natural language, a time-consuming process that limited the AI's ability to work independently.

This year, the company instead used a new large language model (LLM) called Deep Think, built on its Gemini system with enhancements that improved its ability to generate mathematical arguments. The model, which can pursue multiple lines of reasoning simultaneously, scored 35 out of 42 on the six problems posed to human competitors. Under an agreement with the Olympiad organizers, its solutions were marked by the same judges who assessed the human participants.

Meanwhile, OpenAI, the creator of ChatGPT, reported that its own LLM also solved the IMO problems at a gold-medal level. Its solutions, however, were graded outside the official judging process, highlighting the two companies' differing approaches.

The performance of these systems has sparked discussion among researchers. For years, AI development has been divided between two main strategies: one focused on explicitly coding logical rules into machines, known as neurosymbolic AI, and the other on training neural networks to learn from large datasets. The former dominated until 2012, when the rise of neural networks led to major breakthroughs and, eventually, to widely used tools such as ChatGPT.

Gary Marcus, a neuroscientist at New York University and a proponent of the neurosymbolic method, called the achievements of DeepMind and OpenAI "awfully impressive." Despite his skepticism about the hype surrounding LLMs, he acknowledged that solving math problems at the level of roughly the top 67 high-school students in the world is a significant accomplishment.

Thang Luong, a computer scientist at DeepMind, said the shift to natural-language, end-to-end problem-solving marks a turning point. Although the neurosymbolic and neural-network approaches have long been seen as competing, he suggested they may eventually merge. "At this point, the two camps still keep developing," he said. "They could converge together."