FrontierScience: Evaluating AI’s Ability To Perform Expert-Level Scientific Tasks

Miles Wang, Joy Jiao, Neil Chowdhury, Ethan Chang, Tejal Patwardhan

Abstract

We introduce FrontierScience, a benchmark evaluating AI capabilities for expert-level scientific reasoning. FrontierScience consists of two tracks: (1) Olympiad, which contains international olympiad problems (at the level of IPhO, IChO, and IBO), and (2) Research, which contains PhD-level, open-ended problems representative of sub-problems in scientific research. In total, FrontierScience comprises several hundred questions (160 in the open-sourced gold set) covering subfields across physics, chemistry, and biology, from quantum electrodynamics to synthetic organic chemistry. Recent model progress has nearly saturated existing science benchmarks, which often rely on multiple-choice knowledge questions or already-published information. In contrast, all Olympiad problems are originally produced by international olympiad medalists and national team coaches to ensure standards of difficulty, originality, and factuality. All Research problems are research sub-tasks written and verified by PhD scientists (doctoral candidates, post-doctoral researchers, or professors). For Research, we also introduce a granular rubric-based architecture that evaluates model capabilities throughout the process of solving a research task, as opposed to judging a standalone answer. In initial evaluations of several frontier models, GPT-5.2 is the top-performing model on FrontierScience, scoring 77% on the Olympiad set and 25% on the Research set.
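
The abstract does not spell out the rubric format, but a granular rubric-based evaluation of this kind is typically implemented by decomposing each research task into weighted criteria that are graded individually and then aggregated, rather than scoring only the final answer. The Python sketch below is purely illustrative of that idea; the class names, criteria, weights, and scores are assumptions for the example and are not the paper's actual schema.

    from dataclasses import dataclass

    @dataclass
    class RubricCriterion:
        """One graded step in a research task (hypothetical example)."""
        description: str
        weight: float   # relative importance of this step
        score: float    # grader-assigned credit in [0, 1]

    def rubric_score(criteria: list[RubricCriterion]) -> float:
        """Weighted average of per-criterion credit, reported as a percentage."""
        total_weight = sum(c.weight for c in criteria)
        earned = sum(c.weight * c.score for c in criteria)
        return 100.0 * earned / total_weight

    # Hypothetical rubric for a single Research problem.
    criteria = [
        RubricCriterion("States the correct governing equations", weight=1.0, score=1.0),
        RubricCriterion("Carries out the derivation without sign errors", weight=2.0, score=0.5),
        RubricCriterion("Reports the final quantity with correct units", weight=1.0, score=0.0),
    ]
    print(f"Rubric score: {rubric_score(criteria):.1f}%")  # -> 50.0%

Under this kind of scheme, a model earns partial credit for intermediate progress on a research task, which is what distinguishes process-level grading from judging a standalone answer.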

