
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu
Abstract

Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver's capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks or labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
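
The co-evolution dynamic described above can be pictured with a toy numerical loop. The sketch below is illustrative only, not the paper's training code: `challenger_propose`, `solver_attempt`, the logistic success model, and all step sizes are assumptions standing in for the reinforcement-learning updates of two LLM copies. The one idea taken directly from the abstract is the reward structure, in which the Challenger is rewarded most for tasks sitting at the edge of the Solver's capability.

```python
# Toy sketch of a Challenger-Solver co-evolution loop (illustrative only;
# R-Zero trains two LLM copies with RL, not these scalar stubs).
import math
import random

def challenger_propose(difficulty: float) -> dict:
    """Stub Challenger: emits a task at a target difficulty in [0, 1]."""
    return {"difficulty": difficulty}

def solver_attempt(task: dict, skill: float) -> bool:
    """Stub Solver: success probability decays (logistically) as task
    difficulty exceeds the Solver's current skill."""
    gap = task["difficulty"] - skill
    return random.random() < 1.0 / (1.0 + math.exp(10.0 * gap))

def challenger_reward(success_rate: float) -> float:
    """Reward peaks when the Solver solves ~50% of the batch, i.e. when
    the tasks sit at the edge of the Solver's capability."""
    return 1.0 - 2.0 * abs(success_rate - 0.5)

skill, difficulty = 0.2, 0.2
for step in range(100):
    # Challenger proposes a batch at its current difficulty target.
    tasks = [challenger_propose(difficulty) for _ in range(64)]
    solved = sum(solver_attempt(t, skill) for t in tasks)
    success_rate = solved / len(tasks)

    # Challenger update (stand-in for its RL step): move difficulty toward
    # the frontier where the Solver succeeds about half the time.
    difficulty = min(max(difficulty + 0.05 * (success_rate - 0.5), 0.0), 1.0)

    # Solver update (stand-in for its RL step): skill grows fastest when
    # it practices on tasks near its limit, per the Challenger's reward shape.
    skill = min(max(skill + 0.02 * success_rate * challenger_reward(success_rate), 0.0), 1.0)

print(f"after co-evolution: skill={skill:.2f}, difficulty={difficulty:.2f}")
```

Running the loop, the hypothetical difficulty value tracks the Solver's rising skill, which is the point of the 50%-success reward target: it concentrates the training signal at the capability frontier rather than on tasks that are already trivial or hopeless.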
