Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem-solving in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents) within one model. In chain-of-agents problem-solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework that distills state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then apply agentic reinforcement learning on verifiable agentic tasks to further improve the models' chain-of-agents problem-solving capabilities. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFM establishes new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We make the entire research fully open-sourced, including the model weights, the training and evaluation code, and the training data, offering a solid starting point for future research on agent models and agentic RL.
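To make the recipe concrete, below is a minimal, illustrative sketch of the pipeline the abstract describes: flatten a multi-agent system's run into a single chain-of-agents trajectory (the distillation target for agentic SFT), then score rollouts with a verifiable outcome reward for agentic RL. All names here (Step, Trajectory, distill_to_trajectory, verifiable_reward) are hypothetical placeholders for exposition, not the paper's actual code or API.

```python
# Illustrative sketch, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One turn of a rollout: a role-playing agent reasons and optionally calls a tool."""
    agent: str           # e.g. "planner", "web_searcher", "coder"
    thought: str         # model-generated reasoning for this turn
    tool_call: str = ""  # serialized tool invocation, empty if none
    tool_result: str = ""


@dataclass
class Trajectory:
    """A full chain-of-agents rollout for one task, produced end to end by one model."""
    task: str
    steps: List[Step] = field(default_factory=list)
    final_answer: str = ""


def distill_to_trajectory(task: str, multi_agent_log: List[dict]) -> Trajectory:
    """Flatten a multi-agent system's event log into one linear trajectory,
    so a single model can be fine-tuned to reproduce the whole collaboration."""
    traj = Trajectory(task=task)
    for event in multi_agent_log:
        traj.steps.append(Step(
            agent=event["agent"],
            thought=event.get("thought", ""),
            tool_call=event.get("tool_call", ""),
            tool_result=event.get("tool_result", ""),
        ))
    traj.final_answer = multi_agent_log[-1].get("answer", "")
    return traj


def verifiable_reward(traj: Trajectory, checker: Callable[[str], bool]) -> float:
    """Outcome reward for agentic RL: 1.0 if the final answer passes an automatic
    check (e.g. unit tests for code, exact match for QA), else 0.0."""
    return 1.0 if checker(traj.final_answer) else 0.0


if __name__ == "__main__":
    # Toy multi-agent log for one task, used as a distillation source.
    log = [
        {"agent": "planner", "thought": "Break the question into a lookup step."},
        {"agent": "web_searcher", "thought": "Query the web.",
         "tool_call": "search('capital of France')", "tool_result": "Paris"},
        {"agent": "answerer", "thought": "Report the result.", "answer": "Paris"},
    ]
    traj = distill_to_trajectory("What is the capital of France?", log)
    print(len(traj.steps), "steps; reward =",
          verifiable_reward(traj, checker=lambda a: a == "Paris"))
```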