
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou
Abstract

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem-solving in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents) within one model. In chain-of-agents problem-solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework that distills state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then use agentic reinforcement learning on verifiable agentic tasks to further improve the models' capabilities in chain-of-agents problem solving. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFM establishes new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We make the entire research, including the model weights, the code for training and evaluation, and the training data, fully open-source, offering a solid starting point for future research on agent models and agentic RL.
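To make the paradigm concrete, the following is a minimal sketch of a chain-of-agents rollout: a single model produces one trajectory that interleaves role-playing agent steps, tool-agent calls, and a final answer, all within a shared context. The step format, the `mock_model` stand-in, and the `search` tool are hypothetical illustrations, not the paper's actual trajectory schema.

```python
# Minimal sketch of the chain-of-agents idea: one model's rollout
# interleaves "agent" activations and tool calls in a single context.
# Step kinds and tool names here are illustrative assumptions.

def mock_model(context):
    """Stand-in for an AFM: returns the next step given the trajectory so far."""
    steps = [
        ("plan",   "Break the task into a search step and a summarize step."),
        ("tool",   ("search", "open-source agent foundation models")),
        ("agent",  ("summarizer", "Condense the retrieved snippets.")),
        ("answer", "AFMs unify multi-agent problem solving in one model."),
    ]
    return steps[len(context)]

TOOLS = {"search": lambda q: f"[3 snippets about '{q}']"}

def rollout(task, max_steps=8):
    context = []                      # one shared trajectory, one model
    for _ in range(max_steps):
        kind, payload = mock_model(context)
        if kind == "tool":            # tool agent: execute and feed back observation
            name, arg = payload
            context.append((kind, (name, arg, TOOLS[name](arg))))
        elif kind == "answer":        # terminal step ends the rollout
            context.append((kind, payload))
            return payload, context
        else:                         # role-playing agent step stays in-context
            context.append((kind, payload))
    return None, context

answer, trace = rollout("deep research question")
```

The key contrast with a hand-engineered multi-agent framework is that here no external orchestrator routes messages between separate models; the single model's own generations decide when each agent or tool is activated, which is what makes the trajectories amenable to end-to-end fine-tuning and RL.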
