DualDistill and Agentic-R1: AI That Mixes Language and Tools for Better Math Problem Solving
Researchers from Carnegie Mellon University have introduced DualDistill, a distillation framework designed to enhance mathematical problem solving in AI by combining natural-language reasoning with tool use. The method yields a more efficient and accurate model, addressing limitations of existing long-chain-of-thought (long-CoT) reasoning systems.

Long-CoT models have demonstrated strong performance in mathematical reasoning by generating detailed reasoning steps and refining them through self-verification. However, open-source versions of these models often rely solely on natural language, which can be inefficient and error-prone. Tool-aided reasoning systems, such as those built on the OpenHands framework, offer greater computational efficiency and reliability on numerical tasks, yet they struggle with abstract or complex reasoning problems.

DualDistill addresses this trade-off by distilling knowledge from two distinct teachers: one focused on natural-language reasoning and the other on tool-assisted problem solving. The resulting student model, Agentic-R1, dynamically chooses the better approach for each problem type: it executes code for arithmetic and algorithmic tasks and uses natural language for abstract reasoning. The framework uses trajectory composition to merge insights from both teachers, followed by self-distillation to refine the model further. OpenHands served as the agentic reasoning teacher, while DeepSeek-R1 served as the text-based reasoning teacher.

Agentic-R1 was evaluated on multiple mathematical reasoning benchmarks, including DeepMath-L and Combinatorics300, and compared to models such as DeepSeek-R1-Distill and Qwen-2.5-Instruct. The results showed that Agentic-R1 significantly outperformed models specialized in either tool-based or pure text reasoning.
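The trajectory-composition idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `Trajectory` class, the `compose` function, and the switch markers are hypothetical simplifications, not the paper's actual implementation. The core intuition it captures is that when one teacher fails and the other succeeds, the training example can splice a truncated failed attempt before the successful one, so the student sees *when* switching strategies pays off.

```python
from dataclasses import dataclass

# Hypothetical sketch of trajectory composition; names and markers are
# illustrative, not DualDistill's actual data format.
@dataclass
class Trajectory:
    steps: list    # solution steps as strings (text reasoning or tool calls)
    correct: bool  # whether the trajectory reached the right final answer

def compose(text: Trajectory, tool: Trajectory) -> list:
    """Merge one text-teacher and one tool-teacher trajectory into a
    single training trajectory for the student."""
    if tool.correct and not text.correct:
        # Show a truncated failed text attempt, then the working tool solution.
        return text.steps[:1] + ["<switch to tool use>"] + tool.steps
    if text.correct and not tool.correct:
        return tool.steps[:1] + ["<switch to text reasoning>"] + text.steps
    if text.correct and tool.correct:
        # Both succeed: keep the shorter (cheaper) solution.
        return min((text.steps, tool.steps), key=len)
    return []  # neither teacher solved it; drop the example
```

Under this sketch, a problem the text teacher miscounts but the tool teacher solves produces a trajectory that begins with the failed text step, inserts a switch marker, and ends with the tool-based solution.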
It achieved this by applying more expensive reasoning strategies only when needed while staying efficient on standard mathematical tasks. Qualitative analysis revealed that Agentic-R1 uses tools selectively: it activated code execution on 79.2% of the harder combinatorics problems in the Combinatorics300 dataset, but on only 52.0% of the simpler problems in the AMC dataset. This behavior emerged from supervised fine-tuning alone, without explicit routing instructions, demonstrating the model's ability to balance computational efficiency against reasoning accuracy.

The framework is also resilient to imperfect teachers. Even though the agentic teacher reached only 48.4% accuracy on the Combinatorics300 benchmark, the student model improved from 44.7% to 50.9% accuracy, surpassing its teacher.

In conclusion, DualDistill successfully merges the strengths of natural-language reasoning and tool-assisted methods by distilling knowledge from two specialized teachers into a single adaptable model. Agentic-R1 dynamically selects the most effective strategy for each problem, achieving higher accuracy and efficiency than models committed to a single approach, and its performance across diverse mathematical benchmarks highlights its potential as a robust solution for complex AI reasoning tasks.
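The selective tool-activation behavior described above can be caricatured as a router. To be clear, the following is a toy heuristic for illustration only: Agentic-R1 learns this routing implicitly from distilled trajectories, not from keyword rules, and the `route_strategy` function and its keyword list are assumptions of this sketch.

```python
import re

# Illustrative keyword pattern: enumeration/arithmetic-flavored problems are
# routed to code execution, everything else to natural-language reasoning.
COMPUTATIONAL = re.compile(
    r"\b(how many|count|enumerate|compute|evaluate|sum|product|permutations?)\b",
    re.IGNORECASE,
)

def route_strategy(problem: str) -> str:
    """Toy stand-in for Agentic-R1's learned strategy selection.

    Returns 'code_execution' for problems that look computational and
    'text_reasoning' otherwise.
    """
    return "code_execution" if COMPUTATIONAL.search(problem) else "text_reasoning"
```

A combinatorics prompt like "How many 5-digit numbers have distinct digits?" would route to code execution, while a proof-style prompt would route to text reasoning, mirroring the 79.2% vs. 52.0% activation gap the authors observed between harder and simpler problem sets.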