Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu


Abstract

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work has proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. In light of this, self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet, the efficacy of LLMs in self-refining their responses, particularly on complex reasoning and planning tasks, remains dubious. In this paper, we introduce AlphaLLM for the self-improvement of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLMs for self-improvement, including data scarcity, the vast search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM comprises a prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results on mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.
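To make the abstract's search-and-critique loop concrete, the following is a minimal, illustrative MCTS sketch. It is not the authors' implementation: `propose_actions` is a hypothetical stand-in for the LLM sampling candidate next steps, and `critic_value` is a hypothetical stand-in for the paper's critic models, demonstrated here on a toy numeric state space.

```python
import math
import random

class Node:
    """A search-tree node holding a partial solution state."""
    def __init__(self, state, parent=None):
        self.state = state      # partial solution (here: a number)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # accumulated critic reward

def propose_actions(state):
    """Stand-in for the LLM proposing candidate next steps."""
    return [state + 1, state + 2, state * 2]

def critic_value(state, target):
    """Stand-in critic: reward states closer to a target answer."""
    return 1.0 / (1.0 + abs(state - target))

def select(node, c=1.4):
    """Descend via the UCT rule until reaching an unexpanded node."""
    while node.children:
        node = max(
            node.children,
            key=lambda ch: ch.value / (ch.visits + 1e-9)
            + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
        )
    return node

def mcts(root_state, target, iterations=200):
    """One search episode: select, expand, evaluate, backpropagate."""
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)
        for action in propose_actions(leaf.state):   # expansion
            leaf.children.append(Node(action, parent=leaf))
        child = random.choice(leaf.children)
        reward = critic_value(child.state, target)   # critic feedback
        while child is not None:                     # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    # Return the most-visited first step, as in AlphaGo-style play.
    return max(root.children, key=lambda ch: ch.visits).state
```

In the full self-improving loop described by the paper, trajectories found this way would be scored by the critics and fed back as training signal for the LLM, with no extra human annotation; the sketch above only shows the search half of that cycle.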

Code Repositories

yetianjhu/alphallm
Official
pytorch

Benchmarks

Benchmark                           Methodology            Metrics
arithmetic-reasoning-on-gsm8k       AlphaLLM (with MCTS)   Accuracy: 92; Parameters (Billion): 70
gsm8k-on-gsm8k                      AlphaLLM (with MCTS)   Accuracy: 92
math-word-problem-solving-on-math   AlphaLLM (with MCTS)   Accuracy: 51

