Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu


Abstract

Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work has proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. In light of this, self-correction and self-learning emerge as viable solutions, employing strategies that allow LLMs to refine their outputs and learn from self-assessed rewards. Yet, the efficacy of LLMs in self-refining their responses, particularly on complex reasoning and planning tasks, remains dubious. In this paper, we introduce AlphaLLM for the self-improvement of LLMs, which integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, thereby enhancing the capabilities of LLMs without additional annotations. Drawing inspiration from the success of AlphaGo, AlphaLLM addresses the unique challenges of combining MCTS with LLMs for self-improvement, including data scarcity, the vast search spaces of language tasks, and the subjective nature of feedback in language tasks. AlphaLLM comprises a prompt synthesis component, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. Our experimental results on mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances the performance of LLMs without additional annotations, showing the potential for self-improvement in LLMs.
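To make the abstract's search-and-critique loop concrete, the following is a minimal, illustrative MCTS sketch. It is not the authors' implementation: `propose_actions` is a hypothetical stand-in for the LLM sampling candidate next steps, and `critic_value` is a hypothetical stand-in for the paper's critic models, demonstrated here on a toy numeric state space.

```python
import math
import random

class Node:
    """A search-tree node holding a partial solution state."""
    def __init__(self, state, parent=None):
        self.state = state      # partial solution (here: a number)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # accumulated critic reward

def propose_actions(state):
    """Stand-in for the LLM proposing candidate next steps."""
    return [state + 1, state + 2, state * 2]

def critic_value(state, target):
    """Stand-in critic: reward states closer to a target answer."""
    return 1.0 / (1.0 + abs(state - target))

def select(node, c=1.4):
    """Descend via the UCT rule until reaching an unexpanded node."""
    while node.children:
        node = max(
            node.children,
            key=lambda ch: ch.value / (ch.visits + 1e-9)
            + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
        )
    return node

def mcts(root_state, target, iterations=200):
    """One search episode: select, expand, evaluate, backpropagate."""
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)
        for action in propose_actions(leaf.state):   # expansion
            leaf.children.append(Node(action, parent=leaf))
        child = random.choice(leaf.children)
        reward = critic_value(child.state, target)   # critic feedback
        while child is not None:                     # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    # Return the most-visited first step, as in AlphaGo-style play.
    return max(root.children, key=lambda ch: ch.visits).state
```

In the full self-improving loop described by the paper, trajectories found this way would be scored by the critics and fed back as training signal for the LLM, with no extra human annotation; the sketch above only shows the search half of that cycle.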

Code Repositories

yetianjhu/alphallm
Official
pytorch

Benchmarks

Benchmark                           Methodology            Metrics
arithmetic-reasoning-on-gsm8k       AlphaLLM (with MCTS)   Accuracy: 92; Parameters (Billion): 70
gsm8k-on-gsm8k                      AlphaLLM (with MCTS)   Accuracy: 92
math-word-problem-solving-on-math   AlphaLLM (with MCTS)   Accuracy: 51

