Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter Sascha Bongni Ido Hakimi Andreas Krause

Abstract

Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limiting its effectiveness or even hurting performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for such duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval, with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the $\texttt{activeft}$ (Active Fine-Tuning) library which can be used as a drop-in replacement for Nearest Neighbor retrieval.
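To illustrate the core idea, the following is a minimal from-scratch sketch of greedy uncertainty-reducing selection under a linear surrogate model with a dot-product kernel over embeddings. This is an illustrative toy, not the `activeft` API: the function name `sift_select`, the regularization constant, and the toy data are all assumptions for the example. The key behavior it demonstrates is that once an example has been selected, near-duplicates of it stop reducing the prompt's posterior variance, so diverse examples are preferred over redundant ones.

```python
import numpy as np

def sift_select(prompt_emb, data_embs, k, reg=0.01):
    """Greedily pick k examples that most reduce posterior variance
    of the prompt under a linear surrogate with kernel phi(x).phi(x').

    Embeddings are assumed L2-normalized (dot product ~ cosine similarity).
    """
    selected = []
    for _ in range(k):
        best_idx, best_var = None, np.inf
        for i in range(len(data_embs)):
            if i in selected:
                continue
            S = data_embs[selected + [i]]           # candidate selection
            K = S @ S.T + reg * np.eye(len(S))      # regularized Gram matrix
            k_star = S @ prompt_emb                 # similarities to prompt
            # Posterior variance of the prompt after observing S
            var = prompt_emb @ prompt_emb - k_star @ np.linalg.solve(K, k_star)
            if var < best_var:
                best_idx, best_var = i, var
        selected.append(best_idx)
    return selected

# Toy data: two duplicate examples plus one complementary example.
prompt = np.array([1.0, 0.0])
data = np.array([[0.8, 0.6],    # index 0
                 [0.8, 0.6],    # index 1: exact duplicate of index 0
                 [0.6, -0.8]])  # index 2: less similar, but complementary

# Nearest Neighbor ranks by similarity alone and picks both duplicates;
# the greedy uncertainty criterion picks index 0, then skips the
# duplicate in favor of index 2.
print(sift_select(prompt, data, 2))
```

Note the O(k · n · k³) cost of the naive loop above; a practical implementation would update the posterior incrementally rather than re-solving the linear system for every candidate.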

Code Repositories

jonhue/activeft (official, PyTorch)

Benchmarks

Benchmark: Language Modelling on The Pile (metric: bits per byte, lower is better)

Method                                              Bits per byte
Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B)    0.557
Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B)      0.595
Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B)    0.606
Gemma-2 27B                                         0.629
Llama-3.2 3B                                        0.640
Phi-3 14B                                           0.651
Gemma-2 9B                                          0.670
Phi-3 7B                                            0.678
Phi-3 3.8B                                          0.679
Llama-3.2 1B                                        0.697
Gemma-2 2B                                          0.721
Llama-3.2-Instruct 3B                               0.737
Test-Time Fine-Tuning with SIFT + GPT-2 (774M)      0.762
Llama-3.2-Instruct 1B                               0.807
Test-Time Fine-Tuning with SIFT + GPT-2 (124M)      0.862
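The bits-per-byte metric used above normalizes a model's cross-entropy by the byte length of the text, making scores comparable across tokenizers. The conversion below is the standard one (nats to bits via log 2, then divided by byte count); the numbers in the usage example are hypothetical, not taken from the benchmark.

```python
import math

def bits_per_byte(total_nll_nats, num_bytes):
    """Convert a summed negative log-likelihood (in nats) over a text
    into bits per byte of that text's UTF-8 encoding."""
    return total_nll_nats / (math.log(2) * num_bytes)

# Hypothetical example: a model assigns 800 * ln(2) nats of total NLL
# to a 1000-byte text, i.e. 800 bits over 1000 bytes.
print(bits_per_byte(math.log(2) * 800.0, 1000))  # 0.8
```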
