6 months ago

Neel Alex, Eli Lifland, Lewis Tunstall, Abhishek Thakur, Pegah Maham, C. Jess Riedel, Emmie Hine, Carolyn Ashurst, Paul Sedille, Alexis Carlier, and 2 more

Abstract

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org.

RAFT: A Real-World Few-Shot Text Classification Benchmark
