Model Card and Evaluations for Claude Models

Anthropic

Abstract

This report includes the model card [1] for Claude models, focusing on Claude 2, along with the results of a range of safety, alignment, and capabilities evaluations. We have been iterating on the training and evaluation of Claude-type models since our first work on Reinforcement Learning from Human Feedback (RLHF) [2]; the newest Claude 2 model represents a continuous evolution from those early and less capable ‘helpful and harmless’ language assistants.

This report is not intended to be a scientific paper, since most aspects of training and evaluating these models have been documented in our research papers. These include papers on preference modeling [3], reinforcement learning from human feedback for helpful and harmless models [2], red teaming language models [4], measuring representation of subjective global values in language models [5], honesty (i.e., exploring language models’ ability to recognize what they know) [6], evaluating language models with language-model-generated tests [7], moral self-correction [8], and Constitutional AI [9]. We also discussed Claude’s specific constitution in a recent blog post [10].

Our work using human evaluations to test model safety is most thoroughly documented in our paper “Red-Teaming Language Models to Reduce Harms” [4], while our recent work on automated safety evaluation is “Discovering Language Model Behaviors with Model-Written Evaluations” [7]. This report is also not comprehensive; we expect to release new findings as we continue our research and evaluations of frontier models. However, we hope it provides useful insight into Claude 2’s capabilities and limitations.
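The RLHF pipeline cited above builds on a learned preference model: a scalar reward head is trained so that human-preferred responses score higher than rejected ones. As a rough illustration only (this is not Anthropic's code, and the function and variable names here are hypothetical), the standard pairwise Bradley-Terry loss used in this family of methods can be sketched as:

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry) preference-modeling loss.

    For each human comparison, the model should assign a higher scalar
    reward to the preferred response; the per-pair loss is
    -log sigmoid(r_chosen - r_rejected).
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: reward scores for a batch of three comparisons.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.9, 1.0])
print(preference_loss(chosen, rejected))  # smaller when chosen > rejected
```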

Benchmarks

| Benchmark | Model (methodology) | Metric | Score |
|---|---|---|---|
| GSM8K (arithmetic reasoning) | Claude 2 (0-shot chain-of-thought) | Accuracy | 88 |
| GSM8K (arithmetic reasoning) | Claude 1.3 (0-shot chain-of-thought) | Accuracy | 85.2 |
| GSM8K (arithmetic reasoning) | Claude Instant 1.1 (0-shot chain-of-thought) | Accuracy | 80.9 |
| ARC-Challenge (common-sense reasoning) | Claude 2 (few-shot, k=5) | Accuracy | 91 |
| ARC-Challenge (common-sense reasoning) | Claude 1.3 (few-shot, k=5) | Accuracy | 90 |
| ARC-Challenge (common-sense reasoning) | Claude Instant 1.1 (few-shot, k=5) | Accuracy | 85.7 |
| MMLU (multi-task language understanding) | Claude Instant 1.1 (5-shot) | Average (%) | 73.4 |
| QuALITY (question answering) | Claude 1.3 (5-shot) | Accuracy | 84.1 |
| QuALITY (question answering) | Claude 2 (5-shot) | Accuracy | 83.2 |
| QuALITY (question answering) | Claude Instant 1.1 (5-shot) | Accuracy | 80.5 |
| TriviaQA (question answering) | Claude 2 (few-shot, k=5) | Exact match (EM) | 87.5 |
| TriviaQA (question answering) | Claude 1.3 (few-shot, k=5) | Exact match (EM) | 86.7 |
| TriviaQA (question answering) | Claude Instant 1.1 (few-shot, k=5) | Exact match (EM) | 78.9 |
