Common Sense Reasoning on WinoGrande

Evaluation Metrics

Accuracy
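
As a rough illustration of the metric, accuracy is the share of items for which the model's chosen option matches the gold answer, reported here as a percentage. The sketch below assumes each WinoGrande item is a two-option fill-in-the-blank problem whose answer is encoded as "1" or "2". The function name `accuracy` and the toy data are hypothetical and are not taken from any of the listed papers.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between the chosen option and the gold option.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data: each entry is the selected option, "1" or "2".
toy_predictions = ["1", "2", "2", "1"]
toy_labels = ["1", "2", "1", "1"]
toy_score = accuracy(toy_predictions, toy_labels)
print(f"{toy_score:.1f}")  # 3 of 4 correct -> 75.0
```

Scores obtained under different protocols (fine-tuned, 0-shot, 1-shot, 5-shot) use the same formula but are not directly comparable, since the model sees different amounts of task-specific data.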

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy	Paper Title	Repository
ST-MoE-32B 269B (fine-tuned)	96.1	ST-MoE: Designing Stable and Transferable Sparse Expert Models
Unicorn 11B (fine-tuned)	91.3	UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
CompassMTL 567M with Tailor	90.5	Task Compass: Scaling Multi-task Pre-training with Task Prefix
CompassMTL 567M	89.6	Task Compass: Scaling Multi-task Pre-training with Task Prefix
UnifiedQA 11B (fine-tuned)	89.4	UnifiedQA: Crossing Format Boundaries With a Single QA System
Claude 3 Opus (5-shot)	88.5	The Claude 3 Model Family: Opus, Sonnet, Haiku	-
GPT-4 (5-shot)	87.5	GPT-4 Technical Report
ExDeBERTa 567M	87.0	Task Compass: Scaling Multi-task Pre-training with Task Prefix
LLaMA-2 13B + MixLoRA	86.3	MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
LLaMA3 8B+MoSLoRA	85.8	Mixture-of-Subspaces in Low-Rank Adaptation
PaLM 2-L (1-shot)	83.0	PaLM 2 Technical Report
LLaMA-3 8B + MixLoRA	82.1	MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
ST-MoE-L 4.1B (fine-tuned)	81.7	ST-MoE: Designing Stable and Transferable Sparse Expert Models
GPT-3.5 (5-shot)	81.6	GPT-4 Technical Report
PaLM 540B (0-shot)	81.1	PaLM: Scaling Language Modeling with Pathways
Camelidae-8×34B	80.9	Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
PaLM 2-M (1-shot)	79.2	PaLM 2 Technical Report
RoBERTa-Winogrande 355M (fine-tuned)	79.1	WinoGrande: An Adversarial Winograd Schema Challenge at Scale
PaLM 2-S (1-shot)	77.9	PaLM 2 Technical Report
Mixtral 8x7B (0-shot)	77.2	Mixtral of Experts
