Question Answering on PIQA

Evaluation Metric

Accuracy
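
PIQA is a two-way multiple-choice task (each question pairs a goal with two candidate solutions), so accuracy is simply the fraction of questions for which the model selects the correct solution. A minimal sketch of the computation (the function name and example values below are illustrative, not part of any benchmark tooling):

```python
def accuracy(predictions, labels):
    """Fraction of examples where the predicted choice matches the gold label."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Each PIQA example has two candidate solutions, so predictions and labels are 0 or 1.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```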

Evaluation Results

Results of each model on this benchmark:

| Model Name | Accuracy | Paper Title |
| --- | --- | --- |
| Open-LLaMA-3B-v2 | 76.2 | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning |
| LLaMA 33B (0-shot) | 82.3 | LLaMA: Open and Efficient Foundation Language Models |
| DeBERTa-Large 304M (classification-based) | 85.9 | Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering |
| OPT 66B (1-shot) | 77.6 | BloombergGPT: A Large Language Model for Finance |
| LLaMA 2 13B (0-shot) | 80.5 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| LLaMA 2 34B (0-shot) | 81.9 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| UnifiedQA 3B | 85.3 | UnifiedQA: Crossing Format Boundaries With a Single QA System |
| ExDeBERTa 567M | 85.5 | Task Compass: Scaling Multi-task Pre-training with Task Prefix |
| GPT-2-XL 1.5B | 70.5 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions |
| PaLM 2-M (1-shot) | 83.2 | PaLM 2 Technical Report |
| Sheared-LLaMA-2.7B | 75.8 | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning |
| LLaMA-3 8B + MixLoRA | 87.6 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| GPT-2-small 124M (fine-tuned) | 69.2 | PIQA: Reasoning about Physical Commonsense in Natural Language |
| LLaMA-2 7B + MixLoRA | 83.2 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts |
| SparseGPT 175B (50% Sparsity) | 80.63 | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot |
| GPT-3 175B (0-shot) | 81.0 | Language Models are Few-Shot Learners |
| LLaMA3 8B + MoSLoRA | 89.7 | Mixture-of-Subspaces in Low-Rank Adaptation |
| Sheared-LLaMA-1.3B | 73.4 | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning |
| LLaMA 7B (0-shot) | 79.8 | LLaMA: Open and Efficient Foundation Language Models |
| LaMini-F-T5 783M | 70.6 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions |
(Showing 20 of 67 leaderboard entries.)
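
The zero-shot and few-shot entries above are typically produced by log-likelihood scoring: the model scores each of the two candidate solutions as a continuation of the goal, and the higher-scoring (often length-normalized) one is taken as the prediction. A hedged sketch of that protocol using Hugging Face `transformers` and `datasets`; the model checkpoint, the `ybisk/piqa` dataset id, the whitespace prompt format, and the per-token normalization below are assumptions for illustration, as each paper uses its own evaluation harness:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: "gpt2" is used only to keep the sketch cheap to run; the
# leaderboard models above each ship with their own evaluation setups.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def choice_score(goal: str, sol: str) -> float:
    """Mean per-token log-prob of the solution, conditioned on the goal."""
    prompt_ids = tok(goal, return_tensors="pt").input_ids
    sol_ids = tok(" " + sol, return_tensors="pt").input_ids
    full_ids = torch.cat([prompt_ids, sol_ids], dim=1)
    logits = model(full_ids).logits
    # logprobs[i] is the model's distribution over the token at position i + 1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    span = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    token_lp = [logprobs[i, full_ids[0, i + 1]].item() for i in span]
    return sum(token_lp) / len(token_lp)  # length normalization

# Assumption: the "ybisk/piqa" Hub dataset with goal/sol1/sol2/label fields.
data = load_dataset("ybisk/piqa", split="validation", trust_remote_code=True)
n, correct = 200, 0  # small subsample for a quick estimate
for ex in data.select(range(n)):
    better = choice_score(ex["goal"], ex["sol1"]) >= choice_score(ex["goal"], ex["sol2"])
    pred = 0 if better else 1
    correct += int(pred == ex["label"])
print(f"zero-shot accuracy on {n} examples: {correct / n:.3f}")
```

Note that harnesses differ in details such as per-token versus per-character normalization and prompt formatting, which can shift the resulting accuracy by a point or more, so numbers in the table are only comparable within a given paper's setup.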