
Question Answering on OpenBookQA

Evaluation Metric

Accuracy
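Accuracy here is simply the fraction of OpenBookQA questions (each a 4-way multiple-choice item) for which the model's predicted answer key matches the gold key. A minimal sketch, using hypothetical prediction and gold lists:

```python
def accuracy(predictions, gold):
    """Fraction of items where the predicted answer key equals the gold key.

    predictions, gold: equal-length lists of choice labels such as 'A'..'D'.
    """
    assert len(predictions) == len(gold), "lists must align item-by-item"
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical example: 3 of 4 predictions match the gold labels.
preds = ["A", "C", "B", "D"]
keys = ["A", "B", "B", "D"]
print(accuracy(preds, keys))  # → 0.75
```

The leaderboard values below report this quantity as a percentage over the OpenBookQA test set.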

Evaluation Results

Performance of the various models on this benchmark

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| GPT-4 + knowledge base | 95.9 | - | - |
| MVP-Tuning (ensemble) | 95.2 | - | - |
| PaLM 540B (Self Improvement, Self Consistency) | 94.4 | Large Language Models Can Self-Improve | - |
| GrapeQA: PEGA+CANP | 90 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | - |
| GenMC 11B | 89.8 | Clues Before Answers: Generation-Enhanced Multiple-Choice QA | |
| AristoRoBERTa + MVP-Tuning | 87.6 | - | - |
| UnifiedQA 11B | 87.2 | UnifiedQA: Crossing Format Boundaries With a Single QA System | |
| LLaMA-3 8B + MoSLoRA | 86.8 | Mixture-of-Subspaces in Low-Rank Adaptation | |
| PaLM 540B (CoT Prompting) | 86.4 | Large Language Models Can Self-Improve | - |
| LLaMA-3 8B + MixLoRA | 84.8 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| PaLM 540B (Standard-Prompting) | 84.4 | Large Language Models Can Self-Improve | - |
| GrapeQA: PEGA | 82 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | - |
| LLaMA-2 7B + MixLoRA | 81.6 | MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | |
| GrapeQA: CANP | 66.2 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | - |
| GPT-3 175B (few-shot, k=32) | 65.4 | Language Models are Few-Shot Learners | |
| BiLSTM max-out question-match (WordNet + science fact) | 56.3 | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
| BLOOM 176B (2-shot) | 47.2 | BloombergGPT: A Large Language Model for Finance | - |
| LaMini-T5 738M | 36 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | |
| LaMini-F-T5 783M | 34 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | |
| FLAN-T5-Large 783M | 31.2 | LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | |