PaLM 540B (Standard-Prompting) | 84.4 | Large Language Models Can Self-Improve | - |
PaLM 540B (CoT Prompting) | 86.4 | Large Language Models Can Self-Improve | - |
BiLSTM max-out question-match (WordNet + science fact) | 56.3 | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | - |
PaLM 540B (Self Improvement, Self Consistency) | 94.4 | Large Language Models Can Self-Improve | - |
GPT-3 175B (few-shot, k=32) | 65.4 | Language Models are Few-Shot Learners | - |
AristoRoBERTa + MVP-Tuning | 87.6 | - | - |