Science Question Answering on ScienceQA
Metrics
Avg. Accuracy
Grades 1-6
Grades 7-12
Image Context
Language Science
Natural Science
No Context
Social Science
Text Context
Results
Accuracy (%) of various models on the ScienceQA benchmark, reported overall and broken down by grade band (1-6, 7-12), context type (image, text, no context), and subject (natural, social, language science).
| Model Name | Avg. Accuracy | Grades 1-6 | Grades 7-12 | Image Context | Language Science | Natural Science | No Context | Social Science | Text Context | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3 (QCM→A, 2-shot) | 73.97 | 76.80 | 68.89 | 67.28 | 76.00 | 74.64 | 77.42 | 69.74 | 74.44 | - | - |
| GPT-3 - CoT (QCM→AE, 2-shot) | 74.61 | 78.49 | 67.63 | 66.09 | 77.55 | 76.60 | 79.58 | 65.92 | 75.51 | - | - |
| Chat-UniVi-13B | 90.99 | 91.19 | 90.64 | 88.05 | 88.91 | 90.41 | 90.94 | 95.05 | 89.64 | - | - |
| MC-CoT F-Large | 94.88 | 95.30 | 94.13 | 93.75 | 93.18 | 97.47 | 94.49 | 90.44 | 96.97 | - | - |
| Honeybee | 94.39 | 95.04 | 93.21 | 93.75 | 91.18 | 95.20 | 93.17 | 96.29 | 94.48 | - | - |
| Video-LaVIT | 70.00 | - | - | - | - | - | - | - | - | - | - |
| GPT-3 - CoT (QCM→ALE, 2-shot) | 75.17 | 78.23 | 69.68 | 67.43 | 78.09 | 75.44 | 79.93 | 70.87 | 74.68 | - | - |
| UnifiedQA-BASE - CoT (QCM→ALE) | 74.11 | 77.06 | 68.82 | 66.53 | 78.91 | 71.00 | 81.81 | 76.04 | 66.42 | - | - |
| LLaVA (+ GPT-4) | 92.53 | - | - | - | - | - | - | - | - | - | - |
| Multimodal CoT | 91.68 | 92.44 | 90.31 | 88.80 | 90.82 | 95.91 | 92.89 | 82.00 | 95.26 | - | - |
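For working with these numbers programmatically, the minimal sketch below (assuming Python with pandas installed; the variable and column names are illustrative, and the `rows` list simply copies the model names and Avg. Accuracy values from the table above) ranks the models by average accuracy. The per-split columns could be added to the DataFrame in the same way.

```python
# Minimal sketch: load the leaderboard's Avg. Accuracy column into a pandas
# DataFrame and rank the models. Values are copied from the table above;
# names such as `rows` and `avg_accuracy` are illustrative, not part of any
# official ScienceQA tooling.
import pandas as pd

rows = [
    ("MC-CoT F-Large", 94.88),
    ("Honeybee", 94.39),
    ("LLaVA (+ GPT-4)", 92.53),
    ("Multimodal CoT", 91.68),
    ("Chat-UniVi-13B", 90.99),
    ("GPT-3 - CoT (QCM→ALE, 2-shot)", 75.17),
    ("GPT-3 - CoT (QCM→AE, 2-shot)", 74.61),
    ("UnifiedQA-BASE - CoT (QCM→ALE)", 74.11),
    ("GPT-3 (QCM→A, 2-shot)", 73.97),
    ("Video-LaVIT", 70.00),
]

leaderboard = pd.DataFrame(rows, columns=["model", "avg_accuracy"])

# Sort descending by average accuracy and print a plain-text ranking.
ranking = leaderboard.sort_values("avg_accuracy", ascending=False)
print(ranking.to_string(index=False))
```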