Science Question Answering on ScienceQA
Metrics
Avg. Accuracy
Grades 1-6
Grades 7-12
Image Context
Language Science
Natural Science
No Context
Social Science
Text Context
Results
Accuracy (%) of various models on the ScienceQA benchmark, reported overall and broken down by grade band (1-6, 7-12), context type (image, text, no context), and subject (natural, social, language science).
| Model Name | Avg. Accuracy | Grades 1-6 | Grades 7-12 | Image Context | Language Science | Natural Science | No Context | Social Science | Text Context | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3 (QCM→A, 2-shot) | 73.97 | 76.80 | 68.89 | 67.28 | 76.00 | 74.64 | 77.42 | 69.74 | 74.44 | - | - |
| GPT-3 - CoT (QCM→AE, 2-shot) | 74.61 | 78.49 | 67.63 | 66.09 | 77.55 | 76.60 | 79.58 | 65.92 | 75.51 | - | - |
| Chat-UniVi-13B | 90.99 | 91.19 | 90.64 | 88.05 | 88.91 | 90.41 | 90.94 | 95.05 | 89.64 | - | - |
| MC-CoT F-Large | 94.88 | 95.30 | 94.13 | 93.75 | 93.18 | 97.47 | 94.49 | 90.44 | 96.97 | - | - |
| Honeybee | 94.39 | 95.04 | 93.21 | 93.75 | 91.18 | 95.20 | 93.17 | 96.29 | 94.48 | - | - |
| Video-LaVIT | 70.00 | - | - | - | - | - | - | - | - | - | - |
| GPT-3 - CoT (QCM→ALE, 2-shot) | 75.17 | 78.23 | 69.68 | 67.43 | 78.09 | 75.44 | 79.93 | 70.87 | 74.68 | - | - |
| UnifiedQA-BASE - CoT (QCM→ALE) | 74.11 | 77.06 | 68.82 | 66.53 | 78.91 | 71.00 | 81.81 | 76.04 | 66.42 | - | - |
| LLaVA (+ GPT-4) | 92.53 | - | - | - | - | - | - | - | - | - | - |
| Multimodal CoT | 91.68 | 92.44 | 90.31 | 88.80 | 90.82 | 95.91 | 92.89 | 82.00 | 95.26 | - | - |
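For working with these numbers programmatically, the minimal sketch below (assuming Python with pandas installed; the variable and column names are illustrative, and the `rows` list simply copies the model names and Avg. Accuracy values from the table above) ranks the models by average accuracy. The per-split columns could be added to the DataFrame in the same way.

```python
# Minimal sketch: load the leaderboard's Avg. Accuracy column into a pandas
# DataFrame and rank the models. Values are copied from the table above;
# names such as `rows` and `avg_accuracy` are illustrative, not part of any
# official ScienceQA tooling.
import pandas as pd

rows = [
    ("MC-CoT F-Large", 94.88),
    ("Honeybee", 94.39),
    ("LLaVA (+ GPT-4)", 92.53),
    ("Multimodal CoT", 91.68),
    ("Chat-UniVi-13B", 90.99),
    ("GPT-3 - CoT (QCM→ALE, 2-shot)", 75.17),
    ("GPT-3 - CoT (QCM→AE, 2-shot)", 74.61),
    ("UnifiedQA-BASE - CoT (QCM→ALE)", 74.11),
    ("GPT-3 (QCM→A, 2-shot)", 73.97),
    ("Video-LaVIT", 70.00),
]

leaderboard = pd.DataFrame(rows, columns=["model", "avg_accuracy"])

# Sort descending by average accuracy and print a plain-text ranking.
ranking = leaderboard.sort_values("avg_accuracy", ascending=False)
print(ranking.to_string(index=False))
```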