HyperAI

Science Question Answering on ScienceQA

Metrics

Avg. Accuracy
Grades 1-6
Grades 7-12
Image Context
Language Science
Natural Science
No Context
Social Science
Text Context

Results

Performance results of various models on this benchmark

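| Model Name | Avg. Accuracy | Grades 1-6 | Grades 7-12 | Image Context | Language Science | Natural Science | No Context | Social Science | Text Context | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3 (QCM→A, 2-shot) | 73.97 | 76.80 | 68.89 | 67.28 | 76.00 | 74.64 | 77.42 | 69.74 | 74.44 | - | - |
| GPT-3 - CoT (QCM→AE, 2-shot) | 74.61 | 78.49 | 67.63 | 66.09 | 77.55 | 76.60 | 79.58 | 65.92 | 75.51 | - | - |
| Chat-UniVi-13B | 90.99 | 91.19 | 90.64 | 88.05 | 88.91 | 90.41 | 90.94 | 95.05 | 89.64 | - | - |
| MC-CoT F-Large | 94.88 | 95.3 | 94.13 | 93.75 | 93.18 | 97.47 | 94.49 | 90.44 | 96.97 | - | - |
| Honeybee | 94.39 | 95.04 | 93.21 | 93.75 | 91.18 | 95.20 | 93.17 | 96.29 | 94.48 | - | - |
| Video-LaVIT | 70.0 | - | - | - | - | - | - | - | - | - | - |
| GPT-3 - CoT (QCM→ALE, 2-shot) | 75.17 | 78.23 | 69.68 | 67.43 | 78.09 | 75.44 | 79.93 | 70.87 | 74.68 | - | - |
| UnifiedQA-BASE - CoT (QCM→ALE) | 74.11 | 77.06 | 68.82 | 66.53 | 78.91 | 71.00 | 81.81 | 76.04 | 66.42 | - | - |
| LLaVA (+ GPT-4) | 92.53 | - | - | - | - | - | - | - | - | - | - |
| Multimodal CoT | 91.68 | 92.44 | 90.31 | 88.80 | 90.82 | 95.91 | 92.89 | 82.00 | 95.26 | - | - |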