HyperAI超神经

Chart Question Answering on ChartQA

Evaluation Metric

1:1 Accuracy
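The page names the metric but does not define it. The Python sketch below rests on one assumption: "1:1 Accuracy" means the percentage of questions whose predicted answer matches the gold answer. The helper names `is_correct` and `accuracy` are hypothetical, as are the case-insensitive string comparison and the stripping of a trailing `%`. This is not the leaderboard's scoring code. The optional `tolerance` parameter mimics the relaxed accuracy reported in the ChartQA paper, which accepts numeric answers within 5% relative error.

```python
def is_correct(prediction: str, target: str, tolerance: float = 0.0) -> bool:
    """Hypothetical per-question check (assumption, not official scoring).

    Numeric answers match if their relative error is within `tolerance`
    (0.0 = exact match; 0.05 mimics ChartQA-style relaxed accuracy).
    Non-numeric answers use case-insensitive exact string match.
    """
    pred, gold = prediction.strip().lower(), target.strip().lower()
    try:
        # Assumption: a trailing "%" is ignored when parsing numbers.
        p, g = float(pred.rstrip("%")), float(gold.rstrip("%"))
    except ValueError:
        return pred == gold
    if g == 0.0:
        return p == g  # relative error is undefined for a zero target
    return abs(p - g) / abs(g) <= tolerance


def accuracy(predictions: list[str], targets: list[str], tolerance: float = 0.0) -> float:
    """Percentage of questions answered correctly (one prediction per question)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need one prediction per target, and at least one target")
    correct = sum(is_correct(p, t, tolerance) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


if __name__ == "__main__":
    preds = ["23", "Brazil", "10.4"]
    gold = ["23", "brazil", "10"]
    print(round(accuracy(preds, gold), 2))           # 66.67: "10.4" vs "10" fails exact match
    print(accuracy(preds, gold, tolerance=0.05))     # 100.0: 4% relative error is tolerated
```

The example shows that the same predictions can score quite differently under exact and relaxed matching. Scores from different papers are therefore only comparable if they use the same matching rule.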

Evaluation Results

Performance of each model on this benchmark

| Model | 1:1 Accuracy | Paper Title | Repository |
|---|---|---|---|
| PaLI-3 (w/ OCR) | 69.5 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| DePlot+GPT3 (Self-Consistency) | 42.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| MatCha | 64.2 | MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | |
| UniChart | 66.24 | UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning | |
| PaLI-X (Single-task FT) | 70.9 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+GPT3 (CoT) | 36.9 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| VisionTapas-OCR | 45.5 | ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | |
| Pix2Struct-large | 58.6 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |
| Gemini Ultra | 80.8 | Gemini: A Family of Highly Capable Multimodal Models | |
| StructChart+GPT3.5 (STR) | 60.7 | StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | - |
| SMoLA-PaLI-X Generalist Model | 73.8 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| Qwen-VL | 65.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| SMoLA-PaLI-X Specialist Model | 74.6 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| ScreenAI 5B (4.62 B params, w/ OCR) | 76.7 | ScreenAI: A Vision-Language Model for UI and Infographics Understanding | |
| ChartPaLI-5B + PaLM 2-S | 81.3 | Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | - |
| DePlot+Codex (PoT Self-Consistency) | 76.7 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-X (Multi-task FT) | 70.6 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+FlanPaLM (CoT) | 67.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-3 | 70 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| Pix2Struct-base | 56.0 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |