Chart Question Answering On Chartqa
Evaluation metric
1:1 Accuracy
Results
Performance of each model on this benchmark
| Model | 1:1 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| PaLI-3 (w/ OCR) | 69.5 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| DePlot+GPT3 (Self-Consistency) | 42.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| MatCha | 64.2 | MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | |
| UniChart | 66.24 | UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning | |
| PaLI-X (Single-task FT) | 70.9 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+GPT3 (CoT) | 36.9 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| VisionTapas-OCR | 45.5 | ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | |
| Pix2Struct-large | 58.6 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |
| Gemini Ultra | 80.8 | Gemini: A Family of Highly Capable Multimodal Models | |
| StructChart+GPT3.5 (STR) | 60.7 | StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | - |
| SMoLA-PaLI-X Generalist Model | 73.8 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| Qwen-VL | 65.7 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| SMoLA-PaLI-X Specialist Model | 74.6 | Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | - |
| ScreenAI 5B (4.62 B params, w/ OCR) | 76.7 | ScreenAI: A Vision-Language Model for UI and Infographics Understanding | |
| ChartPaLI-5B + PaLM 2-S | 81.3 | Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | - |
| DePlot+Codex (PoT Self-Consistency) | 76.7 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-X (Multi-task FT) | 70.6 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| DePlot+FlanPaLM (CoT) | 67.3 | DePlot: One-shot visual language reasoning by plot-to-table translation | |
| PaLI-3 | 70 | PaLI-3 Vision Language Models: Smaller, Faster, Stronger | |
| Pix2Struct-base | 56.0 | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | |
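The results above are not listed in score order. As a minimal sketch, the Python snippet below ranks the entries by their reported 1:1 Accuracy. The scores are copied from the listing; the names `RESULTS`, `rank_by_accuracy`, and `ranked` are illustrative and not part of the site.

```python
# Model names and 1:1 Accuracy values copied from the listing above.
# RESULTS, rank_by_accuracy and ranked are illustrative names only.
RESULTS = [
    ("PaLI-3 (w/ OCR)", 69.5),
    ("DePlot+GPT3 (Self-Consistency)", 42.3),
    ("MatCha", 64.2),
    ("UniChart", 66.24),
    ("PaLI-X (Single-task FT)", 70.9),
    ("DePlot+GPT3 (CoT)", 36.9),
    ("VisionTapas-OCR", 45.5),
    ("Pix2Struct-large", 58.6),
    ("Gemini Ultra", 80.8),
    ("StructChart+GPT3.5 (STR)", 60.7),
    ("SMoLA-PaLI-X Generalist Model", 73.8),
    ("Qwen-VL", 65.7),
    ("SMoLA-PaLI-X Specialist Model", 74.6),
    ("ScreenAI 5B (4.62 B params, w/ OCR)", 76.7),
    ("ChartPaLI-5B + PaLM 2-S", 81.3),
    ("DePlot+Codex (PoT Self-Consistency)", 76.7),
    ("PaLI-X (Multi-task FT)", 70.6),
    ("DePlot+FlanPaLM (CoT)", 67.3),
    ("PaLI-3", 70.0),
    ("Pix2Struct-base", 56.0),
]


def rank_by_accuracy(rows):
    """Return rows sorted from highest to lowest accuracy.

    sorted() is stable, so tied entries keep their listing order.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_accuracy(RESULTS)

if __name__ == "__main__":
    for position, (model, accuracy) in enumerate(ranked, start=1):
        print(f"{position:2d}. {model}: {accuracy}")
```

With these values, ChartPaLI-5B + PaLM 2-S (81.3) ranks first and DePlot+GPT3 (CoT) (36.9) ranks last.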