UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty

Abstract
Charts are very popular for analyzing data, visualizing key insights and answering complex reasoning questions about data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently, such as chart question answering and chart summarization. However, most of the methods that solve these tasks use pretraining on language or vision-language tasks that do not attempt to explicitly model the structure of the charts (e.g., how data is visually encoded and how chart elements are related to each other). To address this, we first build a large corpus of charts covering a wide variety of topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. We find that pretraining the model on a large corpus with chart-specific low- and high-level tasks, followed by finetuning on three downstream tasks, results in state-of-the-art performance on those tasks.
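Although this page does not include usage details, the chart-grounded encoder-decoder described above can be exercised with a Donut-style interface in Hugging Face `transformers`. The sketch below is a minimal, hedged example of chart question answering; the checkpoint name (`ahmed-masry/unichart-chartqa-960`), the `<chartqa> ... <s_answer>` prompt format, and the local image path are assumptions for illustration, not details stated on this page.

```python
# Minimal sketch: chart question answering with a Donut-style
# vision encoder-decoder checkpoint (assumed to be UniChart's released weights).
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

model_name = "ahmed-masry/unichart-chartqa-960"   # assumed public checkpoint name
image_path = "chart.png"                          # hypothetical local chart image
prompt = "<chartqa> What was the highest value? <s_answer>"  # assumed task prompt

device = "cuda" if torch.cuda.is_available() else "cpu"
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
processor = DonutProcessor.from_pretrained(model_name)

# Encode the chart image and the task-prefixed question.
image = Image.open(image_path).convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids.to(device)

# Generate the answer with the chart-grounded text decoder.
outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=512,
    num_beams=4,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
)

# Decode, drop padding/EOS, and keep only the text after the answer token.
sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
answer = sequence.split("<s_answer>")[-1].strip()
print(answer)
```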
Benchmarks
| Benchmark | Model | Metric | Score |
|---|---|---|---|
| chart-question-answering-on-chartqa | UniChart | 1:1 Accuracy | 66.24 |