5 months ago

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Carbune Victor ; Mansoor Hassan ; Liu Fangyu ; Aralikatte Rahul ; Baechler Gilles ; Chen Jindong ; Sharma Abhanshu

Abstract

Vision-language models (VLMs) are achieving increasingly strong performance on multimodal tasks. However, reasoning capabilities remain limited particularly for smaller VLMs, while those of large-language models (LLMs) have seen numerous improvements. We propose a technique to transfer capabilities from LLMs to VLMs. On the recently introduced ChartQA, our method obtains state-of-the-art performance when applied on the PaLI3-5B VLM by \citet{chen2023pali3}, while also enabling much better performance on PlotQA and FigureQA.

We first improve the chart representation by continuing the pre-training stage using an improved version of the chart-to-table translation task by \citet{liu2023deplot}. We then propose constructing a 20x larger dataset than the original training set. To improve general reasoning capabilities and improve numerical operations, we synthesize reasoning traces using the table representation of charts. Lastly, our model is fine-tuned using the multitask loss introduced by \citet{hsieh2023distilling}.

Our variant ChartPaLI-5B outperforms even 10x larger models such as PaLIX-55B without using an upstream OCR system, while keeping inference time constant compared to the PaLI3-5B baseline. When rationales are further refined with a simple program-of-thought prompt \cite{chen2023program}, our model outperforms the recently introduced Gemini Ultra and GPT-4V.
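The abstract does not spell out the multitask loss, so the following is only a minimal sketch of the general idea behind training on both answers and rationales: a weighted sum of an answer-prediction loss and a rationale-generation loss. The function names, the `rationale_weight` parameter, and the toy probabilities are assumptions made for illustration and are not taken from the paper.

```python
import math


def token_nll(token_probs):
    """Mean negative log-likelihood of the reference tokens.

    token_probs: probabilities the model assigned to each reference token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def multitask_loss(answer_probs, rationale_probs, rationale_weight=1.0):
    """Answer loss plus a weighted rationale loss (hypothetical weighting)."""
    answer_loss = token_nll(answer_probs)
    rationale_loss = token_nll(rationale_probs)
    return answer_loss + rationale_weight * rationale_loss


# Toy example: probabilities for a 2-token answer and a 3-token rationale.
loss = multitask_loss([0.9, 0.8], [0.6, 0.7, 0.5], rationale_weight=0.5)
print(f"combined loss: {loss:.4f}")
```

Because the rationale is only a training target in this sketch, the model can still be asked for the answer alone at inference time, which is consistent with the abstract's claim that inference time stays constant relative to the baseline.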

Benchmarks

Benchmark | Methodology | Metrics
chart-question-answering-on-chartqa | ChartPaLI-5B + PaLM 2-S | 1:1 Accuracy: 81.3
chart-question-answering-on-chartqa | ChartPaLI-5B | 1:1 Accuracy: 77.3
