MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Abstract
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures, and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
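The chart-derendering pretraining task maps a rendered plot back to its underlying data. As a rough illustration of what such a training target could look like, the sketch below linearizes a small data table into a single string that an image-to-text model like Pix2Struct could be trained to emit; the serialization format (separators, `title:` prefix) is an assumption for illustration, not the paper's exact scheme.

```python
# Sketch of a chart-derendering training target: the underlying table of a
# plot, flattened into one string. The format here is illustrative only.

def linearize_table(title, columns, rows):
    """Flatten a small data table into a single target string."""
    header = " | ".join(columns)
    body = " ; ".join(" | ".join(str(v) for v in row) for row in rows)
    return f"title: {title} <0x0A> {header} <0x0A> {body}"

# A (chart image, target string) pair would use this string as the label.
target = linearize_table(
    "Monthly sales",
    ["month", "units"],
    [["Jan", 120], ["Feb", 95]],
)
print(target)
```

During pretraining, the model sees the rendered chart as input and is supervised to generate this linearized table, forcing it to recover axes, legends, and values from pixels alone.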
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| chart-question-answering-on-chartqa | MatCha | 1:1 Accuracy: 64.2 |
| chart-question-answering-on-plotqa | MatCha | 1:1 Accuracy: 91.5 |
| chart-question-answering-on-realcqa | MatCha (ChartQA) | 1:1 Accuracy: 0.2597 |
| visual-question-answering-on-docvqa-test | MatCha | ANLS: 0.742 |
| visual-question-answering-on-plotqa-d1-1 | MatCha | 1:1 Accuracy: 92.3 |
| visual-question-answering-on-plotqa-d2-1 | MatCha | 1:1 Accuracy: 90.7 |
| visual-question-answering-vqa-on | MatCha | ANLS: 37.2 |
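Chart-QA benchmarks such as ChartQA commonly score answers with "relaxed accuracy": string answers must match exactly, while numeric answers may deviate within a small relative tolerance (typically 5%). The sketch below shows that metric under those standard assumptions; the exact evaluation scripts used for the table above may differ in details.

```python
# Sketch of ChartQA-style "relaxed accuracy" matching, assuming the
# commonly used 5% relative tolerance for numeric answers.

def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Numeric answers match within a relative tolerance; other answers
    require a case-insensitive exact string match."""
    try:
        p, t = float(prediction), float(target)
    except ValueError:
        return prediction.strip().lower() == target.strip().lower()
    if t == 0:
        return p == 0
    return abs(p - t) / abs(t) <= tolerance

print(relaxed_match("104", "100"))   # within 5% of the gold value
print(relaxed_match("110", "100"))   # 10% off, counted as wrong
```

Benchmark accuracy is then the fraction of question-answer pairs for which `relaxed_match` returns `True`.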