TAPEX: Table Pre-training via Learning a Neural SQL Executor

Qian Liu; Bei Chen; Jiaqi Guo; Morteza Ziyadi; Zeqi Lin; Weizhu Chen; Jian-Guang Lou

Abstract

Recent progress in language model pre-training has achieved great success by leveraging large-scale unstructured textual data. However, applying pre-training to structured tabular data remains a challenge because large-scale, high-quality tabular data is scarce. In this paper, we propose TAPEX to show that table pre-training can be achieved by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries and their execution outputs. TAPEX addresses the data scarcity challenge by guiding the language model to mimic a SQL executor on this diverse, large-scale and high-quality synthetic corpus. We evaluate TAPEX on four benchmark datasets. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin and achieves new state-of-the-art results on all of them. These include improving the weakly-supervised WikiSQL denotation accuracy to 89.5% (+2.3%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.2% (+3.2%). To our knowledge, this is the first work to exploit table pre-training via synthetic executable programs and to achieve new state-of-the-art results on various downstream tasks. Our code can be found at https://github.com/microsoft/Table-Pretraining.
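
The synthetic corpus described in the abstract pairs an executable SQL query and a table with the query's execution output. The following is a minimal sketch of that idea, not the authors' pipeline: it uses SQLite as the executor, a single hypothetical query template (`SELECT ... WHERE ... = ...`), and an assumed table linearization (`col : ... row 1 : ...`). The function names `flatten_table` and `synthesize_examples` are illustrative only.

```python
import random
import sqlite3


def flatten_table(header, rows):
    """Linearize a table into one string (this layout is an assumption)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


def sql_literal(value):
    """Render a Python value as a SQL literal, escaping single quotes."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def synthesize_examples(header, rows, n, seed=0):
    """Build n (SQL + table -> execution result) pairs from one table."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f"CREATE TABLE t ({cols})")
    marks = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO t VALUES ({marks})", rows)

    table_text = flatten_table(header, rows)
    examples = []
    for _ in range(n):
        # Hypothetical template: pick a column to return and a filter
        # whose value comes from an existing row, so the result is non-empty.
        select_col = rng.choice(header)
        where_col = rng.choice(header)
        value = rng.choice(rows)[header.index(where_col)]
        sql = (f'SELECT "{select_col}" FROM t '
               f'WHERE "{where_col}" = {sql_literal(value)}')
        answer = [str(r[0]) for r in conn.execute(sql)]
        examples.append({
            "input": f"{sql} {table_text}",   # model input: query + table
            "output": ", ".join(answer),      # target: execution result
        })
    conn.close()
    return examples


if __name__ == "__main__":
    demo = synthesize_examples(
        ["city", "year"], [("Athens", 2004), ("Beijing", 2008)], n=2
    )
    for ex in demo:
        print(ex["input"], "->", ex["output"])
```

A sequence-to-sequence model trained on such pairs would be asked to produce `output` from `input`, which is what "mimicking a SQL executor" amounts to in this sketch. The actual corpus uses a much richer set of query templates.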

Code Repositories

Repository	Framework	Notes
pwc-1/Paper-9/tree/main/1/tapex	mindspore	-
sohanpatnaik106/cabinet_qa	pytorch	Mentioned in GitHub
MindCode-4/code-5/tree/main/tapex	mindspore	-
microsoft/Table-Pretraining	pytorch	Official; Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics (%)
semantic-parsing-on-sqa	TAPEX-Large	Denotation accuracy: 74.5
semantic-parsing-on-wikisql-1	TAPEX-Large (weak supervision)	Denotation accuracy (test): 89.5
semantic-parsing-on-wikitablequestions	TAPEX-Large	Accuracy (dev): 57.0; Accuracy (test): 57.5
table-based-fact-verification-on-tabfact	TAPEX-Large	Accuracy (test): 84.2; Accuracy (val): 84.6
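
Denotation accuracy, the metric reported for the question-answering benchmarks above, scores a prediction as correct when its answer set matches the gold answer set. The sketch below shows the basic computation under simplifying assumptions: answers are compared as order-insensitive lists after lowercasing and trimming whitespace. Official evaluators for these benchmarks typically apply additional normalization (for example of numbers and dates), so this hypothetical `denotation_accuracy` helper will not reproduce the reported figures exactly.

```python
def normalize(values):
    """Lowercase, trim and sort answers so order and case are ignored."""
    return sorted(str(v).strip().lower() for v in values)


def denotation_accuracy(predictions, references):
    """Fraction of examples whose predicted answers match the gold answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(gold)
        for pred, gold in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    preds = [["2004"], ["Beijing", "Athens"], ["x"]]
    golds = [["2004"], ["athens", "beijing"], ["y"]]
    print(denotation_accuracy(preds, golds))
```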
