5 months ago

Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection

Mai Chuhong ; Tal Ro-ee ; Mohamed Thahir

Abstract

In-context learning (ICL) is a powerful paradigm where large language models (LLMs) benefit from task demonstrations added to the prompt. Yet, selecting optimal demonstrations is not trivial, especially for complex or multi-modal tasks where input and output distributions differ. We hypothesize that forming task-specific representations of the input is key. In this paper, we propose a method to align representations of natural language questions and those of SQL queries in a shared embedding space. Our technique, dubbed MARLO (Metadata-Agnostic Representation Learning for Text-tO-SQL), uses query structure to model querying intent without over-indexing on underlying database metadata (i.e., tables, columns, or domain-specific entities of a database referenced in the question or query). This allows MARLO to select examples that are structurally and semantically relevant for the task rather than examples that are spuriously related to a certain domain or question phrasing. When used to retrieve examples based on question similarity, MARLO shows superior performance compared to generic embedding models (on average +2.9 percentage points in execution accuracy) on the Spider benchmark. It also outperforms the next best method that masks metadata information by +0.8 percentage points in execution accuracy on average, while imposing a significantly lower inference latency.
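
The retrieval step described in the abstract, picking demonstrations by question similarity in an embedding space, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the vectors are hand-written toy values standing in for the output of an embedding model, and the names `cosine`, `select_examples`, and `build_prompt` are hypothetical helpers introduced only for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_examples(query_vec, pool, k=2):
    """Return the k pool entries whose vectors are closest to query_vec.

    Each pool entry is (question, sql, vector). In a real system the
    vectors would come from an embedding model; here they are toy values.
    """
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex[2]), reverse=True)
    return ranked[:k]

def build_prompt(question, examples):
    """Assemble a few-shot prompt from the selected demonstrations."""
    shots = "\n\n".join(f"Question: {q}\nSQL: {s}" for q, s, _ in examples)
    return f"{shots}\n\nQuestion: {question}\nSQL:"

# Toy demonstration pool (assumed data, for illustration only).
pool = [
    ("How many singers are there?", "SELECT count(*) FROM singer", [0.9, 0.1, 0.1]),
    ("List all stadium names.", "SELECT name FROM stadium", [0.0, 1.0, 0.0]),
    ("How many cars cost over 20000?", "SELECT count(*) FROM cars WHERE price > 20000", [0.8, 0.0, 0.5]),
]

# Assumed embedding of a new counting question about a different domain.
query_vec = [1.0, 0.0, 0.2]
chosen = select_examples(query_vec, pool, k=2)
prompt = build_prompt("How many students are enrolled?", chosen)
```

In this toy setup the two counting queries are retrieved even though they mention unrelated tables, which is the kind of structure-over-domain behavior the abstract attributes to metadata-agnostic representations.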

Benchmarks

Benchmark: text-to-sql-on-spider
Methodology: MARLO + Claude 2.1
Metrics:
Execution Accuracy (Dev): 83.6
Execution Accuracy (Test): 84.0
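
For readers unfamiliar with the metric, execution accuracy is the share of predicted queries whose execution result matches that of the reference query. The sketch below is a simplified illustration using Python's `sqlite3` module and an in-memory toy table; it is not the official Spider evaluation script, which handles further details. The table, the query pairs, and the helpers `run` and `execution_accuracy` are assumptions made for this example.

```python
import sqlite3
from collections import Counter

def run(conn, sql):
    """Execute sql and return its rows as a multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result sets match.

    Rows are compared as multisets, so row order is ignored
    (a simplification chosen for this sketch).
    """
    hits = 0
    for predicted, gold in pairs:
        predicted_rows = run(conn, predicted)
        if predicted_rows is not None and predicted_rows == run(conn, gold):
            hits += 1
    return hits / len(pairs)

# Toy database (assumed data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 25), ("Bob", 30), ("Cy", 41)],
)

pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT count(*) FROM singer", "SELECT COUNT(name) FROM singer"),
    # Off-by-one filter returns a different set of rows: incorrect.
    ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 30"),
    # Invalid SQL fails to execute: incorrect.
    ("SELEC name FROM singer", "SELECT name FROM singer"),
]

acc = execution_accuracy(conn, pairs)
```

Because the comparison is on results and not on query text, a prediction that is written differently from the reference still counts as correct when it returns the same rows.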
