Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types
Guo Ziming ; Ma Chao ; Sun Yinggang ; Zhao Tiancheng ; Wang Guangyao ; Huang Hai

Abstract
Recent advancements in large language models (LLMs) have significantly advanced text-to-SQL systems. However, most LLM-based methods often narrowly focus on SQL generation, neglecting the complexities of real-world conversational queries. This oversight can lead to unreliable responses, particularly for ambiguous questions that cannot be directly addressed with SQL. To bridge this gap, we propose MMSQL, a comprehensive test suite designed to evaluate the question classification and SQL generation capabilities of LLMs by simulating real-world scenarios with diverse question types and multi-turn Q&A interactions. Using MMSQL, we assessed the performance of popular LLMs, including both open-source and closed-source models, and identified key factors impacting their performance in such scenarios. Moreover, we introduce an LLM-based multi-agent framework that employs specialized agents to identify question types and determine appropriate answering strategies. Our experiments demonstrate that this approach significantly enhances the model's ability to navigate the complexities of conversational dynamics, effectively handling the diverse and complex nature of user queries. Our dataset and code are publicly available at https://mcxiaoxiao.github.io/MMSQL.
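The classify-then-route idea in the abstract can be illustrated with a small sketch. This is not the authors' framework: the `QuestionType` labels (other than "ambiguous", which the abstract names), the `keyword_classifier` heuristic, and the strategy functions are all hypothetical stand-ins for what would be LLM-backed agents in practice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class QuestionType(Enum):
    # Hypothetical labels; the abstract only names "ambiguous" explicitly.
    ANSWERABLE = "answerable"
    AMBIGUOUS = "ambiguous"
    OTHER = "other"


@dataclass
class Reply:
    qtype: QuestionType
    text: str


def keyword_classifier(question: str, history: List[str]) -> QuestionType:
    """Toy stand-in for an LLM classification agent (assumption)."""
    q = question.lower()
    # Vague qualifiers are treated as ambiguous before anything else.
    if any(word in q for word in ("best", "good", "recent")):
        return QuestionType.AMBIGUOUS
    if any(phrase in q for phrase in ("how many", "list", "show")):
        return QuestionType.ANSWERABLE
    return QuestionType.OTHER


def answer_with_sql(question: str) -> str:
    # A real system would call a SQL-generation agent here.
    return f"[SQL agent would generate a query for: {question}]"


def ask_clarification(question: str) -> str:
    return f"Could you clarify what you mean in: {question!r}?"


def reply_in_text(question: str) -> str:
    return "This question is not something a SQL query can answer."


# One answering strategy per question type.
STRATEGIES: Dict[QuestionType, Callable[[str], str]] = {
    QuestionType.ANSWERABLE: answer_with_sql,
    QuestionType.AMBIGUOUS: ask_clarification,
    QuestionType.OTHER: reply_in_text,
}


def route(
    question: str,
    history: List[str],
    classify: Callable[[str, List[str]], QuestionType] = keyword_classifier,
) -> Reply:
    """Classify the question, pick a strategy, and record the turn."""
    qtype = classify(question, history)
    text = STRATEGIES[qtype](question)
    history.append(question)  # keep multi-turn context for later turns
    return Reply(qtype, text)


if __name__ == "__main__":
    turns: List[str] = []
    for q in ("How many singers are there?", "Show the best singers"):
        reply = route(q, turns)
        print(reply.qtype.value, "->", reply.text)
```

Keeping classification separate from the answering strategies means the classifier can be swapped (for example, for an LLM call) without touching the routing logic.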
Benchmarks
| Benchmark | Model | TDEX |
|---|---|---|
| mmsql-performance-on-mmsql | SQLCoder-8B | 30.7 |
| mmsql-performance-on-mmsql | Gemini-1.5 Flash | 65.8 |
| mmsql-performance-on-mmsql | Llama3-8B | 64.0 |
| mmsql-performance-on-mmsql | GPT-4 Turbo | 67.0 |
| mmsql-performance-on-mmsql | Llama3-70B | 62.8 |
| mmsql-performance-on-mmsql | GPT-3.5 Turbo | 64.1 |