Speed Up Text-to-SQL Inference with NVIDIA NIM for More Efficient Analytics Queries
Slow, inefficient query generation from natural language inputs can hinder decision-making, forcing analysts and business users to depend heavily on data teams. This dependency delays insights and limits agility. Text-to-SQL technology is changing the way we interact with data by letting users query structured databases in natural language, which is particularly valuable in specialized industries that benefit from domain-specific models. However, scaling these models for analytics workloads often runs into performance and latency issues.

In this tutorial, we show how to optimize Vanna's text-to-SQL solution using NVIDIA NIM, which provides accelerated inference microservices for generative AI models. Vanna's open-source text-to-SQL solution has gained popularity among organizations for its flexibility, security, and adaptability.

## What You'll Learn

- How to download and preprocess the Steam games dataset from Kaggle.
- How to initialize Vanna with NVIDIA NIM and NeMo Retriever.
- How to create and populate an SQLite database.
- How to train Vanna with contextual data.
- How to generate SQL queries using the optimized Vanna instance.

## Prerequisites

- Python installed on your system.
- Access to the NVIDIA NIM documentation.
- Basic knowledge of SQL and Python programming.
- A Kaggle account and API credentials to download the dataset.

## Step-by-Step Procedure

### 1. Download and Preprocess the Steam Games Dataset

To follow along, clone the repository located at /NVIDIA/GenerativeAIExamples and open the Vanna_with_NVIDIA notebook in the community section. The preprocessing steps are adapted from an excellent Kaggle notebook and can be found in the 'Data Preparation' section of the Vanna with NVIDIA notebook. The key preprocessing steps include:

- Cleaning and formatting the dataset.
- Splitting the data into separate tables for games, categories, and tags.
- Handling missing or inconsistent data.

At the end of preprocessing, you should have three CSV files:

- `games.csv`: game metadata such as app ID, name, release date, price, and descriptions.
- `categories.csv`: the categories associated with each game.
- `tags.csv`: tags and their frequencies for each game.

### 2. Initialize Vanna with NVIDIA NIM and NeMo Retriever

A Vanna instance requires connections to three components: a vector database, an embedder, and an LLM endpoint. For this tutorial, we use the Milvus vector database, the NVIDIA retriever embedding model (llama-3.2-nv-embedqa-1b-v2), and the Llama 3.1 70B NIM microservice.

NVIDIA NIM microservices run on NVIDIA's accelerated infrastructure, offering faster response times and cost efficiency for production deployments. NIM is included with the NVIDIA AI Enterprise software license, allowing you to maintain full control over your customizations, intellectual property (IP), and AI applications. For detailed setup instructions for NIM endpoints, consult the NVIDIA NIM documentation.

Figure 1. NVIDIA Accelerated Vanna Pipeline

Next, we initialize a Vanna instance with these NVIDIA-accelerated components.
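As a rough sketch of what this wiring involves, the connection details for the three components reduce to a small configuration: the chat model and its OpenAI-compatible NIM endpoint, the embedding model, and the Milvus location. The dictionary below is illustrative only; the keys and the exact integration classes that consume them are assumptions, so check the Vanna_with_NVIDIA notebook for the real parameters:

```python
# Hypothetical configuration for a NIM-backed Vanna instance.
# Key names are illustrative assumptions; model identifiers follow the
# naming used in the NVIDIA API catalog.
nim_config = {
    # Llama 3.1 70B NIM microservice, served over an OpenAI-compatible API
    "model": "meta/llama-3.1-70b-instruct",
    "base_url": "https://integrate.api.nvidia.com/v1",
    # NVIDIA retriever embedding model used with NeMo Retriever
    "embedding_model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    # Local Milvus vector store file
    "milvus_uri": "milvus_vanna.db",
}
```

An API key for the NIM endpoint (for example, from an `NVIDIA_API_KEY` environment variable) would also be passed to the chat client at initialization time.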
### 3. Create and Populate the SQLite Database

First, import the necessary libraries:

```python
import sqlite3
import pandas as pd
```

Specify the path to the SQLite database and connect to it:

```python
sqlite_path = 'steam_data.db'
sql_connect = sqlite3.connect(sqlite_path)
c = sql_connect.cursor()
```

Create the necessary tables in the database:

```python
init_sqls = """
CREATE TABLE IF NOT EXISTS games (
    app_id INTEGER PRIMARY KEY,
    name TEXT,
    release_date TEXT,
    price REAL,
    short_description TEXT,
    positive INTEGER,
    negative INTEGER,
    min_owners INTEGER,
    max_owners INTEGER,
    hltb_single REAL
);

CREATE TABLE IF NOT EXISTS categories (
    app_id INTEGER,
    categories TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);

CREATE TABLE IF NOT EXISTS tags (
    app_id INTEGER,
    tags TEXT,
    tag_frequencies TEXT,
    FOREIGN KEY (app_id) REFERENCES games(app_id)
);
"""

# Execute each statement, skipping the empty string left after the final ";"
for sql in init_sqls.split(";"):
    if sql.strip():
        c.execute(sql)
```

Read the processed CSV files and insert the data into the respective tables:

```python
games_df = pd.read_csv('processed_dataset/games.csv')
categories_df = pd.read_csv('processed_dataset/categories.csv')
tags_df = pd.read_csv('processed_dataset/tags.csv')

games_df.to_sql('games', sql_connect, if_exists='append', index=False)
categories_df.to_sql('categories', sql_connect, if_exists='append', index=False)
tags_df.to_sql('tags', sql_connect, if_exists='append', index=False)

sql_connect.commit()
```

### 4. Train with Data

Vanna performs best when given context about the queried data and the specific terminology used in your business. To train Vanna, you can use the preprocessed data along with any additional business-specific information, such as table DDL, documentation, and example question-SQL pairs.

### 5. Generate SQL Queries

Once everything is set up, you can start generating SQL queries using the NIM-powered Vanna instance. This significantly speeds up converting natural language inputs into structured queries, making analytics more responsive and efficient.
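Step 5 ends with Vanna returning a SQL string for each natural-language question. One way to sanity-check generated queries is to execute them against the SQLite database built in Step 3. The self-contained sketch below uses an in-memory copy of the `games` schema with a few toy rows, and a hand-written query standing in for Vanna's output (the actual generated SQL depends on your trained instance):

```python
import sqlite3

# In-memory database mirroring part of the Step 3 schema, with toy rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE games (
    app_id INTEGER PRIMARY KEY,
    name TEXT, price REAL, positive INTEGER, negative INTEGER
)""")
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?)", [
    (1, "Portal 2", 9.99, 95000, 1200),
    (2, "Dota 2", 0.0, 800000, 90000),
    (3, "Factorio", 30.0, 70000, 900),
])

# The kind of SQL a trained Vanna instance might generate for
# "What are the best-reviewed paid games?" (illustrative, not real output).
generated_sql = """
SELECT name, positive * 1.0 / (positive + negative) AS positive_ratio
FROM games
WHERE price > 0
ORDER BY positive_ratio DESC
LIMIT 5;
"""
rows = conn.execute(generated_sql).fetchall()
for name, ratio in rows:
    print(f"{name}: {ratio:.3f}")
```

Running generated SQL through a read-only connection, or inspecting it before execution, is a sensible guard when queries come from a model rather than a person.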
## Conclusion

In this tutorial, you learned how to accelerate text-to-SQL inference on Vanna using NVIDIA NIM. With a clean database schema, a real-world dataset, and an optimized inference endpoint, the system is well positioned to deliver faster and more accurate analytics responses to user-generated queries.

## Next Steps

- Experiment with different training data to improve SQL query generation.
- Explore other datasets and use cases to further enhance Vanna's capabilities.
- Refer to the NVIDIA NIM documentation for advanced configurations and optimizations.