Natural Language to SQL: Build a Text-to-SQL Pipeline with LangChain
Your product manager asks "What were our top 5 customers by revenue last quarter?" and a developer spends 10 minutes writing the SQL. An LLM can generate that query in under a second — but only if you build the pipeline right. Without schema awareness, safety checks, and proper prompting, the LLM will hallucinate table names, produce queries that delete data, or return results nobody can read. This tutorial builds a complete text-to-SQL pipeline, piece by piece, so you understand every moving part.
What Is Text-to-SQL and How Does It Work?
Text-to-SQL is a pipeline that converts a plain-English question into a SQL query, runs that query against a database, and returns the results in a human-readable format. The LLM does the translation step — it reads the user's question and the database schema, then generates a syntactically correct SQL statement.
The pipeline has five distinct stages, and I want you to see the full picture before we build each one:

1. Schema extraction — read the tables and columns from the database
2. Prompt construction — combine the schema with the user's question
3. SQL generation — the LLM translates the question into a query
4. Validation — safety and syntax checks on the generated SQL
5. Execution and formatting — run the query and present the results
Stages 1, 2, 4, and 5 are pure Python — no LLM needed. Stage 3 is the only part that requires an API call. This matters because it means you can build, test, and debug 80% of your pipeline without spending a single API token. That's exactly what we'll do: build the scaffolding first, then plug in the LLM.
Building the Sample Database
Before we can convert questions to SQL, we need a database to query. SQLite ships with Python, so every code block in this section runs right in your browser. I'm building a small e-commerce database with three related tables — the kind of schema you'd see in any real SaaS product.
Three tables, five to seven columns each, with foreign key relationships. That's enough complexity to test joins, aggregations, and filtering — the queries users actually ask for.
Five customers, five products, ten orders. Small enough to trace manually, large enough to produce meaningful aggregations. Every order links back to a real customer and a real product, so joins will work correctly.
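Here's one way to build that database. The exact column names and sample values are my assumptions — adapt them to your own schema — but the shape matches what the article describes: three related tables with foreign keys, small enough to trace by hand.

```python
import sqlite3

def build_sample_db() -> sqlite3.Connection:
    """Create the in-memory e-commerce database: 5 customers, 5 products, 10 orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT,
            city TEXT,
            signup_date TEXT          -- stored as 'YYYY-MM-DD' strings
        );
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT,
            price REAL
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            product_id INTEGER REFERENCES products(id),
            quantity INTEGER,
            order_date TEXT
        );
    """)
    customers = [
        (1, "Priya", "priya@example.com", "Mumbai", "2024-01-15"),
        (2, "James", "james@example.com", "San Francisco", "2024-02-03"),
        (3, "Aisha", "aisha@example.com", "Lagos", "2024-02-20"),
        (4, "Chen", "chen@example.com", "Singapore", "2024-03-01"),
        (5, "Maria", "maria@example.com", "Madrid", "2024-03-18"),
    ]
    products = [
        (1, "Laptop", "electronics", 1200.0),
        (2, "Headphones", "electronics", 150.0),
        (3, "Desk", "furniture", 300.0),
        (4, "Chair", "furniture", 180.0),
        (5, "Monitor", "electronics", 400.0),
    ]
    orders = [
        (1, 1, 1, 1, "2024-03-02"), (2, 2, 3, 2, "2024-03-05"),
        (3, 1, 2, 1, "2024-03-09"), (4, 3, 5, 2, "2024-03-11"),
        (5, 4, 4, 4, "2024-03-15"), (6, 5, 1, 1, "2024-03-20"),
        (7, 2, 2, 3, "2024-03-22"), (8, 3, 3, 1, "2024-03-25"),
        (9, 4, 5, 1, "2024-03-28"), (10, 5, 4, 2, "2024-03-30"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?,?,?,?,?)", customers)
    conn.executemany("INSERT INTO products VALUES (?,?,?,?)", products)
    conn.executemany("INSERT INTO orders VALUES (?,?,?,?,?)", orders)
    conn.commit()
    return conn

conn = build_sample_db()
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 10
```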
Schema Extraction — Teaching the LLM Your Database
An LLM can't generate correct SQL if it doesn't know what tables and columns exist. This is the most common failure point I see in text-to-SQL projects: developers pass the user's question to the LLM without any schema context, and the model invents table names that sound plausible but don't exist.
The fix is straightforward: extract the schema from the database itself and include it in the prompt. SQLite stores schema information in sqlite_master, which we can query directly.
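A minimal sketch of that extraction — the function name get_schema is my choice, but the query against sqlite_master is exactly what SQLite provides:

```python
import sqlite3

def get_schema(conn: sqlite3.Connection) -> str:
    """Return the original CREATE TABLE statements for every user table."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)

# Quick demo against a throwaway table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
print(get_schema(conn))
```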
That function pulls the exact CREATE TABLE statements from sqlite_master. This is better than manually listing column names because it includes constraints, types, and foreign key relationships — all context that helps the LLM generate correct joins.
Here's a more production-ready version that includes sample data alongside the schema:
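One way to implement it — get_schema_with_samples is my naming, and the comment-style formatting of the sample rows is one reasonable convention among several:

```python
import sqlite3

def get_schema_with_samples(conn: sqlite3.Connection, n_rows: int = 3) -> str:
    """Schema plus a few sample rows per table, for few-shot context."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, create_sql in tables:
        cursor = conn.execute(f"SELECT * FROM {name} LIMIT {n_rows}")
        cols = [d[0] for d in cursor.description]
        sample = "\n".join(str(tuple(row)) for row in cursor.fetchall())
        parts.append(
            f"{create_sql}\n"
            f"-- Columns: {', '.join(cols)}\n"
            f"-- Sample rows:\n{sample}"
        )
    return "\n\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya', '2024-01-15')")
print(get_schema_with_samples(conn))
```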
The sample rows serve as few-shot examples for the LLM. When it sees that signup_date contains strings like "2024-01-15", it knows to use string comparison for date filtering rather than a DATE() function that might not exist in SQLite.
Prompt Construction — The SQL Generation Template
The prompt is where everything comes together: the schema, the user's question, and the instructions that tell the LLM exactly how to respond. A weak prompt produces SQL that's syntactically valid but semantically wrong — it joins the wrong tables, misinterprets column names, or includes unnecessary complexity.
I've iterated on text-to-SQL prompts across dozens of projects, and the template below covers the patterns that actually matter. The key insight: you need to constrain the LLM's output format strictly. If you let it return SQL embedded in an explanation paragraph, your parser breaks.
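Here is one workable version of that template as a plain function — the exact rule wording is mine, but each rule targets a failure mode discussed in this article (hallucinated tables, markdown-wrapped output, data modification, string-typed dates):

```python
def build_sql_prompt(schema: str, question: str, dialect: str = "SQLite") -> str:
    """Assemble the SQL-generation prompt from schema + question."""
    return f"""You are an expert {dialect} analyst.

Database schema:
{schema}

Rules:
- Return ONLY the SQL query, with no explanation and no markdown fences.
- Use only tables and columns that appear in the schema above.
- Generate a single SELECT statement; never modify data.
- Dates are stored as TEXT in 'YYYY-MM-DD' format; compare them as strings.

Question: {question}
SQL:"""

print(build_sql_prompt(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "How many customers do we have?",
))
```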
That prompt works as a standalone function, but in production you'd use LangChain's ChatPromptTemplate so you can compose it into a chain. Here's the LangChain version:
The system message carries the schema and rules. The human message is just the question. Keeping them separate lets the LLM clearly distinguish between instructions and the user's input.
Write a function get_table_info(connection) that returns a formatted string describing all tables in a SQLite database. For each table, list the column names and their types in the format shown below.
Expected output format (one table per block):

```
Table: customers
  id (INTEGER)
  name (TEXT)
  email (TEXT)
```

Use PRAGMA table_info(table_name) to get column details. It returns rows where index 1 is the column name and index 2 is the type.
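One possible solution to this exercise, using PRAGMA table_info exactly as hinted:

```python
import sqlite3

def get_table_info(connection: sqlite3.Connection) -> str:
    """List each table's columns and types in the format shown above."""
    blocks = []
    tables = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        lines = [f"Table: {name}"]
        # PRAGMA table_info rows: index 1 = column name, index 2 = type
        for col in connection.execute(f"PRAGMA table_info({name})"):
            lines.append(f"  {col[1]} ({col[2]})")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
print(get_table_info(conn))
```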
SQL Validation and Safety Checks
Here's a scenario that should terrify you: a user types "delete all inactive customers" into your text-to-SQL interface, the LLM generates DELETE FROM customers WHERE last_login < '2023-01-01', and your pipeline executes it. Congratulations — you just deleted production data through a chat interface.
SQL validation is not optional. Every text-to-SQL pipeline needs a safety layer between the LLM's output and the database. The rules are simple: only allow SELECT queries, block anything that modifies data, and reject queries that reference tables not in the schema.
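A minimal validator implementing those three rules — the keyword list is the obvious starting set, not an exhaustive one:

```python
import re

FORBIDDEN_KEYWORDS = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "TRUNCATE"]

def is_safe_query(sql: str) -> bool:
    """Allow read-only SELECT statements only."""
    stripped = sql.strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    # Block SQL comment markers often used in injection attempts
    if "--" in stripped or "/*" in stripped:
        return False
    # \b word boundaries: UPDATED_AT is fine, UPDATE is not
    for keyword in FORBIDDEN_KEYWORDS:
        if re.search(rf"\b{keyword}\b", stripped, re.IGNORECASE):
            return False
    return True

print(is_safe_query("SELECT name FROM customers"))        # True
print(is_safe_query("DELETE FROM customers"))             # False
print(is_safe_query("SELECT updated_at FROM customers"))  # True
```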
The validator catches the obvious attacks: data modification, table drops, SQL injection comments. But notice the word-boundary check — the \b in the regex. Without it, a column named UPDATED_AT would falsely trigger the UPDATE blocker.
Beyond safety, we should also verify that the SQL is syntactically valid before sending it to the database. SQLite's EXPLAIN command parses the query without executing it:
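A sketch of that check, wrapped in a helper (is_valid_sql is my naming):

```python
import sqlite3

def is_valid_sql(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Use EXPLAIN to parse the query without executing it."""
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
print(is_valid_sql(conn, "SELECT name FROM customers"))  # (True, '')
print(is_valid_sql(conn, "SELECT nme FROM customers"))   # typo -> (False, 'no such column: nme')
```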
EXPLAIN runs the query planner without touching any data. It catches typos, missing table names, and bad column references — the exact errors that LLMs produce most often. Combining safety validation with syntax checking catches problems before they hit the database.
Executing Queries and Formatting Results
Validation tells us whether a query is safe. Execution gets the actual data. We need a function that chains these together — validate first, execute only if safe, and return results in a structured format that's easy to display.
The execute_query function wraps validation and execution into a single call. It returns a dictionary with structured data — the columns, the rows, and whether it succeeded. The format_results function turns that dictionary into a readable ASCII table, similar to what you'd see in a SQL client.
Let me show you both functions working together on a real query. Watch how the entire flow works end-to-end:
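A self-contained sketch of both functions, with the safety check repeated in condensed form so the block runs standalone (the demo uses an abbreviated stand-in schema, not the full one from earlier):

```python
import re
import sqlite3

def is_safe_query(sql: str) -> bool:
    s = sql.strip()
    if not s.upper().startswith("SELECT") or "--" in s or "/*" in s:
        return False
    return not re.search(
        r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|TRUNCATE)\b", s, re.IGNORECASE)

def execute_query(conn: sqlite3.Connection, sql: str) -> dict:
    """Validate first; execute only if safe; always return structured data."""
    if not is_safe_query(sql):
        return {"success": False, "error": "Query blocked by safety check",
                "columns": [], "rows": []}
    try:
        cursor = conn.execute(sql)
        return {"success": True, "error": None,
                "columns": [d[0] for d in cursor.description],
                "rows": cursor.fetchall()}
    except sqlite3.Error as exc:
        return {"success": False, "error": str(exc), "columns": [], "rows": []}

def format_results(result: dict) -> str:
    """Render the result dict as a simple ASCII table."""
    if not result["success"]:
        return f"Error: {result['error']}"
    header = " | ".join(result["columns"])
    lines = [header, "-" * len(header)]
    lines += [" | ".join(str(v) for v in row) for row in result["rows"]]
    lines.append(f"({len(result['rows'])} rows)")
    return "\n".join(lines)

# End-to-end demo: a revenue ranking, then a blocked DELETE
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Priya'), (2, 'James');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 2, 300.0), (3, 1, 80.0);
""")
ranking = execute_query(conn, """
    SELECT c.name, SUM(o.total) AS revenue
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY revenue DESC
""")
print(format_results(ranking))
print(format_results(execute_query(conn, "DELETE FROM customers")))
```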
The SELECT query runs and produces a revenue ranking. The DELETE query gets blocked before it ever touches the database. That's exactly the behavior we want.
Write a function is_safe_query(sql) that returns True if a SQL query is safe to execute, and False otherwise.
Rules:
1. The query must start with SELECT (case-insensitive)
2. The query must NOT contain any of these keywords: DROP, DELETE, INSERT, UPDATE, ALTER, TRUNCATE
3. The query must NOT contain SQL comment markers: -- or /*
4. Keywords should be matched as whole words only (so UPDATED_AT is fine, but UPDATE is not)
Return True for safe queries, False for dangerous ones.
The Complete LangChain Pipeline
We've built every component individually: schema extraction, prompt construction, validation, execution, and formatting. It's time to wire them together into a single LangChain chain. The LLM only handles one step — generating the SQL. Everything else is deterministic Python.
The run_text_to_sql function is the entry point. It takes a plain-English question, generates SQL, validates it, executes it, and returns formatted results. The schema is extracted once and reused — no need to query sqlite_master on every call.
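A sketch of that entry point. To keep it testable without an API key, the LLM is injected as any prompt-to-text callable — in production you'd pass a real model call (e.g. a LangChain chat model); the demo stubs it with a fake:

```python
import sqlite3
from typing import Callable

def clean_sql(text: str) -> str:
    """Strip markdown fences the model sometimes adds despite instructions."""
    text = text.strip()
    if text.startswith("```"):
        text = "\n".join(text.split("\n")[1:-1])
    return text.strip()

def run_text_to_sql(question: str, conn: sqlite3.Connection,
                    llm: Callable[[str], str], schema: str) -> dict:
    """question -> prompt -> SQL -> validate -> execute."""
    prompt = f"Schema:\n{schema}\n\nReturn ONLY the SQL.\nQuestion: {question}"
    sql = clean_sql(llm(prompt))
    # Condensed safety gate; use the full validator from earlier in production
    if not sql.upper().startswith("SELECT"):
        return {"success": False, "error": "Unsafe query blocked", "sql": sql}
    try:
        cursor = conn.execute(sql)
        return {"success": True, "sql": sql,
                "columns": [d[0] for d in cursor.description],
                "rows": cursor.fetchall()}
    except sqlite3.Error as exc:
        return {"success": False, "error": str(exc), "sql": sql}

# Fake LLM so the wiring can be exercised without an API call
fake_llm = lambda prompt: "```sql\nSELECT name FROM customers\n```"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya')")
schema = "CREATE TABLE customers (id INTEGER, name TEXT)"
result = run_text_to_sql("List customer names", conn, fake_llm, schema)
print(result["sql"], result["rows"])
```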
For a more composable version, you can build this as a proper LCEL chain using RunnableLambda to wire the components:
The LCEL version composes the same logic into a single chain that you can invoke(), batch(), or stream(). Each RunnableLambda wraps a pure Python function, so testing is straightforward — you can unit test clean_sql and validate_and_execute independently.
Real-World Example: Interactive Database Explorer
Let's pull everything together into a reusable class that wraps the entire pipeline. This is the pattern I use in production — a DatabaseExplorer that initializes with a database connection, extracts the schema once, and exposes a single ask() method.
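A sketch of that class. As before, the LLM is an injected prompt-to-text callable so the class can be exercised without a live model; the safety gate is condensed and you'd use the full validator in production:

```python
import sqlite3
from typing import Callable

class DatabaseExplorer:
    """Wraps the whole text-to-SQL pipeline behind a single ask() method."""

    def __init__(self, conn: sqlite3.Connection, llm: Callable[[str], str]):
        self.conn = conn
        self.llm = llm  # any prompt -> text callable (a real model in production)
        # Extract the schema once; reuse it on every call
        rows = conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        self.schema = "\n\n".join(r[0] for r in rows)

    def ask(self, question: str) -> dict:
        prompt = (f"Schema:\n{self.schema}\n\n"
                  f"Return ONLY the SQL.\nQuestion: {question}")
        sql = self.llm(prompt).strip().strip("`")
        if not sql.upper().startswith("SELECT"):
            return {"success": False, "error": "Unsafe query blocked", "sql": sql}
        try:
            cursor = self.conn.execute(sql)
            return {"success": True, "sql": sql,
                    "columns": [d[0] for d in cursor.description],
                    "rows": cursor.fetchall()}
        except sqlite3.Error as exc:
            return {"success": False, "error": str(exc), "sql": sql}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya')")
explorer = DatabaseExplorer(conn, llm=lambda p: "SELECT name FROM customers")
print(explorer.ask("List customer names"))
```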
The class encapsulates the entire pipeline. Schema extraction happens once during __init__. Each ask() call generates SQL, validates it, runs it, and returns a structured dictionary. In a web application, you'd instantiate one DatabaseExplorer per database and serve it behind an API endpoint.
Common Mistakes and How to Fix Them
I've debugged text-to-SQL pipelines for teams ranging from startups to enterprise. These are the mistakes I see repeatedly, ranked by how much damage they cause.
```python
# The LLM guesses table names
prompt = f"Write SQL to answer: {question}"
# LLM output: SELECT * FROM user_accounts
# Real table name: customers --> query fails
```

```python
# The LLM sees exactly what exists
prompt = f"""Schema:
{schema}

Write SQL to answer: {question}"""
# LLM output: SELECT name FROM customers
# Correct table name --> query succeeds
```

Without schema context, the LLM will confidently generate SQL using table names like user_accounts, user_data, or users — all reasonable guesses, all wrong. This is the single most impactful fix you can make.
````python
# LLM returns SQL wrapped in explanation
response = """Here's the SQL query:
```sql
SELECT name FROM customers
```
This query selects..."""
# Your parser crashes on the markdown
````

````python
# Prompt says "ONLY the SQL query"
# Plus a cleanup function as backup
def clean_sql(text):
    text = text.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        text = "\n".join(lines[1:-1])
    return text.strip()
````

Even with explicit instructions to return only SQL, models occasionally add explanations or markdown formatting. The cleanup function is your safety net. Without it, your pipeline fails intermittently — the worst kind of bug to debug.
Limitations and Accuracy Tips
Text-to-SQL is powerful but not universal. After building these systems for multiple clients, I've developed a clear sense of where this approach excels and where it falls apart.
Works well for:

- Lookups and filters over a handful of well-named tables
- Joins, aggregations, and top-N rankings — the queries users actually ask for
- Date-range filtering when values follow a consistent format

Struggles with:

- Ambiguous questions ("show me the important customers")
- Cryptic column names (col_a, f1, metric_03)
- Very large schemas with dozens of tables (see the FAQ on table selection)
- Multi-step analysis that needs intermediate results

One technique that dramatically improves accuracy is providing example question-SQL pairs in your prompt. This is essentially few-shot prompting applied to SQL generation:
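A sketch of what that looks like — the example pairs are hypothetical and should be replaced with questions real users ask of your schema:

```python
# Hypothetical few-shot pairs; adapt them to your own schema and conventions
FEW_SHOT_EXAMPLES = """
Question: How many customers signed up in March 2024?
SQL: SELECT COUNT(*) AS customer_count FROM customers
     WHERE signup_date BETWEEN '2024-03-01' AND '2024-03-31';

Question: What is the total revenue per product?
SQL: SELECT p.name, SUM(o.quantity * p.price) AS revenue
     FROM orders o JOIN products p ON o.product_id = p.id
     GROUP BY p.name ORDER BY revenue DESC;
"""

def build_few_shot_prompt(schema: str, question: str) -> str:
    return (f"Schema:\n{schema}\n\n"
            f"Examples:\n{FEW_SHOT_EXAMPLES}\n"
            f"Return ONLY the SQL.\nQuestion: {question}\nSQL:")

print(build_few_shot_prompt("CREATE TABLE customers (...)",
                            "How many customers are in Mumbai?"))
```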
The few-shot examples teach the LLM your naming conventions (e.g., customer_count instead of cnt), your join style, and your preference for aliases. Three to five examples is usually enough to establish the pattern.
Write a function format_as_table(columns, rows) that takes a list of column names and a list of row tuples, and returns a formatted ASCII table string.
The table should have:

- a header row of column names separated by " | "
- a separator line of dashes, with "+" under each column divider
- one line per data row, values padded to the column width
- a "(N rows)" footer

Example:

```
name  | city
------+----------
Priya | Mumbai
James | San Francisco
(2 rows)
```

Each column width should be the maximum of the column name length and the longest value in that column.
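One possible solution to this exercise:

```python
def format_as_table(columns, rows):
    """Render columns + row tuples as an ASCII table with a row-count footer."""
    # Column width = max of header length and longest value in that column
    widths = [
        max(len(col), *(len(str(r[i])) for r in rows)) if rows else len(col)
        for i, col in enumerate(columns)
    ]
    header = " | ".join(col.ljust(w) for col, w in zip(columns, widths))
    separator = "-+-".join("-" * w for w in widths)
    lines = [header, separator]
    for row in rows:
        lines.append(" | ".join(str(v).ljust(w) for v, w in zip(row, widths)))
    lines.append(f"({len(rows)} rows)")
    return "\n".join(lines)

print(format_as_table(["name", "city"],
                      [("Priya", "Mumbai"), ("James", "San Francisco")]))
```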
Common SQLite Errors from LLM-Generated Queries
Beyond logical mistakes, three error messages dominate LLM-generated SQLite failures: no such table (a hallucinated table name), no such column (a hallucinated or misremembered column), and near "...": syntax error (a typo or unsupported construct). Each one points to a specific gap in your prompt or validation layer.
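You can reproduce the three classic failure modes in a few lines; the queries below are deliberately broken:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

bad_queries = [
    "SELECT * FROM user_accounts",          # hallucinated table name
    "SELECT customer_name FROM customers",  # hallucinated column name
    "SELECT name FORM customers",           # typo in a keyword
]
for sql in bad_queries:
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        print(f"{sql!r} -> {exc}")
```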
The fourth common issue isn't a crash — it's a semantically wrong query. The SQL runs fine but answers a different question than what the user asked. The best defense against this is including few-shot examples in your prompt that demonstrate the correct interpretation of ambiguous terms.
Frequently Asked Questions
Can text-to-SQL handle multiple databases (PostgreSQL, MySQL)?
Yes. The pipeline structure stays identical — you change two things: (1) the schema extraction function (each database has its own metadata tables), and (2) the dialect parameter in the prompt. SQLite uses sqlite_master, PostgreSQL uses information_schema.tables, and MySQL uses SHOW TABLES. The LLM handles dialect-specific syntax differences if you tell it which dialect to use.
How do I handle very large schemas with 50+ tables?
Don't send the entire schema in every prompt. First, use an LLM call to identify which tables are relevant to the question (a "table selector" step), then send only those tables' schemas to the SQL generation step. This two-step approach keeps the prompt focused and reduces token costs.
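A sketch of the deterministic half of that table-selector step — the prompt wording is illustrative, and the parser guards against the selector itself hallucinating a table:

```python
def build_table_selector_prompt(table_names: list[str], question: str) -> str:
    return (
        "From the table list below, return a comma-separated list of the "
        "tables needed to answer the question. Return ONLY table names.\n\n"
        f"Tables: {', '.join(table_names)}\n"
        f"Question: {question}"
    )

def parse_selected_tables(llm_reply: str, known: list[str]) -> list[str]:
    # Keep only names that actually exist, guarding against hallucination
    picked = [t.strip() for t in llm_reply.split(",")]
    return [t for t in picked if t in known]

known = ["customers", "orders", "products", "invoices", "shipments"]
print(parse_selected_tables("customers, orders, billing", known))  # ['customers', 'orders']
```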
What about LangChain's built-in `SQLDatabaseChain`?
LangChain offers SQLDatabaseChain and create_sql_query_chain that automate much of what we built manually. They're useful for prototyping, but building the pipeline yourself gives you full control over validation, safety rules, and error handling. In production, I always end up replacing the built-in chain with a custom pipeline because every database has quirks that need custom logic.
Summary
A text-to-SQL pipeline has five stages: schema extraction, prompt construction, SQL generation, validation, and execution. The LLM only handles one of those five stages — the rest is deterministic Python that you can test and debug without API calls.
The three decisions that matter most: (1) include the full schema with sample rows in every prompt, (2) validate every generated query before executing it, and (3) connect to the database with a read-only user. Get those three right and your pipeline will be both accurate and safe.