Natural Language to SQL: Build a Text-to-SQL Pipeline with LangChain
Your product manager asks "What were our top 5 customers by revenue last quarter?" and a developer spends 10 minutes writing the SQL. An LLM can generate that query in under a second — but only if you build the pipeline right. Without schema awareness, safety checks, and proper prompting, the LLM will hallucinate table names, produce queries that delete data, or return results nobody can read. This tutorial builds a complete text-to-SQL pipeline, piece by piece, so you understand every moving part.
What Is Text-to-SQL and How Does It Work?
Text-to-SQL is a pipeline that converts a plain-English question into a SQL query, runs that query against a database, and returns the results in a human-readable format. The LLM does the translation step — it reads the user's question and the database schema, then generates a syntactically correct SQL statement.
The pipeline has five distinct stages, and I want you to see the full picture before we build each one:

1. Schema extraction — read the tables and columns from the database
2. Prompt construction — combine the schema with the user's question
3. SQL generation — the LLM translates the question into a query
4. Validation — safety and syntax checks on the generated SQL
5. Execution and formatting — run the query and present the results
Stages 1, 2, 4, and 5 are pure Python — no LLM needed. Stage 3 is the only part that requires an API call. This matters because it means you can build, test, and debug 80% of your pipeline without spending a single API token. That's exactly what we'll do: build the scaffolding first, then plug in the LLM.
Building the Sample Database
Before we can convert questions to SQL, we need a database to query. SQLite ships with Python, so every code block in this section runs right in your browser. I'm building a small e-commerce database with three related tables — the kind of schema you'd see in any real SaaS product.
Three tables, five to seven columns each, with foreign key relationships. That's enough complexity to test joins, aggregations, and filtering — the queries users actually ask for.
Five customers, five products, ten orders. Small enough to trace manually, large enough to produce meaningful aggregations. Every order links back to a real customer and a real product, so joins will work correctly.
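Here's one way to build that database. The exact column names and sample values are my assumptions — adapt them to your own schema — but the shape matches what the article describes: three related tables with foreign keys, small enough to trace by hand.

```python
import sqlite3

def build_sample_db() -> sqlite3.Connection:
    """Create the in-memory e-commerce database: 5 customers, 5 products, 10 orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT,
            city TEXT,
            signup_date TEXT          -- stored as 'YYYY-MM-DD' strings
        );
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT,
            price REAL
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            product_id INTEGER REFERENCES products(id),
            quantity INTEGER,
            order_date TEXT
        );
    """)
    customers = [
        (1, "Priya", "priya@example.com", "Mumbai", "2024-01-15"),
        (2, "James", "james@example.com", "San Francisco", "2024-02-03"),
        (3, "Aisha", "aisha@example.com", "Lagos", "2024-02-20"),
        (4, "Chen", "chen@example.com", "Singapore", "2024-03-01"),
        (5, "Maria", "maria@example.com", "Madrid", "2024-03-18"),
    ]
    products = [
        (1, "Laptop", "electronics", 1200.0),
        (2, "Headphones", "electronics", 150.0),
        (3, "Desk", "furniture", 300.0),
        (4, "Chair", "furniture", 180.0),
        (5, "Monitor", "electronics", 400.0),
    ]
    orders = [
        (1, 1, 1, 1, "2024-03-02"), (2, 2, 3, 2, "2024-03-05"),
        (3, 1, 2, 1, "2024-03-09"), (4, 3, 5, 2, "2024-03-11"),
        (5, 4, 4, 4, "2024-03-15"), (6, 5, 1, 1, "2024-03-20"),
        (7, 2, 2, 3, "2024-03-22"), (8, 3, 3, 1, "2024-03-25"),
        (9, 4, 5, 1, "2024-03-28"), (10, 5, 4, 2, "2024-03-30"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?,?,?,?,?)", customers)
    conn.executemany("INSERT INTO products VALUES (?,?,?,?)", products)
    conn.executemany("INSERT INTO orders VALUES (?,?,?,?,?)", orders)
    conn.commit()
    return conn

conn = build_sample_db()
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 10
```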
Schema Extraction — Teaching the LLM Your Database
An LLM can't generate correct SQL if it doesn't know what tables and columns exist. This is the most common failure point I see in text-to-SQL projects: developers pass the user's question to the LLM without any schema context, and the model invents table names that sound plausible but don't exist.
The fix is straightforward: extract the schema from the database itself and include it in the prompt. SQLite stores schema information in sqlite_master, which we can query directly.
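A minimal sketch of that extraction — the function name get_schema is my choice, but the query against sqlite_master is exactly what SQLite provides:

```python
import sqlite3

def get_schema(conn: sqlite3.Connection) -> str:
    """Return the original CREATE TABLE statements for every user table."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return "\n\n".join(row[0] for row in rows)

# Quick demo against a throwaway table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
print(get_schema(conn))
```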
That function pulls the exact CREATE TABLE statements from sqlite_master. This is better than manually listing column names because it includes constraints, types, and foreign key relationships — all context that helps the LLM generate correct joins.
Here's a more production-ready version that includes sample data alongside the schema:
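One way to implement it — get_schema_with_samples is my naming, and the comment-style formatting of the sample rows is one reasonable convention among several:

```python
import sqlite3

def get_schema_with_samples(conn: sqlite3.Connection, n_rows: int = 3) -> str:
    """Schema plus a few sample rows per table, for few-shot context."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, create_sql in tables:
        cursor = conn.execute(f"SELECT * FROM {name} LIMIT {n_rows}")
        cols = [d[0] for d in cursor.description]
        sample = "\n".join(str(tuple(row)) for row in cursor.fetchall())
        parts.append(
            f"{create_sql}\n"
            f"-- Columns: {', '.join(cols)}\n"
            f"-- Sample rows:\n{sample}"
        )
    return "\n\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, signup_date TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya', '2024-01-15')")
print(get_schema_with_samples(conn))
```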
The sample rows serve as few-shot examples for the LLM. When it sees that signup_date contains strings like "2024-01-15", it knows to use string comparison for date filtering rather than a DATE() function that might not exist in SQLite.
Prompt Construction — The SQL Generation Template
The prompt is where everything comes together: the schema, the user's question, and the instructions that tell the LLM exactly how to respond. A weak prompt produces SQL that's syntactically valid but semantically wrong — it joins the wrong tables, misinterprets column names, or includes unnecessary complexity.
I've iterated on text-to-SQL prompts across dozens of projects, and the template below covers the patterns that actually matter. The key insight: you need to constrain the LLM's output format strictly. If you let it return SQL embedded in an explanation paragraph, your parser breaks.
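Here is one workable version of that template as a plain function — the exact rule wording is mine, but each rule targets a failure mode discussed in this article (hallucinated tables, markdown-wrapped output, data modification, string-typed dates):

```python
def build_sql_prompt(schema: str, question: str, dialect: str = "SQLite") -> str:
    """Assemble the SQL-generation prompt from schema + question."""
    return f"""You are an expert {dialect} analyst.

Database schema:
{schema}

Rules:
- Return ONLY the SQL query, with no explanation and no markdown fences.
- Use only tables and columns that appear in the schema above.
- Generate a single SELECT statement; never modify data.
- Dates are stored as TEXT in 'YYYY-MM-DD' format; compare them as strings.

Question: {question}
SQL:"""

print(build_sql_prompt(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "How many customers do we have?",
))
```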
That prompt works as a standalone function, but in production you'd use LangChain's ChatPromptTemplate so you can compose it into a chain. Here's the LangChain version:
The system message carries the schema and rules. The human message is just the question. Keeping them separate lets the LLM clearly distinguish between instructions and the user's input.
Write a function get_table_info(connection) that returns a formatted string describing all tables in a SQLite database. For each table, list the column names and their types in the format shown below.
Expected output format (one table per block):

```
Table: customers
  id (INTEGER)
  name (TEXT)
  email (TEXT)
```

Use PRAGMA table_info(table_name) to get column details. It returns rows where index 1 is the column name and index 2 is the type.
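One possible solution to this exercise, using PRAGMA table_info exactly as hinted:

```python
import sqlite3

def get_table_info(connection: sqlite3.Connection) -> str:
    """List each table's columns and types in the format shown above."""
    blocks = []
    tables = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        lines = [f"Table: {name}"]
        # PRAGMA table_info rows: index 1 = column name, index 2 = type
        for col in connection.execute(f"PRAGMA table_info({name})"):
            lines.append(f"  {col[1]} ({col[2]})")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
print(get_table_info(conn))
```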
SQL Validation and Safety Checks
Here's a scenario that should terrify you: a user types "delete all inactive customers" into your text-to-SQL interface, the LLM generates DELETE FROM customers WHERE last_login < '2023-01-01', and your pipeline executes it. Congratulations — you just deleted production data through a chat interface.
SQL validation is not optional. Every text-to-SQL pipeline needs a safety layer between the LLM's output and the database. The rules are simple: only allow SELECT queries, block anything that modifies data, and reject queries that reference tables not in the schema.
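A minimal validator implementing those three rules — the keyword list is the obvious starting set, not an exhaustive one:

```python
import re

FORBIDDEN_KEYWORDS = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "TRUNCATE"]

def is_safe_query(sql: str) -> bool:
    """Allow read-only SELECT statements only."""
    stripped = sql.strip()
    if not stripped.upper().startswith("SELECT"):
        return False
    # Block SQL comment markers often used in injection attempts
    if "--" in stripped or "/*" in stripped:
        return False
    # \b word boundaries: UPDATED_AT is fine, UPDATE is not
    for keyword in FORBIDDEN_KEYWORDS:
        if re.search(rf"\b{keyword}\b", stripped, re.IGNORECASE):
            return False
    return True

print(is_safe_query("SELECT name FROM customers"))        # True
print(is_safe_query("DELETE FROM customers"))             # False
print(is_safe_query("SELECT updated_at FROM customers"))  # True
```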
The validator catches the obvious attacks: data modification, table drops, SQL injection comments. But notice the word-boundary check — the \b in the regex. Without it, a column named UPDATED_AT would falsely trigger the UPDATE blocker.
Beyond safety, we should also verify that the SQL is syntactically valid before sending it to the database. SQLite's EXPLAIN command parses the query without executing it:
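A sketch of that check, wrapped in a helper (is_valid_sql is my naming):

```python
import sqlite3

def is_valid_sql(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Use EXPLAIN to parse the query without executing it."""
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True, ""
    except sqlite3.Error as exc:
        return False, str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
print(is_valid_sql(conn, "SELECT name FROM customers"))  # (True, '')
print(is_valid_sql(conn, "SELECT nme FROM customers"))   # typo -> (False, 'no such column: nme')
```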
EXPLAIN runs the query planner without touching any data. It catches typos, missing table names, and bad column references — the exact errors that LLMs produce most often. Combining safety validation with syntax checking catches problems before they hit the database.
Executing Queries and Formatting Results
Validation tells us whether a query is safe. Execution gets the actual data. We need a function that chains these together — validate first, execute only if safe, and return results in a structured format that's easy to display.
The execute_query function wraps validation and execution into a single call. It returns a dictionary with structured data — the columns, the rows, and whether it succeeded. The format_results function turns that dictionary into a readable ASCII table, similar to what you'd see in a SQL client.
Let me show you both functions working together on a real query. Watch how the entire flow works end-to-end:
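A self-contained sketch of both functions, with the safety check repeated in condensed form so the block runs standalone (the demo uses an abbreviated stand-in schema, not the full one from earlier):

```python
import re
import sqlite3

def is_safe_query(sql: str) -> bool:
    s = sql.strip()
    if not s.upper().startswith("SELECT") or "--" in s or "/*" in s:
        return False
    return not re.search(
        r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|TRUNCATE)\b", s, re.IGNORECASE)

def execute_query(conn: sqlite3.Connection, sql: str) -> dict:
    """Validate first; execute only if safe; always return structured data."""
    if not is_safe_query(sql):
        return {"success": False, "error": "Query blocked by safety check",
                "columns": [], "rows": []}
    try:
        cursor = conn.execute(sql)
        return {"success": True, "error": None,
                "columns": [d[0] for d in cursor.description],
                "rows": cursor.fetchall()}
    except sqlite3.Error as exc:
        return {"success": False, "error": str(exc), "columns": [], "rows": []}

def format_results(result: dict) -> str:
    """Render the result dict as a simple ASCII table."""
    if not result["success"]:
        return f"Error: {result['error']}"
    header = " | ".join(result["columns"])
    lines = [header, "-" * len(header)]
    lines += [" | ".join(str(v) for v in row) for row in result["rows"]]
    lines.append(f"({len(result['rows'])} rows)")
    return "\n".join(lines)

# End-to-end demo: a revenue ranking, then a blocked DELETE
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Priya'), (2, 'James');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 2, 300.0), (3, 1, 80.0);
""")
ranking = execute_query(conn, """
    SELECT c.name, SUM(o.total) AS revenue
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY revenue DESC
""")
print(format_results(ranking))
print(format_results(execute_query(conn, "DELETE FROM customers")))
```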
The SELECT query runs and produces a revenue ranking. The DELETE query gets blocked before it ever touches the database. That's exactly the behavior we want.
Write a function is_safe_query(sql) that returns True if a SQL query is safe to execute, and False otherwise.
Rules:
1. The query must start with SELECT (case-insensitive)
2. The query must NOT contain any of these keywords: DROP, DELETE, INSERT, UPDATE, ALTER, TRUNCATE
3. The query must NOT contain SQL comment markers: -- or /*
4. Keywords should be matched as whole words only (so UPDATED_AT is fine, but UPDATE is not)
Return True for safe queries, False for dangerous ones.
The Complete LangChain Pipeline
We've built every component individually: schema extraction, prompt construction, validation, execution, and formatting. It's time to wire them together into a single LangChain chain. The LLM only handles one step — generating the SQL. Everything else is deterministic Python.
The run_text_to_sql function is the entry point. It takes a plain-English question, generates SQL, validates it, executes it, and returns formatted results. The schema is extracted once and reused — no need to query sqlite_master on every call.
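A sketch of that entry point. To keep it testable without an API key, the LLM is injected as any prompt-to-text callable — in production you'd pass a real model call (e.g. a LangChain chat model); the demo stubs it with a fake:

```python
import sqlite3
from typing import Callable

def clean_sql(text: str) -> str:
    """Strip markdown fences the model sometimes adds despite instructions."""
    text = text.strip()
    if text.startswith("```"):
        text = "\n".join(text.split("\n")[1:-1])
    return text.strip()

def run_text_to_sql(question: str, conn: sqlite3.Connection,
                    llm: Callable[[str], str], schema: str) -> dict:
    """question -> prompt -> SQL -> validate -> execute."""
    prompt = f"Schema:\n{schema}\n\nReturn ONLY the SQL.\nQuestion: {question}"
    sql = clean_sql(llm(prompt))
    # Condensed safety gate; use the full validator from earlier in production
    if not sql.upper().startswith("SELECT"):
        return {"success": False, "error": "Unsafe query blocked", "sql": sql}
    try:
        cursor = conn.execute(sql)
        return {"success": True, "sql": sql,
                "columns": [d[0] for d in cursor.description],
                "rows": cursor.fetchall()}
    except sqlite3.Error as exc:
        return {"success": False, "error": str(exc), "sql": sql}

# Fake LLM so the wiring can be exercised without an API call
fake_llm = lambda prompt: "```sql\nSELECT name FROM customers\n```"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya')")
schema = "CREATE TABLE customers (id INTEGER, name TEXT)"
result = run_text_to_sql("List customer names", conn, fake_llm, schema)
print(result["sql"], result["rows"])
```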
For a more composable version, you can build this as a proper LCEL chain using RunnableLambda to wire the components:
The LCEL version composes the same logic into a single chain that you can invoke(), batch(), or stream(). Each RunnableLambda wraps a pure Python function, so testing is straightforward — you can unit test clean_sql and validate_and_execute independently.
Real-World Example: Interactive Database Explorer
Let's pull everything together into a reusable class that wraps the entire pipeline. This is the pattern I use in production — a DatabaseExplorer that initializes with a database connection, extracts the schema once, and exposes a single ask() method.
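A sketch of that class. As before, the LLM is an injected prompt-to-text callable so the class can be exercised without a live model; the safety gate is condensed and you'd use the full validator in production:

```python
import sqlite3
from typing import Callable

class DatabaseExplorer:
    """Wraps the whole text-to-SQL pipeline behind a single ask() method."""

    def __init__(self, conn: sqlite3.Connection, llm: Callable[[str], str]):
        self.conn = conn
        self.llm = llm  # any prompt -> text callable (a real model in production)
        # Extract the schema once; reuse it on every call
        rows = conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        self.schema = "\n\n".join(r[0] for r in rows)

    def ask(self, question: str) -> dict:
        prompt = (f"Schema:\n{self.schema}\n\n"
                  f"Return ONLY the SQL.\nQuestion: {question}")
        sql = self.llm(prompt).strip().strip("`")
        if not sql.upper().startswith("SELECT"):
            return {"success": False, "error": "Unsafe query blocked", "sql": sql}
        try:
            cursor = self.conn.execute(sql)
            return {"success": True, "sql": sql,
                    "columns": [d[0] for d in cursor.description],
                    "rows": cursor.fetchall()}
        except sqlite3.Error as exc:
            return {"success": False, "error": str(exc), "sql": sql}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Priya')")
explorer = DatabaseExplorer(conn, llm=lambda p: "SELECT name FROM customers")
print(explorer.ask("List customer names"))
```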
The class encapsulates the entire pipeline. Schema extraction happens once during __init__. Each ask() call generates SQL, validates it, runs it, and returns a structured dictionary. In a web application, you'd instantiate one DatabaseExplorer per database and serve it behind an API endpoint.
Common Mistakes and How to Fix Them
I've debugged text-to-SQL pipelines for teams ranging from startups to enterprise. These are the mistakes I see repeatedly, ranked by how much damage they cause.
```python
# The LLM guesses table names
prompt = f"Write SQL to answer: {question}"
# LLM output: SELECT * FROM user_accounts
# Real table name: customers --> query fails
```

```python
# The LLM sees exactly what exists
prompt = f"""Schema:
{schema}

Write SQL to answer: {question}"""
# LLM output: SELECT name FROM customers
# Correct table name --> query succeeds
```

Without schema context, the LLM will confidently generate SQL using table names like user_accounts, user_data, or users — all reasonable guesses, all wrong. This is the single most impactful fix you can make.
````python
# LLM returns SQL wrapped in explanation
response = """Here's the SQL query:
```sql
SELECT name FROM customers
```
This query selects..."""
# Your parser crashes on the markdown
````

````python
# Prompt says "ONLY the SQL query"
# Plus a cleanup function as backup
def clean_sql(text):
    text = text.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        text = "\n".join(lines[1:-1])
    return text.strip()
````

Even with explicit instructions to return only SQL, models occasionally add explanations or markdown formatting. The cleanup function is your safety net. Without it, your pipeline fails intermittently — the worst kind of bug to debug.
Limitations and Accuracy Tips
Text-to-SQL is powerful but not universal. After building these systems for multiple clients, I've developed a clear sense of where this approach excels and where it falls apart.
Works well for:

- Lookups and filters over a handful of well-named tables
- Joins, aggregations, and top-N rankings — the queries users actually ask for
- Date-range filtering when values follow a consistent format

Struggles with:

- Ambiguous questions ("show me the important customers")
- Cryptic column names (col_a, f1, metric_03)
- Very large schemas with dozens of tables (see the FAQ on table selection)
- Multi-step analysis that needs intermediate results

One technique that dramatically improves accuracy is providing example question-SQL pairs in your prompt. This is essentially few-shot prompting applied to SQL generation:
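A sketch of what that looks like — the example pairs are hypothetical and should be replaced with questions real users ask of your schema:

```python
# Hypothetical few-shot pairs; adapt them to your own schema and conventions
FEW_SHOT_EXAMPLES = """
Question: How many customers signed up in March 2024?
SQL: SELECT COUNT(*) AS customer_count FROM customers
     WHERE signup_date BETWEEN '2024-03-01' AND '2024-03-31';

Question: What is the total revenue per product?
SQL: SELECT p.name, SUM(o.quantity * p.price) AS revenue
     FROM orders o JOIN products p ON o.product_id = p.id
     GROUP BY p.name ORDER BY revenue DESC;
"""

def build_few_shot_prompt(schema: str, question: str) -> str:
    return (f"Schema:\n{schema}\n\n"
            f"Examples:\n{FEW_SHOT_EXAMPLES}\n"
            f"Return ONLY the SQL.\nQuestion: {question}\nSQL:")

print(build_few_shot_prompt("CREATE TABLE customers (...)",
                            "How many customers are in Mumbai?"))
```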
The few-shot examples teach the LLM your naming conventions (e.g., customer_count instead of cnt), your join style, and your preference for aliases. Three to five examples is usually enough to establish the pattern.
Write a function format_as_table(columns, rows) that takes a list of column names and a list of row tuples, and returns a formatted ASCII table string.
The table should have:

- a header row of column names separated by " | "
- a separator line of dashes, with "+" under each column divider
- one line per data row, values padded to the column width
- a "(N rows)" footer

Example:

```
name  | city
------+----------
Priya | Mumbai
James | San Francisco
(2 rows)
```

Each column width should be the maximum of the column name length and the longest value in that column.
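One possible solution to this exercise:

```python
def format_as_table(columns, rows):
    """Render columns + row tuples as an ASCII table with a row-count footer."""
    # Column width = max of header length and longest value in that column
    widths = [
        max(len(col), *(len(str(r[i])) for r in rows)) if rows else len(col)
        for i, col in enumerate(columns)
    ]
    header = " | ".join(col.ljust(w) for col, w in zip(columns, widths))
    separator = "-+-".join("-" * w for w in widths)
    lines = [header, separator]
    for row in rows:
        lines.append(" | ".join(str(v).ljust(w) for v, w in zip(row, widths)))
    lines.append(f"({len(rows)} rows)")
    return "\n".join(lines)

print(format_as_table(["name", "city"],
                      [("Priya", "Mumbai"), ("James", "San Francisco")]))
```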
Common SQLite Errors from LLM-Generated Queries
Beyond logical mistakes, three error messages dominate LLM-generated SQLite failures: no such table (a hallucinated table name), no such column (a hallucinated or misremembered column), and near "...": syntax error (a typo or unsupported construct). Each one points to a specific gap in your prompt or validation layer.
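You can reproduce the three classic failure modes in a few lines; the queries below are deliberately broken:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

bad_queries = [
    "SELECT * FROM user_accounts",          # hallucinated table name
    "SELECT customer_name FROM customers",  # hallucinated column name
    "SELECT name FORM customers",           # typo in a keyword
]
for sql in bad_queries:
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        print(f"{sql!r} -> {exc}")
```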
The fourth common issue isn't a crash — it's a semantically wrong query. The SQL runs fine but answers a different question than what the user asked. The best defense against this is including few-shot examples in your prompt that demonstrate the correct interpretation of ambiguous terms.
Frequently Asked Questions
Can text-to-SQL handle multiple databases (PostgreSQL, MySQL)?
Yes. The pipeline structure stays identical — you change two things: (1) the schema extraction function (each database has its own metadata tables), and (2) the dialect parameter in the prompt. SQLite uses sqlite_master, PostgreSQL uses information_schema.tables, and MySQL uses SHOW TABLES. The LLM handles dialect-specific syntax differences if you tell it which dialect to use.
How do I handle very large schemas with 50+ tables?
Don't send the entire schema in every prompt. First, use an LLM call to identify which tables are relevant to the question (a "table selector" step), then send only those tables' schemas to the SQL generation step. This two-step approach keeps the prompt focused and reduces token costs.
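A sketch of the deterministic half of that table-selector step — the prompt wording is illustrative, and the parser guards against the selector itself hallucinating a table:

```python
def build_table_selector_prompt(table_names: list[str], question: str) -> str:
    return (
        "From the table list below, return a comma-separated list of the "
        "tables needed to answer the question. Return ONLY table names.\n\n"
        f"Tables: {', '.join(table_names)}\n"
        f"Question: {question}"
    )

def parse_selected_tables(llm_reply: str, known: list[str]) -> list[str]:
    # Keep only names that actually exist, guarding against hallucination
    picked = [t.strip() for t in llm_reply.split(",")]
    return [t for t in picked if t in known]

known = ["customers", "orders", "products", "invoices", "shipments"]
print(parse_selected_tables("customers, orders, billing", known))  # ['customers', 'orders']
```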
What about LangChain's built-in `SQLDatabaseChain`?
LangChain offers SQLDatabaseChain and create_sql_query_chain that automate much of what we built manually. They're useful for prototyping, but building the pipeline yourself gives you full control over validation, safety rules, and error handling. In production, I always end up replacing the built-in chain with a custom pipeline because every database has quirks that need custom logic.
Summary
A text-to-SQL pipeline has five stages: schema extraction, prompt construction, SQL generation, validation, and execution. The LLM only handles one of those five stages — the rest is deterministic Python that you can test and debug without API calls.
The three decisions that matter most: (1) include the full schema with sample rows in every prompt, (2) validate every generated query before executing it, and (3) connect to the database with a read-only user. Get those three right and your pipeline will be both accurate and safe.