Structured Output from LLMs: JSON Mode, Pydantic, and the Instructor Library in Python
Get reliable, typed data from any LLM — from fragile [JSON](/python/python-json/) prompts to schema-constrained generation with automatic validation and retries.
You ask an LLM to extract product information from a customer review, and it returns a nicely worded paragraph. Helpful for a human, useless for your database. You need a JSON object with product_name, rating, and issues — not prose. And you need it in that exact shape every single time. The code downstream will crash the moment a field is missing or the rating shows up as "four out of five" instead of 4.
This tutorial walks you through three levels of forcing LLMs to return structured data: prompt-based JSON extraction (fragile), OpenAI's built-in JSON mode and structured output (reliable), and the Instructor library for complex schemas with automatic retry on validation failure. Every JSON-mode code block runs directly in your browser.
The Problem with Free-Text LLM Output
LLMs generate text. Your application needs data. That mismatch is the root cause of most integration headaches I've seen in production AI systems. The model returns "The sentiment is positive" when your code expects {"sentiment": "positive", "confidence": 0.92}. You end up writing brittle regex parsers or string-splitting hacks that break the moment the model rephrases its answer.
There are several approaches to solving this, each more robust than the last:
| Approach | Reliability | Complexity | Best For |
|---|---|---|---|
| Prompt engineering ("reply in JSON") | Low — model may add commentary | Minimal | Quick prototypes |
| response_format with JSON mode | Medium — guaranteed valid JSON | Low | Simple flat schemas |
| response_format with strict schemas | High — exact field names and types | Medium | Production systems |
| Instructor + Pydantic | Highest — validation + retries | Medium | Complex nested schemas |
We will build up from the weakest approach to the strongest, so you understand exactly what each layer adds.
Setup — OpenAI Client and API Key
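A minimal setup sketch, assuming the official `openai` package and an API key exported as the `OPENAI_API_KEY` environment variable (the client reads it automatically, but passing it explicitly makes the dependency obvious):

```python
# pip install openai pydantic instructor
import os

from openai import OpenAI

# OpenAI() falls back to the OPENAI_API_KEY environment variable on its own;
# we pass it explicitly here so the requirement is visible.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
```

Every example below reuses this `client` object.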
Approach 1: Prompt-Based JSON Extraction
The simplest approach: just ask the model to respond in JSON. No special parameters, no libraries. I still use this for throwaway scripts where I don't care if it occasionally fails.
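A sketch of this approach, with a small defensive parser. The prompt wording and the `strip_code_fences` helper are my own; the helper exists precisely because the failure modes described below happen:

```python
import json
import re
from typing import Optional

PROMPT_TEMPLATE = (
    "Extract product_name, rating (integer 1-5), and issues (list of strings) "
    "from this review. Reply with only a JSON object, no other text.\n\n{review}"
)

def strip_code_fences(raw: str) -> str:
    """Remove the ```json ... ``` fences the model sometimes adds anyway."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    return match.group(1) if match else raw.strip()

def parse_loose_json(raw: str) -> Optional[dict]:
    """Best-effort parse; returns None when the model ignored instructions."""
    try:
        return json.loads(strip_code_fences(raw))
    except json.JSONDecodeError:
        return None

# Calling the model (requires the client from the setup section):
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(review=text)}],
# )
# data = parse_loose_json(response.choices[0].message.content)
```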
This works most of the time. The problem is that "most of the time" is not good enough for production. The model might wrap the JSON in markdown code fences, add a preamble like "Here is the extracted data:", or occasionally return malformed JSON. I've had a model return trailing commas that break `json.loads()` — valid in JavaScript, invalid in Python.
Approach 2: JSON Mode with response_format
OpenAI provides a response_format parameter that forces the model to return valid JSON. No code fences, no preamble, no trailing commas. The output is guaranteed to be parseable by json.loads(). This is the minimum you should use for any code that runs unattended.
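A sketch of the request, with the payload built separately so the shape is easy to inspect. The model name and message wording are illustrative:

```python
def build_json_mode_request(review: str) -> dict:
    """Keyword arguments for client.chat.completions.create().
    The word "JSON" must appear somewhere in the messages."""
    return {
        "model": "gpt-4o-mini",
        "response_format": {"type": "json_object"},
        "messages": [
            {
                "role": "system",
                "content": (
                    "Extract product_name, rating, and issues from the review. "
                    "Reply in JSON."
                ),
            },
            {"role": "user", "content": review},
        ],
    }

# response = client.chat.completions.create(**build_json_mode_request(text))
# data = json.loads(response.choices[0].message.content)  # guaranteed to parse
```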
Two things to notice. First, response_format={"type": "json_object"} guarantees the output is valid JSON — no parsing errors. Second, you still need to mention "JSON" in the system message. OpenAI requires this; if you set json_object mode without mentioning JSON in the prompt, the API returns an error.
Here is a comparison of what can go wrong with each approach so far:
Prompt-based extraction:

```python
# All of these can happen:
# 1. "Here is the JSON: {...}"   (preamble)
# 2. ```json\n{...}\n```         (code fences)
# 3. {"rating": "7/10"}          (wrong type)
# 4. {"product": "..."}          (wrong field name)
# 5. {trailing_comma: true,}     (invalid JSON)
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    # Now what? Retry? Regex? Give up?
    pass
```

JSON mode:

```python
# Eliminated: preamble, code fences, invalid JSON
# Still possible:
# 1. {"rating": "7/10"}   (wrong type)
# 2. {"product": "..."}   (wrong field name)
# 3. Missing fields
# But json.loads() always succeeds
data = json.loads(response.choices[0].message.content)
# You still need to validate the shape
```

Approach 3: Strict Structured Output with JSON Schema
This is the big upgrade. Instead of just asking for "some valid JSON," you hand OpenAI a JSON Schema that defines the exact fields, types, and structure. The model is constrained at the token-generation level — it literally cannot produce output that violates your schema. Field names, types, required fields, enum values — all enforced.
The response is guaranteed to match your schema. product_name will always be a string. rating will always be an integer. pros and cons will always be arrays of strings. The "strict": True flag enables constrained decoding — the model cannot deviate from the schema even if it "wants" to.
There are a few constraints when using strict mode. All fields listed in properties must appear in required. You must set "additionalProperties": false. Optional fields use {"type": ["string", "null"]} instead of just omitting them from required. These constraints exist because the grammar-based approach needs a fully deterministic schema.
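Here is a schema sketch that satisfies these rules, continuing the product-review example (field names and model name are illustrative; the wrapper layout follows OpenAI's `json_schema` response format):

```python
review_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "product_review",
        "strict": True,  # enables constrained decoding
        "schema": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string"},
                "rating": {"type": "integer"},
                "pros": {"type": "array", "items": {"type": "string"}},
                "cons": {"type": "array", "items": {"type": "string"}},
            },
            # strict mode: every property listed, nothing extra allowed
            "required": ["product_name", "rating", "pros", "cons"],
            "additionalProperties": False,
        },
    },
}

# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": review_text}],
#     response_format=review_schema,
# )
```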
Nested and Complex Schemas
Real-world data is rarely flat. You might need an array of objects, nested structures, or enum-constrained fields. Structured output handles all of these. The example below demonstrates three patterns at once: an array of objects (segments), a nullable field (growth_pct), and an enum constraint (sentiment).
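A sketch of such a schema, assuming a market-report extraction task (the enum values and exact field set are my own illustration; the three patterns are the point):

```python
market_report_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "market_report",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                # Enum constraint: only these three values can be generated.
                "sentiment": {
                    "type": "string",
                    "enum": ["bullish", "bearish", "neutral"],
                },
                # Array of objects: each segment is itself a strict object.
                "segments": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            # Nullable field: always present, value may be null.
                            "growth_pct": {"type": ["number", "null"]},
                        },
                        "required": ["name", "growth_pct"],
                        "additionalProperties": False,
                    },
                },
            },
            "required": ["sentiment", "segments"],
            "additionalProperties": False,
        },
    },
}
```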
Notice "growth_pct": {"type": ["number", "null"]}. This is how you make a field optional in strict mode — it must still be present in the output, but its value can be null. The automotive segment has no growth percentage mentioned in the article, so the model correctly returns null for that field.
Write a function called build_recipe_schema that returns a dictionary representing a JSON Schema for a recipe. The schema must define an object with these fields:
- name (string, required)
- cuisine (string, must be one of: "italian", "mexican", "indian", "chinese", "japanese", "french", "other") (required)
- prep_time_minutes (integer, required)
- ingredients (array of objects, each with item (string) and quantity (string), both required, no additional properties) (required)
- vegetarian (boolean, required)

All fields are required. Set additionalProperties to false at both the top level and inside the ingredient objects.
Return only the `"schema"` portion — the object containing "type", "properties", "required", and "additionalProperties".
Pydantic — Validating LLM Output in Python
JSON Schema works at the API level, but once the data lands in your Python code, you want a proper Python object — not a raw dictionary. Pydantic gives you exactly that: define a class with typed fields, pass in a dictionary, and get back a validated object with autocomplete, type checking, and clear error messages when something is wrong.
If you have used dataclasses, Pydantic will feel familiar. The key difference: Pydantic validates and coerces data on creation. A dataclass with rating: int silently accepts "7" as a string. A Pydantic model with rating: int either converts "7" to 7 or raises a validation error — your choice.
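A minimal model for the running review example (field names are my own continuation of it):

```python
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    product_name: str
    rating: int = Field(ge=1, le=10)  # must be between 1 and 10 inclusive
    pros: list[str]
    cons: list[str]

# Pydantic coerces compatible types on creation:
review = ProductReview(
    product_name="Everlight Desk Lamp",
    rating="7",  # the string "7" is coerced to the int 7
    pros=["bright", "cheap"],
    cons=["wobbly base"],
)
```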
The Field(ge=1, le=10) adds a constraint: rating must be between 1 and 10 inclusive. If the LLM returns rating: 15, Pydantic raises a ValidationError with a clear message. This is your safety net — even if the LLM returns valid JSON with valid types, the values themselves might be nonsensical.
Catching Validation Errors
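A sketch of catching a `ValidationError` on deliberately bad data, wrapped in a helper so the errors are easy to inspect (model and payload continue the review example):

```python
from pydantic import BaseModel, Field, ValidationError

class ProductReview(BaseModel):
    product_name: str
    rating: int = Field(ge=1, le=10)
    pros: list[str]
    cons: list[str]

def collect_errors(payload: dict) -> list[tuple]:
    """Return (field, message) pairs for every constraint violation."""
    try:
        ProductReview(**payload)
        return []
    except ValidationError as exc:
        return [(err["loc"][0], err["msg"]) for err in exc.errors()]

bad_payload = {
    "product_name": "Everlight Desk Lamp",
    "rating": 15,          # violates le=10
    "pros": ["bright"],
    # "cons" is missing entirely
}
errors = collect_errors(bad_payload)
```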
Two errors caught in one pass: rating exceeds the maximum, and cons is missing entirely. In a production pipeline, you'd catch this ValidationError, log it, and either retry the LLM call or fall back to a default. The error messages are specific enough for automated handling — you get the field name, the constraint that was violated, and the invalid value.
Combining Pydantic Models with OpenAI Structured Output
Here is where the pieces snap together. You define a Pydantic model, convert it to a JSON Schema for the API call, then validate the response with the same model. One source of truth for both the LLM constraint and the Python validation.
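A sketch of the round trip, with the API call elided (field descriptions and names are illustrative):

```python
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    product_name: str = Field(description="Product as named in the review")
    rating: int = Field(ge=1, le=10, description="Overall rating from 1 to 10")
    pros: list[str] = Field(description="Positive points mentioned")
    cons: list[str] = Field(description="Negative points mentioned")

# One class, used twice: once to generate the schema...
schema = ProductReview.model_json_schema()

# ...and once to validate the response (requires the client from setup):
# response = client.chat.completions.create(..., response_format=...)
# review = ProductReview.model_validate_json(response.choices[0].message.content)
```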
Pydantic's .model_json_schema() generates the JSON Schema automatically from your type annotations. Field descriptions become "description" entries in the schema, which helps the model understand what each field should contain. No hand-writing JSON Schema — the Python class is the single source of truth.
To use this with OpenAI's strict structured output, you need to adapt the schema slightly. Strict mode requires "additionalProperties": false and all properties in "required". Here is a helper that does the conversion:
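A sketch of that helper. It handles flat models and directly nested object/array schemas; models that reference other models produce `$ref`/`$defs` entries that would need extra resolution, which I omit here:

```python
def to_strict_schema(model_cls) -> dict:
    """Wrap a Pydantic model's JSON Schema for OpenAI strict mode:
    every property goes into "required", additionalProperties is
    set to false, recursively for nested objects and arrays."""
    def tighten(node: dict) -> dict:
        node = dict(node)
        if node.get("type") == "object" and "properties" in node:
            node["additionalProperties"] = False
            node["required"] = list(node["properties"].keys())
            node["properties"] = {
                key: tighten(value) for key, value in node["properties"].items()
            }
        if node.get("type") == "array" and isinstance(node.get("items"), dict):
            node["items"] = tighten(node["items"])
        return node

    return {
        "type": "json_schema",
        "json_schema": {
            "name": model_cls.__name__,
            "strict": True,
            "schema": tighten(model_cls.model_json_schema()),
        },
    }
```

Usage: `response_format=to_strict_schema(ProductReview)` in the `create()` call.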
Create a Pydantic model called JobPosting that validates job posting data extracted by an LLM. The model should have:
- title (string, required)
- company (string, required)
- salary_min (integer, must be >= 0, required)
- salary_max (integer, must be >= 0, required)
- remote (boolean, required)
- required_skills (list of strings, required)

Then write a function parse_job_posting(data: dict) that:
1. Tries to create a JobPosting from the dictionary
2. Returns the JobPosting object if validation succeeds
3. Returns None if validation fails
Also add a validation check: if salary_max < salary_min, it should fail validation. Use Pydantic's model_validator for this.
The Instructor Library — Pydantic-Native Structured Output
So far we have been building the pipeline ourselves: define a Pydantic model, convert it to JSON Schema, make the API call, parse the JSON, validate with Pydantic. The Instructor library wraps all of this into a single function call. You pass a Pydantic model as response_model, and Instructor handles the schema generation, API call, parsing, validation, and — crucially — automatic retries when validation fails.
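A sketch of the basic pattern, assuming `pip install instructor`; the model fields and prompt are illustrative:

```python
from typing import List

from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    interests: List[str]

def extract_profile(text: str) -> UserProfile:
    """Requires the instructor and openai packages plus an API key."""
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserProfile,  # Instructor builds the schema and validates
        messages=[{"role": "user", "content": f"Extract the user profile: {text}"}],
    )

# profile = extract_profile(bio_text)  # typed UserProfile, not a dict
```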
That's it. No json.loads(), no manual schema conversion, no validation step. The return value is a typed Pydantic object. If you hover over profile.name in your IDE, you get str. If you try profile.age + "hello", your type checker flags it immediately.
Automatic Retries on Validation Failure
This is the feature that made me switch to Instructor for production work. When the LLM returns data that fails Pydantic validation, Instructor automatically retries the request with the validation error appended to the messages. The model sees what went wrong and corrects itself. You set max_retries and let it handle the loop.
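A sketch, assuming a stock-mention extraction task; the validator is ordinary Pydantic, and Instructor drives the retry loop:

```python
from pydantic import BaseModel, field_validator

class StockMention(BaseModel):
    company: str
    ticker: str

    @field_validator("ticker")
    @classmethod
    def ticker_uppercase(cls, value: str) -> str:
        if value != value.upper():
            raise ValueError(f"Ticker must be uppercase, got '{value}'")
        return value

def extract_stock(text: str) -> StockMention:
    """On validation failure, Instructor resends the request with the
    error message appended, up to max_retries times."""
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=StockMention,
        max_retries=3,
        messages=[{"role": "user", "content": f"Extract the stock mention: {text}"}],
    )
```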
If the model returns ticker: "aapl" (lowercase), Pydantic's field_validator rejects it. Instructor catches the ValidationError, appends a message like "Validation error: Ticker must be uppercase, got 'aapl'" to the conversation, and resends. On the second attempt the model usually gets it right. Three retries covers virtually every edge case I've encountered.
Extracting Lists of Objects
A common pattern: extract multiple structured items from a single text. Instructor handles this cleanly with a wrapper model containing a list field.
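A sketch of the wrapper-model pattern (field names are illustrative):

```python
from typing import List, Optional

from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: Optional[str] = None   # missing info stays None, no KeyError
    phone: Optional[str] = None

class ContactList(BaseModel):
    contacts: List[Contact]

def extract_contacts(text: str) -> List[Contact]:
    """Requires instructor + openai and an API key."""
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=ContactList,
        messages=[{"role": "user", "content": f"Extract every contact: {text}"}],
    )
    return result.contacts
```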
The Optional[str] = None pattern handles missing information gracefully. Not every contact has both email and phone. The model fills in what it finds and leaves the rest as None. Your code checks with a simple if contact.email — no KeyError, no missing-field surprises.
Choosing the Right Approach for Your Project
After working with all three approaches in production systems, here is my decision framework:
| Situation | Recommended Approach |
|---|---|
| Quick prototype, non-critical | Prompt-based JSON |
| Simple schema, one-off extraction | JSON mode ("type": "json_object") |
| Production system, flat schemas | Strict structured output ("type": "json_schema") |
| Complex/nested schemas, needs validation | Pydantic + strict structured output |
| Multi-provider, needs retries, complex validation | Instructor library |
For most projects, I start with strict structured output and Pydantic validation. If I find myself writing retry logic or fighting with edge cases, I upgrade to Instructor. The prompt-only approach is fine for Jupyter notebooks and one-off scripts — just don't deploy it.
Real-World Example: Extracting Structured Data from Emails
Let me put everything together with a realistic example. You have incoming customer emails and need to route them to the right department with structured metadata. This is exactly the kind of task I've built multiple times in production.
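Here is a sketch of the pipeline. The departments, field names, and prompt are my own illustration of the shape such a system takes; the enum constrains routing to known destinations and the `Field` bounds catch nonsense urgency values:

```python
from enum import Enum

from pydantic import BaseModel, Field

class Department(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    SALES = "sales"
    OTHER = "other"

class EmailTriage(BaseModel):
    department: Department
    urgency: int = Field(ge=1, le=5)  # 1 = low, 5 = drop everything
    summary: str
    customer_sentiment: str

def triage_email(body: str) -> EmailTriage:
    """Requires instructor + openai and an API key."""
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=EmailTriage,
        max_retries=2,
        messages=[
            {"role": "system", "content": "Triage the customer email for routing."},
            {"role": "user", "content": body},
        ],
    )

# triage = triage_email(raw_email)
# queue[triage.department].put(triage)  # routing never sees a surprise value
```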
This pattern handles hundreds of emails per minute. The structured output guarantees that your routing logic downstream never gets a surprise field name or missing value. The Pydantic validation catches any edge cases where the model might produce technically valid JSON that doesn't make business sense.
Write a function called validate_and_clean that takes a list of dictionaries (simulating batch LLM output) and returns a tuple of (valid_items, error_count).
Each dictionary represents a book extracted by an LLM. Create a Pydantic model called Book with:
- title (string)
- author (string)
- year (integer, between 1000 and 2030 inclusive)
- genre (string)
- page_count (integer, must be > 0)

The function should:
1. Try to validate each dictionary as a Book
2. Collect all valid Book objects
3. Return (valid_books, error_count) where valid_books is a list of validated Book objects and error_count is the number of items that failed validation.
Common Mistakes and How to Fix Them
Mistake 1: Forgetting "JSON" in the System Message
When using "type": "json_object", OpenAI requires the word "JSON" to appear somewhere in the messages. This is a safety check — it prevents accidental activation of JSON mode. With "type": "json_schema" (strict mode), this requirement does not apply because the schema itself makes the intent clear.
Mistake 2: Not Setting additionalProperties to False
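In strict mode, every object in your schema must set "additionalProperties": false — the API rejects a schema that omits it before generating anything. A minimal illustration:

```python
# Strict mode rejects this schema: additionalProperties is not set,
# so the API returns an error instead of generating output.
bad_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# Adding the flag fixes it.
good_schema = {**bad_schema, "additionalProperties": False}
```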
Mistake 3: Using Optional Fields Wrong in Strict Mode
In strict mode, every property must be in the required list. You cannot make a field optional by omitting it from required. Instead, use a union type with null:
```
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "nickname": {"type": "string"}
  },
  "required": ["name"],  # nickname optional
  "additionalProperties": false
}
# Error: strict mode requires all
# properties in "required"
```

```
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "nickname": {"type": ["string", "null"]}
  },
  "required": ["name", "nickname"],
  "additionalProperties": false
}
# Works: nickname is always present
# but can be null
```

Frequently Asked Questions
Does structured output cost more tokens?
Structured output uses the same pricing as regular completions — you pay per input and output token. The JSON Schema is included in the system prompt tokens, so very complex schemas add a small cost. In practice, the schema overhead is negligible compared to the actual content. OpenAI caches schemas across requests, so repeated calls with the same schema don't reprocess it.
Can I use structured output with streaming?
Yes. Both json_object and json_schema modes work with streaming. You receive JSON tokens incrementally and parse the complete object when the stream finishes. Instructor also supports streaming with partial validation — you can process fields as they arrive.
What happens if the LLM cannot fill a required field?
In strict mode, the model must provide a value for every required field. If the source text does not contain enough information, the model will generate its best guess or a placeholder (like an empty string or null for nullable fields). To handle this cleanly, make fields that might be missing nullable ({"type": ["string", "null"]} in JSON Schema, or Optional[str] = None in Pydantic) and check for None in your application code.
Is Instructor worth the extra dependency?
For simple schemas with a single LLM provider, Pydantic plus OpenAI's built-in structured output is sufficient. Instructor becomes valuable when you need automatic retries on validation failure, multi-provider support (switch between OpenAI, Anthropic, and Gemini without changing your extraction code), or complex nested schemas where the manual schema conversion gets tedious. If you are building a system that extracts structured data as a core feature, Instructor pays for itself in reduced boilerplate.
Summary
Structured output transforms LLMs from text generators into data extraction engines. The progression is clear: prompt-based JSON is quick but fragile, JSON mode guarantees valid syntax, strict schemas guarantee correct structure, and Pydantic adds Python-level type safety with meaningful error messages. The Instructor library ties everything together with automatic retries and multi-provider support.
The pattern you will use most often: define a Pydantic model with Field constraints, convert it to a strict JSON Schema for the API call, and validate the response with the same model. One class, three jobs — schema generation, API constraint, and data validation.