
Output Formatting Masterclass: Make LLMs Output JSON, XML, Markdown, and Custom Formats

Intermediate · 60 min · 2 exercises · 35 XP

You ask an LLM to return a list of products as JSON. It responds with a lovely paragraph of prose, a few bullet points, and maybe some JSON buried inside a markdown code fence. Your json.loads() call explodes. You have been there — I certainly have, probably a hundred times before I figured out the patterns that actually work.

This tutorial shows you how to make LLMs reliably output JSON, XML, Markdown tables, and custom-delimited formats. More importantly, you will learn how to parse each format safely and what to do when the model inevitably gets creative with your instructions.

Why Output Format Matters More Than You Think

Every time you build an application on top of an LLM, you hit the same wall: the model produces text, but your code needs data. A chatbot can get away with free-form responses. But the moment you need to feed the LLM's answer into a database, render it in a UI, or chain it into another API call, you need structure.

I think of output formatting as the contract between your prompt and your parser. Get the contract right and your pipeline runs cleanly. Get it wrong and you spend more time writing error-recovery code than you spent on the actual feature. This builds directly on the prompt engineering fundamentals we cover elsewhere — here we focus specifically on format control.

Setup: install openai and create a reusable helper

We set temperature=0.2 because lower temperature means less creative variation — exactly what you want when you need the model to follow a strict format. Our sampling parameters tutorial covers how temperature, top_p, and other knobs affect output consistency.

JSON via Prompting — The Most Common Format

JSON is the workhorse of structured LLM output. Every language can parse it, every API speaks it, and models are trained on enormous amounts of it. If you have worked through our first AI app tutorial, you have already seen the basics of sending prompts and getting responses. But asking for JSON and getting clean JSON are two different things.

The naive approach — just saying "respond in JSON" — fails more often than you would expect. The model wraps the JSON in a markdown code fence, adds an introductory sentence, or produces keys that do not match your schema. Here is a prompt pattern I have found reliable across GPT-4o, Claude, and Gemini:

A reliable JSON prompt pattern
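One way to phrase such a prompt; the schema fields and the sample review are illustrative:

```python
review = "Battery life is great but the screen scratches easily."

prompt = f"""Analyze the product review below.

Return a JSON object with EXACTLY this schema:
{{
  "sentiment": "positive" | "negative" | "mixed",
  "score": <float between 0 and 1>,
  "summary": "<one sentence>"
}}

Rules:
- Return ONLY the JSON object.
- Do NOT wrap the output in markdown code fences.
- Do NOT add any text before or after the JSON.

Review: {review}"""
```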

Three things make this prompt work: showing the exact schema (not just describing it), using the pipe notation for enumerations, and the explicit "no markdown fences" instruction. Without that last rule, most models wrap the output in a ```json ... ``` code fence, which breaks json.loads().

Parsing JSON Safely

Even with a good prompt, you should never trust that the response is valid JSON. Models occasionally add trailing commas, use single quotes, or sneak in a comment. Here is a defensive parsing function that handles the most common failure modes:

Defensive JSON parser for LLM output
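A sketch of such a parser. This version covers markdown fences, surrounding prose, and top-level arrays; quirks like trailing commas or single quotes would still need a retry:

```python
import json
import re


def parse_llm_json(text: str):
    """Parse JSON from LLM output, tolerating common formatting quirks.

    Handles: markdown code fences, extra prose before/after the JSON,
    and top-level arrays as well as objects.
    """
    # 1. Strip markdown code fences like ```json ... ```
    text = re.sub(r"```(?:json)?", "", text).strip()
    # 2. Locate the outermost JSON object or array in the remaining text
    match = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object or array found in: {text[:80]!r}")
    return json.loads(match.group(1))
```

For example, `parse_llm_json('Here you go:\n```json\n{"a": 1}\n```')` returns `{"a": 1}` instead of crashing.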

This parser handles three common issues: markdown code fences around the JSON, extra text before or after the JSON object, and JSON arrays instead of objects. I use a version of this function in every LLM project I build.

Batch JSON — Multiple Objects in One Call

Sometimes you need the model to return multiple structured objects — say, analyzing three reviews at once. Asking for a JSON array works, but you need to be explicit about it:

Getting a JSON array from the model
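A prompt along those lines, with three sample reviews (illustrative data):

```python
reviews = [
    "Arrived quickly, works as advertised.",
    "Stopped charging after two weeks.",
    "Decent value, but the manual is useless.",
]

# Number the reviews so the model can match output objects to inputs
numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))

prompt = f"""Analyze each review below.

Return a JSON ARRAY with one object per review, in the same order:
[
  {{"id": <int>, "sentiment": "positive" | "negative" | "mixed", "summary": "<one sentence>"}}
]

Return ONLY the JSON array, with exactly {len(reviews)} elements.
Do not wrap the output in markdown code fences.

Reviews:
{numbered}"""
```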

The model returns a list of three objects, each with the exact keys we specified. Batch processing like this is cheaper than making three separate API calls because you pay for fewer prompt tokens (the system message and instructions are only sent once).

Exercise 1: Parse and Validate LLM JSON

Write a function extract_fields(text) that takes a string containing JSON (possibly wrapped in markdown code fences) and returns a dictionary with only the keys "name", "age", and "city". If the JSON contains extra keys, ignore them. If any of the three required keys are missing, set their value to None.

The function should:

1. Strip markdown code fences if present

2. Parse the JSON

3. Return a dict with exactly three keys: name, age, city


XML for Hierarchical Data

JSON is great for flat or lightly nested structures. But when your data is deeply hierarchical — think a document outline, a conversation tree, or configuration with attributes and metadata — XML can actually be a better fit. This might sound old-fashioned, but there is a practical reason: XML tags are self-closing and unambiguous, which makes partial or malformed output easier to recover from.

The following prompt asks the model to extract an organizational structure from a company description and return it as nested XML. Each <department> element carries its name and headcount as attributes, and contains <role> child elements with title and seniority level. This maps naturally to a tree structure that would be awkward to represent in flat JSON.

Prompting for XML output
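A sketch of that prompt; the company description and the exact attribute names are illustrative:

```python
company_description = (
    "Acme Corp has an Engineering department of 40 people, including "
    "senior backend engineers and junior frontend engineers, and a "
    "Sales department of 12 with account executives."
)

prompt = f"""Extract the organizational structure from the text below.

Return ONLY XML in exactly this shape (no markdown fences, no prose):
<company>
  <department name="..." headcount="...">
    <role title="..." seniority="junior|mid|senior" />
  </department>
</company>

Text: {company_description}"""
```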

Parsing XML in Python is straightforward with the built-in xml.etree.ElementTree module — no pip install needed. The parser below first strips any markdown fences the model might add, then uses a regex to locate the root XML element. Once parsed, root.findall('department') walks the tree and .get('name') pulls attributes. If you have worked through our OpenAI API tutorial, the request pattern will look familiar.

Parsing XML with ElementTree
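A version of that parser, matching the <department>/<role> shape described above:

```python
import re
import xml.etree.ElementTree as ET


def parse_org_xml(text: str):
    """Parse <company> XML from LLM output into a list of department dicts."""
    text = re.sub(r"```(?:xml)?", "", text)  # strip markdown fences
    # Locate the root element, ignoring any prose around it
    match = re.search(r"<company>.*</company>", text, re.DOTALL)
    if match is None:
        raise ValueError("No <company> element found in output")
    root = ET.fromstring(match.group(0))
    departments = []
    for dept in root.findall("department"):
        departments.append({
            "name": dept.get("name"),
            "headcount": int(dept.get("headcount")),
            "roles": [
                {"title": r.get("title"), "seniority": r.get("seniority")}
                for r in dept.findall("role")
            ],
        })
    return departments
```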

Notice how cleanly the attributes map to metadata (department name, headcount) while the nesting represents the hierarchy (company contains departments, departments contain roles). This would work in JSON too, but the XML version reads more naturally when the hierarchy is the point.

Custom Delimiters — The Simplest Reliable Format

Sometimes JSON and XML are overkill. You need to pull out a title, an author, a date, and a few tags — four flat fields, no nesting. For that, I skip formal schemas entirely and use custom delimiters: distinctive markers like ===TITLE=== that tell the model exactly where each field starts and ends. The prompt below extracts metadata from a research article description using this pattern.

Using custom delimiters for structured extraction
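For example (the article description and the field set here are illustrative):

```python
article = (
    "In this 2023 paper, Smith and Lee survey transformer compression "
    "techniques, covering pruning, quantization, and distillation."
)

prompt = f"""Extract metadata from the article description below.

Use EXACTLY this output format:
===TITLE===
<the article title, or a short descriptive title>
===AUTHORS===
<comma-separated author names>
===DATE===
<publication year or date>
===TAGS===
<comma-separated topic tags>

Output nothing before the first marker or after the last field.

Description: {article}"""
```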

I reach for custom delimiters when I need to extract 3-6 fields and do not want the overhead of JSON schema definitions. The triple-equals pattern (===FIELD===) works well because it is visually distinctive, unlikely to appear in normal text, and trivial to parse with a regex or simple string split.

Parser for custom-delimited output
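A sketch of that parser:

```python
def parse_delimited(text: str) -> dict:
    """Parse ===FIELD=== delimited output into {field_name: content}."""
    fields = {}
    current = None  # name of the field we are currently inside
    buffer = []     # lines collected for the current field
    for line in text.splitlines():
        stripped = line.strip()
        # A new ===FIELD=== marker: save the previous field, start a new one
        if stripped.startswith("===") and stripped.endswith("===") and len(stripped) > 6:
            if current is not None:
                fields[current] = "\n".join(buffer).strip()
            current = stripped.strip("=").strip().lower()
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:  # flush the last field
        fields[current] = "\n".join(buffer).strip()
    return fields
```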

The parser walks through the lines, tracking which field it is currently inside. When it hits a new ===FIELD=== marker, it saves the previous field and starts a new one. This handles multi-line values gracefully — if the model puts a paragraph between two markers, you get the whole paragraph.

Exercise 2: Build a Delimiter Parser

Write a function parse_sections(text) that parses text using ###SECTION_NAME### delimiters (note: three # on each side). The function should return a dictionary mapping lowercase section names to their content (stripped of leading/trailing whitespace). Ignore any text before the first delimiter.

Example input:

###TITLE###
My Report
###SUMMARY###
This is a brief summary.
It has two lines.
###END###

Expected output: {"title": "My Report", "summary": "This is a brief summary.\nIt has two lines."}

The ###END### marker signals the end of parsing — do not include it as a key.


Markdown — Structured Output That Humans Can Read

Markdown sits in a sweet spot: it has enough structure for light parsing (headers, tables, lists) but is still comfortable for humans to read. When your output needs to be both machine-parseable and directly displayable — think reports, summaries, or documentation — Markdown is the right choice.

Markdown tables are especially useful. They are the one format where I find models are almost always reliable, probably because LLMs have seen millions of Markdown tables in training data. The prompt below asks the model to compare Django, Flask, FastAPI, and Tornado across five columns: framework name, best use case, learning curve, async support, and community size. Specifying the exact column headers prevents the model from inventing its own.

Getting a Markdown table from the model
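A prompt in that spirit (the exact wording is illustrative):

```python
prompt = """Compare the Python web frameworks Django, Flask, FastAPI, and Tornado.

Return ONLY a Markdown table with EXACTLY these columns:
| Framework | Best Use Case | Learning Curve | Async Support | Community Size |

One row per framework. No text before or after the table."""
```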

Parsing a Markdown table into a list of dictionaries is straightforward once you understand the structure. The parser splits each row by the pipe character (`|`), extracts the first row as column headers, skips the separator row (the `|---|---|` line), and zips each data row's cells with the headers to produce a dictionary per row. The result is a list you can feed to pandas, write to CSV, or pass to another API.

Parsing Markdown tables into dictionaries
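A compact version of that parser:

```python
def parse_markdown_table(text: str) -> list[dict]:
    """Parse a Markdown table into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        rows.append(cells)
    if not rows:
        return []
    headers = rows[0]
    # Zip each data row with the header row to build one dict per row
    return [dict(zip(headers, row)) for row in rows[1:]]
```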

Each row becomes a dictionary with the header names as keys. This is handy when you need to render the data in a different format — you could convert these dicts to a pandas DataFrame, a CSV, or pass them to a template.

Format Reliability Comparison — Which Format Fails Least?

After running thousands of structured output requests across GPT-4o, GPT-4o-mini, Claude, and Gemini, I have a rough reliability ranking. This is practical experience, not a controlled benchmark — but the patterns are consistent enough to be useful.

| Format | Reliability | Parsing Difficulty | Best For |
|---|---|---|---|
| Custom delimiters | Highest | Easiest | 3-6 simple fields |
| Markdown table | High | Easy | Tabular comparisons |
| JSON (with schema) | High | Medium | APIs, databases, structured data |
| JSON (without schema) | Medium | Medium | Quick prototyping |
| XML | Medium | Medium | Hierarchical documents |
| Free-form with structure | Low | Hard | Avoid in production |

Hardening Your Output Formatting

Even the best prompts fail sometimes. Production code needs fallback strategies. These are the techniques I use to push format compliance from ~90% to ~99%.

Technique 1: Temperature and Top-P

For structured output, set temperature between 0.0 and 0.3. Higher temperatures increase creativity — exactly the opposite of what you want when asking for precise formatting. If you need varied content within a strict format, keep temperature at 0.2 and use top_p=0.9.

Technique 2: Few-Shot Examples

Showing the model a completed example is one of the most effective ways to get consistent formatting — this is the few-shot prompting technique applied to output structure. The model mirrors the structure it sees:

Few-shot example for consistent JSON formatting
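A sketch of the technique, using a support-ticket schema as the illustrative example:

```python
few_shot_prompt = """Extract structured data from each support ticket.

Example:
Ticket: "My invoice from March is wrong, I was charged twice."
Output: {"category": "billing", "urgency": "medium", "summary": "Duplicate charge on March invoice"}

Now process this ticket. Return ONLY the JSON object, with the same keys as above.
Ticket: "The app crashes every time I open the settings page."
Output:"""
```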

The model follows the example's exact key names, value types, and structure. One example is usually enough for simple schemas. For complex schemas with edge cases, two or three examples work better.

Technique 3: Retry with Correction

When parsing fails, you can send the malformed output back to the model and ask it to fix the formatting. This succeeds roughly 95% of the time on the retry:

Auto-retry with correction for JSON output
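A sketch of that loop. It takes the ask helper (or any prompt-to-text callable) as a parameter, so it stays provider-agnostic:

```python
import json


def ask_json_with_retry(ask, prompt, max_retries=2):
    """Call the model; on a JSON parse failure, ask it to fix its own output.

    `ask` is any callable that takes a prompt string and returns the
    model's text (like the helper defined in the setup section).
    """
    text = ask(prompt)
    for _ in range(max_retries):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # Send the malformed output back with the parser's error message
            text = ask(
                f"The following text was supposed to be valid JSON but "
                f"failed to parse with this error: {err}\n\n"
                f"{text}\n\n"
                f"Return ONLY the corrected, valid JSON. No fences, no prose."
            )
    return json.loads(text)  # final attempt; raises if still invalid
```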

The retry loop tries to parse the response, and if it fails, sends the error message and the malformed text back to the model for correction. This is cheap — the correction call uses very few tokens — and dramatically improves reliability.

Common Mistakes and How to Fix Them

These are the formatting failures I see most often in code reviews and Slack channels. Each one is easy to fix once you know the pattern.

Vague format instruction
prompt = "Analyze this review and give me JSON."
Explicit schema with rules
prompt = """Analyze this review. Return JSON:
{"sentiment": "positive"|"negative", "score": <float 0-1>}
Return ONLY the JSON object."""

Without an explicit schema, the model invents its own key names every time. One call returns "sentiment", the next returns "feeling", the next returns "opinion". Your downstream code breaks on every variation.

Trusting raw output
data = json.loads(response)  # crashes on markdown fences
Defensive parsing
data = parse_llm_json(response)  # handles fences and extra text

Even GPT-4o wraps JSON in code fences roughly 15-20% of the time, depending on the prompt. A raw json.loads() call is a ticking time bomb in any production system.

Real-World Example: Combining Formats in a Data Pipeline

Real-world applications rarely use just one format. A pipeline might need JSON for an API, Markdown for email, and tagged items for a task tracker — all from the same LLM call. Custom delimiters work as the outer container, with each section using whatever inner format fits best.

The prompt below takes raw meeting notes and produces three outputs in a single API call, each wrapped in ===SECTION=== delimiters. The ===JSON_SUMMARY=== section generates a JSON object with date, attendees, and item count for an API. The ===MARKDOWN_REPORT=== section generates a human-readable Markdown summary for email. The ===ACTION_ITEMS=== section generates a plain-text list of tasks with owners and deadlines for a task tracker.

Multi-format pipeline: processing meeting notes
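A sketch of such a prompt, with illustrative meeting notes:

```python
meeting_notes = (
    "Met with design on 2024-03-12. Attendees: Priya, Marcus, Lena. "
    "Decided to ship dark mode in v2.1. Marcus to update the style guide "
    "by Friday; Lena to file accessibility tickets by Wednesday."
)

prompt = f"""Process the meeting notes below. Produce THREE sections,
each wrapped in its delimiter, in this order:

===JSON_SUMMARY===
{{"date": "<YYYY-MM-DD>", "attendees": ["..."], "action_item_count": <int>}}
===MARKDOWN_REPORT===
<a short human-readable Markdown summary with a heading and bullet points>
===ACTION_ITEMS===
<one task per line: "owner: task (deadline)">

Notes: {meeting_notes}"""
```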

Once we have the raw output, we split it into sections with parse_delimited() — the same function we built earlier. Then each section gets parsed with the appropriate method: parse_llm_json() for the JSON summary, plain text display for the Markdown, and a line split for the action items.

Parsing each section for its downstream consumer
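An end-to-end sketch of the routing step. The raw_output below is a mock model response, the inline parse_delimited is a compact stand-in for the fuller parser built in the custom-delimiters section, and production code would swap json.loads for the defensive JSON parser:

```python
import json
import re


def parse_delimited(text):
    """Minimal ===FIELD=== splitter (same idea as the fuller parser)."""
    parts = re.split(r"^===([A-Z_]+)===\s*$", text, flags=re.MULTILINE)
    names = parts[1::2]
    bodies = [b.strip() for b in parts[2::2]]
    return dict(zip((n.lower() for n in names), bodies))


# Mock model response for illustration
raw_output = """===JSON_SUMMARY===
{"date": "2024-03-12", "attendees": ["Priya", "Marcus", "Lena"], "action_item_count": 2}
===MARKDOWN_REPORT===
## Design Sync
- Dark mode ships in v2.1
===ACTION_ITEMS===
Marcus: update the style guide (Friday)
Lena: file accessibility tickets (Wednesday)"""

sections = parse_delimited(raw_output)

summary = json.loads(sections["json_summary"])        # -> API payload
report_md = sections["markdown_report"]                # -> email template
action_items = sections["action_items"].splitlines()   # -> task tracker
```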

One LLM call produces output for three systems. The JSON goes to an API, the Markdown goes to an email template, and the action items go to a task tracker. The custom delimiters let us extract each section and parse it with the appropriate method.


Frequently Asked Questions

Should I use the OpenAI response_format parameter instead?

The response_format: { type: 'json_object' } parameter is a great option when available — it guarantees valid JSON from the API level. But it is provider-specific (OpenAI and a few others), and it still does not guarantee your schema. You get valid JSON, but the keys and structure are whatever the model decides. The prompt-based techniques in this article work across all providers and give you schema control. In practice, I use both: response_format for the structural guarantee plus a schema in the prompt for key control. For more on native structured outputs, see our Structured Output from LLMs tutorial.


What about Pydantic or function calling for structured output?

Function calling (also called tool use) and Pydantic-based structured outputs are the most reliable way to get schema-compliant JSON. Our OpenAI Function Calling tutorial covers this in depth. The prompt-based approach in this article is valuable because it works with any model, including open-source models via Ollama or Hugging Face that may not support function calling. It is also simpler — you do not need to define Pydantic models or tool schemas.


How do I handle very long outputs that might get truncated?

If your expected output is large (>2000 tokens), set max_tokens explicitly to a high enough value. If the model hits the token limit mid-JSON, the output will be truncated and unparseable. For very large structured outputs, break the task into smaller chunks — process 10 items at a time instead of 100. Our context windows tutorial explains how to estimate token counts and budget your requests.


Does this work with open-source models like Llama or Mistral?

Yes. The prompt patterns work with any instruction-tuned model. Smaller models (7B-13B) are less reliable at following complex schemas, so I recommend simpler formats (custom delimiters or flat JSON) with smaller models and save nested JSON/XML for larger models (70B+ or GPT-4 class). See our sampling parameters tutorial for how temperature and top_p settings affect format compliance across model sizes.


References

  • OpenAI API documentation — Chat Completions response_format parameter
  • OpenAI documentation — Structured Outputs
  • Python documentation — json module
  • Python documentation — xml.etree.ElementTree
  • Anthropic documentation — Tool Use (Structured Output)
  • Google Gemini API — Structured Output
