Output Formatting Masterclass: Make LLMs Output JSON, XML, Markdown, and Custom Formats
You ask an LLM to return a list of products as JSON. It responds with a lovely paragraph of prose, a few bullet points, and maybe some JSON buried inside a markdown code fence. Your json.loads() call explodes. You have been there — I certainly have, probably a hundred times before I figured out the patterns that actually work.
This tutorial shows you how to make LLMs reliably output JSON, XML, Markdown tables, and custom-delimited formats. More importantly, you will learn how to parse each format safely and what to do when the model inevitably gets creative with your instructions.
Why Output Format Matters More Than You Think
Every time you build an application on top of an LLM, you hit the same wall: the model produces text, but your code needs data. A chatbot can get away with free-form responses. But the moment you need to feed the LLM's answer into a database, render it in a UI, or chain it into another API call, you need structure.
I think of output formatting as the contract between your prompt and your parser. Get the contract right and your pipeline runs cleanly. Get it wrong and you spend more time writing error-recovery code than you spent on the actual feature.
The formats we will cover, and when each one shines:

- JSON: the default for APIs, databases, and anything another program consumes
- XML: deeply hierarchical data, where attributes and nesting carry meaning
- Custom delimiters: the simplest, most reliable option for extracting a handful of fields
- Markdown (especially tables): output that must be both parseable and human-readable
Throughout the examples, I set temperature=0.2. Lower temperature means less creative variation, which is exactly what you want when you need the model to follow a strict format. We will talk more about when to adjust this later.
JSON via Prompting — The Most Common Format
JSON is the workhorse of structured LLM output. Every language can parse it, every API speaks it, and models are trained on enormous amounts of it. But asking for JSON and getting clean JSON are two different things.
The naive approach — just saying "respond in JSON" — fails more often than you would expect. The model wraps the JSON in a markdown code fence, adds an introductory sentence, or produces keys that do not match your schema. Here is a prompt pattern I have found reliable across GPT-4o, Claude, and Gemini:
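A sketch of such a prompt (the product schema here is a made-up example; adapt the fields to your task):

```python
# Illustrative prompt template; the product schema is an assumption for this example.
PRODUCT_PROMPT = """Extract the products mentioned in the text below.

Return JSON matching EXACTLY this schema:
{
  "products": [
    {
      "name": "<string>",
      "category": "electronics"|"clothing"|"food"|"other",
      "price": <float or null>
    }
  ]
}

Rules:
- Return ONLY the JSON object.
- Do NOT wrap the output in markdown code fences.
- Do NOT add any explanation before or after the JSON.

Text: {text}"""

def build_prompt(text: str) -> str:
    # Plain string replacement so the JSON braces in the template stay literal
    return PRODUCT_PROMPT.replace("{text}", text)
```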
Three things make this prompt work: showing the exact schema (not just describing it), using the pipe notation for enumerations, and the explicit "no markdown fences" instruction. Without that last rule, most models wrap the output in a markdown code fence, which breaks json.loads().
Parsing JSON Safely
Even with a good prompt, you should never trust that the response is valid JSON. Models occasionally add trailing commas, use single quotes, or sneak in a comment. Here is a defensive parsing function that handles the most common failure modes:
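A sketch of such a function, standard library only (your production version may need more cases):

```python
import json
import re

def parse_llm_json(response: str):
    """Defensively parse JSON out of an LLM response."""
    text = response.strip()

    # 1. Strip a markdown code fence if present
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        text = fence.group(1).strip()

    # 2. Try a direct parse first; this is the happy path
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 3. Fall back to the outermost {...} or [...] span, ignoring
    #    any prose the model added before or after it
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start = text.find(open_ch)
        end = text.rfind(close_ch)
        if start != -1 and end > start:
            try:
                return json.loads(text[start : end + 1])
            except json.JSONDecodeError:
                continue

    raise ValueError(f"No parseable JSON found in response: {text[:200]!r}")
```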
This parser handles three common issues: markdown code fences around the JSON, extra text before or after the JSON object, and JSON arrays instead of objects. I use a version of this function in every LLM project I build.
Batch JSON — Multiple Objects in One Call
Sometimes you need the model to return multiple structured objects — say, analyzing a batch of reviews at once. Asking for a JSON array works, but you need to be explicit about it:
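For example, with three reviews and an illustrative sentiment schema:

```python
# The reviews and the schema are illustrative placeholders.
REVIEWS = [
    "Great phone, battery lasts all day.",
    "Screen cracked after a week. Disappointed.",
    "Decent value for the price.",
]

numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(REVIEWS))

batch_prompt = f"""Analyze each review below. Return a JSON ARRAY with one object
per review, in the same order, matching EXACTLY this schema:

[
  {{"review_number": <int>, "sentiment": "positive"|"negative"|"neutral", "score": <float 0-1>}}
]

Return ONLY the JSON array, with no prose and no markdown fences.

Reviews:
{numbered}"""
```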
The model returns a list of three objects, each with the exact keys we specified. Batch processing like this is cheaper than making three separate API calls because you pay for fewer prompt tokens (the system message and instructions are only sent once).
Practice exercise: write a function extract_fields(text) that takes a string containing JSON (possibly wrapped in markdown code fences) and returns a dictionary with only the keys "name", "age", and "city". If the JSON contains extra keys, ignore them. If any of the three required keys are missing, set their value to None.
The function should:
1. Strip markdown code fences if present
2. Parse the JSON
3. Return a dict with exactly three keys: name, age, city
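One possible solution sketch:

```python
import json
import re

def extract_fields(text: str) -> dict:
    # 1. Strip markdown code fences if present
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        text = fence.group(1)

    # 2. Parse the JSON
    data = json.loads(text.strip())

    # 3. Keep exactly the three required keys; .get() defaults missing ones to None
    return {key: data.get(key) for key in ("name", "age", "city")}
```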
XML for Hierarchical Data
JSON is great for flat or lightly nested structures. But when your data is deeply hierarchical — think a document outline, a conversation tree, or configuration with attributes and metadata — XML can actually be a better fit. This might sound old-fashioned, but there is a practical reason: XML tags are self-closing and unambiguous, which makes partial or malformed output easier to recover from.
The key advantage of XML over JSON for LLM output is that attributes and content are separate. A JSON object uses the same mechanism (keys) for metadata and data. An XML element can carry metadata in attributes and data in its text content. That distinction matters when you are building document-processing pipelines.
Parsing XML in Python is straightforward with the built-in xml.etree.ElementTree module — no pip install needed:
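A short example, using a made-up company schema:

```python
import xml.etree.ElementTree as ET

# Sample of the kind of XML we might ask the model for; the schema is illustrative.
xml_output = """<company>
  <department name="Engineering" headcount="42">
    <role>Backend Developer</role>
    <role>SRE</role>
  </department>
  <department name="Design" headcount="7">
    <role>UX Designer</role>
  </department>
</company>"""

root = ET.fromstring(xml_output)
departments = []
for dept in root.findall("department"):
    departments.append({
        "name": dept.get("name"),            # metadata lives in attributes
        "headcount": int(dept.get("headcount")),
        "roles": [role.text for role in dept.findall("role")],  # data lives in text content
    })
```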
Notice how cleanly the attributes map to metadata (department name, headcount) while the nesting represents the hierarchy (company contains departments, departments contain roles). This would work in JSON too, but the XML version reads more naturally when the hierarchy is the point.
Custom Delimiters — The Simplest Reliable Format
I reach for custom delimiters when I need to extract 3-6 fields and do not want the overhead of JSON schema definitions. The triple-equals pattern (===FIELD===) works well because it is visually distinctive, unlikely to appear in normal text, and trivial to parse with a regex or simple string split.
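A sketch of a parser for this pattern:

```python
import re

def parse_delimited(text: str) -> dict:
    """Parse ===FIELD=== delimited output into a dict of field -> content."""
    fields = {}
    current_field = None
    buffer = []

    for line in text.splitlines():
        marker = re.fullmatch(r"===([A-Z_]+)===", line.strip())
        if marker:
            # Hitting a new marker: save the previous field first
            if current_field is not None:
                fields[current_field] = "\n".join(buffer).strip()
            current_field = marker.group(1).lower()
            buffer = []
        elif current_field is not None:
            buffer.append(line)

    # Save the last field once the text runs out
    if current_field is not None:
        fields[current_field] = "\n".join(buffer).strip()
    return fields
```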
The parser walks through the lines, tracking which field it is currently inside. When it hits a new ===FIELD=== marker, it saves the previous field and starts a new one. This handles multi-line values gracefully — if the model puts a paragraph between two markers, you get the whole paragraph.
Practice exercise: write a function parse_sections(text) that parses text using ###SECTION_NAME### delimiters (note: three # on each side). The function should return a dictionary mapping lowercase section names to their content (stripped of leading/trailing whitespace). Ignore any text before the first delimiter.
Example input:
###TITLE###
My Report
###SUMMARY###
This is a brief summary.
It has two lines.
###END###

Expected output: {"title": "My Report", "summary": "This is a brief summary.\nIt has two lines."}
The ###END### marker signals the end of parsing — do not include it as a key.
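A solution sketch, using the same line-walking idea as the ===FIELD=== parser:

```python
import re

def parse_sections(text: str) -> dict:
    sections = {}
    current = None
    buffer = []

    for line in text.splitlines():
        marker = re.fullmatch(r"###([A-Z_]+)###", line.strip())
        if marker:
            # Save whatever section we were in before this marker
            if current is not None:
                sections[current] = "\n".join(buffer).strip()
            if marker.group(1) == "END":
                break  # END terminates parsing and is never a key
            current = marker.group(1).lower()
            buffer = []
        elif current is not None:  # ignore text before the first delimiter
            buffer.append(line)
    else:
        # Input ended without ###END###; save the section in progress
        if current is not None:
            sections[current] = "\n".join(buffer).strip()

    return sections
```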
Markdown — Structured Output That Humans Can Read
Markdown sits in a sweet spot: it has enough structure for light parsing (headers, tables, lists) but is still comfortable for humans to read. When your output needs to be both machine-parseable and directly displayable — think reports, summaries, or documentation — Markdown is the right choice.
Markdown tables are especially useful. They are the one format where I find models are almost always reliable, probably because LLMs have seen millions of Markdown tables in training data.
Parsing a Markdown table into a list of dictionaries takes just a few lines:
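For example (assuming well-formed rows with leading and trailing pipes):

```python
def parse_markdown_table(table: str) -> list[dict]:
    rows = [line.strip() for line in table.strip().splitlines() if line.strip()]
    # Split each row on pipes, dropping the empty edges from the leading/trailing |
    cells = [[c.strip() for c in row.strip("|").split("|")] for row in rows]
    headers = cells[0]
    # cells[1] is the |---|---| separator row, so skip it
    return [dict(zip(headers, row)) for row in cells[2:]]
```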
Each row becomes a dictionary with the header names as keys. This is handy when you need to render the data in a different format — you could convert these dicts to a pandas DataFrame, a CSV, or pass them to a template.
Format Reliability Comparison — Which Format Fails Least?
After running thousands of structured output requests across GPT-4o, GPT-4o-mini, Claude, and Gemini, I have a rough reliability ranking. This is practical experience, not a controlled benchmark — but the patterns are consistent enough to be useful.
| Format | Reliability | Parsing Difficulty | Best For |
|---|---|---|---|
| Custom delimiters | Highest | Easiest | 3-6 simple fields |
| Markdown table | High | Easy | Tabular comparisons |
| JSON (with schema) | High | Medium | APIs, databases, structured data |
| JSON (without schema) | Medium | Medium | Quick prototyping |
| XML | Medium | Medium | Hierarchical documents |
| Free-form with structure | Low | Hard | Avoid in production |
Custom delimiters are the most reliable because they are the simplest — the model just needs to put text between markers. JSON with an explicit schema is close behind, especially with temperature=0. Markdown tables are surprisingly reliable because models produce them constantly during training.
Hardening Your Output Formatting
Even the best prompts fail sometimes. Production code needs fallback strategies. These are the techniques I use to push format compliance from ~90% to ~99%.
Technique 1: Temperature and Top-P
For structured output, set temperature between 0.0 and 0.3. Higher temperatures increase creativity — exactly the opposite of what you want when asking for precise formatting. If you need varied content within a strict format, keep temperature at 0.2 and use top_p=0.9.
Technique 2: Few-Shot Examples
Showing the model a completed example is one of the most effective ways to get consistent formatting. The model mirrors the structure it sees:
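For example (the contact-extraction schema is illustrative):

```python
# One worked example in the prompt; the model mirrors its keys and value types.
few_shot_prompt = """Extract contact info as JSON.

Example input: "Reach out to Jane Doe at jane@example.com or 555-0100."
Example output: {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100"}

Now process this input. Return ONLY the JSON object:
"Contact Bob Smith via bob@example.org."
"""
```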
The model follows the example's exact key names, value types, and structure. One example is usually enough for simple schemas. For complex schemas with edge cases, two or three examples work better.
Technique 3: Retry with Correction
When parsing fails, you can send the malformed output back to the model and ask it to fix the formatting. This succeeds roughly 95% of the time on the retry:
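A sketch of such a retry loop; call_model here is a placeholder for whatever function sends a prompt to your LLM and returns its text:

```python
import json

def parse_with_retry(response_text: str, call_model, max_retries: int = 2):
    """Try to parse JSON; on failure, ask the model to fix its own output.

    `call_model` is a stand-in for your real API client: it takes a prompt
    string and returns the model's reply as a string.
    """
    text = response_text
    for attempt in range(max_retries + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise
            # Send the error message and the malformed output back for correction
            text = call_model(
                f"The following text was supposed to be valid JSON but fails "
                f"to parse with this error: {err}\n\n"
                f"Text:\n{text}\n\n"
                f"Return ONLY the corrected JSON, nothing else."
            )
```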
The retry loop tries to parse the response, and if it fails, sends the error message and the malformed text back to the model for correction. This is cheap — the correction call uses very few tokens — and dramatically improves reliability.
Common Mistakes and How to Fix Them
These are the formatting failures I see most often in code reviews and Slack channels. Each one is easy to fix once you know the pattern.
Mistake 1: asking for JSON without a schema.

```python
# Bad: no schema, so the model picks its own keys
prompt = "Analyze this review and give me JSON."

# Good: an explicit schema pins down the keys and value types
prompt = """Analyze this review. Return JSON:
{"sentiment": "positive"|"negative", "score": <float 0-1>}
Return ONLY the JSON object."""
```

Without an explicit schema, the model invents its own key names every time. One call returns "sentiment", the next returns "feeling", the next returns "opinion". Your downstream code breaks on every variation.
Mistake 2: trusting json.loads() directly.

```python
data = json.loads(response)      # crashes on markdown fences
data = parse_llm_json(response)  # handles fences and extra text
```

Even GPT-4o wraps JSON in code fences roughly 15-20% of the time, depending on the prompt. A raw json.loads() call is a ticking time bomb in any production system.
Real-World Example: Combining Formats in a Data Pipeline
Real-world applications rarely use just one format. A pipeline might need JSON for an API, Markdown for email, and tagged items for a task tracker — all from the same LLM call. Custom delimiters work as the outer container, with each section using whatever inner format fits best.
Here is a practical example. Imagine you are building a tool that takes raw meeting notes and produces structured output for three different consumers:
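A sketch of what that can look like; the section names and metrics schema are illustrative:

```python
import json
import re

# Outer container: ===SECTION=== delimiters. Inner formats: JSON, Markdown, plain lines.
pipeline_prompt = """Summarize the meeting notes below into three sections:

===METRICS===
A JSON object: {"decisions_made": <int>, "attendees": <int>, "follow_up_needed": <true|false>}

===EMAIL_SUMMARY===
A short Markdown summary with a header and bullet points.

===ACTION_ITEMS===
One action item per line, formatted as: OWNER: task description

Use the ===SECTION=== markers exactly as shown. Meeting notes:
{notes}"""

def split_sections(text: str) -> dict:
    # re.split with a capture group yields [before, name1, body1, name2, body2, ...]
    parts = re.split(r"^===([A-Z_]+)===$", text, flags=re.MULTILINE)
    return {parts[i].lower(): parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

# Simulated model response, for demonstration
fake_response = """===METRICS===
{"decisions_made": 2, "attendees": 5, "follow_up_needed": true}
===EMAIL_SUMMARY===
## Weekly Sync
- Shipped the parser
===ACTION_ITEMS===
ANA: update the roadmap"""

sections = split_sections(fake_response)
metrics = sections["metrics"] and json.loads(sections["metrics"])  # -> API
email_md = sections["email_summary"]                               # -> email template
actions = sections["action_items"].splitlines()                    # -> task tracker
```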
One LLM call produces output for three systems. The JSON goes to an API, the Markdown goes to an email template, and the action items go to a task tracker. The custom delimiters let us extract each section and parse it with the appropriate method.
Frequently Asked Questions
Should I use the OpenAI `response_format` parameter instead of prompt-based formatting?
The response_format: { type: 'json_object' } parameter is a great option when available — it guarantees valid JSON from the API level. But it is provider-specific (OpenAI and a few others), and it still does not guarantee your schema. You get valid JSON, but the keys and structure are whatever the model decides. The prompt-based techniques in this article work across all providers and give you schema control. In practice, I use both: response_format for the structural guarantee plus a schema in the prompt for key control.
What about Pydantic or structured outputs via function calling?
Function calling (also called tool use) and Pydantic-based structured outputs are the most reliable way to get schema-compliant JSON. They are covered in separate tutorials. The prompt-based approach in this article is valuable because it works with any model, including open-source models via Ollama or Hugging Face that may not support function calling. It is also simpler — you do not need to define Pydantic models or tool schemas.
How do I handle very long outputs that might get truncated?
If your expected output is large (>2000 tokens), set max_tokens explicitly to a high enough value. If the model hits the token limit mid-JSON, the output will be truncated and unparseable. For very large structured outputs, break the task into smaller chunks — process 10 items at a time instead of 100.
Does this work with open-source models like Llama or Mistral?
Yes. The prompt patterns work with any instruction-tuned model. Smaller models (7B-13B) are less reliable at following complex schemas, so I recommend simpler formats (custom delimiters or flat JSON) with smaller models and save nested JSON/XML for larger models (70B+ or GPT-4 class).