LangChain Chains: Sequential, Parallel, Branching, and Retry Patterns
You have a working LCEL chain that prompts an LLM and parses the output. One pipe, one model, one result. But real applications need more: summarise a document then translate the summary, call three models at once and merge the answers, route customer questions to different chains based on topic, and gracefully retry when an API times out. These are the four chain patterns every LangChain application needs, and I use all four in every production project I build.
Why Chain Patterns Matter
A single prompt-model-parser pipeline handles maybe 20% of real-world LLM use cases. The moment you need to feed one model's output into another, compare answers from multiple providers, or handle the inevitable API failures, you need composition patterns.
LangChain's LCEL (LangChain Expression Language) gives you four composable patterns that cover virtually every chain architecture I've encountered:
| Pattern | What It Does | When to Use |
|---|---|---|
| Sequential | Step A → Step B → Step C | Multi-step processing (summarise → translate) |
| Parallel | Steps A, B, C run simultaneously | Fan-out (ask 3 models, merge answers) |
| Branching | Route to different chains by condition | Topic-based routing, difficulty-based prompts |
| Retry / Fallback | If Step A fails, try Step B | API reliability, model failover |
Each pattern is a Runnable that you can compose with the pipe operator (|). Once you understand these four, you can build arbitrarily complex chains by nesting them.
Sequential Chains — Multi-Step Pipelines
The sequential chain is the simplest pattern: output from step A becomes input to step B. You've already used this if you've piped a prompt into a model into a parser. But multi-step sequential chains go further — they let you chain entire workflows where each step transforms the data for the next.
Here's a practical scenario: given a block of technical text, first summarise it in plain language, then translate the summary into Spanish. Two LLM calls, each with its own prompt, chained together.
The output will be a Spanish translation of the plain-language summary. The key detail to notice is the RunnableLambda in the middle — it repackages the string output from step 1 into a dictionary that matches step 2's prompt template variable. Without it, you'd get an error because translate_prompt expects a dict with a "summary" key, not a raw string.
Passing Multiple Values Between Steps
In many real workflows, the second step needs the original input and the first step's output. You can achieve this by merging the original input with intermediate results using RunnablePassthrough.
RunnableParallel here serves double duty. It runs the summarise chain and the passthrough in parallel, producing a dict with both summary and original_text keys. The comparison prompt then receives both values. I reach for this pattern constantly when building evaluation chains.
Parallel Chains — Fan-Out and Merge
Sometimes you need to run the same input through multiple chains at the same time. Maybe you're asking three different models the same question and comparing their answers. Maybe you're extracting sentiment, entities, and a summary from the same text simultaneously. RunnableParallel handles this.
All three LLM calls fire at the same time. Instead of three sequential round-trips, you pay the latency of only the slowest one. The result is a dictionary with one key per parallel branch.
Merging Parallel Results Into a Final Step
Running analyses in parallel is only half the pattern. Usually you want to feed the merged results into a final chain — a synthesis step that combines the parallel outputs into a single coherent response.
This is the fan-out-and-merge pattern. The parallel step produces a dict with sentiment, summary, and keywords. That dict flows directly into report_prompt because the template variables match the dict keys. No RunnableLambda needed — the shape already fits.
Write a function build_analysis_pipeline that takes a list of analysis names and returns a dictionary of results. Specifically:
1. Accept a list of analysis names (strings) and an input text.
2. Create a dictionary where each key is the analysis name (lowercased) and each value is a formatted string "Analysis: <name> | Input: <text>".
3. Return the dictionary.
This simulates the pattern of fanning out work and collecting results, without requiring an LLM.
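One possible solution, in pure Python (no LangChain required):

```python
def build_analysis_pipeline(analysis_names, text):
    """Fan out one formatted result per analysis, keyed by lowercased name."""
    return {
        name.lower(): f"Analysis: {name} | Input: {text}"
        for name in analysis_names
    }

results = build_analysis_pipeline(["Sentiment", "Summary"], "Earnings beat.")
```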
Branching — Conditional Routing
Here's a situation I run into all the time: incoming queries should go to different chains depending on the topic. Technical questions get a detailed, code-heavy chain. Customer complaints get an empathetic, resolution-focused chain. General questions get a concise FAQ-style chain. RunnableBranch handles this.
RunnableBranch takes a sequence of (condition, runnable) tuples followed by a default runnable. It evaluates conditions top-to-bottom and runs the first match. The default at the end catches anything that doesn't match earlier conditions.
The keyword-matching approach above is simple but fragile. A more robust pattern is to use the LLM itself as the classifier — have one chain classify the query, then route based on the classification.
LLM-Powered Routing
Instead of hardcoding keywords, ask the model to classify the query first, then route based on its classification. This is the pattern I use in production because it handles ambiguous queries far better than keyword matching.
The first LLM call classifies the query, costing fractions of a cent. The second call uses the right specialised prompt. This two-hop approach is more expensive than keyword matching but dramatically more accurate for edge cases like "my API is broken" (technical? complaint? both?).
Retry and Fallback Patterns
LLM APIs fail. Rate limits, timeouts, server errors — if you're making hundreds of calls, failures are not "if" but "when." LangChain's .with_fallbacks() method lets you define backup chains that activate when the primary chain raises an exception.
The caller doesn't need to know which model responded. If OpenAI returns a successful response, Claude is never called. If OpenAI raises any exception — RateLimitError, APITimeoutError, APIConnectionError — the fallback chain takes over seamlessly.
Chaining Multiple Fallbacks
You're not limited to one fallback. Pass a list and LangChain tries each one in order until something succeeds.
I use this exact three-tier pattern in production: cloud primary, cloud backup from a different provider, local model as the last resort. The local model is slower and less capable, but it guarantees the user always gets a response.
Filtering Which Exceptions Trigger Fallbacks
By default, .with_fallbacks() catches any exception. But sometimes you only want to fall back on specific errors — a rate limit should trigger a fallback, but a malformed prompt should raise immediately so you can fix it.
This is a subtle but important distinction. Catching all exceptions can mask bugs. If your prompt template has a missing variable, you want that error to surface immediately, not silently switch to a different model that might also fail for the same reason.
Write a function call_with_fallbacks that simulates LangChain's fallback pattern.
1. Accept a list of callables (functions) and an input value.
2. Try each callable in order. If it returns a result without raising an exception, return that result.
3. If a callable raises any exception, catch it and move to the next one.
4. If all callables fail, raise a RuntimeError with the message "All fallbacks exhausted".
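One possible solution in pure Python:

```python
def call_with_fallbacks(callables, value):
    """Try each callable in order; return the first successful result."""
    for fn in callables:
        try:
            return fn(value)
        except Exception:
            continue  # this tier failed; try the next one
    raise RuntimeError("All fallbacks exhausted")

def always_fails(x):
    raise ValueError("down")

call_with_fallbacks([always_fails, str.upper], "hi")  # "HI"
```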
Batch Processing — Scaling to Many Inputs
Every Runnable in LangChain has a .batch() method that processes a list of inputs concurrently. This is essential when you need to classify 500 support tickets, summarise 100 documents, or extract entities from a batch of emails.
By default, .batch() runs inputs concurrently with no explicit cap (the underlying thread pool's own limit applies). You can bound it with the max_concurrency parameter.
Set max_concurrency based on your API provider's rate limits. OpenAI's tier 1 allows about 500 requests per minute, so max_concurrency=10 is usually safe. For free-tier Anthropic or Ollama running locally, keep it at 2-3.
Combining Patterns — Real-World Example
The power of these patterns comes from combining them. Here's a realistic scenario I've built variations of for multiple clients: a customer support pipeline that classifies incoming tickets, routes them to specialised handlers, and falls back to a generic handler if the specialised chain fails.
This pipeline uses all four patterns. The classifier and ticket passthrough run in parallel via RunnableParallel. The classifier's output determines the branch via the routing function. Each specialised handler has a fallback to the general chain. And the entire pipeline processes multiple tickets concurrently via .batch(). That's a production-grade support system in about 50 lines.
Common Mistakes and How to Fix Them
Mistake 1: Mismatched Keys Between Steps
The most frequent error I see is a KeyError because the output of one step doesn't match the input variables of the next step's prompt template.
```python
# Step 1 output has key "result"
step1 = RunnableParallel(result=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # KeyError!
)
```

```python
# Step 1 output has key "summary"
step1 = RunnableParallel(summary=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # Matches!
)
```

Mistake 2: Forgetting to Repackage String Outputs
When a chain ending in StrOutputParser() feeds into a prompt template, the raw string won't match the template's expected dictionary input.
```python
chain = (
    prompt_a | model | StrOutputParser()
    | prompt_b  # Expects {"text": ...}, gets a string
    | model | StrOutputParser()
)
```

```python
chain = (
    prompt_a | model | StrOutputParser()
    | RunnableLambda(lambda s: {"text": s})
    | prompt_b | model | StrOutputParser()
)
```

Mistake 3: RunnableBranch Conditions Not Returning Booleans
Each condition in RunnableBranch must be a callable that returns True or False. Returning a truthy string or non-empty list works in Python but leads to confusing bugs when the "wrong" branch fires.
```python
# This ALWAYS matches because non-empty strings are truthy
(lambda x: x.get("category", ""),
 technical_chain)
```

```python
# Explicit comparison returns True/False
(lambda x: x.get("category", "") == "technical",
 technical_chain)
```

Mistake 4: Catching All Exceptions in Fallbacks
Using .with_fallbacks() without exceptions_to_handle catches everything, including programming errors that should crash loudly.
Write a function route_by_priority that simulates conditional routing.
1. Accept a dictionary with keys "priority" (string) and "message" (string).
2. If priority is "high", return "[URGENT] <message>".
3. If priority is "medium", return "[NORMAL] <message>".
4. For any other priority (including "low"), return "[LOW] <message>".
All comparisons should be case-insensitive.
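One possible solution in pure Python:

```python
def route_by_priority(item):
    """Prefix the message according to its (case-insensitive) priority."""
    priority = item["priority"].lower()
    message = item["message"]
    if priority == "high":
        return f"[URGENT] {message}"
    if priority == "medium":
        return f"[NORMAL] {message}"
    return f"[LOW] {message}"
```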
Performance and Design Tips
After building dozens of chain architectures, here are the patterns that consistently matter for performance and maintainability.
Parallelise I/O-bound steps aggressively. If two LLM calls don't depend on each other, wrap them in RunnableParallel. You pay the latency of the slowest call instead of the sum: three 2-second calls become one 2-second wait.
Use the smallest model for classification and routing. A gpt-4o-mini call for routing costs ~$0.0001 and takes ~300ms. Using gpt-4o for the same routing call costs 30x more and takes 2x longer, with no accuracy improvement for simple classification tasks.
Set `max_concurrency` in batch calls. Without an explicit value, LangChain does not cap concurrency beyond the thread pool's own limits. Set it higher for generous rate limits, lower for strict ones; I always set it explicitly rather than relying on the default.
Configure `max_retries` at the model level, not the chain level. ChatOpenAI(max_retries=3) handles transient failures with exponential backoff. Use .with_fallbacks() only when you want to switch to a genuinely different model or provider.
Quick Reference
| Pattern | LangChain Class | Key Method | Use When |
|---|---|---|---|
| Sequential | Pipe operator (`\|`) | .invoke() | Output of A feeds into B |
| Parallel | RunnableParallel | .invoke() | Multiple independent tasks on same input |
| Branching | RunnableBranch | .invoke() | Route to different chains by condition |
| Fallback | .with_fallbacks() | .invoke() | Graceful degradation on errors |
| Batch | Any Runnable | .batch() | Process many inputs concurrently |
| Passthrough | RunnablePassthrough | .invoke() | Carry original input alongside transformed data |
| Custom logic | RunnableLambda | .invoke() | Transform data between steps |
Complete Code
Here is a self-contained script combining all four chain patterns into a single runnable example. Replace the API key and run it locally.
Frequently Asked Questions
Can I nest RunnableParallel inside RunnableBranch?
Yes. Every LangChain Runnable is composable with every other Runnable. You can put a RunnableParallel inside one branch of a RunnableBranch, or chain multiple RunnableBranch nodes in sequence. The pipe operator and all composition methods work regardless of nesting depth.
What happens if a branch condition raises an exception?
If a condition callable raises an exception, RunnableBranch does not catch it — it propagates up. Only the matched branch's runnable execution is subject to fallback handling. Wrap your condition logic in a try/except if it involves operations that might fail, such as parsing or external lookups.
Is .batch() the same as calling .invoke() in a loop with asyncio.gather?
Close, but not identical. The async .abatch() gathers .ainvoke() calls with asyncio, while the synchronous .batch() runs .invoke() calls on a thread pool. Either way, LangChain handles max_concurrency limiting, error collection, and consistent return ordering. Writing your own asyncio.gather is possible, but .batch() handles those edge cases for you.
How do I log which fallback was used?
LangChain's callback system reports which runnable executed. Attach a logging callback to see which chain in the fallback sequence actually produced the result. The LangSmith UI also shows this visually in the trace view.