LangChain Chains: Sequential, Parallel, Branching, and Retry Patterns
You have a working LCEL chain that prompts an LLM and parses the output. One pipe, one model, one result. But real applications need more: summarise a document then translate the summary, call three models at once and merge the answers, route customer questions to different chains based on topic, and gracefully retry when an API times out. These are the four chain patterns every LangChain application needs, and I use all four in every production project I build.
Why Chain Patterns Matter
A single prompt-model-parser pipeline handles maybe 20% of real-world LLM use cases. The moment you need to feed one model's output into another, compare answers from multiple providers, or handle the inevitable API failures, you need composition patterns.
LangChain's LCEL (LangChain Expression Language) gives you four composable patterns that cover virtually every chain architecture I've encountered:
| Pattern | What It Does | When to Use |
|---|---|---|
| Sequential | Step A → Step B → Step C | Multi-step processing (summarise → translate) |
| Parallel | Steps A, B, C run simultaneously | Fan-out (ask 3 models, merge answers) |
| Branching | Route to different chains by condition | Topic-based routing, difficulty-based prompts |
| Retry / Fallback | If Step A fails, try Step B | API reliability, model failover |
Each pattern is a Runnable that you can compose with the pipe operator (|). Once you understand these four, you can build arbitrarily complex chains by nesting them.
Sequential Chains — Multi-Step Pipelines
The sequential chain is the simplest pattern: output from step A becomes input to step B. You've already used this if you've piped a prompt into a model into a parser. But multi-step sequential chains go further — they let you chain entire workflows where each step transforms the data for the next.
Here's a practical scenario: given a block of technical text, first summarise it in plain language, then translate the summary into Spanish. Two LLM calls, each with its own prompt, chained together.
The output will be a Spanish translation of the plain-language summary. The key detail to notice is the RunnableLambda in the middle — it repackages the string output from step 1 into a dictionary that matches step 2's prompt template variable. Without it, you'd get an error because translate_prompt expects a dict with a "summary" key, not a raw string.
Passing Multiple Values Between Steps
In many real workflows, the second step needs the original input and the first step's output. You can achieve this by merging the original input with intermediate results using RunnablePassthrough.
RunnableParallel here serves double duty. It runs the summarise chain and the passthrough in parallel, producing a dict with both summary and original_text keys. The comparison prompt then receives both values. I reach for this pattern constantly when building evaluation chains.
Parallel Chains — Fan-Out and Merge
Sometimes you need to run the same input through multiple chains at the same time. Maybe you're asking three different models the same question and comparing their answers. Maybe you're extracting sentiment, entities, and a summary from the same text simultaneously. RunnableParallel handles this.
All three LLM calls fire at the same time. Instead of three sequential round-trips, you pay the latency of only the slowest one. The result is a dictionary with one key per parallel branch.
Merging Parallel Results Into a Final Step
Running analyses in parallel is only half the pattern. Usually you want to feed the merged results into a final chain — a synthesis step that combines the parallel outputs into a single coherent response.
This is the fan-out-and-merge pattern. The parallel step produces a dict with sentiment, summary, and keywords. That dict flows directly into report_prompt because the template variables match the dict keys. No RunnableLambda needed — the shape already fits.
Write a function build_analysis_pipeline that takes a list of analysis names and returns a dictionary of results. Specifically:
1. Accept a list of analysis names (strings) and an input text.
2. Create a dictionary where each key is the analysis name (lowercased) and each value is a formatted string "Analysis: <name> | Input: <text>".
3. Return the dictionary.
This simulates the pattern of fanning out work and collecting results, without requiring an LLM.
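One possible solution, in pure Python (no LangChain required):

```python
def build_analysis_pipeline(analysis_names, text):
    """Fan out one formatted result per analysis, keyed by lowercased name."""
    return {
        name.lower(): f"Analysis: {name} | Input: {text}"
        for name in analysis_names
    }

results = build_analysis_pipeline(["Sentiment", "Summary"], "Earnings beat.")
```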
Branching — Conditional Routing
Here's a situation I run into all the time: incoming queries should go to different chains depending on the topic. Technical questions get a detailed, code-heavy chain. Customer complaints get an empathetic, resolution-focused chain. General questions get a concise FAQ-style chain. RunnableBranch handles this.
RunnableBranch takes a sequence of (condition, runnable) tuples followed by a default runnable. It evaluates conditions top-to-bottom and runs the first match. The default at the end catches anything that doesn't match earlier conditions.
The keyword-matching approach above is simple but fragile. A more robust pattern is to use the LLM itself as the classifier — have one chain classify the query, then route based on the classification.
LLM-Powered Routing
Instead of hardcoding keywords, ask the model to classify the query first, then route based on its classification. This is the pattern I use in production because it handles ambiguous queries far better than keyword matching.
The first LLM call classifies the query, costing fractions of a cent. The second call uses the right specialised prompt. This two-hop approach is more expensive than keyword matching but dramatically more accurate for edge cases like "my API is broken" (technical? complaint? both?).
Retry and Fallback Patterns
LLM APIs fail. Rate limits, timeouts, server errors — if you're making hundreds of calls, failures are not "if" but "when." LangChain's .with_fallbacks() method lets you define backup chains that activate when the primary chain raises an exception.
The caller doesn't need to know which model responded. If OpenAI returns a successful response, Claude is never called. If OpenAI raises any exception — RateLimitError, APITimeoutError, APIConnectionError — the fallback chain takes over seamlessly.
Chaining Multiple Fallbacks
You're not limited to one fallback. Pass a list and LangChain tries each one in order until something succeeds.
I use this exact three-tier pattern in production: cloud primary, cloud backup from a different provider, local model as the last resort. The local model is slower and less capable, but it guarantees the user always gets a response.
Filtering Which Exceptions Trigger Fallbacks
By default, .with_fallbacks() catches any exception. But sometimes you only want to fall back on specific errors — a rate limit should trigger a fallback, but a malformed prompt should raise immediately so you can fix it.
This is a subtle but important distinction. Catching all exceptions can mask bugs. If your prompt template has a missing variable, you want that error to surface immediately, not silently switch to a different model that might also fail for the same reason.
Write a function call_with_fallbacks that simulates LangChain's fallback pattern.
1. Accept a list of callables (functions) and an input value.
2. Try each callable in order. If it returns a result without raising an exception, return that result.
3. If a callable raises any exception, catch it and move to the next one.
4. If all callables fail, raise a RuntimeError with the message "All fallbacks exhausted".
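One possible solution in pure Python:

```python
def call_with_fallbacks(callables, value):
    """Try each callable in order; return the first successful result."""
    for fn in callables:
        try:
            return fn(value)
        except Exception:
            continue  # this tier failed; try the next one
    raise RuntimeError("All fallbacks exhausted")

def always_fails(x):
    raise ValueError("down")

call_with_fallbacks([always_fails, str.upper], "hi")  # "HI"
```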
Batch Processing — Scaling to Many Inputs
Every Runnable in LangChain has a .batch() method that processes a list of inputs concurrently. This is essential when you need to classify 500 support tickets, summarise 100 documents, or extract entities from a batch of emails.
By default, .batch() runs inputs concurrently with no explicit cap (the underlying thread pool's own limit applies). You can bound it with the max_concurrency parameter.
Set max_concurrency based on your API provider's rate limits. OpenAI's tier 1 allows about 500 requests per minute, so max_concurrency=10 is usually safe. For free-tier Anthropic or Ollama running locally, keep it at 2-3.
Combining Patterns — Real-World Example
The power of these patterns comes from combining them. Here's a realistic scenario I've built variations of for multiple clients: a customer support pipeline that classifies incoming tickets, routes them to specialised handlers, and falls back to a generic handler if the specialised chain fails.
This pipeline uses all four patterns. The classifier and ticket passthrough run in parallel via RunnableParallel. The classifier's output determines the branch via the routing function. Each specialised handler has a fallback to the general chain. And the entire pipeline processes multiple tickets concurrently via .batch(). That's a production-grade support system in about 50 lines.
Common Mistakes and How to Fix Them
Mistake 1: Mismatched Keys Between Steps
The most frequent error I see is a KeyError because the output of one step doesn't match the input variables of the next step's prompt template.
```python
# Step 1 output has key "result"
step1 = RunnableParallel(result=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # KeyError!
)
```

```python
# Step 1 output has key "summary"
step1 = RunnableParallel(summary=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # Matches!
)
```

Mistake 2: Forgetting to Repackage String Outputs
When a chain ending in StrOutputParser() feeds into a prompt template, the raw string won't match the template's expected dictionary input.
```python
chain = (
    prompt_a | model | StrOutputParser()
    | prompt_b  # Expects {"text": ...}, gets a string
    | model | StrOutputParser()
)
```

```python
chain = (
    prompt_a | model | StrOutputParser()
    | RunnableLambda(lambda s: {"text": s})
    | prompt_b | model | StrOutputParser()
)
```

Mistake 3: RunnableBranch Conditions Not Returning Booleans
Each condition in RunnableBranch must be a callable that returns True or False. Returning a truthy string or non-empty list works in Python but leads to confusing bugs when the "wrong" branch fires.
```python
# This ALWAYS matches because non-empty strings are truthy
(lambda x: x.get("category", ""),
 technical_chain)
```

```python
# Explicit comparison returns True/False
(lambda x: x.get("category", "") == "technical",
 technical_chain)
```

Mistake 4: Catching All Exceptions in Fallbacks
Using .with_fallbacks() without exceptions_to_handle catches everything, including programming errors that should crash loudly.
Write a function route_by_priority that simulates conditional routing.
1. Accept a dictionary with keys "priority" (string) and "message" (string).
2. If priority is "high", return "[URGENT] <message>".
3. If priority is "medium", return "[NORMAL] <message>".
4. For any other priority (including "low"), return "[LOW] <message>".
All comparisons should be case-insensitive.
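One possible solution in pure Python:

```python
def route_by_priority(item):
    """Prefix the message according to its (case-insensitive) priority."""
    priority = item["priority"].lower()
    message = item["message"]
    if priority == "high":
        return f"[URGENT] {message}"
    if priority == "medium":
        return f"[NORMAL] {message}"
    return f"[LOW] {message}"
```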
Performance and Design Tips
After building dozens of chain architectures, here are the patterns that consistently matter for performance and maintainability.
Parallelise I/O-bound steps aggressively. If two LLM calls don't depend on each other, wrap them in RunnableParallel. You pay the latency of the slowest call instead of the sum: three 2-second calls become one 2-second wait.
Use the smallest model for classification and routing. A gpt-4o-mini call for routing costs ~$0.0001 and takes ~300ms. Using gpt-4o for the same routing call costs 30x more and takes 2x longer, with no accuracy improvement for simple classification tasks.
Set `max_concurrency` in batch calls. Without an explicit value, LangChain does not cap concurrency beyond the thread pool's own limits. Set it higher for generous rate limits, lower for strict ones; I always set it explicitly rather than relying on the default.
Configure `max_retries` at the model level, not the chain level. ChatOpenAI(max_retries=3) handles transient failures with exponential backoff. Use .with_fallbacks() only when you want to switch to a genuinely different model or provider.
Quick Reference
| Pattern | LangChain Class | Key Method | Use When |
|---|---|---|---|
| Sequential | Pipe operator (`\|`) | .invoke() | Output of A feeds into B |
| Parallel | RunnableParallel | .invoke() | Multiple independent tasks on same input |
| Branching | RunnableBranch | .invoke() | Route to different chains by condition |
| Fallback | .with_fallbacks() | .invoke() | Graceful degradation on errors |
| Batch | Any Runnable | .batch() | Process many inputs concurrently |
| Passthrough | RunnablePassthrough | .invoke() | Carry original input alongside transformed data |
| Custom logic | RunnableLambda | .invoke() | Transform data between steps |
Complete Code
Here is a self-contained script combining all four chain patterns into a single runnable example. Replace the API key and run it locally.
Frequently Asked Questions
Can I nest RunnableParallel inside RunnableBranch?
Yes. Every LangChain Runnable is composable with every other Runnable. You can put a RunnableParallel inside one branch of a RunnableBranch, or chain multiple RunnableBranch nodes in sequence. The pipe operator and all composition methods work regardless of nesting depth.
What happens if a branch condition raises an exception?
If a condition callable raises an exception, RunnableBranch does not catch it — it propagates up. Only the matched branch's runnable execution is subject to fallback handling. Wrap your condition logic in a try/except if it involves operations that might fail, such as parsing or external lookups.
Is .batch() the same as calling .invoke() in a loop with asyncio.gather?
Close, but not identical. The async .abatch() gathers .ainvoke() calls with asyncio, while the synchronous .batch() runs .invoke() calls on a thread pool. Either way, LangChain handles max_concurrency limiting, error collection, and consistent return ordering. Writing your own asyncio.gather is possible, but .batch() handles those edge cases for you.
How do I log which fallback was used?
LangChain's callback system reports which runnable executed. Attach a logging callback to see which chain in the fallback sequence actually produced the result. The LangSmith UI also shows this visually in the trace view.