
LangChain Chains: Sequential, Parallel, Branching, and Retry Patterns

Intermediate · 90 min · 3 exercises · 45 XP

You have a working LCEL chain that prompts an LLM and parses the output. One pipe, one model, one result. But real applications need more: summarise a document then translate the summary, call three models at once and merge the answers, route customer questions to different chains based on topic, and gracefully retry when an API times out. These are the four chain patterns every LangChain application needs, and I use all four in every production project I build.

Why Chain Patterns Matter

A single prompt-model-parser pipeline handles maybe 20% of real-world LLM use cases. The moment you need to feed one model's output into another, compare answers from multiple providers, or handle the inevitable API failures, you need composition patterns.

LangChain's LCEL (LangChain Expression Language) gives you four composable patterns that cover virtually every chain architecture I've encountered:

Pattern | What It Does | When to Use
Sequential | Step A → Step B → Step C | Multi-step processing (summarise → translate)
Parallel | Steps A, B, C run simultaneously | Fan-out (ask 3 models, merge answers)
Branching | Route to different chains by condition | Topic-based routing, difficulty-based prompts
Retry / Fallback | If Step A fails, try Step B | API reliability, model failover

Each pattern is a Runnable that you can compose with the pipe operator (|). Once you understand these four, you can build arbitrarily complex chains by nesting them.

Sequential Chains — Multi-Step Pipelines

The sequential chain is the simplest pattern: output from step A becomes input to step B. You've already used this if you've piped a prompt into a model into a parser. But multi-step sequential chains go further — they let you chain entire workflows where each step transforms the data for the next.

Here's a practical scenario: given a block of technical text, first summarise it in plain language, then translate the summary into Spanish. Two LLM calls, each with its own prompt, chained together.

Sequential chain: summarise then translate

The output will be a Spanish translation of the plain-language summary. The key detail to notice is the RunnableLambda in the middle — it repackages the string output from step 1 into a dictionary that matches step 2's prompt template variable. Without it, you'd get an error because translate_prompt expects a dict with a "summary" key, not a raw string.

Passing Multiple Values Between Steps

In many real workflows, the second step needs the original input and the first step's output. You can achieve this by merging the original input with intermediate results using RunnablePassthrough.

Preserving original input across steps

RunnableParallel here serves double duty. It runs the summarise chain and the passthrough in parallel, producing a dict with both summary and original_text keys. The comparison prompt then receives both values. I reach for this pattern constantly when building evaluation chains.


Parallel Chains — Fan-Out and Merge

Sometimes you need to run the same input through multiple chains at the same time. Maybe you're asking three different models the same question and comparing their answers. Maybe you're extracting sentiment, entities, and a summary from the same text simultaneously. RunnableParallel handles this.

Fan-out: run three analyses in parallel

All three LLM calls fire at the same time. Instead of three sequential round-trips, you pay the latency of only the slowest one. The result is a dictionary with one key per parallel branch.

Merging Parallel Results Into a Final Step

Running analyses in parallel is only half the pattern. Usually you want to feed the merged results into a final chain — a synthesis step that combines the parallel outputs into a single coherent response.

Fan-out then merge: parallel analysis with synthesis

This is the fan-out-and-merge pattern. The parallel step produces a dict with sentiment, summary, and keywords. That dict flows directly into report_prompt because the template variables match the dict keys. No RunnableLambda needed — the shape already fits.

Exercise 1: Build a Parallel Processing Pipeline

Write a function build_analysis_pipeline that takes a list of analysis names and an input text and returns a dictionary of results. Specifically:

1. Accept a list of analysis names (strings) and an input text.

2. Create a dictionary where each key is the analysis name (lowercased) and each value is a formatted string "Analysis: <name> | Input: <text>".

3. Return the dictionary.

This simulates the pattern of fanning out work and collecting results, without requiring an LLM.

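One possible solution, if you want to check your work (the formatted value keeps the name's original casing, since the spec only lowercases the key):

```python
def build_analysis_pipeline(analysis_names, text):
    """Fan out: one formatted result per requested analysis."""
    return {
        name.lower(): f"Analysis: {name} | Input: {text}"
        for name in analysis_names
    }

print(build_analysis_pipeline(["Sentiment", "Keywords"], "great product"))
```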

Branching — Conditional Routing

Here's a situation I run into all the time: incoming queries should go to different chains depending on the topic. Technical questions get a detailed, code-heavy chain. Customer complaints get an empathetic, resolution-focused chain. General questions get a concise FAQ-style chain. RunnableBranch handles this.

Routing queries to different chains based on topic

RunnableBranch takes a sequence of (condition, runnable) tuples followed by a default runnable. It evaluates conditions top-to-bottom and runs the first match. The default at the end catches anything that doesn't match earlier conditions.

The keyword-matching approach above is simple but fragile. A more robust pattern is to use the LLM itself as the classifier — have one chain classify the query, then route based on the classification.

LLM-Powered Routing

Instead of hardcoding keywords, ask the model to classify the query first, then route based on its classification. This is the pattern I use in production because it handles ambiguous queries far better than keyword matching.

LLM-based classification then routing

The first LLM call classifies the query, costing fractions of a cent. The second call uses the right specialised prompt. This two-hop approach is more expensive than keyword matching but dramatically more accurate for edge cases like "my API is broken" (technical? complaint? both?).


Retry and Fallback Patterns

LLM APIs fail. Rate limits, timeouts, server errors — if you're making hundreds of calls, failures are not "if" but "when." LangChain's .with_fallbacks() method lets you define backup chains that activate when the primary chain raises an exception.

Model failover: OpenAI with Claude fallback

The caller doesn't need to know which model responded. If OpenAI returns a successful response, Claude is never called. If OpenAI raises any exception — RateLimitError, APITimeoutError, APIConnectionError — the fallback chain takes over seamlessly.

Chaining Multiple Fallbacks

You're not limited to one fallback. Pass a list and LangChain tries each one in order until something succeeds.

Cascading fallbacks: three models deep

I use this exact three-tier pattern in production: cloud primary, cloud backup from a different provider, local model as the last resort. The local model is slower and less capable, but it guarantees the user always gets a response.

Filtering Which Exceptions Trigger Fallbacks

By default, .with_fallbacks() catches any exception. But sometimes you only want to fall back on specific errors — a rate limit should trigger a fallback, but a malformed prompt should raise immediately so you can fix it.

Fall back only on specific exception types

This is a subtle but important distinction. Catching all exceptions can mask bugs. If your prompt template has a missing variable, you want that error to surface immediately, not silently switch to a different model that might also fail for the same reason.

Exercise 2: Implement a Fallback Pipeline

Write a function call_with_fallbacks that simulates LangChain's fallback pattern.

1. Accept a list of callables (functions) and an input value.

2. Try each callable in order. If it returns a result without raising an exception, return that result.

3. If a callable raises any exception, catch it and move to the next one.

4. If all callables fail, raise a RuntimeError with the message "All fallbacks exhausted".

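One possible solution, if you want to check your work:

```python
def call_with_fallbacks(callables, value):
    """Try each callable in order; return the first successful result."""
    for fn in callables:
        try:
            return fn(value)
        except Exception:
            continue  # this tier failed; move on to the next
    raise RuntimeError("All fallbacks exhausted")

def broken(x):
    raise ValueError("simulated failure")

print(call_with_fallbacks([broken, str.upper], "hi"))
```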

Batch Processing — Scaling to Many Inputs

Every Runnable in LangChain has a .batch() method that processes a list of inputs concurrently. This is essential when you need to classify 500 support tickets, summarise 100 documents, or extract entities from a batch of emails.

Batch processing with concurrency control

By default, .batch() runs inputs concurrently on a thread pool without an explicit cap. You can control this with the max_concurrency setting.

Controlling concurrency to respect rate limits

Set max_concurrency based on your API provider's rate limits. OpenAI's tier 1 allows about 500 requests per minute, so max_concurrency=10 is usually safe. For free-tier Anthropic or Ollama running locally, keep it at 2-3.


Combining Patterns — Real-World Example

The power of these patterns comes from combining them. Here's a realistic scenario I've built variations of for multiple clients: a customer support pipeline that classifies incoming tickets, routes them to specialised handlers, and falls back to a generic handler if the specialised chain fails.

Full support pipeline: classify, route, fallback, batch

This pipeline uses all four patterns. The classifier and ticket passthrough run in parallel via RunnableParallel. The classifier's output determines the branch via the routing function. Each specialised handler has a fallback to the general chain. And the entire pipeline processes multiple tickets concurrently via .batch(). That's a production-grade support system in about 50 lines.


Common Mistakes and How to Fix Them

Mistake 1: Mismatched Keys Between Steps

The most frequent error I see is a KeyError because the output of one step doesn't match the input variables of the next step's prompt template.

Breaks: output key doesn't match template variable
# Step 1 output has key "result"
step1 = RunnableParallel(result=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # KeyError!
)
Works: keys match between steps
# Step 1 output has key "summary"
step1 = RunnableParallel(summary=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # Matches!
)

Mistake 2: Forgetting to Repackage String Outputs

When a chain ending in StrOutputParser() feeds into a prompt template, the raw string won't match the template's expected dictionary input.

Breaks: string fed directly into prompt expecting dict
chain = (
    prompt_a | model | StrOutputParser()
    | prompt_b  # Expects {"text": ...}, gets a string
    | model | StrOutputParser()
)
Works: RunnableLambda repackages the string
chain = (
    prompt_a | model | StrOutputParser()
    | RunnableLambda(lambda s: {"text": s})
    | prompt_b | model | StrOutputParser()
)

Mistake 3: RunnableBranch Conditions Not Returning Booleans

Each condition in RunnableBranch must be a callable that returns True or False. Returning a truthy string or non-empty list works in Python but leads to confusing bugs when the "wrong" branch fires.

Fragile: truthy string, always matches
# This ALWAYS matches because non-empty strings are truthy
(lambda x: x.get("category", ""),
 technical_chain)
Correct: explicit boolean comparison
# Explicit comparison returns True/False
(lambda x: x.get("category", "") == "technical",
 technical_chain)

Mistake 4: Catching All Exceptions in Fallbacks

Using .with_fallbacks() without exceptions_to_handle catches everything, including programming errors that should crash loudly.

Be specific about which exceptions trigger fallbacks

Exercise 3: Build a Priority Router
Write Code

Write a function route_by_priority that simulates conditional routing.

1. Accept a dictionary with keys "priority" (string) and "message" (string).

2. If priority is "high", return "[URGENT] <message>".

3. If priority is "medium", return "[NORMAL] <message>".

4. For any other priority (including "low"), return "[LOW] <message>".

All comparisons should be case-insensitive.

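One possible solution, if you want to check your work:

```python
def route_by_priority(item):
    """Tag the message by priority; comparison is case-insensitive."""
    priority = item["priority"].lower()
    message = item["message"]
    if priority == "high":
        return f"[URGENT] {message}"
    if priority == "medium":
        return f"[NORMAL] {message}"
    return f"[LOW] {message}"

print(route_by_priority({"priority": "HIGH", "message": "Server down"}))
```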

Performance and Design Tips

After building dozens of chain architectures, here are the patterns that consistently matter for performance and maintainability.

Parallelise I/O-bound steps aggressively. If two LLM calls don't depend on each other, wrap them in RunnableParallel. You pay only for the slowest branch — three 2-second calls take roughly 2 seconds instead of 6.

Use the smallest model for classification and routing. A gpt-4o-mini call for routing costs ~$0.0001 and takes ~300ms. Using gpt-4o for the same routing call costs 30x more and takes 2x longer, with no accuracy improvement for simple classification tasks.

Set `max_concurrency` in batch calls. Without it, LangChain imposes no explicit cap on concurrent requests. Set it higher for generous rate limits, lower for strict ones. I always set it explicitly rather than relying on the default.

Configure `max_retries` at the model level, not the chain level. ChatOpenAI(max_retries=3) handles transient failures with exponential backoff. Use .with_fallbacks() only when you want to switch to a genuinely different model or provider.


Quick Reference

Pattern | LangChain Class | Key Method | Use When
Sequential | Pipe operator (|) | .invoke() | Output of A feeds into B
Parallel | RunnableParallel | .invoke() | Multiple independent tasks on same input
Branching | RunnableBranch | .invoke() | Route to different chains by condition
Fallback | .with_fallbacks() | .invoke() | Graceful degradation on errors
Batch | Any Runnable | .batch() | Process many inputs concurrently
Passthrough | RunnablePassthrough | .invoke() | Carry original input alongside transformed data
Custom logic | RunnableLambda | .invoke() | Transform data between steps

Complete Code

Here is a self-contained script combining all four chain patterns into a single runnable example. Replace the API key and run it locally.

Complete code: all chain patterns in one script

Frequently Asked Questions

Can I nest RunnableParallel inside RunnableBranch?

Yes. Every LangChain Runnable is composable with every other Runnable. You can put a RunnableParallel inside one branch of a RunnableBranch, or chain multiple RunnableBranch nodes in sequence. The pipe operator and all composition methods work regardless of nesting depth.

What happens if a branch condition raises an exception?

If a condition callable raises an exception, RunnableBranch does not catch it — it propagates up. Only the matched branch's runnable execution is subject to fallback handling. Wrap your condition logic in a try/except if it involves operations that might fail, such as parsing or external lookups.


Is .batch() the same as calling .invoke() in a loop with asyncio.gather?

Functionally, yes — the async .abatch() runs .ainvoke() concurrently via asyncio, while the sync .batch() achieves the same concurrency with a thread pool. But .batch() also handles max_concurrency limiting, error collection, and consistent return ordering. Writing your own asyncio.gather is possible, but .batch() handles the edge cases for you.

How do I log which fallback was used?

LangChain's callback system reports which runnable executed. Attach a logging callback to see which chain in the fallback sequence actually produced the result. The LangSmith UI also shows this visually in the trace view.


References

  • LangChain documentation — LCEL primitives: RunnableParallel, RunnablePassthrough, RunnableLambda
  • LangChain documentation — How to add fallbacks to a runnable
  • LangChain documentation — Routing between sub-chains
  • LangChain documentation — How to invoke runnables in parallel
  • LangChain documentation — Batch processing
  • Harrison Chase, "LangChain Expression Language," LangChain Blog (2023)