
LangChain Chains: Sequential, Parallel, Branching, and Retry Patterns


You have a working LCEL chain that prompts an LLM and parses the output. One pipe, one model, one result. But real applications need more: summarise a document then translate the summary, call three models at once and merge the answers, route customer questions to different chains based on topic, and gracefully retry when an API times out. These are the four chain patterns every LangChain application needs, and I use all four in every production project I build.

Why Chain Patterns Matter

A LangChain chain is a sequence of composable steps — prompts, models, parsers, and custom functions — piped together into a single callable. A single prompt-model-parser pipeline, the kind you build in the LangChain quickstart, handles maybe 20% of real-world LLM use cases. The moment you need to feed one model's output into another, compare answers from multiple providers, or handle API failures, you need composition patterns.

LangChain's LCEL (LangChain Expression Language) gives you four composable patterns that cover virtually every chain architecture I've encountered:


Pattern            What It Does                              When to Use
Sequential         Step A → Step B → Step C                  Multi-step processing (summarise → translate)
Parallel           Steps A, B, C run simultaneously          Fan-out (ask 3 models, merge answers)
Branching          Route to different chains by condition    Topic-based routing, difficulty-based prompts
Retry / Fallback   If Step A fails, try Step B               API reliability, model failover

Each pattern is a Runnable you can compose with the pipe operator (|). Once you understand these four, you can build arbitrarily complex chains by nesting them — and even combine all four in a single pipeline, as we'll do in the combined example.

Sequential Chains — Multi-Step Pipelines

The sequential chain is the simplest pattern: output from step A becomes input to step B. You've already used this if you've piped a prompt into a model into a parser. But multi-step sequential chains go further — they let you chain entire workflows where each step transforms the data for the next.

Here's a practical scenario: given a block of technical text, first summarise it in plain language, then translate the summary into Spanish. Two LLM calls, each with its own prompt, chained together.

Sequential chain: summarise then translate
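A sketch of what that chain might look like, assuming `langchain-openai` is installed, `OPENAI_API_KEY` is set, and using `gpt-4o-mini` (the model choice and prompt wording here are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

summarise_prompt = ChatPromptTemplate.from_template(
    "Summarise the following technical text in plain language:\n\n{text}"
)
translate_prompt = ChatPromptTemplate.from_template(
    "Translate the following summary into Spanish:\n\n{summary}"
)

chain = (
    summarise_prompt | model | StrOutputParser()
    # Step 1 emits a raw string; repackage it into the dict step 2 expects.
    | RunnableLambda(lambda s: {"summary": s})
    | translate_prompt | model | StrOutputParser()
)

print(chain.invoke({"text": "LCEL composes prompts, models, and parsers into pipelines."}))
```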

The output will be a Spanish translation of the plain-language summary. The key detail to notice is the RunnableLambda in the middle — it repackages the string output from step 1 into a dictionary that matches step 2's prompt template variable. Without it, you'd get an error because translate_prompt expects a dict with a "summary" key, not a raw string.

Passing Multiple Values Between Steps

In many real workflows, the second step needs the original input and the first step's output. You can achieve this by merging the original input with intermediate results using RunnablePassthrough.

Preserving original input across steps
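One way to sketch this, assuming the same `gpt-4o-mini` setup as before — the chain is invoked with a raw string, so `RunnablePassthrough` can forward it untouched while the summarise branch transforms it:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import (
    RunnableLambda, RunnableParallel, RunnablePassthrough,
)
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")

summarise_prompt = ChatPromptTemplate.from_template("Summarise in one sentence:\n\n{text}")
compare_prompt = ChatPromptTemplate.from_template(
    "Does this summary faithfully cover the original?\n\n"
    "Summary: {summary}\n\nOriginal: {original_text}"
)

# The parallel step produces a dict with BOTH the new summary
# and the untouched original text.
step1 = RunnableParallel(
    summary=RunnableLambda(lambda t: {"text": t})
    | summarise_prompt | model | StrOutputParser(),
    original_text=RunnablePassthrough(),
)

evaluation_chain = step1 | compare_prompt | model | StrOutputParser()
print(evaluation_chain.invoke("LCEL composes prompts, models, and parsers into pipelines."))
```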

RunnableParallel here serves double duty. It runs the summarise chain and the passthrough in parallel, producing a dict with both summary and original_text keys. The comparison prompt then receives both values. I reach for this pattern constantly when building evaluation chains.


Parallel Chains — Fan-Out and Merge

Sometimes you need to run the same input through multiple chains at the same time. Maybe you're asking three different models the same question and comparing their answers. Maybe you're extracting sentiment, entities, and a summary from the same text simultaneously. RunnableParallel handles this.

Fan-out: run three analyses in parallel
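A minimal sketch of the fan-out, again assuming `gpt-4o-mini` and illustrative prompts:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

analysis = RunnableParallel(
    sentiment=ChatPromptTemplate.from_template(
        "Sentiment (positive/negative/neutral) of:\n\n{text}") | model | parser,
    entities=ChatPromptTemplate.from_template(
        "List the named entities in:\n\n{text}") | model | parser,
    summary=ChatPromptTemplate.from_template(
        "One-sentence summary of:\n\n{text}") | model | parser,
)

# All three branches receive the same input and run concurrently.
result = analysis.invoke({"text": "Acme Corp's new SDK release delighted developers in Berlin."})
print(result["sentiment"], result["entities"], result["summary"], sep="\n")
```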

All three LLM calls fire at the same time. Instead of three sequential round-trips, you pay the latency of only the slowest one. The result is a dictionary with one key per parallel branch.

Merging Parallel Results Into a Final Step

Running analyses in parallel is only half the pattern. Usually you want to feed the merged results into a final chain — a synthesis step that combines the parallel outputs into a single coherent response.

The code below builds a product report by piping the parallel chain's output dict into a synthesis prompt. The key trick: RunnableParallel returns a dict with keys sentiment, summary, and keywords, and the synthesis prompt's template variables use those exact same key names. LangChain auto-maps the dict keys to the template variables, so no RunnableLambda repackaging is needed.

Fan-out then merge: parallel analysis with synthesis
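A sketch of that fan-out-and-merge, with illustrative prompts over a `{review}` input and the same assumed `gpt-4o-mini` setup:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

parallel = RunnableParallel(
    sentiment=ChatPromptTemplate.from_template("Sentiment of:\n\n{review}") | model | parser,
    summary=ChatPromptTemplate.from_template("Summarise:\n\n{review}") | model | parser,
    keywords=ChatPromptTemplate.from_template("Five keywords from:\n\n{review}") | model | parser,
)

# The template variables match the parallel dict's keys exactly,
# so the dict flows straight in with no repackaging.
report_prompt = ChatPromptTemplate.from_template(
    "Write a short product report.\n\n"
    "Sentiment: {sentiment}\nSummary: {summary}\nKeywords: {keywords}"
)

report_chain = parallel | report_prompt | model | parser
print(report_chain.invoke({"review": "Battery life is superb, but the app keeps crashing."}))
```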

This is the fan-out-and-merge pattern. The parallel step produces a dict with sentiment, summary, and keywords. That dict flows directly into report_prompt because the template variables match the dict keys. No RunnableLambda needed — the shape already fits.

Exercise 1: Build a Parallel Processing Pipeline

Write a function build_analysis_pipeline that takes a list of analysis names and returns a dictionary of results. Specifically:

1. Accept a list of analysis names (strings) and an input text.

2. Create a dictionary where each key is the analysis name (lowercased) and each value is a formatted string "Analysis: <name> | Input: <text>".

3. Return the dictionary.

This simulates the pattern of fanning out work and collecting results, without requiring an LLM.

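Try it yourself first. One possible reference solution, if you want to check your work:

```python
def build_analysis_pipeline(analysis_names, text):
    """Fan out over the requested analyses and collect the results in a dict."""
    return {
        name.lower(): f"Analysis: {name} | Input: {text}"
        for name in analysis_names
    }
```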

Branching — Conditional Routing

Here's a situation I run into all the time: incoming queries should go to different chains depending on the topic. Technical questions get a detailed, code-heavy chain. Customer complaints get an empathetic, resolution-focused chain. General questions get a concise FAQ-style chain. RunnableBranch handles this.

Routing queries to different chains based on topic
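A keyword-based sketch of that router — the keyword lists and prompts are illustrative, and `gpt-4o-mini` with `OPENAI_API_KEY` is assumed:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

technical_chain = ChatPromptTemplate.from_template(
    "You are a senior engineer. Answer with code where helpful:\n\n{query}") | model | parser
complaint_chain = ChatPromptTemplate.from_template(
    "You are an empathetic support agent. Focus on resolution:\n\n{query}") | model | parser
general_chain = ChatPromptTemplate.from_template(
    "Answer concisely, FAQ-style:\n\n{query}") | model | parser

branch = RunnableBranch(
    # Conditions are checked top-to-bottom; the first True wins.
    (lambda x: any(w in x["query"].lower() for w in ("error", "bug", "crash")), technical_chain),
    (lambda x: any(w in x["query"].lower() for w in ("refund", "unacceptable")), complaint_chain),
    general_chain,  # default: runs when nothing above matched
)

print(branch.invoke({"query": "My build crashes with a segfault"}))
```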

RunnableBranch takes a sequence of (condition, runnable) tuples followed by a default runnable. It evaluates conditions top-to-bottom and runs the first match. The default at the end catches anything that doesn't match earlier conditions.

The keyword-matching approach above is simple but fragile. A more robust pattern is to use the LLM itself as the classifier — have one chain classify the query, then route based on the classification.

LLM-Powered Routing

Instead of hardcoding keywords, ask the model to classify the query first, then route based on its classification. This is the pattern I use in production because it handles ambiguous queries far better than keyword matching.

The code below works in two stages. First, a RunnableParallel runs the classifier LLM call and a passthrough lambda simultaneously — producing a dict with both the classification result and the original query. Second, a RunnableLambda wrapping a routing function reads the classification string and calls the appropriate specialised chain. This two-step classify-then-route mechanism is the standard LCEL pattern for dynamic routing.

LLM-based classification then routing
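A sketch of the two-stage pattern. The classifier prompt and category names here are illustrative; the key mechanic is that a function returning a runnable, wrapped in `RunnableLambda`, gets the returned chain invoked with the same input dict:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

classifier = ChatPromptTemplate.from_template(
    "Classify this query with exactly one word — technical, complaint, or general:\n\n{query}"
) | model | parser

technical_chain = ChatPromptTemplate.from_template("Answer technically:\n\n{query}") | model | parser
complaint_chain = ChatPromptTemplate.from_template("Respond empathetically:\n\n{query}") | model | parser
general_chain = ChatPromptTemplate.from_template("Answer concisely:\n\n{query}") | model | parser

def route(info: dict):
    # The returned chain is invoked with this same dict, which still has "query".
    category = info["category"].strip().lower()
    if "technical" in category:
        return technical_chain
    if "complaint" in category:
        return complaint_chain
    return general_chain

full_chain = RunnableParallel(
    category=classifier,
    query=lambda x: x["query"],  # passthrough of the original query
) | RunnableLambda(route)

print(full_chain.invoke({"query": "my API is broken"}))
```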

The first LLM call classifies the query, costing fractions of a cent. The second call uses the right specialised prompt. This two-hop approach is more expensive than keyword matching but dramatically more accurate for edge cases like "my API is broken" (technical? complaint? both?).


Retry and Fallback Patterns

LLM APIs fail. Rate limits, timeouts, server errors — if you're making hundreds of calls, failures are not "if" but "when." LangChain's .with_fallbacks() method lets you define backup chains that activate when the primary chain raises an exception.

Model failover: OpenAI with Claude fallback
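A sketch of the failover, assuming both `langchain-openai` and `langchain-anthropic` are installed with their API keys set (the model names are illustrative):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer briefly:\n\n{query}")
parser = StrOutputParser()

primary = prompt | ChatOpenAI(model="gpt-4o-mini") | parser
backup = prompt | ChatAnthropic(model="claude-3-5-haiku-latest") | parser

# If the primary raises, the backup is invoked with the same input.
chain = primary.with_fallbacks([backup])
print(chain.invoke({"query": "What is LCEL?"}))
```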

The caller doesn't need to know which model responded. If OpenAI returns a successful response, Claude is never called. If OpenAI raises any exception — RateLimitError, APITimeoutError, APIConnectionError — the fallback chain takes over seamlessly.

Chaining Multiple Fallbacks

You're not limited to one fallback. Pass a list and LangChain tries each one in order until something succeeds. The example below implements a three-tier strategy: OpenAI as the cloud primary, Claude as a cloud backup from a different provider, and a local Ollama model (llama3.2) as the last resort. The local model is slower but has zero API dependency — it guarantees a response even during a complete cloud outage.

Cascading fallbacks: three models deep
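A sketch of the three-tier cascade (model names illustrative; assumes the two cloud API keys are set and an Ollama server is running locally with `llama3.2` pulled):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer briefly:\n\n{query}")
parser = StrOutputParser()

openai_chain = prompt | ChatOpenAI(model="gpt-4o-mini") | parser
claude_chain = prompt | ChatAnthropic(model="claude-3-5-haiku-latest") | parser
local_chain = prompt | ChatOllama(model="llama3.2") | parser  # no cloud dependency

# Tried strictly in order: OpenAI, then Claude, then the local model.
chain = openai_chain.with_fallbacks([claude_chain, local_chain])
print(chain.invoke({"query": "What is LCEL?"}))
```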

Notice that the local chain uses ChatOllama from langchain_ollama (install via pip install langchain-ollama). The .with_fallbacks() list is ordered — LangChain tries the primary first, then the first fallback, then the second. The moment any chain returns a successful response, the remaining fallbacks are skipped entirely.

Filtering Which Exceptions Trigger Fallbacks

By default, .with_fallbacks() catches any exception. But sometimes you only want to fall back on specific errors — a rate limit should trigger a fallback, but a malformed prompt should raise immediately so you can fix it.

Fall back only on specific exception types
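A sketch using the `exceptions_to_handle` parameter, assuming `primary_chain` and `backup_chain` are LCEL chains built as in the earlier fallback examples:

```python
from openai import APIConnectionError, APITimeoutError, RateLimitError

# Only transient API errors trigger the fallback; anything else
# (e.g. a KeyError from a bad prompt variable) raises immediately.
chain = primary_chain.with_fallbacks(
    [backup_chain],
    exceptions_to_handle=(RateLimitError, APITimeoutError, APIConnectionError),
)
```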

This is a subtle but important distinction. Catching all exceptions can mask bugs. If your prompt template has a missing variable, you want that error to surface immediately, not silently switch to a different model that might also fail for the same reason.

Retrying the Same Chain with .with_retry()

Fallbacks switch to a different chain. But sometimes you want to re-run the same chain — the error was transient (a brief rate limit spike, a network blip) and the same call would succeed on a second attempt. That's what .with_retry() does.

Retry with exponential backoff before falling back
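A sketch combining both methods, again assuming `primary_chain` and `backup_chain` from the earlier examples:

```python
# Retry the primary up to 3 times with jittered exponential backoff;
# only if every attempt fails does the backup chain take over.
resilient_chain = (
    primary_chain
    .with_retry(stop_after_attempt=3, wait_exponential_jitter=True)
    .with_fallbacks([backup_chain])
)
```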

The key difference: .with_retry() re-runs the same runnable with backoff delays between attempts. .with_fallbacks() switches to a completely different runnable. Combining both gives you the most resilient pattern — retry the primary a few times, then fall back to a different provider only if the primary is truly down.


Exercise 2: Implement a Fallback Pipeline

Write a function call_with_fallbacks that simulates LangChain's fallback pattern.

1. Accept a list of callables (functions) and an input value.

2. Try each callable in order. If it returns a result without raising an exception, return that result.

3. If a callable raises any exception, catch it and move to the next one.

4. If all callables fail, raise a RuntimeError with the message "All fallbacks exhausted".

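Attempt it yourself before reading on. One possible reference solution:

```python
def call_with_fallbacks(callables, value):
    """Try each callable in order; return the first successful result."""
    for fn in callables:
        try:
            return fn(value)
        except Exception:
            continue  # this "chain" failed — move to the next fallback
    raise RuntimeError("All fallbacks exhausted")
```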

Batch Processing — Scaling to Many Inputs

Every Runnable in LangChain has a .batch() method that processes a list of inputs concurrently. This is essential when you need to classify 500 support tickets, summarise 100 documents, or extract entities from a batch of emails.

Batch processing with concurrency control
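A sketch, assuming `chain` is any of the LCEL chains above that takes a `{query}` variable:

```python
tickets = [
    "I can't log in after the update.",
    "Please cancel my subscription.",
    "Do you offer a student discount?",
]

# One call, many inputs — LangChain fans them out concurrently
# and returns the results in the same order as the inputs.
results = chain.batch([{"query": t} for t in tickets])
```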

By default, .batch() does not cap concurrency — inputs are dispatched as fast as the underlying executor allows. You can bound it with the max_concurrency parameter.

Controlling concurrency to respect rate limits
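The cap is passed through the call's config dict (again assuming `chain` and a list of ticket strings as above):

```python
# At most 3 requests in flight at once, to stay under provider rate limits.
results = chain.batch(
    [{"query": t} for t in tickets],
    config={"max_concurrency": 3},
)
```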

Set max_concurrency based on your API provider's rate limits. OpenAI's tier 1 allows about 500 requests per minute, so max_concurrency=10 is usually safe. For free-tier Anthropic or Ollama running locally, keep it at 2-3.


Combining Patterns — Real-World Example

The power of these patterns comes from combining them. Here's a realistic scenario I've built variations of for multiple clients: a customer support pipeline that handles incoming tickets end-to-end.

The pipeline below has five stages. Step 1: A classifier chain categorises each ticket as billing, technical, or general. Step 2: Three specialised handler chains are built, each with its own system prompt. Step 3: A routing function reads the classification and selects the right handler, wrapping each one with a .with_fallbacks() to the general chain. Step 4: RunnableParallel + RunnableLambda compose the classify-then-route pipeline. Step 5: .batch() processes multiple tickets concurrently.

Full support pipeline: classify, route, fallback, batch
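A sketch of the five stages — prompts, category names, and the `gpt-4o-mini` model are all illustrative, and `OPENAI_API_KEY` is assumed:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

# Step 1: classifier chain
classifier = ChatPromptTemplate.from_template(
    "Classify this support ticket as one word — billing, technical, or general:\n\n{ticket}"
) | model | parser

# Step 2: specialised handlers, each with its own system prompt
def handler(system: str):
    prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{ticket}")])
    return prompt | model | parser

general_chain = handler("You are a friendly support agent. Answer concisely.")
billing_chain = handler("You are a billing specialist.").with_fallbacks([general_chain])
technical_chain = handler("You are a support engineer.").with_fallbacks([general_chain])

# Step 3: route on the classification; the returned chain sees the same dict
def route(info: dict):
    category = info["category"].strip().lower()
    if "billing" in category:
        return billing_chain
    if "technical" in category:
        return technical_chain
    return general_chain

# Step 4: classify and passthrough in parallel, then route
pipeline = RunnableParallel(
    category=classifier,
    ticket=lambda x: x["ticket"],
) | RunnableLambda(route)

# Step 5: batch over many tickets concurrently
tickets = [{"ticket": "I was charged twice this month"},
           {"ticket": "The SDK throws a 401 on every request"}]
responses = pipeline.batch(tickets, config={"max_concurrency": 5})
print(responses)
```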

This pipeline uses all four patterns. The classifier and ticket passthrough run in parallel via RunnableParallel. The classifier's output determines the branch via the routing function. Each specialised handler has a fallback to the general chain. And the entire pipeline processes multiple tickets concurrently via .batch(). That's a production-grade support system in about 50 lines. To add structured output from these chains, see the output parsers tutorial.


Common Mistakes and How to Fix Them

Mistake 1: Mismatched Keys Between Steps

The most frequent error I see is a KeyError because the output of one step doesn't match the input variables of the next step's prompt template.

Breaks: output key doesn't match template variable
# Step 1 output has key "result"
step1 = RunnableParallel(result=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # KeyError!
)
Works: keys match between steps
# Step 1 output has key "summary"
step1 = RunnableParallel(summary=some_chain)

# Step 2 template expects key "summary"
step2_prompt = ChatPromptTemplate.from_template(
    "Translate: {summary}"  # Matches!
)

Mistake 2: Forgetting to Repackage String Outputs

When a chain ending in StrOutputParser() feeds into a prompt template, the raw string won't match the template's expected dictionary input.

Breaks: string fed directly into prompt expecting dict
chain = (
    prompt_a | model | StrOutputParser()
    | prompt_b  # Expects {"text": ...}, gets a string
    | model | StrOutputParser()
)
Works: RunnableLambda repackages the string
chain = (
    prompt_a | model | StrOutputParser()
    | RunnableLambda(lambda s: {"text": s})
    | prompt_b | model | StrOutputParser()
)

Mistake 3: RunnableBranch Conditions Not Returning Booleans

Each condition in RunnableBranch must be a callable that returns True or False. Returning a truthy string or non-empty list works in Python but leads to confusing bugs when the "wrong" branch fires.

Fragile: truthy string, always matches
# This ALWAYS matches because non-empty strings are truthy
(lambda x: x.get("category", ""),
 technical_chain)
Correct: explicit boolean comparison
# Explicit comparison returns True/False
(lambda x: x.get("category", "") == "technical",
 technical_chain)

Mistake 4: Catching All Exceptions in Fallbacks

Using .with_fallbacks() without exceptions_to_handle catches everything, including programming errors that should crash loudly.

Be specific about which exceptions trigger fallbacks
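The contrast, sketched with `primary_chain` and `backup_chain` standing in for any two LCEL chains:

```python
from openai import APITimeoutError, RateLimitError

# Too broad: a KeyError from a bad template silently switches models.
fragile = primary_chain.with_fallbacks([backup_chain])

# Better: only transient API errors fall back; bugs still crash loudly.
robust = primary_chain.with_fallbacks(
    [backup_chain],
    exceptions_to_handle=(RateLimitError, APITimeoutError),
)
```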

Exercise 3: Build a Priority Router

Write a function route_by_priority that simulates conditional routing.

1. Accept a dictionary with keys "priority" (string) and "message" (string).

2. If priority is "high", return "[URGENT] <message>".

3. If priority is "medium", return "[NORMAL] <message>".

4. For any other priority (including "low"), return "[LOW] <message>".

All comparisons should be case-insensitive.

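Write your own version first. One possible reference solution:

```python
def route_by_priority(item):
    """Prefix the message according to its (case-insensitive) priority."""
    priority = item["priority"].lower()
    message = item["message"]
    if priority == "high":
        return f"[URGENT] {message}"
    if priority == "medium":
        return f"[NORMAL] {message}"
    return f"[LOW] {message}"  # default branch, including "low"
```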

Performance and Design Tips

After building dozens of chain architectures, here are the patterns that consistently matter for performance and maintainability.

Use the smallest model for classification and routing. A gpt-4o-mini call for routing costs ~$0.0001 and takes ~300ms. Using gpt-4o for the same routing call costs 30x more and takes 2x longer, with no accuracy improvement for simple classification tasks.

Concurrency and retry configuration in one place
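A sketch of where each setting lives, assuming `chain` and `inputs` from the earlier examples:

```python
from langchain_openai import ChatOpenAI

# Transient-failure handling is configured on the model itself...
model = ChatOpenAI(model="gpt-4o-mini", max_retries=3, timeout=30)

# ...while throughput is capped explicitly at call time.
results = chain.batch(inputs, config={"max_concurrency": 10})
```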

I always set max_concurrency explicitly rather than relying on the default. And I configure max_retries at the model level — ChatOpenAI(max_retries=3) handles transient failures with exponential backoff automatically. Reserve .with_fallbacks() for when you want to switch to a genuinely different provider.


Quick Reference

Pattern        LangChain Class       Key Method   Use When
Sequential     Pipe operator (|)     .invoke()    Output of A feeds into B
Parallel       RunnableParallel      .invoke()    Multiple independent tasks on same input
Branching      RunnableBranch        .invoke()    Route to different chains by condition
Fallback       .with_fallbacks()     .invoke()    Graceful degradation on errors
Batch          Any Runnable          .batch()     Process many inputs concurrently
Passthrough    RunnablePassthrough   .invoke()    Carry original input alongside transformed data
Custom logic   RunnableLambda        .invoke()    Transform data between steps

Complete Code

Here is a self-contained script you can copy into a single .py file and run locally. It demonstrates all five patterns in sequence: (1) a sequential summarise-then-translate chain, (2) a parallel three-analysis fan-out, (3) a keyword-based branching router, (4) a fallback chain with a backup model, and (5) batch processing with concurrency control. Replace the API key placeholder with your own key before running.

Complete code: all chain patterns in one script
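A condensed sketch of such a script. It assumes `langchain-openai` and `langchain-anthropic` are installed; the prompts and model names are illustrative, and the key placeholders must be replaced before running:

```python
"""All five chain patterns in one runnable script."""
import os

from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda, RunnableParallel
from langchain_openai import ChatOpenAI

os.environ.setdefault("OPENAI_API_KEY", "YOUR-OPENAI-KEY")        # replace
os.environ.setdefault("ANTHROPIC_API_KEY", "YOUR-ANTHROPIC-KEY")  # replace

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

# 1. Sequential: summarise, then translate the summary
sequential = (
    ChatPromptTemplate.from_template("Summarise plainly:\n\n{text}") | model | parser
    | RunnableLambda(lambda s: {"summary": s})
    | ChatPromptTemplate.from_template("Translate into Spanish:\n\n{summary}") | model | parser
)

# 2. Parallel: three analyses fan out over the same input
parallel = RunnableParallel(
    sentiment=ChatPromptTemplate.from_template("Sentiment of:\n\n{text}") | model | parser,
    summary=ChatPromptTemplate.from_template("One-line summary of:\n\n{text}") | model | parser,
    keywords=ChatPromptTemplate.from_template("Five keywords from:\n\n{text}") | model | parser,
)

# 3. Branching: keyword-based router with a default
technical = ChatPromptTemplate.from_template("Answer technically:\n\n{query}") | model | parser
general = ChatPromptTemplate.from_template("Answer concisely:\n\n{query}") | model | parser
branch = RunnableBranch(
    (lambda x: any(w in x["query"].lower() for w in ("error", "bug", "crash")), technical),
    general,
)

# 4. Fallback: Claude takes over if OpenAI raises
ask = ChatPromptTemplate.from_template("{query}")
resilient = (ask | model | parser).with_fallbacks(
    [ask | ChatAnthropic(model="claude-3-5-haiku-latest") | parser]
)

if __name__ == "__main__":
    text = {"text": "LCEL composes prompts, models, and parsers into pipelines."}
    print(sequential.invoke(text))
    print(parallel.invoke(text))
    print(branch.invoke({"query": "Why does my import error out?"}))
    # 5. Batch: many inputs, bounded concurrency
    print(resilient.batch(
        [{"query": "What is LCEL?"}, {"query": "Name one chain pattern."}],
        config={"max_concurrency": 2},
    ))
```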

Frequently Asked Questions

Can I nest RunnableParallel inside RunnableBranch?

Yes. Every LangChain Runnable is composable with every other Runnable. You can put a RunnableParallel inside one branch of a RunnableBranch, or chain multiple RunnableBranch nodes in sequence. The pipe operator and all composition methods work regardless of nesting depth.

What happens if a branch condition raises an exception?

If a condition callable raises an exception, RunnableBranch does not catch it — it propagates up. Only the matched branch's runnable execution is subject to fallback handling. Wrap your condition logic in a try/except if it involves operations that might fail, such as parsing or external lookups.

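A plain-Python sketch of such a wrapper (the `safe_condition` name and the score example are hypothetical):

```python
def safe_condition(check):
    """Wrap a branch condition so failures count as 'no match' instead of crashing."""
    def wrapped(x):
        try:
            return bool(check(x))
        except Exception:
            return False
    return wrapped

# Inside a RunnableBranch, e.g.:
#   (safe_condition(lambda x: int(x["score"]) > 7), escalation_chain)
```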

Is .batch() the same as calling .invoke() in a loop with asyncio.gather?

Functionally, close — the async variant .abatch() runs .ainvoke() concurrently via asyncio under the hood, while the synchronous .batch() does the same work on a thread pool. Both also handle max_concurrency limiting, error collection, and consistent return ordering. Writing your own asyncio.gather is possible, but .batch() handles the edge cases for you.

How do I log which fallback was used?

LangChain's callback system reports which runnable executed. Attach a logging callback to see which chain in the fallback sequence actually produced the result. The LangSmith UI also shows this visually in the trace view.

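A minimal sketch of such a handler, assuming `chain` is a fallback chain from earlier in the tutorial (the `FallbackLogger` name is hypothetical, and the exact contents of `serialized` may vary across LangChain versions):

```python
from langchain_core.callbacks import BaseCallbackHandler

class FallbackLogger(BaseCallbackHandler):
    """Print which model actually runs, so you can see fallbacks firing."""
    def on_llm_start(self, serialized, prompts, **kwargs):
        # serialized["id"] carries the class path of the model being invoked
        print("LLM starting:", serialized.get("id", ["?"])[-1])

result = chain.invoke(
    {"query": "hello"},
    config={"callbacks": [FallbackLogger()]},
)
```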

Can I add memory to a chain so it remembers previous messages?

Yes. Chains are stateless by default — each .invoke() call starts fresh. To add conversation memory, wrap your chain with a message history store. Our chatbot memory tutorial walks through RunnableWithMessageHistory, buffer memory, and window-based memory patterns that plug directly into any chain from this tutorial.

How do I call external APIs or databases from within a chain?

Use RunnableLambda to wrap any Python function — database queries, REST API calls, file reads — and pipe it into your chain like any other step. For structured tool use with automatic argument parsing and error handling, see the LangChain tools tutorial. Tools integrate seamlessly with the branching and fallback patterns covered here.


References

  • LangChain documentation — LCEL primitives: RunnableParallel, RunnablePassthrough, RunnableLambda.
  • LangChain documentation — How to add fallbacks to a runnable.
  • LangChain documentation — How to route between sub-chains.
  • LangChain documentation — How to invoke runnables in parallel.
  • LangChain documentation — Batch processing with runnables.
  • Harrison Chase, "LangChain Expression Language," LangChain Blog (2023).