
LCEL Mastery: Compose LLM Chains with the Pipe Operator in Python

Intermediate · 90 min · 3 exercises · 50 XP

You have a prompt template, an LLM, and an output parser. You could glue them together with five lines of imperative code — a variable here, a function call there. Or you could write prompt | model | parser and be done. That pipe operator is LCEL, and once you see how it works, you won't go back.

What Is LCEL and Why Does LangChain Use It?

LCEL stands for LangChain Expression Language. It is a declarative way to compose LangChain components into pipelines using the | (pipe) operator. Prompts, models, parsers, retrievers, custom functions — chain them together, and data flows from left to right.

I think of it as Unix pipes for LLM applications. In a shell, you write cat file.txt | grep error | wc -l and data flows left to right through each command. LCEL does the same thing: data flows through each component, and you can read the entire pipeline at a glance.


The key insight: every component in LangChain implements a Runnable interface. A Runnable has three core methods — .invoke(), .stream(), and .batch(). When you pipe Runnables together, the resulting chain is itself a Runnable. You get invoke/stream/batch on the entire pipeline for free.

Your First LCEL Chain — Prompt, Model, Parser

Let's build the most common LCEL pattern: a prompt template piped to a chat model piped to an output parser. This is the bread and butter of LangChain apps, and I reach for this pattern probably ten times a week.

The classic prompt | model | parser chain

That single line — prompt | model | parser — replaces three separate .invoke() calls. The ChatPromptTemplate takes a dictionary with your variables and formats them into messages. The model returns an AIMessage, and StrOutputParser extracts just the text content as a plain string.

What makes this more than syntactic sugar is what happens behind the scenes. Call chain.stream({"concept": "generators"}) and tokens stream through the parser in real time — zero streaming logic on your part. Call chain.batch([...]) and inputs run in parallel. The composition is doing real work.

The Runnable Interface — invoke, stream, batch

Three ways to run any LCEL chain

Every LCEL chain supports all three methods, no matter how complex. You build the pipeline once, then choose how to run it. Building an API? Use stream so users see tokens arrive live. Processing a batch of documents? Use batch to parallelize. Simple script? Plain invoke.

There are async variants too: ainvoke, astream, and abatch. If your app uses asyncio (most modern Python web frameworks do), these avoid blocking the event loop.

RunnableSequence — Building Multi-Step Pipelines

When you write a | b | c, LangChain creates a RunnableSequence. Each step takes the output of the previous step as input. The sequence is itself a Runnable, so you can nest sequences inside other sequences.

Chaining two LCEL pipelines together

Notice the RunnableLambda in the middle. The joke chain outputs a plain string, but the analysis chain expects a dictionary with a "joke" key. The lambda reshapes the data to bridge that gap. You will use this pattern constantly — LCEL is strict about input/output types, and lambdas are the glue.

RunnableParallel — Running Steps Side by Side

Sometimes you need to run multiple operations on the same input simultaneously. Maybe you want to translate text into three languages at once, or generate a summary and a list of key points in parallel. That's what RunnableParallel is for.

Analyzing text three ways in parallel

RunnableParallel takes keyword arguments where each value is a Runnable chain. It passes the same input to all of them, runs them concurrently, and collects results into a dictionary. The keys you choose (summary, keywords, sentiment) become the output keys.

I use this pattern heavily when building real applications. A common case: you have a user query and you need to both retrieve relevant documents AND classify the query intent before deciding how to respond. Running those in parallel instead of sequentially cuts your latency roughly in half.


Build a Data Pipeline with the Pipe Pattern
Write Code

LCEL uses the pipe operator to chain functions together. Implement a pipe function that takes a value and a list of functions, and applies each function in sequence — just like the | operator chains Runnables.

Write a function pipe(value, *functions) that:

1. Takes an initial value and any number of functions

2. Applies the first function to the value

3. Passes the result to the second function, and so on

4. Returns the final result

Then create three simple transformation functions and pipe them together.

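One possible solution sketch, if you want to check your work (the three transformation functions are arbitrary examples):

```python
def pipe(value, *functions):
    """Apply each function in sequence, passing results left to right."""
    for fn in functions:
        value = fn(value)
    return value

# Three simple transformations to pipe together
def strip_text(s):
    return s.strip()

def to_upper(s):
    return s.upper()

def exclaim(s):
    return s + "!"

print(pipe("  hello  ", strip_text, to_upper, exclaim))  # HELLO!
```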

RunnableLambda — Injecting Custom Python Logic

Not every step in a chain is a prompt or a model call. Sometimes you need to clean text, extract a field, log intermediate results, or run arbitrary Python logic. RunnableLambda wraps any Python function into a Runnable so it slots into an LCEL chain.

Custom preprocessing and postprocessing with RunnableLambda

The first RunnableLambda cleans the input and computes metadata that the prompt template uses. The last one formats the raw model output into a structured display. Both are plain Python functions — no LangChain magic, just functions that take one argument and return one value.

RunnablePassthrough — Forwarding and Augmenting Data

Here is a problem you will hit quickly: your chain needs some input data to pass through unchanged while also computing new fields. For example, you want to pass the user's original question to the prompt while also retrieving relevant documents. RunnablePassthrough solves this.

Using RunnablePassthrough.assign() in a RAG-style chain

RunnablePassthrough.assign(context=...) keeps all existing keys (question) and adds a new key (context) computed by your function. The downstream prompt template receives both {question} and {context}. This pattern is the foundation of every RAG pipeline in LangChain.

Without RunnablePassthrough (verbose)
from langchain_core.runnables import RunnableLambda

# Must manually construct the full dict
# (assumes prompt, model, parser, and fake_retriever are already defined)
chain = (
    RunnableLambda(lambda x: {
        "question": x["question"],
        "context": fake_retriever(x["question"])
    })
    | prompt
    | model
    | parser
)
With RunnablePassthrough.assign (clean)
from langchain_core.runnables import RunnablePassthrough

# Passthrough keeps existing keys, adds new ones
chain = (
    RunnablePassthrough.assign(
        context=lambda x: fake_retriever(x["question"])
    )
    | prompt
    | model
    | parser
)

Fallbacks and Retries — Building Resilient Chains

LLM APIs fail. Rate limits, timeouts, server errors — these are not edge cases, they are Tuesday. In my experience, any production LLM application that does not handle failures will crash within the first week. LCEL has built-in support for both retries and fallbacks.

Retries — Try Again on Transient Failures

Adding retry logic to a model call

.with_retry() wraps any Runnable with automatic retry logic. With wait_exponential_jitter=True, wait times grow exponentially (1s, 2s, 4s...) with random jitter. The jitter prevents multiple failing clients from retrying simultaneously and hammering the API.

Fallbacks — Switch to a Backup Model

Falling back from one model to another

.with_fallbacks() takes a list of alternative Runnables. If the primary raises an exception, LangChain tries each fallback in order. In production, I typically chain gpt-4o-mini to gpt-4o to claude-3-haiku so there is always a model available.


Implement a Parallel Merge Function
Write Code

RunnableParallel runs multiple functions on the same input and collects results into a dictionary. Implement a parallel_run function that mimics this behavior.

Write a function parallel_run(input_data, **functions) that:

1. Takes an input value and keyword arguments where each value is a function

2. Calls every function with the same input

3. Returns a dictionary mapping each keyword name to its function's result

Example: parallel_run(10, doubled=lambda x: x*2, squared=lambda x: x**2) returns {"doubled": 20, "squared": 100}

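One possible solution sketch. Note this version calls the functions one after another; the real RunnableParallel runs them concurrently (a threaded version could use concurrent.futures):

```python
def parallel_run(input_data, **functions):
    """Call every function with the same input; collect results by name."""
    return {name: fn(input_data) for name, fn in functions.items()}

result = parallel_run(10, doubled=lambda x: x * 2, squared=lambda x: x ** 2)
print(result)  # {"doubled": 20, "squared": 100}
```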

Real-World Example: A Multi-Step Document Analyzer

This is the kind of pipeline I build most often at work. You have raw text — could be a support ticket, a research paper, or a news article — and you need a structured analysis report. Summary, key entities, difficulty rating, all in one pass.

Complete document analyzer pipeline

This pipeline combines every LCEL concept from this tutorial. RunnableLambda handles preprocessing and formatting. RunnablePassthrough.assign forwards the cleaned text while running three analyses in parallel. The whole thing reads top to bottom and supports streaming, batching, and async out of the box.

Common Mistakes and How to Fix Them

Mistake 1: Input/Output Type Mismatch

This is the single most frequent LCEL error, and I see it in almost every first-time LCEL project. A prompt template expects a dictionary, but the previous step outputs a string. Or a model outputs an AIMessage, but you pipe it into a function expecting a string.

Broken — string fed to prompt expecting dict
# StrOutputParser returns a string
chain_1 = prompt | model | StrOutputParser()

# ChatPromptTemplate expects a dict with keys
chain_2 = another_prompt | model | StrOutputParser()

# Composing succeeds, but .invoke() raises a TypeError at runtime:
# chain_1 outputs a string, and chain_2's prompt needs a dict
broken = chain_1 | chain_2
Fixed — reshape with RunnableLambda
from langchain_core.runnables import RunnableLambda

chain_1 = prompt | model | StrOutputParser()
chain_2 = another_prompt | model | StrOutputParser()

# Bridge the gap with a lambda (assumes another_prompt has an {input} variable)
fixed = (
    chain_1
    | RunnableLambda(lambda text: {"input": text})
    | chain_2
)

Mistake 2: Forgetting That StrOutputParser Discards Metadata

StrOutputParser extracts just the text content from an AIMessage. Token usage, finish reason, and all other metadata are gone after that point. If you need metadata downstream, skip StrOutputParser and extract what you need with a custom lambda.

Keeping metadata when you need it

Mistake 3: Using Lambda Expressions That Are Hard to Debug

Inline lambdas are convenient but invisible in stack traces. When a chain with five lambdas fails, the error says "error in <lambda>" — no indication of which one broke. For anything beyond trivial transforms, use named functions.

Hard to debug — anonymous lambdas
chain = (
    RunnableLambda(lambda x: x["text"].strip())
    | RunnableLambda(lambda x: {"query": x, "k": 5})
    | RunnableLambda(lambda x: retriever(x))  # which lambda failed?
)
Easy to debug — named functions
def clean_text(x):
    return x["text"].strip()

def prepare_query(text):
    return {"query": text, "k": 5}

def retrieve_docs(params):
    return retriever(params)

chain = (
    RunnableLambda(clean_text)
    | RunnableLambda(prepare_query)
    | RunnableLambda(retrieve_docs)
)

Performance Tips and How LCEL Works Internally

I have never seen LCEL overhead be the bottleneck — the pipe operator creates a RunnableSequence at definition time, and .invoke() just loops through steps. The real performance wins come from how you structure your chains.

Use `RunnableParallel` for independent LLM calls. If two steps don't depend on each other, run them in parallel. Three sequential 2-second LLM calls = 6 seconds. Three parallel calls = 2 seconds. That is the single biggest performance win in LCEL.

Use `.batch()` for processing multiple inputs. Calling chain.invoke() in a loop is sequential. Calling chain.batch(items) runs them concurrently with configurable parallelism:

Batch processing with concurrency control

Prefer `.stream()` over `.invoke()` for user-facing apps. Streaming gives users something to read immediately instead of staring at a blank screen. Total time is the same, but perceived latency drops dramatically.


Build a Chain with Fallback Logic
Write Code

Implement a with_fallback function that wraps a primary function with fallback behavior — just like LCEL's .with_fallbacks() method.

Write a function with_fallback(primary, *fallbacks) that:

1. Returns a new function

2. When called, the new function tries the primary function first

3. If the primary raises any exception, it tries each fallback function in order

4. Returns the result of the first function that succeeds

5. If all functions fail, raises the last exception

Also implement with_retry(func, max_attempts=3) that:

1. Returns a new function that retries func up to max_attempts times

2. If all attempts fail, raises the last exception

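One possible solution sketch for both functions (the `broken` helper at the end is just a demo):

```python
def with_fallback(primary, *fallbacks):
    """Return a function that tries primary, then each fallback in order."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for fn in (primary, *fallbacks):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc
    return wrapped

def with_retry(func, max_attempts=3):
    """Return a function that retries func up to max_attempts times."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for _ in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc
    return wrapped

def broken(x):
    raise RuntimeError("primary down")

safe = with_fallback(broken, lambda x: f"fallback: {x}")
print(safe("hi"))  # fallback: hi
```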

Frequently Asked Questions

Can I use LCEL with non-LangChain functions?

Yes. Wrap any Python callable in RunnableLambda and it becomes a full LCEL citizen with .invoke(), .batch(), and .stream() support. You can also use the @chain decorator for the same effect with cleaner syntax.

Wrapping any Python function as a Runnable

Is LCEL faster than calling components manually?

For a single sequential chain, LCEL and manual calls have nearly identical performance. The bottleneck is the LLM API call, not chain overhead. Where LCEL wins is RunnableParallel and .batch(), which parallelize work you would otherwise thread manually. LCEL also handles streaming propagation across multi-step chains automatically.

What is the difference between RunnableSequence and the old LLMChain?

LLMChain was the original LangChain API for combining a prompt and a model. It has been deprecated in favor of LCEL. The old LLMChain(prompt=prompt, llm=model) is now simply prompt | model.

LCEL is more composable (chain anything with anything), more transparent (no hidden state), and supports streaming and batching natively. If you see LLMChain in tutorials or Stack Overflow answers, translate it to LCEL.

How do I debug an LCEL chain that returns unexpected results?

Insert a RunnableLambda that prints intermediate values at the point where you suspect the issue. This is the chain equivalent of adding print statements:

Inserting debug taps into an LCEL chain

For production debugging, use LangSmith. It captures the full trace of every chain execution — inputs, outputs, latency, and token usage at each step. It is the dedicated observability tool for LangChain apps.
