LCEL Mastery: Compose LLM Chains with the Pipe Operator in Python
You have a prompt template, an LLM, and an output parser. You could glue them together with five lines of imperative code — variable here, function call there. Or you could write prompt | model | parser and be done. That pipe operator is LCEL, and once you see how it works, you won't go back.
What Is LCEL and Why Does LangChain Use It?
LCEL stands for LangChain Expression Language. It is a declarative way to compose LangChain components into pipelines using the | (pipe) operator. Prompts, models, parsers, retrievers, custom functions — chain them together, and data flows from left to right.
I think of it as Unix pipes for LLM applications. In a shell, you write cat file.txt | grep error | wc -l and data flows left to right through each command. LCEL does the same thing: data flows through each component, and you can read the entire pipeline at a glance.
The key insight: every component in LangChain implements a Runnable interface. A Runnable has three core methods — .invoke(), .stream(), and .batch(). When you pipe Runnables together, the resulting chain is itself a Runnable. You get invoke/stream/batch on the entire pipeline for free.
Your First LCEL Chain — Prompt, Model, Parser
The most common LCEL pattern connects three components. `ChatPromptTemplate` formats your input variables into messages. `ChatOpenAI` sends those messages to the OpenAI API and returns an `AIMessage`. `StrOutputParser` extracts the plain text content from that `AIMessage`.
I reach for this three-piece chain constantly. The code below creates each component separately, then composes them with |. When you call chain.invoke({...}), your dictionary flows through the prompt, the resulting messages hit the model, and the parser hands you back a clean string.
What makes this more than syntactic sugar is what happens behind the scenes. Call chain.stream({"concept": "generators"}) and tokens stream through the parser in real time — zero streaming logic on your part. Call chain.batch([...]) and inputs run in parallel. The composition is doing real work.
The Runnable Interface — invoke, stream, batch
Every LCEL chain — no matter how complex — exposes the same three methods. .invoke(input) runs the chain once with a single input and returns the result. .batch([input_1, input_2, ...]) runs the chain on multiple inputs concurrently, returning a list of results. .stream(input) yields output tokens one at a time as they arrive from the model.
The code below builds the same prompt | model | parser chain and demonstrates all three calling methods. The chain definition is identical each time — only the method changes. I find this uniform interface the core design win of LCEL: build once, run any way you need.
Every LCEL chain supports all three methods, no matter how complex. Building an API? Use stream so users see tokens arrive live. Processing a batch of documents? Use batch to parallelize. Simple script? Plain invoke.
RunnableSequence — Building Multi-Step Pipelines
When you write a | b | c, LangChain creates a RunnableSequence. Each step takes the output of the previous step as input. The sequence is itself a Runnable, so you can nest sequences inside other sequences.
The example below chains two separate pipelines: a joke generator that outputs a string and a joke analyzer that expects a {"joke": ...} dictionary. Because the types don't match, a RunnableLambda in the middle reshapes the output. Watch for this shape-bridging pattern — you will use it constantly in multi-chain LCEL compositions.
Notice the RunnableLambda in the middle. The joke chain outputs a plain string, but the analysis chain expects a dictionary with a "joke" key. The lambda reshapes the data to bridge that gap. LCEL is strict about input/output types, and lambdas are the glue.
RunnableParallel — Running Steps Side by Side
Sometimes you need to run multiple operations on the same input simultaneously. Maybe you want to translate text into three languages at once, or generate a summary and a list of key points in parallel. That's what RunnableParallel is for.
The code below creates three independent analysis chains (summary, keywords, sentiment) and wraps them in a RunnableParallel. When invoked, all three chains receive the same {text} input, run concurrently, and return their results as a dictionary with keys matching the names you chose.
I use this pattern heavily when building real applications. A common case: you have a user query and you need to both retrieve documents from a vector database AND classify the query intent before deciding how to respond. Running those in parallel instead of sequentially cuts your latency roughly in half.
LCEL uses the pipe operator to chain functions together. Implement a pipe function that takes a value and a list of functions, and applies each function in sequence — just like the | operator chains Runnables.
Write a function pipe(value, *functions) that:
1. Takes an initial value and any number of functions
2. Applies the first function to the value
3. Passes the result to the second function, and so on
4. Returns the final result
Then create three simple transformation functions and pipe them together.
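One possible solution, using `functools.reduce` to thread the value through each function in turn:

```python
from functools import reduce

def pipe(value, *functions):
    """Apply each function in sequence, feeding each result to the next."""
    return reduce(lambda acc, fn: fn(acc), functions, value)

# Three simple transformation functions to pipe together.
def strip_text(s):
    return s.strip()

def to_lower(s):
    return s.lower()

def word_count(s):
    return len(s.split())

result = pipe("  Hello LCEL World  ", strip_text, to_lower, word_count)
print(result)  # 3
```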
RunnableLambda — Injecting Custom Python Logic
Not every step in a chain is a prompt or a model call. Sometimes you need to clean text, extract a field, log intermediate results, or run arbitrary Python logic. RunnableLambda wraps any Python function into a Runnable so it slots into an LCEL chain.
The chain below uses two custom functions: clean_and_prepare at the front strips whitespace, lowercases the input, counts words, and returns a dictionary with text, word_count, and instruction keys for the prompt template. format_output at the end wraps the raw model response in a structured display with header lines. Both are plain Python functions that take one argument and return one value.
RunnablePassthrough — Forwarding and Augmenting Data
Here is a problem you will hit quickly: your chain needs some input data to pass through unchanged while also computing new fields. For example, in a RAG pipeline you want to pass the user's original question to the prompt while also retrieving relevant documents. RunnablePassthrough solves this.
The code below simulates a RAG pipeline. RunnablePassthrough.assign(context=...) keeps the original question key intact and adds a new context key by calling fake_retriever. The downstream prompt template receives both {question} and {context}, which is exactly the pattern every RAG chain in LangChain uses.
# Must manually construct the full dict
chain = (
    RunnableLambda(lambda x: {
        "question": x["question"],
        "context": fake_retriever(x["question"])
    })
    | prompt
    | model
    | parser
)

# Passthrough keeps existing keys, adds new ones
chain = (
    RunnablePassthrough.assign(
        context=lambda x: fake_retriever(x["question"])
    )
    | prompt
    | model
    | parser
)

RunnableBranch — Conditional Routing in Chains
Not every input should follow the same path. A chatbot might route billing questions to one chain and technical questions to another. RunnableBranch adds if/elif/else logic to LCEL pipelines — you define condition-chain pairs, and the first matching condition routes the input to its corresponding chain.
The example below classifies user questions by topic and routes them to specialized chains. RunnableBranch takes a list of (condition_function, runnable) tuples followed by a default runnable. Each condition receives the input and returns True or False. The first True condition wins.
The last argument to RunnableBranch is always the default — no condition function, just a Runnable. If none of the conditions match, the input goes here. This mirrors Python's if/elif/else structure.
Fallbacks and Retries — Building Resilient Chains
LLM APIs fail. Rate limits, timeouts, server errors — these are not edge cases, they are Tuesday. In my experience, any production LLM application that does not handle failures will crash within the first week. LCEL has built-in support for both retries and fallbacks.
Retries — Try Again on Transient Failures
.with_retry() wraps any Runnable with automatic retry logic. The code below configures the model to retry up to 3 times with exponential backoff and jitter. Exponential backoff means wait times grow (1s, 2s, 4s...), and jitter adds randomness so multiple failing clients don't hammer the API simultaneously.
Fallbacks — Switch to a Backup Model
When retries aren't enough — maybe the entire provider is down — you need a fallback to a different model. .with_fallbacks() takes a list of alternative Runnables. If the primary raises an exception, LangChain tries each fallback in order until one succeeds.
In production, I typically chain gpt-4o-mini to gpt-4o to a third provider so there is always a model available. The model switching tutorial covers multi-provider setups in depth.
RunnableParallel runs multiple functions on the same input and collects results into a dictionary. Implement a parallel_run function that mimics this behavior.
Write a function parallel_run(input_data, **functions) that:
1. Takes an input value and keyword arguments where each value is a function
2. Calls every function with the same input
3. Returns a dictionary mapping each keyword name to its function's result
Example: parallel_run(10, doubled=lambda x: x*2, squared=lambda x: x**2) returns {"doubled": 20, "squared": 100}
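One possible solution, a dict comprehension over the keyword arguments:

```python
def parallel_run(input_data, **functions):
    """Call every function with the same input; collect results by keyword name."""
    return {name: fn(input_data) for name, fn in functions.items()}

result = parallel_run(10, doubled=lambda x: x * 2, squared=lambda x: x ** 2)
print(result)  # {"doubled": 20, "squared": 100}
```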
Real-World Example: A Multi-Step Document Analyzer
This is the kind of pipeline I build most often at work. You have raw text — maybe loaded via document loaders — and you need a structured analysis report with summary, entities, and difficulty rating, all in one pass.
The pipeline below combines every LCEL concept from this tutorial into a single chain. It works in three stages: (1) a RunnableLambda preprocesses the raw text — stripping whitespace, counting words and characters; (2) RunnablePassthrough.assign forwards the cleaned data while running three parallel LLM analyses (summary, entity extraction, difficulty rating) via an implicit RunnableParallel; (3) a final RunnableLambda formats all results into a readable report.
Common Mistakes and How to Fix Them
Mistake 1: Input/Output Type Mismatch
This is the single most frequent LCEL error, and I see it in almost every first-time LCEL project. A prompt template expects a dictionary, but the previous step outputs a string. Or a model outputs an AIMessage, but you pipe it into a function expecting a string.
# StrOutputParser returns a string
chain_1 = prompt | model | StrOutputParser()
# ChatPromptTemplate expects a dict with keys
chain_2 = another_prompt | model | StrOutputParser()
# This fails: chain_1 output is a string, chain_2 input needs a dict
broken = chain_1 | chain_2  # TypeError!

chain_1 = prompt | model | StrOutputParser()
chain_2 = another_prompt | model | StrOutputParser()
# Bridge the gap with a lambda
fixed = (
    chain_1
    | RunnableLambda(lambda text: {"input": text})
    | chain_2
)

Mistake 2: Forgetting That StrOutputParser Discards Metadata
StrOutputParser extracts just the text content from an AIMessage. Token usage, finish reason, and all other metadata are gone after that point. If you need metadata downstream, skip StrOutputParser and extract what you need with a custom lambda.
Mistake 3: Using Lambda Expressions That Are Hard to Debug
Inline lambdas are convenient but invisible in stack traces. When a chain with five lambdas fails, the error says "error in <lambda>" — no indication of which one broke. For anything beyond trivial transforms, use named functions.
chain = (
    RunnableLambda(lambda x: x["text"].strip())
    | RunnableLambda(lambda x: {"query": x, "k": 5})
    | RunnableLambda(lambda x: retriever(x))  # which lambda failed?
)

def clean_text(x):
    return x["text"].strip()

def prepare_query(text):
    return {"query": text, "k": 5}

def retrieve_docs(params):
    return retriever(params)

chain = (
    RunnableLambda(clean_text)
    | RunnableLambda(prepare_query)
    | RunnableLambda(retrieve_docs)
)

Performance Tips and How LCEL Works Internally
I have never seen LCEL overhead be the bottleneck — the pipe operator creates a RunnableSequence at definition time, and .invoke() just loops through steps. The real performance wins come from how you structure your chains.
You can also fine-tune model behavior per-chain using .bind(). This attaches fixed kwargs — like stop sequences or response format — to a model without changing the model object itself. Useful when the same model appears in multiple chains with different constraints.
Implement a with_fallback function that wraps a primary function with fallback behavior — just like LCEL's .with_fallbacks() method.
Write a function with_fallback(primary, *fallbacks) that:
1. Returns a new function
2. When called, the new function tries the primary function first
3. If the primary raises any exception, it tries each fallback function in order
4. Returns the result of the first function that succeeds
5. If all functions fail, raises the last exception
Also implement with_retry(func, max_attempts=3) that:
1. Returns a new function that retries func up to max_attempts times
2. If all attempts fail, raises the last exception
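One possible solution for both helpers:

```python
def with_fallback(primary, *fallbacks):
    """Return a function that tries primary, then each fallback in order."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for fn in (primary, *fallbacks):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc  # every function failed
    return wrapped

def with_retry(func, max_attempts=3):
    """Return a function that retries func up to max_attempts times."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for _ in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc  # all attempts failed
    return wrapped

def always_fails(x):
    raise RuntimeError("primary down")

safe = with_fallback(always_fails, lambda x: f"fallback: {x}")
print(safe("hi"))  # fallback: hi
```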
Frequently Asked Questions
Can I use LCEL with non-LangChain functions?
Yes. Wrap any Python callable in RunnableLambda and it becomes a full LCEL citizen with .invoke(), .batch(), and .stream() support. You can also use the @chain decorator for the same effect with cleaner syntax.
Is LCEL faster than calling components manually?
For a single sequential chain, LCEL and manual calls have nearly identical performance. The bottleneck is the LLM API call, not chain overhead. Where LCEL wins is RunnableParallel and .batch(), which parallelize work you would otherwise thread manually. LCEL also handles streaming propagation across multi-step chains automatically.
What is the difference between RunnableSequence and the old LLMChain?
LLMChain was the original LangChain API for combining a prompt and a model. It has been deprecated in favor of LCEL. The old LLMChain(prompt=prompt, llm=model) is now simply prompt | model. LCEL is more composable, more transparent (no hidden state), and supports streaming and batching natively.
How do I debug an LCEL chain that returns unexpected results?
Insert a RunnableLambda that prints intermediate values at the point where you suspect the issue. This is the chain equivalent of adding print statements.
For production debugging, use LangSmith. It captures the full trace of every chain execution — inputs, outputs, latency, and token usage at each step. It is the dedicated observability tool for LangChain apps.
When should I use RunnableBranch vs. a simple if/else?
Use RunnableBranch when the routing logic is part of a larger LCEL pipeline and you want streaming, batching, and tracing to work automatically. Use a plain if/else inside a RunnableLambda when the branching is simple and you prefer readability over LCEL integration.
Where to Go Next
You now have the full LCEL toolkit: pipe composition, parallel fan-out, custom lambdas, passthrough augmentation, conditional branching, and resilient fallbacks. Here are the natural next steps depending on what you are building.