LCEL Mastery: Compose LLM Chains with the Pipe Operator in Python
You have a prompt template, an LLM, and an output parser. You could glue them together with five lines of imperative code: a variable here, a function call there. Or you could write prompt | model | parser and be done. That pipe operator is LCEL, and once you see how it works, you won't go back.
What Is LCEL and Why Does LangChain Use It?
LCEL stands for LangChain Expression Language. It is a declarative way to compose LangChain components into pipelines using the | (pipe) operator. Prompts, models, parsers, retrievers, custom functions — chain them together, and data flows from left to right.
I think of it as Unix pipes for LLM applications. In a shell, you write cat file.txt | grep error | wc -l and data flows left to right through each command. LCEL does the same thing: data flows through each component, and you can read the entire pipeline at a glance.
The key insight: every component in LangChain implements a Runnable interface. A Runnable has three core methods — .invoke(), .stream(), and .batch(). When you pipe Runnables together, the resulting chain is itself a Runnable. You get invoke/stream/batch on the entire pipeline for free.
Your First LCEL Chain — Prompt, Model, Parser
Let's build the most common LCEL pattern: a prompt template piped to a chat model piped to an output parser. This is the bread and butter of LangChain apps, and I reach for this pattern probably ten times a week.
That single line — prompt | model | parser — replaces three separate .invoke() calls. The ChatPromptTemplate takes a dictionary with your variables and formats them into messages. The model returns an AIMessage, and StrOutputParser extracts just the text content as a plain string.
What makes this more than syntactic sugar is what happens behind the scenes. Call chain.stream({"concept": "generators"}) and tokens stream through the parser in real time — zero streaming logic on your part. Call chain.batch([...]) and inputs run in parallel. The composition is doing real work.
The Runnable Interface — invoke, stream, batch
Every LCEL chain supports all three methods, no matter how complex. You build the pipeline once, then choose how to run it. Building an API? Use stream so users see tokens arrive live. Processing a batch of documents? Use batch to parallelize. Simple script? Plain invoke.
There are async variants too: ainvoke, astream, and abatch. If your app uses asyncio (most modern Python web frameworks do), these avoid blocking the event loop.
RunnableSequence — Building Multi-Step Pipelines
When you write a | b | c, LangChain creates a RunnableSequence. Each step takes the output of the previous step as input. The sequence is itself a Runnable, so you can nest sequences inside other sequences.
Notice the RunnableLambda in the middle. The joke chain outputs a plain string, but the analysis chain expects a dictionary with a "joke" key. The lambda reshapes the data to bridge that gap. You will use this pattern constantly — LCEL is strict about input/output types, and lambdas are the glue.
RunnableParallel — Running Steps Side by Side
Sometimes you need to run multiple operations on the same input simultaneously. Maybe you want to translate text into three languages at once, or generate a summary and a list of key points in parallel. That's what RunnableParallel is for.
RunnableParallel takes keyword arguments where each value is a Runnable chain. It passes the same input to all of them, runs them concurrently, and collects results into a dictionary. The keys you choose (summary, keywords, sentiment) become the output keys.
I use this pattern heavily when building real applications. A common case: you have a user query and you need to both retrieve relevant documents AND classify the query intent before deciding how to respond. Running those in parallel instead of sequentially cuts your latency roughly in half.
LCEL uses the pipe operator to chain functions together. Implement a pipe function that takes a value and a list of functions, and applies each function in sequence — just like the | operator chains Runnables.
Write a function pipe(value, *functions) that:
1. Takes an initial value and any number of functions
2. Applies the first function to the value
3. Passes the result to the second function, and so on
4. Returns the final result
Then create three simple transformation functions and pipe them together.
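One possible solution, a minimal sketch (the three transformation functions are arbitrary examples):

```python
def pipe(value, *functions):
    """Apply each function in sequence, threading the result through."""
    result = value
    for fn in functions:
        result = fn(result)
    return result

# Three simple transformations to pipe together
def strip_text(s):
    return s.strip()

def to_upper(s):
    return s.upper()

def exclaim(s):
    return s + "!"

print(pipe("  hello  ", strip_text, to_upper, exclaim))  # HELLO!
```

With no functions, `pipe` simply returns the value unchanged, just as an empty chain would.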
RunnableLambda — Injecting Custom Python Logic
Not every step in a chain is a prompt or a model call. Sometimes you need to clean text, extract a field, log intermediate results, or run arbitrary Python logic. RunnableLambda wraps any Python function into a Runnable so it slots into an LCEL chain.
The first RunnableLambda cleans the input and computes metadata that the prompt template uses. The last one formats the raw model output into a structured display. Both are plain Python functions — no LangChain magic, just functions that take one argument and return one value.
RunnablePassthrough — Forwarding and Augmenting Data
Here is a problem you will hit quickly: your chain needs some input data to pass through unchanged while also computing new fields. For example, you want to pass the user's original question to the prompt while also retrieving relevant documents. RunnablePassthrough solves this.
RunnablePassthrough.assign(context=...) keeps all existing keys (question) and adds a new key (context) computed by your function. The downstream prompt template receives both {question} and {context}. This pattern is the foundation of every RAG pipeline in LangChain.
```python
# Must manually construct the full dict
chain = (
    RunnableLambda(lambda x: {
        "question": x["question"],
        "context": fake_retriever(x["question"])
    })
    | prompt
    | model
    | parser
)
```

```python
# Passthrough keeps existing keys, adds new ones
chain = (
    RunnablePassthrough.assign(
        context=lambda x: fake_retriever(x["question"])
    )
    | prompt
    | model
    | parser
)
```

Fallbacks and Retries — Building Resilient Chains
LLM APIs fail. Rate limits, timeouts, server errors — these are not edge cases, they are Tuesday. In my experience, any production LLM application that does not handle failures will crash within the first week. LCEL has built-in support for both retries and fallbacks.
Retries — Try Again on Transient Failures
.with_retry() wraps any Runnable with automatic retry logic. With wait_exponential_jitter=True, wait times grow exponentially (1s, 2s, 4s...) with random jitter. The jitter prevents multiple failing clients from retrying simultaneously and hammering the API.
Fallbacks — Switch to a Backup Model
.with_fallbacks() takes a list of alternative Runnables. If the primary raises an exception, LangChain tries each fallback in order. In production, I typically chain gpt-4o-mini to gpt-4o to claude-3-haiku so there is always a model available.
RunnableParallel runs multiple functions on the same input and collects results into a dictionary. Implement a parallel_run function that mimics this behavior.
Write a function parallel_run(input_data, **functions) that:
1. Takes an input value and keyword arguments where each value is a function
2. Calls every function with the same input
3. Returns a dictionary mapping each keyword name to its function's result
Example: parallel_run(10, doubled=lambda x: x*2, squared=lambda x: x**2) returns {"doubled": 20, "squared": 100}
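One possible solution sketch. Note that for simplicity this version calls the functions one after another; the real RunnableParallel runs its branches concurrently.

```python
def parallel_run(input_data, **functions):
    """Call every function with the same input; collect results by name."""
    return {name: fn(input_data) for name, fn in functions.items()}

result = parallel_run(10, doubled=lambda x: x * 2, squared=lambda x: x ** 2)
print(result)  # {'doubled': 20, 'squared': 100}
```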
Real-World Example: A Multi-Step Document Analyzer
This is the kind of pipeline I build most often at work. You have raw text — could be a support ticket, a research paper, or a news article — and you need a structured analysis report. Summary, key entities, difficulty rating, all in one pass.
This pipeline combines every LCEL concept from this tutorial. RunnableLambda handles preprocessing and formatting. RunnablePassthrough.assign forwards the cleaned text while running three analyses in parallel. The whole thing reads top to bottom and supports streaming, batching, and async out of the box.
Common Mistakes and How to Fix Them
Mistake 1: Input/Output Type Mismatch
This is the single most frequent LCEL error, and I see it in almost every first-time LCEL project. A prompt template expects a dictionary, but the previous step outputs a string. Or a model outputs an AIMessage, but you pipe it into a function expecting a string.
```python
# StrOutputParser returns a string
chain_1 = prompt | model | StrOutputParser()
# ChatPromptTemplate expects a dict with keys
chain_2 = another_prompt | model | StrOutputParser()

# This fails: chain_1 output is a string, chain_2 input needs a dict
broken = chain_1 | chain_2  # TypeError!
```

```python
chain_1 = prompt | model | StrOutputParser()
chain_2 = another_prompt | model | StrOutputParser()

# Bridge the gap with a lambda
fixed = (
    chain_1
    | RunnableLambda(lambda text: {"input": text})
    | chain_2
)
```

Mistake 2: Forgetting That StrOutputParser Discards Metadata
StrOutputParser extracts just the text content from an AIMessage. Token usage, finish reason, and all other metadata are gone after that point. If you need metadata downstream, skip StrOutputParser and extract what you need with a custom lambda.
Mistake 3: Using Lambda Expressions That Are Hard to Debug
Inline lambdas are convenient but invisible in stack traces. When a chain with five lambdas fails, the error says "error in <lambda>" — no indication of which one broke. For anything beyond trivial transforms, use named functions.
```python
chain = (
    RunnableLambda(lambda x: x["text"].strip())
    | RunnableLambda(lambda x: {"query": x, "k": 5})
    | RunnableLambda(lambda x: retriever(x))  # which lambda failed?
)
```

```python
def clean_text(x):
    return x["text"].strip()

def prepare_query(text):
    return {"query": text, "k": 5}

def retrieve_docs(params):
    return retriever(params)

chain = (
    RunnableLambda(clean_text)
    | RunnableLambda(prepare_query)
    | RunnableLambda(retrieve_docs)
)
```

Performance Tips and How LCEL Works Internally
I have never seen LCEL overhead be the bottleneck — the pipe operator creates a RunnableSequence at definition time, and .invoke() just loops through steps. The real performance wins come from how you structure your chains.
Use `RunnableParallel` for independent LLM calls. If two steps don't depend on each other, run them in parallel. Three sequential 2-second LLM calls = 6 seconds. Three parallel calls = 2 seconds. That is the single biggest performance win in LCEL.
Use `.batch()` for processing multiple inputs. Calling chain.invoke() in a loop is sequential. Calling chain.batch(items) runs them concurrently with configurable parallelism:
Prefer `.stream()` over `.invoke()` for user-facing apps. Streaming gives users something to read immediately instead of staring at a blank screen. Total time is the same, but perceived latency drops dramatically.
Implement a with_fallback function that wraps a primary function with fallback behavior — just like LCEL's .with_fallbacks() method.
Write a function with_fallback(primary, *fallbacks) that:
1. Returns a new function
2. When called, the new function tries the primary function first
3. If the primary raises any exception, it tries each fallback function in order
4. Returns the result of the first function that succeeds
5. If all functions fail, raises the last exception
Also implement with_retry(func, max_attempts=3) that:
1. Returns a new function that retries func up to max_attempts times
2. If all attempts fail, raises the last exception
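One possible solution sketch for both functions:

```python
def with_fallback(primary, *fallbacks):
    """Return a function that tries primary, then each fallback in order."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for fn in (primary, *fallbacks):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc  # every function failed
    return wrapped

def with_retry(func, max_attempts=3):
    """Return a function that retries func up to max_attempts times."""
    def wrapped(*args, **kwargs):
        last_exc = None
        for _ in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                last_exc = exc
        raise last_exc  # all attempts failed
    return wrapped

def failing(x):
    raise ValueError("primary down")

def backup(x):
    return f"backup: {x}"

safe = with_fallback(failing, backup)
print(safe("hi"))  # backup: hi
```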
Frequently Asked Questions
Can I use LCEL with non-LangChain functions?
Yes. Wrap any Python callable in RunnableLambda and it becomes a full LCEL citizen with .invoke(), .batch(), and .stream() support. You can also use the @chain decorator for the same effect with cleaner syntax.
Is LCEL faster than calling components manually?
For a single sequential chain, LCEL and manual calls have nearly identical performance. The bottleneck is the LLM API call, not chain overhead. Where LCEL wins is RunnableParallel and .batch(), which parallelize work you would otherwise thread manually. LCEL also handles streaming propagation across multi-step chains automatically.
What is the difference between RunnableSequence and the old LLMChain?
LLMChain was the original LangChain API for combining a prompt and a model. It has been deprecated in favor of LCEL. The old LLMChain(prompt=prompt, llm=model) is now simply prompt | model.
LCEL is more composable (chain anything with anything), more transparent (no hidden state), and supports streaming and batching natively. If you see LLMChain in tutorials or Stack Overflow answers, translate it to LCEL.
How do I debug an LCEL chain that returns unexpected results?
Insert a RunnableLambda that prints intermediate values at the point where you suspect the issue. This is the chain equivalent of adding print statements:
For production debugging, use LangSmith. It captures the full trace of every chain execution — inputs, outputs, latency, and token usage at each step. It is the dedicated observability tool for LangChain apps.