
Build a Chatbot with Memory: Buffer, Window, and Summary Memory in LangChain

Intermediate · 120 min · 3 exercises · 50 XP

Your LangChain chatbot answers questions perfectly — until the user says "wait, go back to what you said earlier." The bot has no idea what it said earlier. Every chain invocation starts from scratch, and without explicit memory management, even the most sophisticated prompt template produces a bot with amnesia.

By the end of this tutorial, you'll know three distinct memory strategies — buffer, window, and summary — and exactly when each one is the right choice. You'll also learn the modern RunnableWithMessageHistory pattern that replaced the legacy ConversationChain.

Why LLMs Forget — and How LangChain Memory Fixes It

Every LLM API call is stateless. You send a list of messages, the model generates a response, and then it forgets everything. What feels like "memory" in ChatGPT is actually the application resending the entire conversation with every request.

I spent an embarrassing amount of time debugging a chatbot that "forgot" user preferences before realizing the issue wasn't the model — it was my code. I was creating a fresh message list on every call instead of appending to a persistent one. LangChain's memory module exists to solve exactly this problem: it manages conversation history so you don't have to track it manually.

Here's a bare-bones chatbot with no memory. Two calls, and the second one has no idea about the first:

A stateless chatbot — no memory between calls

The second call returns something like "I don't have access to your personal information." The model genuinely doesn't know — it never saw the first message.
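You can reproduce this behavior with nothing but the standard library. The sketch below uses a hypothetical `mock_model` function in place of a real LLM call; the point it illustrates is that the model only ever sees the messages passed to that single call:

```python
# Hypothetical mock model: it can only "know" what is in the messages
# list passed to this one call -- just like a real stateless LLM API.
def mock_model(messages: list[dict]) -> str:
    names = [m["content"].split()[-1] for m in messages
             if m["role"] == "user" and m["content"].startswith("My name is")]
    return f"Hello, {names[-1]}!" if names else "I don't know your name."

# Stateless: each call builds a fresh message list.
reply1 = mock_model([{"role": "user", "content": "My name is Priya"}])
reply2 = mock_model([{"role": "user", "content": "What is my name?"}])
# reply2 -> "I don't know your name."  The first message was never sent.

# Stateful: the application resends the whole conversation.
history = [{"role": "user", "content": "My name is Priya"}]
history.append({"role": "assistant", "content": mock_model(history)})
history.append({"role": "user", "content": "What is my name?"})
reply3 = mock_model(history)
# reply3 -> "Hello, Priya!"  "Memory" is just resending the history.
```

Everything LangChain's memory classes do is a more disciplined version of that `history.append(...)` loop.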

LangChain provides three memory classes that solve this at different levels of sophistication. Each one answers the same question differently: what should we include in the message history?

| Memory Type | What It Stores | Token Usage | Best For |
|---|---|---|---|
| ConversationBufferMemory | Every message, verbatim | Grows linearly | Short conversations (<20 turns) |
| ConversationBufferWindowMemory | Last k message pairs | Fixed ceiling | Medium conversations with recent-context focus |
| ConversationSummaryMemory | Running summary of the conversation | Roughly constant | Long conversations (50+ turns) |

ConversationBufferMemory — Remember Everything

This is the simplest memory strategy and the one I reach for first when prototyping. ConversationBufferMemory stores every message in the conversation — every user input and every AI response — and resends all of them on every call. Nothing is trimmed, nothing is summarized.

The setup requires three pieces: a prompt template with a placeholder for chat history, the memory object, and the chain that wires them together. Here's the complete working example:

Full chatbot with ConversationBufferMemory

Each piece has a clear job. The session_store dictionary maps session IDs to ChatMessageHistory objects. RunnableWithMessageHistory handles the plumbing: before each call it loads the history, and after each call it saves the new messages. You never touch the history list directly.

The config dictionary is how you tell the chatbot which conversation session to use. Every invocation needs it:

Multi-turn conversation with buffer memory

Turn 3 works because the entire conversation — all six messages (three from the user, three from the assistant) — gets sent to the model. The bot can answer "Your name is Priya and you're learning decorators" because it literally sees both facts in the history.

You can inspect what's stored at any time by pulling the session history:

Inspecting stored messages

After three turns, you'll see six messages — alternating human and ai types. This is the full, uncompressed record of the conversation.
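The bookkeeping behind buffer memory is just per-session list appends. Here's a stdlib sketch of that six-message record; `session_store` and `chat` are illustrative names, not LangChain APIs, and an echo stands in for the LLM call:

```python
# Toy buffer memory: every turn appends two messages, and the full
# list would be resent to the model on each call.
session_store: dict[str, list[dict]] = {}

def chat(session_id: str, user_input: str) -> list[dict]:
    history = session_store.setdefault(session_id, [])
    history.append({"type": "human", "content": user_input})
    # A real chain would call the LLM with `history` here; we echo instead.
    history.append({"type": "ai", "content": f"Echo: {user_input}"})
    return history

for turn in ["I'm Priya", "I'm learning decorators", "What do you know about me?"]:
    record = chat("user-001", turn)

len(record)  # 6 messages after three turns: the full, uncompressed record
```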

ConversationBufferWindowMemory — Keep Only the Last k Turns

Window memory — only the last 5 exchanges

The k=5 parameter means "keep the last 5 exchange pairs" (10 messages total — 5 human + 5 AI). Once the conversation exceeds 5 turns, the oldest pair gets dropped. The model never sees turn 1 when you're on turn 7.

This is the strategy I use for most production chatbots. The tradeoff is explicit: you lose distant context but gain predictable token costs and a guarantee you'll never blow past the context window. For customer support bots and FAQ assistants, the last 5-10 turns contain everything the model needs.
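The drop-oldest behavior is exactly what `collections.deque` gives you with a `maxlen`, which makes the mechanics easy to see in isolation (the tuples below are stand-ins for real message objects):

```python
from collections import deque

# Window memory sketch: a deque with maxlen=2*k silently drops the
# oldest messages once the window holds k exchange pairs.
k = 5
window = deque(maxlen=2 * k)

for turn in range(1, 8):                      # 7 turns = 14 messages
    window.append(("human", f"turn {turn} question"))
    window.append(("ai", f"turn {turn} answer"))

len(window)   # still 10 -- the fixed ceiling
window[0]     # ("human", "turn 3 question"): turns 1 and 2 were dropped
```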

The modern approach uses RunnableWithMessageHistory with a custom trimming function instead of the legacy memory class. Here's how to implement window-style memory with the current LangChain patterns:

Modern window memory with message trimming

The trim_messages utility trims the history before it reaches the prompt. Setting strategy="last" keeps the most recent messages and discards older ones. The include_system=True flag ensures the system prompt is never trimmed away — a subtle but important detail.

One gotcha with window memory: if the user refers to something from turn 1 and you're on turn 15 with k=5, the model will confidently hallucinate or say it doesn't know. It genuinely doesn't — that message was trimmed. For some applications this is fine. For others, you need summary memory.

ConversationSummaryMemory — Compress the Past Into a Summary

What if the conversation is 50 turns long and the user references something from turn 3? Window memory lost it. Buffer memory would work but costs a fortune in tokens. Summary memory offers a third option: instead of storing raw messages, it asks the LLM to maintain a running summary of the conversation.

After each exchange, the summary gets updated to include the new information. The model receives this compressed summary plus the most recent messages, giving it both distant context and recent detail.

ConversationSummaryMemory in action

Instead of storing six messages, the memory holds a condensed summary like: "The user is Priya, a data engineer at Spotify who needs to build a real-time ETL pipeline using Kafka with Avro schema evolution for backward compatibility." Three turns compressed into one sentence.

The real power of summary memory shows up in long conversations. After 50 turns, buffer memory is sending 100+ messages per call. Summary memory is still sending one paragraph plus the latest exchange.
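The update loop is simple to sketch. In the toy version below, a plain string accumulator stands in for the LLM summarization call, and `update_summary` is an illustrative name rather than a LangChain API:

```python
# Toy summary memory: after each exchange, the "summary" is rebuilt to
# fold in new information. A real implementation would call the LLM here.
summary = ""

def update_summary(summary: str, user_msg: str, ai_msg: str) -> str:
    # Stand-in for an LLM summarization call (a real summarizer would
    # also use ai_msg; this toy only records what the user said).
    addition = f"User said: {user_msg}."
    return f"{summary} {addition}".strip()

exchanges = [
    ("I'm Priya, a data engineer at Spotify", "Nice to meet you!"),
    ("I need a real-time ETL pipeline with Kafka", "Kafka is a good fit."),
]
for user_msg, ai_msg in exchanges:
    summary = update_summary(summary, user_msg, ai_msg)

# The model receives `summary` plus the latest exchange -- one short
# paragraph regardless of how long the conversation grows.
```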

Here's a comparison of what the LLM receives at turn 20 under each strategy:

Token usage comparison at turn 20

Running this produces:


The gap widens as conversations get longer. At turn 100, buffer memory sends roughly 10,000 tokens per call while summary memory stays around 300.
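Those numbers come from simple arithmetic. Assuming roughly 50 tokens per message and a roughly 300-token running summary (illustrative figures, not measurements), the per-call cost looks like this:

```python
# Back-of-envelope token math for the two strategies.
tokens_per_message = 50   # assumed average
running_summary = 300     # assumed summary size

def buffer_tokens(turn: int) -> int:
    # Buffer memory resends every message: 2 per turn (human + AI).
    return 2 * turn * tokens_per_message

def summary_tokens(turn: int) -> int:
    # Summary memory sends the summary plus the latest exchange,
    # regardless of how many turns have passed.
    return running_summary + 2 * tokens_per_message

buffer_tokens(100)   # 10000 tokens per call at turn 100
summary_tokens(100)  # 400 tokens, roughly constant
```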

RunnableWithMessageHistory — The Modern Pattern

If you've read LangChain tutorials from 2023, you'll see ConversationChain everywhere. That class is now deprecated. The replacement is RunnableWithMessageHistory, which is more flexible, composable, and fits naturally into LangChain's expression language (LCEL).

We've already used RunnableWithMessageHistory in the buffer memory section. Here I want to break down why it works the way it does and show the complete pattern with all the moving parts labeled.

Complete RunnableWithMessageHistory pattern — annotated

Three things to notice. First, input_messages_key tells the wrapper which part of your input dictionary is the user's message — this is what gets saved to history. Second, history_messages_key must match the variable_name in your MessagesPlaceholder. Third, the session ID comes from the config at invocation time, not at construction time.
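The load-call-save cycle is easier to see in a stripped-down reimplementation. The `WithMessageHistory` class below mirrors the shape of the pattern but is a toy, not LangChain's actual class:

```python
# Toy wrapper around a stateless "chain": load history before the call,
# invoke the chain, then save the new messages afterwards.
class WithMessageHistory:
    def __init__(self, chain, get_session_history):
        self.chain = chain
        self.get_session_history = get_session_history

    def invoke(self, inputs: dict, config: dict) -> str:
        session_id = config["configurable"]["session_id"]
        history = self.get_session_history(session_id)   # 1. load
        output = self.chain(history, inputs["input"])    # 2. call
        history.append(("human", inputs["input"]))       # 3. save
        history.append(("ai", output))
        return output

store: dict[str, list] = {}
bot = WithMessageHistory(
    chain=lambda history, text: f"({len(history)} prior messages) ok",
    get_session_history=lambda sid: store.setdefault(sid, []),
)
cfg = {"configurable": {"session_id": "user-001"}}
bot.invoke({"input": "hi"}, cfg)      # "(0 prior messages) ok"
bot.invoke({"input": "again"}, cfg)   # "(2 prior messages) ok"
```

Note that the session ID lives entirely in `config`, which is why the real wrapper can serve many sessions from one chain object.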

Multiple Sessions Running Simultaneously

A real chatbot serves many users at once. The session ID is how you keep their conversations separate. Each user gets their own history:

Separate sessions for different users

The store dictionary now has two entries: "user-001" with Priya's history and "user-002" with Carlos's history. In production, you'd replace the dictionary with Redis, a database, or another persistent backend.

Build a Session Store Function
Write Code

Write a function get_or_create_session that takes a session_id (string) and a store (dictionary) as arguments. If the session ID exists in the store, return the existing list. If not, create a new empty list, add it to the store, and return it.

Then simulate two sessions:

1. Add the message "Hello from Priya" to session "s1"

2. Add the message "Hello from Carlos" to session "s2"

3. Add the message "Follow-up from Priya" to session "s1"

Finally, print the length of session "s1" and session "s2" on separate lines.


Combining Strategies — Summary Plus Recent Messages

In production, the most effective pattern combines summary and window memory. You keep the last k messages verbatim for detailed recent context, and prepend a summary of everything older for long-range awareness. This gives the model both precision (exact recent messages) and breadth (compressed older context).

LangChain provides ConversationSummaryBufferMemory for exactly this. It stores raw messages until the total token count exceeds a threshold, then summarizes the older messages and keeps the recent ones intact:

ConversationSummaryBufferMemory — the best of both worlds

With only two turns, the token count is below 300, so you'll see the raw messages. Add 5-6 more turns and the older messages get compressed into a summary while recent ones stay verbatim.
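The threshold logic can be sketched without LangChain at all. In the toy `compact` function below, a message count stands in for real token counting and a placeholder string stands in for the LLM-generated summary:

```python
# Toy summary-buffer memory: keep the most recent messages verbatim and
# compress everything older once a size threshold is crossed.
def compact(messages: list[str], max_messages: int = 6) -> tuple[str, list[str]]:
    """Return (summary_of_older, recent_messages_kept_verbatim)."""
    if len(messages) <= max_messages:
        return "", messages                       # under threshold: raw only
    older, recent = messages[:-max_messages], messages[-max_messages:]
    summary = f"[summary of {len(older)} earlier messages]"  # LLM stand-in
    return summary, recent

msgs = [f"msg{i}" for i in range(1, 11)]          # 10 messages
summary, recent = compact(msgs)
summary   # "[summary of 4 earlier messages]"
recent    # ["msg5", ..., "msg10"] -- the last 6, kept verbatim
```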

Persistent Memory — Surviving Server Restarts

Everything we've built so far uses a Python dictionary for session storage. Restart the server and all conversations disappear. For production chatbots, you need persistent storage. LangChain supports several backends through ChatMessageHistory implementations.

The architecture stays the same — you just swap the get_session_history function. Here's how it looks with Redis as the backend:

Redis-backed persistent memory

That's the elegance of the RunnableWithMessageHistory design. Your chain, prompt, and invocation code stay identical. Only the storage backend changes. LangChain's community package includes implementations for Redis, PostgreSQL, MongoDB, DynamoDB, Firestore, and more.
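To see why only the backend changes, here's a minimal persistent history built on stdlib sqlite3. It's a sketch in the spirit of SQLChatMessageHistory, not that class itself, and the class and method names are illustrative:

```python
import sqlite3

# Minimal persistent message history: same get_session_history shape,
# different storage. Swap the connection target to change backends.
class SQLiteMessageHistory:
    def __init__(self, session_id: str, conn: sqlite3.Connection):
        self.session_id = session_id
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(session_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)",
            (self.session_id, role, content),
        )

    @property
    def messages(self) -> list[tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ?",
            (self.session_id,),
        )
        return list(rows)

conn = sqlite3.connect(":memory:")  # use a file path for real persistence

def get_session_history(session_id: str) -> SQLiteMessageHistory:
    return SQLiteMessageHistory(session_id, conn)

h = get_session_history("user-001")
h.add_message("human", "Where is my order?")
get_session_history("user-001").messages  # read back from the database
```

The chain and invocation code never see the change; they only ever call `get_session_history(session_id)`.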

Here's a quick reference for the most common backends:

| Backend | Import | When to Use |
|---|---|---|
| In-memory dict | ChatMessageHistory | Development and testing |
| Redis | RedisChatMessageHistory | Low-latency production apps |
| PostgreSQL | PostgresChatMessageHistory | When you already have a Postgres DB |
| MongoDB | MongoDBChatMessageHistory | Document-oriented storage needs |
| SQLite | SQLChatMessageHistory | Single-server apps, local persistence |
Implement a Window Trimmer
Write Code

Write a function trim_to_window(messages: list, k: int) -> list that takes a list of messages and returns only the last k messages. If the list has fewer than k messages, return all of them.

Then test it:

1. Create a list of 8 messages: ["msg1", "msg2", ..., "msg8"]

2. Trim to window size 5 and print the result

3. Trim the original list to window size 20 and print the result


Real-World Example: Customer Support Bot with Memory

Let's put everything together into a customer support chatbot that uses summary buffer memory, handles multiple sessions, and includes a system prompt tailored for support interactions. This is close to what I've deployed in production.

Production-style support chatbot — complete example

The chain combines trimming with the prompt and LLM in a single LCEL pipeline. The trimmer runs first, cutting the history to 20 messages before the prompt template formats everything.

Session management and invocation

Each turn builds on the previous context. By turn 4, the bot knows the customer's order number (TS-78432), that they paid for express shipping, and that the package hasn't arrived. Without memory, each turn would start from zero and the customer would have to repeat everything.

You can inspect the full history to verify what's been stored:

Inspecting the conversation history

Common Mistakes and How to Fix Them

After building several LangChain chatbots, these are the bugs I see most often — including ones I've made myself.

Mistake 1: Forgetting the Config on Invocation

Missing config — raises an error
# This raises a ValueError: no session_id in the config
response = chatbot.invoke({"input": "Hello"})
Include the session config
# Always pass config with session_id
config = {"configurable": {"session_id": "user-001"}}
response = chatbot.invoke({"input": "Hello"}, config=config)

RunnableWithMessageHistory requires a session ID to know which conversation to load. Without the config, you get a ValueError about missing configurable fields.

Mistake 2: Mismatched Placeholder and Key Names

Silent bug — history never appears
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # WRONG: doesn't match "chat_history"
)
Keys match — history works correctly
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # Matches "history" above
)

This is the sneakiest bug because it doesn't raise an error. The history placeholder just receives an empty list, and the bot behaves like it has no memory. I've debugged this for other developers at least five times.

Mistake 3: Using the Deprecated ConversationChain

If you see from langchain.chains import ConversationChain in a tutorial, that tutorial is outdated. ConversationChain was deprecated in LangChain 0.2.7. The modern replacement is the RunnableWithMessageHistory pattern shown throughout this tutorial.

Deprecated — ConversationChain (pre-0.2.7)
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# This still works but is deprecated
chain = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
)
Modern — RunnableWithMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Compose with LCEL, wrap with history
chain = prompt | llm
bot = RunnableWithMessageHistory(chain, get_history,
    input_messages_key="input",
    history_messages_key="history",
)
Build a Conversation Summarizer
Write Code

Write a function summarize_conversation(messages: list[dict]) -> str that takes a list of message dictionaries (each with "role" and "content" keys) and returns a one-line summary string.

The summary should follow this format: "{n} messages: {roles}" where {n} is the total message count and {roles} is a comma-separated list of unique roles in the order they first appear.

Test with:

1. A conversation with 4 messages (roles: user, assistant, user, assistant)

2. A conversation with 1 message (role: system)


Frequently Asked Questions

Can I use memory with streaming responses?

Yes. RunnableWithMessageHistory works with both .invoke() and .stream(). The wrapper saves messages to history after the stream completes:


How do I clear a session's memory?

Call .clear() on the ChatMessageHistory object for that session:


What happens when the conversation exceeds the model's context window?

With buffer memory, you get an API error when the total tokens (system prompt + history + new message) exceed the model's limit. GPT-4o-mini supports 128K tokens, so this is rare for text-only conversations. For models with smaller context windows (8K-32K), use window memory or summary buffer memory to prevent hitting the limit.

Can I add metadata to messages (timestamps, user roles)?

LangChain messages have an additional_kwargs field for arbitrary metadata. You can add timestamps, user IDs, or any other data without affecting how the LLM processes the message:

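The idea is easy to model with plain dicts. The layout below mirrors the additional_kwargs convention with illustrative field values; it isn't a LangChain message object:

```python
import time

# Per-message metadata sketch: the extra fields ride along in storage,
# but only `content` is what the LLM actually reads.
message = {
    "type": "human",
    "content": "Where is my order?",
    "additional_kwargs": {
        "timestamp": time.time(),   # illustrative metadata
        "user_id": "user-001",
    },
}

message["additional_kwargs"]["user_id"]  # "user-001"
```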

Complete Code

Complete chatbot with memory — copy and run

References

  • LangChain documentation — Memory
  • LangChain documentation — How to add message history
  • LangChain documentation — RunnableWithMessageHistory
  • LangChain documentation — trim_messages
  • LangChain documentation — ChatMessageHistory
  • OpenAI API documentation — Chat completions
  • LangChain blog — Migrating from ConversationChain