Build a Chatbot with Memory: Buffer, Window, and Summary Memory in LangChain
Your LangChain chatbot answers questions perfectly — until the user says "wait, go back to what you said earlier." The bot has no idea what it said earlier. Every chain invocation starts from scratch, and without explicit memory management, even the most sophisticated prompt template produces a bot with amnesia.
By the end of this tutorial, you'll know three distinct memory strategies — buffer, window, and summary — and exactly when each one is the right choice. You'll also learn the modern RunnableWithMessageHistory pattern that replaced the legacy ConversationChain.
Why LLMs Forget — and How LangChain Memory Fixes It
Every LLM API call is stateless. You send a list of messages, the model generates a response, and then it forgets everything. What feels like "memory" in ChatGPT is actually the application resending the entire conversation with every request.
I spent an embarrassing amount of time debugging a chatbot that "forgot" user preferences before realizing the issue wasn't the model — it was my code. I was creating a fresh message list on every call instead of appending to a persistent one. LangChain's memory module exists to solve exactly this problem: it manages conversation history so you don't have to track it manually.
Here's a bare-bones chatbot with no memory. Two calls, and the second one has no idea about the first:
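A minimal sketch of the problem, assuming `langchain-openai` is installed and `OPENAI_API_KEY` is set (the model name is illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Call 1: the user introduces themselves
llm.invoke("Hi, my name is Priya.")

# Call 2: a fresh, stateless request — the model never saw call 1
response = llm.invoke("What's my name?")
print(response.content)
```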
The second call returns something like "I don't have access to your personal information." The model genuinely doesn't know — it never saw the first message.
LangChain provides three memory classes that solve this at different levels of sophistication. Each one answers the same question differently: what should we include in the message history?
| Memory Type | What It Stores | Token Usage | Best For |
|---|---|---|---|
| ConversationBufferMemory | Every message, verbatim | Grows linearly | Short conversations (<20 turns) |
| ConversationBufferWindowMemory | Last k message pairs | Fixed ceiling | Medium conversations with recent-context focus |
| ConversationSummaryMemory | Running summary of the conversation | Roughly constant | Long conversations (50+ turns) |
ConversationBufferMemory — Remember Everything
This is the simplest memory strategy and the one I reach for first when prototyping. ConversationBufferMemory stores every message in the conversation — every user input and every AI response — and resends all of them on every call. Nothing is trimmed, nothing is summarized.
The setup requires five pieces: the model, a prompt template with a placeholder for chat history, the chain, a session store, and the wrapper that ties them together. Here's the complete working example:
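A sketch of the full setup — the names `session_store`, `get_session_history`, and `chatbot` are this example's own choices:

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# Piece 1: the model
llm = ChatOpenAI(model="gpt-4o-mini")

# Piece 2: a prompt with a placeholder where history will be injected
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# Piece 3: the bare chain, composed with LCEL
chain = prompt | llm

# Piece 4: session storage — a dict mapping session IDs to histories
session_store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

# Piece 5: the wrapper that loads history before each call and saves after
chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```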
Five pieces, each with a clear job. The session_store dictionary maps session IDs to ChatMessageHistory objects. RunnableWithMessageHistory handles the plumbing: before each call it loads the history, after each call it saves the new messages. You never touch the history list directly.
The config dictionary is how you tell the chatbot which conversation session to use. Every invocation needs it:
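Assuming a wrapper named `chatbot` built as just described, three turns might look like this:

```python
config = {"configurable": {"session_id": "user-001"}}

# Turn 1: introduce a fact
print(chatbot.invoke({"input": "Hi, I'm Priya."}, config=config).content)

# Turn 2: introduce another fact
print(chatbot.invoke({"input": "I'm learning Python decorators."}, config=config).content)

# Turn 3: ask about both — the full history is resent, so the model can answer
print(chatbot.invoke({"input": "What's my name and what am I learning?"}, config=config).content)
```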
Turn 3 works because the entire conversation — all six messages (three from the user, three from the assistant) — gets sent to the model. The bot can answer "Your name is Priya and you're learning decorators" because it literally sees both facts in the history.
You can inspect what's stored at any time by pulling the session history:
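For example, assuming the `get_session_history` helper described above:

```python
history = get_session_history("user-001")
for message in history.messages:
    print(f"{message.type}: {message.content}")
```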
After three turns, you'll see six messages — alternating human and ai types. This is the full, uncompressed record of the conversation.
ConversationBufferWindowMemory — Keep Only the Last k Turns
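For context, the legacy class (deprecated in LangChain 0.2+, shown only for illustration) exposed the window as a single parameter:

```python
# Legacy API — kept here so the k parameter below has a concrete referent
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5)
```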
The k=5 parameter means "keep the last 5 exchange pairs" (10 messages total — 5 human + 5 AI). Once the conversation exceeds 5 turns, the oldest pair gets dropped. The model never sees turn 1 when you're on turn 7.
This is the strategy I use for most production chatbots. The tradeoff is explicit: you lose distant context but gain predictable token costs and a guarantee you'll never blow past the context window. For customer support bots and FAQ assistants, the last 5-10 turns contain everything the model needs.
The modern approach uses RunnableWithMessageHistory with a custom trimming function instead of the legacy memory class. Here's how to implement window-style memory with the current LangChain patterns:
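One way to wire this up — `token_counter=len` is this sketch's trick for making `max_tokens` count messages rather than tokens, and it assumes the `prompt`, `llm`, and `get_session_history` pieces described in the buffer section:

```python
from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory

# Keep roughly the last 5 exchange pairs: 10 messages with each
# message counted as 1 "token" by token_counter=len.
trimmer = trim_messages(
    strategy="last",
    max_tokens=10,
    token_counter=len,
    include_system=True,  # never trim the system prompt away
    start_on="human",     # trimmed history always starts on a human message
)

# Trim the injected history before the prompt template sees it
chain = (
    RunnablePassthrough.assign(history=lambda x: trimmer.invoke(x["history"]))
    | prompt
    | llm
)

chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```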
The trim_messages utility trims the history before it reaches the prompt. Setting strategy="last" keeps the most recent messages and discards older ones. The include_system=True flag ensures the system prompt is never trimmed away — a subtle but important detail.
One gotcha with window memory: if the user refers to something from turn 1 and you're on turn 15 with k=5, the model will confidently hallucinate or say it doesn't know. It genuinely doesn't — that message was trimmed. For some applications this is fine. For others, you need summary memory.
ConversationSummaryMemory — Compress the Past Into a Summary
What if the conversation is 50 turns long and the user references something from turn 3? Window memory lost it. Buffer memory would work but costs a fortune in tokens. Summary memory offers a third option: instead of storing raw messages, it asks the LLM to maintain a running summary of the conversation.
After each exchange, the summary gets updated to include the new information. The model receives this compressed summary plus the most recent messages, giving it both distant context and recent detail.
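The legacy `ConversationSummaryMemory` class (deprecated, but the clearest illustration of the idea) can be exercised directly, assuming an `llm` chat model — note that it calls the LLM itself to update the summary:

```python
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)
memory.save_context(
    {"input": "Hi, I'm Priya, a data engineer at Spotify."},
    {"output": "Nice to meet you, Priya! How can I help?"},
)
# The stored "history" is now an LLM-written summary, not raw messages
print(memory.load_memory_variables({})["history"])
```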
Instead of storing six messages, the memory holds a condensed summary like: "The user is Priya, a data engineer at Spotify who needs to build a real-time ETL pipeline using Kafka with Avro schema evolution for backward compatibility." Three turns compressed into one sentence.
The real power of summary memory shows up in long conversations. After 50 turns, buffer memory is sending 100+ messages per call. Summary memory is still sending one paragraph plus the latest exchange.
Here's a comparison of what the LLM receives at turn 20 under each strategy:
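A back-of-envelope sketch in plain Python — the per-message and summary token counts are illustrative assumptions, not measurements:

```python
# Assumed costs: ~40 tokens per message, ~150-token running summary
TOKENS_PER_MESSAGE = 40
SUMMARY_TOKENS = 150

def tokens_at_turn(turn: int, strategy: str, k: int = 5) -> int:
    """Estimate the history tokens sent to the LLM at a given turn."""
    messages = turn * 2  # one human + one AI message per turn
    if strategy == "buffer":
        return messages * TOKENS_PER_MESSAGE          # everything, verbatim
    if strategy == "window":
        return min(messages, k * 2) * TOKENS_PER_MESSAGE  # last k pairs only
    if strategy == "summary":
        return SUMMARY_TOKENS + 2 * TOKENS_PER_MESSAGE    # summary + last pair
    raise ValueError(f"unknown strategy: {strategy}")

for strategy in ("buffer", "window", "summary"):
    print(f"{strategy:>8}: ~{tokens_at_turn(20, strategy)} tokens")
```

With these assumed numbers, turn 20 costs about 1600 tokens under buffer memory, 400 under window memory, and 230 under summary memory.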
The exact numbers depend on message length, but the pattern is consistent: buffer memory grows without bound, window memory plateaus, and summary memory stays nearly flat.
The gap widens as conversations get longer. At turn 100, buffer memory sends roughly 10,000 tokens per call while summary memory stays around 300.
RunnableWithMessageHistory — The Modern Pattern
If you've read LangChain tutorials from 2023, you'll see ConversationChain everywhere. That class is now deprecated. The replacement is RunnableWithMessageHistory, which is more flexible, composable, and fits naturally into LangChain's expression language (LCEL).
We've already used RunnableWithMessageHistory in the buffer memory section. Here I want to break down why it works the way it does and show the complete pattern with all the moving parts labeled.
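The wrapper call, with each argument annotated — `chain` and `get_session_history` are assumed from the buffer section:

```python
from langchain_core.runnables.history import RunnableWithMessageHistory

chatbot = RunnableWithMessageHistory(
    chain,                           # any LCEL runnable, e.g. prompt | llm
    get_session_history,             # callable: session_id -> message history
    input_messages_key="input",      # which input dict key is the user message
    history_messages_key="history",  # must match the MessagesPlaceholder name
)

# The session is chosen at invocation time, not construction time
config = {"configurable": {"session_id": "user-001"}}
chatbot.invoke({"input": "Hello!"}, config=config)
```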
Three things to notice. First, input_messages_key tells the wrapper which part of your input dictionary is the user's message — this is what gets saved to history. Second, history_messages_key must match the variable_name in your MessagesPlaceholder. Third, the session ID comes from the config at invocation time, not at construction time.
Multiple Sessions Running Simultaneously
A real chatbot serves many users at once. The session ID is how you keep their conversations separate. Each user gets their own history:
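For example, with the same `chatbot` wrapper:

```python
priya_config = {"configurable": {"session_id": "user-001"}}
carlos_config = {"configurable": {"session_id": "user-002"}}

chatbot.invoke({"input": "Hi, I'm Priya."}, config=priya_config)
chatbot.invoke({"input": "Hi, I'm Carlos."}, config=carlos_config)

# Each session sees only its own history — the model answering
# Priya's question never sees Carlos's messages, and vice versa
chatbot.invoke({"input": "What's my name?"}, config=priya_config)
chatbot.invoke({"input": "What's my name?"}, config=carlos_config)
```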
The store dictionary now has two entries: "user-001" with Priya's history and "user-002" with Carlos's history. In production, you'd replace the dictionary with Redis, a database, or another persistent backend.
Write a function get_or_create_session that takes a session_id (string) and a store (dictionary) as arguments. If the session ID exists in the store, return the existing list. If not, create a new empty list, add it to the store, and return it.
Then simulate two sessions:
1. Add the message "Hello from Priya" to session "s1"
2. Add the message "Hello from Carlos" to session "s2"
3. Add the message "Follow-up from Priya" to session "s1"
Finally, print the length of session "s1" and session "s2" on separate lines.
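One possible solution:

```python
def get_or_create_session(session_id: str, store: dict) -> list:
    # Return the existing history list, or register a new empty one
    if session_id not in store:
        store[session_id] = []
    return store[session_id]

store = {}
get_or_create_session("s1", store).append("Hello from Priya")
get_or_create_session("s2", store).append("Hello from Carlos")
get_or_create_session("s1", store).append("Follow-up from Priya")

print(len(store["s1"]))  # 2
print(len(store["s2"]))  # 1
```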
Combining Strategies — Summary Plus Recent Messages
In production, the most effective pattern combines summary and window memory. You keep the last k messages verbatim for detailed recent context, and prepend a summary of everything older for long-range awareness. This gives the model both precision (exact recent messages) and breadth (compressed older context).
LangChain provides ConversationSummaryBufferMemory for exactly this. It stores raw messages until the total token count exceeds a threshold, then summarizes the older messages and keeps the recent ones intact:
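A sketch using the legacy class (deprecated, but it maps directly onto the hybrid idea), assuming an `llm` chat model:

```python
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,              # the summarizer — reuses the chat model
    max_token_limit=300,  # summarize once history exceeds 300 tokens
)
memory.save_context(
    {"input": "Hi, I'm Priya, a data engineer at Spotify."},
    {"output": "Nice to meet you, Priya!"},
)
memory.save_context(
    {"input": "I need a real-time ETL pipeline with Kafka."},
    {"output": "Kafka is a great fit for that."},
)
# Below the token limit: raw messages. Above it: summary + recent messages.
print(memory.load_memory_variables({}))
```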
With only two turns, the token count is below 300, so you'll see the raw messages. Add 5-6 more turns and the older messages get compressed into a summary while recent ones stay verbatim.
Persistent Memory — Surviving Server Restarts
Everything we've built so far uses a Python dictionary for session storage. Restart the server and all conversations disappear. For production chatbots, you need persistent storage. LangChain supports several backends through ChatMessageHistory implementations.
The architecture stays the same — you just swap the get_session_history function. Here's how it looks with Redis as the backend:
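A sketch assuming the `langchain-community` package and a Redis server at `localhost:6379`:

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    # Histories now live in Redis and survive server restarts
    return RedisChatMessageHistory(session_id, url="redis://localhost:6379")

# The chain and wrapper are unchanged — only the storage backend differs
chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```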
That's the elegance of the RunnableWithMessageHistory design. Your chain, prompt, and invocation code stay identical. Only the storage backend changes. LangChain's community package includes implementations for Redis, PostgreSQL, MongoDB, DynamoDB, Firestore, and more.
Here's a quick reference for the most common backends:
| Backend | Import | When to Use |
|---|---|---|
| In-memory dict | ChatMessageHistory | Development and testing |
| Redis | RedisChatMessageHistory | Low-latency production apps |
| PostgreSQL | PostgresChatMessageHistory | When you already have a Postgres DB |
| MongoDB | MongoDBChatMessageHistory | Document-oriented storage needs |
| SQLite | SQLChatMessageHistory | Single-server apps, local persistence |
Write a function trim_to_window(messages: list, k: int) -> list that takes a list of messages and returns only the last k messages. If the list has fewer than k messages, return all of them.
Then test it:
1. Create a list of 8 messages: ["msg1", "msg2", ..., "msg8"]
2. Trim to window size 5 and print the result
3. Trim the original list to window size 20 and print the result
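One possible solution, using Python's slicing:

```python
def trim_to_window(messages: list, k: int) -> list:
    if k <= 0:
        return []
    # Slicing never over-runs: messages[-20:] on an 8-item list returns all 8
    return messages[-k:]

messages = [f"msg{i}" for i in range(1, 9)]
print(trim_to_window(messages, 5))   # ['msg4', 'msg5', 'msg6', 'msg7', 'msg8']
print(trim_to_window(messages, 20))  # all 8 messages
```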
Real-World Example: Customer Support Bot with Memory
Let's put everything together into a customer support chatbot that caps history with message trimming, handles multiple sessions, and includes a system prompt tailored for support interactions. This is close to what I've deployed in production.
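Here's a sketch of how such a bot might be wired — the model name, session ID, system prompt, and sample dialogue are all illustrative:

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.messages import trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a patient customer support agent for an online store. "
               "Use details the customer has already given; never ask twice."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# Keep at most the last 20 messages (token_counter=len counts messages)
trimmer = trim_messages(strategy="last", max_tokens=20,
                        token_counter=len, include_system=True)

chain = (
    RunnablePassthrough.assign(history=lambda x: trimmer.invoke(x["history"]))
    | prompt
    | llm
)

store = {}

def get_session_history(session_id: str) -> ChatMessageHistory:
    return store.setdefault(session_id, ChatMessageHistory())

support_bot = RunnableWithMessageHistory(
    chain, get_session_history,
    input_messages_key="input", history_messages_key="history",
)

config = {"configurable": {"session_id": "customer-42"}}
support_bot.invoke({"input": "Hi, my order TS-78432 hasn't arrived."}, config=config)
support_bot.invoke({"input": "I paid for express shipping, too."}, config=config)
support_bot.invoke({"input": "It was supposed to arrive Tuesday."}, config=config)
support_bot.invoke({"input": "So what are my options?"}, config=config)
```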
The chain combines trimming with the prompt and LLM in a single LCEL pipeline. The trimmer runs first, cutting the history to 20 messages before the prompt template formats everything.
Each turn builds on the previous context. By turn 4, the bot knows the customer's order number (TS-78432), that they paid for express shipping, and that the package hasn't arrived. Without memory, each turn would start from zero and the customer would have to repeat everything.
You can inspect the full history to verify what's been stored:
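For example, assuming an in-memory `store` dict keyed by session ID:

```python
for message in store["customer-42"].messages:
    print(f"[{message.type}] {message.content}")
```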
Common Mistakes and How to Fix Them
After building several LangChain chatbots, these are the bugs I see most often — including ones I've made myself.
Mistake 1: Forgetting the Config on Invocation
```python
# Wrong — raises a ValueError about missing "session_id"
response = chatbot.invoke({"input": "Hello"})
```

```python
# Right — always pass a config with the session_id
config = {"configurable": {"session_id": "user-001"}}
response = chatbot.invoke({"input": "Hello"}, config=config)
```

RunnableWithMessageHistory requires a session ID to know which conversation to load. Without the config, the call fails with a ValueError about missing configurable fields.
Mistake 2: Mismatched Placeholder and Key Names
```python
# Wrong — the placeholder name and history_messages_key don't match
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])
bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # WRONG: doesn't match "chat_history"
)
```

```python
# Right — both names are "history"
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])
bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # Matches "history" above
)
```

This is the sneakiest bug because it doesn't raise an error. The history placeholder just receives an empty list, and the bot behaves as if it has no memory. I've debugged this for other developers at least five times.
Mistake 3: Using the Deprecated ConversationChain
If you see from langchain.chains import ConversationChain in a tutorial, that tutorial is outdated. ConversationChain was deprecated in LangChain 0.2.7. The modern replacement is the RunnableWithMessageHistory pattern shown throughout this tutorial.
```python
# Deprecated — this still works, but it's the old pattern
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

chain = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
)
```

```python
# Modern replacement — compose with LCEL, then wrap with history
from langchain_core.runnables.history import RunnableWithMessageHistory

chain = prompt | llm
bot = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",
    history_messages_key="history",
)
```

Write a function summarize_conversation(messages: list[dict]) -> str that takes a list of message dictionaries (each with "role" and "content" keys) and returns a one-line summary string.
The summary should follow this format: "{n} messages: {roles}" where {n} is the total message count and {roles} is a comma-separated list of unique roles in the order they first appear.
Test with:
1. A conversation with 4 messages (roles: user, assistant, user, assistant)
2. A conversation with 1 message (role: system)
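One possible solution:

```python
def summarize_conversation(messages: list[dict]) -> str:
    # Collect unique roles in order of first appearance
    roles = []
    for message in messages:
        if message["role"] not in roles:
            roles.append(message["role"])
    return f"{len(messages)} messages: {', '.join(roles)}"

convo = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Help me with decorators"},
    {"role": "assistant", "content": "Sure."},
]
print(summarize_conversation(convo))  # 4 messages: user, assistant
print(summarize_conversation([{"role": "system", "content": "Be terse."}]))  # 1 messages: system
```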
Frequently Asked Questions
Can I use memory with streaming responses?
Yes. RunnableWithMessageHistory works with both .invoke() and .stream(). The wrapper saves messages to history after the stream completes:
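For example, assuming the `chatbot` wrapper from earlier:

```python
config = {"configurable": {"session_id": "user-001"}}

for chunk in chatbot.stream({"input": "Explain decorators briefly."}, config=config):
    print(chunk.content, end="", flush=True)
# Once the stream ends, both the question and the full answer are in history
```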
How do I clear a session's memory?
Call .clear() on the ChatMessageHistory object for that session:
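Assuming the `get_session_history` helper from earlier:

```python
# Wipes the stored messages; the session ID remains valid for future turns
get_session_history("user-001").clear()
```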
What happens when the conversation exceeds the model's context window?
With buffer memory, you get an API error when the total tokens (system prompt + history + new message) exceed the model's limit. GPT-4o-mini supports 128K tokens, so this is rare for text-only conversations. For models with smaller context windows (8K-32K), use window memory or summary buffer memory to prevent hitting the limit.
Can I add metadata to messages (timestamps, user roles)?
LangChain messages have an additional_kwargs field for arbitrary metadata. You can add timestamps, user IDs, or any other data without affecting how the LLM processes the message:
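A sketch — the metadata keys (`timestamp`, `user_id`) are this example's own choices, and it assumes the `get_session_history` helper from earlier:

```python
from datetime import datetime, timezone
from langchain_core.messages import HumanMessage

message = HumanMessage(
    content="Where is my order?",
    additional_kwargs={
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": "user-001",
    },
)
# Stored alongside the message; the LLM prompt only uses message.content
get_session_history("user-001").add_message(message)
```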