
LangChain Chatbot Memory in Python — With Examples


Your LangChain chatbot answers questions perfectly — until the user says "wait, go back to what you said earlier." The bot draws a blank. Every chain invocation starts from scratch, and without explicit memory management, even the most sophisticated prompt template produces a bot with amnesia.

This tutorial covers three distinct memory strategies — buffer, window, and summary — and exactly when each one fits. You will also learn the modern RunnableWithMessageHistory pattern that replaced the deprecated ConversationChain.

Why LLMs Forget — and How LangChain Memory Fixes It

Every LLM API call is stateless. You send a list of messages, the model generates a response, and then it forgets everything. What feels like "memory" in ChatGPT is actually the application resending the entire conversation with every request.

The code below makes two separate LLM calls with no shared history. The model receives the second question in complete isolation — it has never seen the first message. It cannot answer "What's my name?" because that information was never included in the second request.

A stateless chatbot — no memory between calls
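A minimal sketch of the stateless exchange described above. It assumes `langchain-openai` is installed and an `OPENAI_API_KEY` environment variable is set; the model name is an assumption, not a requirement.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Call 1: the user introduces themselves
first = llm.invoke("Hi, my name is Priya.")
print(first.content)

# Call 2: a brand-new request. The model never saw call 1,
# so it has no way to know the user's name.
second = llm.invoke("What's my name?")
print(second.content)
```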

The second call returns something like "I don't have access to your personal information." The model genuinely does not know. This is the core problem LangChain's memory module solves: it manages conversation history automatically so your chatbot can reference earlier messages.


LangChain provides three core memory strategies. Each one answers the same question differently: what should we include in the message history?

| Memory Type | What It Stores | Token Usage | Best For |
| --- | --- | --- | --- |
| ConversationBufferMemory | Every message, verbatim | Grows linearly | Short conversations (<20 turns) |
| ConversationBufferWindowMemory | Last k message pairs | Fixed ceiling | Medium conversations with recent-context focus |
| ConversationSummaryMemory | Running summary of the conversation | Roughly constant | Long conversations (50+ turns) |

ConversationBufferMemory — Remember Everything

Buffer memory is the natural starting point when prototyping. It stores every message — every user input and every AI response — and resends all of them on every call. Nothing is trimmed, nothing is summarized. I reach for this whenever I want something working fast.

The setup requires five pieces working together. A ChatPromptTemplate defines the system instructions and includes a MessagesPlaceholder where conversation history gets injected. A ChatOpenAI instance handles the LLM calls. A dictionary-based session_store maps session IDs to ChatMessageHistory objects. Finally, RunnableWithMessageHistory wires the chain to the store, automatically loading history before each call and saving new messages after.

Full chatbot with ConversationBufferMemory
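A sketch of the five-piece setup just described, assuming `langchain-openai` and an `OPENAI_API_KEY`. The system prompt and model name are illustrative; the in-memory history class from `langchain_core` stands in for `ChatMessageHistory`.

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful Python tutor."),
    MessagesPlaceholder(variable_name="history"),  # history injected here
    ("human", "{input}"),
])

chain = prompt | llm

session_store = {}  # session_id -> InMemoryChatMessageHistory

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = InMemoryChatMessageHistory()
    return session_store[session_id]

chatbot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",  # must match the placeholder name
)
```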

The next block runs a three-turn conversation that tests whether memory is working. Turn 1 introduces a name ("Priya") and a topic ("decorators"). Turn 2 asks for an example without restating the topic — the bot must pull it from stored history. Turn 3 explicitly tests recall by asking the bot to repeat both facts. If all three turns produce correct responses, buffer memory is wired up correctly.

Multi-turn conversation with buffer memory
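The three turns might look like this sketch, assuming the `chatbot` wrapper and session store built in the setup above; the exact wording of each turn is illustrative.

```python
# Same session ID on every call, so history accumulates
config = {"configurable": {"session_id": "priya-session"}}

# Turn 1: introduce a name and a topic
r1 = chatbot.invoke(
    {"input": "Hi, I'm Priya and I'm learning about decorators."}, config=config
)

# Turn 2: no topic restated; the bot must pull it from history
r2 = chatbot.invoke({"input": "Can you show me an example?"}, config=config)

# Turn 3: explicit recall test
r3 = chatbot.invoke(
    {"input": "What's my name, and what am I learning?"}, config=config
)
print(r3.content)
```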

Turn 3 succeeds because all six messages (three user, three AI) get sent to the model every time. The bot answers "Your name is Priya and you're learning decorators" because both facts are literally in the history.


To verify what the memory stored, call get_session_history() with the session ID and access .messages. This returns a list of HumanMessage and AIMessage objects — one per message exchanged. After three turns, you should see six messages alternating between human and ai types. The loop below prints the type and first 80 characters of each.

Inspecting stored messages

ConversationBufferWindowMemory — Keep Only the Last k Turns

Buffer memory works great until conversations grow past 20 turns. At that point, token costs climb and you risk hitting the context window ceiling. Window memory fixes this with a simple rule: keep only the last k exchange pairs and discard everything older.

The legacy ConversationBufferWindowMemory class demonstrates the concept clearly. Setting k=5 means "keep the last 5 exchange pairs" (10 messages total). The code below creates a window memory, feeds it three simulated turns via save_context, and checks the stored count. This class is being phased out in favor of trim_messages — I show the modern approach right after.

Window memory — only the last 5 exchanges (legacy API)

Once the conversation exceeds 5 turns, the oldest pair gets dropped. The model never sees turn 1 when you are on turn 7. The tradeoff is straightforward: you lose distant context but gain predictable token costs.

The modern approach replaces the legacy memory class with trim_messages from langchain_core.messages. It accepts a strategy parameter ("last" keeps the most recent messages), an include_system flag to protect the system prompt, and a start_on parameter to ensure trimmed history always begins with a human message. The token_counter=len setting counts messages, not actual tokens — for precise budgeting, you would pass a tiktoken-based counter instead.

Modern window memory with message trimming

One gotcha worth knowing: if the user refers to something from turn 1 and you are on turn 15 with k=5, the model will confidently hallucinate or say it does not know. That early message was trimmed. For some applications this is fine. For others, you need summary memory.

ConversationSummaryMemory — Compress the Past Into a Summary

What if the conversation is 50 turns long and the user references something from turn 3? Window memory lost it. Buffer memory would work but costs a fortune in tokens. Summary memory offers a third option: it asks the LLM to maintain a running summary of the conversation instead of storing raw messages.

ConversationSummaryMemory in action
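A sketch of summary memory, assuming an `OPENAI_API_KEY`; the inputs are illustrative. Note that this memory type makes an extra LLM call on each `save_context` to update the summary.

```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The memory uses the LLM itself to maintain a running summary
memory = ConversationSummaryMemory(llm=llm)

memory.save_context(
    {"input": "Hi, I'm Priya, a data engineer at Spotify."},
    {"output": "Hi Priya! How can I help?"},
)
memory.save_context(
    {"input": "I need a real-time ETL pipeline on Kafka with Avro."},
    {"output": "Kafka with Avro schema evolution is a solid choice."},
)

# The condensed running summary, not the raw messages
print(memory.buffer)
```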

Instead of storing six messages, the memory holds a condensed summary like: "The user is Priya, a data engineer at Spotify who needs to build a real-time ETL pipeline using Kafka with Avro schema evolution for backward compatibility." Three turns compressed into one sentence.

The real advantage becomes obvious when you compare what the LLM receives at turn 20. The block below calculates approximate token counts using simple arithmetic: 50 tokens per message average, 20 turns total. Buffer sends all 40 messages (~2,000 tokens). Window with k=5 sends 10 messages (~500 tokens). Summary sends one compressed paragraph plus the latest exchange (~250 tokens). These are rough estimates, but the scaling difference is dramatic.

Token usage comparison at turn 20 (estimates)
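The arithmetic can be sketched in plain Python, using the assumptions stated above (about 50 tokens per message and a summary of roughly 150 tokens).

```python
TOKENS_PER_MESSAGE = 50  # rough average
TURNS = 20

# Buffer: every message from both sides, every call
buffer_tokens = 2 * TURNS * TOKENS_PER_MESSAGE

# Window with k=5: only the last 5 pairs (10 messages)
window_tokens = 2 * 5 * TOKENS_PER_MESSAGE

# Summary: one ~150-token paragraph plus the latest exchange
summary_tokens = 150 + 2 * TOKENS_PER_MESSAGE

print(f"Buffer memory:  ~{buffer_tokens} tokens")   # ~2000
print(f"Window memory:  ~{window_tokens} tokens")   # ~500
print(f"Summary memory: ~{summary_tokens} tokens")  # ~250
```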


The gap widens as conversations grow. At turn 100, buffer memory sends roughly 10,000 tokens per call while summary memory stays around 300. For context on how this interacts with model limits, see our Context Windows and Token Budgets tutorial.

RunnableWithMessageHistory — The Modern Pattern

If you have read LangChain tutorials from 2023, you probably saw ConversationChain everywhere. That class is now deprecated. The replacement is RunnableWithMessageHistory, which is more flexible, composable, and fits naturally into LCEL pipelines.


We already used RunnableWithMessageHistory in the buffer memory section. Here I want to break down why it works the way it does and label all the moving parts. Five steps: (1) create the LLM, (2) build a prompt template with a MessagesPlaceholder for history, (3) compose them into an LCEL chain, (4) define a session store with a lookup function, and (5) wrap the chain with RunnableWithMessageHistory.

Complete RunnableWithMessageHistory pattern — annotated
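The five steps can be sketched end to end as follows, assuming `langchain-openai` and an `OPENAI_API_KEY`; the system prompt and model name are illustrative.

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# (1) Create the LLM
llm = ChatOpenAI(model="gpt-4o-mini")

# (2) Prompt template with a placeholder where history is injected
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# (3) Plain LCEL chain, no memory yet
chain = prompt | llm

# (4) Session store plus a lookup function
store = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# (5) Wrap the chain: history loads before and saves after each call
bot = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",      # which input field is the user message
    history_messages_key="history",  # must match the placeholder name
)

# The session ID arrives at invocation time, via the config
bot.invoke(
    {"input": "Hello!"},
    config={"configurable": {"session_id": "user-001"}},
)
```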

Three things to notice. First, input_messages_key tells the wrapper which part of your input dictionary is the user's message — this gets saved to history. Second, history_messages_key must match the variable_name in your MessagesPlaceholder. Third, the session ID comes from the config at invocation time, not at construction time.

Multiple Sessions Running Simultaneously

A real chatbot serves many users at once. The session ID keeps their conversations separate — each call to bot.invoke() carries a config dictionary specifying which session to load. The store holds one ChatMessageHistory per session, so User 1's context never leaks into User 2's responses.

Separate sessions for different users
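A sketch of two isolated sessions, assuming the `chatbot` wrapper described in the buffer memory section; the user inputs are illustrative.

```python
config_priya = {"configurable": {"session_id": "user-001"}}
config_carlos = {"configurable": {"session_id": "user-002"}}

# Two users talk to the same bot, in separate sessions
chatbot.invoke({"input": "Hi, I'm Priya. I work with Kafka."}, config=config_priya)
chatbot.invoke({"input": "Hi, I'm Carlos. I'm learning pandas."}, config=config_carlos)

# Each session only sees its own history, so this should
# mention Kafka and never pandas
response = chatbot.invoke({"input": "What do I work with?"}, config=config_priya)
print(response.content)
```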

The store dictionary now has two entries: "user-001" with Priya's history and "user-002" with Carlos's history. In production, you would replace the dictionary with a persistent backend — covered in the Persistent Memory section below.

Build a Session Store Function
Write Code

Write a function get_or_create_session that takes a session_id (string) and a store (dictionary) as arguments. If the session ID exists in the store, return the existing list. If not, create a new empty list, add it to the store, and return it.

Then simulate two sessions:

1. Add the message "Hello from Priya" to session "s1"

2. Add the message "Hello from Carlos" to session "s2"

3. Add the message "Follow-up from Priya" to session "s1"

Finally, print the length of session "s1" and session "s2" on separate lines.


Combining Strategies — Summary Plus Recent Messages

In practice, the most effective pattern combines summary and window memory. Keep the last k messages verbatim for detailed recent context, and prepend a summary of everything older for long-range awareness. The model gets both precision (exact recent messages) and breadth (compressed older context).

ConversationSummaryBufferMemory handles this automatically. It stores raw messages until the total token count exceeds a threshold you set with max_token_limit. Once the limit is breached, older messages get summarized into a single paragraph while recent messages stay intact. The code below creates one with a 300-token limit, adds two turns, and checks whether messages are still raw or already summarized.

ConversationSummaryBufferMemory — the best of both worlds
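A sketch of the hybrid memory, assuming an `OPENAI_API_KEY`; the turns are illustrative. The LLM is only invoked for summarization once the 300-token threshold is crossed.

```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Raw messages are kept until the history exceeds 300 tokens;
# older messages are then folded into a running summary
memory = ConversationSummaryBufferMemory(
    llm=llm, max_token_limit=300, return_messages=True
)

memory.save_context({"input": "Hi, I'm Priya."}, {"output": "Hello Priya!"})
memory.save_context({"input": "I build Kafka pipelines."}, {"output": "Nice!"})

# Below the limit: messages are still stored verbatim,
# and the summary buffer is still empty
print(memory.load_memory_variables({}))
print(memory.moving_summary_buffer)
```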

With only two turns, the token count is below 300, so you see raw messages. Add 5–6 more turns and the older messages get compressed into a summary while recent ones stay verbatim.

Other Memory Types Worth Knowing

Buffer, window, and summary cover the vast majority of chatbot use cases. But LangChain offers two specialized memory types that solve problems the core three cannot.

ConversationEntityMemory

Entity memory tracks specific entities — people, projects, companies — mentioned in the conversation. Instead of storing raw messages or summaries, it maintains a dictionary of entity descriptions that gets updated as new information appears. This works well when a chatbot needs to remember facts about particular subjects: a sales assistant tracking multiple clients, or a project manager bot keeping tabs on several workstreams.

ConversationEntityMemory — tracking entities
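A rough sketch of entity memory, assuming an `OPENAI_API_KEY`; the inputs are illustrative. Entity extraction happens when memory variables are loaded, so `load_memory_variables` is called before `save_context`, mirroring how a chain would use it.

```python
from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The LLM extracts entities and maintains a description of each
memory = ConversationEntityMemory(llm=llm)

turn_input = {"input": "Priya leads the data platform team at Spotify."}

# Extract entities for this turn, then record the exchange
memory.load_memory_variables(turn_input)
memory.save_context(turn_input, {"output": "Got it, noted."})

# Entity facts live in a dictionary keyed by entity name
print(memory.entity_store.store)
```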

VectorStoreRetrieverMemory

This memory type stores messages as vector embeddings and retrieves only the most semantically relevant ones for each new query. Instead of keeping the most recent messages (window) or a summary of all messages (summary), it finds messages most related to the current question. This is valuable in long-running sessions where the user jumps between unrelated topics. The code below creates a FAISS-backed vector memory using OpenAI embeddings, saves one context, and retrieves relevant history by semantic similarity.

VectorStoreRetrieverMemory concept (requires vector store)
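A sketch under the stated assumptions: `faiss-cpu`, `langchain-community`, and an `OPENAI_API_KEY` are available, and the embedding dimension matches OpenAI's default embedding model. The saved context is illustrative.

```python
import faiss
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()      # 1536-dim vectors by default
index = faiss.IndexFlatL2(1536)      # flat L2 index, fine for small demos
vectorstore = FAISS(embeddings, index, InMemoryDocstore({}), {})

# Retrieve only the single most relevant past exchange per query
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
memory = VectorStoreRetrieverMemory(retriever=retriever)

memory.save_context(
    {"input": "My favorite deployment target is Kubernetes."},
    {"output": "Noted, Kubernetes it is."},
)

# Semantic lookup: a related question pulls back the matching exchange
print(memory.load_memory_variables({"prompt": "What do I deploy to?"})["history"])
```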

Persistent Memory — Surviving Server Restarts

Everything we have built so far uses a Python dictionary for session storage. Restart the server and all conversations vanish. For production chatbots, you need persistent storage.

The good news: the architecture stays the same. You only swap the get_session_history function. Redis is a popular in-memory data store used for caching and message brokering; here it persists conversation messages across server restarts. The code below imports RedisChatMessageHistory from langchain_community, creates a factory function that connects to a Redis URL, and passes it to RunnableWithMessageHistory. The chain, prompt, and invocation code remain identical — only the factory function changes.

Redis-backed persistent memory
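A sketch of the Redis swap, assuming a Redis instance at the URL shown (an assumption) and an `OPENAI_API_KEY`. Everything except the factory function matches the in-memory version.

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

REDIS_URL = "redis://localhost:6379/0"  # assumed local Redis instance

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    # Only this factory changes: messages now persist in Redis,
    # so conversations survive server restarts
    return RedisChatMessageHistory(session_id, url=REDIS_URL)

chatbot = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
```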

LangChain supports several persistent backends. The table below lists the most common options:

| Backend | Import | When to Use |
| --- | --- | --- |
| In-memory dict | ChatMessageHistory | Development and testing |
| Redis | RedisChatMessageHistory | Low-latency production apps |
| PostgreSQL | PostgresChatMessageHistory | When you already have a Postgres DB |
| MongoDB | MongoDBChatMessageHistory | Document-oriented storage needs |
| SQLite | SQLChatMessageHistory | Single-server apps, local persistence |

Implement a Window Trimmer
Write Code

Write a function trim_to_window(messages: list, k: int) -> list that takes a list of messages and returns only the last k messages. If the list has fewer than k messages, return all of them.

Then test it:

1. Create a list of 8 messages: ["msg1", "msg2", ..., "msg8"]

2. Trim to window size 5 and print the result

3. Trim the original list to window size 20 and print the result


LangGraph — The Next Generation of Memory

LangChain's memory classes and RunnableWithMessageHistory are not the end of the story. The LangChain team now recommends LangGraph for new agentic applications. LangGraph takes a fundamentally different approach to memory: instead of wrapping chains with history, it builds conversation state directly into a graph of nodes.

In LangGraph, you define a graph with a MessagesState, compile it with a MemorySaver checkpointer, and pass a thread_id in the config. The framework automatically checkpoints state after each node execution. A minimal example looks like this:

LangGraph MemorySaver — checkpoint-based memory
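A minimal sketch of the checkpointed graph, assuming `langgraph` and `langchain-openai` are installed and an `OPENAI_API_KEY` is set; the thread ID and prompts are illustrative.

```python
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini")

def call_model(state: MessagesState):
    # Append the model's reply to the running message state
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("model", call_model)
builder.add_edge(START, "model")

# MemorySaver checkpoints the state after every node execution
graph = builder.compile(checkpointer=MemorySaver())

# thread_id plays the role the session_id played earlier
config = {"configurable": {"thread_id": "thread-1"}}

graph.invoke({"messages": [("human", "Hi, I'm Priya!")]}, config)
result = graph.invoke({"messages": [("human", "What's my name?")]}, config)
print(result["messages"][-1].content)
```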

For a simple chatbot, this is more code than RunnableWithMessageHistory. The payoff comes when you add tool calling, conditional routing, or multi-agent workflows. If your project is a straightforward Q&A bot, stick with the patterns taught earlier in this tutorial.

Real-World Example: Customer Support Bot with Memory

Time to put everything together. The customer support bot below uses trim_messages for window-style memory, handles multiple sessions, and includes a system prompt tailored for support interactions.

The first block sets up the LLM, a system prompt with support-specific rules, the prompt template, and a trimmer keeping the last 20 messages. These four components get composed into a single LCEL chain via the pipe operator.

Production-style support chatbot — complete example
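A sketch of the four components and their composition, assuming an `OPENAI_API_KEY`; the system prompt wording is illustrative. The trimmer is wrapped in a `RunnableLambda` so it can sit inside the LCEL pipeline and re-map the injected history.

```python
from operator import itemgetter

from langchain_core.messages import trim_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

SYSTEM_PROMPT = (
    "You are a customer support agent for an online store. "
    "Be concise, reference details the customer already gave you, "
    "and never ask for information already provided."
)

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# Window-style memory: keep only the last 20 messages
trimmer = RunnableLambda(lambda msgs: trim_messages(
    msgs, strategy="last", max_tokens=20, token_counter=len, start_on="human",
))

# Trim the injected history, then prompt, then the LLM
chain = (
    RunnablePassthrough.assign(history=itemgetter("history") | trimmer)
    | prompt
    | llm
)
```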

The second block adds session management and runs a four-turn support conversation. Each turn builds on previous context — by turn 4, the bot knows the customer's order number, shipping tier, and original complaint without the customer repeating anything.

Session management and invocation
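A sketch of the session plumbing and the four turns, assuming the LCEL `chain` described above; the ticket ID and turn wording are illustrative.

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

support_bot = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "ticket-4821"}}
turns = [
    "Hi, my order #88231 hasn't arrived yet.",
    "I paid for express shipping.",
    "It was supposed to arrive last Tuesday.",
    "So what do you suggest I do now?",
]

# By turn 4 the bot knows the order number, shipping tier, and complaint
for turn in turns:
    response = support_bot.invoke({"input": turn}, config=config)
    print(f"Customer: {turn}\nBot: {response.content}\n")
```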

Without memory, each turn starts from zero and the customer repeats their order number and shipping details every time. Memory transforms a frustrating interaction into a natural conversation.

Common Mistakes and How to Fix Them

These three bugs surface over and over when wiring LangChain chatbots with memory. I check for them first whenever a bot acts like it has amnesia.

Mistake 1: Forgetting the Config on Invocation

Missing config — raises an error
# This raises a ValueError: no session_id to load history with
response = chatbot.invoke({"input": "Hello"})
Include the session config
# Always pass config with session_id
config = {"configurable": {"session_id": "user-001"}}
response = chatbot.invoke({"input": "Hello"}, config=config)

RunnableWithMessageHistory requires a session ID to know which conversation to load. Without the config, you get a ValueError about missing configurable fields.


Mistake 2: Mismatched Placeholder and Key Names

Silent bug — history never appears
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # WRONG: doesn't match "chat_history"
)
Keys match — history works correctly
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are helpful."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

bot = RunnableWithMessageHistory(
    prompt | llm,
    get_history,
    input_messages_key="input",
    history_messages_key="history",  # Matches "history" above
)

This is the sneakiest bug because it raises no error at all. The history placeholder receives an empty list, and the bot behaves as if it has no memory. Always verify key names match before debugging anything else.


Mistake 3: Using the Deprecated ConversationChain

If you see from langchain.chains import ConversationChain in a tutorial, that tutorial is outdated. ConversationChain was deprecated in LangChain 0.2.7. The modern replacement is the RunnableWithMessageHistory pattern shown throughout this tutorial.

Deprecated — ConversationChain (pre-0.2.7)
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# This still works but is deprecated
chain = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
)
Modern — RunnableWithMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Compose with LCEL, wrap with history
chain = prompt | llm
bot = RunnableWithMessageHistory(chain, get_history,
    input_messages_key="input",
    history_messages_key="history",
)
Build a Conversation Summarizer
Write Code

Write a function summarize_conversation(messages: list[dict]) -> str that takes a list of message dictionaries (each with "role" and "content" keys) and returns a one-line summary string.

The summary should follow this format: "{n} messages: {roles}" where {n} is the total message count and {roles} is a comma-separated list of unique roles in the order they first appear.

Test with:

1. A conversation with 4 messages (roles: user, assistant, user, assistant)

2. A conversation with 1 message (role: system)


Frequently Asked Questions

Can I use memory with streaming responses?

Yes. RunnableWithMessageHistory works with both .invoke() and .stream(). The wrapper saves messages to history after the stream completes. For more on streaming architecture, see our Streaming AI API Backend tutorial.

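A streaming sketch, assuming the `chatbot` wrapper and session store from the earlier sections; the prompt is illustrative.

```python
config = {"configurable": {"session_id": "user-001"}}

# Chunks arrive as they are generated; history is saved once the stream ends
for chunk in chatbot.stream({"input": "Summarize our chat so far."}, config=config):
    print(chunk.content, end="", flush=True)
print()
```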

How do I clear a session's memory?

Call .clear() on the ChatMessageHistory object for that session:


What happens when the conversation exceeds the model's context window?

With buffer memory, you get an API error when the total tokens (system prompt + history + new message) exceed the model's limit. GPT-4o-mini supports 128K tokens, so this is rare for text-only conversations. For models with smaller windows (8K–32K), use window memory or summary buffer memory. See our Context Windows and Token Budgets tutorial for details.

Can I add metadata to messages (timestamps, user roles)?

LangChain messages have an additional_kwargs field for arbitrary metadata. You can add timestamps, user IDs, or any other data without affecting how the LLM processes the message:


Can I use async with RunnableWithMessageHistory?

Yes. Replace .invoke() with await bot.ainvoke() and .stream() with async for chunk in bot.astream(). The history loading and saving happens asynchronously, which matters for high-concurrency applications:

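An async sketch, assuming the `chatbot` wrapper from the earlier sections:

```python
import asyncio

async def main() -> None:
    config = {"configurable": {"session_id": "user-001"}}
    # History loads and saves asynchronously around the call
    response = await chatbot.ainvoke({"input": "Hello!"}, config=config)
    print(response.content)

asyncio.run(main())
```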

Complete Code

Complete chatbot with memory — copy and run
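A consolidated sketch of the core pattern from this tutorial, assuming `langchain-openai` is installed and an `OPENAI_API_KEY` is set; the model name, system prompt, and demo questions are illustrative.

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

# (1) LLM
llm = ChatOpenAI(model="gpt-4o-mini")

# (2) Prompt with a history placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the conversation "
               "history to stay consistent across turns."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# (3) Session store and lookup function
store: dict[str, InMemoryChatMessageHistory] = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# (4) Chain wrapped with message history
chatbot = RunnableWithMessageHistory(
    prompt | llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

if __name__ == "__main__":
    config = {"configurable": {"session_id": "demo"}}
    for question in [
        "Hi, I'm Priya and I'm learning about decorators.",
        "Can you show me a short example?",
        "What's my name, and what am I learning?",
    ]:
        reply = chatbot.invoke({"input": question}, config=config)
        print(f"You: {question}\nBot: {reply.content}\n")
```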

References

  • LangChain documentation — Memory. Link
  • LangChain documentation — How to add message history. Link
  • LangChain documentation — RunnableWithMessageHistory. Link
  • LangChain documentation — trim_messages. Link
  • LangChain documentation — ChatMessageHistory. Link
  • OpenAI API documentation — Chat completions. Link
  • LangChain documentation — Migrating from ConversationChain. Link
  • LangChain documentation — Memory types. Link
  • LangGraph documentation — MemorySaver. Link