
Python AI Chatbot: Build a Conversational Assistant with Memory

Beginner · 25 min · 2 exercises · 30 XP

In the last tutorial you made a single API call and got one answer back. That's useful, but it's not a conversation. Ask the AI "What's a decorator?" and then follow up with "Show me an example" — it has no idea what "an example" refers to. Every call is a blank slate.

In this tutorial, you'll fix that. By the end, you'll have a chatbot that remembers every message, holds a consistent personality, and streams its responses word by word — like ChatGPT.

The Problem — AI Has No Memory

Run these two calls back to back and watch what happens:

Two separate API calls
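The interactive editor doesn't survive in this text version. Assuming the openai 1.x Python client from the versions note, the two calls likely looked something like this sketch (the `ask` helper and the API-key check are additions here so the snippet stands alone):

```python
import os

def ask(client, messages, model="gpt-4o-mini"):
    """One independent API call: the model sees only `messages`."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

if os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes `pip install openai` (1.x)
    client = OpenAI()
    # Call 1: introduce yourself
    print(ask(client, [{"role": "user", "content": "Hi! My name is Priya."}]))
    # Call 2: a brand-new request -- nothing from call 1 is carried over
    print(ask(client, [{"role": "user", "content": "What's my name?"}]))
```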

The AI won't know Priya's name. Each create() call is completely independent — the model receives only the messages you pass in that call. It doesn't store anything between calls.

The first time I hit this, I spent 20 minutes convinced the API was broken before realizing the obvious: each call is a completely fresh start. So how does ChatGPT feel like a continuous conversation? The trick is embarrassingly simple: you send the entire conversation history with every call.

Multi-Turn Conversations — Sending History

The messages parameter accepts a list with three roles: "system", "user", and "assistant". To simulate memory, you include the previous messages in every call. Watch:

Sending conversation history
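The editor block isn't preserved, but given the description it would resemble this sketch: the full conversation so far, including the model's earlier reply, is passed in one `messages` list (assistant wording is illustrative):

```python
import os

# The whole conversation so far -- note the earlier assistant reply is included
messages = [
    {"role": "user", "content": "Hi! My name is Priya."},
    {"role": "assistant", "content": "Nice to meet you, Priya! How can I help?"},
    {"role": "user", "content": "What's my name?"},
]

if os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the openai 1.x package
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)  # the model can read "Priya" in the history
```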

This time the AI knows Priya's name. It can see the full conversation in the messages list — including the earlier assistant reply. The model doesn't remember anything; it simply reads everything you send.

The pattern is straightforward: after each exchange, append the user's message and the AI's response to a growing list.

A chat function with memory
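A sketch of what that `chat()` function likely looked like. The `client` parameter is an addition here so the snippet can run against any client object; the three demo questions follow the scraping conversation the text describes:

```python
import os

history = []  # the growing conversation: user and assistant messages, in order

def chat(client, user_message, model="gpt-4o-mini"):
    """Send the full history plus the new message; remember both sides."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

if os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the openai 1.x package
    client = OpenAI()
    print(chat(client, "What's a good Python library for web scraping?"))
    print(chat(client, "Show me a minimal example using it."))
    print(chat(client, "Does it handle JavaScript-rendered pages?"))
```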

Each call sends the entire history list, so the AI sees every previous exchange. The third question — "Does it handle JavaScript-rendered pages?" — only makes sense because the AI can see it recommended a specific library in the second turn.

Build a 3-Turn Conversation
Write Code

Use the chat() function defined above to have a 3-turn conversation:

1. Tell the AI your favorite programming language

2. Ask it to suggest a project idea in that language

3. Ask "How long would that take a beginner?"

Print each response with a "Turn N:" prefix, then print "DONE" on the last line.

The chat() function and history list are already available from the previous code block.


System Prompts — Giving Your Chatbot a Personality

Right now our chatbot is a generic assistant. In the previous tutorial you saw how a system message controls the AI's behavior. For a chatbot, the system message is your chatbot's DNA — it defines who it is for the entire conversation.

The system message always goes first in the messages list, before any user/assistant exchanges. Let's fold the pattern into a Chatbot class that accepts a personality:

Chatbot class with personality
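The class itself isn't preserved, but based on how it's used later it likely looked close to this sketch (the optional `client` parameter and the tutor prompt wording are additions here):

```python
import os

class Chatbot:
    """A chatbot whose system prompt -- its personality -- leads every call."""

    def __init__(self, personality, model="gpt-4o-mini", client=None):
        if client is None:
            from openai import OpenAI  # assumes the openai 1.x package
            client = OpenAI()
        self.client = client
        self.model = model
        # The system message sits at history[0] and is sent with every call
        self.history = [{"role": "system", "content": personality}]

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.history
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply

if os.getenv("OPENAI_API_KEY"):
    tutor = Chatbot(
        "You are a patient Python tutor. Explain concepts simply, "
        "for complete beginners, with short examples."
    )
    print(tutor.chat("What's a list comprehension?"))
```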

The system message sits at history[0] and gets sent with every call. The AI reads it first, then reads the full conversation, and responds in character. Follow-up questions stay in context:

Multi-turn with personality
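A sketch of the multi-turn demo, with the Chatbot class repeated compactly so the snippet runs on its own (the questions are illustrative; the `client` parameter is an addition for testability):

```python
import os

class Chatbot:
    # Same class as the previous block, repeated so this snippet stands alone
    def __init__(self, personality, model="gpt-4o-mini", client=None):
        if client is None:
            from openai import OpenAI  # assumes the openai 1.x package
            client = OpenAI()
        self.client = client
        self.model = model
        self.history = [{"role": "system", "content": personality}]

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.history
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply

if os.getenv("OPENAI_API_KEY"):
    tutor = Chatbot("You are a patient Python tutor for complete beginners.")
    print(tutor.chat("What's a dictionary?"))
    print(tutor.chat("How is it different from a list?"))  # follow-up stays in context
    print(tutor.chat("Show me a tiny example that uses both."))
```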

The tutor keeps its patient, beginner-friendly tone across all turns because the system prompt is always the first message the model sees.

The real power of this pattern — and the part I find most fun — is that swapping the system prompt gives you a completely different chatbot. Same code, different personality.

Code reviewer chatbot
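The swap likely looked something like this sketch: same Chatbot class (repeated so the snippet runs standalone), different system prompt. The reviewer prompt wording is an assumption:

```python
import os

class Chatbot:
    # Same class as before, repeated so this snippet stands alone
    def __init__(self, personality, model="gpt-4o-mini", client=None):
        if client is None:
            from openai import OpenAI  # assumes the openai 1.x package
            client = OpenAI()
        self.client = client
        self.model = model
        self.history = [{"role": "system", "content": personality}]

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.history
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply

if os.getenv("OPENAI_API_KEY"):
    # Same code as the tutor -- only the personality changed
    reviewer = Chatbot(
        "You are a strict but constructive senior code reviewer. Point out "
        "bugs, style issues, and edge cases in any code you are shown."
    )
    print(reviewer.chat("def add(a, b): return a - b"))
```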
Create a Specialized Chatbot
Write Code

Create a Chatbot that acts as a cooking assistant — it should suggest recipes and cooking tips. Then:

1. Ask it to suggest a quick dinner recipe

2. Ask a follow-up: "What can I substitute for the main protein?"

Print both responses and then print "DONE" on the last line.

The Chatbot class is available from the previous code block.


Streaming — Getting Responses Word by Word

You've probably noticed that ChatGPT shows text appearing word by word rather than dumping the entire response at once. That's streaming — and I add it to every chatbot I build because users hate waiting. Without streaming, a 200-word response means staring at a blank screen for 2–3 seconds. With streaming, the first word appears in under 200 milliseconds.

The API change is small: add stream=True and iterate over chunks instead of reading the response at once.

Non-streaming response
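For comparison, the non-streaming version probably looked like this sketch: the call blocks until the entire response is ready, then prints it at once (the prompt text is illustrative):

```python
import os

messages = [{"role": "user", "content": "Explain Python generators in about 150 words."}]

if os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the openai 1.x package
    client = OpenAI()
    # Blocks until the whole reply exists, then prints everything at once
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)
```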
Streaming response
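And the streaming version, as a sketch: `stream=True` turns the response into an iterator of chunks, each carrying a small `delta` of text (the `stream_reply` helper name is an addition here):

```python
import os

def stream_reply(client, messages, model="gpt-4o-mini"):
    """Print the reply chunk by chunk and return the accumulated full text."""
    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    full_response = ""
    for chunk in stream:
        # Each chunk's delta holds a few tokens; the final chunk's delta is None
        if chunk.choices and chunk.choices[0].delta.content is not None:
            piece = chunk.choices[0].delta.content
            print(piece, end="", flush=True)
            full_response += piece
    print()  # final newline
    return full_response

if os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the openai 1.x package
    client = OpenAI()
    stream_reply(client, [{"role": "user", "content": "Explain Python generators briefly."}])
```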

Both calls produce the same content. The difference is timing — streaming starts outputting immediately. Each chunk contains a small piece of text (usually a few tokens). You accumulate them into full_response so you can store the complete reply in your history.

The Complete Chatbot — Memory, Personality, and Streaming

Let's combine everything into a final StreamingChatbot class. This is a production-worthy pattern — I use a version of this in most of my AI projects:

StreamingChatbot — the complete pattern
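A sketch of what the combined class likely looked like, given the features the text names (memory, personality, streaming, `message_count()`); the optional `client` parameter is an addition here so the class can be exercised without a key:

```python
import os

class StreamingChatbot:
    """Memory + personality + streaming in one class."""

    def __init__(self, personality, model="gpt-4o-mini", client=None):
        if client is None:
            from openai import OpenAI  # assumes the openai 1.x package
            client = OpenAI()
        self.client = client
        self.model = model
        self.history = [{"role": "system", "content": personality}]

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        stream = self.client.chat.completions.create(
            model=self.model, messages=self.history, stream=True
        )
        full_response = ""
        for chunk in stream:
            # Each chunk carries a small text delta; the final chunk's delta is None
            if chunk.choices and chunk.choices[0].delta.content is not None:
                piece = chunk.choices[0].delta.content
                print(piece, end="", flush=True)
                full_response += piece
        print()
        # Save the complete reply so the next turn has full context
        self.history.append({"role": "assistant", "content": full_response})
        return full_response

    def message_count(self):
        """Number of messages (system prompt included) sent with the next call."""
        return len(self.history)

if os.getenv("OPENAI_API_KEY"):
    bot = StreamingChatbot("You are a concise, friendly coding assistant.")
    bot.chat("Give me one tip for writing readable Python.")
    print(f"Messages in history: {bot.message_count()}")
```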

Each response streams to the screen word by word, then gets saved to history so the next turn has full context. The message_count() method lets you see the history growing — useful for knowing when to trim it.

Managing Long Conversations

There's a practical limit you'll hit quickly: every model has a context window — the maximum number of tokens it can process in one call. GPT-4o-mini has a 128K-token context, which sounds enormous, but a 50-turn conversation with detailed code examples can approach that limit. And you're paying for every token sent.

The simplest strategy: keep only the last N turns. This is called a sliding window:

Sliding window memory
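The sliding-window block likely followed this pattern: keep the system prompt aside, and send it plus only the most recent messages on each call. This sketch uses a non-streaming call for brevity; the `client` parameter and class name are additions here:

```python
import os

class SlidingWindowChatbot:
    """Sends the system prompt plus only the last `max_turns` exchanges."""

    def __init__(self, personality, max_turns=20, model="gpt-4o-mini", client=None):
        if client is None:
            from openai import OpenAI  # assumes the openai 1.x package
            client = OpenAI()
        self.client = client
        self.model = model
        self.max_turns = max_turns
        self.system = {"role": "system", "content": personality}
        self.history = []  # user/assistant messages only; the system prompt is kept aside

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        # Window = system prompt + last max_turns exchanges (2 messages per exchange);
        # the message just appended counts toward the window
        window = [self.system] + self.history[-(2 * self.max_turns):]
        response = self.client.chat.completions.create(
            model=self.model, messages=window
        )
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply

if os.getenv("OPENAI_API_KEY"):
    bot = SlidingWindowChatbot("You are a helpful assistant.", max_turns=2)
    bot.chat("Hi! My name is Priya.")
    bot.chat("What's a good first Python project?")
    bot.chat("How long would that take?")
    print(bot.chat("What's my name?"))  # the first turn has been trimmed away
```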

With max_turns=2, the bot only sees the last 2 exchanges. Priya's name was in the first turn and got trimmed. In my projects, max_turns=20 covers 95% of conversations — most users don't go beyond 15-20 exchanges. It's a balance between memory and cost.

Summary — What You Built

You now know the core pattern behind every chatbot from ChatGPT to customer support bots:

  • Memory is a list. Append each user message and AI response to a growing messages list. The model is stateless — your code provides the memory.
  • System prompts define personality. The first message in the list controls the AI's behavior across all turns.
  • Streaming improves UX. Add stream=True and iterate over chunks for word-by-word output.
  • Sliding window manages cost. Keep only the last N turns to avoid hitting token limits or running up API bills.
  • Practice exercise: Build a chatbot that acts as a travel planner. Give it a system prompt that makes it ask clarifying questions (budget, dates, interests) before suggesting destinations. Have a 5-turn conversation where you gradually narrow down your trip. Use the StreamingChatbot class from this tutorial.

References

  • OpenAI Chat Completions API Reference — full API docs for chat.completions.create()
  • OpenAI Streaming Guide — how streaming works, chunk format, delta objects
  • OpenAI Prompt Engineering Guide — best practices for system prompts and message design
  • OpenAI Models Overview — context windows, pricing, and model capabilities
  • OpenAI Tokenizer Tool — interactive tool to count tokens in your messages
  • Versions used in this tutorial: Python 3.12, openai library 1.x, model gpt-4o-mini. Tested March 2026.
