Python AI Chatbot: Build a Conversational Assistant with Memory
In the last tutorial you made a single API call and got one answer back. That's useful, but it's not a conversation. Ask the AI "What's a decorator?" and then follow up with "Show me an example" — it has no idea what "an example" refers to. Every call is a blank slate.
In this tutorial, you'll fix that. By the end, you'll have a chatbot that remembers every message, holds a consistent personality, and streams its responses word by word — like ChatGPT.
The Problem — AI Has No Memory
Run these two calls back to back and watch what happens:
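Here's a sketch of those two calls. To keep it runnable without an API key, I'm using a tiny `fake_create()` stand-in for `client.chat.completions.create()` — like the real model, it can only see the messages you pass in that one call:

```python
# Offline stand-in for client.chat.completions.create().
# With the real client you'd call:
#   client.chat.completions.create(model="gpt-4o-mini", messages=...)
def fake_create(messages):
    # Like the real model, it only "knows" what's in the messages it receives.
    text = " ".join(m["content"] for m in messages)
    if "Priya" in text:
        return "Nice to meet you, Priya!"
    return "I'm sorry, I don't know your name."

# Call 1: introduce yourself
print(fake_create([{"role": "user", "content": "Hi! My name is Priya."}]))
# → Nice to meet you, Priya!

# Call 2: a brand-new call -- the first message is NOT included
print(fake_create([{"role": "user", "content": "What's my name?"}]))
# → I'm sorry, I don't know your name.
```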
The AI won't know Priya's name. Each create() call is completely independent — the model receives only the messages you pass in that call. It doesn't store anything between calls.
The first time I hit this, I spent 20 minutes convinced the API was broken before realizing the obvious: each call is a completely fresh start. So how does ChatGPT feel like a continuous conversation? The trick is embarrassingly simple: you send the entire conversation history with every call.
Multi-Turn Conversations — Sending History
The messages parameter accepts a list with three roles: "system", "user", and "assistant". To simulate memory, you include the previous messages in every call. Watch:
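Here's the same question again, this time with the earlier exchange included in the list (still using a `fake_create()` stand-in so it runs offline — with the real client you'd pass the same list to `client.chat.completions.create()`):

```python
# Offline stand-in: answers correctly only if the name is in the messages.
def fake_create(messages):
    text = " ".join(m["content"] for m in messages)
    if "Priya" in text:
        return "Your name is Priya."
    return "I don't know your name."

# Include the previous user message AND the assistant's reply.
messages = [
    {"role": "user", "content": "Hi! My name is Priya."},
    {"role": "assistant", "content": "Nice to meet you, Priya!"},
    {"role": "user", "content": "What's my name?"},
]
print(fake_create(messages))  # → Your name is Priya.
```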
This time the AI knows Priya's name. It can see the full conversation in the messages list — including the earlier assistant reply. The model doesn't remember anything; it simply reads everything you send.
The pattern is straightforward: after each exchange, append the user's message and the AI's response to a growing list.
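Here's a minimal sketch of that pattern — the `chat()` function and `history` list the exercise below refers to. The `fake_create()` helper is an offline stand-in for the real API call, so the replies are placeholders; the append logic is the part that matters:

```python
history = []

def fake_create(messages):
    # Stand-in for: client.chat.completions.create(model="gpt-4o-mini",
    #                                              messages=messages)
    return f"(canned reply; the model saw {len(messages)} messages)"

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    reply = fake_create(history)  # send the ENTIRE history every time
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is web scraping?"))
print(chat("Which Python library should I use for it?"))
print(chat("Does it handle JavaScript-rendered pages?"))
print(len(history))  # 6 -- three user/assistant pairs
```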
Each call sends the entire history list, so the AI sees every previous exchange. The third question — "Does it handle JavaScript-rendered pages?" — only makes sense because the AI can see it recommended a specific library in the second turn.
Use the chat() function defined above to have a 3-turn conversation:
1. Tell the AI your favorite programming language
2. Ask it to suggest a project idea in that language
3. Ask "How long would that take a beginner?"
Print each response with "Turn N:" prefix, then print "DONE" on the last line.
The chat() function and history list are already available from the previous code block.
System Prompts — Giving Your Chatbot a Personality
Right now our chatbot is a generic assistant. In the previous tutorial you saw how a system message controls the AI's behavior. For a chatbot, the system message is your chatbot's DNA — it defines who it is for the entire conversation.
The system message always goes first in the messages list, before any user/assistant exchanges. Let's rewrite the chat function to accept a personality:
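Here's a sketch of that rewrite as a `Chatbot` class. The `_create()` method is an offline stand-in for the real `client.chat.completions.create()` call, so the example runs without an API key; the structure — system prompt pinned at `history[0]`, everything appended after it — is the point:

```python
class Chatbot:
    """Keeps the system prompt at history[0] and grows history each turn."""

    def __init__(self, personality):
        self.history = [{"role": "system", "content": personality}]

    def _create(self, messages):
        # Stand-in for: client.chat.completions.create(
        #     model="gpt-4o-mini", messages=messages)
        persona = messages[0]["content"]
        return f"(replying in character as: {persona!r})"

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        reply = self._create(self.history)  # system prompt rides along
        self.history.append({"role": "assistant", "content": reply})
        return reply

tutor = Chatbot("You are a patient Python tutor for complete beginners.")
print(tutor.chat("What's a decorator?"))
print(tutor.chat("Show me an example"))  # works: turn 1 is in history
```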
The system message sits at history[0] and gets sent with every call. The AI reads it first, then reads the full conversation, and responds in character, so follow-up questions stay in context.
The tutor keeps its patient, beginner-friendly tone across all turns because the system prompt is always the first message the model sees.
The real power of this pattern — and the part I find most fun — is that swapping the system prompt gives you a completely different chatbot. Same code, different personality.
Create a Chatbot that acts as a cooking assistant — it should suggest recipes and cooking tips. Then:
1. Ask it to suggest a quick dinner recipe
2. Ask a follow-up: "What can I substitute for the main protein?"
Print both responses and then print "DONE" on the last line.
The Chatbot class is available from the previous code block.
Streaming — Getting Responses Word by Word
You've probably noticed that ChatGPT shows text appearing word by word rather than dumping the entire response at once. That's streaming — and I add it to every chatbot I build because users hate waiting. Without streaming, a 200-word response means staring at a blank screen for 2–3 seconds. With streaming, the first word appears in under 200 milliseconds.
The API change is small: add stream=True and iterate over chunks instead of reading the response at once.
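Here's a sketch of the streaming loop. A `fake_stream()` generator stands in for the API stream so the example runs offline; the real call and the chunk attribute it exposes are shown in the comments:

```python
# Offline stand-in for the streamed response. The real version is:
#   stream = client.chat.completions.create(
#       model="gpt-4o-mini", messages=history, stream=True)
#   for chunk in stream:
#       delta = chunk.choices[0].delta.content  # may be None, so check it
def fake_stream(messages):
    reply = "Streaming delivers the reply a few tokens at a time."
    for word in reply.split(" "):
        yield word + " "

full_response = ""
for chunk in fake_stream([{"role": "user", "content": "Explain streaming"}]):
    print(chunk, end="", flush=True)  # show each piece as it arrives
    full_response += chunk            # accumulate the complete reply
print()  # final newline after the stream ends
```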
Both calls produce the same content. The difference is timing — streaming starts outputting immediately. Each chunk contains a small piece of text (usually a few tokens). You accumulate them into full_response so you can store the complete reply in your history.
The Complete Chatbot — Memory, Personality, and Streaming
Let's combine everything into a final StreamingChatbot class. This is a production-worthy pattern — I use a version of this in most of my AI projects:
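Here's one way to sketch it. As before, `_stream()` is an offline stand-in for the real streamed API call so the class runs without a key; swap in `client.chat.completions.create(..., stream=True)` for real use:

```python
class StreamingChatbot:
    """Memory + personality + streaming in one class."""

    def __init__(self, personality):
        self.history = [{"role": "system", "content": personality}]

    def _stream(self, messages):
        # Real version: iterate the API stream and yield
        # chunk.choices[0].delta.content for each non-empty chunk.
        reply = f"(reply in character, given {len(messages)} messages)"
        for word in reply.split(" "):
            yield word + " "

    def chat(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        full_response = ""
        for chunk in self._stream(self.history):
            print(chunk, end="", flush=True)  # word-by-word output
            full_response += chunk
        print()
        # Save the complete reply so the next turn has full context.
        self.history.append({"role": "assistant", "content": full_response})
        return full_response

    def message_count(self):
        return len(self.history)
```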
Each response streams to the screen word by word, then gets saved to history so the next turn has full context. The message_count() method lets you see the history growing — useful for knowing when to trim it.
Managing Long Conversations
There's a practical limit you'll hit quickly: every model has a context window — the maximum number of tokens it can process in one call. GPT-4o-mini has a 128K-token context, which sounds enormous, but a 50-turn conversation with detailed code examples can approach that limit. And you're paying for every token sent.
The simplest strategy: keep only the last N turns. This is called a sliding window:
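One way to sketch the trim is a standalone helper (`trim_history` is my name for it — in the StreamingChatbot above you'd accept `max_turns` in the constructor and trim before each call). The system prompt at index 0 must survive the trim:

```python
def trim_history(history, max_turns=20):
    """Keep the system prompt plus only the last max_turns exchanges."""
    system = history[:1] if history and history[0]["role"] == "system" else []
    rest = history[len(system):]
    # Each turn is one user message plus one assistant reply.
    return system + rest[-(max_turns * 2):]

# Demo: a 3-turn conversation trimmed to the last 2 turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for turn in ["Hi, my name is Priya.", "Suggest a project.", "What's my name?"]:
    history.append({"role": "user", "content": turn})
    history.append({"role": "assistant", "content": "(reply)"})

history = trim_history(history, max_turns=2)
print(len(history))           # 5: system prompt + 2 turns
print(history[1]["content"])  # the Priya introduction is gone
```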
With max_turns=2, the bot only sees the last 2 exchanges. Priya's name was in the first turn and got trimmed. In my projects, max_turns=20 covers 95% of conversations — most users don't go beyond 15-20 exchanges. It's a balance between memory and cost.
Summary — What You Built
You now know the core pattern behind every chatbot from ChatGPT to customer support bots:
- Memory: send the full conversation history in the messages list with every call. The model is stateless — your code provides the memory.
- Personality: put a system message first in the list; it defines who the chatbot is for the entire conversation.
- Streaming: add stream=True and iterate over chunks for word-by-word output.
- Long conversations: trim history to the last N turns with a sliding window to stay inside the context window and control cost.

Practice exercise: Build a chatbot that acts as a travel planner. Give it a system prompt that makes it ask clarifying questions (budget, dates, interests) before suggesting destinations. Have a 5-turn conversation where you gradually narrow down your trip. Use the StreamingChatbot class from this tutorial.
References
OpenAI API reference: chat.completions.create()

Versions used in this tutorial: Python 3.12, openai library 1.x, model gpt-4o-mini. Tested March 2026.