
Python AI Chatbot: Build a Conversational Assistant with Memory

Beginner · 25 min · 2 exercises · 30 XP

In the last tutorial you made a single API call and got one answer back. That's useful, but it's not a conversation. Tell the AI your name, then ask "What's my name?" — it has no idea. Every call is a blank slate.

By the end of this tutorial, you'll have a chatbot that remembers every message, holds a consistent personality, and streams responses word by word — the same pattern behind ChatGPT.

The Messages Array — How AI "Remembers"

I use this pattern probably ten times a week, and it still surprises people how simple it is. Let me show you the problem first, then the fix.

Two separate API calls — no memory
Loading editor...

The AI won't know Priya's name. Each create() call is completely independent — the model receives only the messages you pass in that single call. It stores nothing between calls.

So how does ChatGPT feel like a continuous conversation? You send the entire conversation history with every call. The messages parameter accepts a list with three roles: "system", "user", and "assistant". To simulate memory, include previous messages each time:

Sending conversation history
Loading editor...
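Concretely, "memory" is just the prior exchange included in the list. A minimal sketch (the assistant's exact wording is illustrative):

```python
# The earlier exchange rides along with the new question.
history_so_far = [
    {"role": "user", "content": "My name is Priya. I'm building a web scraper."},
    {"role": "assistant", "content": "Nice to meet you, Priya! A web scraper is a great project."},
    {"role": "user", "content": "What's my name?"},
]

# response = await client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=history_so_far,
# )
# The model can now answer "Priya" -- not because it remembers,
# but because the answer is right there in the messages it was sent.
```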

This time the AI knows Priya's name because it can see the earlier exchange in the messages list. The model doesn't remember anything — it reads everything you send and responds accordingly.

The pattern is straightforward: after each exchange, append the user's message and the AI's response to a growing list. Here's a chat() function that does exactly that:

A chat function with memory
Loading editor...

Each call sends the entire history list. The third question — "Does it handle JavaScript-rendered pages?" — only makes sense because the AI can see it recommended a specific library in turn two.

Build a 3-Turn Conversation
Write Code

Use the chat() function defined above to have a 3-turn conversation:

1. Tell the AI your favorite programming language

2. Ask it to suggest a project idea in that language

3. Ask "How long would that take a beginner?"

Print each response with a "Turn N:" prefix, then print "DONE" on the last line.

The chat() function and history list are already available from the previous code block.

Loading editor...

System Prompts — Giving Your Chatbot a Personality

Chatbot class with personality
Loading editor...

A generic assistant is fine for demos, but real chatbots need a personality. The Chatbot class above wraps everything we built in the previous section — history management, the API call — and adds one thing: a system message at history[0].

The system message gets sent with every call, and the AI reads it before reading the conversation. It defines who the chatbot is — its role, tone, format constraints, and scope. Follow-up questions stay in context because the history keeps growing:

Multi-turn with personality
Loading editor...

The tutor keeps its patient, beginner-friendly tone across all turns. The real power — and the part I find most fun — is that swapping the system prompt creates a completely different chatbot. Same class, different personality:

Code reviewer chatbot
Loading editor...
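The exact reviewer prompt below is an assumption, but the shape is the point: only the system prompt changes.

```python
# Same Chatbot class, different system prompt = a different product.
# (Prompt wording is illustrative, not from the tutorial's editor.)
reviewer_prompt = (
    "You are a strict senior code reviewer. Point out bugs, style issues, "
    "and performance problems. Be direct, and cite the exact line you mean."
)
# reviewer = Chatbot(system_prompt=reviewer_prompt)
# print(await reviewer.say("def add(a,b): return a+b  -- any feedback?"))
```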
Create a Specialized Chatbot
Write Code

Create a Chatbot that acts as a cooking assistant — it should suggest recipes and cooking tips. Then:

1. Ask it to suggest a quick dinner recipe

2. Ask a follow-up: "What can I substitute for the main protein?"

Print both responses and then print "DONE" on the last line.

The Chatbot class is available from the previous code block.

Loading editor...

Streaming — Real-Time Word-by-Word Output

You've probably noticed that ChatGPT shows text appearing word by word rather than dumping the entire response at once. That's streaming. I add it to every chatbot I build because users hate staring at a blank screen — without streaming, a 200-word response means 2-3 seconds of nothing. With streaming, the first word appears in under 200 milliseconds.

The API change is small: add stream=True and iterate over chunks instead of reading the response at once. Here are both approaches on the same question so you can compare:

Non-streaming response
Loading editor...
Streaming response
Loading editor...

Both calls produce the same content. The difference is timing — streaming starts outputting immediately. Each chunk contains a small piece of text (usually a few tokens). You accumulate them into full_response so you can save the complete reply to your conversation history.

Managing Long Conversations

Here's a problem you'll hit sooner than you expect: every model has a context window — the maximum number of tokens it can process in one call. GPT-4o-mini has a 128K-token context, which sounds enormous, but a 50-turn conversation with detailed code examples can approach that limit. And you're paying for every token sent.

The simplest strategy is a sliding window — keep only the last N exchanges and always prepend the system message. Everything older gets dropped:

SlidingWindowChatbot class
Loading editor...

Let's test it with max_turns=2 so the effect is obvious. The bot only sees the last 2 exchanges — anything older gets trimmed:

Testing the sliding window
Loading editor...

Priya's name was in turn 1 and got trimmed before turn 4, so the bot can no longer answer. In my projects, max_turns=20 covers 95% of conversations — most users don't go beyond 15-20 exchanges.

Common Mistakes and How to Fix Them

I've seen every one of these in production code — and made most of them myself. They're subtle because the code runs without errors but the chatbot behaves strangely.

Mistake 1: Forgetting to save the assistant's reply to history

❌ Missing history append — AI forgets its own responses
async def chat_broken(msg):
    history.append({"role": "user", "content": msg})
    r = await client.chat.completions.create(
        model="gpt-4o-mini", messages=history
    )
    return r.choices[0].message.content
    # BUG: never appends the assistant reply!
✅ Always append both user AND assistant messages
async def chat_fixed(msg):
    history.append({"role": "user", "content": msg})
    r = await client.chat.completions.create(
        model="gpt-4o-mini", messages=history
    )
    reply = r.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

Without the assistant's reply in history, the AI sees a sequence of user messages with no responses. It loses context about what it said and may repeat or contradict itself.

Mistake 2: Sharing history between different conversations

❌ Global history — all conversations bleed together
shared_history = []

bot_a = {"history": shared_history}
bot_b = {"history": shared_history}
# bot_a and bot_b write to the SAME list!
✅ Each conversation gets its own history
bot_a = Chatbot(system_prompt="You are a tutor.")
bot_b = Chatbot(system_prompt="You are a reviewer.")
# Each instance has its own self.history

When two chatbot instances share the same history list, one conversation leaks into the other. The class-based approach we built avoids this — each Chatbot instance has its own self.history.

Mistake 3: Putting the system message in the wrong position

❌ System message buried in the middle
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "system", "content": "Be brief."},  # Wrong position
    {"role": "user", "content": "What is Python?"},
]
✅ System message always first
messages = [
    {"role": "system", "content": "Be brief."},  # Always first
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is Python?"},
]

The system message should always be the first entry. Placing it in the middle reduces its influence — the model may partially ignore it because user and assistant messages have already set a pattern.

Mistake 4: Not collecting chunks during streaming

❌ Prints chunks but loses the full response
async for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
# full_response is never built — history gets an empty string
✅ Collect chunks into full_response for history
full_response = ""
async for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
        full_response += delta
# Now save full_response to history

With streaming, there's no response.choices[0].message.content to grab at the end. You must build the full text yourself by concatenating chunks. Skip this step and the chatbot's history will have an empty reply, causing it to lose track of what it said.

Frequently Asked Questions

Can I use a different model (Claude, Gemini) with this same pattern?

The concepts — messages list, system prompt, conversation history — transfer directly to other providers. The API syntax differs: Anthropic uses anthropic.Anthropic() with a separate system parameter, and Google's Gemini uses google.generativeai with a chat.send_message() pattern. But the mental model is identical: you manage a growing list of messages and send them with each call.

How many messages can I include in the history before hitting the limit?

GPT-4o-mini supports 128K tokens of context. A typical conversational message is 50-200 tokens, so you could fit roughly 600-2,500 messages before hitting the ceiling. In practice, you'll run into cost concerns long before the limit — a 500-message history resends all 500 messages on every call. The sliding window approach from this tutorial is the practical solution.

Does the system message count toward my token usage?

Yes. The system message is sent with every API call, just like user and assistant messages. A 100-word system prompt adds roughly 130 tokens per call. Keep system prompts concise — long prompts with dozens of rules can quietly inflate costs across hundreds of conversations.

Summary and Next Steps

You now know the core pattern behind every chatbot from ChatGPT to customer support bots:

  • Memory is a list. Append each user message and AI response to a growing messages list. The model is stateless — your code provides the memory.
  • System prompts define personality. The first message in the list controls the AI's behavior across all turns.
  • Streaming improves UX. Add stream=True and iterate over chunks for word-by-word output.
  • Sliding window manages cost. Keep only the last N turns to avoid hitting token limits or running up bills.
What to explore next:

  • Structured Data Extraction — Get the AI to return JSON instead of free-form text, which is how you go from "AI that talks" to "AI that powers application logic"
  • Function Calling — Let the AI trigger Python functions (check weather, query a database) during a conversation
  • RAG (Retrieval-Augmented Generation) — Give the chatbot access to your own documents and data

Practice exercise: Build a chatbot that acts as a travel planner. Give it a system prompt that makes it ask clarifying questions (budget, dates, interests) before suggesting destinations. Have a 5-turn conversation where you gradually narrow down your trip. Use the Chatbot class from this tutorial.

    Solution
    planner = Chatbot(
        system_prompt=(
            "You are a travel planner. Before suggesting destinations, "
            "ask about: 1) budget range, 2) travel dates, "
            "3) interests (adventure, culture, relaxation). "
            "Only suggest destinations after gathering these details. "
            "Keep each response under 150 words."
        )
    )
    
    # Turn 1: Start the conversation
    print(await planner.say("I want to plan a vacation."))
    # Turn 2: Answer budget question
    print(await planner.say("My budget is around $2000 for a week."))
    # Turn 3: Answer dates question
    print(await planner.say("I'm thinking late September."))
    # Turn 4: Answer interests
    print(await planner.say("I love hiking and local food scenes."))
    # Turn 5: Ask for more detail
    print(await planner.say("Tell me more about your top suggestion."))

    The system prompt forces the AI to ask clarifying questions before jumping to suggestions. Each turn adds context, and by turn 4-5, the AI has enough information to make a targeted recommendation.

    Complete Code

    Full script (copy-paste and run)
    # Complete code from: Python AI Chatbot — Build a Conversational Assistant with Memory
    # Requires: pip install openai
    # Python 3.9+
    
    # In the browser-based editor (Pyodide), install via micropip.
    # In a local environment, run `pip install openai` and skip these two lines.
    import micropip
    await micropip.install("openai")
    
    import openai
    import os
    
    os.environ["OPENAI_API_KEY"] = "sk-your-key-here"
    client = openai.AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    
    # --- Section 1: Memory via messages list ---
    history = []
    
    async def chat(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=history,
        )
        assistant_message = response.choices[0].message.content
        history.append({"role": "assistant", "content": assistant_message})
        return assistant_message
    
    print("AI:", await chat("My name is Priya. I'm building a web scraper."))
    print("AI:", await chat("What library should I use?"))
    print("AI:", await chat("Does it handle JavaScript-rendered pages?"))
    
    # --- Section 2: System prompts and personality ---
    class Chatbot:
        def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
            self.model = model
            self.history = [{"role": "system", "content": system_prompt}]
    
        async def say(self, user_message: str) -> str:
            self.history.append({"role": "user", "content": user_message})
            response = await client.chat.completions.create(
                model=self.model,
                messages=self.history,
            )
            reply = response.choices[0].message.content
            self.history.append({"role": "assistant", "content": reply})
            return reply
    
    tutor = Chatbot(
        system_prompt=(
            "You are a patient Python tutor for beginners. "
            "Use simple analogies. Show short code examples. "
            "Keep responses under 150 words."
        )
    )
    print(await tutor.say("What's a dictionary in Python?"))
    print(await tutor.say("Can I use numbers as keys?"))
    
    # --- Section 3: Streaming ---
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Python tutor. Keep answers under 100 words."},
            {"role": "user", "content": "What is a generator in Python?"},
        ],
        stream=True,
    )
    
    full_response = ""
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            full_response += delta
    print()
    
    # --- Section 4: Sliding window ---
    class SlidingWindowChatbot:
        def __init__(self, system_prompt: str, max_turns: int = 10, model: str = "gpt-4o-mini"):
            self.model = model
            self.system_message = {"role": "system", "content": system_prompt}
            self.history = []
            self.max_turns = max_turns
    
        async def say(self, user_message: str) -> str:
            self.history.append({"role": "user", "content": user_message})
            if len(self.history) > self.max_turns * 2:
                self.history = self.history[-(self.max_turns * 2):]
            messages = [self.system_message] + self.history
            response = await client.chat.completions.create(
                model=self.model,
                messages=messages,
            )
            reply = response.choices[0].message.content
            self.history.append({"role": "assistant", "content": reply})
            return reply
    
    bot = SlidingWindowChatbot(system_prompt="You are a helpful assistant.", max_turns=2)
    print(await bot.say("My name is Priya."))
    print(await bot.say("I work at Google."))
    print(await bot.say("I'm learning ML."))
    print(await bot.say("What's my name?"))  # Won't remember!
    
    print("Script completed successfully.")

    References

  • OpenAI Chat Completions API Reference — full API docs for chat.completions.create()
  • OpenAI Streaming Guide — how streaming works, chunk format, delta objects
  • OpenAI Prompt Engineering Guide — best practices for system prompts and message design
  • OpenAI Models Overview — context windows, pricing, and model capabilities
  • OpenAI Tokenizer Tool — interactive tool to count tokens in your messages
Freshness Markers

  • Python version tested: 3.12
  • Library versions tested: openai 1.x
  • APIs that may change: streaming chunk format, model context window sizes
  • Deprecation risks: gpt-4o-mini may be superseded by newer models; the chat completions API pattern is stable
  • Suggested review date: 3 months (GenAI topic — rapid evolution)