
Anthropic Claude API: Messages, Streaming, Tool Use, and Extended Thinking in Python

Intermediate · 90 min · 3 exercises · 55 XP

Your OpenAI integration works, but a client just asked for Claude support. Their legal team requires 200K-token context windows for contract analysis. You open the Anthropic docs and notice the API looks similar to OpenAI — but every parameter name is slightly different. This tutorial maps what you already know to Claude's SDK, with every code block runnable right here.

What Is the Claude API and How Does It Compare to OpenAI?

Need to summarize a 150-page legal document in one API call? OpenAI's GPT-4o supports 128K tokens. Claude offers 200K tokens natively — entire codebases or book-length documents without chunking.

The Claude API is Anthropic's REST service that exposes Claude models to developers. Like OpenAI, it accepts messages over HTTPS and returns completions. The Python SDK wraps these calls into clean async methods.

Three things set Claude apart. First, the 200K context window handles documents that would need chunking with other APIs. Second, extended thinking gives Claude a private scratchpad for multi-step reasoning. Third, the system prompt is a first-class parameter — not buried inside the messages array.

The API surface is smaller than OpenAI's. There is one primary endpoint: messages.create(). No separate endpoints for embeddings, images, or audio. Claude focuses on text generation and tool use.

Setup and Your First Message

I'm going to show you a working Claude API call before explaining any of it. We use AsyncAnthropic because it works natively in browser environments like this one. Hit Run to see it work:

Install the SDK and make your first Claude API call
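Since the interactive editor is not available in this copy, here is a minimal sketch of that first call. It assumes the anthropic SDK (0.45+) is installed via pip install anthropic and that ANTHROPIC_API_KEY is set in your environment; the model name matches the one used throughout this tutorial.

```python
import asyncio

from anthropic import AsyncAnthropic


async def main() -> None:
    # Reads ANTHROPIC_API_KEY from the environment automatically
    client = AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required -- Claude will not pick a default for you
        messages=[
            {"role": "user", "content": "Explain list comprehensions in one sentence."}
        ],
    )

    # Content blocks, not OpenAI's choices[0].message.content
    print(message.content[0].text)


asyncio.run(main())
```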

Two things to notice immediately. The response text lives at message.content[0].text, not message.choices[0].message.content. And max_tokens is required — Claude will not guess a default for you.

The content field is a list of content blocks, not a plain string. Each block has a type (usually "text") and corresponding data. This design matters later when tool use returns mixed block types.

Inspecting the response object
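The editor is missing here as well, so this sketch inspects a simulated response instead of making a network call. FakeMessage is a stand-in dataclass mirroring the fields the SDK's real Message object exposes (content, stop_reason, model, usage); the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    type: str
    text: str


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


@dataclass
class FakeMessage:  # stands in for the SDK's Message type
    content: list
    stop_reason: str
    model: str
    usage: Usage


message = FakeMessage(
    content=[TextBlock(type="text", text="A decorator wraps a function.")],
    stop_reason="end_turn",
    model="claude-sonnet-4-20250514",
    usage=Usage(input_tokens=12, output_tokens=9),
)

print(message.content[0].text)  # the reply text lives in the first block
print(message.stop_reason)      # "end_turn" = the model finished naturally
print(message.usage.input_tokens, message.usage.output_tokens)
```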

Claude uses stop_reason instead of OpenAI's finish_reason. The value "end_turn" means the model finished naturally. You will also see "max_tokens" if the output was truncated, or "tool_use" when the model wants to call a tool.

Messages and System Prompts — The Biggest Difference from OpenAI

If you're coming from OpenAI, here is the biggest structural difference. In OpenAI, the system prompt is a message with role: "system" inside the messages array. In Claude, the system prompt is a separate top-level parameter.

OpenAI: System prompt in messages array
# OpenAI approach
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a Python tutor."},
        {"role": "user", "content": "What is a decorator?"}
    ]
)
Claude: System prompt as separate parameter
# Claude approach
message = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a Python tutor.",
    messages=[
        {"role": "user", "content": "What is a decorator?"}
    ]
)

The messages array in Claude only accepts "user" and "assistant" roles. Putting a "system" role inside messages raises an error. This separation makes system prompts explicit rather than a convention.

Let's see system prompts in action. I'll use one to make Claude behave as a concise Python tutor:

Using a system prompt
Loading editor...

Now let's build a multi-turn conversation. Messages must alternate between user and assistant roles. Two consecutive same-role messages raise a validation error.

Multi-turn conversation
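As a sketch of that rule, the snippet below builds a three-turn history and checks the alternation constraint locally. roles_alternate is a hypothetical helper that mirrors the API's validation; it is not part of the SDK.

```python
# A valid multi-turn history: roles strictly alternate, starting with "user"
history = [
    {"role": "user", "content": "What is a decorator?"},
    {"role": "assistant", "content": "A function that wraps another function."},
    {"role": "user", "content": "Show me a one-line example."},
]


def roles_alternate(messages):
    """Mimic the API's check: first role is user, no two consecutive roles match."""
    roles = [m["role"] for m in messages]
    return roles[0] == "user" and all(a != b for a, b in zip(roles, roles[1:]))


assert roles_alternate(history)
print("history is valid")
# To continue the conversation, pass `history` as messages= in
# client.messages.create(), then append the new assistant reply.
```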
Exercise: Build a Specialized Claude Assistant
Write Code

Write a function called ask_claude that takes a question string and returns Claude's response text. The function should:

1. Use system parameter (not in messages) set to: "You are a Python expert. Answer in under 30 words."

2. Call client.messages.create() with model "claude-sonnet-4-20250514" and max_tokens=512

3. Return message.content[0].text

Then call ask_claude("What is a generator?") and print the result, followed by "DONE" on a new line.

Note: The client variable is already set up from the earlier code block.

Loading editor...

Temperature and Model Selection

Claude supports the same temperature parameter as OpenAI. It controls randomness: 0 is nearly deterministic, 1.0 is the most varied. I start with temperature=0 for anything where correctness matters.

| Temperature | Behavior | Best for |
|---|---|---|
| 0.0 | Deterministic — nearly identical output each time | Code generation, data extraction |
| 0.5 | Balanced — some variation, mostly predictable | Tutoring, summaries |
| 1.0 | Creative — varied responses each time | Brainstorming, creative writing |
Temperature 0 vs temperature 1
Loading editor...

At temperature 0, all three answers are nearly identical. At temperature 1, each takes a different angle. Claude Sonnet 4 is the best balance of speed, quality, and cost for most tasks.

Building Real Tools — Tutor, Debugger, and Translator

Now that you understand messages, system prompts, and temperature, let's wrap them into tools you'd actually use. All three follow the exact same pattern — the only thing that changes is the system prompt.

Tool 1: A Python tutor
Loading editor...

Tool 2: Code debugger. This is the tool I reach for most often. Paste broken code, get it fixed with an explanation. Temperature 0 because you want the correct fix, not a creative one:

Tool 2: A code debugger
Loading editor...

That's a genuinely tricky bug — modifying a list while iterating by index causes skipped elements. Claude identifies the root cause and suggests the idiomatic fix.

Tool 3: A code translator
Loading editor...

All three tools use the exact same API pattern. The only difference is the system prompt. That's the central insight: the system prompt is your primary programming interface for Claude's behavior.
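That shared pattern can be sketched as a small factory. This is an illustrative sketch: the system prompts below are paraphrases, not the exact ones from the editors above, and it assumes ANTHROPIC_API_KEY is set.

```python
from anthropic import AsyncAnthropic


def make_tool(system_prompt: str, temperature: float = 0.0):
    """Return an async function that asks Claude with a fixed system prompt."""

    async def run(user_text: str) -> str:
        client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
        message = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            temperature=temperature,
            system=system_prompt,
            messages=[{"role": "user", "content": user_text}],
        )
        return message.content[0].text

    return run


tutor = make_tool("You are a concise Python tutor.", temperature=0.5)
debugger = make_tool("Find the bug, fix it, and explain the root cause.")
translator = make_tool("Translate the given Python code to idiomatic JavaScript.")

# Usage (inside an async context):
#     fixed = await debugger("for i in range(len(xs)): xs.pop(i)")
```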


Streaming Responses

I find streaming essential for any user-facing application. Nobody wants to stare at a blank screen for 10 seconds while the model generates a response. Streaming sends tokens as they are produced.

Claude's streaming uses an async context manager called client.messages.stream(). This gives you typed event handling and automatic resource cleanup — different from OpenAI's stream=True approach.

Basic streaming
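A runnable sketch of that pattern, under the same assumptions as earlier (SDK installed, ANTHROPIC_API_KEY set):

```python
import asyncio

from anthropic import AsyncAnthropic


async def main() -> None:
    client = AsyncAnthropic()

    # The async context manager opens the stream and guarantees cleanup
    async with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about Python."}],
    ) as stream:
        # Each chunk is printed the moment it arrives
        async for text in stream.text_stream:
            print(text, end="", flush=True)
    print()


asyncio.run(main())
```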

The stream.text_stream async iterator yields each text chunk as it arrives. The context manager handles connection cleanup automatically when the block exits.

Streaming with token usage
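To read token usage after streaming, the SDK accumulates the full message for you: get_final_message() returns it once the stream is exhausted. Same setup assumptions as above:

```python
import asyncio

from anthropic import AsyncAnthropic


async def main() -> None:
    client = AsyncAnthropic()

    async with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize PEP 8 in two sentences."}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
        # The accumulated Message, including usage, is available at the end
        final = await stream.get_final_message()

    print(f"\ninput={final.usage.input_tokens} output={final.usage.output_tokens}")


asyncio.run(main())
```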
Streaming with a system prompt
Loading editor...

Tool Use (Function Calling)

You ask Claude for today's weather and it politely says it has no real-time data. Tool use solves this by letting Claude call your Python functions to fetch live information, query databases, or perform calculations.

The flow has three steps. First, define tools with JSON Schema. Second, Claude returns a tool_use content block. Third, you execute the function and send the result back as a tool_result message.

Defining tools and triggering tool use
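Here is a sketch of a tool definition in Claude's format. The get_weather tool itself is a hypothetical example, but the shape (top-level name, description, and input_schema) is what the API expects.

```python
# Hypothetical weather tool in Claude's tool format
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "input_schema": {  # OpenAI calls this key "parameters"
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}

# Passed directly: client.messages.create(..., tools=[get_weather_tool])
# -- no {"type": "function", "function": {...}} wrapper as in OpenAI.
assert "input_schema" in get_weather_tool
print("tool definition OK")
```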

Notice two key differences from OpenAI. The schema key is input_schema (not parameters). And tool definitions go directly in the tools list — no function wrapper object around each tool.

Completing the tool-use loop
Loading editor...
Reusable tool-use loop
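A sketch of such a loop, assuming a hypothetical get_weather tool definition and a matching local Python function. The loop keeps calling the API until stop_reason is no longer "tool_use":

```python
import json

from anthropic import AsyncAnthropic


def get_weather(city: str) -> dict:
    """Hypothetical local implementation backing the get_weather tool."""
    return {"city": city, "temp_c": 21, "conditions": "clear"}


LOCAL_FUNCTIONS = {"get_weather": get_weather}


async def run_with_tools(client: AsyncAnthropic, messages: list, tools: list):
    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response  # Claude answered without needing another tool call

        # Echo Claude's turn back, then answer every tool_use block
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = LOCAL_FUNCTIONS[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output),
                })
        messages.append({"role": "user", "content": results})
```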
Exercise: Parse the Tool-Use Response
Write Code

Write a function called parse_tool_response that takes a Claude API response object and returns a dictionary with these keys:

1. "has_tool_use" — boolean, True if any content block has type "tool_use"

2. "tool_names" — list of tool names found in tool_use blocks

3. "text_blocks" — list of text strings from text blocks

Test it with the simulated response below. Print the result dict, then print "DONE".

This exercise uses pure Python logic — no API call needed.

Loading editor...

Extended Thinking

This is my favorite Claude feature, and it has no equivalent in OpenAI's API. Extended thinking gives Claude a private scratchpad to reason through complex problems before writing its final answer.

You enable it with the thinking parameter. Set a budget_tokens value that determines how many tokens Claude can use for internal reasoning. The thinking appears as separate blocks in the response.

Extended thinking for math
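A sketch of enabling extended thinking, under the same client assumptions as earlier. The budget here is illustrative; note it stays below max_tokens.

```python
import asyncio

from anthropic import AsyncAnthropic


async def main() -> None:
    client = AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},  # must stay below max_tokens
        messages=[{
            "role": "user",
            "content": "A bat and a ball cost $1.10 total; the bat costs $1 "
                       "more than the ball. What does the ball cost?",
        }],
    )

    # Thinking arrives as its own block type, before the final text block
    for block in message.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking[:200], "...")
        elif block.type == "text":
            print("[answer]", block.text)


asyncio.run(main())
```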

The budget_tokens must be less than max_tokens. Think of max_tokens as the ceiling for the entire response (thinking + answer). A budget of 5,000-10,000 tokens handles most reasoning tasks.

Extended thinking for code review
Loading editor...

Real-World Pattern: A Claude-Powered Research Assistant

Let's combine everything — system prompts, tool use, and extended thinking — into a practical research assistant. This pattern is the backbone of most production Claude applications.

Research assistant — tool definitions
Loading editor...
Research assistant — putting it all together
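A compact sketch of how the pieces could fit together. SEARCH_TOOL and search_notes are hypothetical stand-ins for whatever data source your assistant queries; the loop preserves thinking blocks when echoing the assistant turn back, which the API requires.

```python
import json

from anthropic import AsyncAnthropic

# Hypothetical tool definition with a stubbed local backing function
SEARCH_TOOL = {
    "name": "search_notes",
    "description": "Search the local research notes for a query.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def search_notes(query: str) -> list:
    return [f"(stub result for {query!r})"]


async def research(question: str) -> str:
    client = AsyncAnthropic()
    messages = [{"role": "user", "content": question}]
    while True:
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=16000,
            system="You are a careful research assistant. Cite your sources.",
            thinking={"type": "enabled", "budget_tokens": 4000},
            tools=[SEARCH_TOOL],
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            # Return the final text block (thinking blocks come first)
            return next(b.text for b in response.content if b.type == "text")

        # Preserve thinking + tool_use blocks when echoing the turn back
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": b.id,
                    "content": json.dumps(search_notes(**b.input)),
                }
                for b in response.content if b.type == "tool_use"
            ],
        })

# Usage (inside an async context):
#     answer = await research("Summarize my notes on Python generators.")
```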

This pattern combines all three features: system prompts shape the assistant's role, tool use fetches data, and extended thinking helps Claude synthesize findings before writing. The loop handles multi-step research where Claude calls tools multiple times.

Exercise: Build a Tool Schema Validator
Write Code

Write a function called validate_tool_schema that checks whether a tool definition dictionary is valid for Claude's API. It should return a dictionary with:

1. "valid" — boolean, True only if all checks pass

2. "errors" — list of error strings

Check for these requirements:

  • Must have "name" key (string)
  • Must have "description" key (string)
  • Must have "input_schema" key (dict) — NOT "parameters"
  • input_schema must have "type": "object"
  • input_schema must have "properties" key (dict)
Test with both valid and invalid tools. Print the results and "DONE".

This exercise is pure Python — no API call needed.

Loading editor...

Claude vs OpenAI — Key API Differences at a Glance

I keep a cheat sheet on my desk for switching between APIs. Here is the reference table I wish I had when I started. This is a practical mapping, not a product comparison.

| Feature | OpenAI | Claude |
|---|---|---|
| Client | openai.AsyncOpenAI() | AsyncAnthropic() |
| System prompt | In messages as role: "system" | Separate system parameter |
| max_tokens | Optional | Required |
| Response text | response.choices[0].message.content | message.content[0].text |
| Stop reason | finish_reason | stop_reason |
| Streaming | stream=True, iterate chunks | async with client.messages.stream() |
| Tool schema key | parameters | input_schema |
| Tool result role | role: "tool" | role: "user" with tool_result block |
| Extended thinking | Not available | thinking parameter with budget_tokens |
OpenAI: Tool result as "tool" role
# OpenAI tool result format
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(result)
})
Claude: Tool result as "user" role with content array
# Claude tool result format
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": json.dumps(result)
    }]
})

Common Mistakes and How to Fix Them

Every developer switching from OpenAI to Claude hits the same four walls. I have seen each of these in production codebases during migration projects.

1. Forgetting max_tokens

Wrong — raises BadRequestError
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
Correct — always set max_tokens
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

2. Putting System Prompt in Messages

Wrong — system role not allowed in messages
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Hello"}
    ]
)
Correct — system is a separate parameter
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Be concise.",
    messages=[{"role": "user", "content": "Hello"}]
)

3. Accessing Response Text Like OpenAI

Wrong — OpenAI pattern
text = response.choices[0].message.content  # AttributeError
Correct — Claude uses content blocks
text = response.content[0].text

4. Using "tool" Role for Tool Results

Wrong — Claude has no "tool" role
messages.append({
    "role": "tool",
    "content": "result"
})
Correct — use "user" with tool_result content
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_block.id,
        "content": "result"
    }]
})

Summary and Next Steps

You now have working knowledge of the Claude API's four core features. Messages give you text generation. Streaming delivers real-time output. Tool use connects Claude to external functions. Extended thinking unlocks step-by-step reasoning.

The core pattern
Loading editor...

The key mental model for switching from OpenAI: system prompts move out of messages, max_tokens becomes required, responses use content blocks instead of choices, and tool results use the user role.

For your next steps, explore multi-modal inputs (sending images to Claude), prompt caching for cost reduction, and batch processing for high-volume workloads. The Anthropic docs at docs.anthropic.com cover each in depth.

Frequently Asked Questions

What models are available and which should I use?

Claude Sonnet 4 (claude-sonnet-4-20250514) is the best balance of speed, quality, and cost. Claude Haiku 3.5 is faster and cheaper for simple tasks. Claude Opus 4 is the most capable for complex reasoning. Start with Sonnet and upgrade only if needed.

How do I handle rate limits and errors?

The SDK raises anthropic.RateLimitError for 429 responses. For production, use client = AsyncAnthropic(max_retries=3) for built-in retry with exponential backoff. Use anthropic.APIStatusError as a base class to catch all API errors.
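As a sketch of that advice, under the same setup assumptions as the rest of the tutorial:

```python
from anthropic import APIStatusError, AsyncAnthropic, RateLimitError


async def safe_ask(prompt: str) -> str:
    # max_retries enables the SDK's built-in retry with exponential backoff
    client = AsyncAnthropic(max_retries=3)
    try:
        message = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    except RateLimitError:
        return "Rate limited even after retries -- back off and try again later."
    except APIStatusError as err:
        return f"API error {err.status_code}"
```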

Can I send images to Claude?

Yes. Claude supports vision through base64-encoded images or URLs. Pass an image as a content block with type: "image" alongside text blocks. Claude can analyze charts, screenshots, documents, and photos.

How does Claude pricing compare to OpenAI?

Claude Sonnet 4 costs $3/M input and $15/M output tokens. GPT-4o costs about $2.50/M input and $10/M output. For learning, both cost pennies per session. The 200K context window and extended thinking may justify the slightly higher per-token cost for complex tasks.

References

  • Anthropic API Reference — Messages — Official API documentation for the messages endpoint.
  • Anthropic Python SDK — GitHub — Source code, examples, and changelog.
  • Claude Model Overview — Model capabilities, pricing, and context windows.
  • Tool Use Guide — Complete guide to function calling with Claude.
  • Extended Thinking Guide — How to enable and use extended thinking.
  • Anthropic Cookbook — Production patterns and advanced examples.

Versions used in this tutorial: Python 3.12, anthropic SDK 0.45+, model claude-sonnet-4-20250514. Tested March 2026.
