Anthropic Claude API: Messages, Streaming, Tool Use, and Extended Thinking in Python
Your OpenAI integration works, but a client just asked for Claude support. Their legal team requires 200K-token context windows for contract analysis. You open the Anthropic docs and notice the API looks similar to OpenAI — but every parameter name is slightly different. This tutorial maps what you already know to Claude's SDK, with every code block runnable right here.
What Is the Claude API and How Does It Compare to OpenAI?
Need to summarize a 150-page legal document in one API call? OpenAI's GPT-4o supports 128K tokens. Claude offers 200K tokens natively — entire codebases or book-length documents without chunking.
The Claude API is Anthropic's REST service that exposes Claude models to developers. Like OpenAI, it accepts messages over HTTPS and returns completions. The Python SDK wraps these calls into clean async methods.
Three things set Claude apart. First, the 200K context window handles documents that would need chunking with other APIs. Second, extended thinking gives Claude a private scratchpad for multi-step reasoning. Third, the system prompt is a first-class parameter — not buried inside the messages array.
The API surface is smaller than OpenAI's. There is one primary endpoint: messages.create(). No separate endpoints for embeddings, images, or audio. Claude focuses on text generation and tool use.
Setup and Your First Message
I'm going to show you a working Claude API call before explaining any of it. We use AsyncAnthropic because it works natively in browser environments like this one. Hit Run to see it work:
Two things to notice immediately. The response text lives at message.content[0].text, not message.choices[0].message.content. And max_tokens is required — Claude will not guess a default for you.
The content field is a list of content blocks, not a plain string. Each block has a type (usually "text") and corresponding data. This design matters later when tool use returns mixed block types.
Claude uses stop_reason instead of OpenAI's finish_reason. The value "end_turn" means the model finished naturally. You will also see "max_tokens" if the output was truncated, or "tool_use" when the model wants to call a tool.
Messages and System Prompts — The Biggest Difference from OpenAI
If you're coming from OpenAI, here is the biggest structural difference. In OpenAI, the system prompt is a message with role: "system" inside the messages array. In Claude, the system prompt is a separate top-level parameter.
```python
# OpenAI approach
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a Python tutor."},
        {"role": "user", "content": "What is a decorator?"}
    ]
)
```

```python
# Claude approach
message = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a Python tutor.",
    messages=[
        {"role": "user", "content": "What is a decorator?"}
    ]
)
```

The messages array in Claude only accepts "user" and "assistant" roles. Putting a "system" role inside messages raises an error. This separation makes system prompts explicit rather than a convention.
Let's see system prompts in action. I'll use one to make Claude behave as a concise Python tutor:
Now let's build a multi-turn conversation. Messages must alternate between user and assistant roles. Two consecutive same-role messages raise a validation error.
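The alternation rule can be sketched in plain Python; the check below mirrors the validation the API performs server-side, and the assistant reply is illustrative:

```python
# Building a multi-turn conversation: roles must strictly alternate.
conversation = [
    {"role": "user", "content": "What is a list comprehension?"},
    {"role": "assistant", "content": "A one-line expression that builds a list from an iterable."},
    {"role": "user", "content": "Show one that squares only even numbers."},
]

def roles_alternate(messages: list[dict]) -> bool:
    """True if no two consecutive messages share a role."""
    roles = [m["role"] for m in messages]
    return all(a != b for a, b in zip(roles, roles[1:]))

print(roles_alternate(conversation))  # True: safe to send as `messages`
```

Two consecutive same-role entries would make this return False, and the API would reject the request with a validation error.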
Write a function called ask_claude that takes a question string and returns Claude's response text. The function should:
1. Use system parameter (not in messages) set to: "You are a Python expert. Answer in under 30 words."
2. Call client.messages.create() with model "claude-sonnet-4-20250514" and max_tokens=512
3. Return message.content[0].text
Then call ask_claude("What is a generator?") and print the result, followed by "DONE" on a new line.
Note: The client variable is already set up from the earlier code block.
Temperature and Model Selection
Claude supports the same temperature parameter as OpenAI. It controls randomness: 0 is deterministic, 1.0 is creative. I start with temperature=0 for anything where correctness matters.
| Temperature | Behavior | Best for |
|---|---|---|
| 0.0 | Deterministic — nearly identical output each time | Code generation, data extraction |
| 0.5 | Balanced — some variation, mostly predictable | Tutoring, summaries |
| 1.0 | Creative — varied responses each time | Brainstorming, creative writing |
Run the same prompt three times at each setting to see the difference: at temperature 0, all three answers are nearly identical; at temperature 1, each takes a different angle. Claude Sonnet 4 is the best balance of speed, quality, and cost for most tasks.
Building Real Tools — Tutor, Debugger, and Translator
Now that you understand messages, system prompts, and temperature, let's wrap them into tools you'd actually use. All three follow the exact same pattern — the only thing that changes is the system prompt.
Tool 2: Code debugger. This is the tool I reach for most often. Paste broken code, get it fixed with an explanation. Temperature 0 because you want the correct fix, not a creative one:
That's a genuinely tricky bug — modifying a list while iterating by index causes skipped elements. Claude identifies the root cause and suggests the idiomatic fix.
All three tools use the exact same API pattern. The only difference is the system prompt. That's the central insight: the system prompt is your primary programming interface for Claude's behavior.
Streaming Responses
I find streaming essential for any user-facing application. Nobody wants to stare at a blank screen for 10 seconds while the model generates a response. Streaming sends tokens as they are produced.
Claude's streaming uses an async context manager called client.messages.stream(). This gives you typed event handling and automatic resource cleanup — different from OpenAI's stream=True approach.
The stream.text_stream async iterator yields each text chunk as it arrives. The context manager handles connection cleanup automatically when the block exits.
Tool Use (Function Calling)
You ask Claude for today's weather and it politely says it has no real-time data. Tool use solves this by letting Claude call your Python functions to fetch live information, query databases, or perform calculations.
The flow has three steps. First, define tools with JSON Schema. Second, Claude returns a tool_use content block. Third, you execute the function and send the result back as a tool_result message.
Notice two key differences from OpenAI. The schema key is input_schema (not parameters). And tool definitions go directly in the tools list — no function wrapper object around each tool.
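A sketch of a tool definition with those differences visible; the tool name and fields are illustrative:

```python
# Tool definitions are plain dicts: note `input_schema` (not `parameters`)
# and no wrapper object around the tool.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}

# Passed directly: client.messages.create(..., tools=[get_weather_tool])
print(get_weather_tool["input_schema"]["type"])  # object
```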
Write a function called parse_tool_response that takes a Claude API response object and returns a dictionary with these keys:
1. "has_tool_use" — boolean, True if any content block has type "tool_use"
2. "tool_names" — list of tool names found in tool_use blocks
3. "text_blocks" — list of text strings from text blocks
Test it with the simulated response below. Print the result dict, then print "DONE".
This exercise uses pure Python logic — no API call needed.
Extended Thinking
This is my favorite Claude feature, and it has no equivalent in OpenAI's API. Extended thinking gives Claude a private scratchpad to reason through complex problems before writing its final answer.
You enable it with the thinking parameter. Set a budget_tokens value that determines how many tokens Claude can use for internal reasoning. The thinking appears as separate blocks in the response.
The budget_tokens must be less than max_tokens. Think of max_tokens as the ceiling for the entire response (thinking + answer). A budget of 5,000-10,000 tokens handles most reasoning tasks.
Real-World Pattern: A Claude-Powered Research Assistant
Let's combine everything — system prompts, tool use, and extended thinking — into a practical research assistant. This pattern is the backbone of most production Claude applications.
This pattern combines all three features: system prompts shape the assistant's role, tool use fetches data, and extended thinking helps Claude synthesize findings before writing. The loop handles multi-step research where Claude calls tools multiple times.
Write a function called validate_tool_schema that checks whether a tool definition dictionary is valid for Claude's API. It should return a dictionary with:
1. "valid" — boolean, True only if all checks pass
2. "errors" — list of error strings
Check for these requirements:
- "name" key (string)
- "description" key (string)
- "input_schema" key (dict) — NOT "parameters"
- input_schema must have "type": "object"
- input_schema must have "properties" key (dict)

Test with both valid and invalid tools. Print results and "DONE".
This exercise is pure Python — no API call needed.
Claude vs OpenAI — Key API Differences at a Glance
I keep a cheat sheet on my desk for switching between APIs. Here is the reference table I wish I had when I started. This is a practical mapping, not a product comparison.
| Feature | OpenAI | Claude |
|---|---|---|
| Client | openai.AsyncOpenAI() | AsyncAnthropic() |
| System prompt | In messages as role: "system" | Separate system parameter |
| max_tokens | Optional | Required |
| Response text | response.choices[0].message.content | message.content[0].text |
| Stop reason | finish_reason | stop_reason |
| Streaming | stream=True, iterate chunks | async with client.messages.stream() |
| Tool schema key | parameters | input_schema |
| Tool result role | role: "tool" | role: "user" with tool_result block |
| Extended thinking | Not available | thinking parameter with budget_tokens |
```python
# OpenAI tool result format
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(result)
})
```

```python
# Claude tool result format
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": json.dumps(result)
    }]
})
```

Common Mistakes and How to Fix Them
Every developer switching from OpenAI to Claude hits the same four walls. I have seen each of these in production codebases during migration projects.
1. Forgetting max_tokens
```python
# Wrong: no max_tokens, so the SDK raises an error before sending
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
# Fixed
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

2. Putting the System Prompt in Messages
```python
# Wrong: "system" is not a valid role inside messages
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Hello"}
    ]
)
```

```python
# Fixed
response = await client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Be concise.",
    messages=[{"role": "user", "content": "Hello"}]
)
```

3. Accessing Response Text Like OpenAI

```python
text = response.choices[0].message.content  # AttributeError
```

```python
text = response.content[0].text
```

4. Using the "tool" Role for Tool Results
```python
# Wrong: Claude has no "tool" role
messages.append({
    "role": "tool",
    "content": "result"
})
```

```python
# Fixed: tool results go back as a user message with a tool_result block
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_block.id,
        "content": "result"
    }]
})
```

Summary and Next Steps
You now have working knowledge of the Claude API's four core features. Messages give you text generation. Streaming delivers real-time output. Tool use connects Claude to external functions. Extended thinking unlocks step-by-step reasoning.
The key mental model for switching from OpenAI: system prompts move out of messages, max_tokens becomes required, responses use content blocks instead of choices, and tool results use the user role.
For your next steps, explore multi-modal inputs (sending images to Claude), prompt caching for cost reduction, and batch processing for high-volume workloads. The Anthropic docs at docs.anthropic.com cover each in depth.
Frequently Asked Questions
What models are available and which should I use?
Claude Sonnet 4 (claude-sonnet-4-20250514) is the best balance of speed, quality, and cost. Claude Haiku 3.5 is faster and cheaper for simple tasks. Claude Opus 4 is the most capable for complex reasoning. Start with Sonnet and upgrade only if needed.
How do I handle rate limits and errors?
The SDK raises anthropic.RateLimitError for 429 responses. For production, use client = AsyncAnthropic(max_retries=3) for built-in retry with exponential backoff. Use anthropic.APIStatusError as a base class to catch all API errors.
Can I send images to Claude?
Yes. Claude supports vision through base64-encoded images or URLs. Pass an image as a content block with type: "image" alongside text blocks. Claude can analyze charts, screenshots, documents, and photos.
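A sketch of building that mixed image-plus-text content list; the helper name is mine, and `image_bytes` stands in for a real file read:

```python
import base64

def image_content(image_bytes: bytes, question: str, media_type: str = "image/png") -> list[dict]:
    """Build a mixed image + text content list for a user message."""
    return [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": base64.b64encode(image_bytes).decode(),
            },
        },
        {"type": "text", "text": question},
    ]

# Usage sketch, with png_bytes = open("chart.png", "rb").read():
# messages=[{"role": "user", "content": image_content(png_bytes, "What does this chart show?")}]
```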
How does Claude pricing compare to OpenAI?
Claude Sonnet 4 costs $3/M input and $15/M output tokens. GPT-4o costs about $2.50/M input and $10/M output. For learning, both cost pennies per session. The 200K context window and extended thinking may justify the slightly higher per-token cost for complex tasks.
References
Versions used in this tutorial: Python 3.12, anthropic SDK 0.45+, model claude-sonnet-4-20250514. Tested March 2026.