
OpenAI Function Calling: Build an LLM That Executes Your Python Functions

Intermediate · 90 min · 3 exercises · 55 XP

Ask GPT "What's the weather in Tokyo right now?" and it'll confidently make something up. The model has no internet access, no database connection, and no way to call your backend. Function calling fixes this -- it lets the model request that your code run a specific function, then use the real result in its answer.

What Is Function Calling and Why Does It Matter?

Picture this: a customer asks your support bot "What's the status of order #4521?" The model knows how to answer politely, but it can't query your database. Without function calling, you'd need fragile prompt hacks or regex parsing.

Function calling introduces a structured handoff. You describe your available functions using JSON Schema. When a user's question requires real data, the model returns a tool_calls object instead of text. Your code runs the function and sends the result back.

The model never runs your code directly. It only requests a function call with structured arguments. You remain in full control of execution, validation, and error handling.


Your First Function Call

Enough theory. Let's build a complete working example with a get_weather function. I'll walk you through every step so you see exactly how the pieces fit together.

First, install the library and set up the client. This block also defines the Python function that does the real work, plus the JSON tool definition that tells the model what's available.

Step 1 -- Install, set up client, define the tool

Notice there are two separate things: the Python function and the JSON tool definition. The model sees only the tool definition -- it never inspects your actual code.

Now send a request and check if the model wants to call a function. When it does, finish_reason will be "tool_calls" instead of "stop".

Step 2 -- Send the request and check for tool calls
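A sketch of Step 2, wrapped in a function so nothing hits the network on import (the `create` call needs a valid API key; `client` and `tools` are assumed from Step 1):

```python
# client and tools are defined as in Step 1.

def request_weather(question: str):
    """Send the user question plus the tool definitions; return the conversation and message."""
    messages = [{"role": "user", "content": question}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    choice = response.choices[0]
    # "tool_calls" means the model wants a function run; "stop" means plain text.
    if choice.finish_reason == "tool_calls":
        for tc in choice.message.tool_calls:
            print(tc.function.name, tc.function.arguments)
    return messages, choice.message

# Example (needs a valid API key):
# messages, msg = request_weather("What's the weather in Tokyo right now?")
```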

The model returned a tool call request, not a text answer. Now extract the function name and arguments, run the function, and send the result back. The model will then write a natural-language answer using that data.

Step 3 -- Execute the function and send the result back
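A sketch of Step 3, assuming `client`, `tools`, and `get_weather` from Step 1 and the assistant message `msg` from Step 2. The function and variable names are illustrative:

```python
import json

def run_tool_calls(messages, msg):
    """Execute each requested tool call, append the results, and get the final answer."""
    messages.append(msg)  # the assistant message must precede its tool results
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)  # arguments arrive as a JSON string
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,  # must match the id the model generated
            "content": json.dumps(result),
        })
    # Second round trip: the model now writes a natural-language answer.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return final.choices[0].message.content
```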

That's the complete loop: describe tools, get a tool call request, run the function, send the result back, get a human-readable answer. Every function-calling pattern builds on these exact steps.

Exercise 1: Define a Tool Schema
Write Code

Write a tool definition (a Python dictionary) for a search_products function that searches an e-commerce catalog.

The function accepts:

  • query (string, required) -- the search term
  • category (string, optional) -- one of: "electronics", "clothing", "books", "home"
  • max_price (number, optional) -- maximum price filter
  • in_stock (boolean, optional) -- filter for in-stock items only
    Your dictionary must follow the OpenAI tool format with "type": "function" at the top level.

    Note: This exercise tests your understanding of JSON Schema -- the code won't make API calls.

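If you get stuck, here is one possible solution for comparison (the field descriptions are illustrative):

```python
search_products_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the e-commerce product catalog by keyword, "
                       "optionally filtered by category, price, and stock status.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search term"},
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "books", "home"],
                    "description": "Restrict results to one category",
                },
                "max_price": {"type": "number", "description": "Maximum price filter"},
                "in_stock": {"type": "boolean", "description": "Only in-stock items"},
            },
            "required": ["query"],
        },
    },
}
```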

    The Tool-Calling Loop

    What happens when the model needs to call two functions in sequence to answer one question? Say the user asks "Is it warmer in Tokyo or London?" -- the model needs both cities' weather before it can compare. You need a loop.

    The pattern is always the same: send messages, check for tool calls, execute them, append results, send again. Repeat until the model returns a text response.

    A reusable async tool-calling loop

    The while-style loop is the key insight. Some questions need one function call, others need three or four. The loop handles any number automatically. It terminates when the model responds with text instead of requesting more tool calls.

    Exercise 2: Complete the Tool-Calling Loop
    Write Code

    Complete the dispatch_tool_call function that takes a tool call object and a registry dictionary, then returns the function result as a JSON string.

    Steps:

    1. Extract the function name from tool_call.function.name

    2. Parse the arguments from tool_call.function.arguments using json.loads

    3. Look up the function in the registry dictionary

    4. Call the function with the parsed arguments using **

    5. Return the result as a JSON string using json.dumps

    If the function name is not in the registry, return json.dumps({"error": f"Unknown function: {name}"}).

    Note: This exercise uses mock objects to simulate the OpenAI response -- no API calls needed.

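One possible solution, checked against a `SimpleNamespace` mock like the exercise uses (the `add` function is a stand-in):

```python
import json
from types import SimpleNamespace

def dispatch_tool_call(tool_call, registry):
    """Look up and run the requested function; return its result as a JSON string."""
    name = tool_call.function.name
    if name not in registry:
        return json.dumps({"error": f"Unknown function: {name}"})
    args = json.loads(tool_call.function.arguments)  # arguments arrive as JSON text
    return json.dumps(registry[name](**args))

# Check it against a mock tool call:
mock = SimpleNamespace(function=SimpleNamespace(
    name="add", arguments='{"a": 2, "b": 3}'))
print(dispatch_tool_call(mock, {"add": lambda a, b: a + b}))  # 5
```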

    Defining Tools with JSON Schema

    I've found that 80% of function-calling bugs come from poorly written tool definitions. The model can only call your function correctly if the schema tells it exactly what to pass. Think of the schema as documentation for an AI reader.

    Required vs Optional Parameters

    Put only truly essential parameters in the "required" array. If a parameter has a sensible default, leave it optional. The model will skip optional parameters unless the user's message mentions them.

    Required vs optional parameters
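A sketch of the idea with a hypothetical search function: `query` is essential, while `limit` has a sensible server-side default, so only `query` goes in `"required"`:

```python
# "query" is required; "limit" has a default, so it stays optional.
parameters = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search keywords"},
        "limit": {
            "type": "integer",
            "description": "Maximum results to return (default 10)",
        },
    },
    "required": ["query"],  # only the truly essential parameter
}
```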

    Enum Parameters

    Whenever a parameter has a fixed set of valid values, use "enum". This prevents the model from inventing invalid values like "warm" for a temperature unit that only accepts "celsius" or "fahrenheit".
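For example, the unit parameter from the weather tool can be pinned down like this (a minimal property-level sketch):

```python
# The model may only pick from the enum -- "warm" is not an option.
unit_param = {
    "type": "string",
    "enum": ["celsius", "fahrenheit"],
    "description": "Temperature unit",
}
```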

    Nested Objects and Arrays

    JSON Schema supports complex structures. You can nest objects inside objects and define arrays with typed items.

    Nested objects and array parameters
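A sketch using a hypothetical order-placement tool: an object nested inside the parameters object, plus an array whose items are themselves typed objects:

```python
parameters = {
    "type": "object",
    "properties": {
        "shipping_address": {              # a nested object
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "postal_code": {"type": "string"},
            },
            "required": ["street", "city"],
        },
        "items": {                         # an array of typed objects
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["product_id"],
            },
        },
    },
    "required": ["shipping_address", "items"],
}
```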

    Parallel Function Calling

    Here's a problem you'll hit quickly: a user asks "Compare the weather in Tokyo, London, and New York." The model needs three weather lookups. Does it make three separate round trips?

    No. GPT-4o and GPT-4o-mini support parallel function calling. The model returns multiple tool calls in a single response. Your code executes all of them, sends all results back, and the model synthesizes one final answer.

    Parallel tool calls in action
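A sketch of the request side, wrapped in a function so it only runs when you call it with a valid API key (`client` and `tools` as defined earlier):

```python
# client and tools as defined earlier.

def compare_weather():
    messages = [{"role": "user",
                 "content": "Compare the weather in Tokyo, London, and New York."}]
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools
    )
    msg = response.choices[0].message
    # With parallel function calling, tool_calls can hold several entries at once.
    for tc in msg.tool_calls or []:
        print(tc.id, tc.function.name, tc.function.arguments)
    return messages, msg
```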

    The model requested all three weather lookups at once. Now execute them all and send back every result. Each tool result must include the correct tool_call_id -- match them or the API will reject your request.

    Handling parallel results correctly
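The execution side can be sketched as below, checked with mock objects so the id matching is visible (the `echo` function is a stand-in):

```python
import json
from types import SimpleNamespace

def append_tool_results(messages, msg, registry):
    """Append the assistant message, then one tool result per call, ids matched."""
    messages.append(msg)  # the assistant message must come first
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        result = registry[tc.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,   # must match this specific call's id
            "content": json.dumps(result),
        })
    return messages

# Check the id matching with two mock parallel calls:
msg = SimpleNamespace(tool_calls=[
    SimpleNamespace(id="call_a", function=SimpleNamespace(name="echo", arguments='{"x": 1}')),
    SimpleNamespace(id="call_b", function=SimpleNamespace(name="echo", arguments='{"x": 2}')),
])
out = append_tool_results([], msg, {"echo": lambda x: x})
```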
    Without Parallel Calls (4 API round trips)
    # Round trip 1: "What's the weather in Tokyo?"
    # -> Model calls get_weather("Tokyo") -> you return result
    # Round trip 2: "Now London?"
    # -> Model calls get_weather("London") -> you return result
    # Round trip 3: "Now New York?"
    # -> Model calls get_weather("New York") -> you return result
    # Round trip 4: "Now compare them"
    # -> Model writes final answer
    # Total: 4 API calls, ~3 seconds
    With Parallel Calls (2 API round trips)
    # Round trip 1: "Compare Tokyo, London, New York"
    # -> Model calls get_weather("Tokyo"),
    #    get_weather("London"),
    #    get_weather("New York") -- ALL AT ONCE
    # -> You return all 3 results
    # Round trip 2: Model writes final answer
    # Total: 2 API calls, ~1 second

    Error Handling in Tool Execution

    I once deployed an agent that crashed in production because a weather API returned a 503 error. My code raised an unhandled exception, the tool-calling loop died, and the user saw a blank screen. The fix took two lines.

    When your function fails, don't crash -- return a structured error. The model is surprisingly good at recovering. It can apologize, try a different approach, or ask the user for clarification.

    Wrapping tool execution with error handling
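A sketch of the wrapper: any exception becomes a structured error payload the model can read (the `boom` tool is a deliberately failing stand-in):

```python
import json
from types import SimpleNamespace

def safe_dispatch(tool_call, registry):
    """Never let a failing tool kill the loop -- return a structured error instead."""
    name = tool_call.function.name
    if name not in registry:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        args = json.loads(tool_call.function.arguments)
        return json.dumps(registry[name](**args))
    except Exception as exc:   # bad arguments, upstream 503s, anything
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

# A tool that raises -- the model receives an error payload, not a crash:
mock = SimpleNamespace(function=SimpleNamespace(name="boom", arguments="{}"))
print(safe_dispatch(mock, {"boom": lambda: 1 / 0}))
```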

    The model receives the error as a tool result and adapts. I've seen GPT-4o read an error like "Unknown city: Toky" and respond with "I couldn't find weather data for 'Toky'. Did you mean Tokyo?" That's the power of returning errors gracefully.


    Real-World Example: Weather + Calculator Agent

    Let's combine everything into a practical multi-tool agent. This agent has access to weather data, a calculator, and a temperature converter. It decides which tools to use based on the question.

    Define three tools and their functions
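A sketch of the three functions. The canned weather data is an assumption, and the restricted `eval` in `calculate` is one simple option -- treat any evaluation of model output with caution in real systems:

```python
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Canned data standing in for a real weather API."""
    temps_c = {"Tokyo": 22, "London": 14, "New York": 18}
    if city not in temps_c:
        return {"error": f"Unknown city: {city}"}
    temp = temps_c[city]
    if unit == "fahrenheit":
        temp = temp * 9 / 5 + 32
    return {"city": city, "temperature": temp, "unit": unit}

def calculate(expression: str) -> dict:
    """Evaluate a simple arithmetic expression with builtins disabled."""
    try:
        value = eval(expression, {"__builtins__": {}}, {})
        return {"expression": expression, "result": value}
    except Exception as exc:
        return {"error": str(exc)}

def convert_temperature(value: float, to_unit: str) -> dict:
    """Convert between celsius and fahrenheit."""
    if to_unit == "fahrenheit":
        return {"value": value * 9 / 5 + 32, "unit": "fahrenheit"}
    if to_unit == "celsius":
        return {"value": (value - 32) * 5 / 9, "unit": "celsius"}
    return {"error": f"Unknown unit: {to_unit}"}

REGISTRY = {
    "get_weather": get_weather,
    "calculate": calculate,
    "convert_temperature": convert_temperature,
}
```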
    Tool definitions and the complete agent
    The most interesting test case is a question like "What's the average temperature across Tokyo, London, and New York?" The model makes parallel weather calls in round one, then uses the calculator in round two to compute the average. Two rounds, three tool calls, one coherent answer.

    Exercise 3: Build a 2-Tool Agent
    Write Code

    Build a tool registry and dispatch function for a simple agent with two tools.

    Given:

  • lookup_user(user_id) -- returns user info as a dict
  • get_order_history(user_id, limit) -- returns order list as a dict
    Write:

    1. A REGISTRY dictionary mapping function names to functions

    2. A process_tool_calls(tool_calls, registry) function that:

    - Takes a list of mock tool call objects and a registry

    - Returns a list of {"tool_call_id": ..., "role": "tool", "content": ...} dicts

    - Uses json.loads to parse arguments and json.dumps to serialize results

    - Handles unknown functions by returning {"error": "Unknown function: <name>"}

    Note: This exercise uses mock objects -- no API calls needed.

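One possible solution, with stub implementations for the two given functions (the returned fields are illustrative):

```python
import json
from types import SimpleNamespace

def lookup_user(user_id):
    return {"user_id": user_id, "name": "Ada"}                 # illustrative stub

def get_order_history(user_id, limit):
    return {"user_id": user_id, "orders": [], "limit": limit}  # illustrative stub

REGISTRY = {"lookup_user": lookup_user, "get_order_history": get_order_history}

def process_tool_calls(tool_calls, registry):
    """Return one tool-result dict per call, with matching tool_call_ids."""
    results = []
    for tc in tool_calls:
        name = tc.function.name
        if name in registry:
            content = json.dumps(registry[name](**json.loads(tc.function.arguments)))
        else:
            content = json.dumps({"error": f"Unknown function: {name}"})
        results.append({"tool_call_id": tc.id, "role": "tool", "content": content})
    return results

# Check with a mock tool call:
mock = SimpleNamespace(id="call_1", function=SimpleNamespace(
    name="lookup_user", arguments='{"user_id": "u42"}'))
results = process_tool_calls([mock], REGISTRY)
```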

    Common Mistakes and How to Fix Them

    After debugging dozens of function-calling implementations, these are the four mistakes I see most often.

    1. Forgetting to Append the Assistant Message

    WRONG -- skipping the assistant message
    if msg.tool_calls:
        for tc in msg.tool_calls:
            result = dispatch(tc)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
        # ERROR: API expects the assistant message before tool results
    RIGHT -- always append the assistant message first
    if msg.tool_calls:
        messages.append(msg)  # <-- This line is critical
        for tc in msg.tool_calls:
            result = dispatch(tc)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    2. Not Parsing Arguments from JSON

    WRONG -- arguments is a string, not a dict
    args = tool_call.function.arguments
    result = my_function(**args)
    # TypeError: argument after ** must be a mapping, not str
    RIGHT -- parse the JSON string first
    args = json.loads(tool_call.function.arguments)
    result = my_function(**args)

    3. Mismatched tool_call_id

    Each tool result must reference the exact id from the corresponding tool call. Hardcoding an ID or swapping IDs between parallel calls causes a 400 error.

    4. Vague Tool Descriptions

    Vague description -- model guesses wrong
    {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"}
            }
        }
    }
    # Model doesn't know: search what?
    # Products? Users? Documents?
    Clear description -- model calls correctly
    {
        "name": "search_products",
        "description": "Search the product catalog by keyword. Returns matching products with name, price, and availability.",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {
                    "type": "string",
                    "description": "Search keywords, e.g. 'wireless headphones'"
                }
            },
            "required": ["q"]
        }
    }

    Summary and Next Steps

    Function calling bridges the gap between what LLMs know and what they can do. The model reasons about which tools to use and extracts structured arguments. Your code keeps full control over execution, validation, and error handling.

    Here's what we covered:

  • Tool definitions use JSON Schema to describe available functions
  • The tool-calling loop sends messages, checks for tool calls, executes functions, and feeds results back
  • Parallel calls let the model request multiple functions at once for efficiency
  • Error handling returns structured errors so the model can recover gracefully
  • The registry pattern maps function names to callables for clean dispatch

    With this foundation, you can connect any Python function -- database queries, API calls, file operations, calculations -- to a language model. The next tutorials cover building full agents with memory and multi-step planning.


    FAQ

    Can the model call functions I haven't defined?

    No. The model can only request functions listed in the tools parameter. It might hallucinate a function name in rare cases, but your dispatch code catches this with a registry lookup.

    Is function calling the same as "plugins" or "agents"?

    Function calling is the mechanism. Agents and plugins are built on top of it. An agent is a loop that uses function calling repeatedly to accomplish multi-step tasks.

    Does function calling cost extra?

    Tool definitions count as input tokens, so they add to the prompt cost. A typical 3-tool definition adds around 200--400 tokens. The function-calling feature itself has no separate fee.

    Can I force the model to call a specific function?

    Yes. Set tool_choice={"type": "function", "function": {"name": "get_weather"}} to force a specific tool. Set tool_choice="none" to prevent any tool calls.


    References

  • OpenAI Function Calling Guide -- official function calling documentation
  • OpenAI API Reference -- chat.completions.create -- full parameter documentation including tools and tool_choice
  • JSON Schema Reference -- learn the schema format used for tool definitions
  • OpenAI Pricing -- current model pricing
  • Versions used in this tutorial: Python 3.12, openai library 1.x, model gpt-4o-mini. Tested March 2026.
