
OpenAI Function Calling: Build an LLM That Executes Your Python Functions

Intermediate · 90 min · 3 exercises · 55 XP

Ask GPT "What's the weather in Tokyo right now?" and it'll confidently make something up. The model has no internet access, no database connection, and no way to call your backend. Function calling fixes this -- it lets the model request that your code run a specific function, then use the real result in its answer.

What Is Function Calling and Why Does It Matter?

Picture this: a customer asks your support bot "What's the status of order #4521?" The model knows how to answer politely, but it can't query your database. Without function calling, you'd need fragile prompt hacks or regex parsing.

Function calling introduces a structured handoff. You describe your available functions using JSON Schema. When a user's question requires real data, the model returns a tool_calls object instead of text. Your code runs the function and sends the result back.

The model never runs your code directly. It only requests a function call with structured arguments. You remain in full control of execution, validation, and error handling.


Your First Function Call

Enough theory. Let's build a complete working example with a get_weather function. I'll walk you through every step so you see exactly how the pieces fit together.

First, install the library and set up the client. This block also defines the Python function that does the real work, plus the JSON tool definition that tells the model what's available.

Step 1 -- Install, set up client, define the tool

Notice there are two separate things: the Python function and the JSON tool definition. The model sees only the tool definition -- it never inspects your actual code.

Now send a request and check if the model wants to call a function. When it does, finish_reason will be "tool_calls" instead of "stop".

Step 2 -- Send the request and check for tool calls
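A sketch of Step 2, wrapped in a function so nothing hits the network on import (the `create` call needs a valid API key; `client` and `tools` are assumed from Step 1):

```python
# client and tools are defined as in Step 1.

def request_weather(question: str):
    """Send the user question plus the tool definitions; return the conversation and message."""
    messages = [{"role": "user", "content": question}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    choice = response.choices[0]
    # "tool_calls" means the model wants a function run; "stop" means plain text.
    if choice.finish_reason == "tool_calls":
        for tc in choice.message.tool_calls:
            print(tc.function.name, tc.function.arguments)
    return messages, choice.message

# Example (needs a valid API key):
# messages, msg = request_weather("What's the weather in Tokyo right now?")
```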

The model returned a tool call request, not a text answer. Now extract the function name and arguments, run the function, and send the result back. The model will then write a natural-language answer using that data.

Step 3 -- Execute the function and send the result back
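A sketch of Step 3, assuming `client`, `tools`, and `get_weather` from Step 1 and the assistant message `msg` from Step 2. The function and variable names are illustrative:

```python
import json

def run_tool_calls(messages, msg):
    """Execute each requested tool call, append the results, and get the final answer."""
    messages.append(msg)  # the assistant message must precede its tool results
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)  # arguments arrive as a JSON string
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,  # must match the id the model generated
            "content": json.dumps(result),
        })
    # Second round trip: the model now writes a natural-language answer.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return final.choices[0].message.content
```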

That's the complete loop: describe tools, get a tool call request, run the function, send the result back, get a human-readable answer. Every function-calling pattern builds on these exact steps.

Exercise 1: Define a Tool Schema
Write Code

Write a tool definition (a Python dictionary) for a search_products function that searches an e-commerce catalog.

The function accepts:

  • query (string, required) -- the search term
  • category (string, optional) -- one of: "electronics", "clothing", "books", "home"
  • max_price (number, optional) -- maximum price filter
  • in_stock (boolean, optional) -- filter for in-stock items only
    Your dictionary must follow the OpenAI tool format with "type": "function" at the top level.

    Note: This exercise tests your understanding of JSON Schema -- the code won't make API calls.

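If you get stuck, here is one possible solution for comparison (the field descriptions are illustrative):

```python
search_products_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the e-commerce product catalog by keyword, "
                       "optionally filtered by category, price, and stock status.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search term"},
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "books", "home"],
                    "description": "Restrict results to one category",
                },
                "max_price": {"type": "number", "description": "Maximum price filter"},
                "in_stock": {"type": "boolean", "description": "Only in-stock items"},
            },
            "required": ["query"],
        },
    },
}
```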

    The Tool-Calling Loop

    What happens when the model needs to call two functions in sequence to answer one question? Say the user asks "Is it warmer in Tokyo or London?" -- the model needs both cities' weather before it can compare. You need a loop.

    The pattern is always the same: send messages, check for tool calls, execute them, append results, send again. Repeat until the model returns a text response.

    A reusable async tool-calling loop

    The while-style loop is the key insight. Some questions need one function call, others need three or four. The loop handles any number automatically. It terminates when the model responds with text instead of requesting more tool calls.

    Exercise 2: Complete the Tool-Calling Loop
    Write Code

    Complete the dispatch_tool_call function that takes a tool call object and a registry dictionary, then returns the function result as a JSON string.

    Steps:

    1. Extract the function name from tool_call.function.name

    2. Parse the arguments from tool_call.function.arguments using json.loads

    3. Look up the function in the registry dictionary

    4. Call the function with the parsed arguments using **

    5. Return the result as a JSON string using json.dumps

    If the function name is not in the registry, return json.dumps({"error": f"Unknown function: {name}"}).

    Note: This exercise uses mock objects to simulate the OpenAI response -- no API calls needed.

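One possible solution, checked against a `SimpleNamespace` mock like the exercise uses (the `add` function is a stand-in):

```python
import json
from types import SimpleNamespace

def dispatch_tool_call(tool_call, registry):
    """Look up and run the requested function; return its result as a JSON string."""
    name = tool_call.function.name
    if name not in registry:
        return json.dumps({"error": f"Unknown function: {name}"})
    args = json.loads(tool_call.function.arguments)  # arguments arrive as JSON text
    return json.dumps(registry[name](**args))

# Check it against a mock tool call:
mock = SimpleNamespace(function=SimpleNamespace(
    name="add", arguments='{"a": 2, "b": 3}'))
print(dispatch_tool_call(mock, {"add": lambda a, b: a + b}))  # 5
```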

    Defining Tools with JSON Schema

    I've found that 80% of function-calling bugs come from poorly written tool definitions. The model can only call your function correctly if the schema tells it exactly what to pass. Think of the schema as documentation for an AI reader.

    Required vs Optional Parameters

    Put only truly essential parameters in the "required" array. If a parameter has a sensible default, leave it optional. The model will skip optional parameters unless the user's message mentions them.

    Required vs optional parameters
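A sketch of the idea with a hypothetical search function: `query` is essential, while `limit` has a sensible server-side default, so only `query` goes in `"required"`:

```python
# "query" is required; "limit" has a default, so it stays optional.
parameters = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search keywords"},
        "limit": {
            "type": "integer",
            "description": "Maximum results to return (default 10)",
        },
    },
    "required": ["query"],  # only the truly essential parameter
}
```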

    Enum Parameters

    Whenever a parameter has a fixed set of valid values, use "enum". This prevents the model from inventing invalid values like "warm" for a temperature unit that only accepts "celsius" or "fahrenheit".
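For example, the unit parameter from the weather tool can be pinned down like this (a minimal property-level sketch):

```python
# The model may only pick from the enum -- "warm" is not an option.
unit_param = {
    "type": "string",
    "enum": ["celsius", "fahrenheit"],
    "description": "Temperature unit",
}
```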

    Nested Objects and Arrays

    JSON Schema supports complex structures. You can nest objects inside objects and define arrays with typed items.

    Nested objects and array parameters
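A sketch using a hypothetical order-placement tool: an object nested inside the parameters object, plus an array whose items are themselves typed objects:

```python
parameters = {
    "type": "object",
    "properties": {
        "shipping_address": {              # a nested object
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "postal_code": {"type": "string"},
            },
            "required": ["street", "city"],
        },
        "items": {                         # an array of typed objects
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["product_id"],
            },
        },
    },
    "required": ["shipping_address", "items"],
}
```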

    Parallel Function Calling

    Here's a problem you'll hit quickly: a user asks "Compare the weather in Tokyo, London, and New York." The model needs three weather lookups. Does it make three separate round trips?

    No. GPT-4o and GPT-4o-mini support parallel function calling. The model returns multiple tool calls in a single response. Your code executes all of them, sends all results back, and the model synthesizes one final answer.

    Parallel tool calls in action
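A sketch of the request side, wrapped in a function so it only runs when you call it with a valid API key (`client` and `tools` as defined earlier):

```python
# client and tools as defined earlier.

def compare_weather():
    messages = [{"role": "user",
                 "content": "Compare the weather in Tokyo, London, and New York."}]
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools
    )
    msg = response.choices[0].message
    # With parallel function calling, tool_calls can hold several entries at once.
    for tc in msg.tool_calls or []:
        print(tc.id, tc.function.name, tc.function.arguments)
    return messages, msg
```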

    The model requested all three weather lookups at once. Now execute them all and send back every result. Each tool result must include the correct tool_call_id -- match them or the API will reject your request.

    Handling parallel results correctly
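The execution side can be sketched as below, checked with mock objects so the id matching is visible (the `echo` function is a stand-in):

```python
import json
from types import SimpleNamespace

def append_tool_results(messages, msg, registry):
    """Append the assistant message, then one tool result per call, ids matched."""
    messages.append(msg)  # the assistant message must come first
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        result = registry[tc.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,   # must match this specific call's id
            "content": json.dumps(result),
        })
    return messages

# Check the id matching with two mock parallel calls:
msg = SimpleNamespace(tool_calls=[
    SimpleNamespace(id="call_a", function=SimpleNamespace(name="echo", arguments='{"x": 1}')),
    SimpleNamespace(id="call_b", function=SimpleNamespace(name="echo", arguments='{"x": 2}')),
])
out = append_tool_results([], msg, {"echo": lambda x: x})
```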
    Without Parallel Calls (4 API round trips)
    # Round trip 1: "What's the weather in Tokyo?"
    # -> Model calls get_weather("Tokyo") -> you return result
    # Round trip 2: "Now London?"
    # -> Model calls get_weather("London") -> you return result
    # Round trip 3: "Now New York?"
    # -> Model calls get_weather("New York") -> you return result
    # Round trip 4: "Now compare them"
    # -> Model writes final answer
    # Total: 4 API calls, ~3 seconds
    With Parallel Calls (2 API round trips)
    # Round trip 1: "Compare Tokyo, London, New York"
    # -> Model calls get_weather("Tokyo"),
    #    get_weather("London"),
    #    get_weather("New York") -- ALL AT ONCE
    # -> You return all 3 results
    # Round trip 2: Model writes final answer
    # Total: 2 API calls, ~1 second

    Error Handling in Tool Execution

    I once deployed an agent that crashed in production because a weather API returned a 503 error. My code raised an unhandled exception, the tool-calling loop died, and the user saw a blank screen. The fix took two lines.

    When your function fails, don't crash -- return a structured error. The model is surprisingly good at recovering. It can apologize, try a different approach, or ask the user for clarification.

    Wrapping tool execution with error handling
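A sketch of the wrapper: any exception becomes a structured error payload the model can read (the `boom` tool is a deliberately failing stand-in):

```python
import json
from types import SimpleNamespace

def safe_dispatch(tool_call, registry):
    """Never let a failing tool kill the loop -- return a structured error instead."""
    name = tool_call.function.name
    if name not in registry:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        args = json.loads(tool_call.function.arguments)
        return json.dumps(registry[name](**args))
    except Exception as exc:   # bad arguments, upstream 503s, anything
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

# A tool that raises -- the model receives an error payload, not a crash:
mock = SimpleNamespace(function=SimpleNamespace(name="boom", arguments="{}"))
print(safe_dispatch(mock, {"boom": lambda: 1 / 0}))
```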

    The model receives the error as a tool result and adapts. I've seen GPT-4o read an error like "Unknown city: Toky" and respond with "I couldn't find weather data for 'Toky'. Did you mean Tokyo?" That's the power of returning errors gracefully.


    Real-World Example: Weather + Calculator Agent

    Let's combine everything into a practical multi-tool agent. This agent has access to weather data, a calculator, and a temperature converter. It decides which tools to use based on the question.

    Define three tools and their functions
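A sketch of the three functions. The canned weather data is an assumption, and the restricted `eval` in `calculate` is one simple option -- treat any evaluation of model output with caution in real systems:

```python
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Canned data standing in for a real weather API."""
    temps_c = {"Tokyo": 22, "London": 14, "New York": 18}
    if city not in temps_c:
        return {"error": f"Unknown city: {city}"}
    temp = temps_c[city]
    if unit == "fahrenheit":
        temp = temp * 9 / 5 + 32
    return {"city": city, "temperature": temp, "unit": unit}

def calculate(expression: str) -> dict:
    """Evaluate a simple arithmetic expression with builtins disabled."""
    try:
        value = eval(expression, {"__builtins__": {}}, {})
        return {"expression": expression, "result": value}
    except Exception as exc:
        return {"error": str(exc)}

def convert_temperature(value: float, to_unit: str) -> dict:
    """Convert between celsius and fahrenheit."""
    if to_unit == "fahrenheit":
        return {"value": value * 9 / 5 + 32, "unit": "fahrenheit"}
    if to_unit == "celsius":
        return {"value": (value - 32) * 5 / 9, "unit": "celsius"}
    return {"error": f"Unknown unit: {to_unit}"}

REGISTRY = {
    "get_weather": get_weather,
    "calculate": calculate,
    "convert_temperature": convert_temperature,
}
```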
    Tool definitions and the complete agent
    The most interesting test case is a question like "What's the average temperature across Tokyo, London, and New York?" The model makes parallel weather calls in round one, then uses the calculator in round two to compute the average. Two rounds, three tool calls, one coherent answer.

    Exercise 3: Build a 2-Tool Agent
    Write Code

    Build a tool registry and dispatch function for a simple agent with two tools.

    Given:

  • lookup_user(user_id) -- returns user info as a dict
  • get_order_history(user_id, limit) -- returns order list as a dict
    Write:

    1. A REGISTRY dictionary mapping function names to functions

    2. A process_tool_calls(tool_calls, registry) function that:

    - Takes a list of mock tool call objects and a registry

    - Returns a list of {"tool_call_id": ..., "role": "tool", "content": ...} dicts

    - Uses json.loads to parse arguments and json.dumps to serialize results

    - Handles unknown functions by returning {"error": "Unknown function: <name>"}

    Note: This exercise uses mock objects -- no API calls needed.

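One possible solution, with stub implementations for the two given functions (the returned fields are illustrative):

```python
import json
from types import SimpleNamespace

def lookup_user(user_id):
    return {"user_id": user_id, "name": "Ada"}                 # illustrative stub

def get_order_history(user_id, limit):
    return {"user_id": user_id, "orders": [], "limit": limit}  # illustrative stub

REGISTRY = {"lookup_user": lookup_user, "get_order_history": get_order_history}

def process_tool_calls(tool_calls, registry):
    """Return one tool-result dict per call, with matching tool_call_ids."""
    results = []
    for tc in tool_calls:
        name = tc.function.name
        if name in registry:
            content = json.dumps(registry[name](**json.loads(tc.function.arguments)))
        else:
            content = json.dumps({"error": f"Unknown function: {name}"})
        results.append({"tool_call_id": tc.id, "role": "tool", "content": content})
    return results

# Check with a mock tool call:
mock = SimpleNamespace(id="call_1", function=SimpleNamespace(
    name="lookup_user", arguments='{"user_id": "u42"}'))
results = process_tool_calls([mock], REGISTRY)
```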

    Common Mistakes and How to Fix Them

    After debugging dozens of function-calling implementations, these are the four mistakes I see most often.

    1. Forgetting to Append the Assistant Message

    WRONG -- skipping the assistant message
    if msg.tool_calls:
        for tc in msg.tool_calls:
            result = dispatch(tc)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
        # ERROR: API expects the assistant message before tool results
    RIGHT -- always append the assistant message first
    if msg.tool_calls:
        messages.append(msg)  # <-- This line is critical
        for tc in msg.tool_calls:
            result = dispatch(tc)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    2. Not Parsing Arguments from JSON

    WRONG -- arguments is a string, not a dict
    args = tool_call.function.arguments
    result = my_function(**args)
    # TypeError: argument after ** must be a mapping, not str
    RIGHT -- parse the JSON string first
    args = json.loads(tool_call.function.arguments)
    result = my_function(**args)

    3. Mismatched tool_call_id

    Each tool result must reference the exact id from the corresponding tool call. Hardcoding an ID or swapping IDs between parallel calls causes a 400 error.

    4. Vague Tool Descriptions

    Vague description -- model guesses wrong
    {
        "name": "search",
        "description": "Search for stuff",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {"type": "string"}
            }
        }
    }
    # Model doesn't know: search what?
    # Products? Users? Documents?
    Clear description -- model calls correctly
    {
        "name": "search_products",
        "description": "Search the product catalog by keyword. Returns matching products with name, price, and availability.",
        "parameters": {
            "type": "object",
            "properties": {
                "q": {
                    "type": "string",
                    "description": "Search keywords, e.g. 'wireless headphones'"
                }
            },
            "required": ["q"]
        }
    }

    Summary and Next Steps

    Function calling bridges the gap between what LLMs know and what they can do. The model reasons about which tools to use and extracts structured arguments. Your code keeps full control over execution, validation, and error handling.

    Here's what we covered:

  • Tool definitions use JSON Schema to describe available functions
  • The tool-calling loop sends messages, checks for tool calls, executes functions, and feeds results back
  • Parallel calls let the model request multiple functions at once for efficiency
  • Error handling returns structured errors so the model can recover gracefully
  • The registry pattern maps function names to callables for clean dispatch

    With this foundation, you can connect any Python function -- database queries, API calls, file operations, calculations -- to a language model. The next tutorials cover building full agents with memory and multi-step planning.


    FAQ

    Can the model call functions I haven't defined?

    No. The model can only request functions listed in the tools parameter. It might hallucinate a function name in rare cases, but your dispatch code catches this with a registry lookup.

    Is function calling the same as "plugins" or "agents"?

    Function calling is the mechanism. Agents and plugins are built on top of it. An agent is a loop that uses function calling repeatedly to accomplish multi-step tasks.

    Does function calling cost extra?

    Tool definitions count as input tokens, so they add to the prompt cost. A typical 3-tool definition adds around 200--400 tokens. The function-calling feature itself has no separate fee.

    Can I force the model to call a specific function?

    Yes. Set tool_choice={"type": "function", "function": {"name": "get_weather"}} to force a specific tool. Set tool_choice="none" to prevent any tool calls.


    References

  • OpenAI Function Calling Guide -- official function calling documentation
  • OpenAI API Reference -- chat.completions.create -- full parameter documentation including tools and tool_choice
  • JSON Schema Reference -- learn the schema format used for tool definitions
  • OpenAI Pricing -- current model pricing
  • Versions used in this tutorial: Python 3.12, openai library 1.x, model gpt-4o-mini. Tested March 2026.
