LangChain Tools: Build Custom Tools and Connect LLMs to External Services
An LLM can write a beautiful explanation of today's weather — but it has no idea what the actual temperature is. It can describe how to query a database — but it cannot run the query. The gap between knowing how and actually doing is what LangChain tools close. By the end of this tutorial, you'll know how to give an LLM the ability to call any Python function, hit any API, and return structured results — all with type safety and error handling baked in.
What Are LangChain Tools and Why Do They Matter?
A LangChain tool is a Python function wrapped in metadata that tells the LLM what the function does, what inputs it expects, and when to use it. The LLM reads this metadata, decides whether to call the tool, generates the correct arguments, and your code executes the function with those arguments. You need Python 3.10+, langchain 0.3+, and langchain-openai (pip install langchain langchain-openai langchain-community). If you are new to LangChain, start with our LangChain quickstart first.
The code below creates a multiply tool using the @tool decorator, binds it to a GPT-4o-mini model with bind_tools(), and invokes the model with a math question. The model does not compute the answer itself — it returns a structured tool call with the function name and extracted arguments:
The response's tool_calls attribute holds the structured call: the tool name (multiply) and the arguments the model extracted from the question ({'a': 17, 'b': 28}).
Notice what happened: the LLM did not compute 17 * 28 itself. It recognized that a multiply tool exists, extracted the arguments from the natural language question, and returned a structured tool call. Your code then executes multiply(17, 28) to get the actual answer.
The power here is composition. You can give the LLM ten tools — a calculator, a weather API, a database query function, a web search — and the LLM picks the right one based on the user's question. That is the foundation of LangChain agents and chains.
The @tool Decorator — The Fast Way to Create Tools
The @tool decorator is the quickest way to turn any Python function into a LangChain tool. I reach for it 90% of the time because it requires zero boilerplate — you just write a normal function with type hints and a docstring.
LangChain reads three things from your decorated function: the function name becomes the tool name, the docstring becomes the description the LLM sees, and the type annotations become the input schema. All three matter — if the docstring is vague, the LLM won't know when to use the tool.
The output shows three things to look for: the Name comes from the function name, the Description is the full docstring the LLM reads when deciding whether to call this tool, and the Schema is the auto-generated JSON schema derived from the text: str type hint.
The LLM receives that schema as part of every request. It knows it must provide exactly one string argument called text — nothing more, nothing less.
Tools with Multiple Parameters and Defaults
This next tool, search_products, demonstrates how optional parameters work in tool schemas. It takes a required query string, an optional category filter typed as str | None that defaults to None, and an optional max_results integer that defaults to 5. LangChain marks both optional parameters in the JSON schema, so the LLM can omit them when they are not relevant:
The LLM sees category as optional in the schema. If the user says "find me some books about Python," the LLM might call search_products(query="Python", category="books"). If the user just says "find me something about cooking," it might omit the category entirely.
StructuredTool — Full Control with Pydantic Schemas
The @tool decorator is convenient, but sometimes you need more control. Maybe you want to validate inputs before the function runs, add field-level descriptions that are richer than what docstrings allow, or define the tool dynamically at runtime. That's where StructuredTool comes in.
StructuredTool lets you define the input schema as a Pydantic model, giving you full validation and type coercion. Here is the same multiply tool from earlier, rebuilt with StructuredTool:
Printing the tool's args schema shows each field with its type and its per-field description.
The key difference: each field in MultiplyInput has its own description, which goes straight into the JSON schema the LLM sees. For complex tools with many parameters, these per-field descriptions dramatically improve the LLM's accuracy in generating correct arguments. If you are already familiar with Pydantic from LangChain output parsers, this pattern will feel natural.
When to Use @tool vs StructuredTool
```python
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, sunny"
# Name, description, schema all auto-generated
```

```python
class WeatherInput(BaseModel):
    city: str = Field(description="City name, e.g. 'London'")
    units: str = Field(default="celsius", description="Temperature unit")

weather_tool = StructuredTool.from_function(
    func=get_weather_func,
    name="get_weather",
    description="Get weather for a city.",
    args_schema=WeatherInput,
)
```

My rule of thumb: start with @tool. Move to StructuredTool when you need input validation, per-field descriptions, or dynamic tool creation (e.g., generating tools from a config file at startup).
BaseTool — Maximum Flexibility via Subclassing
LangChain offers a third way to create tools: subclassing BaseTool directly. I rarely reach for this — @tool and StructuredTool cover 95% of cases. But when you need tools that maintain internal state between calls, custom caching, or separate sync and async implementations, BaseTool gives you full control.
You subclass BaseTool, set name, description, and args_schema as class attributes, then implement _run() for synchronous execution. Optionally implement _arun() for async. Here is a stateful counter tool that tracks how many times it has been called:
Because the tool instance holds the counter, the first invocation reports a count of 1 and the second reports a count of 2.
The count attribute persists between calls because the tool is an object instance, not a bare function. Neither @tool nor StructuredTool gives you this kind of statefulness out of the box.
@tool vs StructuredTool vs BaseTool — When to Use Each
| Feature | @tool | StructuredTool | BaseTool |
|---|---|---|---|
| Setup effort | One decorator | Pydantic model + from_function() | Full class definition |
| Schema control | Auto from type hints | Per-field Field(description=...) | Per-field Field(description=...) |
| Input validation | Basic type checking | Full Pydantic validation | Full Pydantic validation |
| Internal state | No | No | Yes — class attributes persist |
| Async support | async def works | Pass async func | Override _arun() |
| Dynamic creation | No | Yes — runtime from_function() | Yes — runtime subclass |
| Best for | Quick prototypes, simple tools | Production tools needing validation | Stateful tools, complex lifecycle |
Forcing Tool Calls with tool_choice
By default, the LLM decides whether to call a tool at all. Sometimes you need to guarantee a tool call — for example, in a pipeline where the next step always expects structured output. The tool_choice parameter controls this. Pass "any" to force the model to pick a tool, a specific tool name to force that exact tool, or "auto" (the default) to let the model decide:
I use tool_choice="any" in extraction pipelines where I always want structured output. For conversational agents, stick with the default "auto" — forcing tool calls on questions like "Hello, how are you?" leads to awkward behavior.
Binding Tools to Models and Executing Tool Calls
Creating a tool is only half the story. You need to bind it to a chat model so the LLM knows the tool exists, and then execute the tool call when the LLM requests it. This is where most beginners get tripped up.
The code below walks through a four-step process: (1) bind your tools list to the model with bind_tools(), (2) invoke the model with a question, (3) check response.tool_calls to see if the model wants to use a tool, and (4) look up the requested tool in a dictionary and call it with the model-generated arguments:
The output shows which tool the model selected, the arguments it extracted, and the result of executing the tool.
The LLM chose get_word_count over multiply because the question was about counting words, not multiplication. It parsed the quoted text as the argument. This is the core loop of tool-augmented LLMs: ask the LLM, execute the tool, and optionally feed the result back for a final answer.
Feeding Tool Results Back to the LLM
In most real applications, you want the LLM to incorporate the tool result into a natural-language answer. To do this, you send the tool result back as a ToolMessage and invoke the model again:
The model folds the tool result into a natural-language answer, stating the computed value in plain English.
This round-trip pattern — human message, AI tool call, tool result, AI final answer — is the same loop that powers ChatGPT plugins, Claude's tool use, and every LangChain agent.
Write a function dispatch_tool(tool_name, tool_map, args) that looks up a tool by name in a dictionary, calls it with the given args dictionary, and returns the result as a string. If the tool name is not found, return "Error: Tool 'X' not found" where X is the tool name.
This simulates the core of a tool execution loop in a LangChain agent.
Built-in Tools — Tavily Search, Wikipedia, and More
LangChain ships with dozens of pre-built tools so you don't have to wrap every API from scratch. The most commonly used ones connect your LLM to the internet, knowledge bases, and system utilities.
Tavily Search — Web Search for LLMs
Tavily is a search API built specifically for LLM applications. It returns clean, structured results instead of raw HTML, which means the LLM gets better context with fewer tokens. You need a Tavily API key (free tier available at tavily.com):
Each result comes back as a dictionary with title, url, and content fields. The content is already cleaned — no HTML tags, no navigation menus.
Wikipedia — Knowledge Base Lookups
The Wikipedia tool wraps the Wikipedia API for factual lookups. You configure it with WikipediaAPIWrapper, where top_k_results controls how many articles to fetch and doc_content_chars_max caps the character count per article. The tool returns plain text that the LLM can use to ground its answers:
That doc_content_chars_max=500 is important. Without it, a single Wikipedia article can consume thousands of tokens per query, blowing through your context budget fast.
Combining Multiple Tools
The real power emerges when you bind multiple tools to one model and let the LLM route each question to the right tool. The code below binds four tools — multiply, get_word_count, search, and wikipedia — then loops over four different questions. For each question, it prints which tool the LLM selected and what arguments it generated:
The LLM routes math questions to multiply, text questions to get_word_count, current-events questions to search, and factual questions to wikipedia. You wrote zero routing logic — the model picks the tool based on descriptions alone.
Building Real-World Tools — API Wrappers and Database Queries
The toy examples above are useful for understanding the mechanics, but production tools connect to real services. Let me walk through two patterns I use constantly: wrapping a REST API and querying a database.
Wrapping a REST API
Suppose you want your LLM to look up current exchange rates. The tool below wraps a free currency API with three important patterns: a try/except block that catches requests.RequestException and returns the error as a string (so the agent loop never crashes), a timeout=10 to prevent hanging on slow responses, and a None check on the target currency in case the API returns a valid response but the currency code does not exist:
Notice the tool always returns a string — even on failure. The LLM needs to read the result and incorporate it into its response. If you raise an exception instead, the entire agent loop crashes. Descriptive error messages also help the LLM explain the failure to the user in natural language.
Querying a SQLite Database
Database tools let the LLM answer questions about structured data without the user needing to know SQL. The tool below enforces read-only access by rejecting any query that does not start with SELECT. It then executes the query with sqlite3, extracts column names from cursor.description, and formats the results as a pipe-delimited text table that the LLM can easily parse:
In the database tool description, I included the exact table schema. This is not optional — the LLM needs to know the column names and types to write correct SQL. Without it, the LLM guesses column names and the queries fail. For a deeper dive into this pattern, see our natural language to SQL tutorial.
Write a function validate_tool_args(schema, args) that validates a dictionary of arguments against a schema dictionary. The schema maps parameter names to their expected Python types. The function should return a tuple of (is_valid, errors) where is_valid is a boolean and errors is a list of error strings.
Check two things: (1) all required schema keys must be present in args, and (2) each provided value must be an instance of the expected type.
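One possible solution:

```python
def validate_tool_args(schema: dict, args: dict) -> tuple[bool, list[str]]:
    """Validate args against a {param_name: expected_type} schema."""
    errors = []
    for name, expected in schema.items():
        if name not in args:  # check 1: required key present
            errors.append(f"Missing required argument: '{name}'")
        elif not isinstance(args[name], expected):  # check 2: type matches
            errors.append(
                f"Argument '{name}' should be {expected.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    return (len(errors) == 0, errors)

print(validate_tool_args({"a": int, "b": str}, {"a": 1, "b": "x"}))  # (True, [])
```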
Tool Error Handling and Retry Logic
Tools fail. APIs time out, databases go down, rate limits get hit. The question isn't if your tools will error — it's whether your application recovers gracefully or crashes in front of the user. I've spent more time debugging error handling in tool-based systems than writing the tools themselves.
Returning Errors as Strings
The simplest error-handling pattern: catch exceptions inside your tool and return the error as a string. The LLM reads the error message and can explain the problem or try a different approach:
When the LLM calls divide(10, 0), it gets back "Error: Cannot divide by zero." instead of a Python traceback. The LLM can then tell the user: "I can't divide by zero — could you provide a different denominator?"
Using handle_tool_error
LangChain also provides a built-in handle_tool_error parameter on tools. When it is set to True, any ToolException raised inside the tool is caught and its message returned to the model as a string automatically:
If eval hits a SyntaxError or NameError on a malformed expression, the tool raises ToolException, and LangChain returns the error message to the LLM rather than crashing the agent loop. Note that handle_tool_error only intercepts ToolException — other exception types still propagate.
You can also pass a custom error handler function for more control:
Retry Logic with Fallbacks
For transient errors (timeouts, rate limits), you often want to retry before giving up. Here's a pattern using Python's tenacity library that works well with LangChain tools:
The wait_exponential strategy waits 1 second after the first failure, 2 seconds after the second, then 4 seconds — giving the external service time to recover. After 3 failed attempts, the exception propagates and the tool returns an error message.
Using Tools in LCEL Chains
Tools slot naturally into LCEL chains. A common pattern is to build a chain that takes a user question, calls the LLM with bound tools, executes any tool calls, and returns the final answer — all in a single composable pipeline.
Invoking the chain returns the executed tool results.
The pipe operator connects the LLM (which generates tool calls) to the executor (which runs them). You get a single callable chain that handles the full tool-use flow. This is a simpler alternative to using a full agent when you only need one round of tool calls.
Common Mistakes and How to Fix Them
These are the mistakes that trip up nearly every developer when they start building with LangChain tools. I have made every one of them myself.
Mistake 1: Vague Tool Descriptions
```python
@tool
def process(data: str) -> str:
    """Process the data."""
    # What does "process" mean? The LLM has no idea
    return data.upper()
```

```python
@tool
def uppercase_text(text: str) -> str:
    """Convert text to uppercase letters.

    Use when the user asks to capitalize, uppercase,
    or make text ALL CAPS.

    Args:
        text: The text to convert to uppercase.
    """
    return text.upper()
```

Mistake 2: Missing Type Annotations
Without type annotations, LangChain cannot generate an input schema. The tool either fails to register or generates a wildcard schema that accepts anything:
```python
@tool
def add(a, b):
    """Add two numbers."""
    return a + b
# Schema is empty — LLM doesn't know what to pass
```

```python
@tool
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b
# Schema: {"a": int, "b": int}
```

Mistake 3: Raising Exceptions Instead of Returning Errors
If your tool raises an unhandled exception, the entire agent loop crashes. Always catch exceptions and return error messages as strings:
```python
@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    response = requests.get(url)
    response.raise_for_status()  # Raises on 4xx/5xx
    return response.text
```

```python
@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        return f"Failed to fetch {url}: {e}"
```

Mistake 4: Tools That Return Too Much Data
Returning a 10,000-word Wikipedia article or 500 database rows as a tool result burns through tokens and can exceed context limits. Always truncate, summarize, or paginate tool output:
Write a function safe_tool_call(func, args, max_retries=2) that calls a function with the given args dict and returns the result as a string. If the function raises an exception, retry up to max_retries times. If all retries fail, return "Error after N retries: <error message>" where N is max_retries.
The function should track how many attempts were made.
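One possible solution (one initial attempt plus max_retries retries):

```python
def safe_tool_call(func, args: dict, max_retries: int = 2) -> str:
    """Call func(**args); retry on failure, return the result or an error string."""
    attempts = 0
    last_error = None
    while attempts <= max_retries:  # initial attempt + max_retries retries
        attempts += 1
        try:
            return str(func(**args))
        except Exception as e:
            last_error = e
    return f"Error after {max_retries} retries: {last_error}"

print(safe_tool_call(lambda x: x * 2, {"x": 3}))  # 6
```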
Performance Tips and Best Practices
Tool-based LLM applications have a unique performance profile: the bottleneck is almost never your Python code — it is the LLM calls and external API requests. Here is what actually matters for speed and cost.
Limit the number of tools bound to a single model. Most models handle 10-15 tools reliably. Beyond 20, routing accuracy drops noticeably. If you have 50 tools, use a two-stage approach: the first LLM picks a category, the second LLM (bound to just that category's tools) picks the specific tool.
Cache tool results aggressively. If the exchange rate for USD/EUR was fetched 30 seconds ago, do not hit the API again. Use Python's functools.lru_cache or a Redis cache with a TTL:
Use async tools for I/O-bound operations. When your chain calls multiple tools in parallel (via RunnableParallel), async tools prevent one slow API call from blocking the others:
Frequently Asked Questions
Can I use tools with models other than OpenAI?
Yes. bind_tools() works with any LangChain chat model that supports tool calling — including Anthropic Claude, Google Gemini, Mistral, and local models via Ollama. See our model switching guide for details. The interface is identical:
What is the difference between tools and function calling?
They refer to the same concept. "Function calling" is OpenAI's original term for the feature. "Tool use" is the broader term used by LangChain and Anthropic. In LangChain, both are accessed through the same bind_tools() API regardless of the underlying provider.
How do I make a tool return structured data instead of strings?
Tools can return any serializable type (dicts, lists, Pydantic models). However, when the result is passed back to the LLM as a ToolMessage, it gets serialized to a string. For inter-tool communication within an agent, return dicts. For LLM consumption, format the output as a readable string:
How many tools can I bind to one model?
There is no hard limit, but practical limits exist. OpenAI supports up to 128 tools per call. Anthropic Claude supports up to 64. However, accuracy degrades well before those limits. In practice, 5-15 well-described tools work reliably. Beyond that, consider a two-stage routing approach.
Summary
LangChain tools bridge the gap between what LLMs know and what they can do. You learned three ways to create tools (@tool, StructuredTool, and BaseTool), how to bind them to models, execute tool calls, feed results back, and handle errors with retry logic. The key takeaway: tools are just Python functions with metadata. The LLM reads the metadata, decides when to use the function, and generates the arguments.
From here, explore LangChain chains and agents for multi-step tool use, LCEL pipelines for composable chains, or LangSmith for debugging and tracing tool calls in production. If you are building RAG systems, tools pair naturally with document loaders and text splitters.