LangChain Tools: Build Custom Tools and Connect LLMs to External Services
An LLM can write a beautiful explanation of today's weather — but it has no idea what the actual temperature is. It can describe how to query a database — but it cannot run the query. The gap between knowing how and actually doing is what LangChain tools close. By the end of this tutorial, you'll know how to give an LLM the ability to call any Python function, hit any API, and return structured results — all with type safety and error handling baked in.
What Are LangChain Tools and Why Do They Matter?
A LangChain tool is a Python function wrapped in metadata that tells the LLM what the function does, what inputs it expects, and when to use it. The LLM reads this metadata, decides whether to call the tool, generates the correct arguments, and your code executes the function with those arguments. You need Python 3.10+, langchain 0.3+, and langchain-openai (pip install langchain langchain-openai langchain-community). If you are new to LangChain, start with our LangChain quickstart first.
The code below creates a multiply tool using the @tool decorator, binds it to a GPT-4o-mini model with bind_tools(), and invokes the model with a math question. The model does not compute the answer itself — it returns a structured tool call with the function name and extracted arguments:
The response's tool_calls attribute holds the structured call: the tool name (multiply) and the arguments the model extracted from the question ({'a': 17, 'b': 28}).
Notice what happened: the LLM did not compute 17 * 28 itself. It recognized that a multiply tool exists, extracted the arguments from the natural language question, and returned a structured tool call. Your code then executes multiply(17, 28) to get the actual answer.
The power here is composition. You can give the LLM ten tools — a calculator, a weather API, a database query function, a web search — and the LLM picks the right one based on the user's question. That is the foundation of LangChain agents and chains.
The @tool Decorator — The Fast Way to Create Tools
The @tool decorator is the quickest way to turn any Python function into a LangChain tool. I reach for it 90% of the time because it requires zero boilerplate — you just write a normal function with type hints and a docstring.
LangChain reads three things from your decorated function: the function name becomes the tool name, the docstring becomes the description the LLM sees, and the type annotations become the input schema. All three matter — if the docstring is vague, the LLM won't know when to use the tool.
The output shows three things to look for: the Name comes from the function name, the Description is the full docstring the LLM reads when deciding whether to call this tool, and the Schema is the auto-generated JSON schema derived from the text: str type hint.
The LLM receives that schema as part of every request. It knows it must provide exactly one string argument called text — nothing more, nothing less.
Tools with Multiple Parameters and Defaults
This next tool, search_products, demonstrates how optional parameters work in tool schemas. It takes a required query string, an optional category filter typed as str | None that defaults to None, and an optional max_results integer that defaults to 5. LangChain marks both optional parameters in the JSON schema, so the LLM can omit them when they are not relevant:
The LLM sees category as optional in the schema. If the user says "find me some books about Python," the LLM might call search_products(query="Python", category="books"). If the user just says "find me something about cooking," it might omit the category entirely.
StructuredTool — Full Control with Pydantic Schemas
The @tool decorator is convenient, but sometimes you need more control. Maybe you want to validate inputs before the function runs, add field-level descriptions that are richer than what docstrings allow, or define the tool dynamically at runtime. That's where StructuredTool comes in.
StructuredTool lets you define the input schema as a Pydantic model, giving you full validation and type coercion. Here is the same multiply tool from earlier, rebuilt with StructuredTool:
Printing the tool's args schema shows each field with its type and its per-field description.
The key difference: each field in MultiplyInput has its own description, which goes straight into the JSON schema the LLM sees. For complex tools with many parameters, these per-field descriptions dramatically improve the LLM's accuracy in generating correct arguments. If you are already familiar with Pydantic from LangChain output parsers, this pattern will feel natural.
When to Use @tool vs StructuredTool
```python
@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, sunny"
# Name, description, schema all auto-generated
```

```python
class WeatherInput(BaseModel):
    city: str = Field(description="City name, e.g. 'London'")
    units: str = Field(default="celsius", description="Temperature unit")

weather_tool = StructuredTool.from_function(
    func=get_weather_func,
    name="get_weather",
    description="Get weather for a city.",
    args_schema=WeatherInput,
)
```

My rule of thumb: start with @tool. Move to StructuredTool when you need input validation, per-field descriptions, or dynamic tool creation (e.g., generating tools from a config file at startup).
BaseTool — Maximum Flexibility via Subclassing
LangChain offers a third way to create tools: subclassing BaseTool directly. I rarely reach for this — @tool and StructuredTool cover 95% of cases. But when you need tools that maintain internal state between calls, custom caching, or separate sync and async implementations, BaseTool gives you full control.
You subclass BaseTool, set name, description, and args_schema as class attributes, then implement _run() for synchronous execution. Optionally implement _arun() for async. Here is a stateful counter tool that tracks how many times it has been called:
Because the tool instance holds the counter, the first invocation reports a count of 1 and the second reports a count of 2.
The count attribute persists between calls because the tool is an object instance, not a bare function. Neither @tool nor StructuredTool gives you this kind of statefulness out of the box.
@tool vs StructuredTool vs BaseTool — When to Use Each
| Feature | @tool | StructuredTool | BaseTool |
|---|---|---|---|
| Setup effort | One decorator | Pydantic model + from_function() | Full class definition |
| Schema control | Auto from type hints | Per-field Field(description=...) | Per-field Field(description=...) |
| Input validation | Basic type checking | Full Pydantic validation | Full Pydantic validation |
| Internal state | No | No | Yes — class attributes persist |
| Async support | async def works | Pass async func | Override _arun() |
| Dynamic creation | No | Yes — runtime from_function() | Yes — runtime subclass |
| Best for | Quick prototypes, simple tools | Production tools needing validation | Stateful tools, complex lifecycle |
Forcing Tool Calls with tool_choice
By default, the LLM decides whether to call a tool at all. Sometimes you need to guarantee a tool call — for example, in a pipeline where the next step always expects structured output. The tool_choice parameter controls this. Pass "any" to force the model to pick a tool, a specific tool name to force that exact tool, or "auto" (the default) to let the model decide:
I use tool_choice="any" in extraction pipelines where I always want structured output. For conversational agents, stick with the default "auto" — forcing tool calls on questions like "Hello, how are you?" leads to awkward behavior.
Binding Tools to Models and Executing Tool Calls
Creating a tool is only half the story. You need to bind it to a chat model so the LLM knows the tool exists, and then execute the tool call when the LLM requests it. This is where most beginners get tripped up.
The code below walks through a four-step process: (1) bind your tools list to the model with bind_tools(), (2) invoke the model with a question, (3) check response.tool_calls to see if the model wants to use a tool, and (4) look up the requested tool in a dictionary and call it with the model-generated arguments:
The output shows which tool the model selected, the arguments it extracted, and the result of executing the tool.
The LLM chose get_word_count over multiply because the question was about counting words, not multiplication. It parsed the quoted text as the argument. This is the core loop of tool-augmented LLMs: ask the LLM, execute the tool, and optionally feed the result back for a final answer.
Feeding Tool Results Back to the LLM
In most real applications, you want the LLM to incorporate the tool result into a natural-language answer. To do this, you send the tool result back as a ToolMessage and invoke the model again:
The model folds the tool result into a natural-language answer, stating the computed value in plain English.
This round-trip pattern — human message, AI tool call, tool result, AI final answer — is the same loop that powers ChatGPT plugins, Claude's tool use, and every LangChain agent.
Write a function dispatch_tool(tool_name, tool_map, args) that looks up a tool by name in a dictionary, calls it with the given args dictionary, and returns the result as a string. If the tool name is not found, return "Error: Tool 'X' not found" where X is the tool name.
This simulates the core of a tool execution loop in a LangChain agent.
Built-in Tools — Tavily Search, Wikipedia, and More
LangChain ships with dozens of pre-built tools so you don't have to wrap every API from scratch. The most commonly used ones connect your LLM to the internet, knowledge bases, and system utilities.
Tavily Search — Web Search for LLMs
Tavily is a search API built specifically for LLM applications. It returns clean, structured results instead of raw HTML, which means the LLM gets better context with fewer tokens. You need a Tavily API key (free tier available at tavily.com):
Each result comes back as a dictionary with title, url, and content fields. The content is already cleaned — no HTML tags, no navigation menus.
Wikipedia — Knowledge Base Lookups
The Wikipedia tool wraps the Wikipedia API for factual lookups. You configure it with WikipediaAPIWrapper, where top_k_results controls how many articles to fetch and doc_content_chars_max caps the character count per article. The tool returns plain text that the LLM can use to ground its answers:
That doc_content_chars_max=500 is important. Without it, a single Wikipedia article can consume thousands of tokens per query, blowing through your context budget fast.
Combining Multiple Tools
The real power emerges when you bind multiple tools to one model and let the LLM route each question to the right tool. The code below binds four tools — multiply, get_word_count, search, and wikipedia — then loops over four different questions. For each question, it prints which tool the LLM selected and what arguments it generated:
The LLM routes math questions to multiply, text questions to get_word_count, current-events questions to search, and factual questions to wikipedia. You wrote zero routing logic — the model picks the tool based on descriptions alone.
Building Real-World Tools — API Wrappers and Database Queries
The toy examples above are useful for understanding the mechanics, but production tools connect to real services. Let me walk through two patterns I use constantly: wrapping a REST API and querying a database.
Wrapping a REST API
Suppose you want your LLM to look up current exchange rates. The tool below wraps a free currency API with three important patterns: a try/except block that catches requests.RequestException and returns the error as a string (so the agent loop never crashes), a timeout=10 to prevent hanging on slow responses, and a None check on the target currency in case the API returns a valid response but the currency code does not exist:
Notice the tool always returns a string — even on failure. The LLM needs to read the result and incorporate it into its response. If you raise an exception instead, the entire agent loop crashes. Descriptive error messages also help the LLM explain the failure to the user in natural language.
Querying a SQLite Database
Database tools let the LLM answer questions about structured data without the user needing to know SQL. The tool below enforces read-only access by rejecting any query that does not start with SELECT. It then executes the query with sqlite3, extracts column names from cursor.description, and formats the results as a pipe-delimited text table that the LLM can easily parse:
In the database tool description, I included the exact table schema. This is not optional — the LLM needs to know the column names and types to write correct SQL. Without it, the LLM guesses column names and the queries fail. For a deeper dive into this pattern, see our natural language to SQL tutorial.
Write a function validate_tool_args(schema, args) that validates a dictionary of arguments against a schema dictionary. The schema maps parameter names to their expected Python types. The function should return a tuple of (is_valid, errors) where is_valid is a boolean and errors is a list of error strings.
Check two things: (1) all required schema keys must be present in args, and (2) each provided value must be an instance of the expected type.
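One possible solution:

```python
def validate_tool_args(schema: dict, args: dict) -> tuple[bool, list[str]]:
    """Validate args against a {param_name: expected_type} schema."""
    errors = []
    for name, expected in schema.items():
        if name not in args:  # check 1: required key present
            errors.append(f"Missing required argument: '{name}'")
        elif not isinstance(args[name], expected):  # check 2: type matches
            errors.append(
                f"Argument '{name}' should be {expected.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    return (len(errors) == 0, errors)

print(validate_tool_args({"a": int, "b": str}, {"a": 1, "b": "x"}))  # (True, [])
```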
Tool Error Handling and Retry Logic
Tools fail. APIs time out, databases go down, rate limits get hit. The question isn't if your tools will error — it's whether your application recovers gracefully or crashes in front of the user. I've spent more time debugging error handling in tool-based systems than writing the tools themselves.
Returning Errors as Strings
The simplest error-handling pattern: catch exceptions inside your tool and return the error as a string. The LLM reads the error message and can explain the problem or try a different approach:
When the LLM calls divide(10, 0), it gets back "Error: Cannot divide by zero." instead of a Python traceback. The LLM can then tell the user: "I can't divide by zero — could you provide a different denominator?"
Using handle_tool_error
LangChain also provides a built-in handle_tool_error parameter on tools. When it is set to True, any ToolException raised inside the tool is caught and its message returned to the model as a string automatically:
If eval hits a SyntaxError or NameError on a malformed expression, the tool raises ToolException, and LangChain returns the error message to the LLM rather than crashing the agent loop. Note that handle_tool_error only intercepts ToolException — other exception types still propagate.
You can also pass a custom error handler function for more control:
Retry Logic with Fallbacks
For transient errors (timeouts, rate limits), you often want to retry before giving up. Here's a pattern using Python's tenacity library that works well with LangChain tools:
The wait_exponential strategy waits 1 second after the first failure, 2 seconds after the second, then 4 seconds — giving the external service time to recover. After 3 failed attempts, the exception propagates and the tool returns an error message.
Using Tools in LCEL Chains
Tools slot naturally into LCEL chains. A common pattern is to build a chain that takes a user question, calls the LLM with bound tools, executes any tool calls, and returns the final answer — all in a single composable pipeline.
Invoking the chain returns the executed tool results.
The pipe operator connects the LLM (which generates tool calls) to the executor (which runs them). You get a single callable chain that handles the full tool-use flow. This is a simpler alternative to using a full agent when you only need one round of tool calls.
Common Mistakes and How to Fix Them
These are the mistakes that trip up nearly every developer when they start building with LangChain tools. I have made every one of them myself.
Mistake 1: Vague Tool Descriptions
```python
@tool
def process(data: str) -> str:
    """Process the data."""
    # What does "process" mean? The LLM has no idea
    return data.upper()
```

```python
@tool
def uppercase_text(text: str) -> str:
    """Convert text to uppercase letters.

    Use when the user asks to capitalize, uppercase,
    or make text ALL CAPS.

    Args:
        text: The text to convert to uppercase.
    """
    return text.upper()
```

Mistake 2: Missing Type Annotations
Without type annotations, LangChain cannot generate an input schema. The tool either fails to register or generates a wildcard schema that accepts anything:
```python
@tool
def add(a, b):
    """Add two numbers."""
    return a + b
# Schema is empty — LLM doesn't know what to pass
```

```python
@tool
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b
# Schema: {"a": int, "b": int}
```

Mistake 3: Raising Exceptions Instead of Returning Errors
If your tool raises an unhandled exception, the entire agent loop crashes. Always catch exceptions and return error messages as strings:
```python
@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    response = requests.get(url)
    response.raise_for_status()  # Raises on 4xx/5xx
    return response.text
```

```python
@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        return f"Failed to fetch {url}: {e}"
```

Mistake 4: Tools That Return Too Much Data
Returning a 10,000-word Wikipedia article or 500 database rows as a tool result burns through tokens and can exceed context limits. Always truncate, summarize, or paginate tool output:
Write a function safe_tool_call(func, args, max_retries=2) that calls a function with the given args dict and returns the result as a string. If the function raises an exception, retry up to max_retries times. If all retries fail, return "Error after N retries: <error message>" where N is max_retries.
The function should track how many attempts were made.
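One possible solution (one initial attempt plus max_retries retries):

```python
def safe_tool_call(func, args: dict, max_retries: int = 2) -> str:
    """Call func(**args); retry on failure, return the result or an error string."""
    attempts = 0
    last_error = None
    while attempts <= max_retries:  # initial attempt + max_retries retries
        attempts += 1
        try:
            return str(func(**args))
        except Exception as e:
            last_error = e
    return f"Error after {max_retries} retries: {last_error}"

print(safe_tool_call(lambda x: x * 2, {"x": 3}))  # 6
```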
Performance Tips and Best Practices
Tool-based LLM applications have a unique performance profile: the bottleneck is almost never your Python code — it is the LLM calls and external API requests. Here is what actually matters for speed and cost.
Limit the number of tools bound to a single model. Most models handle 10-15 tools reliably. Beyond 20, routing accuracy drops noticeably. If you have 50 tools, use a two-stage approach: the first LLM picks a category, the second LLM (bound to just that category's tools) picks the specific tool.
Cache tool results aggressively. If the exchange rate for USD/EUR was fetched 30 seconds ago, do not hit the API again. Use Python's functools.lru_cache or a Redis cache with a TTL:
Use async tools for I/O-bound operations. When your chain calls multiple tools in parallel (via RunnableParallel), async tools prevent one slow API call from blocking the others:
Frequently Asked Questions
Can I use tools with models other than OpenAI?
Yes. bind_tools() works with any LangChain chat model that supports tool calling — including Anthropic Claude, Google Gemini, Mistral, and local models via Ollama. See our model switching guide for details. The interface is identical:
What is the difference between tools and function calling?
They refer to the same concept. "Function calling" is OpenAI's original term for the feature. "Tool use" is the broader term used by LangChain and Anthropic. In LangChain, both are accessed through the same bind_tools() API regardless of the underlying provider.
How do I make a tool return structured data instead of strings?
Tools can return any serializable type (dicts, lists, Pydantic models). However, when the result is passed back to the LLM as a ToolMessage, it gets serialized to a string. For inter-tool communication within an agent, return dicts. For LLM consumption, format the output as a readable string:
How many tools can I bind to one model?
There is no hard limit, but practical limits exist. OpenAI supports up to 128 tools per call. Anthropic Claude supports up to 64. However, accuracy degrades well before those limits. In practice, 5-15 well-described tools work reliably. Beyond that, consider a two-stage routing approach.
Summary
LangChain tools bridge the gap between what LLMs know and what they can do. You learned three ways to create tools (@tool, StructuredTool, and BaseTool), how to bind them to models, execute tool calls, feed results back, and handle errors with retry logic. The key takeaway: tools are just Python functions with metadata. The LLM reads the metadata, decides when to use the function, and generates the arguments.
From here, explore LangChain chains and agents for multi-step tool use, LCEL pipelines for composable chains, or LangSmith for debugging and tracing tool calls in production. If you are building RAG systems, tools pair naturally with document loaders and text splitters.