LangChain Tools: Build Custom Tools and Connect LLMs to External Services
An LLM can write a beautiful explanation of today's weather — but it has no idea what the actual temperature is. It can describe how to query a database — but it cannot run the query. The gap between knowing how and actually doing is what LangChain tools close. By the end of this tutorial, you'll know how to give an LLM the ability to call any Python function, hit any API, and return structured results — all with type safety and error handling baked in.
What Are LangChain Tools and Why Do They Matter?
A LangChain tool is a Python function wrapped in metadata that tells the LLM what the function does, what inputs it expects, and when to use it. The LLM reads this metadata, decides whether to call the tool, generates the correct arguments, and your code executes the function with those arguments.
To follow along, you'll need Python 3.10+, langchain 0.3+, and langchain-openai (pip install langchain langchain-openai langchain-community).
I think of it this way: you're giving the LLM a menu of capabilities. Each tool is a menu item with a name, a description, and an order form (the input schema). The LLM reads the menu, picks what it needs, fills out the order form, and your code does the actual work.
Here's a minimal example. We create a tool that multiplies two numbers, then bind it to a model so the LLM can decide to use it:
Running this prints something like:
Notice what happened: the LLM did not compute 17 * 28 itself. It recognized that a multiply tool exists, extracted the arguments from the natural language question, and returned a structured tool call. Your code then executes multiply(17, 28) to get the actual answer.
The power here is composition. You can give the LLM ten tools — a calculator, a weather API, a database query function, a web search — and the LLM picks the right one based on the user's question. That is the foundation of AI agents.
The @tool Decorator — The Fast Way to Create Tools
The @tool decorator is the quickest way to turn any Python function into a LangChain tool. I reach for it 90% of the time because it requires zero boilerplate — you just write a normal function with type hints and a docstring.
LangChain reads three things from your decorated function: the function name becomes the tool name, the docstring becomes the description the LLM sees, and the type annotations become the input schema. All three matter — if the docstring is vague, the LLM won't know when to use the tool.
Which produces:
The schema was generated automatically from the text: str type hint. The LLM receives this schema and knows it must provide a string argument called text.
Tools with Multiple Parameters and Defaults
Tools can accept any number of typed parameters. Optional parameters with defaults work exactly as you'd expect:
The LLM sees category as optional in the schema. If the user says "find me some books about Python," the LLM might call search_products(query="Python", category="books"). If the user just says "find me something about cooking," it might omit the category entirely.
StructuredTool — Full Control with Pydantic Schemas
The @tool decorator is convenient, but sometimes you need more control. Maybe you want to validate inputs before the function runs, add field-level descriptions that are richer than what docstrings allow, or define the tool dynamically at runtime. That's where StructuredTool comes in.
StructuredTool lets you define the input schema as a Pydantic model, giving you full validation and type coercion. Here is the same multiply tool from earlier, rebuilt with StructuredTool:
The result:
The key difference: each field in MultiplyInput has its own description, which goes straight into the JSON schema the LLM sees. For complex tools with many parameters, these per-field descriptions dramatically improve the LLM's accuracy in generating correct arguments.
When to Use @tool vs StructuredTool
With @tool, everything is inferred from the function itself:

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, sunny"
# Name, description, schema all auto-generated

With StructuredTool, you spell out each piece explicitly:

class WeatherInput(BaseModel):
    city: str = Field(description="City name, e.g. 'London'")
    units: str = Field(default="celsius", description="Temperature unit")

weather_tool = StructuredTool.from_function(
    func=get_weather_func,
    name="get_weather",
    description="Get weather for a city.",
    args_schema=WeatherInput,
)

My rule of thumb: start with @tool. Move to StructuredTool when you need input validation, per-field descriptions, or dynamic tool creation (e.g., generating tools from a config file at startup).
Binding Tools to Models and Executing Tool Calls
Creating a tool is only half the story. You need to bind it to a chat model so the LLM knows the tool exists, and then execute the tool call when the LLM requests it. This two-step dance is where most beginners get tripped up.
Running this gives:
The LLM chose get_word_count over multiply because the question was about counting words, not multiplication. It parsed the quoted text as the argument. This is the core loop of tool-augmented LLMs: ask the LLM, execute the tool, and optionally feed the result back for a final answer.
Feeding Tool Results Back to the LLM
In most real applications, you want the LLM to incorporate the tool result into a natural-language answer. To do this, you send the tool result back as a ToolMessage and invoke the model again:
The model responds with something like:
This round-trip pattern — human message, AI tool call, tool result, AI final answer — is the same loop that powers ChatGPT plugins, Claude's tool use, and every LangChain agent.
Practice exercise: Write a function dispatch_tool(tool_name, tool_map, args) that looks up a tool by name in a dictionary, calls it with the given args dictionary, and returns the result as a string. If the tool name is not found, return "Error: Tool 'X' not found" where X is the tool name.
This simulates the core of a tool execution loop in a LangChain agent.
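One possible solution, using plain callables as stand-ins for LangChain tools:

```python
def dispatch_tool(tool_name: str, tool_map: dict, args: dict) -> str:
    """Look up a tool by name and call it with the given keyword arguments."""
    if tool_name not in tool_map:
        return f"Error: Tool '{tool_name}' not found"
    result = tool_map[tool_name](**args)
    return str(result)

def multiply(a: int, b: int) -> int:
    return a * b

tools = {"multiply": multiply}
print(dispatch_tool("multiply", tools, {"a": 6, "b": 7}))  # 42
print(dispatch_tool("divide", tools, {"a": 6, "b": 7}))    # Error: Tool 'divide' not found
```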
Built-in Tools — Tavily Search, Wikipedia, and More
LangChain ships with dozens of pre-built tools so you don't have to wrap every API from scratch. The most commonly used ones connect your LLM to the internet, knowledge bases, and system utilities.
Tavily Search — Web Search for LLMs
Tavily is a search API built specifically for LLM applications. It returns clean, structured results instead of raw HTML, which means the LLM gets better context with fewer tokens. You need a Tavily API key (free tier available at tavily.com):
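A sketch using langchain-community's TavilySearchResults wrapper; the format_results helper and the query are additions here, and the search runs only when a key is set:

```python
import os

def format_results(results: list) -> str:
    """Flatten Tavily result dicts into a compact context string."""
    return "\n\n".join(f"{r['title']} ({r['url']})\n{r['content']}" for r in results)

if os.environ.get("TAVILY_API_KEY"):
    from langchain_community.tools.tavily_search import TavilySearchResults

    search = TavilySearchResults(max_results=3)
    results = search.invoke("What is the latest stable Python release?")
    print(format_results(results))
```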
Each result comes back as a dictionary with title, url, and content fields. The content is already cleaned — no HTML tags, no navigation menus.
Wikipedia — Knowledge Base Lookups
The Wikipedia tool is useful for factual lookups where you want the LLM to ground its answers in encyclopedia-quality content:
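A sketch of the standard setup (requires pip install wikipedia and network access; the query is illustrative):

```python
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# top_k_results limits how many articles come back;
# doc_content_chars_max truncates each article to control token usage.
wikipedia = WikipediaQueryRun(
    api_wrapper=WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=1000)
)

print(wikipedia.invoke("Alan Turing"))  # roughly the first 1000 characters of the article
```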
The doc_content_chars_max parameter is crucial for controlling token costs. Without it, Wikipedia articles can eat thousands of tokens per query.
Combining Multiple Tools
The real power shows when you bind several tools to a single model. The LLM picks the right tool for each question:
The LLM routes math questions to multiply, text questions to get_word_count, current-events questions to search, and factual questions to wikipedia. No routing logic on your side — the model handles it based on tool descriptions alone.
Building Real-World Tools — API Wrappers and Database Queries
The toy examples above are useful for understanding the mechanics, but production tools connect to real services. Let me walk through two patterns I use constantly: wrapping a REST API and querying a database.
Wrapping a REST API
Suppose you want your LLM to look up current exchange rates. Here's a tool that wraps a free currency API:
Three things to notice. First, the tool returns a string — even for errors. This is important because the LLM needs to read the result and incorporate it into its response. If you raise an exception, the agent loop crashes. Second, the timeout=10 prevents hanging on slow APIs. Third, the descriptive error messages help the LLM explain the failure to the user.
Querying a SQLite Database
Database tools let the LLM answer questions about your data without the user needing to know SQL. Here's a read-only database query tool:
In the database tool description, I included the exact table schema. This is not optional — the LLM needs to know the column names and types to write correct SQL. Without it, the LLM guesses column names and the queries fail.
Practice exercise: Write a function validate_tool_args(schema, args) that validates a dictionary of arguments against a schema dictionary. The schema maps parameter names to their expected Python types. The function should return a tuple of (is_valid, errors) where is_valid is a boolean and errors is a list of error strings.
Check two things: (1) all required schema keys must be present in args, and (2) each provided value must be an instance of the expected type.
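One possible solution:

```python
def validate_tool_args(schema: dict, args: dict) -> tuple:
    """Validate args against a {name: type} schema dictionary."""
    errors = []
    for name, expected_type in schema.items():
        if name not in args:
            errors.append(f"Missing required argument: '{name}'")
        elif not isinstance(args[name], expected_type):
            errors.append(
                f"Argument '{name}' expected {expected_type.__name__}, "
                f"got {type(args[name]).__name__}"
            )
    return (len(errors) == 0, errors)

schema = {"a": int, "b": int}
print(validate_tool_args(schema, {"a": 1, "b": 2}))    # (True, [])
print(validate_tool_args(schema, {"a": 1, "b": "x"}))  # (False, ["Argument 'b' expected int, got str"])
```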
Tool Error Handling and Retry Logic
Tools fail. APIs time out, databases go down, rate limits get hit. The question isn't if your tools will error — it's whether your application recovers gracefully or crashes in front of the user. I've spent more time debugging error handling in tool-based systems than writing the tools themselves.
Returning Errors as Strings
The simplest error-handling pattern: catch exceptions inside your tool and return the error as a string. The LLM reads the error message and can explain the problem or try a different approach:
When the LLM calls divide(10, 0), it gets back "Error: Cannot divide by zero." instead of a Python traceback. The LLM can then tell the user: "I can't divide by zero — could you provide a different denominator?"
Using handle_tool_error
LangChain also provides a built-in handle_tool_error parameter on tools. When set to True, any ToolException your tool raises is caught automatically and its message is returned to the LLM as a string instead of crashing the run:
If eval raises a SyntaxError or NameError, the tool wraps it in a ToolException; LangChain catches that and returns the error message to the LLM rather than crashing the agent loop.
You can also pass a custom error handler function for more control:
Retry Logic with Fallbacks
For transient errors (timeouts, rate limits), you often want to retry before giving up. Here's a pattern using Python's tenacity library that works well with LangChain tools:
The wait_exponential strategy roughly doubles the delay between attempts, giving the external service time to recover. After three failed attempts, the final exception propagates out of the retry wrapper, and the tool catches it and returns an error message.
Using Tools in LCEL Chains
Tools slot naturally into LCEL chains. A common pattern is to build a chain that takes a user question, calls the LLM with bound tools, executes any tool calls, and returns the final answer — all in a single composable pipeline.
This outputs:
The pipe operator connects the LLM (which generates tool calls) to the executor (which runs them). You get a single callable chain that handles the full tool-use flow. This is a simpler alternative to using a full agent when you only need one round of tool calls.
Common Mistakes and How to Fix Them
After building dozens of tool-based systems, these are the mistakes I see most often — and I've made every one of them myself.
Mistake 1: Vague Tool Descriptions
The vague version gives the LLM nothing to route on:

@tool
def process(data: str) -> str:
    """Process the data."""
    # What does "process" mean? The LLM has no idea
    return data.upper()

A specific name and description fix it:

@tool
def uppercase_text(text: str) -> str:
    """Convert text to uppercase letters.

    Use when the user asks to capitalize, uppercase,
    or make text ALL CAPS.

    Args:
        text: The text to convert to uppercase.
    """
    return text.upper()

Mistake 2: Missing Type Annotations
Without type annotations, LangChain cannot generate an input schema. The tool either fails to register or generates a wildcard schema that accepts anything:
Without annotations:

@tool
def add(a, b):
    """Add two numbers."""
    return a + b
# Schema is empty — LLM doesn't know what to pass

With annotations:

@tool
def add(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b
# Schema: {"a": int, "b": int}

Mistake 3: Raising Exceptions Instead of Returning Errors
If your tool raises an unhandled exception, the entire agent loop crashes. Always catch exceptions and return error messages as strings:
The fragile version crashes the agent loop on any HTTP error:

@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    response = requests.get(url)
    response.raise_for_status()  # Raises on 4xx/5xx
    return response.text

The robust version returns errors as strings:

import requests

@tool
def fetch_data(url: str) -> str:
    """Fetch data from a URL."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        return f"Failed to fetch {url}: {e}"

Mistake 4: Tools That Return Too Much Data
Returning a 10,000-word Wikipedia article or 500 database rows as a tool result burns through tokens and can exceed context limits. Always truncate, summarize, or paginate tool output:
Practice exercise: Write a function safe_tool_call(func, args, max_retries=2) that calls a function with the given args dict and returns the result as a string. If the function raises an exception, retry up to max_retries times. If all retries fail, return "Error after N retries: <error message>" where N is max_retries.
The function should track how many attempts were made.
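One possible solution (reading the spec as one initial attempt plus max_retries retries):

```python
def safe_tool_call(func, args: dict, max_retries: int = 2) -> str:
    """Call func(**args), retrying on failure; always return a string."""
    last_error = None
    for attempt in range(1 + max_retries):  # one initial try + max_retries retries
        try:
            return str(func(**args))
        except Exception as e:
            last_error = e
    return f"Error after {max_retries} retries: {last_error}"

calls = {"count": 0}

def flaky(x: int) -> int:
    """Fails on the first call, succeeds afterwards."""
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("transient failure")
    return x * 2

print(safe_tool_call(flaky, {"x": 21}))   # 42
print(safe_tool_call(lambda: 1 / 0, {}))  # Error after 2 retries: division by zero
```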
Performance Tips and Best Practices
Tool-based LLM applications have a unique performance profile: the bottleneck is almost never your Python code — it's the LLM calls and external API requests. Here's what actually matters for speed and cost.
Keep tool descriptions short but precise. Every character in your tool descriptions and schemas is a token that gets sent with every LLM call. If you have 10 tools with 200-word descriptions each, that's 2,000 words of system-prompt overhead on every request. Aim for 2-3 sentences per tool.
Limit the number of tools bound to a single model. In my experience, GPT-4o handles 10-15 tools well. Beyond 20, the model starts making routing mistakes. If you have 50 tools, group them into categories and use a two-stage approach: the first LLM picks the category, the second LLM (bound to just that category's tools) picks the specific tool.
Cache tool results aggressively. If the exchange rate for USD/EUR was fetched 30 seconds ago, don't hit the API again. Use Python's functools.lru_cache or a Redis cache with a TTL:
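Note that functools.lru_cache alone has no expiry, so a TTL needs a small wrapper. A minimal in-process sketch (for multi-process deployments, a Redis cache with a TTL plays the same role; the fixed rate is placeholder data):

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results for a limited time."""
    def decorator(func):
        cache: dict = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                value, stored_at = cache[args]
                if now - stored_at < seconds:
                    return value  # fresh enough, skip the API call
            value = func(*args)
            cache[args] = (value, now)
            return value
        return wrapper
    return decorator

calls = {"n": 0}

@ttl_cache(seconds=60)
def get_rate(base: str, target: str) -> float:
    calls["n"] += 1  # stand-in for a real API request
    return 0.92

get_rate("USD", "EUR")
get_rate("USD", "EUR")  # served from cache, no second "API call"
print(calls["n"])       # 1
```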
Use async tools for I/O-bound operations. When your chain calls multiple tools in parallel (via RunnableParallel), async tools prevent one slow API call from blocking the others:
Frequently Asked Questions
Can I use tools with models other than OpenAI?
Yes. bind_tools() works with any LangChain chat model that supports tool calling — including Anthropic Claude, Google Gemini, Mistral, and local models via Ollama. The interface is identical:
What is the difference between tools and function calling?
They refer to the same concept. "Function calling" is OpenAI's original term for the feature. "Tool use" is the broader term used by LangChain and Anthropic. In LangChain, both are accessed through the same bind_tools() API regardless of the underlying provider.
How do I make a tool return structured data instead of strings?
Tools can return any serializable type (dicts, lists, Pydantic models). However, when the result is passed back to the LLM as a ToolMessage, it gets serialized to a string. For inter-tool communication within an agent, return dicts. For LLM consumption, format the output as a readable string:
How many tools can I bind to one model?
There is no hard limit, but practical limits exist. OpenAI supports up to 128 tools per call. Anthropic Claude supports up to 64. However, accuracy degrades well before those limits. In practice, 5-15 well-described tools work reliably. Beyond that, consider a two-stage routing approach.
Summary
LangChain tools bridge the gap between what LLMs know and what they can do. You learned how to create tools with the @tool decorator and StructuredTool, bind them to models, execute tool calls, feed results back, and handle errors with retry logic. The key takeaway: tools are just Python functions with metadata. The LLM reads the metadata, decides when to use the function, and generates the arguments. Your code handles everything else.
From here, the natural next step is agents — systems where the LLM can call tools in a loop, making multiple tool calls to answer a single question. That is covered in the LangChain Chains tutorial.