ReAct Prompting — Build a Reasoning + Action Agent
You've seen chain-of-thought prompting push an LLM to think step by step. But thinking isn't enough. When the model needs a fact it doesn't know — today's weather, a Wikipedia article, a calculation result — it hallucinates or stops.
ReAct solves this with a loop: think, act (call a tool), observe the result, then think again. In this tutorial, you'll build that loop from scratch in pure Python. No LangChain. No frameworks. Just you, an LLM, and a few functions.
What Is ReAct? The Thought-Action-Observation Loop
Ask an LLM to summarize a Wikipedia article it has never seen. It will produce a confident, detailed summary — and half the facts will be wrong. The model reasons about the topic but has no way to verify anything against the real world.
That gap between reasoning and reality is exactly what ReAct closes. ReAct stands for Reasoning + Acting. The paper by Yao et al. (2022) showed that LLMs perform dramatically better when they alternate between thinking and taking actions. Instead of generating one big answer, the model follows a repeating cycle: Thought → Action → Observation, again and again, until it reaches a final answer.
Each Thought is the model reasoning about what it knows and what it still needs. Each Action calls a real function — a search engine, a calculator, a database lookup. Each Observation is the raw result that feeds back into the next thought.
The mental model is simple: a ReAct agent is a while loop where the LLM is the brain and your Python functions are its hands. The brain decides what to do, the hands do it, and the brain processes the result.
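That mental model fits in a dozen lines. The sketch below is illustrative, not the tutorial's final implementation: `llm` stands in for any function that turns the conversation so far into a parsed `{"action", "input"}` decision.

```python
def react_skeleton(question, llm, tools, max_steps=5):
    """Minimal shape of a ReAct agent: the LLM decides, the tools execute."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm("\n".join(history))                 # the brain: Thought + Action
        if step["action"] == "finish":
            return step["input"]                       # final answer ends the loop
        result = tools[step["action"]](step["input"])  # the hands: run the tool
        history.append(f"Observation: {result}")       # feed the result back
    return "Stopped: reached max_steps"
```

The rest of this tutorial fills in each placeholder: real tools, a real prompt, a real parser, and a real API call.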
Setup and Tool Definitions
A ReAct agent is only as useful as its tools. Each tool is a plain Python function that takes a string input and returns a string output. The three tools below cover math, string manipulation, and fact lookup — deliberately simple so you can focus on the loop itself.
We need a registry so the agent can discover which tools exist. The registry is a dictionary mapping each tool name to its function and a natural-language description the LLM reads to decide when to call it.
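A sketch of what those three tools and the registry could look like. The fact table, tool names, and descriptions are illustrative stand-ins; `eval` is acceptable for a demo but should never see untrusted input in production.

```python
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression like '2 * 3'."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

def string_length(text: str) -> str:
    """Return the number of characters in the input string."""
    return str(len(text))

FACTS = {  # tiny hardcoded lookup table for the demo
    "speed of light": "The speed of light is 299,792,458 m/s.",
}

def knowledge_base(query: str) -> str:
    """Look up a stored fact by substring match against the keys."""
    for key, fact in FACTS.items():
        if key in query.lower():
            return fact
    return f"No fact found for: {query}"

TOOLS = {
    "calculator": {
        "function": calculator,
        "description": "Evaluates an arithmetic expression, e.g. '2 * 3'.",
    },
    "string_length": {
        "function": string_length,
        "description": "Returns the number of characters in a string.",
    },
    "knowledge_base": {
        "function": knowledge_base,
        "description": "Looks up a stored fact, e.g. 'speed of light'.",
    },
}
```

Each entry pairs the callable with the description the LLM will read, so the registry is the single source of truth for what the agent can do.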
The ReAct Prompt Template
The system prompt is the most critical piece. If it does not enforce the output format strictly, the model will answer questions directly instead of using tools. The template below injects tool descriptions dynamically, so adding a new tool to the registry updates the prompt automatically.
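One way to write that template builder; the exact prompt wording here is an assumption, but the shape (tool descriptions injected into a strict format spec) is what the tutorial describes.

```python
def build_react_prompt(tools: dict) -> str:
    """Render the system prompt with the current tool descriptions injected."""
    tool_lines = "\n".join(
        f"- {name}: {spec['description']}" for name, spec in tools.items()
    )
    return f"""Answer the question by alternating Thought, Action, and Observation.

Available tools:
{tool_lines}

Use EXACTLY this format on every turn:
Thought: <your reasoning about what to do next>
Action: <one tool name from the list above, or 'finish'>
Action Input: <the input to pass to the tool, or your final answer>

After each Action you will receive an Observation. When you know the
final answer, respond with Action: finish."""
```

Because the tool list is rendered at call time, registering a new tool changes the prompt with no other code edits.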
Parsing the Agent's Output
The parser extracts three fields from the model's free-text response: thought (the reasoning), action (which tool to call), and action_input (the argument). Each regex targets one labeled line. The function returns a dictionary with all three values, defaulting to empty strings if a field is missing.
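A minimal version of that parser, assuming the labeled-line format from the prompt template:

```python
import re

def parse_response(text: str) -> dict:
    """Extract Thought / Action / Action Input from the model's free text.
    Missing fields default to empty strings."""
    def grab(label: str) -> str:
        m = re.search(rf"{label}:\s*(.*)", text)
        return m.group(1).strip() if m else ""
    return {
        "thought": grab("Thought"),
        "action": grab("Action"),
        "action_input": grab("Action Input"),
    }
```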
This regex-based approach handles about 95% of model outputs cleanly. The remaining 5% — extra whitespace, commentary after the Action Input — is where production agents add retry logic.
Building the ReAct Loop
Everything we've built so far — tools, registry, prompt, parser — comes together in one async function. The loop works like this: on each iteration, the LLM generates a response, the parser extracts an action, and the tool result is fed back as an observation. Messages accumulate in a list, giving the model its full reasoning history.
Three details matter here. First, temperature=0.0 keeps tool selection deterministic. Second, observations go in as "user" messages (not "assistant") so the model treats them as new information rather than its own prior output. Third, the finish action exits the loop and returns the final answer.
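The loop can be sketched as follows. One assumption for testability: the model call is injected as an async `llm` callable that takes the message list and returns the response text; in the tutorial this would wrap the actual chat-completion API call with temperature=0.0.

```python
import asyncio
import re

def _parse(text: str) -> dict:
    """Minimal copy of the parser from the previous section."""
    def grab(label):
        m = re.search(rf"{label}:\s*(.*)", text)
        return m.group(1).strip() if m else ""
    return {"thought": grab("Thought"), "action": grab("Action"),
            "action_input": grab("Action Input")}

async def react_agent(question, tools, system_prompt, llm, max_steps=5):
    """The ReAct loop: generate, parse, act, observe, repeat."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Question: {question}"},
    ]
    for _ in range(max_steps):
        reply = await llm(messages)
        messages.append({"role": "assistant", "content": reply})
        parsed = _parse(reply)
        if parsed["action"] == "finish":
            return parsed["action_input"]       # final answer exits the loop
        if parsed["action"] in tools:
            observation = tools[parsed["action"]]["function"](parsed["action_input"])
        else:
            observation = (f"Error: Unknown tool '{parsed['action']}'. "
                           f"Available tools: {list(tools)}")
        # Observations go back as *user* messages so the model treats them
        # as new information rather than its own prior output.
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: reached max_steps without a final answer."
```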
The Agent in Action
Start with a single-tool query — a calculation the model shouldn't attempt in its head:
One thought-action-observation cycle and done. The real power emerges with multi-step queries that chain tools together:
Watch the trace carefully. The agent first looks up the speed of light, extracts the number from the observation, then asks the calculator to multiply. Two separate tool calls, each informed by the previous result. The model does what you'd do manually — look something up, then compute with it.
Handling Errors and Edge Cases
Production agents break in predictable ways. The three most common failure modes: the model invents a tool name, the tool throws an error, or the model loops on the same action. Our agent already handles the first two — let's verify:
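The unknown-tool branch can be exercised in isolation. The registry contents here are illustrative; the point is that the dispatch code turns a bad tool name into an observation rather than an exception:

```python
# A hallucinated tool name becomes an error string the model can read.
tools = {"calculator": {"function": str, "description": "math"}}
action, action_input = "web_search", "python creator"  # invented by the model
if action in tools:
    observation = tools[action]["function"](action_input)
else:
    observation = f"Error: Unknown tool '{action}'. Available tools: {list(tools)}"
```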
The agent tried a nonexistent tool, got an error listing the available options, and either retried with a valid tool or gave up gracefully. The error message itself becomes an observation the model can reason about.
Exercise: Create a function called word_count that takes a string and returns the number of words in it (words are separated by whitespace). Then create a dictionary called word_count_tool with keys "function" (pointing to your function) and "description" (a string explaining what the tool does).
Example: word_count("Hello world foo bar") should return "4".
Building a Wikipedia Research Agent
Five hardcoded facts won't cut it for real research. This section builds a two-tool Wikipedia system: wiki_search finds article titles by keyword matching, and wiki_lookup retrieves the summary of a specific article. The pattern mirrors real search engines — first discover, then read.
The mock database below stores four articles with summaries and section lists. The wiki_search function uses substring matching to find relevant titles, while wiki_lookup does a case-insensitive exact match with a fuzzy fallback. Both return descriptive strings the agent can parse.
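A sketch of that two-tool system. The article titles, summaries, and section lists below are hypothetical stand-ins for the tutorial's four articles, chosen to match the examples discussed later (Python, Guido van Rossum, ReAct):

```python
# Mock article store; keys are lowercase titles.
WIKI_ARTICLES = {
    "python (programming language)": {
        "summary": "Python is a high-level language created by Guido van Rossum.",
        "sections": ["History", "Design philosophy", "Syntax"],
    },
    "guido van rossum": {
        "summary": "Guido van Rossum created Python and served as its BDFL.",
        "sections": ["Early life", "Python", "BDFL"],
    },
    "react (paper)": {
        "summary": "ReAct interleaves reasoning traces and actions in LLMs.",
        "sections": ["Method", "Results"],
    },
    "large language model": {
        "summary": "LLMs are neural networks trained on large text corpora.",
        "sections": ["Architecture", "Training"],
    },
}

def wiki_search(query: str) -> str:
    """Discover step: find article titles containing any word of the query."""
    words = query.lower().split()
    hits = [t for t in WIKI_ARTICLES if any(w in t for w in words)]
    if not hits:
        return f"No articles found for: {query}"
    return "Found articles: " + "; ".join(hits)

def wiki_lookup(title: str) -> str:
    """Read step: return one article's summary, with a fuzzy fallback."""
    key = title.lower().strip()
    if key in WIKI_ARTICLES:
        return WIKI_ARTICLES[key]["summary"]
    for t in WIKI_ARTICLES:            # fuzzy fallback: substring match
        if key in t or t in key:
            return WIKI_ARTICLES[t]["summary"]
    return f"Article not found: {title}"

WIKI_TOOLS = {
    "wiki_search": {
        "function": wiki_search,
        "description": "Searches article titles by keyword; returns matching titles.",
    },
    "wiki_lookup": {
        "function": wiki_lookup,
        "description": "Returns the summary of one article, given its exact title.",
    },
    # the calculator entry from the first registry is reused unchanged
}
```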
With the Wikipedia tools defined, we create a new tool registry that swaps knowledge_base and string_length for wiki_search and wiki_lookup while keeping calculator. The build_react_prompt function generates a fresh system prompt from the new descriptions. The same react_agent function handles execution — the only difference is which tools and prompt we pass in.
This is the payoff of making react_agent parameterized. Same loop, different tools and prompt. The agent searches for Python, finds Guido van Rossum, looks up his article, extracts the BDFL detail, and delivers a grounded answer.
The agent needs at least two lookups — the ReAct article and the Python article — then synthesizes both into a single answer. Each observation narrows what it still needs.
Exercise: Create a function called wiki_section_count that takes a Wikipedia article title and returns how many sections that article has. Use the WIKI_ARTICLES dictionary defined above.
If the article exists, return "The article '<title>' has <N> sections." (with the title lowercased). If not, return "Article not found: <title>" (also lowercased).
Remember: keys in WIKI_ARTICLES are all lowercase.
Agent Traces — Debugging Your Agent's Reasoning
When an agent gives a wrong answer, you need to know exactly where its reasoning went off track. Was it a bad thought? A wrong tool choice? A misinterpreted observation? The most useful debugging tool is a structured trace — a dictionary capturing every step, every tool call, and the total token count.
The traced_agent function below mirrors react_agent but instead of printing, it builds a trace dictionary. Each step records the thought, action, action input, and observation. The trace also accumulates total_tokens from the API response, so you can see the cost of each run.
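A sketch of `traced_agent`, under the same assumption as before: the model call is an injected async `llm` callable, here returning a `(text, tokens_used)` pair so the trace can accumulate cost.

```python
import asyncio
import re

async def traced_agent(question, tools, system_prompt, llm, max_steps=5):
    """Same loop as react_agent, but builds a structured trace instead of
    printing. Each step records thought, action, input, and observation."""
    def parse(text):
        def grab(label):
            m = re.search(rf"{label}:\s*(.*)", text)
            return m.group(1).strip() if m else ""
        return grab("Thought"), grab("Action"), grab("Action Input")

    trace = {"question": question, "steps": [], "total_tokens": 0, "answer": None}
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Question: {question}"}]
    for _ in range(max_steps):
        reply, tokens = await llm(messages)
        trace["total_tokens"] += tokens          # running cost of the whole run
        messages.append({"role": "assistant", "content": reply})
        thought, action, action_input = parse(reply)
        if action == "finish":
            trace["answer"] = action_input
            break
        observation = (tools[action]["function"](action_input)
                       if action in tools else f"Error: Unknown tool '{action}'")
        trace["steps"].append({"thought": thought, "action": action,
                               "action_input": action_input,
                               "observation": observation})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return trace
```

Because the trace is plain data, it can be dumped to JSON, diffed across runs, or fed to an evaluation script.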
Tracking Token Costs Per Agent Run
Each ReAct step requires a full API call. A 5-step agent run sends the growing message list 5 times, and later calls include all previous thoughts and observations. Token usage grows quadratically with step count because each call re-sends the entire conversation history.
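A quick back-of-envelope makes the growth concrete. The per-step sizes below are assumed round numbers, not measurements:

```python
# Every call re-sends the whole history, so prompt size grows each step.
base = 400    # assumed tokens for system prompt + question
chunk = 200   # assumed tokens added per step (thought + observation)
total = 0
for step in range(1, 6):
    call_tokens = base + chunk * (step - 1)  # history re-sent on this call
    total += call_tokens
    print(f"Step {step}: {call_tokens} prompt tokens (running total {total})")
# Total over n steps is base*n + chunk*n*(n-1)/2, i.e. O(n^2) in step count.
```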
```python
# Agent runs with no limit
result = await react_agent(question)
# Could burn tokens for 10+ steps
```

```python
# Cap steps and track spend
trace = await traced_agent(
    question, tools, prompt,
    max_steps=5,  # Hard limit
)
cost = trace["total_tokens"] / 1e6 * 0.15
print(f"Run cost: ${cost:.4f}")
```

Five ReAct Pitfalls and How to Fix Them
The same failure patterns appear across every ReAct implementation. Here are the top five with concrete fixes:
2. Tool Name Hallucination. The agent invents tools like web_search or python_executor. The error message listing available tools usually corrects this. For stubborn cases, bold the tool names in the system prompt.
```python
# Tool returns whatever it gets
def wiki_lookup(title):
    return full_article_text  # Could be 10,000+ chars
```

```python
# Tool truncates long output
def wiki_lookup(title, max_chars=500):
    text = full_article_text
    if len(text) > max_chars:
        return text[:max_chars] + "... [truncated]"
    return text
```

Exercise: Create a function called safe_execute that takes three arguments: tool_name (str), tool_input (str), and tools_dict (dict). It should:
1. If tool_name is not in tools_dict, return "Error: Unknown tool '<tool_name>'" (with the actual tool name).
2. Call the tool function with tool_input.
3. If the result is longer than 300 characters, truncate to 300 characters and append " [truncated]".
4. Return the (possibly truncated) result.
ReAct vs Chain-of-Thought vs Function Calling
You now have three approaches to LLM reasoning in your toolkit. Picking the right one depends on whether you need reasoning, tool access, or both.
My rule of thumb: if the task needs only reasoning, use CoT. If it needs tools but not complex reasoning, use function calling. If it needs both — looking things up, computing, then reasoning about results — use ReAct.
Summary and Next Steps
You built a ReAct agent from scratch in pure Python — no frameworks required. Here's what you covered: plain-function tools with a registry, a strict prompt template that enforces the output format, a regex parser, the async Thought-Action-Observation loop, error handling for bad tool calls, a two-tool Wikipedia research agent, structured traces for debugging, and token-cost tracking with every run capped by a max_steps ceiling.

Frequently Asked Questions
Can ReAct agents use more than 10 tools?
Yes, but there's a tradeoff. More tools means a longer system prompt consuming more of the context window. Agents start struggling to pick the right tool above 15-20 options. For large toolsets, use a two-stage approach: one agent picks the category, a second picks the specific tool.
How do I add memory across conversations?
The agent we built is stateless — each call starts fresh. To add memory, persist traces to a database and inject a summary of previous interactions into the system prompt. The chatbot with memory tutorial walks through this pattern step by step.
Is ReAct the same as LangChain's agents?
LangChain's AgentExecutor implements a very similar loop with additional abstractions for tool management and memory. Understanding the raw loop — which you just built — makes debugging LangChain agents much easier.
What models work best for ReAct?
Any model that reliably follows formatting instructions. GPT-4o and GPT-4o-mini are excellent. Claude 3.5 Sonnet works well. Smaller open-source models (7B parameters or less) often struggle with strict output formats. For small models, simplify to Action: tool_name(input) on a single line.