
ReAct Prompting: Build a Reasoning + Action Agent from Scratch (No Frameworks)

Intermediate · 90 min · 3 exercises · 50 XP

You've seen chain-of-thought prompting push an LLM to think step by step. But thinking isn't enough. When the model needs a fact it doesn't know — today's weather, a Wikipedia article, a calculation result — it hallucinates or stops. ReAct solves this with a loop: think, act (call a tool), observe the result, then think again. In this tutorial, you'll build that loop from scratch in pure Python. No LangChain. No frameworks. Just you, an LLM, and a few functions.

What Is ReAct? The Thought-Action-Observation Loop

I remember the first time I tried to build an LLM-powered research assistant. I asked it to summarize a Wikipedia article. It produced a confident, detailed summary — and half the facts were wrong. The model was reasoning about the topic but had no way to verify anything.

That gap between reasoning and reality is exactly what ReAct closes. ReAct stands for Reasoning + Acting. The paper by Yao et al. (2022) showed that LLMs perform dramatically better when they alternate between thinking and taking actions. Instead of generating one big answer, the model follows a repeating cycle:

The ReAct loop — conceptual overview
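A single turn of the loop, rendered in the format the agent will emit (the question, tool name, and values here are illustrative):

```
Question: How many people live in France?
Thought: I don't know the current population. I should look it up.
Action: search
Action Input: population of France
Observation: 68.2 million (2024 estimate)
Thought: I now have the fact I need.
Action: finish
Action Input: France has roughly 68 million inhabitants.
```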

Each Thought is the model reasoning about what it knows and what it still needs. Each Action calls a real function — a search engine, a calculator, a database lookup. Each Observation is the raw result that feeds back into the next thought.

The mental model is simple: a ReAct agent is a while loop where the LLM is the brain and your Python functions are its hands. The brain decides what to do, the hands do it, and the brain processes the result.

Setup and Tool Definitions

Install and configure

A ReAct agent is only as useful as its tools. Each tool is a plain Python function that takes a string input and returns a string output. I'm keeping these deliberately simple so you can focus on the loop itself, not tool complexity.

Three simple tools
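One plausible trio, matching the tools used later in this tutorial: a calculator, a lookup over a handful of hardcoded facts, and a character counter. The specific facts and names here are stand-ins:

```python
def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression and return the result as a string."""
    try:
        # eval() is acceptable for a toy agent; never use it on untrusted input
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

# A handful of hardcoded facts standing in for a real knowledge source
FACTS = {
    "speed of light": "The speed of light is 299792458 m/s.",
    "boiling point of water": "Water boils at 100 degrees Celsius at sea level.",
    "python release year": "Python was first released in 1991.",
    "earth radius": "Earth's mean radius is about 6371 km.",
    "avogadro": "Avogadro's number is about 6.022e23.",
}

def lookup(topic: str) -> str:
    """Return a stored fact whose key appears in the query, if any."""
    t = topic.lower()
    for key, fact in FACTS.items():
        if key in t:
            return fact
    return f"No fact found for: {topic}"

def character_count(text: str) -> str:
    """Count the characters in a string."""
    return str(len(text))
```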

We need a registry so the agent can discover which tools exist. This is just a dictionary mapping names to functions and descriptions.

Tool registry
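A sketch of that registry, with a small dispatch helper to keep the loop code clean. Minimal stand-ins for the tools are included so the snippet runs on its own; descriptions are mine:

```python
# Minimal stand-ins for the tools defined earlier
def calculate(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}, {}))

def character_count(text: str) -> str:
    return str(len(text))

# Registry: tool name -> function + human-readable description
TOOLS = {
    "calculate": {
        "function": calculate,
        "description": "Evaluate an arithmetic expression, e.g. '847 * 312'.",
    },
    "character_count": {
        "function": character_count,
        "description": "Count the characters in a string.",
    },
}

def run_tool(name: str, tool_input: str) -> str:
    """Dispatch a tool call by name; unknown names become error observations."""
    if name not in TOOLS:
        return f"Error: Unknown tool '{name}'. Available: {', '.join(TOOLS)}"
    return TOOLS[name]["function"](tool_input)
```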

The ReAct Prompt Template

When I first tried building a ReAct agent, I spent an hour wondering why the model kept answering questions directly instead of using tools. The problem was my system prompt — it didn't enforce the output format strictly enough. This prompt template is the result of that debugging:

Reusable system prompt builder
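One way to write that builder. The exact wording is my sketch; what matters is the dynamically injected tool list and the rigid Thought/Action/Action Input format:

```python
def build_system_prompt(tools: dict) -> str:
    """Build a ReAct system prompt with tool descriptions injected dynamically."""
    tool_lines = "\n".join(
        f"- {name}: {spec['description']}" for name, spec in tools.items()
    )
    return f"""You are an agent that answers questions by reasoning and using tools.

Available tools:
{tool_lines}

Use EXACTLY this format, one field per line:
Thought: <what you know and what you still need>
Action: <a tool name from the list, or 'finish'>
Action Input: <the input to pass the tool, or your final answer if finishing>

After every Action you will be given an Observation. Never invent Observations.
When you can answer the question, use Action: finish."""
```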

The tool descriptions are injected dynamically — add a new tool to the registry and the prompt updates automatically. The format is rigid on purpose: keeping Action: and Action Input: on separate lines makes parsing reliable. The finish action signals that the agent is done.

Parsing the Agent's Output

Action parser
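A regex-based parser along the lines the text describes; this particular implementation is my sketch:

```python
import re

def parse_action(text: str):
    """Pull (action, action_input) out of the model's reply, or (None, None)."""
    action_match = re.search(r"^Action:\s*(.+)$", text, re.MULTILINE)
    input_match = re.search(r"^Action Input:\s*(.+)$", text, re.MULTILINE)
    if action_match and input_match:
        return action_match.group(1).strip(), input_match.group(1).strip()
    return None, None
```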

I've found this regex-based approach handles about 95% of model outputs cleanly. The remaining 5% — extra whitespace, commentary after the Action Input — is where production agents add retry logic. For learning, this parser is solid.

Building the ReAct Loop

Everything we've built so far — tools, registry, prompt, parser — comes together in one function. This is the engine of the entire agent:

The complete ReAct agent loop
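A sketch of that engine. I'm passing the LLM caller in as a parameter (dependency injection) so the loop is easy to test with a scripted stand-in; the parser is repeated here so the snippet stands alone:

```python
import re

def parse_action(text):
    """Extract (action, action_input) from a model reply."""
    a = re.search(r"^Action:\s*(.+)$", text, re.MULTILINE)
    i = re.search(r"^Action Input:\s*(.+)$", text, re.MULTILINE)
    return (a.group(1).strip(), i.group(1).strip()) if a and i else (None, None)

def react_agent(question, llm, tools, system_prompt, max_steps=8):
    """Run the Thought-Action-Observation loop until 'finish' or max_steps."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
    for _ in range(max_steps):
        reply = llm(messages)  # the brain thinks
        messages.append({"role": "assistant", "content": reply})
        action, action_input = parse_action(reply)
        if action is None:
            observation = "Could not parse an Action. Use the required format."
        elif action == "finish":
            return action_input  # the agent is done
        elif action not in tools:
            observation = (f"Error: Unknown tool '{action}'. "
                           f"Available: {', '.join(tools)}")
        else:
            try:
                observation = tools[action]["function"](action_input)  # the hands act
            except Exception as e:
                observation = f"Error: {e}"
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: reached max_steps without a final answer."
```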

A few design decisions worth noting. We use temperature=0.0 because tool-calling agents should be as consistent as possible — you don't want randomness in choosing which tool to call. The max_steps parameter prevents infinite loops. And the function accepts custom tools and prompts, so we can reuse it later with different configurations.

The Agent in Action

Let's start with a single-tool query — a calculation the model shouldn't attempt in its head:

Single-tool calculation
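Live model output varies run to run, so here is an illustrative trace of what that single-tool run looks like:

```
Question: What is 847 * 312 + 9948?
Thought: This is arithmetic; I should not do it in my head.
Action: calculate
Action Input: 847 * 312 + 9948
Observation: 274212
Thought: The calculator gave me the result.
Action: finish
Action Input: 847 * 312 + 9948 = 274212
```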

One thought-action-observation cycle and done. The real power emerges with multi-step queries that chain tools together:

Multi-step: knowledge lookup + calculation
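An illustrative trace of the chained run (tool names follow the sketches in this tutorial):

```
Question: How many meters does light travel in 60 seconds?
Thought: I need the speed of light first.
Action: lookup
Action Input: speed of light
Observation: The speed of light is 299792458 m/s.
Thought: Now multiply that speed by 60 seconds.
Action: calculate
Action Input: 299792458 * 60
Observation: 17987547480
Thought: I have the distance.
Action: finish
Action Input: Light travels 17,987,547,480 meters in 60 seconds.
```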

Watch the trace carefully. The agent first looks up the speed of light, extracts the number from the observation, then asks the calculator to multiply. Two separate tool calls, each informed by the previous result. This is what I find most elegant about ReAct — the model does what you'd do manually. Look something up, then compute with it.

Three-tool chain
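An illustrative three-tool trace — lookup, then character counting, then division:

```
Question: Take the stored fact about the speed of light, count its characters, then divide that count by 2.
Thought: First I need the fact itself.
Action: lookup
Action Input: speed of light
Observation: The speed of light is 299792458 m/s.
Thought: Now count the characters in that fact.
Action: character_count
Action Input: The speed of light is 299792458 m/s.
Observation: 36
Thought: Finally, divide 36 by 2.
Action: calculate
Action Input: 36 / 2
Observation: 18.0
Thought: All three steps are done.
Action: finish
Action Input: The fact has 36 characters, and half of that is 18.
```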

Three tools, three observations, one coherent answer. The model delegates character counting and division to the right tools instead of attempting them in its head — a habit that prevents subtle errors.

Handling Errors and Edge Cases

Production agents break in predictable ways. I've seen three failure modes more than any others: the model invents a tool name, the tool throws an error, or the model loops on the same action. Our agent already handles the first two — let's verify:

Unknown tool recovery
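An illustrative recovery trace — the model invents a tool, then corrects itself from the error observation:

```
Question: What is 15% of 2400?
Thought: I'll use a percentage tool.
Action: percentage
Action Input: 15% of 2400
Observation: Error: Unknown tool 'percentage'. Available: calculate, lookup, character_count
Thought: There is no percentage tool, but the calculator can do this.
Action: calculate
Action Input: 2400 * 0.15
Observation: 360.0
Thought: That recovered the calculation.
Action: finish
Action Input: 15% of 2400 is 360.
```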

The agent tried a nonexistent tool, got an error listing the available options, and either retried with a valid tool or gave up gracefully. The error message itself becomes an observation the model can reason about.

Tool error recovery
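An illustrative trace where the tool itself fails and the error becomes an observation:

```
Question: What is 10 divided by 0?
Thought: Use the calculator.
Action: calculate
Action Input: 10 / 0
Observation: Error: division by zero
Thought: The tool failed because division by zero is undefined. I can answer directly.
Action: finish
Action Input: 10 divided by 0 is undefined; division by zero has no result.
```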

Add a Word Count Tool to the Agent
Write Code

Create a function called word_count that takes a string and returns the number of words in it as a string (words are separated by whitespace). Then create a dictionary called word_count_tool with keys "function" (pointing to your function) and "description" (a string explaining what the tool does).

Example: word_count("Hello world foo bar") should return "4".


Building a Wikipedia Research Agent

Five hardcoded facts won't cut it for real research. Let's upgrade to a mock Wikipedia API that simulates what a real lookup returns. I'm using mock data because browser-based code has network restrictions — but the ReAct loop works identically with real or mock tools.

Mock Wikipedia tools
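A sketch of the mock API. The article contents here are hypothetical stand-ins; a real agent would hit the Wikipedia API instead. Note the lowercase keys and the "sections" lists, which the exercise below relies on:

```python
# Hypothetical mock data; keys are lowercase on purpose
WIKI_ARTICLES = {
    "python (programming language)": {
        "summary": "Python is a high-level language created by Guido van Rossum "
                   "and first released in 1991.",
        "sections": ["History", "Design philosophy", "Syntax", "Libraries"],
    },
    "guido van rossum": {
        "summary": "Guido van Rossum is the creator of Python and was its "
                   "'Benevolent Dictator For Life' (BDFL) until 2018.",
        "sections": ["Early life", "Python", "Later career"],
    },
    "react (paper)": {
        "summary": "ReAct is a prompting strategy by Yao et al. (2022) that "
                   "interleaves reasoning traces with tool actions.",
        "sections": ["Method", "Benchmarks", "Impact"],
    },
}

def wiki_search(query: str) -> str:
    """Return titles of articles whose title or summary mentions the query."""
    q = query.lower()
    hits = [title for title, art in WIKI_ARTICLES.items()
            if q in title or q in art["summary"].lower()]
    return ("Articles found: " + "; ".join(hits)) if hits else f"No results for: {query}"

def wiki_lookup(title: str) -> str:
    """Return the summary of one article by (lowercased) title."""
    article = WIKI_ARTICLES.get(title.lower().strip())
    if article is None:
        return f"Article not found: {title}"
    return article["summary"]
```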

Two tools working together: wiki_search finds articles and wiki_lookup reads a specific one. This mirrors real search — first find, then read. Now let's wire them into the agent:

Research agent — reusing react_agent
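With the wiki tools registered in place of the calculator set, an illustrative research trace looks like this (article contents are the mock data's, and hypothetical):

```
Question: Who created Python, and what title did he hold in the community?
Thought: I should search for Python first.
Action: wiki_search
Action Input: Python
Observation: Articles found: python (programming language); guido van rossum
Thought: The creator appears to be Guido van Rossum. I'll read his article.
Action: wiki_lookup
Action Input: guido van rossum
Observation: Guido van Rossum is the creator of Python and was its 'Benevolent Dictator For Life' (BDFL) until 2018.
Thought: I have both the name and the title.
Action: finish
Action Input: Python was created by Guido van Rossum, known in the community as the BDFL (Benevolent Dictator For Life).
```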

This is the payoff of making react_agent parameterized. Same loop, different tools and prompt. The agent searches for Python, finds Guido van Rossum, looks up his article, extracts the BDFL detail, and delivers a grounded answer.

Multi-hop questions — where no single lookup contains the full answer — showcase ReAct's real strength:

Multi-hop research query
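An illustrative multi-hop trace, combining two separate lookups into one answer:

```
Question: How many years after Python's first release was the ReAct paper published?
Thought: I need two facts: Python's release year and the ReAct paper's year.
Action: wiki_lookup
Action Input: python (programming language)
Observation: Python is a high-level language created by Guido van Rossum and first released in 1991.
Thought: Python: 1991. Now the ReAct paper.
Action: wiki_lookup
Action Input: react (paper)
Observation: ReAct is a prompting strategy by Yao et al. (2022) that interleaves reasoning traces with tool actions.
Thought: 2022 - 1991 = 31 years.
Action: finish
Action Input: The ReAct paper (2022) appeared 31 years after Python's first release (1991).
```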

The agent needs at least two lookups — the ReAct article and the Python article — then synthesizes both into a single answer. Each observation narrows what it still needs.


Build a Section Counter Tool
Write Code

Create a function called wiki_section_count that takes a Wikipedia article title and returns how many sections that article has. Use the WIKI_ARTICLES dictionary defined above.

If the article exists, return "The article '<title>' has <N> sections." (with the title lowercased). If not, return "Article not found: <title>" (also lowercased).

Remember: keys in WIKI_ARTICLES are all lowercase.


Agent Traces — Debugging Your Agent's Reasoning

When an agent gives a wrong answer, how do you figure out where its reasoning went off track? Was it a bad thought? A wrong tool choice? A misinterpreted observation? In my experience, the single most useful debugging tool is a structured trace — the full sequence of steps with metadata.

Traced agent with structured logging
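A sketch of a traced variant of the loop. I track LLM calls and tool calls and record every step; token accounting is omitted because it depends on the provider's usage field. The parser is repeated so the snippet stands alone:

```python
import re

def parse_action(text):
    a = re.search(r"^Action:\s*(.+)$", text, re.MULTILINE)
    i = re.search(r"^Action Input:\s*(.+)$", text, re.MULTILINE)
    return (a.group(1).strip(), i.group(1).strip()) if a and i else (None, None)

def traced_react_agent(question, llm, tools, system_prompt, max_steps=8):
    """ReAct loop that also returns a structured trace for debugging."""
    trace = {"question": question, "steps": [], "tool_calls": 0, "llm_calls": 0}
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]
    answer = "Stopped: reached max_steps without a final answer."
    for step in range(1, max_steps + 1):
        reply = llm(messages)
        trace["llm_calls"] += 1
        messages.append({"role": "assistant", "content": reply})
        action, action_input = parse_action(reply)
        record = {"step": step, "reply": reply, "action": action,
                  "action_input": action_input, "observation": None}
        if action == "finish":
            trace["steps"].append(record)
            answer = action_input
            break
        if action in tools:
            trace["tool_calls"] += 1
            try:
                record["observation"] = tools[action]["function"](action_input)
            except Exception as e:
                record["observation"] = f"Error: {e}"
        else:
            # Covers both unknown tools and unparseable replies (action is None)
            record["observation"] = f"Error: Unknown tool '{action}'."
        trace["steps"].append(record)
        messages.append({"role": "user",
                         "content": f"Observation: {record['observation']}"})
    return answer, trace
```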

The trace dictionary captures everything: step count, tool calls, token consumption, and the full reasoning chain. When something goes wrong, you can pinpoint the exact step where the agent made a bad decision.

Five ReAct Pitfalls and How to Fix Them

After building dozens of these agents, I keep seeing the same failure patterns. Here are the most common, with concrete fixes:

1. The Infinite Loop. The agent calls the same tool with the same input repeatedly. Fix: Track previous (action, input) pairs and inject "you already tried this — try a different approach" after two identical calls.

2. Tool Name Hallucination. The agent invents tools like web_search or python_executor. Fix: The error message listing available tools usually corrects this. For stubborn cases, bold the tool names in the system prompt.

3. Oversized Observations. A tool returns thousands of characters and floods the context window. Fix: Truncate tool output before it becomes an observation.

Fragile: no output truncation

```python
# Tool returns whatever it gets
def wiki_lookup(title):
    return full_article_text  # could be 10,000+ chars
```

Robust: truncated output

```python
# Tool truncates long output before it becomes an observation
def wiki_lookup(title, max_chars=500):
    text = full_article_text
    if len(text) > max_chars:
        return text[:max_chars] + "... [truncated]"
    return text
```

Build a Truncation-Safe Tool Executor
Write Code

Create a function called safe_execute that takes three arguments: tool_name (str), tool_input (str), and tools_dict (dict). It should:

1. If tool_name is not in tools_dict, return "Error: Unknown tool '<tool_name>'" (with the actual tool name).

2. Call the tool function with tool_input.

3. If the result is longer than 300 characters, truncate to 300 characters and append " [truncated]".

4. Return the (possibly truncated) result.


ReAct vs Chain-of-Thought vs Function Calling

You now have three approaches to LLM reasoning in your toolkit. Picking the right one depends on whether you need reasoning, tool access, or both.

Comparison table
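My summary of the tradeoffs discussed in this tutorial, condensed into a table:

| Approach | Reasoning steps | Tool access | Best for |
| --- | --- | --- | --- |
| Chain-of-Thought | Yes | No | Pure reasoning: math, logic, analysis |
| Function calling | Minimal | Yes | Structured tool use without long reasoning chains |
| ReAct | Yes | Yes | Tasks that interleave lookup, computation, and reasoning |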

My rule of thumb: if the task needs only reasoning, use CoT. If it needs tools but not complex reasoning, use function calling. If it needs both — looking things up, computing, then reasoning about results — use ReAct.

Summary

You built a ReAct agent from scratch in pure Python. Here's what you covered:

  • The Thought-Action-Observation loop is the core of ReAct. The LLM reasons, calls a tool, and incorporates the result before reasoning again.
  • Tools are plain Python functions that accept and return strings. A registry makes them discoverable.
  • The system prompt enforces the output format. A parser extracts structured data from free text.
  • The agent is parameterized — swap tools and prompts to create specialized agents (research, calculation, etc.).
  • Error handling is built into the loop. Unknown tools and failed calls become observations the agent can reason about.
  • Traces capture the full reasoning chain for debugging.
  • ReAct vs CoT vs function calling — each fits different needs.

Frequently Asked Questions

    Can ReAct agents use more than 10 tools?

    Yes, but there's a tradeoff. More tools means a longer system prompt consuming more of the context window. In my experience, agents start struggling to pick the right tool above 15-20 options. For large toolsets, use a two-stage approach: one agent picks the category, a second picks the specific tool.

    How do I add memory across conversations?

    The agent we built is stateless — each call starts fresh. To add memory, persist traces to a database and inject a summary of previous interactions into the system prompt.

    Is ReAct the same as LangChain's agents?

    LangChain's AgentExecutor implements a very similar loop with additional abstractions for tool management and memory. Understanding the raw loop — which you just built — makes debugging LangChain agents much easier.

    What models work best for ReAct?

    Any model that reliably follows formatting instructions. GPT-4o and GPT-4o-mini are excellent. Claude 3.5 Sonnet works well. Smaller open-source models (7B parameters or less) often struggle with strict output formats. For small models, simplify to Action: tool_name(input) on a single line.

References

  • Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629. https://arxiv.org/abs/2210.03629
  • Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
  • OpenAI API Documentation — Chat Completions. https://platform.openai.com/docs/guides/text-generation
  • Mialon, G., et al. (2023). Augmented Language Models: A Survey. arXiv:2302.07842.
  • Schick, T., et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761.
  • LangChain Documentation — Agents. https://python.langchain.com/docs/modules/agents/
  • Anthropic Documentation — Tool Use. https://docs.anthropic.com/en/docs/build-with-claude/tool-use