Agent Tool Design Best Practices

May 6, 2026 · Practical Guide

30-Second Takeaway

Problem Solved: Poorly defined tools cause models to call the wrong tool, pass bad parameters, and loop infinitely.
Core Method: 8 production-tested rules: trigger conditions, self-documenting parameters, proper granularity, structured output, actionable error messages, tiered exposure, idempotency, real model testing.
Key Insight: Schema correctness ≠ model compatibility. Always test with the actual model you'll use in production.
What You'll Gain: A checklist to audit and improve every tool definition.

Tools are the Agent's hands and feet. Well-designed tools let the model act reliably; poorly designed ones lead to wrong tool calls, bad parameters, and infinite loops.

This article distills 8 rules from real production Agent projects. Each rule comes with before/after examples you can apply immediately.

Rule 1: Tool Descriptions Must Include Trigger Conditions

Don't just describe what a tool does — describe exactly when to use it. The model needs to answer: "Given my current situation, should I call this tool?"

A good description has three parts:

Trigger condition — what situation warrants this tool
What it does — the action it performs
What it returns — the shape and meaning of the output

# ❌ Too vague — model doesn't know when to use this
"description": "Search the web"

# ✅ Clear trigger + behavior + return value
"description": "Search the web for current information. Use when the answer requires real-time or recent data beyond your training cutoff, or when explicitly asked to look up current facts. Returns top 10 results with titles, URLs, and snippets."

💡 Real lesson: We once had a tool named fetch_documentation that models almost never called. Adding "Use this when you encounter an unfamiliar library, API, or framework" tripled its usage rate — the model needed the trigger condition spelled out.
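To make Rule 1 enforceable in code review, a tool registry could run a crude check on every description. The sketch below is a heuristic of our own, not a standard. It looks for trigger phrasing ("Use when", "Use this when") and a stated return value ("Returns"). The function name `check_description` and both regexes are hypothetical, so tune them to your own wording.

```python
import re

# Heuristic patterns (assumptions, not a standard): tune to your team's wording.
TRIGGER = re.compile(r"\b(use|call) (this (tool )?|it )?(when|if)\b", re.IGNORECASE)
RETURNS = re.compile(r"\breturns?\b", re.IGNORECASE)


def check_description(description: str) -> list[str]:
    """Return a list of problems with a tool description; empty means it passed."""
    problems = []
    if not TRIGGER.search(description):
        problems.append("no trigger condition (e.g. 'Use when ...')")
    if not RETURNS.search(description):
        problems.append("no return value (e.g. 'Returns ...')")
    if len(description.split()) < 8:
        problems.append("too short to explain what the tool does")
    return problems


print(check_description("Search the web"))  # flags all three problems
print(check_description(
    "Search the web for current information. Use when the answer requires "
    "real-time or recent data beyond your training cutoff, or when explicitly "
    "asked to look up current facts. Returns top 10 results with titles, URLs, and snippets."
))  # → []
```

A keyword check like this cannot judge whether a trigger condition is a *good* one. It only catches descriptions that omit one entirely, which is the failure in the fetch_documentation story.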

Rule 2: Parameter Names Are Natural Language Prompts

Remember: parameter names are part of the prompt the model reads. They affect both which tool the model selects and how accurately it fills in the arguments.

❌ Bad Name	✅ Good Name	Why
`q`	`search_query`	Self-documenting; model understands intent
`fp`	`file_path`	Avoids abbreviations the model might misinterpret
`id`	`user_id`	Scoped name prevents ambiguity with other IDs
`data`	`csv_content`	Describes the format and content expected

Each parameter also needs its own description field — never skip it. A parameter named limit with no description is a guessing game for the model.

# ❌ Missing descriptions
{"name": "limit", "type": "integer"}

# ✅ Self-explanatory
{"name": "max_results", "type": "integer",
 "description": "Maximum number of results to return. Default 10, max 50."}
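In JSON Schema-based tool formats, each parameter usually sits under `properties` and carries its own `description`. The sketch below assumes that generic layout. The outer key (`input_schema` here) varies by provider, so check your SDK. It also shows a small check, easy to run in CI, that lists every parameter missing a description. The `search_docs` tool is hypothetical.

```python
# A hypothetical tool definition in a generic JSON Schema layout.
# Providers differ on the outer key ("input_schema", "parameters", ...).
search_docs_tool = {
    "name": "search_docs",
    "description": (
        "Search internal documentation. Use when the user asks about internal "
        "systems or processes. Returns matching pages with titles and URLs."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "search_query": {
                "type": "string",
                "description": "Plain-text search terms, e.g. 'deploy rollback procedure'.",
            },
            "max_results": {
                "type": "integer",
                "description": "Maximum number of results to return. Default 10, max 50.",
                "default": 10,
                "minimum": 1,
                "maximum": 50,
            },
            "user_id": {"type": "string"},  # deliberately missing a description
        },
        "required": ["search_query"],
    },
}


def params_missing_descriptions(tool_def: dict) -> list[str]:
    """Return the names of parameters that have no non-empty description."""
    properties = tool_def.get("input_schema", {}).get("properties", {})
    return [
        name
        for name, spec in properties.items()
        if not str(spec.get("description", "")).strip()
    ]


print(params_missing_descriptions(search_docs_tool))  # ['user_id']
```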

Rule 3: Tool Granularity — One Complete Operation Per Tool

This is the hardest rule to get right. The extremes are easy to spot, but the sweet spot takes practice.

Pattern	Example	Problem
Too fine	`open_file()` → `read_byte()` → `close_file()`	10 sequential calls for one task; model loses context
Too coarse	`do_everything(action, target, format, filter, sort, ...)`	Parameter explosion; model doesn't know what to pass
Just right	`read_file(path)`, `write_file(path, content)`, `search_files(pattern)`	One complete operation each; composable but independent

Golden rule: If you can describe the tool's entire purpose as a single action in one sentence, it's probably the right granularity. "Reads a file and returns its contents" is one action plus its output, so it's fine. "Reads a file, parses it, filters lines, and writes output" chains four actions, so it's too coarse.
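The "just right" row of the table can be sketched as three independent functions. Each one performs a single complete operation and returns a structured result. This is a minimal local-filesystem sketch built on Python's `pathlib`. The function names mirror the table, but the return shapes are our own assumptions.

```python
from pathlib import Path


def read_file(path: str) -> dict:
    """Read a whole text file and return its contents."""
    p = Path(path)
    return {"path": str(p), "content": p.read_text(encoding="utf-8")}


def write_file(path: str, content: str) -> dict:
    """Write content to a file, replacing anything already there."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)  # create missing parent directories
    p.write_text(content, encoding="utf-8")
    return {"path": str(p), "bytes_written": len(content.encode("utf-8"))}


def search_files(pattern: str, root: str = ".") -> dict:
    """Find files under root whose paths match a glob pattern such as '**/*.csv'."""
    matches = sorted(str(p) for p in Path(root).glob(pattern) if p.is_file())
    return {"pattern": pattern, "matches": matches, "count": len(matches)}
```

Each call is useful on its own. The model composes them (search, then read, then write) instead of driving a low-level open/read/close sequence. Because `write_file` overwrites the whole file, repeating the same call leaves the same result.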

Rule 4: Return Structured, Parseable Output

Tool return values become the model's next input. Garbage in the return → garbage in the model's reasoning.

# ❌ Unstructured — model has to parse natural language
"Found 3 files: report.csv (2.3MB, modified 2024-01-15),
data.json (156KB, modified 2024-01-14), notes.txt (4KB, modified 2024-01-10)"

# ✅ Structured JSON — model extracts fields accurately
{
  "files": [
    {"name": "report.csv", "size_bytes": 2411725, "modified": "2024-01-15T14:30:00Z"},
    {"name": "data.json", "size_bytes": 159744, "modified": "2024-01-14T09:15:00Z"},
    {"name": "notes.txt", "size_bytes": 4096, "modified": "2024-01-10T18:00:00Z"}
  ],
  "count": 3
}

JSON isn't mandatory — but consistency is. If you return text, use a predictable format. If you return JSON, follow the same schema across all tools.
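A tool that produces the structured listing above might look like the following minimal sketch. It assumes a local directory and uses only the standard library. The field names match the example JSON: `name`, `size_bytes`, and `modified` as an ISO 8601 UTC timestamp.

```python
import os
from datetime import datetime, timezone


def list_files(directory: str) -> dict:
    """List regular files in a directory as structured, machine-readable records."""
    files = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip anything that isn't a file, e.g. subdirectories
            stat = entry.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            files.append({
                "name": entry.name,
                "size_bytes": stat.st_size,  # raw byte count, not "2.3MB"
                "modified": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),  # ISO 8601, UTC
            })
    files.sort(key=lambda f: f["name"])  # deterministic order across calls
    return {"files": files, "count": len(files)}
```

The tool returns raw byte counts and UTC timestamps rather than pre-formatted strings like "2.3MB". That lets the model compare and sort values without parsing prose.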

Rule 5: Error Messages Must Suggest Next Actions

The worst thing a tool can return is an empty string or a vague "Error occurred." That leaves the model with no information to recover from, so it either retries the same broken call or hallucinates a result.

# ❌ Useless error
{"error": "Failed"}

# ❌ Better, but still gives no next step
{"error": "File not found"}

# ✅ Actionable error — model can self-correct
{
  "success": false,
  "error": "File not found: /data/reports/2024/summary.csv",
  "suggestion": "Try listing /data/reports/2024/ to see available files, or check the path spelling.",
  "available_directories": ["/data/reports/2023", "/data/reports/2025"]
}
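One way to produce an error like the one above is to catch the failure inside the tool and attach recovery hints computed from the filesystem. This sketch rests on our own assumptions. The `suggestion` and `available_directories` fields mirror the example. Listing the subdirectories of the nearest existing ancestor is just one plausible heuristic, and `read_file_tool` is a hypothetical name.

```python
from pathlib import Path


def read_file_tool(path: str) -> dict:
    """Read a text file, returning an actionable error payload on failure."""
    p = Path(path)
    try:
        return {"success": True, "path": str(p), "content": p.read_text(encoding="utf-8")}
    except FileNotFoundError:
        # Walk up to the nearest ancestor that exists, so the model sees what is
        # actually there instead of just "not found".
        ancestor = p.parent
        while not ancestor.exists() and ancestor != ancestor.parent:
            ancestor = ancestor.parent
        directories = (
            sorted(str(d) for d in ancestor.iterdir() if d.is_dir())
            if ancestor.is_dir() else []
        )
        return {
            "success": False,
            "error": f"File not found: {p}",
            "suggestion": f"Try listing {ancestor} to see available files, or check the path spelling.",
            "available_directories": directories,
        }
```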

💡 Error taxonomy: We classify tool errors into three types to guide recovery strategy — Retryable (timeout, rate limit), Fixable (bad parameter, wrong file path), and Fatal (permission denied, service down). Include this classification in the error response so the Agent can decide: retry, adjust, or escalate.
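The three-way taxonomy maps naturally onto an error-type field plus a recovery decision in the agent loop. The sketch below assumes an `error_type` key in the error payload and a retry budget of three attempts. Both are illustrative choices, not a fixed convention, and the helper names are hypothetical. Errors with no classification are treated as fatal.

```python
from enum import Enum


class ErrorType(str, Enum):
    RETRYABLE = "retryable"  # timeout, rate limit: the same call may succeed later
    FIXABLE = "fixable"      # bad parameter, wrong path: the model should change the call
    FATAL = "fatal"          # permission denied, service down: hand off to a human


def tool_error(message: str, error_type: ErrorType, suggestion: str) -> dict:
    """Build an error payload that carries its own recovery classification."""
    return {
        "success": False,
        "error": message,
        "error_type": error_type.value,
        "suggestion": suggestion,
    }


def recovery_action(error: dict, attempt: int, max_retries: int = 3) -> str:
    """Decide what the agent loop does next: 'retry', 'adjust', or 'escalate'."""
    # Missing classification defaults to FATAL, so unknown failures escalate.
    kind = ErrorType(error.get("error_type", ErrorType.FATAL.value))
    if kind is ErrorType.RETRYABLE:
        # Repeat the identical call (ideally with backoff), up to the retry budget.
        return "retry" if attempt < max_retries else "escalate"
    if kind is ErrorType.FIXABLE:
        # Feed the error and suggestion back to the model so it revises its arguments.
        return "adjust"
    return "escalate"
```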

Rule 6: Limit Tool Count With Tiered Exposure

Beyond ~20 tools in a single prompt, model selection accuracy drops measurably. In our testing, going from 10 to 30 tools increased wrong-tool calls by 40%.

Tiered exposure strategy:

Tier 1 (always visible): 5-8 core tools — read, write, search, execute, ask
Tier 2 (context-gated): Advanced tools exposed only when the task mentions relevant keywords
Tier 3 (on-demand): Specialized tools the Agent can discover via a list_advanced_tools() meta-tool

# Tiered tool registry pattern
TOOL_TIERS = {
    "tier1": ["read_file", "write_file", "search_web", "execute_code", "ask_user"],
    "tier2": ["query_database", "send_email", "create_chart", "run_test_suite"],
    "tier3": ["deploy_service", "manage_permissions", "generate_report"]
}
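Given a registry shaped like `TOOL_TIERS`, the per-turn selection logic can stay small. The sketch below takes the registry as an argument and assumes simple keyword matching for Tier 2 gating. The `TIER2_KEYWORDS` map is hypothetical; a production system might use embeddings or a classifier instead. Tier 3 tools are exposed only after the agent names them following a call to the `list_advanced_tools()` meta-tool.

```python
import re
from typing import Iterable


def list_advanced_tools(tiers: dict[str, list[str]]) -> list[str]:
    """Meta-tool: lets the agent discover Tier 3 tools by name."""
    return list(tiers["tier3"])


def visible_tools(
    task: str,
    tiers: dict[str, list[str]],
    tier2_keywords: dict[str, list[str]],
    discovered: Iterable[str] = (),
) -> list[str]:
    """Return the tool names to expose for this turn, tier by tier."""
    words = set(re.findall(r"[a-z0-9_]+", task.lower()))
    requested = set(discovered)
    tools = list(tiers["tier1"])  # Tier 1: always visible
    for tool in tiers["tier2"]:   # Tier 2: gated on keywords in the task
        if words & set(tier2_keywords.get(tool, [])):
            tools.append(tool)
    for tool in tiers["tier3"]:   # Tier 3: only after explicit discovery
        if tool in requested:
            tools.append(tool)
    return tools


# Hypothetical keyword gates for the Tier 2 tools in TOOL_TIERS.
TIER2_KEYWORDS = {
    "query_database": ["database", "sql", "table"],
    "send_email": ["email", "mail"],
    "create_chart": ["chart", "plot", "graph"],
    "run_test_suite": ["test", "tests", "ci"],
}
```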

Rule 7: Design for Idempotency

Agents retry. A lot. If calling the same tool twice with the same parameters produces different side effects (double-charging a customer, sending duplicate emails, creating duplicate records), you have a serious problem.

Operation	Non-Idempotent (Dangerous)	Idempotent (Safe)
Create user	`create_user(email)` — creates duplicate	`get_or_create_user(email)` — returns existing if present
Send message	`send(to, body)` — sends every call	`send(to, body, idempotency_key)` — deduplicates
Charge payment	`charge(amount)` — charges every call	`charge(order_id, amount)` — skips if already charged

The idempotency_key pattern is the simplest fix: generate a unique key per logical operation, pass it to the tool, and have the tool skip execution if it has already processed that key.
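A minimal in-memory sketch of the idempotency_key pattern follows. It assumes a single process and uses a dict as the deduplication store. A real service would persist keys, for example in a database table with a unique constraint, so that deduplication survives restarts and concurrent workers. The check-then-send here is not thread-safe. `MessageSender` and its `_deliver` stub are hypothetical.

```python
import uuid


class MessageSender:
    """Sends messages at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}  # idempotency_key -> first result
        self.delivered: list[tuple[str, str]] = []  # records the side effect for inspection

    def _deliver(self, to: str, body: str) -> str:
        # Hypothetical stand-in for the real email/SMS/chat API call.
        self.delivered.append((to, body))
        return f"msg-{len(self.delivered)}"

    def send(self, to: str, body: str, idempotency_key: str) -> dict:
        if idempotency_key in self._results:
            # Already processed: return the original result and skip the side effect.
            return {**self._results[idempotency_key], "deduplicated": True}
        message_id = self._deliver(to, body)
        result = {"success": True, "message_id": message_id, "deduplicated": False}
        self._results[idempotency_key] = result
        return result


sender = MessageSender()
key = str(uuid.uuid4())  # one key per logical operation, created before the first attempt
first = sender.send("ana@example.com", "Your report is ready.", key)
retry = sender.send("ana@example.com", "Your report is ready.", key)  # the agent retries
print(len(sender.delivered), retry["deduplicated"])  # 1 True
```

The key must be generated by the caller once per logical operation and reused on every retry. If the key is regenerated on each attempt, deduplication never triggers.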

Rule 8: Test Tools With Real Model Calls

Unit-testing tool implementations isn't enough. You need to test whether the model actually uses the tool correctly. Schema correctness ≠ model compatibility.

def test_tool_usage(model, tool_def, test_scenarios):
    """Check that the model calls the tool correctly across scenarios.

    `model.chat(...)` stands in for your provider's chat API; adapt the call
    and the response fields (tool_calls, name, arguments) to your SDK.
    """
    for scenario in test_scenarios:
        response = model.chat(
            messages=[{"role": "user", "content": scenario["prompt"]}],
            tools=[tool_def],
        )

        # Verify: did the model call a tool at all?
        assert response.tool_calls, f"No tool call for prompt: {scenario['prompt']!r}"
        tool_call = response.tool_calls[0]

        # Verify: did the model call the right tool?
        assert tool_call.name == scenario["expected_tool"], \
            f"Expected {scenario['expected_tool']}, got {tool_call.name}"

        # Verify: are the parameters reasonable?
        for param, validator in scenario["param_checks"].items():
            value = tool_call.arguments.get(param)
            assert validator(value), f"Parameter '{param}' validation failed: {value!r}"

⚠️ Common pitfall: A tool definition that passes JSON Schema validation can still confuse the model. For example, a parameter typed as "string" with no examples might receive Markdown when the tool expects plain text. Always test with the actual model you'll use in production.
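Scenarios for a harness like `test_tool_usage` pair a prompt with the expected tool and per-parameter validators. The sketch below shows what that data might look like for the pitfall just described. It includes a rough heuristic that flags Markdown in a parameter meant to be plain text. The `send_email` tool, its `to` and `body` parameters, and the regex are hypothetical choices, and the regex is not a complete Markdown detector.

```python
import re

# Rough heuristic: headings, bold, links, bullet items, or code fences.
MARKDOWN_PATTERN = re.compile(
    r"(^#{1,6}\s|\*\*|\[[^\]]+\]\([^)]+\)|^[ \t]*[-*][ \t]|`{3})",
    re.MULTILINE,
)


def is_plain_text(value) -> bool:
    """True if value is a non-empty string with no obvious Markdown markup."""
    return (
        isinstance(value, str)
        and bool(value.strip())
        and not MARKDOWN_PATTERN.search(value)
    )


test_scenarios = [
    {
        "prompt": "Email sam@example.com that the quarterly report is ready.",
        "expected_tool": "send_email",
        "param_checks": {
            "to": lambda v: v == "sam@example.com",
            "body": is_plain_text,  # catches a model that formats the body as Markdown
        },
    },
]
```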

Quick Reference Checklist

Check	Rule
☐ Description includes trigger condition?	Rule 1
☐ Every parameter has a description?	Rule 2
☐ One complete operation per tool?	Rule 3
☐ Return format is consistent and parseable?	Rule 4
☐ Errors include suggested recovery actions?	Rule 5
☐ Under 20 tools visible at once?	Rule 6
☐ Side-effect tools have idempotency keys?	Rule 7
☐ Tested with actual model calls?	Rule 8

Next Steps

📖 Foundational: Write Your First AI Agent
📖 Related: Agent Error Recovery & Self-Correction
📖 Advanced: Building an Agent Framework from Scratch

Frequently Asked Questions

Q: How do you write tool descriptions that models actually understand?

A: Good descriptions need: ① trigger condition (when to use), ② behavior (what it does), ③ return value (output structure).

Q: How many tools can an Agent handle?

A: Beyond ~20 tools, accuracy drops measurably. Keep 5-8 core tools always visible, gate advanced tools by context.

Q: Why design tools for idempotency?

A: Agents retry a lot. Idempotency ensures that a repeated call, whether from a retry or a mistake, doesn't cause duplicate side effects such as double charges or duplicate emails.

Q: How should you test tool definitions?

A: Test with real model calls. Design multiple scenario prompts and verify tool selection and parameters. Make this part of CI.