Building Model-Agnostic AI Agents

Core takeaway: The adapter pattern is cheap insurance against vendor lock-in. Define a unified chat() + normalize_tools() + normalize_messages() interface, standardize internally on OpenAI tool format — everything else is implementation detail. The cost difference between models can exceed 20x.

Most AI Agent frameworks are born tied to a specific model vendor. LangChain was originally built around OpenAI. Claude Code is naturally Anthropic-exclusive. But in practice, you often need to switch between models — for cost, latency, capability matching, or simply to avoid lock-in.

This article shows you how to build an Agent that works with any model — Claude, GPT, DeepSeek, Llama, or locally deployed open models — by swapping a single line of configuration.

Why Model-Agnostic Matters

| Scenario | What You Need | Lock-In Problem |
| --- | --- | --- |
| Production deployment | GPT-4o for complex tasks, Claude for long-form writing | Your code has OpenAI SDK hardcoded |
| Cost optimization | DeepSeek for simple queries (10x cheaper), GPT for hard ones | Tool definitions only work with one format |
| Privacy-sensitive data | Local Llama 3 for internal docs, cloud API for public tasks | Different message formats break your pipeline |
| Model evaluation | A/B test 3 models on the same Agent task | Can't swap models without code changes |

What is Model-Agnostic Architecture

Model-agnostic means your Agent's core logic doesn't depend on any specific model's API format. The Agent loop — observe → think → act → observe — stays identical regardless of which model powers the "think" step.

The architecture has three layers:

  1. Agent Core: the ReAct loop, tool execution, memory management (model-independent)
  2. Adapter Layer: translates between the core's unified format and each model's specific API
  3. Model Providers: Claude, GPT, DeepSeek, Llama, etc. (swappable)

┌─────────────────────────────────┐
│         Agent Core Loop          │  ← Never changes
│  observe → think → act → observe │
└──────────────┬──────────────────┘
               │ Unified interface
┌──────────────▼──────────────────┐
│        Adapter Layer             │  ← Swap per model
│  ┌────────┐ ┌──────┐ ┌───────┐  │
│  │Claude  │ │ GPT  │ │DeepSk │  │
│  │Adapter │ │Adapter│ │Adapter│  │
│  └───┬────┘ └──┬───┘ └───┬───┘  │
└──────┼─────────┼─────────┼──────┘
       │         │         │
   Anthropic  OpenAI  DeepSeek APIs

The Adapter Interface

Every adapter implements the same interface. Here's the contract:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

@dataclass
class AgentResponse:
    """Unified response from any model."""
    content: str | None        # Final text answer (if is_final=True)
    tool_call: dict | None     # Tool call request (if is_final=False)
    is_final: bool             # True = done, False = tool call needed
    usage: dict                # Token usage: {"input": N, "output": M}

class ModelAdapter(ABC):
    """Every model adapter must implement this interface."""

    @abstractmethod
    def chat(self, messages: list[dict],
             tools: list[dict] | None = None,
             temperature: float = 0.7,
             max_tokens: int = 1000) -> AgentResponse:
        """Send messages + tools → receive response or tool call."""
        ...

    @abstractmethod
    def normalize_tools(self, tools: list[dict]) -> list[dict]:
        """Convert unified tool schema to model-specific format."""
        ...

    @abstractmethod
    def normalize_messages(self, messages: list[dict]) -> list[dict]:
        """Convert unified messages to model-specific format."""
        ...

Implementing Real Adapters

Here are concrete implementations for the three most common model families. Notice how each handles tool calling differently.

OpenAI Adapter (GPT-4o, GPT-4, GPT-3.5)

import json

from openai import OpenAI

class OpenAIAdapter(ModelAdapter):
    def __init__(self, model="gpt-4o", api_key=None, base_url=None):
        # With api_key=None the SDK falls back to the OPENAI_API_KEY env var
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def normalize_tools(self, tools):
        # Wrap bare function definitions; pass already-wrapped tools through
        # so they are not wrapped twice
        return [t if t.get("type") == "function"
                else {"type": "function", "function": t}
                for t in tools]

    def normalize_messages(self, messages):
        # OpenAI format is the baseline; tool results use role "tool"
        return messages  # Already in correct format

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        kwargs = dict(
            model=self.model,
            messages=self.normalize_messages(messages),
            temperature=temperature,
            max_tokens=max_tokens
        )
        if tools:
            kwargs["tools"] = self.normalize_tools(tools)

        resp = self.client.chat.completions.create(**kwargs)
        msg = resp.choices[0].message
        # Only the first tool call is surfaced; any parallel calls are dropped
        call = msg.tool_calls[0] if msg.tool_calls else None

        return AgentResponse(
            content=msg.content,
            tool_call={
                "name": call.function.name,
                # OpenAI returns arguments as a JSON string; decode to a dict
                "arguments": json.loads(call.function.arguments or "{}"),
                "id": call.id
            } if call else None,
            is_final=call is None,
            usage={
                "input": resp.usage.prompt_tokens,
                "output": resp.usage.completion_tokens
            }
        )
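
One difference the adapters have to absorb is the type of the tool arguments: the OpenAI API delivers `function.arguments` as a JSON-encoded string, while Anthropic's `input` field is already a dict. A minimal sketch of a helper that coerces either shape to a dict follows. The name `parse_tool_arguments` is hypothetical and is not part of either SDK.

```python
import json
from typing import Any


def parse_tool_arguments(raw: Any) -> dict:
    """Coerce tool-call arguments to a dict (hypothetical helper)."""
    if raw is None or raw == "":
        return {}                      # the model passed no arguments
    if isinstance(raw, dict):
        return raw                     # Anthropic-style: already decoded
    if isinstance(raw, str):
        parsed = json.loads(raw)       # OpenAI-style: JSON-encoded string
        if not isinstance(parsed, dict):
            raise ValueError(
                f"expected a JSON object, got {type(parsed).__name__}")
        return parsed
    raise TypeError(f"unsupported arguments type: {type(raw).__name__}")


print(parse_tool_arguments('{"city": "Tokyo"}'))   # OpenAI-style input
print(parse_tool_arguments({"city": "Tokyo"}))     # Anthropic-style input
```

Doing this once at the adapter boundary means the agent loop and the tool implementations only ever see dicts.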

Anthropic Adapter (Claude Sonnet, Claude Opus)

import json

import anthropic

class AnthropicAdapter(ModelAdapter):
    def __init__(self, model="claude-sonnet-4-20250514", api_key=None):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def normalize_tools(self, tools):
        # Anthropic uses a different tool format: no "type": "function" wrapper
        normalized = []
        for tool in tools:
            inner = tool.get("function", tool)  # Unwrap if nested
            normalized.append({
                "name": inner["name"],
                "description": inner.get("description", ""),
                "input_schema": inner.get("parameters",
                    {"type": "object", "properties": {}})
            })
        return normalized

    def normalize_messages(self, messages):
        # Anthropic needs system prompt extracted to separate parameter
        normalized = []
        for msg in messages:
            if msg["role"] == "system":
                continue  # Handled separately in chat()
            if msg["role"] == "tool":
                # Anthropic uses "tool_result" blocks inside user messages
                normalized.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": msg.get("tool_call_id", "unknown"),
                        "content": msg["content"]
                    }]
                })
            elif msg["role"] == "assistant" and msg.get("tool_calls"):
                # Assistant tool requests become "tool_use" content blocks,
                # so each later tool_result has a matching tool_use id
                blocks = []
                if msg.get("content"):
                    blocks.append({"type": "text", "text": msg["content"]})
                for call in msg["tool_calls"]:
                    args = call["function"]["arguments"]
                    blocks.append({
                        "type": "tool_use",
                        "id": call["id"],
                        "name": call["function"]["name"],
                        "input": json.loads(args)
                                 if isinstance(args, str) else args
                    })
                normalized.append({"role": "assistant", "content": blocks})
            else:
                normalized.append({"role": msg["role"],
                                   "content": msg["content"]})
        return normalized

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        system = next((m["content"] for m in messages
                       if m["role"] == "system"), None)
        normalized_msgs = self.normalize_messages(messages)

        kwargs = dict(
            model=self.model,
            messages=normalized_msgs,
            max_tokens=max_tokens,
            temperature=temperature
        )
        if system:
            kwargs["system"] = system
        if tools:
            kwargs["tools"] = self.normalize_tools(tools)

        resp = self.client.messages.create(**kwargs)

        # Extract tool use blocks from response
        tool_calls = [
            block for block in resp.content
            if block.type == "tool_use"
        ]
        # Join all text blocks; resp.content may be empty or start with tool_use
        text = "".join(block.text for block in resp.content
                       if block.type == "text")

        return AgentResponse(
            content=text or None,
            tool_call={
                "name": tool_calls[0].name,
                "arguments": tool_calls[0].input,
                "id": tool_calls[0].id
            } if tool_calls else None,
            is_final=len(tool_calls) == 0,
            usage={
                "input": resp.usage.input_tokens,
                "output": resp.usage.output_tokens
            }
        )
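
The adapter picks up the first system message and passes it through Anthropic's separate `system` parameter. If a conversation can contain more than one system message, one option is to join them before the call. A minimal sketch under that assumption follows; `split_system` is a hypothetical helper name, and joining with blank lines is a design choice rather than anything the API prescribes.

```python
from typing import Optional


def split_system(messages: list[dict]) -> tuple[Optional[str], list[dict]]:
    """Separate system prompts from the rest of the conversation."""
    system_parts = [m["content"] for m in messages
                    if m["role"] == "system" and m.get("content")]
    # Everything that is not a system message keeps its original order
    rest = [m for m in messages if m["role"] != "system"]
    system = "\n\n".join(system_parts) if system_parts else None
    return system, rest


system, rest = split_system([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "system", "content": "Answer briefly."},
])
print(repr(system))
print(rest)
```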

OpenAI-Compatible Adapter (DeepSeek, vLLM, Ollama, local models)

Many providers and runtimes (DeepSeek, Groq, Llama served through vLLM or Ollama) expose an OpenAI-compatible API. One adapter covers them all; just change the base_url:

# DeepSeek — 10x cheaper for simple tasks, great for Chinese
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="deepseek-chat",
        api_key="sk-xxx",
        base_url="https://api.deepseek.com/v1"
    ),
    tools, prompt
)

# Local Llama 3 via Ollama — zero cost, full privacy
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="llama3:70b",
        api_key="ollama",  # Ollama ignores the key
        base_url="http://localhost:11434/v1"
    ),
    tools, prompt
)

# Groq — fastest inference for real-time use cases
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="llama-3.1-70b-versatile",
        api_key="gsk_xxx",
        base_url="https://api.groq.com/openai/v1"
    ),
    tools, prompt
)
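
Because the three setups above differ only in model name, key, and base_url, the choice can live in data rather than code. A minimal sketch follows, assuming a hypothetical `PROVIDERS` table and a hypothetical `adapter_kwargs` helper. The environment variable names (`DEEPSEEK_API_KEY`, `GROQ_API_KEY`) are assumptions, while the base URLs and model names are the ones used above.

```python
import os

# Hypothetical provider table; URLs and model names come from the examples above
PROVIDERS = {
    "deepseek": {"model": "deepseek-chat",
                 "base_url": "https://api.deepseek.com/v1",
                 "key_env": "DEEPSEEK_API_KEY"},
    "ollama": {"model": "llama3:70b",
               "base_url": "http://localhost:11434/v1",
               "key_env": None},
    "groq": {"model": "llama-3.1-70b-versatile",
             "base_url": "https://api.groq.com/openai/v1",
             "key_env": "GROQ_API_KEY"},
}


def adapter_kwargs(provider: str, env=None) -> dict:
    """Build constructor kwargs for an OpenAI-compatible adapter."""
    env = os.environ if env is None else env
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    cfg = PROVIDERS[provider]
    # Ollama ignores the key, so a placeholder string is enough
    api_key = env.get(cfg["key_env"], "") if cfg["key_env"] else "ollama"
    return {"model": cfg["model"], "api_key": api_key,
            "base_url": cfg["base_url"]}


print(adapter_kwargs("ollama"))
```

The result can be splatted into the adapter, for example `OpenAIAdapter(**adapter_kwargs("groq"))`, so switching providers means changing one string.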

Tool Format Normalization

Different providers have slightly different tool schemas. The key insight: standardize on OpenAI's function-calling format as the internal representation, and let each adapter convert to its native format.

| Feature | OpenAI | Anthropic | Google Gemini |
| --- | --- | --- | --- |
| Tool wrapper | {"type": "function", "function": {...}} | Bare object, no wrapper | {"functionDeclarations": [...]} |
| Schema field | parameters (JSON Schema) | input_schema (JSON Schema) | parameters (OpenAPI-like) |
| Tool result | Message with role: "tool" | tool_result content block in a user message | functionResponse part |
| Parallel calls | Supported natively | Supported natively | Supported on recent models |

💡 Pro tip: Always use JSON Schema {"type": "object", "properties": {...}, "required": [...]} for tool parameters. This is the only format that all major providers support with minimal conversion. Avoid provider-specific schema features.
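
The table can be read as three small conversion functions from one internal definition. A minimal sketch follows, assuming the internal representation is the bare function object (`name`, `description`, `parameters`). The function names are hypothetical, and the Gemini shape follows the `functionDeclarations` wrapper listed in the table rather than a tested integration.

```python
# One internal tool definition in plain JSON Schema
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

EMPTY_SCHEMA = {"type": "object", "properties": {}}


def to_openai(tool: dict) -> dict:
    """Wrap the bare definition in OpenAI's function envelope."""
    return {"type": "function", "function": tool}


def to_anthropic(tool: dict) -> dict:
    """Rename `parameters` to `input_schema`; no wrapper."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool.get("parameters", EMPTY_SCHEMA),
    }


def to_gemini(tools: list[dict]) -> dict:
    """Group all declarations under one `functionDeclarations` list."""
    return {"functionDeclarations": [
        {"name": t["name"],
         "description": t.get("description", ""),
         "parameters": t.get("parameters", EMPTY_SCHEMA)}
        for t in tools
    ]}


print(to_anthropic(WEATHER_TOOL)["input_schema"]["required"])
```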

Model Selection Strategy

With a model-agnostic architecture, you can route tasks to the optimal model based on characteristics:

| Task Type | Recommended Model | Reason |
| --- | --- | --- |
| Complex reasoning, math, code | Claude Opus / GPT-4o | Highest reasoning accuracy |
| Simple Q&A, summarization | DeepSeek / Llama 3 70B | 5-10x cheaper, good enough |
| Long-form writing | Claude Sonnet | Excellent prose quality |
| Chinese content | DeepSeek / Qwen | Native Chinese performance |
| Sensitive internal data | Local Llama / Qwen | Data never leaves your infra |
| Real-time (< 500ms) | Groq / GPT-4o-mini | Ultra-low latency |

import os

def smart_route(task: str) -> ModelAdapter:
    """Route task to the best model based on heuristics."""
    text = task.lower()
    if any(kw in text for kw in ["code", "debug", "math", "logic"]):
        return AnthropicAdapter(model="claude-opus-4-20250514")
    # "中文" means "Chinese", "翻译" means "translate"
    if any(kw in text for kw in ["中文", "chinese", "翻译"]):
        return OpenAIAdapter(
            model="deepseek-chat",
            # Pass DeepSeek's own key; otherwise the SDK falls back to
            # OPENAI_API_KEY. The env var name here is an assumption.
            api_key=os.environ.get("DEEPSEEK_API_KEY"),
            base_url="https://api.deepseek.com/v1")
    if len(task) < 100:  # Simple, short query
        return OpenAIAdapter(model="gpt-4o-mini")
    return AnthropicAdapter(model="claude-sonnet-4-20250514")  # Default
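
The payoff of routing is easiest to see as arithmetic. The sketch below uses the per-million-token prices this article quotes (about $0.14 for DeepSeek and about $3 for Claude) and the `{"input": N, "output": M}` usage shape from AgentResponse. It applies one flat rate to input and output tokens, which is a simplification: real price sheets usually charge the two differently and change over time, so treat the numbers as illustrative. `PRICE_PER_MILLION` and `estimate_cost` are hypothetical names.

```python
# Illustrative prices in USD per million tokens, as quoted in this article
PRICE_PER_MILLION = {"deepseek": 0.14, "claude": 3.00}


def estimate_cost(usage: dict, price_per_million: float) -> float:
    """Cost in USD for one usage record at a flat per-token rate."""
    tokens = usage["input"] + usage["output"]
    return tokens / 1_000_000 * price_per_million


usage = {"input": 800_000, "output": 200_000}  # one million tokens in total
cheap = estimate_cost(usage, PRICE_PER_MILLION["deepseek"])
premium = estimate_cost(usage, PRICE_PER_MILLION["claude"])
print(f"cheap=${cheap:.2f} premium=${premium:.2f} ratio={premium / cheap:.1f}x")
# prints: cheap=$0.14 premium=$3.00 ratio=21.4x
```

Summing `estimate_cost` over every response is one way an agent could keep a running total of what a session has cost.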

The Complete Agent

Here's the full model-agnostic Agent. The core loop never changes — only the adapter does:

import json

class ModelAgnosticAgent:
    def __init__(self, model: ModelAdapter, tools: list[dict],
                 system_prompt: str):
        self.model = model
        self.tools = tools
        self.messages = [{"role": "system", "content": system_prompt}]
        self.total_cost = 0.0  # Placeholder; not updated in this listing

    def run(self, user_input: str, max_turns: int = 20) -> str:
        self.messages.append({"role": "user", "content": user_input})
        turns = 0

        while turns < max_turns:
            response = self.model.chat(
                self.messages, self.tools,
                temperature=0.7 if turns == 0 else 0.4  # Cool down
            )
            turns += 1

            if response.is_final:
                return response.content or ""

            # Execute tool and feed result back
            tool_name = response.tool_call["name"]
            tool_args = response.tool_call.get("arguments", {})
            if isinstance(tool_args, str):
                # Some adapters return arguments as a JSON string
                tool_args = json.loads(tool_args or "{}")
            # Reuse the provider's call id when the adapter supplies one
            call_id = response.tool_call.get("id") or f"call_{turns}"
            result = self._execute_tool(tool_name, tool_args)

            self.messages.append({
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": tool_name,
                        "arguments": json.dumps(tool_args)
                    }
                }]
            })
            self.messages.append({
                "role": "tool",
                "tool_call_id": call_id,
                "content": json.dumps(result)
            })

        return "Max turns reached without completion."

    def _execute_tool(self, name: str, args: dict):
        # This hook is not defined in the article; override it to dispatch
        # to your own tool implementations.
        raise NotImplementedError(f"No implementation for tool {name!r}")
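
The loop calls `self._execute_tool(...)`, which this article leaves to the reader. One possible shape is a registry that maps tool names to Python callables and returns errors as data, so the model can see what went wrong and react. A minimal sketch follows; `TOOL_REGISTRY`, `execute_tool`, and the stub `get_weather` are hypothetical names, not part of the agent above.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict:
    """Stub tool: returns canned data instead of calling a weather API."""
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping tool names to their implementations
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tool(name: str, args: dict, registry=TOOL_REGISTRY) -> Any:
    """Look up a tool by name and call it with keyword arguments."""
    func = registry.get(name)
    if func is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return func(**args)
    except Exception as exc:
        # Report the failure as a result instead of crashing the loop
        return {"error": f"{type(exc).__name__}: {exc}"}


print(json.dumps(execute_tool("get_weather", {"city": "Tokyo"})))
```

Inside the agent, `_execute_tool` could simply delegate to a function like this one.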

Testing Your Model-Agnostic Layer

How do you know the adapter is working correctly? Test with a simple tool-calling task across all models:

def test_adapter(adapter: ModelAdapter):
    """Verify an adapter handles tool calling correctly."""
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["city"]
        }
    }]

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ]

    response = adapter.chat(messages, tools)

    assert not response.is_final, "Should request a tool call"
    assert response.tool_call["name"] == "get_weather", \
        f"Wrong tool: {response.tool_call['name']}"
    assert "Tokyo" in str(response.tool_call.get("arguments", "")), \
        "Missing city argument"

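# Run against all your adapters (each one needs a valid API key)
import os

for name, adapter in [
    ("OpenAI", OpenAIAdapter()),
    ("Anthropic", AnthropicAdapter()),
    # DeepSeek needs its own model name and key, not the gpt-4o default;
    # the env var name is an assumption
    ("DeepSeek", OpenAIAdapter(model="deepseek-chat",
                               api_key=os.environ.get("DEEPSEEK_API_KEY"),
                               base_url="https://api.deepseek.com/v1"))
]: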
    try:
        test_adapter(adapter)
        print(f"✅ {name}: PASS")
    except Exception as e:
        print(f"❌ {name}: FAIL — {e}")
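
The check above needs live API keys and a network connection. For unit tests of the agent loop itself, a scripted stand-in that replays canned responses can be enough. A minimal sketch follows, assuming a hypothetical `ScriptedAdapter`. It redefines a trimmed `AgentResponse` so the block runs on its own, and it is duck-typed rather than a `ModelAdapter` subclass.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentResponse:
    """Trimmed copy of the article's response type, for a standalone demo."""
    content: Optional[str]
    tool_call: Optional[dict]
    is_final: bool
    usage: dict = field(default_factory=lambda: {"input": 0, "output": 0})


class ScriptedAdapter:
    """Hypothetical fake adapter: returns pre-written responses in order."""

    def __init__(self, script: list):
        self._script = list(script)
        self.calls = []  # records what the agent sent, for assertions

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        self.calls.append({"messages": list(messages), "tools": tools})
        if not self._script:
            raise RuntimeError("script exhausted")
        return self._script.pop(0)


fake = ScriptedAdapter([
    AgentResponse(None, {"name": "get_weather",
                         "arguments": {"city": "Tokyo"},
                         "id": "call_1"}, False),
    AgentResponse("It is sunny in Tokyo.", None, True),
])
first = fake.chat([{"role": "user", "content": "What's the weather in Tokyo?"}])
second = fake.chat([])
print(first.tool_call["name"], "->", second.content)
```

Passing an object like this to the agent in place of a real adapter makes the loop deterministic, which suits CI runs where no API keys are available.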

Existing Frameworks (and When to Roll Your Own)

| Framework | Model Support | Best For | When to Skip |
| --- | --- | --- | --- |
| smolagents (HuggingFace) | Any HF model + external APIs | Quick prototyping, HF ecosystem users | Need fine control over tool loop |
| DSPy | 10+ providers via adapters | Prompt optimization, A/B testing models | Simple tool-calling agents (overkill) |
| LangChain | Wide but historically OpenAI-first | Complex RAG pipelines, many integrations | Simplicity; LangChain adds abstraction overhead |
| Custom adapter (this article) | Any model, full control | Production systems, specific requirements | You only use one model |

⚠️ Honest caveat: Model-agnostic architecture adds complexity. If you genuinely only use one model and have no plans to switch, don't build this. But if you're building a product, a platform, or anything that might outlive your current model choice, the adapter pattern pays for itself the first time you need to switch.

Key Takeaways

  1. Define a unified adapter interface — one chat() method, one normalize_tools(), one normalize_messages(). Everything else is implementation detail.
  2. Standardize on OpenAI's tool format internally — it's the de facto standard that most providers implement or can convert to.
  3. Test every adapter with the same tool-calling scenario — tool format translation bugs are silent and hard to debug in production.
  4. Smart routing saves money — simple queries on cheap models (DeepSeek at $0.14/M tokens), complex reasoning on premium models (Claude at $3/M tokens). The difference can be 20x in cost.
  5. Don't over-engineer — if you only use one model, skip this. But if you're building something that lasts, the adapter pattern is a cheap insurance policy against vendor lock-in.

Frequently Asked Questions

Q: Do I really need a model-agnostic architecture?
A: If you genuinely use one model with no plans to switch, no. But if you're building a product, platform, or anything that might outlive your current model choice, the adapter pattern pays for itself the first time you need to switch. It's cheap insurance against vendor lock-in.

Q: What's different about tool calling across models?
A: OpenAI wraps tools as {"type":"function","function":{...}}, Anthropic uses bare objects with an input_schema field, Gemini uses functionDeclarations arrays. This article standardizes on OpenAI's format internally, with each adapter handling its own conversion.

Q: Why standardize on OpenAI format internally?
A: OpenAI's function-calling format has become the de facto standard. DeepSeek, Groq, vLLM, Ollama, and many others all use OpenAI-compatible APIs. Using this format as your internal representation minimizes conversion overhead.

Q: How much can smart routing actually save?
A: Sending simple queries to a cheap model such as DeepSeek (~$0.14/M tokens) instead of a premium model such as Claude (~$3/M tokens) is roughly a 20x difference in per-token cost. The article provides a reference smart_route() implementation you can adapt.

Q: What are the adapter pattern's gotchas?
A: Tool format translation bugs are silent and hard to catch in production. Test every adapter with the same simple tool-calling scenario (e.g., "get weather"). Also, Anthropic requires extracting the system prompt as a separate parameter, an easy detail to miss.

Citable Definition

Model-Agnostic Agent: An AI Agent architecture pattern where the core decision loop (observe → think → act → observe) is decoupled from any specific LLM provider's API format. Through the Adapter Pattern, a unified interface contract is defined — typically comprising chat() (send messages and receive responses), normalize_tools() (tool definition format normalization), and normalize_messages() (message format normalization) — with each model provider (OpenAI, Anthropic, DeepSeek, local Llama, etc.) implementing that interface. The Agent core logic always operates on the unified format; switching models requires only replacing the adapter instance with zero business code changes. This is a low-cost insurance policy against vendor lock-in.

Next Steps