In short: define a unified chat() + normalize_tools() + normalize_messages() interface and standardize internally on the OpenAI tool format; everything else is implementation detail. The cost difference between models can exceed 20x.
Many AI agent frameworks and tools start out tied to a single model vendor. LangChain was originally built around OpenAI, and Claude Code is Anthropic-only by design. In practice, though, you often need to switch between models for cost, latency, or capability matching, or simply to avoid lock-in.
This article shows you how to build an Agent that works with any model — Claude, GPT, DeepSeek, Llama, or locally deployed open models — by swapping a single line of configuration.
| Scenario | What You Need | Lock-In Problem |
|---|---|---|
| Production deployment | GPT-4o for complex tasks, Claude for long-form writing | Your code has the OpenAI SDK hardcoded |
| Cost optimization | DeepSeek for simple queries (10x cheaper), GPT for hard ones | Tool definitions only work with one format |
| Privacy-sensitive data | Local Llama 3 for internal docs, cloud API for public tasks | Different message formats break your pipeline |
| Model evaluation | A/B test 3 models on the same Agent task | Can't swap models without code changes |
Model-agnostic means your Agent's core logic doesn't depend on any specific model's API format. The Agent loop — observe → think → act → observe — stays identical regardless of which model powers the "think" step.
The architecture has three layers:
```text
┌─────────────────────────────────┐
│         Agent Core Loop         │  ← Never changes
│ observe → think → act → observe │
└────────────────┬────────────────┘
                 │ Unified interface
┌────────────────▼────────────────┐
│          Adapter Layer          │  ← Swap per model
│ ┌────────┐ ┌────────┐ ┌───────┐ │
│ │ Claude │ │  GPT   │ │DeepSk │ │
│ │Adapter │ │Adapter │ │Adapter│ │
│ └───┬────┘ └───┬────┘ └───┬───┘ │
└─────┼──────────┼──────────┼─────┘
      │          │          │
  Anthropic   OpenAI    DeepSeek   ← provider APIs
```
Every adapter implements the same interface. Here's the contract:
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AgentResponse:
    """Unified response from any model."""
    content: str | None     # Final text answer (if is_final=True)
    tool_call: dict | None  # Tool call request (if is_final=False)
    is_final: bool          # True = done, False = tool call needed
    usage: dict             # Token usage: {"input": N, "output": M}


class ModelAdapter(ABC):
    """Every model adapter must implement this interface."""

    @abstractmethod
    def chat(self, messages: list[dict],
             tools: list[dict] | None = None,
             temperature: float = 0.7,
             max_tokens: int = 1000) -> AgentResponse:
        """Send messages + tools → receive response or tool call."""
        ...

    @abstractmethod
    def normalize_tools(self, tools: list[dict]) -> list[dict]:
        """Convert unified tool schema to model-specific format."""
        ...

    @abstractmethod
    def normalize_messages(self, messages: list[dict]) -> list[dict]:
        """Convert unified messages to model-specific format."""
        ...
```
Here are concrete implementations for the two main API styles, OpenAI and Anthropic. Notice how differently each handles tool calling; the third family, OpenAI-compatible endpoints, reuses the OpenAI adapter.
```python
import json

from openai import OpenAI


class OpenAIAdapter(ModelAdapter):
    def __init__(self, model="gpt-4o", api_key=None, base_url=None):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def normalize_tools(self, tools):
        # OpenAI uses the standard function-calling format: just add the
        # {"type": "function"} wrapper (skip tools that already have it)
        return [t if t.get("type") == "function"
                else {"type": "function", "function": t} for t in tools]

    def normalize_messages(self, messages):
        # OpenAI format is the baseline; tool results use role "tool"
        return messages  # Already in correct format

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        kwargs = dict(
            model=self.model,
            messages=self.normalize_messages(messages),
            temperature=temperature,
            max_tokens=max_tokens,
        )
        if tools:
            kwargs["tools"] = self.normalize_tools(tools)
        resp = self.client.chat.completions.create(**kwargs)
        msg = resp.choices[0].message
        call = msg.tool_calls[0] if msg.tool_calls else None  # first call only
        return AgentResponse(
            content=msg.content,
            tool_call={
                "id": call.id,
                "name": call.function.name,
                # OpenAI returns arguments as a JSON string; decode it so
                # every adapter hands the agent a dict
                "arguments": json.loads(call.function.arguments or "{}"),
            } if call else None,
            is_final=call is None,
            usage={
                "input": resp.usage.prompt_tokens,
                "output": resp.usage.completion_tokens,
            },
        )
```
```python
import json

import anthropic


class AnthropicAdapter(ModelAdapter):
    def __init__(self, model="claude-sonnet-4-20250514", api_key=None):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def normalize_tools(self, tools):
        # Anthropic uses a different tool format: no {"type": "function"} wrapper
        normalized = []
        for tool in tools:
            inner = tool.get("function", tool)  # Unwrap if nested
            normalized.append({
                "name": inner["name"],
                "description": inner.get("description", ""),
                "input_schema": inner.get("parameters",
                                          {"type": "object", "properties": {}}),
            })
        return normalized

    def normalize_messages(self, messages):
        # Anthropic takes the system prompt as a separate parameter (see chat())
        normalized = []
        for msg in messages:
            if msg["role"] == "system":
                continue  # Handled separately
            if msg["role"] == "tool":
                # Anthropic uses "tool_result" blocks inside user messages
                normalized.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": msg.get("tool_call_id", "unknown"),
                        "content": msg["content"],
                    }],
                })
            elif msg["role"] == "assistant" and msg.get("tool_calls"):
                # OpenAI-style tool_calls become "tool_use" blocks; every
                # tool_result must reference the id of a preceding tool_use
                normalized.append({
                    "role": "assistant",
                    "content": [{
                        "type": "tool_use",
                        "id": call["id"],
                        "name": call["function"]["name"],
                        "input": json.loads(call["function"]["arguments"] or "{}"),
                    } for call in msg["tool_calls"]],
                })
            else:
                normalized.append({"role": msg["role"],
                                   "content": msg["content"]})
        return normalized

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        system = next((m["content"] for m in messages
                       if m["role"] == "system"), None)
        kwargs = dict(
            model=self.model,
            messages=self.normalize_messages(messages),
            max_tokens=max_tokens,
            temperature=temperature,
        )
        if system:
            kwargs["system"] = system
        if tools:
            kwargs["tools"] = self.normalize_tools(tools)
        resp = self.client.messages.create(**kwargs)
        # A response can mix text and tool_use blocks in any order
        text = "".join(b.text for b in resp.content if b.type == "text")
        tool_calls = [b for b in resp.content if b.type == "tool_use"]
        return AgentResponse(
            content=text or None,
            tool_call={
                "name": tool_calls[0].name,
                "arguments": tool_calls[0].input,  # already a dict
                "id": tool_calls[0].id,
            } if tool_calls else None,
            is_final=len(tool_calls) == 0,
            usage={
                "input": resp.usage.input_tokens,
                "output": resp.usage.output_tokens,
            },
        )
```
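Both adapters above keep only the first tool call, yet OpenAI and Anthropic can each return several calls in one response, and they encode arguments differently (a JSON string versus a parsed object). The sketch below works on raw response JSON rather than SDK objects so it runs offline. The payloads are trimmed, made-up examples of the Chat Completions and Messages API response shapes, and the helper names are hypothetical.

```python
import json
from typing import Any


def openai_tool_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract every tool call from a raw Chat Completions response dict."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []  # absent or null when there are no calls
    return [
        {
            "id": call["id"],
            "name": call["function"]["name"],
            # OpenAI sends arguments as a JSON-encoded string
            "arguments": json.loads(call["function"]["arguments"] or "{}"),
        }
        for call in calls
    ]


def anthropic_tool_calls(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract every tool_use block from a raw Messages API response dict."""
    return [
        {
            "id": block["id"],
            "name": block["name"],
            # Anthropic sends input as an already-parsed JSON object
            "arguments": block["input"],
        }
        for block in response["content"]
        if block["type"] == "tool_use"
    ]


# Trimmed, illustrative payloads (not captured API output)
openai_resp = {
    "choices": [{"message": {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_a", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
        {"id": "call_b", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Osaka"}'}},
    ]}}]
}
anthropic_resp = {
    "content": [
        {"type": "text", "text": "Checking both cities."},
        {"type": "tool_use", "id": "toolu_a", "name": "get_weather",
         "input": {"city": "Tokyo"}},
        {"type": "tool_use", "id": "toolu_b", "name": "get_weather",
         "input": {"city": "Osaka"}},
    ],
    "stop_reason": "tool_use",
}

print(openai_tool_calls(openai_resp))
print(anthropic_tool_calls(anthropic_resp))
```

Returning a list like this instead of a single tool_call dict is one way to let the agent honor parallel calls; the rest of this article keeps the single-call simplification.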
Many providers and local runtimes (DeepSeek, Groq, and Llama served via vLLM or Ollama) expose an OpenAI-compatible API, so one adapter covers them all. Just change the base_url and the model name:
```python
# tools (list of tool dicts) and prompt (system prompt string) are assumed
# to be defined already

# DeepSeek: 10x cheaper for simple tasks, strong in Chinese
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="deepseek-chat",
        api_key="sk-xxx",
        base_url="https://api.deepseek.com/v1",
    ),
    tools, prompt,
)

# Local Llama 3 via Ollama: zero API cost, full privacy
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="llama3:70b",
        api_key="ollama",  # Ollama ignores the key, but the client requires one
        base_url="http://localhost:11434/v1",
    ),
    tools, prompt,
)

# Groq: very low-latency inference for real-time use cases
agent = ModelAgnosticAgent(
    OpenAIAdapter(
        model="llama-3.1-70b-versatile",
        api_key="gsk_xxx",
        base_url="https://api.groq.com/openai/v1",
    ),
    tools, prompt,
)
```
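To make the switch a genuine one-line configuration change, the endpoint details can live in a small registry. This is a minimal sketch: PROVIDERS, adapter_kwargs() and ACTIVE_PROVIDER are hypothetical names, the base URLs and model names are copied from the examples above, and the environment-variable names are assumptions rather than anything the providers require.

```python
import os

# Hypothetical registry of OpenAI-compatible backends.
# key_env names are assumptions; None means no real key is needed.
PROVIDERS = {
    "openai": {"model": "gpt-4o", "base_url": None, "key_env": "OPENAI_API_KEY"},
    "deepseek": {"model": "deepseek-chat",
                 "base_url": "https://api.deepseek.com/v1",
                 "key_env": "DEEPSEEK_API_KEY"},
    "ollama": {"model": "llama3:70b",
               "base_url": "http://localhost:11434/v1",
               "key_env": None},  # Ollama ignores the key
    "groq": {"model": "llama-3.1-70b-versatile",
             "base_url": "https://api.groq.com/openai/v1",
             "key_env": "GROQ_API_KEY"},
}


def adapter_kwargs(provider: str) -> dict:
    """Build keyword arguments for OpenAIAdapter(...) from the registry."""
    cfg = PROVIDERS[provider]
    api_key = os.environ.get(cfg["key_env"]) if cfg["key_env"] else "ollama"
    return {"model": cfg["model"], "api_key": api_key, "base_url": cfg["base_url"]}


# The single line of configuration: change this string to switch backends.
ACTIVE_PROVIDER = "deepseek"

# Print everything except the key
print({k: v for k, v in adapter_kwargs(ACTIVE_PROVIDER).items() if k != "api_key"})
```

With OpenAIAdapter in scope, `ModelAgnosticAgent(OpenAIAdapter(**adapter_kwargs(ACTIVE_PROVIDER)), tools, prompt)` picks up whichever backend ACTIVE_PROVIDER names.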
Different providers have slightly different tool schemas. The key insight: standardize on OpenAI's function-calling format as the internal representation, and let each adapter convert to its native format.
| Feature | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Tool wrapper | {"type": "function", "function": {...}} | Bare object, no wrapper | {"functionDeclarations": [...]} |
| Schema field | parameters (JSON Schema) | input_schema (JSON Schema) | parameters (OpenAPI-like) |
| Tool result format | role: "tool" | tool_result content block | functionResponse part |
| Parallel calls | Supported natively | Supported natively | Supported |
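To make the table concrete, here is one internal tool definition converted into each provider's shape. This is a sketch: the converter names are hypothetical, and it assumes Gemini accepts the plain JSON Schema subset used here unchanged (its schema is an OpenAPI subset, so richer schemas may need adjusting).

```python
import copy

# Internal representation: the bare OpenAI "function" object
# (name / description / parameters), as in test_adapter() later on.
TOOL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}


def to_openai(tools: list[dict]) -> list[dict]:
    # OpenAI: each tool wrapped in {"type": "function", "function": {...}}
    return [{"type": "function", "function": copy.deepcopy(t)} for t in tools]


def to_anthropic(tools: list[dict]) -> list[dict]:
    # Anthropic: bare objects, JSON Schema moved under "input_schema"
    return [{"name": t["name"],
             "description": t.get("description", ""),
             "input_schema": copy.deepcopy(t["parameters"])} for t in tools]


def to_gemini(tools: list[dict]) -> list[dict]:
    # Gemini: a single tool entry holding a functionDeclarations array.
    # Assumption: the plain JSON Schema above is accepted as-is.
    return [{"functionDeclarations": [copy.deepcopy(t) for t in tools]}]


for name, convert in [("openai", to_openai),
                      ("anthropic", to_anthropic),
                      ("gemini", to_gemini)]:
    print(name, convert([TOOL]))
```

Deep copies keep the internal definition untouched if an adapter later mutates its provider-specific copy.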
Keep tool parameters to plain JSON Schema of the form {"type": "object", "properties": {...}, "required": [...]}. This basic shape is the safest common denominator: every major provider accepts it with minimal conversion. Avoid provider-specific schema features.

With a model-agnostic architecture, you can route each task to the best-suited model based on its characteristics:
| Task Type | Recommended Model | Reason |
|---|---|---|
| Complex reasoning, math, code | Claude Opus / GPT-4o | Highest reasoning accuracy |
| Simple Q&A, summarization | DeepSeek / Llama 3 70B | 5-10x cheaper, good enough |
| Long-form writing | Claude Sonnet | Excellent prose quality |
| Chinese content | DeepSeek / Qwen | Native Chinese performance |
| Sensitive internal data | Local Llama / Qwen | Data never leaves your infra |
| Real-time (< 500ms) | Groq / GPT-4o-mini | Ultra-low latency |
```python
import os


def smart_route(task: str) -> ModelAdapter:
    """Route task to the best model based on simple keyword heuristics."""
    text = task.lower()
    if any(kw in text for kw in ["code", "debug", "math", "logic"]):
        return AnthropicAdapter(model="claude-opus-4-20250514")
    if any(kw in text for kw in ["中文", "chinese", "翻译"]):  # "Chinese", "translate"
        return OpenAIAdapter(model="deepseek-chat",
                             # DeepSeek needs its own key; env var name is an assumption
                             api_key=os.environ.get("DEEPSEEK_API_KEY"),
                             base_url="https://api.deepseek.com/v1")
    if len(task) < 100:  # Simple, short query
        return OpenAIAdapter(model="gpt-4o-mini")
    return AnthropicAdapter(model="claude-sonnet-4-20250514")  # Default
```
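Keeping the routing rules as data, and returning provider/model names instead of adapter instances, lets you unit-test the policy without API keys or network calls. A minimal sketch mirroring smart_route()'s heuristics; ROUTES and route() are hypothetical names.

```python
# Hypothetical rule table: (keywords, provider, model), checked in order.
ROUTES = [
    (("code", "debug", "math", "logic"), "anthropic", "claude-opus-4-20250514"),
    (("中文", "chinese", "翻译"), "deepseek", "deepseek-chat"),  # "Chinese", "translate"
]
SHORT_QUERY = ("openai", "gpt-4o-mini")
DEFAULT = ("anthropic", "claude-sonnet-4-20250514")


def route(task: str) -> tuple[str, str]:
    """Return (provider, model) for a task using the same heuristics as smart_route()."""
    text = task.lower()
    for keywords, provider, model in ROUTES:
        if any(kw in text for kw in keywords):
            return provider, model
    if len(task) < 100:  # simple, short query
        return SHORT_QUERY
    return DEFAULT


print(route("Debug this stack trace"))
print(route("请把这段话翻译成英文"))  # "Please translate this passage into English"
print(route("What's 2+2?"))
```

Substring matching is coarse ("code" also matches "decode"); the point is only that the policy can be tested in isolation from the adapters.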
Here's the full model-agnostic Agent. The core loop never changes — only the adapter does:
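```python
import json


class ModelAgnosticAgent:
    def __init__(self, model: ModelAdapter, tools: list[dict],
                 system_prompt: str, tool_functions: dict | None = None):
        self.model = model
        self.tools = tools
        # Optional map of tool name -> Python callable, so _execute_tool
        # has something to dispatch to
        self.tool_functions = tool_functions or {}
        self.messages = [{"role": "system", "content": system_prompt}]
        self.total_cost = 0.0

    def _execute_tool(self, name: str, args):
        if isinstance(args, str):  # tolerate JSON-string arguments
            args = json.loads(args or "{}")
        func = self.tool_functions.get(name)
        if func is None:
            return {"error": f"Unknown tool: {name}"}
        return func(**args)

    def run(self, user_input: str, max_turns: int = 20) -> str:
        self.messages.append({"role": "user", "content": user_input})
        turns = 0
        while turns < max_turns:
            response = self.model.chat(
                self.messages, self.tools,
                temperature=0.7 if turns == 0 else 0.4,  # Cool down
            )
            turns += 1
            if response.is_final:
                return response.content
            # Execute tool and feed result back
            tool_name = response.tool_call["name"]
            tool_args = response.tool_call.get("arguments", {})
            # Reuse the provider's call id when the adapter supplies one
            call_id = response.tool_call.get("id") or f"call_{turns}"
            result = self._execute_tool(tool_name, tool_args)
            self.messages.append({
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": tool_name,
                        "arguments": json.dumps(tool_args)
                                     if isinstance(tool_args, dict)
                                     else tool_args,
                    },
                }],
            })
            self.messages.append({
                "role": "tool",
                "tool_call_id": call_id,
                "content": json.dumps(result, default=str),
            })
        return "Max turns reached without completion."
```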
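ModelAgnosticAgent declares self.total_cost but never updates it. Because every adapter reports usage in the same {"input": N, "output": M} shape, one way to fill it in is a small tracker fed with response.usage after each chat() call. CostTracker is a hypothetical helper, and the per-million-token prices are placeholders for two made-up models, not real provider pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens for hypothetical models.
# Look up current pricing for the models you actually use.
PRICES = {
    "model-a": {"input": 2.50, "output": 10.00},
    "model-b": {"input": 0.15, "output": 0.60},
}


@dataclass
class CostTracker:
    """Accumulates spend from AgentResponse.usage dicts ({"input": N, "output": M})."""
    model: str
    total_cost: float = 0.0

    def add(self, usage: dict) -> float:
        price = PRICES[self.model]
        cost = (usage["input"] * price["input"]
                + usage["output"] * price["output"]) / 1_000_000
        self.total_cost += cost
        return cost


tracker = CostTracker("model-a")
tracker.add({"input": 1_000, "output": 500})
print(f"${tracker.total_cost:.4f}")  # prints "$0.0075"
```

Inside run(), the equivalent would be one call such as `tracker.add(response.usage)` right after `self.model.chat(...)`.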
How do you know the adapter is working correctly? Test with a simple tool-calling task across all models:
```python
import os


def test_adapter(adapter: ModelAdapter):
    """Verify an adapter handles tool calling correctly."""
    tools = [{
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                },
            },
            "required": ["city"],
        },
    }]
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Tokyo?"},
    ]
    response = adapter.chat(messages, tools)
    assert not response.is_final, "Should request a tool call"
    assert response.tool_call["name"] == "get_weather", \
        f"Wrong tool: {response.tool_call['name']}"
    assert "Tokyo" in str(response.tool_call.get("arguments", "")), \
        "Missing city argument"


# Run against all your adapters; construct each one inside try so a
# missing API key fails only its own row
for name, make_adapter in [
    ("OpenAI", lambda: OpenAIAdapter()),
    ("Anthropic", lambda: AnthropicAdapter()),
    ("DeepSeek", lambda: OpenAIAdapter(
        model="deepseek-chat",
        api_key=os.environ.get("DEEPSEEK_API_KEY"),  # env var name is an assumption
        base_url="https://api.deepseek.com/v1")),
]:
    try:
        test_adapter(make_adapter())
        print(f"✅ {name}: PASS")
    except Exception as e:
        print(f"❌ {name}: FAIL ({e})")
```
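test_adapter() needs live API keys and spends tokens on every run. For CI, a scripted test double that implements the same ModelAdapter contract lets you exercise code that consumes AgentResponse with no network calls. ScriptedAdapter is a hypothetical helper, and the block repeats trimmed copies of AgentResponse and ModelAdapter so it runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResponse:
    """Same fields as the unified response defined earlier."""
    content: Optional[str]
    tool_call: Optional[dict]
    is_final: bool
    usage: dict


class ModelAdapter(ABC):
    """Trimmed copy of the adapter contract defined earlier."""

    @abstractmethod
    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000): ...

    @abstractmethod
    def normalize_tools(self, tools): ...

    @abstractmethod
    def normalize_messages(self, messages): ...


class ScriptedAdapter(ModelAdapter):
    """Hypothetical test double: replays canned responses and records requests."""

    def __init__(self, script):
        self.script = list(script)  # responses to return, in order
        self.requests = []          # every message list passed to chat()

    def normalize_tools(self, tools):
        return tools  # no provider format to target

    def normalize_messages(self, messages):
        return list(messages)

    def chat(self, messages, tools=None, temperature=0.7, max_tokens=1000):
        self.requests.append(self.normalize_messages(messages))
        return self.script.pop(0)  # IndexError if the script runs out


adapter = ScriptedAdapter([
    AgentResponse(None, {"id": "call_1", "name": "get_weather",
                         "arguments": {"city": "Tokyo"}},
                  False, {"input": 12, "output": 8}),
    AgentResponse("Sunny in Tokyo.", None, True, {"input": 30, "output": 5}),
])

# Same checks as test_adapter(), minus the network call
first = adapter.chat([{"role": "user", "content": "What's the weather in Tokyo?"}])
assert not first.is_final, "Should request a tool call"
assert first.tool_call["name"] == "get_weather"
assert "Tokyo" in str(first.tool_call.get("arguments", ""))

second = adapter.chat([{"role": "user", "content": "Tool said: sunny"}])
print(second.content, len(adapter.requests))
```

Because it satisfies the abstract interface, a scripted adapter can stand in wherever a ModelAdapter is expected, which keeps agent-side tests deterministic.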
| Framework | Model Support | Best For | When to Skip |
|---|---|---|---|
| smolagents (HuggingFace) | Any HF model + external APIs | Quick prototyping, HF ecosystem users | Need fine control over tool loop |
| DSPy | 10+ providers via adapters | Prompt optimization, A/B testing models | Simple tool-calling agents (overkill) |
| LangChain | Wide but historically OpenAI-first | Complex RAG pipelines, many integrations | Simplicity; LangChain adds abstraction overhead |
| Custom adapter (this article) | Any model, full control | Production systems, specific requirements | You only use one model |
The adapter interface needs only three methods: one chat() method, one normalize_tools(), and one normalize_messages(). Everything else is implementation detail.

Tool formats differ by provider: OpenAI wraps each tool in {"type":"function","function":{...}}, Anthropic uses bare objects with an input_schema field, and Gemini uses functionDeclarations arrays. This article standardizes on OpenAI's format internally, with each adapter handling its own conversion.

For choosing a model per task, the routing section above gives a task-to-model table and a smart_route() implementation you can adapt.

Model-Agnostic Agent: An AI agent architecture pattern in which the core decision loop (observe → think → act → observe) is decoupled from any specific LLM provider's API format. Using the Adapter Pattern, it defines a unified interface contract, typically chat() (send messages and receive responses), normalize_tools() (normalize tool definitions), and normalize_messages() (normalize message formats), and each model provider (OpenAI, Anthropic, DeepSeek, local Llama, etc.) implements that interface. The agent's core logic always operates on the unified format, so switching models only requires replacing the adapter instance, with no changes to business code. This is a low-cost insurance policy against vendor lock-in.