Ever had this experience? You ask an AI a question, and it gives you an answer that sounds thoroughly reasonable and well-argued. You believe it. Then you rephrase the question from a different angle — and it gives you an equally "reasonable" but completely opposite answer.
This is not a bug. It's a structural problem with single-model reasoning.
In this article, we'll start from cognitive psychology to understand why single AIs systematically err, then solve it with two agents debating each other — complete with runnable Python code.
Large language models learn human language patterns during training — and they also learn human cognitive biases. Here are the three most common and dangerous ones.
**Confirmation Bias.** Definition: Once an initial judgment forms, subsequent reasoning selectively seeks supporting evidence while ignoring counter-evidence.
An example. You ask an AI:
"Is microservices architecture better than monolithic?"
The AI starts answering: "Microservices have many advantages — independent deployment, flexible tech stacks, team autonomy…" It continues down this path. Everything you hear is pro-microservices.
But if you ask:
"Isn't monolithic architecture more pragmatic than microservices?"
The AI now answers: "Monolithic architecture is indeed more pragmatic — simpler deployment, easier debugging, no distributed transaction complexity…" Equally well-argued, opposite conclusion.
Where's the problem? The AI isn't deliberately deceiving you. It simply retrieves same-camp text from its training data based on your question's framing, then follows that track all the way down. It won't volunteer "however, the opposing side argues…" — unless you explicitly demand it.
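One lightweight mitigation follows directly from this: if the model only volunteers the opposing side when explicitly asked, bake that demand into every question. A minimal sketch (the wrapper function below is illustrative, not from any library):

```python
# Sketch: counteract framing-driven confirmation bias by explicitly
# demanding both sides in the prompt itself.
def balanced_prompt(question: str) -> str:
    """Wrap a question so the model must argue both sides before concluding."""
    return (
        f"{question}\n\n"
        "Before concluding, list the 3 strongest arguments FOR and the "
        "3 strongest arguments AGAINST, then weigh them explicitly."
    )

prompt = balanced_prompt("Is microservices architecture better than monolithic?")
print(prompt)
```

This doesn't eliminate the bias, but it changes the retrieval framing: instead of following one camp's track, the model is forced to surface both camps before it concludes.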
**Anchoring Effect.** Definition: The first piece of information encountered (the "anchor") disproportionately influences subsequent judgments.
An example. Suppose you're estimating a new project timeline. The AI first sizes the login module at 3 days, then anchors everything else to that number: payments, roughly twice as complex, 6 days; reporting, about the same, 3 days; and so on up to a project total.
Every step seems reasonable — but that initial "3 days" might itself be wrong (maybe the login module involves SSO, multi-factor auth, audit logging — actually needing 2 weeks). That error compounds at every layer of subsequent reasoning.
A single AI's conversation is linear: earlier output becomes later input. An early misjudgment is like a foundation tilted 1 degree — the higher you build, the further off you land.
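The compounding is easy to quantify. The sketch below uses made-up numbers (the module names and multipliers are illustrative): when later modules are estimated as multiples of the anchor, an error in the anchor scales the entire plan linearly.

```python
# Illustrative numbers: how a wrong anchor propagates when every later
# estimate is expressed as a multiple of the first one.
anchor_days = 3    # initial estimate for the login module
true_days = 14     # actual effort (SSO, MFA, audit logging)

# Later modules sized relative to the login module
relative_sizes = {"payments": 2.0, "reporting": 1.0}

multiplier = 1 + sum(relative_sizes.values())   # login itself + the rest
estimated_total = anchor_days * multiplier      # 3 * 4.0 = 12.0 days
true_total = true_days * multiplier             # 14 * 4.0 = 56.0 days

print(estimated_total, true_total)
```

The anchor was off by a factor of about 4.7, and because everything downstream was scaled from it, the whole plan is off by the same factor.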
**Overconfidence.** Definition: Excessively high confidence in one's own judgment, paired with a poor ability to express uncertainty.
An example. You ask an AI: "Does this technical solution have security vulnerabilities?"
The AI might answer: "After review, no obvious security vulnerabilities were found. The code uses parameterized queries to prevent SQL injection, passwords are hashed with bcrypt, and session management uses HttpOnly cookies."
Sounds professional and confident. But it won't volunteer: "However, I cannot detect logic-level vulnerabilities (like missing authorization checks), nor can I discover known CVEs in third-party dependencies — those require security testing tools."
Worse, if you ask it to "self-review," it will most likely repeat its previous conclusion with a few cosmetic additions. It's like asking a student to grade their own exam — they can't find their own mistakes because they don't know where they might be wrong.
| Bias | Essence | One-Liner Harm |
|---|---|---|
| Confirmation Bias | Only sees supporting evidence | Whatever you ask, it agrees with you |
| Anchoring | Held hostage by initial information | The first mistake poisons all subsequent reasoning |
| Overconfidence | Overestimates own judgment | Never volunteers "I'm not sure" or "I might have missed something" |
If the bias of a single model comes from having "only one voice," the solution is natural: introduce a second, opposing voice.
Adversarial Collaboration is a scientific methodology originating from cognitive psychology, popularized by Nobel laureate Daniel Kahneman and others. Its core idea:
Have two parties with opposing views jointly design the research protocol, rather than each doing their own thing and attacking the other. The goal is not to "win," but to find the truth together.
Traditional debate is adversarial — both sides want to win. Adversarial collaboration is different: both sides agree on shared evaluation criteria before engaging, then let the facts speak.
In the world of AI Agents, adversarial collaboration maps intuitively: the two opposing parties become two agents with opposite stances, the shared evaluation criteria become a fixed rubric held by a judge, and "letting the facts speak" becomes the judge's synthesis of the full transcript.
This process mirrors academic peer review and the adversarial legal system — truth sharpens through challenge.
Below is a complete Python implementation. It creates two agents — one for and one against a proposition — runs multiple rounds of debate, and has a judge synthesize the conclusion.
Save it as debate.py, install openai, and you're ready to run.
"""
Multi-Agent Adversarial Collaboration — Beginner Example
Two agents debate opposing positions; a judge synthesizes the conclusion.
Requires: pip install openai
"""
import os
import json
from openai import OpenAI
# ──────────────────────────────────────────────
# 1. Initialize LLM client (placeholder credentials)
# ──────────────────────────────────────────────
client = OpenAI(
api_key="your-api-key",
base_url="https://api.example.com/v1"
)
# ──────────────────────────────────────────────
# 2. Debate Agent class
# ──────────────────────────────────────────────
class DebateAgent:
"""
A debate agent holding a specific stance.
Parameters:
name: Agent name (for logging)
stance: Position label, e.g. "Pro" or "Con"
system_prompt: System instructions defining its debate strategy
"""
def __init__(self, name: str, stance: str, system_prompt: str):
self.name = name
self.stance = stance
self.system_prompt = system_prompt
self.history: list[dict] = [] # Full conversation history
def respond(self, opponent_argument: str | None = None) -> str:
"""
Generate one round of argument.
If first round (opponent_argument=None), make an opening statement.
Otherwise, rebut the opponent's arguments and add new points.
"""
messages = [{"role": "system", "content": self.system_prompt}]
# Load conversation history
for entry in self.history:
messages.append(entry)
# Build the current turn's prompt
if opponent_argument is None:
user_prompt = (
"Please begin your opening statement. List 3-5 core arguments "
"supporting your position, each with specific reasoning."
)
else:
user_prompt = (
f"Below is your opponent's argument. Read it carefully, "
f"then rebut each point:\n\n"
f"--- Opponent's argument ---\n{opponent_argument}\n--- End ---\n\n"
f"Requirements:\n"
f"1. Respond to each of the opponent's points, "
f"identifying logical flaws or factual errors\n"
f"2. Present new arguments supporting your position\n"
f"3. If the opponent is genuinely right on some points, "
f"concede them but explain why they don't change your overall stance"
)
messages.append({"role": "user", "content": user_prompt})
# Call the LLM
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
temperature=0.7,
max_tokens=800
)
reply = response.choices[0].message.content
self.history.append({"role": "assistant", "content": reply})
return reply
# ──────────────────────────────────────────────
# 3. Judge Agent (synthesizes the conclusion)
# ──────────────────────────────────────────────
class JudgeAgent:
"""
An impartial judge that synthesizes the full debate record
into a structured final conclusion.
"""
def evaluate(self, topic: str, debate_log: list[dict]) -> str:
"""
Read the complete debate transcript and produce a structured conclusion.
"""
# Build debate transcript
transcript_parts = []
for entry in debate_log:
transcript_parts.append(
f"### {entry['speaker']} (position: {entry['stance']})"
f" — Round {entry['round']}\n"
f"{entry['content']}\n"
)
transcript = "\n".join(transcript_parts)
system_prompt = (
"You are an absolutely impartial judge. "
"Your task is not to decide 'who won,' but to synthesize. \n\n"
"Please structure your conclusion as follows:\n"
"1. **Pro strengths**: Which pro arguments went unrebutted?\n"
"2. **Con strengths**: Which con arguments went unanswered?\n"
"3. **Areas of agreement**: What facts did both sides agree on?\n"
"4. **Uncertain areas**: Which key questions lack sufficient data to resolve?\n"
"5. **Overall recommendation**: Based on the above, give practical advice."
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": (
f"Debate topic: {topic}\n\n"
f"Full transcript:\n{transcript}\n\n"
f"Please deliver your synthesis."
)}
],
temperature=0.3, # Lower temperature for consistency
max_tokens=1000
)
return response.choices[0].message.content
# ──────────────────────────────────────────────
# 4. Debate Engine
# ──────────────────────────────────────────────
def run_debate(topic: str, rounds: int = 3) -> dict:
"""
Run a full adversarial collaboration debate.
Parameters:
topic: The debate proposition
rounds: Number of debate rounds (default 3)
Returns:
Dict containing the topic, debate transcript, and conclusion
"""
# ── Create Pro Agent ──
agent_pro = DebateAgent(
name="Pro",
stance="For",
system_prompt=(
f"You are a logically rigorous debater. "
f"Your position is [FOR] the following proposition:\n"
f"\"{topic}\"\n\n"
f"Rules:\n"
f"- Support your arguments with facts, data, and logic\n"
f"- When challenged, respond directly — do not evade\n"
f"- Do not voluntarily switch positions during the debate\n"
f"- If the opponent makes a point you cannot refute, "
f"concede honestly but explain why its overall impact is limited"
)
)
# ── Create Con Agent ──
agent_con = DebateAgent(
name="Con",
stance="Against",
system_prompt=(
f"You are a logically rigorous debater. "
f"Your position is [AGAINST] the following proposition:\n"
f"\"{topic}\"\n\n"
f"Rules:\n"
f"- Support your arguments with facts, data, and logic\n"
f"- When challenged, respond directly — do not evade\n"
f"- Do not voluntarily switch positions during the debate\n"
f"- If the opponent makes a point you cannot refute, "
f"concede honestly but explain why its overall impact is limited"
)
)
debate_log = []
pro_last = None
con_last = None
print(f"\n{'=' * 60}")
print(f"\U0001f3af Debate topic: {topic}")
print(f"{'=' * 60}")
# ── Run multiple debate rounds ──
for r in range(1, rounds + 1):
# Pro speaks
pro_arg = agent_pro.respond(con_last)
print(f"\n{'─' * 60}")
print(f"\U0001f5e3\ufe0f Pro — Round {r}")
print(f"{'─' * 60}")
print(pro_arg)
debate_log.append({
"round": r,
"speaker": "Pro",
"stance": "For",
"content": pro_arg
})
pro_last = pro_arg
# Con speaks
con_arg = agent_con.respond(pro_last)
print(f"\n{'─' * 60}")
print(f"\U0001f5e3\ufe0f Con — Round {r}")
print(f"{'─' * 60}")
print(con_arg)
debate_log.append({
"round": r,
"speaker": "Con",
"stance": "Against",
"content": con_arg
})
con_last = con_arg
# ── Judge synthesizes ──
judge = JudgeAgent()
conclusion = judge.evaluate(topic, debate_log)
print(f"\n{'=' * 60}")
print("\u2696\ufe0f Judge's Synthesis")
print(f"{'=' * 60}")
print(conclusion)
return {
"topic": topic,
"rounds": rounds,
"debate_log": debate_log,
"conclusion": conclusion
}
# ──────────────────────────────────────────────
# 5. Run the example
# ──────────────────────────────────────────────
if __name__ == "__main__":
result = run_debate(
topic="Should a small startup (under 10 people) "
"adopt microservices architecture from day one?",
rounds=3
)
# Optional: save debate record to file
with open("/tmp/debate_result.json", "w", encoding="utf-8") as f:
json.dump(result, f, ensure_ascii=False, indent=2)
print("\n\U0001f4c1 Debate record saved to /tmp/debate_result.json")
The code above is nearly 200 lines, but the structure is crystal clear — just three core classes and one engine function:
| Component | Responsibility | Key Detail |
|---|---|---|
| `DebateAgent` | Holds a single position, generates arguments and responds to rebuttals | Maintains its own history; every response builds on the full history |
| `JudgeAgent` | Reads the debate transcript and produces a structured conclusion | Uses `temperature=0.3` to reduce randomness for consistent judgment |
| `run_debate()` | Orchestrates the debate flow | Alternates between both agents, collects full logs, triggers the judge |
| `debate_log` | Structured record of each round: speaker, stance, and content | Complete traceable record for post-hoc analysis |
To run it, replace `your-api-key` and `api.example.com` with your actual API credentials and endpoint. The debate result is saved to `/tmp/debate_result.json` — you can compare how the pro and con arguments evolved across rounds.
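Before wiring in real credentials, you can sanity-check the turn-taking logic offline. The sketch below reimplements just the alternation loop with canned string replies instead of LLM calls (it is a standalone toy, not part of debate.py):

```python
# Offline toy: reproduce run_debate's alternation and log structure
# with deterministic canned replies instead of LLM calls.
def run_debate_offline(topic: str, rounds: int = 2) -> list[dict]:
    log = []
    con_last = None
    for r in range(1, rounds + 1):
        # Pro speaks first; in round 1 it has nothing to rebut
        pro_arg = f"[Pro r{r}] rebuts: {con_last or 'opening statement'}"
        log.append({"round": r, "speaker": "Pro", "content": pro_arg})
        # Con always rebuts Pro's argument from the same round
        con_arg = f"[Con r{r}] rebuts: {pro_arg}"
        log.append({"round": r, "speaker": "Con", "content": con_arg})
        con_last = con_arg
    return log

log = run_debate_offline("demo topic", rounds=2)
for entry in log:
    print(entry["content"])
```

The key property to verify is the data flow: Con always sees Pro's same-round argument, while Pro (from round 2 on) sees Con's previous-round argument.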
You might ask: "Can't you just have one agent review its own output? Don't prompt engineering techniques like Chain-of-Thought and Self-Refine do exactly that?"
It's a good question, but the answer is: self-reflection has fundamental limitations.
Imagine proofreading an article you just wrote. You read it three times and think it's perfect — not because it is, but because your brain knows what you meant to say. You automatically fill in missing logic, gloss over vague phrasing, and overlook weak arguments.
An AI agent's self-reflection works the same way: it re-reads its own output with the same weights, the same knowledge, and the same blind spots that produced it, so it tends to confirm what it already wrote rather than catch genuine errors.
When two agents challenge each other, the situation is entirely different:
| Dimension | Self-Reflection | Mutual Challenge |
|---|---|---|
| Perspective | Single perspective, examined from within | Two orthogonal perspectives, challenged from outside |
| Knowledge boundary | Limited to one model's knowledge | Both sides can introduce different evidence domains (if combined with RAG) |
| Reasoning path | Linear reflection, strong path dependence | Two independent paths cross-colliding |
| Adversarial pressure | None — won't genuinely question itself | Strong — every statement can be rebutted |
| Bias exposure | Hidden — biases self-reinforce during reflection | Exposed — biases become attack points for the opponent |
Suppose you're making an important decision: "Should we migrate our core database from PostgreSQL to TiDB?"
Single Agent + Self-Reflection — The agent lists some pros and cons, then self-reviews: "The above analysis is generally reasonable, though we could add…" You get a conclusion that looks comprehensive but is actually mild.
Two Agents + Mutual Challenge — The pro agent argues the migration is worth it: TiDB scales horizontally and removes manual sharding. The con agent pushes back point by point: "Your second argument lacks data — show me the numbers. What is the actual write QPS, and what is the operational cost of running a distributed database?"
See the difference? Self-reflection says "generally reasonable." Mutual challenge says "your second argument lacks data — show me the numbers." The latter exposes problems the former would never find.
The `debate.py` in this article is already a working multi-agent debate system prototype. Copy it, replace the API key, run it, and see the effect firsthand.

📎 Replaces the earlier version: This site's previously published Multi-Agent Debate System Design briefly introduced the concept. This series systematically rebuilds on that foundation — from cognitive bias principles through code implementation to production deployment, providing a complete step-by-step learning path. Use this series as the canonical reference.
📖 Next: Structured Debate Protocol — 3-round debate (Opening → Cross-Examination → Closing) + Judge Agent role design