Ever had this experience? You ask an AI a question, and it gives you an answer that sounds thoroughly reasonable and well-argued. You believe it. Then you rephrase the question from a different angle — and it gives you an equally "reasonable" but completely opposite answer.
This is not a bug. It's a structural problem with single-model reasoning.
In this article, we'll start from cognitive psychology to understand why single AIs systematically err, then solve it with two agents debating each other — complete with runnable Python code.
Large language models learn human language patterns during training — and they also learn human cognitive biases. Here are the three most common and dangerous ones.
Confirmation bias. Definition: once an initial judgment forms, subsequent reasoning selectively seeks supporting evidence while ignoring counter-evidence.
An example. You ask an AI:
"Is microservices architecture better than monolithic?"
The AI starts answering: "Microservices have many advantages — independent deployment, flexible tech stacks, team autonomy…" It continues down this path. Everything you hear is pro-microservices.
But if you ask:
"Isn't monolithic architecture more pragmatic than microservices?"
The AI now answers: "Monolithic architecture is indeed more pragmatic — simpler deployment, easier debugging, no distributed transaction complexity…" Equally well-argued, opposite conclusion.
Where's the problem? The AI isn't deliberately deceiving you. Its output is conditioned on your question's framing, so a question that leans one way pulls it toward the same-camp arguments it saw in training, and it then follows that track all the way down. It won't volunteer "however, the opposing side argues…" unless you explicitly demand it.
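One way to see this framing effect for yourself is to send the same comparison under several framings and read the answers side by side. The sketch below is a minimal illustration, not part of the article's script: framing_probe and echo_model are hypothetical names, and echo_model is a stand-in you would swap for a real model call.

```python
from typing import Callable


def framing_probe(option_a: str, option_b: str,
                  ask: Callable[[str], str]) -> dict[str, str]:
    """Ask the same comparison under three framings and collect the answers."""
    prompts = {
        # Leading question that favors option A
        "favors_a": f"Is {option_a} better than {option_b}?",
        # Leading question that favors option B
        "favors_b": f"Isn't {option_b} more pragmatic than {option_a}?",
        # Neutral framing that explicitly asks for both sides
        "neutral": (
            f"Compare {option_a} and {option_b}. Give the strongest case "
            f"for each, then say which conditions favor which."
        ),
    }
    return {name: ask(prompt) for name, prompt in prompts.items()}


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call: it only echoes the prompt."""
    return f"[model answer to] {prompt}"


answers = framing_probe("microservices architecture",
                        "monolithic architecture", echo_model)
for name, text in answers.items():
    print(f"{name}: {text}")
```

If the answers to the two leading framings contradict each other, that is the confirmation-bias pattern described above.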
Anchoring effect. Definition: the first piece of information encountered (the "anchor") disproportionately influences subsequent judgments.
An example. Suppose you're estimating a new project timeline, and the AI opens by putting the login module at 3 days, then derives every later estimate from that figure.
Every step seems reasonable, but that initial "3 days" might itself be wrong (maybe the login module involves SSO, multi-factor auth, and audit logging, and actually needs 2 weeks). That error compounds at every layer of subsequent reasoning.
A single AI's conversation is linear: earlier output becomes later input. An early misjudgment is like a foundation tilted 1 degree — the higher you build, the further off you land.
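The compounding can be made concrete with a little arithmetic. In the sketch below, every later estimate is derived from the login-module anchor. The module names and multipliers are invented for illustration, and "2 weeks" is taken as 10 working days.

```python
# Hypothetical plan: each estimate is expressed as a multiple of the
# login-module estimate (the anchor). These multipliers are assumptions.
DERIVED_FROM_ANCHOR = {
    "login module": 1.0,
    "user profile": 2.0,
    "admin dashboard": 3.0,
}


def project_total(anchor_days: float) -> float:
    """Total plan length when every estimate scales with the anchor."""
    return sum(anchor_days * m for m in DERIVED_FROM_ANCHOR.values())


anchored = project_total(3)     # plan built on the initial "3 days"
realistic = project_total(10)   # plan if login really takes 2 weeks

print(f"Plan built on the anchor: {anchored:.0f} days")
print(f"Plan with a 2-week login module: {realistic:.0f} days")
print(f"Shortfall: {realistic - anchored:.0f} days")
```

A 7-day error in the anchor becomes a much larger error in the total, because nothing downstream ever re-examines the first number.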
Overconfidence. Definition: excessively high confidence in one's own judgment, combined with a poor ability to express uncertainty.
An example. You ask an AI: "Does this technical solution have security vulnerabilities?"
The AI might answer: "After review, no obvious security vulnerabilities were found. The code uses parameterized queries to prevent SQL injection, passwords are hashed with bcrypt, and session management uses HttpOnly cookies."
Sounds professional and confident. But it won't volunteer: "However, I cannot detect logic-level vulnerabilities (like missing authorization checks), nor can I discover known CVEs in third-party dependencies — those require security testing tools."
Worse, if you ask it to "self-review," it will most likely repeat its previous conclusion with a few cosmetic additions. It's like asking a student to grade their own exam — they can't find their own mistakes because they don't know where they might be wrong.
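One way to make the unstated limits visible is to compare what an answer actually covered against a checklist you define up front. The sketch below is a minimal illustration: the check names are labels invented here to mirror the security example above, not the output of any real tool.

```python
# Hypothetical checklist for the security review in the example above.
REQUIRED_CHECKS = {
    "sql_injection",
    "password_hashing",
    "session_cookies",
    "authorization_logic",   # logic-level flaws such as missing auth checks
    "dependency_cves",       # known CVEs in third-party dependencies
}


def uncovered_checks(covered: set[str],
                     required: set[str] = REQUIRED_CHECKS) -> list[str]:
    """Return the required checks that the answer never addressed."""
    return sorted(required - covered)


# What the confident-sounding answer actually talked about
covered_by_answer = {"sql_injection", "password_hashing", "session_cookies"}

print(uncovered_checks(covered_by_answer))
```

The point is not the set arithmetic but the habit: the model's answer says nothing about the gaps unless something outside it asks.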
If the bias of a single model comes from having "only one voice," the solution is natural: introduce a second, opposing voice.
Adversarial Collaboration is a scientific methodology originating from cognitive psychology, popularized by Nobel laureate Daniel Kahneman and others. Its core idea:
Have two parties with opposing views jointly design the research protocol, rather than each doing their own thing and attacking the other. The goal is not to "win," but to find the truth together.
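Traditional debate is purely competitive: both sides want to win. Adversarial collaboration differs in that both sides agree on shared evaluation criteria before engaging, then let the facts speak.
In the world of AI Agents, adversarial collaboration maps intuitively: one agent argues for a proposition, a second argues against it, the two rebut each other over several rounds, and a judge synthesizes the result.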
This process mirrors academic peer review and the adversarial legal system — truth sharpens through challenge.
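The "shared evaluation criteria" idea can be carried into agent prompts by giving both sides the same criteria list before the debate starts. The sketch below is one possible way to do that, under stated assumptions: build_stance_prompt and the three criteria are hypothetical, and the article's own script does not include this step.

```python
def build_stance_prompt(topic: str, stance: str, criteria: list[str]) -> str:
    """Build a system prompt that binds one side to the shared criteria."""
    criteria_lines = "\n".join(f"- {c}" for c in criteria)
    return (
        f"You argue {stance} the proposition: \"{topic}\"\n\n"
        f"Both sides agreed on these evaluation criteria before the debate:\n"
        f"{criteria_lines}\n\n"
        f"Tie every argument to at least one criterion. "
        f"The goal is to find the truth, not to win."
    )


# Illustrative criteria; in practice you would choose them per decision.
shared = ["time to first release", "operational cost", "ease of debugging"]
topic = "Should a small startup adopt microservices architecture from day one?"

pro_prompt = build_stance_prompt(topic, "FOR", shared)
con_prompt = build_stance_prompt(topic, "AGAINST", shared)
print(pro_prompt)
```

Because both prompts differ only in the stance, neither side can later move the goalposts on what counts as a good argument.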
Below is a complete Python implementation. It creates two agents — one for and one against a proposition — runs multiple rounds of debate, and has a judge synthesize the conclusion.
Save it as debate.py, install openai, and you're ready to run.
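The script's client setup uses placeholder credentials, so it will not authenticate until you replace them. One common alternative to hard-coding a key is to read it from environment variables. The sketch below assumes the variable names OPENAI_API_KEY and OPENAI_BASE_URL, and load_client_settings is a hypothetical helper rather than part of the article's script.

```python
import os
from typing import Mapping


def load_client_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect client settings from the environment, failing early if unset."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running debate.py")
    settings = {"api_key": api_key}
    # The base URL is optional; only pass it when one is configured
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        settings["base_url"] = base_url
    return settings


# Demo with a fake environment so the sketch runs without real credentials
fake_env = {
    "OPENAI_API_KEY": "sk-test",
    "OPENAI_BASE_URL": "https://api.example.com/v1",
}
settings = load_client_settings(fake_env)
print(sorted(settings))
```

In the script you could then construct the client with OpenAI(**load_client_settings()) instead of the literal placeholder values.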
```python
"""
Multi-Agent Adversarial Collaboration: Beginner Example
Two agents debate opposing positions; a judge synthesizes the conclusion.
Requires: pip install openai
"""
import json

from openai import OpenAI

# ──────────────────────────────────────────────
# 1. Initialize LLM client (placeholder credentials)
# ──────────────────────────────────────────────
client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.example.com/v1"
)


# ──────────────────────────────────────────────
# 2. Debate Agent class
# ──────────────────────────────────────────────
class DebateAgent:
    """
    A debate agent holding a specific stance.

    Parameters:
        name: Agent name (for logging)
        stance: Position label, e.g. "Pro" or "Con"
        system_prompt: System instructions defining its debate strategy
    """

    def __init__(self, name: str, stance: str, system_prompt: str):
        self.name = name
        self.stance = stance
        self.system_prompt = system_prompt
        self.history: list[dict] = []  # Full conversation history

    def respond(self, opponent_argument: str | None = None) -> str:
        """
        Generate one round of argument.

        If first round (opponent_argument=None), make an opening statement.
        Otherwise, rebut the opponent's arguments and add new points.
        """
        messages = [{"role": "system", "content": self.system_prompt}]

        # Load conversation history (earlier prompts and replies)
        messages.extend(self.history)

        # Build the current turn's prompt
        if opponent_argument is None:
            user_prompt = (
                "Please begin your opening statement. List 3-5 core arguments "
                "supporting your position, each with specific reasoning."
            )
        else:
            user_prompt = (
                f"Below is your opponent's argument. Read it carefully, "
                f"then rebut each point:\n\n"
                f"--- Opponent's argument ---\n{opponent_argument}\n--- End ---\n\n"
                f"Requirements:\n"
                f"1. Respond to each of the opponent's points, "
                f"identifying logical flaws or factual errors\n"
                f"2. Present new arguments supporting your position\n"
                f"3. If the opponent is genuinely right on some points, "
                f"concede them but explain why they don't change your overall stance"
            )
        messages.append({"role": "user", "content": user_prompt})

        # Call the LLM
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0.7,
            max_tokens=800
        )
        reply = response.choices[0].message.content

        # Store both sides of the turn so later rounds also see the
        # opponent's earlier arguments, not only this agent's replies
        self.history.append({"role": "user", "content": user_prompt})
        self.history.append({"role": "assistant", "content": reply})
        return reply


# ──────────────────────────────────────────────
# 3. Judge Agent (synthesizes the conclusion)
# ──────────────────────────────────────────────
class JudgeAgent:
    """
    An impartial judge that synthesizes the full debate record
    into a structured final conclusion.
    """

    def evaluate(self, topic: str, debate_log: list[dict]) -> str:
        """
        Read the complete debate transcript and produce a structured conclusion.
        """
        # Build debate transcript
        transcript_parts = []
        for entry in debate_log:
            transcript_parts.append(
                f"### {entry['speaker']} (position: {entry['stance']})"
                f" - Round {entry['round']}\n"
                f"{entry['content']}\n"
            )
        transcript = "\n".join(transcript_parts)

        system_prompt = (
            "You are an absolutely impartial judge. "
            "Your task is not to decide 'who won,' but to synthesize. \n\n"
            "Please structure your conclusion as follows:\n"
            "1. **Pro strengths**: Which pro arguments went unrebutted?\n"
            "2. **Con strengths**: Which con arguments went unanswered?\n"
            "3. **Areas of agreement**: What facts did both sides agree on?\n"
            "4. **Uncertain areas**: Which key questions lack sufficient data to resolve?\n"
            "5. **Overall recommendation**: Based on the above, give practical advice."
        )

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": (
                    f"Debate topic: {topic}\n\n"
                    f"Full transcript:\n{transcript}\n\n"
                    f"Please deliver your synthesis."
                )}
            ],
            temperature=0.3,  # Lower temperature for consistency
            max_tokens=1000
        )
        return response.choices[0].message.content


# ──────────────────────────────────────────────
# 4. Debate Engine
# ──────────────────────────────────────────────
def run_debate(topic: str, rounds: int = 3) -> dict:
    """
    Run a full adversarial collaboration debate.

    Parameters:
        topic: The debate proposition
        rounds: Number of debate rounds (default 3)

    Returns:
        Dict containing the topic, debate transcript, and conclusion
    """
    # ── Create Pro Agent ──
    agent_pro = DebateAgent(
        name="Pro",
        stance="For",
        system_prompt=(
            f"You are a logically rigorous debater. "
            f"Your position is [FOR] the following proposition:\n"
            f"\"{topic}\"\n\n"
            f"Rules:\n"
            f"- Support your arguments with facts, data, and logic\n"
            f"- When challenged, respond directly and do not evade\n"
            f"- Do not voluntarily switch positions during the debate\n"
            f"- If the opponent makes a point you cannot refute, "
            f"concede honestly but explain why its overall impact is limited"
        )
    )

    # ── Create Con Agent ──
    agent_con = DebateAgent(
        name="Con",
        stance="Against",
        system_prompt=(
            f"You are a logically rigorous debater. "
            f"Your position is [AGAINST] the following proposition:\n"
            f"\"{topic}\"\n\n"
            f"Rules:\n"
            f"- Support your arguments with facts, data, and logic\n"
            f"- When challenged, respond directly and do not evade\n"
            f"- Do not voluntarily switch positions during the debate\n"
            f"- If the opponent makes a point you cannot refute, "
            f"concede honestly but explain why its overall impact is limited"
        )
    )

    debate_log = []
    pro_last = None
    con_last = None

    print(f"\n{'=' * 60}")
    print(f"\U0001f3af Debate topic: {topic}")
    print(f"{'=' * 60}")

    # ── Run multiple debate rounds ──
    for r in range(1, rounds + 1):
        # Pro speaks (opening statement in round 1, rebuttal afterwards)
        pro_arg = agent_pro.respond(con_last)
        print(f"\n{'─' * 60}")
        print(f"\U0001f5e3\ufe0f Pro - Round {r}")
        print(f"{'─' * 60}")
        print(pro_arg)
        debate_log.append({
            "round": r,
            "speaker": "Pro",
            "stance": "For",
            "content": pro_arg
        })
        pro_last = pro_arg

        # Con speaks (always in reply to Pro's latest argument)
        con_arg = agent_con.respond(pro_last)
        print(f"\n{'─' * 60}")
        print(f"\U0001f5e3\ufe0f Con - Round {r}")
        print(f"{'─' * 60}")
        print(con_arg)
        debate_log.append({
            "round": r,
            "speaker": "Con",
            "stance": "Against",
            "content": con_arg
        })
        con_last = con_arg

    # ── Judge synthesizes ──
    judge = JudgeAgent()
    conclusion = judge.evaluate(topic, debate_log)

    print(f"\n{'=' * 60}")
    print("\u2696\ufe0f Judge's Synthesis")
    print(f"{'=' * 60}")
    print(conclusion)

    return {
        "topic": topic,
        "rounds": rounds,
        "debate_log": debate_log,
        "conclusion": conclusion
    }


# ──────────────────────────────────────────────
# 5. Run the example
# ──────────────────────────────────────────────
if __name__ == "__main__":
    result = run_debate(
        topic="Should a small startup (under 10 people) "
              "adopt microservices architecture from day one?",
        rounds=3
    )

    # Optional: save debate record to file
    with open("/tmp/debate_result.json", "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    print("\n\U0001f4c1 Debate record saved to /tmp/debate_result.json")
```
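The code above is nearly 200 lines, but the structure is simple: two core classes and one engine function.
- DebateAgent: holds one stance and keeps its conversation in history, so each rebuttal builds on earlier rounds.
- JudgeAgent: reads the whole transcript and synthesizes a conclusion, using temperature=0.3 for consistency.
- run_debate(): alternates Pro and Con for the requested number of rounds and records every turn in debate_log.
Before running, replace the placeholder your-api-key and the api.example.com base URL with your provider's values. The debate record is saved to /tmp/debate_result.json.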
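Once a debate has run, the debate_log entries are plain dicts with round, speaker, stance, and content keys, so they are easy to post-process. The sketch below shows one way to turn a log into a compact overview. summarize_log is a hypothetical helper, and sample_log is made-up data in the same shape as the script's output.

```python
def summarize_log(debate_log: list[dict], preview_chars: int = 60) -> list[str]:
    """Return one summary line per turn, with a short content preview."""
    lines = []
    for entry in debate_log:
        # Flatten newlines so each turn fits on a single line
        preview = entry["content"].replace("\n", " ")[:preview_chars]
        lines.append(
            f"Round {entry['round']} | {entry['speaker']} "
            f"({entry['stance']}): {preview}"
        )
    return lines


# Made-up entries shaped like the ones run_debate() appends to debate_log
sample_log = [
    {"round": 1, "speaker": "Pro", "stance": "For",
     "content": "Independent deployment\nspeeds up releases."},
    {"round": 1, "speaker": "Con", "stance": "Against",
     "content": "A team of ten cannot staff the operational overhead."},
]

for line in summarize_log(sample_log):
    print(line)
```

The same function works on the debate_log list loaded back from the saved JSON file.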
You might ask: "Can't you just have one agent review its own output? Don't prompt engineering techniques like Self-Refine do exactly that, with Chain-of-Thought laying out the reasoning to be reviewed?"
It's a good question, but the answer is: self-reflection has fundamental limitations.
Imagine proofreading an article you just wrote. You read it three times and think it's perfect — not because it is, but because your brain knows what you meant to say. You automatically fill in missing logic, gloss over vague phrasing, and overlook weak arguments.
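An AI agent's self-reflection works the same way: it reviews its output with the same context and assumptions that produced it, so it tends to approve what it already believes.
When two agents challenge each other, the situation is entirely different: the second agent's only job is to hunt for logical gaps, overlooked counter-evidence, and hidden assumptions in the first agent's argument.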
Suppose you're making an important decision: "Should we migrate our core database from PostgreSQL to TiDB?"
Single Agent + Self-Reflection: the agent lists some pros and cons, then self-reviews: "The above analysis is generally reasonable, though we could add…" You get a conclusion that looks comprehensive but is actually noncommittal.
Two Agents + Mutual Challenge: one agent argues for the migration, the other argues against it, and each has to answer the other's specific objections.
See the difference? Self-reflection says "generally reasonable." Mutual challenge says "your second argument lacks data, so show me the numbers." The latter exposes problems the former is likely to miss.
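In terms of API calls, the difference between the two approaches comes down to what context the reviewer sees. The sketch below contrasts the two message lists under stated assumptions: both function names, the prompts, and the sample answer are invented for illustration.

```python
def self_review_messages(history: list[dict]) -> list[dict]:
    """Self-reflection: the reviewer inherits the whole conversation,
    including the framing and assumptions that produced the answer."""
    return history + [
        {"role": "user",
         "content": "Review your answer above and fix any mistakes."}
    ]


def critic_messages(answer: str) -> list[dict]:
    """Mutual challenge: a second agent starts from a fresh context and
    sees only the answer, under an explicitly skeptical brief."""
    return [
        {"role": "system",
         "content": ("You are a skeptical reviewer. Find unsupported claims, "
                     "missing counter-evidence, and hidden assumptions. "
                     "Ask for data wherever none is given.")},
        {"role": "user",
         "content": f"Challenge this analysis point by point:\n\n{answer}"},
    ]


# Placeholder conversation standing in for the first agent's session
author_history = [
    {"role": "system", "content": "You are a helpful database consultant."},
    {"role": "user", "content": "Should we migrate from PostgreSQL to TiDB?"},
    {"role": "assistant", "content": "Yes. Argument 2: it will pay for itself."},
]

review = self_review_messages(author_history)
challenge = critic_messages(author_history[-1]["content"])
print(len(review), "messages for self-review,",
      len(challenge), "for the critic")
```

The self-review list carries the original system prompt and question along with it, while the critic's list contains neither, which is what gives the second agent room to disagree.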
📎 Replaces the earlier version: This site's previously published Multi-Agent Debate System Design briefly introduced the concept. This series systematically rebuilds on that foundation — from cognitive bias principles through code implementation to production deployment, providing a complete gradient learning path. Use this series as the canonical reference.
📖 Next: Structured Debate Protocol — 3-round debate (Opening → Cross-Examination → Closing) + Judge Agent role design
Adversarial Collaboration: A methodology for improving AI decision quality by having two or more AI Agents with opposing stances challenge each other's reasoning, thereby counteracting the cognitive biases inherent in single-model outputs. Single large language models exhibit three systematic biases: Confirmation Bias (once an initial judgment forms, subsequent reasoning selectively seeks supporting evidence), Anchoring Effect (the first piece of information encountered disproportionately influences subsequent judgments), and Overconfidence (models express high confidence even in incorrect answers and rarely state what they could not check). Adversarial collaboration forcibly introduces an external critical perspective: after one Agent presents an argument, another Agent specifically hunts for logical gaps, overlooked counter-evidence, and hidden assumptions. The premise of this series is that two Agents debating each other can expose problems that a single Agent's self-reflection tends to miss. The concept originates from cognitive psychology research on adversarial collaboration, developed by Daniel Kahneman and others as a method for correcting human judgment biases.