Multi-Agent Debate × Market Analysis — System Architecture & Data Pipeline

Picture this: it is 9:29 AM. In sixty seconds, the market opens. You have inflation data from yesterday, overnight futures from Asia, a Fed speech at noon, and sector rotation signals all screaming in different directions. Your gut says one thing. The headlines say another. And your single LLM query — "analyze the market" — just gave you a bland, hedged summary that could have been written by any financial news bot.

Now picture this instead: eight specialized AI agents — four bulls, four bears — each armed with a different analytical lens, tearing into the same data. A technician dissecting the VIX term structure. A fundamentalist crunching earnings yield spreads. A macro strategist analyzing yield curve dynamics. A sentiment tracker parsing fund flows. They debate across three structured rounds, exposing each other's blind spots. Then a judge agent — with no stake in the outcome — synthesizes their clash into a single, unhedged analysis with explicit reasoning chains.

This is Article 1 of a new series: Multi-Agent Debate × Market Analysis. We are taking the theoretical framework from the L1-L4 debate system (adversarial collaboration, structured protocols, multi-judge consensus) and applying it to the hardest domain for AI reasoning: financial markets. Markets are adversarial by nature — every buyer needs a seller, every thesis has a counter-thesis. If there is any domain that demands a debate architecture, it is this one.

By the end of this article, you will have: a clear system architecture for an 8-agent + judge debate system, a complete data pipeline that pulls real market data from free APIs (Yahoo Finance, FRED), and a runnable Python module you can execute today. More importantly, you will understand why each architectural choice was made — not just what the code does.

Why Debate for Markets?

Before we write a single line of code, let's confront the question that separates a real system from a toy: why does market analysis specifically need a multi-agent debate architecture?

The Single-Agent Problem

Ask any LLM "what is your outlook on the S&P 500?" and you will typically get an answer that lists reasons for optimism, lists reasons for caution, and commits to neither.

This is the hedge problem. A single LLM, prompted neutrally, defaults to covering both sides. It is not wrong — it is useless. The model has no skin in the game, no accountability for being wrong in a specific direction, and no mechanism to resolve the tension it just identified.

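But the problem is deeper than hedging. Single-agent analysis suffers from three structural flaws: confirmation bias, narrative capture, and false equivalence. The table at the end of this article describes each one and its market impact.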

Markets Are Adversarial Systems

Here is the insight that makes debate architecture necessary rather than optional: markets are, by their fundamental structure, adversarial systems. Every transaction has two counterparties with opposing views. Price discovery is literally the process of bulls and bears converging on a clearing price through continuous disagreement.

A single-agent analysis attempts to model an adversarial system with a cooperative reasoning process. That is like trying to simulate a chess game by asking one player to play both sides "fairly." It does not work — not because the player is not smart enough, but because adversarial depth requires adversarial process.

Why 8 Agents, Not 2?

You might wonder: if the L1-L4 series showed that even two agents debating improves reliability, why do we need eight?

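Because markets are multi-dimensional. A simple bull vs. bear debate collapses all market factors into a single axis: up or down. But real markets have structure: time horizons (short-term overbought vs. long-term growth), analytical frameworks (technical vs. fundamental vs. macro vs. sentiment), and within-camp disagreement (two "bullish" technicians may disagree on which indicators matter).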

Eight agents (4 bulls + 4 bears), one bull and one bear for each of four analytical frameworks (technical, fundamental, macro, sentiment), give us exactly this within-camp diversity while maintaining the adversarial tension between camps.
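As a minimal sketch of that 4×2 matrix, the roster can be generated by crossing two short lists. The resulting identifiers match the keys of AGENT_SLICES used later in this article; the AgentSpec type and the build_roster helper are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

FRAMEWORKS = ["tech", "fund", "macro", "senti"]  # four analytical lenses
STANCES = ["bull", "bear"]                        # two opposing camps


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical descriptor for one debater."""
    agent_id: str
    framework: str
    stance: str


def build_roster() -> List[AgentSpec]:
    """Cross every framework with every stance: 4 x 2 = 8 agents."""
    return [
        AgentSpec(agent_id=f"{fw}_{st}", framework=fw, stance=st)
        for fw, st in product(FRAMEWORKS, STANCES)
    ]


roster = build_roster()
for spec in roster:
    print(f"{spec.agent_id:12s} framework={spec.framework:6s} stance={spec.stance}")
```

Each framework contributes exactly one bull and one bear, which is what lets the cross-examination round pair agents off cleanly.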

System Architecture

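Let us look at the whole system before we zoom into the data pipeline. Data flows from the free APIs (Yahoo Finance, FRED) through the pipeline into a knowledge base. The knowledge base is sliced per agent, the eight agents debate over three rounds, and the judge, who sees the full knowledge base, synthesizes the result.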

Component Breakdown

Debate Round Structure

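The debate follows a 3-round protocol adapted from L2's structured debate design, but with one critical upgrade: parallelism within each round. Round 1 (Opening Arguments) runs all 8 agents in parallel. Round 2 (Cross-Examination) runs 4 framework pairs in parallel (Tech Bull vs. Tech Bear, Fund Bull vs. Fund Bear, and so on), with the turns inside each pair sequential. Round 3 (Closing Statements) again runs all 8 agents in parallel. Each round takes roughly 12 seconds and judge synthesis adds about 3 to 5 seconds, for a total of about 40 seconds.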
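To make the parallelism concrete, here is a minimal asyncio sketch of the three rounds. It is not the orchestrator from Article 2: stub_ask stands in for a real LLM call, the AskFn signature is hypothetical, and the choice to let the bear challenge first inside each pair is an assumption made only for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

FRAMEWORKS = ["tech", "fund", "macro", "senti"]
AGENTS = [f"{fw}_{side}" for fw in FRAMEWORKS for side in ("bull", "bear")]

# Hypothetical signature: (agent_id, round_name, context) -> argument text.
AskFn = Callable[[str, str, str], Awaitable[str]]


async def stub_ask(agent_id: str, round_name: str, context: str) -> str:
    """Stand-in for a real LLM call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{round_name}] {agent_id} responds to: {context or 'market data'}"


async def cross_examine(fw: str, openings: Dict[str, str], ask: AskFn) -> List[dict]:
    """One pair: the bear challenges the bull's opening, then the bull rebuts."""
    bull, bear = f"{fw}_bull", f"{fw}_bear"
    challenge = await ask(bear, "cross_exam", openings[bull])
    rebuttal = await ask(bull, "cross_exam", challenge)  # needs the challenge first
    return [
        {"round": 2, "agent": bear, "text": challenge},
        {"round": 2, "agent": bull, "text": rebuttal},
    ]


async def run_debate(ask: AskFn = stub_ask) -> List[dict]:
    transcript: List[dict] = []

    # Round 1: all eight openings run concurrently.
    texts = await asyncio.gather(*(ask(a, "opening", "") for a in AGENTS))
    openings = dict(zip(AGENTS, texts))
    transcript += [{"round": 1, "agent": a, "text": t} for a, t in openings.items()]

    # Round 2: four pairs run concurrently; turns inside a pair are sequential.
    pair_logs = await asyncio.gather(
        *(cross_examine(fw, openings, ask) for fw in FRAMEWORKS)
    )
    for log in pair_logs:
        transcript += log

    # Round 3: all eight closings run concurrently.
    closings = await asyncio.gather(*(ask(a, "closing", "") for a in AGENTS))
    transcript += [{"round": 3, "agent": a, "text": t} for a, t in zip(AGENTS, closings)]
    return transcript


transcript = asyncio.run(run_debate())
print(len(transcript), "transcript entries")
```

Because asyncio.gather returns results in the order the awaitables were passed, the transcript stays deterministic even though the calls overlap in time. Pairing keeps Round 2 at 8 turns in total, instead of the 56 directed exchanges a free-for-all among 8 agents would allow.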

# NOTE: the start of this function was truncated in the source. The signature
# below is reconstructed, and the default period ("1y") is an assumption.
def compute_technical_signals(ticker: str, period: str = "1y") -> Optional[TechnicalSignals]:
    """Compute technical indicators for an index."""
    try:
        data = yf.download(ticker, period=period, progress=False)
        if data.empty:
            return None

        close = data["Close"].squeeze()
        volume = data["Volume"].squeeze()
        cfg = TECHNICAL_CONFIG

        # Moving averages: is the latest close above or below each MA?
        ma_status = {}
        current = float(close.iloc[-1])
        for ma in cfg["ma_periods"]:
            if len(close) >= ma:
                ma_val = float(close.rolling(ma).mean().iloc[-1])
                ma_status[f"ma{ma}"] = "above" if current > ma_val else "below"

        # RSI(14), simple rolling-average variant
        rsi = None
        rsi_period = cfg["rsi_period"]
        if len(close) >= rsi_period + 1:
            delta = close.diff()
            gain = delta.clip(lower=0)
            loss = (-delta).clip(lower=0)
            avg_gain = gain.rolling(rsi_period).mean()
            avg_loss = loss.rolling(rsi_period).mean()
            rs = avg_gain / avg_loss.replace(0, np.nan)
            rsi_series = 100 - (100 / (1 + rs))
            last_rsi = rsi_series.iloc[-1]
            # NaN means the window had no losses (avg_loss was 0); report None
            # rather than writing NaN into the knowledge base.
            rsi = round(float(last_rsi), 1) if pd.notna(last_rsi) else None

        # MACD: compare the MACD line with its signal line
        macd_sig = None
        if len(close) >= cfg["macd_slow"] + cfg["macd_signal"]:
            ema_fast = close.ewm(span=cfg["macd_fast"]).mean()
            ema_slow = close.ewm(span=cfg["macd_slow"]).mean()
            macd_line = ema_fast - ema_slow
            signal_line = macd_line.ewm(span=cfg["macd_signal"]).mean()
            if macd_line.iloc[-1] > signal_line.iloc[-1]:
                macd_sig = "bullish"
            elif macd_line.iloc[-1] < signal_line.iloc[-1]:
                macd_sig = "bearish"
            else:
                macd_sig = "neutral"

        # ATR(14): rolling mean of the true range
        atr = None
        if len(data) >= cfg["atr_period"] + 1:
            high = data["High"].squeeze()
            low = data["Low"].squeeze()
            tr = pd.concat([
                high - low,
                (high - close.shift()).abs(),
                (low - close.shift()).abs(),
            ], axis=1).max(axis=1)
            atr = round(float(tr.rolling(cfg["atr_period"]).mean().iloc[-1]), 2)

        # Volume trend: last 5 sessions vs. the 15 sessions before them
        vol_trend = "flat"
        if len(volume) >= 20:
            recent_vol = float(volume.tail(5).mean())
            prior_vol = float(volume.tail(20).head(15).mean())
            if prior_vol > 0:
                ratio = recent_vol / prior_vol
                if ratio > 1.2:
                    vol_trend = "increasing"
                elif ratio < 0.8:
                    vol_trend = "decreasing"

        return TechnicalSignals(
            ticker=ticker,
            ma_status=ma_status,
            rsi_14=rsi,
            macd_signal=macd_sig,
            atr_14=atr,
            volume_trend=vol_trend,
        )
    except Exception as e:
        print(f"  ✗ Error computing technicals for {ticker}: {e}", file=sys.stderr)
        return None
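To sanity-check the RSI arithmetic without a network call, here is a small pure-Python sketch of the same simple-average RSI (not Wilder's smoothed variant). The name rsi_simple is hypothetical, and returning 100 for a window with gains but no losses is an assumption about how that edge case should be reported.

```python
from typing import Optional, Sequence


def rsi_simple(closes: Sequence[float], period: int = 14) -> Optional[float]:
    """Simple-average RSI over the last `period` price changes."""
    if len(closes) < period + 1:
        return None  # not enough history for one full window
    deltas = [b - a for a, b in zip(closes[:-1], closes[1:])]
    window = deltas[-period:]
    avg_gain = sum(d for d in window if d > 0) / period
    avg_loss = sum(-d for d in window if d < 0) / period
    if avg_loss == 0:
        # Assumption: all-gain windows report 100; flat windows are undefined.
        return 100.0 if avg_gain > 0 else None
    rs = avg_gain / avg_loss
    return round(100 - 100 / (1 + rs), 1)


# Tiny worked example with period=2 so the numbers can be checked by hand:
# changes are +1, -1, +2; the last two give avg_gain = 1.0 and avg_loss = 0.5,
# so RS = 2 and RSI = 100 - 100/3.
print(rsi_simple([10, 11, 10, 12], period=2))
print(rsi_simple([1, 2, 3, 4], period=2))
```

A short period is used only so the result is easy to verify by hand; the pipeline itself reads the period from TECHNICAL_CONFIG.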

(continued — macro data, knowledge base assembly, helpers)


def fetch_macro_data() -> Dict[str, MacroSnapshot]:
    """Fetch macroeconomic data from FRED."""
    results: Dict[str, MacroSnapshot] = {}

    if not FRED_AVAILABLE:
        for key in MACRO_SERIES:
            results[key] = MacroSnapshot(
                indicator=key, description=f"FRED series {MACRO_SERIES[key]}",
                latest_value=None, trend="unavailable",
            )
        return results

    try:
        fred = Fred(api_key=FRED_API_KEY)
        for key, series_id in MACRO_SERIES.items():
            try:
                series = fred.get_series(series_id)
                clean = series.dropna()  # drop missing observations once, reuse below
                if clean.empty:
                    results[key] = MacroSnapshot(
                        indicator=key, description=series_id,
                        latest_value=None, trend="no_data",
                    )
                    continue

                latest = float(clean.iloc[-1])
                latest_date = str(clean.index[-1].date())

                # Year-over-year change: look up the observation at or before the
                # same date one year earlier, so monthly, quarterly, and daily
                # series are all handled (a fixed 13-row offset is only one year
                # back for monthly data).
                yoy = None
                year_ago = clean.asof(clean.index[-1] - pd.DateOffset(years=1))
                if pd.notna(year_ago) and year_ago != 0:
                    yoy = round((latest - float(year_ago)) / abs(float(year_ago)) * 100, 2)

                # Short-term trend: mean of the last 3 observations vs. the 3 before
                trend = None
                if len(clean) >= 6:
                    recent_avg = float(clean.tail(3).mean())
                    prior_avg = float(clean.tail(6).head(3).mean())
                    if prior_avg != 0:
                        delta_pct = (recent_avg - prior_avg) / abs(prior_avg) * 100
                        if delta_pct > 0.5:
                            trend = "rising"
                        elif delta_pct < -0.5:
                            trend = "falling"
                        else:
                            trend = "flat"

                results[key] = MacroSnapshot(
                    indicator=key, description=series_id,
                    latest_value=round(latest, 4), latest_date=latest_date,
                    yoy_change_pct=yoy, trend=trend,
                )
            except Exception as e:
                results[key] = MacroSnapshot(
                    indicator=key, description=series_id,
                    latest_value=None, trend=f"error: {str(e)[:80]}",
                )
    except Exception as e:
        for key in MACRO_SERIES:
            results[key] = MacroSnapshot(
                indicator=key, description=MACRO_SERIES[key],
                latest_value=None, trend="connection_error",
            )

    return results
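The trend label in fetch_macro_data comes from comparing two three-observation averages against a ±0.5% band. The sketch below isolates that rule in plain Python so it can be tested on its own; classify_trend is a hypothetical helper name, and the input values are made up for illustration.

```python
from typing import List, Optional


def classify_trend(values: List[float], threshold_pct: float = 0.5) -> Optional[str]:
    """Compare the mean of the last 3 observations with the 3 before them."""
    if len(values) < 6:
        return None  # same minimum history as the pipeline requires
    recent_avg = sum(values[-3:]) / 3
    prior_avg = sum(values[-6:-3]) / 3
    if prior_avg == 0:
        return None  # percentage change is undefined
    delta_pct = (recent_avg - prior_avg) / abs(prior_avg) * 100
    if delta_pct > threshold_pct:
        return "rising"
    if delta_pct < -threshold_pct:
        return "falling"
    return "flat"


print(classify_trend([4.0, 4.0, 4.0, 4.1, 4.1, 4.1]))  # about +2.5%
print(classify_trend([100.0] * 6))                      # no change
print(classify_trend([3.2, 3.2, 3.2, 3.1, 3.1, 3.1]))  # about -3.1%
```

Note that "three observations" means three months for a monthly series but three days for a daily one, so the label is only comparable across series of the same frequency.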

(continued — knowledge base assembly, helpers, agent slicer)


# ═══════════════════════════════════════════════════════════
# KNOWLEDGE BASE ASSEMBLY
# ═══════════════════════════════════════════════════════════

def build_knowledge_base() -> KnowledgeBase:
    """Main pipeline entry point. Fetches all data sources and assembles KB."""
    kb = KnowledgeBase()
    now = datetime.now(timezone.utc)

    kb.meta = {
        "generated_at": now.isoformat(),
        "market_status": "open" if _is_market_hours(now) else "closed",
        "data_sources": ["yfinance", "fred"] if FRED_AVAILABLE else ["yfinance"],
        "warnings": [],
    }

    print("📊 Fetching index data...")
    for name, ticker in INDICES.items():
        snap = fetch_index_data(ticker, name)
        if snap is None:
            kb.meta["warnings"].append(f"No data for {name} ({ticker})")
            continue
        if name in ("HSI", "N225", "STOXX"):
            kb.global_markets[name] = snap
        else:
            kb.indices[name] = snap

    print("📈 Computing technical indicators...")
    for name, ticker in INDICES.items():
        signals = compute_technical_signals(ticker)
        if signals:
            kb.technicals[name] = signals

    print("🏢 Fetching sector data...")
    for ticker, sector_name in SECTORS.items():
        snap = fetch_index_data(ticker, sector_name)
        if snap is None:
            continue
        spx = kb.indices.get("SPX")
        # Missing or None returns are treated as 0 so relative strength stays numeric.
        spx_ret = (spx.returns.get("20d") or 0) if spx else 0
        sec_ret = snap.returns.get("20d") or 0
        rs = round(sec_ret - spx_ret, 2)
        kb.sectors[ticker] = SectorSnapshot(
            ticker=ticker, name=sector_name,
            price=snap.price,
            change_5d_pct=snap.returns.get("5d", 0) or 0,
            change_20d_pct=sec_ret,
            relative_strength_vs_spx=rs,
        )

    print("🏛  Fetching macro data (FRED)...")
    kb.macro = fetch_macro_data()
    # Check values rather than trend labels so per-series "error: ..." entries count too.
    if all(v.latest_value is None for v in kb.macro.values()):
        kb.meta["warnings"].append(
            "FRED macro data unavailable — check API key or network")

    # — Fundamentals (derived from index/sector/FRED data) —
    spx = kb.indices.get("SPX")
    kb.fundamentals = {
        "sp500_pe_approx": _estimate_pe(spx),
        "sp500_earnings_yield_approx": _estimate_earnings_yield(spx),
        "sector_rotation_signal": _detect_sector_rotation(kb.sectors),
    }

    # — Sentiment (derived from VIX + volume + sector rotation) —
    vix = kb.indices.get("VIX")
    kb.sentiment = {
        "vix_level": vix.price if vix else None,
        "vix_regime": _classify_vix_regime(vix),
        "volume_signal": _volume_sentiment_signal(kb.indices),
        "sector_breadth": _sector_breadth(kb.sectors),
    }

    if kb.meta["market_status"] == "closed":
        kb.meta["warnings"].append(
            "Market closed — prices are last close, may be stale")

    print(f"✅ Knowledge base ready ({len(kb.indices)} indices, "
          f"{len(kb.sectors)} sectors, {len(kb.macro)} macro indicators)")
    if kb.meta["warnings"]:
        print(f"⚠  Warnings: {kb.meta['warnings']}")

    return kb
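build_knowledge_base relies on container types whose definitions are not shown in this part of the listing. As a rough sketch, the fields below are inferred from how the code above reads and writes them; the exact definitions, defaults, and any extra fields (including the name field on IndexSnapshot) are assumptions, and the sample values are made up.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class IndexSnapshot:
    ticker: str
    name: str  # assumed field
    price: float
    returns: Dict[str, Optional[float]] = field(default_factory=dict)  # keys like "5d", "20d"
    volume_ratio: float = 0.0


@dataclass
class SectorSnapshot:
    ticker: str
    name: str
    price: float
    change_5d_pct: float
    change_20d_pct: float
    relative_strength_vs_spx: float


@dataclass
class KnowledgeBase:
    meta: Dict[str, Any] = field(default_factory=dict)
    indices: Dict[str, IndexSnapshot] = field(default_factory=dict)
    global_markets: Dict[str, IndexSnapshot] = field(default_factory=dict)
    technicals: Dict[str, Any] = field(default_factory=dict)
    sectors: Dict[str, SectorSnapshot] = field(default_factory=dict)
    macro: Dict[str, Any] = field(default_factory=dict)
    fundamentals: Dict[str, Any] = field(default_factory=dict)
    sentiment: Dict[str, Any] = field(default_factory=dict)


kb = KnowledgeBase()
kb.indices["SPX"] = IndexSnapshot(
    ticker="^GSPC", name="SPX", price=5000.0,
    returns={"5d": 0.8, "20d": 2.1}, volume_ratio=1.1,
)
snapshot = asdict(kb)  # nested dataclasses become plain dicts, ready for json.dump
print(sorted(snapshot))
```

Using default_factory for every section means a freshly constructed KnowledgeBase is always serializable, even if a fetch step fails before filling its section.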

(continued — helper functions, agent data slicer, main)


# ═══════════════════════════════════════════════════════════
# HELPER FUNCTIONS
# ═══════════════════════════════════════════════════════════

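def _is_market_hours(now: datetime) -> bool:
    """Rough check if US markets are open (9:30-16:00 ET, weekdays; ignores holidays)."""
    # stdlib since Python 3.9; Windows may also need the tzdata package
    from zoneinfo import ZoneInfo
    # Convert to New York time so both EST (UTC-5) and EDT (UTC-4) are handled.
    et = now.astimezone(ZoneInfo("America/New_York"))
    if et.weekday() >= 5:
        return False
    total_minutes = et.hour * 60 + et.minute
    return 570 <= total_minutes < 960  # 9:30 inclusive to 16:00 exclusive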


def _estimate_pe(spx: Optional[IndexSnapshot]) -> Dict[str, Any]:
    """Approximate S&P 500 P/E (placeholder — use real fundamentals API for production)."""
    if spx is None or spx.price == 0:
        return {"note": "PE estimate unavailable — no SPX data"}
    estimated_earnings = 240.0  # Trailing 12-month approximate
    pe = round(spx.price / estimated_earnings, 1)
    return {
        "current_pe_approx": pe,
        "long_term_avg_pe": 17.0,
        "note": "PE estimated from SPX price / approximate trailing earnings."
    }


def _estimate_earnings_yield(spx: Optional[IndexSnapshot]) -> Optional[float]:
    """Earnings yield in percent = 100 / PE (approximate)."""
    pe_data = _estimate_pe(spx)
    pe = pe_data.get("current_pe_approx")
    if pe and pe > 0:
        return round(100 / pe, 2)
    return None


def _detect_sector_rotation(sectors: Dict[str, SectorSnapshot]) -> str:
    """Simple sector rotation signal based on relative strength changes."""
    if not sectors:
        return "insufficient_data"
    defensive = ["XLP", "XLU", "XLV"]
    cyclical = ["XLK", "XLY", "XLI", "XLB"]
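    def_vals = [sectors[s].relative_strength_vs_spx
                for s in defensive if s in sectors]
    cyc_vals = [sectors[s].relative_strength_vs_spx
                for s in cyclical if s in sectors]
    if not def_vals or not cyc_vals:
        return "insufficient_data"
    # Compare averages, not sums: the groups differ in size (3 vs. 4 ETFs),
    # so the threshold below is in percentage points per sector.
    def_rs = sum(def_vals) / len(def_vals)
    cyc_rs = sum(cyc_vals) / len(cyc_vals)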
    if def_rs > cyc_rs + 2:
        return "defensive_rotation"
    elif cyc_rs > def_rs + 2:
        return "cyclical_rotation"
    return "neutral"


def _classify_vix_regime(vix: Optional[IndexSnapshot]) -> str:
    """Classify VIX regime based on level."""
    if vix is None:
        return "unknown"
    if vix.price < 15:
        return "low_volatility"
    elif vix.price < 20:
        return "normal"
    elif vix.price < 30:
        return "elevated"
    else:
        return "high_fear"


def _volume_sentiment_signal(indices: Dict[str, IndexSnapshot]) -> str:
    """Sentiment signal based on volume ratios across indices."""
    if not indices:
        return "unknown"
    ratios = [idx.volume_ratio for idx in indices.values()
              if idx.volume_ratio > 0 and idx.ticker not in ("^VIX",)]
    if not ratios:
        return "unknown"
    avg_ratio = sum(ratios) / len(ratios)
    if avg_ratio > 1.3:
        return "high_volume_rally"
    elif avg_ratio < 0.7:
        return "low_volume_drift"
    return "normal_volume"


def _sector_breadth(sectors: Dict[str, SectorSnapshot]) -> Dict[str, Any]:
    """Count how many sectors are positive over 5d and 20d."""
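    if not sectors:
        # Same keys as the normal return value, so consumers need one schema only.
        return {"positive_5d": None, "positive_20d": None,
                "breadth_regime": "unknown"}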
    pos_5d = sum(1 for s in sectors.values() if s.change_5d_pct > 0)
    pos_20d = sum(1 for s in sectors.values() if s.change_20d_pct > 0)
    total = len(sectors)
    regime = ("broad_strength" if pos_20d >= 7 else
              "narrow_leadership" if pos_20d <= 3 else "mixed")
    return {
        "positive_5d": f"{pos_5d}/{total}",
        "positive_20d": f"{pos_20d}/{total}",
        "breadth_regime": regime,
    }
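The valuation helpers above are simple arithmetic on a hard-coded earnings figure, so it is worth seeing how sensitive the output is to that placeholder. The sketch below repeats the same two formulas; pe_and_earnings_yield is a hypothetical name, 5847.23 is the SPX price from the sample output in this article, and the alternative earnings figure of 220 is made up purely to show the sensitivity.

```python
from typing import Tuple

TRAILING_EPS_PLACEHOLDER = 240.0  # same hard-coded figure as _estimate_pe


def pe_and_earnings_yield(
    spx_price: float, eps: float = TRAILING_EPS_PLACEHOLDER
) -> Tuple[float, float]:
    """Return (P/E rounded to 1 decimal, earnings yield in percent)."""
    pe = round(spx_price / eps, 1)
    earnings_yield_pct = round(100 / pe, 2)  # yield is computed from the rounded P/E
    return pe, earnings_yield_pct


base = pe_and_earnings_yield(5847.23)
lower_eps = pe_and_earnings_yield(5847.23, eps=220.0)
print("EPS 240:", base)
print("EPS 220:", lower_eps)
```

An 8% change in the assumed earnings moves the P/E by more than two points, which is why the docstring recommends a real fundamentals source for production use.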

(continued — agent data slicer, main)


# ═══════════════════════════════════════════════════════════
# AGENT DATA SLICER — Each agent gets only its relevant slice
# ═══════════════════════════════════════════════════════════

AGENT_SLICES = {
    "tech_bull":   ["meta", "indices", "technicals"],
    "tech_bear":   ["meta", "indices", "technicals"],
    "fund_bull":   ["meta", "indices", "sectors", "fundamentals"],
    "fund_bear":   ["meta", "indices", "sectors", "fundamentals"],
    "macro_bull":  ["meta", "macro", "global_markets", "indices"],
    "macro_bear":  ["meta", "macro", "global_markets", "indices"],
    "senti_bull":  ["meta", "sentiment", "indices", "sectors"],
    "senti_bear":  ["meta", "sentiment", "indices", "sectors"],
    "judge":       ["meta", "indices", "sectors", "technicals",
                    "fundamentals", "macro", "sentiment", "global_markets"],
}


def slice_for_agent(kb: KnowledgeBase, agent_id: str) -> Dict[str, Any]:
    """Extract only the sections relevant to a specific agent."""
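    if agent_id not in AGENT_SLICES:
        # Fail loudly: silently falling back to the judge's full view would
        # defeat the slicing constraint for a mistyped agent id.
        raise KeyError(f"Unknown agent_id: {agent_id!r}")
    sections = AGENT_SLICES[agent_id]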
    result = {}
    kb_dict = asdict(kb)
    for section in sections:
        if section in kb_dict:
            result[section] = kb_dict[section]
    return result


# ═══════════════════════════════════════════════════════════
# MAIN
# ═══════════════════════════════════════════════════════════

if __name__ == "__main__":
    print("=" * 60)
    print("📊 Market Data Pipeline — Multi-Agent Debate Knowledge Base")
    print("=" * 60)
    print()

    kb = build_knowledge_base()

    # Save full knowledge base
    output_path = "market_knowledge_base.json"
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(asdict(kb), f, indent=2, ensure_ascii=False, default=str)
    print(f"\n💾 Full knowledge base saved to: {output_path}")

    # Show sample slice for one agent
    print("\n── Sample: Tech Bull agent data slice ──")
    tech_bull_slice = slice_for_agent(kb, "tech_bull")
    print(json.dumps(tech_bull_slice, indent=2, ensure_ascii=False, default=str)[:1200])
    print("... (truncated)")

    print("\n── Agent data slice schema ──")
    for agent_id, sections in AGENT_SLICES.items():
        print(f"  {agent_id:15s} ← {sections}")

    print(f"\n✅ Pipeline complete. {len(kb.meta.get('warnings', []))} warning(s).")

Running the Pipeline

# Install dependencies
pip install yfinance fredapi pandas numpy

# Get a free FRED API key (optional but recommended):
# https://fred.stlouisfed.org/docs/api/api_key.html

# Run the pipeline
export FRED_API_KEY="***"
python market_data_pipeline.py

Expected output:

============================================================
📊 Market Data Pipeline — Multi-Agent Debate Knowledge Base
============================================================

📊 Fetching index data...
📈 Computing technical indicators...
🏢 Fetching sector data...
🏛  Fetching macro data (FRED)...
✅ Knowledge base ready (7 indices, 10 sectors, 10 macro indicators)

💾 Full knowledge base saved to: market_knowledge_base.json

── Sample: Tech Bull agent data slice ──
{
  "meta": { "generated_at": "2026-05-15T...", ... },
  "indices": { "SPX": { "price": 5847.23, ... }, ... },
  "technicals": { "SPX": { "rsi_14": 58.3, "macd_signal": "bullish" }, ... }
}

── Agent data slice schema ──
  tech_bull       ← ['meta', 'indices', 'technicals']
  tech_bear       ← ['meta', 'indices', 'technicals']
  fund_bull       ← ['meta', 'indices', 'sectors', 'fundamentals']
  ...

✅ Pipeline complete. 0 warning(s).

Key Design Decisions in the Pipeline

Graceful degradation: If FRED is unavailable (no API key, network down), the pipeline does not crash — it marks macro data as unavailable and continues. The debate can still run (macro agents will have less data but can reason from what is available).
Agent data slicing: Each agent only sees the data relevant to its analytical lens. The Tech Bull does not see macro data — not because macro does not matter, but because specialization requires focus. If every agent sees everything, they all converge to the same analysis. Constraint creates diversity.
Derived indicators: Technical signals (RSI, MACD, ATR) and sentiment metrics (VIX regime, sector breadth) are computed locally rather than pulled from external APIs. This ensures reproducibility and avoids dependency on third-party indicator services.
Freshness metadata: The meta section tracks when data was fetched and whether the market is open. This is critical for the debate protocol — agents need to know if they are analyzing live data or yesterday's close.
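The graceful-degradation idea can be isolated from FRED entirely. In the sketch below, fetch_with_fallback and the Fetcher type are hypothetical names, flaky_fetch is a stand-in that simulates one working and one failing series, and the 4.1 value is made up. UNRATE and CPIAUCSL are real FRED series identifiers, used here only as labels.

```python
from typing import Callable, Dict, Optional

# Hypothetical fetcher type: takes a series id, returns the latest value,
# and raises on any failure (bad key, timeout, unknown series).
Fetcher = Callable[[str], float]


def fetch_with_fallback(
    series: Dict[str, str], fetch: Optional[Fetcher]
) -> Dict[str, dict]:
    """Return one record per indicator, never raising to the caller."""
    results: Dict[str, dict] = {}
    for key, series_id in series.items():
        if fetch is None:  # no API key configured: mark and move on
            results[key] = {"latest_value": None, "trend": "unavailable"}
            continue
        try:
            results[key] = {"latest_value": fetch(series_id), "trend": None}
        except Exception as exc:  # one bad series must not sink the rest
            results[key] = {"latest_value": None, "trend": f"error: {str(exc)[:80]}"}
    return results


def flaky_fetch(series_id: str) -> float:
    """Stand-in fetcher: succeeds for one series, fails for the other."""
    if series_id == "UNRATE":
        return 4.1  # made-up value for illustration
    raise TimeoutError("simulated network timeout")


SERIES = {"unemployment": "UNRATE", "cpi": "CPIAUCSL"}
no_key = fetch_with_fallback(SERIES, fetch=None)
partial = fetch_with_fallback(SERIES, fetch=flaky_fetch)
print(no_key["cpi"]["trend"])
print(partial["cpi"]["trend"])
```

Every indicator always gets a record, so downstream code can check latest_value for None instead of guarding against missing keys.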

💡 The pipeline is a module, not a service: This code is designed to be imported by the debate orchestrator (Article 2), not run as a standalone web service. The orchestrator will call build_knowledge_base() once per debate, or once per day for scheduled runs. Caching (Article 4) will add a TTL layer on top so we do not re-fetch for every debate.
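The caching layer itself belongs to Article 4 and is not shown here, but the idea of a TTL wrapper is small enough to sketch. Everything below is hypothetical: the TTLCache class, its parameters, and the 900-second TTL are assumptions for illustration, and a fake clock replaces real waiting.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Rebuild a value only after ttl_seconds have elapsed."""

    def __init__(self, builder: Callable[[], T], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._builder = builder
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._built_at: Optional[float] = None  # None means "never built"

    def get(self) -> T:
        now = self._clock()
        expired = self._built_at is None or now - self._built_at >= self._ttl
        if expired:
            self._value = self._builder()
            self._built_at = now
        return self._value  # type: ignore[return-value]


# Demo with a fake clock and a counting builder, so no real waiting is needed.
fake_now = [0.0]
build_count = [0]


def fake_build() -> dict:
    build_count[0] += 1
    return {"build": build_count[0]}


cache = TTLCache(fake_build, ttl_seconds=900, clock=lambda: fake_now[0])
first = cache.get()    # first call builds
fake_now[0] = 600.0
second = cache.get()   # within the TTL: reuses the first build
fake_now[0] = 900.0
third = cache.get()    # TTL reached: rebuilds
print(first, second, third)
```

In the real system the builder would be build_knowledge_base, so several debates within the TTL window would share one fetch.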

What Each Agent Sees: The Slice Principle

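Let us make this concrete. After the pipeline runs, each agent receives the sections listed below when the debate starts. The example data points are illustrative: a few of them (put/call ratio, AAII sentiment, margin debt) are not part of the sentiment section built by the code in this article and would need additional data sources.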

Agent	Data Sections	Key Data Point Example
🐂 Tech Bull	meta, indices, technicals	SPX: price above MA50/MA200, RSI 58 (not overbought), MACD bullish crossover, volume increasing
🐻 Tech Bear	meta, indices, technicals	SPX: approaching resistance at 5900, RSI divergence forming, VIX contango narrowing
🐂 Fund Bull	meta, indices, sectors, fundamentals	SPX P/E 24.3 vs. 10Y Treasury yield divergence favors equities; sector earnings breadth positive
🐻 Fund Bear	meta, indices, sectors, fundamentals	P/E above 5-year average; profit margins at peak; earnings yield spread vs. bonds narrowing
🐂 Macro Bull	meta, macro, global_markets, indices	GDP growth positive, unemployment low, Fed potentially pausing, global PMI expanding
🐻 Macro Bear	meta, macro, global_markets, indices	CPI still above target, yield curve inverted, M2 contracting, geopolitical risk elevated
🐂 Senti Bull	meta, sentiment, indices, sectors	VIX in normal range, put/call ratio elevated (contrarian buy), cyclical rotation underway
🐻 Senti Bear	meta, sentiment, indices, sectors	AAII bullish sentiment elevated (contrarian sell), margin debt high, sector breadth narrowing

⚠️ The slice constraint is deliberate, not a limitation: It is tempting to give every agent the full knowledge base — "more data = better analysis." But this defeats the purpose of specialization. A technician forced to comment on GDP growth will produce low-quality analysis. A fundamentalist analyzing candlestick patterns is out of their depth. By constraining each agent's data to their analytical lens, we get deeper analysis in each dimension rather than shallower analysis across all dimensions. The judge, who sees everything, is responsible for synthesis.
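A toy example shows the slice constraint in action. The sketch below uses a plain dictionary instead of the real KnowledgeBase, copies two entries from AGENT_SLICES, and fills the knowledge base with made-up values; slice_kb is a hypothetical stand-in for slice_for_agent.

```python
from typing import Any, Dict, List

AGENT_SLICES: Dict[str, List[str]] = {
    "tech_bull": ["meta", "indices", "technicals"],
    "macro_bear": ["meta", "macro", "global_markets", "indices"],
}

# Toy knowledge base with made-up values; the real one comes from the pipeline.
toy_kb: Dict[str, Any] = {
    "meta": {"market_status": "closed"},
    "indices": {"SPX": {"price": 5000.0}},
    "technicals": {"SPX": {"rsi_14": 58.3}},
    "macro": {"unemployment": {"latest_value": 4.1}},
    "global_markets": {"N225": {"price": 38000.0}},
    "sentiment": {"vix_regime": "normal"},
}


def slice_kb(kb: Dict[str, Any], agent_id: str) -> Dict[str, Any]:
    """Return only the sections listed for this agent; unknown ids raise KeyError."""
    return {name: kb[name] for name in AGENT_SLICES[agent_id] if name in kb}


tech_view = slice_kb(toy_kb, "tech_bull")
macro_view = slice_kb(toy_kb, "macro_bear")
print(sorted(tech_view))
print(sorted(macro_view))
```

The Tech Bull never sees the macro or sentiment sections, and the Macro Bear never sees technicals, while both share the index prices they need as common ground.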

From Data to Debate: The Road Ahead

At this point, we have a running data pipeline. Run python market_data_pipeline.py and you get market_knowledge_base.json — a structured snapshot of the market that any agent can read. But a knowledge base does not debate. Data is fuel, not fire.

In the next article, we will build the debate protocol — the engine that turns this data into competing analyses, cross-examination, and a synthesized conclusion. Here is what is coming:

Article 2 Preview: The Debate Protocol

Agent prompt engineering: The exact system prompts for all 8 agents, designed to produce structured, evidence-backed arguments rather than free-form opinions. Each prompt constrains the agent to cite specific data from its knowledge base slice, reducing hallucination.
3-round protocol implementation: The complete debate orchestration code, covering opening arguments (parallel), cross-examination (paired), and closing statements (parallel), built on the async execution pattern inherited from the L4 orchestrator.
Argument format specification: Every agent output follows a strict JSON schema: claim, evidence (with KB citations), confidence level, and key assumptions. This makes arguments machine-readable and comparable across agents.
Debate transcript generation: The full transcript that the judge will receive — structured, timestamped, with clear attribution of every argument to a specific agent.
~300 lines of runnable code: debate_protocol_market.py — instantiate agents, run rounds, collect transcript.
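The actual argument schema arrives in Article 2. As a rough preview of what "machine-readable and comparable" could look like, the sketch below parses one argument and rejects citations that point outside the agent's slice. All field names (kb_path, statement, and so on), the 0 to 1 confidence scale, and the parse_argument helper are assumptions, not the series' final format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kb_path: str     # hypothetical citation format, e.g. "technicals.SPX.rsi_14"
    statement: str


@dataclass
class Argument:
    agent_id: str
    claim: str
    evidence: List[Evidence]
    confidence: float  # assumed scale: 0.0 to 1.0
    assumptions: List[str] = field(default_factory=list)


def parse_argument(raw: str, allowed_sections: List[str]) -> Argument:
    """Parse one agent output and check its citations against the agent's slice."""
    data = json.loads(raw)
    evidence = [Evidence(**item) for item in data["evidence"]]
    if not evidence:
        raise ValueError("an argument must cite at least one piece of evidence")
    for item in evidence:
        section = item.kb_path.split(".", 1)[0]
        if section not in allowed_sections:
            raise ValueError(f"citation outside agent slice: {item.kb_path}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Argument(
        agent_id=data["agent_id"], claim=data["claim"], evidence=evidence,
        confidence=confidence, assumptions=list(data.get("assumptions", [])),
    )


raw_output = json.dumps({
    "agent_id": "tech_bull",
    "claim": "Momentum supports further upside.",
    "evidence": [{"kb_path": "technicals.SPX.rsi_14",
                  "statement": "RSI is below overbought levels."}],
    "confidence": 0.7,
    "assumptions": ["No major macro shock this week."],
})
argument = parse_argument(raw_output, ["meta", "indices", "technicals"])
print(argument.claim, argument.confidence)
```

Checking citations against the slice gives the orchestrator a cheap way to flag arguments that rely on data the agent was never given.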

But before that, I want you to try something with today's code. Run the pipeline. Look at the knowledge base. Ask yourself: if you were the judge, given just this data, what would your market view be? Write it down. Do not hedge. Pick a direction and write three bullet points supporting it.

Then, when you read Article 2, compare your single-human-judge analysis to what the 8-agent debate produces. The difference — between one person looking at data and eight specialized agents tearing it apart — is exactly why we are building this system.

📎 Series: Multi-Agent Debate × Market Analysis. Article 1 of 4. Previous series: Multi-Agent Debate L1-L4 (adversarial collaboration theory and production deployment). Next: Article 2 — The Debate Protocol.


Next Steps

📖 Next in series: The Debate Protocol — How 8 AI Agents Conduct Structured Adversarial Cross-Examination — Connect this article's data pipeline to a complete 3-round debate engine, with exact system prompts for all 8 agents and JSON argument format specifications.
📖 Debate theory foundation: Multi-Agent Debate L3: Scoring & Consensus Theory — Understand the theoretical origin of the judge scoring system: multi-dimensional scoring, weighted synthesis, and why debates need structured win/loss determination rather than simple "who won."
📖 Foundational skill: Multi-Agent Orchestration — From Single Agent to Agent Teams — Master async execution, message passing, and result aggregation patterns for multi-agent systems — the engineering foundation beneath the debate orchestrator.

Frequently Asked Questions

Q: Why 8 agents instead of 2 (one bull, one bear)? Isn't a simple bull-vs-bear debate enough?

A: Markets are multi-dimensional — time horizons (short-term overbought vs. long-term growth), analytical frameworks (technical vs. fundamental vs. macro vs. sentiment), and within-camp disagreement (two "bullish" technicians may disagree on which indicators matter). A 2-agent debate collapses all factors into a single axis (up or down), losing market structure. The 8-agent matrix (4×2) preserves adversarial tension between camps while introducing within-camp diversity — disagreements between bulls about *why* they're bullish surface the real uncertainty.

Q: What free APIs does the data pipeline use? Do I need to pay for anything?

A: Two free data sources: Yahoo Finance (via the yfinance library, no API key required, for indices/sector ETFs/historical prices/volume) and FRED (via fredapi, free API key with 120 requests/minute, for GDP/CPI/unemployment/yield curve and other macro indicators). Note that yfinance is an unofficial client, so heavy use can be throttled by Yahoo. The pipeline is designed for graceful degradation: if FRED is unavailable (no key or network issue), the system won't crash; it marks macro data as unavailable and continues. Technical indicators (RSI, MACD, ATR) and sentiment metrics (VIX regime, sector breadth) are computed locally, with no third-party dependency.

Q: What does "agent data slicing" mean? Why not give every agent all the data?

A: Data slicing means each agent receives only the knowledge base modules relevant to its analytical lens. For example, the Tech Bull sees only meta + indices + technicals — no macro data. This is not a limitation; it's by design. If every agent sees everything, they all converge to the same analysis, defeating the purpose of specialization. A technician forced to comment on GDP growth produces low-quality analysis; a fundamentalist analyzing candlestick patterns is equally out of their depth. Constraint creates depth — each agent goes deep in its own dimension; the judge (who sees everything) handles synthesis.

Q: What model does the debate system use? Can different agents use different models?

A: In the current architecture, all 8 agents + judge use the same LLM (e.g., GPT-4o or Claude) but with distinct system prompts and knowledge base slices. The design constraint: if the Tech Bear uses GPT-4o and the Tech Bull uses Claude, you can't tell whether debate results reflect genuine analytical differences or just model capability differences. Same model, different prompts = clean experimental design. Multi-model deployment (assigning the best model per analytical task) appears as a robustness upgrade in Article 4.

Q: Are the debate rounds parallel or sequential? How long does a full debate take?

A: Parallel strategy across 3 rounds: Round 1 (Opening Arguments) — all 8 agents parallel, ~12s; Round 2 (Cross-Examination) — 4 pairs parallel (Tech Bull vs. Tech Bear, Fund Bull vs. Fund Bear, etc.), each pair sequential, ~12s; Round 3 (Closing Statements) — all 8 agents parallel, ~12s. Plus judge synthesis ~3-5s, total debate time ~40s. Cross-examination uses paired rather than free-for-all design (avoiding 8×7=56 attack vectors becoming noise), consistent with the L2 series principle that "constraint creates quality."

Table: The three structural flaws of single-agent analysis (see "The Single-Agent Problem")
Flaw	Description	Market Impact
Confirmation Bias	The model naturally gravitates toward evidence that supports the framing implied by your prompt	A "what could go right?" query gets rosy answers; a "what are the risks?" query gets doom. Same model, opposite conclusions.
Narrative Capture	LLMs are trained on text corpora dominated by consensus narratives. They reproduce the dominant story, not the outlier signal.	In March 2020, models trained on pre-pandemic text would have said "markets always recover" — while VIX hit 82 and circuit breakers fired.
False Equivalence	Without adversarial pressure, the model treats all arguments as equally valid — "on one hand X, on the other hand Y" — with no mechanism to weigh evidence.	A 10-year chart pattern and a Fed rate decision get the same paragraph length. Real analysts do not think this way.