
Outside Voice → Self-Evolving Critics: closing the feedback loop #581

@VictorVVedtion


Context

The Outside Voice pattern in /plan-ceo-review, /plan-eng-review, and /design-consultation is one of the most compelling ideas in gstack: the insight that agreement between two independent AI systems is more convincing than one system's exhaustive self-review is exactly right.

The Gap

In practice, Outside Voice has a structural limitation: the review prompts are static. The same questions get asked every time, regardless of what kinds of blindspots have been found before.

Real-world example from our SFT data generation work:

  • Agent self-evaluated: "100% pass rate, quality score 1.000 for all 107 samples"
  • Same model (Opus 4.6), fresh context: found 5 critical issues (quality: 5.5/10)
  • Agent claimed it fixed the main issue (position bias)
  • Fresh context review of the "fix": the old bias disappeared, but a new bias emerged at 67% concentration

The fix created new blindspots. A static reviewer wouldn't know to look for "whack-a-mole" patterns because it doesn't learn from previous findings.

Suggestion: Feedback Loop for Outside Voice

What if Outside Voice could evolve its review strategies based on confirmed findings?

The pattern:

  1. Canary: a confirmed case where the main agent's self-assessment was wrong
  2. Critic mutation: review prompts evolve (threshold shifts, focus changes, new examples injected)
  3. Replay: mutated critics are tested against known canaries
  4. Keep/discard: only critics that improve detection rate (≥ +5%) AND maintain low FP (≤ 10%) survive
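The keep/discard gate in step 4 can be sketched as a simple selection function. This is a minimal illustration of the criteria stated above (≥ +5% detection gain, ≤ 10% false positives); the function name and rate representation are hypothetical, not part of any existing gstack or tcell API.

```python
def should_keep(baseline_detection: float,
                mutated_detection: float,
                mutated_fp: float,
                min_gain: float = 0.05,
                max_fp: float = 0.10) -> bool:
    """Decide whether a mutated critic survives.

    A mutated critic is kept only if it improves detection rate on the
    canary replay set by at least `min_gain` (default +5%) AND keeps its
    false-positive rate at or below `max_fp` (default 10%).
    Rates are fractions in [0, 1].
    """
    improved = (mutated_detection - baseline_detection) >= min_gain
    quiet = mutated_fp <= max_fp
    return improved and quiet
```

The AND is the important part: a mutation that boosts detection by widening its net usually pays for it in false positives, and this gate rejects that trade.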

This turns Outside Voice from a static second opinion into a cognitive immune system — review strategies that get smarter over time, calibrated against real blindspots.

We've been building this pattern at tcell, inspired by Karpathy's autoresearch. Our 5 seed critics cold-started from 0% and evolved to functional detection rates with 0% FP.

One design constraint we found essential: a noise budget of at most 1 alert per 10 tool calls. A reviewer that cries wolf is worse than no reviewer.
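A noise budget like this can be enforced with a sliding window over tool calls. The sketch below is one possible implementation under the stated 1-per-10 constraint; the class name and API are hypothetical, not taken from tcell or gstack.

```python
from collections import deque


class NoiseBudget:
    """Allow at most `max_alerts` critic alerts per sliding window of
    `window` tool calls; further alerts in the window are suppressed."""

    def __init__(self, max_alerts: int = 1, window: int = 10):
        self.max_alerts = max_alerts
        self.window = window
        self.calls = 0            # total tool calls seen so far
        self.alert_calls = deque()  # call indices at which alerts fired

    def tool_call(self) -> None:
        """Record one tool call and age out alerts that left the window."""
        self.calls += 1
        while self.alert_calls and self.alert_calls[0] <= self.calls - self.window:
            self.alert_calls.popleft()

    def try_alert(self) -> bool:
        """Fire an alert if the budget allows; return whether it fired."""
        if len(self.alert_calls) < self.max_alerts:
            self.alert_calls.append(self.calls)
            return True
        return False  # suppressed: budget exhausted for this window
```

With the defaults, a second alert within the same 10-call window is silently dropped, which forces each critic to spend its one alert on its highest-confidence finding.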

Concrete Integration Idea

gstack's Outside Voice could optionally maintain a canaries.jsonl — confirmed cases where the outside review found something the main review missed. When enough canaries accumulate (≥ 20), the Outside Voice prompts could auto-evolve against this dataset.
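As a sketch of what that file and trigger might look like: each line of canaries.jsonl is one confirmed case, and evolution starts once enough of them accumulate. The record fields and function below are illustrative assumptions, not an existing gstack format.

```python
import json

# Hypothetical record shape, one JSON object per line in canaries.jsonl:
example_canary = {
    "id": "sft-gen-001",
    "self_assessment": "100% pass rate, quality 1.000",
    "outside_finding": "position bias in generated samples",
    "confirmed": True,  # a human (or replay) verified the miss
}


def ready_to_evolve(path: str = "canaries.jsonl", min_canaries: int = 20) -> bool:
    """Return True once at least `min_canaries` confirmed canaries exist,
    signalling that prompt auto-evolution has enough ground truth."""
    try:
        with open(path) as f:
            confirmed = sum(
                1 for line in f
                if line.strip() and json.loads(line).get("confirmed")
            )
    except FileNotFoundError:
        return False
    return confirmed >= min_canaries
```

Gating on confirmed cases only matters here: evolving against unverified suspicions would just teach the critics to chase their own noise.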

This would be fully compatible with the current /codex review + /codex challenge flow — just adds a feedback loop that makes each review slightly smarter than the last.

Happy to discuss the architecture further. The core insight is the same as Outside Voice — independent context catches what shared context can't — but with evolution, it compounds.
