SideQuest runs a council of Claude AI agents that collectively act as a Narrator. Players connect via WebSocket (server/client architecture); the system classifies intent, routes to specialist agents, and returns streaming narration with accompanying visuals, audio, and voice.
graph TD
Player -->|WebSocket| Server[GameServer]
Server --> Orchestrator
subgraph "Agent Council"
Orchestrator --> IntentRouter[Intent Router]
IntentRouter -->|exploration, examine| Narrator
IntentRouter -->|combat| Combat
IntentRouter -->|dialogue, persuasion| NPC
Narrator -->|JSON patch| WorldState[World State]
Combat -->|JSON patch| WorldState
NPC -->|JSON patch| WorldState
WorldState -->|updated GameState| Orchestrator
end
subgraph "Prompt Assembly"
PromptComposer[Prompt Composer] -->|system prompts| Orchestrator
GenrePack -->|rules, tone, lore| PromptComposer
SOUL[SOUL.md] -->|guiding principles| PromptComposer
RAG[Lore Retriever] -->|relevant fragments| PromptComposer
end
subgraph "Output Pipeline"
Narrator -->|narrative text| SceneInterpreter[Scene Interpreter]
Combat -->|narrative text| SceneInterpreter
NPC -->|narrative text| SceneInterpreter
SceneInterpreter -->|StageCues| RendererPipeline[Renderer]
RendererPipeline --> ImageGen[Flux Image Generation]
MusicDir[Music Director] -->|mood selection| AudioPipeline[Audio Mixer]
AudioInterp[Audio Interpreter] -->|SFX cues| AudioPipeline
end
subgraph "Voice Pipeline"
VoiceRouter[Voice Router] --> Kokoro[Kokoro TTS]
VoiceRouter --> Piper[Piper TTS — fallback]
Kokoro --> VoiceMixer[Voice Mixer]
Piper --> VoiceMixer
end
subgraph "Client (React)"
Browser[React Browser] -->|WebSocket| Server
Browser --> WhisperSTT[Whisper STT — local]
Browser --> WebAudio[Web Audio API]
Browser --> WebRTC[WebRTC Voice Chat]
end
subgraph "Multiplayer (optional)"
CollectWindow[Collect Window] -->|batched actions| TurnManager[Turn Manager]
TurnManager --> Orchestrator
PerceptionRewriter[Perception Rewriter] -->|per-player variants| Server
end
Server -->|streaming chunks| Player
The hub. Receives player input, coordinates agent sessions, and manages game state updates.
- Each agent runs as a `claude` CLI process (Claude Max subscription) in one of two modes:
  - Headless: `claude -p` subprocesses for production use
  - Tmux: Visible tmux panes for debugging and evaluation
- No API keys needed — runs on the Claude Max subscription
- Manages the game loop lifecycle: startup, per-turn coordination, shutdown
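A minimal sketch of how a headless agent call might be launched, assuming the orchestrator shells out to `claude -p` once per prompt. The function names and the injectable `runner` parameter are hypothetical and are not taken from the SideQuest codebase.

```python
import subprocess
from typing import Callable, List


def build_headless_command(prompt: str) -> List[str]:
    """Build the argv for a one-shot, non-interactive `claude -p` call."""
    return ["claude", "-p", prompt]


def run_headless(
    prompt: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run the prompt in a subprocess and return whatever it printed.

    `runner` is injectable so the call can be stubbed in tests
    (a hypothetical design choice for this sketch).
    """
    result = runner(
        build_headless_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Passing the prompt as a separate argv element, rather than building a shell string, keeps player text from being interpreted by a shell.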
LLM-based classifier that examines player input and game context to select the right agent.
| Intent | Routed To |
|---|---|
| `combat` | Combat Agent |
| `dialogue` | NPC Agent |
| `exploration` | Narrator Agent |
| `examine` | Narrator Agent |
| `inventory` | World State Agent |
| `world_query` | World State Agent |
| `meta` | Narrator Agent |
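The routing table above can be sketched as a plain lookup. The `Intent` member names follow the `Intent.EXPLORATION` spelling used in the turn sequence; the agent keys and the fallback to the narrator for unknown labels are assumptions made for illustration.

```python
from enum import Enum


class Intent(Enum):
    COMBAT = "combat"
    DIALOGUE = "dialogue"
    EXPLORATION = "exploration"
    EXAMINE = "examine"
    INVENTORY = "inventory"
    WORLD_QUERY = "world_query"
    META = "meta"


# Mirrors the routing table; the agent keys are illustrative strings.
ROUTES = {
    Intent.COMBAT: "combat",
    Intent.DIALOGUE: "npc",
    Intent.EXPLORATION: "narrator",
    Intent.EXAMINE: "narrator",
    Intent.INVENTORY: "world_state",
    Intent.WORLD_QUERY: "world_state",
    Intent.META: "narrator",
}


def route(intent_label: str) -> str:
    """Map a classifier label to an agent key.

    Unknown labels fall back to the narrator; that fallback is an
    assumption of this sketch, not documented behavior.
    """
    try:
        return ROUTES[Intent(intent_label)]
    except ValueError:
        return "narrator"
```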
Ten specialist agents and utilities, each with a focused system prompt:
| Agent | File | Responsibility |
|---|---|---|
| Narrator | `narrator.py` | Exploration, description, story progression, narrative hooks |
| Combat | `combat.py` | Attacks, spells, tactical decisions, initiative |
| NPC | `npc.py` | Dialogue, persuasion, social interaction, faction relationships |
| World State | `world_state.py` | State tracking via JSON patches, consistency enforcement |
| Intent Router | `intent_router.py` | Classifies player input to select the right agent |
| Music Director | `music_director.py` | Selects background music based on scene mood and context |
| Perception Rewriter | `perception_rewriter.py` | Per-player narration variants for asymmetric knowledge (blinded, charmed, etc.) |
| Narrator Continuity | `narrator_continuity.py` | Drift detection: flags dead NPCs referenced alive, wrong locations, inventory contradictions |
| Hook Refiner | `hook_refiner.py` | Polishes player-authored narrative hooks via LLM |
| Format Helpers | `format_helpers.py` | Shared context formatting utilities for agent prompts |
Assembles structured system prompts using a three-tier rule taxonomy:
graph LR
subgraph "Rule Tiers"
Critical["Critical (MUST obey)"]
Firm["Firm (SHOULD follow)"]
Coherence["Coherence (style/tone)"]
end
Critical --> Assembly[Prompt Assembly]
Firm --> Assembly
Coherence --> Assembly
GenrePack[Genre Pack Overrides] --> Assembly
SOUL[SOUL.md Principles] --> Assembly
GameState[Current Game State] --> Assembly
Assembly --> SystemPrompt["System Prompt with\n<before-you-respond> self-check"]
Critical rules are universal (player agency, no metagaming, output format). Firm rules are agent-specific (living world for narrator, mechanical grounding for combat). Coherence rules govern style (detail weight, brevity, sensory grounding). Genre packs can override or extend any tier.
The GamePromptComposer adds attention-aware section assembly — positioning high-priority content where LLM attention is strongest.
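One way to picture the three-tier assembly is the sketch below. It assumes that genre pack rules replace base rules sharing an id, that tiers are emitted in priority order, and that the closing `<before-you-respond>` block restates the critical rules. All class and function names here are hypothetical; the real `GamePromptComposer` may order sections differently.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable


class Tier(IntEnum):
    CRITICAL = 0   # MUST obey
    FIRM = 1       # SHOULD follow
    COHERENCE = 2  # style/tone


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tier: Tier
    text: str


def compose_system_prompt(
    base_rules: Iterable[Rule],
    genre_overrides: Iterable[Rule] = (),
) -> str:
    # Genre pack rules replace base rules with the same id, or extend the set.
    merged: Dict[str, Rule] = {r.rule_id: r for r in base_rules}
    merged.update({r.rule_id: r for r in genre_overrides})

    sections = []
    for tier in Tier:  # iterates CRITICAL, FIRM, COHERENCE in that order
        rules = [r for r in merged.values() if r.tier is tier]
        if rules:
            body = "\n".join(f"- {r.text}" for r in rules)
            sections.append(f"## {tier.name.title()} rules\n{body}")

    # Restate the critical rules at the very end as a self-check.
    critical = [r.text for r in merged.values() if r.tier is Tier.CRITICAL]
    checklist = "\n".join(f"- {text}" for text in critical)
    sections.append(f"<before-you-respond>\n{checklist}\n</before-you-respond>")
    return "\n\n".join(sections)
```

Placing critical rules first and repeating them last is one plausible reading of "attention-aware" assembly, offered here only as an illustration.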
Parses bold-header paragraphs from SOUL.md into SoulPrinciple objects that are injected into agent prompts. Results are cached per file path. This ensures every agent session is grounded in the project's guiding principles:
- Agency — the player controls their character
- Living World — NPCs act on their own goals
- Genre Truth — consequences match the genre
- Tabletop First, Then Better — design like a DM, then leverage the medium
- Cost Scales with Drama — computational effort follows narrative weight
- Diamonds and Coal — detail signals importance
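A rough sketch of the parsing and caching described above, assuming each principle is a paragraph that opens with a `**bold**` header. The field names on `SoulPrinciple`, the regular expression, and both function names are illustrative guesses rather than the project's actual code.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class SoulPrinciple:
    name: str
    text: str


# A paragraph that opens with a bold header, e.g. "**Agency**: The player ..."
_BOLD_HEADER = re.compile(
    r"^\*\*(?P<name>.+?)\*\*[\s:.\-]*(?P<text>.*)$", re.DOTALL
)


def parse_soul_text(markdown: str) -> Tuple[SoulPrinciple, ...]:
    """Extract bold-header paragraphs; other paragraphs are ignored."""
    principles = []
    for paragraph in re.split(r"\n\s*\n", markdown.strip()):
        match = _BOLD_HEADER.match(paragraph.strip())
        if match:
            # Collapse internal line breaks so the text injects cleanly.
            text = " ".join(match["text"].split())
            principles.append(SoulPrinciple(match["name"].strip(), text))
    return tuple(principles)


@lru_cache(maxsize=None)
def load_soul(path: str) -> Tuple[SoulPrinciple, ...]:
    """Parse a file once per path; repeat calls return the cached tuple."""
    return parse_soul_text(Path(path).read_text(encoding="utf-8"))
```

Returning an immutable tuple keeps the cached result safe to share between agent sessions.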
Applies pattern-matching rules to narrative text and game state to produce structured StageCue objects that downstream pipelines consume.
graph LR
NarrativeText[Narrative Text] --> SI[Scene Interpreter]
GameState --> SI
GenrePack --> SI
SI --> StageCue1[StageCue: portrait]
SI --> StageCue2[StageCue: landscape]
SI --> StageCue3[StageCue: tactical_sketch]
Each StageCue carries a RenderTier that determines quality/latency tradeoffs:
| Tier | Purpose |
|---|---|
| `tactical_sketch` | Quick combat maps |
| `scene_illustration` | Full scene art |
| `portrait` | Character close-ups |
| `landscape` | Environment panoramas |
| `text_overlay` | Text on image |
| `cartography` | Maps |
| `fog_of_war` | Fog-of-war map overlays |
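To make the pattern-matching step concrete, here is a small sketch that turns narrative text into `StageCue` objects tagged with a `RenderTier`. The tier values come from the table above; the `StageCue` fields, the keyword rules, and the `in_combat` gate are assumptions for illustration only.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class RenderTier(Enum):
    TACTICAL_SKETCH = "tactical_sketch"
    SCENE_ILLUSTRATION = "scene_illustration"
    PORTRAIT = "portrait"
    LANDSCAPE = "landscape"
    TEXT_OVERLAY = "text_overlay"
    CARTOGRAPHY = "cartography"
    FOG_OF_WAR = "fog_of_war"


@dataclass(frozen=True)
class StageCue:
    tier: RenderTier
    subject: str


# Hypothetical keyword rules; real rules would presumably be richer
# and informed by the genre pack.
RULES = [
    (re.compile(r"\b(attacks?|swings?|parries)\b", re.I), RenderTier.TACTICAL_SKETCH),
    (re.compile(r"\b(valley|horizon|skyline)\b", re.I), RenderTier.LANDSCAPE),
]


def interpret(narrative: str, in_combat: bool = False) -> List[StageCue]:
    """Return one cue per matching rule, in rule order."""
    cues = []
    for pattern, tier in RULES:
        match = pattern.search(narrative)
        if match is None:
            continue
        # Game state gates combat art: no tactical sketch outside combat.
        if tier is RenderTier.TACTICAL_SKETCH and not in_combat:
            continue
        cues.append(StageCue(tier, match.group(0).lower()))
    return cues
```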
The single source of truth for the current session. All agents read from and write patches to it.
classDiagram
class GameState {
+characters: List~Character~
+npc_registry: NPCRegistry
+location: str
+time_of_day: str
+quest_log: Dict
+combat: CombatState
+chase: ChaseState
+active_tropes: List~TropeState~
+atmosphere: str
+narrative_log: List
+activate_trope()
+progress_trope()
+resolve_trope()
}
class Character {
+name: str
+race: str
+char_class: str
+level: int
+hp / max_hp
+stats: Dict
+inventory: Inventory
+narrative: NarrativeState
+hooks: List~NarrativeHook~
+progression: ProgressionState
}
class CombatState {
+in_combat: bool
+turn_order: List
+current_turn: str
+enemies: List
+round_number: int
}
class TropeState {
+trope_definition_id: str
+status: TropeStatus
+progression: float
}
class ChaseState {
+in_chase: bool
+pursuer: str
+quarry: str
+separation: int
+phase: ChasePhase
+outcome: ChaseOutcome
+drama_weight: float
+beats: List~ChaseBeat~
+rig: RigStats
}
GameState --> Character
GameState --> CombatState
GameState --> ChaseState
GameState --> TropeState
Character --> NarrativeHook
Character --> ProgressionState
Character --> Inventory
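The document does not specify the patch format agents emit. As one possibility, the sketch below applies a merge-style delta, loosely modeled on JSON Merge Patch (RFC 7396), to a dictionary form of the state: nested objects merge, `None` removes a key, and anything else replaces the old value. The function name and the dictionary representation are assumptions.

```python
import copy
from typing import Any, Dict


def apply_patch(state: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with `patch` merged in; the input is not mutated."""
    result = copy.deepcopy(state)
    for key, value in patch.items():
        if value is None:
            # A null value deletes the key if it exists.
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are objects: merge recursively.
            result[key] = apply_patch(result[key], value)
        else:
            # Scalars, lists, and type changes replace the old value.
            result[key] = copy.deepcopy(value)
    return result
```

Returning a fresh copy means a failed or rejected patch never leaves the session state half-updated.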
Unified model combining narrative identity and mechanical stats:
- Narrative hooks — typed facts from character creation (origin, wound, goal, secret, etc.) that the narrator is authorized to reference
- Affinity progression — tier-based advancement (Unawakened → Novice → Adept → Master) tracked per affinity
- Milestone progression — narrative milestones that accumulate toward level-ups
- Inventory — genre-aware item management with carry limits
State machine that drives character creation through genre-defined scenes:
stateDiagram-v2
[*] --> IDLE
IDLE --> IN_PROGRESS: start()
IN_PROGRESS --> IN_PROGRESS: advance_scene()
IN_PROGRESS --> AWAITING_FOLLOWUP: freeform input
AWAITING_FOLLOWUP --> IN_PROGRESS: followup resolved
IN_PROGRESS --> CONFIRMATION: all scenes complete
CONFIRMATION --> COMPLETE: build()
COMPLETE --> [*]
- Initialize with a `GenrePack` (loads scenes from `char_creation.yaml`)
- Walk through each scene: player picks a choice or enters freeform text
- Choices are validated against genre rules (allowed classes, races)
- Mechanical effects and narrative hooks accumulate
- At confirmation, `set_name()` and `generate_stats()`, then `build()` produces a `Character`
- Builder state serializes via `to_dict()` / `from_dict()` for mid-creation saves
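The state diagram above can be sketched as a small guarded state machine. `start()`, `advance_scene()`, and `build()` are named in the diagram; `submit_freeform()`, `resolve_followup()`, and the class name are hypothetical, and scene content, validation, and stat generation are left out.

```python
from enum import Enum, auto
from typing import List


class BuilderState(Enum):
    IDLE = auto()
    IN_PROGRESS = auto()
    AWAITING_FOLLOWUP = auto()
    CONFIRMATION = auto()
    COMPLETE = auto()


class CharacterBuilderSketch:
    """Walks a list of scene ids through the creation states."""

    def __init__(self, scenes: List[str]) -> None:
        self.scenes = scenes
        self.index = 0
        self.state = BuilderState.IDLE

    def _require(self, expected: BuilderState) -> None:
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.name}, got {self.state.name}")

    def start(self) -> None:
        self._require(BuilderState.IDLE)
        self.state = BuilderState.IN_PROGRESS

    def advance_scene(self) -> None:
        self._require(BuilderState.IN_PROGRESS)
        self.index += 1
        # Once every scene is done, move on to confirmation.
        if self.index >= len(self.scenes):
            self.state = BuilderState.CONFIRMATION

    def submit_freeform(self) -> None:
        self._require(BuilderState.IN_PROGRESS)
        self.state = BuilderState.AWAITING_FOLLOWUP

    def resolve_followup(self) -> None:
        self._require(BuilderState.AWAITING_FOLLOWUP)
        self.state = BuilderState.IN_PROGRESS

    def build(self) -> None:
        self._require(BuilderState.CONFIRMATION)
        self.state = BuilderState.COMPLETE
```

Guarding every transition makes an out-of-order client message fail loudly instead of corrupting a half-built character.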
graph TD
subgraph "Genre Pack (YAML directory)"
Pack[pack.yaml — metadata]
Lore[lore.yaml — world, factions, history]
Rules[rules.yaml — mechanics, stats, classes]
Prompts[prompts.yaml — agent prompt extensions]
CharCreation[char_creation.yaml — creation scenes]
Archetypes[archetypes.yaml — character templates]
Theme[theme.yaml — UI colors and styling]
Audio[audio.yaml — music and SFX config]
Inventory[inventory.yaml — item tables]
Progression[progression.yaml — leveling rules]
VisualStyle[visual_style.yaml — image gen directives]
VoicePresets[voice_presets.yaml — TTS per archetype]
end
Pack --> Loader[GenrePackLoader]
Lore --> Loader
Rules --> Loader
Prompts --> Loader
CharCreation --> Loader
Archetypes --> Loader
Theme --> Loader
Audio --> Loader
Inventory --> Loader
Progression --> Loader
VisualStyle --> Loader
VoicePresets --> Loader
Loader --> GenrePack[GenrePack — Pydantic model]
GenrePack --> PromptComposer
GenrePack --> CharacterBuilder
GenrePack --> UITheme[UI Theme]
GenrePack --> ImagePipeline[Image Pipeline]
GenrePack --> AudioPipeline[Audio Pipeline]
GenrePack --> VoicePipeline[Voice Pipeline]
- `GenrePack` — Pydantic model aggregating all pack data
- `GenrePackLoader` — Discovers packs in `genre_packs/`, validates YAML, returns `GenrePack` instances
- Pack name validation prevents path traversal (`^[a-zA-Z0-9][a-zA-Z0-9_-]*$`)
See genre-packs.md for the full format and creation guide.
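The pack name check can be sketched as follows. The pattern is the one quoted above; the `resolve_pack_dir` helper and its `root` default are hypothetical names used only to show where the check would sit.

```python
import re
from pathlib import Path

PACK_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]*$")


def resolve_pack_dir(name: str, root: Path = Path("genre_packs")) -> Path:
    """Return the pack directory for `name`, rejecting unsafe names.

    The pattern allows no dots or slashes and requires an alphanumeric
    first character, so "..", "a/b", and the empty string are all refused.
    fullmatch() is used because "$" alone would accept a trailing newline.
    """
    if not PACK_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid genre pack name: {name!r}")
    return root / name
```

Validating the name before it is ever joined to a path means no later code needs to reason about `..` segments.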
graph LR
StageCue --> SubjectExtractor[Subject Extractor]
SubjectExtractor --> PromptComposer2[Media Prompt Composer]
VisualStyle[visual_style.yaml] --> PromptComposer2
PromptComposer2 --> RendererFactory[Renderer Factory]
RendererFactory --> Flux[Flux Dev Worker]
Flux --> Cache[Image Cache]
Cache --> Display[Image Display]
- Subject Extractor — pulls subjects from narrative text for image prompts
- Renderer Factory — selects backend (Flux Dev + LoRAs for all image generation)
- Image Cache — SHA256-keyed with LRU eviction, avoids regenerating identical scenes
- Beat Filter — suppresses renders for mundane actions (dialogue-heavy, repetitive movement)
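A minimal sketch of a SHA256-keyed cache with LRU eviction, as described in the list above. What actually goes into the key is not documented; hashing the style together with the prompt is an assumption, as are the class and method names.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class ImageCache:
    def __init__(self, max_entries: int = 128) -> None:
        self._max = max_entries
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    @staticmethod
    def key_for(prompt: str, style: str) -> str:
        """Hash style and prompt together so identical scenes share a key."""
        payload = f"{style}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, image: bytes) -> None:
        self._items[key] = image
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used
```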
Three-channel mixer with crossfade and ducking:
| Channel | Purpose |
|---|---|
| `music` | Background music (genre-appropriate tracks) |
| `sfx` | Sound effects triggered by game events |
| `ambience` | Environmental atmosphere loops |
- Music Director agent selects tracks based on scene mood
- Audio Interpreter extracts SFX cues from narrative text
- Library Backend manages pre-generated audio assets from genre packs
- Loudnorm normalizes volume across tracks
- Rotator cycles through track variations to avoid repetition
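How the Rotator picks the next variation is not specified. One common approach is a shuffle bag, sketched here under that assumption: every variation plays once per cycle, and the same track never plays twice in a row across a reshuffle. The class name and the injectable `rng` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


class TrackRotator:
    """Shuffle-bag rotation over distinct track variations."""

    def __init__(
        self, variations: Sequence[str], rng: Optional[random.Random] = None
    ) -> None:
        if not variations:
            raise ValueError("at least one variation is required")
        self._variations = list(variations)
        self._rng = rng if rng is not None else random.Random()
        self._bag: List[str] = []
        self._last: Optional[str] = None

    def next_track(self) -> str:
        if not self._bag:
            # Refill and reshuffle once every variation has played.
            self._bag = self._variations[:]
            self._rng.shuffle(self._bag)
            # The next pop takes the last element; swap it away if it
            # would repeat the track that just played.
            if len(self._bag) > 1 and self._bag[-1] == self._last:
                self._bag[0], self._bag[-1] = self._bag[-1], self._bag[0]
        self._last = self._bag.pop()
        return self._last
```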
Text-to-speech with per-character voice presets:
graph LR
NarrativeText[Narrative Text] --> Parser[Voice Parser]
Parser --> Segmenter[Text Segmenter]
Segmenter --> Router[Voice Router]
VoicePresets[voice_presets.yaml] --> Registry[Voice Registry]
Registry --> Router
Router --> Kokoro[Kokoro TTS — primary]
Router --> Piper[Piper TTS — fallback]
Kokoro --> Effects[Audio Effects]
Piper --> Effects
Effects --> Mixer[Voice Mixer]
Mixer --> TTSService[TTS Service]
TTSService -->|WebSocket| Client[React Client]
- Kokoro — primary TTS engine with 54 built-in voices, 24kHz, streaming synthesis, voice blending
- Piper — fast local fallback TTS for graceful degradation
- Effects — pitch, reverb, environment-aware post-processing (cavern echo, outdoor openness, whisper mode)
- Voice Registry — maps character archetypes to voice presets from the genre pack
- TTS Service (`server/tts_service.py`) — streams synthesized audio to React client over WebSocket
- Channel — file-based JSON message bus for agent-to-agent communication (built, not used in production turn loop)
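The primary-then-fallback behavior of the voice router can be sketched without touching either engine's real API. Here each engine is a stand-in callable taking text and a voice id; the `Synth` type, the function name, and the error handling are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Stand-in for an engine call: (text, voice_id) -> audio bytes.
Synth = Callable[[str, str], bytes]


def synthesize_with_fallback(
    text: str,
    voice_id: str,
    engines: Sequence[Tuple[str, Synth]],
) -> Tuple[str, bytes]:
    """Try engines in order and return (engine_name, audio) from the first success."""
    errors: List[str] = []
    for name, synth in engines:
        try:
            return name, synth(text, voice_id)
        except Exception as exc:  # any engine failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS engines failed: " + "; ".join(errors))
```

Listing the engines as `[("kokoro", ...), ("piper", ...)]` would give the ordering described above, with the returned name telling the caller which engine actually produced the audio.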
WebSocket-based architecture:
- GameServer (`server/app.py`) — WebSocket server, message dispatch, streaming response relay
- CollectWindow (`server/collect_window.py`) — Multiplayer action batching with adaptive timer (0ms solo, 3s for 2-3 players, 5s for 4+)
- TurnManager (`game/turn_manager.py`) — Multiplayer barrier sync (waits for all players before resolving turn)
- TurnMode (`server/turn_mode.py`) — FREE_PLAY / STRUCTURED / CINEMATIC mode switching via narrator markers
- React Client (`client/`) — TypeScript + Vite + Tailwind + shadcn. Components: NarrativeView, PartyPanel, CharacterSheet, InventoryPanel, InputBar, MapOverlay, AudioStatus. Local Whisper STT (Transformers.js + WebGPU), Web Audio API for music/SFX/TTS playback, WebRTC peer-to-peer voice chat.
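The adaptive timer for the collect window reduces to a small function of player count. The thresholds are the ones stated above; the function name and the rejection of counts below one are assumptions.

```python
def collect_window_seconds(player_count: int) -> float:
    """Return how long to batch actions before resolving a turn."""
    if player_count < 1:
        raise ValueError("player_count must be at least 1")
    if player_count == 1:
        return 0.0  # solo play resolves immediately
    if player_count <= 3:
        return 3.0  # 2-3 players
    return 5.0      # 4 or more players
```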
Note: TUI (Textual) and Discord transport layers have been removed. The React web client replaced an earlier PySide6 plan — see adr-react-web-client.md.
Orchestrates image generation with quality/latency tradeoffs:
- RenderQueue — async queue that processes `StageCue`s without blocking the game loop
- Beat Filter — suppresses redundant renders for rapid-fire actions
- Stale Detection — discards renders that are no longer relevant (scene has moved on)
- Null Renderer — no-op backend for testing and headless mode
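The queue and stale-detection ideas above can be sketched with `asyncio`. This assumes each queued cue records the scene it was created for and that the worker can ask for the current scene; `QueuedCue`, `render_worker`, and the `scene_id` field are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class QueuedCue:
    scene_id: int
    prompt: str


async def render_worker(
    queue: "asyncio.Queue[QueuedCue]",
    render: Callable[[str], Awaitable[bytes]],
    current_scene: Callable[[], int],
    results: List[bytes],
) -> None:
    """Consume cues forever; the caller cancels the task on shutdown."""
    while True:
        cue = await queue.get()
        try:
            # Stale detection: the story has moved on, so skip the render.
            if cue.scene_id != current_scene():
                continue
            results.append(await render(cue.prompt))
        finally:
            queue.task_done()  # runs for skipped and rendered cues alike
```

Because the worker runs as its own task, the game loop only pays for `queue.put_nowait(...)` and never waits on image generation.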
sequenceDiagram
participant Player
participant Server as GameServer
participant Orchestrator
participant IntentRouter
participant Agent
participant WorldState
participant SceneInterpreter
participant Renderer
Player->>Server: WebSocket: "I search the old chest"
Server->>Orchestrator: handle_player_input_streaming()
Orchestrator->>Orchestrator: sanitize input, cancel speculative pre-gen
Orchestrator->>IntentRouter: classify(input, game_state)
IntentRouter-->>Orchestrator: Intent.EXPLORATION → narrator
Orchestrator->>Orchestrator: build agent context (5-zone prompt + RAG lore)
Orchestrator->>Agent: session.send_streaming(prompt)
loop Streaming
Agent-->>Server: NARRATION_CHUNK
Server-->>Player: chunk displayed in real-time
end
par Background pipelines
Orchestrator->>WorldState: generate JSON patch
WorldState-->>Orchestrator: apply_patch(delta) + trope tick + auto-save
Orchestrator->>SceneInterpreter: interpret(narrative, state)
SceneInterpreter-->>Renderer: StageCue[] → render queue (Flux)
end
Server-->>Player: game view updates (character sheet, inventory, map)
For the full 42-step sequence with all five background pipelines, see turn-sequence-diagram.md.