Architecture

Overview

SideQuest runs a council of Claude AI agents that collectively act as a Narrator. Players connect via WebSocket (server/client architecture); the system classifies intent, routes to specialist agents, and returns streaming narration with accompanying visuals, audio, and voice.

graph TD
    Player -->|WebSocket| Server[GameServer]
    Server --> Orchestrator

    subgraph "Agent Council"
        Orchestrator --> IntentRouter[Intent Router]
        IntentRouter -->|exploration, examine| Narrator
        IntentRouter -->|combat| Combat
        IntentRouter -->|dialogue, persuasion| NPC
        Narrator -->|JSON patch| WorldState[World State]
        Combat -->|JSON patch| WorldState
        NPC -->|JSON patch| WorldState
        WorldState -->|updated GameState| Orchestrator
    end

    subgraph "Prompt Assembly"
        PromptComposer[Prompt Composer] -->|system prompts| Orchestrator
        GenrePack -->|rules, tone, lore| PromptComposer
        SOUL[SOUL.md] -->|guiding principles| PromptComposer
        RAG[Lore Retriever] -->|relevant fragments| PromptComposer
    end

    subgraph "Output Pipeline"
        Narrator -->|narrative text| SceneInterpreter[Scene Interpreter]
        Combat -->|narrative text| SceneInterpreter
        NPC -->|narrative text| SceneInterpreter
        SceneInterpreter -->|StageCues| RendererPipeline[Renderer]
        RendererPipeline --> ImageGen[Flux Image Generation]
        MusicDir[Music Director] -->|mood selection| AudioPipeline[Audio Mixer]
        AudioInterp[Audio Interpreter] -->|SFX cues| AudioPipeline
    end

    subgraph "Voice Pipeline"
        VoiceRouter[Voice Router] --> Kokoro[Kokoro TTS]
        VoiceRouter --> Piper[Piper TTS — fallback]
        Kokoro --> VoiceMixer[Voice Mixer]
        Piper --> VoiceMixer
    end

    subgraph "Client (React)"
        Browser[React Browser] -->|WebSocket| Server
        Browser --> WhisperSTT[Whisper STT — local]
        Browser --> WebAudio[Web Audio API]
        Browser --> WebRTC[WebRTC Voice Chat]
    end

    subgraph "Multiplayer (optional)"
        CollectWindow[Collect Window] -->|batched actions| TurnManager[Turn Manager]
        TurnManager --> Orchestrator
        PerceptionRewriter[Perception Rewriter] -->|per-player variants| Server
    end

    Server -->|streaming chunks| Player

Component Details

Orchestrator (orchestrator.py)

The hub. Receives player input, coordinates agent sessions, and manages game state updates.

  • Each agent runs as a claude CLI process under a Claude Max subscription, so no API keys are needed. Two modes are available:
    • Headless: claude -p subprocesses for production use
    • Tmux: visible tmux panes for debugging and evaluation
  • Manages the game loop lifecycle: startup, per-turn coordination, shutdown
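
A minimal sketch of the headless mode, assuming each call is a one-shot `claude -p` invocation whose stdout is the agent reply. The helper names `headless_command` and `run_headless` are hypothetical, and the real orchestrator presumably keeps longer-lived sessions and streams output.

```python
import subprocess


def headless_command(prompt: str, executable: str = "claude") -> list[str]:
    """Build the argv for a one-shot, non-interactive agent call."""
    return [executable, "-p", prompt]


def run_headless(prompt: str, executable: str = "claude", timeout: float = 120.0) -> str:
    """Run the CLI once and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        headless_command(prompt, executable),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the prompt as a list element rather than a shell string avoids quoting problems with player-supplied text.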

Intent Router (agents/intent_router.py)

LLM-based classifier that examines player input and game context to select the right agent.

| Intent | Routed To |
|---|---|
| combat | Combat Agent |
| dialogue | NPC Agent |
| exploration | Narrator Agent |
| examine | Narrator Agent |
| inventory | World State Agent |
| world_query | World State Agent |
| meta | Narrator Agent |
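
The routing table can be sketched as a plain lookup applied after the LLM has produced a label. The `Intent` enum mirrors the intents listed above; the agent keys and the narrator fallback for unknown labels are assumptions made for illustration.

```python
from enum import Enum


class Intent(Enum):
    COMBAT = "combat"
    DIALOGUE = "dialogue"
    EXPLORATION = "exploration"
    EXAMINE = "examine"
    INVENTORY = "inventory"
    WORLD_QUERY = "world_query"
    META = "meta"


# Mirrors the routing table; the agent keys are illustrative names.
ROUTES = {
    Intent.COMBAT: "combat",
    Intent.DIALOGUE: "npc",
    Intent.EXPLORATION: "narrator",
    Intent.EXAMINE: "narrator",
    Intent.INVENTORY: "world_state",
    Intent.WORLD_QUERY: "world_state",
    Intent.META: "narrator",
}


def route(label: str) -> str:
    """Map a classifier label to an agent key."""
    try:
        return ROUTES[Intent(label.strip().lower())]
    except ValueError:
        # Assumption: unrecognized labels fall back to the narrator.
        return "narrator"
```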

Agents (agents/)

Ten modules: specialist agents, each with a focused system prompt, plus shared utilities:

| Agent | File | Responsibility |
|---|---|---|
| Narrator | narrator.py | Exploration, description, story progression, narrative hooks |
| Combat | combat.py | Attacks, spells, tactical decisions, initiative |
| NPC | npc.py | Dialogue, persuasion, social interaction, faction relationships |
| World State | world_state.py | State tracking via JSON patches, consistency enforcement |
| Intent Router | intent_router.py | Classifies player input to select the right agent |
| Music Director | music_director.py | Selects background music based on scene mood and context |
| Perception Rewriter | perception_rewriter.py | Per-player narration variants for asymmetric knowledge (blinded, charmed, etc.) |
| Narrator Continuity | narrator_continuity.py | Drift detection: flags dead NPCs referenced as alive, wrong locations, inventory contradictions |
| Hook Refiner | hook_refiner.py | Polishes player-authored narrative hooks via LLM |
| Format Helpers | format_helpers.py | Shared context formatting utilities for agent prompts |

Prompt Composer (prompt_composer.py)

Assembles structured system prompts using a three-tier rule taxonomy:

graph LR
    subgraph "Rule Tiers"
        Critical["Critical (MUST obey)"]
        Firm["Firm (SHOULD follow)"]
        Coherence["Coherence (style/tone)"]
    end

    Critical --> Assembly[Prompt Assembly]
    Firm --> Assembly
    Coherence --> Assembly

    GenrePack[Genre Pack Overrides] --> Assembly
    SOUL[SOUL.md Principles] --> Assembly
    GameState[Current Game State] --> Assembly

    Assembly --> SystemPrompt["System Prompt with\n<before-you-respond> self-check"]

Critical rules are universal (player agency, no metagaming, output format). Firm rules are agent-specific (living world for narrator, mechanical grounding for combat). Coherence rules govern style (detail weight, brevity, sensory grounding). Genre packs can override or extend any tier.

The GamePromptComposer adds attention-aware section assembly — positioning high-priority content where LLM attention is strongest.
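
As a rough illustration of the tiered assembly, the sketch below merges base rules with genre overrides and emits sections in tier order, closing with a `<before-you-respond>` block. The `Rule` type, the key-based override semantics, and the wording of the self-check are assumptions, not the project's actual composer.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    CRITICAL = 0   # MUST obey
    FIRM = 1       # SHOULD follow
    COHERENCE = 2  # style/tone


@dataclass(frozen=True)
class Rule:
    key: str
    tier: Tier
    text: str


def compose(base: list[Rule], genre_overrides: list[Rule]) -> str:
    """Assemble a system prompt with rules grouped by tier."""
    # A genre rule with an existing key replaces the base rule; new keys extend it.
    merged = {rule.key: rule for rule in base}
    merged.update({rule.key: rule for rule in genre_overrides})

    sections = []
    for tier in Tier:  # iterates CRITICAL, FIRM, COHERENCE in that order
        rules = [rule for rule in merged.values() if rule.tier is tier]
        if rules:
            body = "\n".join(f"- {rule.text}" for rule in rules)
            sections.append(f"## {tier.name.title()}\n{body}")

    # Hypothetical self-check wording; only the tag name comes from the diagram.
    check = (
        "<before-you-respond>\n"
        "Re-read the Critical rules and confirm the draft obeys them.\n"
        "</before-you-respond>"
    )
    return "\n\n".join(sections + [check])
```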

SOUL.md Parser (soul.py)

Parses bold-header paragraphs from SOUL.md into SoulPrinciple objects that are injected into agent prompts. Results are cached per file path. This ensures every agent session is grounded in the project's guiding principles:

  • Agency — the player controls their character
  • Living World — NPCs act on their own goals
  • Genre Truth — consequences match the genre
  • Tabletop First, Then Better — design like a DM, then leverage the medium
  • Cost Scales with Drama — computational effort follows narrative weight
  • Diamonds and Coal — detail signals importance
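
A minimal sketch of the parsing and caching described above, assuming each principle is a paragraph that opens with a `**Bold Title**` followed by its body. The regex, the `parse_soul_text` and `load_soul` helpers, and the fields on `SoulPrinciple` are illustrative guesses at the shape of the real parser.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path

# Paragraph that starts with **Title**, optionally followed by punctuation.
BOLD_HEADER = re.compile(r"^\*\*(?P<title>.+?)\*\*[\s.:\-]*(?P<body>.*)$", re.DOTALL)


@dataclass(frozen=True)
class SoulPrinciple:
    title: str
    body: str


def parse_soul_text(text: str) -> tuple[SoulPrinciple, ...]:
    """Extract bold-header paragraphs; other paragraphs are ignored."""
    principles = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        match = BOLD_HEADER.match(paragraph.strip())
        if match:
            title = match.group("title").strip().rstrip(".:")
            body = " ".join(match.group("body").split())  # collapse line wraps
            principles.append(SoulPrinciple(title, body))
    return tuple(principles)


@lru_cache(maxsize=None)
def load_soul(path: str) -> tuple[SoulPrinciple, ...]:
    """Parse a SOUL.md file once per path; repeat calls hit the cache."""
    return parse_soul_text(Path(path).read_text(encoding="utf-8"))
```

Returning a tuple of frozen dataclasses keeps the cached value immutable, so one agent session cannot alter the principles seen by another.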

Scene Interpreter (scene_interpreter.py)

Applies pattern-matching rules to narrative text and game state to produce structured StageCue objects that downstream pipelines consume.

graph LR
    NarrativeText[Narrative Text] --> SI[Scene Interpreter]
    GameState --> SI
    GenrePack --> SI
    SI --> StageCue1[StageCue: portrait]
    SI --> StageCue2[StageCue: landscape]
    SI --> StageCue3[StageCue: tactical_sketch]

Each StageCue carries a RenderTier that determines quality/latency tradeoffs:

| Tier | Purpose |
|---|---|
| tactical_sketch | Quick combat maps |
| scene_illustration | Full scene art |
| portrait | Character close-ups |
| landscape | Environment panoramas |
| text_overlay | Text on image |
| cartography | Maps |
| fog_of_war | Fog-of-war map overlays |
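
To make the pattern-matching step concrete, here is a small sketch that turns narrative text into `StageCue` objects. The `RenderTier` values come from the table above; the two regex rules, the `subject` field, and the combat shortcut are hypothetical stand-ins for rules that would presumably be supplied by the genre pack.

```python
import re
from dataclasses import dataclass
from enum import Enum


class RenderTier(Enum):
    TACTICAL_SKETCH = "tactical_sketch"
    SCENE_ILLUSTRATION = "scene_illustration"
    PORTRAIT = "portrait"
    LANDSCAPE = "landscape"
    TEXT_OVERLAY = "text_overlay"
    CARTOGRAPHY = "cartography"
    FOG_OF_WAR = "fog_of_war"


@dataclass(frozen=True)
class StageCue:
    tier: RenderTier
    subject: str


# Hypothetical rules: (pattern, tier), checked in order.
RULES = [
    (re.compile(r"\b(?:vista|horizon|valley|skyline)\b", re.I), RenderTier.LANDSCAPE),
    (re.compile(r"\b(?:face|eyes|scar|smiles?)\b", re.I), RenderTier.PORTRAIT),
]


def interpret(narrative: str, in_combat: bool) -> list[StageCue]:
    """Produce cues from the text plus one piece of game state."""
    cues = []
    if in_combat:
        # Assumption: active combat always requests a quick tactical map.
        cues.append(StageCue(RenderTier.TACTICAL_SKETCH, "current encounter"))
    for pattern, tier in RULES:
        match = pattern.search(narrative)
        if match:
            cues.append(StageCue(tier, match.group(0).lower()))
    return cues
```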

Game State

GameState (game/state.py)

The single source of truth for the current session. All agents read from and write patches to it.
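
The patch format is not specified here, so the following is only a sketch of one plausible reading: a merge-style delta, similar in spirit to JSON Merge Patch (RFC 7396), in which nested objects merge recursively and `None` removes a key. The dict-based state is a simplification of the real `GameState` model.

```python
import copy
from typing import Any


def apply_patch(state: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with delta merged in; the input is not mutated."""
    result = copy.deepcopy(state)
    for key, value in delta.items():
        if value is None:
            result.pop(key, None)  # None means "remove this key"
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)  # merge nested objects
        else:
            result[key] = copy.deepcopy(value)  # scalars and lists are replaced
    return result
```

Returning a copy instead of mutating in place makes it straightforward to validate a patched state before committing it as the new source of truth.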

classDiagram
    class GameState {
        +characters: List~Character~
        +npc_registry: NPCRegistry
        +location: str
        +time_of_day: str
        +quest_log: Dict
        +combat: CombatState
        +chase: ChaseState
        +active_tropes: List~TropeState~
        +atmosphere: str
        +narrative_log: List
        +activate_trope()
        +progress_trope()
        +resolve_trope()
    }

    class Character {
        +name: str
        +race: str
        +char_class: str
        +level: int
        +hp / max_hp
        +stats: Dict
        +inventory: Inventory
        +narrative: NarrativeState
        +hooks: List~NarrativeHook~
        +progression: ProgressionState
    }

    class CombatState {
        +in_combat: bool
        +turn_order: List
        +current_turn: str
        +enemies: List
        +round_number: int
    }

    class TropeState {
        +trope_definition_id: str
        +status: TropeStatus
        +progression: float
    }

    class ChaseState {
        +in_chase: bool
        +pursuer: str
        +quarry: str
        +separation: int
        +phase: ChasePhase
        +outcome: ChaseOutcome
        +drama_weight: float
        +beats: List~ChaseBeat~
        +rig: RigStats
    }

    GameState --> Character
    GameState --> CombatState
    GameState --> ChaseState
    GameState --> TropeState
    Character --> NarrativeHook
    Character --> ProgressionState
    Character --> Inventory

Character Model (game/character.py)

Unified model combining narrative identity and mechanical stats:

  • Narrative hooks — typed facts from character creation (origin, wound, goal, secret, etc.) that the narrator is authorized to reference
  • Affinity progression — tier-based advancement (Unawakened → Novice → Adept → Master) tracked per affinity
  • Milestone progression — narrative milestones that accumulate toward level-ups
  • Inventory — genre-aware item management with carry limits
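
The affinity tiers form an ordered ladder, which can be sketched with an `IntEnum`. The point thresholds below are invented purely for illustration; the real values are not given here and would presumably be defined by the genre pack's progression rules.

```python
from enum import IntEnum


class AffinityTier(IntEnum):
    UNAWAKENED = 0
    NOVICE = 1
    ADEPT = 2
    MASTER = 3


# Hypothetical thresholds, listed in ascending order.
THRESHOLDS = {
    AffinityTier.NOVICE: 10,
    AffinityTier.ADEPT: 30,
    AffinityTier.MASTER: 60,
}


def tier_for(points: int) -> AffinityTier:
    """Return the highest tier whose threshold the points total has reached."""
    tier = AffinityTier.UNAWAKENED
    for candidate, needed in THRESHOLDS.items():
        if points >= needed:
            tier = candidate
    return tier
```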

Character Creation (game/character_builder.py)

State machine that drives character creation through genre-defined scenes:

stateDiagram-v2
    [*] --> IDLE
    IDLE --> IN_PROGRESS: start()
    IN_PROGRESS --> IN_PROGRESS: advance_scene()
    IN_PROGRESS --> AWAITING_FOLLOWUP: freeform input
    AWAITING_FOLLOWUP --> IN_PROGRESS: followup resolved
    IN_PROGRESS --> CONFIRMATION: all scenes complete
    CONFIRMATION --> COMPLETE: build()
    COMPLETE --> [*]
  1. Initialize with a GenrePack (loads scenes from char_creation.yaml)
  2. Walk through each scene: player picks a choice or enters freeform text
  3. Choices are validated against genre rules (allowed classes, races)
  4. Mechanical effects and narrative hooks accumulate
  5. At confirmation, call set_name() and generate_stats(); build() then produces a Character
  6. Builder state serializes via to_dict()/from_dict() for mid-creation saves
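
The state diagram and steps above can be sketched as a small class. This models only the transitions and the save/restore round trip; `CreationFlow`, its scene counter, and `resolve_followup()` are hypothetical names, and choice validation, hooks, and stat generation are left out.

```python
from enum import Enum


class BuilderState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in_progress"
    AWAITING_FOLLOWUP = "awaiting_followup"
    CONFIRMATION = "confirmation"
    COMPLETE = "complete"


class CreationFlow:
    """Transition logic only; assumes at least one creation scene."""

    def __init__(self, scene_count: int) -> None:
        self.state = BuilderState.IDLE
        self.scene_count = scene_count
        self.scene_index = 0

    def start(self) -> None:
        self._require(BuilderState.IDLE)
        self.state = BuilderState.IN_PROGRESS

    def advance_scene(self, freeform: bool = False) -> None:
        self._require(BuilderState.IN_PROGRESS)
        if freeform:
            # Freeform input pauses the scene until a follow-up is resolved.
            self.state = BuilderState.AWAITING_FOLLOWUP
        else:
            self._next_scene()

    def resolve_followup(self) -> None:
        self._require(BuilderState.AWAITING_FOLLOWUP)
        self.state = BuilderState.IN_PROGRESS
        self._next_scene()

    def build(self) -> None:
        self._require(BuilderState.CONFIRMATION)
        self.state = BuilderState.COMPLETE

    def to_dict(self) -> dict:
        return {
            "state": self.state.value,
            "scene_count": self.scene_count,
            "scene_index": self.scene_index,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "CreationFlow":
        flow = cls(data["scene_count"])
        flow.state = BuilderState(data["state"])
        flow.scene_index = data["scene_index"]
        return flow

    def _next_scene(self) -> None:
        self.scene_index += 1
        if self.scene_index >= self.scene_count:
            self.state = BuilderState.CONFIRMATION

    def _require(self, expected: BuilderState) -> None:
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.name}, got {self.state.name}")
```

Storing the enum's string value in `to_dict()` keeps the saved form JSON-friendly, so a mid-creation save can be written without custom encoders.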

Genre System

Genre Pack Structure

graph TD
    subgraph "Genre Pack (YAML directory)"
        Pack[pack.yaml — metadata]
        Lore[lore.yaml — world, factions, history]
        Rules[rules.yaml — mechanics, stats, classes]
        Prompts[prompts.yaml — agent prompt extensions]
        CharCreation[char_creation.yaml — creation scenes]
        Archetypes[archetypes.yaml — character templates]
        Theme[theme.yaml — UI colors and styling]
        Audio[audio.yaml — music and SFX config]
        Inventory[inventory.yaml — item tables]
        Progression[progression.yaml — leveling rules]
        VisualStyle[visual_style.yaml — image gen directives]
        VoicePresets[voice_presets.yaml — TTS per archetype]
    end

    Pack --> Loader[GenrePackLoader]
    Lore --> Loader
    Rules --> Loader
    Prompts --> Loader
    CharCreation --> Loader
    Archetypes --> Loader
    Theme --> Loader
    Audio --> Loader
    Inventory --> Loader
    Progression --> Loader
    VisualStyle --> Loader
    VoicePresets --> Loader

    Loader --> GenrePack[GenrePack — Pydantic model]
    GenrePack --> PromptComposer
    GenrePack --> CharacterBuilder
    GenrePack --> UITheme[UI Theme]
    GenrePack --> ImagePipeline[Image Pipeline]
    GenrePack --> AudioPipeline[Audio Pipeline]
    GenrePack --> VoicePipeline[Voice Pipeline]
  • GenrePack — Pydantic model aggregating all pack data
  • GenrePackLoader — Discovers packs in genre_packs/, validates YAML, returns GenrePack instances
  • Pack name validation prevents path traversal (^[a-zA-Z0-9][a-zA-Z0-9_-]*$)
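
The name check can be illustrated directly with the pattern quoted above. The `resolve_pack_dir` helper is a hypothetical wrapper showing where such a check would sit before a path is built.

```python
import re
from pathlib import Path

# Pattern taken from the text: alphanumeric first character, then
# alphanumerics, underscores, or hyphens. No dots or path separators.
PACK_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]*$")


def resolve_pack_dir(root: Path, name: str) -> Path:
    """Return root/name, rejecting names that could escape the root."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    if not PACK_NAME.fullmatch(name):
        raise ValueError(f"invalid genre pack name: {name!r}")
    return root / name
```

Because `.` and `/` are not in the allowed character set, inputs such as `../secrets` fail validation before any filesystem access happens.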

See genre-packs.md for the full format and creation guide.


Media Pipeline

Image Generation (media/)

graph LR
    StageCue --> SubjectExtractor[Subject Extractor]
    SubjectExtractor --> PromptComposer2[Media Prompt Composer]
    VisualStyle[visual_style.yaml] --> PromptComposer2
    PromptComposer2 --> RendererFactory[Renderer Factory]
    RendererFactory --> Flux[Flux Dev Worker]
    Flux --> Cache[Image Cache]
    Cache --> Display[Image Display]
  • Subject Extractor — pulls subjects from narrative text for image prompts
  • Renderer Factory — selects backend (Flux Dev + LoRAs for all image generation)
  • Image Cache — SHA256-keyed with LRU eviction, avoids regenerating identical scenes
  • Beat Filter — suppresses renders for mundane actions (dialogue-heavy, repetitive movement)
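
A compact sketch of a SHA256-keyed cache with LRU eviction follows. What goes into the key is an assumption (here the render tier plus the final prompt), as are the class shape and the default capacity.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class ImageCache:
    def __init__(self, max_entries: int = 128) -> None:
        self._max = max_entries
        self._items = OrderedDict()  # key -> image bytes, oldest first

    @staticmethod
    def key_for(prompt: str, tier: str) -> str:
        """Hash the inputs that determine the image (assumed: tier + prompt)."""
        return hashlib.sha256(f"{tier}\x00{prompt}".encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, image: bytes) -> None:
        self._items[key] = image
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used
```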

Audio System (audio/)

Three-channel mixer with crossfade and ducking:

| Channel | Purpose |
|---|---|
| music | Background music (genre-appropriate tracks) |
| sfx | Sound effects triggered by game events |
| ambience | Environmental atmosphere loops |

  • Music Director agent selects tracks based on scene mood
  • Audio Interpreter extracts SFX cues from narrative text
  • Library Backend manages pre-generated audio assets from genre packs
  • Loudnorm normalizes volume across tracks
  • Rotator cycles through track variations to avoid repetition
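
The gain math behind crossfade and ducking can be sketched as two small functions. The equal-power curve, the choice to duck music under voice, and the -12 dB default are all assumptions; the actual mixer may use different curves and triggers.

```python
import math


def crossfade_gains(progress: float) -> tuple[float, float]:
    """Equal-power crossfade: (outgoing_gain, incoming_gain) for progress in [0, 1]."""
    t = min(max(progress, 0.0), 1.0)
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


def duck(music_gain: float, voice_active: bool, duck_db: float = -12.0) -> float:
    """Attenuate music by duck_db decibels while a voice line is playing."""
    if not voice_active:
        return music_gain
    return music_gain * 10 ** (duck_db / 20)  # convert dB to a linear factor
```

With an equal-power curve the squared gains always sum to 1, so perceived loudness stays roughly constant through the transition, unlike a linear fade, which dips in the middle.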

Voice Pipeline (voice/)

Text-to-speech with per-character voice presets:

graph LR
    NarrativeText[Narrative Text] --> Parser[Voice Parser]
    Parser --> Segmenter[Text Segmenter]
    Segmenter --> Router[Voice Router]
    VoicePresets[voice_presets.yaml] --> Registry[Voice Registry]
    Registry --> Router
    Router --> Kokoro[Kokoro TTS — primary]
    Router --> Piper[Piper TTS — fallback]
    Kokoro --> Effects[Audio Effects]
    Piper --> Effects
    Effects --> Mixer[Voice Mixer]
    Mixer --> TTSService[TTS Service]
    TTSService -->|WebSocket| Client[React Client]
  • Kokoro — primary TTS engine with 54 built-in voices, 24kHz, streaming synthesis, voice blending
  • Piper — fast local fallback TTS for graceful degradation
  • Effects — pitch, reverb, environment-aware post-processing (cavern echo, outdoor openness, whisper mode)
  • Voice Registry — maps character archetypes to voice presets from the genre pack
  • TTS Service (server/tts_service.py) — streams synthesized audio to React client over WebSocket
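
The primary/fallback routing can be sketched without touching either TTS library by treating each engine as a callable. The `Synth` signature, the `pick_voice` lookup, and the rule that any primary failure triggers the fallback are assumptions; no real Kokoro or Piper API is used here.

```python
from typing import Callable

# Hypothetical engine interface: (text, voice_id) -> encoded audio bytes.
Synth = Callable[[str, str], bytes]


def pick_voice(archetype: str, presets: dict[str, str], default: str) -> str:
    """Look up the preset for an archetype, falling back to a default voice."""
    return presets.get(archetype, default)


def synthesize(text: str, voice: str, primary: Synth, fallback: Synth) -> tuple[str, bytes]:
    """Try the primary engine first; degrade to the fallback on any error."""
    try:
        return "primary", primary(text, voice)
    except Exception:
        # Assumption: every primary failure is recoverable via the fallback.
        return "fallback", fallback(text, voice)
```

Returning which engine produced the audio lets the caller log degradation without inspecting the audio itself.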

Communication

Inter-Agent (comms/)

  • Channel — file-based JSON message bus for agent-to-agent communication (built, not used in production turn loop)

Server/Client (server/, client/)

WebSocket-based architecture:

  • GameServer (server/app.py) — WebSocket server, message dispatch, streaming response relay
  • CollectWindow (server/collect_window.py) — Multiplayer action batching with adaptive timer (0ms solo, 3s for 2-3 players, 5s for 4+)
  • TurnManager (game/turn_manager.py) — Multiplayer barrier sync (waits for all players before resolving turn)
  • TurnMode (server/turn_mode.py) — FREE_PLAY / STRUCTURED / CINEMATIC mode switching via narrator markers
  • React Client (client/) — TypeScript + Vite + Tailwind + shadcn. Components: NarrativeView, PartyPanel, CharacterSheet, InventoryPanel, InputBar, MapOverlay, AudioStatus. Local Whisper STT (Transformers.js + WebGPU), Web Audio API for music/SFX/TTS playback, WebRTC peer-to-peer voice chat.
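
The adaptive timer and the barrier check reduce to a few lines. The durations come from the CollectWindow description above; the function names and the set-based barrier are illustrative.

```python
def collect_window_seconds(player_count: int) -> float:
    """Batching window: 0 s solo, 3 s for 2-3 players, 5 s for 4 or more."""
    if player_count < 1:
        raise ValueError("player_count must be at least 1")
    if player_count == 1:
        return 0.0
    if player_count <= 3:
        return 3.0
    return 5.0


def barrier_released(expected: set[str], submitted: set[str]) -> bool:
    """True once every expected player has submitted an action."""
    return expected <= submitted
```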

Note: TUI (Textual) and Discord transport layers have been removed. The React web client replaced an earlier PySide6 plan — see adr-react-web-client.md.


Render Pipeline (renderer/)

Orchestrates image generation with quality/latency tradeoffs:

  • RenderQueue — async queue that processes StageCues without blocking the game loop
  • Beat Filter — suppresses redundant renders for rapid-fire actions
  • Stale Detection — discards renders that are no longer relevant (scene has moved on)
  • Null Renderer — no-op backend for testing and headless mode
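
A sketch of the queue with stale detection, under the assumption that each cue carries a monotonically increasing scene id and that a cue is stale once a newer scene has been submitted. `QueuedCue`, `RenderQueue.drain()`, and `null_render` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class QueuedCue:
    scene_id: int
    description: str


async def null_render(cue: QueuedCue) -> Optional[str]:
    """No-op backend for tests and headless runs."""
    return None


class RenderQueue:
    def __init__(self, render: Callable[[QueuedCue], Awaitable[Optional[str]]]) -> None:
        self._queue = asyncio.Queue()
        self._render = render
        self.current_scene = 0
        self.rendered = []

    def submit(self, cue: QueuedCue) -> None:
        """Enqueue without blocking the caller (the game loop)."""
        self.current_scene = max(self.current_scene, cue.scene_id)
        self._queue.put_nowait(cue)

    async def drain(self) -> None:
        """Render queued cues, skipping any from scenes already left behind."""
        while not self._queue.empty():
            cue = self._queue.get_nowait()
            if cue.scene_id < self.current_scene:
                continue  # stale: the scene has moved on
            self.rendered.append(await self._render(cue))
```

In a long-running server, `drain()` would more likely be a background task awaiting `queue.get()`; the polling form here keeps the example short and deterministic.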

Data Flow: One Turn

sequenceDiagram
    participant Player
    participant Server as GameServer
    participant Orchestrator
    participant IntentRouter
    participant Agent
    participant WorldState
    participant SceneInterpreter
    participant Renderer

    Player->>Server: WebSocket: "I search the old chest"
    Server->>Orchestrator: handle_player_input_streaming()
    Orchestrator->>Orchestrator: sanitize input, cancel speculative pre-gen
    Orchestrator->>IntentRouter: classify(input, game_state)
    IntentRouter-->>Orchestrator: Intent.EXPLORATION → narrator
    Orchestrator->>Orchestrator: build agent context (5-zone prompt + RAG lore)
    Orchestrator->>Agent: session.send_streaming(prompt)
    loop Streaming
        Agent-->>Server: NARRATION_CHUNK
        Server-->>Player: chunk displayed in real-time
    end
    par Background pipelines
        Orchestrator->>WorldState: generate JSON patch
        WorldState-->>Orchestrator: apply_patch(delta) + trope tick + auto-save
        Orchestrator->>SceneInterpreter: interpret(narrative, state)
        SceneInterpreter-->>Renderer: StageCue[] → render queue (Flux)
    end
    Server-->>Player: game view updates (character sheet, inventory, map)

For the full 42-step sequence with all five background pipelines, see turn-sequence-diagram.md.