Architecture & Conventions

Canonical reference for the copilot-session-knowledge architecture, data pipeline, and coding conventions.

This repo is a set of standalone Python CLI scripts — not a package or library. Each script is independently runnable and normally duplicates its own constants. Documented helper exceptions such as _tentacle_core.py, _tentacle_goal.py, _tentacle_pr.py, _tentacle_dispatch.py, and _tentacle_review.py are allowed only when they preserve an existing CLI/module contract and are listed in the script inventory. This keeps the default goal operator-first simplicity, not framework cohesion.

sk thin front door: sk.py is a thin dispatcher that maps memorable sub-commands (sk briefing, sk query, sk index build, …) to the underlying standalone scripts. It adds nothing to the data pipeline or business logic — it is purely a routing layer. After the standard install (install.py --test), a managed cross-platform sk launcher is provisioned automatically on your PATH — no manual alias or pip install needed. Windows PowerShell users without a PATH update can invoke python ~/.copilot/tools/sk.py directly as an equivalent fallback. All standalone scripts remain directly invocable as a fallback or for advanced use. The sk front door does not change the architectural contract below. When SK_HARNESS=1, dispatches are wrapped by the harness/ middleware package (pre/post hooks, timing telemetry, dry-run). See docs/HARNESS.md for API reference.

Data Pipeline

Session files (.md / .jsonl)
        │
        ▼
build-session-index.py  ──→  SQLite FTS5 (knowledge.db)
        │                             │
        ▼                             ▼
extract-knowledge.py  ──→  7 knowledge categories
        │                   (mistake, pattern, decision,
        │                    tool, feature, refactor, discovery)
        │                   + knowledge_relations
        │                     (SAME_SESSION†, SAME_TOPIC†, TAG_OVERLAP†,
        │                      RESOLVED_BY†, SEMANTIC_PROXIMITY*)
        ▼
query-session.py / briefing.py / mcp-server.py  ──→  Search, recall, MCP tools
        │
        ▼
watch-sessions.py  ──→  Incremental re-indexing (adaptive polling)

*SEMANTIC_PROXIMITY is populated by the native Rust TF-IDF cosine implementation (watch.rs/tfidf.rs). extract-knowledge.py --semantic-only is available for manual/fallback use only; sk watch (Rust binary) never auto-spawns Python.

†Deterministic relations (SAME_SESSION, SAME_TOPIC, TAG_OVERLAP, RESOLVED_BY) and residual helpers (backfill_affected_files, infer_task_ids, confidence decay) are extracted natively by the Rust binary (sk-rust/src/index/extract.rs). extract-knowledge.py is an intentional manual operator tool — NOT auto-called by sk watch in the default Rust build.

Phases:

  1. build-session-index.py — Phase 1 (session metadata) + Phase 2 (event content) via providers/ → SQLite FTS5 (schema v8; current migration level v17)
  2. extract-knowledge.py: classifies into 7 types, deduplicates by content hash; category-aware confidence floors (pattern=0.5, others=0.4); recurrence reward (+0.03 per upsert, capped). Intentional manual operator tool: sk watch (Rust binary) runs all hot-path classification, relation extraction, and semantic proximity natively; this script is named in sk watch recovery hints and is NOT auto-called on the native watch path.
  3. query-session.py / briefing.py / mcp-server.py — BM25 keyword search + optional semantic vector search (RRF blend) exposed via CLI and MCP
  4. watch-sessions.py / sk watch — adaptive polling (5 s / 30 s / 300 s tiers), auto re-indexes on file changes. sk watch Rust binary: handles session indexing, extract, relations, semantic proximity, and first-run DB bootstrap natively; never spawns Python. On DB or extract failure, emits a structured recovery hint naming the exact manual command.
  5. learn.py — manual knowledge entry; CLI interface for agents to record learnings during a session
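
The "RRF blend" in phase 3 refers to Reciprocal Rank Fusion. The sketch below shows the general technique only; it is not the code in query-session.py. The function name `rrf_blend`, the constant `k=60` (a commonly used default), and the sample IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_blend(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per ID, with rank starting at 1.
    k=60 is a conventional default, assumed here rather than taken from the repo.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = [11, 42, 7]      # hypothetical entry IDs from keyword search
vector_hits = [42, 99, 11]   # hypothetical entry IDs from semantic search
blended = rrf_blend([bm25_hits, vector_hits])
print(blended)
```

Entries that appear in both lists rise to the top, which is why the blend degrades gracefully to plain BM25 order when the semantic list is empty.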

Intentional Python Boundaries

sk watch (Rust binary) never spawns Python — including on error paths. The following Python surfaces are permanent intentional architecture, not residual code pending deletion:

| Python surface | Role | Status | Tested by |
|---|---|---|---|
| sk.py shim | Thin launcher/dispatcher for non-binary installs; routes all sk <cmd> calls | Intentional: the no-binary install contract | tests/test_py_rust_boundary.py::test_python_shim_dispatch_without_rust_binary; tests/test_py_rust_boundary.py::test_project_list_json_matches_python_shim_when_rust_available |
| hook_runner.py | Python hook runner for sk.py shim and non-binary installs; owns all managed hook events when no Rust binary is present | Intentional: Python shim hook entry point | tests/test_py_rust_boundary.py::test_hook_runner_empty_payload_matches_native_exit_when_rust_available |
| build-session-index.py | Indexes session files → FTS5 DB; named in sk watch DB-failure recovery hints | Intentional: manual operator recovery tool | tests/test_py_rust_boundary.py::test_watch_db_failure_recovery_hint_no_python_spawn_when_rust_available |
| extract-knowledge.py | Knowledge classification, relation extraction, --semantic-only fallback; named in sk watch extract-failure recovery hints | Intentional: manual/fallback operator tool | tests/test_py_rust_boundary.py::test_watch_db_failure_recovery_hint_no_python_spawn_when_rust_available |
| migrate.py | Versioned schema migrations via schema_version table | Intentional: canonical schema upgrade owner; Rust native bootstrap does NOT replace this | tests/test_py_rust_boundary.py::test_migrate_help_remains_manual_python_surface |
| sync-daemon.py | Push/pull sync runtime for Python sk.py shim and non-binary installs | Intentional: shim sync path | tests/test_py_rust_boundary.py::test_python_shim_sync_run_dispatches_sync_daemon_without_rust_binary |
| briefing.py, learn.py, query-session.py, project-registry.py, etc. | Admin/operator CLI scripts | Intentional: these are the primary Python CLI surface; the Rust binary may bypass Python for measured hot read-only subcommands such as sk project list, while the direct scripts and mutating fallbacks remain supported | tests/test_py_rust_boundary.py::test_python_shim_dispatch_without_rust_binary; tests/test_py_rust_boundary.py::test_project_list_json_matches_python_shim_when_rust_available |

What wave20 removed: auto-spawning Python subprocess on sk watch error paths. The scripts above remain on disk, are invoked by operators manually, and are referenced by name in sk watch recovery hints. None of these scripts are candidates for deletion as a consequence of wave20.

Native Rust Browse Importers

sk-rust/src/browse/importers/ contains a native Rust parity port for the debug-log importers documented in docs/DEBUG-LOG-CONTRACT.md: VS Code Agent Debug Log JSONL and OTel ReadableSpan JSONL. The port includes shared path-safety, bounded-line reading, SHA-1 synthetic span IDs, dedup hashing, and redaction helpers that mirror the production Python importers.

This is parser-only infrastructure. It has no Browse DB persistence and no CLI subcommand yet, so the Python modules remain the production callable importer surface. The Rust port is kept compiled and tested in sk-rust so a later operator command can wire it in without changing the data contract.

Python Lint Surface Inventory

This is the canonical inventory for Python Ruff coverage. Keep it in sync with .github/workflows/ci.yml, hooks/pre-commit, and tests/test_quality_gates.py.

| Surface | CI Ruff lint/format | Local pre-commit Ruff | Notes |
|---|---|---|---|
| Root standalone scripts | *.py | Any staged root .py path via in_python_cleanliness_surface() | Every root-level Python entry point is inside the blocking Ruff baseline. |
| Package/module directories | browse/, hooks/, scripts/ | Any staged .py path under browse/, hooks/, or scripts/ | Directory coverage applies to package-like surfaces where Ruff cleanup is already baselined. |
| Syntax gate | python3 scripts/check_syntax.py | All staged .py files via scripts/check_syntax.py when installed | Syntax coverage is broader than Ruff coverage. |
| Complexity advisory | Full suite path through run_all_tests.py; local hook runs scripts/check_complexity.py --json on staged .py snapshots | All staged .py files, non-blocking and fail-open | Advisory only; it prints findings but does not deny commits. |
| Ruff complexity/refactor advisory | Complexity advisory (Ruff C90/PLR) runs ruff check --select C90,PLR0911,PLR0912,PLR0913,PLR0915 --statistics on the same Ruff surface | Not run by local pre-commit | CI advisory only; continue-on-error: true records baseline counts before enforcement. |
| Out-of-surface Ruff advisory | Not run by CI | Staged .py files outside root scripts and covered package directories run ruff check as a non-blocking [advisory] scan | Advisory only; it is fail-open when Ruff is absent and does not change the blocking Ruff surface. |

Root *.py files are now inside the blocking Ruff lint surface by default. New root scripts still must honor the standalone-script architecture contract: keep them self-contained and run syntax/tests for the touched behavior. Python files outside root scripts, browse/, hooks/, and scripts/ are outside CI Ruff coverage; the local pre-commit hook may print non-blocking [advisory] Ruff findings for those files when Ruff is installed.

Script Inventory

Script Role
build-session-index.py Indexes session files → FTS5 DB
extract-knowledge.py Classifies + deduplicates knowledge entries; category-aware confidence floors; recurrence reward
query-session.py FTS5 + semantic search; JSON/markdown export
briefing.py Task-scoped recall; context packs for agent injection
mcp-server.py Read-only MCP stdio JSON-RPC surface for briefing and query_session
watch-sessions.py File watcher; triggers incremental re-indexing
learn.py Manual knowledge entry
tentacle.py Multi-agent orchestration (create → todo → bundle → swarm → complete) + orchestrator goal loop (goal init/validate/status/dispatch/link/eval/resume/criteria/gate/budget/next-iter/verify-loop/coverage/context). goal.json keeps both the legacy flat tentacles list and a structured per-iteration iterations map so status/history queries can answer which tentacles belonged to each iteration without breaking older readers. Goal writes are serialized with a goal.json.lock sidecar (`O_CREAT
_tentacle_core.py Low-level tentacle.py helper module for core path constants, file-locking helpers, git-root discovery, tentacle directory resolution, Windows filesystem retries, PID liveness checks, and todo parsing/rendering. tentacle.py re-exports these symbols to preserve the long-standing CLI/module contract while later extraction waves split higher-level seams.
_tentacle_goal.py Extracted tentacle.py goal subsystem: goal state locking, iteration/budget helpers, goal lifecycle commands, dispatch planning, continuation context, verify-loop, coverage, loop, and resilience-status behavior. tentacle.py injects the remaining orchestration-owned helpers at import time and re-exports goal symbols so existing CLI/module callers keep working.
_tentacle_pr.py Extracted tentacle.py PR automation seam for collecting tentacle handoffs/verifications, generating commit messages and PR bodies, and implementing sk tentacle pr. tentacle.py injects patchable runtime wrappers and re-exports PR symbols so tests and module callers that patch tentacle._pr_run_subprocess_safe keep working.
_tentacle_dispatch.py Extracted tentacle.py dispatch/profile/recall seam for runtime bundle materialization, live briefing recall packs, specialist agent profiles, prompt sizing, resume, swarm/dispatch, next-step, and bundle. tentacle.py injects patchable runtime wrappers and re-exports dispatch symbols so existing tests/callers that patch tentacle.* keep working.
_tentacle_review.py Extracted tentacle.py fresh-context reviewer and review-loop seam for reviewer bundles/findings, review-loop verification classification, blocker-resolver tentacle creation, dispatch-reviewer, and review-loop. tentacle.py injects patchable runtime wrappers and re-exports review symbols so existing tests/callers that patch tentacle.* keep working.
embed.py Optional semantic search via embedding APIs (OpenAI, Fireworks, etc.) with TF-IDF fallback
claude-adapter.py Parses Claude Code JSONL sessions into the common DB format
sync-knowledge.py Merges knowledge.db files across environments (Windows ↔ WSL); MAX confidence semantics
sync-config.py Single connection_string config; --setup, --setup-env, --status --json
sync-daemon.py Local-first push/pull runtime; backlog-aware adaptive limits and automatic sync queue compaction
sync-status.py Local sync diagnostics; --health-check, --audit, --json
auto-update-tools.py Smart git-diff–based update pipeline; sk-update alias
migrate.py Versioned schema migrations via schema_version table
install.py Deploy skills/hooks; inject global AI instructions
setup-project.py Full project onboarding: skills + hooks + WORKFLOW.md
project-registry.py sk project add/remove/list — manage the persistent project registry (tools-managed-projects.json); Rust sk project list is a measured native read-only hot path, while add/remove and direct-script use stay on this Python owner
host_manifest.py Single source of truth for supported hosts + their filesystem paths
index-status.py Row counts, FTS integrity, event-offset coverage
knowledge-health.py Knowledge base health + recall telemetry
error-analysis.py On-demand error pattern analysis: type distribution, severity, recurrence, root causes
benchmark.py Commit-keyed benchmark ledger for retro + health snapshots
checkpoint-save.py Save named checkpoint
checkpoint-restore.py List/restore checkpoints
checkpoint-diff.py Diff two checkpoints
browse.py Local web UI (127.0.0.1, token auth) with read-only diagnostics plus the authenticated /chat operator console
project-context.py Deterministic project-context.md generator
codebase-map.py Repo structure snapshot (auto-refreshed at session start)
trend-scout.py GitHub repo discovery via multi-lane search
copilot-cli-healer.py Repairs stale Copilot CLI package state

Browse UI Operator Console (/chat)

The browse UI exposes a browser-managed Copilot CLI execution console at /chat. It is the only browse surface that actively launches Copilot CLI; the rest of the UI remains read-only diagnostics and search.

Route prefix note: The Python browse server (browse/core/server.py) and the Firebase-hosted deployment now both serve the Next.js app at root (/*, e.g. /chat, /settings). Compatibility redirects from /v2/* to /* remain for old bookmarks and deep links.

Components

Component Role
browse/core/operator_console.py Secure execution/persistence adapter. Starts Copilot CLI runs, normalizes event streams, and persists operator state under ~/.copilot/session-state/operator-console/.
browse/api/operator.py Authenticated REST + SSE surface for session CRUD, prompt submission, run status/history, path suggestions, previews, and diffs.
browse-ui/src/app/chat/ Next.js route wrapper for the /chat operator console.
browse-ui/src/components/chat/ ChatShell, Transcript, Composer, SessionCreateDialog, MetadataBar, file review components, and CLI session adoption UX (CliSessionPicker, CliAdoptedBadge, ConfirmAdoptionPanel).
browse-ui/src/components/chat/cli-session-picker.tsx Lists real CLI sessions and drives the adopt/confirm flow; composer is disabled until confirmed_at is set.
browse-ui/src/lib/api/{types,schemas,hooks}.ts Stable frontend contract layer for /api/operator/*.

Browse-wide host state

All pages share a single host context. The two source-of-truth files are:

File Role
browse-ui/src/providers/host-provider.tsx HostProvider React context — mounted at the root layout; exposes { host, diagnosticsEnabled } to every page via useHostState(). Listens to cross-tab storage events and same-tab BROWSE_HOST_CHANGE_EVENT to stay current without a reload.
browse-ui/src/lib/host-profiles.ts localStorage persistence helpers (saveHostProfile, deleteHostProfile, setSelectedHostId, getEffectiveHost, etc.) and the immutable LOCAL_HOST sentinel. All mutating helpers dispatch BROWSE_HOST_CHANGE_EVENT after writing so HostProvider re-evaluates immediately.

Active host resolution order (documented in getEffectiveHost()):

  1. Explicit selection (browse_selected_host_id in localStorage), if the referenced profile still exists.
  2. First remote profile with is_default === true.
  3. LOCAL_HOST sentinel — same-origin, no bearer token required.
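
The resolution order above can be sketched as a three-step fallback. The real getEffectiveHost() is TypeScript and reads localStorage; this Python version only illustrates the ordering. The profile dict shape, the `effective_host` name, and the `LOCAL_HOST` stand-in value are assumptions.

```python
# Stand-in for the immutable LOCAL_HOST sentinel (shape assumed for illustration).
LOCAL_HOST = {"id": "local", "is_default": False}


def effective_host(profiles, selected_id):
    """Resolve the active host using the documented three-step order."""
    by_id = {p["id"]: p for p in profiles}
    # 1. Explicit selection wins, but only if the profile still exists.
    if selected_id is not None and selected_id in by_id:
        return by_id[selected_id]
    # 2. Otherwise the first profile flagged is_default === true.
    for profile in profiles:
        if profile.get("is_default") is True:
            return profile
    # 3. Fall back to the same-origin sentinel.
    return LOCAL_HOST


profiles = [
    {"id": "laptop", "is_default": False},
    {"id": "workstation", "is_default": True},
]
print(effective_host(profiles, "laptop")["id"])
print(effective_host(profiles, "deleted-profile")["id"])
print(effective_host([], None)["id"])
```

A stale selection (step 1 pointing at a deleted profile) falls through to the default instead of failing, which matches the "if the referenced profile still exists" condition.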

The header renders a compact AWS-region-style global host dropdown that calls setSelectedHostId() on selection and links to /settings#hosts for management. The Settings page exposes the full HostManagement surface (list, add, remove, set-default, restore-local). SessionCreateDialog at /chat reads useHostState() to pre-populate the host picker when the dialog opens.

API surface (/api/operator/*)

POST /api/operator/sessions                  → create session
GET  /api/operator/sessions                  → list sessions
GET  /api/operator/sessions/{id}             → session detail
PATCH /api/operator/sessions/{id}            → update session mutable fields
POST /api/operator/sessions/{id}/prompt      → submit prompt → {run_id}
GET  /api/operator/sessions/{id}/stream      → SSE run output
GET  /api/operator/sessions/{id}/status      → session + active run status (?run=<run_id>)
GET  /api/operator/sessions/{id}/runs        → persisted run history
POST /api/operator/sessions/{id}/delete      → delete session
POST /api/operator/sessions/adopt            → adopt CLI session → operator session (201|200|409)
POST /api/operator/sessions/{id}/confirm     → confirm adopted session → enable resume
GET  /api/operator/suggest                   → path/workspace suggestions under ~/
GET  /api/operator/preview                   → file preview under ~/
GET  /api/operator/diff                      → unified diff for two files under ~/
GET  /api/operator/browsers                  → read-only scan of allowlisted local browser candidates
GET  /api/operator/cli-sessions              → list real CLI sessions (Bearer/cookie; debug=True; read-only)
GET  /api/operator/cli-sessions/{id}         → single CLI session by UUID (Bearer/cookie; debug=True; read-only)
GET  /api/session/{id}/debug-log             → paginated session-scoped debug log; preserves CLI event hierarchy via span_id/parent_span_id (CLI id/parentId → SHA-1 16-hex), derives duration_ms for paired start/completion events; limit 100 (Bearer/cookie; debug=True; read-only)
GET  /api/sessions/{id}/debug-log            → plural alias for the route above
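
The debug-log route maps CLI id/parentId values to SHA-1 based 16-hex span IDs and derives duration_ms for paired events. A minimal sketch of that idea follows. It assumes the ID is the first 16 hex characters of the SHA-1 digest of the UTF-8 encoded CLI id and that timestamps are float seconds; neither detail is confirmed by this document, and both function names are hypothetical.

```python
import hashlib


def synthetic_span_id(cli_id):
    """Derive a stable 16-hex span ID from a CLI event id (truncation is an assumption)."""
    return hashlib.sha1(cli_id.encode("utf-8")).hexdigest()[:16]


def duration_ms(start_ts, end_ts):
    """Milliseconds between a paired start and completion event, never negative."""
    return max(0, int(round((end_ts - start_ts) * 1000)))


span = synthetic_span_id("evt-123")      # hypothetical CLI event id
parent = synthetic_span_id("evt-100")    # hypothetical parent event id
print(span, parent, duration_ms(1.0, 1.25))
```

Because the mapping is a pure hash, the same CLI id always yields the same span_id, so parent/child links survive pagination.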

Guardrails

  • Workspace confinement: every workspace or file path is normalized with confine_path() and rejected unless it resolves under Path.home().
  • Token auth: all /api/operator/* routes require the same per-launch browse token as the rest of the UI.
  • Prompt cap: prompts longer than 4096 characters are rejected.
  • Path cap: oversized path inputs are rejected before filesystem access.
  • Separate persistence: operator run history is stored under ~/.copilot/session-state/operator-console/ and replayed from disk on reload.
  • Same Copilot policy surface: operator-console runs still inherit the installed Copilot CLI's hooks, custom instructions, and permission system. Browser mediation does not bypass briefing, tentacle, or other active policy gates.
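
The workspace-confinement and path-cap guardrails can be illustrated with pathlib. This is a sketch, not the actual confine_path() in the browse package, whose signature is not shown here. The name `confine_to_home` is hypothetical, and the 256-character cap is borrowed from the Input Limits section.

```python
from pathlib import Path

MAX_PATH_CHARS = 256  # from the Input Limits section


def confine_to_home(raw):
    """Resolve a user-supplied path and reject it unless it stays under Path.home()."""
    # Cap the input before touching the filesystem.
    if len(raw) > MAX_PATH_CHARS:
        raise ValueError("path input too long")
    home = Path.home().resolve()
    candidate = Path(raw).expanduser()
    if not candidate.is_absolute():
        candidate = home / candidate
    # resolve() collapses ".." segments and follows symlinks.
    resolved = candidate.resolve()
    if resolved != home and home not in resolved.parents:
        raise ValueError("path escapes the home directory")
    return resolved


print(confine_to_home("~/projects/demo"))
try:
    confine_to_home("~/../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

Checking the resolved path, not the raw string, is what stops `..` segments and symlinks from escaping the home directory.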

CLI Session Adoption / Two-ID Model

The operator console supports resuming an existing Copilot CLI session from the browser via the From CLI history picker. Two distinct identifiers are always in play and must never be aliased:

| ID | Where it lives | Purpose |
|---|---|---|
| Operator session ID | Browse/backend route segment; operator-console/<id>/ on disk | Identifies the operator-side session; used in every /api/operator/sessions/<id>/* URL |
| CLI session UUID | Backend field resume_target (UUID4); never surfaced as a route key | Identifies the real Copilot CLI session; passed to copilot as --resume=<cli_uuid> |

Lifecycle: discover -> adopt -> confirm -> prompt/resume

  1. discover: GET /api/operator/cli-sessions reads real CLI session artifacts under ~/.copilot/session-state/ (read-only, path-confined, Bearer/cookie auth, debug=True).
  2. adopt: POST /api/operator/sessions/adopt creates an operator session with source="cli_adopt", stores the CLI UUID in resume_target, and returns confirmed_at=None. The returned id is the new operator session ID.
  3. confirm: POST /api/operator/sessions/<operator_id>/confirm sets confirmed_at and resume_ready=True. Prompts are blocked until this step completes.
  4. prompt/resume: POST /api/operator/sessions/<operator_id>/prompt launches the CLI subprocess with --resume=<cli_uuid> (from resume_target). --name is never passed for adopted sessions. The operator session ID never appears in the CLI argv.

Guardrails specific to adoption

  • resume_target is validated as UUID4 at adopt time; malformed values are rejected.
  • The confirmation gate in operator_console.py · start_run() returns None for any session where confirmed_at is not set; the UI disables the composer until confirmation.
  • CLI tree discovery is read-only and path-confined; file hashes and workspace.yaml mtime are unchanged after discovery (tests/test_browse_chat_resume.py CR9).
  • Child subprocess env is filtered by _ENV_ALLOWLIST; test state env vars do not leak (CR8).
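
The UUID4 check on resume_target can be sketched with the standard uuid module. The helper name `is_uuid4` is hypothetical, and the repo's own validator may differ. The sketch parses first and inspects the version afterwards, because passing `version=4` to `uuid.UUID()` overwrites the version bits instead of validating them.

```python
import uuid


def is_uuid4(value):
    """Return True only for a canonical, hyphenated UUID version 4 string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # Require version 4 and the canonical textual form (rejects braces, urn: prefixes, etc.).
    return parsed.version == 4 and str(parsed) == value.lower()


print(is_uuid4(str(uuid.uuid4())))
print(is_uuid4("not-a-uuid"))
print(is_uuid4(str(uuid.uuid1())))
```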

Stale / missing CLI session recovery

If the CLI session referenced by resume_target no longer exists on disk:

  • The GET /api/operator/cli-sessions/{uuid} probe returns 404.
  • CliSessionPicker (browse-ui) shows a warning badge on stale entries.
  • POST /api/operator/sessions/{id}/confirm calls get_cli_session_by_id and returns CLI_SESSION_NOT_FOUND / 404 when the CLI UUID is absent; there is no supported confirm-with-replacement-workspace fallback. Delete the unconfirmed operator session and adopt a different CLI session or start a fresh operator chat.
  • Deleting an operator session never touches the CLI session tree.

Duplicate adoption

POST /api/operator/sessions/adopt for a CLI UUID that has already been adopted returns one of two responses depending on confirmation state:

  • HTTP 200 — duplicate exists but is still unconfirmed; response body includes the existing operator session object (idempotent re-adopt).
  • HTTP 409 (ALREADY_ADOPTED) — duplicate is already confirmed; response is error-only with no session object. To find the existing session, use GET /api/operator/sessions.
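
The three adopt outcomes (201 for a new adoption, 200 for an unconfirmed duplicate, 409 for a confirmed duplicate) reduce to a small decision function. This sketch uses an in-memory list instead of the on-disk store. The `adopt` name, the operator ID format, and the error body shape are assumptions.

```python
import uuid


def adopt(sessions, cli_uuid):
    """Return (http_status, body) for an adopt request against an in-memory session list."""
    existing = next((s for s in sessions if s.get("resume_target") == cli_uuid), None)
    if existing is None:
        created = {
            "id": uuid.uuid4().hex,      # hypothetical operator session ID format
            "source": "cli_adopt",
            "resume_target": cli_uuid,
            "confirmed_at": None,
        }
        sessions.append(created)
        return 201, created
    if existing["confirmed_at"] is None:
        # Idempotent re-adopt: hand back the same unconfirmed session.
        return 200, existing
    # Confirmed duplicate: error-only body, no session object.
    return 409, {"error": "ALREADY_ADOPTED"}


store = []
cli_id = str(uuid.uuid4())
status_new, session = adopt(store, cli_id)
status_dup, same = adopt(store, cli_id)
session["confirmed_at"] = "2025-01-01T00:00:00+00:00"
status_conflict, body = adopt(store, cli_id)
print(status_new, status_dup, status_conflict)
```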

Components added by CLI adoption UX

Component Role
browse-ui/src/components/chat/cli-session-picker.tsx Picker that lists real CLI sessions from /api/operator/cli-sessions and triggers the adopt flow
CliAdoptedBadge (in cli-session-picker.tsx) Badge shown when a session was adopted from CLI history
ConfirmAdoptionPanel (in cli-session-picker.tsx / chat-shell.tsx) Workspace/add_dirs confirmation step before the composer is enabled

Operator runbook for the adopt/confirm flow: docs/OPERATOR-PLAYBOOK.md — Chat Resume / CLI Session Adoption

Compatibility with watch-sessions and auto-update

  • watch-sessions.py still tracks normal Copilot session artifacts under ~/.copilot/session-state/; the operator console reads its own persisted run history directly from operator-console/.
  • auto-update-tools.py can restart watch-sessions.py, but it does not restart the browse server or interfere with active operator runs.
  • UI-only browse-ui/src/ updates require regenerating the local browse-ui/dist/ artifact before the running browse server can serve them; Python changes to browse/api/operator.py or browse/core/operator_console.py still require a manual browse server restart.

Remote access

Two deployment modes are supported:

Mode 1 — Same-origin (Cloudflare Tunnel, default): The Python browse server serves both the static UI and the API behind a single Cloudflare Tunnel URL (e.g. browse.example.com). All API calls are same-origin; no CORS configuration needed. Key code-level constraint: browse/core/auth.py · check_origin() builds the expected CSRF origin as http://{Host}. Behind HTTPS the browser sends Origin: https://… — these do not match, causing POST mutations to return 403. Fix check_origin to accept https:// (check X-Forwarded-Proto) before enabling full remote operator console access.
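
One possible shape for the check_origin fix described above is sketched here. It is not the code in browse/core/auth.py. The function names, the plain-dict header access, and the choice to reject a missing Origin are assumptions. Only honor X-Forwarded-Proto when the server is reachable solely through a trusted proxy such as the tunnel.

```python
def expected_origin(headers):
    """Build the origin the browser should send, honoring X-Forwarded-Proto."""
    host = headers.get("Host", "")
    # Proxies may append values; take the first one and normalize it.
    scheme = headers.get("X-Forwarded-Proto", "http").split(",")[0].strip().lower()
    if scheme not in ("http", "https"):
        scheme = "http"
    return f"{scheme}://{host}"


def origin_allowed(headers):
    """Accept a mutation only when Origin matches the expected same-origin value."""
    origin = headers.get("Origin")
    if not origin:
        return False  # assumption: treat a missing Origin as a rejection
    return origin.rstrip("/") == expected_origin(headers)


behind_tunnel = {
    "Host": "browse.example.com",
    "X-Forwarded-Proto": "https",
    "Origin": "https://browse.example.com",
}
print(origin_allowed(behind_tunnel))
```

Without the forwarded scheme, the same request would be compared against http://browse.example.com and rejected, which is the 403 described above.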

Mode 2 — Firebase Hosting control plane: The static browse-ui is deployed to Firebase Hosting (operator's chosen custom domain). The operator's browse.py server is reached via a Cloudflare Tunnel. All API calls from the Firebase-hosted UI to the tunnel are cross-origin; the operator host exposes an explicit CORS allowlist, Bearer auth, and a GET /api/operator/capabilities endpoint. Host profiles let the UI target different operator machines. See docs/OPERATOR-PLAYBOOK.md — Firebase-hosted control plane for the full topology and manual console steps.

Full remote-access setup, Cloudflare Access guidance, and Firebase topology: docs/OPERATOR-PLAYBOOK.md

Hosted shell specs (localhost bootstrap, relay, version negotiation, token storage, stream reconnect, first-run UX): see docs/HOSTED-SHELL-ARCHITECTURE.md and docs/HOSTED-SHELL-RESEARCH.md.

Enforcement Hooks

Hooks live in hooks/ and are deployed to ~/.copilot/hooks/ (Copilot CLI only).

  • Unified runner: hook_runner.py dispatches all hook events (1 Python process per event)
  • Supported events: sessionStart, sessionEnd, preToolUse, postToolUse, agentStop, subagentStop, errorOccurred
  • Fail-open: rule errors never block the agent
  • HMAC-signed markers: tamper-resistant counter state
  • Audit log: ~/.copilot/markers/audit.jsonl
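
The HMAC-signed marker idea can be sketched with the standard hmac module. The marker layout, key handling, and function names are assumptions; docs/HOOKS.md describes the real format. The point is that a counter edited on disk no longer matches its signature.

```python
import hashlib
import hmac
import json


def _canonical(state):
    # Stable byte encoding so the same state always signs identically.
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_marker(state, key):
    """Wrap counter state with an HMAC-SHA256 signature."""
    sig = hmac.new(key, _canonical(state), hashlib.sha256).hexdigest()
    return {"state": state, "sig": sig}


def verify_marker(marker, key):
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(marker.get("state")), hashlib.sha256).hexdigest()
    supplied = str(marker.get("sig", ""))
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))


key = b"per-install-secret"            # hypothetical key material
marker = sign_marker({"edits": 3}, key)
print(verify_marker(marker, key))
marker["state"]["edits"] = 0           # simulated tampering
print(verify_marker(marker, key))
```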

Full hook architecture, rule inventory, and dispatched-subagent git guard: docs/HOOKS.md

Central Database

~/.copilot/session-state/knowledge.db — SQLite with FTS5, WAL journal mode, and optional vector embeddings.

Schema versions: v1–v6 (legacy) → v7 (two-phase indexing + event_offsets) → v8 (sessions_fts contentless FTS5 + BM25) → v9–v14 (eval, provenance, recall, sync, benchmark) → v15 (confidence_backfill_wave3: raises pattern confidence floor to 0.5 and applies recurrence reward to existing entries) → v16 (error lifecycle: error_type, root_cause, severity, is_resolved, fix_steps, prevention_hook, recurrence_after_briefing on knowledge_entries) → v17 (briefing_deliveries table for tracking which entries were briefed to each session). Run python3 ~/.copilot/tools/migrate.py to upgrade.
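
The confidence rules mentioned for v15 (category floors plus a recurrence reward) can be written out as arithmetic. The floor values and the +0.03 reward come from this document. The cap of 1.0, the order of operations, and the function names are assumptions.

```python
CONFIDENCE_FLOORS = {"pattern": 0.5}   # pattern=0.5 per the pipeline notes
DEFAULT_FLOOR = 0.4                    # all other categories
RECURRENCE_REWARD = 0.03               # added per upsert
CONFIDENCE_CAP = 1.0                   # assumption: the text only says "capped"


def floored(category, confidence):
    """Raise a confidence value to its category floor if it is below it."""
    return max(confidence, CONFIDENCE_FLOORS.get(category, DEFAULT_FLOOR))


def after_upsert(category, confidence):
    """Apply the floor, then the recurrence reward, then the cap."""
    return min(CONFIDENCE_CAP, round(floored(category, confidence) + RECURRENCE_REWARD, 4))


print(floored("pattern", 0.2))
print(after_upsert("mistake", 0.2))
print(after_upsert("tool", 0.99))
```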

providers/ Package

SessionProvider ABC defines iter_sessions() and iter_events_with_offset().

  • CopilotProvider — handles .md session checkpoints
  • ClaudeProvider — handles JSONL with real byte-offset seeks for Phase 2
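
A minimal sketch of the provider contract follows. Only the two method names come from this document. The argument lists, return types, and the `JsonlDemoProvider` class are hypothetical. The demo shows how byte offsets allow a Phase 2 reader to resume mid-file with a seek.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator, Tuple


class SessionProvider(ABC):
    """Contract sketch: enumerate sessions, then stream events with byte offsets."""

    @abstractmethod
    def iter_sessions(self) -> Iterator[Path]:
        ...

    @abstractmethod
    def iter_events_with_offset(self, session: Path, start: int = 0) -> Iterator[Tuple[int, str]]:
        ...


class JsonlDemoProvider(SessionProvider):
    """Hypothetical provider that treats every *.jsonl file in a directory as a session."""

    def __init__(self, root: Path):
        self.root = root

    def iter_sessions(self):
        yield from sorted(self.root.glob("*.jsonl"))

    def iter_events_with_offset(self, session, start=0):
        with session.open("rb") as fh:
            fh.seek(start)  # real byte-offset seek: skip already-indexed events
            while True:
                offset = fh.tell()
                line = fh.readline()
                if not line:
                    break
                yield offset, line.decode("utf-8", errors="replace").rstrip("\n")
```

Storing the last yielded offset lets an incremental indexer call `iter_events_with_offset(session, start=offset)` on the next pass instead of re-reading the whole file.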

Tentacle Workspace

.octogent/ stores local tentacle state and is gitignored in this repo.
Runtime-bundle workflow: create → todo add → bundle (optional) → swarm → complete.

complete accepts an optional --auto-verify <cmd> flag (fail-open): runs the command, persists the result as verification evidence before closing. Use --auto-verify-timeout <seconds> (default: 120 s) if the command is long-running.

Sub-agents must write a structured handoff before stopping:

tentacle.py handoff <name> "<summary>" --status DONE --changed-file <path> [--changed-file ...] --learn

--status must be one of DONE, BLOCKED, TOO_BIG, AMBIGUOUS, or REGRESSED. Include one --changed-file receipt per modified file so the orchestrator can verify the handoff trail.

Quota-blocked handoffs — when a tentacle is BLOCKED due to a quota or rate-limit signal, add machine-readable metadata:

tentacle.py handoff <name> "<summary>" --status BLOCKED \
  --quota-reason rate_limit \
  --retry-hint 2026-05-14T00:00:00Z

--quota-reason is a short token (rate_limit, quota_exceeded, daily_quota, monthly_quota, token_quota, context_limit). --retry-hint is an optional ISO timestamp or human-readable hint. cmd_complete persists these fields into meta.json["quota_reason"] / meta.json["retry_hint"] and appends an entry to goal.json["quota_retry_queue"] for orchestrator tracking. Old BLOCKED handoffs without quota metadata are fully backward compatible.

_classify_quota_signal(text) is available to classify raw dispatch output into a quota_reason token. The pattern list is intentionally minimal pending the fuller failure-mode matrix (#183).
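
To illustrate what such a classifier might look like, here is a sketch that maps raw output to one of the documented quota_reason tokens. The regular expressions are invented for this example and are not the repo's pattern list. The function name omits the leading underscore to mark it as a stand-in.

```python
import re

# Ordered (token, pattern) pairs; the first match wins. Patterns are illustrative only.
_QUOTA_PATTERNS = [
    ("rate_limit", re.compile(r"rate.?limit|too many requests", re.IGNORECASE)),
    ("context_limit", re.compile(r"context (?:length|window)", re.IGNORECASE)),
    ("daily_quota", re.compile(r"daily quota", re.IGNORECASE)),
    ("monthly_quota", re.compile(r"monthly quota", re.IGNORECASE)),
    ("quota_exceeded", re.compile(r"quota", re.IGNORECASE)),
]


def classify_quota_signal(text):
    """Return a quota_reason token for the text, or None when nothing matches."""
    for token, pattern in _QUOTA_PATTERNS:
        if pattern.search(text):
            return token
    return None


print(classify_quota_signal("HTTP error: Too Many Requests"))
print(classify_quota_signal("You have exceeded your monthly quota"))
print(classify_quota_signal("build finished successfully"))
```

Putting the specific patterns ahead of the generic `quota` match keeps "monthly quota" from being reported as plain quota_exceeded.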

tentacle.py marker-cleanup (dry-run by default, --apply to act) inspects and removes stale entries from the dispatched-subagent marker without completing a tentacle. Only entries whose per-entry timestamp exceeds the declared TTL are eligible; live entries are never touched.

Bundle artifacts

Each bundle directory contains a manifest.json listing all artifacts. The goal_context artifact is optional — it is only present when the tentacle is linked to a goal:

| Artifact key | File | When present |
|---|---|---|
| briefing | briefing.md | Always (placeholder when empty) |
| instructions | instructions.md | Always |
| session_metadata | session-metadata.md | Always |
| recall_pack | recall-pack.json | Always (empty when no matches) |
| goal_context | goal-context.md | Only when tentacle is linked to a goal |

The goal_context artifact is the output of _goal_render_continuation_context: a compact markdown block with objective, iteration counter, budget limits, criteria progress, remaining criteria IDs + descriptions, and the last N prior handoff summaries. Sub-agents must read goal-context.md (when manifest.json lists it as populated: true) to understand the overarching goal before making changes.

Python-only implementation: _goal_render_continuation_context, _goal_write_context_artifact, and _cmd_goal_context live entirely in _tentacle_goal.py (Python) and are re-exported from tentacle.py. The Rust sk binary routes sk tentacle goal context … to tentacle.py via its standard pass-through mechanism, so no Rust changes are required when adding or modifying goal context behavior. The bundle injection path (_build_runtime_bundle) similarly calls the Python renderer directly; the Rust binary never constructs the goal-context.md content itself.

Full tentacle workflow reference: docs/USAGE.md

Host Scope

Tools are validated on Copilot CLI and Claude Code only.

| Feature | Copilot CLI | Claude Code |
|---|---|---|
| Skill deployment | .github/skills/ | .claude/skills/ |
| Hook deployment | .copilot/hooks/ | ❌ not supported |
| Global instruction injection | ~/.github/copilot-instructions.md | via CLAUDE.md |
| Session indexing | ✅ | via claude-adapter.py |

host_manifest.py is the single authoritative source for supported hosts and their filesystem paths. Do not add Codex, Cursor, or other hosts without documented session and hook formats.

Full skill deployment and host scope details: docs/SKILLS.md

Sync Architecture

Sync is local-first: knowledge.db is the authoritative read/query source; remote sync is optional transport/storage only.

  • Single config key: connection_string in ~/.copilot/tools/sync-config.json
  • sync-config.py --setup accepts HTTP(S) gateway URLs only (not raw Postgres/libSQL DSNs)
  • sync-gateway.py is reference/mock only — not a production authority
  • Default provider recommendation: Neon (backing Postgres) + Railway (thin gateway host)
  • Missing connection_string → daemon stays local-only/idle (not a fatal error)

Full sync diagnostics reference: docs/USAGE.md


Conventions

These conventions apply to all scripts in this repo. Follow them in every change.

Language & Runtime

  • Pure stdlib Python 3.10+ — zero pip dependencies required. scikit-learn and embedding API keys are optional.
  • Every script is standalone — no shared library or package imports between scripts. Each script duplicates its own DB path constants, encoding fix, etc.
  • Windows encoding fix — every script starts with the same os.name == "nt" block to reconfigure stdout/stderr to UTF-8. Preserve this pattern in every new script.
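
For readers who have not seen the encoding fix, a sketch of the general shape is below. Copy the exact block from an existing script when adding a new one; this version wraps the logic in a hypothetical function so it can be exercised on any platform, and `errors="replace"` is an assumption.

```python
import os
import sys


def fix_windows_console(os_name=os.name):
    """Reconfigure stdout/stderr to UTF-8 on Windows; a no-op elsewhere."""
    if os_name != "nt":
        return False
    for stream in (sys.stdout, sys.stderr):
        # Streams replaced by test harnesses may lack reconfigure(), so probe first.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")
    return True


fix_windows_console()
print("→ arrows and other non-ASCII output are safe to print after the fix")
```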

SQL Safety

  • Parameterized SQL only — all user input uses ? placeholders. Never interpolate strings into SQL.
  • FTS5 query sanitization — strip FTS5 operators (OR, AND, NOT, NEAR, *, ") before passing to MATCH. See _sanitize_fts_query() in query-session.py.
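
Both SQL rules can be shown in a few lines. The sanitizer below is a sketch of the idea and is not a copy of _sanitize_fts_query(); the name `sanitize_fts_query` and the regular expression are assumptions, while the 500-character cap comes from Input Limits. The sqlite3 part uses an ordinary table so it runs even where FTS5 is not compiled in.

```python
import re
import sqlite3

MAX_FTS_QUERY_CHARS = 500
# FTS5 boolean/proximity operators are uppercase keywords.
_FTS_OPERATORS = re.compile(r"\b(?:OR|AND|NOT|NEAR)\b")


def sanitize_fts_query(raw):
    """Strip FTS5 operators and special characters, then collapse whitespace."""
    text = raw[:MAX_FTS_QUERY_CHARS]
    text = text.replace("*", " ").replace('"', " ")
    text = _FTS_OPERATORS.sub(" ", text)
    return " ".join(text.split())


print(sanitize_fts_query('foo OR "bar*" NOT baz'))

# Parameterized SQL: the value travels separately from the statement text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (title TEXT)")
conn.execute("INSERT INTO notes (title) VALUES (?)", ("it's fine; DROP TABLE notes",))
rows = conn.execute(
    "SELECT title FROM notes WHERE title = ?", ("it's fine; DROP TABLE notes",)
).fetchall()
print(rows)
```

In the same style, a sanitized query would be bound as the parameter of a `MATCH ?` clause, never concatenated into the statement.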

Serialization & Locking

  • JSON serialization only — never use pickle. Legacy pickle detection exists but new code must use JSON / struct.pack.
  • Atomic lock files — use O_CREAT | O_EXCL for process locks (no TOCTOU races).
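
A minimal sketch of the atomic lock-file pattern follows. The acquire_lock and release_lock names and the choice to write the PID into the file are assumptions; the relevant part is that O_CREAT | O_EXCL makes creation fail if the file already exists, so there is no separate check-then-create step to race on.

```python
import os
from pathlib import Path


def acquire_lock(lock_path: Path) -> bool:
    """Try to create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        # Recording the owner PID is an illustrative choice, not a documented contract.
        handle.write(str(os.getpid()))
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file; a missing file is not an error."""
    lock_path.unlink(missing_ok=True)
```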

Input Limits

  • Title ≤ 200 chars
  • Content ≤ 10 K chars
  • FTS queries ≤ 500 chars
  • Paths ≤ 256 chars
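
One way to keep these limits in a single place is sketched below. The constant names and the check_limit helper are hypothetical, the sketch assumes "10 K" means 10,000 characters, and it assumes oversized input is rejected rather than truncated; the actual scripts may handle either point differently.

```python
MAX_TITLE_CHARS = 200
MAX_CONTENT_CHARS = 10_000  # assumption: "10 K" read as 10,000
MAX_FTS_QUERY_CHARS = 500
MAX_PATH_CHARS = 256

LIMITS = {
    "title": MAX_TITLE_CHARS,
    "content": MAX_CONTENT_CHARS,
    "fts_query": MAX_FTS_QUERY_CHARS,
    "path": MAX_PATH_CHARS,
}


def check_limit(field: str, value: str) -> str:
    """Return value unchanged if it fits the limit for field, else raise ValueError."""
    limit = LIMITS[field]
    if len(value) > limit:
        raise ValueError(f"{field} exceeds {limit} chars (got {len(value)})")
    return value
```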

Cross-Platform Paths

  • Use Path.home() and pathlib throughout. Handle WSL path differences explicitly.
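
As a rough illustration, paths can be built from Path.home() instead of hard-coded separators. The two paths below appear elsewhere in this document; the is_wsl helper and its kernel-release heuristic are assumptions, shown only as one possible way to branch on WSL.

```python
import platform
from pathlib import Path

# Installed tools directory and sync config, as named in this document.
TOOLS_DIR = Path.home() / ".copilot" / "tools"
SYNC_CONFIG = TOOLS_DIR / "sync-config.json"


def is_wsl() -> bool:
    """Heuristic (assumption): WSL kernels report "microsoft" in the release string."""
    return "microsoft" in platform.uname().release.lower()
```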

JSON Field Envelopes (stable contracts)

  • query-session.py --task --export json → entries[]
  • briefing.py --task --json → tagged_entries[] / related_entries[]
  • briefing.py --pack → entries.<category>[]
  • snippet_freshness values: fresh | drifted | missing | unknown
  • related_entry_ids — JSON ints, confidence-ranked, capped to top 3
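
The related_entry_ids contract (JSON ints, confidence-ranked, capped to top 3) could be produced along these lines. The top_related_ids helper and the (id, confidence) input shape are assumptions for the sketch, not the repo's actual ranking code.

```python
import json


def top_related_ids(candidates: list[tuple[int, float]], cap: int = 3) -> list[int]:
    """Rank (entry_id, confidence) pairs by confidence, highest first, and keep cap ids."""
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [int(entry_id) for entry_id, _confidence in ranked[:cap]]


related = top_related_ids([(7, 0.2), (3, 0.9), (5, 0.5), (9, 0.7)])
print(json.dumps({"related_entry_ids": related}))
```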

Project Registry (tools-managed-projects.json)

The file ~/.copilot/session-state/tools-managed-projects.json is the persistent registry of projects managed by session-knowledge tools. It is written by install.py, setup-project.py, and project-registry.py.

Schema — backward-compatible mixed format:

{
  "projects": [
    "/legacy/string/path",
    {"name": "myproject", "path": "/richer/dict/path", "created_at": "2025-01-01T00:00:00+00:00"}
  ]
}
  • Legacy string entries (install.py, setup-project.py): plain path strings. Written by existing scripts; always preserved on any write.
  • Rich dict entries (project-registry.py): {name, path, created_at}. Written by sk project add. Both formats co-exist in the same file.

All readers (_load_project_registry() in install.py, setup-project.py, and auto-update-tools.py) extract the path string from either format.
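
A reader that tolerates both entry formats can be sketched as follows. This is not the repo's `_load_project_registry()`; the extract_project_paths name and the decision to silently skip malformed entries are assumptions.

```python
import json


def extract_project_paths(registry: dict) -> list[str]:
    """Return the path string from each entry, legacy string or rich dict."""
    paths = []
    for entry in registry.get("projects", []):
        if isinstance(entry, str):
            # Legacy format: the entry is the path itself.
            paths.append(entry)
        elif isinstance(entry, dict) and isinstance(entry.get("path"), str):
            # Rich format: {name, path, created_at}.
            paths.append(entry["path"])
        # Assumption: anything else is skipped rather than treated as an error.
    return paths


sample = json.loads(
    '{"projects": ["/legacy/string/path",'
    ' {"name": "myproject", "path": "/richer/dict/path",'
    ' "created_at": "2025-01-01T00:00:00+00:00"}]}'
)
print(extract_project_paths(sample))
```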

project-registry.py is the CLI owner of this file: sk project add|remove|list. The Rust binary handles sk project list natively as a measured read-only hot path and preserves Python fallback/direct-script behavior for add, remove, help, and non-native forms.

DB Migrations

Add new migrations to the MIGRATIONS list in migrate.py with incrementing version numbers and a descriptive name.
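
The shape of the real MIGRATIONS list is not shown in this document, so the sketch below is purely illustrative: it assumes each entry is a (version, name, sql) tuple and that applied versions are tracked in a hypothetical schema_migrations table. Check migrate.py for the actual structure before copying anything.

```python
import sqlite3

# Hypothetical entries: (version, descriptive name, SQL statement).
MIGRATIONS = [
    (1, "create_notes", "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)"),
    (2, "add_notes_created_at", "ALTER TABLE notes ADD COLUMN created_at TEXT"),
]


def apply_pending(conn: sqlite3.Connection) -> list[int]:
    """Apply migrations newer than the recorded version; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations "
        "(version INTEGER PRIMARY KEY, name TEXT)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_migrations"
    ).fetchone()[0]
    applied = []
    for version, name, sql in MIGRATIONS:
        if version <= current:
            continue
        conn.execute(sql)
        conn.execute(
            "INSERT INTO schema_migrations (version, name) VALUES (?, ?)",
            (version, name),
        )
        applied.append(version)
    conn.commit()
    return applied
```

Under this assumed layout, adding a migration means appending a tuple with the next version number; running the function twice applies nothing the second time.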

Script Guards

  • if __name__ == "__main__" guard: migrate.py and generate-summary.py are both guarded; they can be imported without side effects. New scripts that may be imported or tested should follow this pattern.
  • TOOLS_DIR resolution — root scripts that need a reliable tools-directory path use Path(__file__).resolve().parent. hooks/rules/common.py intentionally keeps the installed-hook Path.home() / ".copilot" / "tools" form (hooks run from ~/.copilot/hooks/, not the source tree).

Docs Quality Gates

Agent-authored docs and operator/research outputs (tentacle handoffs, retro summaries, knowledge-health reports) must follow the four-layer QA rubric defined in docs/AGENT-RULES.md: facts, interpretation, actions, and verification evidence are kept distinct. Contributor-facing docs (CONTRIBUTING.md) use the existing concise tone and are not in scope for this rubric.

CI Quality Gates

GitHub Actions runs these jobs on every push / PR:

  • quality-gates — syntax check, scoped Ruff lint, and the Python test suites. The Ruff lint surface is: all root *.py scripts plus browse/, hooks/, and scripts/. Ruff lint is scoped to this surface; Python outside root scripts and those directories is not linted by CI.
  • remote-terminal runs npm ci, npm test, lint baseline, clean-zone lint gate, and blocking high-severity dependency audit for remote-terminal/. Legacy complexity/size warnings stay advisory in npm run lint; clean files (pty-daemon.js, test/client.test.js) promote the same rules to errors through npm run lint:clean.
  • browse-ui runs pnpm typecheck, pnpm lint, pnpm format:check, pnpm test, pnpm build. browse-ui/eslint.config.mjs keeps the repo-wide @typescript-eslint/no-explicit-any baseline advisory as warn, then promotes clean zones such as src/lib/**/*.{ts,tsx} to error so strict rules can expand without breaking legacy areas.
  • e2e-smoke — Playwright behavioral project (smoke.spec.ts, shortcuts.spec.ts, chat.spec.ts, diagnostics.spec.ts, and broker-mode.spec.ts) on Chromium.

For sk-rust/** changes, sk CI also runs Rust formatting, strict Clippy, tests across Ubuntu, Windows, and macOS, and a blocking startup benchmark regression gate.

sk-rust/clippy.toml defines advisory complexity thresholds (cognitive-complexity-threshold, too-many-lines-threshold, too-many-arguments-threshold). The Complexity advisory (Rust Clippy) step runs before the strict Clippy gate with continue-on-error: true, so complexity warnings are measured before any future enforcement change.

The startup benchmark gate runs benchmark.py startup --baseline-file .benchmarks/sk-startup-baseline.json --regression-threshold 20; the first run creates and uploads the baseline, and later runs fail only when median startup time exceeds the cached baseline by more than 20% and a 5ms absolute floor.

The RustSec dependency audit (advisory) job installs cargo-audit in a blocking setup step, then runs cargo audit --file Cargo.lock with continue-on-error: true; the YAML TODO records the future path to remove advisory mode and make Rust dependency advisories blocking after the baseline is clean.
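
The startup benchmark rule described above can be read as a two-part condition: the slowdown must exceed both the percentage threshold and the absolute floor. The sketch below encodes that reading; the is_regression name and its parameters are hypothetical, and benchmark.py may compute the comparison differently.

```python
def is_regression(
    baseline_ms: float,
    current_ms: float,
    threshold_pct: float = 20.0,
    floor_ms: float = 5.0,
) -> bool:
    """Fail only when the slowdown exceeds both the percentage and the absolute floor."""
    delta = current_ms - baseline_ms
    over_percentage = delta > baseline_ms * threshold_pct / 100.0
    over_floor = delta > floor_ms
    return over_percentage and over_floor
```

With a 10 ms baseline, a 13 ms run is 30% slower but only 3 ms over, so it passes under this reading; the absolute floor keeps small timing jitter on fast startups from failing the gate.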

Playwright visual snapshot E2E is manual-dispatch only in e2e-visual because screenshot output differs across platforms. The always-on e2e-smoke job runs only the stable behavioral project and excludes visual.spec.ts.

Automation Surfaces

  • Trend Scout is scheduled/manual (trend-scout.yml or explicit CLI runs) — NOT bound to preToolUse/postToolUse hooks (avoid output spam during sessions). Multi-lane discovery (lanes[] config) and --explain are CLI/workflow-only features.
  • Sync browse diagnostics are read-only: /healthz advertises /api/sync/status; /api/sync/status reports local queue/failure/config/cursor state only.
  • Cron DB maintenance (cron-tasks.py templates wal-checkpoint and vacuum) — optional scheduled tasks that run PRAGMA wal_checkpoint(TRUNCATE) daily and VACUUM + PRAGMA quick_check weekly on knowledge.db; busy/missing states return a structured dict without raising; last_run_at is not advanced on busy so the task retries automatically.
  • Retrospective (retro.py) aggregates knowledge health, skill/tentacle outcomes, hook audit decisions, and git history into a composite operator score. The browse server exposes GET /api/retro/summary (defaults to ?mode=repo; pass ?mode=local for full multi-source data) and a minimal HTML page at /retro. The retro.yml workflow is workflow_dispatch-only; it runs retro.py --mode repo --json, writes a markdown summary artifact (including confidence, distortion flags, accuracy notes, and improvement actions when present), and appends to the job summary. No issues, commits, or DB writes are created. A collapsible RetroSection panel in the browse insights dashboard consumes /api/retro/summary and renders the richer explanation fields (score_confidence, distortion_flags, accuracy_notes, improvement_actions, scout, toward_100) when present, failing gracefully when absent or when the API is unavailable. Local mode (?mode=local) includes all sections (including behavior) but may emit score_confidence=low and distortion flags (e.g. hook_deny_dry_noise, skills_unverified) that indicate the score should be treated as a rough signal only; repo mode scores are typically cleaner but cover git signals only. The optional top-level scout object provides read-only Trend Scout coverage health without affecting the composite score. The optional top-level toward_100 array is an additive diagnostic: a list of sections scoring below 100, each with section, score, gap (100 − score), and metric-derived barriers; it explains the current gap but does not change the score formula or any subscore. benchmark.py stores commit-keyed snapshots and exposes retro_gap/health_gap gap-to-target fields in compare output so measurable progress is explicit.
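
The Cron DB maintenance behaviour in the list above (structured result, no exception on busy or missing) might be sketched like this. It is not the cron-tasks.py implementation: the wal_checkpoint function name, the dict keys, and treating any OperationalError as busy are assumptions.

```python
import sqlite3
from pathlib import Path


def wal_checkpoint(db_path: Path) -> dict:
    """Run PRAGMA wal_checkpoint(TRUNCATE) and report the outcome as a dict."""
    if not db_path.exists():
        return {"status": "missing"}
    conn = sqlite3.connect(db_path, timeout=1.0)
    try:
        # The pragma returns one row: (busy flag, WAL frames, frames checkpointed).
        busy, log_frames, checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
    except sqlite3.OperationalError as exc:
        # Assumption: lock contention surfaces here and is reported as busy.
        return {"status": "busy", "detail": str(exc)}
    finally:
        conn.close()
    if busy:
        return {"status": "busy"}
    return {"status": "ok", "log_frames": log_frames, "checkpointed": checkpointed}
```

A scheduler built on this sketch would advance last_run_at only when the status is ok, so a busy result is retried on the next tick.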