Architecture & Conventions

Canonical reference for the copilot-session-knowledge architecture, data pipeline, and coding conventions.

This repo is a set of standalone Python CLI scripts, not a package or library. Each script is independently runnable and normally duplicates its own constants. Documented helper exceptions such as _tentacle_core.py, _tentacle_goal.py, _tentacle_pr.py, _tentacle_dispatch.py, and _tentacle_review.py are allowed only when they preserve an existing CLI/module contract and are listed in the script inventory. The default goal remains operator-first simplicity, not framework cohesion.

sk thin front door: sk.py is a thin dispatcher that maps memorable sub-commands (sk briefing, sk query, sk index build, …) to the underlying standalone scripts. It adds nothing to the data pipeline or business logic — it is purely a routing layer. After the standard install (install.py --test), a managed cross-platform sk launcher is provisioned automatically on your PATH — no manual alias or pip install needed. Windows PowerShell users without a PATH update can invoke python ~/.copilot/tools/sk.py directly as an equivalent fallback. All standalone scripts remain directly invocable as a fallback or for advanced use. The sk front door does not change the architectural contract below. When SK_HARNESS=1, dispatches are wrapped by the harness/ middleware package (pre/post hooks, timing telemetry, dry-run). See docs/HARNESS.md for API reference.

Data Pipeline

Session files (.md / .jsonl)
        │
        ▼
build-session-index.py  ──→  SQLite FTS5 (knowledge.db)
        │                             │
        ▼                             ▼
extract-knowledge.py  ──→  7 knowledge categories
        │                   (mistake, pattern, decision,
        │                    tool, feature, refactor, discovery)
        │                   + knowledge_relations
        │                     (SAME_SESSION†, SAME_TOPIC†, TAG_OVERLAP†,
        │                      RESOLVED_BY†, SEMANTIC_PROXIMITY*)
        ▼
query-session.py / briefing.py / mcp-server.py  ──→  Search, recall, MCP tools
        │
        ▼
watch-sessions.py  ──→  Incremental re-indexing (adaptive polling)

*SEMANTIC_PROXIMITY is populated by the native Rust TF-IDF cosine implementation (watch.rs/tfidf.rs). extract-knowledge.py --semantic-only is available for manual/fallback use only; sk watch (Rust binary) never auto-spawns Python.

†Deterministic relations (SAME_SESSION, SAME_TOPIC, TAG_OVERLAP, RESOLVED_BY) and residual helpers (backfill_affected_files, infer_task_ids, confidence decay) are extracted natively by the Rust binary (sk-rust/src/index/extract.rs). extract-knowledge.py is an intentional manual operator tool — NOT auto-called by sk watch in the default Rust build.

Phases:

build-session-index.py — Phase 1 (session metadata) + Phase 2 (event content) via providers/ → SQLite FTS5 (schema v8; current migration level v17)
extract-knowledge.py — classifies into 7 types, deduplicates by content hash; category-aware confidence floors (pattern=0.5, others=0.4); recurrence reward (+0.03 per upsert, capped). Intentional manual operator tool — sk watch (Rust binary) runs all hot-path classification, relation extraction, and semantic proximity natively; this script is named in sk watch recovery hints and is NOT auto-called on the native watch path.
query-session.py / briefing.py / mcp-server.py — BM25 keyword search + optional semantic vector search (RRF blend) exposed via CLI and MCP
watch-sessions.py / sk watch — adaptive polling (5 s / 30 s / 300 s tiers), auto re-indexes on file changes. sk watch Rust binary: handles session indexing, extract, relations, semantic proximity, and first-run DB bootstrap natively; never spawns Python. On DB or extract failure, emits a structured recovery hint naming the exact manual command.
learn.py — manual knowledge entry; CLI interface for agents to record learnings during a session
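
The RRF blend mentioned in the search phase can be illustrated with a minimal sketch of reciprocal rank fusion. The function name rrf_blend and the constant k=60 are assumptions chosen for illustration; they are not taken from query-session.py.

```python
from collections import defaultdict


def rrf_blend(keyword_ranked, semantic_ranked, k=60):
    """Fuse two ranked ID lists with reciprocal rank fusion (RRF).

    Each list contributes 1 / (k + rank) per ID, with rank starting at 1.
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranked in (keyword_ranked, semantic_ranked):
        for rank, entry_id in enumerate(ranked, start=1):
            scores[entry_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))
```

An entry that appears near the top of both lists outranks one that appears in only one of them, which is the property that makes the blend useful when BM25 and vector scores are not directly comparable.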

Intentional Python Boundaries

sk watch (Rust binary) never spawns Python — including on error paths. The following Python surfaces are permanent intentional architecture, not residual code pending deletion:

Python surface	Role	Status	Tested by
`sk.py` shim	Thin launcher/dispatcher for non-binary installs; routes all `sk <cmd>` calls	Intentional — the no-binary install contract	`tests/test_py_rust_boundary.py::test_python_shim_dispatch_without_rust_binary`; `tests/test_py_rust_boundary.py::test_project_list_json_matches_python_shim_when_rust_available`
`hook_runner.py`	Python hook runner for `sk.py` shim and non-binary installs; owns all managed hook events when no Rust binary is present	Intentional — Python shim hook entry point	`tests/test_py_rust_boundary.py::test_hook_runner_empty_payload_matches_native_exit_when_rust_available`
`build-session-index.py`	Indexes session files → FTS5 DB; named in `sk watch` DB-failure recovery hints	Intentional — manual operator recovery tool	`tests/test_py_rust_boundary.py::test_watch_db_failure_recovery_hint_no_python_spawn_when_rust_available`
`extract-knowledge.py`	Knowledge classification, relation extraction, `--semantic-only` fallback; named in `sk watch` extract-failure recovery hints	Intentional — manual/fallback operator tool	`tests/test_py_rust_boundary.py::test_watch_db_failure_recovery_hint_no_python_spawn_when_rust_available`
`migrate.py`	Versioned schema migrations via `schema_version` table	Intentional — canonical schema upgrade owner; Rust native bootstrap does NOT replace this	`tests/test_py_rust_boundary.py::test_migrate_help_remains_manual_python_surface`
`sync-daemon.py`	Push/pull sync runtime for Python `sk.py` shim and non-binary installs	Intentional — shim sync path	`tests/test_py_rust_boundary.py::test_python_shim_sync_run_dispatches_sync_daemon_without_rust_binary`
`briefing.py`, `learn.py`, `query-session.py`, `project-registry.py`, etc.	Admin/operator CLI scripts	Intentional — these are the primary Python CLI surface; the Rust binary may bypass Python for measured hot read-only subcommands such as `sk project list`, while the direct scripts and mutating fallbacks remain supported	`tests/test_py_rust_boundary.py::test_python_shim_dispatch_without_rust_binary`; `tests/test_py_rust_boundary.py::test_project_list_json_matches_python_shim_when_rust_available`

What wave20 removed: auto-spawning Python subprocess on sk watch error paths. The scripts above remain on disk, are invoked by operators manually, and are referenced by name in sk watch recovery hints. None of these scripts are candidates for deletion as a consequence of wave20.

Native Rust Browse Importers

sk-rust/src/browse/importers/ contains a native Rust parity port for the debug-log importers documented in docs/DEBUG-LOG-CONTRACT.md: VS Code Agent Debug Log JSONL and OTel ReadableSpan JSONL. The port includes shared path-safety, bounded-line reading, SHA-1 synthetic span IDs, dedup hashing, and redaction helpers that mirror the production Python importers.

This is parser-only infrastructure. It has no Browse DB persistence and no CLI subcommand yet, so the Python modules remain the production callable importer surface. The Rust port is kept compiled and tested in sk-rust so a later operator command can wire it in without changing the data contract.

Python Lint Surface Inventory

This is the canonical inventory for Python Ruff coverage. Keep it in sync with .github/workflows/ci.yml, hooks/pre-commit, and tests/test_quality_gates.py.

Surface	CI Ruff lint/format	Local `pre-commit` Ruff	Notes
Root standalone scripts	`*.py`	Any staged root `.py` path via `in_python_cleanliness_surface()`	Every root-level Python entry point is inside the blocking Ruff baseline.
Package/module directories	`browse/`, `hooks/`, `scripts/`	Any staged `.py` path under `browse/`, `hooks/`, or `scripts/`	Directory coverage applies to package-like surfaces where Ruff cleanup is already baselined.
Syntax gate	`python3 scripts/check_syntax.py`	All staged `.py` files via `scripts/check_syntax.py` when installed	Syntax coverage is broader than Ruff coverage.
Complexity advisory	Full suite path through `run_all_tests.py`; local hook runs `scripts/check_complexity.py --json` on staged `.py` snapshots	All staged `.py` files, non-blocking and fail-open	Advisory only; it prints findings but does not deny commits.
Ruff complexity/refactor advisory	`Complexity advisory (Ruff C90/PLR)` runs `ruff check --select C90,PLR0911,PLR0912,PLR0913,PLR0915 --statistics` on the same Ruff surface	Not run by local `pre-commit`	CI advisory only; `continue-on-error: true` records baseline counts before enforcement.
Out-of-surface Ruff advisory	Not run by CI	Staged `.py` files outside root scripts and covered package directories run `ruff check` as a non-blocking `[advisory]` scan	Advisory only; it is fail-open when Ruff is absent and does not change the blocking Ruff surface.

Root *.py files are now inside the blocking Ruff lint surface by default. New root scripts still must honor the standalone-script architecture contract: keep them self-contained and run syntax/tests for the touched behavior. Python files outside root scripts, browse/, hooks/, and scripts/ are outside CI Ruff coverage; the local pre-commit hook may print non-blocking [advisory] Ruff findings for those files when Ruff is installed.

Script Inventory

Script	Role
`build-session-index.py`	Indexes session files → FTS5 DB
`extract-knowledge.py`	Classifies + deduplicates knowledge entries; category-aware confidence floors; recurrence reward
`query-session.py`	FTS5 + semantic search; JSON/markdown export
`briefing.py`	Task-scoped recall; context packs for agent injection
`mcp-server.py`	Read-only MCP stdio JSON-RPC surface for `briefing` and `query_session`
`watch-sessions.py`	File watcher; triggers incremental re-indexing
`learn.py`	Manual knowledge entry
`tentacle.py`	Multi-agent orchestration (create → todo → bundle → swarm → complete) + orchestrator goal loop (`goal init/validate/status/dispatch/link/eval/resume/criteria/gate/budget/next-iter/verify-loop/coverage/context`). `goal.json` keeps both the legacy flat `tentacles` list and a structured per-iteration `iterations` map so status/history queries can answer which tentacles belonged to each iteration without breaking older readers. Goal writes are serialized with a `goal.json.lock` sidecar (`O_CREAT
`_tentacle_core.py`	Low-level `tentacle.py` helper module for core path constants, file-locking helpers, git-root discovery, tentacle directory resolution, Windows filesystem retries, PID liveness checks, and todo parsing/rendering. `tentacle.py` re-exports these symbols to preserve the long-standing CLI/module contract while later extraction waves split higher-level seams.
`_tentacle_goal.py`	Extracted `tentacle.py` goal subsystem: goal state locking, iteration/budget helpers, goal lifecycle commands, dispatch planning, continuation context, verify-loop, coverage, loop, and resilience-status behavior. `tentacle.py` injects the remaining orchestration-owned helpers at import time and re-exports goal symbols so existing CLI/module callers keep working.
`_tentacle_pr.py`	Extracted `tentacle.py` PR automation seam for collecting tentacle handoffs/verifications, generating commit messages and PR bodies, and implementing `sk tentacle pr`. `tentacle.py` injects patchable runtime wrappers and re-exports PR symbols so tests and module callers that patch `tentacle._pr_run_subprocess_safe` keep working.
`_tentacle_dispatch.py`	Extracted `tentacle.py` dispatch/profile/recall seam for runtime bundle materialization, live briefing recall packs, specialist agent profiles, prompt sizing, `resume`, `swarm`/`dispatch`, `next-step`, and `bundle`. `tentacle.py` injects patchable runtime wrappers and re-exports dispatch symbols so existing tests/callers that patch `tentacle.*` keep working.
`_tentacle_review.py`	Extracted `tentacle.py` fresh-context reviewer and review-loop seam for reviewer bundles/findings, review-loop verification classification, blocker-resolver tentacle creation, `dispatch-reviewer`, and `review-loop`. `tentacle.py` injects patchable runtime wrappers and re-exports review symbols so existing tests/callers that patch `tentacle.*` keep working.
`embed.py`	Optional semantic search via embedding APIs (OpenAI, Fireworks, etc.) with TF-IDF fallback
`claude-adapter.py`	Parses Claude Code JSONL sessions into the common DB format
`sync-knowledge.py`	Merges `knowledge.db` files across environments (Windows ↔ WSL); MAX confidence semantics
`sync-config.py`	Single `connection_string` config; `--setup`, `--setup-env`, `--status --json`
`sync-daemon.py`	Local-first push/pull runtime; backlog-aware adaptive limits and automatic sync queue compaction
`sync-status.py`	Local sync diagnostics; `--health-check`, `--audit`, `--json`
`auto-update-tools.py`	Smart git-diff–based update pipeline; `sk-update` alias
`migrate.py`	Versioned schema migrations via `schema_version` table
`install.py`	Deploy skills/hooks; inject global AI instructions
`setup-project.py`	Full project onboarding: skills + hooks + WORKFLOW.md
`project-registry.py`	`sk project add/remove/list` — manage the persistent project registry (`tools-managed-projects.json`); Rust `sk project list` is a measured native read-only hot path, while `add`/`remove` and direct-script use stay on this Python owner
`host_manifest.py`	Single source of truth for supported hosts + their filesystem paths
`index-status.py`	Row counts, FTS integrity, event-offset coverage
`knowledge-health.py`	Knowledge base health + recall telemetry
`error-analysis.py`	On-demand error pattern analysis: type distribution, severity, recurrence, root causes
`benchmark.py`	Commit-keyed benchmark ledger for retro + health snapshots
`checkpoint-save.py`	Save named checkpoint
`checkpoint-restore.py`	List/restore checkpoints
`checkpoint-diff.py`	Diff two checkpoints
`browse.py`	Local web UI (127.0.0.1, token auth) with read-only diagnostics plus the authenticated `/chat` operator console
`project-context.py`	Deterministic project-context.md generator
`codebase-map.py`	Repo structure snapshot (auto-refreshed at session start)
`trend-scout.py`	GitHub repo discovery via multi-lane search
`copilot-cli-healer.py`	Repairs stale Copilot CLI package state

Browse UI Operator Console (`/chat`)

The browse UI exposes a browser-managed Copilot CLI execution console at /chat. It is the only browse surface that actively launches Copilot CLI; the rest of the UI remains read-only diagnostics and search.

Route prefix note: The Python browse server (browse/core/server.py) and the Firebase-hosted deployment now both serve the Next.js app at root (/*, e.g. /chat, /settings). Compatibility redirects from /v2/* → /* remain for old bookmarks and deep links.

Components

Component	Role
`browse/core/operator_console.py`	Secure execution/persistence adapter. Starts Copilot CLI runs, normalizes event streams, and persists operator state under `~/.copilot/session-state/operator-console/`.
`browse/api/operator.py`	Authenticated REST + SSE surface for session CRUD, prompt submission, run status/history, path suggestions, previews, and diffs.
`browse-ui/src/app/chat/`	Next.js route wrapper for the `/chat` operator console.
`browse-ui/src/components/chat/`	`ChatShell`, `Transcript`, `Composer`, `SessionCreateDialog`, `MetadataBar`, file review components, and CLI session adoption UX (`CliSessionPicker`, `CliAdoptedBadge`, `ConfirmAdoptionPanel`).
`browse-ui/src/components/chat/cli-session-picker.tsx`	Lists real CLI sessions and drives the adopt/confirm flow; composer is disabled until `confirmed_at` is set.
`browse-ui/src/lib/api/{types,schemas,hooks}.ts`	Stable frontend contract layer for `/api/operator/*`.

Browse-wide host state

All pages share a single host context. The two source-of-truth files are:

File	Role
`browse-ui/src/providers/host-provider.tsx`	`HostProvider` React context — mounted at the root layout; exposes `{ host, diagnosticsEnabled }` to every page via `useHostState()`. Listens to cross-tab `storage` events and same-tab `BROWSE_HOST_CHANGE_EVENT` to stay current without a reload.
`browse-ui/src/lib/host-profiles.ts`	localStorage persistence helpers (`saveHostProfile`, `deleteHostProfile`, `setSelectedHostId`, `getEffectiveHost`, etc.) and the immutable `LOCAL_HOST` sentinel. All mutating helpers dispatch `BROWSE_HOST_CHANGE_EVENT` after writing so `HostProvider` re-evaluates immediately.

Active host resolution order (documented in getEffectiveHost()):

Explicit selection (browse_selected_host_id in localStorage), if the referenced profile still exists.
First remote profile with is_default === true.
LOCAL_HOST sentinel — same-origin, no bearer token required.

The header renders a compact AWS-region-style global host dropdown that calls setSelectedHostId() on selection and links to /settings#hosts for management. The Settings page exposes the full HostManagement surface (list, add, remove, set-default, restore-local). SessionCreateDialog at /chat reads useHostState() to pre-populate the host picker when the dialog opens.

API surface (`/api/operator/*`)

POST /api/operator/sessions                  → create session
GET  /api/operator/sessions                  → list sessions
GET  /api/operator/sessions/{id}             → session detail
PATCH /api/operator/sessions/{id}            → update session mutable fields
POST /api/operator/sessions/{id}/prompt      → submit prompt → {run_id}
GET  /api/operator/sessions/{id}/stream      → SSE run output
GET  /api/operator/sessions/{id}/status      → session + active run status (?run=<run_id>)
GET  /api/operator/sessions/{id}/runs        → persisted run history
POST /api/operator/sessions/{id}/delete      → delete session
POST /api/operator/sessions/adopt            → adopt CLI session → operator session (201|200|409)
POST /api/operator/sessions/{id}/confirm     → confirm adopted session → enable resume
GET  /api/operator/suggest                   → path/workspace suggestions under ~/
GET  /api/operator/preview                   → file preview under ~/
GET  /api/operator/diff                      → unified diff for two files under ~/
GET  /api/operator/browsers                  → read-only scan of allowlisted local browser candidates
GET  /api/operator/cli-sessions              → list real CLI sessions (Bearer/cookie; debug=True; read-only)
GET  /api/operator/cli-sessions/{id}         → single CLI session by UUID (Bearer/cookie; debug=True; read-only)
GET  /api/session/{id}/debug-log             → paginated session-scoped debug log; preserves CLI event hierarchy via span_id/parent_span_id (CLI id/parentId → SHA-1 16-hex), derives duration_ms for paired start/completion events; limit 100 (Bearer/cookie; debug=True; read-only)
GET  /api/sessions/{id}/debug-log            → plural alias for the route above
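
The "SHA-1 16-hex" mapping noted for the debug-log route could look roughly like the sketch below. It assumes the span ID is the first 16 hex characters of the SHA-1 digest of the CLI event id alone; the real importer may hash additional fields, and synthetic_span_id is a hypothetical name.

```python
import hashlib


def synthetic_span_id(cli_event_id):
    """Map a CLI event id to a 16-hex-character span id via SHA-1."""
    digest = hashlib.sha1(cli_event_id.encode("utf-8")).hexdigest()
    return digest[:16]  # first 16 hex characters (8 bytes) of the digest
```

Because the mapping is deterministic, a child event's parentId hashes to the same value as its parent's id, so the hierarchy survives the conversion.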

Guardrails

Workspace confinement: every workspace or file path is normalized with confine_path() and rejected unless it resolves under Path.home().
Token auth: all /api/operator/* routes require the same per-launch browse token as the rest of the UI.
Prompt cap: prompts longer than 4096 characters are rejected.
Path cap: oversized path inputs are rejected before filesystem access.
Separate persistence: operator run history is stored under ~/.copilot/session-state/operator-console/ and replayed from disk on reload.
Same Copilot policy surface: operator-console runs still inherit the installed Copilot CLI's hooks, custom instructions, and permission system. Browser mediation does not bypass briefing, tentacle, or other active policy gates.
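
The workspace confinement guardrail can be sketched as follows. This is an illustrative version only, not the actual confine_path() from the browse package: the base parameter and the None return value on rejection are assumptions made so the sketch is easy to exercise.

```python
from pathlib import Path


def confine_path(raw, base=None):
    """Resolve raw and return it only if it stays under base.

    base defaults to Path.home(). Returns None when the resolved path
    escapes base (for example via ".." segments or a symlink).
    """
    root = (Path(base) if base is not None else Path.home()).resolve()
    candidate = Path(raw).expanduser()
    if not candidate.is_absolute():
        candidate = root / candidate
    resolved = candidate.resolve()
    if resolved == root or root in resolved.parents:
        return resolved
    return None
```

Resolving before comparing is the important step: a purely textual prefix check would accept paths such as ~/../etc.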

CLI Session Adoption / Two-ID Model

The operator console supports resuming an existing Copilot CLI session from the browser via the "From CLI history" picker. Two distinct identifiers are always in play and must never be aliased:

ID	Where it lives	Purpose
Operator session ID	Browse/backend route segment; `operator-console/<id>/` on disk	Identifies the operator-side session; used in every `/api/operator/sessions/<id>/*` URL
CLI session UUID	Backend field `resume_target` (UUID4); never surfaced as a route key	Identifies the real Copilot CLI session; passed to `copilot` as `--resume=<cli_uuid>`

Lifecycle: discover -> adopt -> confirm -> prompt/resume

discover — GET /api/operator/cli-sessions reads real CLI session artifacts under ~/.copilot/session-state/ (read-only, path-confined, Bearer/cookie auth, debug=True).
adopt — POST /api/operator/sessions/adopt creates an operator session with source="cli_adopt", stores the CLI UUID in resume_target, and returns confirmed_at=None. The returned id is the new operator session ID.
confirm — POST /api/operator/sessions/<operator_id>/confirm sets confirmed_at and resume_ready=True. Prompts are blocked until this step completes.
prompt/resume — POST /api/operator/sessions/<operator_id>/prompt launches the CLI subprocess with --resume=<cli_uuid> (from resume_target). --name is never passed for adopted sessions. The operator session ID never appears in the CLI argv.

Guardrails specific to adoption

resume_target is validated as UUID4 at adopt time; malformed values are rejected.
The confirmation gate in operator_console.py · start_run() returns None for any session where confirmed_at is not set; the UI disables the composer until confirmation.
CLI tree discovery is read-only and path-confined; file hashes and workspace.yaml mtime are unchanged after discovery (tests/test_browse_chat_resume.py CR9).
Child subprocess env is filtered by _ENV_ALLOWLIST; test state env vars do not leak (CR8).
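
The UUID4 validation and the confirmation gate might be sketched like this. The helper names, the dict-shaped session, and the two-element argv are assumptions for illustration; the real start_run() builds a fuller command line.

```python
import uuid


def is_uuid4(value):
    """Return True only for a canonical UUID4 string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return parsed.version == 4 and str(parsed) == value.lower()


def build_resume_argv(session):
    """Return the resume portion of argv, or None when the gate is closed."""
    if not session.get("confirmed_at"):
        return None  # confirmation gate: unconfirmed sessions never launch
    target = session.get("resume_target", "")
    if not is_uuid4(target):
        return None
    # Only the CLI UUID reaches argv; the operator session ID never does.
    return ["copilot", f"--resume={target}"]
```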

Stale / missing CLI session recovery

If the CLI session referenced by resume_target no longer exists on disk:

The GET /api/operator/cli-sessions/{uuid} probe returns 404.
CliSessionPicker (browse-ui) shows a warning badge on stale entries.
POST /api/operator/sessions/{id}/confirm calls get_cli_session_by_id and returns CLI_SESSION_NOT_FOUND / 404 when the CLI UUID is absent; there is no supported confirm-with-replacement-workspace fallback. Delete the unconfirmed operator session and adopt a different CLI session or start a fresh operator chat.
Deleting an operator session never touches the CLI session tree.

Duplicate adoption

POST /api/operator/sessions/adopt for a CLI UUID that has already been adopted returns one of two responses depending on confirmation state:

HTTP 200 — duplicate exists but is still unconfirmed; response body includes the existing operator session object (idempotent re-adopt).
HTTP 409 (ALREADY_ADOPTED) — duplicate is already confirmed; response is error-only with no session object. To find the existing session, use GET /api/operator/sessions.
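
The three adopt outcomes (201, 200, 409) reduce to a small decision function. The sketch below is hypothetical: the body keys "session" and "error" and the minimal session record are illustrative, not the documented response schema.

```python
def adopt_response(existing, cli_uuid):
    """Return (http_status, body) for an adopt request.

    existing is the operator session already bound to cli_uuid, or None.
    """
    if existing is None:
        # Hypothetical minimal record; real sessions carry more fields.
        session = {
            "source": "cli_adopt",
            "resume_target": cli_uuid,
            "confirmed_at": None,
        }
        return 201, {"session": session}
    if existing.get("confirmed_at") is None:
        return 200, {"session": existing}  # idempotent re-adopt
    return 409, {"error": "ALREADY_ADOPTED"}  # error-only, no session object
```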

Components added by CLI adoption UX

Component	Role
`browse-ui/src/components/chat/cli-session-picker.tsx`	Picker that lists real CLI sessions from `/api/operator/cli-sessions` and triggers the adopt flow
`CliAdoptedBadge` (in `cli-session-picker.tsx`)	Badge shown when a session was adopted from CLI history
`ConfirmAdoptionPanel` (in `cli-session-picker.tsx` / `chat-shell.tsx`)	Workspace/add_dirs confirmation step before the composer is enabled

Operator runbook for the adopt/confirm flow: docs/OPERATOR-PLAYBOOK.md — Chat Resume / CLI Session Adoption

Compatibility with watch-sessions and auto-update

watch-sessions.py still tracks normal Copilot session artifacts under ~/.copilot/session-state/; the operator console reads its own persisted run history directly from operator-console/.
auto-update-tools.py can restart watch-sessions.py, but it does not restart the browse server or interfere with active operator runs.
UI-only browse-ui/src/ updates require regenerating the local browse-ui/dist/ artifact before the running browse server can serve them; Python changes to browse/api/operator.py or browse/core/operator_console.py still require a manual browse server restart.

Remote access

Two deployment modes are supported:

Mode 1 — Same-origin (Cloudflare Tunnel, default): The Python browse server serves both the static UI and the API behind a single Cloudflare Tunnel URL (e.g. browse.example.com). All API calls are same-origin; no CORS configuration needed. Key code-level constraint: browse/core/auth.py · check_origin() builds the expected CSRF origin as http://{Host}. Behind HTTPS the browser sends Origin: https://… — these do not match, causing POST mutations to return 403. Fix check_origin to accept https:// (check X-Forwarded-Proto) before enabling full remote operator console access.
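
One possible shape for the check_origin fix described in Mode 1 is sketched below. It assumes headers arrive as a plain dict with canonical names and that X-Forwarded-Proto is set only by a trusted proxy such as the tunnel; expected_origin and origin_matches are hypothetical names, not the current browse/core/auth.py API.

```python
def expected_origin(headers):
    """Build the origin this request should carry, honoring the proxy scheme."""
    host = headers.get("Host", "")
    # Assumption: X-Forwarded-Proto is set by a trusted proxy (the tunnel).
    forwarded = headers.get("X-Forwarded-Proto", "")
    proto = forwarded.split(",")[0].strip().lower()
    if proto not in ("http", "https"):
        proto = "http"  # direct local access keeps the current behavior
    return f"{proto}://{host}"


def origin_matches(headers):
    """Return True when the Origin header equals the expected origin."""
    origin = headers.get("Origin")
    return origin is not None and origin == expected_origin(headers)
```

How a missing Origin header should be treated is a separate policy decision that this sketch does not settle; here it simply fails the match.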

Mode 2 — Firebase Hosting control plane: The static browse-ui is deployed to Firebase Hosting (operator's chosen custom domain). The operator's browse.py server is reached via a Cloudflare Tunnel. All API calls from the Firebase-hosted UI to the tunnel are cross-origin; the operator host exposes an explicit CORS allowlist, Bearer auth, and a GET /api/operator/capabilities endpoint. Host profiles let the UI target different operator machines. See docs/OPERATOR-PLAYBOOK.md — Firebase-hosted control plane for the full topology and manual console steps.

Full remote-access setup, Cloudflare Access guidance, and Firebase topology: docs/OPERATOR-PLAYBOOK.md

Hosted shell specs (localhost bootstrap, relay, version negotiation, token storage, stream reconnect, first-run UX): see docs/HOSTED-SHELL-ARCHITECTURE.md and docs/HOSTED-SHELL-RESEARCH.md.

Enforcement Hooks

Hooks live in hooks/ and are deployed to ~/.copilot/hooks/ (Copilot CLI only).

Unified runner: hook_runner.py dispatches all hook events (1 Python process per event)
Supported events: sessionStart, sessionEnd, preToolUse, postToolUse, agentStop, subagentStop, errorOccurred
Fail-open: rule errors never block the agent
HMAC-signed markers: tamper-resistant counter state
Audit log: ~/.copilot/markers/audit.jsonl
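
The idea behind HMAC-signed markers can be shown with a short sketch. The on-disk record shape ({"payload", "sig"}), the function names, and the choice of SHA-256 are assumptions for illustration; docs/HOOKS.md is the reference for the real marker format.

```python
import hashlib
import hmac
import json


def sign_marker(state, key):
    """Serialize state and attach an HMAC-SHA256 signature."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "sig": sig})


def load_marker(raw, key):
    """Return the state dict, or None when the signature does not verify."""
    try:
        record = json.loads(raw)
        payload, sig = record["payload"], record["sig"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(payload, str) or not isinstance(sig, str):
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8")):
        return None
    return json.loads(payload)
```

A counter edited by hand fails verification and is treated as absent, which is what makes the state tamper-resistant rather than merely obscured.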

Full hook architecture, rule inventory, and dispatched-subagent git guard: docs/HOOKS.md

Central Database

~/.copilot/session-state/knowledge.db — SQLite with FTS5, WAL journal mode, and optional vector embeddings.

Schema versions: v1–v6 (legacy) → v7 (two-phase indexing + event_offsets) → v8 (sessions_fts contentless FTS5 + BM25) → v9–v14 (eval, provenance, recall, sync, benchmark) → v15 (confidence_backfill_wave3: raises pattern confidence floor to 0.5 and applies recurrence reward to existing entries) → v16 (error lifecycle: error_type, root_cause, severity, is_resolved, fix_steps, prevention_hook, recurrence_after_briefing on knowledge_entries) → v17 (briefing_deliveries table for tracking which entries were briefed to each session). Run python3 ~/.copilot/tools/migrate.py to upgrade.
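
The confidence rules referenced in v15 (category-aware floors plus a recurrence reward) can be sketched as below. The floor values and the +0.03 reward come from this document; the cap of 1.0 and the function names are assumptions, since the actual ceiling is not stated here.

```python
CONFIDENCE_FLOORS = {"pattern": 0.5}  # category-specific floor
DEFAULT_FLOOR = 0.4                    # floor for every other category
RECURRENCE_REWARD = 0.03               # added on each upsert
CONFIDENCE_CAP = 1.0                   # assumed ceiling, not from the source


def apply_floor(category, confidence):
    """Raise confidence to the category floor when it falls below it."""
    return max(confidence, CONFIDENCE_FLOORS.get(category, DEFAULT_FLOOR))


def reward_recurrence(category, confidence):
    """Apply the floor, then add the capped recurrence reward."""
    boosted = apply_floor(category, confidence) + RECURRENCE_REWARD
    return round(min(boosted, CONFIDENCE_CAP), 4)
```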

`providers/` Package

SessionProvider ABC defines iter_sessions() and iter_events_with_offset().

CopilotProvider — handles .md session checkpoints
ClaudeProvider — handles JSONL with real byte-offset seeks for Phase 2
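
A rough sketch of the provider contract follows. Only the class name SessionProvider and the two method names come from this document; the argument lists, return types, and the InMemoryProvider toy implementation are hypothetical.

```python
import json
from abc import ABC, abstractmethod
from typing import Iterator, Tuple


class SessionProvider(ABC):
    @abstractmethod
    def iter_sessions(self) -> Iterator[dict]:
        """Yield one metadata dict per session (Phase 1)."""

    @abstractmethod
    def iter_events_with_offset(self, session_path: str) -> Iterator[Tuple[int, dict]]:
        """Yield (byte_offset, event) pairs for one session (Phase 2)."""


class InMemoryProvider(SessionProvider):
    """Hypothetical toy provider over JSONL bytes held in memory."""

    def __init__(self, sessions):
        self._sessions = sessions  # maps path -> raw JSONL bytes

    def iter_sessions(self):
        for path in self._sessions:
            yield {"path": path}

    def iter_events_with_offset(self, session_path):
        offset = 0
        for line in self._sessions[session_path].splitlines(keepends=True):
            if line.strip():
                yield offset, json.loads(line)
            offset += len(line)  # byte offset of the next line
```

Recording the byte offset of each event is what lets a later pass seek straight to new content instead of re-reading the whole file.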

Tentacle Workspace

.octogent/ stores local tentacle state and is gitignored in this repo.
Runtime-bundle workflow: create → todo add → bundle (optional) → swarm → complete.

complete accepts an optional --auto-verify <cmd> flag (fail-open): it runs the command and persists the result as verification evidence before closing. Use --auto-verify-timeout <seconds> (default: 120 s) if the command is long-running.

Sub-agents must write a structured handoff before stopping:

tentacle.py handoff <name> "<summary>" --status DONE --changed-file <path> [--changed-file ...] --learn

--status must be one of DONE, BLOCKED, TOO_BIG, AMBIGUOUS, or REGRESSED. Include one --changed-file receipt per modified file so the orchestrator can verify the handoff trail.

Quota-blocked handoffs — when a tentacle is BLOCKED due to a quota or rate-limit signal, add machine-readable metadata:

tentacle.py handoff <name> "<summary>" --status BLOCKED \
  --quota-reason rate_limit \
  --retry-hint 2026-05-14T00:00:00Z

--quota-reason is a short token (rate_limit, quota_exceeded, daily_quota, monthly_quota, token_quota, context_limit). --retry-hint is an optional ISO timestamp or human-readable hint. cmd_complete persists these fields into meta.json["quota_reason"] / meta.json["retry_hint"] and appends an entry to goal.json["quota_retry_queue"] for orchestrator tracking. Old BLOCKED handoffs without quota metadata are fully backward compatible.

_classify_quota_signal(text) is available to classify raw dispatch output into a quota_reason token. The pattern list is intentionally minimal pending the fuller failure-mode matrix (#183).
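
A classifier of this kind might look like the sketch below. The regular expressions are hypothetical examples, not the real pattern list, and the function is named classify_quota_signal (no leading underscore) to make clear it is not the shipped helper. Only the quota_reason tokens come from this document.

```python
import re

# Hypothetical patterns for illustration; order matters, specific before generic.
_QUOTA_PATTERNS = [
    ("rate_limit", re.compile(r"rate.?limit|too many requests|\b429\b", re.I)),
    ("daily_quota", re.compile(r"daily (quota|limit)", re.I)),
    ("monthly_quota", re.compile(r"monthly (quota|limit)", re.I)),
    ("token_quota", re.compile(r"token (quota|limit)", re.I)),
    ("context_limit", re.compile(r"context (length|window|limit)", re.I)),
    ("quota_exceeded", re.compile(r"quota", re.I)),
]


def classify_quota_signal(text):
    """Return the first matching quota_reason token, or None."""
    for reason, pattern in _QUOTA_PATTERNS:
        if pattern.search(text or ""):
            return reason
    return None
```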

tentacle.py marker-cleanup (dry-run by default, --apply to act) inspects and removes stale entries from the dispatched-subagent marker without completing a tentacle. Only entries whose per-entry timestamp exceeds the declared TTL are eligible; live entries are never touched.

Bundle artifacts

Each bundle directory contains a manifest.json listing all artifacts. The goal_context artifact is optional — it is only present when the tentacle is linked to a goal:

Artifact key	File	When present
`briefing`	`briefing.md`	Always (placeholder when empty)
`instructions`	`instructions.md`	Always
`session_metadata`	`session-metadata.md`	Always
`recall_pack`	`recall-pack.json`	Always (empty when no matches)
`goal_context`	`goal-context.md`	Only when tentacle is linked to a goal

The goal_context artifact is the output of _goal_render_continuation_context: a compact markdown block with objective, iteration counter, budget limits, criteria progress, remaining criteria IDs + descriptions, and the last N prior handoff summaries. Sub-agents must read goal-context.md (when manifest.json lists it as populated: true) to understand the overarching goal before making changes.

Python-only implementation — _goal_render_continuation_context, _goal_write_context_artifact, and _cmd_goal_context live entirely in _tentacle_goal.py (Python) and are re-exported from tentacle.py. The Rust sk binary routes sk tentacle goal context … to tentacle.py via its standard pass-through mechanism — no Rust changes are required when adding or modifying goal context behavior. The bundle injection path (_build_runtime_bundle) similarly calls the Python renderer directly; the Rust binary never constructs the goal-context.md content itself.

Full tentacle workflow reference: docs/USAGE.md

Host Scope

Tools are validated on Copilot CLI and Claude Code only.

Feature	Copilot CLI	Claude Code
Skill deployment	✅ `.github/skills/`	✅ `.claude/skills/`
Hook deployment	✅ `.copilot/hooks/`	❌ not supported
Global instruction injection	✅ `~/.github/copilot-instructions.md`	via CLAUDE.md
Session indexing	✅	✅ via `claude-adapter.py`

host_manifest.py is the single authoritative source for supported hosts and their filesystem paths. Do not add Codex, Cursor, or other hosts without documented session and hook formats.

Full skill deployment and host scope details: docs/SKILLS.md

Sync Architecture

Sync is local-first: knowledge.db is the authoritative read/query source; remote sync is optional transport/storage only.

Single config key: connection_string in ~/.copilot/tools/sync-config.json
sync-config.py --setup accepts HTTP(S) gateway URLs only (not raw Postgres/libSQL DSNs)
sync-gateway.py is reference/mock only — not a production authority
Default provider recommendation: Neon (backing Postgres) + Railway (thin gateway host)
Missing connection_string → daemon stays local-only/idle (not a fatal error)
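
The config rules above can be sketched as a small loader. This is illustrative only: load_gateway_url is a hypothetical name, and treating a missing or unreadable file the same as a missing key is an assumption.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def load_gateway_url(config_path):
    """Return the gateway URL, or None to signal local-only/idle mode."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # assumed: unreadable config behaves like a missing key
    value = config.get("connection_string") if isinstance(config, dict) else None
    if not isinstance(value, str) or not value.strip():
        return None  # missing connection_string is not a fatal error
    url = value.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Raw Postgres/libSQL DSNs are rejected; only gateway URLs are valid.
        raise ValueError("connection_string must be an HTTP(S) gateway URL")
    return url
```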

Full sync diagnostics reference: docs/USAGE.md

Conventions

These conventions apply to all scripts in this repo. Follow them in every change.

Language & Runtime

Pure stdlib Python 3.10+ — zero pip dependencies required. scikit-learn and embedding API keys are optional.
Every script is standalone: no shared library or package imports between scripts, apart from the documented helper exceptions listed in the script inventory. Each script duplicates its own DB path constants, encoding fix, etc.
Windows encoding fix — every script starts with the same os.name == "nt" block to reconfigure stdout/stderr to UTF-8. Preserve this pattern in every new script.
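
An approximation of the Windows encoding fix is shown below. The scripts use an inline os.name == "nt" block; here it is wrapped in a hypothetical function (force_utf8, with an os_name parameter) only so the behavior can be exercised on other platforms. The exact arguments used by the real block are not confirmed by this document.

```python
import os
import sys


def force_utf8(streams, os_name=os.name):
    """Reconfigure text streams to UTF-8 on Windows; return how many changed."""
    if os_name != "nt":
        return 0
    fixed = 0
    for stream in streams:
        # Replaced or captured streams may lack reconfigure(); skip those.
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")
            fixed += 1
    return fixed


# Run once at import time, before any output is written.
force_utf8((sys.stdout, sys.stderr))
```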

SQL Safety

Parameterized SQL only — all user input uses ? placeholders. Never interpolate strings into SQL.
FTS5 query sanitization — strip FTS5 operators (OR, AND, NOT, NEAR, *, ") before passing to MATCH. See _sanitize_fts_query() in query-session.py.
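
The two rules combine roughly as follows. This is a sketch, not the code of `_sanitize_fts_query()` in query-session.py: `sanitize_fts_query`, `search`, and the `entries_fts` table name are hypothetical, and only the operators listed above are stripped. The 500-character cap is taken from the Input Limits section.

```python
import re
import sqlite3

MAX_FTS_QUERY_CHARS = 500
# FTS5 keyword operators are upper-case, so lower-case words such as "android" survive.
_FTS_KEYWORDS = re.compile(r"\b(?:AND|OR|NOT|NEAR)\b")


def sanitize_fts_query(raw: str) -> str:
    """Drop the listed FTS5 operators and collapse whitespace (illustrative)."""
    text = raw[:MAX_FTS_QUERY_CHARS]
    text = text.replace('"', " ").replace("*", " ")
    text = _FTS_KEYWORDS.sub(" ", text)
    return " ".join(text.split())


def search(conn: sqlite3.Connection, raw_query: str, limit: int = 10) -> list:
    """Run a MATCH query with ? placeholders; 'entries_fts' is a made-up table."""
    query = sanitize_fts_query(raw_query)
    if not query:
        return []
    return conn.execute(
        "SELECT rowid FROM entries_fts WHERE entries_fts MATCH ? LIMIT ?",
        (query, limit),
    ).fetchall()
```

User input reaches SQLite only as a bound parameter, so even a query that slips past sanitization cannot alter the SQL statement itself.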

Serialization & Locking

JSON serialization only — never use pickle. Legacy pickle detection exists but new code must use JSON / struct.pack.
Atomic lock files — use O_CREAT | O_EXCL for process locks (no TOCTOU races).
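
A minimal sketch of an atomic lock file built on `O_CREAT | O_EXCL`. The `acquire_lock` and `release_lock` names are hypothetical, and writing the PID into the file is an assumption added for debuggability.

```python
import os
from pathlib import Path


def acquire_lock(lock_path: Path) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if the file exists, so the existence
        # check and the creation are one atomic step (no TOCTOU window).
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file; a missing file is not an error."""
    lock_path.unlink(missing_ok=True)
```

Checking `lock_path.exists()` first and then creating the file would reintroduce the race this convention is meant to prevent.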

Input Limits

Title ≤ 200 chars
Content ≤ 10 K chars
FTS queries ≤ 500 chars
Paths ≤ 256 chars
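
The limits above could be checked with a helper along these lines. The `LIMITS` table and `check_limits` function are hypothetical, "10 K" is read as 10,000 characters, and whether the real scripts reject or truncate oversized input is not stated here.

```python
# Assumption: "10 K" means 10,000 characters.
LIMITS = {
    "title": 200,
    "content": 10_000,
    "fts_query": 500,
    "path": 256,
}


def check_limits(fields: dict) -> list:
    """Return the names of known fields whose value exceeds its limit."""
    return [
        name
        for name, value in fields.items()
        if name in LIMITS and len(value) > LIMITS[name]
    ]
```

For example, `check_limits({"title": "x" * 201})` reports `["title"]`, while a title of exactly 200 characters passes.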

Cross-Platform Paths

Use Path.home() and pathlib throughout. Handle WSL path differences explicitly.

JSON Field Envelopes (stable contracts)

query-session.py --task --export json → entries[]
briefing.py --task --json → tagged_entries[] / related_entries[]
briefing.py --pack → entries.<category>[]
snippet_freshness values: fresh | drifted | missing | unknown
related_entry_ids — JSON ints, confidence-ranked, capped to top 3

Project Registry (`tools-managed-projects.json`)

The file ~/.copilot/session-state/tools-managed-projects.json is the persistent registry of projects managed by session-knowledge tools. It is written by install.py, setup-project.py, and project-registry.py.

Schema — backward-compatible mixed format:

{
  "projects": [
    "/legacy/string/path",
    {"name": "myproject", "path": "/richer/dict/path", "created_at": "2025-01-01T00:00:00+00:00"}
  ]
}

Legacy string entries (install.py, setup-project.py): plain path strings. Written by existing scripts; always preserved on any write.
Rich dict entries (project-registry.py): {name, path, created_at}. Written by sk project add. Both formats co-exist in the same file.

All readers (_load_project_registry() in install.py, setup-project.py, and auto-update-tools.py) extract the path string from either format.
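
Reading the mixed format might look like the sketch below. It is not the code of `_load_project_registry()`: the `extract_project_paths` name and the choice to skip malformed entries silently are assumptions.

```python
def extract_project_paths(registry: dict) -> list:
    """Return path strings from legacy string entries and rich dict entries."""
    if not isinstance(registry, dict):
        return []
    paths = []
    for entry in registry.get("projects", []):
        if isinstance(entry, str):
            # Legacy format: the entry is the path itself.
            paths.append(entry)
        elif isinstance(entry, dict) and isinstance(entry.get("path"), str):
            # Rich format: {name, path, created_at}.
            paths.append(entry["path"])
    return paths
```

Applied to the schema example above, this yields `["/legacy/string/path", "/richer/dict/path"]`.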

project-registry.py is the CLI owner of this file: sk project add|remove|list. The Rust binary handles sk project list natively as a measured read-only hot path and preserves Python fallback/direct-script behavior for add, remove, help, and non-native forms.

DB Migrations

Add new migrations to the MIGRATIONS list in migrate.py with incrementing version numbers and a descriptive name.
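
As a rough illustration of a versioned migration list, the sketch below applies pending entries in order. Everything here is hypothetical: the `(version, name, sql)` tuple shape, the `schema_migrations` bookkeeping table, and `apply_pending` are assumptions, and the real `MIGRATIONS` structure in migrate.py may differ.

```python
import sqlite3

# Hypothetical shape: (version, descriptive name, SQL statement).
MIGRATIONS = [
    (1, "create_notes", "CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)"),
    (2, "add_notes_created_at", "ALTER TABLE notes ADD COLUMN created_at TEXT"),
]


def apply_pending(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply migrations newer than the recorded version; return their names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations "
        "(version INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_migrations"
    ).fetchone()[0]
    applied = []
    for version, name, sql in sorted(migrations):
        if version <= current:
            continue
        with conn:  # commits the bookkeeping row once the statement succeeds
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version, name) VALUES (?, ?)",
                (version, name),
            )
        applied.append(name)
    return applied
```

Incrementing version numbers make the operation idempotent: a second call finds nothing newer than the recorded version and applies nothing.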

Script Guards

if __name__ == "__main__": — migrate.py and generate-summary.py are both guarded; they can be imported without side effects. New scripts that may be imported or tested should follow this pattern.
TOOLS_DIR resolution — root scripts that need a reliable tools-directory path use Path(__file__).resolve().parent. hooks/rules/common.py intentionally keeps the installed-hook Path.home() / ".copilot" / "tools" form (hooks run from ~/.copilot/hooks/, not the source tree).

Docs Quality Gates

Agent-authored docs and operator/research outputs (tentacle handoffs, retro summaries, knowledge-health reports) must follow the four-layer QA rubric defined in docs/AGENT-RULES.md: facts, interpretation, actions, and verification evidence are kept distinct. Contributor-facing docs (CONTRIBUTING.md) use the existing concise tone and are not in scope for this rubric.

CI Quality Gates

GitHub Actions runs these jobs on every push / PR:

quality-gates — syntax check, scoped Ruff lint, and the Python test suites. The Ruff lint surface is: all root *.py scripts plus browse/, hooks/, and scripts/. Ruff lint is scoped to this surface; Python outside root scripts and those directories is not linted by CI.
remote-terminal — npm ci, npm test, lint baseline, clean-zone lint gate, and blocking high-severity dependency audit for remote-terminal/. Legacy complexity/size warnings stay advisory in npm run lint; clean files (pty-daemon.js, test/client.test.js) promote the same rules to errors through npm run lint:clean.
browse-ui — pnpm typecheck, pnpm lint, pnpm format:check, pnpm test, pnpm build. browse-ui/eslint.config.mjs keeps the repo-wide @typescript-eslint/no-explicit-any baseline advisory as warn, then promotes clean zones such as src/lib/**/*.{ts,tsx} to error so strict rules can expand without breaking legacy areas.
e2e-smoke — Playwright behavioral project (smoke.spec.ts, shortcuts.spec.ts, chat.spec.ts, diagnostics.spec.ts, and broker-mode.spec.ts) on Chromium.

For sk-rust/** changes, sk CI also runs Rust formatting, strict Clippy, tests across Ubuntu, Windows, and macOS, and a blocking startup benchmark regression gate. sk-rust/clippy.toml defines advisory complexity thresholds (cognitive-complexity-threshold, too-many-lines-threshold, too-many-arguments-threshold). The Complexity advisory (Rust Clippy) step runs before the strict Clippy gate with continue-on-error: true, so complexity warnings are measured before any future enforcement change. The startup benchmark gate runs benchmark.py startup --baseline-file .benchmarks/sk-startup-baseline.json --regression-threshold 20; the first run creates and uploads the baseline, and later runs fail only when the median startup time exceeds the cached baseline by more than 20% and the increase also exceeds a 5 ms absolute floor. The RustSec dependency audit (advisory) job installs cargo-audit in a blocking setup step, then runs cargo audit --file Cargo.lock with continue-on-error: true; a TODO in the workflow YAML records the plan to remove advisory mode and make Rust dependency advisories blocking once the baseline is clean.
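
The startup benchmark rule can be read as a two-part test: the slowdown must exceed both the percentage threshold and the absolute floor. The sketch below encodes that reading. `is_startup_regression` is a hypothetical function, not the code in benchmark.py, and the strict inequalities are an assumption.

```python
from statistics import median


def is_startup_regression(
    samples_ms: list,
    baseline_ms: float,
    threshold_pct: float = 20.0,
    floor_ms: float = 5.0,
) -> bool:
    """True only if the median slowdown beats both the relative and absolute limits."""
    delta = median(samples_ms) - baseline_ms
    exceeds_relative = delta > baseline_ms * threshold_pct / 100.0
    exceeds_floor = delta > floor_ms
    return exceeds_relative and exceeds_floor
```

The absolute floor keeps very fast baselines from failing on noise: against a 10 ms baseline, a 3 ms slowdown is 30% slower but stays under the 5 ms floor, so it would not fail the gate.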

Playwright visual snapshot E2E is manual-dispatch only in e2e-visual because screenshot output differs across platforms. The always-on e2e-smoke job runs only the stable behavioral project and excludes visual.spec.ts.

Automation Surfaces

Trend Scout is scheduled/manual (trend-scout.yml or explicit CLI runs) — NOT bound to preToolUse/postToolUse hooks (avoid output spam during sessions). Multi-lane discovery (lanes[] config) and --explain are CLI/workflow-only features.
Sync browse diagnostics are read-only: /healthz advertises /api/sync/status; /api/sync/status reports local queue/failure/config/cursor state only.
Cron DB maintenance (cron-tasks.py templates wal-checkpoint and vacuum) — optional scheduled tasks that run PRAGMA wal_checkpoint(TRUNCATE) daily and VACUUM + PRAGMA quick_check weekly on knowledge.db; busy/missing states return a structured dict without raising; last_run_at is not advanced on busy so the task retries automatically.
Retrospective (retro.py) aggregates knowledge health, skill/tentacle outcomes, hook audit decisions, and git history into a composite operator score. The browse server exposes GET /api/retro/summary (defaults to ?mode=repo; pass ?mode=local for full multi-source data) and a minimal HTML page at /retro. The retro.yml workflow is workflow_dispatch-only; it runs retro.py --mode repo --json, writes a markdown summary artifact (including confidence, distortion flags, accuracy notes, and improvement actions when present), and appends to the job summary. No issues, commits, or DB writes are created. A collapsible RetroSection panel in the browse insights dashboard consumes /api/retro/summary and renders the richer explanation fields (score_confidence, distortion_flags, accuracy_notes, improvement_actions, scout, toward_100) when present, failing gracefully when absent or when the API is unavailable. Local mode (?mode=local) includes all sections (including behavior) but may emit score_confidence=low and distortion flags (e.g. hook_deny_dry_noise, skills_unverified) that indicate the score should be treated as a rough signal only; repo mode scores are typically cleaner but cover git signals only. The optional top-level scout object provides read-only Trend Scout coverage health without affecting the composite score. The optional top-level toward_100 array is an additive diagnostic: a list of sections scoring below 100, each with section, score, gap (100 − score), and metric-derived barriers; it explains the current gap but does not change the score formula or any subscore. benchmark.py stores commit-keyed snapshots and exposes retro_gap/health_gap gap-to-target fields in compare output so measurable progress is explicit.
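
The `toward_100` diagnostic described above can be sketched as a pure function over section scores. The input shapes, the `build_toward_100` name, and the section names used in the example are hypothetical. Only the output fields (`section`, `score`, `gap`, `barriers`) come from the description.

```python
from typing import Optional


def build_toward_100(section_scores: dict, barriers: Optional[dict] = None) -> list:
    """List sections scoring below 100 with their gap; scores are left untouched."""
    barriers = barriers or {}
    return [
        {
            "section": section,
            "score": score,
            "gap": 100 - score,
            "barriers": barriers.get(section, []),
        }
        for section, score in section_scores.items()
        if score < 100
    ]
```

For instance, `build_toward_100({"knowledge": 100, "git": 85})` returns a single entry for `git` with a gap of 15. The function only reads the scores, consistent with the diagnostic being additive.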
