From 9679ffcceef409adcafc48ae16d392f26463ffcb Mon Sep 17 00:00:00 2001 From: Brendan McMullen Date: Thu, 2 Jul 2026 11:31:59 -0700 Subject: [PATCH 1/3] Skills: tighten report authoring quality + document GitHub-enriched graph (LLP 0028/0032) Two threads across the plugin skills, mirrored in the claude/ and codex/ workspaces. Docs/skills only; no runtime code paths change. Report authoring quality: two standing rules added to all four report skills (adoption, improvement, security, spend) so detail pages hold the same bar as the summary: - Every section is analysis, not inventory: each argues one claim, opens with its takeaway, ties every number to meaning; cut table narration and standing bookkeeping prose. - No scope apologies: scope/routing rules are authoring guidance, never report copy; state findings plainly and cross-link the sibling report that owns the action. GitHub-enriched graph + routing to it: - hypaware-graph: document server-side GitHub enrichment (Actor/Issue/ PullRequest/Review nodes, enriched Repo/Commit/File, new edges); explain bridge-ready natural keys that converge a session's HEAD commit and a PR's referenced commit on one node; add the git provenance sub-web to the base model (Repo/Commit, in/at); add a graph-vs-messages boundary; document freshness/settling caveats. De-em-dashed per repo code style. - hypaware-query: cross-link to the graph for descriptive who-used-what rollups. - adoption report: per-gateway GitHub reach (repos/PRs work landed in, whether it drew review), dated to graph projection freshness. - improvement report: optional GitHub outcome lens as a proxy sticking-point signal. Refs: LLP 0028 (context-graph enrichment), LLP 0032 (github-llm-graph-bridge). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../hypaware-ai-adoption-report/SKILL.md | 26 +++++- .../hypaware-ai-improvement-report/SKILL.md | 18 +++- .../hypaware-ai-security-report/SKILL.md | 9 ++ .../skills/hypaware-ai-spend-report/SKILL.md | 9 ++ .../claude/skills/hypaware-query/SKILL.md | 4 + .../hypaware-ai-adoption-report/SKILL.md | 26 +++++- .../hypaware-ai-improvement-report/SKILL.md | 18 +++- .../hypaware-ai-security-report/SKILL.md | 9 ++ .../skills/hypaware-ai-spend-report/SKILL.md | 9 ++ .../codex/skills/hypaware-query/SKILL.md | 4 + .../skills/hypaware-graph/SKILL.md | 85 +++++++++++++++---- 11 files changed, 190 insertions(+), 27 deletions(-) diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md index 5c6ab15b..7bb1c0da 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md @@ -26,7 +26,9 @@ ask which one (or more) to profile, then proceed against the chosen source. **Descriptive only** — route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for -any token figure. +any token figure. For descriptive who-used-what rollups (distinct sessions per +repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected +graph instead of scanning messages; keep token figures on messages. ## Procedure 1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't @@ -34,12 +36,18 @@ any token figure. (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. + If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / + `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection + undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Use the activity graph for structural focus (Session→Repo/PR via - GitHub enrichment) where projected. + one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* + from it: the repos and PRs its work actually landed in (`Session -at-> Commit <-references- + PullRequest`), and whether that work drew review (`... PullRequest <-on- Review <-submitted- + Actor`). This is footprint the messages cannot show. Keep it descriptive (route any "review + more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main @@ -56,9 +64,21 @@ any token figure. split) · **focus label**. Busiest first — this table is the spine. - **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. +- **GitHub reach (where enriched):** in Per-gateway focus, show each gateway's repos and PRs + reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, not + an upper bound). - **Formatting (human-readable):** open each section with its takeaway; **bold** headline numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** The scope rules above ("descriptive only", what routes to which + report) are authoring guidance, never report copy. Don't write "no recommendations here" or + routing disclaimers in the report; state findings plainly, and where a sibling report owns the + action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md index d45a4a4a..0e42956f 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md @@ -16,7 +16,9 @@ and let the user choose which to query: **local logs** (this machine's own recor `query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -Query mechanics live in the **hypaware-query** skill; read it first. +Query mechanics live in the **hypaware-query** skill; read it first. For descriptive +who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** +skill: it reads relationships from the projected graph instead of scanning messages. Focus on these signals: @@ -41,7 +43,10 @@ Focus on these signals: - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, - repeatedly-violated conventions. + repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: + work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review + churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or + heavily-reviewed work is not necessarily bad). - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible @@ -75,3 +80,12 @@ Focus on these signals: - **Formatting (human-readable):** every candidate states its evidence inline; show AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep it a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md index 4eafe2fb..f5946280 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md @@ -61,5 +61,14 @@ than the same command in a gated session. · Guardrails · Trends · Capture caveats**. **All examples redacted.** - **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md index b04ffa93..4eaa9335 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md @@ -73,3 +73,12 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms - **Formatting (human-readable):** open each section with its one-sentence takeaway, then the table; **bold** headline numbers; split the four token types (never collapse them); keep the bottom line + ranked table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md index f53b9624..d9a1d654 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md @@ -82,6 +82,10 @@ Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, Run `hyp query schema ai_gateway_messages --format markdown` for the authoritative column reference. +## When the graph answers it cheaper + +Some questions are relationships, not row scans. For descriptive "who used what" rollups (distinct sessions per repo / model / tool / file, or session to commit to PR joins), the **hypaware-graph** skill reads compact `node` / `edge` adjacency instead of scanning `ai_gateway_messages`, and it reaches GitHub facets (repos, PRs, reviewers) that are not in the messages at all. Keep per-message measures here on `ai_gateway_messages`: token sums, `count(*)` call totals, error / stop-reason, and `content_text`. See the hypaware-graph skill for the full boundary. + ## Guardrails - Do not assume the cache auto-refreshes. Query commands default to `--refresh never`. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md index 5c6ab15b..7bb1c0da 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md @@ -26,7 +26,9 @@ ask which one (or more) to profile, then proceed against the chosen source. **Descriptive only** — route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for -any token figure. +any token figure. For descriptive who-used-what rollups (distinct sessions per +repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected +graph instead of scanning messages; keep token figures on messages. ## Procedure 1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't @@ -34,12 +36,18 @@ any token figure. (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. + If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / + `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection + undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Use the activity graph for structural focus (Session→Repo/PR via - GitHub enrichment) where projected. + one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* + from it: the repos and PRs its work actually landed in (`Session -at-> Commit <-references- + PullRequest`), and whether that work drew review (`... PullRequest <-on- Review <-submitted- + Actor`). This is footprint the messages cannot show. Keep it descriptive (route any "review + more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main @@ -56,9 +64,21 @@ any token figure. split) · **focus label**. Busiest first — this table is the spine. - **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. +- **GitHub reach (where enriched):** in Per-gateway focus, show each gateway's repos and PRs + reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, not + an upper bound). - **Formatting (human-readable):** open each section with its takeaway; **bold** headline numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** The scope rules above ("descriptive only", what routes to which + report) are authoring guidance, never report copy. Don't write "no recommendations here" or + routing disclaimers in the report; state findings plainly, and where a sibling report owns the + action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md index d45a4a4a..0e42956f 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md @@ -16,7 +16,9 @@ and let the user choose which to query: **local logs** (this machine's own recor `query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -Query mechanics live in the **hypaware-query** skill; read it first. +Query mechanics live in the **hypaware-query** skill; read it first. For descriptive +who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** +skill: it reads relationships from the projected graph instead of scanning messages. Focus on these signals: @@ -41,7 +43,10 @@ Focus on these signals: - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, - repeatedly-violated conventions. + repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: + work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review + churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or + heavily-reviewed work is not necessarily bad). - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible @@ -75,3 +80,12 @@ Focus on these signals: - **Formatting (human-readable):** every candidate states its evidence inline; show AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep it a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md index 4eafe2fb..f5946280 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md @@ -61,5 +61,14 @@ than the same command in a gated session. · Guardrails · Trends · Capture caveats**. **All examples redacted.** - **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md index b04ffa93..4eaa9335 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md @@ -73,3 +73,12 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms - **Formatting (human-readable):** open each section with its one-sentence takeaway, then the table; **bold** headline numbers; split the four token types (never collapse them); keep the bottom line + ranked table a ~1-minute read. +- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is + split) hold the same standard as the main page: each argues one claim, opens with that + takeaway, and ties every number to what it means for the reader. Cut table narration ("how to + read the table") and standing bookkeeping prose; compress source/window/method to a few lines + and fold stat-only content into the table it supports. +- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, + never report copy. Don't write "no recommendations here" or routing disclaimers in the report; + state findings plainly, and where a sibling report owns the action a plain cross-link is + enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md index 7d4abc2e..b85d511d 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md @@ -82,6 +82,10 @@ Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, Run `hyp query schema ai_gateway_messages --format markdown` for the authoritative column reference. +## When the graph answers it cheaper + +Some questions are relationships, not row scans. For descriptive "who used what" rollups (distinct sessions per repo / model / tool / file, or session to commit to PR joins), the **hypaware-graph** skill reads compact `node` / `edge` adjacency instead of scanning `ai_gateway_messages`, and it reaches GitHub facets (repos, PRs, reviewers) that are not in the messages at all. Keep per-message measures here on `ai_gateway_messages`: token sums, `count(*)` call totals, error / stop-reason, and `content_text`. See the hypaware-graph skill for the full boundary. + ## Guardrails - Do not assume the cache auto-refreshes. Query commands default to `--refresh never`. diff --git a/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md b/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md index c96b79bc..d6f1c9f1 100644 --- a/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md +++ b/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md @@ -1,15 +1,15 @@ --- name: hypaware-graph -description: Explore the local HypAware context graph — the activity graph projected from ai_gateway_messages (Sessions, Apps, Models, Tools, Files and how they connect). Use when the user asks what connects to a file/session/tool/model, wants co-occurrence or N-hop traversal, or wants to build/refresh the graph. Covers `hyp graph project` and `hyp graph neighbors`. +description: Explore the HypAware context graph: the activity graph projected from ai_gateway_messages (Sessions, Apps, Models, Tools, Files, and git Repos/Commits, and how they connect), plus optional server-side GitHub enrichment (Repos, PullRequests, Commits, Reviewers) that bridges AI sessions to code review. Use when the user asks what connects to a file/session/tool/model, wants co-occurrence or N-hop traversal, wants to join AI sessions to GitHub repos/PRs/reviewers, or wants to build/refresh the graph. Covers `hyp graph project` and `hyp graph neighbors`. --- # HypAware Graph -`hyp graph` turns local HypAware recordings into a queryable **activity graph** and walks it. The graph is a derived projection of the `ai_gateway_messages` dataset — the same local data the `hypaware-query` skill reads — read as *relationships* instead of rows. +`hyp graph` turns local HypAware recordings into a queryable **activity graph** and walks it. The graph is a derived projection of the `ai_gateway_messages` dataset (the same local data the `hypaware-query` skill reads), read as *relationships* instead of rows. ## Build or refresh the graph -The projection runs **on demand** — it does not auto-update. Run it before querying, and again after new sessions are recorded: +The projection runs **on demand**; it does not auto-update. Run it before querying, and again after new sessions are recorded: ```bash hyp graph project # project ai_gateway_messages → node/edge tables (idempotent) @@ -17,19 +17,56 @@ hyp graph project --dry-run # show what would be written hyp graph compact # merge duplicate rows, rewrite partitions sorted (maintenance) ``` -`hyp graph project` prints `N node(s), M edge(s) — wrote …`. If `node`/`edge` come back empty, the projection has not been run yet (or there are no recordings). +`hyp graph project` prints `N node(s), M edge(s) - wrote …`. If `node`/`edge` come back empty, the projection has not been run yet (or there are no recordings). ## The graph model -Deterministic T0 projection — exact-key, no models. All edges are **Session-rooted**: +Deterministic T0 projection: exact-key, no models. The core is **Session-rooted**, plus a small git-provenance sub-web (`Repo`, `Commit`) that doubles as the bridge to GitHub enrichment (see below). -- **Node types:** `Session` (one per session), `App` (client), `Model`, `Tool`, `File`. -- **Edge types:** `via` (Session→App), `used_model` (Session→Model), `used` (Session→Tool), `touched` (Session→File). -- **natural_key** is the human key per type: Session = `session_id` (the always-present session container; `conversation_id` is a nullable thread identity, null for Claude), App = `client_name`, Model = model id, Tool = tool name, File = full path (its `label` is the basename). Every row carries inline provenance (`source_dataset`, `projector`, …). +- **Node types:** `Session` (one per session), `App` (client), `Model`, `Tool`, `File`, and, when the session's git context is captured, `Repo` and `Commit`. +- **Edge types:** `via` (Session→App), `used_model` (Session→Model), `used` (Session→Tool), `touched` (Session→File), `in` (Session→Repo), `at` (Session→Commit, the HEAD the session sat on), and `in` (Commit→Repo). +- **natural_key** is the human key per type: Session = `session_id` (the always-present session container; `conversation_id` is a nullable thread identity, null for Claude), App = `client_name`, Model = model id, Tool = tool name, File = full path (or `owner/repo:relpath` when the repo remote is known; its `label` is the basename), Repo = `owner/repo`, Commit = full 40-hex sha. Every row carries inline provenance (`source_dataset`, `projector`, …). +- **Bridge-ready keys.** `Repo`, `Commit`, and `File` use shared, content-addressed natural keys, so a node minted from a session's git context and the same node minted by the GitHub source converge on one id. This is what lets an AI session and a pull request meet at the same commit. + +## GitHub enrichment (server-side) + +The base graph above comes from `ai_gateway_messages` and is available anywhere, including a local install. A **server** can additionally run the `@hypaware/github` source, which captures repo / commit / PR / issue / review events and projects a second contract into the **same** `node` / `edge` tables. This is the graph's biggest payoff, and none of it is answerable from the message data alone. + +**Caveats first, so you don't query nodes that aren't there:** + +- **Server-only and opt-in.** GitHub nodes exist only on a host where the `@hypaware/github` source is configured and has captured events (normally the central server, reached with `--remote`). A plain local projection has none of them. An empty GitHub query usually means the source is not configured on that host, or the graph has not been re-projected since capture, not that the true answer is zero. +- **`Actor` is a GitHub login, not the AI user.** The identity that authored a commit or opened a PR is the git actor, not the `user_id` of whoever ran the agent. Cross-domain identity merge is later work (T1/T2); never equate an `Actor` with an AI operator. +- **Freshness applies here too.** The GitHub domain is only as current as the last projection on that host, and you cannot project through the read-only query token (projection is admin-side). On a stale central graph, recent PRs and reviews are simply missing. One subtlety: freshly projected rows sit in a spool until a settling read runs **on the server** (an admin-side query; the remote query surface never settles), and `node` and `edge` settle independently, so a graph can briefly show fresh nodes joined by stale edges. If a cross-domain join returns implausibly few rows against fresh-looking nodes, suspect an unsettled `edge` dataset before doubting the data. + +**What it adds:** + +- **Nodes:** `Actor` (login), `Issue`, `PullRequest`, `Review`, plus enriched `Repo` / `Commit` / `File`. +- **Edges:** `authored` (Actor→Commit), `opened` and `commented` (Actor→Issue | PullRequest), `submitted` (Actor→Review), `on` (Review→PullRequest), `references` (PullRequest→Commit), `touched` (Commit→File and PullRequest→File), `in` (Commit | File | Issue | PullRequest→Repo). + +**Why it matters, the possibility.** Because `Repo`, `Commit`, and `File` are bridge-ready, the AI-session web and the GitHub web are *one graph*. The commit a session sat on (`Session -at-> Commit`) is the same node GitHub knows through `PullRequest -references-> Commit` and `Actor -authored-> Commit`. So you can walk from an agent's activity into the code-review reality around it, questions `ai_gateway_messages` cannot express: + +- **AI work to the PR that shipped it:** `Session -at-> Commit <-references- PullRequest`. +- **AI work to who reviewed it:** continue `PullRequest <-on- Review <-submitted- Actor`. +- **Coverage, honestly:** which repos and PRs an agent's work actually reached, not just which cwd it ran in. +- **Reverse:** start from a `PullRequest` or `Repo` and walk inbound to every AI session that touched it. + +```bash +# Sessions whose HEAD commit is referenced by a PR (AI work that reached code review). +# Note: no --refresh with --remote; the server owns its freshness. +hyp query sql "select distinct s.natural_key session + from edge a join node s on a.src_id = s.node_id + join edge r on r.dst_id = a.dst_id and r.edge_type = 'references' + where a.edge_type = 'at'" --remote HYP_CENTRAL + +# From a PR, walk out to its reviews, actors, and referenced commits. +hyp graph neighbors owner/repo#123 --type PullRequest --depth 2 --direction both --remote HYP_CENTRAL +``` + +Reach for the GitHub domain whenever a question spans **both** AI activity and code collaboration (sessions to PRs, agents to reviewers, work to repos). If it is purely one side, the base graph or plain message SQL is enough. ## Two ways to query -### 1. SQL over `node` / `edge` — counts, filters, aggregates +### 1. SQL over `node` / `edge`: counts, filters, aggregates Flat SQL through the normal query surface (no recursion). Use for "how many", "top N", "group by", one- or two-hop joins: @@ -39,23 +76,23 @@ hyp query sql "select t.natural_key tool, count(*) n from edge e join node t on where e.edge_type = 'used' group by tool order by n desc" --refresh always ``` -`node`/`edge` are ordinary datasets — see the `hypaware-query` skill for `hyp query` mechanics (read stderr, freshness, `--format json`). Use `--refresh always` so the read settles freshly-projected rows. +`node`/`edge` are ordinary datasets; see the `hypaware-query` skill for `hyp query` mechanics (read stderr, freshness, `--format json`). Use `--refresh always` so the read settles freshly-projected rows. -### 2. Traversal — `hyp graph neighbors` — relationships and depth +### 2. Traversal with `hyp graph neighbors`: relationships and depth -For "what connects to X", co-occurrence, and N-hop walks — what flat SQL can't express. +For "what connects to X", co-occurrence, and N-hop walks, what flat SQL can't express. ```bash hyp graph neighbors [--depth N] [--type T] [--edge-type T] [--direction out|in|both] [--limit N] [--json] ``` -- **``** resolves by `node_id`, then `natural_key`, then `label` — so pass a session id, a file path (or basename), a model id, or a tool/app name. If it's ambiguous the command lists the candidates; narrow with `--type Session|App|Model|Tool|File`. +- **``** resolves by `node_id`, then `natural_key`, then `label`, so pass a session id, a file path (or basename), a model id, or a tool/app name. If it's ambiguous the command lists the candidates; narrow with `--type Session|App|Model|Tool|File|Repo|Commit|Actor|Issue|PullRequest|Review` (the last five require GitHub enrichment). - **`--direction`** is load-bearing because edges are Session-rooted: - `out` from a **Session** → its app, model, tools, files. - `in` from a **File / Tool / Model / App** → the **Sessions** that touched/used it. - `both` at `--depth 2` from a **File** → **co-occurrence**: the other files/tools/models reached through the sessions that share it. - **`--edge-type`** restricts which relations are walked (e.g. `--edge-type used`); repeatable or comma-separated. -- **`--limit`** caps output in BFS order and reports the true total when it truncates (`3 of 7 … — truncated`). +- **`--limit`** caps output in BFS order and reports the true total when it truncates (`… - truncated; raise --limit`). - **`--json`** emits `{ seed, neighbors: [{ hop, edge_type, direction, node }], reachable, truncated }` for follow-up reasoning. Examples: @@ -73,9 +110,23 @@ hyp graph neighbors claude-opus-4-8 --type Model --direction in # sessions that - Counting, ranking, grouping, "how often" → **`hyp query sql`** over `node`/`edge`. - "What connects to X", paths, neighborhoods, co-occurrence, depth → **`hyp graph neighbors`**. +### When the graph is cheaper than scanning `ai_gateway_messages` + +For descriptive "who used what" rollups, the graph answers from compact adjacency instead of scanning every message row. Prefer it when the rollup keys on an entity that is already a node: + +- sessions per tool = `used` · per model = `used_model` · per file = `touched` · per app = `via` · per repo = `in` (Session→Repo) · per commit = `at`. +- distinct-session counts over any of these are `count(distinct src_id)` on the edge, far fewer rows than `count(distinct session_id)` over messages (benchmarked ~12x fewer for the repo rollup). + +Stay on `ai_gateway_messages` when the measure lives on the message, not the relationship: + +- token sums and cache-read ratios; `count(*)` call totals (the graph keeps one edge per pair, so it cannot count repeat calls); `is_error` / `is_sidechain` / stop-reason; and `content_text` classification. +- per-`gateway_id` or per-`user_id` rollups: there are no Gateway or User nodes yet. + +Rule of thumb: "which sessions relate to X" or "how many distinct sessions" is a graph question; anything needing a per-message measure (tokens, errors, call counts, text) stays on messages. + ## Guardrails - **Project first.** The graph is only as fresh as the last `hyp graph project`; empty results usually mean it has not run. -- **Read stderr for errors.** `graph neighbors` writes not-found / ambiguity notes (and the large-graph note) to stderr; exit `1` is a resolution error, `2` is a usage error. **Truncation is part of the result**, so it goes to **stdout** (`3 of 7 … — truncated`) and the `--json` `truncated` field — not stderr. -- The graph is **derived and rebuildable** — never the source of truth. To change what it contains, fix capture/projection and re-project; don't hand-edit `node`/`edge`. -- Basic traversal loads the graph in memory per call — fine at activity-graph scale; a very large graph prints a note pointing at the future indexed path. +- **Read stderr for errors.** `graph neighbors` writes not-found / ambiguity notes (and the large-graph note) to stderr; exit `1` is a resolution error, `2` is a usage error. **Truncation is part of the result**, so it goes to **stdout** (`… - truncated; raise --limit`) and the `--json` `truncated` field, not stderr. +- The graph is **derived and rebuildable**, never the source of truth. To change what it contains, fix capture/projection and re-project; don't hand-edit `node`/`edge`. +- Basic traversal loads the graph in memory per call, fine at activity-graph scale; a very large graph prints a note pointing at the future indexed path. From 3e014965fa08182680639ba4911ba1d8b40866a9 Mon Sep 17 00:00:00 2001 From: Brendan McMullen Date: Thu, 2 Jul 2026 13:39:24 -0700 Subject: [PATCH 2/3] Skills: multi-file report output + reconcile token math to one-carrier rule Report skills (adoption, improvement, security, spend), mirrored across the claude/ and codex/ workspaces. Docs/skills only; no runtime code paths change. Multi-file report output: the four reports now emit a progressively-disclosed one-pager (`hypaware-reports/-.md`) plus one markdown file per detail section, instead of a single file. The two standing quality rules now bind the section files, not just the summary: each section file argues one claim and opens with its takeaway, and a section that is only a stat table is folded back into the one-pager rather than shipped as a page. The "no scope apologies" rule applies in every file. Token math reconciled to the one-carrier rule (LLP 0035 / LLP 0026): usage rides exactly one row per response (the last assistant part; non-carrier parts are null), so a plain SUM over assistant `attributes.usage` rows is correct with no dedup. Verified empirically against local recordings (19,639 usage-bearing rows): usage never appears on >1 part per message, and the old dedup query returns a byte-identical total to a plain SUM. Removed the stale "usage repeats across parts / a plain SUM overcounts ~3x" guidance from the spend and improvement reports and the hypaware-query skill, replaced the spend SQL with a plain SUM, and switched the defensive-dedup key from conversation_id to session_id (conversation_id is null for Claude and only separates threads within a Codex session). De-em-dashed the four report skills per repo code style, and reworded the adoption opener to drop the "the one descriptive report / no actions" framing. Refs: LLP 0035 (token-usage-normalization), LLP 0026 (claude-native-granularity). Co-Authored-By: Claude Opus 4.8 (1M context) --- .../hypaware-ai-adoption-report/SKILL.md | 85 +++++++------- .../hypaware-ai-improvement-report/SKILL.md | 83 +++++++------- .../hypaware-ai-security-report/SKILL.md | 62 ++++++----- .../skills/hypaware-ai-spend-report/SKILL.md | 104 +++++++++--------- .../claude/skills/hypaware-query/SKILL.md | 2 +- .../hypaware-ai-adoption-report/SKILL.md | 85 +++++++------- .../hypaware-ai-improvement-report/SKILL.md | 83 +++++++------- .../hypaware-ai-security-report/SKILL.md | 62 ++++++----- .../skills/hypaware-ai-spend-report/SKILL.md | 104 +++++++++--------- .../codex/skills/hypaware-query/SKILL.md | 2 +- 10 files changed, 352 insertions(+), 320 deletions(-) diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md index 7bb1c0da..9a5d674f 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md @@ -1,46 +1,45 @@ --- name: hypaware-ai-adoption-report -description: AI Adoption Profile for a HypAware server — descriptive "who's using the fleet and how": per-gateway utilization (volume + focus — top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). +description: AI Adoption Profile for a HypAware server: descriptive "who's using the fleet and how": per-gateway utilization (volume + focus: top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). --- # AI Adoption Profile -The one **descriptive** report on *who's using the fleet and how* — no actions, just a clear -picture. Two lenses over one window: +A profile of *who's using the fleet and how*, read through two lenses over one window: -- **Utilization** — per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, +- **Utilization:** per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, active days, tokens, cache-read ratio) and *what it's focused on* (top models, tools, repos, work themes). -- **Parallelism / fan-out** — *how sophisticated*: multi-agent adoption, fan-out breadth/depth, +- **Parallelism / fan-out:** *how sophisticated*: multi-agent adoption, fan-out breadth/depth, true concurrency vs serial, the main-loop-vs-subagent token split, and whether fan-out appears to earn its cost. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to profile, then proceed against the chosen source. -**Descriptive only** — route every *action* out: "fan out more/less" and tooling go to +**Descriptive only.** Route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics -live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for +live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's token spine for any token figure. For descriptive who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected graph instead of scanning messages; keep token figures on messages. ## Procedure -1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't +1. **Scope + coverage.** Distinct `gateway_id` (the unit; `user_id` is ~always null, so don't measure reach by it); window; coverage of `gateway_id`, token usage, and subagent provenance - (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive + (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, - first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — + first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* @@ -51,34 +50,40 @@ graph instead of scanning messages; keep token figures on messages. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main - loop vs subagent). Read the payoff *descriptively* — fan-out vs tokens-to-resolution — and + loop vs subagent). Read the payoff *descriptively*: fan-out vs tokens-to-resolution, and route the "fan out more/less" recommendation to hypaware-ai-improvement-report. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-adoption-profile.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** how many gateways are active, the busiest one and its focus, and - the headline fan-out adoption %. -- **Per-gateway profile (sortable table):** one row per gateway — **gateway** · **volume** - (messages / sessions) · **tokens** · **cache-read %** · **fan-out** (adoption + main-vs-subagent - split) · **focus label**. Busiest first — this table is the spine. -- **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · - Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. -- **GitHub reach (where enriched):** in Per-gateway focus, show each gateway's repos and PRs - reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, not - an upper bound). -- **Formatting (human-readable):** open each section with its takeaway; **bold** headline - numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a - ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** The scope rules above ("descriptive only", what routes to which - report) are authoring guidance, never report copy. Don't write "no recommendations here" or - routing disclaimers in the report; state findings plainly, and where a sibling report owns the - action a plain cross-link is enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-adoption-profile.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Adoption Profile`, then a `## · ` subtitle + (e.g. `## HYP_CENTRAL fleet · 30-day view`). + 2. **Thesis** - ONE **bold** sentence carrying the whole story: how many gateways are active, + the busiest one + its focus, and the headline fan-out adoption %. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (e.g. `**3 active gateways** | phil, brendan, kenny`; busiest-gateway share; % sessions + using a subagent; % volume inside subagents; cost-capable vs volume-only). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-signal first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (e.g. adoption concentration, the + busiest power-user, subagent adoption, concurrency consistency, distinct per-gateway identities). + 5. **`## Caveat`** - the single most important caveat in plain language + a `[caveats →]` link + (for a transcript-only window: a volume-only profile, no cost/cache; note whether subagent + identity survived ingest). + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[scope & coverage](/scope-coverage.md) · [per-gateway utilization](/per-gateway-utilization.md) · [per-gateway focus](/per-gateway-focus.md) · [parallelism & fan-out](/parallelism-fan-out.md) · [payoff](/payoff.md) · [fleet view](/fleet-view.md) · [caveats](/caveats.md)`. +- **GitHub reach (where enriched):** the per-gateway-focus section file shows each gateway's repos + and PRs reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, + not an upper bound). +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 - caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation + caveat is "subagent identity must survive ingest", run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md index 0e42956f..09b1512f 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md @@ -9,11 +9,11 @@ Read how the team's agents actually behave in the logs, then propose the **addit edits to skills, subagents, and AGENTS.md/CLAUDE.md** that make them do **better work (quality)** with **less wasted effort and spend**. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. Query mechanics live in the **hypaware-query** skill; read it first. For descriptive @@ -22,11 +22,11 @@ skill: it reads relationships from the projected graph instead of scanning messa Focus on these signals: -- **Repeated work** — work the team repeats successfully → package it once (skill / subagent). -- **Sticking points** — where agents get stuck or redo work (errors, loops, refusals, +- **Repeated work:** work the team repeats successfully → package it once (skill / subagent). +- **Sticking points:** where agents get stuck or redo work (errors, loops, refusals, abandonment) → the missing or too-weak instruction that would prevent it (an AGENTS.md rule, or a skill). -- **Inefficiency** — expensive patterns (model over-spec, context bloat, low cache-read, +- **Inefficiency:** expensive patterns (model over-spec, context bloat, low cache-read, retry loops) → the setup change that costs less (right-size the model in AGENTS.md / a subagent, a context-hygiene rule, a skill that avoids the redo). @@ -35,57 +35,58 @@ Focus on these signals: claude/codex mix. State N and breadth; flag if single-contributor. 2. **Scan the signals for possible improvements.** Work the three signals above; each turns up candidates. Note frequency (sessions, distinct gateways) and redact examples. - - **Repeated work** — cluster sessions by shared-file overlap and tool-set signature into + - **Repeated work:** cluster sessions by shared-file overlap and tool-set signature into recurring kinds of work (reuse a recent `hypaware-reports/` *Leverage candidates* handoff if present). Flag parallelizable work done serially (low `is_sidechain`, few `agent_id`), and recurring asks / multi-step workflows / re-sent instructions in sampled prompts + `system_text`. - - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing + - **Sticking points:** where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or heavily-reviewed work is not necessarily bad). - - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no + - **Inefficiency** (reuse the spend spine): model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible - improvements; drop anything an existing artifact already covers — a quick scan of the - repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read — every + improvements; drop anything an existing artifact already covers: a quick scan of the + repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read; every other signal is the logs), used to dedup and to mark each survivor **new** vs **edit to an existing artifact**. For each, suggest a likely form - where one obviously fits — a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md - edit** (propose the concrete line(s) as a diff, not "consider documenting") — but don't + where one obviously fits: a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md + edit** (propose the concrete line(s) as a diff, not "consider documenting"), but don't force a mapping. Attach evidence (frequency/impact + distinct gateways + token prize) and rank by it. - - **Size the token prize** off the spend spine (deduped `max()` per `message_id`; a plain - `SUM` overcounts ~3×). Give two numbers, kept distinct: **exposure (measured)** — tokens + - **Size the token prize** off the spend spine (usage is one row per response, so a plain `SUM` + over assistant `attributes.usage` rows is correct; no dedup needed, LLP 0035). Give two numbers, kept distinct: **exposure (measured)**: tokens currently flowing through the issue (sum across a repeated cluster, or tokens in - error/retry/abandoned turns) — and **est. saving (assumption)** only where the - counterfactual is clean (cache-read ratio, model right-size). Both are floors — capture + error/retry/abandoned turns), and **est. saving (assumption)** only where the + counterfactual is clean (cache-read ratio, model right-size). Both are floors; capture is partial; never present a saving as if it were measured. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-improvement-review.md` (create the dir - if needed). Dated so reviews accumulate. -- **Bottom line (1 sentence):** "Create/edit these N things and what it will improve.". -- **Improvements (ranked table):** one row per improvement — **what** · **suggested form** - (skill / subagent / AGENTS.md edit, where one fits) · **evidence** (recurs N× across G - gateways, sticking-point impact) · **token prize** (exposure measured / est. saving, - labeled) · **where it lands**. Highest-leverage first. -- **Then the detail**, in these sections: **Basis · Skill candidates · Subagent candidates - (fan-out + roles) · AGENTS.md/CLAUDE.md edits · Efficiency → setup changes · Caveats** — - plus the proposed AGENTS.md lines verbatim. Each candidate is marked **new** or **edit to - **; don't list artifacts that aren't being changed. -- **Formatting (human-readable):** every candidate states its evidence inline; show - AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep - it a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-improvement-review.md` (create the dir if + needed). Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Improvement Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: "Create/edit these N things and what it will improve." + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (# improvements, new-vs-edit split, the single biggest token prize, the basis: + gateways / sessions / repos). This is the ONLY table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings = the top improvements, highest-leverage + first; each is a `### N. ` naming **what + its form** (skill / subagent / AGENTS.md + edit), followed by 2-3 plain-language sentences with the evidence and the **bold** token prize, + then a single `[
→](/.md)` link on its own line. Show AGENTS.md edits as + real diffs/code blocks in their section file, not prose. + 5. **`## Caveat`** - token prizes are floors and capture is partial, in plain language + a + `[caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[basis](/basis.md) · [skill candidates](/skill-candidates.md) · [subagent candidates](/subagent-candidates.md) · [AGENTS.md edits](/agents-md-edits.md) · [efficiency → setup](/efficiency-setup.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md index f5946280..79a85d38 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md @@ -1,6 +1,6 @@ --- name: hypaware-ai-security-report -description: Security & Risk Review for a HypAware server — audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code — for a code diff use a dedicated code-review tool. +description: Security & Risk Review for a HypAware server: audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code; for a code diff use a dedicated code-review tool. --- # Security & Risk Review @@ -10,15 +10,15 @@ unsupervised that carries risk**, and the guardrails to contain it. This audits activity*, not pending code (for a diff, use a dedicated code-review tool). Be proportionate: separate "ran a risky command" from "caused harm". -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to audit, then proceed against the chosen source. -**REDACT everything** — never echo a secret, token, key, or credential value in the report, +**REDACT everything.** Never echo a secret, token, key, or credential value in the report, even one you found in `tool_args`; a raw secret appearing in args is itself a capture finding (flag it, don't reproduce it). Query mechanics live in the **hypaware-query** skill; read it first. @@ -32,12 +32,12 @@ even one you found in `tool_args`; a raw secret appearing in args is itself a ca | Network egress | outbound `curl`/`scp`/`nc` POSTs to external hosts | potential exfiltration | | Package install | `npm i`, `pip install`, `brew install`, `apt` | supply-chain surface | -Risk is **amplified by autonomy** — weight anything that ran under `bypassPermissions` higher +Risk is **amplified by autonomy**: weight anything that ran under `bypassPermissions` higher than the same command in a gated session. ## Procedure 1. **Coverage + autonomy baseline.** Window; coverage of `tool_args` on Bash calls and of - `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit — `user_id` is + `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit; `user_id` is ~always null). Total tool calls, % Bash, % conversations in `bypassPermissions`. State N; if it's a dogfood window, say so. 2. **Scan + rank.** Histogram the first token of each Bash command (top 30 = the normal), then @@ -49,26 +49,30 @@ than the same command in a gated session. policy**, an **agent-instructions rule** (AGENTS.md / CLAUDE.md), or a **gateway redaction** fix for any secret seen in args. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-security-review.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** the posture — how autonomous the fleet is (% `bypassPermissions`) - and whether anything needs urgent attention. -- **Findings (severity-ranked table):** one row per finding — **risk class** · **severity** - (high / med / low) · **count** · **example (redacted)** · **guardrail** · **where it ran**. - Highest severity first. -- **Then the detail**, in these sections: **Autonomy baseline · Command profile · Risk findings - · Guardrails · Trends · Capture caveats**. **All examples redacted.** -- **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one - severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. **All examples redacted in every file.** + +- **Main one-pager:** `hypaware-reports/-security-review.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# Security & Risk Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: the posture - how autonomous the fleet is + (% `bypassPermissions`) and whether anything needs urgent attention. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link and all redacted (% `bypassPermissions`, count of HIGH findings, the top risk class, + total tool calls + % Bash, coverage of `tool_args` / `permission_mode`). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-severity first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** severities (redacted), + then a single `[
→](/.md)` link on its own line (autonomy posture, the top + risk findings, the single most important guardrail). + 5. **`## Caveat`** - the standing capture caveat in plain language + a `[capture caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[autonomy baseline](/autonomy-baseline.md) · [command profile](/command-profile.md) · [risk findings](/risk-findings.md) · [guardrails](/guardrails.md) · [trends](/trends.md) · [capture caveats](/capture-caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file (redaction still applies); state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is - "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. + "gateway must redact secrets in tool_args", put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md index 4eaa9335..33599948 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md @@ -1,46 +1,48 @@ --- name: hypaware-ai-spend-report -description: AI Spend Review for a HypAware server — where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Deduped token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. +description: AI Spend Review for a HypAware server: where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. --- # AI Spend Review -Where the team's tokens go, on what work, and how to spend less — read off one deduped token +Where the team's tokens go, on what work, and how to spend less, read off one token spine. One story in three moves: **where it goes** (by user / repo / model / gateway, over time), **on what work** (sessions clustered into recurring work-types, sized by tokens), and **how to spend less** (a waste scorecard + ranked cost levers). -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -**Tokens, never dollars** — capture is partial, so stop at token volume and behavioral proxies. +**Tokens, never dollars.** Capture is partial, so stop at token volume and behavioral proxies. Query mechanics (`--remote`, date-pruning the 30s timeout, the SQL dialect, sampling) live in the **hypaware-query** skill; read it first. -## Token math (get this right — every breakdown reconciles to it) +## Token math (get this right; every breakdown reconciles to it) Usage is in `attributes.usage` (NOT `raw_frame`): `input_tokens`, `output_tokens`, -`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Dedup by message -and `max()` — a plain `SUM` overcounts ~3×. Report the four types separately (cache-read is +`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Usage rides exactly +one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` +over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035); `input_tokens` +is net of cache, so it never double-counts. Report the four types separately (cache-read is usually the bulk; output the scarce slice). ```sql -WITH msg AS ( - SELECT COALESCE(CAST(JSON_EXTRACT(raw_frame,'$.message_id') AS VARCHAR), message_id) mid, - max(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) inp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) outp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) cwrite, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) cread - FROM ai_gateway_messages - WHERE date BETWEEN '' AND '' - AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL - GROUP BY mid) -SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM msg; --- slice by adding max(gateway_id)/max(model)/max(repo_root)/max(date) inside, GROUP BY outside. +SELECT + sum(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) t_in, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) t_out, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) t_cw, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) t_cr +FROM ai_gateway_messages +WHERE date BETWEEN '' AND '' + AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL; +-- One carrier row per response (LLP 0035): a plain SUM is correct, no dedup. +-- Slice by adding gateway_id / model / repo_root / date to SELECT + GROUP BY. +-- Defensive equivalent: max(...) GROUP BY session_id, message_id -- session_id is the +-- uniform key; conversation_id is null for Claude and only separates Codex threads. ``` ## Procedure @@ -51,34 +53,38 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms proxies (turns, tool calls, length), labeled as estimates. 2. **Where it goes + on what work.** Slice the spine by user / repo / model (→ `(unknown)` bucket) / gateway with shares, plus trends (by week, WoW deltas vs the last report) and - top-spend outliers. Then cluster sessions into recurring **work-types** — shared-file overlap - for code work, tool-set signature for no-file work (context graph if projected, else SQL) — - and size each by deduped tokens. -3. **How to spend less.** Score the waste — cache-read ratio `cache_read/(cache_read+input)` - (the biggest lever — feature it), retry loops, abandoned costly sessions, model over-spec, - context bloat — and turn the top few into ranked cost levers, each with an estimated weekly + top-spend outliers. Then cluster sessions into recurring **work-types**: shared-file overlap + for code work, tool-set signature for no-file work (context graph if projected, else SQL), + and size each by tokens. +3. **How to spend less.** Score the waste: cache-read ratio `cache_read/(cache_read+input)` + (the biggest lever, feature it), retry loops, abandoned costly sessions, model over-spec, + context bloat, and turn the top few into ranked cost levers, each with an estimated weekly token saving. Hand the expensive recurring work-types to **hypaware-ai-improvement-report** as packaging candidates (surface them; don't build them here). -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-spend-review.md` (create the dir if needed). Dated - so reviews accumulate. -- **Bottom line (1 sentence):** total tokens this window, the trend vs the last report (▲/▼ %), - and the single biggest driver. -- **Cost actions (ranked table):** one row per lever — **action** · **evidence** (tokens/share - + cause) · **est. weekly saving** · **where it lands**. Highest-saving first. -- **Then the detail**, in these sections: **Coverage & footprint · Total spend (4 token types) - · Allocation (user / repo / model / gateway) · Trends · Outliers · Spend by work-type · Waste - scorecard · Leverage candidates (handoff to hypaware-ai-improvement-report) · Caveats**. -- **Formatting (human-readable):** open each section with its one-sentence takeaway, then the - table; **bold** headline numbers; split the four token types (never collapse them); keep the - bottom line + ranked table a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-spend-review.md` (create the dir if needed). + Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Spend Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: total tokens this window, the trend vs the last report + (▲/▼ %), and the single biggest driver. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (total tokens with the four types split, WoW trend, the top allocation who/what + dominates, the biggest waste lever + est. weekly saving, usage coverage). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-leverage first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (where it goes, on what work, the + biggest waste lever, the top cost action). + 5. **`## Caveat`** - tokens-never-dollars and partial capture, in plain language + a `[caveats →]` + link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[coverage](/coverage.md) · [total spend](/total-spend.md) · [allocation](/allocation.md) · [trends](/trends.md) · [outliers](/outliers.md) · [work-types](/work-types.md) · [waste scorecard](/waste-scorecard.md) · [leverage candidates](/leverage-candidates.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md index d9a1d654..e47de5db 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md @@ -76,7 +76,7 @@ Key columns: - `tool_name`, `tool_call_id`, `tool_args`, `status` — tool-call/result joins and sparse status such as `finish_reason`. - `attributes` (JSON) — request settings, usage, propagated `dev_run_id`, and gateway diagnostics under `attributes.gateway`. -**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`; usage repeats across a message's content parts, so dedup per `message_id` (e.g. `max(...)`) before summing. +**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`. Usage rides exactly one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035). If you prefer a defensive dedup, `max(...) GROUP BY session_id, message_id` returns the same number: key on `session_id` (`conversation_id` is null for Claude, and only separates threads within a Codex session). Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, `entrypoint`, `client_version`, `user_type`, `permission_mode`, and `hook_event` when the local Claude Code JSONL transcript can be matched. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md index 7bb1c0da..9a5d674f 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md @@ -1,46 +1,45 @@ --- name: hypaware-ai-adoption-report -description: AI Adoption Profile for a HypAware server — descriptive "who's using the fleet and how": per-gateway utilization (volume + focus — top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). +description: AI Adoption Profile for a HypAware server: descriptive "who's using the fleet and how": per-gateway utilization (volume + focus: top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). --- # AI Adoption Profile -The one **descriptive** report on *who's using the fleet and how* — no actions, just a clear -picture. Two lenses over one window: +A profile of *who's using the fleet and how*, read through two lenses over one window: -- **Utilization** — per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, +- **Utilization:** per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, active days, tokens, cache-read ratio) and *what it's focused on* (top models, tools, repos, work themes). -- **Parallelism / fan-out** — *how sophisticated*: multi-agent adoption, fan-out breadth/depth, +- **Parallelism / fan-out:** *how sophisticated*: multi-agent adoption, fan-out breadth/depth, true concurrency vs serial, the main-loop-vs-subagent token split, and whether fan-out appears to earn its cost. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to profile, then proceed against the chosen source. -**Descriptive only** — route every *action* out: "fan out more/less" and tooling go to +**Descriptive only.** Route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics -live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for +live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's token spine for any token figure. For descriptive who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected graph instead of scanning messages; keep token figures on messages. ## Procedure -1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't +1. **Scope + coverage.** Distinct `gateway_id` (the unit; `user_id` is ~always null, so don't measure reach by it); window; coverage of `gateway_id`, token usage, and subagent provenance - (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive + (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, - first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — + first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* @@ -51,34 +50,40 @@ graph instead of scanning messages; keep token figures on messages. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main - loop vs subagent). Read the payoff *descriptively* — fan-out vs tokens-to-resolution — and + loop vs subagent). Read the payoff *descriptively*: fan-out vs tokens-to-resolution, and route the "fan out more/less" recommendation to hypaware-ai-improvement-report. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-adoption-profile.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** how many gateways are active, the busiest one and its focus, and - the headline fan-out adoption %. -- **Per-gateway profile (sortable table):** one row per gateway — **gateway** · **volume** - (messages / sessions) · **tokens** · **cache-read %** · **fan-out** (adoption + main-vs-subagent - split) · **focus label**. Busiest first — this table is the spine. -- **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · - Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. -- **GitHub reach (where enriched):** in Per-gateway focus, show each gateway's repos and PRs - reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, not - an upper bound). -- **Formatting (human-readable):** open each section with its takeaway; **bold** headline - numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a - ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** The scope rules above ("descriptive only", what routes to which - report) are authoring guidance, never report copy. Don't write "no recommendations here" or - routing disclaimers in the report; state findings plainly, and where a sibling report owns the - action a plain cross-link is enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-adoption-profile.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Adoption Profile`, then a `## · ` subtitle + (e.g. `## HYP_CENTRAL fleet · 30-day view`). + 2. **Thesis** - ONE **bold** sentence carrying the whole story: how many gateways are active, + the busiest one + its focus, and the headline fan-out adoption %. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (e.g. `**3 active gateways** | phil, brendan, kenny`; busiest-gateway share; % sessions + using a subagent; % volume inside subagents; cost-capable vs volume-only). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-signal first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (e.g. adoption concentration, the + busiest power-user, subagent adoption, concurrency consistency, distinct per-gateway identities). + 5. **`## Caveat`** - the single most important caveat in plain language + a `[caveats →]` link + (for a transcript-only window: a volume-only profile, no cost/cache; note whether subagent + identity survived ingest). + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[scope & coverage](/scope-coverage.md) · [per-gateway utilization](/per-gateway-utilization.md) · [per-gateway focus](/per-gateway-focus.md) · [parallelism & fan-out](/parallelism-fan-out.md) · [payoff](/payoff.md) · [fleet view](/fleet-view.md) · [caveats](/caveats.md)`. +- **GitHub reach (where enriched):** the per-gateway-focus section file shows each gateway's repos + and PRs reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, + not an upper bound). +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 - caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation + caveat is "subagent identity must survive ingest", run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md index 0e42956f..09b1512f 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md @@ -9,11 +9,11 @@ Read how the team's agents actually behave in the logs, then propose the **addit edits to skills, subagents, and AGENTS.md/CLAUDE.md** that make them do **better work (quality)** with **less wasted effort and spend**. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. Query mechanics live in the **hypaware-query** skill; read it first. For descriptive @@ -22,11 +22,11 @@ skill: it reads relationships from the projected graph instead of scanning messa Focus on these signals: -- **Repeated work** — work the team repeats successfully → package it once (skill / subagent). -- **Sticking points** — where agents get stuck or redo work (errors, loops, refusals, +- **Repeated work:** work the team repeats successfully → package it once (skill / subagent). +- **Sticking points:** where agents get stuck or redo work (errors, loops, refusals, abandonment) → the missing or too-weak instruction that would prevent it (an AGENTS.md rule, or a skill). -- **Inefficiency** — expensive patterns (model over-spec, context bloat, low cache-read, +- **Inefficiency:** expensive patterns (model over-spec, context bloat, low cache-read, retry loops) → the setup change that costs less (right-size the model in AGENTS.md / a subagent, a context-hygiene rule, a skill that avoids the redo). @@ -35,57 +35,58 @@ Focus on these signals: claude/codex mix. State N and breadth; flag if single-contributor. 2. **Scan the signals for possible improvements.** Work the three signals above; each turns up candidates. Note frequency (sessions, distinct gateways) and redact examples. - - **Repeated work** — cluster sessions by shared-file overlap and tool-set signature into + - **Repeated work:** cluster sessions by shared-file overlap and tool-set signature into recurring kinds of work (reuse a recent `hypaware-reports/` *Leverage candidates* handoff if present). Flag parallelizable work done serially (low `is_sidechain`, few `agent_id`), and recurring asks / multi-step workflows / re-sent instructions in sampled prompts + `system_text`. - - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing + - **Sticking points:** where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or heavily-reviewed work is not necessarily bad). - - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no + - **Inefficiency** (reuse the spend spine): model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible - improvements; drop anything an existing artifact already covers — a quick scan of the - repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read — every + improvements; drop anything an existing artifact already covers: a quick scan of the + repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read; every other signal is the logs), used to dedup and to mark each survivor **new** vs **edit to an existing artifact**. For each, suggest a likely form - where one obviously fits — a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md - edit** (propose the concrete line(s) as a diff, not "consider documenting") — but don't + where one obviously fits: a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md + edit** (propose the concrete line(s) as a diff, not "consider documenting"), but don't force a mapping. Attach evidence (frequency/impact + distinct gateways + token prize) and rank by it. - - **Size the token prize** off the spend spine (deduped `max()` per `message_id`; a plain - `SUM` overcounts ~3×). Give two numbers, kept distinct: **exposure (measured)** — tokens + - **Size the token prize** off the spend spine (usage is one row per response, so a plain `SUM` + over assistant `attributes.usage` rows is correct; no dedup needed, LLP 0035). Give two numbers, kept distinct: **exposure (measured)**: tokens currently flowing through the issue (sum across a repeated cluster, or tokens in - error/retry/abandoned turns) — and **est. saving (assumption)** only where the - counterfactual is clean (cache-read ratio, model right-size). Both are floors — capture + error/retry/abandoned turns), and **est. saving (assumption)** only where the + counterfactual is clean (cache-read ratio, model right-size). Both are floors; capture is partial; never present a saving as if it were measured. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-improvement-review.md` (create the dir - if needed). Dated so reviews accumulate. -- **Bottom line (1 sentence):** "Create/edit these N things and what it will improve.". -- **Improvements (ranked table):** one row per improvement — **what** · **suggested form** - (skill / subagent / AGENTS.md edit, where one fits) · **evidence** (recurs N× across G - gateways, sticking-point impact) · **token prize** (exposure measured / est. saving, - labeled) · **where it lands**. Highest-leverage first. -- **Then the detail**, in these sections: **Basis · Skill candidates · Subagent candidates - (fan-out + roles) · AGENTS.md/CLAUDE.md edits · Efficiency → setup changes · Caveats** — - plus the proposed AGENTS.md lines verbatim. Each candidate is marked **new** or **edit to - **; don't list artifacts that aren't being changed. -- **Formatting (human-readable):** every candidate states its evidence inline; show - AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep - it a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-improvement-review.md` (create the dir if + needed). Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Improvement Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: "Create/edit these N things and what it will improve." + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (# improvements, new-vs-edit split, the single biggest token prize, the basis: + gateways / sessions / repos). This is the ONLY table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings = the top improvements, highest-leverage + first; each is a `### N. ` naming **what + its form** (skill / subagent / AGENTS.md + edit), followed by 2-3 plain-language sentences with the evidence and the **bold** token prize, + then a single `[
→](/.md)` link on its own line. Show AGENTS.md edits as + real diffs/code blocks in their section file, not prose. + 5. **`## Caveat`** - token prizes are floors and capture is partial, in plain language + a + `[caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[basis](/basis.md) · [skill candidates](/skill-candidates.md) · [subagent candidates](/subagent-candidates.md) · [AGENTS.md edits](/agents-md-edits.md) · [efficiency → setup](/efficiency-setup.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md index f5946280..79a85d38 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md @@ -1,6 +1,6 @@ --- name: hypaware-ai-security-report -description: Security & Risk Review for a HypAware server — audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code — for a code diff use a dedicated code-review tool. +description: Security & Risk Review for a HypAware server: audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code; for a code diff use a dedicated code-review tool. --- # Security & Risk Review @@ -10,15 +10,15 @@ unsupervised that carries risk**, and the guardrails to contain it. This audits activity*, not pending code (for a diff, use a dedicated code-review tool). Be proportionate: separate "ran a risky command" from "caused harm". -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to audit, then proceed against the chosen source. -**REDACT everything** — never echo a secret, token, key, or credential value in the report, +**REDACT everything.** Never echo a secret, token, key, or credential value in the report, even one you found in `tool_args`; a raw secret appearing in args is itself a capture finding (flag it, don't reproduce it). Query mechanics live in the **hypaware-query** skill; read it first. @@ -32,12 +32,12 @@ even one you found in `tool_args`; a raw secret appearing in args is itself a ca | Network egress | outbound `curl`/`scp`/`nc` POSTs to external hosts | potential exfiltration | | Package install | `npm i`, `pip install`, `brew install`, `apt` | supply-chain surface | -Risk is **amplified by autonomy** — weight anything that ran under `bypassPermissions` higher +Risk is **amplified by autonomy**: weight anything that ran under `bypassPermissions` higher than the same command in a gated session. ## Procedure 1. **Coverage + autonomy baseline.** Window; coverage of `tool_args` on Bash calls and of - `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit — `user_id` is + `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit; `user_id` is ~always null). Total tool calls, % Bash, % conversations in `bypassPermissions`. State N; if it's a dogfood window, say so. 2. **Scan + rank.** Histogram the first token of each Bash command (top 30 = the normal), then @@ -49,26 +49,30 @@ than the same command in a gated session. policy**, an **agent-instructions rule** (AGENTS.md / CLAUDE.md), or a **gateway redaction** fix for any secret seen in args. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-security-review.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** the posture — how autonomous the fleet is (% `bypassPermissions`) - and whether anything needs urgent attention. -- **Findings (severity-ranked table):** one row per finding — **risk class** · **severity** - (high / med / low) · **count** · **example (redacted)** · **guardrail** · **where it ran**. - Highest severity first. -- **Then the detail**, in these sections: **Autonomy baseline · Command profile · Risk findings - · Guardrails · Trends · Capture caveats**. **All examples redacted.** -- **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one - severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. **All examples redacted in every file.** + +- **Main one-pager:** `hypaware-reports/-security-review.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# Security & Risk Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: the posture - how autonomous the fleet is + (% `bypassPermissions`) and whether anything needs urgent attention. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link and all redacted (% `bypassPermissions`, count of HIGH findings, the top risk class, + total tool calls + % Bash, coverage of `tool_args` / `permission_mode`). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-severity first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** severities (redacted), + then a single `[
→](/.md)` link on its own line (autonomy posture, the top + risk findings, the single most important guardrail). + 5. **`## Caveat`** - the standing capture caveat in plain language + a `[capture caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[autonomy baseline](/autonomy-baseline.md) · [command profile](/command-profile.md) · [risk findings](/risk-findings.md) · [guardrails](/guardrails.md) · [trends](/trends.md) · [capture caveats](/capture-caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file (redaction still applies); state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is - "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. + "gateway must redact secrets in tool_args", put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md index 4eaa9335..33599948 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md @@ -1,46 +1,48 @@ --- name: hypaware-ai-spend-report -description: AI Spend Review for a HypAware server — where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Deduped token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. +description: AI Spend Review for a HypAware server: where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. --- # AI Spend Review -Where the team's tokens go, on what work, and how to spend less — read off one deduped token +Where the team's tokens go, on what work, and how to spend less, read off one token spine. One story in three moves: **where it goes** (by user / repo / model / gateway, over time), **on what work** (sessions clustered into recurring work-types, sized by tokens), and **how to spend less** (a waste scorecard + ranked cost levers). -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -**Tokens, never dollars** — capture is partial, so stop at token volume and behavioral proxies. +**Tokens, never dollars.** Capture is partial, so stop at token volume and behavioral proxies. Query mechanics (`--remote`, date-pruning the 30s timeout, the SQL dialect, sampling) live in the **hypaware-query** skill; read it first. -## Token math (get this right — every breakdown reconciles to it) +## Token math (get this right; every breakdown reconciles to it) Usage is in `attributes.usage` (NOT `raw_frame`): `input_tokens`, `output_tokens`, -`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Dedup by message -and `max()` — a plain `SUM` overcounts ~3×. Report the four types separately (cache-read is +`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Usage rides exactly +one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` +over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035); `input_tokens` +is net of cache, so it never double-counts. Report the four types separately (cache-read is usually the bulk; output the scarce slice). ```sql -WITH msg AS ( - SELECT COALESCE(CAST(JSON_EXTRACT(raw_frame,'$.message_id') AS VARCHAR), message_id) mid, - max(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) inp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) outp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) cwrite, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) cread - FROM ai_gateway_messages - WHERE date BETWEEN '' AND '' - AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL - GROUP BY mid) -SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM msg; --- slice by adding max(gateway_id)/max(model)/max(repo_root)/max(date) inside, GROUP BY outside. +SELECT + sum(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) t_in, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) t_out, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) t_cw, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) t_cr +FROM ai_gateway_messages +WHERE date BETWEEN '' AND '' + AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL; +-- One carrier row per response (LLP 0035): a plain SUM is correct, no dedup. +-- Slice by adding gateway_id / model / repo_root / date to SELECT + GROUP BY. +-- Defensive equivalent: max(...) GROUP BY session_id, message_id -- session_id is the +-- uniform key; conversation_id is null for Claude and only separates Codex threads. ``` ## Procedure @@ -51,34 +53,38 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms proxies (turns, tool calls, length), labeled as estimates. 2. **Where it goes + on what work.** Slice the spine by user / repo / model (→ `(unknown)` bucket) / gateway with shares, plus trends (by week, WoW deltas vs the last report) and - top-spend outliers. Then cluster sessions into recurring **work-types** — shared-file overlap - for code work, tool-set signature for no-file work (context graph if projected, else SQL) — - and size each by deduped tokens. -3. **How to spend less.** Score the waste — cache-read ratio `cache_read/(cache_read+input)` - (the biggest lever — feature it), retry loops, abandoned costly sessions, model over-spec, - context bloat — and turn the top few into ranked cost levers, each with an estimated weekly + top-spend outliers. Then cluster sessions into recurring **work-types**: shared-file overlap + for code work, tool-set signature for no-file work (context graph if projected, else SQL), + and size each by tokens. +3. **How to spend less.** Score the waste: cache-read ratio `cache_read/(cache_read+input)` + (the biggest lever, feature it), retry loops, abandoned costly sessions, model over-spec, + context bloat, and turn the top few into ranked cost levers, each with an estimated weekly token saving. Hand the expensive recurring work-types to **hypaware-ai-improvement-report** as packaging candidates (surface them; don't build them here). -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-spend-review.md` (create the dir if needed). Dated - so reviews accumulate. -- **Bottom line (1 sentence):** total tokens this window, the trend vs the last report (▲/▼ %), - and the single biggest driver. -- **Cost actions (ranked table):** one row per lever — **action** · **evidence** (tokens/share - + cause) · **est. weekly saving** · **where it lands**. Highest-saving first. -- **Then the detail**, in these sections: **Coverage & footprint · Total spend (4 token types) - · Allocation (user / repo / model / gateway) · Trends · Outliers · Spend by work-type · Waste - scorecard · Leverage candidates (handoff to hypaware-ai-improvement-report) · Caveats**. -- **Formatting (human-readable):** open each section with its one-sentence takeaway, then the - table; **bold** headline numbers; split the four token types (never collapse them); keep the - bottom line + ranked table a ~1-minute read. -- **Every section is analysis, not inventory.** Detail sections (or sub-pages, if the report is - split) hold the same standard as the main page: each argues one claim, opens with that - takeaway, and ties every number to what it means for the reader. Cut table narration ("how to - read the table") and standing bookkeeping prose; compress source/window/method to a few lines - and fold stat-only content into the table it supports. -- **No scope apologies.** Scope rules (what routes to which report) are authoring guidance, - never report copy. Don't write "no recommendations here" or routing disclaimers in the report; - state findings plainly, and where a sibling report owns the action a plain cross-link is - enough. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-spend-review.md` (create the dir if needed). + Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Spend Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: total tokens this window, the trend vs the last report + (▲/▼ %), and the single biggest driver. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (total tokens with the four types split, WoW trend, the top allocation who/what + dominates, the biggest waste lever + est. weekly saving, usage coverage). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-leverage first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (where it goes, on what work, the + biggest waste lever, the top cost action). + 5. **`## Caveat`** - tokens-never-dollars and partial capture, in plain language + a `[caveats →]` + link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[coverage](/coverage.md) · [total spend](/total-spend.md) · [allocation](/allocation.md) · [trends](/trends.md) · [outliers](/outliers.md) · [work-types](/work-types.md) · [waste scorecard](/waste-scorecard.md) · [leverage candidates](/leverage-candidates.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md index b85d511d..2940dd5e 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md @@ -76,7 +76,7 @@ Key columns: - `tool_name`, `tool_call_id`, `tool_args`, `status` — tool-call/result joins and sparse status such as `finish_reason`. - `attributes` (JSON) — request settings, usage, propagated `dev_run_id`, and gateway diagnostics under `attributes.gateway`. -**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`; usage repeats across a message's content parts, so dedup per `message_id` (e.g. `max(...)`) before summing. +**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`. Usage rides exactly one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035). If you prefer a defensive dedup, `max(...) GROUP BY session_id, message_id` returns the same number: key on `session_id` (`conversation_id` is null for Claude, and only separates threads within a Codex session). Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, `entrypoint`, `client_version`, `user_type`, `permission_mode`, and `hook_event` when the local Claude Code JSONL transcript can be matched. From ad70fcd33af0e2abe2275c01bd111d9d383f53af Mon Sep 17 00:00:00 2001 From: Brendan McMullen Date: Thu, 2 Jul 2026 14:53:01 -0700 Subject: [PATCH 3/3] adoption skill: make GitHub-enrichment reach a mandatory probe, not a guess Ported from a hand-edit made in the installed ~/.claude copy, mirrored across the claude/ and codex/ workspaces. Docs/skills only. Step 1 now requires actively probing for GitHub enrichment every run (a concrete `node`/`edge` query against the source) rather than assuming the "if enriched" condition. When a `github.t0` projector with PullRequest / Review nodes is present, GitHub reach is in scope and MUST be computed in step 2 (attributing sessions to a gateway via `cwd`); when absent, the report states "checked - no GitHub enrichment present" explicitly. The phrasings "not assessed" / "not queried" for reach are banned, since they signal a skipped probe. De-em-dashed per repo code style. Co-Authored-By: Claude Opus 4.8 (1M context) --- .../hypaware-ai-adoption-report/SKILL.md | 23 +++++++++++++------ .../hypaware-ai-adoption-report/SKILL.md | 23 +++++++++++++------ 2 files changed, 32 insertions(+), 14 deletions(-) diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md index 9a5d674f..d397fd71 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md @@ -35,17 +35,26 @@ graph instead of scanning messages; keep token figures on messages. (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. - If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / - `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection - undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. + **ALWAYS verify GitHub enrichment before deciding it's out of scope: probe, never assume.** + A conditional ("if enriched…") obligates you to *test* the condition, not guess it. Run the + probe every run: `hyp query sql "SELECT node_type, projector, count(*) AS n, max(first_seen) + AS newest FROM node GROUP BY node_type, projector" --remote ` (and check `edge` exists via + `hyp query status`). If a `github.t0` projector with `PullRequest` / `Review` nodes is present, + **GitHub reach is IN SCOPE and MUST be computed in step 2**: record the node counts and the max + `first_seen` per type as the graph's as-of date, and treat every reach figure as a floor (a stale + projection undercounts). If the probe returns no `github.t0` / PR / Review nodes, state + **"checked - no GitHub enrichment present"** explicitly. **Never write "not assessed" / "not + queried" for reach**: that phrasing means you skipped the probe, which is the one thing this step + forbids. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* - from it: the repos and PRs its work actually landed in (`Session -at-> Commit <-references- - PullRequest`), and whether that work drew review (`... PullRequest <-on- Review <-submitted- - Actor`). This is footprint the messages cannot show. Keep it descriptive (route any "review + one-line **focus label**. When the step-1 probe found `github.t0` enrichment, you **MUST** add + each gateway's real *reach* from it: the repos and PRs its work actually landed in (`Session -at-> + Commit <-references- PullRequest`), and whether that work drew review (`... PullRequest <-on- Review + <-submitted- Actor`); attribute sessions to a gateway/person via session `cwd` (`/Users/`). + This is footprint the messages cannot show and is not optional when the graph supports it. Keep it descriptive (route any "review more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md index 9a5d674f..d397fd71 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md @@ -35,17 +35,26 @@ graph instead of scanning messages; keep token figures on messages. (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. - If the source is GitHub-enriched, also record graph coverage and freshness: whether `Repo` / - `PullRequest` / `Review` nodes exist and the max `first_seen` per type. A stale projection - undercounts reach, so state the graph's as-of date and treat every reach figure as a floor. + **ALWAYS verify GitHub enrichment before deciding it's out of scope: probe, never assume.** + A conditional ("if enriched…") obligates you to *test* the condition, not guess it. Run the + probe every run: `hyp query sql "SELECT node_type, projector, count(*) AS n, max(first_seen) + AS newest FROM node GROUP BY node_type, projector" --remote ` (and check `edge` exists via + `hyp query status`). If a `github.t0` projector with `PullRequest` / `Review` nodes is present, + **GitHub reach is IN SCOPE and MUST be computed in step 2**: record the node counts and the max + `first_seen` per type as the graph's as-of date, and treat every reach figure as a floor (a stale + projection undercounts). If the probe returns no `github.t0` / PR / Review nodes, state + **"checked - no GitHub enrichment present"** explicitly. **Never write "not assessed" / "not + queried" for reach**: that phrasing means you skipped the probe, which is the one thing this step + forbids. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Where the graph is GitHub-enriched, add each gateway's real *reach* - from it: the repos and PRs its work actually landed in (`Session -at-> Commit <-references- - PullRequest`), and whether that work drew review (`... PullRequest <-on- Review <-submitted- - Actor`). This is footprint the messages cannot show. Keep it descriptive (route any "review + one-line **focus label**. When the step-1 probe found `github.t0` enrichment, you **MUST** add + each gateway's real *reach* from it: the repos and PRs its work actually landed in (`Session -at-> + Commit <-references- PullRequest`), and whether that work drew review (`... PullRequest <-on- Review + <-submitted- Actor`); attribute sessions to a gateway/person via session `cwd` (`/Users/`). + This is footprint the messages cannot show and is not optional when the graph supports it. Keep it descriptive (route any "review more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency