diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md index 5c6ab15b..d397fd71 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-adoption-report/SKILL.md @@ -1,64 +1,98 @@ --- name: hypaware-ai-adoption-report -description: AI Adoption Profile for a HypAware server — descriptive "who's using the fleet and how": per-gateway utilization (volume + focus — top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). +description: AI Adoption Profile for a HypAware server: descriptive "who's using the fleet and how": per-gateway utilization (volume + focus: top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). --- # AI Adoption Profile -The one **descriptive** report on *who's using the fleet and how* — no actions, just a clear -picture. Two lenses over one window: +A profile of *who's using the fleet and how*, read through two lenses over one window: -- **Utilization** — per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, +- **Utilization:** per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, active days, tokens, cache-read ratio) and *what it's focused on* (top models, tools, repos, work themes). -- **Parallelism / fan-out** — *how sophisticated*: multi-agent adoption, fan-out breadth/depth, +- **Parallelism / fan-out:** *how sophisticated*: multi-agent adoption, fan-out breadth/depth, true concurrency vs serial, the main-loop-vs-subagent token split, and whether fan-out appears to earn its cost. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to profile, then proceed against the chosen source. -**Descriptive only** — route every *action* out: "fan out more/less" and tooling go to +**Descriptive only.** Route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics -live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for -any token figure. +live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's token spine for +any token figure. For descriptive who-used-what rollups (distinct sessions per +repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected +graph instead of scanning messages; keep token figures on messages. ## Procedure -1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't +1. **Scope + coverage.** Distinct `gateway_id` (the unit; `user_id` is ~always null, so don't measure reach by it); window; coverage of `gateway_id`, token usage, and subagent provenance - (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive + (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. + **ALWAYS verify GitHub enrichment before deciding it's out of scope: probe, never assume.** + A conditional ("if enriched…") obligates you to *test* the condition, not guess it. Run the + probe every run: `hyp query sql "SELECT node_type, projector, count(*) AS n, max(first_seen) + AS newest FROM node GROUP BY node_type, projector" --remote ` (and check `edge` exists via + `hyp query status`). If a `github.t0` projector with `PullRequest` / `Review` nodes is present, + **GitHub reach is IN SCOPE and MUST be computed in step 2**: record the node counts and the max + `first_seen` per type as the graph's as-of date, and treat every reach figure as a floor (a stale + projection undercounts). If the probe returns no `github.t0` / PR / Review nodes, state + **"checked - no GitHub enrichment present"** explicitly. **Never write "not assessed" / "not + queried" for reach**: that phrasing means you skipped the probe, which is the one thing this step + forbids. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, - first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — + first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Use the activity graph for structural focus (Session→Repo/PR via - GitHub enrichment) where projected. + one-line **focus label**. When the step-1 probe found `github.t0` enrichment, you **MUST** add + each gateway's real *reach* from it: the repos and PRs its work actually landed in (`Session -at-> + Commit <-references- PullRequest`), and whether that work drew review (`... PullRequest <-on- Review + <-submitted- Actor`); attribute sessions to a gateway/person via session `cwd` (`/Users/`). + This is footprint the messages cannot show and is not optional when the graph supports it. Keep it descriptive (route any "review + more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main - loop vs subagent). Read the payoff *descriptively* — fan-out vs tokens-to-resolution — and + loop vs subagent). Read the payoff *descriptively*: fan-out vs tokens-to-resolution, and route the "fan out more/less" recommendation to hypaware-ai-improvement-report. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-adoption-profile.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** how many gateways are active, the busiest one and its focus, and - the headline fan-out adoption %. -- **Per-gateway profile (sortable table):** one row per gateway — **gateway** · **volume** - (messages / sessions) · **tokens** · **cache-read %** · **fan-out** (adoption + main-vs-subagent - split) · **focus label**. Busiest first — this table is the spine. -- **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · - Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. -- **Formatting (human-readable):** open each section with its takeaway; **bold** headline - numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a - ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-adoption-profile.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Adoption Profile`, then a `## · ` subtitle + (e.g. `## HYP_CENTRAL fleet · 30-day view`). + 2. **Thesis** - ONE **bold** sentence carrying the whole story: how many gateways are active, + the busiest one + its focus, and the headline fan-out adoption %. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (e.g. `**3 active gateways** | phil, brendan, kenny`; busiest-gateway share; % sessions + using a subagent; % volume inside subagents; cost-capable vs volume-only). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-signal first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (e.g. adoption concentration, the + busiest power-user, subagent adoption, concurrency consistency, distinct per-gateway identities). + 5. **`## Caveat`** - the single most important caveat in plain language + a `[caveats →]` link + (for a transcript-only window: a volume-only profile, no cost/cache; note whether subagent + identity survived ingest). + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[scope & coverage](/scope-coverage.md) · [per-gateway utilization](/per-gateway-utilization.md) · [per-gateway focus](/per-gateway-focus.md) · [parallelism & fan-out](/parallelism-fan-out.md) · [payoff](/payoff.md) · [fleet view](/fleet-view.md) · [caveats](/caveats.md)`. +- **GitHub reach (where enriched):** the per-gateway-focus section file shows each gateway's repos + and PRs reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, + not an upper bound). +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 - caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation + caveat is "subagent identity must survive ingest", run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md index d45a4a4a..09b1512f 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-improvement-report/SKILL.md @@ -9,22 +9,24 @@ Read how the team's agents actually behave in the logs, then propose the **addit edits to skills, subagents, and AGENTS.md/CLAUDE.md** that make them do **better work (quality)** with **less wasted effort and spend**. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -Query mechanics live in the **hypaware-query** skill; read it first. +Query mechanics live in the **hypaware-query** skill; read it first. For descriptive +who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** +skill: it reads relationships from the projected graph instead of scanning messages. Focus on these signals: -- **Repeated work** — work the team repeats successfully → package it once (skill / subagent). -- **Sticking points** — where agents get stuck or redo work (errors, loops, refusals, +- **Repeated work:** work the team repeats successfully → package it once (skill / subagent). +- **Sticking points:** where agents get stuck or redo work (errors, loops, refusals, abandonment) → the missing or too-weak instruction that would prevent it (an AGENTS.md rule, or a skill). -- **Inefficiency** — expensive patterns (model over-spec, context bloat, low cache-read, +- **Inefficiency:** expensive patterns (model over-spec, context bloat, low cache-read, retry loops) → the setup change that costs less (right-size the model in AGENTS.md / a subagent, a context-hygiene rule, a skill that avoids the redo). @@ -33,45 +35,58 @@ Focus on these signals: claude/codex mix. State N and breadth; flag if single-contributor. 2. **Scan the signals for possible improvements.** Work the three signals above; each turns up candidates. Note frequency (sessions, distinct gateways) and redact examples. - - **Repeated work** — cluster sessions by shared-file overlap and tool-set signature into + - **Repeated work:** cluster sessions by shared-file overlap and tool-set signature into recurring kinds of work (reuse a recent `hypaware-reports/` *Leverage candidates* handoff if present). Flag parallelizable work done serially (low `is_sidechain`, few `agent_id`), and recurring asks / multi-step workflows / re-sent instructions in sampled prompts + `system_text`. - - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing + - **Sticking points:** where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, - repeatedly-violated conventions. - - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no + repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: + work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review + churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or + heavily-reviewed work is not necessarily bad). + - **Inefficiency** (reuse the spend spine): model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible - improvements; drop anything an existing artifact already covers — a quick scan of the - repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read — every + improvements; drop anything an existing artifact already covers: a quick scan of the + repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read; every other signal is the logs), used to dedup and to mark each survivor **new** vs **edit to an existing artifact**. For each, suggest a likely form - where one obviously fits — a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md - edit** (propose the concrete line(s) as a diff, not "consider documenting") — but don't + where one obviously fits: a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md + edit** (propose the concrete line(s) as a diff, not "consider documenting"), but don't force a mapping. Attach evidence (frequency/impact + distinct gateways + token prize) and rank by it. - - **Size the token prize** off the spend spine (deduped `max()` per `message_id`; a plain - `SUM` overcounts ~3×). Give two numbers, kept distinct: **exposure (measured)** — tokens + - **Size the token prize** off the spend spine (usage is one row per response, so a plain `SUM` + over assistant `attributes.usage` rows is correct; no dedup needed, LLP 0035). Give two numbers, kept distinct: **exposure (measured)**: tokens currently flowing through the issue (sum across a repeated cluster, or tokens in - error/retry/abandoned turns) — and **est. saving (assumption)** only where the - counterfactual is clean (cache-read ratio, model right-size). Both are floors — capture + error/retry/abandoned turns), and **est. saving (assumption)** only where the + counterfactual is clean (cache-read ratio, model right-size). Both are floors; capture is partial; never present a saving as if it were measured. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-improvement-review.md` (create the dir - if needed). Dated so reviews accumulate. -- **Bottom line (1 sentence):** "Create/edit these N things and what it will improve.". -- **Improvements (ranked table):** one row per improvement — **what** · **suggested form** - (skill / subagent / AGENTS.md edit, where one fits) · **evidence** (recurs N× across G - gateways, sticking-point impact) · **token prize** (exposure measured / est. saving, - labeled) · **where it lands**. Highest-leverage first. -- **Then the detail**, in these sections: **Basis · Skill candidates · Subagent candidates - (fan-out + roles) · AGENTS.md/CLAUDE.md edits · Efficiency → setup changes · Caveats** — - plus the proposed AGENTS.md lines verbatim. Each candidate is marked **new** or **edit to - **; don't list artifacts that aren't being changed. -- **Formatting (human-readable):** every candidate states its evidence inline; show - AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep - it a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-improvement-review.md` (create the dir if + needed). Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Improvement Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: "Create/edit these N things and what it will improve." + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (# improvements, new-vs-edit split, the single biggest token prize, the basis: + gateways / sessions / repos). This is the ONLY table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings = the top improvements, highest-leverage + first; each is a `### N. ` naming **what + its form** (skill / subagent / AGENTS.md + edit), followed by 2-3 plain-language sentences with the evidence and the **bold** token prize, + then a single `[
→](/.md)` link on its own line. Show AGENTS.md edits as + real diffs/code blocks in their section file, not prose. + 5. **`## Caveat`** - token prizes are floors and capture is partial, in plain language + a + `[caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[basis](/basis.md) · [skill candidates](/skill-candidates.md) · [subagent candidates](/subagent-candidates.md) · [AGENTS.md edits](/agents-md-edits.md) · [efficiency → setup](/efficiency-setup.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md index 4eafe2fb..79a85d38 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-security-report/SKILL.md @@ -1,6 +1,6 @@ --- name: hypaware-ai-security-report -description: Security & Risk Review for a HypAware server — audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code — for a code diff use a dedicated code-review tool. +description: Security & Risk Review for a HypAware server: audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code; for a code diff use a dedicated code-review tool. --- # Security & Risk Review @@ -10,15 +10,15 @@ unsupervised that carries risk**, and the guardrails to contain it. This audits activity*, not pending code (for a diff, use a dedicated code-review tool). Be proportionate: separate "ran a risky command" from "caused harm". -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to audit, then proceed against the chosen source. -**REDACT everything** — never echo a secret, token, key, or credential value in the report, +**REDACT everything.** Never echo a secret, token, key, or credential value in the report, even one you found in `tool_args`; a raw secret appearing in args is itself a capture finding (flag it, don't reproduce it). Query mechanics live in the **hypaware-query** skill; read it first. @@ -32,12 +32,12 @@ even one you found in `tool_args`; a raw secret appearing in args is itself a ca | Network egress | outbound `curl`/`scp`/`nc` POSTs to external hosts | potential exfiltration | | Package install | `npm i`, `pip install`, `brew install`, `apt` | supply-chain surface | -Risk is **amplified by autonomy** — weight anything that ran under `bypassPermissions` higher +Risk is **amplified by autonomy**: weight anything that ran under `bypassPermissions` higher than the same command in a gated session. ## Procedure 1. **Coverage + autonomy baseline.** Window; coverage of `tool_args` on Bash calls and of - `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit — `user_id` is + `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit; `user_id` is ~always null). Total tool calls, % Bash, % conversations in `bypassPermissions`. State N; if it's a dogfood window, say so. 2. **Scan + rank.** Histogram the first token of each Bash command (top 30 = the normal), then @@ -49,17 +49,30 @@ than the same command in a gated session. policy**, an **agent-instructions rule** (AGENTS.md / CLAUDE.md), or a **gateway redaction** fix for any secret seen in args. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-security-review.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** the posture — how autonomous the fleet is (% `bypassPermissions`) - and whether anything needs urgent attention. -- **Findings (severity-ranked table):** one row per finding — **risk class** · **severity** - (high / med / low) · **count** · **example (redacted)** · **guardrail** · **where it ran**. - Highest severity first. -- **Then the detail**, in these sections: **Autonomy baseline · Command profile · Risk findings - · Guardrails · Trends · Capture caveats**. **All examples redacted.** -- **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one - severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. **All examples redacted in every file.** + +- **Main one-pager:** `hypaware-reports/-security-review.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# Security & Risk Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: the posture - how autonomous the fleet is + (% `bypassPermissions`) and whether anything needs urgent attention. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link and all redacted (% `bypassPermissions`, count of HIGH findings, the top risk class, + total tool calls + % Bash, coverage of `tool_args` / `permission_mode`). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-severity first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** severities (redacted), + then a single `[
→](/.md)` link on its own line (autonomy posture, the top + risk findings, the single most important guardrail). + 5. **`## Caveat`** - the standing capture caveat in plain language + a `[capture caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[autonomy baseline](/autonomy-baseline.md) · [command profile](/command-profile.md) · [risk findings](/risk-findings.md) · [guardrails](/guardrails.md) · [trends](/trends.md) · [capture caveats](/capture-caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file (redaction still applies); state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is - "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. + "gateway must redact secrets in tool_args", put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md index b04ffa93..33599948 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-ai-spend-report/SKILL.md @@ -1,46 +1,48 @@ --- name: hypaware-ai-spend-report -description: AI Spend Review for a HypAware server — where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Deduped token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. +description: AI Spend Review for a HypAware server: where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. --- # AI Spend Review -Where the team's tokens go, on what work, and how to spend less — read off one deduped token +Where the team's tokens go, on what work, and how to spend less, read off one token spine. One story in three moves: **where it goes** (by user / repo / model / gateway, over time), **on what work** (sessions clustered into recurring work-types, sized by tokens), and **how to spend less** (a waste scorecard + ranked cost levers). -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -**Tokens, never dollars** — capture is partial, so stop at token volume and behavioral proxies. +**Tokens, never dollars.** Capture is partial, so stop at token volume and behavioral proxies. Query mechanics (`--remote`, date-pruning the 30s timeout, the SQL dialect, sampling) live in the **hypaware-query** skill; read it first. -## Token math (get this right — every breakdown reconciles to it) +## Token math (get this right; every breakdown reconciles to it) Usage is in `attributes.usage` (NOT `raw_frame`): `input_tokens`, `output_tokens`, -`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Dedup by message -and `max()` — a plain `SUM` overcounts ~3×. Report the four types separately (cache-read is +`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Usage rides exactly +one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` +over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035); `input_tokens` +is net of cache, so it never double-counts. Report the four types separately (cache-read is usually the bulk; output the scarce slice). ```sql -WITH msg AS ( - SELECT COALESCE(CAST(JSON_EXTRACT(raw_frame,'$.message_id') AS VARCHAR), message_id) mid, - max(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) inp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) outp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) cwrite, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) cread - FROM ai_gateway_messages - WHERE date BETWEEN '' AND '' - AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL - GROUP BY mid) -SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM msg; --- slice by adding max(gateway_id)/max(model)/max(repo_root)/max(date) inside, GROUP BY outside. +SELECT + sum(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) t_in, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) t_out, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) t_cw, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) t_cr +FROM ai_gateway_messages +WHERE date BETWEEN '' AND '' + AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL; +-- One carrier row per response (LLP 0035): a plain SUM is correct, no dedup. +-- Slice by adding gateway_id / model / repo_root / date to SELECT + GROUP BY. +-- Defensive equivalent: max(...) GROUP BY session_id, message_id -- session_id is the +-- uniform key; conversation_id is null for Claude and only separates Codex threads. ``` ## Procedure @@ -51,25 +53,38 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms proxies (turns, tool calls, length), labeled as estimates. 2. **Where it goes + on what work.** Slice the spine by user / repo / model (→ `(unknown)` bucket) / gateway with shares, plus trends (by week, WoW deltas vs the last report) and - top-spend outliers. Then cluster sessions into recurring **work-types** — shared-file overlap - for code work, tool-set signature for no-file work (context graph if projected, else SQL) — - and size each by deduped tokens. -3. **How to spend less.** Score the waste — cache-read ratio `cache_read/(cache_read+input)` - (the biggest lever — feature it), retry loops, abandoned costly sessions, model over-spec, - context bloat — and turn the top few into ranked cost levers, each with an estimated weekly + top-spend outliers. Then cluster sessions into recurring **work-types**: shared-file overlap + for code work, tool-set signature for no-file work (context graph if projected, else SQL), + and size each by tokens. +3. **How to spend less.** Score the waste: cache-read ratio `cache_read/(cache_read+input)` + (the biggest lever, feature it), retry loops, abandoned costly sessions, model over-spec, + context bloat, and turn the top few into ranked cost levers, each with an estimated weekly token saving. Hand the expensive recurring work-types to **hypaware-ai-improvement-report** as packaging candidates (surface them; don't build them here). -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-spend-review.md` (create the dir if needed). Dated - so reviews accumulate. -- **Bottom line (1 sentence):** total tokens this window, the trend vs the last report (▲/▼ %), - and the single biggest driver. -- **Cost actions (ranked table):** one row per lever — **action** · **evidence** (tokens/share - + cause) · **est. weekly saving** · **where it lands**. Highest-saving first. -- **Then the detail**, in these sections: **Coverage & footprint · Total spend (4 token types) - · Allocation (user / repo / model / gateway) · Trends · Outliers · Spend by work-type · Waste - scorecard · Leverage candidates (handoff to hypaware-ai-improvement-report) · Caveats**. -- **Formatting (human-readable):** open each section with its one-sentence takeaway, then the - table; **bold** headline numbers; split the four token types (never collapse them); keep the - bottom line + ranked table a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-spend-review.md` (create the dir if needed). + Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Spend Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: total tokens this window, the trend vs the last report + (▲/▼ %), and the single biggest driver. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (total tokens with the four types split, WoW trend, the top allocation who/what + dominates, the biggest waste lever + est. weekly saving, usage coverage). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-leverage first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (where it goes, on what work, the + biggest waste lever, the top cost action). + 5. **`## Caveat`** - tokens-never-dollars and partial capture, in plain language + a `[caveats →]` + link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[coverage](/coverage.md) · [total spend](/total-spend.md) · [allocation](/allocation.md) · [trends](/trends.md) · [outliers](/outliers.md) · [work-types](/work-types.md) · [waste scorecard](/waste-scorecard.md) · [leverage candidates](/leverage-candidates.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md index f53b9624..e47de5db 100644 --- a/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/claude/skills/hypaware-query/SKILL.md @@ -76,12 +76,16 @@ Key columns: - `tool_name`, `tool_call_id`, `tool_args`, `status` — tool-call/result joins and sparse status such as `finish_reason`. - `attributes` (JSON) — request settings, usage, propagated `dev_run_id`, and gateway diagnostics under `attributes.gateway`. -**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`; usage repeats across a message's content parts, so dedup per `message_id` (e.g. `max(...)`) before summing. +**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`. Usage rides exactly one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035). If you prefer a defensive dedup, `max(...) GROUP BY session_id, message_id` returns the same number: key on `session_id` (`conversation_id` is null for Claude, and only separates threads within a Codex session). Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, `entrypoint`, `client_version`, `user_type`, `permission_mode`, and `hook_event` when the local Claude Code JSONL transcript can be matched. Run `hyp query schema ai_gateway_messages --format markdown` for the authoritative column reference. +## When the graph answers it cheaper + +Some questions are relationships, not row scans. For descriptive "who used what" rollups (distinct sessions per repo / model / tool / file, or session to commit to PR joins), the **hypaware-graph** skill reads compact `node` / `edge` adjacency instead of scanning `ai_gateway_messages`, and it reaches GitHub facets (repos, PRs, reviewers) that are not in the messages at all. Keep per-message measures here on `ai_gateway_messages`: token sums, `count(*)` call totals, error / stop-reason, and `content_text`. See the hypaware-graph skill for the full boundary. + ## Guardrails - Do not assume the cache auto-refreshes. Query commands default to `--refresh never`. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md index 5c6ab15b..d397fd71 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-adoption-report/SKILL.md @@ -1,64 +1,98 @@ --- name: hypaware-ai-adoption-report -description: AI Adoption Profile for a HypAware server — descriptive "who's using the fleet and how": per-gateway utilization (volume + focus — top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). +description: AI Adoption Profile for a HypAware server: descriptive "who's using the fleet and how": per-gateway utilization (volume + focus: top models, tools, repos, themes) and parallelism/fan-out (multi-agent adoption, concurrency, main-vs-subagent split, payoff). --- # AI Adoption Profile -The one **descriptive** report on *who's using the fleet and how* — no actions, just a clear -picture. Two lenses over one window: +A profile of *who's using the fleet and how*, read through two lenses over one window: -- **Utilization** — per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, +- **Utilization:** per `gateway_id` (≈ one machine/user): *how much* (messages, sessions, active days, tokens, cache-read ratio) and *what it's focused on* (top models, tools, repos, work themes). -- **Parallelism / fan-out** — *how sophisticated*: multi-agent adoption, fan-out breadth/depth, +- **Parallelism / fan-out:** *how sophisticated*: multi-agent adoption, fan-out breadth/depth, true concurrency vs serial, the main-loop-vs-subagent token split, and whether fan-out appears to earn its cost. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to profile, then proceed against the chosen source. -**Descriptive only** — route every *action* out: "fan out more/less" and tooling go to +**Descriptive only.** Route every *action* out: "fan out more/less" and tooling go to **hypaware-ai-improvement-report**, token waste to **hypaware-ai-spend-report**. Query mechanics -live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's deduped token spine for -any token figure. +live in the **hypaware-query** skill; reuse hypaware-ai-spend-report's token spine for +any token figure. For descriptive who-used-what rollups (distinct sessions per +repo/model/tool/file), prefer the **hypaware-graph** skill, which reads them from the projected +graph instead of scanning messages; keep token figures on messages. ## Procedure -1. **Scope + coverage.** Distinct `gateway_id` (the unit — `user_id` is ~always null, so don't +1. **Scope + coverage.** Distinct `gateway_id` (the unit; `user_id` is ~always null, so don't measure reach by it); window; coverage of `gateway_id`, token usage, and subagent provenance - (`agent_id` / `is_sidechain` / `parent_thread_id` — transcript-enriched, may not survive + (`agent_id` / `is_sidechain` / `parent_thread_id`, transcript-enriched, may not survive ingest). Decide cost-capable vs volume-only, and which parallelism dimensions are real vs proxied (the `Task`-call proxy). State N; if it's effectively one gateway / dogfood, say so. + **ALWAYS verify GitHub enrichment before deciding it's out of scope: probe, never assume.** + A conditional ("if enriched…") obligates you to *test* the condition, not guess it. Run the + probe every run: `hyp query sql "SELECT node_type, projector, count(*) AS n, max(first_seen) + AS newest FROM node GROUP BY node_type, projector" --remote ` (and check `edge` exists via + `hyp query status`). If a `github.t0` projector with `PullRequest` / `Review` nodes is present, + **GitHub reach is IN SCOPE and MUST be computed in step 2**: record the node counts and the max + `first_seen` per type as the graph's as-of date, and treat every reach figure as a floor (a stale + projection undercounts). If the probe returns no `github.t0` / PR / Review nodes, state + **"checked - no GitHub enrichment present"** explicitly. **Never write "not assessed" / "not + queried" for reach**: that phrasing means you skipped the probe, which is the one thing this step + forbids. 2. **Per-gateway utilization.** One row per gateway: volume (messages, sessions, active days, - first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus — + first/last seen), tokens + cache-read ratio `cache_read/(cache_read+input)`, then its focus: top models (+ `(unknown)`), tools (Bash dominance + top commands), repos, client (claude/codex), and 2–4 recurring work themes (sampled, redacted). Distill each into a - one-line **focus label**. Use the activity graph for structural focus (Session→Repo/PR via - GitHub enrichment) where projected. + one-line **focus label**. When the step-1 probe found `github.t0` enrichment, you **MUST** add + each gateway's real *reach* from it: the repos and PRs its work actually landed in (`Session -at-> + Commit <-references- PullRequest`), and whether that work drew review (`... PullRequest <-on- Review + <-submitted- Actor`); attribute sessions to a gateway/person via session `cwd` (`/Users/`). + This is footprint the messages cannot show and is not optional when the graph supports it. Keep it descriptive (route any "review + more" action to hypaware-ai-improvement-report) and dated to the graph's freshness from step 1. 3. **Parallelism / fan-out.** Adoption (% conversations with ≥1 subagent, incl. the zero bucket); breadth (subagents per parent) and depth (`parent_thread_id` chains); concurrency (do subagent time-spans actually *overlap*, or is it serial?); cost split (token share main - loop vs subagent). Read the payoff *descriptively* — fan-out vs tokens-to-resolution — and + loop vs subagent). Read the payoff *descriptively*: fan-out vs tokens-to-resolution, and route the "fan out more/less" recommendation to hypaware-ai-improvement-report. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-adoption-profile.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** how many gateways are active, the busiest one and its focus, and - the headline fan-out adoption %. -- **Per-gateway profile (sortable table):** one row per gateway — **gateway** · **volume** - (messages / sessions) · **tokens** · **cache-read %** · **fan-out** (adoption + main-vs-subagent - split) · **focus label**. Busiest first — this table is the spine. -- **Then the detail**, in these sections: **Scope & coverage · Per-gateway utilization · - Per-gateway focus · Parallelism & fan-out · Payoff (descriptive) · Fleet view · Caveats**. -- **Formatting (human-readable):** open each section with its takeaway; **bold** headline - numbers; one sortable footprint table as the centerpiece; keep the bottom line + table a - ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-adoption-profile.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Adoption Profile`, then a `## · ` subtitle + (e.g. `## HYP_CENTRAL fleet · 30-day view`). + 2. **Thesis** - ONE **bold** sentence carrying the whole story: how many gateways are active, + the busiest one + its focus, and the headline fan-out adoption %. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (e.g. `**3 active gateways** | phil, brendan, kenny`; busiest-gateway share; % sessions + using a subagent; % volume inside subagents; cost-capable vs volume-only). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-signal first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (e.g. adoption concentration, the + busiest power-user, subagent adoption, concurrency consistency, distinct per-gateway identities). + 5. **`## Caveat`** - the single most important caveat in plain language + a `[caveats →]` link + (for a transcript-only window: a volume-only profile, no cost/cache; note whether subagent + identity survived ingest). + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[scope & coverage](/scope-coverage.md) · [per-gateway utilization](/per-gateway-utilization.md) · [per-gateway focus](/per-gateway-focus.md) · [parallelism & fan-out](/parallelism-fan-out.md) · [payoff](/payoff.md) · [fleet view](/fleet-view.md) · [caveats](/caveats.md)`. +- **GitHub reach (where enriched):** the per-gateway-focus section file shows each gateway's repos + and PRs reached, as of the graph's projection date from step 1 (a stale graph makes reach a floor, + not an upper bound). +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if subagent provenance doesn't reach the server, the standing #1 - caveat is "subagent identity must survive ingest" — run adoption off the sub-agent-invocation + caveat is "subagent identity must survive ingest", run adoption off the sub-agent-invocation proxy (the tool calls that spawn sub-agents) and flag it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md index d45a4a4a..09b1512f 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-improvement-report/SKILL.md @@ -9,22 +9,24 @@ Read how the team's agents actually behave in the logs, then propose the **addit edits to skills, subagents, and AGENTS.md/CLAUDE.md** that make them do **better work (quality)** with **less wasted effort and spend**. -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -Query mechanics live in the **hypaware-query** skill; read it first. +Query mechanics live in the **hypaware-query** skill; read it first. For descriptive +who-used-what rollups (distinct sessions per repo/model/tool/file), prefer the **hypaware-graph** +skill: it reads relationships from the projected graph instead of scanning messages. Focus on these signals: -- **Repeated work** — work the team repeats successfully → package it once (skill / subagent). -- **Sticking points** — where agents get stuck or redo work (errors, loops, refusals, +- **Repeated work:** work the team repeats successfully → package it once (skill / subagent). +- **Sticking points:** where agents get stuck or redo work (errors, loops, refusals, abandonment) → the missing or too-weak instruction that would prevent it (an AGENTS.md rule, or a skill). -- **Inefficiency** — expensive patterns (model over-spec, context bloat, low cache-read, +- **Inefficiency:** expensive patterns (model over-spec, context bloat, low cache-read, retry loops) → the setup change that costs less (right-size the model in AGENTS.md / a subagent, a context-hygiene rule, a skill that avoids the redo). @@ -33,45 +35,58 @@ Focus on these signals: claude/codex mix. State N and breadth; flag if single-contributor. 2. **Scan the signals for possible improvements.** Work the three signals above; each turns up candidates. Note frequency (sessions, distinct gateways) and redact examples. - - **Repeated work** — cluster sessions by shared-file overlap and tool-set signature into + - **Repeated work:** cluster sessions by shared-file overlap and tool-set signature into recurring kinds of work (reuse a recent `hypaware-reports/` *Leverage candidates* handoff if present). Flag parallelizable work done serially (low `is_sidechain`, few `agent_id`), and recurring asks / multi-step workflows / re-sent instructions in sampled prompts + `system_text`. - - **Sticking points** — where agents got stuck or redid work, ranked by impact: failing + - **Sticking points:** where agents got stuck or redid work, ranked by impact: failing tools (`is_error` by `tool_name`), retry loops (same tool + first `tool_args` token ≥3×/session), refusals/truncations (stop-reason), abandoned costly sessions, - repeatedly-violated conventions. - - **Inefficiency** (reuse the spend spine) — model over-spec, context bloat (no + repeatedly-violated conventions. Optional outcome lens where the graph is GitHub-enriched: + work that never landed (sessions whose commits reach no `PullRequest`) or drew heavy review + churn can corroborate a sticking point, but treat it as a proxy, not proof (unmerged or + heavily-reviewed work is not necessarily bad). + - **Inefficiency** (reuse the spend spine): model over-spec, context bloat (no `is_compact_summary`), low cache-read (`cache_read/(cache_read+input)`), redo loops. 3. **Collect, dedup, prioritize.** Gather the candidates into a list of possible - improvements; drop anything an existing artifact already covers — a quick scan of the - repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read — every + improvements; drop anything an existing artifact already covers: a quick scan of the + repo's `.claude/skills/`, subagents, and AGENTS.md/CLAUDE.md (the only repo read; every other signal is the logs), used to dedup and to mark each survivor **new** vs **edit to an existing artifact**. For each, suggest a likely form - where one obviously fits — a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md - edit** (propose the concrete line(s) as a diff, not "consider documenting") — but don't + where one obviously fits: a **skill**, a **subagent**, or an **AGENTS.md/CLAUDE.md + edit** (propose the concrete line(s) as a diff, not "consider documenting"), but don't force a mapping. Attach evidence (frequency/impact + distinct gateways + token prize) and rank by it. - - **Size the token prize** off the spend spine (deduped `max()` per `message_id`; a plain - `SUM` overcounts ~3×). Give two numbers, kept distinct: **exposure (measured)** — tokens + - **Size the token prize** off the spend spine (usage is one row per response, so a plain `SUM` + over assistant `attributes.usage` rows is correct; no dedup needed, LLP 0035). Give two numbers, kept distinct: **exposure (measured)**: tokens currently flowing through the issue (sum across a repeated cluster, or tokens in - error/retry/abandoned turns) — and **est. saving (assumption)** only where the - counterfactual is clean (cache-read ratio, model right-size). Both are floors — capture + error/retry/abandoned turns), and **est. saving (assumption)** only where the + counterfactual is clean (cache-read ratio, model right-size). Both are floors; capture is partial; never present a saving as if it were measured. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-improvement-review.md` (create the dir - if needed). Dated so reviews accumulate. -- **Bottom line (1 sentence):** "Create/edit these N things and what it will improve.". -- **Improvements (ranked table):** one row per improvement — **what** · **suggested form** - (skill / subagent / AGENTS.md edit, where one fits) · **evidence** (recurs N× across G - gateways, sticking-point impact) · **token prize** (exposure measured / est. saving, - labeled) · **where it lands**. Highest-leverage first. -- **Then the detail**, in these sections: **Basis · Skill candidates · Subagent candidates - (fan-out + roles) · AGENTS.md/CLAUDE.md edits · Efficiency → setup changes · Caveats** — - plus the proposed AGENTS.md lines verbatim. Each candidate is marked **new** or **edit to - **; don't list artifacts that aren't being changed. -- **Formatting (human-readable):** every candidate states its evidence inline; show - AGENTS.md edits as real diffs/code blocks, not prose; **bold** the artifact type; keep - it a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-improvement-review.md` (create the dir if + needed). Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Improvement Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: "Create/edit these N things and what it will improve." + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (# improvements, new-vs-edit split, the single biggest token prize, the basis: + gateways / sessions / repos). This is the ONLY table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings = the top improvements, highest-leverage + first; each is a `### N. ` naming **what + its form** (skill / subagent / AGENTS.md + edit), followed by 2-3 plain-language sentences with the evidence and the **bold** token prize, + then a single `[
→](/.md)` link on its own line. Show AGENTS.md edits as + real diffs/code blocks in their section file, not prose. + 5. **`## Caveat`** - token prizes are floors and capture is partial, in plain language + a + `[caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[basis](/basis.md) · [skill candidates](/skill-candidates.md) · [subagent candidates](/subagent-candidates.md) · [AGENTS.md edits](/agents-md-edits.md) · [efficiency → setup](/efficiency-setup.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md index 4eafe2fb..79a85d38 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-security-report/SKILL.md @@ -1,6 +1,6 @@ --- name: hypaware-ai-security-report -description: Security & Risk Review for a HypAware server — audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code — for a code diff use a dedicated code-review tool. +description: Security & Risk Review for a HypAware server: audits gateway logs for risky autonomous activity (destructive commands, remote-exec, privilege escalation, secret reads, network egress, package installs), severity-ranks findings, and recommends guardrails. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. Audits recorded logs, NOT pending code; for a code diff use a dedicated code-review tool. --- # Security & Risk Review @@ -10,15 +10,15 @@ unsupervised that carries risk**, and the guardrails to contain it. This audits activity*, not pending code (for a diff, use a dedicated code-review tool). Be proportionate: separate "ran a risky command" from "caused harm". -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to audit, then proceed against the chosen source. -**REDACT everything** — never echo a secret, token, key, or credential value in the report, +**REDACT everything.** Never echo a secret, token, key, or credential value in the report, even one you found in `tool_args`; a raw secret appearing in args is itself a capture finding (flag it, don't reproduce it). Query mechanics live in the **hypaware-query** skill; read it first. @@ -32,12 +32,12 @@ even one you found in `tool_args`; a raw secret appearing in args is itself a ca | Network egress | outbound `curl`/`scp`/`nc` POSTs to external hosts | potential exfiltration | | Package install | `npm i`, `pip install`, `brew install`, `apt` | supply-chain surface | -Risk is **amplified by autonomy** — weight anything that ran under `bypassPermissions` higher +Risk is **amplified by autonomy**: weight anything that ran under `bypassPermissions` higher than the same command in a gated session. ## Procedure 1. **Coverage + autonomy baseline.** Window; coverage of `tool_args` on Bash calls and of - `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit — `user_id` is + `permission_mode` (bound the read if sparse); distinct `gateway_id` (the unit; `user_id` is ~always null). Total tool calls, % Bash, % conversations in `bypassPermissions`. State N; if it's a dogfood window, say so. 2. **Scan + rank.** Histogram the first token of each Bash command (top 30 = the normal), then @@ -49,17 +49,30 @@ than the same command in a gated session. policy**, an **agent-instructions rule** (AGENTS.md / CLAUDE.md), or a **gateway redaction** fix for any secret seen in args. -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-security-review.md` (create the dir if needed). - Dated so reports accumulate. -- **Bottom line (1 sentence):** the posture — how autonomous the fleet is (% `bypassPermissions`) - and whether anything needs urgent attention. -- **Findings (severity-ranked table):** one row per finding — **risk class** · **severity** - (high / med / low) · **count** · **example (redacted)** · **guardrail** · **where it ran**. - Highest severity first. -- **Then the detail**, in these sections: **Autonomy baseline · Command profile · Risk findings - · Guardrails · Trends · Capture caveats**. **All examples redacted.** -- **Formatting (human-readable):** open each section with its takeaway; **bold** severities; one - severity-ranked findings table as the centerpiece; keep the bottom line + table a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. **All examples redacted in every file.** + +- **Main one-pager:** `hypaware-reports/-security-review.md` (create the dir if + needed). Dated so reports accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# Security & Risk Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: the posture - how autonomous the fleet is + (% `bypassPermissions`) and whether anything needs urgent attention. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link and all redacted (% `bypassPermissions`, count of HIGH findings, the top risk class, + total tool calls + % Bash, coverage of `tool_args` / `permission_mode`). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-severity first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** severities (redacted), + then a single `[
→](/.md)` link on its own line (autonomy posture, the top + risk findings, the single most important guardrail). + 5. **`## Caveat`** - the standing capture caveat in plain language + a `[capture caveats →]` link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[autonomy baseline](/autonomy-baseline.md) · [command profile](/command-profile.md) · [risk findings](/risk-findings.md) · [guardrails](/guardrails.md) · [trends](/trends.md) · [capture caveats](/capture-caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file (redaction still applies); state findings plainly, and where a sibling report owns the action a plain cross-link is enough. - **Capture-health note:** if raw credentials appear in `tool_args`, the standing #1 finding is - "gateway must redact secrets in tool_args" — put it in Capture caveats without reproducing it. + "gateway must redact secrets in tool_args", put it in Capture caveats without reproducing it. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md index b04ffa93..33599948 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-ai-spend-report/SKILL.md @@ -1,46 +1,48 @@ --- name: hypaware-ai-spend-report -description: AI Spend Review for a HypAware server — where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Deduped token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. +description: AI Spend Review for a HypAware server: where tokens go (by user/repo/model/gateway), what work they go on (graph-clustered), and how to spend less (waste scorecard + cost levers). Token volume, never dollars. Saves a dated report under hypaware-reports/; first asks which HypAware source to query (local logs or a remote server) via the hypaware-query skill. --- # AI Spend Review -Where the team's tokens go, on what work, and how to spend less — read off one deduped token +Where the team's tokens go, on what work, and how to spend less, read off one token spine. One story in three moves: **where it goes** (by user / repo / model / gateway, over time), **on what work** (sessions clustered into recurring work-types, sized by tokens), and **how to spend less** (a waste scorecard + ranked cost levers). -IMPORTANT: Don't assume which logs to read — **ask first.** Start by listing the data sources -and let the user choose which to query: **local logs** (this machine's own recordings — +IMPORTANT: Don't assume which logs to read: **ask first.** Start by listing the data sources +and let the user choose which to query: **local logs** (this machine's own recordings, `hyp query sql …`, no `--remote`) and **each remote HypAware server** (every target from `hyp remote list`, plus any hypaware MCP server already available to you as MCP tools (a -`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways — +`query_sql` / `graph_neighbors` tool in your toolset); the same server can appear both ways, list it once). Present the options, ask which one (or more) to run the review against, then proceed against the chosen source. -**Tokens, never dollars** — capture is partial, so stop at token volume and behavioral proxies. +**Tokens, never dollars.** Capture is partial, so stop at token volume and behavioral proxies. Query mechanics (`--remote`, date-pruning the 30s timeout, the SQL dialect, sampling) live in the **hypaware-query** skill; read it first. -## Token math (get this right — every breakdown reconciles to it) +## Token math (get this right; every breakdown reconciles to it) Usage is in `attributes.usage` (NOT `raw_frame`): `input_tokens`, `output_tokens`, -`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Dedup by message -and `max()` — a plain `SUM` overcounts ~3×. Report the four types separately (cache-read is +`cache_read_tokens`, `cache_write_tokens` (+ `reasoning_tokens` for Codex). Usage rides exactly +one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` +over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035); `input_tokens` +is net of cache, so it never double-counts. Report the four types separately (cache-read is usually the bulk; output the scarce slice). ```sql -WITH msg AS ( - SELECT COALESCE(CAST(JSON_EXTRACT(raw_frame,'$.message_id') AS VARCHAR), message_id) mid, - max(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) inp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) outp, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) cwrite, - max(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) cread - FROM ai_gateway_messages - WHERE date BETWEEN '' AND '' - AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL - GROUP BY mid) -SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM msg; --- slice by adding max(gateway_id)/max(model)/max(repo_root)/max(date) inside, GROUP BY outside. +SELECT + sum(CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)) t_in, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.output_tokens') AS BIGINT)) t_out, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_write_tokens') AS BIGINT)) t_cw, + sum(CAST(JSON_EXTRACT(attributes,'$.usage.cache_read_tokens') AS BIGINT)) t_cr +FROM ai_gateway_messages +WHERE date BETWEEN '' AND '' + AND role='assistant' AND JSON_EXTRACT(attributes,'$.usage') IS NOT NULL; +-- One carrier row per response (LLP 0035): a plain SUM is correct, no dedup. +-- Slice by adding gateway_id / model / repo_root / date to SELECT + GROUP BY. +-- Defensive equivalent: max(...) GROUP BY session_id, message_id -- session_id is the +-- uniform key; conversation_id is null for Claude and only separates Codex threads. ``` ## Procedure @@ -51,25 +53,38 @@ SELECT sum(inp) t_in, sum(outp) t_out, sum(cwrite) t_cw, sum(cread) t_cr FROM ms proxies (turns, tool calls, length), labeled as estimates. 2. **Where it goes + on what work.** Slice the spine by user / repo / model (→ `(unknown)` bucket) / gateway with shares, plus trends (by week, WoW deltas vs the last report) and - top-spend outliers. Then cluster sessions into recurring **work-types** — shared-file overlap - for code work, tool-set signature for no-file work (context graph if projected, else SQL) — - and size each by deduped tokens. -3. **How to spend less.** Score the waste — cache-read ratio `cache_read/(cache_read+input)` - (the biggest lever — feature it), retry loops, abandoned costly sessions, model over-spec, - context bloat — and turn the top few into ranked cost levers, each with an estimated weekly + top-spend outliers. Then cluster sessions into recurring **work-types**: shared-file overlap + for code work, tool-set signature for no-file work (context graph if projected, else SQL), + and size each by tokens. +3. **How to spend less.** Score the waste: cache-read ratio `cache_read/(cache_read+input)` + (the biggest lever, feature it), retry loops, abandoned costly sessions, model over-spec, + context bloat, and turn the top few into ranked cost levers, each with an estimated weekly token saving. Hand the expensive recurring work-types to **hypaware-ai-improvement-report** as packaging candidates (surface them; don't build them here). -## Output — SAVE A MARKDOWN FILE -- **Path:** `hypaware-reports/-spend-review.md` (create the dir if needed). Dated - so reviews accumulate. -- **Bottom line (1 sentence):** total tokens this window, the trend vs the last report (▲/▼ %), - and the single biggest driver. -- **Cost actions (ranked table):** one row per lever — **action** · **evidence** (tokens/share - + cause) · **est. weekly saving** · **where it lands**. Highest-saving first. -- **Then the detail**, in these sections: **Coverage & footprint · Total spend (4 token types) - · Allocation (user / repo / model / gateway) · Trends · Outliers · Spend by work-type · Waste - scorecard · Leverage candidates (handoff to hypaware-ai-improvement-report) · Caveats**. -- **Formatting (human-readable):** open each section with its one-sentence takeaway, then the - table; **bold** headline numbers; split the four token types (never collapse them); keep the - bottom line + ranked table a ~1-minute read. +## Output - SAVE A SHORT MAIN FILE + ONE FILE PER SECTION +A **progressively-disclosed one-pager** is the deliverable: a reader can stop after the thesis +sentence, glance the key-numbers table, skim the findings, or follow any link into a detail +**section**, and each section is its own markdown file. Build the one-pager top-to-bottom so each +layer is shorter and more glanceable than the one it summarizes. + +- **Main one-pager:** `hypaware-reports/-spend-review.md` (create the dir if needed). + Dated so reviews accumulate. Lay it out in exactly these blocks, separated by `---` rules: + 1. **Title + scope** - `# AI Spend Review`, then a `## · ` subtitle. + 2. **Thesis** - ONE **bold** sentence: total tokens this window, the trend vs the last report + (▲/▼ %), and the single biggest driver. + 3. **`### Key numbers`** - a small metric→readout table, ~4-6 rows, each one glanceable fact with + no link (total tokens with the four types split, WoW trend, the top allocation who/what + dominates, the biggest waste lever + est. weekly saving, usage coverage). This is the ONLY + table on the one-pager. + 4. **`## What this shows`** - 3-5 numbered findings, highest-leverage first; each is a + `### N. ` followed by 2-3 plain-language sentences with **bold** numbers, then a + single `[
→](/.md)` link on its own line (where it goes, on what work, the + biggest waste lever, the top cost action). + 5. **`## Caveat`** - tokens-never-dollars and partial capture, in plain language + a `[caveats →]` + link. + 6. **`## Report details`** - a one-line footer linking **every** section file written this run + (not just the cited ones), so nothing is orphaned - e.g. + `[coverage](/coverage.md) · [total spend](/total-spend.md) · [allocation](/allocation.md) · [trends](/trends.md) · [outliers](/outliers.md) · [work-types](/work-types.md) · [waste scorecard](/waste-scorecard.md) · [leverage candidates](/leverage-candidates.md) · [caveats](/caveats.md)`. +- **Section files are analysis, not inventory.** Each detail section is its own `/.md`, held to the same standard as the one-pager: it argues one claim, opens with that takeaway, and ties every number to what it means for the reader. A section file that is only a stat table has failed - fold it back into the one-pager rather than shipping it as a page. Cut table narration ("how to read the table") and standing bookkeeping prose; compress source/window/method to a few lines. +- **No scope apologies (in any file).** Scope rules (what routes to which report) are authoring guidance, never report copy. Don't write "descriptive only", "no recommendations here", or routing disclaimers in the one-pager or any section file; state findings plainly, and where a sibling report owns the action a plain cross-link is enough. diff --git a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md index 7d4abc2e..2940dd5e 100644 --- a/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md +++ b/hypaware-core/plugins-workspace/codex/skills/hypaware-query/SKILL.md @@ -76,12 +76,16 @@ Key columns: - `tool_name`, `tool_call_id`, `tool_args`, `status` — tool-call/result joins and sparse status such as `finish_reason`. - `attributes` (JSON) — request settings, usage, propagated `dev_run_id`, and gateway diagnostics under `attributes.gateway`. -**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`; usage repeats across a message's content parts, so dedup per `message_id` (e.g. `max(...)`) before summing. +**Token counts** live under `attributes.usage` on `role='assistant'` rows (NOT in `raw_frame`): `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_write_tokens`. Codex (`provider='openai'`) omits `cache_write_tokens` and adds `reasoning_tokens` + `total_tokens`. Extract with `CAST(JSON_EXTRACT(attributes,'$.usage.input_tokens') AS BIGINT)`. Usage rides exactly one row per response (the last assistant part; non-carrier parts are null), so a plain `SUM` over assistant rows is correct with no dedup (the one-carrier rule, LLP 0035). If you prefer a defensive dedup, `max(...) GROUP BY session_id, message_id` returns the same number: key on `session_id` (`conversation_id` is null for Claude, and only separates threads within a Codex session). Claude transcript enrichment adds `provider_uuid`, `parent_uuid`, `request_id`, `entrypoint`, `client_version`, `user_type`, `permission_mode`, and `hook_event` when the local Claude Code JSONL transcript can be matched. Run `hyp query schema ai_gateway_messages --format markdown` for the authoritative column reference. +## When the graph answers it cheaper + +Some questions are relationships, not row scans. For descriptive "who used what" rollups (distinct sessions per repo / model / tool / file, or session to commit to PR joins), the **hypaware-graph** skill reads compact `node` / `edge` adjacency instead of scanning `ai_gateway_messages`, and it reaches GitHub facets (repos, PRs, reviewers) that are not in the messages at all. Keep per-message measures here on `ai_gateway_messages`: token sums, `count(*)` call totals, error / stop-reason, and `content_text`. See the hypaware-graph skill for the full boundary. + ## Guardrails - Do not assume the cache auto-refreshes. Query commands default to `--refresh never`. diff --git a/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md b/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md index c96b79bc..d6f1c9f1 100644 --- a/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md +++ b/hypaware-core/plugins-workspace/context-graph/skills/hypaware-graph/SKILL.md @@ -1,15 +1,15 @@ --- name: hypaware-graph -description: Explore the local HypAware context graph — the activity graph projected from ai_gateway_messages (Sessions, Apps, Models, Tools, Files and how they connect). Use when the user asks what connects to a file/session/tool/model, wants co-occurrence or N-hop traversal, or wants to build/refresh the graph. Covers `hyp graph project` and `hyp graph neighbors`. +description: Explore the HypAware context graph: the activity graph projected from ai_gateway_messages (Sessions, Apps, Models, Tools, Files, and git Repos/Commits, and how they connect), plus optional server-side GitHub enrichment (Repos, PullRequests, Commits, Reviewers) that bridges AI sessions to code review. Use when the user asks what connects to a file/session/tool/model, wants co-occurrence or N-hop traversal, wants to join AI sessions to GitHub repos/PRs/reviewers, or wants to build/refresh the graph. Covers `hyp graph project` and `hyp graph neighbors`. --- # HypAware Graph -`hyp graph` turns local HypAware recordings into a queryable **activity graph** and walks it. The graph is a derived projection of the `ai_gateway_messages` dataset — the same local data the `hypaware-query` skill reads — read as *relationships* instead of rows. +`hyp graph` turns local HypAware recordings into a queryable **activity graph** and walks it. The graph is a derived projection of the `ai_gateway_messages` dataset (the same local data the `hypaware-query` skill reads), read as *relationships* instead of rows. ## Build or refresh the graph -The projection runs **on demand** — it does not auto-update. Run it before querying, and again after new sessions are recorded: +The projection runs **on demand**; it does not auto-update. Run it before querying, and again after new sessions are recorded: ```bash hyp graph project # project ai_gateway_messages → node/edge tables (idempotent) @@ -17,19 +17,56 @@ hyp graph project --dry-run # show what would be written hyp graph compact # merge duplicate rows, rewrite partitions sorted (maintenance) ``` -`hyp graph project` prints `N node(s), M edge(s) — wrote …`. If `node`/`edge` come back empty, the projection has not been run yet (or there are no recordings). +`hyp graph project` prints `N node(s), M edge(s) - wrote …`. If `node`/`edge` come back empty, the projection has not been run yet (or there are no recordings). ## The graph model -Deterministic T0 projection — exact-key, no models. All edges are **Session-rooted**: +Deterministic T0 projection: exact-key, no models. The core is **Session-rooted**, plus a small git-provenance sub-web (`Repo`, `Commit`) that doubles as the bridge to GitHub enrichment (see below). -- **Node types:** `Session` (one per session), `App` (client), `Model`, `Tool`, `File`. -- **Edge types:** `via` (Session→App), `used_model` (Session→Model), `used` (Session→Tool), `touched` (Session→File). -- **natural_key** is the human key per type: Session = `session_id` (the always-present session container; `conversation_id` is a nullable thread identity, null for Claude), App = `client_name`, Model = model id, Tool = tool name, File = full path (its `label` is the basename). Every row carries inline provenance (`source_dataset`, `projector`, …). +- **Node types:** `Session` (one per session), `App` (client), `Model`, `Tool`, `File`, and, when the session's git context is captured, `Repo` and `Commit`. +- **Edge types:** `via` (Session→App), `used_model` (Session→Model), `used` (Session→Tool), `touched` (Session→File), `in` (Session→Repo), `at` (Session→Commit, the HEAD the session sat on), and `in` (Commit→Repo). +- **natural_key** is the human key per type: Session = `session_id` (the always-present session container; `conversation_id` is a nullable thread identity, null for Claude), App = `client_name`, Model = model id, Tool = tool name, File = full path (or `owner/repo:relpath` when the repo remote is known; its `label` is the basename), Repo = `owner/repo`, Commit = full 40-hex sha. Every row carries inline provenance (`source_dataset`, `projector`, …). +- **Bridge-ready keys.** `Repo`, `Commit`, and `File` use shared, content-addressed natural keys, so a node minted from a session's git context and the same node minted by the GitHub source converge on one id. This is what lets an AI session and a pull request meet at the same commit. + +## GitHub enrichment (server-side) + +The base graph above comes from `ai_gateway_messages` and is available anywhere, including a local install. A **server** can additionally run the `@hypaware/github` source, which captures repo / commit / PR / issue / review events and projects a second contract into the **same** `node` / `edge` tables. This is the graph's biggest payoff, and none of it is answerable from the message data alone. + +**Caveats first, so you don't query nodes that aren't there:** + +- **Server-only and opt-in.** GitHub nodes exist only on a host where the `@hypaware/github` source is configured and has captured events (normally the central server, reached with `--remote`). A plain local projection has none of them. An empty GitHub query usually means the source is not configured on that host, or the graph has not been re-projected since capture, not that the true answer is zero. +- **`Actor` is a GitHub login, not the AI user.** The identity that authored a commit or opened a PR is the git actor, not the `user_id` of whoever ran the agent. Cross-domain identity merge is later work (T1/T2); never equate an `Actor` with an AI operator. +- **Freshness applies here too.** The GitHub domain is only as current as the last projection on that host, and you cannot project through the read-only query token (projection is admin-side). On a stale central graph, recent PRs and reviews are simply missing. One subtlety: freshly projected rows sit in a spool until a settling read runs **on the server** (an admin-side query; the remote query surface never settles), and `node` and `edge` settle independently, so a graph can briefly show fresh nodes joined by stale edges. If a cross-domain join returns implausibly few rows against fresh-looking nodes, suspect an unsettled `edge` dataset before doubting the data. + +**What it adds:** + +- **Nodes:** `Actor` (login), `Issue`, `PullRequest`, `Review`, plus enriched `Repo` / `Commit` / `File`. +- **Edges:** `authored` (Actor→Commit), `opened` and `commented` (Actor→Issue | PullRequest), `submitted` (Actor→Review), `on` (Review→PullRequest), `references` (PullRequest→Commit), `touched` (Commit→File and PullRequest→File), `in` (Commit | File | Issue | PullRequest→Repo). + +**Why it matters, the possibility.** Because `Repo`, `Commit`, and `File` are bridge-ready, the AI-session web and the GitHub web are *one graph*. The commit a session sat on (`Session -at-> Commit`) is the same node GitHub knows through `PullRequest -references-> Commit` and `Actor -authored-> Commit`. So you can walk from an agent's activity into the code-review reality around it, questions `ai_gateway_messages` cannot express: + +- **AI work to the PR that shipped it:** `Session -at-> Commit <-references- PullRequest`. +- **AI work to who reviewed it:** continue `PullRequest <-on- Review <-submitted- Actor`. +- **Coverage, honestly:** which repos and PRs an agent's work actually reached, not just which cwd it ran in. +- **Reverse:** start from a `PullRequest` or `Repo` and walk inbound to every AI session that touched it. + +```bash +# Sessions whose HEAD commit is referenced by a PR (AI work that reached code review). +# Note: no --refresh with --remote; the server owns its freshness. +hyp query sql "select distinct s.natural_key session + from edge a join node s on a.src_id = s.node_id + join edge r on r.dst_id = a.dst_id and r.edge_type = 'references' + where a.edge_type = 'at'" --remote HYP_CENTRAL + +# From a PR, walk out to its reviews, actors, and referenced commits. +hyp graph neighbors owner/repo#123 --type PullRequest --depth 2 --direction both --remote HYP_CENTRAL +``` + +Reach for the GitHub domain whenever a question spans **both** AI activity and code collaboration (sessions to PRs, agents to reviewers, work to repos). If it is purely one side, the base graph or plain message SQL is enough. ## Two ways to query -### 1. SQL over `node` / `edge` — counts, filters, aggregates +### 1. SQL over `node` / `edge`: counts, filters, aggregates Flat SQL through the normal query surface (no recursion). Use for "how many", "top N", "group by", one- or two-hop joins: @@ -39,23 +76,23 @@ hyp query sql "select t.natural_key tool, count(*) n from edge e join node t on where e.edge_type = 'used' group by tool order by n desc" --refresh always ``` -`node`/`edge` are ordinary datasets — see the `hypaware-query` skill for `hyp query` mechanics (read stderr, freshness, `--format json`). Use `--refresh always` so the read settles freshly-projected rows. +`node`/`edge` are ordinary datasets; see the `hypaware-query` skill for `hyp query` mechanics (read stderr, freshness, `--format json`). Use `--refresh always` so the read settles freshly-projected rows. -### 2. Traversal — `hyp graph neighbors` — relationships and depth +### 2. Traversal with `hyp graph neighbors`: relationships and depth -For "what connects to X", co-occurrence, and N-hop walks — what flat SQL can't express. +For "what connects to X", co-occurrence, and N-hop walks, what flat SQL can't express. ```bash hyp graph neighbors [--depth N] [--type T] [--edge-type T] [--direction out|in|both] [--limit N] [--json] ``` -- **``** resolves by `node_id`, then `natural_key`, then `label` — so pass a session id, a file path (or basename), a model id, or a tool/app name. If it's ambiguous the command lists the candidates; narrow with `--type Session|App|Model|Tool|File`. +- **``** resolves by `node_id`, then `natural_key`, then `label`, so pass a session id, a file path (or basename), a model id, or a tool/app name. If it's ambiguous the command lists the candidates; narrow with `--type Session|App|Model|Tool|File|Repo|Commit|Actor|Issue|PullRequest|Review` (the last five require GitHub enrichment). - **`--direction`** is load-bearing because edges are Session-rooted: - `out` from a **Session** → its app, model, tools, files. - `in` from a **File / Tool / Model / App** → the **Sessions** that touched/used it. - `both` at `--depth 2` from a **File** → **co-occurrence**: the other files/tools/models reached through the sessions that share it. - **`--edge-type`** restricts which relations are walked (e.g. `--edge-type used`); repeatable or comma-separated. -- **`--limit`** caps output in BFS order and reports the true total when it truncates (`3 of 7 … — truncated`). +- **`--limit`** caps output in BFS order and reports the true total when it truncates (`… - truncated; raise --limit`). - **`--json`** emits `{ seed, neighbors: [{ hop, edge_type, direction, node }], reachable, truncated }` for follow-up reasoning. Examples: @@ -73,9 +110,23 @@ hyp graph neighbors claude-opus-4-8 --type Model --direction in # sessions that - Counting, ranking, grouping, "how often" → **`hyp query sql`** over `node`/`edge`. - "What connects to X", paths, neighborhoods, co-occurrence, depth → **`hyp graph neighbors`**. +### When the graph is cheaper than scanning `ai_gateway_messages` + +For descriptive "who used what" rollups, the graph answers from compact adjacency instead of scanning every message row. Prefer it when the rollup keys on an entity that is already a node: + +- sessions per tool = `used` · per model = `used_model` · per file = `touched` · per app = `via` · per repo = `in` (Session→Repo) · per commit = `at`. +- distinct-session counts over any of these are `count(distinct src_id)` on the edge, far fewer rows than `count(distinct session_id)` over messages (benchmarked ~12x fewer for the repo rollup). + +Stay on `ai_gateway_messages` when the measure lives on the message, not the relationship: + +- token sums and cache-read ratios; `count(*)` call totals (the graph keeps one edge per pair, so it cannot count repeat calls); `is_error` / `is_sidechain` / stop-reason; and `content_text` classification. +- per-`gateway_id` or per-`user_id` rollups: there are no Gateway or User nodes yet. + +Rule of thumb: "which sessions relate to X" or "how many distinct sessions" is a graph question; anything needing a per-message measure (tokens, errors, call counts, text) stays on messages. + ## Guardrails - **Project first.** The graph is only as fresh as the last `hyp graph project`; empty results usually mean it has not run. -- **Read stderr for errors.** `graph neighbors` writes not-found / ambiguity notes (and the large-graph note) to stderr; exit `1` is a resolution error, `2` is a usage error. **Truncation is part of the result**, so it goes to **stdout** (`3 of 7 … — truncated`) and the `--json` `truncated` field — not stderr. -- The graph is **derived and rebuildable** — never the source of truth. To change what it contains, fix capture/projection and re-project; don't hand-edit `node`/`edge`. -- Basic traversal loads the graph in memory per call — fine at activity-graph scale; a very large graph prints a note pointing at the future indexed path. +- **Read stderr for errors.** `graph neighbors` writes not-found / ambiguity notes (and the large-graph note) to stderr; exit `1` is a resolution error, `2` is a usage error. **Truncation is part of the result**, so it goes to **stdout** (`… - truncated; raise --limit`) and the `--json` `truncated` field, not stderr. +- The graph is **derived and rebuildable**, never the source of truth. To change what it contains, fix capture/projection and re-project; don't hand-edit `node`/`edge`. +- Basic traversal loads the graph in memory per call, fine at activity-graph scale; a very large graph prints a note pointing at the future indexed path.