hyparam · bgmcmullen · Jul 2, 2026 · Jul 1, 2026
diff --git a/README.md b/README.md
@@ -5,8 +5,9 @@ Modular logs and telemetry collector. Plugin-kernel architecture.
 HypAware captures conversations and traffic from local AI clients (Claude
 Code, Codex), raw Anthropic / OpenAI API traffic, and OpenTelemetry
 logs / traces / metrics into a local query cache and optional Parquet
-exports. Everything runs on the local machine — no central server is
-required for V1.
+exports. It runs fully local by default, no central server required, and a
+host can optionally join a fleet (`hyp join`) to forward its recordings to a
+central server.
 
 > Part of **[HypStack](https://hypstack.ai/)**, an open-source stack for AI observability.
 
@@ -59,7 +60,30 @@ Other init flags:
 | `--from-file <config.json>`| Skip the picker and load a known-good config            |
 | `--bin <path>`             | Override the binary path the daemon installer uses      |
 
-## Where things live
+## Joining a centrally-managed fleet (`hyp join`)
+
+`hyp join` enrolls a host in a fleet so its recordings are forwarded to a
+central sink:
+
+```sh
+hyp join <url> [token]
+hyp join <url> --token-file <path>     # read the token from a file (recommended for MDM)
+echo "<token>" | hyp join <url>        # or from stdin
+hyp join <url> <token> --no-daemon     # write the seed only, skip daemon install
+```
+
+It writes a central-enrollment config (mode `0600`) to a dedicated layer under
+`config-control/`, never to your local `hypaware-config.json`, so joining
+augments an existing install rather than replacing it, then installs and starts
+the daemon (unless `--no-daemon` is passed).
+
+The policy token is a multi-use fleet-wide credential. Prefer `--token-file`
+or stdin over a positional argument, which would otherwise land in shell
+history and process listings. Other flags: `--bin <path>` overrides the binary
+the daemon installer records, and `--no-daemon` writes the seed without
+installing or restarting the daemon.
+
+## Files and directories
 
 | Path                                           | Contents                                                 |
 |------------------------------------------------|----------------------------------------------------------|
@@ -86,6 +110,33 @@ hyp query sql "select count(*) from logs"
 Use `hyp query schema <dataset>` to see the columns available on each
 dataset, and `hyp query status` to inspect cache freshness per dataset.
 
+## Building and querying the activity graph
+
+Alongside the row datasets, HypAware can project captured activity into a
+node/edge **activity graph**: which sessions ran in which app, against which
+model, using which tools, touching which files. The projection is
+deterministic (exact-key matching, no models), and the `context-graph` plugins
+are active by default.
+
+Projection is a manual, cheap-to-rerun step. Build or refresh the graph from
+what has been captured, then walk it from a seed node:
+
+```sh
+hyp graph project                       # project captured data into the node/edge graph
+hyp graph compact                       # merge duplicate rows (optional housekeeping)
+hyp graph neighbors <node> --depth 2    # walk out from a seed node
+```
+
+`hyp graph neighbors` takes a `node_id`, natural key, or label as the seed,
+plus `--depth`, `--direction out|in|both`, `--type <node_type>`, `--edge-type
+<type>` (repeatable), and `--limit`. The graph is also plain data: the `node`
+and `edge` datasets are queryable through `hyp query sql` like any other
+dataset.
+
+Claude Code and Codex additionally get a `hypaware-graph` skill (and a
+`graph_neighbors` tool) so an assistant can project and walk the graph on your
+behalf.
+
 ## Attaching and detaching AI clients
 
 Attach a single client (idempotent — running twice is a no-op):
@@ -115,6 +166,46 @@ scripting. Claude writes only HypAware-related keys to
 `~/.claude/settings.json`; Codex writes a `hypaware` provider entry to
 `~/.codex/config.toml`. Unrelated keys in either file are preserved.
 
+## Opting a folder out of recording (`.hypignore`)
+
+A `.hypignore` file declares a data-usage policy for a directory subtree. It
+currently carries exactly one class, `ignore`: AI gateway exchanges (Claude /
+Codex) whose working directory is at or under a directory containing a
+`.hypignore` are never written to the local cache. The live LLM call is
+untouched; only persistence is suppressed.
+
+Resolution is gitignore-style: from an exchange's `cwd`, HypAware walks up the
+directory tree, and any `.hypignore` found on the way governs. Dropping one
+file at a repo root covers the whole repo, but the file works anywhere in the
+ancestor chain, including outside a git repo.
+
+Manage it with the CLI (idempotent; hand-authoring the dotfile is optional):
+
+```sh
+hyp ignore              # write a .hypignore at the repo root (or cwd if not in a repo)
+hyp ignore <path>       # ignore a specific directory subtree
+hyp ignore --check      # report whether cwd is ignored, which file governs, and
+                        # how many already-cached rows from the scope remain
+hyp unignore            # remove the governing .hypignore, re-enabling recording
+```
+
+`hyp ignore` writes a self-documenting file (a comment header plus the `ignore`
+token); an empty or comment-only `.hypignore` also means `ignore`. Pass
+`--json` to `hyp ignore` for a machine-readable result.
+
+Two things to know:
+
+- **Prospective only.** A `.hypignore` gates future recording and backfills.
+  Rows already captured before the file existed are left in place; `hyp ignore
+  --check` surfaces that residual count.
+- **Folder matching needs a `cwd`.** Only the Claude and Codex pathways supply
+  one, so `.hypignore` is a no-op for the `raw-anthropic` / `raw-openai`
+  proxy and OTEL sources.
+
+To pause recording for just the current Claude session (in-memory, reversible,
+not committed) use the `/hypaware-ignore` and `/hypaware-unignore` skills
+instead.
+
 ## Daemon lifecycle
 
 ```sh
@@ -130,33 +221,6 @@ hyp daemon uninstall    # remove the service file (config + recordings are kept)
 content and target paths without touching the filesystem — useful for
 verifying what `hyp init` will install.
 
-## V1 scope
-
-V1 ships with:
-
-- Bundled plugins (no external plugin install required for the default
-  flow): `@hypaware/ai-gateway`, `@hypaware/claude`, `@hypaware/codex`,
-  `@hypaware/otel`, `@hypaware/local-fs`, `@hypaware/format-parquet`,
-  `@hypaware/format-jsonl`, plus Anthropic and OpenAI upstreams.
-- Local capture for Claude Code, Codex, raw Anthropic API, raw OpenAI
-  API, and OTEL logs / traces / metrics.
-- Local query (`hyp query sql`) across captured AI gateway messages,
-  logs, traces, and metrics.
-- Local Parquet export through the `local-fs` sink and the
-  `format-parquet` encoder.
-- Claude Code attach and Codex attach (idempotent, reversible).
-- Persistent macOS / Linux user daemon (`launchd` LaunchAgent or
-  `systemd --user` unit).
-
-## Out of scope for V1
-
-- The central / enterprise sink.
-- Gascity (multi-tenant aggregation).
-- Extracting the kernel or bundled plugins into separate repos.
-- Migrating an existing Collectivus install.
-- An external first-party plugin registry. Only bundled plugins are
-  required for the default V1 path.
-
 ## Troubleshooting
 
 `hyp status` is the entry point for any "is HypAware working?" question.
@@ -208,7 +272,7 @@ npm run typecheck         # if a typecheck script is present
 npm pack --dry-run        # verify the published file set
 ```
 
-Re-run the V1 smoke battery and confirm every one is green:
+Re-run the smoke battery and confirm every one is green:
 
 ```sh
 hyp smoke package_bin_boot
@@ -265,8 +329,8 @@ hypaware-core/
     format-jsonl/       # @hypaware/format-jsonl
     claude/             # @hypaware/claude
     codex/              # @hypaware/codex
-    central/            # @hypaware/central (out of scope for V1)
-    gascity/            # @hypaware/gascity (out of scope for V1)
+    central/            # @hypaware/central (bundled, opt-in via `hyp join`)
+    gascity/            # @hypaware/gascity (bundled, opt-in)
 bin/
   hypaware.js           # CLI entrypoint (bound to both `hypaware` and `hyp`)
 ```

diff --git a/hypaware-core/plugins-workspace/claude/hypaware.plugin.json b/hypaware-core/plugins-workspace/claude/hypaware.plugin.json
@@ -33,6 +33,7 @@
     },
     "skills": [
       { "name": "hypaware-query", "clients": ["claude"] },
+      { "name": "hypaware-reference", "clients": ["claude"] },
       { "name": "hypaware-ignore", "clients": ["claude"] },
       { "name": "hypaware-unignore", "clients": ["claude"] },
       { "name": "hypaware-ai-adoption-report", "clients": ["claude"] },

diff --git a/hypaware-core/plugins-workspace/claude/skills/hypaware-reference/SKILL.md b/hypaware-core/plugins-workspace/claude/skills/hypaware-reference/SKILL.md
@@ -0,0 +1,163 @@
+---
+name: hypaware-reference
+description: Explain what HypAware is, what it captures, how its data flows, which hyp command to run, config and paths, joining a fleet, and the activity graph. Use for product and CLI orientation - "what is HypAware", "what can it capture", "how do I detach codex", "how do I join a server", "where does my data go". For querying recorded data use hypaware-query; for analyses use the hypaware-ai-*-report skills.
+---
+
+# HypAware Reference
+
+Orientation for questions about HypAware itself: what it is, what it captures,
+how data moves, and which command does what. This is a conceptual map, not a
+flag reference. For the exact flags and behavior on *this* install, run
+`hyp --help` and `hyp <command> --help`; for "is it working right now?" run
+`hyp status` (add `--json` for the stable machine shape). Those live sources
+win over anything summarized here.
+
+Do not answer data questions from this skill. It names no dataset columns, JSON
+paths, or SQL - that all lives in the **hypaware-query** skill, which is the
+single source of truth for the recorded data format.
+
+## What HypAware is
+
+A modular logs and telemetry collector with a plugin-kernel architecture. It
+captures conversations and traffic from local AI clients (Claude Code, Codex),
+raw Anthropic / OpenAI API traffic, and OpenTelemetry logs / traces / metrics
+into a local query cache and optional Parquet exports.
+
+It runs fully local by default, with no central server required. A host can
+optionally join a fleet (`hyp join`) to forward its recordings to a central
+HypAware server. HypAware is part of HypStack, an open-source stack for AI
+observability.
+
+## What it captures (sources)
+
+Any subset of these can be enabled at `hyp init`:
+
+- `claude` - Claude Code conversations
+- `codex` - Codex conversations
+- `raw-anthropic` - raw Anthropic API traffic
+- `raw-openai` - raw OpenAI API traffic
+- `otel` - OpenTelemetry logs / traces / metrics
+
+A `.hypignore` file opts a directory subtree out of recording, but it only
+gates the `claude` and `codex` pathways (they supply a working directory to
+match against). It is a no-op for the `raw-anthropic` / `raw-openai` proxy and
+OTEL sources. See the **hypaware-ignore** and **hypaware-unignore** skills for
+managing it, or `hyp ignore --help`.
+
+## How data flows (invariants)
+
+- **Capture always lands in the local cache first.** Every source writes only
+  to the intrinsic Iceberg-backed local query cache. Storage and query are
+  intrinsic to the kernel, not plugin-provided.
+- **Sinks are export targets, not the write path.** Configured sinks (for
+  example local-fs Parquet) receive *scheduled exports out of* the cache.
+  Sources never see sinks.
+- **One source, one table.** Each dataset table has exactly one producer and is
+  named after it.
+- **Config is explicit.** The written config enumerates the chosen plugins;
+  there is no implicit "use defaults" mode.
+
+## CLI command map
+
+The CLI is bound to both `hyp` and `hypaware`. Core commands:
+
+- `hyp status` - health snapshot: config path, daemon state, active plugins,
+  sources/sinks, attach state, retention, cache size, recent errors,
+  diagnostics with repair lines. The entry point for any "is it working?"
+  question.
+- `hyp init` - interactive walkthrough (or `--yes` non-interactive) that picks
+  sources, an export strategy, and a retention window, writes the config,
+  installs the daemon, and attaches selected clients.
+- `hyp query` - SQL over the local cache (and remote hosts via `--remote`).
+  Use the **hypaware-query** skill for this.
+- `hyp graph` - build and walk the activity graph: `project` materializes a
+  node/edge graph from captured data, `compact` folds duplicate rows, and
+  `neighbors <node>` walks out from a seed node. Contributed by the default-on
+  `@hypaware/context-graph` plugin.
+- `hyp attach` / `hyp detach` - wire a client (`claude`, `codex`) into the
+  local gateway, or remove only HypAware-managed settings. Idempotent and
+  reversible. `hyp unattach` is an alias of `detach`.
+- `hyp ignore` / `hyp unignore` - write or remove a `.hypignore` for a folder.
+- `hyp daemon` - lifecycle for the persistent user daemon: `install`, `start`,
+  `status`, `restart`, `stop`, `uninstall` (launchd on macOS, systemd `--user`
+  on Linux).
+- `hyp join` - enroll the host in a centrally-managed fleet.
+- `hyp remote` - manage remote HypAware query targets (`add`, `login`, `list`).
+- `hyp collect` - manage collected JSONL tables (`list`, `remove`).
+- `hyp backfill` - import client history from registered backfill providers.
+- `hyp sink` - manage sink instances (`force`, `maintain`).
+- `hyp config` - inspect or validate the config.
+- `hyp mcp` - serve this host's read verbs as an MCP server over stdio.
+- `hyp smoke` - run a hermetic smoke flow.
+- `hyp version` - print the version.
+
+`logs`, `metrics`, and `traces` commands are contributed by the `@hypaware/otel`
+plugin, so they appear only when it is enabled. Plugin-contributed commands
+vary by install - trust `hyp --help` for the live inventory, and
+`hyp <command> --help` for exact subcommands and flags.
+
+## Config and paths
+
+`HYP_HOME` defaults to `~/.hyp`; override by exporting it before invoking the
+CLI or daemon.
+
+- `<HYP_HOME>/hypaware-config.json` - active config, rewritten by `hyp init`
+- `<HYP_HOME>/hypaware/cache/` - local query cache (Iceberg-backed)
+- `<HYP_HOME>/hypaware/sinks/<name>/outbox/` - failed export rows awaiting retry
+- `<HYP_HOME>/hypaware/dev-telemetry/` - daemon self-telemetry
+- `<HYP_HOME>/hypaware/logs/daemon.{out,err}.log` - daemon stdout / stderr
+- `<HYP_HOME>/exports/` - local Parquet exports (when the local-fs sink is on)
+
+`hyp join` writes a separate central-enrollment layer under `config-control/`
+(mode `0600`), never into the local `hypaware-config.json`, so joining augments
+an existing install rather than replacing it.
+
+## Availability
+
+Available on every install (default flow, no extra config):
+
+- Local capture for Claude Code, Codex, raw Anthropic API, raw OpenAI API, and
+  OTEL logs / traces / metrics.
+- Local query over captured messages, logs, traces, and metrics.
+- The activity graph: `hyp graph project` builds a node/edge graph from captured
+  data and `hyp graph neighbors` walks it (also queryable via `hyp query`).
+- Local Parquet export.
+- Claude Code and Codex attach (idempotent, reversible).
+- A persistent macOS / Linux user daemon.
+
+Opt-in (enabled by explicit config, not the default flow):
+
+- **Central forwarding.** `hyp join <url> <token>` enrolls the host and turns on
+  the `@hypaware/central` sink, which forwards cache partitions to a central
+  HypAware server. It is fine to explain how to join a server; the receiving
+  server is a separate deployment the user points the host at.
+- **Additional bundled plugins** (for example an S3 sink and AI-enrichment
+  plugins). Some are opt-in specifically because enabling them lets captured
+  content leave the machine, so they must be a deliberate config choice. Trust
+  `hyp --help` and the written config for the live set on this install.
+
+Not provided by the CLI: there is no first-party plugin registry. Third-party
+plugins can be installed from npm or git, but HypAware does not curate a
+registry.
+
+## Hand-offs
+
+- Query or inspect recorded data - use the **hypaware-query** skill.
+- Run an analysis (spend, adoption, security, improvement) - use the
+  **hypaware-ai-spend-report**, **-adoption-report**, **-security-report**, or
+  **-improvement-report** skills.
+- Opt a folder out of recording - use **hypaware-ignore** /
+  **hypaware-unignore**, or pause just the current session with the
+  `/hypaware-ignore` and `/hypaware-unignore` skills.
+- "Is it working?" or diagnose a problem - `hyp status` (with `--json` for the
+  stable shape); its `diagnostics:` section carries `repair:` lines to run.
+
+## Guardrails
+
+- Treat `hyp --help`, `hyp <command> --help`, and `hyp status --json` as the
+  authoritative source for exact commands, flags, and state on this install.
+  Use the map above for orientation and conceptual answers only.
+- Never invent flags or promise a capability you cannot confirm on this
+  install. When unsure, run the relevant `--help` before answering.
+- Do not name dataset columns, JSON paths, or SQL here. Defer every data-format
+  detail to the **hypaware-query** skill.
diff --git a/hypaware-core/plugins-workspace/claude/src/index.js b/hypaware-core/plugins-workspace/claude/src/index.js
@@ -220,6 +220,7 @@ export async function activate(ctx) {
   // @ref LLP 0011#interactive-walkthrough [implements]: contributes client skills the first-run walkthrough installs
   for (const skillName of [
     'hypaware-query',
+    'hypaware-reference',
     'hypaware-ignore',
     'hypaware-unignore',
     'hypaware-ai-adoption-report',

diff --git a/hypaware-core/plugins-workspace/codex/hypaware.plugin.json b/hypaware-core/plugins-workspace/codex/hypaware.plugin.json
@@ -31,6 +31,7 @@
     },
     "skills": [
       { "name": "hypaware-query", "clients": ["codex"] },
+      { "name": "hypaware-reference", "clients": ["codex"] },
       { "name": "hypaware-ai-adoption-report", "clients": ["codex"] },
       { "name": "hypaware-ai-improvement-report", "clients": ["codex"] },
       { "name": "hypaware-ai-security-report", "clients": ["codex"] },