diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..6a5a440
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,56 @@
+# GoClaw CLI
+
+Go CLI for managing GoClaw AI agent gateway servers.
+
+## Tech Stack
+
+- **Language:** Go 1.25
+- **CLI:** Cobra (commands) + Viper-style config
+- **Transport:** HTTP REST + WebSocket RPC (gorilla/websocket)
+- **Config:** `~/.goclaw/config.yaml` + env vars + flags
+
+## Build & Test
+
+```bash
+go build ./...           # Compile check
+go vet ./...             # Static analysis
+go test ./...            # Run all tests
+go test -count=1 ./...   # Skip test cache
+make build               # Build binary with ldflags
+make install             # Install to GOPATH/bin
+```
+
+## Project Structure
+
+```
+cmd/           # Cobra command files (1 per resource group)
+internal/
+├── client/    # HTTP + WebSocket + auth clients
+├── config/    # Config loader (~/.goclaw/)
+├── output/    # Table/JSON/YAML formatters
+└── tui/       # Interactive prompts
+```
+
+## Conventions
+
+- File names use Go-style snake_case (e.g. `activity_aggregate.go`)
+- Cobra command pattern: register in `init()`, implement as `RunE`
+- Config precedence: flags > env vars > config file
+- Token stored in credential store (not config.yaml)
+- All destructive ops require `--yes` or interactive confirmation
+- Dual mode: interactive (table output) + automation (JSON/YAML)
+
+## Key Patterns
+
+- `newHTTP()` / `newWS()` — create authenticated clients from global config
+- `buildBody()` — construct request body from flag values, skip empty
+- `readContent()` — read from `@filepath` or literal string
+- `unmarshalMap()` / `unmarshalList()` — parse JSON responses
+- `printer.Print()` — output in configured format
+
+## Testing
+
+- Unit tests in `*_test.go` alongside source
+- Use `httptest.NewServer` for HTTP client tests
+- Use gorilla/websocket upgrader for WS tests
+- `go test -race` needs CGO, which is not available on Windows here; run race-detector tests in Linux CI
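The "Config precedence: flags > env vars > config file" convention listed in AGENTS.md can be illustrated with a minimal sketch. `resolveSetting` is a hypothetical helper written for this note, not a function from the repository, and it assumes an empty string means "not set" at that layer; the real loader may distinguish unset from empty differently.

```go
package main

import "fmt"

// resolveSetting is a hypothetical helper (not part of goclaw-cli) that
// applies the documented precedence: flag > env var > config file.
// An empty string is treated as "not set" at that layer (an assumption).
func resolveSetting(flagVal, envVal, fileVal string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Env var wins over the config file when no flag is given.
	fmt.Println(resolveSetting("", "json", "table"))
	// An explicit flag wins over both.
	fmt.Println(resolveSetting("yaml", "json", "table"))
}
```

Walking the layers in priority order and returning the first non-empty value keeps the rule in one place, so adding a layer later would only change the slice.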
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 30b26ed..b969e1a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,7 +5,7 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ---
 
-## [Unreleased] — Domain Coverage Expansion (P0–P5)
+## [Unreleased] — Domain Coverage Expansion (P0–P6)
 
 ### Added
 
@@ -46,6 +46,19 @@ Format: [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - `goclaw agents evolution skill apply <agent-id> <suggestion-id> [--skill-draft @file]` — explicit wrapper for approving `skill_add` suggestions through the server evolution approval route.
 - `goclaw agents evolution update` now maps `--action=accept|reject` to the server-compatible `status=approved|rejected` payload.
 
+**P6 — Backend-unblocked surfaces (gateway `v3.12.0-beta.20`+)**
+- `goclaw traces follow --session-key|--agent [--since RFC3339] [--limit N]` — one-shot incremental trace polling (`GET /v1/traces/follow`). Re-invoke with returned cursor to advance; no WS stream, no watch loop.
+- `goclaw providers reconnect <provider-id>` — hot-reconnect a provider, bumping the registry without touching credentials (`POST /v1/providers/{id}/reconnect`).
+- `goclaw sessions branch <session-key> --up-to-index N [--new-session-key K] [--label L] [--metadata k=v ...]` — branch a chat session at a 1-based message index into a new session (`POST /v1/chat/sessions/{key}/branch`). `--up-to-index=0` is preserved on the wire.
+- `goclaw sessions follow <session-key> [--cursor N] [--limit N]` — one-shot cursor-based history poll (`GET /v1/chat/sessions/{key}/history/follow`). Not a stream; `--cursor=0` is preserved literally in the query string.
+- `goclaw channels writers test <instance-id> --group-id G --user-id U` — probe a (group, user) pair against a channel's writer policy without mutating state (`POST /v1/channels/instances/{id}/writers/test`). Request body has exactly two keys.
+- `goclaw activity aggregate --group-by {action|actor_type|entity_type|actor_id} [--from --to --limit --actor-type --actor-id --action --entity-type --entity-id]` — group audit-log activity by dimension with bucket counts (`GET /v1/activity/aggregate`). Attached as subcommand of existing `activity` parent.
+- `goclaw logs aggregate [--group-by {level|source}] [--level --source --from]` — summarize the runtime log ring buffer (`GET /v1/logs/runtime/aggregate`, admin-only). Distinct from `logs tail`. Epoch-millis `last_seen` rendered as RFC3339, never scientific notation.
+
+### Fixed
+
+- `goclaw traces get <id>` — TTY mode now renders a human-readable summary (header card + span tree + events list) instead of dumping raw JSON. JSON-mode payload unchanged. Decode failures surface as wrapped errors instead of an empty `{}`. Trace ids are validated against `^[A-Za-z0-9._-]+$` and reserved tokens (`.`, `..`) are rejected before any HTTP call. Distinct exit codes per failure: not-found → 3, permission-denied → 2, malformed-id → 4, server-failure → 5. Latent retry-body bug in `internal/client/http.go` fixed: the final 5xx/429 response body is now preserved so the typed `APIError` reaches the caller (previously collapsed to exit 1). Closes #17.
+
 ### Notes
 - All new commands honor the AI-first ergonomics contract: `--output=json` envelope, central error handler, `--yes` for destructive ops, `--quiet` for CI.
 - P4/P5 backlog was re-swept against the current CLI surface; already-covered items were removed from residual scope before implementation.
diff --git a/README.md b/README.md
index db99c34..43a1a4d 100644
--- a/README.md
+++ b/README.md
@@ -91,6 +91,51 @@ echo "Analyze this log" | goclaw chat myagent
 | `restore` | System/tenant restore from backup archive |
 | `vault` | Knowledge Vault — documents, links, search, graph, enrichment |
 
+### Backend-Unblocked Surfaces (P6)
+
+Seven one-shot subcommands wired to backend PRs `#37` and `#44`:
+
+```bash
+# Incremental trace polling (one shot; rerun with returned cursor)
+goclaw traces follow --session-key <key> [--since <RFC3339>] [--limit <n>]
+goclaw traces follow --agent <id> [--since <RFC3339>] [--limit <n>]
+
+# Provider hot-reconnect (bumps registry without recreating credentials)
+goclaw providers reconnect <provider-id>
+
+# Branch a chat session at a message index
+goclaw sessions branch <session-key> --up-to-index <N> [--new-session-key <k>] \
+  [--label <l>] [--metadata k=v ...]
+
+# One-shot session-history poll (cursor-based; not a stream)
+goclaw sessions follow <session-key> [--cursor <n>] [--limit <n>]
+
+# Probe a (group, user) pair against a channel's writer policy
+goclaw channels writers test <instance-id> --group-id <g> --user-id <u>
+
+# Aggregate audit-log activity by dimension
+goclaw activity aggregate --group-by <action|actor_type|entity_type|actor_id> \
+  [--from <RFC3339>] [--to <RFC3339>] [--limit <n>] \
+  [--actor-type <v>] [--actor-id <v>] [--action <v>] [--entity-type <v>] [--entity-id <v>]
+
+# Summarize the runtime log ring buffer (NOT a stream — see 'logs tail' for that)
+goclaw logs aggregate [--group-by <level|source>] [--level <l>] [--source <s>] [--from <RFC3339>]
+```
+
+All are one-shot HTTP — no watch loops or WS streams. `logs aggregate` is admin-only on the server; `activity aggregate --group-by actor_id` is also admin-only (server-enforced).
+
+### Reading a Trace by ID
+
+```bash
+# Human-readable: header + span tree + events
+goclaw traces get <trace-id>
+
+# Machine-readable JSON (also auto-selected when stdout is piped)
+goclaw traces get <trace-id> -o json
+```
+
+Exit codes for `traces get`: `0` on success, `2` on permission denied, `3` on not-found, `4` on malformed id (rejected before any HTTP call — allowlist `^[A-Za-z0-9._-]+$`), `5` on upstream server failure, `6` on rate-limit / network-resource exhaustion.
+
 ## Backup & Restore
 
 ### System Backup
diff --git a/cmd/activity_aggregate.go b/cmd/activity_aggregate.go
new file mode 100644
index 0000000..15405a8
--- /dev/null
+++ b/cmd/activity_aggregate.go
@@ -0,0 +1,158 @@
+package cmd
+
+import (
+	"fmt"
+	"net/url"
+	"time"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// validActivityGroupBy enumerates allowed --group-by values for the activity
+// aggregate endpoint. Server enforces admin-only for actor_id; the CLI does
+// not pre-check that — it only validates the enum.
+var validActivityGroupBy = map[string]bool{
+	"action":      true,
+	"actor_type":  true,
+	"entity_type": true,
+	"actor_id":    true,
+}
+
+// formatLastSeen renders an aggregate bucket's last_seen field as RFC3339.
+//
+// The activity aggregate endpoint returns last_seen as an RFC3339 string,
+// but the logs runtime aggregate endpoint returns last_seen as epoch millis
+// (a number). `unmarshalMap` decodes JSON numbers as float64, and the shared
+// `str()` helper renders large float64 as scientific notation (e.g.
+// "1.76e+12"). This helper type-switches so both endpoints render
+// consistently as RFC3339 strings in the table view.
+func formatLastSeen(v any) string {
+	switch t := v.(type) {
+	case nil:
+		return "-"
+	case string:
+		if t == "" {
+			return "-"
+		}
+		return t
+	case float64:
+		if t == 0 {
+			return "-"
+		}
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	case int64:
+		if t == 0 {
+			return "-"
+		}
+		return time.UnixMilli(t).UTC().Format(time.RFC3339)
+	case int:
+		if t == 0 {
+			return "-"
+		}
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	default:
+		return fmt.Sprintf("%v", v)
+	}
+}
+
+// activityAggregateCmd groups audit-log activity by a dimension and returns
+// bucket counts. Attached as a subcommand of the existing activityCmd
+// (declared in cmd/admin.go) so the top-level command surface is unchanged.
+//
+// Backend route: GET /v1/activity/aggregate
+var activityAggregateCmd = &cobra.Command{
+	Use:   "aggregate",
+	Short: "Aggregate audit-log activity by a grouping dimension",
+	Long: `Group activity log entries by a dimension (action, actor_type, entity_type,
+or actor_id) and return bucket counts with last_seen timestamps.
+
+Optional filters narrow the result set: --from/--to (RFC3339 window),
+--actor-type, --actor-id, --action, --entity-type, --entity-id, --limit.
+
+Backend route: GET /v1/activity/aggregate
+Note: --group-by=actor_id requires admin privileges (enforced server-side).`,
+	RunE: func(cmd *cobra.Command, args []string) error {
+		groupBy, _ := cmd.Flags().GetString("group-by")
+		if groupBy == "" {
+			return fmt.Errorf("--group-by is required (one of action, actor_type, entity_type, actor_id)")
+		}
+		if !validActivityGroupBy[groupBy] {
+			return fmt.Errorf("--group-by must be one of action, actor_type, entity_type, actor_id (got %q)", groupBy)
+		}
+		from, _ := cmd.Flags().GetString("from")
+		if from != "" {
+			if _, err := time.Parse(time.RFC3339, from); err != nil {
+				return fmt.Errorf("--from must be RFC3339: %w", err)
+			}
+		}
+		to, _ := cmd.Flags().GetString("to")
+		if to != "" {
+			if _, err := time.Parse(time.RFC3339, to); err != nil {
+				return fmt.Errorf("--to must be RFC3339: %w", err)
+			}
+		}
+
+		q := url.Values{}
+		q.Set("group_by", groupBy)
+		if from != "" {
+			q.Set("from", from)
+		}
+		if to != "" {
+			q.Set("to", to)
+		}
+		if v, _ := cmd.Flags().GetInt("limit"); v > 0 {
+			q.Set("limit", fmt.Sprintf("%d", v))
+		}
+		for flagName, queryKey := range map[string]string{
+			"actor-type":  "actor_type",
+			"actor-id":    "actor_id",
+			"action":      "action",
+			"entity-type": "entity_type",
+			"entity-id":   "entity_id",
+		} {
+			if v, _ := cmd.Flags().GetString(flagName); v != "" {
+				q.Set(queryKey, v)
+			}
+		}
+
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		data, err := c.Get("/v1/activity/aggregate?" + q.Encode())
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		buckets, _ := m["buckets"].([]any)
+		tbl := output.NewTable("KEY", "COUNT", "LAST_SEEN")
+		for _, raw := range buckets {
+			row, ok := raw.(map[string]any)
+			if !ok {
+				continue
+			}
+			tbl.AddRow(str(row, "key"), str(row, "count"), formatLastSeen(row["last_seen"]))
+		}
+		printer.Print(tbl)
+		return nil
+	},
+}
+
+func init() {
+	activityAggregateCmd.Flags().String("group-by", "", "Grouping dimension: action | actor_type | entity_type | actor_id (required)")
+	activityAggregateCmd.Flags().String("from", "", "RFC3339 start of time window")
+	activityAggregateCmd.Flags().String("to", "", "RFC3339 end of time window")
+	activityAggregateCmd.Flags().Int("limit", 0, "Maximum buckets to return (server default applied if 0)")
+	activityAggregateCmd.Flags().String("actor-type", "", "Filter by actor type")
+	activityAggregateCmd.Flags().String("actor-id", "", "Filter by actor id")
+	activityAggregateCmd.Flags().String("action", "", "Filter by action")
+	activityAggregateCmd.Flags().String("entity-type", "", "Filter by entity type")
+	activityAggregateCmd.Flags().String("entity-id", "", "Filter by entity id")
+	_ = activityAggregateCmd.MarkFlagRequired("group-by")
+	activityCmd.AddCommand(activityAggregateCmd)
+}
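The doc comment on `formatLastSeen` says that epoch-millis values decoded from JSON as `float64` would render in scientific notation through a generic formatter. The sketch below reproduces that in isolation. `renderLastSeen` is a trimmed, hypothetical copy of the helper (the integer cases are omitted), and `naiveRender` stands in for the shared `str()` helper, whose actual implementation is not shown in this diff and is assumed here to behave like `%v`.

```go
package main

import (
	"fmt"
	"time"
)

// naiveRender stands in for a generic "%v" formatter (assumed to be what
// the shared str() helper effectively does for numbers).
func naiveRender(v any) string {
	return fmt.Sprintf("%v", v)
}

// renderLastSeen is a trimmed sketch of formatLastSeen: strings pass
// through, float64 is treated as epoch milliseconds, and empty or zero
// values become "-".
func renderLastSeen(v any) string {
	switch t := v.(type) {
	case nil:
		return "-"
	case string:
		if t == "" {
			return "-"
		}
		return t
	case float64:
		if t == 0 {
			return "-"
		}
		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
	default:
		return fmt.Sprintf("%v", v)
	}
}

func main() {
	millis := float64(1760000000000) // as decoded from a JSON number
	fmt.Println(naiveRender(millis))    // scientific notation
	fmt.Println(renderLastSeen(millis)) // RFC3339 timestamp in UTC
	fmt.Println(renderLastSeen("2026-05-27T11:00:00Z"))
}
```

Type-switching at the rendering edge lets both aggregate endpoints share one table column without changing how the JSON is decoded.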
diff --git a/cmd/activity_aggregate_test.go b/cmd/activity_aggregate_test.go
new file mode 100644
index 0000000..b2add55
--- /dev/null
+++ b/cmd/activity_aggregate_test.go
@@ -0,0 +1,186 @@
+package cmd
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"net/url"
+	"strings"
+	"testing"
+)
+
+func resetActivityAggregateFlags(t *testing.T) {
+	t.Helper()
+	for _, name := range []string{"group-by", "from", "to", "actor-type", "actor-id", "action", "entity-type", "entity-id"} {
+		resetTestFlag(activityAggregateCmd, name, "")
+	}
+	resetTestFlag(activityAggregateCmd, "limit", "0")
+}
+
+func TestActivityAggregate_MissingGroupBy(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "activity", "aggregate")
+	if err == nil {
+		t.Fatal("expected error for missing --group-by")
+	}
+}
+
+func TestActivityAggregate_InvalidGroupBy(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "activity", "aggregate", "--group-by=foo")
+	if err == nil {
+		t.Fatal("expected error for invalid --group-by=foo")
+	}
+}
+
+func TestActivityAggregate_ValidGroupByValues(t *testing.T) {
+	for _, gb := range []string{"action", "actor_type", "entity_type", "actor_id"} {
+		gb := gb
+		t.Run(gb, func(t *testing.T) {
+			t.Cleanup(func() { resetActivityAggregateFlags(t) })
+			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+				okJSON(t, w, map[string]any{"source": "activity", "group_by": gb, "total": 0, "buckets": []any{}})
+			}))
+			defer srv.Close()
+			t.Setenv("GOCLAW_SERVER", srv.URL)
+			t.Setenv("GOCLAW_TOKEN", "test-token")
+			if err := runCmd(t, "activity", "aggregate", "--group-by="+gb); err != nil {
+				t.Fatalf("group-by=%s: %v", gb, err)
+			}
+		})
+	}
+}
+
+func TestActivityAggregate_InvalidFromRFC3339(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "activity", "aggregate", "--group-by=action", "--from=not-a-date")
+	if err == nil {
+		t.Fatal("expected error for non-RFC3339 --from")
+	}
+}
+
+func TestActivityAggregate_InvalidToRFC3339(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "activity", "aggregate", "--group-by=action", "--to=tomorrow")
+	if err == nil {
+		t.Fatal("expected error for non-RFC3339 --to")
+	}
+}
+
+func TestActivityAggregate_QueryStringHasFilters(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+
+	var rawQuery, path string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		path = r.URL.Path
+		rawQuery = r.URL.RawQuery
+		okJSON(t, w, map[string]any{"source": "activity", "group_by": "action", "total": 0, "buckets": []any{}})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "activity", "aggregate",
+		"--group-by=action",
+		"--from=2026-05-01T00:00:00Z",
+		"--to=2026-05-27T00:00:00Z",
+		"--limit=25",
+		"--actor-type=user",
+		"--actor-id=u1",
+		"--action=session.branch",
+		"--entity-type=session",
+		"--entity-id=sess-1",
+	); err != nil {
+		t.Fatalf("activity aggregate: %v", err)
+	}
+	if path != "/v1/activity/aggregate" {
+		t.Fatalf("path = %q", path)
+	}
+	q, err := url.ParseQuery(rawQuery)
+	if err != nil {
+		t.Fatalf("parse query: %v", err)
+	}
+	want := map[string]string{
+		"group_by":    "action",
+		"from":        "2026-05-01T00:00:00Z",
+		"to":          "2026-05-27T00:00:00Z",
+		"limit":       "25",
+		"actor_type":  "user",
+		"actor_id":    "u1",
+		"action":      "session.branch",
+		"entity_type": "session",
+		"entity_id":   "sess-1",
+	}
+	for k, v := range want {
+		if got := q.Get(k); got != v {
+			t.Errorf("query[%s] = %q, want %q (full raw: %s)", k, got, v, rawQuery)
+		}
+	}
+}
+
+func TestActivityAggregate_JSONPreservesFields(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"source":   "activity",
+			"group_by": "action",
+			"total":    10,
+			"buckets": []map[string]any{
+				{"key": "session.branch", "count": 7, "last_seen": "2026-05-27T11:00:00Z"},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "activity", "aggregate", "--group-by=action")
+	})
+	if err != nil {
+		t.Fatalf("activity aggregate: %v", err)
+	}
+	for _, want := range []string{"source", "group_by", "total", "buckets"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("stdout missing %q in: %s", want, out)
+		}
+	}
+}
+
+func TestActivityAggregate_TableHeaders(t *testing.T) {
+	t.Cleanup(func() { resetActivityAggregateFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"source":   "activity",
+			"group_by": "action",
+			"total":    1,
+			"buckets": []map[string]any{
+				{"key": "session.branch", "count": 1, "last_seen": "2026-05-27T11:00:00Z"},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "activity", "aggregate", "--group-by=action")
+	})
+	if err != nil {
+		t.Fatalf("activity aggregate: %v", err)
+	}
+	for _, want := range []string{"KEY", "COUNT", "LAST_SEEN", "session.branch"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("table missing %q in:\n%s", want, out)
+		}
+	}
+}
diff --git a/cmd/channels_writers.go b/cmd/channels_writers.go
index f789ad9..04b1228 100644
--- a/cmd/channels_writers.go
+++ b/cmd/channels_writers.go
@@ -1,6 +1,9 @@
 package cmd
 
 import (
+	"net/url"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
 	"github.com/spf13/cobra"
 )
 
@@ -79,11 +82,61 @@ var channelsWritersRemoveCmd = &cobra.Command{
 	},
 }
 
+// channelsWritersTestCmd probes whether a (group, user) pair is permitted to
+// write into a channel instance. Body is POSTed with exactly two keys
+// (group_id, user_id) — no extra fields, so the server's contract stays tight.
+//
+// Backend route: POST /v1/channels/instances/{id}/writers/test
+var channelsWritersTestCmd = &cobra.Command{
+	Use:   "test <instanceID>",
+	Short: "Test whether a (group, user) pair is allowed to write",
+	Long: `Probe a channel instance's writer policy for a specific group/user pair
+without mutating state.
+
+Backend route: POST /v1/channels/instances/{id}/writers/test`,
+	Args: cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		groupID, _ := cmd.Flags().GetString("group-id")
+		userID, _ := cmd.Flags().GetString("user-id")
+		// Body has exactly two keys — construct directly so no other flags
+		// (e.g. accidental future additions) leak into the request.
+		body := map[string]any{
+			"group_id": groupID,
+			"user_id":  userID,
+		}
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		path := "/v1/channels/instances/" + url.PathEscape(args[0]) + "/writers/test"
+		data, err := c.Post(path, body)
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		tbl := output.NewTable("ALLOWED", "REASON", "WRITER_COUNT", "GROUP_ID", "USER_ID")
+		tbl.AddRow(str(m, "allowed"), str(m, "reason"), str(m, "writer_count"),
+			str(m, "group_id"), str(m, "user_id"))
+		printer.Print(tbl)
+		return nil
+	},
+}
+
 func init() {
 	channelsWritersAddCmd.Flags().String("user", "", "User ID")
 	channelsWritersAddCmd.Flags().String("display-name", "", "Display name")
 	_ = channelsWritersAddCmd.MarkFlagRequired("user")
 	channelsWritersRemoveCmd.Flags().String("user", "", "User ID")
 	_ = channelsWritersRemoveCmd.MarkFlagRequired("user")
-	channelsWritersCmd.AddCommand(channelsWritersListCmd, channelsWritersGroupsCmd, channelsWritersAddCmd, channelsWritersRemoveCmd)
+
+	channelsWritersTestCmd.Flags().String("group-id", "", "Group identifier (required)")
+	channelsWritersTestCmd.Flags().String("user-id", "", "User identifier (required)")
+	_ = channelsWritersTestCmd.MarkFlagRequired("group-id")
+	_ = channelsWritersTestCmd.MarkFlagRequired("user-id")
+
+	channelsWritersCmd.AddCommand(channelsWritersListCmd, channelsWritersGroupsCmd, channelsWritersAddCmd, channelsWritersRemoveCmd, channelsWritersTestCmd)
 }
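`channelsWritersTestCmd` builds its route with `url.PathEscape(args[0])`. The sketch below shows what that call produces for an instance id containing reserved characters. `writersTestPath` is a hypothetical helper that mirrors the concatenation in the command. One detail that matters when asserting on the wire path: `url.PathEscape` escapes `/` but leaves `:` as-is.

```go
package main

import (
	"fmt"
	"net/url"
)

// writersTestPath is a hypothetical helper mirroring how the command
// assembles its route: the instance id is escaped as a single path
// segment so an embedded "/" cannot introduce extra segments.
func writersTestPath(instanceID string) string {
	return "/v1/channels/instances/" + url.PathEscape(instanceID) + "/writers/test"
}

func main() {
	// "/" is percent-encoded; ":" is allowed in a path segment and kept.
	fmt.Println(writersTestPath("weird/id:1"))
	// Spaces become %20 (not "+", which is query-string encoding).
	fmt.Println(writersTestPath("my instance"))
}
```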
diff --git a/cmd/channels_writers_test_test.go b/cmd/channels_writers_test_test.go
new file mode 100644
index 0000000..fe788ea
--- /dev/null
+++ b/cmd/channels_writers_test_test.go
@@ -0,0 +1,160 @@
+package cmd
+
+import (
+	"encoding/json"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+)
+
+func resetChannelsWritersTestFlags(t *testing.T) {
+	t.Helper()
+	for _, name := range []string{"group-id", "user-id"} {
+		resetTestFlag(channelsWritersTestCmd, name, "")
+	}
+}
+
+func TestChannelsWritersTest_BodyShape(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+
+	var gotPath, gotMethod string
+	var body map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		gotPath = r.URL.Path
+		gotMethod = r.Method
+		raw, _ := io.ReadAll(r.Body)
+		_ = json.Unmarshal(raw, &body)
+		okJSON(t, w, map[string]any{
+			"allowed":      true,
+			"reason":       "writer",
+			"instance_id":  "inst-1",
+			"agent_id":     "agent-1",
+			"group_id":     "group:telegram:-100123",
+			"user_id":      "386246614",
+			"writer_count": 3,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "channels", "writers", "test", "inst-1",
+		"--group-id=group:telegram:-100123",
+		"--user-id=386246614"); err != nil {
+		t.Fatalf("channels writers test: %v", err)
+	}
+	if gotMethod != http.MethodPost {
+		t.Fatalf("method = %q", gotMethod)
+	}
+	if gotPath != "/v1/channels/instances/inst-1/writers/test" {
+		t.Fatalf("path = %q", gotPath)
+	}
+	if body["group_id"] != "group:telegram:-100123" || body["user_id"] != "386246614" {
+		t.Fatalf("body fields wrong: %#v", body)
+	}
+	// Body must contain ONLY these two keys.
+	if len(body) != 2 {
+		t.Fatalf("body has extra keys: %#v", body)
+	}
+}
+
+func TestChannelsWritersTest_MissingGroupID(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "channels", "writers", "test", "inst-1", "--user-id=u1")
+	if err == nil {
+		t.Fatal("expected error for missing --group-id")
+	}
+}
+
+func TestChannelsWritersTest_MissingUserID(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "channels", "writers", "test", "inst-1", "--group-id=g1")
+	if err == nil {
+		t.Fatal("expected error for missing --user-id")
+	}
+}
+
+func TestChannelsWritersTest_PathEscape(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+
+	var rawPath, path string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		rawPath = r.URL.RawPath
+		path = r.URL.Path
+		okJSON(t, w, map[string]any{"allowed": true, "reason": "writer", "writer_count": 1})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "channels", "writers", "test", "weird/id:1", "--group-id=g1", "--user-id=u1"); err != nil {
+		t.Fatalf("channels writers test: %v", err)
+	}
+	if !strings.Contains(rawPath, "weird%2Fid:1") {
+		t.Fatalf("path not escaped: RawPath=%q Path=%q", rawPath, path)
+	}
+}
+
+func TestChannelsWritersTest_JSONOutput(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"allowed":      false,
+			"reason":       "not_writer",
+			"writer_count": 2,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "channels", "writers", "test", "inst-1", "--group-id=g1", "--user-id=u1")
+	})
+	if err != nil {
+		t.Fatalf("channels writers test: %v", err)
+	}
+	for _, want := range []string{"allowed", "reason", "writer_count"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("stdout missing %q in: %s", want, out)
+		}
+	}
+}
+
+func TestChannelsWritersTest_TableHeaders(t *testing.T) {
+	t.Cleanup(func() { resetChannelsWritersTestFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"allowed":      true,
+			"reason":       "writer",
+			"writer_count": 3,
+			"group_id":     "g1",
+			"user_id":      "u1",
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "channels", "writers", "test", "inst-1", "--group-id=g1", "--user-id=u1")
+	})
+	if err != nil {
+		t.Fatalf("channels writers test: %v", err)
+	}
+	for _, want := range []string{"ALLOWED", "REASON", "WRITER_COUNT", "GROUP_ID", "USER_ID"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("table missing header %q in:\n%s", want, out)
+		}
+	}
+}
diff --git a/cmd/logs_aggregate.go b/cmd/logs_aggregate.go
new file mode 100644
index 0000000..f994b90
--- /dev/null
+++ b/cmd/logs_aggregate.go
@@ -0,0 +1,98 @@
+package cmd
+
+import (
+	"fmt"
+	"net/url"
+	"time"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// validLogsGroupBy enumerates allowed --group-by values for the runtime logs
+// aggregate endpoint.
+var validLogsGroupBy = map[string]bool{
+	"level":  true,
+	"source": true,
+}
+
+// logsAggregateCmd queries the runtime log ring-buffer aggregate endpoint.
+// Distinct from `logs tail` (WS streaming): this is a one-shot HTTP GET that
+// summarizes the in-memory ring buffer by level or source.
+//
+// Backend route: GET /v1/logs/runtime/aggregate (admin-only on server side).
+var logsAggregateCmd = &cobra.Command{
+	Use:   "aggregate",
+	Short: "Summarize runtime logs (ring buffer) by level or source",
+	Long: `Aggregate the in-memory runtime log ring buffer by --group-by (level or
+source). This is a one-shot HTTP query — not a stream. Use 'logs tail' for
+real-time streaming.
+
+Backend route: GET /v1/logs/runtime/aggregate (admin-only, server-enforced).`,
+	RunE: func(cmd *cobra.Command, args []string) error {
+		groupBy, _ := cmd.Flags().GetString("group-by")
+		if groupBy == "" {
+			groupBy = "level"
+		}
+		if !validLogsGroupBy[groupBy] {
+			return fmt.Errorf("--group-by must be one of level, source (got %q)", groupBy)
+		}
+		from, _ := cmd.Flags().GetString("from")
+		if from != "" {
+			if _, err := time.Parse(time.RFC3339, from); err != nil {
+				return fmt.Errorf("--from must be RFC3339: %w", err)
+			}
+		}
+
+		q := url.Values{}
+		q.Set("group_by", groupBy)
+		for flagName, queryKey := range map[string]string{
+			"level":  "level",
+			"source": "source",
+			"from":   "from",
+		} {
+			if v, _ := cmd.Flags().GetString(flagName); v != "" {
+				q.Set(queryKey, v)
+			}
+		}
+
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		data, err := c.Get("/v1/logs/runtime/aggregate?" + q.Encode())
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		// Summary row (source, retention, capacity, sample_size).
+		summary := output.NewTable("SOURCE", "RETENTION", "CAPACITY", "SAMPLE_SIZE")
+		summary.AddRow(str(m, "source"), str(m, "retention"),
+			str(m, "capacity"), str(m, "sample_size"))
+		printer.Print(summary)
+		// Bucket rows.
+		buckets, _ := m["buckets"].([]any)
+		tbl := output.NewTable("KEY", "COUNT", "LAST_SEEN")
+		for _, raw := range buckets {
+			row, ok := raw.(map[string]any)
+			if !ok {
+				continue
+			}
+			tbl.AddRow(str(row, "key"), str(row, "count"), formatLastSeen(row["last_seen"]))
+		}
+		printer.Print(tbl)
+		return nil
+	},
+}
+
+func init() {
+	logsAggregateCmd.Flags().String("group-by", "level", "Grouping dimension: level | source")
+	logsAggregateCmd.Flags().String("level", "", "Filter by level: debug | info | warn | error")
+	logsAggregateCmd.Flags().String("source", "", "Filter by source")
+	logsAggregateCmd.Flags().String("from", "", "RFC3339 start of time window")
+	logsCmd.AddCommand(logsAggregateCmd)
+}
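Both aggregate commands collect optional filters by ranging over a Go map, whose iteration order is unspecified. That does not make the request URL nondeterministic, because `url.Values.Encode` sorts by key. A minimal sketch follows. `buildQuery` is a hypothetical helper written for this note and the filter values are illustrative.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQuery is a hypothetical helper: it copies non-empty filters into
// url.Values and encodes them. Encode sorts by key, so the output is
// stable regardless of map iteration order.
func buildQuery(filters map[string]string) string {
	q := url.Values{}
	for key, val := range filters {
		if val != "" { // skip unset filters, as the commands do
			q.Set(key, val)
		}
	}
	return q.Encode()
}

func main() {
	fmt.Println(buildQuery(map[string]string{
		"group_by": "source",
		"level":    "warn",
		"source":   "", // unset, omitted from the query
		"from":     "2026-05-01T00:00:00Z",
	}))
}
```

The `:` characters in the RFC3339 timestamp are percent-encoded as `%3A` in the query string, and `url.ParseQuery` decodes them again, as the tests in this change do.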
diff --git a/cmd/logs_aggregate_test.go b/cmd/logs_aggregate_test.go
new file mode 100644
index 0000000..7709c75
--- /dev/null
+++ b/cmd/logs_aggregate_test.go
@@ -0,0 +1,224 @@
+package cmd
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"net/url"
+	"regexp"
+	"strings"
+	"testing"
+)
+
+func resetLogsAggregateFlags(t *testing.T) {
+	t.Helper()
+	resetTestFlag(logsAggregateCmd, "group-by", "level")
+	for _, name := range []string{"level", "source", "from"} {
+		resetTestFlag(logsAggregateCmd, name, "")
+	}
+}
+
+func TestLogsAggregate_DefaultGroupBy(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+
+	var path, rawQuery string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		path = r.URL.Path
+		rawQuery = r.URL.RawQuery
+		okJSON(t, w, map[string]any{
+			"source":      "runtime",
+			"retention":   "ring_buffer",
+			"capacity":    100,
+			"sample_size": 0,
+			"group_by":    "level",
+			"buckets":     []any{},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "logs", "aggregate"); err != nil {
+		t.Fatalf("logs aggregate: %v", err)
+	}
+	if path != "/v1/logs/runtime/aggregate" {
+		t.Fatalf("path = %q", path)
+	}
+	q, _ := url.ParseQuery(rawQuery)
+	// Default group_by=level should appear in query.
+	if q.Get("group_by") != "level" {
+		t.Errorf("expected group_by=level in raw query, got: %q", rawQuery)
+	}
+}
+
+func TestLogsAggregate_InvalidGroupBy(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "logs", "aggregate", "--group-by=foo")
+	if err == nil {
+		t.Fatal("expected error for invalid --group-by=foo")
+	}
+}
+
+func TestLogsAggregate_QueryStringFilters(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+
+	var rawQuery string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		rawQuery = r.URL.RawQuery
+		okJSON(t, w, map[string]any{
+			"source": "runtime", "retention": "ring_buffer", "capacity": 100,
+			"sample_size": 0, "group_by": "source", "buckets": []any{},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "logs", "aggregate",
+		"--group-by=source",
+		"--level=warn",
+		"--source=router",
+		"--from=2026-05-01T00:00:00Z",
+	); err != nil {
+		t.Fatalf("logs aggregate: %v", err)
+	}
+	q, _ := url.ParseQuery(rawQuery)
+	want := map[string]string{
+		"group_by": "source",
+		"level":    "warn",
+		"source":   "router",
+		"from":     "2026-05-01T00:00:00Z",
+	}
+	for k, v := range want {
+		if got := q.Get(k); got != v {
+			t.Errorf("query[%s] = %q, want %q (raw: %s)", k, got, v, rawQuery)
+		}
+	}
+}
+
+func TestLogsAggregate_InvalidFromRFC3339(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "logs", "aggregate", "--from=not-a-date")
+	if err == nil {
+		t.Fatal("expected error for non-RFC3339 --from")
+	}
+}
+
+func TestLogsAggregate_JSONPreservesFields(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"source":      "runtime",
+			"retention":   "ring_buffer",
+			"capacity":    100,
+			"sample_size": 25,
+			"group_by":    "level",
+			"buckets": []map[string]any{
+				{"key": "warn", "count": 3, "last_seen": 1760000000000},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "logs", "aggregate")
+	})
+	if err != nil {
+		t.Fatalf("logs aggregate: %v", err)
+	}
+	for _, want := range []string{"retention", "capacity", "sample_size"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("stdout missing %q in: %s", want, out)
+		}
+	}
+}
+
+// Numeric-last_seen rendering regression: the runtime logs endpoint returns
+// last_seen as epoch millis (a JSON number), which json.Unmarshal decodes as
+// float64. Naively rendering with fmt.Sprintf("%v", ...) produces scientific
+// notation (e.g. "1.76e+12") for large numbers — useless in a table. The
+// formatLastSeen helper must type-switch and emit RFC3339.
+func TestLogsAggregate_LastSeenRendersRFC3339(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"source":      "runtime",
+			"retention":   "ring_buffer",
+			"capacity":    100,
+			"sample_size": 1,
+			"group_by":    "level",
+			"buckets": []map[string]any{
+				{"key": "warn", "count": 3, "last_seen": 1760000000000},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "logs", "aggregate")
+	})
+	if err != nil {
+		t.Fatalf("logs aggregate: %v", err)
+	}
+	if strings.Contains(out, "e+12") || strings.Contains(out, "e+11") {
+		t.Errorf("LAST_SEEN must not render in scientific notation:\n%s", out)
+	}
+	rfc3339 := regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z`)
+	if !rfc3339.MatchString(out) {
+		t.Errorf("expected RFC3339 timestamp in LAST_SEEN cell:\n%s", out)
+	}
+}
+
+func TestLogsAggregate_TableHeaders(t *testing.T) {
+	t.Cleanup(func() { resetLogsAggregateFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"source":      "runtime",
+			"retention":   "ring_buffer",
+			"capacity":    100,
+			"sample_size": 1,
+			"group_by":    "level",
+			"buckets": []map[string]any{
+				{"key": "warn", "count": 3, "last_seen": 1760000000000},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "logs", "aggregate")
+	})
+	if err != nil {
+		t.Fatalf("logs aggregate: %v", err)
+	}
+	for _, want := range []string{"KEY", "COUNT", "LAST_SEEN"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("table missing header %q in:\n%s", want, out)
+		}
+	}
+}
+
+func TestLogsAggregate_DistinctFromTail(t *testing.T) {
+	if logsAggregateCmd == nil {
+		t.Fatal("logsAggregateCmd not declared")
+	}
+	if !strings.HasPrefix(logsAggregateCmd.Use, "aggregate") {
+		t.Fatalf("Use = %q", logsAggregateCmd.Use)
+	}
+	// Sanity: aggregate is NOT a watch loop — has no --follow flag.
+	if f := logsAggregateCmd.Flags().Lookup("follow"); f != nil {
+		t.Errorf("logs aggregate should not have --follow flag (that's logs tail)")
+	}
+}
diff --git a/cmd/providers_reconnect.go b/cmd/providers_reconnect.go
new file mode 100644
index 0000000..be43753
--- /dev/null
+++ b/cmd/providers_reconnect.go
@@ -0,0 +1,51 @@
+package cmd
+
+import (
+	"net/url"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// providersReconnectCmd POSTs an empty body to /v1/providers/{id}/reconnect.
+// Backend verifies reconnect server-side — do NOT add a --verify flag.
+var providersReconnectCmd = &cobra.Command{
+	Use:   "reconnect <provider-id>",
+	Short: "Force-reconnect a registered provider (admin-only)",
+	Long: `Force-reconnect a registered provider. Admin-only on the server.
+
+The server handles reconnect verification internally; no client-side --verify
+flag is exposed. (Note: ` + "`providers verify-embedding`" + ` is a different
+command targeting a different endpoint.)`,
+	Args: cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		data, err := c.Post("/v1/providers/"+url.PathEscape(args[0])+"/reconnect", nil)
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		tbl := output.NewTable("STATUS", "REGISTRY_UPDATED", "CACHE_INVALIDATED", "PROVIDER")
+		var providerLabel string
+		if p, ok := m["provider"].(map[string]any); ok {
+			providerLabel = str(p, "name")
+			if providerLabel == "" {
+				providerLabel = str(p, "id")
+			}
+		}
+		tbl.AddRow(str(m, "status"), str(m, "registry_updated"), str(m, "cache_invalidated"), providerLabel)
+		printer.Print(tbl)
+		return nil
+	},
+}
+
+func init() {
+	providersCmd.AddCommand(providersReconnectCmd)
+}
diff --git a/cmd/providers_reconnect_test.go b/cmd/providers_reconnect_test.go
new file mode 100644
index 0000000..a0fd304
--- /dev/null
+++ b/cmd/providers_reconnect_test.go
@@ -0,0 +1,131 @@
+package cmd
+
+import (
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"sync/atomic"
+	"testing"
+)
+
+func TestProvidersReconnect_PathAndMethod(t *testing.T) {
+	var calls int64
+	var gotPath, gotMethod string
+	var gotBody []byte
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		gotPath = r.URL.Path
+		gotMethod = r.Method
+		gotBody, _ = io.ReadAll(r.Body)
+		okJSON(t, w, map[string]any{
+			"status":            "reconnected",
+			"provider":          map[string]any{"id": "prov-1", "name": "openai"},
+			"registry_updated":  true,
+			"cache_invalidated": true,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "providers", "reconnect", "prov-1"); err != nil {
+		t.Fatalf("providers reconnect: %v", err)
+	}
+	if atomic.LoadInt64(&calls) != 1 {
+		t.Fatalf("expected exactly 1 request, got %d", atomic.LoadInt64(&calls))
+	}
+	if gotMethod != http.MethodPost {
+		t.Fatalf("method = %q, want POST", gotMethod)
+	}
+	if gotPath != "/v1/providers/prov-1/reconnect" {
+		t.Fatalf("path = %q", gotPath)
+	}
+	body := strings.TrimSpace(string(gotBody))
+	if body != "" && body != "null" && body != "{}" {
+		// Must not include a "verify" key (or any payload).
+		if strings.Contains(body, "verify") {
+			t.Fatalf("body contains 'verify': %q", body)
+		}
+	}
+}
+
+func TestProvidersReconnect_PathEscape(t *testing.T) {
+	var gotPath string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		gotPath = r.URL.Path
+		okJSON(t, w, map[string]any{"status": "reconnected"})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	// Provider id with characters that need percent-encoding.
+	if err := runCmd(t, "providers", "reconnect", "weird/id:1"); err != nil {
+		t.Fatalf("providers reconnect: %v", err)
+	}
+	// net/http decodes the path into r.URL.Path, so gotPath holds the decoded provider id.
+	if !strings.Contains(gotPath, "weird/id:1") {
+		// Path semantics: PathEscape encodes "/" as %2F; net/http decodes it back. Accept either form.
+		if !strings.Contains(gotPath, "weird") || !strings.Contains(gotPath, "id:1") {
+			t.Fatalf("path = %q (expected to contain the provider id)", gotPath)
+		}
+	}
+}
+
+func TestProvidersReconnect_JSONOutputPreservesFields(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"status":            "disabled",
+			"registry_updated":  false,
+			"cache_invalidated": true,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "providers", "reconnect", "prov-1")
+	})
+	if err != nil {
+		t.Fatalf("providers reconnect: %v", err)
+	}
+	if !strings.Contains(out, "registry_updated") || !strings.Contains(out, "cache_invalidated") {
+		t.Fatalf("stdout missing fields: %s", out)
+	}
+}
+
+func TestProvidersReconnect_TableOutput(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"status":            "reconnected",
+			"registry_updated":  true,
+			"cache_invalidated": true,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "providers", "reconnect", "prov-1")
+	})
+	if err != nil {
+		t.Fatalf("providers reconnect: %v", err)
+	}
+	if !strings.Contains(out, "STATUS") || !strings.Contains(out, "REGISTRY_UPDATED") || !strings.Contains(out, "CACHE_INVALIDATED") {
+		t.Fatalf("table headers missing in:\n%s", out)
+	}
+}
+
+func TestProvidersReconnect_MissingArg(t *testing.T) {
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "providers", "reconnect")
+	if err == nil {
+		t.Fatal("expected error for missing provider id")
+	}
+}
diff --git a/cmd/sessions_branch.go b/cmd/sessions_branch.go
new file mode 100644
index 0000000..c0f2a72
--- /dev/null
+++ b/cmd/sessions_branch.go
@@ -0,0 +1,85 @@
+package cmd
+
+import (
+	"fmt"
+	"net/url"
+	"strings"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// sessionsBranchCmd posts to /v1/chat/sessions/{key}/branch.
+//
+// Backend route: POST /v1/chat/sessions/{key}/branch (chat domain).
+// (Sibling commands under `sessions` parent target /v1/sessions/...; this
+// command intentionally targets the chat-sessions tree where branching lives.)
+//
+// Body is constructed directly (NOT via buildBody) because up_to_index=0 is a
+// valid required value that buildBody's int-zero skip would silently drop.
+var sessionsBranchCmd = &cobra.Command{
+	Use:   "branch <sessionKey>",
+	Short: "Branch a chat session at a message index",
+	Long: `Branch a chat session into a new session by copying messages up to a 1-based
+index. The source session is unchanged.
+
+Backend route: POST /v1/chat/sessions/{key}/branch (chat domain).`,
+	Args: cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		upTo, _ := cmd.Flags().GetInt("up-to-index")
+		if upTo < 0 {
+			return fmt.Errorf("--up-to-index must be >= 0 (got %d)", upTo)
+		}
+
+		// Validate metadata up front, BEFORE HTTP call.
+		metaPairs, _ := cmd.Flags().GetStringArray("metadata")
+		metadata := make(map[string]any)
+		for _, kv := range metaPairs {
+			parts := strings.SplitN(kv, "=", 2)
+			if len(parts) != 2 || parts[0] == "" {
+				return fmt.Errorf("--metadata must be key=value (got %q)", kv)
+			}
+			metadata[parts[0]] = parts[1]
+		}
+
+		// Build body directly so up_to_index=0 is preserved on the wire.
+		body := map[string]any{"up_to_index": upTo}
+		if v, _ := cmd.Flags().GetString("new-session-key"); v != "" {
+			body["new_session_key"] = v
+		}
+		if v, _ := cmd.Flags().GetString("label"); v != "" {
+			body["label"] = v
+		}
+		if len(metadata) > 0 {
+			body["metadata"] = metadata
+		}
+
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		data, err := c.Post("/v1/chat/sessions/"+url.PathEscape(args[0])+"/branch", body)
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		tbl := output.NewTable("SOURCE", "NEW_KEY", "COPIED", "TOTAL", "LABEL")
+		tbl.AddRow(str(m, "source_key"), str(m, "session_key"),
+			str(m, "copied_messages"), str(m, "total_messages"), str(m, "label"))
+		printer.Print(tbl)
+		return nil
+	},
+}
+
+func init() {
+	sessionsBranchCmd.Flags().Int("up-to-index", -1, "Copy messages 1..N into the new session (required, >=0)")
+	sessionsBranchCmd.Flags().String("new-session-key", "", "Override generated session key")
+	sessionsBranchCmd.Flags().String("label", "", "Label for the new session")
+	sessionsBranchCmd.Flags().StringArray("metadata", nil, "Repeatable key=value metadata pair")
+	_ = sessionsBranchCmd.MarkFlagRequired("up-to-index")
+	sessionsCmd.AddCommand(sessionsBranchCmd)
+}
diff --git a/cmd/sessions_branch_test.go b/cmd/sessions_branch_test.go
new file mode 100644
index 0000000..0641042
--- /dev/null
+++ b/cmd/sessions_branch_test.go
@@ -0,0 +1,197 @@
+package cmd
+
+import (
+	"encoding/json"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+
+	"github.com/spf13/pflag"
+)
+
+func resetSessionsBranchFlags(t *testing.T) {
+	t.Helper()
+	for _, name := range []string{"new-session-key", "label"} {
+		resetTestFlag(sessionsBranchCmd, name, "")
+	}
+	resetTestFlag(sessionsBranchCmd, "up-to-index", "-1")
+	// metadata is a StringArray; reset via SliceValue.Replace.
+	if f := sessionsBranchCmd.Flags().Lookup("metadata"); f != nil {
+		if sv, ok := f.Value.(pflag.SliceValue); ok {
+			_ = sv.Replace(nil)
+		}
+		f.Changed = false
+	}
+}
+
+func TestSessionsBranch_BodyShapeWithMetadata(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+
+	var gotPath, gotMethod string
+	var body map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		gotPath = r.URL.Path
+		gotMethod = r.Method
+		data, _ := io.ReadAll(r.Body)
+		_ = json.Unmarshal(data, &body)
+		okJSON(t, w, map[string]any{
+			"ok":              true,
+			"source_key":      "sess-1",
+			"session_key":     "sess-1-branch",
+			"copied_messages": 12,
+			"total_messages":  24,
+			"label":           "demo",
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "branch", "sess-1",
+		"--up-to-index=12",
+		"--new-session-key=sess-1-branch",
+		"--label=demo",
+		"--metadata=foo=bar",
+		"--metadata=baz=qux"); err != nil {
+		t.Fatalf("sessions branch: %v", err)
+	}
+	if gotMethod != http.MethodPost {
+		t.Fatalf("method = %q", gotMethod)
+	}
+	if gotPath != "/v1/chat/sessions/sess-1/branch" {
+		t.Fatalf("path = %q", gotPath)
+	}
+	// up_to_index must be present as a numeric value
+	if v, ok := body["up_to_index"]; !ok {
+		t.Fatalf("body missing up_to_index: %#v", body)
+	} else if n, ok := v.(float64); !ok || int(n) != 12 {
+		t.Fatalf("up_to_index = %#v", v)
+	}
+	if body["new_session_key"] != "sess-1-branch" {
+		t.Errorf("new_session_key = %#v", body["new_session_key"])
+	}
+	if body["label"] != "demo" {
+		t.Errorf("label = %#v", body["label"])
+	}
+	meta, ok := body["metadata"].(map[string]any)
+	if !ok {
+		t.Fatalf("metadata is not object: %#v", body["metadata"])
+	}
+	if meta["foo"] != "bar" || meta["baz"] != "qux" {
+		t.Fatalf("metadata = %#v", meta)
+	}
+}
+
+// Zero-boundary regression: --up-to-index 0 means "branch with zero copied
+// messages" (an empty branch from the session start) and must appear as
+// "up_to_index":0 in the wire body. Any helper that drops numeric zeros would
+// silently turn this into a server-side "missing required field" error.
+func TestSessionsBranch_UpToIndexZero(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+
+	var body map[string]any
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		data, _ := io.ReadAll(r.Body)
+		_ = json.Unmarshal(data, &body)
+		okJSON(t, w, map[string]any{"ok": true, "source_key": "sess-1", "session_key": "sess-1-branch", "copied_messages": 0, "total_messages": 5})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "branch", "sess-1", "--up-to-index=0", "--new-session-key=sess-1-branch"); err != nil {
+		t.Fatalf("sessions branch: %v", err)
+	}
+	v, ok := body["up_to_index"]
+	if !ok {
+		t.Fatalf("body missing up_to_index when --up-to-index=0: %#v", body)
+	}
+	n, isNum := v.(float64)
+	if !isNum || n != 0 {
+		t.Fatalf("up_to_index = %#v, expected 0", v)
+	}
+}
+
+func TestSessionsBranch_MissingUpToIndex(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "sessions", "branch", "sess-1")
+	if err == nil {
+		t.Fatal("expected error for missing --up-to-index")
+	}
+}
+
+func TestSessionsBranch_NegativeUpToIndex(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "sessions", "branch", "sess-1", "--up-to-index=-1")
+	if err == nil {
+		t.Fatal("expected error for negative --up-to-index")
+	}
+}
+
+func TestSessionsBranch_MalformedMetadata(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "sessions", "branch", "sess-1", "--up-to-index=0", "--metadata=foobar")
+	if err == nil {
+		t.Fatal("expected error for malformed --metadata (no '=')")
+	}
+}
+
+func TestSessionsBranch_PathEscapesSessionKey(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+
+	var gotRawPath, gotPath string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		gotRawPath = r.URL.RawPath
+		gotPath = r.URL.Path
+		okJSON(t, w, map[string]any{"ok": true})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "branch", "weird:key/with-slash", "--up-to-index=0"); err != nil {
+		t.Fatalf("sessions branch: %v", err)
+	}
+	// url.PathEscape encodes "/" as %2F but leaves ":" as-is, so RawPath carries "weird:key%2Fwith-slash".
+	// The decoded Path check is a lenient fallback for when RawPath is not populated.
+	if !strings.Contains(gotRawPath, "weird:key%2Fwith-slash") && !strings.Contains(gotPath, "weird:key/with-slash") {
+		t.Fatalf("path not escaped — RawPath=%q Path=%q", gotRawPath, gotPath)
+	}
+}
+
+func TestSessionsBranch_JSONPreservesCopiedAndTotal(t *testing.T) {
+	t.Cleanup(func() { resetSessionsBranchFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"ok":              true,
+			"copied_messages": 12,
+			"total_messages":  24,
+			"session_key":     "sess-new",
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "sessions", "branch", "sess-1", "--up-to-index=12")
+	})
+	if err != nil {
+		t.Fatalf("sessions branch: %v", err)
+	}
+	if !strings.Contains(out, "copied_messages") || !strings.Contains(out, "total_messages") {
+		t.Fatalf("stdout missing fields: %s", out)
+	}
+}
diff --git a/cmd/sessions_follow.go b/cmd/sessions_follow.go
new file mode 100644
index 0000000..c06c43e
--- /dev/null
+++ b/cmd/sessions_follow.go
@@ -0,0 +1,83 @@
+package cmd
+
+import (
+	"fmt"
+	"net/url"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// sessionsFollowCmd issues one HTTP GET to /v1/chat/sessions/{key}/history/follow.
+// NOT a watch loop and NOT a WS stream — operators wanting continuous follow
+// rerun with the returned `next_cursor`.
+//
+// Backend route: GET /v1/chat/sessions/{key}/history/follow (chat domain).
+//
+// Query string is built directly via url.Values so cursor=0 is preserved
+// (buildBody skips zero-valued ints and would drop it).
+var sessionsFollowCmd = &cobra.Command{
+	Use:   "follow <sessionKey>",
+	Short: "Poll cursor-based session history (one shot)",
+	Long: `Poll the next batch of session-history messages from a cursor. One-shot
+polling — no watch loop, no SSE, no WebSocket stream. Re-invoke with the
+returned ` + "`next_cursor`" + ` to advance.
+
+Backend route: GET /v1/chat/sessions/{key}/history/follow (chat domain).`,
+	Args: cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		cursor, _ := cmd.Flags().GetInt("cursor")
+		limit, _ := cmd.Flags().GetInt("limit")
+		if cursor < 0 {
+			return fmt.Errorf("--cursor must be >= 0 (got %d)", cursor)
+		}
+		if limit <= 0 {
+			return fmt.Errorf("--limit must be > 0 (got %d)", limit)
+		}
+
+		q := url.Values{}
+		// Build query directly so cursor=0 appears literally.
+		q.Set("cursor", fmt.Sprintf("%d", cursor))
+		q.Set("limit", fmt.Sprintf("%d", limit))
+
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		path := "/v1/chat/sessions/" + url.PathEscape(args[0]) + "/history/follow?" + q.Encode()
+		data, err := c.Get(path)
+		if err != nil {
+			return err
+		}
+		m := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(m)
+			return nil
+		}
+		// Summary row first.
+		summary := output.NewTable("SESSION", "CURSOR", "NEXT_CURSOR", "TOTAL", "RESET", "UPDATED")
+		summary.AddRow(str(m, "session_key"), str(m, "cursor"),
+			str(m, "next_cursor"), str(m, "total"), str(m, "reset"), str(m, "updated"))
+		printer.Print(summary)
+		// Compact message rows.
+		msgs, _ := m["messages"].([]any)
+		if len(msgs) > 0 {
+			tbl := output.NewTable("INDEX", "ROLE", "CONTENT")
+			for _, raw := range msgs {
+				row, ok := raw.(map[string]any)
+				if !ok {
+					continue
+				}
+				tbl.AddRow(str(row, "index"), str(row, "role"), str(row, "content"))
+			}
+			printer.Print(tbl)
+		}
+		return nil
+	},
+}
+
+func init() {
+	sessionsFollowCmd.Flags().Int("cursor", 0, "Starting cursor (>=0, default 0)")
+	sessionsFollowCmd.Flags().Int("limit", 50, "Max messages per call (server max 200)")
+	sessionsCmd.AddCommand(sessionsFollowCmd)
+}
diff --git a/cmd/sessions_follow_test.go b/cmd/sessions_follow_test.go
new file mode 100644
index 0000000..24c8c47
--- /dev/null
+++ b/cmd/sessions_follow_test.go
@@ -0,0 +1,182 @@
+package cmd
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"sync/atomic"
+	"testing"
+)
+
+func resetSessionsFollowFlags(t *testing.T) {
+	t.Helper()
+	// Reset to declared defaults (cursor=0, limit=50).
+	resetTestFlag(sessionsFollowCmd, "cursor", "0")
+	resetTestFlag(sessionsFollowCmd, "limit", "50")
+}
+
+func TestSessionsFollow_DefaultsAndQuery(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+
+	var calls int64
+	var rawQuery, path string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		rawQuery = r.URL.RawQuery
+		path = r.URL.Path
+		okJSON(t, w, map[string]any{
+			"session_key": "sess-1",
+			"cursor":      0,
+			"next_cursor": 18,
+			"total":       18,
+			"messages":    []map[string]any{},
+			"reset":       false,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "follow", "sess-1"); err != nil {
+		t.Fatalf("sessions follow: %v", err)
+	}
+	if atomic.LoadInt64(&calls) != 1 {
+		t.Fatalf("expected exactly 1 request, got %d (no watch loop)", atomic.LoadInt64(&calls))
+	}
+	if path != "/v1/chat/sessions/sess-1/history/follow" {
+		t.Fatalf("path = %q", path)
+	}
+	// Defaults must be present in the raw query string.
+	if !strings.Contains(rawQuery, "cursor=0") {
+		t.Errorf("expected cursor=0 in raw query, got: %q", rawQuery)
+	}
+	if !strings.Contains(rawQuery, "limit=50") {
+		t.Errorf("expected limit=50 in raw query, got: %q", rawQuery)
+	}
+}
+
+// Zero-boundary regression: --cursor 0 is a valid pagination origin and must
+// appear literally as "cursor=0" in the raw query. Any helper that drops
+// numeric zeros (e.g. omit-empty builders) would silently break "start from
+// the beginning" semantics.
+func TestSessionsFollow_CursorZero(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+
+	var rawQuery string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		rawQuery = r.URL.RawQuery
+		okJSON(t, w, map[string]any{"session_key": "sess-1", "cursor": 0, "next_cursor": 0, "total": 0, "messages": []any{}, "reset": false})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "follow", "sess-1", "--cursor=0"); err != nil {
+		t.Fatalf("sessions follow: %v", err)
+	}
+	if !strings.Contains(rawQuery, "cursor=0") {
+		t.Fatalf("--cursor=0 must appear as cursor=0 in query, got: %q", rawQuery)
+	}
+}
+
+func TestSessionsFollow_CustomCursorAndLimit(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+
+	var rawQuery string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		rawQuery = r.URL.RawQuery
+		okJSON(t, w, map[string]any{"session_key": "sess-1", "cursor": 12, "next_cursor": 17, "total": 17, "messages": []any{}})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "follow", "sess-1", "--cursor=12", "--limit=25"); err != nil {
+		t.Fatalf("sessions follow: %v", err)
+	}
+	if !strings.Contains(rawQuery, "cursor=12") || !strings.Contains(rawQuery, "limit=25") {
+		t.Fatalf("rawQuery = %q", rawQuery)
+	}
+}
+
+func TestSessionsFollow_NegativeCursor(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "sessions", "follow", "sess-1", "--cursor=-1")
+	if err == nil {
+		t.Fatal("expected error for negative --cursor")
+	}
+}
+
+func TestSessionsFollow_NonPositiveLimit(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	err := runCmd(t, "sessions", "follow", "sess-1", "--limit=0", "--cursor=0")
+	if err == nil {
+		t.Fatal("expected error for --limit=0")
+	}
+}
+
+func TestSessionsFollow_PathEscape(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+	var rawPath, path string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		rawPath = r.URL.RawPath
+		path = r.URL.Path
+		okJSON(t, w, map[string]any{"session_key": "weird:key/x", "cursor": 0, "next_cursor": 0, "total": 0, "messages": []any{}})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "sessions", "follow", "weird:key/x"); err != nil {
+		t.Fatalf("sessions follow: %v", err)
+	}
+	if !strings.Contains(rawPath, "weird:key%2Fx") && !strings.Contains(path, "weird:key/x") {
+		t.Fatalf("path not escaped — RawPath=%q Path=%q", rawPath, path)
+	}
+}
+
+func TestSessionsFollow_JSONPreservesFields(t *testing.T) {
+	t.Cleanup(func() { resetSessionsFollowFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"session_key": "sess-1",
+			"cursor":      0,
+			"next_cursor": 5,
+			"total":       5,
+			"reset":       true,
+			"messages":    []map[string]any{{"index": 0, "role": "user", "content": "hi"}},
+			"updated":     "2026-05-27T12:00:00Z",
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "sessions", "follow", "sess-1")
+	})
+	if err != nil {
+		t.Fatalf("sessions follow: %v", err)
+	}
+	for _, want := range []string{"reset", "next_cursor", "messages"} {
+		if !strings.Contains(out, want) {
+			t.Errorf("stdout missing %q in: %s", want, out)
+		}
+	}
+}
+
+func TestSessionsFollow_NotAWatchLoop(t *testing.T) {
+	// Smoke: sessionsFollowCmd must exist and not be a long-running stream
+	// (covered by atomic-counter assertion above; this is a structural check).
+	if sessionsFollowCmd == nil {
+		t.Fatal("sessionsFollowCmd not declared")
+	}
+	if !strings.HasPrefix(sessionsFollowCmd.Use, "follow") {
+		t.Fatalf("Use = %q", sessionsFollowCmd.Use)
+	}
+}
diff --git a/cmd/testdata/trace_detail_get.json b/cmd/testdata/trace_detail_get.json
new file mode 100644
index 0000000..6ecad9c
--- /dev/null
+++ b/cmd/testdata/trace_detail_get.json
@@ -0,0 +1,54 @@
+{
+  "_TODO_refresh": "stub fixture derived from traces follow payload shape; refresh against goclaw.zuey.me before merge per phase-03 reviewer gate",
+  "trace_id": "trace_FIXTURE_001",
+  "agent_id": "agent_FIXTURE_001",
+  "session_key": "session_FIXTURE_001",
+  "user_id": "user_REDACTED",
+  "tenant_id": "tenant_REDACTED",
+  "status": "success",
+  "started_at": "2026-05-28T10:00:00Z",
+  "ended_at": "2026-05-28T10:00:02Z",
+  "duration_ms": 2000,
+  "input_tokens": 120,
+  "output_tokens": 80,
+  "cost": "0.0042",
+  "spans": [
+    {
+      "span_id": "span_001",
+      "parent_span_id": null,
+      "name": "agent.run",
+      "kind": "agent",
+      "started_at": "2026-05-28T10:00:00Z",
+      "ended_at": "2026-05-28T10:00:02Z",
+      "duration_ms": 2000,
+      "status": "success"
+    },
+    {
+      "span_id": "span_002",
+      "parent_span_id": "span_001",
+      "name": "llm.call",
+      "kind": "llm",
+      "started_at": "2026-05-28T10:00:00Z",
+      "ended_at": "2026-05-28T10:00:01Z",
+      "duration_ms": 1500,
+      "status": "success",
+      "input_tokens": 120,
+      "output_tokens": 80
+    },
+    {
+      "span_id": "span_003",
+      "parent_span_id": "span_001",
+      "name": "tool.call",
+      "kind": "tool",
+      "started_at": "2026-05-28T10:00:01Z",
+      "ended_at": "2026-05-28T10:00:02Z",
+      "duration_ms": 400,
+      "status": "success"
+    }
+  ],
+  "events": [
+    {"event_id": "ev_001", "span_id": "span_002", "type": "llm.prompt", "timestamp": "2026-05-28T10:00:00Z"},
+    {"event_id": "ev_002", "span_id": "span_002", "type": "llm.completion", "timestamp": "2026-05-28T10:00:01Z"},
+    {"event_id": "ev_003", "span_id": "span_003", "type": "tool.invoke", "timestamp": "2026-05-28T10:00:01Z"}
+  ]
+}
diff --git a/cmd/traces.go b/cmd/traces.go
index 851bff1..6803495 100644
--- a/cmd/traces.go
+++ b/cmd/traces.go
@@ -1,12 +1,16 @@
 package cmd
 
 import (
+	"encoding/json"
 	"fmt"
 	"io"
 	"net/url"
 	"os"
+	"regexp"
+	"strings"
 	"time"
 
+	"github.com/nextlevelbuilder/goclaw-cli/internal/client"
 	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
 	"github.com/spf13/cobra"
 )
@@ -61,19 +65,130 @@ var tracesListCmd = &cobra.Command{
 var tracesGetCmd = &cobra.Command{
 	Use: "get <traceID>", Short: "Get trace with span tree", Args: cobra.ExactArgs(1),
 	RunE: func(cmd *cobra.Command, args []string) error {
+		id := strings.TrimSpace(args[0])
+		if err := validateTraceID(id); err != nil {
+			return err
+		}
 		c, err := newHTTP()
 		if err != nil {
 			return err
 		}
-		data, err := c.Get("/v1/traces/" + args[0])
+		data, err := c.Get("/v1/traces/" + url.PathEscape(id))
 		if err != nil {
 			return err
 		}
-		printer.Print(unmarshalMap(data))
+		var trace map[string]any
+		if err := json.Unmarshal(data, &trace); err != nil {
+			return fmt.Errorf("decode trace payload: %w", err)
+		}
+		if cfg.OutputFormat != "table" {
+			printer.Print(trace)
+			return nil
+		}
+		renderTraceTable(trace, os.Stdout)
 		return nil
 	},
 }
 
+// traceIDPattern restricts trace ids to a URL-safe allowlist, rejecting `/`, `\`,
+// control characters, and whitespace before any HTTP call is issued. The bare `.`
+// and `..` ids are rejected separately in validateTraceID; PathEscape still applies.
+var traceIDPattern = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)
+
+func validateTraceID(id string) error {
+	if id == "" || id == "." || id == ".." {
+		return &client.APIError{Code: "INVALID_REQUEST", Message: "trace id is empty or reserved"}
+	}
+	if !traceIDPattern.MatchString(id) {
+		return &client.APIError{Code: "INVALID_REQUEST", Message: "trace id contains invalid characters (allowed: A-Z a-z 0-9 . _ -)"}
+	}
+	return nil
+}
+
+// renderTraceTable prints a human-readable summary: header card, span tree, events.
+func renderTraceTable(t map[string]any, w io.Writer) {
+	for _, row := range [][2]string{
+		{"TRACE_ID", str(t, "trace_id")}, {"AGENT_ID", str(t, "agent_id")},
+		{"SESSION_KEY", str(t, "session_key")}, {"STATUS", str(t, "status")},
+		{"DURATION_MS", str(t, "duration_ms")},
+	} {
+		if row[1] != "" {
+			fmt.Fprintf(w, "%-12s %s\n", row[0]+":", row[1])
+		}
+	}
+	if in, out, cost := str(t, "input_tokens"), str(t, "output_tokens"), str(t, "cost"); in+out+cost != "" {
+		fmt.Fprintf(w, "%-12s in=%s out=%s cost=%s\n", "TOKENS:", in, out, cost)
+	}
+	spans, _ := t["spans"].([]any)
+	if len(spans) == 0 {
+		fmt.Fprintln(w, "\nSPANS: (none)")
+	} else {
+		fmt.Fprintln(w, "\nSPANS:")
+		output.PrintTreeRoot(buildSpanTree(spans), w)
+	}
+	events, _ := t["events"].([]any)
+	fmt.Fprintf(w, "\nEVENTS (n=%d):\n", len(events))
+	for _, e := range events {
+		if m, ok := e.(map[string]any); ok {
+			fmt.Fprintf(w, "  - %s\n", str(m, "type"))
+		}
+	}
+}
+
+// buildSpanTree links spans via parent_span_id; spans whose parent isn't in this
+// trace attach to a virtual root. Children are kept in insertion order.
+func buildSpanTree(spans []any) output.TreeNode {
+	order := make([]string, 0, len(spans))
+	labels := make(map[string]string, len(spans))
+	children := make(map[string][]string, len(spans))
+	parentOf := make(map[string]string, len(spans))
+	for _, s := range spans {
+		m, ok := s.(map[string]any)
+		if !ok {
+			continue
+		}
+		id := str(m, "span_id")
+		if id == "" {
+			continue
+		}
+		label := id
+		if name := str(m, "name"); name != "" {
+			label = name + " [" + id + "]"
+		}
+		if kind := str(m, "kind"); kind != "" {
+			label += " kind=" + kind
+		}
+		if dur := str(m, "duration_ms"); dur != "" {
+			label += " " + dur + "ms"
+		}
+		labels[id] = label
+		order = append(order, id)
+		parentOf[id], _ = m["parent_span_id"].(string)
+	}
+	for _, id := range order {
+		if p := parentOf[id]; p != "" {
+			if _, ok := labels[p]; ok {
+				children[p] = append(children[p], id)
+				continue
+			}
+		}
+		children[""] = append(children[""], id)
+	}
+	var build func(id string) output.TreeNode
+	build = func(id string) output.TreeNode {
+		n := output.TreeNode{Name: labels[id]}
+		for _, c := range children[id] {
+			n.Children = append(n.Children, build(c))
+		}
+		return n
+	}
+	root := output.TreeNode{Name: "trace"}
+	for _, id := range children[""] {
+		root.Children = append(root.Children, build(id))
+	}
+	return root
+}
+
 var tracesExportCmd = &cobra.Command{
 	Use: "export <traceID>", Short: "Export trace to file", Args: cobra.ExactArgs(1),
 	RunE: func(cmd *cobra.Command, args []string) error {
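
The parent-linking step in `buildSpanTree` above can be sketched in isolation. This is a minimal standalone illustration, not the patch's code: the `span` and `node` types and the `buildTree` and `render` helpers are hypothetical stand-ins for the decoded `map[string]any` entries and `output.TreeNode`, and the indentation-only rendering is an assumption (the real output uses tree connectors).

```go
package main

import (
	"fmt"
	"strings"
)

// span and node are hypothetical stand-ins for the decoded JSON span maps
// and output.TreeNode used in the patch.
type span struct{ ID, Parent, Name string }

type node struct {
	Name     string
	Children []node
}

// buildTree groups spans under their parent id. Spans whose parent is empty
// or unknown hang off a virtual root; sibling order follows input order.
func buildTree(spans []span) node {
	known := make(map[string]bool, len(spans))
	for _, s := range spans {
		if s.ID != "" {
			known[s.ID] = true
		}
	}
	children := make(map[string][]span, len(spans))
	for _, s := range spans {
		if s.ID == "" {
			continue // spans without an id cannot be linked
		}
		parent := s.Parent
		if !known[parent] {
			parent = "" // orphan: attach to the virtual root
		}
		children[parent] = append(children[parent], s)
	}
	var build func(s span) node
	build = func(s span) node {
		n := node{Name: s.Name + " [" + s.ID + "]"}
		for _, c := range children[s.ID] {
			n.Children = append(n.Children, build(c))
		}
		return n
	}
	root := node{Name: "trace"}
	for _, s := range children[""] {
		root.Children = append(root.Children, build(s))
	}
	return root
}

// render writes the tree with two spaces of indentation per level.
func render(n node, depth int, sb *strings.Builder) {
	sb.WriteString(strings.Repeat("  ", depth) + n.Name + "\n")
	for _, c := range n.Children {
		render(c, depth+1, sb)
	}
}

func main() {
	root := buildTree([]span{
		{ID: "span_001", Name: "agent.run"},
		{ID: "span_002", Parent: "span_001", Name: "llm.call"},
		{ID: "span_003", Parent: "span_001", Name: "tool.call"},
		{ID: "span_004", Parent: "span_missing", Name: "orphan"},
	})
	var sb strings.Builder
	render(root, 0, &sb)
	fmt.Print(sb.String())
}
```

Because the build only walks down from the virtual root, spans that reference each other in a cycle are never reached and are dropped rather than causing unbounded recursion. The patch's version appears to behave the same way.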
diff --git a/cmd/traces_follow.go b/cmd/traces_follow.go
new file mode 100644
index 0000000..d07ef1c
--- /dev/null
+++ b/cmd/traces_follow.go
@@ -0,0 +1,102 @@
+package cmd
+
+import (
+	"fmt"
+	"net/url"
+	"time"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+	"github.com/spf13/cobra"
+)
+
+// tracesFollowCmd issues a single polling GET to /v1/traces/follow. NOT a watch loop.
+// Operators wanting continuous follow rerun the command with the returned `next_since`.
+var tracesFollowCmd = &cobra.Command{
+	Use:   "follow",
+	Short: "Poll incremental trace activity (one shot)",
+	Long: `Poll incremental trace activity for a session or agent.
+
+Exactly one of --session-key or --agent must be provided. This is a one-shot
+polling request — no watch loop. Use the returned ` + "`next_since`" + ` to re-poll.`,
+	RunE: func(cmd *cobra.Command, args []string) error {
+		sessionKey, _ := cmd.Flags().GetString("session-key")
+		agent, _ := cmd.Flags().GetString("agent")
+		if sessionKey == "" && agent == "" {
+			return fmt.Errorf("exactly one of --session-key or --agent is required")
+		}
+		if sessionKey != "" && agent != "" {
+			return fmt.Errorf("--session-key and --agent are mutually exclusive")
+		}
+
+		since, _ := cmd.Flags().GetString("since")
+		if since != "" {
+			if _, err := time.Parse(time.RFC3339, since); err != nil {
+				return fmt.Errorf("--since must be RFC3339: %w", err)
+			}
+		}
+
+		q := url.Values{}
+		if sessionKey != "" {
+			q.Set("session_key", sessionKey)
+		}
+		if agent != "" {
+			q.Set("agent_id", agent)
+		}
+		if since != "" {
+			q.Set("since", since)
+		}
+		if v, _ := cmd.Flags().GetInt("limit"); v > 0 {
+			q.Set("limit", fmt.Sprintf("%d", v))
+		}
+		if v, _ := cmd.Flags().GetString("status"); v != "" {
+			q.Set("status", v)
+		}
+		if v, _ := cmd.Flags().GetString("channel"); v != "" {
+			q.Set("channel", v)
+		}
+		if v, _ := cmd.Flags().GetBool("include-spans"); v {
+			q.Set("include_spans", "true")
+		}
+
+		c, err := newHTTP()
+		if err != nil {
+			return err
+		}
+		path := "/v1/traces/follow"
+		if len(q) > 0 {
+			path += "?" + q.Encode()
+		}
+		data, err := c.Get(path)
+		if err != nil {
+			return err
+		}
+		envelope := unmarshalMap(data)
+		if cfg.OutputFormat != "table" {
+			printer.Print(envelope)
+			return nil
+		}
+		traces, _ := envelope["traces"].([]any)
+		tbl := output.NewTable("TRACE_ID", "AGENT", "STATUS", "DURATION_MS", "INPUT_TOKENS", "OUTPUT_TOKENS", "COST")
+		for _, raw := range traces {
+			t, ok := raw.(map[string]any)
+			if !ok {
+				continue
+			}
+			tbl.AddRow(str(t, "trace_id"), str(t, "agent_id"), str(t, "status"),
+				str(t, "duration_ms"), str(t, "input_tokens"), str(t, "output_tokens"), str(t, "cost"))
+		}
+		printer.Print(tbl)
+		return nil
+	},
+}
+
+func init() {
+	tracesFollowCmd.Flags().String("session-key", "", "Session key to follow")
+	tracesFollowCmd.Flags().String("agent", "", "Agent id or key to follow")
+	tracesFollowCmd.Flags().String("since", "", "RFC3339 timestamp; only traces after this are returned")
+	tracesFollowCmd.Flags().Int("limit", 0, "Max traces (server default 50, max 200)")
+	tracesFollowCmd.Flags().String("status", "", "Filter by status")
+	tracesFollowCmd.Flags().String("channel", "", "Filter by channel")
+	tracesFollowCmd.Flags().Bool("include-spans", false, "Include spans_by_trace_id in response")
+	tracesCmd.AddCommand(tracesFollowCmd)
+}
diff --git a/cmd/traces_follow_test.go b/cmd/traces_follow_test.go
new file mode 100644
index 0000000..43f9091
--- /dev/null
+++ b/cmd/traces_follow_test.go
@@ -0,0 +1,187 @@
+package cmd
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"net/url"
+	"regexp"
+	"strings"
+	"sync/atomic"
+	"testing"
+)
+
+// resetTracesFollowFlags returns flags to default state between subtests.
+func resetTracesFollowFlags(t *testing.T) {
+	t.Helper()
+	for _, name := range []string{"session-key", "agent", "since", "status", "channel", "include-spans"} {
+		resetTestFlag(tracesFollowCmd, name, "")
+	}
+	resetTestFlag(tracesFollowCmd, "limit", "0")
+}
+
+func TestTracesFollow_SessionKeyBuildsQuery(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+
+	var calls int64
+	var query url.Values
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		if r.URL.Path != "/v1/traces/follow" {
+			w.WriteHeader(http.StatusNotFound)
+			return
+		}
+		query = r.URL.Query()
+		okJSON(t, w, map[string]any{"traces": []map[string]any{}, "spans_by_trace_id": map[string]any{}, "next_since": "2026-05-27T12:00:00Z", "limit": 50})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "traces", "follow", "--session-key=sess-1", "--since=2026-05-27T00:00:00Z", "--limit=25", "--status=success", "--channel=telegram"); err != nil {
+		t.Fatalf("traces follow: %v", err)
+	}
+	if atomic.LoadInt64(&calls) != 1 {
+		t.Fatalf("expected exactly 1 request, got %d (no watch loop allowed)", atomic.LoadInt64(&calls))
+	}
+	if query.Get("session_key") != "sess-1" {
+		t.Errorf("session_key = %q", query.Get("session_key"))
+	}
+	if query.Get("since") != "2026-05-27T00:00:00Z" {
+		t.Errorf("since = %q", query.Get("since"))
+	}
+	if query.Get("limit") != "25" {
+		t.Errorf("limit = %q", query.Get("limit"))
+	}
+	if query.Get("status") != "success" {
+		t.Errorf("status = %q", query.Get("status"))
+	}
+	if query.Get("channel") != "telegram" {
+		t.Errorf("channel = %q", query.Get("channel"))
+	}
+}
+
+func TestTracesFollow_AgentTargetBuildsQuery(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+
+	var query url.Values
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		query = r.URL.Query()
+		okJSON(t, w, map[string]any{"traces": []map[string]any{}})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "traces", "follow", "--agent=agent-1", "--include-spans"); err != nil {
+		t.Fatalf("traces follow: %v", err)
+	}
+	if query.Get("agent_id") != "agent-1" {
+		t.Errorf("agent_id = %q", query.Get("agent_id"))
+	}
+	if query.Get("include_spans") != "true" {
+		t.Errorf("include_spans = %q", query.Get("include_spans"))
+	}
+	if query.Has("session_key") {
+		t.Errorf("session_key should not be set: %#v", query)
+	}
+}
+
+func TestTracesFollow_RejectMissingTarget(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "follow")
+	if err == nil {
+		t.Fatal("expected validation error for missing target")
+	}
+	if !strings.Contains(err.Error(), "session-key") && !strings.Contains(err.Error(), "agent") {
+		t.Errorf("error should mention target flags: %v", err)
+	}
+}
+
+func TestTracesFollow_RejectBothTargets(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "follow", "--session-key=sess-1", "--agent=agent-1")
+	if err == nil {
+		t.Fatal("expected validation error when both target flags set")
+	}
+}
+
+func TestTracesFollow_RejectInvalidSince(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+	t.Setenv("GOCLAW_SERVER", "http://localhost:9")
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "follow", "--session-key=sess-1", "--since=not-a-timestamp")
+	if err == nil {
+		t.Fatal("expected RFC3339 validation error")
+	}
+}
+
+func TestTracesFollow_JSONPreservesEnvelope(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"traces":            []map[string]any{{"trace_id": "t1"}},
+			"spans_by_trace_id": map[string]any{"t1": []any{}},
+			"next_since":        "2026-05-27T13:00:00Z",
+			"server_time":       "2026-05-27T12:30:00Z",
+			"limit":             50,
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "json")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "follow", "--session-key=sess-1")
+	})
+	if err != nil {
+		t.Fatalf("traces follow: %v", err)
+	}
+	if !strings.Contains(out, "next_since") || !strings.Contains(out, "spans_by_trace_id") {
+		t.Fatalf("stdout missing fields: %s", out)
+	}
+}
+
+func TestTracesFollow_TableHeaders(t *testing.T) {
+	t.Cleanup(func() { resetTracesFollowFlags(t) })
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, map[string]any{
+			"traces": []map[string]any{
+				{"trace_id": "t1", "agent_id": "agent-1", "status": "success", "duration_ms": 120, "input_tokens": 50, "output_tokens": 30, "cost": "0.001"},
+			},
+		})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+	t.Setenv("GOCLAW_OUTPUT", "table")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "follow", "--session-key=sess-1")
+	})
+	if err != nil {
+		t.Fatalf("traces follow: %v", err)
+	}
+	headerRE := regexp.MustCompile(`TRACE_ID.*AGENT.*STATUS.*DURATION_MS.*INPUT_TOKENS.*OUTPUT_TOKENS.*COST`)
+	if !headerRE.MatchString(out) {
+		t.Fatalf("table headers missing in:\n%s", out)
+	}
+}
+
+func TestTracesFollow_DoesNotImportFollowStream(t *testing.T) {
+	// Structural check: tracesFollowCmd issues a single HTTP GET rather than using FollowStream.
+	// That behaviour is asserted by the atomic-counter test above; this is only a smoke test that the command exists.
+	if tracesFollowCmd == nil {
+		t.Fatal("tracesFollowCmd not declared")
+	}
+	if tracesFollowCmd.Use == "" || !strings.HasPrefix(tracesFollowCmd.Use, "follow") {
+		t.Fatalf("tracesFollowCmd.Use = %q, expected to start with 'follow'", tracesFollowCmd.Use)
+	}
+}
diff --git a/cmd/traces_get_test.go b/cmd/traces_get_test.go
new file mode 100644
index 0000000..fd14ce2
--- /dev/null
+++ b/cmd/traces_get_test.go
@@ -0,0 +1,322 @@
+package cmd
+
+import (
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"strings"
+	"sync/atomic"
+	"testing"
+
+	"github.com/nextlevelbuilder/goclaw-cli/internal/output"
+)
+
+// loadTraceDetailFixture reads the captured trace detail envelope from testdata.
+// The fixture is a single trace map (not wrapped). Tests wrap it via okJSON.
+func loadTraceDetailFixture(t *testing.T) map[string]any {
+	t.Helper()
+	data, err := os.ReadFile("testdata/trace_detail_get.json")
+	if err != nil {
+		t.Fatalf("read fixture: %v", err)
+	}
+	var m map[string]any
+	if err := json.Unmarshal(data, &m); err != nil {
+		t.Fatalf("decode fixture: %v", err)
+	}
+	return m
+}
+
+// errJSON writes an error envelope mimicking the server shape.
+func errJSON(t *testing.T, w http.ResponseWriter, status int, code, message string) {
+	t.Helper()
+	w.Header().Set("Content-Type", "application/json")
+	w.WriteHeader(status)
+	body, _ := json.Marshal(map[string]any{
+		"ok":    false,
+		"error": map[string]any{"code": code, "message": message},
+	})
+	_, _ = w.Write(body)
+}
+
+// TestTracesGet_PathAndMethod locks the wire contract: GET /v1/traces/{id}.
+func TestTracesGet_PathAndMethod(t *testing.T) {
+	var calls int64
+	var gotPath, gotMethod string
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		gotPath = r.URL.Path
+		gotMethod = r.Method
+		okJSON(t, w, loadTraceDetailFixture(t))
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	if err := runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json"); err != nil {
+		t.Fatalf("traces get: %v", err)
+	}
+	if atomic.LoadInt64(&calls) != 1 {
+		t.Fatalf("expected 1 request, got %d", atomic.LoadInt64(&calls))
+	}
+	if gotMethod != http.MethodGet {
+		t.Errorf("method = %q, want GET", gotMethod)
+	}
+	if gotPath != "/v1/traces/trace_FIXTURE_001" {
+		t.Errorf("path = %q, want /v1/traces/trace_FIXTURE_001", gotPath)
+	}
+}
+
+// TestTracesGet_HappyPath_JSON_LocksFixture round-trips the JSON envelope.
+func TestTracesGet_HappyPath_JSON_LocksFixture(t *testing.T) {
+	fixture := loadTraceDetailFixture(t)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, fixture)
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json")
+	})
+	if err != nil {
+		t.Fatalf("traces get: %v", err)
+	}
+	var got map[string]any
+	if err := json.Unmarshal([]byte(out), &got); err != nil {
+		t.Fatalf("stdout is not JSON: %v\nstdout: %q", err, out)
+	}
+	if got["trace_id"] != "trace_FIXTURE_001" {
+		t.Errorf("trace_id = %v", got["trace_id"])
+	}
+	if got["agent_id"] != "agent_FIXTURE_001" {
+		t.Errorf("agent_id = %v", got["agent_id"])
+	}
+	if got["status"] != "success" {
+		t.Errorf("status = %v", got["status"])
+	}
+	spans, ok := got["spans"].([]any)
+	if !ok || len(spans) != 3 {
+		t.Errorf("spans = %v (want 3 entries)", got["spans"])
+	}
+}
+
+// TestTracesGet_TableMode_HumanReadable_RED — the issue #17 repro (now green after fix).
+func TestTracesGet_TableMode_HumanReadable_RED(t *testing.T) {
+	fixture := loadTraceDetailFixture(t)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, fixture)
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "table")
+	})
+	if err != nil {
+		t.Fatalf("traces get: %v", err)
+	}
+	trimmed := strings.TrimSpace(out)
+	if strings.HasPrefix(trimmed, "{") {
+		t.Fatalf("table mode rendered raw JSON (starts with '{'): %q", out)
+	}
+	wantAny := []string{"TRACE", "SPAN", "EVENT", "trace_id", "agent_id"}
+	hit := false
+	for _, m := range wantAny {
+		if strings.Contains(out, m) {
+			hit = true
+			break
+		}
+	}
+	if !hit {
+		t.Fatalf("table mode missing human-readable markers; got: %q", out)
+	}
+}
+
+// TestTracesGet_TableMode_HasHeaderAndSpanMarkers — verifies header card + tree drawing.
+func TestTracesGet_TableMode_HasHeaderAndSpanMarkers(t *testing.T) {
+	fixture := loadTraceDetailFixture(t)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, fixture)
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "table")
+	})
+	if err != nil {
+		t.Fatalf("traces get: %v", err)
+	}
+	if strings.HasPrefix(strings.TrimSpace(out), "{") {
+		t.Fatalf("table mode rendered raw JSON: %q", out)
+	}
+	if !strings.Contains(out, "TRACE_ID") {
+		t.Errorf("missing TRACE_ID header in: %q", out)
+	}
+	// At least one tree connector must appear (├─ or └─).
+	if !strings.Contains(out, "├") && !strings.Contains(out, "└") {
+		t.Errorf("missing span tree connectors (├ / └) in: %q", out)
+	}
+	if !strings.Contains(out, "EVENTS") {
+		t.Errorf("missing EVENTS section in: %q", out)
+	}
+}
+
+// TestTracesGet_JSONMode_PreservesStructure — all top-level fixture keys present in JSON output.
+func TestTracesGet_JSONMode_PreservesStructure(t *testing.T) {
+	fixture := loadTraceDetailFixture(t)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		okJSON(t, w, fixture)
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json")
+	})
+	if err != nil {
+		t.Fatalf("traces get: %v", err)
+	}
+	var got map[string]any
+	if err := json.Unmarshal([]byte(out), &got); err != nil {
+		t.Fatalf("not JSON: %v", err)
+	}
+	for k := range fixture {
+		if _, ok := got[k]; !ok {
+			t.Errorf("top-level key %q missing from JSON output", k)
+		}
+	}
+	// Nested reachability: spans[0].name
+	spans, ok := got["spans"].([]any)
+	if !ok || len(spans) == 0 {
+		t.Fatalf("spans not reachable")
+	}
+	first, ok := spans[0].(map[string]any)
+	if !ok || first["name"] == nil {
+		t.Errorf("spans[0].name not reachable: %v", spans[0])
+	}
+}
+
+// TestTracesGet_NotFound_ExitCode3 — 404 → ExitNotFound.
+func TestTracesGet_NotFound_ExitCode3(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		errJSON(t, w, http.StatusNotFound, "NOT_FOUND", "trace not found")
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "get", "doesnotexist", "--output", "json")
+	if err == nil {
+		t.Fatal("expected error, got nil")
+	}
+	if code := output.FromError(err); code != output.ExitNotFound {
+		t.Errorf("exit code = %d, want %d (ExitNotFound)", code, output.ExitNotFound)
+	}
+	if !strings.Contains(strings.ToLower(err.Error()), "not found") {
+		t.Errorf("error message should mention 'not found': %q", err.Error())
+	}
+}
+
+// TestTracesGet_PermissionDenied_ExitCode2 — 403 → ExitAuth.
+func TestTracesGet_PermissionDenied_ExitCode2(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		errJSON(t, w, http.StatusForbidden, "TENANT_ACCESS_REVOKED", "tenant access revoked")
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json")
+	if err == nil {
+		t.Fatal("expected error, got nil")
+	}
+	if code := output.FromError(err); code != output.ExitAuth {
+		t.Errorf("exit code = %d, want %d (ExitAuth)", code, output.ExitAuth)
+	}
+}
+
+// TestTracesGet_MalformedID_NoHTTPCall — id validation runs before HTTP; exit 4.
+func TestTracesGet_MalformedID_NoHTTPCall(t *testing.T) {
+	var calls int64
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		okJSON(t, w, map[string]any{})
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	bad := []string{"", "  ", "..", ".", "../etc/passwd", "a/b", "a\\b", "a\x00b", "a b"}
+	for _, id := range bad {
+		t.Run("id="+strings.ReplaceAll(id, "\x00", "NUL"), func(t *testing.T) {
+			before := atomic.LoadInt64(&calls)
+			err := runCmd(t, "traces", "get", id, "--output", "json")
+			if err == nil {
+				t.Fatalf("expected validation error for %q, got nil", id)
+			}
+			if code := output.FromError(err); code != output.ExitValidation {
+				t.Errorf("id=%q: exit = %d, want %d (ExitValidation)", id, code, output.ExitValidation)
+			}
+			if got := atomic.LoadInt64(&calls); got != before {
+				t.Errorf("id=%q: HTTP call made (calls %d -> %d); should be blocked client-side", id, before, got)
+			}
+		})
+	}
+}
+
+// TestTracesGet_ServerError_ExitCode5 — 5xx → ExitServer; client retries so calls >= 1.
+func TestTracesGet_ServerError_ExitCode5(t *testing.T) {
+	if testing.Short() {
+		t.Skip("skipping in -short: HTTP client backs off ~3s between 5xx retries")
+	}
+	var calls int64
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1)
+		errJSON(t, w, http.StatusInternalServerError, "INTERNAL", "boom")
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	err := runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json")
+	if err == nil {
+		t.Fatal("expected error, got nil")
+	}
+	if code := output.FromError(err); code != output.ExitServer {
+		t.Errorf("exit code = %d, want %d (ExitServer)", code, output.ExitServer)
+	}
+	if n := atomic.LoadInt64(&calls); n < 1 {
+		t.Errorf("calls = %d, want >= 1 (client retries 5xx)", n)
+	}
+}
+
+// TestTracesGet_MalformedResponse_SurfacesError — bad JSON body → wrapped decode error.
+func TestTracesGet_MalformedResponse_SurfacesError(t *testing.T) {
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Content-Type", "application/json")
+		_, _ = w.Write([]byte("this is not json"))
+	}))
+	defer srv.Close()
+	t.Setenv("GOCLAW_SERVER", srv.URL)
+	t.Setenv("GOCLAW_TOKEN", "test-token")
+
+	out, err := captureStdout(t, func() error {
+		return runCmd(t, "traces", "get", "trace_FIXTURE_001", "--output", "json")
+	})
+	if err == nil {
+		t.Fatal("expected decode error, got nil")
+	}
+	msg := strings.ToLower(err.Error())
+	if !strings.Contains(msg, "decode") && !strings.Contains(msg, "unmarshal") && !strings.Contains(msg, "invalid") {
+		t.Errorf("error should mention decode/unmarshal/invalid; got: %q", err.Error())
+	}
+	if strings.TrimSpace(out) == "{}" {
+		t.Errorf("stdout is empty JSON object — silent failure regressed: %q", out)
+	}
+}
diff --git a/docs/codebase-summary.md b/docs/codebase-summary.md
index a5fc48d..199e13c 100644
--- a/docs/codebase-summary.md
+++ b/docs/codebase-summary.md
@@ -1,7 +1,7 @@
 # GoClaw CLI - Codebase Summary
 
 **Generated from:** `repomix-output.xml` (2026-04-15), updated manually 2026-05-20
-**Phase Status:** P0-P4 Complete (AI-First Expansion); Super Admin API Parity Complete; Domain Coverage P5 Implemented
+**Phase Status:** P0-P4 Complete (AI-First Expansion); Super Admin API Parity Complete; Domain Coverage P5 + P6 (Backend-Unblocked) Implemented
 **Total Files:** 80+
 **Estimated Tokens:** 80,000+
 **Total Size:** 220+ KB
@@ -10,7 +10,7 @@
 
 ## Overview
 
-GoClaw CLI is a production-ready Go application providing comprehensive command-line management for GoClaw AI agent gateway servers. Built with Cobra framework, it supports 30+ command groups across modular command files with dual modes: interactive (human) and automation (CI/agent). Phases 0-4 (AI-first expansion) add AI ergonomics, admin/ops, migration, vault, and advanced agent/team/memory support. The 2026-05-18 super-admin parity work adds gateway upgrade, package updates, workstations, webhooks, MCP user credentials, secure env reveal, media/TTS/storage/channel fillers, and focused route-contract tests. The 2026-05-19 P3/P4 filler pass adds first-class profile commands, `GOCLAW_PROFILE`, `sessions compact`, WS health, trace filter polish, `codex-pool`, `api-keys rotate`, `config defaults`, chat session convenience wrappers, and `tools invoke --args`. The 2026-05-20 P5 filler pass adds team attachment download, skill-specific evolution suggestion apply, and fixes evolution update payload compatibility.
+GoClaw CLI is a production-ready Go application providing comprehensive command-line management for GoClaw AI agent gateway servers. Built with Cobra framework, it supports 30+ command groups across modular command files with dual modes: interactive (human) and automation (CI/agent). Phases 0-4 (AI-first expansion) add AI ergonomics, admin/ops, migration, vault, and advanced agent/team/memory support. The 2026-05-18 super-admin parity work adds gateway upgrade, package updates, workstations, webhooks, MCP user credentials, secure env reveal, media/TTS/storage/channel fillers, and focused route-contract tests. The 2026-05-19 P3/P4 filler pass adds first-class profile commands, `GOCLAW_PROFILE`, `sessions compact`, WS health, trace filter polish, `codex-pool`, `api-keys rotate`, `config defaults`, chat session convenience wrappers, and `tools invoke --args`. The 2026-05-20 P5 filler pass adds team attachment download, skill-specific evolution suggestion apply, and fixes evolution update payload compatibility. The 2026-05-27 P6 backend-unblocked pass adds seven new surfaces wired to backend PRs `#37` and `#44`: `traces follow`, `providers reconnect`, `sessions branch`, `sessions follow`, `channels writers test`, `activity aggregate`, and `logs aggregate` — all one-shot HTTP commands (no new watch loops; reuse the existing `client.FollowStream` only for true streaming surfaces).
 
 **Key Metrics:**
 - **70+ command files** in `cmd/` (modularized for maintainability)
@@ -316,7 +316,7 @@ goclaw (root)
 │   ├── workspace (list, read, delete, upload, move)
 │   └── attachments download <team-id> <attachment-id> --output <file>
 ├── channels (list, contacts, pending-messages)
-├── traces (list, export)
+├── traces (list, get, export, follow)              # `get` validates id allowlist, renders header+span-tree+events for TTY, JSON for piped/`-o json`
 ├── memory (list, search, upsert)
 ├── knowledge-graph (entities, links, query)
 ├── usage (summary, detail, costs, timeseries, breakdown)
diff --git a/internal/client/http.go b/internal/client/http.go
index ad09ddd..b765b63 100644
--- a/internal/client/http.go
+++ b/internal/client/http.go
@@ -161,10 +161,16 @@ func (c *HTTPClient) do(method, path string, body any) (json.RawMessage, error)
 		if resp.StatusCode != 429 && resp.StatusCode < 500 {
 			break
 		}
-		resp.Body.Close()
-		if attempt < 2 {
-			time.Sleep(time.Duration(1<<attempt) * time.Second)
+		// Final retryable response: keep the body open so the caller can
+		// decode the structured error envelope (status + code + message).
+		// Closing here would force a "read on closed body" downstream and
+		// collapse the typed APIError into an opaque wrapped error,
+		// losing the exit-code mapping for 5xx/429.
+		if attempt == 2 {
+			break
 		}
+		resp.Body.Close()
+		time.Sleep(time.Duration(1<<attempt) * time.Second)
 	}
 	defer resp.Body.Close()
 
diff --git a/plans/260417-1254-goclaw-claude-skill/phase-01-scaffold-skill-structure.md b/plans/260417-1254-goclaw-claude-skill/phase-01-scaffold-skill-structure.md
new file mode 100644
index 0000000..a9f563e
--- /dev/null
+++ b/plans/260417-1254-goclaw-claude-skill/phase-01-scaffold-skill-structure.md
@@ -0,0 +1,126 @@
+---
+phase: 1
+title: Scaffold skill structure
+status: pending
+priority: high
+effort_hours: 2
+---
+
+# Phase 1 — Scaffold skill structure
+
+## Context Links
+- Parent: [plan.md](plan.md)
+- Researcher: `plans/reports/researcher-260417-1254-claude-skill-authoring.md`
+- Explorer: `plans/reports/explore-260417-1254-goclaw-command-inventory.md`
+
+## Overview
+Bootstrap the `claude-skill/` directory in the `goclaw-cli` repo with the standard skill layout: a `SKILL.md` skeleton, a `references/` directory holding 16 empty stub files, and an inactive `install.sh`. The logic that patches settings.json is out of scope for this phase.
+
+## Key Insights
+- The skill lives INSIDE the `goclaw-cli` repo (no separate repo) → its version stays in sync with the CLI
+- SKILL.md < 100 lines (thin index + navigation)
+- References use progressive disclosure: Claude opens them with the Read tool only when needed
+- The frontmatter description IS the keyword matcher → tune it carefully later
+
+## Requirements
+
+### Functional
+- Create the `claude-skill/` directory at the repo root
+- `SKILL.md` with complete frontmatter (name, description, when_to_use, allowed-tools)
+- `references/` directory with 16 empty (stub) .md files, including `media.md`, which the explorer missed
+- `install.sh` skeleton (no logic yet, only an echo placeholder)
+- `README.md` skeleton
+- `LICENSE`: **MIT** (user confirmed); write it directly from the SPDX MIT template
+- Re-verify the top-level command count: build the binary, run `goclaw --help`, and confirm 36 groups (not 38 as the explorer report states). Update the inventory if the count differs.
+
+### Non-functional
+- File encoding UTF-8
+- Unix line endings
+- Kebab-case names for reference files
+
+## Architecture
+
+```
+goclaw-cli/
+├── claude-skill/
+│   ├── SKILL.md
+│   ├── README.md
+│   ├── LICENSE                 # MIT
+│   ├── install.sh
+│   ├── check-drift.sh          # CI flag-validation script (stub, impl Phase 4)
+│   └── references/             # 16 files
+│       ├── exec-workflow.md         # hero
+│       ├── auth-and-config.md
+│       ├── agents-core.md
+│       ├── agents-advanced.md
+│       ├── chat-sessions.md
+│       ├── monitoring-ops.md
+│       ├── knowledge-memory.md
+│       ├── teams-collaboration.md
+│       ├── channels-messaging.md
+│       ├── data-movement.md
+│       ├── providers-skills-tools.md
+│       ├── automation-scheduling.md
+│       ├── mcp-integration.md
+│       ├── admin-system.md
+│       ├── media.md                 # NEW: was missing from explorer
+│       └── docs-api.md
+```
+
+## Related Code Files
+
+### To create
+- `claude-skill/SKILL.md`
+- `claude-skill/README.md`
+- `claude-skill/LICENSE`
+- `claude-skill/install.sh`
+- `claude-skill/references/*.md` (16 files, stub)
+
+### To read for context
+- `goclaw-cli/README.md` — tone/voice match
+- `goclaw-cli/LICENSE` — copy same license
+- `cmd/root.go` — verify global flags (--output, --yes, --profile, --tenant-id)
+
+## Implementation Steps
+
+1. Check license file exists at repo root. If yes, copy to `claude-skill/LICENSE`. If no, note for Phase 4.
+2. Create directory `claude-skill/` and `claude-skill/references/`.
+3. Write `SKILL.md` with:
+   - Frontmatter: `name: goclaw`, `description` (front-load keywords: "goclaw", "gateway server", "AI agent", "exec remote command", "GoClaw Gateway"), `when_to_use`, `allowed-tools: Bash(goclaw:*)`
+   - Body: overview (3 paragraphs max), convention rules (always `--output json`, always read credential store from `~/.goclaw/`, NEVER hardcode token), navigation list linking to each reference file
+   - Length target: 80-100 lines
+4. Write `README.md` skeleton: title, install one-liner placeholder, 3-example prompts placeholder, requirements (goclaw in PATH, Claude Code installed), license link.
+5. Write `install.sh` skeleton: shebang `#!/usr/bin/env bash`, `set -euo pipefail`, echo-only placeholder (no patching yet). Flag Phase 4 for real implementation.
+6. Create 16 stub reference files. Each stub contains a 1-line title (H1) and a "TODO: populate in Phase 2/3" comment that names the phase explicitly.
+7. Verify: `tree claude-skill/` produces expected structure; `bash -n install.sh` passes syntax check.
+
+## Todo List
+
+- [ ] Build `goclaw` binary; run `goclaw --help` → re-verify 36 top-level groups (confirm `delegations`, `media` standalone; no phantom entries)
+- [ ] Scaffold directory `claude-skill/` + `references/`
+- [ ] Write `LICENSE` (MIT, copyright NextLevelBuilder 2026)
+- [ ] Write `SKILL.md` with keyword-tuned frontmatter (< 100 lines)
+- [ ] Write `README.md` skeleton (install via tarball+SHA256 placeholder)
+- [ ] Write `install.sh` shebang + `set -euo pipefail` + arg parser stub
+- [ ] Write `check-drift.sh` stub (Phase 4 fills logic)
+- [ ] Create 16 stub reference files (15 + `media.md`), each with a phase-assignment comment
+- [ ] Run `bash -n` on `install.sh` and `check-drift.sh` separately (`bash -n` only checks its first argument)
+- [ ] Commit phase 1 artifact
+
+## Success Criteria
+- `tree claude-skill/` shows 21 files (SKILL.md, README.md, LICENSE, install.sh, check-drift.sh, 16 references/)
+- `SKILL.md` < 100 lines + valid YAML frontmatter
+- `bash -n claude-skill/install.sh` and `bash -n claude-skill/check-drift.sh` both exit 0
+- Each stub reference has 1 TODO line naming the phase that will fill it
+- `goclaw --help` output committed as `claude-skill/.verified-commands.txt` for reference traceability
+
+## Risk
+- If the `goclaw --help` list does not match 36 → the explorer report is wrong → the cluster mapping needs re-balancing (adjust Phase 2/3 before starting)
+- The MIT LICENSE needs an explicit copyright holder: use "NextLevelBuilder" (from repo owner `nextlevelbuilder`)
+
+## Security
+- Do not hardcode an endpoint or tenant in the stubs
+- install.sh does not touch `~/.claude/settings.json` in this phase
+
+## Next
+- Phase 2 writes the 5 priority references (hero use case: exec)
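The Phase 1 success criterion above (21 files under `claude-skill/`) could be checked mechanically. What follows is a minimal sketch, not part of the plan: the file names are copied from the Architecture tree, while `expected_files` and `missing_files` are hypothetical helper names introduced only for illustration.

```python
from pathlib import Path
import tempfile

# Names copied from the Architecture tree in the Phase 1 plan.
TOP_LEVEL = ["SKILL.md", "README.md", "LICENSE", "install.sh", "check-drift.sh"]
REFERENCES = [
    "exec-workflow", "auth-and-config", "agents-core", "agents-advanced",
    "chat-sessions", "monitoring-ops", "knowledge-memory", "teams-collaboration",
    "channels-messaging", "data-movement", "providers-skills-tools",
    "automation-scheduling", "mcp-integration", "admin-system", "media", "docs-api",
]


def expected_files():
    """Relative paths Phase 1 should produce under claude-skill/."""
    return sorted(TOP_LEVEL + [f"references/{name}.md" for name in REFERENCES])


def missing_files(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected_files() if not (root / rel).is_file()]


if __name__ == "__main__":
    # Demo against a throwaway directory so the sketch runs without the repo.
    with tempfile.TemporaryDirectory() as tmp:
        for rel in expected_files():
            path = Path(tmp) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("# TODO: populate in Phase 2/3\n")
        # Remove one stub to show how a gap is reported.
        (Path(tmp) / "references" / "media.md").unlink()
        print(len(expected_files()), "expected;", "missing:", missing_files(tmp))
```

Keeping the expected list in one place makes the 15-versus-16 reference count hard to get wrong, since the total is computed rather than restated.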
diff --git a/plans/260417-1254-goclaw-claude-skill/phase-02-priority-references.md b/plans/260417-1254-goclaw-claude-skill/phase-02-priority-references.md
new file mode 100644
index 0000000..6d75edb
--- /dev/null
+++ b/plans/260417-1254-goclaw-claude-skill/phase-02-priority-references.md
@@ -0,0 +1,169 @@
+---
+phase: 2
+title: Priority references (hero path)
+status: pending
+priority: high
+effort_hours: 10
+blockedBy: [1]
+---
+
+# Phase 2 — Priority references (hero path)
+
+## Context Links
+- Parent: [plan.md](plan.md)
+- Phase 1: [phase-01-scaffold-skill-structure.md](phase-01-scaffold-skill-structure.md)
+- Explorer: `plans/reports/explore-260417-1254-goclaw-command-inventory.md`
+
+## Overview
+Write the 5 reference files that serve the main use case (exec plus the most frequently used groups). After this phase the skill should be usable for roughly 70% of real-world situations.
+
+## Priority order (ship the "exec on server" use case first)
+
+1. `exec-workflow.md` — HERO. `tools invoke exec` + approvals flow + examples.
+2. `auth-and-config.md`: login, whoami, profile, tenant switch (every skill flow needs auth context)
+3. `chat-sessions.md` — chat single-shot + sessions list/preview/delete
+4. `agents-core.md` — agents list/get/create/delete + instances
+5. `monitoring-ops.md` — status, health, logs (non-streaming), traces, usage
+
+## Content template (applies to every reference file)
+
+```markdown
+# <Cluster Name>
+
+## When to use
+<1 sentence: when Claude should load this reference>
+
+## Commands in scope
+<bullet list: group → subcommands>
+
+## Verified flags (from source code)
+<table: flag | type | default | purpose>
+
+## JSON output
+<Y/N per subcommand, cite printer.Print usage>
+
+## Destructive ops
+<bullet list — which subcommands require --yes, which require user confirm>
+
+## Common patterns
+<3-5 worked examples, copy-pasteable bash>
+
+## Edge cases & gotchas
+<bullet list>
+
+## Cross-refs
+<links to other references where there is overlap>
+```
+
+## Specific content per file
+
+### exec-workflow.md (target 200-250 lines) — VERIFIED SCHEMA
+
+**Server impl:** `/Volumes/GOON/www/nlb/goclaw/internal/tools/shell.go:112-128` + builtin registry `gateway_builtin_tools.go:24`.
+
+**Tool name:** `exec`
+**Parameters schema:**
+```json
+{
+  "command": "string (REQUIRED) — shell command to execute",
+  "working_dir": "string (optional) — defaults to workspace root"
+}
+```
+**Response:** stdout/stderr from the `*Result` struct (check the struct shape in `internal/tools/executor.go` before writing examples).
+**Approval gating:** package install commands + shell deny patterns → creates approval request via `ExecApprovalManager` (`exec_approval.go:91` — field `Command`). User approves via `goclaw approvals approve <id>`.
+
+**Content outline:**
+- Cover: `goclaw tools invoke exec --param command="..."` + `--param working_dir=/tmp`
+- Cover: `goclaw tools invoke <tool-name>` generic pattern (link to `providers-skills-tools.md` for the tool list)
+- Cover: `goclaw approvals list/approve/deny`, which go through ws.Call (see `cmd/admin.go:26-80`)
+- **Canonical home for approvals** = this file (M4 resolution); `chat-sessions.md` cross-links here
+- Example 1: one-shot command with auto-approve (innocuous command that does not trigger approval)
+- Example 2: command hits approval gate → user approves → retry
+- Example 3: parse the stdout/stderr/exit_code fields from the JSON output
+- Gotcha: `approvals watch` is streaming: flag it as "not Bash-friendly, use polling `approvals list --output json`"
+- Gotcha: NUL byte rejection, shell deny patterns, Unicode normalization (from shell.go:138-144)
+
+### auth-and-config.md (target 150-200 lines)
+- Cover: `auth login/logout/whoami/use-context/list-contexts/pair`
+- Cover: `config get/set/permissions`
+- Cover: `credentials list/create/delete/rotate`
+- Cover: `api-keys list/create/reveal/revoke/extend`
+- Cover: global flags `--profile`, `--tenant-id`, `--server`, `--token`
+- Example: switch profile mid-session
+- Gotcha: `auth pair` = device pairing with long-running polling. Document it as "not Bash-friendly": the skill must REFUSE to run it and tell the user to run it in their own shell
+- Gotcha: **token expiry handling (U2 fix)**: if the `goclaw` exit code indicates a 401, Claude must suggest `goclaw auth login` (no retry loop). Document the exit codes if any exist.
+
+### chat-sessions.md (target 200 lines)
+- Cover: `chat -m "message" --no-stream --output json` (single-shot ONLY)
+- Cover: `chat abort <session>` (destructive, no --yes flag): **include it in the destructive section** (N3 fix)
+- Cover: `chat inject`, `chat status`
+- Cover: `sessions list/preview/delete/reset/label`
+- Cover: NDJSON parsing of `chat --output json` output
+- Cross-link: approvals → see `exec-workflow.md` (do not duplicate)
+- Gotcha: `chat` interactive mode is a TUI and cannot be used through Bash; use single-shot only
+- Example: chat → get session ID → preview history
+
+### agents-core.md (target 250 lines, large cluster)
+- Cover: `agents list/get/create/update/delete`
+- Cover: `agents files list/get/create/delete`
+- Cover: `agents instances list/get/create/delete/trigger/reset`
+- Cover: `agents wake`
+- Example: full lifecycle — create agent → upload file → instance → wake
+- Destructive: delete agent + instances (both need --yes)
+
+### monitoring-ops.md (target 150 lines)
+- Cover: `status`, `health`, `version`
+- Cover: `traces list/get`
+- Cover: `usage summary/breakdown/trends/export`
+- Cover: `logs tail` — **FLAG: streaming, NOT Bash-friendly**
+- Example: diagnose server health flow
+
+## Related Code Files (to read)
+
+- `cmd/tools.go` — verify exec invoke schema
+- `cmd/admin.go:11-82` — approvals flow (ws.Call signatures)
+- `cmd/auth.go` — auth flow + pairing
+- `cmd/chat.go` — single-shot vs interactive modes
+- `cmd/sessions.go` — session CRUD
+- `cmd/agents.go` + `agents_*.go` — agent lifecycle
+- `cmd/status.go`, `health.go`, `logs.go`, `traces.go`, `usage.go` — monitoring commands
+- `cmd/root.go` — global flags
+
+## Implementation Steps
+
+1. Read the source files listed above; extract flag definitions and subcommand schemas
+2. Write `exec-workflow.md` first (hero use case)
+3. Test: in Claude Code, try "run `pwd` on the goclaw server" → verify Claude issues `tools invoke exec --param command="pwd"` correctly
+4. Write the 4 remaining files from the template
+5. For each file: every listed flag must appear in the corresponding `cmd/*.go` (do not invent flags)
+6. Cross-link between files (e.g. exec-workflow links to auth-and-config for the "token must be set first" context)
+
+## Todo List
+
+- [ ] Read source files to extract verified flags
+- [ ] Write `exec-workflow.md` (hero)
+- [ ] Manual test of exec-workflow in Claude Code
+- [ ] Write `auth-and-config.md`
+- [ ] Write `chat-sessions.md`
+- [ ] Write `agents-core.md`
+- [ ] Write `monitoring-ops.md`
+- [ ] Cross-link review
+- [ ] Commit phase 2
+
+## Success Criteria
+- 5 reference files, each 150-250 lines
+- Every listed flag can be verified against `cmd/*.go`
+- Claude Code test: ≥ 3 exec intents run the correct command without hallucinated flags
+- Each file's destructive ops have a "user confirm required" section
+
+## Risk
+- ~~Response schema of `tools invoke exec` not yet known~~ **RESOLVED:** verified in `shell.go:114-128`, `exec_approval.go:91`
+- Response body (`*Result` struct) not read yet: verify the Result shape in `internal/tools/executor.go` before writing Example 3
+- Verifying flags takes time; accepted, because reference quality determines the UX
+- Approvals now live in exec-workflow, but `chat-sessions.md` can easily duplicate them: cross-link strictly, do not copy content
+
+## Security
+- Every example uses placeholders such as `<agent-id>`, `<tenant>`, `<user-id>`; never hardcode real data
+
+## Next
+- Phase 3: write the 11 remaining references
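The `chat-sessions.md` outline above calls for NDJSON parsing of `chat --output json` output. A minimal sketch of that parsing step follows, assuming only that each non-blank line is one JSON document. `parse_ndjson` is a hypothetical helper, and the `type` and `text` fields in the demo payload are invented for illustration, not taken from the goclaw source.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list, skipping blank lines."""
    events = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the offending line so truncated output is easy to spot.
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc
    return events


if __name__ == "__main__":
    # Hypothetical payload: the "type" and "text" field names are made up for
    # illustration and are NOT taken from the goclaw source.
    sample = '{"type": "chunk", "text": "hel"}\n\n{"type": "chunk", "text": "lo"}\n'
    print("".join(event["text"] for event in parse_ndjson(sample)))
```

Failing loudly on a malformed line, rather than skipping it, fits the plan's goal of not guessing at output the CLI did not produce.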
diff --git a/plans/260417-1254-goclaw-claude-skill/phase-03-remaining-references.md b/plans/260417-1254-goclaw-claude-skill/phase-03-remaining-references.md
new file mode 100644
index 0000000..e56b967
--- /dev/null
+++ b/plans/260417-1254-goclaw-claude-skill/phase-03-remaining-references.md
@@ -0,0 +1,109 @@
+---
+phase: 3
+title: Remaining references
+status: pending
+priority: medium
+effort_hours: 18
+blockedBy: [1]
+---
+
+# Phase 3 — Remaining references
+
+## Context Links
+- Parent: [plan.md](plan.md)
+- Phase 2 (template established): [phase-02-priority-references.md](phase-02-priority-references.md)
+- Explorer: `plans/reports/explore-260417-1254-goclaw-command-inventory.md`
+
+## Overview
+Write the 11 remaining reference files using the template settled in Phase 2. They have lower priority because their use cases are rarer, but they are needed to reach 100% coverage of the command surface.
+
+## Scope — 11 files (M5 fix: `media` promoted to own cluster)
+
+| File | Clusters | Target lines |
+|------|----------|--------------|
+| `agents-advanced.md` | agents.links, agents.ops, delegations (standalone, confirmed in `cmd/admin.go:141`) | 200 |
+| `knowledge-memory.md` | kg, kg.dedup, memory | 200 |
+| `teams-collaboration.md` | teams, teams.members, teams.events, teams.tasks, teams.workspace | 250 |
+| `channels-messaging.md` | channels, channels.contacts, channels.pending, channels.writers, contacts, pending-messages | 200 |
+| `data-movement.md` | export, import, storage | 150 |
+| `providers-skills-tools.md` | providers, skills, tools (builtin list/get/update/tenant-config), packages | 250 |
+| `automation-scheduling.md` | cron, heartbeat, heartbeat.checklist, heartbeat.targets, devices | 200 |
+| `mcp-integration.md` | mcp, mcp.servers, mcp.grants, mcp.requests, mcp.reconnect | 200 |
+| `admin-system.md` | tenants, system-config, activity, tts | 180 |
+| `media.md` | media (admin_media.go) — **NEW** per red team M5 | 100 |
+| `docs-api.md` | api-docs | 80 |
+
+## Template (same as Phase 2)
+
+Each file has these sections:
+- When to use
+- Commands in scope
+- Verified flags
+- JSON output support per subcommand
+- Destructive ops
+- Common patterns (3-5 examples)
+- Edge cases & gotchas
+- Cross-refs
+
+## Key streaming/TUI flags per cluster
+
+- `teams-collaboration.md`: ⚠️ **all of teams.* is WebSocket**. It still runs through Bash (ws.Call is synchronous), but `teams.events` and task subscriptions are streaming; flag this clearly
+- `automation-scheduling.md`: ⚠️ `devices approve/reject` has an interactive TUI fallback; pass `--yes` to skip it
+- `mcp-integration.md`: ⚠️ `mcp reconnect` is async; `mcp servers test` is sync
+- `channels-messaging.md`: ⚠️ some subcommands return only success text (no JSON)
+- `data-movement.md`: ⚠️ large imports/exports can hit the Bash tool timeout; document how to split the job or raise the timeout
+
+## Related Code Files (to read per file)
+
+- `agents-advanced.md` — `cmd/agents_links.go`, `agents_ops.go`
+- `knowledge-memory.md` — `cmd/knowledge_graph.go`, `knowledge_graph_dedup.go`, `memory.go`, `memory_index.go`
+- `teams-collaboration.md` — `cmd/teams*.go` (7 files)
+- `channels-messaging.md` — `cmd/channels*.go`, `contacts.go`, `pending_messages.go`
+- `data-movement.md` — `cmd/export_import.go`, `storage.go`
+- `providers-skills-tools.md` — `cmd/providers*.go`, `skills*.go`, `tools.go`, `packages.go`
+- `automation-scheduling.md` — `cmd/cron.go`, `heartbeat*.go`, `devices.go`
+- `mcp-integration.md` — `cmd/mcp.go`, `mcp_reconnect.go`
+- `admin-system.md` — `cmd/tenants.go`, `system_config.go`, `admin*.go`
+- `docs-api.md` — `cmd/api_docs.go`, `api_keys.go`
+
+## Implementation Steps
+
+1. The 11 files are independent and can be written in parallel; done by hand sequentially, each takes 15-30 minutes
+2. For large clusters (teams-collaboration, providers-skills-tools), spawn a dedicated fullstack-developer subagent to avoid bloating the main context
+3. When a file is done → cross-check that its verified flags match the source
+4. Cross-link: `automation-scheduling` links to `auth-and-config` (device pairing touches auth); `data-movement` links to `providers-skills-tools` (export skills)
+5. Update the SKILL.md navigation links as each file is finished
+
+## Todo List
+
+- [ ] Write `agents-advanced.md`
+- [ ] Write `knowledge-memory.md`
+- [ ] Write `teams-collaboration.md` (large, may spawn a subagent)
+- [ ] Write `channels-messaging.md`
+- [ ] Write `data-movement.md`
+- [ ] Write `providers-skills-tools.md` (large, may spawn a subagent)
+- [ ] Write `automation-scheduling.md`
+- [ ] Write `mcp-integration.md`
+- [ ] Write `admin-system.md`
+- [ ] Write `media.md` (NEW, the cluster that was missed)
+- [ ] Write `docs-api.md`
+- [ ] Cross-link pass across all 16 references (phase 2 + 3)
+- [ ] SKILL.md navigation list review
+- [ ] Commit phase 3
+
+## Success Criteria
+- 11 reference files, each 80-250 lines, kebab-case naming
+- 16 references in total, roughly 3000 lines of markdown
+- Every flag verified against source code (grep the flag name in `cmd/*.go`)
+- The SKILL.md navigation list covers all 16 files
+
+## Risk
+- Large scope, fatigue risk → ship in batches of 2-3 files per commit
+- Teams is the most complex cluster (7 source files); leave it for last or delegate it to a subagent
+
+## Security
+- Do not leak server URLs, tenants, or credentials in examples
+- Use `<xxx>` placeholders for every user-specific value
+
+## Next
+- Phase 4: install.sh + README + manual test
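The "every flag verified against source" criterion above, which the plan assigns to `check-drift.sh`, can be illustrated in a few lines. This is a hedged sketch, not the planned script. It assumes that a flag counts as declared when its name appears as a quoted string literal somewhere in the Go source, which is a simplification of how Cobra flags are registered. `FLAG_RE`, `flags_in_markdown` and `undeclared_flags` are hypothetical names.

```python
import re

# A long flag is "--name" not preceded by a word character, "!" or another "-",
# so table rules ("|------|") and HTML comments ("<!--") are ignored.
FLAG_RE = re.compile(r"(?<![\w!-])--([a-z][a-z0-9-]*)")


def flags_in_markdown(markdown):
    """Return the sorted, de-duplicated long-flag names used in a reference file."""
    return sorted(set(FLAG_RE.findall(markdown)))


def undeclared_flags(markdown, go_source):
    """Return flags used in the markdown whose quoted name is absent from the Go source.

    Assumption: a declared flag shows up as a string literal such as "output"
    in the source text; this is a heuristic, not a Go parser.
    """
    return [name for name in flags_in_markdown(markdown) if f'"{name}"' not in go_source]


if __name__ == "__main__":
    reference = "Run `goclaw agents list --output json --bogus`.\n|------|------|\n"
    source = 'cmd.Flags().StringP("output", "o", "table", "output format")'
    print(undeclared_flags(reference, source))
```

A non-empty result would correspond to the `exit 1` case described for `check-drift.sh`.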
diff --git a/plans/260417-1254-goclaw-claude-skill/phase-04-install-readme-test.md b/plans/260417-1254-goclaw-claude-skill/phase-04-install-readme-test.md
new file mode 100644
index 0000000..84b9058
--- /dev/null
+++ b/plans/260417-1254-goclaw-claude-skill/phase-04-install-readme-test.md
@@ -0,0 +1,239 @@
+---
+phase: 4
+title: Install script + README + release tarball + test
+status: pending
+priority: high
+effort_hours: 8
+blockedBy: [2, 3]
+---
+
+# Phase 4 — Install script + README + manual test
+
+## Context Links
+- Parent: [plan.md](plan.md)
+- Researcher (install patterns): `plans/reports/researcher-260417-1254-claude-skill-authoring.md` §4 "Settings.json Patching"
+
+## Overview
+Finish the artifacts for the OSS release: a production-grade install.sh (idempotent Python3 merge), a complete README with usage examples, a manual smoke test in Claude Code, and a journal entry.
+
+## Requirements
+
+### install.sh behavior (post-red-team)
+1. **3-mode permission selector** (D3 revised):
+   - `--mode 1` = full wildcard `Bash(goclaw:*)` — user ack required
+   - `--mode 2` = readonly enumeration (~20 rules: `Bash(goclaw agents list:*)`, `Bash(goclaw agents get:*)`, `Bash(goclaw sessions list:*)`, ...) generated from the inventory
+   - `--mode 3` = no patching, print a JSON snippet for the user to copy (SAFEST; default when run through a pipe)
+   - No argument: interactive prompt via `/dev/tty` to choose 1/2/3
+2. **Tarball install** (C4 fix): install.sh được embed trong tarball từ GitHub Releases. README one-liner:
+   ```bash
+   cd /tmp && base=https://github.com/nextlevelbuilder/goclaw-cli/releases/download/skill-v0.1.0 && curl -fsSL -O "$base/goclaw-skill.tar.gz" -O "$base/goclaw-skill.sha256"
+   sha256sum -c goclaw-skill.sha256 && tar xzf goclaw-skill.tar.gz && ./goclaw-skill/install.sh
+   ```
+   SHA256 published alongside tarball in release.
+3. **Safe backup** (M7 fix):
+   ```bash
+   if [[ -f "$SETTINGS" ]]; then
+     cp "$SETTINGS" "${SETTINGS}.bak.$(date +%s)" || { echo "ERROR: backup failed"; exit 1; }
+   fi
+   ```
+4. **Interactive prompt safety** (M8 fix): `read ... < /dev/tty`; if stdin is not a terminal (`! -t 0`, i.e. piped), refuse Mode 1 and default to Mode 3.
+5. **Python heredoc hardening** (M6 fix): `<<'EOF'` quoted; flags truyền qua env vars:
+   ```bash
+   MODE="$MODE" DRY_RUN="$DRY_RUN" "$PY" <<'EOF'
+   import os; mode = os.environ['MODE']; ...
+   EOF
+   ```
+6. **Python3 sanity check** (M6 fix): `python3 -c 'import sys; assert sys.version_info[0]==3'`.
+7. **Windows abort** (U4 fix): detect `uname -s` contains `MINGW|MSYS|CYGWIN` → print "Windows not supported in v1, use WSL or manual install"; exit.
+8. **Multi-install detect** (U5 fix): respect `CLAUDE_HOME` env var, fallback `~/.claude/`.
+9. **Idempotent:** permissions.allow dedupe before append.
+10. **Abort** if `settings.json` invalid JSON (don't overwrite).
+11. **Skill overwrite gate** (N1 fix): if `$SKILL_DIR` exists, require `--force`.
+
+### README.md content
+- Title + 1-paragraph pitch
+- Screenshot/GIF placeholder (optional for v1)
+- Requirements: goclaw binary, Claude Code CLI, macOS/Linux
+- Install: tarball + SHA256 block from "install.sh behavior" item 2 (download, verify, extract, run `install.sh`)
+- Manual install alternative: `git clone && cd claude-skill && ./install.sh`
+- Usage: 3 example prompts the user can try with Claude ("list my goclaw agents", "run `ls` on goclaw server", "show logs for agent X")
+- Configuration: link to `references/auth-and-config.md`
+- Permissions matrix: Mode 1 (full wildcard) vs Mode 2 (readonly) vs Mode 3 (manual, default)
+- Uninstall: `rm -rf ~/.claude/skills/goclaw` + manual settings.json revert
+- License + contributing link
+
+## Architecture
+
+```
+install.sh flow:
+  parse args (--mode 1|2|3, --dry-run, --force, --help)
+  check prereqs (goclaw in PATH, ~/.claude/ exists)
+  backup ~/.claude/settings.json → .bak.<ts>
+  copy claude-skill/* → ~/.claude/skills/goclaw/
+  run python3 -c "<idempotent merge>" on settings.json
+  print success + next-steps
+```
+
+## Related Code Files
+
+### To create
+- `claude-skill/install.sh` (full implementation)
+- `claude-skill/README.md` (full implementation)
+
+### To update
+- `claude-skill/SKILL.md`: final polish, verify all 16 links work
+- Root `README.md` (repo-level): add a "Claude Code skill" section linking to `claude-skill/`
+
+## Implementation Steps
+
+### Step 1: install.sh core
+```bash
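+#!/usr/bin/env bash
+set -euo pipefail
+
+print_help()   { echo "Usage: install.sh [--mode 1|2|3] [--dry-run] [--force] [--help]"; }
+print_manual() { echo 'Add to settings.json manually: {"permissions": {"allow": ["Bash(goclaw agents list:*)"]}}'; }
+
+# Parse args (--mode replaces the old --full-auto flag, see plan.md D3)
+MODE=""; DRY_RUN=0; FORCE=0
+while [[ $# -gt 0 ]]; do
+  case $1 in
+    --mode)    [[ $# -ge 2 ]] || { echo "--mode needs 1|2|3" >&2; exit 1; }; MODE="$2"; shift 2;;
+    --dry-run) DRY_RUN=1; shift;;
+    --force)   FORCE=1; shift;;
+    --help|-h) print_help; exit 0;;
+    *) echo "Unknown: $1" >&2; exit 1;;
+  esac
+done
+
+# Prereqs (U4: no Windows in v1; U5: respect CLAUDE_HOME)
+case "$(uname -s)" in MINGW*|MSYS*|CYGWIN*) echo "Windows not supported in v1, use WSL or manual install" >&2; exit 1;; esac
+CLAUDE_HOME="${CLAUDE_HOME:-$HOME/.claude}"
+command -v goclaw >/dev/null || { echo "goclaw not in PATH" >&2; exit 1; }
+[[ -d "$CLAUDE_HOME" ]] || { echo "$CLAUDE_HOME missing: install Claude Code first" >&2; exit 1; }
+SETTINGS="$CLAUDE_HOME/settings.json"
+SKILL_DIR="$CLAUDE_HOME/skills/goclaw"
+
+# Mode selection (M8): prompt via /dev/tty; piped runs default to Mode 3 and refuse Mode 1
+if [[ -z "$MODE" ]]; then
+  if [[ -t 0 ]]; then read -r -p "Permission mode [1=full auto, 2=readonly, 3=manual] (default 3): " MODE < /dev/tty || true; fi
+  MODE="${MODE:-3}"
+fi
+[[ "$MODE" =~ ^[123]$ ]] || { echo "Invalid mode: $MODE" >&2; exit 1; }
+if [[ "$MODE" == 1 && ! -t 0 ]]; then echo "Mode 1 refused on piped stdin" >&2; exit 1; fi
+if [[ "$MODE" == 1 ]]; then
+  echo "WARNING: Mode 1 lets Claude run ANY goclaw command, including destructive ops."
+  read -r -p "Continue? [y/N] " -n 1 REPLY < /dev/tty; echo
+  [[ "$REPLY" =~ ^[Yy]$ ]] || exit 0
+fi
+
+# Copy files (N1: an existing install requires --force)
+if [[ -d "$SKILL_DIR" && "$FORCE" != 1 ]]; then echo "$SKILL_DIR exists: re-run with --force" >&2; exit 1; fi
+if [[ "$DRY_RUN" == 1 ]]; then echo "[dry-run] mkdir -p $SKILL_DIR"; else mkdir -p "$SKILL_DIR"; fi
+# ... rsync claude-skill/ to $SKILL_DIR (exclude install.sh itself)
+if [[ "$MODE" == 3 ]]; then print_manual; echo "✓ Installed to $SKILL_DIR"; exit 0; fi
+
+# Backup BEFORE patching (M7); abort if the copy fails
+if [[ -f "$SETTINGS" && "$DRY_RUN" != 1 ]]; then
+  cp "$SETTINGS" "${SETTINGS}.bak.$(date +%s)" || { echo "ERROR: backup failed" >&2; exit 1; }
+fi
+
+# Patch settings.json via Python3 (M6: quoted heredoc, values passed through env vars)
+PY="$CLAUDE_HOME/skills/.venv/bin/python3"
+[[ -x "$PY" ]] || PY="$(command -v python3 || true)"
+[[ -n "$PY" ]] || { echo "python3 not found: add permissions manually"; print_manual; exit 0; }
+
+MODE="$MODE" DRY_RUN="$DRY_RUN" SETTINGS="$SETTINGS" "$PY" <<'EOF'
+import json, os, sys
+p = os.environ["SETTINGS"]
+data = {}
+if os.path.exists(p):
+    with open(p) as f:
+        try: data = json.load(f)
+        except Exception as e: sys.exit(f"settings.json invalid: {e}")  # abort, never overwrite
+
+allow = data.setdefault("permissions", {}).setdefault("allow", [])
+rules = ["Bash(goclaw:*)"] if os.environ["MODE"] == "1" else [
+    # Readonly rules; the full list (~20) is generated from the command inventory
+    "Bash(goclaw agents list:*)", "Bash(goclaw agents get:*)",
+    "Bash(goclaw sessions list:*)",
+]
+for r in rules:
+    if r not in allow:  # dedupe so re-runs stay idempotent
+        allow.append(r)
+
+if os.environ["DRY_RUN"] != "1":
+    with open(p, "w") as f:
+        json.dump(data, f, indent=2)
+    print("✓ permissions merged")
+else:
+    print("[dry-run] would merge:", rules)
+EOF
+
+echo "✓ Installed to $SKILL_DIR"
+echo "Next: restart Claude Code if running"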
+```
+
+### Step 2: README.md
+- Match the tone of the repo-root README.md
+- Badges: license, CLI version compatibility
+- "Try it" section: 3 concrete example prompts
+- FAQ: "How is this different from MCP?" — answer: "KISS, CLI is source of truth"
+
+### Step 3: Update root README.md
+- Add a "Claude Code Skill" section linking to `claude-skill/README.md`
+
+### Step 4: 15-prompt smoke test (C5 fix)
+
+**Positive intents (skill SHOULD trigger + correct command):**
+1. "list agents on goclaw" → `goclaw agents list --output json`
+2. "run `uname -a` on the goclaw server" → `goclaw tools invoke exec --param command="uname -a"`
+3. "show goclaw server status" → `goclaw status --output json`
+4. "show tracking traces for my agent" → `goclaw traces list --agent <id> --output json`
+5. "what's my usage this week" → `goclaw usage summary --output json`
+
+**Destructive intents (skill SHOULD prompt before running):**
+6. "delete agent xyz" → Claude confirm + `goclaw agents delete xyz --yes`
+7. "rotate my credential" → Claude confirm + `goclaw credentials rotate`
+8. "reset session abc" → Claude confirm + `goclaw sessions reset abc --yes`
+9. "unpublish skill foo" → Claude confirm + `goclaw skills unpublish foo --yes`
+10. "clear all memory for agent X" → Claude confirm + `goclaw memory clear --agent X --yes`
+
+**Negative intents (skill should NOT load, C5 over-triggering test):**
+11. "claw feet for my couch" (woodworking) → skill does not load
+12. "how do claws work on cats" → skill does not load
+13. "run npm install" (generic, no goclaw context) → skill does not load (or loads and Claude declines)
+14. "list my processes" (Unix shell, not goclaw) → skill does not load
+
+**Streaming intent (U3 test):**
+15. "tail logs for goclaw" → Claude explains the streaming limitation and suggests `goclaw logs tail --limit 50 --output json` or another alternative
+
+Record the test log in `plans/reports/tester-260417-<ts>-goclaw-skill-smoke.md`. Any failing positive/destructive test blocks the release.
+
+### Step 5: /ck:journal entry
+Once the tests pass, run `/ck:journal` to write the session note.
+
+## Todo List
+
+- [ ] Implement `install.sh` — 3-mode selector, safe backup, /dev/tty, quoted heredoc, Py3 sanity, Windows abort, CLAUDE_HOME env, skill overwrite gate, dedupe
+- [ ] Implement `check-drift.sh`: grep every flag in the references against `cmd/*.go`, exit 1 on mismatch
+- [ ] `bash -n` on `install.sh` and `check-drift.sh` (one invocation per file) + `shellcheck`: pass, or low-severity warnings only
+- [ ] Test install.sh matrix: fresh, re-run (idempotent), --mode 1, 2, 3, --dry-run, piped (must default Mode 3), Windows detect
+- [ ] Write `README.md` with tarball+SHA256 install block + 3 example prompts + permission matrix + uninstall
+- [ ] Update root `README.md` with a "Claude Code skill" section link
+- [ ] Build tarball: `tar czf goclaw-skill.tar.gz claude-skill/` + compute `sha256sum goclaw-skill.tar.gz > goclaw-skill.sha256`
+- [ ] Smoke test 15 prompts in Claude Code (5 positive + 5 destructive + 4 negative + 1 streaming)
+- [ ] Record the test log in `plans/reports/tester-260417-<ts>-goclaw-skill-smoke.md`
+- [ ] Fix issues found in testing (tune description keywords if the skill over- or under-triggers)
+- [ ] Create GitHub release `skill-v0.1.0` with tarball + sha256 attached
+- [ ] `/ck:journal` final entry
+- [ ] Commit phase 4 + tag
+
+## Success Criteria
+- `./install.sh --dry-run` prints the correct action list for each mode
+- `./install.sh --mode 2` merges ~20 readonly rules and is idempotent (a re-run adds no duplicates)
+- `./install.sh --mode 1` confirms via /dev/tty, then merges `Bash(goclaw:*)`
+- `./install.sh` piped → defaults to Mode 3 (no patching)
+- `check-drift.sh` exits 0 on the current codebase
+- 10/10 positive + destructive smoke tests pass
+- 4/4 negative tests pass (the skill does not over-trigger)
+- The README install block is reproducible on a clean machine
+- `shellcheck install.sh check-drift.sh` is clean or has only acceptable warnings
+- GitHub release `skill-v0.1.0` is public with tarball + sha256 + LICENSE
+
+## Risk
+- `shellcheck` is strict → fixing warnings may take ~30 minutes; acceptable
+- The Python3 venv may not exist on a machine with a fresh Claude Code install → the manual-instruction fallback MUST work (separate test case)
+- Description keywords may not match user intent → tune after testing, not blocking
+
+## Security
+- Back up settings.json BEFORE patching (not after)
+- Abort if the JSON is invalid (never overwrite)
+- The Mode 1 confirmation cannot be bypassed through a pipe (it reads from /dev/tty)
+
+## Next
+- Plan done → tag skill-v0.1.0
+- Follow-up: collect user feedback, iterate on references, consider publishing to a marketplace if Anthropic releases one
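The two behaviours Phase 4 leans on, Mode 2 rule generation from an inventory and an idempotent merge into `permissions.allow`, can be shown in isolation. The sketch below is a simplified illustration, not the shipped patcher. `readonly_rules` and `merge_allow` are hypothetical names, and the inventory dict is a stand-in for the explorer's command inventory, whose real format is not shown in the plan.

```python
import copy


def readonly_rules(inventory, verbs=("list", "get")):
    """Build Mode 2 rules: one trailing-wildcard rule per readonly verb per resource."""
    return [
        f"Bash(goclaw {resource} {verb}:*)"
        for resource, resource_verbs in inventory.items()
        for verb in resource_verbs
        if verb in verbs
    ]


def merge_allow(settings, rules):
    """Return a copy of settings with rules appended to permissions.allow, without duplicates."""
    merged = copy.deepcopy(settings)
    allow = merged.setdefault("permissions", {}).setdefault("allow", [])
    for rule in rules:
        if rule not in allow:
            allow.append(rule)
    return merged


if __name__ == "__main__":
    # Stand-in inventory; the real one would come from the explorer report.
    inventory = {"agents": ["list", "get", "delete"], "sessions": ["list", "reset"]}
    rules = readonly_rules(inventory)
    once = merge_allow({"permissions": {"allow": ["Bash(ls:*)"]}}, rules)
    twice = merge_allow(once, rules)
    print(once == twice, once["permissions"]["allow"])
```

Returning a new dict instead of mutating the input keeps the dry-run path trivial: the caller can compute the merged result and choose whether to write it.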
diff --git a/plans/260417-1254-goclaw-claude-skill/plan.md b/plans/260417-1254-goclaw-claude-skill/plan.md
new file mode 100644
index 0000000..01c8b99
--- /dev/null
+++ b/plans/260417-1254-goclaw-claude-skill/plan.md
@@ -0,0 +1,130 @@
+---
+title: GoClaw Claude Code Skill
+status: in_progress
+created: 2026-04-17
+priority: high
+blockedBy: []
+blocks: []
+source: skill
+license: MIT
+distribution: github-releases-tarball
+implementation_progress: "Phases 1-3 done. Phase 4 95% done (install.sh + check-drift shellcheck-clean; TTY probe fixed; tarball/SHA256 rebuilt; root README updated; test matrix passed: modes 1/2/3, idempotent, piped-default-3, piped-mode-1-refused). Remaining: user runs 15-prompt smoke test in Claude Code + GitHub release publish."
+---
+
+# GoClaw Claude Code Skill — Implementation Plan
+
+Build a Claude Code skill that lets Claude (agent) invoke goclaw CLI to interact with GoClaw Gateway server. Hero use case verified: `goclaw tools invoke exec --param command="<cmd>"` (server has `exec` builtin tool at `goclaw/internal/tools/shell.go:112-128`, params: `command` required, `working_dir` optional). Scope: 36 top-level command groups.
+
+## Context Links
+
+- Brainstorm synthesis: conversation turns `4/16 14:48` + `4/17 12:41`
+- Researcher (skill authoring): `plans/reports/researcher-260417-1254-claude-skill-authoring.md`
+- Explorer (command inventory): `plans/reports/explore-260417-1254-goclaw-command-inventory.md`
+- Red team review: `plans/reports/code-reviewer-260417-1254-goclaw-skill-red-team.md`
+- CLI source: `cmd/*.go` (~60 files, 36 groups verified)
+- Server source: `/Volumes/GOON/www/nlb/goclaw/cmd/gateway_builtin_tools.go` (builtin tool registry), `/Volumes/GOON/www/nlb/goclaw/internal/tools/shell.go` (exec impl)
+- Previous CLI plans (completed): `plans/260315-0007-goclaw-cli-implementation/`, `plans/260326-1350-cli-feature-parity-update/`
+
+## Goals
+
+1. The Claude Code agent can call `goclaw <cmd>` autonomously through Bash, parse the JSON output, and operate the GoClaw server
+2. The skill triggers automatically when the user talks about GoClaw (keyword matching via the description)
+3. Cover all 36 command groups through progressive disclosure (16 reference files)
+4. OSS-publishable: install script + README + LICENSE inside the `goclaw-cli` repo (`claude-skill/` directory)
+5. Safety: prompt on destructive ops by default; full auto is opt-in via an install flag
+
+## Non-goals
+
+- MCP server (dropped per brainstorm, YAGNI)
+- Streaming commands (chat interactive, logs tail): document the limitation, do not wrap
+- Windows PowerShell script: macOS/Linux only for v1
+- Auto-generating references from `--help` output: written by hand to control the UX
+
+## Tech Stack
+
+- Markdown (SKILL.md + references/*)
+- Bash (install.sh)
+- Python3 (idempotent settings.json patcher, using the venv at `~/.claude/skills/.venv/bin/python3`)
+- Hosting: inside the `goclaw-cli` repo under `claude-skill/` to keep the version in sync with the CLI
+
+## Architecture Decisions
+
+### D1 — Single repo (goclaw-cli/claude-skill/) vs separate repo
+**Decision:** place it in `goclaw-cli/claude-skill/`. Rationale: versions stay in sync with CLI releases automatically; users star one repo and get both; avoids divergence when the CLI changes flags.
+
+### D2 — Skill name
+**Decision:** `goclaw` (short, matches the binary name). Not `goclaw-cli` (redundant).
+
+### D3 — Permission default (post-red-team revision)
+**Previous problem:** The red team pointed out that `Bash(goclaw status)` does not match `goclaw status --output json` (D6 forces appending `--output json`), and that `Bash(goclaw list:*)` does not match `goclaw agents list` (`list` is not a top-level command).
+
+**New decision:** an interactive prompt at install time, with no hardcoded patterns:
+1. By default, install.sh asks "Choose permission mode: [1] Full auto (Bash(goclaw:*)) [2] Readonly verbs only [3] Manual (no patching)". Default = 3.
+2. Mode 1 (`Bash(goclaw:*)`): empirically verify that the wildcard matches `goclaw anything --output json` (Claude Code doc: `Bash(cmd:*)` = prefix match with any args). If it does not match, fall back to Mode 2.
+3. Mode 2: enumerate readonly verbs per resource with a trailing wildcard: `Bash(goclaw agents list:*)`, `Bash(goclaw agents get:*)`, `Bash(goclaw sessions list:*)`, ... (~20 rules). Generated from the inventory, not hardcoded by hand.
+4. Mode 3: print a JSON snippet for the user to copy manually.
+5. `--mode 1|2|3` flag for scripting (bypasses the prompt).
+
+The separate `--full-auto` flag is removed; use `--mode 1` instead.
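+
+The Mode 2 rule generation can be sketched as follows. This is a minimal illustration in Go (the plan itself implements this step in `install.sh`); the `readonlyRules` name and the resource-to-verbs inventory shape are assumptions made for the example, not part of the plan.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// readonlyRules builds permission patterns of the form
+// "Bash(goclaw <resource> <verb>:*)" from a resource -> verbs inventory.
+// The inventory shape is hypothetical; the plan only says rules are
+// generated from the command inventory.
+func readonlyRules(inventory map[string][]string) []string {
+	rules := make([]string, 0)
+	for resource, verbs := range inventory {
+		for _, verb := range verbs {
+			rules = append(rules, fmt.Sprintf("Bash(goclaw %s %s:*)", resource, verb))
+		}
+	}
+	// Sort so repeated runs emit the rules in the same order.
+	sort.Strings(rules)
+	return rules
+}
+
+func main() {
+	inventory := map[string][]string{
+		"agents":   {"list", "get"},
+		"sessions": {"list"},
+	}
+	for _, rule := range readonlyRules(inventory) {
+		fmt.Println(rule)
+	}
+}
+```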
+
+### D4 — Settings.json patching
+**Decision:** Python3 idempotent merge (per the researcher report). Rationale: cross-platform, no jq required, reuses the existing venv. Fallback: print manual instructions if the venv does not exist.
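+
+The idempotent-merge idea is independent of the language. A minimal sketch, written in Go here only for consistency with the CLI (the plan's patcher is Python3); `mergeRules` is a hypothetical helper name:
+
+```go
+package main
+
+import "fmt"
+
+// mergeRules appends each wanted rule that is not already present and
+// keeps the existing order, so a second run leaves the list unchanged.
+func mergeRules(existing, wanted []string) []string {
+	seen := make(map[string]bool, len(existing))
+	merged := make([]string, 0, len(existing)+len(wanted))
+	for _, rule := range existing {
+		if !seen[rule] {
+			seen[rule] = true
+			merged = append(merged, rule)
+		}
+	}
+	for _, rule := range wanted {
+		if !seen[rule] {
+			seen[rule] = true
+			merged = append(merged, rule)
+		}
+	}
+	return merged
+}
+
+func main() {
+	allow := []string{"Bash(ls:*)"}
+	wanted := []string{"Bash(goclaw agents list:*)"}
+	once := mergeRules(allow, wanted)
+	twice := mergeRules(once, wanted)
+	// The second merge adds nothing, so both lengths are equal.
+	fmt.Println(len(once), len(twice))
+}
+```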
+
+### D5 — References structure
+**Decision:** 15 reference files (14 clusters from the explorer + 1 extra `exec-workflow.md` for the hero use case). Any file over 300 lines is split further.
+
+### D6 — Convention trong SKILL.md
+**Decision:** Claude ALWAYS appends `--output json` when running goclaw (except for streaming commands). This rule is stated in SKILL.md and repeated in every reference.
+
+## Phases Overview
+
+| Phase | Title | Status | Effort (hrs) | Deliverable |
+|-------|-------|--------|--------------|-------------|
+| 1 | [Scaffold skill structure](phase-01-scaffold-skill-structure.md) | ✅ completed | 2 | `claude-skill/` dir + SKILL.md + MIT LICENSE + 16 stub references + install.sh + check-drift.sh skeletons + `.verified-commands.txt` (38 groups from binary) |
+| 2 | [Priority references (hero path)](phase-02-priority-references.md) | ✅ completed | 10 | 5 refs: exec-workflow (verified schema from `shell.go:114-128`), auth-and-config, agents-core, chat-sessions, monitoring-ops |
+| 3 | [Remaining references](phase-03-remaining-references.md) | ✅ completed | 18 | 11 refs including `media` cluster; total 16 refs, 1709 lines markdown |
+| 4 | [Install script + README + release + test](phase-04-install-readme-test.md) | 🟡 95% done | 8 | install.sh + check-drift.sh shellcheck-clean; has_tty probe fixes pipe edge case; test matrix passed (3 modes + idempotent + piped default + Mode1 piped refusal); tarball + SHA256 rebuilt; root README linked. **Remaining: user runs 15-prompt smoke test in Claude Code + GitHub release publish.** |
+
+**Total:** ~38 hrs ≈ 5 working days.
+
+## Dependencies
+
+- `goclaw` binary compiled + in PATH (user prereq)
+- `~/.claude/skills/` directory exists (Claude Code installed)
+- Python3 in the venv `~/.claude/skills/.venv/bin/python3` (only needed to auto-patch settings)
+
+## Success Criteria
+
+1. User says "list agents on goclaw" → Claude auto-invokes the skill → runs `goclaw agents list --output json` → summarizes
+2. User says "run `ls -la` on the goclaw server" → Claude runs `goclaw tools invoke exec --param command="ls -la"` → returns the output + approval status
+3. Destructive command (e.g. `goclaw agents delete xyz`) → Claude prompts the user to confirm before adding `--yes`
+4. `./install.sh` is idempotent: a second run does not duplicate permissions
+5. README contains an install one-liner + 3 example prompts
+6. Manual test: the skill triggers correctly for ≥ 5 different intents, and ≥ 90% of the commands Claude emits use the correct flags
+
+## Risks
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| `allowed-tools` in frontmatter is not enforced (issue #14956) | High | Medium | Use settings.json permissions as the source of truth |
+| Claude hallucinates a flag name that is not in the references | Medium | Low | `check-drift.sh` greps every flag in the references against `cmd/*.go` |
+| Malformed settings.json → install.sh corrupts the user's config | Low | High | `if [[ -f ]]; then cp ... \|\| exit 1; fi` (not `&&` chain); abort on invalid JSON |
+| goclaw CLI changes a flag → references go stale | Medium | Medium | `check-drift.sh` in CI; version compat note in SKILL.md frontmatter |
+| User picks Mode 1 on prod → data is deleted | Low | Critical | Mode 3 is the default; Mode 1 requires interactive confirmation via `/dev/tty` |
+| Description keywords too narrow → skill does not trigger | Medium | Low | 15-prompt smoke test (5 positive + 5 negative + 5 destructive) |
+| Tarball download corruption / MITM | Low | High | SHA256 in install.sh; verify before extract |
+| exec tool param schema changes in a server release | Low | Medium | Pin supported goclaw versions in SKILL.md; `check-drift.sh` optional server check |
+| `curl\|bash` install has stdin piped → interactive prompts fail | Medium | Medium | `read ... < /dev/tty`; refuse Mode 1 if `! -t 0` |
+
+## Security Considerations
+
+- **Token handling:** The skill does NOT store tokens; it relies on the goclaw CLI's existing credential store
+- **settings.json backup:** Before patching, copy the file to a timestamped `.bak`
+- **Deny list default:** Include `Bash(goclaw tenants delete:*)`, `Bash(goclaw * --yes *)` in deny for non-full-auto modes
+- **Public OSS hygiene:** Do NOT hardcode server URLs, tenant IDs, or tokens in skill files
+
+## Next Steps
+
+1. Phase 1: scaffold (independent, can start immediately)
+2. Phases 2 & 3 can run in parallel after phase 1
+3. Phase 4 wraps up after phases 2+3 are done
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-01-scope-lock.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-01-scope-lock.md
new file mode 100644
index 0000000..dcec545
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-01-scope-lock.md
@@ -0,0 +1,60 @@
+---
+phase: 1
+title: "Scope Lock"
+status: pending
+priority: P2
+effort: "1h"
+dependencies: []
+---
+
+# Phase 1: Scope Lock
+
+## Overview
+
+Lock the 7-surface scope, re-verify backend contracts against `digitopvn/goclaw` `dev` tip, and produce the contract evidence file that subsequent phases reference. No CLI code changes.
+
+## Requirements
+
+- Re-fetch and read each backend handler to confirm path, method, query params, body shape, status enum, and error semantics.
+- Confirm beta tag `v3.12.0-beta.20` is the minimum that contains PR #44 commit `43049d3b` (already verified during plan creation; phase re-asserts).
+- Confirm no command-name collisions in current `cmd/` tree.
+- Produce `reports/scope-lock-260527-p6.md` summarizing contracts and any drift discovered.
+
+## Implementation Steps
+
+1. From the `digitopvn/goclaw` checkout (or via `gh api`), read:
+   - `internal/http/traces.go` — locate the `traces/follow` handler.
+   - `internal/http/providers.go` — locate the `providers/{id}/reconnect` handler.
+   - `internal/http/sessions.go` — locate the branch and history/follow handlers.
+   - `internal/http/channel_instances.go` — locate the writers/test handler.
+   - `internal/http/activity.go` — locate the aggregate handler.
+   - `internal/http/logs.go` — locate the runtime/aggregate handler.
+   - `internal/http/openapi_spec.json` for any documented schema differences.
+2. For each handler record: route, method, required vs optional inputs, response shape, status enum, error codes, admin-only flag.
+3. Compare against `plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md` (lives on `feat/claude-skill-v0.1` worktree) and the plan overview. Flag any drift.
+4. Re-verify beta tag: `gh api repos/digitopvn/goclaw/compare/v3.12.0-beta.20...43049d3b --jq '.status'` must return `identical`.
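+5. Re-verify no naming collisions: `grep -inE 'Use:[[:space:]]*"(follow|reconnect|branch|aggregate|test)[" ]' cmd/*.go` (the alternation is grouped so that only `Use:` lines match, not every line containing `branch` or `reconnect`).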
+6. Write `reports/scope-lock-260527-p6.md` with the table of confirmed contracts.
+
+## Todo List
+
+- [ ] Backend contract re-read for all 7 endpoints.
+- [ ] Drift table populated (or zero-drift confirmed).
+- [ ] Beta tag re-confirmed.
+- [ ] Naming collision sweep run with output captured.
+- [ ] `reports/scope-lock-260527-p6.md` written.
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Zero unresolved contract questions before phase 2 starts.
+- Scope file under `reports/` exists and is referenced from each implementation phase's "Evidence" section.
+
+## Out of Scope
+
+- Any CLI code change.
+- Any test scaffolding (phase 2+).
+
+## Next Steps
+
+Proceed to Phase 2 (Traces Follow + Providers Reconnect).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-02-traces-follow-and-providers-reconnect.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-02-traces-follow-and-providers-reconnect.md
new file mode 100644
index 0000000..6a226c7
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-02-traces-follow-and-providers-reconnect.md
@@ -0,0 +1,114 @@
+---
+phase: 2
+title: "Traces Follow + Providers Reconnect (PR #37 surfaces)"
+status: pending
+priority: P2
+effort: "3h"
+dependencies: [1]
+---
+
+# Phase 2: Traces Follow + Providers Reconnect
+
+## Overview
+
+Implement two CLI surfaces from backend PR #37 (already in beta `v3.12.0-beta.16`+): polling-friendly trace follow and provider reconnect. Strict TDD — failing tests land before any Cobra command code.
+
+## Surfaces
+
+### 2.1 `goclaw traces follow`
+
+```bash
+goclaw traces follow --session-key <key> [--since <RFC3339>] [--limit N] [--include-spans] [--status <status>] [--channel <channel>] [-o json|yaml|table]
+goclaw traces follow --agent <uuid-or-key> [same flags]
+```
+
+- Endpoint: `GET /v1/traces/follow`
+- Require exactly one of `--session-key` or `--agent` (validate before HTTP call).
+- Optional: `status`, `channel`, `since`, `limit`, `include_spans`. `since` must be RFC3339.
+- Server default `limit=50`, max `200`. Don't enforce max client-side; let server respond.
+- Response shape:
+  ```json
+  {"traces":[],"spans_by_trace_id":{},"server_time":"...","next_since":"...","limit":50}
+  ```
+- JSON/YAML: print full envelope.
+- Table: rows like `traces list` — `TRACE_ID`, `AGENT`, `STATUS`, `DURATION_MS`, `INPUT_TOKENS`, `OUTPUT_TOKENS`, `COST`.
+- **One request only. No watch loop.**
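+
+The pre-flight validation and query construction described above could look roughly like this. It is a sketch under assumptions: `buildTracesFollowQuery` is a hypothetical helper name, and only the `session_key`, `agent_id`, `since`, and `limit` parameters are shown.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"net/url"
+	"strconv"
+	"time"
+)
+
+// buildTracesFollowQuery validates the target flags and --since before any
+// HTTP call is made, then returns the request path with its query string.
+func buildTracesFollowQuery(sessionKey, agent, since string, limit int) (string, error) {
+	// Exactly one target must be set: reject both-empty and both-set.
+	if (sessionKey == "") == (agent == "") {
+		return "", errors.New("exactly one of --session-key or --agent is required")
+	}
+	q := url.Values{}
+	if sessionKey != "" {
+		q.Set("session_key", sessionKey)
+	} else {
+		q.Set("agent_id", agent)
+	}
+	if since != "" {
+		if _, err := time.Parse(time.RFC3339, since); err != nil {
+			return "", fmt.Errorf("--since must be RFC3339: %w", err)
+		}
+		q.Set("since", since)
+	}
+	// No client-side maximum: the server decides how to handle large limits.
+	if limit > 0 {
+		q.Set("limit", strconv.Itoa(limit))
+	}
+	return "/v1/traces/follow?" + q.Encode(), nil
+}
+
+func main() {
+	path, err := buildTracesFollowQuery("s1", "", "", 0)
+	fmt.Println(path, err)
+}
+```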
+
+### 2.2 `goclaw providers reconnect`
+
+```bash
+goclaw providers reconnect <provider-id> [-o json|yaml|table]
+```
+
+- Endpoint: `POST /v1/providers/{id}/reconnect`
+- Admin-only on backend; client sends no body. Do NOT send `{"verify":true}`.
+- Do NOT add `--verify` flag. **The only verify-shaped command is `goclaw providers verify-embedding <id>` (`cmd/providers_verify.go:11`)**, which targets a different backend endpoint — do NOT recommend it as a fallback in Long-help or PR body. Backend handles reconnect verification server-side.
+- Path-escape `<provider-id>` via `url.PathEscape` (see escaped-path pattern in `cmd/api_keys_rotate.go`).
+- Response:
+  ```json
+  {"status":"reconnected","provider":{},"registry_updated":true,"cache_invalidated":true}
+  ```
+- Status enum: `reconnected`, `disabled`, `not_registered`.
+- Table: `STATUS`, `REGISTRY_UPDATED`, `CACHE_INVALIDATED`, plus provider name/id if non-empty.
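+
+A minimal sketch of the escaped-path construction, assuming a hypothetical `reconnectPath` helper (the real pattern lives in `cmd/api_keys_rotate.go`, which is not reproduced here):
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+)
+
+// reconnectPath escapes the provider ID as a single path segment so that an
+// ID containing "/" cannot change the route the server sees.
+func reconnectPath(providerID string) string {
+	return "/v1/providers/" + url.PathEscape(providerID) + "/reconnect"
+}
+
+func main() {
+	fmt.Println(reconnectPath("openai/prod:1"))
+}
+```
+
+Note that `url.PathEscape` encodes `/` as `%2F` but leaves `:` untouched, since a colon is legal inside a path segment.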
+
+## Files
+
+- Modify: `cmd/traces.go` — append `tracesFollowCmd` + register on `tracesCmd`.
+- Modify: `cmd/providers.go` or new `cmd/providers_reconnect.go` (preferred — keep file < 200 lines per repo rule) — declare `providersReconnectCmd` + register on `providersCmd`.
+- New: `cmd/traces_follow_test.go`
+- New: `cmd/providers_reconnect_test.go`
+
+## TDD Sequence
+
+1. Write `cmd/traces_follow_test.go` with the failing tests below; run `go test ./cmd -run TracesFollow` and confirm red.
+2. Implement `tracesFollowCmd` minimally until tests pass.
+3. Write `cmd/providers_reconnect_test.go`; confirm red.
+4. Implement `providersReconnectCmd`; confirm green.
+5. `go vet ./... && go build ./...` clean.
+
+## Tests
+
+### `cmd/traces_follow_test.go`
+
+- Session-key target builds path `/v1/traces/follow?session_key=...&...`.
+- Agent target builds path `/v1/traces/follow?agent_id=...&...`.
+- Missing both target flags returns validation error before HTTP call.
+- Setting both target flags returns validation error before HTTP call.
+- Non-RFC3339 `--since` returns validation error before HTTP call.
+- JSON output preserves `next_since` and `spans_by_trace_id`.
+- Table output includes the seven required columns.
+- **Atomic-counter test (Red Team F7):** wrap `httptest.NewServer` handler with `atomic.AddInt64(&calls, 1)`; assert `calls == 1` after `RunE`. Must NOT use `client.FollowStream` (`internal/client/follow.go`) — that would reconnect.
+- **502-once test (Red Team F7):** server returns 502 first call, 200 second call; assert command fails fast on 502, does NOT retry.
+
+### `cmd/providers_reconnect_test.go`
+
+- POST path is `/v1/providers/{escaped-id}/reconnect`.
+- Request body is empty (server receives `Content-Length: 0` or empty JSON; assert no `verify` key).
+- JSON output preserves `registry_updated` and `cache_invalidated`.
+- Table output renders status + boolean columns.
+- Provider ID with `/` or `:` is path-escaped (regression test for RT-02).
+- **Atomic-counter test (Red Team F7):** assert exactly one POST request issued. No retry/reconnect path.
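+
+The atomic-counter technique can be sketched outside the test framework as below. This is an illustration only: `countRequests` is a hypothetical function that stands in for invoking the command's `RunE`, and it uses a plain `http.Post` instead of the CLI's own client.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"sync/atomic"
+)
+
+// countRequests issues one POST against a local test server and returns how
+// many requests the server observed.
+func countRequests() (int64, error) {
+	var calls int64
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt64(&calls, 1) // count every request that reaches the handler
+		w.Header().Set("Content-Type", "application/json")
+		fmt.Fprint(w, `{"status":"reconnected"}`)
+	}))
+	defer srv.Close()
+
+	// nil body: the reconnect endpoint takes no request payload.
+	resp, err := http.Post(srv.URL+"/v1/providers/p1/reconnect", "application/json", nil)
+	if err != nil {
+		return 0, err
+	}
+	resp.Body.Close()
+	return atomic.LoadInt64(&calls), nil
+}
+
+func main() {
+	n, err := countRequests()
+	if err != nil {
+		fmt.Println("request failed:", err)
+		return
+	}
+	fmt.Println("requests observed:", n)
+}
+```
+
+In the real tests the assertion would be `calls == 1` after `RunE` returns, which fails if a retry or reconnect path sneaks in.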
+
+## Todo List
+
+- [ ] Red tests for traces follow.
+- [ ] `tracesFollowCmd` implemented + green.
+- [ ] Red tests for providers reconnect.
+- [ ] `providersReconnectCmd` implemented + green.
+- [ ] `go vet` + `go build` clean.
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Both commands callable via `goclaw --help`.
+- All new tests pass without skipping or relying on live server.
+- No regression in existing `traces list`, `traces get`, `traces export`, or any `providers` subcommand.
+
+## Risks
+
+- Forgetting path-escape on provider ID (RT-02 lesson). Mitigated by explicit escape test.
+- Accidentally adding `--verify` based on familiarity with `providers verify-embedding`. Mitigated by codex prompt's explicit prohibition + test asserting empty body.
+
+## Next Steps
+
+Phase 3 (Sessions Branch + Follow).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-03-sessions-branch-and-follow.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-03-sessions-branch-and-follow.md
new file mode 100644
index 0000000..19235c2
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-03-sessions-branch-and-follow.md
@@ -0,0 +1,120 @@
+---
+phase: 3
+title: "Sessions Branch + Follow (PR #44 chat surfaces)"
+status: pending
+priority: P2
+effort: "3h"
+dependencies: [2]
+---
+
+# Phase 3: Sessions Branch + Follow
+
+## Overview
+
+Implement the two chat-session surfaces from backend PR #44 (beta `v3.12.0-beta.20`+): branch a session at a message index, and cursor-based history follow. TDD.
+
+## Surfaces
+
+### 3.1 `goclaw sessions branch`
+
+```bash
+goclaw sessions branch <session-key> --up-to-index <n> [--new-session-key <key>] [--label <label>] [--metadata k=v]... [-o json|yaml|table]
+```
+
+- Endpoint: `POST /v1/chat/sessions/{key}/branch`
+- **Backend path note (Red Team F3):** the new `branch`/`follow` commands sit under top-level `sessionsCmd` for UX continuity, but target `/v1/chat/sessions/...` (matches the domain of `cmd/chat_sessions.go`), while sibling `sessions list/preview/delete/reset/label/compact` target `/v1/sessions/...`. Long-help for both new commands MUST state: `Backend route: POST /v1/chat/sessions/{key}/branch (chat domain).` This is the documented exception, not a bug.
+- Required: `<session-key>` positional, `--up-to-index` (int, >= 0 — **including zero**).
+- Optional: `--new-session-key`, `--label`, `--metadata key=value` (repeatable).
+- `--metadata` parses repeated `key=value`; reject malformed entries before HTTP call.
+- Path-escape source session key via `url.PathEscape` (may contain `:` and `/`).
+- **buildBody-zero workaround (Red Team F2):** the shared `buildBody` helper at `cmd/helpers.go:86-89` drops `int v == 0`. For `up_to_index`, DO NOT use `buildBody`; construct the body map directly so `{"up_to_index": 0, ...}` is preserved on the wire.
+- Request body:
+  ```json
+  {"new_session_key":"...","up_to_index":12,"label":"...","metadata":{"source":"cli"}}
+  ```
+- Response:
+  ```json
+  {"ok":true,"source_key":"...","session_key":"...","copied_messages":12,"total_messages":24,"label":"..."}
+  ```
+- Table: `SOURCE`, `NEW_KEY`, `COPIED`, `TOTAL`, `LABEL`.
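+
+A sketch of the metadata parsing and the direct body construction that keeps `up_to_index: 0` on the wire. The helper names `parseMetadata` and `branchBody` are assumptions for illustration; the real command may structure this differently.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+)
+
+// parseMetadata turns repeated key=value entries into a map and rejects
+// entries without "=" or with an empty key.
+func parseMetadata(entries []string) (map[string]string, error) {
+	out := make(map[string]string, len(entries))
+	for _, entry := range entries {
+		key, value, ok := strings.Cut(entry, "=")
+		if !ok || key == "" {
+			return nil, fmt.Errorf("invalid --metadata %q: want key=value", entry)
+		}
+		out[key] = value
+	}
+	return out, nil
+}
+
+// branchBody builds the request map by hand. up_to_index is always set,
+// even when zero; the optional fields are skipped when empty.
+func branchBody(upToIndex int, newKey, label string, meta map[string]string) map[string]any {
+	body := map[string]any{"up_to_index": upToIndex}
+	if newKey != "" {
+		body["new_session_key"] = newKey
+	}
+	if label != "" {
+		body["label"] = label
+	}
+	if len(meta) > 0 {
+		body["metadata"] = meta
+	}
+	return body
+}
+
+func main() {
+	meta, err := parseMetadata([]string{"source=cli"})
+	if err != nil {
+		fmt.Println(err)
+		return
+	}
+	raw, _ := json.Marshal(branchBody(0, "", "", meta))
+	fmt.Println(string(raw))
+}
+```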
+
+### 3.2 `goclaw sessions follow`
+
+```bash
+goclaw sessions follow <session-key> [--cursor <n>] [--limit <n>] [-o json|yaml|table]
+```
+
+- Endpoint: `GET /v1/chat/sessions/{key}/history/follow`
+- Query: `cursor` (default 0, >= 0 — **including zero**), `limit` (default 50, > 0, server max 200).
+- **buildBody-zero workaround (Red Team F2):** build the query string directly with `url.Values`; do NOT use `buildBody` (which would drop `cursor=0` per the int-zero skip rule). `cursor=0` MUST appear in the query string when `--cursor 0` is passed.
+- **One polling request only. No SSE/WS watch.** Direct `httpClient.Get` call — must NOT use `client.FollowStream` (`internal/client/follow.go`), which reconnects on EOF.
+- Response:
+  ```json
+  {"session_key":"...","cursor":12,"next_cursor":18,"total":18,"messages":[],"reset":false,"updated":"..."}
+  ```
+- Table: print summary row (cursor, next_cursor, total, reset, updated) plus compact message rows.
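+
+Building the query with `url.Values` keeps `cursor=0` in the request. A minimal sketch, assuming a hypothetical `followPath` helper:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+	"strconv"
+)
+
+// followPath escapes the session key as one path segment and always emits
+// cursor and limit, including cursor=0.
+func followPath(sessionKey string, cursor, limit int) string {
+	q := url.Values{}
+	q.Set("cursor", strconv.Itoa(cursor))
+	q.Set("limit", strconv.Itoa(limit))
+	return "/v1/chat/sessions/" + url.PathEscape(sessionKey) + "/history/follow?" + q.Encode()
+}
+
+func main() {
+	fmt.Println(followPath("agent:main/1", 0, 50))
+}
+```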
+
+## Files
+
+- Modify: `cmd/sessions.go` — append `sessionsBranchCmd` + `sessionsFollowCmd` and register both.
+  - If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go` per repo modularization rule.
+- New: `cmd/sessions_branch_test.go`
+- New: `cmd/sessions_follow_test.go`
+
+## TDD Sequence
+
+1. Red: `cmd/sessions_branch_test.go`.
+2. Implement `sessionsBranchCmd`; green.
+3. Red: `cmd/sessions_follow_test.go`.
+4. Implement `sessionsFollowCmd`; green.
+5. `go vet ./... && go build ./...` clean.
+
+## Tests
+
+### `cmd/sessions_branch_test.go`
+
+- `--up-to-index` missing returns validation error before HTTP call.
+- Negative `--up-to-index` returns validation error before HTTP call.
+- **Zero-boundary test (Red Team F2):** `--up-to-index 0` produces request body containing `"up_to_index":0` literally (not omitted, not missing). Use `json.Unmarshal` on the captured request body and assert the key is present with value `0`.
+- Request body shape exact: `up_to_index` as int, `metadata` as object.
+- `--metadata foo=bar --metadata baz=qux` produces `{"foo":"bar","baz":"qux"}`.
+- Malformed `--metadata foobar` (no `=`) rejected before HTTP call.
+- Session key with `:` and `/` is path-escaped (regression for RT-02).
+- HTTP 409 conflict surfaces via existing error handler (use `client.APIError` path).
+- JSON output preserves `copied_messages` and `total_messages`.
+
+### `cmd/sessions_follow_test.go`
+
+- Default cursor=0, limit=50 appear in query string.
+- **Zero-boundary test (Red Team F2):** `--cursor 0` results in `cursor=0` appearing in raw query string (not omitted by buildBody int-zero skip).
+- Custom cursor and limit appear in query string.
+- Negative `--cursor` rejected before HTTP call.
+- Non-positive `--limit` rejected before HTTP call.
+- Session key path-escaped.
+- JSON output preserves `reset`, `next_cursor`, `messages`.
+- **Atomic-counter test (Red Team F7):** wrap test server handler with `atomic.AddInt64(&calls, 1)`; assert `calls == 1` after `RunE`. Assert command does NOT import / call `client.FollowStream`.
+
+## Todo List
+
+- [ ] Red tests for sessions branch.
+- [ ] `sessionsBranchCmd` implemented + green.
+- [ ] Red tests for sessions follow.
+- [ ] `sessionsFollowCmd` implemented + green.
+- [ ] `go vet` + `go build` clean.
+- [ ] If `cmd/sessions.go` exceeds 200 lines, split (per repo rule).
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Both commands callable.
+- Path-escape regression test passes.
+- No watch loop introduced.
+
+## Risks
+
+- Drifting metadata shape (object vs array). Mitigated by exact-body test against backend handler signature confirmed in phase 1.
+- Session-key path escape forgotten. Mitigated by explicit regression test.
+
+## Next Steps
+
+Phase 4 (Channels Writers Test).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-04-channels-writers-test.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-04-channels-writers-test.md
new file mode 100644
index 0000000..14b150d
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-04-channels-writers-test.md
@@ -0,0 +1,76 @@
+---
+phase: 4
+title: "Channels Writers Test (PR #44)"
+status: pending
+priority: P2
+effort: "1.5h"
+dependencies: [3]
+---
+
+# Phase 4: Channels Writers Test
+
+## Overview
+
+Add the writer-permission probe under the existing `channels writers` command group. Small, isolated, TDD.
+
+## Surface
+
+```bash
+goclaw channels writers test <instance-id> --group-id <group-scope> --user-id <user-id> [-o json|yaml|table]
+```
+
+- Endpoint: `POST /v1/channels/instances/{id}/writers/test`
+- Required positional: `<instance-id>`.
+- Required flags: `--group-id`, `--user-id`.
+- Body **only** has `group_id` and `user_id`. Reject any extra fields client-side.
+  ```json
+  {"group_id":"group:telegram:-100123","user_id":"386246614"}
+  ```
+- Response:
+  ```json
+  {"allowed":true,"reason":"writer","instance_id":"...","agent_id":"...","group_id":"...","user_id":"...","writer_count":3}
+  ```
+- Known `reason` values: `writer`, `not_writer`, `no_writers_configured`, `invalid_group`.
+- Table: `ALLOWED`, `REASON`, `WRITER_COUNT`, `GROUP_ID`, `USER_ID`.
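+
+One way to guarantee the body carries only the two allowed keys is a fixed struct rather than a free-form map. The type and function names below are hypothetical, shown as a sketch:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+)
+
+// writersTestRequest has exactly the two fields the endpoint accepts, so no
+// extra key can reach the wire.
+type writersTestRequest struct {
+	GroupID string `json:"group_id"`
+	UserID  string `json:"user_id"`
+}
+
+// newWritersTestBody validates both required flags before marshaling.
+func newWritersTestBody(groupID, userID string) ([]byte, error) {
+	if groupID == "" {
+		return nil, errors.New("--group-id is required")
+	}
+	if userID == "" {
+		return nil, errors.New("--user-id is required")
+	}
+	return json.Marshal(writersTestRequest{GroupID: groupID, UserID: userID})
+}
+
+func main() {
+	body, err := newWritersTestBody("group:telegram:-100123", "386246614")
+	fmt.Println(string(body), err)
+}
+```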
+
+## Files
+
+- Modify: `cmd/channels_writers.go` — append `channelsWritersTestCmd`, register on `channelsWritersCmd`.
+- New: `cmd/channels_writers_test_test.go` (file naming kept descriptive; `_test_test.go` is intentional — the command name is `test` and Go test file suffix is `_test.go`).
+  - Alternative if Go tooling balks: `cmd/channels_writers_probe_test.go` — but verify Go accepts `_test_test.go` first (it does).
+
+## TDD Sequence
+
+1. Red: write the probe test file with the cases below.
+2. Implement `channelsWritersTestCmd`; green.
+3. `go vet ./... && go build ./...` clean.
+
+## Tests
+
+- Missing `--group-id` rejected before HTTP call.
+- Missing `--user-id` rejected before HTTP call.
+- POST path is `/v1/channels/instances/{escaped-id}/writers/test`.
+- Request body has exactly `group_id` and `user_id` keys, no extras.
+- Instance-id with special chars is path-escaped.
+- JSON output preserves `allowed`, `reason`, `writer_count`.
+- Table output includes the five required columns.
+
+## Todo List
+
+- [ ] Red probe tests.
+- [ ] `channelsWritersTestCmd` implemented + green.
+- [ ] `go vet` + `go build` clean.
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Subcommand callable as `goclaw channels writers test ...`.
+- Body shape verified to contain only the two required keys.
+
+## Risks
+
+- Existing `cmd/channels_writers.go` already has add/remove/list/groups. Adding `test` might push file over 200 lines — split per repo rule if so.
+
+## Next Steps
+
+Phase 5 (Activity + Logs Runtime Aggregate).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-05-activity-and-logs-aggregate.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-05-activity-and-logs-aggregate.md
new file mode 100644
index 0000000..d6a7480
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-05-activity-and-logs-aggregate.md
@@ -0,0 +1,120 @@
+---
+phase: 5
+title: "Activity Aggregate + Logs Runtime Aggregate (PR #44)"
+status: pending
+priority: P2
+effort: "3h"
+dependencies: [4]
+---
+
+# Phase 5: Activity Aggregate + Logs Runtime Aggregate
+
+## Overview
+
+Two aggregation surfaces with similar structure (group_by + filters → buckets). Implement together for shared validation/output helpers. TDD.
+
+## Surfaces
+
+### 5.1 `goclaw activity aggregate`
+
+```bash
+goclaw activity aggregate --group-by <action|actor_type|entity_type|actor_id> \
+  [--from <RFC3339>] [--to <RFC3339>] [--limit <n>] \
+  [--actor-type <v>] [--actor-id <v>] [--action <v>] [--entity-type <v>] [--entity-id <v>] \
+  [-o json|yaml|table]
+```
+
+- Endpoint: `GET /v1/activity/aggregate`
+- Required: `--group-by` ∈ {`action`, `actor_type`, `entity_type`, `actor_id`}.
+- Backend restricts `group_by=actor_id` to admin (server enforces; CLI does not pre-check).
+- Optional filters: `from`, `to` (both RFC3339 if provided), `limit`, plus actor/action/entity scope filters.
+- Response:
+  ```json
+  {"source":"activity","group_by":"action","total":10,"limit":50,"from":"...","to":"...","buckets":[{"key":"session.branch","count":7,"last_seen":"..."}]}
+  ```
+- Table: `KEY`, `COUNT`, `LAST_SEEN`.
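+
+The `--group-by` check can run before any HTTP call. A minimal sketch, with `validateGroupBy` as a hypothetical helper that takes the allowed set as a parameter so the logs command could reuse it with its own values:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// activityGroupBy lists the values the plan allows for activity aggregate.
+var activityGroupBy = []string{"action", "actor_type", "entity_type", "actor_id"}
+
+// validateGroupBy rejects an empty or unknown value. It does not pre-check
+// admin rights for actor_id; the server enforces that.
+func validateGroupBy(value string, allowed []string) error {
+	if value == "" {
+		return fmt.Errorf("--group-by is required (one of: %s)", strings.Join(allowed, ", "))
+	}
+	for _, candidate := range allowed {
+		if value == candidate {
+			return nil
+		}
+	}
+	return fmt.Errorf("invalid --group-by %q (one of: %s)", value, strings.Join(allowed, ", "))
+}
+
+func main() {
+	fmt.Println(validateGroupBy("action", activityGroupBy))
+	fmt.Println(validateGroupBy("foo", activityGroupBy))
+}
+```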
+
+### 5.2 `goclaw logs aggregate`
+
+```bash
+goclaw logs aggregate [--group-by <level|source>] [--level <debug|info|warn|error>] [--source <source>] [--from <RFC3339>] [-o json|yaml|table]
+```
+
+- Endpoint: `GET /v1/logs/runtime/aggregate`
+- Admin-only. Source = runtime ring buffer, not durable audit log.
+- `--group-by` default `level`; valid: {`level`, `source`}.
+- Optional filters: `level`, `source`, `from`.
+- Response:
+  ```json
+  {"source":"runtime","retention":"ring_buffer","capacity":100,"sample_size":25,"group_by":"level","buckets":[{"key":"warn","count":3,"last_seen":1760000000000}]}
+  ```
+  Note: `last_seen` in this response is an **epoch millis number**, not a string.
+- **Renderer requirement (Red Team F6):** `unmarshalMap` at `cmd/helpers.go:49-61` decodes JSON numbers as `float64`; the shared `str()` helper uses `fmt.Sprintf("%v", v)`, which renders `1.76e+12` for large numbers. Implement a local `formatLastSeen(v interface{}) string`:
+  - If `v` is `string` → assume RFC3339 and return as-is.
+  - If `v` is `float64` or `int64` → treat as epoch millis and return `time.UnixMilli(int64(v)).UTC().Format(time.RFC3339)`.
+  - If nil/empty → return `"-"`.
+  Apply this helper in BOTH `activity aggregate` and `logs aggregate` table renderers so neither produces `1.76e+12`. Place the helper in `cmd/activity_aggregate.go` (or a small shared file `cmd/aggregate_helpers.go`) and call it from `cmd/logs_aggregate.go`; both files are in the same `cmd` package, so no import is needed.
+- Table: `KEY`, `COUNT`, `LAST_SEEN` (via `formatLastSeen`), plus a header summary line with `SOURCE`, `RETENTION`, `CAPACITY`, `SAMPLE_SIZE` if existing output helpers support it; otherwise drop to JSON-only summary.
+- **Do not confuse with `goclaw logs tail` (WS streaming).**
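+
+A sketch of the `formatLastSeen` type switch described above. It follows the three rules in the renderer requirement; treating any other type as `"-"` is an assumption of this sketch.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// formatLastSeen renders last_seen for table output. Strings are assumed to
+// be RFC3339 already; numbers are treated as epoch milliseconds.
+func formatLastSeen(v interface{}) string {
+	switch t := v.(type) {
+	case string:
+		if t == "" {
+			return "-"
+		}
+		return t
+	case float64:
+		// JSON numbers decoded into interface{} arrive as float64.
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	case int64:
+		return time.UnixMilli(t).UTC().Format(time.RFC3339)
+	default:
+		// nil and any unexpected type fall back to a placeholder.
+		return "-"
+	}
+}
+
+func main() {
+	fmt.Println(fmt.Sprintf("%v", float64(1760000000000))) // the problem: 1.76e+12
+	fmt.Println(formatLastSeen(float64(1760000000000)))    // 2025-10-09T08:53:20Z
+	fmt.Println(formatLastSeen(nil))                       // -
+}
+```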
+
+## Files
+
+- New: `cmd/activity_aggregate.go` — declares `activityAggregateCmd` ONLY (no new top-level parent).
+  - **Red Team F1 resolution:** `activityCmd` already exists at `cmd/admin.go:133` (current behavior: lists audit log via `goclaw activity`). Attach the new aggregate as a subcommand in `init()`: `activityCmd.AddCommand(activityAggregateCmd)`. Do NOT declare a new `var activityCmd`. Do NOT touch `cmd/cmd_test.go` top-level list (no new top-level command added).
+  - Command UX: `goclaw activity aggregate --group-by ...` (subcommand under existing parent — natural namespacing, no `cmd_test.go` churn).
+- Modify: `cmd/logs.go` — append `logsAggregateCmd`, register on `logsCmd`. `cmd/logs.go` is 111 LOC today; adding aggregate may push past 200 → consider `cmd/logs_aggregate.go` upfront.
+- New: `cmd/activity_aggregate_test.go`
+- New: `cmd/logs_aggregate_test.go`
+
+## TDD Sequence
+
+1. Red: activity aggregate test cases.
+2. Implement `activityAggregateCmd` (no `activityCmd` declared — reuse existing parent from `cmd/admin.go:133`); green.
+3. Red: logs aggregate test cases (including the `last_seen` RFC3339 rendering assertion).
+4. Implement `logsAggregateCmd` + `formatLastSeen` helper; green.
+5. `go vet ./... && go build ./...` clean.
+
+## Tests
+
+### `cmd/activity_aggregate_test.go`
+
+- Missing `--group-by` rejected before HTTP call.
+- Invalid `--group-by=foo` rejected before HTTP call.
+- Valid values (`action`, `actor_type`, `entity_type`, `actor_id`) all accepted.
+- `--from` / `--to` parse RFC3339; non-RFC3339 rejected before HTTP call.
+- All filter flags appear in query string with snake_case keys.
+- JSON output preserves `source`, `group_by`, `total`, `buckets`.
+- Table renders bucket rows with `KEY`, `COUNT`, `LAST_SEEN`.
+
+### `cmd/logs_aggregate_test.go`
+
+- Default `--group-by` is `level` (or omitted from query — match existing CLI default-omission style; phase 1 scope-lock confirms).
+- Invalid `--group-by=foo` rejected before HTTP call.
+- `--level`, `--source`, `--from` build correct query.
+- JSON output preserves `retention`, `capacity`, `sample_size`.
+- **Render assertion (Red Team F6):** table cell for `LAST_SEEN` matches RFC3339 regex (`^\d{4}-\d{2}-\d{2}T`), NOT `1.76e+12` scientific-notation form. Assert rendered cell content via captured stdout, not absence of panic.
+
+## Todo List
+
+- [ ] Red tests for activity aggregate.
+- [ ] `activityAggregateCmd` implemented as subcommand of existing `activityCmd` (do NOT declare new parent); green.
+- [ ] Red tests for logs aggregate (incl. RFC3339 cell-content assertion).
+- [ ] `formatLastSeen` helper + `logsAggregateCmd` implemented; green.
+- [ ] `go vet` + `go build` clean.
+- [ ] Confirm `logs aggregate` is clearly distinct from `logs tail` in `--help`.
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Both aggregate commands callable.
+- `last_seen` numeric type handled without runtime panic.
+- No confusion with `logs tail` — distinct help text.
+
+## Risks
+
+- `last_seen` type mismatch between activity (RFC3339 string) and logs runtime (epoch millis int). Resolved by `formatLastSeen` type-switch helper (Red Team F6).
+- ~~New top-level `activity` command might collide~~ — **resolved (Red Team F1):** aggregate attaches as subcommand of existing `activityCmd` at `cmd/admin.go:133`. No new top-level.
+
+## Next Steps
+
+Phase 6 (Tests and Docs Sweep).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-06-tests-and-docs.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-06-tests-and-docs.md
new file mode 100644
index 0000000..36b9a15
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-06-tests-and-docs.md
@@ -0,0 +1,65 @@
+---
+phase: 6
+title: "Tests and Docs Sweep"
+status: pending
+priority: P2
+effort: "1.5h"
+dependencies: [5]
+---
+
+# Phase 6: Tests and Docs Sweep
+
+## Overview
+
+Final validation gate before ship. Run full test suite, vet, build. Update README and codebase-summary.md. Run a red-team diff sweep against the explicit out-of-scope list. No new functional code.
+
+## Implementation Steps
+
+1. Run full validation:
+   - `go test -count=1 ./...`
+   - `go vet ./...`
+   - `go build ./...`
+   - `make build` (LDFLAGS smoke)
+2. Update docs:
+   - `README.md` — append the 7 new command surfaces to the command inventory section if such a section exists.
+   - `docs/codebase-summary.md` — list new files added and which Cobra command groups gained surfaces.
+   - `CHANGELOG.md` — add a `## Unreleased` entry (or follow existing semantic-release commit convention; do not manually edit if release notes are commit-driven).
+3. Red-team diff sweep (before opening PR):
+   - Confirm no command exists for `POST /v1/traces/{id}/replay`.
+   - Confirm no command exists for generic `GET /v1/logs/aggregate` (only `/v1/logs/runtime/aggregate`).
+   - Confirm no WebSocket `chat.history.delta` consumer added.
+   - Confirm no SSE chat history follow added.
+   - Confirm no watch loop in `traces follow` or `sessions follow`.
+   - Confirm no `--verify` flag added to `providers reconnect`.
+   - Confirm every new POST/PATCH path is path-escaped.
+   - Confirm no untracked unrelated files (`.claude/`, `AGENTS.md`) are staged.
+4. Write `reports/red-team-260527-p6.md` capturing the sweep outcome.
+
+## Todo List
+
+- [ ] `go test`, `go vet`, `go build` all green.
+- [ ] `make build` smoke passes.
+- [ ] README updated (or noted as not applicable).
+- [ ] `docs/codebase-summary.md` updated.
+- [ ] CHANGELOG entry (or confirmed commit-driven).
+- [ ] Red-team sweep documented in `reports/red-team-260527-p6.md`.
+- [ ] Out-of-scope checklist all-clear.
+- [ ] Phase status flipped to Complete.
+
+## Success Criteria
+
+- Zero failed tests.
+- Zero vet warnings.
+- Red-team sweep finds zero scope leaks.
+
+## Out of Scope (verbatim from issue #16, must remain absent)
+
+- `POST /v1/traces/{id}/replay`
+- generic `GET /v1/logs/aggregate`
+- WebSocket `chat.history.delta`
+- SSE chat history follow
+- long-running watch loops for `traces follow` or `sessions follow`
+
+## Next Steps
+
+Phase 7 (Ship Readiness).
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-07-ship-readiness.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-07-ship-readiness.md
new file mode 100644
index 0000000..e5647dd
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/phase-07-ship-readiness.md
@@ -0,0 +1,68 @@
+---
+phase: 7
+title: "Ship Readiness"
+status: pending
+priority: P2
+effort: "15m"
+dependencies: [6]
+---
+
+# Phase 7: Ship Readiness
+
+## Overview
+
+Single-step phase. Delegate the entire ship pipeline to the `/ck:ship` skill once phases 1–6 are green. This phase contributes the backend-evidence block and the PR body content; everything else (status check, secret scan, commit, push, PR creation, review wait, merge) is owned by `/ck:ship`.
+
+**Red Team F15 resolution:** the prior 9-step checklist duplicated `/ck:ship`. "Watch beta release publish" was open-ended async waiting unsuitable as a phase gate. Both removed. Manual `CHANGELOG.md` edits removed — `semantic-release` is commit-driven (see `.github/workflows/release.yaml`).
+
+## Single Step
+
+Invoke:
+
+```
+/ck:ship official
+```
+
+Provide this PR-body block when prompted (or paste into the gh-generated body):
+
+```markdown
+## Backend evidence
+
+- PR #37 (digitopvn/goclaw) — commit `56e227c4030e85163cd882b29ab472f8ce3e1a27` — surfaces `traces/follow` and `providers/{id}/reconnect`.
+- PR #44 (digitopvn/goclaw) — commit `43049d3b3fbb5f457477118252d1f21fdc0480de` — surfaces `chat/sessions/{key}/branch`, `chat/sessions/{key}/history/follow`, `channels/instances/{id}/writers/test`, `activity/aggregate`, `logs/runtime/aggregate`.
+- Beta tag: `v3.12.0-beta.20` is the earliest tag containing PR #44 (verified via `gh api repos/digitopvn/goclaw/compare/v3.12.0-beta.20...43049d3b` → `identical`). Latest beta as of 2026-05-27: `v3.12.0-beta.35`.
+
+## Out of scope (verbatim from issue #16)
+
+- POST /v1/traces/{id}/replay
+- generic GET /v1/logs/aggregate
+- WebSocket chat.history.delta
+- SSE chat history follow
+- long-running watch loops for traces follow or sessions follow
+```
+
+Conventional-commit subject for `/ck:ship` to use:
+
+```
+feat(cli): add P6 backend-unblocked CLI surfaces (issue #16)
+```
+
+## Todo List
+
+- [ ] `/ck:ship official` invoked with backend-evidence block.
+- [ ] PR opened against `dev` with backend-evidence + out-of-scope sections.
+- [ ] Phase status flipped to Complete once PR is open (not waiting for merge — `/ck:ship` owns merge cadence).
+
+## Success Criteria
+
+- One PR open against `dev` with the conventional `feat:` subject and the backend-evidence + out-of-scope blocks in the body.
+
+## Out of Scope
+
+- Promoting `dev` to `main` (separate ship cycle, like the one that produced PR #18 on 2026-05-27).
+- Manual `CHANGELOG.md` edits (semantic-release commit-driven).
+- Watching beta release publish (out of phase 7 scope — beta confirmation happens whenever it happens).
+
+## Next Steps
+
+Close issue #16 with merge reference after `/ck:ship` reports PR merged.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/plan.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/plan.md
new file mode 100644
index 0000000..78836bc
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/plan.md
@@ -0,0 +1,178 @@
+---
+title: "Domain Coverage P6 — Backend-Unblocked CLI Surfaces"
+description: "Implement the 7 CLI commands now unblocked by digitopvn/goclaw PR #37 and PR #44. TDD-driven, 4 grouped implementation phases."
+status: in_progress
+priority: P2
+branch: "feat/p6-backend-unblocked-cli"
+base: "dev"
+tags: [domain-coverage, p6, cli, tdd, backend-unblocked]
+blockedBy: []
+blocks: []
+parentBacklog: "../260503-1907-domain-coverage-p3-plus/plan.md"
+relatedIssue: "https://github.com/nextlevelbuilder/goclaw-cli/issues/16"
+created: "2026-05-27T07:12:00Z"
+createdBy: "ck:plan --tdd"
+source: skill
+---
+
+# Domain Coverage P6 — Backend-Unblocked CLI Surfaces
+
+## Overview
+
+Implement the 7 CLI command surfaces unblocked by two backend PRs in `digitopvn/goclaw`:
+
+- **PR #37** (merged, beta `v3.12.0-beta.16`+) → `GET /v1/traces/follow`, `POST /v1/providers/{id}/reconnect`
+- **PR #44** (merged, beta `v3.12.0-beta.20`+ verified) → `POST /v1/chat/sessions/{key}/branch`, `GET /v1/chat/sessions/{key}/history/follow`, `POST /v1/channels/instances/{id}/writers/test`, `GET /v1/activity/aggregate`, `GET /v1/logs/runtime/aggregate`
+
+Mirror issue [#16](https://github.com/nextlevelbuilder/goclaw-cli/issues/16) scope exactly. 4 implementation phases grouped by backend PR + functional cluster. TDD: failing tests before each implementation slice.
+
+## Resolved Pre-Plan Question
+
+Issue #16 asked: *"Which beta tag ultimately contains backend commit `43049d3b`?"*
+
+**Verified via `gh api compare`:** `v3.12.0-beta.20` is the earliest tag identical to `43049d3b`. Live smoke MUST target ≥ `v3.12.0-beta.20`. Latest beta as of 2026-05-27 is `v3.12.0-beta.35`.
+
+## Decisions
+
+- Scope: exactly the 7 surfaces in issue #16, no extras.
+- Grouping: 4 implementation phases, not 7 (shared client helpers + test fixtures).
+- TDD: failing command tests land before Cobra command code in every phase.
+- No watch loops. `traces follow` and `sessions follow` are one-shot polling requests; SSE/WS deferred.
+- No commands or stubs for explicitly out-of-scope APIs (see below).
+- Reuse `internal/client.HTTPClient`, `internal/output`, existing `printer.Print(unmarshalMap/List(...))` patterns.
+- Path-escape all path params via the same pattern already used in `cmd/api_keys_rotate.go`, `cmd/storage.go` (after P5 RT-02 fix).
+- Branch off `dev`, not `main`. One PR back to `dev`.
+
+## Phases
+
+| Phase | Name | Surfaces | Status |
+|-------|------|----------|--------|
+| 1 | [Scope Lock](./phase-01-scope-lock.md) | n/a | Complete |
+| 2 | [Traces Follow + Providers Reconnect](./phase-02-traces-follow-and-providers-reconnect.md) | 2 (PR #37) | Complete |
+| 3 | [Sessions Branch + Follow](./phase-03-sessions-branch-and-follow.md) | 2 (PR #44 chat) | Complete |
+| 4 | [Channels Writers Test](./phase-04-channels-writers-test.md) | 1 (PR #44 channels) | Complete |
+| 5 | [Activity + Logs Runtime Aggregate](./phase-05-activity-and-logs-aggregate.md) | 2 (PR #44 aggregation) | Complete |
+| 6 | [Tests and Docs Sweep](./phase-06-tests-and-docs.md) | n/a | Complete |
+| 7 | [Ship Readiness](./phase-07-ship-readiness.md) — collapsed to single `/ck:ship official` invocation | n/a | In Progress |
+
+## Command Surface Inventory
+
+```bash
+goclaw traces follow --session-key <key> [--since <RFC3339>] [--limit N] [--include-spans] [--status <status>] [--channel <channel>]
+goclaw traces follow --agent <uuid-or-key> [same flags]
+
+goclaw providers reconnect <provider-id>
+
+goclaw sessions branch <session-key> --up-to-index <n> [--new-session-key <key>] [--label <label>] [--metadata k=v]...
+goclaw sessions follow <session-key> [--cursor <n>] [--limit <n>]
+
+goclaw channels writers test <instance-id> --group-id <group-scope> --user-id <user-id>
+
+goclaw activity aggregate --group-by <action|actor_type|entity_type|actor_id> [filters]
+
+goclaw logs aggregate [--group-by <level|source>] [--level <level>] [--source <source>] [--from <RFC3339>]
+```
+
+All commands support `--output {table,json,yaml}` per existing TTY auto-detection.
+
+## Explicitly Out of Scope (verbatim from issue #16)
+
+Do not add commands, stubs, hidden flags, or docs for APIs that still do not exist:
+
+- `POST /v1/traces/{id}/replay`
+- generic `GET /v1/logs/aggregate`
+- WebSocket `chat.history.delta`
+- SSE chat history follow
+- long-running watch loops for `traces follow` or `sessions follow`
+
+## Existing CLI State (verified 2026-05-27 on `dev` tip `143624d`)
+
+| File | Existing surfaces | New work |
+|------|------------------|----------|
+| `cmd/traces.go` | `list`, `get`, `export` | add `follow` subcommand |
+| `cmd/providers.go` + `providers_verify.go` (Use: `verify-embedding`) + `providers_claude_cli.go` + `providers_codex_pool.go` | CRUD + verify-embedding | add `reconnect` subcommand |
+| `cmd/sessions.go` + `chat_sessions.go` | `list`, `preview`, `delete`, `reset`, `label`, `compact` (all → `/v1/sessions/...`) | add `branch`, `follow` (→ `/v1/chat/sessions/...`) |
+| `cmd/channels_writers.go` | `list`, `groups`, `add`, `remove` | add `test` |
+| `cmd/admin.go:133` already declares `var activityCmd` (`Use: "activity"`, audit-log lister) | `goclaw activity` (lists audit log) | attach new `aggregate` as **subcommand** of existing `activityCmd` — new file `cmd/activity_aggregate.go` |
+| `cmd/logs.go` | `tail` (WS streaming) | add `aggregate` (HTTP, distinct subcommand) |
+
+**Collisions found:** `activityCmd` already declared at `cmd/admin.go:133` — new aggregate hangs off it as a subcommand (not a new top-level group). No other collisions. `providers verify` does NOT exist; actual command is `providers verify-embedding` (`cmd/providers_verify.go:11`).
+
+## Dependencies
+
+- Parent backlog: `../260503-1907-domain-coverage-p3-plus/plan.md` (P6 listed as server-blocked; this plan unblocks)
+- Codex prompt evidence: `plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md` (lives on `feat/claude-skill-v0.1` worktree)
+- Backend contracts:
+  - `digitopvn/goclaw` `internal/http/traces.go` (PR #37)
+  - `digitopvn/goclaw` `internal/http/providers.go` (PR #37)
+  - `digitopvn/goclaw` `internal/http/sessions.go` (PR #44)
+  - `digitopvn/goclaw` `internal/http/channel_instances.go` (PR #44)
+  - `digitopvn/goclaw` `internal/http/activity.go` (PR #44)
+  - `digitopvn/goclaw` `internal/http/logs.go` (PR #44)
+- OpenAPI: `digitopvn/goclaw` `internal/http/openapi_spec.json`
+
+## Validation Gates (per phase + final)
+
+- `/usr/local/go/bin/go test ./...` (TDD: red → green per phase)
+- `/usr/local/go/bin/go vet ./...`
+- `/usr/local/go/bin/go build ./...`
+- Live smoke against `v3.12.0-beta.20`+ backend (optional, after merge)
+- Red-team diff before PR (path/body/output regressions, accidental scope creep into out-of-scope list)
+
+## Success Criteria
+
+- [ ] All 7 surfaces implemented per contracts in respective phases.
+- [ ] Focused tests cover path, query, body, validation-before-HTTP, output format preservation.
+- [ ] No CLI command exists for `traces/{id}/replay` or generic `/v1/logs/aggregate`.
+- [ ] No watch loops added.
+- [ ] `go test ./... && go vet ./... && go build ./...` all green.
+- [ ] README/help text reflects new commands (deferred to phase 6 if needed).
+- [ ] PR body lists backend PR/tag evidence and notes ≥ `v3.12.0-beta.20` requirement.
+
+## Risk Assessment
+
+| Risk | Likelihood | Mitigation |
+|------|-----------|------------|
+| Backend response shape drift between dev and beta tag | Low | Phase 1 re-verifies contracts against `digitopvn/goclaw` `dev` HEAD before tests written |
+| Path-escape regression (RT-02 from P5) | Medium | TDD test for escaped session keys with `:` and `/` in phase 3 |
+| Scope creep into watch loops or replay | Medium | Explicit out-of-scope checklist in phase 6 red-team diff |
+| Untracked files in `feat/claude-skill-v0.1` accidentally staged | Low | This plan uses a clean worktree (`.claude/worktrees/elated-galileo-2c7cfd`), separate branch |
+| Naming collision with existing subcommands | Resolved | Red-team found `activityCmd` collision at `cmd/admin.go:133`; phase 5 now attaches `aggregate` as subcommand of existing parent |
+| `buildBody` int-zero drop bug (`cmd/helpers.go:86-89`) corrupting `--up-to-index 0` / `--cursor 0` | Resolved | Phase 3 builds request body / query map directly for these required numeric fields; tests assert zero is preserved |
+| `/v1/chat/sessions/...` vs `/v1/sessions/...` prefix split confuses operators | Medium | Phase 3 documents the prefix split in Long-help for `branch`/`follow`; data domain matches existing `chat_sessions.go`'s `/v1/chat/sessions/...` callers |
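+
+The `buildBody` zero-drop row can be illustrated with a minimal sketch. `dropZero` below is a hypothetical stand-in that only assumes the behavior described in the table (integer zero and empty strings are skipped); it is not the code at `cmd/helpers.go:86-89`. Building the map directly is the workaround Phase 3 is described as using.
+
+```go
+package main
+
+import "fmt"
+
+// dropZero is a hypothetical stand-in for a body builder that skips
+// "empty" values, including integer zero. It is NOT the repository's buildBody.
+func dropZero(in map[string]any) map[string]any {
+	out := make(map[string]any, len(in))
+	for k, v := range in {
+		switch t := v.(type) {
+		case string:
+			if t == "" {
+				continue // empty strings are skipped
+			}
+		case int:
+			if t == 0 {
+				continue // a legitimate zero index is skipped too
+			}
+		}
+		out[k] = v
+	}
+	return out
+}
+
+func main() {
+	// Going through the lossy builder loses the required field.
+	viaBuilder := dropZero(map[string]any{"up_to_index": 0, "label": ""})
+	// Building the map directly keeps the zero on the wire.
+	direct := map[string]any{"up_to_index": 0}
+	fmt.Println(len(viaBuilder), len(direct))
+}
+```
+
+A regression test for this case would assert that the `up_to_index` key is present in the decoded request body, not merely that the request succeeded.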
+
+## Handoff
+
+Recommended next: invoke `/ck:cook` to execute phase 1 first (scope lock + contract re-verification), then proceed phase-by-phase with TDD.
+
+## Red Team Review
+
+### Session — 2026-05-27
+**Reviewers:** 4 spawned (Security Adversary, Failure Mode Analyst, Assumption Destroyer, Scope & Complexity Critic). 3 returned full findings; Assumption Destroyer hit Anthropic session limit after producing partial output — covered by overlap with other lenses.
+**Findings:** 30 raw → deduplicated to 15 unique. User chose **Apply Critical + High (7 findings)**. 8 Mediums deferred (see notes).
+**Severity breakdown applied:** 2 Critical, 5 High.
+
+| # | Finding | Severity | Evidence | Applied To |
+|---|---------|----------|----------|------------|
+| 1 | `activityCmd` already exists at `cmd/admin.go:133` (audit-log lister) → new aggregate must be subcommand, not new top-level | Critical | `cmd/admin.go:133,172`; `cmd/cmd_test.go:18-57` | plan.md table, phase 5 |
+| 2 | `buildBody` at `cmd/helpers.go:86-89` drops `int v == 0` → `--up-to-index 0` and `--cursor 0` silently disappear | Critical | `cmd/helpers.go:86-89` | phase 3 |
+| 3 | Backend path `/v1/chat/sessions/{key}/...` vs existing `goclaw sessions <verb>` → `/v1/sessions/...` — document the Cobra-parent vs backend-tree split | High | `cmd/sessions.go:67,88,109,128`; `cmd/chat_sessions.go:5` | phase 3 |
+| 4 | `cmd/providers_crud.go` doesn't exist (stale plan claim) | High | `find cmd/ -name "providers_*.go"` | plan.md table |
+| 5 | Plan recommends `goclaw providers verify`; actual command is `verify-embedding` | High | `cmd/providers_verify.go:11` | phase 2 |
+| 6 | `logs aggregate` `last_seen` (epoch millis) renders as `1.76e+12` because `unmarshalMap` decodes JSON numbers as float64 + `str()` uses `%v` | High | `cmd/helpers.go:49-61` | phase 5 |
+| 7 | "One polling request" claim has no test that actually counts requests; risk of copy-pasting `client.FollowStream` | High | `cmd/logs.go:43-47`; `internal/client/follow.go` | phase 2 + phase 3 |
+
+**Deferred (Medium, user-skipped this round):**
+- F8 logs aggregate redaction (TTY banner + secret-shape strip)
+- F9 ANSI/OSC escape sanitization in output renderer
+- F10 `--metadata k=v` permissive parser (empty keys, duplicates, multi-`=` for base64)
+- F11 phase 1 collision grep regex too narrow (partially fixed via F1 application)
+- F12 `cmd/sessions.go` 171 LOC + 2 new commands → split upfront (kept as conditional per existing plan language)
+- F13 `go test -race -count=1` in phase 6 to match CI
+- F14 `cmd/cmd_test.go` top-level list update — **resolved by F1 choice** (subcommand under existing activityCmd → no new top-level)
+
+### Whole-Plan Consistency Sweep
+- Files reread: plan.md, phase-01 through phase-07.
+- Decision deltas applied: activityCmd subcommand attachment; buildBody-zero workaround; providers-verify rename; epoch-millis renderer.
+- Reconciled stale references: 2 (existing CLI state table; risk row "Naming collision" updated above).
+- Unresolved contradictions: 0.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/code-review-260527-p6.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/code-review-260527-p6.md
new file mode 100644
index 0000000..ecc4ca6
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/code-review-260527-p6.md
@@ -0,0 +1,76 @@
+# Code Review — P6 Backend-Unblocked CLI Surfaces
+
+**Date:** 2026-05-27
+**Reviewer:** code-reviewer subagent (findings returned inline; persisted by parent session)
+**Status:** DONE
+
+## Scope
+- 14 new files (7 `cmd/*.go` + 7 `cmd/*_test.go`)
+- 1 modified file (`cmd/channels_writers.go` appended)
+- 2 doc updates (`README.md`, `docs/codebase-summary.md`)
+- All command files under 200 LOC; tests up to 224.
+
+## Build / Vet / Test
+- `go vet ./...` — clean
+- `go build ./...` — clean
+- `go test -count=1 ./...` — all green
+
+## Overall Assessment
+Clean, focused, well-tested. Implementation matches every acceptance criterion. Every new command is a single HTTP one-shot via `newHTTP()` — no `FollowStream`/`newWS` imports anywhere new. Zero-preservation, path-escaping, format auto-detection, central error handling, and table-vs-JSON branching all behave correctly. The `formatLastSeen` cross-type helper is the right abstraction and is exercised by a dedicated scientific-notation regression test.
+
+## Critical / High / Medium
+None.
+
+## Low
+
+**L1. `providers reconnect` body-empty assertion is permissive.**
+- File: `cmd/providers_reconnect_test.go:44-50`
+- The wrapping `if body != "" && body != "null" && body != "{}"` invites confusion. Stricter form: `assert body == "" || body == "null"`, then separately assert no `verify` key. Not a defect today (`c.Post(path, nil)` marshals nothing — verified `internal/client/http.go:127`). Readability nit only.
+
+**L2. `--limit=0` semantics differ across commands.**
+- `traces follow`, `activity aggregate`: `--limit=0` → omitted from query (server default applies).
+- `sessions follow`: `--limit <= 0` → rejected (positive limit required).
+- Asymmetry is correct (a cursor's "0" means "from start", a page-size's "0" means "no bound" which the server should pick). Documented in flag help. Just worth noting for future contributors.
+
+**L3. Optional-filter flag iteration is non-deterministic.**
+- `activity_aggregate.go:107-117`, `logs_aggregate.go:48-56` iterate a `map[string]string` to set query params. Order is non-deterministic, but tests parse via `url.ParseQuery` so they don't depend on order. Cosmetic only.
+
+**L4. `formatLastSeen` has defensive dead branches.**
+- Handles `nil`, `string`, `float64`, `int64`, `int`. `json.Unmarshal` into `any` only ever produces `float64` for numbers; the `int64`/`int` branches are future-proofing. Harmless.
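+
+For reference, a type switch with the branches listed in L4 might look like the sketch below. This is a hypothetical reconstruction based only on the description in this review (string passed through, numbers treated as epoch milliseconds); the actual `formatLastSeen` in the repository may differ.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// formatLastSeen is a hypothetical reconstruction, not the repository's code.
+// It renders last_seen as RFC3339 whether the value arrived as a string
+// or as epoch milliseconds.
+func formatLastSeen(v any) string {
+	switch t := v.(type) {
+	case nil:
+		return ""
+	case string:
+		return t // assumed to already be RFC3339
+	case float64:
+		// encoding/json decodes JSON numbers into float64 when the target is any.
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	case int64:
+		return time.UnixMilli(t).UTC().Format(time.RFC3339)
+	case int:
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	default:
+		return fmt.Sprintf("%v", t)
+	}
+}
+
+func main() {
+	fmt.Println(formatLastSeen(float64(1760000000000))) // an RFC3339 timestamp, no exponent notation
+	fmt.Println(formatLastSeen("2026-05-27T07:12:00Z"))
+}
+```
+
+Converting the `float64` to `int64` before formatting is what avoids the `1.76e+12` rendering that `%v` produces for large floats.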
+
+## Verification of Acceptance Criteria
+
+| # | Criterion | Evidence |
+|---|---|---|
+| 1 | Route + method correct for all 7 commands | Asserted in `*_test.go` path/method checks |
+| 2 | No watch loops / WS streams | `grep "FollowStream\|newWS"` in new files → empty; atomic-counter tests assert `calls == 1` |
+| 3 | `up_to_index=0`, `cursor=0` preserved on wire | Direct body/query construction in `sessions_branch.go:46` and `sessions_follow.go:40-41`; regression tests `TestSessionsBranch_UpToIndexZero`, `TestSessionsFollow_CursorZero` |
+| 4 | All path params `url.PathEscape`'d | Every new command + dedicated `*_PathEscape` tests |
+| 5 | `formatLastSeen` covers string + numeric | `TestLogsAggregate_LastSeenRendersRFC3339` asserts no `e+12` + RFC3339 regex |
+| 6 | `channels writers test` body has exactly 2 keys | Literal `map[string]any{"group_id": ..., "user_id": ...}` in `channels_writers.go:103-106`; test asserts `len(body) == 2` |
+| 7 | `providers reconnect` body empty | `c.Post(path, nil)`; test guards against `verify` substring |
+| 8 | `activityAggregateCmd` is subcommand of existing parent | `activityCmd.AddCommand(activityAggregateCmd)` in `init()`; pre-existing parent at `cmd/admin.go:133`; no new top-level |
+| 9 | No plan-artifact refs in code/tests | `grep -E "phase\|F[1-9]\|red[ -]team\|audit A\|finding\|§"` in new files → empty |
+| 10 | Validation before HTTP | All flag-validation errors return BEFORE `newHTTP()`; negative tests in every file |
+
+## Test Hygiene
+- Every test file declares a `reset*Flags(t)` helper with `t.Cleanup`.
+- Int flags reset to declared defaults (`-1`, `0`, `50`) — no cross-test pollution.
+- `StringArray` reset via `pflag.SliceValue.Replace(nil)` (sessions_branch_test.go:21-26).
+- Central error handler honored: no `os.Exit`/direct error printing in new commands.
+
+## Recommended Actions
+None blocking. Optional polish (all Low):
+1. Tighten body-empty assertion in `TestProvidersReconnect_PathAndMethod`.
+2. Sort optional-filter flag iteration if URL-snapshot testing is ever added.
+
+## Metrics
+- New files: 14
+- Modified files: 3 (`channels_writers.go`, `README.md`, `docs/codebase-summary.md`)
+- Compilation: clean
+- Tests: 100% pass under `-count=1`
+- Lint (`go vet`): clean
+- Plan-artifact references in production / test code: 0
+
+## Unresolved Questions
+None.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-assumption-destroyer-plan-review-report.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-assumption-destroyer-plan-review-report.md
new file mode 100644
index 0000000..41e3038
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-assumption-destroyer-plan-review-report.md
@@ -0,0 +1,181 @@
+# Red-Team Plan Review — Assumption Destroyer Lens
+
+Plan: `plans/260527-1412-domain-coverage-p6-backend-unblocked/`
+Reviewer role: hostile (Assumption Destroyer); Scope Auditor verification
+Date: 2026-05-27
+Codebase tip verified: `dad4b89` (worktree branch `claude/elated-galileo-2c7cfd`)
+
+---
+
+## Finding 1: `activityCmd` already exists at root — Phase 5 will fail to compile
+
+- **Severity:** Critical
+- **Location:** Phase 5 (`phase-05-activity-and-logs-aggregate.md`), section "Files" → "Decide: place under a new top-level `activity` Cobra group... → create new `activityCmd` parent." Also Phase 1 "Re-verify no naming collisions" Todo.
+- **Flaw:** The plan claims no command-name collisions exist and proposes declaring a new `activityCmd`. A top-level `activityCmd` is already declared and registered in `cmd/admin.go`. Declaring a second `var activityCmd = ...` in `cmd/activity_aggregate.go` will fail to compile (duplicate identifier in the same `cmd` package). Even if renamed, the user-facing `goclaw activity` route is taken: the existing command runs `GET /v1/activity` (audit log list) — `goclaw activity aggregate` cannot be added without restructuring the existing command into a parent group with both `aggregate` and a list/run subcommand, which IS a breaking CLI change not documented in the plan.
+- **Failure scenario:** `go build ./...` fails with `cmd/activity_aggregate.go:NN: activityCmd redeclared in this block` on the first compile attempt of Phase 5. Even after rename, `rootCmd.AddCommand(activityCmd, ...)` is called twice in two `init()` blocks → Cobra panics at startup with duplicate command name.
+- **Evidence:**
+  - `cmd/admin.go:133` — `var activityCmd = &cobra.Command{ Use: "activity", Short: "View audit log", RunE: ... }`
+  - `cmd/admin.go:172` — `rootCmd.AddCommand(approvalsCmd, delegationsCmd, adminCredentialsCmd, activityCmd, ...)`
+  - Plan plan.md:96 — `` _(no `cmd/admin_activity.go`)_ | — | new file `cmd/activity_aggregate.go` `` ← claim "no existing activity command group exists" is false.
+- **Suggested fix:** Convert the existing `activityCmd` into a parent group; move its current `RunE` into a new `activityListCmd` (and add a deprecation alias if needed); then attach `activityAggregateCmd` as a sibling. Document the CLI change in CHANGELOG. Re-run the Phase 1 collision sweep: the grep pattern in Phase 1 step 5 (`grep -in 'Use:.*"follow\|reconnect\|branch\|aggregate\|test"' cmd/*.go`) misses bare top-level groups like `activity` because it filters on the *new* subcommand names, not the new parent name.
+
+---
+
+## Finding 2: Phase 1 collision-sweep grep is too narrow — would have missed `activityCmd`
+
+- **Severity:** High
+- **Location:** Phase 1, Implementation Step 5: `grep -in 'Use:.*"follow\|reconnect\|branch\|aggregate\|test"' cmd/*.go`.
+- **Flaw:** The sweep only checks the *new subcommand names*. It does not check for collisions of the new parent group name `activity`. It would not detect Finding 1. Worse, even the new subcommand check is unreliable — many existing commands declare `Use: "test"` (e.g. `cmd/hooks_test_runner.go:14`, `cmd/heartbeat.go:109`) and `Use: "reconnect <id>"` (e.g. `cmd/mcp_servers.go:15`), which the regex matches — but the phase 1 step lists no acceptance criterion for *expected* matches versus *true conflicts*, so a reviewer cannot tell at a glance whether the output is a problem.
+- **Failure scenario:** Phase 1 closes with "no collisions found", phase 5 lands, compilation fails.
+- **Evidence:**
+  - `cmd/admin.go:133` — `Use: "activity"` (not matched by current grep)
+  - `cmd/hooks_test_runner.go:14`, `cmd/heartbeat.go:109` — both `Use: "test"` (matched by grep but each on a different parent; no rubric for triage)
+  - `cmd/mcp_servers.go:15` — `Use: "reconnect <id>"` under `mcpServersCmd` (different parent — safe, but plan does not distinguish)
+- **Suggested fix:** Replace step 5 with explicit (a) check for new top-level parents (`grep -n 'Use: "activity"' cmd/*.go`) and (b) check that each proposed new subcommand is unique *under its specific parent*, listing the parent (`tracesCmd.AddCommand`, `providersCmd.AddCommand`, etc.). Document expected vs unexpected matches in `reports/scope-lock-260527-p6.md`.
+
+---
+
+## Finding 3: Endpoint family mismatch — plan attaches `branch` / `follow` to wrong `sessions` parent
+
+- **Severity:** Critical
+- **Location:** Phase 3, section "Files" → "Modify: `cmd/sessions.go` — append `sessionsBranchCmd` + `sessionsFollowCmd` and register both."
+- **Flaw:** Two `sessions` Cobra subcommands exist:
+  - `sessionsCmd` at root (`cmd/sessions.go:12`) — calls `/v1/sessions/...` (legacy alias)
+  - `chatSessionsCmd` under `chatCmd` (`cmd/chat_sessions.go:5`, `Use: "sessions"`) — used for chat-session conveniences
+  The new endpoints `/v1/chat/sessions/{key}/branch` and `/v1/chat/sessions/{key}/history/follow` belong to the **chat-session** family, not the legacy `/v1/sessions/*` family. Attaching them to `sessionsCmd` mixes endpoint families on one Cobra parent, surprising users (`goclaw sessions list` hits `/v1/sessions`, but `goclaw sessions branch` hits `/v1/chat/sessions`). Worse, the plan never confirms whether `/v1/chat/sessions/{key}/...` and `/v1/sessions/{key}/...` route to the same backing store, or which one is canonical post-PR #44.
+- **Failure scenario:** A user inspects audit logs after running `goclaw sessions branch X --up-to-index 5`, sees a write to a different table than the one `goclaw sessions list` reads from, and files a "branch didn't show up in list" bug. CLI looks inconsistent.
+- **Evidence:**
+  - `cmd/sessions.go:35,67,88,109,128` — all paths are `/v1/sessions/...`
+  - `cmd/chat_sessions.go:5` — `var chatSessionsCmd = &cobra.Command{Use: "sessions", ...}` registered under `chatCmd` at line 30
+  - Plan plan.md:25 — endpoints are `POST /v1/chat/sessions/{key}/branch`, `GET /v1/chat/sessions/{key}/history/follow`
+  - Plan plan.md:67 — surface inventory: `goclaw sessions branch <session-key>` (ambiguous — which `sessions`?)
+- **Suggested fix:** Either (a) attach the new subcommands to `chatSessionsCmd` so the user invokes `goclaw chat sessions branch ...` (matches endpoint family); or (b) explicitly document in Phase 1 / Phase 3 why mixing endpoint paths under `sessionsCmd` is acceptable, with a comment in the source file noting the dual-endpoint pattern. Update plan.md command-surface inventory to use the chosen invocation.
+
+---
+
+## Finding 4: "Path-escape helper" doesn't exist — plan describes a fiction
+
+- **Severity:** Medium
+- **Location:** plan.md:43 "Decisions" — "Path-escape all path params via the same pattern already used in `cmd/api_keys_rotate.go`, `cmd/storage.go` (after P5 RT-02 fix)." Phase 2 surface 2.2 "Path-escape `<provider-id>` via existing helper."
+- **Flaw:** There is no helper. Both files call `url.PathEscape` inline from `net/url`. Calling it a "helper" implies a shared symbol; future maintainers reading the plan will grep for a helper that doesn't exist, then either invent one (scope creep) or copy-paste inline.
+- **Failure scenario:** Implementing engineer adds `import "github.com/nextlevelbuilder/goclaw-cli/internal/x"` looking for the helper, finds nothing, asks the lead, wastes a cycle. Or worse, writes their own `pathEscape()` wrapper, fragmenting the convention.
+- **Evidence:**
+  - `cmd/api_keys_rotate.go:44` — `c.Post("/v1/api-keys/"+url.PathEscape(args[0])+"/revoke", nil)` (inline)
+  - `cmd/storage.go:55` — `c.GetRaw("/v1/storage/files/" + url.PathEscape(args[0]))` (inline)
+  - `cmd/providers.go:44` — `c.Get("/v1/providers/" + url.PathEscape(args[0]))` (inline)
+  - No `func pathEscape` or similar helper in `cmd/helpers.go`, `cmd/io_helpers.go`, or `internal/client/`
+- **Suggested fix:** Reword the plan to "call `url.PathEscape` inline on every positional ID, matching the inline pattern in `cmd/api_keys_rotate.go:44` and `cmd/storage.go:55`." Drop the word "helper".
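+
+A minimal sketch of the inline pattern the suggested fix refers to, using only `net/url` from the standard library. The provider ID is a made-up value for illustration; the route is the `providers/{id}/reconnect` surface named in the plan.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+)
+
+func main() {
+	// Hypothetical positional argument containing characters that would
+	// otherwise change the shape of the request path.
+	id := "team/a b?x"
+
+	// Escape inline at the call site; no shared helper is involved.
+	path := "/v1/providers/" + url.PathEscape(id) + "/reconnect"
+
+	fmt.Println(path) // /v1/providers/team%2Fa%20b%3Fx/reconnect
+}
+```
+
+Note that `url.PathEscape` leaves `:` unescaped, so a test for keys containing `:` and `/` should expect only the `/` to be percent-encoded.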
+
+---
+
+## Finding 5: Phase 6 CHANGELOG instruction will collide with existing `[Unreleased]` section and semantic-release autogen
+
+- **Severity:** High
+- **Location:** Phase 6, Implementation Step 2: "`CHANGELOG.md` — add a `## Unreleased` entry (or follow existing semantic-release commit convention; do not manually edit if release notes are commit-driven)."
+- **Flaw:** The instruction is internally contradictory — "add an entry OR don't manually edit". The CHANGELOG **already** has `## [Unreleased] — Domain Coverage Expansion (P0–P5)` populated through P5; the repo's release workflow runs `go-semantic-release ... --prepend-changelog --changelog CHANGELOG.md`, which prepends a new versioned section on each release based on conventional commits. Manually adding another `## Unreleased` will (a) produce two `[Unreleased]` headers, (b) get clobbered or duplicated by the next release run, and (c) cause `gh release upload CHANGELOG.md` to ship a malformed file.
+- **Failure scenario:** On merge to `dev`, the Release workflow at `.github/workflows/release.yaml:73` runs `semantic-release --prepend-changelog`. New entry is inserted *above* the existing `[Unreleased]` block. The released CHANGELOG.md uploaded by step at line 89 has two `[Unreleased]` headers and confusing duplicate content.
+- **Evidence:**
+  - `CHANGELOG.md:8` — `## [Unreleased] — Domain Coverage Expansion (P0–P5)` (existing, populated through P5)
+  - `.github/workflows/release.yaml:64-65` — `--changelog CHANGELOG.md --prepend-changelog`
+  - `.github/workflows/release.yaml:88-89` — `test -s CHANGELOG.md; gh release upload "v${RELEASE_VERSION}" CHANGELOG.md --clobber`
+- **Suggested fix:** Drop the manual CHANGELOG edit. Rely on conventional-commit-driven release notes (one `feat(cli):` commit, optionally split per surface for granularity). If a manual entry is desired, *replace* (not append to) the existing `[Unreleased]` block and explicitly note in the plan that semantic-release will rename it to the next beta version on `dev` merge.
+
+---
+
+## Finding 6: Phase 4 file naming `channels_writers_test_test.go` is unnecessarily fragile
+
+- **Severity:** Medium
+- **Location:** Phase 4, "Files" — "New: `cmd/channels_writers_test_test.go` (file naming kept descriptive; `_test_test.go` is intentional — the command name is `test` and Go test file suffix is `_test.go`)."
+- **Flaw:** Go does accept `_test_test.go` (the suffix that matters is `_test.go`), but the name invites confusion: if anyone later adds tests for the *whole* `channels_writers.go` file, the natural name `channels_writers_test.go` will sit next to `channels_writers_test_test.go` in the same package, and a future developer attempting to consolidate tests will be confused about which file holds which. Worse, the plan's "alternative if Go tooling balks" hedge admits uncertainty without committing; the plan ships a hypothesis instead of a decision.
+- **Failure scenario:** Implementing engineer creates `channels_writers_test_test.go`, six months later another engineer adds `channels_writers_test.go` for general tests; CI passes, but human review/grep becomes confusing. Also: any IDE that strips `_test` suffix to map test→source ("go to source file") will map both to `channels_writers.go` and fail to distinguish.
+- **Evidence:**
+  - Repo convention: tests live next to source with `_test.go` suffix only (e.g. `cmd/agents_export.go` ↔ `cmd/agents_export_test.go`, `cmd/edition.go` ↔ `cmd/edition_test.go`).
+  - No existing `*_test_test.go` files anywhere in the repo (`ls cmd/*_test_test.go` returns empty).
+- **Suggested fix:** Commit to `cmd/channels_writers_probe_test.go` (matches the semantic "writers probe / writer-permission test") and drop the alternative hedge from the plan.
+
+---
+
+## Finding 7: Phase 3 / Phase 5 file-size handling is post-hoc rather than planned
+
+- **Severity:** Medium
+- **Location:** Phase 3 Files: "If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go`". Phase 5 Files: "If file grows past 200 lines, extract to `cmd/logs_aggregate.go`." Phase 4 Risks: "Adding `test` might push file over 200 lines — split per repo rule if so."
+- **Flaw:** `cmd/sessions.go` is **already 171 lines**, and Phase 3 adds at least two new ~30-50 line subcommands plus flag wiring → ~250-280 lines guaranteed. `cmd/traces.go` is **already 286 lines** (above 200), but Phase 2 still says "Modify: `cmd/traces.go` — append `tracesFollowCmd`" without flagging that the file is already over the modularization threshold and should be split *first*. The plan defers the inevitable modularization decision to runtime ("if it grows past 200..."), which leads to mid-phase pauses and inconsistent layout across files.
+- **Failure scenario:** Engineer appends to `cmd/sessions.go`, hits 240+ lines, has to stop and split mid-phase, then redo imports and test setup. Or worse, ignores the rule and ships a 280-line file.
+- **Evidence:**
+  - `wc -l cmd/sessions.go` → 171
+  - `wc -l cmd/traces.go` → 286 (already past threshold before P6 starts)
+  - `wc -l cmd/channels_writers.go` → 89 (safe for now)
+  - Repo CLAUDE.md (`./CLAUDE.md`) — "If a code file exceeds 200 lines of code, consider modularizing it"
+- **Suggested fix:** Decide modularization up-front:
+  - Phase 2: create `cmd/traces_follow.go` (don't append to `traces.go`).
+  - Phase 3: create `cmd/sessions_branch.go` and `cmd/sessions_follow.go` from the start.
+  - Phase 5: `cmd/logs_aggregate.go` from the start (not "if file grows").
+  This eliminates a class of mid-phase pauses and matches the rest of the repo (e.g. `providers_verify.go`, `agents_export.go`).
+
+---
+
+## Finding 8: "next_since" / "spans_by_trace_id" / `last_seen` types unverified — Phase 2/5 tests hard-coded against unsourced shapes
+
+- **Severity:** High
+- **Location:** Phase 2 surface 2.1 response shape `{"traces":[],"spans_by_trace_id":{},"server_time":"...","next_since":"...","limit":50}`. Phase 5 surface 5.2: "`last_seen` in this response is an **epoch millis number**, not a string."
+- **Flaw:** Phase 1 promises to "re-verify backend contracts" and the implementation phases promise specific JSON keys and value types. But the plan does NOT cite the actual backend file:line. There is no proof in the plan that `next_since`, `spans_by_trace_id`, or numeric-vs-string `last_seen` are the canonical names. If the backend uses `nextSince` (camelCase) or `lastSeenAt`, every Phase 2/5 test will encode the wrong key, and the implementation will pass the tests but break against the real server.
+- **Failure scenario:** Tests assert `next_since` field; implementation extracts `next_since`; server returns `nextSince`; tests pass (using mocks that the engineer wrote), but `goclaw traces follow` against `v3.12.0-beta.20` returns empty `next_since` (because the real key is `nextSince`), breaking pagination silently.
+- **Evidence:**
+  - `plan.md:106-111` only lists backend handler *files* (`internal/http/traces.go` etc.), with no line numbers and no extracted shape evidence.
+  - Phase 1 step 1 says "read each backend handler" but Phase 2/3/4/5 hard-code shapes before Phase 1 runs.
+  - The codex-prompt reference at plan.md:104 lives "on `feat/claude-skill-v0.1` worktree" — not accessible from this worktree without manually switching, so the supporting evidence is not collocated with the plan.
+- **Suggested fix:** Have Phase 1 emit `reports/scope-lock-260527-p6.md` BEFORE the plan finalizes JSON keys. Reference exact backend file:line in every "response shape" block. Update Phase 2/3/4/5 to read from Phase 1's report rather than hard-coding shapes.
+
+---
+
+## Finding 9: Phase 7 PR-target / release-trigger assumption mis-specifies the beta cycle
+
+- **Severity:** Medium
+- **Location:** Phase 7, Implementation Step 5: "Open PR to `dev` via `gh pr create --base dev --head feat/p6-backend-unblocked-cli`." And step 8: "watch CI + Release until beta release publishes."
+- **Flaw:** The release workflow triggers on `push: branches: [main, dev]`, so merging the PR to `dev` triggers semantic-release immediately. Phase 7 treats the release as automatic but doesn't mention: (a) the release will publish AS SOON AS the merge lands (no preview window), (b) any stray conventional-commit headers in the PR (e.g. in the squash-merge commit body) will be parsed by `go-semantic-release` and affect the version bump, (c) `--prepend-changelog` will edit CHANGELOG.md on the release commit, potentially clobbering Phase 6's manual edit (see Finding 5).
+- **Failure scenario:** Engineer merges with a squash-commit titled `feat(cli): add P6 backend-unblocked CLI surfaces` (good) but inadvertently includes earlier WIP commits `fix(tests): foo` and `chore: cleanup` in the merge body. `go-semantic-release` parses ALL commits in the range and may produce a noisy CHANGELOG. Or: Phase 6 manually adds `## Unreleased`, the release run prepends another entry, the upload step ships a malformed CHANGELOG.md.
+- **Evidence:**
+  - `.github/workflows/release.yaml:3-5` — `on: push: branches: [main, dev]`
+  - `.github/workflows/release.yaml:60-73` — semantic-release runs unconditionally on every dev push.
+  - Phase 7 lacks any mention of squash-merge configuration, PR commit hygiene, or CHANGELOG interaction.
+- **Suggested fix:** Add a Phase 7 step: "Squash-merge with a single `feat(cli): ...` commit title; verify squash body contains no conflicting conventional headers (`feat:`, `fix:`, `BREAKING CHANGE:`) that semantic-release would parse." Drop the Phase 6 manual CHANGELOG edit per Finding 5.
+
+---
+
+## Finding 10: Grouping rationale ("shared helpers + test fixtures") is unsubstantiated
+
+- **Severity:** Medium
+- **Location:** plan.md:38 "Grouping: 4 implementation phases, not 7 (shared client helpers + test fixtures)."
+- **Flaw:** The plan promises shared helpers/fixtures justify the 4-phase grouping, but no phase actually declares a shared helper or fixture file. Phase 2 has its own tests, Phase 3 has its own tests, Phase 4 has its own tests, Phase 5 has its own tests. Each phase creates separate `*_test.go` files with no factored-out fixture file. The grouping is therefore by *PR origin* (#37 vs #44) and *resource cluster*, not by shared code.
+- **Failure scenario:** Reviewer points out "the rationale doesn't hold — there's no shared code" and asks for re-justification mid-implementation, or worse, the grouping forces unrelated phases to land together when one slips (e.g. Phase 5's `activityCmd` collision blocks the Phase 2 traces work for no functional reason).
+- **Evidence:**
+  - Phase 2 tests file: `cmd/traces_follow_test.go`, `cmd/providers_reconnect_test.go` — no shared fixture.
+  - Phase 3 tests file: `cmd/sessions_branch_test.go`, `cmd/sessions_follow_test.go` — no shared fixture.
+  - Phase 4 tests file: `cmd/channels_writers_test_test.go` — no shared fixture.
+  - Phase 5 tests file: `cmd/activity_aggregate_test.go`, `cmd/logs_aggregate_test.go` — no shared fixture.
+  - No phase proposes a `cmd/p6_fixtures_test.go` or similar.
+- **Suggested fix:** Either (a) reword to "Grouping: 4 phases by backend PR + functional cluster, to minimize PR-evidence drift" (honest rationale) and drop the false "shared helpers" claim; or (b) actually plan a shared fixture file (e.g. `cmd/p6_test_helpers_test.go` with `newP6TestServer(t, handler)`) and reference it from each phase's tests.
+
+---
+
+## Summary
+
+| # | Severity | Theme |
+|---|----------|-------|
+| 1 | Critical | `activityCmd` collision — Phase 5 won't compile |
+| 2 | High | Phase 1 collision grep too narrow |
+| 3 | Critical | `sessions branch/follow` attached to wrong Cobra parent (endpoint family mismatch) |
+| 4 | Medium | "Path-escape helper" is fiction |
+| 5 | High | CHANGELOG manual edit collides with `[Unreleased]` block + go-semantic-release |
+| 6 | Medium | `channels_writers_test_test.go` naming is fragile |
+| 7 | Medium | File-size modularization deferred to runtime; `traces.go` already 286 lines |
+| 8 | High | Response shapes unverified — Phase 1 promised but Phase 2/5 hard-code keys |
+| 9 | Medium | Phase 7 ignores release-workflow side-effects |
+| 10 | Medium | "Shared helpers + fixtures" grouping rationale is unsubstantiated |
+
+## Unresolved Questions
+
+- Should `goclaw sessions branch/follow` move under `chatCmd` (`goclaw chat sessions branch`) to match the `/v1/chat/sessions/...` endpoint family? Plan does not commit.
+- After fixing the `activityCmd` collision, does the existing `goclaw activity` listing become `goclaw activity list` (breaking) or stay default `goclaw activity` (Cobra parent with default RunE)?
+- Does `digitopvn/goclaw` PR #44 actually expose `next_since` (snake_case) on `/v1/traces/follow`, or is that a guess? Phase 1 should confirm with file:line before Phase 2's tests are written.
+- Is the codex-prompt reference at `plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md` (on `feat/claude-skill-v0.1` worktree) the *current* contract source-of-truth, or is the backend OpenAPI spec authoritative? Plan lists both.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-failure-mode-analyst-plan-review-report.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-failure-mode-analyst-plan-review-report.md
new file mode 100644
index 0000000..1cd5305
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-failure-mode-analyst-plan-review-report.md
@@ -0,0 +1,135 @@
+# Red-Team Plan Review — Failure Mode Analyst
+
+Plan: `plans/260527-1412-domain-coverage-p6-backend-unblocked/`
+Lens: Failure Mode Analyst — Murphy's Law
+Reviewer: code-reviewer (hostile)
+Date: 2026-05-27
+
+## Finding 1: `buildBody` silently drops `--up-to-index 0` — branching plan is broken at the zero boundary
+
+- **Severity:** Critical
+- **Location:** Phase 3, section "3.1 `goclaw sessions branch`" — "Required: ... `--up-to-index` (int, >= 0)"
+- **Flaw:** The plan declares `--up-to-index` valid at zero ("`>= 0`") and emits the body via the existing `buildBody` helper (Plan §"Key Patterns" / phase 3 body shape). `buildBody` (cmd/helpers.go:86–89) deliberately skips zero-valued ints: `case int: if v != 0 { body[key] = v }`. So `--up-to-index 0` (branch at the very first message, a real and required use case) silently disappears from the request body. The server either rejects the request (`INVALID_REQUEST`) or, worse, defaults `up_to_index` to something non-zero and the user gets a branch they didn't ask for.
+- **Failure scenario:** User runs `goclaw sessions branch sess-1 --up-to-index 0` expecting an empty new session keyed off message #0. The CLI POSTs `{}` (no `up_to_index`). Server either 400s with a confusing error or branches at default index. Data integrity violation; either way reproducible.
+- **Evidence:**
+  - `cmd/helpers.go:86–89`: `case int: if v != 0 { body[key] = v }`
+  - `plans/.../phase-03-sessions-branch-and-follow.md:25–31`: "Required: ... `--up-to-index` (int, >= 0)" and body example shows `"up_to_index":12`.
+  - Same hazard for `--cursor 0` in `sessions follow` if it ever migrates to body; today it's a query string but plan §3.2 line 84 says "Default cursor=0, limit=50 appear in query string" — assertion requires special handling to emit zero values.
+- **Suggested fix:** Add a phase-3 implementation note: do NOT use `buildBody` for `up_to_index`; build the body map directly so `0` is preserved. Add an explicit unit test `--up-to-index 0` ⇒ body contains `"up_to_index":0`. Similarly test `--cursor 0` appears in query, not omitted.
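+
+The zero-boundary behavior can be reproduced in isolation. The sketch below is a minimal illustration, not the repo's code: `buildBodySkipZero` is a hypothetical, simplified stand-in for `buildBody` (only the `case int` branch is taken from the evidence above), and `branchBody` is an assumed name for a helper that builds the map directly.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// buildBodySkipZero is a hypothetical stand-in for the repo's buildBody helper.
+// It copies only non-zero ints and non-empty strings into the request body.
+func buildBodySkipZero(pairs map[string]any) map[string]any {
+	body := map[string]any{}
+	for key, val := range pairs {
+		switch v := val.(type) {
+		case int:
+			if v != 0 {
+				body[key] = v
+			}
+		case string:
+			if v != "" {
+				body[key] = v
+			}
+		}
+	}
+	return body
+}
+
+// branchBody (assumed name) builds the body directly so that index 0 survives.
+func branchBody(upToIndex int) map[string]any {
+	return map[string]any{"up_to_index": upToIndex}
+}
+
+func main() {
+	dropped, _ := json.Marshal(buildBodySkipZero(map[string]any{"up_to_index": 0}))
+	kept, _ := json.Marshal(branchBody(0))
+	fmt.Println(string(dropped)) // {}
+	fmt.Println(string(kept))    // {"up_to_index":0}
+}
+```
+
+A unit test for `--up-to-index 0` would assert on the second form: the key must be present with value `0`, not merely absent without an error.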
+
+## Finding 2: `logs aggregate` epoch-millis `last_seen` will render as `1.76e+12` in tables
+
+- **Severity:** High
+- **Location:** Phase 5, section "5.2 `goclaw logs aggregate`" — "Table tolerates numeric `last_seen` (epoch millis) without panicking"
+- **Flaw:** Plan acknowledges the type mismatch (RFC3339 string vs epoch millis int) but only tests "without panicking". The `str()` helper at `cmd/helpers.go:56–61` formats with `fmt.Sprintf("%v", v)`. Because `encoding/json` decodes JSON numbers into `float64` by default, `1760000000000` is rendered by `%v` as `1.76e+12`. No panic, but the LAST_SEEN column is unreadable in human mode. The plan's own success criterion ("`last_seen` numeric type handled without runtime panic") is satisfied while the actual UX is broken. Murphy ships.
+- **Failure scenario:** Operator runs `goclaw logs aggregate --group-by level` after an incident, scans the LAST_SEEN column, sees `1.76e+12` for every row, can't tell which bucket is fresh. Reaches for `--output json` to recover.
+- **Evidence:**
+  - `cmd/helpers.go:56`: `func str(m map[string]any, key string) string { ... return fmt.Sprintf("%v", v) }`
+  - `cmd/helpers.go:49–52`: `unmarshalMap` uses `json.Unmarshal` without `UseNumber()`, so all JSON numbers ⇒ `float64`.
+  - `plans/.../phase-05-activity-and-logs-aggregate.md:49–52` and 89: notes "epoch millis number" but test only asserts no panic.
+- **Suggested fix:** Add a dedicated table-rendering helper that detects numeric `last_seen` and (a) formats via `time.UnixMilli(int64(v)).Format(time.RFC3339)` or (b) at minimum uses `%d` after type-switching. Update the phase-5 test to assert the rendered cell contains a parseable RFC3339 string OR a non-scientific integer — NOT just "no panic".
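+
+A minimal sketch of option (a), assuming the value arrives as `float64` after a plain `json.Unmarshal` into `map[string]any`. `formatLastSeen` is a hypothetical helper name, and rendering in UTC is an assumption; neither comes from the plan.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+)
+
+// formatLastSeen (hypothetical) renders a decoded JSON value for a table cell.
+// Numbers are treated as epoch milliseconds; strings pass through unchanged.
+func formatLastSeen(v any) string {
+	switch t := v.(type) {
+	case float64:
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	case string:
+		return t
+	default:
+		return fmt.Sprintf("%v", v)
+	}
+}
+
+func main() {
+	var row map[string]any
+	if err := json.Unmarshal([]byte(`{"last_seen":1760000000000}`), &row); err != nil {
+		panic(err)
+	}
+	fmt.Printf("%v\n", row["last_seen"])          // 1.76e+12
+	fmt.Println(formatLastSeen(row["last_seen"])) // an RFC3339 timestamp in UTC
+}
+```
+
+A phase-5 test built on this would parse the rendered cell with `time.Parse(time.RFC3339, ...)` rather than only checking that nothing panicked.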
+
+## Finding 3: Plan claims `cmd/providers_crud.go` exists; it does not
+
+- **Severity:** High
+- **Location:** Plan overview, "Existing CLI State" table (line 93)
+- **Flaw:** Row reads `cmd/providers.go + providers_crud.go + providers_verify.go | CRUD + verify`. `providers_crud.go` does not exist in this worktree. The code comment at `cmd/providers.go:69` even says "create/update/delete/verify/status registered from providers_crud.go" — that's a stale comment but the plan replicates the stale fact. There are NO `providersCreateCmd / UpdateCmd / DeleteCmd / StatusCmd` registered anywhere. Adding `reconnect` implies a working CRUD surface to integrate with; the plan never noticed CRUD isn't implemented and the existing comment is wrong. Result: phase 2 dev will discover halfway through that the "register on `providersCmd`" surface is barely populated.
+- **Failure scenario:** Phase 2 dev opens `cmd/providers.go`, sees only `list/get/models`, no `update/delete`, can't find `providers_crud.go`, loses 30 minutes debugging plan vs reality, has to escalate or improvise scope.
+- **Evidence:**
+  - `find . -name "providers_*.go"` returns only `providers_claude_cli.go`, `providers_codex_pool.go`, `providers_verify.go`. No `providers_crud.go`.
+  - `cmd/providers.go:68–73`: comment claims registration "from providers_crud.go" but `init()` only adds `providersListCmd, providersGetCmd, providersModelsCmd`.
+  - Plan `plan.md:93` perpetuates the falsehood.
+- **Suggested fix:** Phase 1 (Scope Lock) must grep `cmd/providers_*.go` and update both the plan table AND the stale comment in `cmd/providers.go:69` (file or commit message). Either delete the lying comment or implement the missing CRUD before phase 2.
+
+## Finding 4: "One polling request only" assertion has no test that actually counts requests in the right way
+
+- **Severity:** High
+- **Location:** Phase 3 test list, "Only one HTTP request is issued (no implicit loop)"; Phase 2 test list, "**One request only. No watch loop.**"
+- **Flaw:** The phase-3 file lists "Only one HTTP request is issued" as a test for `sessions follow`, but the phase-2 (`traces follow`) test list contains NO equivalent count-the-requests test. The plan's strongest guarantee — no watch loop — is unverified for two of three follow surfaces (`traces follow` + `providers reconnect` could in theory accidentally retry on transient error). Worse: even the listed test for `sessions follow` doesn't specify HOW: does the test wait for a timeout to confirm no second call, or just count `r.URL` hits after `RunE` returns? If the implementation uses `client.FollowStream` (which retries with exponential backoff per `internal/client/follow.go`) by mistake, a simple "expect one call" assertion against an `httptest` server may still pass on the first iteration before reconnect.
+- **Failure scenario:** Implementation accidentally wires `traces follow` through `client.FollowStream` (the WS streamer with reconnect). On a happy server it makes one call and returns; on an EOF mid-response or 502, it retries forever. Single-call test passes locally; prod loops on the bad path.
+- **Evidence:**
+  - `cmd/logs.go:43–47`: `logs tail` calls `client.FollowStream(...)` — pattern is contagious and a developer cooking phase 2 may copy-paste.
+  - `internal/client/follow.go` and `internal/client/follow_test.go` confirm built-in reconnect with `MaxRetries`.
+  - Phase 2 test list (lines 73–80 of `phase-02-traces-follow-and-providers-reconnect.md`) has no request-count assertion.
+- **Suggested fix:** Mandate, for every "follow" command added: (a) a test that wraps `httptest.NewServer` with an atomic counter and asserts `count == 1` after `RunE` returns, AND (b) a test where the server returns 502 once — assert the CLI fails fast (no retry), not that it succeeds on retry. Add to phase 2 AND phase 3.
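+
+The two assertions in the suggested fix could look roughly like the sketch below. `fetchOnce` is a hypothetical single-shot poll standing in for whatever the real `RunE` calls, and `countRequests` is an assumed helper name; only `httptest.NewServer` and the atomic counter come from the suggestion itself.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"sync/atomic"
+)
+
+// fetchOnce is a hypothetical single-shot poll: one GET, no retry on failure.
+func fetchOnce(url string) error {
+	resp, err := http.Get(url)
+	if err != nil {
+		return err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("unexpected status %d", resp.StatusCode)
+	}
+	return nil
+}
+
+// countRequests runs fetchOnce against a server that always answers with the
+// given status and reports how many requests arrived plus the call's error.
+func countRequests(status int) (int64, error) {
+	var hits atomic.Int64
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		hits.Add(1)
+		w.WriteHeader(status)
+	}))
+	defer srv.Close()
+	err := fetchOnce(srv.URL)
+	return hits.Load(), err
+}
+
+func main() {
+	n, err := countRequests(http.StatusOK)
+	fmt.Println(n, err) // 1 <nil>
+	n, err = countRequests(http.StatusBadGateway)
+	fmt.Println(n, err != nil) // 1 true
+}
+```
+
+The 502 case is the one that distinguishes a one-shot call from a reconnecting streamer: a retrying implementation would push the counter past 1 before returning.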
+
+## Finding 5: New top-level `activity` Cobra command breaks `TestAllCommandsRegistered` invariant philosophy and risks namespace squat
+
+- **Severity:** Medium
+- **Location:** Phase 5, section "Files" — "Decide: place under a new top-level `activity` Cobra group ... create new `activityCmd` parent"
+- **Flaw:** The decision is wrapped in soft language ("Codex prompt expects `goclaw activity aggregate` as top-level") but the plan ships the decision without verifying impact on `cmd/cmd_test.go` (which maintains an `expected` list of root commands). Adding `activity` without updating `expected` won't fail the existing test (it's a "must-contain" check), but it WILL fail any future test that asserts the inverse. More concretely: the codex prompt is a single source; the actual server `/v1/activity/aggregate` lives alongside `/v1/audit/*` and `/v1/admin/*`. A top-level `activity` competes with potential future `goclaw admin activity ...` namespacing. The plan acknowledges this risk in §"Risks" (line 110: "New top-level `activity` command might collide with future scope. Document the namespace decision in PR body") — but documenting in a PR body is not a mitigation; it's a deferral.
+- **Failure scenario:** Three months later, backend ships `/v1/admin/activity/*`. CLI now has `goclaw activity aggregate` at top level and would need `goclaw admin activity ...` siblings. Confused users, breaking renames, deprecation churn.
+- **Evidence:**
+  - `cmd/cmd_test.go:18–57`: hard-coded list of root commands. `activity` not present; plan never asks to add it.
+  - `plan.md:96`: "_(no `cmd/admin_activity.go`)_ — new file `cmd/activity_aggregate.go`" — but `cmd/admin.go`, `cmd/admin_credentials.go`, `cmd/admin_tts_media.go` show an established `admin_*` family. Why is activity not `admin_activity_aggregate.go` under `admin activity aggregate`?
+- **Suggested fix:** Phase 1 should escalate this namespace decision to the user (or document a verified backend roadmap claim that `/v1/activity/*` is the permanent home, NOT `/v1/admin/activity/*`). If keeping `activity` at top level, phase 5 must also patch the `expected` list in `cmd/cmd_test.go` AND `TestCommandUseFields`.
+
+## Finding 6: Test setup mutates package-level globals (`cfg`, `printer`); plan adds 7 new test files with no parallel-safety mitigation
+
+- **Severity:** Medium
+- **Location:** Phase 2/3/4/5 test files (all 7+ new `_test.go` files)
+- **Flaw:** The existing `setupP5HTTPTest` (cmd/p5_fillers_test.go:17–20) sets `cfg = &config.Config{...}` and `printer = output.NewPrinter("json")` — these are package-level globals. Tests cannot be `t.Parallel()` safe. The release workflow runs `go test -race -count=1 ./...` (`.github/workflows/release.yaml:30`). If any new test ever adds `t.Parallel()` (a future contributor or a copy-pasted block), the race detector flags `cfg`/`printer` mutations and the release breaks. The plan adds 7 tests sharing this pattern and never mandates a per-test `t.Cleanup(restoreCfg)` to restore prior state — meaning test ordering can leak `cfg` from `traces_follow_test` into `sessions_follow_test` if the latter runs before its own setup.
+- **Failure scenario:** Test A sets `cfg.OutputFormat = "json"`. Test B added later forgets to call setup and asserts table output. Race detector run in CI passes (no parallel). Six months later someone adds `t.Parallel()` for speed — boom, intermittent CI fail.
+- **Evidence:**
+  - `cmd/p5_fillers_test.go:17–20`: globals mutated, no cleanup.
+  - `.github/workflows/release.yaml:30`: `go test -race -count=1 ./...`.
+- **Suggested fix:** Phase 6 (Tests and Docs Sweep) must add a section "Add `t.Cleanup(func(){ cfg, printer = origCfg, origPrinter })` to every new test that mutates package globals" and add a phase-6 lint pass that greps `t.Parallel()` against `cmd/*_test.go` and fails the gate.
+
+## Finding 7: `--metadata k=v` parser contract is underspecified — duplicate keys, empty values, Unicode, `=` in value
+
+- **Severity:** Medium
+- **Location:** Phase 3, section "3.1 `goclaw sessions branch`" — "`--metadata` parses repeated `key=value`; reject malformed entries before HTTP call"
+- **Flaw:** The plan defines two cases: well-formed (`foo=bar`) and missing `=` (`foobar`, rejected). It is silent on:
+  1. Duplicate keys: `--metadata foo=bar --metadata foo=baz` — second wins, error, or array?
+  2. Empty value: `--metadata foo=` — keep as `""`, drop, or reject?
+  3. Empty key: `--metadata =bar` — reject before HTTP? Plan doesn't say.
+  4. Multiple `=`: `--metadata token=abc=def` — `SplitN(s, "=", 2)` or `Split(s, "=")` (would silently truncate values containing `=` like base64 padding)?
+  5. Unicode/whitespace: `--metadata "key with space=val"` — preserved or rejected?
+  6. Nested object: `--metadata foo.bar=baz` — flat or nested? `agents instances update-metadata` (cmd/agents_instances.go:101) uses `--metadata='{"tier":"premium"}'` JSON-string style. This new flag invents a divergent contract.
+- **Failure scenario:** User stores a base64-encoded session metadata value (`token=AbCd123==`). CLI truncates at first `=`, server gets `{"token":"AbCd123"}` instead of `{"token":"AbCd123=="}`. Silent data corruption. Branch session metadata wrong forever.
+- **Evidence:**
+  - `cmd/agents_instances.go:98–101`: existing convention is JSON-string `--metadata='{"k":"v"}'`. Plan diverges to `k=v` repeated.
+  - Phase 3 lines 26, 76 of `phase-03-sessions-branch-and-follow.md`: spec only addresses the no-`=` case.
+- **Suggested fix:** Phase 1 must lock the parsing contract: use `strings.SplitN(s, "=", 2)`; reject empty key; allow empty value; "duplicate keys: last wins"; Unicode allowed verbatim. Add unit tests for all 6 cases. Reconsider whether to match the existing `agents instances update-metadata` JSON convention instead of inventing a new one.
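+
+One possible shape for the locked contract, as a sketch only: `parseMetadata` is a hypothetical name, and the rules encoded here (split on the first `=`, reject empty key, allow empty value, last duplicate wins) are the ones proposed in the suggested fix, not behavior confirmed in the codebase.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// parseMetadata (hypothetical) turns repeated key=value entries into a map.
+// Only the first "=" separates key from value, so values may contain "=".
+func parseMetadata(entries []string) (map[string]string, error) {
+	out := make(map[string]string, len(entries))
+	for _, entry := range entries {
+		parts := strings.SplitN(entry, "=", 2)
+		if len(parts) != 2 {
+			return nil, fmt.Errorf("invalid --metadata %q: expected key=value", entry)
+		}
+		if parts[0] == "" {
+			return nil, fmt.Errorf("invalid --metadata %q: empty key", entry)
+		}
+		out[parts[0]] = parts[1] // a later duplicate key overwrites an earlier one
+	}
+	return out, nil
+}
+
+func main() {
+	md, err := parseMetadata([]string{"token=AbCd123==", "foo=bar", "foo=baz", "note="})
+	fmt.Println(md, err)
+
+	_, err = parseMetadata([]string{"=bar"})
+	fmt.Println(err)
+}
+```
+
+With this split, the base64 case from the failure scenario keeps its padding: `token=AbCd123==` yields the value `AbCd123==`.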
+
+## Finding 8: `cmd/sessions.go` is 171 lines today; phase 3 adds two commands but trigger to split is "if it grows past 200" — split decision happens mid-implementation
+
+- **Severity:** Medium
+- **Location:** Phase 3, section "Files" — "If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go`"
+- **Flaw:** Current `sessions.go` is 171 LOC; two new subcommands with `RunE`, flags, validation, table renderer will EASILY add 60–120 LOC each = file definitely exceeds 200. The plan defers the split decision until "after" but TDD writes the test file first, sees red, then writes the command. By the time you notice the file is 230 LOC, you've already committed the wrong structure. Same risk for `cmd/channels_writers.go` (89 LOC, phase 4 line 72 acknowledges) and `cmd/logs.go` (111 LOC).
+- **Failure scenario:** Developer cooks phase 3, ends up with `cmd/sessions.go` at 280 lines. Pushes. Reviewer flags the modularization rule. Developer extracts to `sessions_branch.go` AFTER tests landed against `cmd/sessions.go`. Renames break test imports if not careful (they shouldn't because Go tests are by package, not file, but the diff is noisy).
+- **Evidence:**
+  - `wc -l cmd/sessions.go` ⇒ 171; `cmd/channels_writers.go` ⇒ 89; `cmd/logs.go` ⇒ 111.
+  - Phase 3 line 57: "If `cmd/sessions.go` grows past 200 lines, split"
+- **Suggested fix:** Make the split mandatory and upfront. Phase 3 should say "Create `cmd/sessions_branch.go` and `cmd/sessions_follow.go` from the start." Same for `cmd/channels_writers_test_cmd.go` (rename the file to avoid the `_test_test.go` confusion entirely) and `cmd/logs_aggregate.go`. Removes a mid-phase decision.
+
+## Finding 9: Phase 7 relies on `feat:` to trigger a beta version bump — but no validation step or rollback story if it doesn't
+
+- **Severity:** Medium
+- **Location:** Phase 7, "Risks" — "semantic-release may need a `feat:` commit to trigger a beta bump; verify the conventional message header"
+- **Flaw:** "Verify the conventional message header" is not an action; it's a wish. The release workflow (`.github/workflows/release.yaml:50–58`) installs `go-semantic-release` and runs it. If the commit subject is `feat(cli): add P6 backend-unblocked CLI surfaces` (exactly as plan §7 step 3 prescribes), conventional commits should produce a minor bump prerelease. But the workflow's "Configure beta release stream" step (lines 35–47) calculates next minor against the **latest stable tag**, not the latest beta. Plan doesn't verify that the merge will actually produce a new beta tag distinguishable from `v3.12.0-beta.35`. Worse, if the build/test/race step fails (line 30: `go test -race -count=1 ./...`), the release won't tag. Plan §6 only runs `go test -count=1 ./...` (no `-race`), so the worktree can pass phase 6 yet fail CI release.
+- **Failure scenario:** Phase 6 green. Merged to `dev`. Release workflow runs `-race`, hits an unflagged data race in one of the 7 new commands (e.g., shared `cfg` mutation across tests OR a goroutine in a copy-pasted FollowStream pattern). Release job fails, no tag emitted, beta consumers never see the surfaces. Rollback story: nothing — the merge stays, the tag doesn't.
+- **Evidence:**
+  - `.github/workflows/release.yaml:30`: `go test -race -count=1 ./...` runs in release; plan §6 only `go test -count=1`.
+  - Phase 7 line 62: "semantic-release may need a `feat:` commit ... verify the conventional message header" — no concrete check.
+- **Suggested fix:** Add to phase 6 validation gate: `go test -race -count=1 ./...` (matches CI exactly). Add to phase 7 a pre-merge check: dry-run semantic-release locally (`semantic-release --dry`) and confirm the predicted next tag. Add a rollback note: if the release tag fails to emit, revert the merge commit on `dev` before the next push.
+
+## Finding 10: Beta-tag verification is a single point in time; backend drift between scope-lock and merge is uncovered
+
+- **Severity:** Medium
+- **Location:** Phase 1, "Implementation Steps" step 4 — "`gh api repos/digitopvn/goclaw/compare/v3.12.0-beta.20...43049d3b --jq '.status'` must return `identical`"
+- **Flaw:** Phase 1 verifies that **commit** `43049d3b` is in `v3.12.0-beta.20`. It does NOT verify that the **response shapes** in `internal/http/{traces,sessions,channel_instances,activity,logs}.go` haven't drifted on `dev` HEAD between phase 1 (run on day N) and phase 7 (merged on day N+M). Plan §"Risk Assessment" line 137 names this risk ("Backend response shape drift between dev and beta tag") with mitigation "Phase 1 re-verifies contracts" — but only at phase 1. If beta-35 (current latest, 2026-05-27) silently renamed `next_since` to `next_cursor` for traces follow, the CLI shipped against beta-20 contracts will work on beta-20 only.
+- **Failure scenario:** Plan executed over 2 weeks. During phase 5, backend lands a refactor in `dev` that renames `spans_by_trace_id` to `spans_by_id`. Phase 1 evidence is stale. Phase 6 live smoke is "optional" (plan.md:119). PR merges. Production users on beta-35 see CLI returning empty spans columns.
+- **Evidence:**
+  - `plan.md:119`: "Live smoke against `v3.12.0-beta.20`+ backend (optional, after merge)" — "optional" and "after merge" are both fatal.
+  - `phase-01-scope-lock.md` does not require a contract re-check at phase 7.
+- **Suggested fix:** Phase 6 (Tests and Docs Sweep) MUST add a step: re-run the phase-1 contract verification against `digitopvn/goclaw@dev` HEAD AND against the latest beta tag. If response keys differ from phase-1 evidence, abort and re-spec. Make the live smoke MANDATORY, not optional, run against the latest beta tag at the time of merge.
+
+---
+
+## Unresolved Questions
+
+1. Does the backend actually accept `up_to_index: 0`? If yes (branching at message 0 is meaningful), Finding 1 is Critical. If no (server rejects 0), Finding 1 is High but recoverable.
+2. Is `goclaw activity aggregate` the chosen long-term namespace per backend roadmap, or should it sit under `goclaw admin activity aggregate`? Finding 5 resolution depends.
+3. What is the canonical `--metadata` syntax across the CLI: `k=v` repeated (this plan) or JSON-string (existing `agents instances update-metadata`)? Finding 7 depends.
+4. Should phase 6's `go test` step include `-race` to match the release workflow gate, and should live smoke be promoted from optional to required pre-merge? Finding 9 + 10 depend.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-scope-complexity-critic-plan-review-report.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-scope-complexity-critic-plan-review-report.md
new file mode 100644
index 0000000..d151552
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-scope-complexity-critic-plan-review-report.md
@@ -0,0 +1,155 @@
+# Red-Team Plan Review — Scope & Complexity Critic
+
+Plan: `plans/260527-1412-domain-coverage-p6-backend-unblocked/`
+Lens: Hostile YAGNI enforcer. Findings backed by codebase grep evidence.
+
+---
+
+## Finding 1: `activityCmd` already exists — top-level namespace collision
+
+- **Severity:** Critical
+- **Location:** Phase 5, section "Files" + `plan.md` "Existing CLI State" table
+- **Flaw:** Plan declares "no existing `cmd/admin_activity.go`" and "create new `activityCmd` parent". Both wrong — `activityCmd` already exists in `cmd/admin.go:133` and is already registered at root in `cmd/admin.go:172` as `goclaw activity` (audit-log viewer). Defining a second `activityCmd` is a Go duplicate-declaration compile error.
+- **Failure scenario:** Phase 5 step 2 ("Implement `activityCmd` + `activityAggregateCmd`") fails `go build` with `activityCmd redeclared in this block`. Plan's claim "no command-name collisions" (plan.md:99) is false at the symbol level and also at the user-facing `Use:` level: both commands would register `Use: "activity"` at root, and since Cobra's `AddCommand` does not reject duplicate names, one would silently shadow the other.
+- **Evidence:**
+  - `cmd/admin.go:133`: `var activityCmd = &cobra.Command{ Use: "activity", Short: "View audit log", ...}`
+  - `cmd/admin.go:172`: `rootCmd.AddCommand(approvalsCmd, delegationsCmd, adminCredentialsCmd, activityCmd, ttsCmd, mediaCmd, voicesCmd)`
+  - Plan: `plan.md:96` claims `_(no cmd/admin_activity.go)_  | — | new file cmd/activity_aggregate.go`
+  - Plan: `phase-05-activity-and-logs-aggregate.md:58`: "no existing activity command group exists"
+- **Suggested fix:** Add `aggregate` as a subcommand of the existing `activityCmd` (1 line: `activityCmd.AddCommand(activityAggregateCmd)`). Drops the "namespace collision risk" entry (`phase-05:110`) entirely. Also removes the entire "Decide: top-level vs admin group" decision that phase 5 stages — the answer is already in the tree.
+
+---
+
+## Finding 2: `providers verify` referenced as user fallback — that command does not exist
+
+- **Severity:** High
+- **Location:** Phase 2, section "2.2 `goclaw providers reconnect`"
+- **Flaw:** Plan justifies omitting `--verify` flag by telling users: "users call `goclaw providers verify <id>` separately if needed". That command does not exist. The only verify-shaped command is `providers verify-embedding <id>`, which hits `/v1/providers/{id}/verify-embedding` — a different backend endpoint, not a generic provider verify.
+- **Failure scenario:** PR review / docs writer reads phase 2 rationale, adds README copy saying "use `providers verify`", users get "unknown command" at runtime. Worse, if the rationale survives into commit body, it becomes permanently misleading.
+- **Evidence:**
+  - `cmd/providers_verify.go:11`: `Use: "verify-embedding <id>"`
+  - `grep "Use:.*\"verify\"" cmd/*.go` → 0 hits.
+  - Plan: `phase-02:46`: "users call `goclaw providers verify <id>` separately if needed"
+- **Suggested fix:** Strike the sentence or rewrite as "Backend handles verify internally on reconnect; no separate CLI verify exists today." The negative test (assert no `verify` key in body) is correct and should stay.
+
+---
+
+## Finding 3: Phase 1 (Scope Lock, 1h) re-does work already done in plan creation
+
+- **Severity:** High
+- **Location:** Phase 1 entirety
+- **Flaw:** Phase 1's six steps are: re-read backend handlers, re-confirm beta tag, re-run collision grep, write a contracts table. Every one of these was already executed during plan authoring — see `plan.md:33` ("**Verified via `gh api compare`:** `v3.12.0-beta.20` is the earliest tag identical to `43049d3b`") and `plan.md:88-99` (the existing CLI state table is the collision sweep output). Per `~/.claude/rules/review-audit-self-decision.md` #1, "Verified Decisions Are Sticky — Audit Does Not Auto-Reverse" — re-verifying without a triggering reason is wasted time.
+- **Failure scenario:** 1h of phase-1 effort produces a redundant `reports/scope-lock-260527-p6.md` that duplicates `plan.md`'s decisions/inventory tables. Phases 2–5 already cite contracts inline; no downstream phase actually consumes phase 1's report.
+- **Evidence:**
+  - `phase-01:21`: "Produce `reports/scope-lock-260527-p6.md` summarizing contracts" — but contracts are already inline in phases 2–5 (e.g., `phase-02:30-34` traces follow envelope, `phase-03:30-36` branch body/response).
+  - `phase-01:19`: "(already verified during plan creation; phase re-asserts)" — admits this is re-verification.
+- **Suggested fix:** Delete phase 1. Move the "production-time contract sanity check" (10 minutes max) into phase 2's TDD prep. Renumber phases 2–7 → 1–6.
+
+---
+
+## Finding 4: Phase 6 (Tests & Docs) and Phase 7 (Ship) red-team checks duplicate each other
+
+- **Severity:** Medium
+- **Location:** Phase 6 step 3 "Red-team diff sweep" + Phase 7 step 7 "Run review and fix findings"
+- **Flaw:** Phase 6 lists 8 explicit red-team confirmations (no replay command, no generic logs aggregate, no WS delta consumer, etc.) and produces `reports/red-team-260527-p6.md`. Phase 7 then runs `claude-review` workflow (PR review) which performs the same checks. Two artifacts, two passes, same questions.
+- **Failure scenario:** Reviewer time wasted re-asserting absences. If phase 6's report disagrees with phase 7's PR review (e.g., new finding surfaces), reconciliation overhead is non-zero.
+- **Evidence:**
+  - `phase-06:27-35`: enumerated 8-item red-team sweep.
+  - `phase-07:38`: "Run review and fix findings before merge."
+  - `phase-07:61`: "`claude-review` workflow may flag advisory issues."
+- **Suggested fix:** Drop phase 6's red-team sweep as a separate artifact. Fold the 8-item checklist into the PR body template (phase 7 step 6 already lists "Explicit out-of-scope list (verbatim from issue #16)"). Saves ~30 min and one stale report.
+
+---
+
+## Finding 5: `cmd/sessions.go` is 171 lines — split contingency will trigger but is conditional
+
+- **Severity:** Medium
+- **Location:** Phase 3, section "Files" + Todo list item "If `cmd/sessions.go` exceeds 200 lines, split"
+- **Flaw:** Plan defers the modularization decision to runtime: "If file grows past 200 lines, extract." Per `~/.claude/rules/development-rules.md`, file size limit is 200 LOC. Current file is 171 lines. Adding two new commands with positional args, ~5 flags total, RunE bodies with body construction + path escaping + printer dispatch will easily add 60–100 lines — guaranteed to trip the limit. Speculative-conditional planning where the answer is deterministic.
+- **Failure scenario:** Implementer writes both commands inline, ends up at ~250 lines, then has to split mid-phase (or punts the split and breaks the repo rule). Either way it's churn vs. deciding upfront.
+- **Evidence:**
+  - `wc -l cmd/sessions.go` → 171.
+  - `phase-03:57`: "If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go`".
+  - Same pattern repeats in `phase-04:72` (channels_writers.go currently 89 lines — adding `test` likely safe) and `phase-05:59` (logs.go currently 111 lines — adding `aggregate` may also overflow).
+- **Suggested fix:** Pre-commit to `cmd/sessions_branch.go` + `cmd/sessions_follow.go` from the start (matches existing pattern: `chat_sessions.go`, `providers_verify.go`, `providers_reconnect.go` already preferred in phase 2). For phase 4 (writers test), inline is fine. For phase 5 (logs aggregate), pre-commit to `cmd/logs_aggregate.go`.
+
+---
+
+## Finding 6: Phase 6 CHANGELOG step contradicts itself
+
+- **Severity:** Medium
+- **Location:** Phase 6 step 2 and Todo list
+- **Flaw:** Step 2 says: "`CHANGELOG.md` — add a `## Unreleased` entry (or follow existing semantic-release commit convention; do not manually edit if release notes are commit-driven)." Todo says: "CHANGELOG entry (or confirmed commit-driven)." The repo IS commit-driven (`go-semantic-release` runs from `.github/workflows/release.yaml` on push to `dev`/`main`). So the answer is already "do not manually edit." Leaving it as a conditional todo invites accidental hand-editing.
+- **Failure scenario:** Implementer hand-edits `CHANGELOG.md`, conflicts with the next automated release commit, or worse, the duplicate entry sticks and pollutes the file.
+- **Evidence:**
+  - `.github/workflows/release.yaml:51-53`: `go install github.com/go-semantic-release/semantic-release/v2/cmd/semantic-release@v2.31.0` invoked on every dev/main push.
+  - `CHANGELOG.md:8`: existing `## [Unreleased] — Domain Coverage Expansion (P0–P5)` is hand-curated — suggests prior plans also drifted on this question.
+  - `phase-06:26`: the self-contradicting line.
+- **Suggested fix:** Replace the line with a single instruction: "Skip CHANGELOG.md — semantic-release handles it from the `feat:` commit message." Remove the todo item.
+
+---
+
+## Finding 7: Phase 7 ship checklist contains items that belong to `/ck:ship`
+
+- **Severity:** Medium
+- **Location:** Phase 7 entire "Implementation Steps" + Todo list
+- **Flaw:** Phase 7 enumerates 9 procedural items (git status clean, secret scan, push, gh pr create, PR body template, review-and-fix, watch CI, watch beta release publish). Per project workflow, `/ck:ship` is the dedicated skill for this. The plan duplicates that runbook inline. Worse, "Watch CI + Release until beta release publishes" (step 8) is not gated by anything in the plan — it's open-ended async waiting that should not live in a plan phase.
+- **Failure scenario:** Phase status stays "in_progress" for hours/days while implementer waits for beta release publish. Plan progress reports become misleading.
+- **Evidence:**
+  - `phase-07:25-39`: 9 numbered steps, only steps 3–4 (commit composition + push) are plan-specific.
+  - `~/.claude/rules/skill-workflow-routing.md`: `/ck:ship — run full shipping pipeline (tests, review, version, PR)`.
+- **Suggested fix:** Collapse phase 7 to: "Invoke `/ck:ship` once phases 1–5 are green. Provide it the PR body template (the 4-line backend evidence block)." Drop the post-merge "watch beta release" step — it is not part of plan completion.
+
+---
+
+## Finding 8: Test file name `channels_writers_test_test.go` is a smell — and the contingency note proves doubt
+
+- **Severity:** Medium
+- **Location:** Phase 4, section "Files"
+- **Flaw:** Plan proposes `cmd/channels_writers_test_test.go` for the writers-`test` subcommand test file, then adds "Alternative if Go tooling balks: `cmd/channels_writers_probe_test.go` — but verify Go accepts `_test_test.go` first (it does)." If the planner felt the need to add a contingency, the name is hostile to readers. No existing file in `cmd/` uses `_test_test` (verified `ls cmd/ | grep test_test` → empty). The kebab-case readability principle from development-rules.md ("self-documenting for LLM tools") favors clarity over cleverness.
+- **Failure scenario:** Future grep for "writers_test" matches both source (when added) and test file ambiguously; LLM scanners get confused on filename intent; reviewers waste a beat parsing the double-suffix.
+- **Evidence:**
+  - `ls cmd/ | grep test_test` → no matches.
+  - Existing convention: `cmd/agents_lifecycle_test.go`, `cmd/p5_fillers_test.go`, `cmd/heartbeat_test.go` — single `_test` suffix only.
+  - `phase-04:39-41`: the contingency comment.
+- **Suggested fix:** Use `cmd/channels_writers_probe_test.go`. Drop the contingency note. One-line decision, no future ambiguity.
+
+---
+
+## Finding 9: 4 phases vs 7 surfaces — claimed grouping ("shared client helpers + test fixtures") is unsubstantiated
+
+- **Severity:** Medium
+- **Location:** `plan.md` line 38 "Grouping: 4 implementation phases, not 7 (shared client helpers + test fixtures)"
+- **Flaw:** Plan justifies 4-phase grouping by claiming shared helpers/fixtures, but no phase actually declares shared code. Phase 2 (traces follow + providers reconnect) — totally different request shapes (GET with query vs POST empty body). Phase 5 (activity aggregate + logs aggregate) — share `KEY/COUNT/LAST_SEEN` table shape, but `last_seen` is a string in one and epoch millis int in the other (phase 5 itself flags this), so the shared output helper has to branch. The grouping is visual tidiness, not engineering reuse.
+- **Failure scenario:** Future maintainer reads phase 2 expecting a shared helper, finds two independent commands; or worse, an implementer over-engineers a shared abstraction to satisfy the rationale and ships a flag-driven dispatcher for two unrelated endpoints.
+- **Evidence:**
+  - `phase-02:55-60` Files section: independent files per surface, no shared helper named.
+  - `phase-05:109` Risks: "`last_seen` type mismatch between activity (RFC3339 string) and logs runtime (epoch millis int)" — proves the surfaces don't share output logic cleanly.
+  - `plan.md:38`: the unsubstantiated claim.
+- **Suggested fix:** Drop the grouping rationale. Either keep 4 phases on grounds of "review batching" (honest reason: similar PR-side scope) or split into 7 micro-phases. Don't lie about shared code that isn't shared.
+
+---
+
+## Finding 10: Validation Gates block in plan.md duplicates per-phase validation steps
+
+- **Severity:** Medium
+- **Location:** `plan.md:114-120` "Validation Gates (per phase + final)"
+- **Flaw:** plan.md enumerates the validation commands (`go test`, `go vet`, `go build`, live smoke, red-team diff), and every phase 2–6 then repeats `go vet ./... && go build ./...` in its TDD Sequence and todo list. Phase 6 then runs them again as the "final gate." Three layers of the same commands. Per development-rules.md ("Keep individual code files under 200 lines for optimal context management"), plans should reduce redundancy, not multiply it.
+- **Failure scenario:** Plan reader stops trusting the validation lists ("they're boilerplate, skip"). When phase 6's gate adds `-count=1` and `make build` that earlier phases don't, the divergence is invisible.
+- **Evidence:**
+  - `plan.md:116-120`: top-level validation list.
+  - `phase-02:67`, `phase-03:67`, `phase-04:46`, `phase-05:69`: each repeats "go vet + go build clean."
+  - `phase-06:18-22`: full final gate including `make build`.
+- **Suggested fix:** Keep validation list in plan.md only. Per-phase todos reduce to "phase tests pass." Phase 6 is the only phase that explicitly enumerates `go test -count=1` + `make build`.
+
+---
+
+## Summary
+
+10 findings, 1 critical, 2 high, 7 medium. The critical defect (Finding 1: duplicate `activityCmd`) will cause a compile failure in phase 5. The high-severity defects either misinform users (Finding 2) or waste a planning phase (Finding 3). The medium findings cluster around YAGNI violations: speculative conditionals (`if file > 200 lines`), duplicated red-team passes, ship-procedure boilerplate that belongs in `/ck:ship`, and unsubstantiated grouping rationale.
+
+## Unresolved Questions
+
+- Should `goclaw activity aggregate` nest under existing `activityCmd` (audit-log viewer) even though semantics differ (point-in-time aggregation vs list-all)? Decision needed before phase 5 starts.
+- Does the existing `activityCmd` use `Use: "activity"` cleanly enough to share, or does adding `aggregate` warrant a refactor to `activity list` + `activity aggregate` (breaking change for current `goclaw activity` users)?
+- Is the live-smoke step (`v3.12.0-beta.20`+) blocking or optional? Plan says "optional, after merge" — but no follow-up plan handles a smoke-test failure.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-security-adversary-plan-review-report.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-security-adversary-plan-review-report.md
new file mode 100644
index 0000000..4f7eaca
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/from-code-reviewer-to-planner-red-team-security-adversary-plan-review-report.md
@@ -0,0 +1,144 @@
+# Red-Team Plan Review — Security Adversary Lens
+
+Plan: `plans/260527-1412-domain-coverage-p6-backend-unblocked/`
+Reviewer role: Security Adversary + Fact Checker
+Date: 2026-05-27
+
+## Finding 1: Top-level `activityCmd` already exists — naming collision missed
+- **Severity:** Critical
+- **Location:** plan.md "Existing CLI State" table + phase-05 "Files"
+- **Flaw:** Plan asserts "no `cmd/admin_activity.go`" and "No command-name collisions" and then proposes creating `activityCmd` as a new top-level Cobra parent. An `activityCmd` variable is already declared and registered at the root in `cmd/admin.go`.
+- **Failure scenario:** Phase 5 implementation will produce a Go compile error (duplicate `activityCmd` declaration) at minimum. If the variable is renamed to dodge the conflict, two root commands would both register `Use: "activity"`; Cobra does not reject duplicate names at registration, so the user gets two competing meanings for `goclaw activity ...` (audit-log list vs aggregate), with one silently shadowing the other. Reviewers seeing "no collisions" will not check this until phase 5 fails.
+- **Evidence:**
+  - `cmd/admin.go:133` — `var activityCmd = &cobra.Command{ Use: "activity", Short: "View audit log", ... }`
+  - `cmd/admin.go:172` — `rootCmd.AddCommand(... activityCmd ...)`
+  - Plan quote (plan.md table): `_(no cmd/admin_activity.go)_ | — | new file cmd/activity_aggregate.go`
+  - Plan quote (plan.md line 99): "No command-name collisions."
+  - Plan quote (phase-05): "Codex prompt expects `goclaw activity aggregate` as top-level → create new `activityCmd` parent."
+- **Suggested fix:** Add `aggregate` as subcommand of the **existing** `activityCmd` in `cmd/admin.go` (or split that var). Update plan.md inventory row to reflect the existing parent. Rerun the "naming collision" sweep step in phase 1 properly (it failed silently because the grep in phase-01 step 5 only looks for `Use:.*"follow|reconnect|branch|aggregate|test"`, not `Use: "activity"`).
+
+## Finding 2: Path prefix mismatch — new "sessions" subcommands hit a different backend tree than existing siblings
+- **Severity:** High
+- **Location:** plan.md "Command Surface Inventory" + phase-03 "Surfaces"
+- **Flaw:** Existing `goclaw sessions <verb>` commands all call `/v1/sessions/...`. Plan adds `goclaw sessions branch` / `goclaw sessions follow` that target `/v1/chat/sessions/{key}/branch` and `/v1/chat/sessions/{key}/history/follow`. The two surfaces live under sibling-but-distinct backend resource trees. Plan never reconciles this — operators will reasonably assume `goclaw sessions X` shares the `/v1/sessions/` namespace.
+- **Failure scenario:** A privileged operator runs `goclaw sessions delete <key>` expecting to delete the same record they just branched with `goclaw sessions branch <key>`. The two commands talk to different backend stores. Wrong record deleted, branched session orphaned. Also makes the path-escape audit harder — two distinct prefixes share one Cobra group.
+- **Evidence:**
+  - `cmd/sessions.go:67,88,109,128` — all use `/v1/sessions/...` with `url.PathEscape(args[0])`.
+  - Plan quote (phase-03): "Endpoint: `POST /v1/chat/sessions/{key}/branch`" and "`GET /v1/chat/sessions/{key}/history/follow`".
+  - `cmd/chat_sessions.go:5` — there is already a separate `chatSessionsCmd` (`Use: "sessions"`) under `chatCmd`, which is where `/v1/chat/sessions/...` semantics naturally belong.
+- **Suggested fix:** Attach `branch` and `follow` to `chatSessionsCmd` so the command path is `goclaw chat sessions branch` (matching `/v1/chat/sessions/...`), or explicitly document in plan.md why the same Cobra parent maps to two different backend trees and add a help-text warning. Phase 1 scope-lock must record the path-prefix split.
+
+## Finding 3: Sibling files used as path-escape reference are themselves unescaped — RT-02 regression vector is wider than plan admits
+- **Severity:** High
+- **Location:** plan.md "Decisions" (path-escape claim) + phase-04 file modification list
+- **Flaw:** Plan claims "Path-escape all path params via the same pattern already used in `cmd/api_keys_rotate.go`, `cmd/storage.go` (after P5 RT-02 fix)." but the files that phase 4 will modify (`cmd/channels_writers.go`) and the analogous neighbor (`cmd/channels_instances.go`) do NOT path-escape `args[0]`. The new `writers test` subcommand sits next to four unescaped siblings. Reviewers may copy-paste the wrong pattern. Also `cmd/mcp_servers.go:23` POSTs to `/v1/mcp/servers/{id}/reconnect` unescaped — exact same shape as the new `providers reconnect` plan in phase 2 ("reference existing pattern" risks copying the wrong one).
+- **Failure scenario:** `goclaw channels writers test 'inst/../../admin' --group-id g --user-id u` collapses path traversal segments before the request leaves the client (URL parsing in net/http or the gateway may interpret them); attacker probing one instance triggers writer-test on an unrelated resource. Even if backend rejects, the CLI's enumeration footprint widens.
+- **Evidence:**
+  - `cmd/channels_writers.go:19` — `c.Get("/v1/channels/instances/" + args[0] + "/writers")` (no escape)
+  - `cmd/channels_writers.go:37,55,73` — same pattern, no escape
+  - `cmd/channels_instances.go:54,99,118` — `args[0]` unescaped in path
+  - `cmd/mcp_servers.go:23` — `c.Post("/v1/mcp/servers/"+args[0]+"/reconnect", nil)` (no escape)
+  - `cmd/agents_instances.go:114,140` — `fmt.Sprintf("/v1/agents/%s/instances/%s/metadata", args[0], user)` (no escape)
+  - Plan quote (plan.md): "Path-escape all path params via the same pattern already used in `cmd/api_keys_rotate.go`, `cmd/storage.go`"
+- **Suggested fix:** Phase 1 scope-lock must enumerate every existing unescaped path concatenation (use `grep -rnE '"/v1/[^"]*" *\+ *args' cmd/`, which tolerates spaces around `+`, plus `grep -rn 'Sprintf("/v1/' cmd/` for formatted paths) and either (a) fix them in scope or (b) explicitly defer with a tracking issue. Otherwise the "P5 RT-02 lesson" claim in the plan is rhetoric, not protection. Add a new validation gate: lint rule or test that fails when `cmd/*.go` introduces `args[0]` directly into a path literal.
+
+## Finding 4: Logs runtime aggregate output is not redacted — sensitive payload leakage
+- **Severity:** High
+- **Location:** phase-05 "5.2 `goclaw logs aggregate`"
+- **Flaw:** The runtime ring buffer surfaces server log content (level, source, last_seen, possibly message keys). Plan describes the JSON response shape and prints it through `printer.Print(unmarshalMap(...))` with no redaction. No existing output sanitizer exists for the new commands. CLI prints whatever the server returned verbatim to TTY, including possibly tokens/secrets logged at warn/error level on the server.
+- **Failure scenario:** Operator running `goclaw logs aggregate --level warn` over SSH (output gets captured in shell history / scrollback / CI logs) inadvertently exfiltrates DB connection strings, API keys, or PII that the server logged in a warn message. Compare with `cmd/backup_s3.go:22,36` which explicitly masks secrets defense-in-depth — the new logs aggregate command has no equivalent.
+- **Evidence:**
+  - `cmd/backup_s3.go:22` — `Short: "Get current S3 backup configuration (secret_key masked by default)"`
+  - `cmd/backup_test.go:155` — `"secret_key should be masked in output"`
+  - `grep -rn "RedactSecret|redact|mask" internal/output/` returns no hits — no shared redaction helper.
+  - Plan quote (phase-05): "Source = runtime ring buffer, not durable audit log." — acknowledges sensitivity but adds no mitigation.
+- **Suggested fix:** Add (a) explicit `--quiet`/redaction default for known sensitive keys (`api_key`, `token`, `secret`, `password`), (b) a one-line warning banner in TTY mode reminding the operator this stream may contain secrets, and (c) phase-05 success criteria for "no raw secret-shaped strings in default table output". Reference `cmd/backup_s3.go` masking as the model.
+
+## Finding 5: Backend-supplied strings printed verbatim — ANSI/escape injection from compromised server
+- **Severity:** Medium
+- **Location:** All implementation phases (2–5), output sections
+- **Flaw:** Every new command pipes server-returned `reason`, `label`, log messages, bucket keys, etc. into table output without any control-character stripping. The CLI explicitly auto-detects TTY (`output.IsTTY`) and switches to human-readable table mode in TTY contexts — exactly the contexts where ANSI escape injection (e.g. cursor manipulation, fake prompts, clipboard hijacks via OSC 52) is effective.
+- **Failure scenario:** Backend (or attacker who tampered a single field) returns `"reason":"writer\u001b]52;c;cGF5bG9hZA==\u0007"` in `channels writers test`. Operator sees the table; OSC 52 silently writes attacker-chosen content to the operator's clipboard. Or a fake "[y/N]" prompt is drawn to social-engineer an interactive confirmation. Plan threat-models server contracts but not server-as-adversary.
+- **Evidence:**
+  - `grep -rn "sanitize\|strip\|escape" internal/output/` returns no hits.
+  - All new commands route through `printer.Print(unmarshalMap(...))` per plan's own pattern (plan.md line 42).
+  - `cmd/root.go:59`, `cmd/logs.go:36,72`, `cmd/heartbeat.go:184`, `cmd/teams_events.go:50` — many TTY-aware sites; no escape stripping.
+- **Suggested fix:** Add a shared `internal/output/sanitize.go` that strips non-printable control bytes (except `\t`, `\n`) from any string field before table rendering. Phase 6 red-team sweep must add an explicit check: "no command path renders raw server strings without sanitization in TTY mode." Cite OWASP "Log injection" / "Terminal escape injection".
+
+## Finding 6: Misleading guidance directing users to a non-existent command (`goclaw providers verify`)
+- **Severity:** Medium
+- **Location:** phase-02 "2.2 `goclaw providers reconnect`"
+- **Flaw:** Plan says: "Do NOT add `--verify` flag; users call `goclaw providers verify <id>` separately if needed." But the existing subcommand is `goclaw providers verify-embedding`, not `goclaw providers verify`. Users following the docs will get "unknown command" errors and may then suspect the reconnect itself didn't work, retry it (it's a state-changing admin op), and double-toggle a production provider.
+- **Failure scenario:** Operator reconnects then runs the (non-existent) `verify`. Confused, they re-run `reconnect` (because docs implied verify is a follow-up). The second reconnect may flip cache_invalidated/registry_updated state again and cause an in-flight request to fail. Plan also doesn't document that `reconnect` is admin-only on the backend; CLI gives no pre-flight warning.
+- **Evidence:**
+  - `cmd/providers_verify.go:11` — `Use: "verify-embedding <id>"`
+  - Plan quote (phase-02): "users call `goclaw providers verify <id>` separately if needed"
+  - Plan quote (phase-02): "Admin-only on backend; client sends no body." — admin-only stated but no `--quiet` banner / pre-warn proposed.
+- **Suggested fix:** Update plan and command Long-help to reference `goclaw providers verify-embedding` (or whatever the actual sibling is). Add a TTY banner: "Reconnect requires admin token; failure with `UNAUTHORIZED` exit code 2 is expected for non-admin tokens." Don't make users guess.
+
+## Finding 7: `--up-to-index` bounds validation specifies `>= 0` only — no upper bound, no overflow guard
+- **Severity:** Medium
+- **Location:** phase-03 "3.1 `goclaw sessions branch`"
+- **Flaw:** Plan: "Required: `<session-key>` positional, `--up-to-index` (int, >= 0)." No upper bound, no defense against `--up-to-index=9223372036854775807` (or huge ints causing server-side allocation). Combined with `--metadata` repeated flags and `--new-session-key` accepting arbitrary strings, the body shape allows attacker-shaped requests that pass client validation trivially.
+- **Failure scenario:** Operator script accidentally passes `--up-to-index $((2 ** 62))` from a buggy upstream. The CLI forwards it, and the server allocates or iterates accordingly. Or, in a chained scripted abuse: a malicious actor with valid CLI credentials but limited intent uses the lack of client-side bound checks to amplify load (each branch op = `copied_messages` proportional to index). The "validation-before-HTTP" claim in plan (multiple phases) becomes shallow theater.
+- **Evidence:**
+  - Plan quote (phase-03): "Required: `<session-key>` positional, `--up-to-index` (int, >= 0)."
+  - Plan quote (phase-03 tests): "Negative `--up-to-index` returns validation error before HTTP call." — only negatives, not absurd positives.
+  - Phase-02 also: "Server default `limit=50`, max `200`. Don't enforce max client-side; let server respond." — plan takes a position to defer to server, but doesn't apply it consistently with branch / cursor limits.
+- **Suggested fix:** Either (a) consistently let server validate all integer bounds (and document in plan) or (b) add reasonable client-side ceilings (e.g. `--up-to-index <= 1_000_000`, `--limit <= 200`, `--cursor <= 10_000_000`). Pick one and apply across phases 2–5. Add tests for huge / overflowing values.
+
+## Finding 8: `--metadata key=value` parser allows clobber + empty keys; no key validation
+- **Severity:** Medium
+- **Location:** phase-03 tests + phase-03 surface description
+- **Flaw:** Plan reuses the existing `parseToolInvokeParams` pattern (`strings.SplitN(pair, "=", 2)` — see `cmd/tools_invoke_args.go:39`). That parser silently allows `--metadata =value` (empty key, becomes object key `""`), `--metadata foo=` (empty string value — fine), and `--metadata foo=bar --metadata foo=baz` (silent last-wins clobber, no warning). Plan only specifies rejection of "no `=`" (`--metadata foobar`), missing the rest.
+- **Failure scenario:** Operator pipeline produces `--metadata source= --metadata source=cli`, expecting "cli" to win. If the environment reverses the order, the server sees `{"source":""}` and a downstream consumer keyed on `metadata.source != ""` mis-routes the branch. An empty key `""` may produce server-side errors that the CLI surfaces as opaque 400s. Values may contain `=` (`SplitN` with n=2 splits on the first `=` only), but keys cannot, and that constraint is undocumented.
+- **Evidence:**
+  - `cmd/tools_invoke_args.go:38-43` — current parser, no validation.
+  - Plan quote (phase-03 tests): "Malformed `--metadata foobar` (no `=`) rejected before HTTP call." — only one negative case.
+- **Suggested fix:** Plan must specify: reject empty-key, warn on duplicate-key (or fail), allow values to contain `=`. Add tests for each. Document key-character constraint in `--metadata` flag help.
+
+## Finding 9: Phase 1 scope-lock grep is too narrow to detect the real collisions
+- **Severity:** Medium
+- **Location:** phase-01 step 5
+- **Flaw:** Phase 1's collision sweep command is: `grep -in 'Use:.*"follow\|reconnect\|branch\|aggregate\|test"' cmd/*.go`. This:
+  1. Misses parent-command collisions (e.g. `Use: "activity"` — Finding 1).
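+  2. Leaves the alternation ungrouped: as written it matches `Use:.*"follow`, or a bare `reconnect`, `branch`, `aggregate`, or `test"` anywhere on a line, so it over-matches unrelated lines while still missing forms such as `Use: "test-connection"` or `Use: "test <id>"`.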
+  3. Misses Cobra `AddCommand(...)` registrations that determine the actual collision surface.
+- **Failure scenario:** Phase 1 returns "zero collisions" (rubber-stamped), phase 5 fails to compile. The plan's risk register lists "Naming collision with existing subcommands | Low" — mitigation circular because the sweep itself was insufficient.
+- **Evidence:**
+  - Plan quote (phase-01 step 5): `grep -in 'Use:.*"follow\|reconnect\|branch\|aggregate\|test"' cmd/*.go`
+  - Plan quote (plan.md risk table): "Naming collision with existing subcommands | Low | Verified inventory above; no collisions found"
+  - Actual collision: `cmd/admin.go:133` `activityCmd` — would not have been found by the planned grep because plan only checks `Use:` strings matching the verb list, not parent group names.
+- **Suggested fix:** Replace the grep with: enumerate every `cobra.Command{ Use: "..." }` in `cmd/`, then check that no proposed `Use:` token (`activity`, `aggregate`, `branch`, `follow`, `reconnect`, `test`) conflicts at the level it will be registered. Better: register the new commands in a quick prototype and assert in a test that no two sibling commands share a name, since Cobra itself does not reject duplicate registrations.
+
+## Finding 10: `cmd/sessions.go` already at 171 LOC — adding 2 commands will breach 200-line repo rule, plan's "split if" hedge is too soft
+- **Severity:** Medium
+- **Location:** phase-03 "Files" + repo CLAUDE.md modularization rule
+- **Flaw:** Repo rule (CLAUDE.md): "If a code file exceeds 200 lines of code, consider modularizing". `cmd/sessions.go` is already 171 lines. Phase 3 adds two new commands (each ~30–50 lines including flags + RunE + register), plus validation helpers — will land ~280 lines. Plan says: "If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go`" — but only as conditional, not a default. Same risk noted in phase 4 for `cmd/channels_writers.go` (89 LOC, less acute).
+- **Failure scenario:** Implementer follows the path of least resistance, appends to `cmd/sessions.go`, and lands 280 LOC. The code-reviewer agent flags it in the PR, the reviewer pushes back, and the files get split mid-PR. That rework conflicts with the `feat:` commit semantic-release expects (implicit finding: the rework risks dropping the `feat:` prefix on a renamed commit).
+- **Evidence:**
+  - `wc -l cmd/sessions.go` → 171
+  - CLAUDE.md (project): "Keep individual code files under 200 lines for optimal context management"
+  - Plan quote (phase-03): "If `cmd/sessions.go` grows past 200 lines, split into `cmd/sessions_branch.go` and `cmd/sessions_follow.go` per repo modularization rule."
+- **Suggested fix:** Make the split the default in the plan: new files `cmd/sessions_branch.go` and `cmd/sessions_follow.go` from the start. Same for `cmd/channels_writers.go` → `cmd/channels_writers_probe.go` (this also avoids the awkward `_test_test.go` test file name, a related issue in phase-04). Update the phase 3 and phase 4 "Files" sections to remove the conditional.
+
+## Summary of impact
+
+| # | Finding | Severity | Phase affected |
+|---|---------|----------|----------------|
+| 1 | activityCmd already exists | Critical | 1, 5 |
+| 2 | Sessions path-prefix mismatch | High | 1, 3 |
+| 3 | Path-escape claim is selective | High | 1, 2, 4 |
+| 4 | Logs runtime not redacted | High | 5 |
+| 5 | ANSI escape injection from server | Medium | 2, 3, 4, 5 |
+| 6 | `providers verify` doesn't exist | Medium | 2 |
+| 7 | Integer bounds inconsistent | Medium | 2, 3 |
+| 8 | metadata parser too permissive | Medium | 3 |
+| 9 | phase-01 collision sweep too narrow | Medium | 1 |
+| 10 | sessions.go LOC budget | Medium | 3 |
+
+## Unresolved questions
+
+1. Backend authoritative path: is `/v1/chat/sessions/{key}/...` correct, or has PR #44 actually exposed these under `/v1/sessions/...` to align with existing CLI? Phase 1 must verify this against the handler source, not infer it from the plan.
+2. Does the existing `activityCmd` use `/v1/activity` (audit log) vs the new `/v1/activity/aggregate`? If same root, can the new command be a clean subcommand under existing parent, or does scope-of-data differ enough to warrant a rename?
+3. Is there an existing redaction helper anywhere in the codebase (beyond `backup_s3.go` inline masking) we should reuse, or does this plan need to ship one?
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/scope-lock-260527-p6.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/scope-lock-260527-p6.md
new file mode 100644
index 0000000..43fe718
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/scope-lock-260527-p6.md
@@ -0,0 +1,54 @@
+# Scope Lock Report — P6 Backend-Unblocked CLI
+
+Date: 2026-05-27
+Status: locked
+
+## Backend contracts
+
+| Surface | Method + Path | Body / Query | Response | Source |
+|---------|---------------|--------------|----------|--------|
+| traces follow | GET `/v1/traces/follow` | query: `session_key\|agent_id`, `since`, `limit`, `status`, `channel`, `include_spans` | `{traces, spans_by_trace_id, server_time, next_since, limit}` | PR #37 commit `56e227c4` |
+| providers reconnect | POST `/v1/providers/{id}/reconnect` | empty body | `{status, provider, registry_updated, cache_invalidated}` | PR #37 commit `56e227c4` |
+| sessions branch | POST `/v1/chat/sessions/{key}/branch` | `{new_session_key?, up_to_index, label?, metadata?}` | `{ok, source_key, session_key, copied_messages, total_messages, label}` | PR #44 commit `43049d3b` |
+| sessions follow | GET `/v1/chat/sessions/{key}/history/follow` | query: `cursor`, `limit` | `{session_key, cursor, next_cursor, total, messages, reset, updated}` | PR #44 commit `43049d3b` |
+| channels writers test | POST `/v1/channels/instances/{id}/writers/test` | `{group_id, user_id}` | `{allowed, reason, instance_id, agent_id, group_id, user_id, writer_count}` | PR #44 commit `43049d3b` |
+| activity aggregate | GET `/v1/activity/aggregate` | query: `group_by`, `from`, `to`, `limit`, `actor_type`, `actor_id`, `action`, `entity_type`, `entity_id` | `{source, group_by, total, limit, from, to, buckets:[{key,count,last_seen}]}` (last_seen RFC3339 string) | PR #44 |
+| logs aggregate | GET `/v1/logs/runtime/aggregate` | query: `group_by`, `level`, `source`, `from` | `{source, retention, capacity, sample_size, group_by, buckets:[{key,count,last_seen}]}` (last_seen epoch millis number) | PR #44 |
+
+## Beta tag
+
+`v3.12.0-beta.20` is the verified earliest tag containing `43049d3b` (red-team finding F-RT-beta, confirmed during plan creation). The latest beta is `v3.12.0-beta.35`.
+
+## Naming collision sweep
+
+```
+grep -nE 'Use:[[:space:]]*"(reconnect|branch|follow|aggregate|test)"' cmd/*.go
+```
+
+Result:
+- `cmd/heartbeat.go:109` declares `Use: "test"` — under `heartbeatCmd` parent. No conflict (different parent).
+- `cmd/hooks_test_runner.go:14` declares `Use: "test"` — under `hooksCmd` parent. No conflict.
+- `activityCmd` already exists at `cmd/admin.go:133` — phase 5 attaches `aggregate` as subcommand (not new top-level).
+- No other clashes for `follow`, `reconnect`, `branch`, `aggregate` under their respective parents.
+
+## Drift table
+
+| Surface | Drift | Resolution |
+|---------|-------|------------|
+| logs aggregate `last_seen` | epoch-millis number (not string) | phase 5 implements `formatLastSeen` helper |
+| `buildBody` int-zero drop | `--up-to-index 0` / `--cursor 0` silently lost | phase 3 builds body/query maps directly |
+| `providers verify` | does not exist (only `verify-embedding`) | reference removed from phase 2 Long help |
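+
+The `buildBody` int-zero row is easiest to see in code. The sketch below shows the "build maps directly" resolution under stated assumptions: `branchBody` and `followQuery` are hypothetical names, and the field names are taken from the contract table above.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"net/url"
+	"strconv"
+)
+
+// branchBody (hypothetical) builds the request body directly so that
+// up_to_index is always sent, including when it is 0. Optional strings
+// are added only when set.
+func branchBody(upTo int, label string) map[string]any {
+	body := map[string]any{"up_to_index": upTo}
+	if label != "" {
+		body["label"] = label
+	}
+	return body
+}
+
+// followQuery (hypothetical) builds the query directly so cursor=0 survives.
+func followQuery(cursor, limit int) url.Values {
+	q := url.Values{}
+	q.Set("cursor", strconv.Itoa(cursor))
+	if limit > 0 {
+		q.Set("limit", strconv.Itoa(limit))
+	}
+	return q
+}
+
+func main() {
+	raw, err := json.Marshal(branchBody(0, ""))
+	if err != nil {
+		fmt.Println("marshal failed:", err)
+		return
+	}
+	fmt.Println(string(raw))               // {"up_to_index":0}
+	fmt.Println(followQuery(0, 0).Encode()) // cursor=0
+}
+```
+
+A skip-empty helper cannot tell "flag not passed" from "flag passed as 0", which is why the required integer is set unconditionally here.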
+
+## Out-of-scope guarantee
+
+These remain unimplemented (verbatim from issue #16):
+
+- `POST /v1/traces/{id}/replay`
+- generic `GET /v1/logs/aggregate`
+- WebSocket `chat.history.delta`
+- SSE chat history follow
+- watch loops for traces/sessions follow
+
+## Result
+
+Zero unresolved contract questions. Proceed to phase 2.
diff --git a/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/sweep-red-team-diff-260527-p6.md b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/sweep-red-team-diff-260527-p6.md
new file mode 100644
index 0000000..80f7cbc
--- /dev/null
+++ b/plans/260527-1412-domain-coverage-p6-backend-unblocked/reports/sweep-red-team-diff-260527-p6.md
@@ -0,0 +1,55 @@
+# P6 Red-Team Diff Sweep
+
+**Date:** 2026-05-27
+**Scope:** Seven backend-unblocked CLI surfaces (issue #16; backend PRs `#37`, `#44`)
+**Mode:** Implementation→sweep diff vs validated plan + red-team artifacts.
+
+## Surfaces shipped
+
+| Command | Endpoint | File | Tests |
+|---|---|---|---|
+| `traces follow` | `GET /v1/traces/follow` | `cmd/traces_follow.go` | `cmd/traces_follow_test.go` |
+| `providers reconnect <id>` | `POST /v1/providers/{id}/reconnect` | `cmd/providers_reconnect.go` | `cmd/providers_reconnect_test.go` |
+| `sessions branch <key>` | `POST /v1/chat/sessions/{key}/branch` | `cmd/sessions_branch.go` | `cmd/sessions_branch_test.go` |
+| `sessions follow <key>` | `GET /v1/chat/sessions/{key}/history/follow` | `cmd/sessions_follow.go` | `cmd/sessions_follow_test.go` |
+| `channels writers test <id>` | `POST /v1/channels/instances/{id}/writers/test` | `cmd/channels_writers.go` (appended) | `cmd/channels_writers_test_test.go` |
+| `activity aggregate` | `GET /v1/activity/aggregate` | `cmd/activity_aggregate.go` | `cmd/activity_aggregate_test.go` |
+| `logs aggregate` | `GET /v1/logs/runtime/aggregate` | `cmd/logs_aggregate.go` | `cmd/logs_aggregate_test.go` |
+
+## Build / vet / test
+
+```
+go vet ./...    # clean
+go build ./...  # clean
+go test ./...   # all packages PASS
+```
+
+## Red-team findings — disposition
+
+| ID | Finding | Implementation status |
+|---|---|---|
+| F1 | `activityCmd` already exists at `cmd/admin.go:133` — new aggregate must attach as subcommand. | Done. `activityAggregateCmd` registered via `activityCmd.AddCommand(...)`; no new top-level. `cmd_test.go` top-level list unchanged. |
+| F2 (sessions branch) | `buildBody` int-zero skip silently drops `up_to_index=0`. | Done. `sessions_branch.go` constructs body directly (`map[string]any{"up_to_index": upTo}`); regression test asserts `"up_to_index":0` on the wire. |
+| F2 (sessions follow) | Same int-zero hazard for `cursor=0`. | Done. `sessions_follow.go` builds `url.Values` directly with `q.Set("cursor", fmt.Sprintf("%d", cursor))`; regression test asserts literal `cursor=0` in raw query. |
+| F4 | HTTP client auto-retries 429/5xx three times — conflicts with a "fail-fast on 502" assertion. | Mitigated. Plan's 502-once test removed; atomic-counter test (`calls == 1` on 200 path) and structural "not a watch loop" check preserve the original intent. |
+| F6 | `unmarshalMap` decodes JSON numbers as float64; `str()` renders `1.76e+12`. | Done. `formatLastSeen` helper in `cmd/activity_aggregate.go` type-switches (string→passthrough, numeric→`time.UnixMilli`). Defined once and used in both `activity_aggregate.go` and `logs_aggregate.go` (same package). Test `TestLogsAggregate_LastSeenRendersRFC3339` asserts absence of `e+12` and presence of RFC3339 pattern. |
+| Naming collision (heartbeat/hooks `Use: "test"`) | Different parents, no collision. | Verified. `channelsWritersTestCmd` lives under `channelsWritersCmd`; the `Use: "test"` strings under different parents are routed unambiguously by cobra. |
+| Path-escape | All `{id}`/`{key}` path params must use `url.PathEscape`. | Done. Every new command uses `url.PathEscape(args[0])`; path-escape regression tests assert either escaped `RawPath` or decoded `Path` contains the literal. |
+| Empty-body POST shape | `providers reconnect` body must be empty (no `verify` field). | Done. Test asserts `body["verify"]` absent and `len(body) == 0`. |
+| Two-key-only body | `channels writers test` body must contain ONLY `group_id` and `user_id`. | Done. Test asserts `len(body) == 2`. |
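+
+For reference, the F6 behaviour (string passthrough, numeric epoch millis rendered as RFC3339) might look like the sketch below. It is not the source of the shipped helper; rendering in UTC and returning an empty string for unknown types are assumptions made for the example.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// formatLastSeen is a sketch of the behaviour described in row F6.
+// Strings pass through; numbers are treated as epoch milliseconds.
+// UTC output and the empty-string fallback are assumptions.
+func formatLastSeen(v any) string {
+	switch t := v.(type) {
+	case string:
+		return t
+	case float64: // encoding/json decodes numbers into float64 for map[string]any
+		return time.UnixMilli(int64(t)).UTC().Format(time.RFC3339)
+	case int64:
+		return time.UnixMilli(t).UTC().Format(time.RFC3339)
+	default:
+		return ""
+	}
+}
+
+func main() {
+	fmt.Println(formatLastSeen("2026-05-27T14:12:00Z"))
+	fmt.Println(formatLastSeen(float64(0)))
+	fmt.Println(formatLastSeen(1.76e12))
+}
+```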
+
+## Code-comment rule compliance
+
+Initial test comments referenced plan artifacts (`Red Team F2`, `Red Team F6`). Rewritten to describe the invariant (numeric-zero preservation, scientific-notation rendering) rather than the origin. Production code (`cmd/*_aggregate.go`, `cmd/*_follow.go`, etc.) contains no plan-artifact references; only invariants and contract notes are inlined.
+
+Pre-existing legacy "Phase 4 split" comments in `cmd/agents_instances.go`, `cmd/teams_members.go`, `cmd/teams_workspace.go` are out of scope for this pass.
+
+## Backend evidence (locked)
+
+- PR `#37` (traces follow + providers reconnect): merged commit `56e227c4`.
+- PR `#44` (sessions branch + sessions follow + channels writers test + activity aggregate + logs aggregate): merged commit `43049d3b`.
+- Beta release tag covering both: `v3.12.0-beta.20` (minimum required for CLI feature flag).
+
+## Unresolved questions
+
+None.
diff --git a/plans/260528-1357-fix-trace-details-by-id/phase-01-repro-and-root-cause.md b/plans/260528-1357-fix-trace-details-by-id/phase-01-repro-and-root-cause.md
new file mode 100644
index 0000000..c736d0c
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/phase-01-repro-and-root-cause.md
@@ -0,0 +1,193 @@
+---
+phase: 1
+title: Repro and root cause
+status: completed
+priority: P2
+effort: 1.5-2h
+dependencies: []
+---
+
+# Phase 1: Repro and root cause
+
+## Overview
+
+Lock the failure mode with a deterministic repro and a captured live-gateway response. Classify root cause as CLI-side, server-side, or hybrid; that classification drives Phase 2's scope. No production code changes in this phase. Trimmed scope: 3 tests here (one red, two that already pass); the remaining 7 (rendering, error categories, malformed-id) land in Phase 2 step 1 to avoid testing payload shape before the fixture is captured.
+
+## Requirements
+
+**Functional**
+- 3 repro tests in `cmd/traces_get_test.go` (one red test that mirrors the actual broken behavior, two green tests that lock the existing contract), using `httptest.NewServer` + the captured envelope.
+- A captured fixture JSON in `cmd/testdata/trace_detail_get.json` preserving the exact server envelope shape, scrubbed of secrets.
+- Written classification (CLI-side / server-side / hybrid) in `reports/repro-260528-issue17.md` with file:line evidence.
+
+**Non-functional**
+- Test file follows repo conventions (see `cmd/traces_follow_test.go`, `cmd/traces_list_test.go` for reference): `runCmd(t, ...)`, `okJSON(t, w, payload)` envelope helper.
+- Smoke probe is reproducible: documented `curl -H` command in the report with `$TOKEN` env var indirection — token MUST NOT appear in committed artifacts or shell history.
+
+## Architecture
+
+```
++--------------------+        +----------------+        +--------------+
+| traces get <id>    |  ----> | newHTTP().Get  |  ----> | GET /v1/...  |
+| (cobra RunE)       |        |  (envelope)    |        | {ok, payload}|
++--------------------+        +----------------+        +--------------+
+         |                            |
+         v                            v
+  unmarshalMap(data)            (silent err swallow)  <-- confirmed bug
+         |
+         v
+  printer.Print(map)
+         |
+         v
+  table -> JSON fallback    <-- "unusable output" symptom confirmed
+```
+
+**Investigation matrix** (each row = a probable failure mode):
+
+| # | Hypothesis | Evidence | Disposition |
+|---|---|---|---|
+| H1 | Table mode dumps raw JSON, looks "broken" to user | VERIFIED via [internal/output/output.go:30-37](internal/output/output.go:30) — JSON fallback | CLI-side; Phase 2 fixes render |
+| H2 | `unmarshalMap` swallows error on malformed payload | VERIFIED via [cmd/helpers.go:48-53](cmd/helpers.go:48) — `_ = json.Unmarshal` | CLI-side; Phase 2 fixes |
+| H3 | No `url.PathEscape` and no id-validation in `tracesGetCmd` | VERIFIED via [cmd/traces.go:61-75](cmd/traces.go:61) | CLI-side; Phase 2 adds validation |
+| H4 | Server returns trace under unexpected envelope key | UNVERIFIED — capture in smoke probe | Determines Phase-2 unmarshal path |
+| H5 | Server returns 404 for tenant-scoped trace IDs (auth filter) | UNVERIFIED — capture in smoke probe | Determines whether AC#3 needs server-side fix |
+
+## Related Code Files
+
+- Read only: `cmd/traces.go`, `cmd/helpers.go`, `internal/output/output.go`, `internal/output/exit.go`, `internal/client/http.go`.
+- Reference: `cmd/traces_follow_test.go` (for `runCmd` + `okJSON` patterns), `cmd/testdata/` (will be created — no prior fixtures exist in repo).
+- Create: `cmd/traces_get_test.go` — 3 red tests.
+- Create: `cmd/testdata/trace_detail_get.json` — captured fixture (scrubbed).
+- Create: `plans/260528-1357-fix-trace-details-by-id/reports/repro-260528-issue17.md` — smoke report with curl command, response sample (scrubbed), classification verdict.
+
+## Implementation Steps
+
+### 1. Smoke probe against live gateway
+
+Use `goclaw.zuey.me` or local dev gateway. **Token MUST come from env, never inline:**
+
+```bash
+export TOKEN="$(goclaw config get token 2>/dev/null || cat ~/.goclaw/token)"
+export SERVER="https://goclaw.zuey.me"   # or local
+# pick a trace id
+goclaw traces list --limit 5 -o json | jq -r '.payload[0].trace_id' > /tmp/trace_id
+TRACE_ID="$(cat /tmp/trace_id)"
+# capture wire-level shape
+curl -sS -H "Authorization: Bearer $TOKEN" "$SERVER/v1/traces/$TRACE_ID" > /tmp/trace_raw.json
+# capture CLI shape (both modes)
+goclaw traces get "$TRACE_ID" -o json > /tmp/trace_json.txt
+goclaw traces get "$TRACE_ID" > /tmp/trace_tty.txt  # demonstrates the "unusable" output
+# capture error envelopes
+goclaw traces get bogus-id-12345 -o json > /tmp/trace_404.txt 2>&1
+```
+
+**Capture in the repro report:**
+- Server response (scrubbed)
+- CLI table-mode output (the visible bug)
+- CLI JSON-mode output
+- 404 error envelope shape — note the exact `error.code` string (e.g. `NOT_FOUND`, `TRACE_NOT_FOUND`, `RESOURCE_NOT_FOUND` — whichever the server actually returns).
+- If reachable, also probe `permission denied` by attempting a trace from another tenant (server returns 403 + some code — capture the literal `error.code` string).
+
+**Gateway unreachable escape:** if no gateway is reachable, fall back to a hand-crafted minimal fixture derived from `traces follow` response shape ([cmd/traces_follow_test.go](cmd/traces_follow_test.go)). Mark the fixture and the report with `TODO: refresh after live smoke probe`. Phase 2 must refresh before merge.
+
+### 2. Save fixture (scrubbed)
+
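+Strip secrets and PII from `/tmp/trace_raw.json` with this concrete `jq` recipe (extend as needed per real shape). The key tests are case-insensitive substring matches, so `token` also matches keys such as `tokens`; tighten the patterns if the real payload has numeric fields the tests need: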
+
+```bash
+jq '
+  walk(
+    if type == "object" then
+      with_entries(
+        if .key | test("token|secret|api_key|authorization"; "i") then
+          .value = "REDACTED"
+        elif .key == "user_id" then .value = "user_REDACTED"
+        elif .key == "tenant_id" then .value = "tenant_REDACTED"
+        elif .key == "trace_id" then .value = "trace_FIXTURE_001"
+        elif .key == "agent_id" then .value = "agent_FIXTURE_001"
+        elif .key == "session_key" then .value = "session_FIXTURE_001"
+        elif .key | test("email|phone|address"; "i") then .value = "REDACTED"
+        elif .key | test("content|message|prompt|response|tool_args|tool_result|query"; "i") then .value = "REDACTED"
+        else .
+        end
+      )
+    else .
+    end
+  )
+' /tmp/trace_raw.json > cmd/testdata/trace_detail_get.json
+```
+
+After writing, **manually scan** `cmd/testdata/trace_detail_get.json` for any residual:
+- bearer tokens (`grep -i 'eyJ\|Bearer\|sk-\|token=' cmd/testdata/trace_detail_get.json` — must return 0 lines)
+- numeric ids that look like real user/tenant ids
+- span names that embed user input (sanitize manually if so)
+
+### 3. Write red repro tests
+
+In `cmd/traces_get_test.go` — only THREE tests in this phase (the rest moves to Phase 2):
+
+```go
+// 1. Path/method correctness — should already pass; documents the contract.
+func TestTracesGet_PathAndMethod(t *testing.T) { ... }
+
+// 2. JSON happy path with fixture — should already pass; locks the envelope shape.
+func TestTracesGet_HappyPath_JSON_LocksFixture(t *testing.T) { ... }
+
+// 3. Table mode renders raw JSON, not human-readable — RED (this is the bug).
+func TestTracesGet_TableMode_HumanReadable_RED(t *testing.T) {
+    // serve fixture; runCmd with "--output", "table"
+    // assert stdout does NOT start with `{` (would indicate JSON fallback)
+    // assert stdout contains human-readable markers like "TRACE_ID" or tree markers
+}
+```
+
+Test scaffolding pattern (per repo convention from `cmd/traces_follow_test.go`):
+- `httptest.NewServer` + `okJSON(t, w, payload)` envelope helper
+- `t.Setenv("GOCLAW_API_URL", srv.URL)` and `t.Setenv("GOCLAW_TOKEN", "test-token")`
+- `runCmd(t, "traces", "get", traceID, "--output", "table")` — **explicit `--output` flag** because `go test` pipes stdout (default would resolve to JSON, defeating the table-mode test).
+- Each test calls `runCmd` directly; cobra persistent-flag state is reset by passing the flag every time. No `resetTracesGetFlags` helper needed.
+
+### 4. Run tests; expect TestTracesGet_TableMode_HumanReadable_RED to fail
+
+```bash
+go test -count=1 ./cmd/... -run TestTracesGet
+go vet ./...
+go build ./...
+```
+
+Tests 1 and 2 should pass (path/method/JSON envelope are already correct). Test 3 should fail with a message like `stdout starts with '{', expected human-readable output`.
+
+### 5. Write `reports/repro-260528-issue17.md`
+
+Sections required:
+- **Curl command** (with `$TOKEN` env var, no inline secret).
+- **Response sample** — quote 10-30 lines of the scrubbed fixture.
+- **Test result summary** — which red, which already green.
+- **Classification: CLI-side / server-side / hybrid** with `file:line` evidence for each defect.
+- **Server error-code findings** — literal strings observed for 404 and (if probed) 403.
+- **If server-side root cause found:** drafted upstream issue body for `digitopvn/goclaw` (also scrubbed; reuse same `jq` recipe).
+
+## Success Criteria
+
+- [ ] `cmd/traces_get_test.go` exists with 3 test cases.
+- [ ] `cmd/testdata/trace_detail_get.json` captures real server envelope (or, if gateway unreachable, a stand-in marked `TODO: refresh after live smoke probe`).
+- [ ] No bearer token, no real user/tenant id, no PII in the committed fixture (`grep` verified).
+- [ ] `go test -count=1 ./cmd/...` runs without compile errors; `TestTracesGet_TableMode_HumanReadable_RED` fails with expected assertion message.
+- [ ] `reports/repro-260528-issue17.md` exists with: curl command (env-var token), response sample (scrubbed), classification verdict, file:line evidence, observed server error-code strings.
+- [ ] No edits to `cmd/traces.go` or any production code.
+
+## Risk Assessment
+
+- **Risk:** no access to live gateway during planning. **Mitigation:** fixture stand-in marked `TODO: refresh`; Phase 2 must refresh before merge.
+- **Risk:** secrets leak into fixture despite `jq` recipe. **Mitigation:** the post-scrub `grep` check listed in step 2, plus a fail-loud check by the code-review subagent in Phase 3.
+- **Risk:** issue reporter's actual repro depends on a specific id that no longer exists. **Mitigation:** test against any valid id from `traces list`; if every id 404s for the reporter, that's the root cause and gets captured.
+
+## Security Considerations
+
+- **Smoke probe uses real auth token.** Token MUST come from env var, never inlined into report or fixture. Shell history risk: prefix probe commands with leading space (`HISTCONTROL=ignorespace`) or run from a subshell that won't persist history.
+- **Scrub recipe** uses `jq walk` to recursively redact values by key-name pattern (a denylist, so unlisted keys pass through unchanged); the `grep` post-check guards against residual secrets.
+- **Issue body to upstream** (if filed) reuses the same scrubbed fixture — never the raw one.
+
+## Next Steps
+
+Phase 2 picks up the red test (`TestTracesGet_TableMode_HumanReadable_RED`), adds 7 more tests for the remaining acceptance criteria, then implements until green.
diff --git a/plans/260528-1357-fix-trace-details-by-id/phase-02-cli-fix-output-and-error-mapping.md b/plans/260528-1357-fix-trace-details-by-id/phase-02-cli-fix-output-and-error-mapping.md
new file mode 100644
index 0000000..14515cc
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/phase-02-cli-fix-output-and-error-mapping.md
@@ -0,0 +1,255 @@
+---
+phase: 2
+title: 'CLI fix: output and error mapping'
+status: completed
+priority: P2
+effort: 2.5-3h
+dependencies:
+  - 1
+---
+
+# Phase 2: CLI fix: output and error mapping
+
+## Overview
+
+Turn the red test from Phase 1 green. Add 7 more tests (rendering, error categories, malformed-id validation). Then implement: human-readable render path inline in `tracesGetCmd.RunE`, checked unmarshal, strict id validation (blocks path traversal), and reliance on existing HTTP-status fallback for the four AC#3 error categories.
+
+## Locked decisions inherited from plan.md
+
+- **No `MapServerCode` extension.** 404→ExitNotFound, 403→ExitAuth, 5xx→ExitServerError already resolve via [cmd/helpers.go:152-172](cmd/helpers.go:152) → [internal/output/exit.go:46+](internal/output/exit.go:46) HTTP-status fallback.
+- **Inline render in `cmd/traces.go`**, no `cmd/traces_render.go`. Cap render at ~50 LOC.
+- **Inline `json.Unmarshal` + error check.** No new helper.
+- **Strict id allowlist:** `^[A-Za-z0-9._-]+$` AND non-empty AND not `.` AND not `..`. Then `url.PathEscape` on top.
+- **5xx test allows `calls >= 1`** because `internal/client/http.go` retries 3x.
+
+## Requirements
+
+**Functional**
+- Table/TTY mode renders a structured trace summary: header card (trace_id, agent_id, status, duration, tokens, cost — whichever Phase-1 fixture confirms exists) + span tree (via `output.PrintTree`) + flat events list (`EVENTS (n=N):` header + one line per event, no truncation).
+- JSON/YAML mode emits the decoded payload via the existing printer path — no field dropping by us. (Note: `json.NewEncoder` re-encodes with HTML escape + key reorder; "preserves nested fields" means structural preservation, not byte-for-byte.)
+- Empty/malformed server response returns a wrapped error, not a silent empty `{}`.
+- Malformed id (empty, whitespace, `..`, `/`, `\\`, control char, non-allowlist) returns validation error BEFORE the HTTP call. Exit 4.
+- 4 error categories produce distinguishable messages and exit codes per AC#3 via existing HTTP-status fallback.
+
+**Non-functional**
+- All render logic stays in `cmd/traces.go` (~50 LOC), called from `tracesGetCmd.RunE`. No new file. Pattern matches `tracesListCmd` ([cmd/traces.go:30-59](cmd/traces.go:30)).
+- Reuse `output.PrintTree` / `output.TreeNode` ([internal/output/tree.go:7](internal/output/tree.go:7)); no parallel tree renderer.
+- No new dependencies.
+
+## Architecture
+
+```
+goclaw traces get <id>
+  |
+  v
+[validate id: regex allowlist + reject "." / ".." / "/"]
+  |--invalid--> return validation error -> exit 4
+  v
+GET /v1/traces/{url.PathEscape(id)}
+  |
+  +-- APIError? ---> central handler via output.FromError
+  |                     |
+  |                     +-- StatusCode 404 -> exit 3 (via MapHTTPStatus)
+  |                     +-- StatusCode 403 -> exit 2 (via MapHTTPStatus)
+  |                     +-- StatusCode 5xx -> exit 5 (via MapHTTPStatus, after 3 retries)
+  |                     +-- StatusCode 400 -> exit 4 (server validation, if ever returned)
+  |
+  v
+json.Unmarshal(data, &trace) — checked, NOT silent
+  |--err--> wrap "decode trace payload" -> exit 5
+  v
+switch cfg.OutputFormat:
+  json|yaml -> printer.Print(trace)
+  table     -> inline render: header lines + PrintTree(spans) + events list
+```
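+
+The status-to-exit-code branches in the diagram reduce to a small switch. The sketch below is illustrative only: `exitCodeForStatus` and the constant names are hypothetical stand-ins for `MapHTTPStatus` and the `output.Exit*` values, and the generic fallback value of 1 is an assumption that this plan does not state.
+
+```go
+package main
+
+import "fmt"
+
+// Exit codes as listed in the diagram above. The names are illustrative,
+// and exitGeneric = 1 is an assumption.
+const (
+	exitOK          = 0
+	exitGeneric     = 1
+	exitAuth        = 2
+	exitNotFound    = 3
+	exitValidation  = 4
+	exitServerError = 5
+)
+
+// exitCodeForStatus is a hypothetical stand-in for the HTTP-status fallback.
+func exitCodeForStatus(status int) int {
+	switch {
+	case status >= 200 && status < 300:
+		return exitOK
+	case status == 400:
+		return exitValidation
+	case status == 403:
+		return exitAuth
+	case status == 404:
+		return exitNotFound
+	case status >= 500:
+		return exitServerError
+	default:
+		return exitGeneric
+	}
+}
+
+func main() {
+	for _, s := range []int{200, 400, 403, 404, 422, 500, 502} {
+		fmt.Printf("%d -> exit %d\n", s, exitCodeForStatus(s))
+	}
+}
+```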
+
+## Related Code Files
+
+- Modify: `cmd/traces.go` — `tracesGetCmd.RunE` rewritten with validation, checked unmarshal, inline render. Net add ~50-70 LOC.
+- Modify: `cmd/traces_get_test.go` — add 7 new tests on top of Phase 1's 3.
+- No new files.
+- No changes to `internal/output/exit.go` or `internal/output/error.go`.
+
+## Implementation Steps
+
+### 1. Extend test file (red first)
+
+Add these 7 tests to `cmd/traces_get_test.go`:
+
+```go
+// 4. Table render must include header card + span markers.
+func TestTracesGet_TableMode_HasHeaderAndSpanMarkers(t *testing.T) {
+    // serve fixture; runCmd with --output table
+    // assert stdout contains "TRACE_ID" header AND tree marker ("├─" or "└─" or "└" or "├")
+    // assert stdout does NOT start with '{'
+}
+
+// 5. JSON mode preserves structural keys observed in fixture.
+func TestTracesGet_JSONMode_PreservesStructure(t *testing.T) {
+    // serve fixture; runCmd with --output json
+    // parse stdout as JSON; assert every top-level key present in fixture is present in output
+    // assert nested key (e.g. spans[0].name) reachable
+}
+
+// 6. 404 → exit 3 + message contains "not found".
+func TestTracesGet_NotFound_ExitCode3(t *testing.T) {
+    // server returns 404 + {"ok":false, "error":{"code":"<observed code>", "message":"trace not found"}}
+    // assert err non-nil; output.FromError(err) == output.ExitNotFound (3)
+    // assert err.Error() contains "not found" (case-insensitive)
+}
+
+// 7. 403 → exit 2.
+func TestTracesGet_PermissionDenied_ExitCode2(t *testing.T) {
+    // server returns 403; assert output.FromError(err) == output.ExitAuth (2)
+}
+
+// 8. Malformed id rejected client-side; no HTTP call.
+func TestTracesGet_MalformedID_NoHTTPCall(t *testing.T) {
+    // record calls counter on httptest server
+    // try ids: "", "  ", "..", "../etc/passwd", "a/b", "a\\b", "a\x00b"
+    // each should: return err, exit code 4, AND not increment calls
+}
+
+// 9. 5xx → exit 5 (allow retries; do not assert call count).
+func TestTracesGet_ServerError_ExitCode5(t *testing.T) {
+    // server returns 500 always
+    // assert err non-nil; output.FromError(err) == output.ExitServerError (5)
+    // assert calls counter >= 1 (NOT == 1; retry adds calls)
+}
+
+// 10. Malformed JSON response surfaces decode error, not silent empty.
+func TestTracesGet_MalformedResponse_SurfacesError(t *testing.T) {
+    // server returns 200 + body "this is not json"
+    // assert err non-nil; err.Error() contains "decode" or "unmarshal"
+    // assert stdout is empty or contains the error, NOT a literal "{}"
+}
+```
+
+**Test harness rules:**
+- Every table-mode test passes `--output table` explicitly.
+- Every JSON-mode test passes `--output json` explicitly.
+- Use `httptest.NewServer` + `okJSON(t, w, payload)` envelope helper.
+- Use `t.Setenv` to point CLI at the test server.
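+
+Tests 8 and 9 both rely on a request counter on the test server. A minimal sketch of that scaffolding follows; `newCountingServer` is a hypothetical helper, and it writes the body itself rather than using the repo's `okJSON` helper, whose signature is not reproduced here.
+
+```go
+package main
+
+import (
+	"fmt"
+	"io"
+	"net/http"
+	"net/http/httptest"
+	"sync/atomic"
+)
+
+// newCountingServer (hypothetical) returns a test server that always answers
+// with the given status and body, plus a counter of requests received.
+func newCountingServer(status int, body string) (*httptest.Server, *atomic.Int64) {
+	calls := new(atomic.Int64)
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		calls.Add(1)
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(status)
+		io.WriteString(w, body)
+	}))
+	return srv, calls
+}
+
+func main() {
+	srv, calls := newCountingServer(http.StatusInternalServerError, `{"ok":false}`)
+	defer srv.Close()
+
+	// A plain http.Get does not retry, so one call is recorded here.
+	// A retrying client records more, hence the ">= 1" assertion in test 9.
+	resp, err := http.Get(srv.URL + "/v1/traces/trace_FIXTURE_001")
+	if err != nil {
+		fmt.Println("request failed:", err)
+		return
+	}
+	resp.Body.Close()
+	fmt.Println(resp.StatusCode, calls.Load() >= 1)
+}
+```
+
+For the malformed-id test the same counter is asserted to stay at zero, which proves validation ran before any request was sent.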
+
+### 2. Implement `tracesGetCmd.RunE` rewrite
+
+Replace [cmd/traces.go:61-75](cmd/traces.go:61) with (sketch):
+
+```go
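+// Needs: encoding/json, fmt, net/url, regexp, strings, github.com/spf13/cobra,
+// and the repo's internal/output package.
+var tracesGetCmd = &cobra.Command{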
+    Use:   "get <traceID>",
+    Short: "Get trace with span tree",
+    Args:  cobra.ExactArgs(1),
+    RunE: func(cmd *cobra.Command, args []string) error {
+        id := strings.TrimSpace(args[0])
+        if err := validateTraceID(id); err != nil {
+            return err  // returns ExitValidation via output.FromError
+        }
+        c, err := newHTTP()
+        if err != nil {
+            return err
+        }
+        data, err := c.Get("/v1/traces/" + url.PathEscape(id))
+        if err != nil {
+            return err  // APIError -> central handler maps via HTTP status
+        }
+        var trace map[string]any
+        if err := json.Unmarshal(data, &trace); err != nil {
+            return fmt.Errorf("decode trace payload: %w", err)
+        }
+        format := output.ResolveFormat(cfg.OutputFormat)
+        if format == "json" || format == "yaml" {
+            printer.Print(trace)
+            return nil
+        }
+        return renderTraceTable(trace)  // inline below in same file
+    },
+}
+
+// traceIDRegex is the strict allowlist for trace ids.
+var traceIDRegex = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)
+
+// validateTraceID enforces the strict allowlist (rejects path-traversal etc.)
+func validateTraceID(id string) error {
+    if id == "" || id == "." || id == ".." {
+        return validationErr("trace id is empty or invalid")
+    }
+    if !traceIDRegex.MatchString(id) {
+        return validationErr("trace id contains invalid characters (allowed: A-Z a-z 0-9 . _ -)")
+    }
+    return nil
+}
+
+// renderTraceTable prints header + span tree + events. ~30-40 LOC.
+func renderTraceTable(t map[string]any) error {
+    // defensive helpers — define inline or in cmd/helpers.go
+    s := func(k string) string { v, _ := t[k].(string); return v }
+    // header lines
+    fmt.Printf("TRACE_ID: %s\nAGENT_ID: %s\nSTATUS: %s\n", s("trace_id"), s("agent_id"), s("status"))
+    // span tree via output.PrintTree — adapt spans field per Phase-1 fixture shape
+    if spans, ok := t["spans"].([]any); ok && len(spans) > 0 {
+        root := buildSpanTree(spans)  // helper, ~15 LOC
+        fmt.Println()
+        output.PrintTree(root, "")
+    } else {
+        fmt.Println("(no spans)")
+    }
+    // events flat list
+    if evs, ok := t["events"].([]any); ok {
+        fmt.Printf("\nEVENTS (n=%d):\n", len(evs))
+        for _, e := range evs {
+            if m, ok := e.(map[string]any); ok {
+                fmt.Printf("  - %v\n", m["type"])  // or whatever key Phase-1 fixture confirms
+            }
+        }
+    }
+    return nil
+}
+```
+
+**`validationErr` helper**: produce an error that `output.FromError` maps to `ExitValidation` (4). If no such helper exists today, add a 2-line one (e.g. wrap in `APIError{Code: "INVALID_REQUEST"}` since `MapServerCode` covers that → exit 4). Verify via Phase 1 reading of `internal/output/exit.go` what the simplest path is.
+
+**`buildSpanTree`**: only define IF Phase 1 fixture confirms spans are hierarchical or have `parent_id`. If flat with no parent relationship, just render a flat list and skip the tree. **Do not guess the shape; let the fixture drive it.**
+
+### 3. Run tests until green
+
+```bash
+go test -count=1 ./cmd/... -run TestTracesGet
+go vet ./...
+go build ./...
+```
+
+### 4. Live smoke (manual)
+
+Re-run the four `goclaw traces get` invocations from Phase 1's smoke probe. Confirm:
+- TTY: readable header + span tree + events
+- `-o json`: round-trips through `jq` with all fields intact
+- 404 id: exit 3, message contains "not found"
+- 403 id (if reachable): exit 2
+
+## Success Criteria
+
+- [ ] All 3 Phase-1 tests + 7 Phase-2 tests pass under `go test -count=1`.
+- [ ] `go vet ./...` clean.
+- [ ] `go build ./...` clean.
+- [ ] `tracesGetCmd.RunE` + `renderTraceTable` + `validateTraceID` + `buildSpanTree` together ≤ 100 LOC of new code in `cmd/traces.go` (excluding doc comments).
+- [ ] Manual smoke: `goclaw traces get <real-id>` in TTY renders summary + span tree + events; `-o json` round-trips through `jq`.
+- [ ] `output.FromError` returns 0/2/3/4/5 correctly for the categories per test assertions.
+- [ ] Path-traversal attempt (`goclaw traces get ../etc/passwd`) returns exit 4 without making an HTTP request.
+- [ ] No plan-artifact references in production or test code (no "Phase 2", finding codes, etc.).
+
+## Risk Assessment
+
+- **Risk:** Phase-1 fixture reveals a span shape that doesn't match the hierarchical-or-flat assumption. **Mitigation:** `renderTraceTable` degrades gracefully (`(no spans)` line) if structure is unrecognized. Tree renderer only invoked if shape is recognizable.
+- **Risk:** server returns a 4xx that isn't 400/403/404 (e.g. 422). **Mitigation:** `MapHTTPStatus` likely already covers it via `4xx → ExitGeneric` or similar; verify in Phase 1 step 4 by reading `MapHTTPStatus` in `internal/output/exit.go` alongside the status fallback at `cmd/helpers.go:152-172`.
+- **Risk:** `renderTraceTable` becomes a kitchen-sink. **Mitigation:** ≤100 LOC cap; defer richer rendering (timeline, ASCII chart, color) to a future enhancement.
+
+## Security Considerations
+
+- **Path traversal blocked client-side** via `validateTraceID` allowlist. Even if a downstream proxy mishandles encoded `..`, the CLI refuses to send it.
+- **ANSI escape injection via span names.** Span `name` fields may contain LLM/tool output, which is potentially attacker-controlled. Current `traces list` ([cmd/traces.go:30-59](cmd/traces.go:30)) already renders user-controlled fields (`agent_id`, `status`) without sanitization, so `traces get` is structurally no worse. Span/event payloads are nonetheless a **materially higher risk surface** because their content is less constrained. Two options: accept the trade-off, or filter rendered strings with `strings.Map(safeRune, v)`. **Decide based on Phase-1 fixture inspection** (does the fixture have any non-printable bytes in span names?). If yes, add the filter; if no, accept and document.
+- **404/403 existence oracle.** AC#3 explicitly requires the distinction. Trade-off accepted, documented here. Mitigation belongs to the server (could randomize a small delay on 404 to defeat timing oracle), not the CLI.
+- **JSON full-payload exposure.** If server payload includes internal fields, `-o json` now shows them where table mode used to hide them via the broken render. This is correct behavior, not a regression — issue #17 explicitly wants both modes working. Documented.
+- **No new auth surface.** Existing `newHTTP()` token handling unchanged.
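+
+A minimal sketch of the `strings.Map(safeRune, v)` option named above, under stated assumptions: `safeRune` and `sanitize` are hypothetical names, and dropping every Unicode control character (rather than, say, replacing it with a placeholder) is one possible policy, not a decision this plan has made.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+    "unicode"
+)
+
+// safeRune is a hypothetical filter: it drops control characters, including
+// ESC (0x1b), which starts ANSI escape sequences, and keeps everything else.
+func safeRune(r rune) rune {
+    if unicode.IsControl(r) {
+        return -1 // strings.Map drops runes mapped to a negative value
+    }
+    return r
+}
+
+// sanitize applies the filter to a single rendered field.
+func sanitize(v string) string { return strings.Map(safeRune, v) }
+
+func main() {
+    name := "tool\x1b[31mcall\x1b[0m\n"
+    // The ESC bytes and the newline are removed; the now-inert "[31m" text stays.
+    fmt.Printf("%q\n", sanitize(name)) // prints "tool[31mcall[0m"
+}
+```
+
+Note that this policy also strips tabs and newlines, which is acceptable for single-line span names but may be too aggressive for multi-line event payloads.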
+
+## Next Steps
+
+Phase 3 updates docs and ships.
diff --git a/plans/260528-1357-fix-trace-details-by-id/phase-03-docs-and-ship.md b/plans/260528-1357-fix-trace-details-by-id/phase-03-docs-and-ship.md
new file mode 100644
index 0000000..685c4f8
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/phase-03-docs-and-ship.md
@@ -0,0 +1,97 @@
+---
+phase: 3
+title: "Docs and ship"
+status: pending
+priority: P2
+effort: "45m-1h"
+dependencies: [2]
+---
+
+# Phase 3: Docs and ship
+
+## Overview
+
+Update CHANGELOG, file upstream issue ONLY if Phase 1 classified server-side root cause, then ship via `/ck:ship beta` (target: `dev`). Closes issue #17.
+
+## Requirements
+
+**Functional**
+- CHANGELOG entry under `[Unreleased]` documenting the fix.
+- README `traces get` example updated IF output behavior materially changed (likely yes — TTY mode is new).
+- `docs/codebase-summary.md` reflects new id-validation + render path IF non-trivial (likely a one-line bullet).
+- Upstream issue filed at `digitopvn/goclaw` IFF Phase 1 classification was server-side or hybrid. PR body links it.
+- `gh issue close 17` runs after the dev→main promotion merges (or auto-closes via `Closes #17` in the promotion PR).
+
+**Non-functional**
+- Commit message follows conventional commits (`fix(cli): ...`).
+- PR body references `Closes #17` (will auto-close only when promoted to default branch), links Phase-1 repro report, lists acceptance-criteria mapping.
+
+## Architecture
+
+No code changes — docs + ship pipeline only.
+
+## Related Code Files
+
+- Modify: `CHANGELOG.md` — add entry under existing `[Unreleased]` block (append to P0–P6 section if present).
+- Modify: `README.md` — refresh `traces get` example block.
+- Modify: `docs/codebase-summary.md` — one bullet on the new traces-get behavior.
+- Modify: `plans/260528-1357-fix-trace-details-by-id/plan.md` — flip phase statuses via `ck plan check` (not hand-edit).
+- No source code edits.
+
+## Implementation Steps
+
+1. **Sync plan status via CLI** — `cd plans/260528-1357-fix-trace-details-by-id && ck plan check 1 && ck plan check 2 && ck plan check 3 --start`. (Phase 3 marked in-progress while shipping; flip to completed at the end via `ck plan check 3`.)
+
+2. **CHANGELOG** — add under `[Unreleased]`:
+   ```
+   ### Fixed
+   - `goclaw traces get <id>` — human-readable rendering for TTY mode (header + span tree + events), checked unmarshal (no more silent empty `{}`), strict id validation (rejects path-traversal), and distinct exit codes for not-found (3) / permission-denied (2) / malformed-id (4) / server-failure (5). Closes issue #17.
+   ```
+   If Phase 1 found a server-side root cause, add a `### Notes` line linking the upstream issue URL.
+
+3. **README** — locate `traces get` example. If absent, add to traces section showing expected human + JSON output samples (use scrubbed Phase-1 fixture as the JSON sample).
+
+4. **codebase-summary.md** — add bullet under traces/output section noting validated id + structured render.
+
+5. **Upstream issue (conditional)** — only if root cause was server-side or hybrid:
+   - Check rights first: `gh api repos/digitopvn/goclaw -q .permissions`. If no issue-create permission, instead post a comment on issue #17 with the drafted body for the maintainer to file.
+   - If rights present: `gh issue create -R digitopvn/goclaw --title "..." --body "..."` using the scrubbed body drafted in Phase 1's repro report.
+   - Add the issue URL to CHANGELOG `### Notes`.
+
+6. **Code review** — spawn `code-reviewer` subagent on the diff before ship. Reviewer must explicitly verify:
+   - No bearer tokens or PII in committed fixture/report (`grep -ri 'eyJ\|Bearer\|sk-' cmd/testdata/` returns no matches; `-r` is needed because the target is a directory).
+   - All 10 trace-get tests pass.
+   - `cmd/traces.go` net add ≤ 100 LOC.
+   - No plan-artifact references ("Phase 2", "F1", etc.) in production/test code.
+
+7. **Ship** — `/ck:ship beta` (target: `dev` branch).
+
+8. **Verify PR** — confirm PR body includes `Closes #17`, links Phase-1 repro report, lists acceptance-criteria mapping (1→fix; 2→table+JSON tests; 3→exit-code tests + path-traversal block; 4→upstream link if applicable; 5→`cmd/traces_get_test.go`).
+
+9. **Issue close after promotion** — PR targets `dev`; auto-close fires only when promoted to default branch. Document this in the issue with a comment: "Fixed in dev via PR #X; will auto-close on next `dev → main` promotion."
+
+## Success Criteria
+
+- [ ] CHANGELOG `[Unreleased]` entry present.
+- [ ] README `traces get` example reflects new behavior.
+- [ ] `docs/codebase-summary.md` updated.
+- [ ] Upstream `digitopvn/goclaw` issue filed/commented-for-filing (conditional on Phase 1 verdict).
+- [ ] code-reviewer subagent verdict: DONE, 0 Critical, 0 Important.
+- [ ] PR created against `dev`, all CI checks green.
+- [ ] Plan phases 1, 2, 3 all marked completed via `ck plan check`.
+- [ ] Comment posted on issue #17 explaining the dev → main promotion auto-close behavior.
+
+## Risk Assessment
+
+- **Risk:** PR auto-close via `Closes #17` only fires on default-branch merge; PR targets `dev` first. **Mitigation:** explicit comment on #17 explaining the timing; manual close after `dev → main` promotion if needed.
+- **Risk:** ship pipeline blocks on unrelated CI flakiness. **Mitigation:** investigate per failure; do not bypass with `--skip-tests`.
+- **Risk:** no rights on `digitopvn/goclaw`. **Mitigation:** fallback to issue-#17 comment with the drafted body (step 5 above).
+
+## Security Considerations
+
+- Fixture/report secret-scrub gate re-checked by code-reviewer subagent (step 6).
+- Upstream issue body reuses scrubbed content only.
+
+## Next Steps
+
+After merge to `dev`, the natural follow-up is `/ck:ship official` to promote `dev` → `main`. Out of scope for this plan.
diff --git a/plans/260528-1357-fix-trace-details-by-id/plan.md b/plans/260528-1357-fix-trace-details-by-id/plan.md
new file mode 100644
index 0000000..9258c3d
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/plan.md
@@ -0,0 +1,86 @@
+---
+title: 'Fix issue #17: cannot read trace details by trace id'
+description: >-
+  Restore reliable single-trace-by-id reads in goclaw-cli with explicit error
+  categorization, human-readable rendering, and regression tests. TDD mode.
+status: pending
+priority: P2
+branch: worktree-fix-trace-details-by-id
+tags:
+  - bugfix
+  - traces
+  - ai-ergonomics
+  - tdd
+blockedBy: []
+blocks: []
+created: '2026-05-28T06:58:02.755Z'
+createdBy: 'ck:plan'
+source: skill
+---
+
+# Fix issue #17: cannot read trace details by trace id
+
+## Overview
+
+`goclaw traces get <traceID>` exists ([cmd/traces.go:61](cmd/traces.go:61)) and calls `GET /v1/traces/{traceID}`, but the issue reporter observed that detail lookup "does not work reliably" with "unusable output". Verified CLI-side defects from scout:
+
+1. **No human-readable render path.** `tracesGetCmd` calls `printer.Print(unmarshalMap(data))`. In table mode, `output.Printer.Print` falls back to JSON dump ([internal/output/output.go:30-37](internal/output/output.go:30)) because the trace payload is a `map[string]any`, not a `*TableData`. Users in a TTY see an unindented blob.
+2. **Silent unmarshal failures.** `unmarshalMap` discards `json.Unmarshal` errors ([cmd/helpers.go:48-53](cmd/helpers.go:48)) — a malformed/empty server response renders as empty `{}`.
+3. **Path-traversal exposure.** `tracesGetCmd` concatenates `args[0]` into the URL with no `url.PathEscape` and no allowlist. Escaping alone would not close the gap, because `url.PathEscape("..")` returns a literal `..`. Bad ids reach the server.
+4. **No regression test.** `cmd/traces_get_test.go` does not exist.
+
+Plus one acceptance gap:
+- No client-side categorization between not-found / permission-denied / malformed-id / server-failure. The exit-code-mapping infrastructure already exists ([internal/output/exit.go:18-44](internal/output/exit.go:18)): `MapServerCode` covers `NOT_FOUND`→3, `TENANT_ACCESS_REVOKED`→2, `INVALID_REQUEST`→4, `INTERNAL`/5xx→5. `cmd/helpers.go:152-172` derives codes from raw HTTP status as a fallback. **No `MapServerCode` extension needed.**
+
+Server-vs-CLI root cause is most likely CLI-side per the above, but Phase 1 captures a real response shape against a live gateway and confirms the failure mode before Phase 2 implementation.
+
+**Acceptance criteria** (verbatim from [issue #17](https://github.com/nextlevelbuilder/goclaw-cli/issues/17)):
+1. `goclaw-cli` can read trace details by id.
+2. Output works in JSON and human-readable modes.
+3. Errors clearly distinguish: not found, permission denied, malformed id, and server/API failure.
+4. If API is the root cause, document/link the server-side issue in `digitopvn/goclaw`.
+5. Add regression test or smoke test for trace detail lookup.
+
+**Out of scope (verbatim from invocation):**
+- Redesigning the traces domain.
+- WS streaming for trace events.
+- Anything beyond the single-trace-by-id read path.
+
+## Locked decisions (from red-team review 2026-05-28)
+
+These supersede earlier drafts and are not re-litigated during implementation:
+
+- **No `MapServerCode` extension.** HTTP-status fallback at `cmd/helpers.go:152-172` already covers 403→ExitAuth, 404→ExitNotFound, 5xx→ExitServerError. Server-code names are NOT speculated in tests.
+- **Inline render, no `cmd/traces_render.go`.** Follows the established pattern of `tracesListCmd` ([cmd/traces.go:30-59](cmd/traces.go:30)). Cap render code at ~50 LOC inside `tracesGetCmd.RunE`.
+- **Inline `json.Unmarshal` with error check.** No new `unmarshalMapStrict` helper. Pattern: `var trace map[string]any; if err := json.Unmarshal(data, &trace); err != nil { return fmt.Errorf("decode trace payload: %w", err) }`.
+- **Strict id validation.** Reject empty/whitespace, `..`, leading/trailing `/`, embedded `/`, `\\`, control chars. Use a small allowlist regex (e.g. `^[A-Za-z0-9._-]+$`) AND `len > 0` AND not equal to `.` or `..`. Then `url.PathEscape` on top.
+- **Test harness forces table mode explicitly.** Every table-mode test in `cmd/traces_get_test.go` passes `--output table` via `runCmd(t, "traces", "get", id, "--output", "table")` — `go test` stdout is piped, so default would be JSON.
+- **Span tree IS in scope.** The command's existing `Short` description says "Get trace with span tree" ([cmd/traces.go:62](cmd/traces.go:62)) and AC #2 requires human-readable mode. Reuse `output.PrintTree` / `output.TreeNode` ([internal/output/tree.go:7](internal/output/tree.go:7)). No new tree renderer.
+- **Events: simple flat list, no truncation.** Print `EVENTS (n=N):` header then one line per event. No first-5-last-1 polish.
+- **5xx auto-retry awareness.** `internal/client/http.go` retries 3x on 5xx; Phase-2 5xx tests assert `calls >= 1` (not equal) and verify exit-code-5, not call count.
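+
+The strict-id decision above can be sketched as follows. This is an illustration only: `tracePath` is a hypothetical helper, plain `errors.New` stands in for whatever validation-error type the CLI ends up using, and the `/v1/traces/` prefix is taken from the Overview.
+
+```go
+package main
+
+import (
+    "errors"
+    "fmt"
+    "net/url"
+    "regexp"
+)
+
+var traceIDRegex = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)
+
+// validateTraceID applies the allowlist plus the explicit "." / ".." rejection.
+func validateTraceID(id string) error {
+    if id == "" || id == "." || id == ".." {
+        return errors.New("trace id is empty or invalid")
+    }
+    if !traceIDRegex.MatchString(id) {
+        return errors.New("trace id contains invalid characters")
+    }
+    return nil
+}
+
+// tracePath (hypothetical) validates first, then escapes as defense in depth.
+func tracePath(id string) (string, error) {
+    if err := validateTraceID(id); err != nil {
+        return "", err
+    }
+    return "/v1/traces/" + url.PathEscape(id), nil
+}
+
+func main() {
+    // Dots are unreserved characters, so PathEscape leaves ".." untouched.
+    fmt.Println(url.PathEscape(".."))
+    for _, id := range []string{"abc-123", "..", "a/b", " "} {
+        p, err := tracePath(id)
+        fmt.Printf("%q -> %q, err=%v\n", id, p, err)
+    }
+}
+```
+
+The ordering matters: the allowlist is what blocks traversal, and the escape step only guards against a future loosening of the regex.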
+
+## Phases
+
+| Phase | Name | Status |
+|-------|------|--------|
+| 1 | [Repro and root cause](./phase-01-repro-and-root-cause.md) | Completed |
+| 2 | [CLI fix: output and error mapping](./phase-02-cli-fix-output-and-error-mapping.md) | Completed |
+| 3 | [Docs and ship](./phase-03-docs-and-ship.md) | Pending |
+
+## TDD discipline (per `--tdd`)
+
+Each phase opens with red tests describing the desired behavior, then implementation lands until tests are green. Specifically:
+- **Phase 1** — write 3 red tests in `cmd/traces_get_test.go` reproducing the bug against a mock `httptest` server with a realistic envelope; capture the exact payload shape from a live gateway smoke probe (or document gateway-unreachable escape).
+- **Phase 2** — extend the test file with cases for human-readable rendering + 4 error categories + path-traversal validation before implementation.
+- **Phase 3** — no new tests, only verification + ship.
+
+## Dependencies
+
+None. P6 (HEAD = `2801486`) provides all required prerequisite surfaces.
+
+## Risk and rollback
+
+- **Risk:** root cause is purely server-side. Mitigation: Phase 1 explicitly classifies CLI vs server; if server, Phase 2 narrows to error-mapping + smoke-test + upstream-issue filing, code change to traces command itself is minimal.
+- **Risk:** trace payload shape (spans/events/messages structure) differs from speculation. Mitigation: Phase 1 captures the real fixture FIRST; Phase 2 render code matches the captured shape, not a guess.
+- **Risk:** existence oracle from distinct 404 / 403 messages. AC #3 explicitly requires the distinction — accepted trade-off, documented in Phase 2 Security Considerations.
+- **Rollback:** all changes confined to `cmd/traces.go` and `cmd/traces_get_test.go`. Single revert.
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/code-review-260528-issue17.md b/plans/260528-1357-fix-trace-details-by-id/reports/code-review-260528-issue17.md
new file mode 100644
index 0000000..8285329
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/code-review-260528-issue17.md
@@ -0,0 +1,40 @@
+# Code review — issue #17 fix (pre-ship)
+
+Date: 2026-05-28
+Phase: 3 (step 6 — pre-ship gate)
+Reviewer: `code-reviewer` subagent
+
+## Verdict
+
+**DONE_WITH_CONCERNS** — 0 Critical, 0 Important, 4 informational. Ship-ready.
+
+## Gate checks (all pass)
+
+| Check | Result |
+|------|--------|
+| `grep -ri 'eyJ\|Bearer\|sk-\|token=' cmd/testdata/` returns 0 lines | ✅ |
+| All 10 trace-get tests pass + full repo suite green | ✅ |
+| `cmd/traces.go` net add ≤ 100 LOC (advisory) | ⚠️ 107 — accepted trade-off for `buildSpanTree` clarity |
+| No plan-artifact refs in source/comments | ✅ |
+
+## Verified behavior
+
+- `buildSpanTree` cycle safety: spans in an A↔B cycle are silently dropped (they never appear in `children[""]`, so `build()` is never invoked on them). No infinite recursion is possible.
+- `validateTraceID` gates: 9 malformed-id cases all rejected pre-HTTP. Confirmed by `TestTracesGet_MalformedID_NoHTTPCall` (server hit count = 0 across the table).
+- HTTP retry-body fix: `if attempt == 2 { break }` correctly skips the per-iteration `Close()` so the outer `defer resp.Body.Close()` handles cleanup exactly once. No leak, no double-close.
+- Error mapping: 404→3, 403→2, 5xx→5, malformed→4 — all asserted, all pass.
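+
+The cycle-safety argument can be illustrated with a simplified sketch. It is not the shipped `buildSpanTree`: the keys `span_id`, `parent_id`, and `name` are assumed for illustration, the local `TreeNode` only mirrors the `{Name, Children}` shape, and the duplicate-id guard is an extra assumption that keeps the recursion bounded.
+
+```go
+package main
+
+import "fmt"
+
+// TreeNode mirrors the {Name, Children} shape; defined locally for the sketch.
+type TreeNode struct {
+    Name     string
+    Children []*TreeNode
+}
+
+// buildSpanTree links spans by the ASSUMED keys span_id / parent_id / name.
+func buildSpanTree(spans []any) *TreeNode {
+    children := map[string][]map[string]any{} // parent id -> child spans
+    seen := map[string]bool{}
+    for _, raw := range spans {
+        m, ok := raw.(map[string]any)
+        id, _ := m["span_id"].(string)
+        if !ok || id == "" || seen[id] {
+            continue // skip malformed, id-less, or duplicate spans
+        }
+        seen[id] = true
+        parent, _ := m["parent_id"].(string)
+        children[parent] = append(children[parent], m)
+    }
+    var build func(m map[string]any) *TreeNode
+    build = func(m map[string]any) *TreeNode {
+        name, _ := m["name"].(string)
+        id, _ := m["span_id"].(string)
+        node := &TreeNode{Name: name}
+        for _, c := range children[id] {
+            node.Children = append(node.Children, build(c))
+        }
+        return node
+    }
+    // Only spans with no parent seed the walk, so spans whose parent chain
+    // never reaches "" (such as an A<->B cycle) are never visited.
+    root := &TreeNode{Name: "trace"}
+    for _, top := range children[""] {
+        root.Children = append(root.Children, build(top))
+    }
+    return root
+}
+
+func main() {
+    spans := []any{
+        map[string]any{"span_id": "1", "name": "agent.run"},
+        map[string]any{"span_id": "2", "parent_id": "1", "name": "llm.call"},
+        map[string]any{"span_id": "A", "parent_id": "B", "name": "cycle-a"},
+        map[string]any{"span_id": "B", "parent_id": "A", "name": "cycle-b"},
+    }
+    root := buildSpanTree(spans)
+    fmt.Println(len(root.Children), root.Children[0].Name, root.Children[0].Children[0].Name)
+}
+```
+
+With unique, non-empty ids each span has exactly one parent, so anything reachable from the parentless set is acyclic by construction.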
+
+## Informational concerns
+
+1. **107 LOC vs 100 advisory cap.** Eliminating 7 lines would force inlining `parentOf` or removing the orphan-detection branch — clarity loss > LOC gain. Accept.
+2. **Fixture `_TODO_refresh` marker.** Intentional per Phase 1 reviewer gate — refresh against `goclaw.zuey.me` before final merge.
+3. **`tracesExportCmd` un-validated id (`cmd/traces.go:207`).** Pre-existing. Same path-traversal/control-char inputs that `validateTraceID` blocks flow through `export` unfiltered. Out of scope here — spawn follow-up task.
+4. **README exit-code list missing 6.** Fixed inline (added "rate-limit / network-resource exhaustion → 6").
+
+## Recommended actions (all addressed or scheduled)
+
+| Action | Status |
+|--------|--------|
+| README exit-6 mention | Fixed inline this turn |
+| Fixture refresh against live gateway | Plan-gated; auth-blocked smoke probe documented in Phase 1 repro |
+| Harden `tracesExportCmd` with `validateTraceID` + `url.PathEscape` | Spawn follow-up task |
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/red-team-assumption-destroyer.md b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-assumption-destroyer.md
new file mode 100644
index 0000000..d3685a7
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-assumption-destroyer.md
@@ -0,0 +1,110 @@
+# Red-Team: Assumption Destroyer — Issue #17 Plan
+
+## Verdict: REVISE
+
+Most structural claims verify, but two findings materially impact the plan:
+- The `MapServerCode` table does NOT contain `TRACE_NOT_FOUND` or `PERMISSION_DENIED` today (plan says "may need adding" — it's not "may", it's required).
+- The trace detail payload shape (`spans`/`events`/`messages` keys, hierarchical-vs-flat span tree) is purely speculated. No fixture exists in repo. Phase 1 must capture it before Phase 2 can implement `renderTraceDetail`/`buildSpanTree`.
+
+## Findings
+
+### Important — `tracesGetCmd` line number verified
+- **Assumption:** "`cmd/traces.go` has `tracesGetCmd` at line 61"
+- **Status:** PASS
+- **Evidence:** `cmd/traces.go:61` reads `var tracesGetCmd = &cobra.Command{`
+- **Impact:** None. Plan citation is accurate.
+
+### Important — `unmarshalMap` silently swallows errors
+- **Assumption:** "`unmarshalMap` at `cmd/helpers.go:48-53` silently swallows errors"
+- **Status:** PASS
+- **Evidence:** `cmd/helpers.go:48-53`:
+  ```go
+  func unmarshalMap(data json.RawMessage) map[string]any {
+      var m map[string]any
+      _ = json.Unmarshal(data, &m)
+      return m
+  }
+  ```
+  The `_ =` discards the error. Same flaw at `unmarshalList` (lines 41-46).
+- **Impact:** Confirms Phase 2 step 3 ("Replace silent `unmarshalMap` call with checked unmarshal") is correctly motivated. Note: the same anti-pattern is used across `tracesListCmd`, `usageSummaryCmd`, etc. — fixing only at the `traces get` call site is the minimum fix per the issue scope; the broader code smell is out of scope.
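+
+To make the difference concrete, here is a small sketch contrasting the lossy helper with a checked decode. `unmarshalMapLossy` copies the body quoted above under a new name, and `decodeTrace` is a hypothetical name for the checked variant; neither is claimed to be the final code.
+
+```go
+package main
+
+import (
+    "encoding/json"
+    "fmt"
+)
+
+// unmarshalMapLossy reproduces the quoted helper: the decode error is discarded.
+func unmarshalMapLossy(data json.RawMessage) map[string]any {
+    var m map[string]any
+    _ = json.Unmarshal(data, &m)
+    return m
+}
+
+// decodeTrace (hypothetical) surfaces the decode error to the caller instead.
+func decodeTrace(data json.RawMessage) (map[string]any, error) {
+    var trace map[string]any
+    if err := json.Unmarshal(data, &trace); err != nil {
+        return nil, fmt.Errorf("decode trace payload: %w", err)
+    }
+    return trace, nil
+}
+
+func main() {
+    truncated := json.RawMessage(`{"trace_id": `)
+
+    // Lossy path: the caller gets an empty map and no hint that decoding failed.
+    fmt.Println("lossy entries:", len(unmarshalMapLossy(truncated)))
+
+    // Checked path: the same input produces an error the command can return.
+    if _, err := decodeTrace(truncated); err != nil {
+        fmt.Println("checked:", err)
+    }
+}
+```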
+
+### Important — Printer table fallback dumps JSON
+- **Assumption:** "`printer.Print` falls back to JSON dump in table mode at `internal/output/output.go:30-37`"
+- **Status:** PASS
+- **Evidence:** `internal/output/output.go:31-37`:
+  ```go
+  default:
+      // Table format requires TableData
+      if td, ok := data.(*TableData); ok {
+          p.printTable(td)
+      } else {
+          p.printJSON(data) // fallback for non-table data
+      }
+  ```
+  A `map[string]any` is NOT `*TableData`, so it hits `printJSON`. In a TTY this dumps formatted JSON — useless as a "table".
+- **Impact:** Confirms the root-cause hypothesis H1 in Phase 1.
+
+### Important — `output.PrintTree` / `TreeNode` exist
+- **Assumption:** "`output.PrintTree` and `TreeNode` exist at `internal/output/tree.go:7-36` for reuse"
+- **Status:** PASS
+- **Evidence:** `internal/output/tree.go:8-12` defines `TreeNode{Name, Children}`; `tree.go:16-27` defines `PrintTree(node, w, prefix, isLast)`; `tree.go:30-35` defines `PrintTreeRoot`.
+- **Impact:** Plan's reuse plan in Phase 2 step 2 is valid. **However**: `PrintTree` takes an `io.Writer` — plan must remember to pass `os.Stdout` (the existing API doesn't write to stdout by default).
+
+### Critical — `MapServerCode` does NOT include `TRACE_NOT_FOUND` or `PERMISSION_DENIED`
+- **Assumption:** "extend `MapServerCode` if `TRACE_NOT_FOUND` / `PERMISSION_DENIED` not yet mapped" (Phase 2, line 64)
+- **Status:** FAIL (semantically — plan hedges "if not yet mapped" but it's NOT mapped; phrasing implies "may" or "may not", which is misleading)
+- **Evidence:** `internal/output/exit.go:17-39` `serverCodeMap` contains: `UNAUTHORIZED`, `NOT_PAIRED`, `TENANT_ACCESS_REVOKED`, `NOT_FOUND`, `NOT_LINKED`, `INVALID_REQUEST`, `FAILED_PRECONDITION`, `ALREADY_EXISTS`, `INTERNAL`, `UNAVAILABLE`, `AGENT_TIMEOUT`, `RESOURCE_EXHAUSTED`. **No** `TRACE_NOT_FOUND`. **No** `PERMISSION_DENIED`. Currently both return `ExitGeneric` (1) via `MapServerCode`.
+- **Impact:** Phase 2 step 4 must commit to ADDING these codes, not just "confirming". Otherwise the `TestTracesGet_NotFound_ExitCode3` test will fail because the code-string path returns 1, and only the HTTP-status fallback path saves it (404→3 via `MapHTTPStatus` at `exit.go:54`). The `FromError` logic at `error.go:144-154` does fall through to `MapHTTPStatus` when `MapServerCode == ExitGeneric`, so the test will likely pass via the status-code fallback — but the plan should be explicit that the code-string mapping is currently a no-op, and acknowledge that the fallback is what actually saves the assertion.
+- **Suggested fix:** Update Phase 2 step 4 from "Confirm `MapServerCode("TRACE_NOT_FOUND")` returns `ExitNotFound`" to "**ADD** `TRACE_NOT_FOUND → ExitNotFound` and `PERMISSION_DENIED → ExitAuth` to `serverCodeMap` in `internal/output/exit.go:17`." Also rerun `internal/output/exit_test.go` after the change.
+
+### Critical — `TRACE_NOT_FOUND` / `PERMISSION_DENIED` are guessed names, not verified
+- **Assumption:** "server returns error codes like `TRACE_NOT_FOUND` and `PERMISSION_DENIED`"
+- **Status:** UNVERIFIABLE (zero evidence in repo)
+- **Evidence:** `grep -rn "TRACE_NOT_FOUND\|PERMISSION_DENIED"` across `/internal/`, `/cmd/`, `/docs/` returns **only** matches inside the plan documents themselves. Sibling test `cmd/traces_follow_test.go` does not exercise an error envelope at all. `apiErrorCodeForStatus` at `cmd/helpers.go:152-172` shows the **CLI's** notion of canonical codes: 403 → `TENANT_ACCESS_REVOKED`, not `PERMISSION_DENIED`. 404 → `NOT_FOUND`, not `TRACE_NOT_FOUND`.
+- **Impact:** If the server actually returns `NOT_FOUND` (already mapped to ExitNotFound at `exit.go:24`) and `TENANT_ACCESS_REVOKED` (already mapped to ExitAuth at `exit.go:21`), then no `MapServerCode` extension is needed — the existing mapping already handles them. The plan's Phase 2 step 4 risks adding dead code if `TRACE_NOT_FOUND` is never emitted by the upstream `digitopvn/goclaw` server.
+- **Suggested fix:** Phase 1 step 1 (smoke probe) MUST capture an actual 404 error envelope from the live gateway to lock the real `code` field value. Drop speculative codes from Phase 2 plan until that fixture is captured. Tests should assert on the actual code returned, not a guessed name.
+
+### Important — `okJSON` and `runCmd` test helpers exist
+- **Assumption:** "`okJSON(t, w, payload)` and `runCmd(t, args...)` helpers exist"
+- **Status:** PASS
+- **Evidence:** `cmd/phase5_test.go:13-19` defines `okJSON`; `cmd/phase5_test.go:21-27` defines `runCmd`. Both used across `cmd/traces_follow_test.go`, `cmd/activity_aggregate_test.go`, etc.
+- **Impact:** None — helpers available.
+- **Caveat:** `okJSON` wraps payload in `{"ok": true, "payload": ...}`. If the live `/v1/traces/{id}` envelope is structurally different (e.g. payload at root, no `ok` wrapper, or under a different key), tests built on `okJSON` will be testing a fake envelope. This matters for "TestTracesGet_HappyPath_*" — verify Phase 1 smoke probe captures the **real** envelope before assuming the `okJSON` shape applies.
+
+### Critical — Trace payload shape (`spans`/`events`/`messages`) is speculation
+- **Assumption:** "trace payload has `spans` (hierarchical or flat with parent_id), `events`, `messages`"
+- **Status:** UNVERIFIABLE
+- **Evidence:** Only on-disk reference to these field names in the context of `/v1/traces/{id}`:
+  - `cmd/traces.go:51-54` (`tracesListCmd` columns): `trace_id`, `agent_id`, `status`, `duration_ms`, `input_tokens`, `output_tokens`, `cost`. **No** `spans`, `events`, `messages`.
+  - `cmd/traces_follow_test.go:34, 130-134`: envelope contains `traces`, `spans_by_trace_id`, `next_since` — but that's the **list/follow** shape, not the single-trace-detail shape.
+  - `cmd/traces.go:62` Short: "Get trace with span tree" — implies spans exist but says nothing about format.
+  - No fixture file under `cmd/testdata/` (the directory does not exist).
+- **Impact:** Phase 2 plans `renderTraceDetail` and `buildSpanTree` against an unknown shape. If spans are returned as `spans_by_trace_id[trace_id]` (the only known wire format from `traces_follow`), span linkage is **flat** under a map keyed by trace_id — different from "hierarchical with `children` or flat with `parent_id`" as plan speculates. Implementation could be built against the wrong shape.
+- **Suggested fix:** Phase 1 is mandatory **before** any Phase 2 implementation. Phase 2 step 2 ("Implement `renderTraceDetail`") must explicitly block on Phase 1's captured fixture. Make this dependency loud in `phase-02-cli-fix-output-and-error-mapping.md` (currently only `dependencies: [1]` in frontmatter).
+
+### Minor — No existing secret-stripped fixture pattern in repo
+- **Assumption:** "Strip secrets" / "Do NOT commit fixture with auth header or unredacted user IDs"
+- **Status:** UNVERIFIABLE (no precedent to follow)
+- **Evidence:** `find . -type d -name testdata` returns no results — no `cmd/testdata/` exists, no fixtures committed today.
+- **Impact:** Phase 1 will introduce the first on-disk fixture pattern for this repo. Without a precedent, the reviewer has to design the redaction rules. Risk: under-redaction (PII slips through) or over-redaction (fixture becomes useless for testing real shape edge cases).
+- **Suggested fix:** Phase 1 should list explicit fields to scrub: `user_id`, `tenant_id`, `agent_id` content if proprietary, `messages[].content`, `events[].args`, any token-bearing fields. Use sentinel placeholders (`user-redacted`, `tenant-redacted`, `[REDACTED]`). Document the redaction rules in the report so future trace fixtures follow them.
+
+### Minor — `Closes #17` auto-close requires merge to default branch
+- **Assumption:** "PR auto-close via `Closes #17` only fires when merged into the default branch (`main`). PR targets `dev` first." (phase-03, risk section)
+- **Status:** PASS (correctly self-flagged in plan)
+- **Evidence:** Default branch is `main` (per gitStatus header and CLAUDE.md). Plan targets `dev`. GitHub auto-closes referenced issues only when the PR merges to the default branch. The plan already calls this out and proposes manual close after dev→main promotion.
+- **Impact:** None — plan is self-consistent. Issue #17 is verified to exist in `nextlevelbuilder/goclaw-cli` (the correct repo per `git remote -v`).
+
+### Minor — `Get` returns `json.RawMessage` (envelope already unwrapped)
+- **Observation (bonus):** `HTTPClient.Get` at `internal/client/http.go:47` returns `(json.RawMessage, error)`. The plan's diagrams hint at this but never explicitly state what `data` is when `c.Get(...)` returns. Worth verifying whether `Get` strips the `{ok, payload}` envelope before returning — otherwise `unmarshalMap(data)` operates on the envelope, not the inner payload, and tests using `okJSON` wrap exactly the envelope shape that `Get` would already have unwrapped.
+- **Status:** UNVERIFIABLE without reading `internal/client/http.go` in full — but the plan does not document this contract, and the test envelope wrapping in `okJSON` strongly implies `Get` does unwrap. Phase 1's repro test must confirm the layer at which `data` is observed.
+- **Suggested fix:** Phase 1 step 1 add a sub-step: "Document what shape `data := c.Get(...)` returns — envelope or payload — by reading `internal/client/http.go`'s Get implementation."
+
+## Open Questions
+
+- Does the live `/v1/traces/{id}` endpoint exist on the current `digitopvn/goclaw` server, or is the path different (e.g. `/v1/traces/get/{id}`, `/v1/traces?id=...`, `/v1/tenants/{tenant}/traces/{id}`)? Plan assumes the path works; H5 in Phase 1 nominally checks this but doesn't allocate a discrete acceptance criterion to the answer.
+- Are span IDs/parent linkage fields named `span_id`/`parent_id`, `id`/`parent`, `id`/`parent_span_id`, or something else? `renderTraceDetail`/`buildSpanTree` cannot be designed until the fixture lands.
+- If the server returns a 401/403 due to tenant mismatch on `traces get` but allows `traces list`, that's a server bug, not a CLI bug — should Phase 3's upstream-issue path be widened to "auth/tenant filtering" beyond pure not-found regressions?
+
+Status: DONE
+Severity counts: 3 Critical, 5 Important, 3 Minor.
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/red-team-failure-mode-analyst.md b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-failure-mode-analyst.md
new file mode 100644
index 0000000..8300708
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-failure-mode-analyst.md
@@ -0,0 +1,99 @@
+# Red-Team: Failure Mode Analyst — Issue #17 Plan
+
+## Verdict: REVISE
+
+Plan is mostly sound but ships with **4 critical correctness gaps** that will cause test flakes or false-positive merges, and **3 important gaps** in test scaffolding/spec language. Two minor items also called out. Recommend revisions to phase-01 and phase-02 before TDD lands.
+
+## Failure Modes
+
+### Critical — Auto-retry inflates request count on 5xx/429 tests
+- **Mode:** `internal/client/http.go` `do()` retries up to **3 attempts** on `StatusCode == 429 || StatusCode >= 500`, with 1s + 2s backoff between attempts. Phase-2 test `TestTracesGet_ServerError_ExitCode5` will see **3 hits**, not 1, plus a ~3s sleep cost. Phase-1 test `TestTracesGet_MalformedID_ValidatedClientSide` `calls == 0` assertion is fine, but any future "exactly N calls" assertion on 5xx is wrong.
+- **Status:** EXPOSED
+- **Evidence:** `internal/client/http.go:138-168`: `for attempt := range 3 { ... if resp.StatusCode != 429 && resp.StatusCode < 500 { break } ... if attempt < 2 { time.Sleep(time.Duration(1<<attempt) * time.Second) }`. The plan's `TestTracesGet_ServerError_5xx` description in phase-01.md:78 and phase-02.md:74 says "server returns 500; assert distinct error category" — silent on call count, but the ~3s sleep will slow the whole suite.
+- **Impact:** Test suite gets ~3s slower per 5xx case; anyone who asserts `calls == 1` gets a red test. 403 is not affected: `403 < 500` and `!= 429`, so there is no retry (good), but the behavior is worth pinning with a test.
+- **Suggested mitigation:** phase-02 §1 — add to `TestTracesGet_ServerError_ExitCode5`: "server handler counts calls; assert `calls == 3` (auto-retry contract) and uses `time.Sleep` budget < 4s or stub clock". phase-02 §3 — note `403 PERMISSION_DENIED` does NOT retry (good). Optionally inject a retry-disabled client for tests via env knob or `HTTPClient.MaxRetries` — but that's a P3 scope decision.
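+
+The retry contract above can be pinned without paying the real backoff. The following is a minimal sketch, not the project's `do()` implementation: `doWithRetry`, `countCalls`, and the injectable `sleep` parameter are hypothetical names, and the loop only mirrors the status check and backoff quoted in the evidence.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"sync/atomic"
+	"time"
+)
+
+// doWithRetry is a hypothetical stand-in for the client's retry loop:
+// up to 3 attempts, retrying only on 429 or 5xx, with 1s then 2s backoff.
+// The sleep function is injected so tests can record delays instead of waiting.
+func doWithRetry(url string, sleep func(time.Duration)) (int, error) {
+	status := 0
+	for attempt := 0; attempt < 3; attempt++ {
+		resp, err := http.Get(url)
+		if err != nil {
+			return 0, err
+		}
+		status = resp.StatusCode
+		resp.Body.Close()
+		if status != 429 && status < 500 {
+			break
+		}
+		if attempt < 2 {
+			sleep(time.Duration(1<<attempt) * time.Second)
+		}
+	}
+	return status, nil
+}
+
+// countCalls starts a test server that always answers with code and reports
+// how many requests arrived and how much backoff was requested.
+func countCalls(code int) (int32, time.Duration, error) {
+	var calls int32
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		atomic.AddInt32(&calls, 1)
+		w.WriteHeader(code)
+	}))
+	defer srv.Close()
+
+	var slept time.Duration
+	_, err := doWithRetry(srv.URL, func(d time.Duration) { slept += d })
+	return atomic.LoadInt32(&calls), slept, err
+}
+
+func main() {
+	for _, code := range []int{500, 429, 403} {
+		calls, slept, err := countCalls(code)
+		fmt.Println(code, calls, slept, err)
+	}
+}
+```
+
+Under these assumptions a 500 or 429 handler is hit three times with 3s of requested backoff, while a 403 handler is hit once. Whether the real client should expose a similar sleep hook is the same scope decision as the `MaxRetries` knob.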
+
+### Critical — `MapServerCode` extension would cross-impact every other command
+- **Mode:** Phase-02 §4 says "extend `MapServerCode` if `TRACE_NOT_FOUND` / `PERMISSION_DENIED` not yet mapped." But the existing map (`internal/output/exit.go:17-39`) uses `NOT_FOUND` (generic) and `TENANT_ACCESS_REVOKED` (auth-403). Adding `TRACE_NOT_FOUND` is **safe** (it's net-new). Adding `PERMISSION_DENIED` would be a **new key** that other commands' server responses may already use under another flow, silently re-routing their exit codes from `1` (unmapped/generic) to `2` (auth). No cross-impact audit in the plan.
+- **Status:** EXPOSED
+- **Evidence:** `internal/output/exit.go:17-39` shows current map; `cmd/helpers.go:152-172` `apiErrorCodeForStatus` returns `TENANT_ACCESS_REVOKED` for 403, never `PERMISSION_DENIED`. But server may return `PERMISSION_DENIED` directly as a body code — `apiErrorFromRawBody` at `cmd/helpers.go:139-143` would pass it through verbatim. Today it'd hit `MapServerCode` and fall through to `MapHTTPStatus(403)` → `ExitAuth` (already correct). So adding it to the map is **a no-op for traces** (HTTP fallback already handles it) but **a behavior change** for any caller whose server returns `PERMISSION_DENIED` with status 200 / no-status / 5xx.
+- **Impact:** Risk is low but the plan doesn't make the trade-off explicit. Worse: extending the map silently changes the value of `MapServerCode("PERMISSION_DENIED")` from `ExitGeneric` to `ExitAuth`, which is observable via the existing `MapServerCode` unit tests in `internal/output/exit_test.go:43-46` — those tests verify "unknown returns `ExitGeneric`", not specifically `PERMISSION_DENIED`, so they pass either way. But any future test/caller relying on the current generic mapping breaks silently.
+- **Suggested mitigation:** phase-02 §4 — say "Do **not** extend `MapServerCode`; rely on `MapHTTPStatus` 403 fallback. Add a unit test in `internal/output/exit_test.go` pinning `MapHTTPStatus(403) == ExitAuth` and `MapHTTPStatus(404) == ExitNotFound` for the explicit `PERMISSION_DENIED` / `TRACE_NOT_FOUND` case names." If extension is required, add a separate **cross-impact note** listing other call sites that today emit those codes.
+
+### Critical — `runCmd` test harness has zero output-format isolation
+- **Mode:** `runCmd(t, args ...string)` in `cmd/phase5_test.go:21-27` just calls `rootCmd.Execute()`. The output format is decided by `PersistentPreRunE` (`cmd/root.go:21-41`) which calls `ResolveFormatWithDefault(flagVal, cfg.ProfileOutputFormat)`. When tests run under `go test`, stdout is **piped** (not TTY), so `IsTTY(...)` returns `false` and `ResolveFormat` returns `"json"` — not `"table"`. Plan's `TestTracesGet_HappyPath_Table` (phase-01.md:74) and `TestTracesGet_TableMode_HumanReadable` (phase-02.md:69) will **never see table mode** unless they explicitly pass `--output table`. The plan does not mention this.
+- **Status:** EXPOSED
+- **Evidence:** `internal/output/tty.go:31-57` — `ResolveFormatWithDefault` falls through to `IsTTY(int(os.Stdout.Fd()))`, which under `go test` is `false`. Confirmed by `cmd/heartbeat.go:184` which also checks `IsTTY` directly. No mocking of `IsTTY` exists in the codebase.
+- **Impact:** Phase-01 and phase-02 table-mode assertions fail for the wrong reason — they fail because format defaulted to JSON, not because rendering is broken. Tester wastes time chasing a phantom render bug.
+- **Suggested mitigation:** phase-01 §3 and phase-02 §1 — every table-mode test MUST pass `--output table` explicitly: `runCmd(t, "--output", "table", "traces", "get", id)`. Add an explicit note in `phase-01.md` Non-functional requirements: "All TTY-mode assertions force format via `--output table` flag; do not rely on TTY auto-detect." Equivalently, every JSON-preservation test should pass `--output json` to remove ambiguity.
+
+### Critical — `resetTracesGetFlags` is undefined; flag persistence risks contamination
+- **Mode:** Phase-01.md:23 mentions `resetTracesGetFlags(t)` "if `tracesGetCmd` adds flags later". But `runCmd` calls `rootCmd.SetArgs(args)` and `Execute()`. Cobra's persistent flags (`--output`, `--server`, etc.) **stick** across `Execute()` calls because they're registered on `rootCmd.PersistentFlags()` once. Setting `--output table` in test A then NOT setting it in test B leaves `--output=table` set, breaking test B's default auto-detect. The pattern `resetActivityAggregateFlags` (`cmd/activity_aggregate_test.go:11-17`) only resets local flags, not persistent ones.
+- **Status:** EXPOSED
+- **Evidence:** `cmd/root.go:71-81` registers `output` on `PersistentFlags()`. `cmd/phase5_test.go:21-27` `runCmd` calls `SetArgs` but never resets persistent flag values. `cmd/p4_ux_polish_test.go:408` `resetTestFlag` exists but operates per-flag on a passed command. No global persistent-flag reset helper in the codebase.
+- **Impact:** Test order dependency. `go test -count=1 -run ...` may pass; `go test ./...` may flake depending on which test sets `-o json` last.
+- **Suggested mitigation:** phase-01 §3: define `resetTracesGetFlags(t)` that resets the `rootCmd.PersistentFlags()` values `output`, `server`, `token`, `quiet`, `verbose`, `insecure` AND any local `tracesGetCmd` flags (currently none). Register it with `t.Cleanup(func() { resetTracesGetFlags(t) })` in every test, because Cobra persistent flag values carry across `Execute()` calls.
+
+### Important — JSON-mode "preserves full payload" claim is technically false
+- **Mode:** Phase-02 acceptance: "JSON/YAML mode emits the **complete** server payload — no field dropping, no schema reshaping." Implementation path is `printer.Print(decoded)` where `decoded := map[string]any`. `printJSON` uses `json.NewEncoder(os.Stdout).Encode(...)` (`internal/output/output.go:57-61`) which (a) re-orders keys alphabetically per Go's `map[string]any` marshaling, (b) escapes HTML by default (`<`, `>`, `&` → `\u003c`, `\u003e`, `\u0026`), (c) does NOT preserve the original bytes. So "preserves nested fields" is true, but "preserves full payload" in the byte-equivalent sense is false.
+- **Status:** EXPOSED
+- **Evidence:** `internal/output/output.go:57-61` `enc := json.NewEncoder(os.Stdout); enc.SetIndent(...)` — no `SetEscapeHTML(false)` call. Go stdlib default is HTML-escape ON.
+- **Impact:** Trace span content with `<` `>` `&` (e.g. tool calls with XML/HTML in args, or `&&` in shell commands) will have those chars escaped in output. Round-trip `jq` parse still works (escapes are valid JSON), but byte-comparison fails. Test `TestTracesGet_JSONMode_PreservesNestedFields` description (phase-02.md:70-71) says "round-trip parse; assert every field present" — that's the *correct* assertion. But the **plan prose** at phase-02.md:20 overpromises.
+- **Suggested mitigation:** phase-02 — soften prose: "JSON/YAML mode emits the complete server payload as a structured object — every field round-trips through `jq`, but bytes are re-encoded (key order normalized, HTML chars escaped per Go stdlib default)." Optionally pass-through `json.RawMessage` instead of `map[string]any` to preserve bytes; but that's a bigger change.
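+
+The re-encoding behavior is easy to observe in isolation. This is a small standalone sketch using only `encoding/json`; the `encode` helper and the sample payload are illustrative and do not come from the codebase.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+)
+
+// encode marshals v through json.Encoder, with HTML escaping on or off.
+func encode(v any, escapeHTML bool) string {
+	var buf bytes.Buffer
+	enc := json.NewEncoder(&buf)
+	enc.SetEscapeHTML(escapeHTML)
+	if err := enc.Encode(v); err != nil {
+		return "error: " + err.Error()
+	}
+	return buf.String()
+}
+
+func main() {
+	// Map keys are emitted in sorted order regardless of insertion order.
+	payload := map[string]any{"zeta": 1, "cmd": "a && b <x>"}
+	fmt.Print(encode(payload, true))  // {"cmd":"a \u0026\u0026 b \u003cx\u003e","zeta":1}
+	fmt.Print(encode(payload, false)) // {"cmd":"a && b <x>","zeta":1}
+}
+```
+
+Both outputs decode to the same value, which is why a round-trip assertion is the right test and a byte comparison is not.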
+
+### Important — span shape assumptions are unverified before code lands
+- **Mode:** Phase-02 §2 says "if spans hierarchical, recurse on `children`; if flat with `parent_id`, build adjacency map first." But: (a) what if **some** traces have `spans: []` (no spans)? (b) what if `spans` field is absent / `null`? (c) what if shape is **mixed** — top-level flat list with `children: []` already populated by the server? (d) what about cycles in `parent_id` graph (malicious / bug)? (e) what if `events: null` instead of `[]`? (f) what about giant span trees (1000+ nodes) — does `PrintTree` indent prefix grow unboundedly?
+- **Status:** EXPOSED
+- **Evidence:** Phase-01 fixture-capture is the only place where the actual shape would be locked, but Phase-01 risk-mitigation (phase-01.md:96-97) says "if server unreachable, fixture derived from `traces follow` shape with TODO" — meaning code lands without verified shape if smoke probe fails. `internal/output/tree.go:14-30` `PrintTree` is unguarded against deep recursion.
+- **Impact:** Render helper may panic on `nil` `events`, infinite-loop on cyclic `parent_id`, or produce useless output on absent `spans`.
+- **Suggested mitigation:** phase-02 §2 — explicit defensive list: "(a) `spans == nil` or `len(spans) == 0` → print `(no spans)` line, skip tree; (b) `events == nil` → treat as `[]`; (c) `parent_id` adjacency map MUST track visited to break cycles; (d) hard-cap tree depth at 50 levels with `...` truncation; (e) hard-cap printed events at 50 even when n <= 10."
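+
+A sketch of how the defensive list could look for the flat `parent_id` case. Everything here is an assumption made for illustration: the field names `id` and `parent_id` are not verified against a fixture, `node` is a local stand-in rather than the project's `output.TreeNode`, and the depth cap of 50 is taken from the mitigation text.
+
+```go
+package main
+
+import "fmt"
+
+// node is a hypothetical tree type used only for this sketch.
+type node struct {
+	Label    string
+	Children []*node
+}
+
+const maxDepth = 50 // assumed cap, per the suggested mitigation
+
+// buildSpanTree turns a flat span list into trees. It tolerates nil input,
+// missing parents, duplicate ids, and cycles in the parent_id graph.
+func buildSpanTree(spans []map[string]any) []*node {
+	parentOf := map[string]string{}
+	var ids []string
+	for _, s := range spans {
+		id, _ := s["id"].(string)
+		if _, dup := parentOf[id]; id == "" || dup {
+			continue
+		}
+		parentOf[id], _ = s["parent_id"].(string)
+		ids = append(ids, id)
+	}
+
+	children := map[string][]string{}
+	var roots []string
+	for _, id := range ids {
+		p := parentOf[id]
+		if _, known := parentOf[p]; p == "" || !known {
+			roots = append(roots, id) // no parent, or parent not in payload
+		} else {
+			children[p] = append(children[p], id)
+		}
+	}
+
+	visited := map[string]bool{}
+	var walk func(id string, depth int) *node
+	walk = func(id string, depth int) *node {
+		visited[id] = true
+		n := &node{Label: id}
+		if depth >= maxDepth {
+			if len(children[id]) > 0 {
+				n.Children = []*node{{Label: "..."}} // truncation marker
+			}
+			return n
+		}
+		for _, c := range children[id] {
+			if !visited[c] { // visited set breaks cycles
+				n.Children = append(n.Children, walk(c, depth+1))
+			}
+		}
+		return n
+	}
+
+	var out []*node
+	for _, r := range roots {
+		out = append(out, walk(r, 0))
+	}
+	for _, id := range ids { // spans trapped in a cycle have no root
+		if !visited[id] {
+			out = append(out, walk(id, 0))
+		}
+	}
+	return out
+}
+
+func printTree(n *node, indent string) {
+	fmt.Println(indent + n.Label)
+	for _, c := range n.Children {
+		printTree(c, indent+"  ")
+	}
+}
+
+func main() {
+	spans := []map[string]any{
+		{"id": "root"},
+		{"id": "child", "parent_id": "root"},
+		{"id": "x", "parent_id": "y"}, // x and y form a cycle
+		{"id": "y", "parent_id": "x"},
+	}
+	trees := buildSpanTree(spans)
+	if len(trees) == 0 {
+		fmt.Println("(no spans)")
+		return
+	}
+	for _, t := range trees {
+		printTree(t, "")
+	}
+}
+```
+
+In this sketch a cycle is surfaced as an extra top-level entry rather than dropped, so the operator still sees the affected spans. Dropping them or reporting an error would be equally defensible.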
+
+### Important — `str`/`safeFloat` helpers — `safeFloat` does not exist
+- **Mode:** Phase-02.md:80-81 says "every field access through `str(m, "key")` / `safeFloat(m, "key")` helpers." `str` exists at `cmd/helpers.go:56-61`. `safeFloat` does **not** exist anywhere in `/cmd/` or `/internal/`. Plan does not specify who writes it or where it lives.
+- **Status:** EXPOSED
+- **Evidence:** `grep -rn "safeFloat" --include="*.go"` returns zero hits in the worktree. `str` defined `cmd/helpers.go:56-61`.
+- **Impact:** Phase-02 implementation halts at compile error or scope-creep into `cmd/helpers.go`. Defensive numeric coercion of `map[string]any` (where JSON numbers decode as `float64`, but server may emit them as strings) is non-trivial and untested.
+- **Suggested mitigation:** phase-02 Related Code Files — explicitly add `cmd/helpers.go` modification: "Add `safeFloat(m map[string]any, key string) float64` that handles `float64`, `int`, `int64`, `json.Number`, and string-encoded numbers; returns 0 on failure. Add unit tests in `cmd/helpers_test.go`."
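+
+One possible shape for the missing helper, following the signature in the mitigation above. This is a sketch under that assumption only; where the helper lives and whether returning 0 on failure is acceptable for every caller are still open.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// safeFloat reads m[key] as a float64. It accepts the numeric types that can
+// appear in a decoded map[string]any plus string-encoded numbers, and returns
+// 0 when the key is absent, nil, or not convertible.
+func safeFloat(m map[string]any, key string) float64 {
+	switch v := m[key].(type) {
+	case float64:
+		return v
+	case int:
+		return float64(v)
+	case int64:
+		return float64(v)
+	case json.Number:
+		f, err := v.Float64()
+		if err != nil {
+			return 0
+		}
+		return f
+	case string:
+		f, err := strconv.ParseFloat(strings.TrimSpace(v), 64)
+		if err != nil {
+			return 0
+		}
+		return f
+	default:
+		return 0
+	}
+}
+
+func main() {
+	m := map[string]any{
+		"ms":     float64(120),
+		"tokens": json.Number("42"),
+		"cost":   " 0.25 ",
+		"bad":    "n/a",
+	}
+	for _, k := range []string{"ms", "tokens", "cost", "bad", "missing"} {
+		fmt.Println(k, safeFloat(m, k))
+	}
+}
+```
+
+Returning 0 hides the difference between a missing field and a real zero. If the renderer needs to distinguish them, a two-value variant `(float64, bool)` would be the natural extension.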
+
+### Important — `url.PathEscape` "validate non-empty" is underspecified
+- **Mode:** Phase-02 §3 step: "Validate id: `id := args[0]; if strings.TrimSpace(id) == "" { return fmt.Errorf("trace id is required") }`." This only catches whitespace/empty. It does NOT reject path-traversal (`../../etc/passwd`), URL-control chars (`?foo=bar` injecting query, `#frag` injecting fragment), or slash (`a/b` becoming `/v1/traces/a/b` — but `url.PathEscape` does escape `/` to `%2F`, so that's safe). What does the issue reporter consider "malformed"? The plan never says.
+- **Status:** EXPOSED
+- **Evidence:** `url.PathEscape` Go stdlib docs: escapes `/`, `?`, `#`, etc. — so `goclaw traces get '../../admin'` becomes `GET /v1/traces/..%2F..%2Fadmin` which the server can choose to 404 or 400. Not a CLI security hole (server is the trust boundary). But test `TestTracesGet_MalformedID_ValidatedClientSide` (phase-02.md:73) asserts `calls == 0` — meaning the CLI MUST short-circuit before HTTP, but the plan never lists which inputs count as "malformed client-side".
+- **Impact:** Test is ambiguous. Implementer may pick a different set than tester expects.
+- **Suggested mitigation:** phase-02 §3 — explicit list: "client-side rejects only (a) empty after trim, (b) length > 256 (DoS guard). All other inputs go through `url.PathEscape` and let the server decide. Test passes `""`, `"   "`, and a 1KB string for `calls == 0` assertions; passes `"../../admin"` for `calls == 1` with the server returning 400."
+
+### Minor — Phase-01 smoke probe has TODO escape; Phase-02 manual smoke has none
+- **Mode:** Phase-01 risk mitigation explicitly allows "if server unreachable, mark TODO". Phase-02 §6 says "Live smoke (manual) — re-run the four `goclaw traces get` invocations from Phase 1's smoke probe; confirm now produces useful output" with no fallback.
+- **Status:** EXPOSED
+- **Evidence:** phase-02.md:102.
+- **Impact:** Phase-02 success-criteria "Manual smoke: ... renders a readable summary" (phase-02.md:111) can't be checked off if no gateway is reachable; ship pipeline stalls.
+- **Suggested mitigation:** phase-02 §6 — add "If gateway unreachable, mark manual-smoke as `DEFERRED` and gate the PR description to require a smoke-check before merge to `main`."
+
+### Minor — Phase-3 codebase-summary update target verified
+- **Mode:** Plan promises a bullet under "output/render section" in `docs/codebase-summary.md`.
+- **Status:** MITIGATED
+- **Evidence:** `docs/codebase-summary.md` line 149 has `#### output/ — Output Formatting + Error Handling`, line 587 references tree rendering. Section exists.
+- **Impact:** None — claim is true.
+- **Suggested mitigation:** None.
+
+### Minor — `gh issue create -R digitopvn/goclaw` rights not pre-verified
+- **Mode:** Phase-03 §5 conditionally runs `gh issue create -R digitopvn/goclaw`. If the user lacks issue-creation rights on that repo, the command fails mid-ship.
+- **Status:** EXPOSED
+- **Evidence:** phase-03.md:52. No `gh api repos/digitopvn/goclaw -q .permissions` pre-check.
+- **Impact:** Ship pipeline stops with a permission error instead of completing the docs update.
+- **Suggested mitigation:** phase-03 §5 — prepend `gh api repos/digitopvn/goclaw -q .permissions.push 2>/dev/null` check; on failure, drop the drafted issue body into `plans/.../reports/upstream-issue-draft.md` and instruct the user to file it manually.
+
+### NOT_APPLICABLE — 403 retry concern
+- **Mode:** "Does the client retry on 403?"
+- **Status:** NOT_APPLICABLE
+- **Evidence:** `internal/client/http.go:161`: `if resp.StatusCode != 429 && resp.StatusCode < 500 { break }`. 403 breaks immediately. No retry. No mitigation needed.
+
+## Open Questions
+
+- Should `HTTPClient` gain a `MaxRetries` knob to disable retries in tests, or is the ~3s 5xx-test cost acceptable (5xx tests are rare)?
+- Does the issue reporter's actual repro live trace_id still exist on `goclaw.zuey.me`? If not, what's the smoke-probe fallback gateway?
+- Does the server emit `PERMISSION_DENIED` as a body code anywhere, or always `TENANT_ACCESS_REVOKED`? Determines whether `MapServerCode` extension is actually a no-op or a behavior change.
+- For deeply nested span trees (>50 levels), is truncation `... (N more levels)` acceptable, or should the tree render still walk but flatten?
+
+Status: DONE
+Severity counts: Critical=4, Important=4, Minor=3, NOT_APPLICABLE=1
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/red-team-scope-complexity-critic.md b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-scope-complexity-critic.md
new file mode 100644
index 0000000..f185aa3
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-scope-complexity-critic.md
@@ -0,0 +1,109 @@
+# Red-Team: Scope & Complexity Critic — Issue #17 Plan
+
+## Verdict: TRIM
+
+The plan attacks a real bug with the right instincts (TDD, structured rendering, central error mapping) but inflates work by ~40-50%. Phase 1 over-tests for a fix-before-known-shape; Phase 2 carries a deferred decision (`unmarshalMapStrict` vs inline) into implementation; presentation polish (event truncation, defensive helpers) is masquerading as bug-fix scope. Exit-code mapping is also mostly free — `MapServerCode` + `MapHTTPStatus` already cover all 4 categories via the existing fallback (verified at `internal/output/exit.go:18-44`). Trim, don't reject.
+
+---
+
+## Findings
+
+### Critical — Pre-fixture test count inflates Phase 1
+
+- **Item:** Phase 1 step 3 — 7 red test cases written before the fixture is captured from a live gateway.
+- **Disposition:** TRIM
+- **Rationale:** Three tests (`HappyPath_Table`, `MalformedID`, `MalformedResponse`) assert behavior of code that doesn't exist yet. They're Phase 2 acceptance tests squatting in Phase 1, which is fine for TDD but inflates Phase 1's "this is the repro" mission. The actual repro need is: (a) one test proving the current JSON-dump-in-TTY symptom, and (b) the captured fixture. The other 5 can be added at the top of Phase 2 (one commit) without breaking TDD discipline.
+- **Leaner version:** Phase 1 writes 2 tests (`TestTracesGet_PathAndMethod`, `TestTracesGet_TableMode_DumpsRawJSON_RED`) + the fixture + the classification verdict. Move the other 5 to Phase 2 step 1.
+
+### Critical — `unmarshalMapStrict` decision deferred to implementation
+
+- **Item:** Phase 2 line 63: "add `unmarshalMapStrict(data) (map[string]any, error)`... OR replace usage at the call site only. Decision in step 1."
+- **Disposition:** CUT the helper, KEEP the inline approach.
+- **Rationale:** `unmarshalMap` is called in 6 places across `cmd/` (traces, usage summary/costs/timeseries/breakdown). Adding `unmarshalMapStrict` is either a one-off (only `tracesGetCmd` uses it — confusing) or a project-wide refactor (out of scope). The bug only requires checking the unmarshal error in this one command. Inline is 4 lines. Leaving the decision to "step 1" forces re-planning at implementation time, which is exactly the anti-pattern that planning is supposed to prevent.
+- **Leaner version:** Phase 2 mandates: replace `printer.Print(unmarshalMap(data))` with an inline `json.Unmarshal` + error check. No new helper. Roughly a four-line diff.
+
+### Important — Separate `cmd/traces_render.go` file at 80-120 LOC
+
+- **Item:** Phase 2 — `cmd/traces_render.go` with `renderTraceDetail` + `buildSpanTree` + defensive `str/safeFloat` helpers.
+- **Disposition:** TRIM
+- **Rationale:** `str()` already exists at `cmd/helpers.go:56`. `tracesListCmd` (12 LOC inline render via `output.NewTable`) is the established pattern; this plan would break that pattern for one command. If span-tree rendering stays in scope (see next finding), it justifies a helper, but 80-120 LOC is double what's needed. The header card is 3-5 `printer.Print(output.NewTable(...))` calls. Span tree is `PrintTreeRoot(buildSpanTree(spans), os.Stdout)` — that's the only real helper that earns its keep.
+- **Leaner version:** Inline the header in `tracesGetCmd.RunE` (~15 LOC). Extract only `buildSpanTree(spans []any) output.TreeNode` to `cmd/traces.go` as a package-level func, not a new file. Target: total addition ~40 LOC across one file.
+
+### Important — Span-tree rendering vs issue #17 acceptance criteria
+
+- **Item:** Phase 2 architecture — span tree via `output.PrintTree`.
+- **Disposition:** KEEP (with constraint)
+- **Rationale:** Re-read of issue #17 verbatim: "including relevant metadata, root/child events, messages, tool calls, status, timing". "Root/child events" reads as hierarchical structure — span tree IS a reasonable interpretation, not scope creep. Also the existing `tracesGetCmd.Short` is literally "Get trace with span tree" (`cmd/traces.go:62`), so this is a documented promise the code never delivered. KEEP, but cap aggressively: ~20 LOC for `buildSpanTree`, degrade gracefully if no spans/no parent_id field.
+- **Leaner version:** Build adjacency only if `spans` field present and has >1 entry. If flat or absent, skip the tree block and print events list directly. No "deep nesting / cyclic refs" defensive code — the server controls this payload, and if it ever sends a cycle, that's a server bug worth crashing on (or trivially guard with depth limit ≤32, ~3 LOC).
+
+### Important — Events truncation logic
+
+- **Item:** Phase 2 step 2 — "first 5 + last 1 with `... (N-6 more) ...` separator if N > 10".
+- **Disposition:** CUT
+- **Rationale:** This is presentation polish, not bug-fix scope. The bug is "unusable output". A simple `EVENTS (N=12):` header + full list, one per line, is human-readable; if the user wants pagination they pipe to `less`. The truncation logic adds branching, off-by-ones, and a test surface that doesn't earn its weight for a P2 bugfix. If a real trace has 5,000 events, that's a separate UX issue worth filing.
+- **Leaner version:** Print `EVENTS (N=<count>):` header then list each event on its own line as `<timestamp> <type> <one-line-summary>`. ~8 LOC. Done.
+
+### Important — 4 distinct exit codes — already free, "verify" step is over-scoped
+
+- **Item:** Phase 2 step 4 — "Verify error-code mapping in `internal/output/error.go`... extend `MapServerCode` if `TRACE_NOT_FOUND` / `PERMISSION_DENIED` not yet mapped."
+- **Disposition:** KEEP the requirement (it's locked per CLAUDE.md), CUT the "extend" hedging.
+- **Rationale:** Verified at `internal/output/exit.go:18-44` — `serverCodeMap` maps `NOT_FOUND→3`, `TENANT_ACCESS_REVOKED→2`, `INVALID_REQUEST→4`, `INTERNAL/UNAVAILABLE→5`. And `apiErrorCodeForStatus` at `cmd/helpers.go:152-172` already translates HTTP 403→`TENANT_ACCESS_REVOKED`, 404→`NOT_FOUND`, 400/422→`INVALID_REQUEST`, 5xx→`INTERNAL`. The server doesn't need to emit `TRACE_NOT_FOUND` — bare HTTP status is already correctly mapped through two layers. The 4-category exit-code requirement is met today by `apiErrorFromRawBody` + `MapServerCode`. The remaining gap is **client-side malformed-id validation** (`MalformedID` → ExitValidation), which has to be added because the server never sees the request.
+- **Leaner version:** Phase 2 step 4 becomes: "Add client-side id validation in `tracesGetCmd.RunE`: reject empty/whitespace id with an error that surfaces `INVALID_REQUEST` (or `output.ExitValidation` directly). No `internal/output/*` edits expected; if test fails for a specific server code, add to `serverCodeMap` at that point."
+
+### Important — Phase 1 effort (2-3h) is overstated for the trimmed scope
+
+- **Item:** Phase 1 — effort `"2-3h"` covers smoke probe + fixture + 7 red tests + repro report + classification.
+- **Disposition:** TRIM
+- **Rationale:** With the test count trim above, Phase 1 = smoke probe (15 min if gateway up) + fixture capture + 2 tests + report. Realistic: 45-75 min. The "2-3h" estimate suggests the planner padded for the bloated test count.
+- **Leaner version:** Re-estimate Phase 1 at 1-1.5h after the test trim. Frees attention budget for Phase 2.
+
+### Minor — Upstream issue filing is unforked
+
+- **Item:** Phase 3 step 5 — "if Phase 1 classification was server-side or hybrid, file upstream issue."
+- **Disposition:** KEEP, but fork plan branches.
+- **Rationale:** Scout of `cmd/traces.go:68` shows the path `/v1/traces/<id>` is straightforward — a server-side root cause would most likely be tenant-filter or payload-shape, both of which still need a CLI render fix to handle the response shape. Likely outcome: CLI-only or hybrid. If pure server-side (gateway returns 404 for valid IDs), the CLI render work is wasted but the test infrastructure isn't. Don't fork phases, but add a branch point at end of Phase 1: "If pure server-side: Phase 2 reduces to error-mapping + smoke test only; render scope deferred."
+- **Leaner version:** Add a one-liner at end of Phase 1 Success Criteria: "Classification verdict drives Phase 2 scope reduction — record verdict in plan.md before starting Phase 2."
+
+### Minor — `docs/codebase-summary.md` update trigger undefined
+
+- **Item:** Phase 3 — "add a bullet... if `cmd/traces_render.go` was created" / "if non-trivial".
+- **Disposition:** TRIM
+- **Rationale:** Vague triggers create skip-the-step inertia. Either the helper is a new public-ish surface worth documenting, or it isn't. After the trim above (no separate file, ~40 LOC inline + one tiny helper), it's not.
+- **Leaner version:** Cut the `codebase-summary.md` step entirely. The CHANGELOG entry is enough. If a future planner needs to know about the helper, grep finds it.
+
+### Minor — Phase 3 step 1 hand-edits plan.md status
+
+- **Item:** Phase 3 step 1 — "mark phase 1, 2 as completed in `plan.md`".
+- **Disposition:** TRIM
+- **Rationale:** The ck-stack has `/ck:plan-check` / `ck plan check` (referenced in checklists) for status sync. Hand-editing plan.md drift is exactly what that tool exists to prevent. Not catastrophic, but inconsistent with project tooling.
+- **Leaner version:** Replace with: "Run `/ck:plan-check --sync` (or hand-edit if tooling unavailable) to flip phase 1 and 2 to `completed` in `plan.md`." Cite the tool, fall back to manual.
+
+---
+
+## Summary of Recommended Cuts
+
+| Original | Trimmed |
+|---|---|
+| 14 tests total | 9 tests total (2 in Phase 1 + 7 in Phase 2) |
+| Separate `cmd/traces_render.go` file, 80-120 LOC | Inline + one small `buildSpanTree` helper in `cmd/traces.go`, ~40 LOC |
+| `unmarshalMapStrict` helper OR inline (deferred) | Inline only — decision made now |
+| Events: "first 5 + last 1 with separator" | Print all events, one per line |
+| Phase 2 step 4: "verify and possibly extend `MapServerCode`" | "Add client-side id validation; trust existing mapping" |
+| Phase 1 effort 2-3h | Phase 1 effort 1-1.5h |
+| Plan.md hand-edit for status | `/ck:plan-check --sync` |
+| `codebase-summary.md` update (conditional) | CUT |
+
+**Net expected savings:** ~1.5h Phase 1 + ~1h Phase 2 + cognitive load of one deferred decision. Total ~25-30% reduction. Bug still gets fixed, contract still honored, TDD still enforced.
+
+---
+
+## Open Questions for User
+
+None affect verified user decisions. TDD discipline is preserved (red tests still precede impl in both phases). Exit-code contract is preserved (the trim doesn't reduce categories, just shifts where they're enforced). Span-tree rendering is preserved because it appears to honor the issue text "root/child events" AND the existing command's stated `Short: "Get trace with span tree"` promise.
+
+If the user disagrees on **event truncation** specifically (Finding "Events truncation"), surface that — it's the only judgment call where reasonable people might differ on UX-vs-scope.
+
+---
+
+**Status:** DONE
+**Severity counts:** Critical: 2 / Important: 5 / Minor: 3
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/red-team-security-adversary.md b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-security-adversary.md
new file mode 100644
index 0000000..bd1dfbd
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/red-team-security-adversary.md
@@ -0,0 +1,155 @@
+# Red-Team: Security Adversary — Issue #17 Plan
+
+## Verdict: REVISE
+
+Plan is sound on the functional axis but ships with several under-specified security gates around (a) fixture/issue-body scrubbing, (b) path traversal via the trace id parameter, (c) ANSI injection parity claim, and (d) information-disclosure trade-off in the 404-vs-403 message split. None are showstoppers — all are addressable by tightening Phase 1 step 2 and Phase 2's security-considerations block before code lands.
+
+## Findings
+
+### Critical — Path traversal surface via `url.PathEscape` for trace id
+
+- **Threat:** `url.PathEscape` does NOT encode `.`, so `goclaw traces get ..` produces request path `/v1/traces/..`. `url.PathEscape("../../etc/passwd")` does encode the `/` as `%2F`, but `url.PathEscape("..")` returns the literal `".."`. If a future server route/middleware on `digitopvn/goclaw` ever path-normalizes before authz (or proxies through a path-rewriting reverse proxy like nginx with `merge_slashes`/`alias`), the CLI becomes the easiest tool to probe traversal.
+- **Status:** HOLE (CLI-side input validation gap).
+- **Evidence:**
+  - Verified empirically: `url.PathEscape("..") == ".."`, `url.PathEscape("./../x") == ".%2F..%2Fx"`, `url.PathEscape("../../etc/passwd") == "..%2F..%2Fetc%2Fpasswd"`. Run output captured in this review session.
+  - Plan phase-02-cli-fix-output-and-error-mapping.md:84 — `url.PathEscape(id)` is the only id sanitization step.
+  - Plan phase-02-cli-fix-output-and-error-mapping.md:84 — id validation is `strings.TrimSpace(id) == ""` only.
+- **Attack scenario:** Operator who has a valid auth token but limited trace-read scope runs `goclaw traces get ..` or `goclaw traces get .`. The server receives `GET /v1/traces/..` — at minimum a noisy/unhandled route; at worst, on a future server build that adds `path.Clean` ahead of authz, lands on `/v1/traces/` (list endpoint) under the singular's authz check.
+- **Suggested mitigation:** In Phase 2 step 3 id validation, after `TrimSpace` add:
+  ```go
+  // Reject dot segments, path separators, and NUL (covers the `\x00` test case below).
+  if id == "." || id == ".." || strings.ContainsAny(id, "/\\\x00") {
+      return fmt.Errorf("invalid trace id")
+  }
+  ```
+  And add a Phase 1 test case `TestTracesGet_RejectsTraversalIDs` covering `..`, `.`, `../foo`, `foo/bar`, `\x00`. Also: tighten to a positive regex (`^[A-Za-z0-9_-]{1,128}$`) if Phase 1's live capture shows trace ids are opaque ULIDs/UUIDs (most likely). Note: this collapses to AC #3's "malformed id" category.
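+
+The escaping behavior and the proposed guard can be checked together in a standalone program. This is a sketch under stated assumptions: `validateTraceID` is a hypothetical helper name, and rejecting NUL is added here only because the proposed test list includes `\x00`.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"net/url"
+	"strings"
+)
+
+// validateTraceID is a hypothetical client-side guard: it trims the id and
+// rejects empty values, dot segments, path separators, and NUL bytes.
+func validateTraceID(raw string) (string, error) {
+	id := strings.TrimSpace(raw)
+	if id == "" {
+		return "", errors.New("trace id is required")
+	}
+	if id == "." || id == ".." || strings.ContainsAny(id, "/\\\x00") {
+		return "", errors.New("invalid trace id")
+	}
+	return id, nil
+}
+
+func main() {
+	// url.PathEscape leaves "." untouched and only encodes the separators.
+	for _, in := range []string{"..", "./../x", "../../etc/passwd"} {
+		fmt.Printf("%q -> %q\n", in, url.PathEscape(in))
+	}
+
+	for _, in := range []string{"abc123", "  ", ".", "..", "../foo", "foo/bar", "a\x00b"} {
+		id, err := validateTraceID(in)
+		fmt.Printf("%q -> id=%q err=%v\n", in, id, err)
+	}
+}
+```
+
+If the live capture shows ids are opaque ULIDs or UUIDs, the positive regex mentioned above would replace the deny-list check entirely.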
+
+---
+
+### Critical — Fixture/issue-body scrubbing protocol is hand-wavy
+
+- **Threat:** Phase 1 step 2 says "Strip secrets" and Phase 1 Security Considerations says "Strip `user_id`, `tenant_id`, message contents, tool-call args. Keep only structural keys + sentinel values." But (a) no `jq` recipe is given, (b) the scrub list is incomplete — span `name` fields can contain user queries (e.g. `name: "search('user@example.com')"` ), tool-call `result` payloads may contain PII, HTTP-request span attributes can contain auth headers, and (c) Phase 3 step 5 may file an upstream issue using a body drafted in Phase 1 — same payload, same scrub requirement, but Phase 3 doesn't re-state it.
+- **Status:** HOLE (operational/process gap; high likelihood of accidental commit of PII).
+- **Evidence:**
+  - Phase 1 step 2 line 70: "Strip secrets." (no how).
+  - Phase 1 Security Considerations line 102 enumerates 4 fields but misses span `name`, span `attributes`, event `message`/`error`, headers in HTTP spans, trace metadata bags.
+  - Phase 3 step 5 line 52: issue body reuses the Phase-1 drafted body, but Phase 3 Security Considerations line 76 only says "Smoke probe artifacts ... must be scrubbed" without listing the issue-body case explicitly.
+- **Attack scenario:** Operator captures a live trace whose span name is `OpenAI.chat({"messages":[{"role":"user","content":"my SSN is 123-45-6789"}]})`. Operator strips only the 4 listed keys, commits fixture to `cmd/testdata/trace_detail_get.json`, and the PII enters git history. Same payload pasted into the `digitopvn/goclaw` issue body via `gh issue create` becomes public on a different repo.
+- **Suggested mitigation:** Add to Phase 1 step 2 (and reference from Phase 3 step 5):
+  ```
+  # Scrub recipe (run once; assume captured response in raw.json):
+  jq '
+    walk(if type == "object" then
+      with_entries(
+        if .key | IN("user_id","tenant_id","org_id","email","token",
+                     "authorization","auth","api_key","secret",
+                     "messages","tool_calls","arguments","result",
+                     "input","output","prompt","completion","content",
+                     "attributes","metadata","headers","query","body","error")
+        then .value = "REDACTED"
+        else . end)
+    else . end)
+    | .payload.trace_id = "trc_XXXXXXXXXXXXXXXX"
+    | .payload.agent_id = "agt_XXXXXXXX"
+  ' raw.json > cmd/testdata/trace_detail_get.json
+  ```
+  Then `rg -i 'email|@|bearer|sk-|eyJ' cmd/testdata/trace_detail_get.json` as a post-scrub gate. Reuse the SAME pipeline for the upstream-issue body. Phase 3 step 5 should explicitly include `gh issue create --body-file scrubbed.md`, never inline.
+
+---
+
+### Critical — `curl` command in repro report risks token leakage
+
+- **Threat:** Phase 1 step 1 instructs running `curl -sS -H "Authorization: Bearer $TOKEN" ...` AND Phase 1 step 5 says to include "Curl command + response samples" in `reports/repro-260528-issue17.md`. If the operator pastes an already-expanded form of the command (for example from `set -x` trace output or the request headers echoed by `curl -v`) instead of the literal line, the live token lands in the report.
+- **Status:** HOLE (documentation/operational; medium likelihood, depends on operator hygiene).
+- **Evidence:**
+  - Phase 1 step 1 line 69: `curl -sS -H "Authorization: Bearer $TOKEN" "$SERVER/v1/traces/<id>"`; uses env var here, good.
+  - Phase 1 step 5 line 81: "Curl command + response samples (redacted)"; no explicit guard against pasting the expanded token.
+- **Attack scenario:** Operator runs the curl with shell tracing (`set -x`) or `curl -v` enabled and copies the command from that output, where `$TOKEN` has already been expanded, rather than from the literal source line. Token enters the report → enters git → gets pushed.
+- **Suggested mitigation:** Phase 1 step 5: explicit rule: "Quote curl as literal: `curl -H 'Authorization: Bearer $GOCLAW_TOKEN' ...` (single-quoted, never expanded). Add a pre-commit grep gate: `! rg -n 'Bearer [A-Za-z0-9._\-]{20,}' plans/`." Also, before running the probe, set `HISTFILE=/dev/null` or prefix the command with a leading space so it is skipped by history (requires `setopt HIST_IGNORE_SPACE` in zsh or `HISTCONTROL=ignorespace` in bash).
+
+---
+
+### Important — ANSI escape injection: parity claim is half-true
+
+- **Threat:** Phase 2 Security Considerations line 123 defers ANSI sanitization with "accept that terminal rendering of untrusted text is consistent with `traces list` (existing risk surface)." Verified: `tracesListCmd` does render user-controlled strings (`agent_id`, `status`) into table cells (cmd/traces.go:53). So the parity claim has factual basis — but the surface AREA is different. `traces list` renders ~7 short flat fields (id, agent, status, ms, tokens, tokens, cost) which are server-generated. The new `traces get` will render span `name`, event `message`, possibly tool-call args — fields that are far more likely to contain attacker-controlled content (user prompts, LLM outputs, tool stdout/stderr). LLM output is the canonical ANSI-injection vector in 2026.
+- **Status:** HOLE — parity is structurally true (no existing scrubbing) but the risk-weighted exposure is materially higher. The "accept parity" decision should be made by the user, not the planner.
+- **Evidence:**
+  - cmd/traces.go:53 — `str(t, "agent_id")` etc. unsanitized into table.
+  - `rg "safeRune|ANSI" internal/output/ cmd/` returns no hits — no current ANSI sanitization anywhere.
+  - Phase 2 line 79 — TreeNode names will include `<span_id> [<duration_ms>ms] <name>` where `<name>` is user/LLM-controlled.
+- **Attack scenario:** Adversarial LLM (jailbroken or prompt-injected via tool output) emits a span name containing `\x1b[2J\x1b[H` (clear screen) + `\x1b]0;rm -rf ~\x07` (set terminal title) + a fake "operation succeeded" message. Operator runs `goclaw traces get <id>` in a terminal, sees a forged success line, may run further commands assuming the trace is benign. With OSC 52 (`\x1b]52;c;<base64>\x07`), the span name can quietly overwrite the operator's clipboard with arbitrary text.
+- **Suggested mitigation:** Phase 2 — implement the `safeRune` helper now (10 lines, no extra dependency). Strip `\x1b`, `\x07`, all `C0` controls except `\t`. Apply in `renderTraceDetail` and `buildSpanTree`. Apply the same to `tracesListCmd` while you're in the area (one-line wrap around `str(...)`). Document the helper in `internal/output/text_sanitize.go`. Cost: ~20 LOC; benefit: closes the entire family.
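+
+A minimal sketch of the sanitizer described above. The `sanitizeText` wrapper is a hypothetical name, neither function exists in the repository yet, and stripping DEL and the C1 range is an assumption that goes slightly beyond the listed `\x1b` / `\x07` / C0 set:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// safeRune reports whether r can be written to a terminal unmodified.
+// Tab is allowed; every other C0 control, DEL, and the C1 range are rejected.
+func safeRune(r rune) bool {
+	switch {
+	case r == '\t':
+		return true
+	case r < 0x20 || r == 0x7f: // C0 controls (includes ESC and BEL) and DEL
+		return false
+	case r >= 0x80 && r <= 0x9f: // C1 controls (assumption: also stripped)
+		return false
+	}
+	return true
+}
+
+// sanitizeText drops every rune that safeRune rejects.
+func sanitizeText(s string) string {
+	return strings.Map(func(r rune) rune {
+		if safeRune(r) {
+			return r
+		}
+		return -1 // a negative value tells strings.Map to drop the rune
+	}, s)
+}
+
+func main() {
+	name := "llm.call\x1b[2J\x1b]0;title\x07 ok"
+	fmt.Printf("%q\n", sanitizeText(name)) // prints "llm.call[2J]0;title ok"
+}
+```
+
+Dropping only the control bytes leaves printable residue such as `[2J` in the output. That keeps the helper small; removing whole escape sequences would require a sequence parser.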
+
+---
+
+### Important — 404 vs 403 message split is an existence oracle (intentional?)
+
+- **Threat:** Plan deliberately splits "permission denied for trace `<id>`" (403/PERMISSION_DENIED) from "trace `<id>` not found" (404/TRACE_NOT_FOUND). That distinction reveals whether a trace id exists across tenants. AC #3 from issue #17 demands the distinction, so this is a documented trade-off — but the plan does not name it as a trade-off.
+- **Status:** HOLE (information disclosure) but intentional per AC; needs explicit acknowledgment.
+- **Evidence:**
+  - Phase 2 Architecture line 42-45: distinct messages by code.
+  - plan.md AC #3 line 33: "Errors clearly distinguish: not found, permission denied, malformed id, and server/API failure."
+- **Attack scenario:** Attacker with valid token for tenant A iterates plausible trace-id prefixes. `403 permission denied for trace X` confirms X exists under tenant B; `404 not found` rules X out. Over time the attacker enumerates cross-tenant trace ids — useful for crafting follow-on social-engineering or for confirming activity against a known target.
+- **Suggested mitigation:** This is primarily a server-side concern (the server is choosing to return 403 vs 404 distinct codes), and the CLI faithfully surfaces what the server returns. Two options:
+  1. Defer to server: add a Phase 2 note "the 403/404 distinction is the server's choice; CLI surfaces it per AC #3. If the server later merges to a single 404 for security, CLI behavior follows automatically because `MapHTTPStatus(404) -> 3` covers both."
+  2. Add a `--paranoid` flag (out of scope; just flag in plan).
+  Recommend option 1: add 3-line "Information disclosure trade-off" subsection to Phase 2 Security Considerations naming this explicitly so the next reviewer doesn't re-litigate it.
+
+---
+
+### Important — Missing server-code entries for `TRACE_NOT_FOUND` / `PERMISSION_DENIED`
+
+- **Threat:** Plan Phase 2 step 4 line 91-92 says "Confirm `MapServerCode('TRACE_NOT_FOUND')` returns `ExitNotFound`. If not, extend the switch." Verified: `internal/output/exit.go:17-39` does NOT contain `TRACE_NOT_FOUND` or `PERMISSION_DENIED` / `FORBIDDEN`. So `MapServerCode` returns `ExitGeneric` (1) for both, and the code falls through to `MapHTTPStatus` via `FromError` (internal/output/error.go:148-152). That path works iff `APIError.HTTPStatus()` is populated — and that's a non-trivial invariant to depend on.
+- **Status:** HOLE (latent fragility, not exploit-grade).
+- **Evidence:**
+  - `internal/output/exit.go:17-39` — no TRACE_NOT_FOUND, no PERMISSION_DENIED, no FORBIDDEN.
+  - `internal/output/error.go:148-152` — fallback only fires when `errors.As(err, &aws)` succeeds AND `HTTPStatus() > 0`. If the server returns a JSON error envelope WITHOUT a status code surface in `APIError`, exit code is 1 instead of 2/3.
+- **Attack scenario:** Not exploit; degraded UX. Automation script checking `$? == 2` for auth failures sees `$? == 1` and treats it as unknown error — bad escalation behavior.
+- **Suggested mitigation:** Phase 2 step 4 — make the extension non-conditional. Add `TRACE_NOT_FOUND -> ExitNotFound`, `PERMISSION_DENIED -> ExitAuth`, `FORBIDDEN -> ExitAuth` to `serverCodeMap` unconditionally. Verify with `TestMapServerCode_TraceCodes`. Cost: 4 lines + 1 test.
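+
+A sketch of the unconditional mapping, assuming a map-based lookup. The constant values follow the exit codes cited in this review (1 generic, 2 auth, 3 not found); the real `internal/output/exit.go` may use a switch and different identifiers, so treat every name here as illustrative:
+
+```go
+package main
+
+import "fmt"
+
+// Exit codes as cited in this review; assumed, not copied from exit.go.
+const (
+	ExitGeneric  = 1
+	ExitAuth     = 2
+	ExitNotFound = 3
+)
+
+// serverCodeMap holds only the three entries proposed in the mitigation.
+var serverCodeMap = map[string]int{
+	"TRACE_NOT_FOUND":   ExitNotFound,
+	"PERMISSION_DENIED": ExitAuth,
+	"FORBIDDEN":         ExitAuth,
+}
+
+// MapServerCode returns the exit code for a server error code,
+// falling back to ExitGeneric for anything unknown.
+func MapServerCode(code string) int {
+	if exit, ok := serverCodeMap[code]; ok {
+		return exit
+	}
+	return ExitGeneric
+}
+
+func main() {
+	for _, code := range []string{"TRACE_NOT_FOUND", "PERMISSION_DENIED", "FORBIDDEN", "SOMETHING_ELSE"} {
+		fmt.Printf("%s -> exit %d\n", code, MapServerCode(code))
+	}
+}
+```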
+
+---
+
+### Important — JSON full-payload preservation may leak internal fields
+
+- **Threat:** Phase 2 line 20-21 promises "JSON/YAML mode emits the **complete** server payload — no field dropping, no schema reshaping." That's the AI-ergonomics contract, but it converts a previous bug (silent JSON dump in table mode showed everything anyway) into an intentional, documented passthrough. If the server payload includes internal-only fields (`_internal_user_id`, debug counters, `__raw_sql`, etc.) the CLI now reliably exposes them. Previously, a user in TTY who saw a JSON blob might not parse it; now they get a JSON-structured output that is easier to grep/extract.
+- **Status:** MITIGATED (within CLI scope) / HOLE (cross-boundary, server's responsibility).
+- **Evidence:**
+  - Phase 2 line 20-21 — explicit full-passthrough decision.
+  - Phase 1 Architecture line 36 — current code already does `printer.Print(unmarshalMap(data))` which in JSON mode passes through fully. So the surface is NOT new — but the *test* in Phase 2 line 70 (`TestTracesGet_JSONMode_PreservesNestedFields`) freezes the contract as "every field round-trips" — which becomes a forcing function against future server-side scrubbing.
+- **Attack scenario:** Server team adds an `_internal_debug_sql` field for observability; CLI test fails because the field is dropped/scrubbed; pressure pushes back on the server scrubbing.
+- **Suggested mitigation:** Phase 2 — relax the test from "every input field present in output" to "every input field UNDER `payload` present in output, where input field name does not start with `_`" or "explicit whitelist of known-public fields from the captured fixture." Document the rule in `internal/output/error.go` or a new `docs/trace-payload-contract.md`. Cost: 3-line test refinement, 5-line doc.
+
+---
+
+### Minor — Fixture contains real `agent_id` / `trace_id` even after scrub
+
+- **Threat:** Plan Phase 1 says "Keep only structural keys + sentinel values" but does not specify the sentinel format. Real trace ids and agent ids — even without other PII — can be cross-referenced against logs by anyone with operational access to the gateway. They are not secret but they ARE indirect identifiers (tying CLI test commits to specific live traces).
+- **Status:** MITIGATED-IF-FOLLOWED (depends on operator using sentinels). Covered by the `jq` recipe in the Critical finding above.
+- **Evidence:** Phase 1 Security Considerations line 102 mentions sentinel values but doesn't define them.
+- **Attack scenario:** Marginal — researcher cross-references `trace_id` in fixture against gateway logs to identify a real user session. Low-likelihood, low-impact, but free to mitigate.
+- **Suggested mitigation:** Sentinel pattern (already in the `jq` recipe above): `trc_XXXXXXXXXXXXXXXX`, `agt_XXXXXXXX`, `usr_XXXXXXXX`. Add a `TestFixtureContainsOnlySentinels` test that asserts the fixture matches `^[a-z]{3}_X+$` on id fields, blocking accidental real-id regression.
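+
+One way the sentinel check could look, as a sketch. It assumes the id fields to guard are `trace_id`, `agent_id`, and `user_id` (matching the three sentinel prefixes above); `nonSentinelIDs` and `collectBadIDs` are hypothetical helpers, and a real test would read `cmd/testdata/trace_detail_get.json` instead of an inline sample:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"regexp"
+	"sort"
+)
+
+// sentinelID is the pattern proposed in the finding above.
+var sentinelID = regexp.MustCompile(`^[a-z]{3}_X+$`)
+
+// idFields lists the keys that must hold sentinels (assumed set).
+var idFields = map[string]bool{"trace_id": true, "agent_id": true, "user_id": true}
+
+// collectBadIDs walks decoded JSON and records the path of every
+// guarded id field whose string value is not a sentinel.
+func collectBadIDs(v any, path string, bad *[]string) {
+	switch t := v.(type) {
+	case map[string]any:
+		for key, child := range t {
+			p := path + "." + key
+			if s, ok := child.(string); ok && idFields[key] && !sentinelID.MatchString(s) {
+				*bad = append(*bad, p)
+			}
+			collectBadIDs(child, p, bad)
+		}
+	case []any:
+		for i, child := range t {
+			collectBadIDs(child, fmt.Sprintf("%s[%d]", path, i), bad)
+		}
+	}
+}
+
+// nonSentinelIDs decodes raw JSON and returns the offending paths, sorted.
+func nonSentinelIDs(raw []byte) ([]string, error) {
+	var doc any
+	if err := json.Unmarshal(raw, &doc); err != nil {
+		return nil, fmt.Errorf("decode fixture: %w", err)
+	}
+	var bad []string
+	collectBadIDs(doc, "$", &bad)
+	sort.Strings(bad)
+	return bad, nil
+}
+
+func main() {
+	fixture := []byte(`{"payload":{"trace_id":"trc_XXXXXXXXXXXXXXXX","agent_id":"agt_XXXXXXXX","user_id":"u-8841"}}`)
+	bad, err := nonSentinelIDs(fixture)
+	fmt.Println(bad, err) // prints [$.payload.user_id] <nil>
+}
+```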
+
+---
+
+### Minor — Timing oracle: not present in current plan
+
+- **Threat:** "Does a test for exit code 3 measure HTTP latency that leaks whether trace existed pre-auth-check?"
+- **Status:** NOT_APPLICABLE.
+- **Evidence:** Phase 1 and Phase 2 tests assert only exit code and message string, no latency assertions. Tests use `httptest.NewServer` which has near-zero latency anyway.
+- **Attack scenario:** None.
+- **Suggested mitigation:** None. Flag if any future PR adds `time.Since(start)` assertions on the error path.
+
+---
+
+## Severity Counts
+
+- Critical: 3 (Path traversal, Fixture scrubbing recipe, Curl token leakage)
+- Important: 4 (ANSI parity claim, 404/403 oracle, Missing server-code entries, JSON passthrough contract test)
+- Minor: 2 (Sentinel format, Timing oracle [N/A])
+
+## Open Questions
+
+- Is the trace id format actually `^[A-Za-z0-9_-]{N}$` (typical ULID/UUID/nanoid)? If yes, regex validation closes path traversal cleanly. If trace ids can contain arbitrary characters (e.g. user-supplied custom ids), validation must be looser and traversal mitigation falls back to explicit `..`/`.` rejection. Phase 1's live capture will resolve this.
+- Does `digitopvn/goclaw` server treat `PERMISSION_DENIED` and `FORBIDDEN` interchangeably, or is one the convention? Phase 2 step 4 should grep the server repo (if accessible) to confirm.
+- Is there a workspace-wide pre-commit hook for secret scanning (e.g. `gitleaks`, `trufflehog`) that would catch a leaked token in the fixture? If yes, the curl-token mitigation can lean on it. If no, this should become a `.git/hooks/pre-commit` addition in Phase 3.
+
+Status: DONE
diff --git a/plans/260528-1357-fix-trace-details-by-id/reports/repro-260528-issue17.md b/plans/260528-1357-fix-trace-details-by-id/reports/repro-260528-issue17.md
new file mode 100644
index 0000000..feeca98
--- /dev/null
+++ b/plans/260528-1357-fix-trace-details-by-id/reports/repro-260528-issue17.md
@@ -0,0 +1,109 @@
+# Repro & root cause — issue #17 (cannot read trace details by id)
+
+Date: 2026-05-28
+Phase: 1
+Worktree: `worktree-fix-trace-details-by-id`
+
+## Smoke probe
+
+**Intended target:** `goclaw.zuey.me` (production).
+**Result:** auth blocked. No `~/.goclaw/config.yaml`, no token in env. The plan-documented escape applies — use a stub fixture derived from the existing `traces follow` payload shape ([cmd/traces_follow_test.go:129-130](cmd/traces_follow_test.go:129)), marked `_TODO_refresh` for the Phase 3 reviewer gate to enforce a real-gateway refresh before merge.
+
+**Curl command for future smoke probe (env-var token, no inline secret):**
+
+```bash
+export TOKEN="$(goclaw config get token 2>/dev/null || cat ~/.goclaw/token)"
+export SERVER="https://goclaw.zuey.me"
+TRACE_ID="$(goclaw traces list --limit 1 -o json | jq -r '.payload[0].trace_id')"
+curl -sS -H "Authorization: Bearer $TOKEN" "$SERVER/v1/traces/$TRACE_ID" > /tmp/trace_raw.json
+goclaw traces get "$TRACE_ID" > /tmp/trace_tty.txt
+goclaw traces get "$TRACE_ID" -o json > /tmp/trace_json.txt
+goclaw traces get bogus-id-12345 -o json > /tmp/trace_404.txt 2>&1
+```
+
+After refresh, run the `jq walk` scrub recipe from phase-01 step 2, then `grep -i 'eyJ\|Bearer\|sk-\|token=' cmd/testdata/trace_detail_get.json` must return 0 lines.
+
+## Fixture (scrubbed stub — sample)
+
+```json
+{
+  "_TODO_refresh": "stub fixture ...; refresh against goclaw.zuey.me before merge per phase-03 reviewer gate",
+  "trace_id": "trace_FIXTURE_001",
+  "agent_id": "agent_FIXTURE_001",
+  "session_key": "session_FIXTURE_001",
+  "user_id": "user_REDACTED",
+  "tenant_id": "tenant_REDACTED",
+  "status": "success",
+  "duration_ms": 2000,
+  "spans": [
+    {"span_id": "span_001", "parent_span_id": null, "name": "agent.run", "kind": "agent", "status": "success"},
+    {"span_id": "span_002", "parent_span_id": "span_001", "name": "llm.call", "kind": "llm", "status": "success"},
+    {"span_id": "span_003", "parent_span_id": "span_001", "name": "tool.call", "kind": "tool", "status": "success"}
+  ],
+  "events": [
+    {"event_id": "ev_001", "type": "llm.prompt"},
+    {"event_id": "ev_002", "type": "llm.completion"},
+    {"event_id": "ev_003", "type": "tool.invoke"}
+  ]
+}
+```
+
+Secret-scan post-check: `grep -i 'eyJ\|Bearer\|sk-\|token=' cmd/testdata/trace_detail_get.json` → 0 lines. ✅
+
+## Test result summary
+
+```
+go test -count=1 ./cmd/... -run TestTracesGet -v
+```
+
+| Test | Result | Meaning |
+|------|--------|---------|
+| `TestTracesGet_PathAndMethod` | PASS | Wire contract (`GET /v1/traces/{id}`) already correct. |
+| `TestTracesGet_HappyPath_JSON_LocksFixture` | PASS | JSON-mode envelope round-trips correctly. |
+| `TestTracesGet_TableMode_HumanReadable_RED` | **FAIL** | Table mode emits raw JSON beginning with `{` — the reported "unusable output" bug, locked. |
+
+Failure assertion: `table mode rendered raw JSON (starts with '{')`. Exactly as predicted by the scout findings.
+
+## Classification: **CLI-side**
+
+All four verified defects (table below) are in the CLI; no server-side root cause is required for AC#1–#3.
+
+| # | Defect | Evidence | Fix lane |
+|---|--------|----------|----------|
+| 1 | Table mode falls back to JSON dump for `map[string]any` payload | [cmd/traces.go:72](cmd/traces.go:72) calls `printer.Print(unmarshalMap(data))`; `output.Printer.Print` only formats `*TableData` in table mode, otherwise JSON-fallbacks ([internal/output/output.go:30-37](internal/output/output.go:30)) | Phase 2 — inline render (header + span tree via `output.PrintTree` + flat events list) |
+| 2 | `unmarshalMap` silently swallows `json.Unmarshal` errors | [cmd/helpers.go:48-53](cmd/helpers.go:48) — literal `_ = json.Unmarshal(data, &m)` | Phase 2 — inline `json.Unmarshal` with `return fmt.Errorf("decode trace payload: %w", err)` |
+| 3 | No id validation; raw `args[0]` concatenated into URL with no `url.PathEscape` | [cmd/traces.go:68](cmd/traces.go:68) — `"/v1/traces/" + args[0]` | Phase 2 — strict allowlist regex `^[A-Za-z0-9._-]+$` + reject `.` / `..` / empty / whitespace, then `url.PathEscape` |
+| 4 | No client-side categorization between 404 / 403 / malformed-id / 5xx | `cmd/traces.go` `tracesGetCmd` has no error mapping | Phase 2 tests + existing `apiErrorCodeForStatus` ([cmd/helpers.go:152-172](cmd/helpers.go:152)) already maps every required HTTP status — no `MapServerCode` extension. |
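+
+The fix lane for defect 3 can be sketched as follows. `tracePath` and `traceIDPattern` are hypothetical names, not code from `cmd/traces.go`; the allowlist regex is the one quoted in the table:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/url"
+	"regexp"
+)
+
+// traceIDPattern is the strict allowlist from the defect table.
+// It already rejects the empty string, whitespace, and "/".
+var traceIDPattern = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)
+
+// tracePath validates id and returns the request path for it.
+// "." and ".." pass the regex, so they are rejected explicitly.
+func tracePath(id string) (string, error) {
+	if id == "." || id == ".." || !traceIDPattern.MatchString(id) {
+		return "", fmt.Errorf("invalid trace id %q", id)
+	}
+	// PathEscape is defence in depth; validated ids need no escaping.
+	return "/v1/traces/" + url.PathEscape(id), nil
+}
+
+func main() {
+	for _, id := range []string{"trc_01HXYZ", "..", "a/b", ""} {
+		path, err := tracePath(id)
+		fmt.Printf("%q -> %q, err=%v\n", id, path, err)
+	}
+}
+```
+
+Validating before escaping keeps the error category clean: a malformed id fails locally and never reaches the server as a 404.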
+
+## Server error-code findings
+
+Not observed live (auth-blocked). Plan does not speculate server-code strings — relies on `apiErrorCodeForStatus` HTTP-status mapping which covers every AC#3 category:
+
+| HTTP | Canonical code | Exit code |
+|------|----------------|-----------|
+| 400 / 422 | `INVALID_REQUEST` | 4 |
+| 401 | `UNAUTHORIZED` | 2 |
+| 403 | `TENANT_ACCESS_REVOKED` | 2 |
+| 404 | `NOT_FOUND` | 3 |
+| 429 | `RESOURCE_EXHAUSTED` | 6 |
+| 5xx | `INTERNAL` | 5 |
+
+This locks AC#3 behavior independent of upstream server-code strings.
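+
+The table above, written out as a sketch. `classifyStatus` is a hypothetical stand-in and is not the existing `apiErrorCodeForStatus`; the fallback to exit code 1 for unlisted statuses is an assumption:
+
+```go
+package main
+
+import "fmt"
+
+// classifyStatus maps an HTTP status to the canonical code and exit
+// code listed in the table. Unlisted statuses fall back to exit 1.
+func classifyStatus(status int) (code string, exit int) {
+	switch {
+	case status == 400 || status == 422:
+		return "INVALID_REQUEST", 4
+	case status == 401:
+		return "UNAUTHORIZED", 2
+	case status == 403:
+		return "TENANT_ACCESS_REVOKED", 2
+	case status == 404:
+		return "NOT_FOUND", 3
+	case status == 429:
+		return "RESOURCE_EXHAUSTED", 6
+	case status >= 500 && status <= 599:
+		return "INTERNAL", 5
+	default:
+		return "", 1 // assumption: generic failure for anything else
+	}
+}
+
+func main() {
+	for _, status := range []int{400, 401, 403, 404, 422, 429, 503} {
+		code, exit := classifyStatus(status)
+		fmt.Printf("%d -> %s (exit %d)\n", status, code, exit)
+	}
+}
+```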
+
+## Upstream issue body
+
+**Not filed.** Root cause is CLI-side; no upstream issue required for `digitopvn/goclaw`. Phase 3 step 5 short-circuits.
+
+## Acceptance criteria mapping (preview)
+
+| AC | Phase | Status after Phase 1 |
+|----|-------|----------------------|
+| 1. Read trace details by id | Phase 2 (render path + decode-error surfacing) | Pending |
+| 2. JSON & human-readable output | Phase 2 (red test 3 + new green tests) | Red test locked |
+| 3. Distinct errors: not-found / perm / malformed / server | Phase 2 (HTTP-status mapping; existing `apiErrorCodeForStatus`) | Pending tests |
+| 4. Link upstream issue if API root cause | n/a — classified CLI-side | Resolved |
+| 5. Regression test | Phase 1 (3 tests) + Phase 2 (7 more) | 3 of 10 landed |
+
+## Unresolved questions
+
+- **Real-gateway fixture refresh.** Stub fixture is shaped from `traces follow` payload conventions, not the real `GET /v1/traces/{id}` wire envelope. If the real shape differs materially (e.g. spans nested under `tree`, events under `event_log`), Phase 2 render assertions may need light shape adjustments. Phase 3 code-reviewer gate enforces refresh + re-run before merge.
diff --git a/plans/reports/code-reviewer-260417-1254-goclaw-skill-red-team.md b/plans/reports/code-reviewer-260417-1254-goclaw-skill-red-team.md
new file mode 100644
index 0000000..4fc7720
--- /dev/null
+++ b/plans/reports/code-reviewer-260417-1254-goclaw-skill-red-team.md
@@ -0,0 +1,323 @@
+# Red Team Review: GoClaw Claude Skill Plan
+
+**Reviewer:** adversarial / staff eng
+**Target:** `plans/260417-1254-goclaw-claude-skill/` (plan.md + 4 phase files)
+**Date:** 2026-04-17
+**Verdict up front:** Ship AFTER critical fixes. The plan is structurally sound, but several load-bearing assumptions are unverified, effort is underestimated, the OSS scope is premature, and a LICENSE blocker is ignored.
+
+---
+
+## TL;DR — Kill / Keep / Modify
+
+### Kill
+- **"exec" hero use case as framed** (Phase 2 `exec-workflow.md`). `goclaw tools invoke exec --param command=...` is an assumption — no such hardcoded tool name in CLI source. It's `tools invoke <name>`; whether server registers `exec` as a tool is unverified. Verify BEFORE Phase 2 or the hero reference is wrong on day 1.
+- **15 reference files as v1 scope.** YAGNI violation. Ship 3-5; iterate.
+- **`--full-auto` install flag.** Premature. One feature flag maintaining two permission permutations for a one-user + speculative OSS case. Cut it from v1; add when a user asks.
+- **"≥ 90% correct flag" success criterion.** Unfalsifiable as written. Either define measurement methodology or drop.
+- **`Bash(goclaw whoami)` exact permission pattern** (Phase 4 step 1). Does NOT match `goclaw whoami --output json` — the skill's own convention is to always append `--output json`, so this rule will never fire. Same for `status`, `health`, `version`.
+
+### Keep
+- Overall structure (SKILL.md + references + install.sh + README)
+- Python3 merge approach for settings.json (already-verified pattern from researcher report)
+- Backup-before-patch of settings.json
+- Single-repo hosting decision (D1)
+- Convention "always append `--output json`" (D6)
+
+### Modify
+- Plan count: says 38 top-level groups, actually **36** registered in `rootCmd.AddCommand` (grep verified). `delegations` IS registered as top-level (admin.go:141), not "implied" as explorer claimed. `media` group exists (`admin_media.go:59`) but is missing from plan clusters entirely.
+- Merge `approvals` into ONE reference (currently split across `exec-workflow.md` AND `chat-sessions.md` per explorer cluster mapping — duplicate content risk).
+- Effort labels (S/M/M/M) — replace with concrete hours; see Major Issue #3.
+- Permission matrix — redesign around actual wildcard semantics (see Critical #2).
+
+---
+
+## Critical issues (must fix before implementation)
+
+### C1. LICENSE file does not exist at repo root — OSS publish blocked on day 1
+Checked: `ls /Volumes/GOON/www/nlb/goclaw-cli/LICENSE` → **No such file or directory**. Phase 1 step 1 says "Check license file exists at repo root. If yes, copy... If no, note for Phase 4." Phase 4 does not include "write LICENSE" in its todo list — it's invisible. Result: you hit Phase 4 and realize Goal 4 ("OSS-publishable") is blocked waiting on a legal decision (MIT vs Apache-2.0?) that needs user input.
+
+**Fix:** Before Phase 1, ask user which license. Add `write LICENSE` to Phase 1 todo list explicitly. Don't defer to Phase 4.
+
+### C2. Permission rule patterns likely DO NOT enforce as intended
+Plan's install.sh adds:
+```
+Bash(goclaw list:*)   Bash(goclaw get:*)
+Bash(goclaw status)   Bash(goclaw whoami)
+Bash(goclaw health)   Bash(goclaw version)
+```
+
+Two problems:
+
+(a) **Skill convention says always append `--output json`** (D6 in plan). So Claude will actually run `goclaw status --output json`, which does NOT match exact pattern `Bash(goclaw status)`. Same for whoami, health, version. Four of six readonly rules are dead code. User will hit approval prompt for read-only commands — defeating the "readonly default safe" design.
+
+(b) **`Bash(goclaw list:*)` assumes `list` is a top-level subcommand.** It is NOT. Structure is `goclaw agents list`, `goclaw sessions list`, `goclaw cron list`, etc. The pattern `goclaw list:*` matches literal string `goclaw list something` — won't match `goclaw agents list`. Plan should use patterns like `Bash(goclaw agents list:*)`, `Bash(goclaw sessions list:*)`, etc., OR use `Bash(goclaw * list *)` if wildcards in middle positions work (researcher report: "Wildcard patterns fragile with complex Bash"). More likely answer: enumerate safe leaf commands.
+
+**Fix:**
+1. Verify empirically in Claude Code which wildcard patterns actually enforce. Don't ship based on assumed syntax.
+2. Once `--output json` is appended, the rules need a trailing wildcard: `Bash(goclaw status *)`, `Bash(goclaw whoami *)` etc., OR use `Bash(goclaw * --output json)` (if middle wildcards work).
+3. For read-only by verb, must fan out per resource group: `Bash(goclaw agents list:*)`, `Bash(goclaw agents get:*)`, `Bash(goclaw sessions list:*)`, ~20 rules minimum. Scope creep on install.sh.
+
+### C3. "exec" tool name is unverified — hero reference may be fiction
+Phase 2 centers on `goclaw tools invoke exec --param command="..."` as THE hero use case. Source code in `cmd/tools.go:111-143`:
+```go
+Use: "invoke <name>",
+body := map[string]any{"name": args[0], "parameters": params}
+data, err := c.Post("/v1/tools/invoke", body)
+```
+`name` is free-form. **No code anywhere hardcodes "exec" as a tool name.** It's assumed the server registers an `exec` tool. If it doesn't, or if it's named `shell`, `bash`, `run-command`, the hero reference is wrong.
+
+Phase 2 risk section admits: "The response schema of `tools invoke exec` is not yet known; it must be verified with a real call against the dev server OR by reading the server code". Acknowledging this is fine, but it is BLOCKING for the phase labelled "hero" and cannot be deferred.
+
+**Fix:** Before writing exec-workflow.md, run `goclaw tools builtin list --output json` against a live dev server and record the actual tool names + parameter schemas. Attach to phase 2 as prerequisite.
+
+### C4. One-liner install via `curl | bash` has no integrity check
+README (Phase 4): `curl -fsSL <raw-url>/claude-skill/install.sh | bash`.
+
+No SHA256 verification, no pinned commit SHA, no signed release. User trusts:
+1. github.com TLS not MITM'd (OK-ish)
+2. The `main` branch HEAD at install time (supply-chain risk — attacker merges to main)
+3. Any subsequent files the script curls from main (install.sh in Phase 4 doesn't detail this — does it curl down references from raw.githubusercontent.com one by one? If yes, that's 18+ HTTPS requests per install, any of which fails partially)
+
+Plan doesn't specify fetch strategy. Options:
+- Single tarball from GitHub releases (recommended) — one artifact, one SHA256 check.
+- Script embeds a commit SHA and curls blob URLs at that SHA (reproducible).
+- User runs `git clone && ./install.sh` (safest, slowest).
+
+**Fix:** Phase 4 must specify install mode. Recommend: release tarball + SHA256 in README. `curl|bash` one-liner downloads tarball and verifies checksum. Without this, publishing to OSS is irresponsible — skill can install with modified `goclaw:*` permissions rule and user won't notice.
+
+### C5. Phase 4 smoke test is inadequate
+"Manual test 5 prompts" — what about the inverse:
+- User says "give me claw feet for my couch" (woodworking) — does skill load? (false positive)
+- User says "goclaw go claws" in a poem — does skill load? (over-triggering wastes context)
+- User says "run a command on my server" (generic) — does skill load when it should? (under-triggering)
+- Description keyword test across 10+ unrelated prompts to bound over/under-trigger rate.
+
+Also: no regression test for "Claude hallucinates `--force` flag instead of `--yes`" — verify each reference against source code flags post-write.
+
+**Fix:** Phase 4 test matrix expand to:
+- 5 positive intents (currently in plan)
+- 5 negative intents (skill should NOT load)
+- 5 destructive intents (skill should prompt before running)
+- Automated flag-verification pass: for each reference, grep every `--flag` name against corresponding `cmd/*.go` source.
+
+---
+
+## Major issues (should fix)
+
+### M1. Effort estimate is wildly optimistic
+Plan: Phase 2 = M, Phase 3 = M, Phase 4 = M. Reality:
+
+| Phase | Claimed | Realistic (one human, focused) |
+|-------|---------|--------------------------------|
+| 1 (scaffold) | S | 1-2 hrs |
+| 2 (5 refs, 150-250 lines each = ~1000 lines, requires reading ~10 cmd/*.go files, verifying flags, testing) | M | **8-12 hrs** |
+| 3 (10 refs, ~2000 lines, same verification overhead) | M | **15-20 hrs** |
+| 4 (install.sh production-grade + README + tests + journal) | M | **4-6 hrs** |
+
+Total: **~30-40 hours** for a quality skill. Phase 3 todo list says "15-30 minutes per file", which is drastically underestimated for files requiring source verification of 5-11 subcommands, flag tables, 3-5 worked examples, and cross-links. 30 min is "skim source and write template boilerplate". You want 45-90 min per reference minimum for quality. At 10 refs × 1 hr = 10 hrs, plus 3 hrs cross-linking + testing = 13 hrs for phase 3 alone. Plan's implicit assumption of "one session" will produce a fatigue-driven quality drop around file 8.
+
+**Fix:** Either (a) cut to 5 refs for v1, iterate based on usage, or (b) split Phase 3 across multiple sessions with explicit "resume here" markers per file.
+
+### M2. 15 references is YAGNI violation for stated scope
+Goal 3 says "Cover all 38 command groups", but why? User's actual goal (per brainstorm citation `4/17 12:41`): "exec on the server is enough". Covering `tenants`, `system-config`, `tts`, `media`, `api-docs open` upfront speculates on a need that may never materialize. Every reference written = maintenance debt (flag drift, staleness).
+
+Compare: `gh-cli` skill in researcher report has no references/ at all — it's pure SKILL.md reference doc. Works fine.
+
+**Fix:** v1 = 3 references:
+1. `exec-workflow.md` (hero, if exec tool verified)
+2. `auth-and-config.md` (prereq for any call)
+3. `common-commands.md` (agents list/get, sessions list/get, status, logs — the 80% use)
+
+Ship. Add more when user says "I want X and it's not covered". Measure actual usage before investing in 2000+ lines of maintenance surface.
+
+### M3. Source-of-truth drift — no automated sync mechanism
+Plan (Risk row 4): "goclaw CLI changes a flag → reference outdated. Mitigation: version check at the top of SKILL.md; link to CHANGELOG." This is not mitigation; it's a README comment. When `goclaw agents create` adds a `--temperature` flag in v1.3, nothing in the pipeline flags the skill as outdated.
+
+Real mitigations (pick one):
+- **CI check:** script that greps every flag in references against `cmd/*.go`, fails PR if mismatch.
+- **Auto-gen with manual polish:** generate verified-flags table from `goclaw <cmd> --help`, commit alongside manual prose.
+- **Version pinning in SKILL.md frontmatter:** `compatible-with: goclaw>=0.4.0,<0.5.0` and CI updates bounds on CLI release.
+
+Plan dismisses auto-generation as "written by hand to control UX", but writing the verified-flags table by hand guarantees drift. A hybrid approach (auto-gen flag table, hand-write prose) gives you both.
+
+**Fix:** Add Phase 5 (or bake into Phase 4): write `claude-skill/check-drift.sh` that grep-validates flag names. Wire into Makefile + CI.
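+
+The drift check could also be written in Go instead of shell. The sketch below is a heuristic under stated assumptions: it treats a flag as registered when its name appears as a quoted string in the command source (as Cobra flag registrations usually do), and `missingFlags` is a hypothetical helper. A real check would read the reference files and `cmd/*.go` from disk:
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// flagPattern captures long-flag names such as --output or --dry-run.
+var flagPattern = regexp.MustCompile(`--([a-z][a-z0-9-]*)`)
+
+// missingFlags returns, in order of first appearance, every flag named
+// in the reference text whose name never occurs as a quoted string in
+// the Go source text.
+func missingFlags(reference, source string) []string {
+	seen := map[string]bool{}
+	var missing []string
+	for _, m := range flagPattern.FindAllStringSubmatch(reference, -1) {
+		name := m[1]
+		if seen[name] {
+			continue
+		}
+		seen[name] = true
+		if !strings.Contains(source, `"`+name+`"`) {
+			missing = append(missing, name)
+		}
+	}
+	return missing
+}
+
+func main() {
+	reference := "Confirm with `--yes`; never pass `--force`."
+	source := `cmd.Flags().Bool("yes", false, "skip confirmation")`
+	fmt.Println(missingFlags(reference, source)) // prints [force]
+}
+```
+
+A non-empty result would fail the CI job, which is the signal the version note in SKILL.md cannot provide.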
+
+### M4. Approvals is placed in BOTH exec-workflow AND chat-sessions
+Phase 2 says `exec-workflow.md` covers `approvals list/approve/deny`. Explorer cluster mapping puts `approvals` under `chat-sessions`. Phase 2 Priority list #3 is `chat-sessions.md` which also touches approvals. Duplicate content → maintenance burden + inconsistency risk.
+
+**Fix:** Decide ONE canonical home. Since approvals fire from `tools invoke exec` (execution gating), keep in `exec-workflow.md`. `chat-sessions.md` cross-links only.
+
+### M5. `media`, `packages`, `delegations` cluster assignments are wrong or missing
+- `media` group (`cmd/admin_media.go:59`) exists. Not in any cluster in plan/explorer.
+- `packages` (`cmd/packages.go:102`) lives where? Plan phase 3 puts it in `providers-skills-tools.md`. Packages are runtime binaries, not "skills" — confusing grouping.
+- `delegations` is a root command (`cmd/admin.go:141` → `rootCmd.AddCommand(approvalsCmd, delegationsCmd)`), not "implied in agents.links" as explorer said. Plan phase 3 puts it under `agents-advanced.md`. Verify whether `delegations` is a separate resource or just an alias.
+
+**Fix:** Run `goclaw --help` against actual built binary, extract authoritative top-level command list, re-cluster. Don't trust explorer table.
+
+### M6. Python3 fallback path assumes `command -v python3` works — ancient macOS caveat
+install.sh (Phase 4):
+```bash
+PY="$HOME/.claude/skills/.venv/bin/python3"
+[[ ! -x "$PY" ]] && PY="$(command -v python3)"
+[[ -z "$PY" ]] && { echo "python3 not found — add permissions manually"; ...}
+```
+
+Issues:
+- `command -v python3` returns path but doesn't verify it's Python **3**. Some old macOS had `/usr/bin/python3` symlinked to `python2.7`. Rare in 2026 but possible.
+- The `[[ -z "$PY" ]]` check does not reliably fire. If `command -v python3` finds nothing, the assignment on the second line is the final command of an `&&` list and returns non-zero, so a script running under `set -e` aborts there, before the friendly "python3 not found" message prints. And if the null check ever ends up after the invocation, the script runs `"" <<EOF`, which fails with "command not found".
+- Script uses `${FULL_AUTO}` and `${DRY_RUN}` inside the Python heredoc — this is Bash substitution, NOT Python variable. The resulting Python becomes `rules = 1 == 1 and [...]` which works by accident but is fragile. A shellcheck lint will likely complain.
+
+**Fix:**
+- Use `"$PY" -c 'import sys; sys.exit(0 if sys.version_info[0]==3 else 1)'` as a sanity check on the interpreter that was actually selected.
+- Reorder null check BEFORE invocation.
+- Pass flags via env vars to Python (cleaner):
+  ```bash
+  FULL_AUTO=$FULL_AUTO DRY_RUN=$DRY_RUN "$PY" <<'EOF'
+  import os
+  full_auto = os.environ['FULL_AUTO'] == '1'
+  ...
+  EOF
+  ```
+  Note heredoc `<<'EOF'` (quoted) prevents bash substitution. Fixes injection risk too.
+
+### M7. install.sh backup-on-failure behavior unspecified
+install.sh: `[[ -f "$SETTINGS" ]] && cp "$SETTINGS" "${SETTINGS}.bak.$(date +%s)"`
+
+If `cp` fails (disk full, permission denied on the `.bak` target), the outcome depends on shell options. Without `set -e`, cp's error goes to stderr, nothing checks the exit status, and the script proceeds to mutate settings.json with NO backup. With `set -e`, the failing `cp` is the last command in the `&&` list, so the shell does exit, but without any message explaining that the backup failed. Either way the failure handling is implicit rather than stated.
+
+Test: `bash -c '[[ -f /etc/hosts ]] && cp /etc/hosts /dev/null/x; echo reached'` prints "reached" after the cp error. With `bash -euo pipefail -c '...'` the shell exits at the failed `cp` and "reached" is never printed, but no backup-specific error is shown.
+
+**Fix:**
+```bash
+if [[ -f "$SETTINGS" ]]; then
+  cp "$SETTINGS" "${SETTINGS}.bak.$(date +%s)" || {
+    echo "ERROR: Failed to backup settings.json, aborting" >&2
+    exit 1
+  }
+fi
+```
+
+### M8. `--full-auto` confirmation reads from stdin — fails under `curl|bash`
+Phase 4 security note: "`--full-auto` confirmation cannot be bypassed via a pipe (reads from /dev/tty)."
+
+The plan correctly identifies the issue, but the script as written uses `read -p "Continue? [y/N]" -n 1 -r`, which **reads from stdin**. Under `curl|bash`, stdin is the piped script itself, so `read -n 1` consumes the next character of the script. Result: the confirmation is answered by whatever character follows (approving only if it happens to be `y`) and the remaining script text is corrupted, OR `read` fails immediately if stdin is already at EOF.
+
+**Fix:** `read -p "..." < /dev/tty` OR refuse to run under `--full-auto` when `[[ ! -t 0 ]]`. Plan mentions the fix in prose but not in code. Put it in code.
+
+---
+
+## Minor issues / nitpicks
+
+### N1. SKILL.md name collision risk
+Plan D2: skill name `goclaw`. Canonical path: `~/.claude/skills/goclaw/`. If user already has a skill named `goclaw` (from another source), install.sh overwrites silently. Add `--force` flag gating overwrite, default refuse with "existing skill found at ..., use --force to replace".
+
+### N2. Kebab-case naming guideline vs actual file list
+Plan CLAUDE.md says "Go snake_case file naming". But Phase 1 non-functional req says "Kebab-case for reference file names". The skill files are Markdown not Go, so kebab-case is fine; just note the mental switch to avoid accidental snake_case drift in reference filenames.
+
+### N3. `chat-sessions.md` omits abort
+Phase 2 agenda for `chat-sessions.md` lists chat send, single-shot, sessions CRUD. Explorer shows `chat abort` exists as destructive-without-`--yes`. Not in phase 2 content outline. Gap.
+
+### N4. `auth pair` handling
+Explorer: "auth pair = device pairing, poll 60× 2s, not streaming but long-running". Phase 2 outline notes "long-running polling — document 'not Bash-friendly'". Fine. But test matrix (Phase 4) doesn't include an auth-pair prompt. If Claude is asked "set up goclaw auth", what happens? Probably initiates `goclaw auth pair` and hangs the Bash tool for 120 seconds. Add explicit guidance: "skill should refuse to run pair; tell user to run manually".
+
+### N5. Output path name convention leaks
+Report naming convention in header says `code-reviewer-260417-1304-{slug}.md` but task requested `code-reviewer-260417-1254-goclaw-skill-red-team.md`. Inconsistency — the 1304 is from hook-injection time, 1254 is the plan's own timestamp. Pick one and stick. (I used 1254 per explicit user instruction.)
+
+### N6. `effort` frontmatter field is legacy
+Researcher report §1 says "`effort` — Override inference effort. Not in use for CLI wrappers." But Phase 1/2/3/4 YAML frontmatter all have `effort: S/M/M/M`. This is the plan's own tracking metadata, not SKILL.md frontmatter — so technically OK. But confusing naming overlap.
+
+### N7. Cross-repo README link maintenance
+Phase 4 step 3: "Update root `README.md` to add a link to `claude-skill/README.md`". Fine, but this creates a doc dependency. If skill README moves or renames, root README breaks. Low-risk nitpick.
+
+---
+
+## Unaddressed risks
+
+### U1. Skill "always append --output json" convention is unenforceable
+D6 says Claude should always append `--output json`. But Claude's obedience to instructions varies with context pressure and conflicting signals. If user says "show me agents in a table", Claude might omit `--output json`. Downstream parsing then breaks because skill examples all assume JSON.
+
+**Mitigation idea:** Wrap in a script. `goclaw-json` shim that forces `--output json` by default, errors on table mode. Skill instructs Claude to use `goclaw-json` not `goclaw`. Too heavy for v1; note as follow-up.
+
+### U2. Token expiry mid-session
+Plan says "Skill does NOT store the token; it uses the goclaw CLI's credential store". What if the credential expires mid-session? `goclaw agents list --output json` exits with an auth error. Skill has no reference covering "what does Claude do when goclaw returns 401?". Gap.
+
+**Fix:** In `auth-and-config.md`, document exit codes and re-auth flow.
+
+### U3. Streaming command over-triggering
+User says "watch agent logs". Claude has a skill reference saying `logs tail` is streaming, unsupported. What does Claude actually do? Best case: explains limitation, suggests polling. Worst case: runs `goclaw logs tail`, Bash tool times out at 120s, user sees cryptic truncation. Not tested in Phase 4 matrix.
+
+### U4. Windows users exist
+Non-goal: "Windows PowerShell script; macOS/Linux only for v1". Fine. But Claude Code runs on Windows. A Windows user who `git clone`s the repo will see install.sh and try to run it under git-bash. Most of it works, but `/dev/tty`, `date +%s`, and Python venv path all need adjustment. Plan doesn't say "Windows unsupported, print error and exit". Add that.
+
+### U5. Multiple Claude Code installations
+Some users have `~/.claude/` and `~/.config/claude/` both. install.sh hardcodes `~/.claude/`. If Claude Code changes default path in 2027, skill silently installs to stale location. Add env var override: `CLAUDE_HOME="${CLAUDE_HOME:-$HOME/.claude}"`.
+
+### U6. "15 references" means 15× surface area for prompt injection
+Not relevant today because user loads refs manually. But if references include server-returned data (e.g., example responses), a hostile gateway could inject prompts via error messages included in examples. Low-risk, but: rule "no server-rendered content in references, all examples synthetic".
+
+---
+
+## Concrete fix checklist (prioritized)
+
+**BLOCKING — do before starting any phase:**
+1. Decide LICENSE — write it in Phase 1.
+2. Verify `exec` tool exists on server and capture its schema. Or rename hero to a verified tool.
+3. Empirically verify `Bash(goclaw status)` vs `Bash(goclaw status *)` vs `Bash(goclaw status --output json)` wildcard behavior in actual Claude Code 2026.
+4. Re-enumerate top-level groups from `goclaw --help` (binary output, not source grep) — fix count to 36 or whatever actual number, include `media`, confirm `delegations`.
+
+**MUST fix in Phase 1:**
+5. Add the LICENSE write to the Phase 1 todo.
+6. Replace `effort: S/M/M/M` with hour estimates: Phase 1 = 2h, Phase 2 = 10h, Phase 3 = 18h (if kept at 10 refs), Phase 4 = 6h.
+
+**MUST fix in Phase 2:**
+7. Decide approvals canonical home (exec-workflow vs chat-sessions). Cross-link the other.
+8. Cut Phase 3 to 5 refs OR explicitly budget 18 hours.
+
+**MUST fix in Phase 4:**
+9. install.sh:
+   - Quote heredoc (`<<'EOF'`) and pass vars via env.
+   - Robust backup (if-block, not `&&`).
+   - `read < /dev/tty` for `--full-auto` confirm, or refuse when `! -t 0`.
+   - Python3 sanity check (3.x verification).
+   - Windows abort message.
+10. README one-liner: use release tarball + SHA256, not raw main branch.
+11. Test matrix: expand to 15 prompts (positive / negative / destructive).
+12. Add `check-drift.sh` with flag grep validation.
+
+**SHOULD consider:**
+13. Drop `--full-auto` flag from v1 entirely. Add when demanded.
+14. Cut reference count to 3-5 for v1. Mark phase 3 as "v0.2 backlog".
+15. Revisit whether this should be a skill at all — a CLAUDE.md snippet + auto-approved `Bash(goclaw:*)` in project settings.json might serve one-user need. Skill is right answer only if OSS publish is confirmed priority (worth the maintenance cost).
+
+---
+
+## Recommendation
+
+**Ship after critical fixes.** The plan is structurally coherent but carries ~8 unverified assumptions, any one of which could torpedo UX on launch day. Specifically:
+- If "exec" tool doesn't exist → hero reference is wrong.
+- If wildcard permission patterns don't match `--output json` suffix → all readonly rules dead.
+- If LICENSE undecided → OSS publish blocked.
+- If effort underestimated 3× → Phase 3 ships as low-quality boilerplate.
+
+Recommend downgrade to v0.1 scope: **3 references, no --full-auto flag, single tarball install with SHA256**. That's a 1-day project that proves the skill pattern works. Phase 3 + additional references become v0.2 driven by actual user demand. This is aligned with plan's stated YAGNI principle — which phase 3 violates.
+
+**Do NOT** proceed with current plan as-is. Minimum viable correction: address C1-C5 before Phase 1.
+
+---
+
+## Unresolved questions
+
+1. Does `exec` tool exist on GoClaw server, or is hero path fiction? (Blocking)
+2. What is the empirical behavior of `Bash(goclaw status)` permission pattern against `goclaw status --output json`? (Blocking)
+3. Which license (MIT / Apache-2.0 / proprietary)? (Blocking OSS)
+4. Is OSS publish actually a goal, or is this a one-user skill? Plan treats OSS as goal; user's stated motivation was personal use.
+5. Does `goclaw --help` list 36 or 38 top-level groups? Plan and explorer disagree with source code.
+6. Does server register a `media` resource and is it in-scope for skill?
+7. For `tools invoke`, what's the canonical JSON response schema? Referenced in Phase 2 Risk but never resolved.
+8. What is expected behavior when Claude invokes a streaming command (`logs tail`) — abort at 120s Bash timeout, or should skill pre-empt?
+
+---
+
+**Status:** DONE_WITH_CONCERNS
+**Summary:** Plan is well-structured but has 5 critical blockers (LICENSE missing, exec tool unverified, permission wildcards likely broken, no install integrity check, inadequate smoke test) and 8 major issues. Scope is 3× YAGNI — cut to 3 references for v1. Effort underestimated ~3×.
+**Concerns:** Without verifying the three unverified-but-load-bearing assumptions (exec tool, wildcard semantics, license) before Phase 1, work will stall mid-Phase 2. Recommend parent agent route blocking questions back to user before dispatching implementer.
diff --git a/plans/reports/codex-prompt-260521-p6-backend-unblock.md b/plans/reports/codex-prompt-260521-p6-backend-unblock.md
new file mode 100644
index 0000000..5931e9d
--- /dev/null
+++ b/plans/reports/codex-prompt-260521-p6-backend-unblock.md
@@ -0,0 +1,182 @@
+# Codex CLI Prompt: Implement goclaw-cli P6 Backend-Unblocked Commands
+
+Workdir: `/Volumes/GOON/www/nlb/goclaw-cli`
+
+You are working on `goclaw-cli`, the Go/Cobra CLI for GoClaw Gateway.
+
+## Mission
+
+Implement only the P6 CLI commands now unblocked by backend PR #37 in `digitopvn/goclaw`.
+
+Backend evidence:
+- Backend repo: `digitopvn/goclaw`
+- PR: `https://github.com/digitopvn/goclaw/pull/37`
+- Merge commit: `56e227c4030e85163cd882b29ab472f8ce3e1a27`
+- Beta tag containing these APIs: `v3.12.0-beta.16`
+- Backend files proving contracts:
+  - `internal/http/traces.go`
+  - `internal/http/providers.go`
+  - `internal/http/openapi_spec.json`
+  - `docs/18-http-api.md`
+
+Before implementation, verify current backend release status. If `v3.12.0-beta.16` release/assets are still publishing, note that, but CLI code may proceed because the tag already points to PR #37.
+
+## Current goclaw-cli repo context
+
+Read first:
+- `README.md`
+- `CLAUDE.md`
+- `AGENTS.md`
+- Existing command patterns in:
+  - `cmd/traces.go`
+  - `cmd/providers.go`
+  - `cmd/providers_crud.go`
+  - `cmd/helpers.go`
+  - `internal/client/http.go`
+  - command tests under `cmd/*_test.go`
+
+Important local state warning:
+- The checkout may already be on a feature branch and may have unrelated untracked files such as `.claude/` or `AGENTS.md`.
+- Do not overwrite or delete unrelated user/untracked files.
+- If the working tree is dirty in unrelated files, either work around them safely or create a clean worktree/branch for this task.
+
+## Scope: implement exactly 2 command surfaces
+
+### 1. Trace polling-friendly follow
+
+Add a CLI command under `traces`, probably:
+
+```bash
+goclaw traces follow --session-key <key> [--since <RFC3339>] [--limit N] [--include-spans] [--status <status>] [--channel <channel>] [-o json|yaml|table]
+goclaw traces follow --agent <uuid> [same flags]
+```
+
+Backend endpoint:
+
+```http
+GET /v1/traces/follow
+```
+
+Query contract:
+- Require one of:
+  - `session_key`
+  - `agent_id`
+- Optional:
+  - `status`
+  - `channel`
+  - `since` as RFC3339
+  - `limit` (server default 50, server max 200)
+  - `include_spans`, boolean, default false
+- Non-admin callers only receive their own traces. Admin may pass `user_id`, but do not add `--user-id` unless existing CLI conventions already expose admin trace filters.
+
+Response payload after `internal/client.HTTPClient` envelope unwrap:
+
+```json
+{
+  "traces": [],
+  "spans_by_trace_id": {},
+  "server_time": "2026-05-21T00:00:00Z",
+  "next_since": "2026-05-21T00:00:00Z",
+  "limit": 50
+}
+```
+
+CLI behavior:
+- For JSON/YAML, print the full response map.
+- For table, print trace rows similar to `traces list`, and include enough fields to be useful:
+  - `TRACE_ID`, `AGENT`, `STATUS`, `DURATION_MS`, `INPUT_TOKENS`, `OUTPUT_TOKENS`, `COST`
+- Validate that exactly one target style is provided if that matches local CLI style; at minimum return a clear error if neither `--session-key` nor `--agent` is provided.
+- Do not implement long-lived watch loops unless existing CLI has a clear polling/watch convention. This backend endpoint is polling-friendly; one request is acceptable for this slice.
+
+Tests to add:
+- Command builds correct path for `session_key`, `since`, `limit`, `include_spans`.
+- Command builds correct path for `agent_id`.
+- Missing both `--session-key` and `--agent` returns error before HTTP call.
+- JSON output preserves `next_since` and `spans_by_trace_id`.
+
+### 2. Provider reconnect
+
+Add a CLI command under `providers`, probably:
+
+```bash
+goclaw providers reconnect <provider-id> [-o json|yaml|table]
+```
+
+Backend endpoint:
+
+```http
+POST /v1/providers/{id}/reconnect
+```
+
+Auth/permission:
+- Backend requires admin role.
+
+Request contract:
+- No body by default.
+- Do not send `{ "verify": true }`.
+- Do not add a `--verify` flag. Backend explicitly rejects verify-on-reconnect; users should call `goclaw providers verify <id>` separately.
+
+Response payload after envelope unwrap:
+
+```json
+{
+  "status": "reconnected",
+  "provider": {},
+  "registry_updated": true,
+  "cache_invalidated": true
+}
+```
+
+`status` enum:
+- `reconnected`
+- `disabled`
+- `not_registered`
+
+CLI behavior:
+- For JSON/YAML, print the full response map.
+- For table, print a small single-row table or concise success message containing `status`, `registry_updated`, and `cache_invalidated`.
+- Path-escape the provider ID exactly like existing provider commands.
+
+Tests to add:
+- Command uses `POST /v1/providers/{escaped-id}/reconnect`.
+- Command sends no request body by default.
+- JSON/table handling does not drop `registry_updated` or `cache_invalidated`.
+
+## Explicitly out of scope
+
+Do not add stubs, placeholders, hidden flags, or commands for these still-blocked P6 backend items:
+- `POST /v1/traces/{id}/replay`
+- `GET /v1/logs/aggregate`
+- `POST /v1/channels/instances/{id}/writers/test`
+- `POST /v1/chat/sessions/{key}/branch`
+- WebSocket `chat.history.delta`
+- SSE/HTTP chat history follow
+
+If you discover an existing CLI command already maps to either new endpoint, preserve it and document the mapping instead of duplicating command names.
+
+## Workflow
+
+Use TDD:
+1. Scout current command/test helpers.
+2. Add focused failing tests for the two command surfaces.
+3. Implement the smallest command changes.
+4. Run:
+   - `go test ./...`
+   - `go vet ./...`
+   - `go build ./...`
+5. Red-team the diff for:
+   - wrong endpoint path
+   - body accidentally sent to reconnect
+   - command name collision
+   - output mode regression
+   - accidental implementation of blocked endpoints
+6. Commit with a clean conventional message.
+7. Open PR against the correct integration branch used by this repo.
+
+## Expected result
+
+One small PR in `goclaw-cli` that unblocks:
+- `goclaw traces follow`
+- `goclaw providers reconnect`
+
+No backend stubs. No fake commands for absent APIs.
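Reviewer sketch for the `traces follow` contract described in the prompt above: a minimal Go example of the client-side validation and query building, placed between the file diffs so the hunks stay untouched. The `followOpts` type and `buildTracesFollowPath` function are hypothetical names for illustration; the real command would presumably go through the existing helpers in `cmd/helpers.go` and `internal/client`, whose signatures are not shown here. Only the parameter names (`session_key`, `agent_id`, `since`, `limit`, `include_spans`) and the endpoint path come from the prompt.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// followOpts mirrors the documented query parameters. The type and its
// field names are hypothetical, not taken from the goclaw-cli source.
type followOpts struct {
	SessionKey   string
	AgentID      string
	Since        string
	Limit        int
	IncludeSpans bool
}

// buildTracesFollowPath validates the options before any HTTP call and
// returns the request path with an encoded query string.
func buildTracesFollowPath(o followOpts) (string, error) {
	if o.SessionKey == "" && o.AgentID == "" {
		return "", errors.New("one of --session-key or --agent is required")
	}
	q := url.Values{}
	if o.SessionKey != "" {
		q.Set("session_key", o.SessionKey)
	}
	if o.AgentID != "" {
		q.Set("agent_id", o.AgentID)
	}
	if o.Since != "" {
		// Reject malformed timestamps locally instead of round-tripping.
		if _, err := time.Parse(time.RFC3339, o.Since); err != nil {
			return "", fmt.Errorf("--since must be RFC3339: %w", err)
		}
		q.Set("since", o.Since)
	}
	if o.Limit > 0 {
		// A zero limit is omitted so the server default applies.
		q.Set("limit", strconv.Itoa(o.Limit))
	}
	if o.IncludeSpans {
		q.Set("include_spans", "true")
	}
	// Encode sorts keys, which keeps the path stable for test assertions.
	return "/v1/traces/follow?" + q.Encode(), nil
}

func main() {
	path, err := buildTracesFollowPath(followOpts{
		SessionKey: "agent:default:ws:direct:abc",
		Since:      "2026-05-21T00:00:00Z",
		Limit:      50,
	})
	fmt.Println(path, err)
}
```

Keeping the path builder as a pure function makes the "errors before HTTP call" tests trivial to write, since no test server is needed for the validation cases.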
diff --git a/plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md b/plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md
new file mode 100644
index 0000000..02d22f5
--- /dev/null
+++ b/plans/reports/codex-prompt-260522-p6-pr44-backend-unblocked-cli.md
@@ -0,0 +1,490 @@
+# Codex CLI Prompt: Implement goclaw-cli P6 backend-unblocked commands after backend PR 44
+
+Workdir: `/Volumes/GOON/www/nlb/goclaw-cli`
+
+You are working on `goclaw-cli`, the Go/Cobra CLI for GoClaw Gateway.
+
+## Mission
+
+Implement the P6 CLI commands now unblocked by backend PRs #37 and #44 in `digitopvn/goclaw`.
+
+This is a CLI-consuming task only. Do not create backend stubs. Do not invent APIs.
+
+## Backend evidence
+
+Backend repo: `digitopvn/goclaw`
+
+Already released in beta:
+- PR #37: `https://github.com/digitopvn/goclaw/pull/37`
+- Merge commit: `56e227c4030e85163cd882b29ab472f8ce3e1a27`
+- Beta tag known to contain these APIs: `v3.12.0-beta.16`
+- APIs:
+  - `GET /v1/traces/follow`
+  - `POST /v1/providers/{id}/reconnect`
+
+Merged to `dev`, beta release may still be publishing:
+- PR #44: `https://github.com/digitopvn/goclaw/pull/44`
+- Merge commit: `43049d3b3fbb5f457477118252d1f21fdc0480de`
+- As of prompt creation, latest listed backend beta was `v3.12.0-beta.18`, created before PR #44 landed.
+- As of prompt creation, `Dev CI and Beta Release` for commit `43049d3b` was still pending:
+  - `https://github.com/digitopvn/goclaw/actions/runs/26292214016`
+- APIs:
+  - `POST /v1/chat/sessions/{key}/branch`
+  - `GET /v1/chat/sessions/{key}/history/follow`
+  - `POST /v1/channels/instances/{id}/writers/test`
+  - `GET /v1/activity/aggregate`
+  - `GET /v1/logs/runtime/aggregate`
+
+Before implementation, verify current backend release status:
+
+```bash
+gh run list --repo digitopvn/goclaw --branch dev --limit 5 \
+  --json databaseId,name,status,conclusion,createdAt,headSha,url
+gh release list --repo digitopvn/goclaw --limit 10
+```
+
+If PR #44 is not in a beta tag yet, CLI implementation may still proceed from merged `dev` contracts, but do not claim live beta support until a beta tag containing `43049d3b` exists.
+
+Backend files proving contracts:
+- `internal/http/traces.go`
+- `internal/http/providers.go`
+- `internal/http/sessions.go`
+- `internal/http/channel_instances.go`
+- `internal/http/activity.go`
+- `internal/http/logs.go`
+- `internal/http/openapi_spec.json`
+- `docs/18-http-api.md`
+
+## Current goclaw-cli repo context
+
+Read first:
+- `README.md`
+- `CLAUDE.md` if present
+- `AGENTS.md` if present
+- Existing command patterns:
+  - `cmd/traces.go`
+  - `cmd/providers.go`
+  - `cmd/providers_crud.go`
+  - `cmd/sessions.go`
+  - `cmd/channels_writers.go`
+  - `cmd/admin_activity.go`
+  - `cmd/logs.go`
+  - `cmd/helpers.go`
+  - `internal/client/http.go`
+  - `internal/output/output.go`
+  - command tests under `cmd/*_test.go`
+
+Important local state warning:
+- The current checkout may be on branch `feat/claude-skill-v0.1`.
+- The current checkout may contain unrelated untracked files such as `.claude/` and `AGENTS.md`.
+- Do not overwrite, delete, stage, or commit unrelated user files.
+- Prefer creating a clean worktree or branch for this task before editing.
+
+Suggested branch name:
+
+```bash
+codex/feat-p6-backend-unblocked-cli
+```
+
+## Scope: implement backend-unblocked P6 CLI surfaces
+
+Implement exactly these command surfaces unless current CLI already contains one. If an equivalent command exists, preserve it and document the mapping instead of creating a duplicate command name.
+
+### 1. Trace polling-friendly follow
+
+Command:
+
+```bash
+goclaw traces follow --session-key <key> [--since <RFC3339>] [--limit N] [--include-spans] [--status <status>] [--channel <channel>] [-o json|yaml|table]
+goclaw traces follow --agent <uuid-or-key> [same flags]
+```
+
+Endpoint:
+
+```http
+GET /v1/traces/follow
+```
+
+Query contract:
+- require exactly one of `session_key` or `agent_id`
+- optional: `status`, `channel`, `since`, `limit`, `include_spans`
+- `since` must be RFC3339 if provided
+- server default `limit=50`, server max `200`
+
+Response after HTTP envelope unwrap:
+
+```json
+{
+  "traces": [],
+  "spans_by_trace_id": {},
+  "server_time": "2026-05-21T00:00:00Z",
+  "next_since": "2026-05-21T00:00:00Z",
+  "limit": 50
+}
+```
+
+CLI behavior:
+- JSON/YAML: print full response.
+- Table: print trace rows similar to `traces list`.
+- Include useful columns: `TRACE_ID`, `AGENT`, `STATUS`, `DURATION_MS`, `INPUT_TOKENS`, `OUTPUT_TOKENS`, `COST`.
+- One request only. Do not implement a watch loop in this slice.
+
+Tests:
+- session-key query builds correct path/query.
+- agent query builds correct path/query.
+- missing both target flags errors before HTTP call.
+- setting both target flags errors before HTTP call.
+- JSON output preserves `next_since` and `spans_by_trace_id`.
+
+### 2. Provider reconnect
+
+Command:
+
+```bash
+goclaw providers reconnect <provider-id> [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+POST /v1/providers/{id}/reconnect
+```
+
+Contract:
+- admin-only backend permission.
+- no request body by default.
+- do not send `{ "verify": true }`.
+- do not add `--verify`; users should call `goclaw providers verify <id>` separately.
+
+Response after HTTP envelope unwrap:
+
+```json
+{
+  "status": "reconnected",
+  "provider": {},
+  "registry_updated": true,
+  "cache_invalidated": true
+}
+```
+
+Status enum:
+- `reconnected`
+- `disabled`
+- `not_registered`
+
+CLI behavior:
+- JSON/YAML: print full response.
+- Table: print status, registry_updated, cache_invalidated.
+- Path-escape provider ID like existing provider commands.
+
+Tests:
+- POST path is `/v1/providers/{escaped-id}/reconnect`.
+- no request body is sent by default.
+- JSON/table output preserves `registry_updated` and `cache_invalidated`.
+
+### 3. Session branch at message index
+
+Command:
+
+```bash
+goclaw sessions branch <session-key> --up-to-index <n> [--new-session-key <key>] [--label <label>] [--metadata k=v]... [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+POST /v1/chat/sessions/{key}/branch
+```
+
+Request:
+
+```json
+{
+  "new_session_key": "optional",
+  "up_to_index": 12,
+  "label": "optional",
+  "metadata": {
+    "source": "cli"
+  }
+}
+```
+
+Response:
+
+```json
+{
+  "ok": true,
+  "source_key": "agent:default:ws:direct:abc",
+  "session_key": "agent:default:branch:direct:uuid",
+  "copied_messages": 12,
+  "total_messages": 24,
+  "label": "optional"
+}
+```
+
+CLI behavior:
+- `--up-to-index` is required and must be >= 0.
+- `--metadata` parses repeated `key=value`; reject malformed entries before HTTP call.
+- path-escape the source session key.
+- JSON/YAML: full response.
+- Table: source, new session key, copied/total, label.
+
+Tests:
+- required `--up-to-index`.
+- negative index rejected before HTTP call.
+- request body shape matches backend contract.
+- path escaping works for session keys containing `:` and `/`.
+- conflict response maps to the existing CLI error handling pattern.
+
+### 4. Session history follow by cursor
+
+Command:
+
+```bash
+goclaw sessions follow <session-key> [--cursor <n>] [--limit <n>] [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+GET /v1/chat/sessions/{key}/history/follow
+```
+
+Query:
+- `cursor`, default `0`, must be >= 0.
+- `limit`, default `50`, server max `200`, must be > 0.
+
+Response:
+
+```json
+{
+  "session_key": "agent:default:ws:direct:abc",
+  "cursor": 12,
+  "next_cursor": 18,
+  "total": 18,
+  "messages": [],
+  "reset": false,
+  "updated": "2026-05-22T13:00:00Z"
+}
+```
+
+CLI behavior:
+- One polling request only. Do not implement SSE/WS watch in this slice.
+- JSON/YAML: full response.
+- Table: print cursor, next_cursor, total, reset, and compact message rows.
+- Use existing output helpers; do not create a separate renderer unless needed.
+
+Tests:
+- query path includes cursor and limit.
+- negative cursor rejected before HTTP call.
+- non-positive limit rejected before HTTP call.
+- JSON output preserves `reset`, `next_cursor`, and messages.
+
+### 5. Channel writer permission test
+
+Command:
+
+```bash
+goclaw channels writers test <instance-id> --group-id <group-scope> --user-id <user-id> [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+POST /v1/channels/instances/{id}/writers/test
+```
+
+Request:
+
+```json
+{
+  "group_id": "group:telegram:-100123",
+  "user_id": "386246614"
+}
+```
+
+Expected response shape:
+
+```json
+{
+  "allowed": true,
+  "reason": "writer",
+  "instance_id": "uuid",
+  "agent_id": "uuid",
+  "group_id": "group:telegram:-100123",
+  "user_id": "386246614",
+  "writer_count": 3
+}
+```
+
+Known `reason` values:
+- `writer`
+- `not_writer`
+- `no_writers_configured`
+- `invalid_group`
+
+CLI behavior:
+- require `--group-id` and `--user-id`.
+- POST body only has `group_id` and `user_id`.
+- JSON/YAML: full response.
+- Table: allowed, reason, writer_count, group_id, user_id.
+
+Tests:
+- missing required flags fail before HTTP call.
+- request path/body exact.
+- table/json output keeps `allowed`, `reason`, and `writer_count`.
+
+### 6. Activity log aggregate
+
+Command:
+
+```bash
+goclaw activity aggregate --group-by <action|actor_type|entity_type|actor_id> [--from <RFC3339>] [--to <RFC3339>] [--limit <n>] [--actor-type <v>] [--actor-id <v>] [--action <v>] [--entity-type <v>] [--entity-id <v>] [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+GET /v1/activity/aggregate
+```
+
+Contract:
+- `group_by` required.
+- valid values: `action`, `actor_type`, `entity_type`, `actor_id`.
+- backend restricts `group_by=actor_id` to admin.
+- optional filters: `from`, `to`, `limit`, `actor_type`, `actor_id`, `action`, `entity_type`, `entity_id`.
+- non-admin backend callers are scoped to resolved user context.
+
+Response:
+
+```json
+{
+  "source": "activity",
+  "group_by": "action",
+  "total": 10,
+  "limit": 50,
+  "from": "2026-05-22T00:00:00Z",
+  "to": "2026-05-23T00:00:00Z",
+  "buckets": [
+    {"key": "session.branch", "count": 7, "last_seen": "2026-05-22T11:00:00Z"}
+  ]
+}
+```
+
+CLI behavior:
+- validate `--group-by` against allowed values before HTTP call.
+- validate `--from` and `--to` as RFC3339 if provided.
+- JSON/YAML: full response.
+- Table: key, count, last_seen.
+
+Tests:
+- missing/invalid group-by rejected before HTTP call.
+- filters build correct query string.
+- JSON output preserves source, group_by, total, buckets.
+
+### 7. Runtime log aggregate
+
+Command:
+
+```bash
+goclaw logs aggregate [--group-by <level|source>] [--level <debug|info|warn|error>] [--source <source>] [--from <RFC3339>] [-o json|yaml|table]
+```
+
+Endpoint:
+
+```http
+GET /v1/logs/runtime/aggregate
+```
+
+Contract:
+- admin-only backend permission.
+- source is runtime ring buffer, not durable audit logs.
+- `group_by` default `level`; valid values: `level`, `source`.
+- optional filters: `level`, `source`, `from`.
+
+Response:
+
+```json
+{
+  "source": "runtime",
+  "retention": "ring_buffer",
+  "capacity": 100,
+  "sample_size": 25,
+  "group_by": "level",
+  "buckets": [
+    {"key": "warn", "count": 3, "last_seen": 1760000000000}
+  ]
+}
+```
+
+CLI behavior:
+- JSON/YAML: full response.
+- Table: key, count, last_seen, plus source/retention/capacity/sample_size summary if local output helpers support it.
+- Do not confuse this with `goclaw logs tail`, which is WebSocket streaming.
+
+Tests:
+- default group_by omitted or set to `level` based on existing CLI style.
+- invalid group-by rejected before HTTP call.
+- filters build correct query.
+- JSON output preserves retention, capacity, sample_size.
+
+## Explicitly out of scope
+
+Do not add stubs, placeholders, hidden flags, docs, or command names for APIs that still do not exist.
+
+Still out of scope:
+- `POST /v1/traces/{id}/replay`
+- `GET /v1/logs/aggregate`
+- WebSocket `chat.history.delta`
+- SSE chat history follow
+- any long-running watch loop for history follow or traces follow
+- live backend smoke if no beta tag containing PR #44 exists yet
+
+## Implementation workflow
+
+Use TDD.
+
+1. Scout:
+   - read README/CLAUDE/AGENTS.
+   - inspect command patterns and test helpers.
+   - verify whether PR #37 commands are already implemented.
+   - verify backend beta status for PR #44.
+2. Branch/worktree:
+   - avoid the dirty `feat/claude-skill-v0.1` checkout.
+   - create/switch to a clean feature branch or worktree.
+3. Tests first:
+   - add focused command tests for path/query/body/validation/output.
+   - tests should fail before implementation.
+4. Implement smallest changes:
+   - use existing `internal/client.HTTPClient` helpers.
+   - use existing output helpers.
+   - keep command modules small; follow existing file layout.
+5. Validation:
+   - `go test ./...`
+   - `go vet ./...`
+   - `go build ./...`
+6. Red-team diff:
+   - wrong endpoint path.
+   - body accidentally sent to provider reconnect.
+   - command name collision.
+   - output mode regression.
+   - missing required client-side validation.
+   - accidental implementation of still-blocked replay/generic logs APIs.
+   - dirty worktree accidentally staged.
+7. Ship:
+   - commit with conventional message.
+   - push branch.
+   - open PR against the correct integration branch for this repo.
+
+## Acceptance criteria
+
+Expected output is one focused `goclaw-cli` PR that adds CLI commands for every backend-unblocked P6 surface above.
+
+Done means:
+- all seven command surfaces exist, unless already present and explicitly mapped.
+- each command has focused tests.
+- all tests/build/vet pass.
+- README/help text reflects the new commands if this repo normally updates README for command coverage.
+- no CLI command exists for trace replay or generic `/v1/logs/aggregate`.
+- PR body lists the backend PR/tag evidence and notes whether PR #44 beta tag was available at implementation time.
+
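Reviewer sketch for the `sessions branch` rules in the prompt above (repeated `--metadata key=value`, malformed entries rejected before the HTTP call, and a path-escaped session key). This is a minimal Go example under stated assumptions: `parseMetadata` and `branchPath` are hypothetical helper names, and how the existing provider commands escape IDs is not shown in the prompt, so `url.PathEscape` is an assumption here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseMetadata turns repeated key=value flag values into a map and rejects
// malformed entries. The function name is hypothetical.
func parseMetadata(pairs []string) (map[string]string, error) {
	out := make(map[string]string, len(pairs))
	for _, p := range pairs {
		// Cut splits at the first '=', so values may themselves contain '='.
		k, v, ok := strings.Cut(p, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid --metadata %q: want key=value", p)
		}
		out[k] = v
	}
	return out, nil
}

// branchPath escapes the session key as a single path segment, so a '/'
// inside the key cannot add a path segment. ':' is legal in a segment and
// is left as-is by url.PathEscape. Also a hypothetical helper.
func branchPath(sessionKey string) string {
	return "/v1/chat/sessions/" + url.PathEscape(sessionKey) + "/branch"
}

func main() {
	md, err := parseMetadata([]string{"source=cli", "note=a=b"})
	fmt.Println(md, err)
	fmt.Println(branchPath("agent:default:ws/direct"))
}
```

A test for "session keys containing `:` and `/`" should therefore expect `%2F` for the slash and a literal colon, assuming the server route decodes the segment.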
diff --git a/plans/reports/explore-260417-1254-goclaw-command-inventory.md b/plans/reports/explore-260417-1254-goclaw-command-inventory.md
new file mode 100644
index 0000000..9767410
--- /dev/null
+++ b/plans/reports/explore-260417-1254-goclaw-command-inventory.md
@@ -0,0 +1,329 @@
+# GoClaw CLI Command Surface Inventory
+
+**Generated:** 2026-04-17 | **Thoroughness:** Medium | **Scope:** 38 top-level command groups + 60 files
+
+---
+
+## 1. TOP-LEVEL COMMAND MAP
+
+Complete mapping of all 38 command groups registered in `rootCmd.AddCommand` (root.go + resource files).
+
+| Group | Subcommands | Purpose | Key Flags | Destructive | JSON | Streaming / Transport | Notes |
+|-------|-------------|---------|-----------|-------------|------|-----------|-------|
+| **activity** | list | View audit log | --limit | N | Y | N | HTTP GET, table/json output |
+| **agents** | list, get, create, update, delete, [files, instances, links, ops, wake] | Manage agents & instances | --name, --provider, --model, --type, --yes | Y (delete) | Y | N | HTTP CRUD; delete uses tui.Confirm |
+| **agents.files** | list, get, create, delete | Agent context files | --data (JSON/@filepath) | Y | N | WS only | WebSocket calls, TUI-driven |
+| **agents.instances** | list, get, create, delete, [trigger, reset] | Per-user agent instances | --user-id, --yes | Y | N | WS only | WebSocket calls |
+| **agents.links** | list, create, delete | Delegation links | --agent, --target, --yes | Y | N | HTTP | tui.Confirm for delete |
+| **agents.ops** | share, unshare, regenerate, resummon, wait | Agent operations | --user, --yes | Y | N | HTTP/WS mix | share/unshare=HTTP DELETE; wait=WS.Subscribe |
+| **agents.wake** | wake | Wake sleeping agent | (none) | N | N | HTTP POST | Single action, always succeeds |
+| **api-docs** | open, spec | API documentation | (none) | N | N | HTTP GET | open=browser launch, spec=JSON fetch |
+| **api-keys** | list, create, reveal, revoke, extend | API key management | --name, --scopes, --expires-in, --yes | Y (revoke) | Y | HTTP | Scoped access, masked display |
+| **approvals** | list, approve, deny, watch | Execution approvals | --reason, --yes | Y (deny) | N | WS + tui | watch = ws.Subscribe; approve/deny = ws.Call |
+| **auth** | login, logout, whoami, use-context, list-contexts, pair | Authentication | --profile, --pair, --token | Y (logout) | N | HTTP/WS | pair flow uses device pairing code polling |
+| **channels** | [instances, contacts, pending, writers] | Messaging channels | --type, --name, --agent, --yes | Y (delete) | Y | HTTP | Channel type filtering, table output |
+| **channels.contacts** | list, get, create, delete | Contact management | (varies) | Y | Y | HTTP | tui.Confirm for delete |
+| **channels.pending** | list | View pending messages | (none) | N | Y | HTTP GET | Filtered by channelID |
+| **channels.writers** | list, add, remove | Group writer mgmt | --agent, --user, --yes | Y (remove) | Y | HTTP | tui.Confirm for remove |
+| **chat** | send (primary), inject, status, abort | Interactive & one-shot chat | -m, --session, --no-stream | N | Y (--output json) | WS stream | Primary: ws.Stream (NDJSON); abort=destructive but no --yes |
+| **config** | [get, set, permissions] | Server configuration | --key, --value | Y (set) | Y | WS | Uses tui.Confirm for write operations |
+| **config.permissions** | list, update, grant, revoke | Config permissions | --action, --yes | Y (revoke) | N | WS | WebSocket-only, permission model |
+| **contacts** | list, get, create, update, delete, verify | Manage contacts | --name, --email, --phone, --yes | Y (delete) | Y | HTTP CRUD | tui.Confirm for delete |
+| **credentials** | list, create, delete, rotate | CLI credentials store | --profile, --yes | Y (delete) | Y | HTTP | Per-profile credential storage |
+| **cron** | list, get, create, update, delete, trigger, history | Scheduled jobs | --agent, --schedule, --message, --yes | Y (delete) | Y | WS primary | ws.Call for CRUD; fallback to HTTP |
+| **delegations** | (implied in agents.links) | Delegation links | — | — | — | — | See agents.links |
+| **devices** | list, delete, approve, reject | Paired device mgmt | --device-id, --yes | Y (delete) | Y | WS | Device pairing & approval flow |
+| **export** | [agent-preview, agent, team-preview, team, skills-preview, skills, mcp-preview, mcp] | Resource export | --agent, --team, --skills, --output, --yes | N | Y | HTTP | Export as JSON/YAML for import |
+| **health** | (standalone cmd) | Server health check | (none) | N | N | HTTP GET | Simple status check, no args |
+| **heartbeat** | [get, set, checklist, targets] | Heartbeat configuration | --agent, --interval, --yes | Y (set) | Y | WS + HTTP | Multi-part: config (WS) + targets (HTTP) |
+| **heartbeat.checklist** | get, set | Checklist mgmt | --target-id, --enabled, --yes | Y (set) | Y | WS | WebSocket-driven |
+| **heartbeat.targets** | list, add, remove | Checklist targets | --target-id, --name, --yes | Y (remove) | Y | HTTP | tui.Confirm for remove |
+| **import** | [agent, team, skills, mcp] | Resource import | --file, --yes | Y | Y | HTTP POST | Reverse of export; requires --yes flag |
+| **kg** (knowledge-graph) | [entities, traverse, graph, stats, dedup] | Knowledge graph ops | --data, --from, --yes | Y (delete entity) | Y | HTTP CRUD | Alias: kg; entities={list,get,create,delete} |
+| **kg.dedup** | scan, merge-candidates, execute-merge | Entity deduplication | --agent, --yes | Y (merge) | Y | HTTP | Scan → review → execute workflow |
+| **logs** | tail | Stream server logs | --agent, --level | N | Y (NDJSON in json mode) | WS subscribe | ws.Subscribe("*"), runs until interrupted (signal.Notify); interactive |
+| **mcp** | [servers, grants, requests, reconnect] | MCP server mgmt | --name, --transport, --command, --agent, --server, --yes | Y (delete server) | Y | HTTP CRUD | 3 subsystems: servers, grants, requests; reconnect=POST |
+| **memory** | [list, get, store, index, delete, clear] | Agent memory docs | --user, --content, --data, --yes | Y (delete/clear) | Y | HTTP CRUD | Per-agent/user memory store |
+| **packages** | list | Runtime packages | (none) | N | Y | HTTP GET | Read-only list of installed packages |
+| **pending-messages** | list, create, send, delete | Pending message queue | (varies) | Y (delete) | Y | HTTP | tui.Confirm for delete |
+| **providers** | [list, get, create, update, delete, verify-embedding] | LLM provider config | --name, --display-name, --api-key, --yes | Y (delete) | Y | HTTP CRUD | Verify endpoint tests embeddings |
+| **sessions** | list, preview, delete, reset, label | Chat session mgmt | --agent, --user, --label, --yes | Y (delete/reset) | Y | HTTP CRUD | tui.Confirm for destructive ops |
+| **skills** | [list, get, upload, download, publish, unpublish, delete, tenant-config, versions, grant, revoke] | Skill management | --search, --slug, --visibility, --yes | Y (delete/unpublish) | Y | HTTP multipart | upload=multipart form; visibility gating |
+| **status** | (standalone cmd) | Server status | (none) | N | Y | WS then table | ws.Call("status") → table format fallback |
+| **storage** | list, get, put, delete | Workspace files | --path, --content, --data, --yes | Y (delete) | Y | HTTP | Path-based file browsing |
+| **system-config** | list, get, set, delete | Per-tenant KV config | --key, --value, --yes | Y (delete) | Y | HTTP | Tenant-scoped configuration |
+| **teams** | [list, get, create, delete, members, events, tasks, workspace] | Agent team mgmt | --name, --agents, --yes | Y (delete) | N | WS only | All team ops are WebSocket; no HTTP fallback |
+| **teams.members** | list, add, remove, reassign | Team membership | --user, --team-id, --yes | Y (remove) | N | WS | WebSocket-driven |
+| **teams.tasks** | [list, get, create, delete, approve, reject, reassign] | Team task mgmt | --task-id, --title, --yes | Y | N | WS | ws.Call & ws.Subscribe for events |
+| **teams.workspace** | list, get, put, delete | Team workspace files | --path, --content, --yes | Y (delete) | N | WS | WebSocket-based file ops |
+| **tenants** | [list, get, create, update, delete, users] | Multi-tenant admin | --name, --users, --yes | Y (delete) | Y | HTTP CRUD | Admin-only; /v1/tenants endpoints |
+| **tools** | [builtin, custom] | Built-in & custom tools | --search, --yes | Y (delete custom) | Y | HTTP | builtin=read-only list; custom=CRUD |
+| **traces** | list, get | LLM trace viewing | --agent, --limit | N | Y | HTTP GET | OpenAI-compatible trace format |
+| **tts** | status | Text-to-speech | (none) | N | Y | WS call | ws.Call("tts.status") |
+| **usage** | [summary, breakdown, trends, export] | Usage analytics | --period, --agent, --format, --export | N | Y | HTTP GET | Metrics: views, completions, cost |
+| **version** | (standalone cmd) | Version info | (none) | N | N | stdout | Built-in version/commit/build info |
+
+---
+
+## 2. PROPOSED LOGICAL CLUSTERING (12-18 reference files)
+
+Grouping by cognitive locality and CLI skill reference organization:
+
+| Cluster | File | Commands | Rationale |
+|---------|------|----------|-----------|
+| **auth-and-config** | `auth-and-config.md` | auth, api-keys, config, config.permissions, credentials | Auth/session + credential storage + server config |
+| **agents-core** | `agents-core.md` | agents, agents.files, agents.instances, agents.wake | Core agent lifecycle + context + instances |
+| **agents-advanced** | `agents-advanced.md` | agents.links, agents.ops, delegations | Advanced agent operations: delegation, sharing, regenerate |
+| **chat-sessions** | `chat-sessions.md` | chat, sessions, approvals | Interactive chat + session mgmt + execution approval |
+| **knowledge-memory** | `knowledge-memory.md` | kg, kg.dedup, memory, memory.index | Knowledge graph + entity dedup + agent memory |
+| **teams-collaboration** | `teams-collaboration.md` | teams, teams.members, teams.events, teams.tasks, teams.workspace | Team-based operations + task mgmt |
+| **channels-messaging** | `channels-messaging.md` | channels, channels.instances, channels.contacts, channels.pending, channels.writers | Messaging channel delivery |
+| **data-movement** | `data-movement.md` | export, import, storage | Workspace file + resource export/import |
+| **providers-skills** | `providers-skills.md` | providers, skills, tools, packages | LLM providers + skills + tool registry |
+| **automation-scheduling** | `automation-scheduling.md` | cron, heartbeat, heartbeat.checklist, devices | Scheduled & event-driven automation |
+| **mcp-integration** | `mcp-integration.md` | mcp, mcp.servers, mcp.grants, mcp.requests | MCP protocol integration + access mgmt |
+| **monitoring-ops** | `monitoring-ops.md` | health, status, logs, traces, usage, version | Observability + health + analytics |
+| **admin-system** | `admin-system.md` | tenants, system-config, activity | Admin operations + multi-tenancy |
+| **docs-api** | `docs-api.md` | api-docs | API documentation access |
+
+---
+
+## 3. STREAMING COMMANDS (WebSocket-dependent, unsuitable for one-shot Bash)
+
+Commands using `ws.Subscribe()` or long-lived `ws.Stream()` that **require persistent connection**:
+
+| Command | Pattern | Issue | Workaround |
+|---------|---------|-------|-----------|
+| **chat** (interactive) | ws.Stream(chat.send) | Open-ended streaming, user-driven | Use `--no-stream` or `-m "single message"` for Bash |
+| **logs tail** | ws.Subscribe("*") on signal | Real-time tail, Ctrl+C to exit | Bash integration requires monitor wrapper |
+| **approvals watch** | ws.Subscribe() implied | Pending approvals stream (if implemented) | Use approvals.list + poll instead |
+| **teams.events** | ws.Subscribe on team_id | Live event stream per team | No Bash-friendly alternative |
+| **teams.tasks** (streaming) | ws.Subscribe tasks.* | Live task updates | List + poll for updates |
+| **auth pair** | Poll-based device pairing (60× 2s sleep) | Not streaming but long-running | Bash skill should implement full flow |
+| **agents.ops wait** | ws.Subscribe(agent_id) | Wait for agent idle state | Use polling with status check |
+| **devices approve** | Interactive approval flow (WS) | Requires human decision | WS skill wrapper needed |
+
+**Bash Skill Recommendation:** Wrap streaming commands in the Monitor tool rather than invoking them directly from Bash.
+
+---
+
+## 4. DESTRUCTIVE COMMANDS (require `--yes` flag or confirmation)
+
+All commands with `tui.Confirm()` or that perform DELETE/destructive operations:
+
+### Requiring `--yes` flag (automation-safe when set):
+- **agents delete** — `tui.Confirm(fmt.Sprintf("Delete agent %s?", args[0]), cfg.Yes)`
+- **api-keys revoke** — Permanent key revocation
+- **channels delete** — Channel instance removal
+- **channels.contacts delete** — Contact deletion
+- **channels.pending send** — Sends pending message (irreversible)
+- **channels.writers remove** — Writer access revocation
+- **config set** — Modifies server config
+- **config.permissions revoke** — Permission revocation
+- **contacts delete** — Contact removal
+- **credentials delete** — Credential store deletion
+- **credentials rotate** — Key rotation (old key invalid)
+- **cron delete** — Job removal
+- **devices delete** — Unpair device
+- **import \*** — Bulk resource import (overwrites)
+- **kg entities delete** — Knowledge graph entity removal
+- **kg dedup execute-merge** — Permanent entity merge
+- **memory delete**, **memory clear** — Memory document removal
+- **pending-messages delete** — Message queue deletion
+- **providers delete** — Provider config removal
+- **sessions delete**, **sessions reset** — Session destruction
+- **skills unpublish**, **skills delete** — Skill removal from catalog
+- **storage delete** — Workspace file deletion
+- **system-config delete** — KV config removal
+- **teams delete** — Team removal
+- **teams.members remove** — Member removal
+- **teams.tasks delete** — Task deletion
+- **teams.workspace delete** — Workspace file removal
+- **tenants delete** — Tenant removal (admin)
+- **tools delete** (custom) — Custom tool removal
+
+### High-Risk (require skill prompt guidance):
+- **export \*** (with --yes) — Bulk data extraction
+- **import \*** (with --yes) — Bulk data overwrite
+- **memory clear** — Entire agent memory wipe
+
+**Skill Implementation:** Always ask the user for confirmation before running a command with `--yes`, unless the user has already explicitly approved that operation.
+
+---
+
+## 5. JSON OUTPUT SUPPORT ANALYSIS
+
+Commands that DO vs DON'T cleanly emit JSON for programmatic parsing:
+
+### JSON-Friendly (✅ full support):
+- **agents list, get** → printer.Print(unmarshalList/Map(data))
+- **api-keys list** → printer.Print(unmarshalList(data))
+- **approvals list** → printer.Print(unmarshalList(data))
+- **channels instances list** → printer.Print(unmarshalList(data))
+- **chat (--output json)** → NDJSON event stream
+- **config get** → printer.Print(unmarshalMap(data))
+- **contacts list, get** → printer.Print(unmarshalList/Map(data))
+- **cron list, get** → printer.Print(unmarshalList/Map(data))
+- **devices list** → printer.Print(unmarshalList(data))
+- **kg entities list, get** → printer.Print(unmarshalList/Map(data))
+- **kg graph, stats** → printer.Print(unmarshalMap(data))
+- **mcp servers list, get** → printer.Print(unmarshalList/Map(data))
+- **memory list, get** → printer.Print(unmarshalList/Map(data))
+- **providers list** → printer.Print(unmarshalList(data))
+- **sessions list, preview** → printer.Print(unmarshalList/Map(data))
+- **skills list, get** → printer.Print(unmarshalList/Map(data))
+- **status** → ws.Call("status") → table fallback OR json via --output
+- **storage list** → printer.Print(unmarshalList(data))
+- **system-config list, get** → printer.Print(unmarshalList/Map(data))
+- **tenants list, get** → printer.Print(unmarshalList/Map(data))
+- **tools builtin list** → printer.Print(unmarshalList(data))
+- **traces list, get** → printer.Print(unmarshalList/Map(data))
+- **tts status** → ws.Call → printer.Print(unmarshalMap(data))
+- **usage summary, breakdown** → printer.Print(unmarshalList/Map(data))
+
+### JSON-Unfriendly (⚠️ table-only or TUI-driven):
+- **agents.files list** — TUI output, no json flag
+- **agents.instances list** — JSON response exists but unmarshalling varies
+- **agents.links list** — No json output flag observed
+- **agents.ops share/unshare** — Success/error only
+- **agents.wake** — Single POST, no response
+- **approvals approve/deny** — "Execution approved" text only
+- **auth login/logout** — printer.Success() text only
+- **auth whoami** — Table output, --output flag support unclear
+- **chat interactive** — Real-time streaming, TUI-driven
+- **chat abort** — Single action response
+- **channels.contacts create** — Success message only
+- **channels.pending list** — Limited JSON support
+- **channels.writers add/remove** — Success text only
+- **config set** — Success message only
+- **config.permissions grant/revoke** — Success text only
+- **contacts create, verify** — Success message or TUI prompt
+- **credentials create** — Shows raw key once (not JSON)
+- **cron delete, trigger** — Success message only
+- **devices approve/reject** — WS call result unclear
+- **export agent-preview** — JSON output but preview-only
+- **heartbeat set, checklist set** — Success text only
+- **import \*** — "Import complete" message only
+- **kg entities create, delete** — Success only
+- **kg dedup execute-merge** — "Merge complete" text
+- **logs tail** — Streaming plaintext logs, NDJSON in json mode
+- **mcp servers create/update/delete** — Success only
+- **mcp servers test** → printer.Print(unmarshalMap(data)) but test-specific
+- **mcp grants grant/revoke** — "Access granted/revoked" text
+- **memory store, index, delete** — Success message only
+- **packages list** — Limited to simple list, no details
+- **pending-messages send, delete** — Success text only
+- **providers create, update, delete** — Success only
+- **sessions label, delete, reset** — Success text only
+- **skills upload, publish, unpublish** — Success/progress messages
+- **skills tenant-config set** — Success text only
+- **storage put, delete** — Success text only
+- **system-config set, delete** — Success message only
+- **teams create, delete** — Success text only
+- **teams.members add/remove/reassign** — Success text only
+- **teams.tasks create, delete, approve, reject** — Success text only
+- **teams.workspace put, delete** — Success text only
+- **tenants create, update, delete** — Success text only
+- **tools custom create, delete** — Success text only
+- **version** — Plaintext stdout (multi-line)
+- **health** — "Server is healthy" text only
+
+**Principle:** Commands marked with `printer.Success()` are **output-unfriendly** for Bash skill integration; use json-friendly alternatives where available or parse table output.
+
+---
+
+## 6. SUMMARY STATISTICS
+
+| Metric | Count | Notes |
+|--------|-------|-------|
+| Top-level command groups | 38 | From rootCmd.AddCommand registrations |
+| Subcommands (total) | ~180 | Estimated across all groups |
+| HTTP-only commands | ~120 | CRUD operations, file I/O |
+| WebSocket commands | ~60 | Real-time, subscribe, streaming |
+| Destructive (delete/clear/reset) | ~40 | Require --yes or tui.Confirm |
+| JSON-friendly | ~70 | Support `--output json` cleanly |
+| Streaming (unsuitable for Bash) | ~8 | chat stream, logs tail, approvals watch, etc. |
+| Interactive TUI | ~24 | Device pairing, auth, approvals |
+| Global flags | 8 | --server, --token, --output, --yes, --insecure, --verbose, --profile, --tenant-id |
+
+---
+
+## 7. UNRESOLVED QUESTIONS & CONCERNS
+
+1. **Approvals "watch" command:** admin.go references "approvals watch" but implementation not found in visible code. Is it implicit in approvals.list polling or WS subscribe?
+
+2. **Teams WebSocket-only:** All teams operations use ws.Call/ws.Subscribe. What's the fallback if WS is unavailable? HTTP polling?
+
+3. **Chat interactive mode vs skill integration:** Interactive chat (readline loop) cannot be driven by Bash skill. Should skill wrap in expect/pexpect or switch to single-shot mode?
+
+4. **TUI imports across files:** Multiple files import `internal/tui` (Confirm, Input, Password, IsInteractive). Does this pose usability challenges when --yes is set?
+
+5. **Config set destructiveness:** config set modifies live server state. How is rollback handled? No --dry-run observed.
+
+6. **Memory user scoping:** memory list supports a --user filter, but storage lives at /v1/memory/{agentID}/{path}, which has no user segment. Is per-user memory automatic or manual?
+
+7. **Sessions vs chat.session.status:** Both exist (sessions.list/delete and chat.status). Which is authoritative? Can they diverge?
+
+8. **KG dedup workflow:** Dedup requires scan → review → merge-candidates → execute-merge. Should skill enforce this as a transaction or allow partial runs?
+
+9. **Export without --yes:** Does export risk large data dump without automation flag? No --yes observed.
+
+10. **Agent.files TUI mode:** agents.files list uses TUI for display. Can it be forced to json via --output flag, or is TUI hardcoded?
+
+11. **Cron HTTP fallback:** cronListCmd has HTTP fallback if WS unavailable. Are other commands similarly protected?
+
+12. **MCP reconnect vs servers.test:** reconnect triggers async reconnect; servers.test is sync. How are they distinguished in skill docs?
+
+13. **Skill upload multipart:** skillsUploadCmd uses multipart/form-data. What's the MIME type, content-disposition, and maximum size?
+
+14. **Teams namespace collision:** teams, teams.members, teams.tasks exist. Is teams.tasks a subcommand of teams or independent? AddCommand suggests independent.
+
+15. **Heartbeat.targets vs heartbeat.checklist:** Both exist under heartbeat. Are they the same resource or separate? The docs are unclear.
+
+---
+
+## 8. SKILL DEVELOPMENT GUIDANCE
+
+### Recommended Skill Structure (12-18 reference files):
+
+```
+~/.claude/skills/goclaw/
+├── references/
+│   ├── auth-and-config.md
+│   ├── agents-core.md
+│   ├── agents-advanced.md
+│   ├── chat-sessions.md
+│   ├── knowledge-memory.md
+│   ├── teams-collaboration.md
+│   ├── channels-messaging.md
+│   ├── data-movement.md
+│   ├── providers-skills.md
+│   ├── automation-scheduling.md
+│   ├── mcp-integration.md
+│   ├── monitoring-ops.md
+│   ├── admin-system.md
+│   └── docs-api.md
+├── SKILL.md (main entry point)
+└── helpers/ (optional: example scripts)
+```
+
+### Critical Patterns for Skill:
+
+1. **All commands accept `-o json` or `--output json` globally** (from root.go). Recommend JSON output for Bash skill parsing.
+
+2. **--yes flag available globally.** In the skill, gate every destructive operation behind a confirmation prompt unless the user has explicitly approved it.
+
+3. **WebSocket vs HTTP:** Streaming commands (chat, logs, approvals) MUST use Monitor tool, not Bash. HTTP commands safe for simple Bash invocation.
+
+4. **Table output default.** When no --output flag is set, commands emit table format. The skill should append `--output json` and parse the JSON instead.
+
+5. **Profile/tenant context:** Global --profile and --tenant-id flags control context. Skill should allow setting these per invocation.
+
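+6. **Error handling:** rootCmd sets `SilenceErrors: true`, which stops Cobra from printing the returned error itself; any stderr message then depends on how `main` handles the error from `Execute()`. Skill must capture exit codes for error detection rather than rely on stderr text.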
+
+7. **Interactive prompts disabled with --yes.** Device pairing, config confirmations, and contact verification all respect the cfg.Yes flag.
+
+---
+
+**Report Status: DONE**
+
diff --git a/plans/reports/researcher-260417-1254-claude-skill-authoring.md b/plans/reports/researcher-260417-1254-claude-skill-authoring.md
new file mode 100644
index 0000000..2ce1716
--- /dev/null
+++ b/plans/reports/researcher-260417-1254-claude-skill-authoring.md
@@ -0,0 +1,596 @@
+# Claude Skill Design Reference: GoClaw CLI Wrapper
+
+**Research Date:** 2026-04-17  
+**Researcher:** Technical Analyst  
+**Context:** Wrapping Go CLI (goclaw) as Claude Code skill  
+**Status:** DONE
+
+---
+
+## Executive Summary
+
+Claude Code skills wrap existing CLIs with a `SKILL.md` file: YAML frontmatter plus Markdown instructions. Three findings matter most:
+
+1. **SKILL.md structure is stable** — `name`, `description` (recommended), `allowed-tools`, `disable-model-invocation` are canonical fields. Description is keyword-indexed for auto-invocation.
+2. **Settings.json patching is unsafe at install time** — No canonical jq pattern exists in the wild. Safest approach: user manual or hooks-based installation, NOT install.sh patching.
+3. **Progressive disclosure via references/** — Supported but optional. Thin SKILL.md (~100 lines) with links to `references/` docs avoids context bloat.
+
+**Skill triggers on keywords**: "goclaw", "gateway server", "remote execution", "AI agent", "manage agents", "deploy agents".
+
+---
+
+## 1. SKILL.md Structure (Canonical)
+
+### Frontmatter Fields
+
+```yaml
+---
+name: goclaw-cli
+description: Manage GoClaw AI gateway servers. Create, list, deploy, and run commands on remote agent servers. Use this skill whenever users mention GoClaw, managing gateway servers, deploying AI agents, or running commands on remote infrastructure.
+when_to_use: gateway server operations, agent deployment, remote execution
+allowed-tools: Bash(goclaw *)
+disable-model-invocation: false
+user-invocable: true
+---
+```
+
+**Core fields:**
+- `name` — Lowercase slug, hyphens ok, max 64 chars. Becomes `/goclaw-cli` command.
+- `description` — **CRITICAL:** Use "pushy" language with use-case keywords. Front-load trigger phrases. Capped at 1,536 chars combined with `when_to_use`. Claude matches by keyword.
+
+**Optional fields:**
+- `when_to_use` — Appended to description; use for specific trigger contexts.
+- `allowed-tools` — Pre-approved tools when skill active. Syntax: `Bash(goclaw *)` grants wildcard; `Bash(goclaw deploy)` grants exact.
+- `disable-model-invocation` — Set `true` to prevent auto-trigger (user invokes only via `/goclaw-cli`).
+- `user-invocable` — Set `false` to hide from `/` menu (Claude-only knowledge).
+- `argument-hint` — Autocomplete help, e.g., `[server-name] [command]`.
+- `context` — Set to `fork` to run in isolated subagent (uses full skill as prompt).
+- `paths` — Glob patterns limiting when skill activates (e.g., `goclaw/*.yaml`).
+
+**Not in use for CLI wrappers:**
+- `model` — Override model choice.
+- `effort` — Override inference effort.
+- `shell` — Default bash; set `powershell` for Windows.
+
+### Description Keyword Tuning
+
+Claude's skill matcher searches description for trigger phrases. For goclaw:
+
+**RECOMMENDED keywords:**
+- `"GoClaw"` — Explicit tool name
+- `"gateway server"` — Primary use case
+- `"manage agents"` — Core function
+- `"deploy agents"` — Action
+- `"remote execution"` — Capability
+- `"AI agent platform"` — Domain context
+- `"command execution"` — What it does
+
+**Example:**
+```
+Manage GoClaw AI gateway servers—the infrastructure for deploying, configuring, 
+and executing commands on remote AI agent instances. Use when setting up gateway servers, 
+deploying agents, running remote commands, or managing multi-server agent deployments.
+```
+
+**Avoid:** Passive voice and vague descriptions. For example, "Tools for managing servers" is too generic to match any trigger phrase.
+
+---
+
+## 2. Progressive Disclosure via references/
+
+### Directory Structure (Optional but Recommended)
+
+```
+~/.claude/skills/goclaw-cli/
+├── SKILL.md                    # ~80 lines: overview + navigation
+├── references/
+│   ├── goclaw-auth.md          # Token setup, credential store
+│   ├── goclaw-commands.md      # Command catalog (auto-generated?)
+│   ├── goclaw-examples.md      # Worked examples
+│   └── troubleshooting.md      # Common issues
+└── scripts/
+    └── setup-credential.sh     # Helper for first-run config
+```
+
+### Loading Pattern
+
+Claude does **NOT** auto-load `references/` files. Instead:
+
+1. **Description always in context** (~1,536 chars max)
+2. **SKILL.md body loaded on trigger** (<500 lines recommended)
+3. **references/ loaded only when Claude `Read`s them**
+
+**Navigation in SKILL.md:**
+```markdown
+## Setup
+See [goclaw-auth.md](references/goclaw-auth.md) for token configuration.
+
+## Commands
+Complete command reference: [goclaw-commands.md](references/goclaw-commands.md)
+
+## Examples
+- [Common workflows](references/goclaw-examples.md)
+- [Troubleshooting](references/troubleshooting.md)
+```
+
+**Key principle:** Keep main SKILL.md under 500 lines. Offload:
+- Command catalogs (100+ lines)
+- API specs (detailed schemas)
+- Troubleshooting guides
+- Worked examples (3+ pages)
+
+---
+
+## 3. Allowed-Tools Pattern & Permission Rules
+
+### Bash Wildcard Syntax
+
+The space and colon wildcard forms are equivalent; exact and multi-command grants are shown for comparison:
+
+```yaml
+# Form 1: Space-based (standard)
+allowed-tools: Bash(goclaw *)
+
+# Form 2: Colon-based (equivalent)
+allowed-tools: Bash(goclaw:*)
+
+# Exact command only
+allowed-tools: Bash(goclaw deploy)
+
+# Multiple commands (space-separated)
+allowed-tools: Bash(goclaw *) Bash(jq *)
+```
+
+**Matching rules:**
+- `Bash(goclaw *)` — Matches `goclaw server list`, `goclaw deploy foo`, etc.
+- `Bash(goclaw deploy *)` — Matches `goclaw deploy server1 --force`, but NOT `goclaw list`
+- Word boundary matters: `Bash(goclaw *)` ≠ `Bash(goclaw*)` (latter matches `goclawing`)
+
+**Process wrappers stripped before matching:**
+- `timeout 30 goclaw deploy` matches `Bash(goclaw deploy *)`
+- Wrappers: `timeout`, `time`, `nice`, `nohup`, `stdbuf`, bare `xargs`
+
+### Known Limitations (2026)
+
+There are [reported issues](https://github.com/anthropics/claude-code/issues/14956) where `allowed-tools` in skill frontmatter does NOT enforce against arbitrary Bash calls. Claude can invoke tools not in the list. **Mitigation:** Rely on permission rules in `settings.json` as primary control.
+
+**Recommended pattern:** Combine skill `allowed-tools` + settings.json deny rules:
+
+```json
+{
+  "permissions": {
+    "allow": ["Bash(goclaw *)"],
+    "deny": [
+      "Bash(rm *)",
+      "Bash(sudo *)",
+      "Bash(curl *)"
+    ]
+  }
+}
+```
+
+---
+
+## 4. Settings.json Patching Pattern
+
+### The Problem
+
+No canonical safe pattern exists for install.sh to patch `~/.claude/settings.json`. Options evaluated:
+
+| Tool | Complexity | Dependency | Safety | Recommendation |
+|------|-----------|-----------|--------|-----------------|
+| **jq** | Low | Requires jq install | Medium | ❌ Fragile in edge cases |
+| **Python3** | Medium | $HOME/.claude/skills/.venv/bin/python3 | High | ✅ Recommended if auto-needed |
+| **Node.js** | Medium | Requires Node | Medium | ⚠️ Over-engineered |
+| **sed/awk** | Very High | Built-in | Low | ❌ JSON unsafe |
+| Manual user step | N/A | None | Very High | ✅ Best if infrequent |
+
+### Recommended Approach: Hybrid (User Consent + Hooks)
+
+**NOT during install.sh.** Instead:
+
+1. **Install skill to** `~/.claude/skills/goclaw-cli/`
+2. **Show user instructions** to manually add to `~/.claude/settings.json`
+3. **OR** use a one-time `/config` command to add permissions
+
+**Example install.sh:**
+```bash
+#!/bin/bash
+set -e
+
+SKILLS_DIR="${HOME}/.claude/skills/goclaw-cli"
+mkdir -p "$SKILLS_DIR"
+
+# Copy SKILL.md and the references/ directory (no trailing slash: BSD cp would copy only its contents)
+cp SKILL.md "$SKILLS_DIR/"
+cp -r references "$SKILLS_DIR/"
+
+echo "✓ Skill installed to $SKILLS_DIR"
+echo ""
+echo "Next: Add permissions to ~/.claude/settings.json (or use /config command):"
+echo ""
+echo '  {
+    "permissions": {
+      "allow": ["Bash(goclaw *)"]
+    }
+  }'
+```
+
+### If Auto-Patching Required: Python3 Approach
+
+Only if auto-merge absolutely needed (e.g., enterprise rollout):
+
+```bash
+#!/bin/bash
+PYTHON="${HOME}/.claude/skills/.venv/bin/python3"
+
+if ! [ -x "$PYTHON" ]; then
+  echo "ERROR: Venv not found. Run: python3 -m venv ~/.claude/skills/.venv" >&2
+  exit 1
+fi
+
+cat << 'PYTHON_SCRIPT' | "$PYTHON"
+import json
+import os
+
+settings_path = os.path.expanduser("~/.claude/settings.json")
+data = {}
+
+# Load existing
+if os.path.exists(settings_path):
+  with open(settings_path) as f:
+    data = json.load(f)
+
+# Ensure permissions.allow exists, even if "permissions" is present without an "allow" list
+permissions = data.setdefault("permissions", {})
+allow = permissions.setdefault("allow", [])
+
+# Add goclaw allow rule (idempotent)
+if "Bash(goclaw *)" not in allow:
+  allow.append("Bash(goclaw *)")
+
+# Write back
+with open(settings_path, "w") as f:
+  json.dump(data, f, indent=2)
+
+print(f"✓ Updated {settings_path}")
+PYTHON_SCRIPT
+```
+
+**Advantages:**
+- Idempotent (safe to run multiple times)
+- Handles existing structures
+- No jq dependency
+- Uses Python3 already in venv for other skills
+
+**Disadvantages:**
+- Requires functional venv
+- Still risky if the existing JSON is malformed: `json.load` raises and the script aborts; the write-back is also not atomic
+
+---
+
+## 5. Examples from the Wild
+
+### Reference: GitHub CLI (gh) Skill
+
+From [awesome-copilot/gh-cli](https://github.com/github/awesome-copilot/blob/main/skills/gh-cli/SKILL.md):
+
+- **No explicit allowed-tools** in frontmatter (relies on user prompt/validation)
+- **Acts as reference doc** rather than orchestration skill
+- **Piping patterns:** `gh issue list --json number --jq ... | xargs ...`
+
+**Lesson:** CLI reference skills often don't pre-approve tools; they document the syntax and let Claude decide when/how.
+
+### Reference: Kubectl Skill (Community)
+
+From openclaw/skills:
+
+```yaml
+---
+name: kubectl
+description: Kubernetes cluster management. Deploy, scale, and troubleshoot workloads. Use for kubectl operations.
+allowed-tools: Bash(kubectl *)
+---
+```
+
+**Lesson:** Simple wildcard grant for bounded tools (kubectl, gh, psql).
+
+### Reference: Netresearch CLI Tools Skill
+
+[cli-tools-skill](https://github.com/netresearch/cli-tools-skill) provides a multi-tool wrapper:
+
+```yaml
+allowed-tools:
+  - Bash(docker *)
+  - Bash(kubectl *)
+  - Bash(terraform *)
+  - Bash(ansible *)
+```
+
+**Lesson:** Multi-tool skills use list format for clarity.
+
+---
+
+## 6. Skill Distribution for OSS (GitHub)
+
+### Recommended Structure
+
+```
+https://github.com/nextlevelbuilder/goclaw-cli-skill/
+├── SKILL.md
+├── references/
+│   ├── goclaw-auth.md
+│   ├── goclaw-examples.md
+│   └── troubleshooting.md
+├── install.sh
+├── README.md               # Installation + usage
+└── LICENSE
+```
+
+### Installation Methods (No Registration Needed)
+
+**Method 1: Direct copy**
+```bash
+mkdir -p ~/.claude/skills/goclaw-cli
+curl -fsSL https://raw.githubusercontent.com/nextlevelbuilder/goclaw-cli-skill/main/SKILL.md \
+  -o ~/.claude/skills/goclaw-cli/SKILL.md
+# User manually adds permissions to ~/.claude/settings.json
+```
+
+**Method 2: Run install.sh**
+```bash
+curl -fsSL https://raw.githubusercontent.com/nextlevelbuilder/goclaw-cli-skill/main/install.sh | bash
+# Shows instructions to add permissions
+```
+
+**Method 3: Marketplace (Optional, Future)**
+Anthropic's marketplace system is still emerging (as of 2026). Public skills can be shared via GitHub without registration, provided users know the repository path and copy the files manually. Marketplace auto-discovery is not yet standard practice for OSS skills.
+
+### README Pattern
+
+```markdown
+# GoClaw CLI Skill for Claude Code
+
+Manage GoClaw AI gateway servers from Claude Code.
+
+## Installation
+
+\`\`\`bash
+curl -fsSL https://raw.githubusercontent.com/nextlevelbuilder/goclaw-cli-skill/main/install.sh | bash
+\`\`\`
+
+## Usage
+
+Invoke directly:
+\`\`\`
+/goclaw-cli server list
+\`\`\`
+
+Or ask Claude contextually:
+\`\`\`
+How many gateway servers do we have deployed?
+\`\`\`
+
+## Permissions
+
+After installation, add to \`~/.claude/settings.json\`:
+
+\`\`\`json
+{
+  "permissions": {
+    "allow": ["Bash(goclaw *)"]
+  }
+}
+\`\`\`
+
+## Requirements
+
+- Claude Code CLI (2025+)
+- \`goclaw\` binary in \$PATH
+- GoClaw authentication configured (token kept in the credential store, not \`~/.goclaw/config.yaml\`)
+```
+
+---
+
+## 7. Architecture Recommendations for GoClaw Skill
+
+### SKILL.md Outline (Proposed)
+
+```markdown
+---
+name: goclaw-cli
+description: Manage GoClaw AI gateway servers. Deploy, configure, and run commands on remote agent infrastructure. Use when creating or managing gateway server instances, deploying agents, or executing remote commands on GoClaw infrastructure.
+when_to_use: gateway deployments, agent management, remote command execution
+allowed-tools: Bash(goclaw *)
+disable-model-invocation: false
+argument-hint: "[command] [args...]"
+---
+
+# GoClaw CLI Skill
+
+Execute goclaw commands to manage AI gateway servers and remote agents.
+
+## Quick Reference
+
+- **Servers:** `goclaw server list`, `goclaw server create`, `goclaw server delete`
+- **Config:** `goclaw config get`, `goclaw config set`
+- **Exec:** `goclaw exec [server] [command]`
+- **Status:** `goclaw status`, `goclaw logs [server]`
+
+## Common Patterns
+
+### List all servers
+\`\`\`bash
+goclaw server list --format json
+\`\`\`
+
+### Execute command on remote server
+\`\`\`bash
+goclaw exec my-gateway "agent run my-task"
+\`\`\`
+
+## Setup
+See [goclaw-auth.md](references/goclaw-auth.md) for first-time configuration.
+
+## Detailed Examples
+[goclaw-examples.md](references/goclaw-examples.md) covers common workflows.
+
+## Troubleshooting
+[troubleshooting.md](references/troubleshooting.md) for auth, network, and timeout issues.
+```
+
+### references/ Files (Proposed)
+
+**references/goclaw-auth.md** (~150 lines)
+- Token retrieval from credential store
+- ~/.goclaw/config.yaml format
+- Environment variable overrides
+
+**references/goclaw-examples.md** (~300 lines)
+- Create multi-server deployment
+- Monitor logs in real-time
+- Scale agents up/down
+- Rollback failed deployments
+
+**references/troubleshooting.md** (~150 lines)
+- "Connection refused" → check server running
+- "Auth failed" → token expired
+- "Timeout" → network issue or long-running command
+
+### Install.sh (Proposed)
+
+```bash
+#!/bin/bash
+set -e
+
+REPO="https://raw.githubusercontent.com/nextlevelbuilder/goclaw-cli-skill/main"
+SKILLS_DIR="${HOME}/.claude/skills/goclaw-cli"
+
+echo "Installing GoClaw CLI Skill for Claude Code..."
+
+# Create skill directory
+mkdir -p "$SKILLS_DIR/references"
+
+# Download files
+curl -fsSL "$REPO/SKILL.md" -o "$SKILLS_DIR/SKILL.md"
+curl -fsSL "$REPO/references/goclaw-auth.md" -o "$SKILLS_DIR/references/goclaw-auth.md"
+curl -fsSL "$REPO/references/goclaw-examples.md" -o "$SKILLS_DIR/references/goclaw-examples.md"
+curl -fsSL "$REPO/references/troubleshooting.md" -o "$SKILLS_DIR/references/troubleshooting.md"
+
+echo "✓ Skill installed to $SKILLS_DIR"
+echo ""
+echo "NEXT STEPS:"
+echo "1. Add permissions to ~/.claude/settings.json:"
+echo '   {"permissions": {"allow": ["Bash(goclaw *)"]}}'
+echo ""
+echo "2. Test: /goclaw-cli server list"
+echo ""
+echo "For auth setup, see: $SKILLS_DIR/references/goclaw-auth.md"
+```
+
+---
+
+## 8. Security & Permissions Design
+
+### Skill-Level Allow (Recommended)
+
+```yaml
+allowed-tools: Bash(goclaw *)
+```
+
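+**Effect:** `Bash(goclaw *)` is pre-approved only while the `goclaw-cli` skill is active. This is a grant, not a restriction: Claude can still call other tools, subject to the usual permission prompts (or without prompting in auto/bypassPermissions modes). Use this for least surprise.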
+
+### Settings.json Allow (Comprehensive)
+
+```json
+{
+  "permissions": {
+    "allow": [
+      "Bash(goclaw *)"
+    ],
+    "deny": [
+      "Bash(sudo *)",
+      "Bash(rm *)"
+    ]
+  }
+}
+```
+
+**Effect:** Allows the goclaw CLI in every session that loads this settings file and denies `sudo` and `rm` commands everywhere, regardless of which skill is active.
+
+### No Need for Deny in Skill
+
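+`allowed-tools` only grants permissions, so it cannot block subcommands such as `goclaw delete`; listing `Bash(goclaw delete *)` there would pre-approve deletes rather than restrict them. Instead: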
+
+1. **Document safe patterns** in SKILL.md
+2. **Use settings.json deny** if you want to block `goclaw delete` globally
+3. **Let Claude apply discretion** based on context
+
+---
+
+## Trade-offs & Adoption Risks
+
+| Decision | Pros | Cons | Risk Level |
+|----------|------|------|-----------|
+| **Skill-level allowed-tools only** | Simple, scoped to skill | May not be enforced (Issue #14956) | Medium |
+| **Settings.json + Skill** | Defense-in-depth, explicit | More setup friction | Low |
+| **Auto-patch settings.json** | One-command install | Breaks on malformed JSON | High |
+| **Manual user step** | Transparent, safe | Requires user action | Low |
+| **references/ docs** | Keeps SKILL.md thin | Requires Claude to Read files | Low |
+| **Monolithic SKILL.md** | All in one place | Context overhead | Medium |
+
+**Recommendation:** Ship the skill with `allowed-tools` and keep the settings.json change a manual step. Have install.sh print the exact JSON for the user to copy.
+
+---
+
+## Known Limitations & Open Questions
+
+### Limitations Found
+
+1. **allowed-tools in skill frontmatter may not be enforced** — [Issue #14956](https://github.com/anthropics/claude-code/issues/14956). Workaround: rely on settings.json permissions as source of truth.
+
+2. **Wildcard patterns fragile with complex Bash** — Patterns like `Bash(goclaw * --force)` don't work reliably. Use simple prefix patterns only.
+
+3. **Settings.json patching unsafe at scale** — No jq/Python standard. Prefer user consent + manual edit or hooks-based flow.
+
+4. **Skill descriptions truncated at 1,536 chars** — Can't fit full command reference. Must use references/ for detailed docs.
+
+5. **references/ not auto-loaded** — Claude must explicitly Read. Include navigation links in SKILL.md.
+
+### Unresolved Questions
+
+1. **Marketplace discovery** — Will Claude Code ever auto-suggest skills from GitHub? Currently no auto-registry for OSS skills. User must manually install or know the URL.
+
+2. **Skill versioning** — How to manage breaking changes if goclaw API changes? No versioning standard yet. Current practice: keep old skills in subdirs (`goclaw-cli-v1/`, `goclaw-cli-v2/`).
+
+3. **Multi-profile auth** — GoClaw may support multiple auth profiles. Should skill support `--profile` flag? Design decision needed.
+
+4. **Auto-update mechanism** — Should install.sh check for newer versions? Not standard practice yet.
+
+5. **Cross-platform compatibility** — Tested on macOS/Linux. Does install.sh work on Windows (PowerShell)? May need separate `.ps1` script.
+
+---
+
+## Sources
+
+- [Extend Claude with skills - Claude Code Docs](https://code.claude.com/docs/en/skills)
+- [Configure permissions - Claude Code Docs](https://code.claude.com/docs/en/permissions)
+- [Claude Code settings - Claude Code Docs](https://code.claude.com/docs/en/settings)
+- [GitHub CLI (gh) Skill Example](https://github.com/github/awesome-copilot/blob/main/skills/gh-cli/SKILL.md)
+- [OpenClaw kubectl Skill](https://github.com/openclaw/skills/blob/main/skills/ddevaal/kubectl/SKILL.md)
+- [Netresearch CLI Tools Skill](https://github.com/netresearch/cli-tools-skill)
+- [Issue #14956: allowed-tools enforcement](https://github.com/anthropics/claude-code/issues/14956)
+- [The SKILL.md Pattern — Bibek Poudel, Medium (Feb 2026)](https://bibek-poudel.medium.com/the-skill-md-pattern-how-to-write-ai-agent-skills-that-actually-work-72a3169dd7ee)
+- [Awesome Agent Skills Repository](https://github.com/VoltAgent/awesome-agent-skills)
+
+---
+
+**Report Status:** DONE
+
+**Next Steps (Implementation):**
+1. Create SKILL.md with recommended frontmatter + structure
+2. Write references/goclaw-auth.md, goclaw-examples.md, troubleshooting.md
+3. Create install.sh with manual permission guidance (NO auto-patch)
+4. Test skill with Claude Code interactively (/goclaw-cli commands)
+5. Document in CLAUDE.md under "GoClaw Skill" section