
v0.2 — Kubernetes management tab + fat-privileged container + CI repair #1

Merged
tm4rtin17 merged 25 commits into main from v0.2 on May 9, 2026

Conversation

tm4rtin17 (Owner) commented May 9, 2026

Summary

Promotes 25 commits from v0.2 to main for the v0.2.0 release. Three big themes:

  • Kubernetes management tab end-to-end (Phases A–D). Read-only inventory + detail drawers + pod log streaming + pod exec + lifecycle actions (restart/scale/delete/cordon) + ConfigMap structured editor + read-only Secret viewer (masked, audited) + Monaco YAML editor with server-side dry-run.
  • Fat-privileged Docker container as a deployment shape that drives every host integration (Services, Updates, Network, Logs/journal, Terminal, K8s) without sidecars. Bare-metal install remains the unprivileged path. Image went from ~25 MB distroless to ~230 MB Debian-slim with the binaries the integrations need; documented as the higher-blast-radius option.
  • CI repaired end-to-end. The pipeline had been failing at workflow startup since v0.1 — even main's last push (m9 polish) failed at 0s. Fixed in five commits: Go version drift, invalid YAML in the awk-format step, missing package-lock.json + outdated golangci-lint, missing v2 config, Dockerfile/vite path mismatch.

Plus: terminal PAM login flow, Tailscale CGNAT awareness in the public-bind banner, capability-gated nav (hides tabs whose backend isn't reachable), two nil-slice JSON fixes that were blanking the Containers and Network tabs, and a docker-compose volume-external fix so admin accounts don't get stranded across docker run ↔ docker compose up switches.

Full per-section detail in CHANGELOG.md.

Verification

  • CI: all three jobs (Go vet+lint+test, Web typecheck+build, Docker image) pass on the v0.2 head (latest run).
  • Sensitive-data audit re-run before this PR — zero secret-shaped strings introduced across the 25 commits; runtime data lives under /var/lib/controlroom/ and the Docker named volume, both outside the repo; .claude/ (local agent definitions) ignored.
  • All four K8s phases verified in-cluster on the test K3s cluster: image loaded onto both nodes, RBAC widening applied, deployment rolled, every endpoint mounts and auth-gates correctly, pods/exec works through SPDY, manifest dry-run + apply work through the Monaco editor.

Migration notes for main

  • Container deployment: run docker volume create controlroom-data once before docker compose up. The compose volume is now external: true to avoid the docker run ↔ compose namespacing trap.
  • In-cluster deployment: re-apply deploy/k8s/rbac.yaml to pick up the widened verbs (pods/exec: create, pods: delete, nodes: patch, apps/*: patch,update, apps/*/scale: update, services/configmaps: update, secrets: get,list).

Test plan

  • Reviewer pulls v0.2, runs make image && docker volume create controlroom-data && docker compose -f deploy/docker-compose.yml up -d, and sees the K8s tab with the full feature set.
  • Reviewer scans CHANGELOG.md § 0.2.0 for accuracy.
  • Reviewer skims docs/SECURITY.md § "Privilege scoping by deployment shape" — make sure the three-shape compromise-impact statements match what we want to publish.
  • Optional: kubectl apply -f deploy/k8s/{namespace,rbac,pvc,deployment,service}.yaml against a test cluster to verify the in-cluster shape still rolls cleanly.

🤖 Generated with Claude Code

tm4rtin17 added 25 commits May 8, 2026 15:28
- PublicBindBanner: treat 100.64.0.0/10 (Tailscale CGNAT) as private so
  the destructive "public-looking address" warning no longer fires when
  reaching ControlRoom over Tailscale.
- internal/logs: add Available() (cached exec.LookPath); /api/logs/journal
  and /ws/logs/journal now return 503 with a clear operator message when
  journalctl is missing instead of bubbling up an exec error as 500.
- main: warn at boot when journalctl is not in PATH, matching the existing
  systemd-unavailable warning.
- web/Logs: source toggle (Journal / Containers). Containers mode reuses
  /api/containers + the existing /ws/containers/:id/logs WebSocket — no
  new backend route — with picker, client-side substring filter, pause,
  and 1k-line cap. Journal-mode errors now surface the real backend
  message instead of "Could not fetch logs."
- deploy/docker-compose: add group_add: ["${DOCKER_GID:-999}"] so the
  nonroot uid 65532 can read /var/run/docker.sock; refresh the header
  comment to match the new logs behavior.
- .gitignore: ignore .claude/ — local Claude Code config and project
  agent definitions (.claude/agents/*.md). These contain
  working-tree-specific tooling and should not be published.
- .gitignore: ignore *.crt — round out the existing *.pem / *.key
  entries so a stray TLS certificate dropped in the tree can't be
  committed.
Adds GET /api/system/capabilities — `{systemd, docker, journal}` booleans
derived from the existing nil-client / logs.Available() patterns — and
threads useCapabilities() into AppLayout to filter the sidebar.

In the container deployment this hides the Services tab (no dbus socket
mounted, no privilege to drive host systemd) instead of letting the
operator click into a dead-end 503. The Containers tab gets the same
treatment so a host without /var/run/docker.sock isn't shown a broken
tab either.

The Logs tab stays visible regardless: it has its own Journal/Containers
source toggle and degrades gracefully on its own.
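
A minimal sketch of the capability probe described above, with the integration clients stubbed out (names here are assumptions, not the project's actual code):

```go
package api

import (
	"encoding/json"
	"net/http"
	"os/exec"
)

// Assumed stand-ins: nil when the integration is unavailable.
var systemdConn, dockerClient interface{}

// handleCapabilities reports which host integrations are reachable so
// the SPA can filter the sidebar.
func handleCapabilities(w http.ResponseWriter, _ *http.Request) {
	// The real code caches this lookup via logs.Available().
	_, journalErr := exec.LookPath("journalctl")
	caps := struct {
		Systemd bool `json:"systemd"`
		Docker  bool `json:"docker"`
		Journal bool `json:"journal"`
	}{
		Systemd: systemdConn != nil,
		Docker:  dockerClient != nil,
		Journal: journalErr == nil,
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(caps)
}
```
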
The frontend types Container.ports and Container.labels as non-null
arrays/objects (web/src/lib/containers.ts) and dereferences them
directly (e.g. ContainerCard.tsx accesses container.ports.length).
Go's encoding/json marshals nil slices and nil maps as null, so a
container with no published ports or no labels would crash the SPA
with a blank-screen runtime error.

Initialize Ports, Labels, Mounts, Networks, Command, and Env to
non-nil empty values in List() and Inspect() before the struct is
returned. The compose label lookups now read from the local
nil-guarded copy instead of the daemon's possibly-nil map.
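
The failure mode and the fix are easy to see in a runnable illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Container struct {
	Ports  []string          `json:"ports"`
	Labels map[string]string `json:"labels"`
}

func main() {
	var c Container // zero value: nil slice, nil map
	b, _ := json.Marshal(c)
	fmt.Println(string(b)) // {"ports":null,"labels":null}; the SPA dereferences .length and crashes

	// The fix applied in List()/Inspect(): normalize to non-nil empties.
	if c.Ports == nil {
		c.Ports = []string{}
	}
	if c.Labels == nil {
		c.Labels = map[string]string{}
	}
	b, _ = json.Marshal(c)
	fmt.Println(string(b)) // {"ports":[],"labels":{}}
}
```
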
Adds a Kubernetes tab to the SPA backed by client-go, plus K8s
manifests to deploy ControlRoom into the cluster it's meant to
manage. Read-only for now: Nodes, Namespaces, Workloads
(Deployment/StatefulSet/DaemonSet), Pods, Services. No write
actions, log streaming, or exec — those land in Phase B/C/D.

Backend:
- internal/k8s/ — client-go wrapper. New(ctx) tries in-cluster first,
  then $KUBECONFIG, ~/.kube/config, /etc/rancher/k3s/k3s.yaml. Returns
  ErrUnavailable on all failures, mirroring the docker pattern; a
  sketch of the lookup order follows this list.
- internal/api/k8s/ — REST handlers under /api/k8s/{nodes,namespaces,
  workloads,pods,services}. Nil-client → 503.
- /api/system/capabilities adds `kubernetes: bool` so the SPA
  hides the tab when the cluster client isn't reachable.
- Best-effort wiring in main.go matching systemd/docker patterns.
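
A minimal sketch of that lookup order, with helper names assumed (the real wrapper lives in internal/k8s/):

```go
package k8s

import (
	"errors"
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

// ErrUnavailable mirrors the nil docker-client pattern described above.
var ErrUnavailable = errors.New("kubernetes unavailable")

// newClientset tries in-cluster config first, then each kubeconfig candidate.
func newClientset() (*kubernetes.Clientset, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		for _, path := range []string{
			os.Getenv("KUBECONFIG"),
			filepath.Join(homedir.HomeDir(), ".kube", "config"),
			"/etc/rancher/k3s/k3s.yaml",
		} {
			if path == "" {
				continue
			}
			if cfg, err = clientcmd.BuildConfigFromFlags("", path); err == nil {
				break
			}
		}
	}
	if err != nil {
		return nil, ErrUnavailable
	}
	return kubernetes.NewForConfig(cfg)
}
```
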

Frontend:
- web/src/lib/k8s.ts — types + TanStack Query hooks (10s refetch).
- web/src/routes/Kubernetes.tsx — namespace picker, sub-tabs, search.
- web/src/components/k8s/ — list components per resource kind.
- formatRelativeTime() helper for the Age columns.

Deploy:
- deploy/k8s/{namespace,rbac,pvc,deployment,service,ingress}.yaml
  — minimal in-cluster install. ClusterRole has get/list/watch only
  on the resources we surface; widens in Phase C when actions land.
- Single-replica deployment by design — SQLite + ROX PVC.
- Pod runs as uid 65532, drop ALL caps, readOnlyRootFilesystem.

Verified:
- make image succeeds.
- Image loaded into both K3s nodes (multi-node cluster).
- Pod reaches Ready, /api/healthz returns 200.
- No "kubernetes unavailable" warning in logs — client-go connects
  via the projected SA token.
Adds click-to-detail across the Kubernetes tab and a live log
viewer for pods. Read-only still — no write actions yet.

Backend (internal/k8s/detail.go, internal/api/k8s/k8s.go):
- GET /api/k8s/nodes/:name → NodeDetail (conditions, allocatable,
  taints, events).
- GET /api/k8s/workloads/:namespace/:kind/:name → WorkloadDetail
  for Deployment / StatefulSet / DaemonSet (selector, strategy,
  conditions, events).
- GET /api/k8s/pods/:namespace/:name → PodDetail (per-container
  status, conditions, QoS, node, events).
- GET /api/k8s/services/:namespace/:name → ServiceDetail
  (endpoint addresses, selector, events).
- WS /ws/k8s/pods/:namespace/:name/logs?container=&tail= — streams
  via CoreV1.Pods().GetLogs().Stream(). Frame shape matches
  /ws/containers/:id/logs (type=line, stream=stdout, line) so the
  SPA reuses the LogStream UX.
- Events listed via field selector on involvedObject — server-side
  filter, no full-list scan. Cluster-scoped Node events scan all
  namespaces.
- DNS-1123 regex with length guard validates :namespace and :name
  before they reach the API server.
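
A label-shaped sketch of that guard (the project's exact regex may differ; 253 is the DNS subdomain length cap):

```go
package k8s

import "regexp"

// dns1123 matches lowercase alphanumerics and hyphens with no
// leading or trailing hyphen.
var dns1123 = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validResourceName rejects anything that couldn't be a legal namespace
// or resource name before it is interpolated into an API-server request.
func validResourceName(s string) bool {
	return s != "" && len(s) <= 253 && dns1123.MatchString(s)
}
```
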

Frontend (web/src/components/k8s/*Detail.tsx + helpers):
- Sheet-based drawer per resource kind, opened by row click in the
  list views. State held in routes/Kubernetes.tsx so opening a pod
  doesn't clobber a node selection.
- Reusable EventsTable and ConditionsTable.
- PodLogStream — WebSocket viewer mirroring ContainerLogStream
  (pause/resume/clear, 1000-line cap, auto-scroll).
- Container picker in PodDetail: clicking a container row syncs
  the active container in the picker chips and the log stream.
- Used the existing Sheet primitive — no new deps.

RBAC widening (deploy/k8s/rbac.yaml):
- pods/log → get (so the SA can stream container output).
- endpoints → get/list/watch (for ServiceDetail.endpoints).
- Still no secrets, no PVs, no write verbs. Phase C territory.

Verified:
- make image, image loaded into both K3s nodes (piserver + piserver2).
- kubectl apply -f deploy/k8s/rbac.yaml succeeded.
- rollout restart deployment/controlroom rolled cleanly within 90s.
- /api/healthz → 200, /api/system/capabilities reports
  kubernetes:true.
- kubectl auth can-i get pods/log as the SA → yes.
- No "kubernetes unavailable" warning in pod logs.
The docker-compose container can't reach K3s' API server at
127.0.0.1:6443 from inside its bridge namespace, and the cert SAN
doesn't include the docker bridge gateway IP. So we rewrite the
kubeconfig server URL to the host's K3s node-ip (which is in the
cert SAN by default) and mount that copy read-only into the
container.

deploy/scripts/setup-host-kubeconfig.sh
  - One-shot helper run on the host (sudo). Reads
    /etc/rancher/k3s/k3s.yaml, picks a host IP that's in the cert
    SAN (override → /etc/rancher/k3s/config.yaml node-ip →
    hostname -I), writes the rewritten config to
    /var/lib/controlroom/kubeconfig with mode 0600 and owner
    65532:65532 (the controlroom container's nonroot uid).
  - Verifies the chosen IP is in the cert's SubjectAltName before
    writing — warns the operator if not.

deploy/docker-compose.yml
  - Mount /var/lib/controlroom/kubeconfig:/etc/k3s/kubeconfig:ro
  - KUBECONFIG=/etc/k3s/kubeconfig — client-go's first lookup path.
  - Comment header reframed: progressive integration model. Docker
    + K8s wired now; journal / systemd / apt left as TODO mounts.

Security note: the K3s admin kubeconfig has cluster-admin. The
container now has cluster-admin. Matches the user's stated intent
of "fat-privileged container" but worth calling out — RBAC the SA
later if you want to scope this down.
…, network

Bundle that lights up every host-integration tab in the docker-compose
deployment by giving the container the host namespaces, capabilities,
and bind mounts it needs. Bare-metal install (deploy/install.sh) is
unchanged and remains the unprivileged path.

Image (deploy/Dockerfile)
  - Drop gcr.io/distroless/static-debian12:nonroot, switch runtime to
    debian:bookworm-slim.
  - Install: bash, ca-certificates, sudo, util-linux (nsenter), procps,
    iproute2, iptables, ufw, systemd (journalctl + systemctl + libsystemd0
    + libdbus-1-3), apt-utils.
  - Run as root; SYS_ADMIN can't survive a uid transition.
  - Image size ~230 MB (up from ~25 MB distroless). Acceptable; documented.

Compose (deploy/docker-compose.yml)
  - network_mode: host, pid: host
  - cap_add: SYS_ADMIN (nsenter) + SYS_PTRACE (host ps/top) + NET_ADMIN
    (ufw/iptables)
  - security_opt: apparmor:unconfined + seccomp:unconfined
  - user: 0:0
  - Mounts: dbus socket (rw), /run/log/journal + /var/log/journal +
    /etc/machine-id (ro) for journalctl, /etc/rancher/k3s/k3s.yaml
    directly (network_mode:host means 127.0.0.1:6443 is reachable so
    the rewritten kubeconfig from setup-host-kubeconfig.sh is no longer
    needed), /var/cache/apt rw + /etc/apt + /var/lib/dpkg ro for apt.
  - env: CR_HOST_SHELL=true so pty wraps shells with nsenter.
  - Header reframed with security model + integration matrix.

pty (internal/pty/pty.go + config.go + api/terminal + api/router)
  - New Options.HostShell. When true, exec.Command becomes
    `/usr/bin/nsenter -t 1 -m -u -i -n -p -- <shell> --login` so the
    user gets a shell in the host's mount/UTS/IPC/network/PID namespaces
    — equivalent to ssh-ing into the host; a sketch follows this list.
  - CR_HOST_SHELL env var (default false). Wired through
    config.Config → api.Deps.Cfg → terminalapi.Deps.HostShell →
    pty.Options.HostShell.
  - Stat-checks /usr/bin/nsenter when HostShell=true; clear error if
    missing instead of an opaque exec failure.
  - Bare-metal behavior is byte-identical when CR_HOST_SHELL=false.
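
A sketch of that branch, with the Options type trimmed to what the example needs:

```go
package pty

import (
	"fmt"
	"os"
	"os/exec"
)

// Options is reduced to the two fields this sketch uses.
type Options struct {
	HostShell bool
	Shell     string // e.g. "/bin/bash"
}

// buildCmd returns the process to attach to the PTY.
func buildCmd(opts Options) (*exec.Cmd, error) {
	if !opts.HostShell {
		return exec.Command(opts.Shell, "--login"), nil
	}
	// Fail early with a clear message instead of an opaque exec error.
	if _, err := os.Stat("/usr/bin/nsenter"); err != nil {
		return nil, fmt.Errorf("CR_HOST_SHELL=true but nsenter is missing: %w", err)
	}
	// Enter PID 1's mount/UTS/IPC/net/PID namespaces, then run the shell there.
	return exec.Command("/usr/bin/nsenter",
		"-t", "1", "-m", "-u", "-i", "-n", "-p", "--",
		opts.Shell, "--login"), nil
}
```
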

setup-host-kubeconfig.sh + deploy/k8s/README.md
  - Header rewritten: script no longer needed for the standard
    docker-compose deployment, retained for non-host-network cases.
  - K8s README adds a paragraph contrasting in-cluster (unprivileged
    nonroot SA, scoped RBAC) vs docker-compose (fat-privileged) shapes.

Verified:
  - make image succeeds at 230MB.
  - Container boots clean — none of the four "unavailable" warnings
    (systemd / journalctl / kubernetes / docker).
  - host_shell=true logged at boot.
  - All required binaries present in image: bash, sudo, nsenter, ip,
    ufw, journalctl, systemctl, apt, apt-get.
  - `nsenter -t 1` from inside the container produces a real host
    shell (returns the host hostname + kernel).
  - /api/healthz → 200.

Security note (re-stating compose header): a compromise of ControlRoom
is now effectively root on the host. Intended tradeoff for a single-user
homelab where the operator already has root. The bare-metal install
remains the unprivileged option.
The fat-privileged container runs as root, and the previous nsenter
default dropped the operator straight into a root shell — a meaningful
escalation over the SSH UX it should mirror. Now it runs login(1)
inside the host namespaces; the user types host credentials in the
xterm, PAM authenticates, and login execs the user's shell at the
user's uid only on success. Same audit trail as SSH (auth.log,
last-login records).

Wiring:
- internal/pty/pty.go: new Options.LoginMode. When HostShell+LoginMode
  the spawned process is `nsenter -t 1 -m -u -i -n -p -- /bin/login`.
  Removed the misleading container-side stat for login (after nsenter
  -m the binary resolves from the host's filesystem; a stat from the
  container's view doesn't tell us anything useful).
- internal/config/config.go: CR_TERMINAL_LOGIN env var, default false.
- internal/api/terminal + router: thread cfg.TerminalLogin through to
  pty.Options.LoginMode.
- cmd/controlroom/main.go: log the new flag at boot.
- deploy/docker-compose.yml: CR_TERMINAL_LOGIN=true alongside
  CR_HOST_SHELL=true. Comment explains the security improvement.

Bare-metal install is unchanged — CR_TERMINAL_LOGIN defaults false,
existing pty.New behavior preserved.

Refused setup: LoginMode without HostShell returns an error. login(1)
is meaningful only when there are namespaces to enter; otherwise we'd
just be running it as the controlroom user with nothing to gain.
…ompt

util-linux login(1) refuses to run interactively when execv'd from a
regular process — it expects to run as a child of getty/sshd, with specific
TTY ioctls, and exits silently otherwise. Confirmed end-to-end: even
with a real PTY allocated by Python and a controlling terminal set,
/bin/login produces no output and exits 1.

su(1) is a regular interactive program that works the way login should.
Replace the LoginMode exec target with a tiny inline bash script that:

  1. Prints a short banner with the hostname.
  2. Prompts for the username.
  3. exec su -l "$username".

su then prompts for the password through PAM (host /etc/pam.d/su rules),
authenticates, and execs the user's shell at the user's uid. The
password never touches Go, audit goes through PAM as before, and the
UX is identical to SSH: type user, type password, you're in.

LoginPath constant removed; replaced with the loginScript string. No
external file added — the script is small enough to live as a string
constant in pty.go.

Verified: a Python PTY harness around the same exec.Command path that
pty.New uses now prints "ControlRoom Terminal — host login at <host>"
followed by "username: " and waits for input, exactly like SSH.
Root invoking su(1) is special-cased to skip PAM authentication —
"root can become anyone" — so when the controlroom container (uid 0)
ran `su -l pi` the operator landed as pi without ever proving the
password. That's not the SSH-equivalent UX we wanted; it's worse than
the old root-shell default because now it's pretending to authenticate.

Drop to nobody:nogroup with setpriv before invoking su. su now sees a
non-root caller, runs full PAM auth via /etc/pam.d/su, the password is
prompted in the xterm, and on success the setuid bit on /bin/su lets
it escalate to root and then drop to the target user. Lockouts and
auth.log entries flow through PAM as expected.

Verified: a Python PTY harness around the new exec path now prints
"Password:" after entering the username, exactly like SSH.
Same bug class as 74dd538 fixed for the Docker client — Network.tsx
crashes the SPA with a blank screen when iface.ips or fw.data.rules
arrives as null instead of an empty array, because the frontend
dereferences .length directly.

Two fields needed nil-guards:

- internal/network/iface.go: Interface.IPs starts as a nil slice when
  an interface has no addresses (e.g. some down/dummy/bridge ifaces).
  Now initialized to []string{}. Flags inherits whatever `ip -j addr
  show` returns (which is non-nil in practice but not guaranteed) so
  we nil-guard that too.

- internal/network/ufw.go: UFWStatus.Rules starts as a nil slice when
  the firewall is disabled or has no rules. Now initialized to
  []UFWRule{} in parseUFWStatus before the parse loop.

Verified: rebuilt image, recreated container. The Network tab should
render an empty Interfaces grid + "No rules" instead of crashing.
The shipped docs lagged badly behind v0.2:
  - README claimed "v0.1 in development", didn't mention the
    Kubernetes tab, listed footprint targets that no longer apply,
    and warned that the Docker container deployment lost Services
    and Logs (we wired both via the fat-privileged container).
  - INSTALL.md only covered bare-metal + the old distroless docker
    flavor; nothing about the in-cluster Pod, host-shell login, or
    the new env vars.
  - SECURITY.md described privilege scoping for one shape only;
    the new fat-privileged container needed an explicit "this is
    effectively root on the host" callout, and the in-cluster Pod
    needed a section on the scoped ClusterRole.

This commit:

README.md (~140 lines)
  - v0.2 status; Kubernetes in the feature list; three deployment
    shapes table; quick-start per shape; updated project layout
    (deploy/k8s, deploy/scripts, internal/k8s, internal/api/k8s);
    Go 1.25+; doc index pointing at INSTALL/CONFIG/SECURITY/SPEC.

docs/INSTALL.md (rewritten, ~340 lines)
  - Choose-a-shape comparison table with privilege model + which
    tabs work in each.
  - Path A bare-metal: prerequisites, install, first-run wizard,
    update, uninstall, common pitfalls.
  - Path B Docker fat-privileged: full mount/cap inventory, the
    nsenter+setpriv+su login flow explained step-by-step, common
    pitfalls (terminal misbehaviors, kubeconfig issues).
  - Path C in-cluster Pod: prerequisites, image-loading on
    multi-node K3s, ClusterRole scope (current + Phase C plans),
    three access patterns (Ingress / port-forward / NodePort),
    common pitfalls (ImagePullBackOff, missing pods/log RBAC).
  - TLS modes section; Kubernetes integration section covering
    the four kubeconfig discovery paths.
  - Post-install checklist.

docs/CONFIG.md (new, ~180 lines)
  - Every CR_* env var with default + purpose, grouped by Core,
    TLS, Integrations, Container-only, Compose helpers.
  - Per-tab capability matrix (which tab works in which shape and
    what mount/cap each one needs).
  - Per-shape file paths table.
  - Capability detection / debugging walkthrough.
  - Compose env-file pattern.

docs/SECURITY.md (rewritten, ~180 lines)
  - Bumped to v0.2.
  - Privilege scoping section split into three subsections
    (bare-metal / fat-privileged container / in-cluster Pod) with
    explicit compromise-impact statements.
  - Audit section calls out PAM/auth.log + apt history.log as the
    real audit trail for the fat-privileged container's host ops.
  - Public-bind detection updated to mention RFC 6598 / Tailscale
    CGNAT awareness (was listed as a gap; now shipped).
  - Known gaps refreshed: K8s Phase C/D, cert-manager TLS for the
    Pod shape, etc.
Adds the four standard cluster operations to the SPA, with audit and
RBAC scoped to exactly what's needed.

Backend (internal/k8s/actions.go, internal/api/k8s/k8s.go)
  - RestartWorkload(ns, kind, name): merge-patches the pod template
    annotations with kubectl.kubernetes.io/restartedAt = now.
    Same trick `kubectl rollout restart` uses. Works for Deployment /
    StatefulSet / DaemonSet via case-insensitive kind dispatch; a sketch
    follows this list.
  - ScaleWorkload(ns, kind, name, replicas): GetScale → mutate
    Spec.Replicas → UpdateScale, preserving resourceVersion so the
    optimistic-concurrency check in the API server actually fires.
    Deployment + StatefulSet only; DaemonSet returns 400.
  - DeletePod(ns, name, gracePeriod, force): Delete with optional
    grace-period override. force=true hard-overrides ?grace=N to 0.
  - CordonNode(name, cordoned): merge-patches spec.unschedulable.
    Distinct audit actions for cordon vs uncordon so the trail
    captures intent.

  - Validation: every :namespace, :name, :kind goes through the
    existing DNS-1123 regex + length guard. Replicas bounds-checked
    0..1000 server-side.
  - Audit: every action writes a store.AuditEntry with
    target=ns/name (or just name for cluster-scoped Node), detail
    map carrying the relevant params. Outcome=failure includes the
    error message. Best-effort write, never fails the request.
  - DB threaded into k8sapi.Deps; router updated.
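
A sketch of the restart patch for the Deployment case, assuming the wrapper holds a clientset (helper name hypothetical):

```go
package k8s

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// restartDeployment reproduces `kubectl rollout restart` by bumping the
// restartedAt annotation on the pod template, which forces a new rollout.
func restartDeployment(ctx context.Context, cs *kubernetes.Clientset, ns, name string) error {
	patch := fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
		time.Now().Format(time.RFC3339))
	_, err := cs.AppsV1().Deployments(ns).Patch(ctx, name,
		types.MergePatchType, []byte(patch), metav1.PatchOptions{})
	return err
}
```
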

RBAC widening (deploy/k8s/rbac.yaml)
  - apps/{deployments,statefulsets,daemonsets}: patch  (restart)
  - apps/{deployments,statefulsets}/scale:    update (scale)
  - core/pods:                                 delete (rotate)
  - core/nodes:                                patch  (cordon)
  - Still NO secrets, NO persistentvolumes, NO delete on apps/*
    (would let the SPA wipe a workload outside GitOps).

Frontend (web/src/lib/k8s.ts + the three Detail drawers)
  - useRestartWorkload, useScaleWorkload, useDeletePod, useCordonNode
    — each useMutation invalidates the matching list + detail keys
    on success. Pattern matches useContainerAction.
  - WorkloadDetail: Restart button (always); Scale button with a
    number input prefilled from ready.desired (Deployment+STS only).
  - PodDetail: red Delete button with a "Force (skip graceful
    shutdown)" checkbox; closes the drawer on success via a new
    onClose prop wired up from routes/Kubernetes.tsx.
  - NodeDetail: toggle button derives current cordon state from the
    node.kubernetes.io/unschedulable taint (which `kubectl cordon`
    applies), falling back to the SchedulingDisabled condition.
  - Every destructive action goes through AlertDialog with the
    impact spelled out. Pending state + inline error on failure.

Verified end-to-end:
  - make image succeeds at 230MB.
  - Image loaded to both K3s nodes.
  - kubectl apply rbac.yaml; rollout restart succeeds.
  - 4 new endpoints all return 401 (auth required, routes mounted).
  - kubectl auth can-i {patch deployments, update deployments/scale,
    delete pods, patch nodes} as the SA → all "yes".
  - SPA bundle contains all four action mutation hooks.
Adds "shell into a container" to the Kubernetes tab. Equivalent to
`kubectl exec -it <pod> -c <container> -- /bin/sh`, but in the browser.

Backend (internal/k8s/exec.go + internal/api/k8s/k8s.go)
  - Client.PodExec uses k8s.io/client-go/tools/remotecommand to open a
    SPDY exec stream to the API server. Stashes the rest.Config on the
    Client struct (NewSPDYExecutor needs it; the existing typed
    clientset wasn't enough). A sketch follows this list.
  - WS handler at /ws/k8s/pods/:namespace/:name/exec wires:
      WS binary in   → io.Pipe → SPDY stdin
      SPDY stdout    → wsExecWriter → WS binary out
      SPDY stderr    → wsExecWriter → WS binary out (merged; xterm
                                       doesn't distinguish)
      WS text JSON   → resize        → wsSizeQueue → SPDY resize stream
  - Wire format mirrors /ws/terminal exactly so the frontend can clone
    Terminal.tsx's xterm wiring with minimal changes:
      init: { rows, cols, container, command? }   (default command ["/bin/sh"])
      err:  { type: "error", err: "..." }
  - Lifecycle correctness: cancelling ctx unblocks StreamWithContext,
    closing stdinW unblocks any blocked SPDY stdin read, closing
    sq.ch causes wsSizeQueue.Next to return nil so the SPDY size
    tracker exits cleanly. No goroutine leaks.
  - Audit: k8s.pod.exec.start at session open, k8s.pod.exec.end on
    close with bytes_in/bytes_out/duration_ms — same shape as
    terminal.session_end so a single audit query catches both.
  - Idle timeout 30 min, matching the terminal handler.
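
A sketch of the SPDY plumbing under the same assumptions; with a TTY the API server merges stderr into stdout, matching the merged frames above:

```go
package k8s

import (
	"context"
	"io"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// podExec opens an interactive exec stream to one container of a pod.
func podExec(ctx context.Context, cfg *rest.Config, ns, pod, container string,
	stdin io.Reader, stdout io.Writer, resize remotecommand.TerminalSizeQueue) error {

	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").Namespace(ns).Name(pod).SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   []string{"/bin/sh"}, // the handler's default command
			Stdin:     true,
			Stdout:    true,
			TTY:       true,
		}, scheme.ParameterCodec)

	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return err
	}
	// Cancelling ctx unblocks this call, which is what gives the handler
	// its clean-shutdown story.
	return exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdin:             stdin,
		Stdout:            stdout,
		Tty:               true,
		TerminalSizeQueue: resize,
	})
}
```
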

RBAC (deploy/k8s/rbac.yaml)
  - core/pods/exec: [create]. Minimum verb needed for the K8s API
    server to accept the SubjectAccessReview that the SPDY exec call
    triggers. Nothing else widened in this commit.

Frontend (web/src/components/k8s/PodExecModal.tsx + PodDetail.tsx)
  - Sheet-based modal: container picker, command picker (sh / bash /
    custom), Reconnect, status pill — same xterm theme + addons as
    routes/Terminal.tsx.
  - PodDetail Containers table gets an Exec column. Click → modal
    opens with that container preselected. e.stopPropagation on the
    button so it doesn't fight the row's log-viewer container picker.
  - Reconnect generation counter bumps on container or command change,
    forcing a clean WS rebuild with the new init frame.
  - podExecURL helper added to web/src/lib/k8s.ts.

Dependency churn
  - go mod tidy pulled in the SPDY transitive set:
      github.com/gorilla/websocket
      github.com/moby/spdystream
      github.com/mxk/go-flowrate
    All transitive of k8s.io/client-go/tools/remotecommand. No new
    direct dependencies.

Verified end-to-end:
  - make image succeeds.
  - Image loaded into both K3s nodes.
  - kubectl apply rbac.yaml; rollout restart succeeds.
  - kubectl auth can-i create --subresource=exec pods → yes (SA's
    perms via can-i --list also confirm pods/exec [create]).
  - WS upgrade attempt returns 401 unauthenticated (route mounted,
    auth gate firing).
  - SPA bundle contains the new PodExecModal + podExecURL strings.
Adds a ConfigMaps tab to the Kubernetes route, with full list / view /
edit. Uses a simple key-value editor for now (no Monaco — that lands
with D4's manifest editor for the full YAML edit experience).

Backend (internal/k8s/configmaps.go + internal/api/k8s/k8s.go)
  - GET  /api/k8s/configmaps?namespace=
         → { configmaps: [{name, namespace, keys[], age}] }
         List omits data values to keep responses small; only metadata
         and the lex-sorted key list.
  - GET  /api/k8s/configmaps/:ns/:name
         → ConfigMapDetail (configmap + labels + annotations + data
         + binary_keys + events).
         binary_keys lists names of binaryData entries; the bytes
         themselves aren't returned (avoids accidentally exposing
         non-text secrets-in-configmaps).
  - PUT  /api/k8s/configmaps/:ns/:name
         body: { data: Record<string,string> }
         resp: {ok: true} / 400 (invalid key) / 404 / 409 (conflict)
              / 413 (over 1 MiB).
         Uses Update (not Patch) so the API server's
         resourceVersion-based optimistic concurrency check fires and
         we get 409 instead of silent overwrite when two operators
          edit at once. A sketch follows this list.
  - Validation: key matches `^[a-zA-Z0-9._-]+$`, total `len(k)+len(v)`
         under 1 MiB. Audit row per update with key_count + size_bytes.
         metadata.name/namespace are URL-canonical and not editable;
         labels/annotations are read-only here (D4 will handle them
         via manifest edit).
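
A sketch of that read-modify-update round trip (helper name and error mapping assumed):

```go
package k8s

import (
	"context"
	"errors"

	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// errConflict is surfaced to the SPA as 409.
var errConflict = errors.New("configmap changed since it was loaded")

// updateConfigMap overwrites Data via Update so the API server's
// resourceVersion check (carried along from the Get) can fire.
func updateConfigMap(ctx context.Context, cs *kubernetes.Clientset,
	ns, name string, data map[string]string) error {

	cm, err := cs.CoreV1().ConfigMaps(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err // IsNotFound maps to 404 in the handler
	}
	cm.Data = data
	if _, err := cs.CoreV1().ConfigMaps(ns).Update(ctx, cm, metav1.UpdateOptions{}); err != nil {
		if k8serrors.IsConflict(err) {
			return errConflict
		}
		return err
	}
	return nil
}
```
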

Frontend (lib/k8s.ts + components/k8s/ConfigMap{List,Detail}.tsx)
  - Three new TS interfaces matching the backend contract; three
    hooks (useK8sConfigMaps, useK8sConfigMapDetail,
    useUpdateConfigMap) following the Phase B/C invalidation pattern.
  - ConfigMapList: table with Name / Namespace / Keys-count / Age,
    namespace picker driven from the existing parent state.
  - ConfigMapDetail: Sheet drawer with key/value table editor
    (`<Input>` for keys, multi-line `<textarea>` for values),
    add/remove rows, Save with AlertDialog confirmation showing
    key count + total bytes via formatBytes. Dirty detection via
    JSON.stringify of sorted entries against the original snapshot.
  - 409 conflict UX: dedicated inline Alert with a Reload button
    that refetches detail and discards local edits.
  - Read-only sections for labels (chips), annotations (collapsible),
    binary_keys ("Edit via kubectl; not editable here"), events
    (reuses existing EventsTable).

RBAC (deploy/k8s/rbac.yaml)
  - core/configmaps: [update]. get/list/watch were already in scope
    from Phase A. No widening for binary data — read still returns
    only key names.

Verified end-to-end:
  - make image succeeds.
  - Image loaded into both K3s nodes; rollout restart clean.
  - Docker compose container recreated with the new image.
  - kubectl auth can-i update/list configmaps as the SA → both yes.
  - All 3 endpoints return 401 (mounted, auth-gated).
  - SPA bundle contains ConfigMapList / ConfigMapDetail / hooks.
CI has been failing at workflow startup since before v0.2 (the m9
push to main also failed). Three real issues plus a couple of polish
items:

1. Go version drift. ci.yml pinned Go 1.23 while go.mod requires
   go 1.25.0 (k8s.io/client-go pulled in via Phase A). setup-go would
   install 1.23 and `go mod download` would fail. Switched to
   `go-version-file: go.mod` so the toolchain follows go.mod and
   can't drift again.

2. Wrong embed-stub path. internal/web/web.go does
       //go:embed all:dist
   from its own package directory, so the embed needs
   internal/web/dist/index.html — not web/dist/index.html. The old
   stub step created the wrong path; the Go compile would fail with
   "no matching files for embed". Fixed.

3. Trigger only ran on `push: [main]`, never on dev branches. Added
   "v*" so v0.2 (and future v0.3, v0.4) get CI feedback before merge.

Polish:
   - Added a `Vet` step (cheap, fast, often catches things lint
     misses).
   - Scoped the test run to ./internal/... ./cmd/... — ./... would
     descend into web/node_modules/flatted/golang/pkg/flatted (an
     npm dep that ships an embedded .go file) and fail to compile.
   - npm cache + cache-dependency-path on the web job for faster
     reruns.
   - Lint kept on `latest` with a comment explaining we'll pin if
     it ever flakes from a release upgrade.

The Docker image job is unchanged. It builds the same fat-privileged
runtime image we deploy from. Image size is ~230MB now (vs ~25MB
when ci.yml was first written) — the existing "Inspect image size"
step just reports, no enforcement, so no further change needed.
That step's run: scalar contained `image size: %.1f` — YAML plain
scalars can't contain colon-space, which made the parser see the
`%.1f` as a value of an implicit mapping with key `image size`. The
result was the entire workflow failing at startup with "this run
likely failed because of a workflow file issue" — zero jobs running,
zero seconds duration. Same bug existed on v0.1; main's m9 push hit
it too.

Switched to `run: |` block scalar so the awk format string is opaque
to YAML.
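
The trap is reproducible in a few lines with gopkg.in/yaml.v3 (an illustration only; the actual failure was GitHub's workflow parser rejecting ci.yml):

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	var out map[string]any

	// Plain scalar containing ": " — the parser sees an implicit nested
	// mapping and errors out, which is what killed the workflow at startup.
	broken := []byte(`run: awk '{printf "image size: %.1f", $1}'`)
	fmt.Println(yaml.Unmarshal(broken, &out)) // "mapping values are not allowed" parse error

	// Block scalar: the body is opaque to YAML, so the format string survives.
	fixed := []byte("run: |\n  awk '{printf \"image size: %.1f\", $1}'")
	fmt.Println(yaml.Unmarshal(fixed, &out), out["run"]) // <nil> plus the raw command
}
```
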
Two fixes for the first real-execution CI run:

1. web/package-lock.json wasn't tracked, so the runner couldn't
   resolve `cache-dependency-path`, the cache step errored, and
   `npm ci` would have fallen through to `npm install` anyway —
   neither reproducible nor cached. Tracked the lock file (188K,
   one-time inflation; future updates will be small diffs).

2. golangci-lint v1.64.8 (which the v6 action's `latest` resolves
   to) is built with Go 1.24 and refuses to lint code targeting
   Go 1.25 ("the Go language version used to build golangci-lint
   is lower than the targeted Go version"). Bumped to action @v8
   pinned at v2.5.0 (built with Go 1.25). v2.x has a different
   config format but we have no .golangci.yml, so default linters
   run unchanged.
golangci-lint v2 doesn't run any linters without an explicit config
(v1 silently ran the standard set). Added a minimal v2 config that
enables the same set v1 had on by default — errcheck, govet,
ineffassign, staticcheck, unused — plus targeted exclusion rules
for idiomatic patterns we don't want to flag:

  - `defer x.Close()` — the canonical Go idiom; close-on-defer
    failures aren't actionable.
  - `.Set(Read|Write)Deadline(...)` on websocket conns — same idea;
    a failed deadline just means the next op will error and we'll
    exit cleanly.
  - `_ = d.DB.WriteAudit(...)` — audits are best-effort by design.
  - tests routinely ignore cleanup errors.

Two real findings fixed in source:

  - internal/docker/fake.go:98 — staticcheck QF1008 flagged
    `c.Container.State = state` as redundant once `c.State = state`
    was already on the line above. Embedded field rewrites are
    transparent here; dropped the duplicate.
  - internal/api/auth/auth.go:191 — `type pendingEnrollments
    struct{}` was declared with a TODO comment but never referenced.
    Removed; the TODO now lives in MILESTONES.md territory rather
    than dead code.

Local `golangci-lint run` against the same v2.5.0 the CI runs is
clean: 0 issues.
…b/dist)

The committed web-builder stage writes its bundle to /src/web/dist
(vite.config.ts has `outDir: 'dist'` resolved relative to web/), but
the go-builder COPY was reading from /src/internal/web/dist —
nothing there, build error:

    failed to compute cache key: ... "/src/internal/web/dist": not found

`make image` worked locally because my working-tree Dockerfile had
the correct path, but I never committed that fix; the broken COPY
from a pre-v0.2 in-flight relocation attempt has been on origin/v0.2
since the fat-privileged container rewrite. Aligning the path so
the bundle from web-builder actually reaches go-builder for embed.

Embed target inside the runtime image is unchanged — internal/web/dist
relative to the Go module — only the source path of the COPY changed.
Adds a Secrets tab to the Kubernetes route. Read-only — no edit, create,
or delete. This is the most-sensitive RBAC widening so far, and the
design choices reflect that.

Backend (internal/k8s/secrets.go + internal/api/k8s/k8s.go)
  - GET  /api/k8s/secrets?namespace=
         → { secrets: [{name, namespace, type, keys[], age}] }
         List omits values; only metadata + lex-sorted key list. Type
         is the K8s SecretType (Opaque, kubernetes.io/tls,
         dockerconfigjson, etc.).
  - GET  /api/k8s/secrets/:ns/:name
         → SecretDetail (secret + labels + annotations + data + events).
         data values are base64-decoded to plaintext on the backend.
         Non-UTF-8 bytes get U+FFFD substitution via standard JSON
          encoder behavior — most secrets are text, so this is acceptable.

  - Per-detail audit: every detail GET (success or failure) writes a
    store.AuditEntry with action `k8s.secret.read`, target `ns/name`,
    detail `{"key_count": N}`. Key NAMES are not logged, values are
    not logged. The audit table records THAT a secret was read, not
    what was in it. List endpoint is intentionally NOT audited (no
    values exposed there).
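
A sketch of that audit write, with the store surface stubbed for illustration (the real AuditEntry lives in the project's store package; field names are assumptions):

```go
package k8sapi

// AuditEntry stubs the assumed store type.
type AuditEntry struct {
	Action  string
	Target  string
	Outcome string
	Detail  map[string]any
}

// AuditWriter stubs the assumed store interface.
type AuditWriter interface{ WriteAudit(AuditEntry) error }

// auditSecretRead records THAT a secret was read, never what was in it:
// no key names, no values, just a count.
func auditSecretRead(db AuditWriter, ns, name string, keyCount int, readErr error) {
	entry := AuditEntry{
		Action:  "k8s.secret.read",
		Target:  ns + "/" + name,
		Outcome: "success",
		Detail:  map[string]any{"key_count": keyCount},
	}
	if readErr != nil {
		entry.Outcome = "failure"
		entry.Detail["error"] = readErr.Error()
	}
	_ = db.WriteAudit(entry) // best-effort by design; never fails the request
}
```
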

RBAC (deploy/k8s/rbac.yaml)
  - core/secrets: [get, list]
  - Deliberately NOT watch — keeps a long-lived stream of all cluster
    secrets from being established. Reads are point-in-time and
    audited; watch would not be.
  - Comment explicitly explains the trade.

Frontend (web/src/components/k8s/Secret{List,Detail}.tsx + lib hooks)
  - SecretList: table with Name/Namespace/Type-badge/Keys/Age. Type
    badge strips the kubernetes.io/ prefix for compactness.
  - SecretDetail: Sheet drawer with:
      • muted informational banner: "Read-only. Values are sensitive
        — every detail view writes an audit row." (informational, not
        alarmist red.)
      • Reveal All / Hide All toggle resets to OFF every time the
        drawer opens — re-opening always re-masks.
      • Per-row eye toggle + copy-to-clipboard button.
      • Masking: 8 fixed bullets regardless of value length so the
        DOM never leaks length information.
      • JSON pretty-print: try/catch around JSON.parse + stringify;
        falls back to raw text on parse failure.
      • TLS-shaped values (start with -----BEGIN, contain newlines)
        rendered in a monospace <pre>.
      • Copy fallback: navigator.clipboard.writeText with a hidden
        <textarea>+execCommand fallback for insecure contexts.
  - useK8sSecretDetail uses staleTime: 60s and refetchOnWindowFocus:
    false so re-opening doesn't immediately re-hit the API and pile
    up audit rows.

Verified end-to-end:
  - Local golangci-lint v2.5.0 clean (0 issues), tsc --noEmit clean.
  - make image succeeds at 230MB.
  - Image loaded into both K3s nodes; deployment rolled clean.
  - Docker compose container recreated with the standard
    fat-privileged flags.
  - kubectl auth can-i: list secrets=yes, get secrets=yes,
    watch secrets=no (intentional).
  - Both endpoints return 401 (mounted, auth-gated).
  - SPA bundle contains 24 references to SecretList/SecretDetail/
    useK8sSecrets/secrets path.

Side change: deploy/docker-compose.yml `image:` flipped from
ghcr.io/tm4rtin17/controlroom:latest to controlroom:dev so
`docker compose up` matches the local-build workflow without
forcing a registry pull. Production deployments should pin a
specific tag.
Adds an "Edit YAML" button to the existing detail drawers for the
five editable resource kinds (Deployment / StatefulSet / DaemonSet
/ Service / ConfigMap). Click → Sheet with a Monaco editor; Dry-run
validates server-side; Apply writes back via PUT semantics.

Backend (internal/k8s/manifest.go + internal/api/k8s/k8s.go)
  - GET  /api/k8s/manifest?kind=&namespace=&name=
         → { yaml, resource_version, gvk }
         The dynamic client fetches the resource as unstructured;
         we strip metadata.{managedFields,creationTimestamp,uid,
         resourceVersion,generation} and the entire status before
         marshalling so the editable surface is just spec + labels +
         annotations. resourceVersion is sent separately so the
         apply round-trip can detect conflicts without leaking RV
         into the editable buffer.
  - POST /api/k8s/manifest
         body: { yaml, resource_version, dry_run }
         → { ok, resource_version, warnings, dry_run }
         dyn.Update with metav1.UpdateOptions{DryRun: [DryRunAll]}
         when dry_run=true. PUT (Update) semantics, not server-side
         Apply — kubectl-edit-equivalent. The API server rejects on
         stale RV with k8serrors.IsConflict → 409.
  - Cross-check enforced: the YAML's `kind`, `metadata.namespace`,
    and `metadata.name` must match the URL path values; mismatch
    returns 400 ErrManifestMismatch. Prevents an "edit foo, but
    YAML targets bar" cross-resource attack; a sketch follows this list.
  - Editable kinds allowlist (deployment / statefulset / daemonset
    / service / configmap). Anything outside → 400. Pods are
    excluded (immutable fields). Nodes are excluded (metadata-only
    edits warrant a different flow). Secrets stay read-only per D3.
  - Audit: k8s.manifest.dry_run / k8s.manifest.apply with
    {kind, namespace, name, bytes, dry_run}. The YAML body itself
    is never logged — only its size — so a future audit dump can't
    replay sensitive data that flowed through the editor.
  - Error mapping: 400 (parse / cross-check / not-editable kind),
    404, 409 (IsConflict), 422 (IsInvalid — admission webhook
    denial passes through verbatim), 500.
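
A sketch of the cross-check and the field stripping on the unstructured object (helper names assumed):

```go
package k8s

import (
	"errors"
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// ErrManifestMismatch maps to 400 in the handler.
var ErrManifestMismatch = errors.New("manifest does not match the requested resource")

// checkManifestTarget rejects a YAML body that names a different resource
// than the one the request path says is being edited.
func checkManifestTarget(obj *unstructured.Unstructured, kind, ns, name string) error {
	if !strings.EqualFold(obj.GetKind(), kind) ||
		obj.GetNamespace() != ns || obj.GetName() != name {
		return ErrManifestMismatch
	}
	return nil
}

// stripServerFields removes the server-managed noise so the editable
// buffer is just spec + labels + annotations, as described above.
func stripServerFields(obj *unstructured.Unstructured) {
	for _, f := range []string{"managedFields", "creationTimestamp", "uid",
		"resourceVersion", "generation"} {
		unstructured.RemoveNestedField(obj.Object, "metadata", f)
	}
	unstructured.RemoveNestedField(obj.Object, "status")
}
```
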

RBAC (deploy/k8s/rbac.yaml)
  - apps/{deployments,statefulsets,daemonsets}: patch + update
    (patch was already in scope from Phase C for restart; update
    is new for full YAML edits).
  - core/services: update (was get/list/watch only).
  - core/configmaps: update was already in scope from D2.

Frontend (lib/k8s.ts + components/k8s/ManifestEditor.tsx)
  - useK8sManifest: useQuery with refetchOnWindowFocus:false and
    staleTime:Infinity so unsaved edits aren't clobbered by a
    background refetch. Operator-driven Reload only.
  - useApplyManifest: useMutation. On non-dry-run success, invalidates
    workload/service/configmap detail + list queries.
  - ManifestEditor.tsx (Sheet drawer):
    • Monaco lazy-loaded via React.lazy → ~3MB chunk fetched only
      when the Sheet first opens; main bundle stays at the size it
      was before D4.
    • Dark theme, YAML mode, no minimap, font-mono, wordWrap on,
      scrollBeyondLastLine off.
    • Header buttons: Reload (with dirty-state confirmation),
      Dry-run, Apply (always confirms via AlertDialog).
    • Apply disabled when YAML is unchanged.
    • 409 conflict UX has its own dedicated Alert + inline Reload
      button, separate from generic / 422-admission errors.
    • resource_version shown muted in the header so operators know
      which revision they're working against.
  - WorkloadDetail / ServiceDetail / ConfigMapDetail each grew an
    Edit YAML button alongside their existing actions. ConfigMap
    keeps its structured key/value editor too — the YAML editor is
    the alternative when you need to edit metadata/labels/annotations.

Dependency: web/package.json adds @monaco-editor/react ^4.6.0 (the
small wrapper; monaco-editor itself is its peer/transitive). Lock
file regenerated locally before this commit so CI's `npm ci` works
without falling back to `npm install`.

Verified end-to-end:
  - Local golangci-lint v2.5.0 clean (0 issues), tsc --noEmit clean.
  - make image succeeds at ~230MB (Monaco lives in lazy chunks under
    /assets, not in the main runtime image bloat path).
  - Image loaded into both K3s nodes; deployment rolled cleanly.
  - kubectl auth can-i update {deployments|statefulsets|daemonsets|
    services|configmaps} as the SA → all five yes.
  - GET /api/k8s/manifest and POST /api/k8s/manifest both return
    401 (mounted, auth-gated).
  - SPA main bundle contains ManifestEditor + the manifest URL
    helpers; Monaco itself is in a separate lazy chunk.
`docker compose up` namespaces named volumes with the project name —
when run from deploy/ the volume becomes deploy_controlroom-data, not
controlroom-data. That silently creates a fresh empty volume rather
than reusing the host-level one, stranding the operator's admin user,
JWT signing key, and TLS material on the other volume; the SPA then
shows the first-run setup wizard as if it were a clean install.

Marking the volume external means compose refuses to start if the
volume isn't already present (operator runs `docker volume create
controlroom-data` once on first install) AND uses that exact named
volume thereafter. Switching between `docker run -v controlroom-data:
/data` and `docker compose up` now lands on the same persistent state.
- Add CHANGELOG.md (Keep-a-Changelog) covering v0.1.0 → v0.2.0. The
  Unreleased section is reserved for post-merge work on main.
- README features table: rewrite the Kubernetes row to mention
  D1 exec, D2 ConfigMap edit, D3 Secret view, D4 manifest editor.
  Add a CHANGELOG link to the doc index.
- docs/INSTALL.md: rewrite the in-cluster ClusterRole table (Phase A
  → D verbs grouped by phase that introduced them, with explicit
  exclusions); add the `docker volume create controlroom-data` step
  to the compose path now that the volume is `external: true`.
- docs/CONFIG.md: per-tab capability matrix gains rows for the K8s
  sub-features (exec, lifecycle, configmap edit, secret view,
  manifest edit) so operators can see which RBAC verbs each one
  needs.
- docs/SECURITY.md: in-cluster Pod section's RBAC table updated for
  the full Phases A→D verb set; per-Secret-read and per-manifest-edit
  audit rows called out; Known gaps refreshed (K8s tab moved from
  "open" to "closed in v0.2", multi-cluster + RBAC roles added as
  v0.3 candidates).

Audit re-run before this commit: zero secret-shaped strings introduced
across the 24 commits in v0.2; runtime data still under /var/lib/* outside
the repo; .claude/ still ignored.
tm4rtin17 merged commit be8ab22 into main on May 9, 2026
6 checks passed
tm4rtin17 deleted the v0.2 branch on May 9, 2026 at 18:29