- PublicBindBanner: treat 100.64.0.0/10 (Tailscale CGNAT) as private so
the destructive "public-looking address" warning no longer fires when
reaching ControlRoom over Tailscale.
- internal/logs: add Available() (cached exec.LookPath); /api/logs/journal
and /ws/logs/journal now return 503 with a clear operator message when
journalctl is missing instead of bubbling up an exec error as 500.
- main: warn at boot when journalctl is not in PATH, matching the existing
systemd-unavailable warning.
- web/Logs: source toggle (Journal / Containers). Containers mode reuses
/api/containers + the existing /ws/containers/:id/logs WebSocket — no
new backend route — with picker, client-side substring filter, pause,
and 1k-line cap. Journal-mode errors now surface the real backend
message instead of "Could not fetch logs."
- deploy/docker-compose: add group_add: ["${DOCKER_GID:-999}"] so the
nonroot uid 65532 can read /var/run/docker.sock; refresh the header
comment to match the new logs behavior.
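The CGNAT carve-out in the first bullet amounts to a plain range check. A minimal Go sketch, with an illustrative `looksPrivate` helper and range list (the real check lives in the PublicBindBanner frontend component):

```go
package main

import (
	"fmt"
	"net"
)

// privateRanges covers RFC 1918 plus RFC 6598 (100.64.0.0/10), the
// carrier-grade NAT block Tailscale assigns from. Illustrative list;
// the shipped component may include more ranges.
var privateRanges = []string{
	"10.0.0.0/8",
	"172.16.0.0/12",
	"192.168.0.0/16",
	"100.64.0.0/10", // RFC 6598 / Tailscale CGNAT
	"127.0.0.0/8",
}

// looksPrivate reports whether addr falls in a range that should NOT
// trigger the "public-looking address" warning.
func looksPrivate(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, cidr := range privateRanges {
		_, block, _ := net.ParseCIDR(cidr)
		if block.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksPrivate("100.96.12.34")) // true: inside 100.64.0.0/10
	fmt.Println(looksPrivate("203.0.113.7"))  // false: genuinely public
}
```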
- .claude/ — local Claude Code config and project agent definitions
  (.claude/agents/*.md). These contain working-tree-specific tooling
  and should not be published.
- *.crt — round out the existing *.pem / *.key entries so a stray TLS
  certificate dropped in the tree can't be committed.
Adds GET /api/system/capabilities — `{systemd, docker, journal}` booleans
derived from the existing nil-client / logs.Available() patterns — and
threads useCapabilities() into AppLayout to filter the sidebar.
In the container deployment this hides the Services tab (no dbus socket
mounted, no privilege to drive host systemd) instead of letting the
operator click into a dead-end 503. The Containers tab gets the same
treatment so a host without /var/run/docker.sock isn't shown a broken
tab either.
The Logs tab stays visible regardless: it has its own Journal/Containers
source toggle and degrades gracefully on its own.
The frontend types Container.ports and Container.labels as non-null
arrays/objects (web/src/lib/containers.ts) and dereferences them
directly (e.g. ContainerCard.tsx accesses container.ports.length).
Go's encoding/json marshals nil slices and nil maps as null, so a
container with no published ports or no labels would crash the SPA
with a blank-screen runtime error.

Initialize Ports, Labels, Mounts, Networks, Command, and Env to
non-nil empty values in List() and Inspect() before the struct is
returned. The compose label lookups now read from the local
nil-guarded copy instead of the daemon's possibly-nil map.
Adds a Kubernetes tab to the SPA backed by client-go, plus K8s
manifests to deploy ControlRoom into the cluster it's meant to
manage. Read-only for now: Nodes, Namespaces, Workloads
(Deployment/StatefulSet/DaemonSet), Pods, Services. No write
actions, log streaming, or exec — those land in Phase B/C/D.
Backend:
- internal/k8s/ — client-go wrapper. New(ctx) tries in-cluster first,
then $KUBECONFIG, ~/.kube/config, /etc/rancher/k3s/k3s.yaml. Returns
ErrUnavailable on all failures, mirroring the docker pattern.
- internal/api/k8s/ — REST handlers under /api/k8s/{nodes,namespaces,
workloads,pods,services}. Nil-client → 503.
- /api/system/capabilities adds `kubernetes: bool` so the SPA
hides the tab when the cluster client isn't reachable.
- Best-effort wiring in main.go matching systemd/docker patterns.
Frontend:
- web/src/lib/k8s.ts — types + TanStack Query hooks (10s refetch).
- web/src/routes/Kubernetes.tsx — namespace picker, sub-tabs, search.
- web/src/components/k8s/ — list components per resource kind.
- formatRelativeTime() helper for the Age columns.
Deploy:
- deploy/k8s/{namespace,rbac,pvc,deployment,service,ingress}.yaml
— minimal in-cluster install. ClusterRole has get/list/watch only
on the resources we surface; widens in Phase C when actions land.
- Single-replica deployment by design — SQLite + RWO PVC.
- Pod runs as uid 65532, drop ALL caps, readOnlyRootFilesystem.
Verified:
- make image succeeds.
- Image loaded into both K3s nodes (multi-node cluster).
- Pod reaches Ready, /api/healthz returns 200.
- No "kubernetes unavailable" warning in logs — client-go connects
via the projected SA token.
Adds click-to-detail across the Kubernetes tab and a live log viewer
for pods. Read-only still — no write actions yet.

Backend (internal/k8s/detail.go, internal/api/k8s/k8s.go)
- GET /api/k8s/nodes/:name → NodeDetail (conditions, allocatable,
  taints, events).
- GET /api/k8s/workloads/:namespace/:kind/:name → WorkloadDetail for
  Deployment / StatefulSet / DaemonSet (selector, strategy,
  conditions, events).
- GET /api/k8s/pods/:namespace/:name → PodDetail (per-container
  status, conditions, QoS, node, events).
- GET /api/k8s/services/:namespace/:name → ServiceDetail (endpoint
  addresses, selector, events).
- WS /ws/k8s/pods/:namespace/:name/logs?container=&tail= — streams
  via CoreV1.Pods().GetLogs().Stream(). Frame shape matches
  /ws/containers/:id/logs (type=line, stream=stdout, line) so the SPA
  reuses the LogStream UX.
- Events listed via field selector on involvedObject — server-side
  filter, no full-list scan. Cluster-scoped Node events scan all
  namespaces.
- DNS-1123 regex with length guard validates :namespace and :name
  before they reach the API server.

Frontend (web/src/components/k8s/*Detail.tsx + helpers)
- Sheet-based drawer per resource kind, opened by row click in the
  list views. State held in routes/Kubernetes.tsx so opening a pod
  doesn't clobber a node selection.
- Reusable EventsTable and ConditionsTable.
- PodLogStream — WebSocket viewer mirroring ContainerLogStream
  (pause/resume/clear, 1000-line cap, auto-scroll).
- Container picker in PodDetail: clicking a container row syncs the
  active container in the picker chips and the log stream.
- Used the existing Sheet primitive — no new deps.

RBAC widening (deploy/k8s/rbac.yaml)
- pods/log → get (so the SA can stream container output).
- endpoints → get/list/watch (for ServiceDetail.endpoints).
- Still no secrets, no PVs, no write verbs. Phase C territory.

Verified:
- make image, image loaded into both K3s nodes (piserver + piserver2).
- kubectl apply -f deploy/k8s/rbac.yaml succeeded.
- rollout restart deployment/controlroom rolled cleanly within 90s.
- /api/healthz → 200, /api/system/capabilities reports kubernetes:true.
- kubectl auth can-i get pods/log as the SA → yes.
- No "kubernetes unavailable" warning in pod logs.
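The DNS-1123 guard might look roughly like this; the regex below follows RFC 1123 label syntax, and the 253-character ceiling is an assumption about the length guard's limit:

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123 matches RFC 1123 label names: lowercase alphanumerics and
// hyphens, starting and ending alphanumeric. The length ceiling is
// checked separately so the regex can't be fed huge inputs.
var dns1123 = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

func validName(s string) bool {
	return len(s) > 0 && len(s) <= 253 && dns1123.MatchString(s)
}

func main() {
	fmt.Println(validName("kube-system"))   // true
	fmt.Println(validName("Bad_Name"))      // false: uppercase + underscore
	fmt.Println(validName("../etc/passwd")) // false: rejected before the API server sees it
}
```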
The docker-compose container can't reach K3s' API server at
127.0.0.1:6443 from inside its bridge namespace, and the cert SAN
doesn't include the docker bridge gateway IP. So we rewrite the
kubeconfig server URL to the host's K3s node-ip (which is in the
cert SAN by default) and mount that copy read-only into the
container.
deploy/scripts/setup-host-kubeconfig.sh
- One-shot helper run on the host (sudo). Reads
/etc/rancher/k3s/k3s.yaml, picks a host IP that's in the cert
SAN (override → /etc/rancher/k3s/config.yaml node-ip →
hostname -I), writes the rewritten config to
/var/lib/controlroom/kubeconfig with mode 0600 and owner
65532:65532 (the controlroom container's nonroot uid).
- Verifies the chosen IP is in the cert's SubjectAltName before
writing — warns the operator if not.
deploy/docker-compose.yml
- Mount /var/lib/controlroom/kubeconfig:/etc/k3s/kubeconfig:ro
- KUBECONFIG=/etc/k3s/kubeconfig — client-go's first lookup path.
- Comment header reframed: progressive integration model. Docker
+ K8s wired now; journal / systemd / apt left as TODO mounts.
Security note: the K3s admin kubeconfig has cluster-admin. The
container now has cluster-admin. Matches the user's stated intent
of "fat-privileged container" but worth calling out — RBAC the SA
later if you want to scope this down.
…, network
Bundle that lights up every host-integration tab in the docker-compose
deployment by giving the container the host namespaces, capabilities,
and bind mounts it needs. Bare-metal install (deploy/install.sh) is
unchanged and remains the unprivileged path.
Image (deploy/Dockerfile)
- Drop gcr.io/distroless/static-debian12:nonroot, switch runtime to
debian:bookworm-slim.
- Install: bash, ca-certificates, sudo, util-linux (nsenter), procps,
iproute2, iptables, ufw, systemd (journalctl + systemctl + libsystemd0
+ libdbus-1-3), apt-utils.
- Run as root; SYS_ADMIN can't survive a uid transition.
- Image size 230MB (up from ~20MB distroless). Acceptable; documented.
Compose (deploy/docker-compose.yml)
- network_mode: host, pid: host
- cap_add: SYS_ADMIN (nsenter) + SYS_PTRACE (host ps/top) + NET_ADMIN
(ufw/iptables)
- security_opt: apparmor:unconfined + seccomp:unconfined
- user: 0:0
- Mounts: dbus socket (rw), /run/log/journal + /var/log/journal +
/etc/machine-id (ro) for journalctl, /etc/rancher/k3s/k3s.yaml
directly (network_mode:host means 127.0.0.1:6443 is reachable so
the rewritten kubeconfig from setup-host-kubeconfig.sh is no longer
needed), /var/cache/apt rw + /etc/apt + /var/lib/dpkg ro for apt.
- env: CR_HOST_SHELL=true so pty wraps shells with nsenter.
- Header reframed with security model + integration matrix.
pty (internal/pty/pty.go + config.go + api/terminal + api/router)
- New Options.HostShell. When true, exec.Command becomes
`/usr/bin/nsenter -t 1 -m -u -i -n -p -- <shell> --login` so the
user gets a shell in the host's mount/UTS/IPC/network/PID namespaces
— equivalent to ssh-ing into the host.
- CR_HOST_SHELL env var (default false). Wired through
config.Config → api.Deps.Cfg → terminalapi.Deps.HostShell →
pty.Options.HostShell.
- Stat-checks /usr/bin/nsenter when HostShell=true; clear error if
missing instead of an opaque exec failure.
- Bare-metal behavior is byte-identical when CR_HOST_SHELL=false.
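A sketch of the spawn decision, with a hypothetical `hostShellCmd` helper (the real wiring threads Options.HostShell through internal/pty):

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// hostShellCmd builds the command to spawn: when hostShell is set,
// nsenter into PID 1's namespaces and run the login shell there;
// otherwise spawn the shell directly (the bare-metal path).
func hostShellCmd(shell string, hostShell bool) (*exec.Cmd, error) {
	if !hostShell {
		return exec.Command(shell, "--login"), nil
	}
	const nsenter = "/usr/bin/nsenter"
	if _, err := os.Stat(nsenter); err != nil {
		// Clear operator-facing error instead of an opaque exec failure.
		return nil, errors.New("host shell requested but /usr/bin/nsenter is missing from the image")
	}
	// -t 1: target PID 1 (the host's init when pid: host is set)
	// -m -u -i -n -p: mount, UTS, IPC, network, and PID namespaces
	return exec.Command(nsenter, "-t", "1", "-m", "-u", "-i", "-n", "-p", "--", shell, "--login"), nil
}

func main() {
	cmd, err := hostShellCmd("/bin/bash", false)
	if err == nil {
		fmt.Println(cmd.Args) // [/bin/bash --login]
	}
}
```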
setup-host-kubeconfig.sh + deploy/k8s/README.md
- Header rewritten: script no longer needed for the standard
docker-compose deployment, retained for non-host-network cases.
- K8s README adds a paragraph contrasting in-cluster (unprivileged
nonroot SA, scoped RBAC) vs docker-compose (fat-privileged) shapes.
Verified:
- make image succeeds at 230MB.
- Container boots clean — none of the four "unavailable" warnings
(systemd / journalctl / kubernetes / docker).
- host_shell=true logged at boot.
- All required binaries present in image: bash, sudo, nsenter, ip,
ufw, journalctl, systemctl, apt, apt-get.
- `nsenter -t 1` from inside the container produces a real host
shell (returns the host hostname + kernel).
- /api/healthz → 200.
Security note (re-stating compose header): a compromise of ControlRoom
is now effectively root on the host. Intended tradeoff for a single-user
homelab where the operator already has root. The bare-metal install
remains the unprivileged option.
The fat-privileged container runs as root, and the previous nsenter
default dropped the operator straight into a root shell — a meaningful
escalation over the SSH UX it should mirror. Now it runs login(1)
inside the host namespaces; the user types host credentials in the
xterm, PAM authenticates, and login execs the user's shell at the
user's uid only on success. Same audit trail as SSH (auth.log,
last-login records).

Wiring:
- internal/pty/pty.go: new Options.LoginMode. When HostShell+LoginMode
  the spawned process is `nsenter -t 1 -m -u -i -n -p -- /bin/login`.
  Removed the misleading container-side stat for login (after
  nsenter -m the binary resolves from the host's filesystem; a stat
  from the container's view doesn't tell us anything useful).
- internal/config/config.go: CR_TERMINAL_LOGIN env var, default false.
- internal/api/terminal + router: thread cfg.TerminalLogin through to
  pty.Options.LoginMode.
- cmd/controlroom/main.go: log the new flag at boot.
- deploy/docker-compose.yml: CR_TERMINAL_LOGIN=true alongside
  CR_HOST_SHELL=true. Comment explains the security improvement.

Bare-metal install is unchanged — CR_TERMINAL_LOGIN defaults false,
existing pty.New behavior preserved.

Refused setup: LoginMode without HostShell returns an error. login(1)
is meaningful only when there are namespaces to enter; otherwise we'd
just be running it as the controlroom user with nothing to gain.
…ompt

util-linux login(1) refuses to run interactively when execv'd from a
regular process — it expects to be slave to getty/sshd, with specific
TTY ioctls, and exits silently otherwise. Confirmed end-to-end: even
with a real PTY allocated by Python and a controlling terminal set,
/bin/login produces no output and exits 1.

su(1) is a regular interactive program that works the way login
should. Replace the LoginMode exec target with a tiny inline bash
script that:
1. Prints a short banner with the hostname.
2. Prompts for the username.
3. exec su -l "$username".

su then prompts for the password through PAM (host /etc/pam.d/su
rules), authenticates, and execs the user's shell at the user's uid.
The password never touches Go, audit goes through PAM as before, and
the UX is identical to SSH: type user, type password, you're in.

LoginPath constant removed; replaced with the loginScript string. No
external file added — the script is small enough to live as a string
constant in pty.go.

Verified: a Python PTY harness around the same exec.Command path that
pty.New uses now prints "ControlRoom Terminal — host login at <host>"
followed by "username: " and waits for input, exactly like SSH.
Root invoking su(1) is special-cased to skip PAM authentication —
"root can become anyone" — so when the controlroom container (uid 0)
ran `su -l pi` the operator landed as pi without ever proving the
password. That's not the SSH-equivalent UX we wanted; it's worse than
the old root-shell default because now it's pretending to
authenticate.

Drop to nobody:nogroup with setpriv before invoking su. su now sees a
non-root caller, runs full PAM auth via /etc/pam.d/su, the password is
prompted in the xterm, and on success the setuid bit on /bin/su lets
it escalate to root and then drop to the target user. Lockouts and
auth.log entries flow through PAM as expected.

Verified: a Python PTY harness around the new exec path now prints
"Password:" after entering the username, exactly like SSH.
Same bug class as 74dd538 fixed for the Docker client — Network.tsx
crashes the SPA with a blank screen when iface.ips or fw.data.rules
arrives as null instead of an empty array, because the frontend
dereferences .length directly. Two fields needed nil-guards:
- internal/network/iface.go: Interface.IPs starts as a nil slice when
  an interface has no addresses (e.g. some down/dummy/bridge ifaces).
  Now initialized to []string{}. Flags inherits whatever
  `ip -j addr show` returns (which is non-nil in practice but not
  guaranteed) so we nil-guard that too.
- internal/network/ufw.go: UFWStatus.Rules starts as a nil slice when
  the firewall is disabled or has no rules. Now initialized to
  []UFWRule{} in parseUFWStatus before the parse loop.

Verified: rebuilt image, recreated container. The Network tab should
render an empty Interfaces grid + "No rules" instead of crashing.
The shipped docs lagged badly behind v0.2:
- README claimed "v0.1 in development", didn't mention the
Kubernetes tab, listed footprint targets that no longer apply,
and warned that the Docker container deployment lost Services
and Logs (we wired both via the fat-privileged container).
- INSTALL.md only covered bare-metal + the old distroless docker
flavor; nothing about the in-cluster Pod, host-shell login, or
the new env vars.
- SECURITY.md described privilege scoping for one shape only;
the new fat-privileged container needed an explicit "this is
effectively root on the host" callout, and the in-cluster Pod
needed a section on the scoped ClusterRole.
This commit:
README.md (~140 lines)
- v0.2 status; Kubernetes in the feature list; three deployment
shapes table; quick-start per shape; updated project layout
(deploy/k8s, deploy/scripts, internal/k8s, internal/api/k8s);
Go 1.25+; doc index pointing at INSTALL/CONFIG/SECURITY/SPEC.
docs/INSTALL.md (rewritten, ~340 lines)
- Choose-a-shape comparison table with privilege model + which
tabs work in each.
- Path A bare-metal: prerequisites, install, first-run wizard,
update, uninstall, common pitfalls.
- Path B Docker fat-privileged: full mount/cap inventory, the
nsenter+setpriv+su login flow explained step-by-step, common
pitfalls (terminal misbehaviors, kubeconfig issues).
- Path C in-cluster Pod: prerequisites, image-loading on
multi-node K3s, ClusterRole scope (current + Phase C plans),
three access patterns (Ingress / port-forward / NodePort),
common pitfalls (ImagePullBackOff, missing pods/log RBAC).
- TLS modes section; Kubernetes integration section covering
the four kubeconfig discovery paths.
- Post-install checklist.
docs/CONFIG.md (new, ~180 lines)
- Every CR_* env var with default + purpose, grouped by Core,
TLS, Integrations, Container-only, Compose helpers.
- Per-tab capability matrix (which tab works in which shape and
what mount/cap each one needs).
- Per-shape file paths table.
- Capability detection / debugging walkthrough.
- Compose env-file pattern.
docs/SECURITY.md (rewritten, ~180 lines)
- Bumped to v0.2.
- Privilege scoping section split into three subsections
(bare-metal / fat-privileged container / in-cluster Pod) with
explicit compromise-impact statements.
- Audit section calls out PAM/auth.log + apt history.log as the
real audit trail for the fat-privileged container's host ops.
- Public-bind detection updated to mention RFC 6598 / Tailscale
CGNAT awareness (was listed as a gap; now shipped).
- Known gaps refreshed: K8s Phase C/D, cert-manager TLS for the
Pod shape, etc.
Adds the four standard cluster operations to the SPA, with audit and
RBAC scoped to exactly what's needed.
Backend (internal/k8s/actions.go, internal/api/k8s/k8s.go)
- RestartWorkload(ns, kind, name): merge-patches the pod template
annotations with kubectl.kubernetes.io/restartedAt = now.
Same trick `kubectl rollout restart` uses. Works for Deployment /
StatefulSet / DaemonSet via case-insensitive kind dispatch.
- ScaleWorkload(ns, kind, name, replicas): GetScale → mutate
Spec.Replicas → UpdateScale, preserving resourceVersion so the
optimistic-concurrency check in the API server actually fires.
Deployment + StatefulSet only; DaemonSet returns 400.
- DeletePod(ns, name, gracePeriod, force): Delete with optional
grace-period override. force=true hard-overrides ?grace=N to 0.
- CordonNode(name, cordoned): merge-patches spec.unschedulable.
Distinct audit actions for cordon vs uncordon so the trail
captures intent.
- Validation: every :namespace, :name, :kind goes through the
existing DNS-1123 regex + length guard. Replicas bounds-checked
0..1000 server-side.
- Audit: every action writes a store.AuditEntry with
target=ns/name (or just name for cluster-scoped Node), detail
map carrying the relevant params. Outcome=failure includes the
error message. Best-effort write, never fails the request.
- DB threaded into k8sapi.Deps; router updated.
RBAC widening (deploy/k8s/rbac.yaml)
- apps/{deployments,statefulsets,daemonsets}: patch (restart)
- apps/{deployments,statefulsets}/scale: update (scale)
- core/pods: delete (rotate)
- core/nodes: patch (cordon)
- Still NO secrets, NO persistentvolumes, NO delete on apps/*
(would let the SPA wipe a workload outside GitOps).
Frontend (web/src/lib/k8s.ts + the three Detail drawers)
- useRestartWorkload, useScaleWorkload, useDeletePod, useCordonNode
— each useMutation invalidates the matching list + detail keys
on success. Pattern matches useContainerAction.
- WorkloadDetail: Restart button (always); Scale button with a
number input prefilled from ready.desired (Deployment+STS only).
- PodDetail: red Delete button with a "Force (skip graceful
shutdown)" checkbox; closes the drawer on success via a new
onClose prop wired up from routes/Kubernetes.tsx.
- NodeDetail: toggle button derives current cordon state from the
node.kubernetes.io/unschedulable taint (which `kubectl cordon`
applies), falling back to the SchedulingDisabled condition.
- Every destructive action goes through AlertDialog with the
impact spelled out. Pending state + inline error on failure.
Verified end-to-end:
- make image succeeds at 230MB.
- Image loaded to both K3s nodes.
- kubectl apply rbac.yaml; rollout restart succeeds.
- 4 new endpoints all return 401 (auth required, routes mounted).
- kubectl auth can-i {patch deployments, update deployments/scale,
delete pods, patch nodes} as the SA → all "yes".
- SPA bundle contains all four action mutation hooks.
Adds "shell into a container" to the Kubernetes tab. Equivalent to
`kubectl exec -it <pod> -c <container> -- /bin/sh`, but in the browser.
Backend (internal/k8s/exec.go + internal/api/k8s/k8s.go)
- Client.PodExec uses k8s.io/client-go/tools/remotecommand to open a
SPDY exec stream to the API server. Stashes the rest.Config on the
Client struct (NewSPDYExecutor needs it; the existing typed
clientset wasn't enough).
- WS handler at /ws/k8s/pods/:namespace/:name/exec wires:
WS binary in → io.Pipe → SPDY stdin
SPDY stdout → wsExecWriter → WS binary out
SPDY stderr → wsExecWriter → WS binary out (merged; xterm
doesn't distinguish)
WS text JSON → resize → wsSizeQueue → SPDY resize stream
- Wire format mirrors /ws/terminal exactly so the frontend can clone
Terminal.tsx's xterm wiring with minimal changes:
init: { rows, cols, container, command? } (default command ["/bin/sh"])
err: { type: "error", err: "..." }
- Lifecycle correctness: cancelling ctx unblocks StreamWithContext,
closing stdinW unblocks any blocked SPDY stdin read, closing
sq.ch causes wsSizeQueue.Next to return nil so the SPDY size
tracker exits cleanly. No goroutine leaks.
- Audit: k8s.pod.exec.start at session open, k8s.pod.exec.end on
close with bytes_in/bytes_out/duration_ms — same shape as
terminal.session_end so a single audit query catches both.
- Idle timeout 30 min, matching the terminal handler.
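The size-queue shutdown behavior described above can be modeled without client-go; `termSize` stands in for remotecommand.TerminalSize so the sketch compiles on its own:

```go
package main

import "fmt"

// termSize stands in for remotecommand.TerminalSize.
type termSize struct{ Width, Height uint16 }

// wsSizeQueue mirrors the handler's resize plumbing: resize frames
// from the WebSocket are pushed into ch, and the SPDY size tracker
// calls Next in a loop. Closing ch makes the receive return ok=false,
// Next returns nil, and the tracker goroutine exits cleanly; that is
// the no-goroutine-leak property noted above.
type wsSizeQueue struct{ ch chan termSize }

func (q *wsSizeQueue) Next() *termSize {
	size, ok := <-q.ch
	if !ok {
		return nil // channel closed: tell the tracker to stop
	}
	return &size
}

func main() {
	q := &wsSizeQueue{ch: make(chan termSize, 1)}
	q.ch <- termSize{Width: 120, Height: 40}
	fmt.Println(*q.Next()) // {120 40}
	close(q.ch)
	fmt.Println(q.Next()) // <nil>
}
```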
RBAC (deploy/k8s/rbac.yaml)
- core/pods/exec: [create]. Minimum verb needed for the K8s API
server to accept the SubjectAccessReview that the SPDY exec call
triggers. Nothing else widened in this commit.
Frontend (web/src/components/k8s/PodExecModal.tsx + PodDetail.tsx)
- Sheet-based modal: container picker, command picker (sh / bash /
custom), Reconnect, status pill — same xterm theme + addons as
routes/Terminal.tsx.
- PodDetail Containers table gets an Exec column. Click → modal
opens with that container preselected. e.stopPropagation on the
button so it doesn't fight the row's log-viewer container picker.
- Reconnect generation counter bumps on container or command change,
forcing a clean WS rebuild with the new init frame.
- podExecURL helper added to web/src/lib/k8s.ts.
Dependency churn
- go mod tidy pulled in the SPDY transitive set:
github.com/gorilla/websocket
github.com/moby/spdystream
github.com/mxk/go-flowrate
All transitive of k8s.io/client-go/tools/remotecommand. No new
direct dependencies.
Verified end-to-end:
- make image succeeds.
- Image loaded into both K3s nodes.
- kubectl apply rbac.yaml; rollout restart succeeds.
- kubectl auth can-i create --subresource=exec pods → yes (SA's
perms via can-i --list also confirm pods/exec [create]).
- WS upgrade attempt returns 401 unauthenticated (route mounted,
auth gate firing).
- SPA bundle contains the new PodExecModal + podExecURL strings.
Adds a ConfigMaps tab to the Kubernetes route, with full list / view /
edit. Uses a simple key-value editor for now (no Monaco — that lands
with D4's manifest editor for the full YAML edit experience).
Backend (internal/k8s/configmaps.go + internal/api/k8s/k8s.go)
- GET /api/k8s/configmaps?namespace=
→ { configmaps: [{name, namespace, keys[], age}] }
List omits data values to keep responses small; only metadata
and the lex-sorted key list.
- GET /api/k8s/configmaps/:ns/:name
→ ConfigMapDetail (configmap + labels + annotations + data
+ binary_keys + events).
binary_keys lists names of binaryData entries; the bytes
themselves aren't returned (avoids accidentally exposing
non-text secrets-in-configmaps).
- PUT /api/k8s/configmaps/:ns/:name
body: { data: Record<string,string> }
resp: {ok: true} / 400 (invalid key) / 404 / 409 (conflict)
/ 413 (over 1 MiB).
Uses Update (not Patch) so the API server's
resourceVersion-based optimistic concurrency check fires and
we get 409 instead of silent overwrite when two operators
edit at once.
- Validation: key matches `^[a-zA-Z0-9._-]+$`, total `len(k)+len(v)`
under 1 MiB. Audit row per update with key_count + size_bytes.
metadata.name/namespace are URL-canonical and not editable;
labels/annotations are read-only here (D4 will handle them
via manifest edit).
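The two server-side validation checks might be sketched as follows (helper name and error texts illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

const maxConfigMapBytes = 1 << 20 // the 1 MiB cap described above

var cmKey = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

// validateConfigMapData applies both checks before the Update call:
// every key must match the ConfigMap key charset, and the total
// len(k)+len(v) payload must stay under 1 MiB.
func validateConfigMapData(data map[string]string) error {
	total := 0
	for k, v := range data {
		if !cmKey.MatchString(k) {
			return fmt.Errorf("invalid key %q", k) // → 400
		}
		total += len(k) + len(v)
	}
	if total > maxConfigMapBytes {
		return fmt.Errorf("payload %d bytes exceeds 1 MiB", total) // → 413
	}
	return nil
}

func main() {
	fmt.Println(validateConfigMapData(map[string]string{"app.yaml": "replicas: 2"})) // <nil>
	fmt.Println(validateConfigMapData(map[string]string{"bad key": "x"}))            // invalid key "bad key"
}
```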
Frontend (lib/k8s.ts + components/k8s/ConfigMap{List,Detail}.tsx)
- Three new TS interfaces matching the backend contract; three
hooks (useK8sConfigMaps, useK8sConfigMapDetail,
useUpdateConfigMap) following the Phase B/C invalidation pattern.
- ConfigMapList: table with Name / Namespace / Keys-count / Age,
namespace picker driven from the existing parent state.
- ConfigMapDetail: Sheet drawer with key/value table editor
(`<Input>` for keys, multi-line `<textarea>` for values),
add/remove rows, Save with AlertDialog confirmation showing
key count + total bytes via formatBytes. Dirty detection via
JSON.stringify of sorted entries against the original snapshot.
- 409 conflict UX: dedicated inline Alert with a Reload button
that refetches detail and discards local edits.
- Read-only sections for labels (chips), annotations (collapsible),
binary_keys ("Edit via kubectl; not editable here"), events
(reuses existing EventsTable).
RBAC (deploy/k8s/rbac.yaml)
- core/configmaps: [update]. get/list/watch were already in scope
from Phase A. No widening for binary data — read still returns
only key names.
Verified end-to-end:
- make image succeeds.
- Image loaded into both K3s nodes; rollout restart clean.
- Docker compose container recreated with the new image.
- kubectl auth can-i update/list configmaps as the SA → both yes.
- All 3 endpoints return 401 (mounted, auth-gated).
- SPA bundle contains ConfigMapList / ConfigMapDetail / hooks.
CI has been failing at workflow startup since before v0.2 (the m9
push to main also failed). Three real issues plus a couple of polish
items:
1. Go version drift. ci.yml pinned Go 1.23 while go.mod requires
go 1.25.0 (k8s.io/client-go pulled in via Phase A). setup-go would
install 1.23 and `go mod download` would fail. Switched to
`go-version-file: go.mod` so the toolchain follows go.mod and
can't drift again.
2. Wrong embed-stub path. internal/web/web.go does
//go:embed all:dist
from its own package directory, so the embed needs
internal/web/dist/index.html — not web/dist/index.html. The old
stub step created the wrong path; the Go compile would fail with
"no matching files for embed". Fixed.
3. Trigger only ran on `push: [main]`, never on dev branches. Added
"v*" so v0.2 (and future v0.3, v0.4) get CI feedback before merge.
Polish:
- Added a `Vet` step (cheap, fast, often catches things lint
misses).
- Scoped the test run to ./internal/... ./cmd/... — ./... would
descend into web/node_modules/flatted/golang/pkg/flatted (an
npm dep that ships an embedded .go file) and fail to compile.
- npm cache + cache-dependency-path on the web job for faster
reruns.
- Lint kept on `latest` with a comment explaining we'll pin if
it ever flakes from a release upgrade.
The Docker image job is unchanged. It builds the same fat-privileged
runtime image we deploy from. Image size is ~230MB now (vs ~25MB
when ci.yml was first written) — the existing "Inspect image size"
step just reports, no enforcement, so no further change needed.
That step's `run:` scalar contained `image size: %.1f` — YAML plain
scalars can't contain colon-space, which made the parser see the
`%.1f` as the value of an implicit mapping with key `image size`. The
result was the entire workflow failing at startup with "this run
likely failed because of a workflow file issue" — zero jobs running,
zero seconds duration. Same bug existed on v0.1; main's m9 push hit
it too. Switched to `run: |` block scalar so the awk format string is
opaque to YAML.
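The shape of the fix, reconstructed; the exact command in the step is not shown here, so the docker/awk line is illustrative:

```yaml
# Before (broken): plain scalar. YAML parses "image size: " as an
# implicit mapping key, and the whole workflow file fails to load.
#   run: docker image inspect controlroom:dev --format '{{.Size}}' | awk '{printf "image size: %.1f MB\n", $1/1048576}'
#
# After: a literal block scalar keeps the line opaque to YAML.
- name: Inspect image size
  run: |
    docker image inspect controlroom:dev --format '{{.Size}}' \
      | awk '{printf "image size: %.1f MB\n", $1/1048576}'
```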
Two fixes for the first real-execution CI run:
1. web/package-lock.json wasn't tracked, so the runner couldn't
resolve `cache-dependency-path`, the cache step errored, and
`npm ci` would have fallen through to `npm install` anyway —
neither reproducible nor cached. Tracked the lock file (188K,
one-time inflation; future updates will be small diffs).
2. golangci-lint v1.64.8 (which the v6 action's `latest` resolves
to) is built with Go 1.24 and refuses to lint code targeting
Go 1.25 ("the Go language version used to build golangci-lint
is lower than the targeted Go version"). Bumped to action @v8
pinned at v2.5.0 (built with Go 1.25). v2.x has a different
config format but we have no .golangci.yml, so default linters
run unchanged.
golangci-lint v2 doesn't run any linters without an explicit config
(v1 silently ran the standard set). Added a minimal v2 config that
enables the same set v1 had on by default — errcheck, govet,
ineffassign, staticcheck, unused — plus targeted exclusion rules
for idiomatic patterns we don't want to flag:
- `defer x.Close()` — the canonical Go idiom; close-on-defer
failures aren't actionable.
- `.Set(Read|Write)Deadline(...)` on websocket conns — same idea;
a failed deadline just means the next op will error and we'll
exit cleanly.
- `_ = d.DB.WriteAudit(...)` — audits are best-effort by design.
- tests routinely ignore cleanup errors.
Two real findings fixed in source:
- internal/docker/fake.go:98 — staticcheck QF1008 flagged
`c.Container.State = state` as redundant once `c.State = state`
was already on the line above. Embedded field rewrites are
transparent here; dropped the duplicate.
- internal/api/auth/auth.go:191 — `type pendingEnrollments
struct{}` was declared with a TODO comment but never referenced.
Removed; the TODO now lives in MILESTONES.md territory rather
than dead code.
Local `golangci-lint run` against the same v2.5.0 the CI runs is
clean: 0 issues.
…b/dist)
The committed web-builder stage writes its bundle to /src/web/dist
(vite.config.ts has `outDir: 'dist'` resolved relative to web/), but
the go-builder COPY was reading from /src/internal/web/dist —
nothing there, build error:
failed to compute cache key: ... "/src/internal/web/dist": not found
`make image` worked locally because my working-tree Dockerfile had
the correct path, but I never committed that fix; the broken COPY
from a pre-v0.2 in-flight relocation attempt has been on origin/v0.2
since the fat-privileged container rewrite. Aligning the path so
the bundle from web-builder actually reaches go-builder for embed.
Embed target inside the runtime image is unchanged — internal/web/dist
relative to the Go module — only the source path of the COPY changed.
Adds a Secrets tab to the Kubernetes route. Read-only — no edit/create/
delete. Most-sensitive RBAC widening so far; the design choices reflect
that.
Backend (internal/k8s/secrets.go + internal/api/k8s/k8s.go)
- GET /api/k8s/secrets?namespace=
→ { secrets: [{name, namespace, type, keys[], age}] }
List omits values; only metadata + lex-sorted key list. Type
is the K8s SecretType (Opaque, kubernetes.io/tls,
dockerconfigjson, etc.).
- GET /api/k8s/secrets/:ns/:name
→ SecretDetail (secret + labels + annotations + data + events).
data values are base64-decoded to plaintext on the backend.
Non-UTF-8 bytes get U+FFFD substitution via standard JSON
encoder behavior — most secrets are text, this is acceptable.
- Per-detail audit: every detail GET (success or failure) writes a
store.AuditEntry with action `k8s.secret.read`, target `ns/name`,
detail `{"key_count": N}`. Key NAMES are not logged, values are
not logged. The audit table records THAT a secret was read, not
what was in it. List endpoint is intentionally NOT audited (no
values exposed there).
RBAC (deploy/k8s/rbac.yaml)
- core/secrets: [get, list]
- Deliberately NOT watch — keeps a long-lived stream of all cluster
secrets from being established. Reads are point-in-time and
audited; watch would not be.
- Comment explicitly explains the trade.
Frontend (web/src/components/k8s/Secret{List,Detail}.tsx + lib hooks)
- SecretList: table with Name/Namespace/Type-badge/Keys/Age. Type
badge strips the kubernetes.io/ prefix for compactness.
- SecretDetail: Sheet drawer with:
• muted informational banner: "Read-only. Values are sensitive
— every detail view writes an audit row." (informational, not
alarmist red.)
• Reveal All / Hide All toggle resets to OFF every time the
drawer opens — re-opening always re-masks.
• Per-row eye toggle + copy-to-clipboard button.
• Masking: 8 fixed bullets regardless of value length so the
DOM never leaks length information.
• JSON pretty-print: try/catch around JSON.parse + stringify;
falls back to raw text on parse failure.
• TLS-shaped values (start with -----BEGIN, contain newlines)
rendered in a monospace <pre>.
• Copy fallback: navigator.clipboard.writeText with a hidden
<textarea>+execCommand fallback for insecure contexts.
- useK8sSecretDetail uses staleTime: 60s and refetchOnWindowFocus:
false so re-opening doesn't immediately re-hit the API and pile
up audit rows.
Verified end-to-end:
- Local golangci-lint v2.5.0 clean (0 issues), tsc --noEmit clean.
- make image succeeds at 230MB.
- Image loaded into both K3s nodes; deployment rolled clean.
- Docker compose container recreated with the standard
fat-privileged flags.
- kubectl auth can-i: list secrets=yes, get secrets=yes,
watch secrets=no (intentional).
- Both endpoints return 401 (mounted, auth-gated).
- SPA bundle contains 24 references to SecretList/SecretDetail/
useK8sSecrets/secrets path.
Side change: deploy/docker-compose.yml `image:` flipped from
ghcr.io/tm4rtin17/controlroom:latest to controlroom:dev so
`docker compose up` matches the local-build workflow without
forcing a registry pull. Production deployments should pin a
specific tag.
Adds an "Edit YAML" button to the existing detail drawers for the
five editable resource kinds (Deployment / StatefulSet / DaemonSet
/ Service / ConfigMap). Click → Sheet with a Monaco editor; Dry-run
validates server-side; Apply writes back via PUT semantics.
Backend (internal/k8s/manifest.go + internal/api/k8s/k8s.go)
- GET /api/k8s/manifest?kind=&namespace=&name=
→ { yaml, resource_version, gvk }
The dynamic client fetches the resource as unstructured;
we strip metadata.{managedFields,creationTimestamp,uid,
resourceVersion,generation} and the entire status before
marshalling so the editable surface is just spec + labels +
annotations. resourceVersion is sent separately so the
apply round-trip can detect conflicts without leaking RV
into the editable buffer.
- POST /api/k8s/manifest
body: { yaml, resource_version, dry_run }
→ { ok, resource_version, warnings, dry_run }
dyn.Update with metav1.UpdateOptions{DryRun: [DryRunAll]}
when dry_run=true. PUT (Update) semantics, not server-side
Apply — kubectl-edit-equivalent. The API server rejects on
stale RV with k8serrors.IsConflict → 409.
- Cross-check enforced: the YAML's `kind`, `metadata.namespace`,
and `metadata.name` must match the URL path values; mismatch
returns 400 ErrManifestMismatch. Prevents an "edit foo, but
YAML targets bar" cross-resource attack.
- Editable kinds allowlist (deployment / statefulset / daemonset
/ service / configmap). Anything outside → 400. Pods are
excluded (immutable fields). Nodes are excluded (metadata-only
edits warrant a different flow). Secrets stay read-only per D3.
- Audit: k8s.manifest.dry_run / k8s.manifest.apply with
{kind, namespace, name, bytes, dry_run}. The YAML body itself
is never logged — only its size — so a future audit dump can't
replay sensitive data that flowed through the editor.
- Error mapping: 400 (parse / cross-check / not-editable kind),
404, 409 (IsConflict), 422 (IsInvalid — admission webhook
denial passes through verbatim), 500.
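The cross-check can be sketched as a pure function. The real handler decodes YAML into an unstructured object; this sketch works on a plain decoded map, and `checkTarget` is a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
)

var ErrManifestMismatch = errors.New("manifest kind/namespace/name does not match URL target")

// checkTarget rejects a manifest whose identity differs from the resource
// the operator opened, preventing "edit foo, but YAML targets bar".
func checkTarget(doc map[string]any, kind, namespace, name string) error {
	meta, _ := doc["metadata"].(map[string]any)
	if doc["kind"] != kind || meta == nil ||
		meta["namespace"] != namespace || meta["name"] != name {
		return ErrManifestMismatch
	}
	return nil
}

func main() {
	doc := map[string]any{
		"kind":     "Deployment",
		"metadata": map[string]any{"namespace": "default", "name": "web"},
	}
	fmt.Println(checkTarget(doc, "Deployment", "default", "web")) // <nil>
	fmt.Println(checkTarget(doc, "Deployment", "default", "api")) // mismatch error
}
```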
RBAC (deploy/k8s/rbac.yaml)
- apps/{deployments,statefulsets,daemonsets}: patch + update
(patch was already in scope from Phase C for restart; update
is new for full YAML edits).
- core/services: update (was get/list/watch only).
- core/configmaps: update was already in scope from D2.
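In rbac.yaml terms the widened rules would look roughly like this (verb grouping is illustrative; the committed file is authoritative):

```yaml
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch", "patch", "update"]  # update is new in D4
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "update"]           # update is new in D4
```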
Frontend (lib/k8s.ts + components/k8s/ManifestEditor.tsx)
- useK8sManifest: useQuery with refetchOnWindowFocus:false and
staleTime:Infinity so unsaved edits aren't clobbered by a
background refetch. Operator-driven Reload only.
- useApplyManifest: useMutation. On non-dry-run success, invalidates
workload/service/configmap detail + list queries.
- ManifestEditor.tsx (Sheet drawer):
• Monaco lazy-loaded via React.lazy → ~3MB chunk fetched only
when the Sheet first opens; main bundle stays at the size it
was before D4.
• Dark theme, YAML mode, no minimap, font-mono, wordWrap on,
scrollBeyondLastLine off.
• Header buttons: Reload (with dirty-state confirmation),
Dry-run, Apply (always confirms via AlertDialog).
• Apply disabled when YAML is unchanged.
• 409 conflict UX has its own dedicated Alert + inline Reload
button, separate from generic / 422-admission errors.
• resource_version shown muted in the header so operators know
which revision they're working against.
- WorkloadDetail / ServiceDetail / ConfigMapDetail each grew an
Edit YAML button alongside their existing actions. ConfigMap
keeps its structured key/value editor too — the YAML editor is
the alternative when you need to edit metadata/labels/annotations.
Dependency: web/package.json adds @monaco-editor/react ^4.6.0 (the
small wrapper; monaco-editor itself is its peer/transitive). Lock
file regenerated locally before this commit so CI's `npm ci` works
without falling back to `npm install`.
Verified end-to-end:
- Local golangci-lint v2.5.0 clean (0 issues), tsc --noEmit clean.
- make image succeeds at ~230MB (Monaco lives in lazy chunks under
/assets, not in the main runtime image bloat path).
- Image loaded into both K3s nodes; deployment rolled cleanly.
- kubectl auth can-i update {deployments|statefulsets|daemonsets|
services|configmaps} as the SA → all five yes.
- GET /api/k8s/manifest and POST /api/k8s/manifest both return
401 (mounted, auth-gated).
- SPA main bundle contains ManifestEditor + the manifest URL
helpers; Monaco itself is in a separate lazy chunk.
`docker compose up` namespaces named volumes with the project name — when run from deploy/, the volume becomes deploy_controlroom-data, not controlroom-data. That silently creates a fresh empty volume instead of reusing the host-level one, stranding the operator's admin user, JWT signing key, and TLS material on the other volume; the SPA then shows the first-run setup wizard as if this were a clean install.

Marking the volume external means compose refuses to start if the volume isn't already present (the operator runs `docker volume create controlroom-data` once on first install) AND uses that exact named volume thereafter. Switching between `docker run -v controlroom-data:/data` and `docker compose up` now lands on the same persistent state.
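The resulting compose stanza, sketched (service name and mount path assumed):

```yaml
services:
  controlroom:
    image: controlroom:dev
    volumes:
      - controlroom-data:/data

volumes:
  controlroom-data:
    external: true  # must already exist: docker volume create controlroom-data
```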
- Add CHANGELOG.md (Keep-a-Changelog) covering v0.1.0 → v0.2.0. The
  Unreleased section is reserved for post-merge work on main.
- README features table: rewrite the Kubernetes row to mention D1 exec,
  D2 ConfigMap edit, D3 Secret view, D4 manifest editor. Add a CHANGELOG
  link to the doc index.
- docs/INSTALL.md: rewrite the in-cluster ClusterRole table (Phase A → D
  verbs grouped by the phase that introduced them, with explicit
  exclusions); add the `docker volume create controlroom-data` step to
  the compose path now that the volume is `external: true`.
- docs/CONFIG.md: the per-tab capability matrix gains rows for the K8s
  sub-features (exec, lifecycle, configmap edit, secret view, manifest
  edit) so operators can see which RBAC verbs each one needs.
- docs/SECURITY.md: the in-cluster Pod section's RBAC table is updated
  for the full Phases A → D verb set; per-Secret-read and
  per-manifest-edit audit rows are called out; Known gaps refreshed (the
  K8s tab moved from "open" to "closed in v0.2"; multi-cluster and RBAC
  roles added as v0.3 candidates).
- Audit re-run before this commit: zero secret-shaped strings introduced
  across the 24 commits in v0.2; runtime data still under /var/lib/*
  outside the repo; .claude/ still ignored.
Summary
Promotes 25 commits from `v0.2` to `main` for the v0.2.0 release. Three
big themes: … The release CI workflow (… polish) failed at 0s; fixed in
five commits: Go version drift, invalid YAML in the awk-format step,
missing package-lock.json + an outdated golangci-lint, a missing v2
config, and the Dockerfile/vite path mismatch. Plus: the terminal PAM
login flow, Tailscale CGNAT awareness in the public-bind banner,
capability-gated nav (hides tabs whose backend isn't reachable), two
nil-slice JSON fixes that were blanking the Containers and Network tabs,
and a docker-compose external-volume fix so admin accounts don't get
stranded across `docker run` ↔ `docker compose up` switches. Full
per-section detail in CHANGELOG.md.

Verification
Runtime data stays under /var/lib/controlroom/ and the Docker named
volume, both outside the repo; .claude/ (local agent definitions) is
ignored.

Migration notes for main
- Run `docker volume create controlroom-data` once before
  `docker compose up`. The compose volume is now `external: true` to
  avoid the `docker run` ↔ compose namespacing trap.
- Re-apply deploy/k8s/rbac.yaml to pick up the widened verbs
  (pods/exec: create, pods: delete, nodes: patch, apps/*: patch,update,
  apps/*/scale: update, services/configmaps: update, secrets: get,list).

Test plan
- `make image && docker volume create controlroom-data && docker compose
  -f deploy/docker-compose.yml up -d`; confirm the K8s tab shows the
  full feature set.
- Review CHANGELOG.md § 0.2.0 for accuracy.
- Review docs/SECURITY.md § "Privilege scoping by deployment shape" —
  make sure the three-shape compromise-impact statements match what we
  want to publish.
- `kubectl apply -f deploy/k8s/{namespace,rbac,pvc,deployment,service}.yaml`
  against a test cluster to verify the in-cluster shape still rolls
  cleanly.

🤖 Generated with Claude Code