fix(consts): host-derive ContainerUID, structured CP host-id diagnostic #291
Merged
Pull request overview
This PR fixes Linux ~/.claude/projects bind-mount write failures by deriving the agent container’s UID/GID from the host invoker, and by plumbing the host UID/GID into the CP so CP-driven init stages drop privileges consistently. It also removes now-dead UID-mismatch warnings and replaces earlier stderr init-time diagnostics with structured CP logging and targeted tests.
Changes:
- Derive `ContainerUID()`/`ContainerGID()` from `os.Getuid()`/`os.Getgid()` (with fallback) and add CP-side `HostUID()`/`HostGID()` sourced from `CLAWKER_HOST_UID/GID`.
- Wire host UID/GID env into the CP container and update CP-driven `userStage` to drop to `HostUID`/`HostGID`.
- Remove workspace mount warnings plumbing; add unit/E2E coverage and structured CP diagnostic logging.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/e2e/bind_mount_uid_test.go | Adds a Linux-only E2E test asserting bind-mount writes land with host UID. |
| internal/workspace/setup.go | Removes UID-mismatch warning path and associated warning accumulation. |
| internal/workspace/setup_test.go | Updates tests to reflect removal of SetupMountsResult.Warnings. |
| internal/workspace/CLAUDE.md | Updates workspace failure-handling docs for new UID/GID contract. |
| internal/controlplane/cpboot/cp_container.go | Plumbs CLAWKER_HOST_UID/GID into CP container env. |
| internal/controlplane/cpboot/container_config_test.go | Adds tests ensuring host UID/GID env is emitted for CP container. |
| internal/controlplane/agent/init.go | Switches userStage UID/GID to consts.HostUID()/HostGID(). |
| internal/controlplane/agent/init_test.go | Updates test comment to refer to Host* values. |
| internal/consts/host_user_test.go | Adds table test covering all resolveHostID branches. |
| internal/consts/controlplane.go | Introduces EnvHostUID/GID, HostUID/GID accessors, and resolution metadata. |
| internal/consts/consts.go | Replaces fixed container UID/GID constants with host-derived accessors + fallback. |
| internal/config/consts.go | Updates deprecated config accessors to call consts.ContainerUID()/GID(). |
| internal/config/config_test.go | Removes hardcoded UID/GID assertions from constant accessor test. |
| internal/config/CLAUDE.md | Updates docs to reflect host-derived container UID/GID semantics. |
| internal/cmd/container/shared/container_create.go | Removes surfacing of workspace mount warnings (field deleted). |
| internal/bundler/CLAUDE.md | Documents that host UID affects the rendered Dockerfile and build cache key. |
| cmd/clawker-cp/main.go | Adds structured CP warning on missing/invalid host UID/GID env at boot. |
| cmd/clawker-cp/main_test.go | Adds JSON-structural test for logHostIdentity event output. |
| .serena/memories/fix_bind_mount_uid_postmortem.md | Adds postmortem / implementation notes. |
PR #269's ~/.claude/projects bind mount silently failed to persist auto-memory + session jsonls when the host UID was not 1001: the agent image's claude user was hardcoded at UID 1001 (Dockerfile.tmpl useradd), so bind-mount writes from inside the container hit EACCES on virtually every Linux host (typical UID is 1000). macOS unaffected (virtiofs translates).

Two scoped accessors in internal/consts/, distinguished by where they read:
- consts.ContainerUID / ContainerGID: refactored from const 1001 to vars initialized from os.Getuid() / os.Getgid(). Rejects 0 (sudo invocation) and -1 (Windows); falls back to fallbackContainerUID (1001). CLI-side. All existing callers (bundler tar, containerfs tar, docker volume copy, deprecated cfg delegates) transparently get the host UID with zero call-site changes.
- consts.HostUID / HostGID: new env-fed vars in internal/consts/controlplane.go. Read CLAWKER_HOST_UID / CLAWKER_HOST_GID via resolveHostUID with the same > 0 guard; fallback pinned to fallbackContainerUID (NOT ContainerUID, which inside the CP container resolves to 0: CP runs as root for BPF caps, so a missing env would have silently dropped userStage to root). Invalid env emits event=host_uid_invalid on stderr so operators get a signal in docker logs <cp>.

The CLI populates the env vars on the CP container at boot via BuildCPContainerConfig (cpboot/cp_container.go), using consts.ContainerUID / ContainerGID (host invoker's UID in CLI process). The single CP-side caller, userStage in internal/controlplane/agent/init.go, switches to HostUID / HostGID so post-init shell stages drop to the same UID baked into the agent image.

The workspace UID-mismatch warning at setup.go is deleted (now dead by construction).

Image content hash is host-UID-specific via the existing Dockerfile rendering, so multi-user hosts naturally get separate cached images; this is documented in internal/bundler/CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ad Warnings field
Follow-up to the host-derive ContainerUID commit on this branch.
Addresses PR-review findings without changing behavior:
- consts.{Container,Host}{UID,GID} converted from exported mutable
vars to func() int accessors backed by unexported package-private
state. Removes the stomp-able exported-mutable-var pattern; all
callers updated.
- resolveHostID is now side-effect-free, returning (int,
HostIDResolution). cmd/clawker-cp/main.go adds logHostIdentity
which emits event=host_uid_unavailable via the project zerolog
surface (rotating CP logfile) when EnvHostUID/GID came through
unset or invalid. The earlier package-init fmt.Fprintf to stderr
is removed.
- internal/workspace setup.go: SetupMountsResult.Warnings field +
the dead mountWarnings slice are removed; iterating caller and
warnings-empty asserts go with them.
- TestConstantAccessors no longer hardcodes 1001; UID/GID asserts
dropped as circular.
- TestCPContainer_HostUIDGIDEnv_Emitted anchored on os.Getuid()
with the same fallback rule.
- TestLogHostIdentity parses zerolog JSON and asserts level/event/
env/reason/fallback structurally, not substring grep.
- test/e2e/bind_mount_uid_test.go: new Linux-only e2e exercising
the full chain end-to-end (skips on darwin — virtiofs would
false-pass).
- Misleading init-ordering comment on HostUID deleted.
- Comments trimmed across consts, controlplane, cp_container,
agent/init — drop WHAT, keep WHY.
- Postmortem memory: file:line refs scrubbed; follow-ups 2 + 3
marked DONE.
- CLAUDE.md docs updated to the func-call form.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
resolveHostID now parses CLAWKER_HOST_UID/GID via ParseUint(_,10,32) so over-uint32 inputs become a structured "malformed" fallback instead of silently wrapping when userStage casts to uint32 (closes CodeQL #285, #286). HostUID()/HostGID() and HostIDResolution.Value return uint32, so the PipeStage assignment is a total identity at the call site.

Diagnostic event renamed host_uid_unavailable -> host_id_unavailable; the env field on the record (CLAWKER_HOST_UID or CLAWKER_HOST_GID) already disambiguates UID vs GID for the operator.

Also collapses the duplicated logHostIdentity doc paragraph and qualifies the bare HostUID() reference in workspace/setup.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host-UID/GID derivation is only meaningful on Linux hosts, where the
container's numeric IDs land directly on bind-mount files. Docker
Desktop on macOS (virtiofs / gRPC FUSE) masks UID/GID at the share
boundary — any container UID is presented on the host as the host
user, so baking the host UID into the image gains nothing and forces
groupadd to claim a GID that often collides with a base-image group
(macOS GID 20 = staff vs Debian dialout). resolveProcessID now
returns fallbackContainerUID/GID on non-Linux hosts, restoring the
pre-fix 1001 path on macOS where virtiofs already handles ownership
translation.
The Dockerfile useradd block is also made idempotent so that an
edge-case Linux host whose UID/GID happens to overlap a base-image
group (low GIDs assigned to system groups) still builds: groupadd at
the host GID falls back to an auto-assigned GID for ${USERNAME},
preserving UPG. The bind-mount writability contract holds on UID
match alone — GID match is best-effort.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
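A rough sketch of the Linux-only gate and the accessor shape described in these commits. The signature of `resolveProcessID` is an assumption: here the OS name and raw ID are passed in so the branches can be exercised on any host, whereas the real function may read `runtime.GOOS` and `os.Getuid()` directly.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

const fallbackContainerUID = 1001

// resolveProcessID returns id only on Linux and only when it is positive.
// Non-Linux hosts, sudo (0), and Windows (-1) all get the fallback.
func resolveProcessID(goos string, id, fallback int) int {
	if goos != "linux" {
		return fallback
	}
	if id <= 0 {
		return fallback
	}
	return id
}

// containerUID is unexported package state resolved once at init;
// callers can read it through the accessor but cannot overwrite it.
var containerUID = resolveProcessID(runtime.GOOS, os.Getuid(), fallbackContainerUID)

// ContainerUID is the func-style accessor form described in the PR.
func ContainerUID() int { return containerUID }

func main() {
	fmt.Println(resolveProcessID("linux", 1000, fallbackContainerUID))  // 1000
	fmt.Println(resolveProcessID("darwin", 501, fallbackContainerUID))  // 1001
	fmt.Println(resolveProcessID("linux", 0, fallbackContainerUID))     // 1001
	fmt.Println(resolveProcessID("windows", -1, fallbackContainerUID)) // 1001
	fmt.Println(ContainerUID() > 0)                                     // true
}
```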
Append Phase 9 (uid_t-typed accessors + neutral host_id_unavailable event, ce98cbe), Phase 10 (Linux-only resolveProcessID + idempotent groupadd, 1cc857c), and Phase 11 (soft-close hand-off). Top-of-file status banner marks the work complete pending alpha-release verification on a Linux host — the Linux-only TestBindMountUID_E2E and the Phase 7 host-side manual sequence are the only gates remaining before hard-close. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestCPContainer_HostUIDGIDEnv_Emitted computed wantUID/wantGID directly from os.Getuid()/os.Getgid() with only a `<= 0` fallback, diverging from consts.resolveProcessID's `runtime.GOOS != "linux" => 1001` short-circuit added in 1cc857c. On macOS where os.Getuid() is typically 501, the test would assert CLAWKER_HOST_UID=501 while the production resolver emits 1001, a hard fail on every macOS run. Mirror the production fallback exactly: branch on runtime.GOOS, then apply the `> 0` Linux refinement.

internal/config/CLAUDE.md and internal/bundler/CLAUDE.md both described ContainerUID/GID as Linux-style host-derived without qualifying the Linux-only gate. Updated wording to call out the non-Linux unconditional-fallback path with the virtiofs / groupadd collision rationale so a future reader doesn't try to "fix" the gate as if it were a bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… 0.10.0

clawker-support
- known-issues.md: removed the "~/.claude/projects bind mount Linux UID mismatch" entry; fix landed (host-derived ContainerUID on Linux via 1cc857c). Workaround block obsolete.
- monitoring.md (new): OTel + OpenSearch + Prometheus stack overview, Clawker analytics workspace navigation, telemetry env var contract (build-time bake vs runtime enable), troubleshooting for empty indices / failed bootstrap / Prometheus targets / cross-container diagnostics. Routed from SKILL.md step 12 and troubleshooting.md.
- SKILL.md / troubleshooting.md: route monitoring questions to the new reference; old "punt to docs.clawker.dev" step kept for other features (worktrees, etc.).
- plugin.json: 0.9.1 -> 0.10.0 (minor: new reference file, user-visible structural change, plus the cd84d2e / 1cc857c Dockerfile.tmpl idempotent groupadd block from the UID branch).

docs.clawker.dev (driveby from fix/uid PR)
- container-internals.mdx: drop literal "UID 1001" from the claude user description; the Privilege Model table now says "claude (host-derived on Linux, 1001 elsewhere)". The literal was wrong on every non-1001 Linux host post-fix.
- threat-model.mdx, security.mdx: same hedge (describe the user as unprivileged + UID baked at build time with Linux-host-derived semantics, instead of asserting a stale literal value).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveHostID reads env via os.Getenv, which returns "" for both unset and set-to-"". The "unset" and "empty" cases exercised the same code path; "unset" used os.Unsetenv which doesn't restore a prior value at test end — a theoretical cross-test leak if the probe env were set in the parent shell. Drop the "unset" case + envSet branching; all cases now drive the env through t.Setenv, which restores the prior value via t.Cleanup automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +320 to 321
RUN (groupadd --gid {{.GID}} ${USERNAME} 2>/dev/null || groupadd ${USERNAME}) \
    && useradd --uid {{.UID}} --gid ${USERNAME} --shell {{.Shell}} --create-home ${USERNAME}
Summary
- `~/.claude/projects` bind-mount EACCES on Linux hosts where the invoker's UID isn't 1001. Pre-fix the agent image's `claude` user was hardcoded at 1001; bind writes from inside the container hit EACCES on virtually every Linux host (typical UID is 1000). macOS / Docker Desktop unaffected: virtiofs masks UID/GID at the share boundary.
- In `internal/consts/`:
  - `consts.ContainerUID()` / `ContainerGID()` resolve from the CLI invoker's `os.Getuid()` / `Getgid()`. Linux-only: `resolveProcessID` short-circuits to `fallbackContainerUID`/`GID` (1001) on non-Linux hosts, restoring the pre-fix 1001 path on macOS where virtiofs already handles ownership translation and where a low host GID (macOS staff = 20) would otherwise collide with a base-image group (Debian dialout = 20). Sudo (`Getuid() == 0`) and Windows (-1) also fall back to 1001.
  - `consts.HostUID()` / `HostGID()` are env-fed (`CLAWKER_HOST_UID` / `CLAWKER_HOST_GID`) from the CLI at CP-container boot (`BuildCPContainerConfig`). Return type is `uint32` (uid_t) so the `clawkerdv1.PipeStage` Uid/Gid assignment in `userStage` is a total identity, with no narrowing cast at the call site. `resolveHostID` parses via `strconv.ParseUint(_, 10, 32)` so out-of-uint32-range env values become a structured `Reason: "malformed"` fallback instead of silently wrapping at a downstream `uint32` cast (closes CodeQL #285 / #286). Zero is still rejected (`non_positive`) to prevent a sudo'd CLI from propagating root into `userStage`.
- `groupadd` / `addgroup` is now idempotent: when the host GID collides with a base-image group (low GIDs are reserved for system groups), it falls back to an auto-assigned GID for `${USERNAME}` while still creating the user. UPG is preserved either way because `useradd --gid ${USERNAME}` binds by group name. The bind-mount writability contract holds on UID match alone; GID match is best-effort. Applied to both `internal/bundler/assets/Dockerfile.tmpl` and the clawker-support reference copy.
- `func() <typ>` accessors backed by unexported package-private state, which closes the stomp-able-exported-mutable-var foot-gun. `resolveHostID` is side-effect-free (returns `(uint32, HostIDResolution)`).
- `cmd/clawker-cp/main.go` `logHostIdentity` emits `event=host_id_unavailable` (renamed from `host_uid_unavailable`; the `env` field, `CLAWKER_HOST_UID` or `CLAWKER_HOST_GID`, already disambiguates UID vs GID) at warn via the project zerolog surface (rotating CP logfile) when env was unset / malformed / non-positive.
- Removed: the `SetupMountsResult.Warnings` field + the `mountWarnings` slice + the `wsResult.Warnings` iterating caller in `container_create.go` + the empty-warnings asserts in `setup_test.go`.

Test plan
- `go test ./internal/consts/... ./internal/config/... ./internal/workspace/... ./internal/controlplane/cpboot/... ./internal/controlplane/agent/... ./cmd/clawker-cp/...`: all pass
- `go vet ./...` clean
- `TestResolveHostID`: table covers happy / unset / empty / zero / negative (now `malformed` via ParseUint) / malformed / overflow (2^32); pins the uid_t-shape guard against silent wrap on the downstream `uint32` cast
- `TestCPContainer_HostUIDGIDEnv_Emitted`: anchored on `os.Getuid()` with the production fallback rule, so a hardcoded literal trips the test on non-1001 hosts
- `TestLogHostIdentity`: parses zerolog JSON output, asserts `level` / `event` (`host_id_unavailable`) / `env` / `reason` / `fallback` structurally (not substring grep)
- `TestConstantAccessors` no longer hardcodes 1001; previously would have failed on every non-1001 dev host post-fix
- `go test ./test/e2e/... -run TestBindMountUID_E2E -v`: Linux-only e2e that runs the full `clawker init`/`build`/`run`, exec-writes a probe file from inside the container to `~/.claude/projects`, and host-stats `st.Uid == os.Getuid()`. Skips on darwin (virtiofs would false-pass).

🤖 Generated with Claude Code
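To illustrate what "structurally, not substring grep" means for `TestLogHostIdentity`, here is a sketch that decodes one JSON log record and checks each field. The record below is hand-written for the example, not captured CP output, and the field types (in particular `fallback` as a number) are assumptions; only the field names and the `warn` level come from the description above. It uses `encoding/json` only, with no zerolog dependency.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hostIDEvent lists the fields the test plan says are asserted.
// The Go types chosen here are assumptions.
type hostIDEvent struct {
	Level    string `json:"level"`
	Event    string `json:"event"`
	Env      string `json:"env"`
	Reason   string `json:"reason"`
	Fallback uint32 `json:"fallback"`
}

// parseEvent decodes a single JSON log line into a typed record.
func parseEvent(line []byte) (hostIDEvent, error) {
	var ev hostIDEvent
	err := json.Unmarshal(line, &ev)
	return ev, err
}

func main() {
	// Hypothetical record shaped like the diagnostic described in the PR.
	line := []byte(`{"level":"warn","event":"host_id_unavailable","env":"CLAWKER_HOST_UID","reason":"malformed","fallback":1001}`)

	ev, err := parseEvent(line)
	if err != nil {
		fmt.Println("not valid JSON:", err)
		return
	}
	ok := ev.Level == "warn" &&
		ev.Event == "host_id_unavailable" &&
		ev.Env == "CLAWKER_HOST_UID" &&
		ev.Reason == "malformed" &&
		ev.Fallback == 1001
	fmt.Println("structural match:", ok)
}
```

A field-by-field comparison fails if a key is renamed or a value moves to a different field, which a substring search over the raw line would not catch.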