fix(consts): host-derive ContainerUID, structured CP host-id diagnostic #291

Merged
schmitthub merged 8 commits into main from fix/uid on May 19, 2026

Conversation

@schmitthub
Owner

@schmitthub schmitthub commented May 19, 2026

Summary

  • Fixes ~/.claude/projects bind-mount EACCES on Linux hosts where the invoker's UID isn't 1001. Pre-fix the agent image's claude user was hardcoded at 1001; bind writes from inside the container hit EACCES on virtually every Linux host (typical UID is 1000). macOS / Docker Desktop unaffected — virtiofs masks UID/GID at the share boundary.
  • Host-derives container UID via two scoped accessors in internal/consts/:
    • consts.ContainerUID() / ContainerGID() resolve from the CLI invoker's os.Getuid() / Getgid(). This is Linux-only: resolveProcessID short-circuits to fallbackContainerUID/GID (1001) on non-Linux hosts, restoring the pre-fix 1001 path on macOS, where virtiofs already handles ownership translation and where a low host GID (macOS staff = 20) would otherwise collide with a base-image group (Debian dialout = 20). Sudo (Getuid() == 0) and Windows (-1) also fall back to 1001.
    • consts.HostUID() / HostGID() are env-fed (CLAWKER_HOST_UID / CLAWKER_HOST_GID) from the CLI at CP-container boot (BuildCPContainerConfig). Return type is uint32 (uid_t) so the clawkerdv1.PipeStage Uid/Gid assignment in userStage is a total identity — no narrowing cast at the call site.
  • resolveHostID parses via strconv.ParseUint(_, 10, 32) so out-of-uint32-range env values become a structured Reason: "malformed" fallback instead of silently wrapping at a downstream uint32 cast (closes CodeQL #285 / #286). Zero is still rejected (non_positive) to prevent a sudo'd CLI from propagating root into userStage.
  • Dockerfile groupadd/addgroup is now idempotent — when the host GID collides with a base-image group (low GIDs are reserved for system groups), it falls back to an auto-assigned GID for ${USERNAME} while still creating the user. UPG is preserved either way because useradd --gid ${USERNAME} binds by group name. The bind-mount writability contract holds on UID match alone — GID match is best-effort. Applied to both internal/bundler/assets/Dockerfile.tmpl and the clawker-support reference copy.
  • Refactored from mutable exported vars to func() <typ> accessors backed by unexported package-private state — closes the stomp-able-exported-mutable-var foot-gun.
  • resolveHostID is side-effect-free (returns (uint32, HostIDResolution)). cmd/clawker-cp/main.go logHostIdentity emits event=host_id_unavailable (renamed from host_uid_unavailable; the env field — CLAWKER_HOST_UID or CLAWKER_HOST_GID — already disambiguates UID vs GID) at warn via the project zerolog surface (rotating CP logfile) when env was unset / malformed / non-positive.
  • Deletes the now-dead Linux UID-mismatch warning + SetupMountsResult.Warnings field + the mountWarnings slice + the wsResult.Warnings iterating caller in container_create.go + the empty-warnings asserts in setup_test.go.
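
The parsing rule described for resolveHostID can be sketched as follows. This is a minimal illustration, not the code from the PR: the `resolution` struct is a hypothetical stand-in for HostIDResolution, the function takes the raw string instead of reading the environment, and the "unset" reason string is an assumption (only "malformed" and "non_positive" are named above).

```go
package main

import (
	"fmt"
	"strconv"
)

// fallbackID mirrors the 1001 fallback named in the PR description.
const fallbackID uint32 = 1001

// resolution is a hypothetical stand-in for HostIDResolution; the real
// struct's fields are not shown on this page.
type resolution struct {
	Value  uint32
	Reason string // "" on success; "unset" is an assumed label
}

// resolveHostID is side-effect-free: it only classifies the raw value.
func resolveHostID(raw string) (uint32, resolution) {
	if raw == "" {
		return fallbackID, resolution{Value: fallbackID, Reason: "unset"}
	}
	// bitSize 32 makes ParseUint return a range error above math.MaxUint32,
	// and an unsigned parse rejects a leading minus sign as invalid syntax.
	n, err := strconv.ParseUint(raw, 10, 32)
	if err != nil {
		return fallbackID, resolution{Value: fallbackID, Reason: "malformed"}
	}
	if n == 0 {
		// Zero is root, so it is refused rather than propagated.
		return fallbackID, resolution{Value: fallbackID, Reason: "non_positive"}
	}
	return uint32(n), resolution{Value: uint32(n)}
}

func main() {
	for _, raw := range []string{"1000", "", "0", "-5", "abc", "4294967296"} {
		id, res := resolveHostID(raw)
		fmt.Printf("%q -> id=%d reason=%q\n", raw, id, res.Reason)
	}
}
```

Because the parse is already bounded to 32 bits, the final `uint32(n)` conversion cannot wrap, which is the property the summary relies on.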

Test plan

  • go test ./internal/consts/... ./internal/config/... ./internal/workspace/... ./internal/controlplane/cpboot/... ./internal/controlplane/agent/... ./cmd/clawker-cp/... — all pass
  • go vet ./... clean
  • TestResolveHostID — table covers happy / unset / empty / zero / negative (now malformed via ParseUint) / malformed / overflow (2^32) — pins the uid_t-shape guard against silent wrap on the downstream uint32 cast
  • TestCPContainer_HostUIDGIDEnv_Emitted — anchored on os.Getuid() with the production fallback rule, so a hardcoded literal trips the test on non-1001 hosts
  • TestLogHostIdentity — parses zerolog JSON output, asserts level/event (host_id_unavailable) / env/reason/fallback structurally (not substring grep)
  • TestConstantAccessors no longer hardcodes 1001 — previously would have failed on every non-1001 dev host post-fix
  • go test ./test/e2e/... -run TestBindMountUID_E2E -v — Linux-only e2e that runs the full clawker init/build/run, exec-writes a probe file from inside the container to ~/.claude/projects, and host-stats st.Uid == os.Getuid(). Skips on darwin (virtiofs would false-pass).
  • CI sign-off
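
The structural assertion described for TestLogHostIdentity might look roughly like the sketch below. It uses only encoding/json on a hand-written sample line; the JSON key names and the sample record are assumptions based on the field list above (level, event, env, reason, fallback), and no zerolog call is shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hostIDRecord lists the fields the test plan names. The JSON keys are
// assumptions; the real log schema is not shown on this page.
type hostIDRecord struct {
	Level    string `json:"level"`
	Event    string `json:"event"`
	Env      string `json:"env"`
	Reason   string `json:"reason"`
	Fallback uint32 `json:"fallback"`
}

// parseRecord decodes one JSON log line so a test can compare fields
// individually instead of grepping for substrings.
func parseRecord(line []byte) (hostIDRecord, error) {
	var rec hostIDRecord
	err := json.Unmarshal(line, &rec)
	return rec, err
}

func main() {
	// Hypothetical log line, written by hand for illustration.
	line := []byte(`{"level":"warn","event":"host_id_unavailable","env":"CLAWKER_HOST_UID","reason":"malformed","fallback":1001}`)
	rec, err := parseRecord(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```

Decoding into a struct means a renamed event or a missing field fails the comparison directly, whereas a substring match could pass on an unrelated line that happens to contain the same text.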

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 19, 2026 07:30
Comment thread internal/controlplane/agent/init.go Fixed
Comment thread internal/controlplane/agent/init.go Fixed
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes Linux ~/.claude/projects bind-mount write failures by deriving the agent container’s UID/GID from the host invoker, and by plumbing the host UID/GID into the CP so CP-driven init stages drop privileges consistently. It also removes now-dead UID-mismatch warnings and replaces earlier stderr init-time diagnostics with structured CP logging and targeted tests.

Changes:

  • Derive ContainerUID()/ContainerGID() from os.Getuid()/os.Getgid() (with fallback) and add CP-side HostUID()/HostGID() sourced from CLAWKER_HOST_UID/GID.
  • Wire host UID/GID env into the CP container and update CP-driven userStage to drop to HostUID/HostGID.
  • Remove workspace mount warnings plumbing; add unit/E2E coverage and structured CP diagnostic logging.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| test/e2e/bind_mount_uid_test.go | Adds a Linux-only E2E test asserting bind-mount writes land with host UID. |
| internal/workspace/setup.go | Removes UID-mismatch warning path and associated warning accumulation. |
| internal/workspace/setup_test.go | Updates tests to reflect removal of SetupMountsResult.Warnings. |
| internal/workspace/CLAUDE.md | Updates workspace failure-handling docs for new UID/GID contract. |
| internal/controlplane/cpboot/cp_container.go | Plumbs CLAWKER_HOST_UID/GID into CP container env. |
| internal/controlplane/cpboot/container_config_test.go | Adds tests ensuring host UID/GID env is emitted for CP container. |
| internal/controlplane/agent/init.go | Switches userStage UID/GID to consts.HostUID()/HostGID(). |
| internal/controlplane/agent/init_test.go | Updates test comment to refer to Host* values. |
| internal/consts/host_user_test.go | Adds table test covering all resolveHostID branches. |
| internal/consts/controlplane.go | Introduces EnvHostUID/GID, HostUID/GID accessors, and resolution metadata. |
| internal/consts/consts.go | Replaces fixed container UID/GID constants with host-derived accessors + fallback. |
| internal/config/consts.go | Updates deprecated config accessors to call consts.ContainerUID()/GID(). |
| internal/config/config_test.go | Removes hardcoded UID/GID assertions from constant accessor test. |
| internal/config/CLAUDE.md | Updates docs to reflect host-derived container UID/GID semantics. |
| internal/cmd/container/shared/container_create.go | Removes surfacing of workspace mount warnings (field deleted). |
| internal/bundler/CLAUDE.md | Documents that host UID affects the rendered Dockerfile and build cache key. |
| cmd/clawker-cp/main.go | Adds structured CP warning on missing/invalid host UID/GID env at boot. |
| cmd/clawker-cp/main_test.go | Adds JSON-structural test for logHostIdentity event output. |
| .serena/memories/fix_bind_mount_uid_postmortem.md | Adds postmortem / implementation notes. |

Comment thread internal/consts/controlplane.go Outdated
Comment thread cmd/clawker-cp/main.go Outdated
Comment thread cmd/clawker-cp/main.go Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread test/e2e/bind_mount_uid_test.go
Comment thread internal/workspace/setup.go Outdated
Comment thread cmd/clawker-cp/main.go Outdated
schmitthub and others added 3 commits May 19, 2026 12:31
PR #269's ~/.claude/projects bind mount silently failed to persist
auto-memory + session jsonls when the host UID was not 1001 — the
agent image's claude user was hardcoded at UID 1001 (Dockerfile.tmpl
useradd) so bind-mount writes from inside the container hit EACCES
on virtually every Linux host (typical UID is 1000). macOS unaffected
(virtiofs translates).

Two scoped accessors in internal/consts/, distinguished by where they
read:

- consts.ContainerUID / ContainerGID: refactored from const 1001 to
  vars initialized from os.Getuid() / os.Getgid(). Rejects 0 (sudo
  invocation) and -1 (Windows); falls back to fallbackContainerUID
  (1001). CLI-side. All existing callers (bundler tar, containerfs
  tar, docker volume copy, deprecated cfg delegates) transparently
  get the host UID with zero call-site changes.

- consts.HostUID / HostGID: new env-fed vars in
  internal/consts/controlplane.go. Read CLAWKER_HOST_UID /
  CLAWKER_HOST_GID via resolveHostUID with the same > 0 guard;
  fallback pinned to fallbackContainerUID (NOT ContainerUID, which
  inside the CP container resolves to 0 — CP runs as root for BPF
  caps, so a missing env would have silently dropped userStage to
  root). Invalid env emits event=host_uid_invalid on stderr so
  operators get a signal in docker logs <cp>.

The CLI populates the env vars on the CP container at boot via
BuildCPContainerConfig (cpboot/cp_container.go), using
consts.ContainerUID / ContainerGID (host invoker's UID in CLI
process). The single CP-side caller — userStage in
internal/controlplane/agent/init.go — switches to HostUID / HostGID
so post-init shell stages drop to the same UID baked into the agent
image.
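
As a rough illustration of the plumbing step, the CLI side only needs to render two KEY=VALUE entries for the CP container's environment. The helper name `hostIDEnv` and its signature are hypothetical; BuildCPContainerConfig itself is not shown on this page, and the only detail taken as given is the pair of variable names.

```go
package main

import (
	"fmt"
	"strconv"
)

// Variable names as given in the PR description.
const (
	envHostUID = "CLAWKER_HOST_UID"
	envHostGID = "CLAWKER_HOST_GID"
)

// hostIDEnv is a hypothetical helper: it renders the host IDs in the
// KEY=VALUE form that container environment lists use.
func hostIDEnv(uid, gid int) []string {
	return []string{
		envHostUID + "=" + strconv.Itoa(uid),
		envHostGID + "=" + strconv.Itoa(gid),
	}
}

func main() {
	// 1000/1000 is the typical first-user ID pair on a Linux host.
	fmt.Println(hostIDEnv(1000, 1000))
}
```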

The workspace UID-mismatch warning at setup.go is deleted (now dead
by construction). Image content hash is host-UID-specific via the
existing Dockerfile rendering, so multi-user hosts naturally get
separate cached images — documented in internal/bundler/CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ad Warnings field

Follow-up to the host-derive ContainerUID commit on this branch.
Addresses PR-review findings without changing behavior:

- consts.{Container,Host}{UID,GID} converted from exported mutable
  vars to func() int accessors backed by unexported package-private
  state. Removes the stomp-able exported-mutable-var pattern; all
  callers updated.
- resolveHostID is now side-effect-free, returning (int,
  HostIDResolution). cmd/clawker-cp/main.go adds logHostIdentity
  which emits event=host_uid_unavailable via the project zerolog
  surface (rotating CP logfile) when EnvHostUID/GID came through
  unset or invalid. The earlier package-init fmt.Fprintf to stderr
  is removed.
- internal/workspace setup.go: SetupMountsResult.Warnings field +
  the dead mountWarnings slice are removed; iterating caller and
  warnings-empty asserts go with them.
- TestConstantAccessors no longer hardcodes 1001; UID/GID asserts
  dropped as circular.
- TestCPContainer_HostUIDGIDEnv_Emitted anchored on os.Getuid()
  with the same fallback rule.
- TestLogHostIdentity parses zerolog JSON and asserts level/event/
  env/reason/fallback structurally, not substring grep.
- test/e2e/bind_mount_uid_test.go: new Linux-only e2e exercising
  the full chain end-to-end (skips on darwin — virtiofs would
  false-pass).
- Misleading init-ordering comment on HostUID deleted.
- Comments trimmed across consts, controlplane, cp_container,
  agent/init — drop WHAT, keep WHY.
- Postmortem memory: file:line refs scrubbed; follow-ups 2 + 3
  marked DONE.
- CLAUDE.md docs updated to the func-call form.
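
The accessor refactor in the first bullet follows a common Go pattern, sketched here under simplifying assumptions: `resolve` is a placeholder that returns the fallback constant, not the PR's host-derived logic.

```go
package main

import "fmt"

// resolve is a placeholder for the host-derived resolution; here it just
// returns the 1001 fallback so the example stays deterministic.
func resolve() int { return 1001 }

// containerUID is unexported, so code outside the package cannot reassign
// it the way it could an exported var.
var containerUID = resolve()

// ContainerUID is the read-only accessor that callers use instead.
func ContainerUID() int { return containerUID }

func main() {
	fmt.Println(ContainerUID())
}
```

With an exported `var ContainerUID int`, any importing package could write `consts.ContainerUID = 0`; the function form removes that write path without changing what callers read.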

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
resolveHostID now parses CLAWKER_HOST_UID/GID via ParseUint(_,10,32) so
over-uint32 inputs become a structured "malformed" fallback instead of
silently wrapping when userStage casts to uint32 (closes CodeQL #285,
#286). HostUID()/HostGID() and HostIDResolution.Value return uint32, so
the PipeStage assignment is a total identity at the call site.

Diagnostic event renamed host_uid_unavailable -> host_id_unavailable;
the env field on the record (CLAWKER_HOST_UID or CLAWKER_HOST_GID)
already disambiguates UID vs GID for the operator.

Also collapses the duplicated logHostIdentity doc paragraph and
qualifies the bare HostUID() reference in workspace/setup.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 19, 2026 19:31
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread cmd/clawker-cp/main.go
Comment thread internal/config/config_test.go
Host-UID/GID derivation is only meaningful on Linux hosts, where the
container's numeric IDs land directly on bind-mount files. Docker
Desktop on macOS (virtiofs / gRPC FUSE) masks UID/GID at the share
boundary — any container UID is presented on the host as the host
user, so baking the host UID into the image gains nothing and forces
groupadd to claim a GID that often collides with a base-image group
(macOS GID 20 = staff vs Debian dialout). resolveProcessID now
returns fallbackContainerUID/GID on non-Linux hosts, restoring the
pre-fix 1001 path on macOS where virtiofs already handles ownership
translation.
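
A minimal sketch of the Linux-only gate, assuming a signature that takes the OS name and raw ID as parameters so both branches can be exercised on any machine (the real resolveProcessID signature is not shown on this page):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// fallbackContainerUID is the pre-fix value the commit message refers to.
const fallbackContainerUID = 1001

// resolveProcessID returns the host ID only on Linux and only when it is
// positive. The parameterized signature is an assumption for testability.
func resolveProcessID(goos string, id int) int {
	if goos != "linux" {
		// macOS / Docker Desktop masks ownership at the share boundary,
		// and Windows reports -1, so the host ID is not used there.
		return fallbackContainerUID
	}
	if id <= 0 {
		// 0 means the CLI ran as root (for example under sudo).
		return fallbackContainerUID
	}
	return id
}

func main() {
	fmt.Println(resolveProcessID(runtime.GOOS, os.Getuid()))
}
```

For example, `resolveProcessID("darwin", 501)` yields 1001 while `resolveProcessID("linux", 1000)` yields 1000.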

The Dockerfile useradd block is also made idempotent so that an
edge-case Linux host whose UID/GID happens to overlap a base-image
group (low GIDs assigned to system groups) still builds: groupadd at
the host GID falls back to an auto-assigned GID for ${USERNAME},
preserving UPG. The bind-mount writability contract holds on UID
match alone — GID match is best-effort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Comment thread internal/controlplane/cpboot/container_config_test.go Outdated
Comment thread internal/config/CLAUDE.md Outdated
Comment thread internal/bundler/CLAUDE.md Outdated
Comment thread cmd/clawker-cp/main.go
schmitthub and others added 2 commits May 19, 2026 20:30
Append Phase 9 (uid_t-typed accessors + neutral host_id_unavailable
event, ce98cbe), Phase 10 (Linux-only resolveProcessID + idempotent
groupadd, 1cc857c), and Phase 11 (soft-close hand-off). Top-of-file
status banner marks the work complete pending alpha-release verification
on a Linux host — the Linux-only TestBindMountUID_E2E and the Phase 7
host-side manual sequence are the only gates remaining before
hard-close.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TestCPContainer_HostUIDGIDEnv_Emitted computed wantUID/wantGID
directly from os.Getuid()/os.Getgid() with only a `<= 0` fallback,
diverging from consts.resolveProcessID's `runtime.GOOS != "linux" =>
1001` short-circuit added in 1cc857c. On macOS where os.Getuid() is
typically 501, the test would assert CLAWKER_HOST_UID=501 while the
production resolver emits 1001 — hard fail on every macOS run. Mirror
the production fallback exactly: branch on runtime.GOOS, then apply
the `> 0` Linux refinement.

internal/config/CLAUDE.md and internal/bundler/CLAUDE.md both
described ContainerUID/GID as Linux-style host-derived without
qualifying the Linux-only gate. Updated wording to call out the
non-Linux unconditional-fallback path with the virtiofs / groupadd
collision rationale so a future reader doesn't try to "fix" the
gate as if it were a bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 19, 2026 20:39
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Comment thread internal/consts/host_user_test.go Outdated
Comment thread test/e2e/bind_mount_uid_test.go
schmitthub and others added 2 commits May 19, 2026 20:44
… 0.10.0

clawker-support
- known-issues.md: removed the "~/.claude/projects bind mount Linux
  UID mismatch" entry; fix landed (host-derived ContainerUID on Linux
  via 1cc857c). Workaround block obsolete.
- monitoring.md (new): OTel + OpenSearch + Prometheus stack overview,
  Clawker analytics workspace navigation, telemetry env var contract
  (build-time bake vs runtime enable), troubleshooting for empty
  indices / failed bootstrap / Prometheus targets / cross-container
  diagnostics. Routed from SKILL.md step 12 and troubleshooting.md.
- SKILL.md / troubleshooting.md: route monitoring questions to the
  new reference; old "punt to docs.clawker.dev" step kept for other
  features (worktrees, etc.).
- plugin.json: 0.9.1 -> 0.10.0 (minor — new reference file,
  user-visible structural change, plus the cd84d2e / 1cc857c
  Dockerfile.tmpl idempotent groupadd block from the UID branch).

docs.clawker.dev (driveby from fix/uid PR)
- container-internals.mdx: drop literal "UID 1001" from the claude
  user description; the Privilege Model table now says "claude
  (host-derived on Linux, 1001 elsewhere)". The literal was wrong
  on every non-1001 Linux host post-fix.
- threat-model.mdx, security.mdx: same hedge — describe the user as
  unprivileged + UID baked at build time with Linux-host-derived
  semantics, instead of asserting a stale literal value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveHostID reads env via os.Getenv, which returns "" for both
unset and set-to-"". The "unset" and "empty" cases exercised the
same code path; "unset" used os.Unsetenv which doesn't restore a
prior value at test end — a theoretical cross-test leak if the
probe env were set in the parent shell. Drop the "unset" case +
envSet branching; all cases now drive the env through t.Setenv,
which restores the prior value via t.Cleanup automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 19, 2026 20:47
@schmitthub schmitthub merged commit 4d97446 into main May 19, 2026
20 checks passed
@schmitthub schmitthub deleted the fix/uid branch May 19, 2026 20:51
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.

Comment on lines +320 to 321
RUN (groupadd --gid {{.GID}} ${USERNAME} 2>/dev/null || groupadd ${USERNAME}) \
&& useradd --uid {{.UID}} --gid ${USERNAME} --shell {{.Shell}} --create-home ${USERNAME}
3 participants