Skip to content

fix(agent): cert SAN identity binding + invalid-length regression hardening#286

Merged
schmitthub merged 18 commits into
mainfrom
fix/invalid-name-len
May 18, 2026
Merged

fix(agent): cert SAN identity binding + invalid-length regression hardening#286
schmitthub merged 18 commits into
mainfrom
fix/invalid-name-len

Conversation

@schmitthub
Copy link
Copy Markdown
Owner

Summary

Fixes the underlying regression that surfaced as agent-name length overflow ("invalid length") on long randomly-generated agent names, and the cascade of follow-ups that the PR review identified.

The root cause: the per-agent AgentFullName (clawker.<project>.<agent>) used to live in the x509 Subject.CommonName, which is hard-capped at 64 bytes. A long docker.GenerateRandomName output ("adventurous-rosalind-franklin-…") pushed past that cap and MintAgentCert failed at signing time. Moving AgentFullName to a urn:clawker:agent: URI SAN removes the cap, but it also changes the trust model — the cert's claim is no longer load-bearing identity, so the CP-side gate has to ground identity in Docker labels (peer-IP → container → labels) and verify the cert SAN against that.

Done across this branch:

  • Identity moves to URI SAN + cert CN becomes binary identity. MintAgentCert embeds urn:clawker:agent:<full-name> (the per-agent identity) and urn:clawker:container:<id> (channel binding). CN is now the deterministic consts.ContainerClawkerd literal. Typed auth.ProjectSlug / auth.AgentName enforce the form/charset/length contract at every wire boundary.
  • IdentityInterceptor becomes the universal trust gate. Runs on every AgentService RPC (Register included). Three stages: (1) CN pin to consts.ContainerClawkerd, (2) peer-IP → Docker → dev.clawker.{project,agent} labels via ContainerByPeerIP, (3) constant-time compare cert SAN AgentFullName against the label-derived AgentFullName. Resolved container attached to ctx via WithResolvedContainer for handlers.
  • Register handler consumes middleware-resolved identity. No longer re-resolves; cross-checks cert urn:clawker:container: SAN and the RPC body against the daemon-attested truth; captures the cert thumbprint at the gate that persists it; writes the row using label-derived (authoritative) values.
  • Dialer becomes thumbprint-only at Hello. Registry.Lookup is gone; the dialer classifies by thumbprint vs row (outcomeRegistryMatch / outcomeRegistryMiss / outcomeRegistryThumbprintMismatch) and lets IdentityInterceptor catch SAN-label drift on the agent's next inbound RPC. Connection stays open in all cases — asymmetric trust contract (CP must always be reachable for containment).
  • Goose migrations adopted. 00001 is the current schema baseline; 00002 drops the now-unused canonical_cn pre-compute column. Both one-way — registry state regenerates from Docker ground truth on next CP boot.
  • PR-review punch list (last commit, 9 tasks):
    • AgentFullNameFromCert(string, error) with ErrAgentSANMissing / ErrAgentSANMalformed sentinels; interceptor emits distinct event= per shape (wire envelope unchanged).
    • docker.VolumePurposes + typed const suffixes — headroom test auto-covers any new purpose against auth.MaxShortNameLen().
    • New subscribeSessionCancel listens to container die/stop/kill/oom/destroy and calls Dialer.CancelDial, tearing down the doomed CP→clawkerd Session synchronously. Independent of registry evict; docker is the source of truth.
    • validateEntry returns error (no longer panic()); both Registry.Add impls propagate. Keeps the CP serve path clear of panics per root CLAUDE.md.
    • ErrMalformedEntry sentinel wrapped in scanEntry; Register evicts the malformed row and re-writes using middleware-resolved identity rather than refusing the handshake.
    • ProjectSlug.Less / AgentName.Less on the typed identity values; shared sortEntries helper consolidates the duplicated Snapshot sort.
    • Registry.Snapshot()([]Entry, error) so a sqlite query failure can't be silently treated as "no agents" (would evict every row on the next reap pass). ListAgents surfaces codes.Internal.

Test plan

  • make test — 4663 pass, 8 skipped (the usual integration / OS-specific skips).
  • Unit tests added or refreshed for: tri-state SAN classifier, malformed-SAN interceptor stage, register-handler malformed-row evict + re-write, session-cancel subscriber on every exit transition, headroom test now iterating docker.VolumePurposes.
  • Manual smoke: clawker run with a randomly-generated long agent name now succeeds (the original regression). Verify via clawker controlplane agents that the row lands with the typed identity.
  • Restart-resilience spot check: docker stop an agent, confirm CP→clawkerd Session tears down synchronously rather than on next reconnect attempt.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 13, 2026 08:38
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the agent-name length regression caused by x509 CN’s 64-byte cap by moving per-agent identity to URI SANs, and hardens the trust model by grounding identity in Docker-attested facts (peer IP → container → labels) with SAN cross-checking at the gRPC boundary.

Changes:

  • Move per-agent identity to urn:clawker:agent:<AgentFullName> URI SAN and pin cert CN to consts.ContainerClawkerd; introduce typed auth.ProjectSlug / auth.AgentName and ordering helpers.
  • Make IdentityInterceptor the universal identity gate (including Register) using peer-IP→Docker label resolution + constant-time SAN/label comparison; pass resolved container identity via context to handlers.
  • Adopt goose migrations for the agent registry DB and adjust registry/dialer/listing flows to handle snapshot errors and thumbprint-only registry matching.

Reviewed changes

Copilot reviewed 46 out of 47 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
NOTICE Adds newly introduced dependency licenses.
internal/workspace/strategy.go Switches volume purpose strings to typed constants.
internal/workspace/snapshot.go Switches workspace volume purpose to typed constant.
internal/docker/names.go Adds typed volume purpose constants + list for tests.
internal/docker/names_test.go Adds headroom + random-name validity regression tests.
internal/docker/env.go Updates comments to reflect AgentFullName usage.
internal/controlplane/server.go Propagates Snapshot() ([]Entry, error) and returns codes.Internal on failure.
internal/controlplane/server_test.go Updates tests for typed identity values and new Snapshot signature.
internal/controlplane/CLAUDE.md Updates controlplane startup/identity gate documentation.
internal/controlplane/agent/start.go Adds PeerLookup dep validation; adds session-cancel subscriber; handles Snapshot errors.
internal/controlplane/agent/start_test.go Updates tests for new deps + Snapshot error return; adds session-cancel subscriber coverage.
internal/controlplane/agent/registry.go Converts identity fields to typed values; changes Snapshot to return error; removes CN-based Lookup.
internal/controlplane/agent/registry_test.go Updates tests for error-returning invariants and typed identity.
internal/controlplane/agent/registry_sqlite.go Switches to goose-managed embedded migrations; adds malformed-row sentinel handling and typed re-validation.
internal/controlplane/agent/registry_sqlite_test.go Updates tests for goose migrations, Snapshot errors, and typed identity fields.
internal/controlplane/agent/registry_mock_test.go Regenerates mock for new Registry interface (Snapshot returns error; removes Lookup).
internal/controlplane/agent/register_handler.go Consumes middleware-resolved container identity; verifies request/cert against resolved truth; rewrites malformed rows.
internal/controlplane/agent/peer_lookup.go Introduces ContainerByPeerIP interface + ResolvedContainer + sentinels.
internal/controlplane/agent/peer_lookup_test.go Adds unit tests for peer-IP→container resolution and label validation behavior.
internal/controlplane/agent/peer_lookup_moby.go Implements Docker-backed peer lookup with label filtering and error aggregation.
internal/controlplane/agent/migrations/00001_init.sql Adds goose v1 baseline schema (includes legacy canonical_cn column).
internal/controlplane/agent/migrations/00002_drop_canonical_cn.sql Drops canonical_cn as identity moves to SAN + labels.
internal/controlplane/agent/identity_interceptor.go Reworks interceptor into universal 3-stage gate (CN pin, peer lookup, SAN/label compare).
internal/controlplane/agent/identity_interceptor_test.go Adds tests for each gate stage + stream-context propagation.
internal/controlplane/agent/handler.go Adds peer-leaf extraction helpers and context plumbing for resolved container identity.
internal/controlplane/agent/events_session.go Renames peer identity fields to SAN-based PeerAgentFullName.
internal/controlplane/agent/dispatch_register_test.go Updates tests for typed identity fields and SAN-based peer identity naming.
internal/controlplane/agent/dialer.go Stores per-dial cancel funcs for synchronous teardown; registry classification becomes thumbprint-only; SAN captured for diagnostics.
internal/controlplane/agent/dialer_test.go Updates dialer tests for SAN identity capture and thumbprint-only registry outcomes.
internal/controlplane/agent/CLAUDE.md Updates agent package docs for new trust model, resolver, and registry behavior.
internal/consts/consts.go Adds ContainerClawkerd CN pin constant.
internal/cmd/container/shared/containerfs.go Uses typed volume purpose constants.
internal/cmd/container/shared/container_start.go Updates comments to reflect SAN-based identity.
internal/cmd/container/shared/container_create.go Uses typed volume purpose constants for cleanup tracking.
internal/cmd/container/shared/CLAUDE.md Updates docs for AgentFullName-in-SAN and CN pin.
internal/cmd/container/shared/agent_bootstrap.go Updates docs to reflect AgentFullName SAN composition.
internal/cmd/container/shared/agent_bootstrap_test.go Updates tests: CN pin + AgentFullName SAN presence.
internal/auth/identity.go Increases short-name cap; adds MaxShortNameLen() and Less methods for typed identities.
internal/auth/identity_test.go Updates tests for AgentFullName terminology and random-name length regression.
internal/auth/CLAUDE.md Updates auth docs for SAN-based identity and CN pin.
internal/auth/agent_cert.go Adds Agent SAN scheme + tri-state SAN parsing; CN set to ContainerClawkerd; embeds both agent/container URI SANs.
internal/auth/agent_cert_test.go Updates tests for CN pin + AgentFullNameFromCert tri-state behavior.
go.mod Adds goose + related deps; bumps several indirect dependency versions.
go.sum Records module checksum updates for new/updated dependencies.
cmd/clawker-cp/main.go Wires peer lookup into IdentityInterceptor and agent.Start; updates Register handler construction.
.claude/docs/KEY-CONCEPTS.md Updates repository docs to reflect new identity/registry model.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/auth/agent_cert.go Outdated
Comment thread internal/auth/agent_cert.go Outdated
Comment thread internal/auth/agent_cert.go Outdated
Comment thread internal/controlplane/agent/peer_lookup.go
Comment thread internal/docker/names.go Outdated
Comment thread internal/controlplane/agent/registry_test.go Outdated
schmitthub added a commit that referenced this pull request May 13, 2026
Addresses critical + important findings from the post-punch-list
review of PR #286. CP serve path hardened, identity gate constructor
follows the (*T, error) pattern, trust-anchor resolver fails closed
on ambiguous peer IPs.

- C1: IdentityInterceptor signature changes from panic-on-nil to
  (unary, stream, error). cmd/clawker-cp/main.go logs
  event=agent_identity_unavailable on error and degrades the
  AgentService listener while CP/firewall/admin stay up. Panic at
  this seam (post-eBPF-load) would have stranded pinned eBPF
  programs with no supervisor — exactly the silent-failure shape
  root CLAUDE.md forbids.

- C2: internal/consts/consts.go — three doc-comment fixes to drop
  stale "canonical CN" / "pre-computed registry column" references
  that contradict the post-migration-00002 model.

- C3: TestSQLiteRegistry_Snapshot_ReturnsErrOnDBFailure and
  TestAdminServer_ListAgents_SnapshotError_ReturnsCodesInternal —
  pin the eviction-cascade-prevention surface: Snapshot must
  surface a non-nil error, ListAgents must map it to codes.Internal,
  never silently return an empty result.

- I1: Registry.Add godoc — panics→error wording matching the
  punch-list refactor.

- I2: AgentFullNameFromCert, ContainerIDFromCert, sanTailFromCert
  — three duplicate godoc blocks deleted (godoc/IDE were picking
  up stale (string, bool) descriptions on the SAN parsers).

- I3: dialer.classifyRegistry — ErrMalformedEntry now classifies as
  outcomeRegistryMiss so the Register handshake drives the
  evict+rewrite path. Treating it as NotQueried would have left
  malformed rows stranded forever (every reconnect publishes
  AgentUntrusted, no recovery trigger).

- I4+I5: peer_lookup_moby.LookupByIP — exhaustively walks every
  purpose=agent candidate (no first-match-wins) and returns the new
  ErrAmbiguousPeerIP sentinel when two containers advertise the
  same clawker-net IP. Fail-closed for the transient-stale-endpoint
  race window. Daemon-error wrap now only escalates when NO
  candidate inspected cleanly — preserves the peer_lookup_no_match
  audit signal operators rely on. IdentityInterceptor Stage 2b
  gains a dedicated case → event=agent_identity_ambiguous_peer_ip.

Tests: 116 packages green (in-container, e2e+whail skipped per
project convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@schmitthub schmitthub requested a review from Copilot May 13, 2026 09:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 46 out of 47 changed files in this pull request and generated 2 comments.

Comment thread internal/controlplane/agent/registry_test.go
Comment thread internal/docker/names_test.go Outdated
@schmitthub schmitthub requested a review from Copilot May 13, 2026 10:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 46 out of 47 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (5)

internal/controlplane/agent/registry_test.go:1

  • This comment block is duplicated and internally contradictory (first describes panic semantics, then error-return semantics). Please collapse it into a single, accurate description (Add returns an error) so future readers/tests don’t get misled about the intended contract.
    internal/controlplane/agent/start.go:1
  • StartDeps.PeerLookup is made required and validated, but Start() doesn’t appear to use it anywhere in this diff (beyond the nil-check). If Start() doesn’t actually depend on peer lookup yet, consider (mandatory) either wiring it into the components started here or (optional) deferring the requirement until it’s used to avoid forcing callers/tests to pass an unused dependency.
    internal/controlplane/agent/start.go:1
  • StartDeps.PeerLookup is made required and validated, but Start() doesn’t appear to use it anywhere in this diff (beyond the nil-check). If Start() doesn’t actually depend on peer lookup yet, consider (mandatory) either wiring it into the components started here or (optional) deferring the requirement until it’s used to avoid forcing callers/tests to pass an unused dependency.
    internal/docker/names.go:1
  • VolumePurposes is exported as a mutable slice, which can be modified by any caller/package at runtime and unintentionally weaken the headroom test’s coverage assumptions (and any logic that iterates it). Consider exposing this list via a function that returns a copy, or keeping the backing list unexported and providing an accessor, so the canonical set cannot be mutated.
    internal/controlplane/agent/dispatch_register_test.go:1
  • The helper parameter is still named peerCN, but it now represents PeerAgentFullName (SAN-sourced AgentFullName), not CN. Renaming the parameter (e.g., peerAgentFullName) would make the test intent clearer and prevent confusion now that CN is a fixed literal.
package agent

Comment thread internal/controlplane/agent/identity_interceptor.go Outdated
Comment thread internal/controlplane/agent/identity_interceptor.go Outdated
Comment thread internal/controlplane/agent/identity_interceptor.go Outdated
Comment thread internal/controlplane/agent/identity_interceptor.go Outdated
Comment thread internal/controlplane/agent/identity_interceptor.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 60 out of 61 changed files in this pull request and generated 9 comments.

Comment thread internal/controlplane/agent/CLAUDE.md Outdated
Comment thread internal/controlplane/agent/registry_test.go
Comment thread internal/docker/names_test.go Outdated
Comment thread internal/auth/identity.go
Comment thread internal/auth/agent_cert.go Outdated
Comment thread internal/project/manager.go
Comment thread internal/cmd/project/register/register.go
Comment thread docs/configuration.mdx
Comment thread cmd/gen-docs/configuration.mdx.tmpl
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 60 out of 61 changed files in this pull request and generated 3 comments.

Comment thread internal/controlplane/agent/registry.go Outdated
Comment thread internal/controlplane/agent/registry.go Outdated
Comment thread internal/controlplane/agent/peer_lookup.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 66 out of 67 changed files in this pull request and generated 1 comment.

}
}

func TestContainerName(t *testing.T) {
schmitthub and others added 12 commits May 17, 2026 18:30
Random container names (docker.GenerateRandomName worst case ~27 chars)
exceeded the 24-char shortNameMax cap that existed so the composed CN
"clawker.<project>.<agent>" stayed under x509's 64-byte limit. Surfaces
as "agent bootstrap: invalid agent name" from CreateContainer (broke
TestCreateContainer_NoPostInit).

Mint per-agent leaves with Subject.CommonName = consts.ContainerClawkerd
(deterministic binary identity) and put the canonical agent identity in
a urn:clawker:agent:<canonical> URI SAN. CP-side IdentityInterceptor
gains a universal CN pin to ContainerClawkerd (runs for every RPC
including Register); registry Lookup keys off the SAN-sourced
AgentFullName instead of Subject.CommonName. Register handler keeps its
SAN-canonical-vs-request cross-check. Cap raised 24 -> 50; locked with a
cartesian-product regression test over the GenerateRandomName word
lists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace hand-rolled applySchema + schemaMissingCanonicalCN drift detection
with goose-managed migrations embedded via embed.FS. 00001_init.sql captures
the current schema verbatim — no behavioral change for fresh DBs or DBs
already at the current shape.

Pre-goose DBs (have `agents` table but no `goose_db_version` sibling) get
their `agents` table dropped via a one-shot pre-flight; goose then applies
00001 fresh. Alpha policy stands — registry state is regenerable from the
next clawker run's Register handshake.

Migrations live co-located at internal/controlplane/agent/migrations/ and
ship inside the CP binary via embed.FS so the schema definitions are
reachable at runtime inside the CP container without host-side asset
bind mounts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Maps a live mTLS peer IP to the purpose=agent container owning that
clawker-net endpoint and reads project + agent_name straight off the
Docker labels. Lays the trust anchor for the universal identity
middleware: labels are the authoritative source, cert claims become
inputs to verify against them rather than the basis of identity
lookup.

ResolvedContainer carries typed Project/AgentName (auth.ProjectSlug /
auth.AgentName) so a zero-value instance cannot pass an empty-vs-empty
identity compare downstream. Resolver validates labels at the seam and
returns ErrInvalidAgentLabels on malformed input — distinct from
ErrNoContainerForPeerIP so callers can tell daemon-state corruption
from a clean no-match.

Per-container inspect failures log + continue iteration; the wrapped
daemon error propagates only when no candidate matched AND any inspect
erred, preserving the "we couldn't tell" vs "no agent owns this IP"
distinction. Read-only docker calls are bounded by peerLookupTimeout
under context.Background() so a per-RPC cancel cannot turn into a
spurious reject.

Wired through agent.StartDeps as a required field with a nil-check
mirroring Registry/DockerLister/Bus/Dialer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ddleware

Trust anchor is now the kernel-attested peer IP, not cert claims. The
cert's urn:clawker:agent: URI SAN becomes a CLAIM verified against
Docker labels resolved from the peer's IP — never the basis of lookup.

The middleware runs three stages on every RPC including Register
(no opt-out):
  1. CN pin: leaf.Subject.CommonName == consts.ContainerClawkerd
  2a. Cert must carry an agent URI SAN (explicit reject for diagnostic
      fidelity + Docker-roundtrip short-circuit)
  2b. Peer IP -> Docker -> labels via ContainerByPeerIP
  3.  Cert SAN AgentFullName constant-time-compared against the
      label-derived AgentFullName

Resolved (containerID, project, agent_name) is attached to ctx via
WithResolvedContainer; downstream handlers read it via
ResolvedContainerFromContext. Register reads label-derived identity
instead of the request's claim.

Key changes:
- IdentityInterceptor signature: (peerLookup ContainerByPeerIP, log)
  drops the Registry + optedOut params and the IdentityOptedOutMethods
  helper. Registry-row lookup is bookkeeping, not auth.
- WithEntry / EntryFromContext deleted; WithResolvedContainer /
  ResolvedContainerFromContext replace them. The write side is
  permissive on empty ContainerID (no panic on serving path per the
  CP no-panic security contract); the read side returns ok=false for
  both absent and zero-value cases, preventing silent identity vacuum.
- peerIdentity gains PeerAddr (the trust anchor); drops the unused
  Raw field; peerIdentityFromContext delegates to peerLeafFromContext.
- remoteAddrToNetip moves from register_handler.go to handler.go so
  both the interceptor and Register share the same extraction path
  (Task 3 will collapse peerLeafAndIP).
- Five distinct event= log fields per reject path for operator
  triage; uniform "registration rejected" wire envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Register handler used to call ContainerInspector + cross-check
labels + peer-IP itself. With IdentityInterceptor as universal trust
middleware (peer-IP -> Docker -> labels -> cert-SAN cross-check), the
handler now reads the resolved (containerID, project, agentName) from
ctx via ResolvedContainerFromContext and focuses on what only it can
do: capture the live thumbprint, verify the cert's container_id SAN
against the resolved truth, verify the request body against the
resolved truth, and write the row using label-derived (authoritative)
values.

Drops:
- ContainerInspector interface + NewMobyContainerInspector +
  mobyContainerInspector + inspectTimeout const + fakeContainerInspector
  test fake. The middleware owns Docker resolution; the handler is
  Docker-free.
- peerLeafAndIP -> handler now uses peerLeafFromContext (handler.go).
- labelsMatchRequest, peerIPMatchesContainer -> middleware's concern.
- subtle.ConstantTimeCompare for non-secret values (ProjectSlug,
  AgentName, ContainerID are daemon-attested labels / public docker
  IDs). Plain != is equally secure here; thumbprint compare remains
  direct [32]byte equality.

Adds:
- agent_register_no_resolved_container -> codes.Internal (wiring
  bug, not identity verdict; operator can distinguish "rejected"
  from "broken").
- agent_register_peer_cert_missing_post_resolve -> defense-in-depth
  PermissionDenied if ctx is tampered between interceptor and
  handler.
- agent_register_request_label_mismatch -> request body lies about
  identity even though cert/labels agree.
- agent_register_container_id_mismatch -> cert container SAN claim
  disagrees with peer-IP-resolved truth.

Tests collapsed into table-driven groups (RequestValidation,
CtxGates, IdentityCrossChecks, RegistryBranches) for ~80 line
reduction.

internal/controlplane/agent/CLAUDE.md updated: Identity contract
rewritten around peer-IP trust anchor; Register flow step 5 reworded;
opt-out subsection replaced with universal three-stage gate; Files +
Test seam tables refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Registry.Lookup(thumbprint, cn) gate combined two checks — channel
binding via thumbprint and identity binding via canonical CN — into one
sentinel-returning call. After Tasks 1-3 the identity binding moved
upstream into IdentityInterceptor (peer-IP → Docker labels → cert SAN
cross-check), so Registry no longer needs to carry the CN half.

Registry surface shrinks: Lookup removed; Add no longer validates
project/agent_name strings (the wire boundary already does, via
auth.NewProjectSlug/NewAgentName). memEntry wrapper and the
canonical-CN pre-compute deleted; entries map is now [Thumbprint]Entry
directly. Dialer.classifyRegistry collapses to a thumbprint compare
against the row keyed by container_id — Match / Miss /
ThumbprintMismatch / NotQueried only. outcomeRegistryCNMismatch and the
dead computeCNPinMatch + canonicalCNFromStrings helpers go away.

sqlite still has the canonical_cn column (NOT NULL) so Add writes "" as
a placeholder until Task 5 of the SAN refactor drops the column via
goose migration 00002.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The canonical_cn pre-compute column has had no readers since Task 4
of the SAN refactor; writes have gone to empty string in the interim
to keep the NOT-NULL-no-default constraint valid. AgentFullName now
lives in a urn:clawker:agent: URI SAN and trust derives from the
kernel-attested peer IP via Docker labels, not from cached strings.

Test seeds a DB at goose-version-1 with a populated canonical_cn row
rather than running on a fresh DB. A fresh-DB test could not
distinguish "00002 dropped the column" from "the column was never
there"; seeding pre-002 state pins the actual migration effect and
verifies row-level data preservation across DROP COLUMN.

Also fix applySchema partial-apply observability: previously the
applied-migration log lines were emitted only on the success branch,
so a stuck DB hid the last successful version behind the wrapped
error. Iterate results before returning the error so triage is
possible from log surface alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweeps "canonical" terminology out of identity naming per the SAN-refactor
initiative. Identity now consistently uses AgentFullName for the
clawker.<project>.<agent> string; "canonical" only remains in generic
English (canonical container ID, canonical aud claim) and in the
intentionally-kept outcomeRegistryCNMismatch / UntrustedReasonCNMismatch
enum values.

Closes the Task 4 silent-failure-hunter MEDIUM by typing agent.Entry's
AgentName and Project fields as auth.AgentName / auth.ProjectSlug so a
future writer cannot bypass validation. sqlite scanEntry re-runs
auth.NewAgentName / auth.NewProjectSlug on read; Snapshot logs and skips
malformed rows with event=agentregistry_row_skipped plus a per-snapshot
summary line carrying the count.

Drops the server.go ListAgents re-sort — Snapshot's interface contract
already guarantees (Project, AgentName) ordering.

Renames:
- auth.CanonicalAgentCN → auth.AgentFullName
- auth.AgentCanonicalFromCert → auth.AgentFullNameFromCert
- canonicalPrefix → agentFullNamePrefix

Deferred (out of rename scope):
- validateEntry panics on CP serve path (FIXME comment landed)
- LookupByContainerID malformed-row DOS path against re-registration
- auth.AgentName.Less / ProjectSlug.Less typed comparators
- CLAUDE.md doc sweep (Task 8)
- Test-file comment scrub (Task 7)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 7 of the SAN-refactor initiative.

- Scrub "canonical" residue in identity-context test comments + identifiers
  after the Task 6 rename; generic-English "canonical" usages preserved.
- Add TestCapturePeer_DistinctCNAndSAN — production has CN=clawker-clawkerd
  but PeerAgentFullName must come from the urn:clawker:agent: SAN. Existing
  TestCapturePeer_ValidChain collapses CN==SAN so it would pass a regression
  that read CN.
- Add TestContainerName_HeadroomForMaxFields — composes max-length project
  + max-length agent + every production purpose suffix and asserts Docker's
  128-byte resource-name budget holds.
- Expose auth.MaxShortNameLen() so the headroom test consumes the source of
  truth instead of hardcoding 50; catches both shrink and bump regressions
  of shortNameMax.
- Collapse signLeafCNSAN helper into signLeaf (now takes cn + sanFullName).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 8 of the SAN-refactor initiative. Sweeps the post-rename and
post-Lookup-removal residue across user-facing package docs and the
project-wide concept index so a future reader reaching for these
files sees the post-Task-7 reality:

- auth/CLAUDE.md: agent_cert.go row references AgentFullName /
  AgentFullNameFromCert; drops "pre-stored canonical_cn column"
  sentence (column dropped by migration 00002 in Task 5).
- controlplane/agent/CLAUDE.md: Identity contract section
  documents the dropped-column reality plus the typed Entry fields
  (auth.ProjectSlug / auth.AgentName) and the on-read scanEntry
  re-validation + skipped-row events from Task 6. Register flow
  step 1 uses urn:clawker:agent:<full-name> wording. Imports
  section lists AgentFullName / AgentFullNameFromCert.
- cmd/container/shared/CLAUDE.md: rewrites the agent-bootstrap CN
  framing — CN is the deterministic ContainerClawkerd literal;
  AgentFullName rides in the URI SAN.
- controlplane/CLAUDE.md: startup step 8 rewritten around the
  three-stage IdentityInterceptor gate (CN pin → peer-IP→Docker
  →labels → cert SAN ↔ label compare) with the ContainerByPeerIP
  dep and WithResolvedContainer ctx attachment. Constructor
  signature documented.
- .claude/docs/KEY-CONCEPTS.md: agent.Registry row rewritten
  around the current surface (Add / LookupByContainerID / Evict /
  Snapshot); agent.IdentityInterceptor row documents the universal
  three-stage gate; WithEntry/EntryFromContext row replaced with
  WithResolvedContainer/ResolvedContainerFromContext; CanonicalAgentCN
  row replaced with AgentFullName; MintAgentCert signature corrected
  to include containerID; shared.AgentBootstrap row drops the
  removed RegisterAgentInRegistry function reference.

Memory updated: Task 8 → complete. Two new items added to Deferred
list from the final silent-failure-hunter sweep (Snapshot query-
failure swallow; migration 00002 Down-path empty-CN backfill).

Verification: make test GOFLAGS="-trimpath -buildvcs=false" → 4643
tests pass / 8 legitimate skips. go build clean. grep canonical
on internal/auth, internal/controlplane, internal/cmd/container,
.claude/docs returns only generic-English usages (pattern descriptions,
ColorScheme).

Initiative complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses every item in the invalid-length-pr-review-tasks tracker:

- AgentFullNameFromCert now returns (string, error) with
  ErrAgentSANMissing / ErrAgentSANMalformed sentinels so the
  IdentityInterceptor can emit distinct structured-log events for
  missing vs malformed agent URI SANs (wire envelope stays uniform
  PermissionDenied).
- docker.VolumePurposes + typed const suffixes so the headroom test
  in names_test.go auto-covers any new purpose; inline call sites
  migrated to the typed names.
- New subscribeSessionCancel listens for container die/stop/kill/
  oom/destroy docker events (source of truth) and calls
  Dialer.CancelDial, tearing down the doomed CP→clawkerd Session
  synchronously instead of waiting for the next reconnect to
  classify outcomeContainerGone. Per-container cancel handles
  replace the dialing dedup set.
- validateEntry returns error instead of panicking; both Add impls
  propagate; tests assert error returns rather than panics. Keeps
  the CP serve path clear of panic() per root CLAUDE.md security
  contract.
- ErrMalformedEntry sentinel wrapped by scanEntry; Register evicts
  malformed rows and re-writes using middleware-resolved identity
  rather than refusing the handshake; covered by two new tests.
- ProjectSlug.Less / AgentName.Less on the typed identity values;
  sortEntries helper shared by both registry impls.
- Registry.Snapshot() now returns ([]Entry, error); reapOrphans
  aborts on snapshot failure (no longer treats empty as
  authoritative); ListAgents surfaces as codes.Internal.
- Migration 00002 is one-way; the Down half (which pretended to
  preserve trust state across rollback) is removed.

make test: 4663 pass, 8 skipped.
schmitthub and others added 6 commits May 17, 2026 18:30
Addresses critical + important findings from the post-punch-list
review of PR #286. CP serve path hardened, identity gate constructor
follows the (*T, error) pattern, trust-anchor resolver fails closed
on ambiguous peer IPs.

- C1: IdentityInterceptor signature changes from panic-on-nil to
  (unary, stream, error). cmd/clawker-cp/main.go logs
  event=agent_identity_unavailable on error and degrades the
  AgentService listener while CP/firewall/admin stay up. Panic at
  this seam (post-eBPF-load) would have stranded pinned eBPF
  programs with no supervisor — exactly the silent-failure shape
  root CLAUDE.md forbids.

- C2: internal/consts/consts.go — three doc-comment fixes to drop
  stale "canonical CN" / "pre-computed registry column" references
  that contradict the post-migration-00002 model.

- C3: TestSQLiteRegistry_Snapshot_ReturnsErrOnDBFailure and
  TestAdminServer_ListAgents_SnapshotError_ReturnsCodesInternal —
  pin the eviction-cascade-prevention surface: Snapshot must
  surface a non-nil error, ListAgents must map it to codes.Internal,
  never silently return an empty result.

- I1: Registry.Add godoc — panics→error wording matching the
  punch-list refactor.

- I2: AgentFullNameFromCert, ContainerIDFromCert, sanTailFromCert
  — three duplicate godoc blocks deleted (godoc/IDE were picking
  up stale (string, bool) descriptions on the SAN parsers).

- I3: dialer.classifyRegistry — ErrMalformedEntry now classifies as
  outcomeRegistryMiss so the Register handshake drives the
  evict+rewrite path. Treating it as NotQueried would have left
  malformed rows stranded forever (every reconnect publishes
  AgentUntrusted, no recovery trigger).

- I4+I5: peer_lookup_moby.LookupByIP — exhaustively walks every
  purpose=agent candidate (no first-match-wins) and returns the new
  ErrAmbiguousPeerIP sentinel when two containers advertise the
  same clawker-net IP. Fail-closed for the transient-stale-endpoint
  race window. Daemon-error wrap now only escalates when NO
  candidate inspected cleanly — preserves the peer_lookup_no_match
  audit signal operators rely on. IdentityInterceptor Stage 2b
  gains a dedicated case → event=agent_identity_ambiguous_peer_ip.

Tests: 116 packages green (in-container, e2e+whail skipped per
project convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md drift: drop stale "Session teardown on evict" deferred-limitation
(shipped via Dialer.CancelDial), document 4th Start step + PeerLookup dep,
add ErrAmbiguousPeerIP to sentinel list, note goose migrations, add
ErrAgentSAN* + MaxShortNameLen() to auth surface.

Comments: drop removed peer-CN match references from dialer doc + agents
field; rewrite register_handler "cert+CN pin" to SAN/labels middleware
gate; add ErrAmbiguousPeerIP to peer_lookup interface; move sanState
block out of BuildContainerSAN's doc.

Error handling: rename main.go err → identityErr to scope IdentityInterceptor
degrade; omit `rows` field on countRows-fail path (Str/Int under one key
breaks OpenSearch ingest); migrate ContainerIDFromCert to (string, error)
tri-state with ErrContainerSAN{Missing,Malformed} sentinels mirroring
AgentFullNameFromCert — distinct structured-log events for operator
triage, uniform PermissionDenied on the wire.

Tests: add ErrAmbiguousPeerIP row to interceptor sentinel table,
TestMintAgentCert_MaxLengthRoundTrip (pins x509 64-byte CN limit against
MaxShortNameLen() inputs), TestContainerIDFromCert_TriState. Delete
TestNamePrefix (tautology) and TestGenerateRandomName format/uniqueness
loop (subsumed by cartesian-walk TestGenerateRandomName_AlwaysValidAsAgentName).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add internal/cmdutil/slugify.go: centralized input normalization
  (lowercase, whitespace collapse, control-strip, leading/trailing
  dash trim). New slugify_test.go covers the FSM branches.
- Reduce auth.NewAgentName / NewProjectSlug to compile-time type
  wrappers; remove shortNameRE, MaxShortNameLen, and the docker
  128-byte cap in ValidateResourceName. Downstream consumers (Docker
  create, x509 URI SAN, IdentityInterceptor) enforce their own
  constraints at op time.
- Drop internal/cmd/project/init/init.go's validateProjectName helper
  and the parallel charset/length test tables; coverage now lives
  centrally in slugify_test.go.
- Wire clawker.yaml::project.name override through project.Manager
  and document the resolution hierarchy in docs/configuration.mdx +
  the gen-docs template.
- Update CLAUDE.md files (auth, cmd/project) and load-bearing godocs
  (agent_cert.go) to stop referring to deleted APIs and to describe
  the new typed-wrapper-only contract honestly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves 14 Copilot review threads:

- identity_interceptor: resolve closure now returns the original ctx
  on all five PermissionDenied paths instead of nil, hardening against
  future logging/metrics that might nil-deref on the CP serve path
  (root CLAUDE.md treats CP panic as a security incident).
- agent/CLAUDE.md: step #3 rewritten to describe Session cancel as a
  docker-event subscriber (die/stop/kill/oom/destroy), independent of
  the evict subscriber. Both fire off the same docker bus; neither
  drives the other.
- agent_cert: sanTailFromCert comment updated for tri-state
  ContainerIDFromCert (matches AgentFullNameFromCert).
- registry_test: removed stale duplicate "Add panics" doc paragraph.
- docker/names_test: deleted TestGenerateRandomName_AlwaysValidAsAgentName
  (and the now-unused fmt + auth imports). NewAgentName only rejects
  empty per the package's intentional design.
- project/manager + cmd/project/register + gen-docs tmpl + regen mdx:
  rewrote stale `clawker.yaml::project.name` references to
  `clawker.yaml::name` (root is Project; name is top-level).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes that reframe the 2-segment naming case as a
first-class concept rather than a degraded/error form:

1. Terminology: "unscoped" / "empty project" → "global-scope agent"
   across docs, comments, log messages, and Godoc. The empty
   ProjectSlug is a legitimate signal (no project namespace), not an
   absence — every caller now reads as such.

2. Rename ErrInvalidAgentLabels → ErrInvalidAgentLabel (singular),
   clarifying that a missing dev.clawker.project label is NOT an
   error — it's the global-scope signal. Only the dev.clawker.agent
   label being missing/malformed trips the sentinel. Updates the
   identity interceptor's structured-log event name and all test
   references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two stale doc claims in registry.go:
- Entry.AgentName/Project: previous comment said auth.NewAgentName
  rejects dots/"clawker." prefix/charset; today it only rejects
  empty. Restate the typed-field rationale as compile-time discipline
  and point at the real enforcement layers (Docker create, x509 SAN
  encoding, IdentityInterceptor) plus the upstream
  cmdutil.ProjectSlugify normalization.
- Add/validateEntry: previous comment said Register maps the error
  to codes.InvalidArgument; the handler maps to codes.Internal. Every
  field validateEntry checks is server-derived, so a failure is a CP
  wiring bug, not bad client input — Internal is correct. User-typed
  identity strings are validated upstream at the wire boundary and
  map to InvalidArgument there.

Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@schmitthub schmitthub force-pushed the fix/invalid-name-len branch from d098996 to 7980098 Compare May 18, 2026 01:30
@schmitthub schmitthub merged commit b96ef1c into main May 18, 2026
17 checks passed
@schmitthub schmitthub deleted the fix/invalid-name-len branch May 18, 2026 02:03
schmitthub added a commit that referenced this pull request May 19, 2026
resolveHostID now parses CLAWKER_HOST_UID/GID via ParseUint(_,10,32) so
over-uint32 inputs become a structured "malformed" fallback instead of
silently wrapping when userStage casts to uint32 (closes CodeQL #285,
#286). HostUID()/HostGID() and HostIDResolution.Value return uint32, so
the PipeStage assignment is a total identity at the call site.

Diagnostic event renamed host_uid_unavailable -> host_id_unavailable;
the env field on the record (CLAWKER_HOST_UID or CLAWKER_HOST_GID)
already disambiguates UID vs GID for the operator.

Also collapses the duplicated logHostIdentity doc paragraph and
qualifies the bare HostUID() reference in workspace/setup.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
schmitthub added a commit that referenced this pull request May 19, 2026
resolveHostID now parses CLAWKER_HOST_UID/GID via ParseUint(_,10,32) so
over-uint32 inputs become a structured "malformed" fallback instead of
silently wrapping when userStage casts to uint32 (closes CodeQL #285,
#286). HostUID()/HostGID() and HostIDResolution.Value return uint32, so
the PipeStage assignment is a total identity at the call site.

Diagnostic event renamed host_uid_unavailable -> host_id_unavailable;
the env field on the record (CLAWKER_HOST_UID or CLAWKER_HOST_GID)
already disambiguates UID vs GID for the operator.

Also collapses the duplicated logHostIdentity doc paragraph and
qualifies the bare HostUID() reference in workspace/setup.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
schmitthub added a commit that referenced this pull request May 19, 2026
…ic (#291)

* fix(consts): host-derive ContainerUID, add HostUID for CP

PR #269's ~/.claude/projects bind mount silently failed to persist
auto-memory + session jsonls when the host UID was not 1001 — the
agent image's claude user was hardcoded at UID 1001 (Dockerfile.tmpl
useradd) so bind-mount writes from inside the container hit EACCES
on virtually every Linux host (typical UID is 1000). macOS unaffected
(virtiofs translates).

Two scoped accessors in internal/consts/, distinguished by where they
read:

- consts.ContainerUID / ContainerGID: refactored from const 1001 to
  vars initialized from os.Getuid() / os.Getgid(). Rejects 0 (sudo
  invocation) and -1 (Windows); falls back to fallbackContainerUID
  (1001). CLI-side. All existing callers (bundler tar, containerfs
  tar, docker volume copy, deprecated cfg delegates) transparently
  get the host UID with zero call-site changes.

- consts.HostUID / HostGID: new env-fed vars in
  internal/consts/controlplane.go. Read CLAWKER_HOST_UID /
  CLAWKER_HOST_GID via resolveHostUID with the same > 0 guard;
  fallback pinned to fallbackContainerUID (NOT ContainerUID, which
  inside the CP container resolves to 0 — CP runs as root for BPF
  caps, so a missing env would have silently dropped userStage to
  root). Invalid env emits event=host_uid_invalid on stderr so
  operators get a signal in docker logs <cp>.

The CLI populates the env vars on the CP container at boot via
BuildCPContainerConfig (cpboot/cp_container.go), using
consts.ContainerUID / ContainerGID (host invoker's UID in CLI
process). The single CP-side caller — userStage in
internal/controlplane/agent/init.go — switches to HostUID / HostGID
so post-init shell stages drop to the same UID baked into the agent
image.

The workspace UID-mismatch warning at setup.go is deleted (now dead
by construction). Image content hash is host-UID-specific via the
existing Dockerfile rendering, so multi-user hosts naturally get
separate cached images — documented in internal/bundler/CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(consts): UID accessors, structured CP host-id diagnostic, dead Warnings field

Follow-up to the host-derive ContainerUID commit on this branch.
Addresses PR-review findings without changing behavior:

- consts.{Container,Host}{UID,GID} converted from exported mutable
  vars to func() int accessors backed by unexported package-private
  state. Removes the stomp-able exported-mutable-var pattern; all
  callers updated.
- resolveHostID is now side-effect-free, returning (int,
  HostIDResolution). cmd/clawker-cp/main.go adds logHostIdentity
  which emits event=host_uid_unavailable via the project zerolog
  surface (rotating CP logfile) when EnvHostUID/GID came through
  unset or invalid. The earlier package-init fmt.Fprintf to stderr
  is removed.
- internal/workspace setup.go: SetupMountsResult.Warnings field +
  the dead mountWarnings slice are removed; iterating caller and
  warnings-empty asserts go with them.
- TestConstantAccessors no longer hardcodes 1001; UID/GID asserts
  dropped as circular.
- TestCPContainer_HostUIDGIDEnv_Emitted anchored on os.Getuid()
  with the same fallback rule.
- TestLogHostIdentity parses zerolog JSON and asserts level/event/
  env/reason/fallback structurally, not substring grep.
- test/e2e/bind_mount_uid_test.go: new Linux-only e2e exercising
  the full chain end-to-end (skips on darwin — virtiofs would
  false-pass).
- Misleading init-ordering comment on HostUID deleted.
- Comments trimmed across consts, controlplane, cp_container,
  agent/init — drop WHAT, keep WHY.
- Postmortem memory: file:line refs scrubbed; follow-ups 2 + 3
  marked DONE.
- CLAUDE.md docs updated to the func-call form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(consts): uid_t-typed host id accessors, neutral diagnostic event

resolveHostID now parses CLAWKER_HOST_UID/GID via ParseUint(_,10,32) so
over-uint32 inputs become a structured "malformed" fallback instead of
silently wrapping when userStage casts to uint32 (closes CodeQL #285,
#286). HostUID()/HostGID() and HostIDResolution.Value return uint32, so
the PipeStage assignment is a total identity at the call site.

Diagnostic event renamed host_uid_unavailable -> host_id_unavailable;
the env field on the record (CLAWKER_HOST_UID or CLAWKER_HOST_GID)
already disambiguates UID vs GID for the operator.

Also collapses the duplicated logHostIdentity doc paragraph and
qualifies the bare HostUID() reference in workspace/setup.go.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(consts,bundler): Linux-only host UID/GID, idempotent groupadd

Host-UID/GID derivation is only meaningful on Linux hosts, where the
container's numeric IDs land directly on bind-mount files. Docker
Desktop on macOS (virtiofs / gRPC FUSE) masks UID/GID at the share
boundary — any container UID is presented on the host as the host
user, so baking the host UID into the image gains nothing and forces
groupadd to claim a GID that often collides with a base-image group
(macOS GID 20 = staff vs Debian dialout). resolveProcessID now
returns fallbackContainerUID/GID on non-Linux hosts, restoring the
pre-fix 1001 path on macOS where virtiofs already handles ownership
translation.

The Dockerfile useradd block is also made idempotent so that an
edge-case Linux host whose UID/GID happens to overlap a base-image
group (low GIDs assigned to system groups) still builds: groupadd at
the host GID falls back to an auto-assigned GID for ${USERNAME},
preserving UPG. The bind-mount writability contract holds on UID
match alone — GID match is best-effort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(serena): postmortem reflects HEAD, soft-closed pending alpha

Append Phase 9 (uid_t-typed accessors + neutral host_id_unavailable
event, ce98cbe), Phase 10 (Linux-only resolveProcessID + idempotent
groupadd, 1cc857c), and Phase 11 (soft-close hand-off). Top-of-file
status banner marks the work complete pending alpha-release verification
on a Linux host — the Linux-only TestBindMountUID_E2E and the Phase 7
host-side manual sequence are the only gates remaining before
hard-close.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test,docs): mirror Linux-only host-ID fallback in test + CLAUDE.md

TestCPContainer_HostUIDGIDEnv_Emitted computed wantUID/wantGID
directly from os.Getuid()/os.Getgid() with only a `<= 0` fallback,
diverging from consts.resolveProcessID's `runtime.GOOS != "linux" =>
1001` short-circuit added in 1cc857c. On macOS where os.Getuid() is
typically 501, the test would assert CLAWKER_HOST_UID=501 while the
production resolver emits 1001 — hard fail on every macOS run. Mirror
the production fallback exactly: branch on runtime.GOOS, then apply
the `> 0` Linux refinement.

internal/config/CLAUDE.md and internal/bundler/CLAUDE.md both
described ContainerUID/GID as Linux-style host-derived without
qualifying the Linux-only gate. Updated wording to call out the
non-Linux unconditional-fallback path with the virtiofs / groupadd
collision rationale so a future reader doesn't try to "fix" the
gate as if it were a bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plugin): drop UID-mismatch known-issue, add monitoring ref, bump 0.10.0

clawker-support
- known-issues.md: removed the "~/.claude/projects bind mount Linux
  UID mismatch" entry; fix landed (host-derived ContainerUID on Linux
  via 1cc857c). Workaround block obsolete.
- monitoring.md (new): OTel + OpenSearch + Prometheus stack overview,
  Clawker analytics workspace navigation, telemetry env var contract
  (build-time bake vs runtime enable), troubleshooting for empty
  indices / failed bootstrap / Prometheus targets / cross-container
  diagnostics. Routed from SKILL.md step 12 and troubleshooting.md.
- SKILL.md / troubleshooting.md: route monitoring questions to the
  new reference; old "punt to docs.clawker.dev" step kept for other
  features (worktrees, etc.).
- plugin.json: 0.9.1 -> 0.10.0 (minor — new reference file,
  user-visible structural change, plus the cd84d2e / 1cc857c
  Dockerfile.tmpl idempotent groupadd block from the UID branch).

docs.clawker.dev (driveby from fix/uid PR)
- container-internals.mdx: drop literal "UID 1001" from the claude
  user description; the Privilege Model table now says "claude
  (host-derived on Linux, 1001 elsewhere)". The literal was wrong
  on every non-1001 Linux host post-fix.
- threat-model.mdx, security.mdx: same hedge — describe the user as
  unprivileged + UID baked at build time with Linux-host-derived
  semantics, instead of asserting a stale literal value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(consts): drop os.Unsetenv from TestResolveHostID

resolveHostID reads env via os.Getenv, which returns "" for both
unset and set-to-"". The "unset" and "empty" cases exercised the
same code path; "unset" used os.Unsetenv which doesn't restore a
prior value at test end — a theoretical cross-test leak if the
probe env were set in the parent shell. Drop the "unset" case +
envSet branching; all cases now drive the env through t.Setenv,
which restores the prior value via t.Cleanup automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants