From 1615f709222de5d244b796284b59620e4a853e9c Mon Sep 17 00:00:00 2001
From: Luther Monson
Date: Sat, 16 May 2026 07:44:33 -0700
Subject: [PATCH 1/2] fix(dind): clean per-job containerd state + add per-repo image cache
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related problems landed together because they're tightly coupled:
the cleanup work would force a full re-pull of kindest/node (~1 GB) per
job without the cache, and the cache trusts cleanup never touches its
long-lived namespaces.

Problem 1: per-job state leak

dind.Server.Stop() destroyed in-memory tracked containers but left the
underlying containerd per-job namespace (ephemerd-dind-) populated with
Image records, leases, snapshots and content blobs. Over ~2 days we
accumulated 73 leaked namespaces holding ~98 GB on a 100 GB VHDX,
blocking new jobs with "no space left on device".

Fix: pkg/dind/cleanup.go's CleanupJobNamespace enumerates and removes
containers (with WithSnapshotCleanup), Image records (drops gc.ref
labels), leases, then snapshots in leaf-first multi-pass order
(containerd refuses to delete a snapshot with children), then content
blobs, then the namespace metadata bucket itself. A 3-attempt
async-GC-catchup retry handles transient FailedPrecondition results from
containerd's eventually-consistent state. On boot, worker mode runs
CleanupStaleDindNamespaces to sweep anything left behind by ungraceful
exits (DeadlineExceeded, SIGKILL, host reboot).

Verified live: 4 consecutive ephpm E2E jobs across two parallel runners
exit-and-clean with zero leftover namespaces, VHDX growth bounded at
~59 GB after extensive testing vs the pre-fix unbounded growth.

Problem 2: per-repo image cache to avoid the re-download tax

Without it, every job would re-pull kindest/node and any other dind
image because the Image records get deleted in step 2 of cleanup above,
dropping the gc.refs on the underlying content blobs.

Fix: pkg/dind/cache.go introduces a per-(provider, repo) long-lived
namespace at ephemerd-dind-cache--. On image pull or container create
that picked up a previously-pulled image, dind mirrors the Image record
into the cache namespace (or refreshes its ephemerd.io/last-accessed
label if already there). The cache's gc.refs keep the content blobs
alive after the per-job namespace is cleaned up, so subsequent jobs in
the same repo get a content-store hit instead of a network pull.

Privacy boundary: containerd namespace isolation means a content blob
referenced only from `dind-cache-foo-private`'s Image records is
invisible to any other namespace's resolver. Two forges with same-named
repos (e.g. github/ephpm vs gitea/ephpm) get distinct cache namespaces;
two repos within a forge get distinct caches keyed by full owner/repo.

Provider + Repo plumb through CreateJobRequest → runtime.CreateConfig →
dind.Config so the cache namespace is derived from the dispatching
forge, not parsed from the runner name.

Cache pruner (cmd/ephemerd/main.go): a goroutine started in worker mode
walks every dind-cache-* namespace every [dind].cache_prune_interval
(default 24h) and evicts Image records whose last-accessed label is
older than [dind].cache_max_age (default 168h / 7 days). Empty cache
namespaces are removed entirely. Records pre-dating the label fall back
to UpdatedAt so a deploy of this change doesn't nuke existing caches on
first prune.

Tests (all green locally with shared in-process containerd):

- TestCleanup_DindNamespaces: image + lease + namespace removal
- TestCleanup_DindNamespaces/StaleSweep: prefix filter, doesn't touch
  cache namespaces or non-dind namespaces
- TestCacheNamespace_FormatAndIsolation: cross-provider + nested-repo
  sanitization, empty-input handling
- TestSanitizeForNamespace_CollapsesAndTrims: no leading/trailing
  separator, no consecutive separators
- TestCache_MirrorAndPrune: full mirror → refresh → backdate → prune →
  empty-namespace cleanup lifecycle
- TestCachePrune_KeepsFreshAndPrefixedOnly: fresh records survive,
  non-cache namespaces untouched
- TestPushHandlerEndToEnd still passes against the shared containerd
  (rewired to sharedTestContainerd because containerd's prometheus
  metrics use a process-global registry; two containerdpkg.New() in one
  test binary panics).
---
 api/v1/ephemerd.pb.go           | 132 ++++++++------
 api/v1/ephemerd.proto           |   5 +
 cmd/ephemerd/main.go            |  47 +++++
 go.mod                          |   2 +-
 pkg/config/config.go            |  31 ++++
 pkg/dind/cache.go               | 237 +++++++++++++++++++++++++
 pkg/dind/cache_test.go          | 239 +++++++++++++++++++++++++
 pkg/dind/cleanup.go             | 302 ++++++++++++++++++++++++++++++++
 pkg/dind/cleanup_test.go        | 135 ++++++++++++++
 pkg/dind/containers.go          |  17 ++
 pkg/dind/dind.go                | 102 ++++++++---
 pkg/dind/registry_e2e_test.go   |  28 +--
 pkg/dind/testcontainerd_test.go |  83 +++++++++
 pkg/runtime/runtime.go          |  12 ++
 pkg/scheduler/dispatch.go       |  15 +-
 pkg/scheduler/dispatch_test.go  |  10 +-
 pkg/scheduler/scheduler.go      |   2 +-
 17 files changed, 1296 insertions(+), 103 deletions(-)
 create mode 100644 pkg/dind/cache.go
 create mode 100644 pkg/dind/cache_test.go
 create mode 100644 pkg/dind/cleanup.go
 create mode 100644 pkg/dind/cleanup_test.go
 create mode 100644 pkg/dind/testcontainerd_test.go

diff --git a/api/v1/ephemerd.pb.go b/api/v1/ephemerd.pb.go
index a03b54e..f6a962e 100644
--- a/api/v1/ephemerd.pb.go
+++ b/api/v1/ephemerd.pb.go
@@ -580,6 +580,11 @@ type CreateJobRequest struct {
 Id
string `protobuf:"bytes,1,opt,name=id,proto3" json:"id,omitempty"` Image string `protobuf:"bytes,2,opt,name=image,proto3" json:"image,omitempty"` JitConfig string `protobuf:"bytes,3,opt,name=jit_config,json=jitConfig,proto3" json:"jit_config,omitempty"` + // provider is the forge name (e.g. "github", "gitea"); together with + // repo it scopes dind's per-repo image cache. + Provider string `protobuf:"bytes,4,opt,name=provider,proto3" json:"provider,omitempty"` + // repo is the forge-native repo path (e.g. "owner/repo"). + Repo string `protobuf:"bytes,5,opt,name=repo,proto3" json:"repo,omitempty"` } func (x *CreateJobRequest) Reset() { @@ -635,6 +640,20 @@ func (x *CreateJobRequest) GetJitConfig() string { return "" } +func (x *CreateJobRequest) GetProvider() string { + if x != nil { + return x.Provider + } + return "" +} + +func (x *CreateJobRequest) GetRepo() string { + if x != nil { + return x.Repo + } + return "" +} + type CreateJobResponse struct { state protoimpl.MessageState sizeCache protoimpl.SizeCache @@ -898,62 +917,65 @@ var file_api_v1_ephemerd_proto_rawDesc = []byte{ 0x6f, 0x77, 0x18, 0x02, 0x20, 0x01, 0x28, 0x08, 0x52, 0x06, 0x66, 0x6f, 0x6c, 0x6c, 0x6f, 0x77, 0x22, 0x1e, 0x0a, 0x08, 0x4c, 0x6f, 0x67, 0x43, 0x68, 0x75, 0x6e, 0x6b, 0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04, 0x64, 0x61, 0x74, 0x61, - 0x22, 0x57, 0x0a, 0x10, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, - 0x75, 0x65, 0x73, 0x74, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, - 0x52, 0x02, 0x69, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, 0x02, 0x20, - 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x12, 0x1d, 0x0a, 0x0a, 0x6a, 0x69, - 0x74, 0x5f, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, - 0x6a, 0x69, 0x74, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x22, 0x13, 0x0a, 0x11, 0x43, 0x72, 0x65, - 0x61, 0x74, 0x65, 0x4a, 
0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, 0x20, - 0x0a, 0x0e, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, - 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, - 0x22, 0x2e, 0x0a, 0x0f, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x12, 0x1b, 0x0a, 0x09, 0x65, 0x78, 0x69, 0x74, 0x5f, 0x63, 0x6f, 0x64, 0x65, - 0x18, 0x01, 0x20, 0x01, 0x28, 0x0d, 0x52, 0x08, 0x65, 0x78, 0x69, 0x74, 0x43, 0x6f, 0x64, 0x65, - 0x22, 0x23, 0x0a, 0x11, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, 0x65, + 0x22, 0x87, 0x01, 0x0a, 0x10, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, - 0x09, 0x52, 0x02, 0x69, 0x64, 0x22, 0x14, 0x0a, 0x12, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, - 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x32, 0xda, 0x02, 0x0a, 0x07, - 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x12, 0x41, 0x0a, 0x06, 0x53, 0x74, 0x61, 0x74, 0x75, - 0x73, 0x12, 0x1a, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, - 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1b, 0x2e, - 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, 0x74, - 0x75, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x4c, 0x69, - 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x12, 0x1c, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, - 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x52, 0x65, 0x71, - 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1d, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, - 0x76, 0x31, 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, - 0x6e, 0x73, 0x65, 0x12, 0x36, 0x0a, 0x06, 0x47, 0x65, 0x74, 0x4a, 
0x6f, 0x62, 0x12, 0x1a, 0x2e, - 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, 0x4a, - 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x10, 0x2e, 0x65, 0x70, 0x68, 0x65, - 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4a, 0x6f, 0x62, 0x12, 0x44, 0x0a, 0x07, 0x4b, - 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x12, 0x1b, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, - 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4b, 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, - 0x65, 0x73, 0x74, 0x1a, 0x1c, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, - 0x31, 0x2e, 0x4b, 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, - 0x65, 0x12, 0x45, 0x0a, 0x0a, 0x47, 0x65, 0x74, 0x4a, 0x6f, 0x62, 0x4c, 0x6f, 0x67, 0x73, 0x12, - 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, - 0x74, 0x4a, 0x6f, 0x62, 0x4c, 0x6f, 0x67, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, - 0x15, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x6f, - 0x67, 0x43, 0x68, 0x75, 0x6e, 0x6b, 0x30, 0x01, 0x32, 0xeb, 0x01, 0x0a, 0x08, 0x44, 0x69, 0x73, - 0x70, 0x61, 0x74, 0x63, 0x68, 0x12, 0x4a, 0x0a, 0x09, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, - 0x6f, 0x62, 0x12, 0x1d, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, - 0x2e, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, - 0x74, 0x1a, 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, - 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, - 0x65, 0x12, 0x44, 0x0a, 0x07, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x12, 0x1b, 0x2e, 0x65, - 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x61, 0x69, 0x74, 0x4a, - 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1c, 0x2e, 0x65, 0x70, 0x68, 0x65, - 0x6d, 
0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x52, - 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x4d, 0x0a, 0x0a, 0x44, 0x65, 0x73, 0x74, 0x72, - 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x12, 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, - 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, 0x65, - 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1f, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, - 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, 0x65, - 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x28, 0x5a, 0x26, 0x67, 0x69, 0x74, 0x68, 0x75, 0x62, - 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x65, 0x70, 0x68, 0x70, 0x6d, 0x2f, 0x65, 0x70, 0x68, 0x65, 0x6d, - 0x65, 0x72, 0x64, 0x2f, 0x61, 0x70, 0x69, 0x2f, 0x76, 0x31, 0x3b, 0x61, 0x70, 0x69, 0x76, 0x31, - 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, + 0x09, 0x52, 0x02, 0x69, 0x64, 0x12, 0x14, 0x0a, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, 0x02, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x12, 0x1d, 0x0a, 0x0a, 0x6a, + 0x69, 0x74, 0x5f, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, + 0x09, 0x6a, 0x69, 0x74, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x1a, 0x0a, 0x08, 0x70, 0x72, + 0x6f, 0x76, 0x69, 0x64, 0x65, 0x72, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x70, 0x72, + 0x6f, 0x76, 0x69, 0x64, 0x65, 0x72, 0x12, 0x12, 0x0a, 0x04, 0x72, 0x65, 0x70, 0x6f, 0x18, 0x05, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x72, 0x65, 0x70, 0x6f, 0x22, 0x13, 0x0a, 0x11, 0x43, 0x72, + 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x22, + 0x20, 0x0a, 0x0e, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, + 0x74, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, + 0x64, 0x22, 0x2e, 0x0a, 0x0f, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 
0x70, + 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x1b, 0x0a, 0x09, 0x65, 0x78, 0x69, 0x74, 0x5f, 0x63, 0x6f, 0x64, + 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0d, 0x52, 0x08, 0x65, 0x78, 0x69, 0x74, 0x43, 0x6f, 0x64, + 0x65, 0x22, 0x23, 0x0a, 0x11, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x64, 0x18, 0x01, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x02, 0x69, 0x64, 0x22, 0x14, 0x0a, 0x12, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, + 0x79, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x32, 0xda, 0x02, 0x0a, + 0x07, 0x43, 0x6f, 0x6e, 0x74, 0x72, 0x6f, 0x6c, 0x12, 0x41, 0x0a, 0x06, 0x53, 0x74, 0x61, 0x74, + 0x75, 0x73, 0x12, 0x1a, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, + 0x2e, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1b, + 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x53, 0x74, 0x61, + 0x74, 0x75, 0x73, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x47, 0x0a, 0x08, 0x4c, + 0x69, 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x12, 0x1c, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, + 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x52, 0x65, + 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1d, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, + 0x2e, 0x76, 0x31, 0x2e, 0x4c, 0x69, 0x73, 0x74, 0x4a, 0x6f, 0x62, 0x73, 0x52, 0x65, 0x73, 0x70, + 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x36, 0x0a, 0x06, 0x47, 0x65, 0x74, 0x4a, 0x6f, 0x62, 0x12, 0x1a, + 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x47, 0x65, 0x74, + 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x10, 0x2e, 0x65, 0x70, 0x68, + 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4a, 0x6f, 0x62, 0x12, 0x44, 0x0a, 0x07, + 0x4b, 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x12, 0x1b, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, + 0x72, 0x64, 0x2e, 0x76, 0x31, 
0x2e, 0x4b, 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, + 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1c, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, + 0x76, 0x31, 0x2e, 0x4b, 0x69, 0x6c, 0x6c, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, + 0x73, 0x65, 0x12, 0x45, 0x0a, 0x0a, 0x47, 0x65, 0x74, 0x4a, 0x6f, 0x62, 0x4c, 0x6f, 0x67, 0x73, + 0x12, 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x47, + 0x65, 0x74, 0x4a, 0x6f, 0x62, 0x4c, 0x6f, 0x67, 0x73, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, + 0x1a, 0x15, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x4c, + 0x6f, 0x67, 0x43, 0x68, 0x75, 0x6e, 0x6b, 0x30, 0x01, 0x32, 0xeb, 0x01, 0x0a, 0x08, 0x44, 0x69, + 0x73, 0x70, 0x61, 0x74, 0x63, 0x68, 0x12, 0x4a, 0x0a, 0x09, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, + 0x4a, 0x6f, 0x62, 0x12, 0x1d, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, + 0x31, 0x2e, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, + 0x73, 0x74, 0x1a, 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, + 0x2e, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, + 0x73, 0x65, 0x12, 0x44, 0x0a, 0x07, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, 0x12, 0x1b, 0x2e, + 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x61, 0x69, 0x74, + 0x4a, 0x6f, 0x62, 0x52, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1c, 0x2e, 0x65, 0x70, 0x68, + 0x65, 0x6d, 0x65, 0x72, 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x57, 0x61, 0x69, 0x74, 0x4a, 0x6f, 0x62, + 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x4d, 0x0a, 0x0a, 0x44, 0x65, 0x73, 0x74, + 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x12, 0x1e, 0x2e, 0x65, 0x70, 0x68, 0x65, 0x6d, 0x65, 0x72, + 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, + 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x1f, 0x2e, 0x65, 0x70, 
0x68, 0x65, 0x6d, 0x65, 0x72, + 0x64, 0x2e, 0x76, 0x31, 0x2e, 0x44, 0x65, 0x73, 0x74, 0x72, 0x6f, 0x79, 0x4a, 0x6f, 0x62, 0x52, + 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x42, 0x28, 0x5a, 0x26, 0x67, 0x69, 0x74, 0x68, 0x75, + 0x62, 0x2e, 0x63, 0x6f, 0x6d, 0x2f, 0x65, 0x70, 0x68, 0x70, 0x6d, 0x2f, 0x65, 0x70, 0x68, 0x65, + 0x6d, 0x65, 0x72, 0x64, 0x2f, 0x61, 0x70, 0x69, 0x2f, 0x76, 0x31, 0x3b, 0x61, 0x70, 0x69, 0x76, + 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, } var ( diff --git a/api/v1/ephemerd.proto b/api/v1/ephemerd.proto index dc94844..9350aa1 100644 --- a/api/v1/ephemerd.proto +++ b/api/v1/ephemerd.proto @@ -72,6 +72,11 @@ message CreateJobRequest { string id = 1; string image = 2; string jit_config = 3; + // provider is the forge name (e.g. "github", "gitea"); together with + // repo it scopes dind's per-repo image cache. + string provider = 4; + // repo is the forge-native repo path (e.g. "owner/repo"). + string repo = 5; } message CreateJobResponse {} diff --git a/cmd/ephemerd/main.go b/cmd/ephemerd/main.go index 21471d1..8567d4e 100644 --- a/cmd/ephemerd/main.go +++ b/cmd/ephemerd/main.go @@ -11,12 +11,14 @@ import ( "syscall" "time" + containerdclient "github.com/containerd/containerd/v2/client" apiv1 "github.com/ephpm/ephemerd/api/v1" "github.com/ephpm/ephemerd/pkg/artifacts" "github.com/ephpm/ephemerd/pkg/buildkit" "github.com/ephpm/ephemerd/pkg/cni" "github.com/ephpm/ephemerd/pkg/config" "github.com/ephpm/ephemerd/pkg/containerd" + "github.com/ephpm/ephemerd/pkg/dind" "github.com/ephpm/ephemerd/pkg/github" "github.com/ephpm/ephemerd/pkg/metrics" "github.com/ephpm/ephemerd/pkg/networking" @@ -288,6 +290,27 @@ func serve(ctx context.Context, configFile, imagesDirFlag string, containerdTCPP log.Warn("failed to clean orphan containers", "error", err) } + // Clean up dind per-job namespaces left by jobs that didn't shut + // down cleanly on the previous boot (DeadlineExceeded, SIGKILL, + // host reboot, etc.). 
Server.Stop's CleanupJobNamespace covers the + // graceful path; this catches everything else. Without this, every + // ungraceful exit accumulates ~1 GB of pinned image content and the + // namespace metadata bucket; we observed 73 leaked namespaces on + // a host that filled its 100 GB VHDX over a couple of days. + cleanupCtx, cancelCleanup := context.WithTimeout(ctx, 5*time.Minute) + dind.CleanupStaleDindNamespaces(cleanupCtx, rt.Client(), log) + cancelCleanup() + + // Periodic per-repo image cache pruner. Each cache namespace + // (ephemerd-dind-cache--) is scanned every + // CachePruneInterval, and any image record whose last-accessed + // label is older than CacheMaxAge gets dropped; containerd's + // content GC reclaims the unreferenced blobs. Empty cache + // namespaces get removed entirely. + if interval := cfg.Dind.DindCachePruneInterval(); interval > 0 && cfg.Dind.DindCacheMaxAge() > 0 { + go runDindCachePruner(ctx, rt.Client(), interval, cfg.Dind.DindCacheMaxAge(), log) + } + dispatchPort := int(containerdTCPPort) + 1 dispatchCleanup := scheduler.StartDispatchServer(dispatchPort, rt, log) defer dispatchCleanup() @@ -671,6 +694,30 @@ func initProviders(cfg *config.Config, log *slog.Logger) ([]providers.Provider, } +// runDindCachePruner runs the per-repo image cache pruner on a fixed +// interval until ctx is canceled. Called in worker mode so each Linux VM +// keeps its dind image cache bounded. Errors from a single pass are +// logged and the loop continues; the next tick retries. 
+func runDindCachePruner(ctx context.Context, c *containerdclient.Client, interval, maxAge time.Duration, log *slog.Logger) { + log = log.With("component", "dind-cache-pruner", "interval", interval, "max_age", maxAge) + log.Info("starting dind cache pruner") + ticker := time.NewTicker(interval) + defer ticker.Stop() + for { + select { + case <-ctx.Done(): + log.Info("dind cache pruner stopping") + return + case <-ticker.C: + passCtx, cancel := context.WithTimeout(ctx, 5*time.Minute) + if err := dind.CachePrune(passCtx, c, maxAge, log); err != nil { + log.Warn("dind cache prune pass failed", "error", err) + } + cancel() + } + } +} + // pollInterval returns the poll interval for the configured provider. func pollInterval(cfg *config.Config) time.Duration { switch cfg.Provider() { case "github": diff --git a/go.mod b/go.mod index 1e5b4c6..a9347dc 100644 --- a/go.mod +++ b/go.mod @@ -8,6 +8,7 @@ require ( github.com/Microsoft/go-winio v0.6.2 github.com/Microsoft/hcsshim v0.14.0-rc.1 github.com/containerd/containerd/v2 v2.2.2 + github.com/containerd/errdefs v1.0.0 github.com/containerd/go-cni v1.1.13 github.com/containerd/platforms v1.0.0-rc.2 github.com/golang/protobuf v1.5.4 @@ -58,7 +59,6 @@ require ( github.com/containerd/console v1.0.5 // indirect github.com/containerd/containerd/api v1.10.0 // indirect github.com/containerd/continuity v0.4.5 // indirect - github.com/containerd/errdefs v1.0.0 // indirect github.com/containerd/errdefs/pkg v0.3.0 // indirect github.com/containerd/fifo v1.1.0 // indirect github.com/containerd/go-runc v1.1.0 // indirect diff --git a/pkg/config/config.go b/pkg/config/config.go index 4f6a01d..0a6e295 100644 --- a/pkg/config/config.go +++ b/pkg/config/config.go @@ -100,6 +100,37 @@ type ContainerdConfig struct { // DindConfig configures the fake Docker daemon mounted into job containers. 
type DindConfig struct { Enabled bool `toml:"enabled"` // mount /var/run/docker.sock with a fake Docker API + + // CachePruneInterval is how often the per-repo image cache pruner runs. 
+ // Accepts standard Go duration strings ("24h", "30m"). Zero or unset + // selects the default of 24h; a negative value disables the pruner. + CachePruneInterval time.Duration `toml:"cache_prune_interval"` + + // CacheMaxAge is the eviction threshold for cached image records: + // any record whose ephemerd.io/last-accessed label (or UpdatedAt as + // fallback) is older than this gets removed on the next prune pass. + // Containerd's content GC then reclaims the unreferenced blobs. + // Zero or unset selects the default of 168h (7 days); a negative + // value disables the pruner, including empty-namespace cleanup. + CacheMaxAge time.Duration `toml:"cache_max_age"` +} + +// DindCachePruneInterval returns the prune interval with the default +// applied when unset (or set to 0). +func (d *DindConfig) DindCachePruneInterval() time.Duration { + if d.CachePruneInterval == 0 { + return 24 * time.Hour + } + return d.CachePruneInterval +} + +// DindCacheMaxAge returns the eviction threshold with the default applied +// when unset (or set to 0). +func (d *DindConfig) DindCacheMaxAge() time.Duration { + if d.CacheMaxAge == 0 { + return 7 * 24 * time.Hour + } + return d.CacheMaxAge } // ModuleProxyConfig configures the Go module caching proxy. diff --git a/pkg/dind/cache.go b/pkg/dind/cache.go new file mode 100644 index 0000000..8f5b36d --- /dev/null +++ b/pkg/dind/cache.go @@ -0,0 +1,237 @@ +package dind + +import ( + "context" + "fmt" + "log/slog" + "strings" + "time" + + "github.com/containerd/containerd/v2/client" + "github.com/containerd/containerd/v2/core/images" + "github.com/containerd/containerd/v2/pkg/namespaces" + "github.com/containerd/errdefs" +) + +// DindCacheNamespacePrefix prefixes every per-repo image cache namespace. 
+// +// Full namespace name format: +// +// ephemerd-dind-cache-- +// +// Examples: +// +// ephemerd-dind-cache-github-ephpm_ephpm +// ephemerd-dind-cache-gitea-ephpm_ephpm (distinct from the github one) +// ephemerd-dind-cache-gitlab-acme_platform_api (nested GitLab groups OK) +// +// Provider + repo together form the privacy boundary: two different forges +// with same-named repos do NOT share a cache, and two different orgs on the +// same forge get separate caches keyed by the full `owner/repo` path. +const DindCacheNamespacePrefix = "ephemerd-dind-cache-" + +// LastAccessedLabel records the most recent time an Image record in a cache +// namespace was touched (pull or container-create). The pruner uses this +// for LRU eviction. RFC3339-formatted, UTC. +const LastAccessedLabel = "ephemerd.io/last-accessed" + +// CacheNamespace returns the containerd namespace name used to cache image +// metadata for a given (provider, repo) pair. Both inputs are sanitized so +// the result is always a valid containerd namespace identifier (regex: +// ^[A-Za-z0-9]+(?:[._-][A-Za-z0-9]+)*$). +// +// Provider should be the value from providers.Provider.Name() (e.g. +// "github", "gitea"). Repo is the forge-native repo path (e.g. +// "owner/repo" on GitHub or "group/subgroup/project" on GitLab); path +// separators are mapped to underscores so the namespace identifier stays +// valid. Empty provider or repo returns ""; callers should treat that as +// "caching disabled for this job". +func CacheNamespace(provider, repo string) string { + provider = sanitizeForNamespace(provider) + repo = sanitizeForNamespace(repo) + if provider == "" || repo == "" { + return "" + } + return DindCacheNamespacePrefix + provider + "-" + repo +} + +// sanitizeForNamespace replaces every character that's not allowed in a +// containerd namespace identifier with an underscore, then collapses runs +// of underscores and trims leading/trailing ones. 
Containerd allows +// alphanumerics with `_`, `-`, `.` between them. +func sanitizeForNamespace(s string) string { + if s == "" { + return "" + } + out := make([]byte, 0, len(s)) + for i := 0; i < len(s); i++ { + c := s[i] + switch { + case c >= 'a' && c <= 'z', + c >= 'A' && c <= 'Z', + c >= '0' && c <= '9', + c == '-', c == '.': + out = append(out, c) + default: + out = append(out, '_') + } + } + // Collapse repeated separators and trim leading/trailing ones so + // containerd's regex (which forbids consecutive separators outside + // alphanumeric runs) accepts the result. + collapsed := make([]byte, 0, len(out)) + var prev byte + for _, c := range out { + if (c == '_' || c == '-' || c == '.') && (prev == '_' || prev == '-' || prev == '.') { + continue + } + collapsed = append(collapsed, c) + prev = c + } + return strings.Trim(string(collapsed), "_-.") +} + +// MirrorImageToCache copies an Image record from the per-job namespace into +// the per-repo cache namespace (creating it if needed), refreshing the +// LastAccessedLabel on the cache record. The underlying content blobs are +// already in the global content store from the original pull; this only +// adds metadata so the cache record's gc.ref labels keep the content alive +// after the per-job namespace is cleaned up. +// +// Returns nil if the cache namespace name is empty (no provider/repo set). 
+func MirrorImageToCache(ctx context.Context, c *client.Client, jobNS, cacheNS, imageName string, log *slog.Logger) error { + if c == nil || cacheNS == "" || imageName == "" { + return nil + } + jobCtx := namespaces.WithNamespace(ctx, jobNS) + jobImg, err := c.ImageService().Get(jobCtx, imageName) + if err != nil { + return fmt.Errorf("get image %q in %s: %w", imageName, jobNS, err) + } + + cacheCtx := namespaces.WithNamespace(ctx, cacheNS) + now := time.Now().UTC().Format(time.RFC3339) + if jobImg.Labels == nil { + jobImg.Labels = map[string]string{} + } + jobImg.Labels[LastAccessedLabel] = now + + // Try Create first. If the image already exists in the cache (re-pull + // of an already-cached tag), Create returns AlreadyExists and we + // Update the existing record instead so the LastAccessedLabel refresh + // takes effect. + if _, cerr := c.ImageService().Create(cacheCtx, jobImg); cerr != nil { + if !errdefs.IsAlreadyExists(cerr) { + return fmt.Errorf("create image %q in %s: %w", imageName, cacheNS, cerr) + } + if _, uerr := c.ImageService().Update(cacheCtx, jobImg, "labels", "target"); uerr != nil { + return fmt.Errorf("update image %q in %s: %w", imageName, cacheNS, uerr) + } + } + log.Debug("dind cache: mirrored image", "image", imageName, "cache", cacheNS) + return nil +} + +// RefreshLastAccessed bumps the LastAccessedLabel on a cached image. Called +// from the container-create path when a job references an image that's +// already in the cache (no pull happens, but the image is in use). Silently +// no-ops if the image isn't in the cache. 
+func RefreshLastAccessed(ctx context.Context, c *client.Client, cacheNS, imageName string, log *slog.Logger) { + if c == nil || cacheNS == "" || imageName == "" { + return + } + cacheCtx := namespaces.WithNamespace(ctx, cacheNS) + img, err := c.ImageService().Get(cacheCtx, imageName) + if err != nil { + if !errdefs.IsNotFound(err) { + log.Debug("dind cache: refresh get", "image", imageName, "cache", cacheNS, "error", err) + } + return + } + if img.Labels == nil { + img.Labels = map[string]string{} + } + img.Labels[LastAccessedLabel] = time.Now().UTC().Format(time.RFC3339) + if _, err := c.ImageService().Update(cacheCtx, img, "labels"); err != nil { + log.Debug("dind cache: refresh update", "image", imageName, "cache", cacheNS, "error", err) + } +} + +// CachePrune walks every per-repo cache namespace and evicts Image records +// whose LastAccessedLabel (or UpdatedAt fallback for records pre-dating the +// label) is older than maxAge. Empty cache namespaces are deleted entirely. +// Containerd's content GC reclaims the unreferenced blobs after this runs. +// +// Returns nil and logs warnings on partial failures; the next pass will +// retry whatever didn't clean up this time. 
+func CachePrune(ctx context.Context, c *client.Client, maxAge time.Duration, log *slog.Logger) error { + if c == nil || maxAge <= 0 { + return nil + } + all, err := c.NamespaceService().List(ctx) + if err != nil { + return fmt.Errorf("list namespaces: %w", err) + } + + cutoff := time.Now().UTC().Add(-maxAge) + totalEvicted := 0 + namespacesPruned := 0 + + for _, ns := range all { + if !strings.HasPrefix(ns, DindCacheNamespacePrefix) { + continue + } + nsCtx := namespaces.WithNamespace(ctx, ns) + imgs, ierr := c.ImageService().List(nsCtx) + if ierr != nil { + log.Warn("cache prune: list images", "namespace", ns, "error", ierr) + continue + } + evicted := 0 + for _, img := range imgs { + ts := imageLastAccessed(img) + if ts.IsZero() || ts.After(cutoff) { + continue + } + if derr := c.ImageService().Delete(nsCtx, img.Name); derr != nil && !errdefs.IsNotFound(derr) { + log.Warn("cache prune: delete image", + "namespace", ns, "image", img.Name, "error", derr) + continue + } + evicted++ + } + if evicted > 0 { + log.Info("cache prune: evicted images", + "namespace", ns, "count", evicted, "max_age", maxAge) + } + totalEvicted += evicted + + // If the cache namespace is now empty, drop the metadata bucket + // too so it doesn't accumulate one stale bucket per repo that + // ever ran a job, even if the repo itself goes idle. + remaining, lerr := c.ImageService().List(nsCtx) + if lerr == nil && len(remaining) == 0 { + CleanupJobNamespace(ctx, c, ns, log) + namespacesPruned++ + } + } + + if totalEvicted > 0 || namespacesPruned > 0 { + log.Info("cache prune: complete", + "images_evicted", totalEvicted, "namespaces_pruned", namespacesPruned) + } + return nil +} + +// imageLastAccessed returns the timestamp the image was last used. Prefers +// the LastAccessedLabel; falls back to img.UpdatedAt for records that pre- +// date the label (so existing caches don't get nuked on the first prune +// after this code lands). 
+func imageLastAccessed(img images.Image) time.Time { + if ts := img.Labels[LastAccessedLabel]; ts != "" { + if t, err := time.Parse(time.RFC3339, ts); err == nil { + return t.UTC() + } + } + return img.UpdatedAt.UTC() +} diff --git a/pkg/dind/cache_test.go b/pkg/dind/cache_test.go new file mode 100644 index 0000000..fa9aa8b --- /dev/null +++ b/pkg/dind/cache_test.go @@ -0,0 +1,239 @@ +//go:build !darwin + +package dind + +import ( + "context" + "log/slog" + "os" + "slices" + "strings" + "testing" + "time" + + "github.com/containerd/containerd/v2/core/images" + "github.com/containerd/containerd/v2/pkg/namespaces" + "github.com/opencontainers/go-digest" + ocispec "github.com/opencontainers/image-spec/specs-go/v1" +) + +func TestCacheNamespace_FormatAndIsolation(t *testing.T) { + tests := []struct { + name, provider, repo, want string + }{ + {"github simple", "github", "ephpm/ephpm", "ephemerd-dind-cache-github-ephpm_ephpm"}, + {"gitea same name not collision", "gitea", "ephpm/ephpm", "ephemerd-dind-cache-gitea-ephpm_ephpm"}, + {"gitlab nested", "gitlab", "acme/platform/api", "ephemerd-dind-cache-gitlab-acme_platform_api"}, + {"upper preserved", "GitHub", "Org/Repo", "ephemerd-dind-cache-GitHub-Org_Repo"}, + {"weird chars sanitized", "github", "owner/repo@1", "ephemerd-dind-cache-github-owner_repo_1"}, + {"leading separator trimmed", "github", "/leading", "ephemerd-dind-cache-github-leading"}, + {"trailing separator trimmed", "github", "trailing/", "ephemerd-dind-cache-github-trailing"}, + {"empty provider", "", "any/repo", ""}, + {"empty repo", "github", "", ""}, + {"only-bad chars", "github", "////", ""}, + } + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + got := CacheNamespace(tc.provider, tc.repo) + if got != tc.want { + t.Errorf("CacheNamespace(%q, %q) = %q, want %q", tc.provider, tc.repo, got, tc.want) + } + }) + } + + // Cross-provider isolation invariant: same repo path on different + // providers MUST produce distinct namespaces. 
+ if a, b := CacheNamespace("github", "ephpm/ephpm"), CacheNamespace("gitea", "ephpm/ephpm"); a == b { + t.Errorf("cross-provider collision: github and gitea both produced %q", a) + } + // Same-provider distinct repos isolated too. + if a, b := CacheNamespace("github", "foo/bar"), CacheNamespace("github", "foo/baz"); a == b { + t.Errorf("same-provider distinct repos collided: %q", a) + } +} + +// TestCache_MirrorAndPrune drives the full cache lifecycle: mirror an +// image record into a per-repo cache namespace, refresh its last-accessed, +// then exercise CachePrune to evict an artificially-aged record. +func TestCache_MirrorAndPrune(t *testing.T) { + if testing.Short() { + t.Skip("skipping cache test in short mode") + } + + log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo})) + c := sharedTestContainerd(t) + + const ( + provider = "github" + repo = "ephpm/ephpm" + jobID = "ephemerd-github-ephpm-test-mirror" + imgName = "ghcr.io/ephpm/cache-test:1.0" + ) + jobNS := DindNamespacePrefix + "test-mirror" + cacheNS := CacheNamespace(provider, repo) + if cacheNS != "ephemerd-dind-cache-github-ephpm_ephpm" { + t.Fatalf("unexpected cacheNS: %q", cacheNS) + } + + // Stage an Image record in the per-job namespace, then mirror it. 
+ jobCtx, cancel := context.WithTimeout( + namespaces.WithNamespace(context.Background(), jobNS), + 60*time.Second, + ) + defer cancel() + imgRecord := images.Image{ + Name: imgName, + Target: ocispec.Descriptor{ + MediaType: ocispec.MediaTypeImageManifest, + Digest: digest.FromString("cache-mirror-test-manifest"), + Size: 256, + }, + } + if _, err := c.ImageService().Create(jobCtx, imgRecord); err != nil { + t.Fatalf("create job image: %v", err) + } + + if err := MirrorImageToCache(context.Background(), c, jobNS, cacheNS, imgName, log); err != nil { + t.Fatalf("MirrorImageToCache: %v", err) + } + + cacheCtx := namespaces.WithNamespace(context.Background(), cacheNS) + mirrored, err := c.ImageService().Get(cacheCtx, imgName) + if err != nil { + t.Fatalf("get cache image: %v", err) + } + if mirrored.Labels[LastAccessedLabel] == "" { + t.Errorf("last-accessed label not set after mirror; labels=%v", mirrored.Labels) + } + + // Idempotent: mirroring again should not error and should refresh the label. + firstTS := mirrored.Labels[LastAccessedLabel] + time.Sleep(1100 * time.Millisecond) // RFC3339 is second-precision + if err := MirrorImageToCache(context.Background(), c, jobNS, cacheNS, imgName, log); err != nil { + t.Fatalf("re-mirror: %v", err) + } + again, _ := c.ImageService().Get(cacheCtx, imgName) + if again.Labels[LastAccessedLabel] == firstTS { + t.Errorf("last-accessed didn't advance on re-mirror: still %q", firstTS) + } + + // Force-age the cache record so CachePrune evicts it. Set label to + // 10 days ago; prune threshold of 7 days should trip. 
+ old := time.Now().Add(-10 * 24 * time.Hour).UTC().Format(time.RFC3339) + again.Labels[LastAccessedLabel] = old + if _, err := c.ImageService().Update(cacheCtx, again, "labels"); err != nil { + t.Fatalf("backdate label: %v", err) + } + + if err := CachePrune(context.Background(), c, 7*24*time.Hour, log); err != nil { + t.Fatalf("CachePrune: %v", err) + } + + // After prune the image record should be gone, and the now-empty + // cache namespace should have been removed too. + if _, err := c.ImageService().Get(cacheCtx, imgName); err == nil { + t.Errorf("image still present after prune") + } + list, lerr := c.NamespaceService().List(context.Background()) + if lerr != nil { + t.Fatalf("list namespaces: %v", lerr) + } + if slices.Contains(list, cacheNS) { + t.Errorf("empty cache namespace %q should have been cleaned up; got %v", cacheNS, list) + } + + // Job ns is untouched by cache prune. + if _, err := c.ImageService().Get(jobCtx, imgName); err != nil { + t.Errorf("job-namespace image was incorrectly removed: %v", err) + } +} + +// TestCachePrune_KeepsFreshAndPrefixedOnly verifies CachePrune doesn't +// touch fresh entries OR non-cache namespaces. +func TestCachePrune_KeepsFreshAndPrefixedOnly(t *testing.T) { + if testing.Short() { + t.Skip("skipping cache test in short mode") + } + log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo})) + c := sharedTestContainerd(t) + + cacheNS := CacheNamespace("github", "ephpm/keep-fresh") + otherNS := "non-cache-namespace-keep" + + cacheCtx := namespaces.WithNamespace(context.Background(), cacheNS) + otherCtx := namespaces.WithNamespace(context.Background(), otherNS) + + // Fresh cache image — last-accessed = now. 
+ fresh := images.Image{ + Name: "ghcr.io/ephpm/fresh:tag", + Target: ocispec.Descriptor{ + MediaType: ocispec.MediaTypeImageManifest, + Digest: digest.FromString("cache-prune-fresh-manifest"), + Size: 99, + }, + Labels: map[string]string{LastAccessedLabel: time.Now().UTC().Format(time.RFC3339)}, + } + if _, err := c.ImageService().Create(cacheCtx, fresh); err != nil { + t.Fatalf("create fresh cache image: %v", err) + } + // Non-cache namespace with an image — must not be touched. + other := images.Image{ + Name: "ghcr.io/something/else:v1", + Target: ocispec.Descriptor{ + MediaType: ocispec.MediaTypeImageManifest, + Digest: digest.FromString("cache-prune-other-manifest"), + Size: 99, + }, + } + if _, err := c.ImageService().Create(otherCtx, other); err != nil { + t.Fatalf("create other-ns image: %v", err) + } + + if err := CachePrune(context.Background(), c, 7*24*time.Hour, log); err != nil { + t.Fatalf("CachePrune: %v", err) + } + + if _, err := c.ImageService().Get(cacheCtx, fresh.Name); err != nil { + t.Errorf("fresh cache image was incorrectly evicted: %v", err) + } + if _, err := c.ImageService().Get(otherCtx, other.Name); err != nil { + t.Errorf("non-cache image was incorrectly touched: %v", err) + } +} + +// TestSanitizeForNamespace_CollapsesAndTrims is a focused unit test on the +// sanitization helper so the cross-provider isolation invariant has a +// dedicated regression target. 
+func TestSanitizeForNamespace_CollapsesAndTrims(t *testing.T) { + cases := map[string]string{ + "": "", + "plain": "plain", + "with-dashes": "with-dashes", + "with.dots": "with.dots", + "slash/inside": "slash_inside", + "a//b": "a_b", + "___leading": "leading", + "trailing___": "trailing", + "!!!only-special!!!": "only-special", + "acme/platform/api": "acme_platform_api", + } + for in, want := range cases { + if got := sanitizeForNamespace(in); got != want { + t.Errorf("sanitizeForNamespace(%q) = %q, want %q", in, got, want) + } + } + // Containerd's namespace identifier regex: alphanumerics with + // . _ - separators, no consecutive separators, no leading/trailing. + for in := range cases { + got := sanitizeForNamespace(in) + if got == "" { + continue + } + if strings.HasPrefix(got, "_") || strings.HasPrefix(got, "-") || strings.HasPrefix(got, ".") { + t.Errorf("sanitizeForNamespace(%q) = %q starts with separator", in, got) + } + if strings.HasSuffix(got, "_") || strings.HasSuffix(got, "-") || strings.HasSuffix(got, ".") { + t.Errorf("sanitizeForNamespace(%q) = %q ends with separator", in, got) + } + } +} diff --git a/pkg/dind/cleanup.go b/pkg/dind/cleanup.go new file mode 100644 index 0000000..a3d8c41 --- /dev/null +++ b/pkg/dind/cleanup.go @@ -0,0 +1,302 @@ +package dind + +import ( + "context" + "fmt" + "log/slog" + goruntime "runtime" + "strings" + "time" + + "github.com/containerd/containerd/v2/client" + "github.com/containerd/containerd/v2/core/content" + "github.com/containerd/containerd/v2/core/snapshots" + "github.com/containerd/containerd/v2/pkg/namespaces" + "github.com/containerd/errdefs" + "github.com/opencontainers/go-digest" +) + +// DindNamespacePrefix is the prefix of every per-job containerd namespace the +// dind subsystem creates. Each running ephpm-style job has its own namespace +// (e.g. "ephemerd-dind-ephemerd-github-ephpm-fast_shannon") so containers, +// images, and leases from one job can't pin disk against another's. 
+const DindNamespacePrefix = "ephemerd-dind-" + +// CleanupJobNamespace removes everything inside a per-job dind namespace and +// then the namespace metadata bucket itself. Safe to call multiple times and +// safe to call on a namespace that contains stragglers from prior crashes — +// each step logs and continues on error rather than bailing partway through. +// +// Order matters: +// 1. Containers (with their tasks + snapshots) — releases the overlayfs +// upperdirs that hold container rootfs writes. +// 2. Images — drops the gc.ref labels that pin manifest+config+layer blobs. +// 3. Leases — releases any explicit content holds (buildkit etc. take these +// during pulls/builds). +// 4. Snapshots — orphan snapshots from layers that were unpinned in step 2 +// don't get reclaimed by containerd's async GC fast enough; the +// NamespaceService.Delete in step 6 would fail with FailedPrecondition +// until those snapshots are gone. Walk the snapshotter and remove them +// explicitly. +// 5. Content blobs — same story as snapshots; the async GC won't have +// swept by the time we try to delete the namespace. +// 6. NamespaceService.Delete — drops the metadata bucket. Will only succeed +// if 1-5 left the namespace truly empty; on failure we log and leave +// the bucket so a subsequent boot's CleanupStaleDindNamespaces can retry. +func CleanupJobNamespace(ctx context.Context, c *client.Client, ns string, log *slog.Logger) { + if c == nil || ns == "" { + return + } + log = log.With("namespace", ns) + nsCtx := namespaces.WithNamespace(ctx, ns) + + // 1. Containers + tasks + snapshots-attached-to-containers. 
+ containers, err := c.Containers(nsCtx) + if err != nil && !errdefs.IsNotFound(err) { + log.Warn("dind cleanup: list containers", "error", err) + } + for _, cnt := range containers { + id := cnt.ID() + if task, terr := cnt.Task(nsCtx, nil); terr == nil && task != nil { + if status, serr := task.Status(nsCtx); serr == nil && status.Status == client.Running { + if kerr := task.Kill(nsCtx, 9); kerr != nil { + log.Debug("dind cleanup: task kill", "container", id, "error", kerr) + } + if exitCh, werr := task.Wait(nsCtx); werr == nil { + <-exitCh + } + } + if _, derr := task.Delete(nsCtx, client.WithProcessKill); derr != nil { + log.Debug("dind cleanup: task delete", "container", id, "error", derr) + } + } + if derr := cnt.Delete(nsCtx, client.WithSnapshotCleanup); derr != nil { + log.Warn("dind cleanup: container delete", "container", id, "error", derr) + } + } + + // 2. Images. Each Image record holds gc.ref labels to its manifest + + // config + layer blobs; deleting the image releases those refs. + if imgs, ierr := c.ImageService().List(nsCtx); ierr != nil && !errdefs.IsNotFound(ierr) { + log.Warn("dind cleanup: list images", "error", ierr) + } else { + for _, img := range imgs { + if derr := c.ImageService().Delete(nsCtx, img.Name); derr != nil && !errdefs.IsNotFound(derr) { + log.Warn("dind cleanup: image delete", "image", img.Name, "error", derr) + } + } + } + + // 3. Leases. + leasesSvc := c.LeasesService() + if ls, lerr := leasesSvc.List(nsCtx); lerr != nil && !errdefs.IsNotFound(lerr) { + log.Warn("dind cleanup: list leases", "error", lerr) + } else { + for _, l := range ls { + if derr := leasesSvc.Delete(nsCtx, l); derr != nil && !errdefs.IsNotFound(derr) { + log.Warn("dind cleanup: lease delete", "lease", l.ID, "error", derr) + } + } + } + + // 4. Snapshots. Containerd's namespace-Delete requires the snapshotter + // to also report empty, but async GC won't have swept the snapshots + // that 1-3 unpinned. 
Image layer snapshots form a parent-child tree + // (each layer is a child of the one below) and containerd refuses to + // delete a snapshot that still has children, so we have to remove + // leaves-first. Iterate until either the snapshotter is empty or a + // pass makes no progress (something else is pinning a node). + for _, snName := range snapshotterNames() { + snSvc := c.SnapshotService(snName) + if snSvc == nil { + continue + } + // Bound the loop at len(keys) passes — each pass that makes + // progress removes at least one leaf, so it can't take more + // than O(len) passes to drain a tree. + for pass := 0; ; pass++ { + var keys []string + walkErr := snSvc.Walk(nsCtx, func(_ context.Context, info snapshots.Info) error { + keys = append(keys, info.Name) + return nil + }) + if walkErr != nil && !errdefs.IsNotFound(walkErr) { + log.Warn("dind cleanup: walk snapshots", "snapshotter", snName, "error", walkErr) + break + } + if len(keys) == 0 { + break + } + if pass > len(keys)+1 { + // Defensive: shouldn't happen for valid trees, but guard + // against a pathological case that would loop forever. + log.Warn("dind cleanup: snapshot removal not converging", + "snapshotter", snName, "remaining", len(keys)) + break + } + progress := false + for _, key := range keys { + if derr := snSvc.Remove(nsCtx, key); derr != nil { + if errdefs.IsNotFound(derr) { + continue + } + if errdefs.IsFailedPrecondition(derr) { + // Parent of an as-yet-unremoved child. Skip; + // next pass will catch it once the leaf goes. + continue + } + log.Warn("dind cleanup: snapshot remove", + "snapshotter", snName, "key", key, "error", derr) + continue + } + progress = true + } + if !progress { + // Log per-snapshot detail so we can tell whether the stuck + // snapshot is a kindest/node tmpfs view, a buildkit-managed + // snapshot, or something else. Reproduces only the stuck + // ones (the leaves we already removed are gone). 
+ for _, key := range keys { + info, statErr := snSvc.Stat(nsCtx, key) + if statErr != nil { + log.Warn("dind cleanup: stuck snapshot stat failed", + "snapshotter", snName, "key", key, "error", statErr) + continue + } + log.Warn("dind cleanup: stuck snapshot", + "snapshotter", snName, + "key", info.Name, + "parent", info.Parent, + "kind", info.Kind.String(), + "labels", info.Labels) + } + break + } + } + } + + // 4b. Best-effort retry of snapshot removal after a short delay — gives + // containerd's async GC a moment to release anything we couldn't + // directly remove (e.g. a recently-unmounted view that hasn't propagated + // yet). Bounded so we don't sit here forever on a genuinely stuck node. + for attempt := 0; attempt < 3; attempt++ { + stillThere := false + for _, snName := range snapshotterNames() { + snSvc := c.SnapshotService(snName) + if snSvc == nil { + continue + } + if walkErr := snSvc.Walk(nsCtx, func(_ context.Context, info snapshots.Info) error { + stillThere = true + if derr := snSvc.Remove(nsCtx, info.Name); derr != nil && !errdefs.IsNotFound(derr) && !errdefs.IsFailedPrecondition(derr) { + log.Debug("dind cleanup: retry snapshot remove", + "snapshotter", snName, "key", info.Name, "error", derr) + } + return nil + }); walkErr != nil { + log.Debug("dind cleanup: retry walk", "snapshotter", snName, "error", walkErr) + } + } + if !stillThere { + break + } + time.Sleep(time.Duration(attempt+1) * time.Second) + } + + // 5. Content blobs. Same story — gc.ref labels were dropped in step 2, + // but the content store's actual blob entries linger until the next + // GC pass. Walk and delete. 
+ cs := c.ContentStore() + if cs != nil { + var digests []string + walkErr := cs.Walk(nsCtx, func(info content.Info) error { + digests = append(digests, info.Digest.String()) + return nil + }) + if walkErr != nil && !errdefs.IsNotFound(walkErr) { + log.Warn("dind cleanup: walk content", "error", walkErr) + } + for _, d := range digests { + dgst, perr := digest.Parse(d) + if perr != nil { + log.Debug("dind cleanup: parse content digest", "digest", d, "error", perr) + continue + } + if derr := cs.Delete(nsCtx, dgst); derr != nil && !errdefs.IsNotFound(derr) { + // Content is reference-counted; if another namespace still + // pins it (shared content with shareable label set), this + // fails. Log debug — that's expected, not a leak in OUR ns. + log.Debug("dind cleanup: content delete", "digest", d, "error", derr) + } + } + } + + // 6. Finally drop the namespace metadata bucket. If the bucket was never + // materialized (short job that didn't touch docker), Delete returns + // NotFound — that's fine, downgrade to debug. + if derr := c.NamespaceService().Delete(nsCtx, ns); derr != nil { + if errdefs.IsNotFound(derr) { + log.Debug("dind cleanup: namespace never materialized") + } else { + log.Warn("dind cleanup: namespace delete", "error", derr) + } + } else { + log.Info("dind cleanup: namespace removed") + } +} + +// CleanupStaleDindNamespaces enumerates every namespace matching +// DindNamespacePrefix and runs CleanupJobNamespace on each. Intended to be +// called once at ephemerd worker-mode startup to clean up after crashed or +// killed jobs from a previous boot — the same Server.Stop path would have +// done this on a graceful shutdown but a runner timeout / SIGKILL skips it. 
+func CleanupStaleDindNamespaces(ctx context.Context, c *client.Client, log *slog.Logger) { + if c == nil { + return + } + all, err := c.NamespaceService().List(ctx) + if err != nil { + log.Warn("dind cleanup: list namespaces", "error", err) + return + } + count := 0 + for _, ns := range all { + if !strings.HasPrefix(ns, DindNamespacePrefix) { + continue + } + // Don't touch the long-lived per-repo image caches; only per-job + // namespaces should be swept here. Cache pruning is a separate + // concern handled by CachePrune. + if strings.HasPrefix(ns, DindCacheNamespacePrefix) { + continue + } + count++ + CleanupJobNamespace(ctx, c, ns, log) + } + if count > 0 { + log.Info("dind cleanup: stale namespaces processed", "count", count) + } +} + +// snapshotterNames returns the snapshotter names dind/containerd uses on this +// platform. On Linux that's "overlayfs"; on Windows we use the "windows" +// snapshotter. We try every plausible name and skip ones that don't exist +// so the cleanup is robust against future snapshotter changes. +func snapshotterNames() []string { + switch goruntime.GOOS { + case "windows": + return []string{"windows", "windows-lcow"} + default: + return []string{"overlayfs", "native"} + } +} + +// dindNamespaceFromJobID returns the containerd namespace name a dind Server +// uses for a given jobID. Unexported helper so in-package callers that have +// only the JobID (not a *Server) can build the namespace name consistently. +// +//nolint:unused // kept for symmetry with Server.jobNamespace; future +// boot-time selective cleanup may use it. 
+func dindNamespaceFromJobID(jobID string) string { + return fmt.Sprintf("%s%s", DindNamespacePrefix, jobID) +} diff --git a/pkg/dind/cleanup_test.go b/pkg/dind/cleanup_test.go new file mode 100644 index 0000000..a70ae34 --- /dev/null +++ b/pkg/dind/cleanup_test.go @@ -0,0 +1,135 @@ +//go:build !darwin + +package dind + +import ( + "context" + "log/slog" + "os" + "slices" + "strings" + "testing" + "time" + + "github.com/containerd/containerd/v2/core/images" + "github.com/containerd/containerd/v2/core/leases" + "github.com/containerd/containerd/v2/pkg/namespaces" + "github.com/opencontainers/go-digest" + ocispec "github.com/opencontainers/image-spec/specs-go/v1" +) + +// TestCleanup_DindNamespaces drives both cleanup helpers against a single +// embedded containerd. Combined into one TestMain-style function with +// subtests because containerd's prometheus metrics use a process-global +// registry — spinning up two containerd.New() instances in the same +// process panics with "duplicate metrics collector registration". +// +// Subtests share the containerd but use distinct namespace names so they +// don't bleed into each other. +// +// Regression test for the disk-fill leak (73 leaked dind namespaces +// pinning ~1 GB of image content each on a 100 GB VHDX). +func TestCleanup_DindNamespaces(t *testing.T) { + if testing.Short() { + t.Skip("skipping cleanup test in short mode") + } + + c := sharedTestContainerd(t) + log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo})) + + // Containerd namespace identifiers max out at 76 chars, so test + // namespace names need to stay short — well under what production + // uses (ephemerd-dind-ephemerd-github--, ~50 chars). + t.Run("RemovesImageLeaseAndNamespace", func(t *testing.T) { + ns := DindNamespacePrefix + "test-image-lease" + nsCtx := namespaces.WithNamespace(context.Background(), ns) + + // Populate with one Image record and one Lease. 
+ imgRecord := images.Image{ + Name: "example.test/dind/leak:tag", + Target: ocispec.Descriptor{ + MediaType: ocispec.MediaTypeImageManifest, + Digest: digest.FromString("dind-cleanup-test-manifest"), + Size: 42, + }, + } + ctx, cancel := context.WithTimeout(nsCtx, 30*time.Second) + defer cancel() + if _, err := c.ImageService().Create(ctx, imgRecord); err != nil { + t.Fatalf("create image: %v", err) + } + lease, err := c.LeasesService().Create(ctx, leases.WithExpiration(5*time.Minute)) + if err != nil { + t.Fatalf("create lease: %v", err) + } + + // Sanity: namespace visible before cleanup. + before, lerr := c.NamespaceService().List(context.Background()) + if lerr != nil { + t.Fatalf("list namespaces (pre): %v", lerr) + } + if !slices.Contains(before, ns) { + t.Fatalf("namespace %q missing before cleanup; got %v", ns, before) + } + + CleanupJobNamespace(context.Background(), c, ns, log) + + after, lerr := c.NamespaceService().List(context.Background()) + if lerr != nil { + t.Fatalf("list namespaces (post): %v", lerr) + } + if slices.Contains(after, ns) { + t.Errorf("namespace %q still present after cleanup; got %v", ns, after) + } + gone := namespaces.WithNamespace(context.Background(), ns) + if _, gerr := c.ImageService().Get(gone, imgRecord.Name); gerr == nil { + t.Errorf("image %q still resolvable after cleanup", imgRecord.Name) + } + if ls, lerr := c.LeasesService().List(gone); lerr == nil { + for _, l := range ls { + if l.ID == lease.ID { + t.Errorf("lease %q still present after cleanup", lease.ID) + } + } + } + }) + + t.Run("StaleSweepFiltersByPrefix", func(t *testing.T) { + dindNS := DindNamespacePrefix + "stale-sweep" + keepNS := "keep-me-buildkit-style" + + for _, ns := range []string{dindNS, keepNS} { + nsCtx, cancel := context.WithTimeout( + namespaces.WithNamespace(context.Background(), ns), + 30*time.Second, + ) + if _, err := c.LeasesService().Create(nsCtx, leases.WithExpiration(5*time.Minute)); err != nil { + cancel() + t.Fatalf("create lease in 
%s: %v", ns, err) + } + cancel() + } + + CleanupStaleDindNamespaces(context.Background(), c, log) + + list, err := c.NamespaceService().List(context.Background()) + if err != nil { + t.Fatalf("list namespaces: %v", err) + } + if slices.Contains(list, dindNS) { + t.Errorf("dind namespace %q should have been cleaned; got %v", dindNS, list) + } + if !slices.Contains(list, keepNS) { + t.Errorf("non-dind namespace %q was wrongly removed; got %v", keepNS, list) + } + for _, ns := range list { + // Cache namespaces (ephemerd-dind-cache-*) are intentionally + // preserved by CleanupStaleDindNamespaces — they're long-lived + // per-repo image caches and are managed by CachePrune, not the + // stale-job sweeper. + if strings.HasPrefix(ns, DindNamespacePrefix) && !strings.HasPrefix(ns, DindCacheNamespacePrefix) { + t.Errorf("leftover dind-prefixed (non-cache) namespace after cleanup: %q", ns) + } + } + }) +} diff --git a/pkg/dind/containers.go b/pkg/dind/containers.go index b224164..a0b2f93 100644 --- a/pkg/dind/containers.go +++ b/pkg/dind/containers.go @@ -252,6 +252,23 @@ func (s *Server) handleContainerCreate(w http.ResponseWriter, r *http.Request) { }) return } + // Mirror the just-pulled image into the per-repo cache so a future + // job in the same repo can hit it without a network round-trip. + if s.cacheNamespace != "" { + for _, name := range dedup(pullRef, req.Image) { + if merr := MirrorImageToCache(r.Context(), s.client, s.jobNamespace, s.cacheNamespace, name, s.log); merr != nil { + s.log.Debug("dind cache: mirror after container-create pull", "image", name, "error", merr) + } + } + } + } + + // Refresh the last-accessed label on the cache record for this image + // (if cached). Captures the case where a job uses an image that was + // previously pulled by an earlier job in the same repo and is being + // run via `docker run` without re-pulling. 
+ if s.cacheNamespace != "" { + RefreshLastAccessed(r.Context(), s.client, s.cacheNamespace, req.Image, s.log) } // Build OCI spec. Always target Linux — dind containers are Linux. diff --git a/pkg/dind/dind.go b/pkg/dind/dind.go index 25ba68b..aab80c7 100644 --- a/pkg/dind/dind.go +++ b/pkg/dind/dind.go @@ -33,17 +33,18 @@ const sharedNamespace = "ephemerd" // Server is a per-job fake Docker daemon. type Server struct { - jobID string - jobNamespace string // per-job containerd namespace for isolation - sockPath string // host-side unix socket path (Linux/macOS only) - endpoint string // what the container should set DOCKER_HOST to (e.g. "tcp://gw:port" on Windows) - listener net.Listener - server *http.Server - client *client.Client - network *networking.Manager - buildkit *buildkit.Server // shared embedded BuildKit solver (nil → fall back to platform default) - runnerNetNS string // path to runner container's net namespace; used to install DNAT rules for port bindings - log *slog.Logger + jobID string + jobNamespace string // per-job containerd namespace for isolation + cacheNamespace string // per-(provider,repo) shared image cache namespace; empty disables caching + sockPath string // host-side unix socket path (Linux/macOS only) + endpoint string // what the container should set DOCKER_HOST to (e.g. "tcp://gw:port" on Windows) + listener net.Listener + server *http.Server + client *client.Client + network *networking.Manager + buildkit *buildkit.Server // shared embedded BuildKit solver (nil → fall back to platform default) + runnerNetNS string // path to runner container's net namespace; used to install DNAT rules for port bindings + log *slog.Logger mu sync.Mutex images map[string]*imageEntry // in-memory image store scoped to this job @@ -64,6 +65,18 @@ type Config struct { // JobID is the unique job identifier. JobID string + // Provider is the forge provider name ("github", "gitea", "forgejo", + // "gitlab", "woodpecker") for the job. 
Used together with Repo to + // build the per-repo image cache namespace; if empty, image caching + // across jobs is disabled and every pull is cold for this job. + Provider string + + // Repo is the forge-native repo path (e.g. "owner/repo" on GitHub + // or "group/subgroup/project" on GitLab). Used together with Provider + // to build the per-repo image cache namespace; if empty, image + // caching across jobs is disabled. + Repo string + // DataDir is the ephemerd data directory. The socket and temp layers // are stored under /jobs//docker/. DataDir string @@ -109,17 +122,18 @@ func New(cfg Config) (*Server, error) { jobID: cfg.JobID, // containerd namespace name regex (^[A-Za-z0-9]+(?:[._-](?:[A-Za-z0-9]+))*$) // rejects slashes. Use hyphens to namespace per-job dind state. - jobNamespace: "ephemerd-dind-" + cfg.JobID, - sockPath: sockPath, - client: cfg.Client, - network: cfg.Network, - buildkit: cfg.BuildKit, - runnerNetNS: cfg.RunnerNetNS, - log: cfg.Log.With("component", "dind", "job_id", cfg.JobID), - images: make(map[string]*imageEntry), - containers: make(map[string]*containerEntry), - execs: make(map[string]*execEntry), - networks: make(map[string]*networkEntry), + jobNamespace: "ephemerd-dind-" + cfg.JobID, + cacheNamespace: CacheNamespace(cfg.Provider, cfg.Repo), + sockPath: sockPath, + client: cfg.Client, + network: cfg.Network, + buildkit: cfg.BuildKit, + runnerNetNS: cfg.RunnerNetNS, + log: cfg.Log.With("component", "dind", "job_id", cfg.JobID), + images: make(map[string]*imageEntry), + containers: make(map[string]*containerEntry), + execs: make(map[string]*execEntry), + networks: make(map[string]*networkEntry), } s.initDefaultBridgeNetwork() return s, nil @@ -192,6 +206,18 @@ func (s *Server) Stop() { s.destroyAllExecs() s.destroyAllContainers() + // Clean up the per-job containerd namespace. 
destroyAllContainers handles + // containers tracked in the in-memory map; this catches stragglers + // (kindest/node-side containerd creations that landed in the same + // namespace but never registered in s.containers), then deletes the + // Image and Lease records so containerd's content GC can reclaim the + // pinned blobs, then drops the namespace metadata bucket itself. + // Without this, every job leaks ~1 GB of image content + the snapshot + // upperdir referenced by un-deleted Image records. + if s.client != nil { + CleanupJobNamespace(context.Background(), s.client, s.jobNamespace, s.log) + } + if s.server != nil { if err := s.server.Shutdown(context.Background()); err != nil { s.log.Debug("shutting down fake docker server", "error", err) @@ -489,10 +515,42 @@ func (s *Server) handleImagePull(w http.ResponseWriter, r *http.Request) { } s.mu.Unlock() + // Mirror the Image record into the per-repo cache namespace so the + // gc.ref labels keep the manifest+config+layer blobs alive after the + // per-job namespace is cleaned up. Next job in the same repo gets a + // content-store hit. Cross-repo / cross-provider jobs do NOT see this + // cache record (namespace isolation), so private images don't leak. + if s.cacheNamespace != "" { + // Mirror both the qualified ref (what containerd pulled under) + // and the unqualified docker-CLI form, so future cache hits via + // either name work. + for _, name := range dedup(ref, unqualifiedRef) { + if err := MirrorImageToCache(ctx, s.client, s.jobNamespace, s.cacheNamespace, name, s.log); err != nil { + s.log.Debug("dind cache: mirror failed", "image", name, "error", err) + } + } + } + writeProgress(fmt.Sprintf("Digest: %s", img.Target().Digest.String())) writeProgress(fmt.Sprintf("Status: Downloaded newer image for %s", unqualifiedRef)) } +// dedup returns the unique non-empty strings from the input in the order +// they first appear. 
Used so we don't mirror the same image twice when +// qualified and unqualified forms match. +func dedup(in ...string) []string { + seen := make(map[string]bool, len(in)) + out := make([]string, 0, len(in)) + for _, s := range in { + if s == "" || seen[s] { + continue + } + seen[s] = true + out = append(out, s) + } + return out +} + func (s *Server) handleNotImplemented(w http.ResponseWriter, r *http.Request) { s.log.Debug("unimplemented Docker API call", "method", r.Method, "path", r.URL.Path) writeJSON(w, http.StatusNotImplemented, map[string]string{ diff --git a/pkg/dind/registry_e2e_test.go b/pkg/dind/registry_e2e_test.go index e245f80..0c71ce1 100644 --- a/pkg/dind/registry_e2e_test.go +++ b/pkg/dind/registry_e2e_test.go @@ -24,7 +24,6 @@ import ( "github.com/containerd/containerd/v2/core/images" "github.com/containerd/containerd/v2/core/leases" "github.com/containerd/containerd/v2/pkg/namespaces" - containerdpkg "github.com/ephpm/ephemerd/pkg/containerd" "github.com/opencontainers/go-digest" ocispec "github.com/opencontainers/image-spec/specs-go/v1" ) @@ -124,11 +123,15 @@ func TestPushHandlerEndToEnd(t *testing.T) { mockHost := mustHost(t, mock.URL) mockRef := mockHost + "/" + repoName + ":" + imageTag - // Embedded containerd in a temp data dir, with a unique socket path - // so the test can run alongside the live daemon without clobbering its - // pipe / socket. Use os.MkdirTemp + best-effort RemoveAll because - // containerd's bbolt meta.db can still be held briefly after Stop() - // returns and t.TempDir's strict cleanup would mark the test failed. + // Reuse the process-wide shared containerd (see testcontainerd_test.go). + // containerd's prometheus metrics use a global registry, so spawning a + // second containerd in the same test binary panics. The shared instance + // is fine here because the test stages into the "buildkit" namespace + // which doesn't collide with anything else in the suite. 
+	ctrdClient := sharedTestContainerd(t)
+	log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo}))
+	// Per-test scratch dir for dind's socket + job dir, separate from the
+	// shared containerd's data dir.
 	dataDir, err := os.MkdirTemp("", "ephemerd-push-e2e-*")
 	if err != nil {
 		t.Fatalf("temp dir: %v", err)
@@ -138,19 +141,6 @@ func TestPushHandlerEndToEnd(t *testing.T) {
 			t.Logf("cleanup: remove %s: %v", dataDir, err)
 		}
 	})
-	socketPath := testSocketPath(t)
-	log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo}))
-	ctrd, err := containerdpkg.New(containerdpkg.Config{
-		DataDir:    dataDir,
-		SocketPath: socketPath,
-		Log:        log.With("component", "containerd"),
-	})
-	if err != nil {
-		t.Skipf("embedded containerd unavailable in this env: %v", err)
-	}
-	t.Cleanup(ctrd.Stop)
-
-	ctrdClient := ctrd.Client()
 	bkNamespace := "buildkit"
 	ctx, cancel := context.WithTimeout(namespaces.WithNamespace(context.Background(), bkNamespace), 60*time.Second)
 	defer cancel()
diff --git a/pkg/dind/testcontainerd_test.go b/pkg/dind/testcontainerd_test.go
new file mode 100644
index 0000000..805b7c6
--- /dev/null
+++ b/pkg/dind/testcontainerd_test.go
@@ -0,0 +1,83 @@
+//go:build !darwin
+
+package dind
+
+import (
+	"log/slog"
+	"os"
+	"sync"
+	"testing"
+
+	"github.com/containerd/containerd/v2/client"
+	containerdpkg "github.com/ephpm/ephemerd/pkg/containerd"
+)
+
+// containerd's prometheus metrics live in a process-global registry, so
+// any second containerdpkg.New() in the same `go test` binary panics with
+// "duplicate metrics collector registration attempted". Tests that need a
+// real embedded containerd share this single instance via a sync.Once.
+//
+// The instance is created lazily on first call and torn down via a
+// shutdown hook registered with the *first* test that uses it (subsequent
+// callers reuse, no extra teardown).
+// DataDir is a per-process temp dir
+// shared across all callers — tests should put their state in distinct
+// namespaces, not distinct data dirs.
+
+var (
+	sharedCtrdOnce sync.Once
+	sharedCtrd     *containerdpkg.Server
+	sharedCtrdErr  error
+	sharedDataDir  string
+)
+
+func sharedTestContainerd(t *testing.T) *client.Client {
+	t.Helper()
+	sharedCtrdOnce.Do(func() {
+		dir, err := os.MkdirTemp("", "ephemerd-shared-ctrd-*")
+		if err != nil {
+			sharedCtrdErr = err
+			return
+		}
+		sharedDataDir = dir
+
+		log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelInfo}))
+		ctrd, err := containerdpkg.New(containerdpkg.Config{
+			DataDir:    dir,
+			SocketPath: testSocketPath(t),
+			Log:        log.With("component", "shared-test-containerd"),
+		})
+		if err != nil {
+			sharedCtrdErr = err
+			return
+		}
+		sharedCtrd = ctrd
+	})
+
+	if sharedCtrdErr != nil {
+		t.Skipf("embedded containerd unavailable in this env: %v", sharedCtrdErr)
+	}
+	if sharedCtrd == nil {
+		t.Skip("embedded containerd not initialized")
+	}
+	return sharedCtrd.Client()
+}
+
+// TestMain ensures the shared containerd (if any was started) is stopped
+// before the test binary exits, so its bbolt meta.db is unlocked and the
+// temp dir can be removed.
+func TestMain(m *testing.M) {
+	code := m.Run()
+	if sharedCtrd != nil {
+		sharedCtrd.Stop()
+	}
+	if sharedDataDir != "" {
+		// Best-effort. On Windows containerd's meta.db can take a beat to
+		// release after Stop() returns; if RemoveAll fails the temp dir
+		// just gets cleaned by the OS later — not worth failing tests over.
+		if err := os.RemoveAll(sharedDataDir); err != nil {
+			// stderr because slog isn't worth wiring up here in TestMain.
+			_, _ = os.Stderr.WriteString("test cleanup: remove " + sharedDataDir + ": " + err.Error() + "\n")
+		}
+	}
+	os.Exit(code)
+}
diff --git a/pkg/runtime/runtime.go b/pkg/runtime/runtime.go
index 2964bae..bac7b2c 100644
--- a/pkg/runtime/runtime.go
+++ b/pkg/runtime/runtime.go
@@ -500,6 +500,16 @@ type CreateConfig struct {
 	ID    string // unique job identifier (container name, dind socket path)
 	Image string // OCI image reference (empty = use default)
 
+	// Provider is the forge provider name (e.g. "github", "gitea") that
+	// queued the job. Together with Repo it's used to scope dind's
+	// per-repo image cache. Empty disables caching for this job.
+	Provider string
+
+	// Repo is the forge-native repo path (e.g. "owner/repo"). Together
+	// with Provider it's used to scope dind's per-repo image cache.
+	// Empty disables caching for this job.
+	Repo string
+
 	// JITConfig is the base64-encoded JIT config for GitHub runners.
 	// Passed as "--jitconfig " to the runner entrypoint.
 	// Mutually exclusive with Entrypoint.
@@ -682,6 +692,8 @@ func (r *Runtime) Create(ctx context.Context, cfg CreateConfig) (*RunnerEnv, err
 		var err error
 		dindServer, err = dind.New(dind.Config{
 			JobID:    id,
+			Provider: cfg.Provider,
+			Repo:     cfg.Repo,
 			DataDir:  r.cfg.DataDir,
 			Client:   r.client,
 			Network:  r.cfg.Network,
diff --git a/pkg/scheduler/dispatch.go b/pkg/scheduler/dispatch.go
index 7fcc68b..fe161eb 100644
--- a/pkg/scheduler/dispatch.go
+++ b/pkg/scheduler/dispatch.go
@@ -30,7 +30,11 @@ func (s *dispatchServer) CreateJob(ctx context.Context, req *apiv1.CreateJobRequ
 	s.log.Info("dispatch: creating job", "id", req.Id, "image", req.Image)
 
 	env, err := s.rt.Create(ctx, runtime.CreateConfig{
-		ID: req.Id, Image: req.Image, JITConfig: req.JitConfig,
+		ID:        req.Id,
+		Image:     req.Image,
+		JITConfig: req.JitConfig,
+		Provider:  req.Provider,
+		Repo:      req.Repo,
 	})
 	if err != nil {
 		s.log.Error("dispatch: create failed", "id", req.Id, "error", err)
@@ -140,12 +144,17 @@ func NewDispatchClient(addr string) (*DispatchClient, error) {
 	}, nil
 }
 
-// Create dispatches a container create to the WSL worker.
-func (d *DispatchClient) Create(ctx context.Context, id, image, jitConfig string) error {
+// Create dispatches a container create to the WSL worker. provider + repo
+// are passed through so the VM-side dind server can scope its per-repo
+// image cache namespace to (provider, repo) and not leak private images
+// across forges or repos.
+func (d *DispatchClient) Create(ctx context.Context, id, image, jitConfig, provider, repo string) error {
 	_, err := d.client.CreateJob(ctx, &apiv1.CreateJobRequest{
 		Id:        id,
 		Image:     image,
 		JitConfig: jitConfig,
+		Provider:  provider,
+		Repo:      repo,
 	})
 	return err
 }
diff --git a/pkg/scheduler/dispatch_test.go b/pkg/scheduler/dispatch_test.go
index bfa6597..0f6b575 100644
--- a/pkg/scheduler/dispatch_test.go
+++ b/pkg/scheduler/dispatch_test.go
@@ -53,7 +53,7 @@ func TestDispatchClient_Create(t *testing.T) {
 	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
 	defer cancel()
 
-	if err := c.Create(ctx, "job-1", "alpine:latest", "jit-config"); err != nil {
+	if err := c.Create(ctx, "job-1", "alpine:latest", "jit-config", "github", "owner/repo"); err != nil {
 		t.Fatalf("Create: %v", err)
 	}
 
@@ -72,6 +72,12 @@ func TestDispatchClient_Create(t *testing.T) {
 	if got.JitConfig != "jit-config" {
 		t.Errorf("JitConfig = %q", got.JitConfig)
 	}
+	if got.Provider != "github" {
+		t.Errorf("Provider = %q, want github", got.Provider)
+	}
+	if got.Repo != "owner/repo" {
+		t.Errorf("Repo = %q, want owner/repo", got.Repo)
+	}
 }
 
 func TestDispatchClient_Create_PropagatesError(t *testing.T) {
@@ -90,7 +96,7 @@ func TestDispatchClient_Create_PropagatesError(t *testing.T) {
 	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
 	defer cancel()
 
-	err := c.Create(ctx, "job-1", "img", "")
+	err := c.Create(ctx, "job-1", "img", "", "github", "owner/repo")
 	if err == nil {
 		t.Fatal("expected error from server")
 	}
diff --git a/pkg/scheduler/scheduler.go b/pkg/scheduler/scheduler.go
index fb7f99b..028d410 100644
--- a/pkg/scheduler/scheduler.go
+++ b/pkg/scheduler/scheduler.go
@@ -619,7 +619,7 @@ func (s *Scheduler) handleLinuxJob(ctx context.Context, event providers.JobEvent
 		jobCtx, cancel = context.WithCancel(ctx)
 	}
 
-	if err := s.cfg.LinuxDispatcher.Create(jobCtx, claim.RunnerName, image, claim.RunnerConfig); err != nil {
+	if err := s.cfg.LinuxDispatcher.Create(jobCtx,
+		claim.RunnerName, image, claim.RunnerConfig, event.Provider.Name(), event.Repo); err != nil {
 		log.Error("dispatch create failed", "error", err)
 		if rmErr := event.Provider.ReleaseJob(ctx, claim); rmErr != nil {
 			log.Warn("failed to remove ghost runner", "runner_id", claim.RunnerID, "error", rmErr)

From 3a211874e8af3063a6373b84ed6afb092b894803 Mon Sep 17 00:00:00 2001
From: Luther Monson
Date: Sat, 16 May 2026 07:48:39 -0700
Subject: [PATCH 2/2] docs(dind): document per-job cleanup + per-repo image cache

Extends docs/architecture/fake-docker-daemon.md with two new sections:

- "Per-Job Namespace and Cleanup" covers the ephemerd-dind- namespace,
  the 6-step CleanupJobNamespace sequence (containers, images, leases,
  leaf-first snapshots, content, namespace), the FailedPrecondition
  retry, and the worker-mode CleanupStaleDindNamespaces sweep for crash
  recovery.

- "Per-Repo Image Cache" covers the ephemerd-dind-cache-- namespace,
  the Provider/Repo plumbing path, the two cache-write events (pull and
  container-create), the containerd namespace-isolation privacy
  guarantee (and the explicit "don't ever set namespace.shareable"
  caveat), and the prune semantics including the UpdatedAt fallback for
  records pre-dating the last-accessed label.

Updates docs/getting-started/configuration.md to surface
cache_prune_interval and cache_max_age on the example config block and
adds a [dind] section reference paragraph explaining the cache
behavior, privacy boundary, and disable knobs.

Also refreshes the Key Files table to include cleanup.go, cache.go, and
their respective test files.
---
 docs/architecture/fake-docker-daemon.md | 81 +++++++++++++++++++++++--
 docs/getting-started/configuration.md   | 22 ++++++-
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/docs/architecture/fake-docker-daemon.md b/docs/architecture/fake-docker-daemon.md
index 2c24e18..61cfde5 100644
--- a/docs/architecture/fake-docker-daemon.md
+++ b/docs/architecture/fake-docker-daemon.md
@@ -92,8 +92,74 @@ This is important: there is no nested Docker. The fake daemon creates first-clas
 ## Socket Lifecycle
 
 1. **Job starts**: ephemerd creates a Unix socket at `/jobs//docker/d.sock`, starts the fake daemon goroutine, mounts the socket into the container at `/var/run/docker.sock`.
-2. **Job runs**: Docker CLI in the container talks to the socket. ephemerd handles requests and creates sidecars as sibling containers.
-3. **Job finishes**: ephemerd destroys all sibling containers created by this job, deletes the temp directory, closes the socket. No leaked state.
+2. **Job runs**: Docker CLI in the container talks to the socket. ephemerd handles requests and creates the requested containers as sibling containers.
+3. **Job finishes**: ephemerd destroys all sibling containers created by this job, deletes the temp directory, closes the socket, and runs the per-job namespace cleanup described below.
+
+## Per-Job Namespace and Cleanup
+
+Every job that uses dind gets its own containerd namespace:
+
+```
+ephemerd-dind- e.g. ephemerd-dind-ephemerd-github-ephpm-fast_shannon
+```
+
+All sibling containers, image records, leases, and snapshots created by the job live in this namespace. When the job exits, `Server.Stop()` runs `CleanupJobNamespace`:
+
+1. Kill and delete any in-flight tasks, delete every container with `WithSnapshotCleanup`.
+2. Delete every Image record (drops the `containerd.io/gc.ref.content.*` labels that pin manifest + config + layer blobs).
+3. Delete every lease.
+4. Walk the snapshotter and remove snapshots **leaf-first in a multi-pass loop**.
+   Image layer snapshots form a parent-child tree (each layer is a child of the one below) and containerd refuses to delete a snapshot that still has children. Each pass removes whatever currently has no children; the loop terminates when the snapshotter is empty or no pass makes progress.
+5. Walk the content store and explicitly delete blobs (containerd's async content GC won't have swept yet by the time we want to delete the namespace).
+6. `NamespaceService().Delete()` the metadata bucket itself.
+
+A short retry loop catches transient `FailedPrecondition` errors caused by containerd's eventually-consistent state. If a snapshot is genuinely stuck, the failure is logged with the snapshot's name, parent, and kind so operators can investigate.
+
+On worker-mode startup, `CleanupStaleDindNamespaces` sweeps everything matching `ephemerd-dind-*` that's not a cache namespace (see below), catching ungraceful exits — `DeadlineExceeded`, `SIGKILL`, host reboot — that bypassed `Server.Stop`.
+
+## Per-Repo Image Cache
+
+The cleanup above releases the `gc.ref` labels that previously pinned image content (manifest, config, layer blobs). Without further action, every job would pay a full network re-pull for `kindest/node` (~1 GB) and any other image the job touches.
+
+To avoid that tax, dind maintains a **per-(provider, repo)** long-lived cache namespace:
+
+```
+ephemerd-dind-cache--
+
+ephemerd-dind-cache-github-ephpm_ephpm
+ephemerd-dind-cache-gitea-ephpm_ephpm ← distinct from the github one
+ephemerd-dind-cache-gitlab-acme_platform_api ← nested GitLab groups OK
+```
+
+`Provider` and `Repo` flow through `CreateJobRequest` → `runtime.CreateConfig` → `dind.Config`, so the cache namespace is derived from the dispatching forge rather than parsed from the runner name (which loses provider info).
+
+### Cache writes
+
+Two events mirror image metadata into the cache:
+
+1.
+   **Image pull (`POST /images/create`)** — after a successful pull, the Image record is created/updated in the cache namespace with an `ephemerd.io/last-accessed` label set to the current RFC3339 UTC time.
+2. **Container create (`POST /containers/create`)** — if the requested image is already present in the cache (no pull needed), the cache record's `last-accessed` label is refreshed. This captures cache hits driven by `docker run` of a previously-pulled image.
+
+The cache record's `gc.ref.content.*` labels pin the underlying content blobs in containerd's content store. Even when the per-job namespace is deleted and its Image record gone, the cache record keeps the blobs alive. The next job in the same repo gets a content-store hit and pulls only the manifest (to revalidate the digest).
+
+### Privacy boundary
+
+Containerd's namespace isolation is the privacy guarantee. A content blob whose only Image record reference lives in `ephemerd-dind-cache-foo-private` is **invisible** to a resolver running in any other namespace — containerd's content store lookup is namespace-scoped at the metadata layer. Two forges with same-named repos (`github/ephpm` vs `gitea/ephpm`) get distinct cache namespaces; two repos within the same forge get distinct caches keyed by the full `owner/repo` path. Auth credentials live in the per-job in-memory auth cache and are never copied into the cache namespace.
+
+This relies on never setting the `containerd.io/namespace.shareable` label on cache namespaces. Don't.
+
+### Cache pruning
+
+A goroutine started in worker-mode walks every `ephemerd-dind-cache-*` namespace on a fixed interval and evicts Image records whose `last-accessed` label is older than the configured threshold. Configuration:
+
+```toml
+[dind]
+  cache_prune_interval = "24h" # how often the sweeper wakes up
+  cache_max_age = "168h" # 7 days — LRU threshold
+```
+
+After eviction, containerd's content GC reclaims any blob no longer referenced by an Image record in any namespace.
+Cache namespaces left empty after a prune pass are removed entirely so unused-repo metadata doesn't accumulate.
+
+Image records pre-dating the `last-accessed` label fall back to the record's `UpdatedAt` timestamp on first prune, so introducing this feature doesn't nuke pre-existing caches.
 
 ## Enabling
 
@@ -101,12 +167,19 @@ Enable with `dind.enabled = true` in config or the `--dind` flag on `serve`:
 
 ```toml
 [dind]
-enabled = true
+  enabled = true
+  cache_prune_interval = "24h"
+  cache_max_age = "168h"
 ```
 
 ## Key Files
 
 | File | Purpose |
 |------|---------|
-| `pkg/dind/dind.go` | Fake Docker API server, route dispatch, image pull |
+| `pkg/dind/dind.go` | Fake Docker API server, route dispatch, image pull, cache-mirror on pull |
+| `pkg/dind/containers.go` | Container lifecycle, `last-accessed` refresh on container-create |
+| `pkg/dind/cleanup.go` | Per-job namespace cleanup (containers, images, leases, snapshots leaf-first, content, namespace) + boot-time stale sweep |
+| `pkg/dind/cache.go` | Per-repo cache namespace name derivation + sanitization, mirror helper, last-accessed refresh, periodic prune |
 | `pkg/dind/dind_test.go` | Tests for health and image endpoints |
+| `pkg/dind/cleanup_test.go` | Tests covering full namespace teardown + stale-sweep prefix filter |
+| `pkg/dind/cache_test.go` | Tests covering cross-provider isolation, sanitization invariants, mirror + refresh + prune lifecycle |
diff --git a/docs/getting-started/configuration.md b/docs/getting-started/configuration.md
index cbd0fbe..cf60087 100644
--- a/docs/getting-started/configuration.md
+++ b/docs/getting-started/configuration.md
@@ -114,6 +114,8 @@ max_concurrent = 4 # max simultaneous jobs
 # --- Docker-in-Docker --------------------------------------------------------
 [dind]
 # enabled = false # mount fake Docker socket into containers
+# cache_prune_interval = "24h" # how often the per-repo image cache pruner runs
+# cache_max_age = "168h" # evict cached image records inactive longer than
+#     this (7 days)
 
 # --- Metrics ------------------------------------------------------------------
 [metrics]
@@ -258,11 +260,29 @@ Container networking configuration.
 
 ### `[dind]`
 
-Docker-in-Docker support.
+Docker-in-Docker support. When `enabled`, every job sees `/var/run/docker.sock` and the runner's containerd serves a fake Docker Engine API on it. Image pulls from inside the job (e.g. `kind create cluster` pulling `kindest/node`) are mirrored into a long-lived per-repo namespace so the next job in the same repo gets a content-store hit instead of re-pulling.
 
 | Field | Type | Default | Description |
 |---|---|---|---|
 | `enabled` | boolean | `false` | Mount a fake Docker socket (`/var/run/docker.sock`) into job containers |
+| `cache_prune_interval` | duration | `"24h"` | How often the per-repo image cache pruner runs. Set to `"0"` to disable pruning. |
+| `cache_max_age` | duration | `"168h"` (7d) | Evict cached image records whose `ephemerd.io/last-accessed` label is older than this. Containerd's content GC reclaims the now-unreferenced blobs. |
+
+**Per-repo image cache.** Each (provider, repo) pair gets its own long-lived containerd namespace named `ephemerd-dind-cache--`. Examples:
+
+```
+ephemerd-dind-cache-github-ephpm_ephpm
+ephemerd-dind-cache-gitea-ephpm_ephpm ← distinct from the github one
+ephemerd-dind-cache-gitlab-acme_platform_api ← nested GitLab groups OK
+```
+
+The cache namespace persists across jobs and across ephemerd restarts. Per-job state lives in a separate namespace (`ephemerd-dind-`) which is deleted when each job exits.
+
+**Privacy boundary.** Containerd namespace isolation prevents one repo's cached image blobs from being resolved by any other namespace. Two forges with identically-named repos (`github/foo` vs `gitea/foo`) do not share a cache. Two repos within the same forge do not share a cache. Auth credentials are scoped to the per-job namespace's in-memory auth cache and are never copied into the long-lived cache namespace.
+
+**Pruning.** Every `cache_prune_interval`, dind walks each `ephemerd-dind-cache-*` namespace and evicts Image records whose `ephemerd.io/last-accessed` label is older than `cache_max_age`. Cache namespaces left empty after eviction are removed entirely. Records pre-dating the label fall back to the record's `UpdatedAt` timestamp so a deploy that introduces the cache feature doesn't nuke pre-existing records on first prune.
+
+**Disabling pruning.** Setting `cache_max_age = "0"` disables eviction (the cache grows unbounded — useful for debugging but not recommended in production). Setting `cache_prune_interval = "0"` disables the pruner goroutine entirely; this is equivalent to "keep everything forever, even empty namespaces."
 
 ### `[metrics]`