fix(dind): resolve sibling bind sources via /proc/<runner-pid>/root#83
Merged
Conversation
The first cut of bind translation walked the runner snapshot's upperdir then each lowerdir in order, returning the first match. That broke for paths whose contents span multiple image layers — the GHA actions-runner image has /home/runner/externals/ as a dir entry in layer 4 but the actual node20/bin/node binary lives in layer 22. The walk picked layer 4, bound an empty tree, and actions/checkout failed downstream with 'exec: "/__e/node20/bin/node": no such file or directory'. Resolve sources through /proc/<runner-task-pid>/root instead. procfs's magic root link walks the runner's mount namespace, so the dind shim hands containerd the same merged overlay view the runner sees from the inside. The per-layer walk is kept as a fallback for unit tests that don't have a live PID. The runtime now captures task.Pid() after task.NewTask and passes it into SetRunnerRootfs alongside the existing snapshot key + bind table, right next to the SetRunnerNetNS call.
…d_test portforward_test.go already declares an itoa helper. golangci-lint's typecheck refuses the redeclaration in the same package. Inline strconv.Itoa at the one call site instead — one fewer indirection, no collision.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #82
After #82 merged, ephpm workflows that use
container:got past thefirst sh script (
/__w/_tempwas correctly resolved into the runner'supperdir) but failed on
actions/checkout:```
exec failed: exec: "/__e/node20/bin/node": stat /__e/node20/bin/node: no such file or directory
```
Root cause
The per-layer resolver in #82 returned the first lowerdir that had
the requested path. The GHA
actions/runnerimage has/home/runner/externals/as a directory entry in one layer butnode20/bin/nodedeep inside it in a different (later) layer. Thewalk picked the layer with the empty dir entry, bound an empty tree
into the sibling, and downstream exec couldn't find
node20.Fix
Resolve sources through
/proc/<runner-task-pid>/root/Xinstead.procfs's magic root link walks the runner's mount namespace, so dind
hands containerd the same merged overlay view the runner sees from the
inside. No layer ordering to get wrong — the kernel does the merge.
The per-layer walk is kept as a fallback for unit tests that don't have
a live PID (the e2e test sets PID=0 explicitly so it exercises the
snapshot-mount path against real overlayfs).
Wiring
pkg/dind/dind.go: newrunnerTaskPID uint32field onServer.SetRunnerRootfssignature grows by oneuint32argument.pkg/dind/bindtranslate.go::translateBindSource: new resolutionrule — when
runnerTaskPID > 0, use/proc/<pid>/root/<src>.Returned
rw(writes copy-up into the runner's own upperdir).pkg/dind/containers.go::buildBindMounts: readsrunnerTaskPIDalongside the existing snapshot/bind state.
pkg/runtime/runtime.go:SetRunnerRootfscall moves fromimmediately-after-NewContainer to immediately-after-NewTask so
task.Pid()is available. Sits next toSetRunnerNetNSnow.Tests
pkg/dind/bindtranslate_test.go:TestTranslateBindSource_TaskPIDResolvesViaProcRoot— uses the testbinary's own PID (
os.Getpid()), plants a marker int.TempDir(),asserts translation returns
/proc/<pid>/root/<marker>and the fileround-trips through that path.
TestTranslateBindSource_TaskPIDRejectsMissingSource— loud-failureregression: PID > 0 + missing source must error, not fall through to
the layer walk.
(PID=0, exercise the snapshot-walk fallback unchanged).
buildBindMountsintegration tests unchanged (still onPID=0/snapshot-walk).
pkg/dind/bindtranslate_e2e_test.go:a real overlayfs snapshot (the production proc-path is exercised by
the in-VM workflow run, not by a hermetic e2e — a live runner task is
needed for /proc to be meaningful).
Docs
docs/arch/dind-bind-translation.mdupdated with the new resolutionorder, a "why PID resolution beats per-layer walk" subsection
documenting the externals/node20 failure mode, and the security
envelope notes carried forward.
Test plan
container:. Expectactions/checkoutto findnode20and the rest of the workflowto proceed.