STAC-25033: verify pushed image binary arch matches requested platform #16

Merged
LouisLotter merged 2 commits into main from STAC-25033-verify-binary-arch
Jun 12, 2026

LouisLotter commented Jun 12, 2026 (Contributor)

STAC-25033

Why

The manifest of a pushed image always claims whatever --platform requested — regardless of what the Dockerfile actually compiled or copied in. A Dockerfile that hardcodes GOARCH (or copies a foreign-arch binary) therefore publishes an image that scans clean, gets cosign-signed, and only fails at runtime on the target nodes with exec format error. We hit this class of bug reviewing the kafkaup-operator migration (kafkaup-operator#1): its scaffold Dockerfile hardcodes GOARCH=amd64 while the release workflow pushes both amd64 and arm64.

What

One new step in push-single-arch, between push and signing (~30 lines):

  1. docker create the pushed digest for the requested platform (never started — no QEMU needed) and docker cp -L out the entrypoint (fallback: cmd) binary.
  2. Ask file(1) what it is, and fail if it's an ELF binary for the wrong architecture (x86-64 vs aarch64).

Only a definitive ELF mismatch fails. Everything else passes: no entrypoint, extraction failure, script entrypoints (common in docker-images family images), and arches other than amd64/arm64 (the two we publish).
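For illustration, the pass/fail/skip rule above can be sketched as a small shell function. This is not the code from the action: the function name `classify_binary_arch` and the `pass`/`fail`/`skip` labels are hypothetical, and the sketch assumes `file -b` reports `x86-64` for amd64 binaries and `aarch64` for arm64 binaries. Treating an ELF binary for some third architecture as a skip is also an assumption.

```shell
# Hypothetical helper, not taken from the action itself.
# $1: requested arch (amd64 or arm64), $2: one-line output of `file -b`.
# Prints pass, fail, or skip.
classify_binary_arch() {
  local requested_arch="$1" file_output="$2" expected

  # Only the two published architectures are mapped; anything else skips.
  case "$requested_arch" in
    amd64) expected="x86-64" ;;
    arm64) expected="aarch64" ;;
    *) echo skip; return 0 ;;
  esac

  # Non-ELF input (script entrypoint, empty output after a failed
  # extraction) is indeterminate, so it skips rather than fails.
  case "$file_output" in
    ELF*) ;;
    *) echo skip; return 0 ;;
  esac

  # Only an ELF binary for the other mapped architecture fails.
  case "$file_output" in
    *"$expected"*) echo pass ;;
    *x86-64*|*aarch64*) echo fail ;;
    *) echo skip ;;  # assumption: unmapped ELF architectures skip
  esac
}
```

A caller would feed it the requested arch and the `file -b` output for the extracted binary, and fail the step only when it prints `fail`.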

Also adds .github/actions/** to the zizmor lint workflow path filters — composite-action changes (like this one) previously didn't trigger the security audit.

Validation

Tested end-to-end locally against real images: arch-mismatched image (amd64 platform, arm64 binary) → fails with ELF 64-bit LSB executable, ARM aarch64...; matching image → passes; symlinked entrypoint (/app -> /real) → dereferenced, passes; shell-script entrypoint → skips; cmd-only image (no entrypoint) → falls back to cmd, passes. zizmor no findings; shellcheck clean.

🤖 Generated with Claude Code

The manifest of a pushed image always claims whatever --platform requested,
regardless of what the Dockerfile actually compiled or copied in. A Dockerfile
that hardcodes GOARCH (or copies a foreign-arch binary) therefore publishes an
image that scans clean, gets signed, and only fails on the target nodes with
'exec format error'. Found while reviewing the kafkaup-operator migration,
whose scaffold Dockerfile hardcodes GOARCH=amd64 while the release workflow
pushes both amd64 and arm64.

Add a verification step between push and cosign signing: pull the pushed
digest for the requested platform, extract the entrypoint (or cmd) binary via
docker create + docker cp -L (no container start, so no emulation needed),
and compare the ELF e_machine field against the requested arch. Mismatch
fails the action before signing. Indeterminate cases (no entrypoint, relative
path, non-ELF entrypoint such as shell scripts) warn and pass so existing
script-entrypoint consumers are unaffected. A verify-binary-arch input
(default true) can disable the check and its image pull-back.
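As a rough illustration of the e_machine comparison described above (not the parser from this commit), the field can be read with od. The function name `elf_e_machine` is hypothetical. The sketch relies on the ELF header layout: magic bytes at offset 0, the byte-order flag EI_DATA at offset 5, and the 16-bit e_machine field at offset 18, where 62 means x86-64 and 183 means aarch64.

```shell
# Hypothetical helper: prints the ELF e_machine value in decimal.
# Returns non-zero and prints nothing if the file is not ELF.
elf_e_machine() {
  local path="$1" magic data

  # Bytes 0-3 must be the ELF magic: 0x7f 'E' 'L' 'F'.
  magic="$(od -An -tx1 -N4 "$path" | tr -d ' \n')"
  [ "$magic" = "7f454c46" ] || return 1

  # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
  data="$(od -An -tu1 -j5 -N1 "$path" | tr -d ' \n')"

  # e_machine is the two-byte field at offset 18; read both bytes
  # as unsigned decimals into $1 and $2.
  # shellcheck disable=SC2046
  set -- $(od -An -tu1 -j18 -N2 "$path")
  [ "$#" -eq 2 ] || return 1

  case "$data" in
    1) echo $(( $1 + $2 * 256 )) ;;
    2) echo $(( $1 * 256 + $2 )) ;;
    *) return 1 ;;
  esac
}
```

The result would then be compared against the value expected for the requested platform (62 for amd64, 183 for arm64).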

Also add .github/actions/** to the zizmor lint workflow path filters so
composite-action changes trigger the security audit; this change itself was
the first to miss it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@LouisLotter LouisLotter requested a review from a team as a code owner June 12, 2026 07:11
Replace the hand-rolled ELF header parser (dd/od, endianness handling,
e_machine table) with file(1), which ships on all GitHub-hosted runners and
prints the architecture in one line. Also drop the verify-binary-arch opt-out
input — the skip paths already cover every legitimate image shape (script
entrypoints, no entrypoint, extraction failure), so there is no valid reason
to publish an ELF entrypoint whose architecture differs from the platform —
and the explicit docker pull and absolute-path pre-check, which docker create
and the docker cp failure path already cover. Map only amd64/arm64 since
those are the architectures we publish; anything else skips with a warning.

Re-validated against the same local test images: arch-mismatch fails, match
passes, symlinked entrypoint dereferences, script entrypoint and cmd-only
images skip.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

LouisLotter commented (Contributor, Author)

Due diligence on "isn't there a tool for this?" — researched before settling on the ~30-line step; short answer: no, and this step is the textbook encoding of what the ecosystem recommends.

  • No maintained tool or action validates "binary inside matches the manifest claim". Manifest-side tools (docker manifest inspect, buildx imagetools inspect, skopeo) only read the manifest — the thing that lies.
  • container-structure-test has file/metadata/command tests but no binary-arch assertion — and would add a pinned third-party binary + per-repo test YAML (more surface, not less).
  • crane's documented workflow for this exact question is crane export + file on the binary — i.e. this step, plus a tool install we don't need (docker create/cp does the extraction with what's already on the runner).
  • The SBOM we already attach could theoretically be queried for GOARCH, but that's jq-over-SPDX archaeology, syft-format-dependent, and Go-only — more fragile and more code.
  • Docker build checks (docker build --check) lint Dockerfile syntax and can't see GOARCH=amd64 inside a RUN line — they wouldn't catch the kafkaup case.
  • Precedent: oauth2-proxy shipped exactly this bug in v7.2.0 (#1442, x86_64 binary in the arm image) and fixed only the build (#1445) — prevention, no gate, so a regression ships again. Where the literature recommends verification at all (example), it's literally "run file on the main binary per platform — catches the wrong artifact even when the manifest looks right." The only other pattern in the wild is a QEMU docker run smoke test, which executes arbitrary entrypoint code at publish time — not something a shared action should do.

The step reduces to three irreducible parts: extract without executing (docker create + cp -L), identify (file -b), and don't false-positive script-entrypoint images (the skip branches). Everything else was already cut in ace2a56.
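A minimal sketch of the "extract without executing" part, assuming a Docker CLI that supports `docker create --platform` and `docker cp -L`. The function name `extract_entry_binary` and its argument order are hypothetical, and the Go template is just one way to pick the first Entrypoint element with a fallback to the first Cmd element. The code in the action may differ.

```shell
# Hypothetical helper: copies the entrypoint (fallback: cmd) binary of an
# image out to a local path without ever starting the container.
# $1: image reference (ideally a digest), $2: platform such as linux/arm64,
# $3: destination file. Non-zero return means "could not extract", which a
# caller would treat as a skip, not as a failure.
extract_entry_binary() {
  local image_ref="$1" platform="$2" dest="$3" cid entry status=0

  # Created but never started, so no emulation is involved.
  cid="$(docker create --platform "$platform" "$image_ref")" || return 1

  # First element of Entrypoint, else first element of Cmd, else empty.
  entry="$(docker inspect --format \
    '{{with .Config.Entrypoint}}{{index . 0}}{{else}}{{with .Config.Cmd}}{{index . 0}}{{end}}{{end}}' \
    "$cid")" || status=1

  if [ "$status" -eq 0 ] && [ -n "$entry" ]; then
    # -L follows symlinks, so /app -> /real yields the real binary.
    docker cp -L "$cid:$entry" "$dest" || status=1
  else
    status=1
  fi

  # Always clean up the stopped container.
  docker rm "$cid" >/dev/null
  return "$status"
}
```

On success, `file -b "$dest"` gives the one-line description that the identify step inspects.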

🤖 Generated with Claude Code

Andreagit97 commented Jun 12, 2026 (Contributor)

Due diligence on "isn't there a tool for this?" — researched before settling on the ~30-line step; short answer: no, and this step is the textbook encoding of what the ecosystem recommends.

IMHO the reason there is no tool for this is that we should probably test that the image works before publishing it 😂

Andreagit97 left a comment (Contributor)

As discussed, I'm OK with that, but I'm happy to remove this workaround in case of future issues.

@LouisLotter LouisLotter merged commit f7e766e into main Jun 12, 2026
1 check passed
@LouisLotter LouisLotter deleted the STAC-25033-verify-binary-arch branch June 12, 2026 08:38