Docker 25 / API v1.44: fix buildpacks remote-builder breakage #42
jphenow wants to merge 31 commits into
Update base Docker image from 24.0.7-alpine3.19 to 25.0.5-alpine3.20. Update Go version from 1.21 to 1.24.0. This prepares the foundation for upgrading dependent components.
…ersions Upgrade accelerated-container-image from v1.0.4 to v1.4.1 for improved performance. Pin overlaybd to a specific commit (6a6651652014bbcc5dd87a49f15ca2638ae9b1dc) and update alpine base image from 3.19 to 3.20. Update buildx from v0.12 to v0.13.1 to leverage latest Docker build improvements.
Update docker/docker package from v20.10.8 to v25.0.5+incompatible to align with the upgraded Docker base image. Update Go module dependencies to resolve transitive dependencies introduced by newer Docker version, including logrus, protobuf, gRPC, and OpenTelemetry libraries. Update documentation URL in storage.go comment to point to current API version (v1.44).
Updated Go versions in multi-stage build to match project requirements:
- overlaybd_snapshotter_build: Go 1.21 → 1.23 (overlaybd-snapshotter v1.4.1 requires Go 1.23+)
- dockerproxy_build: Go 1.21 → 1.24 (dockerproxy go.mod requires Go 1.24+)
- Fixed docker/buildx-bin tag from v0.13.1 (non-existent) to v0.13 (available)
These version constraints are dictated by the upstream projects and ensure all components compile successfully in the Docker 25.0.5 environment.
Created 9-phase test suite to validate Docker 25.0.5 upgrade with emphasis on buildpacks API v1.44 compatibility (resolves the "client version 1.52 is too new. Maximum supported API version is 1.43" error). Test phases:
- Environment setup and Docker installation
- Go build verification and code quality checks
- Docker image build with overlaybd compilation
- Component version verification
- Docker API v1.44 functionality testing
- Buildpacks API v1.44 compatibility (critical test)
- overlaybd image conversion
- Storage management and pruning
- End-to-end integration tests
All tests passed, confirming Docker 25.0.5 with API v1.44 is ready for deployment.
Documented comprehensive test results from Docker 25.0.5 upgrade validation, including all 9 test phases, version confirmations, and the critical buildpacks API v1.44 compatibility verification. Records the issues encountered during upgrade (Go version constraints, missing buildx tag) and their resolutions. This serves as both validation proof and reference documentation for the upgrade process.
dangra left a comment
Looks good, Jon. The tests are nice to have. It would be ideal to hook CI up to run them in a sprite, but that's something for another day.
Yeah, I'll probably do that before I merge.
Add comprehensive exclusions for git artifacts, tests, documentation, development files, build artifacts, and IDE configurations. This reduces image size and keeps the build context focused on runtime requirements.
Establish consistent code quality checks with pre-commit hooks for trailing whitespace, YAML validation, and Go formatting. Add corresponding Makefile targets to make linting easily accessible locally.
Replace single monolithic job with three specialized jobs (lint, test, build) that run in sequence with proper dependencies. Add tiered testing capability with workflow_dispatch inputs and smart tier selection based on branch/tag. Implement concurrent cancellation and modern GitHub Actions versions.
Add three-tier test structure supporting different testing scenarios: Tier 1 for fast PR validation, Tier 2 for critical integration tests (including API v1.44 compatibility check), and Tier 3 for full validation. Include test harness with common helper functions, per-tier test scripts, and comprehensive documentation explaining the testing strategy and how to run tests locally and in CI.
Replace hardcoded absolute paths with a portable method that computes the repository root relative to the test script location. This allows tests to run from any working directory and makes the test suite more robust across different execution environments.
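A minimal sketch of the approach this commit describes: resolve the repository root from the script's own location rather than from a hardcoded path. The helper name `repo_root_for` and the assumption that the script sits one directory below the root (for example `tests/run-tests.sh`) are illustrative, not taken from the PR.

```shell
# Hypothetical helper: print the repository root for a given script path,
# assuming the script lives exactly one directory below the root.
repo_root_for() {
  local script_path="$1"
  local script_dir
  # Resolve the script's directory to an absolute path first, so the
  # result does not depend on the caller's working directory.
  script_dir="$(cd "$(dirname "${script_path}")" && pwd)"
  (cd "${script_dir}/.." && pwd)
}

# Typical use at the top of a bash test script:
#   REPO_ROOT="$(repo_root_for "${BASH_SOURCE[0]}")"
```

Because both `cd` calls run inside command substitutions or subshells, the caller's working directory is left untouched.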
Previously, CI would run on all branches when pushed. This caused unnecessary build overhead and noise in logs. By restricting to main and tags, we reduce CI load while maintaining safety checks on PRs and explicit deployments.
Instead of relying on the default entrypoint (which requires --privileged for dockerd setup), explicitly invoke docker and dockerd binaries. This allows version checks to run without elevated privileges while keeping the logic simpler and more direct.
Update the Go version across the build system and documentation to support newer language features and improvements. This upgrade is required for modern dependency compatibility.
Remove CLAUDE.md from Docker build context (it's documentation, not a runtime dependency), add dockerproxy binary to gitignore, and update a GitHub link to reference a specific commit for consistency.
Replace the old sequential test suite (00-08) with a tiered test structure already implemented in tier1/tier2/tier3. The new structure is modular, maintainable, and maps 1:1 to the old scripts. Update README to remove migration notes now that the transition is complete.
Extend the tier1 version check to verify Buildx v0.13.x and Alpine 3.20.x are present in the image, ensuring all critical components are correctly versioned alongside Docker itself.
The buildx binary is now provided via the Alpine base image or Docker installation, eliminating the need for an explicit multi-stage copy. This simplifies the build process and reduces maintenance overhead.
The version check test (tier1/03-version-check.sh) verified component versions that are now effectively guaranteed by the Docker image build itself. Removing this redundant test simplifies the test suite and reduces CI execution time without losing coverage of the actual components present in the image.
Incorporated into CI with some adjustments. I'll work on some testing within flyctl this week.
Scaffold a tier3 matrix job that exercises the full pack lifecycle through
a locally-run rchab container for each major CNB builder family. Intent is
regression surface, not green wall — older Heroku stacks (heroku-20,
classic:cnb) are marked EXPECT_FAIL=1 so they surface as informational
when their upstream dependencies 403, but still flag unexpected successes.
Also:
- test-buildpacks-deploy.sh: pin ruby 3.3.5p100 (required by heroku-24)
and set [env] PORT=3000 so the heroku/ruby default web process doesn't
boot-loop on `${PORT:?}`.
- Add tests/.gitignore exception for fixtures/**/*.txt (requirements.txt
and runtime.txt are source, not test output).
The matrix is gated: runs on main, release tags, and PRs with the
`test:buildpacks` label. `continue-on-error: true` at the job level, no
dependency from the `build` job — informational for now, promote rows to
required later once stable.
Default pull_request trigger types are opened/synchronize/reopened, not labeled. Adding test:tier2 / test:tier3 / test:buildpacks after a PR opens currently requires a manual re-push to re-evaluate the job `if` conditions. Include `labeled` explicitly.
… fixture Heroku's Python CNB buildpack (heroku/python@6.4.1+) no longer accepts runtime.txt — it fails fast with "The runtime.txt file isn't supported" and points users at .python-version. GCP's buildpacks still accept runtime.txt. Keeping both files in the fixture exercises either path without the matrix needing per-builder fixtures. Caller still passes RUNTIME_VERSION as a full version string (e.g. 3.11.9); heroku prefers just major.minor and may warn about pinning, but accepts the patch version.
Heroku's Python CNB buildpack treats the presence of runtime.txt as a
hard error ("The runtime.txt file isn't supported ... Please delete
your runtime.txt file"), not a deprecation warning. GCP's buildpacks
prefer .python-version when both are present (the build log reads
"Using Python version from /workspace/.python-version: 3.11.9").
So runtime.txt is unambiguously obsolete for this fixture. Keep only
.python-version.tmpl.
- Drop mention of tier1/03-version-check.sh (removed in 9c102b5).
- Add tier3/04-buildpacks-matrix.sh to the structure tree.
- Remove hardcoded "cd /home/sprite/flyctl/rchab" from the pre-commit install snippet (left over from a previous author's dev env).
…rix (#42) The header comment claimed surprise-success returns exit 0 'because it's still info, not a regression', but the code at line 128 deliberately exits 5 so CI surfaces the row in red and the EXPECT_FAIL marker gets removed. Align the header with the actual (intentional) behavior and enumerate the rest of the exit codes while we're here.
The Go-client portion of the API v1.44 compatibility test previously swallowed every stderr with 2>/dev/null and fell through to a 'skipping programmatic test' banner when go build returned non-zero. Because this is described in the README as THE critical regression test for the buildpacks 'client version 1.52 is too new' breakage, silently skipping it on toolchain issues defeats the purpose.
- Build in a mktemp dir so go.mod/go.sum don't leak to /tmp.
- Run the whole build-and-run sequence in a subshell and rely on set -e to surface any failure.
- Drop the stderr redirections so failures print their actual cause.
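A rough sketch of the temp-directory-plus-subshell pattern listed above. The helper name `run_in_tmpdir` is hypothetical. One deliberate difference from the commit text: this version captures the exit status explicitly instead of relying on `set -e`, because `set -e` is ignored inside a command that sits on the left of `||`.

```shell
# Hypothetical helper: run a command inside a throwaway directory so that
# build artifacts (go.mod, go.sum, binaries) never land in /tmp itself.
run_in_tmpdir() {
  local workdir status=0
  workdir="$(mktemp -d)"
  # The subshell scopes the cd. stderr is left untouched so a failing
  # command prints its actual cause.
  ( cd "${workdir}" && "$@" ) || status=$?
  rm -rf "${workdir}"
  return "${status}"
}

# Example call (the command is an assumption, not the PR's exact script):
#   run_in_tmpdir sh -c 'go mod init apicheck && go build -o apicheck . && ./apicheck'
```

The non-zero status of the wrapped command is returned unchanged, so the caller can fail the test instead of printing a skip banner.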
run_test calls 'exit 1' on first failure, which means set -e skipped the cleanup block at the bottom of run-tests.sh. That left the rchab-test container running on the host between local iterations, and then subsequent docker-run calls failed with 'name already in use'. Hoist the cleanup into a trap on EXIT so it fires on any exit path, and factor the repeated tier1/tier2 invocation lists into small helper functions while the file is open.
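A minimal sketch of cleanup on every exit path, assuming the container name `rchab-test` from the message above. The PR hoists the trap to the top level of run-tests.sh; the `run_with_cleanup` wrapper and `remove_container` helper here are hypothetical and only keep the example self-contained.

```shell
CONTAINER_NAME="${CONTAINER_NAME:-rchab-test}"

# Remove the test container. "|| true" keeps a missing container (or a
# missing docker binary) from masking the script's real exit status.
remove_container() {
  docker rm -f "$1" >/dev/null 2>&1 || true
}

cleanup() {
  remove_container "${CONTAINER_NAME}"
}

# Hypothetical wrapper: run a command in a subshell whose EXIT trap fires
# on every exit path, including an explicit `exit 1` from a failing test.
run_with_cleanup() {
  (
    trap cleanup EXIT
    "$@"
    exit "$?"
  )
}
```

The trap does not change the exit status, so a failing test still fails the run after the container has been removed.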
The 45-second sleep was a worst-case estimate of dockerd + dockerproxy startup time. On a cold runner it was sometimes still too short; on a warm local machine it was ~30s of waste per test run, and container failures only surfaced after the full wait when the subsequent curl finally errored out. Replace it with a poll loop that:
- hits the dockerd /_ping endpoint (proxied straight through by dockerproxy on :2375) every 2s,
- gives up if the container dies early, dumping the last 50 log lines instead of timing out silently,
- defaults to a 60s timeout, overridable via RCHAB_READY_TIMEOUT.
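The loop described above could look roughly like the sketch below. The function names (`wait_until_ready`, `probe_ping`, `container_alive`) and the return codes are assumptions. Only the `/_ping` endpoint, the 2s interval, the 60s default, and `RCHAB_READY_TIMEOUT` come from the commit message.

```shell
# Hypothetical helper: poll a readiness probe until it succeeds, the
# container dies, or the timeout elapses. The probe and the liveness check
# are passed as function names so the loop has no hard Docker dependency.
# Returns 0 when ready, 1 on timeout, 2 if the container exited early.
wait_until_ready() {
  local probe="$1" alive="$2"
  local timeout="${RCHAB_READY_TIMEOUT:-60}" interval="${3:-2}"
  local waited=0
  while [ "${waited}" -lt "${timeout}" ]; do
    if "${probe}"; then
      return 0
    fi
    if ! "${alive}"; then
      echo "container exited before becoming ready" >&2
      return 2
    fi
    sleep "${interval}"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for readiness" >&2
  return 1
}

# Example wiring (container name and port taken from the test harness):
probe_ping() { curl -fsS http://localhost:2375/_ping >/dev/null 2>&1; }
container_alive() {
  [ "$(docker inspect -f '{{.State.Running}}' rchab-test 2>/dev/null)" = "true" ]
}
# wait_until_ready probe_ping container_alive || docker logs --tail 50 rchab-test
```

Checking liveness on each iteration is what lets a crashed container fail within one interval instead of after the full timeout.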
Pull request overview
This PR upgrades rchab’s Docker Engine base (Docker 25 / API v1.44) to unblock modern buildpacks tooling, and adds a tiered CI + local test harness to prevent regressions (including a dedicated API v1.44 compatibility check).
Changes:
- Upgrade runtime/build tooling: Docker base image → docker:25.0.5-alpine3.20, overlaybd/snapshotter updates, Go toolchain + Docker SDK bump.
- Add a 3-tier test suite (tier1/2/3) plus a buildpacks builder×language matrix runner.
- Restructure GitHub Actions CI into separate lint/test/build/buildpacks jobs with tier selection via branch/tag/labels.
Reviewed changes
Copilot reviewed 33 out of 35 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| Dockerfile | Bumps Go builder images and Docker base image; updates overlaybd build inputs and removes explicit buildx copy. |
| dockerproxy/go.mod | Updates Go version and Docker SDK dependency set for Docker 25. |
| dockerproxy/go.sum | Dependency lockfile updates following module upgrades. |
| dockerproxy/storage.go | Updates moby API doc link to v1.44. |
| .github/workflows/ci.yaml | Splits CI into lint/test/buildpacks/build jobs; adds tier selection and buildpacks matrix job. |
| tests/run-tests.sh | Adds tier-aware local/CI test runner orchestrating tier1/2/3 scripts. |
| tests/lib/common.sh | Adds shared helpers to start/stop an rchab test container and run tier scripts. |
| tests/tier1/01-go-unit-tests.sh | Tier1 Go unit test + vet + module verify wrapper. |
| tests/tier1/02-docker-build-verify.sh | Tier1 Docker image build/inspect verification. |
| tests/tier2/01-api-compatibility.sh | Tier2 critical API v1.44 regression test, including a small Go client. |
| tests/tier2/02-docker-in-docker.sh | Tier2 nested-docker smoke test. |
| tests/tier2/03-endpoint-smoke.sh | Tier2 smoke tests for rchab custom endpoints and proxy ping/version. |
| tests/tier3/01-overlaybd.sh | Tier3 overlaybd endpoint/asset smoke test. |
| tests/tier3/02-storage-pruning.sh | Tier3 prune endpoint + disk/image listing checks. |
| tests/tier3/03-integration.sh | Tier3 end-to-end build/run workflow + misc checks. |
| tests/tier3/04-buildpacks-matrix.sh | CI-oriented runner for pack build against rchab for one builder/fixture pair. |
| tests/fixtures/buildpacks/ruby/config.ru | Ruby buildpacks fixture app. |
| tests/fixtures/buildpacks/ruby/Procfile | Ruby fixture Procfile. |
| tests/fixtures/buildpacks/ruby/Gemfile.tmpl | Ruby Gemfile template. |
| tests/fixtures/buildpacks/ruby/Gemfile.lock.tmpl | Ruby Gemfile.lock template with runtime version substitution. |
| tests/fixtures/buildpacks/python/app.py | Python buildpacks fixture app. |
| tests/fixtures/buildpacks/python/requirements.txt | Python fixture deps. |
| tests/fixtures/buildpacks/python/Procfile | Python fixture Procfile. |
| tests/fixtures/buildpacks/python/.python-version.tmpl | Python runtime version template. |
| tests/fixtures/buildpacks/nodejs/index.js | Node buildpacks fixture app. |
| tests/fixtures/buildpacks/nodejs/package.json.tmpl | Node runtime version template via engines field. |
| tests/fixtures/buildpacks/nodejs/Procfile | Node fixture Procfile. |
| tests/README.md | Documentation for tiers, local usage, and CI behavior. |
| tests/.gitignore | Ignores generated test outputs/logs under tests/. |
| test-buildpacks-deploy.sh | Local helper script for generating and optionally deploying a buildpacks app via flyctl + rchab override. |
| Makefile | Adds test, test-integration, test-all, and lint targets. |
| CLAUDE.md | Repository/architecture notes (tooling/CI/env vars), updated for Docker 25 + Go changes. |
| .pre-commit-config.yaml | Adds pre-commit hooks (whitespace/yaml + local Go hooks). |
| .gitignore | Ignores built dockerproxy/dockerproxy binary. |
| .dockerignore | Expands build context ignores (tests, docs, Vagrant, IDE files, etc.). |
Comments suppressed due to low confidence (1)
dockerproxy/go.mod:71
`dockerproxy/go.mod` both requires `github.com/docker/docker v25.0.5+incompatible` and replaces it with the exact same version. The `replace` is redundant and can make future dependency updates more confusing; consider removing it unless it’s addressing a specific transitive-resolution issue (in which case, add a short comment explaining why it’s needed).
replace github.com/docker/docker => github.com/docker/docker v25.0.5+incompatible
    docker run -d \
      --privileged \
      --name rchab-test \
      -p 8080:8080 \
      -p 2375:2375 \
      -e NO_AUTH=1 \
      -e NO_APP_NAME=1 \
      -e FLY_APP_NAME=rchab-test \
      -v /tmp/rchab-data:/data \
      --entrypoint /bin/sh \
start_rchab_container binds /tmp/rchab-data on the host to /data in the container. Because this path is static, local runs (or multiple test invocations) can interfere with each other and accumulate state/disk usage across runs. Consider using a per-run temp dir (and removing it in stop_rchab_container/a trap) to keep tests hermetic.
      -e FLY_APP_NAME=rchab-test \
      -v /tmp/rchab-data:/data \
      --entrypoint /bin/sh \
      "${image}" -c "dockerd &>/var/log/dockerd.log & sleep 5 && /dockerproxy"
The comment says the container startup avoids “blind-sleeping”, but the actual command still does sleep 5 before starting /dockerproxy. Either remove the fixed sleep (rely on the existing poll loop) or update the comment so it matches the behavior.
Suggested change:
    - "${image}" -c "dockerd &>/var/log/dockerd.log & sleep 5 && /dockerproxy"
    + "${image}" -c "dockerd &>/var/log/dockerd.log & /dockerproxy"
    if version.APIVersion < "1.44" {
        fmt.Fprintf(os.Stderr, "✗ ERROR: API version %s too old (need 1.44+)\n", version.APIVersion)
        os.Exit(1)
    }
This comparison is lexicographical (string) ordering, not a numeric API-version check. For example, versions like "1.9" vs "1.44" or future "1.100" can produce incorrect results. Parse the version components (split on '.') and compare numerically, or use a proper version parser before deciding the server is < 1.44.
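The comment concerns the Go client, but the same pitfall applies to any script-side version gate. As a string, "1.100" sorts before "1.44" and "1.9" sorts after it, so both cases give the wrong answer. Below is a minimal shell sketch of a numeric check, with a hypothetical helper name and the assumption that versions always have the form `major.minor`.

```shell
# Hypothetical helper: succeed when API version $1 is >= $2, comparing the
# major and minor components as integers rather than as strings.
# Assumes both arguments look like "major.minor" (e.g. 1.44).
api_version_at_least() {
  local have_major="${1%%.*}" have_minor="${1#*.}"
  local want_major="${2%%.*}" want_minor="${2#*.}"
  if [ "${have_major}" -ne "${want_major}" ]; then
    # Majors differ, so the major component alone decides.
    [ "${have_major}" -gt "${want_major}" ]
    return
  fi
  [ "${have_minor}" -ge "${want_minor}" ]
}

# Example: api_version_at_least "${SERVER_API_VERSION}" 1.44 || exit 1
# (SERVER_API_VERSION is a placeholder for whatever the script queried.)
```

The Go client would need the equivalent: split on ".", convert each part to an integer, and compare component by component.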
    echo "Test 1: /flyio/v1/settings endpoint..."
    RESPONSE=$(curl -s http://localhost:8080/flyio/v1/settings)
    echo "${RESPONSE}" | jq .
    echo "✓ Settings endpoint works"
This script assumes jq is installed; if it isn’t, set -euo pipefail will cause the test to abort even though the endpoint response might be fine. Either add a command -v jq preflight with a clear error, or make the jq pretty-print optional (fallback to raw output).
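One way to make the pretty-print optional, sketched with a hypothetical helper name (`pretty_json`). It falls back to the raw response when jq is not installed.

```shell
# Hypothetical helper: pretty-print JSON from stdin with jq when it is
# installed, otherwise pass the raw response through unchanged.
pretty_json() {
  if command -v jq >/dev/null 2>&1; then
    jq .
  else
    cat
  fi
}

# Example: curl -s http://localhost:8080/flyio/v1/settings | pretty_json
```

With jq present, an invalid JSON response still fails the pipeline, so the check keeps its value where the tool is available.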
    WORKDIR /app
    COPY dockerproxy .
    RUN GOOS=linux GARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"
In the dockerproxy build stage, GARCH=amd64 is not a recognized Go environment variable (it should be GOARCH). Also, $BUILD_SHA will be empty here because ARG BUILD_SHA is only declared in the final stage, so the ldflags gitSha will not be populated.
Suggested change:
    - WORKDIR /app
    - COPY dockerproxy .
    - RUN GOOS=linux GARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"
    + ARG BUILD_SHA
    + WORKDIR /app
    + COPY dockerproxy .
    + RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"
    - ✅ Every push to any branch
    - ✅ Manual workflow dispatch
The CI behavior described here doesn’t match the current workflow triggers: .github/workflows/ci.yaml only runs on push to main (plus tags) and on pull_request. “Every push to any branch” is no longer accurate unless there’s another workflow not shown here.
Suggested change:
    - - ✅ Every push to any branch
    - - ✅ Manual workflow dispatch
    + - ✅ Manual workflow dispatch
    # Source common functions
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    source "${SCRIPT_DIR}/lib/common.sh"
Consider adding a small preflight here for required local dependencies (docker, curl, and jq at minimum). Right now missing tools will fail later with less actionable errors, even though this is intended to be a developer-facing test runner.
Suggested change:
    + check_required_dependencies() {
    +   local missing=()
    +   local dependency
    +   for dependency in docker curl jq; do
    +     if ! command -v "${dependency}" >/dev/null 2>&1; then
    +       missing+=("${dependency}")
    +     fi
    +   done
    +   if [ "${#missing[@]}" -gt 0 ]; then
    +     echo "Error: Missing required local dependencies: ${missing[*]}"
    +     echo "Please install the missing tool(s) before running this test suite."
    +     exit 1
    +   fi
    + }
    + check_required_dependencies
    name: go mod tidy
    entry: bash -c 'cd dockerproxy && go mod tidy'
    language: system
    files: \.go$
The go-mod-tidy hook is configured to run only when a *.go file changes, so edits to dockerproxy/go.mod / go.sum won’t trigger it. Update the files: selector (or use types: [go, go-mod]) so dependency-only changes still run go mod tidy.
Suggested change:
    - files: \.go$
    + files: ^dockerproxy/.*\.go$|^dockerproxy/go\.(mod|sum)$
Summary
Upgrades the rchab base image from `docker:24.0.7-alpine3.19` to `docker:25.0.5-alpine3.20` (Docker API 1.43 → 1.44) to fix the silent buildpacks failure on `flyctl deploy --remote-only`.
Modern buildpacks lifecycle tooling and recent `pack` builds speak Docker API ≥ 1.44, which rchab's old `docker:24.*` base couldn't serve.
Closes #36.
Pairs with flyctl#4829 (already merged) — not a shipping dependency, referenced for context.
Changes
- `Dockerfile`: `docker:24.0.7-alpine3.19` → `docker:25.0.5-alpine3.20`; overlaybd v1.0.4 → v1.4.1 (pinned commit); buildx now bundled in base, explicit copy removed
- `dockerproxy/go.mod`: `docker/docker` v20.10.8 → v25.0.5; dropped obsolete `containerd`/`docker` replace directives
- `dockerproxy/storage.go`
- `.github/workflows/ci.yaml`: `main` job → `lint`/`test`/`buildpacks`/`build`; label-driven tier escalation (`test:tier2`/`test:tier3`/`test:buildpacks`); buildpacks matrix `continue-on-error` (informational for now)
- `tests/`
- `CLAUDE.md`, expanded `.dockerignore`, pre-commit hooks, Makefile lint targets