Docker 25 / API v1.44: fix buildpacks remote-builder breakage #42

Open

jphenow wants to merge 31 commits into main from jphenow/buildpacks

@jphenow jphenow commented Feb 6, 2026

Summary

Upgrades the rchab base image from docker:24.0.7-alpine3.19 to docker:25.0.5-alpine3.20 (Docker API 1.43 → 1.44) to fix the silent buildpacks failure on flyctl deploy --remote-only:

client version 1.52 is too new. Maximum supported API version is 1.43

Modern buildpacks lifecycle tooling and recent pack builds speak Docker API ≥ 1.44, which rchab's old docker:24.* base couldn't serve.

Closes #36.

Pairs with flyctl#4829 (already merged) — not a shipping dependency, referenced for context.

Changes

| Area | Change |
|---|---|
| Dockerfile | base docker:24.0.7-alpine3.19 → docker:25.0.5-alpine3.20; overlaybd v1.0.4 → v1.4.1 (pinned commit); buildx now bundled in base, explicit copy removed |
| dockerproxy/go.mod | Go 1.21 → 1.24; docker/docker v20.10.8 → v25.0.5; dropped obsolete containerd / docker replace directives |
| dockerproxy/storage.go | moby API doc URL bumped to v1.44.yaml (cosmetic) |
| .github/workflows/ci.yaml | split monolithic main job → lint / test / buildpacks / build; label-driven tier escalation (test:tier2 / test:tier3 / test:buildpacks); buildpacks matrix is continue-on-error (informational for now) |
| tests/ | new tiered suite: tier1 (fast), tier2 (API v1.44 regression, the critical one), tier3 (overlaybd, storage, buildpacks matrix across Heroku / Paketo / GCP × Ruby / Node / Python) |
| Plumbing | CLAUDE.md, expanded .dockerignore, pre-commit hooks, Makefile lint targets |

Update base Docker image from 24.0.7-alpine3.19 to 25.0.5-alpine3.20.
Update Go version from 1.21 to 1.24.0. This prepares the foundation for
upgrading dependent components.

…ersions

Upgrade accelerated-container-image from v1.0.4 to v1.4.1 for improved
performance. Pin overlaybd to a specific commit (6a6651652014bbcc5dd87a49f15ca2638ae9b1dc)
and update alpine base image from 3.19 to 3.20. Update buildx from v0.12 to
v0.13.1 to leverage latest Docker build improvements.

Update docker/docker package from v20.10.8 to v25.0.5+incompatible to align
with the upgraded Docker base image. Update Go module dependencies to resolve
transitive dependencies introduced by newer Docker version, including logrus,
protobuf, gRPC, and OpenTelemetry libraries. Update documentation URL in
storage.go comment to point to current API version (v1.44).

Updated Go versions in multi-stage build to match project requirements:
- overlaybd_snapshotter_build: Go 1.21 → 1.23 (overlaybd-snapshotter v1.4.1 requires Go 1.23+)
- dockerproxy_build: Go 1.21 → 1.24 (dockerproxy go.mod requires Go 1.24+)
- Fixed docker/buildx-bin tag from v0.13.1 (non-existent) to v0.13 (available)

These version constraints are dictated by the upstream projects and ensure all
components compile successfully in the Docker 25.0.5 environment.

Created 9-phase test suite to validate Docker 25.0.5 upgrade with emphasis
on buildpacks API v1.44 compatibility (resolves the "client version 1.52 is
too new. Maximum supported API version is 1.43" error).

Test phases:
- Environment setup and Docker installation
- Go build verification and code quality checks
- Docker image build with overlaybd compilation
- Component version verification
- Docker API v1.44 functionality testing
- Buildpacks API v1.44 compatibility (critical test)
- overlaybd image conversion
- Storage management and pruning
- End-to-end integration tests

All tests passed, confirming Docker 25.0.5 with API v1.44 is ready for deployment.

Documented comprehensive test results from Docker 25.0.5 upgrade validation,
including all 9 test phases, version confirmations, and the critical buildpacks
API v1.44 compatibility verification. Records the issues encountered during
upgrade (Go version constraints, missing buildx tag) and their resolutions.

This serves as both validation proof and reference documentation for the
upgrade process.
@jphenow jphenow requested a review from dangra February 6, 2026 23:04

@dangra dangra left a comment

Looks good, Jon. The tests are a nice thing to have. It would be ideal to hook CI up to run them in a sprite, but that's something for another day.

jphenow commented Feb 7, 2026

Yeah, I'll probably do that before I merge.

Add comprehensive exclusions for git artifacts, tests, documentation,
development files, build artifacts, and IDE configurations. This reduces
image size and keeps the build context focused on runtime requirements.

Establish consistent code quality checks with pre-commit hooks for
trailing whitespace, YAML validation, and Go formatting. Add corresponding
Makefile targets to make linting easily accessible locally.

Replace single monolithic job with three specialized jobs (lint, test, build)
that run in sequence with proper dependencies. Add tiered testing capability
with workflow_dispatch inputs and smart tier selection based on branch/tag.
Implement concurrent cancellation and modern GitHub Actions versions.

Add three-tier test structure supporting different testing scenarios:
Tier 1 for fast PR validation, Tier 2 for critical integration tests
(including API v1.44 compatibility check), and Tier 3 for full validation.
Include test harness with common helper functions, per-tier test scripts,
and comprehensive documentation explaining the testing strategy and how
to run tests locally and in CI.

Replace hardcoded absolute paths with a portable method that computes
the repository root relative to the test script location. This allows
tests to run from any working directory and makes the test suite more
robust across different execution environments.
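
A minimal sketch of one way to compute the root, assuming the script sits one directory below the repository root (as tests/run-tests.sh does). The helper name repo_root_from is hypothetical and not taken from the PR:

```shell
# Sketch: locate the repository root relative to a script's own location,
# independent of the caller's working directory.
# Assumption: the script lives one level below the root (e.g. tests/).
repo_root_from() {
  local script_path="$1"
  local script_dir
  # Resolve the directory that contains the script to an absolute path.
  script_dir="$(cd "$(dirname "${script_path}")" && pwd)"
  # Step up one level to reach the repository root.
  (cd "${script_dir}/.." && pwd)
}

# Typical use at the top of a bash test script:
#   REPO_ROOT="$(repo_root_from "${BASH_SOURCE[0]}")"
```

Because both `cd` calls run inside command substitutions or subshells, the caller's working directory is left unchanged.
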
Previously, CI would run on all branches when pushed. This caused
unnecessary build overhead and noise in logs. By restricting to main
and tags, we reduce CI load while maintaining safety checks on PRs
and explicit deployments.

Instead of relying on the default entrypoint (which requires --privileged
for dockerd setup), explicitly invoke docker and dockerd binaries. This
allows version checks to run without elevated privileges while keeping
the logic simpler and more direct.

Update the Go version across the build system and documentation to
support newer language features and improvements. This upgrade is
required for modern dependency compatibility.

Remove CLAUDE.md from Docker build context (it's documentation, not
a runtime dependency), add dockerproxy binary to gitignore, and update
a GitHub link to reference a specific commit for consistency.

Replace the old sequential test suite (00-08) with a tiered test
structure already implemented in tier1/tier2/tier3. The new structure
is modular, maintainable, and maps 1:1 to the old scripts. Update
README to remove migration notes now that the transition is complete.

Extend the tier1 version check to verify Buildx v0.13.x and Alpine 3.20.x
are present in the image, ensuring all critical components are correctly
versioned alongside Docker itself.

The buildx binary is now provided via the Alpine base image or Docker
installation, eliminating the need for an explicit multi-stage copy.
This simplifies the build process and reduces maintenance overhead.

The version check test (tier1/03-version-check.sh) verified component
versions that are now effectively guaranteed by the Docker image build
itself. Removing this redundant test simplifies the test suite and
reduces CI execution time without losing coverage of the actual
components present in the image.
jphenow commented Feb 9, 2026

Incorporated into CI with some adjustments. I'll work on some testing within flyctl this week.

Scaffold a tier3 matrix job that exercises the full pack lifecycle through
a locally-run rchab container for each major CNB builder family. Intent is
regression surface, not green wall — older Heroku stacks (heroku-20,
classic:cnb) are marked EXPECT_FAIL=1 so they surface as informational
when their upstream dependencies 403, but still flag unexpected successes.

Also:
- test-buildpacks-deploy.sh: pin ruby 3.3.5p100 (required by heroku-24)
  and set [env] PORT=3000 so the heroku/ruby default web process doesn't
  boot-loop on `${PORT:?}`.
- Add tests/.gitignore exception for fixtures/**/*.txt (requirements.txt
  and runtime.txt are source, not test output).

The matrix is gated: runs on main, release tags, and PRs with the
`test:buildpacks` label. `continue-on-error: true` at the job level, no
dependency from the `build` job — informational for now, promote rows to
required later once stable.
jphenow added 3 commits April 17, 2026 13:18
Default pull_request trigger types are opened/synchronize/reopened, not
labeled. Adding test:tier2 / test:tier3 / test:buildpacks after a PR
opens currently requires a manual re-push to re-evaluate the job `if:`
conditions. Include labeled explicitly.

… fixture

Heroku's Python CNB buildpack (heroku/python@6.4.1+) no longer accepts
runtime.txt — it fails fast with "The runtime.txt file isn't supported"
and points users at .python-version. GCP's buildpacks still accept
runtime.txt. Keeping both files in the fixture exercises either path
without the matrix needing per-builder fixtures.

Caller still passes RUNTIME_VERSION as a full version string (e.g.
3.11.9); heroku prefers just major.minor and may warn about pinning,
but accepts the patch version.
jphenow added 6 commits April 17, 2026 14:39
Heroku's Python CNB buildpack treats the presence of runtime.txt as a
hard error ("The runtime.txt file isn't supported ... Please delete
your runtime.txt file"), not a deprecation warning. GCP's buildpacks
prefer .python-version when both are present (the build log reads
"Using Python version from /workspace/.python-version: 3.11.9").

So runtime.txt is unambiguously obsolete for this fixture. Keep only
.python-version.tmpl.

- Drop mention of tier1/03-version-check.sh (removed in 9c102b5).
- Add tier3/04-buildpacks-matrix.sh to the structure tree.
- Remove hardcoded "cd /home/sprite/flyctl/rchab" from the pre-commit
  install snippet — was left over from a previous author's dev env.

…rix (#42)

The header comment claimed surprise-success returns exit 0 'because it's
still info, not a regression', but the code at line 128 deliberately
exits 5 so CI surfaces the row in red and the EXPECT_FAIL marker gets
removed. Align the header with the actual (intentional) behavior and
enumerate the rest of the exit codes while we're here.
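
The intent described above can be sketched as a small decision function. Only exit code 5 for an unexpected success and the EXPECT_FAIL=1 marker come from the commit messages. The function name evaluate_row and the other return values are illustrative assumptions, not the script's actual exit-code table:

```shell
# Sketch: interpret a pack build result against the EXPECT_FAIL marker.
# Assumption: $1 is the exit status of the build step for one matrix row.
evaluate_row() {
  local build_status="$1"
  local expect_fail="${EXPECT_FAIL:-0}"

  if [ "${build_status}" -eq 0 ] && [ "${expect_fail}" = "1" ]; then
    # Surprise success: fail loudly so the stale marker gets removed.
    echo "unexpected success: remove the EXPECT_FAIL marker" >&2
    return 5
  fi

  if [ "${build_status}" -ne 0 ] && [ "${expect_fail}" = "1" ]; then
    # Known-bad row: report it, but do not count it as a regression.
    echo "expected failure (informational)"
    return 0
  fi

  # Rows without the marker simply propagate the build result.
  return "${build_status}"
}
```
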
The Go-client portion of the API v1.44 compatibility test previously
swallowed every stderr with 2>/dev/null and fell through to a 'skipping
programmatic test' banner when go build returned non-zero. Because this
is described in the README as THE critical regression test for the
buildpacks 'client version 1.52 is too new' breakage, silently skipping
it on toolchain issues defeats the purpose.

- Build in a mktemp dir so go.mod/go.sum don't leak to /tmp.
- Run the whole build-and-run sequence in a subshell and rely on
  set -e to surface any failure.
- Drop the stderr redirections so failures print their actual cause.
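
A rough sketch of that pattern, under stated assumptions: run_in_scratch_dir is a hypothetical helper name, and the command passed to it stands in for the real sequence (writing the Go client, `go build`, running the binary), which is not shown in the commit message.

```shell
# Sketch: run a command in a throwaway directory and fail loudly.
run_in_scratch_dir() {
  (
    # Inside the subshell, any failing step aborts the whole sequence.
    set -e
    scratch="$(mktemp -d)"
    # Remove the scratch dir whenever the subshell exits, pass or fail.
    trap 'rm -rf "${scratch}"' EXIT
    cd "${scratch}"
    # stderr is deliberately left alone so the real cause gets printed.
    "$@"
  )
}

# Usage (hypothetical step function):
#   run_in_scratch_dir build_and_run_go_client
```

One caveat worth keeping in mind: the shell ignores `set -e` inside the subshell when the helper is called as the condition of an `if` or on the left of `&&` / `||`, so a multi-step function passed in is best invoked as a plain statement.
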
run_test calls 'exit 1' on first failure, which means set -e skipped
the cleanup block at the bottom of run-tests.sh. That left the
rchab-test container running on the host between local iterations,
and then subsequent docker-run calls failed with 'name already in use'.

Hoist the cleanup into a trap on EXIT so it fires on any exit path,
and factor the repeated tier1/tier2 invocation lists into small helper
functions while the file is open.
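
A minimal sketch of the trap-based cleanup, assuming the container name rchab-test from the message above. The helper names cleanup_container and install_cleanup_trap are hypothetical:

```shell
# Sketch: make cleanup run on every exit path, including an `exit 1`
# raised by a failing test helper.
CONTAINER_NAME="${CONTAINER_NAME:-rchab-test}"

cleanup_container() {
  # `docker rm -f` stops and removes in one step; a missing container is fine.
  docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
}

install_cleanup_trap() {
  # EXIT traps are shell-wide, so this fires when the script exits for any reason.
  trap cleanup_container EXIT
}

# Usage: call install_cleanup_trap once near the top of the runner,
# before the first container is started.
```
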
The 45-second sleep was a worst-case estimate of dockerd + dockerproxy
startup time. On a cold runner it was sometimes still too short; on a
warm local machine it was ~30s of waste per test run, and container
failures only surfaced after the full wait when the subsequent curl
finally errored out.

Replace it with a poll loop that:
- hits the dockerd /_ping endpoint (proxied straight through by
  dockerproxy on :2375) every 2s,
- gives up if the container dies early, dumping the last 50 log
  lines instead of timing out silently,
- defaults to a 60s timeout, overridable via RCHAB_READY_TIMEOUT.
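
The loop described above might look roughly like the following. The /_ping path, port 2375, the 2s interval, the 50-line log tail, and RCHAB_READY_TIMEOUT are taken from the commit message. The function name wait_for_rchab and the exact messages are assumptions:

```shell
# Sketch: poll until dockerd answers through dockerproxy, or give up early.
wait_for_rchab() {
  local name="${1:-rchab-test}"
  local timeout="${RCHAB_READY_TIMEOUT:-60}"
  local interval=2
  local waited=0

  while [ "${waited}" -lt "${timeout}" ]; do
    # Ready as soon as the ping endpoint returns a success status.
    if curl -fsS --max-time 2 http://localhost:2375/_ping >/dev/null 2>&1; then
      echo "rchab ready after ${waited}s"
      return 0
    fi
    # Bail out early if the container has already exited.
    if [ "$(docker inspect -f '{{.State.Running}}' "${name}" 2>/dev/null)" != "true" ]; then
      echo "container ${name} is not running; last 50 log lines:" >&2
      docker logs --tail 50 "${name}" >&2 || true
      return 1
    fi
    sleep "${interval}"
    waited=$((waited + interval))
  done

  echo "timed out after ${timeout}s waiting for rchab" >&2
  return 1
}
```

Checking the container state on every iteration is what turns a dead container into an immediate, logged failure instead of a silent timeout.
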
@jphenow jphenow changed the title from "Jphenow/buildpacks" to "Docker 25 / API v1.44: fix buildpacks remote-builder breakage" Apr 20, 2026
@dangra dangra requested a review from Copilot April 23, 2026 03:18

Copilot AI left a comment

Pull request overview

This PR upgrades rchab’s Docker Engine base (Docker 25 / API v1.44) to unblock modern buildpacks tooling, and adds a tiered CI + local test harness to prevent regressions (including a dedicated API v1.44 compatibility check).

Changes:

  • Upgrade runtime/build tooling: Docker base image → docker:25.0.5-alpine3.20, overlaybd/snapshotter updates, Go toolchain + Docker SDK bump.
  • Add a 3-tier test suite (tier1/2/3) plus a buildpacks builder×language matrix runner.
  • Restructure GitHub Actions CI into separate lint/test/build/buildpacks jobs with tier selection via branch/tag/labels.

Reviewed changes

Copilot reviewed 33 out of 35 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| Dockerfile | Bumps Go builder images and Docker base image; updates overlaybd build inputs and removes explicit buildx copy. |
| dockerproxy/go.mod | Updates Go version and Docker SDK dependency set for Docker 25. |
| dockerproxy/go.sum | Dependency lockfile updates following module upgrades. |
| dockerproxy/storage.go | Updates moby API doc link to v1.44. |
| .github/workflows/ci.yaml | Splits CI into lint/test/buildpacks/build jobs; adds tier selection and buildpacks matrix job. |
| tests/run-tests.sh | Adds tier-aware local/CI test runner orchestrating tier1/2/3 scripts. |
| tests/lib/common.sh | Adds shared helpers to start/stop an rchab test container and run tier scripts. |
| tests/tier1/01-go-unit-tests.sh | Tier1 Go unit test + vet + module verify wrapper. |
| tests/tier1/02-docker-build-verify.sh | Tier1 Docker image build/inspect verification. |
| tests/tier2/01-api-compatibility.sh | Tier2 critical API v1.44 regression test, including a small Go client. |
| tests/tier2/02-docker-in-docker.sh | Tier2 nested-docker smoke test. |
| tests/tier2/03-endpoint-smoke.sh | Tier2 smoke tests for rchab custom endpoints and proxy ping/version. |
| tests/tier3/01-overlaybd.sh | Tier3 overlaybd endpoint/asset smoke test. |
| tests/tier3/02-storage-pruning.sh | Tier3 prune endpoint + disk/image listing checks. |
| tests/tier3/03-integration.sh | Tier3 end-to-end build/run workflow + misc checks. |
| tests/tier3/04-buildpacks-matrix.sh | CI-oriented runner for pack build against rchab for one builder/fixture pair. |
| tests/fixtures/buildpacks/ruby/config.ru | Ruby buildpacks fixture app. |
| tests/fixtures/buildpacks/ruby/Procfile | Ruby fixture Procfile. |
| tests/fixtures/buildpacks/ruby/Gemfile.tmpl | Ruby Gemfile template. |
| tests/fixtures/buildpacks/ruby/Gemfile.lock.tmpl | Ruby Gemfile.lock template with runtime version substitution. |
| tests/fixtures/buildpacks/python/app.py | Python buildpacks fixture app. |
| tests/fixtures/buildpacks/python/requirements.txt | Python fixture deps. |
| tests/fixtures/buildpacks/python/Procfile | Python fixture Procfile. |
| tests/fixtures/buildpacks/python/.python-version.tmpl | Python runtime version template. |
| tests/fixtures/buildpacks/nodejs/index.js | Node buildpacks fixture app. |
| tests/fixtures/buildpacks/nodejs/package.json.tmpl | Node runtime version template via engines field. |
| tests/fixtures/buildpacks/nodejs/Procfile | Node fixture Procfile. |
| tests/README.md | Documentation for tiers, local usage, and CI behavior. |
| tests/.gitignore | Ignores generated test outputs/logs under tests/. |
| test-buildpacks-deploy.sh | Local helper script for generating and optionally deploying a buildpacks app via flyctl + rchab override. |
| Makefile | Adds test, test-integration, test-all, and lint targets. |
| CLAUDE.md | Repository/architecture notes (tooling/CI/env vars), updated for Docker 25 + Go changes. |
| .pre-commit-config.yaml | Adds pre-commit hooks (whitespace/yaml + local Go hooks). |
| .gitignore | Ignores built dockerproxy/dockerproxy binary. |
| .dockerignore | Expands build context ignores (tests, docs, Vagrant, IDE files, etc.). |
Comments suppressed due to low confidence (1)

dockerproxy/go.mod:71

  • dockerproxy/go.mod both requires github.com/docker/docker v25.0.5+incompatible and replaces it with the exact same version. The replace is redundant and can make future dependency updates more confusing; consider removing it unless it’s addressing a specific transitive-resolution issue (in which case, add a short comment explaining why it’s needed).

replace github.com/docker/docker => github.com/docker/docker v25.0.5+incompatible


Comment thread tests/lib/common.sh
Comment on lines +47 to +56
docker run -d \
  --privileged \
  --name rchab-test \
  -p 8080:8080 \
  -p 2375:2375 \
  -e NO_AUTH=1 \
  -e NO_APP_NAME=1 \
  -e FLY_APP_NAME=rchab-test \
  -v /tmp/rchab-data:/data \
  --entrypoint /bin/sh \

Copilot AI Apr 23, 2026

start_rchab_container binds /tmp/rchab-data on the host to /data in the container. Because this path is static, local runs (or multiple test invocations) can interfere with each other and accumulate state/disk usage across runs. Consider using a per-run temp dir (and removing it in stop_rchab_container/a trap) to keep tests hermetic.
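
A possible shape for that change, as a sketch only: make_data_dir and remove_data_dir are hypothetical helper names, and RCHAB_DATA_DIR is an assumed variable, none of which appear in the PR.

```shell
# Sketch: give each run its own host data dir instead of a shared static path.
make_data_dir() {
  mktemp -d "${TMPDIR:-/tmp}/rchab-data.XXXXXX"
}

remove_data_dir() {
  local dir="$1"
  # Guard against removing anything that is not one of our per-run dirs.
  case "${dir}" in
    */rchab-data.*) rm -rf "${dir}" ;;
    *) echo "refusing to remove unexpected path: ${dir}" >&2; return 1 ;;
  esac
}

# Usage inside start_rchab_container (sketch):
#   RCHAB_DATA_DIR="$(make_data_dir)"
#   docker run ... -v "${RCHAB_DATA_DIR}:/data" ...
# and in stop_rchab_container or an EXIT trap:
#   remove_data_dir "${RCHAB_DATA_DIR}"
```

Files written by the privileged container may be root-owned on the host, so the removal step might need elevated permissions in some environments.
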

Comment thread tests/lib/common.sh
-e FLY_APP_NAME=rchab-test \
-v /tmp/rchab-data:/data \
--entrypoint /bin/sh \
"${image}" -c "dockerd &>/var/log/dockerd.log & sleep 5 && /dockerproxy"

Copilot AI Apr 23, 2026

The comment says the container startup avoids “blind-sleeping”, but the actual command still does sleep 5 before starting /dockerproxy. Either remove the fixed sleep (rely on the existing poll loop) or update the comment so it matches the behavior.

Suggested change
- "${image}" -c "dockerd &>/var/log/dockerd.log & sleep 5 && /dockerproxy"
+ "${image}" -c "dockerd &>/var/log/dockerd.log & /dockerproxy"

Comment on lines +88 to +91
if version.APIVersion < "1.44" {
	fmt.Fprintf(os.Stderr, "✗ ERROR: API version %s too old (need 1.44+)\n", version.APIVersion)
	os.Exit(1)
}

Copilot AI Apr 23, 2026

This comparison is lexicographical (string) ordering, not a numeric API-version check. For example, versions like "1.9" vs "1.44" or future "1.100" can produce incorrect results. Parse the version components (split on '.') and compare numerically, or use a proper version parser before deciding the server is < 1.44.
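
The numeric comparison the comment asks for can be illustrated in shell, the language of the surrounding test script. The embedded Go client would do the equivalent by splitting on "." and converting each part with strconv.Atoi. The helper name api_version_lt is hypothetical, and the sketch assumes versions of the form MAJOR.MINOR:

```shell
# Sketch: succeed (status 0) when version $1 is numerically lower than $2.
api_version_lt() {
  # Split each MAJOR.MINOR string at the first dot.
  local a_major="${1%%.*}"
  local a_minor="${1#*.}"
  local b_major="${2%%.*}"
  local b_minor="${2#*.}"

  # Majors decide the result unless they are equal.
  if [ "${a_major}" -ne "${b_major}" ]; then
    [ "${a_major}" -lt "${b_major}" ]
    return
  fi
  # Same major: compare minors as integers, so 9 < 44 < 100.
  [ "${a_minor}" -lt "${b_minor}" ]
}

# Usage:
#   if api_version_lt "${server_api_version}" "1.44"; then echo "too old"; fi
```

With this helper, 1.9 counts as older than 1.44 and 1.100 as newer, which is the opposite of what a plain string comparison reports for both cases.
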

Comment on lines +21 to +24
echo "Test 1: /flyio/v1/settings endpoint..."
RESPONSE=$(curl -s http://localhost:8080/flyio/v1/settings)
echo "${RESPONSE}" | jq .
echo "✓ Settings endpoint works"

Copilot AI Apr 23, 2026

This script assumes jq is installed; if it isn’t, set -euo pipefail will cause the test to abort even though the endpoint response might be fine. Either add a command -v jq preflight with a clear error, or make the jq pretty-print optional (fallback to raw output).
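
The optional pretty-print variant could be as small as the sketch below. The helper name pretty_json is hypothetical:

```shell
# Sketch: pretty-print JSON with jq when available, otherwise print it raw.
pretty_json() {
  if command -v jq >/dev/null 2>&1; then
    jq .
  else
    cat
  fi
}

# Usage: echo "${RESPONSE}" | pretty_json
```

When jq is present and the response is not valid JSON, jq still exits non-zero, so a malformed response would continue to fail the test under `set -euo pipefail`.
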

Comment thread Dockerfile
Comment on lines 26 to 28
WORKDIR /app
COPY dockerproxy .
RUN GOOS=linux GARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"

Copilot AI Apr 23, 2026

In the dockerproxy build stage, GARCH=amd64 is not a recognized Go environment variable (it should be GOARCH). Also, $BUILD_SHA will be empty here because ARG BUILD_SHA is only declared in the final stage, so the ldflags gitSha will not be populated.

Suggested change
- WORKDIR /app
- COPY dockerproxy .
- RUN GOOS=linux GARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"
+ ARG BUILD_SHA
+ WORKDIR /app
+ COPY dockerproxy .
+ RUN GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o dockerproxy -ldflags "-X main.gitSha=$BUILD_SHA -X main.buildTime=$(date +'%Y-%m-%dT%TZ')"

Comment thread tests/README.md
Comment on lines +88 to +90
- ✅ Every push to any branch
- ✅ Manual workflow dispatch

Copilot AI Apr 23, 2026

The CI behavior described here doesn’t match the current workflow triggers: .github/workflows/ci.yaml only runs on push to main (plus tags) and on pull_request. “Every push to any branch” is no longer accurate unless there’s another workflow not shown here.

Suggested change
- - ✅ Every push to any branch
- - ✅ Manual workflow dispatch
+ - ✅ Manual workflow dispatch

Comment thread tests/run-tests.sh
# Source common functions
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/lib/common.sh"

Copilot AI Apr 23, 2026

Consider adding a small preflight here for required local dependencies (docker, curl, and jq at minimum). Right now missing tools will fail later with less actionable errors, even though this is intended to be a developer-facing test runner.

Suggested change
check_required_dependencies() {
  local missing=()
  local dependency
  for dependency in docker curl jq; do
    if ! command -v "${dependency}" >/dev/null 2>&1; then
      missing+=("${dependency}")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "Error: Missing required local dependencies: ${missing[*]}"
    echo "Please install the missing tool(s) before running this test suite."
    exit 1
  fi
}
check_required_dependencies

Comment thread .pre-commit-config.yaml
name: go mod tidy
entry: bash -c 'cd dockerproxy && go mod tidy'
language: system
files: \.go$

Copilot AI Apr 23, 2026

The go-mod-tidy hook is configured to run only when a *.go file changes, so edits to dockerproxy/go.mod / go.sum won’t trigger it. Update the files: selector (or use types_or: [go, go-mod, go-sum], since types: requires every listed tag to match) so dependency-only changes still run go mod tidy.

Suggested change
- files: \.go$
+ files: ^dockerproxy/.*\.go$|^dockerproxy/go\.(mod|sum)$

Development

Successfully merging this pull request may close these issues.

Any way to get access to Docker Engine 25+ on remote builders?

3 participants