
ci: cache kurtosis infra images and retry engine bootstrap #21602

Queued
yperbasis wants to merge 1 commit into main from yperbasis/kurtosis-infra-image-cache

@yperbasis
Member

Problem

In a caplin-minimal Kurtosis job, Docker Hub was unreachable from the runner. The Kurtosis engine bootstrap, which happens inside kurtosis run, mid-action, where it cannot be retried, tried to pull timberio/vector:0.45.0-debian and timed out on registry-1.docker.io, failing the job before any test ran. The workflows already cache CL images (docker save / actions/cache / docker load) precisely to avoid Docker Hub exposure, but Kurtosis's own infrastructure images weren't covered.
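
For readers unfamiliar with the docker save / actions/cache / docker load pattern mentioned above, here is a minimal sketch of the two shell halves. The function names, the DOCKER override variable and the tarball path are assumptions for illustration, not the workflow's actual run blocks; actions/cache is what persists the tarball between runs.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and paths are hypothetical.
# DOCKER can be overridden (for example with a stub) to exercise the functions.
DOCKER="${DOCKER:-docker}"

# Save all listed images into a single tarball that a cache step can persist.
save_images() {
  local tarball="$1"
  shift
  "$DOCKER" save -o "$tarball" "$@"
}

# Load images from the tarball. Returns non-zero when the tarball is absent
# (a cache miss), so the caller can fall back to pulling.
load_images() {
  local tarball="$1"
  if [ ! -f "$tarball" ]; then
    return 1
  fi
  "$DOCKER" load -i "$tarball"
}

# Usage sketch (not executed here; the image name is a placeholder):
#   load_images /tmp/images.tar || {
#     docker pull example/cl-client:tag
#     save_images /tmp/images.tar example/cl-client:tag
#   }
```

Once the tarball is loaded, the images are present in the local Docker image store, which is what keeps the registry off the critical path.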

Fix

Applied to test-kurtosis-assertoor.yml and test-kurtosis-gloas.yml.

Take Docker Hub off the critical path

  • Pin KURTOSIS_VERSION: 1.15.2 and pass it to the assertoor action via its kurtosis_version input. Previously the action installed whatever apt.fury.io serves (its default is latest), so the CLI version — and with it the engine image tag — could drift silently.
  • Extend the cached image set with the five Kurtosis infra images, verified against the kurtosis 1.15.2 source: kurtosistech/engine, kurtosistech/core (APIC) and kurtosistech/files-artifacts-expander (all tagged with the CLI version), timberio/vector:0.45.0-debian (logs aggregator; this is the pull that failed) and fluent/fluent-bit:4.0.0 (logs collector). Kurtosis uses the "missing" image-download mode, which pulls only images that are absent locally, so pre-loaded images are used without any registry call.
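
The five infra images listed above can be derived from the pinned version roughly as follows. This is a sketch, not the workflow's actual step: the function name is hypothetical, and it assumes the image tag equals the CLI version string exactly, as the description states.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; kurtosis_infra_images is a hypothetical helper.
# Prints the Kurtosis infra images named in this PR, one per line.
kurtosis_infra_images() {
  local version="$1"
  printf '%s\n' \
    "kurtosistech/engine:${version}" \
    "kurtosistech/core:${version}" \
    "kurtosistech/files-artifacts-expander:${version}" \
    "timberio/vector:0.45.0-debian" \
    "fluent/fluent-bit:4.0.0"
}

# Usage sketch: kurtosis_infra_images "$KURTOSIS_VERSION"
```

Deriving the three kurtosistech tags from one variable means a version bump changes the cached set in one place, while the vector and fluent-bit tags still have to be re-checked against the Kurtosis source for the new version.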

Retry the cheap part

  • New "Install Kurtosis CLI and start engine" step before the assertoor action: 3 attempts with backoff, kurtosis engine stop between attempts. The action reuses a running engine, so engine bootstrap moves out of the un-retryable composite action into a retryable ~15 s step — a registry blip no longer costs a 20+ minute test step.
  • The cache-miss pull step now retries each docker pull 3× with backoff.
  • Added the Conditional Docker Login to the assertoor matrix job — it was the only job pulling from Docker Hub anonymously on cache misses (the build job and gloas already log in).
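
The retry behaviour described in the bullets above could look roughly like the helper below. It is a sketch under assumptions: the function name, the argument order and the linearly growing delay are illustrative, and the actual run blocks in the workflows may be structured differently.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; retry is a hypothetical helper.
# retry ATTEMPTS BASE_DELAY_SECONDS CLEANUP CMD [ARGS...]
# Runs CMD up to ATTEMPTS times. After each failed attempt except the last,
# it runs CLEANUP (a command or function name, ":" for none) and sleeps
# BASE_DELAY_SECONDS multiplied by the attempt number.
retry() {
  local attempts="$1" base_delay="$2" cleanup="$3"
  shift 3
  local n=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "failed after ${n} attempts: $*" >&2
      return 1
    fi
    # Best-effort cleanup so the next attempt starts from a clean state.
    "$cleanup" || true
    sleep "$(( base_delay * n ))"
    n=$(( n + 1 ))
  done
}

# Usage sketch (not executed here; the 5 second base delay is an assumption):
#   stop_engine() { kurtosis engine stop; }
#   retry 3 5 stop_engine kurtosis engine start
#   retry 3 5 : docker pull timberio/vector:0.45.0-debian
```

Passing the cleanup as a separate argument lets the same helper cover both cases: the engine bootstrap needs kurtosis engine stop between attempts, while a plain docker pull needs nothing.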

Notes

  • First run after merge is a one-time cache miss on the new key (now with retries + authenticated pulls). The gloas workflow saves its cache only on non-pull_request events, so one workflow_dispatch run after merge warms it.
  • Out of scope: images pulled by ethereum-package itself (e.g. ethereum-genesis-generator — version owned by the package branch, falls back to a normal pull), and qa-txpool-performance-test.yml (erigontech fork of the action on self-hosted runners with persistent local image caches).
  • No TDD cycle: mechanical CI workflow change. Validated with actionlint (same flags as the lint workflow) and shellcheck on the new run blocks; image names/tags verified against the kurtosis 1.15.2 sources.

Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the Kurtosis-based CI workflows against transient Docker Hub outages by pre-caching Kurtosis infrastructure images and moving Kurtosis engine bootstrap into an explicit, retryable step before the (non-retryable) composite action runs.

Changes:

  • Pin the Kurtosis CLI version and pass it into the assertoor action to avoid silent CLI/engine drift.
  • Extend Docker image caching to include Kurtosis infra images (engine/core/files-artifacts-expander/vector/fluent-bit) and add retry-with-backoff for cache-miss pulls.
  • Add a dedicated “Install Kurtosis CLI and start engine” step with retries so engine bootstrap is no longer hidden mid-action.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/workflows/test-kurtosis-gloas.yml | Adds pinned Kurtosis version + infra image caching, pull retries, and a retryable engine bootstrap step before running the assertoor action. |
| .github/workflows/test-kurtosis-assertoor.yml | Adds conditional Docker Hub login for the matrix job, pins Kurtosis version + infra image caching, pull retries, and a retryable engine bootstrap step before running the assertoor action. |

@taratorio added this pull request to the merge queue on Jun 3, 2026
Any commits made after this event will not be merged.