feat(ci): add disconnected OCP smoke test for Helm and Operator #5008

Draft
zdrapela wants to merge 15 commits into redhat-developer:main from zdrapela:rhdh.3

Conversation

@zdrapela

zdrapela commented Jun 24, 2026

Member

https://redhat.atlassian.net/browse/RHIDP-13974

Add end-to-end disconnected (air-gapped) CI smoke tests for RHDH deployed via Helm and Operator on OCP.

What it does

Mirrors RHDH container images, Helm chart, operator, and dynamic plugins to an isolated mirror registry on a disconnected OCP cluster, deploys RHDH, and runs a Playwright smoke test (guest login → homepage).

Architecture

Shared library (lib/disconnected.sh)

  • require_env / setup_auth — validate env vars, configure container-tools auth
  • build_imageset_config — generate oc-mirror ImageSetConfiguration from chart values
  • run_oc_mirror — mirror images via oc-mirror --v2
  • patch_idms — add cross-registry entries to the generated IDMS
  • fetch_script — download scripts from rhdh-operator
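
As a rough illustration of what a helper like require_env could look like, here is a minimal sketch. The function body and the error wording are assumptions for illustration and are not taken from lib/disconnected.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an env-var validator; not the PR's actual implementation.
# Reports every missing variable on stderr and returns non-zero if any is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "ERROR: required environment variable ${name} is not set" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A handler could call it early, for example `require_env MIRROR_REGISTRY_URL RELEASE_NAME || exit 1`, so that a misconfigured job fails before any mirroring starts.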

Helm handler (jobs/ocp-disconnected-helm.sh)

  • Chart pull (GA from charts.openshift.io, CI from oci://quay.io/rhdh/chart)
  • Image mirroring via oc-mirror (hub, PostgreSQL, lightspeed, RAG content)
  • IDMS/ITMS generation and patching with cross-registry mirrors
  • Plugin mirroring via mirror-plugins.sh from rhdh-operator
  • Namespace setup: registries.conf ConfigMap, mirror CA ConfigMap, registry auth Secret
  • Helm post-renderer to inject disconnected volumes without array clobber
  • Minimal values file (values_disconnected-smoke.yaml) — guest auth only

Operator handler (jobs/ocp-disconnected-operator.sh)

  • Operator mirroring + installation via prepare-restricted-environment.sh
  • Plugin mirroring, namespace setup, Backstage CR with registries.conf mount

Post-renderer (resources/disconnected/helm-post-renderer.sh)

Patches the rendered Deployment to add:

  1. registries.conf — redirects plugin pulls to mirror registry
  2. policy.json — permissive signature policy for mirrored images
  3. Mirror registry CA cert at /etc/containers/certs.d/<registry>/ca.crt
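
For readers unfamiliar with the registries.conf redirect in item 1, the sketch below prints one entry in the containers-registries.conf v2 format. The helper name and the mirror host are placeholders; the file the PR actually generates may contain more entries or options.

```bash
#!/usr/bin/env bash
# Sketch: emit one containers-registries.conf (v2 TOML) entry that redirects
# pulls for a source prefix to a mirror location. Names are illustrative.
render_registry_mirror() {
  local src="$1" mirror="$2"
  cat <<EOF
[[registry]]
prefix = "${src}"
location = "${src}"

[[registry.mirror]]
location = "${mirror}"
EOF
}

# Example: redirect quay.io/rhdh to a hypothetical mirror registry.
render_registry_mirror "quay.io/rhdh" "mirror.example.com:5000/rhdh"
```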

Issues encountered and fixed

Helm "array clobber" (extraVolumes replaced by override file)

Static helm-overrides.yaml defined 4 extraVolumes but chart 1.11 has 7. Helm replaces (not merges) arrays from -f files. Fix: use --post-renderer to append volumes to the rendered manifests.

Missing ConfigMaps (app-config-rhdh, dynamic-plugins-config)

values_showcase.yaml references ConfigMaps created by apply_yaml_files() which the disconnected handler skips. Fix: use minimal values_disconnected-smoke.yaml with only guest auth.

TLS failure (x509: certificate signed by unknown authority)

Init container's skopeo can't verify mirror registry's self-signed CA. Fix: mount CA cert at /etc/containers/certs.d/<registry>/ca.crt (standard container-tools per-registry path).
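
The per-registry path is derived from the registry's host and port. A small sketch of that derivation follows; the function name and the example host are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compute the per-registry CA path that container tools such as skopeo consult.
# Accepts host[:port], optionally with a scheme prefix or a trailing repository path.
registry_ca_path() {
  local registry="$1"
  registry="${registry#*://}"   # drop a leading scheme such as https://
  registry="${registry%%/*}"    # drop any repository path after host[:port]
  printf '/etc/containers/certs.d/%s/ca.crt\n' "${registry}"
}

registry_ca_path "mirror.example.com:5000"
# prints /etc/containers/certs.d/mirror.example.com:5000/ca.crt
```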

Auth failure (authentication required)

Init container has no credentials for the mirror registry. Fix: create ${RELEASE_NAME}-dynamic-plugins-registry-auth Secret (chart's built-in optional mount).

Signature verification failure (A signature was required, but no signature exists)

RHDH image's policy.json requires Red Hat GPG signatures for registry.access.redhat.com. Mirrored images lack signatures (server unreachable). Fix: mount permissive policy.json in init container.
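
A permissive policy in the containers-policy.json format can be as small as the sketch below. This is an assumption about the shape of the file; the exact policy.json shipped in the PR is not shown in this conversation. It disables signature checks entirely, so it suits only disposable CI clusters.

```bash
#!/usr/bin/env bash
# Sketch: write a permissive containers policy.json that accepts any image
# without signature verification. Only appropriate for throwaway CI clusters.
write_permissive_policy() {
  local target="$1"
  cat > "${target}" <<'EOF'
{
  "default": [
    { "type": "insecureAcceptAnything" }
  ]
}
EOF
}
```

The resulting file would then be stored in a ConfigMap and mounted at /etc/containers/policy.json in the init container.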

REGISTRY_AUTH_FILE conflict with oc-mirror

The distribution/distribution library (used by oc-mirror) interprets REGISTRY_AUTH_FILE as storage driver config, causing panics. Fix: unset before oc-mirror, restore after.
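
The unset-and-restore step can be wrapped in a small helper, sketched below. The helper name is hypothetical; it runs any command without REGISTRY_AUTH_FILE in its environment and then puts the previous value back.

```bash
#!/usr/bin/env bash
# Sketch: run a command with REGISTRY_AUTH_FILE temporarily removed from the
# environment, then restore the previous value (or leave it unset).
without_registry_auth_file() {
  local had_value=0 saved="" status=0
  if [[ -n "${REGISTRY_AUTH_FILE+x}" ]]; then
    had_value=1
    saved="${REGISTRY_AUTH_FILE}"
  fi
  unset REGISTRY_AUTH_FILE

  # Run the wrapped command and remember its exit status.
  "$@" || status=$?

  if (( had_value )); then
    export REGISTRY_AUTH_FILE="${saved}"
  fi
  return "${status}"
}
```

Wrapping the call, rather than unsetting the variable globally, keeps later steps that rely on REGISTRY_AUTH_FILE working unchanged.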

Chart digest encoding (repo@sha256 + tag)

Chart encodes PostgreSQL digest refs as repository: "repo@sha256" + tag: "<hash>". Fix: PG_SEPARATOR normalizes this for IDMS fields.

Companion PR

@openshift-ci

openshift-ci Bot commented Jun 24, 2026


Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@codecov

codecov Bot commented Jun 24, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.77%. Comparing base (4002fc1) to head (c63a371).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5008      +/-   ##
==========================================
- Coverage   55.39%   54.77%   -0.62%     
==========================================
  Files         122      110      -12     
  Lines        2365     2147     -218     
  Branches      563      542      -21     
==========================================
- Hits         1310     1176     -134     
+ Misses       1048      969      -79     
+ Partials        7        2       -5     
Flag | Coverage Δ
rhdh | 54.77% <ø> (-0.62%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4002fc1...c63a371. Read the comment docs.

@github-actions

Contributor

The container image build workflow finished with status: cancelled.

@github-actions

Contributor

Image was built and published successfully. It is available at:

@github-actions

Contributor

The container image build workflow finished with status: cancelled.

@zdrapela

Member Author

/test e2e-ocp-disconnected-helm-nightly

@zdrapela

Member Author

/test e2e-ocp-disconnected-operator-nightly

@github-actions

Contributor

Image was built and published successfully. It is available at:

@zdrapela

Member Author

/test e2e-ocp-disconnected-helm-nightly

@github-actions

Contributor

The container image build workflow finished with status: failure.

@github-actions

Contributor

Image was built and published successfully. It is available at:

@zdrapela

Member Author

/test e2e-ocp-disconnected-helm-nightly

@github-actions

Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@github-actions

Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@github-actions

Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela zdrapela changed the title feat(ci): add disconnected CI smoke tests for Helm deployment feat(ci): add disconnected OCP smoke test for Helm and Operator Jun 26, 2026
@zdrapela

Member Author

/test e2e-ocp-disconnected-helm-nightly

@zdrapela

Member Author

/test e2e-ocp-disconnected-operator-nightly

@github-actions

Contributor

The container image build workflow finished with status: cancelled.

@zdrapela

Member Author

/test e2e-ocp-disconnected-helm-nightly

@github-actions

github-actions Bot commented Jul 3, 2026

Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela

zdrapela commented Jul 3, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly

@zdrapela

zdrapela commented Jul 3, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly

@zdrapela

zdrapela commented Jul 3, 2026

Member Author

/test e2e-ocp-disconnected-operator-nightly

@zdrapela

zdrapela commented Jul 3, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly

@zdrapela

zdrapela commented Jul 3, 2026

Member Author

/retest

@zdrapela

zdrapela commented Jul 4, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly

zdrapela added 14 commits July 4, 2026 11:00
Add end-to-end disconnected CI pipeline handlers that deploy RHDH in
an isolated OCP cluster and run a Playwright smoke test.

Helm path uses oc-mirror v2 (downloaded at runtime) for chart + image
mirroring, following the documented air-gapped workflow:
- GA (registry.redhat.io): chart from charts.openshift.io, oc-mirror
  discovers and mirrors default images automatically
- CI/upstream: chart from oci://quay.io/rhdh/chart via helm.local,
  override images added as additionalImages
- oc-mirror generates IDMS, patched with cross-registry entries for
  both quay.io and registry.redhat.io hub image sources
- Chart installed from local tgz in oc-mirror workspace
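
To make the cross-registry IDMS entries more concrete, the sketch below renders an ImageDigestMirrorSet that points several source repositories at the same path on one mirror. The resource name, the mirror host, and the path mapping are assumptions; the entries that patch_idms actually adds may differ.

```bash
#!/usr/bin/env bash
# Sketch: render an ImageDigestMirrorSet that maps each given source repository
# to the same path on a mirror registry. Resource name and hosts are illustrative.
render_idms() {
  local mirror_host="$1"; shift
  cat <<EOF
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: rhdh-cross-registry
spec:
  imageDigestMirrors:
EOF
  local source_repo
  for source_repo in "$@"; do
    # Keep the repository path and swap only the registry host.
    printf '  - source: %s\n    mirrors:\n      - %s/%s\n' \
      "${source_repo}" "${mirror_host}" "${source_repo#*/}"
  done
}

render_idms "mirror.example.com:5000" "quay.io/rhdh" "registry.redhat.io/rhdh"
```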

Operator path uses prepare-restricted-environment.sh from rhdh-operator
for operator+operand mirroring and installation (documented approach).

Both paths share:
- Auth setup (REGISTRY_AUTH_FILE + XDG_RUNTIME_DIR/containers/auth.json)
- Plugin mirroring via mirror-plugins.sh with registries.conf covering
  registry.access.redhat.com/rhdh, quay.io/rhdh, and
  ghcr.io/redhat-developer/rhdh-plugin-export-overlays (6 CI plugins)
- Helm overrides for registries.conf volume mount (avoids array clobber)
- CATALOG_INDEX_IMAGE override support for CI build verification

Dispatcher routing in openshift-ci-tests.sh for
*ocp*disconnected*helm*nightly* and *ocp*disconnected*operator*nightly*
patterns, positioned before generic *ocp*helm*nightly* to prevent
false matches.
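
The ordering matters because a shell case statement takes the first pattern that matches. A minimal sketch of such a dispatcher, with placeholder handler names echoed instead of real handler calls:

```bash
#!/usr/bin/env bash
# Sketch: the first matching case pattern wins, so the more specific
# disconnected patterns must come before the generic Helm nightly pattern.
# The names echoed here are placeholders for the real handlers.
route_job() {
  local job_name="$1"
  case "${job_name}" in
    *ocp*disconnected*helm*nightly*)     echo "disconnected-helm" ;;
    *ocp*disconnected*operator*nightly*) echo "disconnected-operator" ;;
    *ocp*helm*nightly*)                  echo "helm-nightly" ;;
    *)                                   echo "unknown"; return 1 ;;
  esac
}

route_job "e2e-ocp-disconnected-helm-nightly"   # prints disconnected-helm
```

If the generic `*ocp*helm*nightly*` branch came first, the disconnected Helm job name would match it and never reach its own handler.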

Assisted-by: OpenCode

…rator handler

Use the fixed prepare-restricted-environment.sh from
redhat-developer/rhdh-operator#3109 until it merges. The upstream
script has a bug where INSTALL_YQ=0 still triggers the yq install
path because [[ 0 ]] is truthy in bash, causing a 'mv: cannot move
yq_linux_amd64 to /tmp/.local/bin/yq_mf: No such file or directory'
failure.
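
The truthiness pitfall is easy to reproduce in isolation. In the sketch below, the two function names and the explicit comparison used as the fix are illustrative; the upstream script's exact condition is not quoted in this conversation.

```bash
#!/usr/bin/env bash
# [[ string ]] tests for a non-empty string, so [[ 0 ]] succeeds.
flag_is_set_buggy() {
  [[ $1 ]]            # true for "0", "1", "false": anything non-empty
}

# Compare the value explicitly instead of relying on non-emptiness.
flag_is_set_fixed() {
  [[ "${1:-0}" == "1" ]]
}

INSTALL_YQ=0
flag_is_set_buggy "${INSTALL_YQ}" && echo "buggy check: would install yq"
flag_is_set_fixed "${INSTALL_YQ}" || echo "fixed check: skips yq install"
```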

Assisted-by: OpenCode

oc-mirror panics when REGISTRY_AUTH_FILE is set because the
distribution/distribution library interprets it as a storage driver
config, not a container auth file path. Unset it before running
oc-mirror and restore after — oc-mirror reads auth from
${XDG_RUNTIME_DIR}/containers/auth.json instead.

Save key artifacts to ARTIFACT_DIR with disconnected- prefix for
post-failure debugging: ImageSetConfiguration, generated IDMS/ITMS,
patched IDMS, chart values, rendered ConfigMap, helm set flags,
and Backstage CR.

Assisted-by: OpenCode

…erator

Strip @sha256 suffixes from chart-extracted repository paths before
constructing IDMS entries. The RHDH Helm chart encodes digest
references as repository: "repo@sha256" + tag: "<hash>", but IDMS
source/mirror fields require clean registry/repo paths without @.
Uses ${var%@*} which is a no-op on paths without @.
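
A short sketch of that expansion, using made-up repository paths rather than the chart's real values:

```bash
#!/usr/bin/env bash
# ${var%@*} removes the shortest suffix that starts with "@";
# strings that contain no "@" are returned unchanged.
strip_digest_suffix() {
  local repo="$1"
  printf '%s\n' "${repo%@*}"
}

strip_digest_suffix "registry.example.com/db/postgresql@sha256"  # prints registry.example.com/db/postgresql
strip_digest_suffix "registry.example.com/app/hub"               # prints registry.example.com/app/hub
```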

Install podman at runtime for the operator disconnected handler.
prepare-restricted-environment.sh requires podman for building
custom operator index images.

Assisted-by: OpenCode

… workaround

Helm: replace image: null in helm-overrides.yaml with an envsubst
placeholder (${MIRROR_REGISTRY_URL}/${IMAGE_REPO}:${TAG_NAME}).
The chart validates initContainers[0].image as a string, rejecting
null. The overrides file is now rendered through envsubst before
passing to helm upgrade -i.

Operator: make podman install visible (don't suppress output) and
verify it succeeds before continuing. If apt-get fails (e.g. repos
unreachable via proxy), abort with a clear message instead of
silently proceeding.

Revert the rhdh-operator#3109 PR ref workaround — the yq install
bug has been fixed upstream. Use disconnected::fetch_script again.

Assisted-by: OpenCode

Replace the static helm-overrides.yaml values file with a Helm
post-renderer script that appends the registries.conf volume and
mount to the already-rendered Deployment manifests.

This avoids the Helm 'array clobber' pitfall: a values file that
defines extraVolumes[] or initContainers[] replaces the chart's
entire default array, losing volumes added by newer chart versions.
The post-renderer patches the final manifests so the chart's
defaults are always preserved regardless of chart version changes.

Also improves operator podman install error visibility by removing
-qq flags and adding proper error checking for apt-get commands.

Assisted-by: OpenCode

The full values_showcase.yaml references ConfigMaps (app-config-rhdh,
dynamic-plugins-config) and secrets (rhdh-secrets) that are created by
the normal apply_yaml_files() flow, which the disconnected handler
skips. This caused pods to be stuck in Init:0/2 waiting for missing
ConfigMaps.

Add a minimal values file that provides only what the smoke test
needs: guest auth with dangerouslyAllowOutsideDevelopment. The chart
defaults handle everything else (internal PostgreSQL, backend secret,
default dynamic plugins, route).

Assisted-by: OpenCode

The install-dynamic-plugins init container runs skopeo to fetch the
plugin catalog index. IDMS redirects quay.io pulls to the mirror
registry, but skopeo inside the pod doesn't trust the mirror's
self-signed CA certificate (x509: certificate signed by unknown
authority).

Create a ConfigMap containing the system CA bundle concatenated with
the mirror registry CA, and mount it in the init container at
/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem (Go's default
system cert store path on RHEL/UBI).

Assisted-by: OpenCode

Use the standard container-tools per-registry CA mechanism instead of
replacing the system trust store. Mount just the mirror registry CA
at /etc/containers/certs.d/<registry>/ca.crt — skopeo reads this
path natively, no system CA bundle concatenation needed.

The mirror registry URL is passed via --post-renderer-args so the
mount path is constructed dynamically per CI run.

Fixes: cat /etc/pki/tls/certs/ca-bundle.crt failing on the
Ubuntu-based CI runner (Debian uses /etc/ssl/certs/).

Assisted-by: OpenCode

The install-dynamic-plugins init container uses skopeo to fetch the
plugin catalog index. With IDMS redirecting quay.io to the mirror
registry, skopeo gets 'authentication required' because the pod has
no credentials for the mirror.

Create the ${RELEASE_NAME}-dynamic-plugins-registry-auth secret from
the combined pull secret. The chart already mounts this optional
secret at /opt/app-root/src/.config/containers in the init container
— no post-renderer changes needed.

Assisted-by: OpenCode

Both oc-mirror and podman are now pre-installed in the e2e-runner
image (PR redhat-developer#5036). Remove:
- disconnected::install_oc_mirror function and OC_MIRROR_BIN variable
- Runtime apt-get install podman block in operator handler
- Section lettering from both handlers

Assisted-by: OpenCode

The install-dynamic-plugins init container fails to install
Lightspeed plugins from the mirror registry:

  Source image rejected: A signature was required, but no signature
  exists

The RHDH image ships a policy.json (RHDHBUGS-2799) that requires
Red Hat GPG signatures for registry.access.redhat.com images. In a
disconnected environment, IDMS redirects these pulls to the mirror
registry, but the mirror doesn't host Red Hat's signature store.

Add a permissive policy.json to the existing rhdh-plugin-mirror-conf
ConfigMap and mount it at /etc/containers/policy.json in the init
container. This is the standard approach for air-gapped environments
where the signature server is unreachable.

Assisted-by: OpenCode

…verification

Two issues fixed:

1. Playwright page.goto timeout (50s) on disconnected clusters:
   The CI runner accesses the test cluster through a squid proxy
   (HTTPS_PROXY set by proxy-conf.sh). curl uses it automatically,
   but Playwright's Chromium does not pick up the env var without
   explicit configuration. Add proxy to Playwright's use config,
   reading from HTTPS_PROXY when set (no-op for connected envs).

2. Lightspeed plugin signature verification failure:
   'Source image rejected: A signature was required, but no
   signature exists'
   The RHDH image's policy.json (RHDHBUGS-2799) requires Red Hat
   GPG signatures for registry.access.redhat.com images. In
   disconnected environments, IDMS redirects to the mirror which
   lacks signatures. Mount a permissive policy.json in the init
   container via the existing rhdh-plugin-mirror-conf ConfigMap.

Assisted-by: OpenCode

The CI pod runs with nested_podman: true (hostUsers: false), placing
it inside a Linux user namespace. When prepare-restricted-environment.sh
calls podman build, podman tries to create another user namespace
inside the existing one, which fails with:

  newuidmap: open of uid_map failed: Permission denied
  Error: cannot set up namespace using /usr/bin/newuidmap: exit status 1

Export BUILDAH_ISOLATION=chroot before invoking the script so all
podman build / buildah calls use chroot isolation instead of nested
user namespaces. The env var is respected by both podman and buildah
without needing to modify the downstream rhdh-operator script.

Assisted-by: OpenCode

@zdrapela

zdrapela commented Jul 4, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly
/test e2e-ocp-disconnected-operator-nightly

@github-actions

github-actions Bot commented Jul 4, 2026

Contributor

The container image build workflow finished with status: cancelled.

@github-actions

github-actions Bot commented Jul 4, 2026

Contributor

Image was built and published successfully. It is available at:

Three issues fixed:

1. Playwright proxy credentials not parsed:
   HTTPS_PROXY is http://user:pass@host:3128 (credentials in URL).
   Playwright requires credentials as separate username/password
   fields, not embedded in the server URL. Add parseProxy() helper
   that extracts them via URL parsing.

2. Operator podman build fails with nested user namespaces:
   'newuidmap: open of uid_map failed: Permission denied'
   BUILDAH_ISOLATION=chroot alone is insufficient — the error occurs
   during podman's rootless storage setup before build isolation
   applies. Set _CONTAINERS_USERNS_CONFIGURED=1 to skip newuidmap
   and force vfs storage driver to avoid fuse-overlayfs userns ops.

3. Added debug logging for proxy config and podman environment to
   aid future CI troubleshooting.
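
The credential split described in item 1 can be illustrated in shell as well. The PR's parseProxy() helper lives in the Playwright configuration and is not shown here, so the sketch below is only an assumption about the same logic, with hypothetical output variable names. It does not URL-decode the credentials.

```bash
#!/usr/bin/env bash
# Sketch: split a proxy URL of the form scheme://user:pass@host:port into the
# server URL and separate credentials. Sets PROXY_SERVER, PROXY_USERNAME and
# PROXY_PASSWORD (hypothetical names). Credentials are not URL-decoded.
parse_proxy() {
  local url="$1" scheme rest userinfo
  scheme="${url%%://*}"
  rest="${url#*://}"
  PROXY_USERNAME=""
  PROXY_PASSWORD=""
  if [[ "${rest}" == *@* ]]; then
    userinfo="${rest%@*}"          # everything before the last "@"
    rest="${rest##*@}"             # host:port after the last "@"
    PROXY_USERNAME="${userinfo%%:*}"
    if [[ "${userinfo}" == *:* ]]; then
      PROXY_PASSWORD="${userinfo#*:}"
    fi
  fi
  PROXY_SERVER="${scheme}://${rest}"
}

parse_proxy "http://user:pass@host:3128"
echo "${PROXY_SERVER} ${PROXY_USERNAME}"   # prints http://host:3128 user
```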

Assisted-by: OpenCode

@zdrapela

zdrapela commented Jul 4, 2026

Member Author

/test e2e-ocp-disconnected-helm-nightly
/test e2e-ocp-disconnected-operator-nightly

@github-actions

github-actions Bot commented Jul 4, 2026

Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@sonarqubecloud

sonarqubecloud Bot commented Jul 4, 2026


@openshift-ci

openshift-ci Bot commented Jul 4, 2026


@zdrapela: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-ocp-disconnected-helm-nightly | c63a371 | link | false | /test e2e-ocp-disconnected-helm-nightly
ci/prow/e2e-ocp-disconnected-operator-nightly | 588bb84 | link | false | /test e2e-ocp-disconnected-operator-nightly

