
[WIP] AGENT-1449: Add IRI registry credential rotation support #5766

Open

rwsu wants to merge 4 commits into openshift:main from rwsu:AGENT-1449-auth-rotation


rwsu commented Mar 13, 2026

- What I did

Implement safe credential rotation for the IRI registry using a desired-vs-current pattern with generation-numbered usernames. The auth secret holds the desired password; the pull secret (read from rendered MachineConfig) holds the deployed password. When they differ, a three-phase rotation is performed:

  1. Deploy dual htpasswd (old + new credentials with different usernames)
  2. Update pull secret after all MCPs finish rolling out
  3. Clean up dual htpasswd to single entry after new pull secret is deployed

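This avoids authentication deadlocks during rolling MachineConfig updates because the deployed pull secret always holds credentials that every htpasswd version still being rolled out accepts: the old credentials until the dual htpasswd has rolled out everywhere, then the new credentials, which appear in both the dual and the cleaned-up htpasswd. Mid-rotation password changes are handled by verifying the existing htpasswd hashes with bcrypt.CompareHashAndPassword and regenerating the htpasswd if they don't match.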
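A minimal sketch of the three-case decision in Go, assuming the controller has already reduced its observations to a few strings, counts, and booleans. `RotationInputs`, `NextPhase`, and the phase names are hypothetical and not taken from the PR; hash verification (bcrypt.CompareHashAndPassword in the PR) is represented only by the precomputed `DualMatches` flag so the example needs nothing beyond the standard library.

```go
package main

import "fmt"

// Phase names the next step of a rotation. These names are illustrative,
// not the identifiers used in the PR.
type Phase string

const (
	InSync           Phase = "in-sync"
	DeployDual       Phase = "deploy-dual-htpasswd"
	WaitForRollout   Phase = "wait-for-rollout"
	UpdatePullSecret Phase = "update-pull-secret"
	Cleanup          Phase = "cleanup-htpasswd"
)

// RotationInputs is a hypothetical summary of what the controller observes.
type RotationInputs struct {
	DesiredPassword  string // from the auth secret
	DeployedPassword string // from the pull secret in the rendered MachineConfig
	HtpasswdEntries  int    // entries in the currently rendered htpasswd
	DualMatches      bool   // dual htpasswd hashes match the old and new passwords
	AllPoolsUpdated  bool   // every MachineConfigPool has finished rolling out
}

// NextPhase never removes the credentials used by the deployed pull secret
// until every pool has rolled out.
func NextPhase(in RotationInputs) Phase {
	if in.DesiredPassword == in.DeployedPassword {
		// Phase 3: the new pull secret is rendered; drop the old entry
		// only after all nodes run with it.
		if in.HtpasswdEntries > 1 {
			if !in.AllPoolsUpdated {
				return WaitForRollout
			}
			return Cleanup
		}
		return InSync
	}
	// Phase 1: (re)generate the dual htpasswd if it is missing or stale,
	// for example because the desired password changed mid-rotation.
	if !in.DualMatches {
		return DeployDual
	}
	// Phase 2: switch the pull secret only once every node accepts both.
	if !in.AllPoolsUpdated {
		return WaitForRollout
	}
	return UpdatePullSecret
}

func main() {
	in := RotationInputs{DesiredPassword: "new", DeployedPassword: "old"}
	fmt.Println(NextPhase(in)) // deploy-dual-htpasswd
}
```

Both the pull-secret switch and the cleanup wait on a full rollout; that ordering is what keeps every in-flight pull secret accepted by every in-flight htpasswd. A real implementation would also need to confirm that the pools converged on the rendered config containing the new htpasswd, not merely on some config.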

Key changes:

  • Add MachineConfigPool lister/informer to IRI controller
  • Add reconcileAuthCredentials with three-case rotation logic
  • Add getDeployedIRICredentials (reads from rendered MC, not API)
  • Add areAllPoolsUpdated (checks all pools including workers)
  • Add HtpasswdHasValidEntry, GenerateHtpasswdEntry, GenerateDualHtpasswd, NextIRIUsername, and ExtractIRICredentialsFromPullSecret helpers
  • Vendor golang.org/x/crypto/bcrypt for htpasswd hash generation
  • Add credential rotation design doc
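Generation-numbered usernames matter because htpasswd maps each username to exactly one hash, so the dual file needs two distinct names. A sketch of such a helper follows; the `iri-user-<n>` format and the name `NextUsername` are assumptions for illustration, not the format NextIRIUsername actually produces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// usernamePrefix is a hypothetical base name for IRI registry users.
const usernamePrefix = "iri-user"

// NextUsername returns the next generation-numbered username:
// "iri-user" or any unrecognized name -> "iri-user-1", "iri-user-1" -> "iri-user-2".
func NextUsername(current string) string {
	gen := 0
	if strings.HasPrefix(current, usernamePrefix+"-") {
		suffix := strings.TrimPrefix(current, usernamePrefix+"-")
		if n, err := strconv.Atoi(suffix); err == nil && n >= 0 {
			gen = n
		}
	}
	return fmt.Sprintf("%s-%d", usernamePrefix, gen+1)
}

func main() {
	fmt.Println(NextUsername("iri-user-1")) // iri-user-2
}
```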

- How to verify it

Update the password to start the rotation:

oc -n openshift-machine-config-operator patch secret internal-release-image-registry-auth \
  --type merge -p '{"data":{"password":"'$(echo -n "new-password" | base64)'"}}'

Verify that /etc/iri-registry/auth/htpasswd has been updated.
Verify that the iri-registry accepts both the old and the new credentials during the rollout.
Verify that the global pull secret contains the new credentials after the rollout completes.
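One way to script the first check is to read the file from a node (for example with `oc debug node/<node> -- chroot /host cat /etc/iri-registry/auth/htpasswd`) and count its entries: two while the rotation is in progress, one after cleanup. The parser below is a hypothetical helper that assumes the usual one `user:hash` entry per line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// HtpasswdUsers returns the usernames found in htpasswd content, assuming
// one "user:hash" entry per line. Blank lines are skipped.
func HtpasswdUsers(content string) ([]string, error) {
	var users []string
	sc := bufio.NewScanner(strings.NewReader(content))
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		user, hash, ok := strings.Cut(line, ":")
		if !ok || user == "" || hash == "" {
			// Report only the line number so hashes never end up in logs.
			return nil, fmt.Errorf("malformed htpasswd entry on line %d", lineNo)
		}
		users = append(users, user)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return users, nil
}

func main() {
	// Placeholder hashes; a real file contains full bcrypt hashes.
	dual := "iri-user-1:$2y$10$oldhash\niri-user-2:$2y$10$newhash\n"
	users, err := HtpasswdUsers(dual)
	fmt.Println(users, err) // [iri-user-1 iri-user-2] <nil>
}
```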

- Description for the changelog

Add credential rotation support for the IRI registry. When the auth secret's password field is updated, the controller performs a three-phase rotation: (1) deploys a dual htpasswd with both old and new credentials so all nodes accept both passwords during rollout, (2) updates the global pull secret with the new credentials after all MachineConfigPools are fully updated, and (3) cleans up the dual htpasswd to a single entry once the new credentials are deployed everywhere. This avoids authentication deadlocks caused by api-int load-balancing requests across master nodes that may be at different stages of the rollout.

Also adds e2e tests for registry authentication (401 on unauthenticated requests, 200 with valid credentials) and an end-to-end credential rotation test that exercises all three phases.
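The deadlock argument above can be checked mechanically: during any rollout, api-int may route a request to a master serving either the old or the new htpasswd, while nodes may hold either the old or the new pull secret, so a step is safe only if every pull secret in flight is accepted by every htpasswd in flight. The model below is a simplified sketch: plaintext passwords stand in for bcrypt hashes, and the usernames are hypothetical.

```go
package main

import "fmt"

// Cred is a username/password pair carried by a node's pull secret.
type Cred struct{ User, Pass string }

// Htpasswd maps usernames to passwords. Plaintext stands in for bcrypt
// hashes to keep the model small.
type Htpasswd map[string]string

// Accepts reports whether c would authenticate against h.
func (h Htpasswd) Accepts(c Cred) bool {
	p, ok := h[c.User]
	return ok && p == c.Pass
}

// Step lists what may coexist while one MachineConfig change rolls out.
type Step struct {
	Name        string
	Htpasswds   []Htpasswd // versions masters may be serving
	PullSecrets []Cred     // credentials nodes may be pulling with
}

// Safe reports whether every pull secret in flight is accepted by every
// htpasswd in flight, so api-int routing can never lock a node out.
func Safe(s Step) bool {
	for _, h := range s.Htpasswds {
		for _, c := range s.PullSecrets {
			if !h.Accepts(c) {
				return false
			}
		}
	}
	return true
}

func main() {
	oldC := Cred{"iri-user-1", "old-pass"}
	newC := Cred{"iri-user-2", "new-pass"}
	singleOld := Htpasswd{oldC.User: oldC.Pass}
	dual := Htpasswd{oldC.User: oldC.Pass, newC.User: newC.Pass}
	singleNew := Htpasswd{newC.User: newC.Pass}

	steps := []Step{
		{"1: deploy dual htpasswd", []Htpasswd{singleOld, dual}, []Cred{oldC}},
		{"2: update pull secret", []Htpasswd{dual}, []Cred{oldC, newC}},
		{"3: clean up htpasswd", []Htpasswd{dual, singleNew}, []Cred{newC}},
		{"naive one-step swap", []Htpasswd{singleOld, singleNew}, []Cred{oldC, newC}},
	}
	// Prints safe=true for the three phases and safe=false for the naive swap.
	for _, s := range steps {
		fmt.Printf("%-26s safe=%v\n", s.Name, Safe(s))
	}
}
```

Swapping the single-entry htpasswd and the pull secret in one step leaves old-only and new-only files in flight at the same time, which is exactly the unsafe case the three phases avoid.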

Depends on #5765.

rwsu and others added 4 commits March 13, 2026 11:40
Add htpasswd-based authentication to the IRI registry. The installer
generates credentials and provides them via a bootstrap secret. The MCO
mounts the htpasswd file into the registry container and configures
registry auth environment variables. The registry password is merged
into the node pull secret so kubelet can authenticate when pulling
the release image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
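For reference, the upstream distribution registry enables htpasswd authentication through the `REGISTRY_AUTH`, `REGISTRY_AUTH_HTPASSWD_REALM`, and `REGISTRY_AUTH_HTPASSWD_PATH` settings. The helper below is a hypothetical sketch of assembling them; the realm value and the in-container mount path are assumptions, and the PR may set them differently.

```go
package main

import (
	"fmt"
	"path"
)

// EnvVar mirrors the name/value shape of a container environment entry.
// It is a local stand-in, not the Kubernetes core/v1 type.
type EnvVar struct{ Name, Value string }

// htpasswdAuthEnv returns the distribution registry settings that enable
// htpasswd authentication against a file mounted at mountPath.
func htpasswdAuthEnv(realm, mountPath string) ([]EnvVar, error) {
	if !path.IsAbs(mountPath) {
		return nil, fmt.Errorf("htpasswd path %q must be absolute", mountPath)
	}
	return []EnvVar{
		{Name: "REGISTRY_AUTH", Value: "htpasswd"},
		{Name: "REGISTRY_AUTH_HTPASSWD_REALM", Value: realm},
		{Name: "REGISTRY_AUTH_HTPASSWD_PATH", Value: mountPath},
	}, nil
}

func main() {
	// Hypothetical realm; the path reuses the host path named in the PR
	// description, assuming the file is mounted at the same location.
	env, err := htpasswdAuthEnv("iri-registry", "/etc/iri-registry/auth/htpasswd")
	if err != nil {
		panic(err)
	}
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```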
The IRI controller merges registry auth credentials into the global
pull secret after bootstrap. This triggers the template controller to
re-render template MCs (00-master, etc.) with the updated pull secret,
producing a different rendered MC hash than what bootstrap created.

The mismatch causes the MCD DaemonSet pod to fail during bootstrap:
it reads the bootstrap-rendered MC name from the node annotation, but
that MC no longer exists in-cluster (replaced by the re-rendered one).
The MCD falls back to reading /etc/machine-config-daemon/currentconfig,
which was never written because the firstboot MCD detected "no changes"
and skipped it. Both master nodes go Degraded and never recover.

Fix by merging IRI auth into the pull secret during bootstrap before
template MC rendering, so both bootstrap and in-cluster produce
identical rendered MC hashes.
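Why the merge has to happen before rendering can be seen with a toy content-addressed name: identical inputs always produce the same rendered name, while a pull secret that changes after bootstrap produces a different one. This sketch uses SHA-256 over a hypothetical two-field input; the MCO's real naming scheme and hash function may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// renderInputs is a hypothetical subset of what feeds MachineConfig rendering.
type renderInputs struct {
	Pool       string `json:"pool"`
	PullSecret string `json:"pullSecret"`
}

// renderedName derives a content-addressed name from the inputs.
// Illustrative only: not the MCO's actual naming function.
func renderedName(in renderInputs) (string, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return fmt.Sprintf("rendered-%s-%x", in.Pool, sum[:16]), nil
}

func main() {
	original := `{"auths":{"quay.io":{"auth":"<base64>"}}}`
	withIRI := `{"auths":{"iri.example:5000":{"auth":"<base64>"},"quay.io":{"auth":"<base64>"}}}`

	// Before the fix: bootstrap renders with the original pull secret,
	// and the in-cluster controller later renders with the merged one.
	before, _ := renderedName(renderInputs{"master", original})
	after, _ := renderedName(renderInputs{"master", withIRI})
	fmt.Println(before == after) // false

	// After the fix: both paths merge first, so the names match.
	bootstrap, _ := renderedName(renderInputs{"master", withIRI})
	fmt.Println(bootstrap == after) // true
}
```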

Extract the pull secret merge logic into a shared MergeIRIAuthIntoPullSecret
function used by both the bootstrap path and the in-cluster IRI controller.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
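A sketch of merging IRI credentials into a `.dockerconfigjson` pull secret, plus the inverse extraction, using only encoding/json. The function names and the registry host are hypothetical; the PR's MergeIRIAuthIntoPullSecret and ExtractIRICredentialsFromPullSecret may differ. The `auth` field holds base64 of `user:password`, as in the standard Docker config format.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// mergeRegistryAuth adds or replaces the entry for host in a .dockerconfigjson
// document and leaves every other registry entry untouched.
func mergeRegistryAuth(pullSecret []byte, host, user, pass string) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(pullSecret, &doc); err != nil {
		return nil, fmt.Errorf("parsing pull secret: %w", err)
	}
	if doc == nil {
		doc = map[string]any{}
	}
	var auths map[string]any
	switch v := doc["auths"].(type) {
	case nil: // missing or null: start a new map
		auths = map[string]any{}
		doc["auths"] = auths
	case map[string]any:
		auths = v
	default:
		return nil, fmt.Errorf(`"auths" has unexpected type %T`, v)
	}
	token := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
	auths[host] = map[string]any{"auth": token}
	return json.Marshal(doc)
}

// extractRegistryAuth decodes the user and password stored for host.
func extractRegistryAuth(pullSecret []byte, host string) (string, string, error) {
	var doc struct {
		Auths map[string]struct {
			Auth string `json:"auth"`
		} `json:"auths"`
	}
	if err := json.Unmarshal(pullSecret, &doc); err != nil {
		return "", "", err
	}
	entry, ok := doc.Auths[host]
	if !ok {
		return "", "", fmt.Errorf("no credentials for %s", host)
	}
	raw, err := base64.StdEncoding.DecodeString(entry.Auth)
	if err != nil {
		return "", "", err
	}
	// The username cannot contain ':', so split on the first one.
	user, pass, ok := strings.Cut(string(raw), ":")
	if !ok {
		return "", "", fmt.Errorf("malformed auth entry for %s", host)
	}
	return user, pass, nil
}

func main() {
	in := []byte(`{"auths":{"quay.io":{"auth":"dXNlcjpwYXNz"}}}`)
	// Hypothetical host name for the IRI registry.
	out, err := mergeRegistryAuth(in, "iri.example:5000", "iri-user-1", "s3cret")
	if err != nil {
		panic(err)
	}
	user, pass, err := extractRegistryAuth(out, "iri.example:5000")
	fmt.Println(user, pass, err) // iri-user-1 s3cret <nil>
}
```

encoding/json sorts map keys when marshaling, so the merged output is byte-for-byte stable for the same input, which matters when the result feeds a content-hashed rendered MachineConfig.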
Implement safe credential rotation for the IRI registry using a
desired-vs-current pattern with generation-numbered usernames. The auth
secret holds the desired password; the pull secret (read from rendered
MachineConfig) holds the deployed password. When they differ, a
three-phase rotation is performed:

1. Deploy dual htpasswd (old + new credentials with different usernames)
2. Update pull secret after all MCPs finish rolling out
3. Clean up dual htpasswd to single entry after new pull secret is deployed

This avoids authentication deadlocks during rolling MachineConfig updates
because the pull secret always contains the old credentials, which are
present in every version of the htpasswd. Mid-rotation password changes
are handled by verifying htpasswd hashes with bcrypt.CompareHashAndPassword
and regenerating if they don't match.

Key changes:
- Add MachineConfigPool lister/informer to IRI controller
- Add reconcileAuthCredentials with three-case rotation logic
- Add getDeployedIRICredentials (reads from rendered MC, not API)
- Add areAllPoolsUpdated (checks all pools including workers)
- Add HtpasswdHasValidEntry, GenerateHtpasswdEntry, GenerateDualHtpasswd,
  NextIRIUsername, ExtractIRICredentialsFromPullSecret helpers
- Vendor golang.org/x/crypto/bcrypt for htpasswd hash generation
- Add credential rotation design doc

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
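A simplified sketch of an areAllPoolsUpdated-style check. `PoolStatus` is a local stand-in for the MachineConfigPool fields it reads (`spec.configuration.name`, `status.configuration.name`, and the machine counts); the real helper may rely on the pool's Updated condition instead.

```go
package main

import "fmt"

// PoolStatus is a local stand-in for the MachineConfigPool fields used here.
type PoolStatus struct {
	Name                 string
	TargetConfig         string // spec.configuration.name
	CurrentConfig        string // status.configuration.name
	MachineCount         int32
	UpdatedMachineCount  int32
	ReadyMachineCount    int32
	DegradedMachineCount int32
}

// allPoolsUpdated reports whether every pool, workers included, has converged
// on its target rendered config with all machines updated, ready, and healthy.
func allPoolsUpdated(pools []PoolStatus) bool {
	if len(pools) == 0 {
		return false // never advance the rotation on missing data
	}
	for _, p := range pools {
		if p.CurrentConfig != p.TargetConfig ||
			p.UpdatedMachineCount != p.MachineCount ||
			p.ReadyMachineCount != p.MachineCount ||
			p.DegradedMachineCount != 0 {
			return false
		}
	}
	return true
}

func main() {
	pools := []PoolStatus{
		{Name: "master", TargetConfig: "rendered-master-b", CurrentConfig: "rendered-master-b",
			MachineCount: 3, UpdatedMachineCount: 3, ReadyMachineCount: 3},
		{Name: "worker", TargetConfig: "rendered-worker-b", CurrentConfig: "rendered-worker-a",
			MachineCount: 2, UpdatedMachineCount: 1, ReadyMachineCount: 1},
	}
	fmt.Println(allPoolsUpdated(pools)) // false: worker pool still rolling out
}
```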
…ial rotation

Add three new e2e tests:
- TestIRIAuth_UnauthenticatedRequestReturns401: verifies registry rejects
  unauthenticated requests with 401 when auth is enabled
- TestIRIAuth_AuthenticatedRequestSucceeds: verifies registry accepts
  requests with valid Basic Auth credentials
- TestIRIAuth_CredentialRotation: end-to-end test of the three-phase
  credential rotation (dual htpasswd, pull secret update, cleanup)

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
openshift-ci bot added the do-not-merge/work-in-progress label Mar 13, 2026
coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress


openshift-ci-robot (Contributor)

@rwsu: An error was encountered searching for bug AGENT-1449 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/AGENT-1449": GET https://issues.redhat.com/rest/api/2/issue/AGENT-1449 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.


openshift-ci bot commented Mar 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rwsu
Once this PR has been reviewed and has the lgtm label, please assign rishabhsaini for approval. For more information see the Code Review Process.


rwsu commented Mar 13, 2026

/cc @andfasano

openshift-ci bot requested a review from andfasano March 13, 2026 22:08
openshift-ci bot commented Mar 14, 2026

@rwsu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-hypershift | 9c8ca8e | link | true | /test e2e-hypershift |
| ci/prow/verify | 9c8ca8e | link | true | /test verify |

