feat(azure): enable ACR image pulls via managed identity on worker nodes#8205
feat(azure): enable ACR image pulls via managed identity on worker nodes#8205twolff-gh wants to merge 5 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an optional ImageRegistryCredentials field to AzureNodePoolPlatform and a new AzureImageRegistryCredentials type with validation (ManagedIdentity, Registries). The operator sets VMIdentityUserAssigned and UserAssignedIdentities on Azure VM specs when credentials are present. It generates a worker MachineConfig (Ignition v3.2.0) embedding a kubelet credential-provider YAML and an acr-azure.json payload derived from Azure identifiers. Tests cover API JSON compatibility, credential-provider and acr JSON generation, Ignition contents, and MachineConfig creation. Kubeapilinter exclusions were added for the new field and test-only structs. Sequence Diagram(s)sequenceDiagram
actor Admin as Admin/User
participant NP as NodePool Spec
participant Op as Operator
participant AzureSpec as Azure VM Template
participant MCO as MachineConfig Generator
participant Ign as Ignition Config
participant Node as Worker Node
Admin->>NP: Create/Update NodePool with ImageRegistryCredentials
Op->>NP: Reconcile NodePool
Op->>AzureSpec: Build azureMachineTemplateSpec()
alt ImageRegistryCredentials set
AzureSpec->>AzureSpec: Set Identity = VMIdentityUserAssigned
AzureSpec->>AzureSpec: Add UserAssignedIdentities (ManagedIdentity)
end
Op->>MCO: generateMCORawConfig()
MCO->>MCO: getACRCredentialProviderConfig()
alt Azure + ImageRegistryCredentials
MCO->>MCO: generateACRCredentialProviderMachineConfig()
MCO->>Ign: Create Ignition v3.2.0 with:
Ign->>Ign: /etc/kubernetes/credential-providers/acr-credential-provider.yaml
Ign->>Ign: /etc/kubernetes/acr-azure.json
MCO->>Op: Return MachineConfig (wrapped in ConfigMap)
end
Op->>Node: Apply MachineConfig / Ignition
Node->>Node: Kubelet uses credential provider (ManagedIdentity)
🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: twolff-gh The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| short-lived tokens for ACR image pulls instead of relying on static pull secrets. | ||
| Changing this field will trigger a node rollout. | ||
| properties: | ||
| managedIdentity: |
There was a problem hiding this comment.
for standalone or other cases, do we want to support system-assigned identity? If so, we wouldn't have a customer pass these in at creation time.
There was a problem hiding this comment.
For the scope of this feature, I believe it was only for user-assigned identities. I'll confirm.
If we needed to support system-assigned, I wonder if it would need to be populated differently / new field?
There was a problem hiding this comment.
Yes, user-assigned was the main ask, but system-assigned would likely follow in the future.
Do they make sense as separate work - user now and system as another PR after?
There was a problem hiding this comment.
if system-assigned identities are added per-VM, then this doesn't make sense from a feature perspective. Maybe try and see what happens, then ignore it or implement it as you see fit.
There was a problem hiding this comment.
From what I see, a cluster needs to be created before getting a principal id which some process would need permissions to access customer resources.
Does the customer trust the platform to write IAM policies on their container registry? (Don't know)
Seems like a followup item. I'll gladly look into
…/R3) Address bennerv review comments R2 and R3 on PR openshift#8205: make the registries field optional (remove +required and MinItems=1), allow empty registries to fall through to default wildcard patterns (*.azurecr.io, etc.), and update tests accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/nodepool/config.go`:
- Around line 242-248: The code dereferences
cg.hostedCluster.Spec.Platform.Azure directly which can panic; before calling
generateACRCredentialProviderMachineConfig, add a guard that
cg.hostedCluster.Spec.Platform != nil and cg.hostedCluster.Spec.Platform.Azure
!= nil, and handle the nil case (return an appropriate error or skip registry
credential config) so you don't call generateACRCredentialProviderMachineConfig
with a nil azureSpec; keep references to
cg.nodePool.Spec.Platform.Azure.ImageRegistryCredentials and the
generateACRCredentialProviderMachineConfig call when adding the guard and
subsequent handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 51b240dc-cd08-46f4-8ba4-b589a0fd067b
⛔ Files ignored due to path filters (14)
api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/AAA_ungated.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/GCPPlatform.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/nodepools.hypershift.openshift.io/OpenStack.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**client/applyconfiguration/hypershift/v1beta1/azureimageregistrycredentials.gois excluded by!client/**client/applyconfiguration/hypershift/v1beta1/azurenodepoolplatform.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**cmd/install/assets/hypershift-operator/zz_generated.crd-manifests/nodepools-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/hypershift-operator/zz_generated.crd-manifests/nodepools-Default.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamlcmd/install/assets/hypershift-operator/zz_generated.crd-manifests/nodepools-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/**,!cmd/install/assets/**/*.yamldocs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azure.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (8)
api/.golangci.ymlapi/hypershift/v1beta1/azure.goapi/hypershift/v1beta1/azure_test.gohypershift-operator/controllers/nodepool/azure.gohypershift-operator/controllers/nodepool/azure_acr.gohypershift-operator/controllers/nodepool/azure_acr_test.gohypershift-operator/controllers/nodepool/azure_test.gohypershift-operator/controllers/nodepool/config.go
…/R3) Address bennerv review comments R2 and R3 on PR openshift#8205: make the registries field optional (remove +required and MinItems=1), allow empty registries to fall through to default wildcard patterns (*.azurecr.io, etc.), and update tests accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e11b55c to
18f22d6
Compare
| // acrDefaultMatchImages lists the default wildcard patterns for Azure Container Registry | ||
| // endpoints across all Azure clouds. These patterns are always appended to the user-specified | ||
| // registries to ensure the credential provider covers all standard ACR endpoints. | ||
| var acrDefaultMatchImages = []string{ |
There was a problem hiding this comment.
Can these not be defaulted in the api for the registries?
There was a problem hiding this comment.
From what I found, these come from MCO credential provider config, if changed upstream, API default could go stale. Does that sound accurate?
Can or should we change this at the MCO config?
| // multiple registries. When specified, these registries are added to the credential | ||
| // provider's matchImages list alongside the default wildcard patterns (*.azurecr.io, | ||
| // *.azurecr.cn, *.azurecr.de, *.azurecr.us). When omitted, only the default wildcard | ||
| // patterns are used, which cover all standard Azure Container Registry endpoints. |
There was a problem hiding this comment.
Can we call out a customer only gets 16 of the 20 spots since 4 are used for the wildcards?
There was a problem hiding this comment.
It sounds like this is additive. If the hard limit is 20 including the defaults, you should limit this to only what the user can actually provide.
There was a problem hiding this comment.
MaxItems=16 added. user gets 16 slots, 4 defaults added by controller.
There was a problem hiding this comment.
Removed the registries field. It was in the original WIP, but doesnt make sense in this final version.
Adds more complexity than necessary
The default patterns copied in plus the RBAC for ids should be enough
|
|
||
| azureMachineTemplate.Template.Spec.SSHPublicKey = dummySSHKey | ||
|
|
||
| if nodePool.Spec.Platform.Azure.ImageRegistryCredentials != nil { |
There was a problem hiding this comment.
Add comment documenting that identity assignment currently replaces (not appends), and should be refactored to append if additional identities are needed in the future.
| // multiple registries. When specified, these registries are added to the credential | ||
| // provider's matchImages list alongside the default wildcard patterns (*.azurecr.io, | ||
| // *.azurecr.cn, *.azurecr.de, *.azurecr.us). When omitted, only the default wildcard | ||
| // patterns are used, which cover all standard Azure Container Registry endpoints. |
There was a problem hiding this comment.
It sounds like this is additive. If the hard limit is 20 including the defaults, you should limit this to only what the user can actually provide.
| // multiple registries. When specified, these registries are added to the credential | ||
| // provider's matchImages list alongside the default wildcard patterns (*.azurecr.io, | ||
| // *.azurecr.cn, *.azurecr.de, *.azurecr.us). When omitted, only the default wildcard |
There was a problem hiding this comment.
Am I allowed to specify an entry that matches one of these default patterns?
There was a problem hiding this comment.
Controller duplicates silently. Duplicate is dropped.
Does this work?
There was a problem hiding this comment.
De-duplication is fine - should we prevent users from specifying the wildcard patterns that are applied by default though? It doesn't look like there would be any value to the users to be able to specify those aside from unnecessarily consuming some of their 16 slots.
Do we anticipate these defaults potentially changing much in the future (i.e removing one of them)? If so, it could make sense to allow end-users to explicitly provide them to be more resilient to changes in the defaults.
There was a problem hiding this comment.
Removed the registries field. It was in the original WIP, but doesnt make sense in this final version.
Adds more complexity than necessary
The default patterns copied in plus the RBAC for ids should be enough
1419528 to
aef1ea8
Compare
aef1ea8 to
2caafeb
Compare
2caafeb to
2b43b77
Compare
|
/retest |
| // +kubebuilder:validation:MaxItems=16 | ||
| // +kubebuilder:validation:items:MinLength=1 | ||
| // +kubebuilder:validation:items:MaxLength=253 | ||
| // +kubebuilder:validation:items:Pattern=`^(\*\.)?[a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9\-]*[a-zA-Z0-9])?)+$` |
There was a problem hiding this comment.
Instead of using the Pattern marker, we generally recommend using the XValidation marker to enforce the pattern via a CEL expression and supplying a human-readable error message on validation failure.
Using the pattern marker means that the end user is provided the literal regex as part of the validation failure instead of a user-friendly message that describes the constraints.
| // Format: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName} | ||
| // | ||
| // +required | ||
| ManagedIdentity AzureManagedIdentityResourceID `json:"managedIdentity,omitempty"` |
There was a problem hiding this comment.
Include the length constraints for this field (enforced by the type) in the GoDoc here as well please
|
/uncc |
2af9a37 to
de3bfd7
Compare
| // resourceID is the ARM resource ID of the user-assigned managed identity. | ||
| // | ||
| // Format: /subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName} |
There was a problem hiding this comment.
Fully document the length constraints as well please
| // clientID is the client ID of the user-assigned managed identity. This field is optional | ||
| // and is mainly used for CI purposes. | ||
| // | ||
| // +optional | ||
| ClientID AzureClientID `json:"clientID,omitempty"` |
There was a problem hiding this comment.
Is there a use case where this field provides value to end users?
We shouldn't design the API surface for what we need for testing (unless I'm misunderstanding "CI purposes").
…Identity Add AzureImageRegistryCredentials struct on AzureNodePoolPlatform to configure kubelet credential provider for ACR authentication using a user-assigned managed identity on worker node VMs. - UserAssignedManagedIdentity struct with resourceID (required) - AzureManagedIdentityResourceID named type with CEL ARM resource ID validation, MinLength=1, MaxLength=512 - ImageRegistryCredentials is optional, value type with omitzero, triggers node rollout when changed - No Registries field: controller hardcodes wildcard patterns covering all standard ACR endpoints (*.azurecr.io/cn/de/us) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntroller Add ACR credential provider MachineConfig generation and CAPI managed identity attachment for Azure NodePools with imageRegistryCredentials. When imageRegistryCredentials.managedIdentity is set on a NodePool: - AzureMachineTemplate gets Identity: UserAssigned with the MI - MachineConfig overrides MCO credential provider config to point at a separate acr-azure.json with useManagedIdentityExtension: true - Kubelet uses acr-credential-provider binary for short-lived IMDS tokens, no static pull secrets needed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add YAML-driven envtest cases for imageRegistryCredentials validation: - Valid managedIdentity with camelCase resourceGroups (pass) - Lowercase resourcegroups rejected (fail, consistent with other Azure resource ID validations in this file) - Invalid format string (fail) - Wrong provider type (fail) - Missing identity name (fail) - Extra path segments (fail) - imageRegistryCredentials not set (pass) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
de3bfd7 to
b4ae9c1
Compare
|
@twolff-gh: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
The PR branch is Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe root cause is a race condition in CRD cleanup within The bug (present on PR branch): After installing and validating CRDs for a feature set, the test calls The fix (on main via PR #8261, commit Why only Kube 1.35.0? CRD garbage collection speed varies by Kubernetes version. The Kube 1.35.0 envtest binary has slower CRD finalization timing, making it the most susceptible to this race condition. Other Kube versions and OCP-variant envtests happened to finalize fast enough to avoid the timeout. Why is PR #8205 affected? The PR branch was created on April 10, 2026, before the fix was merged on April 17, 2026. The branch is 2 commits behind Recommendations
Evidence
|
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=512 | ||
| // +kubebuilder:validation:XValidation:rule="self.matches('^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\\\\.ManagedIdentity/userAssignedIdentities/[^/]+$')",message="must be a valid ARM resource ID for a user-assigned managed identity (e.g., /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{name})" | ||
| type AzureManagedIdentityResourceID string |
There was a problem hiding this comment.
From another PR I reviewed recently that enforces this format, it looks like subscription ID and resource group can be further validated to be the correct format like:
// +kubebuilder:validation:XValidation:rule="oldSelf.hasValue() && oldSelf.value() == self || size(self.split('/')) == 9 && self.split('/')[2].matches('^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$')",message="the subscriptionId in the imageID must be a valid UUID. It should be 5 groups of hyphen separated hexadecimal characters in the form 8-4-4-4-12",optionalOldSelf=true
// +kubebuilder:validation:XValidation:rule=`oldSelf.hasValue() && oldSelf.value() == self || size(self.split('/')) == 9 && self.split('/')[4].matches('^[a-zA-Z0-9_\\(\\)\\.-]{1,90}$')`,message="the resourceGroupName should be between 1 and 90 characters, consisting only of alphanumeric characters, hyphens, underscores, periods and parenthesis",optionalOldSelf=true
Followup to #7793
Add support for configuring kubelet's image credential provider to
authenticate to Azure Container Registry (ACR) using a user-assigned
managed identity. This eliminates the need for static pull secrets
by leveraging IMDS-based token acquisition on worker nodes.
Summary by CodeRabbit
New Features
Bug Fixes
Tests