channels: fix region detection and discovery cache permission noise #18390
Merged
kops-channels, kops-controller, and etcd-manager run as non-root and can't read /sys/devices/virtual/dmi/id/product_uuid (mode 0400), so isRunningOnEC2 in util/pkg/vfs/s3context.go returns false. VFS then skips IMDS, defaults to us-east-1, and pays for a cross-region GetBucketLocation / HeadBucket round-trip on every cold start before learning the bucket's real region.

Derive AWS_REGION from the cluster spec's subnet zones (kops validation guarantees they all share a region) so the generated pod manifest carries the right value. Stubs that don't ship subnets (the nodeup-side ChannelsBuilder) fall back to AWS_REGION from the calling process env, matching the existing S3_REGION pattern.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
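The derivation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: regionFromZones and awsRegion are hypothetical names, and the sketch assumes standard AWS availability-zone names made of the region followed by a single lowercase letter (local or wavelength zones would need extra handling).

```go
package main

import (
	"fmt"
	"os"
)

// regionFromZones is a hypothetical helper, not the kops implementation.
// It assumes each zone is "<region><letter>", for example "eu-west-1b".
// It returns "" when no zones are given.
func regionFromZones(zones []string) (string, error) {
	region := ""
	for _, zone := range zones {
		if len(zone) < 2 {
			return "", fmt.Errorf("invalid zone %q", zone)
		}
		last := zone[len(zone)-1]
		if last < 'a' || last > 'z' {
			return "", fmt.Errorf("zone %q does not end in a zone letter", zone)
		}
		r := zone[:len(zone)-1]
		if region != "" && r != region {
			return "", fmt.Errorf("zones span regions %q and %q", region, r)
		}
		region = r
	}
	return region, nil
}

// awsRegion prefers the region derived from the subnet zones and falls back
// to AWS_REGION from the calling process when the spec carries no subnets.
// getenv is injected so the fallback can be exercised without touching the
// real environment.
func awsRegion(zones []string, getenv func(string) string) (string, error) {
	region, err := regionFromZones(zones)
	if err != nil {
		return "", err
	}
	if region == "" {
		region = getenv("AWS_REGION")
	}
	return region, nil
}

func main() {
	region, err := awsRegion([]string{"eu-west-1a", "eu-west-1b"}, os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("AWS_REGION=" + region)
}
```

The mixed-region check is redundant if validation has already run, but it keeps the helper safe to call on unvalidated input.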
kops-channels runs as UID 10013 with no HOME, so client-go's cached discovery falls back to /.kube/cache and logs ~22 "mkdir /.kube: permission denied" lines on every pod start before silently giving up.

Point HOME at /tmp; chainguard/static (the ko base image) creates /tmp mode 1777 via wolfi-baselayout, so the non-root uid can write there via the container's writable overlay. No emptyDir needed.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
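To illustrate why an unset HOME ends up at /.kube/cache, here is a small sketch. It assumes the cache directory is formed by joining HOME with .kube/cache and that a relative result resolves against the container's working directory; discoveryCacheDir is a hypothetical helper, not a client-go function.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// discoveryCacheDir is a hypothetical stand-in for how a discovery cache
// path could be derived from HOME. filepath.Join drops empty elements, so an
// empty HOME yields the relative path ".kube/cache", which is then resolved
// against the working directory.
func discoveryCacheDir(home, workDir string) string {
	dir := filepath.Join(home, ".kube", "cache")
	if !filepath.IsAbs(dir) {
		dir = filepath.Join(workDir, dir)
	}
	return dir
}

func main() {
	// No HOME, working directory "/": the path lands in the read-only root.
	fmt.Println(discoveryCacheDir("", "/"))
	// HOME=/tmp: the path lands in a directory the non-root uid can write.
	fmt.Println(discoveryCacheDir("/tmp", "/"))
}
```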
The aws-cloud-controller-manager and kops-controller manifests still
list node-role.kubernetes.io/master alongside node-role.kubernetes.io/
control-plane in nodeSelectorTerms. The apiserver emits a deprecation
warning on every channel apply ("node-role.kubernetes.io/master is
use \"node-role.kubernetes.io/control-plane\" instead"). kops itself
hasn't applied the master label to control-plane nodes for years, so
the fallback term selects nothing and is dead weight.
Drop the master matchExpressions block. Keep the corresponding
toleration as harmless backward-compat for any straggler node that
still carries the legacy taint.
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
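The actual change edits the addon manifests directly. As a rough illustration of which term goes away, the sketch below filters a list of node selector terms using local stand-in types (matchExpression, nodeSelectorTerm, and dropLegacyMasterTerms are hypothetical and only mirror the shape of the Kubernetes API types).

```go
package main

import "fmt"

// Local stand-ins for the Kubernetes node affinity types, reduced to the
// fields this sketch needs.
type matchExpression struct {
	Key      string
	Operator string
}

type nodeSelectorTerm struct {
	MatchExpressions []matchExpression
}

const legacyMasterLabel = "node-role.kubernetes.io/master"

// usesKey reports whether any expression in the term matches on key.
func usesKey(term nodeSelectorTerm, key string) bool {
	for _, expr := range term.MatchExpressions {
		if expr.Key == key {
			return true
		}
	}
	return false
}

// dropLegacyMasterTerms returns the terms that do not reference the legacy
// master label. Terms are ORed by the scheduler, so removing a term that
// selects no nodes does not change where pods can land.
func dropLegacyMasterTerms(terms []nodeSelectorTerm) []nodeSelectorTerm {
	kept := make([]nodeSelectorTerm, 0, len(terms))
	for _, term := range terms {
		if usesKey(term, legacyMasterLabel) {
			continue
		}
		kept = append(kept, term)
	}
	return kept
}

func main() {
	terms := []nodeSelectorTerm{
		{MatchExpressions: []matchExpression{{Key: "node-role.kubernetes.io/control-plane", Operator: "Exists"}}},
		{MatchExpressions: []matchExpression{{Key: legacyMasterLabel, Operator: "Exists"}}},
	}
	for _, term := range dropLegacyMasterTerms(terms) {
		fmt.Println(term.MatchExpressions[0].Key)
	}
}
```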
Channels' health check warned "expected status.conditions to be list, got <nil>" whenever a CRD set status.conditions to null instead of omitting the key. The two amazon-vpc-routed-eni CRDs trigger this on every channel apply.

Fold the nil case into the existing "status conditions not found" info path; both are semantically "object has no conditions to report".

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
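The folding described here can be sketched with plain maps standing in for an unstructured object. This is an assumption-laden illustration rather than the code in health.go: classifyConditions and the conditionsState values are hypothetical names.

```go
package main

import "fmt"

type conditionsState int

const (
	conditionsAbsent    conditionsState = iota // info path: nothing to report
	conditionsPresent                          // a list that can be inspected
	conditionsMalformed                        // warn path: wrong type
)

// classifyConditions treats a missing status.conditions key and an explicit
// null the same way, and only flags values that are neither nil nor a list.
func classifyConditions(obj map[string]interface{}) conditionsState {
	// A missing or non-map status yields a nil map; indexing it is safe.
	status, _ := obj["status"].(map[string]interface{})
	raw, found := status["conditions"]
	if !found || raw == nil {
		return conditionsAbsent
	}
	if _, ok := raw.([]interface{}); !ok {
		return conditionsMalformed
	}
	return conditionsPresent
}

func main() {
	nullConditions := map[string]interface{}{
		"status": map[string]interface{}{"conditions": nil},
	}
	noConditions := map[string]interface{}{
		"status": map[string]interface{}{},
	}
	// Both cases land on the same path.
	fmt.Println(classifyConditions(nullConditions) == classifyConditions(noConditions))
}
```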
rifelpet
approved these changes
May 21, 2026
Contributor
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet
Audit of a recent kops-channels log surfaced four sources of avoidable noise on every pod start. Each fix is its own commit and can be cherry-picked independently.

env: derive AWS_REGION from cluster spec

VFS reads AWS_REGION to skip its IMDS region probe. The probe fails for non-root pods (/sys/devices/virtual/dmi/id/product_uuid is mode 0400), so the container defaults to us-east-1 and pays for a cross-region S3 round-trip on every cold start.

BuildSystemComponentEnvVars now derives the region from spec.Networking.Subnets[].Zone when the cluster is on AWS. Stub specs without subnets (the nodeup-side ChannelsBuilder) fall back to the parent process env, matching the existing S3_REGION pattern.

channels: set HOME so client-go can write its discovery cache

kops-channels runs as UID 10013 with no HOME, so client-go's cached discovery falls back to /.kube and emits 22 "permission denied" lines per pod start before silently giving up.

HOME=/tmp (mode 1777 via wolfi-baselayout, writable for the non-root uid) silences all 22. No emptyDir needed.

addons: drop deprecated master nodeAffinity term

aws-cloud-controller-manager and kops-controller still list node-role.kubernetes.io/master alongside control-plane in nodeSelectorTerms. The apiserver emits a deprecation warning for it on every channel apply.

kops hasn't applied the master label to nodes for years; the fallback term selects nothing. Drop the matchExpressions; keep the matching toleration as harmless backward-compat.

applyset: treat nil status.conditions as absent

CRDs that set status.conditions: null (rather than omitting the key) trigger the warn path in pkg/applylib/applyset/health.go. The two amazon-vpc-routed-eni CRDs hit it every reconcile.

Fold the nil case into the existing "status conditions not found" info path.

Source log

kops-channels.log from pull-kops-e2e-k8s-aws-amazonvpc / job 2057268228035448832

/cc @rifelpet @ameukam