channels: fix region detection and discovery cache permission noise#18390

Merged
k8s-ci-robot merged 5 commits into kubernetes:master from hakman:fix-channels-logs
May 21, 2026
Conversation

@hakman
Member

@hakman hakman commented May 21, 2026

Audit of a recent kops-channels log surfaced four sources of avoidable noise on every pod start. Each fix is its own commit and can be cherry-picked independently.

env: derive AWS_REGION from cluster spec

VFS reads AWS_REGION to skip its IMDS region probe. The probe fails for non-root pods (/sys/devices/virtual/dmi/id/product_uuid is mode 0400), so the container defaults to us-east-1 and pays for a cross-region S3 round-trip on every cold start:

unable to read /sys/devices/virtual/dmi/id/product_uuid, assuming not running on EC2
defaulting region to "us-east-1"
found bucket in region "ap-northeast-1"

BuildSystemComponentEnvVars now derives the region from spec.Networking.Subnets[].Zone when the cluster is on AWS. Stub specs without subnets (the nodeup-side ChannelsBuilder) fall back to the parent process env, matching the existing S3_REGION pattern.

channels: set HOME so client-go can write its discovery cache

kops-channels runs as UID 10013 with no HOME, so client-go's cached discovery falls back to /.kube and emits 22 lines per pod start before silently giving up:

failed to write cache to /.kube/cache/discovery/127.0.0.1/servergroups.json due to mkdir /.kube: permission denied
failed to write cache to /.kube/cache/discovery/127.0.0.1/flowcontrol.apiserver.k8s.io/v1/serverresources.json due to mkdir /.kube: permission denied
[...20 more...]

Setting HOME=/tmp silences all 22 lines: /tmp is mode 1777 via wolfi-baselayout, so the non-root UID can write there. No emptyDir needed.

addons: drop deprecated master nodeAffinity term

aws-cloud-controller-manager and kops-controller still list node-role.kubernetes.io/master alongside control-plane in nodeSelectorTerms. The apiserver warns on every channel apply:

Warning: spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[1].matchExpressions[0].key: node-role.kubernetes.io/master is use "node-role.kubernetes.io/control-plane" instead

kops hasn't applied the master label to nodes for years; the fallback term selects nothing. Drop the matchExpressions; keep the matching toleration as harmless backward-compat.

applyset: treat nil status.conditions as absent

CRDs that set status.conditions: null (rather than omitting the key) trigger pkg/applylib/applyset/health.go's warn path. The two amazon-vpc-routed-eni CRDs hit it every reconcile:

expected status.conditions to be list, got <nil>
expected status.conditions to be list, got <nil>

Fold the nil case into the existing "status conditions not found" info path.
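
A minimal sketch of the folded check, assuming the object is handled as an unstructured `map[string]interface{}`. `conditionsFrom` is a hypothetical function, not the code in health.go, and the exact message formatting is an assumption.

```go
package main

import "fmt"

// conditionsFrom is a hypothetical sketch, not the code in health.go.
// It returns the conditions list, a note explaining why none was
// returned, and whether that note deserves a warning (vs. info).
func conditionsFrom(obj map[string]interface{}) (conds []interface{}, note string, warn bool) {
	// A missing or non-map status yields a nil map; indexing it is safe.
	status, _ := obj["status"].(map[string]interface{})
	raw, found := status["conditions"]
	if !found || raw == nil {
		// An absent key and an explicit null are treated the same way.
		return nil, "status conditions not found", false
	}
	list, ok := raw.([]interface{})
	if !ok {
		return nil, fmt.Sprintf("expected status.conditions to be list, got %T", raw), true
	}
	return list, "", false
}

func main() {
	obj := map[string]interface{}{
		"status": map[string]interface{}{"conditions": nil},
	}
	_, note, warn := conditionsFrom(obj)
	fmt.Println(note, warn)
}
```

Only a value that is present, non-nil, and not a list still takes the warn path in this sketch.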

Source log

kops-channels.log from pull-kops-e2e-k8s-aws-amazonvpc / job 2057268228035448832

/cc @rifelpet @ameukam

@k8s-ci-robot k8s-ci-robot requested review from ameukam and rifelpet May 21, 2026 11:28
@k8s-ci-robot k8s-ci-robot added the needs-rebase, size/XXL, area/addons, area/nodeup, and cncf-cla: yes labels May 21, 2026
hakman added 5 commits May 21, 2026 14:30

kops-channels, kops-controller, and etcd-manager run as non-root and
can't read /sys/devices/virtual/dmi/id/product_uuid (mode 0400), so
util/pkg/vfs/s3context.go's isRunningOnEC2 returns false, skips IMDS,
defaults to us-east-1, and pays for a cross-region GetBucketLocation /
HeadBucket round-trip on every cold start before learning the bucket's
real region.

Derive AWS_REGION from the cluster spec's subnet zones (kops validation
guarantees they all share a region) so the generated pod manifest carries
the right value. Stubs that don't ship subnets (the nodeup-side
ChannelsBuilder) fall back to AWS_REGION from the calling process env,
matching the existing S3_REGION pattern.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

kops-channels runs as UID 10013 with no HOME, so client-go's cached
discovery falls back to /.kube/cache and logs ~22 "mkdir /.kube:
permission denied" lines on every pod start before silently giving up.

Point HOME at /tmp; chainguard/static (the ko base image) creates /tmp
mode 1777 via wolfi-baselayout, so the non-root uid can write there via
the container's writable overlay. No emptyDir needed.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

The aws-cloud-controller-manager and kops-controller manifests still
list node-role.kubernetes.io/master alongside node-role.kubernetes.io/
control-plane in nodeSelectorTerms. The apiserver emits a deprecation
warning on every channel apply ("node-role.kubernetes.io/master is
use \"node-role.kubernetes.io/control-plane\" instead"). kops itself
hasn't applied the master label to control-plane nodes for years, so
the fallback term selects nothing and is dead weight.

Drop the master matchExpressions block. Keep the corresponding
toleration as harmless backward-compat for any straggler node that
still carries the legacy taint.

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>

Channels' health check warned "expected status.conditions to be list,
got <nil>" whenever a CRD set status.conditions to null instead of
omitting the key. The two amazon-vpc-routed-eni CRDs trigger this on
every channel apply.

Fold the nil case into the existing "status conditions not found" info
path; both are semantically "object has no conditions to report".

Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
@hakman hakman force-pushed the fix-channels-logs branch from a5ca2d9 to 6a63f97 Compare May 21, 2026 11:41
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2026
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2026
@k8s-ci-robot k8s-ci-robot merged commit 0171429 into kubernetes:master May 21, 2026
29 checks passed
Labels

approved, area/addons, area/nodeup, cncf-cla: yes, lgtm, size/XXL

3 participants