Fix K8s projected service account token expiry #7092

Draft
gavinelder wants to merge 1 commit into master from fix/k8s-projected-token-refresh

Conversation

@gavinelder (Contributor)

Summary

Fixes #6918 — pipelines running >1hr on Kubernetes clusters with projected service account tokens (AKS, RKE2, etc.) receive 401 Unauthorized because the token read at startup expires mid-run.

Root cause

ConfigDiscovery.fromCluster() read the SA token from /var/run/secrets/kubernetes.io/serviceaccount/token once at startup and stored it as a string in ClientConfig. The kubelet rotates this file on disk at ~80% of the token lifetime (~48min for the default 3607s expiry), but Nextflow never re-read it.

The previous workaround (PR #6742 / #6925) periodically recreated K8sClient via a Guava cache in K8sExecutor, but K8sTaskHandler cached the K8sClient instance at construction time, so the cache was never consulted after startup.
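The flaw in the workaround is easy to reproduce in isolation. The sketch below is an illustrative Java approximation, not the actual nf-k8s classes: `Handler`, `Client`, and the `Supplier`-based cache are hypothetical stand-ins for `K8sTaskHandler`, `K8sClient`, and the Guava cache. A handler that snapshots the client at construction never observes later cache refreshes:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class StaleClientDemo {
    static class Client {
        final int id;
        Client(int id) { this.id = id; }
    }

    // Mirrors the workaround's flaw: the handler grabs one client at
    // construction time, so a refreshing cache upstream is never consulted again.
    static class Handler {
        private final Client client;                          // snapshotted once
        Handler(Supplier<Client> cache) { this.client = cache.get(); }
        Client clientForRequest() { return client; }          // always the same instance
    }

    public static void main(String[] args) {
        AtomicInteger refreshes = new AtomicInteger();
        // Simulates the periodic refresh: each lookup yields a fresh client
        Supplier<Client> cache = () -> new Client(refreshes.incrementAndGet());

        Handler handler = new Handler(cache);
        System.out.println(handler.clientForRequest().id);    // 1
        cache.get();                                          // cache "refreshes" to client 2
        System.out.println(handler.clientForRequest().id);    // still 1: stale
    }
}
```

This is why refreshing at the client layer could never work: any caller holding an old instance keeps the old token, which motivates moving the refresh down into `ClientConfig` itself.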

Fix

  • ConfigDiscovery.fromCluster() now stores the token file Path in ClientConfig.tokenFile instead of eagerly reading the content
  • ClientConfig.getBearerToken() reads from tokenFile with a 30s in-memory TTL cache — the file is re-read at most once per 30s per ClientConfig instance; on read failure the last known-good token is returned and a warning is logged
  • K8sClient.makeRequestCall() calls config.getBearerToken() instead of config.token
  • The Guava cache in K8sExecutor and the k8s.clientRefreshInterval config option are removed as they are no longer needed
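The new read path described in the bullets above can be sketched as follows. This is a hedged Java approximation, not the actual Groovy in `ClientConfig`: the `tokenFile` field and `getBearerToken()` name come from the PR description, everything else (class name, constructor, logging) is assumed for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

public class BearerTokenCache {
    private final Path tokenFile;   // e.g. /var/run/secrets/kubernetes.io/serviceaccount/token
    private final Duration ttl;     // 30s in the actual fix

    private String lastGood;        // last successfully read token
    private Instant readAt = Instant.MIN;

    public BearerTokenCache(Path tokenFile, Duration ttl) {
        this.tokenFile = tokenFile;
        this.ttl = ttl;
    }

    public synchronized String getBearerToken() {
        Instant now = Instant.now();
        // Serve from memory while the TTL has not elapsed — the file is
        // re-read at most once per TTL window per instance
        if (lastGood != null && Duration.between(readAt, now).compareTo(ttl) < 0)
            return lastGood;
        try {
            lastGood = Files.readString(tokenFile).trim();
            readAt = now;
        }
        catch (IOException e) {
            // On read failure, fall back to the last known-good token
            System.err.println("WARN: cannot re-read " + tokenFile + ": " + e.getMessage());
        }
        return lastGood;
    }
}
```

Because the kubelet rotates the file roughly every 48 minutes and the cache re-reads at most every 30 seconds, a rotated token is picked up well within its remaining ~20% validity window, and a transient read error degrades gracefully instead of failing the request.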

Test plan

  • ./gradlew :plugins:nf-k8s:test passes (244 tests)
  • New unit tests cover: TTL cache hit, TTL expiry picks up rotated token, read failure falls back to last good token
  • Manual validation on a cluster with projected SA tokens (expirationSeconds: 3607) running a pipeline >1hr

🤖 Generated with Claude Code

Re-read the SA token from disk with a 30s TTL cache rather than
snapshotting it at startup. Kubelet rotates projected tokens at ~80%
of their lifetime (default ~48min), so the old approach caused 401s
on pipelines running longer than ~1hr.

The previous workaround (PR #6742 / #6925) periodically recreated
K8sClient via a Guava cache, but K8sTaskHandler cached the client
instance at construction time so the cache was never consulted.
The root fix is in ClientConfig.getBearerToken(), which reads the
token file fresh when the 30s TTL expires rather than at any higher
layer. The Guava cache and clientRefreshInterval config option are
removed as they are no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Gavin Elder <gavin.elder@seqera.io>
@netlify

netlify Bot commented Apr 30, 2026

Deploy Preview for nextflow-docs-staging ready!

🔨 Latest commit: 0422609
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/69f38a2e77978400087221b4
😎 Deploy Preview: https://deploy-preview-7092--nextflow-docs-staging.netlify.app


Development

Successfully merging this pull request may close these issues.

K8s executor caches client token at wrong layer — PR #6742 token refresh never triggered
