Fix K8s 401 after token expiry (#6918) #7093
Open
pditommaso wants to merge 1 commit into master
Three caching layers caused the K8s client to hold an expired service-account token after the kubelet rotated it on disk, leading to 401 Unauthorized after ~60 minutes on clusters with short-lived projected tokens (e.g. AKS default 3607s).

This change closes the remaining failure modes left open after the fixes in #6742 and #6925:

- K8sTaskHandler no longer caches the K8sClient at construction; every access now delegates to the executor's Guava-cached client, so handlers pick up refreshes automatically.
- ClientConfig retains the token file path (in-cluster discovery and kubeconfig/Nextflow-config tokenFile entries) so the token can be re-read from disk later.
- K8sClient.apply() retries on HTTP 401 when tokenPath is set: the onRetry hook re-reads the token file (kubelet writes the rotated token to the same mount path) and updates the in-memory config before retrying. 401s with no tokenPath still propagate immediately as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
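The retry behaviour described in the last bullet can be illustrated with a minimal sketch. It is written in Python rather than Groovy and is not the code in this PR: SketchClient, Unauthorized, refresh_token and the max_attempts default are hypothetical names chosen for illustration, and the real client issues HTTP requests rather than calling a send function.

```python
from pathlib import Path
from typing import Callable, Optional


class Unauthorized(Exception):
    """Stands in for an HTTP 401 response from the API server."""


class SketchClient:
    """Hypothetical client that re-reads its token file when a request gets a 401."""

    def __init__(self, token: str, token_path: Optional[Path], max_attempts: int = 3):
        self.token = token
        self.token_path = token_path  # None models an inline token with no file behind it
        self.max_attempts = max_attempts

    def refresh_token(self) -> None:
        # The rotated token is written to the same path, so a plain re-read is enough.
        self.token = self.token_path.read_text().strip()

    def apply(self, send: Callable[[str], str]) -> str:
        attempt = 1
        while True:
            try:
                return send(self.token)
            except Unauthorized:
                # No file to re-read, or attempts exhausted: propagate the 401.
                if self.token_path is None or attempt >= self.max_attempts:
                    raise
                self.refresh_token()
                attempt += 1
```

With a token file present, a 401 triggers one re-read before the next attempt; with token_path set to None the first 401 is raised to the caller, matching the fail-fast case for inline tokens.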
adamrtalbot (Collaborator) approved these changes on May 1, 2026 and left a comment:
It looks good to me:
- on a 401,
- refresh the token and retry,
- up to maxAttempts.
This will need some real-world verification.
pditommaso (Member, Author) replied:
Do you have a way to replicate this and test a dev build?
Summary
Closes #6918.
Fixes the remaining 401 Unauthorized failure modes on the K8s executor after token expiry, left open after #6742 and #6925.
Three caching layers contributed to the bug. After this PR, all three are self-healing:
- K8sConfig.getClient()
- K8sExecutor.getClient()
- K8sTaskHandler.client: now getClient() delegating to executor
- K8sClient request loop: retries on 401 when tokenPath is set

Why a retry alone wasn't enough
K8s service-account tokens aren't refreshed via an API call; the kubelet rotates them by overwriting the file at the mount path (/var/run/secrets/kubernetes.io/serviceaccount/token). The client just needs to re-read the file from disk on 401. This PR records the token file path on ClientConfig (tokenPath) for the three discovery paths (in-cluster, kubeconfig tokenFile, Nextflow-config tokenFile) and uses it in the retry handler.

For inline token: strings in kubeconfig there's no file to re-read; those still fail fast on 401, unchanged.

Files
- K8sTaskHandler.groovy: drop cached client field, add getClient() delegating to executor
- ClientConfig.groovy: add Path tokenPath; populate it from fromNextflowConfig / fromUserAndCluster when a tokenFile is given
- ConfigDiscovery.groovy: populate tokenPath in fromCluster
- K8sClient.groovy: apply() retries on 401 when tokenPath is set; onRetry listener calls new refreshToken() to re-read the file and update config.token

Test plan
- ClientConfigTest cases assert tokenPath is preserved for tokenFile (Nextflow + kubeconfig) and not set for inline tokens
- ConfigDiscoveryTest.should load from cluster env extended to assert config.tokenPath == TOKEN_FILE
- K8sClientTest.should re-read token from disk and retry on 401 when tokenPath is set: verifies file is re-read and second request uses the fresh token
- K8sClientTest.should not retry 401 when tokenPath is not set: preserves existing fail-fast behavior for inline tokens
- (./gradlew :plugins:nf-k8s:test)

🤖 Generated with Claude Code
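The K8sTaskHandler change (dropping the cached client field in favour of a delegating getClient()) can be illustrated with a small sketch of why a handler that captures the client once goes stale. This is a Python illustration, not the Groovy code in this PR: Client, Executor, CachingHandler and DelegatingHandler are hypothetical names, and the executor's timed cache is modelled here by an explicit invalidate() call.

```python
from typing import Callable, Optional


class Client:
    """Hypothetical client that carries the token it was built with."""

    def __init__(self, token: str):
        self.token = token


class Executor:
    """Owns the client and rebuilds it after invalidation (stand-in for a timed cache)."""

    def __init__(self, read_token: Callable[[], str]):
        self._read_token = read_token
        self._client: Optional[Client] = None

    def invalidate(self) -> None:
        self._client = None

    def get_client(self) -> Client:
        if self._client is None:
            self._client = Client(self._read_token())
        return self._client


class CachingHandler:
    """Old pattern: captures the client once at construction."""

    def __init__(self, executor: Executor):
        self.client = executor.get_client()


class DelegatingHandler:
    """New pattern: asks the executor on every access."""

    def __init__(self, executor: Executor):
        self._executor = executor

    @property
    def client(self) -> Client:
        return self._executor.get_client()
```

Once the executor rebuilds its client, the delegating handler sees the new instance on its next access, while the caching handler keeps the one it captured at construction.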