Add separate configurable timeouts for getAccessToken, getNonce, and getCredential#161
Add separate configurable timeouts for getAccessToken, getNonce, and getCredential#161
Conversation
…getCredential Agent-Logs-Url: https://github.com/Azure/aks-secure-tls-bootstrap/sessions/25f644c9-1149-470c-bbfd-40cb0128b03e Co-authored-by: djsly <4981802+djsly@users.noreply.github.com>
| token, err := c.getAccessToken(ctx, config.UserAssignedIdentityID, config.AADResource, config.CloudProviderConfig) | ||
| accessTokenCtx, accessTokenCancel := context.WithTimeout(ctx, config.GetAccessTokenTimeout) | ||
| defer accessTokenCancel() | ||
| token, err := c.getAccessToken(accessTokenCtx, config.UserAssignedIdentityID, config.AADResource, config.CloudProviderConfig) |
There was a problem hiding this comment.
will it retry if it times out ?
There was a problem hiding this comment.
Yes, all three operations will retry on timeout — the per-operation context is re-created fresh from the parent ctx on every retry attempt, so each retry gets a full timeout window.
Specifically:
getAccessToken: a timeout causestoken.Refresh()to fail with a context error, which flows intotokenRefreshErrorToGetAccessTokenFailure. Since a context deadline exceeded error is not anadal.TokenRefreshError, it returnsbootstrapError{retryable: true}→ retried.getNonce/getCredential: already wrapped asbootstrapError{retryable: true}inclient.bootstrap()→ retried.
The overall Deadline still acts as the ultimate bound — if ctx itself expires, the retry loop stops regardless.
There was a problem hiding this comment.
what if I don't want to wait for the full deadline ? I would like to ensure that the total time getNonce can try to get a value is X counting the timeouts. so, there is the timeout at the client call, and the timeout for all retries together
There was a problem hiding this comment.
Added in commit a10740b. Three new fields (and corresponding CLI flags) cap the combined time across all retry attempts for each operation:
GetAccessTokenTotalTimeout/--get-access-token-total-timeoutGetNonceTotalTimeout/--get-nonce-total-timeoutGetCredentialTotalTimeout/--get-credential-total-timeout
Each budget context is created once before the retry loop in Bootstrap(), so it accumulates across retries. The per-attempt timeout is derived from the budget context (whichever deadline is shorter wins). When the budget is exhausted, the operation returns a non-retryable error and the retry loop stops.
Setting the value to 0 (the default) disables the cap — the only bound is then the overall Deadline.
Example: to allow at most 90s total across all getNonce attempts, each attempt capped at 20s:
{ "getNonceTotalTimeout": "90s", "getNonceTimeout": "20s" }|
@copilot review comments |
…eout, GetNonceTotalTimeout, GetCredentialTotalTimeout) Agent-Logs-Url: https://github.com/Azure/aks-secure-tls-bootstrap/sessions/57228f26-fe76-4595-88ce-4372227013fb Co-authored-by: djsly <4981802+djsly@users.noreply.github.com>
All three bootstrap operations (
getAccessToken,getNonce,getCredential) shared the single overallDeadlineas their only timeout boundary. This adds independent, configurable per-operation timeouts — both a per-attempt timeout and a total retry budget that caps the combined time across all retry attempts.Changes
Configstruct — six newtime.Durationfields with JSON tags:GetAccessTokenTimeout(default: 30s) — max time for a single attemptGetAccessTokenTotalTimeout— max combined time across all retry attempts (0 = no extra cap)GetNonceTimeout(default: 30s) — max time for a single attemptGetNonceTotalTimeout— max combined time across all retry attempts (0 = no extra cap)GetCredentialTimeout(default: 1m) — max time for a single attemptGetCredentialTotalTimeout— max combined time across all retry attempts (0 = no extra cap)Bootstrap()— three total-budget contexts are created once before the retry loop, so their deadlines accumulate across retries. When a budget is exhausted the failing operation returns a non-retryable error and the retry loop stops immediately.client.bootstrap()— each operation now runs under a child context derived from its budget context (or the parent context when no budget is set); the parent/budget context deadline still wins if shorter (standard Go behavior).CLI flags —
--get-access-token-timeout,--get-access-token-total-timeout,--get-nonce-timeout,--get-nonce-total-timeout,--get-credential-timeout,--get-credential-total-timeout; all configurable via the JSON config file as well.Example
{ "getAccessTokenTimeout": "45s", "getNonceTimeout": "20s", "getNonceTotalTimeout": "90s", "getCredentialTimeout": "30s", "getCredentialTotalTimeout": "120s" }