Skip to content

Add separate configurable timeouts for getAccessToken, getNonce, and getCredential#161

Draft
Copilot wants to merge 3 commits intomainfrom
copilot/configure-timeout-values
Draft

Add separate configurable timeouts for getAccessToken, getNonce, and getCredential#161
Copilot wants to merge 3 commits intomainfrom
copilot/configure-timeout-values

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 2, 2026

All three bootstrap operations (getAccessToken, getNonce, getCredential) shared the single overall Deadline as their only timeout boundary. This adds independent, configurable per-operation timeouts — both a per-attempt timeout and a total retry budget that caps the combined time across all retry attempts.

Changes

  • Config struct — six new time.Duration fields with JSON tags:

    • GetAccessTokenTimeout (default: 30s) — max time for a single attempt
    • GetAccessTokenTotalTimeout — max combined time across all retry attempts (0 = no extra cap)
    • GetNonceTimeout (default: 30s) — max time for a single attempt
    • GetNonceTotalTimeout — max combined time across all retry attempts (0 = no extra cap)
    • GetCredentialTimeout (default: 1m) — max time for a single attempt
    • GetCredentialTotalTimeout — max combined time across all retry attempts (0 = no extra cap)
  • Bootstrap() — three total-budget contexts are created once before the retry loop, so their deadlines accumulate across retries. When a budget is exhausted the failing operation returns a non-retryable error and the retry loop stops immediately.

  • client.bootstrap() — each operation now runs under a child context derived from its budget context (or the parent context when no budget is set); the parent/budget context deadline still wins if shorter (standard Go behavior).

  • CLI flags--get-access-token-timeout, --get-access-token-total-timeout, --get-nonce-timeout, --get-nonce-total-timeout, --get-credential-timeout, --get-credential-total-timeout; all configurable via the JSON config file as well.

Example

{
  "getAccessTokenTimeout": "45s",
  "getNonceTimeout": "20s",
  "getNonceTotalTimeout": "90s",
  "getCredentialTimeout": "30s",
  "getCredentialTotalTimeout": "120s"
}
// Budget contexts created once before the retry loop:
nonceBudgetCtx, cancel := ctxWithOptionalTimeout(ctx, config.GetNonceTotalTimeout)
defer cancel()

// In bootstrap(), each call uses a per-attempt context derived from the budget:
nonceCtx, cancel := context.WithTimeout(nonceBudgetCtx, config.GetNonceTimeout)
defer cancel()
nonce, err := c.getNonce(nonceCtx, ...)
// If nonceBudgetCtx is exhausted → non-retryable error → retry loop stops

Copilot AI changed the title [WIP] Update timeout values for API calls to be configurable Add separate configurable timeouts for getAccessToken, getNonce, and getCredential Apr 2, 2026
Copilot AI requested a review from djsly April 2, 2026 00:46
Copilot finished work on behalf of djsly April 2, 2026 00:47
token, err := c.getAccessToken(ctx, config.UserAssignedIdentityID, config.AADResource, config.CloudProviderConfig)
accessTokenCtx, accessTokenCancel := context.WithTimeout(ctx, config.GetAccessTokenTimeout)
defer accessTokenCancel()
token, err := c.getAccessToken(accessTokenCtx, config.UserAssignedIdentityID, config.AADResource, config.CloudProviderConfig)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it retry if it times out ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all three operations will retry on timeout — the per-operation context is re-created fresh from the parent ctx on every retry attempt, so each retry gets a full timeout window.

Specifically:

  • getAccessToken: a timeout causes token.Refresh() to fail with a context error, which flows into tokenRefreshErrorToGetAccessTokenFailure. Since a context deadline exceeded error is not an adal.TokenRefreshError, it returns bootstrapError{retryable: true} → retried.
  • getNonce / getCredential: already wrapped as bootstrapError{retryable: true} in client.bootstrap() → retried.

The overall Deadline still acts as the ultimate bound — if ctx itself expires, the retry loop stops regardless.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if I don't want to wait for the full deadline ? I would like to ensure that the total time getNonce can try to get a value is X counting the timeouts. so, there is the timeout at the client call, and the timeout for all retries together

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in commit a10740b. Three new fields (and corresponding CLI flags) cap the combined time across all retry attempts for each operation:

  • GetAccessTokenTotalTimeout / --get-access-token-total-timeout
  • GetNonceTotalTimeout / --get-nonce-total-timeout
  • GetCredentialTotalTimeout / --get-credential-total-timeout

Each budget context is created once before the retry loop in Bootstrap(), so it accumulates across retries. The per-attempt timeout is derived from the budget context (whichever deadline is shorter wins). When the budget is exhausted, the operation returns a non-retryable error and the retry loop stops.

Setting the value to 0 (the default) disables the cap — the only bound is then the overall Deadline.

Example: to allow at most 90s total across all getNonce attempts, each attempt capped at 20s:

{ "getNonceTotalTimeout": "90s", "getNonceTimeout": "20s" }

@djsly
Copy link
Copy Markdown
Contributor

djsly commented Apr 2, 2026

@copilot review comments

…eout, GetNonceTotalTimeout, GetCredentialTotalTimeout)

Agent-Logs-Url: https://github.com/Azure/aks-secure-tls-bootstrap/sessions/57228f26-fe76-4595-88ce-4372227013fb

Co-authored-by: djsly <4981802+djsly@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Timeout for diffenrent calls are all using the same values

3 participants