fix: force token refresh on 401 and add debug logging #4

Open
drewr wants to merge 1 commit into feat/verify-before-ready from fix/token-refresh

drewr (Contributor) commented Jun 23, 2026

Important

This PR depends on #3 and should be merged after it.

Summary

Fixes the "tunnels stop working because of stale auth" failure mode and adds opt-in debug logging for the refresh process.

Root cause: The token refresh machinery already existed and worked (a proactive timer in ExternalTokenSource::run_refresh_loop plus on-demand force_refresh()), but the heartbeat's 401 handler, force_refresh_auth in connect-lib/lib/src/heartbeat.rs, only logged the 401 and never triggered a refresh; a stale comment claimed refresh was "external/out-of-band". So when a token was rejected before its scheduled refresh (clock skew, revocation, IdP-side rotation), the heartbeat kept retrying with the dead token until the proactive timer eventually fired (up to ~1h later), and tunnels stopped working in the meantime.

Changes

(a) Refresh now actually fires on 401

  • Added DatumCloudClient::force_token_refresh() (connect-lib/lib/src/datum_cloud/mod.rs) delegating to ExternalTokenSource::force_refresh().
  • Rewrote force_refresh_auth (connect-lib/lib/src/heartbeat.rs) to call datum.force_token_refresh() instead of just logging, and corrected the misleading doc comment. The LoginState::Missing guard is preserved.
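
For reviewers who want the shape of change (a) without opening the diff, here is a minimal std-only sketch of the control flow. It is not the code in this PR: TokenSource, Client, and handle_heartbeat_status are hypothetical stand-ins for ExternalTokenSource, DatumCloudClient, and force_refresh_auth, the real types are async, and the assumption that the LoginState::Missing guard simply skips the refresh is mine.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the login state checked by the guard.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LoginState {
    Missing,
    Present,
}

/// Hypothetical stand-in for ExternalTokenSource (synchronous for brevity).
pub struct TokenSource {
    token: Mutex<String>,
    refreshes: AtomicUsize,
}

impl TokenSource {
    pub fn new(initial: &str) -> Self {
        Self {
            token: Mutex::new(initial.to_string()),
            refreshes: AtomicUsize::new(0),
        }
    }

    /// Swaps in a new token; a real source would call the identity provider.
    pub fn force_refresh(&self) -> Result<(), String> {
        let n = self.refreshes.fetch_add(1, Ordering::SeqCst) + 1;
        let mut guard = self.token.lock().map_err(|e| e.to_string())?;
        *guard = format!("token-{n}");
        Ok(())
    }

    pub fn current(&self) -> String {
        self.token.lock().unwrap().clone()
    }
}

/// Hypothetical stand-in for DatumCloudClient.
pub struct Client {
    pub login: LoginState,
    pub source: Arc<TokenSource>,
}

impl Client {
    /// Thin delegation, mirroring the new force_token_refresh() method.
    pub fn force_token_refresh(&self) -> Result<(), String> {
        self.source.force_refresh()
    }
}

/// Returns true when a refresh was requested and succeeded.
/// Assumption: with no login present, the handler does nothing.
pub fn handle_heartbeat_status(client: &Client, status: u16) -> bool {
    if status != 401 {
        return false;
    }
    if client.login == LoginState::Missing {
        return false;
    }
    client.force_token_refresh().is_ok()
}
```

In this sketch, a 401 with a login present rotates the token immediately, while a 200 or a 401 with LoginState::Missing leaves it untouched. The previous behaviour corresponds to the handler returning after the status check without ever calling force_token_refresh().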

(b) Debug logging under a flag

  • Added a --debug CLI flag and DATUM_CONNECT_DEBUG=1 env var to datum-connect (connect-lib/bin/src/main.rs). When either is enabled (and RUST_LOG is not set to override it), the tracing filter is bumped to datum_connect=debug,connect_lib=debug, which prints refresh events to stderr.
  • Added structured info!/debug! events in ExternalTokenSource covering: forced-refresh requests, proactive-timer fires, successful swaps (with old/new JWT expiry, forced-vs-proactive flag), and failures with backoff.
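
The precedence described in (b) can be sketched as a pure function. This is an illustration only: choose_filter and the "info" default are hypothetical (the PR does not state the default filter), and the real code in main.rs presumably hands the resulting directives to the tracing subscriber rather than returning a string.

```rust
/// Filter directives named in the PR description.
pub const DEBUG_FILTER: &str = "datum_connect=debug,connect_lib=debug";

/// Assumed default; the PR does not say what the non-debug filter is.
pub const DEFAULT_FILTER: &str = "info";

/// Picks the tracing filter directives.
/// Order: an explicit, non-empty RUST_LOG wins, then the debug flag or
/// DATUM_CONNECT_DEBUG=1, then the default.
pub fn choose_filter(
    debug_flag: bool,
    debug_env: Option<&str>,
    rust_log: Option<&str>,
) -> String {
    if let Some(explicit) = rust_log.filter(|v| !v.trim().is_empty()) {
        return explicit.to_string();
    }
    if debug_flag || debug_env == Some("1") {
        return DEBUG_FILTER.to_string();
    }
    DEFAULT_FILTER.to_string()
}
```

A caller would pass the parsed --debug flag together with std::env::var("DATUM_CONNECT_DEBUG").ok().as_deref() and std::env::var("RUST_LOG").ok().as_deref(). Keeping the decision in a pure function makes the "RUST_LOG overrides the flag" rule easy to unit test.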

Test

  • force_refresh_swaps_token_via_loop — a new #[tokio::test] that runs the real refresh loop with a counter-based helper, triggers force_refresh(), and asserts the token actually rotates and watchers get notified (guards the regression).
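
For readers unfamiliar with the counter-based helper pattern, the idea behind the test can be sketched with std threads and channels. This is not the test in the PR, which is a #[tokio::test] against the real loop; spawn_refresh_loop and its parameters are hypothetical, and an mpsc channel stands in for whatever watcher mechanism ExternalTokenSource uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical refresh loop: refreshes when forced or when the
/// proactive interval elapses, then notifies the watcher channel.
pub fn spawn_refresh_loop<F>(
    shared: Arc<Mutex<String>>,
    mut fetch: F,
    interval: Duration,
    notify: Sender<String>,
) -> (Sender<()>, JoinHandle<()>)
where
    F: FnMut() -> String + Send + 'static,
{
    let (force_tx, force_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match force_rx.recv_timeout(interval) {
            // A forced request and a timer expiry take the same path.
            Ok(()) | Err(RecvTimeoutError::Timeout) => {
                let fresh = fetch();
                // Store the token before notifying, so a watcher that
                // wakes up always sees the new value.
                *shared.lock().unwrap() = fresh.clone();
                let _ = notify.send(fresh);
            }
            // All force senders dropped: shut the loop down.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (force_tx, handle)
}
```

A test in this style passes a closure that increments a counter and returns "token-N", sets the interval to something long such as an hour so the timer cannot fire on its own, sends one force request, and asserts that the watcher receives the rotated token. If the force path were disconnected from the loop, as it effectively was for the 401 handler before this PR, the watcher would time out instead.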

All 66 connect-lib tests pass; cargo build/clippy introduce no new warnings.

The heartbeat's 401 handler only logged the rejection and never triggered
a refresh — a stale comment claimed refresh was 'external/out-of-band'.
So when a token was rejected early (clock skew, revocation, IdP rotation),
the heartbeat kept retrying with the dead token until the proactive timer
eventually fired (up to ~1h), causing tunnels to stop working.

- Add DatumCloudClient::force_token_refresh() delegating to
  ExternalTokenSource::force_refresh().
- Rewrite heartbeat force_refresh_auth to actually call it; fix the
  misleading doc comment. LoginState::Missing guard preserved.
- Add --debug CLI flag and DATUM_CONNECT_DEBUG=1 env var to datum-connect;
  bumps tracing filter to debug so refresh events print to stderr.
- Add structured info!/debug! events in ExternalTokenSource covering
  forced-refresh requests, proactive-timer fires, successful swaps
  (with old/new JWT expiry, forced-vs-proactive flag), and failures.
- Add force_refresh_swaps_token_via_loop test guarding the regression.