Skip to content

fix(iroh-dns): lower default TXT TTL from 30s to 5s#179

Merged
drewr merged 1 commit into
mainfrom
fix/lower-iroh-txt-ttl-default
Jun 10, 2026
Merged

fix(iroh-dns): lower default TXT TTL from 30s to 5s#179
drewr merged 1 commit into
mainfrom
fix/lower-iroh-txt-ttl-default

Conversation

@drewr

@drewr drewr commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This is a dramatic improvement to the service by reconverging a tunnel in ~1 second instead of 30+ seconds.

The TTL is the dominant restart bottleneck: control-plane plumbing (heartbeat → reconcile → DNSRecordSet → Envoy metadata) completes in <1s, but iroh-gateway's resolver serves the stale TXT until it expires.

With this change:

% ./datum-connect tunnel listen --endpoint localhost:9999
Found existing tunnel for localhost:9999:
  id: tunnel-9zllw
  label: 4a66d3990335
  endpoint: http://localhost:9999

Tunnel already configured correctly.

Your endpoint ID: 6b66d57da4dc764e357fc69d454cd101f8a2a7bd60ee682f6f768bc23065b28a
Setting up tunnel...
  ✓ tunnel accepted (0.2s) [HTTPProxy/tunnel-9zllw]
  ✓ TLS certificate issued (0.2s) [HTTPProxy/tunnel-9zllw]
  ✓ connector ready (0.2s) [Connector/datum-connect-mhxj5]
  ✓ iroh DNS published (0.2s) [Connector/datum-connect-mhxj5]
  ✓ route programmed (0.2s) [HTTPProxy/tunnel-9zllw]
  ✓ envoy metadata propagated (1.0s) [HTTPProxy/tunnel-9zllw]
Verifying connectivity...
  ✓ origin reachable (0.1s) [http://localhost:9999]: HTTP 200
  ✓ proxy responding (0.1s) [https://tip-post-nqt7t.datumproxy.net]: HTTP 200
Tunnel ready after 1 sec: https://tip-post-nqt7t.datumproxy.net
Press Ctrl+C to stop...

Measured on datum-connect:

Scenario Time to proxy responding
Restart immediately after stop, TTL=30 ~30s
Restart 35s after stop (TTL elapsed) ~1s

Cost: a few extra authoritative DNS queries per active endpoint. Negligible at connector-zone cardinality.

Refs: datum-cloud/enhancements#756

The TTL gates how quickly a restarted connector becomes reachable;
control-plane plumbing finishes in <1s but the gateway's resolver
serves the stale answer until the TTL expires.

Measured on datum-connect (headless tunnel CLI):
  Restart immediately after stop, TTL=30: ~30s to proxy responding
  Restart 35s after stop (TTL elapsed): ~1s

Refs: datum-cloud/enhancements#756
@drewr drewr merged commit 6b87e21 into main Jun 10, 2026
11 of 12 checks passed
@drewr drewr deleted the fix/lower-iroh-txt-ttl-default branch June 10, 2026 12:18
@scotwells

Copy link
Copy Markdown
Contributor

@drewr is there documentation on how this is expected to work and why this change resolves the issue? Feels like this is covering an issue somewhere else in the stack.

@drewr

drewr commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

It centers around the general symptom that a tunnel can take several minutes to provision. Some of that was fixable client-side, but the effect of a long TTL here is reproducible. It shouldn't have downstream effects because nothing else is using Iroh relay TXT records. However, that's not to say that there isn't a better design we could use.

#167

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants