client: add onchain reconciler to daemon with enable/disable CLI #3034

Draft
snormore wants to merge 17 commits into main from snor/client-onchain-reconciler

snormore (Contributor) commented Feb 18, 2026

Related: RFC-17

Summary of Changes

  • Add an onchain reconciler to the doublezerod daemon that polls Solana program data and automatically provisions/deprovisions tunnels based on onchain user state, replacing the previous approach where the CLI drove all provisioning logic
  • Simplify the doublezero connect CLI — it now creates/activates the onchain user and enables the daemon reconciler, which handles the actual tunnel provisioning; the CLI no longer builds provision requests or calls the daemon's /provision endpoint directly
  • Add doublezero enable/doublezero disable commands and corresponding daemon HTTP endpoints to control the reconciler
  • Add a CachingFetcher shared between the reconciler and latency manager to avoid duplicate GetProgramAccounts RPC calls
  • Add client IP auto-discovery (default route → ifconfig.me fallback) so the daemon can match onchain users without an explicit --client-ip flag
  • Remove the on-disk JSON state database (db.go) — the reconciler replaces crash-recovery by re-reading onchain state on startup
  • Update doublezero status to use a new /v2/status endpoint that includes reconciler_enabled state
  • Add e2e test for the enable/disable reconciler flow
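
The provision/deprovision decision described above can be pictured as a diff between the users found onchain and the tunnels currently up. The following is a minimal sketch under stated assumptions, not the PR's implementation: the `User`, `Plan`, and `Reconcile` names and the matching rule (activated user whose client IP equals the daemon's) are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// User is a hypothetical, pared-down view of an onchain user record.
type User struct {
	Pubkey    string
	ClientIP  string
	Activated bool
}

// Plan lists the tunnels to create and remove in one reconcile pass.
type Plan struct {
	Provision []string
	Remove    []string
}

// Reconcile compares the onchain users that match this client IP against the
// currently provisioned tunnels (keyed by user pubkey) and returns the diff.
// When the reconciler is disabled, nothing is desired, so every provisioned
// tunnel ends up in Remove.
func Reconcile(enabled bool, clientIP string, users []User, provisioned map[string]bool) Plan {
	desired := map[string]bool{}
	if enabled {
		for _, u := range users {
			if u.Activated && u.ClientIP == clientIP {
				desired[u.Pubkey] = true
			}
		}
	}
	var p Plan
	for k := range desired {
		if !provisioned[k] {
			p.Provision = append(p.Provision, k)
		}
	}
	for k, up := range provisioned {
		if up && !desired[k] {
			p.Remove = append(p.Remove, k)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(p.Provision)
	sort.Strings(p.Remove)
	return p
}

func main() {
	users := []User{
		{Pubkey: "userA", ClientIP: "203.0.113.7", Activated: true},
		{Pubkey: "userB", ClientIP: "198.51.100.9", Activated: true},
	}
	provisioned := map[string]bool{"userC": true}
	plan := Reconcile(true, "203.0.113.7", users, provisioned)
	fmt.Println("provision:", plan.Provision, "remove:", plan.Remove)
}
```

Computing a plan first and applying it afterwards keeps the decision logic testable without touching network interfaces.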

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
| --- | --- | --- | --- |
| Core logic | 13 | +1290 / -1121 | +169 |
| Scaffolding | 8 | +219 / -44 | +175 |
| Tests | 13 | +1992 / -969 | +1023 |
| Fixtures | 13 | +16 / -142 | -126 |
| Docs | 2 | +213 / -0 | +213 |
| Total | 68 | +4129 / -2303 | +1826 |

~1290 lines of core logic across 13 files (reconciler, simplified CLI connect/disconnect, client IP discovery, caching fetcher), supported by ~220 lines of scaffolding and ~2000 lines of tests.

Testing Verification

  • 1302 lines of reconciler unit tests covering: single/multi-user provisioning, teardown on disable, user removal on deactivation, re-provisioning on device change, concurrent enable/disable, multicast group subscriptions
  • State persistence tests (fresh install, migration from old format, malformed JSON, precedence)
  • Client IP discovery tests (explicit flag, public/private route detection)
  • Caching fetcher tests (TTL expiry, concurrent access, error passthrough)
  • E2E test (ibrl_enable_disable_test.go) validates the full enable → connect → reconcile → disable → teardown flow
  • Existing e2e tests updated to pass --client-ip to the daemon container

snormore force-pushed the snor/client-onchain-reconciler branch 3 times, most recently from 590d528 to 988a52e, on February 18, 2026 22:41
snormore requested a review from Copilot on February 18, 2026 23:04
Copilot AI (Contributor) left a comment

Pull request overview

This PR moves tunnel provisioning from the doublezero CLI into an onchain-driven reconciler loop inside the doublezerod daemon, adds runtime enable/disable controls, and updates status/reporting + tests/fixtures accordingly.

Changes:

  • Add an onchain reconciler (polling + provisioning/removal) to doublezerod, with persisted enable/disable state and new /enable, /disable, /v2/status endpoints.
  • Update CLI flows (connect, status, plus new enable/disable) to interact with the reconciler rather than directly provisioning.
  • Add caching onchain fetcher + adjust tests/e2e fixtures to account for reconciler state and updated output.
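
To make the endpoint surface concrete, here is a rough sketch of how `/enable`, `/disable`, and `/v2/status` could hang together. Only the three paths and the `reconciler_enabled` field come from this PR; the `toggle` type, the POST requirement, and the `services` field are assumptions for illustration, and persistence is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// toggle holds the reconciler's enabled flag (in memory only in this sketch).
type toggle struct{ enabled atomic.Bool }

// set returns a handler that stores v; requiring POST is an assumption.
func (tg *toggle) set(v bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		tg.enabled.Store(v)
		w.WriteHeader(http.StatusOK)
	}
}

// routes wires the three endpoints onto a mux.
func (tg *toggle) routes() http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/enable", tg.set(true))
	mux.Handle("/disable", tg.set(false))
	mux.HandleFunc("/v2/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]any{
			"reconciler_enabled": tg.enabled.Load(),
			"services":           []string{}, // hypothetical field
		})
	})
	return mux
}

// call issues an in-process request and returns status code and body.
func call(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

// reconcilerEnabled reads the flag back through /v2/status.
func reconcilerEnabled(h http.Handler) bool {
	_, body := call(h, http.MethodGet, "/v2/status")
	var s struct {
		ReconcilerEnabled bool `json:"reconciler_enabled"`
	}
	_ = json.Unmarshal([]byte(body), &s)
	return s.ReconcilerEnabled
}

func main() {
	h := (&toggle{}).routes()
	code, _ := call(h, http.MethodPost, "/enable")
	fmt.Println("enable status:", code, "enabled:", reconcilerEnabled(h))
}
```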

Reviewed changes

Copilot reviewed 66 out of 66 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| rfcs/rfc17-client-onchain-reconciler.md | RFC documenting reconciler design, state, endpoints, and rollout plan |
| CHANGELOG.md | Changelog entry describing reconciler + CLI enable/disable |
| e2e/user_ban_test.go | Treat missing tunnel interface as successful route withdrawal |
| e2e/multicast_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/internal/fixtures/diff.go | Make CLI-table diff parsing robust to non-table preamble lines |
| e2e/ibrl_with_allocated_ip_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_enable_disable_test.go | New E2E test for connect → disable → enable → restart persistence lifecycle |
| e2e/fixtures/multicast/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/multicast/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/multicast/doublezero_status_connected_subscriber.tmpl | Add “Reconciler” column to connected multicast subscriber fixture |
| e2e/fixtures/multicast/doublezero_status_connected_publisher.tmpl | Add “Reconciler” column to connected multicast publisher fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_connected.tmpl | Add “Reconciler” column to connected allocated-IP fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl/doublezero_status_connected.tmpl | Add “Reconciler” column to connected IBRL fixture |
| client/doublezerod/cmd/doublezerod/main.go | Add flags for client IP, reconciler poll interval, and state dir; pass into runtime |
| client/doublezerod/internal/runtime/run.go | Wire reconciler + state migration + caching fetcher; update routes handler wiring; latency now uses fetcher |
| client/doublezerod/internal/runtime/run_test.go | Update runtime tests to new Run() signature; remove statefile recovery assertions |
| client/doublezerod/internal/runtime/clientip.go | New client IP auto-discovery (explicit → interfaces → ifconfig.me) |
| client/doublezerod/internal/runtime/clientip_test.go | Unit tests for public IP classification and explicit IP behavior |
| client/doublezerod/internal/reconciler/reconciler.go | New reconciler loop + enable/disable endpoints + /v2/status |
| client/doublezerod/internal/reconciler/state.go | Persisted reconciler enable/disable state + migration from old doublezerod.json |
| client/doublezerod/internal/reconciler/state_test.go | Unit tests for state load/write + migration behavior |
| client/doublezerod/internal/reconciler/metrics.go | Prometheus metrics for reconciler polls/provisions/removals/matched users |
| client/doublezerod/internal/onchain/fetcher.go | New TTL-based caching fetcher shared by reconciler and latency subsystem |
| client/doublezerod/internal/onchain/fetcher_test.go | Unit tests for caching behavior (TTL, stale-on-error, concurrency) |
| client/doublezerod/internal/manager/manager.go | Remove DB/statefile dependency; add ResolveTunnelSrc + GetProvisionedServices |
| client/doublezerod/internal/manager/http_test.go | Update manager HTTP tests for removed DB + new GetProvisionedServices |
| client/doublezerod/internal/manager/db.go | Remove old on-disk state DB implementation |
| client/doublezerod/internal/manager/db_test.go | Remove tests for deleted DB/statefile system |
| client/doublezerod/internal/manager/fixtures/doublezerod.*.json | Remove fixtures used exclusively for deleted DB/statefile behavior |
| client/doublezerod/internal/services/base.go | Remove DBReaderWriter interface (services now keep ProvisionRequest in memory) |
| client/doublezerod/internal/services/services_test.go | Update service creation tests after DB removal |
| client/doublezerod/internal/services/ibrl.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/edgefiltering.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/multicast.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/latency/smartcontract.go | Remove old direct smartcontract fetcher module (replaced by fetcher integration) |
| client/doublezerod/internal/latency/manager.go | Support injected Fetcher; adjust SmartContractFunc signature and fetch path |
| client/doublezerod/internal/latency/manager_test.go | Update tests for new smartcontract func signature and removed program ID options |
| client/doublezerod/internal/api/routes.go | Replace DBReader with ServiceStateReader (provisioned services now in manager) |
| client/doublezerod/internal/api/routes_test.go | Update routes tests to use ServiceStateReader mock |
| client/doublezero/src/servicecontroller.rs | Add v2 status + enable/disable calls; remove CLI provisioning/remove/resolve-route APIs |
| client/doublezero/src/routes.rs | Remove resolve_route command surface; keep routes retrieval |
| client/doublezero/src/main.rs | Allow enable/disable commands without version warning gate |
| client/doublezero/src/cli/command.rs | Add doublezero enable / doublezero disable commands |
| client/doublezero/src/command/mod.rs | Register enable/disable subcommands |
| client/doublezero/src/command/enable.rs | New enable command implementation + unit tests |
| client/doublezero/src/command/disable.rs | New disable command implementation + unit tests |
| client/doublezero/src/command/status.rs | Use /v2/status; surface reconciler state in output/table; synthesize disconnected row when empty |
| client/doublezero/src/command/connect.rs | Switch connect flow to best-effort enable reconciler + poll daemon status for provisioning |
| client/doublezero/src/command/disconnect.rs | Stop calling daemon /remove; poll daemon for deprovision completion after onchain user deletion |

snormore force-pushed the snor/client-onchain-reconciler branch from 5d87927 to 18b601d on February 19, 2026 14:52

Add a reconciliation loop to doublezerod that polls onchain User state
and automatically provisions or removes GRE tunnels when users are
activated or deactivated. This replaces CLI-driven provisioning and the
doublezerod.json crash-recovery state file.

Key changes:
- Reconciler goroutine with enable/disable via HTTP endpoints and
  persistent state.json
- Caching fetcher for shared onchain RPC calls between reconciler and
  latency subsystems
- Auto-discover client IP from local interfaces when --client-ip is not
  provided
- Remove on-disk state DB; services track their own ProvisionRequest in
  memory
- Migrate from old doublezerod.json on upgrade
- Add doublezero enable/disable commands
- Migrate connect from direct provisioning to enabling the reconciler
  and polling until the daemon provisions the tunnel
- Update status to show reconciler state via GET /v2/status
- Reconciler lifecycle, HTTP handler, state file, and startup migration
  tests
- Caching fetcher and client IP discovery tests
- Enable/disable CLI and connect enable-failure tests (Rust)
- E2e test for enable/disable lifecycle with daemon restart persistence
- Update e2e status fixtures for reconciler column

When no dz_mode was specified (the common case), disconnect skipped
polling for daemon deprovisioning entirely. This caused e2e tests to
race with the reconciler's 10s poll interval — the tunnel could still
be up when post-disconnect checks ran.

Now disconnect always polls the daemon status until matching services
are gone, regardless of whether a specific mode was requested.
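
The "poll until the services are gone" behavior can be sketched as below. The CLI itself is written in Rust; this is a Go illustration of the same idea, and `waitUntilGone`, `runDemo`, the 5 ms interval, and the fake `list` function standing in for a daemon status call are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitUntilGone calls list immediately and then once per interval until it
// reports no services, list fails, or ctx is done.
func waitUntilGone(ctx context.Context, interval time.Duration, list func() ([]string, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		svcs, err := list()
		if err != nil {
			return err
		}
		if len(svcs) == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("services still present: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// runDemo drives waitUntilGone against a fake daemon that keeps reporting one
// service for the first pollsUntilGone calls and none afterwards.
func runDemo(pollsUntilGone, timeoutMs int) (calls int, timedOut bool) {
	list := func() ([]string, error) {
		calls++
		if calls > pollsUntilGone {
			return nil, nil
		}
		return []string{"ibrl"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err := waitUntilGone(ctx, 5*time.Millisecond, list)
	return calls, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	calls, timedOut := runDemo(2, 1000)
	fmt.Println("status calls:", calls, "timed out:", timedOut)
}
```

Polling unconditionally, with a deadline, is what removes the race: the caller returns only once the daemon reports the tunnel gone or the deadline passes.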

Synthesize a "disconnected" status entry when the v2/status endpoint
returns no services, matching the legacy /status endpoint behavior.
This fixes QA agent "no data available" errors.

Fix user ban e2e test to treat interface-gone errors as successful route
withdrawal, since the reconciler tears down the tunnel interface when
the user is banned.

The CLI now returns a synthetic "disconnected" row when no services are
running. Update all e2e disconnected status fixtures to expect this row,
converting from static .txt to .tmpl to template the Reconciler value
(true after disconnect, false after disable).

- state.go: return error on non-ErrNotExist read failures instead of
  falling through to migration
- state.go: write state atomically via temp file + rename
- manager.go: ResolveTunnelSrc falls back to first non-nil Src when no
  exact Dst match from RouteGet
- run.go: validate reconciler poll interval >= 1 to prevent ticker panic
- reconciler.go: make SetEnabled non-blocking to avoid hanging HTTP
  handlers on back-to-back enable/disable calls
- diff.go: check scanner.Err() in mapFromTable
- rfc17: fix contradictory rollout statement (fresh installs start
  disabled, not enabled by default)
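
Two of the state.go fixes above, atomic writes via temp file + rename and surfacing read errors other than "file does not exist", follow a common pattern that can be sketched as follows. This is an illustration, not the PR's code: the `State` struct, its JSON field name, and the helper names are assumptions, and treating malformed JSON as an error is a choice made only for this sketch.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// State is a hypothetical shape for the persisted reconciler state.
type State struct {
	ReconcilerEnabled bool `json:"reconciler_enabled"`
}

// writeStateAtomic writes to a temp file in the same directory and renames it
// over the target, so readers never observe a partially written file.
func writeStateAtomic(path string, s State) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// readState treats a missing file as a fresh install (disabled) and returns
// every other read or decode failure to the caller.
func readState(path string) (State, error) {
	var s State
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// roundTrip reads a missing file, writes an enabled state, and reads it back.
func roundTrip() (fresh, saved State, err error) {
	dir, err := os.MkdirTemp("", "dz-state-")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	if fresh, err = readState(path); err != nil {
		return
	}
	if err = writeStateAtomic(path, State{ReconcilerEnabled: true}); err != nil {
		return
	}
	saved, err = readState(path)
	return
}

func main() {
	fresh, saved, err := roundTrip()
	fmt.Println(fresh.ReconcilerEnabled, saved.ReconcilerEnabled, err)
}
```

The temp file is created in the target's directory because a rename is only atomic within a single filesystem.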

- Make enable failure in connect flow check v2_status before giving up,
  so it only skips tunnel polling when the reconciler is genuinely not
  enabled
- Use services.IsUnicastUser/IsMulticastUser instead of redefining them
  in the reconciler package
- Refactor CachingFetcher to use singleflight instead of holding a mutex
  during the RPC call
- Replace interface scanning with default route source hint (UDP dial to
  8.8.8.8) for client IP discovery in both daemon and CLI, falling back
  to external discovery (ifconfig.me) when the route source isn't public
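
The CachingFetcher change above (do not hold the mutex across the RPC, but still collapse concurrent callers into one upstream call) can be illustrated with the standard library alone. The PR states that it uses singleflight; the sketch below hand-rolls the same coalescing so it stays self-contained, and every name in it (`CachingFetcher` fields, `inflightCall`, `demo`) is hypothetical, with a string standing in for the fetched program data.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// inflightCall is shared by all callers waiting on one upstream fetch.
type inflightCall struct {
	done chan struct{}
	val  string
	err  error
}

// CachingFetcher caches one value for ttl and coalesces concurrent misses.
type CachingFetcher struct {
	fetch func() (string, error) // upstream RPC stand-in
	ttl   time.Duration
	now   func() time.Time // injectable clock for tests

	mu       sync.Mutex
	val      string
	fetched  time.Time
	has      bool
	inflight *inflightCall
}

// Get returns the cached value while fresh; otherwise exactly one caller runs
// fetch (without holding the mutex) and the others wait for its result.
func (c *CachingFetcher) Get() (string, error) {
	c.mu.Lock()
	if c.has && c.now().Sub(c.fetched) < c.ttl {
		v := c.val
		c.mu.Unlock()
		return v, nil
	}
	if cl := c.inflight; cl != nil {
		c.mu.Unlock()
		<-cl.done
		return cl.val, cl.err
	}
	cl := &inflightCall{done: make(chan struct{})}
	c.inflight = cl
	c.mu.Unlock()

	v, err := c.fetch() // slow call happens outside the lock

	c.mu.Lock()
	if err == nil {
		c.val, c.fetched, c.has = v, c.now(), true
	}
	cl.val, cl.err = v, err
	c.inflight = nil
	c.mu.Unlock()
	close(cl.done)
	return v, err // errors are passed through, not cached
}

// demo counts upstream calls before and after the TTL expires.
func demo() (beforeExpiry, afterExpiry int) {
	calls := 0
	clock := time.Unix(0, 0)
	f := &CachingFetcher{
		fetch: func() (string, error) { calls++; return "accounts", nil },
		ttl:   10 * time.Second,
		now:   func() time.Time { return clock },
	}
	f.Get()
	f.Get()
	beforeExpiry = calls
	clock = clock.Add(11 * time.Second)
	f.Get()
	return beforeExpiry, calls
}

func main() {
	before, after := demo()
	fmt.Println("upstream calls before expiry:", before, "after expiry:", after)
}
```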

Move all reconciler logic into NetlinkManager, eliminating the separate
reconciler package. The reconciler's Manager interface is no longer needed
since NetlinkManager calls its own Provision/Remove methods directly. The
Fetcher interface moves to the manager package as the sole external dependency.

The daemon's reconciler auto-discovered the GitHub runner's public IP
via ifconfig.me instead of the container's CYOA network IP, causing it
to tear down manually provisioned tunnels.
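
The discovery order described in this PR (explicit `--client-ip`, then the default route's source address from a UDP dial to 8.8.8.8, then an external lookup such as ifconfig.me) could look roughly like the sketch below. It is an assumption-laden illustration: the function names, port 53, and the exact "public" classification are not taken from the PR, and the external lookup is injected as a function so the example makes no HTTP request.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isPublicIPv4 reports whether ip is an IPv4 address that is not private,
// loopback, link-local, multicast, unspecified, or carrier-grade NAT.
func isPublicIPv4(ip net.IP) bool {
	ip4 := ip.To4()
	if ip4 == nil {
		return false
	}
	if ip4.IsPrivate() || ip4.IsLoopback() || ip4.IsLinkLocalUnicast() ||
		ip4.IsMulticast() || ip4.IsUnspecified() {
		return false
	}
	if ip4[0] == 100 && ip4[1]&0xC0 == 64 { // 100.64.0.0/10
		return false
	}
	return true
}

// routeSourceIP asks the kernel which source address it would use to reach
// 8.8.8.8. Dialing UDP sends no packets; it only resolves the route.
func routeSourceIP() (net.IP, error) {
	conn, err := net.Dial("udp4", "8.8.8.8:53")
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, errors.New("unexpected local address type")
	}
	return addr.IP, nil
}

// discoverClientIP applies the precedence: explicit flag, public route
// source, then the external lookup.
func discoverClientIP(explicit string, routeSrc, external func() (net.IP, error)) (net.IP, error) {
	if explicit != "" {
		ip := net.ParseIP(explicit)
		if ip == nil {
			return nil, fmt.Errorf("invalid --client-ip %q", explicit)
		}
		return ip, nil
	}
	if ip, err := routeSrc(); err == nil && isPublicIPv4(ip) {
		return ip, nil
	}
	return external()
}

// discoverWith runs discoverClientIP with fixed answers instead of real
// network lookups, which keeps the precedence easy to exercise.
func discoverWith(explicit, routeSrc, external string) string {
	fixed := func(s string) func() (net.IP, error) {
		return func() (net.IP, error) { return net.ParseIP(s), nil }
	}
	ip, err := discoverClientIP(explicit, fixed(routeSrc), fixed(external))
	if err != nil {
		return ""
	}
	return ip.String()
}

func main() {
	fmt.Println(discoverWith("192.0.2.10", "10.0.0.5", "203.0.113.7"))
	fmt.Println(discoverWith("", "10.0.0.5", "203.0.113.7"))
}
```

Because the explicit value short-circuits everything else, passing `--client-ip` to the daemon container keeps discovery from picking up an address that belongs to the host rather than the container.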

snormore force-pushed the snor/client-onchain-reconciler branch from 98a5cd4 to fc867b3 on February 19, 2026 17:00