client: add onchain reconciler to daemon with enable/disable CLI #3034
Draft
Updated from 590d528 to 988a52e.
Pull request overview
This PR moves tunnel provisioning from the doublezero CLI into an onchain-driven reconciler loop inside the doublezerod daemon, adds runtime enable/disable controls, and updates status reporting, tests, and fixtures accordingly.
Changes:
- Add an onchain reconciler (polling plus provisioning/removal) to doublezerod, with persisted enable/disable state and new /enable, /disable, and /v2/status endpoints.
- Update CLI flows (connect, status, plus new enable/disable) to interact with the reconciler rather than provisioning directly.
- Add a caching onchain fetcher and adjust tests/e2e fixtures to account for reconciler state and updated output.
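The provisioning/removal half of the loop can be sketched as a set difference between the activated onchain users that belong to this client and the services the daemon currently runs. This is a minimal Go sketch, not the PR's implementation: OnchainUser, Provisioner, reconcile, and recorder are hypothetical names, and matching users by client IP is an assumption based on the client IP discovery this PR introduces.

```go
package main

import "sort"

// OnchainUser is a hypothetical, trimmed-down view of an activated onchain
// User account; the real account layout is not shown here.
type OnchainUser struct {
	Key      string // hypothetical unique key, e.g. the user account pubkey
	ClientIP string
}

// Provisioner is a hypothetical stand-in for the daemon's provision/remove calls.
type Provisioner interface {
	Provision(u OnchainUser) error
	Remove(key string) error
}

// reconcile provisions users that exist onchain for clientIP but are not yet
// running, and removes running services with no backing onchain user.
// It returns the provisioned and removed keys, sorted for stable output.
func reconcile(clientIP string, onchain []OnchainUser, provisioned map[string]bool, p Provisioner) ([]string, []string, error) {
	var added, removed []string
	desired := make(map[string]OnchainUser)
	for _, u := range onchain {
		if u.ClientIP == clientIP {
			desired[u.Key] = u
		}
	}
	for key, u := range desired {
		if !provisioned[key] {
			if err := p.Provision(u); err != nil {
				return added, removed, err
			}
			added = append(added, key)
		}
	}
	for key := range provisioned {
		if _, ok := desired[key]; !ok {
			if err := p.Remove(key); err != nil {
				return added, removed, err
			}
			removed = append(removed, key)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed, nil
}

// recorder is an in-memory Provisioner, handy for dry runs and tests.
type recorder struct{ provisioned, removed []string }

func (r *recorder) Provision(u OnchainUser) error {
	r.provisioned = append(r.provisioned, u.Key)
	return nil
}

func (r *recorder) Remove(key string) error {
	r.removed = append(r.removed, key)
	return nil
}
```

Because each pass derives the desired set from onchain state alone, rerunning it after a daemon restart converges to the same tunnels, which is what lets the reconciler stand in for the old crash-recovery state file.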
Reviewed changes
Copilot reviewed 66 out of 66 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| rfcs/rfc17-client-onchain-reconciler.md | RFC documenting reconciler design, state, endpoints, and rollout plan |
| CHANGELOG.md | Changelog entry describing reconciler + CLI enable/disable |
| e2e/user_ban_test.go | Treat missing tunnel interface as successful route withdrawal |
| e2e/multicast_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/internal/fixtures/diff.go | Make CLI-table diff parsing robust to non-table preamble lines |
| e2e/ibrl_with_allocated_ip_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_enable_disable_test.go | New E2E test for connect → disable → enable → restart persistence lifecycle |
| e2e/fixtures/multicast/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/multicast/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/multicast/doublezero_status_connected_subscriber.tmpl | Add “Reconciler” column to connected multicast subscriber fixture |
| e2e/fixtures/multicast/doublezero_status_connected_publisher.tmpl | Add “Reconciler” column to connected multicast publisher fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_connected.tmpl | Add “Reconciler” column to connected allocated-IP fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl/doublezero_status_connected.tmpl | Add “Reconciler” column to connected IBRL fixture |
| client/doublezerod/cmd/doublezerod/main.go | Add flags for client IP, reconciler poll interval, and state dir; pass into runtime |
| client/doublezerod/internal/runtime/run.go | Wire reconciler + state migration + caching fetcher; update routes handler wiring; latency now uses fetcher |
| client/doublezerod/internal/runtime/run_test.go | Update runtime tests to new Run() signature; remove statefile recovery assertions |
| client/doublezerod/internal/runtime/clientip.go | New client IP auto-discovery (explicit → interfaces → ifconfig.me) |
| client/doublezerod/internal/runtime/clientip_test.go | Unit tests for public IP classification and explicit IP behavior |
| client/doublezerod/internal/reconciler/reconciler.go | New reconciler loop + enable/disable endpoints + /v2/status |
| client/doublezerod/internal/reconciler/state.go | Persisted reconciler enable/disable state + migration from old doublezerod.json |
| client/doublezerod/internal/reconciler/state_test.go | Unit tests for state load/write + migration behavior |
| client/doublezerod/internal/reconciler/metrics.go | Prometheus metrics for reconciler polls/provisions/removals/matched users |
| client/doublezerod/internal/onchain/fetcher.go | New TTL-based caching fetcher shared by reconciler and latency subsystem |
| client/doublezerod/internal/onchain/fetcher_test.go | Unit tests for caching behavior (TTL, stale-on-error, concurrency) |
| client/doublezerod/internal/manager/manager.go | Remove DB/statefile dependency; add ResolveTunnelSrc + GetProvisionedServices |
| client/doublezerod/internal/manager/http_test.go | Update manager HTTP tests for removed DB + new GetProvisionedServices |
| client/doublezerod/internal/manager/db.go | Remove old on-disk state DB implementation |
| client/doublezerod/internal/manager/db_test.go | Remove tests for deleted DB/statefile system |
| client/doublezerod/internal/manager/fixtures/doublezerod.*.json | Remove fixtures used exclusively for deleted DB/statefile behavior |
| client/doublezerod/internal/services/base.go | Remove DBReaderWriter interface (services now keep ProvisionRequest in memory) |
| client/doublezerod/internal/services/services_test.go | Update service creation tests after DB removal |
| client/doublezerod/internal/services/ibrl.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/edgefiltering.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/multicast.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/latency/smartcontract.go | Remove old direct smartcontract fetcher module (replaced by fetcher integration) |
| client/doublezerod/internal/latency/manager.go | Support injected Fetcher; adjust SmartContractFunc signature and fetch path |
| client/doublezerod/internal/latency/manager_test.go | Update tests for new smartcontract func signature and removed program ID options |
| client/doublezerod/internal/api/routes.go | Replace DBReader with ServiceStateReader (provisioned services now in manager) |
| client/doublezerod/internal/api/routes_test.go | Update routes tests to use ServiceStateReader mock |
| client/doublezero/src/servicecontroller.rs | Add v2 status + enable/disable calls; remove CLI provisioning/remove/resolve-route APIs |
| client/doublezero/src/routes.rs | Remove resolve_route command surface; keep routes retrieval |
| client/doublezero/src/main.rs | Allow enable/disable commands without version warning gate |
| client/doublezero/src/cli/command.rs | Add doublezero enable / doublezero disable commands |
| client/doublezero/src/command/mod.rs | Register enable/disable subcommands |
| client/doublezero/src/command/enable.rs | New enable command implementation + unit tests |
| client/doublezero/src/command/disable.rs | New disable command implementation + unit tests |
| client/doublezero/src/command/status.rs | Use /v2/status; surface reconciler state in output/table; synthesize disconnected row when empty |
| client/doublezero/src/command/connect.rs | Switch connect flow to best-effort enable reconciler + poll daemon status for provisioning |
| client/doublezero/src/command/disconnect.rs | Stop calling daemon /remove; poll daemon for deprovision completion after onchain user deletion |
packethog requested changes on Feb 19, 2026.
Updated from 5d87927 to 18b601d.
Add a reconciliation loop to doublezerod that polls onchain User state and automatically provisions or removes GRE tunnels when users are activated or deactivated. This replaces CLI-driven provisioning and the doublezerod.json crash-recovery state file.

Key changes:
- Reconciler goroutine with enable/disable via HTTP endpoints and persistent state.json
- Caching fetcher so the reconciler and latency subsystems share onchain RPC calls
- Auto-discover client IP from local interfaces when --client-ip is not provided
- Remove on-disk state DB; services track their own ProvisionRequest in memory
- Migrate from old doublezerod.json on upgrade
- Add doublezero enable/disable commands
- Migrate connect from direct provisioning to enabling the reconciler and polling until the daemon provisions the tunnel
- Update status to show reconciler state via GET /v2/status
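On the daemon side, the enable/disable commands and the status view reduce to a flag behind three HTTP routes. The sketch below is illustrative only: daemonAPI, v2Status, and call are hypothetical names, requiring POST for /enable and /disable is an assumption, and only the reconciler_enabled field name comes from this PR. Persisting the flag to state.json is left out.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// daemonAPI is a hypothetical holder for the reconciler's enabled flag.
// The real daemon also persists this flag to disk; that is omitted here.
type daemonAPI struct {
	enabled atomic.Bool
}

// v2Status is an assumed response shape; only reconciler_enabled is named
// in the PR, and the services field is illustrative.
type v2Status struct {
	ReconcilerEnabled bool     `json:"reconciler_enabled"`
	Services          []string `json:"services"`
}

// handler wires /enable, /disable (POST only) and /v2/status.
func (d *daemonAPI) handler() http.Handler {
	mux := http.NewServeMux()
	toggle := func(v bool) http.HandlerFunc {
		return func(w http.ResponseWriter, r *http.Request) {
			if r.Method != http.MethodPost {
				http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
				return
			}
			d.enabled.Store(v)
			w.WriteHeader(http.StatusOK)
		}
	}
	mux.HandleFunc("/enable", toggle(true))
	mux.HandleFunc("/disable", toggle(false))
	mux.HandleFunc("/v2/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(v2Status{ReconcilerEnabled: d.enabled.Load(), Services: []string{}})
	})
	return mux
}

// call exercises the handler in-process and returns the status code and body.
func call(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}
```

Keeping the toggle as a single atomic flag means the handlers return immediately; the reconcile loop picks up the new value on its next pass.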
- Reconciler lifecycle, HTTP handler, state file, and startup migration tests
- Caching fetcher and client IP discovery tests
- Enable/disable CLI and connect enable-failure tests (Rust)
- E2E test for enable/disable lifecycle with daemon restart persistence
- Update e2e status fixtures for reconciler column
When no dz_mode was specified (the common case), disconnect skipped polling for daemon deprovisioning entirely. This caused e2e tests to race with the reconciler's 10s poll interval — the tunnel could still be up when post-disconnect checks ran. Now disconnect always polls the daemon status until matching services are gone, regardless of whether a specific mode was requested.
Synthesize a "disconnected" status entry when the v2/status endpoint returns no services, matching the legacy /status endpoint behavior. This fixes QA agent "no data available" errors. Fix user ban e2e test to treat interface-gone errors as successful route withdrawal, since the reconciler tears down the tunnel interface when the user is banned.
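The synthetic row is a small fallback applied only when the daemon reports nothing. A hedged Go sketch (the CLI itself is Rust): ServiceStatus and its fields are assumed names, and carrying the reconciler flag into the synthetic row mirrors the fixtures, whose Reconciler column reads true after disconnect and false after disable.

```go
package main

// ServiceStatus is an assumed row shape for the CLI status table; field
// names are illustrative, not the actual doublezero CLI types.
type ServiceStatus struct {
	TunnelName        string
	SessionStatus     string
	ReconcilerEnabled bool
}

// withDisconnectedRow returns rows unchanged when the daemon reports any
// services. Otherwise it returns a single synthetic "disconnected" row, so
// consumers that expect at least one row (as the legacy /status endpoint
// provided) still receive data.
func withDisconnectedRow(rows []ServiceStatus, reconcilerEnabled bool) []ServiceStatus {
	if len(rows) > 0 {
		return rows
	}
	return []ServiceStatus{{SessionStatus: "disconnected", ReconcilerEnabled: reconcilerEnabled}}
}
```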
The CLI now returns a synthetic "disconnected" row when no services are running. Update all e2e disconnected status fixtures to expect this row, converting from static .txt to .tmpl to template the Reconciler value (true after disconnect, false after disable).
- state.go: return error on non-ErrNotExist read failures instead of falling through to migration
- state.go: write state atomically via temp file + rename
- manager.go: ResolveTunnelSrc falls back to the first non-nil Src when RouteGet returns no exact Dst match
- run.go: validate reconciler poll interval >= 1 to prevent a ticker panic
- reconciler.go: make SetEnabled non-blocking to avoid hanging HTTP handlers on back-to-back enable/disable calls
- diff.go: check scanner.Err() in mapFromTable
- rfc17: fix contradictory rollout statement (fresh installs start disabled, not enabled by default)
- Make the enable failure path in the connect flow check v2_status before giving up, so it skips tunnel polling only when the reconciler is genuinely not enabled
- Use services.IsUnicastUser/IsMulticastUser instead of redefining them in the reconciler package
- Refactor CachingFetcher to use singleflight instead of holding a mutex during the RPC call
- Replace interface scanning with a default route source hint (UDP dial to 8.8.8.8) for client IP discovery in both daemon and CLI, falling back to external discovery (ifconfig.me) when the route source isn't public
Move all reconciler logic into NetlinkManager, eliminating the separate reconciler package. The reconciler's Manager interface is no longer needed since NetlinkManager calls its own Provision/Remove methods directly. The Fetcher interface moves to the manager package as the sole external dependency.
The daemon's reconciler auto-discovered the GitHub runner's public IP via ifconfig.me instead of the container's CYOA network IP, causing it to tear down manually provisioned tunnels.
Updated from 98a5cd4 to fc867b3.
Related: RFC-17
Summary of Changes
- Adds a reconciliation loop to the doublezerod daemon that polls Solana program data and automatically provisions/deprovisions tunnels based on onchain user state, replacing the previous approach where the CLI drove all provisioning logic
- Simplifies the doublezero connect CLI: it now creates/activates the onchain user and enables the daemon reconciler, which handles the actual tunnel provisioning; the CLI no longer builds provision requests or calls the daemon's /provision endpoint directly
- Adds doublezero enable / doublezero disable commands and corresponding daemon HTTP endpoints to control the reconciler
- Adds a CachingFetcher shared between the reconciler and latency manager to avoid duplicate GetProgramAccounts RPC calls
- Auto-discovers the client IP when it is not passed via the --client-ip flag
- Removes the on-disk state DB (db.go); the reconciler replaces crash recovery by re-reading onchain state on startup
- Updates doublezero status to use a new /v2/status endpoint that includes reconciler_enabled state

Diff Breakdown
~1290 lines of core logic across 13 files (reconciler, simplified CLI connect/disconnect, client IP discovery, caching fetcher), supported by ~220 lines of scaffolding and ~2000 lines of tests.
Key files
- client/doublezerod/internal/manager/manager.go: reconciler loop, onchain state polling, provision/deprovision reconciliation; replaces DB-backed state with an onchain-driven approach
- client/doublezero/src/command/connect.rs: simplified to create the onchain user and enable the reconciler; removed direct provision request construction
- client/doublezero/src/command/status.rs: uses the /v2/status endpoint, shows reconciler state, synthetic disconnected entry
- client/doublezero/src/servicecontroller.rs: simplified; removed provision/resolve-route methods, added enable/disable/v2-status
- client/doublezerod/internal/runtime/clientip.go: client IP auto-discovery (default route source → ifconfig.me fallback)
- client/doublezerod/internal/onchain/fetcher.go: caching fetcher shared between reconciler and latency manager
- client/doublezerod/internal/manager/http.go: enable/disable/v2-status HTTP handlers
- client/doublezerod/internal/manager/state.go: reconciler enabled/disabled state persistence with migration from the old format

Testing Verification
- New E2E test (ibrl_enable_disable_test.go) validates the full enable → connect → reconcile → disable → teardown flow
- E2E setup passes --client-ip to the daemon container