TL;DR Eight endpoints handle dual-engine coordination: a lightweight health endpoint polled between engines, a force-takeover endpoint gated by a shared secret, a sync-health probe returning clock and sequence state, and a reconcile-lock protocol (acquire / refresh / release) plus state-snapshot get/apply for active-active reconciliation. The reconcile lock ensures only one engine at a time runs the active reconciler during a failover.
Purpose: unauthenticated health probe called between peer engines at ~5 Hz to detect failures faster than the control plane.
Auth: none (in authExemptPaths).
Handler: control/api_peer.go → (*API).handlePeerHealth.
Response 200 (peer.HealthResponse):

```json
{
  "status": "ok",
  "uptimeMs": 12345,
  "programSource": "cam-1",
  "lastCommandSeq": 1729,
  "engineLabel": "a",
  "epoch": 17
}
```

Errors: none. Defensive nil guards keep this endpoint from DoSing on refactor breakage.
Purpose: control-plane-initiated takeover. The control plane sends X-Control-Plane-Secret: <secret> to force an engine to become the leader regardless of its current follower status. When no secret is configured on the engine, this returns 501.
Auth: X-Control-Plane-Secret header (not a bearer token; the route is in authExemptPaths).
Handler: control/api_force_leader.go → (*API).handleForceLeader.
Request body: peer.ForceLeaderRequest.
Errors: 401 (missing / wrong secret), 501 (not configured), 409 (already leader).
Purpose: return lightweight health info (clock sync state, last command sequence, last executed sequence, last executed error, late-command count, uptime, fast-control pong timestamp).
Handler: control/api.go → (*API).handleSyncHealth.
Response 200:

```json
{
  "clockSync": { "mode": "ptp", "offsetNs": 1234, "jitterNs": 500 },
  "lastCommandSeq": 1729,
  "lastExecutedSeq": 1728,
  "lastExecutedErr": "",
  "lateCommands": 0,
  "uptimeMs": 12345,
  "lastPongSentUs": 1713200000000000
}
```

Polled at 1 Hz by the browser in dual-engine mode for clock calibration.
Purpose: return a consistent snapshot of the engine's full state for peer reconciliation.
Handler: control/api_sync.go → (*API).handleGetStateSnapshot.
Response 200: sync.StateSnapshot.
Purpose: apply a peer's state snapshot to this engine (used during reconciliation after a split-brain).
Handler: (*API).handleApplyStateSnapshot.
Request body: sync.StateSnapshot.
Errors: 400 (invalid snapshot), 409 (reconcile-lock not held).
Purpose: acquire the reconcile lock. Returns the lock ID and a TTL. Lock is refreshed via PUT with the ID.
Handler: (*API).handleAcquireReconcileLock.
Response 200: { "lockId": "...", "expiresMs": 30000 }.
Errors: 409 (another reconciler holds the lock).
Purpose: refresh the lock's TTL.
Handler: (*API).handleRefreshReconcileLock.
Errors: 404 (lock expired or wrong ID).
Purpose: release the reconcile lock.
Handler: (*API).handleReleaseReconcileLock.
- Reference: api.md · state-broadcast.md
- Subsystems: switcher.md
- Operations: deployment.md