Add realtime operator display: push-based reporting + console viewer #45
firstmorecoffee wants to merge 3 commits into deimoscontrols:main from
b5f9cc7 to eb326f7
Neat! This is very cool - it'll take me a bit to dig through all of it, think about multicast networks, try it out, and have some thoughts. I like the core premise, and think we can lean into parts of it even more (there are no third-party plugins yet, so we can make units a required output of calcs, but we'll need to eventually combine that with a run-time unit checking system similar to Thanks again for putting this together!
fbc2e7e to 30f6a81
Finally had a chance to test-drive this today! It's very slick. I like the super lightweight, low-latency plotting interface; I might have reached for some heavier web-y stack like leptos+plotly myself, out of convenience, but this native-first egui setup is objectively better since it's less complex, optimizes better, and can run on closed networks. The multicast system for getting data to the consoles also solves a long-standing issue that I've had in the back of my head re: how to handle multiple user consoles without over-subscribing the root node. Postcard with periodic schema delivery is an excellent balance of simplicity and performance to accomplish that without a shared compiled boundary. Couple notes from running it with a Deimos DAQ rev7 hardware attached via ethernet-usb adapter on an ubuntu laptop & running through a codex review pass:
@firstmorecoffee Here's a snip of the gap after navigating away for ~5sec, and the very large number of dropped frames due to navigating away to write the previous comment
30f6a81 to a6a7554
Updates: 1. Stall on minimize / drain-the-backlog behavior. Reproduced via
Fix is viewer-side only (no wire-format or controller change):
Verified end-to-end: post- 2. 3. |
Adds a decision-grade live view of controller signals for test-stand
operators, complementing the existing firmware-level auto-abort
interlocks. The Grafana path (dispatchers -> TimescaleDB -> query) is
too indirect for judgment-cadence decisions; this change gives the
control room a wait-free direct view whose contents can be defended as
equal to what the controller saw.
- ReportingDispatcher serializes each per-cycle Row onto a UDP
multicast transport (default 239.255.0.1:29573). Writes are wait-free
and infallible from the control loop: WouldBlock increments a drop
counter and returns Ok(()).
- Wire format is postcard-encoded ReportingMessage::{Schema, Row} with
a leading byte tag; the Schema re-emits every 2 s (configurable) so
late-joining viewers discover channels within a bounded window.
- terminate() emits one final session-end Schema so viewers can mark
clean session boundaries.
- Schema/Row round-trip tests plus a HOOTL example (hootl_reporting.rs)
verify no drops in the normal path.
- Pure-Rust eframe/egui viewer; no browser, no npm. Matches Deimos's
"no platform restrictions" principle.
- Receiver thread joins the multicast group via socket2, buffers Row
until the first Schema arrives, then renders one egui_plot Plot per
configured panel with unit-labeled y-axes at a fixed 30 Hz repaint
cadence.
- Sequence-gap detection inserts NaN sentinels so dropped frames render
as explicit discontinuities, never silent interpolation.
- Three-state connection-health indicator (Fresh / Stale / NoSchemaYet)
with a configurable staleness threshold (default 2 s).
- Freeze-and-inspect toggle halts scrolling while the receiver
continues feeding the ring buffer and forensic log.
- Per-session forensic CSV records (viewer_received_at, seq,
controller_timestamp, controller_system_time, <channels...>) with
size-based rotation so post-run review can correlate what was on
screen with the underlying sample.
- Calc::get_output_units() -> Vec<Option<String>> with a default
returning all None - third-party calc plugins keep compiling.
- RtdPt100 and TcKtype declare K; Affine, InverseAffine, Polynomial,
Constant, and Sin accept an optional output_unit via a
with_output_unit(...) builder.
- CalcOrchestrator::get_dispatch_units() mirrors get_dispatch_names().
- Units flow from calcs to dispatchers via ControllerCtx.channel_units
(parallel to channel_names); the Dispatcher::init signature is
unchanged so user dispatcher plugins keep compiling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
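
The wait-free write path described in the commit message above (`WouldBlock` increments a drop counter and returns `Ok(())`) could look roughly like the following. This is a minimal sketch using only the standard library, not the code from this PR: `absorb_would_block`, `send_wait_free`, and the `drops` counter are hypothetical names, the socket is assumed to already be in non-blocking mode, and propagating errors other than `WouldBlock` is an assumption about policy.

```rust
use std::io::{self, ErrorKind};
use std::net::{SocketAddr, UdpSocket};
use std::sync::atomic::{AtomicU64, Ordering};

/// Apply the "never stall the control loop on back-pressure" policy to the
/// result of a non-blocking send: `WouldBlock` is counted and swallowed.
/// (Hypothetical helper; other error kinds are passed through as-is.)
pub fn absorb_would_block(result: io::Result<usize>, drops: &AtomicU64) -> io::Result<()> {
    match result {
        Ok(_) => Ok(()),
        Err(e) if e.kind() == ErrorKind::WouldBlock => {
            // The send buffer was full: record the dropped datagram and move on.
            drops.fetch_add(1, Ordering::Relaxed);
            Ok(())
        }
        Err(e) => Err(e),
    }
}

/// Send one already-encoded datagram without blocking the caller.
/// Assumes `socket.set_nonblocking(true)` was called during setup.
pub fn send_wait_free(
    socket: &UdpSocket,
    datagram: &[u8],
    dest: SocketAddr,
    drops: &AtomicU64,
) -> io::Result<()> {
    absorb_would_block(socket.send_to(datagram, dest), drops)
}
```

Keeping the policy in a function that takes an `io::Result` makes the drop-counting branch testable without having to fill a real socket buffer.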
Turns the console from a "does it work?" black box into an inspectable
pipeline during debug, cleans up existing log noise, and fixes a handful
of readability issues exposed by the first round of hands-on use.

Viewer stderr telemetry (app.rs, main.rs):
- Per-second tick with row count, rate, last seq, and per-category drop
counters.
- Wall-clock rx-lag min/mean/max, anchored on Row.system_time (the
controller's cycle-start SystemTime::now()) so it answers "how old is
what the operator sees" across transport + UI drain + any clock skew. A
long comment in process_row captures why that number sits near one
cycle_duration on loopback and documents the rejected
epoch_ns + timestamp approach.
- Schema lines tagged new-session / re-emission / session-end so
controller lifecycle is visible alongside the UI.
- Local HH:MM:SS.mmm prefix on every line for easy correlation with the
controller's ISO-timestamped tracing output.
- Idle suppression: after three consecutive zero-row ticks, emit only
one heartbeat per minute instead of flooding the log.

New ConnectionHealth::SessionEnded state:
- Fires the instant the controller's session-end Schema arrives, not
after the 2 s stale timeout, so an operator can distinguish "clean
shutdown" from "silent stall" immediately.
- Replaces the secondary "Session ended." text line previously drawn
below the indicator; unit test asserts precedence over Fresh.

UI palette and layout (app.rs):
- Replace saturated screen-primary colors (YELLOW/GREEN/RED) with a
muted palette (MINT_GREEN, AMBER, CORAL_RED, SKY_BLUE, ORCHID) reserved
for indicator glyphs only.
- Labels render in the theme default foreground color so the text is
legible on both dark and light themes; status communicated via a
leading dot / warning glyph.
- "All clean" dropped-frames row uses default text, not GRAY: no more
looking-like-a-disabled-widget when nothing is wrong.

Controller noise cleanup (csv.rs):
- CSV-dispatcher core_affinity::set_for_current returned-false warning
downgraded from warn! to debug!. Affinity on that path is best-effort;
macOS's advisory scheduler always returns false. Still surfaceable via
RUST_LOG=deimos::dispatcher::csv=debug.

Two-terminal debug workflow (scripts/, .gitignore):
- scripts/hootl-console / scripts/console run the two sides with
RUST_LOG=deimos=debug,info and tee timestamped transcripts under logs/.
- Forensic log pre-enabled in the example config (commented path
explains per-session suffixing and 64 MiB rotation).
- New dep: chrono (clock feature, no default features) for local-time
formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
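
The precedence rule for the new `SessionEnded` state (it wins over `Fresh` as soon as the session-end Schema arrives) can be illustrated with a small classifier. This is a hedged sketch rather than the viewer's actual code: `LinkState`, its fields, and `classify` are hypothetical names, and only the four states named in the commit messages above are modeled.

```rust
use std::time::Duration;

/// Connection-health states named in the commit messages above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionHealth {
    NoSchemaYet,
    Fresh,
    Stale,
    SessionEnded,
}

/// Hypothetical summary of what the receiver has observed so far.
#[derive(Debug, Clone, Copy)]
pub struct LinkState {
    pub schema_seen: bool,
    pub session_ended: bool,
    /// Time elapsed since the most recent message of any kind.
    pub since_last_message: Duration,
}

/// Classify the link. `SessionEnded` is checked first so a clean shutdown is
/// reported immediately, even while the last rows are still recent enough to
/// count as fresh.
pub fn classify(state: LinkState, stale_after: Duration) -> ConnectionHealth {
    if state.session_ended {
        ConnectionHealth::SessionEnded
    } else if !state.schema_seen {
        ConnectionHealth::NoSchemaYet
    } else if state.since_last_message > stale_after {
        ConnectionHealth::Stale
    } else {
        ConnectionHealth::Fresh
    }
}
```

With a 2 s threshold, a link whose last message is 10 ms old classifies as `Fresh`, and flipping `session_ended` to `true` on the same state yields `SessionEnded` without waiting for the stale timeout.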
The console could display stale data on a "live" plot for up to one
window after the UI thread paused (e.g. an obscured window on Linux).
Two root causes, both reproduced via SIGSTOP/SIGCONT:

1. The receiver thread's bounded channel used `try_send`, which on full
drops the *newest* incoming message and keeps the oldest 3000, so after
a long pause the viewer is guaranteed stale-first.
2. The per-channel ring buffer evicted on `timestamp − window_seconds`,
so pre-pause samples lingered in the visible window until live
controller time advanced past them.

Fix on the viewer side (no wire-format or controller changes):
- Receiver: drop-oldest semantics on a full channel. New
`OVERWRITTEN_FRAMES` counter distinguishes drop-oldest evictions from
genuine packet loss in `DROPPED_FRAMES`, and is only incremented on
actual displacement (the eviction-then-retry cannot fail under the
single-writer invariant).
- App: stall detection in `drain_messages`, triggered by a wall-clock
gap between drains, or by a drained row whose
`received_at − controller_system_time` exceeds
`staleness_threshold_secs`. On a stall, clear pre-stall samples down to
`tail_keep_secs` and splice a NaN discontinuity sentinel before the
first retained post-stall point. Empty drains no longer advance
`last_drain_at`, so stall detection fires on the first drain after a
pause rather than being masked.
- Sentinel emission deferred to a post-second-pass `apply_pending_stall`
step so catch-up rows can't render after the sentinel, and so a stall
that fires while the view is frozen still emits its sentinel on the
first non-frozen drain after unfreeze.
- New `Recovering` connection-health state: amber dot for
`recovery_settle_secs` after the last stall, then back to Fresh.
- Per-second telemetry tick gains `recv_drops`, `overwritten_frames`,
`wire_drops`, `stalls_detected`, `stale_rows_evicted`.
- Lag accounting skips rows whose `system_time` falls outside
`chrono::DateTime::timestamp_nanos_opt`'s representable range so a
misconfigured clock can't inject a ~56-year spurious lag value.
- Forensic log records *receipt*, not display, so stall-elided rows
still appear in the audit trail.

Also bundled (prior scramble review on this branch):
- Bound freeze-buffer growth and broaden console test coverage.
- Mark `deimos-console` as `publish = false` (internal binary).
- Fold `hootl_reporting` smoke test into `hootl_with_console`.
- Misc doc-string corrections (csv_row width claim, hootl_lifecycle
initial-state name, receiver loop description, dropped_frames / Schema
re-emission notes, stalls_detected tick-line description,
pre-Schema-evicted-rows carve-out).
- `tests/integration_wire.rs` uses `.expect(...)` on dispatcher
consume/terminate so contract violations surface as test failures.

Spec: `improve-console-freshness-on-stall` adds the freshness invariant
and stall-recovery requirement to `realtime-reporting`, archived under
`openspec/changes/archive/2026-04-27-improve-console-freshness-on-stall/`.
The dispatcher CLAUDE.md gains a one-line cross-reference to the
recv-side freshness invariants.

End-to-end verified: SIGSTOP/SIGCONT on the patched binary reports
`stalls_detected=1` post-resume with no false positives during steady
streaming. Higher-rate overflow of the bounded channel was not
reproducible in this environment because the OS UDP buffer drops
backlogs first (`wire_drops`); this is not a regression but an
environment limit on forcing channel overflow.

Verified: cargo test -p deimos -p deimos-console (all pass),
cargo fmt --all -- --check (clean), scripts/validate-branch james/main
(clippy-on-this-branch's-lines clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
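
The drop-oldest behaviour described above, with a separate counter for displaced frames, can be sketched with a plain `VecDeque`. This is a single-threaded stand-in for the receiver's bounded channel, written under assumptions: `DropOldestQueue` and its methods are hypothetical names, and the real viewer's channel type and counters may differ.

```rust
use std::collections::VecDeque;

/// Bounded FIFO that evicts the oldest item when full, so the newest data is
/// always retained. Evictions are counted separately from wire-level loss.
pub struct DropOldestQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
    overwritten: u64,
}

impl<T> DropOldestQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            items: VecDeque::with_capacity(capacity),
            capacity,
            overwritten: 0,
        }
    }

    /// Push never fails: on a full queue the oldest entry is displaced first.
    pub fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
            // Only counted when something was actually displaced.
            self.overwritten += 1;
        }
        self.items.push_back(item);
    }

    /// Take the oldest retained item, if any.
    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    /// Number of items displaced by newer ones so far.
    pub fn overwritten(&self) -> u64 {
        self.overwritten
    }
}
```

Pushing five items into a queue of capacity three leaves the three newest in place and reports two overwritten frames, which is the opposite of the `try_send` behaviour that kept the oldest entries.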
a6a7554 to 9a247ae
Cool! Those fixes sound reasonable. I tried this out again today using the basic.rs example connected to a rev7 DAQ at 100Hz samplerate. Still seeing a gap and some rubberbanding after minimizing and then bringing the console up again. I think the egui app update loop might be pausing when minimized, which makes it so that any logic implemented behind it to better handle buffer overflow doesn't end up running at all. We might have to separate the buffering logic from the display update, which would be good for handling higher data rates in any case - with 1kHz+ samplerate, that'll drive a lot of display updates that aren't visible anyway.

console_rubberbanding.webm

It looks like the only way to get around it is to run the buffer processing on another thread with an update loop that isn't tied to egui's display update. I'll see if I can prototype something along those lines as a proof-of-concept.
Here's a proof-of-concept with the buffer processing on a different thread from the UI update. This does resolve the rubberbanding, and the console is able to stay stable up to a 1kHz data rate. As written, the buffer processing thread uses the same update rate as the UI draw cycle (30Hz), but I think a higher rate would be preferable in order to reduce maximum frame lag to well below human reaction time. Maybe 120Hz buffer processing and configurable UI draw rate up to 60Hz or something along those lines.

console_stable.webm
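
For readers following along, the general shape of moving buffer processing off the UI thread can be sketched as below. This is not the proof-of-concept's code: `Sample`, `spawn_buffer_worker`, and `snapshot` are hypothetical names, and as a simplification the worker here blocks on a standard-library channel instead of polling at a fixed rate as described in the comment above.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical sample type: (sequence number, value).
pub type Sample = (u64, f64);

/// Spawn a worker that owns the receive side and keeps a bounded window of
/// the newest samples. It runs whether or not the UI is repainting.
pub fn spawn_buffer_worker(
    rx: Receiver<Sample>,
    max_len: usize,
) -> (Arc<Mutex<VecDeque<Sample>>>, JoinHandle<()>) {
    assert!(max_len > 0, "window must hold at least one sample");
    let window = Arc::new(Mutex::new(VecDeque::with_capacity(max_len)));
    let worker_window = Arc::clone(&window);
    let handle = thread::spawn(move || {
        // The loop ends when every sender has been dropped.
        for sample in rx {
            let mut w = worker_window.lock().unwrap();
            if w.len() == max_len {
                w.pop_front(); // evict the oldest sample to stay bounded
            }
            w.push_back(sample);
        }
    });
    (window, handle)
}

/// What a UI draw call would do: copy out the current window and release the
/// lock quickly so the worker is not held up by rendering.
pub fn snapshot(window: &Mutex<VecDeque<Sample>>) -> Vec<Sample> {
    window.lock().unwrap().iter().copied().collect()
}
```

Because the worker never waits on a repaint, a minimized window only delays `snapshot` calls; the window itself keeps tracking the newest samples in the meantime.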

Summary
Adds a decision-grade live view of controller signals for test-stand operators, complementing the existing firmware-level auto-abort interlocks. The Grafana path (dispatchers → TimescaleDB → query) is too indirect for judgment-cadence decisions; this gives the control room a wait-free direct view whose contents can be defended as equal to what the controller saw.

Reporting dispatcher (`software/deimos/src/dispatcher/reporting/`)
- `ReportingDispatcher` serializes each per-cycle `Row` onto a UDP multicast transport (default `239.255.0.1:29573`); writes are wait-free and infallible from the control loop (`WouldBlock` increments a drop counter and returns `Ok(())`).
- Wire format is `postcard`-encoded `ReportingMessage::{Schema, Row}` with a leading byte tag. Schema re-emits every 2 s (configurable) so late-joining viewers discover channels within a bounded window.
- `terminate()` emits one final session-end Schema for clean boundary detection.

deimos-console viewer (`software/deimos-console/`)
- Pure-Rust `eframe`/`egui`: no browser, no npm.
- Receiver thread joins the multicast group via `socket2`, buffers `Row` until the first Schema, then renders one `egui_plot` Plot per configured panel with unit-labeled y-axes at 30 Hz repaint.
- Connection-health indicator: `NoSchemaYet` / `Fresh` / `Stale` / `SessionEnded` / `ReceiverDead`, with a configurable staleness threshold (default 2 s).
- `SessionEnded` fires the instant the controller's session-end Schema arrives so operators distinguish clean shutdown from silent stall immediately, without waiting out the stale timeout.
- Per-session forensic CSV records `(viewer_received_at, seq, controller_timestamp, controller_system_time, <channels...>)` so post-run review correlates what was on screen with the underlying sample.

Viewer observability
- Wall-clock rx-lag anchored on `Row.system_time` (the controller's cycle-start `SystemTime::now()`) so the number answers "how old is what the operator sees" across transport + UI drain + any clock skew. A long comment in `process_row` captures why that value sits near one `cycle_duration` on loopback and documents the rejected `epoch_ns + timestamp` alternative.
- Schema lines tagged `new-session` / `re-emission` / `session-end` so controller lifecycle is visible alongside the UI.
- New `chrono` dep (`clock` feature, no default features) for the local `HH:MM:SS.mmm` log prefix that lets viewer lines correlate with the controller's ISO-timestamped tracing output.

Calc unit metadata
- `Calc::get_output_units() -> Vec<Option<String>>` is a required trait method. Every built-in calc declares its output units explicitly.
- `RtdPt100`, `TcKtype` → `Some("K")`.
- `SequenceMachine` → `Some("s")` for `sequence_time_s`, `None` for user-defined data channels.
- `Affine`, `InverseAffine`, `Polynomial`, `Constant`, `Sin` → optional `output_unit` via a `with_output_unit(...)` builder.
- `Butter2` and `Pid` return `vec![None]` with a doc comment noting that passthrough-inheritance from the input channel's unit is the intended eventual behavior, once `CalcOrchestrator` plumbs input units into each calc's `init`.
- `CalcOrchestrator::get_dispatch_units()` mirrors `get_dispatch_names()`. Units flow to dispatchers via the new `ControllerCtx.channel_units` field (parallel to `channel_names`). `Dispatcher::init` signature is unchanged.

Incidental cleanup
- `CsvDispatcher` downgrades the `core_affinity::set_for_current` returned-false warning from `warn!` to `debug!`. Affinity on that path is best-effort (macOS's advisory scheduler always returns false), and the noise obscures real issues at the default log level. Still surfaceable via `RUST_LOG=deimos::dispatcher::csv=debug`.

Test plan
- `cargo test --manifest-path software/deimos/Cargo.toml`: 38 passing
- `cargo test --manifest-path software/deimos-console/Cargo.toml`: 7 passing (5 unit + 2 integration)
- `cargo run --example hootl_reporting --manifest-path software/deimos/Cargo.toml`: exercises reporting dispatcher end-to-end, asserts zero dropped frames
- `cargo run --example hootl_with_console --manifest-path software/deimos/Cargo.toml`: two-terminal run renders live signals with unit-labeled y-axes

Docs
- `software/deimos-console/README.md`: TOML config format and invocation.

Deferred (noted for follow-up)
- `Butter2`/`Pid` passthrough-inheritance of input units: requires `CalcOrchestrator` to pass the assembled channel-units slice into each calc's `init`; tracked as a separate follow-on.
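
To make the calc unit metadata described in this summary concrete, here is a deliberately simplified sketch of a required units method, an optional-unit builder, and a units list that runs parallel to the names list. It is an illustration under assumptions, not the Deimos API: this `Calc` trait is a cut-down stand-in, and the `Affine` fields, the `with_output_unit` signature, `dispatch_names`, and `dispatch_units` are all hypothetical.

```rust
/// Cut-down stand-in for a calc trait: units are a required method, so every
/// implementor has to state them (possibly as `None`).
pub trait Calc {
    fn get_output_names(&self) -> Vec<String>;
    fn get_output_units(&self) -> Vec<Option<String>>;
}

/// Hypothetical affine calc: y = slope * x + offset.
pub struct Affine {
    name: String,
    slope: f64,
    offset: f64,
    output_unit: Option<String>,
}

impl Affine {
    pub fn new(name: &str, slope: f64, offset: f64) -> Self {
        Self { name: name.to_string(), slope, offset, output_unit: None }
    }

    /// Builder-style setter for the optional output unit.
    pub fn with_output_unit(mut self, unit: &str) -> Self {
        self.output_unit = Some(unit.to_string());
        self
    }

    pub fn eval(&self, x: f64) -> f64 {
        self.slope * x + self.offset
    }
}

impl Calc for Affine {
    fn get_output_names(&self) -> Vec<String> {
        vec![self.name.clone()]
    }

    fn get_output_units(&self) -> Vec<Option<String>> {
        vec![self.output_unit.clone()]
    }
}

/// Flatten per-calc names into one channel list.
pub fn dispatch_names(calcs: &[Box<dyn Calc>]) -> Vec<String> {
    calcs.iter().flat_map(|c| c.get_output_names()).collect()
}

/// Flatten per-calc units in the same order, so index i of the units list
/// describes index i of the names list.
pub fn dispatch_units(calcs: &[Box<dyn Calc>]) -> Vec<Option<String>> {
    calcs.iter().flat_map(|c| c.get_output_units()).collect()
}
```

A calc built with `Affine::new("volts", 2.0, 1.0).with_output_unit("V")` reports `Some("V")`, while one built without the builder call reports `None`, and both lists stay the same length because they are produced by the same iteration order.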