fix(repeater-liveness): address issue #755 review feedback#798
Closed
efiten wants to merge 43 commits intoKpa-clawbot:masterfrom
Closed
fix(repeater-liveness): address issue #755 review feedback#798efiten wants to merge 43 commits intoKpa-clawbot:masterfrom
efiten wants to merge 43 commits intoKpa-clawbot:masterfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Async timers (setInterval/setTimeout) started by animateHop() can fire after destroy() has nulled animLayer and removed DOM elements. This caused three console errors on the Live page when navigating away mid- animation. Guards added at each async callback site. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…a-clawbot#475) bufferPacket was stamping all live WS packets with Date.now() (receive time). Packets arriving in the same batch all got identical timestamps, making the message history show the same "Xs ago" for every entry. Use pkt.timestamp || pkt.created_at (mirroring dbPacketToLive) so each packet reflects its actual origination time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…a-clawbot#284) Two bugs remained after prior fix attempts: 1. buildExport() diffed home fields against DEFAULTS instead of the server's original config (_SITE_CONFIG_ORIGINAL_HOME). If the server provided custom steps and the user only edited the checklist, those server steps were captured in localStorage — blocking future server updates from being visible after reload. 2. autoSave() synced branding to SITE_CONFIG but not home, so the live preview (hashchange re-render) read stale SITE_CONFIG.home and didn't reflect the user's edits until a full page reload. Fix: diff home fields against _SITE_CONFIG_ORIGINAL_HOME in buildExport, and sync state.home back to SITE_CONFIG.home in autoSave using the same original snapshot as base to prevent cumulative mutations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve (Kpa-clawbot#288) P0 items from issue Kpa-clawbot#288: - Override indicators: small ⬤ dot next to any setting that differs from the server's configured value (not just DEFAULTS). Badge on each tab showing the count of overridden settings in that section. - "Saved!" toast: brief green pill animates in the panel after every autoSave, replacing the removed manual save button. - Remove "Save as my theme" button: autoSave already handles this; the Export tab now shows an override count summary instead. - Full export: Download/Copy now exports all current settings (not just diffs), so admins get a complete theme.json. localStorage still stores only user overrides. - Per-setting reset uses server values: Reset buttons for colors/branding restore to server config (theme.json / config.json), not DEFAULTS. - _serverState snapshot: computed once in initState() from DEFAULTS + server config (no localStorage) — single source of truth for all override comparisons. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rideIndicators Dots and badges were only updated on full render() (tab switch). Now autoSave() calls refreshOverrideIndicators() which updates tab badges and field dots in-place without a full re-render, so indicators appear immediately when a setting is changed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…a-clawbot#464) When groupByHash=true, each group only carries the representative observer_id. Client-side filter was checking only that field, silently dropping groups seen by the selected observer but with a different representative. Fix: pass observer param to the server so filterPackets/buildGroupedWhere do the correct "any observation matches" check. Skip the broken client-side observer filter in grouped mode. Extend both db and store paths to support comma-separated observer IDs (multi-select UI). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Kpa-clawbot#463) Two complementary fixes: 1. Ingestor (cmd/ingestor/main.go): move the status topic handler BEFORE the JSON parse. Previously a non-JSON or malformed status payload caused an early return, skipping UpsertObserver entirely and leaving last_seen stale. Now last_seen is always updated when a status message arrives, regardless of payload format. 2. Frontend (public/observers.js): replace Date.now() with a server-clock- corrected reference in healthStatus(). The API response already includes server_time; we now capture it at fetch time and use it (plus elapsed ms) as the reference point. This eliminates client/server clock skew from the online/stale/offline calculation. Adds TestHandleMessageStatusNonJSONPayload to cover the non-JSON case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
estimatedMemoryMB() used a hardcoded formula (packets × 5120 + obs × 500) that ignored distHops (1.5M records), distPaths (93K records), and spIndex (4.1M entries). This caused the formula to report ~1.2GB while actual heap was ~5GB, so the memory-based eviction check barely fired and the process grew until OOM-killed. Replace estimatedMemoryMB() with runtime.ReadMemStats so all data structures are counted. Replace the inner eviction simulation loop (which re-used the wrong formula) with a proportional calculation: if heap is N× over budget, evict enough packets to keep (1/N × 0.9) of the current count, with a 10% buffer so the next ingest cycle doesn't immediately re-trigger. Add memoryEstimator func() float64 field to PacketStore for test injection, so eviction tests stay deterministic without spawning goroutines or relying on process heap size. Addresses the "Packet store estimated memory" item in epic Kpa-clawbot#559. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Direct/zero-hop adverts always use pathByte=0x00 regardless of the node's actual hash size setting. When hashCount==0, the hash size bits are meaningless — display them as unknown rather than "1 byte". - packets.js: suppress Hash Size card entry and Advertised Hash Size field row when hashCount==0; update Path Length description to say "direct advert" instead of showing spurious hash_size=1 - nodes.js: skip per-advert hash size badge for zero-hop packets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the server decoder test suite — add TestZeroHopTransportDirectHashSize and TestZeroHopTransportDirectHashSizeWithNonZeroUpperBits to the ingestor, closing the symmetry gap flagged in PR Kpa-clawbot#653 review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…awbot#662) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es (Kpa-clawbot#662) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d GetNodeHealth test (Kpa-clawbot#662)
…detail pane (Kpa-clawbot#662) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- removeFromRelayTimeIndex: use in-place append instead of extra allocation - roles.js: revert window.HEALTH_THRESHOLDS to bare HEALTH_THRESHOLDS; fix test to expose bare global in VM context - docs: remove plan/spec process artifacts from repo - relayMetrics: document last_relayed behavior for entries older than 24h - addTxToRelayTimeIndex: add comment explaining FirstSeen parse rationale - nodes.js: show last relayed time in idle repeater explanation when available - nodes.js: fix relay stats template indentation in both detail pane blocks - TestRelayTimesWiredIntoIngest: add relayKeys <= hopKeys bounds assertion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ot#662) byPathHop was storing resolved pubkeys with original case while relayTimes and routes.go lookups (addCandidates(lowerPK)) both use lowercase, breaking the subset invariant and causing missed lookups for uppercase pubkeys. Lowercase consistently in both add and remove. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Kpa-clawbot#662) backfillResolvedPathsAsync was updating tx.ResolvedPath via pickBestObservation but never updating relayTimes. On a fresh deploy (or any instance where resolved_path was NULL in the DB), all repeaters showed as Idle despite active relay traffic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ats (Kpa-clawbot#662) Two bugs: 1. addTxToRelayTimeIndex only used tx.ResolvedPath (best observation). If the best observer received a packet directly while another observer received it via a repeater, that repeater's relay activity was invisible. Fix: scan ALL observations' resolved paths. Insert is now idempotent so the function can be called multiple times safely. removeFromRelayTimeIndex updated symmetrically to avoid orphaned entries on eviction. Also add an else branch in pollAndMerge so new observations on existing packets (path unchanged) still update relay times immediately. 2. renderDetail/loadFullNode called getStatusInfo(n) where n = data.node (from /api/nodes/{pubkey}, no relay stats). Relay counts and status were always undefined / Idle. Fix: assign n.stats = stats (from the health endpoint) before calling getStatusInfo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pa-clawbot#662) New 'St.' column after Last Seen shows 🟢/🟡/⚪ for each node. Hovering shows full context: 'Relaying — Relayed 2592 packets in last 24h, last 26s ago' (or 'Idle — Alive but no relay traffic...', 'Stale — Not heard for...'). Also replaces the manual status/class computation in renderRows with a single getStatusInfo() call for consistency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The nodes list was calling getStatusInfo(n) on nodes that had no stats (the list endpoint only returned DB data). This caused all repeaters to show as yellow/idle in the overview even when actively relaying. Fix: handleNodes in routes.go now enriches each repeater node with relay_count_1h, relay_count_24h, and last_relayed from the in-memory relayTimes index, matching exactly what the single-node health endpoint returns. The status filter also updated to use getStatusInfo() so relay state is considered when filtering active/stale. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Conflicts: # cmd/ingestor/decoder_test.go
buildPathHopIndex was indexing relay timestamps for all historical packets. On a live instance with 2.8M observations this allocated ~14 GB causing the OOM killer to terminate the server process. relayMetrics only queries 1h and 24h windows so any timestamp older than 24h is useless in the index. Skip addTxToRelayTimeIndex for packets older than 24h during the initial build. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vent OOM Loading all 366K transmissions and 2.8M observations at startup caused the server to consume 14+ GB and get OOM-killed on the live instance. The Load() SQL now adds a WHERE clause filtering to the configured retentionHours window when set. With retentionHours=168 (7 days), only ~7 days of packets are loaded into memory — sufficient for all in-memory queries. Historical data remains in SQLite and is available via DB queries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Must-fix items: - addTxToRelayTimeIndex now accepts pre-parsed millis to avoid 30K+ RFC3339 re-parses during bulk startup rebuild under the write lock - removeFromRelayTimeIndex uses copy+reslice instead of append to avoid per-eviction heap allocation proportional to path length - handleNodes snapshots relay times under RLock then releases before enrichment, keeping the lock scope to a quick map copy rather than the full node loop - Suppress last_relayed in API response when both counts are zero across all three endpoints (handleNodes, bulk health, single-node health) Should-fix items: - Add NOTE comment on HEALTH_THRESHOLDS mirror in test-repeater-liveness.js - Flatten multiline repeater template block into per-row inline conditionals in both table and dl views to match avgSnr/avgHops style Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Author
|
Closing — fixes should go on the existing PR #755 branch instead. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
addTxToRelayTimeIndexnow accepts pre-parsed millis — eliminates 30K+ redundant RFC3339 parses during startup rebuild under the write lockremoveFromRelayTimeIndexusescopy+resliceinstead ofappend— no per-eviction heap allocationhandleNodessnapshots relay times underRLockthen releases before enrichment, keeping lock held for a quick map copy onlylast_relayedsuppressed when both counts are zero across all three endpointsHEALTH_THRESHOLDSmirror in test file; repeater template rows flattened to inline conditionalsTest plan
go test ./...passes (excluding pre-existingTestMultiByteCapability_*failures)node test-repeater-liveness.js— 10 passed, 0 failed/api/nodesrepeater nodes:last_relayedabsent when both counts are 0, present when counts > 0🤖 Generated with Claude Code