Skip to content

feat: scope stats (#899) + multibyte capability map overlay (#903)#904

Closed
efiten wants to merge 32 commits intoKpa-clawbot:masterfrom
efiten:feat/scope-stats-899
Closed

feat: scope stats (#899) + multibyte capability map overlay (#903)#904
efiten wants to merge 32 commits intoKpa-clawbot:masterfrom
efiten:feat/scope-stats-899

Conversation

@efiten
Copy link
Copy Markdown
Contributor

@efiten efiten commented Apr 24, 2026

Summary

#899 — Scoped/unscoped transport-route statistics

  • Adds hashRegions config to the ingestor — region names whose HMAC-SHA256 keys are derived at startup, mirroring the existing hashChannels pattern
  • At ingest, transport-route packets with Code1 ≠ 0000 are HMAC-matched against configured region keys; result stored in new scope_name column
  • DB migration scope_name_v1 adds the column; scope matching runs only on new packets (no backfill goroutine)
  • New GET /api/scope-stats?window= endpoint (1h/24h/7d, 30s TTL) returning transport totals, scoped/unscoped counts, per-region breakdown, and time-series
  • New Scopes tab in the Analytics page with summary cards, per-region table, and two-line SVG chart

#903 — Multibyte capability map overlay

  • Adds multibyte_sup / multibyte_evidence columns to nodes + inactive_nodes via ingestor migration (multibyte_sup_v1)
  • Server detects the columns at startup and enriches all /api/nodes responses: multibyte_sup = 0/1/2, multibyte_evidence = "advert" / "path" / null
  • persistMultiByteCapability() upserts status from the existing analytics cycle (~15s) — no startup scan, no per-packet cost, no downgrade possible
  • Map controls get a "Show multibyte capability" toggle (under Byte Size, persisted in localStorage); repeater markers colored: solid green = confirmed, light green dashed = suspected, dimmed = unknown
  • Popup shows evidence label when overlay is active

fix — OOM crash on cold start with large DB

Load() previously loaded all transmissions from the DB regardless of retentionHours, so buildSubpathIndex() processed the full DB history on every startup. On a DB with 277K paths this produces 13.5M subpath index entries and OOM-kills the process before it ever starts listening — causing a supervisord crash loop.

Fix: apply the same retentionHours cutoff to Load()'s SQL that EvictStale() already uses at runtime. Startup now builds indexes only over the configured retention window, making startup time proportional to recent activity rather than total DB history.

Test plan

  • Run cd cmd/ingestor && go test ./... — covers schema migrations for both scope_name_v1 and multibyte_sup_v1
  • Run cd cmd/server && go test ./... — covers scope stats, node enrichment, persistMultiByteCapability, and retention-filtered Load()
  • Add "hashRegions": ["#yourregion"] to ingestor config, restart, verify scope_name column gets populated
  • Open Analytics → Scopes tab, verify summary cards and window selector work
  • Open Map, toggle "Show multibyte capability" — repeater markers change color, toggle state survives reload
  • Click a repeater with the overlay active — popup shows confirmed/suspected/not detected
  • Verify non-repeater nodes are unaffected by the multibyte toggle
  • Restart server with a large DB — verify it reaches "listening" within the retention window load time

Closes #899
Closes #903

🤖 Generated with Claude Code

@efiten
Copy link
Copy Markdown
Contributor Author

efiten commented Apr 25, 2026

Code review

Found 1 issue:

  1. Cache buster not bumped for analytics.js and packets.js (global CLAUDE.md says "Cache busters must be bumped in the same commit as any frontend file change")

Both frontend files are modified in this PR but public/index.html is absent from the diff — the ?v= cache busters for these scripts were not updated.

https://github.com/Kpa-clawbot/meshcore-analyzer/blob/fb3db67520bc66c61980a0c481223d6d0943da3f/public/analytics.js#L91-L93

https://github.com/Kpa-clawbot/meshcore-analyzer/blob/fb3db67520bc66c61980a0c481223d6d0943da3f/public/packets.js#L2014-L2016

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

efiten and others added 25 commits April 25, 2026 23:08
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…edback)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#899)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…clawbot#899)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Register GET /api/scope-stats route with per-window 30s cache; fix
COALESCE on SUM in GetScopeStats summary query to handle empty result sets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…clawbot#899)

Add scope_name to transmissionBaseSQL/scanTransmissionRow when the column
exists, and render a Scope row in the detail-meta dl for scoped transport
packets (matched region = "#name", unmatched = "unknown scope").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…deploy

The backfill processed all historical transport-route rows in a tight loop
at startup with no throttling. scope_name is populated at ingest for new
packets going forward; historical rows remain NULL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ot#903)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@efiten efiten force-pushed the feat/scope-stats-899 branch from fb3db67 to 59101ec Compare April 26, 2026 07:21
@efiten efiten changed the title feat(scope-stats): add scoped/unscoped transport-route statistics (#899) feat: scope stats (#899) + multibyte capability map overlay (#903) Apr 26, 2026
…old start

Load() previously loaded all transmissions regardless of retentionHours,
causing buildSubpathIndex() to process the full DB history on every startup.
On a DB with 277K paths this produces 13.5M subpath index entries, OOM-killing
the process before it ever starts listening.

Apply the same retentionHours cutoff to Load()'s SQL that Evict() already uses
at runtime. Startup now builds indexes only over the retention window, matching
steady-state behaviour and keeping index size proportional to recent activity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
efiten and others added 6 commits April 26, 2026 11:36
The server opens SQLite with mode=ro so persistMultiByteCapability()
silently failed on every UPDATE, leaving all nodes at multibyte_sup=0.

Replace DB persist with an in-memory cache: GetHashSizes() stores the
computeMultiByteCapability() result in mbCapSnapshot (under cacheMu),
GetMultibyteCapMap() exposes a pubkey→entry snapshot, and routes enrich
node responses from that map alongside EnrichNodeWithHashSize — the same
pattern already used for hash_size. Data refreshes each analytics cycle
(~15s); no DB writes needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, mbCapSnapshot stays empty until the first request to
/api/analytics/hash-sizes, leaving all nodes at multibyte_sup=0 on
the map overlay until a user visits the analytics page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loadNodes() is called on every incoming ADVERT packet, which called
renderMarkers() and destroyed any open popup. Track popup visibility
via Leaflet's popupopen/popupclose map events; skip renderMarkers()
during data refresh while a popup is open so the user can read it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…yte suspected detection

A node whose own adverts confirm hash_size=1 cannot forward multibyte
packets. If it appeared as a 'suspected' hop it was due to a prefix
collision (1-byte hash prefix shared with a multibyte-capable node), not
actual multibyte capability. Mark it 'unknown' instead of 'suspected'.

Adds TestMultiByteCapability_SuspectedGuard_OwnHashSize1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A node can send hash_size=1 adverts but still have multibyte firmware
and forward multibyte packets. The guard incorrectly suppressed
legitimate suspected classifications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ction

Pre-Kpa-clawbot#886 ingestor data stored path bytes individually (1-byte entries per
hop) even for hs=2 packets. This caused 1-byte node prefixes to match
single-byte path fragments from hs=2 packets, incorrectly marking nodes
as suspected.

Fix: require hop string length (len(pfx)/2) to equal the packet hash_size.
A 1-byte hop in an hs=2 packet is stale/malformed data and must not
trigger a suspected classification.

Adds TestMultiByteCapability_HopLengthMismatch; updates PrefixCollision
test to use proper 2-byte hops instead of the now-filtered 1-byte case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner

@Kpa-clawbot Kpa-clawbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent Review — PR #904

This PR bundles three orthogonal changes (one explicitly admitted in the title, one not in the title at all). The combined diff is 3,784 added / 117 removed across 27 files, of which 2,576 lines are net-new design/plan markdown under docs/superpowers/ that nobody outside the implementer will read. Stripping that internal scratch-pad still leaves ~1,200 lines of production code touching the ingestor, server, store, routes, and frontend across two unrelated features and a startup-OOM hotfix.

This is more PR than reviewable. Below are the structural concerns plus per-feature must-fix items.

Structural concerns (P0 — must address before merge)

1. Scope explosion / unrelated bundling

The PR couples:

  • #899 — Scope stats (new ingestor config, new column, migration, ingest-side HMAC matching, new API endpoint, new analytics tab)
  • #903 — Multibyte capability map overlay (new ingestor columns, migration, server enrichment, new map toggle, new evidence labels)
  • An OOM-on-cold-start hotfix to Load() not mentioned in the PR title

These should be three separate PRs. Each is independently reviewable, independently revertable, independently bisectable. As bundled, you cannot bisect a regression to one feature without reverting the whole thing — and the OOM hotfix in particular is the kind of change that should land standalone with a clear paper trail (it's the kind of change that gets cherry-picked back to a release branch later).

Must-fix: split into PRs #904a (OOM hotfix), #904b (scope stats), #904c (multibyte overlay). The OOM fix can land today.

2. Multibyte-capability code duplication / not reusing existing computeMultiByteCapability

The repo already has PacketStore.computeMultiByteCapability() in cmd/server/store.go:6161 with full confirmed/suspected/unknown logic, evidence labels, and prefix-collision handling — including the recent #886-aware hop-length-must-match-hash_size guard.

This PR adds:

  • New multibyte_sup and multibyte_evidence columns to nodes and inactive_nodes on the ingestor side
  • A new ingestor migration multibyte_sup_v1
  • A new server-side persistMultiByteCapability() running on the ~15s analytics cycle
  • A new GetMultibyteCapMap() snapshot accessor for routes
  • A new enrichNodeWithMultibyte() shim

…all of which appears to shadow the existing code path rather than extend it. The existing MultiByteCapEntry struct + status/evidence labels are exactly what the overlay needs to render. Why is there a second persisted column instead of just exposing the existing computed field through the existing /api/nodes enrichment hook?

Concrete questions for the author:

  • What does the persisted multibyte_sup column give us that computeMultiByteCapability() doesn't already deliver per call (cached for 15s)?
  • The schema comment says "no downgrade possible" — but the existing in-memory computation is downgrade-aware (a confirmed 2-byte advert stays confirmed). Is the persisted column intended to survive DB-only restart-state? If so, the migration needs to also seed from the existing computation; otherwise on first startup the column is meaningless until the analytics cycle runs.
  • The frontend already reads hash_size and hash_size_inconsistent from /api/nodes. Why does multibyte_sup need a separate field instead of being derived from those + the existing capability snapshot?

Must-fix: justify why this isn't a 1-line frontend wire-up of the existing MultiByteCapEntry snapshot. If there's a real reason (e.g., persistence across restart before first analytics cycle completes), document it. Otherwise: delete the new column + migration and consume the existing capability snapshot.

3. Test coverage — specific gaps

The PR adds tests for:

  • TestGetMultibyteCapMap_Confirmed / _EmptyWhenNoSnapshot (snapshot mechanics)
  • TestEnrichNodeWithMultibyte_Confirmed / _Suspected / _ZeroEntryNoChange (helper unit tests)
  • TestMultiByteCapability_HopLengthMismatch (good — covers the #886 guard)
  • TestLoadRegionKeys (string trimming)
  • Schema migration smoke tests

What is not tested:

  • HMAC scope matching end-to-end — there is TestLoadRegionKeys but no test that constructs a transport-route packet with a known region key, runs it through matchScope(), and asserts scope_name is set on the persisted row. The whole feature hinges on this matching being correct, and HMAC truncation bugs are easy to introduce.
  • GET /api/scope-stats shape — no test that hits the route and asserts the returned JSON structure (totals, scoped/unscoped split, per-region breakdown, time-series shape).
  • 30s TTL invalidation — no test that the cache expires.
  • Load() retention cutoff — claims to fix the OOM, but there is no test that verifies a DB containing pre-cutoff transmissions yields a Load() that ignores them. This is the single most important behavior change in the PR (it's a startup-correctness fix); it must have a test.
  • Migration idempotency — what happens on the second startup after the column already exists? (The check exists in code, but a test that runs migration twice would be cheap insurance.)
  • Frontend Scopes-tab rendering — 128 added lines in analytics.js, zero tests.
  • Map overlay toggle persistence — claimed in PR body ("survives reload"), no test.

Must-fix: add the four backend tests above (HMAC end-to-end, route shape, TTL, retention). Frontend tests would be nice but the existing repo doesn't have great patterns for that, so accept as follow-up if not.

4. Documentation

docs/superpowers/plans/*.md and docs/superpowers/specs/*.md (2,576 lines of design notes) are author scratch-pad files, not user docs. They don't belong in docs/ at all — they belong as PR description content, GitHub Discussion artifacts, or in a separate internal/ directory.

What's missing:

  • No docs/user-guide/configuration.md update for the new hashRegions config field — operators cannot enable the feature without reading the source.
  • No docs/user-guide/map.md update for the new "Show multibyte capability" toggle.
  • No docs/user-guide/analytics.md update for the new Scopes tab.
  • No release notes entry.
  • The docs/api-spec.md change is +56 lines — verify it covers both the new /api/scope-stats endpoint AND the new fields on /api/nodes.

Must-fix:

  • Move docs/superpowers/** files out of the public docs tree (or into internal/specs/ if they must be retained — but really, they belong attached to the PR or in a wiki).
  • Update docs/user-guide/configuration.md, map.md, analytics.md to cover all three user-facing surfaces.

Per-feature must-fix items

Scope stats (#899)

  • HMAC truncation correctness: code := uint16(hmacBytes[0]) | uint16(hmacBytes[1])<<8 — verify endianness matches firmware. Cite the firmware source line in the PR body.
  • Region key empty/whitespace filteringTestLoadRegionKeys covers it, good.
  • No backfill goroutine is documented as intentional, but operators with existing DBs will see zero data in the Scopes tab until 24h of new packets accumulate. At minimum, the Analytics → Scopes UI should display "No scoped data yet — feature applies only to packets received after enabling hashRegions" when the totals are empty.

Multibyte overlay (#903)

  • Beyond the duplication concern (P0 #2 above), the visual encoding solid green = confirmed, light green dashed = suspected, dimmed = unknown needs a CSS-variable check (per AGENTS.md, all colors via CSS variables). Verify public/map.js doesn't hardcode hex values.
  • The popup "evidence label" is marker-state-dependent. What does the popup say when the overlay is OFF? Does the new field still render? It should be hidden when the overlay isn't active to avoid leaking implementation detail to non-power-users.

OOM hotfix (Load() retention)

  • The fix shape is right (mirror EvictStale's cutoff), but verify the SQL is safe against retentionHours == 0 (no retention configured). Looking at the diff, it is — guarded by if s.retentionHours > 0. Good.
  • Verify the OOM-reproducer DB (~277K paths) actually starts post-fix. The PR body claims it does but no proof attached. A regression test that constructs a synthetic DB with N pre-cutoff + M post-cutoff transmissions and asserts Load() reports M loaded would lock this in.

Summary

Category Count
Must-fix structural 4
Must-fix per-feature 5
Documentation gaps 5

This PR cannot merge as-is. The right path forward is split into three PRs, land the OOM hotfix immediately, and re-review the two features independently — at which point the multibyte feature should also be examined for whether it needs new code at all.

@efiten
Copy link
Copy Markdown
Contributor Author

efiten commented Apr 27, 2026

@efiten efiten closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: add a new stats page for scopes Feature: map that shows if repeater supports 2bytes

2 participants