feat: clock skew classifier redesign — default-detection model by Kpa-clawbot · Pull Request #907 · Kpa-clawbot/CoreScope

Kpa-clawbot · 2026-04-24T23:29:51Z

Clock Skew Classifier Redesign

Replaces the windowed-median / hysteresis / good-fraction classifier with a simple default-detection model based on known firmware epochs.

Spec: docs/clock-skew-redesign.md

New 5-tier severity model

Tier	Condition	Color
Default	`advert_ts` within `[epoch, epoch + 2y]` of any known firmware default	Gray
OK	`	skew
Degrading	`15s <	skew
Degraded	`60s <	skew
Wrong	`	skew

What changed

Backend classifier (cmd/server/clock_skew.go): dropped no_clock, severe, warn, absurd, bimodal_clock severities and all windowing/hysteresis logic. New isDefaultEpoch() checks advert timestamp against known firmware defaults [0, 1609459200, 1672531200, 1715770351]. Per-node severity = classification of most recent advert. Dropped recentMedianSkewSec, goodFraction, recentBadSampleCount, recentSampleCount from API. Added defaultEpoch field.
Backend tests (cmd/server/clock_skew_test.go): new tests for all tiers, boundary values, default epoch detection, beyond-uptime-cap fallthrough. Observer calibration tests preserved.
Frontend badges (public/roles.js, public/style.css, public/analytics.js, public/nodes.js): updated severity labels, CSS classes, filter tabs, and badge rendering. Removed bimodal/no_clock-specific UI code.
Frontend explainer (public/nodes.js): per-tier plain-English reason line in node clock card.

Go test results

ok  github.com/corescope/server  19.364s

References

Supersedes PR fix: bimodal clock hysteresis — don't flip to 'No Clock' on transient bad bursts #894
Addresses parts of feat: bimodal_clock severity — surface flaky-RTC nodes instead of hiding as 'No Clock' #845 (bimodal pattern → recognized as firmware default)
References Feature: Clock skew detection for nodes & observers #690, bug: Clock Analysis uses Median values so corrected clocks aren't reflected in reports leading to anomolies in the data #789

…epoch test - Remove stale 'recentMedianSkewSec' reference in nodes.js comment - Add TestIsDefault_OverlappingWindowsPicksLargest covering epoch selection when default ranges overlap

Kpa-clawbot

Independent Adversarial Review

Reviewed the full diff, spec doc, and file context for each changed file. Applied lenses: Frankenstein check, duplicate logic, spec drift, edge cases, API compat, test coverage, performance.

Must-fix (4 items — requesting changes)

1. extractTimestamp conflates "no timestamp" with epoch-0 — false default classification
cmd/server/clock_skew.go:247 — extractTimestamp returns 0 both when no timestamp field exists AND when the timestamp is literally 0. After the advertTS < 0 filter change (line 216), advertTS == 0 now passes through to isDefaultEpoch, which matches epoch 0 and classifies as default. Any ADVERT packet with a missing or unparseable timestamp will be incorrectly treated as an epoch-0 node.

Fix: Change extractTimestamp to return -1 for "not found" (or use a (int64, bool) return), and keep the advertTS < 0 guard in collectSamples. Alternatively, keep the old advertTS <= 0 filter and accept that actual epoch-0 nodes need advertTS >= 1 to be detected — but this conflicts with the spec's explicit "epoch 0 → default" requirement.

2. Stale CSS row classes .clock-fleet-row--warning and .clock-fleet-row--critical
public/style.css:2307-2308 — The old severity names warning and critical were replaced everywhere except these two CSS row highlight rules. They should be .clock-fleet-row--degrading and .clock-fleet-row--degraded respectively. The analytics table in analytics.js:3516 builds row classes as clock-fleet-row--<severity>, so rows with degrading/degraded severity get no background highlight (the CSS rules target the old names).

Fix: Rename .clock-fleet-row--warning → .clock-fleet-row--degrading and .clock-fleet-row--critical → .clock-fleet-row--degraded.

3. computeNodeSkew classifies using last-appended observer's skew, not most-recent observation
cmd/server/clock_skew.go:371 — lastCorrectedSkew := cs[len(cs)-1].skew takes the last element of the cs slice for a hash, which is the last observer appended (by sample iteration order), not necessarily the most recent observation. The spec says "most recent observation across all observers." The per-node path in getNodeClockSkewLocked correctly picks by lastObsTS, but the per-hash intermediate NodeClockSkew may have incorrect severity if observers arrive non-chronologically.

Fix: Pick the skew from the cs element with the largest observedTS, or sort cs by observedTS before indexing.

4. No integration test for advertTS == 0 through PacketStore.GetNodeClockSkew
cmd/server/clock_skew_test.go — TestIsDefaultEpoch_Boundaries tests isDefaultEpoch(0) in isolation, but no integration test puts an ADVERT with timestamp: 0 through the full PacketStore path. This is critical given bug #1 — need a test that confirms real epoch-0 adverts classify correctly AND that missing-timestamp packets don't false-positive.

Out-of-scope (pre-existing / follow-up)

--status-red CSS variable: skew-badge--wrong uses var(--status-red) — verify it exists in :root. Not introduced by this PR.
Breaking API fields removal: recentMedianSkewSec, goodFraction, recentBadSampleCount, recentSampleCount silently disappear. Consider a "Breaking changes" section in the PR description for external consumers.
isDefaultEpoch O(n) loop: Negligible at 4 epochs, note for future if the list grows.

Distinguishes 'no timestamp field' (returns -1) from real epoch-0 (returns 0). Adds jsonNumberOk helper that returns (value, bool). The collectSamples guard 'advertTS < 0' correctly filters missing timestamps while allowing epoch-0 through to isDefaultEpoch. Updates TestExtractTimestamp to verify both cases. Review item #1 on PR #907.

.clock-fleet-row--warning → .clock-fleet-row--degrading .clock-fleet-row--critical → .clock-fleet-row--degraded The JS in analytics.js builds classes as 'clock-fleet-row--<severity>' so the CSS must match the actual severity strings. Review item #2 on PR #907.

Uses max observedTS instead of last-appended slice element to determine the most recent skew sample per hash. Consolidates the latestObsTS and anyCal loop into a single pass. Review item #3 on PR #907.

TestGetNodeClockSkew_EpochZeroAdvert: verifies advert with timestamp==0 flows through PacketStore and classifies as severity=default, epoch=0. TestGetNodeClockSkew_MissingTimestamp: verifies advert with no timestamp field is skipped (extractTimestamp returns -1, filtered by collectSamples). Review item #4 on PR #907.

Kpa-clawbot · 2026-04-24T23:43:44Z

Review fixes addressed

All 4 must-fix items from the adversarial review:

✅ Fix 1 — extractTimestamp sentinel for missing timestamp — d4b1aa4
Returns timestampMissing (-1) instead of 0 when no timestamp field exists. Added jsonNumberOk helper. Epoch-0 (ts == 0) now correctly flows through to isDefaultEpoch.

✅ Fix 2 — CSS fleet row class rename — c46a60f
.clock-fleet-row--warning → .clock-fleet-row--degrading, .clock-fleet-row--critical → .clock-fleet-row--degraded. Already had --wrong and --default variants.

✅ Fix 3 — computeNodeSkew uses chronologically-latest — 86a4403
Picks skew from sample with max observedTS instead of last-appended slice element. Consolidated the loop.

✅ Fix 4 — Integration tests for epoch-0 and missing timestamp — 3cd7186
TestGetNodeClockSkew_EpochZeroAdvert: timestamp=0 → severity=default, epoch=0.
TestGetNodeClockSkew_MissingTimestamp: no timestamp field → skipped (nil result).

All tests pass: go test ./... → ok

Kpa-clawbot

🎓 Dijkstra Review — The Formalist

"Simplicity is prerequisite for reliability."

This redesign is a rare pleasure: a complex state machine replaced by a clean, total function with well-separated concerns. The spec is rigorous, the production data analysis is thorough, and the implementation follows the spec faithfully. I shall nonetheless hold it to the standard it deserves.

Invariants & Mutual Exclusivity ✅

The 5-tier severity model is mutually exclusive by construction: classifySkew checks default-epoch membership first (preempting all bands), then cascades through <= boundaries in a switch. No input can satisfy two branches. The tiers partition the input space completely — every (advertTS, absCorrectedSkewSec) pair maps to exactly one severity.

State Space ✅

Can a node be both "default" and "wrong"? No. isDefaultEpoch runs first; if it matches, severity is SkewDefault regardless of skew magnitude. The test TestClassify_Default_AllKnownEpochs correctly passes skew=999999 to verify this preemption.

Separation of Concerns ✅

Default detection operates on raw advertTS — independent of observer calibration. Band classification operates on absCorrectedSkewSec — the calibrated value. These are orthogonal: observer offsets (seconds-to-minutes scale) cannot move a raw advertTS from 2024 into 2026. Phase 2 calibration and default detection cannot interfere with each other. Clean.

Type Safety ✅

SkewSeverity is a named string type with named constants — idiomatic Go. The classifier only returns values from the constant set. Frontend uses SKEW_SEVERITY_COLORS / SKEW_SEVERITY_LABELS maps as the authoritative value registry; unknown severities degrade gracefully (unstyled badge).

Totality & Edge Cases ✅ (with one exception)

Negative advertTS: No epoch matches (all epochs ≥ 0), falls through to band classification. Safe.
advertTS = 0: Matches epoch-0 → default. Tested (TestGetNodeClockSkew_EpochZeroAdvert).
Missing timestamp: extractTimestamp returns timestampMissing (-1), collectSamples filters advertTS < 0. Clean sentinel. Tested (TestGetNodeClockSkew_MissingTimestamp).
NaN skew: All <= comparisons return false → falls through to SkewWrong. Correct behavior, though implicit.
+Inf skew: Same — SkewWrong. Correct.
Overlapping default-epoch windows: isDefaultEpoch picks the largest epoch ≤ advertTS. Tested (TestIsDefault_OverlappingWindowsPicksLargest).

Observer Calibration Orthogonality ✅

calibrateObservers is unchanged, computing offsets from multi-observer packet medians. computeNodeSkew applies offsets to raw skew before passing to classifySkew. Default detection reads only raw advertTS. These are cleanly layered — calibration cannot cause a false default, and default detection cannot corrupt calibration.

🔴 Must-Fix (1)

classifySkew does not enforce its own precondition — cmd/server/clock_skew.go:66

The parameter is named absCorrectedSkewSec (implying non-negative), but the function body does not enforce this. If a future caller passes a raw (possibly negative) skew without math.Abs():

classifySkew(advTS, -86400)  // -86400 <= 15 is TRUE → SkewOK ❌

A node 1 day behind would be classified as OK. Both current call sites correctly pass math.Abs(...), so this is latent — but the function violates a core principle: enforce preconditions at the boundary, do not rely on callers to uphold implicit contracts.

Fix (one line):

func classifySkew(advertTS int64, absCorrectedSkewSec float64) (SkewSeverity, int64) {
    absCorrectedSkewSec = math.Abs(absCorrectedSkewSec)  // ← add this
    if ok, epoch := isDefaultEpoch(advertTS); ok {

Add a test:

func TestClassify_NegativeSkewTreatedAsAbsolute(t *testing.T) {
    advTS := int64(1900000000)
    sev, _ := classifySkew(advTS, -86400)
    if sev != SkewWrong {
        t.Errorf("classifySkew(advTS, -86400) = %v, want wrong", sev)
    }
}

Out-of-scope (pre-existing, for follow-up)

jsonNumber (the non-Ok variant) still exists and returns 0 for missing keys — indistinguishable from a real zero. Pre-existing; not introduced by this PR.

Verdict

1 must-fix (cmd/server/clock_skew.go:66 — defensive math.Abs in classifySkew). Once addressed: zero structural concerns — the classifier is correct by construction, tiers are exhaustive and mutually exclusive, separation is clean, and tests cover all boundaries including the tricky epoch-0 and missing-timestamp cases.

"Testing shows the presence, not the absence of bugs." — but this classifier is simple enough that we can reason about its correctness directly, which is the higher standard.

Kpa-clawbot · 2026-04-24T23:49:32Z

✅ Dijkstra must-fix addressed in 4291b38: classifySkew now takes math.Abs internally; parameter renamed from absCorrectedSkewSec to skewSec. Callers no longer need to pre-abs. Regression test TestClassify_NegativeSkewUsesAbsoluteValue added.

Kpa-clawbot

🔬 MeshCore Protocol Engineer Review

Reviewed against firmware source at firmware/src/helpers/ArduinoHelpers.h and full git history of VolatileRTCClock.

Protocol-correctness summary

✅ ADVERT-only classification — collectSamples() filters by PayloadADVERT (line 195), getNodeClockSkewLocked() skips non-ADVERT (line 422). No from_node leakage from non-ADVERTs. Correct.

✅ 1715770351 confirmed — VolatileRTCClock constructor in ArduinoHelpers.h:11: base_time = 1715770351. Present since initial commit. set time RESET in CommonCLI.cpp calls setCurrentTime(1715770351). Verified.

✅ Epoch-0 / missing-timestamp distinction — timestampMissing = -1 sentinel, jsonNumberOk helper, collectSamples filters advertTS < 0. Epoch-0 flows through to isDefaultEpoch. Clean.

✅ Originator attribution — classifier operates on the ADVERT's embedded timestamp (originator's clock), not observation metadata. Repeater forwarding doesn't alter this. Correct.

✅ Default detection independent of calibration — classifySkew() checks raw advert_ts against isDefaultEpoch() before examining corrected skew. Observer offsets (seconds-to-minutes scale) can't move a 2024 timestamp to 2026. Correct per spec.

Must-fix (1)

MF-1: `maxPlausibleUptimeSec` = 730 days is too tight — imminent misclassification

File: cmd/server/clock_skew.go:47

The firmware default 1715770351 is 2024-05-15. Today is 2026-04-25 — that's 710 days. Solar-powered repeaters deployed at firmware release and never rebooted are 20 days from being reclassified from default to wrong.

Real-world solar repeaters (mountain-top, off-grid) run for years without reboot. A node at 800 days uptime with firmware-default clock should be default, not wrong — the latter implies "operator set incorrect time" which triggers unnecessary intervention.

Fix: Increase to 1095 * 86400 (3 years), or compute dynamically: max(baseline, now - oldest_known_epoch + margin). The constant table already acknowledges epochs will grow; the uptime cap should track that reality.

This is imminent — the 00id.net deployment already has nodes at 1715770351 + 0..113d. Some booted near firmware release.

Comments (not blocking)

C-1: 1672531200 — no firmware source evidence. Git history shows base_time was always 1715770351 from initial commit. The 5 production nodes at ~1672531542 are empirical evidence, but likely from a pre-repo firmware version or fork. Add a code comment noting "production-observed, firmware source not available."

C-2: 1609459200 — acknowledged speculation. Spec says "single-sample evidence." Shipping it is fine (false positive = harmless gray badge). Add a // speculative comment in the constant table.

C-3: VCR replay misclassification. Old replayed adverts will show as wrong because their timestamps are months old vs current time.Now(). Pre-existing issue, not introduced here, but the new red wrong badges make it more visible. Suggest a follow-up issue.

C-4: Single-observer uncalibrated nodes. When only one observer exists with no calibration, raw skew is used. If that observer's clock is off by 30s, a perfectly-synced node shows degrading. Inherent to Phase 2's design. Follow-up: surface "uncalibrated — single observer" more prominently in UI.

Verdict

1 must-fix (MF-1: uptime cap about to expire on real nodes). After that: approve.

Everything else is correct. Default-detection model matches firmware behavior, ADVERT-only classification is clean, observer calibration preserved, sentinel handling for epoch-0 vs missing-timestamp is solid work.

…times

Kpa-clawbot · 2026-04-25T00:08:23Z

✅ MF-1 addressed in 86ca793: maxPlausibleUptimeSec bumped 730d → 1095d (3 years). Spec doc + tests updated.

you added 5 commits April 24, 2026 23:18

docs: clock-skew classifier redesign spec

f872fd9

feat: replace clock skew classifier with default-detection model

545df27

test: cover default-detection classifier tiers and edge cases

2c675f5

ui: rename clock skew tiers to default/ok/degrading/degraded/wrong

0acbac6

ui: per-tier explainer line in node clock card

2106cc0

This was referenced Apr 24, 2026

fix: bimodal clock hysteresis — don't flip to 'No Clock' on transient bad bursts #894

Closed

feat: bimodal_clock severity — surface flaky-RTC nodes instead of hiding as 'No Clock' #845

Closed

polish: remove leftover recentMedianSkewSec comment, add overlapping-…

d617a55

…epoch test - Remove stale 'recentMedianSkewSec' reference in nodes.js comment - Add TestIsDefault_OverlappingWindowsPicksLargest covering epoch selection when default ranges overlap

Kpa-clawbot commented Apr 24, 2026

View reviewed changes

you added 4 commits April 24, 2026 23:42

fix: computeNodeSkew picks chronologically-latest observation

86a4403

Uses max observedTS instead of last-appended slice element to determine the most recent skew sample per hash. Consolidates the latestObsTS and anyCal loop into a single pass. Review item #3 on PR #907.

Kpa-clawbot commented Apr 24, 2026

View reviewed changes

fix: classifySkew defensive absolute value

4291b38

Kpa-clawbot commented Apr 25, 2026

View reviewed changes

fix: bump default-epoch uptime cap to 3 years for solar repeater life…

86ca793

…times

Merge branch 'master' into feat/clock-skew-default-detection

4e4de3f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: clock skew classifier redesign — default-detection model#907

feat: clock skew classifier redesign — default-detection model#907
Kpa-clawbot wants to merge 13 commits intomasterfrom
feat/clock-skew-default-detection

Kpa-clawbot commented Apr 24, 2026

Uh oh!

Kpa-clawbot left a comment

Uh oh!

Kpa-clawbot commented Apr 24, 2026

Uh oh!

Kpa-clawbot left a comment

Uh oh!

Kpa-clawbot commented Apr 24, 2026

Uh oh!

Kpa-clawbot left a comment

Uh oh!

Kpa-clawbot commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Kpa-clawbot commented Apr 24, 2026

Clock Skew Classifier Redesign

New 5-tier severity model

What changed

Go test results

References

Uh oh!

Kpa-clawbot left a comment

Choose a reason for hiding this comment

Independent Adversarial Review

Must-fix (4 items — requesting changes)

Out-of-scope (pre-existing / follow-up)

Uh oh!

Kpa-clawbot commented Apr 24, 2026

Review fixes addressed

Uh oh!

Kpa-clawbot left a comment

Choose a reason for hiding this comment

🎓 Dijkstra Review — The Formalist

Invariants & Mutual Exclusivity ✅

State Space ✅

Separation of Concerns ✅

Type Safety ✅

Totality & Edge Cases ✅ (with one exception)

Observer Calibration Orthogonality ✅

🔴 Must-Fix (1)

Out-of-scope (pre-existing, for follow-up)

Verdict

Uh oh!

Kpa-clawbot commented Apr 24, 2026

Uh oh!

Kpa-clawbot left a comment

Choose a reason for hiding this comment

🔬 MeshCore Protocol Engineer Review

Protocol-correctness summary

Must-fix (1)

MF-1: maxPlausibleUptimeSec = 730 days is too tight — imminent misclassification

Comments (not blocking)

Verdict

Uh oh!

Kpa-clawbot commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MF-1: `maxPlausibleUptimeSec` = 730 days is too tight — imminent misclassification