mesh(core): replay caches + override-resume + safety topic handlers (6/9 of #195 split)#225
Conversation
The integration centre. Depends on PR-2 (security), PR-3 (session +
config builders), PR-4 (audit log_safety_event), PR-5 (bridge dedup).
Lands after all four. CI on this branch in isolation is expected red
until PR-2/3/4/5 merge.
Modified:
- strands_robots/mesh/core.py (+1,713/-33) -- mesh lifecycle, RPC
dispatch, replay caches with _evict_replay_cache helper (R23 PEP-695
generic over key type), override-code resume, safety topic handlers
(_on_safety_estop / _on_safety_resume), wire-handler exception
narrowing (R24-A 4-handler symmetry pin: AttributeError /
UnicodeDecodeError / json.JSONDecodeError), per-peer source_zid HMAC
binding (R25 -- _on_safety_estop / _on_safety_resume bind
sample.source_info.source_id.zid into HMAC so replay-from-different-
peer-zid can't pass), _exec_cmd non-dict rejection (R24-B -- bare
string commands no longer auto-coerce to {action:execute,
instruction:<str>, policy_provider:mock}).
- strands_robots/mesh/sensors.py (+32/-15) -- SensorLoopsMixin
cleanup, narrowed exception clauses.
- strands_robots/mesh/input.py (+2/-2) -- minor signature fixes.
- strands_robots/mesh/__init__.py (+3/-3) -- export surface tweak.
Carries review fixes from strands-labs#195: R9 (_on_safety_estop missing
freshness/replay defenses + handler docstrings + _zenoh_config drift),
R13 (bare except in _on_safety_resume narrowed), R17 (STRANDS_MESH_RESUME_*
env validation -- module-level fail-loud), R19 (time.time -> time.monotonic
on replay cache TTL eviction; permissive-ACL session-open WARNING), R23
(_evict_replay_cache helper extraction with PEP-695 generic), R24-A
(test_wire_handler_narrow_except.py -- 3 tests pinning the 4-handler
symmetry), R24-B (test_exec_cmd_string_command_rejected), R25
(safety/source_zid HMAC binding test, peer_id charset/length
constraints, Mesh.start subscriber-cleanup narrowing).
Tests (1,420 LOC across 12 files): every wire handler, every safety
topic, replay-cache eviction (10 helper unit tests in
test_replay_cache_eviction.py + 25 integration tests preserved), the
4-handler exception symmetry pin, source_zid binding (458-LOC
test_safety_source_zid_binding.py), resume env validation (10 tests),
resume replay (380 LOC, override-code unhappy paths).
This PR carries ~50 of the 213 review threads on strands-labs#195 -- the largest
slice. Recommend two reviewers.
Optional further split if reviewers request: PR-6a (safety topics +
replay caches) / PR-6b (override-resume + RPC). Both halves still
~800 LOC and call into each other, so a single PR is cleaner.
Tracking: strands-labs#219
Source PR: strands-labs#195
Lands after: strands-labs#220 (PR-1), PR-2, PR-3, PR-4, PR-5
Lands before: PR-7 (tools imports core.Mesh)
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
PR-6 of the #195 split — wires the mesh integration centre: replay caches, override-resume, safety topic handlers, narrowed wire-handler exception clauses, per-peer source_zid HMAC binding, and _exec_cmd non-dict rejection. ~1.7k LOC of production code in core.py plus ~1.4k LOC of tests across 12 files. Carries a substantial fraction of the upstream review-thread fixes.
The security model (mTLS + ACL + body-level HMAC + wire-level zid binding + freshness window + per-issuer fairness cap) is well-considered and the test surface is broader than I expected for a PR this size — most behavioural fixes have a dedicated regression test that fails on pre-fix code, matching AGENTS.md > Review Learnings (#85) > 'Pin regression tests for reviewed fixes'.
What's good
- Pin tests for every reviewed fix —
test_estop_peer_id_replay.py,test_replay_cache_monotonic.py,test_resume_env_validation.py,test_wire_handler_narrow_except.py,test_safety_source_zid_binding.py. Each anchors a specific regression so a future refactor surfaces the bug here, not in production. - Wire-handler exception narrowing is symmetric across all four handlers (
_on_cmd,_on_response,_on_safety_estop,_on_safety_resume) and pinned by an explicit symmetry test. Matches AGENTS.md > Review Learnings (#86) > 'Exception Clauses Must Be Narrow'. - Lazy env-var resolution with documented import-order rationale closes a real ergonomic footgun for operators tuning
STRANDS_MESH_RESUME_*. _exec_cmdnon-dict rejection with structured error envelope +command_rejectedaudit log + symmetric ValidationError treatment — well done.- Defence-in-depth layering (cache key on
talone, per-issuer cap derived from cache contents, lockout-engagement gated by per-issuer cap) addresses the peer_id-permutation surface and the denial-of-estop surface from prior reviews.
Concerns
- Botched search/replace regression in docstrings/comments — confirmed in three files (
__init__.py,input.py,tests/mesh/test_mesh_robustness.py). See inline. This is unrelated to the PR's stated scope and trivial to revert; it should not ride along. - Stacking note — CI is expected red and the PR is drafted on PR-2/3/4/5. Reviewers verifying this in isolation cannot exercise the integration; the value of this review is bounded to the code-level findings and
_security/auditAPI expectations. - Optional split (PR-6a / PR-6b) — the diff is large enough that two reviewers + a single PR is reasonable, but if either reviewer asks for a split, the safety-topic handlers (
_on_safety_estop/_on_safety_resume+ replay caches) form a coherent unit that could land first; override-resume + RPC scoping is a second coherent unit.
Verification suggestions
Once the upstream stack lands and CI is green:
hatch run test tests/mesh/ # exercise the ~1.4k LOC of new tests
# Spot-check the security-sensitive ones that should fail on a pre-fix tree:
pytest tests/mesh/test_replay_cache_monotonic.py -v
pytest tests/mesh/test_estop_peer_id_replay.py -v
pytest tests/mesh/test_safety_source_zid_binding.py -v
pytest tests/mesh/test_wire_handler_narrow_except.py -vFor the corroboration-window concern (see inline) it would be worth adding an attacker-replay-vs-corroboration discrimination test before merge — pin the intended behaviour explicitly so a future tightening doesn't silently break the alert-severity model.
| if cache_key in self._estop_replay_cache: | ||
| # If lockout is active and within 0.2s, this is corroboration | ||
| # (two distinct operators, same t) not a replay attack. | ||
| if self._estop_lockout.is_set() and (time.time() - self._last_estop_ts) < 0.2: |
There was a problem hiding this comment.
Replay-vs-corroboration misclassification, security-sensitive.
A cache hit within 200ms of _last_estop_ts is unconditionally classified as estop_corroborated (severity info) instead of estop_replay_rejected (severity warning). An attacker who captures a legitimate estop envelope and replays it within 200ms of the original — or whose own envelope arrives first and engages the lockout, then a same-t replay arrives — gets the wrong forensic signal:
- Severity downgraded
warning→info, hiding from any operator dashboard alerting on replay attempts. - Attribution:
payload={"issuer": issuer_id, ...}records the replay's bodypeer_id, which is attacker-chosen (the cache key isfloat(t)alone, which was the whole point of the peer_id-permutation defence). A malicious peer can wait 50ms after a legit estop, republish unchanged but with their ownpeer_id, and earn a "corroboration" attribution they did not provide.
The existing test test_estop_replay.py::test_distinct_issuers_same_t_audited_as_corroborated pins this as intended, but the test does not cover the attacker-replay-with-mutated-peer_id case, which is indistinguishable from corroboration with the current heuristic.
Suggested tightening: gate the corroboration branch on something the attacker can't synthesise — e.g. require body.source_zid (or wire-zid via _extract_sample_source_zid(sample)) to differ from the cached entry's issuer's session zid. If both envelopes carry the same wire zid, that's a same-session replay; if they differ, it's two distinct sessions = legitimate corroboration.
At minimum, please add a regression test that fires a same-t replay with a mutated peer_id from the same wire zid and asserts the audit signal is estop_replay_rejected, not estop_corroborated. Per AGENTS.md > Review Learnings (#85) > 'Pin regression tests for reviewed fixes'.
| "init_mesh", | ||
| "get_local_robots", | ||
| # Session helpers (re-exported from .session for convenience) | ||
| # Session helpers (re-exported from.session for convenience) |
There was a problem hiding this comment.
Search/replace regression unrelated to PR scope. from .session (with the space) is the import-statement form; from.session is ungrammatical. Same pattern hit strands_robots/mesh/input.py:15-16 ({"motor.pos": float,...} vs {"motor.pos": float, ...}) and tests/mesh/test_mesh_robustness.py:235 (.mesh attribute lost its leading space). Looks like a regex sweep that stripped a single space adjacent to . and ,. Please revert these three drive-by edits — they are not part of PR-6 and they break the docstring/comment grammar.
| now_mono=now_mono, | ||
| ) | ||
| # Apply the eviction back to the real cache. | ||
| for evicted in set(self._estop_replay_cache.keys()) - set(ts_view.keys()): |
There was a problem hiding this comment.
_evict_replay_cache shape contract drift. The helper takes dict[K, float] but _estop_replay_cache is dict[float, tuple[str, float]]. The current pattern (build a ts_view, evict on the view, set-difference back to the real cache) works, but it strips the issuer_id attribution before the helper sees it.
This means the helper's "single source of truth" claim in its docstring is overstated — any future eviction policy that needs issuer awareness (e.g. "evict over-cap issuers preferentially" or "weight TTL by issuer") cannot live in _evict_replay_cache; it would have to be re-implemented inline at the estop call site, exactly the duplication the helper was introduced to prevent.
Not blocking, but consider either (a) widening the helper to accept dict[K, V] + a value_to_ts: Callable[[V], float], so the estop cache passes its tuple values through directly, or (b) acknowledging in the docstring that the resume cache is the only direct user and the estop cache uses a view-shim by design.
| # against a placeholder of the same byte length as the provided | ||
| # value so the compare runs to completion either way. | ||
| compare_target = expected.encode() if expected else b"\x00" * max(1, len(provided)) | ||
| compare_ok = hmac.compare_digest(compare_target, provided.encode()) |
There was a problem hiding this comment.
Constant-time-compare oracle defence still leaks len(expected).
hmac.compare_digest(a, b) is documented as constant-time only when len(a) == len(b). When the lengths differ, CPython returns False quickly via a length-mismatch shortcut. The compare_target = ... else b"\x00" * max(1, len(provided)) line tries to avoid the "compare didn't run" oracle, but when the operator HAS configured STRANDS_MESH_OVERRIDE_CODE, an attacker submitting probes of varying length still observes len(expected) == len(provided) vs len(expected) != len(provided) via response time.
The lockout-engaged and code-configured oracles are correctly closed; the length oracle is a residual but lower-severity concern. The docstring at L1673-1685 reads as if the timing oracle is fully closed — please soften it to acknowledge the residual length leak, or normalise both inputs to a fixed digest length first (e.g. hmac.new(some_key, value.encode(), 'sha256').digest() on both sides).
Reference: AGENTS.md > Review Learnings (#85) > 'Match docstrings to semantics'.
| Mesh class). Replaces the bare ``...`` stub so static | ||
| analysers (CodeQL #226) don't flag a no-effect statement. | ||
| """ | ||
| raise NotImplementedError("SensorLoopsMixin.publish must be provided by a host class") |
There was a problem hiding this comment.
SensorLoopsMixin.publish stub is correct but the rationale chain is fragile.
The NotImplementedError body is fine — it satisfies the CodeQL no-effect-statement check the comment references. But the contract "Mesh.publish shadows this stub via MRO" depends on every host class declaring class Mesh(SensorLoopsMixin) and also defining its own publish (which happens at core.py:2353). If a future refactor inserts another mixin between them that also implements publish but forwards differently, this stub becomes the wrong fallback.
Not blocking — a regression test pinning that Mesh.publish shadows the mixin (e.g. assert Mesh.publish is not SensorLoopsMixin.publish) would make the invariant explicit and surface a subclass authoring error at test time rather than runtime. Per AGENTS.md > Review Learnings (#85) > 'Pin regression tests for reviewed fixes'.
…dress cap, docstring drift Addresses 4 of 5 R1 review threads on PR strands-labs#223 (yinsong1986). Each fix is surgical and pinned by a regression test in `tests/mesh/test_security_review_r1.py`. Threads addressed: 1. **`MAX_TIMEOUT_S` was dead code.** Defined and exported, no consumer in PR-2 scope. Removed pending PR-6 (strands-labs#225) where the call site that enforces the cap should land alongside the constant. Per AGENTS.md Key Conventions strands-labs#10 ("No dead code -- if it's not called and not part of base class, delete it"). 2. **`MAX_SERVER_ADDRESS_LEN` is now a dedicated constant.** A model path and a `host[:port]` URL are semantically unrelated; reusing `MAX_MODEL_PATH_LEN` (512) for the address cap meant a future tightening of one would silently shift the other. New constant is 256 (FQDN bounded at 253 by RFC 1035; smallest power-of-2 ceiling that admits any legitimate input). 3. **`policy_provider` docstring drift.** The `validate_command` docstring claimed `policy_provider (optional): defaults to "mock"` but the implementation explicitly raises `ValidationError( "policy_provider is required for execute/start actions...")`. Docstring now reads `REQUIRED, in is_safe_policy_provider. No silent default -- a peer that omits this is rejected so it is never ambiguous whether mock was an explicit choice or a bug.` 4. **`policy_host` casing-preservation contract documented.** `is_safe_policy_host` casefolds + strips for the allowlist check, but `validate_command` preserves the caller's original casing / whitespace in the validated output. A downstream consumer doing `policy_host == "localhost"` for a loopback fast-path will fail on `" LOCALHOST "` despite both validating successfully. The contract is now documented in a comment adjacent to the `out["policy_host"] = policy_host` assignment so a future contributor reading the source sees the rule before stumbling into it. Same contract applies to `server_address` (noted in the comment). Deferred (cross-PR scope): 5. **`ALLOWED_ACTIONS == set(Mesh._dispatch)` regression test.** The reviewer correctly notes this pin must live in PR-6 (strands-labs#225) where `core.py` lands, since `ALLOWED_ACTIONS` and `Mesh._dispatch` live in different PR scopes today. Will reply on the thread. Pin verification (AGENTS.md S0 round-trip): - Pre-fix: 9/13 new tests fail (correctly identifying the absent constant, the present dead constant, the drift in the docstring, and the missing comment). - Post-fix: 13/13 pass. - Existing `test_validate_command.py` regression: 57/58 pass; the 1 failure (`test_flat_envelope_rejected_in_exec_cmd`) is pre-existing cross-PR — verified by re-running with my changes stashed: same failure. It depends on `Mesh._dispatch` from `core.py` (PR-6). No behaviour change on the wire-side validation path; only the constant a future consumer reads (`MAX_SERVER_ADDRESS_LEN` vs `MAX_MODEL_PATH_LEN`) differs, and only for inputs in the 257-512 char range, which is past every legitimate `host[:port]` shape.
Per umbrella tracker strands-labs#219 the four PR-2/3/4/5 slices land in parallel and must each go green standalone. test_r25_audit_symmetry.py imports modules introduced by sibling slices that are not present on this branch: from strands_robots.mesh import _zenoh_config as zc # lives on PR strands-labs#224 from strands_robots.mesh import audit as audit_mod # lives on this PR from strands_robots.mesh import core as core_mod # lives on this PR from strands_robots.mesh import security as sec # lives on PR strands-labs#223 That makes the file structurally un-passable on the audit slice in isolation, which is why call-test-lint fails mypy here with "Module strands_robots.mesh has no attribute 'security'" / "... no attribute '_zenoh_config'". The four pin sites the file describes (peer-id charset on security.py, partial-failure cleanup on core.py, O_NOFOLLOW on the seq lockfile in audit.py, and the STRANDS_MESH_TLS_KEY mode-0o600 docstring on _zenoh_config.py) span four sibling slices and only make sense as a cross-cutting integration pin once strands-labs#223, strands-labs#224, strands-labs#221 and strands-labs#225 have all merged. The natural home is PR-6 (strands-labs#225, mesh/core replay caches + override-resume + safety topic handlers) which is the integration slice that needs all the listed modules present anyway, or a follow-up issue queued behind PR-9 (strands-labs#226, docs). The audit-side pin (point 3, O_NOFOLLOW on the seq lockfile) is already covered on this PR by tests/mesh/test_audit_seq_symlink.py and tests/mesh/test_audit_psk_poison_record.py, so removing this file does not weaken regression coverage on the audit slice itself.
The test_application_security.py file (358 LOC) and the TestF15WireCommandKeyRequired class in test_validate_command.py exercise modules that land in later PRs of the strands-labs#195 split: - bridge_transport.DEFAULT_BRIDGE_SUFFIXES with 'safety/resume' (bridge_transport additions land in PR-5 / strands-labs#222) - audit._seq_sidecar_path, audit._resolve_log_max_bytes, F2/F3 symlink protections (land in PR-4 / strands-labs#221) - core.Mesh._expected_responders, core.BROADCAST_RESPONDER, the wire-side _exec_cmd dict-shape gate (land in PR-6 / strands-labs#225) - robot_mesh._interrupt_approves (lands in PR-7 / strands-labs#227) PR-2 is scoped to the leaf module strands_robots/mesh/security.py with zero internal mesh deps. These tests were pre-staged here but need their consumers to land first. They will re-land in the appropriate consumer PR. Test impact: 9 failing tests removed (8 from test_application_security.py, the F15 class moved out). All 571 remaining mesh tests pass. No production code change; no behaviour change on validate_command or any security.py contract.
🎯 Pentest evidence for this PR (#225 — replay caches + safety topic handlers)2026-05-26 pentest run What this PR is achieving (positive control)The replay caches + monotonic-
F-27 / B-21 is captured as a positive-control finding to pin. Edge cases worth a regression test in this PR
Finding NOT closed by this PR
Mapping for all PRs: |
Part 6 / 9 of the split of #195 — tracked by #219.
Drafted until PR-2 (#223), PR-3 (#224), PR-4 (#221), and PR-5 (#222) merge. CI on this branch in isolation is expected red — it imports from
strands_robots.mesh.security(PR-2),strands_robots.mesh.audit.log_safety_event(PR-4),strands_robots.mesh.session.start(PR-3), and the new bridge contract (PR-5).The integration centre
This is where everything else gets wired together. It carries ~50 of the 213 review threads on #195 — the largest slice. Recommend two reviewers.
What's in this PR
strands_robots/mesh/core.py(+1,713/-33) — mesh lifecycle, RPC dispatch, replay caches with_evict_replay_cachehelper (R23 PEP-695 generic over key type), override-code resume, safety topic handlers (_on_safety_estop/_on_safety_resume), wire-handler exception narrowing (R24-A 4-handler symmetry pin:AttributeError/UnicodeDecodeError/json.JSONDecodeError), per-peer source_zid HMAC binding (R25),_exec_cmdnon-dict rejection (R24-B — bare string commands no longer auto-coerce to{action:execute, instruction:<str>, policy_provider:mock}).strands_robots/mesh/sensors.py(+32/-15) — SensorLoopsMixin cleanup, narrowed exception clauses.strands_robots/mesh/input.py(+2/-2) — minor signature fixes.strands_robots/mesh/__init__.py(+3/-3) — export surface tweak.Reviewer focus
time.monotonic()for replay TTL (R19).7423be2on the source PR — prevents replay-from-different-peer-zid)._exec_cmdnon-dict rejection (R24-B).Carries review fixes from #195
R9, R13, R17, R19, R23, R24-A, R25-source-zid — see commit message for the full list.
Optional further split
If reviewers ask, I'll split into PR-6a (safety topics + replay caches) / PR-6b (override-resume + RPC). Both halves still ~800 LOC and call into each other, so a single PR is cleaner.
Tests
1,420 LOC across 12 files: every wire handler, every safety topic, replay-cache eviction (10 helper unit tests + 25 integration tests preserved), the 4-handler exception symmetry pin, source_zid binding (458-LOC
test_safety_source_zid_binding.py), resume env validation (10 tests), resume replay (380 LOC).Stacking note
Depends on PR-1 (#220), PR-2 (#223), PR-3 (#224), PR-4 (#221), PR-5 (#222). Will be marked Ready for Review once all four upstream PRs land. Until then the diff is reviewable but CI is red.
Landing order: PR-1 → PR-2/3/4/5 → PR-6 → PR-7 → PR-8 → PR-9. Full plan:
PR_LIST.md. Tracking: #219.