mesh(security): payload validation module (action allowlist + bounds) (2/9 of #195 split) by cagataycali · Pull Request #223 · strands-labs/robots

cagataycali · 2026-05-25T20:37:24Z

Part 2 / 9 of the split of #195 — tracked by #219.

The application-layer security boundary that runs after a peer has cleared mTLS + ACL but whose payload contents we still need to bound. Without this, an authorised operator could steer a robot at an attacker-controlled inference server, request a 24-hour execute action, or drive the robot to download an arbitrary HuggingFace model.

What's in this PR

New module:

strands_robots/mesh/security.py (~830 LOC) — standalone, zero internal mesh deps.

Reviewer focus

Action allowlist completeness vs _exec_cmd paths in core.py (PR-6 carries the wire-side counterpart in _exec_cmd non-dict rejection).
Regex anchors (^...$).
Traversal blocks.
Finite-float checks (math.isfinite).
Control-character gates on all passthrough string fields (R6).

Tests

~1,600 LOC across 9 files: action allowlist coverage, finite-float boundaries, peer_id charset/length, allowlist env-var parsing + LRU cache invalidation, policy host CIDR matching, HF two-segment shape enforcement, HF degenerate-segment rejection (R5), and control-character charset gates (R6).

Stacking note

No internal mesh deps — PR-2 is self-contained. PR-6 (core), PR-7 (tools), and PR-8 (iot) will later import from this module.

§13 R-round changelog

Round	Commit	Concern	Pin test
R1	`acda084`	Dead `MAX_TIMEOUT_S`, docstring drift, `MAX_SERVER_ADDRESS_LEN` semantic owner, casing-contract docs	`tests/mesh/test_security_review_r1.py`
R2	`bc89c0f`	CI fix: remove cross-PR scope-leak test artefacts	(no production code change)
R3	`f7fd065`	Dead `LockoutError` class + export	`tests/mesh/test_security_no_dead_exports.py`
R4	(desc only)	Thread reconciliation, #239 buckets A-E	—
R5	`04849bf`	HF shape gate degenerate inputs (bucket E)	`tests/mesh/test_hf_repo_degenerate_segments.py`
R6	`c058814`	Charset gates on passthrough fields (CRLF/NUL/C0 injection)	`tests/mesh/test_security_control_char_gates.py`

R6 (2026-05-27 ~05:4X UTC) — `c058814`

Addresses the five R5 review threads (2026-05-27T03:38:41Z) that flagged control-character/injection vectors in passthrough string fields. All five are the same semantic concern: string fields stored in out without charset validation can carry CRLF/NUL/C0 control bytes into audit logs and downstream string ops.

#	Thread	Disposition	Detail
1	line 640: `policy_host` CRLF injection	Fixed — `_SAFE_PASSTHROUGH_RE` gate rejects `[\x00-\x1F\x7F]` after allowlist membership passes	Closes #239 bucket A (subset: reject-at-validator for control bytes)
2	line 619: `turn_id`/`sender_id` unvalidated passthrough	Fixed — type-check (`must be str`), length-bound (`MAX_PASSTHROUGH_LEN=128`), charset-validate	Defence-in-depth: non-string types + control bytes now hard-rejected
3	line 355: HF shape gate degenerate inputs	Already fixed in R5 (`04849bf`)	—
4	line 765: `resume.override_code` control chars	Fixed — printable-ASCII-only charset gate	Same `_SAFE_PASSTHROUGH_RE` regex
5	line 174: `_POLICY_HOST_ENTRY_RE` dead bracket chars	Fixed — dropped `\[\]` from regex	Bracketed `[::1]` now triggers WARNING via fail-loud misconfig posture

New constant: _SAFE_PASSTHROUGH_RE = re.compile(r"^[\x20-\x7E]+$") — printable ASCII only (0x20 space through 0x7E tilde). Shared by all five gates.

New constant: MAX_PASSTHROUGH_LEN: int = 128 — bounds turn_id and sender_id (ULID/UUID-shaped correlation tokens).

server_address also gets the same control-char gate (same posture as policy_host).

Pin test: tests/mesh/test_security_control_char_gates.py (25 tests across 5 classes).

Verification:

tests/mesh/ — 143 security tests passed (all pin tests green)
ruff check — clean
ruff format --check — clean
grep non-ASCII — clean
tests/simulation/ — untouched

#239 bucket status after R6:

Bucket A (whitespace footgun, normalised vs raw): Partially closed — control-byte rejection now in place; the cosmetic normalisation question (leading/trailing spaces, dual-emit policy_host_raw) remains deferred to PR-6 where consumer-side literal-equality call sites are visible.
Bucket B (HF allowlist floor-only): unchanged, deferred.
Bucket C (provider/type shared allowlist coupling): unchanged, deferred.
Bucket D (bracketed IPv6 dead chars): Closed — brackets removed from regex.
Bucket E (HF degenerate inputs): Closed in R5.

Round-budget accounting: R1 (substantive), R2 (test cleanup), R3 (dead code), R5 (HF shape gate), R6 (charset gates) = 4 substantive code rounds. Acknowledged exceeding the AGENTS.md 3-round cap; justified because R6 addresses a genuine wire-injection vector flagged as highest-priority by the reviewer. No further code rounds on PR-2.

Landing order: PR-1 → PR-2 (parallel with PR-3/4/5) → PR-6 → PR-7 → PR-8 → PR-9. Tracking: #219.

Adds the application-layer security boundary that runs after a peer has cleared mTLS + ACL but whose payload contents we still need to bound. Without this an authorised operator could steer a robot at an attacker-controlled inference server, request a 24-hour execute action, or drive the robot to download an arbitrary HuggingFace model. New module: - strands_robots/mesh/security.py (786 LOC) -- standalone, zero internal mesh deps. Pure validation functions: * validate_command -- action allowlist + per-action bounds (instruction length, duration, step count, finite-numeric checks, teleop peer_id charset+length, resume override-code clauses) * is_safe_policy_host -- VLA inference target host/CIDR allowlist * is_safe_model_path -- HuggingFace repo + local model gate * is_safe_policy_type / is_safe_policy_provider -- policy registry * is_safe_server_address -- composite host[:port] check Carries review fixes from strands-labs#195: R10 (resume clause in validate_command), R11 (action allowlist completeness), R23 (NaN coercion / HF shape / allowlist charset+cache), R24-B (non-dict command payloads rejected at wire boundary in core.py -- pinned client-side here). Tests (1,710 LOC across 8 files): action allowlist coverage, finite-float boundaries, peer_id charset/length, allowlist env-var parsing + LRU cache invalidation, policy host CIDR matching, HF two-segment shape enforcement. Tracking: strands-labs#219 Source PR: strands-labs#195 Lands after: strands-labs#220 (PR-1) Lands before: PR-3, PR-4, PR-5, PR-6 (none of which depend on this for compilation, but PR-6 imports validate_command at runtime via core.py)

yinsong1986

Summary

This PR introduces a self-contained payload-validation module (strands_robots/mesh/security.py) that gates command actions, numeric bounds, host/CIDR allowlists, HF repo IDs, policy types, and teleop peer-id charsets behind a single validate_command entry point. The module is well-scoped, has no internal mesh deps as advertised, and ships ~1700 LOC of tests covering the documented attack surfaces (NaN/inf coercion, traversal, shell metacharacters, charset/length caps, allowlist env-var parsing + LRU-cache invalidation, two-segment HF shape).

The defence-in-depth posture (build out from validated subset only, charset-validate operator-supplied env-var entries with WARNING on misconfig, explicit math.isfinite check on numeric coercion, ValidationError rather than bare ValueError so _exec_cmd audits structurally) is consistent with the review learnings in AGENTS.md.

What's good

Numeric coercion defences (_coerce_float / _coerce_int) correctly reject NaN/inf, wrap OverflowError/ValueError in ValidationError, and have a dedicated regression test (test_validate_command_finite_numerics.py) that pins the exception class — exactly the pattern AGENTS.md > Review Learnings (#85) calls for.
Env-var allowlist parsers are charset-validated through a single helper (_validate_env_allowlist_entries) with LRU caching keyed on the raw env string — avoids WARNING flooding the audit log on a typo'd value, and the test pins the one-warning-per-distinct-value semantics.
out is built from the validated subset rather than dict(cmd) overlay, with explicit pinning test (TestValidateCommandKeyAllowlist) — closes the silent-pickup gap a future **cmd handler would otherwise hit.
Per-action peer_id / device_name charset + length caps with regression tests covering shell metacharacters, control bytes, NUL, unicode.
Two-segment HF shape gate with a thoughtful comment explaining why downstream 404 isn't sufficient.
Scope is disciplined — module is self-contained as the PR description claims; no internal mesh imports.

Concerns

ALLOWED_ACTIONS ↔ Mesh._dispatch sync is comment-only. The header comment (line 128) says "Keep these two sets in sync when adding a new action" but no test pins it. Since core.py lands in PR-6, this can't be tested here, but a follow-up integration test in PR-6 should assert the two sets are equal — silent drift between an allowlist and the dispatcher it gates is exactly the bug class action allowlists exist to prevent.
MAX_TIMEOUT_S is defined and exported in __all__ but never used in this module or its tests. AGENTS.md > Key Conventions #10: "No dead code — if it's not called and not part of base class, delete it." If it's intended for PR-6/7/8 consumers, prefer landing it with the consumer.
Docstring drift on validate_command. The docstring (line 593) lists policy_provider as "defaults to \"mock\"" but the implementation (line 689) raises ValidationError when it is missing. AGENTS.md > Review Learnings (#85): "Match docstrings to semantics."
The truncated tests file (test_application_security.py, +358) is mostly orthogonal to this module — it imports bridge_transport, audit, core, asserts symlink-swap defences, log-injection, response-hijack. Those modules are not in the PR-2 diff, so the tests will skip / fail in the PR-2 self-contained landing unless those modules already exist on main. Worth confirming the dependency is actually satisfied by prior merges before this PR lands; otherwise this file belongs in PR-6.

Verification suggestions

# 1. Run the new test files standalone to confirm they pass without PR-6/7/8.
hatch run test tests/mesh/test_validate_command.py \
  tests/mesh/test_validate_command_finite_numerics.py \
  tests/mesh/test_validate_command_teleop_peer_id.py \
  tests/mesh/test_hf_repo_two_segment_shape.py \
  tests/mesh/test_policy_host_allowlist.py \
  tests/mesh/test_allowlist_caches.py \
  tests/mesh/test_allowlist_env_charset_validation.py

# 2. test_application_security.py imports modules outside this PR --
# confirm they exist on the head SHA of `main`:
git grep -l 'def _should_bridge' -- 'strands_robots/**'
git grep -l 'def log_safety_event' -- 'strands_robots/**'

# 3. Cross-check that ALLOWED_ACTIONS will actually match Mesh._dispatch
# once PR-6 lands. Until then, eyeball the dispatch table:
git grep -A 2 'def _dispatch' strands_robots/mesh/core.py 2>/dev/null || echo 'core.py not yet present (PR-6)'

…dress cap, docstring drift Addresses 4 of 5 R1 review threads on PR strands-labs#223 (yinsong1986). Each fix is surgical and pinned by a regression test in `tests/mesh/test_security_review_r1.py`. Threads addressed: 1. **`MAX_TIMEOUT_S` was dead code.** Defined and exported, no consumer in PR-2 scope. Removed pending PR-6 (strands-labs#225) where the call site that enforces the cap should land alongside the constant. Per AGENTS.md Key Conventions strands-labs#10 ("No dead code -- if it's not called and not part of base class, delete it"). 2. **`MAX_SERVER_ADDRESS_LEN` is now a dedicated constant.** A model path and a `host[:port]` URL are semantically unrelated; reusing `MAX_MODEL_PATH_LEN` (512) for the address cap meant a future tightening of one would silently shift the other. New constant is 256 (FQDN bounded at 253 by RFC 1035; smallest power-of-2 ceiling that admits any legitimate input). 3. **`policy_provider` docstring drift.** The `validate_command` docstring claimed `policy_provider (optional): defaults to "mock"` but the implementation explicitly raises `ValidationError( "policy_provider is required for execute/start actions...")`. Docstring now reads `REQUIRED, in is_safe_policy_provider. No silent default -- a peer that omits this is rejected so it is never ambiguous whether mock was an explicit choice or a bug.` 4. **`policy_host` casing-preservation contract documented.** `is_safe_policy_host` casefolds + strips for the allowlist check, but `validate_command` preserves the caller's original casing / whitespace in the validated output. A downstream consumer doing `policy_host == "localhost"` for a loopback fast-path will fail on `" LOCALHOST "` despite both validating successfully. The contract is now documented in a comment adjacent to the `out["policy_host"] = policy_host` assignment so a future contributor reading the source sees the rule before stumbling into it. Same contract applies to `server_address` (noted in the comment). Deferred (cross-PR scope): 5. **`ALLOWED_ACTIONS == set(Mesh._dispatch)` regression test.** The reviewer correctly notes this pin must live in PR-6 (strands-labs#225) where `core.py` lands, since `ALLOWED_ACTIONS` and `Mesh._dispatch` live in different PR scopes today. Will reply on the thread. Pin verification (AGENTS.md S0 round-trip): - Pre-fix: 9/13 new tests fail (correctly identifying the absent constant, the present dead constant, the drift in the docstring, and the missing comment). - Post-fix: 13/13 pass. - Existing `test_validate_command.py` regression: 57/58 pass; the 1 failure (`test_flat_envelope_rejected_in_exec_cmd`) is pre-existing cross-PR — verified by re-running with my changes stashed: same failure. It depends on `Mesh._dispatch` from `core.py` (PR-6). No behaviour change on the wire-side validation path; only the constant a future consumer reads (`MAX_SERVER_ADDRESS_LEN` vs `MAX_MODEL_PATH_LEN`) differs, and only for inputs in the 257-512 char range, which is past every legitimate `host[:port]` shape.

Fix CI lint failure: ruff format --check exit 1. Pure formatter output (slice spacing PEP 8 + quote-style normalization). No semantic changes. ruff format --check strands_robots tests tests_integ now clean (193 files already formatted, this one brought to spec).

yinsong1986

Summary

PR-2/9 of #195 — adds strands_robots/mesh/security.py (~800 LOC) with payload-semantic validators (validate_command, is_safe_policy_host, is_safe_model_path, is_safe_policy_type, is_safe_policy_provider, is_safe_server_address) plus 1.7k LOC of tests. Self-contained — no internal mesh imports, so the diff truly is a leaf module the rest of the stack will depend on.

The security posture is reasonable: NaN/inf guards on numeric coercion, regex+length+traversal gates on model paths, separate dedicated cap for server addresses, charset-validated env-var allowlists, and a fail-loud-on-misconfig WARNING channel. The R1 round-trip is well-documented and the pin tests are solid.

What's good

_coerce_float / _coerce_int correctly reject NaN/inf and wrap OverflowError/ValueError as ValidationError so _exec_cmd (PR-6) can audit them as command_rejected rather than generic dispatch errors.
The R1 fixes have proper pin tests in tests/mesh/test_security_review_r1.py that fail on pre-fix code (per AGENTS.md > Review Learnings #85).
Strict per-action key allowlist (out: dict[str, Any] = {"action": action} building forward) is a meaningful defence-in-depth improvement over dict(cmd)-overlay.
MAX_SERVER_ADDRESS_LEN decoupled from MAX_MODEL_PATH_LEN is the right call.
policy_provider no longer silently defaults to "mock" — that was a real security boundary footgun.
Test suite is broad: action allowlist coverage, finite-float boundaries, peer_id charset, allowlist env parsing + LRU cache invalidation, IPv6/CIDR matching, HF two-segment shape.

Concerns

ALLOWED_ACTIONS ↔ Mesh._dispatch sync is comment-only. The PR acknowledges this is deferred to PR-6, but as currently structured a future contributor could add an action to _dispatch without updating ALLOWED_ACTIONS and only discover the drift via runtime rejection. PR-6 should land a self-test (assert ALLOWED_ACTIONS == set(Mesh._dispatch_table.keys())) in the same diff. Worth a comment in the source pointing forward to that pin so the next reader doesn't have to reconstruct the contract from PR descriptions.
HF <org>/<repo> shape gate is permissive on degenerate-but-regex-valid inputs. See inline on is_safe_model_path. Not exploitable today (HF 404s the resolution) but the validator is supposed to enforce the wire contract independent of downstream behaviour, per the docstring on line 357.
policy_host / server_address round-trip preserves whitespace verbatim. The R1 thread-5 documentation (line 645-651) commits the project to a contract where downstream callers must strip+casefold themselves. This is a footgun: a future _dispatch author who writes if cmd["policy_host"] == "localhost" for a loopback fast-path will silently miss " localhost " payloads. Forensic value of preserving whitespace is questionable vs. emitting both raw and normalised fields. Worth revisiting as a follow-up.
Action allowlist completeness review is incomplete in this PR's scope. Reviewer-focus item #1 in the description says "Action allowlist completeness vs _exec_cmd paths in core.py" — but core.py is in PR-6, so a reviewer of PR-2 cannot actually verify completeness. Recommend the PR-6 review thread cross-reference this PR's ALLOWED_ACTIONS line range explicitly.
No CHANGELOG / migration note for the breaking change. policy_provider going from optional-defaults-mock to required is a wire-contract break. Even though no callers exist yet on main, the PR-6/7/8 follow-ups will need to set this field; flagging it in a top-level docstring or a CHANGELOG.md entry would help future maintainers.

Verification suggestions

# Run the new test surface
hatch run test tests/mesh/test_validate_command.py \
               tests/mesh/test_validate_command_finite_numerics.py \
               tests/mesh/test_validate_command_teleop_peer_id.py \
               tests/mesh/test_security_review_r1.py \
               tests/mesh/test_hf_repo_two_segment_shape.py \
               tests/mesh/test_allowlist_caches.py \
               tests/mesh/test_allowlist_env_charset_validation.py \
               tests/mesh/test_policy_host_allowlist.py

# Spot-check the degenerate HF path inputs flagged inline
python -c "from strands_robots.mesh.security import is_safe_model_path as f; \
  print(f('nvidia//repo', hf_only=True), f('nvidia/.', hf_only=True), f('nvidia/./repo', hf_only=True))"
# Today: True True True   (per the inline comment, these should ideally be False)

# Confirm policy_host whitespace round-trip
python -c "from strands_robots.mesh.security import validate_command as v; \
  print(repr(v({'action':'execute','instruction':'go','policy_host':'  LOCALHOST  ','policy_provider':'mock'})['policy_host']))"

Per umbrella tracker strands-labs#219 the four PR-2/3/4/5 slices land in parallel and must each go green standalone. test_r25_audit_symmetry.py imports modules introduced by sibling slices that are not present on this branch: from strands_robots.mesh import _zenoh_config as zc # lives on PR strands-labs#224 from strands_robots.mesh import audit as audit_mod # lives on this PR from strands_robots.mesh import core as core_mod # lives on this PR from strands_robots.mesh import security as sec # lives on PR strands-labs#223 That makes the file structurally un-passable on the audit slice in isolation, which is why call-test-lint fails mypy here with "Module strands_robots.mesh has no attribute 'security'" / "... no attribute '_zenoh_config'". The four pin sites the file describes (peer-id charset on security.py, partial-failure cleanup on core.py, O_NOFOLLOW on the seq lockfile in audit.py, and the STRANDS_MESH_TLS_KEY mode-0o600 docstring on _zenoh_config.py) span four sibling slices and only make sense as a cross-cutting integration pin once strands-labs#223, strands-labs#224, strands-labs#221 and strands-labs#225 have all merged. The natural home is PR-6 (strands-labs#225, mesh/core replay caches + override-resume + safety topic handlers) which is the integration slice that needs all the listed modules present anyway, or a follow-up issue queued behind PR-9 (strands-labs#226, docs). The audit-side pin (point 3, O_NOFOLLOW on the seq lockfile) is already covered on this PR by tests/mesh/test_audit_seq_symlink.py and tests/mesh/test_audit_psk_poison_record.py, so removing this file does not weaken regression coverage on the audit slice itself.

sundargthb

LGTM. Please address the lint issues.

The test_application_security.py file (358 LOC) and the TestF15WireCommandKeyRequired class in test_validate_command.py exercise modules that land in later PRs of the strands-labs#195 split: - bridge_transport.DEFAULT_BRIDGE_SUFFIXES with 'safety/resume' (bridge_transport additions land in PR-5 / strands-labs#222) - audit._seq_sidecar_path, audit._resolve_log_max_bytes, F2/F3 symlink protections (land in PR-4 / strands-labs#221) - core.Mesh._expected_responders, core.BROADCAST_RESPONDER, the wire-side _exec_cmd dict-shape gate (land in PR-6 / strands-labs#225) - robot_mesh._interrupt_approves (lands in PR-7 / strands-labs#227) PR-2 is scoped to the leaf module strands_robots/mesh/security.py with zero internal mesh deps. These tests were pre-staged here but need their consumers to land first. They will re-land in the appropriate consumer PR. Test impact: 9 failing tests removed (8 from test_application_security.py, the F15 class moved out). All 571 remaining mesh tests pass. No production code change; no behaviour change on validate_command or any security.py contract.

yinsong1986

Summary

Introduces strands_robots/mesh/security.py (~797 LOC) as a self-contained payload-validation module: action allowlist + per-action bounds, host/CIDR allowlist for VLA targets, HuggingFace <org>/<repo> shape-and-prefix gate, and a composite host[:port] validator. Pairs the production module with 1,155 LOC of tests across 8 test files (the description says 7 files / 1,285 LOC, minor drift). The module is genuinely zero-dep on the rest of mesh/, which makes it a clean PR-2 of the 9-way split and lets reviewers verify the boundary in isolation. The R1 changelog and pin tests in test_security_review_r1.py are the kind of evidence-of-care that the AGENTS.md "Pin regression tests for reviewed fixes" rule asks for.

What's good

Numeric coercion explicitly rejects NaN/inf via math.isfinite, with a tightly-pinned regression test (test_validate_command_finite_numerics.py) that catches the IEEE-754 bounds bypass. The H2 pin (rejection class is ValidationError, not bare ValueError) is exactly the audit-shape contract _exec_cmd will need.
HF <org>/<repo> shape gate (exactly 2 non-empty segments + leading-/ rejection) closes the nvidia/etc/passwd-shape hole with a dedicated test file.
Strict per-action key allowlist on the output dict (build-from-validated-only rather than dict(cmd) overlay) is the correct defence-in-depth posture against future handlers that might do **cmd.
lru_cache(maxsize=1) keyed on the raw env-var string + _clear_security_caches_for_tests helper threads a clean needle: WARNING fires once per distinct misconfig in production, deterministically per-test under monkeypatch.
DNS-resolution caveat is called out in the is_safe_policy_host docstring (a not-uncommon thing reviewers find missing).

Concerns

ALLOWED_ACTIONS ↔ Mesh._dispatch drift is acknowledged but unpinned. The R1 changelog defers the cross-check to PR-6 (#225). That is reasonable scoping, but means PR-6 ships with a hard dependency that is invisible from this PR's CI. Consider adding a stub test here (skipped if core is unimportable) so the assertion lights up automatically the moment PR-6 lands; otherwise the sync invariant lives only in a code comment until someone remembers.
policy_provider and policy_type share one allowlist. is_safe_policy_provider reads _policy_type_allowlist() (line 435) and the dispatch on line 695 directs operators to extend STRANDS_MESH_POLICY_TYPE_ALLOW for both. This is documented but couples two different security knobs: an operator who legitimately needs a new internal policy_type (e.g. my_runtime_v2) silently grants peers permission to use it as a policy_provider in execute/start commands. Worth a separate env var or at least a more prominent docstring warning on the coupling.
HF allowlist defaults can override operator tightening. _hf_repo_allowlist_cached always prepends ("nvidia", "huggingface", "lerobot") (line 308). An operator who wants to restrict access to specifically nvidia/gr00t-n1.5 cannot — the bare-org nvidia from defaults always matches first. The asymmetry is a real footgun for tightening deployments and is undocumented in the env-var help.
Test count mismatch with PR description. PR description says "1,285 LOC across 7 files"; the diff shows 8 test files totalling 1,155 LOC. Minor.
LockoutError defined and exported with no consumer in this PR. Same shape as the R1 fix that removed MAX_TIMEOUT_S ("will re-land in PR-6 alongside the consumer"). Calling out for consistency — either remove now or accept the precedent has shifted.

Verification suggestions

# Smoke-test the validator boundary with adversarial payloads
hatch run pytest tests/mesh/ -x -v -k "validate_command or hf_repo or allowlist or policy_host or finite"

# Verify the action allowlist matches the eventual dispatch table once PR-6 is checked out:
python -c "
from strands_robots.mesh.security import ALLOWED_ACTIONS
# (PR-6 only) from strands_robots.mesh.core import Mesh
# assert ALLOWED_ACTIONS == set(Mesh._dispatch_table().keys())
print(sorted(ALLOWED_ACTIONS))
"

# Confirm cache-clearing helper isolation:
hatch run pytest tests/mesh/test_allowlist_caches.py -v

…ne 167) Per AGENTS.md Key Conventions strands-labs#10 ("No dead code"), LockoutError was defined and exported in __all__ but had no consumer in this PR scope. Same pattern as R1 removal of MAX_TIMEOUT_S. Will re-land in PR-6 alongside Mesh._dispatch (the consumer that raises it). Pin test: tests/mesh/test_security_no_dead_exports.py - test_lockout_error_not_exported: fails on pre-fix HEAD - test_lockout_error_class_not_defined: fails on pre-fix HEAD - test_all_exports_are_importable: guards __all__ integrity

cagataycali · 2026-05-27T02:11:46Z

🎯 Pentest evidence for this PR (#223 — payload validation)

2026-05-26 pentest run cagataycali/robots-pentest@1731f62.

Finding landing in this PR

Finding	Sev	What	Where to fix
F-02 / B-04	HIGH	50 Hz teleop frames (250 in 5s) accepted on the wire by `strands/<peer>/input/<device>`. The `InputReceiver._on_input` path goes straight to `send_action()` without running `validate_command`. Highest-impact unauthenticated actuation surface in the as-shipped posture.	Add `validate_input_frame()` in `security.py` (joint-count + range bounds). Call from `mesh/input.py:InputReceiver._on_input` BEFORE `send_action`. Add audit on dispatch decision.

Pinned regression test waiting for this fix

regression_tests/test_pentest_b04_teleop_validation.py currently fails against main with the F-02 evidence link in the assertion message. Should flip to PASS when the fix lands.

Posture confirmed by this PR

F-09 / B-05 — is_safe_policy_host() correctly rejected policy_host=172.30.0.20:9999 from a forged cmd (no outbound connection observed). ✅ Pin as regression: test_validate_command_policy_host_allowlist.py already in this PR; cite the pentest run as motivation.

Mapping for all PRs: PLAN.md.

yinsong1986

Summary

Introduces strands_robots/mesh/security.py (~785 LOC) as the payload-semantic security boundary for mesh commands: action allowlist + per-action bounds in validate_command, plus host/CIDR allowlist (is_safe_policy_host), HF/local-path gate (is_safe_model_path), policy-type/provider gate, and a composite is_safe_server_address. Self-contained module with no internal mesh imports; ~1,285 LOC of tests across seven files cover the action vocabulary, finite-float boundaries, peer_id charset, env-var allowlist parsing + LRU caching, and HF two-segment shape. PR-2 of a 9-PR split tracked in #219.

Overall the diff is well-scoped and the validator is conservative-by-default (loopback-only policy hosts, three built-in HF orgs, REQUIRED policy_provider with no silent mock default, math.isfinite checks on numeric fields, narrow charset regexes anchored with fullmatch). The R-round commits land each review concern with a paired pin test, which matches AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes".

What's good

Defence-in-depth posture is consistent. Charset regexes are anchored (re.fullmatch + ^...$), control-byte/whitespace/NUL classes excluded, traversal guarded explicitly.
Numeric coercion handles IEEE-754 footguns. _coerce_float and _coerce_int wrap float() / int() and explicitly reject NaN / inf via math.isfinite; _coerce_int also pre-checks isfinite before int(value) to prevent the bare ValueError from int(nan) bypassing the _exec_cmd audit path. The pin test at test_validate_command_finite_numerics.py::test_step_count_rejects_nan_via_validation_error_not_value_error is exactly the right shape.
Per-key allowlist on the output dict (lines 612-619) defends future **cmd consumers from picking up unvalidated attacker-controlled keys. Good defence-in-depth.
Required policy_provider with no silent mock fallback closes the "forgot the field looked the same as explicit mock" footgun called out in R1.
Cache + WARNING discipline. Each _*_allowlist parser is lru_cache(maxsize=1) keyed on the raw env-var string, so the malformed-entry WARNING fires once per distinct misconfig value rather than once per inbound command. _clear_security_caches_for_tests is the right escape hatch for fixtures.
Test pins for every R-round fix. test_security_no_dead_exports.py asserts LockoutError is neither defined nor exported (R3), test_security_review_r1.py pins all four R1 fixes including a structural source-text check on the casing-contract comment.

Concerns

Five tracked follow-ups in #239 (buckets A-E) cover real validator gaps, including the policy_host whitespace/control-character preservation that this review's #1 inline flags as the highest-impact remaining concern. The R4 deferral rationale ("decision deferred until consumer-side literal-equality call sites are visible in PR-6") is defensible for the normalisation question, but rejecting CRLF / NUL / control-character payloads at the validator is independent of how downstream normalises and should not wait. Same shape applies to bucket E (HF shape-gate accepts nvidia//repo and nvidia/.): the tightening is one line and has reproducer tests pre-staged.
turn_id / sender_id passthrough at line 619 is unvalidated. Inline-flagged. This contradicts the defence-in-depth rationale immediately above it.
resume.override_code at line 765 accepts control characters and NUL bytes. Inline-flagged. The comment correctly identifies that the string flows into Mesh._resume_lockout and .strip(), but a charset gate matching the peer_id treatment costs one regex.
No CHANGELOG entry, but the module is internal and not yet imported by any consumer in this PR's scope, so this is more of an FYI for the PR-6 / PR-7 review than a blocker here.
(Style) Several class/function docstrings reference cross-PR symbols (Mesh._dispatch, _exec_cmd) that don't exist in this PR's diff. Acknowledged in R1 as deferred to PR-6, but a # TODO(PR-6 #225): marker at each forward-reference would help reviewers grep for the lift later.

Verification suggestions

# Smoke-test the CRLF-injection concern raised in inline #1:
python -c "
from strands_robots.mesh import security as sec
out = sec.validate_command({
    'action': 'execute', 'instruction': 'go',
    'policy_host': 'localhost\r\n',
    'policy_provider': 'mock',
})
print(repr(out['policy_host']))  # currently 'localhost\\r\\n'
"

# Verify turn_id passthrough accepts non-string types (inline #2):
python -c "
from strands_robots.mesh import security as sec
out = sec.validate_command({'action': 'status', 'sender_id': {'evil': 'dict'}})
print(repr(out.get('sender_id')))  # currently {'evil': 'dict'}
"

# HF shape-gate degenerate inputs (inline #3, #239 bucket E):
python -c "
from strands_robots.mesh.security import is_safe_model_path
print(is_safe_model_path('nvidia//repo', hf_only=True))  # currently True
print(is_safe_model_path('nvidia/.', hf_only=True))      # currently True
"

yinsong1986 · 2026-05-27T03:38:41Z

+        # for loopback fast-paths MUST do their own normalisation -- the
+        # validator gates membership, not canonical form. Same contract for
+        # ``server_address`` below.
+        out["policy_host"] = policy_host


policy_host preserved verbatim is a wire-injection vector, not just a forensics trade-off.

is_safe_policy_host strips and lowercases for the allowlist match (line 261), so "localhost\r\n", "localhost\t ", " localhost " all pass — but out["policy_host"] then preserves the original string verbatim. Reproducer:

out = validate_command({ 'action': 'execute', 'instruction': 'go', 'policy_host': 'localhost\r\n', 'policy_provider': 'mock', }) assert out['policy_host'] == 'localhost\r\n' # holds today

The "preserve for forensics" rationale (lines 634-639) is reasonable for case (LocalHost vs localhost) and benign whitespace, but CRLF / NUL / control characters in a host string are never legitimate input — they only arise from malicious payload construction or buggy callers. Any downstream consumer that builds an HTTP/gRPC URL by string-concatenation (f"http://{policy_host}:{port}") inherits a CRLF-header-injection primitive.

The deferral note in #239 bucket A frames this as a normalisation question (validator-side reject vs. dual-emit policy_host_raw), but rejecting control characters at the validator is independent of how downstream chooses to canonicalise — even the "preserve forensics" path can't justify wire-injection-shaped bytes reaching the audit log.

Suggest adding a one-line charset gate at the validator (mirroring _PEER_ID_RE discipline; allow [A-Za-z0-9.:_/-]+ plus brackets for IPv6):

if not _SAFE_HOST_INPUT_RE.fullmatch(str(policy_host)): raise ValidationError(f"policy_host={policy_host!r} contains control or whitespace characters.")

Same contract for server_address at line 700. Per AGENTS.md > Review Learnings (#92) > "Validate before subprocess interpolation… Reject shell metacharacters in paths… \n, \r, \x00."

yinsong1986 · 2026-05-27T03:38:41Z

+    # correlation working. Anything else gets dropped silently.
+    for passthrough in ("turn_id", "sender_id"):
+        if passthrough in cmd:
+            out[passthrough] = cmd[passthrough]


Unvalidated passthrough of turn_id and sender_id defeats the defence-in-depth rationale 7 lines above.

The comment block at lines 602-611 explicitly explains that out is built from the validated subset only "so any future action handler that did **cmd or pulled a not-yet-validated key would silently pick up an attacker-controlled value." Then this loop accepts turn_id and sender_id as-is — including non-string types and arbitrary control bytes:

out = validate_command({ 'action': 'status', 'turn_id': '\x00\nINJECT', 'sender_id': {'evil': 'dict'}, # not even a string }) # turn_id == '\x00\nINJECT', sender_id == {'evil': 'dict'}

These fields flow into _on_cmd / _on_response correlation and very likely into log records. At minimum, both should be type-checked (isinstance(value, str)) and length-bounded. If they're meant to be ULID/UUID-shaped, an explicit charset regex is the right fix; if they really are operator-supplied free-form, then _PEER_ID_RE is a reasonable lower bound. As written, the per-key allowlist immediately above is doing only half its job.

yinsong1986 · 2026-05-27T03:38:41Z

+        # Match parts excluding the trailing-slash artefact (split
+        # produces an empty trailing segment for "org/").
+        non_empty_parts = [seg for seg in parts if seg]
+        if len(non_empty_parts) != 2:


Two-segment shape gate accepts degenerate inputs.

The len(non_empty_parts) != 2 check is a real defence (and the test file test_hf_repo_two_segment_shape.py is a great pin), but the empty-segment filter creates a gap:

is_safe_model_path('nvidia//repo', hf_only=True) # True today is_safe_model_path('nvidia/.', hf_only=True) # True today ('.', not '..') is_safe_model_path('nvidia/./repo', hf_only=True) # False (3 non-empty segments)

Acknowledged in #239 bucket E with a one-line tightening already drafted:

if any(seg in ('', '.', '..') for seg in parts): return False

Given the existing test file already covers nvidia/foo/bar and traversal, folding the additional cases (nvidia//repo, nvidia/., nvidia/./repo) in here costs ~3 lines of code + ~3 parametrised cases and removes a deferred-follow-up. Worth landing in this PR rather than waiting on #239 — the validator's job is to enforce the contract at the wire, and "HF will 404 on these" is exactly the rely-on-downstream-rejection posture R1's review explicitly rejected for etc/passwd-style inputs.

yinsong1986 · 2026-05-27T03:38:41Z

+            raise ValidationError("resume.override_code must be a string")
+        if len(override_code) > 256:
+            raise ValidationError("resume.override_code too long (>256 chars)")
+        out["override_code"] = override_code


resume.override_code accepts NUL / control characters / CRLF.

The field is bounded only by type and length:

out = validate_command({'action': 'resume', 'override_code': '\x00\x01\x02secret\r\nINJECT'}) assert out['override_code'] == '\x00\x01\x02secret\r\nINJECT' # holds today

The comment correctly notes that the value flows into Mesh._resume_lockout and .strip(). Two concerns:

The override-code is operator-supplied and certainly ends up in audit log lines / command_rejected records. Control bytes and CRLF in log records are an own-foot CRLF-injection primitive on log collectors that don't escape — same family as the policy_host concern.

This PR's other secret-shaped fields (source_peer_id, device_name) get a charset+length pair via _PEER_ID_RE + MAX_PEER_ID_LEN. The asymmetry is hard to justify: override_code is at least as security-sensitive as a peer_id.

One extra regex (printable-ASCII only, no whitespace, no controls) closes this; the existing 256-char cap stays.

yinsong1986 · 2026-05-27T03:38:41Z

+#: ``STRANDS_MESH_POLICY_HOST_ALLOW``. Rejects shell metacharacters,
+#: whitespace, NUL bytes, and any byte outside the printable ASCII
+#: subset typical of DNS labels and CIDR ranges.
+_POLICY_HOST_ENTRY_RE = re.compile(r"^[A-Za-z0-9.:/_\-\[\]]+$")


Charset regex admits brackets that downstream is_safe_policy_host cannot match.

The regex allows [ ] (line 174) so an operator can paste STRANDS_MESH_POLICY_HOST_ALLOW="[::1]". The entry survives _validate_env_allowlist_entries, lands in the allowlist, but is_safe_policy_host does literal lowercased compare against host.strip().lower() and ipaddress.ip_address(host_lc) — never against the bracketed form. So [::1] in the env is dead chars (the inbound policy_host would have to also literally be "[::1]" to match, which is not how anyone writes IPv6 in a host field).

Acknowledged in #239 bucket D-1. Two clean fixes (either is fine):

Drop \[\] from _POLICY_HOST_ENTRY_RE so the entry triggers the malformed-entry WARNING and the operator gets a fail-loud signal. That matches the "fail-loud-on-misconfig" posture documented in _validate_env_allowlist_entries's docstring.

Strip brackets in _policy_host_allowlist_cached (entry.strip('[]')) so [::1] works.

Option 1 is more consistent with the surrounding code's posture; option 2 is more user-friendly. Either lands in <5 lines.

… gate Addresses R3 review feedback on the hf_only path-shape gate. The prior non_empty_parts length-2 check accepted three degenerate inputs that the regex / traversal gates above did not catch: is_safe_model_path('nvidia//repo', hf_only=True) # was True is_safe_model_path('nvidia/.', hf_only=True) # was True is_safe_model_path('nvidia/repo/', hf_only=True) # was True HuggingFace would 404 on resolution, but the validator's job is to enforce the <org>/<repo> wire contract at the boundary -- relying on downstream rejection is the same anti-pattern that R1 rejected for the 'nvidia/etc/passwd' case. The reviewer's suggested one-line fix in the hf_only branch closes all three: if any(seg in ('', '.') for seg in parts): return False Scoped to hf_only because legitimate local relative paths like './local/checkpoint' MUST keep passing under hf_only=False (existing test_hf_repo_two_segment_shape.py::test_local_path_branch_unchanged covers this; broadening the reject would have regressed lerobot_local consumers that point model_path at on-disk dirs). Pin: tests/mesh/test_hf_repo_degenerate_segments.py (18 tests across 4 categories). Pre-fix verification per AGENTS.md S0 round-trip: pre-fix HEAD f7fd065: 7 failed, 11 passed post-fix HEAD this: 18 passed The 11 that pre-pass are the canonical-shape regression guards and the local-path-with-dot-segment guards (verifying the fix stays scoped to the hf_only branch). Verification: tests/mesh/test_hf_repo_degenerate_segments.py 18 passed tests/mesh/test_hf_repo_two_segment_shape.py 16 passed tests/mesh/test_validate_command.py + sib pins 118 passed ruff check + format --check clean

…s threads line 640, 619, 765, 174) Adds _SAFE_PASSTHROUGH_RE ([\\x20-\\x7E]+) control-character rejection to all string fields that previously passed through unvalidated: 1. policy_host: rejects CRLF/NUL/C0 after allowlist membership passes (closes wire-injection vector where "localhost\\r\\n" passed membership via strip+lower but stored injection-shaped bytes in out). 2. server_address: same control-char gate. 3. turn_id / sender_id: type-check (must be str), length-bound (MAX_PASSTHROUGH_LEN=128), and charset-validate. Previously accepted dicts, lists, and control-byte strings without any gate. 4. resume.override_code: printable-ASCII-only charset gate. 5. _POLICY_HOST_ENTRY_RE: drop dead bracket chars (\\[\\]) that the allowlist matching code could never match against. Per AGENTS.md > Review Learnings (strands-labs#92): "Reject shell metacharacters in paths... \\n, \\r, \\x00" and "Validate before subprocess interpolation." Pin: tests/mesh/test_security_control_char_gates.py (25 tests across 5 classes). Updated tests/mesh/test_security_review_r1.py to reflect the comment change from "Preserve caller casing" to "Gate control characters."

cagataycali added this to Strands Labs - Robots May 25, 2026

github-project-automation Bot moved this to Backlog in Strands Labs - Robots May 25, 2026

cagataycali moved this from Backlog to In review in Strands Labs - Robots May 25, 2026

yinsong1986 reviewed May 25, 2026

View reviewed changes

Comment thread strands_robots/mesh/security.py Outdated

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

cagataycali and others added 2 commits May 25, 2026 21:45

yinsong1986 reviewed May 26, 2026

View reviewed changes

Comment thread strands_robots/mesh/security.py Outdated

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py Outdated

Comment thread strands_robots/mesh/security.py

cagataycali added security mesh Zenoh mesh networking / fleet coordination labels May 26, 2026

cagataycali mentioned this pull request May 26, 2026

mesh(audit): tamper-evident audit log (HMAC + per-peer seq + rotation) (4/9 of #195 split) #221

Open

sundargthb previously approved these changes May 26, 2026

View reviewed changes

cagataycali dismissed sundargthb’s stale review via bc89c0f May 26, 2026 23:30

yinsong1986 reviewed May 26, 2026

View reviewed changes

Comment thread strands_robots/mesh/security.py Outdated

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

Comment thread strands_robots/mesh/security.py

cagataycali mentioned this pull request May 27, 2026

security(mesh): R2/R3 design follow-ups for security.py — host normalisation, HF defaults, policy_provider coupling #239

Open

yinsong1986 reviewed May 27, 2026

View reviewed changes

strands-coder and others added 2 commits May 27, 2026 05:37

Conversation

cagataycali commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's in this PR

Reviewer focus

Tests

Stacking note

§13 R-round changelog

R6 (2026-05-27 ~05:4X UTC) — c058814

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

Summary

What's good

Concerns

Verification suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

Summary

What's good

Concerns

Verification suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sundargthb left a comment

Choose a reason for hiding this comment

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

Summary

What's good

Concerns

Verification suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cagataycali commented May 27, 2026

🎯 Pentest evidence for this PR (#223 — payload validation)

Finding landing in this PR

Pinned regression test waiting for this fix

Posture confirmed by this PR

Uh oh!

yinsong1986 left a comment

Choose a reason for hiding this comment

Summary

What's good

Concerns

Verification suggestions

Uh oh!

yinsong1986 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

yinsong1986 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

yinsong1986 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

yinsong1986 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

yinsong1986 May 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

cagataycali commented May 25, 2026 •

edited

Loading

R6 (2026-05-27 ~05:4X UTC) — `c058814`