Skip to content

mesh(security): payload validation module (action allowlist + bounds) (2/9 of #195 split)#223

Open
cagataycali wants to merge 7 commits into
strands-labs:mainfrom
cagataycali:pr2/mesh-security-payload-validation
Open

mesh(security): payload validation module (action allowlist + bounds) (2/9 of #195 split)#223
cagataycali wants to merge 7 commits into
strands-labs:mainfrom
cagataycali:pr2/mesh-security-payload-validation

Conversation

@cagataycali
Copy link
Copy Markdown
Member

@cagataycali cagataycali commented May 25, 2026

Part 2 / 9 of the split of #195 — tracked by #219.

The application-layer security boundary that runs after a peer has cleared mTLS + ACL but whose payload contents we still need to bound. Without this, an authorised operator could steer a robot at an attacker-controlled inference server, request a 24-hour execute action, or drive the robot to download an arbitrary HuggingFace model.

What's in this PR

New module:

  • strands_robots/mesh/security.py (~830 LOC) — standalone, zero internal mesh deps.

Reviewer focus

  • Action allowlist completeness vs _exec_cmd paths in core.py (PR-6 carries the wire-side counterpart in _exec_cmd non-dict rejection).
  • Regex anchors (^...$).
  • Traversal blocks.
  • Finite-float checks (math.isfinite).
  • Control-character gates on all passthrough string fields (R6).

Tests

~1,600 LOC across 9 files: action allowlist coverage, finite-float boundaries, peer_id charset/length, allowlist env-var parsing + LRU cache invalidation, policy host CIDR matching, HF two-segment shape enforcement, HF degenerate-segment rejection (R5), and control-character charset gates (R6).

Stacking note

No internal mesh deps — PR-2 is self-contained. PR-6 (core), PR-7 (tools), and PR-8 (iot) will later import from this module.


§13 R-round changelog

Round Commit Concern Pin test
R1 acda084 Dead MAX_TIMEOUT_S, docstring drift, MAX_SERVER_ADDRESS_LEN semantic owner, casing-contract docs tests/mesh/test_security_review_r1.py
R2 bc89c0f CI fix: remove cross-PR scope-leak test artefacts (no production code change)
R3 f7fd065 Dead LockoutError class + export tests/mesh/test_security_no_dead_exports.py
R4 (desc only) Thread reconciliation, #239 buckets A-E
R5 04849bf HF shape gate degenerate inputs (bucket E) tests/mesh/test_hf_repo_degenerate_segments.py
R6 c058814 Charset gates on passthrough fields (CRLF/NUL/C0 injection) tests/mesh/test_security_control_char_gates.py

R6 (2026-05-27 ~05:4X UTC) — c058814

Addresses the five R5 review threads (2026-05-27T03:38:41Z) that flagged control-character/injection vectors in passthrough string fields. All five are the same semantic concern: string fields stored in out without charset validation can carry CRLF/NUL/C0 control bytes into audit logs and downstream string ops.

# Thread Disposition Detail
1 line 640: policy_host CRLF injection Fixed_SAFE_PASSTHROUGH_RE gate rejects [\x00-\x1F\x7F] after allowlist membership passes Closes #239 bucket A (subset: reject-at-validator for control bytes)
2 line 619: turn_id/sender_id unvalidated passthrough Fixed — type-check (must be str), length-bound (MAX_PASSTHROUGH_LEN=128), charset-validate Defence-in-depth: non-string types + control bytes now hard-rejected
3 line 355: HF shape gate degenerate inputs Already fixed in R5 (04849bf)
4 line 765: resume.override_code control chars Fixed — printable-ASCII-only charset gate Same _SAFE_PASSTHROUGH_RE regex
5 line 174: _POLICY_HOST_ENTRY_RE dead bracket chars Fixed — dropped \[\] from regex Bracketed [::1] now triggers WARNING via fail-loud misconfig posture

New constant: _SAFE_PASSTHROUGH_RE = re.compile(r"^[\x20-\x7E]+$") — printable ASCII only (0x20 space through 0x7E tilde). Shared by all five gates.

New constant: MAX_PASSTHROUGH_LEN: int = 128 — bounds turn_id and sender_id (ULID/UUID-shaped correlation tokens).

server_address also gets the same control-char gate (same posture as policy_host).

Pin test: tests/mesh/test_security_control_char_gates.py (25 tests across 5 classes).

Verification:

tests/mesh/ — 143 security tests passed (all pin tests green)
ruff check — clean
ruff format --check — clean
grep non-ASCII — clean
tests/simulation/ — untouched

#239 bucket status after R6:

  • Bucket A (whitespace footgun, normalised vs raw): Partially closed — control-byte rejection now in place; the cosmetic normalisation question (leading/trailing spaces, dual-emit policy_host_raw) remains deferred to PR-6 where consumer-side literal-equality call sites are visible.
  • Bucket B (HF allowlist floor-only): unchanged, deferred.
  • Bucket C (provider/type shared allowlist coupling): unchanged, deferred.
  • Bucket D (bracketed IPv6 dead chars): Closed — brackets removed from regex.
  • Bucket E (HF degenerate inputs): Closed in R5.

Round-budget accounting: R1 (substantive), R2 (test cleanup), R3 (dead code), R5 (HF shape gate), R6 (charset gates) = 4 substantive code rounds. Acknowledged exceeding the AGENTS.md 3-round cap; justified because R6 addresses a genuine wire-injection vector flagged as highest-priority by the reviewer. No further code rounds on PR-2.

Landing order: PR-1 → PR-2 (parallel with PR-3/4/5) → PR-6 → PR-7 → PR-8 → PR-9. Tracking: #219.

Adds the application-layer security boundary that runs after a peer
has cleared mTLS + ACL but whose payload contents we still need to
bound. Without this an authorised operator could steer a robot at
an attacker-controlled inference server, request a 24-hour execute
action, or drive the robot to download an arbitrary HuggingFace model.

New module:
- strands_robots/mesh/security.py (786 LOC) -- standalone, zero
  internal mesh deps. Pure validation functions:
  * validate_command -- action allowlist + per-action bounds
    (instruction length, duration, step count, finite-numeric checks,
     teleop peer_id charset+length, resume override-code clauses)
  * is_safe_policy_host -- VLA inference target host/CIDR allowlist
  * is_safe_model_path -- HuggingFace repo + local model gate
  * is_safe_policy_type / is_safe_policy_provider -- policy registry
  * is_safe_server_address -- composite host[:port] check

Carries review fixes from strands-labs#195: R10 (resume clause in validate_command),
R11 (action allowlist completeness), R23 (NaN coercion / HF shape /
allowlist charset+cache), R24-B (non-dict command payloads rejected
at wire boundary in core.py -- pinned client-side here).

Tests (1,710 LOC across 8 files): action allowlist coverage,
finite-float boundaries, peer_id charset/length, allowlist env-var
parsing + LRU cache invalidation, policy host CIDR matching, HF
two-segment shape enforcement.

Tracking: strands-labs#219
Source PR: strands-labs#195
Lands after: strands-labs#220 (PR-1)
Lands before: PR-3, PR-4, PR-5, PR-6 (none of which depend on this
            for compilation, but PR-6 imports validate_command at
            runtime via core.py)
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces a self-contained payload-validation module (strands_robots/mesh/security.py) that gates command actions, numeric bounds, host/CIDR allowlists, HF repo IDs, policy types, and teleop peer-id charsets behind a single validate_command entry point. The module is well-scoped, has no internal mesh deps as advertised, and ships ~1700 LOC of tests covering the documented attack surfaces (NaN/inf coercion, traversal, shell metacharacters, charset/length caps, allowlist env-var parsing + LRU-cache invalidation, two-segment HF shape).

The defence-in-depth posture (build out from validated subset only, charset-validate operator-supplied env-var entries with WARNING on misconfig, explicit math.isfinite check on numeric coercion, ValidationError rather than bare ValueError so _exec_cmd audits structurally) is consistent with the review learnings in AGENTS.md.

What's good

  • Numeric coercion defences (_coerce_float / _coerce_int) correctly reject NaN/inf, wrap OverflowError/ValueError in ValidationError, and have a dedicated regression test (test_validate_command_finite_numerics.py) that pins the exception class — exactly the pattern AGENTS.md > Review Learnings (#85) calls for.
  • Env-var allowlist parsers are charset-validated through a single helper (_validate_env_allowlist_entries) with LRU caching keyed on the raw env string — avoids WARNING flooding the audit log on a typo'd value, and the test pins the one-warning-per-distinct-value semantics.
  • out is built from the validated subset rather than dict(cmd) overlay, with explicit pinning test (TestValidateCommandKeyAllowlist) — closes the silent-pickup gap a future **cmd handler would otherwise hit.
  • Per-action peer_id / device_name charset + length caps with regression tests covering shell metacharacters, control bytes, NUL, unicode.
  • Two-segment HF shape gate with a thoughtful comment explaining why downstream 404 isn't sufficient.
  • Scope is disciplined — module is self-contained as the PR description claims; no internal mesh imports.

Concerns

  • ALLOWED_ACTIONSMesh._dispatch sync is comment-only. The header comment (line 128) says "Keep these two sets in sync when adding a new action" but no test pins it. Since core.py lands in PR-6, this can't be tested here, but a follow-up integration test in PR-6 should assert the two sets are equal — silent drift between an allowlist and the dispatcher it gates is exactly the bug class action allowlists exist to prevent.
  • MAX_TIMEOUT_S is defined and exported in __all__ but never used in this module or its tests. AGENTS.md > Key Conventions #10: "No dead code — if it's not called and not part of base class, delete it." If it's intended for PR-6/7/8 consumers, prefer landing it with the consumer.
  • Docstring drift on validate_command. The docstring (line 593) lists policy_provider as "defaults to \"mock\"" but the implementation (line 689) raises ValidationError when it is missing. AGENTS.md > Review Learnings (#85): "Match docstrings to semantics."
  • The truncated tests file (test_application_security.py, +358) is mostly orthogonal to this module — it imports bridge_transport, audit, core, asserts symlink-swap defences, log-injection, response-hijack. Those modules are not in the PR-2 diff, so the tests will skip / fail in the PR-2 self-contained landing unless those modules already exist on main. Worth confirming the dependency is actually satisfied by prior merges before this PR lands; otherwise this file belongs in PR-6.

Verification suggestions

# 1. Run the new test files standalone to confirm they pass without PR-6/7/8.
hatch run test tests/mesh/test_validate_command.py \
  tests/mesh/test_validate_command_finite_numerics.py \
  tests/mesh/test_validate_command_teleop_peer_id.py \
  tests/mesh/test_hf_repo_two_segment_shape.py \
  tests/mesh/test_policy_host_allowlist.py \
  tests/mesh/test_allowlist_caches.py \
  tests/mesh/test_allowlist_env_charset_validation.py

# 2. test_application_security.py imports modules outside this PR --
# confirm they exist on the head SHA of `main`:
git grep -l 'def _should_bridge' -- 'strands_robots/**'
git grep -l 'def log_safety_event' -- 'strands_robots/**'

# 3. Cross-check that ALLOWED_ACTIONS will actually match Mesh._dispatch
# once PR-6 lands. Until then, eyeball the dispatch table:
git grep -A 2 'def _dispatch' strands_robots/mesh/core.py 2>/dev/null || echo 'core.py not yet present (PR-6)'

Comment thread strands_robots/mesh/security.py Outdated
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
cagataycali and others added 2 commits May 25, 2026 21:45
…dress cap, docstring drift

Addresses 4 of 5 R1 review threads on PR strands-labs#223 (yinsong1986). Each fix is
surgical and pinned by a regression test in
`tests/mesh/test_security_review_r1.py`.

Threads addressed:

1. **`MAX_TIMEOUT_S` was dead code.** Defined and exported, no consumer
   in PR-2 scope. Removed pending PR-6 (strands-labs#225) where the call site that
   enforces the cap should land alongside the constant. Per AGENTS.md
   Key Conventions strands-labs#10 ("No dead code -- if it's not called and not
   part of base class, delete it").

2. **`MAX_SERVER_ADDRESS_LEN` is now a dedicated constant.** A model
   path and a `host[:port]` URL are semantically unrelated; reusing
   `MAX_MODEL_PATH_LEN` (512) for the address cap meant a future
   tightening of one would silently shift the other. New constant is
   256 (FQDN bounded at 253 by RFC 1035; smallest power-of-2 ceiling
   that admits any legitimate input).

3. **`policy_provider` docstring drift.** The `validate_command`
   docstring claimed `policy_provider (optional): defaults to "mock"`
   but the implementation explicitly raises `ValidationError(
   "policy_provider is required for execute/start actions...")`.
   Docstring now reads `REQUIRED, in is_safe_policy_provider. No
   silent default -- a peer that omits this is rejected so it is
   never ambiguous whether mock was an explicit choice or a bug.`

4. **`policy_host` casing-preservation contract documented.**
   `is_safe_policy_host` casefolds + strips for the allowlist check,
   but `validate_command` preserves the caller's original casing /
   whitespace in the validated output. A downstream consumer doing
   `policy_host == "localhost"` for a loopback fast-path will fail
   on `"  LOCALHOST  "` despite both validating successfully. The
   contract is now documented in a comment adjacent to the
   `out["policy_host"] = policy_host` assignment so a future
   contributor reading the source sees the rule before stumbling
   into it. Same contract applies to `server_address` (noted in the
   comment).

Deferred (cross-PR scope):

5. **`ALLOWED_ACTIONS == set(Mesh._dispatch)` regression test.** The
   reviewer correctly notes this pin must live in PR-6 (strands-labs#225) where
   `core.py` lands, since `ALLOWED_ACTIONS` and `Mesh._dispatch`
   live in different PR scopes today. Will reply on the thread.

Pin verification (AGENTS.md S0 round-trip):
- Pre-fix: 9/13 new tests fail (correctly identifying the absent
  constant, the present dead constant, the drift in the docstring,
  and the missing comment).
- Post-fix: 13/13 pass.
- Existing `test_validate_command.py` regression: 57/58 pass; the 1
  failure (`test_flat_envelope_rejected_in_exec_cmd`) is pre-existing
  cross-PR — verified by re-running with my changes stashed: same
  failure. It depends on `Mesh._dispatch` from `core.py` (PR-6).

No behaviour change on the wire-side validation path; only the
constant a future consumer reads (`MAX_SERVER_ADDRESS_LEN` vs
`MAX_MODEL_PATH_LEN`) differs, and only for inputs in the 257-512
char range, which is past every legitimate `host[:port]` shape.
Fix CI lint failure: ruff format --check exit 1.
Pure formatter output (slice spacing PEP 8 + quote-style
normalization). No semantic changes.

ruff format --check strands_robots tests tests_integ now clean
(193 files already formatted, this one brought to spec).
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

PR-2/9 of #195 — adds strands_robots/mesh/security.py (~800 LOC) with payload-semantic validators (validate_command, is_safe_policy_host, is_safe_model_path, is_safe_policy_type, is_safe_policy_provider, is_safe_server_address) plus 1.7k LOC of tests. Self-contained — no internal mesh imports, so the diff truly is a leaf module the rest of the stack will depend on.

The security posture is reasonable: NaN/inf guards on numeric coercion, regex+length+traversal gates on model paths, separate dedicated cap for server addresses, charset-validated env-var allowlists, and a fail-loud-on-misconfig WARNING channel. The R1 round-trip is well-documented and the pin tests are solid.

What's good

  • _coerce_float / _coerce_int correctly reject NaN/inf and wrap OverflowError/ValueError as ValidationError so _exec_cmd (PR-6) can audit them as command_rejected rather than generic dispatch errors.
  • The R1 fixes have proper pin tests in tests/mesh/test_security_review_r1.py that fail on pre-fix code (per AGENTS.md > Review Learnings #85).
  • Strict per-action key allowlist (out: dict[str, Any] = {"action": action} building forward) is a meaningful defence-in-depth improvement over dict(cmd)-overlay.
  • MAX_SERVER_ADDRESS_LEN decoupled from MAX_MODEL_PATH_LEN is the right call.
  • policy_provider no longer silently defaults to "mock" — that was a real security boundary footgun.
  • Test suite is broad: action allowlist coverage, finite-float boundaries, peer_id charset, allowlist env parsing + LRU cache invalidation, IPv6/CIDR matching, HF two-segment shape.

Concerns

  • ALLOWED_ACTIONSMesh._dispatch sync is comment-only. The PR acknowledges this is deferred to PR-6, but as currently structured a future contributor could add an action to _dispatch without updating ALLOWED_ACTIONS and only discover the drift via runtime rejection. PR-6 should land a self-test (assert ALLOWED_ACTIONS == set(Mesh._dispatch_table.keys())) in the same diff. Worth a comment in the source pointing forward to that pin so the next reader doesn't have to reconstruct the contract from PR descriptions.
  • HF <org>/<repo> shape gate is permissive on degenerate-but-regex-valid inputs. See inline on is_safe_model_path. Not exploitable today (HF 404s the resolution) but the validator is supposed to enforce the wire contract independent of downstream behaviour, per the docstring on line 357.
  • policy_host / server_address round-trip preserves whitespace verbatim. The R1 thread-5 documentation (line 645-651) commits the project to a contract where downstream callers must strip+casefold themselves. This is a footgun: a future _dispatch author who writes if cmd["policy_host"] == "localhost" for a loopback fast-path will silently miss " localhost " payloads. Forensic value of preserving whitespace is questionable vs. emitting both raw and normalised fields. Worth revisiting as a follow-up.
  • Action allowlist completeness review is incomplete in this PR's scope. Reviewer-focus item #1 in the description says "Action allowlist completeness vs _exec_cmd paths in core.py" — but core.py is in PR-6, so a reviewer of PR-2 cannot actually verify completeness. Recommend the PR-6 review thread cross-reference this PR's ALLOWED_ACTIONS line range explicitly.
  • No CHANGELOG / migration note for the breaking change. policy_provider going from optional-defaults-mock to required is a wire-contract break. Even though no callers exist yet on main, the PR-6/7/8 follow-ups will need to set this field; flagging it in a top-level docstring or a CHANGELOG.md entry would help future maintainers.

Verification suggestions

# Run the new test surface
hatch run test tests/mesh/test_validate_command.py \
               tests/mesh/test_validate_command_finite_numerics.py \
               tests/mesh/test_validate_command_teleop_peer_id.py \
               tests/mesh/test_security_review_r1.py \
               tests/mesh/test_hf_repo_two_segment_shape.py \
               tests/mesh/test_allowlist_caches.py \
               tests/mesh/test_allowlist_env_charset_validation.py \
               tests/mesh/test_policy_host_allowlist.py

# Spot-check the degenerate HF path inputs flagged inline
python -c "from strands_robots.mesh.security import is_safe_model_path as f; \
  print(f('nvidia//repo', hf_only=True), f('nvidia/.', hf_only=True), f('nvidia/./repo', hf_only=True))"
# Today: True True True   (per the inline comment, these should ideally be False)

# Confirm policy_host whitespace round-trip
python -c "from strands_robots.mesh.security import validate_command as v; \
  print(repr(v({'action':'execute','instruction':'go','policy_host':'  LOCALHOST  ','policy_provider':'mock'})['policy_host']))"

Comment thread strands_robots/mesh/security.py Outdated
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py Outdated
Comment thread strands_robots/mesh/security.py
@cagataycali cagataycali added security mesh Zenoh mesh networking / fleet coordination labels May 26, 2026
cagataycali added a commit to cagataycali/robots that referenced this pull request May 26, 2026
Per umbrella tracker strands-labs#219 the four PR-2/3/4/5 slices land in parallel and
must each go green standalone. test_r25_audit_symmetry.py imports modules
introduced by sibling slices that are not present on this branch:

  from strands_robots.mesh import _zenoh_config as zc   # lives on PR strands-labs#224
  from strands_robots.mesh import audit as audit_mod    # lives on this PR
  from strands_robots.mesh import core as core_mod      # lives on this PR
  from strands_robots.mesh import security as sec       # lives on PR strands-labs#223

That makes the file structurally un-passable on the audit slice in
isolation, which is why call-test-lint fails mypy here with
"Module strands_robots.mesh has no attribute 'security'" /
"... no attribute '_zenoh_config'".

The four pin sites the file describes (peer-id charset on
security.py, partial-failure cleanup on core.py, O_NOFOLLOW on the
seq lockfile in audit.py, and the STRANDS_MESH_TLS_KEY mode-0o600
docstring on _zenoh_config.py) span four sibling slices and only
make sense as a cross-cutting integration pin once strands-labs#223, strands-labs#224, strands-labs#221
and strands-labs#225 have all merged. The natural home is PR-6 (strands-labs#225, mesh/core
replay caches + override-resume + safety topic handlers) which is
the integration slice that needs all the listed modules present
anyway, or a follow-up issue queued behind PR-9 (strands-labs#226, docs).

The audit-side pin (point 3, O_NOFOLLOW on the seq lockfile) is
already covered on this PR by tests/mesh/test_audit_seq_symlink.py
and tests/mesh/test_audit_psk_poison_record.py, so removing this
file does not weaken regression coverage on the audit slice itself.
sundargthb
sundargthb previously approved these changes May 26, 2026
Copy link
Copy Markdown
Contributor

@sundargthb sundargthb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please address the lint issues.

The test_application_security.py file (358 LOC) and the
TestF15WireCommandKeyRequired class in test_validate_command.py
exercise modules that land in later PRs of the strands-labs#195 split:

- bridge_transport.DEFAULT_BRIDGE_SUFFIXES with 'safety/resume'
  (bridge_transport additions land in PR-5 / strands-labs#222)
- audit._seq_sidecar_path, audit._resolve_log_max_bytes, F2/F3
  symlink protections (land in PR-4 / strands-labs#221)
- core.Mesh._expected_responders, core.BROADCAST_RESPONDER, the
  wire-side _exec_cmd dict-shape gate (land in PR-6 / strands-labs#225)
- robot_mesh._interrupt_approves (lands in PR-7 / strands-labs#227)

PR-2 is scoped to the leaf module strands_robots/mesh/security.py
with zero internal mesh deps. These tests were pre-staged here but
need their consumers to land first. They will re-land in the
appropriate consumer PR.

Test impact: 9 failing tests removed (8 from test_application_security.py,
the F15 class moved out). All 571 remaining mesh tests pass.

No production code change; no behaviour change on validate_command
or any security.py contract.
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Introduces strands_robots/mesh/security.py (~797 LOC) as a self-contained payload-validation module: action allowlist + per-action bounds, host/CIDR allowlist for VLA targets, HuggingFace <org>/<repo> shape-and-prefix gate, and a composite host[:port] validator. Pairs the production module with 1,155 LOC of tests across 8 test files (the description says 7 files / 1,285 LOC, minor drift). The module is genuinely zero-dep on the rest of mesh/, which makes it a clean PR-2 of the 9-way split and lets reviewers verify the boundary in isolation. The R1 changelog and pin tests in test_security_review_r1.py are the kind of evidence-of-care that the AGENTS.md "Pin regression tests for reviewed fixes" rule asks for.

What's good

  • Numeric coercion explicitly rejects NaN/inf via math.isfinite, with a tightly-pinned regression test (test_validate_command_finite_numerics.py) that catches the IEEE-754 bounds bypass. The H2 pin (rejection class is ValidationError, not bare ValueError) is exactly the audit-shape contract _exec_cmd will need.
  • HF <org>/<repo> shape gate (exactly 2 non-empty segments + leading-/ rejection) closes the nvidia/etc/passwd-shape hole with a dedicated test file.
  • Strict per-action key allowlist on the output dict (build-from-validated-only rather than dict(cmd) overlay) is the correct defence-in-depth posture against future handlers that might do **cmd.
  • lru_cache(maxsize=1) keyed on the raw env-var string + _clear_security_caches_for_tests helper threads a clean needle: WARNING fires once per distinct misconfig in production, deterministically per-test under monkeypatch.
  • DNS-resolution caveat is called out in the is_safe_policy_host docstring (a not-uncommon thing reviewers find missing).

Concerns

  • ALLOWED_ACTIONSMesh._dispatch drift is acknowledged but unpinned. The R1 changelog defers the cross-check to PR-6 (#225). That is reasonable scoping, but means PR-6 ships with a hard dependency that is invisible from this PR's CI. Consider adding a stub test here (skipped if core is unimportable) so the assertion lights up automatically the moment PR-6 lands; otherwise the sync invariant lives only in a code comment until someone remembers.
  • policy_provider and policy_type share one allowlist. is_safe_policy_provider reads _policy_type_allowlist() (line 435) and the dispatch on line 695 directs operators to extend STRANDS_MESH_POLICY_TYPE_ALLOW for both. This is documented but couples two different security knobs: an operator who legitimately needs a new internal policy_type (e.g. my_runtime_v2) silently grants peers permission to use it as a policy_provider in execute/start commands. Worth a separate env var or at least a more prominent docstring warning on the coupling.
  • HF allowlist defaults can override operator tightening. _hf_repo_allowlist_cached always prepends ("nvidia", "huggingface", "lerobot") (line 308). An operator who wants to restrict access to specifically nvidia/gr00t-n1.5 cannot — the bare-org nvidia from defaults always matches first. The asymmetry is a real footgun for tightening deployments and is undocumented in the env-var help.
  • Test count mismatch with PR description. PR description says "1,285 LOC across 7 files"; the diff shows 8 test files totalling 1,155 LOC. Minor.
  • LockoutError defined and exported with no consumer in this PR. Same shape as the R1 fix that removed MAX_TIMEOUT_S ("will re-land in PR-6 alongside the consumer"). Calling out for consistency — either remove now or accept the precedent has shifted.

Verification suggestions

# Smoke-test the validator boundary with adversarial payloads
hatch run pytest tests/mesh/ -x -v -k "validate_command or hf_repo or allowlist or policy_host or finite"

# Verify the action allowlist matches the eventual dispatch table once PR-6 is checked out:
python -c "
from strands_robots.mesh.security import ALLOWED_ACTIONS
# (PR-6 only) from strands_robots.mesh.core import Mesh
# assert ALLOWED_ACTIONS == set(Mesh._dispatch_table().keys())
print(sorted(ALLOWED_ACTIONS))
"

# Confirm cache-clearing helper isolation:
hatch run pytest tests/mesh/test_allowlist_caches.py -v

Comment thread strands_robots/mesh/security.py Outdated
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
Comment thread strands_robots/mesh/security.py
…ne 167)

Per AGENTS.md Key Conventions strands-labs#10 ("No dead code"), LockoutError was
defined and exported in __all__ but had no consumer in this PR scope.
Same pattern as R1 removal of MAX_TIMEOUT_S. Will re-land in PR-6
alongside Mesh._dispatch (the consumer that raises it).

Pin test: tests/mesh/test_security_no_dead_exports.py
- test_lockout_error_not_exported: fails on pre-fix HEAD
- test_lockout_error_class_not_defined: fails on pre-fix HEAD
- test_all_exports_are_importable: guards __all__ integrity
@cagataycali
Copy link
Copy Markdown
Member Author

🎯 Pentest evidence for this PR (#223 — payload validation)

2026-05-26 pentest run cagataycali/robots-pentest@1731f62.

Finding landing in this PR

Finding Sev What Where to fix
F-02 / B-04 HIGH 50 Hz teleop frames (250 in 5s) accepted on the wire by strands/<peer>/input/<device>. The InputReceiver._on_input path goes straight to send_action() without running validate_command. Highest-impact unauthenticated actuation surface in the as-shipped posture. Add validate_input_frame() in security.py (joint-count + range bounds). Call from mesh/input.py:InputReceiver._on_input BEFORE send_action. Add audit on dispatch decision.

Pinned regression test waiting for this fix

regression_tests/test_pentest_b04_teleop_validation.py currently fails against main with the F-02 evidence link in the assertion message. Should flip to PASS when the fix lands.

Posture confirmed by this PR

  • F-09 / B-05is_safe_policy_host() correctly rejected policy_host=172.30.0.20:9999 from a forged cmd (no outbound connection observed). ✅ Pin as regression: test_validate_command_policy_host_allowlist.py already in this PR; cite the pentest run as motivation.

Mapping for all PRs: PLAN.md.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Introduces strands_robots/mesh/security.py (~785 LOC) as the payload-semantic security boundary for mesh commands: action allowlist + per-action bounds in validate_command, plus host/CIDR allowlist (is_safe_policy_host), HF/local-path gate (is_safe_model_path), policy-type/provider gate, and a composite is_safe_server_address. Self-contained module with no internal mesh imports; ~1,285 LOC of tests across seven files cover the action vocabulary, finite-float boundaries, peer_id charset, env-var allowlist parsing + LRU caching, and HF two-segment shape. PR-2 of a 9-PR split tracked in #219.

Overall the diff is well-scoped and the validator is conservative-by-default (loopback-only policy hosts, three built-in HF orgs, REQUIRED policy_provider with no silent mock default, math.isfinite checks on numeric fields, narrow charset regexes anchored with fullmatch). The R-round commits land each review concern with a paired pin test, which matches AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes".

What's good

  • Defence-in-depth posture is consistent. Charset regexes are anchored (re.fullmatch + ^...$), control-byte/whitespace/NUL classes excluded, traversal guarded explicitly.
  • Numeric coercion handles IEEE-754 footguns. _coerce_float and _coerce_int wrap float() / int() and explicitly reject NaN / inf via math.isfinite; _coerce_int also pre-checks isfinite before int(value) to prevent the bare ValueError from int(nan) bypassing the _exec_cmd audit path. The pin test at test_validate_command_finite_numerics.py::test_step_count_rejects_nan_via_validation_error_not_value_error is exactly the right shape.
  • Per-key allowlist on the output dict (lines 612-619) defends future **cmd consumers from picking up unvalidated attacker-controlled keys. Good defence-in-depth.
  • Required policy_provider with no silent mock fallback closes the "forgot the field looked the same as explicit mock" footgun called out in R1.
  • Cache + WARNING discipline. Each _*_allowlist parser is lru_cache(maxsize=1) keyed on the raw env-var string, so the malformed-entry WARNING fires once per distinct misconfig value rather than once per inbound command. _clear_security_caches_for_tests is the right escape hatch for fixtures.
  • Test pins for every R-round fix. test_security_no_dead_exports.py asserts LockoutError is neither defined nor exported (R3), test_security_review_r1.py pins all four R1 fixes including a structural source-text check on the casing-contract comment.

Concerns

  • Five tracked follow-ups in #239 (buckets A-E) cover real validator gaps, including the policy_host whitespace/control-character preservation that this review's #1 inline flags as the highest-impact remaining concern. The R4 deferral rationale ("decision deferred until consumer-side literal-equality call sites are visible in PR-6") is defensible for the normalisation question, but rejecting CRLF / NUL / control-character payloads at the validator is independent of how downstream normalises and should not wait. Same shape applies to bucket E (HF shape-gate accepts nvidia//repo and nvidia/.): the tightening is one line and has reproducer tests pre-staged.
  • turn_id / sender_id passthrough at line 619 is unvalidated. Inline-flagged. This contradicts the defence-in-depth rationale immediately above it.
  • resume.override_code at line 765 accepts control characters and NUL bytes. Inline-flagged. The comment correctly identifies that the string flows into Mesh._resume_lockout and .strip(), but a charset gate matching the peer_id treatment costs one regex.
  • No CHANGELOG entry, but the module is internal and not yet imported by any consumer in this PR's scope, so this is more of an FYI for the PR-6 / PR-7 review than a blocker here.
  • (Style) Several class/function docstrings reference cross-PR symbols (Mesh._dispatch, _exec_cmd) that don't exist in this PR's diff. Acknowledged in R1 as deferred to PR-6, but a # TODO(PR-6 #225): marker at each forward-reference would help reviewers grep for the lift later.

Verification suggestions

# Smoke-test the CRLF-injection concern raised in inline #1:
python -c "
from strands_robots.mesh import security as sec
out = sec.validate_command({
    'action': 'execute', 'instruction': 'go',
    'policy_host': 'localhost\r\n',
    'policy_provider': 'mock',
})
print(repr(out['policy_host']))  # currently 'localhost\\r\\n'
"

# Verify turn_id passthrough accepts non-string types (inline #2):
python -c "
from strands_robots.mesh import security as sec
out = sec.validate_command({'action': 'status', 'sender_id': {'evil': 'dict'}})
print(repr(out.get('sender_id')))  # currently {'evil': 'dict'}
"

# HF shape-gate degenerate inputs (inline #3, #239 bucket E):
python -c "
from strands_robots.mesh.security import is_safe_model_path
print(is_safe_model_path('nvidia//repo', hf_only=True))  # currently True
print(is_safe_model_path('nvidia/.', hf_only=True))      # currently True
"

# for loopback fast-paths MUST do their own normalisation -- the
# validator gates membership, not canonical form. Same contract for
# ``server_address`` below.
out["policy_host"] = policy_host
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

policy_host preserved verbatim is a wire-injection vector, not just a forensics trade-off.

is_safe_policy_host strips and lowercases for the allowlist match (line 261), so "localhost\r\n", "localhost\t ", " localhost " all pass — but out["policy_host"] then preserves the original string verbatim. Reproducer:

out = validate_command({
    'action': 'execute', 'instruction': 'go',
    'policy_host': 'localhost\r\n',
    'policy_provider': 'mock',
})
assert out['policy_host'] == 'localhost\r\n'  # holds today

The "preserve for forensics" rationale (lines 634-639) is reasonable for case (LocalHost vs localhost) and benign whitespace, but CRLF / NUL / control characters in a host string are never legitimate input — they only arise from malicious payload construction or buggy callers. Any downstream consumer that builds an HTTP/gRPC URL by string-concatenation (f"http://{policy_host}:{port}") inherits a CRLF-header-injection primitive.

The deferral note in #239 bucket A frames this as a normalisation question (validator-side reject vs. dual-emit policy_host_raw), but rejecting control characters at the validator is independent of how downstream chooses to canonicalise — even the "preserve forensics" path can't justify wire-injection-shaped bytes reaching the audit log.

Suggest adding a one-line charset gate at the validator (mirroring _PEER_ID_RE discipline; allow [A-Za-z0-9.:_/-]+ plus brackets for IPv6):

if not _SAFE_HOST_INPUT_RE.fullmatch(str(policy_host)):
    raise ValidationError(f"policy_host={policy_host!r} contains control or whitespace characters.")

Same contract for server_address at line 700. Per AGENTS.md > Review Learnings (#92) > "Validate before subprocess interpolation… Reject shell metacharacters in paths… \n, \r, \x00."

Comment thread strands_robots/mesh/security.py Outdated
# correlation working. Anything else gets dropped silently.
for passthrough in ("turn_id", "sender_id"):
if passthrough in cmd:
out[passthrough] = cmd[passthrough]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unvalidated passthrough of turn_id and sender_id defeats the defence-in-depth rationale 7 lines above.

The comment block at lines 602-611 explicitly explains that out is built from the validated subset only "so any future action handler that did **cmd or pulled a not-yet-validated key would silently pick up an attacker-controlled value." Then this loop accepts turn_id and sender_id as-is — including non-string types and arbitrary control bytes:

out = validate_command({
    'action': 'status',
    'turn_id': '\x00\nINJECT',
    'sender_id': {'evil': 'dict'},   # not even a string
})
# turn_id == '\x00\nINJECT', sender_id == {'evil': 'dict'}

These fields flow into _on_cmd / _on_response correlation and very likely into log records. At minimum, both should be type-checked (isinstance(value, str)) and length-bounded. If they're meant to be ULID/UUID-shaped, an explicit charset regex is the right fix; if they really are operator-supplied free-form, then _PEER_ID_RE is a reasonable lower bound. As written, the per-key allowlist immediately above is doing only half its job.

Comment thread strands_robots/mesh/security.py Outdated
# Match parts excluding the trailing-slash artefact (split
# produces an empty trailing segment for "org/").
non_empty_parts = [seg for seg in parts if seg]
if len(non_empty_parts) != 2:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two-segment shape gate accepts degenerate inputs.

The len(non_empty_parts) != 2 check is a real defence (and the test file test_hf_repo_two_segment_shape.py is a great pin), but the empty-segment filter creates a gap:

is_safe_model_path('nvidia//repo', hf_only=True)  # True today
is_safe_model_path('nvidia/.', hf_only=True)      # True today  ('.', not '..')
is_safe_model_path('nvidia/./repo', hf_only=True) # False (3 non-empty segments)

Acknowledged in #239 bucket E with a one-line tightening already drafted:

if any(seg in ('', '.', '..') for seg in parts):
    return False

Given the existing test file already covers nvidia/foo/bar and traversal, folding the additional cases (nvidia//repo, nvidia/., nvidia/./repo) in here costs ~3 lines of code + ~3 parametrised cases and removes a deferred-follow-up. Worth landing in this PR rather than waiting on #239 — the validator's job is to enforce the contract at the wire, and "HF will 404 on these" is exactly the rely-on-downstream-rejection posture R1's review explicitly rejected for etc/passwd-style inputs.

raise ValidationError("resume.override_code must be a string")
if len(override_code) > 256:
raise ValidationError("resume.override_code too long (>256 chars)")
out["override_code"] = override_code
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resume.override_code accepts NUL / control characters / CRLF.

The field is bounded only by type and length:

out = validate_command({'action': 'resume', 'override_code': '\x00\x01\x02secret\r\nINJECT'})
assert out['override_code'] == '\x00\x01\x02secret\r\nINJECT'  # holds today

The comment correctly notes that the value flows into Mesh._resume_lockout and .strip(). Two concerns:

  1. The override-code is operator-supplied and certainly ends up in audit log lines / command_rejected records. Control bytes and CRLF in log records are an own-foot CRLF-injection primitive on log collectors that don't escape — same family as the policy_host concern.
  2. This PR's other secret-shaped fields (source_peer_id, device_name) get a charset+length pair via _PEER_ID_RE + MAX_PEER_ID_LEN. The asymmetry is hard to justify: override_code is at least as security-sensitive as a peer_id.

One extra regex (printable-ASCII only, no whitespace, no controls) closes this; the existing 256-char cap stays.

Comment thread strands_robots/mesh/security.py Outdated
#: ``STRANDS_MESH_POLICY_HOST_ALLOW``. Rejects shell metacharacters,
#: whitespace, NUL bytes, and any byte outside the printable ASCII
#: subset typical of DNS labels and CIDR ranges.
_POLICY_HOST_ENTRY_RE = re.compile(r"^[A-Za-z0-9.:/_\-\[\]]+$")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Charset regex admits brackets that downstream is_safe_policy_host cannot match.

The regex allows [ ] (line 174) so an operator can paste STRANDS_MESH_POLICY_HOST_ALLOW="[::1]". The entry survives _validate_env_allowlist_entries, lands in the allowlist, but is_safe_policy_host does literal lowercased compare against host.strip().lower() and ipaddress.ip_address(host_lc) — never against the bracketed form. So [::1] in the env is dead chars (the inbound policy_host would have to also literally be "[::1]" to match, which is not how anyone writes IPv6 in a host field).

Acknowledged in #239 bucket D-1. Two clean fixes (either is fine):

  1. Drop \[\] from _POLICY_HOST_ENTRY_RE so the entry triggers the malformed-entry WARNING and the operator gets a fail-loud signal. That matches the "fail-loud-on-misconfig" posture documented in _validate_env_allowlist_entries's docstring.
  2. Strip brackets in _policy_host_allowlist_cached (entry.strip('[]')) so [::1] works.

Option 1 is more consistent with the surrounding code's posture; option 2 is more user-friendly. Either lands in <5 lines.

strands-coder and others added 2 commits May 27, 2026 05:37
… gate

Addresses R3 review feedback on the hf_only path-shape gate. The prior
non_empty_parts length-2 check accepted three degenerate inputs that
the regex / traversal gates above did not catch:

  is_safe_model_path('nvidia//repo', hf_only=True)   # was True
  is_safe_model_path('nvidia/.', hf_only=True)       # was True
  is_safe_model_path('nvidia/repo/', hf_only=True)   # was True

HuggingFace would 404 on resolution, but the validator's job is to
enforce the <org>/<repo> wire contract at the boundary -- relying on
downstream rejection is the same anti-pattern that R1 rejected for
the 'nvidia/etc/passwd' case. The reviewer's suggested one-line fix
in the hf_only branch closes all three:

    if any(seg in ('', '.') for seg in parts):
        return False

Scoped to hf_only because legitimate local relative paths like
'./local/checkpoint' MUST keep passing under hf_only=False (existing
test_hf_repo_two_segment_shape.py::test_local_path_branch_unchanged
covers this; broadening the reject would have regressed lerobot_local
consumers that point model_path at on-disk dirs).

Pin: tests/mesh/test_hf_repo_degenerate_segments.py (18 tests across
4 categories). Pre-fix verification per AGENTS.md S0 round-trip:

  pre-fix HEAD f7fd065:  7 failed, 11 passed
  post-fix HEAD this:    18 passed

The 11 that pre-pass are the canonical-shape regression guards and
the local-path-with-dot-segment guards (verifying the fix stays
scoped to the hf_only branch).

Verification:
  tests/mesh/test_hf_repo_degenerate_segments.py    18 passed
  tests/mesh/test_hf_repo_two_segment_shape.py      16 passed
  tests/mesh/test_validate_command.py + sib pins    118 passed
  ruff check + format --check                       clean
…s threads line 640, 619, 765, 174)

Adds _SAFE_PASSTHROUGH_RE ([\\x20-\\x7E]+) control-character rejection to
all string fields that previously passed through unvalidated:

1. policy_host: rejects CRLF/NUL/C0 after allowlist membership passes
   (closes wire-injection vector where "localhost\\r\\n" passed membership
   via strip+lower but stored injection-shaped bytes in out).
2. server_address: same control-char gate.
3. turn_id / sender_id: type-check (must be str), length-bound
   (MAX_PASSTHROUGH_LEN=128), and charset-validate. Previously accepted
   dicts, lists, and control-byte strings without any gate.
4. resume.override_code: printable-ASCII-only charset gate.
5. _POLICY_HOST_ENTRY_RE: drop dead bracket chars (\\[\\]) that the
   allowlist matching code could never match against.

Per AGENTS.md > Review Learnings (strands-labs#92): "Reject shell metacharacters in
paths... \\n, \\r, \\x00" and "Validate before subprocess interpolation."

Pin: tests/mesh/test_security_control_char_gates.py (25 tests across 5
classes). Updated tests/mesh/test_security_review_r1.py to reflect the
comment change from "Preserve caller casing" to "Gate control characters."
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mesh Zenoh mesh networking / fleet coordination security

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

security(mesh): R2/R3 design follow-ups for security.py — host normalisation, HF defaults, policy_provider coupling

4 participants