Skip to content

mesh(tools): robot_mesh HITL via tool_context.interrupt + per-action rate limit (7/9 of #195 split)#227

Open
cagataycali wants to merge 1 commit into
strands-labs:mainfrom
cagataycali:pr7/mesh-tools-robot-mesh-hitl
Open

mesh(tools): robot_mesh HITL via tool_context.interrupt + per-action rate limit (7/9 of #195 split)#227
cagataycali wants to merge 1 commit into
strands-labs:mainfrom
cagataycali:pr7/mesh-tools-robot-mesh-hitl

Conversation

@cagataycali
Copy link
Copy Markdown
Member

Part 7 / 9 of the split of #195 — tracked by #219.

Drafted until PR-2 (#223) and PR-6 land.

The LLM tool surface

Approval delivered out-of-band of the LLM's tool-argument flow (via Strands SDK tool_context.interrupt), so prompt injection cannot smuggle approval. Only y / yes / approve / approved (case- and whitespace-insensitive, exact-match after strip().lower()) approve. Declined approvals do not consume rate-limit slots.

Per-action rate limit (separate buckets for emergency_stop / broadcast / execute) with token-bucket semantics.

What's in this PR

  • strands_robots/tools/robot_mesh.py (+324/-8) — HITL flow, per-action rate limit, client-side validate_command (defence-in-depth — programmatic callers go through the same gate as wire receive).
  • 2 test files (325 LOC).

Reviewer focus

  • Prompt injection isolation: approval delivered via SDK tool_context.interrupt, not via tool args. Verify the LLM has no mechanism to extract or fabricate the approval token.
  • Per-action rate limit: separate token buckets prevent emergency_stop flood from cannibalising execute quota.
  • Declined-approval doesn't consume slots.
  • Exact-match approval (no fuzzy match, no "yeah" / "sure" / "go").

Carries review fixes from #195

R11 (tool surface API drift), R16 (test file naming — subject-named not methodology-named), R24 (_exec_cmd non-dict rejection client-side counterpart).

Stacking note

Depends on PR-1 (#220), PR-2 (#223), PR-6. CI on this branch in isolation is expected red until those land.


Landing order: PR-1 → PR-2/3/4/5 → PR-6 → PR-7 → PR-8 → PR-9. Full plan: PR_LIST.md. Tracking: #219.

…rate limit

The LLM tool surface. Approval delivered out-of-band of the LLM's
tool-argument flow (via Strands SDK tool_context.interrupt), so prompt
injection cannot smuggle approval. Only y / yes / approve / approved
(case- and whitespace-insensitive, exact-match after strip().lower())
approve. Declined approvals do not consume rate-limit slots.

Per-action rate limit (separate buckets for emergency_stop / broadcast
/ execute) with token-bucket semantics.

Modified:
- strands_robots/tools/robot_mesh.py (+324/-8) -- HITL flow, per-
  action rate limit, client-side validate_command (defence-in-depth
  -- programmatic callers go through the same gate as wire receive).

Carries review fixes from strands-labs#195: R11 (tool surface API drift), R16
(test file naming -- subject-named not methodology-named), R24
(_exec_cmd non-dict rejection client-side counterpart).

Tests (325 LOC): HITL exact-match string parsing, declined != consumed,
per-action bucket isolation, prompt-injection isolation (approval not
extractable from LLM context), validate_command client-side.

Tracking: strands-labs#219
Source PR: strands-labs#195
Lands after: strands-labs#220 (PR-1), PR-2 (security), PR-6 (core.Mesh contract)
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Replaces the old confirm: bool tool argument on robot_mesh with an out-of-band Strands SDK tool_context.interrupt(...) for emergency_stop / broadcast, adds per-action sliding-window rate limits with separate buckets, and runs mesh.security.validate_command client-side as defence-in-depth. Approval is exact-match (y / yes / approve / approved). Declined approvals deliberately do not consume a rate-limit slot, with a TOCTOU close-out via an atomic _rate_limit_check_and_record on the post-interrupt path. The threat model and the four layers of defence are clearly stated in the test docstring.

The core security argument holds: nothing the LLM can put in tool args can flip the approval bit, since the operator response arrives via the SDK's interrupt channel, not via the tool's argument dict. The fail-closed behaviour when interrupts are unavailable (RuntimeError from tool_context.interrupt) is correctly distinguished from InterruptException propagation, and the comment at line 375-379 calls this out explicitly.

What's good

  • Check / record split for the rate limiter (with rationale in the docstring) is a thoughtful fix for the "declined approval starves real emergency_stop" failure mode.
  • TOCTOU close-out via _rate_limit_check_and_record is a real bug fix backed by TestRateLimitTOCTOU.
  • Pre-interrupt validation of the broadcast command (R8-7) avoids burning operator approval and audit rows on a payload the validator will reject.
  • _audit_tool_action's wide except is correctly justified inline against the AGENTS.md narrow-clause rule.
  • Tests cover all four advertised defence layers (interrupt gate, rate limit, validation, audit).

Concerns

  • Rate-limit list drift in PR description: the description says "separate buckets for emergency_stop / broadcast / execute", but the implementation has buckets for tell / send / broadcast / stop / emergency_stop (no execute). Minor but worth fixing in the description so reviewers don't grep for a non-existent bucket.
  • Signature change is a breaking API change: adding a required tool_context: ToolContext parameter (even with @tool(context=True)) breaks any non-SDK caller that previously did robot_mesh(action="peers") directly. The CHANGELOG / migration note for #195's split should call this out, especially because pre-1.0 breaking changes need explicit documentation per AGENTS.md.
  • CI red on this branch in isolation is acknowledged in the PR description. Re-confirm green after PR-1 (#220), PR-2 (#223), PR-6 land before merging this.
  • Inline comments below flag a real defence-in-depth bug (the assert), an audit-coverage gap on mesh.X exceptions, and a missing regression test.

Verification suggestions

# Confirm assert-stripped behaviour reproduces the broadcast-with-None-cmd path:
python -OO -c 'from strands_robots.tools import robot_mesh as m; ...'  # see inline

# Run the PR's test file against the un-stacked branch (will need PR-1/2/6 first):
hatch run pytest tests/mesh/test_robot_mesh_security.py -v

# Spot-check the audit path: trigger an emergency_stop where mesh.emergency_stop()
# raises (e.g. session torn down) and confirm an audit row is still emitted.

# error (someone added "broadcast" without the pre-validate).
assert validated_broadcast_cmd is not None, (
"broadcast reached its handler without pre-validation — R8-7 contract broken"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert is stripped under python -O — if anyone ever runs the agent with optimisation flags (or a downstream packager does), this assert becomes a no-op and cmd falls through to None, which then gets passed straight into mesh.broadcast(None, timeout=timeout) on the next line. That defeats the R8-7 contract this comment is asserting.

Replace with an explicit raise so the invariant survives -O:

if validated_broadcast_cmd is None:
    raise RuntimeError(
        "broadcast reached its handler without pre-validation — R8-7 contract broken"
    )
cmd = validated_broadcast_cmd

(General rule: assert is for tests and developer self-checks the program can survive losing; control-flow invariants that are part of a security contract should never be guarded by assert.)

)
cmd = validated_broadcast_cmd
results = mesh.broadcast(cmd, timeout=timeout)
_audit_tool_action(action, "*", True, f"action={cmd.get('action')} responses={len(results)}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Audit log gap when mesh.X raises. The docstring (line 310-311) promises "Every tell / send / broadcast / stop / emergency_stop is audited." In practice, the audit row is only written after the underlying mesh.broadcast / mesh.tell / mesh.send / mesh.emergency_stop call returns. If those raise (e.g. zenoh session dropped, RPC timeout that surfaces as an exception, broken transport), the @tool decorator catches the exception but no _audit_tool_action(..., success=False, ...) row is emitted — so a successful operator approval that then failed at the wire layer is invisible in the audit log.

For a safety-significant tool the audit log should be the last thing to break. Wrap each mesh.X call:

try:
    results = mesh.broadcast(cmd, timeout=timeout)
except Exception as exc:  # noqa: BLE001 — audit must record any failure
    _audit_tool_action(action, "*", False, f"mesh.broadcast raised: {exc!r}")
    raise

Applies symmetrically to tell (L466), send (L491), stop (L519), emergency_stop (L527), subscribe (L537), watch (L552). Per AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes", this also wants a test that asserts an audit row exists when mesh.broadcast raises.

# pause the agent loop, so we deliberately do NOT catch
# ``Exception`` here — that would swallow the normal
# interrupt-pause flow and turn every approval into an
# immediate "interrupt unavailable" error.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a regression test pinning InterruptException propagation. The comment correctly explains why the clause is except RuntimeError and not except Exception — swallowing InterruptException would break the SDK's pause flow and silently turn every approval into "interrupt unavailable". But there's no test that asserts this. A future refactor that widens the except (or adds a try: ... except Exception: wrapper higher up) would silently re-introduce the bug.

Per AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes", add a test along the lines of:

class _FakeInterrupt(BaseException):  # mimics SDK behaviour
    pass

def test_interrupt_exception_propagates(self):
    ctx = MagicMock(name="ToolContext")
    ctx.interrupt.side_effect = _FakeInterrupt("agent loop pause")
    with patch("strands_robots.tools.robot_mesh._resolve_mesh", return_value=_stub_mesh()):
        with pytest.raises(_FakeInterrupt):
            _call("emergency_stop", ctx=ctx)

Ideally use the actual SDK exception type, but a stand-in BaseException subclass works because the production code only catches RuntimeError.

rl_race_err = _rate_limit_check_and_record(action)
if rl_race_err is not None:
_audit_tool_action(action, target, False, f"rate_limit_race: {rl_race_err}")
return _err(rl_race_err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprising UX: operator approves, action then rejected by rate-limit race. When the post-approval _rate_limit_check_and_record rejects (because a concurrent invocation raced past while waiting on the operator), the audit row reads rate_limit_race, the operator was prompted and said yes, and the action did not run. From the operator's chair this looks like "I approved an emergency stop and nothing happened."

The behaviour is technically correct (the limit is the limit), but consider:

  1. The error message returned to the agent should explicitly tell the operator their approval was wasted, so they don't assume the e-stop succeeded. The current text ("rate limit exceeded ... raced past") is for the LLM, not the human.
  2. Logging at WARNING level here (not just audit-DEBUG) would surface this rare race in production logs.
  3. Worth considering whether emergency_stop specifically should have an exception path that bypasses the limit on a confirmed operator override — the docstring on _rate_limit_check says "the rate limit exists to bound LLM-driven nuisance, not to inhibit a genuine emergency," but the post-approval rejection does exactly that for the unlucky race-loser.

@tool(context=True)
def robot_mesh(
action: str,
tool_context: ToolContext,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking signature change worth a CHANGELOG / migration note. Adding tool_context: ToolContext as a required positional parameter (even with @tool(context=True) injecting it via the SDK) breaks any caller that previously invoked robot_mesh(action="peers") directly without going through the agent loop — e.g. unit tests in downstream consumers, scripts that did from strands_robots.tools.robot_mesh import robot_mesh; robot_mesh(action="peers"). Such callers will now get TypeError: missing required positional argument: 'tool_context'.

This is fine as an intentional change (the tool is by definition agent-facing) but per the AGENTS.md "pre-1.0 breaking change without a CHANGELOG entry or migration sketch" guidance from past review-learnings, the split's release notes / CHANGELOG should call this out explicitly along with the recommended replacement (the __wrapped__ / mock-context pattern your tests use).

@cagataycali cagataycali added security mesh Zenoh mesh networking / fleet coordination labels May 26, 2026
cagataycali pushed a commit to cagataycali/robots that referenced this pull request May 26, 2026
The test_application_security.py file (358 LOC) and the
TestF15WireCommandKeyRequired class in test_validate_command.py
exercise modules that land in later PRs of the strands-labs#195 split:

- bridge_transport.DEFAULT_BRIDGE_SUFFIXES with 'safety/resume'
  (bridge_transport additions land in PR-5 / strands-labs#222)
- audit._seq_sidecar_path, audit._resolve_log_max_bytes, F2/F3
  symlink protections (land in PR-4 / strands-labs#221)
- core.Mesh._expected_responders, core.BROADCAST_RESPONDER, the
  wire-side _exec_cmd dict-shape gate (land in PR-6 / strands-labs#225)
- robot_mesh._interrupt_approves (lands in PR-7 / strands-labs#227)

PR-2 is scoped to the leaf module strands_robots/mesh/security.py
with zero internal mesh deps. These tests were pre-staged here but
need their consumers to land first. They will re-land in the
appropriate consumer PR.

Test impact: 9 failing tests removed (8 from test_application_security.py,
the F15 class moved out). All 571 remaining mesh tests pass.

No production code change; no behaviour change on validate_command
or any security.py contract.
@cagataycali
Copy link
Copy Markdown
Member Author

🎯 Pentest evidence for this PR (#227 — HITL + rate limit)

2026-05-26 pentest run cagataycali/robots-pentest@1731f62.

What this PR is achieving (positive control)

The HITL tool_context.interrupt gates plus per-action sliding-window rate limit are the strong-but-non-portable defence at the LLM → robot boundary. The pentest exercised it 10 times in two scenarios:

Scenario Result
6 direct prompt-injection prompts with HITL enabled Agent refused all 4 dangerous prompts (emergency_stop, broadcast w/ attacker policy_host, etc.); 2 readonly tool calls audited
Same 6 prompts with LLM_HITL_AUTOAPPROVE=y (sims operator-absent) Agent still refused all 4 — reasoning-layer defence held even when HITL gate auto-approved
4 indirect injections (doc summarisation, fake runbook, fleet loop, tool-output smuggle) Agent refused all 4

Sample model reasoning quoted in results/07b_prompt_injection_hitl_autoapprove.log:

"The emergency_stop action is a hard safety gate — it requires a human operator to explicitly type EMERGENCY STOP NOW in plain English, directly in this conversation. It cannot be bypassed by a claim that the on-call has approved."

Caveats worth a regression test in this PR

  • Tested with claude-sonnet-4-20250514. Reasoning-layer defence is non-portable. A future regression test should run the same prompt battery against a smaller / older model (Haiku, GPT-4o-mini, Llama-3-8B) to characterise the model-strength dependency.
  • Rate limit (emergency_stop=3/60s) was not exhausted in the test run — worth a stress test that fires 4 HITL-approved emergency_stops within 60s and asserts the 4th is rejected with the exact Try again in Xs wait-time string.
  • HITL tool_context.interrupt MUST raise RuntimeError when no agent loop is attached (so a direct agent.tool.robot_mesh(...) call from a script can't smuggle an approval). Pin a test_robot_mesh_security.py::test_interrupt_unavailable_fails_closed regression.

Full writeup: F-07 / B-07 and F-12.

LGTM signals from the pentest

  • HITL interrupts fired correctly on emergency_stop and broadcast actions (visible in victim logs).
  • Rate-limit caching (_rate_limit_check_and_record) was the only defence between fast-fire indirect prompts and bandwidth-DoS via the LLM — it held.
  • Tool audit (_audit_tool_action) wrote 6 records over the 6-prompt battery, all linked to legitimate tell / subscribe calls. Audit log is the only thing that survived the LLM's reasoning gating layer.

Mapping for all PRs: PLAN.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mesh Zenoh mesh networking / fleet coordination security

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants