AARM R4 core conformance: five-decision enforcement (ALLOW/DENY/ABORT/MODIFY/STEP_UP/DEFER)#210
Conversation
Bring the external firma-protobuf 0.1.1 crate into the workspace as a path dependency (crates/firma-protobuf) so the wire contract is editable in-tree. Extend the EnforcementDecision proto enum with the three AARM R4 authorization decisions required for conformance: ENFORCEMENT_DECISION_MODIFY = 4 ENFORCEMENT_DECISION_STEP_UP = 5 ENFORCEMENT_DECISION_DEFER = 6 These are backward-compatible enum additions; existing wire values 0-3 are unchanged. Bump the vendored crate to 0.2.0 and align its build to workspace conventions (system protoc, workspace deps, workspace lints).
Introduce the firma-core types backing the AARM R4 STEP_UP, DEFER, and MODIFY authorization decisions: - DenyReason::StepUpRequired — the call is blocked pending human approval or stronger authentication (R4 STEP_UP). - DenyReason::Deferred — the call is blocked and retried after a backoff window (R4 DEFER). - ModificationSpec — opaque description of a request transformation under R4 MODIFY, sourced from a Cedar @modify annotation. Both new deny reasons carry the agent-visible surface for blocked remediation outcomes: STEP_UP and DEFER do not proceed, so the handler returns a structured denial whose reason tells the agent how to recover. Exhaustive coverage tests are updated.
Add the MODIFY, STEP_UP, and DEFER authorization decisions end-to-end so the policy engine can produce all five AARM R4 outcomes (R4) instead of only ALLOW/DENY. Post-Cedar remediation layer (cedar_evaluator.rs): - Parse @modify / @step_up / @defer annotations on forbid policies at bundle load time into a PolicyId-keyed remediation map. - evaluate_verdict lifts a Cedar Deny into PolicyVerdict::Modify / StepUp / Defer when a firing forbid carries a remediation annotation (precedence StepUp > Defer > Modify); otherwise the deny is hard. Malformed @defer values degrade to a plain forbid (fail-closed per call, not fail-bundle). Annotations on permit policies are ignored. Constraint enforcer + decision (constraint_enforcement.rs, decision.rs): - New PolicyVerdict enum (Allow/Deny/Modify/StepUp/Defer) with a default PolicyEvaluation::evaluate_verdict that delegates to the bool evaluate(), so all 16 deny-all/allow-all test stubs keep working. - ConstraintEnforcer::evaluate{,_with_timeout} now return the verdict on the Ok path; Deny and engine errors remain hard Deny decisions. - EnforcementDecision gains Modify, StepUp, Defer variants plus is_modify/step_up/defer helpers and a step_up_pending_hitl bridge constructor from the local-exec HITL path. Pipeline + handler + audit (pipeline.rs, handler.rs, audit.rs): - enforce_inner lifts the verdict into the right EnforcementDecision; Modify dispatches like Allow, StepUp/Defer block with verified identity attached. Monitor mode overrides Deny/Modify/StepUp/Defer to Passthrough, preserving the original reason in the audit record. - handler maps StepUp/Defer to a structured Deny (StepUpRequired / Deferred) so the existing 403 path serves them with no interceptor changes; Modify proceeds to dispatch. - audit Decision mirror enum gains Modify=4, StepUp=5, Defer=6 (the as-i32 wire cast is unchanged); audit_decision_fields pins each outcome's decision code and reason. CLI monitor (firma): --decision accepts modify/step_up/defer; the filters classify wire codes 4/5/6 as Known and the Unknown(7) forward-compat fallback is preserved. Also fix two pre-existing benches that were never updated when TenancyMode was added to CapabilityValidator::new (passed the missing 5th argument), and apply rustfmt to the touched files.
Update contributor and user docs for the new ALLOW/DENY/ABORT/MODIFY/ STEP_UP/DEFER outcomes: - AGENTS.md: fix stale firma-proto references (now the vendored crates/firma-protobuf crate) and note both proto crates need system protoc. - docs/architecture/sidecar-overview.md: the wire-decision table now lists all six codes, the Stage2 result is Ok(verdict), and the firma-protobuf crate is described as vendored in-tree. - docs-site concepts/pipeline.md: new Authorization Decisions (AARM R4) section with the decision table, the @modify/@step_up/@defer annotation mechanism, precedence, and monitor-mode behavior. - docs-site concepts/policies.md: new Remediation annotations subsection with worked Cedar examples. - docs-site/public/llms.txt: discovery entries for the five-decision conformance and the annotation-driven remediation model.
…ma-config tests
The firma-config integration tests used std::assert_matches, which is an
unstable library feature (stabilized in Rust 1.96.0). This made the
workspace fail to build --all-targets on any toolchain earlier than the
repo's pinned 1.96.0. Replace every call site with stable equivalents:
- assert_matches!(x, Some(_)) -> assert!(x.is_some())
- assert_matches!(x, None) -> assert!(x.is_none())
- assert_matches!(e, S { f, .. })-> assert!(matches!(e, S { f, .. }))
- assert_matches!(e, S { f, path, .. } if path == v)
-> destructure + assert_eq! for better failure diagnostics
The integration tests still require cargo nextest (process isolation) to
run, as enforced by the existing NEXTEST=1 guard in helper.rs; this change
only affects compilation, not test execution.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…ables dprint check (CI's formatter, v0.54.0) flagged two files that rustfmt does not cover: - crates/firma-protobuf/Cargo.toml: TOML key ordering (alphabetical within [package]), 2-space array indent, and a missing trailing newline at EOF. - docs/architecture/sidecar-overview.md: markdown table column-width reflow from the new decision rows. Formatting-only; no semantic change. Generated by dprint fmt (v0.54.0, matching CI).
The Docs CI check builds rustdoc with -D warnings, and the [`remediation`](Self::remediation) link in evaluate_verdict's doc comment resolved to a private struct field, tripping the rustdoc::private_intra_doc_links warning. Drop the link; the field is private so it cannot be linked from public docs.
|
Why are we inlining the protobuf crate? Let's make changes directly in https://github.com/Firma-AI/firma-protobuf. |
Why are we making this change? Let's revert it. |
| /// `StepUpRequired` and `Deferred` carry the AARM R4 `STEP_UP` and `DEFER` | ||
| /// outcomes: the call does not proceed, so the agent-visible surface is a | ||
| /// structured denial whose reason tells the agent how to recover (request | ||
| /// approval, retry later). This is distinct from the feature-backlog variants | ||
| /// below, which are unrelated to AARM DEFER. |
There was a problem hiding this comment.
I dont' think we need this comment here. Let's just document this inline for each variant.
There was a problem hiding this comment.
Updated
Let's ensure benches are compiled in CI, so this doesn't happen again. We can do in a separate PR. |
…) in firma-config tests" This reverts commit 184b183.
Because 0.2.0 isn't published yet (it needs PR #5 merged upstream first). I can leave it as-is for now, and once proto PR #5 merges and 0.2.0 is published to crates.io, then I can revert vendoring and switch to |
The AARM R4 mapping and recovery instructions are already documented inline on each variant (StepUpRequired, Deferred). The block comment on the enum was redundant.
We can decouple the two PRs—just have this one depend on Firma-AI/firma-protobuf#5 as a git dependency. |
| /// Only `forbid` policies are eligible: a `permit` cannot raise a deny, so a | ||
| /// remediation annotation on it has no effect on the decision. Malformed | ||
| /// annotations (e.g. `@defer("not-a-number")`) are logged and skipped so the | ||
| /// policy degrades to a plain `forbid` (fail-closed per call), not a | ||
| /// fail-bundle — a single bad annotation never blocks bundle installation. |
There was a problem hiding this comment.
Are we sure that's the behaviour we want?
I think it'd be less surprising, for users, to see their bundle rejected than to have a different bundle used for execution and then having to sift through logs to see what actually happened.
There was a problem hiding this comment.
Updated.
| StepUp(String), | ||
| /// `@defer("<ms>")` — delay execution; the annotation value is the | ||
| /// retry-after backoff in milliseconds. | ||
| Defer(u64), |
There was a problem hiding this comment.
Let's store a std::time::Duration here.
There was a problem hiding this comment.
And enforce it's >0.
There was a problem hiding this comment.
Updated — Remediation::Defer now stores std::time::Duration (commit dbba6de).
There was a problem hiding this comment.
Updated — zero-duration @defer is rejected at load time (commit dbba6de).
| /// annotation. | ||
| Defer { | ||
| /// Backoff window before the agent should retry, in milliseconds. | ||
| retry_after_ms: u64, |
There was a problem hiding this comment.
Same here, let's store a std::time::Duration.
There was a problem hiding this comment.
Updated
| if let Some(value) = policy.annotation(ANNOTATION_MODIFY) { | ||
| map.insert(id.clone(), Remediation::Modify(value.to_string())); | ||
| continue; | ||
| } | ||
| if let Some(value) = policy.annotation(ANNOTATION_STEP_UP) { | ||
| map.insert(id.clone(), Remediation::StepUp(value.to_string())); | ||
| continue; | ||
| } | ||
| if let Some(value) = policy.annotation(ANNOTATION_DEFER) { |
There was a problem hiding this comment.
What happens if a forbid policy has multiple annotations?
I think we should reject it.
If not, we should only keep the highest priority annotation.
There was a problem hiding this comment.
I rewrote the annotation-scan to collect all present annotations first, reject if more than one, then parse the single one.
| if policy.effect() != Effect::Forbid { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
For similar reasons, we should also scan permit policies to ensure they don't carry annotations that would only make sense on forbid policies, and reject them if they do.
There was a problem hiding this comment.
Updated — permit policies carrying remediation annotations now reject the bundle (AnnotationOnPermit, commit 15a0ef7).
| Modify(String), | ||
| /// `@step_up("…")` — require human approval; the annotation value is the | ||
| /// human-readable challenge / approval description. | ||
| StepUp(String), |
There was a problem hiding this comment.
We should enforce that neither of these two are empty/whitespace only.
There was a problem hiding this comment.
Updated — @modify and @step_up values that are empty or whitespace-only now reject the bundle (MalformedAnnotation, commit 15a0ef7).
A malformed @defer annotation (e.g. @defer("not-a-number")) now rejects the entire bundle at load time instead of silently degrading the policy to a plain forbid. apply_bundle already handles the error by keeping the previous good snapshot active, so the operator gets an immediate, actionable error rather than a silent semantic divergence between the authored policy and the one actually enforced. This is consistent with how Cedar syntax errors already reject the bundle (PolicyParse). Annotations are part of the same bundle definition and should follow the same fail-fast pattern.
@defer annotation values now parse into std::time::Duration instead of a raw u64 milliseconds count. A zero duration rejects the bundle at load time: a defer with no backoff is indistinguishable from a plain deny, so the author almost certainly misconfigured the policy.
PolicyVerdict::Defer now carries std::time::Duration instead of a raw u64 milliseconds count, matching Remediation::Defer in the Cedar evaluator. The u64 conversion moves to the pipeline boundary where EnforcementDecision::Defer (the wire-facing type surfaced to agents) is assembled.
Replace three sequential scans (any + find for each priority tier) with a single max_by_key pass using a priority rank function (StepUp=3, Defer=2, Modify=1).
A single forbid policy with both @modify and @step_up previously kept only the first match (Modify) and silently dropped the rest — the opposite of the documented cross-policy precedence (StepUp > Defer > Modify). The result depended on an implicit check order rather than the authored semantics, so the same annotations resolved differently on one policy vs two. Reject the bundle at load time (ConflictingAnnotations). The author must split into separate forbid policies, which pick_remediation then resolves with the documented precedence.
Address two remaining review comments: 1. A permit policy carrying a remediation annotation (@modify / @step_up / @defer) now rejects the bundle (AnnotationOnPermit). A permit cannot raise a deny, so the annotation is a misconfiguration that would be silently ignored — the author almost certainly attached it to the wrong policy. 2. @modify and @step_up values that are empty or whitespace-only now reject the bundle (MalformedAnnotation). These carry human-readable descriptions surfaced to the agent; an empty value is a misconfiguration. build_remediation_map is restructured to a single pass: collect annotations for all policies, reject permits with annotations, reject forbids with multiple annotations, then validate the single annotation value.
Replace the opaque ModificationSpec::description string with a tagged enum and a small annotation DSL. V1 supports redact_header:<name> which strips the named HTTP header (case-insensitive) from the dispatch clone before forwarding. Previously MODIFY dispatched the request unmodified — the @modify description was audit-only. This was dangerous: the docs claimed a transformation was applied but none was, and the only evidence was buried in audit logs. A deny-with-context approach would risk an agent retry loop matching the same Cedar rule. The structural approach applies the transformation in the sidecar, not the agent, so no loop is possible. The original envelope is preserved for audit; the dispatch clone is mutated. The applied transformation (e.g. redacted_header:authorization) is recorded in the audit deny_reason field. Annotation values are parsed at load time via ModificationSpec::parse; unknown kinds, empty header names, and empty values reject the bundle (MalformedAnnotation). The DSL is extensible — new kinds (e.g. strip_query_param) can be added as enum variants without a wire break.
…grade Address all review comments from Luke: 1. Use http::HeaderName instead of String in RedactHeader — proper HTTP header name validation via HeaderName::from_bytes. 2. Replace String error type in ModificationSpec::parse with a typed ModificationError enum (thiserror). 3. Add StepUpSpec newtype — validates non-empty at construction. 4. Add DeferDuration newtype — validates > 0 at construction. 5. apply() returns Result; fails closed with UnsupportedModificationTarget when the policy targets HTTP headers but the action is not HTTP. 6. Wire newtypes into Remediation and PolicyVerdict enums. 7. Simplify ConflictingAnnotations doc comment. 8. Document redact_header vs credential injection interaction in the handler and the type doc. 9. Apply redact_header to CONNECT and authorize_upgrade paths — no asymmetry between dispatch paths. 10. Add test for invalid header name rejection.
Extract the MODIFY dispatch arm into a dispatch_modify helper method to stay under clippy::too_many_lines (101 → under 100 lines on handle). Update anyhow 1.0.102 → 1.0.103 to resolve RUSTSEC-2026-0190 (unsound Error::downcast_mut). The advisory was published 2026-06-25 and is pre-existing — anyhow was already in the dependency tree before this PR.
Summary
OpenFirma previously produced only 2 of the 5 AARM-required authorization decisions (ALLOW/DENY; ABORT is sidecar-internal). This is a hard conformance failure against AARM R4, which requires the policy engine to be capable of producing exactly one of
ALLOW | DENY | MODIFY | STEP_UP | DEFERfor every evaluated action. This PR implements the full five-decision set end-to-end.Conformance source: AARM v1.0 spec, R4.
What changed
Wire contract (Phase 1) — Vendored the external
firma-protobuf0.1.1 crates.io crate into the workspace ascrates/firma-protobuf(path-dep workspace member) so the wire contract is editable in-tree. Extended thefirma.v1.EnforcementDecisionproto enum with the three R4 remediation decisions:Backward-compatible enum additions; existing wire values 0–3 are unchanged. Bumped the vendored crate to 0.2.0 and aligned its build to workspace conventions (system
protoc, workspace deps, workspace lints). Fixed stalefirma-protoreferences inAGENTS.md.Core types (Phase 2) — Added typed
DenyReason::StepUpRequiredandDenyReason::Deferred(the agent-visible surface for blocked remediation outcomes) plusModificationSpectofirma-core.Post-Cedar remediation layer (Phase 3) — Cedar natively emits only
Allow/Deny. The three remediation outcomes are sourced from annotations onforbidpolicies, lifted by a new post-Cedar layer incedar_evaluator.rs:At bundle load,
forbidpolicies with@modify/@step_up/@deferannotations are indexed into aPolicyId → Remediationmap. On a CedarDeny,Response::diagnostics().reason()is scanned for firingforbidpolicy IDs; a match lifts the deny into the correspondingPolicyVerdict(precedenceStepUp > Defer > Modifywhen several fire). Aforbidwithout an annotation stays a hardDeny. Annotations onpermitpolicies are ignored (apermitcannot raise a deny). Malformed@defervalues degrade to a plainforbid(fail-closed per call, not fail-bundle).PolicyEvaluation::evaluate_verdictis a new default trait method delegating to the existing boolevaluate(), so all 16 deny-all/allow-all test stubs keep working unchanged. OnlyCedarPolicyEvaluator(and itsSwappablePolicyEvaluationwrapper) override it.Enforcement decision + pipeline (Phases 3–4) —
EnforcementDecisiongainsModify,StepUp,Defervariants withis_modify/step_up/deferhelpers and astep_up_pending_hitlbridge constructor from the local-exec HITL path (UDS wire +firma-rununchanged).ConstraintEnforcer::evaluate{,_with_timeout}now returnPolicyVerdicton theOkpath; the pipeline lifts each verdict into the matchingEnforcementDecision(Modify dispatches like Allow; StepUp/Defer block with verified identity attached). Monitor mode overridesDeny/Modify/StepUp/Defer→Passthrough, preserving the original reason in the audit record.Handler + audit + CLI (Phases 5–6) — The handler maps
Modify→dispatch andStepUp/Defer→structured 403 (DenyReason::StepUpRequired/Deferred) — no interceptor changes. Theaudit::Decisionmirror enum gainsModify=4, StepUp=5, Defer=6(theas i32wire cast is unchanged).firma monitor --decisionacceptsmodify/step-up/defer; the CLI classifies wire codes 4/5/6 asKnownand theUnknown(7)forward-compat fallback is preserved.Docs — Updated
docs-site(pipeline + policies pages),docs/architecture/sidecar-overview.md,docs-site/public/llms.txt, andAGENTS.md.Drive-by fixes
benches/stage1.rs,benches/pipeline.rs) that were never updated whenTenancyModewas added toCapabilityValidator::newin Enforce single-agent tenancy to prevent LRU session-state cross-contamination #176 — they were uncompilable onmain. Passed the missing 5th argument sojust buildworks.std::assert_matcheswith stableassert!(matches!(...))/.is_some()/.is_none()/ destructure+assert_eq!infirma-configintegration tests.assert_matchesis unstable on Rust < 1.96.0 (the repo pins 1.96.0, but this unblocks--all-targetsbuilds on earlier toolchains too). The tests still requirecargo nextest(process isolation) as enforced by the existingNEXTEST=1guard inhelper.rs.Test plan
cargo fmt --all -- --check— 0 diffscargo clippy --workspace --all-targets -- -D warnings— cleancargo build --workspace --all-targets— clean (incl.firma-configtests, previously uncompilable)cargo nextest run --workspace --all-targets— 1226 passed, 0 failed, 12 skippedcargo test --workspace --doc— 3 passed, 0 failedNew tests added:
ModificationSpecserde roundtrip;DenyReason::StepUpRequired/Deferreddisplay+serde;EnforcementDecisionvariant helpers +with_identity+step_up_pending_hitlbridge; Cedar annotation parsing (modify/step_up/defer lift, plain-forbid stays deny, malformed@deferdegrades,permitannotation ignored, multi-fire precedence); pipelineaudit_payload_from_decisionfor Modify/StepUp/Defer; CLIaarm_r4_decisions_are_known_and_filterable+decision_modify_step_up_defer_filters_select_their_codes.Risk / sharp edges
firma-protobufalignsprost/tonic-prost-buildto workspace versions (no duplicate builds observed).MODIFYdispatch semantics (V1):MODIFYdispatches likeALLOW— the modification description is recorded in the auditdeny_reason. A future task can replace the opaqueModificationSpecstring with a structured patch (header redactions, param rewrites) at which point the type gains tagged variants without a wire break.PASSTHROUGHwire value:PASSTHROUGHis still serialized asALLOW(1) with an emptytoken_id, matching the pre-existing convention.LocalExecDecision::PendingHitlis bridged onto the unifiedSTEP_UPsurface via a constructor only; the UDS wire format andfirma-runmediator are unchanged.Conformance claim
This PR satisfies AARM Core R4 (the policy engine can produce all five decisions). It does not change R1–R3 or R5–R6 status. R7–R9 (Extended) are out of scope.