test: add e2e enforcement scenarios by luca-iachini · Pull Request #174 · Firma-AI/openfirma

luca-iachini · 2026-06-19T07:58:06Z

Summary

Adds end-to-end enforcement scenarios that drive real agents (claude, codex) through `firma run`. Each scenario runs two phases: a baseline phase (the agent alone, confirming the task is doable) and an enforcement phase (under firma, confirming the expected decision).

| Scenario | What it checks |
| --- | --- |
| `simple_prompt` | ALLOW: plain chat, no protected action, agent completes |
| `allow_http_call` | ALLOW: outbound HTTP when capability + `permit` policy both grant it |
| `deny_forbidden_http_resource` | DENY: explicit `forbid` on the resource UID |
| `deny_unclassified_intent` | DENY: unclassified request (no mapping rule) fails closed |
| `deny_http_call` | DENY: mapped action class the agent holds no capability for |
| `block_raw_tcp_egress` | BLOCK: raw TCP socket egress refused by the sandbox |
| `fs_read_deny` | DENY: filesystem read of a secret outside allowed scope |
| `fs_delete_deny` | DENY: filesystem delete of a protected file |
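The ALLOW/DENY rows above can be read as a fail-closed decision procedure. The following is only a minimal sketch of that reading, not firma's implementation: `Decision`, `Policy`, `decide`, the reason strings, and the order of the checks are all hypothetical, and sandbox-level blocks such as `block_raw_tcp_egress` and the unprotected `simple_prompt` case are outside what it models.

```rust
use std::collections::HashSet;

/// Outcome of evaluating one protected request (illustrative names only).
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Deny(&'static str),
}

/// Hypothetical, simplified view of what an enforcement point might know.
struct Policy {
    /// Resource UIDs covered by an explicit `forbid`.
    forbidden: HashSet<&'static str>,
    /// Action classes the agent holds a capability for.
    capabilities: HashSet<&'static str>,
    /// Action classes granted by a `permit` policy.
    permitted: HashSet<&'static str>,
}

/// `action_class` is `None` when no mapping rule classified the request.
/// The check order here is an assumption made for the sketch.
fn decide(policy: &Policy, action_class: Option<&str>, resource_uid: &str) -> Decision {
    // Unclassified intent fails closed.
    let Some(class) = action_class else {
        return Decision::Deny("unclassified intent");
    };
    // An explicit forbid on the resource wins over any grant.
    if policy.forbidden.contains(resource_uid) {
        return Decision::Deny("explicit forbid on resource");
    }
    // A mapped action class still needs a capability...
    if !policy.capabilities.contains(class) {
        return Decision::Deny("no capability for action class");
    }
    // ...and a permit policy; both must grant the action.
    if !policy.permitted.contains(class) {
        return Decision::Deny("no permit policy");
    }
    Decision::Allow
}
```

Under these assumptions, a request is allowed only when every check passes; any missing mapping, capability, or permit falls through to a deny, which is the fail-closed behavior the scenarios assert.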

Run

cargo nextest run -p firma --test e2e --run-ignored all -E 'test(claude::)'
cargo nextest run -p firma --test e2e --run-ignored all -E 'test(codex::)'

Add the full integration test infrastructure: harness, config, audit utilities, CI workflow, and supporting crate changes. Wire up one scenario (normal_llm_call) to validate the end-to-end flow before the remaining scenarios land in the follow-up PR.

Add 7 scenarios covering the key enforcement policies: block_paste_service, block_unlisted_host, tool_call_exfil, direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.

supervisor writes flat AuthorityConfig TOML; firma authority --config calls load_section(..., "authority") which expects a section wrapper.
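To illustrate the mismatch described above, here is a naive, std-only sketch of nesting a flat TOML document under a named section. It is not the code from this PR: `wrap_in_section` is a hypothetical helper, the line-based rewrite ignores multi-line strings and arrays whose continuation lines start with `[`, and a real implementation would more likely serialize through a TOML library.

```rust
/// Naive sketch: nest a flat TOML document under `[section]`.
/// Top-level keys land directly under the new header, and existing
/// table headers are re-rooted (`[x]` becomes `[section.x]`).
fn wrap_in_section(flat: &str, section: &str) -> String {
    let mut out = format!("[{section}]\n");
    for line in flat.lines() {
        let trimmed = line.trim();
        if let Some(rest) = trimmed.strip_prefix("[[") {
            // Array-of-tables header: [[x]] -> [[section.x]]
            out.push_str(&format!("[[{section}.{rest}\n"));
        } else if let Some(rest) = trimmed.strip_prefix('[') {
            // Table header: [x] -> [section.x]
            out.push_str(&format!("[{section}.{rest}\n"));
        } else {
            // Key/value pairs, comments, and blank lines pass through unchanged.
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Calling `wrap_in_section(flat, "authority")` on a flat document yields one that a loader expecting an `[authority]` wrapper can read, assuming the flat input is simple enough for the line-based rewrite.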

The per-run authority always runs in plaintext on loopback. The user config may carry TLS cert paths and a fixed listen_addr; passing those into the spawned process causes FRAME_SIZE_ERROR (h2c client talking to a TLS server). Clear the TLS config and select an ephemeral loopback port up front.
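A minimal sketch of that adjustment, using only the standard library. The `AuthorityConfig` and `TlsConfig` structs below are hypothetical stand-ins (only `listen_addr` is named in the text above), and `per_run_config` is an illustrative name, not the function in the supervisor.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::PathBuf;

/// Hypothetical stand-in for the TLS part of the user's config.
#[derive(Debug, Clone)]
struct TlsConfig {
    cert_path: PathBuf,
    key_path: PathBuf,
}

/// Hypothetical stand-in for the authority config.
#[derive(Debug, Clone)]
struct AuthorityConfig {
    listen_addr: SocketAddr,
    tls: Option<TlsConfig>,
}

/// Derive the config for a per-run authority from the user's config.
fn per_run_config(mut cfg: AuthorityConfig) -> io::Result<AuthorityConfig> {
    // The per-run authority is plaintext, so drop any TLS settings.
    cfg.tls = None;

    // Bind port 0 so the OS assigns a free loopback port, record it,
    // then release the socket so the spawned process can bind it.
    // Another process could grab the port in between; this sketch
    // does not try to close that window.
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    cfg.listen_addr = listener.local_addr()?;
    drop(listener);

    Ok(cfg)
}
```

Picking the port before spawning means the parent knows where to connect without parsing the child's output, at the cost of the small bind race noted in the comment.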

codecov · 2026-06-24T16:21:40Z

Codecov Report

❌ Patch coverage is 77.50000% with 36 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/firma-run/src/authority/supervisor.rs | 73.03% | 11 Missing and 13 partials ⚠️ |
| crates/firma-authority/src/keygen.rs | 87.30% | 6 Missing and 2 partials ⚠️ |
| crates/firma-run/src/runtime.rs | 0.00% | 4 Missing ⚠️ |

…egration-tests # Conflicts: # Cargo.lock # crates/firma-run/src/authority/supervisor.rs

…ests # Conflicts: # Cargo.lock # crates/firma-run/src/authority/supervisor.rs # crates/firma/Cargo.toml

…218) Fixes FIR-404: https://linear.app/firma-ai/issue/FIR-404

## Summary

`firma run` fails closed with `failed to read key file … No such file or directory` when the persisted `[authority].key_file` no longer exists on disk, e.g. it lives in `$XDG_RUNTIME_DIR` (a tmpfs cleared on logout/reboot) and the machine has since rebooted.

### Root cause

The persisted authority-config path added in #174 reads the configured `key_file` directly but **never generates it when absent**, despite:

- `persisted_authority_config`’s doc claiming *“The key is generated on demand if the configured path has none yet.”*
- `generate_authority_key`’s doc referencing an `ensure_authority_key` helper that **was never implemented**.

Before #174, autostart always minted a fresh ephemeral key in the per-run marker dir, so a missing configured key never mattered. #174 switched to reading the persisted key without regenerating it.

### Fix

- Implement `ensure_authority_key`. It is idempotent: reuse an existing key so issued tokens survive restarts; mint one (creating the parent dir) only when the secret is missing. Called from `persisted_authority_config`.

### Test

- `crates/firma/tests/run_echo_smoke.rs`: scaffolds a real persisted config, wipes the key to mimic the tmpfs wipe, and runs `firma run -- echo hello` **end to end**, asserting the sandboxed command executes and the key is regenerated. Skips gracefully only when the host lacks unprivileged user namespaces.
- `regenerates_missing_key_file` unit test on `persisted_authority_config`.

#174’s e2e scenarios always run against a freshly-minted key in a tempdir and are `#[ignore]`d, so they never exercise a stale persisted config; this adds the missing coverage.

### Verification

- Reproduced the failure on unpatched code (`authority autostart failed`); patched → prints `hello`, key regenerated.
- `cargo clippy -p firma-run -p firma --tests` clean; `cargo fmt --check` clean; new + existing tests pass.
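The idempotent behavior described for `ensure_authority_key` (reuse an existing key, mint one and create the parent directory only when it is missing) could look roughly like the sketch below. Only the function name comes from the text above; the signature, the `mint` closure, and the boolean return value are assumptions made for illustration, and a real key file would also need restrictive permissions, which are omitted here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Make sure a key exists at `key_file`.
/// Returns `Ok(true)` if a key was minted, `Ok(false)` if one was reused.
/// `mint` is a hypothetical callback that produces fresh key bytes.
fn ensure_authority_key(key_file: &Path, mint: impl FnOnce() -> Vec<u8>) -> io::Result<bool> {
    // Reuse an existing key so previously issued tokens stay valid.
    if key_file.exists() {
        return Ok(false);
    }
    // The parent may be gone too (e.g. a tmpfs wiped on reboot).
    if let Some(parent) = key_file.parent() {
        fs::create_dir_all(parent)?;
    }
    // `create_new` refuses to overwrite a key that appeared since the
    // check above; in that case the existing key is kept.
    match OpenOptions::new().write(true).create_new(true).open(key_file) {
        Ok(mut file) => {
            file.write_all(&mint())?;
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

Calling it twice with the same path writes the key once and leaves it untouched the second time; deleting the file and calling it again mints a new key, which mirrors the wipe-and-regenerate case the smoke test exercises.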

luca-iachini force-pushed the fir-368-integration-tests branch from 8d0cd33 to 4fd2256 June 19, 2026 08:00

luca-iachini marked this pull request as draft June 19, 2026 17:28

Base automatically changed from fir-368-e2e-tests to main June 23, 2026 16:33

luca-iachini added 10 commits June 24, 2026 15:43

refactor(tests): rename integration_tests → e2e

953cb56

feat(tests): add remaining e2e enforcement scenarios

7db2860

fix(run): wrap authority config in [authority] section before spawn

cf23538

rename scenarios

d41231a

remove old workflow

3608d4e

handle mock dynamic port

4e9e9dd

fix snap

1b73059

allow policy test

959bf29

luca-iachini force-pushed the fir-368-integration-tests branch from e85cd03 to 959bf29 June 24, 2026 16:19

luca-iachini added 15 commits June 24, 2026 18:25

fix fmt

78cc0c4

refactor scenario file handling

26460d5

refactor tests

57989da

refactor key gen

ee8b857

simplify audit

7e05f18

fix assertions

98becdc

simplify doc

955498d

test(e2e): drop allow_workspace_code_task scenario

dfc3f6a

add constants

265d28d

ci(e2e): trigger e2e tests on push to integration branch

1470f94

remove gen key

e9cf74a

gate tests

dd9717a

add skip

31e757a

refresh insta

6a1752b

fix codex

3cb68e9

luca-iachini added 5 commits June 25, 2026 21:49

remove trigger

9882d6b

skip tests on macos using nextest override

beb718c

refactor

ff2aac5

refactor

b6dbb4c

Merge branch 'main' of github.com:Firma-AI/openfirma into fir-368-int…

b9d78db

luca-iachini marked this pull request as ready for review June 26, 2026 15:53

luca-iachini requested review from LukeMathWalker, falcucci and veeso June 26, 2026 15:53

luca-iachini changed the title ~~test: add remaining e2e enforcement scenarios~~ test: add e2e enforcement scenarios Jun 26, 2026