test: add e2e enforcement scenarios #174
Merged
8d0cd33 to 4fd2256
Add the full integration test infrastructure: harness, config, audit utilities, CI workflow, and supporting crate changes. Wire up one scenario (normal_llm_call) to validate the end-to-end flow before the remaining scenarios land in the follow-up PR.
Add 7 scenarios covering the key enforcement policies: block_paste_service, block_unlisted_host, tool_call_exfil, direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.
The supervisor writes a flat `AuthorityConfig` TOML, but `firma authority --config` calls `load_section(..., "authority")`, which expects an `[authority]` section wrapper.
Per-run authority always runs plaintext on loopback. User config may have TLS cert paths and a fixed listen_addr; carrying those into the spawned process causes FRAME_SIZE_ERROR (h2c client vs TLS server). Clear tls config and select an ephemeral loopback port up front.
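The per-run override described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `AuthorityConfigSketch`, `TlsPaths`, `pick_ephemeral_loopback_addr`, and `per_run_config` are hypothetical names, and the real `AuthorityConfig` fields may differ. It assumes the port is chosen by binding to port 0 on loopback and reading back the address the OS assigned.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::PathBuf;

// Hypothetical stand-in for the TLS part of the user's config.
#[derive(Debug, Clone)]
struct TlsPaths {
    cert: PathBuf,
    key: PathBuf,
}

// Hypothetical stand-in for the authority config; field names are assumptions.
#[derive(Debug, Clone)]
struct AuthorityConfigSketch {
    listen_addr: SocketAddr,
    tls: Option<TlsPaths>,
}

/// Ask the OS for a free loopback port by binding to port 0.
/// The listener is dropped on return, so another process could in principle
/// grab the port before the authority binds it.
fn pick_ephemeral_loopback_addr() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    listener.local_addr()
}

/// Derive the per-run config: plaintext only, ephemeral loopback port.
fn per_run_config(user: &AuthorityConfigSketch) -> io::Result<AuthorityConfigSketch> {
    let mut cfg = user.clone();
    cfg.tls = None; // avoid an h2c client talking to a TLS server
    cfg.listen_addr = pick_ephemeral_loopback_addr()?;
    Ok(cfg)
}
```

Passing a user config with TLS paths and a fixed `listen_addr` through `per_run_config` yields a copy with `tls` cleared and a `127.0.0.1` address on an OS-assigned port; everything else in the user config would be carried over unchanged.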
e85cd03 to 959bf29
…egration-tests
# Conflicts:
# Cargo.lock
# crates/firma-run/src/authority/supervisor.rs
LukeMathWalker approved these changes on Jun 29, 2026
veeso approved these changes on Jun 30, 2026
…ests
# Conflicts:
# Cargo.lock
# crates/firma-run/src/authority/supervisor.rs
# crates/firma/Cargo.toml
veeso added a commit that referenced this pull request on Jul 1, 2026
…218)

Fixes FIR-404: https://linear.app/firma-ai/issue/FIR-404

## Summary

`firma run` fails closed with `failed to read key file … No such file or directory` when the persisted `[authority].key_file` no longer exists on disk, e.g. when it lives in `$XDG_RUNTIME_DIR` (a tmpfs cleared on logout/reboot) and the machine has since rebooted.

### Root cause

The persisted authority-config path added in #174 reads the configured `key_file` directly but **never generates it when absent**, despite:

- `persisted_authority_config`'s doc claiming *"The key is generated on demand if the configured path has none yet."*
- `generate_authority_key`'s doc referencing an `ensure_authority_key` helper that **was never implemented**.

Before #174, autostart always minted a fresh ephemeral key in the per-run marker dir, so a missing configured key never mattered. #174 switched to reading the persisted key without regenerating it.

### Fix

- Implement `ensure_authority_key`, which is idempotent: it reuses an existing key so issued tokens survive restarts, and mints one (creating the parent dir) only when the secret is missing. Called from `persisted_authority_config`.

### Test

- `crates/firma/tests/run_echo_smoke.rs`: scaffolds a real persisted config, wipes the key to mimic the tmpfs wipe, and runs `firma run -- echo hello` **end to end**, asserting that the sandboxed command executes and the key is regenerated. Skips gracefully only when the host lacks unprivileged user namespaces.
- `regenerates_missing_key_file` unit test on `persisted_authority_config`.

#174's e2e scenarios always run against a freshly-minted key in a tempdir and are `#[ignore]`d, so they never exercise a stale persisted config; this adds the missing coverage.

### Verification

- Reproduced the failure on unpatched code (`authority autostart failed`); patched → prints `hello`, key regenerated.
- `cargo clippy -p firma-run -p firma --tests` clean; `cargo fmt --check` clean; new + existing tests pass.
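The idempotent "reuse if present, mint if missing" behaviour described for `ensure_authority_key` might look roughly like the sketch below. It is an illustration under assumptions, not the helper from the referenced commit: `ensure_key_sketch` and its `mint` closure are hypothetical, the key material is supplied by the caller instead of being generated here, and file permissions on the secret are not handled.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical sketch of an idempotent "ensure key" helper.
/// Returns Ok(true) if a new key was written, Ok(false) if an existing one was reused.
fn ensure_key_sketch(path: &Path, mint: impl FnOnce() -> Vec<u8>) -> io::Result<bool> {
    // Create the parent directory first: a wiped tmpfs loses it along with the key.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // `create_new` fails with AlreadyExists instead of truncating, so an
    // existing key is never overwritten, even under a concurrent start.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(&mint())?;
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

Calling it twice with the same path writes the key once and leaves it untouched the second time, which is the property that lets issued tokens survive restarts. Deleting the file (as a tmpfs wipe would) and calling it again mints a fresh key.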
Summary
End-to-end enforcement scenarios driving real agents (`claude`, `codex`) through `firma run`. Each runs two phases: baseline (agent alone, confirms the task is doable) and enforcement (under firma, confirms the expected decision).

- `simple_prompt`
- `allow_http_call`: … `permit` policy both grant it
- `deny_forbidden_http_resource`: `forbid` on the resource UID
- `deny_unclassified_intent`
- `deny_http_call`
- `block_raw_tcp_egress`
- `fs_read_deny`
- `fs_delete_deny`

Run