
test: add e2e enforcement scenarios#174

Merged
luca-iachini merged 36 commits into main from fir-368-integration-tests
Jun 30, 2026

@luca-iachini luca-iachini commented Jun 19, 2026

Summary

End-to-end enforcement scenarios driving real agents (claude, codex) through firma run. Each runs two phases: baseline (agent alone, confirms the task is doable) and enforcement (under firma, confirms the expected decision).
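
The two-phase structure described above can be sketched roughly as follows. This is a minimal illustration and not the harness from this PR: `Decision`, `Outcome`, and `check_scenario` are hypothetical names, and the agent runs are stubbed out as closures rather than real `firma run` invocations.

```rust
/// Decision a scenario expects from the enforcement phase (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
    Block,
}

/// What a single agent run produced (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Outcome {
    /// Whether the agent finished the task it was prompted with.
    pub task_completed: bool,
    /// Decision observed for the run.
    pub decision: Decision,
}

/// Runs both phases and reports the first expectation that failed.
///
/// `baseline` runs the agent alone; `enforced` runs it under enforcement.
pub fn check_scenario(
    baseline: impl FnOnce() -> Outcome,
    enforced: impl FnOnce() -> Outcome,
    expected: Decision,
) -> Result<(), String> {
    // Phase 1: the task must be doable without enforcement, otherwise a
    // later DENY would not prove anything about the policy.
    let base = baseline();
    if !base.task_completed {
        return Err("baseline: agent could not complete the task".to_string());
    }

    // Phase 2: under enforcement, the observed decision must match.
    let run = enforced();
    if run.decision == expected {
        Ok(())
    } else {
        Err(format!(
            "enforcement: expected {:?}, got {:?}",
            expected, run.decision
        ))
    }
}
```

Running the baseline first keeps a failing enforcement phase interpretable: if the agent cannot do the task on its own, the scenario fails early instead of reporting a misleading decision.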

| Scenario | Expected decision | What it checks |
| --- | --- | --- |
| simple_prompt | ALLOW | Plain chat: no protected action, agent completes |
| allow_http_call | ALLOW | Outbound HTTP when capability and permit policy both grant it |
| deny_forbidden_http_resource | DENY | Explicit forbid on the resource UID |
| deny_unclassified_intent | DENY | Unclassified request (no mapping rule) fails closed |
| deny_http_call | DENY | Mapped action class the agent holds no capability for |
| block_raw_tcp_egress | BLOCK | Raw TCP socket egress refused by the sandbox |
| fs_read_deny | DENY | Filesystem read of a secret outside allowed scope |
| fs_delete_deny | DENY | Filesystem delete of a protected file |
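
The HTTP rows of the scenario list imply a fail-closed decision rule. The sketch below is one way to read them, under stated assumptions: `Request`, `Verdict`, and `decide` are hypothetical names, and the order of the checks is a guess, since the scenario list only states the outcomes and not how firma evaluates them.

```rust
/// Result of evaluating one request (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    /// Denied, with a short reason for the audit trail.
    Deny(&'static str),
}

/// Facts about a request that the rule needs (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Request<'a> {
    /// Action class the request mapped to; `None` when no mapping rule matched.
    pub action_class: Option<&'a str>,
    /// An explicit forbid exists for the target resource.
    pub resource_forbidden: bool,
    /// The agent holds a capability for the mapped action class.
    pub has_capability: bool,
    /// A permit policy grants the action.
    pub policy_permits: bool,
}

/// Allows only when every check passes; anything else is a denial.
/// The ordering of the checks is an assumption made for this sketch.
pub fn decide(req: &Request<'_>) -> Verdict {
    // No mapping rule: fail closed (deny_unclassified_intent).
    if req.action_class.is_none() {
        return Verdict::Deny("unclassified intent");
    }
    // Explicit forbid on the resource (deny_forbidden_http_resource).
    if req.resource_forbidden {
        return Verdict::Deny("forbidden resource");
    }
    // Mapped action class without a capability (deny_http_call).
    if !req.has_capability {
        return Verdict::Deny("missing capability");
    }
    // Capability alone is not enough; a permit policy must also grant it.
    if !req.policy_permits {
        return Verdict::Deny("no permit policy");
    }
    // Capability and permit policy both grant it (allow_http_call).
    Verdict::Allow
}
```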

Run

cargo nextest run -p firma --test e2e --run-ignored all -E 'test(claude::)'
cargo nextest run -p firma --test e2e --run-ignored all -E 'test(codex::)'

@luca-iachini luca-iachini force-pushed the fir-368-integration-tests branch from 8d0cd33 to 4fd2256 Compare June 19, 2026 08:00
@luca-iachini luca-iachini marked this pull request as draft June 19, 2026 17:28
Base automatically changed from fir-368-e2e-tests to main June 23, 2026 16:33
Add the full integration test infrastructure: harness, config, audit
utilities, CI workflow, and supporting crate changes. Wire up one
scenario (normal_llm_call) to validate the end-to-end flow before
the remaining scenarios land in the follow-up PR.

Add 7 scenarios covering the key enforcement policies:
block_paste_service, block_unlisted_host, tool_call_exfil,
direct_tcp_bypass, fs_read_deny, fs_delete_deny, code_fibonacci.

The supervisor writes a flat AuthorityConfig TOML, but firma authority --config
calls load_section(..., "authority"), which expects a section wrapper.

The per-run authority always runs plaintext on loopback. User config may
have TLS cert paths and a fixed listen_addr; carrying those into the
spawned process causes FRAME_SIZE_ERROR (h2c client vs TLS server).
Clear the TLS config and select an ephemeral loopback port up front.
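
The last commit message above describes deriving a per-run config from the user's config. A minimal sketch of that idea, using only the standard library, might look like this. `AuthorityConfig`, `TlsPaths`, `pick_loopback_addr`, and `per_run_config` are hypothetical stand-ins, not the types or functions in `supervisor.rs`.

```rust
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::PathBuf;

/// Hypothetical stand-in for the TLS part of the authority config.
#[derive(Debug, Clone)]
pub struct TlsPaths {
    pub cert: PathBuf,
    pub key: PathBuf,
}

/// Hypothetical stand-in for the authority config.
#[derive(Debug, Clone)]
pub struct AuthorityConfig {
    pub listen_addr: SocketAddr,
    pub tls: Option<TlsPaths>,
    pub key_file: PathBuf,
}

/// Asks the OS for a free loopback port by binding to port 0.
///
/// The listener is dropped on return, so the port is released again and
/// another process could claim it before the authority binds. That race
/// is accepted in this sketch.
pub fn pick_loopback_addr() -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    listener.local_addr()
}

/// Derives the per-run config: plaintext, loopback, ephemeral port.
/// Everything else is carried over from the user's config unchanged.
pub fn per_run_config(user: &AuthorityConfig) -> std::io::Result<AuthorityConfig> {
    Ok(AuthorityConfig {
        listen_addr: pick_loopback_addr()?,
        // Dropping the TLS paths keeps the spawned server plaintext, so a
        // plaintext client never ends up talking to a TLS listener.
        tls: None,
        ..user.clone()
    })
}
```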
@luca-iachini luca-iachini force-pushed the fir-368-integration-tests branch from e85cd03 to 959bf29 Compare June 24, 2026 16:19
codecov Bot commented Jun 24, 2026

Codecov Report

❌ Patch coverage is 77.50000% with 36 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/firma-run/src/authority/supervisor.rs | 73.03% | 11 Missing and 13 partials ⚠️ |
| crates/firma-authority/src/keygen.rs | 87.30% | 6 Missing and 2 partials ⚠️ |
| crates/firma-run/src/runtime.rs | 0.00% | 4 Missing ⚠️ |


@luca-iachini luca-iachini marked this pull request as ready for review June 26, 2026 15:53
@luca-iachini luca-iachini changed the title from "test: add remaining e2e enforcement scenarios" to "test: add e2e enforcement scenarios" Jun 26, 2026
Comment thread tests/e2e/scenarios/fs_delete_deny.rs
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/fs_read_deny.rs Outdated
Comment thread tests/e2e/scenarios/deny_unmapped_http_call.rs Outdated
Comment thread tests/e2e/scenarios/deny_unclassified_intent.rs
Comment thread tests/e2e/snapshots/e2e__audit__claude_allow_via_policy.snap Outdated
Comment thread crates/firma-run/src/authority/supervisor.rs
…ests

# Conflicts:
#	Cargo.lock
#	crates/firma-run/src/authority/supervisor.rs
#	crates/firma/Cargo.toml
@luca-iachini luca-iachini merged commit a7ace5b into main Jun 30, 2026
15 checks passed
@luca-iachini luca-iachini deleted the fir-368-integration-tests branch June 30, 2026 09:40
veeso added a commit that referenced this pull request Jul 1, 2026
…218)

Fixes FIR-404 — https://linear.app/firma-ai/issue/FIR-404

## Summary

`firma run` fails closed with `failed to read key file … No such file or
directory` when the persisted `[authority].key_file` no longer exists on
disk — e.g. it lives in `$XDG_RUNTIME_DIR` (a tmpfs cleared on
logout/reboot) and the machine has since rebooted.

### Root cause

The persisted authority-config path added in #174 reads the configured
`key_file` directly but **never generates it when absent**, despite:

- `persisted_authority_config`’s doc claiming *“The key is generated on
demand if the configured path has none yet.”*
- `generate_authority_key`’s doc referencing an `ensure_authority_key`
helper that **was never implemented**.

Before #174, autostart always minted a fresh ephemeral key in the
per-run marker dir, so a missing configured key never mattered. #174
switched to reading the persisted key without regenerating it.

### Fix

- Implement `ensure_authority_key` — idempotent: reuse an existing key
so issued tokens survive restarts; mint one (creating the parent dir)
only when the secret is missing. Called from
`persisted_authority_config`.
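
A rough sketch of the idempotent "reuse or mint" behaviour described in the fix, assuming nothing about the real helper's signature: `ensure_key_file` and its `mint` closure are hypothetical, the key material is supplied by the caller, and file permissions are not handled here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Makes sure a key file exists at `path`.
///
/// Returns `Ok(true)` if a new key was written and `Ok(false)` if an
/// existing one was left untouched. `mint` is only called when needed.
pub fn ensure_key_file(path: &Path, mint: impl FnOnce() -> Vec<u8>) -> io::Result<bool> {
    // Reuse an existing key so previously issued tokens stay valid.
    if path.exists() {
        return Ok(false);
    }

    // The parent directory may be gone too, for example after a tmpfs wipe.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }

    // `create_new` fails if the file appeared in the meantime, so a
    // concurrent writer's key is never overwritten.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(&mint())?;
            Ok(true)
        }
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}
```

Calling it twice with the same path writes the key once and leaves it alone the second time; deleting the file in between causes a fresh key to be minted.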

### Test

- `crates/firma/tests/run_echo_smoke.rs`: scaffolds a real persisted
config, wipes the key to mimic the tmpfs wipe, and runs `firma run --
echo hello` **end to end** — asserting the sandboxed command executes
and the key is regenerated. Skips gracefully only when the host lacks
unprivileged user namespaces.
- `regenerates_missing_key_file` unit test on
`persisted_authority_config`.

#174’s e2e scenarios always run against a freshly-minted key in a
tempdir and are `#[ignore]`d, so they never exercise a stale persisted
config — this adds the missing coverage.

### Verification

- Reproduced the failure on unpatched code (`authority autostart
failed`); patched → prints `hello`, key regenerated.
- `cargo clippy -p firma-run -p firma --tests` clean; `cargo fmt
--check` clean; new + existing tests pass.