feat(cli): add autonomous reporting and eval proof [W6.A.2]#77
TimothyVang wants to merge 4 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 216da585cc
    volume_args = [
        "-v",
        f"{_volume_source(status, host_evidence_path.parent)}:/evidence",
Restore read-only evidence mounts
When any registered tool runs, this mounts the host evidence directory at /evidence without the previous :ro,noexec options. In environments where the sandboxed command misbehaves or is compromised, it can now write back to the evidence parent before the CLI records outputs, violating the repo's read-only evidence invariant and risking evidence mutation that later commands trust. Please restore read-only/noexec on the evidence mount while keeping any cache mounts writable as needed.
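One possible shape for the fix, as a minimal sketch rather than the repo's actual code: build the evidence mount spec through a helper that appends the `ro,noexec` options by default, and keep cache mounts on a separate, writable path. The helper names `evidence_volume_args` and `cache_volume_args`, and the example paths, are hypothetical. Whether the `noexec` option is accepted in a `-v` spec depends on the container runtime in use; the sketch only assumes the previous code passed `:ro,noexec`, as the comment above states.

```python
from __future__ import annotations

from typing import Sequence


def evidence_volume_args(
    source: str,
    target: str = "/evidence",
    options: Sequence[str] = ("ro", "noexec"),
) -> list[str]:
    """Build `-v` arguments for the evidence mount (hypothetical helper).

    The options default to read-only and noexec so that a caller has to
    opt out explicitly to get a writable evidence mount.
    """
    spec = f"{source}:{target}"
    if options:
        spec += ":" + ",".join(options)
    return ["-v", spec]


def cache_volume_args(source: str, target: str) -> list[str]:
    """Build `-v` arguments for a writable cache mount (hypothetical helper)."""
    return ["-v", f"{source}:{target}"]


# Example paths are placeholders, not values from the PR.
volume_args = [
    *evidence_volume_args("/cases/demo"),
    *cache_volume_args("/var/cache/demo", "/cache"),
]
print(volume_args)
```

Keeping the restrictive options as the default means a regression like the one flagged here would show up as an explicit `options=()` at the call site instead of a silently shortened f-string.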
    case_id=case_id,
    tool_key="fls",
    evidence_index=evidence_index,
    extra_args=("-r",),
Pass partition offsets to LOLBin discovery
For partitioned disk images, _run_disk_case_sequence first derives offsets and runs fsstat/fls with -o, but this helper then does the recursive LOLBin discovery as plain fls -r (and later icat without an offset). On normal E01/dd images where the filesystem starts at a partition offset, TSK will inspect the container rather than the filesystem, so PowerShell transcripts/Prefetch won't be discovered and the new EVIL_FOUND finding is silently missed. Pass the selected offset through discovery and extraction.
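A rough sketch of what threading the offset through could look like, assuming the selected partition offset is available as a sector count. The helper `tsk_argv` and the example image name, offset, and inode are hypothetical and do not come from the PR; the only TSK behaviour relied on is that `fls` and `icat` both accept `-o imgoffset` before the image path.

```python
from __future__ import annotations

from typing import Optional, Sequence


def tsk_argv(
    tool: str,
    image: str,
    *,
    offset_sectors: Optional[int] = None,
    extra_args: Sequence[str] = (),
    trailing: Sequence[str] = (),
) -> list[str]:
    """Build an argv list for a Sleuth Kit tool (hypothetical helper).

    When an offset is given, `-o <sectors>` is placed before the image so
    the tool opens the filesystem inside the partition, not the container.
    """
    argv = [tool]
    if offset_sectors is not None:
        argv += ["-o", str(offset_sectors)]
    argv += list(extra_args)
    argv.append(image)
    argv += list(trailing)  # e.g. the inode number for icat
    return argv


# Placeholder values: a partition starting at sector 2048, inode 1234.
offset = 2048
discovery = tsk_argv("fls", "disk.dd", offset_sectors=offset, extra_args=("-r",))
extraction = tsk_argv("icat", "disk.dd", offset_sectors=offset, trailing=("1234",))
print(discovery)   # ['fls', '-o', '2048', '-r', 'disk.dd']
print(extraction)  # ['icat', '-o', '2048', 'disk.dd', '1234']
```

Using one helper for both discovery and extraction keeps the two calls on the same offset, which is the property the comment above asks for.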
Summary
Task IDs
Test Plan
uv run python scripts/build_check.py --tier fast
uv run pytest tests/cli/test_cli.py tests/cli/test_run_case_logic.py tests/proof/test_cloud_proof.py tests/policy/test_eval_fail_closed.py tests/scorers/test_hallucination_rate.py -q