feat(cli): add autonomous reporting and eval proof [W6.A.2] #77

Open
TimothyVang wants to merge 4 commits into main from codex/split-verified-wip

Conversation

@TimothyVang

Owner

Summary

  • Add autonomous CLI case-to-report flow, package checks, analyst HTML/PDF reporting, and submission artifacts.
  • Wire cloud proof eval to the real proof harness with a hallucination scorer and per-case ground-truth checks.
  • Harden SIFT parsing/microsandbox handling and remove the obsolete build-side swarm surface.

Task IDs

  • W1.E.2
  • W6.A.2
  • W4.D.1
  • W1.A.0

Test Plan

  • uv run python scripts/build_check.py --tier fast
  • uv run pytest tests/cli/test_cli.py tests/cli/test_run_case_logic.py tests/proof/test_cloud_proof.py tests/policy/test_eval_fail_closed.py tests/scorers/test_hallucination_rate.py -q

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 216da585cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


volume_args = [
"-v",
f"{_volume_source(status, host_evidence_path.parent)}:/evidence",

P1: Restore read-only evidence mounts

When any registered tool runs, this mounts the host evidence directory at /evidence without the previous :ro,noexec options. In environments where the sandboxed command misbehaves or is compromised, it can now write back to the evidence parent before the CLI records outputs, violating the repo's read-only evidence invariant and risking evidence mutation that later commands trust. Please restore read-only/noexec on the evidence mount while keeping any cache mounts writable as needed.
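
A minimal sketch of what restoring the mount options could look like. The helper names `evidence_volume_args` and `cache_volume_args` are hypothetical and not part of this PR; the `source` argument stands in for whatever `_volume_source(status, host_evidence_path.parent)` returns, and whether `noexec` is accepted alongside `ro` depends on the container runtime in use.

```python
from typing import List


def evidence_volume_args(source: str) -> List[str]:
    """Build the `-v` arguments for the evidence mount (hypothetical helper).

    The evidence directory is always mounted read-only and non-executable,
    so a misbehaving tool cannot write back to the host evidence parent.
    """
    return ["-v", f"{source}:/evidence:ro,noexec"]


def cache_volume_args(source: str, target: str) -> List[str]:
    """Build the `-v` arguments for a writable cache mount (hypothetical helper)."""
    return ["-v", f"{source}:{target}:rw"]


# Usage: evidence stays read-only, the cache stays writable.
volume_args = [
    *evidence_volume_args("/cases/001"),
    *cache_volume_args("/var/cache/tools", "/cache"),
]
```

Keeping the evidence options in one helper means a writable evidence mount cannot be introduced by accident at a call site.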

case_id=case_id,
tool_key="fls",
evidence_index=evidence_index,
extra_args=("-r",),

P2: Pass partition offsets to LOLBin discovery

For partitioned disk images, _run_disk_case_sequence first derives offsets and runs fsstat/fls with -o, but this helper then does the recursive LOLBin discovery as plain fls -r (and later icat without an offset). On normal E01/dd images where the filesystem starts at a partition offset, TSK will inspect the container rather than the filesystem, so PowerShell transcripts/Prefetch won't be discovered and the new EVIL_FOUND finding is silently missed. Pass the selected offset through discovery and extraction.
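
One way to thread the offset through both steps is sketched below. The function `tsk_args` is a hypothetical helper, not code from this PR; it relies only on the `-o` (image offset, in sectors) and `-r` flags that the Sleuth Kit `fls` and `icat` commands accept, and it does not model how the repo's registered tools take `extra_args`.

```python
from typing import List, Optional, Tuple


def tsk_args(
    tool: str,
    image: str,
    offset: Optional[int] = None,
    extra_args: Tuple[str, ...] = (),
    inode: Optional[str] = None,
) -> List[str]:
    """Build a Sleuth Kit command line (hypothetical helper).

    When `offset` is given it is passed as `-o <sectors>` so the tool reads
    the filesystem inside the partition rather than the raw container.
    """
    args = [tool, *extra_args]
    if offset is not None:
        args += ["-o", str(offset)]
    args.append(image)
    if inode is not None:
        # icat takes the inode (metadata address) after the image path.
        args.append(inode)
    return args


# Usage: the same offset is reused for discovery and for extraction.
offset = 2048  # example value only; the real one comes from partition parsing
discover = tsk_args("fls", "disk.E01", offset=offset, extra_args=("-r",))
extract = tsk_args("icat", "disk.E01", offset=offset, inode="1234")
```

Deriving both command lines from one selected offset keeps discovery and extraction pointed at the same filesystem.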
