Merged
1 change: 1 addition & 0 deletions .agents/skills/agents-shipgate/SKILL.md
@@ -27,6 +27,7 @@ Do not use it for general linting, runtime monitoring, evals, model-output quali
- Existing manifest: run `agents-shipgate scan -c shipgate.yaml --suggest-patches --format json`.
- First GitHub CI: copy `assets/advisory-pr-comment.yml` to `.github/workflows/agents-shipgate.yml`.
- Explain one finding: run `agents-shipgate explain-finding <fingerprint> --from agents-shipgate-reports/report.json --json`.
- Triage heuristic findings: run `agents-shipgate findings --from agents-shipgate-reports/report.json --provenance-kind keyword_heuristic,regex_heuristic --json`.

## Boundaries

2 changes: 1 addition & 1 deletion .claude/commands/shipgate.md
@@ -18,7 +18,7 @@ Required behavior (do not skip):
1. Set `AGENTS_SHIPGATE_AGENT_MODE=1` for every CLI call so errors emit a `next_action` JSON line on stderr.
2. Run `agents-shipgate contract --json` when available and use it to verify the installed CLI's schema versions and gating signal.
3. Confirm with the user before running `agents-shipgate init --workspace . --write` (it writes `shipgate.yaml` to the workspace).
4. Parse `agents-shipgate-reports/report.json` directly — do not scrape the markdown. **For release gating, read `release_decision.decision` first** (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`; baseline-aware, v0.8+; `insufficient_evidence` added v0.14) along with `release_decision.{reason, blockers, review_items, fail_policy.would_fail_ci}`. Other stable fields: `findings[].{check_id, severity, tool_name, recommendation}`. `summary.{critical_count, high_count, medium_count, status}` is legacy and baseline-blind — kept for v0.7 callers, do not lead with it. The Release Evidence Packet is at `agents-shipgate-reports/packet.{md,json,html}`. Full contract: [`docs/agent-contract-current.md`](https://github.com/ThreeMoonsLab/agents-shipgate/blob/main/docs/agent-contract-current.md).
4. Parse `agents-shipgate-reports/report.json` directly — do not scrape the markdown. **For release gating, read `release_decision.decision` first** (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`; baseline-aware, v0.8+; `insufficient_evidence` added v0.14) along with `release_decision.{reason, blockers, review_items, fail_policy.would_fail_ci}`. Other stable fields: `findings[].{check_id, severity, tool_name, recommendation}`. For reviewer triage by source reliability, run `agents-shipgate findings --from agents-shipgate-reports/report.json --provenance-kind keyword_heuristic,regex_heuristic --json`; `findings[].provenance_kind` is not a gate input. `summary.{critical_count, high_count, medium_count, status}` is legacy and baseline-blind — kept for v0.7 callers, do not lead with it. The Release Evidence Packet is at `agents-shipgate-reports/packet.{md,json,html}`. Full contract: [`docs/agent-contract-current.md`](https://github.com/ThreeMoonsLab/agents-shipgate/blob/main/docs/agent-contract-current.md).
5. Add `agents-shipgate-reports/` to `.gitignore` if it is not already.
6. Do **not** run `agents-shipgate baseline save` in this flow — baselining is a separate decision.
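
A minimal sketch of the read order in step 4, assuming only the field names listed there (`release_decision.decision`, `reason`, `blockers`, `review_items`, `fail_policy.would_fail_ci`). The `gate_summary` helper and the inline sample report are hypothetical, and treating `blockers` and `review_items` as arrays is an assumption; a real run would load `agents-shipgate-reports/report.json` instead.

```python
import json

# Decision values as enumerated in step 4.
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def gate_summary(report: dict) -> dict:
    """Summarize the gate from release_decision; the legacy summary.* block is ignored."""
    rd = report.get("release_decision") or {}
    decision = rd.get("decision")
    if decision not in KNOWN_DECISIONS:
        raise ValueError(f"unexpected release_decision.decision: {decision!r}")
    return {
        "decision": decision,
        "reason": rd.get("reason"),
        # Assumption: blockers and review_items are arrays.
        "blocker_count": len(rd.get("blockers") or []),
        "review_item_count": len(rd.get("review_items") or []),
        "would_fail_ci": bool((rd.get("fail_policy") or {}).get("would_fail_ci")),
    }


# Hypothetical sample report; every value here is illustrative.
SAMPLE_GATE_REPORT = json.loads("""
{
  "release_decision": {
    "decision": "review_required",
    "reason": "illustrative reason text",
    "blockers": [],
    "review_items": [{"check_id": "EXAMPLE-CHECK"}],
    "fail_policy": {"would_fail_ci": false}
  },
  "summary": {"status": "illustrative-legacy-value"}
}
""")

print(gate_summary(SAMPLE_GATE_REPORT))
```

Raising on an unknown decision value is a design choice for this sketch: a caller pinned to an older contract fails loudly instead of silently treating a new value as a pass.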

6 changes: 6 additions & 0 deletions .cursor/rules/agents-shipgate.mdc
@@ -51,6 +51,12 @@ auto_apply, propose_patch_for_review, escalate_to_human,
suppress_with_reason, informational. Do not synthesize an action from
the underlying flags when the enum is present.

For reviewer triage by source reliability, run
`agents-shipgate findings --from agents-shipgate-reports/report.json
--provenance-kind keyword_heuristic,regex_heuristic --json`. The
underlying `findings[].provenance_kind` field is a filter signal only,
not a gate input.

To translate a single finding into user-facing prose, run:

agents-shipgate explain-finding <FINGERPRINT> \
3 changes: 2 additions & 1 deletion AGENTS.md
@@ -253,7 +253,7 @@ Other stable top-level fields:
- `baseline.{matched_count, new_count, resolved_count}`
- `tool_inventory[]`
- `codex_plugin_surface` (v0.13+, static Codex plugin package/marketplace facts)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for filtering heuristic-only findings)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for reviewer filtering via `agents-shipgate findings`; never a release-gate input)
- `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
- `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
- `release_decision.contribution_rules[]` (v0.17+, per-finding audit of how each finding contributed to the decision; one row per `report.findings` entry, with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`)
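
As a rough illustration of the `release_decision.contribution_rules[]` audit described above, the sketch below tallies rows by `category` and `rule` and checks the "one row per `report.findings` entry" property. The `tally_contributions` helper and the sample data are hypothetical; only the `category` and `rule` keys and their enum values come from the text, and the category/rule pairings in the sample are illustrative.

```python
from collections import Counter


def tally_contributions(report: dict) -> dict:
    """Count contribution rows by category and rule, and compare against findings."""
    rows = (report.get("release_decision") or {}).get("contribution_rules") or []
    findings = report.get("findings") or []
    return {
        "by_category": dict(Counter(row["category"] for row in rows)),
        "by_rule": dict(Counter(row["rule"] for row in rows)),
        # The text promises one row per report.findings entry.
        "row_per_finding": len(rows) == len(findings),
    }


# Hypothetical sample; check IDs are placeholders.
SAMPLE_CONTRIB_REPORT = {
    "findings": [
        {"check_id": "EXAMPLE-A"},
        {"check_id": "EXAMPLE-B"},
        {"check_id": "EXAMPLE-C"},
    ],
    "release_decision": {
        "contribution_rules": [
            {"category": "blocker", "rule": "severity_block_new"},
            {"category": "review_item", "rule": "review_required"},
            {"category": "excluded", "rule": "suppressed"},
        ]
    },
}

print(tally_contributions(SAMPLE_CONTRIB_REPORT))
```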
@@ -406,6 +406,7 @@ Promised to not break in `0.x` minor versions. See [STABILITY.md](STABILITY.md)
| `agents-shipgate contract` | `--json` |
| `agents-shipgate explain` | `<check_id>`, `--no-plugins`, `--json` |
| `agents-shipgate explain-finding` | `<fingerprint>`, `--from`, `--no-plugins`, `--json` |
| `agents-shipgate findings` | `--from`, `--provenance-kind`, `--include-suppressed`, `--json` |
| `agents-shipgate bootstrap` | `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json` |
| `agents-shipgate list-checks` | `--json`, `--no-plugins` |
| `agents-shipgate baseline save` | `-c`, `--out` |
3 changes: 2 additions & 1 deletion STABILITY.md
@@ -22,6 +22,7 @@ These commands and flags are stable across all `0.x.y` releases. They will only
| `agents-shipgate contract` | `--json` |
| `agents-shipgate explain` | `<check_id>`, `--no-plugins`, `--json` |
| `agents-shipgate explain-finding` (v0.12+) | `<fingerprint>`, `--from`, `--no-plugins`, `--json` |
| `agents-shipgate findings` (v0.20+) | `--from` (default: `agents-shipgate-reports/report.json`), `--provenance-kind`, `--include-suppressed`, `--json` |
| `agents-shipgate bootstrap` | `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json` |
| `agents-shipgate list-checks` | `--json`, `--no-plugins` |
| `agents-shipgate baseline save` | `-c`, `--config`, `--out` |
@@ -94,7 +95,7 @@ In `agents-shipgate-reports/report.json`, the following are guaranteed:
- `findings[].agent_action` (v0.12+) — deterministic projection of `patches`, `autofix_safe`, and `requires_human_review`. Enum: `auto_apply | propose_patch_for_review | escalate_to_human | suppress_with_reason | informational`. The first four cover the actionable cases; `informational` covers suppressed findings or non-actionable advisories. `suppress_with_reason` is reserved for future check classes that explicitly mark themselves as suppressible — the v0.12 deterministic projection does not emit it. New consumers should read `agent_action` first and treat the underlying flags as advisory.
- `agent_summary.{verdict, headline, blocker_count, review_item_count, auto_appliable_patches, needs_human_review, first_recommended_action}` (v0.12+) — top-level deterministic projection of `release_decision` + per-finding `agent_action`. Lets a coding agent read one block instead of traversing arrays. `first_recommended_action` is `{kind: "command" | "info", command: string | null, why: string}`; the `command` form carries an actual CLI invocation, the `info` form is a "surface this to the user" hint. Same inputs always produce the same output; this block cannot disagree with the underlying `release_decision` and `findings[].agent_action`.
- `codex_plugin_surface.{plugins, marketplaces, skills, apps, mcp_server_stubs, hook_stubs, mcp_inventory_files, component_path_issues, warnings}` (v0.13+) — static Codex plugin package and marketplace facts. Only explicit MCP inventory tools enter `tool_inventory[]`; apps, hooks, skills, and MCP server declarations stay in this surface block.
- `findings[].provenance_kind` (v0.15+) — records *how a finding was produced*; independent of `confidence`, which records how *sure* we are. Enum: `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`. `static_declaration` covers manifest, MCP, OpenAPI schema facts, and declarative framework inputs like ADK YAML agent configs or LangChain/CrewAI inventory JSON files — high-trust structural data. `ast_extraction` covers findings against Tools parsed from user Python source by a framework extractor (LangChain function/structured tools, CrewAI function/class tools, ADK Python toolsets); these are subject to extraction error and agents that distrust AST quality can filter them as a class. Framework checks that fire against both AST-extracted and declaratively loaded tools (ADK's per-tool checks) pick the label per tool from `tool.source_type`. `keyword_heuristic` covers token-list matches (broad scope, read-only prompts, free-text parameter names); `regex_heuristic` covers regex matches (secrets, prompt injection); `policy_pack` covers findings emitted by externally loaded policy packs. Built-in checks set the value via the required kwarg on the `tool_finding`/`agent_finding` helpers; third-party plugin checks that construct `Finding(...)` directly and omit the field are coerced to `static_declaration` by `annotate_remediation` so the wire schema stays satisfied. Required + non-nullable on the wire; the field is Python-Optional only so older v0.12/v0.13 reports loaded by `explain-finding` and minimal synthetic test fixtures keep working.
- `findings[].provenance_kind` (v0.15+) — records *how a finding was produced*; independent of `confidence`, which records how *sure* we are. It is a reviewer triage/filter signal only: it never changes `release_decision`, severity, fingerprints, baselines, or CI exit behavior. Use `agents-shipgate findings --from agents-shipgate-reports/report.json --provenance-kind keyword_heuristic,regex_heuristic --json` to filter active findings by provenance class. Enum: `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`. `static_declaration` covers manifest, MCP, OpenAPI schema facts, and declarative framework inputs like ADK YAML agent configs or LangChain/CrewAI inventory JSON files — high-trust structural data. `ast_extraction` covers findings against Tools parsed from user Python source by a framework extractor (LangChain function/structured tools, CrewAI function/class tools, ADK Python toolsets); these are subject to extraction error and agents that distrust AST quality can filter them as a class. Framework checks that fire against both AST-extracted and declaratively loaded tools (ADK's per-tool checks) pick the label per tool from `tool.source_type`. `keyword_heuristic` covers token-list matches (broad scope, read-only prompts, free-text parameter names); `regex_heuristic` covers regex matches (secrets, prompt injection); `policy_pack` covers findings emitted by externally loaded policy packs. Built-in checks set the value via the required kwarg on the `tool_finding`/`agent_finding` helpers; third-party plugin checks that construct `Finding(...)` directly and omit the field are coerced to `static_declaration` by `annotate_remediation` so the wire schema stays satisfied. Required + non-nullable on the wire; the field is Python-Optional only so older v0.12/v0.13 reports loaded by `explain-finding` and minimal synthetic test fixtures keep working.
- `findings[].blocks_release` (v0.16+) — explicit release-policy blocking bit. Built-in and user-defined Action Surface Diff policies, plus declarative policy-pack rules with `block: true`, set it for findings that must block release when active and unbaselined; ordinary severity-based gating still works for existing checks.
- `action_surface_facts.actions[]` (v0.16+) — deterministic current action snapshot: action id, operation, effect, normalized risk tags, scopes, approval policy, safeguards, evidence, input fields, and stable hashes.
- `action_surface_diff.{enabled, base, summary, added, removed, modified, notes}` (v0.16+) — reviewer-facing delta for what the agent can do vs. a prior report or v0.4 baseline. Policy findings derived from this diff can set `findings[].blocks_release=true` and affect `release_decision.decision` and strict-mode exit behavior.
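
One way a consumer might act on the `agent_summary.first_recommended_action` shape described above (`{kind: "command" | "info", command: string | null, why: string}`) is sketched below. The `next_step` helper, its output strings, and the sample `why` text are hypothetical; the sketch only formats a suggestion and does not execute anything.

```python
def next_step(report: dict) -> str:
    """Turn first_recommended_action into a one-line suggestion for the caller."""
    action = (report.get("agent_summary") or {}).get("first_recommended_action") or {}
    kind = action.get("kind")
    why = action.get("why", "")
    if kind == "command" and action.get("command"):
        # The "command" form carries an actual CLI invocation.
        return f"run: {action['command']} ({why})"
    if kind == "info":
        # The "info" form is a hint to surface to the user; command is null.
        return f"tell the user: {why}"
    return "no recommended action in report"


# Hypothetical sample; the "why" text is illustrative.
SAMPLE_SUMMARY_REPORT = {
    "agent_summary": {
        "first_recommended_action": {
            "kind": "info",
            "command": None,
            "why": "illustrative hint for the reviewer",
        }
    }
}

print(next_step(SAMPLE_SUMMARY_REPORT))
```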
14 changes: 13 additions & 1 deletion docs/agent-contract-current.md
@@ -36,7 +36,9 @@ The action exposes these as outputs `decision`, `blocker_count`, `review_item_co

`agents-shipgate contract --json` exposes `manual_review_signals[]` as the
installed CLI's stable list of report/packet fields to inspect for human review
work.
work. `findings[].provenance_kind` is included there as a filter/review signal
only; it never changes the release decision, severity, fingerprints, baselines,
or CI exit behavior.

The capability/intent diff fields (v0.9+), used by reviewers to spot misalignment between declared agent intent and actual tool surface:

@@ -96,6 +98,16 @@ Per-finding `provenance_kind` enum (v0.15+), additive classification — read th

Provenance generally follows the rule's own trigger (e.g., a rule that checks for a declared manifest field is `static_declaration` even when the underlying Tool was AST-extracted). For framework checks that fire across both AST and declarative tool sources (ADK's per-tool checks against `google_adk_function` AND `google_adk_config` tools), the label tracks the underlying tool's source. Third-party plugin checks that don't yet set the field land at `static_declaration` by default — pre-v0.15 plugins continue to validate against the v0.15 wire schema. Use `findings[].source.type` for the precise underlying tool source.

To filter operationally, use:

```bash
agents-shipgate findings --from agents-shipgate-reports/report.json \
--provenance-kind keyword_heuristic,regex_heuristic --json
```

The command reads active findings by default; add `--include-suppressed` when a
reviewer needs suppressed entries in the same provenance summary.
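
For callers that already hold the parsed report, the same provenance filter can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the CLI's implementation: `filter_by_provenance` is a hypothetical helper, the sample check IDs are placeholders, and the output shape of `agents-shipgate findings --json` is not reproduced here.

```python
# The two heuristic provenance classes named in the command above.
HEURISTIC_KINDS = frozenset({"keyword_heuristic", "regex_heuristic"})


def filter_by_provenance(findings: list, kinds=HEURISTIC_KINDS) -> list:
    """Keep findings whose provenance_kind is in `kinds`.

    .get() is used because older reports may lack the field entirely.
    """
    return [f for f in findings if f.get("provenance_kind") in kinds]


# Hypothetical sample; check IDs are placeholders.
SAMPLE_PROVENANCE_REPORT = {
    "release_decision": {"decision": "passed"},
    "findings": [
        {"check_id": "EXAMPLE-A", "provenance_kind": "static_declaration"},
        {"check_id": "EXAMPLE-B", "provenance_kind": "keyword_heuristic"},
        {"check_id": "EXAMPLE-C", "provenance_kind": "regex_heuristic"},
        {"check_id": "EXAMPLE-D"},  # pre-v0.15 style entry without the field
    ],
}

heuristic = filter_by_provenance(SAMPLE_PROVENANCE_REPORT["findings"])
print([f["check_id"] for f in heuristic])
# Filtering is read-only triage: the release decision is never recomputed from it.
print(SAMPLE_PROVENANCE_REPORT["release_decision"]["decision"])
```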

For reviewer-shaped output, also read the **Release Evidence Packet** at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the `[pdf]` extras are installed). Packet outputs are redacted by the same default privacy layer as the report. The packet has fixed reviewer sections governed by [`docs/packet-schema.v0.6.json`](packet-schema.v0.6.json) — see [STABILITY.md §Release Evidence Packet](../STABILITY.md#release-evidence-packet-v06).
Packet schema `0.6` preserves the v0.5 `action_surface_diff` section and
adds two independent additive extensions:
11 changes: 11 additions & 0 deletions docs/report-reading-for-agents.md
@@ -66,6 +66,17 @@ Per-finding stable fields (see [`AGENTS.md`](../AGENTS.md) Task 2 for the full l

Group by `severity` to summarize; cite `check_id` so the user can run `agents-shipgate explain <check_id>` for rationale.

For reviewer triage by source reliability, filter on
`findings[].provenance_kind` with the dedicated command:

```bash
agents-shipgate findings --from agents-shipgate-reports/report.json \
--provenance-kind keyword_heuristic,regex_heuristic --json
```

This is not a gate signal. It does not change severity, release decisions,
fingerprints, baselines, or CI exit codes.
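
The "group by `severity`, cite `check_id`" summary mentioned above could look roughly like the following. The `check_ids_by_severity` helper and the sample findings are hypothetical; only the `severity` and `check_id` field names come from the text, and the severity strings in the sample are illustrative.

```python
from collections import defaultdict


def check_ids_by_severity(findings: list) -> dict:
    """Map each severity to the sorted, de-duplicated check_ids found at that level."""
    grouped = defaultdict(set)
    for finding in findings:
        grouped[finding["severity"]].add(finding["check_id"])
    return {severity: sorted(ids) for severity, ids in grouped.items()}


# Hypothetical sample; check IDs are placeholders.
SAMPLE_FINDINGS = [
    {"check_id": "EXAMPLE-B", "severity": "high"},
    {"check_id": "EXAMPLE-A", "severity": "high"},
    {"check_id": "EXAMPLE-A", "severity": "high"},
    {"check_id": "EXAMPLE-C", "severity": "medium"},
]

for severity, ids in check_ids_by_severity(SAMPLE_FINDINGS).items():
    # Each cited check_id can be passed to `agents-shipgate explain <check_id>`.
    print(f"{severity}: {', '.join(ids)}")
```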

### Step 4 · Per-finding autofix fields (v0.7+)

For every active finding, inspect:
6 changes: 6 additions & 0 deletions docs/target-repo-agent-snippets.md
@@ -198,6 +198,12 @@ auto_apply, propose_patch_for_review, escalate_to_human,
suppress_with_reason, informational. Do not synthesize an action from
the underlying flags when the enum is present.

For reviewer triage by source reliability, run
`agents-shipgate findings --from agents-shipgate-reports/report.json
--provenance-kind keyword_heuristic,regex_heuristic --json`. The
underlying `findings[].provenance_kind` field is a filter signal only,
not a gate input.

To translate a single finding into user-facing prose, run:

agents-shipgate explain-finding <FINGERPRINT> \
17 changes: 15 additions & 2 deletions llms-full.txt
@@ -278,7 +278,7 @@ Other stable top-level fields:
- `baseline.{matched_count, new_count, resolved_count}`
- `tool_inventory[]`
- `codex_plugin_surface` (v0.13+, static Codex plugin package/marketplace facts)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for filtering heuristic-only findings)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for reviewer filtering via `agents-shipgate findings`; never a release-gate input)
- `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
- `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
- `release_decision.contribution_rules[]` (v0.17+, per-finding audit of how each finding contributed to the decision; one row per `report.findings` entry, with `category` ∈ `{blocker, review_item, excluded}` and `rule` ∈ `{policy_block_new, severity_block_new, policy_baseline_accepted, severity_baseline_accepted, review_required, sub_threshold, suppressed}`)
@@ -431,6 +431,7 @@ Promised to not break in `0.x` minor versions. See [STABILITY.md](STABILITY.md)
| `agents-shipgate contract` | `--json` |
| `agents-shipgate explain` | `<check_id>`, `--no-plugins`, `--json` |
| `agents-shipgate explain-finding` | `<fingerprint>`, `--from`, `--no-plugins`, `--json` |
| `agents-shipgate findings` | `--from`, `--provenance-kind`, `--include-suppressed`, `--json` |
| `agents-shipgate bootstrap` | `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json` |
| `agents-shipgate list-checks` | `--json`, `--no-plugins` |
| `agents-shipgate baseline save` | `-c`, `--out` |
@@ -856,7 +857,9 @@ The action exposes these as outputs `decision`, `blocker_count`, `review_item_co

`agents-shipgate contract --json` exposes `manual_review_signals[]` as the
installed CLI's stable list of report/packet fields to inspect for human review
work.
work. `findings[].provenance_kind` is included there as a filter/review signal
only; it never changes the release decision, severity, fingerprints, baselines,
or CI exit behavior.

The capability/intent diff fields (v0.9+), used by reviewers to spot misalignment between declared agent intent and actual tool surface:

@@ -916,6 +919,16 @@ Per-finding `provenance_kind` enum (v0.15+), additive classification — read th

Provenance generally follows the rule's own trigger (e.g., a rule that checks for a declared manifest field is `static_declaration` even when the underlying Tool was AST-extracted). For framework checks that fire across both AST and declarative tool sources (ADK's per-tool checks against `google_adk_function` AND `google_adk_config` tools), the label tracks the underlying tool's source. Third-party plugin checks that don't yet set the field land at `static_declaration` by default — pre-v0.15 plugins continue to validate against the v0.15 wire schema. Use `findings[].source.type` for the precise underlying tool source.

To filter operationally, use:

```bash
agents-shipgate findings --from agents-shipgate-reports/report.json \
--provenance-kind keyword_heuristic,regex_heuristic --json
```

The command reads active findings by default; add `--include-suppressed` when a
reviewer needs suppressed entries in the same provenance summary.

For reviewer-shaped output, also read the **Release Evidence Packet** at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the `[pdf]` extras are installed). Packet outputs are redacted by the same default privacy layer as the report. The packet has fixed reviewer sections governed by [`docs/packet-schema.v0.6.json`](packet-schema.v0.6.json) — see [STABILITY.md §Release Evidence Packet](../STABILITY.md#release-evidence-packet-v06).
Packet schema `0.6` preserves the v0.5 `action_surface_diff` section and
adds two independent additive extensions: