Merged
120 changes: 120 additions & 0 deletions .agents/skills/pr-review-auditor/SKILL.md
@@ -0,0 +1,120 @@
---
name: pr-review-auditor
description: Review the currently checked-out PR by discovering its upstream base branch with GitHub CLI, diffing base..HEAD, and performing a correctness-first code review that enforces repository policy with special priority on *GUIDELINES*.md rules and then AGENTS.md. Use when asked to do a full PR review, /review-style audit, commit-range quality check, or policy-compliance review of branch changes.
---

# PR Review Auditor

## Overview

Review branch changes from PR base to `HEAD` with findings-first output.
Prioritize correctness risks, then CI regressions, then test gaps, and enforce
policy precedence: `*GUIDELINES*.md` before `AGENTS.md`.

## Workflow

1. Collect context from the active branch. Run
`.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh` from repo
root to detect:

- checked-out branch,
- PR number and base branch (via `gh pr view`),
- merge-base commit,
- review range (`<base_remote>/<base>..HEAD`),
- changed files,
- available policy files (`*GUIDELINES*.md`, `AGENTS.md`).
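The script's `KEY=VALUE` preamble can be consumed by downstream tooling with a small parser like this (a sketch; the key names come from the script's output, the sample values are purely illustrative):

```python
def parse_pr_context(output: str) -> dict[str, str]:
    """Parse the KEY=VALUE preamble emitted by collect_pr_context.sh."""
    context: dict[str, str] = {}
    for line in output.splitlines():
        if not line or "=" not in line:
            break  # a blank line ends the preamble; file listings follow
        key, _, value = line.partition("=")
        context[key] = value
    return context

# Illustrative sample of the script's preamble (values are made up).
sample = (
    "CONTEXT_SOURCE=gh-pr-view\n"
    "PR_NUMBER=42\n"
    "PR_BASE_BRANCH=main\n"
    "REVIEW_RANGE=upstream/main..HEAD\n"
)
ctx = parse_pr_context(sample)
print(ctx["REVIEW_RANGE"])  # upstream/main..HEAD
```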

2. Inspect commits and patches in range. Run
`.agents/skills/pr-review-auditor/scripts/review_range.sh` with the range
from step 1. This prints:

- commits with full messages (`git log --format=fuller`),
- per-commit changed files,
- unified patches for detailed behavior review.

3. Run a three-pass policy protocol (Markdown-only).

Pass 1: Rule extraction

- Read all `*GUIDELINES*.md` files first, then `AGENTS.md`.
- Extract rules section-by-section by heading. For each heading, list actionable
rules before moving to the next heading.
- Extract every actionable rule into a numbered list with a source citation.
- Keep wording close to the source text and avoid merging unrelated rules.
- Minimum extraction floors:
- `TESTING_GUIDELINES.md`: at least 20 rules unless a heading-by-heading audit
proves fewer.
- `AGENTS.md`: at least 15 rules unless a heading-by-heading audit proves
fewer.
- If below a floor, include an explicit justification section that names each
heading and why no additional actionable rule exists.

Pass 2: Evidence mapping

- For each extracted rule, assign `pass`, `fail`, or `n/a`.
- Provide concrete evidence (`file:line`, command output, or explicit absence).
- If rules conflict, cite both and follow `*GUIDELINES*.md`.

Pass 3: Adversarial gap check

- Re-read diffs and policy files only to find missed rules/findings.
- Add new findings only when they include new source citations and evidence.
- Require at least 5 "candidate missed rules" per policy file during this pass,
then mark each as `new rule` or `duplicate` with rationale.
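The extraction pass can be sketched as follows — a minimal illustration assuming rules appear as markdown bullets under headings (real policy files may nest differently); the floors mirror the values above:

```python
import re

# Extraction floors taken from the protocol above.
FLOORS = {"TESTING_GUIDELINES.md": 20, "AGENTS.md": 15}

def extract_rules(filename: str, text: str, prefix: str) -> list[dict]:
    """Collect bullet-point rules per heading and assign stable rule IDs."""
    rules: list[dict] = []
    heading = "(preamble)"
    for line in text.splitlines():
        if line.startswith("#"):
            heading = line.lstrip("# ").strip()
        elif re.match(r"^\s*-\s+\S", line):
            rule_id = f"{prefix}-{len(rules) + 1:03d}"  # e.g. A-001
            rules.append({"id": rule_id, "source": filename,
                          "heading": heading,
                          "text": line.strip("- ").strip()})
    return rules

def floor_met(filename: str, rules: list[dict]) -> bool:
    """True when the per-file extraction floor is satisfied."""
    return len(rules) >= FLOORS.get(filename, 0)
```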

4. Perform review in severity order.

- Correctness/behavioral regressions.
- CI and workflow regressions.
- Test gaps or weak assertions.
- Secondary maintainability/style concerns.

5. Produce findings-first output. List issues with file/line references and
policy citation when applicable. Separate policy findings into two explicit
sections:

- `GUIDELINES findings`
- `AGENTS findings`

6. Provide a remediation plan. Include:

- ordered fix plan,
- tests to run and why,
- tests not run and why,
- assumptions and open questions.

## Output Requirements

- Findings first, ordered by severity.
- Include commit-message quality notes when relevant.
- Cite exact paths and lines for each finding.
- Include `files reviewed`, `tests run`, and `tests not run`.
- If no findings exist, state that explicitly and list residual risks.
- Include a `Rule Coverage Matrix` section in Markdown.
- Do not mark the review complete until every extracted rule has a status and
evidence.
- Do not emit internal script-path fallback chatter; run the canonical
`.agents/skills/pr-review-auditor/scripts/*` paths directly.
- Include a `Rule Extraction Summary` with:
- total rules per source file,
- per-heading rule counts,
- whether extraction floors were met,
- explicit justification if any floor was not met.

## Rule Coverage Matrix Template

Use this exact table shape:

| Rule ID | Source | Rule Summary | Status (pass/fail/n-a) | Evidence |
| ------- | --------------------- | ------------ | ---------------------- | ------------------ |
| G-001 | TESTING_GUIDELINES.md | ... | pass | tests/test_x.py:42 |
| A-001 | AGENTS.md | ... | fail | src/module.py:87 |
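A small helper can emit exactly this table shape from extracted rule rows (a sketch; the row fields are assumed to match the extraction pass):

```python
def render_matrix(rows: list[dict]) -> str:
    """Render rule rows as the Rule Coverage Matrix markdown table."""
    lines = [
        "| Rule ID | Source | Rule Summary | Status (pass/fail/n-a) | Evidence |",
        "| ------- | ------ | ------------ | ---------------------- | -------- |",
    ]
    for row in rows:
        lines.append(
            f"| {row['id']} | {row['source']} | {row['summary']} "
            f"| {row['status']} | {row['evidence']} |"
        )
    return "\n".join(lines)

matrix = render_matrix([{
    "id": "G-001", "source": "TESTING_GUIDELINES.md", "summary": "...",
    "status": "pass", "evidence": "tests/test_x.py:42",
}])
print(matrix)
```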

## Trigger Examples

- "Figure out this PR's base branch and do a /review from base to HEAD."
- "Audit all commits in this branch and check AGENTS + GUIDELINES compliance."
- "Review patches in this PR and give me a fix plan with test gaps."

## Resources

- `.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh`: derive PR
base/range and policy file inventory.
- `.agents/skills/pr-review-auditor/scripts/review_range.sh`: print full commit
messages and patches for a range.
- `references/review-checklist.md`: compact review checklist and severity
rubric.
4 changes: 4 additions & 0 deletions .agents/skills/pr-review-auditor/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "PR Review Auditor"
short_description: "Review base..HEAD with GUIDELINES-first policy checks."
default_prompt: "Use $pr-review-auditor to review the current PR from upstream base to HEAD with findings-first output and strict GUIDELINES/AGENTS compliance checks."
66 changes: 66 additions & 0 deletions .agents/skills/pr-review-auditor/references/review-checklist.md
@@ -0,0 +1,66 @@
# PR Review Checklist

## Policy Precedence

1. `*GUIDELINES*.md` rules are highest-priority compliance checks.
2. `AGENTS.md` rules are required secondary checks.
3. If rules conflict, cite both files and follow `*GUIDELINES*.md`.

## Review Scope

- Determine PR base branch and audit `<base_remote>/<base>..HEAD`.
- Review full commit messages, changed files, and relevant patches.
- Focus on behavior, CI impact, tests, then style.

## Three-Pass Completeness Protocol

1. Rule extraction pass:

- Enumerate every actionable rule from `*GUIDELINES*.md`, then `AGENTS.md`.
- Extract heading-by-heading and record per-heading counts.
- Assign stable rule IDs (for example `G-001`, `A-001`).
- Meet extraction floors unless explicitly justified:
- `TESTING_GUIDELINES.md`: >= 20 rules.
- `AGENTS.md`: >= 15 rules.
- If below floor, justify heading-by-heading why no additional actionable
rules exist.

2. Evidence mapping pass:

- Map every rule to `pass`, `fail`, or `n/a`.
- Attach evidence for every status (path/line or concrete command output).

3. Gap pass:

- Re-check changed files and policy text for missing rules/findings.
- Add only findings backed by new source citations and evidence.
- Generate at least 5 missed-rule candidates per policy file, then classify each
as `new rule` or `duplicate` with rationale.

## Severity Order

1. Correctness risk: behavior bug, data loss, runtime failure, API mismatch.
2. CI/workflow regression: broken matrix, wrong triggers, missing job parity.
3. Test gap: missing or weak coverage for changed behavior.
4. Maintainability/style: readability or consistency issues with low break risk.
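The severity order above can be applied mechanically when sorting findings — a sketch in which the category names are shorthand for the four levels and the finding shape is an assumption:

```python
# Shorthand for the four severity levels above, most severe first.
SEVERITY = ["correctness", "ci", "test-gap", "style"]

def order_findings(findings: list[dict]) -> list[dict]:
    """Sort findings by the checklist's severity order, most severe first."""
    rank = {name: i for i, name in enumerate(SEVERITY)}
    # Unknown categories sort last rather than raising.
    return sorted(findings, key=lambda f: rank.get(f["category"], len(SEVERITY)))
```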

## Mandatory Output Sections

- Findings (ordered by severity, each with path/line evidence).
- Rule Coverage Matrix (every extracted rule must be represented).
- Rule Extraction Summary (totals per file, per-heading counts, floors met/not
met).
- GUIDELINES findings.
- AGENTS findings.
- Open questions and assumptions.
- Remediation plan with validation steps.
- Files reviewed.
- Tests run.
- Tests not run and why.

## Completion Gate

- The review is incomplete if any extracted rule is missing from the matrix.
- The review is incomplete if any matrix row lacks status or evidence.
- The review is incomplete if extraction floors are missed without explicit
heading-by-heading justification.
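The first two gate conditions can be checked programmatically — a minimal sketch assuming matrix rows carry `id`, `status`, and `evidence` fields:

```python
def review_complete(extracted_ids: set[str], matrix: list[dict]) -> list[str]:
    """Return gate violations; an empty list means the review may close."""
    problems: list[str] = []
    matrix_ids = {row["id"] for row in matrix}
    # Gate 1: every extracted rule must appear in the matrix.
    for rule_id in sorted(extracted_ids - matrix_ids):
        problems.append(f"{rule_id}: missing from matrix")
    # Gate 2: every matrix row needs a valid status and evidence.
    for row in matrix:
        if row.get("status") not in {"pass", "fail", "n/a"}:
            problems.append(f"{row['id']}: missing or invalid status")
        if not row.get("evidence"):
            problems.append(f"{row['id']}: missing evidence")
    return problems
```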
82 changes: 82 additions & 0 deletions .agents/skills/pr-review-auditor/scripts/collect_pr_context.sh
@@ -0,0 +1,82 @@
#!/usr/bin/env bash
set -euo pipefail

branch="$(git rev-parse --abbrev-ref HEAD)"
base_remote="origin"
if git remote get-url upstream > /dev/null 2>&1; then
base_remote="upstream"
fi

context_source="gh-pr-view"
pr_number="N/A"
head_ref="${branch}"
base_ref=""
pr_json=""
gh_error=""

if command -v gh > /dev/null 2>&1 && command -v jq > /dev/null 2>&1; then
gh_error_file="$(mktemp)"
if ! pr_json="$(gh pr view --json number,baseRefName,headRefName 2> "${gh_error_file}")"; then
pr_json=""
fi
if [[ -z "${pr_json}" ]]; then
gh_error="$(< "${gh_error_file}")"
fi
rm -f "${gh_error_file}"
fi

if [[ -n "${pr_json}" && "${pr_json}" != "null" ]]; then
pr_number="$(jq -r '.number' <<< "${pr_json}")"
base_ref="$(jq -r '.baseRefName' <<< "${pr_json}")"
head_ref="$(jq -r '.headRefName' <<< "${pr_json}")"
else
context_source="remote-head-fallback"
remote_head_ref="$(git symbolic-ref --quiet --short "refs/remotes/${base_remote}/HEAD" || true)"
if [[ -z "${remote_head_ref}" ]]; then
echo "error: unable to derive base branch from refs/remotes/${base_remote}/HEAD" >&2
echo "hint: run 'git remote set-head ${base_remote} --auto' and retry" >&2
if [[ -n "${gh_error}" ]]; then
echo "gh pr view: ${gh_error}" >&2
fi
exit 1
fi
base_ref="${remote_head_ref#"${base_remote}"/}"
if [[ -n "${gh_error}" ]]; then
echo "warning: gh pr view failed; using ${base_remote}/HEAD fallback" >&2
echo "gh pr view: ${gh_error}" >&2
fi
fi

fetch_error_file="$(mktemp)"
if ! git fetch --quiet "${base_remote}" "${base_ref}" 2> "${fetch_error_file}"; then
if git rev-parse --verify --quiet "${base_remote}/${base_ref}" > /dev/null; then
echo "warning: git fetch failed; using cached ${base_remote}/${base_ref}" >&2
cat "${fetch_error_file}" >&2
else
echo "error: unable to fetch ${base_remote}/${base_ref} and no local cached ref exists" >&2
cat "${fetch_error_file}" >&2
rm -f "${fetch_error_file}"
exit 1
fi
fi
rm -f "${fetch_error_file}"

merge_base="$(git merge-base "${base_remote}/${base_ref}" HEAD)"
review_range="${base_remote}/${base_ref}..HEAD"

echo "CONTEXT_SOURCE=${context_source}"
echo "PR_NUMBER=${pr_number}"
echo "CURRENT_BRANCH=${branch}"
echo "PR_HEAD_BRANCH=${head_ref}"
echo "PR_BASE_BRANCH=${base_ref}"
echo "PR_BASE_REMOTE=${base_remote}"
echo "MERGE_BASE=${merge_base}"
echo "REVIEW_RANGE=${review_range}"
echo

echo "CHANGED_FILES:"
git diff --name-only "${review_range}"
echo

echo "POLICY_FILES:"
find . -type f \( -name '*GUIDELINES*.md' -o -name 'AGENTS.md' \) | sort
24 changes: 24 additions & 0 deletions .agents/skills/pr-review-auditor/scripts/review_range.sh
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail

range="${1:-}"
if [[ -z "${range}" ]]; then
echo "usage: $(basename "$0") <git-range>" >&2
echo "example: $(basename "$0") upstream/main..HEAD" >&2
exit 1
fi

echo "=== RANGE ==="
echo "${range}"
echo

echo "=== COMMITS (full messages) ==="
git log --format=fuller "${range}"
echo

echo "=== COMMIT FILE SUMMARIES ==="
git log --name-status --format='commit %H%nAuthor: %an <%ae>%nDate: %ad%n%n%s%n%b' "${range}"
echo

echo "=== PATCHES ==="
git log --patch --stat "${range}"
35 changes: 32 additions & 3 deletions AGENTS.md
@@ -23,13 +23,38 @@
- `uvx nox -s tests` — create matrix virtualenvs via nox and execute the pytest
session.
- `nox` executes pytest sessions with built-in parallelism; when invoking pytest
directly use `pytest -n 8 --maxschedchunk 2` to mirror the parallel test
directly use `pytest -n auto --maxschedchunk 2` to mirror the parallel test
scheduling and keep runtimes predictable.
- Coverage reports (XML/Markdown/HTML) are produced by the nox `tests` session
and stored under `coverage/py<version>/` (for example,
`coverage/py3.12/coverage.xml`, `coverage/py3.12/coverage.md`,
`coverage/py3.12/html/`).

## Code Quality Checklist

- When code changes are complete, or when asked to "check code quality", run
this default sequence:
- `uv run ruff format .`
- `uv run ruff check .`
- `uv run basedpyright`
- Do not include pytest or nox runs in the default "code quality" request; treat
runtime tests as a separate validation step.

## Test Validation Checklist

- Run tests separately from code quality checks:
- `uv run pytest -n auto --maxschedchunk 2` (or a focused module when
iterating)
- For full compatibility before handoff/PR, run:
- `uvx nox -s tests` (Python 3.10-3.14 matrix + coverage artifacts)
- For class-function coverage gate validation (when relevant to client changes),
run:
- `uv run python scripts/check_class_function_coverage.py coverage/py<version>/coverage.json --fail-under 90 --class PdfRestClient --class AsyncPdfRestClient --class _FilesClient --class _AsyncFilesClient`
- Always report:
- files changed
- tests/checks run and not run
- why any checks were skipped

## Coding Style & Naming Conventions

- Target Python 3.10–3.14; use 4-space indentation and type hints for public
@@ -125,7 +150,9 @@
a shared validation suite when multiple endpoints rely on the same input rules
(e.g., `tests/test_graphic_payload_validation.py`).
- Do not import from private modules (names beginning with an underscore) in
tests or production code—expose any shared helpers via a public module first.
production code. In tests, prefer public modules first; allow private-model
imports only when necessary to validate request serialization or mock
server-facing payload contracts that are not exposed publicly.

## Testing Guidelines

@@ -134,7 +161,9 @@
considered complete. Mirror the naming/structure used by the graphic
conversion suites: one module per endpoint, parameterized success cases that
enumerate all accepted literals, at least one invalid input that hits the
server, and coverage for any request options surfaced on the client. If an
server, and coverage for server-observable endpoint options. Validate
`extra_query`/`extra_headers`/`extra_body`/`timeout` plumbing in unit tests
(MockTransport) unless a live assertion depends on those options. If an
endpoint cannot be exercised live, call that out explicitly in the PR
description with the reason and the follow-up plan; otherwise reviewers should
block the change. Treat this as a release gate on par with unit tests.
11 changes: 9 additions & 2 deletions TESTING_GUIDELINES.md
@@ -201,15 +201,22 @@ iteration required.
predictable. Use `pytest.fixture(scope="module"/"class")` and
`pytest_asyncio.fixture` to cache uploaded PDFs/profiles for both transports.
- **Sync + async parity:** Every live module should contain matching sync and
async tests for success, customization, streaming, and invalid paths
(compression levels, conversion options, file streaming helpers).
async tests for success, streaming, and invalid paths (compression levels,
conversion options, file streaming helpers). Treat transport-plumbing
customization (`extra_query`, `extra_headers`, `extra_body`, `timeout`) as a
unit-test responsibility unless the endpoint exposes server-observable
behavior tied to those options.
- **Enumerate literals:** Parameterize over every accepted literal (compression
levels, `color_model`, `smoothing`, merge selectors, redaction presets). Each
literal should hit the server once per transport.
- **Optional arguments:** Exercise options such as custom output prefixes,
diagnostics toggles, merge metadata, and URL uploads. Validate the server
honors them (filenames start with the user-provided prefix, warnings appear
when expected).
- **Customization placement:** Verify request customization wiring in mocked
tests (sync + async) for each endpoint. Live customization tests are optional
and should be added only when they validate server-side behavior beyond client
request assembly.
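A mocked wiring check can be as small as capturing what the client would send — a sketch with an entirely hypothetical client and transport; real tests should use the project's actual client classes and `MockTransport` setup:

```python
class RecordingTransport:
    """Fake transport that records the request it would have sent."""

    def __init__(self):
        self.last_request = None

    def send(self, method, url, headers=None, params=None):
        self.last_request = {"method": method, "url": url,
                             "headers": headers or {}, "params": params or {}}
        return {"status": 200}


class HypotheticalClient:
    """Stand-in for a real API client; illustrates extra_headers/extra_query plumbing."""

    def __init__(self, transport):
        self.transport = transport

    def convert(self, *, extra_headers=None, extra_query=None):
        # The wiring under test: customization kwargs must reach the transport.
        return self.transport.send(
            "POST", "/convert",
            headers=extra_headers, params=extra_query,
        )


transport = RecordingTransport()
HypotheticalClient(transport).convert(
    extra_headers={"X-Trace": "abc"}, extra_query={"debug": "1"},
)
assert transport.last_request["headers"]["X-Trace"] == "abc"
assert transport.last_request["params"]["debug"] == "1"
```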
- **Negative live cases:** Override JSON via `extra_body`/`extra_query` to
bypass local validation and assert `PdfRestApiError` (or the exact server
exception) surfaces—for example, sending an invalid compression literal or