diff --git a/.agents/skills/pr-review-auditor/SKILL.md b/.agents/skills/pr-review-auditor/SKILL.md
new file mode 100644
index 00000000..856c4507
--- /dev/null
+++ b/.agents/skills/pr-review-auditor/SKILL.md
@@ -0,0 +1,120 @@
+---
+name: pr-review-auditor
+description: Review the currently checked-out PR by discovering its upstream base branch with GitHub CLI, diffing base..HEAD, and performing a correctness-first code review that enforces repository policy with special priority on *GUIDELINES*.md rules and then AGENTS.md. Use when asked to do a full PR review, /review-style audit, commit-range quality check, or policy-compliance review of branch changes.
+---
+
+# PR Review Auditor
+
+## Overview
+
+Review branch changes from PR base to `HEAD` with findings-first output.
+Prioritize correctness risks, then CI regressions, then test gaps, and enforce
+policy precedence: `*GUIDELINES*.md` before `AGENTS.md`.
+
+## Workflow
+
+1. Collect context from the active branch. Run
+   `.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh` from repo
+   root to detect:
+
+- checked-out branch,
+- PR number and base branch (via `gh pr view`),
+- merge-base commit,
+- review range (`<base_remote>/<base_ref>..HEAD`),
+- changed files,
+- available policy files (`*GUIDELINES*.md`, `AGENTS.md`).
+
+2. Inspect commits and patches in range. Run
+   `.agents/skills/pr-review-auditor/scripts/review_range.sh` with the range
+   from step 1. This prints:
+
+- commits with full messages (`git log --format=fuller`),
+- per-commit changed files,
+- unified patches for detailed behavior review.
+
+3. Run a three-pass policy protocol (Markdown-only).
+
+Pass 1: Rule extraction
+
+- Read all `*GUIDELINES*.md` files first, then `AGENTS.md`.
+- Extract rules section-by-section by heading. For each heading, list actionable
+  rules before moving to the next heading.
+- Extract every actionable rule into a numbered list with source citation.
+- Keep wording close to source text and avoid merging unrelated rules.
+- Minimum extraction floors:
+  - `TESTING_GUIDELINES.md`: at least 20 rules unless a heading-by-heading audit
+    proves fewer.
+  - `AGENTS.md`: at least 15 rules unless a heading-by-heading audit proves
+    fewer.
+  - If below floor, include an explicit justification section that names each
+    heading and why no additional actionable rule exists.
+
+Pass 2: Evidence mapping
+
+- For each extracted rule, assign `pass`, `fail`, or `n/a`.
+- Provide concrete evidence (`file:line`, command output, or explicit absence).
+- If rules conflict, cite both and follow `*GUIDELINES*.md`.
+
+Pass 3: Adversarial gap check
+
+- Re-read diffs and policy files only to find missed rules/findings.
+- Add new findings only when they include new source citations and evidence.
+- Require at least 5 "candidate missed rules" per policy file during this pass,
+  then mark each as `new rule` or `duplicate` with rationale.
+
+4. Perform review in severity order.
+
+- Correctness/behavioral regressions.
+- CI and workflow regressions.
+- Test gaps or weak assertions.
+- Secondary maintainability/style concerns.
+
+5. Produce findings-first output. List issues with file/line references and
+   policy citation when applicable. Separate policy findings into two explicit
+   sections:
+
+- `GUIDELINES findings`
+- `AGENTS findings`
+
+6. Provide a remediation plan. Include:
+
+- ordered fix plan,
+- tests to run and why,
+- tests not run and why,
+- assumptions and open questions.
+
+## Output Requirements
+
+- Findings first, ordered by severity.
+- Include commit-message quality notes when relevant.
+- Cite exact paths and lines for each finding.
+- Include `files reviewed`, `tests run`, and `tests not run`.
+- If no findings exist, state that explicitly and list residual risks.
+- Include a `Rule Coverage Matrix` section in Markdown.
+- Do not mark the review complete until every extracted rule has a status and
+  evidence.
+- Do not emit internal script-path fallback chatter; run the canonical
+  `.agents/skills/pr-review-auditor/scripts/*` paths directly.
+- Include a `Rule Extraction Summary` with:
+  - total rules per source file,
+  - per-heading rule counts,
+  - whether extraction floors were met,
+  - explicit justification if any floor was not met.
+
+## Rule Coverage Matrix Template
+
+Use this exact table shape:
+
+| Rule ID | Source                | Rule Summary | Status (pass/fail/n-a) | Evidence           |
+| ------- | --------------------- | ------------ | ---------------------- | ------------------ |
+| G-001   | TESTING_GUIDELINES.md | ...          | pass                   | tests/test_x.py:42 |
+| A-001   | AGENTS.md             | ...          | fail                   | src/module.py:87   |
+
+## Trigger Examples
+
+- "Figure out this PR's base branch and do a /review from base to HEAD."
+- "Audit all commits in this branch and check AGENTS + GUIDELINES compliance."
+- "Review patches in this PR and give me a fix plan with test gaps."
+
+## Resources
+
+- `.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh`: derive PR
+  base/range and policy file inventory.
+- `.agents/skills/pr-review-auditor/scripts/review_range.sh`: print full commit
+  messages and patches for a range.
+- `references/review-checklist.md`: compact review checklist and severity
+  rubric.
diff --git a/.agents/skills/pr-review-auditor/agents/openai.yaml b/.agents/skills/pr-review-auditor/agents/openai.yaml
new file mode 100644
index 00000000..8639e469
--- /dev/null
+++ b/.agents/skills/pr-review-auditor/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "PR Review Auditor"
+  short_description: "Review base..HEAD with GUIDELINES-first policy checks."
+  default_prompt: "Use $pr-review-auditor to review the current PR from upstream base to HEAD with findings-first output and strict GUIDELINES/AGENTS compliance checks."
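Reviewer's note: `collect_pr_context.sh` (inventoried in Resources above) emits line-oriented `KEY=VALUE` pairs, so its `REVIEW_RANGE` value can be extracted and handed straight to `review_range.sh`. A minimal sketch; the sample context values below are hypothetical, only the key names come from the script:

```shell
# Parse REVIEW_RANGE out of collect_pr_context.sh-style KEY=VALUE output.
# ctx holds a hypothetical sample, not real script output.
ctx='CONTEXT_SOURCE=gh-pr-view
PR_NUMBER=123
REVIEW_RANGE=origin/main..HEAD'
range="$(printf '%s\n' "$ctx" | sed -n 's/^REVIEW_RANGE=//p')"
echo "$range"   # origin/main..HEAD
```

In the real workflow, `range` would then be passed as the single argument to `.agents/skills/pr-review-auditor/scripts/review_range.sh`.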
diff --git a/.agents/skills/pr-review-auditor/references/review-checklist.md b/.agents/skills/pr-review-auditor/references/review-checklist.md
new file mode 100644
index 00000000..8b184640
--- /dev/null
+++ b/.agents/skills/pr-review-auditor/references/review-checklist.md
@@ -0,0 +1,66 @@
+# PR Review Checklist
+
+## Policy Precedence
+
+1. `*GUIDELINES*.md` rules are highest-priority compliance checks.
+2. `AGENTS.md` rules are required secondary checks.
+3. If rules conflict, cite both files and follow `*GUIDELINES*.md`.
+
+## Review Scope
+
+- Determine PR base branch and audit `<base_remote>/<base_ref>..HEAD`.
+- Review full commit messages, changed files, and relevant patches.
+- Focus on behavior, CI impact, tests, then style.
+
+## Three-Pass Completeness Protocol
+
+1. Rule extraction pass:
+
+- Enumerate every actionable rule from `*GUIDELINES*.md`, then `AGENTS.md`.
+- Extract heading-by-heading and record per-heading counts.
+- Assign stable rule IDs (for example `G-001`, `A-001`).
+- Meet extraction floors unless explicitly justified:
+  - `TESTING_GUIDELINES.md`: >= 20 rules.
+  - `AGENTS.md`: >= 15 rules.
+  - If below floor, justify heading-by-heading why no additional actionable
+    rules exist.
+
+2. Evidence mapping pass:
+
+- Map every rule to `pass`, `fail`, or `n/a`.
+- Attach evidence for every status (path/line or concrete command output).
+
+3. Gap pass:
+
+- Re-check changed files and policy text for missing rules/findings.
+- Add only findings backed by new source citations and evidence.
+- Generate at least 5 missed-rule candidates per policy file, then classify each
+  as `new rule` or `duplicate` with rationale.
+
+## Severity Order
+
+1. Correctness risk: behavior bug, data loss, runtime failure, API mismatch.
+2. CI/workflow regression: broken matrix, wrong triggers, missing job parity.
+3. Test gap: missing or weak coverage for changed behavior.
+4. Maintainability/style: readability or consistency issues with low break risk.
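Reviewer's note: the stable rule IDs the checklist calls for (`G-001`, `A-001`, ...) can be minted mechanically. A minimal sketch; `mint_ids` and the rule texts are hypothetical, not part of the skill's scripts:

```shell
# Hypothetical helper: number extracted rules as G-001, G-002, ... in input order.
mint_ids() {
  i=0
  for rule in "$@"; do
    i=$((i + 1))
    # Zero-pad to three digits so IDs sort lexicographically.
    printf 'G-%03d %s\n' "$i" "$rule"
  done
}

mint_ids "Cache uploads with module-scoped fixtures" "Enumerate accepted literals"
# G-001 Cache uploads with module-scoped fixtures
# G-002 Enumerate accepted literals
```

An `A-` prefix variant would cover `AGENTS.md` rules the same way, keeping matrix rows stable across review passes.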
+
+## Mandatory Output Sections
+
+- Findings (ordered by severity, each with path/line evidence).
+- Rule Coverage Matrix (every extracted rule must be represented).
+- Rule Extraction Summary (totals per file, per-heading counts, floors met/not
+  met).
+- GUIDELINES findings.
+- AGENTS findings.
+- Open questions and assumptions.
+- Remediation plan with validation steps.
+- Files reviewed.
+- Tests run.
+- Tests not run and why.
+
+## Completion Gate
+
+- The review is incomplete if any extracted rule is missing from the matrix.
+- The review is incomplete if any matrix row lacks status or evidence.
+- The review is incomplete if extraction floors are missed without explicit
+  heading-by-heading justification.
diff --git a/.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh b/.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh
new file mode 100755
index 00000000..4218877d
--- /dev/null
+++ b/.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh
@@ -0,0 +1,82 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+branch="$(git rev-parse --abbrev-ref HEAD)"
+base_remote="origin"
+if git remote get-url upstream > /dev/null 2>&1; then
+  base_remote="upstream"
+fi
+
+context_source="gh-pr-view"
+pr_number="N/A"
+head_ref="${branch}"
+base_ref=""
+pr_json=""
+gh_error=""
+
+if command -v gh > /dev/null 2>&1 && command -v jq > /dev/null 2>&1; then
+  gh_error_file="$(mktemp)"
+  if ! pr_json="$(gh pr view --json number,baseRefName,headRefName 2> "${gh_error_file}")"; then
+    pr_json=""
+  fi
+  if [[ -z "${pr_json}" ]]; then
+    gh_error="$(< "${gh_error_file}")"
+  fi
+  rm -f "${gh_error_file}"
+fi
+
+if [[ -n "${pr_json}" && "${pr_json}" != "null" ]]; then
+  pr_number="$(jq -r '.number' <<< "${pr_json}")"
+  base_ref="$(jq -r '.baseRefName' <<< "${pr_json}")"
+  head_ref="$(jq -r '.headRefName' <<< "${pr_json}")"
+else
+  context_source="remote-head-fallback"
+  remote_head_ref="$(git symbolic-ref --quiet --short "refs/remotes/${base_remote}/HEAD" || true)"
+  if [[ -z "${remote_head_ref}" ]]; then
+    echo "error: unable to derive base branch from refs/remotes/${base_remote}/HEAD" >&2
+    echo "hint: run 'git remote set-head ${base_remote} --auto' and retry" >&2
+    if [[ -n "${gh_error}" ]]; then
+      echo "gh pr view: ${gh_error}" >&2
+    fi
+    exit 1
+  fi
+  base_ref="${remote_head_ref#"${base_remote}"/}"
+  if [[ -n "${gh_error}" ]]; then
+    echo "warning: gh pr view failed; using ${base_remote}/HEAD fallback" >&2
+    echo "gh pr view: ${gh_error}" >&2
+  fi
+fi
+
+fetch_error_file="$(mktemp)"
+if ! git fetch --quiet "${base_remote}" "${base_ref}" 2> "${fetch_error_file}"; then
+  if git rev-parse --verify --quiet "${base_remote}/${base_ref}" > /dev/null; then
+    echo "warning: git fetch failed; using cached ${base_remote}/${base_ref}" >&2
+    cat "${fetch_error_file}" >&2
+  else
+    echo "error: unable to fetch ${base_remote}/${base_ref} and no local cached ref exists" >&2
+    cat "${fetch_error_file}" >&2
+    rm -f "${fetch_error_file}"
+    exit 1
+  fi
+fi
+rm -f "${fetch_error_file}"
+
+merge_base="$(git merge-base "${base_remote}/${base_ref}" HEAD)"
+review_range="${base_remote}/${base_ref}..HEAD"
+
+echo "CONTEXT_SOURCE=${context_source}"
+echo "PR_NUMBER=${pr_number}"
+echo "CURRENT_BRANCH=${branch}"
+echo "PR_HEAD_BRANCH=${head_ref}"
+echo "PR_BASE_BRANCH=${base_ref}"
+echo "PR_BASE_REMOTE=${base_remote}"
+echo "MERGE_BASE=${merge_base}"
+echo "REVIEW_RANGE=${review_range}"
+echo
+
+echo "CHANGED_FILES:"
+git diff --name-only "${review_range}"
+echo
+
+echo "POLICY_FILES:"
+find . -type f \( -name '*GUIDELINES*.md' -o -name 'AGENTS.md' \) | sort
diff --git a/.agents/skills/pr-review-auditor/scripts/review_range.sh b/.agents/skills/pr-review-auditor/scripts/review_range.sh
new file mode 100755
index 00000000..8c6a61ed
--- /dev/null
+++ b/.agents/skills/pr-review-auditor/scripts/review_range.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+range="${1:-}"
+if [[ -z "${range}" ]]; then
+  echo "usage: $(basename "$0") <range>" >&2
+  echo "example: $(basename "$0") upstream/main..HEAD" >&2
+  exit 1
+fi
+
+echo "=== RANGE ==="
+echo "${range}"
+echo
+
+echo "=== COMMITS (full messages) ==="
+git log --format=fuller "${range}"
+echo
+
+echo "=== COMMIT FILE SUMMARIES ==="
+git log --name-status --format='commit %H%nAuthor: %an <%ae>%nDate: %ad%n%n%s%n%b' "${range}"
+echo
+
+echo "=== PATCHES ==="
+git log --patch --stat "${range}"
diff --git a/AGENTS.md b/AGENTS.md
index 4dc7de69..632e009a 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -23,13 +23,38 @@
 - `uvx nox -s tests` — create matrix virtualenvs via nox and execute the
   pytest session.
 - `nox` executes pytest sessions with built-in parallelism; when invoking pytest
-  directly use `pytest -n 8 --maxschedchunk 2` to mirror the parallel test
+  directly use `pytest -n auto --maxschedchunk 2` to mirror the parallel test
   scheduling and keep runtimes predictable.
 - Coverage reports (XML/Markdown/HTML) are produced by the nox `tests` session
   and stored under `coverage/py/` (for example, `coverage/py3.12/coverage.xml`,
   `coverage/py3.12/coverage.md`, `coverage/py3.12/html/`).
 
+## Code Quality Checklist
+
+- When code changes are complete, or when asked to "check code quality", run
+  this default sequence:
+  - `uv run ruff format .`
+  - `uv run ruff check .`
+  - `uv run basedpyright`
+- Do not include pytest or nox runs in the default "code quality" request; treat
+  runtime tests as a separate validation step.
+
+## Test Validation Checklist
+
+- Run tests separately from code quality checks:
+  - `uv run pytest -n auto --maxschedchunk 2` (or a focused module when
+    iterating)
+- For full compatibility before handoff/PR, run:
+  - `uvx nox -s tests` (Python 3.10-3.14 matrix + coverage artifacts)
+- For class-function coverage gate validation (when relevant to client changes),
+  run:
+  - `uv run python scripts/check_class_function_coverage.py coverage/py/coverage.json --fail-under 90 --class PdfRestClient --class AsyncPdfRestClient --class _FilesClient --class _AsyncFilesClient`
+- Always report:
+  - files changed
+  - tests/checks run and not run
+  - why any checks were skipped
+
 ## Coding Style & Naming Conventions
 
 - Target Python 3.10–3.14; use 4-space indentation and type hints for public
@@ -125,7 +150,9 @@
   a shared validation suite when multiple endpoints rely on the same input rules
   (e.g., `tests/test_graphic_payload_validation.py`).
 - Do not import from private modules (names beginning with an underscore) in
-  tests or production code—expose any shared helpers via a public module first.
+  production code. In tests, prefer public modules first; allow private-model
+  imports only when necessary to validate request serialization or mock
+  server-facing payload contracts that are not exposed publicly.
 
 ## Testing Guidelines
 
@@ -134,7 +161,9 @@
   considered complete. Mirror the naming/structure used by the graphic
   conversion suites: one module per endpoint, parameterized success cases that
   enumerate all accepted literals, at least one invalid input that hits the
-  server, and coverage for any request options surfaced on the client. If an
+  server, and coverage for server-observable endpoint options. Validate
+  `extra_query`/`extra_headers`/`extra_body`/`timeout` plumbing in unit tests
+  (MockTransport) unless a live assertion depends on those options. If an
   endpoint cannot be exercised live, call that out explicitly in the PR
   description with the reason and the follow-up plan; otherwise reviewers should
   block the change. Treat this as a release gate on par with unit tests.
diff --git a/TESTING_GUIDELINES.md b/TESTING_GUIDELINES.md
index 853cd4ae..08f21f42 100644
--- a/TESTING_GUIDELINES.md
+++ b/TESTING_GUIDELINES.md
@@ -201,8 +201,11 @@ iteration required.
   predictable. Use `pytest.fixture(scope="module"/"class")` and
   `pytest_asyncio.fixture` to cache uploaded PDFs/profiles for both transports.
 - **Sync + async parity:** Every live module should contain matching sync and
-  async tests for success, customization, streaming, and invalid paths
-  (compression levels, conversion options, file streaming helpers).
+  async tests for success, streaming, and invalid paths (compression levels,
+  conversion options, file streaming helpers). Treat transport-plumbing
+  customization (`extra_query`, `extra_headers`, `extra_body`, `timeout`) as a
+  unit-test responsibility unless the endpoint exposes server-observable
+  behavior tied to those options.
 - **Enumerate literals:** Parameterize over every accepted literal (compression
   levels, `color_model`, `smoothing`, merge selectors, redaction presets). Each
   literal should hit the server once per transport.
@@ -210,6 +213,10 @@ iteration required.
   diagnostics toggles, merge metadata, and URL uploads. Validate the server
   honors them (filenames start with the user-provided prefix, warnings appear
   when expected).
+- **Customization placement:** Verify request customization wiring in mocked
+  tests (sync + async) for each endpoint. Live customization tests are optional
+  and should be added only when they validate server-side behavior beyond client
+  request assembly.
 - **Negative live cases:** Override JSON via `extra_body`/`extra_query` to
   bypass local validation and assert `PdfRestApiError` (or the exact server
   exception) surfaces—for example, sending an invalid compression literal or