Merged
120 changes: 120 additions & 0 deletions .agents/skills/pr-review-auditor/SKILL.md
@@ -0,0 +1,120 @@
---
name: pr-review-auditor
description: Review the currently checked-out PR by discovering its upstream base branch with GitHub CLI, diffing base..HEAD, and performing a correctness-first code review that enforces repository policy with special priority on *GUIDELINES*.md rules and then AGENTS.md. Use when asked to do a full PR review, /review-style audit, commit-range quality check, or policy-compliance review of branch changes.
---

# PR Review Auditor

## Overview

Review branch changes from PR base to `HEAD` with findings-first output.
Prioritize correctness risks, then CI regressions, then test gaps, and enforce
policy precedence: `*GUIDELINES*.md` before `AGENTS.md`.

## Workflow

1. Collect context from the active branch. Run
`.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh` from repo
root to detect:

- checked-out branch,
- PR number and base branch (via `gh pr view`),
- merge-base commit,
- review range (`<base_remote>/<base>..HEAD`),
- changed files,
- available policy files (`*GUIDELINES*.md`, `AGENTS.md`).
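The script's `KEY=VALUE` preamble can be consumed by downstream tooling with a small parser like this (a sketch; the key names come from the script's output, the sample values are purely illustrative):

```python
def parse_pr_context(output: str) -> dict[str, str]:
    """Parse the KEY=VALUE preamble emitted by collect_pr_context.sh."""
    context: dict[str, str] = {}
    for line in output.splitlines():
        if not line or "=" not in line:
            break  # a blank line ends the preamble; file listings follow
        key, _, value = line.partition("=")
        context[key] = value
    return context

# Illustrative sample of the script's preamble (values are made up).
sample = (
    "CONTEXT_SOURCE=gh-pr-view\n"
    "PR_NUMBER=42\n"
    "PR_BASE_BRANCH=main\n"
    "REVIEW_RANGE=upstream/main..HEAD\n"
)
ctx = parse_pr_context(sample)
print(ctx["REVIEW_RANGE"])  # upstream/main..HEAD
```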

2. Inspect commits and patches in range. Run
`.agents/skills/pr-review-auditor/scripts/review_range.sh` with the range
from step 1. This prints:

- commits with full messages (`git log --format=fuller`),
- per-commit changed files,
- unified patches for detailed behavior review.

3. Run a three-pass policy protocol (Markdown-only).

Pass 1: Rule extraction

- Read all `*GUIDELINES*.md` files first, then `AGENTS.md`.
- Extract rules section-by-section by heading. For each heading, list actionable
rules before moving to the next heading.
- Extract every actionable rule into a numbered list with a source citation.
- Keep wording close to the source text and avoid merging unrelated rules.
- Minimum extraction floors:
- `TESTING_GUIDELINES.md`: at least 20 rules unless a heading-by-heading audit
proves fewer.
- `AGENTS.md`: at least 15 rules unless a heading-by-heading audit proves
fewer.
- If below a floor, include an explicit justification section that names each
heading and why no additional actionable rule exists.

Pass 2: Evidence mapping

- For each extracted rule, assign `pass`, `fail`, or `n/a`.
- Provide concrete evidence (`file:line`, command output, or explicit absence).
- If rules conflict, cite both and follow `*GUIDELINES*.md`.

Pass 3: Adversarial gap check

- Re-read diffs and policy files only to find missed rules/findings.
- Add new findings only when they include new source citations and evidence.
- Require at least 5 "candidate missed rules" per policy file during this pass,
then mark each as `new rule` or `duplicate` with rationale.
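The extraction pass can be sketched as follows — a minimal illustration assuming rules appear as markdown bullets under headings (real policy files may nest differently); the floors mirror the values above:

```python
import re

# Extraction floors taken from the protocol above.
FLOORS = {"TESTING_GUIDELINES.md": 20, "AGENTS.md": 15}

def extract_rules(filename: str, text: str, prefix: str) -> list[dict]:
    """Collect bullet-point rules per heading and assign stable rule IDs."""
    rules: list[dict] = []
    heading = "(preamble)"
    for line in text.splitlines():
        if line.startswith("#"):
            heading = line.lstrip("# ").strip()
        elif re.match(r"^\s*-\s+\S", line):
            rule_id = f"{prefix}-{len(rules) + 1:03d}"  # e.g. A-001
            rules.append({"id": rule_id, "source": filename,
                          "heading": heading,
                          "text": line.strip("- ").strip()})
    return rules

def floor_met(filename: str, rules: list[dict]) -> bool:
    """True when the per-file extraction floor is satisfied."""
    return len(rules) >= FLOORS.get(filename, 0)
```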

4. Perform review in severity order.

- Correctness/behavioral regressions.
- CI and workflow regressions.
- Test gaps or weak assertions.
- Secondary maintainability/style concerns.

5. Produce findings-first output. List issues with file/line references and
policy citation when applicable. Separate policy findings into two explicit
sections:

- `GUIDELINES findings`
- `AGENTS findings`

6. Provide a remediation plan. Include:

- ordered fix plan,
- tests to run and why,
- tests not run and why,
- assumptions and open questions.

## Output Requirements

- Findings first, ordered by severity.
- Include commit-message quality notes when relevant.
- Cite exact paths and lines for each finding.
- Include `files reviewed`, `tests run`, and `tests not run`.
- If no findings exist, state that explicitly and list residual risks.
- Include a `Rule Coverage Matrix` section in Markdown.
- Do not mark the review complete until every extracted rule has a status and
evidence.
- Do not emit internal script-path fallback chatter; run the canonical
`.agents/skills/pr-review-auditor/scripts/*` paths directly.
- Include a `Rule Extraction Summary` with:
- total rules per source file,
- per-heading rule counts,
- whether extraction floors were met,
- explicit justification if any floor was not met.

## Rule Coverage Matrix Template

Use this exact table shape:

| Rule ID | Source | Rule Summary | Status (pass/fail/n-a) | Evidence |
| ------- | --------------------- | ------------ | ---------------------- | ------------------ |
| G-001 | TESTING_GUIDELINES.md | ... | pass | tests/test_x.py:42 |
| A-001 | AGENTS.md | ... | fail | src/module.py:87 |
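A small helper can emit exactly this table shape from extracted rule rows (a sketch; the row fields are assumed to match the extraction pass):

```python
def render_matrix(rows: list[dict]) -> str:
    """Render rule rows as the Rule Coverage Matrix markdown table."""
    lines = [
        "| Rule ID | Source | Rule Summary | Status (pass/fail/n-a) | Evidence |",
        "| ------- | ------ | ------------ | ---------------------- | -------- |",
    ]
    for row in rows:
        lines.append(
            f"| {row['id']} | {row['source']} | {row['summary']} "
            f"| {row['status']} | {row['evidence']} |"
        )
    return "\n".join(lines)

matrix = render_matrix([{
    "id": "G-001", "source": "TESTING_GUIDELINES.md", "summary": "...",
    "status": "pass", "evidence": "tests/test_x.py:42",
}])
print(matrix)
```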

## Trigger Examples

- "Figure out this PR's base branch and do a /review from base to HEAD."
- "Audit all commits in this branch and check AGENTS + GUIDELINES compliance."
- "Review patches in this PR and give me a fix plan with test gaps."

## Resources

- `.agents/skills/pr-review-auditor/scripts/collect_pr_context.sh`: derive PR
base/range and policy file inventory.
- `.agents/skills/pr-review-auditor/scripts/review_range.sh`: print full commit
messages and patches for a range.
- `references/review-checklist.md`: compact review checklist and severity
rubric.
4 changes: 4 additions & 0 deletions .agents/skills/pr-review-auditor/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "PR Review Auditor"
short_description: "Review base..HEAD with GUIDELINES-first policy checks."
default_prompt: "Use $pr-review-auditor to review the current PR from upstream base to HEAD with findings-first output and strict GUIDELINES/AGENTS compliance checks."
66 changes: 66 additions & 0 deletions .agents/skills/pr-review-auditor/references/review-checklist.md
@@ -0,0 +1,66 @@
# PR Review Checklist

## Policy Precedence

1. `*GUIDELINES*.md` rules are highest-priority compliance checks.
2. `AGENTS.md` rules are required secondary checks.
3. If rules conflict, cite both files and follow `*GUIDELINES*.md`.

## Review Scope

- Determine PR base branch and audit `<base_remote>/<base>..HEAD`.
- Review full commit messages, changed files, and relevant patches.
- Focus on behavior, CI impact, tests, then style.

## Three-Pass Completeness Protocol

1. Rule extraction pass:

- Enumerate every actionable rule from `*GUIDELINES*.md`, then `AGENTS.md`.
- Extract heading-by-heading and record per-heading counts.
- Assign stable rule IDs (for example `G-001`, `A-001`).
- Meet extraction floors unless explicitly justified:
- `TESTING_GUIDELINES.md`: >= 20 rules.
- `AGENTS.md`: >= 15 rules.
- If below floor, justify heading-by-heading why no additional actionable
rules exist.

2. Evidence mapping pass:

- Map every rule to `pass`, `fail`, or `n/a`.
- Attach evidence for every status (path/line or concrete command output).

3. Gap pass:

- Re-check changed files and policy text for missing rules/findings.
- Add only findings backed by new source citations and evidence.
- Generate at least 5 missed-rule candidates per policy file, then classify each
as `new rule` or `duplicate` with rationale.

## Severity Order

1. Correctness risk: behavior bug, data loss, runtime failure, API mismatch.
2. CI/workflow regression: broken matrix, wrong triggers, missing job parity.
3. Test gap: missing or weak coverage for changed behavior.
4. Maintainability/style: readability or consistency issues with low break risk.
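The severity order above can be applied mechanically when sorting findings — a sketch in which the category names are shorthand for the four levels and the finding shape is an assumption:

```python
# Shorthand for the four severity levels above, most severe first.
SEVERITY = ["correctness", "ci", "test-gap", "style"]

def order_findings(findings: list[dict]) -> list[dict]:
    """Sort findings by the checklist's severity order, most severe first."""
    rank = {name: i for i, name in enumerate(SEVERITY)}
    # Unknown categories sort last rather than raising.
    return sorted(findings, key=lambda f: rank.get(f["category"], len(SEVERITY)))
```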

## Mandatory Output Sections

- Findings (ordered by severity, each with path/line evidence).
- Rule Coverage Matrix (every extracted rule must be represented).
- Rule Extraction Summary (totals per file, per-heading counts, floors met/not
met).
- GUIDELINES findings.
- AGENTS findings.
- Open questions and assumptions.
- Remediation plan with validation steps.
- Files reviewed.
- Tests run.
- Tests not run and why.

## Completion Gate

- The review is incomplete if any extracted rule is missing from the matrix.
- The review is incomplete if any matrix row lacks status or evidence.
- The review is incomplete if extraction floors are missed without explicit
heading-by-heading justification.
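The first two gate conditions can be checked programmatically — a minimal sketch assuming matrix rows carry `id`, `status`, and `evidence` fields:

```python
def review_complete(extracted_ids: set[str], matrix: list[dict]) -> list[str]:
    """Return gate violations; an empty list means the review may close."""
    problems: list[str] = []
    matrix_ids = {row["id"] for row in matrix}
    # Gate 1: every extracted rule must appear in the matrix.
    for rule_id in sorted(extracted_ids - matrix_ids):
        problems.append(f"{rule_id}: missing from matrix")
    # Gate 2: every matrix row needs a valid status and evidence.
    for row in matrix:
        if row.get("status") not in {"pass", "fail", "n/a"}:
            problems.append(f"{row['id']}: missing or invalid status")
        if not row.get("evidence"):
            problems.append(f"{row['id']}: missing evidence")
    return problems
```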
82 changes: 82 additions & 0 deletions .agents/skills/pr-review-auditor/scripts/collect_pr_context.sh
@@ -0,0 +1,82 @@
#!/usr/bin/env bash
set -euo pipefail

branch="$(git rev-parse --abbrev-ref HEAD)"
base_remote="origin"
if git remote get-url upstream > /dev/null 2>&1; then
base_remote="upstream"
fi

context_source="gh-pr-view"
pr_number="N/A"
head_ref="${branch}"
base_ref=""
pr_json=""
gh_error=""

if command -v gh > /dev/null 2>&1 && command -v jq > /dev/null 2>&1; then
gh_error_file="$(mktemp)"
if ! pr_json="$(gh pr view --json number,baseRefName,headRefName 2> "${gh_error_file}")"; then
pr_json=""
fi
if [[ -z "${pr_json}" ]]; then
gh_error="$(< "${gh_error_file}")"
fi
rm -f "${gh_error_file}"
fi

if [[ -n "${pr_json}" && "${pr_json}" != "null" ]]; then
pr_number="$(jq -r '.number' <<< "${pr_json}")"
base_ref="$(jq -r '.baseRefName' <<< "${pr_json}")"
head_ref="$(jq -r '.headRefName' <<< "${pr_json}")"
else
context_source="remote-head-fallback"
remote_head_ref="$(git symbolic-ref --quiet --short "refs/remotes/${base_remote}/HEAD" || true)"
if [[ -z "${remote_head_ref}" ]]; then
echo "error: unable to derive base branch from refs/remotes/${base_remote}/HEAD" >&2
echo "hint: run 'git remote set-head ${base_remote} --auto' and retry" >&2
if [[ -n "${gh_error}" ]]; then
echo "gh pr view: ${gh_error}" >&2
fi
exit 1
fi
base_ref="${remote_head_ref#"${base_remote}"/}"
if [[ -n "${gh_error}" ]]; then
echo "warning: gh pr view failed; using ${base_remote}/HEAD fallback" >&2
echo "gh pr view: ${gh_error}" >&2
fi
fi

fetch_error_file="$(mktemp)"
if ! git fetch --quiet "${base_remote}" "${base_ref}" 2> "${fetch_error_file}"; then
if git rev-parse --verify --quiet "${base_remote}/${base_ref}" > /dev/null; then
echo "warning: git fetch failed; using cached ${base_remote}/${base_ref}" >&2
cat "${fetch_error_file}" >&2
else
echo "error: unable to fetch ${base_remote}/${base_ref} and no local cached ref exists" >&2
cat "${fetch_error_file}" >&2
rm -f "${fetch_error_file}"
exit 1
fi
fi
rm -f "${fetch_error_file}"

merge_base="$(git merge-base "${base_remote}/${base_ref}" HEAD)"
review_range="${base_remote}/${base_ref}..HEAD"

echo "CONTEXT_SOURCE=${context_source}"
echo "PR_NUMBER=${pr_number}"
echo "CURRENT_BRANCH=${branch}"
echo "PR_HEAD_BRANCH=${head_ref}"
echo "PR_BASE_BRANCH=${base_ref}"
echo "PR_BASE_REMOTE=${base_remote}"
echo "MERGE_BASE=${merge_base}"
echo "REVIEW_RANGE=${review_range}"
echo

echo "CHANGED_FILES:"
git diff --name-only "${review_range}"
echo

echo "POLICY_FILES:"
find . -type f \( -name '*GUIDELINES*.md' -o -name 'AGENTS.md' \) | sort
24 changes: 24 additions & 0 deletions .agents/skills/pr-review-auditor/scripts/review_range.sh
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
set -euo pipefail

range="${1:-}"
if [[ -z "${range}" ]]; then
echo "usage: $(basename "$0") <git-range>" >&2
echo "example: $(basename "$0") upstream/main..HEAD" >&2
exit 1
fi

echo "=== RANGE ==="
echo "${range}"
echo

echo "=== COMMITS (full messages) ==="
git log --format=fuller "${range}"
echo

echo "=== COMMIT FILE SUMMARIES ==="
git log --name-status --format='commit %H%nAuthor: %an <%ae>%nDate: %ad%n%n%s%n%b' "${range}"
echo

echo "=== PATCHES ==="
git log --patch --stat "${range}"
35 changes: 32 additions & 3 deletions AGENTS.md
@@ -23,13 +23,38 @@
- `uvx nox -s tests` — create matrix virtualenvs via nox and execute the pytest
session.
- `nox` executes pytest sessions with built-in parallelism; when invoking pytest
directly use `pytest -n 8 --maxschedchunk 2` to mirror the parallel test
directly use `pytest -n auto --maxschedchunk 2` to mirror the parallel test
scheduling and keep runtimes predictable.
- Coverage reports (XML/Markdown/HTML) are produced by the nox `tests` session
and stored under `coverage/py<version>/` (for example,
`coverage/py3.12/coverage.xml`, `coverage/py3.12/coverage.md`,
`coverage/py3.12/html/`).

## Code Quality Checklist

- When code changes are complete, or when asked to "check code quality", run
this default sequence:
- `uv run ruff format .`
- `uv run ruff check .`
- `uv run basedpyright`
- Do not include pytest or nox runs in the default "code quality" request; treat
runtime tests as a separate validation step.

## Test Validation Checklist

- Run tests separately from code quality checks:
- `uv run pytest -n auto --maxschedchunk 2` (or a focused module when
iterating)
- For full compatibility before handoff/PR, run:
- `uvx nox -s tests` (Python 3.10-3.14 matrix + coverage artifacts)
- For class-function coverage gate validation (when relevant to client changes),
run:
- `uv run python scripts/check_class_function_coverage.py coverage/py<version>/coverage.json --fail-under 90 --class PdfRestClient --class AsyncPdfRestClient --class _FilesClient --class _AsyncFilesClient`
- Always report:
- files changed
- tests/checks run and not run
- why any checks were skipped

## Coding Style & Naming Conventions

- Target Python 3.10–3.14; use 4-space indentation and type hints for public
@@ -125,7 +150,9 @@
a shared validation suite when multiple endpoints rely on the same input rules
(e.g., `tests/test_graphic_payload_validation.py`).
- Do not import from private modules (names beginning with an underscore) in
tests or production code—expose any shared helpers via a public module first.
production code. In tests, prefer public modules first; allow private-model
imports only when necessary to validate request serialization or mock
server-facing payload contracts that are not exposed publicly.

## Testing Guidelines

@@ -134,7 +161,9 @@
considered complete. Mirror the naming/structure used by the graphic
conversion suites: one module per endpoint, parameterized success cases that
enumerate all accepted literals, at least one invalid input that hits the
server, and coverage for any request options surfaced on the client. If an
server, and coverage for server-observable endpoint options. Validate
`extra_query`/`extra_headers`/`extra_body`/`timeout` plumbing in unit tests
(MockTransport) unless a live assertion depends on those options. If an
endpoint cannot be exercised live, call that out explicitly in the PR
description with the reason and the follow-up plan; otherwise reviewers should
block the change. Treat this as a release gate on par with unit tests.
11 changes: 9 additions & 2 deletions TESTING_GUIDELINES.md
@@ -201,15 +201,22 @@ iteration required.
predictable. Use `pytest.fixture(scope="module"/"class")` and
`pytest_asyncio.fixture` to cache uploaded PDFs/profiles for both transports.
- **Sync + async parity:** Every live module should contain matching sync and
async tests for success, customization, streaming, and invalid paths
(compression levels, conversion options, file streaming helpers).
async tests for success, streaming, and invalid paths (compression levels,
conversion options, file streaming helpers). Treat transport-plumbing
customization (`extra_query`, `extra_headers`, `extra_body`, `timeout`) as a
unit-test responsibility unless the endpoint exposes server-observable
behavior tied to those options.
- **Enumerate literals:** Parameterize over every accepted literal (compression
levels, `color_model`, `smoothing`, merge selectors, redaction presets). Each
literal should hit the server once per transport.
- **Optional arguments:** Exercise options such as custom output prefixes,
diagnostics toggles, merge metadata, and URL uploads. Validate the server
honors them (filenames start with the user-provided prefix, warnings appear
when expected).
- **Customization placement:** Verify request customization wiring in mocked
tests (sync + async) for each endpoint. Live customization tests are optional
and should be added only when they validate server-side behavior beyond client
request assembly.
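A mocked wiring check can be as small as capturing what the client would send — a sketch with an entirely hypothetical client and transport; real tests should use the project's actual client classes and `MockTransport` setup:

```python
class RecordingTransport:
    """Fake transport that records the request it would have sent."""

    def __init__(self):
        self.last_request = None

    def send(self, method, url, headers=None, params=None):
        self.last_request = {"method": method, "url": url,
                             "headers": headers or {}, "params": params or {}}
        return {"status": 200}


class HypotheticalClient:
    """Stand-in for a real API client; illustrates extra_headers/extra_query plumbing."""

    def __init__(self, transport):
        self.transport = transport

    def convert(self, *, extra_headers=None, extra_query=None):
        # The wiring under test: customization kwargs must reach the transport.
        return self.transport.send(
            "POST", "/convert",
            headers=extra_headers, params=extra_query,
        )


transport = RecordingTransport()
HypotheticalClient(transport).convert(
    extra_headers={"X-Trace": "abc"}, extra_query={"debug": "1"},
)
assert transport.last_request["headers"]["X-Trace"] == "abc"
assert transport.last_request["params"]["debug"] == "1"
```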
- **Negative live cases:** Override JSON via `extra_body`/`extra_query` to
bypass local validation and assert `PdfRestApiError` (or the exact server
exception) surfaces—for example, sending an invalid compression literal or