DavidHavoc · May 15, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -18,24 +18,29 @@ core/
   blackboard/        # Redis-backed shared state (thesis-only for now)
   orchestrator/
     thesis_flow.py   # ThesisOrchestrator — legacy, do not break
-    readme_flow.py   # ReadmeAuditOrchestrator — new code-audit slice
+    readme_flow.py   # ReadmeAuditOrchestrator — README slice
+    pr_flow.py       # PrAuditOrchestrator — PR slice
+    audit_prompts.py # Shared template loader for all audit prompts
     compiler.py      # PromptCompiler for thesis (blackboard → prompt vars)
   router/            # Provider tier routing (quality/balanced/cheap)
   memory/episodic.py # Qdrant episodic memory (thesis)
   schemas.py         # Thesis Pydantic models
-  schemas_audit.py   # Code-audit Pydantic models (kept separate on purpose)
+  schemas_audit.py   # Code-audit Pydantic models (AuditClaim aliases ReadmeClaim)
   sources/
     base.py          # SourceAdapter ABC — the new evidence-backend contract
     local_repo.py    # LocalRepoAdapter — grep over a local repo
+    github.py        # GitHubAdapter (over PR diff) + PrSpec + live fetcher
 providers/
   unified.py         # UnifiedLLM: provider fallback, breakers, DRY_RUN path
   thesis_agents.py   # Thesis agent suite — untouched, keep passing
-  code_audit_agents.py # README planner, checker, critic + trust gate
+  code_audit_agents.py # README + PR planners/checkers + trust gate + critic
   budget.py          # BudgetGuard (contextvars-scoped session ceiling)
   resilience.py      # Tenacity + pybreaker glue
 prompts/
-  *.md               # Thesis templates (head_planner, specialist_*, ...)
-  code_audit/*.md    # Audit templates (readme_planner, readme_checker, ...)
+  *.md                       # Thesis templates (head_planner, specialist_*, ...)
+  code_audit/readme_*.md     # README auditor templates
+  code_audit/pr_*.md         # PR auditor templates
+  code_audit/audit_critic.md # Shared critic template (used by both)
 tools/mcp/           # Literature MCP tools; will migrate behind SourceAdapter
 apps/
   cli/main.py        # Single argparse CLI for both `thesis ...` and `audit ...`
@@ -65,7 +70,18 @@ This overwrites whatever the LLM said. A confidently hallucinating checker that
 
 Mirror this pattern when you add new auditors (PR auditor, compliance auditor, etc.): keep the LLM creative, but enforce the trust invariant in Python.
 
-## The README-audit flow (current slice)
+## Shared audit pipeline (README and PR slices)
+
+Both auditors follow the same four-stage shape, parameterised by adapter and prompts:
+
+1. **Planner (LLM)** — extracts atomic factual claims as `AuditClaim` (alias of `ReadmeClaim`). The README planner emits `claim_type ∈ {feature, install, usage, requirement, metric, api, other}`; the PR planner emits `claim_type ∈ {add, remove, fix, refactor, test, behavior, doc, other}`. The schema is shared; the domain-specific enums live in the prompt.
+2. **Researcher (deterministic Python)** — runs `adapter.search_any(claim.search_hints, limit=N)` over whichever `SourceAdapter` the orchestrator is bound to. The README slice uses `LocalRepoAdapter`; the PR slice uses `GitHubAdapter`. Adapters never call an LLM and never reach the network at audit time (for the PR slice, the fetcher runs ahead of the audit and hands the adapter a `PrSpec`).
+3. **Checker (LLM + trust gate)** — judges each `(claim, evidence)` pair. Trust gate runs after the LLM responds.
+4. **Critic (LLM)** — adversarial pass. `AuditCriticAgent` is shared across slices.
+
+`core/orchestrator/audit_prompts.py` holds the template registry; a new auditor registers its prompts there. `core/schemas_audit.py` exposes `AuditClaim` / `AuditClaimList` as aliases of the (legacy-named) `ReadmeClaim` / `ReadmeClaimList`. The rename can happen when the third auditor lands; until then, the alias keeps PR code readable.
+
+## The README-audit flow (slice 1)
 
 1. **Planner (LLM)** — reads the README, extracts atomic factual claims with verbatim quotes + grep-friendly search hints. Schema: `ReadmeClaimList`.
 2. **Researcher (deterministic Python)** — uses `LocalRepoAdapter.search_any(hints)` to retrieve evidence snippets from the repo. **No LLM call here** — it's just a filesystem grep with safety rails (path traversal guard, file-size cap, dir excludes).
@@ -74,6 +90,13 @@ Mirror this pattern when you add new auditors (PR auditor, compliance auditor, e
 
 The audited README is **excluded** from its own evidence pool — otherwise every fabricated claim could "verify itself" against the README quote. See `ReadmeAuditOrchestrator.audit` for the exclusion logic.
 
+## The PR-audit flow (slice 2)
+
+1. **Fetcher** — `fetch_pr_from_github(url, token)` hits the GitHub REST API for PR metadata + unified diff + changed files, returning a `PrSpec`. Tests skip this and use `load_pr_fixture(directory)` to build a `PrSpec` from `pr.json` + `diff.patch`. Anonymous requests work for public repos but quickly exhaust the unauthenticated limit of 60 requests per hour; pass a token explicitly or via `GITHUB_TOKEN` / `GH_TOKEN`.
+2. **Adapter** — `GitHubAdapter(pr_spec)` parses the diff once into `DiffHunk` objects (path + new-file line offset + hunk text). `search_any(terms)` greps hunk bodies, stripping the leading `+`/`-`/space marker before matching so a hint of `+` doesn't spuriously match every added line. It returns at most one hit per hunk to avoid duplicate snippets.
+3. **Planner / Checker** — `PrPlannerAgent` + `PrCheckerAgent` mirror their README cousins but use PR-specific prompts. The checker's prompt instructs it to interpret `+` lines as additions, `-` as removals — semantic decisions live in the prompt; mechanical retrieval stays in the adapter.
+4. **No artefact-exclusion needed** — the PR description is never part of the diff, so the self-evidence problem doesn't arise here. Different shape from README, same trust invariant.
+
 ## Coexistence rules
 
 - **Do not break the thesis path.** The full thesis test suite (`tests/test_*.py` minus `tests/code_audit/`) must stay green. Thesis is being deprecated *gradually*, not yanked.
@@ -100,7 +123,8 @@ The audited README is **excluded** from its own evidence pool — otherwise ever
 See `ROADMAP.md` for the full picture. Short version:
 
 - ✅ Slice 1 (shipped): README auditor.
-- 🚧 Next slices: PR auditor (`audit pr <url>`), compliance auditor, architecture auditor. All slot in behind the same `SourceAdapter` + agent-suite + trust-gate pattern.
+- ✅ Slice 2 (shipped): PR auditor (`audit pr <url>`).
+- 🚧 Next slices: compliance auditor, architecture auditor. All slot in behind the same `SourceAdapter` + agent-suite + trust-gate pattern.
 - 🚧 Layered source adapters: repo (highest trust) → language specs / RFCs → dependency source. The literature MCP tools will migrate behind the same contract.
 - 🚧 Cherry-picked from the v1.0 plan: tool/source registry, light provider-registry abstraction (Ollama later for local inference on private repos), structlog audit trail.
 - ⏸️ Deferred: PyPI packaging, Typer CLI rewrite, OTel, Smart truncation, Ollama. Not blocking the audit-track expansion.

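Reviewer sketch for the adapter step in the PR-audit flow above: a minimal, standalone illustration of grepping diff hunks after stripping the leading `+`/`-`/space marker, returning at most one hit per hunk. `Hunk`, `parse_hunks`, and `search_hunks` are hypothetical names and do not reproduce `GitHubAdapter` or `DiffHunk` from `core/sources/github.py`. The parser assumes a git-style diff in which every file section starts with a `diff ` line.

```python
from dataclasses import dataclass, field


@dataclass
class Hunk:
    path: str        # file path taken from the "+++ b/..." header
    new_start: int   # first line number of the hunk in the new file
    body: list[str] = field(default_factory=list)  # lines with their marker


def parse_hunks(diff_text: str) -> list[Hunk]:
    """Split a git-style unified diff into hunks (assumption: 'diff ' headers)."""
    hunks: list[Hunk] = []
    path = ""
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False  # a new file section begins
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:].removeprefix("b/")
        elif line.startswith("@@"):
            # "@@ -10,2 +11,2 @@" -> the new-file range is "11,2"
            new_range = line.split("+", 1)[1].split(" ", 1)[0]
            hunks.append(Hunk(path, int(new_range.split(",")[0])))
            in_hunk = True
        elif in_hunk:
            hunks[-1].body.append(line)
    return hunks


def search_hunks(hunks: list[Hunk], terms: list[str], limit: int = 10):
    """Return (path, new_start, raw_line) for the first matching line per hunk."""
    terms = [t for t in terms if t]  # an empty term would match every line
    hits: list[tuple[str, int, str]] = []
    for hunk in hunks:
        if len(hits) >= limit:
            break
        for raw in hunk.body:
            text = raw[1:]  # drop the +/-/space marker before matching
            if any(term in text for term in terms):
                hits.append((hunk.path, hunk.new_start, raw))
                break  # one hit per hunk
    return hits


SAMPLE_DIFF = "\n".join([
    "diff --git a/app.py b/app.py",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,2 +1,3 @@",
    " import os",
    "+import sys",
    ' print("hi")',
    "@@ -10,2 +11,2 @@",
    "-x = 1",
    "+x = 2",
])
```

With this sample, `search_hunks(parse_hunks(SAMPLE_DIFF), ["+"])` finds nothing because the marker is removed before matching, while a hint such as `"sys"` matches the added import line in the first hunk.
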
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,7 @@ All notable changes to OpenWorkers are documented here. The format is loosely ba
 ## [Unreleased]
 
 ### Added
+- **Code-audit track — PR auditor (second slice).** New `openworkers audit pr <github-url>` CLI subcommand verifies a pull request description against its actual diff. Same pipeline shape as the README auditor (planner → deterministic researcher → checker + trust gate → critic), with two new layers: `core/sources/github.py` (`GitHubAdapter` implementing `SourceAdapter` over a unified diff, plus `PrSpec` value object, `parse_pr_url`, `fetch_pr_from_github` for live GitHub fetch with optional `GITHUB_TOKEN`/`GH_TOKEN`, and `load_pr_fixture` for offline tests) and `core/orchestrator/pr_flow.py` (`PrAuditOrchestrator`). New agents `PrPlannerAgent` and `PrCheckerAgent` in `providers/code_audit_agents.py`; `AuditCriticAgent` reused as-is. New prompts `prompts/code_audit/pr_planner.md` + `pr_checker.md` with PR-specific claim types (`add | remove | fix | refactor | test | behavior | doc | other`) and diff-aware verdict rules. Audit-prompt rendering extracted from `readme_flow.py` into `core/orchestrator/audit_prompts.py` so each new auditor can register templates without touching unrelated modules. `tests/fixtures/sample_pr/` contains a canned PR (json + unified diff) with verified / drifted / contradicted / fabricated claims; `tests/code_audit/test_pr_flow.py` asserts verdict distribution and an explicit trust-gate override.
 - **Code-audit track — README auditor (first slice).** New `openworkers audit readme <repo>` CLI subcommand verifies every factual claim in a README against the actual repository, emitting `verified | drifted | unsupported | contradicted` verdicts with cited file paths. Pipeline: planner (LLM) → researcher (deterministic grep via new `LocalRepoAdapter`) → checker (LLM + post-LLM trust gate) → critic (LLM adversarial pass). New modules: `core/sources/` (`SourceAdapter` ABC + `LocalRepoAdapter`), `core/schemas_audit.py` (Pydantic audit models), `core/orchestrator/readme_flow.py` (`ReadmeAuditOrchestrator`), `providers/code_audit_agents.py` (planner / checker / critic + `_enforce_trust_gate` invariant), `prompts/code_audit/*.md` (audit templates). The trust gate is enforced in code, not delegated to prompts: any claim with no retrieved evidence is forced to `unsupported` regardless of LLM output. The audited README is excluded from its own evidence pool so fabricated claims cannot self-verify. `tests/code_audit/test_readme_flow.py` exercises the full flow with a stubbed `UnifiedLLM.generate_fn` and an `tests/fixtures/sample_repo/` containing a deliberate mix of verified / drifted / contradicted / fabricated claims. Thesis pipeline untouched.
 - **Contributor onboarding doc** `AGENTS.md` capturing project DNA, code-audit slice design, trust-gate invariant, conventions, and the recipe for adding new auditors.
 - **RAG over user PDFs** (first incremental v1.0 slice). New `tools/mcp/rag.py` with sentence-aware chunker, `RAGIndexer` (PDF/text → Qdrant via PyMuPDF + FastEmbed `BAAI/bge-small-en-v1.5`), and `RAGSearchTool` (registered as `rag_search` in `ToolRegistry`). Collections namespaced under `rag_*` so they cannot collide with `thesis_corpus` or `episodes`. New CLI: `thesis ingest add|list|delete`. New flag: `thesis research ... --rag-collection <name>` makes the researcher pull from the user collection alongside arXiv/SS. New field: `ResearchContext.rag_collection`. `tests/test_rag.py` covers chunking edge cases, BOM/text extraction, collection naming, indexer round-trip, privacy gating, and idempotent re-ingest.

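The trust-gate rule stated in the README-auditor entry above (any claim with no retrieved evidence is forced to `unsupported`, regardless of LLM output) can be sketched as follows. This is an illustration only: `CheckedClaim` and this `enforce_trust_gate` are hypothetical, use plain dataclasses instead of the project's Pydantic models, and are not the code behind `_enforce_trust_gate`.

```python
from dataclasses import dataclass, replace

# Verdict names as listed in the changelog entry.
ALLOWED_VERDICTS = {"verified", "drifted", "unsupported", "contradicted"}


@dataclass(frozen=True)
class CheckedClaim:
    claim_id: str
    verdict: str                   # what the LLM checker answered
    evidence: tuple[str, ...] = () # snippets the researcher retrieved


def enforce_trust_gate(checked: CheckedClaim) -> CheckedClaim:
    """Override the LLM verdict when no evidence was retrieved."""
    if checked.verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {checked.verdict!r}")
    if not checked.evidence and checked.verdict != "unsupported":
        # The model may sound confident, but without evidence the verdict
        # cannot be trusted, so it is overwritten in Python.
        return replace(checked, verdict="unsupported")
    return checked
```

Because the check runs after the LLM responds, a hallucinated `verified` on an empty evidence list never reaches the report; verdicts that come with evidence pass through unchanged.
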
diff --git a/README.md b/README.md
@@ -17,7 +17,7 @@ Both domains share the same DNA: a hierarchical pipeline (planner → researcher
 
 ## Code audit *(new track)*
 
-`openworkers audit readme <repo>` extracts every factual claim from a README and verdicts each one against the actual repository:
+`openworkers audit readme <repo>` and `openworkers audit pr <github-pr-url>` extract every factual claim from a README or PR description and verdict each one against the actual repository / diff:
 
 | Verdict | Meaning |
 |---|---|
@@ -26,9 +26,11 @@ Both domains share the same DNA: a hierarchical pipeline (planner → researcher
 | `contradicted` | Code directly disproves the claim |
 | `unsupported` | No evidence in the repo — enforced in code, not delegated to the LLM |
 
-The pipeline is planner (LLM extracts claims) → researcher (deterministic grep via `LocalRepoAdapter`) → checker (LLM judges + trust gate forces `unsupported` when evidence is empty) → critic (adversarial pass). The audited README is excluded from its own evidence pool, so fabricated claims cannot verify themselves.
+Both auditors use the same pipeline: planner (LLM extracts claims) → researcher (deterministic grep via a `SourceAdapter` — `LocalRepoAdapter` for repos, `GitHubAdapter` for PR diffs) → checker (LLM judges + trust gate forces `unsupported` when evidence is empty) → critic (adversarial pass). The audited artefact never serves as its own evidence: the README is excluded from its evidence pool, and a PR description is not part of the diff, so fabricated claims cannot verify themselves.
 
-Roadmap for this track: PR auditor (PR description vs. diff), compliance auditor (security/policy claims vs. code), architecture auditor (design doc vs. implementation). See [AGENTS.md](AGENTS.md) for the contributor recipe.
+`audit pr` accepts `--token` or reads `GITHUB_TOKEN` / `GH_TOKEN` for higher rate limits; offline testing uses canned fixtures via `--fixture <dir>` (see `tests/fixtures/sample_pr/`).
+
+Roadmap for this track: compliance auditor (security/policy claims vs. code), architecture auditor (design doc vs. implementation), layered source adapters (specs / RFCs / dependency source), tool/source registry, Ollama for local inference on private repos. See [AGENTS.md](AGENTS.md) for the contributor recipe.
 
 ## What it does
 

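A minimal sketch of the token handling described above, assuming an explicit token wins over the environment and that `GITHUB_TOKEN` is consulted before `GH_TOKEN` (the precedence between the two variables is an assumption, not confirmed by this diff). `resolve_github_token` and `auth_headers` are hypothetical helper names.

```python
import os
from typing import Mapping, Optional


def resolve_github_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the first non-empty token: explicit, then GITHUB_TOKEN, then GH_TOKEN."""
    for candidate in (explicit, env.get("GITHUB_TOKEN"), env.get("GH_TOKEN")):
        if candidate:
            return candidate
    return None  # anonymous access: public repos only, low rate limit


def auth_headers(token: Optional[str]) -> dict[str, str]:
    """Build request headers for the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Passing the environment as a parameter keeps the helper testable without mutating `os.environ`.
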
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -24,7 +24,7 @@
 A second domain alongside the thesis assistant: audit factual claims in technical artefacts against the codebase. Same DNA — multi-agent, structured JSON, never fabricates, trust gate refuses verdicts without evidence — applied to a domain where trustworthy automated review matters to OSS maintainers and contributors. See [AGENTS.md](AGENTS.md) for the contributor recipe and trust-gate invariant.
 
 - ✅ **README auditor** *(first slice, shipped)*. `openworkers audit readme <repo>` extracts atomic claims from a README and verdicts each one against the actual codebase as `verified | drifted | unsupported | contradicted`. Trust gate is enforced in `providers/code_audit_agents.py::_enforce_trust_gate`, not in prompts. The audited README is excluded from its own evidence pool. New `SourceAdapter` abstraction (`core/sources/`) with `LocalRepoAdapter`.
-- 🚧 **PR auditor** — `openworkers audit pr <url>`. Verify the PR description against the actual diff; flag scope creep, missing tests, undocumented changes. Needs a `GitHubAdapter` implementing `SourceAdapter`.
+- ✅ **PR auditor** *(second slice, shipped)*. `openworkers audit pr <github-url>` extracts atomic claims from a PR description and verdicts each against the actual unified diff. New `GitHubAdapter` (in `core/sources/github.py`) implements `SourceAdapter` over a unified diff via a `PrSpec` value object; live fetch lives in `fetch_pr_from_github` and is decoupled from the adapter for offline testing. PR-specific prompts under `prompts/code_audit/pr_*.md` use diff-aware verdict rules (e.g., `add` claims must match `+`-prefixed hunks).
 - 🚧 **Compliance auditor** — `openworkers audit compliance <repo>`. Verify security/policy claims ("inputs sanitized", "no secrets", "auth required on X") against the code.
 - 🚧 **Architecture auditor** — verify RFC / design-doc claims against implementation, language specs, and dependency source.
 - 🚧 **Layered source adapters** — repo (highest trust) → language specs / RFCs (`SpecAdapter`) → dependency source (`DependencyAdapter`). The existing literature MCP tools (arXiv / Semantic Scholar / CrossRef) will migrate behind the same `SourceAdapter` contract.

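To illustrate why decoupling the live fetch from the adapter helps offline testing, here is a hedged sketch of a fixture loader that builds a PR snapshot from `pr.json` + `diff.patch`. `PrSnapshot` and `load_snapshot` are hypothetical stand-ins for `PrSpec` and `load_pr_fixture`, and the `title` / `body` keys in `pr.json` are an assumption about the fixture layout.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PrSnapshot:
    title: str
    body: str   # the PR description whose claims get audited
    diff: str   # unified diff used as the evidence pool


def load_snapshot(directory: str) -> PrSnapshot:
    """Read pr.json and diff.patch from a fixture directory (no network)."""
    root = Path(directory)
    meta = json.loads((root / "pr.json").read_text(encoding="utf-8"))
    diff = (root / "diff.patch").read_text(encoding="utf-8")
    # "body" may be null for PRs without a description.
    return PrSnapshot(meta.get("title", ""), meta.get("body") or "", diff)


# Usage: write a throwaway fixture and load it back.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pr.json").write_text(
        json.dumps({"title": "Add retry", "body": "Adds a retry loop."}),
        encoding="utf-8",
    )
    Path(tmp, "diff.patch").write_text(
        "+++ b/client.py\n@@ -1 +1,2 @@\n+retry = 3\n", encoding="utf-8"
    )
    snapshot = load_snapshot(tmp)
```

Since the adapter only ever sees the loaded value object, the same audit code path runs against a canned directory and against a live PR.
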
diff --git a/apps/cli/main.py b/apps/cli/main.py
@@ -9,6 +9,7 @@
     format_session_text,
 )
 from core.memory.episodic import EpisodicMemory
+from core.orchestrator.pr_flow import PrAuditOrchestrator, format_pr_report_text
 from core.orchestrator.readme_flow import ReadmeAuditOrchestrator, format_report_text
 from core.orchestrator.thesis_flow import ThesisOrchestrator
 from core.router.engine import Router
@@ -235,9 +236,32 @@ async def cmd_audit_dispatch(args):
     """Route `audit <subcommand>` to its handler."""
     if args.audit_action == "readme":
         return await cmd_audit_readme(args)
+    if args.audit_action == "pr":
+        return await cmd_audit_pr(args)
     raise SystemExit(f"Unknown audit action: {args.audit_action}")
 
 
+async def cmd_audit_pr(args):
+    """Run the PR auditor against a GitHub PR URL (or a local fixture dir)."""
+    from core.sources.github import fetch_pr_from_github, load_pr_fixture
+
+    unified = create_unified_llm()
+    orch = PrAuditOrchestrator(unified=unified)
+
+    if args.fixture:
+        pr_spec = load_pr_fixture(args.fixture)
+    else:
+        pr_spec = await asyncio.to_thread(fetch_pr_from_github, args.url, args.token)
+
+    report, critique = await orch.audit(pr_spec)
+    if args.format == "json":
+        payload = {"report": report.model_dump(), "critique": critique.model_dump()}
+        _output(payload, "json", args.output)
+    else:
+        _output(format_pr_report_text(report, critique), "text", args.output)
+    return report
+
+
 async def cmd_audit_readme(args):
     """Run the README auditor on a local repo."""
     unified = create_unified_llm()
@@ -399,6 +423,31 @@ def build_parser() -> argparse.ArgumentParser:
     )
     add_output_args(p_audit_readme)
 
+    p_audit_pr = audit_sub.add_parser(
+        "pr",
+        help="Verify a PR description against the actual diff",
+    )
+    p_audit_pr.add_argument(
+        "url",
+        type=str,
+        nargs="?",
+        default="",
+        help="GitHub PR URL (https://github.com/owner/repo/pull/N); omit when using --fixture",
+    )
+    p_audit_pr.add_argument(
+        "--fixture",
+        type=str,
+        default=None,
+        help="Audit a fixture directory containing pr.json + diff.patch instead of hitting GitHub",
+    )
+    p_audit_pr.add_argument(
+        "--token",
+        type=str,
+        default=None,
+        help="GitHub token (default: $GITHUB_TOKEN or $GH_TOKEN)",
+    )
+    add_output_args(p_audit_pr)
+
     p_ingest = sub.add_parser("ingest", help="Manage user RAG collections (PDF/text -> Qdrant)")
     ingest_sub = p_ingest.add_subparsers(dest="ingest_action", required=True)
 

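Reviewer sketch: because `url` is optional (`nargs="?"`, default `""`), a call with neither a URL nor `--fixture` reaches the fetcher with an empty string. One way to surface that early is a small validation step, shown below under stated assumptions. `build_pr_parser` and `validate_pr_args` are hypothetical names, and the URL pattern is inferred from the help text rather than from `parse_pr_url`.

```python
import argparse
import re

# Shape taken from the help text: https://github.com/owner/repo/pull/N
PR_URL = re.compile(r"^https://github\.com/[^/\s]+/[^/\s]+/pull/\d+")


def build_pr_parser() -> argparse.ArgumentParser:
    """Standalone parser mirroring the `audit pr` arguments in the diff."""
    parser = argparse.ArgumentParser(prog="audit pr")
    parser.add_argument("url", type=str, nargs="?", default="")
    parser.add_argument("--fixture", type=str, default=None)
    parser.add_argument("--token", type=str, default=None)
    return parser


def validate_pr_args(args: argparse.Namespace) -> str:
    """Return 'fixture' or 'live'; raise ValueError when neither source is usable."""
    if args.fixture:
        return "fixture"  # the fixture takes precedence, as in cmd_audit_pr
    if not args.url:
        raise ValueError("pass a GitHub PR URL or --fixture <dir>")
    if not PR_URL.match(args.url):
        raise ValueError(f"not a GitHub PR URL: {args.url!r}")
    return "live"
```

A handler could call `validate_pr_args(args)` before fetching and turn the `ValueError` into a `SystemExit` with the same message.
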
diff --git a/core/orchestrator/audit_prompts.py b/core/orchestrator/audit_prompts.py
@@ -0,0 +1,45 @@
+"""Shared prompt loader for the code-audit family of orchestrators.
+
+Lives outside any single flow module so each new auditor can register
+its templates without circular imports. Deliberately not reusing
+``PromptCompiler``: that compiler is wired to extract thesis blackboard
+state, which audit flows don't use. Audit templates only need
+``{{ var }}`` substitution.
+"""
+
+from __future__ import annotations
+
+import os
+from typing import Any
+
+_AUDIT_PROMPT_DIR = os.path.join(
+    os.path.dirname(os.path.dirname(os.path.dirname(__file__))),
+    "prompts",
+    "code_audit",
+)
+
+# Registry of template name → filename under ``prompts/code_audit/``.
+# Each new auditor appends its entries here; the renderer accepts any
+# registered name without further changes elsewhere.
+TEMPLATE_FILES: dict[str, str] = {
+    "readme_planner": "readme_planner.md",
+    "readme_checker": "readme_checker.md",
+    "pr_planner": "pr_planner.md",
+    "pr_checker": "pr_checker.md",
+    "audit_critic": "audit_critic.md",
+}
+
+
+def render_audit_prompt(name: str, variables: dict[str, Any]) -> str:
+    filename = TEMPLATE_FILES.get(name)
+    if not filename:
+        raise ValueError(f"Unknown audit template: {name}")
+    path = os.path.join(_AUDIT_PROMPT_DIR, filename)
+    try:
+        with open(path, encoding="utf-8") as f:
+            template = f.read()
+    except OSError:
+        return f"[Template {name} not found at {path}]"
+    for key, value in variables.items():
+        template = template.replace("{{ " + key + " }}", str(value))
+    return template
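
Reviewer sketch of the substitution rule used by `render_audit_prompt`: only the exact `{{ key }}` form (one space on each side) is replaced, so a placeholder written as `{{key}}`, or one with no matching variable, survives into the rendered prompt. The `substitute` and `leftover_placeholders` helpers below are hypothetical and only restate the replace loop from the diff plus an optional check a caller might add.

```python
import re
from typing import Any


def substitute(template: str, variables: dict[str, Any]) -> str:
    """Same replace loop as in the diff: exact '{{ key }}' matches only."""
    for key, value in variables.items():
        template = template.replace("{{ " + key + " }}", str(value))
    return template


def leftover_placeholders(rendered: str) -> list[str]:
    """Names of placeholders still present after rendering, in order."""
    return re.findall(r"\{\{\s*(\w+)\s*\}\}", rendered)


rendered = substitute(
    "Audit {{ repo }}: {{ claims }} claims, {{missing}} {{ unknown }}",
    {"repo": "demo", "claims": 3, "missing": "x"},
)
# rendered == "Audit demo: 3 claims, {{missing}} {{ unknown }}"
```

Here `leftover_placeholders(rendered)` returns `["missing", "unknown"]`, which a test could assert is empty for every registered template.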