Merged
38 changes: 31 additions & 7 deletions AGENTS.md
@@ -18,24 +18,29 @@ core/
blackboard/ # Redis-backed shared state (thesis-only for now)
orchestrator/
thesis_flow.py # ThesisOrchestrator — legacy, do not break
readme_flow.py # ReadmeAuditOrchestrator — new code-audit slice
readme_flow.py # ReadmeAuditOrchestrator — README slice
pr_flow.py # PrAuditOrchestrator — PR slice
audit_prompts.py # Shared template loader for all audit prompts
compiler.py # PromptCompiler for thesis (blackboard → prompt vars)
router/ # Provider tier routing (quality/balanced/cheap)
memory/episodic.py # Qdrant episodic memory (thesis)
schemas.py # Thesis Pydantic models
schemas_audit.py # Code-audit Pydantic models (kept separate on purpose)
schemas_audit.py # Code-audit Pydantic models (AuditClaim aliases ReadmeClaim)
sources/
base.py # SourceAdapter ABC — the new evidence-backend contract
local_repo.py # LocalRepoAdapter — grep over a local repo
github.py # GitHubAdapter (over PR diff) + PrSpec + live fetcher
providers/
unified.py # UnifiedLLM: provider fallback, breakers, DRY_RUN path
thesis_agents.py # Thesis agent suite — untouched, keep passing
code_audit_agents.py # README planner, checker, critic + trust gate
code_audit_agents.py # README + PR planners/checkers + trust gate + critic
budget.py # BudgetGuard (contextvars-scoped session ceiling)
resilience.py # Tenacity + pybreaker glue
prompts/
*.md # Thesis templates (head_planner, specialist_*, ...)
code_audit/*.md # Audit templates (readme_planner, readme_checker, ...)
*.md # Thesis templates (head_planner, specialist_*, ...)
code_audit/readme_*.md # README auditor templates
code_audit/pr_*.md # PR auditor templates
code_audit/audit_critic.md # Shared critic template (used by both)
tools/mcp/ # Literature MCP tools; will migrate behind SourceAdapter
apps/
cli/main.py # Single argparse CLI for both `thesis ...` and `audit ...`
@@ -65,7 +70,18 @@ This overwrites whatever the LLM said. A confidently hallucinating checker that

Mirror this pattern when you add new auditors (PR auditor, compliance auditor, etc.): keep the LLM creative, but enforce the trust invariant in Python.
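Here is a minimal sketch of the invariant. It uses simplified dataclass stand-ins for the real Pydantic models. `CheckedClaim` and `enforce_trust_gate` are hypothetical names that only show the shape of `_enforce_trust_gate`. The real function's signature and fields may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CheckedClaim:
    # Hypothetical stand-in for the Pydantic verdict model; field names are assumed.
    claim_id: str
    verdict: str  # verified | drifted | unsupported | contradicted
    rationale: str
    evidence_paths: List[str] = field(default_factory=list)


def enforce_trust_gate(result: CheckedClaim, evidence: List[str]) -> CheckedClaim:
    """Runs after the checker LLM: no retrieved evidence means `unsupported`."""
    if evidence:
        return result  # evidence exists, so the LLM's judgement stands
    # Whatever the model said, an evidence-free claim cannot be upgraded.
    return CheckedClaim(
        claim_id=result.claim_id,
        verdict="unsupported",
        rationale="No evidence retrieved; trust gate overrode the checker's verdict.",
        evidence_paths=[],
    )
```

The override lives after the model call. No prompt wording can weaken it.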

## The README-audit flow (current slice)
## Shared audit pipeline (README and PR slices)

Both auditors follow the same four-stage shape, parameterised by adapter and prompts:

1. **Planner (LLM)** — extracts atomic factual claims as `AuditClaim` (alias of `ReadmeClaim`). The README planner assigns `claim_type ∈ {feature, install, usage, requirement, metric, api, other}`; the PR planner assigns `claim_type ∈ {add, remove, fix, refactor, test, behavior, doc, other}`. Same schema; only the enum listed in each prompt differs.
2. **Researcher (deterministic Python)** — runs `adapter.search_any(claim.search_hints, limit=N)` over whichever `SourceAdapter` the orchestrator is bound to. README slice uses `LocalRepoAdapter`; PR slice uses `GitHubAdapter`. Adapters never call an LLM and never reach the network at audit time. For PRs, the diff is fetched beforehand by `fetch_pr_from_github` (or loaded with `load_pr_fixture`), and the adapter only wraps the resulting `PrSpec`.
3. **Checker (LLM + trust gate)** — judges each `(claim, evidence)` pair. Trust gate runs after the LLM responds.
4. **Critic (LLM)** — adversarial pass. `AuditCriticAgent` is shared across slices.

`core/orchestrator/audit_prompts.py` holds the template registry, so adding a new auditor just means registering its prompts there. `core/schemas_audit.py` exposes `AuditClaim` / `AuditClaimList` as aliases of the (legacy-named) `ReadmeClaim` / `ReadmeClaimList`. The rename can wait until the third auditor lands; until then the alias keeps PR code readable.
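The orchestrator touches a `SourceAdapter` only in the researcher stage, and that is what makes the slices swappable. The sketch below assumes a hypothetical `Evidence` record, a `limit` default of 5 and a toy `InMemoryAdapter`. Only the `search_any(hints, limit=N)` call shape comes from the text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Evidence:
    path: str
    line: int
    snippet: str


class SourceAdapter(ABC):
    """Hypothetical minimal contract: deterministic lookup, no LLM, no network."""

    @abstractmethod
    def search_any(self, terms: List[str], limit: int = 5) -> List[Evidence]:
        """Return up to `limit` snippets that contain any of `terms`."""


class InMemoryAdapter(SourceAdapter):
    """Toy adapter over a {path: text} mapping, for illustration only."""

    def __init__(self, files: Dict[str, str]):
        self.files = files

    def search_any(self, terms: List[str], limit: int = 5) -> List[Evidence]:
        hits: List[Evidence] = []
        for path, text in sorted(self.files.items()):
            for lineno, line in enumerate(text.splitlines(), start=1):
                # Skip empty hints so "" cannot match every line.
                if any(t and t in line for t in terms):
                    hits.append(Evidence(path, lineno, line.strip()))
                    if len(hits) >= limit:
                        return hits
        return hits


def research(
    adapter: SourceAdapter, hints_by_claim: Dict[str, List[str]], limit: int = 5
) -> Dict[str, List[Evidence]]:
    """Researcher stage: one deterministic search per claim; the adapter is swappable."""
    return {cid: adapter.search_any(hints, limit=limit) for cid, hints in hints_by_claim.items()}
```

If a claim's hints retrieve nothing, the claim gets an empty list. That is exactly the condition the trust gate keys on.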

## The README-audit flow (slice 1)

1. **Planner (LLM)** — reads the README, extracts atomic factual claims with verbatim quotes + grep-friendly search hints. Schema: `ReadmeClaimList`.
2. **Researcher (deterministic Python)** — uses `LocalRepoAdapter.search_any(hints)` to retrieve evidence snippets from the repo. **No LLM call here** — it's just a filesystem grep with safety rails (path traversal guard, file-size cap, dir excludes).
@@ -74,6 +90,13 @@ Mirror this pattern when you add new auditors (PR auditor, compliance auditor, e

The audited README is **excluded** from its own evidence pool — otherwise every fabricated claim could "verify itself" against the README quote. See `ReadmeAuditOrchestrator.audit` for the exclusion logic.
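Here is a sketch of a researcher grep with the safety rails listed above plus the self-exclusion. It assumes a local checkout and UTF-8 text files. `grep_repo`, `EXCLUDED_DIRS` and `MAX_FILE_BYTES` are hypothetical, and `LocalRepoAdapter`'s real defaults and API may differ. `Path.is_relative_to` needs Python 3.9+.

```python
from pathlib import Path
from typing import List, Optional, Set, Tuple

# Assumed defaults for illustration; the real adapter's values may differ.
EXCLUDED_DIRS: Set[str] = {".git", "node_modules", "__pycache__", ".venv"}
MAX_FILE_BYTES = 512_000


def grep_repo(
    root: str,
    terms: List[str],
    exclude_file: Optional[str] = None,
    limit: int = 20,
) -> List[Tuple[str, int, str]]:
    """Grep UTF-8 text files under `root`, never reading the audited artefact."""
    root_path = Path(root).resolve()
    excluded = (root_path / exclude_file).resolve() if exclude_file else None
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if not path.is_file() or any(part in EXCLUDED_DIRS for part in rel.parts):
            continue
        resolved = path.resolve()
        # Path-traversal guard: a symlink must not lead outside the repo.
        if not resolved.is_relative_to(root_path):
            continue
        # Self-evidence guard plus file-size cap.
        if resolved == excluded or resolved.stat().st_size > MAX_FILE_BYTES:
            continue
        try:
            text = resolved.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(t and t in line for t in terms):
                hits.append((rel.as_posix(), lineno, line.strip()))
                if len(hits) >= limit:
                    return hits
    return hits
```

Without `exclude_file`, a README sentence such as "Supports gRPC streaming" would count as evidence for itself.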

## The PR-audit flow (slice 2)

1. **Fetcher** — `fetch_pr_from_github(url, token)` calls the GitHub REST API for PR metadata, the unified diff and the changed files, returning a `PrSpec`. Tests skip this and use `load_pr_fixture(directory)` to build a `PrSpec` from `pr.json` + `diff.patch`. Anonymous requests work for public repos but quickly exhaust the unauthenticated limit of 60 requests per hour; pass a token explicitly or via `GITHUB_TOKEN` / `GH_TOKEN`.
2. **Adapter** — `GitHubAdapter(pr_spec)` parses the diff once into `DiffHunk` objects (path + new-file line offset + hunk text). `search_any(terms)` greps hunk bodies, stripping the leading `+`/`-`/space marker before matching so a hint of `+` doesn't spuriously match every added line. One hit per hunk to avoid duplicate snippets.
3. **Planner / Checker** — `PrPlannerAgent` + `PrCheckerAgent` mirror their README cousins but use PR-specific prompts. The checker's prompt instructs it to interpret `+` lines as additions, `-` as removals — semantic decisions live in the prompt; mechanical retrieval stays in the adapter.
4. **No artefact-exclusion needed** — the PR description is never part of the diff, so the self-evidence problem doesn't arise here. Different shape from README, same trust invariant.
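The adapter's mechanical work can be sketched in plain Python: split the diff into hunks once, strip the marker before matching, and keep one hit per hunk. `Hunk`, `parse_diff` and `search_hunks` are simplified, hypothetical stand-ins for `DiffHunk` and `GitHubAdapter.search_any`, not the code in `core/sources/github.py`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Captures the new-file start line from "@@ -a,b +c,d @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


@dataclass(frozen=True)
class Hunk:
    path: str       # new-file path, "b/" prefix stripped
    new_start: int  # first line of the hunk in the new file
    text: str       # hunk body with +/-/space markers kept


def parse_diff(patch: str) -> List[Hunk]:
    hunks: List[Hunk] = []
    path: Optional[str] = None
    start = 0
    body: List[str] = []
    for line in patch.splitlines():
        if line.startswith("+++ ") or line.startswith("@@"):
            if path is not None and body:  # close the previous hunk
                hunks.append(Hunk(path, start, "\n".join(body)))
            body = []
            if line.startswith("+++ "):
                path = line[4:].removeprefix("b/")
            else:
                m = HUNK_HEADER.match(line)
                start = int(m.group(1)) if m else 0
        elif line.startswith(("--- ", "diff ", "index ")):
            # File headers. Simplification: a removed line whose text starts
            # with "-- " would also be dropped; a real parser tracks hunk sizes.
            continue
        elif path is not None and line[:1] in ("+", "-", " "):
            body.append(line)
    if path is not None and body:
        hunks.append(Hunk(path, start, "\n".join(body)))
    return hunks


def search_hunks(hunks: List[Hunk], terms: List[str], limit: int = 10) -> List[Tuple[str, int, str]]:
    hits: List[Tuple[str, int, str]] = []
    for hunk in hunks:
        for line in hunk.text.splitlines():
            content = line[1:]  # drop the marker so "+" cannot match every added line
            if any(t and t in content for t in terms):
                hits.append((hunk.path, hunk.new_start, hunk.text))
                break  # one hit per hunk avoids duplicate snippets
        if len(hits) >= limit:
            break
    return hits
```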

## Coexistence rules

- **Do not break the thesis path.** The full thesis test suite (`tests/test_*.py` minus `tests/code_audit/`) must stay green. Thesis is being deprecated *gradually*, not yanked.
@@ -100,7 +123,8 @@ The audited README is **excluded** from its own evidence pool — otherwise ever
See `ROADMAP.md` for the full picture. Short version:

- ✅ Slice 1 (shipped): README auditor.
- 🚧 Next slices: PR auditor (`audit pr <url>`), compliance auditor, architecture auditor. All slot in behind the same `SourceAdapter` + agent-suite + trust-gate pattern.
- ✅ Slice 2 (shipped): PR auditor (`audit pr <url>`).
- 🚧 Next slices: compliance auditor, architecture auditor. All slot in behind the same `SourceAdapter` + agent-suite + trust-gate pattern.
- 🚧 Layered source adapters: repo (highest trust) → language specs / RFCs → dependency source. The literature MCP tools will migrate behind the same contract.
- 🚧 Cherry-picked from the v1.0 plan: tool/source registry, light provider-registry abstraction (Ollama later for local inference on private repos), structlog audit trail.
- ⏸️ Deferred: PyPI packaging, Typer CLI rewrite, OTel, Smart truncation, Ollama. Not blocking the audit-track expansion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@ All notable changes to OpenWorkers are documented here. The format is loosely ba
## [Unreleased]

### Added
- **Code-audit track — PR auditor (second slice).** New `openworkers audit pr <github-url>` CLI subcommand verifies a pull request description against its actual diff. Same pipeline shape as the README auditor (planner → deterministic researcher → checker + trust gate → critic), with two new layers: `core/sources/github.py` (`GitHubAdapter` implementing `SourceAdapter` over a unified diff, plus `PrSpec` value object, `parse_pr_url`, `fetch_pr_from_github` for live GitHub fetch with optional `GITHUB_TOKEN`/`GH_TOKEN`, and `load_pr_fixture` for offline tests) and `core/orchestrator/pr_flow.py` (`PrAuditOrchestrator`). New agents `PrPlannerAgent` and `PrCheckerAgent` in `providers/code_audit_agents.py`; `AuditCriticAgent` reused as-is. New prompts `prompts/code_audit/pr_planner.md` + `pr_checker.md` with PR-specific claim types (`add | remove | fix | refactor | test | behavior | doc | other`) and diff-aware verdict rules. Audit-prompt rendering extracted from `readme_flow.py` into `core/orchestrator/audit_prompts.py` so each new auditor can register templates without touching unrelated modules. `tests/fixtures/sample_pr/` contains a canned PR (json + unified diff) with verified / drifted / contradicted / fabricated claims; `tests/code_audit/test_pr_flow.py` asserts verdict distribution and an explicit trust-gate override.
- **Code-audit track — README auditor (first slice).** New `openworkers audit readme <repo>` CLI subcommand verifies every factual claim in a README against the actual repository, emitting `verified | drifted | unsupported | contradicted` verdicts with cited file paths. Pipeline: planner (LLM) → researcher (deterministic grep via new `LocalRepoAdapter`) → checker (LLM + post-LLM trust gate) → critic (LLM adversarial pass). New modules: `core/sources/` (`SourceAdapter` ABC + `LocalRepoAdapter`), `core/schemas_audit.py` (Pydantic audit models), `core/orchestrator/readme_flow.py` (`ReadmeAuditOrchestrator`), `providers/code_audit_agents.py` (planner / checker / critic + `_enforce_trust_gate` invariant), `prompts/code_audit/*.md` (audit templates). The trust gate is enforced in code, not delegated to prompts: any claim with no retrieved evidence is forced to `unsupported` regardless of LLM output. The audited README is excluded from its own evidence pool so fabricated claims cannot self-verify. `tests/code_audit/test_readme_flow.py` exercises the full flow with a stubbed `UnifiedLLM.generate_fn` and a `tests/fixtures/sample_repo/` containing a deliberate mix of verified / drifted / contradicted / fabricated claims. Thesis pipeline untouched.
- **Contributor onboarding doc** `AGENTS.md` capturing project DNA, code-audit slice design, trust-gate invariant, conventions, and the recipe for adding new auditors.
- **RAG over user PDFs** (first incremental v1.0 slice). New `tools/mcp/rag.py` with sentence-aware chunker, `RAGIndexer` (PDF/text → Qdrant via PyMuPDF + FastEmbed `BAAI/bge-small-en-v1.5`), and `RAGSearchTool` (registered as `rag_search` in `ToolRegistry`). Collections namespaced under `rag_*` so they cannot collide with `thesis_corpus` or `episodes`. New CLI: `thesis ingest add|list|delete`. New flag: `thesis research ... --rag-collection <name>` makes the researcher pull from the user collection alongside arXiv/SS. New field: `ResearchContext.rag_collection`. `tests/test_rag.py` covers chunking edge cases, BOM/text extraction, collection naming, indexer round-trip, privacy gating, and idempotent re-ingest.
8 changes: 5 additions & 3 deletions README.md
@@ -17,7 +17,7 @@ Both domains share the same DNA: a hierarchical pipeline (planner → researcher

## Code audit *(new track)*

`openworkers audit readme <repo>` extracts every factual claim from a README and verdicts each one against the actual repository:
`openworkers audit readme <repo>` and `openworkers audit pr <github-pr-url>` extract every factual claim from a README or PR description and verdict each one against the actual repository / diff:

| Verdict | Meaning |
|---|---|
@@ -26,9 +26,11 @@ Both domains share the same DNA: a hierarchical pipeline (planner → researcher
| `contradicted` | Code directly disproves the claim |
| `unsupported` | No evidence in the repo — enforced in code, not delegated to the LLM |

The pipeline is planner (LLM extracts claims) → researcher (deterministic grep via `LocalRepoAdapter`) → checker (LLM judges + trust gate forces `unsupported` when evidence is empty) → critic (adversarial pass). The audited README is excluded from its own evidence pool, so fabricated claims cannot verify themselves.
Both auditors use the same pipeline: planner (LLM extracts claims) → researcher (deterministic grep through a `SourceAdapter`, i.e. `LocalRepoAdapter` for repos or `GitHubAdapter` for PR diffs) → checker (LLM judges + trust gate forces `unsupported` when evidence is empty) → critic (adversarial pass). The audited artefact never enters its own evidence pool (the README is explicitly excluded from the repo grep; a PR description is never part of the diff), so fabricated claims cannot verify themselves.

Roadmap for this track: PR auditor (PR description vs. diff), compliance auditor (security/policy claims vs. code), architecture auditor (design doc vs. implementation). See [AGENTS.md](AGENTS.md) for the contributor recipe.
`audit pr` reads `GITHUB_TOKEN` or `GH_TOKEN` (or an explicit `--token`) for higher rate limits; offline testing uses canned fixtures via `--fixture <dir>` (see `tests/fixtures/sample_pr/`).
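URL parsing and token lookup need only the standard library. `parse_pr_url_sketch`, `resolve_token` and `build_requests` are hypothetical helpers; the real `parse_pr_url` / `fetch_pr_from_github` signatures may differ. Checking `GITHUB_TOKEN` before `GH_TOKEN` is also an assumption. The endpoint and the `application/vnd.github.diff` media type follow GitHub's public REST API.

```python
import os
import re
from typing import Dict, Optional, Tuple

PR_URL = re.compile(r"^https?://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)/?$")


def parse_pr_url_sketch(url: str) -> Tuple[str, str, int]:
    """Split https://github.com/owner/repo/pull/N into (owner, repo, N)."""
    match = PR_URL.match(url.strip())
    if not match:
        raise ValueError(f"Not a GitHub PR URL: {url!r}")
    owner, repo, number = match.groups()
    return owner, repo, int(number)


def resolve_token(explicit: Optional[str] = None) -> Optional[str]:
    """Explicit --token first, then $GITHUB_TOKEN, then $GH_TOKEN; None = anonymous."""
    return explicit or os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") or None


def build_requests(url: str, token: Optional[str] = None) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return the REST endpoint plus headers for the JSON fetch and the diff fetch."""
    owner, repo, number = parse_pr_url_sketch(url)
    endpoint = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
    json_headers = {"Accept": "application/vnd.github+json"}
    if token:
        json_headers["Authorization"] = f"Bearer {token}"
    # Same endpoint, different media type: GitHub returns the raw unified diff.
    diff_headers = {**json_headers, "Accept": "application/vnd.github.diff"}
    return endpoint, json_headers, diff_headers
```

These helpers make no request themselves. Passing `endpoint` and either header set to an HTTP client would be the fetch step.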

Roadmap for this track: compliance auditor (security/policy claims vs. code), architecture auditor (design doc vs. implementation), layered source adapters (specs / RFCs / dependency source), tool/source registry, Ollama for local inference on private repos. See [AGENTS.md](AGENTS.md) for the contributor recipe.

## What it does

2 changes: 1 addition & 1 deletion ROADMAP.md
@@ -24,7 +24,7 @@
A second domain alongside the thesis assistant: audit factual claims in technical artefacts against the codebase. Same DNA — multi-agent, structured JSON, never fabricates, trust gate refuses verdicts without evidence — applied to a domain where trustworthy automated review matters to OSS maintainers and contributors. See [AGENTS.md](AGENTS.md) for the contributor recipe and trust-gate invariant.

- ✅ **README auditor** *(first slice, shipped)*. `openworkers audit readme <repo>` extracts atomic claims from a README and verdicts each one against the actual codebase as `verified | drifted | unsupported | contradicted`. Trust gate is enforced in `providers/code_audit_agents.py::_enforce_trust_gate`, not in prompts. The audited README is excluded from its own evidence pool. New `SourceAdapter` abstraction (`core/sources/`) with `LocalRepoAdapter`.
- 🚧 **PR auditor** `openworkers audit pr <url>`. Verify the PR description against the actual diff; flag scope creep, missing tests, undocumented changes. Needs a `GitHubAdapter` implementing `SourceAdapter`.
- ✅ **PR auditor** *(second slice, shipped)*. `openworkers audit pr <github-url>` extracts atomic claims from a PR description and verdicts each against the actual unified diff. New `GitHubAdapter` (in `core/sources/github.py`) implements `SourceAdapter` over a unified diff via a `PrSpec` value object; live fetch lives in `fetch_pr_from_github` and is decoupled from the adapter for offline testing. PR-specific prompts under `prompts/code_audit/pr_*.md` use diff-aware verdict rules (e.g., `add` claims must match `+`-prefixed hunks).
- 🚧 **Compliance auditor** — `openworkers audit compliance <repo>`. Verify security/policy claims ("inputs sanitized", "no secrets", "auth required on X") against the code.
- 🚧 **Architecture auditor** — verify RFC / design-doc claims against implementation, language specs, and dependency source.
- 🚧 **Layered source adapters** — repo (highest trust) → language specs / RFCs (`SpecAdapter`) → dependency source (`DependencyAdapter`). The existing literature MCP tools (arXiv / Semantic Scholar / CrossRef) will migrate behind the same `SourceAdapter` contract.
49 changes: 49 additions & 0 deletions apps/cli/main.py
@@ -9,6 +9,7 @@
    format_session_text,
)
from core.memory.episodic import EpisodicMemory
from core.orchestrator.pr_flow import PrAuditOrchestrator, format_pr_report_text
from core.orchestrator.readme_flow import ReadmeAuditOrchestrator, format_report_text
from core.orchestrator.thesis_flow import ThesisOrchestrator
from core.router.engine import Router
@@ -235,9 +236,32 @@ async def cmd_audit_dispatch(args):
    """Route `audit <subcommand>` to its handler."""
    if args.audit_action == "readme":
        return await cmd_audit_readme(args)
    if args.audit_action == "pr":
        return await cmd_audit_pr(args)
    raise SystemExit(f"Unknown audit action: {args.audit_action}")


async def cmd_audit_pr(args):
    """Run the PR auditor against a GitHub PR URL (or a local fixture dir)."""
    from core.sources.github import fetch_pr_from_github, load_pr_fixture

    unified = create_unified_llm()
    orch = PrAuditOrchestrator(unified=unified)

    if args.fixture:
        pr_spec = load_pr_fixture(args.fixture)
    else:
        pr_spec = await asyncio.to_thread(fetch_pr_from_github, args.url, args.token)

    report, critique = await orch.audit(pr_spec)
    if args.format == "json":
        payload = {"report": report.model_dump(), "critique": critique.model_dump()}
        _output(payload, "json", args.output)
    else:
        _output(format_pr_report_text(report, critique), "text", args.output)
    return report


async def cmd_audit_readme(args):
    """Run the README auditor on a local repo."""
    unified = create_unified_llm()
@@ -399,6 +423,31 @@ def build_parser() -> argparse.ArgumentParser:
    )
    add_output_args(p_audit_readme)

    p_audit_pr = audit_sub.add_parser(
        "pr",
        help="Verify a PR description against the actual diff",
    )
    p_audit_pr.add_argument(
        "url",
        type=str,
        nargs="?",
        default="",
        help="GitHub PR URL (https://github.com/owner/repo/pull/N)",
    )
    p_audit_pr.add_argument(
        "--fixture",
        type=str,
        default=None,
        help="Audit a fixture directory containing pr.json + diff.patch instead of hitting GitHub",
    )
    p_audit_pr.add_argument(
        "--token",
        type=str,
        default=None,
        help="GitHub token (default: $GITHUB_TOKEN or $GH_TOKEN)",
    )
    add_output_args(p_audit_pr)

    p_ingest = sub.add_parser("ingest", help="Manage user RAG collections (PDF/text -> Qdrant)")
    ingest_sub = p_ingest.add_subparsers(dest="ingest_action", required=True)
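The `--fixture` branch of `cmd_audit_pr` keeps tests and CI offline. Below is a sketch of what loading such a fixture involves. It assumes `pr.json` mirrors GitHub's pull-request JSON (`number`, `title`, `body`). `PrSpecSketch` and `load_fixture_sketch` are hypothetical and do not reproduce the fields or signatures of `PrSpec` / `load_pr_fixture`.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PrSpecSketch:
    """Hypothetical stand-in for PrSpec; the real field names may differ."""
    number: Optional[int]
    title: str
    body: str
    diff: str


def load_fixture_sketch(directory: str) -> PrSpecSketch:
    """Build a PR spec from <dir>/pr.json + <dir>/diff.patch without touching the network."""
    root = Path(directory)
    meta = json.loads((root / "pr.json").read_text(encoding="utf-8"))
    diff = (root / "diff.patch").read_text(encoding="utf-8")
    return PrSpecSketch(
        number=meta.get("number"),
        title=meta.get("title") or "",
        body=meta.get("body") or "",  # GitHub sends null for an empty description
        diff=diff,
    )
```

Loading the fixture and fetching live both end in the same value object, so the orchestrator never knows which path ran.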

45 changes: 45 additions & 0 deletions core/orchestrator/audit_prompts.py
@@ -0,0 +1,45 @@
"""Shared prompt loader for the code-audit family of orchestrators.

Lives outside any single flow module so each new auditor can register
its templates without circular imports. Deliberately not reusing
``PromptCompiler``: that compiler is wired to extract thesis blackboard
state, which audit flows don't use. Audit templates only need
``{{ var }}`` substitution.
"""

from __future__ import annotations

import os
from typing import Any

_AUDIT_PROMPT_DIR = os.path.join(
    os.path.dirname(os.path.dirname(os.path.dirname(__file__))),
    "prompts",
    "code_audit",
)

# Registry of template name → filename under ``prompts/code_audit/``.
# Each new auditor appends its entries here; the renderer accepts any
# registered name without further changes elsewhere.
TEMPLATE_FILES: dict[str, str] = {
    "readme_planner": "readme_planner.md",
    "readme_checker": "readme_checker.md",
    "pr_planner": "pr_planner.md",
    "pr_checker": "pr_checker.md",
    "audit_critic": "audit_critic.md",
}


def render_audit_prompt(name: str, variables: dict[str, Any]) -> str:
    filename = TEMPLATE_FILES.get(name)
    if not filename:
        raise ValueError(f"Unknown audit template: {name}")
    path = os.path.join(_AUDIT_PROMPT_DIR, filename)
    try:
        with open(path, encoding="utf-8") as f:
            template = f.read()
    except OSError:
        return f"[Template {name} not found at {path}]"
    for key, value in variables.items():
        template = template.replace("{{ " + key + " }}", str(value))
    return template
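The substitution in `render_audit_prompt` is a literal string replace. A placeholder renders only when it is written exactly as `{{ name }}` with single spaces, and unknown or misspaced placeholders pass through silently. The sketch below reproduces the loop without file I/O to make that visible. `render_template` and `unrendered` are hypothetical helpers, not part of the module.

```python
import re
from typing import Any, Dict, List


def render_template(template: str, variables: Dict[str, Any]) -> str:
    """Same literal `{{ key }}` replacement loop as render_audit_prompt, minus file I/O."""
    for key, value in variables.items():
        template = template.replace("{{ " + key + " }}", str(value))
    return template


def unrendered(text: str) -> List[str]:
    """Names of placeholders that survived rendering, whatever their spacing."""
    return re.findall(r"\{\{\s*(\w+)\s*\}\}", text)
```

A missing template file also yields a `[Template ... not found at ...]` string rather than an exception. A post-render check like `unrendered` is therefore a cheap way to catch template mistakes in tests.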