Automated Claim Verification for Policy and Compliance
When querying dense SLAs, vendor agreements, or internal financial policies, a single rule often contains multiple strict conditions. A standard RAG pipeline finds evidence for one, ignores the others, and synthesizes a confident paragraph claiming all of them are satisfied.
ArgusAudit removes this liability in regulated environments with an agentic verification pipeline that treats compliance as a classification problem. Every query is decomposed into discrete, verifiable claims, each graded independently against the source material. Three claims in means three separate verdicts out.
---

Input: "All production S3 buckets must use AES-256 encryption, rotate keys every 90 days, and replicate data to the 'us-west-2' region."
| Status | Claim | Evidence | Source |
|---|---|---|---|
| SUPPORTED | AES-256 encryption required | "All production data must use AES-256..." | InfoSec Std v4.1 (p.12) |
| REFUTED | Keys rotate every 90 days | Policy mandates annual rotation only | Key Mgmt Policy (p.4) |
| UNKNOWN | Replicate to us-west-2 | Policy mentions offsite backup, no region specified | No context found |
Three claims in, three different verdicts out. A standard RAG pipeline returns a single confident paragraph that passes all three.
ArgusAudit runs a multi-step verification pipeline where each stage is isolated, gated, and independently observable — decomposition, retrieval, grading, and guardrail validation never share state.
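A minimal sketch of how that staged layout might be wired in LangGraph; the state schema, node names, and stub bodies are illustrative assumptions, not ArgusAudit's actual code:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AuditState(TypedDict, total=False):
    query: str
    claims: list[str]           # written by decompose, read downstream
    evidence: dict[str, list]   # claim -> retrieved snippets
    verdicts: dict[str, str]    # claim -> SUPPORTED / REFUTED / UNKNOWN

# Stub nodes: each stage reads only the keys it needs and writes only its own.
def decompose(state: AuditState) -> dict:
    return {"claims": [state["query"]]}                    # real node calls an LLM

def retrieve(state: AuditState) -> dict:
    return {"evidence": {c: [] for c in state["claims"]}}  # real node queries Qdrant + BM25

def grade(state: AuditState) -> dict:
    return {"verdicts": {c: "UNKNOWN" for c in state["claims"]}}

def guardrail(state: AuditState) -> dict:
    missing = [c for c in state["claims"] if c not in state["verdicts"]]
    if missing:
        raise ValueError(f"ungraded claims: {missing}")
    return {}

graph = StateGraph(AuditState)
for name, fn in [("decompose", decompose), ("retrieve", retrieve),
                 ("grade", grade), ("guardrail", guardrail)]:
    graph.add_node(name, fn)
graph.add_edge(START, "decompose")
graph.add_edge("decompose", "retrieve")
graph.add_edge("retrieve", "grade")
graph.add_edge("grade", "guardrail")
graph.add_edge("guardrail", END)
pipeline = graph.compile()
```

Because every stage communicates only through typed state keys, each node can be logged, replayed, and tested in isolation.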
In a standard pipeline, a query like "AES-256 encryption and 90-day key rotation" becomes a single vector search that averages the meaning of the full sentence. Strong matches on "encryption" drown out weak matches on "rotation". The LLM fills the gap.
ArgusAudit splits the query before retrieval. Each claim is an isolated unit with its own retrieval pass and its own verdict. Nothing bleeds into anything else.
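One way to implement that split is a structured-output call that forces the model to return a list of atomic claims; the model name here is an assumption, since the README does not pin one:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ClaimList(BaseModel):
    claims: list[str] = Field(description="Atomic, independently verifiable claims")

# Illustrative model choice; swap in whatever the deployment actually uses.
decomposer = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(ClaimList)

def split_claims(query: str) -> list[str]:
    prompt = ("Split this compliance statement into atomic claims that can each "
              f"be verified in isolation:\n{query}")
    return decomposer.invoke(prompt).claims
```

Each returned claim then gets its own retrieval pass and its own verdict.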
The model grades evidence without seeing source metadata. Retrieved chunks are passed as anonymous snippets indexed by integer. The LLM returns a verdict tied to a snippet ID. The orchestration layer resolves that ID back to the document name and page number after grading is complete.
The model cannot fabricate a citation because it never had one to work from.
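A sketch of what blind grading could look like under those constraints; the chunk dictionary keys (`text`, `doc`, `page`) and the grading prompt are assumptions:

```python
from typing import Literal
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class BlindVerdict(BaseModel):
    verdict: Literal["SUPPORTED", "REFUTED", "UNKNOWN"]
    snippet_id: int | None = None   # integer index, never a document name

grader = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(BlindVerdict)

def grade_blind(claim: str, chunks: list[dict]) -> dict:
    # The model sees numbered text only -- no titles, filenames, or page numbers.
    numbered = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks))
    result = grader.invoke(
        f"Claim: {claim}\nEvidence snippets:\n{numbered}\n"
        "Grade the claim and cite the id of the snippet you relied on."
    )
    # The orchestration layer resolves the id to real metadata after grading.
    source = None
    if result.snippet_id is not None and 0 <= result.snippet_id < len(chunks):
        c = chunks[result.snippet_id]
        source = f"{c['doc']} (p.{c['page']})"
    return {"claim": claim, "verdict": result.verdict, "source": source}
```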
Retrieval combines dense and sparse signals using Reciprocal Rank Fusion:
- BAAI/bge-small-en for semantic intent matching
- BM25 for exact keyword hits on clause numbers, defined terms, and specific identifiers
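Reciprocal Rank Fusion itself is a few lines; a self-contained sketch using the conventional k = 60 smoothing constant, where the input rankings would come from the dense index and BM25:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Usage: rrf_fuse([dense_hit_ids, bm25_hit_ids])
# A chunk ranked highly by either signal surfaces near the top of the fused list.
```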
Before any response is streamed, a final structured pass validates that every claim in the output is grounded in a retrieved snippet. Output is rejected at the schema level if a claim lacks an evidence reference. The system does not release ungrounded findings.
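A minimal Pydantic v2 sketch of such a schema-level gate; the field names are assumptions:

```python
from typing import Literal
from pydantic import BaseModel, ValidationInfo, field_validator

class Finding(BaseModel):
    claim: str
    verdict: Literal["SUPPORTED", "REFUTED", "UNKNOWN"]
    snippet_id: int | None = None

    @field_validator("snippet_id")
    @classmethod
    def must_be_grounded(cls, v: int | None, info: ValidationInfo) -> int | None:
        # Any non-UNKNOWN verdict without an evidence reference fails validation,
        # so an ungrounded finding can never be serialized into the response.
        if info.data.get("verdict") != "UNKNOWN" and v is None:
            raise ValueError("finding lacks an evidence reference")
        return v
```

Rejection happens before streaming starts, so the client never sees a finding that later turns out to be ungrounded.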
`@c` runs the full verification pipeline: decomposition, hybrid retrieval, blind grading, report generation, and guardrail check.

`@c Verify that all vendors must hold ISO 27001 certification.`

`@r` runs a direct vector search against the index. Useful for fast document lookups that do not require structured grading.

`@r What is the deductible for general liability?`
The system uses an agentic LangGraph workflow designed for high reliability. The agent follows a strict, self-monitoring process:
- Evidence-Gated Generation: The agent must successfully find and score relevant evidence before it can formulate a response.
- Self-Correction: If the system fails to find evidence or exhausts its execution budget, it automatically adjusts its approach (a routing sketch follows this list).
- Output Validation: All final responses are cross-checked against the retrieved evidence to guarantee accuracy before release.
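Continuing the graph sketch above, self-correction can be expressed as a conditional edge; the routing rule and the `reformulate` / `partial_report` node names are assumptions:

```python
MAX_LLM_CALLS = 10  # mirrors the budget table below

def route_after_grading(state: dict) -> str:
    verdicts = state.get("verdicts", {})
    if verdicts and all(v != "UNKNOWN" for v in verdicts.values()):
        return "guardrail"                    # evidence gate passed
    if state.get("llm_calls", 0) < MAX_LLM_CALLS:
        return "reformulate"                  # retry with an adjusted query
    return "partial_report"                   # budget exhausted: abstain

# Replaces the fixed grade -> guardrail edge:
# graph.add_conditional_edges("grade", route_after_grading)
```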
The interface is built for human sign-off, not passive consumption. A Contradiction Heatmap cross-references documents against claims in real time, surfacing conflicting evidence before it becomes a finding. Reviewers can override any verdict directly, generating a timestamped audit log that records every human intervention.
Unverifiable findings trigger a prominent callout flagging the claim for manual review rather than silently passing it.
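A plausible shape for one such timestamped audit-log entry, reusing the reason codes described below; every field here is an assumption about the record layout:

```python
from datetime import datetime, timezone
from typing import Literal
from pydantic import BaseModel, Field

Verdict = Literal["SUPPORTED", "REFUTED", "UNKNOWN"]

class OverrideRecord(BaseModel):
    claim: str
    original_verdict: Verdict
    new_verdict: Verdict
    reviewer: str
    reason_code: Literal["hallucinated_citation", "missing_evidence", "incorrect_verdict"]
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```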
Unconstrained agent loops burn compute and return late. Graph execution is gated by hard limits enforced via a custom `BudgetTrackerCallback`:
| Limit | Default |
|---|---|
| Max LLM calls | 10 |
| Max tokens | 15,000 |
| Max wall time | 90 seconds |
When a ceiling is hit, the system returns a partial verification with a clear status rather than timing out or throwing a 500.
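The README names `BudgetTrackerCallback` but not its internals; a plausible reconstruction as a LangChain callback handler (`raise_error = True` makes the exception abort the run instead of being logged):

```python
import time
from langchain_core.callbacks import BaseCallbackHandler

class BudgetExceeded(RuntimeError):
    pass

class BudgetTrackerCallback(BaseCallbackHandler):
    raise_error = True  # propagate BudgetExceeded instead of swallowing it

    def __init__(self, max_calls: int = 10, max_tokens: int = 15_000,
                 max_seconds: float = 90.0):
        self.max_calls, self.max_tokens, self.max_seconds = max_calls, max_tokens, max_seconds
        self.calls = 0
        self.tokens = 0
        self.started = time.monotonic()

    def on_llm_start(self, serialized, prompts, **kwargs):
        self.calls += 1
        if self.calls > self.max_calls:
            raise BudgetExceeded("max LLM calls exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("max wall time exceeded")

    def on_llm_end(self, response, **kwargs):
        usage = (response.llm_output or {}).get("token_usage", {})
        self.tokens += usage.get("total_tokens", 0)
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("max tokens exceeded")
```

The orchestration layer catches `BudgetExceeded` and emits the partial verification described above.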
Every output collects a thumbs-up or thumbs-down. Reviewers who override a finding submit a reason code: `hallucinated_citation`, `missing_evidence`, or `incorrect_verdict`. A metrics script aggregates these logs into four tracked signals:
- Verdict Accuracy
- Citation Correctness
- Abstention Quality
- Human Override Rate
This structured feedback loop generates the high-quality dataset required to identify weaknesses, systematically enhance the agentic flows, and eventually enable autonomous self-learning.
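A sketch of what that metrics script might look like; the one-JSON-object-per-line log format is an assumption:

```python
import json
from collections import Counter

def aggregate(log_path: str) -> dict:
    # Assumed record shape: {"verdict_correct": bool, "citation_correct": bool,
    #   "abstained_correctly": bool, "overridden": bool, "reason": str | null}
    with open(log_path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    n = max(len(rows), 1)
    return {
        "verdict_accuracy": sum(r["verdict_correct"] for r in rows) / n,
        "citation_correctness": sum(r["citation_correct"] for r in rows) / n,
        "abstention_quality": sum(r["abstained_correctly"] for r in rows) / n,
        "human_override_rate": sum(r["overridden"] for r in rows) / n,
        "override_reasons": dict(Counter(r["reason"] for r in rows if r["overridden"])),
    }
```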
| Layer | Technology | Why |
|---|---|---|
| Orchestration | LangGraph | State isolation per claim, budget gating across nodes |
| Vector Store | Qdrant | Page-level filtering to bind chunks to source metadata |
| Validation | Pydantic v2 | Strict schemas for grading input and audit log output |
| Embeddings | BAAI/bge-small-en | Strong MTEB retrieval performance at small model size |
| API | FastAPI | Async SSE for concurrent audit streams |
Requires Docker, Python 3.10+, Node.js 18+.
`start_all.bat` starts Qdrant, the FastAPI backend, and the React frontend concurrently. To stop all instances cleanly, run `stop_all.bat`.

To start each service manually:

```
# 1. Vector store
docker-compose up -d

# 2. Backend
call .venv\Scripts\activate.bat
uvicorn backend.api.main:app --reload --host 0.0.0.0 --port 8000

# 3. Frontend
cd frontend-react
npm install && npm run dev
```



