`agentic-qa-kit`

The agentic QA operating system for software projects

Turn any repository into an agentic QA lab. Works with Claude · Codex · Gemini · Copilot. Bun-first. Enterprise-ready.

Not a test runner. An operating system for agentic QA.

A standardized framework that turns coding agents into QA engineers guided by risk maps, invariants, scenarios, probes, oracles, and replay. It is not a prompt. It is the reusable framework that makes the prompt operational, reproducible, versionable, and adaptable to every project.

Why this exists

Coding agents (Claude Code, Codex CLI, Gemini CLI, GitHub Copilot CLI) are great at writing code. They are poor QA engineers by default: they will gladly add a feature without imagining how a malicious user might exploit it, how one tenant's data might leak to another, or how the LLM tool-calling layer can be tricked into refunding a payment without confirmation.

agentic-qa-kit provides the operating system the agent needs to behave like a senior QA engineer on your project:

An explicit risk map with severity, invariants, probes, and oracles
Pre-built scenario packs for APIs, web UIs, LLM agents, security, migrations
Adapters that install the right skills for Claude / Codex / Gemini / Copilot
A runner that executes profiles deterministically (smoke, exploratory, security, release-gate)
Findings with three-level reproducibility, bug-level deterministic replay, and suggested regression tests
Optional admin panel (React) and server (Bun/Node) for multi-team self-hosted deployments

What makes it different

🧠 Multi-agent native — Claude · Codex · Gemini · Copilot first-class adapters, not "Claude with the others bolted on". Adapter capability negotiation, so each agent uses its best primitives (subagents, skills, slash commands, hooks).
🎯 Deterministic replay where it matters — three-level reproducibility (bug / scenario / agent). The kit never lies about LLM determinism. Bug-level deterministic replay is required for any release-gate verified finding.
🔒 Sandbox by design — container-per-scenario isolation default for security and release-gate profiles. Egress allowlists. Tool-call budgets. Resource limits. Cost kill-switches.
💰 Cost governance built-in — per-org / project / profile / scenario budgets in USD and tokens, hard kill-switches, attribution to risk areas. No more "an agent loop burned $400 overnight".
🏠 BYOK + on-prem LLM — bring your own Anthropic/OpenAI keys, or use vLLM / Bedrock private / Azure OpenAI VNet / llama.cpp. Air-gap deploy supported.
📋 OWASP Top 10 Agentic (2026) built-in security pack. Plus STRIDE / FMEA risk discovery (v0.6).
🧾 Hash-chained audit log + WORM export. SOC2 / ISO 27001 / GDPR / HIPAA alignment on the roadmap (v0.3 self-hosted, v1.0 GA).
🔁 Process-first governance — every PR follows a documented loop with Copilot Code Review. Lessons captured in docs/LESSON.md for permanent improvement.
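
The hard kill-switch idea from the cost governance item can be pictured as a small budget guard. This is a minimal sketch in TypeScript, not the kit's actual API: `Budget`, `BudgetGuard`, and `BudgetExceededError` are hypothetical names, and the per-risk-area map is only one possible way to do attribution.

```typescript
// Hypothetical sketch of a hard budget kill-switch; not the kit's real API.
interface Budget {
  maxUsd: number; // hard ceiling in USD
  maxTokens: number; // hard ceiling in tokens
}

class BudgetExceededError extends Error {}

class BudgetGuard {
  private usd = 0;
  private tokens = 0;
  private readonly budget: Budget;
  // USD spend attributed to each risk area, for reporting.
  readonly byRiskArea = new Map<string, number>();

  constructor(budget: Budget) {
    this.budget = budget;
  }

  // Record one LLM call; refuse it if it would push the run past either ceiling.
  charge(riskArea: string, usd: number, tokens: number): void {
    const overUsd = this.usd + usd > this.budget.maxUsd;
    const overTokens = this.tokens + tokens > this.budget.maxTokens;
    if (overUsd || overTokens) {
      throw new BudgetExceededError(`budget exceeded while probing ${riskArea}`);
    }
    this.usd += usd;
    this.tokens += tokens;
    this.byRiskArea.set(riskArea, (this.byRiskArea.get(riskArea) ?? 0) + usd);
  }
}
```

The check runs before the spend is recorded, so a rejected call never counts against the budget. A caller would typically catch `BudgetExceededError` at the run level and stop the agent loop.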

Quick start (junior-friendly)

Status note: the kit reached v1.0 GA (24-task roadmap complete) and is now at v1.3. The 18 workspace packages (@aqa/schemas, @aqa/kit, @aqa/runner, @aqa/reporter, @aqa/server, @aqa/admin, @aqa/compliance, @aqa/methodology, …) ship from this monorepo. Detailed walk-through: docs/getting-started.md.

Preview of the v0.1.0 quick start:

1. Install Bun

# macOS / Linux
curl -fsSL https://bun.sh/install | bash

# Windows (PowerShell)
powershell -c "irm bun.sh/install.ps1 | iex"

2. Install the kit in your project

cd /path/to/your/project
bun add -d agentic-qa-kit

If you don't have a project yet, clone examples/bun-api from this repo (available in v0.1.0).

3. Initialize the AQA workspace

bunx aqa init

Detects your stack and creates .aqa/ with testing.md, risk-map.yaml, profiles.yaml, and scenarios for the packs your project matches.

4. Install agent-specific files (pick one or many)

bunx aqa install-agent-files --targets claude,codex,gemini,copilot

This generates CLAUDE.md + .claude/skills/aqa-*, AGENTS.md + .agents/skills/, GEMINI.md + .gemini/skills/, .github/copilot-instructions.md + .github/skills/.

5. Run your first agentic QA pass

bunx aqa run --profile smoke

A 10-minute, non-destructive sweep. When it finishes:

bunx aqa report

You'll see findings like:

AQA-2026-0001 [P1] Cross-tenant data leak (verified, 3/3 deterministic replay)
AQA-2026-0002 [P3] Missing rate limit on /api/search

6. Replay a finding to confirm

bunx aqa replay AQA-2026-0001

Re-runs the deterministic bug reproduction (curl / Playwright / SQL) and tells you whether it still reproduces. If it no longer reproduces, the bug is fixed and the loop is closed.
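
A label such as "3/3 deterministic replay" can be read as "the reproduction ran three times and failed the same way each time". The sketch below illustrates that counting logic only. It is not how `aqa replay` is implemented: `Reproduction`, `ReplayResult`, and the `flaky` verdict are assumptions introduced for illustration, and a real reproduction would be an asynchronous curl, Playwright, or SQL step rather than a synchronous callback.

```typescript
// Hypothetical sketch of replay counting; not the kit's implementation.
type Reproduction = () => boolean; // returns true when the bug still reproduces

interface ReplayResult {
  attempts: number;
  reproduced: number;
  verdict: "reproduces" | "flaky" | "fixed";
}

function replay(reproduce: Reproduction, attempts = 3): ReplayResult {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");
  let reproduced = 0;
  for (let i = 0; i < attempts; i++) {
    if (reproduce()) reproduced++;
  }
  // All attempts agree -> deterministic; none -> fixed; anything else -> flaky.
  const verdict =
    reproduced === attempts ? "reproduces" : reproduced === 0 ? "fixed" : "flaky";
  return { attempts, reproduced, verdict };
}

const result = replay(() => true);
console.log(`${result.reproduced}/${result.attempts} ${result.verdict}`); // 3/3 reproduces
```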

The mental model in 7 words

Risk → Invariant → Scenario → Probe → Oracle → Finding → Replay

Every concept in AQA is one of these seven things or a tool that operates on them. See docs/ecosystem-explained.md for the deep introduction.
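
One way to picture how the seven concepts fit together is as a handful of types. This is an illustrative sketch, not the kit's schema: every type and field name here is an assumption, and the severity levels other than P1 and P3 are guesses. Replay, the seventh concept, is then simply running the same probe and oracle again.

```typescript
// Hypothetical types for the seven concepts; names and fields are assumptions.
type Severity = "P0" | "P1" | "P2" | "P3";

interface Risk { id: string; description: string; severity: Severity }
interface Invariant { riskId: string; statement: string }
interface Probe<T> { run(): T } // gathers an observation from the system
interface Oracle<T> { holds(observation: T): boolean } // decides if the invariant held
interface Scenario<T> { invariant: Invariant; probe: Probe<T>; oracle: Oracle<T> }
interface Finding { riskId: string; severity: Severity; violated: string }

// Run one scenario: probe, judge with the oracle, emit a finding on violation.
function evaluate<T>(risk: Risk, scenario: Scenario<T>): Finding | null {
  const observation = scenario.probe.run();
  if (scenario.oracle.holds(observation)) return null;
  return { riskId: risk.id, severity: risk.severity, violated: scenario.invariant.statement };
}

const tenancy: Risk = { id: "tenancy", description: "Cross-tenant data leak", severity: "P1" };
const leakScenario: Scenario<string[]> = {
  invariant: { riskId: "tenancy", statement: "Tenant A only sees tenant A rows" },
  probe: { run: () => ["tenant-a", "tenant-b"] }, // stand-in for a real API call
  oracle: { holds: (owners) => owners.every((o) => o === "tenant-a") },
};

console.log(evaluate(tenancy, leakScenario)?.severity); // P1
```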

Multi-agent

Target	Files generated	Capability highlights
🟣 Claude Code	`CLAUDE.md`, `.claude/skills/aqa-*`, `.claude/agents/aqa-*`	Skills, subagents (isolated context), hooks, MCP
🟢 Codex	`AGENTS.md`, `.agents/skills/aqa-*`, optional Codex plugin	Skills, explicit subagents, plugins, MCP
🔵 Gemini CLI	`GEMINI.md`, `.gemini/skills/aqa-*`, `.gemini/agents/`, `.gemini/commands/*.toml`	Skills, subagents, slash commands, MCP
⚫ GitHub Copilot CLI	`.github/copilot-instructions.md`, `.github/skills/aqa-*`, `.github/agents/*.agent.md`, `.github/hooks/*.json`	Skills (auto-detects `.claude/skills`), custom agents, hooks

Capability negotiation happens at runtime: the kit asks each agent target what it supports and degrades gracefully when a capability is missing.
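
Graceful degradation of this kind usually comes down to an ordered preference list with a safe fallback. The sketch below shows that pattern under stated assumptions: `Capability`, `AgentTarget`, `choosePrimitive`, and the `instructions-only` fallback are hypothetical names, and the capability sets are taken from the table above rather than from the kit's adapter code.

```typescript
// Hypothetical sketch of capability negotiation; not the kit's adapter API.
type Capability = "skills" | "subagents" | "hooks" | "slash-commands" | "mcp";

interface AgentTarget {
  name: string;
  capabilities: Capability[]; // what the target reports it supports
}

// Walk the preference list in order and take the first supported primitive.
// When nothing matches, fall back to plain instruction files.
function choosePrimitive(
  target: AgentTarget,
  preferences: Capability[],
): Capability | "instructions-only" {
  const supported = new Set(target.capabilities);
  return preferences.find((c) => supported.has(c)) ?? "instructions-only";
}

const claude: AgentTarget = {
  name: "claude",
  capabilities: ["skills", "subagents", "hooks", "mcp"],
};
const minimal: AgentTarget = { name: "minimal", capabilities: ["skills"] };

console.log(choosePrimitive(claude, ["subagents", "skills"])); // subagents
console.log(choosePrimitive(minimal, ["subagents", "skills"])); // skills
```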

Architecture at a glance

+- Local mode (single dev / CI) -----------------------------+
|  bunx aqa CLI                                              |
|   |- engine + runner (sandboxed)                           |
|   |- packs (core, api, web-ui, llm-agent, security, ...)   |
|   |- adapters (Claude/Codex/Gemini/Copilot)                |
|   `- .aqa/  (project state, runs, findings, replay)        |
+------------------------------------------------------------+

+- Self-hosted (multi-team, post v0.3) ----------------------+
|  Control Plane (HA)                                        |
|   |- agentic-qa-kit-server (Hono+Bun or Express+Node)      |
|   |- agentic-qa-kit-admin (React)                          |
|   |- Postgres HA . Redis/NATS . S3-compat . Vault . OIDC   |
|   `- OTel Collector + Prometheus + Tempo + Loki            |
|                                                            |
|  Runners (per-team / CI shared / dev laptop)               |
|   - mTLS + OIDC to the control plane                       |
|   - execute scenarios next to the code (code never leaves) |
+------------------------------------------------------------+

Full diagram: docs/architecture/reference.md (stub; expanded in v0.1.0).

Roadmap

Version	Theme	Highlights
`v0.0.1-governance`	Bootstrap	Process docs, CI, Copilot review automation, admin spec
`v0.1.x`	Foundation	Schemas, CLI (init/doctor/validate), 5 base packs, 4 adapters, runner+smoke, reports, admin viewer
`v0.2.x`	Determinism & cost	3-level replay, cost governance, container sandbox default
`v0.3.x`	Enterprise table-stakes	Postgres backend, SSO/RBAC, pack signing, on-prem LLM, Helm chart, air-gap installer
`v0.4.x`	Admin editing	Scenario Studio, AI-generation with review workflow
`v0.5.x`	Multi-team	Server + runner fleet, findings dedup, bug→fix→verify-fix loop
`v0.6.x`	Methodology rigor	STRIDE/FMEA/OWASP integration, oracle ensemble, judge calibration
`v1.0`	GA enterprise — shipped	SOC2/ISO controls catalog, `aqa-audit-verify` CLI, pen-test scope doc
`v1.1`	Polish — shipped	Banner, full Helm chart (runner StatefulSet, Ingress, NetworkPolicy, Postgres subchart), 3 example targets (Bun, Next.js, Laravel)
`v1.2`	Admin SPA wired — shipped	Tailwind 4 + TanStack Router + Query + 12 screens, audit-chain verification in-browser via Web Crypto
`v1.3`	Quality batch — shipped	Admin server↔UI mapping, 6 detail routes, 12 new admin tests, CLI E2E smoke gate, threat-model expansion, CHANGELOG backfill
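
The audit-chain verification mentioned in the v1.2 row relies on a general property of hash-chained logs: each entry's hash covers the previous entry's hash, so editing any entry breaks every check from that point on. The sketch below demonstrates the property with Node's `node:crypto` module for brevity, whereas the admin panel is described as using Web Crypto. The `AuditEntry` shape, the all-zero genesis value, and the function names are assumptions, not the kit's audit schema.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the kit's real audit schema may differ.
interface AuditEntry {
  event: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string; // SHA-256 over prevHash and event
}

const GENESIS = "0".repeat(64);

function digest(prevHash: string, event: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(event).digest("hex");
}

// Return a new log with the event appended and linked to the previous entry.
function append(log: AuditEntry[], event: string): AuditEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...log, { event, prevHash, hash: digest(prevHash, event) }];
}

// Return the index of the first broken entry, or -1 when the chain is intact.
function verify(log: AuditEntry[]): number {
  let prevHash = GENESIS;
  return log.findIndex((entry) => {
    const ok = entry.prevHash === prevHash && entry.hash === digest(prevHash, entry.event);
    prevHash = entry.hash;
    return !ok;
  });
}
```

Calling `verify` on an untouched log returns `-1`. Changing the `event` of any entry without recomputing the hashes makes `verify` return that entry's index.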

Status

GA (v1.0 shipped, v1.3 current). The full 24-task roadmap is closed: schemas, CLI (@aqa/kit), 5 baseline packs, multi-agent adapters (Claude/Codex/Gemini/Copilot), runner with hash-chained audit, reporter with 3-level replay, admin panel, server + runner fleet, on-prem LLM adapters, SSO/RBAC, Postgres backend, pack signing + scanning, container sandbox, cost governance, findings dedup + clustering, STRIDE/FMEA/OWASP methodology layer, Helm chart + Terraform + air-gap installer, SOC2/ISO controls catalog + aqa-audit-verify CLI.

Release notes per tag: Releases page. Live state: docs/PROGRESS.md. Architectural decisions: docs/adr/.

Documentation

docs/getting-started.md — junior onboarding
docs/PACK-AUTHORING.md — write your own pack (community guide)
docs/ecosystem-explained.md — concepts deep-dive
docs/RULES.md — contribution rules
docs/adr/ — architecture decisions
docs/design/admin-panel-template.md — admin UI spec (for parallel template work)
AGENTS.md — single source of truth for AI contributors
docs/architecture/reference.md — full architecture (stub; expanded in v0.1.0)
docs/security/threat-model.md — STRIDE applied to AQA (stub; expanded in v0.1.0)
docs/methodology/agentic-qa.md — methodology paper (stub; expanded in v0.1.0)

Contributing

Please read CONTRIBUTING.md, AGENTS.md, and docs/RULES.md first.

We follow a strict PR loop with Copilot Code Review on every PR (automated by .github/workflows/copilot-review.yml).

Security

For vulnerabilities, use the private channel in SECURITY.md — do not file public issues.

License

Maintainers

Padosoft — info@padosoft.com
