padosoft/agentic-qa-kit


agentic-qa-kit

The agentic QA operating system for software projects

Turn any repository into an agentic QA lab. Works with Claude · Codex · Gemini · Copilot. Bun-first. Enterprise-ready.

License: Apache 2.0 · Bun · Node · TypeScript strict · CI · Release · Status: GA

Not a test runner. An operating system for agentic QA.

A standardized framework that turns coding agents into QA engineers guided by risk maps, invariants, scenarios, probes, oracles, and replay. It is not a prompt. It is the reusable framework that makes the prompt operational, reproducible, versionable, and adaptable to every project.



Why this exists

Coding agents (Claude Code, Codex CLI, Gemini CLI, GitHub Copilot CLI) are great at writing code. By default they are poor QA engineers: they will gladly add a feature without imagining how a malicious user might exploit it, how one tenant's data might leak to another, or how the LLM tool-calling layer can be tricked into refunding a payment without confirmation.

agentic-qa-kit provides the operating system the agent needs to behave like a senior QA engineer on your project:

  • An explicit risk map with severity, invariants, probes, and oracles
  • Pre-built scenario packs for APIs, web UIs, LLM agents, security, migrations
  • Adapters that install the right skills for Claude / Codex / Gemini / Copilot
  • A runner that executes profiles deterministically (smoke, exploratory, security, release-gate)
  • Findings with three-level reproducibility, bug-level deterministic replay, and suggested regression tests
  • Optional admin panel (React) and server (Bun/Node) for multi-team self-hosted deployments

What makes it different

  • 🧠 Multi-agent native — Claude · Codex · Gemini · Copilot first-class adapters, not "Claude with the others bolted on". Adapter capability negotiation, so each agent uses its best primitives (subagents, skills, slash commands, hooks).
  • 🎯 Deterministic replay where it matters — three-level reproducibility (bug / scenario / agent). The kit never lies about LLM determinism. Bug-level deterministic replay is required for any release-gate verified finding.
  • 🔒 Sandbox by design — container-per-scenario isolation default for security and release-gate profiles. Egress allowlists. Tool-call budgets. Resource limits. Cost kill-switches.
  • 💰 Cost governance built-in — per-org / project / profile / scenario budgets in USD and tokens, hard kill-switches, attribution to risk areas. No more "an agent loop burned $400 overnight".
  • 🏠 BYOK + on-prem LLM — bring your own Anthropic/OpenAI keys, or use vLLM / Bedrock private / Azure OpenAI VNet / llama.cpp. Air-gap deploy supported.
  • 📋 OWASP Top 10 Agentic (2026) built-in security pack. Plus STRIDE / FMEA risk discovery (v0.6).
  • 🧾 Hash-chained audit log + WORM export. SOC2 / ISO 27001 / GDPR / HIPAA alignment on the roadmap (v0.3 self-hosted, v1.0 GA).
  • 🔁 Process-first governance — every PR follows a documented loop with Copilot Code Review. Lessons captured in docs/LESSON.md for permanent improvement.
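The cost-governance bullet above describes nested budgets with hard kill-switches. As a rough illustration of that idea only, here is a minimal sketch; `Budget`, `Usage`, `firstBreach`, and `charge` are hypothetical names and do not come from the kit's actual API.

```typescript
// Hypothetical sketch of a budget kill-switch; not the kit's real types.
type BudgetScope = "org" | "project" | "profile" | "scenario";

interface Budget {
  scope: BudgetScope;
  maxUsd: number;
  maxTokens: number;
}

interface Usage {
  usd: number;
  tokens: number;
}

// Returns a description of the first budget that `usage` exceeds, or null.
function firstBreach(budgets: readonly Budget[], usage: Usage): string | null {
  for (const b of budgets) {
    if (usage.usd > b.maxUsd) return `${b.scope}: USD limit ${b.maxUsd} exceeded`;
    if (usage.tokens > b.maxTokens) return `${b.scope}: token limit ${b.maxTokens} exceeded`;
  }
  return null;
}

// Adds the cost of one LLM call to the running usage and throws as soon as
// any budget is exceeded, so the caller cannot keep looping past the limit.
function charge(budgets: readonly Budget[], usage: Usage, call: Usage): Usage {
  const next: Usage = { usd: usage.usd + call.usd, tokens: usage.tokens + call.tokens };
  const breach = firstBreach(budgets, next);
  if (breach !== null) throw new Error(`kill-switch: ${breach}`);
  return next;
}
```

Checking every scope on every call keeps the rule simple: the tightest budget always wins, whichever level it is set at.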

Quick start (junior-friendly)

Status note: the kit reached v1.0 GA (24-task roadmap complete) and is now at v1.3. The 18 workspace packages (@aqa/schemas, @aqa/kit, @aqa/runner, @aqa/reporter, @aqa/server, @aqa/admin, @aqa/compliance, @aqa/methodology, …) ship from this monorepo. Detailed walk-through: docs/getting-started.md.

Preview the v0.1.0 quick start

1. Install Bun

# macOS / Linux
curl -fsSL https://bun.sh/install | bash

# Windows (PowerShell)
powershell -c "irm bun.sh/install.ps1 | iex"

2. Install the kit in your project

cd /path/to/your/project
bun add -d agentic-qa-kit

If you don't have a project yet, clone examples/bun-api from this repo (available in v0.1.0).

3. Initialize the AQA workspace

bunx aqa init

Detects your stack and creates .aqa/ with testing.md, risk-map.yaml, profiles.yaml, and scenarios for the packs your project matches.

4. Install agent-specific files (pick one or many)

bunx aqa install-agent-files --targets claude,codex,gemini,copilot

This generates CLAUDE.md + .claude/skills/aqa-*, AGENTS.md + .agents/skills/, GEMINI.md + .gemini/skills/, .github/copilot-instructions.md + .github/skills/.

5. Run your first agentic QA pass

bunx aqa run --profile smoke

A 10-minute, non-destructive sweep. When it finishes:

bunx aqa report

You'll see findings like:

AQA-2026-0001 [P1] Cross-tenant data leak (verified, 3/3 deterministic replay)
AQA-2026-0002 [P3] Missing rate limit on /api/search
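If you want to post-process report lines like the two above in a script, a small parser is enough. This is a sketch that assumes the line format is exactly what the two samples show (ID, bracketed severity, title, optional parenthesized note); the real report format may differ, and `parseFindingLine` is a hypothetical helper, not part of the kit.

```typescript
// Hypothetical parser for the sample report lines shown above.
interface FindingLine {
  id: string;          // e.g. "AQA-2026-0001"
  severity: string;    // e.g. "P1"
  title: string;
  note: string | null; // trailing "(...)" text, if present
}

const FINDING_RE = /^(AQA-\d{4}-\d{4}) \[(P\d)\] (.+?)(?: \((.+)\))?$/;

function parseFindingLine(line: string): FindingLine | null {
  const m = FINDING_RE.exec(line.trim());
  if (m === null) return null;
  const [, id, severity, title, note] = m;
  return { id, severity, title, note: note ?? null };
}
```

Returning `null` for unrecognized lines lets a caller skip banners or blank lines in the report output without special-casing them.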

6. Replay a finding to confirm

bunx aqa replay AQA-2026-0001

Re-runs the deterministic bug reproduction (curl / Playwright / SQL) and tells you whether it still reproduces. If it no longer does, the bug is fixed and the loop is closed.
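Conceptually, a replay re-executes a recorded reproduction and turns the outcomes into a verdict. The sketch below illustrates that idea only: `reproduce` is a hypothetical stand-in for the recorded curl / Playwright / SQL step, and the three-attempt default and the `"flaky"` verdict are assumptions for illustration, not documented behavior of `aqa replay`.

```typescript
// Hypothetical replay verdict logic; not the kit's implementation.
type ReplayVerdict = "still-reproduces" | "fixed" | "flaky";

// `reproduce` returns true when the bug is observed on that attempt.
function replayVerdict(reproduce: () => boolean, attempts = 3): ReplayVerdict {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");
  let hits = 0;
  for (let i = 0; i < attempts; i++) {
    if (reproduce()) hits++;
  }
  if (hits === attempts) return "still-reproduces"; // e.g. 3/3
  if (hits === 0) return "fixed";                   // never observed again
  return "flaky";                                   // mixed outcomes
}
```

Separating mixed outcomes from clean ones matters here: a reproduction that only sometimes fires is not deterministic, so it should not be reported as either verified or fixed.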

The mental model in 7 words

Risk → Invariant → Scenario → Probe → Oracle → Finding → Replay

Every concept in AQA is one of these seven things or a tool that operates on them. See docs/ecosystem-explained.md for the deep introduction.
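One way to read the seven-step chain is as a set of linked data types. The sketch below is an illustrative model only: every interface, field name, and the `evaluate` helper are assumptions made for this example and are not taken from @aqa/schemas.

```typescript
// Hypothetical model of Risk → Invariant → Scenario → Probe → Oracle → Finding → Replay.
interface Risk { id: string; area: string; severity: string } // severity e.g. "P1"
interface Invariant { id: string; riskId: string; statement: string }
interface Scenario { id: string; invariantId: string; description: string }
interface Probe<Obs> { scenarioId: string; run: () => Obs }                // gathers an observation
interface Oracle<Obs> { invariantId: string; holds: (obs: Obs) => boolean } // judges it
interface Finding {
  id: string;
  riskId: string;
  severity: string;
  title: string;
  replay: () => boolean; // true when the violation is observed again
}

// Runs the probe, asks the oracle, and emits a finding only on a violation.
function evaluate<Obs>(
  risk: Risk,
  invariant: Invariant,
  probe: Probe<Obs>,
  oracle: Oracle<Obs>,
  findingId: string,
): Finding | null {
  const violated = (): boolean => !oracle.holds(probe.run());
  if (!violated()) return null;
  return {
    id: findingId,
    riskId: risk.id,
    severity: risk.severity,
    title: `Violated: ${invariant.statement}`,
    replay: violated, // replay is simply probe + oracle run again
  };
}
```

In this reading, a finding carries everything needed to re-check itself, which is why replay sits at the end of the chain rather than beside it.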

Multi-agent

| Target | Files generated | Capability highlights |
| --- | --- | --- |
| 🟣 Claude Code | CLAUDE.md, .claude/skills/aqa-*, .claude/agents/aqa-* | Skills, subagents (isolated context), hooks, MCP |
| 🟢 Codex | AGENTS.md, .agents/skills/aqa-*, optional Codex plugin | Skills, explicit subagents, plugins, MCP |
| 🔵 Gemini CLI | GEMINI.md, .gemini/skills/aqa-*, .gemini/agents/, .gemini/commands/*.toml | Skills, subagents, slash commands, MCP |
| GitHub Copilot CLI | .github/copilot-instructions.md, .github/skills/aqa-*, .github/agents/*.agent.md, .github/hooks/*.json | Skills (auto-detects .claude/skills), custom agents, hooks |

Capability negotiation happens at runtime: the kit asks the agent target what it supports and degrades gracefully when something is missing.
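Graceful degradation can be pictured as walking a preference list until the target supports something. The sketch below hard-codes the capability highlights from the table above for illustration; the kit is described as querying the target at runtime instead, and the `Capability` names, `TARGETS` map, and `pick` function are hypothetical.

```typescript
// Hypothetical capability negotiation; the real adapters may work differently.
type Capability =
  | "skills" | "subagents" | "custom-agents" | "hooks"
  | "slash-commands" | "plugins" | "mcp";

type Target = "claude" | "codex" | "gemini" | "copilot";

// Static snapshot of the capability highlights listed in the table above.
const TARGETS: Record<Target, ReadonlySet<Capability>> = {
  claude: new Set<Capability>(["skills", "subagents", "hooks", "mcp"]),
  codex: new Set<Capability>(["skills", "subagents", "plugins", "mcp"]),
  gemini: new Set<Capability>(["skills", "subagents", "slash-commands", "mcp"]),
  copilot: new Set<Capability>(["skills", "custom-agents", "hooks"]),
};

// Returns the first preferred capability the target supports, or null when
// nothing matches and the caller has to skip the feature.
function pick(target: Target, preferred: readonly Capability[]): Capability | null {
  for (const cap of preferred) {
    if (TARGETS[target].has(cap)) return cap;
  }
  return null;
}
```

For example, a task that wants an isolated context could ask for `["subagents", "custom-agents", "skills"]`: under this snapshot Claude would resolve to subagents and Copilot to custom agents.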

Architecture at a glance

+- Local mode (single dev / CI) -----------------------------+
|  bunx aqa CLI                                              |
|   |- engine + runner (sandboxed)                           |
|   |- packs (core, api, web-ui, llm-agent, security, ...)   |
|   |- adapters (Claude/Codex/Gemini/Copilot)                |
|   `- .aqa/  (project state, runs, findings, replay)        |
+------------------------------------------------------------+

+- Self-hosted (multi-team, post v0.3) ----------------------+
|  Control Plane (HA)                                        |
|   |- agentic-qa-kit-server (Hono+Bun or Express+Node)      |
|   |- agentic-qa-kit-admin (React)                          |
|   |- Postgres HA . Redis/NATS . S3-compat . Vault . OIDC   |
|   `- OTel Collector + Prometheus + Tempo + Loki            |
|                                                            |
|  Runners (per-team / CI shared / dev laptop)               |
|   - mTLS + OIDC to the control plane                       |
|   - execute scenarios next to the code (code never leaves) |
+------------------------------------------------------------+

Full diagram: docs/architecture/reference.md (stub; expanded in v0.1.0).

Roadmap

| Version | Theme | Highlights |
| --- | --- | --- |
| v0.0.1-governance | Bootstrap | Process docs, CI, Copilot review automation, admin spec |
| v0.1.x | Foundation | Schemas, CLI (init/doctor/validate), 5 base packs, 4 adapters, runner+smoke, reports, admin viewer |
| v0.2.x | Determinism & cost | 3-level replay, cost governance, container sandbox default |
| v0.3.x | Enterprise table-stakes | Postgres backend, SSO/RBAC, pack signing, on-prem LLM, Helm chart, air-gap installer |
| v0.4.x | Admin editing | Scenario Studio, AI-generation with review workflow |
| v0.5.x | Multi-team | Server + runner fleet, findings dedup, bug→fix→verify-fix loop |
| v0.6.x | Methodology rigor | STRIDE/FMEA/OWASP integration, oracle ensemble, judge calibration |
| v1.0 | GA enterprise (shipped) | SOC2/ISO controls catalog, aqa-audit-verify CLI, pen-test scope doc |
| v1.1 | Polish (shipped) | Banner, full Helm chart (runner StatefulSet, Ingress, NetworkPolicy, Postgres subchart), 3 example targets (Bun, Next.js, Laravel) |
| v1.2 | Admin SPA wired (shipped) | Tailwind 4 + TanStack Router + Query + 12 screens, audit-chain verification in-browser via Web Crypto |
| v1.3 | Quality batch (shipped) | Admin server↔UI mapping, 6 detail routes, 12 new admin tests, CLI E2E smoke gate, threat-model expansion, CHANGELOG backfill |

Status

GA (v1.0 shipped, v1.3 current). The full 24-task roadmap is closed: schemas, CLI (@aqa/kit), 5 baseline packs, multi-agent adapters (Claude/Codex/Gemini/Copilot), runner with hash-chained audit, reporter with 3-level replay, admin panel, server + runner fleet, on-prem LLM adapters, SSO/RBAC, Postgres backend, pack signing + scanning, container sandbox, cost governance, findings dedup + clustering, STRIDE/FMEA/OWASP methodology layer, Helm chart + Terraform + air-gap installer, SOC2/ISO controls catalog + aqa-audit-verify CLI.

Release notes per tag: Releases page. Live state: docs/PROGRESS.md. Architectural decisions: docs/adr/.
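For readers unfamiliar with the hash-chained audit log mentioned in the status summary, the general technique is that each entry stores the hash of its predecessor, so editing any past entry breaks every later link. The sketch below shows that generic technique using SHA-256 from `node:crypto`; the entry layout, the genesis value, and the function names are assumptions for illustration and do not describe the kit's actual log format or the aqa-audit-verify CLI.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout for a generic hash chain.
interface AuditEntry {
  seq: number;
  payload: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed placeholder for "no predecessor"

function entryHash(seq: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${payload}`).digest("hex");
}

// Returns a new log with `payload` appended and linked to the previous entry.
function appendEntry(log: readonly AuditEntry[], payload: string): AuditEntry[] {
  const prevHash = log.at(-1)?.hash ?? GENESIS;
  const seq = log.length;
  return [...log, { seq, payload, prevHash, hash: entryHash(seq, payload, prevHash) }];
}

// Returns the index of the first entry that fails verification, or -1 if the
// whole chain is intact.
function firstBrokenLink(log: readonly AuditEntry[]): number {
  let prev = GENESIS;
  for (const [i, e] of log.entries()) {
    const ok = e.seq === i && e.prevHash === prev && e.hash === entryHash(e.seq, e.payload, e.prevHash);
    if (!ok) return i;
    prev = e.hash;
  }
  return -1;
}
```

Reporting the first broken index, rather than a plain boolean, tells an auditor where tampering or corruption begins.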

Documentation

Contributing

Please read CONTRIBUTING.md, AGENTS.md, and docs/RULES.md first.

We follow a strict PR loop with Copilot Code Review on every PR (automated by .github/workflows/copilot-review.yml).

Security

For vulnerabilities, use the private channel in SECURITY.md — do not file public issues.

License

Apache License 2.0. © Padosoft.

Maintainers

Padosoft (info@padosoft.com)
