Score your AI agent. Find the gaps. Fix them.
2 minutes to your first score. Free and open source.
🎮 Web Playground · 📖 Docs · 💬 Community · 📋 Recipes · 🤝 Contribute
AMC scores AI agents from what they actually do, not what their docs say they do.
```bash
npx agent-maturity-compass quickscore
```

One command. No account. No API key. You get:
- A trust score — L0 (dangerous) to L5 (production-ready), based on execution evidence
- A gap analysis — exactly what's weak, what's risky, and what's missing
- Generated fixes — guardrails, config patches, CI gates, and compliance artifacts
Then you keep going: add adversarial testing, continuous monitoring, regulatory mapping, and fleet-wide governance — all from the same CLI.
- Evaluation workflows — golden datasets, imported evals, lite scoring for non-agent apps
- Business and compliance outputs — KPI correlation, leaderboards, audit binders
Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Claude Code, Gemini, OpenClaw, and more — with zero or near-zero integration friction.
Why should I care?
Today, many agents are evaluated by what they claim in docs, prompts, or self-reported checklists. That is structurally weak.
AMC focuses on execution-verified evidence.
| How agents are evaluated today | How AMC evaluates |
|---|---|
| Agent says "I'm safe" → Score: 100 ✅ | AMC tests the agent and inspects evidence → Real score may be 16 ❌ |
| Self-reported documentation | Execution-verified evidence |
| Keyword matching | Weighted trust evidence |
| "Trust me, bro" | Cryptographic proof chains |
That is the entire thesis: trust, but verify — with receipts.
AMC is one trust stack with eight named product surfaces:
| Product | What it does |
|---|---|
| Score | Evidence-weighted maturity diagnostics and trust scoring |
| Shield | Adversarial assurance packs and attack simulations |
| Enforce | Policy controls, approvals, and governance workflows |
| Vault | Signatures, keys, and tamper-evident proof infrastructure |
| Watch | Traces, anomalies, monitoring, and operational drift detection |
| Fleet | Multi-agent oversight, comparison, inventory, and governance |
| Passport | Portable identity and credential artifacts for agents |
| Comply | Compliance mappings, audit binders, and governance reporting |
These names are intentional. AMC is not a single command with a long README — it is a trust stack you can grow into.
The full trust stack is free and MIT licensed. The only paid surface is Industry Packs.
| Tier | What you get |
|---|---|
| Free / Open Source | Everything — Score, Shield, Enforce, Vault, Watch, Fleet, Passport, Comply, all 14 adapters, 481 CLI commands, browser playground, CI gates |
| Pro | Everything in Free + selected Industry Packs for your regulated verticals |
| Enterprise | Everything in Pro + all 40 Industry Packs + priority support + custom pack development + deployment assistance |
Industry Packs are 40 sector-specific domain packs (healthcare, finance, education, government, etc.) that require ongoing regulatory research and maintenance. The core trust stack stays free forever.
Use the existing browser playground to explore scoring logic, questions, and scenarios.
Best for:
- first-touch evaluation
- demos
- lightweight exploration
- understanding how scoring works
Use the CLI when you want actual execution evidence, traces, datasets, reports, and CI gates.
```bash
npx agent-maturity-compass quickscore
```

Best for:
- real agent scoring
- evidence capture
- local trust workflows
- shareable outputs
Use AMC in GitHub Actions or CI to prevent trust regressions.
Best for:
- release gates
- score thresholds
- drop detection
- PR comments and artifact generation
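The release-gate behavior listed above (score thresholds and drop detection) can be sketched as a small check. This is a hypothetical illustration, not AMC's implementation: the `gate` function, the `GateResult` type, and the assumption that maturity levels compare as integers 0 to 5 are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GateResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def gate(level: int, score: float, target_level: int,
         previous_score: Optional[float] = None,
         fail_on_drop: bool = True) -> GateResult:
    """Hypothetical trust gate: fail below a target level or on a score drop."""
    reasons = []
    if level < target_level:
        reasons.append(f"level L{level} is below target L{target_level}")
    if fail_on_drop and previous_score is not None and score < previous_score:
        reasons.append(f"score dropped from {previous_score} to {score}")
    # The gate passes only when no failure reason was recorded.
    return GateResult(passed=not reasons, reasons=reasons)


result = gate(level=2, score=41.0, target_level=3, previous_score=48.0)
for reason in result.reasons:
    print("FAIL:", reason)
```

Returning the reasons alongside the pass/fail flag keeps the gate useful for PR comments, where reviewers need to see why a build was blocked.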
If you need self-hosted, managed, or enterprise deployment clarity, start here:
- docs/DEPLOYMENT_OPTIONS.md
- docs/PRODUCT_EDITIONS.md
- docs/PRICING.md
- docs/ENTERPRISE.md
- Solo builder / OSS maintainer → docs/SOLO_DEV_PATH.md
- Platform / engineering team → docs/PLATFORM_PATH.md
- Security / compliance → docs/SECURITY_PATH.md
Want to support the open project?
- Sponsorship path: SPONSORING.md
- Community/support routing: docs/COMMUNITY_SUPPORT.md
- docs/INDEX.md
- docs/START_HERE.md
- docs/WHY_AMC.md
- docs/USE_CASES.md
- docs/PERSONAS.md
- docs/AFTER_QUICKSCORE.md
- docs/EXAMPLES_INDEX.md
- docs/RECIPES.md
- docs/DEPLOYMENT_OPTIONS.md
- docs/PRODUCT_EDITIONS.md
- docs/PRICING.md
- docs/BUYER_PACKAGES.md
- docs/SERVICES_AND_SUPPORT.md
- docs/COMMUNITY_SHOWCASE.md
- docs/RELEASE_HIGHLIGHTS.md
- docs/BENCHMARK_GALLERY.md
```bash
# Install
npm i -g agent-maturity-compass

# Score your agent
cd your-agent-project
amc init        # interactive setup
amc quickscore  # get your score
amc fix         # auto-generate fixes
```

→ Try the Web Playground: answer questions, explore scenarios and assurance packs, get a score. No install.
This is AMC's browser try-now path: great for first-touch scoring and exploration. For execution evidence, traces, datasets, and CI gates, use the CLI.
```bash
docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore
```

```yaml
# .github/workflows/amc.yml
name: AMC Score
on: [push, pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          target-level: 3     # fail if below L3
          fail-on-drop: true  # fail if score drops
          comment: true       # post results on PR
```

```bash
npx agent-maturity-compass quickscore              # quick score
npx agent-maturity-compass quickscore --eu-ai-act  # + EU AI Act check
npx agent-maturity-compass quickscore --share      # shareable link
```

```bash
# LangChain
amc wrap langchain -- python my_agent.py
# CrewAI
amc wrap crewai -- python crew.py
# AutoGen
amc wrap autogen -- python autogen_app.py
# OpenClaw
amc wrap openclaw-cli -- openclaw run
# Claude Code
amc wrap claude-code -- claude "analyze this code"
# Any CLI agent
amc wrap generic-cli -- python my_bot.py
```

```bash
amc assurance run --scope full                   # full assurance library
amc assurance run --pack prompt-injection        # specific attack
amc assurance run --pack adversarial-robustness  # TAP/PAIR/Crescendo
amc assurance run --format sarif                 # export for security tools
```

```bash
amc observe timeline          # score history + evidence volume
amc observe anomalies         # volatility / regressions / weirdness
amc trace list                # recent agent sessions
amc trace inspect <trace-id>  # inspect tool calls and trust tiers
```

```bash
amc dataset create support-bot  # create a reusable eval dataset
amc dataset add-case support-bot --prompt "..." --expected "..."
amc dataset run support-bot     # run eval cases
amc eval import --format promptfoo --file results.json  # import external eval results
amc lite-score                  # score a non-agent chatbot / LLM app
```

```bash
amc business kpi           # correlate maturity to outcomes
amc business report        # stakeholder-ready business summary
amc leaderboard show       # compare agents across a fleet
amc inventory scan --deep  # discover agents, frameworks, model files
amc comms-check --text "Guaranteed 40% return" --domain wealth
```

```bash
amc fix                    # generate guardrails + CI gate + governance docs
amc fix --target-level L4  # target a specific level
amc guide --go             # detect framework → apply guardrails to config
amc guide --watch          # continuous monitoring + auto-update
```

```bash
amc audit binder create --framework eu-ai-act  # EU AI Act evidence binder
amc compliance report --framework iso-42001    # ISO 42001 report
amc domain assess --domain health              # HIPAA assessment
amc domain assess --domain wealth              # MiFID II / DORA
```

```yaml
# .github/workflows/amc.yml (copy this entire file)
name: AMC Trust Gate
on:
  pull_request:
  push:
    branches: [main]
jobs:
  amc-score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          agent-id: my-agent
          target-level: 3
          fail-on-drop: true
          comment: true
          upload-artifacts: true
```

```markdown
<!-- Add this to your README -->
[![AMC Score](https://img.shields.io/badge/AMC-L3%20(72)-green?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZmlsbD0iI2ZmZiIgZD0iTTEyIDJMMiA3bDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDEybDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDIxbDEwIDUgMTAtNXoiLz48L3N2Zz4=)](https://github.com/thewisecrab/AgentMaturityCompass)
```

| Dimension | Questions | What It Measures |
|---|---|---|
| Strategic Agent Ops | 18 | Mission clarity, scope adherence, decision traceability |
| Skills | 38 | Tool mastery, injection defense, DLP, least-privilege |
| Resilience | 30 | Graceful degradation, circuit breakers, bypass resistance |
| Leadership & Autonomy | 28 | Structured logs, traces, cost tracking, SLOs |
| Culture & Alignment | 26 | Test harnesses, feedback loops, over-compliance detection |
| Category | Examples |
|---|---|
| Prompt Injection | System tampering, role hijacking, jailbreaks |
| Exfiltration | Secret leakage, PII exposure, data boundary violations |
| Adversarial | TAP/PAIR, Crescendo, Skeleton Key, best-of-N |
| Context Leakage | EchoLeak, cross-session bleed, memory poisoning |
| Supply Chain | Dependency attacks, MCP server poisoning, SBOM integrity |
| Behavioral | Sycophancy, self-preservation, sabotage, over-compliance |
| Sector | Packs | Key Regulations |
|---|---|---|
| 🏥 Health | 9 | HIPAA, FDA 21 CFR Part 11, EU MDR, ICH E6(R3) |
| 💰 Wealth | 5 | MiFID II, PSD2, EU DORA, MiCA, FATF |
| 🎓 Education | 5 | FERPA, COPPA, IDEA, EU AI Act Annex III |
| 🚇 Mobility | 5 | UNECE WP.29, ETSI EN 303 645, EU NIS2 |
| 💡 Technology | 5 | EU AI Act Art. 13, EU Data Act, DSA Art. 34 |
| 🌿 Environment | 6 | EU Farm-to-Fork, REACH, IEC 61850 |
| 🏛️ Governance | 5 | EU eIDAS 2.0, UNCAC, UNGPs |
See all modules
- Calibration gap (confidence vs reality)
- Evidence conflict detection
- Gaming resistance (adversarial score inflation)
- Sleeper agent detection (context-dependent behavior)
- Policy consistency (pass^k reliability)
- Factuality (parametric, retrieval, grounded)
- Memory integrity & poisoning resistance
- Alignment index (safety × honesty × helpfulness)
- Over-compliance detection (H-Neurons, arXiv:2512.01797)
- Monitor bypass resistance (arXiv:2503.09950)
- Trust-authorization synchronization (arXiv:2512.06914)
- MCP compliance scoring
- Identity continuity tracking
- Behavioral transparency index
- And 60+ more...
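The "pass^k reliability" item in the list above refers to a consistency metric: the chance that an agent succeeds on all of k independent attempts at the same task, rather than on just one. The sketch below uses the common combinatorial estimator C(c, k) / C(n, k), computed from n trials with c successes. The function name `pass_hat_k` is hypothetical, and how AMC computes policy consistency internally is not stated in this README, so treat this as an illustration of the metric only.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k from n trials with c successes (assumed estimator)."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fraction of size-k subsets of the trials in which every trial passed.
    # comb(c, k) is 0 when k > c, so the estimate is 0.0 in that case.
    return comb(c, k) / comb(n, k)


# An agent that passes 9 of 10 trials looks strong on a single attempt...
print(pass_hat_k(n=10, c=9, k=1))  # 0.9
# ...but is far less reliable when it must pass 5 attempts in a row.
print(pass_hat_k(n=10, c=9, k=5))  # 0.5
```

The gap between the two numbers is the point of the metric: a policy that holds "most of the time" degrades quickly once it has to hold every time.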
```text
Agent (untrusted)
│
▼
AMC Gateway ──── transparent proxy, agent doesn't know it's being watched
│
▼
Evidence Ledger ──── Ed25519 signatures + Merkle tree proof chains
│
▼
Scoring Engine ──── evidence-weighted diagnostics, 74+ scoring modules, 86 assurance packs
│
▼
AMC Studio ──── dashboard + API + CLI + reports
```
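To make the "Merkle tree proof chains" in the Evidence Ledger step concrete, here is a minimal sketch of how a Merkle root makes a batch of evidence records tamper-evident. Everything specific here is an assumption for illustration: the record fields, canonical-JSON encoding, SHA-256, the leaf/node prefixes, and the rule of duplicating the last node on odd levels. AMC's actual ledger format is not described in this README. Signing the root with Ed25519 is omitted because it needs a third-party library.

```python
import hashlib
import json


def leaf_hash(record: dict) -> bytes:
    # Canonical JSON so the same record always hashes identically.
    data = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(b"\x00" + data).digest()


def merkle_root(leaves: list) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)  # copy so the caller's list is not mutated
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate last node
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical evidence records captured by a gateway.
events = [
    {"seq": 1, "tool": "search", "tier": "OBSERVED"},
    {"seq": 2, "tool": "send_email", "tier": "OBSERVED"},
    {"seq": 3, "tool": "read_file", "tier": "OBSERVED"},
]
root = merkle_root([leaf_hash(e) for e in events])

# Changing any single record changes the root, which exposes the edit.
tampered = [dict(e) for e in events]
tampered[1]["tool"] = "noop"
tampered_root = merkle_root([leaf_hash(e) for e in tampered])
print(root.hex() != tampered_root.hex())  # True
```

Publishing or signing only the 32-byte root is enough to commit to the whole batch, which is why this structure suits append-only evidence logs.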
| Tier | Weight | How |
|---|---|---|
| OBSERVED_HARDENED | 1.1× | AMC-controlled adversarial scenarios |
| OBSERVED | 1.0× | Captured via gateway proxy |
| ATTESTED | 0.8× | Cryptographic attestation |
| SELF_REPORTED | 0.4× | Agent's own claims (capped) |
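To illustrate how the tier weights change a result, here is a minimal sketch that multiplies each piece of evidence by its tier weight and averages the results. The weights come from the table; the aggregation formula, the 0 to 100 scale, the clamp at 100, and the `weighted_score` name are assumptions, and AMC's scoring engine may combine evidence differently.

```python
# Tier weights as listed in the evidence tier table.
TIER_WEIGHTS = {
    "OBSERVED_HARDENED": 1.1,
    "OBSERVED": 1.0,
    "ATTESTED": 0.8,
    "SELF_REPORTED": 0.4,
}


def weighted_score(evidence):
    """Average of tier-weighted scores, clamped to 100 (assumed formula).

    evidence: list of (tier, raw_score) pairs with raw_score in 0..100.
    """
    if not evidence:
        return 0.0
    total = sum(TIER_WEIGHTS[tier] * raw for tier, raw in evidence)
    return min(100.0, total / len(evidence))


# A perfect self-assessment alone cannot exceed 40 under this model.
print(weighted_score([("SELF_REPORTED", 100)]))                    # 40.0
# Observed behavior pulls the result toward what actually happened.
print(weighted_score([("OBSERVED", 20), ("SELF_REPORTED", 100)]))  # 30.0
```

Under these assumptions, an agent cannot talk its way to a high score: claims count for less than observed behavior, and only adversarially hardened observations earn a bonus.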
| Level | Name | Meaning |
|---|---|---|
| L0 | Absent | No safety controls |
| L1 | Initial | Some intent, nothing operational |
| L2 | Developing | Works on happy path, breaks at edges |
| L3 | Defined | Repeatable, measurable, auditable (EU AI Act minimum) |
| L4 | Managed | Proactive, risk-calibrated, cryptographic proofs |
| L5 | Optimizing | Self-correcting, continuously verified |
| Module | What It Does |
|---|---|
| AMC Score | Evidence-weighted diagnostics across 5 dimensions, L0–L5 maturity |
| AMC Shield | 86 assurance packs: injection, exfiltration, adversarial |
| AMC Enforce | Policy engine, approval workflows, scoped leases |
| AMC Vault | Ed25519 keys, Merkle chains, HSM/TPM support |
| AMC Watch | Dashboard, gateway proxy, Prometheus metrics |
| AMC Fleet | Multi-agent trust, delegation graphs |
| AMC Passport | Portable agent credential (.amcpass) |
| AMC Comply | EU AI Act, ISO 42001, NIST AI RMF, SOC 2, OWASP mapping |
Zero code changes. One environment variable.
```bash
amc wrap <adapter> -- <your command>
```

| Adapter | Command |
|---|---|
| LangChain | amc wrap langchain -- python app.py |
| LangGraph | amc wrap langgraph -- python graph.py |
| CrewAI | amc wrap crewai -- python crew.py |
| AutoGen | amc wrap autogen -- python autogen.py |
| OpenAI Agents SDK | amc wrap openai-agents -- python agent.py |
| LlamaIndex | amc wrap llamaindex -- python rag.py |
| Semantic Kernel | amc wrap semantic-kernel -- dotnet run |
| Claude Code | amc wrap claude-code -- claude "task" |
| Gemini | amc wrap gemini -- gemini chat |
| OpenClaw | amc wrap openclaw-cli -- openclaw run |
| OpenHands | amc wrap openhands -- openhands run |
| Python SDK | amc wrap python-amc-sdk -- python app.py |
| Generic CLI | amc wrap generic-cli -- python bot.py |
| OpenAI-compatible | amc wrap openai-compat -- node server.js |
| Framework | Coverage |
|---|---|
| EU AI Act | 12 article mappings + audit binder generation |
| ISO 42001 | Clauses 4-10 mapped to AMC dimensions |
| NIST AI RMF | Risk management framework alignment |
| SOC 2 | Trust service criteria mapping |
| OWASP LLM Top 10 | Full coverage (10/10) |
```bash
# npm (global install)
npm i -g agent-maturity-compass

# npx (no install)
npx agent-maturity-compass quickscore

# Homebrew
brew tap thewisecrab/amc && brew install agent-maturity-compass

# Install script
curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

# Docker
docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

# From source
git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link
```

| Platform | Deploy |
|---|---|
| Docker Compose | cd docker && docker compose up |
| Vercel | |
| Railway | |
AMC now includes an experimental Node SEA packaging path for host-specific single-binary builds:
```bash
npm run build
npm run build:sea
```

The build path is wired in and produces SEA artifacts plus a manifest. Runtime verification is still experimental and host-sensitive. See docs/SINGLE_BINARY.md for the honest status and caveats.
AMC now includes a scheduled GitHub Actions workflow that validates packaged CLI installs across a small OS/Node matrix and uploads JSON artifacts for inspection:
- workflow: .github/workflows/nightly-compatibility-matrix.yml
- current matrix: ubuntu-latest + macos-latest, Node 20 + 22
- checks: packed install, doctor --json, quickscore --json, lite-score --help, comms-check --help
AMC now supports lightweight workspace config presets for .amc/amc.config.yaml:
```bash
amc init --profile dev
amc quickstart --profile ci
amc config profile prod
```

Current MVP behavior:
- dev → shared trust boundary, proxy env enabled
- ci → isolated trust boundary, proxy env enabled
- prod → isolated trust boundary, proxy env disabled
- explicit --trust-boundary still overrides the profile when you need it
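The profile rules above amount to a small lookup with one override. The sketch below models that precedence in Python for illustration only: the dictionary keys, the `resolve_profile` function, and the value names `shared` and `isolated` as strings are assumptions, not the actual schema of .amc/amc.config.yaml.

```python
# Profile defaults as described in the list above (key names are hypothetical).
PROFILES = {
    "dev": {"trust_boundary": "shared", "proxy_env": True},
    "ci": {"trust_boundary": "isolated", "proxy_env": True},
    "prod": {"trust_boundary": "isolated", "proxy_env": False},
}


def resolve_profile(profile, trust_boundary=None):
    """Return effective settings; an explicit trust boundary wins."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    settings = dict(PROFILES[profile])  # copy so defaults stay untouched
    if trust_boundary is not None:
        settings["trust_boundary"] = trust_boundary
    return settings


print(resolve_profile("prod"))
print(resolve_profile("dev", trust_boundary="isolated"))
```

Copying the defaults before applying the override keeps the profile table immutable, so one command's explicit flag never leaks into the next resolution.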
| | |
|---|---|
| Assurance Lab | Domain Packs |
| EU AI Act Compliance | Multi-Agent Trust |
| Executive Overview | White Paper |
| Example Projects | Community |
| Web Playground | Compatibility Matrix |
AMC is MIT licensed. We welcome contributions — especially new assurance packs, domain packs, framework adapters, and scoring modules.
```bash
git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm test  # 3,311 tests
```

→ CONTRIBUTING.md includes guides for writing packs, mapping research papers, and adding adapters.
- 🔬 New assurance pack — model a new attack scenario (guide)
- 🏥 New domain pack — add industry-specific questions (guide)
- 🔌 New adapter — support another agent framework (guide)
- 📄 Research paper → module — turn arXiv findings into scoring logic (guide)
MIT — public trust infrastructure for the age of AI agents.
138 diagnostic questions · 86 assurance packs · 40 domain packs · 14 adapters · 74+ scoring modules · 3,311 tests
Stop trusting. Start verifying.
If your AGENTS.md doesn't have an AMC badge, you're running with scissors. 🏃♂️✂️