Agent Maturity Compass

Score your AI agent. Find the gaps. Fix them.
2 minutes to your first score. Free and open source.

🎮 Web Playground · 📖 Docs · 💬 Community · 📋 Recipes · 🤝 Contribute

What is this?

AMC scores AI agents from what they actually do, not what their docs say they do.

npx agent-maturity-compass quickscore

One command. No account. No API key. You get:

A trust score — L0 (dangerous) to L5 (production-ready), based on execution evidence
A gap analysis — exactly what's weak, what's risky, and what's missing
Generated fixes — guardrails, config patches, CI gates, and compliance artifacts

Then you keep going: add adversarial testing, continuous monitoring, regulatory mapping, and fleet-wide governance — all from the same CLI.

Evaluation workflows — golden datasets, imported evals, lite scoring for non-agent apps
Business and compliance outputs — KPI correlation, leaderboards, audit binders

Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Claude Code, Gemini, OpenClaw, and more — with zero or near-zero integration friction.

Why should I care?

Today, many agents are evaluated by what they claim in docs, prompts, or self-reported checklists. That is structurally weak.

AMC focuses on execution-verified evidence.

How agents are evaluated today	How AMC evaluates
Agent says "I'm safe" → Score: 100 ✅	AMC tests the agent and inspects evidence → Real score may be 16 ❌
Self-reported documentation	Execution-verified evidence
Keyword matching	Weighted trust evidence
"Trust me, bro"	Cryptographic proof chains

That is the entire thesis: trust, but verify — with receipts.

Product family

AMC is one trust stack with eight named product surfaces:

Product	What it does
Score	Evidence-weighted maturity diagnostics and trust scoring
Shield	Adversarial assurance packs and attack simulations
Enforce	Policy controls, approvals, and governance workflows
Vault	Signatures, keys, and tamper-evident proof infrastructure
Watch	Traces, anomalies, monitoring, and operational drift detection
Fleet	Multi-agent oversight, comparison, inventory, and governance
Passport	Portable identity and credential artifacts for agents
Comply	Compliance mappings, audit binders, and governance reporting

These names are intentional. AMC is not a single command with a long README — it is a trust stack you can grow into.

Pricing

The full trust stack is free and MIT licensed. The only paid surface is Industry Packs.

Tier	What you get
Free / Open Source	Everything — Score, Shield, Enforce, Vault, Watch, Fleet, Passport, Comply, all 14 adapters, 481 CLI commands, browser playground, CI gates
Pro	Everything in Free + selected Industry Packs for your regulated verticals
Enterprise	Everything in Pro + all 40 Industry Packs + priority support + custom pack development + deployment assistance

Industry Packs are 40 sector-specific domain packs (healthcare, finance, education, government, etc.) that require ongoing regulatory research and maintenance. The core trust stack stays free forever.

Choose your path

1. Browser — fastest first look

Use the browser playground to explore scoring logic, questions, and scenarios.

→ Try the Web Playground

Best for:

first-touch evaluation
demos
lightweight exploration
understanding how scoring works

2. CLI — real evidence workflows

Use the CLI when you want actual execution evidence, traces, datasets, reports, and CI gates.

npx agent-maturity-compass quickscore

Best for:

real agent scoring
evidence capture
local trust workflows
shareable outputs

3. CI — enforce standards continuously

Use AMC in GitHub Actions or CI to prevent trust regressions.

Best for:

release gates
score thresholds
drop detection
PR comments and artifact generation

4. Deployment / enterprise path

If you need a self-hosted, managed, or enterprise deployment, start here:

docs/DEPLOYMENT_OPTIONS.md
docs/PRODUCT_EDITIONS.md
docs/PRICING.md
docs/ENTERPRISE.md

Start by persona

Solo builder / OSS maintainer → docs/SOLO_DEV_PATH.md
Platform / engineering team → docs/PLATFORM_PATH.md
Security / compliance → docs/SECURITY_PATH.md

Support AMC

Want to support the open project?

Sponsorship path: SPONSORING.md
Community/support routing: docs/COMMUNITY_SUPPORT.md

Core routing docs

docs/INDEX.md
docs/START_HERE.md
docs/WHY_AMC.md
docs/USE_CASES.md
docs/PERSONAS.md
docs/AFTER_QUICKSCORE.md
docs/EXAMPLES_INDEX.md
docs/RECIPES.md
docs/DEPLOYMENT_OPTIONS.md
docs/PRODUCT_EDITIONS.md
docs/PRICING.md
docs/BUYER_PACKAGES.md
docs/SERVICES_AND_SUPPORT.md
docs/COMMUNITY_SHOWCASE.md
docs/RELEASE_HIGHLIGHTS.md
docs/BENCHMARK_GALLERY.md

⚡ Quick Start

Option 1: Terminal (2 minutes)

# Install
npm i -g agent-maturity-compass

# Score your agent
cd your-agent-project
amc init          # interactive setup
amc quickscore    # get your score
amc fix           # auto-generate fixes

Option 2: Browser (0 minutes)

→ Try the Web Playground — answer questions, explore scenarios and assurance packs, get a score. No install.

This is AMC's browser try-now path: great for first-touch scoring and exploration. For execution evidence, traces, datasets, and CI gates, use the CLI.

Option 3: Docker (0 config)

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

Option 4: CI/CD (copy-paste)

# .github/workflows/amc.yml
name: AMC Score
on: [push, pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          target-level: 3        # fail if below L3
          fail-on-drop: true     # fail if score drops
          comment: true          # post results on PR

📋 Recipes — Copy-Paste Examples

Score any agent in one line

npx agent-maturity-compass quickscore                    # quick score
npx agent-maturity-compass quickscore --eu-ai-act        # + EU AI Act check
npx agent-maturity-compass quickscore --share            # shareable link

Wrap an existing agent (zero code changes)

# LangChain
amc wrap langchain -- python my_agent.py

# CrewAI
amc wrap crewai -- python crew.py

# AutoGen
amc wrap autogen -- python autogen_app.py

# OpenClaw
amc wrap openclaw-cli -- openclaw run

# Claude Code
amc wrap claude-code -- claude "analyze this code"

# Any CLI agent
amc wrap generic-cli -- python my_bot.py

Red-team your agent

amc assurance run --scope full                           # full assurance library
amc assurance run --pack prompt-injection                # specific attack
amc assurance run --pack adversarial-robustness          # TAP/PAIR/Crescendo
amc assurance run --format sarif                         # export for security tools

Inspect traces and operational drift

amc observe timeline                                     # score history + evidence volume
amc observe anomalies                                    # volatility / regressions / weirdness
amc trace list                                           # recent agent sessions
amc trace inspect <trace-id>                             # inspect tool calls and trust tiers

Build golden datasets and run evals

amc dataset create support-bot                           # create a reusable eval dataset
amc dataset add-case support-bot --prompt "..." --expected "..."
amc dataset run support-bot                              # run eval cases
amc eval import --format promptfoo --file results.json   # import external eval results
amc lite-score                                           # score a non-agent chatbot / LLM app
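For readers new to golden datasets: the idea is a fixed set of prompt/expected pairs that you replay against the agent on every change. The sketch below shows that loop in plain Python. It is not AMC's implementation; `Case`, `run_dataset`, the case-insensitive exact-match comparison, and the toy `echo_agent` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str


def run_dataset(cases: list[Case], agent: Callable[[str], str]) -> dict:
    """Replay every case and report the pass rate plus the failing prompts."""
    failures = [
        c.prompt
        for c in cases
        # Assumed matching rule: case-insensitive exact match after trimming.
        if agent(c.prompt).strip().lower() != c.expected.strip().lower()
    ]
    total = len(cases)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failures": failures,
    }


# Toy stand-in for a real agent, for illustration only.
def echo_agent(prompt: str) -> str:
    return "Refunds take 5 days" if "refund" in prompt.lower() else "I don't know"


support_bot = [
    Case("How long do refunds take?", "refunds take 5 days"),
    Case("What is your SLA?", "99.9% uptime"),
]
report = run_dataset(support_bot, echo_agent)
```

Real evals usually need fuzzier matching (semantic similarity, rubric grading); exact match is used here only to keep the sketch short.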

Business, inventory, and reporting

amc business kpi                                         # correlate maturity to outcomes
amc business report                                      # stakeholder-ready business summary
amc leaderboard show                                     # compare agents across a fleet
amc inventory scan --deep                                # discover agents, frameworks, model files
amc comms-check --text "Guaranteed 40% return" --domain wealth

Auto-fix everything

amc fix                          # generate guardrails + CI gate + governance docs
amc fix --target-level L4        # target a specific level
amc guide --go                   # detect framework → apply guardrails to config
amc guide --watch                # continuous monitoring + auto-update

Compliance in one command

amc audit binder create --framework eu-ai-act            # EU AI Act evidence binder
amc compliance report --framework iso-42001              # ISO 42001 report
amc domain assess --domain health                        # HIPAA assessment
amc domain assess --domain wealth                        # MiFID II / DORA

GitHub Actions — full CI gate

# .github/workflows/amc.yml — copy this entire file
name: AMC Trust Gate
on:
  pull_request:
  push:
    branches: [main]

jobs:
  amc-score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          agent-id: my-agent
          target-level: 3
          fail-on-drop: true
          comment: true
          upload-artifacts: true
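The two gating inputs, `target-level` and `fail-on-drop`, amount to a simple decision rule. The sketch below is one plausible reading of that rule, not the action's actual code: the names `gate` and `GateResult`, and the assumption that a "drop" means the numeric score is lower than in the previous run, are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    passed: bool
    reasons: list[str]


def gate(level: int, score: float, target_level: int,
         previous_score: Optional[float] = None,
         fail_on_drop: bool = True) -> GateResult:
    """Hypothetical release gate: fail below the target level or on a score drop."""
    reasons = []
    if level < target_level:
        reasons.append(f"level L{level} is below target L{target_level}")
    # Assumption: a "drop" is any decrease relative to the previous recorded score.
    if fail_on_drop and previous_score is not None and score < previous_score:
        reasons.append(f"score dropped from {previous_score} to {score}")
    return GateResult(passed=not reasons, reasons=reasons)
```

Collecting every failure reason, rather than returning on the first one, makes a PR comment more useful when both checks fail at once.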

Badge for your README

<!-- Add this to your README -->
[![AMC Score](https://img.shields.io/badge/AMC-L3_(72.5)-green?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZmlsbD0iI2ZmZiIgZD0iTTEyIDJMMiA3bDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDEybDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDIxbDEwIDUgMTAtNXoiLz48L3N2Zz4=)](https://github.com/thewisecrab/AgentMaturityCompass)


🧪 What AMC Tests

138 Diagnostic Questions × 5 Dimensions

Dimension	Questions	What It Measures
Strategic Agent Ops	18	Mission clarity, scope adherence, decision traceability
Skills	38	Tool mastery, injection defense, DLP, least-privilege
Resilience	30	Graceful degradation, circuit breakers, bypass resistance
Leadership & Autonomy	28	Structured logs, traces, cost tracking, SLOs
Culture & Alignment	26	Test harnesses, feedback loops, over-compliance detection

86 Assurance Packs

Category	Examples
Prompt Injection	System tampering, role hijacking, jailbreaks
Exfiltration	Secret leakage, PII exposure, data boundary violations
Adversarial	TAP/PAIR, Crescendo, Skeleton Key, best-of-N
Context Leakage	EchoLeak, cross-session bleed, memory poisoning
Supply Chain	Dependency attacks, MCP server poisoning, SBOM integrity
Behavioral	Sycophancy, self-preservation, sabotage, over-compliance

40 Industry Domain Packs

Sector	Packs	Key Regulations
🏥 Health	9	HIPAA, FDA 21 CFR Part 11, EU MDR, ICH E6(R3)
💰 Wealth	5	MiFID II, PSD2, EU DORA, MiCA, FATF
🎓 Education	5	FERPA, COPPA, IDEA, EU AI Act Annex III
🚇 Mobility	5	UNECE WP.29, ETSI EN 303 645, EU NIS2
💡 Technology	5	EU AI Act Art. 13, EU Data Act, DSA Art. 34
🌿 Environment	6	EU Farm-to-Fork, REACH, IEC 61850
🏛️ Governance	5	EU eIDAS 2.0, UNCAC, UNGPs

74+ Scoring Modules

See all modules

Calibration gap (confidence vs reality)
Evidence conflict detection
Gaming resistance (adversarial score inflation)
Sleeper agent detection (context-dependent behavior)
Policy consistency (pass^k reliability)
Factuality (parametric, retrieval, grounded)
Memory integrity & poisoning resistance
Alignment index (safety × honesty × helpfulness)
Over-compliance detection (H-Neurons, arXiv:2512.01797)
Monitor bypass resistance (arXiv:2503.09950)
Trust-authorization synchronization (arXiv:2512.06914)
MCP compliance scoring
Identity continuity tracking
Behavioral transparency index
And 60+ more...
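Two of the modules listed above have simple numeric cores that are easy to illustrate. The README does not define either formula, so treat this as a sketch under assumptions: `pass_hat_k` uses one common estimator for pass^k (the chance that k runs drawn from n recorded trials all succeed), and `calibration_gap` is read here as mean stated confidence minus observed success rate. Both function names are hypothetical.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k runs ALL succeed, given `successes` out of `trials`."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and the number of trials")
    # Fraction of k-sized subsets of the trials that contain only successes.
    return comb(successes, k) / comb(trials, k)


def calibration_gap(confidences: list[float], outcomes: list[bool]) -> float:
    """Mean stated confidence minus observed success rate (positive = overconfident)."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need one outcome per confidence value")
    return sum(confidences) / len(confidences) - sum(outcomes) / len(outcomes)
```

An agent that succeeds 8 times out of 10 looks fine at k=1, but `pass_hat_k(8, 10, 2)` is already noticeably lower, which is why pass^k is a stricter measure of consistency than a plain success rate.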

🏗️ Architecture

Agent (untrusted)
    │
    ▼
AMC Gateway ──── transparent proxy, agent doesn't know it's being watched
    │
    ▼
Evidence Ledger ──── Ed25519 signatures + Merkle tree proof chains
    │
    ▼
Scoring Engine ──── evidence-weighted diagnostics, 74+ scoring modules, 86 assurance packs
    │
    ▼
AMC Studio ──── dashboard + API + CLI + reports
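To make "Merkle tree proof chains" concrete, here is a minimal sketch of how a batch of evidence records can be folded into a single root hash, so that changing any record changes the root. It is not AMC's ledger format. SHA-256, the leaf/node prefix bytes, and the rule for promoting an unpaired node are assumptions, and the Ed25519 signature over the root is left out because it needs a third-party library.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root remains."""
    if not records:
        raise ValueError("need at least one record")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [_h(b"\x00" + r) for r in records]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
    return level[0]


evidence = [b"tool_call:search", b"tool_call:send_email", b"policy_check:pass"]
root = merkle_root(evidence)
tampered = merkle_root([evidence[0], b"tool_call:noop", evidence[2]])
```

Signing only the root lets a verifier detect tampering with any single record without re-signing the whole ledger.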

Evidence Trust Tiers

Tier	Weight	How
`OBSERVED_HARDENED`	1.1×	AMC-controlled adversarial scenarios
`OBSERVED`	1.0×	Captured via gateway proxy
`ATTESTED`	0.8×	Cryptographic attestation
`SELF_REPORTED`	0.4×	Agent's own claims (capped)
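A small worked example shows why the tier weights matter. The weights below come from the table; everything else is an assumption, since the README does not say how evidence is aggregated. Here each item's raw 0-100 score is multiplied by its tier weight, clipped at 100, and averaged. `weighted_score` is a hypothetical name.

```python
TIER_WEIGHTS = {
    "OBSERVED_HARDENED": 1.1,
    "OBSERVED": 1.0,
    "ATTESTED": 0.8,
    "SELF_REPORTED": 0.4,
}


def weighted_score(evidence: list[tuple[str, float]]) -> float:
    """Average of tier-weighted scores; each item is (tier, raw_score_0_to_100)."""
    if not evidence:
        return 0.0
    # Assumed aggregation: weight each item, clip to the 0-100 scale, then average.
    effective = [min(100.0, TIER_WEIGHTS[tier] * raw) for tier, raw in evidence]
    return sum(effective) / len(effective)


claims_only = weighted_score([("SELF_REPORTED", 100.0)])
mixed = weighted_score([("OBSERVED", 50.0), ("SELF_REPORTED", 100.0)])
```

Under these assumptions, an agent backed only by its own perfect self-assessment lands near 40, not 100, and observed behavior pulls the combined score toward what actually happened.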

Maturity Scale

Level	Name	Meaning
L0	Absent	No safety controls
L1	Initial	Some intent, nothing operational
L2	Developing	Works on happy path, breaks at edges
L3	Defined	Repeatable, measurable, auditable (EU AI Act minimum)
L4	Managed	Proactive, risk-calibrated, cryptographic proofs
L5	Optimizing	Self-correcting, continuously verified

The Platform

Module	What It Does
AMC Score	Evidence-weighted diagnostics across 5 dimensions, L0–L5 maturity
AMC Shield	86 assurance packs: injection, exfiltration, adversarial
AMC Enforce	Policy engine, approval workflows, scoped leases
AMC Vault	Ed25519 keys, Merkle chains, HSM/TPM support
AMC Watch	Dashboard, gateway proxy, Prometheus metrics
AMC Fleet	Multi-agent trust, delegation graphs
AMC Passport	Portable agent credential (.amcpass)
AMC Comply	EU AI Act, ISO 42001, NIST AI RMF, SOC 2, OWASP mapping

🔌 14 Framework Adapters

Zero code changes. One environment variable.

amc wrap <adapter> -- <your command>

Adapter	Command
LangChain	`amc wrap langchain -- python app.py`
LangGraph	`amc wrap langgraph -- python graph.py`
CrewAI	`amc wrap crewai -- python crew.py`
AutoGen	`amc wrap autogen -- python autogen.py`
OpenAI Agents SDK	`amc wrap openai-agents -- python agent.py`
LlamaIndex	`amc wrap llamaindex -- python rag.py`
Semantic Kernel	`amc wrap semantic-kernel -- dotnet run`
Claude Code	`amc wrap claude-code -- claude "task"`
Gemini	`amc wrap gemini -- gemini chat`
OpenClaw	`amc wrap openclaw-cli -- openclaw run`
OpenHands	`amc wrap openhands -- openhands run`
Python SDK	`amc wrap python-amc-sdk -- python app.py`
Generic CLI	`amc wrap generic-cli -- python bot.py`
OpenAI-compatible	`amc wrap openai-compat -- node server.js`

📖 Full adapter docs

📊 Compliance Mapping

Framework	Coverage
EU AI Act	12 article mappings + audit binder generation
ISO 42001	Clauses 4-10 mapped to AMC dimensions
NIST AI RMF	Risk management framework alignment
SOC 2	Trust service criteria mapping
OWASP LLM Top 10	Full coverage (10/10)

🚀 Install

npm (recommended)

npm i -g agent-maturity-compass

npx (no install)

npx agent-maturity-compass quickscore

Homebrew

brew tap thewisecrab/amc && brew install agent-maturity-compass

curl

curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

Docker

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

From source

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link

☁️ Deploy (One-Click)

Platform	Deploy
Docker Compose	`cd docker && docker compose up`
Vercel
Railway

📚 Docs


Quickstart (5 min)	Agent Guide
Solo Dev Quickstart	Platform Engineer Quickstart
Security & Compliance Quickstart	Troubleshooting
CLI Reference (481 commands)	Architecture
Compatibility Matrix	Starter Blueprints
Install Packages	Support Policy
Release Cadence	CI Templates
Hardening Guide	Community

Single-binary install (experimental)

AMC now includes an experimental packaging path for host-specific single-binary builds, based on Node.js single executable applications (SEA):

npm run build
npm run build:sea

The build path is wired in and produces SEA artifacts plus a manifest. Runtime verification is still experimental and host-sensitive. See docs/SINGLE_BINARY.md for the honest status and caveats.

Nightly compatibility matrix

AMC now includes a scheduled GitHub Actions workflow that validates packaged CLI installs across a small OS/Node matrix and uploads JSON artifacts for inspection:

workflow: .github/workflows/nightly-compatibility-matrix.yml
current matrix: ubuntu-latest + macos-latest, Node 20 + 22
checks: packed install, doctor --json, quickscore --json, lite-score --help, comms-check --help

Workspace config profiles (MVP)

AMC now supports lightweight workspace config presets for .amc/amc.config.yaml:

amc init --profile dev
amc quickstart --profile ci
amc config profile prod

Current MVP behavior:

dev → shared trust boundary, proxy env enabled
ci → isolated trust boundary, proxy env enabled
prod → isolated trust boundary, proxy env disabled
explicit --trust-boundary still overrides the profile when you need it
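The preset behavior above can be summarized as a lookup plus one override rule. The sketch below mirrors that list in Python. The key names `trust_boundary` and `proxy_env` and the function `resolve_profile` are assumptions for illustration, since the actual `.amc/amc.config.yaml` schema is not shown here.

```python
from typing import Optional

# Presets as described in the list above; key names are hypothetical.
PROFILES = {
    "dev":  {"trust_boundary": "shared",   "proxy_env": True},
    "ci":   {"trust_boundary": "isolated", "proxy_env": True},
    "prod": {"trust_boundary": "isolated", "proxy_env": False},
}


def resolve_profile(name: str, trust_boundary: Optional[str] = None) -> dict:
    """Return the preset for `name`, letting an explicit trust boundary win."""
    if name not in PROFILES:
        raise ValueError(f"unknown profile: {name!r}")
    settings = dict(PROFILES[name])      # copy so the presets stay untouched
    if trust_boundary is not None:       # explicit flag overrides the preset
        settings["trust_boundary"] = trust_boundary
    return settings
```

For instance, `resolve_profile("prod", trust_boundary="shared")` keeps the proxy environment disabled but switches the boundary, which matches the override rule in the last bullet.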

🤝 Contributing

AMC is MIT licensed. We welcome contributions — especially new assurance packs, domain packs, framework adapters, and scoring modules.

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm test   # 3,311 tests

→ CONTRIBUTING.md — includes guides for writing packs, mapping research papers, and adding adapters.

Good first contributions

🔬 New assurance pack — model a new attack scenario (guide)
🏥 New domain pack — add industry-specific questions (guide)
🔌 New adapter — support another agent framework (guide)
📄 Research paper → module — turn arXiv findings into scoring logic (guide)

📄 License

MIT — public trust infrastructure for the age of AI agents.

138 diagnostic questions · 86 assurance packs · 40 domain packs · 14 adapters · 74+ scoring modules · 3,311 tests
Stop trusting. Start verifying.

If your AGENTS.md doesn't have an AMC badge, you're running with scissors. 🏃‍♂️✂️
