AMC

Agent Maturity Compass

Score your AI agent. Find the gaps. Fix them.
2 minutes to your first score. Free and open source.


🎮 Web Playground · 📖 Docs · 💬 Community · 📋 Recipes · 🤝 Contribute


What is this?

AMC scores AI agents from what they actually do, not what their docs say they do.

npx agent-maturity-compass quickscore

One command. No account. No API key. You get:

  1. A trust score — L0 (dangerous) to L5 (production-ready), based on execution evidence
  2. A gap analysis — exactly what's weak, what's risky, and what's missing
  3. Generated fixes — guardrails, config patches, CI gates, and compliance artifacts

Then you keep going, all from the same CLI:

  • Adversarial testing, continuous monitoring, regulatory mapping, and fleet-wide governance
  • Evaluation workflows: golden datasets, imported evals, lite scoring for non-agent apps
  • Business and compliance outputs: KPI correlation, leaderboards, audit binders

Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Claude Code, Gemini, OpenClaw, and more — with zero or near-zero integration friction.

Why should I care?

Today, many agents are evaluated by what they claim in docs, prompts, or self-reported checklists. That is structurally weak.

AMC focuses on execution-verified evidence.

| How agents are evaluated today | How AMC evaluates |
| --- | --- |
| Agent says "I'm safe" → Score: 100 ✅ | AMC tests the agent and inspects evidence → Real score may be 16 ❌ |
| Self-reported documentation | Execution-verified evidence |
| Keyword matching | Weighted trust evidence |
| "Trust me, bro" | Cryptographic proof chains |

That is the entire thesis: trust, but verify — with receipts.


Product family

AMC is one trust stack with eight named product surfaces:

| Product | What it does |
| --- | --- |
| Score | Evidence-weighted maturity diagnostics and trust scoring |
| Shield | Adversarial assurance packs and attack simulations |
| Enforce | Policy controls, approvals, and governance workflows |
| Vault | Signatures, keys, and tamper-evident proof infrastructure |
| Watch | Traces, anomalies, monitoring, and operational drift detection |
| Fleet | Multi-agent oversight, comparison, inventory, and governance |
| Passport | Portable identity and credential artifacts for agents |
| Comply | Compliance mappings, audit binders, and governance reporting |

These names are intentional. AMC is not a single command with a long README — it is a trust stack you can grow into.


Pricing

The full trust stack is free and MIT licensed. The only paid surface is Industry Packs.

| Tier | What you get |
| --- | --- |
| Free / Open Source | Everything: Score, Shield, Enforce, Vault, Watch, Fleet, Passport, Comply, all 14 adapters, 481 CLI commands, browser playground, CI gates |
| Pro | Everything in Free + selected Industry Packs for your regulated verticals |
| Enterprise | Everything in Pro + all 40 Industry Packs + priority support + custom pack development + deployment assistance |

Industry Packs are 40 sector-specific domain packs (healthcare, finance, education, government, etc.) that require ongoing regulatory research and maintenance. The core trust stack stays free forever.


Choose your path

1. Browser — fastest first look

Use the browser playground to explore scoring logic, questions, and scenarios.

→ Try the Web Playground

Best for:

  • first-touch evaluation
  • demos
  • lightweight exploration
  • understanding how scoring works

2. CLI — real evidence workflows

Use the CLI when you want actual execution evidence, traces, datasets, reports, and CI gates.

npx agent-maturity-compass quickscore

Best for:

  • real agent scoring
  • evidence capture
  • local trust workflows
  • shareable outputs

3. CI — enforce standards continuously

Use AMC in GitHub Actions or CI to prevent trust regressions.

Best for:

  • release gates
  • score thresholds
  • drop detection
  • PR comments and artifact generation

4. Deployment / enterprise path

If you need self-hosted, managed, or enterprise deployment clarity, start here:

  • docs/DEPLOYMENT_OPTIONS.md
  • docs/PRODUCT_EDITIONS.md
  • docs/PRICING.md
  • docs/ENTERPRISE.md

Start by persona

  • Solo builder / OSS maintainer → docs/SOLO_DEV_PATH.md
  • Platform / engineering team → docs/PLATFORM_PATH.md
  • Security / compliance → docs/SECURITY_PATH.md

Support AMC

Want to support the open project?

  • Sponsorship path: SPONSORING.md
  • Community/support routing: docs/COMMUNITY_SUPPORT.md

Core routing docs

  • docs/INDEX.md
  • docs/START_HERE.md
  • docs/WHY_AMC.md
  • docs/USE_CASES.md
  • docs/PERSONAS.md
  • docs/AFTER_QUICKSCORE.md
  • docs/EXAMPLES_INDEX.md
  • docs/RECIPES.md
  • docs/DEPLOYMENT_OPTIONS.md
  • docs/PRODUCT_EDITIONS.md
  • docs/PRICING.md
  • docs/BUYER_PACKAGES.md
  • docs/SERVICES_AND_SUPPORT.md
  • docs/COMMUNITY_SHOWCASE.md
  • docs/RELEASE_HIGHLIGHTS.md
  • docs/BENCHMARK_GALLERY.md

⚡ Quick Start

Option 1: Terminal (2 minutes)

# Install
npm i -g agent-maturity-compass

# Score your agent
cd your-agent-project
amc init          # interactive setup
amc quickscore    # get your score
amc fix           # auto-generate fixes

Option 2: Browser (0 minutes)

→ Try the Web Playground — answer questions, explore scenarios and assurance packs, get a score. No install.

This is AMC's browser try-now path: great for first-touch scoring and exploration. For execution evidence, traces, datasets, and CI gates, use the CLI.

Option 3: Docker (0 config)

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

Option 4: CI/CD (copy-paste)

# .github/workflows/amc.yml
name: AMC Score
on: [push, pull_request]
jobs:
  score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          target-level: 3        # fail if below L3
          fail-on-drop: true     # fail if score drops
          comment: true          # post results on PR

📋 Recipes — Copy-Paste Examples

Score any agent in one line

npx agent-maturity-compass quickscore                    # quick score
npx agent-maturity-compass quickscore --eu-ai-act        # + EU AI Act check
npx agent-maturity-compass quickscore --share            # shareable link

Wrap an existing agent (zero code changes)

# LangChain
amc wrap langchain -- python my_agent.py

# CrewAI
amc wrap crewai -- python crew.py

# AutoGen
amc wrap autogen -- python autogen_app.py

# OpenClaw
amc wrap openclaw-cli -- openclaw run

# Claude Code
amc wrap claude-code -- claude "analyze this code"

# Any CLI agent
amc wrap generic-cli -- python my_bot.py

Red-team your agent

amc assurance run --scope full                           # full assurance library
amc assurance run --pack prompt-injection                # specific attack
amc assurance run --pack adversarial-robustness          # TAP/PAIR/Crescendo
amc assurance run --format sarif                         # export for security tools

Inspect traces and operational drift

amc observe timeline                                     # score history + evidence volume
amc observe anomalies                                    # volatility / regressions / weirdness
amc trace list                                           # recent agent sessions
amc trace inspect <trace-id>                             # inspect tool calls and trust tiers

Build golden datasets and run evals

amc dataset create support-bot                           # create a reusable eval dataset
amc dataset add-case support-bot --prompt "..." --expected "..."
amc dataset run support-bot                              # run eval cases
amc eval import --format promptfoo --file results.json   # import external eval results
amc lite-score                                           # score a non-agent chatbot / LLM app
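
The dataset commands above follow the usual golden-dataset pattern: store prompt/expected pairs, replay them against the agent, and report a pass rate. The Python sketch below illustrates that idea only and is not AMC's implementation; `EvalCase`, `run_dataset`, the toy `echo_agent`, and the case-insensitive exact-match comparison are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    expected: str


def run_dataset(cases: List[EvalCase], agent: Callable[[str], str]) -> float:
    """Replay every case against `agent` and return the fraction that pass."""
    if not cases:
        raise ValueError("dataset has no cases")
    passed = sum(
        1
        for case in cases
        # Hypothetical comparison: case-insensitive exact match.
        if agent(case.prompt).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def echo_agent(prompt: str) -> str:
    """Toy agent used only for illustration."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


support_bot = [
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Capital of France?", "paris"),
    EvalCase("Refund policy?", "30 days"),
]
pass_rate = run_dataset(support_bot, echo_agent)
print(f"pass rate: {pass_rate:.2f}")
```

Here two of the three cases match, so the toy agent passes about two thirds of the dataset. A real harness would likely use a softer comparison than exact match.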

Business, inventory, and reporting

amc business kpi                                         # correlate maturity to outcomes
amc business report                                      # stakeholder-ready business summary
amc leaderboard show                                     # compare agents across a fleet
amc inventory scan --deep                                # discover agents, frameworks, model files
amc comms-check --text "Guaranteed 40% return" --domain wealth

Auto-fix everything

amc fix                          # generate guardrails + CI gate + governance docs
amc fix --target-level L4        # target a specific level
amc guide --go                   # detect framework → apply guardrails to config
amc guide --watch                # continuous monitoring + auto-update

Compliance in one command

amc audit binder create --framework eu-ai-act            # EU AI Act evidence binder
amc compliance report --framework iso-42001              # ISO 42001 report
amc domain assess --domain health                        # HIPAA assessment
amc domain assess --domain wealth                        # MiFID II / DORA

GitHub Actions — full CI gate

# .github/workflows/amc.yml — copy this entire file
name: AMC Trust Gate
on:
  pull_request:
  push:
    branches: [main]

jobs:
  amc-score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          agent-id: my-agent
          target-level: 3
          fail-on-drop: true
          comment: true
          upload-artifacts: true
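
The `target-level` and `fail-on-drop` inputs above describe two independent checks: the current level must reach the target, and the score must not fall below the previous run. A minimal Python sketch of that decision, assuming the action compares against a single previous score; `GateResult` and `evaluate_gate` are hypothetical names and not part of the action.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateResult:
    passed: bool
    reasons: List[str]


def evaluate_gate(
    level: int,
    score: float,
    target_level: int,
    previous_score: Optional[float] = None,
    fail_on_drop: bool = True,
) -> GateResult:
    """Return whether a build passes, plus the reasons for any failure."""
    reasons: List[str] = []
    if level < target_level:
        reasons.append(f"level L{level} is below target L{target_level}")
    # The drop check only applies when a baseline score exists.
    if fail_on_drop and previous_score is not None and score < previous_score:
        reasons.append(f"score dropped from {previous_score} to {score}")
    return GateResult(passed=not reasons, reasons=reasons)


print(evaluate_gate(3, 72.5, 3, previous_score=70.0).passed)
print(evaluate_gate(2, 55.0, 3, previous_score=60.0).reasons)
```

The first call passes (L3 meets the target and the score rose); the second fails on both checks and reports both reasons, which is the kind of detail a PR comment would surface.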

Badge for your README

<!-- Add this to your README -->
[![AMC Score](https://img.shields.io/badge/AMC-L3_(72.5)-green?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZmlsbD0iI2ZmZiIgZD0iTTEyIDJMMiA3bDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDEybDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDIxbDEwIDUgMTAtNXoiLz48L3N2Zz4=)](https://github.com/thewisecrab/AgentMaturityCompass)

Result: AMC Score
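
The badge above is a shields.io static badge, so its URL can be generated from a level and score instead of edited by hand. A small Python sketch, assuming the `L<level> (<score>)` message format shown above; `badge_field` and `amc_badge_url` are hypothetical helpers, and the logo parameter is omitted for brevity.

```python
from urllib.parse import quote


def badge_field(text: str) -> str:
    """Escape one field of a shields.io static badge path.

    In static badge paths "-" separates fields and "_" renders as a space,
    so literal dashes and underscores are doubled before spaces are mapped.
    """
    escaped = text.replace("-", "--").replace("_", "__").replace(" ", "_")
    return quote(escaped, safe="()")


def amc_badge_url(level: int, score: float, color: str = "green") -> str:
    message = badge_field(f"L{level} ({score:.1f})")
    return f"https://img.shields.io/badge/AMC-{message}-{color}"


def amc_badge_markdown(level: int, score: float) -> str:
    repo = "https://github.com/thewisecrab/AgentMaturityCompass"
    return f"[![AMC Score]({amc_badge_url(level, score)})]({repo})"


print(amc_badge_url(3, 72.5))
# → https://img.shields.io/badge/AMC-L3_(72.5)-green
```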


🧪 What AMC Tests

138 Diagnostic Questions × 5 Dimensions

| Dimension | Questions | What It Measures |
| --- | --- | --- |
| Strategic Agent Ops | 18 | Mission clarity, scope adherence, decision traceability |
| Skills | 38 | Tool mastery, injection defense, DLP, least-privilege |
| Resilience | 30 | Graceful degradation, circuit breakers, bypass resistance |
| Leadership & Autonomy | 28 | Structured logs, traces, cost tracking, SLOs |
| Culture & Alignment | 26 | Test harnesses, feedback loops, over-compliance detection |

86 Assurance Packs

| Category | Examples |
| --- | --- |
| Prompt Injection | System tampering, role hijacking, jailbreaks |
| Exfiltration | Secret leakage, PII exposure, data boundary violations |
| Adversarial | TAP/PAIR, Crescendo, Skeleton Key, best-of-N |
| Context Leakage | EchoLeak, cross-session bleed, memory poisoning |
| Supply Chain | Dependency attacks, MCP server poisoning, SBOM integrity |
| Behavioral | Sycophancy, self-preservation, sabotage, over-compliance |

40 Industry Domain Packs

| Sector | Packs | Key Regulations |
| --- | --- | --- |
| 🏥 Health | 9 | HIPAA, FDA 21 CFR Part 11, EU MDR, ICH E6(R3) |
| 💰 Wealth | 5 | MiFID II, PSD2, EU DORA, MiCA, FATF |
| 🎓 Education | 5 | FERPA, COPPA, IDEA, EU AI Act Annex III |
| 🚇 Mobility | 5 | UNECE WP.29, ETSI EN 303 645, EU NIS2 |
| 💡 Technology | 5 | EU AI Act Art. 13, EU Data Act, DSA Art. 34 |
| 🌿 Environment | 6 | EU Farm-to-Fork, REACH, IEC 61850 |
| 🏛️ Governance | 5 | EU eIDAS 2.0, UNCAC, UNGPs |

74+ Scoring Modules

Selected modules:
  • Calibration gap (confidence vs reality)
  • Evidence conflict detection
  • Gaming resistance (adversarial score inflation)
  • Sleeper agent detection (context-dependent behavior)
  • Policy consistency (pass^k reliability)
  • Factuality (parametric, retrieval, grounded)
  • Memory integrity & poisoning resistance
  • Alignment index (safety × honesty × helpfulness)
  • Over-compliance detection (H-Neurons, arXiv:2512.01797)
  • Monitor bypass resistance (arXiv:2503.09950)
  • Trust-authorization synchronization (arXiv:2512.06914)
  • MCP compliance scoring
  • Identity continuity tracking
  • Behavioral transparency index
  • And 60+ more...
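
To make one of these concrete: pass^k is commonly read as the probability that an agent succeeds on all k independent attempts at the same task, which punishes flaky behavior far more than a single-shot pass rate does. The Python sketch below uses the standard combinatorial estimator from n recorded trials with c successes. Whether AMC's policy consistency module computes it this way is an assumption, and `pass_hat_k` is a hypothetical name.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k from n trials with c successes.

    This is the fraction of all k-sized subsets of the n trials
    that contain only successful trials.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


print(pass_hat_k(10, 9, 1))  # → 0.9
print(pass_hat_k(10, 9, 5))  # → 0.5
```

An agent that passes 9 of 10 trials looks 90% reliable on a single attempt, but only 50% reliable when it has to succeed five times in a row.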

🏗️ Architecture

Agent (untrusted)
    │
    ▼
AMC Gateway ──── transparent proxy, agent doesn't know it's being watched
    │
    ▼
Evidence Ledger ──── Ed25519 signatures + Merkle tree proof chains
    │
    ▼
Scoring Engine ──── evidence-weighted diagnostics, 74+ scoring modules, 86 assurance packs
    │
    ▼
AMC Studio ──── dashboard + API + CLI + reports
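
The Evidence Ledger step rests on a simple property: every evidence record is hashed into a Merkle tree, so changing any record changes the root. A minimal Python sketch of that property using only the standard library. It is an illustration, not AMC's ledger format: `leaf_hash`, `merkle_root`, the record fields, and the `0x00`/`0x01` prefixes that separate leaf and node hashes are assumptions. Ed25519 signing of the root is omitted because it needs a third-party library.

```python
import hashlib
import json
from typing import List


def leaf_hash(record: dict) -> bytes:
    # Canonical JSON so the same record always hashes to the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(b"\x00" + payload).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Fold leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        raise ValueError("no evidence records")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical evidence records, for illustration only.
records = [
    {"seq": 1, "tool": "search", "tier": "OBSERVED"},
    {"seq": 2, "tool": "send_email", "tier": "OBSERVED"},
    {"seq": 3, "tool": "read_file", "tier": "ATTESTED"},
]
root = merkle_root([leaf_hash(r) for r in records])

tampered = [dict(r) for r in records]
tampered[1]["tool"] = "noop"
tampered_root = merkle_root([leaf_hash(r) for r in tampered])

print(root.hex() != tampered_root.hex())  # → True
```

Signing only the root is enough to commit to every record beneath it, which is why a signature scheme and a Merkle tree are usually paired.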

Evidence Trust Tiers

| Tier | Weight | How |
| --- | --- | --- |
| OBSERVED_HARDENED | 1.1× | AMC-controlled adversarial scenarios |
| OBSERVED | 1.0× | Captured via gateway proxy |
| ATTESTED | 0.8× | Cryptographic attestation |
| SELF_REPORTED | 0.4× | Agent's own claims (capped) |
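
One way to see what the tier weights do is to blend a few evidence items into a single number. The Python sketch below uses the weights from the table, but the aggregation itself (a weighted average) is an assumption for illustration rather than AMC's scoring formula, and the cap on self-reported evidence is not modeled because its value is not stated here. `TIER_WEIGHTS` and `weighted_score` are hypothetical names.

```python
from typing import Iterable, Tuple

# Weights taken from the Evidence Trust Tiers table.
TIER_WEIGHTS = {
    "OBSERVED_HARDENED": 1.1,
    "OBSERVED": 1.0,
    "ATTESTED": 0.8,
    "SELF_REPORTED": 0.4,
}


def weighted_score(evidence: Iterable[Tuple[str, float]]) -> float:
    """Blend (tier, score) pairs into one score, trusting higher tiers more.

    Assumption: a plain weighted average with scores in [0, 100].
    """
    total = 0.0
    weight_sum = 0.0
    for tier, score in evidence:
        weight = TIER_WEIGHTS[tier]
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no evidence")
    return total / weight_sum


# The agent claims 100, but observed behavior scores 16.
blended = weighted_score([("SELF_REPORTED", 100.0), ("OBSERVED", 16.0)])
print(round(blended, 1))  # → 40.0
```

Even in this simplified form, the observed result pulls the blended score well below the agent's own claim.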

Maturity Scale

| Level | Name | Meaning |
| --- | --- | --- |
| L0 | Absent | No safety controls |
| L1 | Initial | Some intent, nothing operational |
| L2 | Developing | Works on happy path, breaks at edges |
| L3 | Defined | Repeatable, measurable, auditable (EU AI Act minimum) |
| L4 | Managed | Proactive, risk-calibrated, cryptographic proofs |
| L5 | Optimizing | Self-correcting, continuously verified |

The Platform

| Module | What It Does |
| --- | --- |
| AMC Score | Evidence-weighted diagnostics across 5 dimensions, L0–L5 maturity |
| AMC Shield | 86 assurance packs: injection, exfiltration, adversarial |
| AMC Enforce | Policy engine, approval workflows, scoped leases |
| AMC Vault | Ed25519 keys, Merkle chains, HSM/TPM support |
| AMC Watch | Dashboard, gateway proxy, Prometheus metrics |
| AMC Fleet | Multi-agent trust, delegation graphs |
| AMC Passport | Portable agent credential (.amcpass) |
| AMC Comply | EU AI Act, ISO 42001, NIST AI RMF, SOC 2, OWASP mapping |

🔌 14 Framework Adapters

Zero code changes. One environment variable.

amc wrap <adapter> -- <your command>

| Adapter | Command |
| --- | --- |
| LangChain | `amc wrap langchain -- python app.py` |
| LangGraph | `amc wrap langgraph -- python graph.py` |
| CrewAI | `amc wrap crewai -- python crew.py` |
| AutoGen | `amc wrap autogen -- python autogen_app.py` |
| OpenAI Agents SDK | `amc wrap openai-agents -- python agent.py` |
| LlamaIndex | `amc wrap llamaindex -- python rag.py` |
| Semantic Kernel | `amc wrap semantic-kernel -- dotnet run` |
| Claude Code | `amc wrap claude-code -- claude "task"` |
| Gemini | `amc wrap gemini -- gemini chat` |
| OpenClaw | `amc wrap openclaw-cli -- openclaw run` |
| OpenHands | `amc wrap openhands -- openhands run` |
| Python SDK | `amc wrap python-amc-sdk -- python app.py` |
| Generic CLI | `amc wrap generic-cli -- python bot.py` |
| OpenAI-compatible | `amc wrap openai-compat -- node server.js` |

📖 Full adapter docs


📊 Compliance Mapping

| Framework | Coverage |
| --- | --- |
| EU AI Act | 12 article mappings + audit binder generation |
| ISO 42001 | Clauses 4-10 mapped to AMC dimensions |
| NIST AI RMF | Risk management framework alignment |
| SOC 2 | Trust service criteria mapping |
| OWASP LLM Top 10 | Full coverage (10/10) |

🚀 Install

npm (recommended)

npm i -g agent-maturity-compass

npx (no install)

npx agent-maturity-compass quickscore

Homebrew

brew tap thewisecrab/amc && brew install agent-maturity-compass

curl

curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

Docker

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

From source

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link

☁️ Deploy (One-Click)

| Platform | Deploy |
| --- | --- |
| Docker Compose | `cd docker && docker compose up` |
| Vercel | Deploy |
| Railway | Deploy |

📚 Docs

  • Quickstart (5 min)
  • Agent Guide
  • Solo Dev Quickstart
  • Platform Engineer Quickstart
  • Security & Compliance Quickstart
  • Troubleshooting
  • CLI Reference (481 commands)
  • Architecture
  • Compatibility Matrix
  • Starter Blueprints
  • Install Packages
  • Support Policy
  • Release Cadence
  • CI Templates
  • Hardening Guide
  • Community

Single-binary install (experimental)

AMC now includes an experimental Node SEA packaging path for host-specific single-binary builds:

npm run build
npm run build:sea

The build path is wired in and produces SEA artifacts plus a manifest. Runtime verification is still experimental and host-sensitive. See docs/SINGLE_BINARY.md for the honest status and caveats.

Nightly compatibility matrix

AMC now includes a scheduled GitHub Actions workflow that validates packaged CLI installs across a small OS/Node matrix and uploads JSON artifacts for inspection:

  • workflow: .github/workflows/nightly-compatibility-matrix.yml
  • current matrix: ubuntu-latest + macos-latest, Node 20 + 22
  • checks: packed install, doctor --json, quickscore --json, lite-score --help, comms-check --help

Workspace config profiles (MVP)

AMC now supports lightweight workspace config presets for .amc/amc.config.yaml:

amc init --profile dev
amc quickstart --profile ci
amc config profile prod

Current MVP behavior:

  • dev → shared trust boundary, proxy env enabled
  • ci → isolated trust boundary, proxy env enabled
  • prod → isolated trust boundary, proxy env disabled
  • explicit --trust-boundary still overrides the profile when you need it
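
The preset behavior listed above amounts to a lookup followed by an optional override. A minimal Python sketch of that resolution order, assuming the profile only controls these two settings; the key names `trust_boundary` and `proxy_env` and the function `resolve_config` are hypothetical and do not reflect the actual `.amc/amc.config.yaml` schema.

```python
from typing import Dict, Optional, Union

ConfigValue = Union[str, bool]

# Preset values as described in the list above; key names are illustrative.
PROFILES: Dict[str, Dict[str, ConfigValue]] = {
    "dev": {"trust_boundary": "shared", "proxy_env": True},
    "ci": {"trust_boundary": "isolated", "proxy_env": True},
    "prod": {"trust_boundary": "isolated", "proxy_env": False},
}


def resolve_config(
    profile: str, trust_boundary: Optional[str] = None
) -> Dict[str, ConfigValue]:
    """Start from the profile preset, then let an explicit flag win."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    config = dict(PROFILES[profile])  # copy so the preset stays untouched
    if trust_boundary is not None:
        config["trust_boundary"] = trust_boundary
    return config


print(resolve_config("prod"))
print(resolve_config("dev", trust_boundary="isolated"))
```

The second call keeps the `dev` proxy setting but swaps in the isolated trust boundary, mirroring how an explicit `--trust-boundary` overrides the profile.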

  • Assurance Lab
  • Domain Packs
  • EU AI Act Compliance
  • Multi-Agent Trust
  • Executive Overview
  • White Paper
  • Example Projects
  • Community
  • Web Playground
  • Compatibility Matrix


🤝 Contributing

AMC is MIT licensed. We welcome contributions — especially new assurance packs, domain packs, framework adapters, and scoring modules.

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm test   # 3,311 tests

CONTRIBUTING.md — includes guides for writing packs, mapping research papers, and adding adapters.

Good first contributions

  • 🔬 New assurance pack — model a new attack scenario (guide)
  • 🏥 New domain pack — add industry-specific questions (guide)
  • 🔌 New adapter — support another agent framework (guide)
  • 📄 Research paper → module — turn arXiv findings into scoring logic (guide)

📄 License

MIT — public trust infrastructure for the age of AI agents.


138 diagnostic questions · 86 assurance packs · 40 domain packs · 14 adapters · 74+ scoring modules · 3,311 tests
Stop trusting. Start verifying.

If your AGENTS.md doesn't have an AMC badge, you're running with scissors. 🏃‍♂️✂️
