Open-source observability & audit trail for AI agents
```bash
git clone https://github.com/amitpaz1/agentlens
cd agentlens
cp .env.example .env
docker compose up
# Open http://localhost:3000
```

For production (auth enabled, Stripe, TLS):

```bash
docker compose -f docker-compose.yml -f docker-compose.prod.yml up
```

AgentLens is a flight recorder for AI agents. It captures every LLM call, tool invocation, approval decision, and error, then presents it through a queryable API and a real-time web dashboard.
Four ways to integrate; pick what fits your stack:

| Integration | Language | Effort | Capture |
|---|---|---|---|
| OpenClaw Plugin | OpenClaw | Copy & enable | Every Anthropic call (prompts, tokens, cost, tools) with zero code |
| Python Auto-Instrumentation | Python | 1 line | Every OpenAI / Anthropic / LangChain call, deterministic |
| MCP Server | Any (MCP) | Config block | Tool calls, sessions, events from Claude Desktop / Cursor |
| SDK | Python, TypeScript | Code | Full control: log events, query analytics, build integrations |
- **Python Auto-Instrumentation**: call `agentlensai.init()` and every LLM call across 9 providers (OpenAI, Anthropic, LiteLLM, Bedrock, Vertex, Gemini, Mistral, Cohere, Ollama) is captured automatically. Deterministic, with no reliance on LLM behavior.
- **MCP-Native**: ships as an MCP server. Agents connect to it like any other tool. Works with Claude Desktop, Cursor, and any MCP client.
- **LLM Call Tracking**: full prompt/completion visibility, token usage, cost aggregation, latency measurement, and privacy redaction.
- **Real-Time Dashboard**: session timelines, event explorer, LLM analytics, cost tracking, and alerting in a beautiful web UI.
- **Tamper-Evident Audit Trail**: append-only event storage with SHA-256 hash chains per session. Cryptographically linked and verifiable (a verification sketch follows this list).
- **Cost Tracking**: track token usage and estimated costs per session, per agent, per model, over time. Alert on cost spikes.
- **Alerting**: configurable rules for error rate, cost threshold, latency anomalies, and inactivity.
- **AgentKit Ecosystem**: first-class integrations with AgentGate (approval flows), FormBridge (data collection), Lore (cross-agent memory), and AgentEval (testing & evaluation).
- **Tenant Isolation**: multi-tenant support with per-tenant data scoping, API key binding, and embedding isolation.
- **Health Scores**: 5-dimension health scoring (error rate, cost efficiency, tool success, latency, completion rate) with trend tracking. Monitor agent reliability at a glance.
- **Cost Optimization**: complexity-aware model recommendation engine. Classifies LLM calls by complexity tier and suggests cheaper alternatives with projected savings.
- **Session Replay**: step through any past session with full context reconstruction (LLM history, tool results, cost accumulation, and error tracking at every step).
- **A/B Benchmarking**: statistical comparison of agent variants using Welch's t-test and chi-squared analysis across 8 metrics. Create experiments, collect data, get p-values.
- **Guardrails**: automated safety rules that monitor error rates, costs, health scores, and custom metrics. Actions include pausing agents, sending webhooks, downgrading models, and applying AgentGate policies. Dry-run mode for safe testing.
- **Framework Plugins**: optional plugins for LangChain, CrewAI, AutoGen, and Semantic Kernel. Auto-detection, fail-safe, non-blocking instrumentation with zero code changes.
- **Self-Hosted**: SQLite by default, no external dependencies. MIT licensed. Your data stays on your infrastructure.
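The hash chain behind the tamper-evident audit trail can be checked with nothing more than a SHA-256 implementation. Here is a minimal verification sketch, assuming each stored event carries its own hash plus its predecessor's; the field names (`hash`, `payload`) and the exact hashing material are illustrative, not AgentLens's actual schema:

```python
import hashlib
import json

def verify_chain(events: list[dict]) -> bool:
    """Walk a session's events in order and confirm each event's hash
    commits to its payload plus the previous event's hash.
    Field names and hashing material are illustrative, not the real schema."""
    prev_hash = ""
    for event in events:
        material = prev_hash + json.dumps(event["payload"], sort_keys=True)
        expected = hashlib.sha256(material.encode()).hexdigest()
        if event["hash"] != expected:
            return False  # chain broken: an event was altered or reordered
        prev_hash = event["hash"]
    return True
```

Any edit, deletion, or reordering of a past event changes the recomputed hash and breaks every link after it, which is what the dashboard's Chain Valid badge reflects.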
AgentLens ships with a real-time web dashboard for monitoring your agents.
The overview page shows live metrics (sessions, events, errors, and active agents) with a 24-hour event timeline chart, recent sessions with status badges (active/completed), and a recent errors feed. Everything updates in real time via SSE.
The sessions table shows every agent session with sortable columns: agent name, status, start time, duration, event count, error count, and total cost. Filter by agent or status (Active / Completed / Error) to drill down.
Click into any session to see the full event timeline: every tool call, error, cost event, and session lifecycle event in chronological order. The green ✓ Chain Valid badge confirms the tamper-evident hash chain is intact. Filter by event type (Tool Calls, Errors, Approvals, Custom). A cost breakdown shows token usage and spend.
The events explorer gives you a searchable, filterable view of every event across all sessions. Filter by event type, severity, agent, or time range. Full-text search works on payload content. Each row shows the tool name, agent, session, severity level, and duration.
The LLM Analytics page shows total LLM calls, cost, latency, and token usage across all agents, with cost-over-time and calls-over-time charts plus a model comparison table breaking down usage by provider and model (Anthropic, OpenAI, Google). Filter by agent, provider, or model.
LLM calls appear in the session timeline with their own icons and indigo styling, paired with their completions by `callId`. Each node shows the model, message count, token usage (in/out), cost badge, and latency. Tool calls and LLM calls are interleaved chronologically, so you can see exactly what the agent thought, then did.
Click any LLM call to see the full prompt and completion in a chat-bubble style viewer. System, user, assistant, and tool messages each get distinct styling. The metadata panel shows provider, model, parameters (temperature, max tokens), token breakdown (input/output/thinking/cache), cost, latency, tools provided to the model, and the tamper-evident hash chain.
The Health Overview page shows a 5-dimension health score (0-100) for every agent: error rate, cost efficiency, tool success, latency, and completion rate. Each dimension is scored independently and combined into a weighted overall score. Trend arrows (↑ improving, → stable, ↓ degrading) show direction over time. Click any agent to see a historical sparkline of its score.
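To make the weighting concrete, here is a toy version of how five 0-100 dimension scores could fold into one overall score; the weights below are invented for illustration and are not AgentLens's actual coefficients:

```python
# Illustrative only: AgentLens's real weights and formulas may differ.
DIMENSION_WEIGHTS = {
    "error_rate": 0.30,
    "cost_efficiency": 0.20,
    "tool_success": 0.20,
    "latency": 0.15,
    "completion_rate": 0.15,
}

def overall_health(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into a weighted overall score."""
    return sum(DIMENSION_WEIGHTS[d] * s for d, s in dimension_scores.items())

print(overall_health({
    "error_rate": 92, "cost_efficiency": 80, "tool_success": 95,
    "latency": 70, "completion_rate": 88,
}))  # -> 86.3
```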
The Cost Optimization page analyzes your LLM call patterns and recommends cheaper model alternatives. Calls are classified by complexity tier (simple / moderate / complex), and the recommendation engine suggests where you can safely downgrade, e.g., "Switch gpt-4o → gpt-4o-mini for SIMPLE tasks, saving $89/month." Confidence levels and success rate comparisons are shown for each recommendation.
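As a rough sketch of the idea only; the tier thresholds, routing rule, and per-token prices below are made-up illustrations, not AgentLens internals:

```python
# Hypothetical complexity-tier classifier and savings projection.
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.0050, "gpt-4o-mini": 0.0003}  # example prices

def classify(prompt_tokens: int, used_tools: bool) -> str:
    """Bucket a call into a complexity tier (thresholds are invented)."""
    if used_tools or prompt_tokens > 4000:
        return "complex"
    return "moderate" if prompt_tokens > 1000 else "simple"

def projected_saving(calls: list[dict]) -> float:
    """Sum potential savings from routing 'simple' calls to the cheaper model."""
    delta = PRICE_PER_1K_TOKENS["gpt-4o"] - PRICE_PER_1K_TOKENS["gpt-4o-mini"]
    return sum(
        call["prompt_tokens"] / 1000 * delta
        for call in calls
        if classify(call["prompt_tokens"], call["used_tools"]) == "simple"
    )
```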
Session Replay lets you step through any past session event by event with full context reconstruction. A scrubber/timeline control moves through steps chronologically. At each step, the context panel shows cumulative cost, LLM conversation history, tool call results, pending approvals, and error count. Filter by event type, jump to specific steps, or replay just the summary.
The Benchmarks page lets you create and manage A/B experiments comparing agent variants. Define 2-10 variants with session tags, pick metrics (cost, latency, error rate, success rate, tokens, duration), and collect data. Results include per-variant statistics, Welch's t-test p-values, confidence stars, and distribution charts. The full workflow (draft → running → completed) is managed from the dashboard.
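For intuition about the statistics: Welch's t-test is the unequal-variance variant of the two-sample t-test; in SciPy it is `ttest_ind` with `equal_var=False`. A small standalone example comparing per-session costs of two hypothetical variants:

```python
# Sketch of the statistics behind the results page, using SciPy.
from scipy import stats

variant_a_costs = [0.042, 0.051, 0.038, 0.047, 0.055, 0.044]  # made-up samples
variant_b_costs = [0.031, 0.029, 0.035, 0.040, 0.028, 0.033]

# equal_var=False selects Welch's t-test (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(variant_a_costs, variant_b_costs, equal_var=False)
print(f"t={t_stat:.2f}, p={p_value:.4f}")  # a small p suggests a real cost difference
```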
The Guardrails page lets you create and manage automated safety rules that monitor error rates, costs, health scores, and custom metrics. Each rule has a condition, action, cooldown, and optional dry-run mode. The list shows trigger counts and last triggered time. Click any rule for the detail page with full configuration, runtime state, and trigger history. The Activity Feed shows a real-time log of all triggers across all rules with filtering by agent and rule.
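For a feel of what a rule involves, here is a hypothetical rule body for `POST /api/guardrails`; every field name below is an illustrative guess, so check the API reference for the real schema:

```python
# Hypothetical guardrail rule: all field names are illustrative, not the real schema.
rule = {
    "name": "pause-on-error-spike",
    "agentId": "my-agent",
    "condition": {"metric": "error_rate", "operator": ">", "threshold": 0.25},
    "action": {"type": "pause_agent"},
    "cooldownSeconds": 600,
    "dryRun": True,  # log triggers without acting, for safe testing
}
```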
Don't want to self-host? AgentLens Cloud is a fully managed SaaS with the same SDK and zero infrastructure:
```python
import agentlensai

agentlensai.init(
    cloud=True,
    api_key="als_cloud_your_key_here",
    agent_id="my-agent",
)
# That's it: all LLM calls are captured and sent to the cloud
```

- Same SDK, one parameter change: switch `url=` to `cloud=True`
- Managed Postgres: multi-tenant with row-level security
- Team features: organizations, RBAC, audit logs, usage billing
- No server to run: dashboard at app.agentlens.ai
- Cloud Setup Guide: sign up, create an API key, verify your first event
- Migration Guide: move from self-hosted to cloud in 5 minutes
- Troubleshooting: common issues and how to fix them
```
┌──────────────────────────────────────────────────────────────────────┐
│                            Your AI Agents                            │
│                                                                      │
│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────────┐      │
│  │    Python App    │  │    MCP Client    │  │   TypeScript   │      │
│  │    (OpenAI,      │  │ (Claude Desktop, │  │      App       │      │
│  │    Anthropic,    │  │   Cursor, etc.)  │  │                │      │
│  │    LangChain)    │  │                  │  │                │      │
│  └────────┬─────────┘  └────────┬─────────┘  └───────┬────────┘      │
│           │                     │                    │               │
│  agentlensai.init()    MCP Protocol (stdio)   @agentlensai/sdk       │
│  Auto-instrumentation           │                    │               │
│           │            ┌────────┴─────────┐          │               │
│           │            │ @agentlensai/mcp │          │               │
│           │            └────────┬─────────┘          │               │
│           │                     │                    │               │
│           └─────────────────────┼────────────────────┘               │
│                                 │                                    │
│                          HTTP REST API                               │
└─────────────────────────────────┼────────────────────────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         @agentlensai/server                          │
│                                                                      │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌───────────────┐   │
│  │   Ingest   │  │   Query    │  │   Alert    │  │      LLM      │   │
│  │   Engine   │  │   Engine   │  │   Engine   │  │   Analytics   │   │
│  └────────────┘  └────────────┘  └────────────┘  └───────────────┘   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌───────────────┐   │
│  │   Recall   │  │  Reflect   │  │  Context   │  │  Guardrails   │   │
│  │ (Semantic) │  │ (Patterns) │  │ (X-Session)│  │    Engine     │   │
│  └────────────┘  └────────────┘  └────────────┘  └───────────────┘   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌───────────────┐   │
│  │   Health   │  │    Cost    │  │  Session   │  │   Benchmark   │   │
│  │  Scoring   │  │ Optimizer  │  │   Replay   │  │    Engine     │   │
│  └────────────┘  └────────────┘  └────────────┘  └───────────────┘   │
│  ┌─────────────┐  ┌─────────────┐                                    │
│  │   SQLite    │  │  Dashboard  │                                    │
│  │  (append    │  │  React SPA  │                                    │
│  │   only)     │  │(served at /)│                                    │
│  └─────────────┘  └─────────────┘                                    │
└──────────────────────────────────────────────────────────────────────┘

Integrations:  AgentGate  ──┐
               FormBridge ──┼──► POST /api/events/ingest
               Generic    ──┘    (HMAC-SHA256 verified)
```
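Webhook ingestion is verified with HMAC-SHA256. A minimal sketch of the receiving-side check; the signature's hex encoding and how the raw body is obtained are assumptions, not AgentLens's documented contract:

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, received_sig: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare in
    constant time. Hex encoding of the signature is an assumption."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_sig)
```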
```bash
npx @agentlensai/server
```

Opens on http://localhost:3400 with SQLite; zero config.

```bash
curl -X POST http://localhost:3400/api/keys \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'
```

Save the `als_...` key from the response; it's shown only once.
If you're running OpenClaw, the AgentLens plugin captures every Anthropic API call automatically: prompts, completions, token usage, costs, latency, and tool calls. No proxy, no preload scripts, no code changes.
```bash
# Copy the plugin into OpenClaw's extensions directory
cp -r packages/openclaw-plugin /usr/lib/node_modules/openclaw/extensions/agentlens-relay

# Enable it
openclaw config patch '{"plugins":{"entries":{"agentlens-relay":{"enabled":true}}}}'

# Restart
openclaw gateway restart
```

That's it. Open the AgentLens dashboard and you'll see every LLM call with full prompt visibility, cost tracking, and tool call extraction.

Set `AGENTLENS_URL` if your AgentLens instance isn't on localhost:3000. See the plugin README for details.
One line, and every LLM call is captured automatically. 9 providers supported: OpenAI, Anthropic, LiteLLM, AWS Bedrock, Google Vertex AI, Google Gemini, Mistral AI, Cohere, and Ollama.
```bash
pip install agentlensai[all-providers]   # all 9 providers
# or pick specific ones:
pip install agentlensai[openai]          # just OpenAI
pip install agentlensai[bedrock,ollama]  # Bedrock + Ollama
```

```python
import agentlensai

agentlensai.init(
    url="http://localhost:3400",
    api_key="als_your_key",
    agent_id="my-agent",
)

# Every LLM call is now captured automatically (all installed providers)
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# ^ Logged: model, tokens, cost, latency, full prompt/completion

# Works with Anthropic too
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
# ^ Also captured automatically

# LangChain? Use the callback handler:
from agentlensai.integrations.langchain import AgentLensCallbackHandler
chain.invoke(input, config={"callbacks": [AgentLensCallbackHandler()]})

agentlensai.shutdown()  # flush remaining events
```

Key guarantees:
- ✅ **Deterministic**: every call captured, not dependent on the LLM choosing to log
- ✅ **Fail-safe**: if the server is down, your code works normally
- ✅ **Non-blocking**: events sent via a background thread
- ✅ **Privacy**: `init(redact=True)` strips content, keeps metadata
For Claude Desktop, Cursor, or any MCP client, with zero code changes:

Claude Desktop (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "agentlens": {
      "command": "npx",
      "args": ["@agentlensai/mcp"],
      "env": {
        "AGENTLENS_API_URL": "http://localhost:3400",
        "AGENTLENS_API_KEY": "als_your_key_here"
      }
    }
  }
}
```

Cursor (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "agentlens": {
      "command": "npx",
      "args": ["@agentlensai/mcp"],
      "env": {
        "AGENTLENS_API_URL": "http://localhost:3400",
        "AGENTLENS_API_KEY": "als_your_key_here"
      }
    }
  }
}
```

For full control (log events, query analytics, build integrations):
Python:

```bash
pip install agentlensai
```

```python
from agentlensai import AgentLensClient

client = AgentLensClient("http://localhost:3400", api_key="als_your_key")

sessions = client.get_sessions()
analytics = client.get_llm_analytics()
print(f"Total cost: ${analytics.summary.total_cost_usd:.2f}")

# Health scores & optimization (v0.6.0+)
health = client.get_health("my-agent", window=7)
overview = client.get_health_overview()
history = client.get_health_history("my-agent", days=30)
recs = client.get_optimization_recommendations(period=7)

client.close()
```

TypeScript:
```bash
npm install @agentlensai/sdk
```

```typescript
import { AgentLensClient } from '@agentlensai/sdk';

const client = new AgentLensClient({
  baseUrl: 'http://localhost:3400',
  apiKey: 'als_your_key',
});

const sessions = await client.getSessions();
const analytics = await client.getLlmAnalytics();
```

Navigate to http://localhost:3400 to see sessions, timelines, analytics, and alerts in real time.
AgentLens ships 12 MCP tools: 5 core observability tools, 3 for intelligence & analytics, and 4 for operations.
| Tool | Purpose | Description |
|---|---|---|
| `agentlens_session_start` | Start Session | Begin a new observability session for an agent run. Returns a session ID for correlating subsequent events. |
| `agentlens_log_event` | Log Event | Record a custom event (tool call, error, approval, etc.) into the current session timeline. |
| `agentlens_log_llm_call` | Log LLM Call | Record an LLM call with model, messages, tokens, cost, and latency. Pairs with completions via `callId`. |
| `agentlens_query_events` | Query Events | Search and filter events across sessions by type, severity, agent, time range, and payload content. |
| `agentlens_session_end` | End Session | Close the current session, flush pending events, and finalize the hash chain. |
| Tool | Purpose | Description |
|---|---|---|
| `agentlens_recall` | Semantic Search | Search past events and sessions by meaning. Use before starting tasks to find relevant history. |
| `agentlens_reflect` | Pattern Analysis | Analyze behavioral patterns: recurring errors, cost trends, tool sequences, performance changes. |
| `agentlens_context` | Cross-Session Context | Retrieve topic-focused history with session summaries and key events ranked by relevance. |
| Tool | Purpose | Description |
|---|---|---|
| `agentlens_health` | Health Scores | Check the agent's 5-dimension health score (0-100) with trend tracking. Dimensions: error rate, cost efficiency, tool success, latency, completion rate. |
| `agentlens_optimize` | Cost Optimization | Get model switch recommendations with projected monthly savings. Analyzes call complexity and suggests cheaper alternatives. |
| `agentlens_replay` | Session Replay | Replay a past session as a structured timeline with numbered steps, context annotations, and cost accumulation. |
| `agentlens_benchmark` | A/B Benchmarking | Create, manage, and analyze A/B experiments comparing agent variants with statistical significance testing. |
| `agentlens_guardrails` | Guardrails | Create, list, and manage automated safety rules: conditions, actions, cooldowns, dry-run mode, and trigger history. |
These tools are automatically available when using the MCP server. Agents can also access the underlying REST API directly via the SDK:
```typescript
// Recall: semantic search over events and sessions
const results = await client.recall({ query: 'authentication errors', scope: 'events' });

// Reflect: analyze patterns
const analysis = await client.reflect({ analysis: 'error_patterns', agentId: 'my-agent' });

// Context: cross-session history
const context = await client.getContext({ topic: 'database migrations', limit: 5 });
```

AgentLens integrates with Lore for cross-agent memory and lesson sharing. Set `LORE_ENABLED=true` to enable lesson management in the dashboard. See the Lore Integration Guide for setup.
Auto-instrumentation across 9 LLM providers with a single `init()` call. (View cast file)

Quick links:

- Discovery & Delegation: register capabilities, discover agents, delegate tasks
| Package | Description | PyPI |
|---|---|---|
| `agentlensai` | Python SDK + auto-instrumentation for 9 LLM providers (OpenAI, Anthropic, LiteLLM, Bedrock, Vertex, Gemini, Mistral, Cohere, Ollama) | |
| Package | Description | npm |
|---|---|---|
| `@agentlensai/server` | Hono API server + dashboard serving | |
| `@agentlensai/mcp` | MCP server for agent instrumentation | |
| `@agentlensai/sdk` | Programmatic TypeScript client | |
| `@agentlensai/core` | Shared types, schemas, hash chain utilities | |
| `@agentlensai/cli` | Command-line interface | |
| `@agentlensai/dashboard` | React web dashboard (bundled with server) | private |
| Endpoint | Description |
|---|---|
| `POST /api/events` | Ingest events (batch) |
| `GET /api/events` | Query events with filters |
| `GET /api/sessions` | List sessions |
| `GET /api/sessions/:id/timeline` | Session timeline with hash chain verification |
| `GET /api/analytics` | Bucketed metrics over time |
| `GET /api/analytics/costs` | Cost breakdown by agent |
| `POST /api/alerts/rules` | Create alert rules |
| `GET /api/recall` | Semantic search over events and sessions |
| `GET /api/reflect` | Pattern analysis (errors, costs, tools, performance) |
| `GET /api/context` | Cross-session context retrieval |
| `POST /api/events/ingest` | Webhook ingestion (AgentGate/FormBridge) |
| `GET /api/agents/:id/health` | Agent health score with dimensions |
| `GET /api/health/overview` | Health overview for all agents |
| `GET /api/health/history` | Historical health snapshots |
| `GET /api/optimize/recommendations` | Cost optimization recommendations |
| `GET /api/sessions/:id/replay` | Session replay with context reconstruction |
| `POST /api/benchmarks` | Create a benchmark |
| `GET /api/benchmarks` | List benchmarks |
| `GET /api/benchmarks/:id` | Get benchmark detail |
| `PUT /api/benchmarks/:id/status` | Transition benchmark status |
| `GET /api/benchmarks/:id/results` | Get benchmark comparison results |
| `DELETE /api/benchmarks/:id` | Delete a benchmark |
| `POST /api/guardrails` | Create guardrail rule |
| `GET /api/guardrails` | List guardrail rules |
| `GET /api/guardrails/:id` | Get guardrail rule |
| `PUT /api/guardrails/:id` | Update guardrail rule |
| `DELETE /api/guardrails/:id` | Delete guardrail rule |
| `GET /api/guardrails/history` | List guardrail trigger history |
| `GET /api/agents/:id` | Get agent detail (includes `pausedAt`, `modelOverride`) |
| `POST /api/keys` | Create API keys |
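Any HTTP client works against these endpoints. Here is a quick sketch using Python's `requests`; the bearer-token auth style and the event payload fields are assumptions for illustration, so consult the API docs for the real schema:

```python
# Sketch only: auth header style and payload fields are assumptions.
import requests

BASE = "http://localhost:3400"
HEADERS = {"Authorization": "Bearer als_your_key"}

# Ingest a batch of events (payload shape illustrative)
requests.post(f"{BASE}/api/events", headers=HEADERS, json={
    "events": [{"sessionId": "sess-123", "type": "tool_call",
                "payload": {"tool": "search", "durationMs": 412}}],
}).raise_for_status()

# Query recent error events
errors = requests.get(f"{BASE}/api/events", headers=HEADERS,
                      params={"severity": "error", "limit": 20}).json()
print(len(errors.get("events", [])), "recent errors")
```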
The `@agentlensai/cli` package provides command-line access to key features:

```bash
npx @agentlensai/cli health                              # Overview of all agents
npx @agentlensai/cli health --agent my-agent             # Detailed health with dimensions
npx @agentlensai/cli health --agent my-agent --history   # Score trend over time

npx @agentlensai/cli optimize                            # Cost optimization recommendations
npx @agentlensai/cli optimize --agent my-agent --period 7
```

Both commands support `--format json` for machine-readable output. See `agentlens health --help` and `agentlens optimize --help` for all options.
```bash
# Clone and install
git clone https://github.com/amitpaz1/agentlens.git
cd agentlens
pnpm install

# Run all checks
pnpm typecheck
pnpm test
pnpm lint

# Start dev server
pnpm dev
```

Requirements:

- Node.js ≥ 20.0.0
- pnpm ≥ 10.0.0
We welcome contributions! See CONTRIBUTING.md for setup instructions, coding standards, and the PR process.
| Project | Description | |
|---|---|---|
| AgentLens | Observability & audit trail for AI agents | ← you are here |
| Lore | Cross-agent memory and lesson sharing | |
| AgentGate | Human-in-the-loop approval gateway | |
| FormBridge | Agent-human mixed-mode forms | |
| AgentEval | Testing & evaluation framework | |
| agentkit-mesh | Agent discovery & delegation | |
| agentkit-cli | Unified CLI orchestrator | |
| agentkit-guardrails | Reactive policy guardrails |













