πŸ” AgentLens

Open-source observability & audit trail for AI agents



🐳 Quick Start (Docker)

git clone https://github.com/amitpaz1/agentlens
cd agentlens
cp .env.example .env
docker compose up
# Open http://localhost:3000

For production (auth enabled, Stripe, TLS):

docker compose -f docker-compose.yml -f docker-compose.prod.yml up

[AgentLens demo]

AgentLens is a flight recorder for AI agents. It captures every LLM call, tool invocation, approval decision, and error β€” then presents it through a queryable API and real-time web dashboard.

Four ways to integrate β€” pick what fits your stack:

| Integration | Language | Effort | Capture |
|-------------|----------|--------|---------|
| πŸ€– OpenClaw Plugin | OpenClaw | Copy & enable | Every Anthropic call β€” prompts, tokens, cost, tools β€” zero code |
| 🐍 Python Auto-Instrumentation | Python | 1 line | Every OpenAI / Anthropic / LangChain call β€” deterministic |
| πŸ”Œ MCP Server | Any (MCP) | Config block | Tool calls, sessions, events from Claude Desktop / Cursor |
| πŸ“¦ SDK | Python, TypeScript | Code | Full control β€” log events, query analytics, build integrations |

✨ Key Features

  • 🐍 Python Auto-Instrumentation β€” call agentlensai.init() once and every LLM call across 9 providers (OpenAI, Anthropic, LiteLLM, Bedrock, Vertex, Gemini, Mistral, Cohere, Ollama) is captured automatically. Deterministic β€” no reliance on LLM behavior.
  • πŸ”Œ MCP-Native β€” Ships as an MCP server. Agents connect to it like any other tool. Works with Claude Desktop, Cursor, and any MCP client.
  • 🧠 LLM Call Tracking β€” Full prompt/completion visibility, token usage, cost aggregation, latency measurement, and privacy redaction.
  • πŸ“Š Real-Time Dashboard β€” Session timelines, event explorer, LLM analytics, cost tracking, and alerting in a beautiful web UI.
  • πŸ”’ Tamper-Evident Audit Trail β€” Append-only event storage with SHA-256 hash chains per session. Cryptographically linked and verifiable (see the sketch after this list).
  • πŸ’° Cost Tracking β€” Track token usage and estimated costs per session, per agent, per model, over time. Alert on cost spikes.
  • 🚨 Alerting β€” Configurable rules for error rate, cost threshold, latency anomalies, and inactivity.
  • πŸ”— AgentKit Ecosystem β€” First-class integrations with AgentGate (approval flows), FormBridge (data collection), Lore (cross-agent memory), and AgentEval (testing & evaluation).
  • πŸ”’ Tenant Isolation β€” Multi-tenant support with per-tenant data scoping, API key binding, and embedding isolation.
  • β€οΈβ€πŸ©Ή Health Scores β€” 5-dimension health scoring (error rate, cost efficiency, tool success, latency, completion rate) with trend tracking. Monitor agent reliability at a glance.
  • πŸ’‘ Cost Optimization β€” Complexity-aware model recommendation engine. Classifies LLM calls by complexity tier and suggests cheaper alternatives with projected savings.
  • πŸ“Ό Session Replay β€” Step-through any past session with full context reconstruction β€” LLM history, tool results, cost accumulation, and error tracking at every step.
  • βš–οΈ A/B Benchmarking β€” Statistical comparison of agent variants using Welch's t-test and chi-squared analysis across 8 metrics. Create experiments, collect data, get p-values.
  • πŸ›‘οΈ Guardrails β€” Automated safety rules that monitor error rates, costs, health scores, and custom metrics. Actions include pausing agents, sending webhooks, downgrading models, and applying AgentGate policies. Dry-run mode for safe testing.
  • πŸ”Œ Framework Plugins β€” Optional plugins for LangChain, CrewAI, AutoGen, and Semantic Kernel. Auto-detection, fail-safe, non-blocking instrumentation with zero code changes.
  • 🏠 Self-Hosted β€” SQLite by default, no external dependencies. MIT licensed. Your data stays on your infrastructure.
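
To make the tamper-evident hash chain concrete, here is a minimal sketch of the general technique: each event is hashed together with the previous link's hash, so altering any past event breaks every later link. The hash field name and the all-zero genesis value are illustrative assumptions, not AgentLens's exact wire format.

import hashlib
import json

def chain_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the previous link's hash."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def verify_chain(events: list[dict]) -> bool:
    """Recompute the chain and compare each stored hash ('hash' field assumed)."""
    prev = "0" * 64  # assumed genesis value for a fresh session
    for event in events:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["hash"] != chain_hash(prev, body):
            return False  # this event, or one before it, was altered
        prev = event["hash"]
    return True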

πŸ“Έ Dashboard

AgentLens ships with a real-time web dashboard for monitoring your agents.

Overview β€” At-a-Glance Metrics

The overview page shows live metrics β€” sessions, events, errors, and active agents β€” with a 24-hour event timeline chart, recent sessions with status badges (active/completed), and a recent errors feed. Everything updates in real time via SSE.

Sessions β€” Track Every Agent Run

The sessions table shows every agent session with sortable columns: agent name, status, start time, duration, event count, error count, and total cost. Filter by agent or status (Active / Completed / Error) to drill down.

Session Detail β€” Timeline & Hash Chain

Click into any session to see the full event timeline β€” every tool call, error, cost event, and session lifecycle event in chronological order. The green βœ“ Chain Valid badge confirms the tamper-evident hash chain is intact. Filter by event type (Tool Calls, Errors, Approvals, Custom). Cost breakdown shows token usage and spend.

Events Explorer β€” Search & Filter Everything

The events explorer gives you a searchable, filterable view of every event across all sessions. Filter by event type, severity, agent, or time range. Full-text search works on payload content. Each row shows the tool name, agent, session, severity level, and duration.
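
The same filters can be scripted against GET /api/events (listed in the API overview below). A minimal sketch, noting that the query parameter names and the bearer-token auth scheme here are assumptions to be checked against the API reference:

import requests

resp = requests.get(
    "http://localhost:3400/api/events",
    headers={"Authorization": "Bearer als_your_key"},
    # Hypothetical filter parameters mirroring the dashboard filters
    params={"type": "tool_call", "severity": "error", "agent": "my-agent", "limit": 50},
    timeout=10,
)
resp.raise_for_status()
for event in resp.json().get("events", []):  # response envelope assumed
    print(event)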

🧠 LLM Analytics β€” Prompt & Cost Tracking

The LLM Analytics page shows total LLM calls, cost, latency, and token usage across all agents. Cost and calls over time charts, plus a model comparison table breaking down usage by provider and model (Anthropic, OpenAI, Google). Filter by agent, provider, or model.

🧠 Session Timeline β€” LLM Call Pairing

LLM calls appear in the session timeline with 🧠 icons and indigo styling, paired with their completions by callId. Each node shows the model, message count, token usage (in/out), cost badge, and latency. Tool calls and LLM calls are interleaved chronologically β€” see exactly what the agent thought, then did.

πŸ’¬ Prompt Detail β€” Chat Bubble Viewer

Click any LLM call to see the full prompt and completion in a chat-bubble style viewer. System, user, assistant, and tool messages each get distinct styling. The metadata panel shows provider, model, parameters (temperature, max tokens), token breakdown (input/output/thinking/cache), cost, latency, tools provided to the model, and the tamper-evident hash chain.

β€οΈβ€πŸ©Ή Health Overview β€” Agent Reliability at a Glance

The Health Overview page shows a 5-dimension health score (0–100) for every agent: error rate, cost efficiency, tool success, latency, and completion rate. Each dimension is scored independently and combined into a weighted overall score. Trend arrows (↑ improving, β†’ stable, ↓ degrading) show direction over time. Click any agent to see a historical sparkline of its score.
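
As a toy illustration of how five dimension scores can combine into one weighted overall score; the weights below are invented for the example and are not AgentLens's actual weighting:

dimensions = {  # per-dimension scores, 0-100
    "error_rate": 92,
    "cost_efficiency": 78,
    "tool_success": 95,
    "latency": 84,
    "completion_rate": 88,
}
weights = {  # hypothetical weights summing to 1.0
    "error_rate": 0.30,
    "cost_efficiency": 0.15,
    "tool_success": 0.25,
    "latency": 0.15,
    "completion_rate": 0.15,
}
overall = sum(score * weights[name] for name, score in dimensions.items())
print(f"Overall health: {overall:.0f}/100")  # 89/100 with these numbers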

πŸ’‘ Cost Optimization β€” Model Recommendations

The Cost Optimization page analyzes your LLM call patterns and recommends cheaper model alternatives. Calls are classified by complexity tier (simple / moderate / complex), and the recommendation engine suggests where you can safely downgrade β€” e.g., "Switch gpt-4o β†’ gpt-4o-mini for SIMPLE tasks, saving $89/month." Confidence levels and success rate comparisons are shown for each recommendation.
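
Projected-savings figures of this kind come down to simple arithmetic over observed call volume. A worked example, with all volumes and per-token prices invented for illustration:

# All numbers below are invented for illustration.
calls_per_month = 30_000              # SIMPLE-tier calls observed on gpt-4o
avg_tokens_per_call = 1_500           # input + output combined
price_gpt4o = 5.00 / 1_000_000        # assumed blended $ per token
price_gpt4o_mini = 0.30 / 1_000_000   # assumed blended $ per token

savings = calls_per_month * avg_tokens_per_call * (price_gpt4o - price_gpt4o_mini)
print(f"Projected savings: ${savings:,.0f}/month")  # $212/month with these inputs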

πŸ“Ό Session Replay β€” Step-Through Debugger

Session Replay lets you step through any past session event by event with full context reconstruction. A scrubber/timeline control moves through steps chronologically. At each step, the context panel shows cumulative cost, LLM conversation history, tool call results, pending approvals, and error count. Filter by event type, jump to specific steps, or replay just the summary.

βš–οΈ Benchmarks β€” A/B Testing for Agents

The Benchmarks page lets you create and manage A/B experiments comparing agent variants. Define 2–10 variants with session tags, pick metrics (cost, latency, error rate, success rate, tokens, duration), and collect data. Results include per-variant statistics, Welch's t-test p-values, confidence stars (β˜… β˜…β˜… β˜…β˜…β˜…), and distribution charts. The full workflow β€” draft β†’ running β†’ completed β€” is managed from the dashboard.
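
For intuition about the statistics, this is what a Welch's t-test over two variants' per-session costs looks like with SciPy. The samples are made up, and this is a sketch of the statistic itself, not the platform's internal code:

from scipy import stats

variant_a = [0.42, 0.38, 0.51, 0.45, 0.40, 0.47]  # per-session cost, variant A
variant_b = [0.31, 0.35, 0.29, 0.38, 0.33, 0.30]  # per-session cost, variant B

# equal_var=False selects Welch's t-test, which tolerates unequal variances
t_stat, p_value = stats.ttest_ind(variant_a, variant_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # a small p suggests a real difference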

πŸ›‘οΈ Guardrails β€” Automated Safety Rules

The Guardrails page lets you create and manage automated safety rules that monitor error rates, costs, health scores, and custom metrics. Each rule has a condition, action, cooldown, and optional dry-run mode. The list shows trigger counts and last triggered time. Click any rule for the detail page with full configuration, runtime state, and trigger history. The Activity Feed shows a real-time log of all triggers across all rules with filtering by agent and rule.
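
A sketch of creating a rule via POST /api/guardrails (the endpoint appears in the API overview below); the payload field names and auth scheme shown here are assumptions to be checked against the API reference:

import requests

rule = {
    "name": "error-rate-circuit-breaker",      # field names assumed
    "agentId": "my-agent",
    "condition": {"metric": "error_rate", "operator": ">", "threshold": 0.25},
    "action": {"type": "pause_agent"},
    "cooldownSeconds": 600,
    "dryRun": True,  # log would-be triggers without acting, for safe testing
}
resp = requests.post(
    "http://localhost:3400/api/guardrails",
    headers={"Authorization": "Bearer als_your_key"},  # auth scheme assumed
    json=rule,
    timeout=10,
)
print(resp.status_code, resp.json())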

☁️ AgentLens Cloud

Don't want to self-host? AgentLens Cloud is a fully managed SaaS β€” same SDK, zero infrastructure:

import agentlensai

agentlensai.init(
    cloud=True,
    api_key="als_cloud_your_key_here",
    agent_id="my-agent",
)
# That's it β€” all LLM calls are captured and sent to the cloud
  • Same SDK, one parameter change β€” switch url= to cloud=True
  • Managed Postgres β€” multi-tenant with row-level security
  • Team features β€” organizations, RBAC, audit logs, usage billing
  • No server to run β€” dashboard at app.agentlens.ai

  • πŸ“– Cloud Setup Guide β€” sign up, create API key, verify first event
  • πŸ“– Migration Guide β€” move from self-hosted to cloud in 5 minutes
  • πŸ”§ Troubleshooting β€” common issues and how to fix them

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Your AI Agents                                                   β”‚
β”‚                                                                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Python App       β”‚  β”‚  MCP Client       β”‚  β”‚  TypeScript    β”‚  β”‚
β”‚  β”‚  (OpenAI,         β”‚  β”‚  (Claude Desktop, β”‚  β”‚  App           β”‚  β”‚
β”‚  β”‚   Anthropic,      β”‚  β”‚   Cursor, etc.)   β”‚  β”‚                β”‚  β”‚
β”‚  β”‚   LangChain)      β”‚  β”‚                   β”‚  β”‚                β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚           β”‚                     β”‚                       β”‚          β”‚
β”‚    agentlensai.init()    MCP Protocol (stdio)    @agentlensai/sdk  β”‚
β”‚    Auto-instrumentation         β”‚                       β”‚          β”‚
β”‚           β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚          β”‚
β”‚           β”‚              β”‚ @agentlensai/mcp β”‚           β”‚          β”‚
β”‚           β”‚              β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚          β”‚
β”‚           β”‚                     β”‚                       β”‚          β”‚
β”‚           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β”‚                                 β”‚                                  β”‚
β”‚                          HTTP REST API                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              @agentlensai/server                                  β”‚
β”‚                                                                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Ingest    β”‚ β”‚   Query    β”‚ β”‚   Alert    β”‚ β”‚  LLM         β”‚  β”‚
β”‚  β”‚  Engine    β”‚ β”‚   Engine   β”‚ β”‚   Engine   β”‚ β”‚  Analytics   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Recall    β”‚ β”‚  Reflect   β”‚ β”‚  Context   β”‚ β”‚  Guardrails  β”‚  β”‚
β”‚  β”‚ (Semantic) β”‚ β”‚ (Patterns) β”‚ β”‚ (X-Session)β”‚ β”‚  Engine      β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Health    β”‚ β”‚   Cost     β”‚ β”‚  Session   β”‚ β”‚  Benchmark   β”‚  β”‚
β”‚  β”‚  Scoring   β”‚ β”‚  Optimizer β”‚ β”‚  Replay    β”‚ β”‚  Engine      β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚               β”‚                                                   β”‚
β”‚        β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚        β”‚   SQLite    β”‚         β”‚  Dashboard  β”‚                   β”‚
β”‚        β”‚  (append    β”‚         β”‚  React SPA  β”‚                   β”‚
β”‚        β”‚   only)     β”‚         β”‚  (served    β”‚                   β”‚
β”‚        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚   at /)     β”‚                   β”‚
β”‚                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

  Integrations:  AgentGate ──┐
                 FormBridge ────► POST /api/events/ingest
                 Generic β”€β”€β”€β”€β”˜     (HMAC-SHA256 verified)
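
For reference, HMAC-SHA256 webhook verification generally looks like the sketch below: the receiver recomputes the digest over the raw request body with the shared secret and compares in constant time. The signature header name is an assumption, not AgentLens's documented contract.

import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, signature: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# e.g. in a webhook handler (header name 'X-Signature' is hypothetical):
# if not verify_signature(SECRET, request.body, request.headers["X-Signature"]):
#     return 401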

πŸš€ Quick Start

1. Start the Server

npx @agentlensai/server

Opens on http://localhost:3400 with SQLite β€” zero config.

2. Create an API Key

curl -X POST http://localhost:3400/api/keys \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'

Save the als_... key from the response β€” it's shown only once.

3. Instrument Your Agent

πŸ€– OpenClaw Plugin

If you're running OpenClaw, the AgentLens plugin captures every Anthropic API call automatically β€” prompts, completions, token usage, costs, latency, and tool calls. No proxy, no preload scripts, no code changes.

# Copy the plugin into OpenClaw's extensions directory
cp -r packages/openclaw-plugin /usr/lib/node_modules/openclaw/extensions/agentlens-relay

# Enable it
openclaw config patch '{"plugins":{"entries":{"agentlens-relay":{"enabled":true}}}}'

# Restart
openclaw gateway restart

That's it. Open the AgentLens dashboard and you'll see every LLM call with full prompt visibility, cost tracking, and tool call extraction.

Set AGENTLENS_URL if your AgentLens instance isn't on localhost:3000. See the plugin README for details.

🐍 Python Auto-Instrumentation

One line β€” every LLM call captured automatically. 9 providers supported: OpenAI, Anthropic, LiteLLM, AWS Bedrock, Google Vertex AI, Google Gemini, Mistral AI, Cohere, and Ollama.

pip install agentlensai[all-providers]   # all 9 providers
# or pick specific ones:
pip install agentlensai[openai]          # just OpenAI
pip install agentlensai[bedrock,ollama]  # Bedrock + Ollama

import agentlensai

agentlensai.init(
    url="http://localhost:3400",
    api_key="als_your_key",
    agent_id="my-agent",
)

# Every LLM call is now captured automatically (all installed providers)
import openai
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# ^ Logged: model, tokens, cost, latency, full prompt/completion

# Works with Anthropic too
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
# ^ Also captured automatically

# LangChain? Use the callback handler:
from agentlensai.integrations.langchain import AgentLensCallbackHandler
chain.invoke(input, config={"callbacks": [AgentLensCallbackHandler()]})

agentlensai.shutdown()  # flush remaining events

Key guarantees:

  • βœ… Deterministic β€” every call captured, not dependent on LLM choosing to log
  • βœ… Fail-safe β€” if the server is down, your code works normally
  • βœ… Non-blocking β€” events sent via background thread
  • βœ… Privacy β€” init(redact=True) strips content, keeps metadata
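
The privacy guarantee above is a one-argument change. A minimal sketch of metadata-only capture, using the same init() arguments shown earlier:

import agentlensai

# redact=True keeps metadata (model, tokens, cost, latency) but strips
# prompt and completion content before events leave the process
agentlensai.init(
    url="http://localhost:3400",
    api_key="als_your_key",
    agent_id="my-agent",
    redact=True,
)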

πŸ”Œ MCP Integration

For Claude Desktop, Cursor, or any MCP client β€” zero code changes:

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "agentlens": {
      "command": "npx",
      "args": ["@agentlensai/mcp"],
      "env": {
        "AGENTLENS_API_URL": "http://localhost:3400",
        "AGENTLENS_API_KEY": "als_your_key_here"
      }
    }
  }
}

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "agentlens": {
      "command": "npx",
      "args": ["@agentlensai/mcp"],
      "env": {
        "AGENTLENS_API_URL": "http://localhost:3400",
        "AGENTLENS_API_KEY": "als_your_key_here"
      }
    }
  }
}

πŸ“¦ Programmatic SDK

For full control β€” log events, query analytics, build integrations:

Python:

pip install agentlensai

from agentlensai import AgentLensClient

client = AgentLensClient("http://localhost:3400", api_key="als_your_key")
sessions = client.get_sessions()
analytics = client.get_llm_analytics()
print(f"Total cost: ${analytics.summary.total_cost_usd:.2f}")

# Health scores & optimization (v0.6.0+)
health = client.get_health("my-agent", window=7)
overview = client.get_health_overview()
history = client.get_health_history("my-agent", days=30)
recs = client.get_optimization_recommendations(period=7)

client.close()

TypeScript:

npm install @agentlensai/sdk

import { AgentLensClient } from '@agentlensai/sdk';

const client = new AgentLensClient({
  baseUrl: 'http://localhost:3400',
  apiKey: 'als_your_key',
});
const sessions = await client.getSessions();
const analytics = await client.getLlmAnalytics();

4. Open the Dashboard

Navigate to http://localhost:3400 β€” see sessions, timelines, analytics, and alerts in real time.

πŸ”Œ MCP Tools

AgentLens ships 13 MCP tools β€” 5 core observability tools, 3 for intelligence & analytics, and 5 for operations:

Core Observability

| Tool | Purpose | Description |
|------|---------|-------------|
| agentlens_session_start | Start Session | Begin a new observability session for an agent run. Returns a session ID for correlating subsequent events. |
| agentlens_log_event | Log Event | Record a custom event (tool call, error, approval, etc.) into the current session timeline. |
| agentlens_log_llm_call | Log LLM Call | Record an LLM call with model, messages, tokens, cost, and latency. Pairs with completions via callId. |
| agentlens_query_events | Query Events | Search and filter events across sessions by type, severity, agent, time range, and payload content. |
| agentlens_session_end | End Session | Close the current session, flush pending events, and finalize the hash chain. |

Intelligence & Analytics

| Tool | Purpose | Description |
|------|---------|-------------|
| agentlens_recall | Semantic Search | Search past events and sessions by meaning. Use before starting tasks to find relevant history. |
| agentlens_reflect | Pattern Analysis | Analyze behavioral patterns β€” recurring errors, cost trends, tool sequences, performance changes. |
| agentlens_context | Cross-Session Context | Retrieve topic-focused history with session summaries and key events ranked by relevance. |

Operations

| Tool | Purpose | Description |
|------|---------|-------------|
| agentlens_health | Health Scores | Check the agent's 5-dimension health score (0–100) with trend tracking. Dimensions: error rate, cost efficiency, tool success, latency, completion rate. |
| agentlens_optimize | Cost Optimization | Get model switch recommendations with projected monthly savings. Analyzes call complexity and suggests cheaper alternatives. |
| agentlens_replay | Session Replay | Replay a past session as a structured timeline with numbered steps, context annotations, and cost accumulation. |
| agentlens_benchmark | A/B Benchmarking | Create, manage, and analyze A/B experiments comparing agent variants with statistical significance testing. |
| agentlens_guardrails | Guardrails | Create, list, and manage automated safety rules β€” conditions, actions, cooldowns, dry-run mode, and trigger history. |

These tools are automatically available when using the MCP server. Agents can also access the underlying REST API directly via the SDK:

// Uses the `client` created in the TypeScript SDK section above

// Recall β€” semantic search over events and sessions
const results = await client.recall({ query: 'authentication errors', scope: 'events' });

// Reflect β€” analyze patterns
const analysis = await client.reflect({ analysis: 'error_patterns', agentId: 'my-agent' });

// Context β€” cross-session history
const context = await client.getContext({ topic: 'database migrations', limit: 5 });

πŸ”— Lore Integration (Optional)

AgentLens integrates with Lore for cross-agent memory and lesson sharing. Set LORE_ENABLED=true to enable lesson management in the dashboard.

See the Lore Integration Guide for setup.

🎬 v0.10.0 Multi-Provider Demo

Auto-instrumentation across 9 LLM providers with a single init() call. (View cast file)

πŸ“¦ Packages

Python (PyPI)

| Package | Description |
|---------|-------------|
| agentlensai | Python SDK + auto-instrumentation for 9 LLM providers (OpenAI, Anthropic, LiteLLM, Bedrock, Vertex, Gemini, Mistral, Cohere, Ollama) |

TypeScript / Node.js (npm)

| Package | Description |
|---------|-------------|
| @agentlensai/server | Hono API server + dashboard serving |
| @agentlensai/mcp | MCP server for agent instrumentation |
| @agentlensai/sdk | Programmatic TypeScript client |
| @agentlensai/core | Shared types, schemas, hash chain utilities |
| @agentlensai/cli | Command-line interface |
| @agentlensai/dashboard | React web dashboard (bundled with server; not published to npm) |

πŸ”Œ API Overview

| Endpoint | Description |
|----------|-------------|
| POST /api/events | Ingest events (batch) |
| GET /api/events | Query events with filters |
| GET /api/sessions | List sessions |
| GET /api/sessions/:id/timeline | Session timeline with hash chain verification |
| GET /api/analytics | Bucketed metrics over time |
| GET /api/analytics/costs | Cost breakdown by agent |
| POST /api/alerts/rules | Create alert rules |
| GET /api/recall | Semantic search over events and sessions |
| GET /api/reflect | Pattern analysis (errors, costs, tools, performance) |
| GET /api/context | Cross-session context retrieval |
| POST /api/events/ingest | Webhook ingestion (AgentGate/FormBridge) |
| GET /api/agents/:id/health | Agent health score with dimensions |
| GET /api/health/overview | Health overview for all agents |
| GET /api/health/history | Historical health snapshots |
| GET /api/optimize/recommendations | Cost optimization recommendations |
| GET /api/sessions/:id/replay | Session replay with context reconstruction |
| POST /api/benchmarks | Create a benchmark |
| GET /api/benchmarks | List benchmarks |
| GET /api/benchmarks/:id | Get benchmark detail |
| PUT /api/benchmarks/:id/status | Transition benchmark status |
| GET /api/benchmarks/:id/results | Get benchmark comparison results |
| DELETE /api/benchmarks/:id | Delete a benchmark |
| POST /api/guardrails | Create guardrail rule |
| GET /api/guardrails | List guardrail rules |
| GET /api/guardrails/:id | Get guardrail rule |
| PUT /api/guardrails/:id | Update guardrail rule |
| DELETE /api/guardrails/:id | Delete guardrail rule |
| GET /api/guardrails/history | List guardrail trigger history |
| GET /api/agents/:id | Get agent detail (includes pausedAt, modelOverride) |
| POST /api/keys | Create API keys |

Full API Reference β†’
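
As a sketch of pushing events directly to POST /api/events without an SDK: the endpoint and its batch semantics come from the table above, while the event schema, batch wrapper, and auth scheme shown here are assumptions to verify against the API reference.

import requests

events = [{
    "sessionId": "sess_123",               # field names assumed
    "type": "tool_call",
    "name": "search_docs",
    "timestamp": "2025-01-01T12:00:00Z",
    "payload": {"query": "refund policy"},
}]
resp = requests.post(
    "http://localhost:3400/api/events",
    headers={"Authorization": "Bearer als_your_key"},  # auth scheme assumed
    json={"events": events},                           # batch wrapper assumed
    timeout=10,
)
resp.raise_for_status()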

⌨️ CLI

The @agentlensai/cli package provides command-line access to key features:

npx @agentlensai/cli health                              # Overview of all agents
npx @agentlensai/cli health --agent my-agent             # Detailed health with dimensions
npx @agentlensai/cli health --agent my-agent --history   # Score trend over time
npx @agentlensai/cli optimize                            # Cost optimization recommendations
npx @agentlensai/cli optimize --agent my-agent --period 7

Both commands support --format json for machine-readable output. See agentlens health --help and agentlens optimize --help for all options.
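
Since both commands accept --format json, wiring them into a script is straightforward. A small sketch; the shape of the JSON output is not specified here, so treat the parsed value as opaque until inspected:

import json
import subprocess

result = subprocess.run(
    ["npx", "@agentlensai/cli", "health", "--format", "json"],
    capture_output=True, text=True, check=True,
)
health = json.loads(result.stdout)
print(health)  # inspect the structure before relying on specific fields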

πŸ› οΈ Development

# Clone and install
git clone https://github.com/amitpaz1/agentlens.git
cd agentlens
pnpm install

# Run all checks
pnpm typecheck
pnpm test
pnpm lint

# Start dev server
pnpm dev

Requirements

  • Node.js β‰₯ 20.0.0
  • pnpm β‰₯ 10.0.0

🀝 Contributing

We welcome contributions! See CONTRIBUTING.md for setup instructions, coding standards, and the PR process.

🧰 AgentKit Ecosystem

| Project | Description |
|---------|-------------|
| AgentLens | Observability & audit trail for AI agents ⬅️ you are here |
| Lore | Cross-agent memory and lesson sharing |
| AgentGate | Human-in-the-loop approval gateway |
| FormBridge | Agent-human mixed-mode forms |
| AgentEval | Testing & evaluation framework |
| agentkit-mesh | Agent discovery & delegation |
| agentkit-cli | Unified CLI orchestrator |
| agentkit-guardrails | Reactive policy guardrails |

πŸ“„ License

MIT Β© Amit Paz