AI-powered network diagnostics with two modes: Operator Mode (SSH to devices) and Consumer Mode (edge diagnostics).
Diagnose enterprise networks by connecting to routers and switches:
- Device status and metrics (CPU, memory, interface stats)
- Path finding between devices
- Trend analysis with breach prediction
- Anomaly detection (z-score, rate shifts, volatility)
- Root cause analysis with confidence scoring
- Config change correlation
- Multi-vendor support (Cisco IOS-XR, IOS-XE, NX-OS)
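The anomaly detection listed above combines several statistical signals; a minimal sketch of the z-score check, assuming a simple rolling window of past samples (the `z_score` and `is_anomaly` helpers are illustrative, not this project's API):

```python
from statistics import mean, stdev

def z_score(history: list[float], current: float) -> float:
    """Standard score of `current` against a window of past samples."""
    if len(history) < 2:
        return 0.0
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0
    return (current - mu) / sigma

def is_anomaly(history: list[float], current: float, threshold: float = 2.0) -> bool:
    """Flag samples more than `threshold` standard deviations from the mean."""
    return abs(z_score(history, current)) > threshold

history = [10.0, 11.0, 9.5, 10.5, 10.0]   # e.g. recent CPU% samples
print(is_anomaly(history, 10.2))  # → False (within normal range)
print(is_anomaly(history, 25.0))  # → True (spike)
```

The default threshold of 2.0 matches the `z_score_threshold` shown in the topology file's `anomaly` section.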
Diagnose your home/office network without device access. See CONSUMER_MODE.md for the full consumer guide (tools, baseline workflow, use cases).
- Web dashboard – Browser UI with "My Connection" overview, consumer tools, guest sessions, and per-identity baselines
- Gateway health checks
- DNS resolution timing
- Traceroute with hop analysis
- WiFi signal quality (macOS/Linux/Windows)
- Baseline tracking with anomaly detection (per-identity when using the dashboard)
- Provider context (BGP/AS lookup, outage correlation)
- Continuous monitoring agent with intent system
- Speedtest integration
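Several of these checks need nothing beyond the standard library; for example, DNS resolution timing can be measured like this (a rough sketch — `time_dns_ms` is illustrative, not a tool shipped by this project):

```python
import socket
import time

def time_dns_ms(hostname: str) -> float:
    """Wall-clock time, in milliseconds, to resolve a hostname."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, None)
    return (time.perf_counter() - start) * 1000.0

elapsed = time_dns_ms("localhost")
print(f"resolved in {elapsed:.1f} ms")
```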
```bash
# Install from source
git clone https://github.com/vedevpatel/mcp-network-diagnostics.git
cd mcp-network-diagnostics
uv sync
```

One command to run the dashboard, then open the app in your browser—no API key or sign-in required:
```bash
uv run python -m mcp_network.dashboard
# Open http://localhost:8080
```

Use the site as a guest (the session is identified by a signed cookie; baselines and data are scoped per identity). From the Overview you get a live "My Connection" view (gateway, DNS, internet latency). From Tools you can run "Check my connection", "Trace path", "Why is it slow?", and baseline record/compare.
To use the same diagnostics from Claude (or any MCP client), point your Claude Desktop config at this repo and run check_my_connection() or why_is_it_slow("zoom.us") via MCP.
For the full consumer guide—all tools, baseline workflow, and use cases—see CONSUMER_MODE.md.
Run the dashboard for a browser-based "check my connection" experience—no MCP or API key required:
```bash
uv run python -m mcp_network.dashboard
# Open http://localhost:8080
```

- Overview – "My Connection" live status (gateway, DNS, latency).
- Tools – Consumer tools (Check my connection, Trace path, Why is it slow?, Record/compare baseline, etc.) plus operator and agent tools when configured.
- Guest session – A signed cookie identifies your session; baselines and data are scoped per identity. The header shows "Using as guest" (optional "Sign in" for future use).
- Rate limits – Per-guest limit (default 60 requests/min). Set `CONSUMER_RATE_LIMIT_PER_MINUTE` to override.
- Optional auth – Set `MCP_NETWORK_DASHBOARD_REQUIRE_AUTH=1` to require an API key for Tools and Settings.
With Docker: `docker compose up -d`, then open http://localhost:8080.
Add the MCP server to Claude Desktop by editing ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows).
Diagnose your own network — no credentials, no config files:
```json
{
  "mcpServers": {
    "network-diagnostics": {
      "command": "/path/to/uv",
      "args": ["--directory", "/path/to/mcp-network-diagnostics", "run", "mcp-network"]
    }
  }
}
```

Try device diagnostics with a fake 10-router topology:
```json
{
  "mcpServers": {
    "network-diagnostics": {
      "command": "/path/to/uv",
      "args": ["--directory", "/path/to/mcp-network-diagnostics", "run", "mcp-network", "--collector", "simulated"]
    }
  }
}
```

Restart Claude Desktop after editing the config.
| Tool | Use Case |
|---|---|
| `check_my_connection()` | Quick health check — WiFi, gateway, DNS, internet latency |
| `why_is_it_slow("zoom.us")` | Diagnose slow connections — pinpoints bottleneck location |
| `trace_path("8.8.8.8")` | Traceroute with AS/provider info per hop |
| `scan_local_network()` | List devices on your LAN (from ARP table) |
| `record_baseline()` / `compare_to_baseline()` | Track normal behavior, detect anomalies |
| `set_intent("Zoom should stay under 100ms")` | Continuous monitoring with natural language goals |
Operator-only tools (require `--collector simulated` or an SSH/Prometheus collector):
| Tool | Use Case |
|---|---|
| `list_devices()` / `get_device_status("R1")` | View topology and device health |
| `diagnose_latency("R1", "R5")` | AI-powered hop-by-hop latency diagnosis |
| `predict_trends()` | Forecast metric breaches (needs 5+ samples) |
| `detect_anomalies()` | Statistical anomaly detection across devices |
```bash
mcp-network --collector simulated
```

Generates a fake 10-router topology for demos. Try:

- `get_device_status("R1")`
- `diagnose_latency("R1", "R5")`
- `predict_trends()` – after calling `refresh_metrics()` 5+ times
```bash
export DEVNET_IOSXE_USERNAME=developer
export DEVNET_IOSXE_PASSWORD=C1sco12345
export DEVNET_NXOS_USERNAME=admin
export DEVNET_NXOS_PASSWORD=RG!_Yw200

mcp-network --collector ssh --topology-file iosxe_topology.yaml
```

Claude Desktop:
```json
{
  "mcpServers": {
    "network-diagnostics": {
      "command": "/path/to/uv",
      "args": [
        "--directory", "/path/to/mcp-network-diagnostics",
        "run", "mcp-network",
        "--collector", "ssh",
        "--topology-file", "/path/to/iosxe_topology.yaml"
      ],
      "env": {
        "DEVNET_IOSXE_USERNAME": "developer",
        "DEVNET_IOSXE_PASSWORD": "C1sco12345",
        "DEVNET_NXOS_USERNAME": "admin",
        "DEVNET_NXOS_PASSWORD": "RG!_Yw200"
      }
    }
  }
}
```

```bash
docker run -d -p 9090:9090 prom/prometheus
docker run -d -p 9100:9100 prom/node-exporter

mcp-network --collector prometheus \
    --prometheus-url http://localhost:9090 \
    --topology-file network_topology.yaml
```

| Tool | Description |
|---|---|
| `check_my_connection()` | Gateway ping, DNS, WiFi stats, speedtest check |
| `why_is_it_slow(target)` | Diagnose latency issues to a destination |
| `trace_path(target)` | Traceroute with AS/provider enrichment |
| `record_baseline()` | Start baseline tracking (auto-records over time) |
| `compare_to_baseline()` | Detect anomalies vs historical normal |
| `clear_baseline()` | Reset baseline data |
| `run_speedtest()` | Bandwidth test (requires speedtest-cli) |
| `scan_local_network()` | List devices on your LAN (from ARP table) |
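The baseline comparison above amounts to checking current readings against recorded norms; a simplified sketch, assuming baselines are just per-metric sample lists (all names and the 2x degradation factor here are illustrative):

```python
from statistics import mean

def compare_to_baseline(baseline: dict[str, list[float]],
                        current: dict[str, float],
                        factor: float = 2.0) -> dict[str, str]:
    """Mark each metric 'ok' or 'degraded' relative to its baseline mean."""
    report = {}
    for metric, samples in baseline.items():
        if metric not in current or not samples:
            continue
        normal = mean(samples)
        report[metric] = "degraded" if current[metric] > factor * normal else "ok"
    return report

baseline = {"gateway_ms": [2.1, 1.9, 2.3], "dns_ms": [14.0, 15.5, 13.8]}
print(compare_to_baseline(baseline, {"gateway_ms": 2.2, "dns_ms": 95.0}))
# → {'gateway_ms': 'ok', 'dns_ms': 'degraded'}
```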
Set network goals in natural language and let the agent watch for violations:
```python
# Start monitoring
set_intent("Zoom calls should never lag")
set_intent("Alert me if gaming latency exceeds 50ms")
set_intent("My connection should stay close to baseline")

# Check status
agent_status()
list_intents()

# View incidents
get_incidents()

# Stop when done
stop_agent()
```

The agent:
- Monitors every 60s (configurable)
- Parses natural language goals → structured intents
- Auto-diagnoses violations
- Tracks baselines automatically
- Applies an alert cooldown to prevent spam
| Tool | Description |
|---|---|
| `get_device_status(device_id)` | CPU, memory, interface stats, health |
| `list_devices()` | All devices in topology |
| `diagnose_latency(src, dst)` | Intelligent path diagnosis |
| `find_path(src, dst)` | Shortest path between devices |
| `refresh_metrics()` | Update metrics (simulated collector only) |
| `predict_trends()` | Forecast metric breaches (5+ samples) |
| `detect_anomalies()` | Statistical anomaly detection |
| `analyze_root_cause(device_id, metric)` | Config change correlation |
```
┌─────────────────────────────────────────────────────────┐
│                    MCP Server (stdio)                   │
├─────────────────────────────────────────────────────────┤
│                   Intelligence Layer                    │
│  • Path finding   • Trend analysis  • Anomaly detection │
│  • Root cause     • Intent parsing  • Context enrichment│
├─────────────────────────────────────────────────────────┤
│                     Data Collection                     │
├──────────────────┬──────────────────────────────────────┤
│  Operator Mode   │  Consumer Mode                       │
│  • SSH           │  • EdgeCollector (ping/trace/DNS)    │
│  • Prometheus    │  • BaselineStorage (ring buffers)    │
│  • Simulated     │  • NetworkAgent (continuous)         │
└──────────────────┴──────────────────────────────────────┘
```
All operator mode collectors use YAML topology files:
```yaml
devices:
  - id: my-router                      # Unique ID for tool calls
    type: router                       # router or switch
    device_type: iosxe                 # iosxr, iosxe, nxos (SSH only)
    host: 192.168.1.1
    username: ${MY_USER}               # ${VAR} = env variable
    password: ${MY_PASS}
    port: 22
    interfaces:
      - name: GigabitEthernet0/0/0
        prometheus_name: GigE0_0_0     # Prometheus only

links:
  - src_device: my-router
    src_interface: GigabitEthernet0/0/0
    dst_device: other-router
    dst_interface: GigabitEthernet1
    default_latency_ms: 2.0

thresholds:                            # Optional, overrides defaults
  cpu: 80.0
  memory: 85.0
  utilization: 80.0
  errors: 100
  anomaly:
    z_score_threshold: 2.0
    rate_shift_threshold: 3.0
```

Environment variable substitution: `${VAR_NAME}` is replaced with the value of the `VAR_NAME` environment variable at startup.

`.local.yaml` convention: Files matching `*_topology.local.yaml` are gitignored so credentials stay out of version control.
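The `${VAR}` substitution can be done with a single regex pass over the file text before YAML parsing; a sketch of that mechanic (the `substitute_env` helper and its leave-unknowns-alone behavior are illustrative):

```python
import os
import re

def substitute_env(text: str) -> str:
    """Replace ${VAR} with os.environ['VAR']; leave unknown variables untouched."""
    def repl(m: re.Match) -> str:
        return os.environ.get(m.group(1), m.group(0))
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", repl, text)

os.environ["MY_USER"] = "developer"
print(substitute_env("username: ${MY_USER}"))    # → username: developer
print(substitute_env("password: ${UNSET_VAR}"))  # unknown var left as-is
```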
The MCP server supports two transports:
| Transport | Use Case | How to Connect |
|---|---|---|
| stdio (default) | Claude Desktop, local MCP clients | Add to claude_desktop_config.json |
| streamable-http | Remote API access, web integrations, multi-client | HTTP endpoint at http://host:port/mcp |
Default for Claude Desktop. The MCP client spawns the server as a subprocess and communicates over stdin/stdout.
```bash
# Run directly (for testing)
mcp-network --collector simulated

# Claude Desktop config points to the command
```

For remote access, or when multiple clients need to connect to the same server:
```bash
mcp-network --transport streamable-http --port 8000 --path /mcp
# Endpoint: http://localhost:8000/mcp
```

Clients connect via HTTP POST to the `/mcp` endpoint using the MCP JSON-RPC protocol.
For production HTTP MCP deployments:
```bash
# Start HTTP MCP server (no auth)
uv run mcp-network --transport streamable-http --port 8000 --path /mcp

# With authentication required
uv run mcp-network --transport streamable-http --port 8000 --require-auth
```

When `--require-auth` is set, clients must provide an API key:
```bash
curl -X POST http://localhost:8000/mcp \
  -H "Authorization: Bearer mcp_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```

Create API keys via:

- The `create_api_key` MCP tool (superuser role)
- The dashboard Settings page (when using an API keys file)
- Directly in `~/.mcp_network/api_keys.json`
Roles: `consumer` (edge tools only) → `operator` (+ device access) → `admin` (+ agent control) → `superuser` (full access). See SECURITY.md for details.
- Per-key limits: based on role (consumer: 60/min, operator: 120/min, admin: 300/min)
- Global limit: 1000 req/min total (override with the `MCP_NETWORK_GLOBAL_RPM` env var)
- Exceeded limits return HTTP 429 with a `Retry-After` header
```bash
# Build and run
docker compose up -d

# Access dashboard at http://localhost:8080
```

For HTTP MCP behind a reverse proxy:
```yaml
# docker-compose.override.yml
services:
  mcp-http:
    build: .
    command: ["uv", "run", "mcp-network", "--transport", "streamable-http", "--port", "8000", "--require-auth"]
    ports:
      - "8000:8000"
    environment:
      - MCP_NETWORK_SESSION_SECRET=${SESSION_SECRET}
    volumes:
      - ./api_keys.json:/root/.mcp_network/api_keys.json:ro
```

Put nginx or Caddy in front for TLS termination:
```nginx
# nginx.conf snippet
location /mcp {
    proxy_pass http://mcp-http:8000/mcp;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```

When deploying HTTP MCP:
- Always use TLS in production (terminate at the reverse proxy)
- Set a strong session secret: `export MCP_NETWORK_SESSION_SECRET=$(openssl rand -hex 32)`
- Configure CORS: `MCP_NETWORK_CORS_ORIGINS=https://your-app.com` (defaults to same-origin only)
- SSRF protection: consumer tools validate destinations to prevent internal network scanning
- Command injection: the SSH collector validates that commands are read-only (`show`, `display`, etc.)
- Rate limiting: enabled by default; tune via env vars
See SECURITY.md for the full threat model and security architecture.
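The read-only guard on SSH commands amounts to an allowlist on the command verb; a minimal sketch (the prefix list is illustrative — the source only names `show` and `display` explicitly):

```python
READ_ONLY_PREFIXES = ("show", "display")  # illustrative allowlist

def is_read_only(command: str) -> bool:
    """Accept only commands whose first word is on the read-only allowlist."""
    words = command.strip().lower().split()
    return bool(words) and words[0] in READ_ONLY_PREFIXES

print(is_read_only("show ip interface brief"))  # → True
print(is_read_only("configure terminal"))       # → False
```

Checking the first token (rather than substring matching) avoids false positives like a destructive command that merely mentions `show` in an argument.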
```bash
# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest tests/

# Type checking
mypy src/

# Linting
ruff check src/
```

While the dashboard is running (`uv run python -m mcp_network.dashboard`):
```bash
# Test all dashboard endpoints, rate limiting, session handling
./test_dashboard_curl.sh

# Test with a lower rate limit for faster rate-limit verification
CONSUMER_RATE_LIMIT_PER_MINUTE=10 ./test_dashboard_curl.sh
```

- docs/MCP_QUICKSTART.md — 5-minute guide to get Claude Desktop connected
- examples/mcp_client_example.py — Programmatic MCP client usage
- examples/http_mcp_examples.sh — curl examples for HTTP MCP API
- 359 tests covering all collectors, tools, and analysis
- Unit tests for algorithms (trend, anomaly, pathfinding)
- Integration tests for MCP tools
- Cross-platform edge collector tests (macOS/Linux/Windows)
- Read-only - No device configuration changes
- Static topology - Define devices/links in YAML (no auto-discovery)
- DevNet credentials - Rotate periodically; refresh from developer.cisco.com if SSH fails
- Consumer mode limitations - Traceroute requires root/admin on some platforms
- Agent persistence - Runs within MCP server process; stops on server restart
```
src/mcp_network/
├── dashboard/             # Web UI (consumer + operator views)
│   ├── app.py             # FastAPI app, session middleware
│   ├── session.py         # Guest session (signed cookie)
│   ├── consumer_limits.py # Per-identity rate limits
│   ├── routes/            # Overview, tools, devices, incidents, etc.
│   └── templates/         # Jinja2 HTML
├── collectors/            # Data collection backends
│   ├── simulated.py       # Fake topology for testing
│   ├── ssh.py             # Cisco SSH collector
│   ├── prometheus.py      # Prometheus metrics
│   └── edge.py            # Consumer mode diagnostics
├── graph/                 # Path finding & analysis
├── trends/                # Time-series analysis
│   ├── analyzer.py        # Breach prediction
│   └── anomaly.py         # Statistical detection
├── context/               # External enrichment
│   ├── bgp.py             # AS lookup via Team Cymru
│   └── outages.py         # Provider status
├── agent/                 # Continuous monitoring
│   ├── core.py            # NetworkAgent loop
│   └── intents.py         # Natural language parsing
├── baseline/              # Consumer baseline tracking
└── tools/                 # MCP tool implementations
```
1. Check connection health
→ check_my_connection()
2. Diagnose a slow service
→ why_is_it_slow("netflix.com")
3. Investigate routing
→ trace_path("8.8.8.8")
(Shows AS numbers, provider info, latency per hop)
4. Establish baseline
→ record_baseline()
(Run check_my_connection() 5+ times over days)
5. Detect anomalies
→ compare_to_baseline()
(Shows if current latency is 2x+ worse)
6. Continuous monitoring
→ set_intent("Zoom should stay under 100ms")
→ agent_status() # Check every minute
1. View topology
→ list_devices()
2. Check device health
→ get_device_status("R1")
3. Find path
→ find_path("R1", "R5")
4. Diagnose latency
→ diagnose_latency("R1", "R5")
(AI analyzes hop-by-hop, identifies bottlenecks)
5. Track trends (simulated only)
→ refresh_metrics() x5
→ predict_trends()
(Shows if CPU will breach in 12 minutes)
6. Detect anomalies
→ refresh_metrics() x10
→ detect_anomalies()
(Z-score spikes, rate shifts, volatility changes)
7. Root cause
→ analyze_root_cause("R2", "cpu")
(Correlates with config changes, health events)
Built with: