This is a project I had been wanting to build for some time. Can we transform security reconnaissance data into prioritized, explainable insights using a knowledge graph and a local LLM? Yes, we can!
ExposureGraph is an open-source exposure management tool that:
- Discovers your attack surface using reconnaissance tools (subfinder, httpx)
- Stores asset relationships in a Neo4j Knowledge Graph
- Calculates explainable risk scores with transparent factors
- Enables natural language queries via local LLM (Ollama)
- Visualizes your security posture through an interactive Streamlit dashboard
- Integrates with AI agents via MCP (Model Context Protocol) server
Security teams drown in scanner output but lack context. A vulnerability on a forgotten staging server is different from one on your payment API. ExposureGraph transforms raw data into prioritized, explainable intelligence.
| Problem | ExposureGraph Solution |
|---|---|
| Scanner output is flat lists | Knowledge Graph captures relationships |
| Risk scores are black boxes | Every score has explainable factors |
| Queries require Cypher/SQL knowledge | Ask questions in natural language |
| Data stays in spreadsheets | Interactive dashboard with drill-down |
| AI assistants can't access your data | MCP server exposes tools natively |
```mermaid
flowchart TB
    subgraph Collection["Data Collection"]
        SF[subfinder]
        HX[httpx]
    end
    subgraph Storage["Knowledge Graph"]
        NEO[(Neo4j)]
    end
    subgraph Analysis["Risk Analysis"]
        SCORE[RiskCalculator]
        LLM[Ollama LLM]
    end
    subgraph Interface["User Interface"]
        CLI[CLI Tools]
        DASH[Streamlit Dashboard]
        CHAT[Chat Interface]
    end
    subgraph Agentic["AI Agent Integration"]
        MCP[MCP Server]
        CC[Claude Code / MCP Client]
    end

    SF -->|subdomains| NEO
    HX -->|services| NEO
    NEO --> SCORE
    SCORE -->|risk_score + factors| NEO
    NEO <--> LLM
    LLM --> CHAT
    NEO --> DASH
    NEO --> CLI
    NEO <--> MCP
    MCP <-->|stdio| CC

    style NEO fill:#4CAF50,color:#fff
    style LLM fill:#2196F3,color:#fff
    style DASH fill:#9C27B0,color:#fff
    style MCP fill:#FF9800,color:#fff
    style CC fill:#FF9800,color:#fff
```
```
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  subfinder  │────▶│    Neo4j    │◀────│  Streamlit  │
│    httpx    │      │  Knowledge  │      │  Dashboard  │
│             │      │    Graph    │      │   + Chat    │
└─────────────┘      └──────┬──────┘      └─────────────┘
                            │
                 ┌──────────┼──────────┐
                 │          │          │
          ┌──────▼──────┐   │   ┌──────▼──────┐
          │   Ollama    │   │   │     MCP     │
          │  Llama 3.1  │   │   │   Server    │
          └─────────────┘   │   └──────┬──────┘
                            │          │
                            │   ┌──────▼──────┐
                            │   │ Claude Code │
                            │   │ / MCP Client│
                            │   └─────────────┘
```
```
(:Domain {name: "acme-corp.com"})
        │
        │ [:HAS_SUBDOMAIN]
        ▼
(:Subdomain {fqdn: "api.acme-corp.com"})
        │
        │ [:HOSTS]
        ▼
(:WebService {url, status_code, risk_score, risk_factors, ...})
```
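Concretely, an ingest step can upsert this model with a single parameterized MERGE chain. A minimal sketch, assuming a hypothetical helper; the function name is an illustration, not the actual API of the client in `src/graph/`:

```python
# Hypothetical sketch of how an ingest step could upsert the model above
# into Neo4j. The real graph client in src/graph/ may differ.

def upsert_service_cypher(domain: str, fqdn: str, url: str,
                          status_code: int) -> tuple[str, dict]:
    """Build a parameterized MERGE chain for Domain -> Subdomain -> WebService."""
    cypher = (
        "MERGE (d:Domain {name: $domain}) "
        "MERGE (s:Subdomain {fqdn: $fqdn}) "
        "MERGE (d)-[:HAS_SUBDOMAIN]->(s) "
        "MERGE (w:WebService {url: $url}) "
        "SET w.status_code = $status_code "
        "MERGE (s)-[:HOSTS]->(w)"
    )
    params = {"domain": domain, "fqdn": fqdn,
              "url": url, "status_code": status_code}
    return cypher, params

if __name__ == "__main__":
    q, p = upsert_service_cypher("acme-corp.com", "api.acme-corp.com",
                                 "https://api.acme-corp.com", 200)
    print(q)
```

Because MERGE is idempotent, re-running a scan updates existing nodes rather than duplicating them.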
- Subdomain Discovery - Passive reconnaissance via subfinder
- HTTP Fingerprinting - Service detection with httpx (status, title, server, technologies)
- Knowledge Graph Storage - Neo4j with relationship modeling
- Explainable Risk Scoring - Transparent factors (non-production, version disclosure, outdated tech)
- Natural Language Queries - Ask questions, get Cypher + summaries
- Local LLM - Privacy-preserving with Ollama (no cloud API calls)
- Interactive Dashboard - Metrics, charts, filterable asset list
- Chat Interface - Conversational queries with suggested questions
- Demo Mode - Realistic seed data for demonstrations
- MCP Server - 7 tools + 2 resources for AI agent integration (Claude Code, etc.)
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core application |
| Graph DB | Neo4j 5.x Community | Knowledge graph storage |
| LLM | Ollama + Llama 3.1 (8B) | Natural language queries |
| UI | Streamlit | Dashboard and chat |
| Charts | Plotly | Interactive visualizations |
| CLI | Typer + Rich | Command-line interface |
| Recon | subfinder, httpx | Asset discovery |
| MCP | mcp[cli] (FastMCP) | AI agent integration |
| Containers | Docker Compose | Infrastructure |
- Docker Desktop installed
- Python 3.11+ with pip
- Ollama installed (optional, for real LLM queries)
```bash
git clone https://github.com/yourusername/exposure-graph.git
cd exposure-graph

# Install Python dependencies
pip install -r requirements.txt

# Start Neo4j
docker-compose up -d
# Wait ~30 seconds for Neo4j to initialize
# Verify: open http://localhost:7474 in browser

# Seed demo data
python scripts/seed_demo.py
# Or use Docker profile for one-command setup:
# docker-compose --profile demo up

# Launch the dashboard
streamlit run src/ui/app.py
# Open http://localhost:8501 in browser
```

Dashboard Page: View metrics, risk distribution, and top risky assets
Assets Page: Search and filter services, drill into risk breakdowns
Chat Page: Ask questions like:
- "What are our riskiest assets?"
- "Show me staging servers"
- "What services run nginx?"
```bash
# Scan a domain (must be in ALLOWED_TARGETS)
python scripts/run_scan.py scan scanme.sh

# Check graph status
python scripts/run_scan.py status

# With Ollama running
python scripts/query.py "What are the riskiest assets?"

# Mock mode (no Ollama needed)
python scripts/query.py --mock "Show staging servers"

# See example queries
python scripts/query.py examples
```

ExposureGraph includes an MCP (Model Context Protocol) server that lets AI agents like Claude Code query the knowledge graph, calculate risk scores, and generate reports directly.
| Tool | Description |
|---|---|
| `get_risk_overview` | Dashboard-level stats and risk distribution |
| `get_risky_assets` | Top N riskiest services with filters |
| `get_assets_for_domain` | Subdomains and services for a domain |
| `calculate_risk_score` | What-if risk scoring for any URL |
| `run_cypher_query` | Read-only Cypher queries (write ops blocked) |
| `query_graph` | Natural language questions (requires Ollama) |
| `generate_risk_report` | Executive or technical risk reports |
| URI | Description |
|---|---|
| `exposuregraph://schema` | Neo4j node/relationship schema |
| `exposuregraph://scoring-model` | Risk scoring model documentation |
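The `run_cypher_query` tool blocks write operations. One plausible way to enforce that is a keyword guard over the submitted Cypher. This is a sketch only, not necessarily how the server actually implements the check:

```python
import re

# Illustrative read-only guard for a tool like run_cypher_query.
# The real MCP server's validation may differ.
FORBIDDEN = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

def is_read_only(cypher: str) -> bool:
    """Reject any query containing a write or DDL clause."""
    return not FORBIDDEN.search(cypher)

if __name__ == "__main__":
    print(is_read_only("MATCH (w:WebService) RETURN w.url LIMIT 10"))  # True
    print(is_read_only("MATCH (w) DETACH DELETE w"))                   # False
```

A denylist like this is simple but coarse; a stricter approach is to run queries through a session with read-only access at the database level.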
The project includes `.mcp.json` for automatic Claude Code integration. After cloning:

- Install dependencies: `pip install -r requirements.txt`
- Start Neo4j: `docker-compose up -d`
- Seed demo data: `python scripts/seed_demo.py seed --clear`
- Restart Claude Code; the MCP server is auto-discovered
```bash
mcp dev src/mcp/server.py
```

This opens a browser UI where you can invoke each tool interactively.
Once connected, ask questions naturally:
- "What is the current risk overview of our attack surface?"
- "Show me assets with risk scores above 70"
- "What subdomains does acme-corp.com have?"
- "Calculate the risk for http://staging.example.com with status 200 and nginx/1.0.5"
- "Generate an executive risk report"
- "Run a Cypher query to find all services running nginx"
Environment variables (or `.env` file):

```bash
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=            # Empty for local dev

# Ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b

# Security
ALLOWED_TARGETS=scanme.sh,example.com

# Development
MOCK_LLM=false             # Set true to skip Ollama
LOG_LEVEL=INFO
```

Every asset receives a score from 0-100 with transparent contributing factors:
| Factor | Points | Condition |
|---|---|---|
| Base Score | 20 | Every exposed service |
| Live Service | +30 | HTTP 200 response |
| Non-Production | +15 | URL contains staging/dev/test/uat |
| Version Disclosure | +10 | Server header reveals version |
| Outdated Technology | +20 | Known EOL software detected |
| No HTTPS | +15 | Unencrypted HTTP |
| Directory Listing | +10 | "Index of" in page title |
Example Breakdown:

```
https://staging.api.example.com
Score: 75/100
  +20  Base Score
  +30  Live Service (HTTP 200)
  +15  Non-Production ("staging" in URL)
  +10  Version Disclosure (nginx/1.18.0)
```
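The factor table maps directly to an additive scorer. A minimal sketch, assuming simple substring and regex heuristics and omitting the outdated-technology check (which needs an EOL list); the real RiskCalculator in `src/scoring/` may differ:

```python
import re

# Illustrative additive scorer for the factor table above; assumptions:
# substring matching for non-production tags, a version-number regex for
# the server header. Not the project's actual RiskCalculator.
NON_PROD = ("staging", "dev", "test", "uat")

def score(url: str, status_code: int,
          server: str = "", title: str = "") -> tuple[int, list[str]]:
    factors = [("Base Score", 20)]
    if status_code == 200:
        factors.append(("Live Service (HTTP 200)", 30))
    if any(tag in url for tag in NON_PROD):
        factors.append(("Non-Production", 15))
    if re.search(r"/[\d.]+", server):          # e.g. "nginx/1.18.0"
        factors.append(("Version Disclosure", 10))
    if not url.startswith("https://"):
        factors.append(("No HTTPS", 15))
    if "index of" in title.lower():
        factors.append(("Directory Listing", 10))
    total = min(100, sum(points for _, points in factors))
    return total, [f"+{points} {name}" for name, points in factors]

if __name__ == "__main__":
    total, breakdown = score("https://staging.api.example.com", 200,
                             server="nginx/1.18.0")
    print(total)  # 75, matching the breakdown above
```

Because each factor is recorded alongside its points, the breakdown shown in the dashboard falls out of the calculation for free.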
```
exposure-graph/
├── README.md            # This file
├── DISCLAIMER.md        # Legal disclaimer
├── LICENSE              # MIT License
├── .mcp.json            # MCP server config for Claude Code
├── docker-compose.yml   # Neo4j + demo seeder
├── requirements.txt     # Python dependencies
├── config.py            # Pydantic settings
│
├── src/
│   ├── collectors/      # subfinder, httpx wrappers
│   ├── graph/           # Neo4j client and models
│   ├── scoring/         # Risk calculator
│   ├── ai/              # LLM client and query agent
│   ├── mcp/             # MCP server for AI agent access
│   └── ui/              # Streamlit application
│
└── scripts/
    ├── run_scan.py      # CLI for scanning
    ├── query.py         # CLI for natural language queries
    └── seed_demo.py     # Demo data generator
```
```bash
# Manual testing workflow
docker-compose up -d
python scripts/seed_demo.py
streamlit run src/ui/app.py

# Test queries
python scripts/query.py --mock "What are our riskiest assets?"
```

- Format with `black` (line length 100)
- Sort imports with `isort`
- Type hints required
- Google-style docstrings
Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run manual tests to verify functionality
- Commit with a descriptive message (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Additional collectors (nuclei, nmap)
- More risk factors (SSL certificate issues, CORS misconfiguration)
- Vulnerability correlation (CVE lookup based on detected versions)
- Scheduled scanning
- Export functionality (CSV, PDF reports)
- Additional visualizations (network graph, timeline)
| Issue | Solution |
|---|---|
| "Neo4j connection refused" | Run `docker-compose up -d`, wait ~30s |
| "Ollama model not found" | Run `ollama pull llama3.1:8b` |
| "subfinder not found" | Install Go, then `go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest` |
| "httpx wrong version" | Ensure Go httpx is in `PATH` before Python httpx |
| "LLM responses are slow" | Use a smaller model (`llama3.2:3b`) or enable mock mode |
| "Streamlit won't start" | Run `pip install --upgrade streamlit` |
| MCP server not connecting | Restart Claude Code, run `/mcp` to check status |
| MCP tools return errors | Ensure Neo4j is running and seeded with demo data |
This tool is for authorized security testing only.
- Only scan domains you own or have explicit permission to test
- Default allowed target: `scanme.sh` (ProjectDiscovery's test domain)
- Never use against systems without written authorization
See DISCLAIMER.md for full legal notice.
This project is licensed under the MIT License - see the LICENSE file for details.
- ProjectDiscovery for subfinder and httpx
- Neo4j for the graph database
- Ollama for local LLM inference
- Streamlit for rapid UI development
Built as a portfolio project demonstrating modern security tooling with AI integration.