"AI eats context. We decide what's on the menu."
Open-source context engine that feeds AI agents with policy-compliant enterprise knowledge: self-hosted, GDPR-native, provider-agnostic.
Every AI agent needs context. But feeding enterprise data to LLMs means losing control over who sees what, how long it's retained, and whether it complies with GDPR. Most solutions either block AI entirely or hand everything over. Neither works.
Powerbrain sits between your data and your AI agents. It delivers context through the Model Context Protocol (MCP), with every request checked by a policy engine. Your data stays on your infrastructure. Your policies decide what gets through.
Agent / Skill
      │ MCP
      ▼
┌────────────────────────────────────────────────┐
│ Powerbrain MCP Server                          │
│ ├─ OPA Policy Check (every request)            │
│ ├─ Circuit Breaker + Approval Queue (Art. 14) │
│ ├─ Qdrant Vector Search (oversampled)          │
│ ├─ Cross-Encoder Reranking (top-k)             │
│ ├─ Context Summarization (policy-controlled)   │
│ ├─ Sealed Vault (PII pseudonymization)         │
│ └─ Tamper-Evident Audit Log (Art. 12)          │
└────────────────────────────────────────────────┘
    │           │            │              │
    ▼           ▼            ▼              ▼
  Qdrant    PostgreSQL      OPA           Ollama
(vectors)  (data+vault)  (policies)  (embeddings+LLM)
                        │
                        ▼
                 ┌──────────────┐
                 │  pb-worker   │  Accuracy metrics, drift
                 │ (APScheduler)│  detection, audit retention
                 └──────────────┘
Policy-Aware Context Delivery: Every search request is checked against OPA policies. Classification levels (public, internal, confidential, restricted) control what each agent role can access. Compliance is executable code, not documentation.
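As a rough illustration of how ordered classification levels can gate access, here is a minimal Python sketch. The real checks are OPA policies; the role names and their ceilings below are hypothetical and only the four level names come from this README.

```python
# Level names are taken from the README; their ordering from least to most
# sensitive is assumed. Role names and ceilings are invented for illustration.
LEVELS = ["public", "internal", "confidential", "restricted"]

ROLE_CEILING = {
    "viewer": "internal",
    "analyst": "confidential",
    "admin": "restricted",
}


def can_access(role: str, classification: str) -> bool:
    """Allow access when the document level does not exceed the role's ceiling."""
    if role not in ROLE_CEILING or classification not in LEVELS:
        return False  # unknown role or level: deny by default
    return LEVELS.index(classification) <= LEVELS.index(ROLE_CEILING[role])


if __name__ == "__main__":
    print(can_access("viewer", "confidential"))
    print(can_access("admin", "restricted"))
```

Denying unknown roles and levels by default keeps a typo in either from silently widening access.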
Sealed Vault & Pseudonymization: PII is detected at ingestion (Microsoft Presidio), pseudonymized with per-project salts, and stored in a dual-layer vault. Originals require HMAC-signed, time-limited tokens with purpose binding. Art. 17 deletion: remove the vault mapping and pseudonyms become irreversible.
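The sketch below shows one way salted pseudonyms and purpose-bound, expiring HMAC tokens could fit together, using only the Python standard library. It is not Powerbrain's vault code: the token layout, the function names, and the use of HMAC-SHA256 for the pseudonym itself are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def pseudonymize(value: str, project_salt: bytes) -> str:
    """Keyed hash of a PII value, so the same value differs between projects."""
    digest = hmac.new(project_salt, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def _sign(secret: bytes, payload: str) -> str:
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, pseudonym: str, purpose: str,
                ttl_s: int, now: Optional[float] = None) -> str:
    """Hypothetical token layout: pseudonym|purpose|expiry|signature.

    Assumes neither pseudonym nor purpose contains the '|' separator.
    """
    expires = int((time.time() if now is None else now) + ttl_s)
    payload = f"{pseudonym}|{purpose}|{expires}"
    return f"{payload}|{_sign(secret, payload)}"


def verify_token(secret: bytes, token: str, pseudonym: str,
                 purpose: str, now: Optional[float] = None) -> bool:
    """Accept only an unexpired token signed for this pseudonym and purpose."""
    try:
        tok_pseudonym, tok_purpose, expires, sig = token.split("|")
        expires_at = int(expires)
    except ValueError:
        return False  # malformed token
    expected = _sign(secret, f"{tok_pseudonym}|{tok_purpose}|{expires}")
    current = time.time() if now is None else now
    return (
        hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
        and tok_pseudonym == pseudonym
        and tok_purpose == purpose
        and current < expires_at
    )
```

Because the purpose is part of the signed payload, a token issued for one purpose cannot be replayed for another without invalidating the signature.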
Relevance Pipeline: 3-stage search: Qdrant oversampling (5x candidates) → OPA policy filtering → Cross-Encoder reranking. Graceful degradation: if the reranker is down, results fall back to vector ordering.
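To make the three stages and the fallback concrete, here is a small Python sketch with the backends passed in as plain callables. The function name, the hit shape, and the callable signatures are hypothetical; it assumes the vector search returns hits already sorted by similarity.

```python
from typing import Callable, Dict, List, Optional

Hit = Dict[str, object]  # hypothetical shape, e.g. {"id": ..., "classification": ...}


def search(
    query: str,
    top_k: int,
    vector_search: Callable[[str, int], List[Hit]],
    is_allowed: Callable[[Hit], bool],
    rerank: Optional[Callable[[str, List[Hit]], List[Hit]]] = None,
    oversample: int = 5,
) -> List[Hit]:
    # Stage 1: fetch more candidates than needed so filtering has headroom.
    candidates = vector_search(query, top_k * oversample)
    # Stage 2: drop everything the policy check rejects.
    permitted = [hit for hit in candidates if is_allowed(hit)]
    # Stage 3: rerank if possible, otherwise keep the vector ordering.
    if rerank is not None:
        try:
            return rerank(query, permitted)[:top_k]
        except Exception:
            pass  # reranker unavailable: fall through to vector ordering
    return permitted[:top_k]
```

Filtering before reranking means the reranker never sees documents the caller is not allowed to read, and oversampling reduces the chance that filtering leaves fewer than `top_k` results.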
Context Summarization: Agents can request summaries instead of raw chunks. OPA policies can enforce summarization for sensitive data (confidential = summary only, no raw text), control detail levels, or deny summarization entirely. Powered by Ollama.
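The three policy outcomes (allowed, required, denied) can be pictured as a small decision object that shapes the response. The Python sketch below is illustrative only: the rule set, the detail-level names, and the response shape are assumptions, and whether raw text is visible at all is taken to be decided by the access policy upstream.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SummaryDecision:
    allowed: bool   # may a summary be produced at all?
    required: bool  # must raw text be replaced by a summary?
    detail: str     # hypothetical detail level passed to the summarizer


def summarization_policy(classification: str) -> SummaryDecision:
    """Invented rules mirroring the README's example (confidential = summary only)."""
    if classification == "restricted":
        return SummaryDecision(allowed=False, required=False, detail="none")
    if classification == "confidential":
        return SummaryDecision(allowed=True, required=True, detail="brief")
    return SummaryDecision(allowed=True, required=False, detail="standard")


def shape_response(
    chunks: List[str],
    decision: SummaryDecision,
    requested_summary: bool,
    summarize: Callable[[List[str], str], str],
) -> Dict[str, object]:
    """Apply the decision: summary only, summary plus chunks, or chunks only."""
    if decision.required:
        return {"summary": summarize(chunks, decision.detail), "chunks": []}
    if requested_summary and decision.allowed:
        return {"summary": summarize(chunks, decision.detail), "chunks": chunks}
    return {"summary": None, "chunks": chunks}
```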
MCP-Native Interface: 16 tools accessible through the Model Context Protocol. Works with any MCP-compatible agent (Claude, OpenCode, custom). One endpoint, one protocol.
Self-Hosted & GDPR-Native: Everything runs on your infrastructure. No external API calls for embeddings, search, or summarization. Run docker compose up and you're running.
AI Provider Proxy: Optional gateway between your AI consumers and their LLM providers. Transparently injects Powerbrain tools into every LLM request and executes tool calls automatically. Your teams use any LLM they prefer (100+ providers via LiteLLM); Powerbrain ensures they always query policy-checked enterprise context. Activate with docker compose --profile proxy up.
Powerbrain is not itself a high-risk AI system under the EU AI Act, but deployers who operate one in regulated sectors (finance, healthcare, HR) need infrastructure that delivers the Art. 9-15 capabilities. Powerbrain ships them as executable building blocks, not PDFs:
| Article | Feature | How |
|---|---|---|
| Art. 9: Risk management | Concrete risk register + live indicators | docs/risk-management.md with 8 identified risks (R-01..R-08). GET /health with Accept: application/json returns 6 live risk indicators and HTTP 503 when critical. |
| Art. 10: Data quality | Blocking ingestion quality gate | Composite score (length, language confidence, PII density, encoding, metadata) with per-source_type thresholds via OPA pb.ingestion.quality_gate. Rejected documents are audited in ingestion_rejections. |
| Art. 11 / Annex IV: Technical docs | Admin-triggered Annex IV generator | generate_compliance_doc MCP tool renders all 9 Annex IV sections as Markdown from live runtime state (models, OPA policies, collections, audit chain, risk register). |
| Art. 12: Logging | Tamper-evident audit hash chain | SHA-256 hash chain on agent_access_log via PostgreSQL trigger with advisory locks, append-only enforcement, checkpoint+prune retention that preserves chain continuity. Verify via verify_audit_integrity, export via export_audit_log. |
| Art. 13: Transparency | Auth-required transparency endpoint | GET /transparency and get_system_info MCP tool expose active models, OPA policies, collection stats, PII scanner config, and audit integrity, with a deterministic version fingerprint. |
| Art. 14: Human oversight | Global kill-switch + approval queue | POST /circuit-breaker halts all data-retrieval tools instantly. Confidential/restricted requests from non-admin roles are intercepted into pending_reviews; admins decide via review_pending, agents poll via get_review_status. |
| Art. 15: Accuracy & drift | Windowed feedback metrics + embedding drift detection | Per-collection baseline centroids in embedding_reference_set, refreshed every 5 minutes by pb-worker. Prometheus gauges + alerts (QualityDrift, HighEmptyResultRate, RerankerScoreDrift, EmbeddingDriftDetected), pre-provisioned pb-accuracy Grafana dashboard. |
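For readers unfamiliar with the hash chain in the Art. 12 row, the idea can be sketched in a few lines of Python. Powerbrain implements it as a PostgreSQL trigger on agent_access_log; the record fields, the genesis value, and the JSON canonicalization below are assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash: str, record: Dict[str, object]) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: List[dict], record: Dict[str, object]) -> None:
    """Link the new entry to the hash of the one before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev_hash": prev, "hash": entry_hash(prev, record)})


def verify(log: List[dict]) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, row in enumerate(log):
        if row["prev_hash"] != prev or row["hash"] != entry_hash(prev, row["record"]):
            return i
        prev = row["hash"]
    return -1
```

Editing any stored record changes its hash, which no longer matches the `prev_hash` of the next entry, so tampering is detectable from that point onward.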
The pb-worker maintenance container runs four APScheduler jobs: accuracy metrics refresh (5 min), pending-review timeouts (hourly), GDPR retention cleanup (daily 02:00), audit retention cleanup (daily 03:00).
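One plausible way to compare recent embeddings against a stored baseline centroid during the accuracy refresh is a cosine distance check, sketched below in plain Python. The README only states that per-collection baseline centroids exist; the distance metric, the threshold value, and the function names here are assumptions.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; treated as maximal (1.0) if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def drift_detected(
    baseline: Sequence[float],
    recent: Sequence[Sequence[float]],
    threshold: float = 0.1,  # hypothetical alert threshold
) -> bool:
    """Flag drift when the recent centroid moves away from the baseline."""
    return cosine_distance(baseline, centroid(recent)) > threshold
```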
git clone <repo-url> && cd powerbrain
cp .env.example .env
# Edit .env: set PG_PASSWORD (and optionally FORGEJO_URL, FORGEJO_TOKEN)
docker compose up -d
# Pull the embedding model
docker exec pb-ollama ollama pull nomic-embed-text
# Create vector collections
for col in pb_general pb_code pb_rules; do
curl -s -X PUT "http://localhost:6333/collections/$col" \
-H 'Content-Type: application/json' \
-d '{"vectors":{"size":768,"distance":"Cosine"}}' && echo " -> $col ok"
done

Connect your agent:
{
"mcpServers": {
"powerbrain": {
"type": "http",
"url": "http://localhost:8080/mcp"
}
}
}

That's it. Your agent now has access to search_knowledge, query_data, graph_query, generate_compliance_doc, and 12 more tools.
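Under the hood, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The Python sketch below only builds that message so its shape is visible; the argument names `query` and `summarize` are assumptions about the search_knowledge schema, and a real client (or an MCP SDK) also performs the `initialize` handshake before calling tools.

```python
import json
from itertools import count
from typing import Dict

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def tool_call(name: str, arguments: Dict[str, object]) -> str:
    """Serialize an MCP tools/call request for the given tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


if __name__ == "__main__":
    # Argument names are hypothetical; check the tool schema the server reports.
    print(tool_call("search_knowledge", {"query": "GDPR deletion policy", "summarize": True}))
```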
# 1. Uncomment/add your LLM provider in pb-proxy/litellm_config.yaml
# 2. Set API keys in .env (e.g. OPENAI_API_KEY=sk-...)
docker compose --profile proxy up -d
# List available models:
curl http://localhost:8090/v1/models
# Use the proxy; Powerbrain tools are injected automatically:
curl http://localhost:8090/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"What are our GDPR deletion policies?"}]}'

1. Agent calls search_knowledge("GDPR deletion policy", summarize=true)
2. Powerbrain embeds the query via Ollama (nomic-embed-text)
3. Qdrant returns 50 candidates (10 × 5 oversampling)
4. OPA filters by agent role + data classification → 30 remain
5. Cross-Encoder reranks by query-document relevance → top 10
6. OPA summarization policy: allowed? required? detail level?
7. Ollama summarizes the chunks (if applicable)
8. Response: results + summary + policy transparency
- Sovereignty by design: Data sovereignty is not a feature, it's the architecture. No external API calls. No cloud dependencies. Your data, your rules.
- Enable, don't restrict: The goal is not to prevent AI adoption, but to make it safely usable. Powerbrain says "yes, but with guardrails" instead of "no."
- Policy as code: Compliance rules are OPA/Rego policies, version-controlled and testable. Not Word documents. Not checkbox audits.
| Document | Description |
|---|---|
| What is Powerbrain? | Detailed overview and positioning |
| Architecture | Technical deep-dive |
| Deployment Guide | Dev, production, TLS, Docker Secrets |
| Technology Decisions | ADRs and trade-offs |
| Risk Register | EU AI Act Art. 9 risk register (R-01..R-08) |
| EU AI Act Plan | Implementation plan for B-40..B-46 |
| CLAUDE.md | Agent-facing reference (tools, schemas, conventions) |
Powerbrain is open source (Apache 2.0). Contributions are welcome, whether it's a new OPA policy, a better reranker model, or documentation improvements.
Open source. Closed data.
Apache License 2.0. Dependencies under their respective licenses.