nuetzliches/powerbrain

🧠 Powerbrain

"AI eats context. We decide what's on the menu."

Open-source context engine that feeds AI agents with policy-compliant enterprise knowledge: self-hosted, GDPR-native, provider-agnostic.


The Problem

Every AI agent needs context. But feeding enterprise data to LLMs means losing control over who sees what, how long it's retained, and whether it complies with GDPR. Most solutions either block AI entirely or hand everything over. Neither works.

The Solution

Powerbrain sits between your data and your AI agents. It delivers context through the Model Context Protocol (MCP), with every request checked by a policy engine. Your data stays on your infrastructure. Your policies decide what gets through.

Agent / Skill
    │ MCP
    ▼
┌─────────────────────────────────────────────────┐
│  Powerbrain MCP Server                          │
│  ├─ OPA Policy Check (every request)            │
│  ├─ Circuit Breaker + Approval Queue (Art. 14)  │
│  ├─ Qdrant Vector Search (oversampled)          │
│  ├─ Cross-Encoder Reranking (top-k)             │
│  ├─ Context Summarization (policy-controlled)   │
│  ├─ Sealed Vault (PII pseudonymization)         │
│  └─ Tamper-Evident Audit Log (Art. 12)          │
└─────────────────────────────────────────────────┘
    │           │           │           │
    ▼           ▼           ▼           ▼
 Qdrant    PostgreSQL     OPA       Ollama
 (vectors)  (data+vault)  (policies) (embeddings+LLM)
                │
                ▼
         ┌──────────────┐
         │  pb-worker   │  Accuracy metrics, drift
         │ (APScheduler)│  detection, audit retention
         └──────────────┘

✨ Core Features

🔒 Policy-Aware Context Delivery: Every search request is checked against OPA policies. Classification levels (public, internal, confidential, restricted) control what each agent role can access. Compliance is executable code, not documentation.

🛡️ Sealed Vault & Pseudonymization: PII is detected at ingestion (Microsoft Presidio), pseudonymized with per-project salts, and stored in a dual-layer vault. Access to originals requires HMAC-signed, time-limited tokens with purpose binding. GDPR Art. 17 deletion: remove the vault mapping and the pseudonyms become irreversible.
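A minimal sketch of how an HMAC-signed, time-limited, purpose-bound token could work, using only the Python standard library. The token layout, the claim names (doc_id, purpose, exp), the SECRET value, and the helpers issue_token and verify_token are illustrative assumptions, not Powerbrain's actual vault format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key; a real deployment would load it from a secret store.
SECRET = b"demo-secret-change-me"


def issue_token(doc_id: str, purpose: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return '<payload>.<signature>' binding a document, a purpose, and an expiry."""
    issued_at = time.time() if now is None else now
    payload = json.dumps(
        {"doc_id": doc_id, "purpose": purpose, "exp": issued_at + ttl_seconds},
        sort_keys=True,
    ).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(token: str, purpose: str, now: Optional[float] = None) -> bool:
    """Accept the token only if signature, expiry, and purpose all check out."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # no signature part at all
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; the payload is only decoded after it passes.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims["purpose"] == purpose and current < claims["exp"]
```

A token issued for one purpose fails verification for any other purpose, and any token fails once its expiry has passed, which is the property the purpose binding and time limit are meant to provide.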

🎯 Relevance Pipeline: 3-stage search: Qdrant oversampling (5x candidates) → OPA policy filtering → Cross-Encoder reranking. Graceful degradation: if the reranker is down, results fall back to vector ordering.

📝 Context Summarization: Agents can request summaries instead of raw chunks. OPA policies can enforce summarization for sensitive data (confidential = summary only, no raw text), control detail levels, or deny summarization entirely. Powered by Ollama.
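The real rules live in OPA/Rego; the Python sketch below only illustrates the shape of such a decision. Only the "confidential = summary only" rule comes from the text above. The other rows, the mode names, the detail levels, and the helpers decide and shape_response are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryDecision:
    mode: str    # "optional", "required" (summary only), or "denied"
    detail: str  # hypothetical detail level: "standard", "brief", or "none"


# Illustrative rule table. Only the "confidential" row is taken from the README;
# every other row is an assumption for this sketch.
RULES = {
    "public": SummaryDecision(mode="optional", detail="standard"),
    "internal": SummaryDecision(mode="optional", detail="standard"),
    "confidential": SummaryDecision(mode="required", detail="brief"),
    "restricted": SummaryDecision(mode="denied", detail="none"),
}


def decide(classification: str) -> SummaryDecision:
    """Look up the decision; unknown classifications are rejected outright."""
    if classification not in RULES:
        raise ValueError(f"unknown classification: {classification!r}")
    return RULES[classification]


def shape_response(classification: str, chunks: list, summary: str) -> dict:
    """Apply the decision to results that already passed the access check."""
    decision = decide(classification)
    return {
        # Summary-only data never leaves as raw text.
        "chunks": [] if decision.mode == "required" else chunks,
        # Denied means no summary is produced or returned.
        "summary": None if decision.mode == "denied" else summary,
        "policy": {"summary_mode": decision.mode, "detail": decision.detail},
    }
```

Returning the applied policy alongside the results mirrors the "policy transparency" part of the response, so an agent can see why it received a summary instead of raw chunks.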

🔌 MCP-Native Interface: 16 tools accessible through the Model Context Protocol. Works with any MCP-compatible agent (Claude, OpenCode, custom). One endpoint, one protocol.

🏠 Self-Hosted & GDPR-Native: Everything runs on your infrastructure. No external API calls for embeddings, search, or summarization. Run docker compose up and you're running.

🔀 AI Provider Proxy: Optional gateway between your AI consumers and their LLM providers. Transparently injects Powerbrain tools into every LLM request and executes tool calls automatically. Your teams use any LLM they prefer (100+ providers via LiteLLM); Powerbrain ensures they always query policy-checked enterprise context. Activate with docker compose --profile proxy up.

🇪🇺 EU AI Act Compliance Toolkit

Powerbrain is not itself a high-risk AI system, but deployers who operate one in regulated sectors (finance, healthcare, HR) need infrastructure that delivers the Art. 9–15 capabilities. Powerbrain ships them as executable building blocks, not PDFs:

| Article | Feature | How |
| --- | --- | --- |
| Art. 9: Risk management | Concrete risk register + live indicators | docs/risk-management.md with 8 identified risks (R-01..R-08). GET /health with Accept: application/json returns 6 live risk indicators and HTTP 503 when critical. |
| Art. 10: Data quality | Blocking ingestion quality gate | Composite score (length, language confidence, PII density, encoding, metadata) with per-source_type thresholds via OPA pb.ingestion.quality_gate. Rejected documents are audited in ingestion_rejections. |
| Art. 11 / Annex IV: Technical docs | Admin-triggered Annex IV generator | generate_compliance_doc MCP tool renders all 9 Annex IV sections as Markdown from live runtime state (models, OPA policies, collections, audit chain, risk register). |
| Art. 12: Logging | Tamper-evident audit hash chain | SHA-256 hash chain on agent_access_log via PostgreSQL trigger with advisory locks, append-only enforcement, and checkpoint+prune retention that preserves chain continuity. Verify via verify_audit_integrity, export via export_audit_log. |
| Art. 13: Transparency | Auth-required transparency endpoint | GET /transparency and get_system_info MCP tool expose active models, OPA policies, collection stats, PII scanner config, and audit integrity, with a deterministic version fingerprint. |
| Art. 14: Human oversight | Global kill-switch + approval queue | POST /circuit-breaker halts all data-retrieval tools instantly. Confidential/restricted requests from non-admin roles are intercepted into pending_reviews; admins decide via review_pending, agents poll via get_review_status. |
| Art. 15: Accuracy & drift | Windowed feedback metrics + embedding drift detection | Per-collection baseline centroids in embedding_reference_set, refreshed every 5 minutes by pb-worker. Prometheus gauges + alerts (QualityDrift, HighEmptyResultRate, RerankerScoreDrift, EmbeddingDriftDetected), pre-provisioned pb-accuracy Grafana dashboard. |
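To illustrate the Art. 12 hash chain idea: each log entry's hash covers the previous entry's hash, so changing any earlier record invalidates everything after it. Powerbrain implements this with a PostgreSQL trigger; the Python sketch below only shows the principle. The record fields, the GENESIS value, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the hash of the current last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev,
                  "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Pruning old entries while keeping the chain verifiable needs a checkpoint that stores the last pruned hash, which would then take the place of GENESIS during verification.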

The pb-worker maintenance container runs four APScheduler jobs: accuracy metrics refresh (5 min), pending-review timeouts (hourly), GDPR retention cleanup (daily 02:00), audit retention cleanup (daily 03:00).
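The embedding drift check can be pictured as comparing a stored baseline centroid with the centroid of recent embeddings. This is a sketch under stated assumptions: the cosine-distance measure, the 0.1 threshold, and the function names are illustrative and may differ from what pb-worker actually computes.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def drift_detected(baseline_centroid, recent_vectors, threshold=0.1):
    """Flag drift when recent embeddings have moved away from the baseline."""
    return cosine_distance(baseline_centroid, centroid(recent_vectors)) > threshold
```

A scheduled job could run a check like this per collection and set a gauge that an alert such as EmbeddingDriftDetected watches.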

🚀 Quick Start

git clone <repo-url> && cd powerbrain
cp .env.example .env
# Edit .env: set PG_PASSWORD (and optionally FORGEJO_URL, FORGEJO_TOKEN)

docker compose up -d

# Pull the embedding model
docker exec pb-ollama ollama pull nomic-embed-text

# Create vector collections
for col in pb_general pb_code pb_rules; do
  curl -s -X PUT "http://localhost:6333/collections/$col" \
    -H 'Content-Type: application/json' \
    -d '{"vectors":{"size":768,"distance":"Cosine"}}' && echo " → $col ✓"
done

Connect your agent:

{
  "mcpServers": {
    "powerbrain": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}

That's it. Your agent now has access to search_knowledge, query_data, graph_query, generate_compliance_doc, and 12 more tools.

Optional: AI Provider Proxy

# 1. Uncomment/add your LLM provider in pb-proxy/litellm_config.yaml
# 2. Set API keys in .env (e.g. OPENAI_API_KEY=sk-...)
docker compose --profile proxy up -d

# List available models:
curl http://localhost:8090/v1/models

# Use the proxy: Powerbrain tools are injected automatically
curl http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What are our GDPR deletion policies?"}]}'

🔍 How It Works

1. Agent calls search_knowledge("GDPR deletion policy", summarize=true)
2. Powerbrain embeds the query via Ollama (nomic-embed-text)
3. Qdrant returns 50 candidates (10 × 5 oversampling)
4. OPA filters by agent role + data classification → 30 remain
5. Cross-Encoder reranks by query-document relevance → top 10
6. OPA summarization policy: allowed? required? detail level?
7. Ollama summarizes the chunks (if applicable)
8. Response: results + summary + policy transparency
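Steps 3 to 5 above, including the fallback to vector ordering when the reranker is unavailable, can be sketched as follows. The callables vector_search, is_allowed, and rerank are hypothetical stand-ins for Qdrant, the OPA check, and the Cross-Encoder; the return shape is also an assumption.

```python
def search_knowledge(vector_search, is_allowed, rerank, top_k=10, oversample=5):
    """Oversample, filter by policy, rerank, and keep the top_k results.

    vector_search(limit) -> hits ordered by vector score (stand-in for Qdrant)
    is_allowed(hit)      -> bool policy decision (stand-in for the OPA check)
    rerank(hits)         -> hits reordered by relevance (stand-in for the reranker)
    """
    # Step 3: fetch more candidates than needed, since filtering removes some.
    candidates = vector_search(top_k * oversample)

    # Step 4: drop everything the caller's role may not see.
    permitted = [hit for hit in candidates if is_allowed(hit)]

    # Step 5: rerank; if the reranker fails, keep the vector ordering.
    try:
        ranked = rerank(permitted)
        reranked = True
    except Exception:
        ranked = permitted
        reranked = False

    return {"results": ranked[:top_k], "reranked": reranked}
```

Oversampling before the policy filter is what keeps the result list full: with top_k=10 and 5x oversampling, up to 40 of the 50 candidates can be filtered out before fewer than 10 results remain.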

🧭 Principles

  1. Sovereignty by design: Data sovereignty is not a feature, it's the architecture. No external API calls. No cloud dependencies. Your data, your rules.

  2. Enable, don't restrict: The goal is not to prevent AI adoption, but to make it safely usable. Powerbrain says "yes, but with guardrails" instead of "no."

  3. Policy as code: Compliance rules are OPA/Rego policies, version-controlled and testable. Not Word documents. Not checkbox audits.

📚 Documentation

| Document | Description |
| --- | --- |
| What is Powerbrain? | Detailed overview and positioning |
| Architecture | Technical deep-dive |
| Deployment Guide | Dev, production, TLS, Docker Secrets |
| Technology Decisions | ADRs and trade-offs |
| Risk Register | EU AI Act Art. 9 risk register (R-01..R-08) |
| EU AI Act Plan | Implementation plan for B-40..B-46 |
| CLAUDE.md | Agent-facing reference (tools, schemas, conventions) |

🤝 Contributing

Powerbrain is open source (Apache 2.0). Contributions are welcome, whether it's a new OPA policy, a better reranker model, or documentation improvements.

Open source. Closed data. 🔐

📄 License

Apache License 2.0. Dependencies under their respective licenses.
