Five PydanticAI governance-domain agents with a shared MCP server pool, Pydantic AI Gateway, and AG-UI streaming — built for industrial control system documentation (FANUC, Siemens, PLC).
Control Plane                 Data Plane
┌─────────────────────────┐   ┌──────────────────────────┐
│  Pydantic AI Gateway    │   │  Vespa (hybrid search)   │
│  (provider routing,     │   │  Ollama (embeddings/LLM) │
│  cost limits, OTel)     │   │  Docling Serve (parsing) │
├─────────────────────────┤   │  PostgreSQL + pgvector   │
│  5 Governance Agents    │   │  OTel Collector → Jaeger │
│  ┌───────┬───────┐      │   └──────────────────────────┘
│  │Query  │Ingest │      │
│  ├───────┼───────┤      │   UI
│  │Eval   │Memory │      │   ┌──────────────────────────┐
│  ├───────┴───────┤      │   │  CopilotKit + Next.js    │
│  │    System     │      │   │  (AG-UI SSE streaming)   │
│  └───────────────┘      │   └──────────────────────────┘
└─────────────────────────┘
| Agent | Port | Domain | Primary Endpoint |
|---|---|---|---|
| QueryAgent | 8010 | Runtime Intelligence | /search |
| IngestionAgent | 8011 | Knowledge Construction | /ingest |
| EvaluationAgent | 8012 | Quality Assurance | /score |
| MemoryAgent | 8013 | State Governance | /store |
| SystemAgent | 8014 | Infrastructure Control | /execute |
- In-process (default): Gateway loads all 5 agents via httpx.ASGITransport (single container)
- Container: Each agent runs as a separate service with HTTP routing

Set AGENT_DEPLOY_MODE=container to switch modes.
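
The mode switch can be pictured as a small routing decision. The sketch below is illustrative only and is not the Gateway's actual code: the ports come from the agent table, `query-agent` is the identifier used in the curl example, and the remaining agent identifiers, the `localhost` host name, the `"in-process"` default value, and the `resolve_target` helper are all assumptions.

```python
import os

# Ports are taken from the agent table. Every identifier except
# "query-agent" is a hypothetical name chosen by analogy.
AGENT_PORTS = {
    "query-agent": 8010,
    "ingestion-agent": 8011,
    "evaluation-agent": 8012,
    "memory-agent": 8013,
    "system-agent": 8014,
}


def resolve_target(agent, env=None):
    """Decide how a request for `agent` would be dispatched.

    In container mode each agent is reached over HTTP on its own port.
    In any other mode the agent is assumed to be mounted in-process,
    so no base URL is needed.
    """
    env = os.environ if env is None else env
    if agent not in AGENT_PORTS:
        raise KeyError(f"unknown agent: {agent}")
    # "in-process" as the default value is an assumption; the README only
    # documents AGENT_DEPLOY_MODE=container.
    mode = env.get("AGENT_DEPLOY_MODE", "in-process")
    if mode == "container":
        # "localhost" is a placeholder; a Compose network would likely
        # use service names instead.
        return {"mode": "container",
                "base_url": f"http://localhost:{AGENT_PORTS[agent]}"}
    return {"mode": "in-process", "base_url": None}
```

Passing the environment as a plain dict, as in `resolve_target("query-agent", {"AGENT_DEPLOY_MODE": "container"})`, keeps the decision easy to test without touching the real process environment.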
- Docker Desktop 20.10+ with Docker Compose 2.0+
- Python 3.12+, uv package manager
- 16GB+ RAM (GPU recommended)
# Core stack (Gateway, Vespa, Ollama, Docling, PostgreSQL, UI)
docker compose -f infra/compose/docker-compose.yml up -d
# With observability (adds OTel Collector + Jaeger)
docker compose -f infra/compose/docker-compose.yml --profile observability up -d
# With GPU acceleration
docker compose -f infra/compose/docker-compose.yml -f infra/compose/docker-compose.gpu.yml up -d

# Install Python dependencies
uv sync
# Run the test suite
uv run python -m pytest tests/ -v
# Start the Gateway locally with auto-reload
uv run uvicorn services.gateway.app.main:create_app --factory --reload --port 8002

# Run an agent action through the Gateway
curl -X POST http://localhost:8002/v1/run \
-H "Content-Type: application/json" \
-d '{"agent": "query-agent", "prompt": "FANUC alarm codes", "context": {"hits": 5}}'

RAG/
├── agents/ # 5 governance-domain agents
│ ├── query/ # QueryAgent (retrieval + ranking)
│ ├── ingestion/ # IngestionAgent (Docling pipeline)
│ ├── evaluation/ # EvaluationAgent (quality scoring)
│ ├── memory/ # MemoryAgent (session + long-term)
│ └── system/ # SystemAgent (git/shell/MCP)
├── libs/rag_common/ # Shared libraries
│ ├── clients/ # Vespa, Ollama, agent_interface
│ ├── models/ # Pydantic DTOs
│ └── embeddings.py # Two-tier embedding cache
├── services/gateway/ # Pydantic AI Gateway (PAIG)
├── infra/ # Docker Compose, initdb, OTel
├── mcp/ # MCP server configurations
├── servers/ # MCP tool files (glossary, etc.)
├── skills/ # Reusable skill functions
├── ui/ # CopilotKit + Next.js frontend
├── tests/ # Unit, integration, retrieval, benchmarks
├── infra/migrations/ # Database migration files (V001__*.sql)
├── Docs/ # Governance documents
│ ├── ADRs/ # Architecture Decision Records
│ ├── PDR.md # Project Design Record
│ └── MEMORY.md # Memory systems analysis
├── vespa-app/ # Vespa application package
├── reports/ # Parity matrix, refactor analysis
└── Obsolete/ # Retired legacy code (preserved)
| Service | Port | URL | Purpose |
|---|---|---|---|
| Gateway | 8002 | http://localhost:8002 | Agent orchestration + AG-UI |
| Vespa | 8081 | http://localhost:8081 | Hybrid search engine |
| Ollama | 11434 | http://localhost:11434 | Embeddings + LLM |
| Docling | 5001 | http://localhost:5001 | Document parsing |
| PostgreSQL | 5432 | localhost:5432 | Agent memory + pgvector |
| Jaeger | 16686 | http://localhost:16686 | Trace visualization |
| UI | 3000 | http://localhost:3000 | CopilotKit frontend |
- GET /health: Health check
- GET /v1/providers: Available LLM providers
- GET /v1/agents: Registered agents
- POST /v1/run: Execute agent action (JSON)
- POST /v1/ag-ui: AG-UI SSE streaming
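
For callers that prefer Python over curl, the two POST endpoints can be approached roughly as follows. This is a minimal standard-library sketch, not an official client: the request body mirrors the curl example for /v1/run, while the `build_run_request` and `iter_sse_data` helpers are hypothetical, and the assumption that each SSE `data:` payload on /v1/ag-ui is a JSON object should be checked against the running Gateway.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8002"  # Gateway URL from the service table


def build_run_request(agent, prompt, hits=5, base_url=GATEWAY_URL):
    """Build (but do not send) a POST /v1/run request.

    The body shape is copied from the curl example in this README.
    """
    body = {"agent": agent, "prompt": prompt, "context": {"hits": hits}}
    return urllib.request.Request(
        f"{base_url}/v1/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_sse_data(lines):
    """Yield JSON-decoded payloads from Server-Sent Events text lines.

    Consecutive "data:" lines are joined with newlines and emitted when a
    blank line ends the event. Comment lines (starting with ":") and other
    fields such as "event:" are ignored in this sketch.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            buffer.append(line[5:].removeprefix(" "))
        elif line == "" and buffer:
            yield json.loads("\n".join(buffer))
            buffer = []
    if buffer:  # flush a final event that lacks a trailing blank line
        yield json.loads("\n".join(buffer))


# Sending requires the stack to be running, for example:
#   with urllib.request.urlopen(build_run_request("query-agent", "FANUC alarm codes")) as resp:
#       print(json.load(resp))
```

Keeping request construction separate from sending makes it possible to inspect the payload in tests without a live Gateway.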
- ADR — Architecture Decision Records
- PDR — Project Design Record
- Repo Manifest — File-level source of truth
- Memory Systems — Memory architecture analysis and data-flow documentation
- GPU Acceleration: Use the docker-compose.gpu.yml overlay (see Quick Start above)
# All tests
uv run python -m pytest tests/ -v
# Gateway tests only
uv run python -m pytest tests/test_gateway.py tests/integration/ -v
# Retrieval tests
uv run python -m pytest tests/retrieval/ -v

uv run python governance_check.py
uv run python parity_check.py

- Agents: PydanticAI with multi-agent delegation (Levels 2-5)
- Gateway: Pydantic AI Gateway (AGPL-3.0, self-hosted)
- Search: Vespa.ai (hybrid BM25 + dense retrieval)
- Embeddings: Ollama qwen3-embedding:0.6b (1024 dims)
- Memory: PostgreSQL + pgvector + Vespa agent memory
- Observability: OpenTelemetry → Jaeger
- UI: CopilotKit + Next.js (AG-UI protocol)
- MCP: Shared server pool with role-based access
- Containers: Docker Compose with lockfile-based builds (non-root)
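
To make the Embeddings entry concrete, the sketch below builds a request for Ollama's /api/embed endpoint and checks that returned vectors have the 1024 dimensions listed above. It is an illustration under assumptions: the project may reach Ollama through its own client in libs/rag_common rather than this way, and `build_embed_request` and `validate_embeddings` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # from the service table
EMBED_MODEL = "qwen3-embedding:0.6b"    # from the tech stack list
EMBED_DIMS = 1024                       # from the tech stack list


def build_embed_request(texts, base_url=OLLAMA_URL):
    """Build (but do not send) a POST /api/embed request for a list of texts."""
    body = {"model": EMBED_MODEL, "input": list(texts)}
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate_embeddings(response_json, expected_dims=EMBED_DIMS):
    """Return the vectors from an /api/embed response, checking their size.

    Raises ValueError if any vector does not have `expected_dims` entries,
    which would indicate a model or configuration mismatch.
    """
    vectors = response_json["embeddings"]
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dims:
            raise ValueError(
                f"vector {index} has {len(vector)} dims, expected {expected_dims}"
            )
    return vectors
```

A dimension check like this is cheap and can catch a mismatch before vectors are written to Vespa or pgvector, where the field size is typically fixed by the schema.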
- PydanticAI: https://ai.pydantic.dev/
- Pydantic AI Gateway: https://ai.pydantic.dev/gateway/
- AG-UI Protocol: https://docs.ag-ui.com/
- Vespa: https://docs.vespa.ai/
- CopilotKit: https://docs.copilotkit.ai/
- MCP: https://modelcontextprotocol.io/
Version: 1.0.0 (Agent-First Platform)
Status: Operational