A production-grade, event-driven AI chat platform with RAG (Retrieval Augmented Generation) built on .NET 10 and Angular 21. The system delivers real-time, token-by-token AI responses using a fully decoupled microservices architecture — event sourcing, CQRS, Wolverine sagas, and SignalR streaming, all wired together through RabbitMQ and exposed via Kong.
Note: This is a demonstration project optimized for local development and showcasing architectural patterns. For production deployment, see the Production Readiness section.
What makes this project stand out:
- 🎯 Event Sourcing with Marten — Complete audit trail, event replay capability, optimistic concurrency with automatic retry policies
- 🔄 Wolverine Saga Orchestration — Stateful multi-step workflows for AI response generation with durable timeout handling
- 🧠 3-Stage RAG Pipeline — Decision layer → Query rewriting → Vector retrieval with graceful degradation at each step
- ⚡ Optimized Vector Search — pgvector with HNSW index for fast similarity search, scope-based access control (global/user/session)
- 📡 Token-by-Token Streaming — Real-time SignalR streaming with intelligent batching (publishes every 10 tokens, reducing network traffic by 90%)
- 🔐 Multi-Layer Authorization — UserId filtering in all queries, aggregate-level validation, scope-based document access
- 🏗️ Clean Architecture — Domain-driven design, CQRS pattern, clear separation of concerns across bounded contexts
- 🎨 Modern Frontend — Angular 21 standalone components with NgRx SignalStore for signals-based reactive state management
- Architecture
- RAG Pipeline
- Services Overview
- Tech Stack
- Prerequisites
- Quick Start
- Running Services Individually
- Configuration
- API Reference
- End-to-End Message Flow
- Performance Optimizations
- Production Readiness
- Troubleshooting
- Domain Model
- Project Structure
- Development Notes
graph TD
User([User Browser])
Kong[Kong Gateway:8000]
Keycloak[Keycloak Auth]
subgraph "Backend Services"
ChatApi["ChatService (Event Sourcing)"]
DocApi["DocumentService (Upload)"]
DocWorker["DocumentIngestion (RAG)"]
NotifApi["NotificationService (SignalR)"]
AiWorker["AiService (Worker)"]
end
subgraph "Infrastructure"
Marten[Marten/Postgres]
Rabbit[RabbitMQ]
Ollama[Ollama LLM]
MinIO[MinIO/S3]
PgVector[pgvector DB]
end
User --> Kong
Kong --> ChatApi
Kong --> DocApi
Kong --> NotifApi
Kong --> WebClient[Angular WebClient]
ChatApi --> Marten
ChatApi <--> Rabbit
DocApi --> MinIO
DocApi <--> Rabbit
DocWorker <--> Rabbit
DocWorker --> MinIO
DocWorker --> PgVector
NotifApi <--> Rabbit
AiWorker <--> Rabbit
AiWorker --> Ollama
AiWorker --> PgVector
ChatApi --> Keycloak
DocApi --> Keycloak
NotifApi --> Keycloak
WebClient --> Keycloak
Infrastructure Stack:
PostgreSQL :5432 — Marten event store (aichat) + pgvector (documents) + Keycloak auth
Keycloak :8080 — OAuth2 / OIDC identity provider
Ollama :11434 — Local LLM runtime (llama3, optimized for CPU with extended timeouts)
RabbitMQ :5672 — Event bus (Wolverine transport)
MinIO :9000 — S3-compatible object storage
Kong :8000 — API gateway (single origin, no CORS configuration needed)
3-Stage Retrieval Augmented Generation:
┌──────────────────────────────────────────────────────────────┐
│ Stage 1: RAG Decision (Binary Classifier) │
│ ───────────────────────────────────────── │
│ Input: User message │
│ Model: llama3 (fast inference, <500ms typical) │
│ Output: YES/NO — should invoke RAG for this query? │
│ Examples: │
│ "What's in our Q3 report?" → YES (internal document) │
│ "What's the capital of France?" → NO (general knowledge) │
│ Timeout: 500 seconds (CPU-optimized for local Ollama) │
└──────────────────────────────────────────────────────────────┘
↓ (if YES)
┌──────────────────────────────────────────────────────────────┐
│ Stage 2: Query Rewriting │
│ ──────────────────────── │
│ Input: Raw user query + last 4 conversation turns │
│ Model: llama3 │
│ Output: Standalone, context-resolved query │
│ Resolves: │
│ - Pronouns ("it" → "the customer retention analysis") │
│ - Temporal refs ("yesterday" → "2026-03-22") │
│ - Coreferences ("that document" → "Q3_Sales_Report.pdf") │
│ Timeout: 500 seconds (CPU-optimized for local Ollama) │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 3: Vector Retrieval (pgvector) │
│ ────────────────────────────────── │
│ 1. Embed rewritten query (nomic-embed-text, 768 dimensions) │
│ 2. pgvector HNSW cosine search (retrieves top-15) │
│ 3. Filter by scope (global/user/session) + relevance ≥0.3 │
│ 4. Deduplicate by content │
│ 5. Take top-5 chunks │
│ 6. Inject into system prompt before final LLM generation │
│ Timeout: 10 seconds │
└──────────────────────────────────────────────────────────────┘
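Steps 3 to 5 of Stage 3 can be sketched as a small pure function. This is a minimal illustration in TypeScript, not the project's C# implementation: the `RetrievedChunk` and `Caller` shapes and the `selectChunks` name are hypothetical, and only the thresholds (relevance ≥0.3, top-5) come from the description above.

```typescript
// Hypothetical shapes for illustration; the real entity lives in the C# services.
type Scope = "global" | "user" | "session";

interface RetrievedChunk {
  content: string;
  score: number; // cosine similarity, higher means more relevant
  scope: Scope;
  userId?: string;
  sessionId?: string;
}

interface Caller {
  userId: string;
  sessionId: string;
}

// A chunk is visible if it is global, owned by the caller,
// or attached to the caller's current session.
function isVisible(chunk: RetrievedChunk, caller: Caller): boolean {
  switch (chunk.scope) {
    case "global":
      return true;
    case "user":
      return chunk.userId === caller.userId;
    case "session":
      return chunk.sessionId === caller.sessionId;
  }
}

// Filter by scope and relevance, deduplicate by content, keep the best topK.
function selectChunks(
  candidates: RetrievedChunk[],
  caller: Caller,
  minScore: number = 0.3,
  topK: number = 5,
): RetrievedChunk[] {
  const seen = new Set<string>();
  return [...candidates]
    .sort((a, b) => b.score - a.score) // best match first
    .filter((c) => isVisible(c, caller) && c.score >= minScore)
    .filter((c) => {
      const key = c.content.trim();
      if (seen.has(key)) return false; // drop exact duplicates
      seen.add(key);
      return true;
    })
    .slice(0, topK);
}
```

Sorting before deduplication means the highest-scoring copy of a duplicated chunk is the one that survives.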
Graceful Degradation:
- Decision timeout → Skip RAG, proceed with LLM using general knowledge
- Rewrite timeout → Use raw query for retrieval (still functional)
- Retrieval timeout → Proceed without context, log warning
- Empty results → LLM answers from general knowledge
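The fallback rules above amount to racing each stage against a timer and substituting a safe default. The sketch below is a hedged TypeScript illustration, not the project's C# code: `withFallback`, `RagStages`, `StageTimeouts`, and `gatherContext` are hypothetical names, and treating a thrown error the same as a timeout is an assumption.

```typescript
// Resolve with `fallback` if `work` does not settle within `ms` milliseconds.
// Assumption: a stage that throws degrades the same way as a stage that is slow.
async function withFallback<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } catch {
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Hypothetical stage interface standing in for the three RAG stages.
interface RagStages {
  shouldInvoke(message: string): Promise<boolean>;
  rewrite(message: string): Promise<string>;
  retrieve(query: string): Promise<string[]>;
}

interface StageTimeouts {
  decisionMs: number;
  rewriteMs: number;
  retrievalMs: number;
}

// Defaults mirror the documented timeouts: 500 s, 500 s, 10 s.
async function gatherContext(
  message: string,
  stages: RagStages,
  timeouts: StageTimeouts = { decisionMs: 500000, rewriteMs: 500000, retrievalMs: 10000 },
): Promise<string[]> {
  // Decision timeout: skip RAG entirely.
  const useRag = await withFallback(stages.shouldInvoke(message), timeouts.decisionMs, false);
  if (!useRag) return [];
  // Rewrite timeout: fall back to the raw query.
  const query = await withFallback(stages.rewrite(message), timeouts.rewriteMs, message);
  // Retrieval timeout: proceed without context.
  const empty: string[] = [];
  return withFallback(stages.retrieve(query), timeouts.retrievalMs, empty);
}
```

An empty result array means the caller generates the answer from general knowledge, which covers the "Empty results" case as well.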
Document Scopes:
- global: Visible to all users (shared knowledge base)
- user: Private to uploading user
- session: Private to specific conversation (temporary context)
The core domain service implementing event sourcing — every user action is stored as an immutable domain event in Marten. Wolverine sagas orchestrate the multi-step AI response flow.
Key Responsibilities:
- Manage chat sessions (SessionAggregate) with automatic title and summary generation
- Persist user and assistant messages as domain events (MessageAggregate)
- Orchestrate AI responses via ConversationSaga:
  - Queues concurrent messages (prevents race conditions with [Transactional] attribute)
  - Routes LLM requests to AiService via RabbitMQ
  - Handles timeouts with 20-minute durable timeout and RequestId matching
  - Processes retries and gave-up events from AiService
- Build conversation history (last 20 messages) for LLM context
- Serve read queries via Marten inline projections (always consistent)
- Auto-generate session titles after first AI response
- Generate rolling summaries every 10 conversation turns
Technology: ASP.NET Core 10, Marten Event Store, Wolverine, PostgreSQL
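The saga's queueing behaviour (one request in flight per session, later messages parked until it completes) can be sketched as a tiny state machine. This is an illustrative TypeScript model only; the real saga is a C# Wolverine saga, and `SagaState`, `onMessageCreated`, and `onResponseCompleted` are hypothetical names.

```typescript
interface SagaState {
  isProcessing: boolean;
  activeRequestId: string | null;
  pendingMessageIds: string[]; // FIFO queue of messages waiting for their turn
}

type Dispatch = (messageId: string, requestId: string) => void;

function createSaga(): SagaState {
  return { isProcessing: false, activeRequestId: null, pendingMessageIds: [] };
}

// A new user message either starts a request or waits in the queue.
function onMessageCreated(
  state: SagaState,
  messageId: string,
  newRequestId: () => string,
  dispatch: Dispatch,
): void {
  if (state.isProcessing) {
    state.pendingMessageIds.push(messageId);
    return;
  }
  state.isProcessing = true;
  state.activeRequestId = newRequestId();
  dispatch(messageId, state.activeRequestId);
}

// A completion is honoured only if it matches the active request
// (RequestId matching), then the next queued message is started.
function onResponseCompleted(
  state: SagaState,
  requestId: string,
  newRequestId: () => string,
  dispatch: Dispatch,
): void {
  if (requestId !== state.activeRequestId) return; // stale or duplicate event
  const next = state.pendingMessageIds.shift();
  if (next === undefined) {
    state.isProcessing = false;
    state.activeRequestId = null;
    return;
  }
  state.activeRequestId = newRequestId();
  dispatch(next, state.activeRequestId);
}
```

Ignoring events whose request ID does not match the active one is what keeps a late timeout or a redelivered completion from advancing the queue twice.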
A headless background worker (no HTTP API) that executes the RAG pipeline and streams AI responses token-by-token.
Key Responsibilities:
- Listen for LlmResponseRequestedEvent from RabbitMQ llm-requests queue
- Execute 3-stage RAG pipeline with per-stage timeouts and fallbacks
- Stream tokens in batches of 10 (cuts RabbitMQ message count by about 90%)
- Publish LlmSourcesFoundEvent when RAG retrieves relevant documents
- Retry with linearly increasing backoff (3 attempts: 2s, 4s, 6s delays)
- Publish LlmResponseRetryingEvent on transient failures
- Publish LlmResponseGaveUpEvent on terminal failures:
  - MAX_RETRIES_EXCEEDED (all 3 attempts failed)
  - TIMEOUT (20-minute saga timeout from ChatService)
  - SESSION_DELETED (user closed conversation mid-generation)
- Support explicit cancellation via CancelLlmGenerationEvent
- Clean up cancellation tokens via LlmCancellationRegistry (prevents memory leaks)
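The retry behaviour can be sketched as a loop over a fixed delay schedule. This TypeScript sketch is one possible reading of the list above, not the worker's actual C# code: `generateWithRetry`, `Sleep`, and `RetryResult` are hypothetical names, and the assumption is that each failed attempt emits a retrying notification and then waits its delay (2s, 4s, 6s) before the next step.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

type RetryResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: "MAX_RETRIES_EXCEEDED" };

// Runs `attempt` once per entry in `delaysMs`. After each failure it reports
// the retry number and waits the matching delay. `sleep` is injectable so the
// schedule can be exercised without real waiting.
async function generateWithRetry<T>(
  attempt: () => Promise<T>,
  onRetrying: (retryNumber: number) => void,
  sleep: Sleep = realSleep,
  delaysMs: number[] = [2000, 4000, 6000],
): Promise<RetryResult<T>> {
  for (let i = 0; i < delaysMs.length; i++) {
    try {
      return { ok: true, value: await attempt() };
    } catch {
      onRetrying(i + 1); // stands in for publishing LlmResponseRetryingEvent
      await sleep(delaysMs[i]);
    }
  }
  // Stands in for publishing LlmResponseGaveUpEvent.
  return { ok: false, reason: "MAX_RETRIES_EXCEEDED" };
}
```

A real worker would also distinguish transient from terminal errors and honour cancellation; both are omitted here for brevity.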
Token Batching Implementation:
// Accumulates 10 tokens before publishing to reduce network overhead
if (batchTokenCount % 10 == 0)
{
await context.PublishAsync(new LlmTokensGeneratedEvent(..., currentBatch, batchTokenCount));
currentBatch.Clear();
}
Technology: .NET Worker Service, Wolverine, Microsoft.Extensions.AI, Ollama/OpenAI
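To make the batching arithmetic concrete, here is a small TypeScript sketch of the same idea. It is not a port of the C# handler: `TokenBatcher` is a hypothetical class, and the explicit `flush()` for a final partial batch is an assumption, since the snippet above does not show how leftover tokens are handled.

```typescript
// Collects tokens and hands them to `publish` in groups of `size`.
class TokenBatcher {
  private batch: string[] = [];
  private readonly size: number;
  private readonly publish: (tokens: string[]) => void;

  constructor(size: number, publish: (tokens: string[]) => void) {
    this.size = size;
    this.publish = publish;
  }

  add(token: string): void {
    this.batch.push(token);
    if (this.batch.length >= this.size) this.flush();
  }

  // Call once at end of stream so a partial batch is not lost.
  flush(): void {
    if (this.batch.length === 0) return;
    this.publish(this.batch);
    this.batch = [];
  }
}
```

With a batch size of 10, a 100-token response produces 10 published messages instead of 100, which is where the 90% figure comes from; the payload bytes themselves are not reduced, only the per-message overhead.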
DocumentService.Api:
- Accept document uploads (PDF, DOCX, TXT, Markdown) to MinIO
- List, retrieve, and delete documents (all endpoints enforce UserId authorization)
- Publish DocumentUploadedEvent to trigger async processing
DocumentIngestion.Worker:
- Listen for DocumentUploadedEvent from RabbitMQ
- Download from MinIO → Parse with Kreuzberg → Chunk text (~512 tokens/chunk with overlap)
- Generate embeddings with nomic-embed-text (768 dimensions)
- Store in pgvector with scope/userId/sessionId metadata
- Publish success (DocumentIndexedEvent) or failure (DocumentIndexingFailedEvent) events
Technology: ASP.NET Core 10, MinIO SDK, Kreuzberg Parser, pgvector, Ollama Embeddings
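Chunking with overlap can be sketched as a sliding window over tokens. The TypeScript below is illustrative only: `chunkByTokens` is a hypothetical name, whitespace splitting is a crude stand-in for a real tokenizer, and the 64-token overlap is an assumed value (the README only says "with overlap").

```typescript
// Splits text into windows of `chunkSize` tokens, each sharing `overlap`
// tokens with the previous window.
function chunkByTokens(text: string, chunkSize: number = 512, overlap: number = 64): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some tokens twice.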
A lightweight SignalR hub that bridges RabbitMQ events to browser WebSocket connections. Stateless design — no server-side session state beyond SignalR connection tracking.
Key Responsibilities:
- Authenticate SignalR connections via Keycloak JWT (same token used for HTTP APIs)
- Fan out real-time messages to specific users via Clients.User(userId):
  - ReceiveToken: streaming token batches (10 tokens per event)
  - ReceiveSources: RAG source documents found
  - ReceiveCompleted: AI response finished (includes sources if RAG was used)
  - ReceiveGaveUp: AI generation failed (with reason code)
  - ReceiveRetrying: AI retrying after transient error
  - ReceiveTitleUpdated: session title auto-generated
  - ReceiveSummaryUpdated: session summary updated
Technology: ASP.NET Core SignalR, Wolverine
An Angular 21 SPA built with standalone components and NgRx SignalStore for reactive state management using Angular signals.
Key Features:
- SessionStore and MessageStore: Signals-based reactive state with RxJS interop
- NotificationService: SignalR client (@microsoft/signalr) managing real-time token stream
- KeycloakService: wraps keycloak-js for OAuth2 authentication and automatic token refresh
- ApiInterceptor: attaches Bearer tokens to all HTTP requests
- Optimistic UI: user messages appear immediately (before server confirmation)
- Smart auto-scroll: only scrolls to bottom if user is within 150px of bottom
- Retry UX: displays "Retrying..." message on transient LLM failures
Technology: Angular 21, NgRx SignalStore, Angular Material, TypeScript
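The smart auto-scroll rule reduces to one distance check. A minimal sketch, assuming a hypothetical helper named `shouldAutoScroll` (the actual component code is not shown in this README):

```typescript
// The three DOM properties needed to know how far the user is from the bottom.
interface ScrollMetrics {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

// True when the viewport bottom is within `thresholdPx` of the content bottom.
function shouldAutoScroll(m: ScrollMetrics, thresholdPx: number = 150): boolean {
  const distanceFromBottom = m.scrollHeight - m.scrollTop - m.clientHeight;
  return distanceFromBottom <= thresholdPx;
}
```

Because a DOM element exposes `scrollTop`, `scrollHeight`, and `clientHeight`, the scroll container can be passed in directly; the check should run before new tokens are appended, since appending changes `scrollHeight`.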
| Layer | Technology | Version |
|---|---|---|
| Backend runtime | .NET | 10 |
| Frontend framework | Angular | 21 |
| Message bus | Wolverine + RabbitMQ | 5.21 / 3.x |
| Event store | Marten + PostgreSQL | 8.5+ / 15 |
| Vector database | pgvector (PostgreSQL extension) | 0.7.0 |
| Real-time messaging | ASP.NET Core SignalR | .NET 10 |
| Authentication | Keycloak | 26.1 |
| AI runtime (local) | Ollama (llama3) | latest |
| AI runtime (cloud) | OpenAI API | gpt-4o |
| Embedding model | nomic-embed-text | 768 dims |
| Document parser | Kreuzberg | 4.4.4 |
| Object storage | MinIO (S3-compatible) | latest |
| API gateway | Kong | 3.6 |
| State management | NgRx SignalStore | 21 |
| Container orchestration | Docker Compose | v2 |
- Docker Desktop (v4.x or later)
- .NET 10 SDK — only required for local development outside Docker
- Node.js 22+ and npm 11+ — only required for local frontend development
Performance Note: Ollama runs on CPU if no GPU is available. First-token latency will be 10-30 seconds depending on hardware. Timeouts are configured to 500 seconds to accommodate CPU-only execution. For faster development, consider using OpenAI API (see Configuration).
# Clone and start everything
git clone <repo-url>
cd AiChatPlatform
docker compose up --build -d
On first run, Docker Compose will:
- Start PostgreSQL and initialize three databases (aichat, documents, keycloak)
- Execute rag-schema.sql to create pgvector tables with HNSW index
- Start Keycloak and import the aichat realm with pre-configured test user
- Start RabbitMQ with pre-defined exchanges and queues
- Start MinIO for document storage
- Build and start all five backend services
- Pull the llama3 model into Ollama (several minutes; monitor with docker compose logs ollama-pull -f)
- Build and start the Angular web client
- Start Kong API gateway with declarative configuration
Once all health checks pass, access points:
| URL | Description |
|---|---|
| http://localhost:8000 | Main web application (via Kong) |
| http://localhost:8000/scalar | Interactive API documentation (Scalar UI) |
| http://localhost:8000/hubs/chat | SignalR hub endpoint |
| http://localhost:8080 | Keycloak admin console |
| http://localhost:15672 | RabbitMQ management UI |
| localhost:5432 | PostgreSQL (direct connection over the PostgreSQL protocol, not HTTP) |
| http://localhost:11434 | Ollama API (direct connection) |
| http://localhost:9000 | MinIO console |
| Service | Username | Password |
|---|---|---|
| Web app (test user) | testuser | password |
| Keycloak admin | admin | admin |
| RabbitMQ | rabbitmq | LUUcvHJHv22GE7e |
| PostgreSQL | postgres | postgres |
| MinIO | minioadmin | minioadmin |
Start infrastructure dependencies:
docker compose up postgres rabbitmq keycloak ollama minio -d
Run each service via .NET CLI or Visual Studio:
# ChatService
cd ChatService/ChatService.Api
dotnet run
# NotificationService
cd NotificationService/NotificationService.Api
dotnet run
# AiService Worker
cd AiService/AiService.Worker
dotnet run
# DocumentService
cd DocumentService/DocumentService.Api
dotnet run
# DocumentIngestion Worker
cd DocumentIngestion/DocumentIngestion.Worker
dotnet run
cd WebClient
npm install
npm start # serves at http://localhost:4200
Update WebClient/src/assets/config.json to point to local services:
{
"keycloak": {
"url": "http://localhost:8080",
"realm": "aichat",
"clientId": "aichat-web"
},
"chatApiUrl": "http://localhost:5138/api/chat",
"documentApiUrl": "http://localhost:5027/api/document",
"notificationUrl": "http://localhost:5148"
}
All services use appsettings.json for defaults, overridden by docker-compose.override.yml environment variables.
ChatService / NotificationService:
environment:
- Keycloak__Authority=http://keycloak:8080/realms/aichat
- Keycloak__Audience=account
- RabbitMQ__Uri=amqp://rabbitmq:LUUcvHJHv22GE7e@rabbitmq:5672
- ConnectionStrings__Postgres=Host=postgres;Database=aichat;Username=postgres;Password=postgres
AiService.Worker:
environment:
- Ollama__Endpoint=http://ollama:11434
- Ollama__ModelName=llama3
- RabbitMQ__Uri=amqp://rabbitmq:LUUcvHJHv22GE7e@rabbitmq:5672
- ConnectionStrings__DocumentsDb=Host=postgres;Database=documents;Username=postgres;Password=postgres
Switch to OpenAI (faster than Ollama on CPU):
aiserviceworker:
environment:
- OpenAI__ApiKey=sk-your-key-here
DocumentService / DocumentIngestion.Worker:
environment:
- S3__Endpoint=http://minio:9000
- S3__AccessKey=minioadmin
- S3__SecretKey=minioadmin
- S3__Bucket=documents
- ConnectionStrings__DocumentsDb=Host=postgres;Database=documents;Username=postgres;Password=postgres
- KreuzbergParser__Endpoint=http://kreuzberg:8000
All API calls route through Kong at http://localhost:8000. Every endpoint requires a valid Keycloak Bearer token in the Authorization header. Interactive documentation is available at http://localhost:8000/scalar.
| Method | Path | Description |
|---|---|---|
| POST | /api/chat/start | Create a new chat session |
| POST | /api/chat/message | Send a user message (response streams via SignalR) |
| POST | /api/chat/close | Close and archive a session |
| GET | /api/chat/user/conversations | List all sessions for authenticated user |
| GET | /api/chat/conversation/{sessionId} | Get session metadata |
| GET | /api/chat/conversation/{sessionId}/messages | Get all messages in a session |
Example: Start a session
POST /api/chat/start
Authorization: Bearer <jwt-token>
{ "title": "My first chat" }
→ 202 Accepted
{ "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6" }
Example: Send a message
POST /api/chat/message
Authorization: Bearer <jwt-token>
{ "sessionId": "3fa85f64-...", "content": "What is event sourcing?" }
→ 202 Accepted
The AI response streams back via SignalR; the HTTP response is just an acknowledgement.
| Method | Path | Description |
|---|---|---|
| POST | /api/document/upload | Upload a document (PDF/DOCX/TXT/MD) |
| GET | /api/document | List all documents for authenticated user |
| GET | /api/document/{id} | Get document metadata |
| GET | /api/document/{id}/status | Get indexing status (pending/indexed/failed) |
| DELETE | /api/document/{id} | Delete document and all chunks from vector store |
Example: Upload a document
curl -X POST http://localhost:8000/api/document/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@Q3_Report.pdf" \
-F "scope=user" \
-F "sessionId=3fa85f64-..."
→ 202 Accepted
{ "id": "7b2c91f3-..." }
Connect to /hubs/chat with Bearer token in query string or headers. The hub pushes these events:
| Event | Payload | Description |
|---|---|---|
| ReceiveToken | { requestId, sessionId, token } | Batched tokens (10 per event) |
| ReceiveSources | { requestId, sessionId, sources: string[] } | RAG found relevant documents |
| ReceiveCompleted | { requestId, sessionId, sources?: string[] } | AI response finished |
| ReceiveGaveUp | { requestId, sessionId, reason } | AI failed (reason: LLM_ERROR, TIMEOUT, MAX_RETRIES_EXCEEDED, SESSION_DELETED) |
| ReceiveRetrying | { requestId, sessionId } | AI retrying after transient error |
| ReceiveTitleUpdated | { sessionId, newTitle } | Session title auto-generated |
| ReceiveSummaryUpdated | { sessionId, newSummary } | Session summary updated |
1. User types message and clicks Send
└─► Angular MessageStore.sendMessage()
└─► POST /api/chat/message (HTTP via Kong → ChatService)
2. ChatService.SendMessageHandler
└─► Creates MessageAggregate
└─► Appends MessageCreatedEvent to Marten event stream
└─► Marten MessageProjection stores MessageDto (inline, synchronous)
3. ConversationSaga receives MessageCreatedEvent (via Wolverine subscription)
└─► If not processing: calls PromptBuilder.BuildAsync()
└─► Queries last 20 MessageDtos ordered by SentAt
└─► Builds List<ChatTurn> for conversation history
└─► Publishes LlmResponseRequestedEvent → RabbitMQ "llm-requests"
└─► Schedules 20-minute durable timeout (ConversationProcessingTimeout)
└─► If already processing: enqueues message.Id in PendingMessageIds queue
4. AiService.Worker.GenerateAiResponseHandler receives LlmResponseRequestedEvent
└─► Stage 1 (RAG Decision): ShouldInvokeAsync() → YES/NO (500s timeout)
└─► If NO: skip Stages 2 and 3 and go straight to generation (IChatClient call below)
└─► Stage 2 (Query Rewrite): BuildQueryAsync() resolves pronouns/refs (500s timeout)
└─► Stage 3 (Retrieval): ExecuteAsync() → pgvector HNSW search (10s timeout)
└─► Publishes LlmSourcesFoundEvent if sources found
└─► Calls IChatClient.GetStreamingResponseAsync() with RAG context injected
└─► Every 10 tokens: publishes batched LlmTokensGeneratedEvent → RabbitMQ
└─► On completion: publishes LlmResponseCompletedEvent (includes sources)
└─► On failure: publishes LlmResponseRetryingEvent (retry 1-3) then LlmResponseGaveUpEvent
5. NotificationService receives LlmTokensGeneratedEvent (from RabbitMQ)
└─► Looks up user's SignalR connectionId
└─► Calls Clients.User(userId).SendAsync("ReceiveToken", { token: "batch..." })
6. Angular NotificationService receives "ReceiveToken" (via SignalR WebSocket)
└─► MessageStore.appendToken(token)
└─► streamingContent signal updates
└─► Message list component re-renders automatically (Angular signals)
7. ConversationSaga receives LlmResponseCompletedEvent
└─► Creates MessageAggregate (Role=Assistant, Sources from RAG)
└─► Saves to Marten event stream
└─► If first assistant response: triggers session title generation
└─► If turnCount % 10 == 0: triggers rolling summary
└─► If PendingMessageIds.Count > 0: dequeues next message, starts processing
8. NotificationService receives LlmResponseCompletedEvent
└─► Sends "ReceiveCompleted" to client (with sources array)
9. Angular receives "ReceiveCompleted"
└─► MessageStore.finalizeStream()
└─► Moves streamingContent to messages array
└─► Displays RAG sources as chips below message
└─► Sets isStreaming = false, re-enables input field
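Steps 6 and 9 on the client side can be modelled as two pure state transitions. This is a hedged TypeScript sketch, not the project's NgRx SignalStore code: the `StreamState` shape and the standalone `appendToken` / `finalizeStream` functions are assumptions modelled on the method names in the flow above.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  sources: string[];
}

interface StreamState {
  messages: ChatMessage[];
  streamingContent: string; // text received so far for the in-flight response
  isStreaming: boolean;
}

// Step 6: each ReceiveToken event appends one batch of tokens.
function appendToken(state: StreamState, tokenBatch: string): StreamState {
  return {
    ...state,
    streamingContent: state.streamingContent + tokenBatch,
    isStreaming: true,
  };
}

// Step 9: ReceiveCompleted moves the streamed text into the message list.
function finalizeStream(state: StreamState, sources: string[] = []): StreamState {
  const messages: ChatMessage[] =
    state.streamingContent.length > 0
      ? [...state.messages, { role: "assistant", content: state.streamingContent, sources }]
      : state.messages;
  return { messages, streamingContent: "", isStreaming: false };
}
```

Returning new objects instead of mutating keeps the transitions compatible with signal-based stores, where a changed reference is what triggers re-rendering.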
- pgvector HNSW Index: Configured in Postgres/rag-schema.sql
  CREATE INDEX idx_document_chunks_embedding ON rag.document_chunks USING hnsw (embedding vector_cosine_ops);
  - Approximate nearest neighbor search (much faster than exact)
  - Optimized for cosine similarity metric
- Token Batching: GenerateAiResponseHandler.cs line 2211
  - Publishes every 10 tokens instead of every token
  - 90% reduction in RabbitMQ message volume
  - Reduces network overhead and deserialization CPU cost
- EF Core AsNoTracking: Used in all EF Core read queries
  context.Documents.AsNoTracking()...
  - Avoids change tracking overhead for read-only operations
- Composite Indexes: rag-schema.sql line 8343
  CREATE INDEX idx_document_chunks_scope ON rag.document_chunks (scope, user_id, session_id);
  - Optimizes scope-based filtering in RAG retrieval
- RAG Decision Layer: RagTool.ShouldInvokeAsync()
  - Binary classifier avoids unnecessary vector search
  - Reduces latency and LLM token cost for general knowledge queries
- Query Rewriting: RagTool.BuildQueryAsync()
  - Resolves pronouns, temporal references, coreferences
  - Improves vector search accuracy (better semantic match)
- Inline Projections: Marten ConversationProjection and MessageProjection
  - Synchronous read model updates (no eventual consistency delay)
  - Suitable for moderate message volumes (tested up to 10k messages per session)
Extended timeouts for Ollama running on CPU (no GPU):
- RAG decision: 500 seconds (line 2249)
- Query rewrite: 500 seconds (line 2264)
- Vector retrieval: 10 seconds (line 2280)
For production with GPU or cloud AI APIs, reduce to milliseconds:
decisionCts.CancelAfter(TimeSpan.FromMilliseconds(500)); // 0.5 seconds
This project is optimized for local demonstration of architectural patterns. For production deployment, address the following:
- Add Unit and Integration Tests
  - ConversationSaga state transitions
  - Authorization enforcement in handlers
  - RAG pipeline graceful degradation
  - Token batching logic
- Create EF Core Migrations
  cd ChatService/ChatService.Infrastructure
  dotnet ef migrations add Initial
  cd DocumentService/DocumentService.Infrastructure
  dotnet ef migrations add Initial
- Externalize Secrets
  - Use Azure Key Vault, AWS Secrets Manager, or Docker secrets
  - Rotate RabbitMQ, PostgreSQL, and MinIO credentials
  - Store Keycloak client secrets securely
- Enable SSL/TLS
  - Configure SSL certificates in Kong
  - Set RequireHttpsMetadata = true in Keycloak options
  - Use HTTPS for all external endpoints
- Add Health Checks
  - Implement /health endpoints for Kubernetes liveness/readiness probes
  - Monitor RabbitMQ connection health
  - Check PostgreSQL connectivity
- Configure Logging and Monitoring
  - Structured logging with Serilog (include correlation IDs)
  - Distributed tracing with OpenTelemetry
  - Metrics collection (Prometheus + Grafana)
- Implement Circuit Breakers
  - Add Polly circuit breaker for Ollama/OpenAI calls
  - Prevent cascading failures from external API timeouts
- Database Connection Resilience
  - Add retry policies for transient connection failures
  - Configure connection pooling limits
- Event Handler Idempotency
  - Track processed event IDs to handle RabbitMQ redelivery
  - Use unique constraint on message deduplication table
- Shared Database: DocumentService, DocumentIngestion, and AiService all access the documents database
  - Production fix: Migrate to event-driven (publish DocumentChunkIndexed events)
- No Database Migrations: Schema created via SQL scripts
  - Demo rationale: Simplifies initial setup, schema is stable
- Plain Text Secrets in docker-compose.yml
  - Demo rationale: Easy local development, no production deployment
- No SSL/TLS: RequireHttpsMetadata = false
  - Demo rationale: Simplifies Keycloak integration locally
- Extended Timeouts (500s): Optimized for CPU-only Ollama
  - Production fix: Reduce to milliseconds when using GPU or cloud APIs
# Monitor download progress
docker compose logs ollama-pull -f
# If stuck, restart the pull service
docker compose restart ollama-pull
# Verify model is available
docker compose exec ollama ollama list
# Check NotificationService logs
docker compose logs notificationserviceapi -f
# Verify Keycloak token is valid
curl -H "Authorization: Bearer $TOKEN" \
http://localhost:8000/hubs/chat/negotiate
# Test WebSocket connectivity
docker compose exec webclient curl http://notificationserviceapi:8080/hubs/chat
# List all queues
docker compose exec rabbitmq rabbitmqctl list_queues
# Check if definitions were imported
docker compose exec rabbitmq cat /etc/rabbitmq/definitions.json
# Re-import definitions (requires restart)
docker compose restart rabbitmq
-- Connect to PostgreSQL
docker compose exec postgres psql -U postgres -d documents
-- Verify index exists
SELECT indexname FROM pg_indexes
WHERE tablename = 'document_chunks' AND schemaname = 'rag';
-- Check query plan (should show "Index Scan using idx_document_chunks_embedding")
EXPLAIN ANALYZE
SELECT * FROM rag.document_chunks
ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 5;
# Check DocumentIngestion.Worker logs
docker compose logs documentingestionworker -f
# Verify MinIO connectivity
docker compose exec documentingestionworker curl http://minio:9000/minio/health/live
# Check Kreuzberg parser availability
docker compose exec documentingestionworker curl http://kreuzberg:8000/health
# Angular: Use --poll flag for file watcher
cd WebClient
npm start -- --poll
# Verify ASPNETCORE_ENVIRONMENT=Development in launchSettings.json
SessionAggregate (Event Sourcing)
├─ Id: Guid (stream identifier)
├─ UserId: Guid
├─ Title: string (auto-generated after first AI response)
├─ Summary: string (rolling summary every 10 turns)
├─ StartedAt: DateTime
├─ LastActivityAt: DateTime
└─ DeletedAt: DateTime?
Domain Events:
- SessionCreatedEvent
- SessionUpdatedEvent
- SessionDeletedEvent
- SessionTitleUpdatedEvent
- SessionSummaryUpdatedEvent
MessageAggregate (Event Sourcing)
├─ Id: Guid
├─ SessionId: Guid (stream identifier)
├─ SenderId: Guid
├─ Content: string
├─ Role: MessageRole (User=0, Assistant=1, System=2)
├─ Sources: string[] (RAG source documents, nullable)
└─ SentAt: DateTime
Domain Events:
- MessageCreatedEvent
ConversationSaga (Wolverine persistent saga, keyed by SessionId)
├─ Id: Guid (= SessionId)
├─ UserId: Guid
├─ IsProcessing: bool
├─ ActiveRequestId: Guid?
├─ PendingMessageIds: Queue<Guid>
├─ TurnCount: int
└─ HasGeneratedTitle: bool
Marten Inline Projections (always consistent with events):
| Projection | Type | Purpose |
|---|---|---|
| ConversationProjection | MultiStreamProjection<ConversationDto> | Aggregates session + message events into a denormalized read model per session |
| MessageProjection | EventProjection | Creates one MessageDto per MessageCreatedEvent for efficient message list queries |
AiChatPlatform/
├── AiService/
│ ├── AiService.Application/
│ │ ├── Handlers/
│ │ │ ├── GenerateAiResponseHandler.cs # RAG pipeline + token batching
│ │ │ ├── CancelLlmGenerationHandler.cs
│ │ │ └── SummarizeConversationHandler.cs
│ │ ├── Services/
│ │ │ ├── ILlmService.cs
│ │ │ ├── IRagTool.cs
│ │ │ └── LlmCancellationRegistry.cs # CTS lifecycle management
│ │ └── Dtos/
│ │ ├── DocumentChunkDto.cs
│ │ └── RagToolResult.cs
│ ├── AiService.Infrastructure/
│ │ ├── Options/
│ │ │ ├── AiPromptOptions.cs # System prompts for RAG/summarization
│ │ │ └── ChatOptionsFactory.cs
│ │ ├── Services/
│ │ │ ├── OllamaLlmService.cs
│ │ │ ├── RagTool.cs # 3-stage RAG implementation
│ │ │ └── PgVectorRetrievalService.cs # Vector similarity search
│ │ └── Persistence/
│ │ ├── AiDbContext.cs # Read-only access to documents.rag
│ │ └── DocumentChunkEntity.cs
│ └── AiService.Worker/
│ ├── Program.cs # Wolverine + RabbitMQ configuration
│ └── Worker.cs
│
├── ChatService/
│ ├── ChatService.Api/
│ │ ├── Controllers/ChatController.cs
│ │ └── Extensions/ClaimsPrincipalExtensions.cs
│ ├── ChatService.Application/
│ │ ├── Features/ # CQRS handlers (commands + queries)
│ │ │ ├── StartChat/
│ │ │ ├── SendMessage/
│ │ │ ├── CloseConversation/
│ │ │ ├── GetConversation/
│ │ │ ├── GetMessages/
│ │ │ └── ListUserConversations/
│ │ ├── Sagas/
│ │ │ └── ConversationSaga.cs # Orchestration + timeout handling
│ │ ├── Services/
│ │ │ └── PromptBuilder.cs # Conversation history builder
│ │ └── Dtos/
│ │ ├── ConversationDto.cs
│ │ └── MessageDto.cs
│ ├── ChatService.Domain/
│ │ ├── Session/
│ │ │ ├── SessionAggregate.cs
│ │ │ └── Events/
│ │ └── Message/
│ │ ├── MessageAggregate.cs
│ │ └── Events/MessageCreatedEvent.cs
│ └── ChatService.Infrastructure/
│ ├── EventStore/
│ │ ├── MartenEventStoreRepository.cs
│ │ └── MartenReadOnlyEventStore.cs
│ ├── Projections/
│ │ ├── ConversationProjection.cs # MultiStreamProjection
│ │ └── MessageProjection.cs
│ └── WolverineMartenConfiguration.cs # Retry policies, RabbitMQ routing
│
├── DocumentService/
│ ├── DocumentService.Api/
│ │ └── Controllers/DocumentController.cs
│ ├── DocumentService.Application/
│ │ ├── Features/
│ │ │ ├── UploadDocument/
│ │ │ └── DeleteDocument/
│ │ └── Services/IDocumentRepository.cs
│ └── DocumentService.Infrastructure/
│ ├── Persistence/DocumentDbContext.cs # EF Core (documents.public)
│ ├── Repositories/DocumentRepository.cs
│ └── Services/S3StorageService.cs
│
├── DocumentIngestion/
│ ├── DocumentIngestion.Application/
│ │ ├── Handlers/
│ │ │ ├── DocumentUploadedHandler.cs # Parse → Embed → Store
│ │ │ └── DocumentDeletedHandler.cs
│ │ └── Services/
│ │ ├── IChunkingService.cs
│ │ ├── IEmbeddingService.cs
│ │ └── IVectorStoreRepository.cs
│ ├── DocumentIngestion.Infrastructure/
│ │ ├── Parsers/KreuzbergDocumentParser.cs
│ │ ├── Services/
│ │ │ ├── OllamaEmbeddingService.cs
│ │ │ └── S3StorageService.cs
│ │ ├── Repositories/PgVectorRepository.cs
│ │ └── Persistence/
│ │ ├── IngestionDbContext.cs # EF Core (documents.rag)
│ │ └── DocumentChunkEntity.cs
│ └── DocumentIngestion.Worker/
│ └── Program.cs
│
├── NotificationService/
│ ├── NotificationService.Api/
│ │ ├── ChatHub.cs # SignalR hub
│ │ └── Services/SignalRNotificationService.cs
│ └── NotificationService.Application/
│ ├── Handlers/ # RabbitMQ event → SignalR forwarding
│ │ ├── LlmTokenGeneratedHandler.cs
│ │ ├── LlmSourcesFoundHandler.cs
│ │ ├── LlmResponseCompletedHandler.cs
│ │ ├── LlmResponseGaveUpHandler.cs
│ │ ├── LlmResponseRetryingHandler.cs
│ │ ├── SessionTitleUpdatedNotificationHandler.cs
│ │ └── SessionSummaryUpdatedNotificationHandler.cs
│ └── Services/INotificationService.cs
│
├── WebClient/ # Angular 21 SPA
│ └── src/app/
│ ├── core/
│ │ ├── api/
│ │ │ ├── api.interceptor.ts
│ │ │ └── chat.service.ts
│ │ ├── auth/
│ │ │ ├── auth.guard.ts
│ │ │ └── keycloak.service.ts
│ │ ├── config/config.service.ts
│ │ └── signalr/notification.service.ts
│ ├── features/
│ │ ├── chat/
│ │ │ ├── chat.component.ts
│ │ │ ├── message-input/
│ │ │ └── message-list/
│ │ └── sessions/
│ │ ├── session-item/
│ │ └── session-list/
│ ├── models/
│ │ ├── message.model.ts
│ │ └── session.model.ts
│ ├── store/
│ │ ├── message.store.ts # NgRx SignalStore
│ │ └── session.store.ts
│ └── shared/
│ ├── confirm-dialog/
│ └── new-chat-dialog/
│
├── BuildingBlocks/
│ ├── BuildingBlocks.Contracts/ # Shared event contracts
│ │ ├── LlmEvents/
│ │ ├── SessionEvents/
│ │ ├── DocumentEvents/
│ │ └── Models/ChatTurn.cs
│ └── BuildingBlocks.Core/ # DDD base classes
│ ├── BaseAggregate.cs
│ ├── IEventStoreRepository.cs
│ └── IReadOnlyEventStore.cs
│
├── Kong/kong.yml # Declarative API gateway config
├── Keycloak/realm-export.json # Pre-configured OAuth2 realm
├── Postgres/
│ ├── init-multiple-databases.sh # Multi-database initialization
│ └── rag-schema.sql # pgvector tables + HNSW index
├── RabbitMQ/rabbitmq-definitions.json # Queue/exchange topology
├── docker-compose.yml
├── docker-compose.override.yml
└── AiChatPlatform.slnx # .NET solution file
RabbitMQ Queue Topology (pre-configured in rabbitmq-definitions.json):
| Queue | Producer | Consumer |
|---|---|---|
| llm-requests | ChatService (Saga) | AiService.Worker |
| llm-summarization | ChatService (Saga) | AiService.Worker |
| llm-tokens.notificationservice | AiService.Worker | NotificationService |
| llm-completed.chatservice | AiService.Worker | ChatService (Saga) |
| llm-completed.notificationservice | AiService.Worker | NotificationService |
| llm-gave-up.chatservice | AiService.Worker | ChatService (Saga) |
| llm-gave-up.notificationservice | AiService.Worker | NotificationService |
| llm-retrying.notificationservice | AiService.Worker | NotificationService |
| document-uploads | DocumentService | DocumentIngestion.Worker |
| document-indexed | DocumentIngestion.Worker | NotificationService |
| session-notifications | ChatService | NotificationService (fanout exchange) |
Marten Configuration:
- Event store mode: Inline projections (synchronous, always consistent)
- Async daemon: Configured in HotCold mode (unused unless async projections added)
- Optimistic concurrency: Retry policy with jittered cooldown (50ms, 100ms, 250ms)
Keycloak Realm:
- Auto-imported from realm-export.json on first start
- Client: aichat-web (Authorization Code + PKCE flow)
- Test user: testuser / password
- Realm reset: Delete keycloak-data volume, restart container
pgvector Extension:
- Enabled via init-multiple-databases.sh when PostgreSQL starts
- HNSW index created by rag-schema.sql (approximate nearest neighbor)
- Requires PostgreSQL 15+ with pgvector extension installed
Stopping Services:
docker compose down # Stop containers, preserve volumes
docker compose down -v # Stop containers, delete all volumes (full reset)
License: MIT
This is a portfolio/demonstration project showcasing modern .NET microservices architecture. Issues and pull requests are welcome for bug fixes and improvements.
Before submitting a PR:
- Add tests for new functionality
- Ensure all services build and start via docker compose up
- Update this README if adding features or changing behavior
Built as a demonstration of production-grade .NET microservices architecture with AI integration 🚀