From 95698a3241a56f6c973659b8e96ac983579e51ef Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 00:45:15 +0530 Subject: [PATCH 01/11] docs: add RAG service documentation and deployment guide --- changes/unreleased/Added-20260422-004204.yaml | 3 + docs/services/index.md | 5 +- docs/services/rag.md | 1218 +++++++++++++++++ 3 files changed, 1224 insertions(+), 2 deletions(-) create mode 100644 changes/unreleased/Added-20260422-004204.yaml create mode 100644 docs/services/rag.md diff --git a/changes/unreleased/Added-20260422-004204.yaml b/changes/unreleased/Added-20260422-004204.yaml new file mode 100644 index 00000000..4b8ad287 --- /dev/null +++ b/changes/unreleased/Added-20260422-004204.yaml @@ -0,0 +1,3 @@ +kind: Added +body: Add RAG service with hybrid vector + LLM search +time: 2026-04-22T00:42:04.283582+05:30 diff --git a/docs/services/index.md b/docs/services/index.md index 5267e180..406ad457 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -15,8 +15,9 @@ the following service types: - The [pgEdge Postgres MCP Server](mcp.md) connects AI agents and LLM-powered applications to your database, enabling natural language queries and AI-powered data access. -- The pgEdge RAG Server *(coming soon)* enables retrieval-augmented - generation workflows using your database as a knowledge store. +- The [pgEdge RAG Server](rag.md) enables retrieval-augmented generation + workflows using your database as a knowledge store, returning + LLM-synthesized answers grounded in your data. - [PostgREST](postgrest.md) automatically generates a REST API from your PostgreSQL schema, making your data accessible over HTTP without writing backend code. diff --git a/docs/services/rag.md b/docs/services/rag.md new file mode 100644 index 00000000..e80a98fc --- /dev/null +++ b/docs/services/rag.md @@ -0,0 +1,1218 @@ +# pgEdge RAG Server + +The RAG (Retrieval-Augmented Generation) service runs an intelligent query +server alongside your database. 
The service uses vector and keyword search +to retrieve relevant document chunks from PostgreSQL and synthesizes +LLM-generated answers based on the retrieved context. For more information, +see the [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) +project. + +## Overview + +The Control Plane provisions a RAG service container on each specified +host. The service connects to the database using an existing user specified +in the `connect_as` field (which must be defined in `database_users`). The +credentials are automatically embedded in the service configuration by the +Control Plane. Client applications submit natural language queries to the +service, which performs hybrid vector and keyword search against document +tables and returns LLM-synthesized answers with source citations. + +See [Managing Services](managing.md) for instructions on adding, +updating, and removing services. The sections below cover RAG-specific +configuration. + +## Database Prerequisites + +Before deploying a RAG service, your PostgreSQL database must have: + +1. **pgvector extension** installed and enabled +2. **Document table(s)** with text and vector columns +3. **HNSW index** on vector columns for fast similarity search +4. **GIN index** on text columns for keyword search (BM25) + +The Control Plane can automatically provision all of these during database +creation using the `scripts.post_database_create` hook. See [Preparing the +Database](#preparing-the-database) for a complete example. Alternatively, +you can provision these manually after database creation. 
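Conceptually, the vector search these prerequisites enable ranks chunks by
cosine distance between the query embedding and each stored embedding; the
HNSW index only accelerates that ranking. A minimal pure-Python sketch — the
three-dimensional vectors are illustrative only, real embeddings have
hundreds or thousands of dimensions:

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as computed by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Toy query embedding and stored chunk embeddings (illustrative values).
query = [0.9, 0.1, 0.0]
chunks = {
    "replication doc": [0.8, 0.2, 0.1],
    "billing FAQ": [0.0, 0.3, 0.9],
}

# Rank chunks by ascending distance (most similar first), analogous to
# ORDER BY embedding <=> query_vector LIMIT n.
ranked = sorted(chunks, key=lambda name: cosine_distance(query, chunks[name]))
print(ranked[0])  # the replication doc is closest to the query
```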
+ +## Automation & Responsibilities + +The Control Plane handles certain setup tasks automatically during database +and service creation: + +**Automated (Control Plane)** +- Creating pgvector extension +- Creating document tables and indexes (via `scripts.post_database_create`) +- Embedding RAG service credentials into configuration files +- Deploying RAG container and health monitoring + +**Manual (You Provide)** +- **Schema Design**: Deciding table structure, column names, vector dimensions +- **Embedding Generation**: Using external APIs (OpenAI, Voyage, Ollama) to vectorize documents +- **Document Loading**: Inserting documents and embeddings into the database +- **API Credentials**: Providing LLM and embedding provider API keys + +## Configuration Reference + +All configuration fields are provided in the `config` object of the +service spec. + +### Service Connection + +The `connect_as` field (at the service level) specifies which database user +the RAG service will authenticate as. This user **must already be defined** in +the `database_users` array when creating the database. The Control Plane +automatically embeds that user's credentials in the service configuration. + +Example: +```json +{ + "service_id": "rag", + "service_type": "rag", + "connect_as": "app_read_only", + "config": { ... } +} +``` + +In this example, `app_read_only` must be defined in `database_users`: +```json +{ + "username": "app_read_only", + "password": "your_password", + "attributes": ["LOGIN"] +} +``` + +### Pipeline Configuration + +The `pipelines` array (required) defines one or more RAG workflows. Each +pipeline specifies which tables to search, which embedding provider to use, +and which LLM to use for answer generation. + +| Field | Type | Description | +|---|---|---| +| `pipelines[].name` | string | Required. Pipeline identifier used in query URLs. Lowercase alphanumeric, hyphens, and underscores. Must not start with a hyphen. | +| `pipelines[].description` | string | Optional. 
Human-readable pipeline description. | +| `pipelines[].tables[]` | array | Required. Array of table specifications. See [Table Configuration](#table-configuration). | +| `pipelines[].embedding_llm` | object | Required. Embedding provider config. See [Embedding Configuration](#embedding-configuration). | +| `pipelines[].rag_llm` | object | Required. LLM provider config. See [LLM Configuration](#llm-configuration). | +| `pipelines[].token_budget` | integer | Optional. Max tokens for context documents sent to the LLM. | +| `pipelines[].top_n` | integer | Optional. Number of documents to retrieve per query. | +| `pipelines[].system_prompt` | string | Optional. Custom system prompt prepended to every LLM request for this pipeline. | +| `pipelines[].search` | object | Optional. Search behavior settings. See [Search Configuration](#search-configuration). | + +### Embedding Configuration + +The `embedding_llm` object configures the embedding provider used to +vectorize each incoming query. The embedding vector is then used for +similarity search against stored document vectors. All required fields +must be set; `api_key` is not required for `ollama`. + +| Field | Type | Description | +|---|---|---| +| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | +| `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). | +| `api_key` | string | API key for the provider. Required for `openai`, `voyage`, and `anthropic`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for the provider API. For `ollama`, defaults to `http://localhost:11434`. | + +### LLM Configuration + +The `rag_llm` object configures the LLM provider used to synthesize the +final answer from retrieved documents. `api_key` is required for all +providers except `ollama`. + +| Field | Type | Description | +|---|---|---| +| `provider` | string | Required. The LLM provider. 
One of: `anthropic`, `openai`, `ollama`. | +| `model` | string | Required. The model name (e.g., `claude-sonnet-4-20250514`, `gpt-4o`, `llama3.2`). | +| `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | + +!!! note + If `embedding_llm` and `rag_llm` share the same provider and both specify + an `api_key`, the values must be identical. The RAG server maintains one + key slot per provider and cannot reconcile two different values. + +### Table Configuration + +Each table in a pipeline specifies how to access document text and +embeddings. + +| Field | Type | Description | +|---|---|---| +| `table` | string | Required. The table or view name containing documents. | +| `text_column` | string | Required. Column name containing the document text. | +| `vector_column` | string | Required. Column name containing the embedding vectors. | +| `id_column` | string | Optional. Column name for document IDs. Defaults to the table's primary key. Required for views. | + +### Search Configuration + +The `search` object tunes how documents are retrieved before being passed +to the LLM. + +| Field | Type | Default | Description | +|---|---|---|---| +| `hybrid_enabled` | boolean | `true` | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to `false` for vector-only search. | +| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0–1.0). Higher values prioritize semantic relevance. | + +### Defaults Configuration + +The optional `defaults` object sets fallback values applied to any pipeline +that does not specify its own `token_budget` or `top_n`. + +| Field | Type | Description | +|---|---|---| +| `defaults.token_budget` | integer | Default max tokens for context documents. Must be a positive integer. 
| +| `defaults.top_n` | integer | Default number of documents to retrieve. Must be a positive integer. | + +## Preparing the Database + +Before deploying a RAG service, you must prepare your PostgreSQL database +with pgvector, document tables, and indexes. The Control Plane automatically +executes these during database creation when you include them in the +`scripts.post_database_create` array in your database specification. + +### Required Schema + +The following SQL statements should be included in `scripts.post_database_create` +to automatically initialize the database schema during creation: + +```sql +-- Enable pgvector extension +CREATE EXTENSION IF NOT EXISTS vector; + +-- Create documents table with embeddings +CREATE TABLE IF NOT EXISTS documents_content_chunks ( + id BIGSERIAL PRIMARY KEY, + content TEXT NOT NULL, + embedding vector(1536), + title TEXT, + source TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +-- HNSW index for vector similarity search +CREATE INDEX IF NOT EXISTS documents_embedding_idx + ON documents_content_chunks USING hnsw (embedding vector_cosine_ops); + +-- GIN index for keyword search (BM25) +CREATE INDEX IF NOT EXISTS documents_content_idx + ON documents_content_chunks USING gin (to_tsvector('english', content)); +``` + +These statements are included as individual entries in the `scripts.post_database_create` +array (see examples below). + +### Vector Dimensions + +Adjust the `vector(N)` dimension based on your embedding model: + +| Provider | Model | Dimensions | +|----------|-------|-----------| +| OpenAI | `text-embedding-3-small` | 1536 | +| OpenAI | `text-embedding-3-large` | 3072 | +| Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | +| Ollama | Varies by model | Check model documentation | + +### Loading Documents + +After the database and RAG service are deployed, you are responsible for +generating embeddings for your documents and loading them into the database. 
+The Control Plane does not automate this step—you must run this process +separately, typically via an external application or scheduled task. + +Here's a Python example using OpenAI to generate embeddings and load documents: + +```python +#!/usr/bin/env python3 +"""Generate embeddings and load documents into the RAG database.""" + +import psycopg2 +from psycopg2.extras import execute_values +from openai import OpenAI +import os +import sys + +# Configuration +OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") +DB_HOST = os.environ.get("DB_HOST", "localhost") +DB_USER = os.environ.get("DB_USER", "admin") +DB_PASSWORD = os.environ.get("DB_PASSWORD", "admin_password") +DB_NAME = os.environ.get("DB_NAME", "knowledge_base") + +def chunk_text(text, chunk_size=500, overlap=50): + """Split text into overlapping chunks.""" + chunks = [] + for i in range(0, len(text), chunk_size - overlap): + chunk = text[i:i + chunk_size] + if chunk.strip(): + chunks.append(chunk) + return chunks + +def generate_embeddings(texts, client): + """Generate embeddings for multiple texts.""" + response = client.embeddings.create( + model="text-embedding-3-small", + input=texts + ) + return [item.embedding for item in response.data] + +# Sample documents +documents = [ + { + "title": "pgEdge Overview", + "content": "pgEdge is a distributed PostgreSQL system...", + "source": "docs" + }, + { + "title": "RAG Guide", + "content": "RAG enables intelligent question-answering systems...", + "source": "docs" + } +] + +if not OPENAI_API_KEY: + print("ERROR: OPENAI_API_KEY environment variable not set") + sys.exit(1) + +client = OpenAI(api_key=OPENAI_API_KEY) +conn = psycopg2.connect( + host=DB_HOST, + user=DB_USER, + password=DB_PASSWORD, + database=DB_NAME +) +cur = conn.cursor() + +total_inserted = 0 + +for doc in documents: + print(f"Processing: {doc['title']}") + chunks = chunk_text(doc["content"]) + + if chunks: + # Generate embeddings for all chunks + embeddings = generate_embeddings(chunks, client) 
+ + # Prepare batch insert data + insert_data = [ + (chunk, embedding, doc["title"], doc["source"]) + for chunk, embedding in zip(chunks, embeddings) + ] + + # Batch insert + insert_query = """ + INSERT INTO documents_content_chunks + (content, embedding, title, source) + VALUES %s + """ + execute_values(cur, insert_query, insert_data) + conn.commit() + + inserted = len(insert_data) + total_inserted += inserted + print(f" Inserted {inserted} chunks") + +print(f"\nTotal chunks inserted: {total_inserted}") +cur.close() +conn.close() +``` + +**Usage:** +```bash +pip install psycopg2-binary openai +export OPENAI_API_KEY="sk-..." +export DB_HOST="localhost" +export DB_USER="admin" +export DB_PASSWORD="admin_password" +export DB_NAME="knowledge_base" +python3 load_rag_documents.py +``` + +## Examples + +The following examples show how to configure the RAG service for common +use cases. All examples use the `scripts.post_database_create` field to +automatically provision the database schema (pgvector extension, tables, +and indexes) during database creation. 
+ +### Minimal (OpenAI + Anthropic) + +In the following example, a `curl` command provisions a RAG service with +OpenAI for embeddings and Anthropic Claude for answer generation: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "scripts": { + "post_database_create": [ + "CREATE EXTENSION IF NOT EXISTS vector", + "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)", + "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)", + "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))" + ] + }, + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "description": "Main RAG pipeline", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." 
+ }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +### OpenAI End-to-End + +In the following example, OpenAI is used for both embeddings and answer +generation: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "openai", + "model": "gpt-4o", + "api_key": "sk-..." 
+ } + } + ] + } + } + ] + } + }' + ``` + +### Voyage AI with Vector-Only Search + +In the following example, Voyage AI is used for embeddings and the service +is configured for vector-only search (disabling BM25 keyword matching): + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "voyage", + "model": "voyage-3", + "api_key": "pa-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + }, + "search": { + "hybrid_enabled": false, + "vector_weight": 1.0 + } + } + ] + } + } + ] + } + }' + ``` + +### Ollama (Self-Hosted) + +In the following example, the RAG service uses a self-hosted Ollama server +for both embeddings and answer generation. 
No API key is required; the +Ollama server URL is provided via `base_url`: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "ollama", + "model": "nomic-embed-text", + "base_url": "http://ollama-host:11434" + }, + "rag_llm": { + "provider": "ollama", + "model": "llama3.2", + "base_url": "http://ollama-host:11434" + } + } + ] + } + } + ] + } + }' + ``` + +### Multiple Pipelines with Shared Defaults + +In the following example, two pipelines share default `token_budget` and +`top_n` values set at the `defaults` level: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "defaults": { + "token_budget": 4000, + "top_n": 10 + }, + "pipelines": [ + { + "name": "docs", + "description": "Product documentation", + "tables": [ + { + 
"table": "doc_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + } + }, + { + "name": "support", + "description": "Support ticket history", + "tables": [ + { + "table": "ticket_chunks", + "text_column": "body", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + }, + "top_n": 5 + } + ] + } + } + ] + } + }' + ``` + +## End-to-End Walkthrough + +This section shows the complete flow from database creation to a working +pipeline query. + +### Step 1 — Create the Database + +Include `scripts.post_database_create` to automatically provision the +pgvector schema during database creation. This avoids any manual setup +after deployment. Use a fixed `port` value for the RAG service so the +URL stays stable across container restarts. 
+ +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge-base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + }, + { + "username": "app_read_only", + "password": "readonly_password", + "attributes": ["LOGIN"] + } + ], + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "scripts": { + "post_database_create": [ + "CREATE EXTENSION IF NOT EXISTS vector", + "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)", + "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)", + "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))", + "GRANT SELECT ON documents_content_chunks TO app_read_only" + ] + }, + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "app_read_only", + "config": { + "pipelines": [ + { + "name": "default", + "description": "Main RAG pipeline", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-5", + "api_key": "sk-ant-..." 
+ }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +### Step 2 — Check the Database and Service Status + +Run the following command after ~60–90 seconds to check the database is +ready and the RAG service is running: + +=== "curl" + + ```sh + curl -s http://host-1:3000/v1/databases/knowledge-base + ``` + +In the response, look for two things: + +- `state: "available"` at the top level — the database is provisioned + and healthy +- `service_ready: true` inside `service_instances[].status` — the RAG + container is up and accepting requests + +``` +{ + state: "available" + instances: [ + { + state: "available" + postgres: { + patroni_state: "running" + role: "primary" + } + } + ] + service_instances: [ + { + state: "running" + status: { + service_ready: true + ports: [ + { + container_port: 8080 + host_port: 9200 + name: "tcp" + } + ] + last_health_at: "2026-04-22T10:00:00Z" + } + } + ] +} +``` + +The `host_port` value is the port to use when querying the RAG service. +If you used a fixed `port: 9200` in the service spec, this will always +be `9200`. + +!!! tip + Use a fixed `port` value (e.g. `9200`) in the service spec rather than + `port: 0`. When `port: 0` is used, Docker assigns a random host port + that changes each time the RAG container is replaced (e.g. after an + API key update), requiring you to look up the new port each time. + +### Step 3 — Load Documents + +The RAG service needs documents with embeddings in the database before +it can answer queries. 
The following Python script generates embeddings +using OpenAI and inserts them into `documents_content_chunks`: + +```python +#!/usr/bin/env python3 +import psycopg2 +from psycopg2.extras import execute_values +from openai import OpenAI +import os + +client = OpenAI(api_key=os.environ["OPENAI_API_KEY"]) +conn = psycopg2.connect( + host=os.environ.get("DB_HOST", "host-1"), + port=int(os.environ.get("DB_PORT", "5432")), + user=os.environ.get("DB_USER", "admin"), + password=os.environ.get("DB_PASSWORD", "admin_password"), + database=os.environ.get("DB_NAME", "knowledge_base"), +) +cur = conn.cursor() + +documents = [ + {"title": "My Doc", "content": "Full document text goes here...", "source": "docs"}, +] + +def chunk_text(text, size=500, overlap=50): + return [text[i:i+size] for i in range(0, len(text), size-overlap) if text[i:i+size].strip()] + +for doc in documents: + chunks = chunk_text(doc["content"]) + resp = client.embeddings.create(model="text-embedding-3-small", input=chunks) + embeddings = [item.embedding for item in resp.data] + execute_values(cur, + "INSERT INTO documents_content_chunks (content, embedding, title, source) VALUES %s", + [(c, e, doc["title"], doc["source"]) for c, e in zip(chunks, embeddings)], + ) + conn.commit() + print(f"Loaded {len(chunks)} chunks from '{doc['title']}'") + +cur.close() +conn.close() +``` + +```bash +pip install psycopg2-binary openai +export OPENAI_API_KEY="sk-..." 
+export DB_HOST="host-1" +export DB_USER="admin" +export DB_PASSWORD="admin_password" +export DB_NAME="knowledge_base" +python3 load_documents.py +``` + +Verify documents were inserted: + +```bash +psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \ + -c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;" +``` + +### Step 4 — Query the Pipeline + +```bash +curl -X POST http://host-1:9200/v1/pipelines/default \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How does multi-active replication work?", + "include_sources": true + }' +``` + +A successful response: + +```json +{ + "answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...", + "sources": [ + {"id": "5", "content": "...", "score": 0.82}, + {"id": "1", "content": "...", "score": 0.79} + ], + "tokens_used": 1243 +} +``` + +`sources` is only populated when `include_sources: true` is set in the +request. + +### Step 5 — Update the Service Config + +To update the service (for example, to rotate an API key or change the +LLM model), submit a `POST /v1/databases/{id}` with the complete updated +spec. 
The update endpoint requires all fields — include `database_name`, +`nodes`, `database_users`, and the full `services` array: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases/knowledge-base \ + -H 'Content-Type: application/json' \ + --data '{ + "spec": { + "database_name": "knowledge_base", + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + }, + { + "username": "app_read_only", + "password": "readonly_password", + "attributes": ["LOGIN"] + } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "app_read_only", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-5", + "api_key": "sk-ant-NEW-KEY" + }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +The RAG service container is replaced with the new configuration. Poll +the database status until `state` is `"available"` and `service_ready` +is `true` before sending queries. + +## Querying the RAG Service + +Once the service is running, submit queries to retrieve answers based on +your documents. 
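Any HTTP client can submit queries. As an illustration, here is a small
Python helper built on the request and response fields described in this
section; the base URL and pipeline name are placeholders for your own
deployment:

```python
import json
import urllib.request

def build_query(question, include_sources=False, top_n=None):
    """Request body for POST /v1/pipelines/{name}; only `query` is required."""
    body = {"query": question, "include_sources": include_sources}
    if top_n is not None:
        body["top_n"] = top_n  # per-request override of the pipeline's top_n
    return body

def query_pipeline(base_url, pipeline, question, **opts):
    """Submit a query to the RAG service and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/pipelines/{pipeline}",
        data=json.dumps(build_query(question, **opts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `query_pipeline("http://host-1:9200", "default", "How does RAG
work?", include_sources=True)` returns the parsed body with `answer`,
`tokens_used`, and — because sources were requested — `sources`.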
+ +### List Available Pipelines + +=== "curl" + + ```bash + curl http://localhost:9200/v1/pipelines + ``` + +### Query a Pipeline + +=== "curl" + + ```bash + curl -X POST http://localhost:9200/v1/pipelines/default \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How does RAG improve LLM responses?", + "include_sources": true + }' + ``` + +### Request Fields + +| Field | Type | Default | Description | +|---|---|---|---| +| `query` | string | — | Required. The natural language question to answer. | +| `include_sources` | boolean | `false` | Return the source documents used to generate the answer. | +| `top_n` | integer | — | Override the pipeline's `top_n` for this request. | +| `stream` | boolean | `false` | Stream the answer as Server-Sent Events. | + +### Response Format + +```json +{ + "answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...", + "sources": [ + { + "id": "42", + "content": "The RAG service enables retrieval-augmented generation workflows...", + "score": 0.87 + } + ], + "tokens_used": 1243 +} +``` + +`sources` is only populated when `include_sources` is `true` in the request. + +The RAG service uses **hybrid search**, combining two complementary search +techniques that are merged using **Reciprocal Rank Fusion (RRF)**: + +1. **Vector Similarity Search**: Retrieves documents semantically similar to + the query using cosine distance on embeddings. +2. **BM25 Keyword Search**: Retrieves documents with exact keyword matches + using TF-IDF scoring. + +This combination ensures the LLM receives context that is both semantically +relevant and keyword-relevant. Documents appearing in both result sets receive +higher scores, naturally prioritizing highly-relevant results. 
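The fusion step can be sketched as follows. This is a simplified
illustration of weighted RRF, not the server's exact implementation; the
constant `k = 60` is the value conventionally used in the RRF literature
and is assumed here:

```python
def rrf_merge(vector_ranked, bm25_ranked, vector_weight=0.5, k=60):
    """Merge two ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document earns weight / (k + rank) per list it appears in, so a
    document ranked well by both searches accumulates a higher total score
    than one that appears in only a single list.
    """
    scores = {}
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (k + rank)
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - vector_weight) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc 7 is mid-ranked by both searches, yet it outranks documents
# that appear in only one of the two result lists:
print(rrf_merge([3, 7, 1], [7, 9, 3], vector_weight=0.5))
```

Setting `vector_weight` to `1.0` reduces this to pure vector ranking, which
is the effect of disabling hybrid search.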
+ +### Search Configuration + +Configure search behavior in the pipeline: + +```json +"search": { + "hybrid_enabled": true, + "vector_weight": 0.7 +} +``` + +| Parameter | Range | Description | +|-----------|-------|-------------| +| `hybrid_enabled` | `true` / `false` | Enable hybrid search (default: `true`). Set to `false` for vector-only search. | +| `vector_weight` | 0.0–1.0 | Weight for vector search vs BM25 (default: `0.5`). Higher values prioritize semantic relevance. | + +### Token Budget + +The `token_budget` field controls how much context is sent to the LLM: + +- Documents are ranked and packed in order until the budget is exhausted +- The final document is truncated at a sentence boundary (not mid-word) + +Increase the budget to send more context, or decrease it to reduce LLM costs. + +## User-Managed Responsibilities + +You are responsible for: + +1. **Embedding Generation**: Using embedding provider APIs (OpenAI, Voyage AI, + Ollama) to generate vector embeddings for your documents +2. **Document Ingestion**: Loading document text and embeddings into the + `documents_content_chunks` table +3. **API Keys**: Providing credentials for embedding and LLM providers in the + service `config` +4. **Chunking Strategy**: Deciding how to split large documents for optimal + retrieval (e.g., 500-1000 character chunks with overlap) + +The Control Plane handles: + +1. **Schema Provisioning**: Automatically creating pgvector extension, tables, + and indexes via `scripts.post_database_create` during database creation +2. **Service Deployment**: Provisioning and managing the RAG container +3. **Database Credentials**: Automatically embedding the `connect_as` user's + credentials in the service configuration (credentials must be defined in + `database_users` during database creation) +4. 
**Health Monitoring**: Checking service health and restarting on failure

## Troubleshooting

### About Automated Scripts

The `scripts.post_database_create` field executes SQL automatically during
database creation. Some important details:

- **Execution Timing**: Scripts run once, immediately after Spock is initialized
- **Transactional**: All statements execute within a single transaction
- **No Re-Execution**: If you update the database spec later, scripts are not re-run
- **Constraints**: Some SQL commands are not allowed within transactions:
  - `VACUUM` (including `VACUUM ANALYZE`)
  - `CREATE INDEX CONCURRENTLY`
  - `CREATE DATABASE`, `DROP DATABASE`

If a script fails during database creation, you can use `update-database` to
retry after fixing the problematic statement.

### Service Fails to Start

**Check database connectivity:**

```bash
# From host, verify database is accessible
psql -h localhost -U admin -d knowledge_base -c "SELECT 1"
```

**Check user permissions:**

```sql
-- Verify the service user exists and has table access
\du+ admin
\dt documents_content_chunks
```

### Poor Query Results

**Verify documents are loaded:**

```sql
-- Check document count
SELECT COUNT(*) FROM documents_content_chunks;

-- Verify embeddings exist
SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL;
```

**Inspect embedding quality:**

```sql
-- Find documents similar to a test query embedding
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
FROM documents_content_chunks
ORDER BY similarity DESC
LIMIT 5;
```

**Try simpler queries:**

Start with factual, keyword-based questions before complex analytical questions.

### Empty Context Window

If the RAG service returns limited context, the token budget may be exhausted. Increase it:

```json
"token_budget": 8000
```

Or store smaller, more focused document chunks.
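
The budget-packing behavior described under Token Budget can be sketched as follows. This is an illustrative sketch, not the server's actual packing code: token counts are approximated as whitespace-separated words (a real implementation would use the model's tokenizer), and the sentence-boundary cut is a simple search for `". "`.

```python
def pack_context(chunks, token_budget):
    """Pack ranked chunks into a context window, in order, until the
    budget is exhausted. The final, overflowing chunk is cut at the
    last full sentence that still fits.
    """
    packed, used = [], 0
    for chunk in chunks:
        tokens = chunk.split()
        if used + len(tokens) <= token_budget:
            packed.append(chunk)
            used += len(tokens)
            continue
        # Keep only as many words as the remaining budget allows,
        # then back up to the last sentence boundary.
        partial = " ".join(tokens[: token_budget - used])
        cut = partial.rfind(". ")
        if cut != -1:
            packed.append(partial[: cut + 1])
        break
    return "\n\n".join(packed)

# With a 7-word budget, the second chunk is truncated after "five."
print(pack_context(["one two three.", "four five. six seven eight nine."], 7))
```

Raising `token_budget` admits more chunks before the loop breaks; storing smaller, more focused chunks wastes less of the final, truncated slot.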
+ +## Next Steps + +- Once you've validated the RAG service with manual documents, consider automating embedding generation +- Implement document versioning and updates for evolving knowledge bases +- Set up monitoring for query latency and answer quality +- Explore pgedge_vectorizer for automated chunking and embedding in high-volume scenarios + +## Responsibility Summary + +| Step | Who | How | +|---|---|---| +| Provision schema (pgvector, tables, indexes) | Control Plane | `scripts.post_database_create` in database spec | +| Deploy RAG container | Control Plane | Automatic on `POST /v1/databases` | +| Inject database credentials | Control Plane | Automatic via `connect_as` field | +| Health monitoring and restart | Control Plane | Automatic | +| Generate embeddings | You | Call OpenAI / Voyage / Ollama API | +| Load documents into table | You | `INSERT` using psycopg2 or any Postgres client | +| Submit queries | Your application | `POST /v1/pipelines/{name}` on the RAG service | + +## Additional Resources + +- [RAG Server Repository](https://github.com/pgEdge/pgedge-rag-server) +- [RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) +- [pgvector Documentation](https://github.com/pgvector/pgvector) +- [Managing Services](managing.md) From 6b43be60532b517247284056a715ace5d3a520f8 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 01:14:18 +0530 Subject: [PATCH 02/11] addressing AI review comments --- docs/services/rag.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index e80a98fc..bd6acd7d 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -332,9 +332,12 @@ python3 load_rag_documents.py ## Examples The following examples show how to configure the RAG service for common -use cases. All examples use the `scripts.post_database_create` field to -automatically provision the database schema (pgvector extension, tables, -and indexes) during database creation. 
+use cases. The first example includes the complete +`scripts.post_database_create` setup to automatically provision the +database schema (pgvector extension, tables, and indexes). Subsequent +examples focus on service configuration variations and omit the schema +setup for brevity — in production, always include the schema setup from +the first example. ### Minimal (OpenAI + Anthropic) From 453f5a72daa36478d291cef3c6791e9a2ac1ce60 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 19:28:38 +0530 Subject: [PATCH 03/11] addressing review comments --- docs/services/rag.md | 537 +++++++++++++++++-------------------------- 1 file changed, 207 insertions(+), 330 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index bd6acd7d..c8b91141 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -1,21 +1,23 @@ # pgEdge RAG Server -The RAG (Retrieval-Augmented Generation) service runs an intelligent query -server alongside your database. The service uses vector and keyword search -to retrieve relevant document chunks from PostgreSQL and synthesizes -LLM-generated answers based on the retrieved context. For more information, -see the [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) +The RAG (Retrieval-Augmented Generation) service runs an intelligent +query server alongside your database. The service uses vector and +keyword search to retrieve relevant document chunks from PostgreSQL +and synthesizes LLM-generated answers based on the retrieved context. +For more information, see the +[pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) project. ## Overview The Control Plane provisions a RAG service container on each specified -host. The service connects to the database using an existing user specified -in the `connect_as` field (which must be defined in `database_users`). The -credentials are automatically embedded in the service configuration by the -Control Plane. 
Client applications submit natural language queries to the -service, which performs hybrid vector and keyword search against document -tables and returns LLM-synthesized answers with source citations. +host. The service connects to the database using an existing user +specified in the `connect_as` field, which must be defined in +`database_users`. The credentials are automatically embedded in the +service configuration by the Control Plane. Client applications submit +natural language queries to the service, which performs hybrid vector +and keyword search against document tables and returns LLM-synthesized +answers with source citations. See [Managing Services](managing.md) for instructions on adding, updating, and removing services. The sections below cover RAG-specific @@ -23,34 +25,19 @@ configuration. ## Database Prerequisites -Before deploying a RAG service, your PostgreSQL database must have: +Before deploying a RAG service, your PostgreSQL database must have the +following items configured: -1. **pgvector extension** installed and enabled -2. **Document table(s)** with text and vector columns -3. **HNSW index** on vector columns for fast similarity search -4. **GIN index** on text columns for keyword search (BM25) +- pgvector extension installed and enabled. +- document tables with text and vector columns. +- HNSW index on vector columns for fast similarity search. +- GIN index on text columns for keyword search (BM25). -The Control Plane can automatically provision all of these during database -creation using the `scripts.post_database_create` hook. See [Preparing the -Database](#preparing-the-database) for a complete example. Alternatively, -you can provision these manually after database creation. 
- -## Automation & Responsibilities - -The Control Plane handles certain setup tasks automatically during database -and service creation: - -**Automated (Control Plane)** -- Creating pgvector extension -- Creating document tables and indexes (via `scripts.post_database_create`) -- Embedding RAG service credentials into configuration files -- Deploying RAG container and health monitoring - -**Manual (You Provide)** -- **Schema Design**: Deciding table structure, column names, vector dimensions -- **Embedding Generation**: Using external APIs (OpenAI, Voyage, Ollama) to vectorize documents -- **Document Loading**: Inserting documents and embeddings into the database -- **API Credentials**: Providing LLM and embedding provider API keys +The Control Plane can automatically provision all of these during +database creation using the `scripts.post_database_create` hook. See +[Preparing the Database](#preparing-the-database) for a complete +example. Alternatively, you can provision these manually after +database creation. ## Configuration Reference @@ -59,12 +46,15 @@ service spec. ### Service Connection -The `connect_as` field (at the service level) specifies which database user -the RAG service will authenticate as. This user **must already be defined** in -the `database_users` array when creating the database. The Control Plane -automatically embeds that user's credentials in the service configuration. +The `connect_as` field at the service level specifies which database +user the RAG service authenticates as. This user must already be +defined in the `database_users` array when creating the database. The +Control Plane automatically embeds that user's credentials in the +service configuration. 
+ +The following example shows the `connect_as` field in the service +spec: -Example: ```json { "service_id": "rag", @@ -75,6 +65,7 @@ Example: ``` In this example, `app_read_only` must be defined in `database_users`: + ```json { "username": "app_read_only", @@ -85,9 +76,11 @@ In this example, `app_read_only` must be defined in `database_users`: ### Pipeline Configuration -The `pipelines` array (required) defines one or more RAG workflows. Each -pipeline specifies which tables to search, which embedding provider to use, -and which LLM to use for answer generation. +The `pipelines` array (required) defines one or more RAG workflows. +Each pipeline specifies which tables to search, which embedding +provider to use, and which LLM to use for answer generation. + +The following table describes the pipeline configuration fields: | Field | Type | Description | |---|---|---| @@ -108,6 +101,8 @@ vectorize each incoming query. The embedding vector is then used for similarity search against stored document vectors. All required fields must be set; `api_key` is not required for `ollama`. +The following table describes the embedding configuration fields: + | Field | Type | Description | |---|---|---| | `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | @@ -117,26 +112,30 @@ must be set; `api_key` is not required for `ollama`. ### LLM Configuration -The `rag_llm` object configures the LLM provider used to synthesize the -final answer from retrieved documents. `api_key` is required for all -providers except `ollama`. +The `rag_llm` object configures the LLM provider used to synthesize +the final answer from retrieved documents. `api_key` is required for +all providers except `ollama`. + +The following table describes the LLM configuration fields: | Field | Type | Description | |---|---|---| | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | -| `model` | string | Required. 
The model name (e.g., `claude-sonnet-4-20250514`, `gpt-4o`, `llama3.2`). | +| `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | | `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | !!! note - If `embedding_llm` and `rag_llm` share the same provider and both specify - an `api_key`, the values must be identical. The RAG server maintains one - key slot per provider and cannot reconcile two different values. + If `embedding_llm` and `rag_llm` share the same provider and both + specify an `api_key`, the values must be identical. The pgEdge RAG + Server maintains one key slot per provider and cannot reconcile + two different values. ### Table Configuration Each table in a pipeline specifies how to access document text and -embeddings. +embeddings. The following table describes the table configuration +fields: | Field | Type | Description | |---|---|---| @@ -147,18 +146,20 @@ embeddings. ### Search Configuration -The `search` object tunes how documents are retrieved before being passed -to the LLM. +The `search` object tunes how documents are retrieved before being +passed to the LLM. The following table describes the search +configuration fields: | Field | Type | Default | Description | |---|---|---|---| | `hybrid_enabled` | boolean | `true` | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to `false` for vector-only search. | -| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0–1.0). Higher values prioritize semantic relevance. | +| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0-1.0). Higher values prioritize semantic relevance. 
| ### Defaults Configuration -The optional `defaults` object sets fallback values applied to any pipeline -that does not specify its own `token_budget` or `top_n`. +The optional `defaults` object sets fallback values applied to any +pipeline that does not specify its own `token_budget` or `top_n`. The +following table describes the defaults configuration fields: | Field | Type | Description | |---|---|---| @@ -167,14 +168,15 @@ that does not specify its own `token_budget` or `top_n`. ## Preparing the Database -Before deploying a RAG service, you must prepare your PostgreSQL database -with pgvector, document tables, and indexes. The Control Plane automatically -executes these during database creation when you include them in the -`scripts.post_database_create` array in your database specification. +Before deploying a RAG service, you must prepare your PostgreSQL +database with pgvector, document tables, and indexes. The Control +Plane automatically executes these during database creation when you +include them in the `scripts.post_database_create` array in your +database specification. ### Required Schema -The following SQL statements should be included in `scripts.post_database_create` +Include the following SQL statements in `scripts.post_database_create` to automatically initialize the database schema during creation: ```sql @@ -200,12 +202,13 @@ CREATE INDEX IF NOT EXISTS documents_content_idx ON documents_content_chunks USING gin (to_tsvector('english', content)); ``` -These statements are included as individual entries in the `scripts.post_database_create` -array (see examples below). +These statements are included as individual entries in the +`scripts.post_database_create` array (see examples below). ### Vector Dimensions -Adjust the `vector(N)` dimension based on your embedding model: +Adjust the `vector(N)` dimension to match your embedding model. 
The +following table shows common models and their vector dimensions: | Provider | Model | Dimensions | |----------|-------|-----------| @@ -214,135 +217,20 @@ Adjust the `vector(N)` dimension based on your embedding model: | Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | | Ollama | Varies by model | Check model documentation | -### Loading Documents - -After the database and RAG service are deployed, you are responsible for -generating embeddings for your documents and loading them into the database. -The Control Plane does not automate this step—you must run this process -separately, typically via an external application or scheduled task. - -Here's a Python example using OpenAI to generate embeddings and load documents: - -```python -#!/usr/bin/env python3 -"""Generate embeddings and load documents into the RAG database.""" - -import psycopg2 -from psycopg2.extras import execute_values -from openai import OpenAI -import os -import sys - -# Configuration -OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") -DB_HOST = os.environ.get("DB_HOST", "localhost") -DB_USER = os.environ.get("DB_USER", "admin") -DB_PASSWORD = os.environ.get("DB_PASSWORD", "admin_password") -DB_NAME = os.environ.get("DB_NAME", "knowledge_base") - -def chunk_text(text, chunk_size=500, overlap=50): - """Split text into overlapping chunks.""" - chunks = [] - for i in range(0, len(text), chunk_size - overlap): - chunk = text[i:i + chunk_size] - if chunk.strip(): - chunks.append(chunk) - return chunks - -def generate_embeddings(texts, client): - """Generate embeddings for multiple texts.""" - response = client.embeddings.create( - model="text-embedding-3-small", - input=texts - ) - return [item.embedding for item in response.data] - -# Sample documents -documents = [ - { - "title": "pgEdge Overview", - "content": "pgEdge is a distributed PostgreSQL system...", - "source": "docs" - }, - { - "title": "RAG Guide", - "content": "RAG enables intelligent question-answering systems...", - "source": 
"docs" - } -] - -if not OPENAI_API_KEY: - print("ERROR: OPENAI_API_KEY environment variable not set") - sys.exit(1) - -client = OpenAI(api_key=OPENAI_API_KEY) -conn = psycopg2.connect( - host=DB_HOST, - user=DB_USER, - password=DB_PASSWORD, - database=DB_NAME -) -cur = conn.cursor() - -total_inserted = 0 - -for doc in documents: - print(f"Processing: {doc['title']}") - chunks = chunk_text(doc["content"]) - - if chunks: - # Generate embeddings for all chunks - embeddings = generate_embeddings(chunks, client) - - # Prepare batch insert data - insert_data = [ - (chunk, embedding, doc["title"], doc["source"]) - for chunk, embedding in zip(chunks, embeddings) - ] - - # Batch insert - insert_query = """ - INSERT INTO documents_content_chunks - (content, embedding, title, source) - VALUES %s - """ - execute_values(cur, insert_query, insert_data) - conn.commit() - - inserted = len(insert_data) - total_inserted += inserted - print(f" Inserted {inserted} chunks") - -print(f"\nTotal chunks inserted: {total_inserted}") -cur.close() -conn.close() -``` - -**Usage:** -```bash -pip install psycopg2-binary openai -export OPENAI_API_KEY="sk-..." -export DB_HOST="localhost" -export DB_USER="admin" -export DB_PASSWORD="admin_password" -export DB_NAME="knowledge_base" -python3 load_rag_documents.py -``` - ## Examples -The following examples show how to configure the RAG service for common -use cases. The first example includes the complete +The following examples show how to configure the RAG service for +common use cases. The first example includes the complete `scripts.post_database_create` setup to automatically provision the database schema (pgvector extension, tables, and indexes). Subsequent examples focus on service configuration variations and omit the schema -setup for brevity — in production, always include the schema setup from -the first example. +setup for brevity - in production, always include the schema setup +from the first example. 
### Minimal (OpenAI + Anthropic) -In the following example, a `curl` command provisions a RAG service with -OpenAI for embeddings and Anthropic Claude for answer generation: +In the following example, a `curl` command provisions a RAG service +with OpenAI for embeddings and Anthropic Claude for answer generation: === "curl" @@ -400,7 +288,7 @@ OpenAI for embeddings and Anthropic Claude for answer generation: }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "token_budget": 4000, @@ -416,8 +304,8 @@ OpenAI for embeddings and Anthropic Claude for answer generation: ### OpenAI End-to-End -In the following example, OpenAI is used for both embeddings and answer -generation: +In the following example, OpenAI is used for both embeddings and +answer generation: === "curl" @@ -479,8 +367,9 @@ generation: ### Voyage AI with Vector-Only Search -In the following example, Voyage AI is used for embeddings and the service -is configured for vector-only search (disabling BM25 keyword matching): +In the following example, Voyage AI is used for embeddings and the +service is configured for vector-only search (disabling BM25 keyword +matching): === "curl" @@ -528,7 +417,7 @@ is configured for vector-only search (disabling BM25 keyword matching): }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "search": { @@ -546,9 +435,9 @@ is configured for vector-only search (disabling BM25 keyword matching): ### Ollama (Self-Hosted) -In the following example, the RAG service uses a self-hosted Ollama server -for both embeddings and answer generation. No API key is required; the -Ollama server URL is provided via `base_url`: +In the following example, the RAG service uses a self-hosted Ollama +server for both embeddings and answer generation. 
No API key is +required; the Ollama server URL is provided via `base_url`: === "curl" @@ -610,8 +499,8 @@ Ollama server URL is provided via `base_url`: ### Multiple Pipelines with Shared Defaults -In the following example, two pipelines share default `token_budget` and -`top_n` values set at the `defaults` level: +In the following example, two pipelines share default `token_budget` +and `top_n` values set at the `defaults` level: === "curl" @@ -664,7 +553,7 @@ In the following example, two pipelines share default `token_budget` and }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." } }, @@ -685,7 +574,7 @@ In the following example, two pipelines share default `token_budget` and }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "top_n": 5 @@ -698,12 +587,12 @@ In the following example, two pipelines share default `token_budget` and }' ``` -## End-to-End Walkthrough +## Deployment Guide -This section shows the complete flow from database creation to a working -pipeline query. +This section shows the complete flow from database creation to a +working pipeline query. -### Step 1 — Create the Database +### Step 1 - Create the Database Include `scripts.post_database_create` to automatically provision the pgvector schema during database creation. This avoids any manual setup @@ -786,10 +675,10 @@ URL stays stable across container restarts. 
}' ``` -### Step 2 — Check the Database and Service Status +### Step 2 - Check the Database and Service Status -Run the following command after ~60–90 seconds to check the database is -ready and the RAG service is running: +Run the following command after approximately 60-90 seconds to check +that the database is ready and the RAG service is running: === "curl" @@ -797,14 +686,14 @@ ready and the RAG service is running: curl -s http://host-1:3000/v1/databases/knowledge-base ``` -In the response, look for two things: +In the response, look for the following items: -- `state: "available"` at the top level — the database is provisioned - and healthy -- `service_ready: true` inside `service_instances[].status` — the RAG - container is up and accepting requests +- `state: "available"` at the top level - the database is provisioned + and healthy. +- `service_ready: true` inside `service_instances[].status` - the RAG + container is up and accepting requests. -``` +```text { state: "available" instances: [ @@ -835,17 +724,18 @@ In the response, look for two things: } ``` -The `host_port` value is the port to use when querying the RAG service. -If you used a fixed `port: 9200` in the service spec, this will always -be `9200`. +The `host_port` value is the port to use when querying the RAG +service. If you used a fixed `port: 9200` in the service spec, the +host port will always be `9200`. !!! tip - Use a fixed `port` value (e.g. `9200`) in the service spec rather than - `port: 0`. When `port: 0` is used, Docker assigns a random host port - that changes each time the RAG container is replaced (e.g. after an - API key update), requiring you to look up the new port each time. + Use a fixed `port` value (e.g. `9200`) in the service spec rather + than `port: 0`. When `port: 0` is used, Docker assigns a random + host port that changes each time the RAG container is replaced + (e.g. after an API key update), requiring you to look up the new + port each time. 
-### Step 3 — Load Documents +### Step 3 - Load Documents The RAG service needs documents with embeddings in the database before it can answer queries. The following Python script generates embeddings @@ -890,6 +780,9 @@ cur.close() conn.close() ``` +Install the dependencies and run the script with the following +commands: + ```bash pip install psycopg2-binary openai export OPENAI_API_KEY="sk-..." @@ -900,14 +793,16 @@ export DB_NAME="knowledge_base" python3 load_documents.py ``` -Verify documents were inserted: +To verify that documents were inserted, run the following query: ```bash psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \ -c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;" ``` -### Step 4 — Query the Pipeline +### Step 4 - Query the Pipeline + +Send a query to the RAG service using the following command: ```bash curl -X POST http://host-1:9200/v1/pipelines/default \ @@ -918,7 +813,7 @@ curl -X POST http://host-1:9200/v1/pipelines/default \ }' ``` -A successful response: +A successful response looks like this: ```json { @@ -931,15 +826,16 @@ A successful response: } ``` -`sources` is only populated when `include_sources: true` is set in the -request. +`sources` is only populated when `include_sources: true` is set in +the request. -### Step 5 — Update the Service Config +### Step 5 - Update the Service Config -To update the service (for example, to rotate an API key or change the -LLM model), submit a `POST /v1/databases/{id}` with the complete updated -spec. The update endpoint requires all fields — include `database_name`, -`nodes`, `database_users`, and the full `services` array: +To update the service (for example, to rotate an API key or change +the LLM model), submit a `POST /v1/databases/{id}` with the complete +updated spec. The update endpoint requires all fields - include +`database_name`, `nodes`, `database_users`, and the full `services` +array: === "curl" @@ -1006,17 +902,19 @@ spec. 
The update endpoint requires all fields — include `database_name`, }' ``` -The RAG service container is replaced with the new configuration. Poll -the database status until `state` is `"available"` and `service_ready` -is `true` before sending queries. +The RAG service container is replaced with the new configuration. +Poll the database status until `state` is `"available"` and +`service_ready` is `true` before sending queries. ## Querying the RAG Service -Once the service is running, submit queries to retrieve answers based on -your documents. +Once the service is running, submit queries to retrieve answers based +on your documents. ### List Available Pipelines +To list all configured pipelines, send the following request: + === "curl" ```bash @@ -1025,6 +923,9 @@ your documents. ### Query a Pipeline +To submit a query to a pipeline, send a POST request with the query +text: + === "curl" ```bash @@ -1038,15 +939,19 @@ your documents. ### Request Fields +The following table describes the query request fields: + | Field | Type | Default | Description | |---|---|---|---| -| `query` | string | — | Required. The natural language question to answer. | +| `query` | string | - | Required. The natural language question to answer. | | `include_sources` | boolean | `false` | Return the source documents used to generate the answer. | -| `top_n` | integer | — | Override the pipeline's `top_n` for this request. | +| `top_n` | integer | - | Override the pipeline's `top_n` for this request. | | `stream` | boolean | `false` | Stream the answer as Server-Sent Events. | ### Response Format +A successful query response looks like this: + ```json { "answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...", @@ -1061,148 +966,114 @@ your documents. 
} ``` -`sources` is only populated when `include_sources` is `true` in the request. - -The RAG service uses **hybrid search**, combining two complementary search -techniques that are merged using **Reciprocal Rank Fusion (RRF)**: - -1. **Vector Similarity Search**: Retrieves documents semantically similar to - the query using cosine distance on embeddings. -2. **BM25 Keyword Search**: Retrieves documents with exact keyword matches - using TF-IDF scoring. - -This combination ensures the LLM receives context that is both semantically -relevant and keyword-relevant. Documents appearing in both result sets receive -higher scores, naturally prioritizing highly-relevant results. - -### Search Configuration +`sources` is only populated when `include_sources` is `true` in the +request. -Configure search behavior in the pipeline: +The RAG service's hybrid search combines two complementary techniques, +merged using Reciprocal Rank Fusion (RRF): -```json -"search": { - "hybrid_enabled": true, - "vector_weight": 0.7 -} -``` +- vector similarity search, which retrieves documents semantically + similar to the query using cosine distance on embeddings. +- BM25 keyword search, which retrieves documents with exact keyword + matches using TF-IDF scoring. -| Parameter | Range | Description | -|-----------|-------|-------------| -| `hybrid_enabled` | `true` / `false` | Enable hybrid search (default: `true`). Set to `false` for vector-only search. | -| `vector_weight` | 0.0–1.0 | Weight for vector search vs BM25 (default: `0.5`). Higher values prioritize semantic relevance. | +This combination ensures the LLM receives context that is both +semantically relevant and keyword-relevant. Documents appearing in +both result sets receive higher scores, naturally prioritizing +highly-relevant results. 
### Token Budget -The `token_budget` field controls how much context is sent to the LLM: - -- Documents are ranked and packed in order until the budget is exhausted -- The final document is truncated at a sentence boundary (not mid-word) - -Increase the budget to send more context, or decrease it to reduce LLM costs. - -## User-Managed Responsibilities - -You are responsible for: - -1. **Embedding Generation**: Using embedding provider APIs (OpenAI, Voyage AI, - Ollama) to generate vector embeddings for your documents -2. **Document Ingestion**: Loading document text and embeddings into the - `documents_content_chunks` table -3. **API Keys**: Providing credentials for embedding and LLM providers in the - service `config` -4. **Chunking Strategy**: Deciding how to split large documents for optimal - retrieval (e.g., 500-1000 character chunks with overlap) - -The Control Plane handles: - -1. **Schema Provisioning**: Automatically creating pgvector extension, tables, - and indexes via `scripts.post_database_create` during database creation -2. **Service Deployment**: Provisioning and managing the RAG container -3. **Database Credentials**: Automatically embedding the `connect_as` user's - credentials in the service configuration (credentials must be defined in - `database_users` during database creation) -4. **Health Monitoring**: Checking service health and restarting on failure +The `token_budget` field controls how much context is sent to the LLM. +The service ranks documents and packs them in order until the budget +is exhausted. The final document is truncated at a sentence boundary. +Increase the budget to send more context, or decrease it to reduce +LLM costs. ## Troubleshooting +The following sections describe common issues and how to resolve them. + ### About Automated Scripts -The `scripts.post_database_create` field executes SQL automatically during -database creation. 
Some important details: +The `scripts.post_database_create` field executes SQL automatically +during database creation. The following details apply: -- **Execution Timing**: Scripts run once, immediately after Spock is initialized -- **Transactional**: All statements execute within a single transaction -- **No Re-Execution**: If you update the database spec later, scripts are not re-run -- **Constraints**: Some SQL commands are not allowed within transactions: - - `VACUUM`, `ANALYZE` (use `REINDEX` instead) - - `CREATE INDEX CONCURRENTLY` - - `CREATE DATABASE`, `DROP DATABASE` +- Execution timing: scripts run once, immediately after Spock is + initialized +- Transactional: all statements execute within a single transaction +- No re-execution: if you update the database spec later, scripts are + not re-run +- Constraints: some SQL commands are not allowed within transactions, + including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, + `CREATE DATABASE`, and `DROP DATABASE` -If a script fails during database creation, you can use `update-database` to -retry after fixing the problematic statement. +If a script fails during database creation, you can use +`update-database` to retry after fixing the problematic statement. ### Service Fails to Start -**Check database connectivity:** +To diagnose a service that fails to start, check database +connectivity and user permissions. + +To verify that the database is accessible, run the following command: ```bash -# From host, verify database is accessible psql -h localhost -U admin -d knowledge_base -c "SELECT 1" ``` -**Check user permissions:** +To verify that the service user exists and has table access, run the +following query: ```sql --- Verify the service user exists and has table access \du+ admin \dt documents_content_chunks ``` ### Poor Query Results -**Verify documents are loaded:** +To diagnose poor query results, verify that documents are loaded and +embeddings are present. 
+ +To check document counts and embedding coverage, run the following +queries: ```sql --- Check document count SELECT COUNT(*) FROM documents_content_chunks; --- Verify embeddings exist SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL; ``` -**Inspect embedding quality:** +To find documents similar to a test query embedding, run the following +query: ```sql --- Find documents similar to a test query embedding SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity FROM documents_content_chunks ORDER BY similarity DESC LIMIT 5; ``` -**Try simpler queries:** - -Start with factual, keyword-based questions before complex analytical questions. +Start with factual, keyword-based questions before complex analytical +questions to verify that the pipeline is working correctly. ### Empty Context Window -If the RAG service returns limited context, the token budget may be exhausted. Increase it: +If the RAG service returns limited context, the token budget may be +exhausted. Increase the budget in the pipeline configuration: ```json "token_budget": 8000 ``` -Or store smaller, more focused document chunks. - -## Next Steps - -- Once you've validated the RAG service with manual documents, consider automating embedding generation -- Implement document versioning and updates for evolving knowledge bases -- Set up monitoring for query latency and answer quality -- Explore pgedge_vectorizer for automated chunking and embedding in high-volume scenarios +Alternatively, store smaller, more focused document chunks to fit more +context within the budget. ## Responsibility Summary +The following table summarizes which tasks are handled by the Control +Plane and which are your responsibility: + | Step | Who | How | |---|---|---| | Provision schema (pgvector, tables, indexes) | Control Plane | `scripts.post_database_create` in database spec | @@ -1213,9 +1084,15 @@ Or store smaller, more focused document chunks. 
| Load documents into table | You | `INSERT` using psycopg2 or any Postgres client | | Submit queries | Your application | `POST /v1/pipelines/{name}` on the RAG service | -## Additional Resources +## Next Steps + +The following resources provide more information on related topics. -- [RAG Server Repository](https://github.com/pgEdge/pgedge-rag-server) -- [RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) -- [pgvector Documentation](https://github.com/pgvector/pgvector) -- [Managing Services](managing.md) +- The [Managing Services](managing.md) guide describes how to add, + update, and remove services. +- The [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) + repository contains the pgEdge RAG Server source code. +- The [pgEdge RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) + covers the pgEdge RAG Server API and configuration in detail. +- The [pgvector Documentation](https://github.com/pgvector/pgvector) + explains how to install and use the pgvector extension. 
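The responsibility table above notes that you load documents with `INSERT`
statements using psycopg2 or any Postgres client. A minimal loader might
look like the following sketch. It assumes the `documents_content_chunks`
table from the schema setup (a `content` text column, an `embedding` vector
column, and an auto-generated `id`) and an embedding already returned by
your provider; the DSN is a placeholder:

```python
def to_vector_literal(embedding):
    """Render a list of floats as a pgvector input literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def load_chunk(dsn, content, embedding):
    """Insert one document chunk together with its embedding.

    Assumes the documents_content_chunks table created by the schema
    setup; psycopg2 is shown because the docs mention it, but any
    Postgres client works equally well.
    """
    import psycopg2  # third-party driver: pip install psycopg2-binary
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents_content_chunks (content, embedding) "
            "VALUES (%s, %s::vector)",
            (content, to_vector_literal(embedding)),
        )
```

Casting the formatted literal with `::vector` lets the same statement work
whether the driver sends the value as text or as a bound parameter.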
From 9575b8efa54a0cec1f4fb0dfc9f86c40d75e0726 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 20:04:19 +0530 Subject: [PATCH 04/11] docs: resolve index.md conflict and apply stylesheet - Resolve index.md merge conflict: keep RAG Server link, adopt main's connect_as-based Database Credentials section and updated Next Steps - Apply pgEdge stylesheet to rag.md: 79-char wrap, hyphens for em-dashes, table intro sentences, bullet periods, no bold headings, Next Steps as doc links - Remove redundant sections: Automation & Responsibilities, Loading Documents (duplicated in Step 3), User-Managed Responsibilities, Search Configuration (duplicated in Config Reference) PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/index.md | 18 +++++++++--------- docs/services/rag.md | 8 ++++---- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/services/index.md b/docs/services/index.md index 406ad457..acc94367 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -84,15 +84,15 @@ creating one instance per host for redundancy: ## Database Credentials -Each service instance is automatically provisioned with two dedicated -database users. The Control Plane manages these credentials; you do not -need to create or rotate them manually. The credentials are: - -- `svc_{service_id}_ro` is a read-only user with read access to the - database; this user is the default for most service types. -- `svc_{service_id}_rw` is a read-write user with read and write access - to the database; this user is provisioned when the service needs - read/write access. +Each service connects to the database as a user you specify with the +`connect_as` field. The `connect_as` value must reference a username +in your `database_users` array. The Control Plane uses those +credentials to generate the service's connection string and to configure +any required role grants (for example, granting the anonymous role to +a PostgREST authenticator). 
+ +You own and manage the `connect_as` user. Removing a service does not +drop the underlying database user. ## Next Steps diff --git a/docs/services/rag.md b/docs/services/rag.md index c8b91141..f7f251f4 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -1000,13 +1000,13 @@ The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: - Execution timing: scripts run once, immediately after Spock is - initialized -- Transactional: all statements execute within a single transaction + initialized. +- Transactional: all statements execute within a single transaction. - No re-execution: if you update the database spec later, scripts are - not re-run + not re-run. - Constraints: some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE` + `CREATE DATABASE`, and `DROP DATABASE`. If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 83ce64331ba9312dde3afbcde9b7591872d1cbfb Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 20:11:45 +0530 Subject: [PATCH 05/11] docs: revert unintended MCP description change in index.md Restore main's trimmed MCP description; our PR only adds the RAG Server entry. PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/index.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/services/index.md b/docs/services/index.md index 1f25ebb5..3adb2489 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -7,8 +7,7 @@ specify with the `connect_as` field. The Control Plane supports the following service types: - The [pgEdge Postgres MCP Server](mcp.md) connects AI agents and - LLM-powered applications to your database, enabling natural language - queries and AI-powered data access. + LLM-powered applications to your database. 
- The [pgEdge RAG Server](rag.md) enables retrieval-augmented generation workflows using your database as a knowledge store, returning LLM-synthesized answers grounded in your data. From 7f295586658da015cab5c298924eb0972680de9b Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 22:31:23 +0530 Subject: [PATCH 06/11] docs: fix RRF score values in RAG response examples Score values like 0.82/0.87 implied cosine similarity but the RAG service returns RRF scores which are much smaller (~0.008). Update both example responses to use realistic RRF score values. PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/rag.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index f7f251f4..ee3ad2b4 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -819,8 +819,8 @@ A successful response looks like this: { "answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...", "sources": [ - {"id": "5", "content": "...", "score": 0.82}, - {"id": "1", "content": "...", "score": 0.79} + {"id": "5", "content": "...", "score": 0.00820}, + {"id": "1", "content": "...", "score": 0.00806} ], "tokens_used": 1243 } @@ -959,7 +959,7 @@ A successful query response looks like this: { "id": "42", "content": "The RAG service enables retrieval-augmented generation workflows...", - "score": 0.87 + "score": 0.00820 } ], "tokens_used": 1243 From 2d4c6c055557e82c34507e5fe3d3f73950ec74c2 Mon Sep 17 00:00:00 2001 From: Siva Date: Thu, 23 Apr 2026 00:25:54 +0530 Subject: [PATCH 07/11] addressing review comments --- docs/services/rag.md | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index ee3ad2b4..ceef5999 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -105,10 +105,10 @@ The following table describes the embedding configuration fields: | Field | Type | 
Description | |---|---|---| -| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | +| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `ollama`. | | `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). | -| `api_key` | string | API key for the provider. Required for `openai`, `voyage`, and `anthropic`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for the provider API. For `ollama`, defaults to `http://localhost:11434`. | +| `api_key` | string | API key for the provider. Required for `openai` and `voyage`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for the provider API. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | ### LLM Configuration @@ -123,7 +123,7 @@ The following table describes the LLM configuration fields: | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | | `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | +| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | !!! 
note If `embedding_llm` and `rag_llm` share the same provider and both @@ -215,17 +215,20 @@ following table shows common models and their vector dimensions: | OpenAI | `text-embedding-3-small` | 1536 | | OpenAI | `text-embedding-3-large` | 3072 | | Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | -| Ollama | Varies by model | Check model documentation | +| Ollama | `nomic-embed-text` | 768 | +| Ollama | Other models | Check model documentation | ## Examples The following examples show how to configure the RAG service for common use cases. The first example includes the complete `scripts.post_database_create` setup to automatically provision the -database schema (pgvector extension, tables, and indexes). Subsequent -examples focus on service configuration variations and omit the schema -setup for brevity - in production, always include the schema setup -from the first example. +database schema (pgvector extension, tables, and indexes) using +`vector(1536)` for OpenAI embeddings. Subsequent examples focus on +service configuration variations and omit the schema setup for brevity. +If you use a different embedding model, adjust the `vector(N)` dimension +in your schema to match - for example, `vector(1024)` for `voyage-3` or +`vector(768)` for `nomic-embed-text`. 
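As an illustrative sketch of the dimension adjustment (assuming the
`documents_content_chunks` layout used throughout these docs; the index
operator classes and the `to_tsvector` configuration are assumptions, not
a copy of the official schema setup), a `voyage-3` deployment would
declare `vector(1024)` in place of `vector(1536)`:

```sql
-- Illustrative only; mirror your real schema setup and adjust vector(N).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents_content_chunks (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1024)  -- 1024 dimensions for voyage-3
);

-- HNSW index for fast cosine-similarity search.
CREATE INDEX ON documents_content_chunks
    USING hnsw (embedding vector_cosine_ops);

-- GIN index over a tsvector expression for keyword (BM25) search.
CREATE INDEX ON documents_content_chunks
    USING gin (to_tsvector('english', content));
```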
### Minimal (OpenAI + Anthropic) @@ -238,7 +241,7 @@ with OpenAI for embeddings and Anthropic Claude for answer generation: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -313,7 +316,7 @@ answer generation: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -377,7 +380,7 @@ matching): curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -421,8 +424,7 @@ matching): "api_key": "sk-ant-..." }, "search": { - "hybrid_enabled": false, - "vector_weight": 1.0 + "hybrid_enabled": false } } ] @@ -445,7 +447,7 @@ required; the Ollama server URL is provided via `base_url`: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -508,7 +510,7 @@ and `top_n` values set at the `defaults` level: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -1022,11 +1024,11 @@ To verify that the database is accessible, run the following command: psql -h localhost -U admin -d knowledge_base -c "SELECT 1" ``` -To verify that the service user exists and has table access, run the -following query: +To verify that the service user (`app_read_only`) exists and has table +access, run the following query: ```sql -\du+ admin +\du+ app_read_only \dt documents_content_chunks ``` From 
69c87addac081a6e5efcf9fc99c22b9fc23dd8c2 Mon Sep 17 00:00:00 2001
From: Siva
Date: Thu, 23 Apr 2026 00:38:24 +0530
Subject: [PATCH 08/11] addressing pgedge-skill docs review comments

---
 docs/services/rag.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/services/rag.md b/docs/services/rag.md
index ceef5999..77ce518b 100644
--- a/docs/services/rag.md
+++ b/docs/services/rag.md
@@ -13,11 +13,11 @@ project.
 The Control Plane provisions a RAG service container on each specified
 host. The service connects to the database using an existing user
 specified in the `connect_as` field, which must be defined in
-`database_users`. The credentials are automatically embedded in the
-service configuration by the Control Plane. Client applications submit
-natural language queries to the service, which performs hybrid vector
-and keyword search against document tables and returns LLM-synthesized
-answers with source citations.
+`database_users`. The Control Plane automatically embeds that user's
+credentials in the service configuration. Client applications submit
+natural language queries to the service, which performs hybrid vector
+and keyword search against document tables and returns LLM-synthesized
+answers with source citations.
 
 See [Managing Services](managing.md) for instructions on adding,
 updating, and removing services. The sections below cover RAG-specific
@@ -108,7 +108,7 @@ The following table describes the embedding configuration fields:
 | `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `ollama`. |
 | `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). |
 | `api_key` | string | API key for the provider. Required for `openai` and `voyage`. Not used for `ollama`. |
-| `base_url` | string | Optional. Custom base URL for the provider API. 
Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | +| `base_url` | string | Optional. Custom base URL for the provider API. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | ### LLM Configuration @@ -123,7 +123,7 @@ The following table describes the LLM configuration fields: | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | | `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | +| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | !!! note If `embedding_llm` and `rag_llm` share the same provider and both @@ -1002,13 +1002,13 @@ The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: - Execution timing: scripts run once, immediately after Spock is - initialized. -- Transactional: all statements execute within a single transaction. + initialized +- Transactional: all statements execute within a single transaction - No re-execution: if you update the database spec later, scripts are - not re-run. + not re-run - Constraints: some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE`. 
+ `CREATE DATABASE`, and `DROP DATABASE` If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 0dcab019dc05646d31896137cb19108f862b0c89 Mon Sep 17 00:00:00 2001 From: Siva Date: Thu, 23 Apr 2026 00:44:02 +0530 Subject: [PATCH 09/11] addressing AI review comments --- docs/services/rag.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index 77ce518b..04c99f33 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -920,7 +920,7 @@ To list all configured pipelines, send the following request: === "curl" ```bash - curl http://localhost:9200/v1/pipelines + curl http://host-1:9200/v1/pipelines ``` ### Query a Pipeline @@ -931,7 +931,7 @@ text: === "curl" ```bash - curl -X POST http://localhost:9200/v1/pipelines/default \ + curl -X POST http://host-1:9200/v1/pipelines/default \ -H "Content-Type: application/json" \ -d '{ "query": "How does RAG improve LLM responses?", @@ -1021,7 +1021,7 @@ connectivity and user permissions. To verify that the database is accessible, run the following command: ```bash -psql -h localhost -U admin -d knowledge_base -c "SELECT 1" +psql -h host-1 -U admin -d knowledge_base -c "SELECT 1" ``` To verify that the service user (`app_read_only`) exists and has table @@ -1088,7 +1088,7 @@ Plane and which are your responsibility: ## Next Steps -The following resources provide more information on related topics. +The following resources provide more information on related topics: - The [Managing Services](managing.md) guide describes how to add, update, and remove services. 
From c8668ef0340e1ef58d1fce9fe6022df585fdcf51 Mon Sep 17 00:00:00 2001 From: susan-pgedge <130390403+susan-pgedge@users.noreply.github.com> Date: Thu, 23 Apr 2026 11:53:18 -0400 Subject: [PATCH 10/11] Refine language and formatting in rag.md --- docs/services/rag.md | 46 +++++++++++++++++++++----------------------- 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index 04c99f33..070e997f 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -28,10 +28,10 @@ configuration. Before deploying a RAG service, your PostgreSQL database must have the following items configured: -- pgvector extension installed and enabled. -- document tables with text and vector columns. -- HNSW index on vector columns for fast similarity search. -- GIN index on text columns for keyword search (BM25). +- The pgvector extension must be installed and enabled. +- The database must have document tables with text and vector columns. +- An HNSW index on vector columns enables fast similarity search. +- A GIN index on text columns enables keyword search (BM25). The Control Plane can automatically provision all of these during database creation using the `scripts.post_database_create` hook. See @@ -78,7 +78,7 @@ In this example, `app_read_only` must be defined in `database_users`: The `pipelines` array (required) defines one or more RAG workflows. Each pipeline specifies which tables to search, which embedding -provider to use, and which LLM to use for answer generation. +provider to use, and which LLM to use to generate answers. 
The following table describes the pipeline configuration fields: @@ -233,7 +233,7 @@ in your schema to match - for example, `vector(1024)` for `voyage-3` or ### Minimal (OpenAI + Anthropic) In the following example, a `curl` command provisions a RAG service -with OpenAI for embeddings and Anthropic Claude for answer generation: +that uses OpenAI for embeddings and Anthropic Claude to generate answers: === "curl" @@ -307,8 +307,8 @@ with OpenAI for embeddings and Anthropic Claude for answer generation: ### OpenAI End-to-End -In the following example, OpenAI is used for both embeddings and -answer generation: +In the following example, OpenAI is used for both embeddings and to generate +answers: === "curl" @@ -690,10 +690,10 @@ that the database is ready and the RAG service is running: In the response, look for the following items: -- `state: "available"` at the top level - the database is provisioned - and healthy. -- `service_ready: true` inside `service_instances[].status` - the RAG - container is up and accepting requests. +- The `state: "available"` field at the top level confirms that the + database is provisioned and healthy. +- The `service_ready: true` field inside `service_instances[].status` + confirms that the RAG container is up and accepting requests. ```text { @@ -974,10 +974,10 @@ request. The RAG service's hybrid search combines two complementary techniques, merged using Reciprocal Rank Fusion (RRF): -- vector similarity search, which retrieves documents semantically - similar to the query using cosine distance on embeddings. -- BM25 keyword search, which retrieves documents with exact keyword - matches using TF-IDF scoring. +- Vector similarity search retrieves documents semantically similar to + the query using cosine distance on embeddings. +- BM25 keyword search retrieves documents with exact keyword matches + using TF-IDF scoring. This combination ensures the LLM receives context that is both semantically relevant and keyword-relevant. 
Documents appearing in @@ -1001,14 +1001,12 @@ The following sections describe common issues and how to resolve them. The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: -- Execution timing: scripts run once, immediately after Spock is - initialized -- Transactional: all statements execute within a single transaction -- No re-execution: if you update the database spec later, scripts are - not re-run -- Constraints: some SQL commands are not allowed within transactions, - including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE` +| Property | Details | +|---|---| +| Execution timing | Scripts run once, immediately after Spock is initialized. | +| Transactional | All statements execute within a single transaction. | +| No re-execution | If you update the database spec later, scripts are not re-run. | +| Constraints | Some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, `CREATE DATABASE`, and `DROP DATABASE`. | If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 68c9d5306398ebb98b618bedbcca9137bd0ad322 Mon Sep 17 00:00:00 2001 From: susan-pgedge <130390403+susan-pgedge@users.noreply.github.com> Date: Thu, 23 Apr 2026 11:56:32 -0400 Subject: [PATCH 11/11] Update documentation for shared defaults in pipelines Clarified wording in the documentation regarding shared default values for pipelines. 
---
 docs/services/rag.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/services/rag.md b/docs/services/rag.md
index 070e997f..55e439eb 100644
--- a/docs/services/rag.md
+++ b/docs/services/rag.md
@@ -501,8 +501,9 @@ required; the Ollama server URL is provided via `base_url`:
 
 ### Multiple Pipelines with Shared Defaults
 
-In the following example, two pipelines share default `token_budget`
-and `top_n` values set at the `defaults` level:
+In the following example, two pipelines share default values for
+`token_budget` and `top_n`, set in the `defaults` object:
+
 
 === "curl"