From 95698a3241a56f6c973659b8e96ac983579e51ef Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 00:45:15 +0530 Subject: [PATCH 01/11] docs: add RAG service documentation and deployment guide --- changes/unreleased/Added-20260422-004204.yaml | 3 + docs/services/index.md | 5 +- docs/services/rag.md | 1218 +++++++++++++++++ 3 files changed, 1224 insertions(+), 2 deletions(-) create mode 100644 changes/unreleased/Added-20260422-004204.yaml create mode 100644 docs/services/rag.md diff --git a/changes/unreleased/Added-20260422-004204.yaml b/changes/unreleased/Added-20260422-004204.yaml new file mode 100644 index 00000000..4b8ad287 --- /dev/null +++ b/changes/unreleased/Added-20260422-004204.yaml @@ -0,0 +1,3 @@ +kind: Added +body: Add RAG service with hybrid vector + LLM search +time: 2026-04-22T00:42:04.283582+05:30 diff --git a/docs/services/index.md b/docs/services/index.md index 5267e180..406ad457 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -15,8 +15,9 @@ the following service types: - The [pgEdge Postgres MCP Server](mcp.md) connects AI agents and LLM-powered applications to your database, enabling natural language queries and AI-powered data access. -- The pgEdge RAG Server *(coming soon)* enables retrieval-augmented - generation workflows using your database as a knowledge store. +- The [pgEdge RAG Server](rag.md) enables retrieval-augmented generation + workflows using your database as a knowledge store, returning + LLM-synthesized answers grounded in your data. - [PostgREST](postgrest.md) automatically generates a REST API from your PostgreSQL schema, making your data accessible over HTTP without writing backend code. diff --git a/docs/services/rag.md b/docs/services/rag.md new file mode 100644 index 00000000..e80a98fc --- /dev/null +++ b/docs/services/rag.md @@ -0,0 +1,1218 @@ +# pgEdge RAG Server + +The RAG (Retrieval-Augmented Generation) service runs an intelligent query +server alongside your database. 
The service uses vector and keyword search +to retrieve relevant document chunks from PostgreSQL and synthesizes +LLM-generated answers based on the retrieved context. For more information, +see the [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) +project. + +## Overview + +The Control Plane provisions a RAG service container on each specified +host. The service connects to the database using an existing user specified +in the `connect_as` field (which must be defined in `database_users`). The +credentials are automatically embedded in the service configuration by the +Control Plane. Client applications submit natural language queries to the +service, which performs hybrid vector and keyword search against document +tables and returns LLM-synthesized answers with source citations. + +See [Managing Services](managing.md) for instructions on adding, +updating, and removing services. The sections below cover RAG-specific +configuration. + +## Database Prerequisites + +Before deploying a RAG service, your PostgreSQL database must have: + +1. **pgvector extension** installed and enabled +2. **Document table(s)** with text and vector columns +3. **HNSW index** on vector columns for fast similarity search +4. **GIN index** on text columns for keyword search (BM25) + +The Control Plane can automatically provision all of these during database +creation using the `scripts.post_database_create` hook. See [Preparing the +Database](#preparing-the-database) for a complete example. Alternatively, +you can provision these manually after database creation. 
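Conceptually, the vector search these prerequisites enable ranks chunks by
cosine distance between the query embedding and each stored embedding; the
HNSW index only accelerates that ranking. A minimal pure-Python sketch — the
three-dimensional vectors are illustrative only, real embeddings have
hundreds or thousands of dimensions:

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as computed by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Toy query embedding and stored chunk embeddings (illustrative values).
query = [0.9, 0.1, 0.0]
chunks = {
    "replication doc": [0.8, 0.2, 0.1],
    "billing FAQ": [0.0, 0.3, 0.9],
}

# Rank chunks by ascending distance (most similar first), analogous to
# ORDER BY embedding <=> query_vector LIMIT n.
ranked = sorted(chunks, key=lambda name: cosine_distance(query, chunks[name]))
print(ranked[0])  # the replication doc is closest to the query
```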
+ +## Automation & Responsibilities + +The Control Plane handles certain setup tasks automatically during database +and service creation: + +**Automated (Control Plane)** +- Creating pgvector extension +- Creating document tables and indexes (via `scripts.post_database_create`) +- Embedding RAG service credentials into configuration files +- Deploying RAG container and health monitoring + +**Manual (You Provide)** +- **Schema Design**: Deciding table structure, column names, vector dimensions +- **Embedding Generation**: Using external APIs (OpenAI, Voyage, Ollama) to vectorize documents +- **Document Loading**: Inserting documents and embeddings into the database +- **API Credentials**: Providing LLM and embedding provider API keys + +## Configuration Reference + +All configuration fields are provided in the `config` object of the +service spec. + +### Service Connection + +The `connect_as` field (at the service level) specifies which database user +the RAG service will authenticate as. This user **must already be defined** in +the `database_users` array when creating the database. The Control Plane +automatically embeds that user's credentials in the service configuration. + +Example: +```json +{ + "service_id": "rag", + "service_type": "rag", + "connect_as": "app_read_only", + "config": { ... } +} +``` + +In this example, `app_read_only` must be defined in `database_users`: +```json +{ + "username": "app_read_only", + "password": "your_password", + "attributes": ["LOGIN"] +} +``` + +### Pipeline Configuration + +The `pipelines` array (required) defines one or more RAG workflows. Each +pipeline specifies which tables to search, which embedding provider to use, +and which LLM to use for answer generation. + +| Field | Type | Description | +|---|---|---| +| `pipelines[].name` | string | Required. Pipeline identifier used in query URLs. Lowercase alphanumeric, hyphens, and underscores. Must not start with a hyphen. | +| `pipelines[].description` | string | Optional. 
Human-readable pipeline description. | +| `pipelines[].tables[]` | array | Required. Array of table specifications. See [Table Configuration](#table-configuration). | +| `pipelines[].embedding_llm` | object | Required. Embedding provider config. See [Embedding Configuration](#embedding-configuration). | +| `pipelines[].rag_llm` | object | Required. LLM provider config. See [LLM Configuration](#llm-configuration). | +| `pipelines[].token_budget` | integer | Optional. Max tokens for context documents sent to the LLM. | +| `pipelines[].top_n` | integer | Optional. Number of documents to retrieve per query. | +| `pipelines[].system_prompt` | string | Optional. Custom system prompt prepended to every LLM request for this pipeline. | +| `pipelines[].search` | object | Optional. Search behavior settings. See [Search Configuration](#search-configuration). | + +### Embedding Configuration + +The `embedding_llm` object configures the embedding provider used to +vectorize each incoming query. The embedding vector is then used for +similarity search against stored document vectors. All required fields +must be set; `api_key` is not required for `ollama`. + +| Field | Type | Description | +|---|---|---| +| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | +| `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). | +| `api_key` | string | API key for the provider. Required for `openai`, `voyage`, and `anthropic`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for the provider API. For `ollama`, defaults to `http://localhost:11434`. | + +### LLM Configuration + +The `rag_llm` object configures the LLM provider used to synthesize the +final answer from retrieved documents. `api_key` is required for all +providers except `ollama`. + +| Field | Type | Description | +|---|---|---| +| `provider` | string | Required. The LLM provider. 
One of: `anthropic`, `openai`, `ollama`. | +| `model` | string | Required. The model name (e.g., `claude-sonnet-4-20250514`, `gpt-4o`, `llama3.2`). | +| `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | + +!!! note + If `embedding_llm` and `rag_llm` share the same provider and both specify + an `api_key`, the values must be identical. The RAG server maintains one + key slot per provider and cannot reconcile two different values. + +### Table Configuration + +Each table in a pipeline specifies how to access document text and +embeddings. + +| Field | Type | Description | +|---|---|---| +| `table` | string | Required. The table or view name containing documents. | +| `text_column` | string | Required. Column name containing the document text. | +| `vector_column` | string | Required. Column name containing the embedding vectors. | +| `id_column` | string | Optional. Column name for document IDs. Defaults to the table's primary key. Required for views. | + +### Search Configuration + +The `search` object tunes how documents are retrieved before being passed +to the LLM. + +| Field | Type | Default | Description | +|---|---|---|---| +| `hybrid_enabled` | boolean | `true` | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to `false` for vector-only search. | +| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0–1.0). Higher values prioritize semantic relevance. | + +### Defaults Configuration + +The optional `defaults` object sets fallback values applied to any pipeline +that does not specify its own `token_budget` or `top_n`. + +| Field | Type | Description | +|---|---|---| +| `defaults.token_budget` | integer | Default max tokens for context documents. Must be a positive integer. 
| +| `defaults.top_n` | integer | Default number of documents to retrieve. Must be a positive integer. | + +## Preparing the Database + +Before deploying a RAG service, you must prepare your PostgreSQL database +with pgvector, document tables, and indexes. The Control Plane automatically +executes these during database creation when you include them in the +`scripts.post_database_create` array in your database specification. + +### Required Schema + +The following SQL statements should be included in `scripts.post_database_create` +to automatically initialize the database schema during creation: + +```sql +-- Enable pgvector extension +CREATE EXTENSION IF NOT EXISTS vector; + +-- Create documents table with embeddings +CREATE TABLE IF NOT EXISTS documents_content_chunks ( + id BIGSERIAL PRIMARY KEY, + content TEXT NOT NULL, + embedding vector(1536), + title TEXT, + source TEXT, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +-- HNSW index for vector similarity search +CREATE INDEX IF NOT EXISTS documents_embedding_idx + ON documents_content_chunks USING hnsw (embedding vector_cosine_ops); + +-- GIN index for keyword search (BM25) +CREATE INDEX IF NOT EXISTS documents_content_idx + ON documents_content_chunks USING gin (to_tsvector('english', content)); +``` + +These statements are included as individual entries in the `scripts.post_database_create` +array (see examples below). + +### Vector Dimensions + +Adjust the `vector(N)` dimension based on your embedding model: + +| Provider | Model | Dimensions | +|----------|-------|-----------| +| OpenAI | `text-embedding-3-small` | 1536 | +| OpenAI | `text-embedding-3-large` | 3072 | +| Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | +| Ollama | Varies by model | Check model documentation | + +### Loading Documents + +After the database and RAG service are deployed, you are responsible for +generating embeddings for your documents and loading them into the database. 
+The Control Plane does not automate this step—you must run this process +separately, typically via an external application or scheduled task. + +Here's a Python example using OpenAI to generate embeddings and load documents: + +```python +#!/usr/bin/env python3 +"""Generate embeddings and load documents into the RAG database.""" + +import psycopg2 +from psycopg2.extras import execute_values +from openai import OpenAI +import os +import sys + +# Configuration +OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") +DB_HOST = os.environ.get("DB_HOST", "localhost") +DB_USER = os.environ.get("DB_USER", "admin") +DB_PASSWORD = os.environ.get("DB_PASSWORD", "admin_password") +DB_NAME = os.environ.get("DB_NAME", "knowledge_base") + +def chunk_text(text, chunk_size=500, overlap=50): + """Split text into overlapping chunks.""" + chunks = [] + for i in range(0, len(text), chunk_size - overlap): + chunk = text[i:i + chunk_size] + if chunk.strip(): + chunks.append(chunk) + return chunks + +def generate_embeddings(texts, client): + """Generate embeddings for multiple texts.""" + response = client.embeddings.create( + model="text-embedding-3-small", + input=texts + ) + return [item.embedding for item in response.data] + +# Sample documents +documents = [ + { + "title": "pgEdge Overview", + "content": "pgEdge is a distributed PostgreSQL system...", + "source": "docs" + }, + { + "title": "RAG Guide", + "content": "RAG enables intelligent question-answering systems...", + "source": "docs" + } +] + +if not OPENAI_API_KEY: + print("ERROR: OPENAI_API_KEY environment variable not set") + sys.exit(1) + +client = OpenAI(api_key=OPENAI_API_KEY) +conn = psycopg2.connect( + host=DB_HOST, + user=DB_USER, + password=DB_PASSWORD, + database=DB_NAME +) +cur = conn.cursor() + +total_inserted = 0 + +for doc in documents: + print(f"Processing: {doc['title']}") + chunks = chunk_text(doc["content"]) + + if chunks: + # Generate embeddings for all chunks + embeddings = generate_embeddings(chunks, client) 
+ + # Prepare batch insert data + insert_data = [ + (chunk, embedding, doc["title"], doc["source"]) + for chunk, embedding in zip(chunks, embeddings) + ] + + # Batch insert + insert_query = """ + INSERT INTO documents_content_chunks + (content, embedding, title, source) + VALUES %s + """ + execute_values(cur, insert_query, insert_data) + conn.commit() + + inserted = len(insert_data) + total_inserted += inserted + print(f" Inserted {inserted} chunks") + +print(f"\nTotal chunks inserted: {total_inserted}") +cur.close() +conn.close() +``` + +**Usage:** +```bash +pip install psycopg2-binary openai +export OPENAI_API_KEY="sk-..." +export DB_HOST="localhost" +export DB_USER="admin" +export DB_PASSWORD="admin_password" +export DB_NAME="knowledge_base" +python3 load_rag_documents.py +``` + +## Examples + +The following examples show how to configure the RAG service for common +use cases. All examples use the `scripts.post_database_create` field to +automatically provision the database schema (pgvector extension, tables, +and indexes) during database creation. 
+ +### Minimal (OpenAI + Anthropic) + +In the following example, a `curl` command provisions a RAG service with +OpenAI for embeddings and Anthropic Claude for answer generation: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "scripts": { + "post_database_create": [ + "CREATE EXTENSION IF NOT EXISTS vector", + "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)", + "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)", + "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))" + ] + }, + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "description": "Main RAG pipeline", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." 
+ }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +### OpenAI End-to-End + +In the following example, OpenAI is used for both embeddings and answer +generation: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "openai", + "model": "gpt-4o", + "api_key": "sk-..." 
+ } + } + ] + } + } + ] + } + }' + ``` + +### Voyage AI with Vector-Only Search + +In the following example, Voyage AI is used for embeddings and the service +is configured for vector-only search (disabling BM25 keyword matching): + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "voyage", + "model": "voyage-3", + "api_key": "pa-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + }, + "search": { + "hybrid_enabled": false, + "vector_weight": 1.0 + } + } + ] + } + } + ] + } + }' + ``` + +### Ollama (Self-Hosted) + +In the following example, the RAG service uses a self-hosted Ollama server +for both embeddings and answer generation. 
No API key is required; the +Ollama server URL is provided via `base_url`: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "ollama", + "model": "nomic-embed-text", + "base_url": "http://ollama-host:11434" + }, + "rag_llm": { + "provider": "ollama", + "model": "llama3.2", + "base_url": "http://ollama-host:11434" + } + } + ] + } + } + ] + } + }' + ``` + +### Multiple Pipelines with Shared Defaults + +In the following example, two pipelines share default `token_budget` and +`top_n` values set at the `defaults` level: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge_base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + } + ], + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "admin", + "config": { + "defaults": { + "token_budget": 4000, + "top_n": 10 + }, + "pipelines": [ + { + "name": "docs", + "description": "Product documentation", + "tables": [ + { + 
"table": "doc_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + } + }, + { + "name": "support", + "description": "Support ticket history", + "tables": [ + { + "table": "ticket_chunks", + "text_column": "body", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-20250514", + "api_key": "sk-ant-..." + }, + "top_n": 5 + } + ] + } + } + ] + } + }' + ``` + +## End-to-End Walkthrough + +This section shows the complete flow from database creation to a working +pipeline query. + +### Step 1 — Create the Database + +Include `scripts.post_database_create` to automatically provision the +pgvector schema during database creation. This avoids any manual setup +after deployment. Use a fixed `port` value for the RAG service so the +URL stays stable across container restarts. 
+ +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases \ + -H 'Content-Type: application/json' \ + --data '{ + "id": "knowledge-base", + "spec": { + "database_name": "knowledge_base", + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + }, + { + "username": "app_read_only", + "password": "readonly_password", + "attributes": ["LOGIN"] + } + ], + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "scripts": { + "post_database_create": [ + "CREATE EXTENSION IF NOT EXISTS vector", + "CREATE TABLE IF NOT EXISTS documents_content_chunks (id BIGSERIAL PRIMARY KEY, content TEXT NOT NULL, embedding vector(1536), title TEXT, source TEXT)", + "CREATE INDEX ON documents_content_chunks USING hnsw (embedding vector_cosine_ops)", + "CREATE INDEX ON documents_content_chunks USING gin (to_tsvector('\''english'\'', content))", + "GRANT SELECT ON documents_content_chunks TO app_read_only" + ] + }, + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "app_read_only", + "config": { + "pipelines": [ + { + "name": "default", + "description": "Main RAG pipeline", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-5", + "api_key": "sk-ant-..." 
+ }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +### Step 2 — Check the Database and Service Status + +Run the following command after ~60–90 seconds to check the database is +ready and the RAG service is running: + +=== "curl" + + ```sh + curl -s http://host-1:3000/v1/databases/knowledge-base + ``` + +In the response, look for two things: + +- `state: "available"` at the top level — the database is provisioned + and healthy +- `service_ready: true` inside `service_instances[].status` — the RAG + container is up and accepting requests + +``` +{ + state: "available" + instances: [ + { + state: "available" + postgres: { + patroni_state: "running" + role: "primary" + } + } + ] + service_instances: [ + { + state: "running" + status: { + service_ready: true + ports: [ + { + container_port: 8080 + host_port: 9200 + name: "tcp" + } + ] + last_health_at: "2026-04-22T10:00:00Z" + } + } + ] +} +``` + +The `host_port` value is the port to use when querying the RAG service. +If you used a fixed `port: 9200` in the service spec, this will always +be `9200`. + +!!! tip + Use a fixed `port` value (e.g. `9200`) in the service spec rather than + `port: 0`. When `port: 0` is used, Docker assigns a random host port + that changes each time the RAG container is replaced (e.g. after an + API key update), requiring you to look up the new port each time. + +### Step 3 — Load Documents + +The RAG service needs documents with embeddings in the database before +it can answer queries. 
The following Python script generates embeddings +using OpenAI and inserts them into `documents_content_chunks`: + +```python +#!/usr/bin/env python3 +import psycopg2 +from psycopg2.extras import execute_values +from openai import OpenAI +import os + +client = OpenAI(api_key=os.environ["OPENAI_API_KEY"]) +conn = psycopg2.connect( + host=os.environ.get("DB_HOST", "host-1"), + port=int(os.environ.get("DB_PORT", "5432")), + user=os.environ.get("DB_USER", "admin"), + password=os.environ.get("DB_PASSWORD", "admin_password"), + database=os.environ.get("DB_NAME", "knowledge_base"), +) +cur = conn.cursor() + +documents = [ + {"title": "My Doc", "content": "Full document text goes here...", "source": "docs"}, +] + +def chunk_text(text, size=500, overlap=50): + return [text[i:i+size] for i in range(0, len(text), size-overlap) if text[i:i+size].strip()] + +for doc in documents: + chunks = chunk_text(doc["content"]) + resp = client.embeddings.create(model="text-embedding-3-small", input=chunks) + embeddings = [item.embedding for item in resp.data] + execute_values(cur, + "INSERT INTO documents_content_chunks (content, embedding, title, source) VALUES %s", + [(c, e, doc["title"], doc["source"]) for c, e in zip(chunks, embeddings)], + ) + conn.commit() + print(f"Loaded {len(chunks)} chunks from '{doc['title']}'") + +cur.close() +conn.close() +``` + +```bash +pip install psycopg2-binary openai +export OPENAI_API_KEY="sk-..." 
+export DB_HOST="host-1" +export DB_USER="admin" +export DB_PASSWORD="admin_password" +export DB_NAME="knowledge_base" +python3 load_documents.py +``` + +Verify documents were inserted: + +```bash +psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \ + -c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;" +``` + +### Step 4 — Query the Pipeline + +```bash +curl -X POST http://host-1:9200/v1/pipelines/default \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How does multi-active replication work?", + "include_sources": true + }' +``` + +A successful response: + +```json +{ + "answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...", + "sources": [ + {"id": "5", "content": "...", "score": 0.82}, + {"id": "1", "content": "...", "score": 0.79} + ], + "tokens_used": 1243 +} +``` + +`sources` is only populated when `include_sources: true` is set in the +request. + +### Step 5 — Update the Service Config + +To update the service (for example, to rotate an API key or change the +LLM model), submit a `POST /v1/databases/{id}` with the complete updated +spec. 
The update endpoint requires all fields — include `database_name`, +`nodes`, `database_users`, and the full `services` array: + +=== "curl" + + ```sh + curl -X POST http://host-1:3000/v1/databases/knowledge-base \ + -H 'Content-Type: application/json' \ + --data '{ + "spec": { + "database_name": "knowledge_base", + "port": 5432, + "nodes": [ + { "name": "n1", "host_ids": ["host-1"] } + ], + "database_users": [ + { + "username": "admin", + "password": "admin_password", + "db_owner": true, + "attributes": ["SUPERUSER", "LOGIN"] + }, + { + "username": "app_read_only", + "password": "readonly_password", + "attributes": ["LOGIN"] + } + ], + "services": [ + { + "service_id": "rag", + "service_type": "rag", + "version": "latest", + "host_ids": ["host-1"], + "port": 9200, + "connect_as": "app_read_only", + "config": { + "pipelines": [ + { + "name": "default", + "tables": [ + { + "table": "documents_content_chunks", + "text_column": "content", + "vector_column": "embedding" + } + ], + "embedding_llm": { + "provider": "openai", + "model": "text-embedding-3-small", + "api_key": "sk-..." + }, + "rag_llm": { + "provider": "anthropic", + "model": "claude-sonnet-4-5", + "api_key": "sk-ant-NEW-KEY" + }, + "token_budget": 4000, + "top_n": 10 + } + ] + } + } + ] + } + }' + ``` + +The RAG service container is replaced with the new configuration. Poll +the database status until `state` is `"available"` and `service_ready` +is `true` before sending queries. + +## Querying the RAG Service + +Once the service is running, submit queries to retrieve answers based on +your documents. 
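Any HTTP client can submit queries. As an illustration, here is a small
Python helper built on the request and response fields described in this
section; the base URL and pipeline name are placeholders for your own
deployment:

```python
import json
import urllib.request

def build_query(question, include_sources=False, top_n=None):
    """Request body for POST /v1/pipelines/{name}; only `query` is required."""
    body = {"query": question, "include_sources": include_sources}
    if top_n is not None:
        body["top_n"] = top_n  # per-request override of the pipeline's top_n
    return body

def query_pipeline(base_url, pipeline, question, **opts):
    """Submit a query to the RAG service and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/pipelines/{pipeline}",
        data=json.dumps(build_query(question, **opts)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `query_pipeline("http://host-1:9200", "default", "How does RAG
work?", include_sources=True)` returns the parsed body with `answer`,
`tokens_used`, and — because sources were requested — `sources`.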
+ +### List Available Pipelines + +=== "curl" + + ```bash + curl http://localhost:9200/v1/pipelines + ``` + +### Query a Pipeline + +=== "curl" + + ```bash + curl -X POST http://localhost:9200/v1/pipelines/default \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How does RAG improve LLM responses?", + "include_sources": true + }' + ``` + +### Request Fields + +| Field | Type | Default | Description | +|---|---|---|---| +| `query` | string | — | Required. The natural language question to answer. | +| `include_sources` | boolean | `false` | Return the source documents used to generate the answer. | +| `top_n` | integer | — | Override the pipeline's `top_n` for this request. | +| `stream` | boolean | `false` | Stream the answer as Server-Sent Events. | + +### Response Format + +```json +{ + "answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...", + "sources": [ + { + "id": "42", + "content": "The RAG service enables retrieval-augmented generation workflows...", + "score": 0.87 + } + ], + "tokens_used": 1243 +} +``` + +`sources` is only populated when `include_sources` is `true` in the request. + +The RAG service uses **hybrid search**, combining two complementary search +techniques that are merged using **Reciprocal Rank Fusion (RRF)**: + +1. **Vector Similarity Search**: Retrieves documents semantically similar to + the query using cosine distance on embeddings. +2. **BM25 Keyword Search**: Retrieves documents with exact keyword matches + using TF-IDF scoring. + +This combination ensures the LLM receives context that is both semantically +relevant and keyword-relevant. Documents appearing in both result sets receive +higher scores, naturally prioritizing highly-relevant results. 
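The fusion step can be sketched as follows. This is a simplified
illustration of weighted RRF, not the server's exact implementation; the
constant `k = 60` is the value conventionally used in the RRF literature
and is assumed here:

```python
def rrf_merge(vector_ranked, bm25_ranked, vector_weight=0.5, k=60):
    """Merge two ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document earns weight / (k + rank) per list it appears in, so a
    document ranked well by both searches accumulates a higher total score
    than one that appears in only a single list.
    """
    scores = {}
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (k + rank)
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - vector_weight) / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc 7 is mid-ranked by both searches, yet it outranks documents
# that appear in only one of the two result lists:
print(rrf_merge([3, 7, 1], [7, 9, 3], vector_weight=0.5))
```

Setting `vector_weight` to `1.0` reduces this to pure vector ranking, which
is the effect of disabling hybrid search.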
+ +### Search Configuration + +Configure search behavior in the pipeline: + +```json +"search": { + "hybrid_enabled": true, + "vector_weight": 0.7 +} +``` + +| Parameter | Range | Description | +|-----------|-------|-------------| +| `hybrid_enabled` | `true` / `false` | Enable hybrid search (default: `true`). Set to `false` for vector-only search. | +| `vector_weight` | 0.0–1.0 | Weight for vector search vs BM25 (default: `0.5`). Higher values prioritize semantic relevance. | + +### Token Budget + +The `token_budget` field controls how much context is sent to the LLM: + +- Documents are ranked and packed in order until the budget is exhausted +- The final document is truncated at a sentence boundary (not mid-word) + +Increase the budget to send more context, or decrease it to reduce LLM costs. + +## User-Managed Responsibilities + +You are responsible for: + +1. **Embedding Generation**: Using embedding provider APIs (OpenAI, Voyage AI, + Ollama) to generate vector embeddings for your documents +2. **Document Ingestion**: Loading document text and embeddings into the + `documents_content_chunks` table +3. **API Keys**: Providing credentials for embedding and LLM providers in the + service `config` +4. **Chunking Strategy**: Deciding how to split large documents for optimal + retrieval (e.g., 500-1000 character chunks with overlap) + +The Control Plane handles: + +1. **Schema Provisioning**: Automatically creating pgvector extension, tables, + and indexes via `scripts.post_database_create` during database creation +2. **Service Deployment**: Provisioning and managing the RAG container +3. **Database Credentials**: Automatically embedding the `connect_as` user's + credentials in the service configuration (credentials must be defined in + `database_users` during database creation) +4. 
**Health Monitoring**: Checking service health and restarting on failure

## Troubleshooting

### About Automated Scripts

The `scripts.post_database_create` field executes SQL automatically during
database creation. Some important details:

- **Execution Timing**: Scripts run once, immediately after Spock is initialized
- **Transactional**: All statements execute within a single transaction
- **No Re-Execution**: If you update the database spec later, scripts are not re-run
- **Constraints**: Some SQL commands are not allowed within transactions:
  - `VACUUM` (including `VACUUM ANALYZE`)
  - `CREATE INDEX CONCURRENTLY`
  - `CREATE DATABASE`, `DROP DATABASE`

If a script fails during database creation, you can use `update-database` to
retry after fixing the problematic statement.

### Service Fails to Start

**Check database connectivity:**

```bash
# From host, verify database is accessible
psql -h localhost -U admin -d knowledge_base -c "SELECT 1"
```

**Check user permissions:**

```sql
-- Verify the service user exists and has table access
\du+ admin
\dt documents_content_chunks
```

### Poor Query Results

**Verify documents are loaded:**

```sql
-- Check document count
SELECT COUNT(*) FROM documents_content_chunks;

-- Verify embeddings exist
SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL;
```

**Inspect embedding quality:**

```sql
-- Find documents similar to a test query embedding
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
FROM documents_content_chunks
ORDER BY similarity DESC
LIMIT 5;
```

**Try simpler queries:**

Start with factual, keyword-based questions before complex analytical questions.

### Empty Context Window

If the RAG service returns limited context, the token budget may be exhausted. Increase it:

```json
"token_budget": 8000
```

Or store smaller, more focused document chunks.
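
The budget-packing behavior described under Token Budget can be sketched as follows. This is an illustrative sketch, not the server's actual packing code: token counts are approximated as whitespace-separated words (a real implementation would use the model's tokenizer), and the sentence-boundary cut is a simple search for `". "`.

```python
def pack_context(chunks, token_budget):
    """Pack ranked chunks into a context window, in order, until the
    budget is exhausted. The final, overflowing chunk is cut at the
    last full sentence that still fits.
    """
    packed, used = [], 0
    for chunk in chunks:
        tokens = chunk.split()
        if used + len(tokens) <= token_budget:
            packed.append(chunk)
            used += len(tokens)
            continue
        # Keep only as many words as the remaining budget allows,
        # then back up to the last sentence boundary.
        partial = " ".join(tokens[: token_budget - used])
        cut = partial.rfind(". ")
        if cut != -1:
            packed.append(partial[: cut + 1])
        break
    return "\n\n".join(packed)

# With a 7-word budget, the second chunk is truncated after "five."
print(pack_context(["one two three.", "four five. six seven eight nine."], 7))
```

Raising `token_budget` admits more chunks before the loop breaks; storing smaller, more focused chunks wastes less of the final, truncated slot.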
+ +## Next Steps + +- Once you've validated the RAG service with manual documents, consider automating embedding generation +- Implement document versioning and updates for evolving knowledge bases +- Set up monitoring for query latency and answer quality +- Explore pgedge_vectorizer for automated chunking and embedding in high-volume scenarios + +## Responsibility Summary + +| Step | Who | How | +|---|---|---| +| Provision schema (pgvector, tables, indexes) | Control Plane | `scripts.post_database_create` in database spec | +| Deploy RAG container | Control Plane | Automatic on `POST /v1/databases` | +| Inject database credentials | Control Plane | Automatic via `connect_as` field | +| Health monitoring and restart | Control Plane | Automatic | +| Generate embeddings | You | Call OpenAI / Voyage / Ollama API | +| Load documents into table | You | `INSERT` using psycopg2 or any Postgres client | +| Submit queries | Your application | `POST /v1/pipelines/{name}` on the RAG service | + +## Additional Resources + +- [RAG Server Repository](https://github.com/pgEdge/pgedge-rag-server) +- [RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) +- [pgvector Documentation](https://github.com/pgvector/pgvector) +- [Managing Services](managing.md) From 6b43be60532b517247284056a715ace5d3a520f8 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 01:14:18 +0530 Subject: [PATCH 02/11] addressing AI review comments --- docs/services/rag.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index e80a98fc..bd6acd7d 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -332,9 +332,12 @@ python3 load_rag_documents.py ## Examples The following examples show how to configure the RAG service for common -use cases. All examples use the `scripts.post_database_create` field to -automatically provision the database schema (pgvector extension, tables, -and indexes) during database creation. 
+use cases. The first example includes the complete +`scripts.post_database_create` setup to automatically provision the +database schema (pgvector extension, tables, and indexes). Subsequent +examples focus on service configuration variations and omit the schema +setup for brevity — in production, always include the schema setup from +the first example. ### Minimal (OpenAI + Anthropic) From 453f5a72daa36478d291cef3c6791e9a2ac1ce60 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 19:28:38 +0530 Subject: [PATCH 03/11] addressing review comments --- docs/services/rag.md | 537 +++++++++++++++++-------------------------- 1 file changed, 207 insertions(+), 330 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index bd6acd7d..c8b91141 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -1,21 +1,23 @@ # pgEdge RAG Server -The RAG (Retrieval-Augmented Generation) service runs an intelligent query -server alongside your database. The service uses vector and keyword search -to retrieve relevant document chunks from PostgreSQL and synthesizes -LLM-generated answers based on the retrieved context. For more information, -see the [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) +The RAG (Retrieval-Augmented Generation) service runs an intelligent +query server alongside your database. The service uses vector and +keyword search to retrieve relevant document chunks from PostgreSQL +and synthesizes LLM-generated answers based on the retrieved context. +For more information, see the +[pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) project. ## Overview The Control Plane provisions a RAG service container on each specified -host. The service connects to the database using an existing user specified -in the `connect_as` field (which must be defined in `database_users`). The -credentials are automatically embedded in the service configuration by the -Control Plane. 
Client applications submit natural language queries to the -service, which performs hybrid vector and keyword search against document -tables and returns LLM-synthesized answers with source citations. +host. The service connects to the database using an existing user +specified in the `connect_as` field, which must be defined in +`database_users`. The credentials are automatically embedded in the +service configuration by the Control Plane. Client applications submit +natural language queries to the service, which performs hybrid vector +and keyword search against document tables and returns LLM-synthesized +answers with source citations. See [Managing Services](managing.md) for instructions on adding, updating, and removing services. The sections below cover RAG-specific @@ -23,34 +25,19 @@ configuration. ## Database Prerequisites -Before deploying a RAG service, your PostgreSQL database must have: +Before deploying a RAG service, your PostgreSQL database must have the +following items configured: -1. **pgvector extension** installed and enabled -2. **Document table(s)** with text and vector columns -3. **HNSW index** on vector columns for fast similarity search -4. **GIN index** on text columns for keyword search (BM25) +- pgvector extension installed and enabled. +- document tables with text and vector columns. +- HNSW index on vector columns for fast similarity search. +- GIN index on text columns for keyword search (BM25). -The Control Plane can automatically provision all of these during database -creation using the `scripts.post_database_create` hook. See [Preparing the -Database](#preparing-the-database) for a complete example. Alternatively, -you can provision these manually after database creation. 
- -## Automation & Responsibilities - -The Control Plane handles certain setup tasks automatically during database -and service creation: - -**Automated (Control Plane)** -- Creating pgvector extension -- Creating document tables and indexes (via `scripts.post_database_create`) -- Embedding RAG service credentials into configuration files -- Deploying RAG container and health monitoring - -**Manual (You Provide)** -- **Schema Design**: Deciding table structure, column names, vector dimensions -- **Embedding Generation**: Using external APIs (OpenAI, Voyage, Ollama) to vectorize documents -- **Document Loading**: Inserting documents and embeddings into the database -- **API Credentials**: Providing LLM and embedding provider API keys +The Control Plane can automatically provision all of these during +database creation using the `scripts.post_database_create` hook. See +[Preparing the Database](#preparing-the-database) for a complete +example. Alternatively, you can provision these manually after +database creation. ## Configuration Reference @@ -59,12 +46,15 @@ service spec. ### Service Connection -The `connect_as` field (at the service level) specifies which database user -the RAG service will authenticate as. This user **must already be defined** in -the `database_users` array when creating the database. The Control Plane -automatically embeds that user's credentials in the service configuration. +The `connect_as` field at the service level specifies which database +user the RAG service authenticates as. This user must already be +defined in the `database_users` array when creating the database. The +Control Plane automatically embeds that user's credentials in the +service configuration. 
+ +The following example shows the `connect_as` field in the service +spec: -Example: ```json { "service_id": "rag", @@ -75,6 +65,7 @@ Example: ``` In this example, `app_read_only` must be defined in `database_users`: + ```json { "username": "app_read_only", @@ -85,9 +76,11 @@ In this example, `app_read_only` must be defined in `database_users`: ### Pipeline Configuration -The `pipelines` array (required) defines one or more RAG workflows. Each -pipeline specifies which tables to search, which embedding provider to use, -and which LLM to use for answer generation. +The `pipelines` array (required) defines one or more RAG workflows. +Each pipeline specifies which tables to search, which embedding +provider to use, and which LLM to use for answer generation. + +The following table describes the pipeline configuration fields: | Field | Type | Description | |---|---|---| @@ -108,6 +101,8 @@ vectorize each incoming query. The embedding vector is then used for similarity search against stored document vectors. All required fields must be set; `api_key` is not required for `ollama`. +The following table describes the embedding configuration fields: + | Field | Type | Description | |---|---|---| | `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | @@ -117,26 +112,30 @@ must be set; `api_key` is not required for `ollama`. ### LLM Configuration -The `rag_llm` object configures the LLM provider used to synthesize the -final answer from retrieved documents. `api_key` is required for all -providers except `ollama`. +The `rag_llm` object configures the LLM provider used to synthesize +the final answer from retrieved documents. `api_key` is required for +all providers except `ollama`. + +The following table describes the LLM configuration fields: | Field | Type | Description | |---|---|---| | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | -| `model` | string | Required. 
The model name (e.g., `claude-sonnet-4-20250514`, `gpt-4o`, `llama3.2`). | +| `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | | `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | !!! note - If `embedding_llm` and `rag_llm` share the same provider and both specify - an `api_key`, the values must be identical. The RAG server maintains one - key slot per provider and cannot reconcile two different values. + If `embedding_llm` and `rag_llm` share the same provider and both + specify an `api_key`, the values must be identical. The pgEdge RAG + Server maintains one key slot per provider and cannot reconcile + two different values. ### Table Configuration Each table in a pipeline specifies how to access document text and -embeddings. +embeddings. The following table describes the table configuration +fields: | Field | Type | Description | |---|---|---| @@ -147,18 +146,20 @@ embeddings. ### Search Configuration -The `search` object tunes how documents are retrieved before being passed -to the LLM. +The `search` object tunes how documents are retrieved before being +passed to the LLM. The following table describes the search +configuration fields: | Field | Type | Default | Description | |---|---|---|---| | `hybrid_enabled` | boolean | `true` | Enable hybrid search combining vector similarity and BM25 keyword matching. Set to `false` for vector-only search. | -| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0–1.0). Higher values prioritize semantic relevance. | +| `vector_weight` | float | `0.5` | Weight for vector search versus BM25 (0.0-1.0). Higher values prioritize semantic relevance. 
| ### Defaults Configuration -The optional `defaults` object sets fallback values applied to any pipeline -that does not specify its own `token_budget` or `top_n`. +The optional `defaults` object sets fallback values applied to any +pipeline that does not specify its own `token_budget` or `top_n`. The +following table describes the defaults configuration fields: | Field | Type | Description | |---|---|---| @@ -167,14 +168,15 @@ that does not specify its own `token_budget` or `top_n`. ## Preparing the Database -Before deploying a RAG service, you must prepare your PostgreSQL database -with pgvector, document tables, and indexes. The Control Plane automatically -executes these during database creation when you include them in the -`scripts.post_database_create` array in your database specification. +Before deploying a RAG service, you must prepare your PostgreSQL +database with pgvector, document tables, and indexes. The Control +Plane automatically executes these during database creation when you +include them in the `scripts.post_database_create` array in your +database specification. ### Required Schema -The following SQL statements should be included in `scripts.post_database_create` +Include the following SQL statements in `scripts.post_database_create` to automatically initialize the database schema during creation: ```sql @@ -200,12 +202,13 @@ CREATE INDEX IF NOT EXISTS documents_content_idx ON documents_content_chunks USING gin (to_tsvector('english', content)); ``` -These statements are included as individual entries in the `scripts.post_database_create` -array (see examples below). +These statements are included as individual entries in the +`scripts.post_database_create` array (see examples below). ### Vector Dimensions -Adjust the `vector(N)` dimension based on your embedding model: +Adjust the `vector(N)` dimension to match your embedding model. 
The +following table shows common models and their vector dimensions: | Provider | Model | Dimensions | |----------|-------|-----------| @@ -214,135 +217,20 @@ Adjust the `vector(N)` dimension based on your embedding model: | Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | | Ollama | Varies by model | Check model documentation | -### Loading Documents - -After the database and RAG service are deployed, you are responsible for -generating embeddings for your documents and loading them into the database. -The Control Plane does not automate this step—you must run this process -separately, typically via an external application or scheduled task. - -Here's a Python example using OpenAI to generate embeddings and load documents: - -```python -#!/usr/bin/env python3 -"""Generate embeddings and load documents into the RAG database.""" - -import psycopg2 -from psycopg2.extras import execute_values -from openai import OpenAI -import os -import sys - -# Configuration -OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") -DB_HOST = os.environ.get("DB_HOST", "localhost") -DB_USER = os.environ.get("DB_USER", "admin") -DB_PASSWORD = os.environ.get("DB_PASSWORD", "admin_password") -DB_NAME = os.environ.get("DB_NAME", "knowledge_base") - -def chunk_text(text, chunk_size=500, overlap=50): - """Split text into overlapping chunks.""" - chunks = [] - for i in range(0, len(text), chunk_size - overlap): - chunk = text[i:i + chunk_size] - if chunk.strip(): - chunks.append(chunk) - return chunks - -def generate_embeddings(texts, client): - """Generate embeddings for multiple texts.""" - response = client.embeddings.create( - model="text-embedding-3-small", - input=texts - ) - return [item.embedding for item in response.data] - -# Sample documents -documents = [ - { - "title": "pgEdge Overview", - "content": "pgEdge is a distributed PostgreSQL system...", - "source": "docs" - }, - { - "title": "RAG Guide", - "content": "RAG enables intelligent question-answering systems...", - "source": 
"docs" - } -] - -if not OPENAI_API_KEY: - print("ERROR: OPENAI_API_KEY environment variable not set") - sys.exit(1) - -client = OpenAI(api_key=OPENAI_API_KEY) -conn = psycopg2.connect( - host=DB_HOST, - user=DB_USER, - password=DB_PASSWORD, - database=DB_NAME -) -cur = conn.cursor() - -total_inserted = 0 - -for doc in documents: - print(f"Processing: {doc['title']}") - chunks = chunk_text(doc["content"]) - - if chunks: - # Generate embeddings for all chunks - embeddings = generate_embeddings(chunks, client) - - # Prepare batch insert data - insert_data = [ - (chunk, embedding, doc["title"], doc["source"]) - for chunk, embedding in zip(chunks, embeddings) - ] - - # Batch insert - insert_query = """ - INSERT INTO documents_content_chunks - (content, embedding, title, source) - VALUES %s - """ - execute_values(cur, insert_query, insert_data) - conn.commit() - - inserted = len(insert_data) - total_inserted += inserted - print(f" Inserted {inserted} chunks") - -print(f"\nTotal chunks inserted: {total_inserted}") -cur.close() -conn.close() -``` - -**Usage:** -```bash -pip install psycopg2-binary openai -export OPENAI_API_KEY="sk-..." -export DB_HOST="localhost" -export DB_USER="admin" -export DB_PASSWORD="admin_password" -export DB_NAME="knowledge_base" -python3 load_rag_documents.py -``` - ## Examples -The following examples show how to configure the RAG service for common -use cases. The first example includes the complete +The following examples show how to configure the RAG service for +common use cases. The first example includes the complete `scripts.post_database_create` setup to automatically provision the database schema (pgvector extension, tables, and indexes). Subsequent examples focus on service configuration variations and omit the schema -setup for brevity — in production, always include the schema setup from -the first example. +setup for brevity - in production, always include the schema setup +from the first example. 
### Minimal (OpenAI + Anthropic) -In the following example, a `curl` command provisions a RAG service with -OpenAI for embeddings and Anthropic Claude for answer generation: +In the following example, a `curl` command provisions a RAG service +with OpenAI for embeddings and Anthropic Claude for answer generation: === "curl" @@ -400,7 +288,7 @@ OpenAI for embeddings and Anthropic Claude for answer generation: }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "token_budget": 4000, @@ -416,8 +304,8 @@ OpenAI for embeddings and Anthropic Claude for answer generation: ### OpenAI End-to-End -In the following example, OpenAI is used for both embeddings and answer -generation: +In the following example, OpenAI is used for both embeddings and +answer generation: === "curl" @@ -479,8 +367,9 @@ generation: ### Voyage AI with Vector-Only Search -In the following example, Voyage AI is used for embeddings and the service -is configured for vector-only search (disabling BM25 keyword matching): +In the following example, Voyage AI is used for embeddings and the +service is configured for vector-only search (disabling BM25 keyword +matching): === "curl" @@ -528,7 +417,7 @@ is configured for vector-only search (disabling BM25 keyword matching): }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "search": { @@ -546,9 +435,9 @@ is configured for vector-only search (disabling BM25 keyword matching): ### Ollama (Self-Hosted) -In the following example, the RAG service uses a self-hosted Ollama server -for both embeddings and answer generation. No API key is required; the -Ollama server URL is provided via `base_url`: +In the following example, the RAG service uses a self-hosted Ollama +server for both embeddings and answer generation. 
No API key is +required; the Ollama server URL is provided via `base_url`: === "curl" @@ -610,8 +499,8 @@ Ollama server URL is provided via `base_url`: ### Multiple Pipelines with Shared Defaults -In the following example, two pipelines share default `token_budget` and -`top_n` values set at the `defaults` level: +In the following example, two pipelines share default `token_budget` +and `top_n` values set at the `defaults` level: === "curl" @@ -664,7 +553,7 @@ In the following example, two pipelines share default `token_budget` and }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." } }, @@ -685,7 +574,7 @@ In the following example, two pipelines share default `token_budget` and }, "rag_llm": { "provider": "anthropic", - "model": "claude-sonnet-4-20250514", + "model": "claude-sonnet-4-5", "api_key": "sk-ant-..." }, "top_n": 5 @@ -698,12 +587,12 @@ In the following example, two pipelines share default `token_budget` and }' ``` -## End-to-End Walkthrough +## Deployment Guide -This section shows the complete flow from database creation to a working -pipeline query. +This section shows the complete flow from database creation to a +working pipeline query. -### Step 1 — Create the Database +### Step 1 - Create the Database Include `scripts.post_database_create` to automatically provision the pgvector schema during database creation. This avoids any manual setup @@ -786,10 +675,10 @@ URL stays stable across container restarts. 
}' ``` -### Step 2 — Check the Database and Service Status +### Step 2 - Check the Database and Service Status -Run the following command after ~60–90 seconds to check the database is -ready and the RAG service is running: +Run the following command after approximately 60-90 seconds to check +that the database is ready and the RAG service is running: === "curl" @@ -797,14 +686,14 @@ ready and the RAG service is running: curl -s http://host-1:3000/v1/databases/knowledge-base ``` -In the response, look for two things: +In the response, look for the following items: -- `state: "available"` at the top level — the database is provisioned - and healthy -- `service_ready: true` inside `service_instances[].status` — the RAG - container is up and accepting requests +- `state: "available"` at the top level - the database is provisioned + and healthy. +- `service_ready: true` inside `service_instances[].status` - the RAG + container is up and accepting requests. -``` +```text { state: "available" instances: [ @@ -835,17 +724,18 @@ In the response, look for two things: } ``` -The `host_port` value is the port to use when querying the RAG service. -If you used a fixed `port: 9200` in the service spec, this will always -be `9200`. +The `host_port` value is the port to use when querying the RAG +service. If you used a fixed `port: 9200` in the service spec, the +host port will always be `9200`. !!! tip - Use a fixed `port` value (e.g. `9200`) in the service spec rather than - `port: 0`. When `port: 0` is used, Docker assigns a random host port - that changes each time the RAG container is replaced (e.g. after an - API key update), requiring you to look up the new port each time. + Use a fixed `port` value (e.g. `9200`) in the service spec rather + than `port: 0`. When `port: 0` is used, Docker assigns a random + host port that changes each time the RAG container is replaced + (e.g. after an API key update), requiring you to look up the new + port each time. 
-### Step 3 — Load Documents +### Step 3 - Load Documents The RAG service needs documents with embeddings in the database before it can answer queries. The following Python script generates embeddings @@ -890,6 +780,9 @@ cur.close() conn.close() ``` +Install the dependencies and run the script with the following +commands: + ```bash pip install psycopg2-binary openai export OPENAI_API_KEY="sk-..." @@ -900,14 +793,16 @@ export DB_NAME="knowledge_base" python3 load_documents.py ``` -Verify documents were inserted: +To verify that documents were inserted, run the following query: ```bash psql "postgresql://admin:admin_password@host-1:5432/knowledge_base" \ -c "SELECT COUNT(*), COUNT(embedding) FROM documents_content_chunks;" ``` -### Step 4 — Query the Pipeline +### Step 4 - Query the Pipeline + +Send a query to the RAG service using the following command: ```bash curl -X POST http://host-1:9200/v1/pipelines/default \ @@ -918,7 +813,7 @@ curl -X POST http://host-1:9200/v1/pipelines/default \ }' ``` -A successful response: +A successful response looks like this: ```json { @@ -931,15 +826,16 @@ A successful response: } ``` -`sources` is only populated when `include_sources: true` is set in the -request. +`sources` is only populated when `include_sources: true` is set in +the request. -### Step 5 — Update the Service Config +### Step 5 - Update the Service Config -To update the service (for example, to rotate an API key or change the -LLM model), submit a `POST /v1/databases/{id}` with the complete updated -spec. The update endpoint requires all fields — include `database_name`, -`nodes`, `database_users`, and the full `services` array: +To update the service (for example, to rotate an API key or change +the LLM model), submit a `POST /v1/databases/{id}` with the complete +updated spec. The update endpoint requires all fields - include +`database_name`, `nodes`, `database_users`, and the full `services` +array: === "curl" @@ -1006,17 +902,19 @@ spec. 
The update endpoint requires all fields — include `database_name`, }' ``` -The RAG service container is replaced with the new configuration. Poll -the database status until `state` is `"available"` and `service_ready` -is `true` before sending queries. +The RAG service container is replaced with the new configuration. +Poll the database status until `state` is `"available"` and +`service_ready` is `true` before sending queries. ## Querying the RAG Service -Once the service is running, submit queries to retrieve answers based on -your documents. +Once the service is running, submit queries to retrieve answers based +on your documents. ### List Available Pipelines +To list all configured pipelines, send the following request: + === "curl" ```bash @@ -1025,6 +923,9 @@ your documents. ### Query a Pipeline +To submit a query to a pipeline, send a POST request with the query +text: + === "curl" ```bash @@ -1038,15 +939,19 @@ your documents. ### Request Fields +The following table describes the query request fields: + | Field | Type | Default | Description | |---|---|---|---| -| `query` | string | — | Required. The natural language question to answer. | +| `query` | string | - | Required. The natural language question to answer. | | `include_sources` | boolean | `false` | Return the source documents used to generate the answer. | -| `top_n` | integer | — | Override the pipeline's `top_n` for this request. | +| `top_n` | integer | - | Override the pipeline's `top_n` for this request. | | `stream` | boolean | `false` | Stream the answer as Server-Sent Events. | ### Response Format +A successful query response looks like this: + ```json { "answer": "RAG (Retrieval-Augmented Generation) improves LLM responses by retrieving relevant documents from your database before generating answers. This grounds the LLM in your specific data, reducing hallucinations and improving accuracy...", @@ -1061,148 +966,114 @@ your documents. 
} ``` -`sources` is only populated when `include_sources` is `true` in the request. - -The RAG service uses **hybrid search**, combining two complementary search -techniques that are merged using **Reciprocal Rank Fusion (RRF)**: - -1. **Vector Similarity Search**: Retrieves documents semantically similar to - the query using cosine distance on embeddings. -2. **BM25 Keyword Search**: Retrieves documents with exact keyword matches - using TF-IDF scoring. - -This combination ensures the LLM receives context that is both semantically -relevant and keyword-relevant. Documents appearing in both result sets receive -higher scores, naturally prioritizing highly-relevant results. - -### Search Configuration +`sources` is only populated when `include_sources` is `true` in the +request. -Configure search behavior in the pipeline: +The RAG service's hybrid search combines two complementary techniques, +merged using Reciprocal Rank Fusion (RRF): -```json -"search": { - "hybrid_enabled": true, - "vector_weight": 0.7 -} -``` +- vector similarity search, which retrieves documents semantically + similar to the query using cosine distance on embeddings. +- BM25 keyword search, which retrieves documents with exact keyword + matches using TF-IDF scoring. -| Parameter | Range | Description | -|-----------|-------|-------------| -| `hybrid_enabled` | `true` / `false` | Enable hybrid search (default: `true`). Set to `false` for vector-only search. | -| `vector_weight` | 0.0–1.0 | Weight for vector search vs BM25 (default: `0.5`). Higher values prioritize semantic relevance. | +This combination ensures the LLM receives context that is both +semantically relevant and keyword-relevant. Documents appearing in +both result sets receive higher scores, naturally prioritizing +highly-relevant results. 
### Token Budget -The `token_budget` field controls how much context is sent to the LLM: - -- Documents are ranked and packed in order until the budget is exhausted -- The final document is truncated at a sentence boundary (not mid-word) - -Increase the budget to send more context, or decrease it to reduce LLM costs. - -## User-Managed Responsibilities - -You are responsible for: - -1. **Embedding Generation**: Using embedding provider APIs (OpenAI, Voyage AI, - Ollama) to generate vector embeddings for your documents -2. **Document Ingestion**: Loading document text and embeddings into the - `documents_content_chunks` table -3. **API Keys**: Providing credentials for embedding and LLM providers in the - service `config` -4. **Chunking Strategy**: Deciding how to split large documents for optimal - retrieval (e.g., 500-1000 character chunks with overlap) - -The Control Plane handles: - -1. **Schema Provisioning**: Automatically creating pgvector extension, tables, - and indexes via `scripts.post_database_create` during database creation -2. **Service Deployment**: Provisioning and managing the RAG container -3. **Database Credentials**: Automatically embedding the `connect_as` user's - credentials in the service configuration (credentials must be defined in - `database_users` during database creation) -4. **Health Monitoring**: Checking service health and restarting on failure +The `token_budget` field controls how much context is sent to the LLM. +The service ranks documents and packs them in order until the budget +is exhausted. The final document is truncated at a sentence boundary. +Increase the budget to send more context, or decrease it to reduce +LLM costs. ## Troubleshooting +The following sections describe common issues and how to resolve them. + ### About Automated Scripts -The `scripts.post_database_create` field executes SQL automatically during -database creation. 
Some important details: +The `scripts.post_database_create` field executes SQL automatically +during database creation. The following details apply: -- **Execution Timing**: Scripts run once, immediately after Spock is initialized -- **Transactional**: All statements execute within a single transaction -- **No Re-Execution**: If you update the database spec later, scripts are not re-run -- **Constraints**: Some SQL commands are not allowed within transactions: - - `VACUUM`, `ANALYZE` (use `REINDEX` instead) - - `CREATE INDEX CONCURRENTLY` - - `CREATE DATABASE`, `DROP DATABASE` +- Execution timing: scripts run once, immediately after Spock is + initialized +- Transactional: all statements execute within a single transaction +- No re-execution: if you update the database spec later, scripts are + not re-run +- Constraints: some SQL commands are not allowed within transactions, + including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, + `CREATE DATABASE`, and `DROP DATABASE` -If a script fails during database creation, you can use `update-database` to -retry after fixing the problematic statement. +If a script fails during database creation, you can use +`update-database` to retry after fixing the problematic statement. ### Service Fails to Start -**Check database connectivity:** +To diagnose a service that fails to start, check database +connectivity and user permissions. + +To verify that the database is accessible, run the following command: ```bash -# From host, verify database is accessible psql -h localhost -U admin -d knowledge_base -c "SELECT 1" ``` -**Check user permissions:** +To verify that the service user exists and has table access, run the +following query: ```sql --- Verify the service user exists and has table access \du+ admin \dt documents_content_chunks ``` ### Poor Query Results -**Verify documents are loaded:** +To diagnose poor query results, verify that documents are loaded and +embeddings are present. 
+ +To check document counts and embedding coverage, run the following +queries: ```sql --- Check document count SELECT COUNT(*) FROM documents_content_chunks; --- Verify embeddings exist SELECT COUNT(*) FROM documents_content_chunks WHERE embedding IS NOT NULL; ``` -**Inspect embedding quality:** +To find documents similar to a test query embedding, run the following +query: ```sql --- Find documents similar to a test query embedding SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity FROM documents_content_chunks ORDER BY similarity DESC LIMIT 5; ``` -**Try simpler queries:** - -Start with factual, keyword-based questions before complex analytical questions. +Start with factual, keyword-based questions before complex analytical +questions to verify that the pipeline is working correctly. ### Empty Context Window -If the RAG service returns limited context, the token budget may be exhausted. Increase it: +If the RAG service returns limited context, the token budget may be +exhausted. Increase the budget in the pipeline configuration: ```json "token_budget": 8000 ``` -Or store smaller, more focused document chunks. - -## Next Steps - -- Once you've validated the RAG service with manual documents, consider automating embedding generation -- Implement document versioning and updates for evolving knowledge bases -- Set up monitoring for query latency and answer quality -- Explore pgedge_vectorizer for automated chunking and embedding in high-volume scenarios +Alternatively, store smaller, more focused document chunks to fit more +context within the budget. ## Responsibility Summary +The following table summarizes which tasks are handled by the Control +Plane and which are your responsibility: + | Step | Who | How | |---|---|---| | Provision schema (pgvector, tables, indexes) | Control Plane | `scripts.post_database_create` in database spec | @@ -1213,9 +1084,15 @@ Or store smaller, more focused document chunks. 
| Load documents into table | You | `INSERT` using psycopg2 or any Postgres client | | Submit queries | Your application | `POST /v1/pipelines/{name}` on the RAG service | -## Additional Resources +## Next Steps + +The following resources provide more information on related topics. -- [RAG Server Repository](https://github.com/pgEdge/pgedge-rag-server) -- [RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) -- [pgvector Documentation](https://github.com/pgvector/pgvector) -- [Managing Services](managing.md) +- The [Managing Services](managing.md) guide describes how to add, + update, and remove services. +- The [pgEdge RAG Server](https://github.com/pgEdge/pgedge-rag-server) + repository contains the pgEdge RAG Server source code. +- The [pgEdge RAG Server Documentation](https://docs.pgedge.com/pgedge-rag-server/) + covers the pgEdge RAG Server API and configuration in detail. +- The [pgvector Documentation](https://github.com/pgvector/pgvector) + explains how to install and use the pgvector extension. 
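The responsibility table above notes that you load documents with `INSERT`
statements using psycopg2 or any Postgres client. A minimal loader might
look like the following sketch. It assumes the `documents_content_chunks`
table from the schema setup (a `content` text column, an `embedding` vector
column, and an auto-generated `id`) and an embedding already returned by
your provider; the DSN is a placeholder:

```python
def to_vector_literal(embedding):
    """Render a list of floats as a pgvector input literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def load_chunk(dsn, content, embedding):
    """Insert one document chunk together with its embedding.

    Assumes the documents_content_chunks table created by the schema
    setup; psycopg2 is shown because the docs mention it, but any
    Postgres client works equally well.
    """
    import psycopg2  # third-party driver: pip install psycopg2-binary
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents_content_chunks (content, embedding) "
            "VALUES (%s, %s::vector)",
            (content, to_vector_literal(embedding)),
        )
```

Casting the formatted literal with `::vector` lets the same statement work
whether the driver sends the value as text or as a bound parameter.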
From 9575b8efa54a0cec1f4fb0dfc9f86c40d75e0726 Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 20:04:19 +0530 Subject: [PATCH 04/11] docs: resolve index.md conflict and apply stylesheet - Resolve index.md merge conflict: keep RAG Server link, adopt main's connect_as-based Database Credentials section and updated Next Steps - Apply pgEdge stylesheet to rag.md: 79-char wrap, hyphens for em-dashes, table intro sentences, bullet periods, no bold headings, Next Steps as doc links - Remove redundant sections: Automation & Responsibilities, Loading Documents (duplicated in Step 3), User-Managed Responsibilities, Search Configuration (duplicated in Config Reference) PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/index.md | 18 +++++++++--------- docs/services/rag.md | 8 ++++---- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/services/index.md b/docs/services/index.md index 406ad457..acc94367 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -84,15 +84,15 @@ creating one instance per host for redundancy: ## Database Credentials -Each service instance is automatically provisioned with two dedicated -database users. The Control Plane manages these credentials; you do not -need to create or rotate them manually. The credentials are: - -- `svc_{service_id}_ro` is a read-only user with read access to the - database; this user is the default for most service types. -- `svc_{service_id}_rw` is a read-write user with read and write access - to the database; this user is provisioned when the service needs - read/write access. +Each service connects to the database as a user you specify with the +`connect_as` field. The `connect_as` value must reference a username +in your `database_users` array. The Control Plane uses those +credentials to generate the service's connection string and to configure +any required role grants (for example, granting the anonymous role to +a PostgREST authenticator). 
+ +You own and manage the `connect_as` user. Removing a service does not +drop the underlying database user. ## Next Steps diff --git a/docs/services/rag.md b/docs/services/rag.md index c8b91141..f7f251f4 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -1000,13 +1000,13 @@ The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: - Execution timing: scripts run once, immediately after Spock is - initialized -- Transactional: all statements execute within a single transaction + initialized. +- Transactional: all statements execute within a single transaction. - No re-execution: if you update the database spec later, scripts are - not re-run + not re-run. - Constraints: some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE` + `CREATE DATABASE`, and `DROP DATABASE`. If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 83ce64331ba9312dde3afbcde9b7591872d1cbfb Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 20:11:45 +0530 Subject: [PATCH 05/11] docs: revert unintended MCP description change in index.md Restore main's trimmed MCP description; our PR only adds the RAG Server entry. PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/index.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/services/index.md b/docs/services/index.md index 1f25ebb5..3adb2489 100644 --- a/docs/services/index.md +++ b/docs/services/index.md @@ -7,8 +7,7 @@ specify with the `connect_as` field. The Control Plane supports the following service types: - The [pgEdge Postgres MCP Server](mcp.md) connects AI agents and - LLM-powered applications to your database, enabling natural language - queries and AI-powered data access. + LLM-powered applications to your database. 
- The [pgEdge RAG Server](rag.md) enables retrieval-augmented generation workflows using your database as a knowledge store, returning LLM-synthesized answers grounded in your data. From 7f295586658da015cab5c298924eb0972680de9b Mon Sep 17 00:00:00 2001 From: Siva Date: Wed, 22 Apr 2026 22:31:23 +0530 Subject: [PATCH 06/11] docs: fix RRF score values in RAG response examples Score values like 0.82/0.87 implied cosine similarity but the RAG service returns RRF scores which are much smaller (~0.008). Update both example responses to use realistic RRF score values. PLAT-495 Co-Authored-By: Claude Sonnet 4.6 --- docs/services/rag.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index f7f251f4..ee3ad2b4 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -819,8 +819,8 @@ A successful response looks like this: { "answer": "Multi-active replication allows multiple PostgreSQL nodes to accept writes simultaneously...", "sources": [ - {"id": "5", "content": "...", "score": 0.82}, - {"id": "1", "content": "...", "score": 0.79} + {"id": "5", "content": "...", "score": 0.00820}, + {"id": "1", "content": "...", "score": 0.00806} ], "tokens_used": 1243 } @@ -959,7 +959,7 @@ A successful query response looks like this: { "id": "42", "content": "The RAG service enables retrieval-augmented generation workflows...", - "score": 0.87 + "score": 0.00820 } ], "tokens_used": 1243 From 2d4c6c055557e82c34507e5fe3d3f73950ec74c2 Mon Sep 17 00:00:00 2001 From: Siva Date: Thu, 23 Apr 2026 00:25:54 +0530 Subject: [PATCH 07/11] addressing review comments --- docs/services/rag.md | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index ee3ad2b4..ceef5999 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -105,10 +105,10 @@ The following table describes the embedding configuration fields: | Field | Type | 
Description | |---|---|---| -| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `anthropic`, `ollama`. | +| `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `ollama`. | | `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). | -| `api_key` | string | API key for the provider. Required for `openai`, `voyage`, and `anthropic`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for the provider API. For `ollama`, defaults to `http://localhost:11434`. | +| `api_key` | string | API key for the provider. Required for `openai` and `voyage`. Not used for `ollama`. | +| `base_url` | string | Optional. Custom base URL for the provider API. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | ### LLM Configuration @@ -123,7 +123,7 @@ The following table describes the LLM configuration fields: | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | | `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for API gateway routing. For `ollama`, defaults to `http://localhost:11434`. | +| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | !!! 
note If `embedding_llm` and `rag_llm` share the same provider and both @@ -215,17 +215,20 @@ following table shows common models and their vector dimensions: | OpenAI | `text-embedding-3-small` | 1536 | | OpenAI | `text-embedding-3-large` | 3072 | | Voyage AI | `voyage-3` / `voyage-3-large` | 1024 | -| Ollama | Varies by model | Check model documentation | +| Ollama | `nomic-embed-text` | 768 | +| Ollama | Other models | Check model documentation | ## Examples The following examples show how to configure the RAG service for common use cases. The first example includes the complete `scripts.post_database_create` setup to automatically provision the -database schema (pgvector extension, tables, and indexes). Subsequent -examples focus on service configuration variations and omit the schema -setup for brevity - in production, always include the schema setup -from the first example. +database schema (pgvector extension, tables, and indexes) using +`vector(1536)` for OpenAI embeddings. Subsequent examples focus on +service configuration variations and omit the schema setup for brevity. +If you use a different embedding model, adjust the `vector(N)` dimension +in your schema to match - for example, `vector(1024)` for `voyage-3` or +`vector(768)` for `nomic-embed-text`. 
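As an illustrative sketch of the dimension adjustment (assuming the
`documents_content_chunks` layout used throughout these docs; the index
operator classes and the `to_tsvector` configuration are assumptions, not
a copy of the official schema setup), a `voyage-3` deployment would
declare `vector(1024)` in place of `vector(1536)`:

```sql
-- Illustrative only; mirror your real schema setup and adjust vector(N).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents_content_chunks (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1024)  -- 1024 dimensions for voyage-3
);

-- HNSW index for fast cosine-similarity search.
CREATE INDEX ON documents_content_chunks
    USING hnsw (embedding vector_cosine_ops);

-- GIN index over a tsvector expression for keyword (BM25) search.
CREATE INDEX ON documents_content_chunks
    USING gin (to_tsvector('english', content));
```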
### Minimal (OpenAI + Anthropic) @@ -238,7 +241,7 @@ with OpenAI for embeddings and Anthropic Claude for answer generation: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -313,7 +316,7 @@ answer generation: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -377,7 +380,7 @@ matching): curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -421,8 +424,7 @@ matching): "api_key": "sk-ant-..." }, "search": { - "hybrid_enabled": false, - "vector_weight": 1.0 + "hybrid_enabled": false } } ] @@ -445,7 +447,7 @@ required; the Ollama server URL is provided via `base_url`: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -508,7 +510,7 @@ and `top_n` values set at the `defaults` level: curl -X POST http://host-1:3000/v1/databases \ -H 'Content-Type: application/json' \ --data '{ - "id": "knowledge_base", + "id": "knowledge-base", "spec": { "database_name": "knowledge_base", "database_users": [ @@ -1022,11 +1024,11 @@ To verify that the database is accessible, run the following command: psql -h localhost -U admin -d knowledge_base -c "SELECT 1" ``` -To verify that the service user exists and has table access, run the -following query: +To verify that the service user (`app_read_only`) exists and has table +access, run the following query: ```sql -\du+ admin +\du+ app_read_only \dt documents_content_chunks ``` From 
69c87addac081a6e5efcf9fc99c22b9fc23dd8c2 Mon Sep 17 00:00:00 2001
From: Siva
Date: Thu, 23 Apr 2026 00:38:24 +0530
Subject: [PATCH 08/11] addressing pgedge-skill docs review comments

---
 docs/services/rag.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/services/rag.md b/docs/services/rag.md
index ceef5999..77ce518b 100644
--- a/docs/services/rag.md
+++ b/docs/services/rag.md
@@ -13,11 +13,11 @@ project.
 The Control Plane provisions a RAG service container on each specified
 host. The service connects to the database using an existing user
 specified in the `connect_as` field, which must be defined in
-`database_users`. The credentials are automatically embedded in the
-service configuration by the Control Plane. Client applications submit
-natural language queries to the service, which performs hybrid vector
-and keyword search against document tables and returns LLM-synthesized
-answers with source citations.
+`database_users`. The Control Plane automatically embeds that user's
+credentials in the service configuration. Client applications submit
+natural language queries to the service, which performs hybrid vector
+and keyword search against document tables and returns LLM-synthesized
+answers with source citations.
 
 See [Managing Services](managing.md) for instructions on adding,
 updating, and removing services. The sections below cover RAG-specific
@@ -108,7 +108,7 @@ The following table describes the embedding configuration fields:
 | `provider` | string | Required. The embedding provider. One of: `openai`, `voyage`, `ollama`. |
 | `model` | string | Required. The embedding model name (e.g., `text-embedding-3-small`, `voyage-3`, `nomic-embed-text`). |
 | `api_key` | string | API key for the provider. Required for `openai` and `voyage`. Not used for `ollama`. |
-| `base_url` | string | Optional. Custom base URL for the provider API. 
Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | +| `base_url` | string | Optional. Custom base URL for the provider API. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | ### LLM Configuration @@ -123,7 +123,7 @@ The following table describes the LLM configuration fields: | `provider` | string | Required. The LLM provider. One of: `anthropic`, `openai`, `ollama`. | | `model` | string | Required. The model name (e.g., `claude-sonnet-4-5`, `gpt-4o`, `llama3.2`). | | `api_key` | string | API key for the provider. Required for `anthropic` and `openai`. Not used for `ollama`. | -| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` — set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | +| `base_url` | string | Optional. Custom base URL for API gateway routing. Required for `ollama` - set this to the network-accessible address of your Ollama server (e.g., `http://192.168.1.10:11434`). | !!! note If `embedding_llm` and `rag_llm` share the same provider and both @@ -1002,13 +1002,13 @@ The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: - Execution timing: scripts run once, immediately after Spock is - initialized. -- Transactional: all statements execute within a single transaction. + initialized +- Transactional: all statements execute within a single transaction - No re-execution: if you update the database spec later, scripts are - not re-run. + not re-run - Constraints: some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE`. 
+ `CREATE DATABASE`, and `DROP DATABASE` If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 0dcab019dc05646d31896137cb19108f862b0c89 Mon Sep 17 00:00:00 2001 From: Siva Date: Thu, 23 Apr 2026 00:44:02 +0530 Subject: [PATCH 09/11] addressing AI review comments --- docs/services/rag.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index 77ce518b..04c99f33 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -920,7 +920,7 @@ To list all configured pipelines, send the following request: === "curl" ```bash - curl http://localhost:9200/v1/pipelines + curl http://host-1:9200/v1/pipelines ``` ### Query a Pipeline @@ -931,7 +931,7 @@ text: === "curl" ```bash - curl -X POST http://localhost:9200/v1/pipelines/default \ + curl -X POST http://host-1:9200/v1/pipelines/default \ -H "Content-Type: application/json" \ -d '{ "query": "How does RAG improve LLM responses?", @@ -1021,7 +1021,7 @@ connectivity and user permissions. To verify that the database is accessible, run the following command: ```bash -psql -h localhost -U admin -d knowledge_base -c "SELECT 1" +psql -h host-1 -U admin -d knowledge_base -c "SELECT 1" ``` To verify that the service user (`app_read_only`) exists and has table @@ -1088,7 +1088,7 @@ Plane and which are your responsibility: ## Next Steps -The following resources provide more information on related topics. +The following resources provide more information on related topics: - The [Managing Services](managing.md) guide describes how to add, update, and remove services. 
From c8668ef0340e1ef58d1fce9fe6022df585fdcf51 Mon Sep 17 00:00:00 2001 From: susan-pgedge <130390403+susan-pgedge@users.noreply.github.com> Date: Thu, 23 Apr 2026 11:53:18 -0400 Subject: [PATCH 10/11] Refine language and formatting in rag.md --- docs/services/rag.md | 46 +++++++++++++++++++++----------------------- 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/docs/services/rag.md b/docs/services/rag.md index 04c99f33..070e997f 100644 --- a/docs/services/rag.md +++ b/docs/services/rag.md @@ -28,10 +28,10 @@ configuration. Before deploying a RAG service, your PostgreSQL database must have the following items configured: -- pgvector extension installed and enabled. -- document tables with text and vector columns. -- HNSW index on vector columns for fast similarity search. -- GIN index on text columns for keyword search (BM25). +- The pgvector extension must be installed and enabled. +- The database must have document tables with text and vector columns. +- An HNSW index on vector columns enables fast similarity search. +- A GIN index on text columns enables keyword search (BM25). The Control Plane can automatically provision all of these during database creation using the `scripts.post_database_create` hook. See @@ -78,7 +78,7 @@ In this example, `app_read_only` must be defined in `database_users`: The `pipelines` array (required) defines one or more RAG workflows. Each pipeline specifies which tables to search, which embedding -provider to use, and which LLM to use for answer generation. +provider to use, and which LLM to use to generate answers. 
The following table describes the pipeline configuration fields: @@ -233,7 +233,7 @@ in your schema to match - for example, `vector(1024)` for `voyage-3` or ### Minimal (OpenAI + Anthropic) In the following example, a `curl` command provisions a RAG service -with OpenAI for embeddings and Anthropic Claude for answer generation: +that uses OpenAI for embeddings and Anthropic Claude to generate answers: === "curl" @@ -307,8 +307,8 @@ with OpenAI for embeddings and Anthropic Claude for answer generation: ### OpenAI End-to-End -In the following example, OpenAI is used for both embeddings and -answer generation: +In the following example, OpenAI is used for both embeddings and to generate +answers: === "curl" @@ -690,10 +690,10 @@ that the database is ready and the RAG service is running: In the response, look for the following items: -- `state: "available"` at the top level - the database is provisioned - and healthy. -- `service_ready: true` inside `service_instances[].status` - the RAG - container is up and accepting requests. +- The `state: "available"` field at the top level confirms that the + database is provisioned and healthy. +- The `service_ready: true` field inside `service_instances[].status` + confirms that the RAG container is up and accepting requests. ```text { @@ -974,10 +974,10 @@ request. The RAG service's hybrid search combines two complementary techniques, merged using Reciprocal Rank Fusion (RRF): -- vector similarity search, which retrieves documents semantically - similar to the query using cosine distance on embeddings. -- BM25 keyword search, which retrieves documents with exact keyword - matches using TF-IDF scoring. +- Vector similarity search retrieves documents semantically similar to + the query using cosine distance on embeddings. +- BM25 keyword search retrieves documents with exact keyword matches + using TF-IDF scoring. This combination ensures the LLM receives context that is both semantically relevant and keyword-relevant. 
Documents appearing in @@ -1001,14 +1001,12 @@ The following sections describe common issues and how to resolve them. The `scripts.post_database_create` field executes SQL automatically during database creation. The following details apply: -- Execution timing: scripts run once, immediately after Spock is - initialized -- Transactional: all statements execute within a single transaction -- No re-execution: if you update the database spec later, scripts are - not re-run -- Constraints: some SQL commands are not allowed within transactions, - including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, - `CREATE DATABASE`, and `DROP DATABASE` +| Property | Details | +|---|---| +| Execution timing | Scripts run once, immediately after Spock is initialized. | +| Transactional | All statements execute within a single transaction. | +| No re-execution | If you update the database spec later, scripts are not re-run. | +| Constraints | Some SQL commands are not allowed within transactions, including `VACUUM`, `ANALYZE`, `CREATE INDEX CONCURRENTLY`, `CREATE DATABASE`, and `DROP DATABASE`. | If a script fails during database creation, you can use `update-database` to retry after fixing the problematic statement. From 68c9d5306398ebb98b618bedbcca9137bd0ad322 Mon Sep 17 00:00:00 2001 From: susan-pgedge <130390403+susan-pgedge@users.noreply.github.com> Date: Thu, 23 Apr 2026 11:56:32 -0400 Subject: [PATCH 11/11] Update documentation for shared defaults in pipelines Clarified wording in the documentation regarding shared default values for pipelines. 
---
 docs/services/rag.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/services/rag.md b/docs/services/rag.md
index 070e997f..55e439eb 100644
--- a/docs/services/rag.md
+++ b/docs/services/rag.md
@@ -501,8 +501,9 @@ required; the Ollama server URL is provided via `base_url`:
 
 ### Multiple Pipelines with Shared Defaults
 
-In the following example, two pipelines share default `token_budget`
-and `top_n` values set at the `defaults` level:
+In the following example, two pipelines share default values for
+`token_budget` and `top_n`, set in the `defaults` object:
+
 
 === "curl"