docs: strengthen RAG schema and guidance by tsivaprasad · Pull Request #374 · pgEdge/control-plane

tsivaprasad · 2026-04-30T19:47:58Z

Summary

Strengthens the RAG service documentation with clearer schema requirements and guidance, based on issues observed during testing.

Changes

Set embedding vector(N) NOT NULL in all table schema examples to prevent NULL embeddings at the database level
Document that non-superuser connect_as users require GRANT SELECT on document tables — omitting this causes the service to silently return "No relevant information found" for all queries
Replace the \dt table check with has_table_privilege() in the "Service Fails to Start" troubleshooting section
Update the "Poor Query Results" section to explain that NULL embeddings cause silent vector search failures and show how to find and remove them on existing tables

Testing

Documentation-only change; no code changes.

Checklist

Documentation updated (if needed)

Notes for Reviewers

Two silent failure modes documented here were discovered during RAG service testing:

NULL embeddings — inserting rows without the embedding column causes the vector search to crash internally and return empty results with no visible error. The NOT NULL constraint on the schema prevents this. The corresponding RAG server fix (filtering NULL rows in the SQL query) is in a separate PR in the pgedge-rag-server repo.
Missing SELECT privilege — if connect_as is a non-superuser, the RAG container connects successfully but gets permission denied on the table, which also surfaces as "No relevant information found" with no indication of the cause.

coderabbitai · 2026-04-30T19:48:12Z

📝 Walkthrough

Walkthrough

Updated RAG documentation to enforce database permissions and schema constraints. Changed the embedding column from nullable to NOT NULL, added guidance on granting SELECT permissions to connect_as users, and enhanced troubleshooting steps with has_table_privilege verification and NULL embedding detection.

Changes

Cohort / File(s)	Summary
RAG Documentation Updates `docs/services/rag.md`	Enforced database permissions by adding explicit `GRANT SELECT` requirements for non-superuser `connect_as` users; changed `documents_content_chunks.embedding` column to `NOT NULL` across all schema examples; enhanced troubleshooting with `has_table_privilege` verification, conditional grant remediation, and checks for existing `NULL` embeddings; refined prose and formatting.

Poem

🐰 With permissions now declared so clear,
And embeddings no longer we fear,
Schema constraints keep vectors true,
Troubleshooting knows just what to do,
No silent "No relevant" refrain—
Your RAG now runs like a rabbit's brain! 🌟

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: strengthening RAG schema and guidance documentation with clearer requirements and operational improvements.
Description check	✅ Passed	The description includes all required sections from the template (Summary, Changes, Testing, Checklist) with comprehensive details about schema changes, permission requirements, and silent failure modes.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch PLAT-584-rag-service-returns-no-relevant-information-found-on-nodes-added-via-database-update

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codacy-production · 2026-04-30T19:48:46Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/services/rag.md (1)
1025-1049: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

"Service Fails to Start" may be misleading for SELECT-privilege issues.

Based on the provided server preflight behavior (role-existence only), lacking GRANT SELECT on document tables is more likely to cause empty/“No relevant information found” query results than an outright failure to start. Since you also added a separate “Poor Query Results” section for NULL embeddings, it would help to either:

rename/scope this section to “connection/user/privilege issues affecting queries”, or

add a short note that SELECT-privilege problems typically surface during querying.
Suggested edit
-### Service Fails to Start
+### Service Fails to Start (or Returns Empty Results)
 
 To diagnose a service that fails to start, check database
 connectivity and user permissions.
+
+Note: If the `connect_as` role exists but lacks `GRANT SELECT` on
+the document table(s), the service may still start—queries can
+return `"No relevant information found"` instead.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @docs/services/rag.md around lines 1025 - 1049, The "Service Fails to Start"
section heading is misleading because the preflight only checks role existence
and missing SELECT privileges usually cause empty/"No relevant information
found" query results rather than startup failure; update the section title
and/or content in docs/services/rag.md (the "Service Fails to Start" heading and
its paragraphs) to either rename it to something like "Connection / User /
Privilege Issues Affecting Queries" or add a short clarifying sentence under the
existing heading stating that missing SELECT privileges on
documents_content_chunks typically surface as poor/empty query results during
runtime rather than preventing the service from starting, and keep the existing
diagnostic commands (\du+ example and has_table_privilege example) as-is.
</details>

</blockquote></details>

</blockquote></details>
🧹 Nitpick comments (4)
docs/services/rag.md (4)
77-87: ⚡ Quick win

Clarify GRANT SELECT guidance: apply to each configured document table and qualify schema/table names.

The new “connect_as user must have SELECT privilege on each document table” section is good, but the example SQL hard-codes documents_content_chunks without a schema and doesn’t explicitly tie the GRANT target to pipelines[].tables[].table (which is what users will actually configure).

Consider updating the snippet/prose to:

mention you must grant on every table listed under pipelines[].tables[], and

use a schema-qualified table name (or at least instruct users to adjust based on their schema).
Suggested edit
 If the `connect_as` user is not a database owner or superuser, it must
 have `SELECT` privilege on each document table. Grant this privilege in
 the `scripts.post_database_create` array:
 
 ```sql
-GRANT SELECT ON documents_content_chunks TO app_read_only;
+-- Grant SELECT on EVERY table referenced by `pipelines[].tables[]`.
+-- If your tables are not in `public`, qualify the schema accordingly.
+GRANT SELECT ON public.documents_content_chunks TO app_read_only;
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @docs/services/rag.md around lines 77 - 87, Update the guidance to state that
the GRANT must be applied to every table listed under pipelines[].tables[].table
and to use schema-qualified names; replace the hard-coded example by explaining
to run a GRANT SELECT on each configured table (e.g., GRANT SELECT ON
.<table_name> TO <connect_as_user>) and add a short note suggesting to
replace with public if using the default schema.
</details>

---

`1050-1073`: _⚡ Quick win_

**DELETE remediation is destructive: add a short safety/capacity note.**

The “Poor Query Results” section proposes:
- `DELETE FROM documents_content_chunks WHERE embedding IS NULL;`

That’s correct for remediation, but it can be disruptive on large tables (locks) and is irreversible without backups. A brief note like “back up / test in staging / consider batching for large tables” would reduce operator risk.

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @docs/services/rag.md around lines 1050 - 1073, Update the "Poor Query
Results" remediation that suggests DELETE FROM documents_content_chunks WHERE
embedding IS NULL to include a short safety/capacity note: advise taking a
backup or testing in staging first, consider running a SELECT COUNT(*) check
and/or deleting in small batched transactions (or using LIMIT with repeated
deletes) to avoid long locks, and remind operators that deletes are irreversible
without backups. Reference the existing section title "Poor Query Results" and
the DELETE statement so the note is inserted immediately after that remediation.
</details>

---

`1036-1048`: _⚡ Quick win_

**Make `has_table_privilege()` example robust: schema-qualify (or cast to regclass).**

`SELECT has_table_privilege('app_read_only', 'documents_content_chunks', 'SELECT');` assumes the table is discoverable via the database search_path. If the document tables live outside `public`, this example can yield `false` (or error), confusing users.

Prefer `public.documents_content_chunks` (or instruct users to use their schema), or use `::regclass`.

<details>
<summary>Suggested edit</summary>

```diff
-SELECT has_table_privilege('app_read_only', 'documents_content_chunks', 'SELECT');
+-- Use schema-qualified table name to avoid search_path issues
+SELECT has_table_privilege('app_read_only', 'public.documents_content_chunks', 'SELECT');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/services/rag.md` around lines 1036 - 1048, Update the
has_table_privilege example to avoid search_path issues by schema-qualifying the
table name or casting it to regclass: change the call to use either
'public.documents_content_chunks' (or the user's schema) or
'documents_content_chunks'::regclass when calling has_table_privilege for the
user 'app_read_only' (referencing has_table_privilege, documents_content_chunks,
app_read_only) so the check reliably returns the correct result regardless of
search_path.
193-206: ⚡ Quick win

Embedding NOT NULL change: add an explicit “ALTER TABLE ... SET NOT NULL” migration step.

You’ve already documented how to find and delete existing NULL embeddings in “Poor Query Results”, and the examples set embedding vector(N) NOT NULL going forward. What’s missing is the final step for existing deployments: after cleanup, users should be guided to enforce the constraint (otherwise NULLs can reappear later).
Suggested addition
 -- Remove rows missing embeddings
 DELETE FROM documents_content_chunks WHERE embedding IS NULL;
+
+-- Then enforce the constraint to prevent future NULL embeddings
+ALTER TABLE documents_content_chunks
+  ALTER COLUMN embedding SET NOT NULL;
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @docs/services/rag.md around lines 193 - 206, Add a final migration step that
enforces the non-null constraint on the embedding column after users have
cleaned up any NULL embeddings: create a migration that alters the
documents_content_chunks table to set the embedding column (embedding) to NOT
NULL, and run it only after confirming there are no rows with NULL embeddings so
the constraint will succeed.
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @docs/services/rag.md:

Around line 1025-1049: The "Service Fails to Start" section heading is
misleading because the preflight only checks role existence and missing SELECT
privileges usually cause empty/"No relevant information found" query results
rather than startup failure; update the section title and/or content in
docs/services/rag.md (the "Service Fails to Start" heading and its paragraphs)
to either rename it to something like "Connection / User / Privilege Issues
Affecting Queries" or add a short clarifying sentence under the existing heading
stating that missing SELECT privileges on documents_content_chunks typically
surface as poor/empty query results during runtime rather than preventing the
service from starting, and keep the existing diagnostic commands (\du+ example
and has_table_privilege example) as-is.

Nitpick comments:
In @docs/services/rag.md:

Around line 77-87: Update the guidance to state that the GRANT must be applied
to every table listed under pipelines[].tables[].table and to use
schema-qualified names; replace the hard-coded example by explaining to run a
GRANT SELECT on each configured table (e.g., GRANT SELECT ON
.<table_name> TO <connect_as_user>) and add a short note suggesting to
replace with public if using the default schema.

Around line 1050-1073: Update the "Poor Query Results" remediation that
suggests DELETE FROM documents_content_chunks WHERE embedding IS NULL to include
a short safety/capacity note: advise taking a backup or testing in staging
first, consider running a SELECT COUNT(*) check and/or deleting in small batched
transactions (or using LIMIT with repeated deletes) to avoid long locks, and
remind operators that deletes are irreversible without backups. Reference the
existing section title "Poor Query Results" and the DELETE statement so the note
is inserted immediately after that remediation.

Around line 1036-1048: Update the has_table_privilege example to avoid
search_path issues by schema-qualifying the table name or casting it to
regclass: change the call to use either 'public.documents_content_chunks' (or
the user's schema) or 'documents_content_chunks'::regclass when calling
has_table_privilege for the user 'app_read_only' (referencing
has_table_privilege, documents_content_chunks, app_read_only) so the check
reliably returns the correct result regardless of search_path.

Around line 193-206: Add a final migration step that enforces the non-null
constraint on the embedding column after users have cleaned up any NULL
embeddings: create a migration that alters the documents_content_chunks table to
set the embedding column (embedding) to NOT NULL, and run it only after
confirming there are no rows with NULL embeddings so the constraint will
succeed.
</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `a4ac85c8-6380-4b58-9a55-f1f9c07d79ed`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 1e081a892622594a3142c7d093ce1fbd30bc7a7c and 1d2d32e1c2d184826d7ecf92bb7e9235c997b3b5.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `docs/services/rag.md`

</details>

</details>

docs: strengthen RAG schema and guidance

1d2d32e

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

mmols approved these changes May 4, 2026

View reviewed changes

tsivaprasad merged commit 76f2be6 into main May 4, 2026
3 checks passed

tsivaprasad deleted the PLAT-584-rag-service-returns-no-relevant-information-found-on-nodes-added-via-database-update branch May 4, 2026 15:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: strengthen RAG schema and guidance#374

docs: strengthen RAG schema and guidance#374
tsivaprasad merged 1 commit into
mainfrom
PLAT-584-rag-service-returns-no-relevant-information-found-on-nodes-added-via-database-update

tsivaprasad commented Apr 30, 2026

Uh oh!

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading

Walkthrough

Changes

Poem

Uh oh!

codacy-production Bot commented Apr 30, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tsivaprasad commented Apr 30, 2026

Summary

Changes

Testing

Checklist

Notes for Reviewers

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

Uh oh!

codacy-production Bot commented Apr 30, 2026

Up to standards ✅

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading