From 06457cb672e3d1493933f843f5997b1db3d6037c Mon Sep 17 00:00:00 2001
From: Francesco Vadicamo <f.vadicamo@gmail.com>
Date: Sun, 26 Apr 2026 18:55:20 +0000
Subject: [PATCH 1/2] docs(backlog): track v0.5.0 release-PR review feedback

Two new DEBT entries from Gemini review on the develop->main release PR
(#71). All medium/low severity, deferred from v0.5.0 release per
zero-new-commit policy on release PRs:

- DEBT-020 (medium): audit log skipped when request_id is missing on
  admin writes. patch_namespace_config and similar endpoints should
  synthesize a fallback uuid4 so NFR-007 holds even if the request-id
  middleware misbehaves.

- DEBT-021 (low): _fetch_document_names adds a separate Postgres
  round-trip per query. The WI-2 plan called for a JOIN with
  source_documents inside the existing chunk-fetch query; v0.5.0 opted
  for a clarity-first separate call, consolidation tracked here.

Three other Gemini comments on the same review map to existing items
(DEBT-018 for :root scope) or were false positives (claimed missing
imports already present at line 13). The four code-quality bot LOW
nitpicks ("self_inner" in nested test fakes) are style noise on
disambiguating closure-bound mocks; no entry created.
---
 .s2s/BACKLOG.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/.s2s/BACKLOG.md b/.s2s/BACKLOG.md
index 0790542..78fef9b 100644
--- a/.s2s/BACKLOG.md
+++ b/.s2s/BACKLOG.md
@@ -720,6 +720,59 @@ In today's deploy there is one widget per page and init fires once, so the impac
 
 ---
 
+### DEBT-020: Audit log fallback when `request_id` is missing on admin writes
+
+**Status**: planned | **Priority**: medium | **Created**: 2026-04-26
+**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-admin/src/vektra_admin/api.py:768-783`
+
+**Context**: `patch_namespace_config` writes the audit log only when `request.state.request_id` is truthy:
+
+```python
+request_id = getattr(request.state, "request_id", None)
+if request_id:
+    background_tasks.add_task(_audit.log_event, ...)
+```
+
+If the request-id middleware fails to set the attribute (or runs out of order), critical administrative events like namespace configuration changes are silently skipped from the audit log. This weakens NFR-007 (audit every sensitive content access). Other admin write endpoints (key CRUD, etc.) likely share the same pattern and should be audited together.
+
+**Proposed approach**:
+1. Synthesize a fallback `uuid4()` request_id when `request.state.request_id` is missing, so the audit log always fires.
+2. Emit a structlog warning in that path so middleware misconfiguration is observable.
+3. Sweep `vektra-admin/api.py` and `vektra-learn/api.py` for the same `if request_id:` pattern; apply the fix consistently.
+
+**Acceptance criteria**:
+- [ ] `patch_namespace_config` always writes an audit row, with a synthetic request_id when missing
+- [ ] Structlog warning emitted when the fallback path triggers
+- [ ] Same pattern applied to other admin write endpoints (POST /api-keys, DELETE /api-keys/{id}, etc.)
+- [ ] Unit test covers the fallback path
+
+---
+
+### DEBT-021: Replace `_fetch_document_names` round-trip with JOIN in chunk fetch
+
+**Status**: planned | **Priority**: low | **Created**: 2026-04-26
+**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-core/src/vektra_core/pipeline.py:100-126`
+
+**Context**: WI-2 (FEAT-012) added `document_name` to source citations by introducing `_fetch_document_names()`, which performs a separate `SELECT id, filename, deleted_at FROM source_documents WHERE id = ANY(...)` after the main chunk fetch. This adds one Postgres round-trip per query.
+
+The original WI-2 plan suggested integrating the filename lookup into the existing `document_chunks` SELECT via a JOIN to `source_documents`. The v0.5.0 implementation chose the separate-call shape for clarity during milestone scope; consolidation is tracked here for future hardening.
+
+In current low-QPS deployments the extra round-trip is not measurable in user-facing latency. It becomes relevant under high concurrency.
+
+**Proposed approach**:
+1. Move filename resolution into the chunk-fetch query (in `vektra-index` or wherever `document_chunks` is queried) via `LEFT JOIN source_documents ON document_chunks.source_document_id = source_documents.id`.
+2. Return chunks already enriched with `document_name` and a `deleted_at` (or `archived` boolean) flag, so `_fetch_document_names` can be deleted.
+3. Preserve the `(archived)` suffix at the citation-rendering layer (API serialization).
+
+**Acceptance criteria**:
+- [ ] Single Postgres query covers chunk text + filename + archived flag
+- [ ] `_fetch_document_names()` removed from `vektra-core/pipeline.py`
+- [ ] Streaming and non-streaming pipelines both reflect the new shape
+- [ ] No regression in `(archived)` rendering for soft-deleted documents
+- [ ] Latency benchmark confirms one fewer round-trip per `/query`
+
+---
+
 ### INFRA-005: Docker log persistence across container restarts
 
 **Status**: planned | **Priority**: medium | **Created**: 2026-03-23

From 855bebdec592f579619d81e3a3de5d40de1b5885 Mon Sep 17 00:00:00 2001
From: Francesco Vadicamo <f.vadicamo@gmail.com>
Date: Sun, 26 Apr 2026 20:31:40 +0000
Subject: [PATCH 2/2] docs(backlog): broaden DEBT-020 scope and multi-provider
 DEBT-021
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Address two Gemini review comments on PR #72:

- DEBT-020 (3143979031): title "on admin writes" too narrow per NFR-007
  ("audit every sensitive content access"). Sensitive read endpoints like
  `get_conversation_turns` (vektra-admin/api.py:491-539) share the same
  `if request_id:` pattern. Renamed to "on sensitive endpoints" and
  expanded the sweep + acceptance criteria to cover both reads and writes.

- DEBT-021 (3143979036): "Single Postgres query" was Pgvector-centric and
  ignored the multi-provider architecture. Reframed as
  "provider-aligned filename resolution" with two explicit paths:
  (A) Postgres-side JOIN — provider-agnostic since chunks live in
  document_chunks for both pgvector and qdrant deployments,
  (B) Qdrant-native payload denormalization — follow-up for
  Qdrant-heavy deployments, separate ingest-side change.
---
 .s2s/BACKLOG.md | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/.s2s/BACKLOG.md b/.s2s/BACKLOG.md
index 78fef9b..5c77651 100644
--- a/.s2s/BACKLOG.md
+++ b/.s2s/BACKLOG.md
@@ -720,10 +720,10 @@ In today's deploy there is one widget per page and init fires once, so the impac
 
 ---
 
-### DEBT-020: Audit log fallback when `request_id` is missing on admin writes
+### DEBT-020: Audit log fallback when `request_id` is missing on sensitive endpoints
 
 **Status**: planned | **Priority**: medium | **Created**: 2026-04-26
-**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-admin/src/vektra_admin/api.py:768-783`
+**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-admin/src/vektra_admin/api.py:768-783`. Same pattern applies to sensitive reads, e.g. `get_conversation_turns` in `vektra-admin/src/vektra_admin/api.py:491-539`.
 
 **Context**: `patch_namespace_config` writes the audit log only when `request.state.request_id` is truthy:
 
@@ -733,43 +733,51 @@ if request_id:
     background_tasks.add_task(_audit.log_event, ...)
 ```
 
-If the request-id middleware fails to set the attribute (or runs out of order), critical administrative events like namespace configuration changes are silently skipped from the audit log. This weakens NFR-007 (audit every sensitive content access). Other admin write endpoints (key CRUD, etc.) likely share the same pattern and should be audited together.
+If the request-id middleware fails to set the attribute (or runs out of order), audit events are silently skipped. This weakens NFR-007 ("audit every sensitive content access"). The same `if request_id:` guard appears on sensitive **read** endpoints too (e.g. `GET /admin/conversations/{id}/turns`), which means both write and read paths leak audit coverage when the middleware misbehaves. The fix should sweep both classes of endpoint, not only writes.
 
 **Proposed approach**:
 1. Synthesize a fallback `uuid4()` request_id when `request.state.request_id` is missing, so the audit log always fires.
 2. Emit a structlog warning in that path so middleware misconfiguration is observable.
-3. Sweep `vektra-admin/api.py` and `vektra-learn/api.py` for the same `if request_id:` pattern; apply the fix consistently.
+3. Sweep `vektra-admin/api.py` and `vektra-learn/api.py` for the `if request_id:` pattern across **both read and write** sensitive endpoints; apply the fix consistently.
 
 **Acceptance criteria**:
 - [ ] `patch_namespace_config` always writes an audit row, with a synthetic request_id when missing
 - [ ] Structlog warning emitted when the fallback path triggers
-- [ ] Same pattern applied to other admin write endpoints (POST /api-keys, DELETE /api-keys/{id}, etc.)
-- [ ] Unit test covers the fallback path
+- [ ] Same pattern applied to sensitive endpoints across vektra-admin and vektra-learn, covering reads (e.g. `GET /admin/conversations/{id}/turns`) **and** writes (`POST /api-keys`, `DELETE /api-keys/{id}`, `PATCH /admin/namespaces/{id}/config`, etc.)
+- [ ] Unit test covers the fallback path on at least one read and one write endpoint
 
 ---
 
-### DEBT-021: Replace `_fetch_document_names` round-trip with JOIN in chunk fetch
+### DEBT-021: Replace `_fetch_document_names` round-trip with provider-aligned filename resolution
 
 **Status**: planned | **Priority**: low | **Created**: 2026-04-26
-**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-core/src/vektra_core/pipeline.py:100-126`
+**Origin**: Gemini review on PR #71 (v0.5.0 release), `vektra-core/src/vektra_core/pipeline.py:100-126`. Multi-provider scoping refined per Gemini review on PR #72.
 
 **Context**: WI-2 (FEAT-012) added `document_name` to source citations by introducing `_fetch_document_names()`, which performs a separate `SELECT id, filename, deleted_at FROM source_documents WHERE id = ANY(...)` after the main chunk fetch. This adds one Postgres round-trip per query.
 
-The original WI-2 plan suggested integrating the filename lookup into the existing `document_chunks` SELECT via a JOIN to `source_documents`. The v0.5.0 implementation chose the separate-call shape for clarity during milestone scope; consolidation is tracked here for future hardening.
+The original WI-2 plan suggested integrating filename lookup into the existing chunk-fetch query. The v0.5.0 implementation chose the separate-call shape for clarity during milestone scope; consolidation is tracked here for future hardening.
 
 In current low-QPS deployments the extra round-trip is not measurable in user-facing latency. It becomes relevant under high concurrency.
 
+**Multi-provider note**: chunk metadata (text, source_document_id) lives in the Postgres `document_chunks` table regardless of vector store provider. Two valid optimization paths exist:
+
+- **Path A — Postgres-side JOIN** (works for both `pgvector` and `qdrant`): collapse the chunk fetch and filename lookup into a single SQL with `LEFT JOIN source_documents ON document_chunks.source_document_id = source_documents.id`. Saves one round-trip; provider-agnostic because chunks always live in Postgres.
+- **Path B — Provider-native payload denormalization** (Qdrant only): store `document_name` and a soft-delete flag inside the vector payload at ingest time. Search results carry the filename directly, eliminating the need for any Postgres lookup beyond the chunk text. Lower latency than path A for Qdrant deployments, at the cost of payload migration on document rename or soft-delete.
+
+Path A is sufficient for the immediate goal (one fewer round-trip). Path B is a follow-up for Qdrant-heavy deployments and requires a separate ingest-side change.
+
 **Proposed approach**:
-1. Move filename resolution into the chunk-fetch query (in `vektra-index` or wherever `document_chunks` is queried) via `LEFT JOIN source_documents ON document_chunks.source_document_id = source_documents.id`.
-2. Return chunks already enriched with `document_name` and a `deleted_at` (or `archived` boolean) flag, so `_fetch_document_names` can be deleted.
-3. Preserve the `(archived)` suffix at the citation-rendering layer (API serialization).
+1. Implement path A first: move filename + `deleted_at` resolution into the chunk-fetch query (in `vektra-index` or wherever `document_chunks` is queried). Delete `_fetch_document_names`.
+2. Preserve the `(archived)` suffix at the citation-rendering layer (API serialization), independent of the storage path.
+3. Optionally: under the Qdrant provider, follow up with path B (denormalize at ingest, sync on document rename / soft-delete) as a separate item.
 
 **Acceptance criteria**:
-- [ ] Single Postgres query covers chunk text + filename + archived flag
+- [ ] Chunk-fetch query covers text + filename + soft-delete flag in a single round-trip (path A)
 - [ ] `_fetch_document_names()` removed from `vektra-core/pipeline.py`
 - [ ] Streaming and non-streaming pipelines both reflect the new shape
 - [ ] No regression in `(archived)` rendering for soft-deleted documents
-- [ ] Latency benchmark confirms one fewer round-trip per `/query`
+- [ ] Latency benchmark confirms one fewer Postgres round-trip per `/query` regardless of vector store provider
+- [ ] Qdrant payload denormalization (path B) tracked as a follow-up if needed
 
 ---