From 4b47d89d71130c48c3aeba23184b1f9fb93f012f Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Sat, 18 Apr 2026 12:28:34 +0000 Subject: [PATCH 01/55] docs(backlog): reconcile v0.4.0 completions and add v0.5 entries - mark FEAT-006, FEAT-007, FEAT-009, FEAT-010 completed (PR #50, 2026-03-23) - note FEAT-005 partial overlap with FEAT-020 hybrid grounding mode - add FEAT-022 suggested questions (quick-start chips) - add FEAT-023 socratic interaction mode (per-course learning style) --- .s2s/BACKLOG.md | 58 +++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 54 insertions(+), 4 deletions(-) diff --git a/.s2s/BACKLOG.md b/.s2s/BACKLOG.md index fa32efba..278f8ebf 100644 --- a/.s2s/BACKLOG.md +++ b/.s2s/BACKLOG.md @@ -1147,6 +1147,8 @@ Plan generation follows a three-phase approach (lesson learned from Phase 1): **Status**: draft | **Priority**: medium | **Created**: 2026-03-17 **Origin**: Moodle integration testing — greetings like "ciao", "buongiorno" trigger `no_relevant_context` and produce empty/unhelpful responses +**Note (2026-04-18)**: FEAT-020 (`grounding_mode=hybrid`) partially overlaps: in hybrid mode the pipeline no longer short-circuits on `no_relevant_context` and calls the LLM. However, the system prompt is generic (training-data fallback), not tailored to greetings/meta-questions/off-topic with an explicit "redirect to course" behavior. FEAT-005 remains open for the dedicated no-context system prompt variant described in the acceptance criteria. + **Context**: Both `SimpleQueryPipeline` and `AdvancedQueryPipeline` short-circuit when no chunks pass the relevance threshold (`min_relevance_score`): they return `answer: null` + `no_relevant_context: true` without ever calling the LLM. This is correct for retrieval quality (ARCH-056, REQ-066) — the system should not hallucinate answers from non-relevant chunks. 
However, for the learn chatbot widget (and any conversational interface), this creates a poor UX for: @@ -1179,7 +1181,7 @@ The existing **SafeguardHook** (`pre_query`, `post_retrieval`, `pre_response`) c ### FEAT-007: Markdown rendering in widget chat messages -**Status**: in_progress | **Priority**: medium | **Created**: 2026-03-20 +**Status**: completed | **Priority**: medium | **Created**: 2026-03-20 | **Completed**: 2026-03-23 | **PR**: #50 **Origin**: Moodle integration testing (2026-03-20) **Context**: The learn chatbot widget (`vektra-chat.js`) renders all messages as plain text via `textContent`. LLM responses typically contain Markdown formatting (bold, italic, lists, code blocks, headings) which is displayed as raw syntax. This makes responses harder to read, especially for structured answers with bullet points or code examples. @@ -1280,7 +1282,7 @@ If no custom template exists, the global system.j2 still has access to the same ### FEAT-006: Widget error feedback when Vektra API is unreachable -**Status**: draft | **Priority**: medium | **Created**: 2026-03-20 +**Status**: completed | **Priority**: medium | **Created**: 2026-03-20 | **Completed**: 2026-03-23 | **PR**: #50 **Origin**: Moodle integration testing on remote machine (2026-03-20) **Context**: When the chatbot widget JS (`vektra-chat.js`) cannot reach the Vektra API (missing SSH tunnel, CORS misconfiguration, Vektra container down), the floating chat button silently fails to appear. No error is shown to the user or the admin. The Moodle block still displays "AI Assistant is active" because the server-side token generation succeeded (PHP runs inside Docker, reaches Vektra on the internal network), but the browser-side widget cannot load or connect. 
@@ -1469,7 +1471,7 @@ Phase 2 (skip_retrieval flag): ### FEAT-009: Widget token auto-refresh on expiry -**Status**: draft | **Priority**: high | **Created**: 2026-03-20 +**Status**: completed | **Priority**: high | **Created**: 2026-03-20 | **Completed**: 2026-03-23 | **PR**: #50 **Origin**: Moodle integration testing - "invalid or expired dashboard token" after ~1h session **Context**: The JWT dashboard token has a 1h TTL (default). The token is generated server-side by the Moodle plugin (or any LMS) at page load and embedded in the widget via `data-token` attribute. Once expired, all subsequent queries fail with "signature has expired". The user must manually reload the page to get a fresh token. @@ -1498,7 +1500,7 @@ For Moodle specifically, the plugin would expose a lightweight AJAX endpoint (`/ ### FEAT-010: Enable SSE streaming in widget -**Status**: draft | **Priority**: medium | **Created**: 2026-03-20 +**Status**: completed | **Priority**: medium | **Created**: 2026-03-20 | **Completed**: 2026-03-23 | **PR**: #50 **Origin**: Moodle integration testing - responses arrive as a single block, no progressive rendering **Context**: The widget's api-client.js already has a complete SSE streaming parser (lines 65-107) with `onToken`, `onSources`, `onDone` callbacks. The chat-ui.js has `createStreamMessage()` and `appendToken()` methods that progressively append text to the DOM. However, the query is sent with `stream: false` (hardcoded, line 33), so all responses arrive as a single JSON blob. @@ -1548,6 +1550,54 @@ This does not violate REQ-051 if data is aggregated (no individual conversations --- +### FEAT-022: Suggested questions as quick-start chips in widget + +**Status**: draft | **Priority**: low | **Created**: 2026-04-18 +**Origin**: v0.5.0 scoping discussion - instructor wants to guide students toward typical questions without crafting a new conversation each time + +**Context**: students opening the chatbot often do not know how to start. 
A short list of instructor-curated prompts, rendered as clickable chips above the input box on first open, lowers the barrier and steers usage toward pedagogically useful questions (e.g., "Riassumi la lezione 3", "Quali sono i punti chiave del capitolo?", "Fammi un quiz su questo argomento"). + +**Proposed approach**: purely client-side, category (A) config (visual, no backend involvement). Passed via `data-suggested-questions` attribute on the script tag as a JSON array. The Moodle block config form exposes a textarea (one question per line) that the plugin serializes into the attribute. The widget renders chips on first open; clicking one fills the input and submits as if typed. + +**Traceability**: ADR-0025, FEAT-016 + +**Acceptance Criteria** (tentative): +- [ ] `data-suggested-questions` attribute parsed as JSON array of strings +- [ ] Chips rendered above the input on first open only (hidden after first message) +- [ ] Click fills input and submits +- [ ] Chips respect theme (light/dark) and primary color from FEAT-016 +- [ ] Missing/malformed attribute falls back to no chips (no error) +- [ ] Moodle block config exposes a textarea for instructor to edit the list + +--- + +### FEAT-023: Socratic interaction mode for guided learning + +**Status**: draft | **Priority**: medium | **Created**: 2026-04-18 +**Origin**: v0.5.0 scoping discussion - pedagogical need to differentiate "answer giver" from "learning guide" per course + +**Context**: in some courses the instructor wants the chatbot to act as a Socratic tutor — asking the student guiding questions instead of delivering the answer directly. This supports active learning and prevents the chatbot from becoming a shortcut that bypasses the learning process. Other courses (reference-style, FAQ-style) want the chatbot to give direct answers. The choice is per-course and must be configurable by the instructor. + +Implementation approach is deliberately deferred. 
Questions to resolve when designing this feature: +- Is Socratic mode a third `grounding_mode` value (alongside `strict` and `hybrid`), or an orthogonal `interaction_mode` dimension? +- How do the two interact when combined (strict + socratic, hybrid + socratic)? +- Does the Socratic prompt need worked examples or is a system-prompt instruction enough? +- How does it behave across multi-turn conversations (when does it "reveal" the answer)? +- Should the instructor configure the depth of Socratic questioning (light nudging vs full inquiry)? + +Storage aligns with the same model as grounding mode: persisted in `namespaces.config` JSONB (category B config, backend-enforced), configurable via the Moodle block config form → `PATCH /api/v1/namespaces/{id}/config`. The widget does not need to know — the difference is entirely in the system prompt selected server-side. + +**Traceability**: FEAT-020 (grounding mode), ADR-0020 (prompt template architecture) + +**Acceptance Criteria** (tentative, pending design): +- [ ] Per-namespace Socratic mode flag in `namespaces.config` +- [ ] System prompt variant that implements Socratic dialogue +- [ ] Works alongside strict/hybrid grounding modes +- [ ] Validated with representative course scenarios +- [ ] Instructor can enable/disable from Moodle block config + +--- + ## In Progress ### BUG-011: ~~Ingest pipeline does not generate sparse embeddings for hybrid search~~ From c6db57fbf87a916bab18589799a600dd117094e5 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Sat, 18 Apr 2026 14:23:33 +0000 Subject: [PATCH 02/55] docs(plans): add v0.5.0 implementation plan Covers widget production-ready items (FEAT-016, FEAT-012, FEAT-004) plus the instructor configuration backend endpoint. Splits course customization into two categories: visual (Moodle block config, data-attrs) and behavioral (namespaces.config JSONB, admin API via Moodle server-side). 
Implementation order: WI-3 (PATCH /namespaces/config) -> WI-2 (document name in citations) -> WI-1 (GET /conversations/{id}/turns JWT-scoped) -> WI-4 (widget data-attrs) -> WI-5 (widget conversation persistence). --- .../20260418-v050-widget-and-prof-config.md | 306 ++++++++++++++++++ 1 file changed, 306 insertions(+) create mode 100644 .s2s/plans/20260418-v050-widget-and-prof-config.md diff --git a/.s2s/plans/20260418-v050-widget-and-prof-config.md b/.s2s/plans/20260418-v050-widget-and-prof-config.md new file mode 100644 index 00000000..78ed8bdd --- /dev/null +++ b/.s2s/plans/20260418-v050-widget-and-prof-config.md @@ -0,0 +1,306 @@ +# Plan: v0.5.0 - Widget production-ready + instructor configuration + +**Status**: active +**Plan ID**: 20260418-v050-widget-and-prof-config +**Branch**: feat/v050-widget-and-prof-config +**Created**: 2026-04-18T12:40:00Z +**Updated**: 2026-04-18T12:40:00Z +**Source**: v0.5.0 milestone scoping discussion (2026-04-18) +**Source type**: milestone + +## Traceability + +- **Requirements**: REQ-049 (multi-turn conversations), REQ-053 (QueryPipeline Protocol), REQ-066 (no relevant context) +- **Architecture**: ARCH-047 (namespace metadata as JSONB), ARCH-063 (widget architecture), ARCH-054 (composable prompt templates) +- **Decisions**: ADR-0025 (learn chatbot widget as backend-served JS bundle), ADR-0020 (prompt template architecture), ADR-0010 (authentication gateway) +- **Backlog items**: FEAT-016, FEAT-012, FEAT-004 +- **Related (deferred)**: FEAT-008 (per-namespace prompt templates), FEAT-014 (source visibility), FEAT-022 (suggested questions), FEAT-023 (socratic mode) +- **Builds on**: FEAT-020 (grounding mode already reads from `namespaces.config`) +- **Dependencies**: none (all infrastructure in place) + +## Overview + +v0.5.0 closes the gap between "widget works" (v0.4.0) and "widget is production-ready for real courses". Three user-visible gaps remain: + +1. **White-label** (FEAT-016): every course looks identical. 
Universities need their own title, color, welcome message. +2. **Citation clarity** (FEAT-012): sources show `chunk_id` UUIDs — meaningless to students. Document filename must be shown. +3. **Conversation persistence** (FEAT-004): refresh or accidental tab close loses the conversation. Unacceptable for courses where students ask follow-up questions. + +Plus one foundational backend addition: **instructor configuration API** — professors need to toggle grounding mode (strict/hybrid) per course from inside Moodle without admin intervention. The `namespaces.config` JSONB column and pipeline resolution already exist (FEAT-020); only the write API and the Moodle integration are missing. + +**Repositories touched**: vektra-stack (this plan), vektra-moodle (separate plan in the plugin repo). + +**Not in scope**: FEAT-022 (suggested questions), FEAT-023 (socratic mode), FEAT-014 (source visibility toggle), FEAT-008 (per-namespace prompt templates), full admin UI for namespace config. These are deliberately deferred. + +## Architecture decisions for this plan + +### Two-category configuration model + +All course customization splits cleanly into two categories, persisted in different places: + +| Category | Examples | Storage | Read by | Write path | +|---|---|---|---|---| +| **(A) Visual** | title, color, welcome message, icon | Moodle block config | Widget via `data-*` attrs | Moodle block edit form → stays in Moodle | +| **(B) Behavioral** | `grounding_mode`, future `show_sources`, `top_k` override | `namespaces.config` JSONB | Vektra pipeline server-side | Moodle block edit form → AJAX → `PATCH /api/v1/namespaces/{id}/config` | + +Rationale: behavioral config must be server-enforced to prevent clients from bypassing (e.g., a browser can't be trusted to enforce strict mode). Visual config has no security implications and can stay in the host plugin. + +### Authentication model for namespace PATCH + +Same trust model already used for token generation (FEAT-003): + +1. 
Moodle plugin holds `VEKTRA_API_KEY` (admin scope) in its server-side PHP config. +2. Moodle's native auth verifies "user X is a teacher of course Y" before showing/accepting the block edit form. +3. When a teacher saves the form, Moodle PHP calls `PATCH /api/v1/namespaces/{course_id}/config` authenticated with the admin API key. +4. Vektra does not know individual professors. It trusts the upstream auth (same philosophy as `VEKTRA_LEARN_REQUIRE_ENROLLMENT=false`). + +No new Vektra-side auth primitive needed. No "teacher scope" API key type to invent. + +### Conversation persistence: tab-scoped, no cross-device + +- Storage: `sessionStorage` keyed by `course_id` (tab-scoped per browser tab, wiped on tab close). +- Not `localStorage`: shared computers in university labs must not leak conversations across students. +- Not server-side "last conversation for this student": avoids privacy debate and keeps the widget stateless from Vektra's perspective (conversation turns are already persisted encrypted server-side per ADR-0011, this is just a UI convenience). +- On load: if `sessionStorage[course_id]` has a `conversation_id`, widget fetches turns from `GET /api/v1/conversations/{id}/turns` and renders them before the user types. +- "New chat" button: explicit reset (clears sessionStorage + DOM). + +## Work items + +### WI-1 - Backend: GET /api/v1/conversations/{id}/turns (JWT-scoped) + +FEAT-004 prerequisite. Admin equivalent already exists at `vektra-admin/src/vektra_admin/api.py:463`; this is the student-scoped mirror in `vektra-learn`. + +**Endpoint**: +- Method: `GET` +- Path: `/api/v1/conversations/{conversation_id}/turns` +- Auth: JWT (learn dashboard token) +- Path param: `conversation_id: UUID` +- Response: `200 OK` with `{"turns": [{"turn_id": UUID, "question": str, "answer": str, "created_at": ISO8601, "sources": [...] 
}], "conversation_id": UUID, "namespace": str}` +- Errors: `404` if conversation does not exist, `403` if JWT namespace does not match conversation namespace, `401` if JWT invalid/expired + +**Authorization rule**: the JWT contains `namespace` (or `course_id` fallback per FEAT-003). The endpoint MUST verify `conversation.namespace_id == jwt.namespace` before returning turns. This is the only authorization gate — we do not also check `student_id` because a student may legitimately reload a conversation started in a previous session with the same course. + +**Files**: +- `vektra-learn/src/vektra_learn/api.py` - add endpoint +- `vektra-learn/src/vektra_learn/models.py` - add `ConversationTurnsResponse` Pydantic model if not present +- `vektra-core/src/vektra_core/conversations.py` (or wherever `conversation_turns` table is queried) - add `get_turns_by_conversation_id(conversation_id, namespace) -> list[Turn]` helper that decrypts `question` and `answer` via `pgp_sym_decrypt` (see CLAUDE.md DB investigation notes) +- Tests: `vektra-learn/tests/test_conversations.py` + +**Decryption note**: `conversation_turns.question` and `conversation_turns.answer` are encrypted with `pgp_sym_encrypt()` using `VEKTRA_CONVERSATION_KEY`. The helper must decrypt before serializing. See `.claude/CLAUDE.md` → "Conversation turns are encrypted". + +**Test gates** (acceptance): +- [ ] Valid JWT + matching namespace returns decrypted turns ordered by `created_at` ASC +- [ ] Valid JWT + different namespace returns `403` (no data leak across courses) +- [ ] Expired JWT returns `401` +- [ ] Non-existent `conversation_id` returns `404` +- [ ] Response includes `sources` field (even if empty array) so widget does not need a conditional render +- [ ] Integration test uses a real Postgres (per `feedback_no_mock_database` memory) + +### WI-2 - Backend: document name in source citations (FEAT-012) + +Currently `SourceRef` returns `chunk_id` (UUID). Students see `[1] 4a2f3b... (0.82)`. 
They need to see the document name. + +**Data source**: `document_chunks.document_id` → `documents.source_file` (or `documents.title` if we want a human-friendlier field). The Qdrant payload also carries `source_file` per audit earlier. + +**Changes**: +- Extend `SourceRef` schema to include `document_name: str` (derived from `documents.source_file`, fallback to `documents.id[:8]` if null) +- Update the pipeline source assembly to fetch `source_file` from the document referenced by each chunk. Prefer a JOIN in the existing query over a second round-trip per chunk. +- Widget change: render `[1] document_name (score)` instead of `[1] chunk_id (score)`. Tooltip on hover can keep the chunk UUID for debugging. + +**Files**: +- `vektra-core/src/vektra_core/api.py` or `pipeline.py` - `SourceRef` schema + assembly +- `vektra-learn/src/vektra_learn/query.py` - ensure the field passes through to the learn response +- `vektra-learn/widget/src/chat-ui.js` (or wherever sources render) - use `document_name` +- Tests: both backend (pipeline returns document_name) and widget (renders correctly) + +**Edge cases**: +- Document soft-deleted (REQ-057): still render the name, append " (archived)" marker. Do NOT hide the citation; it must match what was retrieved. +- Document name with special chars (e.g., emoji, Italian accents): must survive JSON round-trip. Widget uses `textContent`, not `innerHTML`, for the citation label. + +**Test gates**: +- [ ] `/api/v1/query` response includes `document_name` for each source +- [ ] `/api/v1/learn/query` response propagates the field +- [ ] Widget displays `[1] document_name (score)`, with no UUID visible in normal UI +- [ ] Falls back gracefully if `source_file` is null + +### WI-3 - Backend: PATCH /api/v1/namespaces/{id}/config (instructor config) + +Foundation for category (B) config. Writes into the existing `namespaces.config` JSONB column. Read path already exists (`resolve_grounding_mode` at `vektra-shared/src/vektra_shared/namespace.py:19`). 
+ +**Endpoint**: +- Method: `PATCH` +- Path: `/api/v1/namespaces/{namespace_id}/config` +- Auth: admin API key (via `require_admin` dependency pattern already used elsewhere) +- Body: `{"grounding_mode": "strict" | "hybrid" | null}` (null = unset, falls back to env var). Explicitly allow partial updates — only keys present in the body are updated, others are preserved. Unknown keys rejected with `400`. +- Response: `200 OK` with the current full `config` object after merge +- Errors: `404` if namespace does not exist, `400` if value invalid (not in enum), `401`/`403` for auth + +**Design choice — allowed keys**: +Keep the whitelist tight. For v0.5.0, only `grounding_mode` is a valid key. Adding more keys is a later patch-release change. The endpoint must reject unknown keys (not silently ignore) so that typos from the Moodle plugin surface immediately during development. + +```python +ALLOWED_CONFIG_KEYS = {"grounding_mode"} # extend cautiously; each key is a public contract +ALLOWED_GROUNDING_MODES = {"strict", "hybrid"} +``` + +**Files**: +- `vektra-admin/src/vektra_admin/api.py` (namespace routes already live here) - add PATCH +- `vektra-admin/src/vektra_admin/service.py` (or similar) - `update_namespace_config(namespace_id, partial_config) -> dict` +- Tests: `vektra-admin/tests/test_namespaces.py` + +**Why admin API (not a new "teacher" scope)**: Moodle acts as the trusted upstream (see "Authentication model" above). Inventing a new scope here adds complexity without benefit — Moodle's admin key is already stored server-side and never reaches the browser. 
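
The partial-update rules described for this endpoint (whitelisted keys only, enum check on `grounding_mode`, `null` removes a key, absent keys are preserved) could be sketched roughly as below. This is an illustration under stated assumptions, not the actual service code: `merge_namespace_config` and `ConfigValidationError` are hypothetical names, and the sketch works on plain dicts with no database or HTTP layer.

```python
ALLOWED_CONFIG_KEYS = {"grounding_mode"}  # same whitelist as in the plan
ALLOWED_GROUNDING_MODES = {"strict", "hybrid"}


class ConfigValidationError(ValueError):
    """Hypothetical error type; an API layer would map it to HTTP 400."""


def merge_namespace_config(current: dict, patch: dict) -> dict:
    """Return a new config dict with `patch` applied on top of `current`.

    Hypothetical helper illustrating the merge semantics only.
    """
    # Reject unknown keys loudly so plugin typos surface immediately.
    unknown = sorted(set(patch) - ALLOWED_CONFIG_KEYS)
    if unknown:
        raise ConfigValidationError(f"unknown config keys: {', '.join(unknown)}")

    # A non-null grounding_mode must be one of the allowed enum values.
    mode = patch.get("grounding_mode")
    if mode is not None and (
        not isinstance(mode, str) or mode not in ALLOWED_GROUNDING_MODES
    ):
        raise ConfigValidationError(f"invalid grounding_mode: {mode!r}")

    # Copy first so the caller's dict is never mutated.
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # null unsets the key (env var fallback)
        else:
            merged[key] = value
    return merged
```

Returning a new dict instead of mutating `current` keeps validation and persistence separate: a caller can validate and merge first, then write the result to the `namespaces.config` column in a single update.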
+ +**Test gates**: +- [ ] PATCH with `{"grounding_mode": "hybrid"}` updates the value in `namespaces.config` +- [ ] Subsequent query to that namespace uses hybrid mode (end-to-end: write → resolve → pipeline) +- [ ] PATCH with unknown key returns `400` with the rejected key name +- [ ] PATCH with invalid value (e.g., `"grounding_mode": "banana"`) returns `400` +- [ ] PATCH with `null` value for a key removes it from config (fallback to env var) +- [ ] PATCH is partial: existing keys not in the body are preserved +- [ ] Non-admin API key returns `403` + +### WI-4 - Widget: white-label data-attrs (FEAT-016) + +Client-side only. Read additional `data-*` attributes from the script tag, expose them as CSS custom properties + template values. + +**New data-* attributes**: + +| Attribute | Default | Applied to | +|---|---|---| +| `data-title` | "Course Assistant" / i18n | Chat panel header | +| `data-primary-color` | `#2563eb` | Button, links, accents (as `--vektra-primary` CSS var) | +| `data-icon` | speech bubble emoji | Floating button icon (URL or emoji) | +| `data-welcome-message` | (none) | First assistant message on open | +| `data-powered-by` | "true" | If "false", hide "Powered by Vektra" footer | + +**Files**: +- `vektra-learn/widget/src/index.js` - parse new attributes in the constructor +- `vektra-learn/widget/src/styles.js` - replace hardcoded `#2563eb` with `var(--vektra-primary, #2563eb)` +- `vektra-learn/widget/src/chat-ui.js` - render title from config, inject welcome message on first open +- Tests: `vektra-learn/widget/tests/` - verify each attribute changes the rendered output, missing attributes fall back to defaults + +**Fallback chain** (consistent with FEAT-016 backlog spec): `data-* attribute > hardcoded default`. Namespace metadata override is deferred to a later release (would require category-B endpoint extension, not worth it for visual fields). 
+ +**Test gates**: +- [ ] Each `data-*` attribute individually changes the widget output +- [ ] Missing attributes fall back to current v0.4.0 defaults (no breaking change for existing Moodle installs) +- [ ] Light/dark theme still works when `data-primary-color` is set +- [ ] `data-icon` accepts both emoji and URL; URL shows image, emoji shows text +- [ ] XSS: `data-title` with ` + * + * White-label attributes (all optional): + * data-title - header title and button aria-label + * data-primary-color - hex/rgb/named color used for buttons and accents + * data-icon - emoji or image URL for the floating button + * data-welcome-message - assistant message shown on first open + * data-powered-by - "true" (default) shows Vektra attribution, + * "false" hides it */ import { ApiClient } from "./api-client.js"; @@ -35,6 +48,18 @@ import { ChatUI } from "./chat-ui.js"; const language = scriptTag.getAttribute("data-language") || "en"; const tokenRefreshUrl = scriptTag.getAttribute("data-token-refresh-url") || null; + // White-label attributes (all optional). See file header for semantics. + const customTitle = scriptTag.getAttribute("data-title") || null; + const customPrimaryColor = + scriptTag.getAttribute("data-primary-color") || null; + const customIcon = scriptTag.getAttribute("data-icon") || null; + const welcomeMessage = scriptTag.getAttribute("data-welcome-message") || null; + const poweredByAttr = scriptTag.getAttribute("data-powered-by"); + // Default ON; only "false" (case-insensitive) hides attribution. + const showPoweredBy = poweredByAttr === null + ? 
true + : poweredByAttr.toLowerCase() !== "false"; + if (!apiUrl || !courseId || !token) { console.error( "[vektra-chat] Missing required attributes: data-api-url, data-course-id, data-token" @@ -48,6 +73,11 @@ import { ChatUI } from "./chat-ui.js"; const ui = new ChatUI({ theme, language, + customTitle, + customPrimaryColor, + customIcon, + welcomeMessage, + showPoweredBy, onSend(question) { const stream = ui.createStreamMessage(); diff --git a/vektra-learn/widget/src/styles.js b/vektra-learn/widget/src/styles.js index 22296cb0..93afaeac 100644 --- a/vektra-learn/widget/src/styles.js +++ b/vektra-learn/widget/src/styles.js @@ -52,7 +52,7 @@ export function buildStyles(theme) { width: 56px; height: 56px; border-radius: 50%; - background: ${t.primary}; + background: var(--vektra-primary, ${t.primary}); color: #fff; border: none; cursor: pointer; @@ -67,6 +67,7 @@ export function buildStyles(theme) { } .vektra-chat-btn:hover { background: ${t.primaryHover}; + filter: brightness(0.92); transform: scale(1.05); } @@ -135,7 +136,7 @@ export function buildStyles(theme) { } .vektra-chat-msg.user { align-self: flex-end; - background: ${t.userBubble}; + background: var(--vektra-primary, ${t.userBubble}); color: ${t.userText}; border-bottom-right-radius: 4px; } @@ -188,7 +189,7 @@ export function buildStyles(theme) { font-size: 12px; } .vektra-chat-msg.assistant a { - color: ${t.primary}; + color: var(--vektra-primary, ${t.primary}); text-decoration: underline; } .vektra-chat-msg.assistant strong { @@ -206,7 +207,7 @@ export function buildStyles(theme) { background: none; border: none; cursor: pointer; - color: ${t.primary}; + color: var(--vektra-primary, ${t.primary}); font-size: 12px; padding: 0; font-family: inherit; @@ -215,7 +216,7 @@ export function buildStyles(theme) { text-decoration: underline; } .vektra-chat-sources-toggle:focus-visible { - outline: 2px solid ${t.primary}; + outline: 2px solid var(--vektra-primary, ${t.primary}); outline-offset: 2px; border-radius: 
4px; } @@ -235,7 +236,7 @@ export function buildStyles(theme) { } .vektra-chat-source-num { font-weight: 600; - color: ${t.primary}; + color: var(--vektra-primary, ${t.primary}); margin-right: 4px; } .vektra-chat-source-text { @@ -280,14 +281,14 @@ export function buildStyles(theme) { font-family: inherit; } .vektra-chat-input:focus-visible { - outline: 2px solid ${t.primary}; + outline: 2px solid var(--vektra-primary, ${t.primary}); outline-offset: -1px; } .vektra-chat-input::placeholder { color: ${t.textSecondary}; } .vektra-chat-send { - background: ${t.primary}; + background: var(--vektra-primary, ${t.primary}); color: #fff; border: none; border-radius: 8px; @@ -299,12 +300,33 @@ export function buildStyles(theme) { } .vektra-chat-send:hover { background: ${t.primaryHover}; + filter: brightness(0.92); } .vektra-chat-send:disabled { opacity: 0.5; cursor: not-allowed; } +.vektra-chat-btn-icon { + width: 28px; + height: 28px; + display: block; +} + +.vektra-chat-powered-by { + padding: 6px 12px; + text-align: center; + font-size: 11px; + color: ${t.textSecondary}; + background: ${t.bgSecondary}; + border-top: 1px solid ${t.border}; + flex-shrink: 0; +} +.vektra-chat-powered-by a { + color: ${t.textSecondary}; + text-decoration: underline; +} + @media (max-width: 480px) { .vektra-chat-panel { width: calc(100vw - 16px); From baf77b19a9ee84d5fc71822b9e868686e0b9803e Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 15:22:35 +0000 Subject: [PATCH 07/55] feat(widget): conversation persistence + New chat button (WI-5, FEAT-004) A page refresh currently loses the whole conversation. Students asking follow-up questions about course materials can't afford that. This adds tab-scoped persistence and a "New chat" button. - sessionStorage key "vektra-conv:" stores { conversation_id, stored_at }. Tab-scoped avoids leaking conversations across students on shared university lab computers; localStorage would have been wrong. 
- Stale cutoff at 24h so tabs left open overnight don't silently resurface yesterday's chat at the top of the screen. - On widget init: read storage, call the new WI-1 endpoint, replay turns into the chat panel before enabling input. 404/403/empty turns silently abandon the stored id (widget recovers; no error shown to the user). - Each successful query persists the server-assigned conversation_id so that the next reload picks up where we left off. - "New chat" button in the panel header clears storage and resets the client conversation id so the next question opens a fresh thread. Works as a recovery path if the server reassigns the id. - ApiClient gains setConversationId(id) and getConversationTurns(id). Part of v0.5.0 widget-and-prof-config plan. All five work items done; widget bundle rebuilt and lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-learn/widget/src/api-client.js | 41 +++++++++++++++ vektra-learn/widget/src/chat-ui.js | 60 ++++++++++++++++++++- vektra-learn/widget/src/index.js | 75 +++++++++++++++++++++++++++ vektra-learn/widget/src/styles.js | 13 +++++ 4 files changed, 188 insertions(+), 1 deletion(-) diff --git a/vektra-learn/widget/src/api-client.js b/vektra-learn/widget/src/api-client.js index b24ecdf9..2d0b601c 100644 --- a/vektra-learn/widget/src/api-client.js +++ b/vektra-learn/widget/src/api-client.js @@ -32,6 +32,47 @@ export class ApiClient { return this._conversationId; } + /** + * Manually set the current conversation ID (e.g. restored from sessionStorage). + * Use null to clear it. + */ + setConversationId(id) { + this._conversationId = id || null; + } + + /** + * Fetch decrypted turns for a stored conversation (WI-1 / FEAT-004). + * Returns null on 404/403 (so the caller can reset local state silently), + * throws on other errors so that a transient network failure does not + * wipe the sessionStorage entry. 
+ * + * @param {string} conversationId + * @returns {Promise<{conversation_id: string, namespace: string, turns: Array}|null>} + */ + async getConversationTurns(conversationId, _retried = false) { + const resp = await fetch( + `${this._apiUrl}/api/v1/learn/conversations/${encodeURIComponent(conversationId)}/turns`, + { + method: "GET", + headers: { Authorization: `Bearer ${this._token}` }, + } + ); + if (resp.status === 401 && !_retried) { + const newToken = await this._refreshToken(); + if (newToken) { + this._token = newToken; + return this.getConversationTurns(conversationId, true); + } + } + if (resp.status === 404 || resp.status === 403) { + return null; // caller clears local state, no error surfaced to the user + } + if (!resp.ok) { + throw new Error(`HTTP ${resp.status}`); + } + return resp.json(); + } + /** * Check if the Vektra API is reachable. * @returns {Promise} diff --git a/vektra-learn/widget/src/chat-ui.js b/vektra-learn/widget/src/chat-ui.js index 6133c198..96121ad7 100644 --- a/vektra-learn/widget/src/chat-ui.js +++ b/vektra-learn/widget/src/chat-ui.js @@ -20,6 +20,7 @@ const I18N = { reconnecting: "Reconnecting...", sessionExpired: "Your session has expired. Please reload the page.", close: "Close", + newChat: "New chat", }, it: { title: "Assistente del corso", @@ -34,6 +35,7 @@ const I18N = { reconnecting: "Riconnessione...", sessionExpired: "La sessione è scaduta. 
Ricarica la pagina.", close: "Chiudi", + newChat: "Nuova chat", }, }; @@ -73,6 +75,7 @@ export class ChatUI { * @param {string|null} [opts.welcomeMessage] - first assistant message on open * @param {boolean} [opts.showPoweredBy=true] - show "Powered by Vektra" footer * @param {function} opts.onSend - callback(question: string) + * @param {function} [opts.onNewChat] - callback invoked when "New chat" is clicked */ constructor({ theme = "light", @@ -83,10 +86,12 @@ export class ChatUI { welcomeMessage = null, showPoweredBy = true, onSend, + onNewChat = null, }) { this._theme = theme; this._lang = I18N[language] || I18N.en; this._onSend = onSend; + this._onNewChat = onNewChat; this._isOpen = false; this._sending = false; this._status = null; // "unavailable" | "reconnecting" | "sessionExpired" | null @@ -146,7 +151,10 @@ export class ChatUI { this._panel.innerHTML = `
- +
+ + +
@@ -180,6 +188,7 @@ export class ChatUI { this._inputEl = this._panel.querySelector(".vektra-chat-input"); this._sendBtn = this._panel.querySelector(".vektra-chat-send"); this._closeBtn = this._panel.querySelector(".vektra-chat-close"); + this._newChatBtn = this._panel.querySelector(".vektra-chat-new"); } _bindEvents() { @@ -192,6 +201,55 @@ export class ChatUI { this._handleSend(); } }); + if (this._newChatBtn) { + this._newChatBtn.addEventListener("click", () => this._handleNewChat()); + } + } + + _handleNewChat() { + if (this._sending) return; + this.reset(); + if (this._onNewChat) this._onNewChat(); + } + + /** + * Reset the chat panel: clear messages, re-arm welcome message, allow input. + * Does NOT call the onNewChat callback; use _handleNewChat for that. + */ + reset() { + this._messagesEl.textContent = ""; + this._welcomeShown = false; + if (this._isOpen && this._welcomeMessage) { + // Re-inject welcome message so the cleared panel isn't blank + this._welcomeShown = true; + const msg = document.createElement("div"); + msg.className = "vektra-chat-msg assistant"; + msg.textContent = this._welcomeMessage; + this._messagesEl.appendChild(msg); + } + } + + /** + * Replay a list of stored turns as user/assistant message bubbles (WI-5). + * Call after restoring a conversation from sessionStorage but before + * enabling user input. Each turn is a plain object with + * { question, answer, sources }. Sources may be empty for now. + */ + replayTurns(turns) { + if (!Array.isArray(turns)) return; + // Replaying takes ownership of the message list — clear any welcome msg + // so the history is the first thing the student sees. 
+ this._messagesEl.textContent = ""; + this._welcomeShown = true; // don't re-emit welcome on first open + for (const turn of turns) { + if (turn.question) this.addMessage("user", turn.question); + if (turn.answer) { + const msgEl = this.addMessage("assistant", turn.answer); + if (Array.isArray(turn.sources) && turn.sources.length > 0) { + this.addSources(msgEl, turn.sources); + } + } + } } toggle() { diff --git a/vektra-learn/widget/src/index.js b/vektra-learn/widget/src/index.js index 27846735..096dabf2 100644 --- a/vektra-learn/widget/src/index.js +++ b/vektra-learn/widget/src/index.js @@ -31,6 +31,50 @@ import { ApiClient } from "./api-client.js"; import { ChatUI } from "./chat-ui.js"; +// Storage key for per-course conversation persistence (WI-5). Tab-scoped +// via sessionStorage so that shared university computers don't leak a +// student's conversation to the next user. +const STORAGE_PREFIX = "vektra-conv:"; +// Tabs kept open for days shouldn't silently resurface yesterday's chat. +const STALE_AFTER_MS = 24 * 60 * 60 * 1000; + +function _readStored(courseId) { + try { + const raw = window.sessionStorage.getItem(STORAGE_PREFIX + courseId); + if (!raw) return null; + const parsed = JSON.parse(raw); + if (!parsed || !parsed.conversation_id) return null; + const age = Date.now() - (parsed.stored_at || 0); + if (age < 0 || age > STALE_AFTER_MS) return null; + return parsed; + } catch { + return null; + } +} + +function _writeStored(courseId, conversationId) { + try { + window.sessionStorage.setItem( + STORAGE_PREFIX + courseId, + JSON.stringify({ + conversation_id: conversationId, + stored_at: Date.now(), + }) + ); + } catch { + // Storage full or unavailable — silently ignore; persistence is a + // convenience, not a correctness requirement. 
+ } +} + +function _clearStored(courseId) { + try { + window.sessionStorage.removeItem(STORAGE_PREFIX + courseId); + } catch { + // ignore + } +} + (function () { // Capture the script tag synchronously (before DOMContentLoaded) const scripts = document.querySelectorAll('script[src*="vektra-chat"]'); @@ -93,6 +137,10 @@ import { ChatUI } from "./chat-ui.js"; }, onDone() { ui.doneSending(); + // Persist conversation ID for history restore on next load (WI-5). + if (client.conversationId) { + _writeStored(courseId, client.conversationId); + } }, onError(errMsg) { // Show session expired message for auth failures @@ -105,8 +153,35 @@ import { ChatUI } from "./chat-ui.js"; }, }); }, + onNewChat() { + // Reset both storage and client-side state so the next query + // creates a fresh conversation. + _clearStored(courseId); + client.setConversationId(null); + }, }); + // Restore a conversation from a prior load (WI-5 / FEAT-004). Silent: + // any failure (404, 403, network) keeps the widget usable with a clean + // slate rather than surfacing an error. 
+ async function restoreConversation() { + const stored = _readStored(courseId); + if (!stored) return; + try { + const payload = await client.getConversationTurns(stored.conversation_id); + if (payload && Array.isArray(payload.turns) && payload.turns.length > 0) { + client.setConversationId(stored.conversation_id); + ui.replayTurns(payload.turns); + } else { + // Returned null (404/403) or empty turns — abandon stored id + _clearStored(courseId); + } + } catch { + // Network/transient error: keep stored id, try again next load + } + } + restoreConversation(); + // Check API connectivity on startup and show status if unreachable let retryTimer = null; diff --git a/vektra-learn/widget/src/styles.js b/vektra-learn/widget/src/styles.js index 93afaeac..941f45f4 100644 --- a/vektra-learn/widget/src/styles.js +++ b/vektra-learn/widget/src/styles.js @@ -108,6 +108,12 @@ export function buildStyles(theme) { font-weight: 600; font-size: 15px; } +.vektra-chat-header-actions { + display: flex; + align-items: center; + gap: 4px; +} +.vektra-chat-new, .vektra-chat-close { background: none; border: none; @@ -117,6 +123,13 @@ export function buildStyles(theme) { padding: 4px; line-height: 1; } +.vektra-chat-new { + font-size: 16px; +} +.vektra-chat-new:hover, +.vektra-chat-close:hover { + color: ${t.text}; +} .vektra-chat-messages { flex: 1; From bcbde1a86366da0e69cb5592e21ac74dbf3f9648 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 15:25:01 +0000 Subject: [PATCH 08/55] docs(v0.5.0): update BACKLOG and CHANGELOG for widget + prof config - Mark FEAT-012 completed, FEAT-016 partial (data-attrs only), FEAT-004 partial (persistence + new chat, sources still empty). - Record all 5 WIs under an Unreleased v0.5.0 section so the release notes are ready when this branch merges. 
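Note on the FEAT-004 persistence recorded above: the tab-scoped storage with its 24h stale cutoff can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the widget's actual module. The names `readStored`, `writeStored`, `makeMemoryStorage`, and the injected `storage` and `now` parameters are hypothetical, introduced only so the logic can run outside a browser. In the widget, `storage` would be `window.sessionStorage`.

```javascript
// Illustrative sketch only. readStored, writeStored, makeMemoryStorage and
// the injected `storage` / `now` parameters are hypothetical names.
const STORAGE_PREFIX = "vektra-conv:";
const STALE_AFTER_MS = 24 * 60 * 60 * 1000; // 24h cutoff

// Persist the conversation id for a course together with a timestamp.
function writeStored(storage, courseId, conversationId, now = Date.now()) {
  storage.setItem(
    STORAGE_PREFIX + courseId,
    JSON.stringify({ conversation_id: conversationId, stored_at: now })
  );
}

// Return the stored record, or null if it is missing, corrupt, or stale.
function readStored(storage, courseId, now = Date.now()) {
  try {
    const raw = storage.getItem(STORAGE_PREFIX + courseId);
    if (!raw) return null;
    const parsed = JSON.parse(raw);
    if (!parsed || !parsed.conversation_id) return null;
    const age = now - (parsed.stored_at || 0);
    // Reject records dated in the future (clock skew) and past the cutoff.
    if (age < 0 || age > STALE_AFTER_MS) return null;
    return parsed;
  } catch {
    return null; // corrupt JSON or unavailable storage: treat as no record
  }
}

// Minimal in-memory stand-in for sessionStorage (assumption, for testing).
function makeMemoryStorage() {
  const map = new Map();
  return {
    getItem: (k) => (map.has(k) ? map.get(k) : null),
    setItem: (k, v) => { map.set(k, String(v)); },
    removeItem: (k) => { map.delete(k); },
  };
}
```

Passing the storage object and the clock as parameters keeps the stale check deterministic in tests; the trade-off is slightly noisier call sites than reading `window.sessionStorage` and `Date.now()` directly.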
Co-Authored-By: Claude Opus 4.7 (1M context) --- .s2s/BACKLOG.md | 10 +++++++--- CHANGELOG.md | 15 +++++++++++++++ 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/.s2s/BACKLOG.md b/.s2s/BACKLOG.md index 278f8ebf..a494132a 100644 --- a/.s2s/BACKLOG.md +++ b/.s2s/BACKLOG.md @@ -1305,7 +1305,7 @@ This makes troubleshooting difficult: the admin sees "active" but students see n ### FEAT-012: Include document name in query source citations -**Status**: draft | **Priority**: medium | **Created**: 2026-03-20 +**Status**: completed | **Priority**: medium | **Created**: 2026-03-20 | **Completed**: 2026-04-21 | **Plan**: 20260418-v050-widget-and-prof-config **Origin**: Moodle integration testing - sources show chunk_id (UUID) instead of document name **Context**: The learn query response includes source citations with `doc_id`, `chunk_id`, `score`, and `snippet`. The widget renders these as `[1] chunk_id (score)` with a snippet preview. The `chunk_id` is a UUID which is meaningless to the user. The original document filename (e.g., "Escapologia Fiscale - 59 segreti.pdf") is not included in the source data. @@ -1349,9 +1349,11 @@ Approach 1 (keyword proximity) is the best cost/benefit trade-off for a first im ### FEAT-016: White-label widget customization (name, colors, branding) -**Status**: draft | **Priority**: medium | **Created**: 2026-03-20 +**Status**: partial (data-attrs) | **Priority**: medium | **Created**: 2026-03-20 | **Updated**: 2026-04-21 | **Plan**: 20260418-v050-widget-and-prof-config **Origin**: vertical deployment requirements - universities and organizations need chatbot with their own branding +**v0.5.0 progress**: data-* attributes implemented (`data-title`, `data-primary-color`, `data-icon`, `data-welcome-message`, `data-powered-by`). Namespace-backed branding (category B) and JWT-claim precedence remain open — see FEAT-008. 
+ **Context**: The widget currently supports only `theme` (light/dark) and `language` (en/it) as visual customization. Everything else is hardcoded: title ("Course Assistant"), primary color (#2563eb blue), icon (speech bubble emoji), and no welcome message. ADR-0025 defines the `data-*` attribute contract as the configuration API, and the "configuration over fork" principle requires that customization happens via config, not code changes. For vertical deployments (e.g., a university running Vektra for their students), the chatbot should be brandable to match the institution's identity. The same applies to any organization deploying Vektra as infrastructure behind their own product. @@ -1640,10 +1642,12 @@ The core API (`POST /api/v1/query`) has the same design — it's documented as " ### FEAT-004: Widget conversation lifecycle improvements -**Status**: draft | **Priority**: medium | **Created**: 2026-03-16 +**Status**: partial (persistence + new chat) | **Priority**: medium | **Created**: 2026-03-16 | **Updated**: 2026-04-21 | **Plan**: 20260418-v050-widget-and-prof-config **Origin**: Moodle integration testing (2026-03-16) **Depends on**: BUG-010 +**v0.5.0 progress**: sessionStorage persistence with 24h stale cutoff (tab-scoped, keyed by course_id), history replay via new `GET /conversations/{id}/turns`, explicit "New chat" button. Remaining open: idle timeout, cross-device continuity, token-refresh interaction policy. Sources are returned empty for v0.5.0 — extending turns response with citations requires joining query_traces and is deferred. + **Context**: After BUG-010 is fixed, the widget will support multi-turn conversations within a single page load. However, the `conversation_id` lives only in JS memory (`ApiClient._conversationId`) and is lost on page refresh, navigation, or tab close. Additionally, there is no explicit way for the user to start a fresh conversation. These are UX improvements to evaluate for the learn chatbot widget. 
**Areas to evaluate**: diff --git a/CHANGELOG.md b/CHANGELOG.md index 9c679a36..b687aeb9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,21 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). +## [Unreleased] — v0.5.0 "Widget production-ready + instructor configuration" + +### Added + +- **vektra-admin**: `PATCH /api/v1/namespaces/{id}/config` endpoint for instructor configuration. Admin-scoped, whitelisted (v0.5.0 accepts `grounding_mode` only), partial updates, null removes a key. Backs the Moodle block form that lets teachers toggle strict/hybrid RAG per course without admin intervention. +- **vektra-learn**: `GET /api/v1/learn/conversations/{id}/turns` JWT-scoped endpoint so the widget can restore a conversation after a page reload. Returns decrypted question/answer + created_at; admin-only metadata is not exposed. 403 on namespace mismatch, 404 on missing. +- **vektra-core**: `document_name` field on every source citation, joined from `source_documents.filename` so the widget renders `[1] lecture-07.pdf` instead of chunk UUIDs. Propagates through both `SimpleQueryPipeline` and `AdvancedQueryPipeline`, JSON and SSE paths. +- **widget**: white-label `data-*` attributes — `data-title`, `data-primary-color`, `data-icon` (emoji or URL), `data-welcome-message`, `data-powered-by`. All rendered via `textContent` / safe color whitelist to avoid XSS. +- **widget**: tab-scoped conversation persistence via `sessionStorage` keyed by `course_id` (24h cutoff); history replay on load via the new turns endpoint; explicit "New chat" button in the header. +- **errors**: new codes `ERR-ADMIN-005/006/007` (namespace config) and `ERR-LEARN-005/006` (conversation turns). + +### Changed + +- **widget styles**: button and accent colors now use `var(--vektra-primary, …)` so `data-primary-color` takes effect without rebuilding the bundle. 
Hover states use `filter: brightness()` so custom colors still feel interactive. + ## [0.3.0] - 2026-03-21 E-learning vertical refinements, widget UX improvements, and Phase 2 stabilization. From 45437c83068a4c26bf41aebb1613b7cfe1beeb31 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 17:05:33 +0000 Subject: [PATCH 09/55] fix(core): populate document_name in AdvancedQueryPipeline.execute_stream (WI-2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Missed in 192b62c. The advanced pipeline is the default in production (Combo D), and students use streaming (stream: true hardcoded in the widget client), so citations served over SSE never got the filename — the widget fell back to chunk UUIDs exactly in the code path that matters most. Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-core/src/vektra_core/advanced_pipeline.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/vektra-core/src/vektra_core/advanced_pipeline.py b/vektra-core/src/vektra_core/advanced_pipeline.py index 50ba59a1..15aeab38 100644 --- a/vektra-core/src/vektra_core/advanced_pipeline.py +++ b/vektra-core/src/vektra_core/advanced_pipeline.py @@ -794,6 +794,7 @@ async def _stream( log.warning("conversation_turn_store_failed", error=str(exc)) # Yield sources (only budget-selected chunks, not all filtered) + name_map = await _fetch_document_names([r.document_id for r in selected_chunks]) sources_data = [ { "doc_id": str(r.document_id), @@ -802,6 +803,7 @@ async def _stream( "snippet": r.text_snippet, "citation_id": str(uuid4()), "document_version": r.document_version, + "document_name": name_map.get(r.document_id), } for r in selected_chunks ] From e787cc4c1dc0657194f776f027672c08369e5098 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 17:23:15 +0000 Subject: [PATCH 10/55] fix(v0.5.0): archived marker, learn audit log, namespace PATCH route MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit Three review findings addressed in one commit — all three changes are cross-cutting and touching any one of them alone would force a stale code/doc split. 1. WI-2 archived marker (REQ-057, plan gate): _fetch_document_names now also reads deleted_at and appends "(archived)" to the filename when the source document is soft-deleted. Citations stay traceable for students without hiding the link to a removed asset. 2. WI-1 audit log (NFR-007): successful turns reads write a learn_conversation_turns_read audit row via vektra_shared.audit so decrypted conversation access is traceable. Uses the learn JWT sentinel key_id (same one used for ensure_conversation) and captures namespace, conversation_id, turn count, student_id, course_id. 3. WI-3 route naming: PATCH namespace config moves from /api/v1/namespaces/{id}/config to /api/v1/admin/namespaces/{id}/config to match the admin-endpoint convention established by DEBT-011's /api/v1/admin/conversations/{id}/turns. Tests, api.md, and CHANGELOG updated accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 5 ++-- docs/reference/api.md | 4 +-- vektra-admin/src/vektra_admin/api.py | 4 +-- vektra-admin/tests/test_integration.py | 18 +++++------ vektra-core/src/vektra_core/pipeline.py | 23 ++++++++++---- vektra-core/tests/test_pipeline.py | 40 +++++++++++++++++++++++++ vektra-learn/src/vektra_learn/api.py | 27 ++++++++++++++++- vektra-learn/tests/test_api.py | 31 ++++++++++++++----- 8 files changed, 123 insertions(+), 29 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b687aeb9..6dcebbc0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,9 +8,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ### Added -- **vektra-admin**: `PATCH /api/v1/namespaces/{id}/config` endpoint for instructor configuration. Admin-scoped, whitelisted (v0.5.0 accepts `grounding_mode` only), partial updates, null removes a key. 
Backs the Moodle block form that lets teachers toggle strict/hybrid RAG per course without admin intervention. +- **vektra-admin**: `PATCH /api/v1/admin/namespaces/{id}/config` endpoint for instructor configuration. Admin-scoped, whitelisted (v0.5.0 accepts `grounding_mode` only), partial updates, null removes a key. Backs the Moodle block form that lets teachers toggle strict/hybrid RAG per course without admin intervention. - **vektra-learn**: `GET /api/v1/learn/conversations/{id}/turns` JWT-scoped endpoint so the widget can restore a conversation after a page reload. Returns decrypted question/answer + created_at; admin-only metadata is not exposed. 403 on namespace mismatch, 404 on missing. -- **vektra-core**: `document_name` field on every source citation, joined from `source_documents.filename` so the widget renders `[1] lecture-07.pdf` instead of chunk UUIDs. Propagates through both `SimpleQueryPipeline` and `AdvancedQueryPipeline`, JSON and SSE paths. +- **vektra-core**: `document_name` field on every source citation, joined from `source_documents.filename` so the widget renders `[1] lecture-07.pdf` instead of chunk UUIDs. Propagates through both `SimpleQueryPipeline` and `AdvancedQueryPipeline`, JSON and SSE paths. Soft-deleted source documents (REQ-057) keep their citation with an `(archived)` suffix so answers stay traceable. +- **vektra-learn**: content-access audit entry (`learn_conversation_turns_read`) written on every successful turns fetch via the shared `vektra_shared.audit` interface (NFR-007). - **widget**: white-label `data-*` attributes — `data-title`, `data-primary-color`, `data-icon` (emoji or URL), `data-welcome-message`, `data-powered-by`. All rendered via `textContent` / safe color whitelist to avoid XSS. - **widget**: tab-scoped conversation persistence via `sessionStorage` keyed by `course_id` (24h cutoff); history replay on load via the new turns endpoint; explicit "New chat" button in the header. 
- **errors**: new codes `ERR-ADMIN-005/006/007` (namespace config) and `ERR-LEARN-005/006` (conversation turns). diff --git a/docs/reference/api.md b/docs/reference/api.md index 99a76d36..3b1bfbdc 100644 --- a/docs/reference/api.md +++ b/docs/reference/api.md @@ -142,7 +142,7 @@ Returns HTTP 204 (no body). ## Namespaces -### PATCH /api/v1/namespaces/{namespace_id}/config +### PATCH /api/v1/admin/namespaces/{namespace_id}/config Partially update a namespace's behavioral config (JSONB). Requires `admin` scope. @@ -155,7 +155,7 @@ curl -s -X PATCH \ -H "Authorization: Bearer $VEKTRA_API_KEY" \ -H "Content-Type: application/json" \ -d '{"grounding_mode":"hybrid"}' \ - http://localhost:8000/api/v1/namespaces/default/config | python3 -m json.tool + http://localhost:8000/api/v1/admin/namespaces/default/config | python3 -m json.tool ``` Request body (flat dict, one entry per config key): diff --git a/vektra-admin/src/vektra_admin/api.py b/vektra-admin/src/vektra_admin/api.py index 050f1431..1fbe30e2 100644 --- a/vektra-admin/src/vektra_admin/api.py +++ b/vektra-admin/src/vektra_admin/api.py @@ -535,7 +535,7 @@ class NamespaceConfigResponse(BaseModel): @router.patch( - "/api/v1/namespaces/{namespace_id}/config", + "/api/v1/admin/namespaces/{namespace_id}/config", response_model=NamespaceConfigResponse, ) async def patch_namespace_config( @@ -622,7 +622,7 @@ async def patch_namespace_config( background_tasks.add_task( _audit.log_event, key_id=key_info.key_id, - endpoint=f"/api/v1/namespaces/{namespace_id}/config", + endpoint=f"/api/v1/admin/namespaces/{namespace_id}/config", method="PATCH", status_code=200, request_id=request_id, diff --git a/vektra-admin/tests/test_integration.py b/vektra-admin/tests/test_integration.py index e28242b6..7cb63fca 100644 --- a/vektra-admin/tests/test_integration.py +++ b/vektra-admin/tests/test_integration.py @@ -579,7 +579,7 @@ async def test_namespace_config_patch_sets_grounding_mode( await _seed_namespace(fresh_engine, ns_id) resp = await 
client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "hybrid"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -603,7 +603,7 @@ async def test_namespace_config_patch_rejects_unknown_key( await _seed_namespace(fresh_engine, ns_id) resp = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"not_a_real_key": "whatever"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -622,7 +622,7 @@ async def test_namespace_config_patch_rejects_invalid_value( await _seed_namespace(fresh_engine, ns_id) resp = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "banana"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -641,7 +641,7 @@ async def test_namespace_config_patch_null_removes_key( await _seed_namespace(fresh_engine, ns_id) r1 = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "strict"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -649,7 +649,7 @@ async def test_namespace_config_patch_null_removes_key( assert r1.json()["config"] == {"grounding_mode": "strict"} r2 = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": None}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -676,7 +676,7 @@ async def test_namespace_config_patch_is_partial(client, bootstrap_key, fresh_en ) resp = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "hybrid"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -690,7 +690,7 @@ async def test_namespace_config_patch_not_found(client, bootstrap_key): """PATCH on non-existent namespace returns 404 + ERR-ADMIN-005.""" admin_key = await _create_admin_key(client, 
bootstrap_key) resp = await client.patch( - "/api/v1/namespaces/does-not-exist/config", + "/api/v1/admin/namespaces/does-not-exist/config", json={"grounding_mode": "strict"}, headers={"Authorization": f"Bearer {admin_key}"}, ) @@ -715,7 +715,7 @@ async def test_namespace_config_patch_requires_admin_scope( query_key = resp.json()["key"] resp = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "hybrid"}, headers={"Authorization": f"Bearer {query_key}"}, ) @@ -736,7 +736,7 @@ async def test_namespace_config_patch_resolves_via_shared_helper( await _seed_namespace(fresh_engine, ns_id) resp = await client.patch( - f"/api/v1/namespaces/{ns_id}/config", + f"/api/v1/admin/namespaces/{ns_id}/config", json={"grounding_mode": "hybrid"}, headers={"Authorization": f"Bearer {admin_key}"}, ) diff --git a/vektra-core/src/vektra_core/pipeline.py b/vektra-core/src/vektra_core/pipeline.py index 6cae0637..c8aff40a 100644 --- a/vektra-core/src/vektra_core/pipeline.py +++ b/vektra-core/src/vektra_core/pipeline.py @@ -76,10 +76,15 @@ async def _fetch_document_names( ) -> dict[Any, str]: """Resolve document IDs to their filenames for source citations (FEAT-012). - Batch lookup against ``source_documents.filename``. Returns an empty map - on any failure (DB not initialized in tests, transient DB error, etc.) — - the pipeline must continue to respond even if citations lose their - human-readable label. Widget falls back to ``chunk_id`` when missing. + Batch lookup against ``source_documents``. Soft-deleted documents + (REQ-057) are still returned with an ``(archived)`` suffix rather than + hidden: the citation must match what was actually retrieved from the + vector store, otherwise students see answers with no traceable source. + + Returns an empty map on any failure (DB not initialized in tests, + transient DB error, etc.) 
— the pipeline must continue to respond even + if citations lose their human-readable label. Widget falls back to + ``chunk_id`` when a name is missing. """ if not doc_ids: return {} @@ -98,12 +103,18 @@ async def _fetch_document_names( async with factory() as session: result = await session.execute( text( - "SELECT id, filename FROM source_documents " + "SELECT id, filename, deleted_at FROM source_documents " "WHERE id = ANY(CAST(:ids AS uuid[]))" ), {"ids": unique_ids}, ) - return {row[0]: row[1] for row in result.all()} + out: dict[Any, str] = {} + for row in result.all(): + name = row[1] + if row[2] is not None: + name = f"{name} (archived)" + out[row[0]] = name + return out except Exception as exc: log.debug("document_names_fetch_failed", error=str(exc)) return {} diff --git a/vektra-core/tests/test_pipeline.py b/vektra-core/tests/test_pipeline.py index 0a99cdc0..8a1300a7 100644 --- a/vektra-core/tests/test_pipeline.py +++ b/vektra-core/tests/test_pipeline.py @@ -457,6 +457,46 @@ async def _fake_fetch_miss(doc_ids): assert all(s.document_name is None for s in response2.sources) +async def test_fetch_document_names_marks_archived(monkeypatch): + """Soft-deleted documents are returned with ``(archived)`` suffix (REQ-057). + + The citation must still match what was retrieved from the vector store; + we append a marker rather than hiding the document so students can see + that the source exists but is no longer available. 
+ """ + from vektra_core import pipeline as pipeline_mod + + rows = [ + ("doc-a", "active.pdf", None), + ("doc-b", "gone.pdf", "2026-04-20T10:00:00+00:00"), + ] + + class _FakeResult: + def all(self_inner): + return rows + + class _FakeSession: + async def __aenter__(self_inner): + return self_inner + + async def __aexit__(self_inner, *_): + return False + + async def execute(self_inner, *_a, **_k): + return _FakeResult() + + def _factory(): + return _FakeSession() + + # Bypass get_session_factory() so the helper uses our fake session + from vektra_shared import db as db_mod + + monkeypatch.setattr(db_mod, "get_session_factory", lambda: _factory) + + result = await pipeline_mod._fetch_document_names(["doc-a", "doc-b"]) + assert result == {"doc-a": "active.pdf", "doc-b": "gone.pdf (archived)"} + + # --------------------------------------------------------------------------- # Streaming tests (_stream / execute_stream) # --------------------------------------------------------------------------- diff --git a/vektra-learn/src/vektra_learn/api.py b/vektra-learn/src/vektra_learn/api.py index bc05a046..e5d0ba63 100644 --- a/vektra-learn/src/vektra_learn/api.py +++ b/vektra-learn/src/vektra_learn/api.py @@ -18,7 +18,7 @@ import httpx import jwt import structlog -from fastapi import APIRouter, Depends, HTTPException, Query, Request +from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query, Request from fastapi.responses import StreamingResponse from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer from pydantic import BaseModel @@ -38,6 +38,7 @@ TokenRequest, TokenResponse, ) +from vektra_shared.audit import log_event as _audit_log_event from vektra_shared.auth import ApiKeyInfo, require_scope from vektra_shared.errors import ( ERR_LEARN_001, @@ -518,6 +519,7 @@ def _resolve_namespace_from_token( async def get_conversation_turns( conversation_id: UUID, request: Request, + background_tasks: BackgroundTasks, token_payload: dict[str, Any] 
= Depends(_validate_dashboard_token), ) -> ConversationTurnsResponse: """Return decrypted turns for a conversation belonging to the token's course. @@ -605,6 +607,29 @@ async def get_conversation_turns( ) for t in turns ] + + # NFR-007: log sensitive content access (decrypted conversation turns). + # Learn endpoints authenticate via JWT and do not carry a key_id, so we + # use the sentinel defined for learn-originated rows. + request_id = getattr(request.state, "request_id", None) + if request_id: + background_tasks.add_task( + _audit_log_event, + key_id=_LEARN_SENTINEL_KEY_ID, + endpoint=f"/api/v1/learn/conversations/{conversation_id}/turns", + method="GET", + status_code=200, + request_id=request_id, + action="learn_conversation_turns_read", + log_metadata={ + "namespace": namespace, + "conversation_id": str(conversation_id), + "turn_count": len(items), + "student_id": token_payload.get("sub"), + "course_id": token_payload.get("course_id"), + }, + ) + return ConversationTurnsResponse( conversation_id=conversation_id, namespace=namespace, diff --git a/vektra-learn/tests/test_api.py b/vektra-learn/tests/test_api.py index feaed75f..fdb74fd9 100644 --- a/vektra-learn/tests/test_api.py +++ b/vektra-learn/tests/test_api.py @@ -499,9 +499,10 @@ async def test_returns_decrypted_turns_on_namespace_match(self): ) request = self._make_request(conv_store) + bg = MagicMock() token_payload = {"sub": "s1", "course_id": "CS101"} - resp = await get_conversation_turns(cid, request, token_payload) + resp = await get_conversation_turns(cid, request, bg, token_payload) assert resp.conversation_id == cid assert resp.namespace == "CS101" assert len(resp.turns) == 2 @@ -511,6 +512,14 @@ async def test_returns_decrypted_turns_on_namespace_match(self): # Admin-only metadata must not be exposed assert not hasattr(resp.turns[0], "model") assert not hasattr(resp.turns[0], "response_id") + # Audit log scheduled for content access (NFR-007) + bg.add_task.assert_called_once() + call_kwargs = 
bg.add_task.call_args.kwargs + assert call_kwargs["action"] == "learn_conversation_turns_read" + assert call_kwargs["log_metadata"]["conversation_id"] == str(cid) + assert call_kwargs["log_metadata"]["namespace"] == "CS101" + assert call_kwargs["log_metadata"]["turn_count"] == 2 + assert call_kwargs["log_metadata"]["student_id"] == "s1" async def test_403_on_namespace_mismatch(self): """Conversation exists but belongs to a different course.""" @@ -529,14 +538,17 @@ async def test_403_on_namespace_mismatch(self): conv_store.get_turns_detail = AsyncMock() request = self._make_request(conv_store) + bg = MagicMock() token_payload = {"sub": "s1", "course_id": "CS101"} with pytest.raises(HTTPException) as exc_info: - await get_conversation_turns(cid, request, token_payload) + await get_conversation_turns(cid, request, bg, token_payload) assert exc_info.value.status_code == 403 assert exc_info.value.detail["error"]["code"] == "ERR-LEARN-006" # Must not leak the actual content of the other-namespace conversation conv_store.get_turns_detail.assert_not_awaited() + # No audit log on denied access (we only log successful reads) + bg.add_task.assert_not_called() async def test_404_on_missing_conversation(self): from vektra_learn.api import get_conversation_turns @@ -546,10 +558,11 @@ async def test_404_on_missing_conversation(self): conv_store.get_turns_detail = AsyncMock() request = self._make_request(conv_store) + bg = MagicMock() token_payload = {"sub": "s1", "course_id": "CS101"} with pytest.raises(HTTPException) as exc_info: - await get_conversation_turns(uuid4(), request, token_payload) + await get_conversation_turns(uuid4(), request, bg, token_payload) assert exc_info.value.status_code == 404 assert exc_info.value.detail["error"]["code"] == "ERR-LEARN-005" conv_store.get_turns_detail.assert_not_awaited() @@ -571,10 +584,11 @@ async def test_404_on_soft_deleted_conversation(self): conv_store.get_turns_detail = AsyncMock() request = self._make_request(conv_store) + bg = 
MagicMock() token_payload = {"sub": "s1", "course_id": "CS101"} with pytest.raises(HTTPException) as exc_info: - await get_conversation_turns(cid, request, token_payload) + await get_conversation_turns(cid, request, bg, token_payload) assert exc_info.value.status_code == 404 async def test_rejects_token_without_course_id(self): @@ -582,10 +596,11 @@ async def test_rejects_token_without_course_id(self): conv_store = MagicMock() request = self._make_request(conv_store) + bg = MagicMock() token_payload = {"sub": "s1"} # no course_id with pytest.raises(HTTPException) as exc_info: - await get_conversation_turns(uuid4(), request, token_payload) + await get_conversation_turns(uuid4(), request, bg, token_payload) assert exc_info.value.status_code == 401 assert exc_info.value.detail["error"]["code"] == "ERR-LEARN-003" @@ -606,13 +621,14 @@ async def test_respects_namespace_claim_over_course_id(self): conv_store.get_turns_detail = AsyncMock(return_value=[]) request = self._make_request(conv_store) + bg = MagicMock() token_payload = { "sub": "s1", "course_id": "CS101", "namespace": "shared-materials", } - resp = await get_conversation_turns(cid, request, token_payload) + resp = await get_conversation_turns(cid, request, bg, token_payload) assert resp.namespace == "shared-materials" async def test_501ish_when_store_lacks_decryption(self): @@ -625,10 +641,11 @@ class _BareStore: conv_store = _BareStore() request = self._make_request(conv_store) + bg = MagicMock() token_payload = {"sub": "s1", "course_id": "CS101"} with pytest.raises(HTTPException) as exc_info: - await get_conversation_turns(uuid4(), request, token_payload) + await get_conversation_turns(uuid4(), request, bg, token_payload) # ERR-LEARN-001 is CONFIGURATION → 500 assert exc_info.value.status_code == 500 assert exc_info.value.detail["error"]["code"] == "ERR-LEARN-001" From bda675b595fdc386a5151c46e84c9ec9fbca7c5a Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 17:47:18 +0000 Subject: 
[PATCH 11/55] fix(v0.5.0): powered-by customization, restore race, fetch timeout, type test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Second round of review fixes. WI-4: two new optional attrs, data-powered-by-text and data-powered-by-url, so universities can replace the footer text/link without touching the widget source. Default behaviour unchanged ("Powered by Vektra" → vektralabs). Custom URLs are whitelisted to http(s) and same-origin paths; javascript: and data: URLs are rejected. WI-5: restoreConversation is now awaited in init(), preventing a fast-typing user from racing the fetch and seeing their first user message wiped out when replayTurns arrives. Paired with a bounded AbortSignal.timeout(8s) on getConversationTurns so a stalled backend cannot block widget init indefinitely. WI-3: integration test asserts a non-string value (42) is still rejected with ERR-ADMIN-007. Guards the whitelist contract against future permissive/type-coercing refactors. Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-admin/tests/test_integration.py | 23 ++++++++++++ vektra-learn/widget/src/api-client.js | 4 ++ vektra-learn/widget/src/chat-ui.js | 52 ++++++++++++++++++++++---- vektra-learn/widget/src/index.js | 30 +++++++++++---- 4 files changed, 93 insertions(+), 16 deletions(-) diff --git a/vektra-admin/tests/test_integration.py b/vektra-admin/tests/test_integration.py index 7cb63fca..d52325f4 100644 --- a/vektra-admin/tests/test_integration.py +++ b/vektra-admin/tests/test_integration.py @@ -632,6 +632,29 @@ async def test_namespace_config_patch_rejects_invalid_value( assert "banana" in err["message"] +async def test_namespace_config_patch_rejects_non_string_value( + client, bootstrap_key, fresh_engine +): + """Guard against clients (e.g. Moodle plugin) sending the wrong JSON type. + + 42 is not in the allowed enum; the whitelist check rejects it regardless + of runtime type. 
Without this test the behaviour would silently work + today but could regress to a permissive type-coercing check. + """ + admin_key = await _create_admin_key(client, bootstrap_key) + ns_id = "wi3-nonstring-value" + await _seed_namespace(fresh_engine, ns_id) + + resp = await client.patch( + f"/api/v1/admin/namespaces/{ns_id}/config", + json={"grounding_mode": 42}, + headers={"Authorization": f"Bearer {admin_key}"}, + ) + assert resp.status_code == 400, resp.text + err = resp.json()["detail"]["error"] + assert err["code"] == "ERR-ADMIN-007" + + async def test_namespace_config_patch_null_removes_key( client, bootstrap_key, fresh_engine ): diff --git a/vektra-learn/widget/src/api-client.js b/vektra-learn/widget/src/api-client.js index 2d0b601c..e2c86953 100644 --- a/vektra-learn/widget/src/api-client.js +++ b/vektra-learn/widget/src/api-client.js @@ -50,11 +50,15 @@ export class ApiClient { * @returns {Promise<{conversation_id: string, namespace: string, turns: Array}|null>} */ async getConversationTurns(conversationId, _retried = false) { + // Bounded timeout so a stalled backend doesn't block widget init. + // AbortSignal.timeout rejects the fetch with an AbortError; callers + // already handle thrown errors by leaving stored state untouched. const resp = await fetch( `${this._apiUrl}/api/v1/learn/conversations/${encodeURIComponent(conversationId)}/turns`, { method: "GET", headers: { Authorization: `Bearer ${this._token}` }, + signal: AbortSignal.timeout(8000), } ); if (resp.status === 401 && !_retried) { diff --git a/vektra-learn/widget/src/chat-ui.js b/vektra-learn/widget/src/chat-ui.js index 96121ad7..e63d76ba 100644 --- a/vektra-learn/widget/src/chat-ui.js +++ b/vektra-learn/widget/src/chat-ui.js @@ -64,6 +64,17 @@ function _iconIsUrl(value) { ); } +// Only http(s) and same-origin paths are allowed as link targets. Rejects +// javascript:, data:, and other schemes that could enable XSS when the +// operator-controlled attribute is combined with custom text. 
+function _isSafeLinkUrl(value) { + if (typeof value !== "string") return false; + const v = value.trim(); + return ( + v.startsWith("http://") || v.startsWith("https://") || v.startsWith("/") + ); +} + export class ChatUI { /** * @param {object} opts @@ -73,7 +84,9 @@ export class ChatUI { * @param {string|null} [opts.customPrimaryColor] - CSS color for accents * @param {string|null} [opts.customIcon] - emoji or image URL for the button * @param {string|null} [opts.welcomeMessage] - first assistant message on open - * @param {boolean} [opts.showPoweredBy=true] - show "Powered by Vektra" footer + * @param {boolean} [opts.showPoweredBy=true] - show attribution footer + * @param {string|null} [opts.poweredByText] - custom footer text (plain text) + * @param {string|null} [opts.poweredByUrl] - custom footer link target * @param {function} opts.onSend - callback(question: string) * @param {function} [opts.onNewChat] - callback invoked when "New chat" is clicked */ @@ -85,6 +98,8 @@ export class ChatUI { customIcon = null, welcomeMessage = null, showPoweredBy = true, + poweredByText = null, + poweredByUrl = null, onSend, onNewChat = null, }) { @@ -100,6 +115,8 @@ export class ChatUI { this._icon = customIcon; this._welcomeMessage = welcomeMessage; this._showPoweredBy = showPoweredBy; + this._poweredByText = poweredByText; + this._poweredByUrl = poweredByUrl; this._welcomeShown = false; // Accept color only if it matches the conservative whitelist; anything // else is silently ignored (falls back to theme default) to prevent CSS @@ -171,13 +188,32 @@ export class ChatUI { if (this._showPoweredBy) { const footer = document.createElement("div"); footer.className = "vektra-chat-powered-by"; - const link = document.createElement("a"); - link.href = "https://vektralabs.github.io"; - link.target = "_blank"; - link.rel = "noopener noreferrer"; - link.textContent = "Vektra"; - footer.textContent = "Powered by "; - footer.appendChild(link); + const url = 
_isSafeLinkUrl(this._poweredByUrl) + ? this._poweredByUrl.trim() + : "https://vektralabs.github.io"; + + if (this._poweredByText) { + // Operator-supplied text is rendered as a single link (textContent, + // not innerHTML, so no HTML injection). If no custom URL was given, + // we still link to the Vektra default — operators who want a + // non-link footer can just pass a blank data-powered-by-url (caught + // by _isSafeLinkUrl returning false) — actually the default wins in + // that case too; we keep it simple and always render a link. + const link = document.createElement("a"); + link.href = url; + link.target = "_blank"; + link.rel = "noopener noreferrer"; + link.textContent = this._poweredByText; + footer.appendChild(link); + } else { + const link = document.createElement("a"); + link.href = url; + link.target = "_blank"; + link.rel = "noopener noreferrer"; + link.textContent = "Vektra"; + footer.textContent = "Powered by "; + footer.appendChild(link); + } this._panel.appendChild(footer); } diff --git a/vektra-learn/widget/src/index.js b/vektra-learn/widget/src/index.js index 096dabf2..60be37cc 100644 --- a/vektra-learn/widget/src/index.js +++ b/vektra-learn/widget/src/index.js @@ -16,7 +16,9 @@ * data-primary-color="#9333ea" * data-icon="https://example.com/bot.png" * data-welcome-message="Hi! Ask me anything about the course." 
- * data-powered-by="false" + * data-powered-by="true" + * data-powered-by-text="Supported by University of X" + * data-powered-by-url="https://univ.example/help" * > * * White-label attributes (all optional): @@ -24,8 +26,12 @@ * data-primary-color - hex/rgb/named color used for buttons and accents * data-icon - emoji or image URL for the floating button * data-welcome-message - assistant message shown on first open - * data-powered-by - "true" (default) shows Vektra attribution, + * data-powered-by - "true" (default) shows the attribution footer, * "false" hides it + * data-powered-by-text - overrides the footer text (plain text, no HTML). + * Default: "Powered by Vektra" as a link. + * data-powered-by-url - overrides the footer link target. Default: + * https://vektralabs.github.io */ import { ApiClient } from "./api-client.js"; @@ -103,6 +109,8 @@ function _clearStored(courseId) { const showPoweredBy = poweredByAttr === null ? true : poweredByAttr.toLowerCase() !== "false"; + const poweredByText = scriptTag.getAttribute("data-powered-by-text") || null; + const poweredByUrl = scriptTag.getAttribute("data-powered-by-url") || null; if (!apiUrl || !courseId || !token) { console.error( @@ -111,7 +119,7 @@ function _clearStored(courseId) { return; } - function init() { + async function init() { const client = new ApiClient(apiUrl, token, courseId, { tokenRefreshUrl }); const ui = new ChatUI({ @@ -122,6 +130,8 @@ function _clearStored(courseId) { customIcon, welcomeMessage, showPoweredBy, + poweredByText, + poweredByUrl, onSend(question) { const stream = ui.createStreamMessage(); @@ -161,9 +171,12 @@ function _clearStored(courseId) { }, }); - // Restore a conversation from a prior load (WI-5 / FEAT-004). Silent: - // any failure (404, 403, network) keeps the widget usable with a clean - // slate rather than surfacing an error. + // Restore a conversation from a prior load (WI-5 / FEAT-004). 
Awaited + // so that a message sent by a fast user can't race the fetch: if we + // left this fire-and-forget, a replayTurns arriving after the first + // new query would wipe the already-rendered user message. The fetch + // has an 8s timeout (api-client.js) so a stalled backend cannot block + // init indefinitely. Any failure keeps the stored id for next reload. async function restoreConversation() { const stored = _readStored(courseId); if (!stored) return; @@ -177,10 +190,11 @@ function _clearStored(courseId) { _clearStored(courseId); } } catch { - // Network/transient error: keep stored id, try again next load + // Network/transient error (incl. AbortError on timeout): keep + // stored id, try again next load. } } - restoreConversation(); + await restoreConversation(); // Check API connectivity on startup and show status if unreachable let retryTimer = null; From dfe6be5416ef103871e2062540118c4c5aa902a0 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 20:45:36 +0000 Subject: [PATCH 12/55] fix(learn): always emit turns-read audit even without request_id (CR-3120130889) The existing `if request_id:` guard silently dropped the NFR-007 audit row whenever the RequestIdMiddleware wasn't wired (tests, early boot, misconfiguration). A successful 200 with no audit row is the worst possible failure mode for a compliance log. Synthesize a UUID fallback so every successful turns read leaves an audit trail unconditionally, and add a regression test covering the empty-state case. Renames the adjacent 501ish test to the accurate ERR-LEARN-001 name while we're in the file. 
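A rough illustration of the fallback described above (a sketch under stated assumptions, not the handler itself): `resolve_request_id` is a hypothetical helper name, and `SimpleNamespace` stands in for `request.state`.

```python
from __future__ import annotations

from types import SimpleNamespace
from uuid import UUID, uuid4


def resolve_request_id(state: object) -> UUID | str:
    """Hypothetical helper: return the correlation id, or a fresh UUID if absent."""
    # getattr with a default covers a state object that lacks the attribute;
    # `or` additionally covers an attribute that is present but None/empty.
    return getattr(state, "request_id", None) or uuid4()


wired = SimpleNamespace(request_id="req-123")  # middleware populated the id
unwired = SimpleNamespace()                    # middleware not installed

existing_id = resolve_request_id(wired)
fallback_id = resolve_request_id(unwired)
```

With this shape the audit task can always be scheduled, since the id passed to it is never `None`.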
Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-learn/src/vektra_learn/api.py | 40 +++++++++++----------- vektra-learn/tests/test_api.py | 50 ++++++++++++++++++++++++++-- 2 files changed, 68 insertions(+), 22 deletions(-) diff --git a/vektra-learn/src/vektra_learn/api.py b/vektra-learn/src/vektra_learn/api.py index e5d0ba63..4d5bac26 100644 --- a/vektra-learn/src/vektra_learn/api.py +++ b/vektra-learn/src/vektra_learn/api.py @@ -610,25 +610,27 @@ async def get_conversation_turns( # NFR-007: log sensitive content access (decrypted conversation turns). # Learn endpoints authenticate via JWT and do not carry a key_id, so we - # use the sentinel defined for learn-originated rows. - request_id = getattr(request.state, "request_id", None) - if request_id: - background_tasks.add_task( - _audit_log_event, - key_id=_LEARN_SENTINEL_KEY_ID, - endpoint=f"/api/v1/learn/conversations/{conversation_id}/turns", - method="GET", - status_code=200, - request_id=request_id, - action="learn_conversation_turns_read", - log_metadata={ - "namespace": namespace, - "conversation_id": str(conversation_id), - "turn_count": len(items), - "student_id": token_payload.get("sub"), - "course_id": token_payload.get("course_id"), - }, - ) + # use the sentinel defined for learn-originated rows. Fire the audit + # unconditionally with a synthesized request_id if the middleware hasn't + # set one — a missing correlation id must not silently skip the audit + # row (compliance gap otherwise invisible). 
+ request_id = getattr(request.state, "request_id", None) or uuid4() + background_tasks.add_task( + _audit_log_event, + key_id=_LEARN_SENTINEL_KEY_ID, + endpoint=f"/api/v1/learn/conversations/{conversation_id}/turns", + method="GET", + status_code=200, + request_id=request_id, + action="learn_conversation_turns_read", + log_metadata={ + "namespace": namespace, + "conversation_id": str(conversation_id), + "turn_count": len(items), + "student_id": token_payload.get("sub"), + "course_id": token_payload.get("course_id"), + }, + ) return ConversationTurnsResponse( conversation_id=conversation_id, diff --git a/vektra-learn/tests/test_api.py b/vektra-learn/tests/test_api.py index fdb74fd9..fe7aa2d3 100644 --- a/vektra-learn/tests/test_api.py +++ b/vektra-learn/tests/test_api.py @@ -631,8 +631,11 @@ async def test_respects_namespace_claim_over_course_id(self): resp = await get_conversation_turns(cid, request, bg, token_payload) assert resp.namespace == "shared-materials" - async def test_501ish_when_store_lacks_decryption(self): - """In-memory store (no get_metadata / get_turns_detail) returns 503.""" + async def test_500_err_learn_001_when_store_lacks_decryption(self): + """In-memory store (no get_metadata / get_turns_detail) returns 500. + + ERR-LEARN-001 is CONFIGURATION → 500. + """ from vektra_learn.api import get_conversation_turns # bare object with none of the required methods @@ -646,6 +649,47 @@ class _BareStore: with pytest.raises(HTTPException) as exc_info: await get_conversation_turns(uuid4(), request, bg, token_payload) - # ERR-LEARN-001 is CONFIGURATION → 500 assert exc_info.value.status_code == 500 assert exc_info.value.detail["error"]["code"] == "ERR-LEARN-001" + + async def test_audit_fires_even_without_request_id(self): + """NFR-007: audit must fire unconditionally on successful turns read. 
+ + If the RequestIdMiddleware isn't wired (tests, early boot), a missing + request.state.request_id must not silently skip the audit row — the + handler synthesizes a UUID fallback so every authenticated content + access leaves an audit trail. + """ + from vektra_learn.api import get_conversation_turns + + cid = uuid4() + conv_store = MagicMock() + conv_store.get_metadata = AsyncMock( + return_value={ + "id": cid, + "namespace_id": "CS101", + "deleted_at": None, + "turn_count": 0, + } + ) + conv_store.get_turns_detail = AsyncMock(return_value=[]) + + # Simulate middleware not wired: request.state has no attribute + request = self._make_request(conv_store) + + class _EmptyState: + pass + + request.state = _EmptyState() + + bg = MagicMock() + token_payload = {"sub": "s1", "course_id": "CS101"} + + await get_conversation_turns(cid, request, bg, token_payload) + bg.add_task.assert_called_once() + call_kwargs = bg.add_task.call_args.kwargs + assert call_kwargs["action"] == "learn_conversation_turns_read" + # Synthesized request_id must still be a UUID (not None) + from uuid import UUID as _UUID + + assert isinstance(call_kwargs["request_id"], _UUID) From c8e72f849b4490fa701ad7450d4efeeaed9b78ed Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 20:46:09 +0000 Subject: [PATCH 13/55] fix(widget): preserve --vektra-primary on hover (CR-3120130903, Gemini dup) Both .vektra-chat-btn:hover and .vektra-chat-send:hover hardcoded ${t.primaryHover}, overriding the CSS variable so a prof's data-primary-color snapped back to theme-default blue on hover. Switch both hover backgrounds to var(--vektra-primary, ${t.primaryHover}) matching the resting-state pattern: custom-color deployments get their brand color darkened by the existing filter: brightness(0.92), theme default still gets the handcrafted primaryHover shade. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-learn/widget/src/styles.js | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vektra-learn/widget/src/styles.js b/vektra-learn/widget/src/styles.js index 941f45f4..944c7a3f 100644 --- a/vektra-learn/widget/src/styles.js +++ b/vektra-learn/widget/src/styles.js @@ -66,7 +66,7 @@ export function buildStyles(theme) { line-height: 1; } .vektra-chat-btn:hover { - background: ${t.primaryHover}; + background: var(--vektra-primary, ${t.primaryHover}); filter: brightness(0.92); transform: scale(1.05); } @@ -312,7 +312,7 @@ export function buildStyles(theme) { transition: background 0.2s; } .vektra-chat-send:hover { - background: ${t.primaryHover}; + background: var(--vektra-primary, ${t.primaryHover}); filter: brightness(0.92); } .vektra-chat-send:disabled { From 18af4f38df632fdad2dc0e594504779ebc0bdebf Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 20:47:29 +0000 Subject: [PATCH 14/55] fix(core): normalize document_name lookup keys to str (CR-3120130886) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit asyncpg returns uuid.UUID objects for UUID columns and callers pass SearchResult.document_id (UUID) — both sides are UUID, so the lookup happened to work in practice. But the helper's signature left this coincidence implicit, and any path that passed a stringified id (or a future SearchResult field using str) would silently miss. Stringify on both ends: helper returns dict[str, str] with str(row[0]) keys, call sites do name_map.get(str(r.document_id)). Test fake updated to match the documented contract. 
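A small sketch of why both ends are stringified, assuming rows arrive as `(document_id, filename)` pairs; `build_name_map` is a hypothetical stand-in for the real helper, which also handles archived documents and DB errors.

```python
from uuid import uuid4


def build_name_map(rows):
    """Hypothetical stand-in: map stringified document ids to filenames."""
    return {str(doc_id): name for doc_id, name in rows}


doc_id = uuid4()  # what asyncpg would hand back for a UUID column
name_map = build_name_map([(doc_id, "lecture-07.pdf")])

# Stringifying at the call site works for both UUID and str inputs.
from_uuid = name_map.get(str(doc_id))
from_str = name_map.get(str(str(doc_id)))

# A raw UUID key no longer matches, which is why call sites must use str().
raw_lookup = name_map.get(doc_id)
```

The same `str(...)` call is a no-op on ids that are already strings, so mixed callers converge on one key type.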
Co-Authored-By: Claude Opus 4.7 (1M context) --- vektra-core/src/vektra_core/advanced_pipeline.py | 4 ++-- vektra-core/src/vektra_core/pipeline.py | 15 ++++++++++----- vektra-core/tests/test_pipeline.py | 4 ++-- 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/vektra-core/src/vektra_core/advanced_pipeline.py b/vektra-core/src/vektra_core/advanced_pipeline.py index 15aeab38..c6275d61 100644 --- a/vektra-core/src/vektra_core/advanced_pipeline.py +++ b/vektra-core/src/vektra_core/advanced_pipeline.py @@ -554,7 +554,7 @@ async def execute( snippet=r.text_snippet, citation_id=uuid4(), document_version=r.document_version, - document_name=name_map.get(r.document_id), + document_name=name_map.get(str(r.document_id)), ) for r in selected_chunks ] @@ -803,7 +803,7 @@ async def _stream( "snippet": r.text_snippet, "citation_id": str(uuid4()), "document_version": r.document_version, - "document_name": name_map.get(r.document_id), + "document_name": name_map.get(str(r.document_id)), } for r in selected_chunks ] diff --git a/vektra-core/src/vektra_core/pipeline.py b/vektra-core/src/vektra_core/pipeline.py index c8aff40a..f3f48df8 100644 --- a/vektra-core/src/vektra_core/pipeline.py +++ b/vektra-core/src/vektra_core/pipeline.py @@ -73,7 +73,7 @@ def _history_to_messages(history: list[dict[str, str | None]]) -> list[Message]: async def _fetch_document_names( doc_ids: list[Any], -) -> dict[Any, str]: +) -> dict[str, str]: """Resolve document IDs to their filenames for source citations (FEAT-012). Batch lookup against ``source_documents``. Soft-deleted documents @@ -81,6 +81,11 @@ async def _fetch_document_names( hidden: the citation must match what was actually retrieved from the vector store, otherwise students see answers with no traceable source. 
+ Keys are always stringified: asyncpg returns ``uuid.UUID`` objects for + UUID columns and callers may pass either ``UUID`` or ``str`` ids, so + the map is normalised to ``str`` on both ends to avoid silent lookup + misses. Call sites use ``name_map.get(str(r.document_id))``. + Returns an empty map on any failure (DB not initialized in tests, transient DB error, etc.) — the pipeline must continue to respond even if citations lose their human-readable label. Widget falls back to @@ -108,12 +113,12 @@ async def _fetch_document_names( ), {"ids": unique_ids}, ) - out: dict[Any, str] = {} + out: dict[str, str] = {} for row in result.all(): name = row[1] if row[2] is not None: name = f"{name} (archived)" - out[row[0]] = name + out[str(row[0])] = name return out except Exception as exc: log.debug("document_names_fetch_failed", error=str(exc)) @@ -578,7 +583,7 @@ async def execute( snippet=r.text_snippet, citation_id=uuid4(), document_version=r.document_version, - document_name=name_map.get(r.document_id), + document_name=name_map.get(str(r.document_id)), ) for r in selected_chunks ] @@ -973,7 +978,7 @@ async def _stream(self, query: QueryRequest) -> AsyncGenerator[QueryChunk, None] "snippet": r.text_snippet, "citation_id": str(uuid4()), "document_version": r.document_version, - "document_name": name_map.get(r.document_id), + "document_name": name_map.get(str(r.document_id)), } for r in selected_chunks ] diff --git a/vektra-core/tests/test_pipeline.py b/vektra-core/tests/test_pipeline.py index 8a1300a7..83a4c022 100644 --- a/vektra-core/tests/test_pipeline.py +++ b/vektra-core/tests/test_pipeline.py @@ -435,11 +435,11 @@ async def test_execute_populates_document_name(monkeypatch): vector_store = AsyncMock() vector_store.search = AsyncMock(return_value=results) - # Case 1: DB lookup returns a mapping + # Case 1: DB lookup returns a mapping (keys stringified per helper contract) async def _fake_fetch_hit(doc_ids): # Called with the list of document ids from selected_chunks 
assert set(doc_ids) == {doc_a, doc_b} - return {doc_a: "lecture-07.pdf", doc_b: "slides.pptx"} + return {str(doc_a): "lecture-07.pdf", str(doc_b): "slides.pptx"} monkeypatch.setattr(pipeline_mod, "_fetch_document_names", _fake_fetch_hit) From 86e53300b7dabb8e6b4b733169b9f6b56540120c Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 20:49:29 +0000 Subject: [PATCH 15/55] style(v0.5.0): address minor review comments (CR + Gemini) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Batch of cosmetic / docs fixes from the PR #66 review. - widget: localize "Powered by" via new I18N.poweredBy key (en/it) (CR-3120130900) - widget: tidy the self-contradicting comment around the powered-by footer branch; behaviour unchanged (CR nitpick chat-ui.js:195-216) - widget: wrap init() with .catch(console.error) so any unexpected async rejection surfaces instead of becoming an unhandled promise rejection (CR nitpick index.js:122-197) - shared/types.py: correct SourceRef.document_name docstring — archived documents carry " (archived)" suffix; None is only for DB failures (CR-3120130909) - docs/reference/api.md: add document_name to the /api/v1/query response example so the contract matches the current payload (CR outside-diff api.md:283-297) Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/reference/api.md | 3 ++- vektra-learn/widget/src/chat-ui.js | 25 ++++++++++-------------- vektra-learn/widget/src/index.js | 11 ++++++++--- vektra-shared/src/vektra_shared/types.py | 5 +++-- 4 files changed, 23 insertions(+), 21 deletions(-) diff --git a/docs/reference/api.md b/docs/reference/api.md index 3b1bfbdc..5ebae5d1 100644 --- a/docs/reference/api.md +++ b/docs/reference/api.md @@ -293,7 +293,8 @@ Response: "score": 0.912, "snippet": "...", "citation_id": "d4e5f6-...", - "document_version": 1 + "document_version": 1, + "document_name": "lecture-07.pdf" } ], "conversation_id": null, diff --git 
a/vektra-learn/widget/src/chat-ui.js b/vektra-learn/widget/src/chat-ui.js index e63d76ba..ee9eaaea 100644 --- a/vektra-learn/widget/src/chat-ui.js +++ b/vektra-learn/widget/src/chat-ui.js @@ -21,6 +21,7 @@ const I18N = { sessionExpired: "Your session has expired. Please reload the page.", close: "Close", newChat: "New chat", + poweredBy: "Powered by", }, it: { title: "Assistente del corso", @@ -36,6 +37,7 @@ const I18N = { sessionExpired: "La sessione è scaduta. Ricarica la pagina.", close: "Chiudi", newChat: "Nuova chat", + poweredBy: "Offerto da", }, }; @@ -188,30 +190,23 @@ export class ChatUI { if (this._showPoweredBy) { const footer = document.createElement("div"); footer.className = "vektra-chat-powered-by"; + // If poweredByText is set, the whole footer is a single link with that + // text. Otherwise the footer is " Vektra". + // A blank or invalid poweredByUrl falls back to the Vektra default URL. const url = _isSafeLinkUrl(this._poweredByUrl) ? this._poweredByUrl.trim() : "https://vektralabs.github.io"; + const link = document.createElement("a"); + link.href = url; + link.target = "_blank"; + link.rel = "noopener noreferrer"; if (this._poweredByText) { - // Operator-supplied text is rendered as a single link (textContent, - // not innerHTML, so no HTML injection). If no custom URL was given, - // we still link to the Vektra default — operators who want a - // non-link footer can just pass a blank data-powered-by-url (caught - // by _isSafeLinkUrl returning false) — actually the default wins in - // that case too; we keep it simple and always render a link. 
- const link = document.createElement("a"); - link.href = url; - link.target = "_blank"; - link.rel = "noopener noreferrer"; link.textContent = this._poweredByText; footer.appendChild(link); } else { - const link = document.createElement("a"); - link.href = url; - link.target = "_blank"; - link.rel = "noopener noreferrer"; link.textContent = "Vektra"; - footer.textContent = "Powered by "; + footer.textContent = `${this._lang.poweredBy} `; footer.appendChild(link); } this._panel.appendChild(footer); diff --git a/vektra-learn/widget/src/index.js b/vektra-learn/widget/src/index.js index 60be37cc..40fc58e0 100644 --- a/vektra-learn/widget/src/index.js +++ b/vektra-learn/widget/src/index.js @@ -229,10 +229,15 @@ function _clearStored(courseId) { checkConnection(); } - // Wait for DOM to be ready before creating UI elements + // Wait for DOM to be ready before creating UI elements. init is async + // (awaits restoreConversation); wrap with .catch so any unexpected + // rejection surfaces in the console rather than as an unhandled promise. + function runInit() { + init().catch((err) => console.error("[vektra-chat] init failed:", err)); + } if (document.readyState === "loading") { - document.addEventListener("DOMContentLoaded", init); + document.addEventListener("DOMContentLoaded", runInit); } else { - init(); + runInit(); } })(); diff --git a/vektra-shared/src/vektra_shared/types.py b/vektra-shared/src/vektra_shared/types.py index e5507854..d1d90342 100644 --- a/vektra-shared/src/vektra_shared/types.py +++ b/vektra-shared/src/vektra_shared/types.py @@ -284,8 +284,9 @@ class SourceRef: citation_id: UUID document_version: int = 1 # from SearchResult (REQ-056) # FEAT-012: filename of the source document so students can see citations - # like "[1] lecture-07.pdf (0.82)" instead of a chunk UUID. None when - # the referenced document row was soft-deleted or the DB lookup failed. + # like "[1] lecture-07.pdf (0.82)" instead of a chunk UUID. 
Soft-deleted + # documents (REQ-057) appear with an " (archived)" suffix; None only when + # the DB lookup failed (transient error, DB not initialised in tests). document_name: str | None = None From c57e5f5f7924a201521b27bcdb884fdd00e1b767 Mon Sep 17 00:00:00 2001 From: Francesco Vadicamo Date: Tue, 21 Apr 2026 21:17:49 +0000 Subject: [PATCH 16/55] docs(backlog): track deferred items from PR #66 review Three low-priority items deferred during the PR #66 self-review, now tracked in the backlog so they don't get lost once the PR merges. - DEBT-017: consolidate namespace-resolution logic between `_resolve_namespace_from_token` and `course_query` in vektra-learn - DEBT-018: scope the widget `--vektra-primary` override to widget roots and dedupe the style node across instantiations - DEBT-019: add a unit assertion for `document_name` on the streaming sources payload to complement `test_execute_populates_document_name` Co-Authored-By: Claude Opus 4.7 (1M context) --- .s2s/BACKLOG.md | 56 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/.s2s/BACKLOG.md b/.s2s/BACKLOG.md index a494132a..1e9a46bb 100644 --- a/.s2s/BACKLOG.md +++ b/.s2s/BACKLOG.md @@ -664,6 +664,62 @@ As a result, `conversation.j2` and `TemplateRenderer.render_conversation()` are --- +### DEBT-017: Consolidate namespace-resolution logic in vektra-learn + +**Status**: planned | **Priority**: low | **Created**: 2026-04-21 +**Origin**: CodeRabbit review on PR #66 (v0.5.0), `vektra-learn/src/vektra_learn/api.py:498-512` + +**Context**: `_resolve_namespace_from_token()` (added for WI-1 in v0.5.0) re-implements the same fallback chain that `course_query` does inline around lines 663-673 and 694-695: read `course_id` from the JWT, fall back to `namespace` claim, default to `course_id`. 
The learn-query path additionally performs an enrollment lookup when `VEKTRA_LEARN_REQUIRE_ENROLLMENT=true` which is interleaved with the plain resolution, so a naive extraction would miss that branch.
+
+**Proposed approach**: extend `_resolve_namespace_from_token` to return a `(course_id, namespace, namespace_source)` tuple, then refactor `course_query` to call it for the non-enrollment branch while keeping the enrollment path inline. Both endpoints stay in lockstep if the JWT schema evolves (e.g., a new claim is added).
+
+**Acceptance criteria**:
+- [ ] Single helper used by both `get_conversation_turns` and `course_query` (non-enrollment branch)
+- [ ] Enrollment-required branch unchanged
+- [ ] Tests cover both call sites against a shared fixture set
+- [ ] No behaviour change to error codes (ERR-LEARN-003 on missing course_id)
+
+---
+
+### DEBT-018: Scope widget `--vektra-primary` override and dedupe style node
+
+**Status**: planned | **Priority**: low | **Created**: 2026-04-21
+**Origin**: CodeRabbit review on PR #66 (v0.5.0), `vektra-learn/widget/src/chat-ui.js:138-144`
+
+**Context**: `ChatUI._injectStyles()` appends a new `