ossirytk · ossirytk · Apr 4, 2026 · Apr 3, 2026 · Apr 4, 2026
diff --git a/README.md b/README.md
@@ -13,10 +13,11 @@ Character-focused local chatbot with RAG support (ChromaDB + LangChain), CLI and
 ## What It Includes
 
 - Local chat runtime backed by `llama-cpp-python`
-- Character-card-driven prompting (`cards/*.json`)
+- Character-card-driven prompting (`cards/*.json`) with avatar display
 - RAG retrieval from ChromaDB collections
 - Dynamic context budgeting and history management
 - GPU offload auto-layer calculation and KV cache quant support
+- Web UI (FastAPI + Jinja2 + HTMX): chat, session management, RAG management, diagnostics
 - Scripted workflows for analyzing, pushing, and managing RAG data
 
 ## Current Runtime Entry Points
@@ -75,9 +76,20 @@ Notes for web chat behavior:
 
 - Shows status updates (`Ready`, `Sending`, `Thinking`, `Streaming`, `Timed out`).
 - Applies a stream timeout and surfaces a `Retry` button on stream failure.
-- Supports named session save + explicit session picker load in the sidebar.
-- Shows both latest retrieval debug stats and per-turn retrieval trace history.
-- Provides quick actions for copy/export and command-equivalent controls (`clear`, `reload`, `help`).
+- Sidebar has three tabs: **Character** (avatar + card info), **Sessions** (save/load/search), **Debug** (per-turn retrieval trace + diagnostics).
+- Named session save/load and full-text session search with character and date filters.
+- Token budget bar in the Diagnostics panel (Debug tab) shows real-time context-window allocation (system / history / RAG / examples / input / reserved / free).
+- Per-turn stats: estimated prompt and completion tokens, context window fill %, RAG chunks used.
+- Quick actions for copy/export (TXT, JSON, ZIP bundle) and command-equivalent controls (`clear`, `reload`, `help`).
+- Saveable preset profiles for retrieval settings (MMR, rerank, multi-query, k values).
+
+RAG management UI at **`/rag`** (link in the chat sidebar):
+
+- Upload new source files (`.txt`) and create ChromaDB collections directly from the browser.
+- View, lint, and run coverage analysis on `rag_data/` files.
+- List, query, rebuild, and delete collections.
+- Run fixture evaluations and view retrieval trend history.
+- View embedding benchmark results.
 
 ## Setup
 

diff --git a/cards/Shodan-specV2.jpg b/cards/Shodan-specV2.jpg
diff --git a/core/conversation_manager.py b/core/conversation_manager.py
@@ -143,6 +143,17 @@ def __init__(self) -> None:
             "mes": {"mode": "unknown", "returned": 0, "candidates": 0, "queries": 0, "rerank_applied": False},
             "cleanup": {"main": 0, "mes": 0, "cross_removed": 0},
         }
+        self.last_token_budget: dict[str, int] = {
+            "system_prompt_tokens": 0,
+            "history_tokens": 0,
+            "rag_tokens": 0,
+            "examples_tokens": 0,
+            "input_tokens": 0,
+            "total_estimated": 0,
+            "context_window": 0,
+            "available_for_context": 0,
+            "reserved_for_response": 0,
+        }
         self._vector_client: object | None = None
         self._vector_embedder: object | None = None
         self._cross_encoder: object | None = None

diff --git a/core/conversation_prompt_history_mixin.py b/core/conversation_prompt_history_mixin.py
@@ -186,6 +186,18 @@ def _prepare_dynamic_vector_context(self, message: str, mes_example: str) -> tup
         vector_context = str(allocation["allocated_context"])
         allocated_history = str(allocation["allocated_history"])
 
+        self.last_token_budget = {
+            "system_prompt_tokens": budget.system_prompt_tokens,
+            "history_tokens": int(allocation["history_tokens"]),
+            "rag_tokens": int(allocation["context_tokens"]),
+            "examples_tokens": int(allocation["examples_tokens"]),
+            "input_tokens": int(allocation["input_tokens"]),
+            "total_estimated": int(allocation["total_allocated"]) + budget.system_prompt_tokens,
+            "context_window": budget.total_context,
+            "available_for_context": budget.available_for_context,
+            "reserved_for_response": budget.reserved_for_response,
+        }
+
         if self.runtime_config.debug_context:
             logger.debug(self.context_manager.get_context_info(budget, allocation))
 

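Review note: the dict populated above carries enough information for a consumer (for example the diagnostics panel) to derive free headroom and a fill percentage. The following is a minimal sketch only. The helper name `summarize_budget` is hypothetical, and it assumes that `total_estimated` already covers system prompt, history, RAG, examples, and input, and that `reserved_for_response` counts toward the used portion of the window. Neither assumption is confirmed by this diff.

```python
def summarize_budget(budget: dict[str, int]) -> dict[str, float]:
    """Derive used/free tokens and fill % from a last_token_budget-shaped dict.

    Hypothetical helper for illustration; not part of this PR.
    """
    window = budget.get("context_window", 0)
    if window <= 0:
        # The dict is initialised to zeros before the first turn,
        # so guard against dividing by zero.
        return {"used": 0, "free": 0, "fill_pct": 0.0}

    # Assumption: the reserved response tokens count as occupied space.
    used = budget.get("total_estimated", 0) + budget.get("reserved_for_response", 0)
    free = max(window - used, 0)
    fill_pct = round(100.0 * min(used, window) / window, 1)
    return {"used": used, "free": free, "fill_pct": fill_pct}


example_budget = {
    "system_prompt_tokens": 300,
    "history_tokens": 900,
    "rag_tokens": 600,
    "examples_tokens": 150,
    "input_tokens": 50,
    "total_estimated": 2000,
    "context_window": 4096,
    "available_for_context": 3584,
    "reserved_for_response": 512,
}
summary = summarize_budget(example_budget)
```

Clamping `free` at zero and `fill_pct` at 100 keeps a stacked bar renderable even when the estimate overshoots the window.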
diff --git a/core/rag_manager.py b/core/rag_manager.py
@@ -204,6 +204,36 @@ def file_content(config: RagScriptConfig, filename: str) -> str | None:
     return candidate.read_text(encoding="utf-8")
 
 
+def save_rag_file(config: RagScriptConfig, stem: str, content: bytes) -> dict[str, Any]:
+    """Save *content* as ``{stem}.txt`` in the rag_data directory.
+
+    Raises ``ValueError`` if *stem* is invalid.
+    Returns a file-info dict matching the shape produced by :func:`list_rag_files`.
+    """
+    if not is_valid_stem(stem):
+        msg = f"Invalid stem {stem!r}: only letters, digits, underscores, and hyphens are allowed."
+        raise ValueError(msg)
+    rag_dir = Path(config.documents_directory)
+    rag_dir.mkdir(parents=True, exist_ok=True)
+    dest = rag_dir / f"{stem}.txt"
+    dest.write_bytes(content)
+    return {
+        "name": dest.name,
+        "stem": stem,
+        "type": "message_examples" if stem.endswith("_message_examples") else "lore",
+        "size": len(content),
+        "has_metadata": (rag_dir / f"{stem}.json").exists(),
+    }
+
+
+def list_rag_stems(config: RagScriptConfig) -> list[str]:
+    """Return a sorted list of stems for all .txt files in rag_data/."""
+    rag_dir = Path(config.documents_directory)
+    if not rag_dir.exists():
+        return []
+    return sorted(p.stem for p in rag_dir.glob("*.txt"))
+
+
 # ---------------------------------------------------------------------------
 # Linting
 # ---------------------------------------------------------------------------

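Review note: `save_rag_file()` relies on `is_valid_stem()`, whose body is not shown in this diff. Since the stem comes from a browser upload and is joined into a filesystem path, the check is what prevents path traversal. Below is a minimal sketch of what such a validator might look like, inferred only from the error message ("letters, digits, underscores, and hyphens"). The name `is_valid_stem_sketch` and the ASCII-only interpretation of "letters" are assumptions, not the repository's actual implementation.

```python
import re

# Assumption: "letters" means ASCII letters; the real check may differ.
_STEM_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_stem_sketch(stem: str) -> bool:
    """Return True if *stem* contains only letters, digits, underscores, and hyphens."""
    # fullmatch rejects empty strings as well as dots, slashes, and spaces,
    # so values such as "../secrets" cannot escape the target directory.
    return _STEM_PATTERN.fullmatch(stem) is not None


def classify_stem(stem: str) -> str:
    """Mirror the type rule used in save_rag_file()."""
    return "message_examples" if stem.endswith("_message_examples") else "lore"


checks = {
    "shodan": is_valid_stem_sketch("shodan"),
    "shodan_message_examples": is_valid_stem_sketch("shodan_message_examples"),
    "../secrets": is_valid_stem_sketch("../secrets"),
    "notes.txt": is_valid_stem_sketch("notes.txt"),
    "": is_valid_stem_sketch(""),
}
```

Using `fullmatch` rather than `match` matters here: `match` would accept a valid prefix followed by disallowed characters.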
diff --git a/docs/future_work/COPILOT_COMPACT_REFERENCE.md b/docs/future_work/COPILOT_COMPACT_REFERENCE.md
@@ -1,6 +1,6 @@
 # Copilot Compact Reference — Implemented State
 
-Last verified: 2026-03-29
+Last verified: 2026-04-03
 
 Use this as the single compact reference for implemented work across conversation quality, RAG quality, and web app behavior.
 
@@ -143,18 +143,46 @@ Primary files:
 - **Per-turn diagnostics panel**: collapsible sidebar panel showing Turn, Latency (s), Chars, Main chunks, MES chunks, Cross-removed, and Drift score (colour-coded at warning/fail thresholds) for the last 40 turns. Auto-refreshes after each stream. Route: `GET /chat/diagnostics`.
 - **Saveable preset profiles**: collapsible sidebar panel for saving/applying/deleting named snapshots of 7 retrieval settings (`use_mmr`, `rag_rerank_enabled`, `rag_sentence_compression_enabled`, `rag_multi_query_enabled`, `rag_k`, `rag_k_mes`, `debug_context`). Profiles persisted in `configs/profiles.json`; applied in-place to the live `ConversationRuntimeConfig` without restart. Routes: `GET/POST /settings/profiles/*`.
 - **One-click export bundle**: `GET /chat/export/bundle` downloads a ZIP containing `manifest.json`, `conversation.json` (full session), `retrieval_traces.json` (per-turn history), and `drift_history.json`. Button in composer quick-actions.
+- **RAG Management UI** (`/rag`): Standalone dark-theme page with left nav. Sections: Collections (list, detail, delete, ad-hoc query, rebuild/push with async job, fingerprint backfill), Files (list, view, lint run/fix, coverage analysis), Evaluate (fixture pack selector, run evaluate-fixtures, results table, retrieval trend history), Benchmark (last-run model comparison table). Long-running ops (push, evaluate) use in-memory `JobStore` + HTMX polling (`every 2s`). Link from chat sidebar.
+- **Session history search**: Collapsible "Search sessions" panel inside the Sessions sidebar panel. Searches all saved `logs/web_sessions/session_*.json` files by free text (matches session name and message content), character name filter, and optional date range. Returns matching sessions with inline message excerpts and a Load button. Route: `GET /sessions/search?q=&character=&from_date=&to_date=`.
+- **Token budget visualization + per-turn stats** (`/chat/diagnostics`): A stacked colour-coded bar at the top of the Diagnostics panel shows the current context-window allocation split across System prompt, History, RAG context, Examples, User input, Reserved, and Free headroom (green/yellow/red by fill %). The per-turn table now shows estimated Prompt tokens, estimated Completion tokens (chars/4), Context window % fill (colour-coded), and RAG chunks retrieved. A session-totals row below the table shows cumulative prompt/completion tokens and average context %. Backend: `ConversationManager.last_token_budget` dict populated from `ContextBudget` + `allocate_content()` return values in `_prepare_dynamic_vector_context()`; stored per trace in `_record_retrieval_trace`.
+- **Character avatar display + tabbed sidebar**: The chat sidebar is restructured into three tabs
+  — 🎭 Character, 💾 Sessions, 🔍 Debug — with a compact always-visible header showing a small
+  avatar and character name. The Character tab displays the full avatar image (if present) alongside
+  card metadata. Route: `GET /characters/avatar` returns the avatar as a `FileResponse`;
+  `_character_avatar_path()` searches `character_storage/<stem>/avatar.{ext}` then `cards/<stem>.{ext}`.
+  `has_avatar` bool is passed to the index template context.
+- **RAG file upload + create-collection from UI**: The RAG Files page now includes an "Upload
+  Source File" panel — file picker (`.txt`), auto-filled stem, optional collection name for
+  immediate ingest. Uploading without a collection name saves the file and refreshes the file list.
+  With a collection name it triggers a push job. Each lore file row has an "Ingest →" toggle that
+  reveals an inline form to build a collection from that file. The Collections page has a "Create
+  New Collection" section with a dropdown of existing file stems. New routes:
+  `POST /rag/files/upload` (multipart), `POST /rag/collections`.
+  New backend: `rag_manager.save_rag_file()`, `rag_manager.list_rag_stems()`.
+- **Bug fix — creating new ChromaDB collections**: `push_to_collection()` in
+  `scripts/rag/push_rag_data.py` previously only caught `ValueError` when deleting a non-existent
+  collection before recreating it. ChromaDB raises `chromadb.errors.NotFoundError` for missing
+  collections; that exception was uncaught and crashed the entire push. Fixed by widening the
+  `except` clause to use the already-defined `MISSING_COLLECTION_ERRORS` tuple
+  (`ValueError`, `NotFoundError`). This was a latent bug exposed by the first UI-driven
+  collection creation.
 
 Primary files:
 
 - `web_app.py`
 - `main.py`
 - `core/preset_profiles.py`
+- `core/rag_manager.py` (+ `save_rag_file`, `list_rag_stems`, `_character_avatar_path` helpers)
+- `core/job_queue.py`
+- `scripts/rag/push_rag_data.py` (bug fix: `MISSING_COLLECTION_ERRORS` in `push_to_collection`)
 - `templates/index.html`
 - `templates/chat_message_pair.html`
 - `templates/chat_messages.html`
 - `templates/chat_single_message.html`
 - `templates/diagnostics_panel.html`
 - `templates/presets_panel.html`
+- `templates/rag/layout.html` (+ 13 RAG partial templates incl. `upload_result.html`)
 
 ## Current Defaults Snapshot