20 changes: 16 additions & 4 deletions README.md
@@ -13,10 +13,11 @@ Character-focused local chatbot with RAG support (ChromaDB + LangChain), CLI and
## What It Includes

- Local chat runtime backed by `llama-cpp-python`
- Character-card-driven prompting (`cards/*.json`)
- Character-card-driven prompting (`cards/*.json`) with avatar display
- RAG retrieval from ChromaDB collections
- Dynamic context budgeting and history management
- GPU offload auto-layer calculation and KV cache quant support
- Web UI (FastAPI + Jinja2 + HTMX): chat, session management, RAG management, diagnostics
- Scripted workflows for analyzing, pushing, and managing RAG data

## Current Runtime Entry Points
@@ -75,9 +76,20 @@ Notes for web chat behavior:

- Shows status updates (`Ready`, `Sending`, `Thinking`, `Streaming`, `Timed out`).
- Applies a stream timeout and surfaces a `Retry` button on stream failure.
- Supports named session save + explicit session picker load in the sidebar.
- Shows both latest retrieval debug stats and per-turn retrieval trace history.
- Provides quick actions for copy/export and command-equivalent controls (`clear`, `reload`, `help`).
- Sidebar has three tabs: **Character** (avatar + card info), **Sessions** (save/load/search), **Debug** (per-turn retrieval trace + diagnostics).
- Named session save/load and full-text session search with character and date filters.
- Token budget bar in the Diagnostics panel (Debug tab) shows real-time context-window allocation (system / history / RAG / examples / input / reserved / free).
- Per-turn stats: estimated prompt and completion tokens, context window fill %, RAG chunks used.
- Quick actions for copy/export (TXT, JSON, ZIP bundle) and command-equivalent controls (`clear`, `reload`, `help`).
- Saveable preset profiles for retrieval settings (MMR, rerank, multi-query, k values).
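
The per-turn stats listed above can be approximated as follows. This is a minimal sketch rather than the project's code: it assumes the chars/4 heuristic that the compact reference in this change mentions for completion tokens, and the function names and the 2048-token window are illustrative assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters (assumed heuristic)."""
    return len(text) // 4


def context_fill_percent(prompt_tokens: int, context_window: int) -> float:
    """Share of the context window taken by the prompt, as a percentage."""
    if context_window <= 0:
        return 0.0
    return round(100.0 * prompt_tokens / context_window, 1)


prompt = "You are SHODAN. " * 64  # 16 characters repeated 64 times = 1024 characters
reply = "Look at you, hacker."    # 20 characters

prompt_tokens = estimate_tokens(prompt)
completion_tokens = estimate_tokens(reply)
fill = context_fill_percent(prompt_tokens, 2048)  # hypothetical 2048-token window

print(prompt_tokens, completion_tokens, fill)
```

A character-based estimate avoids loading a tokenizer just to render the stats, at the cost of accuracy for text that tokenizes unusually.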

RAG management UI at **`/rag`** (link in the chat sidebar):

- Upload new source files (`.txt`) and create ChromaDB collections directly from the browser.
- View, lint, and run coverage analysis on `rag_data/` files.
- List, query, rebuild, and delete collections.
- Run fixture evaluations and view retrieval trend history.
- View embedding benchmark results.

## Setup

Binary file added cards/Shodan-specV2.jpg
11 changes: 11 additions & 0 deletions core/conversation_manager.py
@@ -143,6 +143,17 @@ def __init__(self) -> None:
            "mes": {"mode": "unknown", "returned": 0, "candidates": 0, "queries": 0, "rerank_applied": False},
            "cleanup": {"main": 0, "mes": 0, "cross_removed": 0},
        }
        self.last_token_budget: dict[str, int] = {
            "system_prompt_tokens": 0,
            "history_tokens": 0,
            "rag_tokens": 0,
            "examples_tokens": 0,
            "input_tokens": 0,
            "total_estimated": 0,
            "context_window": 0,
            "available_for_context": 0,
            "reserved_for_response": 0,
        }
        self._vector_client: object | None = None
        self._vector_embedder: object | None = None
        self._cross_encoder: object | None = None
12 changes: 12 additions & 0 deletions core/conversation_prompt_history_mixin.py
@@ -186,6 +186,18 @@ def _prepare_dynamic_vector_context(self, message: str, mes_example: str) -> tup
        vector_context = str(allocation["allocated_context"])
        allocated_history = str(allocation["allocated_history"])

        self.last_token_budget = {
            "system_prompt_tokens": budget.system_prompt_tokens,
            "history_tokens": int(allocation["history_tokens"]),
            "rag_tokens": int(allocation["context_tokens"]),
            "examples_tokens": int(allocation["examples_tokens"]),
            "input_tokens": int(allocation["input_tokens"]),
            "total_estimated": int(allocation["total_allocated"]) + budget.system_prompt_tokens,
            "context_window": budget.total_context,
            "available_for_context": budget.available_for_context,
            "reserved_for_response": budget.reserved_for_response,
        }

        if self.runtime_config.debug_context:
            logger.debug(self.context_manager.get_context_info(budget, allocation))
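
To illustrate how a dict shaped like `last_token_budget` could drive a stacked budget bar, here is a minimal sketch. The key names come from the hunk above. The `budget_segments` helper, the rule used for free headroom, and the sample numbers are assumptions for illustration, not the project's rendering code.

```python
def budget_segments(budget: dict[str, int]) -> dict[str, float]:
    """Convert a token-budget dict into percentage widths for a stacked bar."""
    window = budget["context_window"]
    if window <= 0:
        return {}
    used = {
        "system": budget["system_prompt_tokens"],
        "history": budget["history_tokens"],
        "rag": budget["rag_tokens"],
        "examples": budget["examples_tokens"],
        "input": budget["input_tokens"],
        "reserved": budget["reserved_for_response"],
    }
    # Assumption: whatever is neither allocated nor reserved counts as free headroom.
    used["free"] = max(window - sum(used.values()), 0)
    return {name: round(100.0 * tokens / window, 1) for name, tokens in used.items()}


# Made-up numbers, chosen only so the arithmetic is easy to follow.
example_budget = {
    "system_prompt_tokens": 400,
    "history_tokens": 1200,
    "rag_tokens": 800,
    "examples_tokens": 200,
    "input_tokens": 100,
    "total_estimated": 2700,
    "context_window": 4000,
    "available_for_context": 3500,
    "reserved_for_response": 500,
}
segments = budget_segments(example_budget)
print(segments)
```

Clamping the free segment at zero keeps the bar well-formed if the estimates ever exceed the window.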

30 changes: 30 additions & 0 deletions core/rag_manager.py
@@ -204,6 +204,36 @@ def file_content(config: RagScriptConfig, filename: str) -> str | None:
    return candidate.read_text(encoding="utf-8")


def save_rag_file(config: RagScriptConfig, stem: str, content: bytes) -> dict[str, Any]:
    """Save *content* as ``{stem}.txt`` in the rag_data directory.

    Raises ``ValueError`` if *stem* is invalid.
    Returns a file-info dict matching the shape produced by :func:`list_rag_files`.
    """
    if not is_valid_stem(stem):
        msg = f"Invalid stem {stem!r}: only letters, digits, underscores, and hyphens are allowed."
        raise ValueError(msg)
    rag_dir = Path(config.documents_directory)
    rag_dir.mkdir(parents=True, exist_ok=True)
    dest = rag_dir / f"{stem}.txt"
    dest.write_bytes(content)
    return {
        "name": dest.name,
        "stem": stem,
        "type": "message_examples" if stem.endswith("_message_examples") else "lore",
        "size": len(content),
        "has_metadata": (rag_dir / f"{stem}.json").exists(),
    }


def list_rag_stems(config: RagScriptConfig) -> list[str]:
    """Return a sorted list of stems for all .txt files in rag_data/."""
    rag_dir = Path(config.documents_directory)
    if not rag_dir.exists():
        return []
    return sorted(p.stem for p in rag_dir.glob("*.txt"))
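
The validate-then-write flow in `save_rag_file` can be tried in isolation with the sketch below. `is_valid_stem` is not shown in this hunk, so `is_valid_stem_sketch` is a hypothetical stand-in derived only from the wording of the error message. `classify_stem` and `save_text_file` are likewise illustrative names, not project functions.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical stand-in for is_valid_stem(); the real helper is not shown in the diff.
_STEM_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_stem_sketch(stem: str) -> bool:
    """Accept only letters, digits, underscores, and hyphens."""
    return _STEM_RE.fullmatch(stem) is not None


def classify_stem(stem: str) -> str:
    """Mirror the type rule used in the file-info dict."""
    return "message_examples" if stem.endswith("_message_examples") else "lore"


def save_text_file(directory: Path, stem: str, content: bytes) -> Path:
    """Validate the stem, then write ``{stem}.txt`` under *directory*."""
    if not is_valid_stem_sketch(stem):
        raise ValueError(f"Invalid stem {stem!r}")
    directory.mkdir(parents=True, exist_ok=True)
    dest = directory / f"{stem}.txt"
    dest.write_bytes(content)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    rag_dir = Path(tmp) / "rag_data"
    saved = save_text_file(rag_dir, "shodan_message_examples", b"hello")
    saved_name = saved.name
    stems = sorted(p.stem for p in rag_dir.glob("*.txt"))

print(saved_name, stems, classify_stem("shodan_lore"))
```

Because the stem is interpolated into a filesystem path, rejecting dots and path separators before writing also guards against uploads that try to escape the target directory.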


# ---------------------------------------------------------------------------
# Linting
# ---------------------------------------------------------------------------
30 changes: 29 additions & 1 deletion docs/future_work/COPILOT_COMPACT_REFERENCE.md
@@ -1,6 +1,6 @@
# Copilot Compact Reference — Implemented State

Last verified: 2026-03-29
Last verified: 2026-04-03

Use this as the single compact reference for implemented work across conversation quality, RAG quality, and web app behavior.

@@ -143,18 +143,46 @@ Primary files:
- **Per-turn diagnostics panel**: collapsible sidebar panel showing Turn, Latency (s), Chars, Main chunks, MES chunks, Cross-removed, and Drift score (colour-coded at warning/fail thresholds) for the last 40 turns. Auto-refreshes after each stream. Route: `GET /chat/diagnostics`.
- **Saveable preset profiles**: collapsible sidebar panel for saving/applying/deleting named snapshots of 7 retrieval settings (`use_mmr`, `rag_rerank_enabled`, `rag_sentence_compression_enabled`, `rag_multi_query_enabled`, `rag_k`, `rag_k_mes`, `debug_context`). Profiles persisted in `configs/profiles.json`; applied in-place to the live `ConversationRuntimeConfig` without restart. Routes: `GET/POST /settings/profiles/*`.
- **One-click export bundle**: `GET /chat/export/bundle` downloads a ZIP containing `manifest.json`, `conversation.json` (full session), `retrieval_traces.json` (per-turn history), and `drift_history.json`. Button in composer quick-actions.
- **RAG Management UI** (`/rag`): Standalone dark-theme page with left nav. Sections: Collections (list, detail, delete, ad-hoc query, rebuild/push with async job, fingerprint backfill), Files (list, view, lint run/fix, coverage analysis), Evaluate (fixture pack selector, run evaluate-fixtures, results table, retrieval trend history), Benchmark (last-run model comparison table). Long-running ops (push, evaluate) use in-memory `JobStore` + HTMX polling (`every 2s`). Link from chat sidebar.
- **Session history search**: Collapsible "Search sessions" panel inside the Sessions sidebar panel. Searches all saved `logs/web_sessions/session_*.json` files by free text (matches session name and message content), character name filter, and optional date range. Returns matching sessions with inline message excerpts and a Load button. Route: `GET /sessions/search?q=&character=&from_date=&to_date=`.
- **Token budget visualization + per-turn stats** (`/chat/diagnostics`): A stacked colour-coded bar at the top of the Diagnostics panel shows the current context-window allocation split across System prompt, History, RAG context, Examples, User input, Reserved, and Free headroom (green/yellow/red by fill %). The per-turn table now shows estimated Prompt tokens, estimated Completion tokens (chars/4), Context window % fill (colour-coded), and RAG chunks retrieved. A session-totals row below the table shows cumulative prompt/completion tokens and average context %. Backend: `ConversationManager.last_token_budget` dict populated from `ContextBudget` + `allocate_content()` return values in `_prepare_dynamic_vector_context()`; stored per trace in `_record_retrieval_trace`.
- **Character avatar display + tabbed sidebar**: The chat sidebar is restructured into three tabs
— 🎭 Character, 💾 Sessions, 🔍 Debug — with a compact always-visible header showing a small
avatar and character name. The Character tab displays the full avatar image (if present) alongside
card metadata. Route: `GET /characters/avatar` returns the avatar as a `FileResponse`;
`_character_avatar_path()` searches `character_storage/<stem>/avatar.{ext}` then `cards/<stem>.{ext}`.
`has_avatar` bool is passed to the index template context.
- **RAG file upload + create-collection from UI**: The RAG Files page now includes an "Upload
Source File" panel — file picker (`.txt`), auto-filled stem, optional collection name for
immediate ingest. Uploading without a collection name saves the file and refreshes the file list.
With a collection name it triggers a push job. Each lore file row has an "Ingest →" toggle that
reveals an inline form to build a collection from that file. The Collections page has a "Create
New Collection" section with a dropdown of existing file stems. New routes:
`POST /rag/files/upload` (multipart), `POST /rag/collections`.
New backend: `rag_manager.save_rag_file()`, `rag_manager.list_rag_stems()`.
- **Bug fix — creating new ChromaDB collections**: `push_to_collection()` in
`scripts/rag/push_rag_data.py` previously only caught `ValueError` when deleting a non-existent
collection before recreating it. ChromaDB raises `chromadb.errors.NotFoundError` for missing
collections; that exception was uncaught and crashed the entire push. Fixed by widening the
`except` clause to use the already-defined `MISSING_COLLECTION_ERRORS` tuple
(`ValueError`, `NotFoundError`). This was a latent bug exposed by the first UI-driven
collection creation.
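
The bug-fix pattern in the last item, catching a tuple of exception classes around the delete step, can be sketched as follows. To keep the example runnable without ChromaDB installed, `NotFoundError` here is a local stand-in for `chromadb.errors.NotFoundError`. `FakeClient` and `recreate_collection` are hypothetical and do not reproduce `push_to_collection()`.

```python
class NotFoundError(Exception):
    """Stand-in for chromadb.errors.NotFoundError so the sketch runs without ChromaDB."""


# A tuple of exception classes can be passed directly to ``except``.
MISSING_COLLECTION_ERRORS = (ValueError, NotFoundError)


class FakeClient:
    """Hypothetical in-memory client that mimics delete/create semantics."""

    def __init__(self) -> None:
        self.collections: set[str] = set()

    def delete_collection(self, name: str) -> None:
        if name not in self.collections:
            raise NotFoundError(f"Collection {name} does not exist")
        self.collections.remove(name)

    def create_collection(self, name: str) -> None:
        self.collections.add(name)


def recreate_collection(client: FakeClient, name: str) -> bool:
    """Delete *name* if present, then create it. Returns True if an old copy was removed."""
    removed = True
    try:
        client.delete_collection(name)
    except MISSING_COLLECTION_ERRORS:
        # First-time creation: there is nothing to delete, so carry on.
        removed = False
    client.create_collection(name)
    return removed


client = FakeClient()
first = recreate_collection(client, "shodan_lore")   # nothing to delete yet
second = recreate_collection(client, "shodan_lore")  # replaces the existing collection
print(first, second)
```

Catching only the "missing collection" errors, rather than a bare `except Exception`, lets unrelated failures such as connection problems still surface.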

Primary files:

- `web_app.py`
- `main.py`
- `core/preset_profiles.py`
- `core/rag_manager.py` (+ `save_rag_file`, `list_rag_stems`, `_character_avatar_path` helpers)
- `core/job_queue.py`
- `scripts/rag/push_rag_data.py` (bug fix: `MISSING_COLLECTION_ERRORS` in `push_to_collection`)
- `templates/index.html`
- `templates/chat_message_pair.html`
- `templates/chat_messages.html`
- `templates/chat_single_message.html`
- `templates/diagnostics_panel.html`
- `templates/presets_panel.html`
- `templates/rag/layout.html` (+ 13 RAG partial templates incl. `upload_result.html`)
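
For the avatar lookup described above (`character_storage/<stem>/avatar.{ext}` first, then `cards/<stem>.{ext}`), a minimal sketch of the search order might look like this. The extension list, the `find_avatar` name, and the `base_dir` parameter are assumptions. The real `_character_avatar_path()` may differ.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: the set and order of extensions is illustrative only.
AVATAR_EXTENSIONS = ("png", "jpg", "jpeg", "webp")


def find_avatar(base_dir: Path, stem: str) -> Optional[Path]:
    """Return the first existing avatar file for *stem*, or None."""
    candidates = [
        base_dir / "character_storage" / stem / f"avatar.{ext}" for ext in AVATAR_EXTENSIONS
    ]
    candidates += [base_dir / "cards" / f"{stem}.{ext}" for ext in AVATAR_EXTENSIONS]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "cards").mkdir()
    (base / "cards" / "Shodan-specV2.jpg").write_bytes(b"\xff\xd8")  # placeholder bytes
    found = find_avatar(base, "Shodan-specV2")
    found_name = found.name if found else None
    missing = find_avatar(base, "unknown")

has_avatar = found_name is not None
print(found_name, missing, has_avatar)
```

Returning `None` when nothing matches makes it straightforward to derive a `has_avatar` flag for the template context.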

## Current Defaults Snapshot
