Quick Reference

API Endpoints - Quick Reference

Base URL: http://localhost:8004/api/v1

Auth note: once setup is complete, most API routes are protected and require a bearer token sent as an `Authorization: Bearer <token>` header.
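Most routes need the token, so a small wrapper saves repeating the header on every call. This is a minimal bash sketch, not part of the API. `API_TOKEN`, `API_BASE` and `CURL` are hypothetical variable names chosen here (`CURL` only exists so the function can be dry-run with a stub), and the sketch assumes routes that are still open simply work without the header.

```bash
# api METHOD PATH [extra curl args...]
# Sends a request to $API_BASE$PATH and adds the bearer header when API_TOKEN is set.
# API_BASE, API_TOKEN and CURL are hypothetical names used only in this sketch.
api() {
  local method=$1 path=$2
  shift 2
  local base=${API_BASE:-http://localhost:8004/api/v1}
  if [ -n "${API_TOKEN:-}" ]; then
    "${CURL:-curl}" -sS -X "$method" -H "Authorization: Bearer $API_TOKEN" "$@" "$base$path"
  else
    # No token yet (e.g. before setup is complete): send the request without it
    "${CURL:-curl}" -sS -X "$method" "$@" "$base$path"
  fi
}
```

Usage: `api GET /knowledge-bases/`, or `api POST /knowledge-bases/ -H "Content-Type: application/json" -d '{"name":"MyKB"}'`.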

Quick Navigation

Health & Status
Knowledge Bases
Documents
Chat & Conversations
Retrieve-only
KB Transfer (Export/Import)
Prompts & Self-Check
Embeddings
LLM Models
Ollama
Settings

Health & Status

Method	Endpoint	Description
GET	`/health`	Check API health
GET	`/ready`	Check dependencies readiness
GET	`/info`	Get API info and configuration
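Scripts that start the stack usually need to wait for `/ready` before calling anything else. The sketch below assumes `/ready` answers with a non-2xx status (for example 503) until its dependencies are up, which the table does not spell out, and that it does not require the bearer token. `API_BASE` and `CURL` are hypothetical overrides.

```bash
# wait_until_ready [TIMEOUT_SECONDS]: poll /ready once per second until it
# answers with a 2xx status, or give up after TIMEOUT_SECONDS (default 60).
wait_until_ready() {
  local timeout=${1:-60} waited=0
  local base=${API_BASE:-http://localhost:8004/api/v1}
  # -f makes curl exit non-zero on HTTP errors such as 503
  until "${CURL:-curl}" -fsS --max-time 5 -o /dev/null "$base/ready"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "API not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage: `wait_until_ready 120 && echo "API is up"`.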

Knowledge Bases

Method	Endpoint	Description	Auth
POST	`/knowledge-bases/`	Create knowledge base	-
GET	`/knowledge-bases/`	List knowledge bases	-
GET	`/knowledge-bases/{kb_id}`	Get KB details	-
PUT	`/knowledge-bases/{kb_id}`	Update KB	-
GET	`/knowledge-bases/{kb_id}/retrieval-settings`	Get KB retrieval settings	-
PUT	`/knowledge-bases/{kb_id}/retrieval-settings`	Update KB retrieval settings	-
DELETE	`/knowledge-bases/{kb_id}/retrieval-settings`	Clear KB retrieval settings	-
DELETE	`/knowledge-bases/{kb_id}`	Delete KB (soft)	-
POST	`/knowledge-bases/{kb_id}/reprocess`	Reprocess all documents	-
POST	`/knowledge-bases/{kb_id}/regenerate_chat_titles`	Regenerate chat titles	-
POST	`/knowledge-bases/{kb_id}/cleanup-orphaned-chunks`	Clean orphaned chunks	-

Key Parameters:

name: KB name (required)
embedding_model: Embedding model name (optional, default from settings)
chunking_strategy: fixed_size, semantic, or paragraph (optional)
chunk_size: Chunk size in characters (optional, default 1000)
chunk_overlap: Overlap between consecutive chunks, in characters (optional, default 200)
use_llm_chat_titles: Per-KB override for LLM-generated chat titles (optional; null = use the global default)
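A create request only needs `name`. This bash sketch builds the JSON body and leaves out any optional field you don't pass, so the server-side defaults still apply. `kb_payload` and `json_escape` are hypothetical helpers. The escaping covers backslashes, quotes, newlines and tabs, but not other control characters.

```bash
# json_escape STRING: escape backslashes, double quotes, newlines and tabs for a JSON string
json_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# kb_payload NAME [STRATEGY] [CHUNK_SIZE] [CHUNK_OVERLAP]: print a create-KB body
kb_payload() {
  local name=$1 strategy=${2:-} size=${3:-} overlap=${4:-}
  case $strategy in
    ''|fixed_size|semantic|paragraph) ;;
    *) echo "unknown chunking_strategy: $strategy" >&2; return 1 ;;
  esac
  local json="{\"name\":\"$(json_escape "$name")\""
  if [ -n "$strategy" ]; then json+=",\"chunking_strategy\":\"$strategy\""; fi
  if [ -n "$size" ]; then json+=",\"chunk_size\":$size"; fi
  if [ -n "$overlap" ]; then json+=",\"chunk_overlap\":$overlap"; fi
  printf '%s}\n' "$json"
}
```

Usage: `curl -sS -X POST http://localhost:8004/api/v1/knowledge-bases/ -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "$(kb_payload MyKB paragraph 800 100)"`.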

Documents

Method	Endpoint	Description	Auth
POST	`/documents/`	Upload document	-
GET	`/documents/`	List documents	-
GET	`/documents/{doc_id}`	Get document with content	-
GET	`/documents/{doc_id}/status`	Get processing status	-
DELETE	`/documents/{doc_id}`	Delete document	-
POST	`/documents/{doc_id}/reprocess`	Reprocess document	-
POST	`/documents/{doc_id}/analyze`	Analyze structure	-
POST	`/documents/{doc_id}/structure/apply`	Apply structure	-
GET	`/documents/{doc_id}/structure`	Get structure	-

Upload Format: multipart/form-data

file: File to upload
knowledge_base_id: Target KB UUID
filename: Custom filename (optional)
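The upload is a plain multipart request. In this bash sketch, `upload_doc` is a hypothetical helper, `API_TOKEN` is assumed to hold a bearer token, and `CURL` can be swapped for a stub to dry-run the call. It checks that the file exists before sending and only adds `filename` when one is given.

```bash
# upload_doc FILE KB_ID [FILENAME]: multipart upload to POST /documents/
upload_doc() {
  local file=$1 kb_id=$2 name=${3:-}
  local base=${API_BASE:-http://localhost:8004/api/v1}
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # -F sends multipart/form-data; the "@" prefix makes curl attach the file contents
  local -a form=(-F "file=@$file" -F "knowledge_base_id=$kb_id")
  if [ -n "$name" ]; then form+=(-F "filename=$name"); fi
  "${CURL:-curl}" -sS -X POST -H "Authorization: Bearer ${API_TOKEN:-}" \
    "${form[@]}" "$base/documents/"
}
```

Usage: `upload_doc ./handbook.pdf "$KB_ID" "Employee Handbook.pdf" | jq -r .id`.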

Status Polling:

Poll /documents/{doc_id}/status every 1-2 seconds.
Stop when the status field is completed or failed.
Use progress_percentage (0-100) and processing_stage to drive UI progress indicators.
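The polling rules can be wrapped in a bash function with a timeout. The exit codes are this sketch's own convention: 0 means completed, 2 means failed, 1 means timed out. `wait_for_document`, `API_TOKEN` and `CURL` are hypothetical names. A regex extracts the flat `status` field so the function does not need jq. This assumes the field appears as `"status": "<value>"` in the response.

```bash
# wait_for_document DOC_ID [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_document() {
  local doc_id=$1 timeout=${2:-300} interval=${3:-2}
  local base=${API_BASE:-http://localhost:8004/api/v1}
  local re='"status"[[:space:]]*:[[:space:]]*"([a-z_]+)"'
  local waited=0 body
  while [ "$waited" -lt "$timeout" ]; do
    body=$("${CURL:-curl}" -sS -H "Authorization: Bearer ${API_TOKEN:-}" \
      "$base/documents/$doc_id/status")
    # Transient errors or unexpected bodies simply fall through to the next poll
    if [[ $body =~ $re ]]; then
      case ${BASH_REMATCH[1]} in
        completed) return 0 ;;
        failed) echo "processing failed: $doc_id" >&2; return 2 ;;
      esac
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s: $doc_id" >&2
  return 1
}
```

Usage: `wait_for_document "$DOC_ID" 600 2 || echo "document not ready (exit $?)"`.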

Processing Stages:

"Loading document..." - 5%
"Preparing to chunk..." - 15%
"Chunking completed (N chunks)" - 30%
"Generating embeddings (X/N)" - 35-75%
"Embeddings created (N)" - 75%
"Indexing in Qdrant..." - 80%
"Qdrant indexing completed" - 85%
"Indexing BM25..." - 90%
"BM25 indexing completed" - 95%
"Completed" - 100%

Chat & Conversations

Method	Endpoint	Description	Auth
POST	`/chat/`	Query knowledge base	-
GET	`/chat/knowledge-bases/{kb_id}/stats`	Get chat stats	-
GET	`/chat/conversations`	List conversations	-
GET	`/chat/conversations/{id}`	Get conversation	-
PATCH	`/chat/conversations/{id}`	Update conversation title	-
PATCH	`/chat/conversations/{id}/settings`	Update conversation settings	-
GET	`/chat/conversations/{id}/messages`	Get messages	-
DELETE	`/chat/conversations/{id}`	Delete conversation	-

Key Parameters (POST /chat/):

question: User question (required)
knowledge_base_id: Target KB (required)
conversation_id: Continue conversation (optional)
top_k: Chunks to retrieve (optional, default 5)
retrieval_mode: dense or hybrid (optional, default dense)
temperature: LLM temperature (optional, default 0.7)
llm_model: Model name (optional, default from settings)
use_structure: Use document structure (optional, default false)
use_mmr: Use MMR for diversity (optional, default false)
use_self_check: Two-stage answer validation (optional, default true)
context_expansion: Context expansion modes (optional, e.g., ["window"])
context_window: Window size (chunks on each side) for windowed retrieval (optional, default 0)
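A follow-up question reuses the conversation's `conversation_id`, and windowed retrieval combines `context_expansion: ["window"]` with a `context_window` size. The bash sketch below builds such a body. `chat_payload` and `json_escape` are hypothetical helpers. This sketch sends `context_window` only when it is non-zero, together with the "window" mode, and does not show which response field carries the conversation id.

```bash
# json_escape STRING: escape backslashes, double quotes, newlines and tabs for a JSON string
json_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# chat_payload QUESTION KB_ID [CONVERSATION_ID] [WINDOW]: print a POST /chat/ body
chat_payload() {
  local kb=$2 conv=${3:-} window=${4:-0}
  local json="{\"question\":\"$(json_escape "$1")\",\"knowledge_base_id\":\"$kb\""
  if [ -n "$conv" ]; then json+=",\"conversation_id\":\"$conv\""; fi
  # A non-zero window is sent together with the "window" expansion mode
  if [ "$window" -gt 0 ]; then
    json+=",\"context_expansion\":[\"window\"],\"context_window\":$window"
  fi
  printf '%s}\n' "$json"
}
```

Usage: `curl -sS -X POST http://localhost:8004/api/v1/chat/ -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "$(chat_payload 'What changed in v2?' "$KB_ID" "$CONV_ID" 1)"`.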

Hybrid Search Parameters (used when retrieval_mode is hybrid):

lexical_top_k: BM25 top-k (default 10)
hybrid_dense_weight: Dense weight (default 0.7)
hybrid_lexical_weight: Lexical weight (default 0.3)

Reranking Parameters:

rerank_enabled: Enable the rerank stage
rerank_provider: auto, voyage, or cohere
rerank_model: Model name for the selected provider
rerank_candidate_pool: Number of retrieved chunks passed to the reranker
rerank_top_n: Number of chunks kept after reranking
rerank_min_score: Optional minimum relevance score; lower-scoring chunks are dropped
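The rerank fields can be added to any /chat/ or /retrieve/ body. This bash sketch prints them as a JSON fragment. It rejects providers not listed above and a `rerank_top_n` larger than `rerank_candidate_pool`. That second check is only a client-side sanity check, not documented server behavior. `rerank_fields` is a hypothetical helper. `rerank_model` is left out, assuming the provider or global default then applies.

```bash
# rerank_fields PROVIDER CANDIDATE_POOL TOP_N [MIN_SCORE]: print rerank JSON fields
rerank_fields() {
  local provider=$1 pool=$2 top_n=$3 min_score=${4:-}
  case $provider in
    auto|voyage|cohere) ;;
    *) echo "unknown rerank_provider: $provider" >&2; return 1 ;;
  esac
  # Keeping more chunks than were reranked makes no sense
  if [ "$top_n" -gt "$pool" ]; then
    echo "rerank_top_n ($top_n) exceeds rerank_candidate_pool ($pool)" >&2
    return 1
  fi
  local json="\"rerank_enabled\":true,\"rerank_provider\":\"$provider\""
  json+=",\"rerank_candidate_pool\":$pool,\"rerank_top_n\":$top_n"
  if [ -n "$min_score" ]; then json+=",\"rerank_min_score\":$min_score"; fi
  printf '%s\n' "$json"
}
```

Usage: `-d "{\"query\":\"install steps\",\"knowledge_base_id\":\"$KB_ID\",$(rerank_fields auto 20 5)}"`.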

Retrieve-only

Method	Endpoint	Description	Auth
POST	`/retrieve/`	Retrieve chunks without LLM generation	-

Key Parameters (POST /retrieve/):

query: Search query (required)
knowledge_base_id: Target KB (required)
top_k: Chunks to retrieve (optional)
document_ids: Limit retrieval to specific document UUIDs inside the KB (optional)
retrieval_mode: dense or hybrid (optional)
score_threshold: Minimum score filter (optional)
use_structure: Structure-aware retrieval (optional)
use_mmr: Diversity-aware retrieval (optional)
context_expansion: Context expansion modes, e.g. ["window"] (optional)
context_window: Window size in chunks on each side, 0–5 (optional)
debug: Include debug block in response (optional)

Reranking fields are also supported on /retrieve/ with the same semantics as /chat/.
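To limit retrieval to certain documents, send their UUIDs as a JSON array in `document_ids`. This bash sketch takes any number of document ids after the query and KB id. `retrieve_payload` and `json_escape` are hypothetical helpers.

```bash
# json_escape STRING: escape backslashes, double quotes, newlines and tabs for a JSON string
json_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# retrieve_payload QUERY KB_ID [DOC_ID...]: print a POST /retrieve/ body
retrieve_payload() {
  local kb=$2 ids="" id
  local json="{\"query\":\"$(json_escape "$1")\",\"knowledge_base_id\":\"$kb\""
  shift 2
  # Join the remaining arguments into a JSON array of strings
  for id in "$@"; do ids+="${ids:+,}\"$id\""; done
  if [ -n "$ids" ]; then json+=",\"document_ids\":[$ids]"; fi
  printf '%s}\n' "$json"
}
```

Usage: `curl -sS -X POST http://localhost:8004/api/v1/retrieve/ -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "$(retrieve_payload 'install steps' "$KB_ID" "$DOC_A" "$DOC_B")"`.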

KB Transfer (Export/Import)

Method	Endpoint	Description	Auth
POST	`/kb/export`	Export KB(s) to a `.tar.gz` archive	-
POST	`/kb/import`	Import KB archive (multipart)	-
POST	`/kb/export-chats-md`	Export chats as Markdown (`.zip`)	-

Notes:

When importing with mode=merge, you can target an existing KB via target_kb_id (single-KB archives only).
If the archive includes vectors, the target KB's embedding model, provider, and dimension must match the archive's.
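Before sending an archive to /kb/import, a quick client-side check can confirm that it is a readable `.tar.gz` file. This bash sketch only checks the container format. It does not know the archive's internal layout and cannot check embedding compatibility. `check_kb_archive` is a hypothetical helper.

```bash
# check_kb_archive FILE: succeed only if FILE is a non-empty, readable gzip-compressed tar
check_kb_archive() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "missing or empty archive: $file" >&2
    return 1
  fi
  # -t lists entries without extracting; -z requires gzip compression
  if ! tar -tzf "$file" >/dev/null 2>&1; then
    echo "not a readable .tar.gz archive: $file" >&2
    return 1
  fi
}
```

Usage: `check_kb_archive ./kb-export.tar.gz && echo "archive looks valid"`.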

Prompts & Self-Check

Method	Endpoint	Description	Auth
GET	`/prompts/`	List chat prompt versions	-
GET	`/prompts/active`	Get active chat prompt	-
GET	`/prompts/{id}`	Get chat prompt version	-
POST	`/prompts/`	Create chat prompt version	-
POST	`/prompts/{id}/activate`	Activate chat prompt version	-
GET	`/prompts/self-check`	List self-check prompt versions	-
GET	`/prompts/self-check/active`	Get active self-check prompt	-
GET	`/prompts/self-check/{id}`	Get self-check prompt version	-
POST	`/prompts/self-check`	Create self-check prompt version	-
POST	`/prompts/self-check/{id}/activate`	Activate self-check prompt	-

Create Prompt Payload:

name: Optional prompt name
system_content: Required prompt text
activate: Optional boolean to set as active
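Prompt texts usually span several lines, so newlines and quotes must be escaped before they go into `system_content`. This bash sketch builds the payload. `prompt_payload` and `json_escape` are hypothetical helpers. It assumes the self-check create endpoint accepts the same payload as the chat one.

```bash
# json_escape STRING: escape backslashes, double quotes, newlines and tabs for a JSON string
json_escape() {
  local s=$1 bs='\' q='"' nl=$'\n' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"$q"/"$bs$q"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# prompt_payload SYSTEM_CONTENT [NAME] [ACTIVATE: true|false]: print a create-prompt body
prompt_payload() {
  local json="{\"system_content\":\"$(json_escape "$1")\""
  if [ -n "${2:-}" ]; then json+=",\"name\":\"$(json_escape "$2")\""; fi
  if [ "${3:-false}" = true ]; then json+=",\"activate\":true"; fi
  printf '%s}\n' "$json"
}
```

Usage: `curl -sS -X POST http://localhost:8004/api/v1/prompts/ -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -d "$(prompt_payload "$(cat prompt.txt)" strict-v2 true)"`.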

Embeddings

Method	Endpoint	Description
GET	`/embeddings/models`	List all models
GET	`/embeddings/models/{name}`	Get model details
GET	`/embeddings/providers`	List providers
GET	`/embeddings/providers/{provider}/models`	Get provider models

Available Providers:

openai: OpenAI embeddings
voyage: Voyage AI embeddings
ollama: Local Ollama embeddings

Popular Models:

text-embedding-3-small (OpenAI, 1536 dim, $0.02/1M tokens)
text-embedding-3-large (OpenAI, 3072 dim, $0.13/1M tokens)
voyage-4 (Voyage, 1024 dim, $0.06/1M tokens)
nomic-embed-text (Ollama, 768 dim, free)
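With per-million-token prices, the embedding cost is cost = tokens / 1,000,000 × price. Overlap adds to the token count. With fixed_size chunking, each chunk of size S starts S − O characters after the previous one, so roughly S / (S − O) times the document text gets embedded (1.25× for the 1000/200 defaults). This bash sketch uses the prices listed above. Those prices may change, so treat the results as estimates. `embedding_cost` is a hypothetical helper.

```bash
# embedding_cost MODEL TOKENS: estimated USD cost, using the prices listed on this page
embedding_cost() {
  local model=$1 tokens=$2 price
  case $model in
    text-embedding-3-small) price=0.02 ;;
    text-embedding-3-large) price=0.13 ;;
    voyage-4) price=0.06 ;;
    nomic-embed-text) price=0 ;;   # local Ollama model: no per-token fee
    *) echo "no listed price for $model" >&2; return 1 ;;
  esac
  # cost = tokens / 1,000,000 * price per 1M tokens
  awk -v t="$tokens" -v p="$price" 'BEGIN { printf "%.4f\n", t / 1000000 * p }'
}
```

For example, `embedding_cost text-embedding-3-small 5000000` prints `0.1000`.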

LLM Models

Method	Endpoint	Description
GET	`/llm/models`	List all LLM models
GET	`/llm/providers`	List LLM providers

Available Providers:

openai: GPT models
ollama: Local LLM models

Popular Models:

gpt-4o (OpenAI, GPT-4o "omni")
gpt-4o-mini (OpenAI, smaller, faster, and cheaper than gpt-4o)
gpt-4-turbo (OpenAI, GPT-4 Turbo)
llama3.1:8b (Ollama, local)

Ollama

Method	Endpoint	Description
GET	`/ollama/status`	Check Ollama status
GET	`/ollama/models`	List all Ollama models
GET	`/ollama/models/embeddings`	List embedding models
GET	`/ollama/models/llm`	List LLM models

Settings

Method	Endpoint	Description
GET	`/settings/`	Get app settings
PUT	`/settings/`	Update settings
POST	`/settings/reset`	Reset to defaults
GET	`/settings/metadata`	Get metadata (options)

Key Settings:

llm_model: Default LLM model
llm_provider: Default LLM provider
temperature: Default temperature (0-2)
top_k: Default retrieval count
retrieval_mode: Default retrieval mode
use_structure: Default for structure-aware retrieval
kb_chunk_size: Default chunk size for new KBs
kb_chunk_overlap: Default overlap for new KBs
use_llm_chat_titles: Enable LLM chat titles globally
active_prompt_version_id: Active chat prompt version
active_self_check_prompt_version_id: Active self-check prompt version
show_prompt_versions: Show prompt version badge in chat responses
rerank_enabled: Global default for reranking
rerank_provider: Global rerank provider (auto|voyage|cohere)
rerank_model: Global rerank model
rerank_candidate_pool: Global rerank candidate pool
rerank_top_n: Global post-rerank keep count
rerank_min_score: Global rerank score threshold
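The page does not say whether PUT /settings/ accepts a partial body. A read-modify-write works either way, provided PUT accepts the object that GET returns, which is an assumption here. This bash sketch changes one key with jq (already used elsewhere on this page). `set_setting`, `API_TOKEN` and `CURL` are hypothetical names.

```bash
# set_setting KEY JSON_VALUE: GET the settings, replace one key, PUT the result back
set_setting() {
  local key=$1 value=$2 current
  local base=${API_BASE:-http://localhost:8004/api/v1}
  local auth="Authorization: Bearer ${API_TOKEN:-}"
  current=$("${CURL:-curl}" -fsS -H "$auth" "$base/settings/") || return 1
  # --argjson parses VALUE as JSON: true, 8, "\"gpt-4o-mini\"" ...
  printf '%s' "$current" |
    jq --arg k "$key" --argjson v "$value" '.[$k] = $v' |
    "${CURL:-curl}" -fsS -X PUT -H "$auth" -H "Content-Type: application/json" \
      --data-binary @- "$base/settings/"
}
```

Usage: `set_setting rerank_enabled true`, or for a string value `set_setting llm_model '"gpt-4o-mini"'`.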

MCP Integration Notes

Use POST /api/v1/retrieve/ for MCP search tools (no chat side-effects).
Avoid POST /api/v1/chat/ from MCP unless you want persistent conversations.
MCP tools retrieve_chunks and rag_query accept options.document_ids (UUID or UUID[]) for document-level filtering.
MCP endpoint: /mcp
Auth:
- Static MCP tokens (Admin UI → MCP)
- OAuth (Gateway): POST /oauth/token, supporting the password and refresh-token grants
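The bash sketch below shows the two token requests, assuming the gateway accepts standard OAuth 2.0 (RFC 6749) form-encoded requests. The page does not say whether it also requires `client_id` or `scope`. `GATEWAY_URL`, `token_request`, `oauth_password` and `oauth_refresh` are hypothetical names. The gateway's host is not documented here, so it must be set explicitly.

```bash
# token_request FIELD...: POST form-encoded fields to $GATEWAY_URL/oauth/token
token_request() {
  if [ -z "${GATEWAY_URL:-}" ]; then
    echo "GATEWAY_URL is not set" >&2
    return 1
  fi
  local -a form=()
  local field
  # --data-urlencode percent-encodes each value and sends a form-encoded body
  for field in "$@"; do form+=(--data-urlencode "$field"); done
  "${CURL:-curl}" -sS -X POST "${form[@]}" "$GATEWAY_URL/oauth/token"
}

# Resource-owner password grant
oauth_password() { token_request "grant_type=password" "username=$1" "password=$2"; }

# Refresh-token grant
oauth_refresh() { token_request "grant_type=refresh_token" "refresh_token=$1"; }
```

Usage: `GATEWAY_URL=https://gateway.example oauth_password alice "$PASSWORD" | jq -r .access_token`. The field names `access_token` and `refresh_token` in the response are the RFC 6749 names and are assumed, not confirmed, here.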

Common Patterns

Create KB → Upload Doc → Query

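API_BASE=http://localhost:8004/api/v1
AUTH="Authorization: Bearer $API_TOKEN"   # API_TOKEN: your bearer token (needed once setup is complete)

# 1. Create KB (JSON bodies need an explicit Content-Type)
KB_ID=$(curl -s -X POST "$API_BASE/knowledge-bases/" -H "$AUTH" -H "Content-Type: application/json" -d '{"name":"MyKB"}' | jq -r .id)

# 2. Upload document
DOC_ID=$(curl -s -X POST "$API_BASE/documents/" -H "$AUTH" -F file=@doc.pdf -F knowledge_base_id="$KB_ID" | jq -r .id)

# 3. Wait until processing finishes (completed or failed)
until STATUS=$(curl -s -H "$AUTH" "$API_BASE/documents/$DOC_ID/status" | jq -r .status); [ "$STATUS" = completed ] || [ "$STATUS" = failed ]; do sleep 2; done

# 4. Query (only if processing succeeded)
[ "$STATUS" = completed ] && curl -s -X POST "$API_BASE/chat/" -H "$AUTH" -H "Content-Type: application/json" -d '{"question":"What is this about?","knowledge_base_id":"'"$KB_ID"'"}'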

Poll Document Progress

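API_BASE=http://localhost:8004/api/v1
DOC_ID="..."
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $API_TOKEN" "$API_BASE/documents/$DOC_ID/status")
  echo "$STATUS" | jq '{status, progress: .progress_percentage, stage: .processing_stage}'

  # Stop on either terminal state; checking only "completed" loops forever on failure
  case "$(echo "$STATUS" | jq -r .status)" in
    completed|failed) break ;;
  esac
  sleep 1
done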

Hybrid Search Query

curl -X POST http://localhost:8004/api/v1/chat/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "question": "How to configure?",
    "knowledge_base_id": "uuid",
    "retrieval_mode": "hybrid",
    "hybrid_dense_weight": 0.7,
    "hybrid_lexical_weight": 0.3,
    "lexical_top_k": 10,
    "top_k": 5,
    "debug": true
  }'

Retrieve-only Query

curl -X POST http://localhost:8004/api/v1/retrieve/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{
    "query": "What is this about?",
    "knowledge_base_id": "uuid",
    "top_k": 5,
    "debug": true
  }'

HTTP Status Codes

Code	Meaning	When
200	OK	Success
201	Created	Resource created
204	No Content	Successful deletion
400	Bad Request	Invalid parameters
404	Not Found	Resource not found
422	Unprocessable Entity	Validation error
500	Internal Server Error	Internal error
503	Service Unavailable	Service or dependency down

Error Response Format

{
  "detail": "Error message",
  "path": "/api/v1/endpoint",
  "suggestion": "Try this instead"
}
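Because the status code determines whether the body is data or an error object, scripts should look at the code before parsing. This bash sketch appends the code to the output with curl's `-w '\n%{http_code}'`, then splits it off. It prints the body on 2xx. Otherwise it prints the code and the error body to stderr and returns 1. `api_call` and `CURL` are hypothetical names.

```bash
# api_call METHOD URL [curl args...]: body on stdout for 2xx, otherwise error on stderr
api_call() {
  local method=$1 url=$2
  shift 2
  local out code body
  out=$("${CURL:-curl}" -sS -X "$method" -w '\n%{http_code}' "$@" "$url") || return 1
  code=${out##*$'\n'}    # last line: the status code written by -w
  body=${out%$'\n'*}     # everything before it: the response body
  if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
    printf '%s\n' "$body"
  else
    printf 'HTTP %s: %s\n' "$code" "$body" >&2
    return 1
  fi
}
```

Usage: `api_call GET "http://localhost:8004/api/v1/knowledge-bases/$KB_ID" -H "Authorization: Bearer $API_TOKEN"`. On failure, pipe the body through `jq -r .detail` to get just the message.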

Interactive Documentation

Swagger UI: http://localhost:8004/docs
ReDoc: http://localhost:8004/redoc
OpenAPI JSON: http://localhost:8004/api/v1/openapi.json

Last Updated: 2026-02-06

📝 Questions? Open an issue | 🌟 Like it? Star the repo | 📖 API Docs: Swagger UI

📚 Documentation

Getting Started

🏠 Home

API Reference

Operations

🔧 Troubleshooting Guide

Links

Version: v1.0
Updated: 2026-02-08
