Quick Reference

loglux edited this page Mar 6, 2026 · 13 revisions

API Endpoints - Quick Reference

Base URL: http://localhost:8004/api/v1

Auth note: once setup is complete, most API routes are protected and require a bearer token in the Authorization header.
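As a minimal sketch of the bearer-auth requirement, the wrapper below attaches a token to every request. The names `API_BASE`, `API_TOKEN`, and `api` are illustrative and not part of the API; how a token is issued depends on your deployment.

```shell
# Illustrative variables: set API_TOKEN to a token issued by your deployment.
API_BASE="${API_BASE:-http://localhost:8004/api/v1}"

# api METHOD ENDPOINT [extra curl args...]
# Calls the API with the bearer token attached; extra arguments go to curl.
api() {
  method="$1"
  endpoint="$2"
  shift 2
  curl -sS -X "$method" "${API_BASE}${endpoint}" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    "$@"
}
```

With the wrapper defined, `api GET /knowledge-bases/` lists knowledge bases, and any extra arguments (for example `-d` or `-F`) are passed through to curl unchanged.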

Health & Status

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Check API health |
| GET | /ready | Check dependencies readiness |
| GET | /info | Get API info and configuration |

Knowledge Bases

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /knowledge-bases/ | Create knowledge base | - |
| GET | /knowledge-bases/ | List knowledge bases | - |
| GET | /knowledge-bases/{kb_id} | Get KB details | - |
| PUT | /knowledge-bases/{kb_id} | Update KB | - |
| GET | /knowledge-bases/{kb_id}/retrieval-settings | Get KB retrieval settings | - |
| PUT | /knowledge-bases/{kb_id}/retrieval-settings | Update KB retrieval settings | - |
| DELETE | /knowledge-bases/{kb_id}/retrieval-settings | Clear KB retrieval settings | - |
| DELETE | /knowledge-bases/{kb_id} | Delete KB (soft) | - |
| POST | /knowledge-bases/{kb_id}/reprocess | Reprocess all documents | - |
| POST | /knowledge-bases/{kb_id}/regenerate_chat_titles | Regenerate chat titles | - |
| POST | /knowledge-bases/{kb_id}/cleanup-orphaned-chunks | Clean orphaned chunks | - |

Key Parameters:

  • name: KB name (required)
  • embedding_model: Model name (optional, default from settings)
  • chunking_strategy: fixed_size, semantic, paragraph (optional)
  • chunk_size: Chunk size in chars (optional, default 1000)
  • chunk_overlap: Overlap size (optional, default 200)
  • use_llm_chat_titles: Override LLM titles per KB (optional, null = global default)
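The parameters above map onto a JSON request body. The sketch below builds one with the documented defaults for `chunk_size` and `chunk_overlap`; the helper names `kb_payload` and `create_kb`, and the `API_BASE` and `API_TOKEN` variables, are hypothetical. Values are inserted without JSON escaping, so it suits simple names only.

```shell
# kb_payload NAME STRATEGY [CHUNK_SIZE] [CHUNK_OVERLAP]
# Prints a create-KB JSON body. Values are inserted verbatim, so NAME must
# not contain quotes or backslashes (use a JSON tool for arbitrary input).
kb_payload() {
  printf '{"name":"%s","chunking_strategy":"%s","chunk_size":%d,"chunk_overlap":%d}\n' \
    "$1" "$2" "${3:-1000}" "${4:-200}"
}

# create_kb NAME STRATEGY [CHUNK_SIZE] [CHUNK_OVERLAP]
# POSTs the body; API_BASE and API_TOKEN are assumed environment variables.
create_kb() {
  kb_payload "$@" | curl -sS -X POST \
    "${API_BASE:-http://localhost:8004/api/v1}/knowledge-bases/" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @-
}
```

For example, `create_kb "Docs" paragraph 800 100` would request a KB that chunks by paragraph with 800-character chunks and a 100-character overlap.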

Documents

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /documents/ | Upload document | - |
| GET | /documents/ | List documents | - |
| GET | /documents/{doc_id} | Get document with content | - |
| GET | /documents/{doc_id}/status | Get processing status | - |
| DELETE | /documents/{doc_id} | Delete document | - |
| POST | /documents/{doc_id}/reprocess | Reprocess document | - |
| POST | /documents/{doc_id}/analyze | Analyze structure | - |
| POST | /documents/{doc_id}/structure/apply | Apply structure | - |
| GET | /documents/{doc_id}/structure | Get structure | - |

Upload Format: multipart/form-data

  • file: File to upload
  • knowledge_base_id: Target KB UUID
  • filename: Custom filename (optional)
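A sketch of an upload that sends the optional `filename` field only when a custom name is supplied. The function name `upload_doc` and the `API_BASE` and `API_TOKEN` variables are assumptions for illustration; the three form fields are the ones listed above.

```shell
# upload_doc FILE KB_ID [FILENAME]
# Uploads FILE as multipart/form-data; FILENAME is sent only when given.
# API_BASE and API_TOKEN are assumed environment variables.
upload_doc() {
  file="$1"
  kb_id="$2"
  custom_name="${3:-}"
  # Rebuild the positional parameters as the list of form fields for curl.
  set -- -F "file=@${file}" -F "knowledge_base_id=${kb_id}"
  if [ -n "$custom_name" ]; then
    set -- "$@" -F "filename=${custom_name}"
  fi
  curl -sS -X POST "${API_BASE:-http://localhost:8004/api/v1}/documents/" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    "$@"
}
```

curl sets the multipart Content-Type header itself when `-F` is used, so it should not be added by hand.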

Status Polling:

  • Poll /documents/{doc_id}/status every 1-2 seconds
  • Check status field for completed or failed
  • Use progress_percentage (0-100) and processing_stage for UI

Processing Stages:

  1. "Loading document..." - 5%
  2. "Preparing to chunk..." - 15%
  3. "Chunking completed (N chunks)" - 30%
  4. "Generating embeddings (X/N)" - 35-75%
  5. "Embeddings created (N)" - 75%
  6. "Indexing in Qdrant..." - 80%
  7. "Qdrant indexing completed" - 85%
  8. "Indexing BM25..." - 90%
  9. "BM25 indexing completed" - 95%
  10. "Completed" - 100%

Chat & Conversations

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /chat/ | Query knowledge base | - |
| GET | /chat/knowledge-bases/{kb_id}/stats | Get chat stats | - |
| GET | /chat/conversations | List conversations | - |
| GET | /chat/conversations/{id} | Get conversation | - |
| PATCH | /chat/conversations/{id} | Update conversation title | - |
| PATCH | /chat/conversations/{id}/settings | Update settings | - |
| GET | /chat/conversations/{id}/messages | Get messages | - |
| DELETE | /chat/conversations/{id} | Delete conversation | - |

Key Parameters (POST /chat/):

  • question: User question (required)
  • knowledge_base_id: Target KB (required)
  • conversation_id: Continue conversation (optional)
  • top_k: Chunks to retrieve (optional, default 5)
  • retrieval_mode: dense or hybrid (optional, default dense)
  • temperature: LLM temperature (optional, default 0.7)
  • llm_model: Model name (optional, default from settings)
  • use_structure: Use document structure (optional, default false)
  • use_mmr: Use MMR for diversity (optional, default false)
  • use_self_check: Two-stage answer validation (optional, default true)
  • context_expansion: Context expansion modes (optional, e.g., ["window"])
  • context_window: Window size (chunks on each side) for windowed retrieval (optional, default 0)

Hybrid Search Parameters:

  • lexical_top_k: BM25 top-k (default 10)
  • hybrid_dense_weight: Dense weight (default 0.7)
  • hybrid_lexical_weight: Lexical weight (default 0.3)

Reranking Parameters:

  • rerank_enabled: Enable the rerank stage
  • rerank_provider: auto, voyage, or cohere
  • rerank_model: Model name for the selected provider
  • rerank_candidate_pool: Number of retrieved chunks passed to the reranker
  • rerank_top_n: Number of chunks kept after reranking
  • rerank_min_score: Optional relevance-score cutoff
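To show how the reranking fields sit alongside the basic chat parameters, here is a sketch of a `/chat/` request body with the rerank stage enabled. The helper name `rerank_chat_payload` is hypothetical, and the pool size of 20 and keep count of 5 are illustrative values, not documented defaults.

```shell
# rerank_chat_payload KB_ID QUESTION
# Prints a /chat/ body with the rerank stage enabled. The numeric values are
# illustrative, not documented defaults. Inputs are inserted verbatim.
rerank_chat_payload() {
  cat <<EOF
{
  "question": "$2",
  "knowledge_base_id": "$1",
  "rerank_enabled": true,
  "rerank_provider": "auto",
  "rerank_candidate_pool": 20,
  "rerank_top_n": 5
}
EOF
}
```

Since `rerank_top_n` chunks are kept out of the `rerank_candidate_pool` that was reranked, the pool is chosen here to be larger than the keep count.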

Retrieve-only

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /retrieve/ | Retrieve chunks without LLM generation | - |

Key Parameters (POST /retrieve/):

  • query: Search query (required)
  • knowledge_base_id: Target KB (required)
  • top_k: Chunks to retrieve (optional)
  • document_ids: Limit retrieval to specific document UUIDs inside the KB (optional)
  • retrieval_mode: dense or hybrid (optional)
  • score_threshold: Minimum score filter (optional)
  • use_structure: Structure-aware retrieval (optional)
  • use_mmr: Diversity-aware retrieval (optional)
  • context_expansion: Context expansion modes, e.g. ["window"] (optional)
  • context_window: Window size, 0-5 chunks on each side (optional)
  • debug: Include debug block in response (optional)

Reranking fields are also supported on /retrieve/ with the same semantics as /chat/.
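Document-level filtering with `document_ids` can be sketched as follows. The helper name `retrieve_payload` is hypothetical; it omits `document_ids` when no IDs are passed, on the assumption that leaving the field out searches the whole KB.

```shell
# retrieve_payload KB_ID QUERY [DOC_ID...]
# Prints a /retrieve/ body; document_ids is included only when IDs are given.
# Inputs are inserted verbatim, so they must not contain quotes or backslashes.
retrieve_payload() {
  kb_id="$1"
  query="$2"
  shift 2
  ids=""
  for id in "$@"; do
    # Append each ID as a quoted JSON string, comma-separated.
    ids="${ids:+${ids},}\"${id}\""
  done
  filter=""
  if [ -n "$ids" ]; then
    filter=",\"document_ids\":[${ids}]"
  fi
  printf '{"query":"%s","knowledge_base_id":"%s","top_k":5%s}\n' \
    "$query" "$kb_id" "$filter"
}
```

The output can be piped to `curl -d @-` against `/retrieve/` with a `Content-Type: application/json` header.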


KB Transfer (Export/Import)

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| POST | /kb/export | Export KB(s) to a .tar.gz archive | - |
| POST | /kb/import | Import KB archive (multipart) | - |
| POST | /kb/export-chats-md | Export chats as Markdown (.zip) | - |

Notes:

  • mode=merge can target a KB via target_kb_id (single-KB archive only).
  • If vectors are included, target KB embedding model/provider/dimension must match.

Prompts & Self-Check

| Method | Endpoint | Description | Auth |
| --- | --- | --- | --- |
| GET | /prompts/ | List chat prompt versions | - |
| GET | /prompts/active | Get active chat prompt | - |
| GET | /prompts/{id} | Get chat prompt version | - |
| POST | /prompts/ | Create chat prompt version | - |
| POST | /prompts/{id}/activate | Activate chat prompt version | - |
| GET | /prompts/self-check | List self-check prompt versions | - |
| GET | /prompts/self-check/active | Get active self-check prompt | - |
| GET | /prompts/self-check/{id} | Get self-check prompt version | - |
| POST | /prompts/self-check | Create self-check prompt version | - |
| POST | /prompts/self-check/{id}/activate | Activate self-check prompt | - |

Create Prompt Payload:

  • name: Optional prompt name
  • system_content: Required prompt text
  • activate: Optional boolean to set as active
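A sketch of the create-prompt payload and a call that works for both chat and self-check prompts. The helper names `prompt_payload` and `create_prompt` are hypothetical, as are the `API_BASE` and `API_TOKEN` variables; defaulting `activate` to false is an illustrative choice, not a documented default.

```shell
# prompt_payload NAME SYSTEM_CONTENT [ACTIVATE]
# Prints a create-prompt body; ACTIVATE defaults to false here (an
# illustrative choice). Inputs are inserted verbatim, without JSON escaping.
prompt_payload() {
  printf '{"name":"%s","system_content":"%s","activate":%s}\n' \
    "$1" "$2" "${3:-false}"
}

# create_prompt ENDPOINT NAME SYSTEM_CONTENT [ACTIVATE]
# ENDPOINT is /prompts/ for chat prompts or /prompts/self-check for self-check.
create_prompt() {
  endpoint="$1"
  shift
  prompt_payload "$@" | curl -sS -X POST \
    "${API_BASE:-http://localhost:8004/api/v1}${endpoint}" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @-
}
```

Real prompt text usually contains quotes and newlines, so a JSON-aware tool such as jq is a safer way to build the body than `printf`.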

Embeddings

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /embeddings/models | List all models |
| GET | /embeddings/models/{name} | Get model details |
| GET | /embeddings/providers | List providers |
| GET | /embeddings/providers/{provider}/models | Get provider models |

Available Providers:

  • openai: OpenAI embeddings
  • voyage: Voyage AI embeddings
  • ollama: Local Ollama embeddings

Popular Models:

  • text-embedding-3-small (OpenAI, 1536 dim, $0.02/1M tokens)
  • text-embedding-3-large (OpenAI, 3072 dim, $0.13/1M tokens)
  • voyage-4 (Voyage, 1024 dim, $0.06/1M tokens)
  • nomic-embed-text (Ollama, 768 dim, free)
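The per-token prices above turn into a corpus cost with simple arithmetic: tokens divided by one million, times the listed price. The sketch below works this out for a hypothetical 5M-token corpus; `embed_cost` is an illustrative helper, not part of the API.

```shell
# embed_cost TOKENS PRICE_PER_MILLION
# Prints the estimated embedding cost in dollars, rounded to cents.
embed_cost() {
  awk -v tokens="$1" -v price="$2" \
    'BEGIN { printf "%.2f\n", tokens / 1000000 * price }'
}

embed_cost 5000000 0.02   # text-embedding-3-small, prints 0.10
embed_cost 5000000 0.13   # text-embedding-3-large, prints 0.65
```

Reprocessing a KB with a different embedding model re-embeds every chunk, so the same estimate applies again.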

LLM Models

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /llm/models | List all LLM models |
| GET | /llm/providers | List LLM providers |

Available Providers:

  • openai: GPT models
  • ollama: Local LLM models

Popular Models:

  • gpt-4o (OpenAI, GPT-4o; the "o" stands for "omni")
  • gpt-4o-mini (OpenAI, fast and cheap)
  • gpt-4-turbo (OpenAI, GPT-4 Turbo)
  • llama3.1:8b (Ollama, local)

Ollama

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /ollama/status | Check Ollama status |
| GET | /ollama/models | List all Ollama models |
| GET | /ollama/models/embeddings | List embedding models |
| GET | /ollama/models/llm | List LLM models |

Settings

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /settings/ | Get app settings |
| PUT | /settings/ | Update settings |
| POST | /settings/reset | Reset to defaults |
| GET | /settings/metadata | Get metadata (options) |

Key Settings:

  • llm_model: Default LLM model
  • llm_provider: Default LLM provider
  • temperature: Default temperature (0-2)
  • top_k: Default retrieval count
  • retrieval_mode: Default retrieval mode
  • use_structure: Use document structure
  • kb_chunk_size: Default chunk size for new KBs
  • kb_chunk_overlap: Default overlap for new KBs
  • use_llm_chat_titles: Enable LLM chat titles globally
  • active_prompt_version_id: Active chat prompt version
  • active_self_check_prompt_version_id: Active self-check prompt version
  • show_prompt_versions: Show prompt version badge in chat responses
  • rerank_enabled: Global default for reranking
  • rerank_provider: Global rerank provider (auto|voyage|cohere)
  • rerank_model: Global rerank model
  • rerank_candidate_pool: Global rerank candidate pool
  • rerank_top_n: Global post-rerank keep count
  • rerank_min_score: Global rerank score threshold
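A sketch of building a settings update that checks the documented 0-2 temperature range before sending anything. The helper names `valid_temperature` and `settings_payload` are hypothetical, and whether `PUT /settings/` accepts a partial object like this one is an assumption the source does not confirm.

```shell
# valid_temperature VALUE
# Succeeds when VALUE is a plain number within the documented 0-2 range.
valid_temperature() {
  awk -v t="$1" \
    'BEGIN { exit !(t ~ /^[0-9]+(\.[0-9]+)?$/ && t + 0 >= 0 && t + 0 <= 2) }'
}

# settings_payload TEMPERATURE TOP_K RETRIEVAL_MODE
# Prints a settings body, or fails if the temperature is out of range.
# Assumption: the API accepts a partial settings object on PUT.
settings_payload() {
  valid_temperature "$1" || {
    echo "temperature must be between 0 and 2" >&2
    return 1
  }
  printf '{"temperature":%s,"top_k":%d,"retrieval_mode":"%s"}\n' "$1" "$2" "$3"
}
```

For example, `settings_payload 0.2 8 hybrid` prints a body that can be piped to `curl -X PUT ... -d @-` against `/settings/`.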

MCP Integration Notes

  • Use POST /api/v1/retrieve/ for MCP search tools (no chat side-effects).
  • Avoid POST /api/v1/chat/ from MCP unless you want persistent conversations.
  • MCP tools retrieve_chunks and rag_query accept options.document_ids (UUID or UUID[]) for document-level filtering.
  • MCP endpoint: /mcp
  • Auth:
    • Static MCP tokens (Admin UI → MCP)
    • OAuth (Gateway): POST /oauth/token with password + refresh flow

Common Patterns

Create KB → Upload Doc → Query

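# Assumes the API runs at the documented base URL; once setup is complete,
# also pass -H "Authorization: Bearer <token>" with each request.
BASE=http://localhost:8004/api/v1

# 1. Create KB
KB_ID=$(curl -s -X POST "$BASE/knowledge-bases/" -H "Content-Type: application/json" -d '{"name":"MyKB"}' | jq -r .id)

# 2. Upload document
DOC_ID=$(curl -s -X POST "$BASE/documents/" -F file=@doc.pdf -F "knowledge_base_id=$KB_ID" | jq -r .id)

# 3. Wait for completion (stop on failure instead of looping forever)
while :; do
  STATE=$(curl -s "$BASE/documents/$DOC_ID/status" | jq -r .status)
  [ "$STATE" = "completed" ] && break
  [ "$STATE" = "failed" ] && { echo "Processing failed" >&2; exit 1; }
  sleep 2
done

# 4. Query
curl -s -X POST "$BASE/chat/" -H "Content-Type: application/json" \
  -d '{"question":"What is this about?","knowledge_base_id":"'"$KB_ID"'"}'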

Poll Document Progress

BASE=http://localhost:8004/api/v1
DOC_ID="..."
while true; do
  STATUS=$(curl -s "$BASE/documents/$DOC_ID/status")
  echo "$STATUS" | jq '{status, progress: .progress_percentage, stage: .processing_stage}'

  # Stop on either terminal state so a failed document does not loop forever
  case "$(echo "$STATUS" | jq -r .status)" in
    completed|failed) break ;;
  esac
  sleep 1
done

Hybrid Search Query

curl -X POST http://localhost:8004/api/v1/chat/ \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How to configure?",
    "knowledge_base_id": "uuid",
    "retrieval_mode": "hybrid",
    "hybrid_dense_weight": 0.7,
    "hybrid_lexical_weight": 0.3,
    "lexical_top_k": 10,
    "top_k": 5,
    "debug": true
  }'

Retrieve-only Query

curl -X POST http://localhost:8004/api/v1/retrieve/ \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is this about?",
    "knowledge_base_id": "uuid",
    "top_k": 5,
    "debug": true
  }'

HTTP Status Codes

| Code | Meaning | When |
| --- | --- | --- |
| 200 | OK | Success |
| 201 | Created | Resource created |
| 204 | No Content | Successful deletion |
| 400 | Bad Request | Invalid parameters |
| 404 | Not Found | Resource not found |
| 422 | Unprocessable Entity | Validation error |
| 500 | Server Error | Internal error |
| 503 | Unavailable | Service down |
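One way a client script might act on these codes is sketched below. The `classify_status` helper and its action labels are hypothetical, and treating 503 as retryable is an assumption rather than documented behaviour.

```shell
# classify_status CODE
# Prints a suggested client action for each documented status code.
# The retry policy for 503 is an assumption, not documented behaviour.
classify_status() {
  case "$1" in
    200|201|204) echo "ok" ;;
    400|404|422) echo "fix-request" ;;
    503)         echo "retry" ;;
    *)           echo "fail" ;;
  esac
}

# Example (needs a running API): capture only the status code of a call.
# code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8004/api/v1/health")
# classify_status "$code"
```

curl's `-w '%{http_code}'` option prints the response status after the transfer, which keeps the check independent of the response body.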

Error Response Format

{
  "detail": "Error message",
  "path": "/api/v1/endpoint",
  "suggestion": "Try this instead"
}

Interactive Documentation


Last Updated: 2026-02-06