# Quick Reference
loglux edited this page Mar 6, 2026
Base URL: `http://localhost:8004/api/v1`

Auth note: after setup completion, most API routes are protected and require Bearer auth.
- Health & Status
- Knowledge Bases
- Documents
- Chat & Conversations
- Retrieve-only
- KB Transfer (Export/Import)
- Prompts & Self-Check
- Embeddings
- LLM Models
- Ollama
- Settings
## Health & Status

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Check API health |
| GET | `/ready` | Check dependencies readiness |
| GET | `/info` | Get API info and configuration |
## Knowledge Bases

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/knowledge-bases/` | Create knowledge base | - |
| GET | `/knowledge-bases/` | List knowledge bases | - |
| GET | `/knowledge-bases/{kb_id}` | Get KB details | - |
| PUT | `/knowledge-bases/{kb_id}` | Update KB | - |
| GET | `/knowledge-bases/{kb_id}/retrieval-settings` | Get KB retrieval settings | - |
| PUT | `/knowledge-bases/{kb_id}/retrieval-settings` | Update KB retrieval settings | - |
| DELETE | `/knowledge-bases/{kb_id}/retrieval-settings` | Clear KB retrieval settings | - |
| DELETE | `/knowledge-bases/{kb_id}` | Delete KB (soft) | - |
| POST | `/knowledge-bases/{kb_id}/reprocess` | Reprocess all documents | - |
| POST | `/knowledge-bases/{kb_id}/regenerate_chat_titles` | Regenerate chat titles | - |
| POST | `/knowledge-bases/{kb_id}/cleanup-orphaned-chunks` | Clean orphaned chunks | - |
Key Parameters:

- `name`: KB name (required)
- `embedding_model`: Model name (optional, default from settings)
- `chunking_strategy`: `fixed_size`, `semantic`, or `paragraph` (optional)
- `chunk_size`: Chunk size in chars (optional, default 1000)
- `chunk_overlap`: Overlap size (optional, default 200)
- `use_llm_chat_titles`: Override LLM titles per KB (optional, null = global default)
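One practical pattern with these parameters is to send only the fields you actually set, so the server-side defaults (chunk size 1000, overlap 200) stay in effect. A minimal Python sketch; the function name and structure are illustrative, not part of the API:

```python
import json

def kb_create_payload(name, embedding_model=None, chunking_strategy=None,
                      chunk_size=None, chunk_overlap=None):
    """Build a POST /knowledge-bases/ body, omitting unset optionals
    so server defaults apply."""
    body = {"name": name}
    optional = {
        "embedding_model": embedding_model,
        "chunking_strategy": chunking_strategy,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)

print(kb_create_payload("MyKB", chunking_strategy="semantic"))
```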
## Documents

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/documents/` | Upload document | - |
| GET | `/documents/` | List documents | - |
| GET | `/documents/{doc_id}` | Get document with content | - |
| GET | `/documents/{doc_id}/status` | Get processing status | - |
| DELETE | `/documents/{doc_id}` | Delete document | - |
| POST | `/documents/{doc_id}/reprocess` | Reprocess document | - |
| POST | `/documents/{doc_id}/analyze` | Analyze structure | - |
| POST | `/documents/{doc_id}/structure/apply` | Apply structure | - |
| GET | `/documents/{doc_id}/structure` | Get structure | - |
Upload Format: `multipart/form-data`

- `file`: File to upload
- `knowledge_base_id`: Target KB UUID
- `filename`: Custom filename (optional)
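If you prefer not to depend on a third-party HTTP client, the multipart body can be assembled by hand. A stdlib-only sketch of the encoding (field names taken from the list above; the file content and content type are placeholders):

```python
import io
import uuid

def multipart_body(filename, file_bytes, fields):
    """Assemble a multipart/form-data request body.
    'fields' holds plain form fields such as knowledge_base_id."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Plain form fields first.
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
                  f'name="{name}"\r\n\r\n{value}\r\n'.encode())
    # Then the file part.
    buf.write(f"--{boundary}\r\nContent-Disposition: form-data; "
              f'name="file"; filename="{filename}"\r\n'
              f"Content-Type: application/octet-stream\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

boundary, body = multipart_body("doc.pdf", b"%PDF-",
                                {"knowledge_base_id": "kb-uuid-here"})
```

Send the result with `Content-Type: multipart/form-data; boundary=<boundary>`.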
Status Polling:

- Poll `/documents/{doc_id}/status` every 1-2 seconds
- Check the `status` field for `completed` or `failed`
- Use `progress_percentage` (0-100) and `processing_stage` for UI
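The polling loop above can be wrapped in a small helper. This sketch takes the status-fetching function as a parameter (so it works with any HTTP client) and stops on the terminal states named above; the timeout value is an arbitrary choice, not an API requirement:

```python
import time

def wait_for_document(fetch_status, interval=2.0, timeout=600.0):
    """Poll until the document reaches 'completed' or 'failed'.
    fetch_status: callable returning the /documents/{doc_id}/status JSON dict."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        info = fetch_status()
        if info.get("status") in ("completed", "failed"):
            return info
        time.sleep(interval)
    raise TimeoutError("document processing did not finish in time")
```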
Processing Stages:

- `"Loading document..."` - 5%
- `"Preparing to chunk..."` - 15%
- `"Chunking completed (N chunks)"` - 30%
- `"Generating embeddings (X/N)"` - 35-75%
- `"Embeddings created (N)"` - 75%
- `"Indexing in Qdrant..."` - 80%
- `"Qdrant indexing completed"` - 85%
- `"Indexing BM25..."` - 90%
- `"BM25 indexing completed"` - 95%
- `"Completed"` - 100%
## Chat & Conversations

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/chat/` | Query knowledge base | - |
| GET | `/chat/knowledge-bases/{kb_id}/stats` | Get chat stats | - |
| GET | `/chat/conversations` | List conversations | - |
| GET | `/chat/conversations/{id}` | Get conversation | - |
| PATCH | `/chat/conversations/{id}` | Update conversation title | - |
| PATCH | `/chat/conversations/{id}/settings` | Update settings | - |
| GET | `/chat/conversations/{id}/messages` | Get messages | - |
| DELETE | `/chat/conversations/{id}` | Delete conversation | - |
Key Parameters (POST /chat/):

- `question`: User question (required)
- `knowledge_base_id`: Target KB (required)
- `conversation_id`: Continue conversation (optional)
- `top_k`: Chunks to retrieve (optional, default 5)
- `retrieval_mode`: `dense` or `hybrid` (optional, default `dense`)
- `temperature`: LLM temperature (optional, default 0.7)
- `llm_model`: Model name (optional, default from settings)
- `use_structure`: Use document structure (optional, default false)
- `use_mmr`: Use MMR for diversity (optional, default false)
- `use_self_check`: Two-stage answer validation (optional, default true)
- `context_expansion`: Context expansion modes (optional, e.g., `["window"]`)
- `context_window`: Window size (chunks on each side) for windowed retrieval (optional, default 0)
Hybrid Search Parameters:

- `lexical_top_k`: BM25 top-k (default 10)
- `hybrid_dense_weight`: Dense weight (default 0.7)
- `hybrid_lexical_weight`: Lexical weight (default 0.3)
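The page does not specify how the server combines the two result lists; one common scheme consistent with these weight parameters is a weighted sum of min-max-normalized scores. A purely illustrative sketch (the API's actual fusion method may differ):

```python
def fuse_hybrid(dense, lexical, dense_weight=0.7, lexical_weight=0.3):
    """Fuse {chunk_id: score} maps from dense and BM25 retrieval
    via weighted sum of min-max-normalized scores."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero
        return {k: (v - lo) / span for k, v in scores.items()}
    d, l = normalize(dense), normalize(lexical)
    fused = {k: dense_weight * d.get(k, 0.0) + lexical_weight * l.get(k, 0.0)
             for k in set(d) | set(l)}
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```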
Reranking Parameters:

- `rerank_enabled`: Enable the rerank stage
- `rerank_provider`: `auto`, `voyage`, or `cohere`
- `rerank_model`: Model name for the selected provider
- `rerank_candidate_pool`: How many retrieved chunks are reranked
- `rerank_top_n`: How many chunks to keep after reranking
- `rerank_min_score`: Optional relevance-score cutoff
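Taken together, `rerank_top_n` and `rerank_min_score` describe a filter over reranker output. A hypothetical sketch of that post-processing step, assuming the reranker returns one relevance score per candidate:

```python
def apply_rerank(candidates, scores, top_n=5, min_score=None):
    """Keep the top_n highest-scoring candidates, optionally dropping
    any whose relevance score falls below min_score.
    'scores' is index-aligned with 'candidates'."""
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    if min_score is not None:
        ranked = [(c, s) for c, s in ranked if s >= min_score]
    return [c for c, _ in ranked[:top_n]]
```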
## Retrieve-only

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/retrieve/` | Retrieve chunks without LLM generation | - |
Key Parameters (POST /retrieve/):

- `query`: Search query (required)
- `knowledge_base_id`: Target KB (required)
- `top_k`: Chunks to retrieve (optional)
- `document_ids`: Limit retrieval to specific document UUIDs inside the KB (optional)
- `retrieval_mode`: `dense` or `hybrid` (optional)
- `score_threshold`: Minimum score filter (optional)
- `use_structure`: Structure-aware retrieval (optional)
- `use_mmr`: Diversity-aware retrieval (optional)
- `context_expansion`: e.g., `["window"]` (optional)
- `context_window`: Window size (0-5)
- `debug`: Include debug block in response (optional)
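Since `context_window` is documented as 0-5, it is cheap to clamp it client-side before sending. A minimal payload-builder sketch (function name and defaults are illustrative):

```python
import json

def retrieve_payload(query, kb_id, top_k=None, context_window=0, **extra):
    """Build a POST /retrieve/ body, clamping context_window to the
    documented 0-5 range; extra keyword args pass through unchanged."""
    body = {
        "query": query,
        "knowledge_base_id": kb_id,
        "context_window": max(0, min(5, int(context_window))),
    }
    if top_k is not None:
        body["top_k"] = top_k
    body.update(extra)
    return json.dumps(body)
```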
Reranking fields are also supported on `/retrieve/` with the same semantics as `/chat/`.
## KB Transfer (Export/Import)

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/kb/export` | Export KB(s) to a .tar.gz archive | - |
| POST | `/kb/import` | Import KB archive (multipart) | - |
| POST | `/kb/export-chats-md` | Export chats as Markdown (.zip) | - |
Notes:

- `mode=merge` can target a KB via `target_kb_id` (single-KB archive only).
- If vectors are included, the target KB's embedding model, provider, and dimension must match.
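The vector-merge constraint above amounts to a three-field equality check before importing. A hedged sketch; the field names here are assumptions for illustration, not the archive's actual schema:

```python
def vectors_compatible(archive_meta, target_kb):
    """True when archive and target KB agree on embedding model,
    provider, and dimension (required when vectors are included).
    Field names are hypothetical."""
    keys = ("embedding_model", "embedding_provider", "embedding_dimension")
    return all(archive_meta.get(k) == target_kb.get(k) for k in keys)
```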
## Prompts & Self-Check

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| GET | `/prompts/` | List chat prompt versions | - |
| GET | `/prompts/active` | Get active chat prompt | - |
| GET | `/prompts/{id}` | Get chat prompt version | - |
| POST | `/prompts/` | Create chat prompt version | - |
| POST | `/prompts/{id}/activate` | Activate chat prompt version | - |
| GET | `/prompts/self-check` | List self-check prompt versions | - |
| GET | `/prompts/self-check/active` | Get active self-check prompt | - |
| GET | `/prompts/self-check/{id}` | Get self-check prompt version | - |
| POST | `/prompts/self-check` | Create self-check prompt version | - |
| POST | `/prompts/self-check/{id}/activate` | Activate self-check prompt | - |
Create Prompt Payload:

- `name`: Optional prompt name
- `system_content`: Required prompt text
- `activate`: Optional boolean to set as active
## Embeddings

| Method | Endpoint | Description |
|---|---|---|
| GET | `/embeddings/models` | List all models |
| GET | `/embeddings/models/{name}` | Get model details |
| GET | `/embeddings/providers` | List providers |
| GET | `/embeddings/providers/{provider}/models` | Get provider models |
Available Providers:

- `openai`: OpenAI embeddings
- `voyage`: Voyage AI embeddings
- `ollama`: Local Ollama embeddings

Popular Models:

- `text-embedding-3-small` (OpenAI, 1536 dim, $0.02/1M tokens)
- `text-embedding-3-large` (OpenAI, 3072 dim, $0.13/1M tokens)
- `voyage-4` (Voyage, 1024 dim, $0.06/1M tokens)
- `nomic-embed-text` (Ollama, 768 dim, free)
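The per-million-token prices above make rough cost estimates straightforward. A small sketch using only the figures listed here (check the providers' current pricing pages before relying on these numbers):

```python
# Prices in USD per 1M tokens, taken from the model list above.
PRICE_PER_MTOK = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "voyage-4": 0.06,
    "nomic-embed-text": 0.0,  # local Ollama, free
}

def embedding_cost(model, tokens):
    """Estimated USD cost of embedding 'tokens' tokens with 'model'."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000
```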
## LLM Models

| Method | Endpoint | Description |
|---|---|---|
| GET | `/llm/models` | List all LLM models |
| GET | `/llm/providers` | List LLM providers |

Available Providers:

- `openai`: GPT models
- `ollama`: Local LLM models

Popular Models:

- `gpt-4o` (OpenAI, GPT-4 Omni)
- `gpt-4o-mini` (OpenAI, fast and cheap)
- `gpt-4-turbo` (OpenAI, GPT-4 Turbo)
- `llama3.1:8b` (Ollama, local)
## Ollama

| Method | Endpoint | Description |
|---|---|---|
| GET | `/ollama/status` | Check Ollama status |
| GET | `/ollama/models` | List all Ollama models |
| GET | `/ollama/models/embeddings` | List embedding models |
| GET | `/ollama/models/llm` | List LLM models |
## Settings

| Method | Endpoint | Description |
|---|---|---|
| GET | `/settings/` | Get app settings |
| PUT | `/settings/` | Update settings |
| POST | `/settings/reset` | Reset to defaults |
| GET | `/settings/metadata` | Get metadata (options) |
Key Settings:

- `llm_model`: Default LLM model
- `llm_provider`: Default LLM provider
- `temperature`: Default temperature (0-2)
- `top_k`: Default retrieval count
- `retrieval_mode`: Default retrieval mode
- `use_structure`: Use document structure
- `kb_chunk_size`: Default chunk size for new KBs
- `kb_chunk_overlap`: Default overlap for new KBs
- `use_llm_chat_titles`: Enable LLM chat titles globally
- `active_prompt_version_id`: Active chat prompt version
- `active_self_check_prompt_version_id`: Active self-check prompt version
- `show_prompt_versions`: Show prompt version badge in chat responses
- `rerank_enabled`: Global default for reranking
- `rerank_provider`: Global rerank provider (`auto`|`voyage`|`cohere`)
- `rerank_model`: Global rerank model
- `rerank_candidate_pool`: Global rerank candidate pool
- `rerank_top_n`: Global post-rerank keep count
- `rerank_min_score`: Global rerank score threshold
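Many of these settings are global defaults that per-request fields (such as `temperature` or `top_k` on `/chat/`) can override. One way to picture that precedence, as a sketch rather than the server's actual implementation:

```python
def effective_params(global_settings, request_overrides):
    """Per-request values take precedence over global settings;
    fields left as None in the request fall back to the settings."""
    merged = dict(global_settings)
    merged.update({k: v for k, v in request_overrides.items() if v is not None})
    return merged
```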
## MCP

- Use `POST /api/v1/retrieve/` for MCP search tools (no chat side-effects).
- Avoid `POST /api/v1/chat/` from MCP unless you want persistent conversations.
- MCP tools `retrieve_chunks` and `rag_query` accept `options.document_ids` (`UUID` or `UUID[]`) for document-level filtering.
- MCP endpoint: `/mcp`
- Auth:
  - Static MCP tokens (Admin UI → MCP)
  - OAuth (Gateway): `POST /oauth/token` with password + refresh flow
## Examples

Complete workflow:

```bash
# 1. Create KB
KB_ID=$(curl -X POST /api/v1/knowledge-bases/ -d '{"name":"MyKB"}' | jq -r .id)

# 2. Upload document
DOC_ID=$(curl -X POST /api/v1/documents/ -F file=@doc.pdf -F knowledge_base_id=$KB_ID | jq -r .id)

# 3. Wait for completion
while [ "$(curl -s /api/v1/documents/$DOC_ID/status | jq -r .status)" != "completed" ]; do sleep 2; done

# 4. Query
curl -X POST /api/v1/chat/ -d '{"question":"What is this about?","knowledge_base_id":"'$KB_ID'"}'
```

Status polling with progress:

```bash
DOC_ID="..."
while true; do
  STATUS=$(curl -s /api/v1/documents/$DOC_ID/status)
  echo "$STATUS" | jq '{status, progress: .progress_percentage, stage: .processing_stage}'
  if [ "$(echo "$STATUS" | jq -r .status)" = "completed" ]; then
    break
  fi
  sleep 1
done
```

Retrieve-only query:

```bash
curl -X POST /api/v1/retrieve/ -H "Content-Type: application/json" -d '{
  "query": "What is this about?",
  "knowledge_base_id": "uuid",
  "top_k": 5,
  "debug": true
}'
```

Hybrid search with debug:

```bash
curl -X POST /api/v1/chat/ \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How to configure?",
    "knowledge_base_id": "uuid",
    "retrieval_mode": "hybrid",
    "hybrid_dense_weight": 0.7,
    "hybrid_lexical_weight": 0.3,
    "lexical_top_k": 10,
    "top_k": 5,
    "debug": true
  }'
```

## Response Codes

| Code | Meaning | When |
|---|---|---|
| 200 | OK | Success |
| 201 | Created | Resource created |
| 204 | No Content | Successful deletion |
| 400 | Bad Request | Invalid parameters |
| 404 | Not Found | Resource not found |
| 422 | Unprocessable | Validation error |
| 500 | Server Error | Internal error |
| 503 | Unavailable | Service down |
Error Response Format:

```json
{
  "detail": "Error message",
  "path": "/api/v1/endpoint",
  "suggestion": "Try this instead"
}
```

## Documentation

- Swagger UI: http://localhost:8004/docs
- ReDoc: http://localhost:8004/redoc
- OpenAPI JSON: http://localhost:8004/api/v1/openapi.json
Last Updated: 2026-02-06
📝 Questions? Open an issue | 🌟 Like it? Star the repo | 📖 API Docs: Swagger UI
Version: v1.0