Merged
7 changes: 4 additions & 3 deletions README.md
@@ -275,9 +275,10 @@ mcp:

Full schema: [docs/reference/config.md](docs/reference/config.md).

Alternative config presets: `config.ollama.example.yml` (local Ollama),
`config.claude_code.example.yml` (Claude Code CLI), `config.codex.example.yml`,
`config.hermes.example.yml`, `config.openclaw.example.yml`.
Ready-made presets live in [`configs/`](configs/README.md): LLM-provider presets under
`configs/llm/` (Ollama, Claude Code, Codex, Hermes, OpenClaw, OpenRouter-free) and
embedding/retrieval presets under `configs/embedders/` (bge-m3, openai-large, specter2, …).
Point at one with `-c`, e.g. `uv run perspicacite -c configs/embedders/bge_m3.yml serve`.
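The renames in this change follow two patterns: `config_<name>.yml` moves to `configs/embedders/<name>.yml`, and `config.<name>.example.yml` moves to `configs/llm/<name>.yml`. The sketch below encodes that mapping, which may help when updating local scripts or shell aliases. The function name `new_preset_path` is hypothetical, and the rules are inferred from the renames visible in this diff rather than taken from project code.

```python
import re
from typing import Optional

# Rename rules inferred from this diff; treat them as assumptions.
_LLM_OLD = re.compile(r"^config\.(?P<name>[a-z_]+)\.example\.yml$")
_EMBEDDER_OLD = re.compile(r"^config_(?P<name>[a-z0-9_]+)\.yml$")
# The OpenRouter preset used a different naming scheme before the move.
_SPECIAL = {"config.example.openrouter-free.yml": "configs/llm/openrouter-free.yml"}


def new_preset_path(old_name: str) -> Optional[str]:
    """Return the post-move path of an old preset file name, or None if it is not a preset."""
    if old_name in _SPECIAL:
        return _SPECIAL[old_name]
    match = _LLM_OLD.match(old_name)
    if match:
        return f"configs/llm/{match['name']}.yml"
    match = _EMBEDDER_OLD.match(old_name)
    if match:
        return f"configs/embedders/{match['name']}.yml"
    return None


print(new_preset_path("config_bge_m3.yml"))
print(new_preset_path("config.ollama.example.yml"))
```

`config.yml` and `config.example.yml` stay in the repo root, so the function returns `None` for both.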

---

53 changes: 53 additions & 0 deletions configs/README.md
@@ -0,0 +1,53 @@
# Configuration presets

The canonical, fully-documented template is **[`../config.example.yml`](../config.example.yml)**.
Copy it to `config.yml` (git-ignored) in the repo root and edit:

```bash
cp config.example.yml config.yml
```

`config.yml` in the repo root is the default the CLI loads when you don't pass `-c`.
The files here are ready-made starting points for specific LLM providers or embedding
backends — copy one over `config.yml`, or point at it directly:

```bash
uv run perspicacite -c configs/embedders/openai_large.yml serve
```

## `llm/` — LLM provider presets

Swap the chat/synthesis backend. Each preset sets `llm.*` for one provider; the embedding
model stays at the default, the open local `all-MiniLM-L6-v2`.

| Preset | Backend |
|--------|---------|
| [`llm/claude_code.yml`](llm/claude_code.yml) | Claude Code subscription (CLI auth) |
| [`llm/codex.yml`](llm/codex.yml) | OpenAI Codex CLI subscription |
| [`llm/hermes.yml`](llm/hermes.yml) | Hermes Agent (Nous Research) |
| [`llm/ollama.yml`](llm/ollama.yml) | Local-only / zero cloud cost (Ollama) |
| [`llm/openclaw.yml`](llm/openclaw.yml) | OpenClaw agent |
| [`llm/openrouter-free.yml`](llm/openrouter-free.yml) | OpenRouter free tier |

## `embedders/` — embedding / retrieval presets

Swap the KB embedding model (and matching reranker). See
[`../docs/embedding-models.md`](../docs/embedding-models.md) for the benchmark table.
A KB must be **rebuilt** when you change its embedding model.

| Preset | Embedding model | Notes |
|--------|-----------------|-------|
| [`embedders/bge_m3.yml`](embedders/bge_m3.yml) | `BAAI/bge-m3` | Production biomedical (recommended) |
| [`embedders/openai_large.yml`](embedders/openai_large.yml) | `text-embedding-3-large` | Cross-domain, best generalisation |
| [`embedders/specter2.yml`](embedders/specter2.yml) | `allenai/specter2_base` | Scientific-paper embeddings |
| [`embedders/pubmedbert.yml`](embedders/pubmedbert.yml) | `pritamdeka/S-PubMedBert-MS-MARCO` | Biomedical |
| [`embedders/neuml_pubmedbert.yml`](embedders/neuml_pubmedbert.yml) | `NeuML/pubmedbert-base-embeddings` | Biomedical (NeuML) |
| [`embedders/biomedbert.yml`](embedders/biomedbert.yml) | `microsoft/BiomedNLP-BiomedBERT-…` | Biomedical (Microsoft) |
| [`embedders/bge_en_icl.yml`](embedders/bge_en_icl.yml) | `BAAI/bge-en-icl` | In-context-learning embeddings |
| [`embedders/gte_qwen2_7b.yml`](embedders/gte_qwen2_7b.yml) | `Alibaba-NLP/gte-Qwen2-7B-instruct` | Large instruct embedder |
| [`embedders/stella_1_5b.yml`](embedders/stella_1_5b.yml) | `dunzhang/stella_en_1.5B_v5` | Compact high-quality embedder |
| [`embedders/qwen3_14b.yml`](embedders/qwen3_14b.yml) | `text-embedding-3-large` | Qwen3-14B chat + OpenAI embeddings |
| [`embedders/code_kb.yml`](embedders/code_kb.yml) | `mistralai/codestral-embed-2505` | Code knowledge bases |
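The rebuild rule stated before the table follows from a general property of embeddings: vectors produced by different models are not comparable, and they often differ in dimension (384, 1024, and 3072 for the three models benchmarked in `docs/embedding-models.md`). Below is a minimal sketch of a guard that flags a mismatch. `KBMetadata` and `needs_rebuild` are hypothetical names for illustration; how Perspicacité actually records a KB's embedding model is not shown in this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBMetadata:
    """Hypothetical record of how a knowledge base was built."""
    name: str
    embedding_model: str
    dims: int


def needs_rebuild(kb: KBMetadata, config_model: str) -> bool:
    """True when the KB was embedded with a different model than the active config requests."""
    return kb.embedding_model != config_model


kb = KBMetadata("scifact_abstracts", "all-MiniLM-L6-v2", 384)
print(needs_rebuild(kb, "all-MiniLM-L6-v2"))  # False
print(needs_rebuild(kb, "BAAI/bge-m3"))       # True
```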

Every preset here is parse-validated against the config schema by
`tests/integration/test_config_audit.py`.
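Schema validation does not check that the README tables still point at files that exist. A complementary check could look like the sketch below. It is not the project's test: the helper name `missing_presets` and the link pattern are assumptions based on the table format used in this README.

```python
import re
from pathlib import Path
from typing import List

# Matches the link target in table cells such as
# | [`llm/codex.yml`](llm/codex.yml) | ... |
_PRESET_LINK = re.compile(r"\]\(((?:llm|embedders)/[^)]+\.yml)\)")


def missing_presets(readme_text: str, configs_dir: Path) -> List[str]:
    """Return preset paths linked from the README that do not exist under configs_dir."""
    linked = _PRESET_LINK.findall(readme_text)
    return [rel for rel in linked if not (configs_dir / rel).is_file()]
```

From the repo root this could be called as `missing_presets(Path("configs/README.md").read_text(), Path("configs"))`. An empty list means every linked preset is present.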
2 changes: 1 addition & 1 deletion config_bge_en_icl.yml → configs/embedders/bge_en_icl.yml
@@ -21,7 +21,7 @@
# Reranker: bge-reranker-v2-m3 pairs well with all BGE embeddings.
#
# Launch:
# uv run perspicacite -c config_bge_en_icl.yml serve
# uv run perspicacite -c configs/embedders/bge_en_icl.yml serve

version: "2.0.0"
config_name: "bge-en-icl-port8005"
4 changes: 2 additions & 2 deletions config_bge_m3.yml → configs/embedders/bge_m3.yml
@@ -20,7 +20,7 @@
# Full benchmark: perspicacite-eval/docs/retrieval_benchmark_2026_05_26.md
#
# Launch:
# uv run perspicacite -c config_bge_m3.yml serve
# uv run perspicacite -c configs/embedders/bge_m3.yml serve

version: "2.0.0"
config_name: "bge-m3-port8004"
@@ -69,7 +69,7 @@ rag_modes:
# bge-reranker-v2-m3: domain-aware pair for bge-m3 embeddings.
# Key finding: bge-reranker HELPS weaker/domain embeddings (bge-m3, MiniLM, PubMedBERT)
# but HURTS strong embeddings (OpenAI 3-large: −2.1 pp vs ms-marco).
# Do not swap to ms-marco in this config — use config_openai_large.yml for OpenAI.
# Do not swap to ms-marco in this config — use configs/embedders/openai_large.yml for OpenAI.
reranker_model: "BAAI/bge-reranker-v2-m3"

basic:
2 changes: 1 addition & 1 deletion config_biomedbert.yml → configs/embedders/biomedbert.yml
@@ -12,7 +12,7 @@
# No API key required (fully local via sentence-transformers).
#
# Launch:
# uv run perspicacite -c config_biomedbert.yml serve
# uv run perspicacite -c configs/embedders/biomedbert.yml serve

version: "2.0.0"
config_name: "biomedbert-port8006"
2 changes: 1 addition & 1 deletion config_code_kb.yml → configs/embedders/code_kb.yml
@@ -22,7 +22,7 @@
# Costs: codestral-embed billed per OpenRouter pricing (new docs only; queries cheap).
#
# Launch:
# OPENROUTER_API_KEY=$OPENROUTER_API_KEY uv run perspicacite -c config_code_kb.yml serve
# OPENROUTER_API_KEY=$OPENROUTER_API_KEY uv run perspicacite -c configs/embedders/code_kb.yml serve

version: "2.0.0"
config_name: "code-kb-codestral-port8003"
@@ -18,7 +18,7 @@
# Note: requires `st:` prefix since Alibaba-NLP/ is not in auto-detected namespaces.
#
# Launch:
# uv run perspicacite -c config_gte_qwen2_7b.yml serve
# uv run perspicacite -c configs/embedders/gte_qwen2_7b.yml serve

version: "2.0.0"
config_name: "gte-qwen2-7b-port8007"
@@ -12,7 +12,7 @@
# No API key required (fully local via sentence-transformers).
#
# Launch:
# uv run perspicacite -c config_neuml_pubmedbert.yml serve
# uv run perspicacite -c configs/embedders/neuml_pubmedbert.yml serve

version: "2.0.0"
config_name: "neuml-pubmedbert-port8007"
@@ -20,7 +20,7 @@
# Full benchmark: perspicacite-eval/docs/retrieval_benchmark_2026_05_26.md
#
# Launch:
# OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c config_openai_large.yml serve
# OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c configs/embedders/openai_large.yml serve

version: "2.0.0"
config_name: "openai-large-port8002"
2 changes: 1 addition & 1 deletion config_pubmedbert.yml → configs/embedders/pubmedbert.yml
@@ -11,7 +11,7 @@
# No API key required (fully local via sentence-transformers).
#
# Launch:
# uv run perspicacite -c config_pubmedbert.yml serve
# uv run perspicacite -c configs/embedders/pubmedbert.yml serve

version: "2.0.0"
config_name: "pubmedbert-port8005"
2 changes: 1 addition & 1 deletion config_qwen3_14b.yml → configs/embedders/qwen3_14b.yml
@@ -8,7 +8,7 @@
# /opt/homebrew/opt/ollama/bin/ollama serve (or brew services start ollama)
#
# Launch:
# OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c config_qwen3_14b.yml serve
# OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c configs/embedders/qwen3_14b.yml serve
#
# Thinking mode: Qwen3 supports /think ... /no_think tokens.
# Set QWEN3_NO_THINK=1 to prepend /no_think to all prompts (faster, less depth).
2 changes: 1 addition & 1 deletion config_specter2.yml → configs/embedders/specter2.yml
@@ -9,7 +9,7 @@
# 0.7 default filters everything → 0 results)
# Launch:
# TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1 \
# uv run perspicacite -c config_specter2.yml serve
# uv run perspicacite -c configs/embedders/specter2.yml serve
#
# TRANSFORMERS_OFFLINE/HF_DATASETS_OFFLINE prevent the model from trying
# to re-fetch on every startup (we got hit by HF 429s in the previous
@@ -19,7 +19,7 @@
# in the auto-detected namespace list; prefix stripped before loading.
#
# Launch:
# uv run perspicacite -c config_stella_1_5b.yml serve
# uv run perspicacite -c configs/embedders/stella_1_5b.yml serve

version: "2.0.0"
config_name: "stella-1.5b-port8006"
@@ -9,7 +9,7 @@
# 2. Sign in (one-time): `claude login`
# 3. Verify the CLI works: `echo "say hi" | claude -p --model haiku`
# 4. Use this file as your config:
# perspicacite -c config.claude_code.example.yml serve
# perspicacite -c configs/llm/claude_code.yml serve
#
# **Caveat — shared rate limits.** Perspicacité shares your
# interactive Claude Code rate window. A heavy ingest can freeze
2 changes: 1 addition & 1 deletion config.codex.example.yml → configs/llm/codex.yml
@@ -14,7 +14,7 @@
# 2. Sign in: `codex login` (browser-based ChatGPT auth)
# 3. Verify: `echo "say hi" | codex exec --skip-git-repo-check`
# 4. Use this file as your config:
# perspicacite -c config.codex.example.yml serve
# perspicacite -c configs/llm/codex.yml serve
#
# **Caveat — Codex is an agent, not a pure completion endpoint.**
# Each call spins up Codex's full session machinery (sandbox, tool
4 changes: 2 additions & 2 deletions config.hermes.example.yml → configs/llm/hermes.yml
@@ -19,13 +19,13 @@
# 2. Configure: `hermes setup`
# 3. Verify: `hermes` (interactive) or your version's one-shot flag
# 4. Use this file as your config:
# perspicacite -c config.hermes.example.yml serve
# perspicacite -c configs/llm/hermes.yml serve
#
# **Note:** If Hermes doesn't ship a one-shot completion mode in
# your version, the alternative is to run Hermes models directly
# via Ollama (the Hermes family is published on Ollama as e.g.
# `hermes-3:70b`) — that's a fully supported path today via
# config.ollama.example.yml.
# configs/llm/ollama.yml.

llm:
default_provider: "agent_cli"
2 changes: 1 addition & 1 deletion config.ollama.example.yml → configs/llm/ollama.yml
@@ -13,7 +13,7 @@
# 3. Start the Ollama server (auto-starts on macOS once installed):
# ollama serve
# 4. Use this file as your config:
# perspicacite -c config.ollama.example.yml serve
# perspicacite -c configs/llm/ollama.yml serve
#
# **Quality vs hardware tradeoffs**
# - 70B models need ~40 GB RAM. Worth it for synthesis quality on
2 changes: 1 addition & 1 deletion config.openclaw.example.yml → configs/llm/openclaw.yml
@@ -19,7 +19,7 @@
# 2. Ensure the gateway is running: `openclaw onboard --install-daemon`
# 3. Verify: `openclaw agent --message "say hi"`
# 4. Use this file as your config:
# perspicacite -c config.openclaw.example.yml serve
# perspicacite -c configs/llm/openclaw.yml serve
#
# Same caveats as the other agent-CLI presets: no prompt caching, no
# per-call temperature/max_tokens, output buffered (no streaming).
@@ -9,7 +9,7 @@
# 3. Add to your shell profile (~/.zshrc or ~/.bashrc):
# export OPENROUTER_API_KEY="sk-or-v1-..."
# 4. Copy this file:
# cp config.example.openrouter-free.yml config.yml
# cp configs/llm/openrouter-free.yml config.yml
# 5. Start the server:
# source ~/.zshrc && uv run perspicacite -c config.yml serve
#
10 changes: 5 additions & 5 deletions docs/agent-cli-caveats.md
@@ -5,10 +5,10 @@ subprocess-based LLM routing path). Captured from live testing
during the May 2026 rollout. Keep this in sync as upstream CLIs evolve.

See also:
- [`config.claude_code.example.yml`](../config.claude_code.example.yml)
- [`config.codex.example.yml`](../config.codex.example.yml)
- [`config.openclaw.example.yml`](../config.openclaw.example.yml)
- [`config.hermes.example.yml`](../config.hermes.example.yml)
- [`configs/llm/claude_code.yml`](../configs/llm/claude_code.yml)
- [`configs/llm/codex.yml`](../configs/llm/codex.yml)
- [`configs/llm/openclaw.yml`](../configs/llm/openclaw.yml)
- [`configs/llm/hermes.yml`](../configs/llm/hermes.yml)
- [`src/perspicacite/llm/agent_cli.py`](../src/perspicacite/llm/agent_cli.py)

## What "agent CLI" routing means
@@ -170,7 +170,7 @@ installed version).
`hermes setup`), not by a CLI flag.
- **Simpler alternative for Hermes models:** the Hermes family is
published on Ollama as `hermes-3:70b` etc. Use
[`config.ollama.example.yml`](../config.ollama.example.yml) with
[`configs/llm/ollama.yml`](../configs/llm/ollama.yml) with
`default_model: "hermes-3:70b"` — fully supported today, no CLI
dependency.

4 changes: 2 additions & 2 deletions docs/embedding-models.md
@@ -12,8 +12,8 @@ Full benchmark data: [perspicacite-eval/docs/retrieval_benchmark_2026_05_26.md](
| Config file | Embedding | Dims | Reranker | NDCG@10 (SciFact) | Use when |
|---|---|---|---|---|---|
| `config.yml` | all-MiniLM-L6-v2 | 384 | ms-marco-MiniLM-L-12-v2 | **0.851** | Dev, resource-constrained, fast setup |
| `config_bge_m3.yml` | BAAI/bge-m3 | 1024 | bge-reranker-v2-m3 | **0.879** | Production biomedical (recommended) |
| `config_openai_large.yml` | text-embedding-3-large | 3072 | ms-marco-MiniLM-L-12-v2 | **0.872** | Cross-domain, best generalisation |
| `configs/embedders/bge_m3.yml` | BAAI/bge-m3 | 1024 | bge-reranker-v2-m3 | **0.879** | Production biomedical (recommended) |
| `configs/embedders/openai_large.yml` | text-embedding-3-large | 3072 | ms-marco-MiniLM-L-12-v2 | **0.872** | Cross-domain, best generalisation |
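The NDCG@10 column is easier to compare as a gain over the MiniLM baseline in percentage points. The short sketch below does that arithmetic on the three values copied from the table; the helper name `gain_pp` is only illustrative.

```python
# NDCG@10 (SciFact) values copied from the table above.
NDCG_AT_10 = {
    "config.yml": 0.851,
    "configs/embedders/bge_m3.yml": 0.879,
    "configs/embedders/openai_large.yml": 0.872,
}


def gain_pp(preset: str, baseline: str = "config.yml") -> float:
    """Gain over the baseline in percentage points, rounded to one decimal."""
    return round((NDCG_AT_10[preset] - NDCG_AT_10[baseline]) * 100, 1)


for preset in NDCG_AT_10:
    print(f"{preset}: {gain_pp(preset):+.1f} pp")
```

With these numbers, bge-m3 comes out at +2.8 pp and OpenAI 3-large at +2.1 pp over the default config.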

---

30 changes: 15 additions & 15 deletions docs/guides/embedding-and-rag-configuration.md
@@ -82,7 +82,7 @@ llm:

## Tier 2 — OpenAI: best accuracy, cloud cost

**Config file:** `config_openai_large.yml`
**Config file:** `configs/embedders/openai_large.yml`

### Model
`text-embedding-3-large` — 3 072-dim, OpenAI API, ~$0.13 per million tokens.
@@ -102,7 +102,7 @@ Gain over MiniLM baseline: **+12 pp NDCG@10 (no rerank), +2 pp with CE reranker*
export OPENAI_API_KEY="sk-..."

# Using the dedicated config (port 8002 by default)
uv run perspicacite -c config_openai_large.yml serve
uv run perspicacite -c configs/embedders/openai_large.yml serve
```

Key settings:
@@ -125,7 +125,7 @@ You can run both servers simultaneously (they share `chroma_db/` but use differe
uv run perspicacite -c config.yml serve

# Terminal 2 — OpenAI on :8002
OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c config_openai_large.yml serve
OPENAI_API_KEY=$OPENAI_API_KEY uv run perspicacite -c configs/embedders/openai_large.yml serve
```

Each server uses its own KB (`scifact_abstracts` for MiniLM, `scifact_openai_large`
@@ -136,7 +136,7 @@ for OpenAI) and embeds queries with the matching model. See

## Tier 3a — Biomedical local: best life-science accuracy

**Config file:** `config_pubmedbert.yml`
**Config file:** `configs/embedders/pubmedbert.yml`

### Model
`pritamdeka/S-PubMedBert-MS-MARCO` — 768-dim, PubMedBERT fine-tuned for retrieval on
@@ -157,14 +157,14 @@ domain-adapted retrieval model with a powerful cross-encoder reranker.
### Launch

```bash
uv run perspicacite -c config_pubmedbert.yml serve
uv run perspicacite -c configs/embedders/pubmedbert.yml serve
# Model auto-downloads from HuggingFace on first run (~440 MB)
```

For offline environments (after first download):
```bash
TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1 \
uv run perspicacite -c config_pubmedbert.yml serve
uv run perspicacite -c configs/embedders/pubmedbert.yml serve
```

Key settings:
@@ -187,7 +187,7 @@ rag_modes:

## Tier 3b — General local SOTA

**Config file:** `config_bge_m3.yml`
**Config file:** `configs/embedders/bge_m3.yml`

### Model
`BAAI/bge-m3` — 1 024-dim, multilingual MTEB SOTA retrieval model. ~2.3 GB.
@@ -208,7 +208,7 @@ knowledge_base:
GPU launch:
```bash
# If you have a CUDA GPU, sentence-transformers will use it automatically
uv run perspicacite -c config_bge_m3.yml serve
uv run perspicacite -c configs/embedders/bge_m3.yml serve
```

---
@@ -227,11 +227,11 @@ uv run perspicacite -c config.yml serve &

# Port 8001 — SPECTER2 (scientific citation context)
TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1 \
uv run perspicacite -c config_specter2.yml serve &
uv run perspicacite -c configs/embedders/specter2.yml serve &

# Port 8002 — OpenAI 3-large (highest accuracy, paid)
OPENAI_API_KEY=$OPENAI_API_KEY \
uv run perspicacite -c config_openai_large.yml serve &
uv run perspicacite -c configs/embedders/openai_large.yml serve &
```
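The same multi-server launch can be driven from a script, which keeps the preset list in one place after the move to `configs/`. The sketch below only builds and prints the commands. `serve_command`, `serve_env`, and `SERVERS` are hypothetical names, while the argv shape and the environment variables come from the shell snippet above.

```python
import os
import shlex
from typing import Dict, List, Optional


def serve_command(preset: str) -> List[str]:
    """argv for launching a server with the given preset file."""
    return ["uv", "run", "perspicacite", "-c", preset, "serve"]


def serve_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the current environment with per-server overrides applied."""
    env = dict(os.environ)
    env.update(extra or {})
    return env


# Presets and overrides taken from the shell snippet above.
SERVERS = [
    ("config.yml", {}),
    ("configs/embedders/specter2.yml",
     {"TRANSFORMERS_OFFLINE": "1", "HF_DATASETS_OFFLINE": "1"}),
    # Expects OPENAI_API_KEY to be exported in the calling shell.
    ("configs/embedders/openai_large.yml", {}),
]

for preset, extra in SERVERS:
    print(shlex.join(serve_command(preset)))
    # To actually start a server:
    #   subprocess.Popen(serve_command(preset), env=serve_env(extra))
```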

### Ingest the same corpus into each KB
@@ -338,7 +338,7 @@ llm:
timeout: 300 # 14B can be slow for long answers
```

See `config_qwen3_14b.yml` for a complete example.
See `configs/embedders/qwen3_14b.yml` for a complete example.

**Thinking mode (Qwen3):** Qwen3 supports `/think` and `/no_think` tokens. The server
inserts these based on mode complexity. Set `QWEN3_NO_THINK=1` env var to always
@@ -364,10 +364,10 @@ RAM: ~300 MB. Works on any machine with internet for LLM calls.
```bash
# Port 8005 — PubMedBERT + bge-reranker + local Qwen3
# First run: downloads ~2.6 GB of models
uv run perspicacite -c config_pubmedbert.yml serve
uv run perspicacite -c configs/embedders/pubmedbert.yml serve

# With local LLM (Ollama):
# Edit config_pubmedbert.yml: llm.default_provider = "ollama", default_model = "qwen3:14b"
# Edit configs/embedders/pubmedbert.yml: llm.default_provider = "ollama", default_model = "qwen3:14b"
```

RAM: ~3 GB (PubMedBERT + bge-reranker + Qwen3 8B) or ~11 GB (Qwen3 14B).
@@ -383,8 +383,8 @@ share the same server.
```bash
# OpenAI 3-large + bge-reranker + Claude Opus
OPENAI_API_KEY=$OPENAI_API_KEY ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
uv run perspicacite -c config_openai_large.yml serve
# Edit config_openai_large.yml:
uv run perspicacite -c configs/embedders/openai_large.yml serve
# Edit configs/embedders/openai_large.yml:
# llm.default_provider = "anthropic"
# llm.default_model = "claude-opus-4-5"
# rag_modes.reranker_model = "BAAI/bge-reranker-v2-m3"
2 changes: 1 addition & 1 deletion docs/recipe-book-2026-05-15.md
@@ -10,7 +10,7 @@ documents *intent* — "what should I run when I want to ...?"
> **Prereqs:** `pip install -e .`, `perspicacite serve` is reachable
> at `http://localhost:8000` for the recipes that need it, and a
> `.env` with at least one `*_API_KEY` is in place. See
> `config/config.example.yml` for the canonical config layout.
> `config.example.yml` for the canonical config layout.
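Path references like the one corrected above are easy to miss when files move. The sketch below scans text for the old-style preset names that appear on the "before" side of this change. The patterns are inferred from this diff and the function name `stale_references` is hypothetical; `config.yml` and `config.example.yml` are deliberately left unmatched because they did not move.

```python
import re
from typing import List, Tuple

# Old-style preset names seen on the "before" side of this change.
_STALE = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    r"config_[a-z0-9_]+\.yml"
    r"|config\.[a-z_]+\.example\.yml"
    r"|config\.example\.openrouter-free\.yml"
    r")"
)


def stale_references(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, matched_name) for every old-style preset reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _STALE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running it over each Markdown and YAML file in the repo and failing on a non-empty result would catch references that a rename left behind.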

---
