Languages: English | 中文说明
clawsqlite is a CLI toolbox for SQLite-based applications in OpenClaw. The first built-in application is a local Markdown + SQLite knowledge base.

This repo currently focuses on the knowledge app:

- commands are exposed under `clawsqlite knowledge ...` for users/skills.
A local Markdown + SQLite knowledge base for OpenClaw, designed for both humans and agents.
The knowledge app helps you:
- Ingest URLs or raw text as Markdown files + SQLite records
- Run fast full‑text search over your notes and scraped articles
- Optionally enable vector search via an external embedding service
- Regenerate titles/summaries via heuristics; optionally use a small LLM for tags
- Keep the KB healthy with explicit maintenance commands (reindex/check/fix + maintenance prune/gc)
Status: already used in real OpenClaw setups. The schema and CLI are kept small and stable on purpose.
- Pure SQLite backend
  - `articles` table as source of truth
  - `articles_fts` FTS5 table for full-text search
  - `articles_vec` vec0 table for vector search (optional)
- Markdown storage
  - Each article is stored as `articles/<id>__<slug>.md`
  - Markdown files include a small METADATA header + MARKDOWN body section
- Configurable root
  - All data lives under a single root directory
  - DB and articles dir default to `<root>/knowledge.sqlite3` and `<root>/articles` (see env overrides below)
- Embeddings + LLM (optional)
  - Embeddings: OpenAI-compatible `/v1/embeddings` API
  - Small LLM: OpenAI-compatible `/v1/chat/completions` API
- Tag generation & search ranking
  - Tags are generated from the long summary via a small LLM (when configured) or a jieba-based heuristic extractor
  - If the LLM is not configured, tag generation degrades to jieba and emits a `NEXT` hint
  - Search ranking uses tag/query matching as an additional signal on top of FTS and vector similarity. Internally, the scorer:
    - embeds both the article summary and the tag string into vec0 tables (`articles_vec` and `articles_tag_vec`)
    - L2-normalizes both stored vectors and query vectors, then scores semantic similarity via cosine similarity mapped into [0,1] (with a distance->sigmoid fallback for older/partial vec rows)
    - splits the tag channel into a semantic (vector) tag score and a lexical tag match score, with the split controlled by `CLAWSQLITE_TAG_VEC_FRACTION` (default 0.7)
    - applies an optional log compression to the lexical tag score (`ln(1 + alpha*x) / ln(1 + alpha)`, `alpha` from `CLAWSQLITE_TAG_FTS_LOG_ALPHA`, default 5.0) so that many partial tag hits don't overpower the semantic channels
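As an illustration of the scoring math above (not the actual scorer code), the L2 normalization, the cosine-to-[0,1] mapping, and the log compression can be sketched in plain Python:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors pass through unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_01(a, b):
    """Cosine similarity of two vectors, mapped from [-1, 1] into [0, 1]."""
    dot = sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
    return (dot + 1.0) / 2.0

def compress_tag_score(x, alpha=5.0):
    """ln(1 + alpha*x) / ln(1 + alpha): lifts small scores, caps at 1."""
    if alpha <= 0:
        return x  # alpha = 0 disables compression
    return math.log(1 + alpha * x) / math.log(1 + alpha)
```

With `alpha = 5.0`, a lexical tag score of 0.3 is lifted to roughly 0.51 while 1.0 stays at 1.0, which is why many partial hits saturate instead of dominating.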
- CLI first
  - Simple subcommands: `ingest`, `search`, `show`, `export`, `update`, `delete`, `reindex`, `doctor`
The knowledge app expects an environment similar to the OpenClaw container:
- Python 3.10+ with `sqlite3` and FTS5 enabled
- Python dependencies:
  - `jieba` (optional but strongly recommended for Chinese tag extraction)
  - `pypinyin` (optional; used to generate pinyin slugs for CJK titles)
- SQLite extensions (optional but recommended):
  - `libsimple.so` (tokenizer `simple`) for better CJK tokenization
  - `vec0.so` from sqlite-vec
- Network access to your embedding / small LLM HTTP endpoints (if you enable those features)
The repo assumes these paths by default (you can override them):
- Tokenizer extension: `/usr/local/lib/libsimple.so`
- vec0 extension: auto-discovered under `/app/node_modules/**/vec0.so` or system lib dirs
In a fresh environment you typically need to:
1. Install `jieba` via pip: `pip install jieba`
2. For `libsimple.so` and `vec0.so`:
   - In the OpenClaw container these are preinstalled.
   - On a custom system you can either:
     - Use distro packages if available (check your Linux distribution), or
     - Build from source following the upstream docs:
       - sqlite-vec: https://github.com/asg017/sqlite-vec
       - simple tokenizer: see the OpenClaw docs for building `libsimple.so`.
3. If these extensions are missing, the knowledge app will automatically degrade to:
   - SQLite built-in tokenizer for FTS
   - If `jieba` is available, optionally pre-segment CJK text in Python for better Chinese recall (controlled by `CLAWSQLITE_FTS_JIEBA=auto|on|off`)
   - FTS-only mode when vec0 is unavailable
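The graceful-degrade behavior can be sketched with the stdlib `sqlite3` module alone. This is a minimal illustration assuming extension paths are passed in explicitly; clawsqlite's actual auto-discovery logic is more involved:

```python
import sqlite3

def open_with_extensions(db_path, ext_paths=("/usr/local/lib/libsimple.so",)):
    """Open a SQLite DB and best-effort load optional extensions.

    Missing extensions are tolerated; callers can inspect the returned set
    to decide how to degrade (e.g. FTS-only mode when vec0 is absent).
    """
    conn = sqlite3.connect(db_path)
    loaded = set()
    try:
        conn.enable_load_extension(True)
        for path in ext_paths:
            try:
                conn.load_extension(path)
                loaded.add(path)
            except sqlite3.OperationalError:
                pass  # extension missing: degrade gracefully
    except AttributeError:
        pass  # this Python build lacks extension-loading support
    finally:
        try:
            conn.enable_load_extension(False)
        except AttributeError:
            pass
    return conn, loaded
```

The connection stays usable even when every extension fails to load, mirroring the fallback list above.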
Once published, the simplest way to use clawsqlite is via PyPI:
pip install clawsqlite
# Then
clawsqlite knowledge --help

This installs the clawsqlite console script so you can call the CLI from
anywhere in your environment.
Clone the repo:
git clone git@github.com:ernestyu/clawsqlite.git
cd clawsqlite

(Inside OpenClaw’s workspace this repo may already be present at
/home/node/.openclaw/workspace/clawsqlite.)
You can run the knowledge app via the main shell entrypoint:
# From the repo root
./bin/clawsqlite knowledge --help
# Or explicitly choose a Python binary (e.g. your venv)
CLAWSQLITE_PYTHON=/opt/venv/bin/python ./bin/clawsqlite knowledge --help

The recommended CLI entrypoint for skills/users is:

clawsqlite knowledge ...

To run a quick self-check of your knowledge DB + env (paths, vec0, embedding, small LLM), you can use:
clawsqlite knowledge doctor --json
# or, from source without installing the package
python -m clawsqlite_knowledge.cli doctor

The knowledge app determines its root + DB + articles directory via CLI flags + env + defaults.
Priority for root:
- CLI: `--root`
- Env: `CLAWSQLITE_ROOT`
- Default: `<current working dir>/knowledge_data`

DB path: `--db` > `CLAWSQLITE_DB` > `<root>/knowledge.sqlite3`

Articles dir: `--articles-dir` > `CLAWSQLITE_ARTICLES_DIR` > `<root>/articles`
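The precedence rules above amount to a small resolver. A sketch for illustration only (`resolve_paths` is a hypothetical helper, not a real clawsqlite function):

```python
import os

def resolve_paths(root_flag=None, db_flag=None, articles_flag=None, env=None):
    """Resolve root/DB/articles paths with CLI > env > default precedence."""
    env = env if env is not None else os.environ
    root = (root_flag
            or env.get("CLAWSQLITE_ROOT")
            or os.path.join(os.getcwd(), "knowledge_data"))
    db = (db_flag
          or env.get("CLAWSQLITE_DB")
          or os.path.join(root, "knowledge.sqlite3"))
    articles = (articles_flag
                or env.get("CLAWSQLITE_ARTICLES_DIR")
                or os.path.join(root, "articles"))
    return root, db, articles
```

Note that the DB and articles defaults derive from whichever root won, so overriding only `--root` moves everything together.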
A project‑level .env file is the primary configuration source. Start from the example:
cp ENV.example .env
# then edit .env

ENV.example contains fields like:
# Embedding service (required for vector search)
EMBEDDING_BASE_URL=https://embed.example.com/v1
EMBEDDING_MODEL=your-embedding-model
EMBEDDING_API_KEY=sk-your-embedding-key
CLAWSQLITE_VEC_DIM=1024
# Small LLM (optional)
SMALL_LLM_BASE_URL=https://llm.example.com/v1
SMALL_LLM_MODEL=your-small-llm
SMALL_LLM_API_KEY=sk-your-small-llm-key
# Root override (optional)
# CLAWSQLITE_ROOT=/path/to/knowledge_root
# CLAWSQLITE_DB=/path/to/knowledge.sqlite3
# CLAWSQLITE_ARTICLES_DIR=/path/to/articles

At runtime, clawsqlite knowledge ... (and python -m clawsqlite_cli) will
auto‑load a project‑level .env from the current working directory.
Existing environment variables are not overridden. In OpenClaw containers
this is usually done via the agent’s env config instead of editing .env.
Embeddings are used for vector search (articles_vec) and can be disabled if you only want FTS.
Required env (typically via .env):
- `EMBEDDING_MODEL` – model name used by your embedding endpoint
- `EMBEDDING_BASE_URL` – base URL, e.g. `https://embed.example.com/v1`
- `EMBEDDING_API_KEY` – bearer token
- `CLAWSQLITE_VEC_DIM` – embedding dimension (e.g. `1024` for BAAI/bge-m3)
The knowledge app will:
- Use these env vars in `clawsqlite_knowledge.embed.get_embedding()` (via httpx POST to `/v1/embeddings`)
- Use `CLAWSQLITE_VEC_DIM` to define `embedding float[DIM]` in `articles_vec`
If any of these are missing, vector features are treated as disabled:
- `embedding_enabled()` returns `False`
- `ingest` will not call the embedding API
- `search` in `mode=hybrid` will auto-downgrade to FTS-only and print a `NEXT` hint
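A minimal sketch of the enable-check and of the request shape for an OpenAI-compatible embeddings endpoint. This is illustrative only: the real implementation lives in `clawsqlite_knowledge.embed.get_embedding()` and uses httpx, and `build_embedding_request` is a hypothetical helper name:

```python
import os

REQUIRED = ("EMBEDDING_BASE_URL", "EMBEDDING_MODEL",
            "EMBEDDING_API_KEY", "CLAWSQLITE_VEC_DIM")

def embedding_enabled(env=None):
    """Vector features are on only when all embedding env vars are set."""
    env = env if env is not None else os.environ
    return all(env.get(k) for k in REQUIRED)

def build_embedding_request(text, env):
    """URL/headers/body for an OpenAI-compatible /v1/embeddings call."""
    return {
        "url": env["EMBEDDING_BASE_URL"].rstrip("/") + "/embeddings",
        "headers": {"Authorization": "Bearer " + env["EMBEDDING_API_KEY"]},
        "json": {"model": env["EMBEDDING_MODEL"], "input": text},
    }
```

Because `EMBEDDING_BASE_URL` already ends in `/v1`, appending `/embeddings` yields the full `/v1/embeddings` path.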
By default the knowledge app builds a long summary from the first ~1200 word‑units (ending at a paragraph boundary) plus the final paragraph. Tags are generated from this long summary.
- If `SMALL_LLM_*` is configured and you use `--gen-provider llm`, tags are generated by the small LLM.
- Otherwise tags fall back to the jieba-based v6 heuristic extractor (and emit a `NEXT` hint). If `jieba` is missing, we fall back to a lightweight keyword extractor.
Search ranking also uses tags as a small but important signal:
- When `jieba` is available (tags are ordered by importance), we compute a continuous tag match score in [0,1] based on how many query keywords exactly match the top tags and how early they appear.
- When `jieba` is not available, we fall back to a simple 0/1 bonus for any exact tag match, to avoid over-interpreting a noisy tag order.
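One way to realize such a position-weighted tag score, as an illustrative formula rather than the exact production scorer:

```python
def tag_match_score(query_keywords, tags, tags_ordered=True):
    """Continuous tag match in [0, 1]: earlier-matching tags count more.

    With unordered tags (no jieba), fall back to a 0/1 bonus for any
    exact match, since the tag order carries no information.
    """
    if not tags or not query_keywords:
        return 0.0
    kws = {k.lower() for k in query_keywords}
    if not tags_ordered:
        return 1.0 if any(t.lower() in kws for t in tags) else 0.0
    # Position-weighted: tag i contributes weight 1/(i+1), normalized.
    weights = [1.0 / (i + 1) for i in range(len(tags))]
    hit = sum(w for w, t in zip(weights, tags) if t.lower() in kws)
    return hit / sum(weights)
```

Matching the first tag thus scores higher than matching a later one, which approximates "how early they appear" in the description above.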
For search we build a small query plan:
- `query_refine`: a retrieval-friendly sentence (LLM when enabled; otherwise the raw query)
- `query_tags`: a keyword/phrase list (LLM when enabled; otherwise heuristic v4 extraction)
We use:
- `query_refine` for FTS and for the main semantic channel (summary vectors in `articles_vec`)
- `query_tags` for lexical tag matching and for the tag semantic channel (`articles_tag_vec`)
The final hybrid score is a weighted blend of signals:
score = w_vec * vec_score + w_fts * fts_score
+ w_tag * tag_score + w_priority * priority_bonus
+ w_recency * recency_bonus
By default we use:
# Mode1/Mode3 (embedding enabled)
CLAWSQLITE_SCORE_WEIGHTS_MODE1=vec=0.45,fts=0.25,tag=0.15,priority=0.03,recency=0.02
CLAWSQLITE_SCORE_WEIGHTS_MODE3=vec=0.45,fts=0.25,tag=0.15,priority=0.03,recency=0.02
# Mode2/Mode4 (no embedding)
CLAWSQLITE_SCORE_WEIGHTS_MODE2=fts=0.60,tag=0.25,priority=0.08,recency=0.07
CLAWSQLITE_SCORE_WEIGHTS_MODE4=fts=0.60,tag=0.25,priority=0.08,recency=0.07
which roughly means:
- ~45% vector similarity for deep semantic anchoring (summary vectors)
- ~25% BM25 keywords for textual sanity checks (FTS over title/tags/summary)
- ~15% tag channel (split between semantic tag vectors and lexical tag match)
- ~3% priority as a manual pinning mechanism
- ~2% recency to keep new knowledge slightly favored without dominating
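Parsing a weight spec and blending the signals can be sketched as follows (illustrative helpers, not clawsqlite internals):

```python
def parse_weights(spec):
    """Parse 'vec=0.45,fts=0.25,...' into a dict of floats."""
    return {k: float(v) for k, v in (pair.split("=") for pair in spec.split(","))}

def hybrid_score(signals, weights):
    """Weighted blend of per-channel scores; missing signals contribute 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())
```

Note that the MODE1 defaults sum to 0.90, so even a perfect match on every channel yields a blended score of 0.90, not 1.0.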
You can override these weights via:
- `CLAWSQLITE_SCORE_WEIGHTS_MODE1..MODE4` (recommended, mode-specific)
- Legacy compatibility: `CLAWSQLITE_SCORE_WEIGHTS` and `CLAWSQLITE_SCORE_WEIGHTS_TEXT`
See ENV.example for details.
For mixed Chinese/English knowledge bases you may want to bias more strongly towards tag semantics. A common production shape is:
CLAWSQLITE_SCORE_WEIGHTS_MODE1=vec=0.30,fts=0.10,tag=0.55,priority=0.03,recency=0.02
CLAWSQLITE_TAG_VEC_FRACTION=0.82
which yields approximately:
- ~45% tag semantic (vector) score
- ~30% summary semantic (vector) score
- ~10% lexical tag match
- ~10% full-text FTS
- ~5% priority/recency
And you can tune the lexical tag compression via:
CLAWSQLITE_TAG_FTS_LOG_ALPHA=5.0 # larger = stronger compression, 0 = disable
For better tags (and optional keyword expansion) you can configure a small LLM endpoint:
SMALL_LLM_BASE_URL=https://llm.example.com/v1
SMALL_LLM_MODEL=your-small-llm
SMALL_LLM_API_KEY=sk-your-small-llm-key

Then use --gen-provider llm when ingesting or updating records. The
knowledge app will call an OpenAI‑compatible chat completions API to
generate tags from the long summary.
If these env vars are not set, provider=openclaw uses heuristics only (no network calls).
The knowledge app does not implement web scraping itself. For --url
ingest it runs an external scraper command, configured via:
- CLI: `--scrape-cmd`
- Env: `CLAWSQLITE_SCRAPE_CMD`
Recommended usage:
CLAWSQLITE_SCRAPE_CMD="node /path/to/scrape.js --some-flag"

The knowledge app will:
- Load this value from `.env` (stripping outer quotes)
- Use `shlex.split()` to build argv (no `shell=True` by default)
- Append the URL as the last argument if you don’t use `{url}`
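The argv construction described above can be sketched as follows (a simplified re-implementation; `build_scraper_argv` is a hypothetical name):

```python
import shlex

def build_scraper_argv(scrape_cmd, url):
    """Split the configured scraper command into argv and place the URL.

    `{url}` placeholders are substituted in place; otherwise the URL is
    appended as the last argument. No shell is involved: the result is
    meant to be passed directly to subprocess as argv.
    """
    cmd = scrape_cmd.strip()
    # Strip one layer of outer quotes, as when the value comes from .env
    if (cmd.startswith('"') and cmd.endswith('"')) or \
       (cmd.startswith("'") and cmd.endswith("'")):
        cmd = cmd[1:-1]
    argv = shlex.split(cmd)
    if any("{url}" in part for part in argv):
        return [part.replace("{url}", url) for part in argv]
    return argv + [url]
```

Avoiding `shell=True` means the URL is never interpreted by a shell, which is the safer default the docs describe.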
Scraper output formats:
- New format (recommended):

  --- METADATA ---
  Title: Some article title
  Author: Someone
  ...
  --- MARKDOWN ---
  # Markdown heading
  Body...

- Old format (still supported):

  Title: Some article title
  # Markdown heading
  Body...
The knowledge app will parse these into title and markdown body.
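A simplified parser for both output formats, for illustration only (the real parser handles more metadata fields):

```python
def parse_scraper_output(text):
    """Parse scraper stdout into (title, markdown_body).

    Supports the new --- METADATA --- / --- MARKDOWN --- format and the
    old format where a leading 'Title:' line precedes the markdown body.
    """
    if "--- MARKDOWN ---" in text:
        meta_part, body = text.split("--- MARKDOWN ---", 1)
        title = ""
        for line in meta_part.splitlines():
            if line.startswith("Title:"):
                title = line[len("Title:"):].strip()
                break
        return title, body.strip()
    # Old format: a leading 'Title:' line, everything after it is the body
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("Title:"):
        return lines[0][len("Title:"):].strip(), "\n".join(lines[1:]).strip()
    return "", text.strip()
```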
1. Clone & cd

   git clone git@github.com:ernestyu/clawsqlite.git
   cd clawsqlite

2. (Optional) Configure the root directory and the embedding/LLM env vars in your outer environment, or pass `--root` / `--db` / `--articles-dir` directly at runtime. The project's built-in `.env` loading is no longer a hard dependency here.

3. First ingest (text) – this also creates the DB and basic tables:

   clawsqlite knowledge ingest \
     --text "Hello clawsqlite" \
     --title "First note" \
     --category dev \
     --tags test \
     --gen-provider off \
     --json
This will:
- Create `<root>/knowledge.sqlite3`
- Create `<root>/articles/000001__first-note.md`
- Index the record in FTS (and vec if embedding is configured)
4. Search it back:

   clawsqlite knowledge search "Hello" --mode fts --json

You should see the record you just created.
Assuming you have a scraper command set in .env:
clawsqlite knowledge ingest \
--url "https://example.com/article" \
--category web \
--tags example \
--gen-provider openclaw \
--json

This will:
- Call your scraper
- Extract title + markdown body
- Optionally generate `summary` / `tags`
- Store everything in DB + markdown
If you know a URL’s content has changed and you want to refresh the existing record:
clawsqlite knowledge ingest \
--url "https://example.com/article" \
--gen-provider openclaw \
--update-existing \
--json

Semantics:
- If a record with this `source_url` exists (and is not deleted), and `--update-existing` is set:
  - The knowledge app updates that record’s title/summary/tags/category/priority
  - Keeps the same `id`
  - Rewrites the markdown file
  - Updates FTS and vec indexes
- If no such record exists, it behaves like a normal ingest.
Note: source_url has a UNIQUE index for non‑empty, non‑Local values, so each URL maps to at
most one active record.
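The upsert semantics can be demonstrated with a partial UNIQUE index in plain `sqlite3`. The exact index predicate in clawsqlite's schema may differ; this sketch only mirrors the behavior documented above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    source_url TEXT,
    title TEXT,
    deleted_at TEXT
);
-- Partial UNIQUE index: only non-empty, non-'Local' URLs must be unique.
CREATE UNIQUE INDEX idx_articles_source_url
    ON articles(source_url)
    WHERE source_url != '' AND source_url != 'Local';
""")
conn.execute("INSERT INTO articles (source_url, title) VALUES (?, ?)",
             ("https://example.com/article", "v1"))
# A second active record for the same URL is rejected:
try:
    conn.execute("INSERT INTO articles (source_url, title) VALUES (?, ?)",
                 ("https://example.com/article", "v2"))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
# --update-existing instead updates the existing row in place:
conn.execute("UPDATE articles SET title = ? WHERE source_url = ?",
             ("v2", "https://example.com/article"))
# Multiple empty-URL (text-ingested) records remain allowed:
conn.execute("INSERT INTO articles (source_url, title) VALUES ('', 'n1')")
conn.execute("INSERT INTO articles (source_url, title) VALUES ('', 'n2')")
```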
All commands share common flags (--root, --db, --articles-dir, --json, --verbose).
Run clawsqlite knowledge <command> --help for full details.
clawsqlite knowledge ingest --url URL [options]
clawsqlite knowledge ingest --text TEXT [options]

Key options:

- `--url` / `--text`
- `--title`, `--summary`, `--tags`, `--category`, `--priority`
- `--gen-provider {openclaw,llm,off}`
- `--scrape-cmd` (or env `CLAWSQLITE_SCRAPE_CMD`)
- `--update-existing` (for URL mode)
clawsqlite knowledge search "query" --mode hybrid --topk 20 --json

Modes:

- `hybrid` – combine vec + FTS
- `fts` – full-text only
- `vec` – vector only (requires embedding enabled)
Other flags:
- `--candidates` – candidate pool before re-ranking
- `--llm-keywords {auto,on,off}` – small-LLM usage policy for building `query_refine` / `query_tags`
- `--gen-provider` – set to `llm` to enable the small LLM (requires `SMALL_LLM_*`)
- Filters: `--category`, `--tag`, `--since`, `--priority`, `--include-deleted`
- `show` – dump one record (optionally with full markdown content)
- `export` – write a record to a `.md` or `.json` file
- `update` – patch fields or regenerate via generator (id/source_url/created_at are treated as read-only)
- `delete` – soft delete by default (sets `deleted_at`); `--hard` for permanent removal
All of these commands now check that the DB file exists before opening it:
- If `--root`/`--db` or `.env` point to a non-existent DB path, they will report:

  ERROR: db not found at /path/to/db. Check --root/--db or .env configuration.

  instead of silently creating an empty DB and then failing with `id not found`.
Maintenance operations:
- `reindex --check` – report missing fields/indexes
- `reindex --fix-missing` – regen fields/indexes using current generator
- `reindex --rebuild --fts` – rebuild FTS index (via `clawsqlite index rebuild`)
- `reindex --rebuild --vec` – clear vec index only (no embedding); use `clawsqlite knowledge embed-from-summary` to refill embeddings
The check output includes flags like vec_available and embedding_enabled to help you
understand whether vec features are actually usable for the current DB.
clawsqlite can build topic-like clusters from existing articles_vec +
articles_tag_vec.
clawsqlite knowledge build-interest-clusters \
--db /path/to/clawkb.sqlite3 \
--algo kmeans++ \
--use-pca \
--pca-explained-variance-threshold 0.95 \
--min-cluster-size 8 \
--max-clusters 50

You can switch backend:
clawsqlite knowledge build-interest-clusters \
--db /path/to/clawkb.sqlite3 \
--algo hierarchical \
--hierarchical-linkage average \
--hierarchical-distance-threshold 0.20

For each eligible article (undeleted, non-empty summary, at least one vec):
- If both `summary_vec` and `tag_vec` exist:
  - `summary_vec = L2(summary_vec)`
  - `tag_vec = L2(tag_vec)`
  - `mixed = (1-tag_weight)*summary_vec + tag_weight*tag_vec`
  - `interest_vec_1024 = L2(mixed)`
- If only one branch exists, use that branch after L2 normalization.
tag_weight is controlled by CLAWSQLITE_INTEREST_TAG_WEIGHT (default 0.75).
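The mixing steps can be sketched in plain Python (illustrative; the real code operates on the 1024-d vectors stored in the vec0 tables):

```python
import math

def l2(v):
    """L2-normalize a vector; zero vectors pass through unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def interest_vector(summary_vec=None, tag_vec=None, tag_weight=0.75):
    """Blend normalized summary/tag vectors; single branches pass through."""
    if summary_vec is not None and tag_vec is not None:
        s, t = l2(summary_vec), l2(tag_vec)
        mixed = [(1 - tag_weight) * a + tag_weight * b for a, b in zip(s, t)]
        return l2(mixed)
    branch = summary_vec if summary_vec is not None else tag_vec
    return l2(branch) if branch is not None else None
```

Because both branches are normalized before mixing, `tag_weight` controls the blend directly rather than being distorted by differing vector magnitudes.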
- PCA is optional (`CLAWSQLITE_INTEREST_USE_PCA=true/false`).
- PCA dimension is auto-selected by cumulative explained variance threshold: `CLAWSQLITE_INTEREST_PCA_EXPLAINED_VARIANCE_THRESHOLD` (e.g. `0.90` / `0.95`).
- Clustering can run in PCA space, but persisted centroids are always recomputed in the original 1024-d space.
Backends:
- `kmeans++`
  - standard kmeans++ initialization
  - initial `k0 = min(max_clusters, max(1, floor(n/min_cluster_size)))`
  - supports optional post-merge (`enable_post_merge` + cosine threshold)
- `hierarchical`
  - linkage: `average` or `complete`
  - cut by distance threshold (cosine-distance semantics)
  - falls back to a pure-Python implementation when `scipy` is unavailable
  - no extra post-merge by default
After initial labels from either backend:
- Small-cluster reassignment: clusters with size `< min_cluster_size` are reassigned to the nearest large cluster using original 1024-d vectors.
- Final centroids: recompute the centroid in original 1024-d space, then L2 normalize.
- Stable cluster IDs: sort by size desc, tie-break by min article_id, renumber to `1..K`.
- Write DB: `interest_clusters`, `interest_cluster_members` (membership=1.0), `interest_meta` (build timestamp + algo/pca/tag-weight metadata)
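The stable-ID renumbering rule can be sketched as follows (illustrative helper, not the actual implementation):

```python
def renumber_clusters(labels, article_ids):
    """Renumber raw cluster labels to stable IDs 1..K.

    Clusters are ordered by size descending, tie-broken by the smallest
    member article_id, so repeated builds over the same data yield the
    same numbering.
    """
    members = {}
    for label, aid in zip(labels, article_ids):
        members.setdefault(label, []).append(aid)
    order = sorted(members, key=lambda l: (-len(members[l]), min(members[l])))
    mapping = {old: new for new, old in enumerate(order, start=1)}
    return [mapping[l] for l in labels]
```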
- `CLAWSQLITE_INTEREST_CLUSTER_ALGO` (`kmeans++` | `hierarchical`)
- `CLAWSQLITE_INTEREST_TAG_WEIGHT` (0..1)
- `CLAWSQLITE_INTEREST_USE_PCA` (true/false)
- `CLAWSQLITE_INTEREST_PCA_EXPLAINED_VARIANCE_THRESHOLD`
- `CLAWSQLITE_INTEREST_MIN_SIZE`
- `CLAWSQLITE_INTEREST_MAX_CLUSTERS`
- `CLAWSQLITE_INTEREST_KMEANS_RANDOM_STATE`
- `CLAWSQLITE_INTEREST_KMEANS_N_INIT`
- `CLAWSQLITE_INTEREST_KMEANS_MAX_ITER`
- `CLAWSQLITE_INTEREST_ENABLE_POST_MERGE`
- `CLAWSQLITE_INTEREST_MERGE_DISTANCE`
- `CLAWSQLITE_INTEREST_HIERARCHICAL_LINKAGE`
- `CLAWSQLITE_INTEREST_HIERARCHICAL_DISTANCE_THRESHOLD`
To tune parameters and inspect the interest cluster structure, use the built-in analysis command:

clawsqlite knowledge inspect-interest-clusters \
  --db /path/to/clawkb.sqlite3 \
  --vec-dim 1024

Note: this command depends on numpy; generating the PNG plot additionally requires matplotlib.

- Stats only: `pip install 'clawsqlite[analysis]'`
- Stats + plotting: `pip install 'clawsqlite[analysis,plot]'`

This command will:

- Read the current clusters from `interest_clusters` / `interest_cluster_members`;
- Rebuild the interest vectors the same way cluster building does;
- Print for each cluster: `size / n_members / mean_radius / max_radius`;
- Print the `min / max / median` of the `1 - cos` distances between cluster centroids;
- If `matplotlib` is available, generate a 2D PCA scatter plot:
  - Point position: the centroid's PCA coordinates (PC1, PC2);
  - Point size: proportional to cluster size (square-root scaling to compress the dynamic range);
  - Point color: the cluster's `mean_radius`, using the `viridis` colormap, with the colorbar labeled `mean_radius (1 - cos)`;
  - The PNG is written to the current working directory by default: `./interest_clusters_pca.png`.

You can add `--no-plot` to print numeric stats only, without generating an image:

clawsqlite knowledge inspect-interest-clusters \
  --db /path/to/clawkb.sqlite3 \
  --vec-dim 1024 \
  --no-plot

Practical tuning tips:

- First run `build-interest-clusters` on your current KB, then run `inspect-interest-clusters` to look at cluster sizes, radii, and the distances between centroids;
- Use the helper script (`tests/test_interest_merge_suggest.py`) or similar logic to estimate a suitable `CLAWSQLITE_INTEREST_MERGE_DISTANCE` (e.g. 0.07) as roughly `alpha * median(mean_radius)`;
- After adjusting `CLAWSQLITE_INTEREST_MIN_SIZE` / `CLAWSQLITE_INTEREST_MAX_CLUSTERS` / `CLAWSQLITE_INTEREST_MERGE_DISTANCE` in `.env`, rebuild and use `inspect-interest-clusters` to observe how the structure changes.
clawsqlite knowledge maintenance prune --days 3 --dry-run

This scans for orphaned files, old .bak_* backups, and broken DB paths.
Use --dry-run to preview deletions; maintenance gc is an alias of prune.
Markdown files are named:
<id:06d>__<slugified-title>.md
- `id` comes from the `articles` table
- `slugified-title` is derived from the title:
  - If `pypinyin` is available, CJK tokens are converted to pinyin;
  - ASCII letters/digits are preserved; other symbols become `-`;
  - Repeated `-` are collapsed; empty results fall back to `untitled`.
For CJK titles, the filename is typically a pinyin‑based slug. If pypinyin
is not installed, it may fall back to untitled. This does not affect functionality:
- The real title is stored in the DB
- Search operates on DB fields and FTS, not filenames
We may refine this strategy in future (e.g. supporting CJK slugs or ID‑only filenames), but the current format is stable and works well with existing tools.
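A minimal sketch of the naming scheme above (illustrative; `pinyin_fn` is a hypothetical hook standing in for the optional pypinyin conversion):

```python
import re

def slugify_title(title, pinyin_fn=None):
    """Keep ASCII letters/digits; other runs become '-'; fall back to
    'untitled' when nothing survives. CJK handling is delegated to an
    optional pinyin conversion function."""
    if pinyin_fn is not None:
        title = pinyin_fn(title)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return slug or "untitled"

def article_filename(article_id, title, pinyin_fn=None):
    """Markdown filename: zero-padded 6-digit id, '__', slug, '.md'."""
    return f"{article_id:06d}__{slugify_title(title, pinyin_fn)}.md"
```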
The knowledge app provided by clawsqlite is a local knowledge-base CLI built on SQLite + FTS5 + sqlite-vec. Key features:

- Article metadata is stored in the `articles` table; full-text indexing uses FTS5; vector search uses sqlite-vec
- Each article is also written as a markdown file containing `--- METADATA ---` and `--- MARKDOWN ---` sections
- Supported commands:
  - `ingest`: ingest from a URL or raw text
  - `search`: FTS / vector / hybrid modes
  - `show` / `export` / `update` / `delete` / `reindex` / `maintenance`
1. Clone the repo:

   git clone git@github.com:ernestyu/clawsqlite.git
   cd clawsqlite

2. Configure the `CLAWSQLITE_ROOT` / `CLAWSQLITE_DB` paths and the EMBEDDING_* / SMALL_LLM_* variables in your runtime environment, or pass `--root` / `--db` / `--articles-dir` on the command line. In OpenClaw this is usually done in the agent's env config.

3. First ingest (text):

   clawsqlite knowledge ingest \
     --text "Hello, clawsqlite" \
     --title "First note" \
     --category test \
     --tags demo \
     --gen-provider off

4. Search:

   clawsqlite knowledge search "clawsqlite" --mode fts
When you know an article has been updated and it was previously ingested by URL, use:

clawsqlite knowledge ingest \
  --url "https://example.com/article" \
  --update-existing

Behavior:

- If the `source_url` already exists: update the same record (title/summary/tags/category/priority) and rewrite the markdown + indexes
- If the `source_url` does not exist: ingest a new record as usual

`articles.source_url` has a unique constraint for non-empty, non-Local URLs, guaranteeing that each URL maps to at most one record.
If you run into problems or have suggestions while using clawsqlite, feel free to open an issue / PR in the repo.
MIT © Ernest Yu