From 2f36ee5a38543cbe0e7cf6c5df352e306389bf85 Mon Sep 17 00:00:00 2001
From: aaltshuler <andrew@collectivelab.io>
Date: Sun, 14 Jun 2026 14:44:42 +0300
Subject: [PATCH] =?UTF-8?q?docs(user):=20add=20task=20guides=20=E2=80=94?=
 =?UTF-8?q?=20hybrid=20search,=20cluster=20on=20S3,=20review=20workflow=20?=
 =?UTF-8?q?(Phase=203b)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four new pages under docs/user/guides/, each a runnable, code-verified command
sequence that composes the reference docs into a real workflow:

- guides/hybrid-search.md — schema with an @embed vector + text body, load, then
  a query fusing bm25 and nearest with rrf. Notes that indexes are engine-
  maintained (no manual build step) and links embeddings.md for the provider env.
- guides/cluster-on-s3.md — cluster.yaml with a storage: s3:// root, the
  validate→import→plan→apply flow, loading via the graph's storage URI, and
  config-free serving with `omnigraph-server --cluster s3://…`.
- guides/review-workflow.md — load onto a branch with --from, inspect it with
  --branch reads / commit list, merge with --into, then delete + cleanup.
- guides/index.md — the section landing page.

Every command was checked against crates/omnigraph-cli/src/cli.rs (e.g. caught
that `load` has no --cluster/--cluster-graph — those are storage-plane only — and
used the positional storage URI instead).

Wired into docs/user/index.md (new Guides section) and AGENTS.md's topic table.

Verified: zero broken links; check-agents-md.sh green (61 links, 58 docs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 AGENTS.md                           |  1 +
 docs/user/guides/cluster-on-s3.md   | 98 ++++++++++++++++++++++++++++
 docs/user/guides/hybrid-search.md   | 99 +++++++++++++++++++++++++++++
 docs/user/guides/index.md           | 14 ++++
 docs/user/guides/review-workflow.md | 63 ++++++++++++++++++
 docs/user/index.md                  | 11 ++++
 6 files changed, 286 insertions(+)
 create mode 100644 docs/user/guides/cluster-on-s3.md
 create mode 100644 docs/user/guides/hybrid-search.md
 create mode 100644 docs/user/guides/index.md
 create mode 100644 docs/user/guides/review-workflow.md

diff --git a/AGENTS.md b/AGENTS.md
index 065e28aa..9e867919 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -101,6 +101,7 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec
 | Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
 | Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
 | Deployment (binary / container / RustFS bootstrap / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
+| Task guides (hybrid search, cluster on S3, review workflow) | [docs/user/guides/index.md](docs/user/guides/index.md) |
 | CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
 | Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
 | Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
diff --git a/docs/user/guides/cluster-on-s3.md b/docs/user/guides/cluster-on-s3.md
new file mode 100644
index 00000000..7ef77da4
--- /dev/null
+++ b/docs/user/guides/cluster-on-s3.md
@@ -0,0 +1,98 @@
+# Run a Cluster on S3
+
+This guide takes a cluster from a local config directory to a server that boots
+**config-free from an object-storage bucket** — the bucket is the whole
+deployment artifact. For the full control-plane reference, see
+[operating a cluster](../clusters/index.md) and
+[cluster config](../clusters/config.md).
+
+## 1. Declare the cluster
+
+Lay out a config directory. The one S3-specific line is `storage:` — it puts the
+state ledger, catalog, and graph data on the bucket instead of in the folder:
+
+```
+company-brain/
+├── cluster.yaml
+├── people.pg
+├── queries/
+│   └── people.gq
+└── base.policy.yaml
+```
+
+```yaml
+# cluster.yaml
+version: 1
+storage: s3://my-bucket/clusters/company-brain   # the deployment lives here
+metadata:
+  name: company-brain
+graphs:
+  knowledge:
+    schema: people.pg
+    queries: queries/
+policies:
+  base:
+    file: base.policy.yaml
+    applies_to: [knowledge]
+```
+
+Set the S3 credentials in the environment (for a non-AWS S3-compatible store such
+as MinIO or RustFS, also set `AWS_ENDPOINT_URL_S3`):
+
+```bash
+export AWS_ACCESS_KEY_ID=...  AWS_SECRET_ACCESS_KEY=...  AWS_REGION=us-east-1
+# export AWS_ENDPOINT_URL_S3=https://...   # non-AWS S3-compatible stores
+```
+
+## 2. Validate, import, plan, apply
+
+`apply` is the only command that changes deployed state; `plan` previews the diff:
+
+```bash
+omnigraph cluster validate --config company-brain   # parse + typecheck
+omnigraph cluster import   --config company-brain   # create the state ledger
+omnigraph cluster plan     --config company-brain   # preview the diff
+omnigraph cluster apply    --config company-brain   # converge onto the bucket
+```
+
+`apply` creates the graph at the derived root
+(`s3://my-bucket/clusters/company-brain/graphs/knowledge.omni`), applies its
+schema, and publishes the query and policy into the content-addressed catalog.
+`converged: true` means there is nothing left to do — re-running `apply` is always
+safe.
+
+## 3. Load data
+
+The control plane manages *definitions*; rows go through the normal data plane.
+Address the graph by its storage URI (the derived `graphs/<id>.omni` root):
+
+```bash
+omnigraph load --data seed.jsonl --mode overwrite \
+  s3://my-bucket/clusters/company-brain/graphs/knowledge.omni
+```
+
+## 4. Serve config-free from the bucket
+
+A serving host needs only the storage-root URI and credentials — no checkout of
+the config repo:
+
+```bash
+OMNIGRAPH_SERVER_BEARER_TOKENS_JSON='{"act-reader":"s3cret"}' \
+  omnigraph-server --cluster s3://my-bucket/clusters/company-brain --bind 0.0.0.0:8080
+```
+
+The server boots from the **applied revision** recorded in the ledger — never from
+config that was merely written. Roll out a change by `apply`-ing again, then
+restarting replicas.
+
+## 5. Maintain it
+
+Storage maintenance runs out-of-band, addressed by cluster + graph name (it
+resolves the graph's storage URI from the served state):
+
+```bash
+omnigraph optimize --cluster company-brain --cluster-graph knowledge
+omnigraph cleanup  --cluster company-brain --cluster-graph knowledge --keep 10 --confirm
+```
+
+See [maintenance](../operations/maintenance.md) for what each command does.
diff --git a/docs/user/guides/hybrid-search.md b/docs/user/guides/hybrid-search.md
new file mode 100644
index 00000000..ecba4402
--- /dev/null
+++ b/docs/user/guides/hybrid-search.md
@@ -0,0 +1,99 @@
+# Hybrid Search End to End
+
+This guide builds a small document graph and runs a **hybrid** query that fuses
+full-text (BM25) and vector (k-NN) rankings with Reciprocal Rank Fusion. You do
+not build indexes by hand — the engine maintains them; a freshly loaded row is
+searchable immediately.
+
+See [search](../search/index.md) for the function reference and
+[embeddings](../search/embeddings.md) for the full provider/env matrix.
+
+## 1. Schema
+
+A document with a text body for full-text search and a vector for similarity.
+`@embed("body")` tells the engine to embed the `body` text into `embedding` at
+load time:
+
+```
+node Document {
+  title: String,
+  body: String,
+  embedding: Vector(768) @embed("body"),
+}
+```
+
+```bash
+omnigraph init --schema schema.pg docs.omni
+```
+
+## 2. Configure embeddings
+
+Ingest-time embedding uses the engine's embedding client. Point it at your
+provider (see [embeddings](../search/embeddings.md) for every variable):
+
+```bash
+export GEMINI_API_KEY=...        # ingest-time document embeddings
+# For local experimentation without a provider, deterministic mock vectors:
+# export OMNIGRAPH_EMBEDDINGS_MOCK=1 NANOGRAPH_EMBEDDINGS_MOCK=1
+```
+
+If you would rather supply vectors yourself, drop `@embed` and include the
+`embedding` array in each input record instead.
+
+## 3. Load
+
+```bash
+omnigraph load --data docs.jsonl --mode overwrite docs.omni
+```
+
+Each row's `body` is embedded into `embedding` as it loads. The BM25 (full-text)
+and vector indexes are maintained by the engine — there is no separate build step.
+
+## 4. Query — full-text, vector, then hybrid
+
+Full-text only:
+
+```gq
+query text_search($q: String) {
+  match { $d: Document { } }
+  return { $d.title, bm25($d.body, $q) as score }
+  order { score desc }
+  limit 10
+}
+```
+
+Vector only (the query text is embedded at query time; `nearest` requires a
+`limit`):
+
+```gq
+query vector_search($q: String) {
+  match { $d: Document { } }
+  return { $d.title, nearest($d.embedding, $q) as score }
+  order { score desc }
+  limit 10
+}
+```
+
+Hybrid — fuse both rankings with `rrf`:
+
+```gq
+query hybrid($q: String) {
+  match { $d: Document { } }
+  return {
+    $d.title,
+    rrf( nearest($d.embedding, $q), bm25($d.body, $q) ) as score
+  }
+  order { score desc }
+  limit 10
+}
+```
+
+Run it:
+
+```bash
+omnigraph read --query queries.gq --name hybrid \
+  --params '{"q":"trends in AI safety"}' --format table docs.omni
+```
+
+`rrf` combines the two rankings without needing their score scales to match, so
+you get a single fused ordering from a lexical signal and a semantic one.
diff --git a/docs/user/guides/index.md b/docs/user/guides/index.md
new file mode 100644
index 00000000..dfb684fd
--- /dev/null
+++ b/docs/user/guides/index.md
@@ -0,0 +1,14 @@
+# Guides
+
+Task-oriented walkthroughs that compose the building blocks from the reference
+docs into real workflows. Each one is a runnable sequence of commands.
+
+- [Hybrid search end to end](hybrid-search.md) — combine full-text and vector
+  search in one query.
+- [Run a cluster on S3](cluster-on-s3.md) — go from a config directory to a
+  config-free server booting from a bucket.
+- [Branch-based review workflow](review-workflow.md) — stage data on a branch,
+  review it, and merge.
+
+New to OmniGraph? Start with the [quickstart](../quickstart.md) and
+[concepts](../concepts/index.md) first.
diff --git a/docs/user/guides/review-workflow.md b/docs/user/guides/review-workflow.md
new file mode 100644
index 00000000..3d648594
--- /dev/null
+++ b/docs/user/guides/review-workflow.md
@@ -0,0 +1,63 @@
+# Branch-Based Review Workflow
+
+Branches let you stage changes off `main`, inspect them in isolation, and merge
+only once they look right — Git-style, atomic across the whole graph. This guide
+walks a typical "review an incoming batch before it hits main" flow.
+
+See [branches & commits](../branching/index.md) and [merging](../branching/merge.md)
+for the underlying model.
+
+## 1. Stage the batch on its own branch
+
+Loading into a branch that does not exist is an error unless you pass `--from`,
+which forks it from a base first. So one command both forks the branch and loads
+into it:
+
+```bash
+omnigraph load --data batch.jsonl --mode merge \
+  --branch review/2026-04-25 --from main graph.omni
+```
+
+(Equivalently, create the branch first with
+`omnigraph branch create review/2026-04-25 --from main graph.omni`, then `load`
+without `--from`.)
+
+`main` is untouched — the batch lives only on `review/2026-04-25`.
+
+## 2. Inspect the branch in isolation
+
+Run any read query against the branch with `--branch`:
+
+```bash
+omnigraph read --query checks.gq --name count_by_type \
+  --branch review/2026-04-25 --format table graph.omni
+```
+
+Compare it against `main`: list the branches, then the review branch's commits:
+
+```bash
+omnigraph branch list graph.omni
+omnigraph commit list --branch review/2026-04-25 graph.omni
+```
+
+## 3. Merge when it looks right
+
+```bash
+omnigraph branch merge review/2026-04-25 --into main graph.omni
+```
+
+The merge is three-way and atomic. If both `main` and the branch changed the same
+data incompatibly, the merge fails with a structured list of conflicts and
+publishes nothing — resolve them and re-merge. See
+[merging](../branching/merge.md) for the conflict kinds.
+
+## 4. Clean up
+
+Once merged, delete the review branch:
+
+```bash
+omnigraph branch delete review/2026-04-25 graph.omni
+```
+
+Branch storage is reclaimed; if a transient error interrupts reclamation, the
+[`cleanup`](../operations/maintenance.md) command sweeps the leftovers later.
diff --git a/docs/user/index.md b/docs/user/index.md
index cabd98a0..80c844dc 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -65,6 +65,17 @@ start with install, then follow the section that matches your task.
 | Understand graph layout and URI support | [concepts/storage.md](concepts/storage.md) |
 | Look up constants and tunables | [reference/constants.md](reference/constants.md) |
 
+## Guides
+
+Task-oriented walkthroughs that compose the building blocks above:
+
+| Guide | Read |
+|---|---|
+| All guides | [guides/index.md](guides/index.md) |
+| Hybrid search end to end | [guides/hybrid-search.md](guides/hybrid-search.md) |
+| Run a cluster on S3 | [guides/cluster-on-s3.md](guides/cluster-on-s3.md) |
+| Branch-based review workflow | [guides/review-workflow.md](guides/review-workflow.md) |
+
 ## Releases
 
 Release notes live in [releases/](../releases/). Use them for user-visible