From 2f36ee5a38543cbe0e7cf6c5df352e306389bf85 Mon Sep 17 00:00:00 2001
From: aaltshuler
Date: Sun, 14 Jun 2026 14:44:42 +0300
Subject: [PATCH] =?UTF-8?q?docs(user):=20add=20task=20guides=20=E2=80=94?=
 =?UTF-8?q?=20hybrid=20search,=20cluster=20on=20S3,=20review=20workflow=20?=
 =?UTF-8?q?(Phase=203b)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Four new pages under docs/user/guides/, each a runnable, code-verified command
sequence that composes the reference docs into a real workflow:

- guides/hybrid-search.md — schema with a @embed vector + text body, load, then
  a query fusing bm25 and nearest with rrf. Notes that indexes are engine-
  maintained (no manual build step) and links embeddings.md for the provider env.
- guides/cluster-on-s3.md — cluster.yaml with a storage: s3:// root, the
  validate→import→plan→apply flow, loading via the graph's storage URI, and
  config-free serving with `omnigraph-server --cluster s3://…`.
- guides/review-workflow.md — load onto a branch with --from, inspect it with
  --branch reads / commit list, merge with --into, then delete + cleanup.
- guides/index.md — the section landing page.

Every command was checked against crates/omnigraph-cli/src/cli.rs (e.g. caught
that `load` has no --cluster/--cluster-graph — those are storage-plane only —
and used the positional storage URI instead). Wired into docs/user/index.md
(new Guides section) and AGENTS.md's topic table.

Verified: zero broken links; check-agents-md.sh green (61 links, 58 docs).
Co-Authored-By: Claude Opus 4.8
---
 AGENTS.md                           |  1 +
 docs/user/guides/cluster-on-s3.md   | 98 ++++++++++++++++++++++++++++
 docs/user/guides/hybrid-search.md   | 99 +++++++++++++++++++++++++++++
 docs/user/guides/index.md           | 14 ++++
 docs/user/guides/review-workflow.md | 63 ++++++++++++++++++
 docs/user/index.md                  | 11 ++++
 6 files changed, 286 insertions(+)
 create mode 100644 docs/user/guides/cluster-on-s3.md
 create mode 100644 docs/user/guides/hybrid-search.md
 create mode 100644 docs/user/guides/index.md
 create mode 100644 docs/user/guides/review-workflow.md

diff --git a/AGENTS.md b/AGENTS.md
index 065e28aa..9e867919 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -101,6 +101,7 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec
 | Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
 | Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
 | Deployment (binary / container / RustFS bootstrap / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
+| Task guides (hybrid search, cluster on S3, review workflow) | [docs/user/guides/index.md](docs/user/guides/index.md) |
 | CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
 | Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
 | Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
diff --git a/docs/user/guides/cluster-on-s3.md b/docs/user/guides/cluster-on-s3.md
new file mode 100644
index 00000000..7ef77da4
--- /dev/null
+++ b/docs/user/guides/cluster-on-s3.md
@@ -0,0 +1,98 @@
+# Run a Cluster on S3
+
+This guide takes a cluster from a local config directory to a server that boots
+**config-free from an object-storage bucket** — the bucket is the whole
+deployment artifact. For the full control-plane reference, see
+[operating a cluster](../clusters/index.md) and
+[cluster config](../clusters/config.md).
+
+## 1. Declare the cluster
+
+Lay out a config directory. The one S3-specific line is `storage:` — it puts the
+state ledger, catalog, and graph data on the bucket instead of in the folder:
+
+```
+company-brain/
+├── cluster.yaml
+├── people.pg
+├── queries/
+│   └── people.gq
+└── base.policy.yaml
+```
+
+```yaml
+# cluster.yaml
+version: 1
+storage: s3://my-bucket/clusters/company-brain # the deployment lives here
+metadata:
+  name: company-brain
+graphs:
+  knowledge:
+    schema: people.pg
+    queries: queries/
+policies:
+  base:
+    file: base.policy.yaml
+    applies_to: [knowledge]
+```
+
+Set the S3 credentials in the environment (for a non-AWS S3-compatible store such
+as MinIO or RustFS, also set `AWS_ENDPOINT_URL_S3`):
+
+```bash
+export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=us-east-1
+# export AWS_ENDPOINT_URL_S3=https://... # non-AWS S3-compatible stores
+```
+
+## 2. Validate, plan, apply
+
+`apply` is the only command that changes the world; `plan` previews it:
+
+```bash
+omnigraph cluster validate --config company-brain # parse + typecheck
+omnigraph cluster import --config company-brain # create the state ledger
+omnigraph cluster plan --config company-brain # preview the diff
+omnigraph cluster apply --config company-brain # converge onto the bucket
+```
+
+`apply` creates the graph at the derived root
+(`s3://my-bucket/clusters/company-brain/graphs/knowledge.omni`), applies its
+schema, and publishes the query and policy into the content-addressed catalog.
+`converged: true` means there is nothing left to do — re-running `apply` is always
+safe.
+
+## 3. Load data
+
+The control plane manages *definitions*; rows go through the normal data plane.
+Address the graph by its storage URI (the derived `graphs/.omni` root):
+
+```bash
+omnigraph load --data seed.jsonl --mode overwrite \
+  s3://my-bucket/clusters/company-brain/graphs/knowledge.omni
+```
+
+## 4. Serve config-free from the bucket
+
+A serving host needs only the storage-root URI and credentials — no checkout of
+the config repo:
+
+```bash
+OMNIGRAPH_SERVER_BEARER_TOKENS_JSON='{"act-reader":"s3cret"}' \
+  omnigraph-server --cluster s3://my-bucket/clusters/company-brain --bind 0.0.0.0:8080
+```
+
+The server boots from the **applied revision** recorded in the ledger — never from
+config that was merely written. Roll out a change by `apply`-ing again, then
+restarting replicas.
+
+## 5. Maintain it
+
+Storage maintenance runs out-of-band, addressed by cluster + graph name (it
+resolves the graph's storage URI from the served state):
+
+```bash
+omnigraph optimize --cluster company-brain --cluster-graph knowledge
+omnigraph cleanup --cluster company-brain --cluster-graph knowledge --keep 10 --confirm
+```
+
+See [maintenance](../operations/maintenance.md) for what each command does.
diff --git a/docs/user/guides/hybrid-search.md b/docs/user/guides/hybrid-search.md
new file mode 100644
index 00000000..ecba4402
--- /dev/null
+++ b/docs/user/guides/hybrid-search.md
@@ -0,0 +1,99 @@
+# Hybrid Search End to End
+
+This guide builds a small document graph and runs a **hybrid** query that fuses
+full-text (BM25) and vector (k-NN) rankings with Reciprocal Rank Fusion. You do
+not build indexes by hand — the engine maintains them; a freshly loaded row is
+searchable immediately.
+
+See [search](../search/index.md) for the function reference and
+[embeddings](../search/embeddings.md) for the full provider/env matrix.
+
+## 1. Schema
+
+A document with a text body for full-text search and a vector for similarity.
+`@embed("body")` tells the engine to embed the `body` text into `embedding` at
+load time:
+
+```
+node Document {
+  title: String,
+  body: String,
+  embedding: Vector(768) @embed("body"),
+}
+```
+
+```bash
+omnigraph init --schema schema.pg docs.omni
+```
+
+## 2. Configure embeddings
+
+Ingest-time embedding uses the engine's embedding client. Point it at your
+provider (see [embeddings](../search/embeddings.md) for every variable):
+
+```bash
+export GEMINI_API_KEY=... # ingest-time document embeddings
+# For local experimentation without a provider, deterministic mock vectors:
+# export OMNIGRAPH_EMBEDDINGS_MOCK=1 NANOGRAPH_EMBEDDINGS_MOCK=1
+```
+
+If you would rather supply vectors yourself, drop `@embed` and include the
+`embedding` array in each input record instead.
+
+## 3. Load
+
+```bash
+omnigraph load --data docs.jsonl --mode overwrite docs.omni
+```
+
+Each row's `body` is embedded into `embedding` as it loads. The BM25 (full-text)
+and vector indexes are maintained by the engine — there is no separate build step.
+
+## 4. Query — full-text, vector, then hybrid
+
+Full-text only:
+
+```gq
+query text_search($q: String) {
+  match { $d: Document { } }
+  return { $d.title, bm25($d.body, $q) as score }
+  order { score desc }
+  limit 10
+}
+```
+
+Vector only (the query text is embedded at query time; `nearest` requires a
+`limit`):
+
+```gq
+query vector_search($q: String) {
+  match { $d: Document { } }
+  return { $d.title, nearest($d.embedding, $q) as score }
+  order { score desc }
+  limit 10
+}
+```
+
+Hybrid — fuse both rankings with `rrf`:
+
+```gq
+query hybrid($q: String) {
+  match { $d: Document { } }
+  return {
+    $d.title,
+    rrf( nearest($d.embedding, $q), bm25($d.body, $q) ) as score
+  }
+  order { score desc }
+  limit 10
+}
+```
+
+Run it:
+
+```bash
+omnigraph read --query queries.gq --name hybrid \
+  --params '{"q":"trends in AI safety"}' --format table docs.omni
+```
+
+`rrf` combines the two rankings without needing their score scales to match, so
+you get a single fused ordering from a lexical signal and a semantic one.
diff --git a/docs/user/guides/index.md b/docs/user/guides/index.md
new file mode 100644
index 00000000..dfb684fd
--- /dev/null
+++ b/docs/user/guides/index.md
@@ -0,0 +1,14 @@
+# Guides
+
+Task-oriented walkthroughs that compose the building blocks from the reference
+docs into real workflows. Each one is a runnable sequence of commands.
+
+- [Hybrid search end to end](hybrid-search.md) — combine full-text and vector
+  search in one query.
+- [Run a cluster on S3](cluster-on-s3.md) — go from a config directory to a
+  config-free server booting from a bucket.
+- [Branch-based review workflow](review-workflow.md) — stage data on a branch,
+  review it, and merge.
+
+New to OmniGraph? Start with the [quickstart](../quickstart.md) and
+[concepts](../concepts/index.md) first.
diff --git a/docs/user/guides/review-workflow.md b/docs/user/guides/review-workflow.md
new file mode 100644
index 00000000..3d648594
--- /dev/null
+++ b/docs/user/guides/review-workflow.md
@@ -0,0 +1,63 @@
+# Branch-Based Review Workflow
+
+Branches let you stage changes off `main`, inspect them in isolation, and merge
+only once they look right — Git-style, atomic across the whole graph. This guide
+walks a typical "review an incoming batch before it hits main" flow.
+
+See [branches & commits](../branching/index.md) and [merging](../branching/merge.md)
+for the underlying model.
+
+## 1. Stage the batch on its own branch
+
+Loading into a branch that does not exist is an error unless you pass `--from`,
+which forks it from a base first. So one command both forks the branch and loads
+into it:
+
+```bash
+omnigraph load --data batch.jsonl --mode merge \
+  --branch review/2026-04-25 --from main graph.omni
+```
+
+(Equivalently, create the branch first with
+`omnigraph branch create review/2026-04-25 --from main graph.omni`, then `load`
+without `--from`.)
+
+`main` is untouched — the batch lives only on `review/2026-04-25`.
+
+## 2. Inspect the branch in isolation
+
+Run any read query against the branch with `--branch`:
+
+```bash
+omnigraph read --query checks.gq --name count_by_type \
+  --branch review/2026-04-25 --format table graph.omni
+```
+
+Compare it against `main` — list each branch's commits, or diff them:
+
+```bash
+omnigraph branch list graph.omni
+omnigraph commit list --branch review/2026-04-25 graph.omni
+```
+
+## 3. Merge when it looks right
+
+```bash
+omnigraph branch merge review/2026-04-25 --into main graph.omni
+```
+
+The merge is three-way and atomic. If both `main` and the branch changed the same
+data incompatibly, the merge fails with a structured list of conflicts and
+publishes nothing — resolve them and re-merge. See
+[merging](../branching/merge.md) for the conflict kinds.
+
+## 4. Clean up
+
+Once merged, delete the review branch:
+
+```bash
+omnigraph branch delete review/2026-04-25 graph.omni
+```
+
+Branch storage is reclaimed; if a transient error interrupts reclamation, the
+[`cleanup`](../operations/maintenance.md) command sweeps the leftovers later.
diff --git a/docs/user/index.md b/docs/user/index.md
index cabd98a0..80c844dc 100644
--- a/docs/user/index.md
+++ b/docs/user/index.md
@@ -65,6 +65,17 @@ start with install, then follow the section that matches your task.
 | Understand graph layout and URI support | [concepts/storage.md](concepts/storage.md) |
 | Look up constants and tunables | [reference/constants.md](reference/constants.md) |
 
+## Guides
+
+Task-oriented walkthroughs that compose the building blocks above:
+
+| Guide | Read |
+|---|---|
+| All guides | [guides/index.md](guides/index.md) |
+| Hybrid search end to end | [guides/hybrid-search.md](guides/hybrid-search.md) |
+| Run a cluster on S3 | [guides/cluster-on-s3.md](guides/cluster-on-s3.md) |
+| Branch-based review workflow | [guides/review-workflow.md](guides/review-workflow.md) |
+
 ## Releases
 
 Release notes live in [releases/](../releases/). Use them for user-visible