48 changes: 46 additions & 2 deletions CHANGELOG.md
@@ -9,7 +9,46 @@ Versioning: [Semantic Versioning](https://semver.org/spec/v2.0.0.html) — all `

## [Unreleased]

Open roadmap.
### Fixed

- **Web search now actually applies `CEREFOX_MIN_SEARCH_SCORE`.** The v0.10.1 fix was
incomplete: the web UI defaults to `docs` mode, but only the `hybrid` branch in
`discovery.ts` was updated — the `docs` branch still passed `p_min_score: 0.0` (a
`replace_all` missed it due to a different indent). The default web search therefore
applied no threshold. Both branches now use `getMinSearchScore()`. (Note: in hybrid/docs,
the threshold filters *vector-only* matches; FTS keyword matches still pass by design.)
- **CLI honors `CEREFOX_MAX_RESPONSE_BYTES`.** The CLI already enforced a response byte budget
(`--max-bytes`) but ignored the env var; the `--max-bytes` default now reads
`CEREFOX_MAX_RESPONSE_BYTES` (falling back to 200000). Corrected CLAUDE.md: the budget applies
to MCP/EF **and** the CLI; only the web UI is unlimited.
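A minimal sketch of the threshold semantics described in the first note above: vector-only matches below the minimum score are dropped, while keyword (FTS) matches always pass. The `SearchHit` shape, the field names, and `applyMinScore` are hypothetical and exist only for illustration; in Cerefox the threshold is passed to the search RPC as `p_min_score`, and whether the comparison is inclusive is an assumption here.

```typescript
// Hypothetical result shape for illustration; not the actual Cerefox types.
interface SearchHit {
  id: string;
  vectorScore: number; // similarity score from the vector side
  ftsMatched: boolean; // true if the full-text (keyword) side matched
}

// Keep every keyword match; keep vector-only matches only at or above minScore.
// Assumption: the comparison is inclusive (>=).
function applyMinScore(hits: SearchHit[], minScore: number): SearchHit[] {
  return hits.filter((h) => h.ftsMatched || h.vectorScore >= minScore);
}

const hits: SearchHit[] = [
  { id: "a", vectorScore: 0.82, ftsMatched: false }, // strong vector-only match
  { id: "b", vectorScore: 0.31, ftsMatched: false }, // weak vector-only match
  { id: "c", vectorScore: 0.12, ftsMatched: true },  // keyword match, weak vector score
];

// With a threshold of 0.5, "b" is dropped; "c" survives because it matched by keyword.
console.log(applyMinScore(hits, 0.5).map((h) => h.id));

// With a threshold of 0.0 (the value the `docs` branch used to pass), nothing is filtered.
console.log(applyMinScore(hits, 0.0).map((h) => h.id));
```

A threshold of `0.0` filters nothing, which is why the unfixed `docs` branch behaved as if no threshold were configured.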

### Security

- **Local container binds to `127.0.0.1` by default** (was `0.0.0.0`), so a single-user
self-hosted backend isn't exposed on the LAN. Opt in with `CEREFOX_LOCAL_BIND=0.0.0.0`.
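The defaulting rule above can be sketched as a small pure function. This is an illustration only: the helper name `resolveBindAddress` and the `Env` type are hypothetical, and the changelog does not show where (installer script, compose file, or container entrypoint) the real logic lives. Treating an empty or whitespace-only value as unset is also an assumption.

```typescript
// Hypothetical environment shape; in a real process this would be process.env.
type Env = Record<string, string | undefined>;

// Resolve the host address to bind: loopback unless explicitly overridden.
// Assumption: an empty or whitespace-only override is treated as "not set".
function resolveBindAddress(env: Env): string {
  const override = env["CEREFOX_LOCAL_BIND"]?.trim();
  return override ? override : "127.0.0.1";
}

console.log(resolveBindAddress({}));                                // loopback default
console.log(resolveBindAddress({ CEREFOX_LOCAL_BIND: "0.0.0.0" })); // explicit LAN opt-in
```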

### Docs

- World-B (local/self-hosted) coverage across the guides: `upgrading.md`
(`cerefox-local upgrade`), `operational-cost.md` (fully-local scenario — no Supabase/EF
cost), `access-paths.md` (in-container PostgREST + docker-exec MCP; token never leaves
the container), `connect-agents.md` (`cerefox-local configure-agent` / `cerefox-local mcp`).

### Added — local backend (World B), continued

- **`cerefox-local configure-agent --tool <client>`** now wires non-Claude clients too
(Claude Desktop, Cursor, Codex, Gemini), not just Claude Code. It reuses the bundled
config writers via a one-shot `docker run` (the bin gains a `--local` flag that points
the MCP entry at the `cerefox-local mcp` shim); Claude Code still goes through
`claude mcp add` on the host.
- **Shell completion is program-name aware + auto-installed.** `cerefox completion <shell>`
emits a script bound to the actual program name, so `cerefox-local completion <shell>`
produces a working `cerefox-local` completion that doesn't clash with the cloud `cerefox`
one (functions + bindings namespaced; cloud output unchanged). `install-local.sh` now
wires it up host-side (best-effort, idempotent) — generating the script from the
container and sourcing it from your shell rc, mirroring the cloud installer + printing an
"exec $shell" hint. (The `completion install` subcommand itself can't be used for World B
— proxied into the container, it would write inside it — hence the host-side wiring.)

---

@@ -37,7 +76,12 @@ the local/World-B container:
### Changed — local backend (World B) polish

- `install-local.sh` **auto-selects a free host port** (steps `+10` past a busy port, and
past `8000` when a cloud install shares that default) instead of silently colliding.
past `8000` when a cloud install shares that default) instead of silently colliding;
clearer message distinguishing "in use" from "avoiding the cloud default".
- **`cerefox-local start`/`upgrade`/`init` re-check the port at bring-up time** and step
`+10` to a free one (persisting it to `~/.cerefox/local/.env`) if the stored port was
taken since last run — so a port grabbed by something else doesn't leave the server
failing to bind. Only the container-(re)starting verbs do this; proxied KB commands don't.
- Detect-and-guide when Docker is missing or its daemon is stopped (no auto-install).
- World-B users can put the `CEREFOX_*` tuning overrides above in `~/.cerefox/local/.env`;
they're forwarded into the container (apply with `cerefox-local init`).
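The port-stepping behavior in the bullets above could look roughly like the sketch below. It is written in TypeScript for illustration even though the installer is a shell script; `pickFreePort`, its parameters, and the injected `isBusy` probe are hypothetical names, and the real script may probe ports differently.

```typescript
// Returns the first usable port, starting at `preferred` and stepping by `step`.
// `isBusy` is an injected probe (hypothetical) so the stepping logic stays testable;
// `avoid` holds ports to skip even when free, e.g. the cloud install's default.
function pickFreePort(
  preferred: number,
  isBusy: (port: number) => boolean,
  avoid: ReadonlySet<number> = new Set<number>(),
  step = 10,
  maxPort = 65535,
): number {
  for (let port = preferred; port <= maxPort; port += step) {
    if (!avoid.has(port) && !isBusy(port)) return port;
  }
  throw new Error(`no free port found from ${preferred} in steps of ${step}`);
}

// 8000 is taken by another process, so the next candidate is 8010.
const busy = new Set<number>([8000]);
console.log(pickFreePort(8000, (p) => busy.has(p)));

// 8000 is free but reserved for a cloud install sharing that default, so it is skipped too.
console.log(pickFreePort(8000, () => false, new Set<number>([8000])));
```

Keeping "busy" and "avoid" as separate inputs mirrors the two cases the installer message distinguishes: a port that is in use versus one skipped to stay clear of the cloud default.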
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -420,4 +420,4 @@ These live in `docs/guides/` and are written for someone who has never seen the
- **Agent guides**: `AGENT_GUIDE.md` (comprehensive reference for AI agents using Cerefox tools), `AGENT_QUICK_REFERENCE.md` (minimal quick reference card -- 8 tools, key rules, workflows)
- **Schema**: `src/cerefox/db/schema.sql`
- **Config**: `.env` file or environment variables (see `src/cerefox/config.py`)
- **Max response size**: defaults to 200000 bytes (MCP/Edge Function paths only; web UI and CLI are unlimited; configurable via `CEREFOX_MAX_RESPONSE_BYTES`)
- **Max response size**: defaults to 200000 bytes, configurable via `CEREFOX_MAX_RESPONSE_BYTES`. Enforced on the MCP / Edge Function paths **and the CLI** (the CLI also accepts a per-call `--max-bytes`). The **web UI is unlimited** (no byte budget).
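One way to read the precedence described above is: a per-call `--max-bytes` wins, then `CEREFOX_MAX_RESPONSE_BYTES`, then the 200000-byte default. The sketch below illustrates that order; `resolveMaxResponseBytes` and the `Env` type are hypothetical names, and falling back to the default on a non-numeric or non-positive value is an assumption rather than documented behavior.

```typescript
// Hypothetical environment shape; in a real process this would be process.env.
type Env = Record<string, string | undefined>;

const DEFAULT_MAX_RESPONSE_BYTES = 200_000;

// Resolve the byte budget: explicit flag first, then the env var, then the default.
// Assumption: invalid or non-positive env values fall back to the default.
function resolveMaxResponseBytes(env: Env, flagValue?: number): number {
  if (flagValue !== undefined) return flagValue; // per-call --max-bytes wins
  const raw = env["CEREFOX_MAX_RESPONSE_BYTES"];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_RESPONSE_BYTES;
}

console.log(resolveMaxResponseBytes({}));                                              // default
console.log(resolveMaxResponseBytes({ CEREFOX_MAX_RESPONSE_BYTES: "50000" }));         // env var
console.log(resolveMaxResponseBytes({ CEREFOX_MAX_RESPONSE_BYTES: "50000" }, 10000));  // flag
```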
28 changes: 17 additions & 11 deletions CONTRIBUTING.md
@@ -20,7 +20,7 @@ The most valuable contributions fall into these categories:

**Performance and security improvements**: profiling, query optimization, security hardening, input validation.

**Ingestion formats**: ingestion is Markdown/`.txt`-only as of v0.7 (PDF/DOCX converters were dropped). If you want to support new source formats (e.g., HTML, EPUB, Notion exports, Obsidian vaults), the conversion-to-Markdown step would need to be reintroduced — open an issue to discuss before starting.
**Ingestion formats**: Markdown / `.txt` / `.docx` (`.docx` is converted to Markdown via `mammoth` on ingest; fidelity varies). **PDF is not supported** (dropped in v0.7 — convert to Markdown upstream). To add new source formats (HTML, EPUB, Notion exports, Obsidian vaults), extend the conversion step in `packages/memory/src/ingestion/file-to-markdown.ts` — open an issue to discuss before starting.

**Knowledge system integrations**: two-way sync with knowledge management systems (Obsidian, Logseq, Notion, etc.) is an area with significant potential. If you use Cerefox alongside another knowledge tool, an integration that keeps them in sync would be a meaningful contribution.

@@ -32,33 +32,39 @@ The most valuable contributions fall into these categories:

All contributions must follow Cerefox's architecture:

**Single implementation principle**: business logic lives in Postgres RPCs (`src/cerefox/db/rpcs.sql`). Python, Edge Functions, and the MCP server are thin adapters that call RPCs. Do not duplicate logic across access paths.
**Single implementation principle**: business logic lives in Postgres RPCs (`src/cerefox/db/rpcs.sql` — still the live SQL source of truth). The TS client (`packages/memory`), the Edge Functions, and the shared MCP tool handlers (`_shared/mcp-tools/`) are thin adapters that call those RPCs. Do not duplicate logic across access paths.

**Markdown-first**: all content is stored as Markdown documents. Derived structures (embeddings, indexes, metadata) are regenerable from the document corpus.

**Cloud embeddings**: Cerefox uses cloud embedding APIs (OpenAI, Fireworks AI). New embedders must implement the `Embedder` protocol in `src/cerefox/embeddings/base.py` and output 768-dimensional vectors.
**Cloud embeddings**: Cerefox uses cloud embedding APIs. The live embedder is TypeScript in `_shared/embeddings/` (OpenAI `text-embedding-3-small`, 768-dim — the only one wired today; a Fireworks/OpenAI-compatible option is roadmap, not implemented). Any embedder must output **768-dim** vectors to match the `vector(768)` schema; changing the model/dimensions is a breaking change requiring `cerefox server reindex`.
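The 768-dimension requirement above can be enforced with a guard before anything is written to a `vector(768)` column. This is a sketch under assumptions: `assertEmbeddingDim` and `EXPECTED_DIM` are hypothetical names, not identifiers from `_shared/embeddings/`.

```typescript
// Matches the vector(768) schema mentioned above; the constant name is hypothetical.
const EXPECTED_DIM = 768;

// Throw early if an embedder returns a vector of the wrong length, so a
// misconfigured model fails before any row is written.
function assertEmbeddingDim(vec: readonly number[], expected: number = EXPECTED_DIM): void {
  if (vec.length !== expected) {
    throw new Error(`embedding has ${vec.length} dimensions, expected ${expected}`);
  }
}

// A correctly sized vector passes silently.
assertEmbeddingDim(new Array<number>(768).fill(0));

// A vector of another size is rejected.
try {
  assertEmbeddingDim(new Array<number>(1536).fill(0));
} catch (err) {
  console.log((err as Error).message);
}
```

A check like this makes a model or dimension change fail loudly at embed time instead of surfacing as an insert error.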

See `docs/solution-design.md` and `docs/research/vision.md` for the full architecture and project direction.

---

## Development Setup

Cerefox is a Python + TypeScript project. As of v0.2.0, contributors need **three** runtimes installed locally:
Cerefox is a **TypeScript** project (Bun/Node). The entire runtime — CLI, MCP server, web
server, and ingestion pipeline — is TypeScript in [`@cerefox/memory`](https://www.npmjs.com/package/@cerefox/memory)
as of v0.9. Python survives **only** as a frozen, unmaintained MCP fallback (`uv run cerefox
mcp`); `uv` is optional and only needed if you want to touch that husk.

| Tool | Why | Install |
|---|---|---|
| **Python 3.11+** with [`uv`](https://docs.astral.sh/uv/) | Backend, CLI, MCP server, ingestion pipeline | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| **Node 20+** with `npm` | Frontend (React + Vite), Supabase Edge Functions | [nodejs.org](https://nodejs.org/) or `nvm install 20` |
| **[Bun](https://bun.sh) 1.x** | TypeScript scripts (`scripts/*.ts`, starting with `cut_release.ts` in v0.2.0) | `curl -fsSL https://bun.sh/install \| bash` |
| **[Bun](https://bun.sh) 1.x** | The whole TS runtime + `scripts/*.ts` + tests (`bun test`) | `curl -fsSL https://bun.sh/install \| bash` |
| **Node 20+** with `npm` | Frontend (React + Vite) build + npm publish; an alternative TS runtime | [nodejs.org](https://nodejs.org/) or `nvm install 20` |
| **Python 3.11+** with [`uv`](https://docs.astral.sh/uv/) | **Optional** — only for the legacy `uv run cerefox mcp` fallback | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |

The Bun requirement is new in v0.2.0 — see [Script-language policy](#script-language-policy-effective-from-v020) below. From v0.5.0 the local MCP server **and** the main CLI both ship as bins inside the npm package [`@cerefox/memory`](https://www.npmjs.com/package/@cerefox/memory); end users install via `npm`/`bun install -g` and don't need uv or a clone. Contributors still need all three runtimes (Python for the schema deploy + web server + ingestion pipeline until v0.6/v0.7, Node for the frontend + npm publish, Bun for TS scripts and `_shared/`/`packages/memory/` tests).
End users install via `npm`/`bun install -g @cerefox/memory` (or the one-liner installer) and
need neither `uv` nor a clone. The **local / self-hosted (Docker) backend** is separate again
— see [`docs/guides/setup-local.md`](docs/guides/setup-local.md).

```bash
# Clone and install
# Clone and install (TS deps for root + packages/memory + frontend)
git clone https://github.com/fstamatelopoulos/cerefox.git
cd cerefox
uv sync
bun install
# uv sync # OPTIONAL — only for the legacy `uv run cerefox mcp` fallback

# Run tests (`bun test` is the only runner; pytest is retired)
cd _shared && bun test # TS unit tests (mocked)
@@ -131,7 +137,7 @@ export const COMPATIBILITY = {
- **Client patch releases never raise a minimum.** A patch must run against the same server range as the minor it patches.
- Each bump is intentional and reviewed at PR time — don't raise a minimum "just because" the server moved. The minimum is the *oldest server this client still works with*, not *the newest server available*.

Two versions track the server side: the **schema version** (`@version:` marker in `src/cerefox/db/schema.sql`, covers schema + RPCs since they deploy atomically) and **`EF_VERSION`** (`_shared/ef-meta/index.ts`, covers all Edge Functions). `cut_release.ts` bumps `EF_VERSION` only when EF source changed since the last tag; the schema version is bumped by hand when `schema.sql`/`rpcs.sql` change.
Two versions track the server side: the **schema version** (`@version:` marker in `src/cerefox/db/schema.sql`, covers schema + RPCs since they deploy atomically) and **`EF_VERSION`** (`_shared/ef-meta/index.ts`, covers all Edge Functions). `cut_release.ts` bumps `EF_VERSION` automatically when EF source changed since the last tag, and **gates** the schema version: it fails the cut if `schema.sql`/`rpcs.sql` changed without a matching `@version:` bump (both the `schema.sql` marker and the `cerefox_schema_version()` literal in `rpcs.sql` must move together).

---

8 changes: 4 additions & 4 deletions README.md
@@ -7,7 +7,6 @@
**User-owned shared memory for AI agents.** A persistent, curated knowledge layer that multiple AI tools can read and write, backed by Postgres + pgvector.

[![Apache 2.0 License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://python.org)
[![Node 20+](https://img.shields.io/badge/node-20+-green.svg)](https://nodejs.org)

---
@@ -41,7 +40,7 @@ Cerefox is **asynchronous shared memory, not a message bus**. It solves the pers
| **Metadata search** | Standalone metadata-only search (no text query needed); find documents by key-value criteria, project, and date range; optional content inclusion with byte budget; dedicated MCP tool, CLI command, and web UI page |
| **Project discovery** | `cerefox_list_projects` MCP tool for agents to discover available projects; all search results include human-readable `project_names` alongside UUIDs |
| **Heading-aware chunking** | Greedy section accumulation — H1/H2/H3 sections accumulate until MAX_CHUNK_CHARS; heading breadcrumb preserved per chunk |
| **Cloud embeddings** | OpenAI `text-embedding-3-small` (768-dim) via API — or swap to Fireworks AI |
| **Cloud embeddings** | OpenAI `text-embedding-3-small` (768-dim) via API (the only embedder wired in the TS runtime today) |
| **Remote MCP endpoint** | `cerefox-mcp` Supabase Edge Function — MCP Streamable HTTP; connect Claude Desktop, Claude Code, or Cursor with just a URL and anon key; no Python install needed |
| **Local MCP server** | `cerefox mcp` stdio server (TypeScript, from `@cerefox/memory`) -- local alternative with zero Edge Function usage, lower latency, and offline support; `npm install -g @cerefox/memory`. (A frozen Python MCP server also ships for repo-clone users: `uv run cerefox mcp`.) |
| **Web UI** | React + TypeScript SPA (Mantine UI) at `/app/`; Hono (TypeScript) JSON API backend served by `cerefox web`; Markdown viewer, search with 4 modes, document editing, project management |
@@ -113,7 +112,7 @@ cerefox web # web UI → http://localhost:8000/app/
```

**Prerequisites:** Node 20+ or Bun 1.0+ · a Supabase account (free tier) · an
embedding API key (OpenAI `text-embedding-3-small` by default, or Fireworks AI).
embedding API key (OpenAI `text-embedding-3-small`).

> **Full walkthrough:** [`docs/guides/quickstart.md`](docs/guides/quickstart.md)
> (~15 min). Supabase specifics: [`docs/guides/setup-supabase.md`](docs/guides/setup-supabase.md).
@@ -135,7 +134,8 @@ cerefox-local configure-agent # wire an MCP client (e.g. Claude Code)
# 3. Use it:
cerefox-local document ingest my-notes.md --title "My notes"
cerefox-local search "what did I decide about auth?"
# web UI → http://localhost:8000/app/ (manage: cerefox-local status | upgrade | stop)
# web UI → http://localhost:8000/app/ (or the port the installer chose — it auto-steps
# to 8010/… if 8000 is busy; `cerefox-local status` shows the URL. Manage: status | upgrade | stop)
```

**Prerequisites:** Docker (Docker Desktop or [Colima](https://github.com/abiosoft/colima))